Open Access System for Information Sharing


Feature Selection based on Geometric Distance for High-Dimensional Data

Title
Feature Selection based on Geometric Distance for High-Dimensional Data
Authors
이진희
Date Issued
2016
Publisher
포항공과대학교 (Pohang University of Science and Technology)
Abstract
Feature selection is the process of selecting a relevant subset of features that affects the performance of a model. It is beneficial for reducing the complexity of learning, improving predictive performance, and providing insight into the problem at hand. In recent years, the dimensionality of features in databases has grown in many applications, and this increase has become a critical issue for many existing feature selection methods in terms of both efficiency and effectiveness. Reducing high dimensionality makes the enormous search space more tractable for learning algorithms.

In this thesis, we propose a novel feature selection method based on geometric distance for high-dimensional data. It evaluates feature subsets using both the average distance between classes and the evenness of these distances. The evaluation and selection process is easy to understand because it lends itself to a simple geometric analysis. The proposed inter-class distance reflects the relative class distribution in a multidimensional feature space, and information on the relative locations of classes fosters an intuitive understanding of how the feature set is likely distributed. Furthermore, it conveys information that is important not only for evaluating the feature subset but also for improving classification performance, which is the significant advantage of using the inter-class variance in the Fisher criterion. The proposed method selects features sequentially without making pairwise comparisons. Making pairwise comparisons between individual features across an entire feature set markedly slows computation, so it is critical to reduce the time spent selecting features by minimizing the number of feature comparisons without degrading classification performance.
The proposed feature evaluation method is not only very fast but also easy to combine with a variety of feature subset search methods. In this research, the deviation of the inter-class distances in the feature space is used to define the distance evenness. The assumption behind this degree of equidistance is that, when two feature subsets have similar average inter-class distances, classification performance deteriorates more from the inter-class distances that are shorter than the average than from those that are longer. As the deviation of the inter-class distances grows, more classes lie within short distances of one another, and hence the margin of separation between classes shrinks.

In conclusion, we present a feature selection method called GDFS, which uses the relative geometric distribution of the different classes. GDFS selects features so as to maximize the average distance between classes together with the distance evenness, which improves classification accuracy. Compared with other selection methods based on information measures or statistical dependency, GDFS is faster, making it more effective for searching a high-dimensional feature space. If such feature selection is appropriately designed, it can be used in many fields that require processing big data in real time. Our experiments demonstrate its markedly better classification performance as well as faster computation compared with existing methods.
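The criterion described in the abstract can be sketched as follows: score a feature subset by the average pairwise distance between class centroids, penalized by the unevenness (deviation) of those distances, and add features one at a time via sequential forward selection. This is a minimal illustrative sketch based only on the abstract; the function names (gdfs_score, gdfs_select) and the alpha weight on the deviation term are assumptions, not the thesis's actual formulation.

```python
# Illustrative sketch of a GDFS-style criterion (assumed formulation):
# subset score = mean inter-class centroid distance - alpha * its deviation,
# maximized greedily by sequential forward selection.
import numpy as np
from itertools import combinations


def gdfs_score(X, y, features, alpha=1.0):
    """Score a feature subset by average inter-class distance and evenness."""
    Xs = X[:, features]
    classes = np.unique(y)
    # One centroid per class in the subspace spanned by the selected features.
    centroids = np.array([Xs[y == c].mean(axis=0) for c in classes])
    # All pairwise inter-class (centroid) distances.
    dists = [np.linalg.norm(centroids[i] - centroids[j])
             for i, j in combinations(range(len(classes)), 2)]
    # Large average separation is good; large deviation (uneven spacing,
    # i.e. some classes much closer than others) is penalized.
    return np.mean(dists) - alpha * np.std(dists)


def gdfs_select(X, y, k, alpha=1.0):
    """Sequentially pick k features, each maximizing gdfs_score when added."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        best = max(remaining,
                   key=lambda f: gdfs_score(X, y, selected + [f], alpha))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Note that each greedy step evaluates candidate features only against the current subset, never feature-against-feature, which matches the abstract's point about avoiding pairwise comparisons across the entire feature set.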
URI
http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002241151
https://oasis.postech.ac.kr/handle/2014.oak/93241
Article Type
Thesis
Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
