A Method for Finding Alternative Clusterings using Feature Selection and Direct Maximization of Clustering Quality
- Tao Thanh, Vinh
- Date Issued
- Alternative clustering algorithms aim to find additional groupings of a dataset for which traditional clustering algorithms return only one grouping, even though other valid groupings may exist.
In this thesis, we proposed a method for finding alternative clusterings of a dataset based on feature selection and direct maximization of clustering quality. We examined each cluster of the original clustering to identify the features likely to be important for the new clustering. We then transformed the data by weighting those features so that the original clustering is unlikely to be found in the new data space. To find another clustering, we used the K-means algorithm with incremental steps that maximize the quality of the new clustering.
We conducted an experiment to compare our approach with other approaches, testing them on a collection of machine learning datasets and on a collection of documents. The former group includes our synthetic data and five datasets from the UCI website; the latter group is a collection of news articles from The New York Times. Our approach was the most stable, as it produced the best JI and DI in most of the tests. Our results showed that, by using feature selection, we can achieve greater dissimilarity between clusterings than approaches that do not transform the data; and that, by directly maximizing the clustering quality, we can also achieve better clustering quality than other approaches based on data transformation.
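The general idea of the method summarized above (reweight features so that the original clustering becomes hard to recover, then cluster again) can be illustrated with a minimal sketch. The abstract does not give the weighting scheme or the incremental quality-maximization step, so everything specific here is an assumption for illustration only: `feature_weights` is a hypothetical scheme that down-weights features in proportion to how strongly they separate the original clusters, and plain Lloyd-style K-means stands in for the thesis's incremental procedure.

```python
import random


def feature_weights(X, labels):
    """Hypothetical weighting (not the thesis's scheme): a feature that
    strongly separates the original clusters gets a weight near 0."""
    n, d = len(X), len(X[0])
    weights = []
    for j in range(d):
        col = [row[j] for row in X]
        mean = sum(col) / n
        total = sum((v - mean) ** 2 for v in col)
        between = 0.0
        for c in set(labels):
            members = [col[i] for i in range(n) if labels[i] == c]
            c_mean = sum(members) / len(members)
            between += len(members) * (c_mean - mean) ** 2
        # Share of this feature's variance explained by the original clustering.
        score = between / total if total > 0 else 0.0
        weights.append(1.0 - score)
    return weights


def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd-style K-means; a stand-in for the incremental variant."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(X, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in X
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(X, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def alternative_clustering(X, original_labels, k):
    """Reweight features against the original clustering, then re-cluster."""
    w = feature_weights(X, original_labels)
    X_weighted = [[wj * v for wj, v in zip(w, row)] for row in X]
    return kmeans(X_weighted, k), w


# Toy data: the original clustering splits on feature 0,
# so the sketch should favor a split on feature 1 instead.
X = [[0.0, 0.0], [0.0, 10.0], [10.0, 0.0], [10.0, 10.0]]
original = [0, 0, 1, 1]
alternative, weights = alternative_clustering(X, original, k=2)
print(weights, alternative)
```

On this toy data the feature that defines the original split is suppressed, so the second clustering groups the points along the other feature. A real implementation would also need a measure of clustering quality to drive the incremental step, which this sketch leaves out.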
- Article Type