Cluster Analysis and Noise Detection Using the False Discovery Rate
- Cluster analysis is an unsupervised learning technique that partitions objects into several clusters. Assuming that the data contain noisy objects, we propose a soft clustering method that uses multiple testing to assign each significant object to one of a specified number of clusters.
The parameters of the Gaussian mixture model are estimated with the EM algorithm. From the estimated probability density function, a multiple hypothesis testing problem is set up and the positive false discovery rate (pFDR) is estimated, with the p-values computed by the Monte Carlo method. The proposed procedure then classifies all objects simultaneously as significant data or noise according to the specified pFDR level. Applied to real and artificial data sets, the method performs reasonably while controlling the pFDR.
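The pipeline described above (EM fit, Monte Carlo p-values, pFDR thresholding) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names are hypothetical, and the data are one-dimensional and synthetic. The null hypothesis is taken to be "the object was generated by the fitted mixture", so a rejection marks the object as noise. The pFDR is estimated with Storey's estimator with tuning parameter 0.5, which the abstract does not specify. For simplicity the mixture is fitted to a clean reference sample and the test is applied to a separate batch, which sidesteps how the original method handles noise during estimation.

```python
import bisect
import math
import random

def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, params):
    # params: one (weight, mean, std) triple per mixture component
    return sum(w * normal_pdf(x, mu, sd) for w, mu, sd in params)

def fit_gmm_em(data, k, n_iter=100):
    """EM for a 1-D, k-component Gaussian mixture (illustrative only)."""
    xs = sorted(data)
    n = len(xs)
    mean_all = sum(xs) / n
    sd_all = math.sqrt(sum((x - mean_all) ** 2 for x in xs) / n)
    # Deterministic start: means at evenly spaced quantiles, shared spread.
    params = [(1.0 / k, xs[int((j + 0.5) * n / k)], sd_all) for j in range(k)]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, mu, sd) for w, mu, sd in params]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: weighted updates of weight, mean and standard deviation.
        new_params = []
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            mu = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nj
            new_params.append((nj / n, mu, max(math.sqrt(var), 1e-3)))
        params = new_params
    return params

def sample_mixture(params, size, rng):
    comps = rng.choices(params, weights=[w for w, _, _ in params], k=size)
    return [rng.gauss(mu, sd) for _, mu, sd in comps]

def monte_carlo_pvalues(points, params, n_sim=1999, seed=0):
    """p_i estimates P(f(X) <= f(x_i)) for X drawn from the fitted mixture."""
    rng = random.Random(seed)
    sim = sorted(mixture_pdf(x, params) for x in sample_mixture(params, n_sim, rng))
    return [(1 + bisect.bisect_right(sim, mixture_pdf(x, params))) / (n_sim + 1)
            for x in points]

def pfdr_threshold(pvals, level, lam=0.5):
    """Largest observed p-value whose estimated pFDR is <= level (0.0 if none)."""
    m = len(pvals)
    ps = sorted(pvals)
    # Storey's estimate of the proportion of true nulls, capped at 1.
    pi0 = min(1.0, sum(p > lam for p in ps) / (m * (1.0 - lam)))
    best = 0.0
    for gamma in ps:
        r = bisect.bisect_right(ps, gamma)  # rejections at threshold gamma
        # P(at least one rejection) if every null were true; the guard only
        # matters for p-values far below the Monte Carlo floor 1/(n_sim+1).
        prob_any = max(1.0 - (1.0 - gamma) ** m, 1e-300)
        if pi0 * gamma * m / (r * prob_any) <= level:
            best = gamma
    return best

def assign_cluster(x, params):
    # Index of the component with the largest posterior probability.
    dens = [w * normal_pdf(x, mu, sd) for w, mu, sd in params]
    return dens.index(max(dens))

# Synthetic example: two clusters, plus 40 far-away noise points in the batch.
rng = random.Random(1)
reference = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(10, 1) for _ in range(200)]
batch = ([rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(10, 1) for _ in range(100)]
         + [rng.uniform(30, 40) for _ in range(40)])
params = fit_gmm_em(reference, k=2)
pvals = monte_carlo_pvalues(batch, params)
cutoff = pfdr_threshold(pvals, level=0.05)
is_noise = [p <= cutoff for p in pvals]
labels = [None if noise else assign_cluster(x, params)
          for x, noise in zip(batch, is_noise)]
print(sum(is_noise), "of", len(batch), "objects flagged as noise")
```

Rejecting every p-value at or below the largest threshold whose estimate meets the level is the same decision as thresholding Storey's q-values at that level. Objects that are not rejected are assigned to the component with the highest posterior probability, which is one simple reading of "assigning significant objects to a cluster".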
The proposed method can perform poorly on high-dimensional data because of the large number of parameters in the Gaussian mixture. It can, however, be combined with a dimension reduction technique, and experimental results show that this combination works effectively on high-dimensional data.
The proposed clustering method is also applied to the novelty detection problem. After the underlying distribution is pre-estimated, anomalous objects are filtered out. The distribution is then re-estimated in the EM step, and novel objects are identified by controlling the pFDR.