Data Stream Clustering by Controlling Decision Error
- Data stream clustering is an unsupervised learning method for sequential data. It poses several challenges, such as handling limited memory, dealing with evolving clusters, and detecting noise data. Model-based clustering makes overall cluster information easy to obtain but is slow to respond to cluster evolution. Density-based clustering, on the other hand, responds quickly to cluster evolution. We propose a new data stream clustering method that combines model-based clustering and density-based clustering. The proposed method finds evolving clusters quickly and accurately, obtains cluster information easily, and handles noise data. We assume that clusters follow a Gaussian mixture model and use multiple hypothesis testing to handle noise data. We choose the positive false discovery rate (pFDR) as the decision error in our multiple hypothesis testing method. Because newly arriving data may include new clusters, we use a density-based algorithm to discover cluster evolution. We then estimate a Gaussian mixture model and make clustering decisions by combining the past cluster information with the cluster information for the newly arriving data. We applied the proposed method to several synthetic and real data sets. Experimental results show that it works effectively for data streams that include noise data. The proposed method is also more robust to the choice of input parameters than the method it was compared against, a density-based data stream clustering method.
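The noise-handling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes 2-D data, mixture components that are already fitted, a null hypothesis that each point belongs to its best-fitting component, and a Storey-style plug-in estimate of the pFDR with tuning parameter `lam`. All function names are hypothetical.

```python
import math


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of a 2-D point from a 2-D Gaussian."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # v^T * inverse(cov) * v, with the 2x2 inverse written out explicitly.
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def p_value(x, components):
    """P-value for H0: 'x was generated by one of the clusters'.

    In 2-D the squared Mahalanobis distance is chi-square distributed with
    2 degrees of freedom under H0, so its tail probability is exp(-d2 / 2).
    Assumption: the point is tested against its best-fitting component.
    """
    return max(math.exp(-0.5 * mahalanobis_sq(x, m, c)) for m, c in components)


def q_values(pvals, lam=0.5):
    """Storey-style q-values: estimated pFDR when rejecting all p <= p_i."""
    m = len(pvals)
    if m == 0:
        return []
    # Estimated share of true nulls (non-noise points); lam is a tuning choice.
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1.0 - lam)))
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone in p.
    for seen, i in enumerate(sorted(range(m), key=lambda j: -pvals[j])):
        t = pvals[i]
        rejected = m - seen  # points with p <= t (ties counted in visit order)
        prob_any = 1.0 - (1.0 - t) ** m  # P(at least one rejection | all null)
        # pFDR estimate; its limit as t -> 0 is pi0 / rejected.
        if prob_any > 0:
            est = pi0 * m * t / (rejected * prob_any)
        else:
            est = pi0 / rejected
        running_min = min(running_min, est)
        q[i] = running_min
    return q


def flag_noise(points, components, alpha=0.05):
    """Return one boolean per point: True means 'declared noise'."""
    pvals = [p_value(x, components) for x in points]
    return [q <= alpha for q in q_values(pvals)]
```

Calling `flag_noise(points, components)` with `components` given as a list of `(mean, covariance)` pairs returns one flag per point. Because the pFDR is the error being controlled, a small share of borderline cluster members may still be flagged as noise. For very small p-values the estimate approaches `pi0 / rejected`, so a handful of isolated outliers may not fall below a strict `alpha`.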