

Parallel spectral clustering in distributed systems.
MedLine Citation:
PMID:  20421667     Owner:  NLM     Status:  In-Process    
Abstract/OtherAbstract:
Spectral clustering algorithms have been shown to be more effective at finding clusters than some traditional algorithms, such as k-means. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the data set is large. To perform clustering on large data sets, we investigate two representative ways of approximating the dense similarity matrix: one sparsifies the matrix, and the other uses the Nyström method. We then pick the strategy of sparsifying the matrix by retaining nearest neighbors and investigate its parallelization. We parallelize both memory use and computation on distributed computers. Through an empirical study on a document data set of 193,844 instances and a photo data set of 2,121,863 instances, we show that our parallel algorithm can effectively handle large problems.
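The nearest-neighbor sparsification that the abstract settles on can be sketched as follows. This is a minimal single-machine sketch, not the authors' distributed implementation. The Gaussian kernel with a fixed `sigma`, the rule "keep entry (i, j) if either point is among the other's t nearest neighbors", the dict-of-dicts sparse storage, and the function names `gaussian_similarity` and `knn_sparse_similarity` are all illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical sparse storage: row index -> {column index: similarity}
SparseMatrix = Dict[int, Dict[int, float]]


def gaussian_similarity(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Gaussian similarity exp(-||x - y||^2 / (2 sigma^2)); the kernel choice is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def knn_sparse_similarity(points: List[Sequence[float]], t: int, sigma: float = 1.0) -> SparseMatrix:
    """Keep only each point's t nearest neighbors, then symmetrize.

    Assumed rule: entry (i, j) is kept if j is among i's t nearest neighbors
    OR i is among j's, so the result is symmetric with no diagonal entries.
    """
    n = len(points)
    sparse: SparseMatrix = {i: {} for i in range(n)}
    for i in range(n):
        # Squared distances from point i to every other point: O(n) per row,
        # O(n^2) overall. This sketch runs on one machine; it does not
        # distribute memory or computation as the abstract describes.
        dists = [
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n)
            if j != i
        ]
        dists.sort()
        for _, j in dists[:t]:
            s = gaussian_similarity(points[i], points[j], sigma)
            # Store both (i, j) and (j, i) so the matrix stays symmetric.
            sparse[i][j] = s
            sparse[j][i] = s
    return sparse


if __name__ == "__main__":
    # Two well-separated groups of three points each.
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    S = knn_sparse_similarity(pts, t=2)
    for i in sorted(S):
        print(i, sorted(S[i]))
```

With `t=2`, each point in this toy input keeps nonzero similarities only to the two other points in its own group. Each row therefore holds at most on the order of t entries instead of n, which is the memory saving that motivates sparsification. The resulting sparse matrix would then feed the usual spectral steps (Laplacian eigenvectors followed by k-means), which this sketch leaves out.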
Authors:
Wen-Yen Chen; Yangqiu Song; Hongjie Bai; Chih-Jen Lin; Edward Y Chang
Related Documents :
20083457 - Fuzzy forecasting based on fuzzy-trend logical relationship groups.
16984317 - A general approach for two-stage analysis of multilevel clustered non-gaussian data.
19206767 - A new method to generate an almost-diagonal matrix in the boundary integral equation fo...
20031967 - Penalized mixtures of factor analyzers with application to clustering high-dimensional ...
19184577 - Multi-destination and multi-purpose trip effects in the analysis of the demand for trip...
19324837 - Allometry of visceral organs in living amniotes and its implications for sauropod dinos...
Publication Detail:
Type:  Journal Article; Research Support, U.S. Gov't, Non-P.H.S.    
Journal Detail:
Title:  IEEE transactions on pattern analysis and machine intelligence     Volume:  33     ISSN:  1939-3539     ISO Abbreviation:  IEEE Trans Pattern Anal Mach Intell     Publication Date:  2011 Mar 
Date Detail:
Created Date:  2011-04-21     Completed Date:  -     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  9885960     Medline TA:  IEEE Trans Pattern Anal Mach Intell     Country:  United States    
Other Details:
Languages:  eng     Pagination:  568-86     Citation Subset:  IM    
Affiliation:
Yahoo! Inc., Sunnyvale, CA 94089, USA. wychen@yahoo-inc.com
From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine