

Metric learning for text documents.
MedLine Citation:
PMID:  16566500     Owner:  NLM     Status:  MEDLINE    
Abstract/OtherAbstract:
Many algorithms in machine learning rely on being given a good distance metric over the input space. Rather than using a default metric such as the Euclidean metric, it is desirable to obtain a metric based on the provided data. We consider the problem of learning a Riemannian metric associated with a given differentiable manifold and a set of points. Our approach to the problem involves choosing a metric from a parametric family that is based on maximizing the inverse volume of a given data set of points. From a statistical perspective, it is related to maximum likelihood under a model that assigns probabilities inversely proportional to the Riemannian volume element. We discuss in detail learning a metric on the multinomial simplex where the metric candidates are pull-back metrics of the Fisher information under a Lie group of transformations. When applied to text document classification, the resulting geodesic distances resemble, but outperform, the tf-idf cosine similarity measure.
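The geodesic distances mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes one specific transformation family: coordinate-wise positive rescaling followed by renormalization, F_lambda(theta) = (theta_1 lambda_1, ..., theta_n lambda_n) / <theta, lambda>. It also uses the standard closed form for the Fisher geodesic distance on the simplex, up to the conventional factor of 2. The function names (`normalize`, `fisher_geodesic`, `rescale`, `pullback_geodesic`, `cosine`) and the weights `lam` are hypothetical. The step that learns `lam` by maximizing inverse volume is not shown.

```python
import math
from typing import List, Sequence


def normalize(counts: Sequence[float]) -> List[float]:
    """Map a nonnegative term-count vector onto the multinomial simplex."""
    total = float(sum(counts))
    if total <= 0:
        raise ValueError("document must contain at least one term")
    return [c / total for c in counts]


def fisher_geodesic(p: Sequence[float], q: Sequence[float]) -> float:
    """Fisher-information geodesic distance between two points of the simplex.

    The map theta -> 2*sqrt(theta) sends the simplex isometrically onto a patch of
    the sphere of radius 2, so the distance is 2*arccos(sum_i sqrt(p_i * q_i)).
    """
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp: rounding can push the sum slightly above 1 for identical inputs.
    return 2.0 * math.acos(min(1.0, max(-1.0, bc)))


def rescale(p: Sequence[float], lam: Sequence[float]) -> List[float]:
    """Hypothetical F_lambda: scale coordinate i by lam[i], then renormalize."""
    if any(w <= 0 for w in lam):
        raise ValueError("lambda must be strictly positive")
    scaled = [a * w for a, w in zip(p, lam)]
    z = sum(scaled)
    return [s / z for s in scaled]


def pullback_geodesic(p: Sequence[float], q: Sequence[float],
                      lam: Sequence[float]) -> float:
    """Geodesic distance under the pull-back metric F_lambda^* J.

    F_lambda is a diffeomorphism of the open simplex (its inverse is F_{1/lambda}),
    so it is an isometry from (simplex, F^*J) onto (simplex, J): map, then measure.
    """
    return fisher_geodesic(rescale(p, lam), rescale(q, lam))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain cosine similarity, as used with tf-idf weighted vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


if __name__ == "__main__":
    x = normalize([3, 1, 1, 2])  # toy term counts for document 1
    y = normalize([1, 1, 2, 1])  # toy term counts for document 2
    lam = [0.5, 2.0, 1.0, 3.0]   # illustrative weights, not learned values
    print(fisher_geodesic(x, y), pullback_geodesic(x, y, lam))
```

Under this assumed family, the sum inside the arccos equals the cosine similarity between the vectors (sqrt(x_i lambda_i)) and (sqrt(y_i lambda_i)). Each has squared norm <x, lambda> (respectively <y, lambda>). The pull-back distance is therefore 2*arccos of a cosine between square-rooted, lambda-weighted term frequencies, which is one concrete way to see why it resembles the tf-idf cosine measure.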
Authors:
Guy Lebanon
Publication Detail:
Type:  Journal Article    
Journal Detail:
Title:  IEEE transactions on pattern analysis and machine intelligence     Volume:  28     ISSN:  0162-8828     ISO Abbreviation:  IEEE Trans Pattern Anal Mach Intell     Publication Date:  2006 Apr 
Date Detail:
Created Date:  2006-03-28     Completed Date:  2006-04-18     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  9885960     Medline TA:  IEEE Trans Pattern Anal Mach Intell     Country:  United States    
Other Details:
Languages:  eng     Pagination:  497-508     Citation Subset:  IM    
Affiliation:
Department of Statistics, Purdue University, 150 N. University Street, West Lafayette, IN 47907, USA. lebanon@stat.purdue.edu
MeSH Terms
Descriptor/Qualifier:
Algorithms*
Artificial Intelligence*
Automatic Data Processing / methods*
Computer Graphics
Documentation / methods*
Image Enhancement / methods
Image Interpretation, Computer-Assisted / methods*
Information Storage and Retrieval / methods*
Numerical Analysis, Computer-Assisted
Pattern Recognition, Automated / methods*
Signal Processing, Computer-Assisted
User-Computer Interface

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine