Knowledge discovery via machine learning for neurodegenerative disease researchers.
MedLine Citation:
PMID:  19623491     Owner:  NLM     Status:  MEDLINE    
Abstract/OtherAbstract:
The ever-increasing size of the biomedical literature makes more precise information retrieval, and the ability to tap into implicit knowledge in the scientific literature, a necessity. In this chapter, three new variants of the expectation-maximization (EM) method for semisupervised document classification (Machine Learning 39:103-134, 2000) are first introduced to refine biomedical literature meta-searches. The retrieval performance of a multi-mixture per class EM variant with Agglomerative Information Bottleneck clustering (Slonim and Tishby (1999) Agglomerative information bottleneck. In Proceedings of NIPS-12) using the Davies-Bouldin cluster validity index (IEEE Transactions on Pattern Analysis and Machine Intelligence 1:224-227, 1979) rivaled that of state-of-the-art transductive support vector machines (TSVM) (Joachims (1999) Transductive inference for text classification using support vector machines. In Proceedings of the International Conference on Machine Learning (ICML)). Moreover, the multi-mixture per class EM variant refined search results more quickly, with an improvement of more than one order of magnitude in execution time compared with TSVM. A second tool, CRFNER, uses conditional random fields (Lafferty et al. (2001) Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML-2001) to recognize 15 types of named entities from schizophrenia abstracts, outperforming ABNER (Settles (2004) Biomedical named entity recognition using conditional random fields and rich feature sets. In Proceedings of COLING 2004 International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA)) in biological named entity recognition and reaching an F1 performance of 82.5% on the second set of named entities.
Authors:
I Burak Ozyurt; Gregory G Brown
Publication Detail:
Type:  Journal Article; Research Support, N.I.H., Extramural    
Journal Detail:
Title:  Methods in molecular biology (Clifton, N.J.)     Volume:  569     ISSN:  1064-3745     ISO Abbreviation:  Methods Mol. Biol.     Publication Date:  2009  
Date Detail:
Created Date:  2009-07-22     Completed Date:  2009-11-09     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  9214969     Medline TA:  Methods Mol Biol     Country:  United States    
Other Details:
Languages:  eng     Pagination:  173-96     Citation Subset:  IM    
Affiliation:
Department of Psychiatry, University of California - San Diego, La Jolla, CA, USA.
MeSH Terms
Descriptor/Qualifier:
Algorithms
Artificial Intelligence*
Cluster Analysis
Computational Biology*
Databases, Factual
Humans
Information Storage and Retrieval
Knowledge Bases
Natural Language Processing
Neurodegenerative Diseases*
Grant Support
ID/Acronym/Agency:
1 U24 RR021992/RR/NCRR NIH HHS

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

