Multi-class protein fold classification using a new ensemble machine learning approach.
MedLine Citation:
PMID:  15706535     Owner:  NLM     Status:  MEDLINE    
Abstract/OtherAbstract:
Protein structure classification is an important step in understanding the associations between sequence and structure, as well as possible functional and evolutionary relationships. Recent structural genomics initiatives and other high-throughput experiments have populated the biological databases at a rapid pace. The volume of structural data has made traditional methods, such as manual inspection of protein structures, infeasible. Machine learning has been widely and successfully applied in bioinformatics. This work proposes a novel ensemble machine learning method that improves the coverage of classifiers on multi-class imbalanced sample sets by integrating knowledge induced from different base classifiers, and we illustrate the idea by classifying multi-class SCOP protein fold data. We compared our approach with PART and show that our method improves the sensitivity of the classifier in protein fold classification. Furthermore, we extended the method to learning over multiple data types while preserving the independence of their corresponding data sources, and show that the new approach performs at least as well as the traditional technique applied to a single joined data source. These experimental results are encouraging, and the method can be applied to other bioinformatics problems similarly characterised by multi-class imbalanced data sets held in multiple data sources.
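The abstract does not say how the knowledge from the base classifiers is integrated or how sensitivity was measured. As a rough illustration only, the sketch below combines label predictions from several base classifiers by plain plurality voting (a stand-in, not the authors' integration scheme) and reports per-class sensitivity, TP / (TP + FN), which is the quantity that matters when some classes are rare. The function names `plurality_vote` and `per_class_sensitivity` and the toy labels are hypothetical.

```python
from collections import Counter


def plurality_vote(predictions_per_classifier):
    """Combine predictions from several base classifiers by plurality vote.

    predictions_per_classifier: list of equal-length label lists,
    one list per base classifier (hypothetical input layout).
    Ties go to the label seen first among the votes.
    """
    combined = []
    for votes in zip(*predictions_per_classifier):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def per_class_sensitivity(y_true, y_pred):
    """Return {label: TP / (TP + FN)} for every label present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Toy, made-up labels: class "a" dominates, "b" and "c" are rare.
y_true = ["a", "a", "a", "b", "c"]
base_predictions = [
    ["a", "a", "a", "a", "a"],  # classifier 1 always predicts the majority class
    ["a", "a", "b", "b", "a"],  # classifier 2
    ["a", "b", "a", "b", "c"],  # classifier 3
]
ensemble = plurality_vote(base_predictions)
print(per_class_sensitivity(y_true, ensemble))
```

Reporting sensitivity per class rather than overall accuracy keeps a rare class visible: a classifier that never predicts a rare class can still score high accuracy on imbalanced data.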
Authors:
Aik Choon Tan; David Gilbert; Yves Deville
Related Documents:
15096585 - On the use of low-frequency normal modes to enforce collective movements in refining ma...
18837035 - An all-atom structure-based potential for proteins: bridging minimal models with all-at...
7584445 - MMDB: an ASN.1 specification for macromolecular structure.
6179935 - Type I collagen segment long spacing banding patterns. Evidence that the alpha 2 chain ...
22732275 - An algorithm for the simulation of the growth of root systems on deformable domains.
19792015 - A biologically motivated signal transmission approach based on stochastic delay differe...
Publication Detail:
Type:  Journal Article; Research Support, Non-U.S. Gov't    
Journal Detail:
Title:  Genome informatics. International Conference on Genome Informatics     Volume:  14     ISSN:  0919-9454     ISO Abbreviation:  -     Publication Date:  2003  
Date Detail:
Created Date:  2005-02-11     Completed Date:  2005-03-22     Revised Date:  2006-11-15    
Medline Journal Info:
Nlm Unique ID:  101280573     Medline TA:  Genome Inform     Country:  Japan    
Other Details:
Languages:  eng     Pagination:  206-17     Citation Subset:  IM    
Affiliation:
Bioinformatics Research Centre, Department of Computing Science, University of Glasgow, 17 Lilybank Gardens, Glasgow, G12 8QQ, Scotland, United Kingdom. actan@brc.dcs.gla.ac.uk
MeSH Terms
Descriptor/Qualifier:
Amino Acid Sequence
Amino Acids / chemistry
Artificial Intelligence
Computer Simulation
Models, Molecular
Protein Folding
Protein Structure, Secondary
Proteins / chemistry*,  metabolism*
Chemical
Reg. No./Substance:
0/Amino Acids; 0/Proteins

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

