Automated Gene-Model Curation using Global Discriminative Learning.
MedLine Citation:
PMID:  22513996     Owner:  NLM     Status:  Publisher    
Abstract/OtherAbstract:
MOTIVATION: Gene-model curation creates consensus gene models by combining multiple sources of protein-coding evidence that may be incomplete or inconsistent. To date, manual curation still produces the highest-quality models. However, manual curation is too slow and costly to be completed even for the most important organisms. In recent years, machine-learned ensemble gene predictors have become a viable alternative to manual curation. Current approaches use signal and genomic-region consistency among sources, together with some voting scheme, to resolve conflicts in the evidence. As a further step in that direction, we have developed eCRAIG (ensemble CRAIG), an automated curation tool that combines multiple sources of evidence using global discriminative training. This allows efficient integration of different types of genomic evidence with complex statistical dependencies to directly maximize annotation accuracy. Our method goes beyond previous work in integrating novel nonlinear annotation agreement features, as well as combinations of intrinsic features of the target sequence and extrinsic annotation features.
RESULTS: We achieved significant improvements over the best ensemble predictors available for H. sapiens, C. elegans and A. thaliana. In particular, eCRAIG achieved a relative mean improvement of 5.1% over Jigsaw, the best published ensemble predictor in all our experiments.
AVAILABILITY: The source code and data sets are both available at http://www.seas.upenn.edu/~abernal/ecraig.tgz
CONTACT: abernal@seas.upenn.edu
Authors:
Axel Bernal; Koby Crammer; Fernando Pereira
Publication Detail:
Type:  JOURNAL ARTICLE     Date:  2012-4-18
Journal Detail:
Title:  Bioinformatics (Oxford, England)     Volume:  -     ISSN:  1367-4811     ISO Abbreviation:  -     Publication Date:  2012 Apr 
Date Detail:
Created Date:  2012-4-19     Completed Date:  -     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  9808944     Medline TA:  Bioinformatics     Country:  -    
Other Details:
Languages:  ENG     Pagination:  -     Citation Subset:  -    
Affiliation:
Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA.
From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

