Automated Gene-Model Curation using Global Discriminative Learning.
MedLine Citation:
PMID:  22513996     Owner:  NLM     Status:  Publisher    
MOTIVATION: Gene-model curation creates consensus gene models by combining multiple sources of protein-coding evidence that may be incomplete or inconsistent. To date, manual curation still produces the highest-quality models. However, manual curation is too slow and costly to be completed even for the most important organisms. In recent years, machine-learned ensemble gene predictors have become a viable alternative to manual curation. Current approaches rely on the consistency of signals and genomic regions among sources, together with a voting scheme, to resolve conflicts in the evidence. As a further step in that direction, we have developed eCRAIG (ensemble CRAIG), an automated curation tool that combines multiple sources of evidence using global discriminative training. This allows efficient integration of different types of genomic evidence with complex statistical dependencies to directly maximize annotation accuracy. Our method goes beyond previous work by integrating novel nonlinear annotation agreement features, as well as combinations of intrinsic features of the target sequence and extrinsic annotation features.
RESULTS: We achieved significant improvements over the best ensemble predictors available for H. sapiens, C. elegans and A. thaliana. In particular, eCRAIG achieved a relative mean improvement of 5.1% over Jigsaw, which was the best published ensemble predictor in all of our experiments.
AVAILABILITY: The source code and data sets are both available at
CONTACT:
Axel Bernal; Koby Crammer; Fernando Pereira
Publication Detail:
Type:  JOURNAL ARTICLE     Date:  2012-4-18
Journal Detail:
Title:  Bioinformatics (Oxford, England)     Volume:  -     ISSN:  1367-4811     ISO Abbreviation:  -     Publication Date:  2012 Apr 
Date Detail:
Created Date:  2012-4-19     Completed Date:  -     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  9808944     Medline TA:  Bioinformatics     Country:  -    
Other Details:
Languages:  ENG     Pagination:  -     Citation Subset:  -    
Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA.

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine
