Transcriptome assembly and isoform expression level estimation from biased RNA-Seq reads.
MedLine Citation:
PMID:  23060617     Owner:  NLM     Status:  MEDLINE    
Abstract/OtherAbstract:
MOTIVATION: RNA-Seq uses high-throughput sequencing technology to identify and quantify transcriptomes at an unprecedentedly high resolution and low cost. However, RNA-Seq reads are usually not uniformly distributed, and biases in RNA-Seq data pose great challenges in many applications, including transcriptome assembly and the expression level estimation of genes or isoforms. Much effort has been made in the literature to calibrate the expression level estimation from biased RNA-Seq data, but the effect of biases on transcriptome assembly remains largely unexplored.
RESULTS: Here, we propose a statistical framework for both transcriptome assembly and isoform expression level estimation from biased RNA-Seq data. Using a quasi-multinomial distribution model, our method is able to capture various types of RNA-Seq biases, including positional, sequencing and mappability biases. Our experimental results on simulated and real RNA-Seq datasets exhibit interesting effects of RNA-Seq biases on both transcriptome assembly and isoform expression level estimation. The advantage of our method is clearly shown in the experimental analysis by its high sensitivity and precision in transcriptome assembly and the high concordance of its estimated expression levels with quantitative reverse transcription-polymerase chain reaction data.
AVAILABILITY: CEM is freely available at http://www.cs.ucr.edu/~liw/cem.html.
CONTACT: liw@cs.ucr.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Authors:
Wei Li; Tao Jiang
Publication Detail:
Type:  Journal Article; Research Support, N.I.H., Extramural     Date:  2012-10-11
Journal Detail:
Title:  Bioinformatics (Oxford, England)     Volume:  28     ISSN:  1367-4811     ISO Abbreviation:  Bioinformatics     Publication Date:  2012 Nov 
Date Detail:
Created Date:  2012-11-12     Completed Date:  2013-07-29     Revised Date:  2013-12-04    
Medline Journal Info:
Nlm Unique ID:  9808944     Medline TA:  Bioinformatics     Country:  England    
Other Details:
Languages:  eng     Pagination:  2914-21     Citation Subset:  IM    
MeSH Terms
Descriptor/Qualifier:
Algorithms*
Gene Expression Profiling / methods*
High-Throughput Nucleotide Sequencing / methods*
Humans
Protein Isoforms / genetics
Sequence Analysis, RNA / methods*
Transcription, Genetic
Grant Support
ID/Acronym/Agency:
R01 AI078885/AI/NIAID NIH HHS
Chemical
Reg. No./Substance:
0/Protein Isoforms
From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

