Evaluating a linear k-mer model for protein-DNA interactions using high-throughput SELEX data.
MedLine Citation:
PMID:  24267147     Owner:  NLM     Status:  In-Data-Review    
Abstract/OtherAbstract:
Transcription factor (TF) binding to DNA can be modeled in a number of different ways. It is highly debated which modeling methods are best, how the models should be built, and what they can be applied to. In this study, a linear k-mer model proposed for predicting TF specificity in protein binding microarrays (PBM) is applied to high-throughput SELEX data, and the question of how to choose the most informative k-mers for the binding model is studied. We implemented a standard cross-validation scheme to reduce the number of k-mers in the model and observed that the number of k-mers can often be reduced significantly without a large negative effect on prediction accuracy. We also found that later SELEX enrichment cycles provide much better discrimination between bound and unbound sequences, as model prediction accuracies increased with cycle number for all proteins. We compared the prediction performance of k-mer and position-specific weight matrix (PWM) models derived from the same SELEX data. Consistent with previous results on PBM data, the performance of the k-mer model was on average 9 percentage points better. For the 15 proteins in the SELEX data set with medium enrichment cycles, classification accuracies were on average 71% and 62% for the k-mer and PWM models, respectively. Finally, the k-mer model trained with SELEX data was evaluated on ChIP-seq data, demonstrating substantial improvements for some proteins. For protein GATA1, the model can distinguish between true ChIP-seq peaks and negative peaks. For proteins RFX3 and NFATC1, the performance of the model was no better than chance.
Authors:
Juhani Kähärä; Harri Lähdesmäki
Publication Detail:
Type:  Journal Article     Date:  2013-08-12
Journal Detail:
Title:  BMC bioinformatics     Volume:  14 Suppl 10     ISSN:  1471-2105     ISO Abbreviation:  BMC Bioinformatics     Publication Date:  2013  
Date Detail:
Created Date:  2013-11-25     Completed Date:  -     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  100965194     Medline TA:  BMC Bioinformatics     Country:  England    
Other Details:
Languages:  eng     Pagination:  S2     Citation Subset:  IM    
From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

