
Confidence measures for protein fold recognition.
MedLine Citation:
PMID:  12075015     Owner:  NLM     Status:  MEDLINE    
MOTIVATION: We present an extensive evaluation of different methods and criteria to detect remote homologs of a given protein sequence. We investigate two associated problems: first, to develop a sensitive searching method to identify possible candidates and, second, to assign a confidence to the putative candidates in order to select the best one. For searching methods where the score distributions are known, p-values are used as a confidence measure with great success. For the cases where such theoretical backing is absent, we propose empirical approximations to p-values for searching procedures.
RESULTS: As a baseline, we review the performance of different methods for detecting remote protein folds (sequence alignment and threading, with and without sequence profiles, global and local). The analysis is performed on a large representative set of protein structures. For fold recognition, we find that methods using sequence profiles generally perform better than methods using plain sequences, and that threading methods perform better than sequence alignment methods. In order to assess the quality of the predictions made, we establish and compare several confidence measures, including raw scores, z-scores, raw score gaps, z-score gaps, and different methods of p-value estimation. We work our way from the theoretically well-backed local scores towards more explorative global and threading scores. The methods for assessing the statistical significance of predictions are compared using specificity-sensitivity plots. For local alignment techniques, we find that p-value methods work best, although computationally cheaper methods, such as those based on score gaps, achieve similar performance. For global methods, where no theory is available, methods based on score gaps work best. By using the score gap functions as the measure of confidence, we improve the more powerful fold recognition methods for which p-values are unavailable.
AVAILABILITY: The benchmark set is available upon request.
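The confidence measures named in the abstract (z-scores, score gaps, empirical p-values) can be illustrated with a minimal Python sketch. The abstract does not give the authors' exact definitions, so everything here is an assumption made for illustration: higher scores are taken to be better, the z-score is computed against the scores of all candidate templates for one query, the score gap is the difference between the best and second-best score, and the empirical p-value is the add-one-corrected fraction of background scores at least as large as the observed one. The function names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Standardize each score against the whole candidate list.

    Assumption: the mean and (population) standard deviation of all
    candidate scores for one query serve as the reference distribution.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: no candidate stands out.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def score_gap(scores):
    """Difference between the best and the second-best score.

    Assumption: higher scores are better; at least two candidates exist.
    """
    if len(scores) < 2:
        raise ValueError("need at least two candidate scores")
    ranked = sorted(scores, reverse=True)
    return ranked[0] - ranked[1]


def z_score_gap(scores):
    """Score gap computed on z-scores instead of raw scores."""
    return score_gap(z_scores(scores))


def empirical_p_value(score, background):
    """Fraction of background scores >= score, with add-one correction.

    Assumption: `background` holds scores of known unrelated pairs.
    The correction keeps the estimate strictly above zero.
    """
    hits = sum(1 for b in background if b >= score)
    return (hits + 1) / (len(background) + 1)
```

For example, with candidate scores `[10.0, 4.0, 3.0, 3.0]`, `score_gap` returns `6.0`, and `empirical_p_value(10.0, [1.0, 2.0, 3.0, 4.0])` returns `0.2`. A large gap or a small empirical p-value would then be read as higher confidence in the top-ranked candidate.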
Ingolf Sommer; Alexander Zien; Niklas von Ohsen; Ralf Zimmer; Thomas Lengauer
Publication Detail:
Type:  Comparative Study; Evaluation Studies; Journal Article; Research Support, Non-U.S. Gov't    
Journal Detail:
Title:  Bioinformatics (Oxford, England)     Volume:  18     ISSN:  1367-4803     ISO Abbreviation:  Bioinformatics     Publication Date:  2002 Jun 
Date Detail:
Created Date:  2002-06-20     Completed Date:  2003-01-27     Revised Date:  2006-11-15    
Medline Journal Info:
Nlm Unique ID:  9808944     Medline TA:  Bioinformatics     Country:  England    
Other Details:
Languages:  eng     Pagination:  802-12     Citation Subset:  IM    
Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, D-53754 Sankt Augustin, Germany.
MeSH Terms
Amino Acid Sequence
Computational Biology
Confidence Intervals
Protein Folding*
Protein Structure, Tertiary
Sequence Alignment / statistics & numerical data

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine
