The number of protein folds and their distribution over families in nature.
MedLine Citation:
PMID:  14747997     Owner:  NLM     Status:  MEDLINE    
Currently, of the roughly 10^6 known protein sequences, only about 10^4 structures have been solved. Based on homologies and similarities, proteins are grouped into families, each of which has a structural prototype, namely the fold; some families share the same fold. However, the total numbers of folds and families, and the distribution of folds over families in nature, remain unknown. Here, we report a study of the distribution of folds over families and the total number of folds in nature, using a maximum probability principle and the moment method of estimation. A quadratic relation between the numbers of families and folds is found when the number of families lies between 6000 and 30,000. For example, about 2700 folds are obtained for 23,100 families; about 33 of these are superfolds comprising more than 100 families each, and the largest superfold comprises about 800 families. Our results suggest that although the majority of folds contain only a single family, considerably more folds contain many families each than the database shows, and the distribution of folds over families in nature differs markedly from the sampled distribution. This article provides the first estimate of the long tail of the fold distribution. The results fit the data from different versions of the Structural Classification of Proteins (SCOP) excellently, and goodness-of-fit tests strongly support them. In addition, the method of directly "enlarging" the sample to the population may be useful for inferring distributions of species in other fields.
Authors:
Xinsheng Liu; Ke Fan; Wei Wang
Publication Detail:
Type:  Journal Article; Research Support, U.S. Gov't, Non-P.H.S.    
Journal Detail:
Title:  Proteins     Volume:  54     ISSN:  1097-0134     ISO Abbreviation:  Proteins     Publication Date:  2004 Feb 
Date Detail:
Created Date:  2004-01-28     Completed Date:  2004-02-26     Revised Date:  2006-11-15    
Medline Journal Info:
Nlm Unique ID:  8700181     Medline TA:  Proteins     Country:  United States    
Other Details:
Languages:  eng     Pagination:  491-9     Citation Subset:  IM    
Copyright Information:
Copyright 2003 Wiley-Liss, Inc.
Affiliation:
National Lab of Solid State Microstructure, Department of Physics and Institute of Biophysics, Nanjing University, Nanjing, China.
MeSH Terms
Databases, Protein
Protein Conformation
Protein Folding*
Proteins / chemistry*,  classification*

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine
