Document Detail


The number of protein folds and their distribution over families in nature.
MedLine Citation:
PMID:  14747997     Owner:  NLM     Status:  MEDLINE    
Abstract/OtherAbstract:
Currently, of the 10^6 known protein sequences, only about 10^4 structures have been solved. Based on homologies and similarities, proteins are grouped into different families in which each has a structural prototype, namely, the fold, and some share the same folds. However, the total number of folds and families, and furthermore, the distribution of folds over families in nature, are still an enigma. Here, we report a study on the distribution of folds over families and the total number of folds in nature, using a maximum probability principle and the moment method of estimation. A quadratic relation between the numbers of families and folds is found for the number of families in an interval from 6000 to 30,000. For example, about 2700 folds for 23,100 families are obtained, among them about 33 superfolds, including more than 100 families each, and the largest superfold comprises about 800 families. Our results suggest that although the majority of folds have only a single family per fold, a considerably larger number of folds include many more families each than in the database, and the distribution of folds over families in nature differs markedly from the sampled distribution. The long tail of fold distribution is first estimated in this article. The results fit the data for different versions of the structural classification of proteins (SCOP) excellently, and the goodness-of-fit tests strongly support the results. In addition, the method of directly "enlarging" the sample to the population may be useful in inferring distributions of species in different fields.
Authors:
Xinsheng Liu; Ke Fan; Wei Wang
Publication Detail:
Type:  Journal Article; Research Support, U.S. Gov't, Non-P.H.S.    
Journal Detail:
Title:  Proteins     Volume:  54     ISSN:  1097-0134     ISO Abbreviation:  Proteins     Publication Date:  2004 Feb 
Date Detail:
Created Date:  2004-01-28     Completed Date:  2004-02-26     Revised Date:  2006-11-15    
Medline Journal Info:
Nlm Unique ID:  8700181     Medline TA:  Proteins     Country:  United States    
Other Details:
Languages:  eng     Pagination:  491-9     Citation Subset:  IM    
Copyright Information:
Copyright 2003 Wiley-Liss, Inc.
Affiliation:
National Lab of Solid State Microstructure, Department of Physics and Institute of Biophysics, Nanjing University, Nanjing, China.
MeSH Terms
Descriptor/Qualifier:
Databases, Protein
Nature
Probability
Protein Conformation
Protein Folding*
Proteins / chemistry*,  classification*
Chemical
Reg. No./Substance:
0/Proteins

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

