Document Detail


Searching protein three-dimensional structures in faster than linear time.
MedLine Citation:
PMID:  20426692     Owner:  NLM     Status:  MEDLINE    
Abstract/OtherAbstract:
Searching for similar structures from a three-dimensional (3-D) structure database of proteins is one of the most important problems in post-genomic computational biology. To compare two structures, we ordinarily use a measure called the root mean square deviation (RMSD) as the similarity measure. We consider a very fundamental problem of finding all the substructures whose RMSDs to the query are within some given threshold, from a 3-D structure database. The problem also appears in many other fields, such as computer vision and robotics. In this article, we propose the first algorithm that runs in faster than linear time on average. Our new algorithm runs in average-case O(m + N/m(1-epsilon)), where N is the database size, m is the query length, and epsilon is an arbitrary small constant such that 0 < epsilon < 1. It is a significant improvement over previous algorithms on the problem, considering that the best known worst-case time complexity of the problem is O(N log m), and the best known average-case (expected) time complexity of the problem was O(N).
Authors:
Tetsuo Shibuya
Publication Detail:
Type:  Journal Article; Research Support, Non-U.S. Gov't    
Journal Detail:
Title:  Journal of computational biology : a journal of computational molecular cell biology     Volume:  17     ISSN:  1557-8666     ISO Abbreviation:  J. Comput. Biol.     Publication Date:  2010 Apr 
Date Detail:
Created Date:  2010-04-29     Completed Date:  2010-09-22     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  9433358     Medline TA:  J Comput Biol     Country:  United States    
Other Details:
Languages:  eng     Pagination:  593-602     Citation Subset:  IM    
Affiliation:
Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan. tshibuya@hgc.jp
Export Citation:
APA/MLA Format     Download EndNote     Download BibTex
MeSH Terms
Descriptor/Qualifier:
Algorithms
Amino Acid Sequence
Computational Biology / methods*
Models, Molecular*
Proteins / chemistry*
Time Factors
Chemical
Reg. No./Substance:
0/Proteins

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine


Previous Document:  The power of detecting enriched patterns: an HMM approach.
Next Document:  A parallel algorithm for error correction in high-throughput short-read data on CUDA-enabled graphic...