Document Detail

Prediction of protein subcellular locations using fuzzy k-NN method.
MedLine Citation:
PMID:  14693804     Owner:  NLM     Status:  MEDLINE    
MOTIVATION: Protein localization data are a valuable resource for elucidating protein function. It is highly desirable to predict a protein's subcellular location automatically from its sequence. RESULTS: In this paper, the fuzzy k-nearest neighbors (k-NN) algorithm is introduced to predict proteins' subcellular locations from their dipeptide composition. The prediction is performed on a new data set derived from version 41.0 of the SWISS-PROT databank, and an overall predictive accuracy of about 80% is achieved in a jackknife test. The result demonstrates the applicability of this relatively simple method and the possibility of improving prediction accuracy for protein subcellular locations. We also applied this method to annotate six entirely sequenced proteomes, namely Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Oryza sativa, Arabidopsis thaliana and a subset of all human proteins. AVAILABILITY: Supplementary information and subcellular location annotations for eukaryotes are available at
Ying Huang; Yanda Li
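
Illustrative sketch (not the authors' code): the abstract names two ingredients, a dipeptide-composition feature vector and a fuzzy k-NN classifier. The Python below shows one common way these fit together, using the standard fuzzy k-NN membership rule with crisp training labels. The function names, the Euclidean distance, the values of k and the fuzzifier m, and the toy sequences and labels are all assumptions made for illustration; the abstract does not state the settings actually used.

```python
import math
from collections import defaultdict
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered residue pairs, in a fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(seq):
    """Return the 400-dimensional vector of dipeptide frequencies of seq.

    Pairs containing non-standard residue codes are skipped (an assumption).
    """
    counts = [0.0] * len(DIPEPTIDES)
    total = 0
    for a, b in zip(seq, seq[1:]):
        i = INDEX.get(a + b)
        if i is not None:
            counts[i] += 1.0
            total += 1
    return [c / total for c in counts] if total else counts


def fuzzy_knn(query, train, k=3, m=2.0, eps=1e-9):
    """Return {label: membership} for query from its k nearest neighbours.

    train is a list of (vector, label) pairs with crisp labels. Each
    neighbour votes with weight 1 / d**(2 / (m - 1)); memberships are
    normalised to sum to 1. k, m and eps are illustrative defaults.
    """
    nearest = sorted(
        ((math.dist(query, vec), label) for vec, label in train),
        key=lambda pair: pair[0],
    )[:k]
    weights = defaultdict(float)
    for dist, label in nearest:
        weights[label] += 1.0 / (dist ** (2.0 / (m - 1.0)) + eps)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Toy data only: made-up sequences and labels, not taken from the paper.
toy_train = [
    (dipeptide_composition("AAAAAA"), "location_A"),
    (dipeptide_composition("AAAAAK"), "location_A"),
    (dipeptide_composition("KKKKKK"), "location_B"),
]
toy_query = dipeptide_composition("AAAAAAK")
memberships = fuzzy_knn(toy_query, toy_train, k=3)
predicted = max(memberships, key=memberships.get)
```

Unlike plain k-NN, the result is a graded membership for each candidate location rather than a single vote count, so a low maximum membership can be read as an uncertain prediction.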
Publication Detail:
Type:  Comparative Study; Evaluation Studies; Journal Article; Research Support, Non-U.S. Gov't; Validation Studies    
Journal Detail:
Title:  Bioinformatics (Oxford, England)     Volume:  20     ISSN:  1367-4803     ISO Abbreviation:  Bioinformatics     Publication Date:  2004 Jan 
Date Detail:
Created Date:  2003-12-24     Completed Date:  2004-05-19     Revised Date:  2006-11-15    
Medline Journal Info:
Nlm Unique ID:  9808944     Medline TA:  Bioinformatics     Country:  England    
Other Details:
Languages:  eng     Pagination:  21-8     Citation Subset:  IM    
State Key Laboratory of Intelligent Technology and Systems, Department of Automation, Institute of Bioinformatics, Tsinghua University, Beijing 100084, People's Republic of China.
MeSH Terms
Cellular Structures / chemistry*,  metabolism*
Databases, Protein
Fuzzy Logic*
Gene Expression Profiling / methods
Gene Expression Regulation / physiology
Proteome / chemistry*,  classification,  metabolism*
Reproducibility of Results
Sensitivity and Specificity
Sequence Alignment / methods*
Sequence Analysis, Protein / methods*
Species Specificity
Subcellular Fractions / chemistry,  metabolism
Tissue Distribution

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine
