Prediction of protein subcellular locations using fuzzy k-NN method.
MedLine Citation:
PMID:  14693804     Owner:  NLM     Status:  MEDLINE    
Abstract/OtherAbstract:
MOTIVATION: Protein localization data are a valuable resource for elucidating protein functions. It is highly desirable to predict a protein's subcellular locations automatically from its sequence. RESULTS: In this paper, the fuzzy k-nearest neighbors (k-NN) algorithm is introduced to predict proteins' subcellular locations from their dipeptide composition. The prediction is performed on a new data set derived from version 41.0 of the SWISS-PROT databank, and an overall predictive accuracy of about 80% is achieved in a jackknife test. The result demonstrates the applicability of this relatively simple method and the possibility of improving prediction accuracy for protein subcellular locations. We also applied this method to annotate six entirely sequenced proteomes, namely Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Oryza sativa, Arabidopsis thaliana and a subset of all human proteins. AVAILABILITY: Supplementary information and subcellular location annotations for eukaryotes are available at http://166.111.30.65/hying/fuzzy_loc.htm
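ILLUSTRATION: The approach named in the abstract can be sketched as follows. This is a minimal sketch and not the authors' implementation. The abstract gives no parameters, so the neighbor count k, the fuzzifier m, the Euclidean distance, the crisp training labels and all function names below are assumptions made for illustration; the membership weighting follows the commonly used fuzzy k-NN formulation, in which each neighbor votes with weight 1/d^(2/(m-1)). The jackknife test is taken to mean leave-one-out evaluation.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(seq):
    """Return the 400-dimensional dipeptide frequency vector of a sequence."""
    counts = [0.0] * len(DIPEPTIDES)
    total = 0
    for a, b in zip(seq, seq[1:]):
        idx = DIPEPTIDE_INDEX.get(a + b)
        if idx is not None:  # skip pairs containing non-standard residues
            counts[idx] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def euclidean(u, v):
    """Euclidean distance (an assumed choice of metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def fuzzy_knn_memberships(query, train_vectors, train_labels, k=3, m=2.0):
    """Class memberships of a query, assuming crisp training labels."""
    neighbors = sorted(
        (euclidean(query, vec), label)
        for vec, label in zip(train_vectors, train_labels)
    )[:k]
    eps = 1e-12  # avoids division by zero for identical vectors
    weights = {}
    for dist, label in neighbors:
        w = 1.0 / (dist ** (2.0 / (m - 1.0)) + eps)
        weights[label] = weights.get(label, 0.0) + w
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}


def predict(query, train_vectors, train_labels, k=3, m=2.0):
    """Assign the class with the highest fuzzy membership."""
    memberships = fuzzy_knn_memberships(query, train_vectors, train_labels, k, m)
    return max(memberships, key=memberships.get)


def jackknife_accuracy(vectors, labels, k=3, m=2.0):
    """Leave-one-out test: predict each sample from all the others."""
    correct = 0
    for i in range(len(vectors)):
        rest_v = vectors[:i] + vectors[i + 1:]
        rest_l = labels[:i] + labels[i + 1:]
        if predict(vectors[i], rest_v, rest_l, k, m) == labels[i]:
            correct += 1
    return correct / len(vectors)
```

In use, each protein sequence would first be mapped through dipeptide_composition, and the resulting vectors with their known location labels passed to jackknife_accuracy. The memberships returned by fuzzy_knn_memberships sum to 1 and can serve as a rough confidence for each candidate location.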
Authors:
Ying Huang; Yanda Li
Publication Detail:
Type:  Comparative Study; Evaluation Studies; Journal Article; Research Support, Non-U.S. Gov't; Validation Studies    
Journal Detail:
Title:  Bioinformatics (Oxford, England)     Volume:  20     ISSN:  1367-4803     ISO Abbreviation:  Bioinformatics     Publication Date:  2004 Jan 
Date Detail:
Created Date:  2003-12-24     Completed Date:  2004-05-19     Revised Date:  2006-11-15    
Medline Journal Info:
Nlm Unique ID:  9808944     Medline TA:  Bioinformatics     Country:  England    
Other Details:
Languages:  eng     Pagination:  21-28     Citation Subset:  IM    
Affiliation:
State Key Laboratory of Intelligent Technology and Systems, Department of Automation, Institute of Bioinformatics, Tsinghua University, Beijing 100084, People's Republic of China. hying99@mails.tsinghua.edu.cn
MeSH Terms
Descriptor/Qualifier:
Algorithms*
Animals
Cellular Structures / chemistry*,  metabolism*
Databases, Protein
Fuzzy Logic*
Gene Expression Profiling / methods
Gene Expression Regulation / physiology
Humans
Proteome / chemistry*,  classification,  metabolism*
Reproducibility of Results
Sensitivity and Specificity
Sequence Alignment / methods*
Sequence Analysis, Protein / methods*
Species Specificity
Subcellular Fractions / chemistry,  metabolism
Tissue Distribution
Chemical
Reg. No./Substance:
0/Proteome

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine