Machine learning methods for predictive proteomics.
MedLine Citation:
PMID:  18310105     Owner:  NLM     Status:  MEDLINE    
The search for predictive biomarkers of disease in high-throughput mass spectrometry (MS) data requires a complex analysis path. Preprocessing and machine-learning modules are pipelined, starting from raw spectra, to build a predictive classifier based on a shortlist of candidate features. As a machine-learning problem, proteomic profiling on MS data calls for the same caution as microarray analysis. The risk of overfitting and of selection bias is pervasive: not only can candidate features easily outnumber samples by a factor of 10³, but information-leakage effects during preprocessing from spectra to peaks are also easy to neglect. The aim of this review is to explain how to build a general-purpose design analysis protocol (DAP) for predictive proteomic profiling: we show how to limit leakage due to parameter tuning and how to organize classification and ranking on a large number of replicate versions of the original data to avoid selection bias. The DAP can be used with alternative components, i.e. with different preprocessing methods (peak clustering or wavelet-based), classifiers (e.g. Support Vector Machines, SVM) or feature-ranking methods (recursive feature elimination or I-Relief). A procedure for assessing the stability and predictive value of the resulting biomarker list is also provided. The approach is exemplified with experiments on synthetic datasets (from the Cromwell MS simulator) and on publicly available datasets from cancer studies.
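A minimal sketch of the selection-bias point above, in standard-library Python: feature ranking is repeated inside every replicate train/test split using training samples only, and the fraction of replicates in which each feature enters the top-k list serves as a simple stability score. This is not the authors' DAP. A signal-to-noise ranker stands in for RFE or I-Relief, a nearest-centroid rule stands in for an SVM, Gaussian toy data stand in for Cromwell-simulated spectra, and all function names (`replicated_dap`, `make_data`, and so on) are hypothetical.

```python
import random
from collections import Counter

# Hypothetical sketch: names, ranker, classifier and data are illustrative
# stand-ins, not the DAP components described in the review.


def _mean(v):
    return sum(v) / len(v)


def _sd(v):
    m = _mean(v)
    return (sum((a - m) ** 2 for a in v) / len(v)) ** 0.5


def rank_features(X, y):
    """Rank features best-first by |mu0 - mu1| / (sd0 + sd1) (signal-to-noise)."""
    scores = []
    for j in range(len(X[0])):
        a = [x[j] for x, lab in zip(X, y) if lab == 0]
        b = [x[j] for x, lab in zip(X, y) if lab == 1]
        scores.append(abs(_mean(a) - _mean(b)) / (_sd(a) + _sd(b) + 1e-12))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def centroid_error(Xtr, ytr, Xte, yte, feats):
    """Fit a nearest-centroid rule on `feats` (training data), return test error."""
    cents = {c: [_mean([x[j] for x, lab in zip(Xtr, ytr) if lab == c]) for j in feats]
             for c in (0, 1)}

    def predict(x):
        return min((0, 1), key=lambda c: sum((x[j] - m) ** 2
                                              for j, m in zip(feats, cents[c])))

    return sum(predict(x) != lab for x, lab in zip(Xte, yte)) / len(yte)


def replicated_dap(X, y, k=5, n_reps=30, test_frac=0.3, seed=0, rank_inside=True):
    """Mean test error over stratified random splits, plus feature stability.

    rank_inside=True ranks features on each training split only (no leakage);
    rank_inside=False ranks once on ALL samples first (the leaky protocol).
    Stability of feature j = fraction of replicates where j is in the top k.
    """
    rng = random.Random(seed)
    by_class = [[i for i, lab in enumerate(y) if lab == c] for c in (0, 1)]
    leaky_top = None if rank_inside else rank_features(X, y)[:k]
    errors, counts = [], Counter()
    for _ in range(n_reps):
        test = set()
        for idx in by_class:  # stratified: same test fraction from each class
            test.update(rng.sample(idx, max(1, int(len(idx) * test_frac))))
        tr = [i for i in range(len(y)) if i not in test]
        te = sorted(test)
        Xtr, ytr = [X[i] for i in tr], [y[i] for i in tr]
        feats = rank_features(Xtr, ytr)[:k] if rank_inside else leaky_top
        counts.update(feats)
        errors.append(centroid_error(Xtr, ytr, [X[i] for i in te],
                                     [y[i] for i in te], feats))
    return _mean(errors), {j: c / n_reps for j, c in counts.items()}


def make_data(n_per_class, n_feat, informative=(), shift=2.0, seed=1):
    """Gaussian toy data; class 1 is shifted on the `informative` features."""
    rng = random.Random(seed)
    X, y = [], []
    for c in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_feat)]
            if c == 1:
                for j in informative:
                    row[j] += shift
            X.append(row)
            y.append(c)
    return X, y


if __name__ == "__main__":
    Xn, yn = make_data(20, 500)  # pure noise: the true error rate is 0.5
    print("ranked inside splits:", replicated_dap(Xn, yn)[0])
    print("ranked on all data:  ", replicated_dap(Xn, yn, rank_inside=False)[0])
```

On pure-noise data, where features outnumber samples as in the MS setting, ranking inside each split should give an error estimate near 0.5, while ranking once on all samples before splitting typically reports a much lower error even though no real signal exists. That gap is the selection bias the protocol is designed to avoid.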
Annalisa Barla; Giuseppe Jurman; Samantha Riccadonna; Stefano Merler; Marco Chierici; Cesare Furlanello
Publication Detail:
Type:  Journal Article; Research Support, Non-U.S. Gov't; Review     Date:  2008-02-29
Journal Detail:
Title:  Briefings in bioinformatics     Volume:  9     ISSN:  1477-4054     ISO Abbreviation:  Brief. Bioinformatics     Publication Date:  2008 Mar 
Date Detail:
Created Date:  2008-03-31     Completed Date:  2008-05-27     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  100912837     Medline TA:  Brief Bioinform     Country:  England    
Other Details:
Languages:  eng     Pagination:  119-28     Citation Subset:  IM    
FBK, via Sommarive 18, I-38100 Povo (Trento), Italy.
MeSH Terms
Area Under Curve
Artificial Intelligence*
Biological Markers / analysis*
Gene Expression Profiling
Mass Spectrometry* / instrumentation,  methods
Microarray Analysis
Pattern Recognition, Automated* / methods
Proteomics* / instrumentation,  methods
Signal Processing, Computer-Assisted
Reg. No./Substance:
0/Biological Markers

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine