
Machine learning methods for predictive proteomics.
MedLine Citation:
PMID:  18310105     Owner:  NLM     Status:  MEDLINE    
The search for predictive biomarkers of disease from high-throughput mass spectrometry (MS) data requires a complex analysis path. Preprocessing and machine-learning modules are pipelined, starting from raw spectra, to build a predictive classifier based on a shortlist of candidate features. As a machine-learning problem, proteomic profiling on MS data calls for the same caution as the microarray case. The risk of overfitting and of selection bias is pervasive: not only can potential features easily outnumber samples by a factor of 10^3, but information-leakage effects during preprocessing from spectra to peaks are also easy to neglect. The aim of this review is to explain how to build a general-purpose design analysis protocol (DAP) for predictive proteomic profiling: we show how to limit leakage due to parameter tuning and how to organize classification and ranking on large numbers of replicate versions of the original data to avoid selection bias. The DAP can be used with alternative components, i.e. with different preprocessing methods (peak clustering or wavelet-based), classifiers (e.g. Support Vector Machine, SVM) or feature-ranking methods (recursive feature elimination or I-Relief). A procedure for assessing the stability and predictive value of the resulting biomarker list is also provided. The approach is exemplified with experiments on synthetic datasets (from the Cromwell MS simulator) and on publicly available datasets from cancer studies.
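The core idea of running ranking and classification on many replicate versions of the data can be illustrated with a minimal sketch. It assumes already-extracted peak features (no spectral preprocessing), and it uses pure-Python stand-ins: a per-feature separation score instead of RFE or I-Relief, and a nearest-centroid rule instead of an SVM. All names (`make_data`, `rank_features`, `fit_centroids`, `dap`) and parameter values are hypothetical. What the sketch shows is that features are ranked and the classifier is fitted only on the training part of each replicate, so the held-out error is not biased by selection on the full dataset. The stability score used here, the fraction of replicates in which a feature enters the top-k list, is an assumed simplification.

```python
import random
import statistics
from collections import Counter


def make_data(n=40, p=200, informative=3, shift=3.0, seed=1):
    """Toy two-class data (hypothetical): only the first `informative` features carry signal."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(0.0, 1.0) + (shift if (y[i] == 1 and j < informative) else 0.0)
          for j in range(p)] for i in range(n)]
    return X, y


def stratified_split(y, frac, rng):
    """Random class-balanced train/test split: one replicate of the data."""
    train, test = [], []
    for c in (0, 1):
        idx = [i for i, v in enumerate(y) if v == c]
        rng.shuffle(idx)
        k = round(frac * len(idx))
        train += idx[:k]
        test += idx[k:]
    return train, test


def rank_features(X, y, idx):
    """Rank features using ONLY the rows in idx (the training part).

    Stand-in score |mean_1 - mean_0| / (sd_0 + sd_1); the review uses
    RFE or I-Relief instead.
    """
    scores = []
    for j in range(len(X[0])):
        a = [X[i][j] for i in idx if y[i] == 0]
        b = [X[i][j] for i in idx if y[i] == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-9
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / spread)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def fit_centroids(X, y, idx, feats):
    """Nearest-centroid rule on the selected features (stand-in for an SVM)."""
    cents = {c: [statistics.mean(X[i][j] for i in idx if y[i] == c) for j in feats]
             for c in (0, 1)}

    def predict(x):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(feats, cents[c]))
                for c in cents}
        return min(dist, key=dist.get)

    return predict


def dap(X, y, k=5, replicates=50, frac=2 / 3, seed=0):
    """Repeat split -> rank -> fit -> test; return mean error and list stability."""
    rng = random.Random(seed)
    errors, top_lists = [], []
    for _ in range(replicates):
        train, test = stratified_split(y, frac, rng)
        top = rank_features(X, y, train)[:k]   # test rows are never used for ranking
        predict = fit_centroids(X, y, train, top)
        errors.append(sum(predict(X[i]) != y[i] for i in test) / len(test))
        top_lists.append(top)
    # Stability: fraction of replicates in which each feature is in the top-k list.
    counts = Counter(j for lst in top_lists for j in lst)
    stability = {j: counts[j] / replicates for j in counts}
    return statistics.mean(errors), stability


X, y = make_data()
err, stab = dap(X, y)
most_stable = sorted(stab, key=stab.get, reverse=True)[:3]
print(f"mean held-out error: {err:.3f}; most stable features: {most_stable}")
```

Because each component is a separate function, the ranking or classifier stand-in could be replaced, for example by RFE or an SVM, without changing the replicate loop. This reflects the modular use of the DAP described above. In a full pipeline, any tuned preprocessing step (such as peak detection or alignment) would also need to run inside the loop on training spectra only, to avoid the leakage the abstract warns about.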
Annalisa Barla; Giuseppe Jurman; Samantha Riccadonna; Stefano Merler; Marco Chierici; Cesare Furlanello
Publication Detail:
Type:  Journal Article; Research Support, Non-U.S. Gov't; Review     Date:  2008-02-29
Journal Detail:
Title:  Briefings in bioinformatics     Volume:  9     ISSN:  1477-4054     ISO Abbreviation:  Brief. Bioinformatics     Publication Date:  2008 Mar 
Date Detail:
Created Date:  2008-03-31     Completed Date:  2008-05-27     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  100912837     Medline TA:  Brief Bioinform     Country:  England    
Other Details:
Languages:  eng     Pagination:  119-28     Citation Subset:  IM    
FBK, via Sommarive 18, I-38100 Povo (Trento), Italy.
MeSH Terms
Area Under Curve
Artificial Intelligence*
Biological Markers / analysis*
Gene Expression Profiling
Mass Spectrometry* / instrumentation, methods
Microarray Analysis
Pattern Recognition, Automated* / methods
Proteomics* / instrumentation, methods
Signal Processing, Computer-Assisted
Reg. No./Substance:
0/Biological Markers

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine
