Machine learning methods for predictive proteomics.
MedLine Citation:
PMID: 18310105 Owner: NLM Status: MEDLINE
Abstract/OtherAbstract:
The search for predictive biomarkers of disease from high-throughput mass spectrometry (MS) data requires a complex analysis path. Preprocessing and machine-learning modules are pipelined, starting from raw spectra, to set up a predictive classifier based on a shortlist of candidate features. As a machine-learning problem, proteomic profiling on MS data requires the same caution as the microarray case. The risk of overfitting and of selection bias effects is pervasive: not only can potential features easily outnumber samples by a factor of 10^3, but it is also easy to neglect information-leakage effects during preprocessing from spectra to peaks. The aim of this review is to explain how to build a general-purpose design analysis protocol (DAP) for predictive proteomic profiling: we show how to limit leakage due to parameter tuning and how to organize classification and ranking on large numbers of replicate versions of the original data to avoid selection bias. The DAP can be used with alternative components, i.e. with different preprocessing methods (peak clustering or wavelet based), classifiers (e.g. Support Vector Machine, SVM) or feature ranking methods (recursive feature elimination or I-Relief). A procedure for assessing the stability and predictive value of the resulting biomarker list is also provided. The approach is exemplified with experiments on synthetic datasets (from the Cromwell MS simulator) and with publicly available datasets from cancer studies.
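The selection-bias effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' DAP: a nearest-centroid classifier and a class-mean-gap ranking stand in for the SVM and the RFE/I-Relief rankers, and all names (`make_data`, `rank_features`, `centroid_predict`, `resampled_accuracy`) and parameter values are hypothetical. The data are pure noise, so an honest protocol should report accuracy near chance; ranking features once on all samples before resampling typically reports a much higher, misleading figure.

```python
import random
import statistics


def make_data(n_samples, n_features, rng):
    """Pure-noise data: labels are independent of every feature."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced classes 0/1
    return X, y


def rank_features(X, y, k):
    """Indices of the k features with the largest gap between class means."""
    def gap(j):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        return abs(statistics.fmean(a) - statistics.fmean(b))
    return sorted(range(len(X[0])), key=gap, reverse=True)[:k]


def centroid_predict(X_train, y_train, x, feats):
    """Nearest-centroid prediction restricted to the selected features."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        dist = sum(
            (x[j] - statistics.fmean([r[j] for r in rows])) ** 2
            for j in feats
        )
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def resampled_accuracy(X, y, k, n_splits, rng, select_inside):
    """Mean held-out accuracy over random 75/25 train/test splits."""
    n = len(X)
    # Ranking on ALL samples lets test labels leak into feature selection.
    global_feats = None if select_inside else rank_features(X, y, k)
    accs = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[: n // 4], idx[n // 4:]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        # Honest variant: rank using the training portion only.
        feats = rank_features(X_tr, y_tr, k) if select_inside else global_feats
        hits = sum(
            centroid_predict(X_tr, y_tr, X[i], feats) == y[i] for i in test
        )
        accs.append(hits / len(test))
    return statistics.fmean(accs)


rng = random.Random(0)
X, y = make_data(40, 500, rng)  # far more features than samples
biased = resampled_accuracy(X, y, k=10, n_splits=30, rng=rng,
                            select_inside=False)
unbiased = resampled_accuracy(X, y, k=10, n_splits=30, rng=rng,
                              select_inside=True)
print(f"ranking outside resampling: {biased:.2f}")
print(f"ranking inside resampling:  {unbiased:.2f}")
```

The only difference between the two runs is where the ranking step sits relative to the train/test split. The same reasoning would apply to any preprocessing or parameter-tuning step that looks at the held-out samples.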
Authors:
Annalisa Barla; Giuseppe Jurman; Samantha Riccadonna; Stefano Merler; Marco Chierici; Cesare Furlanello
Publication Detail:
Type: Journal Article; Research Support, Non-U.S. Gov't; Review
Date: 2008-02-29
Journal Detail:
Title: Briefings in bioinformatics
Volume: 9
ISSN: 1477-4054
ISO Abbreviation: Brief. Bioinformatics
Publication Date: 2008 Mar
Date Detail:
Created Date: 2008-03-31
Completed Date: 2008-05-27
Revised Date: -
Medline Journal Info:
Nlm Unique ID: 100912837
Medline TA: Brief Bioinform
Country: England
Other Details:
Languages: eng
Pagination: 119-28
Citation Subset: IM
Affiliation:
FBK, via Sommarive 18, I-38100 Povo (Trento), Italy.
MeSH Terms
Descriptor/Qualifier:
Algorithms
Animals
Area Under Curve
Artificial Intelligence*
Biological Markers / analysis*
Gene Expression Profiling
Humans
Mass Spectrometry* / instrumentation, methods
Microarray Analysis
Pattern Recognition, Automated* / methods
Proteomics* / instrumentation, methods
Signal Processing, Computer-Assisted
Chemical
Reg. No./Substance:
0/Biological Markers
From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine