Screening nonrandomized studies for medical systematic reviews: a comparative study of classifiers.
MedLine Citation:
PMID:  22677493     Owner:  NLM     Status:  MEDLINE    
OBJECTIVES: To investigate whether (1) machine learning classifiers can help identify nonrandomized studies eligible for full-text screening by systematic reviewers; (2) classifier performance varies with optimization; and (3) the number of citations to screen can be reduced.
METHODS: We used an open-source data-mining suite to process and classify biomedical citations that point mostly to nonrandomized studies from 2 systematic reviews. We built training and test sets for citation portions and compared classifier performance by considering the value of indexing, various feature sets, and optimization. We conducted our experiments in 2 phases. The design of phase I, with no optimization, was: 4 classifiers × 3 feature sets × 3 citation portions. Classifiers included k-nearest neighbor, naïve Bayes, complement naïve Bayes, and evolutionary support vector machine. Feature sets included bag of words and 2- and 3-term n-grams. Citation portions included titles, titles and abstracts, and full citations with metadata. Phase II, with optimization, involved a subset of the classifiers, as well as features extracted from full citations and from full citations with overweighted titles. We optimized features and classifier parameters by manually setting information gain thresholds outside of a process for iterative grid optimization with 10-fold cross-validation. We independently tested models on data reserved for that purpose and statistically compared classifier performance on 2 types of feature sets. We estimated the number of citations reviewers would need to screen during a second pass through a reduced set of citations.
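The phase I design described above (4 classifiers × 3 feature sets × 3 citation portions) can be enumerated as a simple grid. The following is a minimal sketch that uses only the labels given in the abstract; the variable names are illustrative assumptions and do not come from the authors' data-mining suite.

```python
from itertools import product

# Labels taken from the abstract; names of the variables are hypothetical.
classifiers = [
    "k-nearest neighbor",
    "naive Bayes",
    "complement naive Bayes",
    "evolutionary support vector machine",
]
feature_sets = ["bag of words", "2-term n-grams", "3-term n-grams"]
citation_portions = [
    "titles",
    "titles and abstracts",
    "full citations with metadata",
]

# Every combination of classifier, feature set, and citation portion.
phase1_runs = list(product(classifiers, feature_sets, citation_portions))

print(len(phase1_runs))  # 4 * 3 * 3 = 36 configurations
```

Each tuple in `phase1_runs` names one unoptimized configuration to train and evaluate.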
RESULTS: In phase I, the evolutionary support vector machine returned the best recall for bag of words extracted from full citations; the best classifier with respect to overall performance was k-nearest neighbor. No classifier attained good enough recall for this task without optimization. In phase II, we boosted performance with optimization for evolutionary support vector machine and complement naïve Bayes classifiers. Generalization performance was better for the latter in the independent tests. For evolutionary support vector machine and complement naïve Bayes classifiers, the initial retrieval set was reduced by 46% and 35%, respectively.
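To make the reported reductions concrete, the sketch below applies the 46% and 35% figures to an initial retrieval set. The set size of 1000 citations is a hypothetical value chosen for illustration only (the abstract does not state the actual size), and the function name is likewise an assumption.

```python
def citations_remaining(initial_count: int, reduction_percent: int) -> int:
    """Citations left to screen after the set is reduced by a percentage.

    Integer arithmetic is used to avoid floating-point rounding surprises.
    """
    return initial_count * (100 - reduction_percent) // 100

# Hypothetical initial retrieval set size; not reported in the abstract.
initial_count = 1000

# Reduction percentages as reported in the abstract.
esvm_remaining = citations_remaining(initial_count, 46)
cnb_remaining = citations_remaining(initial_count, 35)

print(esvm_remaining)  # 540
print(cnb_remaining)   # 650
```

Under this assumed set size, reviewers would screen 540 citations after the evolutionary support vector machine and 650 after complement naïve Bayes, instead of 1000.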
CONCLUSIONS: Machine learning classifiers can help identify nonrandomized studies eligible for full-text screening by systematic reviewers. Optimization can markedly improve performance of classifiers. However, generalizability varies with the classifier. The number of citations to screen during a second independent pass through the citations can be substantially reduced.
Tanja Bekhuis; Dina Demner-Fushman
Publication Detail:
Type:  Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't     Date:  2012-06-05
Journal Detail:
Title:  Artificial intelligence in medicine     Volume:  55     ISSN:  1873-2860     ISO Abbreviation:  Artif Intell Med     Publication Date:  2012 Jul 
Date Detail:
Created Date:  2012-07-10     Completed Date:  2012-10-29     Revised Date:  2013-07-12    
Medline Journal Info:
Nlm Unique ID:  8915031     Medline TA:  Artif Intell Med     Country:  Netherlands    
Other Details:
Languages:  eng     Pagination:  197-207     Citation Subset:  IM    
Copyright Information:
Copyright © 2012 Elsevier B.V. All rights reserved.
Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15232, USA.
MeSH Terms
Artificial Intelligence*
Bayes Theorem
Biomedical Research / classification*
Data Mining / methods*
Medical Informatics
Review Literature as Topic*
Support Vector Machines*
From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine