Identification of misspelled words without a comprehensive dictionary using prevalence analysis.
MedLine Citation:
PMID:  18693937     Owner:  NLM     Status:  MEDLINE    
Abstract/OtherAbstract:
Misspellings are common in medical documents and can be an obstacle to information retrieval. We evaluated an algorithm that identifies misspelled words by analyzing their prevalence in a representative body of text. We assessed the algorithm's accuracy in identifying misspellings of 200 anti-hypertensive medication names among 2,000 potentially misspelled words randomly selected from narrative medical documents. For each word, the software computed a prevalence ratio (the frequency of the potentially misspelled word divided by the frequency of the non-misspelled word) in physician notes. The software results were compared with manual assessment by an independent reviewer. The area under the ROC curve for identification of misspelled words was 0.96. At the prevalence ratio threshold with the highest F-measure (threshold 0.32768, F-measure 0.903), sensitivity, specificity, and positive predictive value were 99.25%, 89.72%, and 82.9%, respectively. Prevalence analysis can be used to identify and correct misspellings with high accuracy.
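The prevalence ratio defined in the abstract can be illustrated with a minimal Python sketch. This is not the authors' software: the function names, the toy token list, and the decision rule (treating a candidate as a likely misspelling when its ratio is at or below the threshold) are assumptions made for illustration. Only the ratio definition, the threshold value 0.32768, and the reported sensitivity and positive predictive value come from the abstract. The last helper uses the standard F-measure formula (harmonic mean of precision and recall) to check that the reported figures are mutually consistent.

```python
from collections import Counter


def prevalence_ratio(candidate, reference, counts):
    """Frequency of the candidate word divided by the frequency of the
    correctly spelled reference word, as defined in the abstract."""
    ref_count = counts[reference]
    if ref_count == 0:
        raise ValueError(f"reference word {reference!r} not found in corpus")
    return counts[candidate] / ref_count


def is_likely_misspelling(candidate, reference, counts, threshold=0.32768):
    """ASSUMPTION: a rare variant (ratio at or below the threshold) is
    flagged as a misspelling. The abstract reports the threshold value
    but does not spell out the direction of the comparison."""
    return prevalence_ratio(candidate, reference, counts) <= threshold


def f_measure(precision, recall):
    """Standard F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy corpus (hypothetical counts, not data from the study).
tokens = ["lisinopril"] * 50 + ["lisinipril"] * 2 + ["metoprolol"] * 30
counts = Counter(tokens)

ratio = prevalence_ratio("lisinipril", "lisinopril", counts)
flagged = is_likely_misspelling("lisinipril", "lisinopril", counts)

# Consistency check using the reported PPV (precision) and sensitivity (recall).
reported_f = f_measure(precision=0.829, recall=0.9925)
```

In this toy corpus the variant "lisinipril" occurs 2 times against 50 occurrences of "lisinopril", giving a ratio of 0.04, well below the reported threshold. Plugging the reported positive predictive value (82.9%) and sensitivity (99.25%) into the F-measure formula yields approximately 0.903, matching the figure in the abstract.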
Authors:
Alexander Turchin; Julia T Chu; Maria Shubina; Jonathan S Einbinder
Publication Detail:
Type:  Evaluation Studies; Journal Article; Research Support, Non-U.S. Gov't     Date:  2007-10-11
Journal Detail:
Title:  AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium     Volume:  -     ISSN:  1942-597X     ISO Abbreviation:  AMIA Annu Symp Proc     Publication Date:  2007  
Date Detail:
Created Date:  2008-08-12     Completed Date:  2008-11-18     Revised Date:  2010-09-21    
Medline Journal Info:
Nlm Unique ID:  101209213     Medline TA:  AMIA Annu Symp Proc     Country:  United States    
Other Details:
Languages:  eng     Pagination:  751-5     Citation Subset:  IM    
Affiliation:
Partners HealthCare, Boston, MA, USA.
MeSH Terms
Descriptor/Qualifier:
Algorithms*
Antihypertensive Agents
Dictionaries as Topic
Humans
Medical Records
Natural Language Processing*
ROC Curve
Software*
Chemical
Reg. No./Substance:
0/Antihypertensive Agents

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

