

Development of a medical-text parsing algorithm based on character adjacent probability distribution for Japanese radiology reports.
MedLine Citation:
PMID:  19057808     Owner:  NLM     Status:  MEDLINE    
Abstract/OtherAbstract:
OBJECTIVES: The objectives of this study were to investigate the transitional probability distribution of medical term boundaries between characters and to develop a parsing algorithm specifically for medical texts.
METHODS: Medical terms in Japanese computed tomography (CT) reports were identified using the ChaSen morphological analysis system. MeSH-based medical terms (51,385 entries), obtained from the Metathesaurus of the Unified Medical Language System (UMLS, 2005AA), were added as a medical dictionary for ChaSen. A radiographer corrected the parsing results for a set of 300 CT reports. In addition, two radiologists checked the medical term parsing of 200 CT sentences.
RESULTS: We obtained modified inter-annotator agreement scores for the text corrected by the radiologists. We retrieved the transitional probability as the conditional probability of a uni-gram, bi-gram, and tri-gram. The highest transitional probability, P(C_i | C_{i-2} C_{i-1}), was 1.00. For example, the anatomical-location term "pulmonary hilum" was parsed as a tri-gram.
CONCLUSIONS: Retrieval of transitional probability will improve the accuracy of parsing compound medical terms.
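To illustrate the kind of quantity the abstract refers to, the following is a minimal sketch of a maximum-likelihood estimate of the conditional probability of a character given the characters that precede it, P(C_i | C_{i-2} C_{i-1}) in the tri-gram case. It is not the authors' algorithm: the function name `transition_probabilities`, the parameter `n`, and the toy input strings are all assumptions made for illustration. It also does not model term boundaries, ChaSen output, or the UMLS dictionary.

```python
from collections import Counter


def transition_probabilities(texts, n=3):
    """Estimate P(c_i | previous n-1 characters) from raw character counts.

    Hypothetical helper for illustration only; not taken from the paper.
    Returns a dict mapping each observed n-gram to the relative frequency
    of its last character given its first n-1 characters.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    ngram_counts = Counter()
    context_counts = Counter()
    for text in texts:
        # Slide a window of n characters over each text.
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            ngram_counts[gram] += 1
            # The context is the n-gram without its final character.
            context_counts[gram[:-1]] += 1
    return {
        gram: count / context_counts[gram[:-1]]
        for gram, count in ngram_counts.items()
    }


# Toy strings standing in for report text (made up, not study data).
toy_texts = ["abcab", "abd"]
trigram_probs = transition_probabilities(toy_texts, n=3)
unigram_probs = transition_probabilities(toy_texts, n=1)
print(trigram_probs["abc"])  # 0.5: "ab" is followed by "c" once and "d" once
print(trigram_probs["bca"])  # 1.0: "bc" is always followed by "a"
```

A probability of 1.00, as reported for the highest tri-gram value, means that in the counted text a given two-character context was always followed by the same character. With `n=1` the context is empty, so the result reduces to plain relative character frequencies.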
Authors:
N Nishimoto; S Terae; M Uesugi; K Ogasawara; T Sakurai
Publication Detail:
Type:  Journal Article    
Journal Detail:
Title:  Methods of information in medicine     Volume:  47     ISSN:  0026-1270     ISO Abbreviation:  Methods Inf Med     Publication Date:  2008  
Date Detail:
Created Date:  2008-12-05     Completed Date:  2009-02-06     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  0210453     Medline TA:  Methods Inf Med     Country:  Germany    
Other Details:
Languages:  eng     Pagination:  513-21     Citation Subset:  IM    
Affiliation:
Department of Medical Informatics, Graduate School of Medicine, Hokkaido University, Sapporo, Hokkaido, Japan. nishimot@med.hokudai.ac.jp
MeSH Terms
Descriptor/Qualifier:
Access to Information
Algorithms*
Humans
Japan
Markov Chains
Medical Informatics / methods,  organization & administration*
Models, Statistical
Models, Theoretical
Natural Language Processing*
Probability*
Radiology / methods*,  organization & administration
Terminology as Topic*
Tomography, X-Ray Computed*
Unified Medical Language System*

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

