Building a text corpus for representing the variety of medical language.
MedLine Citation:
PMID:  11604751     Owner:  NLM     Status:  MEDLINE    
Medical language processing has focused until recently on a few types of textual documents. However, a much larger variety of document types is used in different settings. It has been shown that Natural Language Processing (NLP) tools can exhibit very different behavior on different types of texts. Without better-informed knowledge about the differential performance of NLP tools on a variety of medical text types, it will be difficult to control the extension of their application to different medical documents. We endeavored to provide a basis for such informed assessment: the construction of a large corpus of medical text samples. We propose a framework for designing such a corpus: a set of descriptive dimensions and a standardized encoding of both meta-information (implementing these dimensions) and content. We present a proof-of-concept demonstration by encoding an initial corpus of text samples according to these principles.
P Zweigenbaum; P Jacquemart; N Grabar; B Habert
Publication Detail:
Type:  Journal Article; Research Support, Non-U.S. Gov't    
Journal Detail:
Title:  Studies in health technology and informatics     Volume:  84     ISSN:  0926-9630     ISO Abbreviation:  Stud Health Technol Inform     Publication Date:  2001  
Date Detail:
Created Date:  2001-10-17     Completed Date:  2002-01-08     Revised Date:  2008-07-10    
Medline Journal Info:
Nlm Unique ID:  9214582     Medline TA:  Stud Health Technol Inform     Country:  Netherlands    
Other Details:
Languages:  eng     Pagination:  290-4     Citation Subset:  IM    
DIAM Service d'Informatique Médicale, Assistance Publique, Hôpitaux de Paris, Département de Biomathématiques, Université Paris 75634 Paris Cedex 13, France.
MeSH Terms
Medical Records
Natural Language Processing*
Patient Discharge
Reference Books, Medical
Textbooks as Topic

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine
