An exhaustive, non-euclidean, non-parametric data mining tool for unraveling the complexity of biological systems--novel insights into malaria.
MedLine Citation:
PMID:  21931645     Owner:  NLM     Status:  MEDLINE    
Abstract/OtherAbstract:
Complex, high-dimensional data sets pose significant analytical challenges in the post-genomic era. Such data sets are not exclusive to genetic analyses and are also pertinent to epidemiology. There has been considerable effort to develop hypothesis-free data mining and machine learning methodologies. However, current methodologies lack exhaustivity and general applicability. Here we use a novel non-parametric, non-euclidean data mining tool, HyperCube®, to explore exhaustively a complex epidemiological malaria data set by searching for over density of events in m-dimensional space. Hotspots of over density correspond to strings of variables, rules, that determine, in this case, the occurrence of Plasmodium falciparum clinical malaria episodes. The data set contained 46,837 outcome events from 1,653 individuals and 34 explanatory variables. The best predictive rule contained 1,689 events from 148 individuals and was defined as: individuals present during 1992-2003, aged 1-5 years old, having hemoglobin AA, and having had previous Plasmodium malariae malaria parasite infection ≤10 times. These individuals had 3.71 times more P. falciparum clinical malaria episodes than the general population. We validated the rule in two different cohorts. We compared and contrasted the HyperCube® rule with the rules using variables identified by both traditional statistical methods and non-parametric regression tree methods. In addition, we tried all possible sub-stratified quantitative variables. No other model with equal or greater representativity gave a higher Relative Risk. Although three of the four variables in the rule were intuitive, the effect of number of P. malariae episodes was not. HyperCube® efficiently sub-stratified quantitative variables to optimize the rule and was able to identify interactions among the variables, tasks not easy to perform using standard data mining methods. 
Search of local over density in m-dimensional space, explained by easily interpretable rules, is thus seemingly ideal for generating hypotheses for large datasets to unravel the complexity inherent in biological systems.
Authors:
Cheikh Loucoubar; Richard Paul; Avner Bar-Hen; Augustin Huret; Adama Tall; Cheikh Sokhna; Jean-François Trape; Alioune Badara Ly; Joseph Faye; Abdoulaye Badiane; Gaoussou Diakhaby; Fatoumata Diène Sarr; Aliou Diop; Anavaj Sakuntabhai; Jean-François Bureau
Publication Detail:
Type:  Journal Article; Research Support, Non-U.S. Gov't     Date:  2011-09-09
Journal Detail:
Title:  PloS one     Volume:  6     ISSN:  1932-6203     ISO Abbreviation:  PLoS ONE     Publication Date:  2011  
Date Detail:
Created Date:  2011-09-20     Completed Date:  2012-03-01     Revised Date:  2013-06-27    
Medline Journal Info:
Nlm Unique ID:  101285081     Medline TA:  PLoS One     Country:  United States    
Other Details:
Languages:  eng     Pagination:  e24085     Citation Subset:  IM    
Affiliation:
Institut Pasteur, Unité de Pathogénie Virale, Paris, France.
MeSH Terms
Descriptor/Qualifier:
ABO Blood-Group System / genetics
Algorithms*
Child
Child, Preschool
Data Mining / methods*
Female
Glucosephosphate Dehydrogenase / genetics
Humans
Infant
Logistic Models
Malaria / epidemiology*,  genetics,  parasitology*
Male
Multivariate Analysis
Mutation
Plasmodium falciparum / isolation & purification
Plasmodium malariae / isolation & purification
Polymorphism, Genetic
Prognosis
Reproducibility of Results
Risk Assessment / methods
Risk Factors
Chemical
Reg. No./Substance:
0/ABO Blood-Group System; EC 1.1.1.49/Glucosephosphate Dehydrogenase
Comments/Corrections
Erratum In:
PLoS One. 2011;6(10). doi:10.1371/annotation/654e34ce-f1cd-4207-b2ac-ebc873b821e9

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

