IntPath--an integrated pathway gene relationship database for model organisms and important pathogens.
MedLine Citation:
PMID:  23282057     Owner:  NLM     Status:  MEDLINE    
Abstract/OtherAbstract:
BACKGROUND: Pathway data are important for understanding the relationships between genes, proteins and many other molecules in living organisms. Pathway gene relationships are crucial information for guidance, prediction, reference and assessment in biochemistry, computational biology, and medicine. Many well-established databases--e.g., KEGG, WikiPathways, and BioCyc--are dedicated to collecting pathway data for public access. However, the effectiveness of these databases is hindered by issues such as incompatible data formats, inconsistent molecular representations, inconsistent molecular relationship representations, inconsistent references to pathway names, and incomplete data across different databases.
RESULTS: In this paper, we overcome these issues through extraction, normalization and integration of pathway data from several major public databases (KEGG, WikiPathways, BioCyc, etc.). We build a database that not only hosts our integrated pathway gene relationship data for public access but also maintains the necessary updates in the long run. This public repository is named IntPath (Integrated Pathway gene relationship database for model organisms and important pathogens). Four organisms--S. cerevisiae, M. tuberculosis H37Rv, H. sapiens and M. musculus--are included in this version (V2.0) of IntPath. IntPath uses the "full unification" approach to ensure that no data are deleted and no noise is introduced in this process. Therefore, IntPath contains much richer pathway-gene and pathway-gene pair relationships and a much larger number of non-redundant genes and gene pairs than any of the single-source databases. The gene relationships of each gene per pathway (measured by average node degree) are significantly richer. The gene relationships in each pathway (measured by average number of gene pairs per pathway) are also considerably richer in the integrated pathways. Moderate manual curation is involved to remove errors and noise from the source data (e.g., the gene ID errors in WikiPathways and relationship errors in KEGG). We turn complicated and incompatible XML data formats and inconsistent gene and gene relationship representations from different source databases into normalized and unified pathway-gene and pathway-gene pair relationships neatly recorded in simple tab-delimited text format and MySQL tables, which facilitates convenient automatic computation and large-scale referencing in many related studies. IntPath data can be downloaded in text format or as a MySQL dump. IntPath data can also be retrieved and analyzed conveniently through a web service by local programs or through a web interface by mouse clicks. Several useful analysis tools are also provided in IntPath.
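The integration idea and the two richness measures named above can be illustrated with a minimal sketch. It is not IntPath's actual code: it assumes that pathway names and gene IDs have already been normalized across sources, that gene pairs are undirected, and that each source is a plain mapping from pathway name to a list of gene pairs. The function names and the example data are hypothetical.

```python
from collections import defaultdict


def unify(*sources):
    """Union the gene pairs of each pathway across sources, dropping nothing."""
    merged = defaultdict(set)
    for source in sources:
        for pathway, pairs in source.items():
            # Assumption: pairs are undirected, so (a, b) and (b, a) count once.
            merged[pathway].update(frozenset(pair) for pair in pairs)
    return dict(merged)


def average_node_degree(pairs):
    """Mean number of gene pairs each gene takes part in within one pathway."""
    degree = defaultdict(int)
    for pair in pairs:
        for gene in pair:
            degree[gene] += 1
    return sum(degree.values()) / len(degree) if degree else 0.0


def average_pairs_per_pathway(pathways):
    """Mean number of non-redundant gene pairs per pathway."""
    if not pathways:
        return 0.0
    return sum(len(pairs) for pairs in pathways.values()) / len(pathways)


# Hypothetical, already-normalized records from two source databases.
source_a = {"glycolysis": [("HK1", "GPI"), ("GPI", "PFKM")]}
source_b = {"glycolysis": [("GPI", "HK1"), ("PFKM", "ALDOA")]}

merged = unify(source_a, source_b)
degree = average_node_degree(merged["glycolysis"])
pairs_per_pathway = average_pairs_per_pathway(merged)
```

In this toy case the merged pathway holds three distinct pairs, where each source alone holds two. This is the sense in which a union-based integration can only keep or increase the number of gene pairs and the node degree per pathway.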
CONCLUSIONS: In IntPath, we have overcome the issues of compatibility, consistency, and comprehensiveness that often hamper effective use of pathway databases. We have included four organisms in the current release of IntPath. The methodology and programs described in this work can be easily applied to other organisms, and we will include more model organisms and important pathogens in future releases of IntPath. IntPath maintains regular updates and is freely available at http://compbio.ddns.comp.nus.edu.sg:8080/IntPath.
Authors:
Hufeng Zhou; Jingjing Jin; Haojun Zhang; Bo Yi; Michal Wozniak; Limsoon Wong
Publication Detail:
Type:  Journal Article; Research Support, Non-U.S. Gov't     Date:  2012-12-12
Journal Detail:
Title:  BMC systems biology     Volume:  6 Suppl 2     ISSN:  1752-0509     ISO Abbreviation:  BMC Syst Biol     Publication Date:  2012  
Date Detail:
Created Date:  2013-01-03     Completed Date:  2013-06-06     Revised Date:  2013-07-11    
Medline Journal Info:
Nlm Unique ID:  101301827     Medline TA:  BMC Syst Biol     Country:  England    
Other Details:
Languages:  eng     Pagination:  S2     Citation Subset:  IM    
Affiliation:
NUS Graduate School for Integrative Sciences & Engineering, National University of Singapore, Singapore.
MeSH Terms
Descriptor/Qualifier:
Animals
Computational Biology / methods*
Databases, Genetic*
Humans
Internet
Mice
Models, Animal*
Mycobacterium tuberculosis / genetics*,  metabolism*
Saccharomyces cerevisiae / genetics*,  metabolism*
User-Computer Interface
From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

