A non-negative matrix factorization framework for identifying modular patterns in metagenomic profile data.
MedLine Citation:
PMID:  21630089     Owner:  NLM     Status:  Publisher    
Abstract/OtherAbstract:
Metagenomic studies sequence DNA directly from environmental samples to explore the structure and function of complex microbial and viral communities. Individual, short pieces of sequenced DNA ("reads") are classified into (putative) taxonomic or metabolic groups which are analyzed for patterns across samples. Analysis of such read matrices is at the core of using metagenomic data to make inferences about ecosystem structure and function. Non-negative matrix factorization (NMF) is a numerical technique for approximating high-dimensional data points as positive linear combinations of positive components. It is thus well suited to interpretation of observed samples as combinations of different components. We develop, test and apply an NMF-based framework to analyze metagenomic read matrices. In particular, we introduce a method for choosing NMF degree in the presence of overlap, and apply spectral-reordering techniques to NMF-based similarity matrices to aid visualization. We show that our method can robustly identify the appropriate degree and disentangle overlapping contributions using synthetic data sets. We then examine and discuss the NMF decomposition of a metabolic profile matrix extracted from 39 publicly available metagenomic samples, and identify canonical sample types, including one associated with coral ecosystems, one associated with highly saline ecosystems and others. We also identify specific associations between pathways and canonical environments, and explore how alternative choices of decompositions facilitate analysis of read matrices at a finer scale.
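As an illustration of the factorization the abstract describes, here is a minimal sketch of plain NMF using the standard multiplicative-update rule for the Frobenius-norm objective. It is not the authors' framework: it does not cover their degree-selection method or spectral reordering. The function names (`nmf`, `demo`), the matrix orientation (rows as functional categories, columns as samples) and the synthetic data are assumptions made for this example only.

```python
import numpy as np


def nmf(V, k, n_iter=1000, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (n x m) as W @ H.

    W (n x k) holds k non-negative components; H (k x m) holds the
    non-negative weight of each component in each column of V.
    Uses multiplicative updates for the Frobenius-norm objective.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive random start, so the updates never divide by zero.
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Each update rescales entries by a non-negative ratio,
        # so W and H stay non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def demo():
    """Factorize a synthetic read matrix built from 3 hidden components."""
    rng = np.random.default_rng(1)
    W_true = rng.random((20, 3))   # hypothetical: 20 categories, 3 components
    H_true = rng.random((3, 8))    # hypothetical: 8 samples
    V = W_true @ H_true            # each sample is a non-negative mixture
    W, H = nmf(V, k=3)
    rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
    return W, H, rel_err
```

Calling `demo()` returns the fitted factors and the relative reconstruction error. The degree `k` is fixed by hand here; choosing it from the data is the problem the paper addresses.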
Authors:
Xingpeng Jiang; Joshua S Weitz; Jonathan Dushoff
Publication Detail:
Type:  JOURNAL ARTICLE     Date:  2011-6-1
Journal Detail:
Title:  Journal of mathematical biology     Volume:  -     ISSN:  1432-1416     ISO Abbreviation:  -     Publication Date:  2011 Jun 
Date Detail:
Created Date:  2011-6-1     Completed Date:  -     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  7502105     Medline TA:  J Math Biol     Country:  -    
Other Details:
Languages:  ENG     Pagination:  -     Citation Subset:  -    
Affiliation:
Department of Biology, McMaster University, Hamilton, Ontario, Canada.

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

