DSK: k-mer counting with very low memory usage.
MedLine Citation:
PMID:  23325618     Owner:  NLM     Status:  Publisher    
Abstract/OtherAbstract:
SUMMARY: Counting all the k-mers (substrings of length k) in DNA/RNA sequencing reads is the preliminary step of many bioinformatics applications. However, state-of-the-art k-mer counting methods require that a large data structure reside in memory. Such a structure typically grows with the number of distinct k-mers to count. We present a new streaming algorithm for k-mer counting, called DSK (disk streaming of k-mers), which requires only a fixed, user-defined amount of memory and disk space. This approach realizes a trade-off between memory, time and disk usage. The multi-set of all k-mers present in the reads is partitioned, and the partitions are saved to disk. Then, each partition is separately loaded into memory in a temporary hash table. The k-mer counts are returned by traversing each hash table. Low-abundance k-mers are optionally filtered. DSK is the first approach that is able to count all the 27-mers of a human genome dataset using only 4.0 GB of memory and moderate disk space (160 GB), in 17.9 hours. DSK can replace a popular k-mer counting program (Jellyfish) on small-memory servers.
AVAILABILITY: http://minia.genouest.org/dsk
CONTACT: rayan.chikhi@ens-cachan.org
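The partition-then-count idea described in the summary can be illustrated with a minimal Python sketch. It is not the DSK implementation: the function names, the text-file partition format, the CRC32 hash and the `n_partitions` and `min_abundance` parameters are all assumptions made for illustration, and the sketch does not model how DSK derives its partitioning from the user-defined memory and disk budgets.

```python
import os
import tempfile
import zlib
from collections import Counter


def kmers(read, k):
    """Yield every substring of length k in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def iter_kmer_counts(reads, k, n_partitions=4, min_abundance=1):
    """Yield (k-mer, count) pairs while holding one partition in memory.

    Hypothetical illustration only: partitions are plain text files in a
    temporary directory, one k-mer per line.
    """
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"part_{i}.txt") for i in range(n_partitions)]

        # Pass 1: stream the reads and spill each k-mer to a disk partition.
        files = [open(p, "w") for p in paths]
        try:
            for read in reads:
                for kmer in kmers(read, k):
                    # A deterministic hash sends identical k-mers to the
                    # same partition, so each can be counted independently.
                    idx = zlib.crc32(kmer.encode("ascii")) % n_partitions
                    files[idx].write(kmer + "\n")
        finally:
            for f in files:
                f.close()

        # Pass 2: load one partition at a time into a temporary hash table.
        for path in paths:
            table = Counter()
            with open(path) as f:
                for line in f:
                    table[line.rstrip("\n")] += 1
            # Traverse the table, optionally dropping low-abundance k-mers.
            for kmer, count in table.items():
                if count >= min_abundance:
                    yield kmer, count


if __name__ == "__main__":
    for kmer, count in sorted(iter_kmer_counts(["ACGTACGT"], k=3)):
        print(kmer, count)
```

In this sketch, peak memory is bounded by the largest single partition rather than by the total number of distinct k-mers, so raising `n_partitions` trades more temporary files for smaller hash tables.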
Authors:
Guillaume Rizk; Dominique Lavenier; Rayan Chikhi
Publication Detail:
Type:  JOURNAL ARTICLE     Date:  2013-1-16
Journal Detail:
Title:  Bioinformatics (Oxford, England)     Volume:  -     ISSN:  1367-4811     ISO Abbreviation:  Bioinformatics     Publication Date:  2013 Jan 
Date Detail:
Created Date:  2013-1-17     Completed Date:  -     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  9808944     Medline TA:  Bioinformatics     Country:  -    
Other Details:
Languages:  ENG     Pagination:  -     Citation Subset:  -    
Affiliation:
Algorizk, 75013 Paris, France.
From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

