Document Detail

DSK: k-mer counting with very low memory usage.
MedLine Citation:
PMID:  23325618     Owner:  NLM     Status:  Publisher    
SUMMARY: Counting all the k-mers (substrings of length k) in DNA/RNA sequencing reads is the preliminary step of many bioinformatics applications. However, state-of-the-art k-mer counting methods require that a large data structure reside in memory. Such a structure typically grows with the number of distinct k-mers to count. We present a new streaming algorithm for k-mer counting, called DSK (disk streaming of k-mers), which only requires a fixed, user-defined amount of memory and disk space. This approach realizes a memory, time and disk trade-off. The multi-set of all k-mers present in the reads is partitioned and partitions are saved to disk. Then, each partition is separately loaded in memory in a temporary hash table. The k-mer counts are returned by traversing each hash table. Low-abundance k-mers are optionally filtered. DSK is the first approach that is able to count all the 27-mers of a human genome dataset using only 4.0 GB of memory and moderate disk space (160 GB), in 17.9 hours. DSK can replace a popular k-mer counting software (Jellyfish) on small-memory servers. AVAILABILITY: CONTACT:
Guillaume Rizk; Dominique Lavenier; Rayan Chikhi
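
The partition-then-count scheme described in the summary can be illustrated with a small sketch. This is not the authors' implementation, only a toy version of the steps the summary names: write each k-mer to one of several partition files on disk, then load one partition at a time into a temporary hash table and report its counts, optionally dropping low-abundance k-mers. The function names, the `n_partitions` and `min_abundance` parameters, the text file layout and the toy hash function are all assumptions made for illustration.

```python
import os
import tempfile
from collections import Counter


def kmers(read, k):
    """Yield every substring of length k in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def partition_of(kmer, n_partitions):
    """Map a k-mer to a partition index (hypothetical toy hash).

    Python's built-in hash() is salted per process for strings, so a
    simple deterministic polynomial hash is used here instead.
    """
    h = 0
    for ch in kmer:
        h = (h * 31 + ord(ch)) % 1_000_003
    return h % n_partitions


def count_kmers_on_disk(reads, k, n_partitions=4, min_abundance=1):
    """Yield (k-mer, count) pairs while holding one partition in memory."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"part_{i}.txt") for i in range(n_partitions)]

        # Pass 1: stream the reads and save each k-mer to its partition file.
        files = [open(p, "w") for p in paths]
        try:
            for read in reads:
                for km in kmers(read, k):
                    files[partition_of(km, n_partitions)].write(km + "\n")
        finally:
            for f in files:
                f.close()

        # Pass 2: load each partition separately into a temporary hash table.
        for p in paths:
            table = Counter()
            with open(p) as f:
                for line in f:
                    table[line.rstrip("\n")] += 1
            # Traverse the table, optionally filtering low-abundance k-mers.
            for km, count in table.items():
                if count >= min_abundance:
                    yield km, count


if __name__ == "__main__":
    for km, count in sorted(count_kmers_on_disk(["ACGTACGT", "CGTA"], k=3)):
        print(km, count)
```

Because identical k-mers always hash to the same partition, each partition can be counted independently, and peak memory is bounded by the largest partition rather than by the full set of distinct k-mers. Raising `n_partitions` in this sketch trades more files on disk for smaller in-memory tables.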
Publication Detail:
Type:  JOURNAL ARTICLE     Date:  2013-1-16
Journal Detail:
Title:  Bioinformatics (Oxford, England)     Volume:  -     ISSN:  1367-4811     ISO Abbreviation:  Bioinformatics     Publication Date:  2013 Jan 
Date Detail:
Created Date:  2013-1-17     Completed Date:  -     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  9808944     Medline TA:  Bioinformatics     Country:  -    
Other Details:
Languages:  ENG     Pagination:  -     Citation Subset:  -    
Algorizk, 75013 Paris, France.
From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine