| DSK: k-mer counting with very low memory usage. | |
| | |
MedLine Citation:
|
PMID: 23325618 Owner: NLM Status: Publisher |
Abstract/OtherAbstract:
|
SUMMARY: Counting all the k-mers (substrings of length k) in DNA/RNA sequencing reads is the preliminary step of many bioinformatics applications. However, state-of-the-art k-mer counting methods require that a large data structure resides in memory. Such a structure typically grows with the number of distinct k-mers to count. We present a new streaming algorithm for k-mer counting, called DSK (disk streaming of k-mers), which only requires a fixed, user-defined amount of memory and disk space. This approach realizes a memory, time and disk trade-off. The multi-set of all k-mers present in the reads is partitioned and partitions are saved to disk. Then, each partition is separately loaded in memory in a temporary hash table. The k-mer counts are returned by traversing each hash table. Low-abundance k-mers are optionally filtered. DSK is the first approach that is able to count all the 27-mers of a human genome dataset using only 4.0 GB of memory and moderate disk space (160 GB), in 17.9 hours. DSK can replace a popular k-mer counting software (Jellyfish) on small-memory servers. AVAILABILITY: http://minia.genouest.org/dsk CONTACT: rayan.chikhi@ens-cachan.org. |
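The partition-then-count idea described in the summary can be illustrated with a minimal Python sketch. It is not the DSK implementation: the function names (`kmers`, `partition_of`, `count_kmers_on_disk`), the base-4 hash, the plain-text partition files and the `min_abundance` parameter are all assumptions made for illustration, and the sketch does not attempt to reproduce DSK's actual partitioning scheme, memory and disk budgeting, or on-disk encoding.

```python
import os
import tempfile
from collections import Counter

VALID_BASES = set("ACGT")


def kmers(read, k):
    """Yield every substring of length k from one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def partition_of(kmer, n_partitions):
    """Hypothetical deterministic hash: read the k-mer as a base-4 number."""
    code = 0
    for base in kmer:
        code = code * 4 + "ACGT".index(base)
    return code % n_partitions


def count_kmers_on_disk(reads, k, n_partitions, min_abundance=1, workdir=None):
    """Yield (kmer, count) pairs, holding one partition in memory at a time."""
    with tempfile.TemporaryDirectory(dir=workdir) as tmp:
        paths = [os.path.join(tmp, f"part_{p}.txt") for p in range(n_partitions)]

        # Pass 1: stream the reads and append each k-mer to its partition file.
        files = [open(path, "w") for path in paths]
        try:
            for read in reads:
                for kmer in kmers(read.upper(), k):
                    # Skip k-mers with ambiguous bases such as N (an assumption).
                    if set(kmer) <= VALID_BASES:
                        files[partition_of(kmer, n_partitions)].write(kmer + "\n")
        finally:
            for handle in files:
                handle.close()

        # Pass 2: load each partition into a temporary hash table and traverse it.
        for path in paths:
            table = Counter()
            with open(path) as handle:
                for line in handle:
                    table[line.rstrip("\n")] += 1
            for kmer, count in table.items():
                # Optional filter for low-abundance k-mers.
                if count >= min_abundance:
                    yield kmer, count
```

Because every occurrence of a given k-mer hashes to the same partition, each temporary table holds complete counts for its k-mers, and peak memory is bounded by the largest partition rather than by the whole dataset. For example, `dict(count_kmers_on_disk(["ACGTACGT", "CGTAC"], k=3, n_partitions=4))` gives counts of 2 for ACG, 3 for CGT, 2 for GTA and 2 for TAC.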
| | |
Authors:
|
Guillaume Rizk; Dominique Lavenier; Rayan Chikhi |
Publication Detail:
|
Type: JOURNAL ARTICLE Date: 2013-1-16 |
Journal Detail:
|
Title: Bioinformatics (Oxford, England) Volume: - ISSN: 1367-4811 ISO Abbreviation: Bioinformatics Publication Date: 2013 Jan |
Date Detail:
|
Created Date: 2013-1-17 Completed Date: - Revised Date: - |
Medline Journal Info:
|
Nlm Unique ID: 9808944 Medline TA: Bioinformatics Country: - |
Other Details:
|
Languages: ENG Pagination: - Citation Subset: - |
Affiliation:
|
Algorizk, 75013 Paris, France. |
From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine