|Tips and tricks for the assembly of a Corynebacterium pseudotuberculosis genome using a semiconductor sequencer.|
|PMID: 23199210 Owner: NLM Status: MEDLINE|
|New sequencing platforms have enabled rapid decoding of complete prokaryotic genomes at relatively low cost. The Ion Torrent platform is an example of these technologies; it is characterized by lower coverage, which creates challenges for genome assembly. One particular problem is the lack of genomes that enable reference-based assembly, such as the one used in the present study, Corynebacterium pseudotuberculosis biovar equi, which causes high economic losses in the US equine industry. The quality treatment strategy incorporated into the assembly pipeline enabled a 16-fold greater use of the sequencing data obtained compared with traditional quality filter approaches. Data preprocessing prior to the de novo assembly enabled the use of known methodologies for next-generation sequencing data assembly. Moreover, manual curation proved essential for ensuring a quality assembly, which was validated by comparative genomics with other species of the genus Corynebacterium. The present study presents a modus operandi that enables greater and better use of the data obtained from semiconductor sequencing to obtain the complete genome of a prokaryotic microorganism, C. pseudotuberculosis, which is not a traditional biological model such as Escherichia coli.|
|Rommel Thiago Jucá Ramos; Adriana Ribeiro Carneiro; Siomar de Castro Soares; Anderson Rodrigues dos Santos; Sintia Almeida; Luis Guimarães; Flávia Figueira; Eudes Barbosa; Andreas Tauch; Vasco Azevedo; Artur Silva|
|Type: Journal Article; Research Support, Non-U.S. Gov't Date: 2012-12-02|
|Title: Microbial biotechnology Volume: 6 ISSN: 1751-7915 ISO Abbreviation: Microb Biotechnol Publication Date: 2013 Mar|
|Created Date: 2013-02-15 Completed Date: 2013-07-29 Revised Date: 2014-02-14|
Medline Journal Info:
|Nlm Unique ID: 101316335 Medline TA: Microb Biotechnol Country: United States|
|Languages: eng Pagination: 150-6 Citation Subset: IM|
|© 2012 The Authors. Published by Society for Applied Microbiology and Blackwell Publishing Ltd. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.|
Computational Biology / methods*
Corynebacterium Infections / microbiology, veterinary
Corynebacterium pseudotuberculosis / genetics*, isolation & purification
DNA, Bacterial / analysis
Electrochemical Techniques / instrumentation, methods
Genome, Bacterial / genetics*
Genomics / methods
Horse Diseases / microbiology
Sequence Analysis, DNA* / instrumentation, methods
Journal ID (nlm-ta): Microb Biotechnol
Journal ID (iso-abbrev): Microb Biotechnol
Journal ID (publisher-id): mbt2
Publisher: Blackwell Publishing Ltd
Received Day: 03 Month: 9 Year: 2012
Revision Received Day: 15 Month: 10 Year: 2012
Accepted Day: 16 Month: 10 Year: 2012
Print publication date: Month: 3 Year: 2013
Electronic publication date: Day: 02 Month: 12 Year: 2012
Volume: 6 Issue: 2
First Page: 150 Last Page: 156
Rommel Thiago Jucá Ramos (1)
Adriana Ribeiro Carneiro (1)
Siomar de Castro Soares (2)
Anderson Rodrigues dos Santos (2)
(1) Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
(2) Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
(3) Center for Biotechnology, Institute for Genome Research and Systems Biology, Bielefeld, Germany
Correspondence: *E-mail email@example.com; Tel. (+55) 91 3201 8426; Fax (+55) 91 3201 8426.
With the advent of next-generation sequencing (NGS) platforms, such as 454 Roche, SOLiD and Illumina, there has been an increase in the number of whole-genome sequencing (WGS) projects, mainly due to cost reduction and the increased speed of sequencing and data generation. Nonetheless, problems related to the assembly of sequences arising from these platforms, such as the resolution of repetitive regions of the genome and the representation of low-coverage regions with short reads, have made it necessary to develop hybrid strategies that combine different platforms and assemblers to achieve successful assemblies (Kircher and Kelso, 2010; Cerdeira et al., 2011).
Currently, these technologies continue to be developed and others are emerging, such as the Ion Torrent sequencer. The Ion Torrent identifies nucleotides by using a semiconductor to detect the pH change caused by the release of a proton (H+) when a nucleotide is incorporated into the sequence, with each nucleotide added in a different cycle (Rothberg et al., 2011). In this platform, sequencing throughput varies according to the chip used: the 314, 316 and 318 chips are capable of producing 10 Mb, 100 Mb and 1 Gb of sequence, respectively, with an average read length of 200 bp (http://www.appliedbiosystems.com). Therefore, the Ion Torrent can be used to sequence the genomes of prokaryotic organisms in a fast, low-cost manner.
Corynebacterium pseudotuberculosis belongs to the CMNR group, which comprises bacteria of the genera Corynebacterium, Mycobacterium, Nocardia and Rhodococcus. These bacteria are of interest in veterinary science and are distinguished from other genera by the high lipid content of their cell wall, in which mycolic acid is the most prevalent lipid, and by the presence of meso-diaminopimelic acid. Polysaccharides, such as arabinose, galactose and some types of mannose, can also be found. As demonstrated in other studies, mycolic acid is the best characterized component and plays an important role in the virulence of Mycobacterium tuberculosis (Dorella et al., 2006).
Corynebacterium pseudotuberculosis infection leads to significant economic losses related to the decrease in productivity of infected animals. Various strains of C. pseudotuberculosis from goats and sheep (biovar ovis) have already been isolated, sequenced and studied, including strains 1002 and C231 (Ruiz et al., 2011). However, there is still a scarcity of genomic information on strains isolated from horses (biovar equi), a biovar that causes significant problems for horse breeders in California (USA) (Doherr et al., 1998).
The present study reports the assembly of the complete genome sequence of C. pseudotuberculosis 316, isolated from a horse and sequenced using the Ion Torrent platform. A pipeline was created for genome assembly that combined a new quality-filtering tool, in-house scripts for data preprocessing and existing short-read assembly software, which could be used without the algorithm optimization that would otherwise be expected (Earl et al., 2011). The functional annotation of the genome was subsequently performed, followed by a comparative analysis between the pathogenicity islands (PAIs) identified in strain 316 and those of other C. pseudotuberculosis strains already deposited in the biological databases.
The three chips used in sequencing yielded a total of 898 389 reads (160 607 819 bp). These data were submitted to the first stage of the pipeline (Fig. 1): quality filter with Quality Assessment software, long-read version, to achieve the maximum quality of sequences possible. After this step, 443 632 reads (37 247 006 bp) remained, which represented 16 × sequence coverage. If the quality filter had been based solely on the average quality of the reads, there would have been only 16 467 reads (1 918 221 bp) remaining, which would represent less than 1 × coverage of the genome.
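The coverage figures quoted above follow from dividing the number of sequenced bases by the genome length. The short sketch below reproduces that arithmetic; it assumes the final Cp316 assembly length (2 310 587 bp, reported later in the text) as the genome size, whereas the authors may have used the approximately 2.3 Mb reference instead. The function name is illustrative.

```python
# Assumption: genome size taken from the final Cp316 assembly length in the text.
GENOME_SIZE_BP = 2_310_587


def fold_coverage(total_bases, genome_size=GENOME_SIZE_BP):
    """Average sequencing depth: total sequenced bases divided by genome length."""
    return total_bases / genome_size


# Base counts as reported in the text.
raw = fold_coverage(160_607_819)            # all reads from the three chips
seed_filtered = fold_coverage(37_247_006)   # after the seed-based quality filter
mean_filtered = fold_coverage(1_918_221)    # if filtered on mean read quality only

print(f"raw reads:           {raw:.1f}x")
print(f"seed-based filter:   {seed_filtered:.1f}x")
print(f"mean-quality filter: {mean_filtered:.2f}x")
```

Under this assumption the three values come out at roughly 69 ×, 16 × and below 1 ×, in line with the figures given in the text.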
In the second step, the filtered reads were assembled using the Velvet and Edena3 software programs. The best Velvet assembly was obtained using a k-mer of 31 and a coverage cut-off of 5, and the best Edena3 assembly using a k-mer of 45 and a coverage cut-off of 6; the largest N50 was observed with the Velvet assembly (Table 1).
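Because N50 is the statistic used here to rank the assemblies, a minimal sketch of how it is commonly computed may be helpful. It uses the usual definition (the length of the contig at which the cumulative length of the contigs, sorted from longest to shortest, first reaches half of the total assembled bases); the function name is illustrative, and the assemblers themselves may report N50 with slightly different conventions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 for an empty input)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downwards until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Toy example with nine contigs totalling 54 bases.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Applied to the contig lengths of each candidate assembly, this allows different k-mer and coverage cut-off combinations to be compared on the same scale.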
Once the best results from Velvet and Edena were combined, 19 731 contigs were obtained (N50 of 295). However, after the removal of redundant contigs by the Simplifier software (http://sourceforge.net/projects/simplifier), only 16 287 contigs remained (N50 of 323), resulting in a 17.45% reduction in the number of contigs.
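The idea of redundancy removal can be illustrated with a small sketch. It is not the Simplifier algorithm, whose internals are not described here; it simply assumes that a contig is redundant when it is fully contained, on either strand, in a longer contig that has already been kept. The function names are hypothetical. The last lines check the reduction reported in the text.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def remove_contained(contigs):
    """Keep only contigs that are not contained in a longer kept contig."""
    kept = []
    # Longest first, so that shorter contigs are tested against longer ones.
    for contig in sorted(set(contigs), key=lambda s: (-len(s), s)):
        rc = reverse_complement(contig)
        if not any(contig in k or rc in k for k in kept):
            kept.append(contig)
    return kept


def percent_reduction(before, after):
    """Relative decrease in the number of contigs, in percent."""
    return 100.0 * (before - after) / before


# Contig counts as reported in the text: 19 731 before, 16 287 after.
print(f"{percent_reduction(19_731, 16_287):.2f}%")
```

This quadratic comparison is only practical for small contig sets; a real tool would normally rely on an index or on alignments.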
When the 16 287 contigs were processed in G4ALL (http://g4all.sourceforge.net/), using the genome of C. pseudotuberculosis FRC41 as a reference (Trost et al., 2010), four groups were identified: 160 contigs that aligned to more than one region of the genome, 20 that had non-specific alignments (length < 40 bp and E-value > 1 × 10⁻⁵), 629 that did not align and 15 478 that mapped to unique regions. Similarities were present between the extremities of 1222 of the uniquely mapped contigs, which were therefore extended, leaving 14 256 contigs mapped to unique regions of the reference genome.
Contigs mapped against the reference genome using G4ALL (14 436 sequences), even when below the cut-off criteria, were used in the CLC Genomics Workbench 4.7.2 software for alignment against the FRC41 genome. After this alignment, only 312 sequences were not mapped (55 kb), and a primary scaffold was generated with 5758 gaps (3687 of 1 bp, 1003 of 2 bp, 1069 between 2 and 1000 bp and 16 greater than 1000 bp). Following manual curation and with the help of the Ion Torrent reads and unaligned contigs from the G4ALL and CLC mappings, the number of gaps was reduced to 43, producing a draft assembly with 2 289 075 bp.
To ensure that all the sequenced bases were represented in the assembly, the 443 632 filtered reads were aligned against the draft of the Cp316 genome, identifying 36 444 reads that failed to align. Among these, 19 800 generated 219 contigs via de novo assembly and the remaining reads were mapped against genome clusters of C. pseudotuberculosis to produce 7687 contigs. All contigs obtained from the de novo assembly (7906 contigs) were mapped against the draft genome, and only 139 sequences larger than 100 bp were not mapped. The contigs were inserted in the genome with the help of G4ALL, CLC Genomics Workbench and similarity searches in biological databases, which resulted in the completion of a Cp316 genome assembly containing 2 310 587 bp.
In the preliminary annotation, we identified more than 400 pseudogenes, many of them due to false frameshifts generated by homopolymers. After manual curation with CLC, only 64 pseudogenes remained (Fig. 2A), which is in agreement with the numbers found in the other strains.
The identification of the PAIs of C. pseudotuberculosis 316 (GenBank: CP003077) was performed following genome annotation using the Pathogenicity Island Prediction Software (PIPS) (Soares et al., 2012). Seven PAIs were identified in C. pseudotuberculosis 316 and showed synteny with the PAIs previously described for C. pseudotuberculosis strains 1002 and C231 (PICPs 1–7). However, the putative PAIs of C. pseudotuberculosis 4 and 5 (PICPs 4 and 5) presented large deletions in strain 316 (Figs S1 and S2), as observed from the synteny map (Fig. 2B), including 30 deleted coding sequences (CDSs) when compared with C. pseudotuberculosis strains 1002 and C231 (Fig. 3). Among these 30 CDSs, 22 were annotated as hypothetical proteins and the remaining CDSs presented similarities to an integrase (Cp1002_0990), a phage-associated protein (Cp1002_1448), a p51 protein (Cp1002_1449), rRNA biogenesis protein rrp5 (Cp1002_1450), RNA polymerase factor sigma-70 (Cp1002_1452), DNA methylase (Cp1002_1457) and two ABC transporter ATP binding proteins (Cp1002_1464 and Cp1002_1465). Furthermore, PICP 5 indicated two new CDSs in the C. pseudotuberculosis 316 genome that encode hypothetical proteins.
Interestingly, the PIPS program predicted four additional PAIs in C. pseudotuberculosis 316 (PICPs 8–11) that were also automatically predicted for C. pseudotuberculosis 1002 but were discarded after manual curation. The comparison of C. pseudotuberculosis 316 against the genomes of C. pseudotuberculosis biovar ovis (1002 and C231) and equi (CIP52.97) strains and those of Corynebacterium diphtheriae and Corynebacterium ulcerans strains (Fig. 3A) clearly demonstrated that the putative PAIs are located in ‘hotspots’ for horizontal gene transfer, and these regions will be treated as such (Figs S3–S6). Additionally, PICP 9, similarly to PICPs 4 and 5 (Fig. 3B), also has a large deletion when compared with C. pseudotuberculosis biovar ovis strains (1002 and C231). The deletions in PICPs 4, 5 and 9 are in agreement with C. pseudotuberculosis CIP 52.97, which is also a biovar equi strain, and C. ulcerans strains 809 and BR-AD22 (Trost et al., 2011). Taken together, these observations corroborate the correct assembly of this genome sequence and may be indicative of the host-specific pattern of the biovars equi and ovis.
The organism C. pseudotuberculosis 316 was isolated from the abscess of an American horse in California (USA) and sequenced three times using the Ion Torrent platform (Rothberg et al., 2011) with the 314 chip. A total of 160 Mb of sequence was obtained with 69 × coverage. The genome sequence of C. pseudotuberculosis FRC41 (GenBank: CP002097), containing approximately 2.3 Mb of genetic information, was used as a reference.
The reads produced by the Ion Torrent platform vary in size, and the quality values of the bases decrease towards the 3′ end. Thus, to avoid discarding reads because of low-quality bases at the extremities, and to avoid trimming the extremities arbitrarily, a long-read version of the Quality Assessment software (Ramos et al., 2011) was developed (http://sourceforge.net/projects/qualevaluato). It removes the adapters and uses 31 bp seeds to implement a quality filter. A seed was placed on the first base of the read and moved one base at a time until its average quality reached the cut-off value (Phred 20). From this point, the seed was extended until the average quality fell to the cut-off value, to maximize the use of high-quality regions in the de novo assembly (Fig. 4).
To produce complementary results in the genome assembly, the Velvet 1.0.04 (Zerbino and Birney, 2008) and Edena 3 (Hernandez et al., 2008) software programs were used, which are based on the Eulerian path and overlap–layout–consensus methodologies respectively. The k-mer and coverage cut-off values tested ranged from 29 to 45 and from 5 to 15 respectively.
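The seed-based quality filter described above can be sketched as follows. This is an illustration under stated assumptions, not the code of the Quality Assessment software: it assumes that the seed slides one base at a time until its mean Phred score reaches the cut-off, and that the region is then extended towards the 3′ end for as long as the mean quality of the whole retained region stays at or above the cut-off. The function name and return convention are hypothetical.

```python
def seed_trim(quals, seed_len=31, cutoff=20):
    """Return (start, end) of the retained region as a half-open interval,
    or None if no seed of `seed_len` bases reaches the mean quality cut-off."""
    n = len(quals)
    if n < seed_len:
        return None

    # Step 1: slide the seed from the 5' end until its mean quality >= cutoff.
    window_sum = sum(quals[:seed_len])
    start = None
    for i in range(n - seed_len + 1):
        if i > 0:
            window_sum += quals[i + seed_len - 1] - quals[i - 1]
        if window_sum >= cutoff * seed_len:
            start = i
            break
    if start is None:
        return None

    # Step 2: extend base by base while the mean of the region stays >= cutoff.
    end = start + seed_len
    region_sum = window_sum
    while end < n and region_sum + quals[end] >= cutoff * (end - start + 1):
        region_sum += quals[end]
        end += 1
    return start, end


# Toy read: 20 low-quality bases followed by 40 high-quality bases.
print(seed_trim([2] * 20 + [30] * 40))
```

Compared with a filter on the mean quality of the whole read, this approach keeps the usable part of a read whose extremities are poor. Note that a running mean tolerates a short low-quality tail after a long high-quality stretch, so a stricter stopping rule may be preferable in practice.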
Due to the differently sized reads produced by Ion Torrent and because the Edena3 assembler only accepts sequences of the same size, an in-house script was developed to enable the use of the assembler. Thus, the filtered reads derived from the sequencing were processed by the script, producing new high-quality 50-mer reads (Fig. 1B).
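A preprocessing step of this kind can be sketched in a few lines. The in-house script is not described in detail, so the version below is only one possible reading: it assumes that each filtered read is cut into consecutive 50-mers and that a trailing fragment shorter than 50 bp is discarded. The function name and the `step` parameter are hypothetical; a smaller step would produce overlapping 50-mers.

```python
def split_into_kmers(read, k=50, step=50):
    """Cut a read into fixed-length fragments of k bases.

    With step == k the fragments do not overlap; any remainder shorter
    than k at the 3' end is dropped.
    """
    return [read[i:i + k] for i in range(0, len(read) - k + 1, step)]


# A 120 bp read yields two 50-mers; the last 20 bases are discarded.
fragments = split_into_kmers("ACGT" * 30)
print(len(fragments), [len(f) for f in fragments])
```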
The best set of contigs generated by each of the assemblers was selected and saved in a single file and then subjected to the Simplifier software, which removes redundant sequences from the contig set. The remaining contigs were oriented and ordered using G4ALL, and the BlastN algorithm (Altschul et al., 1990) was used to align the contigs against the reference genome of C. pseudotuberculosis FRC41, considering a minimum alignment size of 40 bp. Sequences that produced significant hits (alignment size > 39 bp and E-value ≤ 1 × 10⁻⁵) were analysed and extended by the software, considering a minimum overlap of 30 bp between extremities.
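The hit-filtering criteria above can be illustrated with a short sketch that sorts contigs into the four groups mentioned in the results (unique, multiple, non-specific, unaligned). It is not the G4ALL implementation; it assumes that the BlastN results are available in the standard 12-column tabular format (query, subject, identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score), and the function name is hypothetical.

```python
from collections import defaultdict

MIN_ALIGN_LEN = 40   # bp, cut-off stated in the text
MAX_EVALUE = 1e-5    # cut-off stated in the text


def classify_contigs(blast_lines, contig_ids):
    """Group contigs by the number of significant hits on the reference."""
    significant = defaultdict(set)  # contig -> distinct reference intervals
    weak = set()                    # contigs with at least one sub-threshold hit
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        contig = fields[0]
        length = int(fields[3])
        sstart, send = int(fields[8]), int(fields[9])
        evalue = float(fields[10])
        if length >= MIN_ALIGN_LEN and evalue <= MAX_EVALUE:
            significant[contig].add((min(sstart, send), max(sstart, send)))
        else:
            weak.add(contig)

    groups = {"unique": [], "multiple": [], "nonspecific": [], "unaligned": []}
    for contig in contig_ids:
        n_regions = len(significant.get(contig, ()))
        if n_regions == 1:
            groups["unique"].append(contig)
        elif n_regions > 1:
            groups["multiple"].append(contig)
        elif contig in weak:
            groups["nonspecific"].append(contig)
        else:
            groups["unaligned"].append(contig)
    return groups
```

Passing the lines of a tabular BLAST report together with the list of all contig identifiers returns the four groups, from which the uniquely mapped contigs can be taken forward for ordering and extension.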
Once the contigs were processed through G4ALL, they were inserted into the CLC Genomics Workbench 4.7.2 software and aligned against the genome of C. pseudotuberculosis FRC41 to generate a scaffold composed of nucleotide sequences that represent regions not covered by the de novo contigs. The coordinates and sizes of the gaps were mapped by an in-house script to be analysed using the CLC Genomics Workbench, in which the alignment of filtered reads was performed against the scaffold. This process was conducted recursively to reduce the number of gaps caused by low-coverage sequencing until it was no longer possible to close the gaps. Thus, version 1 of the genome was produced. This pipeline is presented in Fig. 1.
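The mapping of gap coordinates and sizes can be sketched as follows. The in-house script itself is not shown in the text, so this is only an assumed equivalent: it presumes that gaps in the scaffold are represented as runs of the character N and reports 1-based inclusive coordinates. The function names and the size classes (1 bp, 2 bp, 3 to 1000 bp, more than 1000 bp) are illustrative choices.

```python
import re


def find_gaps(scaffold):
    """Return (start, end, length) for every run of N in the scaffold,
    using 1-based inclusive coordinates."""
    return [(m.start() + 1, m.end(), m.end() - m.start())
            for m in re.finditer(r"[Nn]+", scaffold)]


def summarize_gaps(gaps):
    """Count gaps per size class."""
    bins = {"1 bp": 0, "2 bp": 0, "3-1000 bp": 0, ">1000 bp": 0}
    for _start, _end, length in gaps:
        if length == 1:
            bins["1 bp"] += 1
        elif length == 2:
            bins["2 bp"] += 1
        elif length <= 1000:
            bins["3-1000 bp"] += 1
        else:
            bins[">1000 bp"] += 1
    return bins


# Toy scaffold with two gaps.
print(find_gaps("ACGTNNNACGNA"))
```

Re-running such a scan after each round of read alignment gives a simple way to track how many gaps remain and whether further rounds are still closing any.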
The filtered reads that failed to align with version 1 of the genome were used for de novo assembly using the CLC Genomics Workbench 4.7.2, and the contigs were mapped against a database composed of a set of Corynebacterium genomes available at the NCBI (Table 2): C. diphtheriae NCTC 13129 (GenBank: NC_002935.2), Corynebacterium glutamicum ATCC 13032 (GenBank: NC_006958.1), C. glutamicum R chromosome (GenBank: NC_009342.1), C. pseudotuberculosis 1002 (GenBank: CP001809.1), C. pseudotuberculosis C231 (GenBank: CP001829.1) and C. pseudotuberculosis FRC41 (GenBank: NC_014329.1). These genomes were used to generate new contigs, which were inserted into strain 316 using the G4ALL software by mapping them against the genome of C. pseudotuberculosis 316.
CLC Genomics Workbench 4.7.2 software was used to align all filtered reads against the genome of C. pseudotuberculosis 316 and therefore identify the set of reads that were not mapped. The unmapped reads were used to establish whether there were regions of the genome that had not been represented, and these regions were subsequently inserted.
Glimmer software (Delcher et al., 1999) was used for genome annotation to predict coding regions. Repetitions in the genome were identified by RepeatScout (Price et al., 2005) via a search for similarities against its own database. RNAmmer software (Lagesen et al., 2007) was used to predict rRNAs. The protein domain analysis was performed using the Interpro database, which includes several banks of protein domains, motifs and families, and the Interproscan tool (Quevillon et al., 2005) was used to increase the reliability of the predictions (Hunter et al., 2009).
Frameshifts were identified following the annotations and were mostly generated by the failure of the Ion Torrent to resolve homopolymers correctly, as cited by Mellmann and colleagues (2011). The frameshifts were corrected through manual curation in the CLC Genomics Workbench, in which the reads produced by the Ion Torrent were aligned against the reference genome of C. pseudotuberculosis strain FRC41 (GenBank: NC_014329) and strain 316 of the same organism. The annotated genome sequence of C. pseudotuberculosis 316 has been deposited in the GenBank database with Accession Number CP003077.
The identification of PAIs was performed using the PIPS program (Soares et al., 2012), Artemis Comparison Tool (ACT) (Carver et al., 2005) and Blast Ring Image Generator (Alikhan et al., 2011). First, the PAIs of C. pseudotuberculosis strain 316 were automatically predicted using PIPS, which uses the classical features of PAIs for prediction, i.e. codon usage deviation, atypical G+C content, a high concentration of virulence factors and hypothetical proteins and the presence of transposases and tRNA flanking regions. Following the automatic analysis, the predicted islands were compared with the seven PAIs present in C. pseudotuberculosis strain 1002 (GenBank: CP001809) and strain C231 (GenBank: CP001829.1), both from biovar ovis, and strain CIP52.97 (GenBank: CP003061) from biovar equi.
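One of the classical PAI features listed above, atypical G+C content, can be illustrated with a sliding-window scan. The sketch below is not the PIPS implementation, which combines several features; the window size, step and deviation threshold are arbitrary illustrative values, and the function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_gc_windows(genome, window=1000, step=500, max_dev=0.05):
    """Return (start, end, gc) for windows whose G+C fraction deviates from
    the genome-wide value by more than max_dev (0-based, half-open)."""
    genome_gc = gc_fraction(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        window_gc = gc_fraction(genome[start:start + window])
        if abs(window_gc - genome_gc) > max_dev:
            flagged.append((start, start + window, window_gc))
    return flagged


# Toy genome: balanced sequence with a 1 kb AT-only island in the middle.
toy = "GCAT" * 1000 + "AT" * 500 + "GCAT" * 1000
print(atypical_gc_windows(toy, window=1000, step=1000, max_dev=0.1))
```

Regions flagged in this way are only candidates; in the approach described in the text they are weighed together with codon usage deviation, virulence factor content, transposases and flanking tRNAs before manual curation.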
This study is supported by Fundação de Amparo a Pesquisa do Estado do Pará, Superintendência de Desenvolvimento da Amazônia, Life Technologies and Pronex Núcleo Amazônico de Excelência em Genômica de Microorganismos. M. P. S., A. S., V. A., A. R. C., S. S., S. A., A. S., F. F. and E. B. were supported by the National Council for Scientific and Technological Development (Conselho Nacional de Desenvolvimento Científico e Tecnológico, CNPq). R. T. J. R. acknowledges support from the Brazilian Federal Agency for the Support and Evaluation of Graduate Education (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, CAPES).
Alikhan NF, Petty NK, Ben Zakour NL, Beatson SA. BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons. BMC Genomics 2011; 12: 402. PMID: 21824423.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol 1990; 215: 403–410. PMID: 2231712.
Carver TJ, Rutherford KM, Berriman M, Rajandream M-A, Barrell BG, Parkhill J. ACT: the Artemis Comparison Tool. Bioinformatics 2005; 21: 3422–3423. PMID: 15976072.
Cerdeira LT, Carneiro AR, Ramos RTJ, de Almeida SS, D'Afonseca V, Schneider MPC, et al. Rapid hybrid de novo assembly of a microbial genome using only short reads: Corynebacterium pseudotuberculosis I19 as a case study. J Microbiol Methods 2011; 86: 218–223. PMID: 21620904.
Delcher AL, Harmon D, Kasif S, White O, Salzberg SL. Improved microbial gene identification with GLIMMER. Nucleic Acids Res 1999; 27: 4636–4641. PMID: 10556321.
Doherr MG, Carpenter TE, Hanson KMP, Wilson WD, Gardner IA. Risk factors associated with Corynebacterium pseudotuberculosis infection in California horses. Prev Vet Med 1998; 35: 229–239. PMID: 9689656.
Dorella F, Pacheco LGC, Oliveira SC, Miyoshi A, Azevedo V. Corynebacterium pseudotuberculosis: microbiology, biochemical properties, pathogenesis and molecular studies of virulence. Vet Res 2006; 37: 201–218. PMID: 16472520.
Earl D, Bradnam K, St John J, Darling A, Lin D, Fass J, et al. Assemblathon 1: a competitive assessment of de novo short read assembly methods. Genome Res 2011; 21: 2224–2241. PMID: 21926179.
Hernandez D, François P, Farinelli L, Osteras M, Schrenzel J. De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer. Genome Res 2008; 18: 802–809. PMID: 18332092.
Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, et al. InterPro: the integrative protein signature database. Nucleic Acids Res 2009; 37 (Database issue): 211–215.
Kircher M, Kelso J. High-throughput DNA sequencing – concepts and limitations. Bioessays 2010; 32: 524–536. PMID: 20486139.
Lagesen K, Hallin P, Rødland EA, Staerfeldt H-H, Rognes T, Ussery DW. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res 2007; 35: 3100–3108. PMID: 17452365.
Mellmann A, Harmsen D, Cummings C, Zentz EB, Leopold SR, Rico A, et al. Prospective genomic characterization of the German enterohemorrhagic Escherichia coli O104:H4 outbreak by rapid next generation sequencing technology. PLoS ONE 2011; 6: e22751. PMID: 21799941.
Price AL, Jones NC, Pevzner PA. De novo identification of repeat families in large genomes. Bioinformatics 2005; 21: 351–358.
Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, Apweiler R, Lopez R. InterProScan: protein domains identifier. Nucleic Acids Res 2005; 33: 116–120.
Ramos RT, Carneiro AR, Baumbach J, Azevedo V, Schneider MP, Silva A. Analysis of quality raw data of second generation sequencers with Quality Assessment Software. BMC Res Notes 2011; 4: 130. PMID: 21501521.
Rothberg JM, Hinz W, Rearick TM, Schultz J, Mileski W, Davey M, et al. An integrated semiconductor device enabling non-optical genome sequencing. Nature 2011; 475: 348–352. PMID: 21776081.
Ruiz JC, D'Afonseca V, Silva A, Ali A, Pinto AC, Santos AR, et al. Evidence for reductive genome evolution and lateral acquisition of virulence functions in two Corynebacterium pseudotuberculosis strains. PLoS ONE 2011; 6: e18551. PMID: 21533164.
Soares SC, Abreu VA, Ramos RTJ, Cerdeira L, Silva A, Baumbach J, et al. PIPS: Pathogenicity Island Prediction Software. PLoS ONE 2012; 7: e30848. PMID: 22355329.
Trost E, Ott L, Schneider J, Schröder J, Jaenicke S, Goesmann A, et al. The complete genome sequence of Corynebacterium pseudotuberculosis FRC41 isolated from a 12-year-old girl with necrotizing lymphadenitis reveals insights into gene-regulatory networks contributing to virulence. BMC Genomics 2010; 11: 728. PMID: 21192786.
Trost E, Al-Dilaimi A, Papavasiliou P, Schneider J, Viehoever P, Burkovski A, et al. Comparative analysis of two complete Corynebacterium ulcerans genomes and detection of candidate virulence factors. BMC Genomics 2011; 12: 383. PMID: 21801446.
Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 2008; 18: 821–829. PMID: 18349386.
Additional Supporting Information may be found in the online version of this article:
Fig. S1. Comparative analyses of pathogenicity islands of Corynebacterium pseudotuberculosis, revealing the deletion of the PICP4 of C. pseudotuberculosis 316.
Fig. S2. Comparative analyses of pathogenicity islands of Corynebacterium pseudotuberculosis, revealing the deletion of the PICP5 of C. pseudotuberculosis 316.
Fig. S3. Comparative analyses of pathogenicity islands of C. ulcerans BR-AD22 (PICU6 – at the top), C. pseudotuberculosis 1002 (PICP8 – in the middle) and C. diphtheriae NCTC13129 (PICD16 – at the bottom).
Fig. S4. Comparative analyses of pathogenicity islands of C. ulcerans BR-AD22 (at the top), C. pseudotuberculosis 1002 (PICP9 – in the middle) and C. diphtheriae NCTC13129 (at the bottom).
Fig. S5. Comparative analyses of pathogenicity islands of C. ulcerans BR-AD22 (PICU10 – at the top), C. pseudotuberculosis 1002 (PICP10 – in the middle) and C. diphtheriae NCTC13129 (at the bottom).
Fig. S6. Comparative analyses of pathogenicity islands of C. ulcerans BR-AD22 (PICU6 – at the top), C. pseudotuberculosis 1002 (PICP11 – in the middle) and C. diphtheriae NCTC13129 (PICD24 – at the bottom).
Table 1. Quality evaluation of Velvet- and Edena-assembled genomes
| Assembler | N50 (bp) | Largest contig (bp) | Smallest contig (bp) | Number of contigs | Bases |
| Velvet | 423 | 2476 | 100 | 7 260 | 2 482 519 |
| Edena3 | 210 | 1109 | 100 | 12 471 | 2 370 369 |
Table 2. Descriptions of the species used during the comparative analyses
| Genome | Size (bp) | Number of CDS | GC % | rRNA | tRNA | Pseudogenes |
| Corynebacterium diphtheriae NCTC 13129 | 2 488 635 | 2272 | 53.5 | 15 | 54 | 47 |
| Corynebacterium glutamicum ATCC 13032 | 3 282 708 | 3099 | 53.8 | 18 | 60 | 1 |
| Corynebacterium glutamicum R | 3 314 179 | 3080 | 54.1 | 18 | 58 | – |
| Corynebacterium pseudotuberculosis 1002 | 2 335 113 | 2090 | 52.2 | 12 | 48 | 53 |
| Corynebacterium pseudotuberculosis C231 | 2 328 208 | 2091 | 52.2 | 11 | 48 | 54 |
| Corynebacterium pseudotuberculosis FRC41 | 2 337 913 | 2110 | 52.2 | 12 | 49 | – |
| Corynebacterium pseudotuberculosis 316 | 2 310 587 | 2106 | 52.1 | 12 | 49 | 67 |