ABSTRACT
Here, we announce the complete genome sequence of the Mycobacterium avium subsp. avium strain DSM 44156, also deposited as ATCC 25291 and TMC 724. The reference strain was originally described as a serotype 2 strain isolated from a hen by F. D. Chester in 1901.
ANNOUNCEMENT
Mycobacterium avium is the nontuberculous mycobacterial species that is most frequently isolated from animals and humans (1). Based on phenotypic and, especially, molecular typing techniques, Mycobacterium avium has been classified into four subspecies, M. avium subsp. avium, M. avium subsp. paratuberculosis, M. avium subsp. silvaticum, and M. avium subsp. hominissuis (2, 3). Despite belonging to the same species, the M. avium subspecies differ considerably, e.g., in growth behavior, genomic organization, pathogenicity, and host preference. M. avium subsp. avium, M. avium subsp. silvaticum, and M. avium subsp. paratuberculosis are obligate pathogens. M. avium subsp. avium causes avian tuberculosis, a fatal mycobacteriosis, in birds (4). M. avium subsp. silvaticum causes mycobacteriosis in pigeons (2, 5). M. avium subsp. paratuberculosis is the causative agent of paratuberculosis, a chronic, progressive, and fatal enteritis in ruminants (6). In contrast, M. avium subsp. hominissuis is considered an environmental bacterium and an opportunistic pathogen of humans, pigs, and other species (1). Subspecies-specific large-sequence polymorphisms have been suggested to be responsible for the individual features of each subspecies (7–9). Here, we provide the complete genome sequence of the M. avium subsp. avium reference strain DSM 44156, also deposited as ATCC 25291 and TMC 724, a serotype 2 strain originally isolated from the liver of a diseased hen (10). This reference strain is one of the most intensively used strains for studying M. avium pathogenicity.
M. avium subsp. avium DSM 44156 was provided by the Leibniz Institute DSMZ and cultured in Middlebrook 7H9 medium with glycerol, oleic acid, albumin, and dextrose at 37°C to the late exponential growth phase. DNA was isolated using a Genomic-tip 100/G (Qiagen, Hilden, Germany). A SMRTbell template library for long-read sequencing was prepared according to the instructions from Pacific Biosciences and sequenced on the PacBio RS II system (Menlo Park, CA, USA), taking one 240-minute movie, which resulted in 50,721 reads with a mean (filtered) read length of 10,177 bp. Libraries for sequencing on an Illumina platform were prepared using a Nextera XT DNA library preparation kit with modifications (11) and sequenced on the NextSeq 500 platform (Illumina, San Diego, CA, USA), yielding 8.8 million reads. Long-read genome assembly was performed with the RS_HGAP_Assembly.3 protocol included in SMRT Portal 2.3.0 by applying a target genome size of 10 Mbp, including quality filtering with a minimum subread length of 500 bp and a minimum polymerase read quality of 0.8. The chromosomal contig was circularized; in particular, artificial redundancies at the ends of the contig were removed, and the sequence was rotated to start at dnaA. Redundancies and the replication gene were identified using BLAST, and circularization and rotation to the replication gene were performed using the genomecirculator.jar tool (https://github.com/boykebunk/genomefinish). Error correction was performed by mapping the Illumina short reads onto the finished genome using the Burrows-Wheeler Aligner (BWA) 0.6.2 in paired-end (sampe) mode with default settings (12), followed by variant and consensus calling using VarScan 2.3.6 (13). Genome annotation was based on Prokka 1.8 (14) with subsequent manual curation.
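The circularization step described above can be illustrated with a minimal Python sketch. It is not the genomecirculator.jar implementation: the function names are hypothetical, the overlap search uses exact string matching where the study relied on BLAST, and strand orientation of the anchor gene is ignored. It only shows the idea of trimming a redundant contig end and rotating the circular sequence to a chosen start coordinate.

```python
def trim_terminal_overlap(contig: str, min_overlap: int = 3) -> str:
    """Drop the redundant suffix if the contig ends with a copy of its own start.

    Toy assumption: the overlap is an exact match. Real assemblies contain
    mismatches, which is why an alignment tool such as BLAST is used instead.
    """
    # Try the longest candidate overlap first, down to min_overlap.
    for k in range(len(contig) // 2, min_overlap - 1, -1):
        if contig[:k] == contig[-k:]:
            return contig[:-k]
    return contig


def rotate_circular(sequence: str, start: int) -> str:
    """Rotate a circular sequence so that 0-based index `start` becomes position 0."""
    if not sequence:
        return sequence
    start %= len(sequence)
    return sequence[start:] + sequence[:start]


# Hypothetical toy contig whose last four bases repeat its first four.
raw_contig = "ACGTTGCAACGT"
circular = trim_terminal_overlap(raw_contig)   # redundant "ACGT" suffix removed
rotated = rotate_circular(circular, start=3)   # pretend the anchor gene starts at index 3
print(circular, rotated)
```

In practice the start coordinate would come from the alignment hit for dnaA, and a gene on the reverse strand would additionally require taking the reverse complement, which this sketch leaves out.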
The complete circular chromosome comprises 4,956,929 bp with 4,626 predicted genes and 4,529 coding sequences (CDS), 301 of which carry signal peptide sequences. The GC content was determined to be 69.3%. The genome harbors 3 rRNA genes and 58 tRNA genes. The average (long-read) sequencing depth is 87×. Sequencing was conducted from authentic biological material grown from an ampule at DSMZ.
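As a rough plausibility check on the reported depth, the raw long-read yield can be divided by the chromosome length. The sketch below uses only the figures given in this announcement; the function name is hypothetical, and the calculation assumes that every filtered read contributes its full length to the assembly, which is an upper bound rather than the method used to obtain the reported value.

```python
def estimated_depth(read_count: int, mean_read_length_bp: float, genome_size_bp: int) -> float:
    """Upper-bound depth estimate: total sequenced bases divided by genome length."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return read_count * mean_read_length_bp / genome_size_bp


# Figures taken from the text: 50,721 reads, mean length 10,177 bp, 4,956,929-bp chromosome.
depth = estimated_depth(50_721, 10_177, 4_956_929)
print(f"{depth:.1f}x")
```

This yields roughly 104×, somewhat above the reported 87×. One plausible explanation, not stated in the announcement, is that the reported value counts only the bases that were actually aligned during assembly.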
Data availability. The genome sequence has been deposited at NCBI GenBank under accession number CP046507. The version described in this paper is the first version, CP046507.1. The raw sequence reads have been submitted to the NCBI SRA under the accession number of the corresponding BioProject, PRJNA591110.
ACKNOWLEDGMENTS
We thank Simone Severitt, Carola Berg, and Jolanthe Swiderski (all at the Leibniz Institute DSMZ) for excellent technical assistance.
This work was supported by a grant from the German Research Foundation (DFG) to R.G. (Go983/4-1). The publication was supported by the DFG and the University of Veterinary Medicine Hannover Foundation within the funding program Open Access Publishing.
FOOTNOTES
- Received 18 December 2019.
- Accepted 19 January 2020.
- Published 13 February 2020.
- Copyright © 2020 Goethe et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.