Complete Genome Sequences of 61 Mycobacteriophages

Mycobacteriophages, the viruses of mycobacteria, provide insights into viral diversity and evolution as well as numerous tools for genetic dissection of Mycobacterium tuberculosis. Here we report the complete genome sequences of 61 mycobacteriophages newly isolated from environmental samples using Mycobacterium smegmatis mc²155 as a host; these genomes expand our understanding of phage diversity.

Bacteriophages are the most abundant biological entities on the planet, with a global population of 10^31 particles. With an estimated 10^23 productive infections per second worldwide, the population is vast, dynamic, and genetically diverse (1)(2)(3)(4). As of March 2016, the National Center for Biotechnology Information (NCBI) lists 1,757 Caudovirales genomes, 318 of which infect Mycobacterium hosts. Previous comparative analyses of mycobacteriophages revealed substantial diversity and mosaic architectures resulting from nonhomologous recombination. Integrated research-education programs such as Phage Hunters Integrating Research and Education (PHIRE) (5), Science Education Alliance Phage Hunters Advancing Genomics and Evolutionary Science (SEA-PHAGES) (6), the Mycobacterial Genetics Course at the University of KwaZulu-Natal (K-RITH), and the University of California-Los Angeles's Research Immersion Laboratory in Virology isolated, sequenced, and annotated the phages reported here (Table 1) using M. smegmatis as a host.
Phages were isolated by direct plating of filtered soil extracts or from enriched cultures, followed by plaque purification. Electron microscopy shows that 60 have siphoviral morphotypes, and HyRo is the sole member of the Myoviridae. Most have isometric capsids, the exceptions being Bipper, Sbash, and Zakhe101, which have prolate heads. Genomic DNA was extracted from high-titer lysates, sheared, and sequenced at the University of Pittsburgh, University of California-Los Angeles, the DOE Joint Genome Institute, or Virginia Commonwealth University using Sanger, Illumina, Ion Torrent, or 454 technology. Sequence reads were assembled using Newbler (Roche) and Consed (7); coverage depths range from 47-fold to 2,308-fold, with an average of 200-fold. Sequence assemblies revealed discrete genome ends for 52 phages, and the 9 with circularly permuted assemblies were bioinformatically linearized such that base one was assigned in accord with other mycobacteriophages. Genomes were annotated using DNA Master (http://cobamide2.bio.pitt.edu), Phamerator (8), Glimmer (9), GeneMark (10), Aragorn (11), and tRNAscan-SE (12), and functions were determined using the public databases GenBank, Protein DataBase, pfamA, and phagesdb.org with BLAST (13) and HHpred (14). Genomes were assigned to clusters or subclusters as described previously (15).
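As an illustration only, the two computational steps mentioned above (assigning base one in a circularly permuted assembly, and estimating fold coverage) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure actually used for these genomes: the function names linearize and mean_coverage are hypothetical, the start coordinate is assumed to have been chosen beforehand by comparison with related phages, and coverage is taken as total sequenced bases divided by genome length.

```python
def linearize(sequence: str, new_start: int) -> str:
    """Rotate a circularly permuted assembly so that the 1-based
    position `new_start` becomes base one of the linear genome.

    Hypothetical helper: how `new_start` is chosen (e.g., by alignment
    to related phages) is outside the scope of this sketch.
    """
    if not 1 <= new_start <= len(sequence):
        raise ValueError("new_start must lie within the sequence")
    offset = new_start - 1
    # Bases from the chosen start to the end, then the wrapped-around prefix.
    return sequence[offset:] + sequence[:offset]


def mean_coverage(read_lengths: list[int], genome_length: int) -> float:
    """Average fold coverage, assumed here to be the total number of
    sequenced bases divided by the genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return sum(read_lengths) / genome_length
```

For example, linearize("ACGTTGCA", 4) rotates the toy sequence so that its fourth base becomes the first, and mean_coverage([100] * 50, 1000) evaluates the coverage of fifty 100-base reads over a 1,000-base genome.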
Notwithstanding the large extant collection of sequenced mycobacteriophage genomes, these newly sequenced phages considerably expand our understanding of mycobacteriophage diversity. Twenty-two are members of the largest cluster, cluster A, but span 7 of the 15 subclusters. The others are broadly distributed across other clusters, including B, C, E, F, G, I, K, L, M, N, O, and P. Cosmo has substantial nucleotide sequence similarity to the singleton phage Wildcat, forming the new cluster V. The eight cluster N phages, Cedasite (G1), and Brusacoram (P) are notable in that they contain integration-dependent immunity systems in which the phage attachment site (attP) is located within the repressor gene (16).
As is typical of other sequenced phage genomes, functions can be assigned to only ~25% of the predicted genes, primarily those involved in virion structure and assembly and well-conserved genes associated with DNA metabolism. Two of the cluster A genomes (Eidsmoe, ArcherNM) contain partitioning systems in place of integration cassettes; several genomes (e.g., Phrann, Xeno) encode toxin-antitoxin systems; and three encode Lsr2 homologs (Lolly9, Lumos, and Snenia).
Nucleotide sequence accession numbers. Nucleotide sequence accession numbers for all phages are shown in Table 1.