ABSTRACT
The genus Yersinia includes three human pathogens, of which Yersinia pestis is responsible for >2,000 illnesses each year. To support the development of detection assays and further phylogenetic analysis, we sequenced and assembled the complete genomes of 32 strains across 9 Yersinia species.
GENOME ANNOUNCEMENT
The genus Yersinia contains 11 species, including three human pathogens: Y. pestis, Y. pseudotuberculosis, and Y. enterocolitica. Of these, Y. pestis is the most virulent; it causes >2,000 cases of plague worldwide each year and was responsible for three global pandemics (1, 2). Y. pestis is a category A pathogen and potential biowarfare agent (3, 4), whereas Y. pseudotuberculosis and Y. enterocolitica cause self-limiting food-borne enteric diseases with low mortality rates (5). Recently, the Association of Analytical Communities (AOAC) International released a list of strains to be considered in diagnostic assay development, specifying the strains that assays should detect (inclusivity) and those they should not (exclusivity) (6). Here, we present the complete genome assemblies for 32 of the 33 listed Yersinia strains (Table 1); Y. pseudotuberculosis IB strain YPNN7 was not included because of technical issues.
TABLE 1 List of strains included in the data set, their accession numbers, and plasmids
Each genome was assembled using at least two data sets (specific data types and coverages are listed in the NCBI records) from Illumina (short- and/or long-insert paired data), Roche 454 (long-insert paired data), and/or PacBio long reads. The short- and long-insert paired data were assembled together in both Newbler and Velvet, and the resulting consensus sequences were computationally shredded into overlapping 1.5-kbp shreds. If the PacBio coverage was ≥100×, the PacBio data were assembled using the PacBio Hierarchical Genome Assembly Process (HGAP) (7). All data were additionally assembled in AllPaths (8). The consensus sequences from both HGAP and AllPaths were computationally shredded into overlapping 10-kbp pieces. All shreds were then integrated using Phrap. Possible misassemblies were corrected and repeat regions were verified using in-house scripts and manual editing in Consed (9–11). All genomes were brought to finished quality (12), and each assembly was annotated using an Ergatis-based (13) workflow with minor manual curation.
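The routing and overlapping-shred steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' in-house scripts: the shred lengths (1.5 kbp and 10 kbp) and the ≥100× PacBio threshold come from the text, but the overlap between consecutive shreds is not given, so `overlap` is a free parameter here, and the helper names `assemblers_for` and `shred` are hypothetical.

```python
# Hypothetical sketch of the assembler routing and shredding steps; not the
# authors' in-house scripts. Shred lengths and the HGAP threshold come from
# the text; the overlap between consecutive shreds is an assumed parameter.

SHORT_READ_SHRED_LEN = 1_500       # Newbler/Velvet consensus -> 1.5-kbp shreds
LONG_CONSENSUS_SHRED_LEN = 10_000  # HGAP/AllPaths consensus -> 10-kbp shreds
HGAP_MIN_COVERAGE = 100.0          # PacBio coverage needed to run HGAP


def assemblers_for(pacbio_coverage: float, has_paired_data: bool) -> list[str]:
    """List the assemblers the text describes for one genome's data."""
    tools: list[str] = []
    if has_paired_data:
        tools += ["Newbler", "Velvet"]  # short- and long-insert pairs together
    if pacbio_coverage >= HGAP_MIN_COVERAGE:
        tools.append("HGAP")
    tools.append("AllPaths")  # all data are also assembled in AllPaths
    return tools


def shred(seq: str, shred_len: int, overlap: int) -> list[str]:
    """Cut seq into pieces of shred_len that overlap by `overlap` bases.

    The last piece is anchored to the end of seq so the tail is covered;
    it may overlap its neighbour by more than `overlap`.
    """
    if shred_len <= 0 or not 0 <= overlap < shred_len:
        raise ValueError("need shred_len > 0 and 0 <= overlap < shred_len")
    if len(seq) <= shred_len:
        return [seq]
    step = shred_len - overlap
    starts = list(range(0, len(seq) - shred_len + 1, step))
    if starts[-1] + shred_len < len(seq):
        starts.append(len(seq) - shred_len)  # cover the remaining tail
    return [seq[s:s + shred_len] for s in starts]
```

For example, `shred(consensus, LONG_CONSENSUS_SHRED_LEN, overlap=...)` would cut an HGAP or AllPaths consensus into 10-kbp pieces. The shreds from every assembly would then be pooled for Phrap; that integration step and the manual finishing in Consed are not modeled here.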
The genome sizes averaged 4.68 ± 0.04 Mb (Table 1), ranging from 3.6 Mb (Yersinia ruckeri YRB) to 4.9 Mb (Y. pestis Antiqua), and each genome carried up to 4 plasmids (average, 1.6 ± 0.2). Each genome contains 3,161 to 4,419 coding sequences (average, 4,155 ± 39.9) and has a G+C content of 47 to 48%. Because many of the virulence genes are plasmid-borne, it is notable that only 9 of the 16 Y. pestis strains carried all three “traditional” plasmids (pYV/pCD1 [virulence/calcium dependence], pPCP [plasminogen activator], and pMT [murine toxin]), and one strain (Y. pestis Nairobi) carried only pPCP.
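The plasmid tally above reduces to a set comparison per strain. The sketch below is hypothetical: the helper names and the toy strain entries are placeholders, not the Table 1 data. It treats pYV and pCD1 as two names for the same plasmid, as the text does.

```python
# Hypothetical sketch: classify Y. pestis strains by which "traditional"
# plasmids they carry. The toy data below are placeholders, not Table 1.

TRADITIONAL = frozenset({"pCD1", "pPCP", "pMT"})
ALIASES = {"pYV": "pCD1"}  # pYV and pCD1 name the same virulence plasmid


def traditional_set(plasmids: set[str]) -> frozenset[str]:
    """Return the traditional plasmids present, with aliases normalized."""
    return frozenset(ALIASES.get(p, p) for p in plasmids) & TRADITIONAL


def summarize(strains: dict[str, set[str]]) -> dict[str, list[str]]:
    """Group strain names by the plasmid complements discussed in the text."""
    all_three = sorted(s for s, ps in strains.items()
                       if traditional_set(ps) == TRADITIONAL)
    ppcp_only = sorted(s for s, ps in strains.items()
                       if traditional_set(ps) == {"pPCP"})
    return {"all_three": all_three, "pPCP_only": ppcp_only}


if __name__ == "__main__":
    toy = {
        "strain_A": {"pCD1", "pPCP", "pMT"},
        "strain_B": {"pPCP"},
        "strain_C": {"pYV", "pMT"},
    }
    print(summarize(toy))  # {'all_three': ['strain_A'], 'pPCP_only': ['strain_B']}
```

Here "pPCP only" means pPCP is the only traditional plasmid present; other, non-traditional plasmids would not change the classification.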
Nucleotide sequence accession numbers. The GenBank accession numbers for all 32 genomes are listed in Table 1.
ACKNOWLEDGMENTS
Funding for this effort was provided by the Defense Threat Reduction Agency's Joint Science and Technology Office (DTRA J9-CB/JSTO) and the Department of Homeland Security Science and Technology Directorate award HSHQDC-08-X-00790.
This paper has been approved by LANL for unlimited release (LA-UR-14-29606).
The views expressed in this article are those of the authors and do not necessarily reflect the official policy or position of the Department of the Navy, Department of Defense, or the United States Government.
The bacterial strains were obtained from the Department of Defense's Unified Culture Collection (http://www.usamriid.army.mil/ucc/).
FOOTNOTES
- Received 5 February 2015.
- Accepted 6 March 2015.
- Published 30 April 2015.
- Copyright © 2015 Johnson et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported license.