ABSTRACT
Shiga toxin-producing Escherichia coli (STEC) is an enteric foodborne pathogen that can cause mild to severe illness. Here, we report the availability of high-quality whole-genome sequences for 77 STEC strains generated using the PacBio sequencing platform.
GENOME ANNOUNCEMENT
Shiga toxin-producing Escherichia coli (STEC) is a major foodborne pathogen responsible for outbreaks and sporadic cases of diarrheal illness (1). Although the majority of reported STEC infections in the United States are caused by E. coli O157:H7, non-O157 serotypes have grown to be a public health concern both in the United States and internationally, as they can cause severe illness comparable to that caused by STEC O157 (2, 3). Non-O157 STEC has been linked to a range of clinical illnesses, from asymptomatic shedding and mild diarrhea to hemorrhagic colitis and potentially fatal hemolytic-uremic syndrome (HUS); more than 100 STEC serotypes have been linked to such human disease (4). Many of these non-O157 serotypes do not have publicly available PacBio-sequenced genomes.
Here, we report whole-genome sequences for 77 STEC strains representing 43 serotypes. The STEC cultures were grown overnight on blood agar plates at 37°C, and genomic DNA was extracted according to the manufacturer’s protocol (ArchivePure; 5 Prime, Gaithersburg, MD). The DNA was sheared to 20 kb by needle shearing, and the prepared libraries were further size selected using BluePippin (Sage Science, Beverly, MA). The large SMRTbell libraries were generated using standard library protocols of the Pacific Biosciences DNA template preparation kit (Pacific Biosciences, Menlo Park, CA). Each strain was sequenced using one to three single-molecule real-time (SMRT) cells. The finished libraries were bound to proprietary P6 version 2 polymerase and sequenced on a PacBio RS II platform using C4 chemistry for 360-min movies. The sequence reads were then filtered and assembled de novo using Falcon, Canu, or the PacBio Hierarchical Genome Assembly Process version 3 (5–7). For 30 strains, whole-genome optical maps were generated using the Argus platform (OpGen, Gaithersburg, MD), and the sequence order was verified against the corresponding AflII and NcoI whole-genome maps.
The detected serotypes, accession numbers, and assembly metrics for each genome are listed in Table 1. The average G+C content of the 77 chromosomal sequences was 50.6%. Coverage ranged from 39.5× to 230.8×, with a mean of 109×. All but nine chromosomal sequences (68 of 77) were circularized and found to have overlapping ends. Of the nine genomes that could not be circularized because of collapsed or unresolved repeats, five (2014C-3741, 2014C-3716, 89-3506, 2013C-3996, and 2013C-3304) yielded a single chromosomal sequence, and the remaining four (2013C-3925, 03-3375, 2014C-4638, and 2012C-4196) had two or more chromosomal contigs. The average genome size of the 73 isolates with a single chromosomal sequence was 5,287,902 bp (range, 4,717,123 to 5,858,766 bp). Each genome contained between one and seven plasmids.
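Assembly metrics of this kind (sequence length, G+C content, and mean coverage) can be recomputed from a deposited FASTA file. The following is a minimal sketch and not the pipeline used in this work; the function names (`read_fasta`, `gc_percent`, `mean_coverage`) and the toy sequence are hypothetical, and mean coverage is assumed here to be the total number of sequenced read bases divided by the assembled length.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def gc_percent(seq):
    """Percent G+C, counting only unambiguous bases (A, C, G, T)."""
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at) if (gc + at) else 0.0


def mean_coverage(total_read_bases, assembled_length):
    """Assumed definition: total read bases divided by assembled length."""
    return total_read_bases / assembled_length


# Toy input standing in for a chromosomal FASTA record (hypothetical data).
toy = StringIO(">toy_chromosome\nATGCGC\nGGAT\n")
records = list(read_fasta(toy))
for name, seq in records:
    print(name, len(seq), round(gc_percent(seq), 1))
```

For a real genome, the `StringIO` object would be replaced by an open file handle on the downloaded FASTA record, and the per-sequence values could then be compared with those in Table 1.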
TABLE 1 Accession numbers and assembly metrics of 77 STEC whole-genome sequences
Accession number(s). The whole-genome sequences have been deposited in DDBJ/ENA/GenBank under the accession numbers listed in Table 1. The versions described in this paper are the first versions.
ACKNOWLEDGMENTS
This work was funded by federal appropriations to the Centers for Disease Control and Prevention, through the Advanced Molecular Detection Initiative line item.
The findings and conclusions of this article are those of the authors and do not necessarily represent the views of the Centers for Disease Control and Prevention. The use of trade names is for identification only and does not imply endorsement by the Centers for Disease Control and Prevention or by the U.S. Department of Health and Human Services.
FOOTNOTES
- Received 30 March 2018.
- Accepted 30 March 2018.
- Published 10 May 2018.
This is a work of the U.S. Government and is not subject to copyright protection in the United States. Foreign copyrights may apply.