Completed Genome Sequences of Strains from 36 Serotypes of Salmonella

ABSTRACT We report here the completed closed genome sequences of strains representing 36 serotypes of Salmonella. These genome sequences will provide useful references for understanding the genetic variation between serotypes, particularly as references for mapping of raw reads or to create assemblies of higher quality, as well as to aid in studies of comparative genomics of Salmonella.

Salmonella spp. are the leading cause of bacterial gastroenteritis in North America, with over 1.7 million cases per annum (1). Public health jurisdictions are replacing traditional serotyping with whole-genome sequencing (WGS) methodologies for quicker and more accurate outbreak detection and surveillance activities (2). To this end, we previously developed an in silico serotyping platform for Salmonella (3,4).
Unfortunately, the large amount of raw data available in the Sequence Read Archive (SRA) consists primarily of Illumina short reads, which cannot be assembled into a closed, circular Salmonella genome as one contiguous sequence. As of November 2017, there were 501 fully closed genomes for Salmonella enterica and 4 for Salmonella bongori. Therefore, we sequenced strains of 36 diverse serotypes of Salmonella using a combination of Illumina and PacBio technologies to produce high-quality genomes for public health and comparative genomics applications. This data set provides the first closed reference genomes for 25 of these serotypes.
Genomic DNA was isolated using the automated Qiagen EZ1 DNA tissue kit according to the manufacturer's protocol, except that 180 µl of G2 buffer was used with 10 µl of proteinase K and 10 µl of lysozyme (10 mg/ml; Sigma-Aldrich, Gillingham, UK). PacBio sequencing was performed at the Génome Québec Innovation Centre (McGill University, Quebec, Canada) using single-molecule real-time (SMRT) cells in an RSII sequencer, which produced 100,000 to 150,000 reads per sample, with an average read length of 6,000 bp. The PacBio read sets were assembled into circular consensus sequences using the HGAP workflow 1.1.13. Illumina sequencing on MiSeq version 3 (600-cycle kit) using Nextera XT libraries was performed at the National Microbiology Laboratory at Winnipeg (Winnipeg, Manitoba, Canada) to a target of 60-fold coverage. The quality of the Illumina read sets was examined using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Illumina read correction was performed using Lighter version 1.1.1 (https://github.com/mourisl/Lighter). Corrected Illumina reads were then mapped to the PacBio assembly using Bowtie2 version 2.1.0 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) with the very-sensitive-local option. The output was sorted and converted into a bam file using SAMtools version 1.3 (http://samtools.sourceforge.net/) and used as input to Pilon version 1.2.2 (https://github.com/broadinstitute/pilon). Mapping and correction were repeated on the corrected assemblies until no further changes were made to the output. Final assemblies were examined using Gap5 software version 1.2.14 (http://www.sanger.ac.uk/science/tools/gap5). Completed assemblies were processed through the Salmonella In Silico Typing Resource (SISTR) (3,4) to confirm that the in silico predictions matched the serotyping previously performed by our OIE Reference Laboratory for Salmonellosis in Guelph, Ontario, Canada.
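The iterative mapping and correction step described above can be sketched as a short driver script. This is a minimal illustration under stated assumptions, not the commands actually used for these genomes: it assumes paired-end reads, a `pilon` wrapper script on the PATH (as some package managers install; otherwise Pilon is run with `java -jar`), and it treats an empty Pilon `.changes` file as the sign that no further corrections were made. The function names, file names, and the `max_rounds` cap are hypothetical.

```python
import subprocess
from pathlib import Path


def polishing_commands(assembly, reads_1, reads_2, prefix):
    """Build the command lines for one round of mapping and correction."""
    index = f"{prefix}.idx"
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        # Index the current assembly for Bowtie2.
        ["bowtie2-build", assembly, index],
        # Map the corrected Illumina reads with the very-sensitive-local preset.
        ["bowtie2", "--very-sensitive-local", "-x", index,
         "-1", reads_1, "-2", reads_2, "-S", sam],
        # Sort the alignments into a bam file and index it.
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # Correct the assembly; --changes writes <prefix>.changes.
        # Assumption: a "pilon" wrapper is available on the PATH.
        ["pilon", "--genome", assembly, "--frags", bam,
         "--output", prefix, "--changes"],
    ]


def polish_until_stable(assembly, reads_1, reads_2, max_rounds=10,
                        run=subprocess.run):
    """Repeat mapping and correction until Pilon reports no changes.

    max_rounds is a hypothetical safety cap, not a value from the text.
    """
    current = assembly
    for round_no in range(1, max_rounds + 1):
        prefix = f"polish_round{round_no}"
        for cmd in polishing_commands(current, reads_1, reads_2, prefix):
            run(cmd, check=True)
        current = f"{prefix}.fasta"
        changes = Path(f"{prefix}.changes")
        # Stop when the changes file is missing or empty.
        if not changes.exists() or changes.stat().st_size == 0:
            break
    return current
```

A call such as `polish_until_stable("hgap_assembly.fasta", "reads_R1.fastq", "reads_R2.fastq")` would return the path of the last corrected assembly. The `run` parameter exists only so the loop can be exercised without the external tools installed.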
Closed reference genomes are of great value for understanding the biology of pathogens, and as such, it is important that genome repositories contain as many of them as possible. The genomes reported here can serve as reference sequences for the WGS assembly of isolates of the same or highly similar serotypes and provide more accurate genomes for comparative and epidemiological studies on outbreak detection and surveillance of Salmonella.
Accession number(s). The genome sequences for these 36 Salmonella isolates have been deposited in DDBJ/ENA/NCBI under BioProject no. PRJNA294295. The GenBank accession numbers are listed in Table 1. The raw sequence data are available in the Sequence Read Archive.

ACKNOWLEDGMENTS
We thank Stephanie Brumwell, Madison McGrogan, and Travis Blimkie for technical support and Marisa Rankin for her help with proofreading the assemblies. We also thank the NCBI PGAP team for annotation services; McGill University, Genome Québec Innovation Centre, Montréal, Québec, for PacBio sequencing; and our colleagues Morag Graham and Matthew Walker at the PHAC National Microbiology Laboratory at Winnipeg, Manitoba, Canada, for the Illumina MiSeq sequencing. We sincerely thank the following for providing isolates: Roger Johnson, Gitanjali Arya, Linda Cole, Ketna Mistry, Ann Perets, and Betty Wilkie at OIE Reference Laboratory for Salmonellosis, National Microbiology Laboratory, Public Health Agency of Canada, Guelph, Ontario, Canada;