ABSTRACT
Synechococcus spp. are unicellular cyanobacteria that are globally distributed and are important primary producers in marine coastal environments. Here, we report the complete genome sequence of Synechococcus sp. strain WH 8101 and identify genomic islands that may play a role in virus-host interactions.
ANNOUNCEMENT
Synechococcus spp. are responsible for up to 16% of net primary production in the oceans (1). Substantial fractions of marine Synechococcus communities can be lysed by viruses each day (2, 3); nevertheless, studies suggest that Synechococcus strains can rapidly become resistant to co-occurring viruses (4, 5). To help identify the genetic determinants of viral resistance, we sequenced the complete genome of Synechococcus sp. strain WH 8101.
Synechococcus sp. strain WH 8101 was obtained from F. W. Valois, who isolated it in 1981 from surface seawater collected at Woods Hole, Massachusetts (41°31ʹ34ʺN, 70°40ʹ13ʺW), as described previously (6). The strain has been maintained in SN medium since isolation (6). Based on multiple DNA markers and physiological characteristics, WH 8101 has been assigned to Synechococcus clade VIII (7, 8). Only one other member of this clade (Synechococcus sp. strain RS9917) has been sequenced.
A single colony of WH 8101 was isolated on an SN soft-agar plate and then regrown in SN medium prior to DNA isolation (6). Genomic DNA was sequenced on both the Illumina MiSeq and PacBio RS II platforms. For Illumina sequencing, DNA was isolated using the PowerWater DNA isolation kit (MoBio Laboratories), and a DNA library was prepared on the WaferGen Apollo 324 next-generation sequencing library preparation system with an IntegenX PrepX DNA library kit. The library was sequenced on the Illumina MiSeq system using the 500-cycle reagent kit v.2.
For PacBio sequencing, DNA was isolated using the Genomic-tip 100/G kit (Qiagen), and libraries were prepared using the standard PacBio 20-kb protocol. Fragments were size selected (>10 kb) with BluePippin (Sage Science) and sequenced on a PacBio RS II system in a single single-molecule real-time (SMRT) cell using P6-C4 chemistry (6-h movie). Reads (50,981 reads; N50, 20,257 bp) were filtered to retain those >750 bp and assembled using HGAP.3 (seed cutoff, 6 kb). The consensus sequence was polished by additional rounds of PacBio read mapping and circularized using information from the bridge mapper tool, all within the SMRT Analysis software (v.2.3.0.140936) with default settings. MiSeq reads were mapped to the initial PacBio assembly using Geneious v.10 with default settings and used for additional quality control and manual correction of indel errors. Coverage was 45× for the MiSeq reads and 175× for the PacBio reads.
A single circular 2,630,292-bp assembly with a G+C content of 63.3% was obtained. The genome was initially annotated using RASTtk (9) and subsequently updated with the NCBI Prokaryotic Genome Annotation Pipeline (NCBI RefSeq database). It includes 2,693 protein-coding genes, 41 pseudogenes, 6 rRNAs, and 43 tRNAs.
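The read-processing figures reported for this genome (a >750-bp length filter, a read N50, fold coverage, and G+C content) are standard summary statistics. The following is a minimal Python sketch of how such statistics are conventionally computed; it is illustrative only, is not the SMRT Analysis, HGAP.3, or Geneious implementation, and all function names are hypothetical. It assumes that "coverage" means mean depth (total aligned bases divided by genome length) and that ambiguous bases such as N are excluded from the G+C calculation.

```python
from typing import Iterable, List


def filter_reads(lengths: Iterable[int], min_len: int = 750) -> List[int]:
    """Keep reads strictly longer than min_len (mirrors a '>750 bp' filter)."""
    return [n for n in lengths if n > min_len]


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no bases to summarize")
    running = 0
    for n in ordered:
        running += n
        if 2 * running >= total:  # reached half of the total bases
            break
    return n


def mean_coverage(aligned_bases: int, genome_length: int) -> float:
    """Mean fold coverage, assuming coverage = aligned bases / genome size."""
    return aligned_bases / genome_length


def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols ignored."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    acgt = gc + s.count("A") + s.count("T")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * gc / acgt


if __name__ == "__main__":
    toy_reads = [600, 900, 12_000, 20_000, 25_000]  # hypothetical read lengths
    kept = filter_reads(toy_reads)
    print(len(kept), n50(kept))  # prints: 4 20000
```

Under the mean-depth assumption, the reported 175× PacBio coverage of the 2,630,292-bp assembly corresponds to roughly 460 Mb of aligned PacBio sequence (175 × 2,630,292 = 460,301,100 bp); this is a back-of-envelope figure, not a value reported in the announcement.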
Genes for viral resistance are often localized to genomic islands (hypervariable regions) in Synechococcus and Prochlorococcus spp. (4, 10). Using previously established criteria (10, 11), we identified 13 genomic islands in WH 8101 (Table 1). Each island was >8 kb long and/or contained at least 10 genes that were not syntenic with the genome of the other sequenced clade VIII strain, Synechococcus sp. strain RS9917. Genomic islands previously identified in RS9917 (11) that are also present in WH 8101 were included as well. This genome sequence will be used to identify genetic determinants of cyanophage resistance.
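The size and gene-count criteria for genomic islands can be expressed as a scan over an ordered gene list. The following Python sketch is a hypothetical illustration, not the authors' pipeline or the exact procedure of references 10 and 11. It assumes each gene has already been labeled as syntenic or not with RS9917 (the ortholog and synteny step is not shown), treats a candidate island as a maximal run of consecutive non-syntenic genes, and reads "and/or" as a logical OR of the two thresholds. For simplicity it ignores runs that wrap across the origin of the circular chromosome and does not add islands carried over from RS9917.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    start: int      # start coordinate on the chromosome (hypothetical input)
    end: int        # inclusive end coordinate
    syntenic: bool  # True if the gene is in synteny with the reference genome


def candidate_islands(
    genes: List[Gene], min_span_bp: int = 8000, min_genes: int = 10
) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_genes) for each maximal run of consecutive
    non-syntenic genes that spans > min_span_bp OR has >= min_genes genes."""
    ordered = sorted(genes, key=lambda g: g.start)
    islands: List[Tuple[int, int, int]] = []
    i = 0
    while i < len(ordered):
        if ordered[i].syntenic:
            i += 1
            continue
        # Extend the run while genes remain non-syntenic.
        j = i
        while j < len(ordered) and not ordered[j].syntenic:
            j += 1
        run = ordered[i:j]
        start = run[0].start
        end = max(g.end for g in run)
        span = end - start + 1
        if span > min_span_bp or len(run) >= min_genes:
            islands.append((start, end, len(run)))
        i = j
    return islands
```

Treating the two thresholds as alternatives means a short run of many small genes and a long run of a few large genes can both qualify, which matches the "and/or" wording; a stricter reading would replace `or` with `and`.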
TABLE 1 Genomic islands in Synechococcus sp. strain WH 8101
Data availability. The complete genome sequence of Synechococcus sp. strain WH 8101 has been deposited in GenBank (accession number NZ_CP035914), along with raw sequence and methylation data (accession number PRJNA518918).
ACKNOWLEDGMENTS
This work was supported by the National Science Foundation under grants OCE-1332782 (to M.F.M.) and OIA-1736030 (to M.F.M. and S.W.P.). Illumina sequencing was conducted at a Rhode Island NSF Established Program to Stimulate Competitive Research (EPSCoR) research facility, the Genomics and Sequencing Center, which is supported in part by National Science Foundation EPSCoR cooperative agreement OIA-1655221. PacBio data generation and analysis at the University of Delaware Sequencing and Genotyping Center (Olga Shevchenko and Bruce Kingham) and Bioinformatics Core Facility were enabled by infrastructure supported in part by Delaware INBRE (NIH grant P20 GM103446).
FOOTNOTES
- Received 27 December 2019.
- Accepted 23 January 2020.
- Published 20 February 2020.
- Copyright © 2020 Marston and Polson.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.