ABSTRACT
Spiroplasma turonicum Tab4cT was isolated from a horse fly (Haematopota sp.; probably Haematopota pluvialis) collected at Champchevrier, Indre-et-Loire, Touraine, France, in 1991. Here, we report the complete genome sequence of this bacterium to facilitate the investigation of its biology and the comparative genomics among Spiroplasma spp.
GENOME ANNOUNCEMENT
Spiroplasma turonicum is a bacterium associated with Haematopota sp. horse flies. The type strain Tab4cT was isolated from a single fly, probably Haematopota pluvialis, collected at Champchevrier (Indre-et-Loire, Touraine, France) in 1991 and was assigned to group XVII within the genus (1). As part of our ongoing effort to investigate Spiroplasma genome evolution (2), we determined the complete genome sequence of S. turonicum Tab4cT.
The DNA sample was prepared from the strain maintained in Gail Gasparich's laboratory at Towson University, which was acquired from the USDA/ARS Spiroplasma Culture Collection of Robert Whitcomb in 1996. This strain had been lyophilized after 17 passes from the original isolation. Before we completed this project, a complete genome sequence of this bacterium was published based on another subculture of the same strain using the Pacific Biosciences platform (3). We therefore used the published sequence (GenBank accession no. CP012328.1) as the reference for a resequencing analysis. We used the Illumina MiSeq platform to generate 301-bp reads from one paired-end library (~510-bp insert, 1,206,242 reads, ~288-fold coverage). The raw reads were mapped to the reference genome using BWA version 0.7.12 (4), and the resulting alignments were checked programmatically using SAMtools version 1.2 (5) and inspected visually using IGV version 2.3.57 (6).
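The reported coverage can be checked with simple arithmetic. The sketch below is illustrative only and is not part of the original analysis: it assumes that every read contributes its full 301 bp and that the chromosome size reported in this announcement (1,261,375 bp) is a fair stand-in for the reference length. The function name `fold_coverage` is hypothetical.

```python
def fold_coverage(read_count: int, read_length: int, genome_size: int) -> float:
    """Estimate average sequencing depth as total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return read_count * read_length / genome_size


# Values taken from the text; the genome size is the corrected chromosome
# size reported in this announcement (an assumption for this estimate).
READ_COUNT = 1_206_242
READ_LENGTH = 301
GENOME_SIZE = 1_261_375

coverage = fold_coverage(READ_COUNT, READ_LENGTH, GENOME_SIZE)
print(f"Approximate coverage: {coverage:.1f}-fold")
```

Under these assumptions the estimate rounds to 288-fold, in line with the figure given above. Reads that are trimmed or fail to map would lower the effective depth.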
The procedures for genome annotation were based on those described in our previous studies on Spiroplasma genomes (7–15). The programs RNAmmer (16), tRNAscan-SE (17), and Prodigal (18) were used for gene prediction. The gene names and product descriptions were first annotated based on the homologous genes in other Spiroplasma genomes, as identified by OrthoMCL (19). Subsequent manual curation was based on BLASTp (20) searches against the NCBI nonredundant database (21) and the KEGG database (22, 23). Putative clustered regularly interspaced short palindromic repeats (CRISPRs) were identified using CRISPRFinder (24).
Our resequencing analysis identified 13 polymorphic sites, including 12 single-nucleotide polymorphisms and one 1-bp indel in a homopolymeric region. It is unclear whether these polymorphisms reflect true genetic variation or are artifacts of the sequencing technologies used. After correcting for these polymorphisms, the S. turonicum Tab4cT chromosome described in this work is 1,261,375 bp in size and has a G+C content of 24.2%. Both S. turonicum genomes have one set of 16S-23S-5S rRNA genes, 29 tRNA genes (covering all 20 amino acids), and one 2,940-bp CRISPR locus (containing 44 spacers). However, the annotation of protein-coding genes differs between the two genomes. In CP012328.1, the annotation includes 1,085 protein-coding genes and no pseudogenes. Several of these predicted protein-coding genes appeared to be fragments of disrupted open reading frames and were merged into pseudogenes in our annotation. In the first version of our annotation, the S. turonicum Tab4cT genome contains 1,066 protein-coding genes and eight pseudogenes. Finally, the annotation of gene names and product descriptions in this newly reported S. turonicum genome is more consistent with that of the majority of published Spiroplasma genomes (7–15).
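For readers unfamiliar with the G+C statistic quoted above, it is the percentage of unambiguous bases in the sequence that are G or C. The sketch below shows one way to compute it; the function name `gc_content` and the toy sequence are hypothetical, and this is not the procedure used to obtain the 24.2% value.

```python
def gc_content(sequence: str) -> float:
    """Return the G+C percentage of a DNA sequence, ignoring non-ACGT symbols."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguous symbols such as N are excluded from the denominator
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / total


# Toy sequence for illustration only (not from the S. turonicum genome).
toy = "ATGCATAT"
print(f"G+C content: {gc_content(toy):.1f}%")
```

Applied to a full chromosome sequence read from a FASTA file, the same function would give a value comparable to the one reported here, provided ambiguous bases are handled the same way.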
Accession number(s). The complete genome sequence of S. turonicum Tab4cT has been deposited at DDBJ/EMBL/GenBank under the accession number CP013860.
ACKNOWLEDGMENTS
The funding for this project was provided by the Institute of Plant and Microbial Biology at Academia Sinica and the Ministry of Science and Technology of Taiwan (NSC 101-2621-B-001-004-MY3 and MOST 104-2311-B-001-019) to C.-H.K. W.-S.L. was supported by the TIGP-MBAS program (Academia Sinica and National Chung Hsing University). The sequencing library preparation service was provided by the DNA Microarray Core Laboratory (Institute of Plant and Microbial Biology, Academia Sinica). The Illumina MiSeq sequencing service was provided by the DNA Sequencing Core Facility (Institute of Molecular Biology, Academia Sinica).
FOOTNOTES
- Received 27 July 2016.
- Accepted 1 August 2016.
- Published 22 September 2016.
- Copyright © 2016 Lo et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.