Genome Sequencing of 18 Francisella Strains To Aid in Assay Development and Testing

Francisella tularensis is a highly infectious bacterium with the potential to cause high fatality rates if infections are untreated. To aid in the development of rapid and accurate detection assays, we have sequenced and annotated the genomes of 18 F. tularensis and Francisella philomiragia strains.

Francisella tularensis causes tularemia, a zoonotic infection that is mainly transmitted to humans through arthropod bites, direct contact with infected animals, or the inhalation/ingestion of contaminated materials (dust, food, or water). These Gram-negative, nonmotile, facultatively intracellular bacteria are highly infectious (1). The low infectious dose (10 to 50 organisms) and previous weaponization attempts have led to the placement of F. tularensis on the U.S. Centers for Disease Control and Prevention category A biothreat agent list (2-4). In 1970, a World Health Organization (WHO) expert committee reported that if sufficient quantities of F. tularensis were dispersed over a metropolitan area with 5 million people, it could result in approximately 250,000 acute febrile nonspecific illnesses 3 to 5 days postrelease and 19,000 deaths (5).
Francisella is endemic to the United States and resides within small mammals (mice, voles, rats, squirrels, rabbits, and hares) that serve as natural reservoirs. There are three subspecies of F. tularensis: F. tularensis subsp. tularensis, which can be highly virulent, with a mortality rate of 30 to 60% (without treatment), and the occasional human pathogens F. tularensis subsp. holarctica and F. tularensis subsp. mediasiatica. The close phylogenetic relationship of highly virulent strains to opportunistic pathogens and near neighbors has caused taxonomic and strain differentiation difficulties.
Here, we present the annotated genome assemblies for all but one strain listed in the Standard Method Performance Requirements (SMPR) as derived by the Stakeholder Panel on Agent Detection Assays (SPADA) (FTTN10 is not currently maintained in a publicly available culture collection). The 18 genome sequences presented here were assembled to finished or improved high-quality draft (IHQD) status (6).
The draft genome assemblies included two or more data sets (data types and coverages are listed in the NCBI records): Illumina (short- and/or long-insert paired data), Roche 454 (long-insert paired data), and PacBio long reads. The short- and long-insert paired data were assembled in both Newbler and Velvet (7) and computationally shredded into 1.5-kbp overlapping shreds. If the PacBio coverage was ≥100×, the data were assembled using HGAP (8). All data were additionally assembled in AllPaths (9). The consensus sequences from both HGAP and AllPaths were computationally shredded into 10-kbp overlapping pieces. All shreds were integrated using Phrap. Possible misassemblies were corrected and repeat regions verified using in-house scripts and manual editing in Consed (10-12). When combined with the long-insert (~8- to 10-kb) Illumina data, the HGAP assemblies of recent PacBio data are generally capable of reconstructing the ~30-kb pathogenicity island repeat found in Francisella genomes. These methods resulted in a finished-quality genome for 17 of the 18 isolates (Table 1) (6). Each genome assembly was annotated using an Ergatis-based (13) workflow with minor manual curation.
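The shredding step and the PacBio coverage gate described above can be illustrated with a minimal sketch. This is not the in-house tooling used for these assemblies. The function names, the `overlap` parameter, and the overlap value of 500 bp are assumptions made for illustration, since the text gives only the shred lengths (1.5 kbp and 10 kbp) and the ≥100× coverage threshold.

```python
def should_run_hgap(pacbio_coverage, threshold=100.0):
    """Return True when PacBio coverage meets the >=100x rule from the text."""
    return pacbio_coverage >= threshold


def shred(sequence, shred_len=1500, overlap=500):
    """Split a consensus sequence into overlapping pieces ("shreds").

    shred_len follows the text (1,500 or 10,000 bp); overlap is a
    hypothetical value, because the source does not state it.
    Returns a list of (start_offset, piece) tuples.
    """
    if not 0 <= overlap < shred_len:
        raise ValueError("overlap must be >= 0 and smaller than shred_len")
    if not sequence:
        return []
    step = shred_len - overlap
    shreds = []
    start = 0
    while True:
        shreds.append((start, sequence[start:start + shred_len]))
        # Stop once the current shred reaches the end of the sequence.
        if start + shred_len >= len(sequence):
            break
        start += step
    return shreds


if __name__ == "__main__":
    contig = "ACGT" * 1000  # toy 4,000-bp consensus sequence
    for offset, piece in shred(contig):
        print(offset, len(piece))
    print(should_run_hgap(120.0))
```

Each shred shares its trailing `overlap` bases with the start of the next one, which is what allows an overlap-based assembler such as Phrap to merge pieces derived from different source assemblies.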
The genome assemblies range from 1.86 to 2.15 Mb (Table 1), with one chromosome and up to two plasmids. As expected for the genus, the G+C content was low, ranging from 32 to 33%, and the genomes contain between 1,769 and 2,107 coding sequences.
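For readers who want to check the G+C content of an assembly themselves, the following is a minimal sketch of the calculation. The function name `gc_percent` is hypothetical, and excluding ambiguous bases such as N from the denominator is an assumption, not a convention stated in the text.

```python
def gc_percent(sequence):
    """Return G+C content as a percentage of unambiguous bases (A, C, G, T).

    Ambiguous symbols (for example N) are ignored; this is an assumed
    convention chosen for the sketch.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / acgt


if __name__ == "__main__":
    print(gc_percent("ATGCATAT"))
```

For a multi-replicon genome (one chromosome plus plasmids), the sequences can be concatenated before the call so that the result is weighted by replicon length.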
Nucleotide sequence accession numbers. The accession numbers for all 18 genome sequences are listed in Table 1. In Table 1, the number of contigs for the IHQD assembly is listed in parentheses.