ABSTRACT
Microbiology Resource Announcements (MRA) provides peer-reviewed announcements of scientific resources for the microbial research community. We describe the best practices for writing an announcement that ensures that these publications are truly useful resources. Adhering to these best practices can lead to successful publication without the need for extensive revisions.
EDITORIAL
Microbiology Resource Announcements (MRA) provides peer-reviewed announcements of scientific resources for the microbial research community. Such resources include genomes, transcriptomes, amplicon sequence data sets, other sequence collections, culture collections, mutant libraries, and software (Table 1). The MRA editors have established minimum requirements needed to ensure that the laboratory and analytical methods can be replicated by any other research group, making these publications truly useful resources for the community. These minimum requirements are described in checklists for each resource type (https://mra.asm.org/sites/default/files/additional-assets/thumbs/MRA_Author_Checklist.pdf). Adhering to the relevant checklist streamlines the review process for both authors and reviewers. In this guide, we take the most common resource, the genome announcement, and provide detailed recommendations for the three main sections of a genome announcement. Use of these best practices can lead to successful publication without the need for extensive revisions.
TABLE 1 Example published resource announcements in MRA
INTRODUCTION AND RATIONALE FOR SEQUENCING
The first section of a genome announcement should provide a brief introduction that focuses on the rationale for or significance of sequencing. This introduction should reference appropriate literature but, due to space constraints, should not be an exhaustive review. A greater emphasis should be put on introducing the characteristics of the isolate(s) and providing a description of the provenance of the organism(s) in a manner that supports using the genome as a resource. The best practice is to comply with the Genomic Standards Consortium Minimum Information about any (x) Sequence (MIxS) checklist (1) (https://gensc.org/mixs), ensuring consistency with the same information available in the biosample accession record or equivalent. Type strains can be noted if the strain is listed as such in a type strain repository, such as the American Type Culture Collection (ATCC), or on a reference website, such as https://bacterio.net. We do not allow claims of priority (e.g., first or novel); all genomes are new in their own unique ways without such claims.
Although an MRA genome announcement requires a taxonomic designation for the organism at the genus level, journal policy does not allow formal descriptions of sequenced organisms as new species, proposals for new taxonomy, or proposals for taxonomic reorganization. Such designations typically require much greater evidentiary support than can be reasonably included in a genome announcement. A formal taxonomic description should be directed to the International Journal of Systematic and Evolutionary Microbiology or Archives of Virology. Therefore, for genome announcements, the taxonomic nomenclature should have been described previously and should be consistent with established nomenclature rules (e.g., listed by https://bacterio.net). If authors would like to submit a resource describing the genome of an organism that does not have a species designation, the authors can designate the organism as a member of an existing genus (e.g., Wolbachia sp. strain wAna [2]) and note the organism’s similarity to, or difference from, its closest relative(s) using appropriate techniques and analyses that are fully described.
The genome announcement should contain a description of how the isolate was acquired. This may include an accession number for a public culture collection, like the ATCC, BEI Resources, the ARS Culture Collection (NRRL), the German Collection of Microorganisms and Cell Cultures (DSMZ), or the Japanese Collection of Microorganisms. Adding information about how the strain was acquired and maintained can be helpful to others trying to interpret the sequencing results. It can be important to understand whether the specimen was acquired directly from a culture collection or from another scientist, along with a relative time frame and method for storing and/or passaging. If a new isolate is described, the genome announcement should include a description of when, where, and how the organism was isolated. A brief description can be followed by a citation to a peer-reviewed manuscript for the full isolation procedures. For new environmental isolates, the latitude and longitude or GPS coordinates of the sample site should be included. For clinical isolates, the best practice is to adhere to the metadata standards for human pathogen/vector genomic sequences (3). Authors of manuscripts describing research involving human or animal subjects must include a statement documenting the approval number and name of their institutional review board (IRB) or institutional animal care and use committee (IACUC). In all cases, the authors should make the sequenced isolate available to the community upon request; the best practice is to deposit the isolate in a culture repository (4).
While MRA does not allow for the inclusion of extensive experimental results, the rationale for sequencing may include a figure or table illustrating a specific trait, such as a strain’s ability to produce fungicides or an antibiotic resistance profile. Those methods must be fully described in the main text or the figure legend with sufficient detail and/or references to be reproduced. If new antibiotic resistance profiles are included, the Clinical and Laboratory Standards Institute (CLSI) standards and methods used to determine antibiotic resistance should be fully described. Elaborate phenotypic results should not be included, as these often require extensive experimental validation. MRA does not allow references to unpublished results or personal communications.
SEQUENCING, ASSEMBLY, AND OTHER BIOINFORMATICS METHODS
Typically, the second section of a genome announcement includes a description of the methods and related outcomes. The paragraph must describe the methods for organism cultivation/acquisition, any taxonomic identification, DNA/RNA isolation, sequencing library preparation, and sequence generation, including platform specifications. The goal of this paragraph is to ensure that the sequencing procedure can be fully replicated and to enable full data reuse; therefore, details such as manufacturer, kit identifiers, and/or modifications to published protocols are essential. For Illumina sequencing, the platform should be described, along with read pairing status, the length of the reads, the number of raw reads in total and/or the sequencing depth, and the methods for quality control and trimming, if applicable. For Pacific Biosciences (PacBio) sequencing, the platform should be described, preferably with the chemistry, along with the library construction method, whether and how DNA was sheared, whether and how DNA was size selected, the read N50, the number of raw reads, and, if applicable, a description of read quality control, error correction, and adapter trimming. For Oxford Nanopore Technologies sequencing, the device and flow cell should be described, along with the library construction methods, the read N50 and number of raw reads, the base caller, read quality control, error correction, and, if applicable, adapter trimming. For libraries constructed with the ligation method (i.e., not RAD/RAPID libraries), whether and how DNA was sheared and size selected should be described. For capillary sequencing, which is still frequently used to sequence some viral genomes, the primers used to amplify and sequence the genome should be provided, which is often best accomplished in a table. The sequencing instrument should be specified, along with the length distribution of the reads, how the reads overlapped, and the Phred quality score threshold used for read trimming.
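Two of the quantities named above, sequencing depth and a Phred quality threshold for read trimming, can be illustrated with a minimal sketch. It assumes Phred+33 quality encoding and a simple trim from the 3' end; the function names and the default threshold of 20 are hypothetical choices for illustration, not MRA requirements, and dedicated trimming tools typically use more elaborate rules such as sliding windows.

```python
def estimated_depth(read_lengths, genome_size):
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(sequence, quality_string, threshold=20, offset=33):
    """Drop bases from the 3' end until one meets the Phred threshold.

    The threshold default is illustrative only; report the value actually used.
    """
    scores = phred_scores(quality_string, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return sequence[:end], quality_string[:end]


if __name__ == "__main__":
    # 1,000 reads of 150 bases against a hypothetical 5,000-bp genome
    print(estimated_depth([150] * 1000, 5000))
    print(trim_3prime("ACGTAC", "IIII5#", threshold=30))
```

Whatever tool performs these steps in practice, the announcement should state the threshold and the software version so that the trimming can be repeated.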
After describing the sequencing and preassembly quality control methods, the genome assembly and annotation methods should be fully described. Annotation methods vary widely between taxa, such that it is important to follow the best practices in the field, but a few principles are common across all taxa. All software should be cited and a version number included, even for common software like PGAP (5). Settings or options used to run the software should be provided, and we encourage including a statement such as “default parameters were used, except where otherwise noted,” if appropriate. Custom scripts must be made publicly available, and a link with a permanent DOI should be provided for the scripts (e.g., a GitHub repository with an assigned DOI using the data-archiving tool Zenodo). The annotation described in the genome announcement should be consistent with the data that are publicly available with the genome accession number(s) listed in the data availability section.
DESCRIBING THE RESULTS OF THE GENOME SEQUENCING
The final section describes the results, including the complete size, GC content, and final sequencing coverage of the genome. For genome announcements that include the genomes of more than one strain or organism, it is helpful to include a table with this information. That table should also include hyperlinked accession numbers for the genome and the raw data. For draft genomes, the announcement should include the relevant statistics for the assembly, including the number of contigs and the contig N50 value, as well as any method for ordering and orienting contigs. It should be clear what criteria, if any, were used for removal of contigs due to size or contamination screening. For complete linear genomes, this section describes how the ends of the chromosomes/genome were determined to be complete. For circular genomes, the method for identifying the overlap on the contig ends, trimming, and rotating, if applicable, should be specified.
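The assembly statistics listed above (number of contigs, total size, GC content, and contig N50) can be computed as in the following minimal sketch. It assumes the contigs are already loaded as plain strings; the function names are hypothetical, and excluding ambiguous bases such as N from the GC calculation is one convention among several, so the approach used should be stated.

```python
def gc_content(sequences):
    """Percent G+C across all sequences, ignoring ambiguous bases such as N."""
    gc = at = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def n50(lengths):
    """Length of the contig at which the running sum of lengths, taken from
    longest to shortest, first reaches half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def assembly_summary(contigs):
    """Collect the statistics typically reported for a draft genome."""
    lengths = [len(c) for c in contigs]
    return {
        "num_contigs": len(contigs),
        "total_length": sum(lengths),
        "gc_percent": round(gc_content(contigs), 2),
        "contig_n50": n50(lengths),
    }


if __name__ == "__main__":
    # Toy contigs for illustration only
    toy = ["GGCC" * 25, "ATAT" * 10, "ACGT" * 5]
    print(assembly_summary(toy))
```

The same N50 function applies to read lengths, so it can also produce the read N50 requested for long-read platforms.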
Genome quality assessment using a conserved set of markers can be a useful and important metric for evaluating whether the genome is complete, particularly for large genomes. Assessment with tools like BUSCO (6), CEGMA (7), or CheckM (8) and/or whole-genome reference alignments with tools like MUMmer (9) or PacBio Quiver can be helpful for reporting genome completeness and duplication metrics but may not be universally appropriate. Visualization of assembly graphs with tools like Bandage (10) can also be helpful for assessing completeness.
Any remaining space is typically dedicated to describing the results in the context of the rationale for sequencing described in the introduction. Authors should avoid claims that imply that a particular gene or operon is functional in a sequenced organism. Functions of genes should be presumed and noted as “putative” or “potential,” unless they have been functionally characterized previously and a reference is provided.
A main figure can include a phylogeny to present the relationship of the new genome resource(s) to other isolates or species. When a phylogeny is included, the methods should be rigorous and fully described in the main text or the figure legend, including the procedures used to select the included sequences, accession numbers for the included sequences, multiple sequence alignment, alignment curation, model selection, tree building with statistical support, and tree visualization.
DATA AVAILABILITY STATEMENT
All American Society for Microbiology (ASM) journals require the inclusion of a sufficient amount of publicly available data so that others can reproduce analyses and results from the published manuscripts. A list of acceptable databases is available (https://journals.asm.org/content/list-data-repositories). For MRA, the underlying data largely consist of sequencing reads and genome assemblies. Accession numbers hyperlinked directly to raw sequencing reads (e.g., SRX, SRR, or SRP accession numbers), from one of the acceptable repositories (International Nucleotide Sequence Database Collaboration [INSDC] [http://www.insdc.org], e.g., SRA or ENA), should be listed in the data availability section. Although it is acceptable to deposit reads that have been modified (for instance, through quality trimming), we ask that authors consider depositing the least modified reads and specify any types of modifications in the data availability statement. Data sets that include human reads should be placed in an appropriate repository, like dbGaP. For large collections of genomes, the data availability section can refer to a table in which the read accession numbers are provided for each isolate.
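Before submission, read accession numbers can be checked for a plausible format and turned into hyperlinks, as in the sketch below. The regular expression is an assumption based on the common INSDC read-archive pattern (a prefix of S, E, or D, then R, then P, X, or R for study, experiment, or run, then at least six digits). The URL template is likewise an assumption about how NCBI SRA records are addressed, and the accessions shown are placeholders. A format check does not confirm that a record exists or is public.

```python
import re

# Assumed pattern: SRA/ENA/DDBJ prefix + R + type letter (P, X, or R) + digits
READ_ACCESSION = re.compile(r"[SED]R[PXR]\d{6,}")

# Assumed URL template for NCBI SRA records
SRA_URL = "https://www.ncbi.nlm.nih.gov/sra/{accession}"


def is_read_accession(accession):
    """Return True if the string looks like a study, experiment, or run accession."""
    return READ_ACCESSION.fullmatch(accession.strip()) is not None


def sra_link(accession):
    """Build a hyperlink for an accession, rejecting malformed input."""
    cleaned = accession.strip()
    if not is_read_accession(cleaned):
        raise ValueError(f"not a recognized read accession: {accession!r}")
    return SRA_URL.format(accession=cleaned)


if __name__ == "__main__":
    # Placeholder accessions for illustration only
    for acc in ["SRR000001", "SRX1234567", "GCA_000001"]:
        print(acc, is_read_accession(acc))
```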
CONCLUSIONS
If these elements are all included and correct and no problems are identified with the rationale or conclusions, the editors could designate a manuscript a unicorn: a paper that is accepted on first submission without revisions. Interested in making your submission the next unicorn? Use the checklist and the examples (Table 1) to guide you in constructing a solid genome announcement. And remember, you can have one figure and one table to support your genome announcement.
ACKNOWLEDGMENTS
J.C.D.H., D.A.R., and V.M.B. were supported by federal funds from the National Institutes of Health, NIAID, under grant U19AI110820. J.C.D.H. was also supported by a National Institutes of Health Director’s Transformative Research Award (grant 1-R01-CA206188). J.J.D. was supported by funding from the National Institutes of Health, NIGMS (grant R01-GM124446-01). I.L.G.N. was supported by funding from the National Institutes of Health, NIAID (grant R01-AI144430-02). K.M.S. was supported by funding from NASA (grant 80NSSC17K0301) and the National Science Foundation (grant MCB1929273). The work conducted by the U.S. Department of Energy Joint Genome Institute was supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-05CH11231 (S.R.). C.P. was supported by funding from the National Science Foundation (grant 1661357). S.R.G. was supported by federal funds from the Department of Defense (grant PR181406P1) and the NIAID (grant U19-AI117673). J.E.S. is a CIFAR Fellow in the program Fungal Kingdom: Threats and Opportunities and was supported by NSF grants DEB 1441715 and 1557110.
Copyright © 2020 Dunning Hotopp et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.