To the Editor:
Few branches of the biological sciences have remained untouched by the rapid adoption of next-generation sequencing (NGS). Sequencing can be used to study single cells, identify archeological remains, or monitor tumor heterogeneity in the clinic. Almost anything seems possible, if only we can find the right method.
The number of NGS methods has grown to around 400 over the past decade [1]; they all share a common purpose: to produce a library of sequenceable DNA. Modifications of library preparation adapt NGS to address an expanding range of genomic applications, and acronyms are often used to make these memorable, as with ChIP-seq. However, even the name of the most commonly used method, RNA-seq, can refer to very different methodological approaches or biological applications. The naming of NGS methods has not been controlled, and in the absence of basic rules, the names in current use are frequently creative, sometimes entertaining, but not always accurate or informative; too often they are confusing. Acronyms should be used cautiously, and researchers who modify a published method should not usually create a new acronym [2].
Organizing methods and structuring naming to allow new users to navigate the NGS publication landscape is overdue. There are four main cases in which NGS method naming leads to confusion for users.
1. Methods that are essentially the same are published with different names, as with MNase-Seq, MAINE-Seq and Nuc-Seq; we would suggest using MNase-seq for all.
2. Methods with similar names are very distinct—for example, Nuc-Seq is a nucleosome positioning method, whereas Nuc-seq and sNuc-Seq are RNA and DNA nucleus sequencing methods; we would suggest using MNase-seq, snRNA-seq and snS1Nuc-seq, respectively. Similarly, scM&T-seq and scMT-Seq are related but different methods: the first uses whole-genome bisulfite sequencing for methylation analysis, whereas the second uses reduced-representation bisulfite sequencing, a critical distinction.
3. Methods can have homophonic names that sound like they refer to the same method, as with CAP-seq, CapSeq, CAPP-seq or CaptureSeq; we would suggest renaming these to better convey the methods they refer to: for example, rename CAP-seq to CXXCap-seq for the CXXC affinity purification method.
4. Methods are minor modifications of others, so that a new name is unnecessary. We would suggest appending an additional element to the acronym in these cases: for example, ChIPmentation (a ChIP-Seq method that substitutes a transposase library preparation for the standard method) could become tnChIP-seq. The use of “sc” and “sn” by the single-cell sequencing community to denote application of earlier methods to cells or nuclei, for instance with scRNA-Seq, is a good example of this.
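The renamings suggested in cases 1 and 3 can be kept as a simple lookup table. The Python sketch below is illustrative only: the table holds just the mappings stated above, and the names SUGGESTED_NAMES, suggested_name and group_synonyms are hypothetical helpers, not part of any existing tool. Keys are matched case-sensitively, because names such as Nuc-Seq and Nuc-seq refer to different methods.

```python
# Published name -> name suggested in the text above.
# Keys are case-sensitive on purpose: "Nuc-Seq" and "Nuc-seq" are different methods.
SUGGESTED_NAMES = {
    "MNase-Seq": "MNase-seq",
    "MAINE-Seq": "MNase-seq",
    "Nuc-Seq": "MNase-seq",
    "CAP-seq": "CXXCap-seq",
}


def suggested_name(published: str) -> str:
    """Return the suggested name, or the published name unchanged if none is listed."""
    return SUGGESTED_NAMES.get(published, published)


def group_synonyms(names: list[str]) -> dict[str, list[str]]:
    """Group published names under the suggested name they map to."""
    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(suggested_name(name), []).append(name)
    return groups
```

Names that are absent from the table, including "Nuc-seq" with a lower-case "s", pass through unchanged rather than being guessed at.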
Confusion can also arise when searching the literature for specific methods, because different results are returned depending on the search term used. A search in PubMed for each of the terms RNA-Seq, RNAseq, or RNA Seq returns 11,540, 1,322, and 14,329 results, respectively, but searching for all three together returns 15,552 publications. Most RNA-seq publications report differential gene expression analysis of messenger RNAs, which we would suggest be termed mRNA-seq; however, a search for this term returns just 313 publications. Similarly, “scRNA-seq” and “single cell RNASeq” return 91 and 324 publications, respectively. The discrepancy may be due to scRNA-seq being a distinct method, usually attributed to Tang et al. [3], although they used the term mRNA-seq throughout their paper. Search engines are also case-insensitive, so that searching for “Nuc-seq” returns results for both micrococcal nuclease mapping of nucleosomes (Nuc-Seq) and single-nucleus sequencing of cells in G2/M phase (nuc-seq). (For consistency we use lower-case “seq” for proposed names throughout this text.) Some methods cannot be found unambiguously in a literature search: for example, a search for 3C (chromatin conformation capture) reveals mainly that a large number of papers contain a Figure 3c. Adding the “-seq” suffix would remove this ambiguity. Lastly, the use of nonstandard characters, as in Ψ-seq (pseudouridine site identification sequencing), is guaranteed to raise the ire of informaticians.
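To make the arithmetic behind the RNA-seq search counts explicit, the following Python sketch generates the three separator spellings of a method name, joins them into a single OR query, and compares the quoted counts. It is a rough illustration under stated assumptions: the helper names are hypothetical, the query string assumes a search engine that accepts quoted phrases joined by OR, and the counts are simply those given above rather than the result of a live search.

```python
def separator_variants(stem: str, suffix: str = "seq") -> list[str]:
    """Spell a method name with a hyphen, with no separator, and with a space."""
    return [f"{stem}{sep}{suffix}" for sep in ("-", "", " ")]


def or_query(terms: list[str]) -> str:
    """Join quoted terms with OR so that one search covers every spelling."""
    return " OR ".join(f'"{term}"' for term in terms)


# Counts quoted in the text for the three spellings and for the combined search.
counts = {"RNA-Seq": 11540, "RNAseq": 1322, "RNA Seq": 14329}
combined = 15552

# Publications that even the most productive single spelling fails to return.
missed_by_best_single_term = combined - max(counts.values())

# Hits that duplicate a publication already returned by another spelling.
redundant_hits = sum(counts.values()) - combined

print(or_query(separator_variants("RNA")))
print(missed_by_best_single_term, redundant_hits)
```

On these figures the best single spelling misses 1,223 of the 15,552 publications, and the three separate searches return 11,639 redundant hits between them, so neither one spelling nor the sum of all three gives the true total.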
We propose naming conventions that aim to reduce this confusion. First, previously published names, such as ChIP-Seq, should be used unless a modification is significant: e.g., HiChIP-Seq.
Second, all sequencing methods should be suffixed with “-seq”. Non-sequencing methods should follow a similar pattern: e.g., ChIP-MS for ChIP followed by mass spectrometry.
Third, case is important: most NGS methods are acronyms formed by capitalizing the phrase that describes the method: e.g., RRBS (reduced-representation bisulfite sequencing). The use of lower-case characters can also help clarify pronunciation, as for ChIP-seq.
Fourth, hyphens should separate methods from technologies: e.g., ChIP-seq, ChIP-MS. They may also be used to concatenate significant protocol modifications, as with PAR-CLIP-Seq, where these do not fundamentally change the aim of the method.
Finally, prefixes and suffixes should be used to add information where a common technique is applied in an uncommon, or novel, setting: e.g., “sc” for single-cell or “Bis” for bisulfite conversion. Care should be taken where a lower-case prefix may confuse the user: for example, scmRNA-Seq should probably be written as sc-mRNA-seq.
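Some of these conventions are mechanical enough to check automatically. The Python sketch below is a heuristic illustration, not a validated tool: check_name and KNOWN_PREFIXES are hypothetical names, the checks assume the input is a proposed name for a sequencing method (so a non-sequencing name such as ChIP-MS would be flagged), and only the “-seq” suffix, the plain-character requirement and the lower-case prefix rule are covered.

```python
import re

# Lower-case prefixes mentioned in the text: single-cell and single-nucleus.
KNOWN_PREFIXES = ("sc", "sn")


def check_name(name: str) -> list[str]:
    """Return a list of convention problems found in a proposed method name."""
    problems = []

    # Nonstandard characters (as in the pseudouridine example) hinder searching.
    if not name.isascii():
        problems.append("contains non-ASCII characters")

    # Sequencing methods should end with a hyphen followed by lower-case "seq".
    if not name.endswith("-seq"):
        if re.search(r"seq$", name, flags=re.IGNORECASE):
            problems.append('suffix should be written "-seq"')
        else:
            problems.append('sequencing methods should end in "-seq"')

    # A lower-case prefix fused to a lower-case letter is hard to read.
    for prefix in KNOWN_PREFIXES:
        rest = name[len(prefix):]
        if name.startswith(prefix) and rest[:1].islower():
            problems.append(f'consider a hyphen after the "{prefix}" prefix')

    return problems
```

For example, check_name("scmRNA-Seq") reports both the suffix and the prefix problem, whereas check_name("sc-mRNA-seq") and check_name("ChIP-seq") return an empty list. The prefix test is deliberately crude and would also flag any unrelated name that merely begins with “sc” or “sn” followed by a lower-case letter.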
To date there has been no curated list of NGS methods except Illumina's “For all you seq” posters [1]. We have created an open-access wiki (http://www.enseqlopedia.com/enseqlopedia) that describes over 300 methods and lists relevant publications, including the primary reference for each method. We hope this will be useful to, and developed by, the NGS community. The ability to add hints and tips for the wet lab, recommendations for sequencing depth and read length, and discussion of analytical tools and experimental design will keep methods up to date, unlike in a static publication. All methods have been submitted to the European Bioinformatics Institute's Experimental Factor Ontology.
Any human naming convention will be limited. While naming conventions are useful, we do not want to lose the quirkiness and joy of method names such as BLESS [4] and Rapture [5]! Developers of new or modified methods could include a statement such as “This method is similar to method X,” making the new method easier to find and group with related methods. Ultimately the community is responsible for the systematic organization of NGS methods to ensure the continued health and growth of genomics. We hope that our proposals here and the suggested naming conventions help in these efforts.
J.H. and J.R. contributed equally to this work. J.H. wrote the manuscript.
1. Retief, J.D. & Maxkwee, K. For all you seq, https://www.illumina.com/content/dam/illumina-marketing/documents/applications/ngs-library-prep/ForAllYouSeqMethods.pdf (Illumina, 2014).
2. Anonymous. Nat. Methods 8, 521 (2011).
3. Tang, F. et al. Nat. Methods 6, 377–382 (2009).
4. Crosetto, N. et al. Nat. Methods 10, 361–365 (2013).
5. Ali, O.A. et al. Genetics 202, 389–400 (2016).
We acknowledge the contribution of the many authors whose methods are referenced in this manuscript and apologize that restrictions on the number of references meant we did not list their work.
James Hadfield is the owner of the Enseqlopedia.com domain. Jacques Retief is currently employed by and owns shares in Illumina Inc., San Diego, California, USA.
Hadfield, J., Retief, J. A profusion of confusion in NGS methods naming. Nat Methods 15, 7–8 (2018). https://doi.org/10.1038/nmeth.4558