Reference genomes are only as valuable as the scientific questions they can address. The ruff genome sequence papers exemplify three of the most important aspects of a useful genome: new biological insights, a high-quality resource and population variation data.
Biologists, at their core, are fascinated by the diversity of life, which they seek to document, dissect and understand at the most fundamental level. And what could be more fundamental than the genetic code? From the humble bacterium Haemophilus influenzae, sequenced in 1995, to the Human Genome Project and beyond, the drive to expose each base pair of an organism's genome has reached every branch of the tree of life. To date, hundreds of eukaryotes and many thousands of prokaryotes have had their genomes sequenced. But does every sequenced genome have scientific value? And how do we measure that value?
The genome sequence of the ruff, reported in this issue on page 84 by Lamichhaney et al. and on page 79 by Küpper et al., is a prime example of a genome sequenced to answer a specific biological question, which is one measure of a genome's value. The three male morphs of the ruff (independent, satellite and faeder) differ from each other in appearance, physiology and behavior, but these complex differences segregate as a simple Mendelian trait. This intricate mating system is certainly unusual, although not unique, in the animal kingdom; there are many parallels with the marine isopod Paracerceis sculpta, which also has three distinct male morphs controlled by a single genetic locus (Nature 350, 608–610, 1991). The identification from whole-genome sequence data of a 4.5-Mb inversion (or 'supergene') on chromosome 11 associated with the ruff male morphs has solved the apparent conundrum of how such complex differences can be inherited as a single trait. Similar evolutionary stories have been illuminated through whole-genome sequencing, such as the Batesian mimicry supergene in Papilio butterflies (Nat. Genet. 47, 405–409, 2015). The studies go further still: they provide insight into the evolutionary history of the inversion, identify candidate genes involved in the many differences between the morphs and contribute to the scientific discourse around the evolutionary mechanisms that maintain variation within populations.
Genomes are tools for scientific discovery, which is why Nature Genetics publishes all first reference genome papers under an open access license. To be useful as tools, they must be of high quality. In this regard, the ruff genomes are anything but rough. The genome presented by Küpper et al. was assembled from paired-end and mate-pair next-generation sequence data with 137× coverage, plus additional coverage provided by PacBio long reads. The high-quality sequence reads in Lamichhaney et al. are assembled into a genome with N50 scaffold sizes of up to 10 Mb. In both studies, resequencing data from satellite and faeder males, which complement the genome of the independent reference male, together with additional SNP genotyping of other individuals, are not only necessary to answer the scientific question at hand but also increase the value of these genomes as resources for future research.
To be sure, the pursuit of knowledge is never ultimately wasted, and there is inherent value in each new genome that is sequenced. But when resources are limited, as is always the case in scientific research, a genome should not be sequenced simply for the sake of doing so, a viewpoint also succinctly stated in the News and Views by Chris Jiggins on page 7. Rather, a genome should be sequenced when it is the right tool for the task at hand.