Much of evolutionary biology is motivated by the principle that you cannot understand one species without comparing it with another. When nineteenth-century naturalists compared the anatomies of humans and other apes, it became clear that these species shared many features and had evolved from a common ancestor. More recently, developments in DNA sequencing — which enabled assembly of the human genome1 in 2001, followed by lower-quality ‘draft’ genomes for other great apes2–4 — have transformed our understanding of this evolutionary process. Writing in Science, Kronenberg et al.5 describe new great-ape genome assemblies, generated using a technology that surpasses previous methods. This work marks a new stage in our ability to study and compare these species.
Genome assembly is often likened to piecing together a jigsaw puzzle — a huge jigsaw for which the box has been lost and we have only a vague idea of what the whole should look like. The analogy holds because sequencing technologies cannot sequence an entire chromosome in one go. Instead, they fragment the genome into many separate pieces, called reads, which have to be matched, overlapped and placed together.
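The matching and overlapping of reads can be illustrated with a toy greedy merger. This is a minimal sketch for intuition only: the function names `overlap` and `greedy_assemble`, the `min_len` threshold and the example reads are all hypothetical, and real assemblers use far more sophisticated, error-tolerant methods.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: the pieces stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


# Hypothetical, error-free reads drawn from the sequence ATGGCGTACCTTA
toy_reads = ["ATGGCGT", "GCGTACC", "TACCTTA"]
contigs = greedy_assemble(toy_reads)
print(contigs)
```

With error-free reads and no repeated patterns, the three pieces collapse into a single sequence. Once the same pattern occurs in several places, more than one overlap can look equally good, which is one reason real genomes are so much harder to piece together.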
Previous generations of sequencing machines produced reads that were only about a hundred base pairs long, or perhaps a thousand base pairs but at exorbitant cost. Current machines such as Pacific Biosciences' single-molecule real-time (PacBio SMRT) sequencer produce reads tens of thousands of base pairs in length. Even with this improvement, hundreds of thousands of reads are needed to span a genome of three billion base pairs such as that of humans. Moreover, in practice, a large excess is used (typically more than 30 genomes' worth) to mitigate errors and resolve overlap ambiguities. A further complication arises from the fact that genomes are filled with stretches of DNA in which the same pattern is repeated many times, either in series or scattered throughout the genome. In apes, such repetitive DNA comprises a substantial fraction of the genome.
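The arithmetic behind these read counts can be made explicit. The sketch below is a back-of-the-envelope calculation, not a description of any sequencing pipeline: the function name `reads_needed` is hypothetical, and the long-read length of 20,000 base pairs is an assumed stand-in for "tens of thousands".

```python
import math


def reads_needed(genome_size_bp, read_length_bp, coverage=1.0):
    """Reads whose combined length equals `coverage` genomes' worth of sequence."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


GENOME_BP = 3_000_000_000   # roughly the size of the human genome
SHORT_READ_BP = 100         # earlier short-read machines
LONG_READ_BP = 20_000       # assumed value for "tens of thousands" of base pairs

# A single genome's worth of long reads
print(reads_needed(GENOME_BP, LONG_READ_BP))               # 150000

# A 30-fold excess of long reads
print(reads_needed(GENOME_BP, LONG_READ_BP, coverage=30))  # 4500000

# The same 30-fold excess with 100-base-pair reads
print(reads_needed(GENOME_BP, SHORT_READ_BP, coverage=30)) # 900000000
```

Under these assumptions, a 30-fold excess of long reads amounts to a few million pieces, whereas the same excess with hundred-base-pair reads would approach a billion.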
Because of these difficulties, the first great-ape genome projects used the human genome as a scaffold to help assemble genomic regions that are structurally similar to those of humans — that is, in which corresponding stretches of DNA lie in the same order and are present in a similar number of copies. This strategy enabled better assembly in such regions. But in regions where genome structure has evolved very differently in humans and other great apes, the great-ape draft assemblies tended to be more fragmented, and the resulting variation in assembly quality effectively constituted a bias towards the human genome. These assemblies provided many evolutionary insights, but there has nonetheless been a deficit in our understanding of the genomic elements that make humans unique.
One reason why structural variation is important, particularly on the short evolutionary timescale that separates humans and other great apes, is that it provides a way for genomes to evolve rapidly. When a whole chunk of DNA is removed or duplicated, its molecular function can be inhibited or enhanced in one step, rather than through successive mutations at individual bases. Indeed, much of the great-ape genome seems to be modular in nature, and is therefore susceptible to the kind of building-block alteration that structural variation allows. It is also thought that gene loss is a key mechanism for evolutionary change6,7. This might seem counterintuitive, but genes often act to constrain, rather than promote, a particular function. Disabling them by removing, duplicating or relocating a chunk of DNA might be the simplest way to confer beneficial effects.
Kronenberg et al. used PacBio SMRT to assemble high-quality genomes for a chimpanzee and an orangutan, along with two human genomes for comparison (Fig. 1). The long reads enabled them to do away with the human-genome scaffold used previously, and to increase the typical distance between gaps by about 100-fold compared with previous assemblies. The authors found about 600,000 structural differences between these genomes and that of humans, including more than 17,000 differences specific to humans. Of these, many changes disrupt genes in humans that are not disrupted in other apes. Genes whose activity is suppressed specifically in humans are more likely than other genes to be associated with a human-specific structural variant.
Many genes produce multiple versions, called isoforms, of the protein they encode, each of which can have a different role. Kronenberg and colleagues found evidence that one human-specific structural change — a large deletion in the gene FADS2 — might have altered the distribution of isoforms the gene produces. These isoforms are involved in the synthesis of fatty acids needed for brain development and immune response8, and are difficult to obtain from a purely herbivorous diet. Correspondingly, FADS2 has been a target for natural selection associated with dietary changes towards or away from animal fats in recent human evolution8. Chimpanzees eat a small amount of meat, so it is not known what (if any) human-specific traits might have resulted from this deletion, but it does suggest that shifting dietary patterns could have been a feature of human evolution over long timescales.
Structural variation also seems to have had a role in brain evolution. Human brains are much larger than those of other apes, and it is plausible that genes involved in brain growth and development were key to the evolution of this trait. The authors analysed the sequences of genes that are active in radial glial cells, which are progenitors for neurons and other cells in the brain’s cortex, and compared protein production by these genes in humans and chimpanzees using cortical organoids — 3D models of brain tissue grown in vitro. These analyses revealed that 41% of genes whose activity is suppressed in human radial glial cells are associated with a human-specific structural variant. Again, this is consistent with structural genomic changes causing disruption or loss of gene function during great-ape evolution.
Intriguing as Kronenberg and colleagues’ findings are, there is also a broader significance to their work. Several groups and consortia are applying new sequencing technologies to different organisms. Ultimately, researchers want accurate, high-resolution assemblies for all species, and to compare these genomes on an equal footing. This will improve evolutionary analyses and reveal complex mutation processes that have hitherto been obscured. Large genome assembly currently remains hugely expensive, and even state-of-the-art sequencing tools struggle to resolve repetitive sequences on scales above a few hundred thousand base pairs, making assembly of certain genomes challenging. But tools to read whole genomes with negligible errors on inexpensive hardware are not far away, and are almost available for small bacterial genomes9.
It is clear that we are leaving behind the initial period of evolutionary genomics, in which analyses involved comparing a genome of interest to a few ‘gold standard’ genomes, such as human, mouse or zebrafish. Instead, we are moving towards a more complete and equitable genomic view of life.
Nature 559, 336-338 (2018)