Milestones in Genomic Sequencing

Sequencing moves to the twenty-first century

Since its development in the late 1970s, DNA sequencing has become one of the most influential tools in biomedical research, with technologies evolving continuously and new applications emerging over time.

Audio

The origins and impact of DNA sequencing

To commemorate the 40th anniversary of Sanger sequencing, in 2017 Anand Jagatia spoke with National Human Genome Research Institute Director Eric Green on the impact and potential of DNA sequencing in biomedical research.
2000

Drosophila melanogaster genome

Credit: Luciano Richino / Alamy Stock Photo

The year 2000 saw the release of the first full genomic sequence for the fruit fly, Drosophila melanogaster, chosen for sequencing owing to its importance as a model organism and the existence of high-quality sequences with which to compare the shotgun assembly. With whole-genome sequences now available for three major eukaryotic model organisms (the fruit fly joining Saccharomyces cerevisiae in 1996 and Caenorhabditis elegans in 1998), comparative genomics could be applied to eukaryotes for the first time.

Related article: The genome sequence of Drosophila melanogaster
2000

Arabidopsis thaliana genome

The genome of Arabidopsis thaliana was the first plant genome — and, after C. elegans and D. melanogaster, the third multicellular organism — to be sequenced, enabling the exploration of the unique physiological and organizational features of flowering plants.

Related article: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana

Credit: Pierre BRYE / Alamy Stock Photo
Milestone 1 2001

The Human Genome Project

Credit: Rawpixel Ltd / Alamy Stock Photo

Launched in 1990, the Human Genome Project set out to determine the order (that is, the sequence) of all DNA bases to obtain the ‘genetic blueprint’ of humans. In 2001, two pivotal publications reported the first draft of the human genome, obtained by shotgun sequencing, setting the stage for the genomic era. The second phase of the project, which moved from the draft to an essentially finished reference genome, was completed in 2003.

By Caroline Barranco

Video

How a worm showed us the way to open science

Watch Robert H. Waterston, Professor of Genome Sciences at the University of Washington and one of the pioneers of the Human Genome Project, recall the promise, pitfalls and potential of this landmark international effort.

2002

Mus musculus genome

The International Mouse Genome Sequencing Consortium generated the first draft sequence of the mouse genome from female mice of the C57BL/6J strain. The latest version of the mouse reference genome assembly (GRCm39; GCA_000001635.9) was released in July 2020 by the Genome Reference Consortium, which was founded in 2007 to improve the reference genome assemblies of human, mouse and zebrafish.

Related article: Initial sequencing and comparative analysis of the mouse genome

Credit: Vasiliy Vishnevskiy / Alamy Stock Photo
2002

Browsing the genome

Credit: banphote kamolsanei / Alamy Stock Photo

The large amount of genomic data generated by sequencing required new approaches for storing, visualizing and integrating sequence data and its annotations. Genome browsers such as Ensembl and the UCSC Genome Browser emerged to offer a new way of accessing and searching the genomes of multiple species.

Related article: The clickable genome
Milestone 2 2004

Sequencing the unculturable majority

Two key studies unlocked the field of metagenomics (the reconstruction of microbial communities from sequencing data) by providing approaches for unbiased, culture-independent analysis of DNA directly from environmental samples using sequencing technologies.

By Iain Dickson

Credit: Science Photo Library / Alamy Stock Photo
2005

HapMap

Credit: Mopic / Alamy Stock Photo

The International HapMap Project set out to develop a haplotype map (HapMap) of the human genome that describes the common patterns of human DNA sequence variation. A stepping stone for (and later superseded by) large-scale human genetics projects such as the 1000 Genomes Project, HapMap enabled the discovery of millions of single-nucleotide polymorphisms and served as a reference dataset for many genome-wide association studies in disease research.

Related article: A haplotype map of the human genome
2005

Pan troglodytes genome

The publication of the first draft DNA sequence of a non-human primate, the chimpanzee, provided a treasure trove of information for understanding human biology and evolution, taking us one step closer to answering the lingering question: what makes us human?

Related article: Initial sequence of the chimpanzee genome and comparison with the human genome

Credit: Andrew Mackay / Alamy Stock Photo
2005

Oryza sativa genome

Credit: Aflo Co., Ltd. / Alamy Stock Photo

As one of the most important food crops in the world, rice feeds more than half of the global population. With a genome nearly four times the size of the Arabidopsis genome, the Oryza sativa genome was one of the last genomes to be Sanger-sequenced clone by clone. It was also the first genome sequence to include two completely sequenced, complex centromeres.

Related article: The map-based sequence of the rice genome
Milestone 3 2005

Sequencing — the next generation

Two revolutionary studies introduced high-throughput, massively parallel sequencing technologies able to sequence a bacterial genome at a fraction of the cost and time of traditional Sanger sequencing techniques.

By Joseph Willson

Credit: Zoran Obradovic / Alamy Stock Photo
Milestone 4 2007

ChIP–seq captures the chromatin landscape

Credit: blickwinkel / Alamy Stock Photo

The development of ChIP–seq, which combined chromatin immunoprecipitation with high-throughput next-generation sequencing, enabled the genome-wide interrogation of chromatin binding patterns of different proteins, lending insight into gene regulation mechanisms, development and epigenetics.

By Tiago Faial
2008

Chromatin comes into the open with DNase-seq

Genomic regions that are hypersensitive to cleavage by the endonuclease DNase I mark active functional elements such as promoters and enhancers. Following initial measurements of DNase hypersensitivity using tiled microarrays and massively parallel signature sequencing in 2006, the first next-generation sequencing-based measurement of DNase hypersensitivity across the genome (DNase-seq) yielded a global view of accessible chromatin that established distal enhancers as the predominant regions of open chromatin.

Related article: High-resolution mapping and characterization of open chromatin across the genome

Credit: Nicky Beeson / Alamy Stock Photo
Milestone 5 2008

The dawn of personal genomes

Credit: Redmond Durrell / Alamy Stock Photo

Two studies reported the genomes of an African individual and an Asian individual, respectively, using a new massively parallel sequencing approach based on reversible terminator dyes. Demonstrating the feasibility and resource value of human genome sequences, these studies and the technology they presented paved the way for population-scale genome sequencing.

By Darren Burgess
Milestone 6 2008

A sequencing revolution in cancer

Ley et al. presented the first whole-genome sequence of a cytogenetically normal acute myeloid leukaemia sample, showing that cancer genome sequencing can identify disease-associated mutations and druggable targets, offering promise for personalized medicine approaches.

By Safia Danovi

Credit: Sabena Jane Blackbird / Alamy Stock Photo
2008

Genomes Assemble!

Credit: Yuliya Volkovska / Alamy Stock Vector

Genome assembly tools specifically designed for short reads, such as Velvet, ALLPATHS and SOAPdenovo, began to emerge that could reconstruct genomes from sequenced fragments cost-effectively and in a time-efficient manner, leading to a flurry of activity in de novo assembly of large, high-quality genomes.

Related article: Sequence assembly demystified
Milestone 7 2008

Transcriptomes – a new layer of complexity

A series of milestone publications reported the development of high-throughput sequencing of whole transcriptomes, known as RNA sequencing (RNA-seq), across different species.

By Margot Brandt

Credit: Y H Lim / Alamy Stock Photo
2008

Prenatal genomic medicine

Credit: Ilya Bolotov / Alamy Stock Vector

The discovery of circulating cell-free fetal DNA in maternal plasma in 1997 led to the development of non-invasive prenatal genetic tests for a variety of traits, but detection of fetal chromosomal aneuploidies remained challenging. In 2008, two studies showed success in detecting the most common autosomal aneuploidies by massively parallel sequencing of maternal cell-free plasma DNA, opening up new opportunities for non-invasive prenatal testing.

Related article: Prenatal and pre-implantation genetic diagnosis
2009

An explosion of computational tools

As genome sequencing became more affordable and widespread, its applications rapidly expanded, driving the development of new computational tools to accommodate the requirements of transcriptomics, metagenomics or genetic variant discovery. Read mapping tools such as Bowtie and BWA or the splice-aware aligner TopHat were able to align millions of short reads to the reference genome, and downstream analysis software, such as SAMtools and BreakDancer, facilitated the detection of genetic variants.

Related article: Repetitive DNA and next-generation sequencing: computational challenges and solutions

Credit: michalz86 / Alamy Stock Photo
Milestone 8 2009

Long reads become a reality

Credit: Zoonar GmbH / Alamy Stock Photo

Long-read sequencing technologies began to shed light on hidden parts of the human genome by sealing gaps in existing assemblies, allowing modified bases to be detected on native DNA or RNA, and revealing the complexity of the transcriptome.

By Ivanka Kamenova
Milestone 9 2009

Exploring whole exomes

Targeted whole-exome sequencing, through the adaptation of microarrays to perform targeted capture of exon sequences from genomic DNA before high-throughput sequencing, allowed researchers to identify disease-causing mutations without prior knowledge of the chromosomal location or biological role of the causal gene.

By Kyle Vogan

Credit: Panther Media GmbH / Alamy Stock Photo
2009

Mapping the methylome

Credit: wayne besson / Alamy Stock Photo

The discovery that sodium bisulfite converts unmethylated cytosines into uracil drove the development of many new DNA methylation detection and analysis techniques. Combining bisulfite treatment with next-generation sequencing, whole-genome bisulfite sequencing enabled the comprehensive characterization of genome-wide DNA methylation patterns at single-base-pair resolution.
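The conversion logic described above can be illustrated with a minimal Python sketch. It is a toy model rather than a real analysis pipeline: it assumes a single strand, error-free reads that are already aligned to the reference, and that converted uracils are read as thymine after amplification. The function names `bisulfite_convert` and `call_methylation` are hypothetical and do not come from any published tool.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment and amplification of one DNA strand.

    Unmethylated C is converted to U and read as T after amplification;
    methylated C is protected and still reads as C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, converted_read):
    """Return {position: is_methylated} for each reference C covered by the read.

    Assumes the read is already aligned base-for-base to the reference.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, converted_read)):
        if ref_base == "C":
            if read_base == "C":
                calls[i] = True    # C retained: methylated
            elif read_base == "T":
                calls[i] = False   # C read as T: unmethylated
    return calls


# Toy example: only the cytosine at index 1 is methylated.
reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions=[1])
calls = call_methylation(reference, read)
print(read)   # ACGTTTGA
print(calls)  # {1: True, 4: False, 5: False}
```

Because every cytosine in the reference yields its own call, the readout is per base, which is what gives the approach its single-base-pair resolution.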

Related article: Principles and challenges of genome-wide DNA methylation analysis
2009

Found in translation

The levels of mRNA and protein in a cell often exhibit only weak correlation. To achieve a more complete picture of cellular gene expression, ribosome profiling was developed as a means to directly measure in vivo protein synthesis. Instead of looking at transcript abundance, ribosome profiling uses deep sequencing of ribosome-protected mRNA fragments to measure the number of ribosomes translating the mRNA genome-wide and in vivo, thus giving insight into post-transcriptional gene expression programmes and their regulation.

Related article: Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling

Credit: Karen Fuller / Alamy Stock Photo
Milestone 10 2009

Probing nuclear architecture with Hi-C

Credit: Marcio Silva / Alamy Stock Photo

Interrogation of chromatin conformation was greatly aided by the development of Hi-C, enabling genome-wide analysis of chromatin contacts and leading to the generation of maps of the genome in 3D space.

By Andrew Cosgrove
Milestone 11 2009

Sequencing one cell at a time

Moving from genomic analysis of tissues or cells in bulk to performing single-cell sequencing provided a whole new perspective on gene regulation, cell-to-cell heterogeneity and developmental or disease processes. The difficulty of performing analyses at such resolution required many experimental and computational innovations.

By Aline Lückgen

Credit: Greenshoots Communications / Alamy Stock Photo
2010

From short reads to giant (panda) genome

Credit: Nature Picture Library / Alamy Stock Photo

The genome of the giant panda (Ailuropoda melanoleuca) — specifically of the 2008 Beijing Olympics mascot Jingjing — was assembled from short sequencing reads, a first for such a large, complex genome. This panda genome consisted of ~2.4 billion base pairs, encompassing approximately 21,000 protein-coding genes, and the high genomic diversity reflected in the sequence raised hopes that conservation efforts may preserve the species from extinction.

Related article: The sequence and de novo assembly of the giant panda genome
2010

Expanding sequencing platforms

Following the release of initial commercial sequencing platforms (based on 454 pyrosequencing or Solexa sequencing), a number of alternative sequencing technologies emerged, including the sequencing by oligonucleotide ligation and detection (SOLiD) system, a sequencing by ligation approach using DNA nanoballs for clonal amplification, and an ion semiconductor sequencing technology that enabled non-optical sequencing.

Related article: Coming of age: ten years of next-generation sequencing technologies

Credit: Vladyslav Starozhylov / Alamy Stock Photo
Milestone 12 2010

Waking the dead: sequencing archaic hominin genomes

Credit: The Natural History Museum / Alamy Stock Photo

The publication of the first draft genome of a Neanderthal in 2010 marked a turning point for the palaeogenomics field, making it possible to assemble an ancient genome from next-generation sequencing reads by overcoming previous limitations in ancient DNA research such as limited starting material, contamination and degradation.

By Rebecca Furlong

Video

How ancient DNA sequencing changed the game

Watch Beth Shapiro, Professor of Ecology and Evolutionary Biology at the University of California, Santa Cruz (UCSC), Associate Director of the UCSC Genomics Institute and Investigator at the Howard Hughes Medical Institute, discuss the impact of genomic sequencing on the field of ancient DNA research.

Milestone 13 2012

Cataloguing a public genome

The 1000 Genomes Project marked the first of many large-scale sequencing studies cataloguing human genetic variation. As sequencing technologies have improved and fallen in cost, increasingly large catalogues, such as ExAC and gnomAD, have provided a rich point of reference for the variation found across human populations.

By Ingrid Knarston

Credit: Cosmo Condina / Alamy Stock Photo
Milestone 14 2012

Our most elemental encyclopaedia

Credit: Oleksandra Korobova / Getty images

The Encyclopedia of DNA Elements (ENCODE) project was the first and largest international effort to characterize the functional side of the human genome. This massive resource has yielded important insights about gene regulation, evolution and disease.

By Ilse Valtierra

Video

The Story of You: ENCODE and the human genome

Ever since a monk called Mendel started breeding pea plants we have been learning about our genomes. The Human Genome Project identified the entire 3-billion letter code contained in the average human genome. Since 2003, the ENCODE project has been trying to interpret that code, to work out how it is used to make different types of cells and different people — the latest chapter in the story of you.

2013

Danio rerio genome

Initiated in 2001, the zebrafish genome-sequencing project released a high-quality sequence assembly generated by shotgun and minimum tiling path sequencing. The genome contains more protein-coding genes and a higher overall repeat content than any previously sequenced vertebrate genome, and its sequence revealed that more than 70% of human genes have at least one zebrafish orthologue, emphasizing the potential of this model organism for biomedical research.

Related article: The zebrafish reference genome sequence and its relationship to the human genome

Credit: Paulo Oliveira / Alamy Stock Photo
2013

Tackling the epigenome

Credit: Rob Matthews / Alamy Stock Photo

An integrative epigenome profiling method, termed assay for transposase-accessible chromatin using sequencing (ATAC-seq), was shown to report not only on chromatin accessibility but also on nucleosome positioning and transcription factor binding, as well as on their interplay. ATAC-seq employs a highly active Tn5 transposase to insert primers into regions of open chromatin; tagged and amplified fragments are then sequenced and analysed to yield a multidimensional epigenomic profile.

Related article: Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position
Milestone 15 2014

Pan-genomes: moving beyond the reference

Pan-genome studies in a variety of species, from microorganisms to plants to humans, have shown that a large amount of genetic variation can be found in the dispensable genome. This observation has called into question our reliance on single reference genomes for assembling and analysing genomes.

By Dominique Morneau

Credit: Art of Food / Alamy Stock Photo
Video 2015

Epigenome: The symphony in your cells

Why does a heart cell differ from a brain cell when almost every cell in your body has the same genome? The regulation of gene expression establishes a layer of chemical signatures, the epigenome, that ensures cells use their genomes in different ways depending on their roles, similar to how orchestras can perform one piece of music in various ways. In 2015, the NIH Roadmap Epigenomics Consortium reported the integrative analysis of 111 reference human epigenomes, including profiles of histone modification patterns, DNA accessibility, DNA methylation and RNA expression, to systematically characterize epigenomic landscapes in primary human tissues and cells. This work demonstrated how a cell’s epigenome is complex and exquisitely arranged — just like a Beethoven symphony.

2016

Short reads go long range

A microfluidics-based sequencing approach, generating data known as linked-reads, substantially increased physical coverage of genomes while reducing the amount of input DNA needed for sequencing. Long fragments of DNA are partitioned into GEMs (gel beads-in-emulsion), with GEM-specific barcodes used to tag amplification products that then undergo short-read sequencing. Reads coming from the same long DNA fragment can be computationally linked through their barcodes to reconstruct haplotype-resolved genome sequences that also provide insight into complex, structural variation.
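The barcode-based linking step can be sketched in a few lines of Python. This is an illustrative simplification, not the method used by any particular linked-read software: it assumes reads have already been aligned, that each read is reduced to a (barcode, chromosome, position) tuple, and that reads sharing a barcode within a chosen distance come from the same long molecule. The function name `link_reads` and the 50 kb `max_gap` threshold are hypothetical choices made for the example.

```python
from collections import defaultdict


def link_reads(reads, max_gap=50_000):
    """Group aligned short reads into inferred long DNA fragments.

    reads: iterable of (barcode, chromosome, position) tuples.
    Reads that share a barcode and chromosome, with neighbouring positions
    no more than max_gap apart, are assumed to come from one long molecule.
    Returns a list of (barcode, chromosome, start, end, n_reads) tuples.
    """
    by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_key[(barcode, chrom)].append(pos)

    fragments = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current fragment, start a new one.
                fragments.append((barcode, chrom, start, prev, count))
                start, count = pos, 0
            prev = pos
            count += 1
        fragments.append((barcode, chrom, start, prev, count))
    return fragments


# Toy example: barcode BC01 tags two distant molecules on chr1.
reads = [
    ("BC01", "chr1", 1_000),
    ("BC02", "chr1", 5_000),
    ("BC01", "chr1", 20_000),
    ("BC01", "chr1", 900_000),
    ("BC01", "chr1", 45_000),
]
fragments = link_reads(reads)
for fragment in fragments:
    print(fragment)
```

In this toy input, the three BC01 reads between positions 1,000 and 45,000 are merged into one inferred fragment, whereas the BC01 read at 900,000 is kept separate because a single barcode can tag more than one molecule.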

Related article: Haplotyping germline and cancer genomes with high-throughput linked-read sequencing

Credit: Trigger Image / Alamy Stock Photo
Milestone 16 2017

Genomes go platinum

Credit: Ian M Butterfield (Concepts) / Alamy Stock Photo

De novo genome assemblies were brought into the platinum age in 2017 when Bickhart et al. produced a reference-quality domestic goat genome. This new standard was achieved by a synergistic combination of long-read and short-read sequencing technologies with optical and chromatin interaction mapping.

By Brooke LaFlamme
Milestone 17 2020

Filling in the gaps telomere to telomere

2020 saw the publication of the first gapless, telomere-to-telomere assembly of a human chromosome, the X chromosome. This achievement brought together sequencing technologies and computational tools that had been developed in the preceding decade.

By Katharine Wrighton

Credit: PORNCHAI SODA / Alamy Stock Photo