Milestones | 10 February 2021

Milestones in Genomic Sequencing

  • Sequencing moves to the twenty-first century

    Since its development in the late 1970s, DNA sequencing has become one of the most influential tools in biomedical research, with technologies evolving continuously and new applications emerging over time. Read more.


    The origins and impact of DNA sequencing

    To commemorate the 40th anniversary of Sanger sequencing, in 2017 Anand Jagatia spoke with National Human Genome Research Institute Director Eric Green on the impact and potential of DNA sequencing in biomedical research. Listen to the interview.

  • 2000

    Drosophila melanogaster genome

    The year 2000 saw the release of the first full genomic sequence for the fruit fly, Drosophila melanogaster, chosen for sequencing owing to its importance as a model organism and the existence of high-quality sequences with which to compare the shotgun assembly. With whole-genome sequences now available for three major eukaryotic model organisms (following Saccharomyces cerevisiae in 1996 and Caenorhabditis elegans in 1998), comparative genomics started to be applied to eukaryotes for the first time.

    Related article: The genome sequence of Drosophila melanogaster

  • 2000

    Arabidopsis thaliana genome

    The genome of Arabidopsis thaliana was the first plant genome — and, after C. elegans and D. melanogaster, the third multicellular organism — to be sequenced, enabling the exploration of the unique physiological and organizational features of flowering plants.

    Related article: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana

  • Milestone 1 2001

    The Human Genome Project

    Launched in 1990, the Human Genome Project set out to identify the order, that is, sequence, of all DNA bases to obtain the ‘genetic blueprint’ of humans. In 2001, two pivotal publications reported the first draft of the human genome, obtained by shotgun sequencing, setting the stage for the genomic era. The second phase of the project, which moved from the draft to an essentially finished reference genome, was completed in 2003. Read more.

    By Caroline Barranco


    How a worm showed us the way to open science

    Watch Robert H. Waterston, Professor of Genome Sciences at the University of Washington and one of the pioneers of the Human Genome Project, recall the promise, pitfalls and potential of this landmark international effort.


  • 2002

    Mus musculus genome

    The International Mouse Genome Sequencing Consortium generated the first draft sequence of the mouse genome from female mice of the C57BL/6J strain. The latest version of the mouse reference genome assembly (GRCm39; GCA_000001635.9) was released in July 2020 by the Genome Reference Consortium, which was founded in 2007 to improve the reference genome assemblies of human, mouse and zebrafish.

    Related article: Initial sequencing and comparative analysis of the mouse genome

  • 2002

    Browsing the genome

    The large amount of genomic data generated by sequencing required new approaches for storing, visualizing and integrating sequence data and its annotations. Genome browsers such as Ensembl and the UCSC Genome Browser emerged to offer a new way of accessing and searching the genomes of multiple species.

    Related article: The clickable genome

  • Milestone 2 2004

    Sequencing the unculturable majority

    Two key studies unlocked the field of metagenomics — the reconstruction of microbial communities from sequencing data — by providing approaches for unbiased, culture-independent analysis of DNA directly from environmental samples using sequencing technologies. Read more.

    By Iain Dickson

  • 2005


    The International HapMap Project set out to develop a haplotype map (HapMap) of the human genome that describes the common patterns of human DNA sequence variation. A stepping stone for (and later superseded by) large-scale human genetics projects such as the 1000 Genomes Project, HapMap enabled the discovery of millions of single-nucleotide polymorphisms and served as a reference dataset for many genome-wide association studies in disease research.

    Related article: A haplotype map of the human genome

  • 2005

    Pan troglodytes genome

    The publication of the first draft DNA sequence of a non-human primate, the chimpanzee, provided a treasure trove of information for understanding human biology and evolution, taking us one step closer to answering the lingering question: what makes us human?

    Related article: Initial sequence of the chimpanzee genome and comparison with the human genome

  • 2005

    Oryza sativa genome

    As one of the most important food crops in the world, rice feeds more than half of the global population. With a genome nearly four times the size of the Arabidopsis genome, the Oryza sativa genome was one of the last genomes to be Sanger-sequenced clone by clone. It was also the first genome sequence to include two completely sequenced, complex centromeres.

    Related article: The map-based sequence of the rice genome

  • Milestone 3 2005

    Sequencing — the next generation

    Two revolutionary studies introduced high-throughput, massively parallel sequencing technologies able to sequence a bacterial genome at a fraction of the cost and time of traditional Sanger sequencing techniques. Read more.

    By Joseph Willson

  • Milestone 4 2007

    ChIP–seq captures the chromatin landscape

    The development of ChIP–seq, which combined chromatin immunoprecipitation with high-throughput next-generation sequencing, enabled the genome-wide interrogation of chromatin binding patterns of different proteins, lending insight into gene regulation mechanisms, development and epigenetics. Read more.

    By Tiago Faial

  • 2008

    Chromatin comes into the open with DNase-seq

    Genomic regions that are hypersensitive to cleavage by the endonuclease DNase I mark active functional elements such as promoters and enhancers. Following on from initial measurements of DNase hypersensitivity using tiled microarrays and massively parallel signature sequencing in 2006, the first next-generation sequencing-based measurement of DNase hypersensitivity across the genome (DNase-seq) yielded a global view of accessible chromatin and established distal enhancers as the predominant regions of open chromatin.

    Related article: High-resolution mapping and characterization of open chromatin across the genome

  • Milestone 5 2008

    The dawn of personal genomes

    Two studies reported the genomes of an African individual and an Asian individual, respectively, using a new massively parallel sequencing approach based on reversible terminator dyes. Demonstrating the feasibility and resource value of human genome sequences, these studies and the technology they presented paved the way for population-scale genome sequencing. Read more.

    By Darren Burgess

  • Milestone 6 2008

    A sequencing revolution in cancer

    Ley et al. presented the first whole-genome sequence of a cytogenetically normal acute myeloid leukaemia sample, showing that cancer genome sequencing can identify disease-associated mutations and druggable targets, offering promise for personalized medicine approaches. Read more.

    By Safia Danovi

  • 2008

    Genomes Assemble!

    Genome assembly tools designed specifically for short reads, such as Velvet, ALLPATHS and SOAPdenovo, began to emerge. These tools could reconstruct genomes from sequenced fragments quickly and cost-effectively, leading to a flurry of activity in de novo assembly of large, high-quality genomes.

    Related article: Sequence assembly demystified
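
    As a rough illustration of how short reads can be stitched back together, the sketch below builds a small de Bruijn graph and walks it to recover a contig. The de Bruijn formulation is the general idea behind several short-read assemblers, but this is a toy under strong assumptions (error-free reads, no repeats, a known start node); the function names and the example reads are hypothetical and do not reflect how any of the tools named above is implemented.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Link each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while every node has exactly one successor."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop rather than loop forever on a cycle
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Hypothetical overlapping reads drawn from one short sequence.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous(graph, "ATG")
print(contig)
```

    Real assemblers additionally have to handle sequencing errors, repeats, both strands and uneven coverage, none of which this sketch attempts.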

  • Milestone 7 2008

    Transcriptomes – a new layer of complexity

    A series of milestone publications reported the development of high-throughput sequencing of whole transcriptomes, known as RNA sequencing (RNA-seq), across different species. Read more.

    By Margot Brandt

  • 2008

    Prenatal genomic medicine

    The discovery of circulating cell-free fetal DNA in maternal plasma in 1997 led to the development of non-invasive prenatal genetic tests for a variety of traits, but detection of fetal chromosomal aneuploidies remained challenging. In 2008, two studies showed success in detecting the most common autosomal aneuploidies by massively parallel sequencing of maternal cell-free plasma DNA, opening up new opportunities for non-invasive prenatal testing.

    Related article: Prenatal and pre-implantation genetic diagnosis
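
    The counting logic behind sequencing-based aneuploidy detection can be sketched as follows: the fraction of reads mapping to a chromosome in a test sample is compared with the same fraction in a set of reference samples. This is a minimal sketch, not the published pipelines; all read counts are invented, the two-key count dictionaries are a simplification, and the z-score cut-off of 3 is an assumption chosen for illustration.

```python
from statistics import mean, stdev


def chromosome_fraction(counts, chrom):
    """Fraction of all mapped reads that fall on `chrom`."""
    return counts[chrom] / sum(counts.values())


def aneuploidy_z_score(sample_counts, reference_counts, chrom):
    """Standardize the sample's read fraction against reference samples."""
    ref_fractions = [chromosome_fraction(c, chrom) for c in reference_counts]
    sample_fraction = chromosome_fraction(sample_counts, chrom)
    return (sample_fraction - mean(ref_fractions)) / stdev(ref_fractions)


# Hypothetical read counts per sample: chromosome 21 versus everything else.
reference = [
    {"chr21": 13000, "other": 987000},
    {"chr21": 13050, "other": 986950},
    {"chr21": 12950, "other": 987050},
    {"chr21": 13020, "other": 986980},
    {"chr21": 12980, "other": 987020},
]
suspected = {"chr21": 13400, "other": 986600}
typical = {"chr21": 13010, "other": 986990}

z_suspected = aneuploidy_z_score(suspected, reference, "chr21")
z_typical = aneuploidy_z_score(typical, reference, "chr21")

THRESHOLD = 3.0  # assumed cut-off, for illustration only
print(z_suspected > THRESHOLD, abs(z_typical) > THRESHOLD)
```

    Because fetal DNA is only a minority of the cell-free DNA in maternal plasma, the expected shift in read fraction is small, which is why deep sequencing and a well-characterized reference set matter.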

  • 2009

    An explosion of computational tools

    As genome sequencing became more affordable and widespread, its applications rapidly expanded, driving the development of new computational tools to accommodate the requirements of transcriptomics, metagenomics or genetic variant discovery. Read mapping tools such as Bowtie and BWA or the splice-aware aligner TopHat were able to align millions of short reads to the reference genome, and downstream analysis software, such as SAMtools and BreakDancer, facilitated the detection of genetic variants.

    Related article: Repetitive DNA and next-generation sequencing: computational challenges and solutions

  • Milestone 8 2009

    Long reads become a reality

    Long-read sequencing technologies began to shed light on hidden parts of the human genome by sealing gaps in existing assemblies, allowing modified bases to be detected on native DNA or RNA, and revealing the complexity of the transcriptome. Read more.

    By Ivanka Kamenova

  • Milestone 9 2009

    Exploring whole exomes

    Targeted whole-exome sequencing, through the adaptation of microarrays to perform targeted capture of exon sequences from genomic DNA before high-throughput sequencing, allowed researchers to identify disease-causing mutations without prior knowledge of the chromosomal location or biological role of the causal gene. Read more.

    By Kyle Vogan

  • 2009

    Mapping the methylome

    The discovery that sodium bisulfite converts unmethylated cytosines into uracil drove the development of many new DNA methylation detection and analysis techniques. Combining bisulfite treatment with next-generation sequencing, whole-genome bisulfite sequencing enabled the comprehensive characterization of genome-wide DNA methylation patterns at single-base-pair resolution.

    Related article: Principles and challenges of genome-wide DNA methylation analysis
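
    The principle of reading methylation from bisulfite-treated DNA can be shown with a toy example: after conversion and amplification, unmethylated cytosines are read as thymine, whereas methylated cytosines are still read as cytosine. The sketch below assumes a single forward-strand read that is already perfectly aligned to the reference and is free of sequencing errors; the sequence, the methylated position and the function names are hypothetical.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate conversion: unmethylated C is read as T, methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(sequence)
    )


def call_methylation(reference, read):
    """Classify each reference cytosine from the base seen in the converted read."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = "methylated"
        elif read_base == "T":
            calls[i] = "unmethylated"
    return calls


reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions={1})
calls = call_methylation(reference, read)
print(read, calls)
```

    In practice, methylation levels are estimated per site from many reads, and alignment must account for the reduced sequence complexity that conversion introduces.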

  • 2009

    Found in translation

    The levels of mRNA and protein in a cell often exhibit only weak correlation. To achieve a more complete picture of cellular gene expression, ribosome profiling was developed as a means to directly measure protein synthesis in vivo. Instead of looking at transcript abundance, ribosome profiling uses deep sequencing of ribosome-protected mRNA fragments to measure, genome-wide, the number of ribosomes translating each mRNA, thus giving insight into post-transcriptional gene expression programmes and their regulation.

    Related article: Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling
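
    One common way to summarize ribosome profiling data is a translation efficiency value: the density of ribosome footprints on a gene divided by the density of mRNA reads for the same gene. The sketch below assumes that read counts per gene are already available; the gene names, counts and library sizes are invented, and the RPKM-style normalization is one possible choice rather than a prescribed method.

```python
def rpkm(read_count, length_nt, total_mapped_reads):
    """Reads per kilobase of sequence per million mapped reads."""
    return read_count * 1e9 / (length_nt * total_mapped_reads)


def translation_efficiency(footprints, mrna_reads, length_nt,
                           total_footprints, total_mrna_reads):
    """Ratio of ribosome footprint density to mRNA read density for one gene."""
    footprint_density = rpkm(footprints, length_nt, total_footprints)
    mrna_density = rpkm(mrna_reads, length_nt, total_mrna_reads)
    return footprint_density / mrna_density


# Hypothetical library sizes and per-gene counts.
TOTAL_FOOTPRINTS = 1_000_000
TOTAL_MRNA_READS = 2_000_000
genes = {
    "geneA": {"footprints": 2000, "mrna_reads": 1000, "length_nt": 1500},
    "geneB": {"footprints": 500, "mrna_reads": 1000, "length_nt": 3000},
}

efficiency = {
    name: translation_efficiency(
        g["footprints"], g["mrna_reads"], g["length_nt"],
        TOTAL_FOOTPRINTS, TOTAL_MRNA_READS,
    )
    for name, g in genes.items()
}
print(efficiency)
```

    Two genes with the same mRNA read count can differ several-fold in footprint density, which is the kind of post-transcriptional difference that transcript abundance alone would miss.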

  • Milestone 10 2009

    Probing nuclear architecture with Hi-C

    Interrogation of chromatin conformation was greatly aided by the development of Hi-C, enabling genome-wide analysis of chromatin contacts and leading to the generation of maps of the genome in 3D space. Read more.

    By Andrew Cosgrove

  • Milestone 11 2009

    Sequencing one cell at a time

    Moving from genomic analysis of tissues or cells in bulk to performing single-cell sequencing provided a whole new perspective on gene regulation, cell-to-cell heterogeneity and developmental or disease processes. The difficulty of performing analyses at such resolution required many experimental and computational innovations. Read more.

    By Aline Lückgen

  • 2010

    From short reads to giant (panda) genome

    The genome of the giant panda (Ailuropoda melanoleuca) — specifically of the 2008 Beijing Olympics mascot Jingjing — was assembled from short sequencing reads, a first for such a large, complex genome. This panda genome consisted of ~2.4 billion base pairs, encompassing approximately 21,000 protein-coding genes, and the high genomic diversity reflected in the sequence raised hopes that conservation efforts may preserve the species from extinction.

    Related article: The sequence and de novo assembly of the giant panda genome

  • 2010

    Expanding sequencing platforms

    Following the release of initial commercial sequencing platforms (based on 454 pyrosequencing or Solexa sequencing), a number of alternative sequencing technologies emerged, including the sequencing by oligonucleotide ligation and detection (SOLiD) system, a sequencing by ligation approach using DNA nanoballs for clonal amplification, and an ion semiconductor sequencing technology that enabled non-optical sequencing.

    Related article: Coming of age: ten years of next-generation sequencing technologies

  • Milestone 12 2010

    Waking the dead: sequencing archaic hominin genomes

    The publication of the first draft genome of a Neanderthal in 2010 marked a turning point for the palaeogenomics field, making it possible to assemble an ancient genome from next-generation sequencing reads by overcoming previous limitations in ancient DNA research such as limited starting material, contamination and degradation. Read more.

    By Rebecca Furlong


    How ancient DNA sequencing changed the game

    Watch Beth Shapiro, Professor of Ecology and Evolutionary Biology at the University of California, Santa Cruz (UCSC), Associate Director of the UCSC Genomics Institute and Investigator at the Howard Hughes Medical Institute, discuss the impact of genomic sequencing on the field of ancient DNA research.


  • Milestone 13 2012

    Cataloguing a public genome

    The 1000 Genomes Project marked the first of many large-scale sequencing studies cataloguing human genetic variation. As sequencing technologies have improved and fallen in cost, increasingly large catalogues, such as ExAC and gnomAD, have provided a rich point of reference for the variation found across human populations. Read more.

    By Ingrid Knarston

  • Milestone 14 2012

    Our most elemental encyclopaedia

    The Encyclopedia of DNA Elements (ENCODE) project was the first and largest international effort to characterize the functional side of the human genome. This massive resource has yielded important insights about gene regulation, evolution and disease. Read more.

    By Ilse Valtierra


    The Story of You: ENCODE and the human genome

    Ever since a monk called Mendel started breeding pea plants, we have been learning about our genomes. The Human Genome Project identified the entire 3-billion-letter code contained in the average human genome. Since 2003, the ENCODE project has been trying to interpret that code, to work out how it is used to make different types of cells and different people: the latest chapter in the story of you.


  • 2013

    Danio rerio genome

    Initiated in 2001, the zebrafish genome-sequencing project released a high-quality sequence assembly generated by shotgun and minimum tiling path sequencing. The genome contains more protein-coding genes and a higher overall repeat content than any previously sequenced vertebrate. The sequence also revealed that more than 70% of human genes have at least one zebrafish orthologue, emphasizing the potential of this model organism for biomedical research.

    Related article: The zebrafish reference genome sequence and its relationship to the human genome

  • 2013

    Tackling the epigenome

    An integrative epigenome profiling method, termed assay for transposase-accessible chromatin using sequencing (ATAC-seq), was shown to report not only on chromatin accessibility but also on nucleosome positioning and transcription factor binding, as well as their interplay. ATAC-seq employs a highly active Tn5 transposase to insert sequencing adapters into regions of open chromatin; tagged and amplified fragments are then sequenced and analysed to yield a multidimensional epigenomic profile.

    Related article: Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position

  • Milestone 15 2014

    Pan-genomes: moving beyond the reference

    Pan-genome studies in a variety of species — from microorganisms to plants to humans — have shown that a large amount of genetic variation can be found in the dispensable genome. This observation has called into question our reliance on single reference genomes for assembling and analysing genomes. Read more.

    By Dominique Morneau

  • Video 2015

    Epigenome: The symphony in your cells

    Why does a heart cell differ from a brain cell when almost every cell in your body has the same genome? The regulation of gene expression establishes a layer of chemical signatures, the epigenome, that ensures cells use their genomes in different ways depending on their roles, similar to how orchestras can perform one piece of music in various ways. In 2015, the NIH Roadmap Epigenomics Consortium reported the integrative analysis of 111 reference human epigenomes, including profiles of histone modification patterns, DNA accessibility, DNA methylation and RNA expression, to systematically characterize epigenomic landscapes in primary human tissues and cells. This work demonstrated how a cell’s epigenome is complex and exquisitely arranged — just like a Beethoven symphony.


  • 2016

    Short reads go long range

    A microfluidics-based sequencing approach, generating data known as linked-reads, substantially increased physical coverage of genomes while reducing the amount of input DNA needed for sequencing. Long fragments of DNA are partitioned into GEMs (gel beads-in-emulsion), with GEM-specific barcodes used to tag amplification products that then undergo short-read sequencing. Reads coming from the same long DNA fragment can be computationally linked through their barcodes to reconstruct haplotype-resolved genome sequences that also provide insight into complex, structural variation.

    Related article: Haplotyping germline and cancer genomes with high-throughput linked-read sequencing
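
    The computational linking step described above can be illustrated with a short sketch: aligned reads that share a barcode and lie close together on the same chromosome are grouped into one putative long DNA molecule. The barcodes, coordinates, function names and the 50 kb gap limit are all hypothetical, chosen only to show the grouping idea rather than to mirror any vendor's software.

```python
from collections import defaultdict


def group_by_barcode(reads):
    """Collect (chromosome, position) hits for each barcode."""
    groups = defaultdict(list)
    for barcode, chrom, pos in reads:
        groups[barcode].append((chrom, pos))
    return groups


def infer_molecules(reads, max_gap=50_000):
    """Cluster same-barcode reads that sit within `max_gap` of each other.

    Returns tuples of (barcode, chromosome, start, end, read_count).
    """
    molecules = []
    for barcode, hits in group_by_barcode(reads).items():
        hits.sort()
        current = [hits[0]]
        for chrom, pos in hits[1:]:
            prev_chrom, prev_pos = current[-1]
            if chrom == prev_chrom and pos - prev_pos <= max_gap:
                current.append((chrom, pos))
            else:
                molecules.append(
                    (barcode, current[0][0], current[0][1], current[-1][1], len(current))
                )
                current = [(chrom, pos)]
        molecules.append(
            (barcode, current[0][0], current[0][1], current[-1][1], len(current))
        )
    return molecules


# Hypothetical aligned reads: (barcode, chromosome, position).
reads = [
    ("BC01", "chr1", 10_000),
    ("BC02", "chr1", 12_000),
    ("BC01", "chr1", 35_000),
    ("BC01", "chr1", 60_000),
    ("BC02", "chr1", 40_000),
    ("BC01", "chr2", 5_000),
]
molecules = infer_molecules(reads)
print(molecules)
```

    Once reads are assigned to molecules, variants observed on the same molecule can be placed on the same haplotype, which is how short reads gain long-range information.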

  • Milestone 16 2017

    Genomes go platinum

    De novo genome assemblies were brought into the platinum age in 2017 when Bickhart et al. produced a reference-quality domestic goat genome. This new standard was achieved by a synergistic combination of long-read and short-read sequencing technologies with optical and chromatin interaction mapping. Read more.

    By Brooke LaFlamme

  • Milestone 17 2020

    Filling in the gaps telomere to telomere

    The year 2020 saw the publication of the first gapless, telomere-to-telomere assembly of a human chromosome, the X chromosome. This achievement brought together sequencing technologies and computational tools that had been developed in the preceding decade. Read more.

    By Katharine Wrighton
