Minke whale genome and aquatic adaptation in cetaceans

Journal name: Nature Genetics

The shift from terrestrial to aquatic life by whales was a substantial evolutionary event. Here we report the whole-genome sequencing and de novo assembly of the minke whale genome, as well as the whole-genome sequences of three minke whales, a fin whale, a bottlenose dolphin and a finless porpoise. Our comparative genomic analysis identified an expansion in the whale lineage of gene families associated with stress-responsive proteins and anaerobic metabolism, whereas gene families related to body hair and sensory receptors were contracted. Our analysis also identified whale-specific mutations in genes encoding antioxidants and enzymes controlling blood pressure and salt concentration. Overall, the whale-genome sequences exhibited distinct features that are associated with the physiological and morphological changes needed for life in an aquatic environment, marked by resistance to physiological stresses caused by a lack of oxygen, increased amounts of reactive oxygen species and high salt levels.

At a glance


  1. Figure 1: Orthologous gene clusters in the artiodactyl lineage.

    Shown is a Venn diagram of unique and shared gene families in the minke whale, bottlenose dolphin, cow and pig genomes. The total numbers of gene families are given in parentheses.

  2. Figure 2: Relationship of the minke whale to other mammalian species.

    (a) Gene family expansion or contraction. The numbers indicate the number of gene families that have expanded (orange) or contracted (blue) since the split from a common ancestor. MYA, million years ago; MRCA, most recent common ancestor. Timelines indicate the divergence times among species. (b) The expanded peroxiredoxin (PRDX) gene family in the whale lineage.

  3. Figure 3: Cetacean-specific amino acid changes in glutathione metabolism–associated genes and haptoglobin.

    (a) A positively selected gene (GSR) in the bottlenose dolphin is shown in a red rectangle. Genes with cetacean-specific amino acid changes (GSR, GPX2, GGT6, GGT7, ANPEP, ODC1 and GCLC) are shown in blue rectangles. The seven cetacean-specific genes are involved in glutathione metabolism pathways (KEGG pathway map00480). The solid lines indicate direct relationships between enzymes and metabolites. The dashed lines indicate that more than one step is involved in a process. (b) The positions of unique amino acid changes in the crystal structure of the haptoglobin-hemoglobin complex. The haptoglobin protein is shown in a cartoon form; the CCP domain is green, and the SP domain is yellow. Of the ten amino acid changes, eight positions are represented by violet sticks; the other two positions are not displayed because they were not included in the complex structure. Hemoglobin is shown with green sticks, and the CCP domain of the contacting haptoglobin is represented in an electrostatic potential surface model (blue, positive; red, negative; white, neutral). The black dots indicate the polar interaction between His137 of haptoglobin and the terminal carboxylate of hemoglobin.

  4. Figure 4: Estimated whale population size history.

    Tsurf, atmospheric surface air temperature; RSL, relative sea level; 10 m.s.l.e., 10 m sea level equivalent; MW, minke whale; FW, fin whale; BD, bottlenose dolphin; PP, finless porpoise; g, generation time; μ, mutation rate (per site, per year). Minke whale and fin whale data were generated on the basis of comparisons with minke whale scaffolds (“-B” after the species abbreviation) during SNV calling, whereas the bottlenose dolphin and finless porpoise data were generated on the basis of comparisons with the bottlenose dolphin scaffolds (“-T” after the species abbreviation) during SNV calling.
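The Venn diagram in Figure 1 partitions gene families by the exact set of species in which each family occurs. The following is a minimal sketch of that counting step, not the authors' pipeline: the function name `venn_regions` and the toy family identifiers (`F1` to `F6`) are hypothetical and were chosen only for illustration.

```python
def venn_regions(families_by_species):
    """Count gene families per Venn region.

    A region is the exact set of species that contain a family, so each
    family is counted once, in exactly one region.
    """
    species = sorted(families_by_species)
    all_families = set().union(*families_by_species.values())
    regions = {}
    for family in all_families:
        # Species whose gene set includes this family
        members = frozenset(
            s for s in species if family in families_by_species[s]
        )
        regions[members] = regions.get(members, 0) + 1
    return regions


# Hypothetical toy data; the identifiers are illustrative, not real families.
toy = {
    "minke whale": {"F1", "F2", "F3", "F5"},
    "bottlenose dolphin": {"F1", "F2", "F4"},
    "cow": {"F1", "F3", "F6"},
    "pig": {"F1", "F6"},
}

regions = venn_regions(toy)
for members, count in sorted(regions.items(), key=lambda kv: sorted(kv[0])):
    print(", ".join(sorted(members)), "->", count)
```

Because every family falls into exactly one region, the region counts sum to the number of distinct families, which gives a quick consistency check against per-species totals such as those shown in parentheses in the figure.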

Accession codes

Primary accessions


NCBI Reference Sequence

Sequence Read Archive




Author information

  1. These authors contributed equally to this work.

    • Hyung-Soon Yim,
    • Yun Sung Cho &
    • Xuanmin Guang


  1. Korea Institute of Ocean Science and Technology, Ansan, Republic of Korea.

    • Hyung-Soon Yim,
    • Sung Gyun Kang,
    • Jae-Yeon Jeong,
    • Sun-Shin Cha,
    • Hyun-Myung Oh,
    • Jae-Hak Lee,
    • Eun Chan Yang,
    • Kae Kyoung Kwon,
    • Yun Jae Kim,
    • Tae Wan Kim,
    • Wonduck Kim,
    • Jeong Ho Jeon,
    • Sang-Jin Kim,
    • Dong Han Choi,
    • Hyun Sook Lee &
    • Jung-Hyun Lee
  2. Personal Genomics Institute, Genome Research Foundation, Suwon, Republic of Korea.

    • Yun Sung Cho,
    • Sungwoong Jho,
    • Hak-Min Kim,
    • Young-Ah Shin,
    • Byung Chul Kim &
    • Jong Bhak
  3. Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, China.

    • Xuanmin Guang,
    • Yuan Zheng,
    • Zhuo Wang,
    • Yan Chen,
    • Ming Chen,
    • Awei Jiang,
    • Erli Li,
    • Shu Zhang,
    • Lili Yu,
    • Sha Liu &
    • Jun Wang
  4. Department of Marine Biotechnology, University of Science and Technology, Daejeon, Republic of Korea.

    • Sung Gyun Kang,
    • Jae-Yeon Jeong,
    • Sun-Shin Cha,
    • Kae Kyoung Kwon,
    • Sang-Jin Kim,
    • Hyun Sook Lee &
    • Jung-Hyun Lee
  5. Ocean Science and Technology School, Korea Maritime University, Busan, Republic of Korea.

    • Sun-Shin Cha
  6. Theragen BiO Institute, TheragenEtex, Suwon, Republic of Korea.

    • Junsu Ko,
    • Hyunmin Kim,
    • Hyun-Ju Jung,
    • Tae Hyung Kim,
    • Kung Ahn,
    • Jesse Cooper,
    • Sin-Gi Park,
    • Chang Pyo Hong &
    • Jong Bhak
  7. Shaanxi Yulin Energy Group Co. Ltd., Yulin, Shaanxi, China.

    • Haolong Hou
  8. Department of Molecular Medicine, School of Medicine, Gachon University, Incheon, Republic of Korea.

    • Wook Jin
  9. Department of Biological Sciences, College of Natural Sciences, Pusan National University, Busan, Republic of Korea.

    • Heui-Soo Kim
  10. Laboratory of Genome Biology, Department of Animal Biotechnology, Konkuk University, Seoul, Republic of Korea.

    • Chankyu Park &
    • Kyooyeol Lee
  11. Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA.

    • Sung Chun
  12. Marine Mammal and Turtle Division, Southwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, La Jolla, California, USA.

    • Phillip A Morin
  13. Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia.

    • Stephen J O'Brien
  14. College of Veterinary Medicine, Seoul National University, Seoul, Republic of Korea.

    • Hang Lee
  15. Department of Anatomy and Cell Biology, College of Veterinary Medicine, Seoul National University, Seoul, Republic of Korea.

    • Jumpei Kimura
  16. Marine Biodiversity Institute of Korea (MABIK), Ministry of Ocean and Fisheries, Sejong, Republic of Korea.

    • Dae Yeon Moon
  17. Evolutionary Ecology Group, Department of Zoology, University of Cambridge, Cambridge, UK.

    • Andrea Manica
  18. Department of Molecular Genetics and Microbiology, University of New Mexico Health Sciences Center, Albuquerque, New Mexico, USA.

    • Jeremy Edwards
  19. School of Systems Biomedical Science, Soongsil University, Seoul, Republic of Korea.

    • Sangsoo Kim
  20. Department of Biology, University of Copenhagen, Copenhagen, Denmark.

    • Jun Wang
  21. King Abdulaziz University, Jeddah, Saudi Arabia.

    • Jun Wang
  22. Program in Nano Science and Technology, Department of Transdisciplinary Studies, Seoul National University, Suwon, Republic of Korea.

    • Jong Bhak
  23. Advanced Institutes of Convergence Technology Nano Science and Technology, Suwon, Republic of Korea.

    • Jong Bhak


The whale genome project was initiated by KIOST. Research collaboration and analysis were carried out by KIOST, the Genome Research Foundation, BGI and the institutes of other participating authors within the whale genome consortium. Jung-Hyun Lee, H.S.L. and J.B. supervised the project. H.-S.Y. coordinated the project. P.A.M., H.L., J. Kimura and D.Y.M. provided samples, advice and associated information. Library construction, sequencing and genome assembly for the draft reference genome were carried out by Y.Z., E.L. and S.Z. Several cetacean genome resequencings were performed by H.-J.J. Experimental validations were performed by S.G.K., W.J. and K.A. Bioinformatics data processing and analyses of genetic variation data were carried out by X.G., Z.W., Y.C., H.H., M.C., A.J., L.Y., S.L., Y.S.C., J.-Y.J., S.-S.C., H.-M.O., Jae-Hak Lee, E.C.Y., K.K.K., Y.J.K., T.W.K., W.K., J.H.J., S.-J.K., D.H.C., S.J., H.-M.K., J. Ko, H.K., T.H.K., J.C., S.-G.P., Y.-A.S., C.P.H., S.C., H.-S.K., K.L. and C.P. H.-S.Y., Y.S.C., X.G., Z.W., S.G.K., J.-Y.J., S.-S.C., H.-M.O., Jae-Hak Lee, E.C.Y., K.K.K., Y.J.K., T.W.K., W.K., J.H.J., S.-J.K., J.W., S.J.O., A.M., J.E., S.K., B.C.K., H.S.L., Jung-Hyun Lee and J.B. wrote, edited and revised the manuscript.

Competing financial interests

The authors declare no competing financial interests.


Supplementary information

PDF files

  1. Supplementary Text and Figures (5,207 KB)

    Supplementary Figures 1-41, Supplementary Tables 1-57 and Supplementary Note
