The mutation spectrum revealed by paired genome sequences from a lung cancer patient


Lung cancer is the leading cause of cancer-related mortality worldwide, with non-small-cell lung carcinomas in smokers being the predominant form of the disease1, 2. Although previous studies have identified important common somatic mutations in lung cancers, they have primarily focused on a limited set of genes and have thus provided a constrained view of the mutational spectrum3, 4, 5, 6, 7, 8. Recent cancer sequencing efforts have used next-generation sequencing technologies to provide a genome-wide view of mutations in leukaemia, breast cancer and cancer cell lines9, 10, 11, 12, 13. Here we present the complete sequences of a primary lung tumour (60× coverage) and adjacent normal tissue (46×). Comparing the two genomes, we identify a wide variety of somatic variations, including >50,000 high-confidence single nucleotide variants. We validated 530 somatic single nucleotide variants in this tumour, including one in the KRAS proto-oncogene and 391 others in coding regions, as well as 43 large-scale structural variations. These constitute a large set of new somatic mutations and yield an estimated 17.7 per megabase genome-wide somatic mutation rate. Notably, we observe a distinct pattern of selection against mutations within expressed genes compared to non-expressed genes and in promoter regions up to 5 kilobases upstream of all protein-coding genes. Furthermore, we observe a higher rate of amino acid-changing mutations in kinase genes. We present a comprehensive view of somatic alterations in a single lung tumour, and provide the first evidence, to our knowledge, of distinct selective pressures present within the tumour environment.
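
As a rough illustration of how a per-megabase rate such as the one quoted above relates to a raw variant count, the following minimal Python sketch divides a number of variants by the amount of sequence surveyed. The count of 50,675 is the high-confidence somatic SNV total given in the Figure 2 legend; the 2.86 Gb denominator is an assumption made here for illustration only, and the authors' actual estimation procedure may differ.

```python
def mutations_per_megabase(n_mutations: int, n_bases: float) -> float:
    """Return the number of mutations per 1,000,000 bases surveyed."""
    if n_bases <= 0:
        raise ValueError("n_bases must be positive")
    return n_mutations / (n_bases / 1_000_000)


# High-confidence somatic SNV count reported in the Figure 2 legend.
HIGH_CONFIDENCE_SNVS = 50_675

# ASSUMPTION: about 2.86 Gb of surveyed sequence. This denominator is
# illustrative and is not stated in the text.
ASSUMED_SURVEYED_BASES = 2.86e9

rate = mutations_per_megabase(HIGH_CONFIDENCE_SNVS, ASSUMED_SURVEYED_BASES)
print(f"{rate:.1f} somatic SNVs per Mb")
```

The same function applies to any sub-region (for example, expressed versus non-expressed genes), provided the denominator is the number of bases actually surveyed in that region.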

At a glance


  1. Figure 1: The genomic landscape of somatic alterations.

    a–d, Various types of genomic profiles of the adenocarcinoma sample in this study. a, Experimentally confirmed somatic structural variations. Red lines indicate interchromosomal structural variations whereas blue lines represent intrachromosomal structural variations. b, Regions of loss of heterozygosity and allelic imbalance are in green and were based on the Affymetrix SNP 6.0 array data. c, Copy number profiles were derived from the Agilent array data with red indicating copy number gain and blue representing copy number loss (scale range: 0 to 4 copies). d, Each red dot represents the number of high-confidence somatic SNVs in a 1 Mb window. This figure was created using the Circos program28.

  2. Figure 2: Somatic single-nucleotide mutation trends and patterns.

    a, Somatic mutations are primarily G·C→T·A transversions. Distribution of specific nucleotide changes among germline and somatic variations in the lung genome. G·C→T·A transversions account for 46% of high-confidence somatic mutations, whereas most germline variations are A·T→G·C or G·C→A·T transitions. b, Expressed genes have a lower mutation rate than non-expressed genes. Genes that are expressed in the tumour sample, as determined by microarray data, have a mutation rate of 8.3 per Mb (including introns and 3′ and 5′ UTRs) that is substantially lower than the mutation rate of 17.5 per Mb observed in unexpressed genes. Mutation rates in transcribed strands (pink bars) are lower than those in non-transcribed strands (blue bars). c, Promoters are depleted for somatic mutations. To obtain the mutation rates in the promoter regions, we examined the 50,675 high-confidence somatic variations and the same number of randomly selected germline and simulated mutations, and calculated the number of variations per Mb in the regions immediately upstream of transcription start sites. In regions up to 5 kb upstream of transcription start sites, there are significantly fewer somatic mutations than germline variations or random variations. Error bars represent standard deviation of the mutation rates from 1,000 random samplings.

  3. Figure 3: A model for how the multiplicity of mutations within the MAPK cascade may act together to drive constitutive pro-growth signalling.

    Red shapes indicate amplification, purple ovals indicate loss of heterozygosity and/or deletion, black stars indicate mutation, grey ovals indicate no detected changes. The tumour harbours an activating point mutation in KRAS as well as copy number gains in KRAS and EGFR. Furthermore, there are high-level copy number gains of SHC1, GRB2, SOS, ARAF, MAP3K3 and ELK1, suggesting that there are at least eight potentially activating genetic lesions within this particular pathway29. MAP3K3 (not displayed in the figure) can act as a branch point between MAPK and SAPK signalling. This tumour exhibits multiple activating signals via high-level copy number gains within the p38 pathway including MKK6 and p38 itself. Also, there is a point mutation in DUSP22, a negative regulator of p38 signalling. ERK and p38 directly affect the transition from G1 to S phase by transcriptionally activating MYC and MAX, which regulate cyclin D2 (CCND2) transcription. The p38 cascade activates NFKB, which also activates CCND2 transcription. This tumour harbours a potentially inactivating mutation in NFKBIA, which normally prevents NF-κB from entering the nucleus. Active p38 signal transduction plus loss of NFKBIA could lead to aberrant activation of CCND2 transcription via MYC/MAX and NF-κB. Furthermore, activated ELK1 (via MAPK) leads to transcription of FOS. MYC/MAX and FOS/JUN (AP1) collaborate to transcribe CDK4. Cyclin D2 binds to and activates CDK4, a complex regulated by the cyclin-dependent kinase inhibitor 4A (p16). This sample showed copy number losses on 9p21, which contains the tumour suppressor gene p16. Thus, this tumour has multiple activating and inactivating hits that may drive oncogenic signal transduction through MAPK and related pathways, thereby overwhelming the G1/S cell cycle checkpoint and leading to unregulated cellular proliferation.
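
The base-pair substitution classes used in the Figure 2a legend (for example, G·C to T·A transversions) can be tallied with a short script. The sketch below is one possible way to do it, not the authors' pipeline: it collapses each single-base change and its reverse complement into one of six strand-symmetric classes and reports the fraction of calls in each. The function names and the toy list of calls are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}


def substitution_class(ref: str, alt: str) -> str:
    """Collapse a single-base change into one of six base-pair classes."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Report every change from the purine strand, so that C>A and G>T
    # both fall into the same class.
    if ref not in PURINES:
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}·{COMPLEMENT[ref]}→{alt}·{COMPLEMENT[alt]}"


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    ref, alt = ref.upper(), alt.upper()
    return (ref in PURINES) == (alt in PURINES)


def spectrum(changes):
    """Return the fraction of calls falling into each substitution class."""
    counts = Counter(substitution_class(r, a) for r, a in changes)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical toy calls as (reference base, variant base) pairs.
toy_calls = [("G", "T"), ("C", "A"), ("C", "A"), ("A", "G"), ("G", "A")]
for cls, frac in sorted(spectrum(toy_calls).items()):
    kind = "transition" if is_transition(cls[0], cls[4]) else "transversion"
    print(f"{cls}: {frac:.0%} ({kind})")
```

Collapsing complementary changes is the usual convention when the mutated strand is unknown; a strand-aware analysis, such as the transcribed versus non-transcribed comparison in Figure 2b, would need gene orientation as an additional input.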

Accession codes

Primary accessions


Gene Expression Omnibus


  1. Parkin, D. M., Bray, F., Ferlay, J. & Pisani, P. Global cancer statistics, 2002. CA Cancer J. Clin. 55, 74–108 (2005)
  2. Herbst, R. S., Heymach, J. V. & Lippman, S. M. Lung cancer. N. Engl. J. Med. 359, 1367–1380 (2008)
  3. Campbell, P. J. et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nature Genet. 40, 722–729 (2008)
  4. Davies, H. et al. Somatic mutations of the protein kinase gene family in human lung cancer. Cancer Res. 65, 7591–7595 (2005)
  5. Ding, L. et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature 455, 1069–1075 (2008)
  6. Greenman, C. et al. Patterns of somatic mutation in human cancer genomes. Nature 446, 153–158 (2007)
  7. Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724 (2009)
  8. Weir, B. A. et al. Characterizing the cancer genome in lung adenocarcinoma. Nature 450, 893–898 (2007)
  9. Mardis, E. R. et al. Recurring mutations found by sequencing an acute myeloid leukemia genome. N. Engl. J. Med. 361, 1058–1066 (2009)
  10. Ley, T. J. et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 456, 66–72 (2008)
  11. Shah, S. P. et al. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature 461, 809–813 (2009)
  12. Pleasance, E. D. et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463, 191–196 (2009)
  13. Pleasance, E. D. et al. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184–190 (2010)
  14. Hecht, S. S. Tobacco smoke carcinogens and lung cancer. J. Natl Cancer Inst. 91, 1194–1210 (1999)
  15. Chu, P. G. & Weiss, L. M. Expression of cytokeratin 5/6 in epithelial neoplasms: an immunohistochemical study of 509 cases. Mod. Pathol. 15, 6–10 (2002)
  16. Tan, D. et al. Thyroid transcription factor-1 expression prevalence and its clinical implications in non-small cell lung cancer: a high-throughput tissue microarray and immunohistochemistry study. Hum. Pathol. 34, 597–604 (2003)
  17. Wistuba, I. I. & Gazdar, A. F. Lung cancer preneoplasia. Annu. Rev. Pathol. 1, 331–348 (2006)
  18. Drmanac, R. et al. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327, 78–81 (2010)
  19. Forbes, S. A. et al. The catalogue of somatic mutations in cancer (COSMIC). Curr. Protoc. Hum. Genet. doi:10.1002/0471142905.hg1011s57 (2008)
  20. Stenson, P. D. et al. The human gene mutation database: 2008 update. Genome Med. 1, 13 (2009)
  21. Hicks, J. et al. Novel patterns of genome rearrangement and their association with survival in breast cancer. Genome Res. 16, 1465–1479 (2006)
  22. Beroukhim, R. et al. Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proc. Natl Acad. Sci. USA 104, 20007–20012 (2007)
  23. Bignell, G. R. et al. Architectures of somatic genomic rearrangement in human cancer amplicons at sequence-level resolution. Genome Res. 17, 1296–1303 (2007)
  24. Soda, M. et al. Identification of the transforming EML4–ALK fusion gene in non-small-cell lung cancer. Nature 448, 561–566 (2007)
  25. Lin, E. et al. Exon array profiling detects EML4–ALK fusion in breast, colorectal, and non-small cell lung cancers. Mol. Cancer Res. 7, 1466–1476 (2009)
  26. Rowley, J. D. A new consistent chromosomal abnormality in chronic myelogenous leukaemia identified by quinacrine fluorescence and Giemsa staining. Nature 243, 290–293 (1973)
  27. Tomlins, S. A. et al. Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science 310, 644–648 (2005)
  28. Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009)
  29. Dhillon, A. S., Hagan, S., Rath, O. & Kolch, W. MAP kinase signalling pathways in cancer. Oncogene 26, 3279–3290 (2007)

Author information


  1. Department of Bioinformatics and Computational Biology, Genentech Inc., South San Francisco, California 94080, USA

    • William Lee,
    • Zhaoshi Jiang,
    • Jinfeng Liu,
    • Peter M. Haverty,
    • Peng Yue,
    • Yan Zhang,
    • Colin Watanabe,
    • Robert Gentleman &
    • Zemin Zhang
  2. Department of Molecular Biology, Genentech Inc., South San Francisco, California 94080, USA

    • Yinghui Guan,
    • Jeremy Stinson,
    • Deepali Bhatt,
    • Connie Ha,
    • Frederic J. de Sauvage,
    • Zora Modrusan &
    • Somasekar Seshagiri
  3. Complete Genomics Inc., Mountain View, California 94043, USA

    • Krishna P. Pant,
    • Michael I. Kennemer,
    • Igor Nazarenko,
    • Andrew B. Sparks,
    • Dennis G. Ballinger &
    • Radoje Drmanac
  4. Department of Pathology, Genentech Inc., South San Francisco, California 94080, USA

    • Stephanie Johnson &
    • Howard Stern
  5. Department of Oncology Diagnostics, Genentech Inc., South San Francisco, California 94080, USA

    • Sankar Mohan,
    • David S. Shames &
    • Ajay Pandita


W.L., project coordination, SNV and overall data analysis and preparation of manuscript; Z.J., structural variation analysis and preparation of manuscript; J.L., mutation pattern and trend analysis, loss of heterozygosity analysis, expression analysis and preparation of manuscript; P.M.H., copy number/loss of heterozygosity analysis, pathway analysis, expression analysis and preparation of manuscript; P.Y., mutation analysis and preparation of manuscript; Y.G. and Z.M., PCR validation of structural variations; J.S., D.B. and S.S., MassArray mutation validation; Y.Z., bioinformatic prediction of mutations and data processing; K.P.P., M.I.K., I.N. and A.B.S., DNA nanoball preparation and sequencing, base calling, quality control and structural variation mapping; C.H. and Z.M., microarray data production; S.J. and H.S., sample handling and pathology analysis; C.W., structural variation breakpoint mapping; D.S.S., pathway analysis and data interpretation; R.G., manuscript critiques and statistical analysis; F.J.d.S., project coordination and manuscript commenting; A.P. and S.M., FISH analysis; R.D. and D.G.B., project coordination, data interpretation and manuscript commenting; Z.Z., project design, data interpretation and preparation of manuscript.

Competing financial interests

Authors are employees of either Genentech Inc. or Complete Genomics Inc. Employees of Complete Genomics have stock options in the company.

Corresponding author

Correspondence to:

Sequence data has been submitted to the NCBI Short Read Archive under accession number SRA012097. Microarray data has been submitted to the NCBI Gene Expression Omnibus under accession number GSE20585.

Supplementary information

PDF files

  1. Supplementary Information (4.8M)

    This file contains Supplementary Sections S1–S10, Supplementary References, legends for Supplementary Tables 1–7 and Supplementary Figures 1–17 with legends.

Excel files

  1. Supplementary Tables (10.7M)

    This file contains Supplementary Tables 1–7, including column descriptions. See Supplementary Information file for legends.

Additional data