Lung cancer is the leading cause of cancer-related mortality worldwide, with non-small-cell lung carcinomas in smokers being the predominant form of the disease [1,2]. Although previous studies have identified important common somatic mutations in lung cancers, they have primarily focused on a limited set of genes and have thus provided a constrained view of the mutational spectrum [3–8]. Recent cancer sequencing efforts have used next-generation sequencing technologies to provide a genome-wide view of mutations in leukaemia, breast cancer and cancer cell lines [9–13]. Here we present the complete sequences of a primary lung tumour (60× coverage) and adjacent normal tissue (46× coverage). Comparing the two genomes, we identify a wide variety of somatic variations, including >50,000 high-confidence single nucleotide variants. We validated 530 somatic single nucleotide variants in this tumour, including one in the KRAS proto-oncogene and 391 others in coding regions, as well as 43 large-scale structural variations. These constitute a large set of new somatic mutations and yield an estimated genome-wide somatic mutation rate of 17.7 per megabase. Notably, we observe a distinct pattern of selection against mutations within expressed genes compared to non-expressed genes, and within promoter regions up to 5 kilobases upstream of all protein-coding genes. Furthermore, we observe a higher rate of amino acid-changing mutations in kinase genes. We present a comprehensive view of somatic alterations in a single lung tumour, and provide the first evidence, to our knowledge, of distinct selective pressures present within the tumour environment.
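The per-megabase mutation rate quoted above is a simple normalization: the number of somatic variants divided by the amount of sequence, in megabases, over which variants could be called. The sketch below illustrates that arithmetic only. The function name `mutations_per_megabase` is hypothetical, and the callable genome size is a placeholder assumption, since this section does not state the denominator the authors used; the result is therefore not expected to reproduce the published estimate.

```python
def mutations_per_megabase(variant_count: int, callable_bases: int) -> float:
    """Return the number of somatic variants per megabase of callable sequence."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return variant_count / (callable_bases / 1_000_000)


# Illustrative inputs only:
# 50,000 is the lower bound on high-confidence SNVs given in the abstract;
# 2.8 Gb of callable sequence is an assumed placeholder, not a figure from the paper.
rate = mutations_per_megabase(50_000, 2_800_000_000)
print(f"{rate:.1f} somatic SNVs per Mb (illustrative)")
```

Because the rate scales inversely with the denominator, the choice of callable region (whole genome, fully covered bases only, or coding sequence) changes the estimate directly, so the two inputs should always be reported together.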

Data deposits

Sequence data have been submitted to the NCBI Short Read Archive under accession number SRA012097. Microarray data have been submitted to the NCBI Gene Expression Omnibus under accession number GSE20585.


References

  1. Global cancer statistics, 2002. CA Cancer J. Clin. 55, 74–108 (2005)
  2. Lung cancer. N. Engl. J. Med. 359, 1367–1380 (2008)
  3. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nature Genet. 40, 722–729 (2008)
  4. Somatic mutations of the protein kinase gene family in human lung cancer. Cancer Res. 65, 7591–7595 (2005)
  5. Somatic mutations affect key pathways in lung adenocarcinoma. Nature 455, 1069–1075 (2008)
  6. Patterns of somatic mutation in human cancer genomes. Nature 446, 153–158 (2007)
  7. The cancer genome. Nature 458, 719–724 (2009)
  8. Characterizing the cancer genome in lung adenocarcinoma. Nature 450, 893–898 (2007)
  9. Recurring mutations found by sequencing an acute myeloid leukemia genome. N. Engl. J. Med. 361, 1058–1066 (2009)
  10. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 456, 66–72 (2008)
  11. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature 461, 809–813 (2009)
  12. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463, 191–196 (2010)
  13. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184–190 (2010)
  14. Tobacco smoke carcinogens and lung cancer. J. Natl Cancer Inst. 91, 1194–1210 (1999)
  15. Expression of cytokeratin 5/6 in epithelial neoplasms: an immunohistochemical study of 509 cases. Mod. Pathol. 15, 6–10 (2002)
  16. Thyroid transcription factor-1 expression prevalence and its clinical implications in non-small cell lung cancer: a high-throughput tissue microarray and immunohistochemistry study. Hum. Pathol. 34, 597–604 (2003)
  17. Lung cancer preneoplasia. Annu. Rev. Pathol. 1, 331–348 (2006)
  18. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327, 78–81 (2010)
  19. The catalogue of somatic mutations in cancer (COSMIC). Curr. Protoc. Hum. Genet. 10.1002/0471142905.hg1011s57 (2008)
  20. The human gene mutation database: 2008 update. Genome Med. 1, 13 (2009)
  21. Novel patterns of genome rearrangement and their association with survival in breast cancer. Genome Res. 16, 1465–1479 (2006)
  22. Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proc. Natl Acad. Sci. USA 104, 20007–20012 (2007)
  23. Architectures of somatic genomic rearrangement in human cancer amplicons at sequence-level resolution. Genome Res. 17, 1296–1303 (2007)
  24. Identification of the transforming EML4-ALK fusion gene in non-small-cell lung cancer. Nature 448, 561–566 (2007)
  25. Exon array profiling detects EML4-ALK fusion in breast, colorectal, and non-small cell lung cancers. Mol. Cancer Res. 7, 1466–1476 (2009)
  26. A new consistent chromosomal abnormality in chronic myelogenous leukaemia identified by quinacrine fluorescence and Giemsa staining. Nature 243, 290–293 (1973)
  27. Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science 310, 644–648 (2005)
  28. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009)
  29. MAP kinase signalling pathways in cancer. Oncogene 26, 3279–3290 (2007)


We thank T. Wu for critical reading of the manuscript, C. Santos for sample handling, M. Vasser and the DNA Synthesis Group for oligonucleotide synthesis, J. Turcotte and G. Cavet for coordination, G. Nilsen for data submission, J. Fitzgerald and A. Baucom for data storage, J. Lee for laboratory support, A. Bruce for graphical assistance, and T. Bhangale, S. Jhunhunwala and A. Halpern for discussion.

Author information


  1. Department of Bioinformatics and Computational Biology, Genentech Inc., South San Francisco, California 94080, USA

    • William Lee
    • Zhaoshi Jiang
    • Jinfeng Liu
    • Peter M. Haverty
    • Peng Yue
    • Yan Zhang
    • Colin Watanabe
    • Robert Gentleman
    • Zemin Zhang
  2. Department of Molecular Biology, Genentech Inc., South San Francisco, California 94080, USA

    • Yinghui Guan
    • Jeremy Stinson
    • Deepali Bhatt
    • Connie Ha
    • Frederic J. de Sauvage
    • Zora Modrusan
    • Somasekar Seshagiri
  3. Complete Genomics Inc., Mountain View, California 94043, USA

    • Krishna P. Pant
    • Michael I. Kennemer
    • Igor Nazarenko
    • Andrew B. Sparks
    • Dennis G. Ballinger
    • Radoje Drmanac
  4. Department of Pathology, Genentech Inc., South San Francisco, California 94080, USA

    • Stephanie Johnson
    • Howard Stern
  5. Department of Oncology Diagnostics, Genentech Inc., South San Francisco, California 94080, USA

    • Sankar Mohan
    • David S. Shames
    • Ajay Pandita



W.L., project coordination, SNV and overall data analysis and preparation of manuscript; Z.J., structural variation analysis and preparation of manuscript; J.L., mutation pattern and trend analysis, loss of heterozygosity analysis, expression analysis and preparation of manuscript; P.M.H., copy number/loss of heterozygosity analysis, pathway analysis, expression analysis and preparation of manuscript; P.Y., mutation analysis and preparation of manuscript; Y.G. and Z.M., PCR validation of structural variations; J.S., D.B. and S.S., MassArray mutation validation; Y.Z., bioinformatic prediction of mutations and data processing; K.P.P., M.I.K., I.N. and A.B.S., DNA nanoball preparation and sequencing, base calling, quality control and structural variation mapping; C.H. and Z.M., microarray data production; S.J. and H.S., sample handling and pathology analysis; C.W., structural variation breakpoint mapping; D.S.S., pathway analysis and data interpretation; R.G., manuscript critiques and statistical analysis; F.J.d.S., project coordination and manuscript commenting; A.P. and S.M., FISH analysis; R.D. and D.G.B., project coordination, data interpretation and manuscript commenting; Z.Z., project design, data interpretation and preparation of manuscript.

Competing interests

Authors are employees of either Genentech Inc. or Complete Genomics Inc. Employees of Complete Genomics have stock options in the company.

Corresponding author

Correspondence to Zemin Zhang.

Supplementary information

PDF files

  1. Supplementary Information

    This file contains Supplementary Sections S1–S10, Supplementary References, legends for Supplementary Tables 1–7 and Supplementary Figures 1–17 with legends.

Excel files

  1. Supplementary Tables

    This file contains Supplementary Tables 1–7, including column descriptions. See Supplementary Information file for legends.
