Discovery of common Asian copy number variants using integrated high-resolution array CGH and massively parallel DNA sequencing

Journal name:
Nature Genetics
Year published:
Published online

Copy number variants (CNVs) account for the majority of human genomic diversity in terms of base coverage. Here, we have developed and applied a new method to combine high-resolution array comparative genomic hybridization (CGH) data with whole-genome DNA sequencing data to obtain a comprehensive catalog of common CNVs in Asian individuals. The genomes of 30 individuals from three Asian populations (Korean, Chinese and Japanese) were interrogated with an ultra-high-resolution array CGH platform containing 24 million probes. Whole-genome sequencing data from a reference genome (NA10851, with 28.3× coverage) and two Asian genomes (AK1, with 27.8× coverage and AK2, with 32.0× coverage) were used to transform the relative copy number information obtained from array CGH experiments into absolute copy number values. We discovered 5,177 CNVs, of which 3,547 were putative Asian-specific CNVs. These common CNVs in Asian populations will be a useful resource for subsequent genetic studies in these populations, and the new method of calling absolute CNVs will be essential for applying CNV data to personalized medicine.
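The idea of turning a relative aCGH ratio into an absolute copy number with a sequenced reference can be illustrated with a minimal sketch. It is not the authors' published algorithm. It assumes two things: read depth in the reference genome scales linearly with copy number, so the genome-wide mean depth corresponds to two copies; and an aCGH log2 ratio reflects test copy number divided by reference copy number. The `Segment` record, the function names and the 0.3 log2 threshold for a relative call are all hypothetical. The category labels loosely follow the 'obscure' and 'covert' CNVs of Figure 2b, and the gain/loss rule follows Figure 3 (copy number >2 or <2).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical threshold: an aCGH segment whose |log2 ratio| reaches this
# value is treated as a relative (aCGH) CNV call.
RELATIVE_CALL_THRESHOLD = 0.3


@dataclass
class Segment:
    chrom: str
    start: int
    end: int
    log2_ratio: float  # aCGH log2(test / reference) over the segment
    ref_depth: float   # mean read depth of the reference genome over the segment


def reference_copy_number(ref_depth: float, genome_mean_depth: float) -> int:
    """Reference copy number from read depth, assuming the genome-wide
    mean depth corresponds to two copies."""
    return round(2 * ref_depth / genome_mean_depth)


def absolute_copy_number(seg: Segment, genome_mean_depth: float) -> Optional[int]:
    """Rescale the relative aCGH ratio by the reference's own copy number.
    Returns None when the reference has zero copies, because the ratio
    cannot be rescaled in that case."""
    ref_cn = reference_copy_number(seg.ref_depth, genome_mean_depth)
    if ref_cn == 0:
        return None
    return round(ref_cn * 2 ** seg.log2_ratio)


def classify(seg: Segment, genome_mean_depth: float) -> Tuple[str, Optional[int]]:
    ref_cn = reference_copy_number(seg.ref_depth, genome_mean_depth)
    abs_cn = absolute_copy_number(seg, genome_mean_depth)
    if abs_cn is None:
        return "undetermined", None
    relative_call = abs(seg.log2_ratio) >= RELATIVE_CALL_THRESHOLD
    if relative_call and ref_cn == 2:
        # Diploid reference: the relative call is not distorted by the reference.
        return "unaffected", abs_cn
    if relative_call:
        # The reference itself is a CNV here ("obscure"); keep the call only
        # if the test genome's absolute copy number differs from two.
        return ("obscure-kept" if abs_cn != 2 else "obscure-removed"), abs_cn
    if abs_cn != 2:
        # No relative signal, but test and reference share a non-diploid
        # state: a "covert" CNV that aCGH alone would miss.
        return "covert", abs_cn
    return "non-CNV", abs_cn


def gain_or_loss(abs_cn: int) -> Optional[str]:
    """Gain if copy number > 2, loss if copy number < 2."""
    if abs_cn > 2:
        return "gain"
    if abs_cn < 2:
        return "loss"
    return None


if __name__ == "__main__":
    mean_depth = 30.0  # hypothetical genome-wide mean read depth of the reference
    demo = [
        Segment("chr3", 1_000, 5_000, 0.585, 30.0),  # diploid reference, test gain
        Segment("chr3", 6_000, 9_000, 1.0, 15.0),    # one-copy loss in the reference
        Segment("chr3", 10_000, 12_000, 0.0, 60.0),  # both genomes at four copies
    ]
    for seg in demo:
        label, cn = classify(seg, mean_depth)
        print(seg.chrom, seg.start, seg.end, label, cn)
```

For the three demo segments this yields an unaffected call of 3 copies, an obscure call removed as a non-CNV (2 copies) and a covert CNV of 4 copies. The second case shows why a reference CNV matters: the test genome looks like a gain relative to NA10851-like data, yet it is actually diploid.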

At a glance


    Figure 1: Overview of the CNV discovery project for Asian populations.

    Genomic DNA from ten Altaic Korean individuals, ten HapMap CHB individuals of Chinese ancestry, ten HapMap JPT individuals of Japanese ancestry and three individuals used for cross-platform comparison (AK1, NA12878 and NA19240) was used for aCGH experiments. Genome sequence data from three subjects (AK1, AK2 and NA10851) were used to filter out false-positive CNV calls and to obtain absolute CNV calls.

    Figure 2: Original approach for calling absolute copy number status.

    (a) Right: the top panel shows aCGH data for a genomic region on chromosome 3 in AK1 compared with the reference sample, NA10851. The second panel shows read-depth information for the same region, derived from whole-genome sequencing data of NA10851. The third panel shows the 'corrected' absolute copy number for this region in AK1, obtained with the absolute copy number calling method described in this study. The bottom panel shows the same region in AK1 using read-depth information derived from AK1 whole-genome sequencing data. (b) Comparison between relative copy number states and absolute copy number values for CNV segments, before and after correction for NA10851 copy number states. Of the 21,905 CNVs identified by aCGH (that is, by relative copy number states) in the 30 Asian individuals, the relative copy number values of 10,925 were not affected by CNVs in the NA10851 reference. Among the remaining 10,980 'obscure' CNVs, whose relative values were affected by CNVs in NA10851, 4,970 were determined to be non-CNVs by absolute calls and were removed from the final list of CNVs. The absolute copy number calling algorithm also identified an additional 3,164 'covert' CNVs that the aCGH experiments had initially missed.

    Figure 3: Frequency of copy number gains and losses among 33 individuals.

    (a) Distribution of absolute copy number gains (copy number >2) and losses (copy number <2) in 33 individuals. (b) Distribution of relative and absolute copy number gains and losses by CNV size. The x and y axes represent size and number of CNV segments, respectively.

    Figure 4: Putative Asian population–specific copy number variants.

    (a) Venn diagram showing validated putative Asian-specific CNVEs. The lower part of the figure (blue) shows the ethnic distribution of 4,959 CNVEs that the Genome Structural Variation Consortium2 discovered with a 42M NimbleGen aCGH platform and validated with a genotyping microarray. The upper part of the figure shows that 3,547 of the 5,177 CNVEs found among the 30 Asian individuals in this study do not overlap, by even 1 bp, the CNVEs recently reported by the Genome Structural Variation Consortium2. The Consortium reported 4,978 validated CNVEs, but only 4,959 are shown in this Venn diagram because 19 were nonpolymorphic. (b) Distribution of gene ontology categories for genes whose coding sequences overlap common copy number–gain regions (outer circle) and copy number–loss regions (inner circle) identified in the 30 Asian subjects. (c) CNVE location and number of Asian individuals involved (bar graph, right). Red, copy number gain; green, copy number loss. Selected genes and miRNAs are shown on the left.

    Figure 5: Effect of aCGH platforms in the CNV discovery.

    Absolute CNVs identified in two HapMap individuals (NA12878 and NA19240) in this study with the Agilent 24M aCGH array were compared with CNVs identified by genotyping microarray in the Genome Structural Variation Consortium data. We also compared CNVs in AK1 obtained with the Agilent 24M and NimbleGen 42M microarrays.
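Figure 4a counts a CNVE as putatively Asian-specific when it shares not even 1 bp with any CNVE reported by the Genome Structural Variation Consortium. A minimal sketch of such an overlap filter is given below. It assumes 1-based inclusive coordinates stored as (chromosome, start, end) tuples. The `OverlapIndex` class and `putatively_specific` function are hypothetical names, not part of the study's pipeline. A similar test could plausibly be used to compare call sets from different platforms, such as those compared in Figure 5.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import accumulate
from typing import Dict, Iterable, List, Tuple

# (chromosome, start, end) with 1-based inclusive coordinates (an assumption).
Interval = Tuple[str, int, int]


class OverlapIndex:
    """Answers: does any indexed interval share at least 1 bp with a query?"""

    def __init__(self, intervals: Iterable[Interval]) -> None:
        by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
        for chrom, start, end in intervals:
            by_chrom[chrom].append((start, end))
        self._starts: Dict[str, List[int]] = {}
        self._max_end: Dict[str, List[int]] = {}
        for chrom, spans in by_chrom.items():
            spans.sort()
            self._starts[chrom] = [s for s, _ in spans]
            # Running maximum of end coordinates over intervals sorted by start.
            self._max_end[chrom] = list(accumulate((e for _, e in spans), max))

    def overlaps(self, chrom: str, start: int, end: int) -> bool:
        starts = self._starts.get(chrom)
        if not starts:
            return False
        # Only indexed intervals that start at or before `end` can overlap;
        # one of them overlaps iff the largest end among them reaches `start`.
        i = bisect_right(starts, end)
        return i > 0 and self._max_end[chrom][i - 1] >= start


def putatively_specific(cnves: List[Interval], known: Iterable[Interval]) -> List[Interval]:
    """Keep CNVEs with no 1-bp overlap with any known CNVE."""
    index = OverlapIndex(known)
    return [c for c in cnves if not index.overlaps(*c)]
```

Sorting by start and keeping a running maximum of end coordinates turns each query into one binary search, avoiding a pairwise comparison of every new CNVE against every known one.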

Accession codes

Referenced accessions


Gene Expression Omnibus


  1. Iafrate, A.J. et al. Detection of large-scale variation in the human genome. Nat. Genet. 36, 949–951 (2004).
  2. Conrad, D.F. et al. Origins and functional impact of copy number variation in the human genome. Nature advance online publication, doi:10.1038/nature08516 (7 October 2009).
  3. Perry, G.H. et al. The fine-scale and complex architecture of human copy-number variation. Am. J. Hum. Genet. 82, 685–695 (2008).
  4. Redon, R. et al. Global variation in copy number in the human genome. Nature 444, 444–454 (2006).
  5. Kim, J.I. et al. A highly annotated whole-genome sequence of a Korean individual. Nature 460, 1011–1015 (2009).
  6. Conrad, D.F., Andrews, T.D., Carter, N.P., Hurles, M.E. & Pritchard, J.K. A high-resolution survey of deletion polymorphism in the human genome. Nat. Genet. 38, 75–81 (2006).
  7. Lindner, I. et al. Putative association between a new polymorphism in exon 3 (Arg109Cys) of the pancreatic colipase gene and type 2 diabetes mellitus in two independent Caucasian study populations. Mol. Nutr. Food Res. 49, 972–976 (2005).
  8. Shiffman, D. et al. Analysis of 17,576 potentially functional SNPs in three case-control studies of myocardial infarction. PLoS One 3, e2895 (2008).
  9. Cunninghame Graham, D.S. et al. Association of LY9 in UK and Canadian SLE families. Genes Immun. 9, 93–102 (2008).
  10. Larson, M.G. et al. Framingham Heart Study 100K project: genome-wide associations for cardiovascular disease outcomes. BMC Med. Genet. 8 Suppl 1, S5 (2007).
  11. Samuels, Y. et al. High frequency of mutations of the PIK3CA gene in human cancers. Science 304, 554 (2004).
  12. Lee, J.W. et al. PIK3CA gene is frequently mutated in breast carcinomas and hepatocellular carcinomas. Oncogene 24, 1477–1480 (2005).
  13. Colin, Y. et al. Genetic basis of the RhD-positive and RhD-negative blood group polymorphism as determined by Southern analysis. Blood 78, 2747–2752 (1991).
  14. Wang, Y.H. et al. Detection of RhD(el) in RhD-negative persons in clinical laboratory. J. Lab. Clin. Med. 146, 321–325 (2005).
  15. Kidd, J.M. et al. Mapping and sequencing of structural variation from eight human genomes. Nature 453, 56–64 (2008).
  16. Korbel, J.O. et al. Paired-end mapping reveals extensive structural variation in the human genome. Science 318, 420–426 (2007).
  17. Lohmueller, K.E. et al. Proportionally more deleterious genetic variation in European than in African populations. Nature 451, 994–997 (2008).
  18. Tuzun, E. et al. Fine-scale structural variation of the human genome. Nat. Genet. 37, 727–732 (2005).
  19. Aitman, T.J. et al. Copy number polymorphism in Fcgr3 predisposes to glomerulonephritis in rats and humans. Nature 439, 851–855 (2006).
  20. Burchard, E.G. et al. The importance of race and ethnic background in biomedical research and clinical practice. N. Engl. J. Med. 348, 1170–1175 (2003).
  21. Horowitz, R.E. Gastric cancer in Japan. N. Engl. J. Med. 359, 2393–2394, author reply 2394–2395 (2008).
  22. Hossain, P., Kawar, B. & El Nahas, M. Obesity and diabetes in the developing world—a growing challenge. N. Engl. J. Med. 356, 213–215 (2007).
  23. Jee, S.H. et al. Body-mass index and mortality in Korean men and women. N. Engl. J. Med. 355, 779–787 (2006).
  23. Jee, S.H. et al. Body-mass index and mortality in Korean men and women. N. Engl. J. Med. 355, 779787 (2006).


Author information

  1. These authors contributed equally to this work.

    • Hansoo Park,
    • Jong-Il Kim &
    • Young Seok Ju
  2. These authors jointly directed the project.

    • Charles Lee &
    • Jeong-Sun Seo


  1. Genomic Medicine Institute, Medical Research Center, Seoul National University, Seoul, Korea.

    • Hansoo Park,
    • Jong-Il Kim,
    • Young Seok Ju,
    • Sheehyun Kim,
    • Seungbok Lee,
    • Dongwhan Suh,
    • Dongwan Hong,
    • Hyunseok Peter Kang,
    • Yun Joo Yoo,
    • Jong-Yeon Shin,
    • Hyun-Jin Kim,
    • Maryam Yavartanoo,
    • Young Wha Chang &
    • Jeong-Sun Seo
  2. Department of Pathology, Brigham and Women's Hospital, Boston, Massachusetts, USA.

    • Hansoo Park,
    • Omer Gokcumen,
    • Ryan E Mills,
    • Jung-Sook Ha,
    • Wilson Chong,
    • Ga-Ram Hwang,
    • Katayoon Darvishi &
    • Charles Lee
  3. Harvard Medical School, Boston, Massachusetts, USA.

    • Hansoo Park,
    • Omer Gokcumen,
    • Ryan E Mills,
    • Jung-Sook Ha,
    • Katayoon Darvishi &
    • Charles Lee
  4. Psoma Therapeutics Inc., Seoul, Korea.

    • Hansoo Park,
    • Jong-Il Kim,
    • Jong-Yeon Shin &
    • Jeong-Sun Seo
  5. Department of Biochemistry and Molecular Biology, Seoul National University College of Medicine, Seoul, Korea.

    • Jong-Il Kim,
    • Young Seok Ju,
    • Maryam Yavartanoo &
    • Jeong-Sun Seo
  6. Macrogen Inc., Seoul, Korea.

    • Sheehyun Kim,
    • HyeRan Kim,
    • Song Ju Yang,
    • Kap-Seok Yang,
    • Hyungtae Kim &
    • Jeong-Sun Seo
  7. Department of Biomedical Sciences, Seoul National University Graduate School, Seoul, Korea.

    • Seungbok Lee,
    • Dongwhan Suh,
    • Hyun-Jin Kim &
    • Jeong-Sun Seo
  8. Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    • Matthew E Hurles,
    • Nigel P Carter &
    • Chris Tyler-Smith
  9. The Centre for Applied Genomics and Program in Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, Ontario, Canada.

    • Stephen W Scherer
  10. Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada.

    • Stephen W Scherer


J.-S.S. and C.L. planned and managed the project. H.P., J.-I.K., Y.S.J., O.G., R.E.M., Y.J.Y., J.-Y.S., J.-S.H., W.C., G.-R.H. and K.D. performed and analyzed the aCGH experiments. J.-I.K., Y.S.J., S.K., D.H. and H.-J.K. performed genome sequencing and analyzed the sequence data. D.S., S.L., M.Y., Y.W.C., HyeRan Kim, S.J.Y., K.-S.Y. and Hyungtae Kim performed validation experiments. M.E.H., S.W.S., N.P.C. and C.T.-S. assisted in data analyses. J.-S.S., C.L., H.P., J.-I.K., Y.S.J. and H.P.K. wrote the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to:


Supplementary information

PDF files

  1. Supplementary Text and Figures (3M)

    Supplementary Figures 1–11, Supplementary Tables 1–12 and Supplementary Note

  2. Supplementary Figure 3 (23M)

    Read-depth information for 721 validated CNVs in AK1 using data for AK1 and NA10851

Excel files

  1. Supplementary Table 3 (2M)

    Absolute CNVs of 30 Asians

  2. Supplementary Table 5 (40K)

    List of primers for qPCR and breakpoint sequencing experiments

  3. Supplementary Table 7 (1M)

    List of 5,177 CNVEs identified in the 30 Asian individuals studied

  4. Supplementary Table 8 (348K)

    List of OMIM genes in identified CNVs

  5. Supplementary Table 9 (40K)

    List of microRNAs overlapping the personal CNVs identified in the study

  6. Supplementary Table 10 (52K)

    List of fusion genes overlapping the personal CNVs identified in this study

  7. Supplementary Table 11 (116K)

    Modified PANTHER ontology analysis
