Detection of large-scale variation in the human genome

Iafrate, A John; Feuk, Lars; Rivera, Miguel N; Listewnik, Marc L; Donahoe, Patricia K; Qi, Ying; Scherer, Stephen W; Lee, Charles

doi:10.1038/ng1416

Brief Communication
Published: 01 August 2004


Nature Genetics volume 36, pages 949–951 (2004)

Abstract

We identified 255 loci across the human genome that contain genomic imbalances among unrelated individuals. Twenty-four variants are present in > 10% of the individuals that we examined. Half of these regions overlap with genes, and many coincide with segmental duplications or gaps in the human genome assembly. This previously unappreciated heterogeneity may underlie certain human phenotypic variation and susceptibility to disease and argues for a more dynamic human genome structure.

Main

Variation in the human genome is present in many forms, including single-nucleotide polymorphisms, small insertion-deletion polymorphisms, variable numbers of repetitive sequences, and genomic structural alterations¹. Molecular genetic and cytogenetic analyses have catalogued many variations in the human genome, but little is known about large-scale copy-number variations (LCVs) that involve gains or losses of several kilobases to hundreds of kilobases of genomic DNA among phenotypically normal individuals. To investigate these LCVs, we applied array-based comparative genomic hybridization (array CGH)² to the genomes of 55 unrelated individuals. The arrays contain selected large-insert DNA fragments spaced roughly every 1 Mb throughout the human genome. The test samples were genomic DNA from 39 unrelated healthy control individuals with normal karyotypes and from 16 individuals with previously characterized chromosomal imbalances³. We compared each sample with pooled male or female genomic DNA from karyotypically and phenotypically normal control individuals (Supplementary Methods online). Including samples from individuals with chromosomal imbalances allowed us to monitor the sensitivity and specificity of the experiments; we detected all expected abnormalities and excluded the regions of known imbalances from our analysis.
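As an illustration of how array CGH data are typically read, the following minimal sketch classifies each clone from the log2 ratio of test to reference signal. The cutoff of 0.3, the clone names and the intensities are all hypothetical; the thresholds actually used in this study are described in the Supplementary Methods and are not reproduced here.

```python
import math

# Hypothetical cutoff on the log2(test/reference) ratio; not taken from the paper.
LOG2_THRESHOLD = 0.3


def call_clone(test_signal: float, reference_signal: float,
               threshold: float = LOG2_THRESHOLD) -> str:
    """Classify one clone as 'gain', 'loss' or 'no change' from its log2 ratio."""
    log2_ratio = math.log2(test_signal / reference_signal)
    if log2_ratio >= threshold:
        return "gain"
    if log2_ratio <= -threshold:
        return "loss"
    return "no change"


# Invented intensities (test, reference) for three placeholder clones.
example_signals = {
    "clone_A": (1500.0, 1000.0),
    "clone_B": (1000.0, 1000.0),
    "clone_C": (600.0, 1000.0),
}

calls = {name: call_clone(test, ref) for name, (test, ref) in example_signals.items()}
print(calls)
```

A symmetric threshold is the simplest choice; a real analysis would usually also normalize the ratios and account for replicate spots before calling a clone.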

Our experiments identified 255 individual genomic clones that showed comparative gains or losses among the samples that we tested. Most of the clones seemed to be randomly distributed throughout the genome (Fig. 1 and Supplementary Table 1 online). On average, we observed 12.4 LCVs per individual. Most of these variants involve single large-insert genomic clones, suggesting that each of the identified LCVs may result from gains or losses involving as much as 2 Mb of DNA sequence (1 Mb to each flanking clone). We identified 102 LCVs (41%) that occurred in more than one individual and 24 LCVs that were present in > 10% of the individuals studied. The remaining 153 clones may represent LCVs that occur at lower frequencies. The genomic regions that we identified probably do not represent false positives, because control self-versus-self hybridization experiments indicated that there was less than 1 false positive clone for every two experiments (< 1 per 5,264 clones; Supplementary Fig. 1 online). Of the 102 recurring LCVs, 26 (25.5%) mapped to regions overlapping previously recognized segmental duplications^4,5. This proportion is significantly higher than that observed for all clones on the array (7.3%; P < 0.0001). Moreover, 13 (12.7%) of the recurring LCVs reside within 100 kb of gaps in the current presentation of the human genome sequence; this proportion is also significantly higher than the expected 3.6% for all the clones of the array (P < 0.0001). This suggests that the presence of these LCVs may complicate the accurate assembly of sequences at these chromosomal loci^3,5,6.
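The two enrichment comparisons above can be checked approximately with an exact one-sided binomial tail. The text does not name the statistical test that was used, so the choice of a binomial model here is an assumption, and the function name `binomial_upper_tail` is purely illustrative. The counts and background percentages are the ones reported in the paragraph.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


RECURRING_LCVS = 102

# label -> (observed count among recurring LCVs, background fraction for all array clones)
comparisons = {
    "segmental duplications": (26, 0.073),
    "within 100 kb of assembly gaps": (13, 0.036),
}

tail_probabilities = {}
for label, (observed, background) in comparisons.items():
    p_value = binomial_upper_tail(observed, RECURRING_LCVS, background)
    tail_probabilities[label] = p_value
    print(f"{label}: {observed}/{RECURRING_LCVS} = {observed / RECURRING_LCVS:.1%} "
          f"vs expected {background:.1%}, one-sided P = {p_value:.2e}")
```

Under this assumed model both tail probabilities are small, which is in line with the reported significance, although the exact values depend on the test chosen and on rounding of the background percentages.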

**Figure 1: Distribution of LCVs in the human genome.**

We found that 142 of 255 (56%) polymorphic clones overlapped with known coding regions and that 67 clones encompassed one or more entire genes. This suggests that LCVs are not limited to intergenic or intronic regions. Fourteen LCVs were located near loci associated with human genetic syndromes or with cancer (Supplementary Table 2 online). Because these variants exist in the genomes of phenotypically normal individuals, they may not be a direct cause of genetic disease, but their presence could lead to chromosomal rearrangements that give rise to disease^7,8,9 or more subtle phenotypic variation by influencing expression of specific genes¹⁰.

The most common LCV (identified in 49.1% of the individuals studied) encompassed the amylase alpha 1a and alpha 2a locus (AMY1A-AMY2A) at chromosome region 1p13.3 (ref. 11). We detected relative gains (in 23.6% of cases) and losses (in 25.5% of cases) at this locus and confirmed the array CGH results using metaphase-interphase fluorescence in situ hybridization (FISH), high-resolution fiber FISH and quantitative PCR (Fig. 2 and Supplementary Methods online). The length of this polymorphic region was estimated to range from 150 kb to 425 kb, and quantitative PCR results indicated that the length varied by a factor of 2.5 among the same individuals. Metaphase and interphase FISH data for this locus and 18 others indicated that each of the LCVs analyzed was confined to localized chromosomal regions (Supplementary Table 1 online). Therefore, the formation of these LCVs probably reflects, as with the amylase locus, tandem copy-number changes rather than duplication events involving other chromosomal loci. Eight identified LCVs map to regions previously described to exhibit variable copy numbers of genes or pseudogenes in the general population (Supplementary Table 2 online).
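The percentages reported for the amylase locus are consistent with whole-number counts out of the 55 individuals studied. The sketch below shows that arithmetic; the counts of 13 gains and 14 losses are inferred from the percentages and are not stated in the text.

```python
INDIVIDUALS = 55  # number of unrelated individuals examined

# Counts inferred from the reported percentages; they are not given in the text.
inferred_counts = {"gain": 13, "loss": 14}
inferred_counts["any change"] = inferred_counts["gain"] + inferred_counts["loss"]

# Convert each count to a percentage of the cohort, rounded to one decimal place.
percentages = {
    label: round(100 * count / INDIVIDUALS, 1)
    for label, count in inferred_counts.items()
}
print(percentages)  # {'gain': 23.6, 'loss': 25.5, 'any change': 49.1}
```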

**Figure 2: The most common polymorphism (BAC RP11-259N12) reflects differing numbers of tandem repeats in amylase genes.**

We have described more than 200 LCVs in the human genome. Twenty-four of these variants are present in > 10% of the individuals that we studied, and six of them are present at a frequency of > 20%. Because the array platform used for these studies covers only 12% of the total human genome sequence, denser arrays¹² will probably detect additional genomic variations. In contrast to most small-scale genetic polymorphisms, the LCVs described here may have an important functional effect on the evolution of the human genome. To catalog this large-scale genomic variation, we collated all the available information into a database (the Genome Variation Database), which will be a crucial resource in correlating these genomic variations with experimental findings and clinical outcomes.
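A rough back-of-the-envelope check of the array's genome coverage can be sketched as follows. The clone count per array is inferred from the figure of 5,264 clones for two experiments quoted earlier; the mean insert size of 150 kb and the genome size of about 3,100 Mb are assumptions made only for this illustration and do not come from the text.

```python
# Inferred from the text: 5,264 clones correspond to two hybridization experiments.
CLONES_PER_ARRAY = 5264 // 2

# Assumptions for illustration only (not stated in the text).
MEAN_INSERT_KB = 150   # assumed typical size of a large-insert clone
GENOME_MB = 3100       # approximate size of the human genome

covered_mb = CLONES_PER_ARRAY * MEAN_INSERT_KB / 1000
fraction = covered_mb / GENOME_MB
print(f"{CLONES_PER_ARRAY} clones x {MEAN_INSERT_KB} kb = {covered_mb:.0f} Mb, "
      f"about {fraction:.1%} of the genome")
```

Under these assumptions the estimate falls close to the 12% coverage stated above, which also shows why an array with more densely spaced clones would be expected to reveal further variants.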

URLs. The Genome Variation Database is available at http://projects.tcag.ca/variation/. The Human Recent Segmental Duplication Browser is available at http://projects.tcag.ca/humandup/.

Note: Supplementary information is available on the Nature Genetics website.

References

1. Wright, A.F. Nature Encyclopedia of the Human Genome vol. 2, 959–968 (Nature Publishing Group, London, 2003).
2. Albertson, D.G. & Pinkel, D. Hum. Mol. Genet. 12, R145–R152 (2003).
3. Scherer, S.W. et al. Science 300, 767–772 (2003).
4. Bailey, J.A. et al. Science 297, 1003–1007 (2002).
5. Cheung, J. et al. Genome Biol. 4, R25 (2003).
6. Eichler, E.E., Clark, R.A. & She, X. Nat. Rev. Genet. 5, 345–354 (2004).
7. Shaw, C.J. & Lupski, J.R. Hum. Mol. Genet. 13, R57–R64 (2004).
8. Giglio, S. et al. Am. J. Hum. Genet. 68, 874–883 (2001).
9. Osborne, L.R. et al. Nat. Genet. 29, 321–325 (2001).
10. Hollox, E.J., Armour, J.A. & Barber, J.C. Am. J. Hum. Genet. 73, 591–600 (2003).
11. Groot, P.C., Mager, W.H. & Frants, R.R. Genomics 10, 779–785 (1991).
12. Ishkanian, A.S. et al. Nat. Genet. 36, 299–303 (2004).

Acknowledgements

We thank J. Kim for biostatistical assistance. This work was supported in part by a grant from the Friends of the Dana-Farber Cancer Institute (C.L.), the Brigham and Women's Hospital Pathology Department training grant (A.J.I.), Genome Canada (S.W.S.) and a National Institutes of Health program project grant (P.K.D.). L.F. is supported by The Swedish Medical Research Council, and S.W.S. is an Investigator of the Canadian Institutes of Health Research and an International Scholar of the Howard Hughes Medical Institute.

Author information

Stephen W Scherer and Charles Lee: These authors contributed equally to this work.

Authors and Affiliations

Department of Pathology, Brigham and Women's Hospital, 20 Shattuck St., Thorn 6-28, Boston, 02115, Massachusetts, USA
A John Iafrate, Miguel N Rivera, Marc L Listewnik & Charles Lee
Harvard Medical School, Boston, 02115, Massachusetts, USA
A John Iafrate, Miguel N Rivera, Patricia K Donahoe & Charles Lee
Department of Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, M5G 1X8, Ontario, Canada
Lars Feuk, Ying Qi & Stephen W Scherer
Molecular and Medical Genetics, University of Toronto, Toronto, Ontario, Canada
Lars Feuk, Ying Qi & Stephen W Scherer
Department of Surgery and Pediatric Surgical Research Laboratories, Massachusetts General Hospital, Boston, 02114, Massachusetts, USA
Patricia K Donahoe

Corresponding author

Correspondence to Charles Lee.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Iafrate, A., Feuk, L., Rivera, M. et al. Detection of large-scale variation in the human genome. Nat Genet 36, 949–951 (2004). https://doi.org/10.1038/ng1416

Received: 14 May 2004
Accepted: 21 July 2004
Published: 01 August 2004
Issue Date: 01 September 2004
DOI: https://doi.org/10.1038/ng1416
