Copy-number variation — deleted or duplicated regions of DNA — is widespread in the human genome. A systematic population survey of the common variants provides an invaluable resource for further studies.
What makes people different? Much of the answer comes from inherited differences, and interpreting the extensive variation between people's genomes is a necessary part of understanding the human genome. Variation in the form of single base changes (single nucleotide polymorphisms, SNPs), and repetitive DNA, is already well documented. Adding an extra dimension to human genetic variation is the increasingly evident prevalence and functional importance of copy-number variation. Although most human DNA is present in exactly two copies per cell — one from each parent — some regions can be variably duplicated or deleted, leading to population variation in the number of copies inherited by different individuals. In a Nature paper that has just appeared online, Conrad et al.1 report a working map for frequent human copy-number variation. It is a landmark in providing an unprecedented combination of completeness and spatial resolution, and is likely to stand as a definitive resource for years.
This, though, is by no means the first genome-wide survey of human copy-number variation2,3,4,5. Previous investigations involving a technique called array-CGH — comparative genomic hybridization to microarrays of DNA targets — have detected numerous examples of copy-number variants (CNVs)2,3. Array-CGH involves hybridizing fluorescently labelled genomic DNA from the test individual simultaneously with DNA from a reference individual (also labelled, but differently) to a set of 'target' DNA sequences from different parts of the human genome. Where test and reference DNAs both have the same numbers of copies of a DNA sequence, that target will give a standard ratio of signals from the two fluorescent labels. If there is a different copy number, the ratio will shift — for example towards a lower representation of the test sample label for a region in which there is a deletion (Fig. 1).
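The ratio shift described above is usually read on a log2 scale. The following is a minimal sketch of the expected value for a single target, assuming an idealized, noise-free signal that is strictly proportional to copy number. The function name `expected_log2_ratio` and the default diploid reference are illustrative assumptions, not part of any of the cited studies.

```python
import math


def expected_log2_ratio(test_copies: int, reference_copies: int = 2) -> float:
    """Idealized log2(test / reference) signal ratio for one target.

    Assumption: fluorescence is strictly proportional to copy number,
    which real (noisy) hybridization data only approximate.
    """
    if test_copies <= 0 or reference_copies <= 0:
        raise ValueError("copy numbers must be positive for a finite log ratio")
    return math.log2(test_copies / reference_copies)


# One copy (heterozygous deletion), two copies (no change), three copies (duplication)
for copies in (1, 2, 3):
    print(copies, round(expected_log2_ratio(copies), 3))
```

Under these assumptions a heterozygous deletion shifts the ratio to -1, equal copy number gives 0, and a single extra copy gives a smaller positive shift, which is one reason duplications tend to be harder to call than deletions.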
However, because comparative hybridization has hitherto been measured using relatively large pieces of DNA, the extent of DNA involved in a deletion or duplication has often been defined imprecisely. Consequently, there are real difficulties in interpreting the precise locations of CNVs recorded in current databases; for example, are two independent reports of CNVs in approximately the same place detecting different variants or simply rediscovering the same one? A more precise alternative for discovering CNVs uses DNA sequencing to identify non-standard sequences around the junctions of deletions or duplications4,5. But even with the power of current sequencing technologies, relatively few individuals can be thoroughly surveyed using this method.
Conrad et al.1 solve the problem of comprehensively defining variation at high precision by introducing a step-change in the spatial resolution of genome-wide array-CGH. Despite the problems imposed by repetitive DNA in the human genome, their survey examined comparative hybridization at no fewer than 42 million locations, using a short, synthetic DNA target for each location tested — an average spacing of about 56 base pairs. The result is comparative intensity data for each synthetic target, which can be analysed for evidence of deletion or duplication. These data were noisy (and so needed to be averaged over several neighbouring probes to be reliable), but in practice the high density of coverage closes the gap between previous hybridization approaches and sequence-based discovery methods. This high-resolution platform was used to survey DNA from 40 unrelated individuals (20 Africans and 20 Europeans), giving a probability of better than 95% of finding CNVs present at a frequency of 5% or more.
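The sampling figure quoted above can be checked with a short calculation. This is a minimal sketch, assuming that each of the 80 chromosomes carried by 40 diploid individuals independently carries the variant with probability equal to its population frequency. That simple sampling model, and the function name `prob_variant_sampled`, are assumptions for illustration and not necessarily the authors' own power calculation.

```python
def prob_variant_sampled(frequency: float, individuals: int, ploidy: int = 2) -> float:
    """Probability that at least one sampled chromosome carries the variant.

    Assumption: chromosomes are independent draws from the population,
    each carrying the variant with probability `frequency`.
    """
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must lie between 0 and 1")
    chromosomes = individuals * ploidy
    return 1.0 - (1.0 - frequency) ** chromosomes


# 40 unrelated individuals, variant at 5% frequency
p = prob_variant_sampled(0.05, 40)
print(f"{p:.2f}")
```

Under this model the probability is about 0.98, consistent with the "better than 95%" figure. Note that this covers only whether the variant is present in the sample, not whether the array then detects it.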
Even applying conservative criteria for inferring CNVs, requiring ten consecutive targets to agree in reporting a deletion or duplication, nearly 12,000 putative variants were initially identified, with each individual tested differing in copy number from the reference sample at more than 1,000 distinct sites. More than 8,000 CNVs were then firmly established using a variety of validation methods. Most significantly, samples from the Wellcome Trust Case-Control Consortium disease-association study6 were independently typed for the CNVs; those results will be reported separately.
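The "ten consecutive targets" criterion can be illustrated with a simple run-length filter. This is only a sketch under stated assumptions: each target has already been reduced to a call of -1 (loss), 0 (no change) or +1 (gain), and the calls are ordered by genomic position. The function name `call_segments` and this per-target encoding are hypothetical, and the study's actual calling pipeline is not described here.

```python
from itertools import groupby


def call_segments(probe_calls, min_run=10):
    """Return (start, end, state) for runs of identical non-zero calls.

    probe_calls: sequence of -1 (loss), 0 (no change) or +1 (gain),
    ordered by genomic position. `end` is exclusive.
    Runs shorter than `min_run` targets are discarded.
    """
    segments = []
    index = 0
    for state, group in groupby(probe_calls):
        length = sum(1 for _ in group)
        if state != 0 and length >= min_run:
            segments.append((index, index + length, state))
        index += length
    return segments


# Twelve consecutive loss calls pass the filter; four gain calls do not.
calls = [0] * 5 + [-1] * 12 + [0] * 3 + [1] * 4 + [0] * 6
print(call_segments(calls))  # [(5, 17, -1)]
```

A strict threshold of this kind trades sensitivity for specificity: short variants spanning fewer than `min_run` targets are missed, which is why access to the underlying hybridization data matters for follow-up work.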
Collectively, the CNVs overlap about 13% of human genes. Some deletions remove entire genes; others will cause loss of gene function via frameshifts, in which the triplet DNA coding register is shifted backwards or forwards. Deletions or duplications, especially those affecting an entire gene, have a higher a priori probability of affecting the gene's function than individual SNPs. Conrad et al.1 immediately applied their new data to investigate the potential role of CNVs in disease, by cross-checking SNPs implicated in previous human disease studies against SNPs they found to be associated with CNVs. Could a local CNV be the real cause of some of these predispositions to disease (with the SNP acting as an indirect reporter)? If so, the SNP identified as over-represented in disease should correlate with chromosomes carrying a CNV. Reassuringly, this survey for CNV–SNP–disease associations produced a list including three well-established examples — CNVs associated with Crohn's disease7 (Fig. 1), psoriasis8 and obesity9. Other CNVs on the list then become strong candidates for constituting the functional basis of the observed associations of SNPs with other disorders. Although these might be invaluable leads for understanding particular disorders, the authors are clear that the CNVs cannot solve the 'missing heritability' problem: in even the best-worked cases of disorders for which genetic predispositions have been characterized, most of the total risk attributable to genetic factors remains unexplained.
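The SNP-CNV correlation check described above can be sketched with the standard squared-correlation measure of linkage disequilibrium, r². The example assumes phased chromosomes, with each marker coded as 0 or 1 per chromosome (for the CNV, 1 meaning the chromosome carries the deletion or duplication). The function name `r_squared` and the toy data are illustrative assumptions, not data or methods from the study.

```python
def r_squared(snp_alleles, cnv_alleles):
    """Squared correlation (r^2) between two biallelic markers.

    Each argument is a list of 0/1 values, one per phased chromosome,
    in the same chromosome order.
    """
    n = len(snp_alleles)
    if n == 0 or n != len(cnv_alleles):
        raise ValueError("inputs must be non-empty and of equal length")
    p_snp = sum(snp_alleles) / n
    p_cnv = sum(cnv_alleles) / n
    p_both = sum(s * c for s, c in zip(snp_alleles, cnv_alleles)) / n
    d = p_both - p_snp * p_cnv  # linkage disequilibrium coefficient D
    denom = p_snp * (1 - p_snp) * p_cnv * (1 - p_cnv)
    if denom == 0:
        raise ValueError("both markers must be polymorphic in the sample")
    return d * d / denom


# Toy data: eight chromosomes; the SNP allele tags the CNV on two of three carriers.
snp = [1, 1, 1, 0, 0, 0, 0, 0]
cnv = [1, 1, 0, 0, 0, 0, 0, 0]
print(round(r_squared(snp, cnv), 3))
```

An r² close to 1 means the SNP is a near-perfect proxy for the CNV, so a disease association reported for the SNP could equally reflect the CNV.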
This study1 has not found all human CNVs — the smallest CNVs, the less frequent CNVs and those embedded in complex, repetitive DNA will all have had a good chance of escaping detection. But Conrad et al.1 will have discovered and characterized nearly all the CNVs big enough and frequent enough to matter, probably including many that will prove to be involved in disease.
The authors also provide superb resources that will allow other researchers to use their data to find out more. These include a detailed listing of the genomic locations of the CNVs found, the genotypes of reference individuals and (most useful of all) a web-based archive of (nearly) raw data from the original 40 comparative hybridization experiments. Making hybridization data freely available allows others to undertake detailed analyses of specific regions, for example to investigate potential variants not meeting the strict criteria imposed in this study. The Single Nucleotide Polymorphism database (dbSNP) and International HapMap Project provide essential data for research into SNPs. Information from this study1 will likewise become the first-line source of CNV data for investigating human variation, genome evolution and disease genetics.
1. Conrad, D. F. et al. Nature doi:10.1038/nature08516 (2009).
2. Iafrate, A. J. et al. Nature Genet. 36, 949–951 (2004).
3. Redon, R. et al. Nature 444, 444–454 (2006).
4. Korbel, J. O. et al. Science 318, 420–426 (2007).
5. Kidd, J. M. et al. Nature 453, 56–64 (2008).
6. The Wellcome Trust Case Control Consortium. Nature 447, 661–678 (2007).
7. McCarroll, S. A. et al. Nature Genet. 40, 1107–1112 (2008).
8. de Cid, R. et al. Nature Genet. 41, 211–215 (2009).
9. Willer, C. J. et al. Nature Genet. 41, 25–34 (2009).