Introduction

Variations in genomic DNA make each individual unique in terms of disease susceptibility and response to drugs. Single nucleotide polymorphisms (SNPs) are the most widely studied form of genetic variation, and a few SNPs have been linked to disease susceptibility and serve as markers for certain disorders. Besides SNPs, submicroscopic copy-number variations (CNVs) are now considered an important form of genetic variation. Findings in the past few years have indicated a strong association of CNVs with several complex and common disorders that have a profound effect on health.

A CNV is a segment of DNA that is 1 kb or larger and is present at a variable copy number in comparison with a reference genome.1, 2 CNVs are generally stable and can be inherited, but they can also arise spontaneously during meiosis. The exact mechanism by which new CNVs arise is not clearly understood.3 Deletions, duplications, segmental duplications, insertions, inversions and translocations represent some of the processes resulting in CNVs (Figure 1). Variations in gene copy number can be detected using a variety of platforms, which have evolved rapidly in the recent past.4, 5
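The working definition above (a segment of at least 1 kb whose copy number differs from the reference) can be written as a small classification rule. The sketch below is only illustrative: the `Segment` type, the function name and the assumption of two reference copies (an autosomal segment in a diploid genome) are hypothetical choices and are not taken from any CNV-calling tool.

```python
from dataclasses import dataclass

MIN_CNV_LENGTH_BP = 1_000  # 1 kb lower bound from the definition in the text
REFERENCE_COPIES = 2       # assumption: autosomal segment, diploid reference


@dataclass(frozen=True)
class Segment:
    """A genomic segment with an estimated copy number (hypothetical type)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    copies: int  # estimated copy number in the sample

    @property
    def length(self) -> int:
        return self.end - self.start


def classify_segment(seg: Segment) -> str:
    """Label a segment by comparing its copy number with the reference."""
    if seg.length < MIN_CNV_LENGTH_BP:
        return "below CNV size threshold"
    if seg.copies < REFERENCE_COPIES:
        return "copy-number loss"
    if seg.copies > REFERENCE_COPIES:
        return "copy-number gain"
    return "reference copy number"


# Example: a 5 kb segment present in four copies is labeled as a gain.
print(classify_segment(Segment("chr4", 0, 5_000, 4)))
```

A rule of this kind captures only changes in dosage; balanced rearrangements such as inversions leave the copy number unchanged and would need breakpoint information to be recognized.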

Figure 1

Types of genomic variants. Genomic variants in the form of CNVs can be classified primarily as deletions, duplications, segmental duplications and inversions. These variations can encompass an entire gene or a segment of a gene, as represented in the figure.

Any of the processes mentioned above may disrupt genes in the affected region and may therefore result in disease. The phenotypic effect will, however, depend on how CNVs affect dosage-sensitive genes or their regulatory elements. It is also conceivable that the development of a disease phenotype depends not on a single CNV but on a combination of several CNVs and other genetic variations such as SNPs (Figure 2). Therefore, identifying all the susceptibility factors that affect disease development should give a better predictive diagnosis than correlating a disease with only a few susceptibility factors.

Figure 2

Genetic variations and susceptibility to polygenic diseases. A schematic representation of several genetic variants (SNPs, CNVs, INDELs, and so on) and the possible involvement of environmental factors in defining susceptibility to multifactorial diseases. A stronger correlation with disease susceptibility can be obtained by taking into account the various contributing factors, such as environmental factors and the different types of genomic variants, as depicted in the figure.

The challenge was to identify meaningful differences in genomic information between individuals. When the database of SNP variations in the human genome became available, it was soon realized that the available technologies were inadequate to detect other forms of variation such as CNVs, insertion–deletions (INDELs) and inversions. It was also realized that these other variations are present in the human genome at frequencies much higher than expected. At the same time, these variants were too small to be detected by microscopic techniques, and therefore a new set of convenient and cheaper technology platforms evolved, enabling the human genome to be mapped for CNVs.5 In 2006, Redon et al.6 constructed a first-generation CNV map of the human genome using SNP genotyping arrays and Whole-Genome TilePath BAC arrays. Among the 270 individuals studied, 12% of the human genome was found to be covered by CNVs, whereas the total number of CNVs included in the Database of Genomic Variants (DGV) was reported to account for 29.7% of the human genome, a figure that is often considered an overestimate. Some CNVs encompass single or multiple genes that contribute to a clinical phenotype, whereas smaller CNVs affecting single exons may also account for a proportion of human diseases.7 Recently, the 1000 Genomes Project produced a map of human genome variation based on population-scale genome sequencing. At the pilot scale, individuals were screened under three categories: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother–father–child trios; and exon-targeted sequencing of 697 individuals from seven populations. The project analyzed 15 million SNPs, 1 million short INDELs and 20 000 structural variants, describing the location, allele frequency and local haplotype structure of the variants. It also found that, on average, each person carries 250–300 loss-of-function variants in annotated genes and 50–100 variants previously implicated in inherited disorders.8
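A coverage figure such as "12% of the genome covered by CNVs" can be obtained, in principle, by taking the union of all CNV intervals and dividing its length by the genome length; overlapping entries have to be merged so that no base is counted twice. The sketch below illustrates that arithmetic for a single chromosome. The function names, the half-open coordinate convention and the toy intervals are assumptions made for illustration; this is not the procedure used by Redon et al. or by the DGV.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # (start, end), half-open, on one chromosome


def merged_length(intervals: Iterable[Interval]) -> int:
    """Total number of bases covered by the union of the intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the running block: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the running block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def fraction_covered(intervals: Iterable[Interval], genome_length: int) -> float:
    """Fraction of a sequence of the given length covered by the intervals."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return merged_length(intervals) / genome_length


# Toy example: two overlapping calls and one separate call on a 100-base sequence.
calls = [(0, 10), (5, 20), (30, 40)]
print(merged_length(calls))          # 30 bases, not 10 + 15 + 10 = 35
print(fraction_covered(calls, 100))  # 0.3
```

Simply summing the sizes of individual entries would give 35 bases here, which shows why a naive sum over database records is not the same as genome coverage.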

CNVs are cataloged in several public databases, such as the Toronto DGV (http://projects.tcag.ca/variation/) and the European Cytogeneticists Association Register of Unbalanced Chromosome Aberrations (ECARUCA; http://www.ecaruca.net). Another database, which archives clinically relevant CNVs, is DECIPHER (DatabasE of Chromosomal Imbalance and Phenotype in Human using Ensembl Resources; http://www.sanger.ac.uk/PostGenomics/decipher/).9 As of May 2009, over 2200 cases of >50 diseases had been included in DECIPHER.7 Structural variation in the human genome and its implications for human health have been recently reviewed by Stankiewicz and Lupski.10

Furthermore, it is likely that these numbers are underestimates and that advances in technology will help in discovering more CNVs in the human genome.4 Despite recent technical advances, the major limitations in studying CNVs remain the determination of their size, breakpoint positions, total number and gene content. It is conceivable that, with the availability of various accessible technologies, a high-resolution CNV map of the human genome will be available in the near future. However, understanding the implications of CNVs for health will have to await several large-scale correlation studies, not only of single CNVs but also of permutations and combinations of the various likely variations.

Techniques to detect CNVs

With the discovery of CNVs as a new form of genetic variation contributing to disease susceptibility and progression, the task is to detect these structural variations. A method for detecting CNVs should be convenient and inexpensive so that it can be applied to a large number of samples from a given population. Only when data are available from many different populations can the real impact of CNVs on health be assessed. The major hurdle in detection is that a large fraction of CNVs do not have defined breakpoints; well-demarcated breakpoints usually permit the development of simple detection methods around them. Besides conventional PCR, several modified PCR-based techniques have evolved that are considered robust assays for screening targeted regions of the genome. Among these quantitative methods are multiplex ligation-dependent probe amplification, multiplex amplification and probe hybridization, quantitative multiplex PCR of short fluorescent fragments, semiquantitative fluorescence in situ hybridization, dynamic allele-specific hybridization and the paralogue ratio test. The methodologies for detecting CNVs have been recently reviewed by Dhawan and Padh (2009).4

Genome-wide association studies have been widely used to identify genes and variants associated with disease processes, and are generally carried out in case versus control populations.1, 11 Evaluating the technical aspects of each method, together with its advantages and drawbacks, against the objectives of the study can help in selecting the appropriate technique.

CNV and pathophysiology

Recently, many CNVs have been reported to affect disease susceptibility. Among them are various complex neurological disorders, cardiovascular diseases, infectious and autoimmune diseases, metabolic diseases, cancer and several other common disorders (Table 1).

Table 1 CNV genes associated with diseases studied in different populations

CNV and neurological disorders

Parkinson's disease

Parkinson's disease (PD) is a progressive disease of the nervous system associated with symptoms such as muscular rigidity, resting tremors, bradykinesia (slowing of movements) and postural instability. It affects 1% of the population over 50 years of age.12 A novel triplication of the SNCA gene, located on 4q21, was found to be linked to autosomal dominant PD and showed a dosage effect. Further, gene expression profiling showed an approximately twofold increase of SNCA protein in blood and of SNCA mRNA in brain tissue, as well as the deposition of large aggregates of this protein in brain tissue.13

Gain of copies of the SNCA gene thus has a profound effect on PD. The SNCA triplication was reported in a family of Swedish American descent with autosomal dominant early-onset PD.13 An SNCA duplication was also reported in a Swedish family, suggesting a dosage effect of SNCA in selected cases of PD.13

Alzheimer's disease

Alzheimer's disease (AD) is a progressive neurological disorder characterized by dementia among elderly people, which is mainly due to intracellular tangles and extracellular amyloid plaques deposited in vulnerable regions of the brain.

Copy number gain of the APP gene (the amyloid precursor protein) is hypothesized to be one of the causes of AD.15 In the Dutch population, duplication of the APP gene has been reported and was found to be associated with autosomal dominant early-onset AD and cerebral amyloid angiopathy.14, 15, 16

It is interesting to note the association of AD with Down's syndrome, which is caused by trisomy 21. The APP gene maps to chromosome 21, so individuals with Down's syndrome carry three copies of the gene and are more prone to AD. Genetic variations in the form of point mutations in at least 15 genomic loci17 and genetic variations in the promoter region of the APP gene are also found to be associated with AD.18

Mental retardation and developmental disorders

Mental retardation (MR), a nonprogressive cognitive impairment, is also affected by CNVs at the genomic loci of several genes. Studies of large cohorts may help in detecting and confirming the roles of rare de novo CNVs in MR. MR occurs in 2–3% of newborns in the general population; however, its cause has remained elusive.19 In X-linked MR, six overlapping duplications at Xp11.22 were found in six unrelated males. The duplicated interval covered a 320-kb region involving four genes (SMC1A, RIBC1, HSD17B10 and HUWE1), three of which are candidates for conveying the MR phenotype.19 Apart from duplication, other forms of genetic variation, such as point mutations in the SMC1A and HUWE1 genes and a silent mutation in the HSD17B10 gene, conveyed the MR phenotype along with other distinguishing characteristics.20 These findings suggest that a dosage-sensitive gene confers the MR phenotype in patients carrying the duplication. MECP2, the X-linked methyl-CpG-binding protein 2 gene at Xq28, is also associated with developmental delay, MR and fatal infantile encephalopathy in males, and it has recently been reported that low copy number of the MECP2 gene confers a clinical phenotype of MR or developmental delay with altered neurological symptoms (particularly seizures) in males.21, 22, 23, 24 This is perhaps one of the most complex phenotypes, and a clear picture will emerge only when all possible loci and their interactions are well defined.

Apart from this, several other submicroscopic duplications and deletions in 17p13.3 involving the LIS1 and/or the 14-3-3ɛ genes have been shown to confer a risk of MR and several other characteristic features.25

Deletions in exonic regions of the NRXN1 gene, located on chromosome 2p16.3, are found to predispose to a wide spectrum of developmental disorders. Neurexins are a group of highly polymorphic cell surface proteins involved in synapse formation and signaling. Variants comprising missense mutations, translocations, whole-gene deletions and intragenic copy number changes show a significant association with a variety of phenotypes such as autism, schizophrenia (SZ) and nicotine dependence.26

Autism

Autism is a pervasive neurodevelopmental disorder characterized by impaired communication or linguistic skills, social interaction and cognition, together with some form of repetitive and restricted stereotyped interest, ritual or other behavior. The symptoms vary from person to person, and no two persons have identical symptoms; hence the term Autism spectrum disorder.27 A case–control study applied representational oligonucleotide microarray analysis to identify the CNVs involved in Autism. This technique was first explored for the detection of genomic aberrations in cancer and in healthy humans: oligonucleotide probes designed from the human genome sequence were arrayed and hybridized with representations from cancer genomes, which helped in detecting regions with altered copy number.28 Subsequently, this technique revealed more spontaneously arising CNVs in Autism spectrum disorder patients than in unaffected controls.29

Both duplications and deletions have been observed in Autism spectrum disorder. Several loci are associated with autism susceptibility, such as duplication of 15q11-13 (AUTS4; MIM 608636) and deletion of 16p11.2 (AUTS14; MIM 611913). A recurrent microdeletion and the reciprocal microduplication at 16p11.2 have been shown to be associated with autism and may account for 1% of cases.30 Homozygosity mapping in pedigrees revealed several large inherited homozygous deletions. One such deletion spans 886 kb on chromosome 3q24 and affects the DIA1 gene, whose expression level changes in relation to neuronal activity.31

Schizophrenia

Schizophrenia (SZ) is a chronic, debilitating illness with extensive neurological and psychiatric features. Its prevalence is 1% of the population. Several CNVs are found to be associated with SZ.32, 33

Deletions at 1q21.1, 15q11.2 and 15q13.3 were found to be associated with SZ and psychosis in case–control samples analyzed by the International Schizophrenia Consortium, and were subsequently confirmed by other studies.34, 35 Furthermore, the previously reported association of the 22q11.2 deletion, which underlies DiGeorge/velocardiofacial syndrome, with the SZ phenotype was also confirmed by these groups.36 Another group of researchers identified 90 CNVs in 54 patients, of which 13 were rare CNVs disrupting genes associated with SZ such as MYT1L, CTNND2 and ASTN2.37 A large cohort of samples is needed to confirm the association of these rare CNVs.

Bipolar disorder

Bipolar disorder is a psychiatric disorder characterized by profound and prolonged mood swings and depression. Submicroscopic variation in the GSK3β gene, which codes for glycogen synthase kinase 3β, a key component of the Wnt signaling pathway and a target of lithium salts, has been found to be involved in susceptibility to bipolar disorder. The duplication, or increase in copy number, of the GSK3β gene disrupts the 3′-coding element and also affects the neighboring genes. The study found a significant increase in GSK3β copy number in bipolar disorder patients as compared with controls.38

CNV and susceptibility to other common disorders

HIV/AIDS susceptibility

Chemokines are secreted proteins involved in immunoregulatory and inflammatory processes. The copy number of the CCL3L1 gene influences susceptibility to HIV/AIDS. CCL3L1, which maps to 17q12, encodes a ligand for CCR5, the major HIV coreceptor, and is a dominant HIV-suppressive chemokine. Therefore, an increase in the copy number of this gene leads to a reduction in susceptibility to HIV, as reported for the Caucasian population. The copy number of the CCL3L1 gene varies from 0 to 10 copies in the Caucasian population.39

The recruitment of lymphocytes by β-chemokines is a feature of autoimmune conditions such as rheumatoid arthritis. The finding that the CCR5Δ32 variant is associated with protection against rheumatoid arthritis led to the hypothesis that the copy number of the CCL3L1 gene influences susceptibility to rheumatoid arthritis and type 1 diabetes. In two independent Caucasian cohorts (New Zealand and UK populations), a high copy number (more than two copies) of the CCL3L1 gene was found to be a risk factor for rheumatoid arthritis.40

Crohn disease and psoriasis

Crohn disease (CD) is a chronic inflammatory bowel disease that causes inflammation of the digestive tract. It has been shown that deficient expression of defensins, which are endogenous antimicrobial peptides protecting the intestinal mucosa against bacterial invasion, can lead to chronic CD. Therefore, it was hypothesized that a low copy number of the β-defensin gene cluster may also be associated with chronic CD. Various other reported deletions and SNPs in genes have shown a strong correlation with CD. The HBD-2 (human beta-defensin 2) gene is found to be associated with CD: patients with ulcerative colitis and healthy individuals had a median of 4 copies per diploid genome (range 2–10 copies), whereas patients with CD had a lower copy number than controls (P=0.002). Further, individuals with fewer than three copies of the HBD-2 gene had a significantly higher risk of developing colonic CD than individuals with four or more copies (odds ratio of 3.06).41
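An odds ratio such as the 3.06 quoted above is computed from a 2×2 table of exposure (here, low copy number) against disease status. The following is a minimal sketch of that calculation. The function name is hypothetical and the counts in the example are invented purely to show the arithmetic; they are not the data behind the reported value.

```python
def odds_ratio(exposed_cases: int, unexposed_cases: int,
               exposed_controls: int, unexposed_controls: int) -> float:
    """Odds ratio from a 2x2 table.

    "Exposed" here would mean carrying the low-copy-number genotype;
    the counts passed in are the caller's responsibility.
    """
    if unexposed_cases == 0 or exposed_controls == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    # Odds of exposure among cases divided by odds of exposure among controls.
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical counts: 20 of 100 cases and 10 of 100 controls carry the
# low-copy genotype.
print(odds_ratio(exposed_cases=20, unexposed_cases=80,
                 exposed_controls=10, unexposed_controls=90))  # 2.25
```

An odds ratio above 1 indicates that the genotype is more common among cases than among controls; a confidence interval would be needed before drawing any conclusion from real data.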

In contrast, an increased copy number of β-defensin genes was shown to be associated with psoriasis, a chronic autoimmune skin disease with a prevalence of 2–3% in individuals of European ancestry.42, 43 Apart from β-defensin genes, individuals with a deletion of the LCE3B and LCE3C genes of the late cornified envelope (LCE) gene cluster are found to be susceptible to psoriasis. The absence of the well-characterized 32199 bp region was significantly associated (P=1.38E−08) with risk of psoriasis when studied in family-based samples from Spain, The Netherlands, Italy and the United States (P=5.4E−04).43

The immunity-related GTPase family M gene (IRGM) was found to be associated with CD through earlier SNP studies. Recently, a 20-kb deletion has been found to contribute to CD susceptibility. The deleted region is located upstream of the gene and is in perfect linkage disequilibrium with the CD-associated SNP. It has been hypothesized that, as IRGM expression can affect the autophagy of internalized bacteria, the deletion might alter the expression level of IRGM, thus contributing to the phenotype associated with CD.44

Pancreatitis

Pancreatitis is a disorder associated with multiple genes, including the cationic trypsinogen gene PRSS1. In earlier studies, the R122H missense mutation was found to increase the activity of trypsin in vitro, which led to the suggestion that PRSS1 might be a dosage-sensitive gene. Upon further analysis, a novel 605 kb triplication encompassing the PRSS1 gene was observed in a cohort of 34 French families with hereditary pancreatitis, resulting in CNV of the PRSS1 gene.45

Systemic lupus erythematosus and glomerulonephritis

Systemic lupus erythematosus (SLE) is a chronic autoimmune disease of connective tissues that affects the skin, joints, kidney and serosal membranes, owing to a failure in regulation of the immune system. A strong correlation was found between CNV of the FCGR3B gene and SLE: an increased risk of developing SLE was reported in individuals with fewer than two copies of the FCGR3B gene in UK cohorts, and the same correlation was confirmed in another Caucasian population.46 FcγR3B is a glycosylphosphatidylinositol-linked, low-affinity receptor for immunoglobulin G found predominantly on human neutrophils. Low copy number of the FCGR3B gene is associated with impaired clearance of immune complexes, which is a characteristic feature of SLE.47 Complement component 4 (C4, including C4A and C4B) gene mutations were also found to be associated with SLE. In Americans of European descent, the copy number of the C4 gene varied from 2 to 6 (C4A, 0–5; C4B, 0–4). The risk of SLE was increased in subjects with low C4 copy number and decreased in those with high C4 copy number.48

Asthma

Asthma is a chronic inflammatory genetic disorder marked by recurrent attacks of dyspnea and constriction of the bronchi. It is also termed bronchial asthma, whereas bronchial asthma due to allergy is called atopic asthma.

Genetic polymorphism of the glutathione S-transferase (GST) genes, which are mainly involved in antioxidant defense, is well known as a risk factor for several environmental diseases, and it was thus hypothesized that CNV in the GST genes might be associated with asthma susceptibility.49, 50 CNVs of the GST genes were examined in patients with atopic asthma, and the null genotypes of GSTT1 and GSTM1, together with the GSTP1 Val/Val polymorphism, were found to have a significant role in asthma pathogenesis.51

CNV and cardiovascular disease

Structural variations also affect susceptibility to many cardiovascular diseases. A common CNV in the LPA gene on chromosome 6, which encodes the atherogenic apolipoprotein (a), the primary determinant of plasma lipoprotein (a), is a risk factor for atherosclerosis.52, 53 Apart from this, CNVs were also found to be associated with lipoprotein disorders: the low-density lipoprotein receptor gene (LDLR) is affected in patients with familial hypercholesterolemia.53, 54

CNV and metabolic diseases

Type 2 diabetes

Type 2 diabetes is a chronic metabolic disorder characterized by high glucose levels or insulin resistance. Various SNPs have been linked to a high risk of diabetes, but other genetic variations such as CNVs had not been explored. Recently, it was discovered that CNV at the leptin receptor gene (LEPR), whose product is mainly involved in satiety and energy expenditure, is significantly associated with the risk of type 2 diabetes. The role of leptin and the leptin receptor in obesity is well established, and common genetic variations such as SNPs at the LEPR locus are associated with obesity, hyperinsulinemia, type 2 diabetes mellitus and variation in leptin levels in different populations. The expansion of the map of genetic variation has revealed many new loci associated with the disease. In a recent study, a region of 200 kb encompassing the LEPR locus on chromosome 1 was examined in the Korean population using genome-wide SNP array data, and CNV at the LEPR locus was found to be significantly associated with metabolic traits and the risk of type 2 diabetes mellitus.55

Overweight and obesity

People with a body mass index of 25 kg m−2 or more are considered overweight and those with a body mass index of 30 kg m−2 or more are considered obese. As per one estimate, 66% of the US population is overweight and among them about 30% are obese.56 Obesity is a complex metabolic disorder affecting multiple systems, and its cause is complex and unclear. It remains a multigenic and multifactorial condition influenced by the environment. It is conceivable that, besides environmental factors, several types of genetic variants such as SNPs, CNVs and others might be involved in precipitating obesity. In one instance, early-onset severe obesity in the Caucasian population was linked to a deletion of a 16p11.2 segment, resulting in severe hyperphagia and insulin resistance. Examination of the deleted region revealed genes such as SH2B1, known to be involved in signaling pathways involving leptin and insulin. In another case, CNV at 10q11.22 was found to be associated with body mass index. The region in question contains the PPYR1 gene, which regulates energy balance.57, 58
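The body mass index thresholds quoted above translate directly into a small calculation: weight in kilograms divided by the square of height in metres, compared against 25 and 30 kg m−2. The sketch below assumes the thresholds are inclusive lower bounds; the function names and category labels are illustrative only.

```python
OVERWEIGHT_THRESHOLD = 25.0  # kg m^-2, from the text
OBESE_THRESHOLD = 30.0       # kg m^-2, from the text


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight divided by height squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / (height_m ** 2)


def classify_bmi(value: float) -> str:
    """Map a BMI value to the categories used in the text.

    Assumption: the thresholds are treated as inclusive lower bounds.
    """
    if value >= OBESE_THRESHOLD:
        return "obese"
    if value >= OVERWEIGHT_THRESHOLD:
        return "overweight"
    return "not overweight"


# Example: 120 kg at 2.0 m gives a BMI of 30.0.
value = bmi(120.0, 2.0)
print(value, classify_bmi(value))
```

Under this convention the obese category is a subset of the overweight range, which matches the way the prevalence estimate is phrased in the text.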

Amylase gene

CNVs contribute not only to disease but are also linked to diet, as in the case of the salivary amylase gene (AMY1), whose copy number is associated with starchy food consumption in a population. It has been reported that a higher copy number of the AMY1 gene correlates positively with salivary amylase protein levels, and that populations with high-starch diets have a higher AMY1 copy number, which improves the digestion of starchy foods and may reduce the burden of intestinal diseases.59

CNV and drug metabolism

Cytochrome P450 (CYP450) is a superfamily of hemoproteins involved in metabolism of xenobiotics such as clinically used drugs, procarcinogens and environmental pollutants.

Among the superfamily of CYPs, CYP2A6 is an important human hepatic P450 enzyme involved in the metabolism of drugs and other compounds, including nicotine. CNVs such as duplications and deletions are found to be associated with smoking behavior and with susceptibility to lung cancer and tobacco-related diseases. The association of the deletion variant was studied in the Chinese population. The frequency of the deletion variant (CYP2A6del) was found to be 15.1% in 96 Chinese subjects, but only 1% in Finns (n=100) and 0.5% in Spaniards (n=100).60
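Variant frequencies of this kind are commonly expressed per chromosome rather than per person, that is, as the number of variant alleles divided by twice the number of individuals. The sketch below shows that arithmetic. It assumes the reported figures are allele frequencies in diploid individuals, which the text does not state, and the allele counts used are hypothetical values chosen only because they would reproduce the quoted percentages.

```python
def allele_frequency(variant_alleles: int, n_individuals: int, ploidy: int = 2) -> float:
    """Fraction of chromosomes carrying the variant allele."""
    total_alleles = n_individuals * ploidy
    if total_alleles <= 0:
        raise ValueError("need at least one individual")
    if not 0 <= variant_alleles <= total_alleles:
        raise ValueError("variant allele count out of range")
    return variant_alleles / total_alleles


# Hypothetical counts (not from the source): 29 deleted alleles among
# 96 individuals (192 chromosomes) would correspond to about 15.1%.
print(round(100 * allele_frequency(29, 96), 1))
# Likewise, 2 and 1 deleted alleles among 100 individuals would give 1% and 0.5%.
print(100 * allele_frequency(2, 100), 100 * allele_frequency(1, 100))
```

If the figures were instead carrier frequencies (the proportion of people with at least one deleted copy), the denominator would be the number of individuals and the counts would differ.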

Apart from deletion, a novel duplication was reported in the African-American population and was found to increase nicotine metabolism and may affect smoking behavior in contrast to the European-American, Korean or Japanese populations.61

Another gene studied in the Caucasian-American and in the African-American population is SULT1A1 gene, which catalyzes the sulfate conjugation of a wide variety of drugs and is found to be genetically polymorphic. Apart from established SNPs at the 5′-flanking region, it also shows CNV. The range of copies of SULT1A1 gene varies from one to approximately five copies in both the populations. When the enzyme activity in the human liver and platelet samples was checked it showed a positive correlation with the number of copies of SULT1A1 gene.62

Further, CNV mainly in the form of deletion has been reported in GSTs and confers susceptibility to cancers in various populations; the hypothesis is that the lack of these enzymes may impair metabolic elimination of various carcinogenic compounds, thereby increasing the risk of cancer. The phase II GSTs GSTT1, GSTM1 and GSTP1 catalyse glutathione-mediated reduction of exogenous and endogenous electrophiles. These GSTs have broad and overlapping substrate specificities, and it has been hypothesized that allelic variants associated with less effective detoxification of potential carcinogens may confer an increased susceptibility to cancer.63 Lack of GSTM1 enzyme may impair metabolic elimination of carcinogenic compounds from the body, thereby increasing cancer risk.64, 65

CNV and cancer

Understanding cancer genetics and identifying all possible variant alleles that might predispose to a variety of cancers is a prime objective. To this end, SNP studies have revealed several biomarkers associated with cancer and many other complex traits.66

In a search for common CNVs associated with malignancy, a map was created that cataloged all the known CNVs whose loci coincide with those of known cancer-related genes. The analysis found that 49 cancer genes were directly encompassed or overlapped by a CNV in more than one person in a large reference population. Further validation of the initial observation showed that many of the genes were reported in the DGV and that 40% of the cancer-related genes are disrupted by a CNV as analyzed by DGV. Thus, it can be proposed that structural variations are associated with cancer risk. Deletions and duplications in cancer-related genes are found to be polymorphic in different populations.67

One example of a CNV that affects cancer risk and confers a phenotypic effect is in the MTUS1 gene, which maps to chromosome 8p: a deletion of 1128 bp covers the entire exon 4 of the gene. When studied in the German population, the deletion of exon 4 of MTUS1 was found to be associated with slower progression of disease in both familial and high-risk breast cancer patients.68

There are many large structural variations that might predispose to cancer, but they have been less appreciated because the deletion and duplication breakpoints are not well characterized in many cases and PCR methods for detecting these large structural variations are not reliable. Therefore, to characterize the structural variations involved in cancer syndromes, newer methods such as multiplex ligation-dependent probe amplification need to be used, which allow the detection of copy number changes in a single gene or exon.

Recently, it has become clear that, apart from the constitutional mutations predisposing to cancer, there are several acquired (somatic) copy number alterations in the tumor genome, which need to be analyzed with the help of high-throughput technology.67

Conclusion

Human genome variation comprises SNPs as well as other variants such as CNVs, INDELs, inversions and larger structural alterations. SNPs have been widely studied in various populations, primarily because of the ease of detection in large numbers of samples. However, recent technological advances have opened up investigation of the other types of variation mentioned above. In the past few years, much has been learned about CNVs and their implications for health and disease. Observations from the comprehensive maps generated by Redon et al.6 and Jakobsson et al.69 have established CNVs as one of the prominent forms of genetic variation, with an important role in inter-individual and inter-ethnic differences in susceptibility to common and complex diseases.70

Further technical advances are still awaited for large-scale surveys of CNVs, INDELs and other variants in many racial and ethnic populations. CNVs are also thought to have a key role in evolution through gene duplication and exon reshuffling, contributing to major phenotypic consequences. Attempts to establish such variations and their link to health and disease will remain limited until reliable data on all such genetic variations are generated in several different populations. Only then will the genetic basis of disease development and drug response be reliably understood. It is also conceivable that a combination of several genetic variants dictates the development of complex diseases. Within these limitations, this review attempts to evaluate the link between known CNVs and the development of complex diseases. It is heartening that in several cases an association has been established between diseases and CNVs; however, the observations need to be independently replicated in other populations.

Future perspective

Until recently, SNPs were the basis for studying human genome variability and its contribution to phenotypic variation and disease susceptibility. However, with the discovery of other structural variations (deletions, duplications and inversions), a better understanding of genetic variability and disease susceptibility is emerging. CNVs contribute to phenotypic change or disease by various molecular mechanisms mentioned elsewhere. Through such mechanisms, structural variations confer susceptibility to various complex diseases, although the mechanism of disease development may not be apparent in several cases. Any new CNV locus has to be studied in different populations to catalog the extent of its diversity. The observation then has to be validated in a large sample using a simple, high-resolution method, yielding a distribution map of the CNV.

Another important aspect of the discovery of structural variation is its evolutionary perspective. For example, it has been reported that segmental duplication events affect genome variability between the chimpanzee and human genomes to a greater extent than single base-pair changes.71, 72, 73 Knowledge of CNV distribution among various populations should pave the way to understanding evolution and the mechanism by which new CNVs arise. Finally, it is very important to take into account all structural variations to fully understand the mechanisms underlying phenotype, disease development and human diversity. The available variants can then be studied in all possible permutations and combinations to identify the combination leading to a given phenotypic change. In the future, when this becomes possible, most phenotypes may be linked to a combination of a few variants. That will be the ultimate outcome of the human genome initiative started in the 1980s.