Introduction

Disturbances in the dopamine neurotransmitter system have long been suggested to play a crucial role in the pathogenesis of schizophrenia.1 However, involvement of dopamine-related genes in the development of this disorder has remained elusive. The dopamine D4 receptor gene (DRD4), located on chromosome 11p15.5,2, 3 has received considerable interest because clozapine, a neuroleptic that is often effective for treatment-resistant symptoms, has a high affinity for this receptor.4, 5 Also, the D4 receptor is upregulated in postmortem brain tissues from schizophrenic patients.6, 7, 8, 9 To investigate genetic association of DRD4 with psychiatric phenotypes and traits, many studies have focused on a 48-bp variable number of tandem repeats (VNTR) polymorphism in exon 3. This tandem repeat varies in length, comprising two (2R) to eleven (11R) 48-bp repeat units, and codes for the third intracellular loop of the receptor protein.10 The 7R variant protein has functional properties distinct from other size variants in terms of clozapine binding10 and effect on post-synaptic intracellular signal transduction.11, 12 Also, a growing literature reports genetic association of the allele 7R with psychiatric traits/illnesses including novelty seeking,13, 14 attention-deficit hyperactivity disorder (ADHD)15, 16 as well as schizophrenia.13, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 Some studies declared association of the VNTR size polymorphisms with schizophrenia,13, 17, 21, 22, 24, 27 although not all studies agree.18, 23, 25, 26, 28, 30, 31, 32, 33

Also, supporting the functional role of the VNTR is evidence of selection favoring the allele 7R of the VNTR.34, 35 It was shown that the ratio of non-synonymous (Ka) to synonymous (Ks) substitutions, Ka/Ks, is higher than 1 in this tandem repeat region, and that strong linkage disequilibrium (LD) exists between the allele 7R and the surrounding DRD4 polymorphisms. Further, on the basis of the inferred gene genealogy of VNTR alleles, the allele 7R was proposed to have originated as a rare event involving multiple mutations and gene conversions. The observed high frequency of the allele 7R in many populations of European ancestry, despite the complicated mechanism of its origin, added support to the notion of selection and functional importance of the VNTR. In world-wide populations, 35 sequence variants of the 48-bp repeat unit are known to exist, each of which is referred to as a ‘motif’. To date, 56 different VNTR haplotypes, each composed of a unique combination of motifs, have been reported in humans.35 However, no study has exploited sequence variation data from both human and chimpanzee samples in population genetics analyses.

In this study, we attempted to infer the ancestral haplotype precisely and to construct a gene tree of the VNTR by subcloning and sequencing both human and chimpanzee samples. We constructed a phylogenetic network of motifs using an in-house computer program and inferred a possible genetic relationship of the VNTR haplotypes. These analyses allowed us to re-evaluate the proposed notion of selection acting on this gene locus. We also studied the VNTR for association with schizophrenia in Japanese case-control and pedigree sample sets.

In this study, the term ‘allele’ means a length variant of the VNTR. The same length does not necessarily reflect the same VNTR sequence, as shown in several studies including this one (see the Results section below). A ‘motif’ refers to a sequence variant of the 48-bp repeat unit of the VNTR. The term ‘haplotype’ refers to a VNTR sequence, that is, its motif composition.

Materials and methods

Length analysis, subcloning and re-sequencing of the VNTR region

The VNTR locus was amplified by polymerase chain reaction (PCR) using fluorescent-labeled primers. PCR was carried out using an ABI 9700 thermocycler (Applied Biosystems, Foster City, CA). PCR fragments were analyzed on an ABI PRISM 3700 Genetic Analyzer (Applied Biosystems) and the genotype, or the constitution of size alleles, of each individual sample was determined using GeneScan 3.5.2 and Genotyper 3.6 software (Applied Biosystems). Subsequently, the VNTR sequences of 102 human chromosomes from control subjects as well as 20 chimpanzee (Pan troglodytes) chromosomes were determined. We subcloned a PCR product encompassing the entire VNTR region using the TOPO TA Cloning kit (Invitrogen, Carlsbad, CA), followed by direct sequencing using the DYEnamic ET terminator cycle sequencing kit (Amersham Biosciences, Piscataway, NJ) and the ABI PRISM 3730 Genetic Analyzer (Applied Biosystems). Sequences were aligned using the SEQUENCHER program (Gene Codes Corporation, Ann Arbor, MI).

The primer sequences and detailed information on the reaction conditions are available upon request.

Phylogenetic network analysis

Phylogenetic networks of human and chimpanzee motifs were constructed as described earlier36, 37 using an in-house computer program. Possible unequal crossing-over and nucleotide substitution events in the evolutionary history were inferred.

Subjects for a case-control study

For the case-control association analysis, samples from 570 unrelated cases of schizophrenia (285 men, 285 women; mean age 47.0±11.4 years), and 570 age- and sex-matched controls (285 men, 285 women; mean age 46.7±11.1 years) were analyzed.38 Furthermore, for the family-based association test, we studied 124 pedigree sample sets with 376 members, of whom 163 were affected. This included 80 independent and complete trios (schizophrenic offspring and their parents), 15 probands with one parent, 13 probands with affected siblings, and 30 probands with discordant siblings.39 Probands consisted of 72 males and 52 females.

The diagnosis of schizophrenia was made by consultation according to DSM-IV criteria with consensus from at least two experienced psychiatrists. All available medical records were taken into consideration. Control subjects were recruited from hospital staff and volunteers who showed no evidence of psychoses during brief interviews with psychiatrists. All subjects were from central Japan. The study was approved by the Ethics Committee of RIKEN, and all participants provided written informed consent.

Statistical tests for association

In the case-control analysis, the allelic and genotypic distributions were tested for association by a Monte-Carlo test as implemented in the CLUMP program40 (number of simulations and the random number seed set to 10 000 and 100, respectively). Fisher's exact test was also performed to test each individual allele for association. In the analysis of pedigree samples, the transmission disequilibrium test was performed to test for global association and for individual allele association using PDT41 and FBAT (http://www.biostat.harvard.edu/~fbat/default.html) software, respectively.
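The Monte-Carlo approach to a sparse 2 × k allele table can be sketched as follows. This is a minimal illustration in the spirit of such a test, not the CLUMP implementation: the Pearson chi-square statistic, the add-one correction of the empirical P-value, and the example counts are all assumptions made for this sketch. Only the default number of simulations (10 000) and seed (100) are taken from the text.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic for a 2 x k table of allele counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / total
            if expected > 0:  # skip alleles absent from both groups
                stat += (observed - expected) ** 2 / expected
    return stat


def monte_carlo_p(table, n_sim=10_000, seed=100):
    """Empirical P-value from tables simulated with the observed margins."""
    rng = random.Random(seed)
    n_alleles = len(table[0])
    # One entry per chromosome, recording which size allele it carries.
    alleles = [a for row in table for a, n in enumerate(row) for _ in range(n)]
    n_cases = sum(table[0])
    observed = chi_square(table)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(alleles)  # reassign chromosomes to cases/controls at random
        sim = [[0] * n_alleles, [0] * n_alleles]
        for i, a in enumerate(alleles):
            sim[0 if i < n_cases else 1][a] += 1
        if chi_square(sim) >= observed - 1e-12:
            hits += 1
    # Add-one correction (an assumption of this sketch) avoids P = 0.
    return (hits + 1) / (n_sim + 1)


# Hypothetical counts (row 0: cases, row 1: controls); not study data.
example = [[120, 860, 160], [140, 850, 150]]
p_value = monte_carlo_p(example, n_sim=2000)
```

Shuffling chromosome labels keeps both the group sizes and the allele totals fixed, which is why the empirical P-value remains valid when some size alleles are too rare for the asymptotic chi-square distribution.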

Results

Re-sequencing of the entire VNTR region

Seven length variants of the VNTR (2R, 3R, 4R, 4.5R, 5R, 6R, 7R) were detected (Table 1). We then carried out subcloning for 102 human chromosomes so that all the size alleles could be sequenced. Eleven motifs and 14 VNTR haplotypes were identified (Table 2 and Supplementary Table S1 in Supplementary Information). We named the human motifs H1 to H35. Three haplotypes were newly identified in this study (Table 2). The frequency of the allele 7R in the Japanese cohort was extremely low (0.5%), in contrast to the world-wide average (19.2%) (Table 1).35 The allele 4.5R was recently so named because its size lies between 4 and 5 repeats.42 We identified this length variant in eight chromosomes (four chromosomes each in cases and controls). We re-sequenced six of these chromosomes, which revealed an insertion of an 18-bp sequence, completely identical with the sequence immediately 3′ downstream of the VNTR, into the middle portion of the most common 4R haplotype (Figure 1). We excluded the possibility that the 4.5R is an artifact generated by the PCR and/or subcloning processes, because allele compositions (genotypes) in all the samples containing the 4.5R allele were perfectly consistent between the GeneScan analysis and the subcloning analysis. In addition, we did not detect any novel amplicons other than the 4.5R among all the clones that we picked and sequenced.

Table 1 Distribution of the VNTR lengths in case (schizophrenia)-control samples
Table 2 Motif compositions of the 48-bp VNTR haplotypes (human)
Figure 1

The allele 4.5R of the VNTR. The 4.5R refers to an allele with a length between 4 and 5 repeats. Sequencing of this size allele revealed an insertion of an 18-bp sequence, completely identical with the sequence immediately 3′ downstream of the VNTR, into the most common 4R haplotype.

To add information for inference of an ancestral motif/haplotype, 20 chimpanzee chromosomes were also analyzed by subcloning. Chimpanzee VNTRs from all the chromosomes were the same in size, each consisting of five repeats of a 48-bp unit. There were seven sequence variants (motifs) of a 48-bp unit (Supplementary Table S1 in Supplementary Information), giving rise to three different VNTR haplotypes (Table 3). One haplotype was novel but all of the seven constituent motifs were included in the earlier report.43

Table 3 Motif compositions of the 48-bp VNTR haplotypes (chimpanzee)

To obtain more information on the sequence diversity around the VNTR, we subsequently studied a fraction of the human and chimpanzee samples, and examined the two adjacent SNPs, rs1870723 (G/A) and rs7482904 (G/C), located 165 bp and 185 bp downstream of the VNTR, respectively (Table 4). A haplotype, G–G, was in synteny with most VNTR haplotypes including the most common 4R (H1–H2–H3–H4) haplotype, whereas the G–C haplotype was found syntenic with all the 6R and 7R haplotypes. Another haplotype, A–C, was observed in the majority of 2R haplotypes and one 4R (H1–H2–H13–H4) haplotype. All the chromosomes from chimpanzee samples had the G–G haplotype.

Table 4 Haplotypes of the 48-bp VNTR and adjacent two SNPs (human)

Phylogenetic analysis for the VNTR

Human and chimpanzee VNTR haplotypes are composed of various motifs (Supplementary Table S1 in Supplementary Information) and their phylogenetic relationship is expected to be quite complex. The earlier43 and current studies detected a total of nine chimpanzee motifs, which we designated as C1 to C9 (for the sequences of C8 and C9, see Supplementary Table S2 in Supplementary Information). Among the human motifs, H1, H2, H3, and H4 were most frequent and thus are likely to have constituted an ancestral haplotype. To further corroborate this inference, we conducted a phylogenetic network analysis to reveal relationships among these four human motifs and all the nine chimpanzee motifs. Figure 2 shows the phylogenetic network for those 13 motifs. C1 and C2 differ by only one nucleotide, and they are closest to H1. C3, C4, C5, C8, and C9 are similar to each other, and they are closest to H2. The remaining two chimpanzee motifs (C6 and C7) differ by only one nucleotide, and they are closest to H4. There is no chimpanzee motif that is close to H3. Three black circles in Figure 2 indicate locations of common ancestral sequences for human and chimpanzee, suggesting orthologous relationships. It seems that the common ancestor of human and chimpanzee possessed a haplotype similar to H1–H2–H3–H4. If this scenario is true, chimpanzee should have lost the third motif during its evolution. When we examined chimpanzee haplotypes, the order of motifs was concordant with that of the corresponding human motifs: C1 and C2 for H1, C3–C5 for H2, and C6 and C7 for H4 (see Table 3).

Figure 2

Phylogenetic network for human and chimpanzee motifs. Motifs H1–H4 and C1–C9 are from human and chimpanzee samples, respectively. See Table 2 and Supplementary Table S1 for detailed information on motif sequences. Black circles designate possible nodes of common ancestor for human and chimpanzee lineages. Numbers on lines are nucleotide site positions, and those with asterisks indicate parallel changes. C1 and C2 differ by only one nucleotide, and they are closest to H1. C3, C4, C5, C8, and C9 are similar to each other, and they are closest to H2. The remaining two chimpanzee motifs (C6 and C7) differ by only one nucleotide, and they are closest to H4. There is no chimpanzee motif that is close to H3.

With the assumption of ‘H1–H2–H3–H4’ being the ancestral haplotype, we constructed a genetic relationship of haplotypes of the human VNTR (Figure 3a). Derivation of most haplotypes can be accounted for by unequal crossing-over. For example, haplotypes H1–H4, H1–H2–H4, H1–H2–H31, H1–H11–H4, and H1–H2–H3–H2–H3–H4 can be generated from various kinds of unequal crossing-overs of the most frequent haplotype, H1–H2–H3–H4 (Figure 3a). We have to assume a total of two nucleotide substitutions (shown with open triangles in Figure 3a): one from haplotype H1–H2–H6–H5–H2–H4 to H1–H2–H6–H5–H2–H20 and another from H1–H2–H6–H5–H2–H5–H4 to H1–H2–H27–H5–H2–H5–H4. Even the complex transformation from haplotype H1–H2–H3–H4 to H1–H2–H6–H5–H2–H5–H4 can be reconstructed as six cycles of unequal crossing-overs (Figure 3b).
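How a single unequal crossing-over changes the repeat number can be illustrated with a small sketch. It assumes a simplified model in which two copies of a haplotype pair out of register and the breakpoint falls exactly on an internal motif boundary; the function name and this boundary-only restriction are assumptions of the sketch, not the in-house program. Products that contain new motifs (such as H31 or H11) presumably require breakpoints within motifs and are therefore outside this model.

```python
def unequal_crossover_products(hap_a, hap_b):
    """Haplotypes formed by one crossover at internal motif boundaries.

    The product keeps the first i motifs of hap_a and continues with
    hap_b after its first j motifs. When i != j the pairing is out of
    register and the repeat number changes; i == j returns the parent.
    """
    products = set()
    for i in range(1, len(hap_a)):      # breakpoint after motif i of hap_a
        for j in range(1, len(hap_b)):  # breakpoint after motif j of hap_b
            products.add(tuple(hap_a[:i]) + tuple(hap_b[j:]))
    return products


ancestral = ("H1", "H2", "H3", "H4")
products = unequal_crossover_products(ancestral, ancestral)
for hap in sorted(products, key=lambda h: (len(h), h)):
    print("-".join(hap))
```

Under this model the ancestral 4R haplotype yields H1–H4, H1–H2–H4, and H1–H2–H3–H2–H3–H4 in one step, which are three of the derived haplotypes named in the text.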

Figure 3

A possible genetic relationship of haplotypes of the human VNTR. (a) Possible transformation from haplotype 1–2–3–4 to 1–2–6–5–2–5–4. Unequal crossing-over events and nucleotide substitutions are indicated by black circles and open triangles, respectively. Arrows indicate direction of changes. Estimated patterns of unequal crossing-over are shown next to product of crossing-overs. The asterisks point to recombination within motifs, which give rise to new motifs. (b) Presumed pathways of how the 7R allele can be generated from the 4R allele with possible intermediate haplotypes. The prefix ‘H’ for each motif is omitted.

Association with schizophrenia

In the analysis of 570 cases and 570 controls, no significant association was found in the global distribution of the 48-bp VNTR size alleles (P=0.166) (Table 1). Regarding individual alleles, the allele 3R, the frequency of which was only 0.003 and 0.012 in cases and controls, respectively, showed a significant difference (P=0.012) by Fisher's exact test. The comparison between the allele 7R and all the other alleles in the case-control samples gave a non-significant result (P=0.163) (Table 1). Evidence of association was not found in the analysis of 124 pedigrees with schizophrenic offspring (global P=0.506). None of the individual alleles was significant, and the allele 7R was not observed in this pedigree data set (Table 5).
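The single-allele comparison can be sketched as a two-sided Fisher's exact test on a 2 × 2 table (one allele versus all others, cases versus controls). The implementation below is a minimal pure-Python version written for illustration. In the usage lines, the 14 control chromosomes and the 1140 chromosomes per group follow from the text, whereas the case count of 3 is an assumption back-calculated from the rounded frequency of 0.003 and is not stated in the source.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


n_chromosomes = 2 * 570   # per group, from 570 cases and 570 controls
controls_3r = 14          # stated in the Discussion
cases_3r = 3              # ASSUMPTION: inferred from the rounded 0.003
p_3r = fisher_exact_two_sided(cases_3r, n_chromosomes - cases_3r,
                              controls_3r, n_chromosomes - controls_3r)
print(f"3R versus all other alleles: P = {p_3r:.3f}")
```

The printed value can be compared with the reported P=0.012; any discrepancy would point to the assumed case count rather than to the method.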

Table 5 Transmission disequilibrium test for the VNTR in 124 pedigrees

Discussion

In this study of DRD4, we focused on the VNTR in exon 3 in light of the documented evidence of a selective sweep acting on the allele 7R of the VNTR. It has been proposed that the allele 7R originated as a rare event involving both mutations and gene conversions and spread rapidly by positive selection.34, 35 However, the allele 7R was found to be very rare (0.5%) in our Japanese samples, in agreement with earlier reports.31, 32, 33 This low frequency of the allele 7R was also observed in Chinese populations.20, 26 Also, the allele 7R in an African population was reported to have an intermediate frequency of 0.21.44 This implies the lack of a selective sweep at this locus, at least in Asian population history. Therefore, we further studied the entire VNTR region by re-sequencing both the Japanese human and chimpanzee samples.

The most common 4R (H1–H2–H3–H4) was considered the ancestral haplotype, as three of the constituent motifs were found connected to chimpanzee motif clusters and the order of motifs of this haplotype is consistent with that of chimpanzee. Although 4R (H1–H2–H3–H4) was postulated earlier to be the ancestral haplotype on the basis of human motif frequencies,35 the current study is the first to support this inference by phylogenetic analysis exploiting both human and chimpanzee sequence data.

A two-SNP haplotype ‘rs1870723G–7482904G’, which is in synteny with the most common 4R (H1–H2–H3–H4) VNTR, is considered to constitute the distal part of the ancestral haplotype. This two-SNP haplotype, along with ‘rs1870723A–7482904C’, has been reported, but the intermediate ones (‘rs1870723G–7482904C’ and ‘rs1870723A–7482904G’) were missing in the earlier studies. This study detected an intermediate haplotype ‘rs1870723G–7482904C’ attached to multiple VNTR haplotypes. For this G–C haplotype to be present, either a nucleotide substitution or a recombination in the very short genomic interval between these two sites (20 bp long) must have occurred.

From the VNTR haplotype data, the most frequent 7R (H1–H2–H6–H5–H2–H5–H4) haplotype can be generated by only six unequal crossing-overs. Five descendant haplotypes (1–2–13–5–2–5–4, 1–2–13–4, 1–2–27–5–2–5–4, 1–2–6–5–2–4, and 1–2–6–5–2–20 in Figure 3) were thought to have been derived from this 7R haplotype. Our sequence data on rs1870723 and rs7482904 revealed the prevalence of the intermediate two-SNP haplotype attached exclusively to this common 7R and its descendants. Thus the lineage downstream of the most frequent 7R (H1–H2–H6–H5–H2–H5–H4) is thought to be long enough to add diversity to the sequence around the VNTR. Accordingly, our findings do not agree with the earlier documented lines of evidence for selection favoring the allele 7R, which cite the complexity of its generation involving multiple mutations and gene conversions and its high prevalence despite a short lineage.35 We, therefore, argue that the drastic difference in the frequency of the allele 7R among populations may well be a consequence of random genetic drift. The possibility of balancing selection or of different selective forces acting on different populations cannot be completely ruled out. In the case of the angiotensinogen gene (AGT) polymorphism, this idea was supported by a supposedly varying degree of physical advantage of water retention in different geographical conditions.44 It may be, however, difficult to formulate population-specific advantages for a specific psychiatric trait that DRD4 may affect.

We further investigated this region for association with schizophrenia. No evidence of global association was found between the 48-bp size alleles of the VNTR and the disease. The allele 3R was significantly under-represented in cases relative to controls. This allele, however, was observed in only 14 chromosomes (1.2%) in controls. The possibility of the allele 3R being a true protective variant for only a small fraction of subjects cannot be excluded, although this apparent association of a rare allele may be because of population stratification. Association of the DRD4 VNTR with schizophrenia has not been consistently shown in the earlier studies. Although a role of this polymorphism in female schizophrenia was not excluded by a meta-analysis,45 our data provided no evidence of gender-specific association. Earlier, we suggested a possible role of promoter variants of this gene in the development of schizophrenia.46 Promoter variants and VNTR size alleles are in LD both in our Japanese samples (data not shown) and in world-wide populations.34 Documented associations of the VNTR with ADHD and with other psychiatric traits may reflect LD between VNTR alleles and promoter variants.

There are limitations in this study. First, as the DRD4 VNTR is a complex repeat polymorphism involving historical recombinations, it was not subjected to rigorous statistical tests for selection, although this problem is shared by all studies on the same subject. Second, our association analysis is based on a case-control design, where both false-positive and false-negative results can be produced, although our analysis of the same sample set using STRUCTURE software47 detected no evidence of population stratification.38 Third, the sample size in the association study is limited. When the genotypic relative risk is set to 1.2 under the multiplicative model, the current case-control sample had a power of only 0.313 and 0.360 at the threshold of P=0.05, even for the common size alleles 2R and 4R, respectively.49 Conversely, to gain a power of 0.8 for detecting a risk allele with the same relative risk, the sample size would need to be 2100 cases and 2100 controls for the allele 2R, and 1700 cases and 1700 controls for the allele 4R.48 Fourth, in the association analysis, alleles were coded according to their lengths. This introduces a loss of information on sequence diversity, because even the same size allele is known to have multiple variants according to its constituent motifs. The current sample size would not have the power to detect association of each VNTR haplotype, many of which have low frequencies.
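The dependence of power on sample size under a multiplicative model can be sketched with a normal approximation to the allelic test. This is an illustrative calculation only and is not expected to reproduce the power values quoted above, which depend on the software, disease prevalence, and allele frequencies used by the authors. The allele frequency of 0.3 in the usage lines is hypothetical, controls are assumed to carry the population allele frequency, and the function names are introduced here for illustration.

```python
from statistics import NormalDist


def case_allele_frequency(p, grr):
    """Risk-allele frequency in cases under a multiplicative model."""
    return grr * p / (1 + p * (grr - 1))


def allelic_power(p, grr, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    p_case = case_allele_frequency(p, grr)
    p_ctrl = p  # ASSUMPTION: controls reflect the population frequency
    var = (p_case * (1 - p_case) / (2 * n_cases)
           + p_ctrl * (1 - p_ctrl) / (2 * n_controls))
    z_effect = abs(p_case - p_ctrl) / var ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # The negligible contribution of the opposite tail is ignored.
    return NormalDist().cdf(z_effect - z_crit)


def cases_needed(p, grr, target=0.8, alpha=0.05, step=10):
    """Smallest balanced sample (in steps) reaching the target power.

    Requires grr != 1; otherwise the target is never reached.
    """
    n = step
    while allelic_power(p, grr, n, n, alpha) < target:
        n += step
    return n


# Hypothetical allele frequency; genotypic relative risk 1.2 as in the text.
print(allelic_power(p=0.3, grr=1.2, n_cases=570, n_controls=570))
print(cases_needed(p=0.3, grr=1.2))
```

Because the effect term grows with the square root of the sample size, a modest relative risk of 1.2 requires a several-fold larger sample than 570 cases and 570 controls to reach a power of 0.8, which is the qualitative point made in the text.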