Main

In 1996, association of particular variants of the dopamine D4 receptor gene (DRD4) with the personality trait of novelty seeking was reported by two groups (Ebstein et al. 1996; Benjamin et al. 1996), and heralded the beginning of molecular study of genes for normal personality variables. Before this genotyping era, a number of studies established the importance of genetic contribution to personality (Bouchard 1994; Plomin et al. 1994). Progress in personality research was aided by the development of measures of personality, based on a biopsychosocial view of personality, that divide the traits into four temperaments: novelty seeking (NS); harm avoidance; reward dependence; and persistence in the Tridimensional Personality Questionnaire (TPQ, Cloninger 1987). This provided a reliable method for the study of normal personality, as well as the study of psychopathology. The division into temperament dimensions allowed for the hypothesis that the genetic and neuroanatomical basis for each was provided for by three principal neurotransmitters: dopamine; serotonin; and norepinephrine, respectively. The assignments of particular molecular substrates were based on behavioral pharmacological observations in humans and animals, and animal lesion experiments, as well as self-stimulation paradigms in animals (Cloninger 1987). However, extrapolation to humans is largely unsupported empirically.

After cloning the DRD4 gene over 8 years ago (Van Tol et al. 1991), there was interest in subsequent demonstration of DNA sequence variation (polymorphism) in part of the gene that is transcribed to protein (Van Tol et al. 1992). Since then, a number of attempts to find a function for, and more importantly, a disorder related to, this gene have been largely unsuccessful. Linkage and association studies in the two major neuropsychiatric diseases, schizophrenia and bipolar disorder, have not shown that DRD4 is a major gene for either of these disorders. For a period, it almost seemed as if DRD4 was a gene looking for a function, although a possible role in the action of atypical antipsychotics has received attention (Seeman et al. 1997a). Despite initial reports that D4-like sites were elevated in postmortem striatum from schizophrenic patients (Seeman et al. 1993), it now seems that these sites are derived from the D2 gene but are not pharmacologically identical to the classic dopamine D2 receptors (Seeman et al. 1997b).

More recently, studies of DRD4 have investigated its potential role in behavioral traits. A number of published studies investigated the possible role of DRD4 in NS. These reports focused on the polymorphism consisting of a variable number of tandem repeats (VNTR) in DRD4. This spawned association studies, because the polymorphism is in the coding region. Small differences observed in functional assays have also been reported for some of the VNTR alleles (Van Tol et al. 1992; Asghari et al. 1994, 1995) and have been used as a rationale for studying this “functional” polymorphism. This review first describes the current knowledge about the DRD4 gene and its product the D4 receptor. It then focuses on reports of association of NS and the VNTR of DRD4.

DRD4 Gene and D4 Receptor

The molecular basis of the VNTR of DRD4 is a 48 base pair repeat unit in the third exon of the gene (Van Tol et al. 1992), and between 1 and 10 tandem copies of the repeat motif have been reported in humans (Van Tol et al. 1992; Lichter et al. 1993; Chang et al. 1996; T. Li and D. Collier, personal communication). The common alleles in Caucasians contain four and seven repeats. The sequence of repeat elements is not identical, with 19 different sequence repeat elements known thus far (Lichter et al. 1993). The DNA sequences produced by these elements were investigated in 178 unrelated chromosomes from 13 ethnic groups, and 29 unique alleles were identified, a few of which were common and many of which were rare (Lichter et al. 1993; Nakatome et al. 1998). Eighty percent of the seven repeat alleles sequenced had the same sequence. Sequence analysis of the VNTR is complicated because of the high guanine and cytosine nucleotide content of the repetitive region that promotes heteroduplex formation. Each 48 base pair element codes for a 16 amino acid sequence, and the 29 DNA sequence variants are predicted to code for 22 different amino acid sequences.

A comprehensive study of DRD4 VNTR allele frequencies using 1,327 individuals sampled from 36 populations worldwide found that the frequencies of the four and seven alleles vary greatly, from 0.16 to 0.96 and from 0.01 to 0.78, respectively (Chang et al. 1996). Based on the data provided elsewhere (Lichter et al. 1993), the calculated heterozygosity of the VNTR based on allele lengths is 0.50, as compared to a heterozygosity of 0.60 using sequence. D4 genes of animals have also been studied, and the rat D4 gene does not seem to have a VNTR (Lichter et al. 1992). Studies of nonhuman primates, including chimpanzees, gorillas, orangutans, baboons, and squirrel monkeys, suggest that the ancestral hominoid gene had five repeats, and this has led to various speculations regarding evolutionary relationships of nonhuman primates (Livak et al. 1995). Unique nucleotide sequences of the 48 base pair motif and repeat numbers were described in each nonhuman species studied that distinguished alleles of different species. Identical adjacent copies of repeat units were observed in squirrel monkeys, a feature not reported in humans.
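The heterozygosity figures quoted above can be illustrated with a short calculation. The sketch below assumes that "heterozygosity" means the expected heterozygosity under random mating, H = 1 − Σp², which the text does not state explicitly. The function name and the allele frequencies are hypothetical and are not data from the cited studies.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2), assuming random mating."""
    total = sum(allele_freqs)
    if abs(total - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Hypothetical length-based allele frequencies (illustration only).
length_based = [0.65, 0.20, 0.10, 0.05]

# Hypothetical split of the most common length allele into two
# sequence-distinct alleles (0.65 -> 0.45 + 0.20); also illustration only.
sequence_based = [0.45, 0.20, 0.20, 0.10, 0.05]

print(round(expected_heterozygosity(length_based), 3))
print(round(expected_heterozygosity(sequence_based), 3))
```

Under this assumption, splitting a length allele into sequence-distinct alleles can only raise H, which is in line with the higher value reported when alleles are defined by sequence rather than by length.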

The D4 receptor is a member of the group of G-protein coupled receptors that has seven transmembrane-spanning domains and belongs to the D2-like family of dopamine receptors (Seeman 1992). The VNTR codes for amino acids in the third cytoplasmic loop of the receptor, a region of G-protein coupled receptors thought to have functional importance with respect to G-protein coupling. It was postulated that the VNTR variants of DRD4 might result in differences in second messenger coupling or signal transduction (Van Tol et al. 1992), and functional characterization of DRD4 with various VNTR alleles was performed in vitro. Following up on earlier work (Van Tol et al. 1992), characterization of dopamine binding demonstrated similarities between variants with the two, four, and seven repeat alleles in COS-7 cells (equilibrium dissociation constant for the receptor-agonist complex, Ki 10, 10, and 16 nM, respectively, in the absence of Gpp(NH)p; Asghari et al. 1994). Dopamine binding in GH4 cells, by contrast, produced different results, with the two-repeat allele showing differences from the four- and seven-repeat alleles (Ki 7.5, 2.2, and 1.8 nM for the two, four, and seven repeats, respectively; Sanyal and Van Tol 1997). Most pharmacological work to date has concentrated on antagonist binding to variants of the receptor: “all the different forms of the human receptors displayed similar binding profiles for all ligands, although small differences were observed” (Asghari et al. 1994).

To study second messenger systems, investigation of the potency of dopamine to mediate inhibition of forskolin-stimulated cAMP in the three different variants was performed in CHO-K1 cells stably expressing D4 (Asghari et al. 1995). Differences in cAMP inhibition were reported between the two and seven variants (p = .001), the four and seven variants (p = .0001), as well as when the two and four variants were combined and compared with the seven variant (ANOVA F(2,53) = 12.5, p = .0001). The in vitro EC50 values for the two, four, and seven repeat allele variants were (means ± SEM) 18.8 ± 2.7, 13.8 ± 2.7, and 36.9 ± 4.6 nM, respectively. However, this was interpreted cautiously: “the polymorphic repeat sequence causes only small changes” (Asghari et al. 1995). Examination of the ability of antagonists to block these effects found no significant differences in potency, except possibly for clozapine, which had about a twofold lower potency at the seven allele variant, as compared to either the two or four allele (p < .05). Also, a DRD4 mutant was constructed with no VNTR region, which still had signal transduction properties similar to the normal variants (EC50 = 17.6 nM). It seems that the reported functional agonist differences are small, and the biological relevance, if any, remains unclear. It should be noted that these functional studies have not yet been repeated in other laboratories, probably in part because of the small differences observed. Some authors, however, have interpreted these results differently: “modest physiological differences have been observed between the short and long forms” (Ebstein et al. 1997b) without distinguishing the possible unique properties of each allele. Interestingly, for the dopamine D2 receptor, in vitro differences have been described in post-translational processing and intracellular trafficking between two alternatively spliced isoforms (Fishburn et al. 1995). These variants differ by 29 amino acids in the third cytoplasmic loop. Differences in the proportion and rate of dopamine-induced sequestration of these two D2 isoforms have also been characterized (Itokawa et al. 1996). In summary, there are no differences in the affinity of dopamine for the three variants, although the seven repeat variant seems to have slightly lower efficacy of inhibition of cAMP. Because of reports of differences of D2 receptor variants in post-translational processing, similar studies of D4 would be of interest.

Anatomic Distribution of D4 Receptor

Levels of expression of D4 in the human brain seem to be low compared to D2 receptors. Because both the DNA sequence and the pharmacological properties of D4 are similar to those of other D2-like receptors (D3, D2), it has been difficult to determine the precise site of D4 using ligand-binding studies. However, a novel D4 selective ligand, [3H]NGD 94-1, has shown specific high-affinity binding in human hippocampus, hypothalamus, dorsal-medial thalamus, entorhinal and prefrontal cortices, and the lateral septal nucleus (Primus et al. 1997). Alternative approaches looking for D4 receptor immunoreactivity have reported localization to GABAergic neurones in the human cerebral cortex, hippocampus, thalamic reticular nucleus, globus pallidus, pars reticularis of the substantia nigra, and a subset of cortical pyramidal neurones (Mrzljak et al. 1996). DFR1, a monoclonal antibody raised to the aminoterminal peptide of the predicted extracellular region of the receptor, was used to demonstrate evidence for D4 immunoreactivity in human postmortem brain in four out of six subjects (Lanau et al. 1997). A 50-kDa labeled band was obtained from the entorhinal, cingulate, and frontal cortices, as well as the substantia nigra and cerebellum; however, interindividual variability was noted. They also noted a similar pattern of distribution using Western blot analysis. The levels of D4 mRNA are undetectable in the striatum using in situ hybridization, but D4-like transcripts have been detected in the prefrontal cortex and primary visual cortex (Meador-Woodruff et al. 1997). Uncertainty about the role of D4 arises from the demonstration of much higher levels of D4 mRNA in the rat heart as compared to rat brain tissue (O'Malley et al. 1992), although D4 has not been demonstrated in the human heart (Matsumoto et al. 1995). The retina has the highest levels of D4 mRNA in humans (Matsumoto et al. 1995).

Recently, an analysis of homozygous Drd4 knockout mice has been published (Rubinstein et al. 1997). Although seemingly normal, the knockouts displayed reduced spontaneous locomotor activity and rearing, but outperformed normal mice on the Rotarod, a test of coordination. Furthermore, they were supersensitive to the stimulation of locomotor activity produced by ethanol, cocaine, and methamphetamine. They also demonstrated elevated dopamine turnover in the dorsal striatum. Transgenic animal models may provide insight into subtle behavioral differences. However, caution must be exercised when extrapolating from animals to humans, as shown by the report of a naturally occurring human DRD4 knockout (see below; Nöthen et al. 1994).

Additional Polymorphisms of DRD4

A total of nine polymorphisms of DRD4 have been described to date, including the VNTR (Figure 1). An SmaI polymorphism in the 5′ noncoding region of DRD4 has been reported with A1 allele frequency of 0.95 in Caucasians (Petronis et al. 1994b), and 0.99 in Asians (Nakatome et al. 1996). This seems to be the same as a C-T polymorphism at nucleotide –11, for which the frequency of the wild-type allele in Germans is 0.93 (Cichon et al. 1995). A mononucleotide (Gn) repeat occurs in intron 1, with alleles representing from six to ten guanine nucleotides present at this site (Petronis et al. 1994a). In Caucasians, allele frequencies are 0.03, 0.16, 0.03, 0.65, and 0.13 for the 6, 7, 8, 9, and 10 repeat alleles, respectively (Petronis et al. 1994a). A 12 bp duplication in the first exon, coding for the N-terminal extracellular region of DRD4, has been described (Catalano et al. 1993), with a triple repeat allele also reported (Hebebrand et al. 1997). Allele frequencies have been reported as 4% for A2 in Italians. A single T-G base pair substitution in exon 3, which codes for the fifth transmembrane region, has been characterized; it substitutes glycine for valine at amino acid position 194 (Seeman et al. 1994). This polymorphism has only been demonstrated in people of African ancestry, with an allele frequency of 0.125, and has not been observed in Caucasians. The resulting protein is insensitive to dopamine. An adolescent with sickle cell disease was found to be homozygous for this polymorphism but, otherwise, seemed phenotypically normal under neurological and psychiatric clinical examination (Liu et al. 1996).

Figure 1
Diagram of DRD4 gene with polymorphic sites. DRD4 has four exons (I–IV), and nine polymorphisms have been identified; nucleotide positions of polymorphisms are given in brackets.

A 13 bp deletion in exon 1, in the region coding for the second transmembrane domain, which causes a frameshift and is predicted to result in a truncated nonfunctional protein, has been described (Nöthen et al. 1994). The allele frequency has been reported as 0.02 in a sample of control subjects in Germany, and 0.003 in an Italian sample (Di Bella et al. 1996). A homozygote for this polymorphism was also described who would be predicted to be equivalent to a human DRD4 knockout. He was obese, had symptoms of autonomic hyperactivity, and developed an adjustment disorder with depressed mood after a surgical operation. It is possible that any or none of these phenotypes is related to this null DRD4.

A PstI RFLP has been described in the 5′ untranslated region of DRD4, which has an allele frequency in Caucasians of 0.23 (Paterson et al. 1996), and this seems to be attributable to a 120 bp duplication located 1.2 kb upstream of the initiation codon (Seaman et al. in press). Two additional polymorphisms of DRD4 have been described (Cichon et al. 1996): a single nucleotide substitution at position +31 (which changes codon 11 from glycine to arginine) with allele frequency in Germans of 0.01; and a rare 21 bp deletion affecting codons 36–42 (nucleotides +106 to +126), for which one patient with diagnoses of obsessive-compulsive disorder and panic disorder was heterozygous.

Association of the DRD4 VNTR with Novelty Seeking

Two back-to-back publications in Nature Genetics in early 1996 provided the first suggestions of the identification of a specific gene involved in normal behavioral traits, by suggesting an association of VNTR alleles of DRD4 (termed D4DR in those papers) with NS. Because these studies provided the incentive for replication studies, we provide a detailed criticism of these two studies; other studies are summarized in Table 1. The first report (Ebstein et al. 1996) analyzed four factors of the TPQ, of which NS was proposed to relate to dopaminergic systems (Cloninger 1987). This analysis was conducted in 123 unrelated Israeli university staff and students: 90 Ashkenazi, 34 non-Ashkenazi (Jewish, mixed, Arab). The authors focused only on the four-repeat and seven-repeat alleles (termed four and seven alleles hereafter) in their analysis, because these two account for the majority of all alleles and were those reported to be functionally different (see above). They found that the seven allele was positively associated with elevated NS with a score of 18 ± 1 when the seven allele was present, as compared to 15.4 ± 0.5 in individuals without a seven allele (p = .013). Comparison of scores of those with genotype 4,4 to those with 4,7 revealed a slight difference between scores (p = .026). They made no correction for multiple testing, stating that an a priori hypothesis was proposed. No significant effect of ethnicity, age, and sex on the association using one-way analysis of covariance was found, and a “summary of all effects,” including all three covariates, had a nonsignificant effect on the outcome. Although they reported that NS declines with age, such a correlation was not observed in their predominantly young sample. No significant differences in the VNTR allele frequencies were found for the other three personality temperaments of the TPQ. They then proceeded to look at what they termed “short alleles,” which include those with 2–5 repeats, and “long alleles” with 6–8 repeats, and found significant associations (p = .011–.022). They justified this approach of grouping alleles, because it had been used previously for a VNTR near the insulin gene in insulin-dependent diabetes, as well as a dinucleotide repeat of the monoamine oxidase A gene in early onset alcoholism.
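The grouping of genotypes into "short" and "long" classes described above can be made concrete with a small sketch. It assumes a genotype is represented as a pair of repeat counts and that a subject is classed as "long" when at least one allele has 6 or more repeats. The function names and the `long_min` parameter are hypothetical and are not taken from the cited papers; the cut-off is exposed as a parameter because the dividing line is an analytic choice rather than a biological constant.

```python
def has_long_allele(genotype, long_min=6):
    """Return True if either allele has at least `long_min` repeats."""
    return any(repeats >= long_min for repeats in genotype)


def classify_genotype(genotype, long_min=6):
    """Label a genotype 'long' if it carries a long allele, else 'short'."""
    return "long" if has_long_allele(genotype, long_min) else "short"


# Example genotypes written as (allele 1 repeats, allele 2 repeats).
for genotype in [(4, 4), (4, 7), (2, 5), (7, 7)]:
    print(genotype, classify_genotype(genotype))
```

Note that such a grouping discards the distinction between, for example, the 2 and 4 alleles, so any allele-specific effect within a group cannot be detected after pooling.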

Table 1 Summary of DRD4 VNTR Studies of Personality in Normal Subjects

The second of these papers (Benjamin et al. 1996) looked at 315 individuals participating in two protocols investigating genetic factors in personality, sexual orientation, HIV progression, and alcoholism, and were recruited from universities, clinics, and homophile organizations. Because these protocols had initially focused on the X chromosome, 95% were male, and the broad ethnic mix was 92% white non-Hispanic, 4% Asian, 3% Hispanic, and 1% other. Approximately half the sample were homosexuals, and the rest were heterosexuals. There were a total of 291 siblings from 131 families, seven parents, and 17 unrelated individuals. They cited the work of Ebstein et al. (1996) as a rationale to split genotypes into those with alleles from the short group (2–5 repeats) and those with one or two alleles from the long group (6–8 repeats). To assess their subjects they used the NEO-personality inventory, which has five major personality factors, each of which is divided into six facets. The NEO results were converted into estimated TPQ–NS scores using a weighted-scores regression from all five of the NEO ratings, incorporating correction for age, sex, ethnicity, and sexual orientation using regression. Their analysis found that three of four NS facets (NS1, NS3, NS4) were significantly associated with the presence of one or more long group of alleles (p = .0008; ns; 0.013 and 0.051, respectively), as was the over-all NS score (p = .016).

The authors then proceeded to test for associations between groups of alleles and NEO scores in families using a method that corrects for statistical dependence among members of a family (George and Elston 1987). An across-pedigrees association for NS was found when the subjects were divided into those with a “long” allele, and those without a “long” allele (1 df, p = .0011). Next, they analyzed for evidence of transmission disequilibrium between NS and the VNTR using 60 sib-pairs from 31 pedigrees that had one sibling with at least one allele from the “long” and “short” groups of alleles. The authors found modest evidence for linkage disequilibrium using groups of alleles (p = .01) and, therefore, concluded that the association they reported was attributable to genetic transmission as opposed to the alternative explanation of population stratification. There are a number of concerns about these approaches that we discuss later. Their conclusion ends by stating that the DRD4 VNTR accounts for approximately 10% of the genetic variance of NS.

A number of attempts to replicate these findings have now been reported (Table 1), and we discuss several important issues in detail. Many of these studies did not use the exact methods employed in either of the original publications. In one small replication attempt, Ebstein et al. (1997b) obtained a nonsignificant result using a t-test. However, the investigators then proceeded to calculate a difference in the range of NS scores when the subjects were categorized by the presence or absence of the 7 allele, using a Moses range test (5% of the control group was trimmed). By using this test, they found significant differences for the presence and absence of the 7 allele (p = .001; p = .01). Identical results were obtained when alleles were grouped into 2–5 and 6–8 repeat alleles. The Moses test assumes that the experimental variable will affect some subjects in one direction and other subjects in the opposite direction. It is not clear whether such analysis is entirely appropriate, and it is of interest to contrast the results obtained with this test to the nonsignificant results from the t-test. Next, Ebstein and co-workers reported results for the combination (n = 218) of their new sample (Ebstein et al. 1997b), and their original study (Ebstein et al. 1996), finding modest significance for the presence of a 7 allele (p = .01). However, when the sample was divided according to sex, a nonsignificant finding was present in males (p > .1) and a marginally significant result in females (p = .04). Also, they note in their report that the values for reward dependence and persistence were incorrectly calculated in their original publication (Ebstein et al. 1996). An alternative approach, used by others, employed hierarchical multiple regression incorporating demographics, diagnosis, genotype and interactions involving three DRD4 polymorphisms to study NS (Gelernter et al. 1997).

In an attempt to extend earlier findings, the temperament of 81 2-week old neonates was assessed using the Brazelton Neonatal Behavioral Assessment Scale. This includes 46 items scored on 4- or 9-point scales, which are reduced to seven summary core clusters, six of which were used (orientation, motor organization, range of state and state regulation, autonomic stability, and reflexes; Ebstein et al. 1998). Univariate tests produced only one result that remains significant after correction for seven comparisons, namely that for the DRD4 VNTR and orientation (uncorrected p = .00038). It is not clear how orientation in neonates relates to NS in adulthood.

DRD4 in Alcoholism and Drug Abuse

It has been argued that people who score high on NS scales may include those with high impulsivity, and it has been further suggested that substance abusing subjects have higher NS scores than do controls (reviewed in Bardo et al. 1996). Although higher allele frequencies of the 3 and 6 alleles compared to published data were found in a sample of alcoholics, no control group was used in this study (George et al. 1993). Further studies in Finnish (Adamson et al. 1995), Japanese (Muramatsu et al. 1996), U.S. (Parsian et al. 1997), Scandinavian (Geijer et al. 1997), German (Sander et al. 1997), and Taiwanese (Chang et al. 1997) alcoholics have failed to produce results that survive correction for multiple tests. Another group chose to study the transmission of the 7 allele because of the previous reports related to this allele, but found no evidence for excess parental transmission of the 7 allele in a sample of 29 U.S. alcoholics (Parsian et al. 1997). Similar negative results have been obtained in opioid-dependent subjects of Sephardic Jewish and Israeli Arabic background (Kotler et al. 1997) and in Chinese heroin abusers (Li et al. 1997). These studies are summarized in Table 2.

Table 2 Summary of DRD4 VNTR Studies in Alcoholism and Drug Abuse

Possible confounding of the association of DRD4 with NS by alcoholism may stem from the strongest genome-wide linkage for alcoholism in a large Southwestern Native American family at D11S1984. This family produced a lod score of 3.2 (p = .00007) at D11S1984 using sib-pair regression on DSM-IIIR alcoholism (Long et al. 1998). D11S1984 is close to the tip of the short arm of chromosome 11, mapping very near to DRD4. One NS study used subjects selected for risk factors for HIV (Benjamin et al. 1996), and this may increase the confounding effect of alcoholism. A quantitative trait locus for ethanol drinking in mice maps near to the murine Drd4 (Phillips et al. 1994).

DRD4 VNTR in Attention Deficit Hyperactivity Disorder (ADHD)

Children with ADHD are characterized by impulsive behavior, and many of their symptoms are consistent with NS. Speculations about the role of DRD4 in ADHD have been provided elsewhere (Sunohara and Kennedy 1998). A case-control study of 39 children with ADHD and controls matched for sex and ethnicity found a higher frequency of cases with the 7 or 8 repeat alleles when the rare alleles were grouped together with common alleles, as compared to controls (p < .01; LaHoste et al. 1996). When the 7 allele was analyzed separately, the frequency in cases was 22%, as compared to 9% in controls. Probands possessing at least one 7 allele had higher ratings of ADHD symptomatology, as compared to those without the 7 allele. This has recently been followed up with a study using a new sample of 52 probands with parental controls (Swanson et al. 1998) employing the haplotype relative risk method, in which the alleles that the parents do not pass on to their children are used as the controls. The 7 allele was transmitted 30 times from the parents, as compared to 17 nontransmissions (χ² = 4.65, 1 df, p < .035). Results using linkage disequilibrium from the combined sample of 100 parent-ADHD proband trios reported elsewhere (Swanson et al. 1998), and an unpublished proband and parents sample from Toronto (Sunohara in preparation), provide some evidence for excess transmission of the 7 allele (Z = 2.6, p = .0087) compared to the untransmitted 7 allele. However, a statistic that considers all alleles together (Schaid 1996) does not reach conventional significance (p = .074). Two recent studies of the DRD4 VNTR in ADHD provide weak positive results (Rowe et al. 1998; Smalley et al. 1998); whereas, another is negative (Castellanos et al. 1998).

DISCUSSION

From a review of the studies of DRD4 and NS (Table 1), it is unclear at present whether there is a true association. Further carefully designed and executed attempts at replication are necessary. The problems of case-control association studies and their possible confounding by population stratification have been discussed elsewhere (Kidd 1993). The first question is: What to do with alleles or haplotypes that are rarely observed? Probably the safest option is to discard them; however, as discussed above, some have decided to group them together with more common alleles. Erring on the side of caution would lead to discarding such alleles from the analysis. To pool alleles together rationally, demonstration that there are no significant functional differences between them is needed, and failure to do so may result in biased results. Approaches using transmission disequilibrium, whereby the frequency of transmissions of each allele from parents with two distinct alleles to their children is compared to nontransmission of that allele, have been shown to be insensitive to population stratification (Spielman and Ewens 1996). This approach has been used for DRD4 in Tourette's syndrome (Grice et al. 1996; Hebebrand et al. 1997) with contradictory results, as well as for ADHD (Swanson et al. 1998; Smalley et al. 1998). Missing parental DNA may make such an approach unfeasible in certain circumstances, and one alternative utilizes siblings as controls (Curtis 1997). Subdividing the sample on high and low trait values, and then performing transmission disequilibrium in each group separately is one approach for the assessment of the role of a particular gene in a quantitative trait.
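The transmission disequilibrium comparison described above can be sketched for a single allele. The sketch assumes the standard McNemar-type statistic, (b − c)² / (b + c), where b and c are the numbers of times heterozygous parents transmit or do not transmit the allele, referred to a chi-square distribution with 1 df. The function name and the counts are hypothetical, and the statistic is not the haplotype relative risk method used in some of the studies reviewed here, so it is not expected to reproduce their reported values.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """McNemar-type TDT chi-square (1 df) and its two-sided p value.

    Counts refer to transmissions from parents heterozygous for the allele.
    """
    b, c = transmitted, untransmitted
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need non-negative counts with at least one informative parent")
    chi2 = (b - c) ** 2 / (b + c)
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts for illustration only.
chi2, p_value = tdt_statistic(transmitted=30, untransmitted=10)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Because only heterozygous parents contribute and each parent serves as its own control, the comparison does not depend on allele frequencies in an external control group, which is the property that protects against population stratification.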

Genotyping other informative polymorphisms of DRD4, in addition to the VNTR, allows for the construction of DRD4 haplotypes. Haplotypes consist of unique combinations of alleles at a number of polymorphisms across a gene. A nonradioactive heteroduplex method has recently been described for genotyping two exon 1 polymorphisms (12 bp duplication and 13 bp deletion), which will allow the construction of haplotypes (Chang and Kidd 1997). The same group (Chang et al. 1997) genotyped six polymorphisms of DRD4 in their study of alcoholism. Some have genotyped additional polymorphisms, but did not attempt to construct haplotypes (Jönsson et al. 1997, 1998; Gelernter et al. 1997). Because of the potential for large numbers of different haplotypes, such a study may require large sample sizes to reach adequate conclusions. For analysis, rare haplotypes can either be discarded, or pooled together into a “mixed bag,” and analysis based only on common haplotypes. Linkage disequilibrium between the various polymorphisms of DRD4 has not been fully reported; however, the Gly11Arg seems to be in linkage disequilibrium with the VNTR (Hebebrand et al. 1997). Another possible cause for apparent association is segregation distortion, which has not been excluded to occur at DRD4.

Statistical Considerations

The issue of performing multiple statistical tests needs to be addressed. For example, in the initial report (Ebstein et al. 1996) four personality traits were compared in both the 7 alleles present/absent group and the 4,4 and 4,7 genotype groups, making a total of eight tests. No correction for multiple tests was performed, because it was stated that there was an a priori hypothesis. However, the hypothesis was related to the possible role of the dopaminergic system in NS (Cloninger 1987), and did not point to any specific gene from this system, nor to any particular polymorphism or allele. In 1987, no dopamine receptor genes had been cloned. The potential number of genes that could be involved in personality includes all those expressed in the brain: there are an estimated 30,000 mRNAs in the human brain (Sutcliff 1988). Furthermore, the exact role of D4 in the dopamine system is currently unclear, making specific hypotheses very problematic.

If a Bonferroni correction for multiple comparisons had been used for the number of tests presented, as has been suggested (Wickens 1989:13; Altman 1991:210–212), then the critical cut off for significance would drop from p = .05 to .00625. Thus, the result of the association of the presence of the 7 repeat (p = .013) would no longer reach nominal significance, nor would the genotype association (p = .026). After follow-up of this report with an enlarged sample (Ebstein et al. 1997b), marginal significance was found only in females (p = .04), which again would not withstand correction for multiple tests. In the “replication” study (Benjamin et al. 1996), 22 statistical tests were performed. Using the conservative Bonferroni method, with a critical level α ≤ 0.05, the critical value for 22 tests is p ≤ .0023, on the basis of which only the over-all NS score (p = .0016) attains significance. The question of whether correction for multiple tests should be performed when a specific relationship has been proposed depends upon how many such relationships have been proposed, and adjustment for the number of tests, used by some (Jönsson et al. 1997), would seem prudent. Furthermore, some analyses have compared individual allele and genotype frequencies, resulting in at least 15 separate tests being performed, without any correction for multiple tests.
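The Bonferroni thresholds quoted above follow from dividing the nominal level by the number of tests. The sketch below reproduces that arithmetic for the eight-test and 22-test cases and checks the two p values from the initial report against the eight-test threshold. The function names and dictionary labels are hypothetical conveniences; the p values and test counts are those given in the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def survives_correction(p_values, alpha, n_tests):
    """Map each labelled p value to whether it meets the corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return {label: p <= threshold for label, p in p_values.items()}


# Eight tests in the initial report; 22 tests in the second report.
print(bonferroni_threshold(0.05, 8))
print(round(bonferroni_threshold(0.05, 22), 4))

# p values quoted in the text for the initial report.
initial_report = {"7 allele present vs absent": 0.013, "4,4 vs 4,7 genotype": 0.026}
print(survives_correction(initial_report, alpha=0.05, n_tests=8))
```

The Bonferroni method is conservative when tests are correlated, as personality scales often are, so it gives an upper bound on how strict the correction needs to be rather than an exact adjustment.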

Elsewhere, it has been convincingly argued that if there are five vulnerability genes for a brain disorder, with a potential total of 20,000 candidate genes, then the probability that any one candidate will be truly pathogenic is 5:20,000 (Crowe 1993). Using p < .05 as a cutoff will produce 995 false positives, a false positive rate of 99.5%, which is clearly unacceptable. Morton (1955) proposed a lod score of 3 as the criterion for genome-wide linkage studies of single gene disorders. For complex traits, suggested thresholds for significant linkage require a lod score ≥ 3.3 (p ≤ 4.9 × 10⁻⁵; Lander and Kruglyak 1995), and more recently, in a proposal for genome-wide association studies of complex diseases, the criterion for significance was set at p < 5 × 10⁻⁸ (Risch and Merikangas 1996). However, such levels of significance are not appropriate for current studies. Finally, when interactions between polymorphisms of different candidate genes are investigated (e.g., Ebstein et al. 1997c, 1998), the number of potential tests rises exponentially, and positive results should be interpreted with great caution.
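The candidate-gene argument can be made concrete with a minimal sketch. The function name and the `power` parameter are assumptions introduced here; the sketch assumes every truly pathogenic gene is detected (power of 1), which is the most favorable case. The figure of 995 in the text appears to be the rounded form of the same calculation (about 1,000 nominally significant candidates, of which 5 are true).

```python
def expected_false_positive_share(n_candidates, n_true, alpha, power=1.0):
    """Expected share of nominally significant candidates that are false.

    power is the assumed probability of detecting a truly pathogenic gene;
    1.0 is an optimistic assumption, not a value taken from the text.
    """
    expected_false = (n_candidates - n_true) * alpha  # null genes passing the cutoff
    expected_true = n_true * power                    # true genes passing the cutoff
    return expected_false / (expected_false + expected_true)


share = expected_false_positive_share(n_candidates=20_000, n_true=5, alpha=0.05)
print(f"Share of positive findings expected to be false: {share:.1%}")
```

Lowering the assumed power only raises this share, so the 99.5% figure is, if anything, a lower bound under these assumptions.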

Two-sided statistical tests have generally been used for these NS studies (Ebstein, personal communication; Jönsson et al. 1997). However, some (Li et al. 1997) used a one-sided test in one of their analyses, which would have been borderline significant if a two-sided test had been used (p = .046). Statistical opinion would normally err on the side of caution, recommending the use of two-sided tests routinely (Altman 1991: 171). Other concerns include the nonindependence of TDT analysis when performed repeatedly in multiple siblings from the same family, as highlighted by a study of DRD4 in Tourette's syndrome (Grice et al. 1996). This has been addressed and alternatives suggested (Martin et al. 1997).

It is pertinent to question why the 4 and, particularly, the 7 alleles have been singled out in some of the association studies; whereas, in some cases, other alleles have been grouped together with either the 4 or 7 alleles. According to the data reviewed above, there seems to be little biological rationale for such an approach. Differences in the modulation of cAMP have been observed only for the three alleles common in Caucasians (2, 4, and 7). This approach of grouping alleles has been applied differently between studies. Most studies of the Caucasian population have grouped 2–5 and 6–8 repeat alleles together (e.g., Ebstein et al. 1996), but the 7 allele is very rare in Asian populations (Chang et al. 1996). As a result, investigators studying Asian samples have split the alleles differently, using the 4 and 5 repeat alleles as the dividing line (Muramatsu et al. 1996; Li et al. 1997; Ono et al. 1997). Furthermore, the practice of splitting into short and long alleles based on VNTR length is misleading, because the actual number of alleles based on sequence differences is at least 25 (Lichter et al. 1993). The study of the length of alleles alone may conceal potential unique properties attributable to differences in amino acid sequences resulting from DNA sequence variation of alleles with the same length. Pooling of alleles has been performed in some studies of other diseases, most notably insulin-dependent diabetes with the VNTR of the insulin gene, where there are over 50 alleles. However, a detailed transmission disequilibrium study of this polymorphism of the insulin gene (Bennett et al. 1995) used individual allele lengths and found that the transmission of alleles with neighboring allele lengths either increased or decreased risk for disease.

The value of replication of such studies cannot be overestimated. In studies of association for complex traits, a true finding would be expected to be replicable in a range of ethnic groups, given sufficient statistical power. Such consistency has been demonstrated for the association of the insulin gene VNTR and insulin-dependent diabetes mellitus (Bennett et al. 1996). Thus, to define clearly the effect of a particular D4 haplotype on a phenotype such as NS, a common DNA sequence across ethnic groups must be uncovered, and the use of VNTR repeat length alone is unlikely to be rewarding.

It is of note that in one of the original association studies (Benjamin et al. 1996), each individual was treated as an independent datum, although most were genetically related to another sibling in the study. Attempts were made to correct for statistical dependence of related subjects in this study (Benjamin et al. 1996), although this approach (George and Elston 1987) cannot distinguish measured “familiality” that has a genetic origin from that arising from common environmental causes. Furthermore, in their use of linkage disequilibrium, it was not made clear why only a subset of the families was selected; the reason is presumably that parental DNA was only available for seven parents from the total sample of 131 families. Therefore, it was not possible to construct parental genotypes for the majority of sibling pairs, leaving only a subgroup where the parental genotypes could be constructed with some certainty. It should be noted that when missing parents are discarded, an elevated type I error has been described (Curtis and Sham 1995). Moreover, the selection of sibling pairs for the linkage disequilibrium study was based on the “long” and “short” genotypes of the siblings (Benjamin et al. 1996), and this may result in bias. Also, it would seem to be more appropriate to have designed the study to include ascertainment of parental DNA and then to use the whole sample with actual allele lengths rather than grouping the alleles. With these concerns, the authors' claim that their modest evidence for linkage disequilibrium (p = .01) was based on genetic transmission rather than population stratification is not supported.

Overall, when viewing all 11 studies that have analyzed relationships between NS and the VNTR, results of the predominant group (a total of 11), after correction for multiple tests, are negative (Ebstein et al. 1996; Malhotra et al. 1996; Jönsson et al. 1997; Ebstein et al. 1997b; Vandenbergh et al. 1997; Ono et al. 1997; Sander et al. 1997; Sullivan et al. 1998; Jönsson et al. 1998). Only one study finds weak evidence for association of higher NS with the presence of the 7 allele (Benjamin et al. 1996). The findings are similar for studies of DRD4 with alcoholism and drug abuse (Table 2); whereas, there seems to be a weak association of the 7-repeat length allele with ADHD (Table 3).

Table 3 Summary of DRD4 VNTR Studies in ADHD

Possible reasons for nonreplication of reported associations include: (1) the presence of another nearby polymorphism that is in linkage disequilibrium with the studied polymorphism in one population, but not in another; (2) the possibility of population stratification, whereby population admixture results in a positive finding, caused either by a mixture of ethnic groups or by cohort effects; (3) population-specific effects, in which a certain polymorphism is etiologically important only in a specific population; (4) a small effect that some studies may not have sufficient power to detect; and (5) inconsistent measurement of NS.

Such contradictory results are very familiar in studies of many complex traits, with nonreplication of previously reported associations common, and the correct interpretation of such discordant results is unclear. The most parsimonious interpretation of the NS studies with the VNTR is a single false-positive result (Benjamin et al. 1996), with no replication, strongly suggesting no true association. A number of explanations for nonreplication have been proposed (Ebstein and Belmaker 1997a), including that the phenotypic variance accounted for by the VNTR is likely to be small and may be confounded by noise from demographic or methodological differences among studies. Ebstein and Belmaker also suggest using young mixed heterosexual groups, noting that one group studied males only (Malhotra et al. 1996), and mention that in their expanded study (Ebstein et al. 1997b), the effect is seen in females only. This contrasts with one of the original studies that reported an effect in a sample consisting nearly exclusively of males (Benjamin et al. 1996). It would certainly be of interest if distinct genetic pathways were involved in personality for males and females. More parsimonious would be the assumption that the dissonance of results is most likely to be caused by type I errors attributable to multiple comparisons. An interesting potential complexity might relate to the proximity of DRD4 on the short arm of chromosome 11 to two genes, H19 and IGF2, which are imprinted with expression dependent on the parental origin of each gene. However, the study of DRD4 cDNA from the tempolateral cortex of one heterozygous individual found no evidence for imprinting of DRD4 in the normal adult brain (Cichon et al. 1996), or in brain tumor tissue (Nöthen et al. 1994). Both maternal and paternal alleles were expressed in equal amounts, but as pointed out by the authors, this does not exclude potential imprinting of DRD4 during development or in other brain regions.

In summary, evidence supporting a role of the VNTR of DRD4 in NS is far from conclusive. There are a number of statistical and methodological concerns about many of the studies, and these have also recently been echoed elsewhere (Baron 1998). Functional differences between VNTR variants of D4 seem to be small, and extrapolating such variation to provide a strong rationale for a candidate gene hypothesis requires caution. In the future, characterizing the exact molecular genetic basis of complex human behaviors is likely to uncover other genes with evidence as contradictory and unconvincing as that for DRD4. We have made a number of suggestions to improve the reporting and interpretation of such studies, including the use of correction for multiple statistical comparisons, the use of linkage disequilibrium studies, and performing haplotype analysis.