The serotonin transporter (5-HTT), encoded by SLC6A4, is a key molecule that regulates serotonergic neurotransmission at the synaptic cleft, affecting emotions and stress responses. Therefore, numerous studies have focused on elucidating its pathophysiological implications in psychiatric disorders. Animal studies using constitutive Slc6a4 knockout mice showed increased anxiety-like phenotypes in various behavioral tests1,2,3. Furthermore, human studies indicate that dysregulation of 5-HTT in the serotonergic system is implicated in the emotional and behavioral disturbances of psychiatric disorders, including anxiety, depression, bipolar disorder (BD), schizophrenia (SZ), and autism4.

SLC6A4 has a functional polymorphism (serotonin transporter-linked polymorphic region, 5-HTTLPR) in its promoter region. 5-HTTLPR consists of two major alleles, the short (S) and the long (L), which have 14 and 16 repeat units, respectively (ref. 5). Each repeat unit is a highly homologous sequence of 20–23 bp (ref. 6), and the S allele exhibits weaker transcriptional activity than the L allele7,8,9. On the basis of this dichotomous classification, myriad case-control association studies have been performed. Most of the studies and several meta-analyses claimed that the S allele confers sensitivity to environmental stress, which in turn increases vulnerability to anxiety and depressive symptoms10,11,12, although this remains controversial13,14, and the largest recent meta-analysis and case-control studies failed to replicate this finding15,16.

In addition to the major S and L alleles, 5-HTTLPR has a number of rare variants. To date, extrashort (XS) alleles (11–13 repeats; refs. 17,18,19), 15-repeat alleles (refs. 5,19), and extralong (XL) alleles (17–24 repeats; refs. 5,18,19,20,21,22,23,24,25,26) have been reported. These rare variants showed a higher frequency in Asian and African populations than in European and Native American populations19,27. However, despite the extensive genetic studies on 5-HTTLPR so far, the extent of variation and the allele frequencies (AFs) have not been addressed in detail in a large population.
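The size classes named above can be written as a small lookup. The following Python sketch is only an illustration: the function name `classify_by_repeat_count` is hypothetical, the boundaries are the repeat ranges quoted in this section, and treating 15-repeat alleles as their own class is an assumption made here for clarity.

```python
def classify_by_repeat_count(n_repeats: int) -> str:
    """Map a 5-HTTLPR repeat count to the size class quoted in the text.

    Hypothetical helper: the ranges are the ones cited in the introduction,
    not an official nomenclature.
    """
    if 11 <= n_repeats <= 13:
        return "XS"          # extrashort
    if n_repeats == 14:
        return "S"           # major short allele
    if n_repeats == 15:
        return "15-repeat"   # reported separately in the text
    if n_repeats == 16:
        return "L"           # major long allele
    if 17 <= n_repeats <= 24:
        return "XL"          # extralong, as previously reported
    return "outside previously reported ranges"


for n in (12, 14, 16, 22, 28):
    print(n, classify_by_repeat_count(n))
```

A 28-repeat allele falls outside every range reported before this study, which is why it is labeled separately rather than folded into the XL class.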

Here, we report the patterns and AFs of 5-HTTLPR rare variants in the Japanese population in detail. Through the genotyping of two cohorts, we examined 5-HTTLPR in a total of 2894 Japanese subjects. The first cohort (the case-control study set, CCSS) consisted of 1366 subjects, including 485 controls and 881 patients with major psychosis (BD and SZ). The second cohort (the Arao cohort study set, ACSS) consisted of 1528 community-dwelling elderly individuals. In total, we identified 11 novel 5-HTTLPR alleles. One of the novel alleles was the longest ever reported, consisting of 28 tandem repeat units. We named this extremely long allele XL28-A. Interestingly, a luciferase reporter assay showed that XL28-A has no detectable transcriptional activity. XL28-A was found in two unrelated patients with BD in the CCSS and in one subject in the ACSS, who showed neither depressive symptoms nor a decline in cognitive function. Therefore, it is unlikely that XL28-A is associated with psychiatric disorders, despite its apparent functional deficit. Our results suggest that unraveling and interpreting the complex genetic variations of 5-HTTLPR will be important for further understanding its role in psychiatric disorders.

Materials and methods

Study subjects

We used two independent Japanese DNA sample sets to examine the genetic variations of 5-HTTLPR. The first sample set (the CCSS) consisted of 1366 subjects, including 485 controls (CTs) and 881 patients with major psychosis (450 BD and 431 SZ). The details of the CCSS were previously reported28. All patients were diagnosed according to the DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition) criteria by experienced psychiatrists. CTs were recruited voluntarily from among employees, students, and their friends and were interviewed by senior psychiatrists. All CTs were confirmed to meet the following criteria: (i) no current or past Axis-I psychiatric or physical diagnoses and (ii) no first-degree relatives with SZ or BD. For both patients and CTs, we excluded subjects with a current or past history of neurological illness, traumatic brain injury, electroconvulsive therapy, or substance abuse.

The second sample set (the ACSS) consisted of 1528 elderly subjects. The ACSS comprises community-dwelling Japanese individuals aged over 65 years who live in Arao city, Kumamoto prefecture, Japan. It is part of the Japan Prospective Studies Collaboration for Aging and Dementia. Among the endpoints available, we utilized the Mini-Mental State Examination (MMSE)29 and the Geriatric Depression Scale (GDS)30 for this study. These data were collected to assess the baseline dementia or depression level at the start of the project.

The ethics committees of Kumamoto University and other collaborative research organizations approved this study. All subjects received a detailed description of this study and provided written informed consent.

DNA extraction and genotyping

Extraction of genomic DNA from peripheral blood cells (PBCs) and genotyping in the CCSS were previously reported28 and are briefly summarized in the Supplementary Methods. In the ACSS, genomic DNA was extracted from PBCs with a QIAamp DNA Blood Mini QIAcube Kit (Qiagen, Hilden, Germany) using a QIAcube (Qiagen). 5-HTTLPR was amplified with the following primers: FWD 5′-CTTTGCGTTTTCTGTTGCCC-3′ and REV 5′-GGAGGCCAGGAACGATAGGA-3′. PCR amplification was performed in a total reaction volume of 20 µL containing 10 µL of 2× PCR buffer for KOD FX (TOYOBO, Osaka, Japan), 4 µL of 12 mM dNTP mix, 0.6 µL each of 10 µM primers, 2 U of KOD FX (TOYOBO) and 10 ng of genomic DNA. The thermocycling conditions were as follows: an initial cycle of 2 min at 94 °C, followed by 30 cycles of 10 s at 98 °C and 1 min at 68 °C. All PCR amplicons were analyzed by MultiNA (Shimadzu, Kyoto, Japan) and by bidirectional Sanger sequencing with 5′-GGCGTTGCCGCTCTGAATGC-3′ or 5′-CAGGGCGGGGACCGCAAGGT-3′. For some amplicons, we additionally performed TA cloning using a TOPO cloning kit (Invitrogen, Carlsbad, CA) followed by Sanger sequencing analysis of individual colonies.

Plasmid construction

The target 5-HTTLPR variant was amplified with the following primers: FWD 5′-AAgagctcGGTGAAATTCCCAAGCTTGTTG-3′ with the SacI site (lowercase letters) and REV 5′-AActcgagTTCTGGTGCCACCTAGACGC-3′ with the XhoI site (lowercase letters). PCR amplification was performed in a total reaction volume of 25 µL containing 2.5 µL of 10× PCR buffer for KOD-Plus-Neo (TOYOBO), 1.5 µL of 25 mM MgSO4 (Promega, Madison, WI, USA), 0.5 µL of 10 mM dNTP mix (Invitrogen), 2 µL each of 10 µM primers, 0.5 U of KOD-Plus-Neo (TOYOBO) and 50 ng of genomic DNA. The thermocycling conditions were as follows: (1) an initial cycle of 2 min at 94 °C, (2) 33 cycles of 10 s at 98 °C and 30 s at 72 °C. The PCR fragments were gel-purified with a MinElute Gel Extraction Kit (Promega), followed by 3′ A-attachment with TaKaRa Ex Taq (Takara, Tokyo, Japan), and cloned into the pCR2.1 vector using the TOPO cloning kit. The SacI and XhoI fragments were subcloned into the pGL4.10 firefly luciferase reporter vector (Promega). A single bacterial colony was cultured in a large volume, and the plasmid vector was purified with an endotoxin-free plasmid DNA purification kit (NucleoBond Xtra EF purification system, Macherey-Nagel GmbH, Düren, Germany), followed by ethanol precipitation. All constructs were bidirectionally sequenced to check for proper orientation, sequence specificity and the absence of artificial mutations.

Luciferase reporter assay

Undifferentiated rat raphe-derived RN46 cells (Sigma-Aldrich, St. Louis, MO, USA) grown under standard conditions were cotransfected with 2 µg of each luciferase reporter construct, 200 ng of the pGL4.73 Renilla luciferase reporter vector (Promega) as an internal control, and 200 ng of pCMV-EGFP vector as a positive control using an electroporator (NEPA21, NEPA GENE, Tokyo, Japan). Twenty-four hours after transfection, cells were harvested and lysed, and luciferase activity was measured using a Dual Luciferase Assay System (Promega) and a GloMax™ 96 Microplate Luminometer (Promega) following the manufacturer’s protocol. Each construct was transfected three times in each of three assays for a total of 9 independent transfections. Firefly/Renilla ratios were calculated for each transfection and normalized to the mean ratio of the blank control vector, pGL4.10 (Promega).
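The normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the function name `normalized_activity` and all luminometer readings below are hypothetical.

```python
from statistics import mean


def normalized_activity(firefly, renilla, blank_ratios):
    """Return Firefly/Renilla ratios scaled to the mean blank-vector ratio.

    firefly, renilla: paired readings, one pair per independent transfection.
    blank_ratios: Firefly/Renilla ratios measured for the blank vector.
    """
    if len(firefly) != len(renilla):
        raise ValueError("firefly and renilla readings must be paired")
    ratios = [f / r for f, r in zip(firefly, renilla)]
    blank_mean = mean(blank_ratios)
    # A value near 1.0 means activity indistinguishable from the blank vector.
    return [ratio / blank_mean for ratio in ratios]


# Hypothetical readings for one construct (three transfections).
construct = normalized_activity(
    firefly=[5200.0, 4800.0, 5000.0],
    renilla=[1000.0, 1000.0, 1000.0],
    blank_ratios=[0.9, 1.0, 1.1],
)
print(construct)
```

Dividing by the Renilla signal corrects for transfection efficiency, and dividing by the blank-vector mean expresses each construct as a fold change over background.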


Results

Identification of novel variants of 5-HTTLPR in the CCSS

We previously performed genotyping of 5-HTTLPR in 1366 subjects of the CCSS and reported the epigenetic role of major alleles, including the Asian-specific L16-C (ref. 28). In this study, we examined rare alleles (AF < 5%) and found three known S variants (S14-B, S14-D and S14-E), one known L variant (L16-B), and three known XL alleles (XL19-A, XL20-A, and XL22-A). In addition, we identified seven novel alleles (S14-H, S15-D, S15-E, L16-I, L16-L, XL22-C, and XL28-A) (Fig. 1A, B). Representative alleles were visualized by electrophoresis on agarose gel (Fig. 1C).
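For reference, the rare-allele criterion (AF < 5%) follows from a simple count over chromosomes. The sketch below assumes that every subject contributes two alleles (SLC6A4 is autosomal) and that all subjects were successfully genotyped; the function names and the example count are hypothetical.

```python
def allele_frequency_percent(allele_count: int, n_subjects: int) -> float:
    """Allele frequency in percent, assuming two alleles per subject."""
    total_alleles = 2 * n_subjects
    return 100.0 * allele_count / total_alleles


def is_rare(af_percent: float, threshold_percent: float = 5.0) -> bool:
    """Apply the AF < 5% criterion used in the text."""
    return af_percent < threshold_percent


# Hypothetical example: an allele seen twice among the 1366 CCSS subjects.
af = allele_frequency_percent(allele_count=2, n_subjects=1366)
print(f"AF = {af:.4f}%  rare: {is_rare(af)}")
```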

Fig. 1: Nucleotide sequences of repeat units and the genetic architecture of 5-HTTLPR alleles.

A Names, genetic architecture, and allele frequencies of 5-HTTLPR alleles identified in this study. The allele names refer to the comprehensive summary of 5-HTTLPR alleles (ref. 6), and individual repeat units are identified by the Greek-letter nomenclature introduced by Nakamura et al. (ref. 5). Novel alleles are underlined. The total number of alleles and their allele frequencies (%) are given. CCSS: case-control study set, ACSS: Arao cohort study set. B Structure of repeat units. Each repeat unit is assigned a Greek letter following the nomenclature of Nakamura et al. The nucleotide sequence of ζ is shown in bold. Nucleotide substitutions are highlighted in light gray, and insertions are in dark gray compared to ζ. Dark boxes in novel repeat units indicate nucleotide substitutions or insertions compared to the original repeat units. C Representative 5-HTTLPR genotypes in agarose gel electrophoresis. The lengths of the PCR amplicons are as follows: S14-A, 457 bp; S15-E, 479 bp; L16-A, 499 bp; L16-D, 499 bp; XL19-A, 567 bp; XL20-A, 583 bp; XL22-A, 625 bp; and XL28-A, 751 bp.

Structure of the novel 5-HTTLPR alleles in the CCSS

Among the novel alleles, S14-H, L16-I, L16-L, and S15-D were likely generated by single-base changes in known repeat units. S14-H has a T/C substitution at the 12th base position in ι of the common S14-A allele. L16-I has a C/T substitution at the 7th base position in ο of the common L16-A allele. L16-L has a C/T substitution at the 22nd base position in ξ of L16-A. In contrast, S15-D was generated by a single-base cytosine insertion at the end of ρ of S15-A (ref. 6). The mutated repeat units in S14-H, L16-I, L16-L, and S15-D are denoted ι′, ο′, ξ′, and ρ′, respectively (Fig. 1B).

The other three novel alleles, S15-E, XL22-C, and XL28-A, were likely generated by novel arrangements of known repeat units. S15-E contains a tandem duplication of a single η repeat compared with S14-A. In XL22-C, the 8th repeat unit of XL22-A (ξ) is replaced with θ. XL28-A carries eight tandem ζ-η units, whereas S14-A carries only one.

Allele frequencies of the rare variants in psychosis in the CCSS

AFs by diagnostic group are listed in Supplementary Table 1. By Fisher’s exact test, none of the rare variants (AF < 5%), including the novel alleles, showed significant deviation in patients compared to controls (p ≥ 0.05). Nonetheless, among the seven novel alleles, XL28-A was found in two unrelated patients with BD and not in controls. Considering that XL28-A is the longest allele identified so far, we performed a functional assay of this allele.
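To make the comparison concrete, the sketch below implements a two-sided Fisher’s exact test on a 2×2 table of allele counts using only the Python standard library. It is an illustration, not the authors’ analysis code. The example table is an assumption: it supposes that both BD carriers of XL28-A are heterozygous and that all 450 BD patients and 485 controls were genotyped, giving 900 and 970 alleles, respectively.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Assumed counts: XL28-A vs. all other alleles, BD patients vs. controls.
p = fisher_exact_two_sided(2, 898, 0, 970)
print(f"p = {p:.4f}")
```

Under these assumed counts the test gives p of roughly 0.23, in line with the non-significant result reported above; two carriers are simply too few to distinguish from chance.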

Functional characterization of XL28-A using a luciferase reporter assay

We examined the promoter activities of XL28-A and four known alleles (S14-A, L16-C, XL20-A, and XL22-A) using a luciferase reporter assay in the rat raphe-derived RN46 cell line. These alleles differ in the number of ζ-η tandem repeats, which ranges from one to eight (Fig. 1A, Table 1). Promoter activity was maintained in alleles with up to five tandem repeats, with a slight decrease in the longer alleles. However, the promoter activity of XL28-A, which has eight tandem repeats, was at the same level as that of the blank control vector, indicating that XL28-A has no apparent promoter activity (Table 1).

Table 1 Promoter activities of XL28-A and other alleles with a different number of ζ-η.

Validation of XL28-A in the general population

We then extensively screened for XL28-A and other rare alleles in the ACSS (N = 1528). Genotyping of the ACSS identified six novel alleles (S14-G, L16-I, L16-J, L16-K, XL20-F and XL28-A). Two of them, L16-I and XL28-A, had already been found in the CCSS, and the other four were newly identified in this cohort (Fig. 1A). When the subjects of the CCSS and the ACSS were combined, none of the rare alleles showed significant deviation in patients compared to controls by Fisher’s exact test (p ≥ 0.05).

Structure of the novel 5-HTTLPR alleles (S14-G, L16-J, L16-K, and XL20-F) in the ACSS

S14-G has a C/T substitution at the 19th base position in ε of S14-A. L16-J has two consecutive C/T substitutions at the 8th and 9th base positions in ο of L16-A. These mutated repeat units are denoted ε′ and ο″ (Fig. 1B). L16-K has a replacement of φ with ο or η at the 7th repeat unit of L16-A or L16-C. XL20-F has an insertion of a θ-ι unit between the 7th and 8th repeat units of XL18-A (ref. 6).

Relationship between the novel 5-HTTLPR alleles and either MMSE or GDS in elderly subjects

Among the endpoints available in the ACSS, we utilized the MMSE and GDS. The subject harboring XL28-A had both scores at normal levels and within the normal variation of the ACSS population (Fig. 2). In addition, the scores of subjects carrying the other five novel alleles did not suggest major depressive disorder or dementia.

Fig. 2: Assessment of novel 5-HTTLPR alleles on the MMSE and GDS in elderly subjects.

Scores of the MMSE and GDS are plotted. Colors indicate the total number of subjects. Arrows indicate the scores of subjects with novel alleles. The dotted line indicates the general thresholds of the MMSE (< 23) and GDS (> 8). Scores above (MMSE) and below (GDS) the thresholds indicate that the subject has no decline in cognitive function or no tendency toward depression. All indicated subjects had a novel allele that was heterozygous with S14-A.
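The threshold reading described in this legend can be expressed as a short check. This is a hedged sketch: the cutoffs are the ones quoted in the legend (MMSE < 23, GDS > 8), while the function name `screen_scores` and the example scores are hypothetical and are not data from the cohort.

```python
MMSE_CUTOFF = 23  # scores below 23 suggest cognitive decline
GDS_CUTOFF = 8    # scores above 8 suggest a tendency toward depression


def screen_scores(mmse: int, gds: int) -> dict:
    """Flag a subject against the general MMSE and GDS thresholds."""
    return {
        "cognitive_decline": mmse < MMSE_CUTOFF,
        "depressive_tendency": gds > GDS_CUTOFF,
    }


# Hypothetical subject with unremarkable scores on both scales.
print(screen_scores(mmse=28, gds=3))
```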


Discussion

In this study, we identified a total of 11 novel rare alleles and obtained the AFs of known rare alleles of 5-HTTLPR in two independent Japanese cohorts. Among the reported rare alleles, we did not observe the 11-, 17-, 21-, and 24-repeat alleles, which were found in other ethnic populations18,24,25. We also did not observe the 13- and 18-repeat alleles, which were reported in Japanese subjects19,31,32. Considering the sample sizes, our study provides, to our knowledge, the most accurate and comprehensive estimation of the rare alleles in the Japanese population.

Genetic sequence and architecture of novel allelic variants

Among the 11 novel alleles, 8 were estimated to originate from either L16-A or S14-A. In addition, variable duplication of ζ-η subunits was involved in the other three alleles. XL28-A has eight ζ-η unit repeats inside the genomic rearrangement region, while S14-A has only one ζ-η unit. The exact duplications of the ζ-η unit have been identified in L16-C, XL18-A, XL20-A, XL20-E, and XL22-A. It would be reasonable to expect the presence of other intermediate numbers of repeat units with tandem duplications of ζ-η, such as 24 and 26.
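The expectation of intermediate alleles can be made quantitative with simple arithmetic. In the Fig. 1C legend, S14-A (one ζ-η unit) gives a 457 bp amplicon and XL28-A (eight units) gives 751 bp, which implies 42 bp per additional ζ-η unit; the legend values for XL20-A (583 bp) and XL22-A (625 bp) fit the same step if they carry four and five units, respectively. The sketch below encodes this inference. The 42 bp step, the unit counts for XL20-A and XL22-A, and the function name are assumptions derived from the legend, and the formula applies only to alleles that differ from S14-A solely by ζ-η duplications.

```python
BASE_REPEATS = 14        # S14-A: 14 repeat units, one ζ-η unit
BASE_AMPLICON_BP = 457   # S14-A amplicon length from the Fig. 1C legend
PAIR_BP = 42             # inferred: (751 - 457) / 7 additional ζ-η units


def predict_allele(n_pairs: int) -> tuple:
    """Return (repeat count, amplicon length in bp) for n ζ-η units."""
    if n_pairs < 1:
        raise ValueError("at least one ζ-η unit is required")
    extra = n_pairs - 1
    # Each extra ζ-η unit adds two repeat units and PAIR_BP base pairs.
    return BASE_REPEATS + 2 * extra, BASE_AMPLICON_BP + PAIR_BP * extra


for n in range(1, 9):
    repeats, length = predict_allele(n)
    print(f"{n} ζ-η units: {repeats} repeats, {length} bp")
```

Under these assumptions, the hypothetical 24- and 26-repeat alleles would yield amplicons of 667 bp and 709 bp, which would be resolvable from XL22-A and XL28-A on the same gel.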

XL28-A lost its promoter activity

Our promoter assay revealed that the extremely long XL28-A, which harbors 8 ζ-η tandem repeats, had no promoter activity, while other alleles with a smaller number of repeats maintained their activities (Table 1). A previous study demonstrated that lymphoblastoid cell lines (LCLs) derived from African-American females homozygous or heterozygous for XL alleles, which are 81 bp longer than the L allele and presumably correspond to XL alleles with 20 repeats, show higher expression levels than those with the SS or LL genotype (ref. 33). This result raised the hypothesis that an increased length of the 5-HTTLPR allele is associated with increased promoter activity. Our contrary result could be explained if the expression level in the previous study was affected by other regulatory polymorphisms in SLC6A4, such as StIn2 (ref. 34) and the rs25531 SNP (ref. 35). Another possibility is that, because we examined only ζ-η tandem repeats, XL alleles consisting of other units may exhibit a length-dependent increase in promoter activity.

Using the transcription factor binding profile database JASPAR, we found that the ζ-η tandem repeat has putative binding sites for the TFAP2 family, which consists of five members: TFAP2α, TFAP2β, TFAP2γ, TFAP2δ, and TFAP2ε. TFAP2 proteins are known to have essential roles in neuronal development36,37. In humans and mice, TFAP2β is present in the cerebellum, midbrain, medulla, and pons throughout adulthood38 and represses the promoter activity of SLC6A4 via the TFAP2 binding site at the rs25531 SNP of unit μ in L16-D (ref. 39). Given that the putative TFAP2 binding site we found in this study spans the ζ and η units, the accumulation of ζ-η tandem repeats might lead to the repression of the promoter activity of SLC6A4.

Pathophysiological role of the XL alleles

In the CCSS, we found no evidence of an association of the AF of XL alleles with BD or SZ. Contrary to our initial expectation, our extensive genotyping in the ACSS showed that the AF of XL28-A was not associated with psychiatric disorders. In addition, the elderly subject with XL28-A showed neither an apparent decline in cognitive function nor signs of depression. The absence of promoter activity of this 5-HTTLPR allele, therefore, has no apparent effect on phenotypes related to psychiatric disorders. This could be explained by complementation by other cis-regulatory elements in SLC6A4, such as StIn2 (ref. 34) and the rs25531 SNP (ref. 35), and/or by the other 5-HTTLPR allele. Therefore, it would be interesting to examine the functional consequences by generating cells homozygous for XL28-A.

Our results suggest that the identification and interpretation of numerous 5-HTTLPR genetic variations will be important for understanding the role of 5-HTTLPR in psychiatric disorders. We have previously reported that psychiatric patients harboring low-activity alleles showed altered DNA methylation at specific CpG sites in SLC6A4 and altered amygdala volume28. Integrative analysis of DNA methylation and 5-HTTLPR as well as brain structure will be worth studying in the future.