Identification and functional characterization of the extremely long allele of the serotonin transporter-linked polymorphic region

SLC6A4, which encodes the serotonin transporter, has a functional polymorphism called the serotonin transporter-linked polymorphic region (5-HTTLPR). The 5-HTTLPR consists of short (S) and long (L) alleles, each of which has 14 or 16 tandem repeats. In addition, the extralong (XL) and other rare alleles have been reported in 5-HTTLPR. Although they are more frequent in Asian and African than in other populations, the extent of variations and allele frequencies (AFs) were not addressed in a large population. Here, we report the AFs of the rare alleles in a large number of Japanese subjects (N = 2894) consisting of two cohorts. The first cohort (case-control study set, CCSS) consisted of 1366 subjects, including 485 controls and 881 patients with psychosis (bipolar disorder or schizophrenia). The second cohort (the Arao cohort study set, ACSS) consisted of 1528 elderly subjects. During genotyping, we identified 11 novel 5-HTTLPR alleles, including 3 XL alleles. One novel allele had the longest subunit ever reported, consisting of 28 tandem repeats. We named this XL28-A. An in vitro luciferase assay revealed that XL28-A has no transcriptional activity. XL28-A was found in two unrelated patients with bipolar disorder in the CCSS and one healthy subject in the ACSS who did not show depressive symptoms or a decline in cognitive function. Therefore, it is unlikely that XL28-A is associated with psychiatric disorders, despite its apparent functional deficit. Our results suggest that unraveling the complex genetic variations of 5-HTTLPR will be important for further understanding its role in psychiatric disorders.


Introduction
The serotonin transporter (5-HTT) encoded by SLC6A4 is a key molecule that regulates serotonergic neurotransmission at the synaptic cleft, affecting emotions and stress responses. Therefore, numerous studies have focused on elucidating the pathophysiological implications of psychiatric disorders. Animal studies using constitutive Slc6a4 knockout mice showed increased anxiety-like phenotypes in various behavioral tests [1][2][3] . Furthermore, human studies indicate that dysregulation of 5-HTT in the serotonergic system is implicated in the emotional and behavioral disturbances of psychiatric disorders, including anxiety, depression, bipolar disorders (BD), schizophrenia (SZ), and autism 4 .
SLC6A4 has a functional polymorphism (serotonin transporter-linked polymorphic region, 5-HTTLPR) in its promoter region. 5-HTTLPR consists of two major alleles: the short (S) and the long (L), each of which has 14 or 16 repeat units 5 , composed of 20-23 bp of highly homologous sequence units 6 , with the S allele exhibiting weaker transcriptional activity than the L allele [7][8][9] . On the basis of the dichotomous classification, myriad case-control association studies have been performed. Most of the studies and several meta-analyses claimed that the S allele confers sensitivity to environmental stress, which in turn increases vulnerability to anxiety and depressive symptoms [10][11][12] , although it remains controversial 13,14 , and the recent largest meta-analysis and case-control studies failed to replicate this finding 15,16 .
In addition to the major S and L alleles, 5-HTTLPR has a number of rare variants. To date, extrashort (XS) repeats (11-13 repeats) 17-19 , 15 repeats 5,19 , and extralong (XL) repeats (17-24 repeats) 5,18-26 have been reported. These rare variants showed a higher frequency in Asian and African populations than in European and Native American populations 19,27 . However, despite the extensive genetic studies on 5-HTTLPR so far, the extent of variations and allele frequencies (AFs) have not been addressed in detail in a large population.
Here, we report the patterns and AFs of 5-HTTLPR rare variants in the Japanese population in detail. Through the genotyping of two cohorts, we examined 5-HTTLPR in a total of 2894 Japanese subjects. The first cohort (the casecontrol study set, CCSS) consisted of 1366 subjects, including 485 controls and 881 patients with major psychosis (BD and SZ). The second cohort (the Arao cohort study set, ACSS) consisted of 1528 subjects who are community-dwelling elderly individuals. In total, we identified 11 novel 5-HTTLPR alleles. One of the novel alleles had the longest subunit ever reported, consisting of 28 tandem repeats. We named this extremely long allele XL 28-A. Interestingly, a luciferase reporter assay confirmed that XL 28-A has no transcriptional activity. XL 28-A was found in two unrelated patients with BD in the CCSS and one subject in the ACSS, who did not show depressive symptoms or a decline in cognitive functions. Therefore, it is unlikely that XL 28-A is associated with psychiatric disorders, despite its apparent functional deficit. Our results suggest that unraveling and interpreting the complex genetic variations of 5-HTTLPR will be important for further understanding its role in psychiatric disorders.

Study subjects
We used two independent Japanese DNA sample sets to examine the genetic variations of 5-HTTLPR. The first sample set (the CCSS) consisted of 1366 subjects, including 485 controls (CTs) and 881 patients with major psychosis (450 BD and 431 SZ). The details of the CCSS were previously reported 28 . All patients were diagnosed according to the DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition) criteria by experienced psychiatrists. CTs were collected based on voluntary recruitments from employees, students, and their friends and were interviewed by senior psychiatrists. All CTs were confirmed to have met the following criteria: (i) no current or past Axis-I psychiatric or physical diagnoses and (ii) no first-degree relatives with SZ or BD. For both patients and CTs, we excluded subjects with a history of current and past neurological illnesses, traumatic brain injuries, electroconvulsive therapy, and substance abuse.
The second sample set (the ACSS) consisted of 1528 elderly subjects. The ACSS addresses communitydwelling Japanese individuals aged over 65 who live in Arao city in Kumamoto prefecture in Japan. This is a part of the Japan Prospective Studies Collaboration for Aging and Dementia. Among the endpoints available, we utilized the Mini-Mental State Examination (MMSE) 29 and Geriatric Depression Scale (GDS) 30 for this study. These data are collected for the purpose of examining the baseline dementia or depression level at the start of the project.
The ethics committees of Kumamoto University and other collaborative research organizations approved this study. All subjects received a detailed description of this study and provided written informed consent.

DNA extraction and genotyping
Extraction of genomic DNA from peripheral blood cells (PBCs) and genotyping in the CCSS were previously reported 28 and are briefly summarized in the Supplementary Methods. In the ACSS, genomic DNA was extracted from PBCs with a QIAamp DNA Blood Mini QIAcube Kit (Qiagen, Hilden, Germany) using a QIAcube (Qiagen). 5-HTTLPR was amplified with the following primers: FWD 5′-CTTTGCGTTTTCTGTTGCCC−3′ and REV 5′-GGAGGCCAGGAACGATAGGA−3′. PCR amplification was performed in a total volume of 20 µL solutions containing the following compositions: 10 µL of 2× PCR buffer for KOD FX (TOYOBO, Osaka, Japan), 4 µL of 12 mM dNTP mix, 0.6 µL each of 10 µM primers, 2 U of KOD FX (TOYOBO) and 10 ng of genomic DNA. The thermocycling conditions were as follows: an initial cycle of 2 min at 94°C, followed by 30 cycles of 10 s at 98°C and 1 min at 68°C. All PCR amplicons were analyzed by MultiNA (SHIMAZU, Kyoto, Japan) and by bidirectional Sanger sequencing with 5′-GGCGTTGCCG CTCTGAATGC−3′ or 5′-CAGGGCGGGGACCGCA AGGT−3′. In some amplicons, we additionally performed TA cloning using a TOPO cloning kit (Invitrogen, Carlsbad, CA) followed by Sanger sequencing analysis of individual colonies.

Plasmid construction
The target 5-HTTLPR variant was amplified with the following primers: FWD 5′-AAgagctcGGTGAAATTCC CAAGCTTGTTG-3′ with the SacI site (lowercase letters) and REV 5′AActcgagTTCTGGTGCCACCTAGACGC-3′ with the XhoI site (lowercase letters). PCR amplification was performed in a total volume of 25 µL solutions containing the following compositions: 2.5 µL of 10 × PCR buffer for KOD-Plus-Neo (TOYOBO), 1.5 µL of 25 mM MgSO 4 (Promega, Madison, WI, USA), 0.5 µL of 10 mM dNTP mix (Invitrogen), 2 µL each of 10 µM primers, 0.5 U of KOD-Plus-Neo (TOYOBO) and 50 ng of genomic DNA. The thermocycling conditions were as follows: (1) an initial cycle of 2 min at 94°C, (2) 33 cycles of 10 s at 98°C and 30 s at 72°C. The PCR fragments were gelpurified with a MinElute Gel Extraction Kit (Promega) followed by 3′ A-attachment with TaKaRa Ex Taq (Takara, Tokyo, Japan) and cloned into the PCR 2.1 vector using the TOPO cloning kit. The SacI and XhoI fragments were subcloned into the pGL4.10 firefly luciferase reporter vectors (Promega). The single bacterial colony was cultured in a large volume, and the plasmid vector was purified with an Endotoxin-free plasmid DNA purification kit (NucleoBond Xtra EF purification system, Macherey-Nagel GmbH, Düren, Germany), followed by ethanol precipitation. All constructs were bidirectionally sequenced to check for proper orientation, sequence specificity and the absence of artificial mutations.

Luciferase reporter assay
Undifferentiated rat raphe-derived RN46 cell lines (Sigma-Aldrich, St. Louis, MO, USA) grown under standard conditions were cotransfected with 2 µg of each luciferase reporter construct, 200 ng of the pGL4.73 Renilla luciferase reporter vector (Promega) as an internal control, and 200 ng of pCMV-EGFP vector as a positive control using an electroporator (NEPA21, NEPA GENE, Tokyo, Japan). Twenty-four hours after transfection, cells were harvested and lysed, and luciferase activity was measured using a Dual Luciferase Assay System (Promega) and a GloMax™ 96 Microplate Luminometer (Promega) following the manufacturer's protocol. Each construct was transfected three times in each of three assays for a total of 9 independent transfections. Firefly/Renilla ratios were calculated in each transfection and normalized to the mean ratio of blank control vectors, pGL4.10 (Promega).

Identification of novel variants of 5-HTTLPR in the CCSS
We previously performed genotyping of 5-HTTLPR in 1366 subjects of the CCSS and reported the epigenetic role of major alleles, including Asian-specific L 16-C 28 . In this study, we examined rare alleles (AF < 5%) and found three known S variants (S 14-B , S 14-D and S 14-E ), one known L variant (L 16-B ), and three known XL alleles (XL 19-A , XL 20-A , and XL 22-A ). In addition, we identified seven novel alleles (S 14-H , S 15-D , S 15-E , L 16-I , L 16-L , XL 22-C , and XL 28-A ) (Fig. 1A, B). Representative alleles were visualized by electrophoresis on agarose gel (Fig. 1C).

Structure of the novel 5-HTTLPR alleles in the CCSS
Among the novel alleles, S 14-H , L 16-I , L 16-L , and S 15-D were likely to be generated from base substitutions in the known repeat units. S 14-H has a T/C substitution at the 12th base position in ι of the common S 14-A allele. L 16-I has a C/T substitution at the 7th base position in ο of the common L 16-A allele. L 16-L has a C/T substitution at the 22nd base position in ξ of L 16-A . On the other hand, S 15-D was generated by single-base cytosine insertion at the end of ρ of S 15-A 6 . Each of these mutated repeat units in S 14-H , L 16-I , L 16-L , and S 15-D was described as ι', ο', ξ', and ρ', respectively (Fig. 1B).
The other three novel alleles, S 15-E, XL 22-C , and XL 28-A , were likely to be generated from novel arrangements of the known repeat units. S 15-E contains a tandem duplication of a single η repeat compared with S 14-A . XL 22-C has a replacement of ξ with θ at the place of the 8th repeat unit in XL 22-A . XL 28-A has tandem duplications of eight ζ-η units compared with the one ζ-η unit allele S 14-A .

Allele frequencies of the rare variants in psychosis in the CCSS
AFs with regard to diagnostic groups are listed in Supplementary Table 1. Using Fischer's exact test, none of the rare variants (AF < 5%), including novel alleles, showed significant deviation in patients compared to controls (p ≥ 0.05). Nonetheless, among the seven novel alleles, XL 28-A was found in two unrelated patients with BD and not in controls. Considering that XL 28-A is the longest allele so far identified, we performed a functional assay of this allele.

Functional characterization of XL 28-A using a luciferase reporter assay
We examined the promoter activities of XL 28-A and four known alleles (S 14-A , L 16-C , XL 20-A, and XL 22-A ) using a luciferase reporter assay in the rat raphe-derived RN46 cell line. Each allele has a different number of ζ-η tandem repeats from one to eight (Fig. 1A, Table 1). We observed that the promoter activity was maintained as the number of tandem repeats was up to 5, with a slight decrease in longer repeats. However, we found that the promoter activity of XL 28-A with 8 tandem repeats was at the same level as a blank control vector, indicating that XL 28-A has no apparent promoter activity ( Table 1).

Validation of XL 28-A in the general population
We then extensively screened for the presence of XL 28-A and other rare alleles using the ACSS (N = 1528). Through genotyping in the ACSS, we found six novel alleles (S 14-G , L 16-I , L 16-J , L 16-K , XL 20-F and XL 28-A ). Among them, two alleles, L 16-I and XL 28-A , were already found in the CCSS, and the other four were newly identified in this cohort (Fig. 1A). By combining the subjects of the CCSS and the ACSS, none of the rare alleles showed significant deviation in patients compared to controls by Fisher's exact test (p ≥ 0.05).
Structure of the novel 5-HTTLPR alleles (S 14-G , L 16-J , L 16-K , and XL 20-F ) in the ACSS S 14-G has a C/T substitution at the 19th base position in ε of S 14-A . L 16-J has two consecutive C/T substitutions at the 8th and 9th base positions in ο of L 16-A . These mutated repeat units were described as ε' and ο" (Fig. 1B). L 16-K has a replacement of φ with ο or η at the place of the 7th repeat unit in L 16-A or L 16-C . XL 20-F has an insertion of a θ-ι unit between the 7th and 8th repeat units in XL 18-A 6 .

Relationship between the novel 5-HTTLPR alleles and either MMSE or GDS in elderly subjects
Among the endpoints available in the ACSS, we utilized MMSE and GDS. A subject harboring the XL 28-A showed that both scores were at normal levels and within the normal variation in the ACSS population (Fig. 2). In addition, we observed that the other five novel alleles also did not show the possibility of major depressive disorder or dementia.

Discussion
In this study, we identified a total of 11 novel rare alleles and obtained the AFs of known rare alleles of 5-HTTLPR in two independent Japanese cohorts. Among the reported rare alleles, we did not observe 11, 17, 21, and 24 repeat alleles, which were found in other ethnic populations 18,24,25 . We also did not observe 13 and 18 repeat alleles, which were reported in Japanese subjects 19,31,32 . Considering the sample sizes, our study achieved the most accurate and comprehensive estimation of the rare alleles in the Japanese population.

XL 28-A lost its promoter activity
Our promoter assay revealed that the extremely long XL 28-A , which harbors 8 ζ-η tandem repeats, had no promoter activity, while other alleles with a smaller number of repeats maintained their activities (Table 1). A previous study demonstrated that lymphoblastoid cell lines (LCLs) derived from American-African females homozygous or heterozygous with XL alleles, which are 81 bp longer than the L allele and presumably XL with 20 repeats, represent higher expression levels compared to those with SS or LL 33 . This result raised the hypothesis that an increased length of the 5-HTTLPR allele is associated with increased promoter activity. Our contrary result would be explained by the fact that the expression level in the previous study was affected by other regulatory polymorphisms, such as StIn2 34 and rs25531 SNP 35 , in SLC6A4. Another possibility would be that we only examined ζ-η tandem repeats, and the XL alleles consisting of other units may exhibit a length-dependent increase in promoter activity.
Using the transcription factor binding profile database JASPAR (http://jaspar.genereg.net/), we found that the ζ-η tandem repeat has presumptive binding sites of the TFAP2 family, consisting of five members: TFAP2α, TFAP2β, TFAP2γ, TFAP2δ, and TFAP2ε. It is known that TFAP2 proteins have essential roles in neuronal development 36,37 . In humans and mice, TFAP2β exists in the cerebellum, midbrain, medulla, and pons throughout adulthood 38 and represses the promoter activity of SLC6A4 via the TFAP2 binding site at the rs25531 SNP of unit μ in L 16-D 39 . Given that the presumptive TFAP2 binding site we found in this study is located over the ζ and η units, the accumulation of ζ-η tandem repeats might lead to the repression of the promoter activity of SLC6A4.

Pathophysiological role of the XL alleles
In the CCSS, we found no evidence of an association of the AF of XL alleles with BD or SZ. Contrary to our initial expectation, our extensive genotyping in the ACSS showed that the AF of XL 28-A was not associated with psychiatric disorders. In addition, a subject with XL 28-A did not show an apparent decline in cognitive function, nor was there a signature of depression in the elderly subjects. The absence of promoter activity of 5-HTTLPR, therefore, has no apparent effect on phenotypes related to psychiatric disorders. This would be explained by complementation by other cis-regulatory elements, such as StIn2 34 and rs25531 SNP 35 , in SLC6A4 and/or by another 5-HTTLPR allele. Therefore, it would be interesting to pursue the functionality by generating cells with homozygous XL 28-A .
Our results suggest that the identification and interpretation of numerous 5-HTTLPR genetic variations will be important for understanding the role of 5-HTTLPR in psychiatric disorders. We have previously reported that psychiatric patients harboring lowactivity alleles showed altered DNA methylation at specific CpG sites in SLC6A4 and altered amygdala volume 28 . Integrative analysis of DNA methylation and 5-HTTLPR as well as brain structure will be worth studying in the future. Colors indicate the total number of subjects. Arrows indicate the scores of subjects with novel alleles. The dotted line indicates the general thresholds of the MMSE (< 23) and GDS (> 8). Scores above (MMSE) and below (GDS) the thresholds indicate that the subject has no decline in cognitive function or no tendency toward depression. All indicated subjects had a novel allele that was heterozygous with S 14-A .