Characterization of the human TARDBP gene promoter

The expression of TDP-43, the main component of neuronal intracellular inclusions across a broad spectrum of ALS and FTD disorders, is developmentally regulated, and in vivo studies have shown that TDP-43 overexpression can be toxic even before pathological aggregates are observed. The regulation of its expression at the transcriptional level might therefore represent a further key element in the pathogenesis of neurodegenerative diseases. We have characterized the human TARDBP promoter in order to study the transcriptional mechanisms of its expression. Mapping of cis-acting elements by luciferase assays in different cell lines showed that promoter activity seems to be higher in SH-SY5Y, Neuro2A, and HeLa than in HEK293 cells. In addition, we tested the effects of two SNPs found in the promoter region of ALS patients and observed no significant effect on transcription levels in any of the tested cell lines. Lastly, while TDP-43 overexpression did not significantly affect the activity of its promoter (suggesting that TDP-43 does not influence its own transcription), the presence of the 5′UTR sequence and intron-1 splicing seem to have a positive impact on TDP-43 expression without affecting transcript stability. In conclusion, we have identified the region spanning nucleotides 451–230 upstream from the transcription start site as the minimal region with significant transcriptional activity. These results lay an important foundation for exploring the regulation of TARDBP gene transcription by exogenous and endogenous stimuli and the implication of transcriptional mechanisms in the pathogenesis of TDP-43 proteinopathies.


Results
Bioinformatic analysis of the TARDBP promoter. The characterization of the TARDBP gene sequence (Ensembl gene ID: ENSG00000120948) was initially performed by bioinformatic analysis using the UCSC Genome Browser (GRCh37/hg19). The TARDBP gene, which maps to the chromosome 1p36.22 region, spans 12,870 bp from position 11,072,679 to 11,085,548 on the forward DNA strand. Figure 1 shows the TARDBP promoter sequence (starting from nucleotide 1316 upstream from the TSS of the NM_007375.3 TDP-43 transcript). Comparison of the UCSC TARDBP transcripts with those retrieved from the Database of Transcriptional Start Sites (DBTSS, http://dbtss.hgc.jp 53,54 ) showed that most transcription start sites (TSSs) map to the corresponding positions indicated in the Reference Sequence NM_007375 (Fig. 1A,B). Apparently, there are no tissue-specific TSSs, since only one mRNA species seems to be predominant in adult and fetal tissues, as well as across different tissues (Fig. 1B).
Subsequently, by querying the Eukaryotic Promoter Database 55
Conservation of TARDBP promoter sequence throughout evolution. In order to study the evolutionary conservation of the TARDBP promoter, the 1316 nucleotides upstream of the TSS of TARDBP transcripts from human and other species were retrieved using GenBank and the Ensembl Genome Browser (http://www.ensembl.org/). Alignments were performed using the MUSCLE software 57 . Among primates, we compared the genomic sequences of the putative TARDBP promoter region from Hominoidea (Homo sapiens-Human, Pan paniscus-Bonobo, and Nomascus leucogenys-Gibbon), from a New World monkey (Callithrix jacchus-Common marmoset) and from Old World monkeys (Macaca mulatta-Rhesus macaque; Papio anubis-Olive baboon). In general, this comparison highlighted that the relative degree of similarity among the compared species increases significantly with proximity to the TSS. In the region spanning nucleotides -1316 to -1000 upstream from the TSS, the hominoid sequences seem to be closer to each other than to those of New and Old World monkeys. On the other hand, in the region spanning the 1000 nucleotides upstream from the TSS, the sequences from all primates show a high degree of identity (Fig. 2). By contrast, the alignment of the human, mouse and rat putative TARDBP promoter sequences shows a low degree of global similarity (Fig. 3A). Nevertheless, the region of the rodent gene promoters spanning approximately 500 nt upstream from the TSS shows a higher level of similarity, suggesting that the proximal region of the putative TARDBP promoter might encompass regulatory elements conserved across primates and rodents.
In fact, pairwise sequence alignment of the 1316-nt sequence upstream of the TSS of the human/mouse and human/rat TARDBP orthologous genes showed 62% and 63% local similarity/identity (with 8% and 5% gaps, respectively) within the 300/500-nt region upstream of the respective TSSs (Fig. 3B). Considering the observations indicating that age-regulated variations in TDP-43 expression seem to be evolutionarily conserved at least in mouse and fly 58,59 , we tried to perform pairwise sequence alignment between the 1316-nt sequence upstream of the TSS of the mouse and fruit fly TARDBP orthologous genes. The local alignment of the two
Characterization of cis-regulatory sequences of TARDBP promoter. In order to identify the minimal functional sequence responsible for the transcriptional activity of the promoter of the human TARDBP gene, a luciferase assay was set up with deletion fragments of the putative TARDBP promoter cloned into the pGL4 luciferase reporter vector. Different segments of the putative regulatory region of the promoter were amplified by PCR, and eight constructs differing in length were produced: the sequences encompassed between 27 nucleotides downstream from the TSS (+ 1nt, RefSeq NM_007375.3) and 1316, 927, 451, 380, 320, 280, 230 and 180 nucleotides upstream from the TSS (Fig. 4). The promoter fragments were subcloned into the pGL4.11 vector and transfected in cell lines of different tissue origin (HEK293, HeLa, Neuro2A and SH-SY5Y), and the luciferase activity of each construct was normalized versus the 1316 construct (i.e., 1316 = 1).
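The normalization described above (each construct expressed relative to the 1316 construct, set to 1) can be illustrated with a minimal Python sketch. It assumes that each replicate yields a firefly luciferase reading and a co-transfected Renilla reading, and that the firefly/Renilla ratio is averaged across replicates before scaling; the function name `relative_activity` and all numeric readings are hypothetical and are not data from this study.

```python
from statistics import mean


def relative_activity(readings, reference="1316"):
    """Return each construct's activity relative to the reference construct.

    readings maps a construct name to a list of (firefly, renilla)
    replicate pairs. Each replicate is first corrected for transfection
    efficiency (firefly / renilla); replicate ratios are then averaged and
    divided by the averaged ratio of the reference construct.
    """
    ratios = {
        name: mean(firefly / renilla for firefly, renilla in pairs)
        for name, pairs in readings.items()
    }
    ref = ratios[reference]
    return {name: ratio / ref for name, ratio in ratios.items()}


# Hypothetical readings, for illustration only (not data from this study).
example = {
    "1316": [(1000.0, 100.0), (1200.0, 100.0)],
    "451": [(900.0, 100.0), (1080.0, 120.0)],
    "180": [(55.0, 100.0), (55.0, 100.0)],
}
result = relative_activity(example)
```

With this scheme the reference construct is 1 by construction, so values below 1 indicate loss of activity upon deletion. Passing `reference="HEK293"` to a dictionary keyed by cell line would give the analogous cell-line comparison.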
Intracellular analysis showed that most of the promoter activity was retained in the fragments spanning from 1316 to 451 nucleotides upstream from the TSS within all tested cell lines (Fig. 4). Further deletions resulted in dramatic reduction of activity (380 and 330 constructs), with almost complete loss of activity when the promoter was shortened to 230 and 180 nucleotides upstream from the TSS, in all tested cell lines. These results show that the minimal promoter region encompasses the 451 nucleotides upstream from the TSS of RefSeq NM_007375.
We also tried to detect any kind of tissue-specific promoter activity. To this aim, we compared the activity of the 1316, 927 and 451 promoter fragments among SH-SY5Y, Neuro2A, HeLa, and HEK293 cell lines, considering the HEK293 cells as the unitary reference. Indeed, the constructs showed approximately 6-8 × higher activity
Figure 3 (caption fragment): and Rat (ENSRNOT00000049822) TARDBP transcripts was generated with MUSCLE alignment software (http://www.ebi.ac.uk/Tools/muscle/index.html). (B) Pairwise sequence alignment was used to identify regions of identity/similarity between the promoter sequences of the Human/Mouse and Human/Rat TARDBP orthologs. The local similarities between the sequences were identified by using the EMBOSS Matcher local alignment tool (https://www.ebi.ac.uk/Tools/psa/).

TDP-43 does not influence its own transcription.
After characterizing the sequence of the promoter of the TARDBP gene and its activity in different cell lines, we wished to evaluate the ability of TDP-43 to influence, directly or indirectly, the synthesis of its own transcript. For transfections of these constructs, a stable HEK293-
The TARDBP 5′UTR and intron 1 splicing positively impact the luciferase expression. The 5′UTR region of the TARDBP gene encompasses exon 1 (102 bp) and the first 12 bp of exon 2, separated by intron 1 (972 bp). In order to explore the presence of additional elements able to modulate TDP-43 expression, the functional impact of the 5′UTR (construct 451 + Ex1Ex2) and intron 1 (construct 451 + Ex1-IVS1-Ex2) of TARDBP was analysed by generating variants of the 451 plasmid (Fig. 8A). The first variant contained the correctly spliced TARDBP 5′UTR (exon 1, 102 bp, and the first 12 bp of exon 2) (451 + Ex1Ex2). The second construct was created by inserting the region encompassing exon 1 (102 bp), intron 1 (972 bp) and the first 12 bp of exon 2 of the TARDBP gene between the 451 promoter and the luciferase ATG codon (451 + Ex1-IVS1-Ex2 wt). The third construct was a mutant of the latter construct in which the 3′ splice site of intron 1 was disrupted (451 + Ex1-IVS1-Ex2 mut).
In SH-SY5Y cells, a correctly sized amplicon was observed only after transfection of the 451, 451 + Ex1Ex2 and 451 + Ex1-IVS1-Ex2 wt constructs (Fig. 8B). Uncropped images of the agarose gel in Fig. 8B are shown in Supplementary Fig. 2. The mRNA species were quantified by real-time PCR (Fig. 8C). Using the 451 construct as the reference (= 1), the relative expression of the 451 + Ex1Ex2 construct (containing the pre-arranged and correctly spliced 5′UTR region) was 2.6 × that of the reference, while the 451 + Ex1-IVS1-Ex2 wt construct showed a 25 × increment in relative expression compared to the 451 construct. Conversely, the mRNA levels of the 451 + Ex1-IVS1-Ex2 mut construct were negligible (0.1 × vs the 451 control). In order to test whether the mutant transcript is unstable due to low splicing efficiency, the RNA levels of the 451 + Ex1-IVS1-Ex2 wt and 451 + Ex1-IVS1-Ex2 mut constructs were monitored with and without addition of a transcriptional inhibitor (Actinomycin D) that causes complete transcriptional arrest 63 . Considering the higher mRNA levels derived from the intron-containing construct (451 + Ex1-IVS1-Ex2 wt), to prevent the possibility that the decay machinery might be saturated by the spliced mRNAs, we used proportionately less of this construct, while keeping the amount of total transfected DNA constant. While it was not possible to determine the half-life of the 451 + Ex1-IVS1-Ex2 mut mRNA (whose expression was negligible), the mRNA species of both the 451 + Ex1-IVS1-Ex2 wt and the 451 + Ex1Ex2 constructs exhibited half-lives of ∼ 7 h, indicating no significant differences in mRNA stability dependent on splicing (Fig. 8D).
Lastly, the relative luciferase activities positively correlate with the occurrence of splicing. In fact, the same constructs were transfected in SH-SY5Y cells and their luciferase activity was measured (Fig. 9): the activity of the 451 + Ex1Ex2 construct (containing the pre-arranged and correctly spliced 5′UTR region) was 1.5 × that of the control. The 451 + Ex1-IVS1-Ex2 wt construct showed a 5 × increment in luciferase activity (as compared to the control 451 construct). On the other hand, as expected, the 451 + Ex1-IVS1-Ex2 mut construct showed insignificant activity when compared with the other constructs (Fig. 9). Altogether, these results suggest that the presence of the 5′UTR as well as the correct splicing of intron 1 are elements able to modulate the expression of luciferase (and, potentially, of TDP-43) at the transcriptional level.
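The text does not state how the mRNA half-lives in the actinomycin D time course were computed. One common approach, sketched here purely as an assumption, is to fit first-order decay by least squares on the log-transformed fraction of mRNA remaining and convert the slope into a half-life. The function name `half_life_hours` is hypothetical, and the time points and fractions below are synthetic values generated from a 7 h half-life, not measurements from this study.

```python
import math


def half_life_hours(times_h, remaining_fraction):
    """Estimate mRNA half-life assuming first-order (exponential) decay.

    Fits ln(fraction remaining) = intercept + slope * t by ordinary least
    squares and returns ln(2) / -slope. Raises ValueError if the fitted
    slope is not negative (no decay detected).
    """
    logs = [math.log(f) for f in remaining_fraction]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("no decay detected")
    return math.log(2) / -slope


# Synthetic time course built from a 7 h half-life (illustration only).
times = [0.0, 2.0, 4.0, 8.0]
fractions = [0.5 ** (t / 7.0) for t in times]
estimate = half_life_hours(times, fractions)
```

Comparing the estimates obtained for two constructs (for example, spliced versus pre-arranged 5′UTR) then indicates whether the difference in steady-state mRNA levels can be attributed to transcript stability.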

Discussion
We have experimentally characterized the promoter region of the human TARDBP gene. A previous prediction indicated that the core regions of the TARDBP promoter could be located between 500 nucleotides upstream of exon 1 and 66 nucleotides downstream of exon 1 and in two regions of intron 1 (212 and 613 nucleotides) 52 .
Our predictions failed to identify canonical TATA box and CAAT box transcriptional regulatory elements, and interrogation of the DBTSS database (which catalogs the positions of TSSs in the genome) showed that, in most TDP-43 transcripts, transcription starts from the TSS indicated in the main transcript isoform NM_007375.
Regarding the spatio-temporal expression pattern, TDP-43 is a developmentally regulated protein 58,64 , while its distribution in adult tissues seems to be ubiquitous 6,58,64 . These observations are consistent with recent studies reporting that a TATA box is present in only a minority of promoters [65][66][67] , often those with tissue-specific expression, such as in liver and muscle [68][69][70] .
From an evolutionary perspective, it is interesting to note that, among primates, the greatest degree of evolutionary conservation encompasses the 600 nt upstream of the TSS. On the other hand, the alignment of human and rodent sequences highlights the 200 nt upstream of the TSS as the region of highest homology. These observations suggest that the region in between plays an important role in directing transcription of the TARDBP gene. Indeed, this hypothesis is consistent with the experimental mapping performed in this study, which identified the region spanning nucleotides 451-230 as necessary for minimal promoter activity in all the tested cell lines (Fig. 4).
The comparison of the transcriptional activity in different cell lines highlights that TARDBP promoter activity is high in cell lines of neuronal origin as well as in HeLa, as compared to human embryonic kidney 293 cells (Fig. 5). These observations show that the promoter is active in human and mouse cell lines of different tissue origin (as expected) and suggest that the relatively lower activity observed in HEK293 cells might depend on biochemical and/or genomic factors associated with this specific cell line. In addition, it should also be considered that, in comparison with the other tested cell lines, the HEK293 cell line has no evident tissue-specific gene expression and shows expression of markers of several tissues 71 .
It is well established that neuronal TDP-43 cytoplasmic inclusions are present in several different neurodegenerative diseases, such as FTD, ALS and Alzheimer's disease, and that the sequestration of this protein in aggregates may create gain- and loss-of-function events causing (directly or indirectly) cellular toxicity or alteration of TDP-43-regulated gene expression. Understanding the factors that control TDP-43 expression within cells will certainly provide better insight into disease origin and progression. Indeed, it has been observed that an increase of TDP-43 protein levels can activate the aggregation process in both cytoplasmic and nuclear compartments 20,60 . In addition, studies with animal models have shown that TDP-43 overexpression can be neurotoxic, even without the presence of aggregates 30,32,33,72,73 . Although TDP-43 can regulate its own expression through a negative feedback loop 44 , some observations suggest that regulation of its expression may also occur at the transcriptional level. In particular, TDP-43 expression seems to be developmentally regulated 64 , and it has also been shown to decrease with age in a variety of organisms such as fruit fly 59 and mouse 58 .
Looking for potential modifiers of TARDBP promoter activity, we focused on two SNPs (c.1-562t>c, rs9430335 and c.1-100t>c, rs968545) previously identified within the TARDBP promoter of SALS patients. Although the variants do not change any transcription factor binding site, we functionally tested their transcriptional impact. In our system, we did not find statistically significant differences in the promoter activity of each haplotype. We can therefore hypothesize that these potential regulatory SNPs do not directly alter TARDBP promoter activity, but we cannot exclude that they affect transcription levels in other ways. The variants might affect transcription at endogenous copy numbers, but not in overexpressed exogenous constructs. In addition, regulatory elements outside the promoter region included in our reporter plasmid might interplay with the variants and modulate the activity of the promoter.
Subsequently, since TDP-43 autoregulates its own expression through a negative feedback mechanism via direct interaction with its own 3′UTR 44,48,74,75 , we tested whether TDP-43 has transcriptional effects on its own promoter activity and found that it does not modulate its own transcription.
Then, considering the known ability of the 5′UTRs to affect mainly protein translation efficiency 76 , we next sought to test the possible effects of the TARDBP 5′UTR and intron 1 splicing on the expression of the associated reporter gene (Fig. 8).
On one hand, the observation of a significant increase in luciferase activity associated with the presence of the TARDBP "pre-arranged" 5′UTR (construct 451 + Ex1Ex2) indicated that this element can positively influence gene expression. Consistently, recent studies have shown that approximately 35% of human 5′UTRs are annotated as harboring introns 77 , and recent observations suggest that introns in UTRs may have specific regulatory functions by affecting the rate of transcription rather than transcript stability. On the other hand, our experiments testing the relative mRNA decay rates of the luciferase constructs indicate that these elements have little or no impact on the stability of the mature mRNA and, therefore, that they might affect gene expression through modulation of other steps of RNA metabolism, such as transcription, mRNA levels, mRNA export and translational efficiency [78][79][80][81] .
In conclusion, growing lines of evidence suggest that the pathophysiology of ALS, FTD and other neurodegenerative diseases might be the consequence of the convergence of multiple risk factors acting also on changes  64 , in a cell-specific fashion, and, through aging, these fluctuations are evolutionarily conserved at least in mice and flies 58,59 . Although there are no clear data on age-related changes in TDP-43 in humans 82,83 , altogether these observations suggest that a better understanding of the mechanisms and factors involved in controlling the activity of the TARDBP promoter and, consequently, TDP-43 expression will give clues to uncover novel processes implicated in the onset and progression of the TDP-43 proteinopathies.
Statistical analysis was performed using Student's t-test or ANOVA.
In order to verify the possible influence of TDP-43 on its own promoter, we used the previously generated human stable cell line Flp-In HEK293 -TDP-43 wild type, which carries a locus-specific integrated human TDP-43 transgene whose expression is driven by an inducible version of the Tetracycline (Tet) promoter 44 .

Constructs
One day before transfection, Flp-In HEK293 -TDP-43 wild type (wt) cells were seeded at 60% confluence. Lipofectamine reagent was used for transfection with 300 ng of the reporter plasmids and 20 ng of pRL-TK vector. After 12 h, the medium was changed, and the cells were incubated for a further 24 h. Flag-tagged TDP-43 expression was induced with 1 µg/ml tetracycline. The luciferase/Renilla activities were then assayed, and statistical analyses were performed using one-way ANOVA followed by Tukey's test.
Gene expression quantitation. In order to quantify the mRNA levels of the 5′UTR TDP-43 constructs by real-time PCR, co-transfections of luciferase constructs and pRL-TK were carried out as described in the section "Transfections and Luciferase assays". The extracted RNA (1 µg) was DNase-treated and reverse-transcribed using the M-MLV reverse transcriptase (Thermo Fisher Scientific, Waltham, MA, USA) and oligo-dT as primer (0.2 µg/reaction), in a final volume of 40 µl, essentially as indicated in the manufacturer's instructions. The CFX96 real-time PCR detection system (Bio-Rad Laboratories, Redmond, WA, USA) was used for all qPCRs. SYBR green assays were optimized (annealing temperature and incubation time) with the iQ SYBR Green Supermix (Bio-Rad), according to the manufacturer's instructions, using tenfold serial dilutions of cDNAs. The amplification mixture consisted of 0.2 μM primers, 5 μl of SYBR Green Supermix, and 4 μl of template DNA in a total volume of 10 μl. Samples were amplified with the following program: initial denaturation at 95 °C for 2 min, followed by 40 cycles of denaturation for 15 s at 95 °C and annealing/elongation for 30 s at 60 °C. The expression of the co-transfected Renilla ORF was used to normalize the luciferase qPCR data.
For experiments with actinomycin D, SH-SY5Y cells were transfected in a p100 plate with 3000 ng of the no-intron constructs (451, 451 + Ex1Ex2) or of the 451 + Ex1-IVS1-Ex2 mut plasmid, or with 600 ng of the 451 + Ex1-IVS1-Ex2 wt construct. Total DNA was normalized to 3000 ng by adding an appropriate amount of pUC18 vector. Twelve hours later, cells deriving from each transfection were harvested and plated in different wells (one for each actinomycin incubation time point). Twenty-four hours after plating, actinomycin D (5 μg/ml) was added. Following shutoff of transcription, samples were collected at the designated time points using TRIzol, and the relative levels of the indicated mRNAs were assessed by qRT-PCR. Total RNA from these samples was analyzed using the abovementioned Luciferase primers, and normalization was carried out versus the housekeeping gene RPII (RPII_f: 5′-GCC CAC GTC CAA TGA CAT -3′; RPII_r: 5′-GTG CGG CTG CTT CCA TAA -3′). Relative transcript abundances were determined using the ΔΔCt method 85 .
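As an illustration of the ΔΔCt calculation mentioned above, the following Python sketch computes a fold change from four threshold-cycle values. It assumes amplification efficiencies close to 100% for both the target and the normalizer (the usual assumption behind the 2^-ΔΔCt formula); the function name `ddct_fold_change` and the Ct values are hypothetical and are not data from this study.

```python
def ddct_fold_change(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Return the relative expression 2 ** -(ddCt).

    ct_target / ct_ref: Ct of the target (e.g. luciferase) and of the
    normalizer (e.g. Renilla or RPII) in the sample of interest.
    ct_target_cal / ct_ref_cal: the same two Ct values in the calibrator
    sample (e.g. the reference construct, or the zero time point).
    Assumes roughly 100% amplification efficiency for both amplicons.
    """
    delta_sample = ct_target - ct_ref
    delta_calibrator = ct_target_cal - ct_ref_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values, for illustration only.
fold = ddct_fold_change(ct_target=22.0, ct_ref=18.0,
                        ct_target_cal=24.0, ct_ref_cal=18.0)
```

In this made-up example the target crosses the threshold two cycles earlier than in the calibrator after normalization, which corresponds to a four-fold higher relative abundance.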
Western blot. Cell extracts were prepared in PBS containing 1 × protease inhibitor (Roche Diagnostics, Indianapolis, IN, USA). Proteins were separated by SDS-PAGE and transferred to nitrocellulose (Cytiva, formerly GE Healthcare, Chicago, IL, USA), and protein detection was carried out with standard Western blotting techniques. After transfer, membranes were incubated for 10 h in blocking solution (5% nonfat dry milk in PBS containing 0.1% Tween-20, T-PBS) to prevent non-specific binding. Subsequently, membranes were incubated for 1 h at room temperature with specific primary antibodies diluted in blocking solution. Expression levels of both endogenous and flag-tagged human wild-type TDP-43 were monitored by using a commercially available polyclonal anti-TDP-43 antibody (Proteintech, Rosemont, IL, USA, 10782-2-AP). Endogenous tubulin was used as a loading control and was detected with an in-house made mouse monoclonal antibody. Immunoblots were developed by using the ECL Star Enhanced Chemiluminescent Substrate (EuroClone, Pero, Milan, Italy).