Aire is a transcription factor that controls T cell tolerance by inducing the expression of a large repertoire of genes specifically in thymic stromal cells. It interacts with scores of protein partners of diverse functional classes. We found that Aire and some of its partners, notably those implicated in the DNA-damage response, preferentially localized to and activated long chromatin stretches that were overloaded with transcriptional regulators, known as super-enhancers. We also identified topoisomerase 1 as a cardinal Aire partner that colocalized on super-enhancers and was required for the interaction of Aire with all of its other associates. We propose a model that entails looping of super-enhancers to efficiently deliver Aire-containing complexes to local and distal transcriptional start sites.
Medullary thymic epithelial cells (mTECs) are involved in both negative selection of effector T cells and positive selection of regulatory T cells1. A unique feature of mTECs, which is critical for their roles in tolerance induction, is expression of a large fraction of the genome, particularly scores of loci encoding antigens characteristic of fully differentiated parenchymal cells (peripheral-tissue antigens, PTAs)2,3,4. Much of this transcription is driven by Aire5. mTECs from Aire-deficient mice and humans show severely compromised PTA expression, causing these individuals to develop autoimmune infiltrates and autoantibodies targeting multiple peripheral tissues6,7.
Several observations argue that Aire is a transcriptional regulator that operates differently from conventional transcription factors6,7. First, unlike traditional factors, the transcriptional effect of Aire on mTECs involves a large, although still select, portion of the genome2,3,4. Experimental introduction of Aire into extra-thymic cells induces expression of large sets of transcripts, which differ from cell type to cell type and also diverge from those induced in mTECs8. Second, Aire-induced gene expression has a strong element of stochasticity, with individual mTECs transcribing only a small subset of the total repertoire of induced PTA transcripts3,4,9,10. The subsets of transcripts induced in individual cells exhibit both intra- and inter-chromosomal clustering3,4. Third, Aire appears to not bind to a particular promoter or enhancer motif, exhibiting only a low, non-discriminatory affinity for DNA11. Instead, it seems to recognize generic features of transcriptionally quiescent sites, such as chromatin marks typical of silenced loci, for example, unmethylated lysine 4 of histone 3 (H3K4me0)11,12, and promoters with stalled RNA polymerase II (RNA-PolII)13,14.
Screening approaches have uncovered a large cast of structural and functional Aire partners, which fall into multiple functional classes, notably nuclear transport, chromatin structure and/or binding, transcription (including the DNA-damage response), and pre-mRNA processing15,16. However, we remain ignorant of the genomic location, architecture, biogenesis and function of the resulting Aire-containing complexes. We addressed these issues by exploiting recent advances in genome-wide chromatin mapping techniques, which now permit analysis of the small mTEC numbers available ex vivo17, and by applying diverse biochemical approaches. We found that Aire was located on and activated super-enhancers, defined as chromatin regions that host exceptionally high concentrations of transcriptional regulators. The topoisomerase TOP1 emerged as a cardinal Aire partner that colocalized on super-enhancers and was required for Aire interaction with all of its other partners.
Aire is located on mTEC super-enhancers
Our first goal was to map the genome-wide distribution of Aire ex vivo in mTECs, particularly its relationship to diagnostic histone marks, using chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq). Aire bound to 42,124 sites scattered throughout the genome of Ly51loMHCIIhi mTECs (called mTEChi hereafter) from 4–6-week-old female C57BL/6(B6) Aire+/+ mice, in comparison with the <12,000 sites detected in Aire-transfected cell lines14,18. The Aire signal was robust and reproducible, with >75% of the binding sites being common to the two biological replicates routinely examined (Supplementary Fig. 1 and Supplementary Table 1). Genome-wide, Aire bound primarily to intergenic, transcriptional start site (TSS) and intronic regions (Fig. 1a). Aire was situated along both Aire-induced and Aire-neutral genes but there was a substantially higher ratio of intergenic/TSS localization in the former compared with the latter set of genes (7.3/1 versus 1.2/1). The density of Aire around TSSs was demonstrably lower for Aire-induced than Aire-neutral genes, whereas its representation on intergenic and intronic regions did not vary much with the gene-induction status (Fig. 1b).
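The replicate comparison above can be illustrated with a minimal sketch. The text does not state how a "common" binding site was scored, so the any-overlap criterion, the function names and the toy coordinates below are all assumptions made for illustration only.

```python
from typing import List, Tuple

# A peak is (chromosome, start, end) with half-open coordinates.
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks lie on the same chromosome and share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def shared_fraction(rep1: List[Peak], rep2: List[Peak]) -> float:
    """Fraction of rep1 peaks that overlap at least one rep2 peak.

    Assumption: 'common' means any overlap; the paper may have used a
    different criterion. Quadratic scan, fine for a toy example only.
    """
    if not rep1:
        return 0.0
    shared = sum(any(overlaps(p, q) for q in rep2) for p in rep1)
    return shared / len(rep1)


# Invented coordinates, not data from the study.
toy_rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150), ("chr3", 10, 20)]
toy_rep2 = [("chr1", 150, 250), ("chr1", 590, 700), ("chr2", 60, 90)]
print(shared_fraction(toy_rep1, toy_rep2))
```

For genome-scale peak sets, an interval-tree or sorted-sweep approach would be preferable to this quadratic comparison.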
Super-enhancers are chromatin elements that serve as extended and overloaded depots for a multiplicity of general and cell-type-specific transcription factors19,20. Because they are often associated with, and regulate, genes diagnostic of fully differentiated cell types (that is, PTA genes), we investigated whether such elements might preferentially harbor Aire in mTECs. Following standard practice21, we defined mTEC super-enhancers as stretches >3.9 kb of histone 3 acetylated at lysine 27 (H3K27ac), a chromatin mark indicative of active enhancers. By this definition, mTEChi had 1,170 super-enhancers, of ∼30 kb average size, scattered over the chromosomes (Fig. 1c and Supplementary Fig. 2a). H3K4me1, another active chromatin mark, was selectively included in these super-enhancers, whereas H3K27me3, indicative of inactive chromatin, was excluded (Fig. 1d,e and Supplementary Fig. 2b), supporting the validity of our mTEChi super-enhancer designations. Aire was also preferentially bound to these super-enhancers, along with its structural and functional associate RNA-PolII (Fig. 1d,e and Supplementary Fig. 2b). To determine whether the super-enhancers that we had delineated were merely a conglomerate of typical enhancers, we bioinformatically sorted all of the super-enhancers and conventional enhancers from the 'hockey plot' derived from the ROSE algorithm21 (Fig. 1c), focused on individual H3K27ac peaks in the two sets of enhancers, and compared their Aire densities according to the ChIP-seq data. Super-enhancers hosted higher densities of Aire than did conventional enhancers (Fig. 1f).
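The idea behind the 'hockey plot' can be sketched as follows: enhancers are ranked by signal, both axes are scaled to the unit interval, and the cutoff is placed where the ranked curve reaches a slope of 1. The code below is a simplified stand-in for that logic, not the ROSE implementation; the function name and the toy signal values are hypothetical.

```python
from typing import List, Tuple


def hockey_stick_cutoff(signals: List[float]) -> Tuple[float, List[bool]]:
    """Return (cutoff, flags) where flags mark enhancers above the cutoff.

    Simplification: on a convex ranked curve with both axes scaled to [0, 1],
    the point of slope 1 is the one that minimises (scaled signal - scaled rank).
    """
    if len(signals) < 2:
        raise ValueError("need at least two enhancers")
    ranked = sorted(signals)
    n = len(ranked)
    lo, hi = ranked[0], ranked[-1]
    if hi == lo:
        raise ValueError("all signals are identical; no inflection point")
    tangent_idx = min(
        range(n),
        key=lambda i: (ranked[i] - lo) / (hi - lo) - i / (n - 1),
    )
    cutoff = ranked[tangent_idx]
    return cutoff, [s > cutoff for s in signals]


# Invented H3K27ac signal values, not data from the study.
toy_signals = [1, 2, 3, 4, 5, 6, 7, 8, 50, 100]
toy_cutoff, toy_is_super = hockey_stick_cutoff(toy_signals)
print(toy_cutoff, sum(toy_is_super))
```

In this toy input only the two outlying values fall above the cutoff, which mirrors the small fraction of enhancers that such a ranking typically designates as super-enhancers.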
To address the functional importance of Aire localization to super-enhancers, we compared ChIP-seq data obtained in parallel on mTEChi isolated from B6.Aire+/+ and B6.Aire−/− littermates. The density of H3K27ac marks on super-enhancers was significantly lower in Aire−/− mTEChi (Fig. 1g). In addition, a reanalysis of published ChIP-seq data sets22 revealed a lower density of H3K27ac marks on super-enhancers from immature Aire− mTECs compared with mature Aire+ mTECs from wild-type mice (Fig. 1g).
To localize the transcriptional effect of Aire in relation to super-enhancers, we compared gene expression in regions stretching 500 kb 5′ or 3′ of super-enhancers in Aire+ and Aire− mTEChi. Aire induced transcription in genomic regions extending in both directions from, but not overlapping, super-enhancers (Fig. 1h), whereas similar Aire-dependent transcriptional changes did not occur up- or downstream of random, size-matched genome stretches (Fig. 1h).
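The flanking-window comparison described above can be sketched as a simple binning of genes by TSS position relative to one super-enhancer. This is only an illustration: it assumes a single chromosome, takes 5′ and 3′ in genome-coordinate terms, and uses invented log2 fold-change values; the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

FLANK_BP = 500_000  # window size quoted in the text


def flank_label(tss: int, se_start: int, se_end: int, flank: int = FLANK_BP) -> str:
    """Classify a TSS relative to a super-enhancer spanning [se_start, se_end)."""
    if se_start <= tss < se_end:
        return "inside"
    if se_start - flank <= tss < se_start:
        return "5prime"
    if se_end <= tss < se_end + flank:
        return "3prime"
    return "outside"


def mean_change_by_flank(
    genes: List[Tuple[int, float]], se_start: int, se_end: int
) -> Dict[str, float]:
    """Average log2(Aire+ / Aire-) expression change per positional class."""
    groups: Dict[str, List[float]] = {}
    for tss, log2fc in genes:
        groups.setdefault(flank_label(tss, se_start, se_end), []).append(log2fc)
    return {label: mean(values) for label, values in groups.items()}


# Invented (TSS position, log2 fold change) pairs, not data from the study.
toy_genes = [(900_000, 1.0), (1_010_000, 0.0), (1_100_000, 2.0),
             (1_400_000, 1.0), (2_000_000, 0.0)]
toy_flank_means = mean_change_by_flank(toy_genes, 1_000_000, 1_030_000)
print(toy_flank_means)
```

Running the same function on random, size-matched intervals would give the control comparison described in the text.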
In addition, we performed ATAC-seq (assay of transposase-accessible chromatin followed by high-throughput sequencing23) to provide a genome-wide view of chromatin accessibility in ex vivo mTEChi. The chromatin stretches delineated above to be mTEChi super-enhancers had elevated ATAC signal densities (compared with other chromatin regions) in Aire+ mTEChi, but not in control earskin fibroblasts (Fig. 2a), indicating that they were preferentially open in the former case. As we had seen for H3K27ac signals, ATAC signals were substantially higher for individual peaks in super-enhancers than in conventional enhancers (Fig. 2b). In addition, comparison of ATAC signals in mTEChi of Aire+/+ and Aire−/− mice revealed that Aire induced the accessibility of super-enhancers (Fig. 2c) and, to a lesser extent, that of conventional enhancers (data not shown). This effect is perhaps most evident from a plot of Aire ChIP-seq signal against the fold change in ATAC signals in Aire+ versus Aire− mTEChi. This ATAC signal differential was higher in super-enhancers than in random size-matched chromatin stretches (Fig. 2d), as indicated by the preferential concentration of ATAC signal on mTEChi super-enhancers in Aire+ mTEChi in comparison with Aire− mTEChi and control earskin fibroblasts (Fig. 2e). These observations indicate that Aire was preferentially associated with and activated super-enhancers, resulting in transcriptional induction upstream and downstream.
Aire participates in multiple multi-protein complexes
Given that the architecture and dynamics of Aire-containing multi-protein complexes are ill defined, we next investigated how Aire-containing complexes assemble to promote transcription. Because the low numbers of mTEChi that can be isolated from mouse thymi preclude this type of study, we used HEK293T cells transfected with a construct encoding Aire with a FLAG tag at the amino terminus (FLAG-Aire). A plot of genome-wide H3K27ac densities from published ChIP-seq data24 revealed that Aire and RNA-PolII were preferentially located on super-enhancers in HEK293T cells (Supplementary Fig. 3).
Gel filtration chromatography provides information on protein complex number and composition. Because Aire is known to form molecular conglomerates of >670 kDa25, we applied nuclear extracts from FLAG-Aire-HEK293T cells to a Superose-6 column and immunoblotted eluted fractions with a FLAG antibody (Ab). Aire was broadly distributed, spanning fractions 9–14 (669–2,000 kDa; Fig. 3a). We then pooled fractions 9–11 and fractions 12–14, immunoprecipitated Aire-containing complexes with a FLAG Ab, and immunoblotted the precipitated material with antibodies recognizing a panel of Aire partners. Based on their co-elution profiles, Aire partners divided into three groups: those maximally eluted in fractions 9–11 (SFRS3, DDX5), in fractions 12–14 (DNA-PKcs, Ku80, PARP-1, DSIF, CDK9, BRD4) and in all of the Aire-containing fractions (TOP2A, RNA-PolII) (Fig. 3b). These findings indicate that Aire participates in at least two multi-protein complexes.
Next, we performed antibody pre-clearing experiments to reveal the extent to which designated Aire partners co-reside in complexes. DNA-PKcs is one of the Aire-interacting proteins that is most consistently detected15. Co-immunoprecipitation experiments revealed that DNA-PKcs associated with all of the Aire partners implicated in transcription (Ku80, PARP-1, BRD4, CDK9, TOP2A, DSIF, RNA-PolII), but none of those involved in pre-mRNA processing (DDX5, SFRS3) (Fig. 3c). Pre-clearing of nuclear extracts with a DNA-PKcs Ab removed the Aire-containing complexes that also hosted DNA-PKcs and, to a differential degree, Aire-containing complexes that hosted its other partners (Fig. 3c). Plotting for each Aire partner its propensity to associate with DNA-PKcs versus its degree of interaction with Aire after DNA-PKcs depletion (Fig. 3d) revealed three classes of Aire-partner-interacting proteins: those not co-residing with DNA-PKcs in Aire-containing complexes (DDX5, SFRS3), those largely (>80%) co-residing (Ku80, BRD4, DSIF, RNA-PolII), and those partially (40–60%) co-residing (PARP-1, TOP2A, CDK9). These results also suggest the existence of two to three distinct Aire-containing complexes.
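The three-way grouping above can be expressed as a small classifier. This is a sketch under stated assumptions: the co-resident fraction is estimated here as the share of Aire-associated signal lost after DNA-PKcs pre-clearing, which may not be how the published values were derived. The >80% and 40–60% bounds come from the text, whereas the <10% bound for "not co-residing" and the "unassigned" category are illustrative choices.

```python
def coresident_fraction(signal_mock: float, signal_depleted: float) -> float:
    """Share of a partner's Aire-associated signal lost after pre-clearing.

    Assumption: band intensity after mock vs DNA-PKcs pre-clearing is a
    usable proxy for co-residence. Result is clamped to [0, 1].
    """
    if signal_mock <= 0:
        raise ValueError("mock signal must be positive")
    fraction = 1.0 - signal_depleted / signal_mock
    return min(1.0, max(0.0, fraction))


def classify_partner(fraction: float) -> str:
    """Map a co-resident fraction to one of the classes named in the text."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if fraction > 0.8:
        return "largely co-residing"
    if 0.4 <= fraction <= 0.6:
        return "partially co-residing"
    if fraction < 0.1:  # illustrative bound, not taken from the paper
        return "not co-residing"
    return "unassigned"


# Invented intensities for hypothetical partners, not data from the study.
toy_partners = {"partner_A": (100.0, 10.0), "partner_B": (100.0, 50.0),
                "partner_C": (100.0, 100.0), "partner_D": (100.0, 30.0)}
toy_classes = {name: classify_partner(coresident_fraction(mock, depleted))
               for name, (mock, depleted) in toy_partners.items()}
print(toy_classes)
```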
To address how removal of particular protein partners influences the formation of Aire-containing complexes, we co-transfected FLAG-Aire-HEK293T cells with one of four pre-validated cognate shRNAs for individual Aire partners (expressed in the pLKO.1 vector) and assessed the ability of Aire to interact with its remaining partners in immunoprecipitation experiments. Because knockdown of BRD4 was not efficient using this approach, we used the small molecule inhibitor I-BET151 (ref. 26). We focused on the Aire partners implicated in transcriptional regulation, as the partners involved in transcription and pre-mRNA processing are known to behave independently in this assay15. Knockdown of DNA-PKcs or PARP-1 expression in HEK293T cells abolished the interaction of Aire with all of the Aire partners examined (Fig. 3e and Supplementary Fig. 4), indicating that these factors are essential for the assembly of the Aire-containing complexes. Inhibition of BRD4 and CDK9 inhibited the interactions between Aire and its associates CDK9, TOP2A and DSIF (Fig. 3e and Supplementary Fig. 4), which are known to be involved in transcriptional elongation27,28, but not those involved in the DNA-damage response29. Knockdown of TOP2A and DSIF compromised the ability of Aire to interact with only DSIF and RNA-PolII (Fig. 3e and Supplementary Fig. 4). Together, these observations indicate that Aire participated in at least two, and potentially three, multi-protein complexes.
TOP1 is a primary Aire partner
We previously proposed that TOP2A is an early Aire partner, suggesting a scenario in which Aire 'freezes' the enzymatic activity of TOP2A, thereby stabilizing double-stranded breaks (DSBs) and inciting the DNA-damage response via DNA-PK activation15. Because shRNA-mediated knockdown of TOP2A had a limited effect and inhibited the association of Aire only with DSIF and RNA-PolII (Fig. 3e and Supplementary Fig. 4), we investigated whether the ability of Aire to promote DSBs15 reflects an early interaction with other topoisomerases. First, we revisited mass-spectrometry (MS) data from several published or unpublished experiments aimed at identifying proteins that co-immunoprecipitate with Aire in HEK293T cells (ref. 15 and data not shown). TOP2A peptides were detected in most of these experiments, but peptides from TOP2B and TOP1 were also found (Table 1). All three enzymes were detected on immunoblots of proteins co-immunoprecipitated with Aire in FLAG-Aire-HEK293T cells (Fig. 4a). shRNA-mediated knockdown of TOP2B in these same cells significantly decreased Aire interactions with only DSIF and RNA-PolII (Fig. 4b and Supplementary Fig. 5a), indicating that TOP2B had a restricted effect on the association of Aire with its partners, similar to what was observed above after dampening TOP2A expression. In contrast, knockdown of TOP1 strongly inhibited the interaction of Aire with all partners tested, except DDX5 and SFRS3, which are involved in pre-mRNA processing (Fig. 4c and Supplementary Fig. 5b), a pattern reminiscent of that seen following dampening of DNA-PKcs or PARP-1 expression. Notably, TOP1 knockdown reduced the association of Aire with TOP2B and TOP2A (Fig. 4d), whereas TOP2B knockdown inhibited Aire interaction with only TOP2A, and TOP2A knockdown failed to affect the association of Aire with either TOP2B or TOP1 (Fig. 4d). 
The antibody pre-clearing assay showed that removal of TOP1-containing complexes strongly compromised the interaction of Aire with all of its partners that were implicated in transcription (Fig. 4e and Supplementary Fig. 5c), but did not affect associations with partners involved in pre-mRNA processing. Addition of the DNA intercalator ethidium bromide during the pulldown revealed that the Aire-containing complexes were not preformed, but rather required DNA binding30 (Supplementary Fig. 5d).
To evaluate the scenario in which TOP1 induces the DSBs that initiate the formation of Aire-containing multi-protein complexes, whereas TOP2A and TOP2B are involved in downstream events, we performed ChIP-seq analysis on ex vivo mTEChi, comparing the distribution of TOP1 and TOP2A (for which reliable ChIP-seq antibodies were available) with those of H3K27ac (which defines super-enhancers), Aire and γH2AX (which delineates regions adjacent to DSBs29). TOP1 and γH2AX colocalized highly preferentially with Aire at super-enhancer regions (Fig. 5a,b and Supplementary Fig. 6), whereas the distribution of TOP2A was more dispersed, spreading beyond super-enhancers locally and far-distally (Fig. 5a,b and Supplementary Fig. 6). In addition, a comparison of the Aire-induced changes in super-enhancer-localized γH2AX ChIP-seq signals with Aire-induced changes in topoisomerase ChIP-seq signals revealed a strong correlation for TOP1, but not TOP2A (Fig. 5c). Thus, Aire coordinately affected the localizations of TOP1 and DSBs.
TOP2A was less concentrated than TOP1 at mTEChi super-enhancers, whereas the overall genomic distribution, specifically the partitioning between intergenic, TSS and intronic regions, of the two topoisomerases was similar, with relatively little TOP2A and TOP1 localizing to exonic regions (Fig. 5d). Focusing on the statistically significant TOP1 peaks annotated to Aire-induced genes, we found co-association of Aire, RNA-PolII and γH2AX at intergenic, TSS and intronic stretches; as expected, the γH2AX peaks were broad, especially at the TSSs (Fig. 5e). We detected little TOP2A in the TOP1 peaks, regardless of the genome element examined (Fig. 5e). In contrast, the statistically significant TOP2A peaks showed co-association with Aire, RNA-PolII, γH2AX and TOP1, but almost exclusively at the TSSs (Fig. 5f). These findings also suggest that TOP1 and TOP2A have divergent roles in Aire-induced transcription in mTECs, with TOP1 primarily being involved during initial complex assembly at super-enhancers and TOP2A mainly being involved during subsequent events.
TOP1 and TOP2 are required for Aire induction of gene expression
We used the topoisomerase inhibitors topotecan and etoposide, which block TOP1 and TOP2, respectively, to assess the importance of these topoisomerases in vivo. These small-molecule inhibitors stabilize the enzymes when covalently bound to the DNA they just clipped, thereby inhibiting religation28. We injected 4-week-old B6.Aire+/+ female mice intraperitoneally with vehicle (DMSO), topotecan, etoposide or both drugs every day for 3 d, and then sorted the mTEChi fraction. None of the drugs altered the proportions of the major thymocyte or stromal cell compartments, as determined by flow cytometry (Supplementary Fig. 7). Nor did topotecan or etoposide treatment detectably influence mTEChi expression of Aire or any of the topoisomerases examined, as determined by flow cytometry (Supplementary Fig. 8a,b). Microarray-based gene-expression profiling, performed in biological triplicate on RNA from sorted mTEChi from drug-treated versus vehicle-treated mice, revealed that topotecan and etoposide, and especially the two together, preferentially repressed expression of the set of genes normally upregulated by Aire in mTEChi (Fig. 6a). Although the numbers of genes inhibited by each of the three treatments were similar (625–700), the reduction was less strong for etoposide than topotecan, whereas using both drugs resulted in the strongest reduction (Fig. 6a).
Feature-level analysis of gene-expression profiling data has been used to show that the effect of Aire on transcription is minimal just after the TSS, but increases after about 200 nucleotides, reflecting an effect on RNA-PolII pausing14,26. Plotting the ratio of Aire− to Aire+ mTEChi expression values versus distance from the TSS revealed the effect of Aire on RNA-PolII pausing (Fig. 6b), as previously described14,26. Feature-level analysis of the mTEChi gene-expression data from mice treated with etoposide and topotecan versus vehicle alone suggests that these drugs had a preferential effect on distal features (>200 nucleotides from TSS), and a weaker preference with the single-drug treatments (Fig. 6b). Thus, topoisomerase inhibitors, especially in combination, operate at least in part by potentiating RNA-PolII pausing.
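The proximal-versus-distal comparison underlying this feature-level analysis can be sketched as below. The 200-nucleotide boundary is taken from the text; the function name, the use of log2(drug/vehicle) values, and the toy numbers are assumptions for illustration and do not reproduce the published analysis.

```python
from statistics import mean
from typing import List, Tuple


def proximal_distal_means(
    features: List[Tuple[int, float]], boundary_nt: int = 200
) -> Tuple[float, float]:
    """Mean log2 ratio for features at or below vs beyond a distance from the TSS.

    Each feature is (distance_from_TSS_in_nt, log2_ratio). A more negative
    distal mean than proximal mean is what increased pausing would predict.
    """
    proximal = [ratio for dist, ratio in features if dist <= boundary_nt]
    distal = [ratio for dist, ratio in features if dist > boundary_nt]
    if not proximal or not distal:
        raise ValueError("need at least one proximal and one distal feature")
    return mean(proximal), mean(distal)


# Invented (distance, log2 ratio) pairs, not data from the study.
toy_features = [(50, 0.0), (150, -0.2), (400, -1.0), (1200, -1.4)]
toy_proximal, toy_distal = proximal_distal_means(toy_features)
print(toy_proximal, toy_distal)
```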
Co-immunoprecipitation studies on FLAG-Aire-HEK293T cells revealed that topotecan treatment blocked the ability of Aire to interact with all three topoisomerases, whereas etoposide compromised the association of Aire with TOP2, but not TOP1 (Fig. 6c). Together, these data indicate that both TOP1 and TOP2 had a substantial role in Aire-induced gene expression in mTECs. However, their roles were different, with TOP1 appearing to exert a stronger, earlier effect.
TOP1 and TOP2 are necessary for imprinting tolerance
Finally, we examined the immunologic consequences of disrupting the interaction between Aire and TOP1 or TOP2 by evaluating the clonal deletion of self-reactive thymocytes. We quantified the fraction of CD4+ T cells that recognize peptide P2 (294–306) of the retinal protein IRBP. Clonal deletion of these cells is known to be Aire dependent31. We injected 4-week-old B6.Aire+/+ mice intraperitoneally with topotecan, etoposide or just vehicle every third day for 3 weeks, and followed this with intraperitoneal inoculation with the P2 peptide in complete Freund's adjuvant. Compared with vehicle, treatment with etoposide or topotecan reduced the expression of Irbp mRNA in whole-thymus homogenates 10 d after P2 injection (Supplementary Fig. 8c). In addition, significantly more P2-specific CD4+ T cells were found in the peripheral lymphoid organs of drug- versus vehicle-treated mice (Fig. 7a), as determined by staining of pooled lymph node and spleen leukocytes with the Ab:P2 tetramer. Together, these observations indicate that topotecan and etoposide reduced negative selection of IRBP P2-specific T cells. Using the same assay, treatment with etoposide or topotecan did not reduce the negative selection of CD4+ T cells that bound the Ab:P7 tetramer, containing peptide 786–797 of IRBP (Fig. 7b), which is selected in the thymus independently of Aire31.
To further elucidate the effect of topoisomerase inhibition on immunologic tolerance, we quantified the leukocyte infiltration in various organs targeted by inflammatory infiltrates in Aire−/− mice on a NOD genetic background. These mice were selected for analysis because the autoimmunity that typically occurs in the absence of Aire is prominent in NOD mice32. We intraperitoneally injected NOD.Aire+/+ pups with topotecan or etoposide at days 2, 4 and 8 after birth, and histologically assessed various organs for leukocytic infiltration at 15 weeks of age. Inhibition of TOP1 and TOP2 with topotecan and etoposide significantly augmented leukocyte attack on the retina and lung at 15 weeks of age compared with vehicle treatment, but we did not see an increase in infiltration in stomach, lacrimal gland or salivary gland (Fig. 7c). The inflammatory attack on the retina and lung was not seen when the same experiment was performed on Aire−/− mice (Supplementary Fig. 8d,e), suggesting that it was not a result of nonspecific drug toxicity. In Aire−/− mice, treatment with TOP inhibitors protected the mice from tissue pathology, as expected from prior data showing that etoposide and Aire induce the same set of transcripts in Aire-deficient cells15. These results indicate that inhibition of TOP1 or TOP2 results in a break in immunological tolerance akin to, but not a precise mimic of, that characteristic of mice lacking Aire. Considering these insights on Aire-containing multi-protein complexes and chromosomal localization, we propose a model of the molecular mechanism of Aire that involves preferential localization of Aire on super-enhancers for efficient delivery of Aire-containing complexes to the TSSs of its target genes (Supplementary Fig. 9).
We used advanced genomic and biochemical approaches to investigate the molecular mechanism of Aire action. Aire was found on both Aire-induced and Aire-neutral genes, particularly along chromatin stretches overloaded with H3K27ac and H3K4me1 and underloaded with H3K27me3, a profile of histone marks that is routinely used to delineate super-enhancers19,21,33. Furthermore, the super-enhancers of mature Aire+/+ mTEChi were enriched in H3K27ac marks vis à vis both mature Aire−/− mTEChi and immature Aire− mTEChi from Aire+/+ mice, indicating that Aire activates super-enhancers. In addition, multiple topoisomerases were important for Aire induction of mTEC gene expression. TOP1 was a primary Aire partner, co-concentrated on super-enhancers and critical for Aire association with all of its other partners, whereas TOP2 was more involved at later stages of transcription.
Super-enhancers are defined as exceptionally long chromatin stretches hosting exceptionally high densities of general and cell-type-specific transcription factors19,20,21. They are thought to serve as depots for effective collection of relevant transcriptional regulators to enable their efficient and coordinate delivery to TSSs via intra-chromosomal looping or inter-chromosomal interactions. Super-enhancers are preferentially associated with genes that set the identity of and control the activities of fully differentiated cell types, or that are rapidly induced following environmental or physiologic stimulation. Localization of Aire in super-enhancers could explain several of its unusual features. First, Aire has a huge effect on mTEC transcription, regulating around 20% of the genes expressed in this cell type3,4. High-concentration depots of Aire-containing multi-protein complexes could drive this prodigious activity, perhaps in the nuclear speckles reported to host Aire and certain of its critical partners (for example, CBP)34,35. Second, although Aire exerts a strong influence on the mTEC transcriptome at the population level, its effect on an individual cell is much more restrained3,4,9,10. The repertoire of Aire-induced transcripts in single mTECs exhibits both intra- and inter-chromosomal clustering3,4. Dynamic looping of super-enhancers along a chromosome or engagement of a super-enhancer and TSS on different chromosomes could provide a potential framework for understanding such cell-by-cell variation. Third, the fact that super-enhancers are characteristically associated with genes mobilized during terminal differentiation of parenchymal cells might explain the preferential influence of Aire on loci encoding PTAs. Lastly, there is a strong correspondence between those genes induced by Aire and those repressed by a small-molecule inhibitor of BET proteins, among which is BRD4, a critical Aire partner26.
Super-enhancers, which are overloaded with BRD4, are also known to be particularly sensitive to BET protein inhibitors36,37. Indeed, the preferential localization of BRD4 on upstream intergenic regions and its relative depletion from TSSs26 anticipated the partitioning of Aire that we observed.
There is a growing body of evidence that topoisomerases are important for gene transcription38,39. These enzymes cleave and rapidly reseal one (TOP1) or both (TOP2A/B) DNA strands, thereby generating a transient break through which topological changes are effected. Failure to complete the enzymatic reaction leads to trapping of a covalent DNA-topoisomerase intermediate, resulting in a single-stranded nick in the case of TOP1 and a double-stranded break in the case of TOP2. Confrontation of TOP1-induced nicks by the replication or transcription machinery often culminates in DSBs40,41. Several processes involved in the transcription of protein-coding genes generate topological strain that is relieved by these topoisomerases: chromatin remodeling42,43; synthesis of enhancer RNAs (eRNAs) and nucleosome depletion in enhancers44; DNA contortions at TSSs as a result of nucleosome depletion, transcription factor binding or RNA-PolII pausing45,46; and transcriptional elongation, which induces positive supercoils ahead of the polymerase and negative supercoils behind it28,47,48.
We previously suggested that Aire stabilizes TOP2A-induced DSBs at and downstream of TSSs15, thereby initiating recruitment of a histone-eviction complex composed of DNA-PKcs, Ku80, TOP2, PARP-1 and FACT that facilitates transcriptional elongation49. This notion was based on several lines of evidence: co-immunoprecipitation of TOP2A, DNA-PKcs and the other members of the eviction complex with Aire; the ability of Aire to promote DSBs in vitro and in vivo; and the strong correspondence between mTEC genes induced by Aire and by treatment of Aire-deficient cells with the TOP2 poison etoposide. However, our current results argue that this scenario is incorrect, or at least incomplete. Instead, we found that TOP1 was a primary Aire partner, seeding the formation of Aire-containing multi-protein complexes, notably at super-enhancers, through recruitment of elements of the DNA-damage response such as γH2AX, DNA-PKcs, Ku80 and PARP-1. Both TOP2A and TOP2B were also required for the induction of gene expression by Aire, but rather in subsequent events, a function that might still be performed by the histone-eviction complex. Findings in other systems support several elements of this revised scenario: eRNAs and TOP1-mediated DNA breaks are linked44; the DNA breaks induced by TOP1 can mobilize the DNA-damage response44; and TOP1 and TOP2 cooperate to optimize transcription28,43,47.
On the basis of these and previous11,14,15,26 observations, we propose a simplified model of Aire-induced gene expression. Attracted by hypomethylated H3K4me0 and/or repelled by hypermethylated H3K4me3 (ref. 11), Aire localizes to mTEC super-enhancers and interacts with TOP1, stabilizing DNA DSBs promoted by eRNA transcription and nucleosome depletion15 and thereby promoting recruitment of γH2AX, DNA-PKcs and other elements of the DNA-damage response, including Ku80 and PARP-1 (ref. 15). General transcription factors, such as RNA-PolII, CBP and BRD4 (ref. 26), are also drawn in, resulting in super-enhancers that host high concentrations of Aire-containing multi-protein complexes. Via chromatin looping, the super-enhancers serve as transcription factor depots for regional TSSs, particularly those contorted by paused RNA-PolII14, which itself can promote TOP2-induced DSBs, and thus independently seed the formation of some Aire-containing complexes. BRD4 in the complexes recruits pTEFb (composed of CycT1+CDK9 subunits)26, which phosphorylates DSIF, thereby lifting transcriptional pausing and promoting elongation. TOP1, and especially TOP2, ride along with RNA-PolII to relieve the torsional stresses introduced behind and in front of it. Additional factors (for example, DDX5, SFRS3) are independently incorporated into Aire-containing complexes, serving to link the transcription and splicing machineries.
Further validation of this model will probably require substantial evolution of existing genome-scale methods. The cell-to-cell variability of the effect of Aire, best appreciated from single-cell RNA-seq, may ultimately demand single-cell chromatin-capture approaches.
Maintenance, generation and treatment of mice.
Mice were housed and bred under specific-pathogen-free conditions at the Harvard Medical School Center for Animal Resources and Comparative Medicine (Institutional Animal Care and Use Committee protocol #02954). C57BL/6 (B6) Aire+/− mice5 were bred to generate Aire+/+ and Aire−/− littermates for experiments. Igrp-Gfp (Adig) reporter mice were provided by Dr. Mark Anderson, and were appropriately bred to yield Aire+/+ and Aire−/− littermates. Unless specified otherwise, females were used.
For the microarray experiments, 4-week-old B6.Aire+/+ and B6.Aire−/− mice were injected intraperitoneally with 5 mg/kg topotecan (Sigma-Aldrich), etoposide (Sigma-Aldrich) or both drugs, dissolved in dimethylsulfoxide (DMSO), once a day for three consecutive days. For the tetramer-staining experiments, mice of the same types were administered 1.25 mg/kg topotecan or etoposide once a day every third day for 3 weeks. To analyze effects on autoimmunity, we intraperitoneally injected Aire+/+ and Aire−/− NOD/LtJ pups with 0.675 mg/kg topotecan or 1.25 mg/kg etoposide once a day on the second, fourth and eighth days after birth.
Isolation, sorting and analysis of thymic and dermal cells.
Thymus tissue from individual 4–6-week-old Aire+/+ or Aire−/− mice was minced with scissors to release thymocytes, and the fragments were digested with collagenase (Roche) and DNase (Sigma-Aldrich) for 15 min, then with collagenase/dispase (Roche) for 30 min, as previously described14. The released cells were stained with primary antibodies (Abs) (MHCII-APC; Ly51-PE; CD45-PE/Cy5), and CD45+ cells were depleted by MACS separation with anti-PE beads (Miltenyi). DAPI−CD45−Ly51loMHCIIhi mTECs were sorted on a MoFlo instrument (Cytomation) into Trizol for RNA preparation (for microarray) or into fetal bovine serum (FBS) (Gibco) for ChIP-seq library preparation, while GFP+(Aire+) mTECs were sorted in FACS buffer [phosphate-buffered saline (PBS), 0.5% bovine serum albumin, 2 mM EDTA] for ATAC-seq library preparation.
For isolation of dermal fibroblasts, ear skin tissue was minced with scissors and digested with collagenase type IV (Gibco) and DNase (Sigma-Aldrich) for 60 min. The single-cell suspension was stained with primary Abs (CD45-APC; EpCAM-APC; CD31-FITC; Ter-119-FITC; Sca-1-PE), and DAPI−CD45−EpCAM−CD31−Ter-119−Sca-1+ dermal fibroblasts were sorted in FACS buffer for preparation of ATAC-seq libraries.
For flow cytometric sorting or analysis, the following Abs were used: Ly51-PE (108308, BioLegend); CD45-PE/Cy5 (103110, BioLegend); MHCII-APC (107614, BioLegend); Aire (14-5934-80, eBioscience); TOP1 (ab85038, Abcam); TOP2A (ab12318, Abcam); TOP2B (ab72334, Abcam); CD3ɛ-PE (100308, BioLegend); CD4-PerCP/Cy5.5 (100434, BioLegend); CD8a-APC/Cy7 (100714, BioLegend); CD19-PE/Cy7 (115520, BioLegend); B220-PE/Cy7 (103222, BioLegend); CD11b-PE/Cy7 (101216, BioLegend); CD11c-PB (117322, BioLegend); F4/80-PE/Cy7 (123114, BioLegend); CD45-APC (103112, BioLegend); EpCAM-APC (118214, BioLegend); CD31-FITC (1625-02, SouthernBiotech); Ter-119-FITC (116206, BioLegend); and Sca-1-PE (12-5981-83, eBioscience). Anti-rat IgG secondary Abs conjugated with FITC were from SouthernBiotech, while anti-rabbit IgG-Alexa Fluor 647 Abs were purchased from Jackson Immunoresearch.
For each ChIP-seq sample, 1.5 × 10⁵ mTEChi from 4–6-week-old female B6 mice were used, adapting published protocols17,50. Briefly, mTECs were cross-linked with 1% formaldehyde for 8 min, sorted and lysed for 10 min on ice in RadioImmunoPrecipitation Assay (RIPA) buffer [10 mM Tris-HCl (pH 8.0), 1 mM EDTA (pH 8.0), 140 mM NaCl, 1% Triton X-100, 0.1% sodium dodecyl sulfate (SDS) and 0.1% sodium deoxycholate] supplemented with complete protease inhibitor cocktail (Roche). Chromatin was sheared using an AFA Focused-ultrasonicator (Covaris) for 15 min (duty cycle 2%, intensity 3, cycle/burst 200) and the sheared material was cleared by a 10-min centrifugation at 13,000 rpm at 4 °C. The cleared material was immunoprecipitated overnight at 4 °C with Abs conjugated to magnetic Protein-G beads (Life Technologies, Dynabeads), followed by extensive washing of the beads with ice-cold RIPA, high-salt RIPA [10 mM Tris-HCl (pH 8.0), 1 mM EDTA (pH 8.0), 500 mM NaCl, 1% Triton X-100, 0.1% SDS, 0.1% sodium deoxycholate], LiCl [10 mM Tris-HCl (pH 8.0), 1 mM EDTA (pH 8.0), 250 mM LiCl, 0.5% NP-40 and 0.5% sodium deoxycholate] and TE [10 mM Tris-HCl (pH 8.0) and 1 mM EDTA (pH 8.0)]. Chromatin derived from 1.5 × 10⁵ cells immunoprecipitated with specific Abs was eluted from the beads, treated with 1 μg DNase-free RNase (Roche) for 30 min at 37 °C and with Proteinase K (Roche) for 2 h at 37 °C, followed by reverse cross-linking by leaving the plate at 65 °C overnight. DNA from reverse cross-linked material was purified with SPRI beads (Agencourt AMPure XP beads, Beckman Coulter), and sequential steps of end-repair, A-base addition, adaptor-ligation and PCR amplification (15 cycles) were performed to prepare the ChIP-seq library for each sample, as described previously17. ChIP-seq for H3K27me3 was performed as previously reported51.
Individual ChIP-seq libraries were size-selected for 200–500-bp fragments with SPRI beads. Equivalent amounts of barcoded libraries were pooled and sequenced using HiSeq 2500 or NextSeq 500 (Illumina) instruments. To control for background noise, we immunoprecipitated sheared chromatin with purified rabbit IgG Abs, and a ChIP-seq library was prepared and sequenced as described above.
For ChIP-seq analysis, the following Abs were used: Aire (14-5934-80, eBioscience); TOP1 (ab3825 and ab85038, Abcam); TOP2A (WH0007153M1, Sigma-Aldrich); RNA-PolII (MMS-128P, Covance); γH2AX (05-636, Millipore); H3K4me1 (ab8895, Abcam and 07-436, Millipore); H3K27ac (ab4729, Abcam) and H3K27me3 (ab6002, Abcam).
Short reads (50 bp, single end) were aligned to the mouse reference genome (mm10) using bowtie aligner version 2.2.4 (ref. 52). Reads with multiple alignments were removed with samtools (v1.1), and duplicate reads were removed with picard (v1.130). To identify peaks from ChIP-seq reads, we used the HOMER package makeTagDirectory followed by the findPeaks command with the 'histone' parameter53. Peaks displaying fourfold enrichment and a Poisson P value of 1 × 10⁻⁴ against background IgG ChIP were considered significant and were used for further analysis. To visualize individual ChIP-seq data on Integrative Genomics Viewer (IGV)54, we converted bam output files from picard into normalized bigwig format using the bamCoverage function in deepTools (v1.6)55 with the options --fragmentLength 200 --normalizeUsingRPKM. HOMER-generated peak files for H3K27ac were used for the identification of super-enhancers, using the ROSE algorithm described previously21, wherein enhancer peaks are stitched together if they are located within 12.5 kb of each other and if they do not have multiple active promoters in between; enhancers were then ranked according to increasing H3K27ac signal intensity. Heatmaps as in Figure 1d and line plots as in Figure 1b were generated using the ngs.plot program56.
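The stitching and ranking step described for super-enhancer identification can be illustrated with a minimal Python sketch. This is not the ROSE implementation: it omits the rule about active promoters lying between peaks, it measures distance as the gap between the end of one peak and the start of the next (an assumption), and the peak tuples, signal values and function names are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end, signal); a hypothetical, simplified peak record
Peak = Tuple[str, int, int, float]

STITCH_DISTANCE = 12_500  # bp, the stitching distance stated in the text


def stitch_peaks(peaks: List[Peak], max_gap: int = STITCH_DISTANCE) -> List[Peak]:
    """Merge peaks on the same chromosome separated by at most max_gap bp.

    The signal of a stitched region is the sum of its constituent peaks.
    """
    stitched: List[Peak] = []
    for chrom, start, end, signal in sorted(peaks):
        if stitched:
            p_chrom, p_start, p_end, p_signal = stitched[-1]
            if chrom == p_chrom and start - p_end <= max_gap:
                # Extend the previous region instead of opening a new one.
                stitched[-1] = (p_chrom, p_start, max(p_end, end), p_signal + signal)
                continue
        stitched.append((chrom, start, end, signal))
    return stitched


def rank_by_signal(regions: List[Peak]) -> List[Peak]:
    """Order regions by increasing total signal."""
    return sorted(regions, key=lambda region: region[3])


# Made-up peaks for illustration only.
example_peaks = [
    ("chr1", 1_000, 2_000, 5.0),
    ("chr1", 10_000, 11_000, 7.0),  # 8 kb after the first peak: stitched
    ("chr1", 40_000, 41_000, 3.0),  # 29 kb gap: kept separate
    ("chr2", 1_500, 2_500, 20.0),
]
ranked = rank_by_signal(stitch_peaks(example_peaks))
```

In this toy input, the first two chr1 peaks collapse into one region, and the regions at the top of the ranking would be the super-enhancer candidates.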
For preparation of ATAC-seq libraries, 1 × 10⁴ mTECs or fibroblasts from 4–6-week-old female Adig mice were used, adapting published protocols23,57. Briefly, cells were suspended in 100 μl of cold hypotonic lysis buffer [10 mM Tris-HCl (pH 7.5), 10 mM NaCl, 3 mM MgCl2 and 0.1% NP-40], followed by immediate centrifugation at 550 g for 30 min. The pellet was re-suspended in 5 μl of transposition reaction mix [1 μl of Tagment DNA Enzyme and 2.5 μl of Tagment DNA Buffer from Nextera DNA Sample Prep Kit (Illumina), 1.5 μl H2O], and was incubated for 60 min at 37 °C for DNA to be fragmented and tagged. For library preparation, two sequential seven-cycle rounds of PCR were performed to enrich small tagmented DNA fragments. After the first PCR, the libraries were selected for small fragments (less than 600 bp) using SPRI beads, followed by a second round of PCR under the same conditions to obtain the final library. Libraries were sequenced on the NextSeq 500 instrument to generate paired-end short reads (50 bp, forward; 34 bp, reverse). Data were processed essentially as per ChIP-seq analysis, except that reads mapping to mitochondrial DNA (1–7%) were removed before analysis and peaks were identified using the 'factor' parameter in the findPeaks command of the HOMER package.
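The mitochondrial-read filter mentioned above can be sketched as follows. This is an illustrative Python example on in-memory records, not the pipeline used in the study; the read tuples, the function name and the chromosome label "chrM" (the UCSC-style name in mm10) are assumptions.

```python
from typing import List, Tuple

# (chromosome, position); a hypothetical, simplified aligned-read record
Read = Tuple[str, int]


def drop_mito_reads(reads: List[Read], mito_name: str = "chrM") -> Tuple[List[Read], float]:
    """Return the reads not mapping to the mitochondrial genome and the
    fraction of reads that were removed."""
    kept = [read for read in reads if read[0] != mito_name]
    total = len(reads)
    removed_fraction = (total - len(kept)) / total if total else 0.0
    return kept, removed_fraction


# Made-up reads for illustration only.
example_reads = [("chr1", 100), ("chrM", 5), ("chr2", 50), ("chr1", 7)]
nuclear_reads, mito_fraction = drop_mito_reads(example_reads)
```

Reporting the removed fraction alongside the filtered reads makes it easy to check that a library falls in the expected range.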
Cell culture and transfection.
HEK293T cells were cultured in Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% FBS, L-glutamate and penicillin/streptomycin antibiotics, and were maintained in a humidified atmosphere at 37 °C with 5% CO2. For transfection, the cells were seeded in 10-cm tissue-culture plates and transfected with the specified plasmids using TransIT reagent (Mirus) according to the manufacturer's instructions. A plasmid driving expression of wild-type mouse Aire FLAG-tagged at the amino terminus (FLAG-Aire) was constructed by in-frame insertion between the BglII and SalI sites of the pCMV-tag1 vector (Clontech), as described previously58.
Gel filtration chromatography, immunoprecipitations and mass spectrometry.
HEK293T cells were transfected with empty or FLAG-Aire-containing pCMV-tag1 vector. 48 h later, the cells were harvested and lysed in a hypotonic lysis buffer [0.05% NP-40, 10 mM HEPES, 1.5 mM MgCl2, 10 mM KCl, 5 mM EDTA (3 × 10⁷ cells/ml)] plus complete protease inhibitor cocktail (Roche), pH 7.4, followed by incubation on ice for 15 min. Nuclei were separated from the cytosolic fraction by centrifugation at 800 g for 10 min, and were incubated at 4 °C for 1 h in a native nuclear extraction buffer [50 mM Bis-Tris, 750 mM 6-aminocaproic acid, 3 mM CaCl2, 10% glycerol, EDTA-free complete protease inhibitor cocktail (Roche) and micrococcal nuclease (Nuclease S7; Roche), pH 7.4, (6 × 10⁷ cells/ml)].
For gel filtration chromatography, ∼1.5 mg of nuclear extract was injected into a Superose-6 10/300 GL column (GE Healthcare Life Sciences), and was separated by fast-protein liquid chromatography (FPLC) using the following elution buffer: 20 mM HEPES, 150 mM NaCl, 20 mM KCl, 0.5 mM MgCl2, 3 mM CaCl2, EDTA-free complete protease inhibitor cocktail, pH 7.4. Twenty-seven fractions of 1 ml each were collected. The indicated fractions, used directly or pooled, were concentrated via filter centrifugation (Amicon Ultra, 10 kDa cutoff, Millipore) to 500 μl.
For immunoprecipitations, concentrated, pooled chromatographic fractions (from the above-mentioned chromatography experiment) or nuclear extracts (for standard immunoprecipitation experiments) were incubated with the indicated Abs conjugated to Protein-G Sepharose beads (Life Technologies) overnight with rotation at 4 °C. To identify DNA-dependent protein interactions, nuclear extracts were treated with ethidium bromide (100 μg/ml) for 15 min at 4 °C followed by 10-min centrifugation at 13,000 rpm at 4 °C and supernatants were used for immunoprecipitation as above. Ethidium bromide was also included in all the later washing steps for this experiment. Beads were washed thrice with ice-cold PBS containing 0.05% NP-40, and once with ice-cold PBS. Bound proteins were eluted by boiling the beads in sample buffer for 15 min, separated by SDS-PAGE, electro-transferred to polyvinylidene difluoride (PVDF) membranes (Bio-Rad), blocked for 60 min with 5% non-fat dried milk solution in PBST [PBS (pH 7.4), 0.05% Tween 20], and were probed with primary Abs overnight at 4 °C. After a wash with PBST, membranes were incubated with secondary Abs linked to horseradish peroxidase. The blots were then developed with an enhanced chemiluminescence detection system (Thermo Scientific) as per the manufacturer's instructions. For quantification, the chemiluminescent images were processed with Multi Gauge v2.3 (Fujifilm).
For immunoprecipitation studies, Abs recognizing the following proteins were used: FLAG-tag (M2 mouse mAb, Sigma-Aldrich); DNA-PKcs (MS-423-P1, Thermo Scientific); Ku80 (ab55408, Abcam); PARP-1 (9542, Cell Signaling); TOP1 (ab85038, Abcam); TOP2A (ab12318, Abcam); TOP2B (ab72334, Abcam); RNA-PolII (sc-899, Santa Cruz); SPT5 (sc-28678, Santa Cruz); CDK9 (sc-484, Santa Cruz); BRD4 (ab84776, Abcam); SFRS3 (H00006428-M08, Abnova) and DDX5 (sc-166167, Santa Cruz). Anti-mouse and anti-rabbit IgG secondary Abs conjugated with horseradish peroxidase were purchased from Jackson Immunoresearch.
The mass-spectrometry technique used for analysis of FLAG-Aire immunoprecipitates has been detailed previously15. Briefly, nuclear extracts from FLAG-Aire-HEK293T cells were incubated with 20 μl Protein-G Sepharose beads conjugated to anti-FLAG Abs overnight with rotation at 4 °C. Beads were washed three times with ice-cold PBS containing 0.05% NP-40, and once with ice-cold PBS. Immunoprecipitated proteins were eluted by boiling in sample buffer for 15 min, and were separated by 10% SDS-PAGE. Gels were stained with Coomassie G-250, and tryptic digests of individual lanes were analyzed by LC-MS/MS using an LTQ mass spectrometer. Analysis of the MS/MS data was performed using the SEQUEST algorithm as described previously15.
shRNA-mediated knockdown of Aire partners.
Knockdown of Aire partners in HEK293T cells was accomplished by expression of cognate shRNAs (four per partner) in the lentiviral vector pLKO.1, procured from the RNAi Consortium of the Broad Institute. shRNAs targeting LacZ served as controls. We transduced HEK293T cells with individual shRNA-containing lentivirus particles, selected them using puromycin (Gibco), then transfected them with the FLAG-Aire plasmid. 48 h later, we performed immunoprecipitations with anti-FLAG Abs, as described above. The densities of immunoprecipitated protein bands were quantified with Multi Gauge v2.3 (Fujifilm). The densities for all immunoprecipitated protein bands, after transduction of LacZ (two hairpins) or cognate shRNAs (four hairpins), were averaged over two independent experiments and scaled by setting the value obtained after LacZ transduction to 100%.
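The scaling of band densities to the LacZ control can be written out as a short Python sketch. It assumes that densities are pooled across hairpins and experiments before averaging, which the text does not specify; the function name and all numerical values are hypothetical.

```python
from statistics import mean
from typing import List


def percent_of_control(control: List[float], knockdown: List[float]) -> float:
    """Express the mean knockdown band density as a percentage of the mean
    control (LacZ) band density, with the control defined as 100%."""
    control_mean = mean(control)
    if control_mean == 0:
        raise ValueError("control densities must not average to zero")
    return 100.0 * mean(knockdown) / control_mean


# Made-up densities: 2 LacZ hairpins x 2 experiments, 4 cognate hairpins x 2 experiments.
lacz_densities = [200.0, 180.0, 220.0, 200.0]
cognate_densities = [60.0, 40.0, 50.0, 50.0, 45.0, 55.0, 50.0, 50.0]
relative_ip = percent_of_control(lacz_densities, cognate_densities)
```

An alternative design would scale each experiment to its own LacZ lanes before averaging; which of the two was used cannot be determined from the description here.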
Microarray and quantitative PCR analyses.
RNA was prepared from mTEChi of individual mice treated with vehicle (DMSO) alone, topotecan, etoposide or both drugs followed by amplification and cDNA preparation as previously described58. cDNA was either hybridized to Affymetrix ST1.0 microarrays or used for quantitative PCR analysis of Irbp expression. Quantitative PCR was performed using Power SYBR Green master mix (Thermo Scientific) and the StepOnePlus real-time PCR system (Applied Biosystems). Primer sequences were Irbp-forward, CTACAACCGGCCCAATGACT; Irbp-reverse, AAGTAAATTCCTCGGCGGCA; Hprt-forward, TGCCGAGGATTTGGAAAAAGTG; Hprt-reverse, TGGCCTCCCATCTCCTTCAT.
Microarray data were processed using the robust multiarray average (RMA) algorithm for probe-level normalization and analyzed by the multiplot module of GenePattern (Broad Institute). The feature-level analysis of microarray data was performed as described previously, with slight modifications14. Briefly, we processed the raw probe-level data files (.CEL) from Affymetrix ST1.0 microarrays with the RMA algorithm to generate normalized exon-level and gene-level data files for each sample. The genome-wide locations of microarray probes on the mm10 mouse genome build were extracted from the Affymetrix website. Exon-level expression values for Aire-induced genes (Aire+/+/Aire−/− >2) were taken for further analysis if the gene displayed exon1 imbalance (i.e., the ratio of the exon Aire+/+/Aire−/− fold change to the transcript Aire+/+/Aire−/− fold change was >2 or <0.5). For Aire-induced genes flagged for exon1 imbalance, expression levels of the exons were plotted against their distance from the TSSs.
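The two thresholds used to flag genes can be expressed as a minimal Python sketch. It assumes linear-scale (not log2) expression values, and the function names and example numbers are hypothetical rather than taken from the study.

```python
def is_aire_induced(transcript_wt: float, transcript_ko: float,
                    threshold: float = 2.0) -> bool:
    """True if the transcript-level Aire+/+ / Aire-/- fold change exceeds the threshold."""
    return transcript_wt / transcript_ko > threshold


def has_exon1_imbalance(exon1_wt: float, exon1_ko: float,
                        transcript_wt: float, transcript_ko: float,
                        upper: float = 2.0, lower: float = 0.5) -> bool:
    """True if the exon 1 fold change divided by the transcript fold change
    is above `upper` or below `lower`."""
    exon_fold_change = exon1_wt / exon1_ko
    transcript_fold_change = transcript_wt / transcript_ko
    ratio = exon_fold_change / transcript_fold_change
    return ratio > upper or ratio < lower


# Made-up linear-scale values: (exon1 WT, exon1 KO, transcript WT, transcript KO).
genes = {
    "geneA": (800.0, 100.0, 300.0, 100.0),  # exon FC 8, transcript FC 3
    "geneB": (400.0, 100.0, 300.0, 100.0),  # exon FC 4, transcript FC 3
    "geneC": (100.0, 100.0, 400.0, 100.0),  # exon FC 1, transcript FC 4
}
flagged = sorted(
    name for name, (e_wt, e_ko, t_wt, t_ko) in genes.items()
    if is_aire_induced(t_wt, t_ko) and has_exon1_imbalance(e_wt, e_ko, t_wt, t_ko)
)
```

Here geneA and geneC pass both filters, the first because exon 1 is induced more strongly than the transcript as a whole and the second because it is induced less.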
Inhibitor- or vehicle-treated mice were immunized with 100 μg P2 peptide (IRBP271–290) or P7 peptide (IRBP771–790) emulsified in complete Freund's adjuvant, as described previously31. APC-conjugated Ab:P2 and Ab:P7 tetramers were generated by the National Institutes of Health Tetramer Core Facility. For tetramer staining, 10 d after immunization, peripheral lymph node and spleen cells were pooled and stained for 1 h at 25 °C, followed by magnetic-bead purification using anti-APC beads to enrich for tetramer-positive cells. The selected cells were stained with antibodies to CD3ɛ, CD4, CD8, B220, CD19, F4/80, CD11b and CD11c. Stained cells were analyzed on an LSRII (BD Biosciences), and tetramer-reactive cells were gated as CD3+CD4+CD8−CD11b−CD11c−F4/80−B220−CD19− using FlowJo software (TreeStar). Tetramer-positive cells were enumerated by counting the total number of cells by MACSQuant (Miltenyi Biotech), and determining the fraction of tetramer-reactive cells on FlowJo.
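The enumeration step amounts to multiplying a total cell count by a gated fraction, as in the following Python sketch. The function name and the example numbers are hypothetical and serve only to show the arithmetic.

```python
def tetramer_positive_count(total_cells: float, tetramer_fraction: float) -> float:
    """Estimate the number of tetramer-positive cells from a total cell count
    and the fraction of tetramer-reactive cells in the gated population."""
    if total_cells < 0:
        raise ValueError("total_cells must be non-negative")
    if not 0.0 <= tetramer_fraction <= 1.0:
        raise ValueError("tetramer_fraction must lie between 0 and 1")
    return total_cells * tetramer_fraction


# Made-up values: 2 million counted cells, 0.05% tetramer-reactive.
example_count = tetramer_positive_count(2_000_000, 0.0005)
```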
Autoimmune disease monitoring.
Inhibitor- or DMSO-treated Aire+/+ mice were sacrificed at 15 weeks of age, while similarly treated Aire−/− mice were sacrificed at 12 weeks of age or when they had lost 15–20% body weight relative to that of littermates. The designated tissues were removed, fixed in 10% formalin and embedded in paraffin. Tissue sections were stained with hematoxylin and eosin (H&E), and infiltration of various organs was scored. In general, scores of 0, 0.5, 1, 2, 3 and 4 indicate no, trace, mild, moderate, or severe lymphocytic infiltration, and complete destruction, respectively. For retinal degeneration, 0 = lesion present without any photoreceptor layer lost; 1 = lesion present, but less than half of the photoreceptor layer lost; 2 = more than half of the photoreceptor layer lost; 3 = entire photoreceptor layer lost without or with mild outer nuclear layer attack; and 4 = the entire photoreceptor layer and most of the outer nuclear layer destroyed. All samples were scored blindly and independently by two investigators.
Data were routinely presented as mean ± s.d. or s.e.m. Statistical significance was assessed by Student's t test, χ2 test or the Wilcoxon rank-sum test, as specified in individual figure legends.
The ATAC-seq, ChIP-seq and microarray datasets reported in this manuscript can be accessed in GEO with accession codes GSE92594, GSE92597 and GSE92509. Other referenced publicly available datasets: B6.Aire+/+ and B6.Aire−/− RNA-seq, SRR2038194, SRR2038195, SRR2038196 and SRR2038197; H3K27ac ChIP-seq for Aire+ and Aire− mTECs, GSE74257; H3K27ac and H3K4me1 ChIP-seq for HEK293T cells, GSE51633. Aire, RNA-PolII and IgG ChIP-seq data in Aire-transfected HEK293T cells came from ref. 14.
We thank G. Buruzula, K. Rothamel, A. Rhoads, K. Hattori, A. Lopez, G. Gopalan, K. Waraska, M. Thorsen for experimental assistance and C. Laplace for help with manuscript preparation. The NIH Tetramer Core Facility (contract HHSN272201300006C) kindly provided tetramers. This work was supported by NIH grants R01 DK060027 and R01 AI088204. K.B. was supported by American Diabetes Association Mentor-Based Postdoctoral Fellowship #7-12-MN-51 to D.M.