AIRE relies on Z-DNA to flag gene targets for thymic T cell tolerization

Abstract

AIRE is an unconventional transcription factor that enhances the expression of thousands of genes in medullary thymic epithelial cells and promotes clonal deletion or phenotypic diversion of self-reactive T cells (refs. 1–4). The biological logic of AIRE’s target specificity remains largely unclear as, in contrast to many transcription factors, it does not bind to a particular DNA sequence motif. Here we implemented two orthogonal approaches to investigate AIRE’s cis-regulatory mechanisms: construction of a convolutional neural network and leveraging natural genetic variation through analysis of F1 hybrid mice (ref. 5). Both approaches nominated Z-DNA and NFE2–MAF as putative positive influences on AIRE’s target choices. Genome-wide mapping studies revealed that Z-DNA-forming and NFE2L2-binding motifs were positively associated with the inherent ability of a gene’s promoter to generate DNA double-stranded breaks, and promoters showing strong double-stranded break generation were more likely to enter a poised state with accessible chromatin and already-assembled transcriptional machinery. Consequently, AIRE preferentially targets genes with poised promoters. We propose a model in which Z-DNA anchors the AIRE-mediated transcriptional program by enhancing double-stranded break generation and promoter poising. Beyond resolving a long-standing mechanistic conundrum, these findings suggest routes for manipulating T cell tolerance.

Fig. 1: The Z-DNA and NFE2–MAF-binding motifs are salient features of the extended promoters of AIRE-induced genes.
Fig. 2: Identification of TF-motif variants associated with allelic imbalances in the chromatin accessibility and expression of AIRE-induced genes.
Fig. 3: Enhancing Z-DNA stability promotes the expression of AIRE-induced genes.
Fig. 4: Z-DNA motifs are positively associated with the generation of DSBs, the strength of which correlates positively with poising of the promoters of AIRE-inducible genes.
Fig. 5: DSBs facilitate AIRE-induced gene expression.

Data availability

All sequencing data reported in this Article have been deposited as a SuperSeries at the GEO under accession code GSE224557. Specifically, the ATAC–seq, BLISS, ChIP–seq, ChIPmentation, CUT&Tag, bulk RNA-seq and scRNA-seq data are available under accession codes GSE224551, GSE224552, GSE224553, GSE224554, GSE224555, GSE224556 and GSE253215, respectively. Public datasets used in this article are as follows: GSE92594 (ATAC–seq for WT and Aire-KO mTECs), GSE92597 (RNA Pol II, AIRE and IgG ChIP–seq for WT mTECs), GSE180937 (MED1 and IgG ChIP–seq for WT mTECs) and GSE102526 (ATAC–seq for mTECs from Brg1-WT and Brg1-cKO mice). Source data are provided with this paper.

Code availability

Code and scripts used in this study are available at Zenodo (https://doi.org/10.5281/zenodo.10472904).

References

  1. Anderson, M. S. et al. Projection of an immunological self shadow within the thymus by the aire protein. Science 298, 1395–1401 (2002).

  2. Sansom, S. N. et al. Population and single-cell genomics reveal the Aire dependency, relief from Polycomb silencing, and distribution of self-antigen expression in thymic epithelia. Genome Res. 24, 1918–1931 (2014).

  3. Meredith, M., Zemmour, D., Mathis, D. & Benoist, C. Aire controls gene expression in the thymic epithelium with ordered stochasticity. Nat. Immunol. 16, 942–949 (2015).

  4. Brennecke, P. et al. Single-cell transcriptome analysis reveals coordinated ectopic gene-expression patterns in medullary thymic epithelial cells. Nat. Immunol. 16, 933–941 (2015).

  5. van der Veeken, J. et al. Natural genetic variation reveals key features of epigenetic and transcriptional memory in virus-specific CD8 T cells. Immunity 50, 1202–1217 (2019).

  6. Novakovsky, G. et al. Obtaining genetics insights from deep learning via explainable artificial intelligence. Nat. Rev. Genet. 24, 125–137 (2023).

  7. Kelley, D. R. et al. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res. 28, 739–750 (2018).

  8. Link, V. M. et al. Analysis of genetically diverse macrophages reveals local and domain-wide mechanisms that control transcription factor binding and function. Cell 173, 1796–1809 (2018).

  9. Bansal, K., Yoshida, H., Benoist, C. & Mathis, D. The transcriptional regulator Aire binds to and activates super-enhancers. Nat. Immunol. 18, 263–273 (2017).

  10. Rodriguez-Martinez, J. A. et al. Combinatorial bZIP dimers display complex DNA-binding specificity landscapes. eLife 6, e19272 (2017).

  11. Rich, A., Nordheim, A. & Wang, A. H. The chemistry and biology of left-handed Z-DNA. Annu. Rev. Biochem. 53, 791–846 (1984).

  12. Georgakopoulos-Soares, I. et al. High-throughput characterization of the role of non-B DNA motifs on promoter function. Cell Genom. 2, 100111 (2022).

  13. Umerenkov, D. et al. Z-flipon variants reveal the many roles of Z-DNA and Z-RNA in health and disease. Life Sci. Alliance 6, e202301962 (2023).

  14. Keane, T. M. et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477, 289–294 (2011).

  15. Kouzine, F. et al. Permanganate/S1 nuclease footprinting reveals non-B DNA structures with regulatory potential across a mammalian genome. Cell Syst. 4, 344–356 (2017).

  16. Liu, R. et al. Regulation of CSF1 promoter by the SWI/SNF-like BAF complex. Cell 106, 309–318 (2001).

  17. Zhang, J. et al. BRG1 interacts with Nrf2 to selectively mediate HO-1 induction in response to oxidative stress. Mol. Cell. Biol. 26, 7942–7952 (2006).

  18. Liu, H., Mulholland, N., Fu, H. & Zhao, K. Cooperative activity of BRG1 and Z-DNA formation in chromatin remodeling. Mol. Cell. Biol. 26, 2550–2559 (2006).

  19. Shin, S. I. et al. Z-DNA-forming sites identified by ChIP-seq are associated with actively transcribed regions in the human genome. DNA Res. 23, 477–486 (2016).

  20. Marshall, P. R. et al. Dynamic regulation of Z-DNA in the mouse prefrontal cortex by the RNA-editing enzyme Adar1 is required for fear extinction. Nat. Neurosci. 23, 718–729 (2020).

  21. Fotsing, S. F. et al. The impact of short tandem repeat variation on gene expression. Nat. Genet. 51, 1652–1659 (2019).

  22. Zhang, T. et al. ADAR1 masks the cancer immunotherapeutic promise of ZBP1-driven necroptosis. Nature 606, 594–602 (2022).

  23. Thomas, T. J., Gunnia, U. B. & Thomas, T. Polyamine-induced B-DNA to Z-DNA conformational transition of a plasmid DNA with (dG-dC)n insert. J. Biol. Chem. 266, 6137–6141 (1991).

  24. Brooks, W. H. Increased polyamines alter chromatin and stabilize autoantigens in autoimmune diseases. Front. Immunol. 4, 91 (2013).

  25. Wang, G. & Vasquez, K. M. Z-DNA, an active element in the genome. Front. Biosci. 12, 4424–4438 (2007).

  26. Meng, Y. et al. Z-DNA is remodelled by ZBTB43 in prospermatogonia to safeguard the germline genome and epigenome. Nat. Cell Biol. 24, 1141–1153 (2022).

  27. Pommier, Y., Sun, Y., Huang, S. N. & Nitiss, J. L. Roles of eukaryotic topoisomerases in transcription, replication and genomic stability. Nat. Rev. Mol. Cell Biol. 17, 703–721 (2016).

  28. Puc, J. et al. Ligand-dependent enhancer activation regulated by topoisomerase-I activity. Cell 160, 367–380 (2015).

  29. Madabhushi, R. et al. Activity-induced DNA breaks govern the expression of neuronal early-response genes. Cell 161, 1592–1605 (2015).

  30. Pessina, F. et al. Functional transcription promoters at DNA double-strand breaks mediate RNA-driven phase separation of damage-response factors. Nat. Cell Biol. 21, 1286–1299 (2019).

  31. Sperling, A. S., Jeong, K. S., Kitada, T. & Grunstein, M. Topoisomerase II binds nucleosome-free DNA and acts redundantly with topoisomerase I to enhance recruitment of RNA Pol II in budding yeast. Proc. Natl Acad. Sci. USA 108, 12693–12698 (2011).

  32. Shykind, B. M. et al. Topoisomerase I enhances TFIID-TFIIA complex assembly during activation of transcription. Genes Dev. 11, 397–407 (1997).

  33. Abramson, J., Giraud, M., Benoist, C. & Mathis, D. Aire’s partners in the molecular control of immunological tolerance. Cell 140, 123–135 (2010).

  34. Guha, M. et al. DNA breaks and chromatin structural changes enhance the transcription of autoimmune regulator target genes. J. Biol. Chem. 292, 6542–6554 (2017).

  35. Canela, A. et al. Genome organization drives chromosome fragility. Cell 170, 507–521 (2017).

  36. Giraud, M. et al. Aire unleashes stalled RNA polymerase to induce ectopic gene expression in thymic epithelial cells. Proc. Natl Acad. Sci. USA 109, 535–540 (2012).

  37. Oven, I. et al. AIRE recruits P-TEFb for transcriptional elongation of target genes in medullary thymic epithelial cells. Mol. Cell. Biol. 27, 8815–8823 (2007).

  38. Durand-Dubief, M. et al. Topoisomerase I regulates open chromatin and controls gene expression in vivo. EMBO J. 29, 2126–2134 (2010).

  39. Creemers, G. J., Lund, B. & Verweij, J. Topoisomerase I inhibitors: topotecan and irenotecan. Cancer Treat. Rev. 20, 73–96 (1994).

  40. Maruyama, A., Mimura, J., Harada, N. & Itoh, K. Nrf2 activation is associated with Z-DNA formation in the human HO-1 promoter. Nucleic Acids Res. 41, 5223–5234 (2013).

  41. Koh, A. S. et al. Rapid chromatin repression by Aire provides precise control of immune tolerance. Nat. Immunol. 19, 162–172 (2018).

  42. Michelson, D. A. et al. Thymic epithelial cells co-opt lineage-defining transcription factors to eliminate autoreactive T cells. Cell 185, 2542–2558 (2022).

  43. Michelson, D. A. & Mathis, D. Thymic mimetic cells: tolerogenic masqueraders. Trends Immunol. 43, 782–791 (2022).

  44. Givony, T. et al. Thymic mimetic cells function beyond self-tolerance. Nature 622, 164–172 (2023).

  45. Giraud, M. et al. An RNAi screen for Aire cofactors reveals a role for Hnrnpl in polymerase release and Aire-activated ectopic transcription. Proc. Natl Acad. Sci. USA 111, 1491–1496 (2014).

  46. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. NAACL-HLT (eds Burstein, J., Doran, C. & Solorio, T.) 4171–4186 (Association for Computational Linguistics, 2019).

  47. Avsec, Z. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).

  48. Kelley, D. R. Cross-species regulatory sequence activity prediction. PLoS Comput. Biol. 16, e1008050 (2020).

  49. Moore, J. E. et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).

  50. Forrest, A. R. et al. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).

  51. Hendrycks, D. & Gimpel, K. Gaussian error linear units (GELUs). Preprint at arxiv.org/abs/1606.08415 (2020).

  52. Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548 (2019).

  53. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, Las Vegas, 2016).

  54. Simonyan, K., Vedaldi, A. & Zisserman, A. Deep inside convolutional networks: visualising image classification models and saliency maps. Preprint at arxiv.org/abs/1312.6034 (2014).

  55. Grant, C. E. & Bailey, T. L. XSTREME: comprehensive motif analysis of biological sequence datasets. Preprint at bioRxiv https://doi.org/10.1101/2021.09.02.458722 (2021).

  56. Bailey, T. L., Johnson, J., Grant, C. E. & Noble, W. S. The MEME suite. Nucleic Acids Res. 43, W39–W49 (2015).

  57. Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).

  58. Cer, R. Z. et al. Non-B DB v2.0: a database of predicted non-B DNA-forming motifs and its associated tools. Nucleic Acids Res. 41, D94–D100 (2013).

  59. Derbinski, J. et al. Promiscuous gene expression patterns in single medullary thymic epithelial cells argue for a stochastic mechanism. Proc. Natl Acad. Sci. USA 105, 657–662 (2008).

  60. Peterson, P., Org, T. & Rebane, A. Transcriptional regulation by AIRE: molecular mechanisms of central tolerance. Nat. Rev. Immunol. 8, 948–957 (2008).

  61. Gardner, J. M. et al. Deletional tolerance mediated by extrathymic Aire-expressing cells. Science 321, 843–847 (2008).

  62. Huang, S. et al. A novel multi-alignment pipeline for high-throughput sequencing data. Database 2014, bau057 (2014).

  63. van der Veeken, J. et al. The transcription factor Foxp3 shapes regulatory T cell identity by tuning the activity of trans-acting intermediaries. Immunity 53, 971–984 (2020).

  64. Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).

  65. de Santiago, I. et al. BaalChIP: Bayesian analysis of allele-specific transcription factor binding in cancer genomes. Genome Biol. 18, 39 (2017).

  66. Weirauch, M. T. et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell 158, 1431–1443 (2014).

  67. Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).

  68. Bailey, T. L. STREME: accurate and versatile sequence motif discovery. Bioinformatics 37, 2834–2840 (2021).

  69. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

  70. Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).

  71. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

  72. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).

  73. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  74. Satija, R. et al. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).

  75. Corces, M. R. et al. Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat. Genet. 48, 1193–1203 (2016).

  76. Yoshida, H. et al. The cis-regulatory atlas of the mouse immune system. Cell 176, 897–912 (2019).

  77. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. https://doi.org/10.14806/ej.17.1.200 (2015).

  78. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

  79. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).

  80. Li, Q., Brown, J. B., Huang, H. & Bickel, P. J. Measuring reproducibility of high-throughput experiments. Ann. Appl. Stat. 5, 1752–1779 (2011).

  81. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  82. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).

  83. Ramirez, F. et al. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 42, W187–W191 (2014).

  84. Shen, L., Shao, N., Liu, X. & Nestler, E. ngs.plot: quick mining and visualization of next-generation sequencing data by integrating genomic databases. BMC Genom. 15, 284 (2014).

  85. Gothe, H. J. et al. Spatial chromosome folding and active transcription drive DNA fragility and formation of oncogenic MLL translocations. Mol. Cell 75, 267–283 (2019).

  86. Yan, W. X. et al. BLISS is a versatile and quantitative method for genome-wide profiling of DNA double-strand breaks. Nat. Commun. 8, 15058 (2017).

  87. Smith, T., Heger, A. & Sudbery, I. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res. 27, 491–499 (2017).

  88. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).

  89. Neph, S. et al. BEDOPS: high-performance genomic feature operations. Bioinformatics 28, 1919–1920 (2012).

  90. Möller, A. et al. Monoclonal antibodies recognize different parts of Z-DNA. J. Biol. Chem. 257, 12081–12085 (1982).

  91. Schmidl, C., Rendeiro, A. F., Sheffield, N. C. & Bock, C. ChIPmentation: fast, robust, low-input ChIP-seq for histones and transcription factors. Nat. Methods 12, 963–965 (2015).

  92. Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015).

  93. Kaya-Okur, H. S. et al. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat. Commun. 10, 1930 (2019).

  94. Bansal, K. et al. Aire regulates chromatin looping by evicting CTCF from domain boundaries and favoring accumulation of cohesin on superenhancers. Proc. Natl Acad. Sci. USA 118, e2110991118 (2021).

Acknowledgements

We thank A. Baysov, J. Lee, I. Magill and the members of the Broad Genomics Platform for RNA-seq and scRNA-seq; the staff at the HMS Biopolymers Facility for all other sequencing; the members of the HMS Immunology Flow Core; L. Du and the staff at the HMS Transgenic Mouse Core; K. Hattori and A. Ortiz-Lopez for experimental assistance; L. Yang and B. Vijaykumar for computational help; C. Laplace for graphics; K. Chowdhary and D. Michelson for discussions; M. Anderson for providing the NOD mice with Aire-driven expression of IGRP–GFP; and A. Herbert for drawing our attention to the Z22 monoclonal antibody. This work was supported by NIH grant R01AI088204 (to D.M.). Y.F. is in part supported by the Harvard Molecules, Cells and Organisms Training Program. K.B. is supported by the Department of Biotechnology/Wellcome Trust India Alliance Intermediate Fellowship (IA/I/19/1/504276).

Author information

Contributions

Y.F. and D.M. conceived the study. Y.F. designed and performed all experiments except for the Pol II ChIP–seq. Y.F. performed all data analysis with supervision from D.M., C.B. and S.M. K.B. performed the Pol II ChIP–seq of mTECs from Aire-KO mice. Y.F. and D.M. wrote the manuscript, which was edited by all of the authors.

Corresponding author

Correspondence to Diane Mathis.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Alan Herbert and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Performance of the pre-trained CNN model.

a, Schematics of the CNN model for pre-training and fine-tuning. The first section of the main body comprises convolutional layers that extract relevant DNA sequence motifs. The following section has repeated blocks containing dilated convolutional layers with residual skip connections, to spread information and model long-range interactions in the input DNA sequences. AIRE-induced and expression-matched AIRE-neutral gene lists have been described (refs. 3, 9). Briefly, AIRE-induced genes were defined as those with an Aire+/+ to Aire−/− expression ratio > 2, and AIRE-neutral genes as those with a ratio > 0.9 and < 1.1, based on bulk RNA-seq data from ref. 3. b, Exemplar true versus predicted profiles over a randomly selected sequence from the test set. Profiles for 15 sequencing datasets are shown. c, Boxplot showing the Pearson correlations between the predicted and true sequencing profiles of test set sequences for the sequencing datasets used for the pre-training, including DNase-seq, ATAC-seq and ChIP-seq. d, Model evaluation on four datasets: a test set and a validation set from the B6 genome, and two test sets from the NOD genome: (1) containing SNPs/indels compared with counterparts in the B6 genome, and (2) derived from NOD-specific genes (to prevent data leakage during prediction). e, Bar plot comparing the performance of the randomly initialized model and the pre-trained model on the test set from the B6 genome. SNPs: single-nucleotide polymorphisms; Indels: insertions and deletions; AIG: AIRE-induced gene; ANG: AIRE-neutral gene.
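The fold-change thresholds in panel a can be illustrated with a minimal sketch. The function name, the handling of zero expression and the "other" label are assumptions made for illustration; expression matching of the AIRE-neutral set is not shown, and this is not the authors' code.

```python
def classify_gene(wt_expr, ko_expr):
    """Label a gene by its Aire+/+ to Aire-/- expression ratio.

    Thresholds follow the legend: ratio > 2 is AIRE-induced,
    0.9 < ratio < 1.1 is AIRE-neutral. Everything else is "other".
    """
    if ko_expr <= 0:
        # Assumption: the ratio is undefined without Aire-KO expression.
        return "undefined"
    ratio = wt_expr / ko_expr
    if ratio > 2:
        return "AIRE-induced"
    if 0.9 < ratio < 1.1:
        return "AIRE-neutral"
    return "other"


labels = [classify_gene(wt, ko) for wt, ko in [(10.0, 2.0), (1.0, 1.0), (3.0, 2.0)]]
```

Genes with intermediate ratios (for example 1.5) fall into neither list under these thresholds.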

Extended Data Fig. 2 Additional analysis using the fine-tuned CNN model and Z-DNABERT, related to Fig. 1.

a, Contribution score profiles for AIRE-induced genes whose largest-positive-gradient regions contained (CA)n repeats (left) or NFE2–MAF-binding motifs (right). b, Motifs enriched in the regions (50 bp in length) with the largest ISM scores. c, Motifs that are relatively more enriched in extended-promoter sequences of AIRE-induced genes than AIRE-neutral genes (E-value < 0.05). The MEME suite was used to identify the enriched motifs for panels b and c. d, Example ISM score heatmaps for (CA)n repeats in AIRE-induced gene promoters. Each of the three rows shows results for one possible substitution, in the order A → C → G → T from top to bottom. Red (positive ISM score) indicates that substitution of the original nucleotide leads to a decreased average Z-DNA score across the stretch of the (CA)n repeat; blue (negative ISM score) indicates the opposite. e, Boxplot showing the distribution of ISM scores at various positions near the boundaries of (CA)n repeats in AIRE-induced gene promoters. For example, position 2 indicates the second nucleotides from both ends of a (CA)n repeat. p-values for panel e were calculated using the one-sample Wilcoxon signed-rank test (one-tailed). AIG: AIRE-induced gene.
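The in silico mutagenesis (ISM) idea behind panels d and e can be sketched as follows. The scoring function here is a hypothetical toy (a count of CA/AC dinucleotide steps) standing in for a trained Z-DNA predictor such as Z-DNABERT, and the sign convention follows the legend: a positive score means the substitution lowers the predicted Z-DNA score.

```python
BASES = "ACGT"


def toy_score(seq):
    """Hypothetical stand-in for a Z-DNA predictor: counts CA/AC steps."""
    return sum(seq[i:i + 2] in ("CA", "AC") for i in range(len(seq) - 1))


def ism_scores(seq, score_fn=toy_score):
    """Return {position: {substituted base: reference score - mutant score}}."""
    ref = score_fn(seq)
    scores = {}
    for i, base in enumerate(seq):
        scores[i] = {}
        for alt in BASES:
            if alt == base:
                continue  # only the three possible substitutions are scored
            mutant = seq[:i] + alt + seq[i + 1:]
            scores[i][alt] = ref - score_fn(mutant)
    return scores


scores = ism_scores("CACACA")
```

With this toy scorer, a substitution in the interior of the repeat disrupts two dinucleotide steps, whereas one at the end disrupts a single step, so boundary positions receive smaller scores.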

Extended Data Fig. 3 Strain-specific gene expression in mTECs was predominantly driven by cis-regulation.

a,b, Cytofluorometric gating scheme for isolation of mTECs from B6 (a), NOD and F1 (b) mice. c, Rationale of the F1-hybrid analysis. d, Allelically imbalanced gene transcripts and OCRs in B6×NOD F1-hybrid mTECs. Red dots depict significantly imbalanced events with a false discovery rate (FDR) < 0.05. e, Correlation between the fold-changes in accessibility for mTEC OCRs in B6 versus NOD mice (x axis) and fold-changes in accessibility for OCRs (n = 23256) on the B6 versus NOD allele in mTECs of F1 hybrids (y axis). Red dots depict significantly imbalanced OCRs (n = 3750). f, Correlation between the fold-changes in transcript levels for mTECs from B6 versus NOD mice (x axis) and transcript fold-changes for the B6 versus NOD allele in mTECs of F1 hybrids (y axis). Only genes significantly differentially expressed in B6 and NOD mTECs are shown (adjusted p-value < 0.05, n = 248). g, Correlation between allelic biases in the expression of the nearest AIRE-induced gene (x axis) and in the accessibility of the OCR (y axis). An imbalanced OCR was assigned to an imbalanced AIRE-induced gene (n = 248) if (1) the OCR was located within 50,000 bp of the gene’s TSS and (2) the AIRE-induced gene was the OCR’s nearest gene. There were 156 imbalanced OCRs assigned to imbalanced AIRE-induced genes. p-values for panels e and g were from Spearman’s correlation. FC: fold-change; OCR: open chromatin region; TF: transcription factor.
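The two-part assignment rule in panel g can be sketched as below. The function and argument names are hypothetical, coordinates are assumed to lie on the same chromosome, the OCR is represented by a single position, and "within 50,000 bp" is taken to be inclusive; none of these details are specified in the legend.

```python
def assign_ocr(ocr_pos, gene_tss, imbalanced_genes, max_dist=50_000):
    """Return the gene an OCR is assigned to, or None.

    gene_tss: dict mapping gene name -> TSS coordinate (same chromosome).
    imbalanced_genes: set of imbalanced AIRE-induced gene names.
    """
    if not gene_tss:
        return None
    # Condition 2: the candidate must be the OCR's nearest gene.
    nearest = min(gene_tss, key=lambda g: abs(gene_tss[g] - ocr_pos))
    # Condition 1: the OCR must lie within max_dist of that gene's TSS.
    if nearest in imbalanced_genes and abs(gene_tss[nearest] - ocr_pos) <= max_dist:
        return nearest
    return None


tss = {"GeneA": 100_000, "GeneB": 200_000}
hit = assign_ocr(120_000, tss, {"GeneA"})
```

An OCR closer to a gene outside the imbalanced set is left unassigned even if an imbalanced gene also lies within 50,000 bp.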

Extended Data Fig. 4 Exemplar genetic variants associated with allelic imbalances in chromatin accessibility and gene expression.

a,b, Examples of genetic variants of NFE2L2-binding motifs associated with imbalanced expression of AIRE-induced genes. c,d, Examples of genetic variants of Z-DNA motifs associated with allelic imbalances in the expression of an AIRE-induced gene. OCR: open chromatin region; WT: wild-type; Aire-KO: Aire knockout.

Extended Data Fig. 5 Effect of spermidine on Z-DNA formation and thymic cell populations.

a, Density plot showing the effect of spermidine versus control PBS injection on Z-DNA intensity in mTECs measured by flow cytometry using an anti-Z-DNA antibody. b, Representative cytofluorometric plots and quantitative summary for mTECs of WT and Aire-KO mice treated with spermidine or control PBS. c, Cytofluorometric gating scheme for analyses of thymocyte compartments. d, Analogous plots to panel b for thymocyte compartments. Error bars, mean ± s.e.m. from n = 3 biological replicates. KO: Aire-KO.

Extended Data Fig. 6 Additional analysis of scRNA-seq of PBS-treated versus spermidine-treated mTECs, related to Fig. 3.

a, Log2 ratio (M-values) versus log2 average (A-values) plots showing the effect of spermidine treatment on mTECs from wild-type mice. Red dots depict spermidine-specific AIRE-induced genes (FC > 2, p-value < 0.05). Dark grey dots depict AIRE-induced transcripts shared between mice injected with spermidine and PBS. b, Per-replicate UMAPs of scRNA-seq of mTEChi and post-AIRE mTEClo for PBS-treated and spermidine-treated Aire-WT mice. Each dot on the UMAPs is a single cell (n = 3184). Each number on the UMAPs indicates a cluster identified using Seurat. c, Merged UMAPs of scRNA-seq of mTEChi and post-AIRE mTEClo from PBS-treated (n = 2 biological replicates) and spermidine-treated (n = 2 biological replicates) Aire-WT mice. mTEC subtypes were labeled. d, UMAPs showing the expression of Aire (left) and one MHC Class II gene (right).

Extended Data Fig. 7 Additional analysis of BLISS in mTECs, related to Fig. 4.

a, Boxplot comparing BLISS signals at Z-DNA ChIP-seq peaks that had low, medium or high Z-DNA ChIP-seq signals in mTECs from Aire-WT mice. Low: <25th percentile (n = 1508); Medium: 25th–75th percentile (n = 3008); High: >75th percentile (n = 1506). b, Boxplot comparing Aire-KO BLISS signals at promoters of AIRE-inducible (n = 1563) and expression-matched AIRE-neutral genes in Aire-KO mTECs (n = 1907). AIRE-inducible and AIRE-neutral genes were weakly expressed genes (Aire-KO TPM < 0.35) whose promoter DSB generation was detected by BLISS in mTECs from Aire-KO mice. c, Boxplot comparing the enrichment of Z-DNA motifs (left) at DSB hotspots upregulated by spermidine (n = 97) versus those unaffected by spermidine (n = 78). Analogous plot for CTCF-binding motifs is shown on the right. The number of motifs was normalized according to the length of the DSB hotspots. d, Correlation between genetic variation in CTCF-binding motifs and allelic imbalance in DSB generation. Individual lines indicate DSB hotspots with a stronger CTCF-binding motif match on the B6 allele (red, n = 60) or on the CAST allele (blue, n = 59). e, Analogous plot for (CA)n repeats (n = 38 for B6 and n = 51 for CAST). p-values for panels a–c were calculated using the Wilcoxon rank sum test (two-tailed), and for panels d and e using the Kolmogorov–Smirnov (KS) test (two-tailed).
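The percentile binning in panel a can be sketched with the standard library. The function name is hypothetical, and the choice of the "inclusive" interpolation method for the quartile cut points is an assumption, since the legend does not state how percentiles were computed.

```python
from statistics import quantiles


def bin_by_quartile(signals):
    """Label each signal Low (<25th percentile), High (>75th) or Medium."""
    q1, _, q3 = quantiles(signals, n=4, method="inclusive")
    labels = []
    for s in signals:
        if s < q1:
            labels.append("Low")
        elif s > q3:
            labels.append("High")
        else:
            labels.append("Medium")
    return labels


bins = bin_by_quartile([1, 2, 3, 4, 5, 6, 7, 8])
```

By construction, roughly a quarter of the peaks fall into each of the Low and High bins and about half into Medium, which matches the proportions of the group sizes reported in the legend.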

Extended Data Fig. 8 Promoters of AIRE-induced genes were poised for expression prior to the engagement of AIRE.

a, Boxplot comparing the ATAC-seq and Pol II ChIP-seq signals at promoters of AIRE-induced genes (n = 1563) versus those at expression-matched ANGs (n = 1907) in mTECs from Aire-KO mice. b, Exemplar DNA and chromatin profiles of AIRE-induced genes poised for expression in mTECs from Aire-KO mice. For comparison, exemplar profiles for an ANG are shown on the right. c, Same as Fig. 4d, except showing WT ATAC-seq and ChIP-seq signals. p-values in panels a and c were calculated using the Wilcoxon rank sum test (two-tailed). C&T: CUT&Tag; L: Low, n = 747; M: Medium, n = 2130; H: High, n = 322; *: p-value < 1e-10; **: p-value < 1e-20. Data for WT and Aire-KO ATAC-seq, WT Pol II and AIRE ChIP-seq came from ref. 9. Data for WT MED1 ChIP-seq came from ref. 94.

Extended Data Fig. 9 NFE2L2 may cooperate with Z-DNA to poise AIRE-induced genes for expression.

a, Boxplots comparing the Aire-KO BLISS signals at AIRE-induced gene promoter DSB hotspots containing varying numbers of NFE2L2-binding motifs (left) and CTCF-binding motifs (right). For NFE2L2-binding motifs: Low: ≤1 (n = 598); Medium: 2–5 (n = 258); High: >5 (n = 87). For CTCF-binding motifs: Low: ≤1 (n = 468); Medium: 2–4 (n = 404); High: >4 (n = 71). b, Boxplots comparing the enrichment of NFE2L2-binding motifs at DSB hotspots up-regulated by spermidine (n = 159) versus those unaffected by spermidine (n = 104). The number of motifs was normalized according to the length of the DSB hotspots. c, Density plots and heatmaps showing distributions of Z-DNA motifs (top) and NFE2L2-binding motifs (bottom) at DSB hotspots in promoters (n = 6884). Grey areas depict 95% confidence intervals. d, Boxplot comparing the lengths of Z-DNA motifs at OCRs unchanged (n = 282) or up-regulated (n = 356) by BRG1 (see Methods) in mTECs. e, De novo motif analysis for OCRs unchanged versus up-regulated by BRG1. f, MA plot (log2-scale) showing the expression of NFE2-related factors in mTECs from Nfe2l2-KO and Ctrl mice. g, Representative cytofluorimetric plots and quantitative summary for mTECs from Nfe2l2-KO and Ctrl mice. Error bars, mean ± s.e.m. from n = 3 biological replicates. h, Volcano plots showing the expression of AIRE-induced genes and ANGs that contain high-confidence NFE2L2-binding motifs at promoters in mTECs from Nfe2l2-KO and Ctrl mice. i, Differentially expressed AIRE-induced genes and ANGs (p-value < 0.05) between mTECs from Nfe2l2-KO and Ctrl mice. p-values for panels a-b and d were calculated using the Wilcoxon rank sum test (two-tailed), and for panel h using Fisher’s exact test (two-tailed). L: Low; M: Medium; H: High; CI: confidence interval. Nfe2l2-KO: Foxn1Cre-Nfe2l2flox/flox; Ctrl: control, Foxn1Cre-Nfe2l2+/+.
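The motif-count bins in panel a and the length normalization in panel b can be illustrated with a short Python sketch. The function names are hypothetical, and expressing the normalized value as motifs per kilobase is an assumption: the legend states only that motif numbers were normalized to hotspot length.

```python
def motif_count_bin(count, low_max=1, medium_max=5):
    """Bin a hotspot by its motif count.

    Defaults follow the NFE2L2 thresholds in the legend
    (Low: <=1, Medium: 2-5, High: >5). For CTCF-binding motifs
    the legend uses medium_max=4.
    """
    if count <= low_max:
        return "Low"
    if count <= medium_max:
        return "Medium"
    return "High"


def motifs_per_kb(motif_count, hotspot_length_bp):
    """Normalize a motif count by hotspot length.

    The per-kilobase unit is an assumption for illustration.
    """
    if hotspot_length_bp <= 0:
        raise ValueError("hotspot length must be positive")
    return motif_count * 1000.0 / hotspot_length_bp
```

Normalizing by length keeps long hotspots from appearing motif-rich simply because they span more sequence.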

Extended Data Fig. 10 Manipulation of Z-DNA formation, DSB generation or Nfe2l2 expression affected the expression of signature genes of mTEC mimetic cells.

a, KEGG pathway analysis (adjusted p-value < 0.05) for differentially expressed genes (p-value < 0.05, fold-change > 2, n = 745) between mTECs from Nfe2l2-KO and Ctrl mice. b, Network plot showing the significantly enriched downregulated KEGG pathways and the associated genes. c, Expression of lineage-defining TFs (ref. 42) in WT versus Aire-KO AIRE-stage mTECs (log2 scale). d, MA plots (log2 scale) highlighting expression changes of signature genes of several mimetic mTEC subtypes in mTECs from Nfe2l2-KO versus Ctrl mice. Red dots depict signature genes of the corresponding subtypes. e-f, Analogous plots showing the impact of spermidine and topotecan, respectively. The signature gene lists used are available at https://github.com/dmichelson/mimetic_cells/tree/main/scrna-seq/adult-neonate/mimetic-cell-signatures. p-values for panels d-f were calculated using Fisher’s exact test (two-tailed).

Extended Data Fig. 11 A model of Z-DNA’s influence on AIRE target choice.

Independent of AIRE, Z-DNA formation is more likely to occur at the promoters of genes having Z-DNA motifs but not under robust TF-mediated transcriptional control. NFE2L2 and other unknown factors would engage BRG1 or other chromatin remodelers to stabilize the energetically unfavorable Z-DNA formation. Z-DNA would enhance DSB generation at the promoters of genes subject to AIRE induction, which would facilitate their poising, thereby promoting the recruitment of and induction of gene expression by AIRE.

Supplementary information

Supplementary Information

Supplementary Discussion, Supplementary Notes, legends for the Supplementary Tables and Supplementary References.

Reporting Summary

Supplementary Table 1

Motifs enriched in the regions with the largest positive gradients of AIRE-induced genes. Motifs listed in the table were identified by the XSTREME program (combined results of MEME and STREME) of MEME suite with E < 0.05.

Supplementary Table 2

Z-DNA motifs at promoters of AIRE-induced and AIRE-neutral genes. Z-DNA motifs were identified on the basis of the criteria of the NIH non-B search program.

Supplementary Table 3

Intrareplicate correlations of various sequencing experiments. List of the mouse genotypes, cell types, treatments, replicate numbers, replicate correlation and numbers of uniquely mapped reads for bulk sequencing experiments generated in this study.

Supplementary Table 4

Quality-control data for scRNA-seq. For each sample, the number of cells retained, the median percentage of mitochondrial counts per cell, the median number of unique RNA features per cell and the median number of RNA molecules per cell are shown.

Supplementary Table 5

BLISS adapter sample barcodes. BLISS adapter sample barcode sequences were provided for individual sequencing experiments.

Supplementary Table 6

Z-DNA motifs at BRG1-upregulated OCRs and unchanged OCRs. Z-DNA motifs were identified on the basis of the criteria of the NIH non-B search program.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fang, Y., Bansal, K., Mostafavi, S. et al. AIRE relies on Z-DNA to flag gene targets for thymic T cell tolerization. Nature 628, 400–407 (2024). https://doi.org/10.1038/s41586-024-07169-7

  • DOI: https://doi.org/10.1038/s41586-024-07169-7
