Large-scale mapping and mutagenesis of human transcriptional effector domains

Abstract

Human gene expression is regulated by more than 2,000 transcription factors and chromatin regulators1,2. Effector domains within these proteins can activate or repress transcription. However, for many of these regulators we do not know what type of effector domains they contain, their location in the protein, their activation and repression strengths, and the sequences that are necessary for their functions. Here, we systematically measure the effector activity of more than 100,000 protein fragments tiling across most chromatin regulators and transcription factors in human cells (2,047 proteins). By testing the effect they have when recruited at reporter genes, we annotate 374 activation domains and 715 repression domains, roughly 80% of which have not been previously annotated3,4,5. Rational mutagenesis and deletion scans across all the effector domains reveal that aromatic and/or leucine residues interspersed with acidic, proline, serine and/or glutamine residues are necessary for activation domain activity. Furthermore, most repression domain sequences contain sites for small ubiquitin-like modifier (SUMO)ylation or short interaction motifs for recruiting corepressors, or are structured binding domains for recruiting other repressive proteins. We discover bifunctional domains that can both activate and repress, some of which dynamically split a cell population into high- and low-expression subpopulations. Our systematic annotation and characterization of effector domains provide a rich resource for understanding the function of human transcription factors and chromatin regulators, engineering compact tools for controlling gene expression and refining predictive models of effector domain function.

Fig. 1: High-throughput tiling screen across 2,047 human TFs and CRs finds hundreds of EDs.
Fig. 2: Hydrophobic amino acids interspersed with acidic, serine, proline or glutamine residues are necessary for AD activity.
Fig. 3: Most RD sequences contain sites for SUMOylation, short interaction motifs for recruiting corepressors or are structured binding domains for recruiting other repressive proteins.
Fig. 4: Discovery of bifunctional activating and repressing domains.

Data availability

The Illumina sequencing datasets generated in this study are available from the Sequence Read Archive (BioProject PRJNA916593).

Code availability

The HT-recruit Analyze software for processing high-throughput recruitment assays and high-throughput protein expression assays is available on GitHub (https://github.com/bintulab/HT-recruit-Analyze). All custom code used for data processing and computational analyses is available from the authors upon request.

References

  1. Lambert, S. A. et al. The human transcription factors. Cell 175, 598–599 (2018).

  2. Medvedeva, Y. A. et al. EpiFactors: a comprehensive database of human epigenetic factors and complexes. Database 2015, bav067 (2015).

  3. UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 49, D480–D489 (2021).

  4. Tycko, J. et al. High-throughput discovery and characterization of human transcriptional effectors. Cell 183, 2020–2035.e16 (2020).

  5. Alerasool, N., Leng, H., Lin, Z.-Y., Gingras, A.-C. & Taipale, M. Identification and functional characterization of transcriptional activators in human cells. Mol. Cell 82, 677–695.e7 (2022).

  6. Vierstra, J. et al. Global reference mapping of human transcription factor footprints. Nature 583, 729–736 (2020).

  7. Partridge, E. C. et al. Occupancy maps of 208 chromatin-associated proteins in one human cell type. Nature 583, 720–728 (2020).

  8. Soto, L. F. et al. Compendium of human transcription factor effector domains. Mol. Cell 82, 514–526 (2022).

  9. Keung, A. J., Bashor, C. J., Kiriakov, S., Collins, J. J. & Khalil, A. S. Using targeted chromatin regulators to engineer combinatorial and spatial transcriptional regulation. Cell 158, 110–120 (2014).

  10. Erijman, A. et al. A high-throughput screen for transcription activation domains reveals their sequence features and permits prediction by deep learning. Mol. Cell 78, 890–902.e6 (2020).

  11. Staller, M. V. et al. A high-throughput mutational scan of an intrinsically disordered acidic transcriptional activation domain. Cell Syst. 6, 444–455.e6 (2018).

  12. Sanborn, A. L. et al. Simple biochemical features underlie transcriptional activation domain diversity and dynamic, fuzzy binding to Mediator. eLife 10, e68068 (2021).

  13. Arnold, C. D. et al. A high-throughput method to identify trans-activation domains within transcription factor sequences. EMBO J. 37, e98896 (2018).

  14. Klaus, L. et al. Systematic identification and characterization of repressive domains in Drosophila transcription factors. EMBO J. 42, e112100 (2023).

  15. Stampfel, G. et al. Transcriptional regulators form diverse groups with context-dependent regulatory functions. Nature 528, 147–151 (2015).

  16. Neumayr, C. et al. Differential cofactor dependencies define distinct types of human enhancers. Nature 606, 406–413 (2022).

  17. Staller, M. V. et al. Directed mutational scanning reveals a balance between acidic and hydrophobic residues in strong human activation domains. Cell Syst. 13, 334–345.e5 (2022).

  18. Raj, A., Peskin, C. S., Tranchina, D., Vargas, D. Y. & Tyagi, S. Stochastic mRNA synthesis in mammalian cells. PLoS Biol. 4, e309 (2006).

  19. Chubb, J. R., Trcek, T., Shenoy, S. M. & Singer, R. H. Transcriptional pulsing of a developmental gene. Curr. Biol. 16, 1018–1025 (2006).

  20. Bintu, L. et al. Dynamics of epigenetic regulation at the single-cell level. Science 351, 720–724 (2016).

  21. Ptashne, M. How eukaryotic transcriptional activators work. Nature 335, 683–689 (1988).

  22. Mitchell, P. J. & Tjian, R. Transcriptional regulation in mammalian cells by sequence-specific DNA binding proteins. Science 245, 371–378 (1989).

  23. Gerber, H. P. et al. Transcriptional activation modulated by homopolymeric glutamine and proline stretches. Science 263, 808–811 (1994).

  24. Gill, G., Pascal, E., Tseng, Z. H. & Tjian, R. A glutamine-rich hydrophobic patch in transcription factor Sp1 contacts the dTAFII110 component of the Drosophila TFIID complex and mediates transcriptional activation. Proc. Natl Acad. Sci. USA 91, 192–196 (1994).

  25. Courey, A. J. & Tjian, R. Analysis of Sp1 in vivo reveals multiple transcriptional domains, including a novel glutamine-rich activation motif. Cell 55, 887–898 (1988).

  26. Escher, D., Bodmer-Glavas, M., Barberis, A. & Schaffner, W. Conservation of glutamine-rich transactivation function between yeast and humans. Mol. Cell. Biol. 20, 2774–2782 (2000).

  27. Tuttle, L. M. et al. Gcn4-mediator specificity is mediated by a large and dynamic fuzzy protein-protein complex. Cell Rep. 22, 3251–3264 (2018).

  28. Raj, N. & Attardi, L. D. The transactivation domains of the p53 protein. Cold Spring Harb. Perspect. Med. 7, a026047 (2017).

  29. Kumar, M. et al. The Eukaryotic Linear Motif resource: 2022 release. Nucleic Acids Res. 50, D497–D508 (2022).

  30. Ross, S., Best, J. L., Zon, L. I. & Gill, G. SUMO-1 modification represses Sp3 transcriptional activation and modulates its subnuclear localization. Mol. Cell 10, 831–842 (2002).

  31. Rocca, D. L., Wilkinson, K. A. & Henley, J. M. SUMOylation of FOXP1 regulates transcriptional repression via CtBP1 to drive dendritic morphogenesis. Sci. Rep. 7, 877 (2017).

  32. Verger, A., Perdomo, J. & Crossley, M. Modification with SUMO. A role in transcriptional regulation. EMBO Rep. 4, 137–142 (2003).

  33. Göös, H. et al. Human transcription factor protein interaction networks. Nat. Commun. 13, 766 (2022).

  34. Torres-Machorro, A. L. Homodimeric and heterodimeric interactions among vertebrate basic helix-loop-helix transcription factors. Int. J. Mol. Sci. 22, 12855 (2021).

  35. Sun, X. H., Copeland, N. G., Jenkins, N. A. & Baltimore, D. Id proteins Id1 and Id2 selectively inhibit DNA binding by one class of helix-loop-helix proteins. Mol. Cell. Biol. 11, 5603–5611 (1991).

  36. Benezra, R., Davis, R. L., Lockshon, D., Turner, D. L. & Weintraub, H. The protein Id: a negative regulator of helix-loop-helix DNA binding proteins. Cell 61, 49–59 (1990).

  37. Tapia-Ramírez, J., Eggen, B. J., Peral-Rubio, M. J., Toledo-Aral, J. J. & Mandel, G. A single zinc finger motif in the silencing factor REST represses the neural-specific type II sodium channel promoter. Proc. Natl Acad. Sci. USA 94, 1177–1182 (1997).

  38. Andrés, M. E. et al. CoREST: a functional corepressor required for regulation of neural-specific gene expression. Proc. Natl Acad. Sci. USA 96, 9873–9878 (1999).

  39. Brayer, K. J. & Segal, D. J. Keep your fingers off my DNA: protein-protein interactions mediated by C2H2 zinc finger domains. Cell Biochem. Biophys. 50, 111–131 (2008).

  40. Koipally, J. & Georgopoulos, K. A molecular dissection of the repression circuitry of Ikaros. J. Biol. Chem. 277, 27697–27705 (2002).

  41. McCarty, A. S., Kleiger, G., Eisenberg, D. & Smale, S. T. Selective dimerization of a C2H2 zinc finger subfamily. Mol. Cell 11, 459–470 (2003).

  42. Boyle, P. & Després, C. Dual-function transcription factors and their entourage: unique and unifying themes governing two pathogenesis-related genes. Plant Signal. Behav. 5, 629–634 (2010).

  43. Latchman, D. S. Eukaryotic transcription factors. Biochem. J. 270, 281–289 (1990).

  44. Dyson, H. J. & Wright, P. E. Role of intrinsic protein disorder in the function and interactions of the transcriptional coactivators CREB-binding protein (CBP) and p300. J. Biol. Chem. 291, 6714–6722 (2016).

  45. Gillespie, M. A. et al. Absolute quantification of transcription factors in human erythropoiesis using selected reaction monitoring mass spectrometry. STAR Protocols 1, 100216 (2020).

  46. Willy, P. J., Kobayashi, R. & Kadonaga, J. T. A basal transcription factor that activates or represses transcription. Science 290, 982–985 (2000).

  47. Majello, B., De Luca, P. & Lania, L. Sp3 is a bifunctional transcription regulator with modular independent activation and repression domains. J. Biol. Chem. 272, 4021–4026 (1997).

  48. Ma, J. Crossing the line between activation and repression. Trends Genet. 21, 54–59 (2005).

  49. Mann, R. S., Lelli, K. M. & Joshi, R. Hox specificity: unique roles for cofactors and collaborators. Curr. Top. Dev. Biol. 88, 63–101 (2009).

  50. Bürglin, T. R. & Affolter, M. Homeodomain proteins: an update. Chromosoma 125, 497–521 (2016).

  51. Loh, Y.-H. et al. The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells. Nat. Genet. 38, 431–440 (2006).

  52. Heurtier, V. et al. The molecular logic of Nanog-induced self-renewal in mouse embryonic stem cells. Nat. Commun. 10, 1109 (2019).

  53. White, M. A. et al. A simple grammar defines activating and repressing cis-regulatory elements in photoreceptors. Cell Rep. 17, 1247–1254 (2016).

  54. Friedman, R. Z. et al. Information content differentiates enhancers from silencers in mouse photoreceptors. eLife 10, e67403 (2021).

  55. Adli, M. The CRISPR tool kit for genome editing and beyond. Nat. Commun. 9, 1911 (2018).

  56. Rodriguez, J. M. et al. APPRIS: annotation of principal and alternative splice isoforms. Nucleic Acids Res. 41, D110–D117 (2013).

  57. Zulkower, V. & Rosser, S. DNA Chisel, a versatile sequence optimizer. Bioinformatics 36, 4508–4509 (2020).

  58. Van Rossum, G. & Drake, F. L. Python 3 Reference Manual (CreateSpace, 2009).

  59. Holehouse, A. S., Das, R. K., Ahad, J. N., Richardson, M. O. G. & Pappu, R. V. CIDER: resources to analyze sequence-ensemble relationships of intrinsically disordered proteins. Biophys. J. 112, 16–21 (2017).

  60. Varadi, M. et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439–D444 (2022).

Acknowledgements

We thank M. Hinks and members of our laboratories for helpful conversations and assistance. This work was supported by grant nos. NIH-NIGMS R35M128947 (L.B.), NIH-NHGRI R01HG011866 (L.B. and M.C.B.), NSF GRFP DGE-1656518 (N.D.), NIH-NIDDK F99/K00 F99DK126120 (J.T.), Stanford Bio-X Bowes Fellowship (P.S.), Stanford School of Medicine Dean’s Fund (C.A.), NIH-NIGMS 5T32GM007365-45 (A.M.), Stanford Interdisciplinary Graduate Fellowship affiliated with Stanford Bio-X (A.M.), NIH Director’s New Innovator Award 1DP2HD08406901 (M.C.B.), NSF CAREER 2142336 (P.F.) and the BWF-CASI Award (L.B.). P.F. is a Chan Zuckerberg Biohub Investigator.

Author information

Contributions

N.D. and L.B. designed the study, with significant intellectual contributions from P.S. and A.M. P.S. and N.D. designed the TF tiling libraries. A.M. designed the CR tiling libraries, with contributions from J.T., M.C.B. and L.B. N.D. designed all other libraries with contributions from J.T., A.M., P.S., M.C.B. and L.B. N.D. screened the CRTF minCMV and FLAG libraries with assistance from P.S. and J.T. A. and K.S. screened the CRTF pEF and PGK promoter libraries. N.D. performed all other screens. N.D. analysed the data, with assistance from L.B. I.L., C.A. and N.D. performed individual recruitment assay experiments. N.D. performed western blot experiments. C.L. generated the PGK cell line. N.D., P.F. and L.B. wrote the manuscript, with significant contributions from J.T. and C.L., along with contributions from all authors. L.B. supervised the project, with contributions from M.C.B. and P.F.

Corresponding author

Correspondence to Lacramioara Bintu.

Ethics declarations

Competing interests

N.D., A.M., P.S., J.T. and M.C.B. have filed a provisional patent (U.S. Provisional Application No. 63/318,144) related to this work. All remaining authors declare no competing interests.

Peer review

Peer review information

Nature thanks Martha Bulyk, Steven Hahn and Matthew Weirauch for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 CRTF tiling screens’ separation purity, reproducibility, and validation.

a, Comparison between the set of proteins tiled in Tycko et al., 2020 and this study. b, Flow cytometry data showing citrine reporter distributions for the minCMV promoter screen on the day we induced localization with dox (Pre-induction), on the day of magnetic separation (Pre-separation), and after separation (Bound). Overlapping histograms are shown for 2 separately transduced biological replicates. The average percentage of cells ON is shown to the right of the vertical line showing the citrine level gate. c, Citrine reporter distributions for the pEF promoter screen (n = 2). d-e, Biological replicate screen reproducibility (for hits above the threshold: Pearson r2 = 0.78 for minCMV and r2 = 0.19 for pEF; for all data, including noise under the hit threshold: Pearson r2 = 0.66 for minCMV and r2 = 0.16 for pEF). f, Comparison between average repression enrichment scores of tiles that were screened in the CRTF tiling pEF screen (x-axis) and the previous Silencer tiling screen (y-axis)4. Dashed lines are the hits thresholds for each screen. Tiles were identical apart from a 1 aa register shift (as Silencer library tiles included an initial methionine absent from the CRTF tiling library). Pink dots are tiles that were individually validated in g. g, Citrine reporter distributions of individually validated CRTF tiling pEF screen hits that were not identified within the Silencer tiling screen (n = 2).

Extended Data Fig. 2 CRTF tiling FLAG protein expression screen separation purity, reproducibility, validation, and example of how the data were used.

a, Alexa Fluor 647 distributions from anti-FLAG staining of the CRTF tiling library in minCMV promoter reporter cells (n = 2). b, Biological replicate screen reproducibility (Pearson r2 = 0.49). c, Validations of the FLAG protein expression screen. Expression levels were measured by Western blot with an anti-FLAG antibody. Anti-histone H3 was used as a loading control for normalization (see Supplementary Fig. 1 for regions of interest that were selected for quantification using ImageJ’s gel analysis routine). Lane 1: rTetR-3xFLAG (no tile), theoretical molecular weight of 29 kDa; lanes 2-6: rTetR-3xFLAG-screened P53 deletions, theoretical molecular weight of 39 kDa; lanes 7-9: rTetR-3xFLAG-P53’s AD loaded at increasing amounts; lanes 10-14: rTetR-3xFLAG-screened random control (see Supplementary Table 3 for protein sequences). The shift from the expected molecular weight of the expressed P53 proteins is likely due to post-translational modifications that P53’s AD undergoes28. Comparison between high-throughput measurements of expression and Western blot protein levels (r2 = 0.87, n = 10 proteins, n = 2 blot replicates, dots are the mean, bars the range). d, Tiling plot for BCL11A (n = 2, dots are the mean, bars the range). Example of a domain that was annotated at position 571-710. This domain had a low expression tile in the middle but the domain was left unsegmented. See more about how domains were called in Methods.

Extended Data Fig. 3 CRTF tile hits validation screens’ separation purity, reproducibility, and validation.

a, Flow cytometry data showing citrine reporter distributions for the minCMV promoter screen on the day we induced localization with dox (Pre-induction), on the day of magnetic separation (Pre-separation), and after separation (Bound). Overlapping histograms are shown for 2 biological replicates. The average percentage of cells ON is shown to the right of the vertical line showing the citrine level gate. b, Citrine reporter distributions for the pEF promoter validation screen (n = 2). c-d, Biological replicate screen reproducibility. e, Comparison between individually recruited measurements and minCMV promoter validation screen measurements (n = 2, dots are the mean, bars the range) with logistic model fit plotted as solid line (r2 = 0.91, n = 20). Dashed line is the hits threshold. Note, both screen thresholds are below 0, with several validated screen measurements below 0 (Methods). f, Comparison between individually recruited measurements and pEF promoter validation screen measurements (n = 2, dots are the mean, bars the range) with logistic model fit plotted as solid line (r2 = 0.94, n = 19).

Extended Data Fig. 4 Validations of CR & TF EDs.

a, Comparison between the set of proteins screened in Alerasool et al., 2022’s tAD-seq and CRTF tiles (this study). b, Net charge per residue distributions (calculated by CIDER59) of activation domains identified by HT-recruit compared to their PADDLE-predicted function12 (Mann-Whitney p-value = 1.4e-15, boxes: median and interquartile range (IQR); whiskers: Q1 - 1.5*IQR and Q3 + 1.5*IQR). c, CRTF tiling library screened at three different promoters with distinct expression levels. minCMV is a minimal promoter with all cells off. PGK is a low expression, medium strength promoter, and pEF is a high expression, strong promoter. d, Flow cytometry data showing citrine reporter distributions for the PGK promoter screen on the day we induced localization with dox (Pre-induction), 5 days later on the day of magnetic separation (Pre-separation), and after separation (Bound). Overlapping histograms are shown for 2 biological replicates. The average percentage of cells ON is shown to the right of the vertical line showing the citrine level gate. e, Biological replicate PGK promoter screen reproducibility (for hits above the threshold: Pearson r2 = 0.27 for repression hits; for all data, including noise under the hit threshold: Pearson r2 = 0.11). Although it is possible to detect activators at the PGK promoter, the dynamic range is very small (ten of the strongest activating tiles at the minCMV promoter (black dots) are very close to the random controls (grey dots)). f, Validation screen biological replicate reproducibility of tiles that were hits in both the PGK and pEF promoter screens. g, Tiling plots for MEF2C and KLF11 (n = 2, dots are the mean, bars the range). PGK repression domains annotated in teal. h, Comparison of each repression domain’s max tile average repression scores in PGK (x-axis) and pEF promoter screen (y-axis). Dashed lines are the hits thresholds for each screen.

Extended Data Fig. 5 Mutant AD screen’s separation purity, reproducibility, and validation.

a, Citrine distributions after 2 days recruitment to minCMV of UniProt-annotated Q-rich ADs with or without an 11 aa acidic sequence from VP64 (n = 2). b, Deletion scan across P53’s AD: Deletions that caused a complete loss of activation, meaning they are below the experimentally validated activation threshold (dotted line, determined in Fig. 1g for the screen that included these constructs), are colored in gray, and deletions that retained some activation are colored in yellow (n = 2, dots are the mean, bars the range). c, Individual validations of tiles including 15 aa deletions (deleted sequences shown above each panel). Untreated cells (gray) and dox-treated cells (colors) shown with two biological replicates in each condition. Vertical line is the citrine gate used to determine the fraction of cells ON (written above each distribution). d, Flow cytometry data showing citrine reporter distributions for the Mutant AD transcriptional activity screen on the day we induced localization with dox (Pre-induction), on the day of magnetic separation (Pre-separation), and after separation (Bound). Overlapping histograms are shown for 2 separately transduced biological replicates. The average percentage of cells ON is shown to the right of the vertical line showing the citrine level gate. e, Biological replicate Mutant AD transcriptional activity screen reproducibility. f, Comparison between individually recruited measurements and Mutant AD screen measurements (n = 2, dots are the mean, bars the range) with logistic model fit plotted as solid line (r2 = 0.95, n = 23). g, Alexa Fluor 647 distributions from anti-FLAG staining. h, Biological replicate Mutant AD protein expression screen reproducibility.
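
As a rough illustration of how a deletion scan library like the one in panels b and c could be enumerated, the Python sketch below removes a sliding 15 aa window from a tile. The function name deletion_scan, the 5 aa step, the placeholder sequence and the choice to simply excise the residues (rather than substitute a linker) are assumptions made for illustration, not details taken from the study.

```python
def deletion_scan(tile, deletion_len=15, step=5):
    """Return (start, end, variant) tuples with tile[start:end] removed.

    Coordinates are 0-indexed and end-exclusive. The step size is a
    hypothetical parameter; only the 15 aa deletion length comes from
    the legend.
    """
    variants = []
    for start in range(0, len(tile) - deletion_len + 1, step):
        end = start + deletion_len
        variants.append((start, end, tile[:start] + tile[end:]))
    return variants


# Placeholder 80 aa sequence, not a real tile from the library.
tile = "ACDEFGHIKLMNPQRSTVWY" * 4
variants = deletion_scan(tile)
```

Each variant is 15 aa shorter than the parent tile, so comparing its score against the wild-type tile indicates whether the deleted window was necessary for activity.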

Extended Data Fig. 6 Mutant AD screen follow-up.

a, Deletion scan across SMARCA4’s AD (n = 2, dots are the mean, bars the range). Predicted secondary structure (prediction from whole protein sequence using AlphaFold)60 shown below, where green regions are alpha helices. Deletions that are significantly different from WT are colored in gray (p < 0.05, one-tailed z test). b, Enrichment scores comparing WT versus the W, F, Y, L mutant of DUX4 tile 35 (p-value = 3.3e-13, one-tailed z-test, n = 2, dots are the mean, bars the range). c, Violin plots of average FLAG enrichment scores from 2 biological replicates binned by each sublibrary. Dashed line represents the hit threshold for this screen. P-values computed from Mann-Whitney one-sided U tests. Boxes: median and interquartile range (IQR); whiskers: Q1 - 1.5*IQR and Q3 + 1.5*IQR. d, Correlations between each tile’s activation strength in the minCMV validation screen and the count of indicated aa. e, Boxplot of acidic count for each mutant’s activation category (Decrease n = 33, No change n = 18). Mann-Whitney one-sided U test, p-value = 2.25e-3. Boxes: median and interquartile range (IQR); whiskers: Q1 - 1.5*IQR and Q3 + 1.5*IQR. f, Boxplot of average activation enrichment scores with interquartile range shown for tiles that contain a single necessary sequence across each category (Acidic n = 9; S, P, Q n = 9; Mixed n = 64). P-values computed from Mann-Whitney one-sided U tests. Boxes: median and interquartile range (IQR); whiskers: Q1 - 1.5*IQR and Q3 + 1.5*IQR.
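
The box plot convention quoted in these legends (boxes at the median and IQR, whiskers at 1.5*IQR beyond the quartiles) can be sketched in Python as follows. The helper name whisker_fences, the example numbers and the "inclusive" quartile method are assumptions; plotting libraries differ in how they interpolate quartiles, and most draw whiskers to the most extreme data point that lies inside these fences.

```python
from statistics import median, quantiles


def whisker_fences(values, k=1.5):
    """Return the median, quartiles and the k*IQR fences for a sample."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "median": median(values),
        "q1": q1,
        "q3": q3,
        "lower": q1 - k * iqr,  # lower whisker limit
        "upper": q3 + k * iqr,  # upper whisker limit
    }


# Made-up scores for illustration only.
scores = [1, 2, 3, 4, 5]
summary = whisker_fences(scores)
```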

Extended Data Fig. 7 Distribution of tiles’ predicted secondary structure, mutant RD screen’s separation purity and reproducibility, and HES family tiling plot examples.

a, Distributions of activating and repressing tiles’ fraction of the sequence predicted to be structured from AlphaFold’s60 predictions on the full length protein sequence. p-value = 4.1e-8 (Mann Whitney U test, one-sided, boxes: median and interquartile range (IQR); whiskers: Q1 - 1.5*IQR and Q3 + 1.5*IQR). b, Flow cytometry data showing citrine reporter distributions for the Mutant RD transcriptional activity screen on the day we induced localization with dox (Pre-induction), on the day of magnetic separation (Pre-separation), and after separation (Bound). Overlapping histograms are shown for 2 separately transduced biological replicates. The average percentage of cells ON is shown to the right of the vertical line showing the citrine level gate. c, Biological replicate Mutant RD transcriptional activity screen reproducibility. d, Comparison between individually recruited measurements and Mutant RD screen measurements (n = 2, dots are the mean, bars the range) with logistic model fit plotted as solid line (r2 = 0.91, n = 9). There are far fewer points in this plot than in the others because, unlike the Mutant AD screen, which included all hits that contained a W, F, Y or L, the Mutant RD screen had far fewer hits that overlapped our set of validations, since only the strongest tiles within domains or hits that contained co-repressor binding motifs were included in the library design. e, Alexa Fluor 647 staining distributions for the Mutant RD FLAG protein expression screen. f, Biological replicate Mutant RD protein expression screen reproducibility. g, Tiling plots for all 7 HES family members (n = 2, dots are the mean, bars the range).

Extended Data Fig. 8 Mutant RD screen follow-up.

a, Repression enrichment scores for a subset of repressing tiles (n indicated in figure) that contain a relatively more flexible CtBP-binding motif (regex shown above), excluding the more refined CtBP-binding motif (regex shown on second line). Mutants have their binding motifs replaced with alanines (p-values computed from one-tailed z-test). b, Repression enrichment scores for repressing tiles that contain a flexible SUMO-binding motif (fraction of non-hit sequences containing motif = 0.155). (n = 2, dots are the mean, bars the range, p-values computed from one-tailed z-test). c, Fraction of AD deletion sequences containing a SUMOylation motif binned according to their effect on activity (yellow=no change on activation relative to WT, gray=decreased activation). 11 total ADs. d, Deletion scan across TCF15’s RD (n = 2, dots are the mean, bars the range). Deletions are colored by whether they were above (blue) or below (gray) the experimentally validated detection threshold for repression (dotted line). AlphaFold’s60 predicted secondary structure (prediction from whole protein sequence) shown below where green regions are alpha helices. Annotations shown from protein accession NP_004600.3. e, Distribution of bHLH classifications of RDs overlapping bHLH UniProt annotations. Classifications taken from ref. 34. f, Deletion scan across REST’s RD (n = 2, dots are the mean, bars the range). Deletions are colored by whether they were above (pink) or below (gray) the validated threshold. AlphaFold’s60 predicted secondary structure (prediction from whole protein sequence) shown below where green regions are alpha helices and orange arrows are beta sheets. g, Tiling plots for IKZF family members (n = 2, dots are the mean, bars the range). h, Deletion scan across IKZF1, 2 and 4’s RDs (n = 2, dots are the mean, bars the range). Deletions are colored by whether they were above (pink) or below (gray) the validated threshold. i, Cartoon model of potential mechanisms corresponding to the RD categories in Fig. 3f.
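
The motif mutants described in panels a and b, in which a regex-matched binding motif is replaced with alanines, could be constructed along the lines of the Python sketch below. The regexes used in the study appear only in the figure, so the pattern P.DLS, the function name alanine_replace and the example sequence are hypothetical stand-ins chosen for illustration.

```python
import re

# Hypothetical pattern for illustration; not the regex shown in the figure.
HYPOTHETICAL_MOTIF = re.compile(r"P.DLS")


def alanine_replace(sequence, motif=HYPOTHETICAL_MOTIF):
    """Replace every motif match with an equal-length run of alanines."""
    return motif.sub(lambda m: "A" * len(m.group(0)), sequence)


# Made-up peptide containing one match (PLDLS).
wild_type = "MGSPLDLSKKQE"
mutant = alanine_replace(wild_type)
```

Replacing the match with the same number of alanines keeps the tile length unchanged, so any change in the repression score can be attributed to the loss of the motif rather than to a shift in spacing.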

Extended Data Fig. 9 Bifunctional domain deletion scan screen’s separation purity, reproducibility, and examples.

a, Counts of bifunctional domains from proteins that contain the indicated DNA binding domains. Homeodomains are enriched among TFs containing bifunctional domains compared to the frequency of homeodomains among all TFs (p = 2.5e-4, Fisher’s exact test, two-sided). b, Tiling plot for NANOG (n = 2, dots are the mean, bars the range). c, Flow cytometry data showing citrine reporter distributions for the bifunctional deletion scan minCMV promoter screen on the day we induced localization with dox (Pre-induction), on the day of magnetic separation (Pre-separation), and after separation (Bound). Overlapping histograms are shown for 2 separately transduced biological replicates. The average percentage of cells ON is shown to the right of the vertical line showing the citrine level gate. d, Biological replicate bifunctional deletion scan minCMV promoter screen reproducibility. e, Citrine reporter distributions for the bifunctional deletion scan pEF promoter screen (n = 2). f, Biological replicate bifunctional deletion scan pEF promoter screen reproducibility. g, Example of a bifunctional domain from NANOG with independent activating and repressing regions (n = 2, dots are the mean, bars the range). Note that deletion of the sequence necessary for activation caused an increase in repression, and vice versa.

Extended Data Fig. 10 Examples of bifunctional domain sequences at three different promoters.

a, Tiling plot for LEUTX (n = 2, dots are the mean, bars the range). b, Deletion scan across one of LEUTX’s bifunctional tiles (n = 2, dots are the mean, bars the range). Deletions were binned by their statistical significance into those that decreased activity (gray lines) compared to the WT tile and those that did not (one-tailed z-test). The necessary sequence for another gene family member, ARGFX, is highlighted in teal. c, Bifunctional domain necessary region location categories. Overlapping regions were defined as any tile that contained a deletion that was necessary (below activity threshold) for both activation and repression. d, Citrine distributions of ARGFX-161:240 recruited to minCMV (n = 2, left) and to pEF (n = 2, right). e, Citrine distributions of bifunctional tiles identified from minCMV and pEF CRTF tiling screens recruited to the PGK promoter (n = 2). Asterisks denote p-values < 0.05 for the percentage of cells on (right) and off (left) in the dox population (one-sided Welch’s t-test, unequal variance). ARGFX-191:270 off p = 0.0003, on p = 0.02; FOXO1-561:640 off p = 0.017, on p = 2.44e-5; NANOG-191:270 off p = 2.12e-5, on p = 0.0002; NANOG-225:304 off p = 0.202, on p = 0.0004; KLF7-1:80 off p = 0.99, on p = 0.0005. f, Comparison between the set of proteins screened in the ORFeome of Alerasool et al. (2022) and the set screened in this study.

Supplementary information

Supplementary Fig. 1

Source images of western blots shown in Extended Data Fig. 2c detecting FLAG and H3 protein levels in each cell line labelled. Regions that were selected for quantification are shown as rectangles. See Supplementary Table 3 for protein sequences.

Reporting Summary

Supplementary Fig. 2

Tiling plots of CRTF tiling screen. Tiling plots for 2,028 CRs and TFs (related to Figs. 1b,e and 4c, Extended Data Figs. 2d, 4g, 7g, 8g, 9b and 10a). Each horizontal bar is an 80 aa tile, and each vertical bar is the range from two biologically independent screens. minCMV’s activation enrichment scores plotted in yellow and pEF’s repression enrichment scores plotted in blue. The hit calling threshold is two standard deviations above the mean of the poorly expressed random controls for activation and three standard deviations above the mean for repression (Methods). Points with larger marker sizes are hits in the validation screen. Marker hues indicate FLAG-stained expression levels. Protein annotations are sourced from UniProt. Only proteins that had a mapped UniProt ID are plotted. Note: there are a few repression tiles that are below the threshold but have larger marker sizes (for example, BCL6 tile 34 and NR3C1 tile 66). This is because their CRTF tiling pEF promoter screen scores are below the threshold, but their CRTF tiling PGK promoter screen scores are above that screen’s threshold, so they were re-tested in the pEF promoter validation screen and measured as hits. We think these points were false negatives in the CRTF tiling pEF promoter screen, but because they were hits in the smaller, higher coverage validation screen they are probably hits.
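
The hit calling rule described above (mean of the control scores plus a fixed number of standard deviations) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names and all scores are made up, and the use of the sample standard deviation is an assumption, since the legend does not say which estimator was used (see Methods).

```python
from statistics import mean, stdev


def hit_threshold(control_scores, n_sd):
    """Mean of the control scores plus n_sd sample standard deviations."""
    return mean(control_scores) + n_sd * stdev(control_scores)


def call_hits(tile_scores, control_scores, n_sd):
    """Map each tile name to True if its score is above the threshold."""
    cutoff = hit_threshold(control_scores, n_sd)
    return {tile: score > cutoff for tile, score in tile_scores.items()}


# Made-up enrichment scores, for illustration only.
controls = [-1.0, 0.0, 1.0]  # stands in for poorly expressed random controls
tiles = {"tile_A": 2.5, "tile_B": 1.5, "tile_C": 3.5}

activation_hits = call_hits(tiles, controls, n_sd=2)  # 2 SD rule (activation)
repression_hits = call_hits(tiles, controls, n_sd=3)  # 3 SD rule (repression)
print(activation_hits)
print(repression_hits)
```

With these toy numbers the cutoff is 2.0 under the activation rule and 3.0 under the repression rule, so the stricter repression rule calls fewer tiles as hits.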

Supplementary Fig. 3

Gating strategy for protein expression screens. Gating strategy for data shown in Extended Data Figs. 2a,b, 5g,h and 7e,f.

Supplementary Table 1

CRTF tiling library sequences and enrichment scores from the FLAG protein expression screen, minCMV, pEF and PGK promoter screens, and the validation screens.

Supplementary Table 2

Domains from tiles. Activation and repression domain sequences and maximum tile enrichment scores.

Supplementary Table 3

Individual validation flow cytometry data and protein sequences probed by western blots.

Supplementary Table 4

AD mutants library sequences and enrichment scores from the FLAG protein expression screen and minCMV promoter screen.

Supplementary Table 5

RD mutants library sequences and enrichment scores from the FLAG protein expression screen and pEF promoter screen.

Supplementary Table 6

Bifunctional domains, deletion scan library sequences and enrichment scores from the FLAG protein expression screen, minCMV and pEF promoter screens.

Peer Review File

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

DelRosso, N., Tycko, J., Suzuki, P. et al. Large-scale mapping and mutagenesis of human transcriptional effector domains. Nature 616, 365–372 (2023). https://doi.org/10.1038/s41586-023-05906-y

  • DOI: https://doi.org/10.1038/s41586-023-05906-y
