A mouse tissue transcription factor atlas

Transcription factors (TFs) drive various biological processes ranging from embryonic development to carcinogenesis. Here, we employ a recently developed concatenated tandem array of consensus TF response elements (catTFRE) approach to profile the activated TFs in 24 adult and 8 fetal mouse tissues on proteome scale. A total of 941 TFs are quantitatively identified, representing over 60% of the TFs in the mouse genome. Using an integrated omics approach, we present a TF network in the major organs of the mouse, allowing data mining and generating knowledge to elucidate the roles of TFs in various biological processes, including tissue type maintenance and determining the general features of a physiological system. This study provides a landscape of TFs in mouse tissues that can be used to elucidate transcriptional regulatory specificity and programming and as a baseline that may facilitate understanding diseases that are regulated by TFs.

Reviewer #1 (Remarks to the Author):
In this manuscript, Zhou et al. use a DNA-affinity approach to capture and identify proteins that bind to DNA designed to contain a large number of transcription factor (TF) response elements. They do so in 32 mouse tissues, aiming to generate TF profiles for each individual tissue, but more importantly to correlate these to each other to create a hierarchy of TFs that are unique to some or shared between several organs. To this end, the authors invoke a range of bioinformatic analyses to group and classify their data. Specifically, they link TF patterns to tissue functionality, also making use of existing data of TF-TF interactions and TF target genes. This is a nice study, using a methodology previously developed by the same authors to identify TFs that bind to a DNA construct containing a large number of concatenated TF-response elements. They now applied this to a large number of tissues to rank and classify TF patterns and correlate this with tissue function. The manuscript is well-structured, clearly written, using sound analysis methods, and the figures are of high quality. The limitation of the paper resides in the fact that it is mainly a descriptive study based on a single (although large) data set, where only in the last section the authors try to get closer to functionality by monitoring changes in TF profiles after liver regeneration. Yet, this does not lead to a concrete set of TFs that could be conclusively/causatively linked to this process.
Other remarks:
1. One of the limitations of the approach used is that it is an in vitro method, with a number of associated shortcomings. The authors should address these, and put this in the perspective of biologically interpreting their data. For instance, the procedure starts from a nuclear extract, thus taking TFs out of their physiological context (e.g. being in a chromatin-bound or soluble state, which is a dynamic equilibrium for many TFs). Second, the bait is a piece of naked DNA, i.e. devoid of nucleosomes or other chromatin constituents. Therefore, capture of a particular TF (or failure to do so) does not have a direct biological meaning. As a result, the set of TFs that is identified in the end provides a fingerprint that may be used for certain classification procedures; however, it does not indicate expression level in the respective tissues, or propensity to be associated with chromatin in vivo. The authors should emphasize this so that the reader is aware of it.
9. The liver is a highly heterogeneous tissue. Can the effects observed in the liver regeneration experiment be ascribed to any particular cell type?
10. I have some hesitation about the terminology described as 'TF hierarchical networks among the tissues' (p. 16). Hierarchy implies interdependence, which does not apply to the large majority of tissues. Instead, repeated identification of the same proteins may indicate re-use of TFs for different functionalities, which, from an organism point of view, may be a necessity given the large variety of constituent cell types (hundreds) and the limited number of TFs encoded in the genome ('just' hundreds).

Reviewer #2 (Remarks to the Author):
In this paper the authors employ the catTFRE approach recently developed by their group to identify the repertoire of TFs active in the nucleus of 24 adult and 8 fetal mouse tissues. This approach, that is based on TF enrichment using tandem repeats of binding sites and mass spectrometry, is a great improvement from whole cell/tissue proteomics as a large fraction of TFs are below the detection limit in these methods, and from expression profiling as there is low correlation between TF mRNA levels and TF activity. This dataset, for which the authors generated an easy to navigate web page, will certainly be useful for the scientific community and serve as a framework to study tissue-specific gene regulatory networks. Having said this, there are several analyses that need to be modified or added to support the several conclusions made by the authors to be acceptable for publication. Limitations of the method should also be discussed.
Major concerns:
1) Some of the numbers referenced in the manuscript don't match those on the webpage. For instance, in lines 139-140 the authors mention 173 and 447 TFs for skeletal muscle and thymus, respectively. However, on the TF Atlas webpage these numbers are 167 and 448. Where are the discrepancies coming from?
2) The authors should comment on any potential bias in TF abundance due to the method (for instance the tandem motif sequences used in the pull-down, affinity, saturation, etc.). Two TFs from the same family could bind to the same sequence, so their relative FOTs could be influenced by their relative protein abundance and affinity for the motif. Further, TFs from different families may compete with a different number of TFs for sites, so their relative FOTs may also be influenced by the number of family members that recognize a motif. From their previous paper, it is clear that using different DNA sequences in the pull-down results in different enrichments and abundances. The authors should find a way of controlling for these factors or clearly specify these limitations.
3) Is the number of TFs detected in different tissues a property of the number of TFs active in the different tissues, or could it be explained by differences in the amount of protein in the nuclear extracts?
4) Some of the thresholds used seem arbitrary. For instance, why look at the 35 most abundant proteins (line 151), 12 tissues (line 166), a median expression >0.5 (line 172), or 10 times the median (line 178)? A rationale for the thresholds should be included.
5) In line 155 the authors "assume" that TFs that are more abundant "regulate the busy and important functions of the tissues." However, other factors also influence this, such as affinity and which genes they regulate (these could be few but important). The authors could state this as a hypothesis and then support it with data.
6) The statement in lines 193-194 is not necessarily true. There could be widespread transcriptional regulation by ubiquitous TFs or by a few highly expressed/active TFs.
7) Paragraph 200-206 seems disconnected from the rest and there is no conclusion. In addition, in line 202 the authors say that TSGs have a more ubiquitous tissue distribution; however, in figure 3e that difference is not significant. The authors should either make a clear point or remove the paragraph.
8) In lines 293-295 the authors state that clusters can shed light on the "dark proteome". However, they make very little effort to validate their predictions besides the example of zfp655. The authors should go beyond the anecdotal example. For how many TFs can you make functional predictions? For how many of those is the function known in the literature? Does it match?
9) In line 332 the authors reference papers studying protein-protein interactions between TFs using Y2H. However, in reference 13 mammalian two-hybrid is used instead (correct also in line 500). Other references also use other techniques. Literature should be properly cited.
10) In paragraph 356-370 the authors comment on the connection between ubiquitous TFs and ttrTFs, and then list a number of examples. To make this claim the authors need to do statistical analyses. Are interactions between these 2 classes of TFs more frequent than expected by chance?
11) In lines 381-384, how were the TGs determined? This paragraph is unclear.
12) The criteria used to define ttmTFs are not very stringent, as many TFs not involved in maintaining tissue identity can be enriched in a tissue and also be coexpressed with their targets. Indeed, the authors classify 30% of the TFs they detect as ttmTFs, which seems high. Besides providing some anecdotal examples, the authors should attempt a more systematic analysis to support their claim.
13) Lines 421-425 are impossible to understand. The passage is also speculative, as there is no experiment or analysis showing or suggesting causality between ttmTF concentration and function.
14) Some sentences in the Discussion section are purely speculative, and no evidence is provided in the paper. For instance, lines 492-494, 495-498, 516-518. Overstatements should be avoided.
15) The authors should comment on the limitations of the method in the Discussion section.
Minor concerns:
1) Line 49: "TFs interacting with the promoters of…" Enhancers and silencers also play an important role in gene regulation.
2) Paragraphs lines 62-87: Other methods that study TFs and GRNs should also be mentioned, such as yeast one-hybrid assays (PMID 25910213, 23917988), genome-wide DNase footprints (PMID: 22955618), etc.
3) The authors filter the proteins they detect by mass spectrometry based on DBDs. To give a sense of the specificity of the approach, the authors should also mention, at least in the methods section, which proportion of the proteins they detect (in number and in abundance) correspond to TFs.
4) Line 130: DBTF is not defined.
5) In line 142: FOT is not defined.
6) In line 155 the authors mention nuclear receptors (NRs) but in line 133 they talk about NHRs. Consistency should be kept throughout the manuscript.
7) What is the difference between ttrTFs and ubiquitous-non-uniform TFs? Some of the definitions are confusing and there are many acronyms in the paper, making it hard to read.
8) Some figures lack appropriate labels, and larger fonts would benefit reading. Figures 4a, 4d, 6c, 6e, 6f and 7c need a label for the color gradients. What are the axes in figure 5a? Figure 5e: label missing in top graph. Figure 6f: what is being clustered? A label is missing on the y-axis of fig 7g.
9) In line 223 the authors say they detected 47 NRs from 32 tissues. But in the following sentence they talk about half of adult tissues (24 in total). This is confusing.
10) In paragraph 277-284 the authors use cosine similarity. The way it is defined is not very intuitive and it doesn't scale linearly with the overlap in the set of tissues shared by two TFs. The authors should explore other more intuitive measures of similarity such as the Jaccard index or PCC.
11) In paragraph 312-325 the authors mention the correlation coefficient for the expression of TF pairs. Is this based on catTFRE or mRNA expression? Why do the authors use PCC in this case and cosine for figure 5a?
12) Lines 352-354: The correlation coefficient and the p-value should be included.
13) Line 382: TG is not defined.
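As a minimal illustration of the two measures named in minor concern 10, the sketch below computes cosine similarity and the Jaccard index for a pair of TFs. The abundance profiles are hypothetical values chosen for illustration, not data from the manuscript, and the function names are assumptions. Cosine similarity uses the abundance values, whereas the Jaccard index uses only the sets of tissues in which each TF is detected.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two abundance profiles of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an all-zero profile
    return dot / (norm_a * norm_b)

def jaccard_index(a, b):
    """Shared tissues divided by tissues in which either TF is detected."""
    tissues_a = {i for i, x in enumerate(a) if x > 0}
    tissues_b = {i for i, y in enumerate(b) if y > 0}
    union = tissues_a | tissues_b
    if not union:
        return 0.0
    return len(tissues_a & tissues_b) / len(union)

# Hypothetical abundance of two TFs across four tissues (illustrative only).
tf_a = [8.0, 1.0, 1.0, 0.0]
tf_b = [8.0, 0.0, 0.0, 1.0]

print("cosine :", round(cosine_similarity(tf_a, tf_b), 3))
print("jaccard:", jaccard_index(tf_a, tf_b))
```

In this made-up case the two TFs share only one of four tissues, yet the cosine similarity is high because the shared tissue dominates both profiles.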

Reviewer #1:
In this manuscript, Zhou et al. use a DNA-affinity approach to capture and identify proteins that bind to DNA designed to contain a large number of transcription factor (TF) response elements. They do so in 32 mouse tissues, aiming to generate TF profiles for each individual tissue, but more importantly to correlate these to each other to create a hierarchy of TFs that are unique to some or shared between several organs. To this end, the authors invoke a range of bioinformatic analyses to group and classify their data. Specifically, they link TF patterns to tissue functionality, also making use of existing data of TF-TF interactions and TF target genes. This is a nice study, using a methodology previously developed by the same authors to identify TFs that bind to a DNA construct containing a large number of concatenated TF-response elements. They now applied this to a large number of tissues to rank and classify TF patterns and correlate this with tissue function. The manuscript is well-structured, clearly written, using sound analysis methods, and the figures are of high quality. The limitation of the paper resides in the fact that it is mainly a descriptive study based on a single (although large) data set, where only in the last section the authors try to get closer to functionality by monitoring changes in TF profiles after liver regeneration. Yet, this does not lead to a concrete set of TFs that could be conclusively/causatively linked to this process.
Q1: One of the limitations of the approach used is that it is an in vitro method, with a number of associated shortcomings. The authors should address these, and put this in the perspective of biologically interpreting their data. For instance, the procedure starts from a nuclear extract, thus taking TFs out of their physiological context (e.g. being in a chromatin-bound or soluble state, which is a dynamic equilibrium for many TFs). Second, the bait is a piece of naked DNA, i.e. devoid of nucleosomes or other chromatin constituents. Therefore, capture of a particular TF (or failure to do so) does not have a direct biological meaning. As a result, the set of TFs that is identified in the end provides a fingerprint that may be used for certain classification procedures; however, it does not indicate expression level in the respective tissues, or propensity to be associated with chromatin in vivo. The authors should emphasize this so that the reader is aware of it.
Reply: Indeed, catTFRE is an in vitro method and has the limitations mentioned by the reviewer. To demonstrate the feasibility and accuracy of the catTFRE approach in dissecting endogenous TF activity and biological features at proteome scale, we have performed the following procedures and evaluations:
1. While we acknowledge that catTFRE is an in vitro binding method, our previous data demonstrated that our approach is able to monitor the biological response of TF dynamic changes. For instance, we utilized the catTFRE approach to analyze dynamic changes of
2. We agree that naked DNA does not represent the natural state of DNA in a living cell as compared to a nucleosome template. We investigated the difference between naked DNA and nucleosomes in our previous paper in Molecular Cell (PMID: 23850489). We performed DNA pull-downs with naked DNA or with nucleosomes assembled with core histone octamers and tested them for CoR-ERα-ERE complex formation. On nucleosomal EREs, we were able to detect 16 of the 18 CoRs seen on the naked EREs (Figure CL2). The main effect of the nucleosomal DNA seemed to be a decrease in the amount of TFs bound to the DNA, and thus a decrease in the mass spectrometry signal. This makes sense, as nucleosomes are known to inhibit TF DNA binding. Because we wanted to construct a TF atlas of mouse tissues with deep TF coverage, we used naked DNA instead of nucleosomes. In addition to our studies, utilizing naked DNA as bait to survey protein-DNA interactions was a conventional
To make the readers aware of these limitations, we added the statement above and the references in the discussion.
Reply: To eliminate the ambiguity, we uniformly use "TF DNA-binding activity" in the revision.
To evaluate the quantification capability of catTFRE, we measured the saturation curve of catTFRE.
We performed serial dilution experiments with 3 pmol of catTFRE DNA (the exact amount used throughout the study) and different amounts of NE (200 µg, 500 µg, 1 mg, 2 mg and 5 mg) from mouse brain tissue. As shown in Figure CL3, the total MS signal of TFs (chromatographic peak area) correlates strongly with the total NE amount (R² = 0.959). Notably, an excellent linear response was obtained when the NE amount ranged from 1 mg to 5 mg. Based on these results, we used 3 pmol of DNA and 2 mg of total NE for screening the TF atlas of mouse tissues.
We also surveyed individual TFs in the dilution experiments and found good linear responses (Figure CL4).
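The linearity check described above can be sketched as an ordinary least-squares fit of total TF signal against NE amount, followed by the coefficient of determination. The NE amounts are those listed in the reply; the signal values are hypothetical placeholders (the measured peak areas are not given in this letter), and the function name is an assumption.

```python
def linear_fit_r2(x, y):
    """Least-squares line y = slope * x + intercept and its R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

# NE amounts in mg, as listed in the reply.
ne_mg = [0.2, 0.5, 1.0, 2.0, 5.0]
# Hypothetical total TF peak areas in arbitrary units (NOT measured data).
tf_signal = [0.5, 1.2, 2.1, 4.0, 9.8]

slope, intercept, r2 = linear_fit_r2(ne_mg, tf_signal)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.3f}")
```

Restricting the fit to a sub-range (for example 1 mg to 5 mg) only requires slicing both lists before calling the function.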

Figure CL3
Quantitative feasibility and linearity of catTFRE strategy evaluated by dilution analysis.
Different amounts of NE extracted from brain tissue were used as shown. Total peptide AUC (area under curve) was calculated. TFs selected in Figure CL3 were calculated.
Taken together, through dilution experiments and many "proof of principle" experiments, we have demonstrated that the catTFRE approach can sensitively and accurately monitor the abundance and DNA-binding activity dynamics of TFs. We also evaluated the saturation curve to set up optimized conditions for the DNA pull-down MS pipeline. Please see Supplementary Fig. 1 in the revision.
We appreciate the reviewer's comment, which precisely points out the shortcomings of catTFRE. We added this limitation to the discussion section in the revision.
As expected, TFs detected by proteome profiling tend to be the highly abundant ones in the catTFRE dataset (Figure CL6). We also calculated correlation coefficients for the 13 tissues present in both datasets (Table CL1) and found that Spearman's rank correlation coefficient ranged from 0.046 (liver) to 0.401 (spleen), suggesting a poor correlation between TF expression levels and their DNA-binding activities.
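For readers who want to reproduce this kind of comparison, the following is a minimal sketch of Spearman's rank correlation: values are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is returned. The TF abundance values are hypothetical and the function names are assumptions; this is not the authors' analysis code.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical abundances of the same five TFs in two datasets (illustrative).
profiling = [5.1, 0.3, 2.2, 0.9, 7.4]
cat_tfre = [0.8, 1.9, 0.2, 3.5, 1.1]
print("Spearman rho:", round(spearman(profiling, cat_tfre), 3))
```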

Figure CL6
TFs detected in the profiling data fall in the higher-abundance part of the TFRE data. The y-axis shows the TF rank in the TFRE data. Red boxes mark TFs detected in both the profiling data and the TFRE data. The P value was calculated using the Wilcoxon rank-sum test.
to examine some of the "important" TFs in the 13 overlapping tissues. As shown in Table CL2, the catTFRE approach detected most of the "important" TFs in tissues (76 out of 85), while the profiling data identified only a few of them (6 out of 85).
In summary, comparison between the catTFRE and the profiling datasets indicated that catTFRE could more accurately monitor TF binding activities and represent the biological features of endogenous TFs in the tissues. We acknowledge that the comparison is not entirely fair, as mass spectrometry has advanced greatly and the profiling data were collected with a previous-generation mass spectrometer. Nevertheless, we added the comparison between catTFRE and profiling results in the revision and pointed out the differences in the technology used. Please see Supplementary Fig. 1 and Supplementary Data 1 in the revision.
Q4: It is unclear which TF response elements were included, and how many of the identified proteins corresponded with these.

Reply:
We referred to the TF binding database JASPAR to select consensus TFREs for different TF families. To design the catTFRE construct, we used 100 selected TFREs and placed two tandem copies of each sequence with a spacer of three nucleotides in between, resulting in a total DNA length of 2.8 kb. In the mouse TF atlas, we identified 87 TFs whose response elements correspond to the designed TFREs. Moreover, we also identified a large number of additional TFs whose response elements were not included in the catTFRE sequence. In the previous work (Proc
Q5: In addition to the above, the authors only mention the number of TFs that were identified, and not the total number of proteins that were co-isolated (as interaction partners or contaminants).
Reply: Indeed, catTFRE pulled down many transcriptional co-regulators (TCs) and other DNA-binding proteins (DBPs). In the mouse TF atlas, we identified 523 TCs, ranging from 63 in skeletal muscle to 366 in thymus (Table CL3). Figure CL7 summarizes the distribution of TCs detected in the 32 mouse tissues. Similar to the pattern for TFs, an L-shaped distribution was observed among the 32 tissues. Interestingly, TCs showed a lower tissue-specificity score (TSPS) than TFs (P = 3.86E-16), indicating their ubiquitous distribution (Figure CL8). Among them, six subunits of the Mi-2/NuRD complex (Rbbp7, Hdac1, Hdac2, Mbd2, Rbbp4 and Mbd3) were identified. Mi-2/NuRD is an important protein complex that couples chromatin-remodeling ATPase and deacetylation functions, and it plays an essential role in gene expression through epigenetic regulation. We added the description of TCs in Supplementary Fig. 2 in the revision.
# The ratio of abundance was the amount of TFs in total proteins.
Q6: The clustering of nuclear receptors (Fig 4a) results in a slightly different classification than previously proposed (page 9). However these groupings are not mutually exclusive as they are assigned by function and location, respectively.

Reply:
The slightly different classification between our dataset and previously proposed (Cell.
Focusing on ttmTFs, we found that expression of the liver ttmTFs was significantly decreased compared with that of the non-ttmTFs after PHx, indicating that liver cells lose their identity when undergoing drastic perturbations such as PHx. Six members of the hepatocyte nuclear factor family in the liver ttmTF group were markedly down-regulated at 12 h and 3 days after PHx, and displayed a tendency to return to the original, stable state in the terminating phase (Figure CL11).
We have added the new analysis and the reference above in the revision. Please see Fig. 7 and Supplementary Fig. 7 in the revision for more details.
Q9: The liver is a highly heterogeneous tissue. Can the effects observed in the liver regeneration experiment be ascribed to any particular cell type?
Reply: This is a very important issue: cell-type-resolved proteomics. Indeed, the liver consists
cell-type enriched TFs were observed to be up-regulated in the process (Table CL4). It seemed that more LSEC-enriched TFs were up-regulated in the process. Since LSECs also expressed more TFs than the other cell types, we normalized the number of up-regulated cell-type-enriched TFs by the number of TFs detected in each cell type and displayed the result in Figure CL13. LSECs and KCs seemed to be the cell types that displayed more dynamic regulation in the process.
Reply: We agree with the reviewer's point and understand his/her hesitation about using the term "hierarchical networks". We used the term for lack of a better or more precise word. We have thus removed the adjective "hierarchical" in the revision.

Reviewer #2:
In this paper the authors employ the catTFRE approach recently developed by their group to identify the repertoire of TFs active in the nucleus of 24 adult and 8 fetal mouse tissues. This approach, that is based on TF enrichment using tandem repeats of binding sites and mass spectrometry, is a great improvement from whole cell/tissue proteomics as a large fraction of TFs are below the detection limit in these methods, and from expression profiling as there is low correlation between TF mRNA levels and TF activity. This dataset, for which the authors generated an easy to navigate web page, will certainly be useful for the scientific community and serve as a framework to study tissue-specific gene regulatory networks. Having said this, there are several analyses that need to be modified or added to support the several conclusions made by the authors to be acceptable for publication. Limitations of the method should also be discussed.
Reply: Many thanks for the reviewer's positive comments. We have added the limitations of the catTFRE approach in the discussion section.
Taken together, through dilution experiments and many "proof of principle" experiments, we have demonstrated that the catTFRE approach can sensitively and accurately monitor the abundance and DNA-binding activity dynamics of TFs. We also evaluated the saturation curve to set up optimized conditions for the DNA pull-down MS pipeline.
To make the readers aware of these potential biases, we have emphasized the limitations in the discussion section.
We added the explanation for the setting of thresholds in the revised manuscript.
Reply: In the revision, we deleted the overstatements and imprecise statements to ensure the integrity of the study. We have deleted the statement "more abundant TFs regulate the busy and important functions of the tissues" in the revision, as it is just an assumption.
(Table CL6). For example, Module #1, containing 6 TFs that are mainly expressed in tongue and skin, is related to muscle contraction and keratinocyte differentiation; a functional link to the nervous system has been reported for Tead2, Bach1, Notch3 and Dlx5 in Module #5, but not yet for another member, Fosl2. Module #31 is enriched in brain, eye and spinal cord; its members Nkx2-2, Sox2 and Arnt2 are essential for brain development and the central nervous system.
We have added the details in the revision on Page 11 and in Supplementary Data 4.
Reply: We investigated the connection between different TF sub-groups. As shown in Figure CL16, the connection between ubiquitous TFs and ttrTFs was more frequent than expected by chance (ratio = 17.5%, P < 0.001). We think it is more appropriate to calculate the statistical significance of the connection between ubiquitous TFs and non-ubiquitous TFs instead. As shown in Figure CL16, the connection between ubiquitous TFs and non-ubiquitous TFs was more frequent than expected by chance (ratio = 54.5%, P < 0.001) and higher than in the other two groups (ubiTFs-ubiTFs and non-ubiTFs-non-ubiTFs). We have added this in the revision (Fig. 5d).
(Table 1), further revealing the value of the nominated ttmTF list.
We also used another analysis strategy to evaluate the correlation between ttmTF function and tissue features. For example, we submitted target genes co-regulated by two ttmTFs for Reactome analysis. Reactome terms that are enriched in dual-ttmTF target genes represent the major function of the tissue (Fig. 7c), suggesting that the ttmTFs we nominated may carry out essential functions in the tissue.
We apologize for not explaining the number of ttmTFs properly. The ttmTF groups differ considerably among tissues. Even though a total of 286 ttmTFs were nominated, accounting for 30% of all identified TFs, the percentage of ttmTFs in any particular tissue is low: an average of 18 per tissue, or only 10% of the TFs identified in that tissue.
(Table CL4).
* Total includes DNA binding protein, TC and TF; there was some overlap among them.
# The ratio of abundance was the amount of TFs in total proteins.
We have added the TC and DBP quantitative identification and analysis in the revision, and also updated them in the TF Atlas website.

Q4: Line 130: DBTF is not defined.
Reply: DBTF is the abbreviation for DNA-binding transcription factor. We have annotated this in the revision. To eliminate ambiguity, we have added a table of abbreviations in the revision (Table CL7).

Q5: Line 142: FOT is not defined.
Reply: FOT is the abbreviation for fraction of total. FOT is defined as a TF's iBAQ divided by the total iBAQ of all identified proteins in a particular tissue. Its definition is included in Table CL7.
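The FOT definition above amounts to a simple normalization, sketched here for one tissue. The protein names and iBAQ values are hypothetical placeholders chosen for illustration, and the function name is an assumption rather than part of the authors' pipeline.

```python
def fraction_of_total(ibaq_by_protein):
    """FOT per protein: its iBAQ divided by the summed iBAQ of all
    proteins identified in the same tissue."""
    total = sum(ibaq_by_protein.values())
    if total <= 0:
        raise ValueError("total iBAQ must be positive")
    return {name: value / total for name, value in ibaq_by_protein.items()}

# Hypothetical iBAQ intensities for one tissue (NOT measured data).
# All identified proteins enter the denominator, not only TFs.
liver_ibaq = {
    "TF_A": 2.0e7,
    "TF_B": 1.0e7,
    "coregulator_C": 5.0e7,
    "other_protein_D": 2.0e7,
}

fot = fraction_of_total(liver_ibaq)
for name, value in fot.items():
    print(f"{name}: FOT = {value:.2f}")
```

Because every tissue is divided by its own total, FOT values are comparable across tissues as fractions, and they sum to 1 within a tissue.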

Q6: In line 155 the authors mention nuclear receptors (NRs) but in line 133 they talk about NHRs. Consistency should be kept throughout the manuscript.
Reply: Thanks for the comment. To keep the consistency, we uniformly use "nuclear receptor (NR)" in the revision.
Q7: What is the difference between ttrTFs and ubiquitous-non-uniform TFs? Some of the definitions are confusing and there are many acronyms in the paper making it hard to read.

Reply:
We apologize for the confusion. In the revision, we have explained the terms used in this study, as follows:
TtrTFs: tissue type restricted TFs; TFs that are expressed in a particular tissue at levels at least 10 times higher than the median value of all adult tissues.
Ubiquitous TFs: TFs with a transformed median expression value of > 0.5 were considered ubiquitous TFs.
Ubiquitous-non-uniform TFs: Among ubiquitous TFs, the expression of twenty-seven TFs exhibited a maximum value of less than 10 times the median value, indicating a ubiquitous-uniform distribution; the remaining ubiquitous TFs, whose maximum value is at least 10 times the median value, are ubiquitous-non-uniform.
Indeed, there is considerable overlap between these two categories. However, the two methods classify TF patterns from different perspectives.
We have updated the explanation of the TF sub-groups in the Methods section.
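The thresholds in these definitions can be written as a short classification sketch. The 10-fold and 0.5 cut-offs come from the definitions above; everything else is an assumption made for illustration: the function and field names, the hypothetical abundance values, the choice to pass the transformed median in as a precomputed number (the transformation is not specified in this letter), and reading "non-uniform" as the complement of the ubiquitous-uniform rule.

```python
from statistics import median

def classify_tf(adult_values, transformed_median, fold=10.0, ubiquitous_cutoff=0.5):
    """Classify one TF from its abundance in each adult tissue.

    adult_values: dict mapping tissue name -> abundance (e.g. FOT).
    transformed_median: precomputed transformed median expression value.
    """
    med = median(adult_values.values())
    # ttrTF rule: tissues where abundance is at least `fold` times the
    # median across adult tissues (zero values never qualify).
    ttr_tissues = sorted(
        t for t, v in adult_values.items() if v > 0 and v >= fold * med
    )
    is_ubiquitous = transformed_median > ubiquitous_cutoff
    distribution = None
    if is_ubiquitous:
        # Uniform if the maximum stays below `fold` times the median;
        # otherwise non-uniform (assumed complement of the uniform rule).
        is_uniform = max(adult_values.values()) < fold * med
        distribution = "uniform" if is_uniform else "non-uniform"
    return {
        "ttr_tissues": ttr_tissues,
        "ubiquitous": is_ubiquitous,
        "distribution": distribution,
    }

# Hypothetical abundance profiles (illustrative only).
restricted_tf = {"liver": 50.0, "brain": 1.0, "heart": 2.0, "lung": 1.5, "spleen": 2.0}
uniform_tf = {"liver": 3.0, "brain": 2.0, "heart": 2.5, "lung": 4.0, "spleen": 3.0}

print(classify_tf(restricted_tf, transformed_median=0.1))
print(classify_tf(uniform_tf, transformed_median=0.8))
```

Because the two rules are evaluated independently, a TF can be both ubiquitous and tissue-restricted, which is the overlap acknowledged in the reply.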
Q8: Some figures lack appropriate labels, and larger fonts would benefit reading.
Reply: We have updated these figures and corrected the labels in the revision.
Q9: In line 223 the authors say they detected 47 NRs from 32 tissues. But in the following sentence they talk about half of adult tissues (24 in total). This is confusing.
Reply: In the "Transcription Network of the NRs" section, we investigated the NR DNA-binding activities across all the tissues that we measured. We have removed the inaccurate words "in the adult animal".