Myofibroblast transcriptome indicates SFRP2hi fibroblast progenitors in systemic sclerosis skin

Skin and lung fibrosis in systemic sclerosis (SSc) is driven by myofibroblasts, alpha-smooth muscle actin expressing cells. The number of myofibroblasts in SSc skin correlates with the modified Rodnan skin score, the most widely used clinical measure of skin disease severity. Murine fibrosis models indicate that myofibroblasts can arise from a variety of different cell types, but their origin in SSc skin has remained uncertain. Utilizing single cell RNA-sequencing, we define different dermal fibroblast populations and transcriptome changes, comparing SSc to healthy dermal fibroblasts. Here, we show that SSc dermal myofibroblasts arise in two steps from an SFRP2hi/DPP4-expressing progenitor fibroblast population. In the first step, SSc fibroblasts show globally upregulated expression of transcriptome markers, such as PRSS23 and THBS1. A subset of these cells shows markers indicating that they are proliferating. Only a fraction of SFRP2hi SSc fibroblasts differentiate into myofibroblasts, as shown by expression of additional markers, SFRP4 and FNDC1. Bioinformatics analysis of the SSc fibroblast transcriptomes implicated upstream transcription factors, including FOSL2, RUNX1, STAT1, FOXP1, IRF7 and CREB3L1, as well as SMAD3, driving SSc myofibroblast differentiation.

1. On page 8, the authors claim that SSc patients showed a lower proportion of fibroblasts compared to controls with a relatively modest increase in the proportion of keratinocytes. However, based on Figure 1 panel A and C, the number of keratinocytes (clusters 0, 1, 4, 11) are mainly from SSC samples -very little from control. Please explain the discrepancy (e.g. was the epidermis removed during the sample preparation?). It will also be helpful to quantify these cell populations and show in terms of pie or column charts to demonstrate 'modest' increase as stated by the authors. Similarly on page 11, it would be helpful to quantify the 'global' changes in fibroblast population and illustrate them as a figure with statistics to demonstrate significant increase in reticular dermis population but not DP and DS fibroblasts. For instance, Fisher's exact test to show increase in proportion of cell population may suffice.
2. CCL19 subpopulation in cluster 0 in figure 2C seems significant but the authors do not address this cell type/state in much detail. 4. Figure 4B, immuno-staining images are not convincing. The DAPI, which stains the nucleus, seems elongated and blurred. The zoomed image could also be selective and not representative of the whole tissue. Please provide a wider view of the skin to show a greater context to the area of interest. For instance, including the barrier between the dermis and epidermis. As a control, it will also be helpful to show the stainings (especially the negative expression of SFRP4 and SFRP2) in healthy donors or in not so deep part of the dermis.
5. The authors suggest the directionality of the pseudo time ( Figure 5) based on a subjective notion that SSc samples are found in cluster 4. Showing directionality using tools such as RNAVelocity would objectively support the author's claim that cells are transitioning from cluster 1, 3, and 4. In the same token, Figure 5 only highlights genes that are up-regulated across the pseudo time. Showing different trajectory patterns, ie. down regulation of genes that are associated with progenitor cells would also support the notion that the assumed directionality is correct. 6. A minor cluster (sub cluster 9) to be associated with proliferation is interesting but questionable. Single cell profiling platforms can easily produce artifacts or doublets with high expression of cell profiling markers. Since the numbers are low (39 in total), using flow cytometry methods to detect KI67 or other cell proliferative markers and myofibroblasts markers will be necessary. 7. How microarray and single cell gene sets were analyzed is completely missing in the methods section. Please clearly explain how the gene sets were selected and which statistics were applied to show enrichment of gene sets in bulk microarray data. Please define what authors mean by "signature".
8. The approach in which genes were selected and analyzed for SCENIC analysis is unclear and text is also confusing. Please provide a clearer explanation of gene sets used and which background was set to derive statistical significance. Few words are repeated in multiple occasions in the last paragraph of the results sections. Editing is advised. 9. In some analysis, UMAP clusters are used (e.g. MONOCLE) and in another analysis, TSNE clusters are used (e.g. SCENIC). Please use single definition of clusters across all analysis.
10. The claim that SFRP4 is a "robust immunohistochemical marker" for myofibroblasts seems far fetched based on single staining analysis. Demonstrating specificity across different tissue conditions (including healthy controls) and wider resolution to cover different areas of the skin will help to determine specificity and the robustness of the marker.
11. The claim that myofibroblasts differentiate in a "two-step" process from SFRP4/DPP4 expressing normal fibroblasts progenitors is also under substantiated. Rudimentary analysis based on SCENIC with arbitrarily selected genesets to infer TF regulons hardly proves that these progenitors are undergoing two distinct steps towards myofibroblasts. Either lineage trancing and/or ChIP seq analysis to demonstrate biding of regulations will strong support the claim made by the authors.
12. Surprisingly, most of CD45 immune cells are not detected. If they were removed during cell preparation, the authors should clearly state how they were removed in the methods section. If not removed, the data shows under-representation in the skin. We expect SSc patients to harbor larger number of CD45+ leukocytes and they will play a critical role in skin microenvironment and influence gene expression -how they contribute to the overall reactivity of myfibroblasts is missing in this manuscript.
Minor points: 1. The top section on page 12 seems redundant to the bottom section of page 11. Top section on page 12 also seems to skip figure 3. Similarly, Section: "Myofibroblasts show a discrete transcriptome" on page 13 is redundant. Similar claims were already made in page 12. This a technically extremely well and thoroughly performed study elucidating the progeny of activated fibroblasts in a human condition associated with fibrosis. In addition to confirmatory findings, the study makes numerous novel observations and creates data that will be of great importance to the field. However, although at a very high level, the study is mostly descriptive and does not contain any mechanistic approaches.

All
1. For instance, which of the myofibroblast-characteristic markers (e.g., SFRP4) or involved transcription factors could be used as targets in therapeutic approaches? How does interfering with the identified signaling pathways affect the myofibroblast marker profiles, if at all?
We agree this these are important questions to address in future work. We feel they are beyond the scope of the current manuscript.
2. Verification of the mRNA profiles at the protein and tissue level is missing with the one exception of an immunofluorescence co-staining of SFRP4, SFRP2, and smooth muscle alpha actin. Normal skin controls are missing in this figure.
We  Figure S3C. Thus, this fibroblast subpopulation (subcluster #2 in Figure  2 3. Along with the descriptive nature of the study, the purpose of the analysis not entirely clear. Was the aim to identify novel fibroblast/myofibroblast markers, targets, pathways and how will the findings be translated?
The scRNA-seq data advances our understanding of SSc myofibroblast biology in several areas. Defining the transcriptome-phenotype of myofibroblasts, previously only defined by staining with smooth muscle active provides a first critical step in understanding their biology. In addition, the data clarify an important point regarding myofibroblast origins, i.e. that they derive from fibroblasts and not myeloid, adipocyte, pericyte or epithelial progenitors. The bioinformatics analysis provides a map for identifying the transcription factors that regulate myofibroblast differentiation. The data support a two-step mode for their differentiation and show that type I collagen is upregulated in the first step.
4. It is curious that bona fide myofibroblast markers are not tested for their expression in different clusters (e.g., Figures 3 and 4). The authors introduce smooth muscle alpha actin, collagen 1A1, TNC, CDH11, THY1 as frequently used myofibroblast markers but it is not investigated in tSNE or UMAP analysis which fibroblast populations are characterized. Likewise, the authors do not compare their fibroblast clusters (and specific markers) with clusters that have been produced by other groups in different fibrotic disease context or lineage-tracing studies.
Since our original submission dot plots have emerged as a more compact visually informative way to show expression of genes in different clusters. Thus, we have added a figure ( Figure 3A), showing more easily accessible information about the markers of the different clusters, addressing this concern and other concerns below. This study is an extension of previous study on single cell RNA seq analysis of healthy normal skin (Tabib T., et al JID 2018). In this manuscript, the authors performed additional single cell RNA-seq analysis on skin biopsies of 12 patients with SSc and 10 control samples, 4 of which were added from the previous study. Based on clustering analysis, the authors reveal 10 fibroblast cell types in SSc that are also represented in control samples. Fibroblasts expressing high levels of SFRP2 represent the major population of dermal fibroblasts where a sub cluster positive for WIF1 is depleted while cluster expressing PRSS23 is increased in SSc skin fibroblasts. The sub cluster expressing SFRP2(hi) and PRSS23 correlate with the severity of SSC disease and authors apply Monocle to establish cell trajectory and suggest that SFRP(hi)PRSS23+WIF1-fibroblasts are the immediate progenitors of myofibroblasts, where minor population of them are highly proliferative in SSc skin. The authors further compare the data with bulk mRNA expression showing PRSS23 correlated highly with the MRSS and authors use SCENIC tool to infer transcription factor networks associated in SSC transition from control fibroblasts. The manuscript is a descriptive paper revealing single cell RNA expression profiling of human control and SSC skin.
Major concerns: 1. On page 8, the authors claim that SSc patients showed a lower proportion of fibroblasts compared to controls with a relatively modest increase in the proportion of keratinocytes. However, based on Figure 1 panel A and C, the number of keratinocytes (clusters 0, 1, 4, 11) are mainly from SSC samples -very little from control. Please explain the discrepancy (e.g. was the epidermis removed during the sample preparation?). It will also be helpful to quantify these cell populations and show in terms of pie or column charts to demonstrate 'modest' increase as stated by the authors. Similarly, on page 11, it would be helpful to quantify the 'global' changes in fibroblast population and illustrate them as a figure with statistics to demonstrate significant increase in reticular dermis population but not DP and DS fibroblasts. For instance, Fisher's exact test to show increase in proportion of cell population may suffice.
We have added these data as suggested. We have added a column chart to show the different proportions of cells in Figure 1D. We did not remove the epidermis during preparation of the samples.
The difficulty with seeing the proportions of each cell type clearly in Figure 1C is because there are more SSc total cells (from more patients n=12) than control cells (n=10), and the blue "SSc" dots tend to cover up the pink "control" dots in 1C. We had previously shown the differential cell proportions in the fibroblast clusters ( Figure 3B) We have added statistical analysis (Chi square and statistical comparisons between cell clusters).
2. CCL19 subpopulation in cluster 0 in figure 2C seems significant but the authors do not address this cell type/state in much detail.
Yes, we agree this cluster is potentially important, in part because it shows changes are occurring in multiple fibroblast populations. In the new dot plot Figure 3B Figure 4B, immuno-staining images are not convincing. The DAPI, which stains the nucleus, seems elongated and blurred. The zoomed image could also be selective and not representative of the whole tissue. Please provide a wider view of the skin to show a greater context to the area of interest. For instance, including the barrier between the dermis and epidermis. As a control, it will also be helpful to show the stainings (especially the negative expression of SFRP4 and SFRP2) in healthy donors or in not so deep part of the dermis. 5. The authors suggest the directionality of the pseudo time ( Figure 5) based on a subjective notion that SSc samples are found in cluster 4. Showing directionality using tools such as RNAVelocity would objectively support the author's claim that cells are transitioning from cluster 1, 3, and 4. In the same token, Figure 5 only highlights genes that are up-regulated across the pseudo time. Showing different trajectory patterns, i.e. down regulation of genes that are associated with progenitor cells would also support the notion that the assumed directionality is correct.
As the reviewer suggests, we have added WIF1, which goes down in expression and cells differentiate from fibroblasts to myofibroblasts. We have shown previously that expression of this gene correlates negatively with the MRSS (Arthritis Rheumatol 67, 3004-3015; 2015). We have simplified this figure ( Figure 5) so that the change in pseudotime of a restricted number of marker genes is more easily seen, placing the previous figure in the supplemental figures ( Figure S11).The claim in differentiation from healthy fibroblasts to myofibroblasts in the pseudotime analysis is based on the knowledge that SSc patients start with normal skin and thus normal fibroblasts. There is significant body of data supporting this assertion. So, there is logically a movement from normal healthy to diseased fibroblast transition and transcriptomes. As suggested, we have also added a Velocyto analysis, supporting the direction of cell differentiation to myofibroblasts ( Figure S12).
6. A minor cluster (sub cluster 9) to be associated with proliferation is interesting but questionable. Single cell profiling platforms can easily produce artifacts or doublets with high expression of cell profiling markers. Since the numbers are low (39 in total), using flow cytometry methods to detect KI67 or other cell proliferative markers and myofibroblasts markers will be necessary. 7. How microarray and single cell gene sets were analyzed is completely missing in the methods section. Please clearly explain how the gene sets were selected and which statistics were applied to show enrichment of gene sets in bulk microarray data. Please define what authors mean by "signature".

Given the very low numbers of dividing cells and the low numbers of cells digestible from scleroderma skin (4000-8000 cells), it is not practical to expect we would be able to see these cells by flow
We describe our analysis of single cell datasets in the Methods section "Single cell RNA-sequencing". Some of the methods for the microarray analysis were described in the Figure legend but  We use "signature" rather than "cluster" for the cluster of genes from the hierarchical clustering used for the AddModuleScore analysis to help distinguish these from the clusters and subclusters referred to on the t-SNE and UMAP plots. I felt that using the word "signature" would help distinguish this analysis.
8. The approach in which genes were selected and analyzed for SCENIC analysis is unclear and text is also confusing. Please provide a clearer explanation of gene sets used and which background was set to derive statistical significance. Few words are repeated in multiple occasions in the last paragraph of the results sections. Editing is advised.
We apologize for not being clearer about our methods here. I have added the following to the METHODS section. "For the SCENIC analyses, we used only cells from V2 chemistries, 4 control and 9 SSc samples. To begin clusters 1, 2, 3, and 4 were subsetted from the fibroblast dataset and all genes showing expression in at least one cell were analyzed. A second analysis was then carried out subsetting clusters 3 and 4, with cluster 4 further subsetted to delineate the SFRP4+ myofibroblast group ( Figure S13A). Again, all genes showing expression in at least one cell were included in the analysis. Finally, to focus on changes associated with SSc, a more restrictive gene list (984 genes) was compiled of: 1

) genes increased in SFRP2+ SSc cells (in clusters 3 and 4) compared to control SFRP2+ cells (in clusters 3 and 4, Bonferroni corrected Wilcoxon p<0.05); and 2) genes increased in SFRP2+SFRP4+ myofibroblasts compared to SSc SFRP2+SFRP4-cells (in cluster 3 and 4, Bonferroni corrected Wilcoxon p<0.05)." We have edited the paragraph as suggested
9. In some analysis, UMAP clusters are used (e.g. MONOCLE) and in another analysis, TSNE clusters are used (e.g. SCENIC). Please use single definition of clusters across all analysis. (Figure 2A) most highly expressing SFRP2. To provide more clarity, we have changed the labeling of these clusters so that they are consistently labeled across the figures, using selectively expressed marker genes.

I understand this is confusing, but the clusters we analyze in MONOCLE (cluster 1, 3 and 4) and SCENIC (clusters 1-4), are the same clusters (by number) as in the UMAP plot. It just happens that these are the four (out of 10 clusters) in the UMAP
10. The claim that SFRP4 is a "robust immunohistochemical marker" for myofibroblasts seems far fetched based on single staining analysis. Demonstrating specificity across different tissue conditions (including healthy controls) and wider resolution to cover different areas of the skin will help to determine specificity and the robustness of the marker.

We several years ago we described SFRP4 staining in both superficial and deep dermis (J Invest Dermatol. 2008;128(4):871-81). In that manuscript we show the staining in the superficial and deep dermis in control and SSc skin.
There is a population of fibroblasts in the papillary dermis that also stain SFRP4 in normal skin as seen in that paper and also in our scRNA-seq data. So, we agree robust is too strong a term and rather the transcriptome, representing the multiple myofibroblast genes are the robust markers I have removed this descriptor from the text.
11. The claim that myofibroblasts differentiate in a "two-step" process from SFRP4/DPP4 expressing normal fibroblasts progenitors is also under substantiated. Rudimentary analysis based on SCENIC with arbitrarily selected genesets to infer TF regulons hardly proves that these progenitors are undergoing two distinct steps towards myofibroblasts. Either lineage trancing and/or ChIP seq analysis to demonstrate biding of regulations will strong support the claim made by the authors.
The paradigm for two-step differentiation is based on the alterations in gene expression rather than the SCENIC data. The dot plot added to Figure 3A shows this more clearly. Lineage tracing and chip-seq are not possible using primary human fibroblasts, which we know from other work change their phenotype within a couple days after placing in tissue culture. The advantage of these scRNA-seq experiments is that they capture the transcriptome of the cells within a couple hours of their excision from skin biopsies.
12. Surprisingly, most of CD45 immune cells are not detected. If they were removed during cell preparation, the authors should clearly state how they were removed in the methods section. If not removed, the data shows under-representation in the skin. We expect SSc patients to harbor larger number of CD45+ leukocytes and they will play a critical role in skin microenvironment and influence gene expression -how they contribute to the overall reactivity of myofibroblasts is missing in this manuscript.

I'm not sure I understand why the reviewer states that most of the CD45 immune cells are not detected. We show these cells in clusters 7 (NK/T) and 9 (macrophages) in Figure 1. CD45 immune cells actually represent a relatively small proportion of the cells in SSc skin compared to many other pathologies, such as psoriasis or atopic dermatitis. Because of the complexity of the fibroblast analysis and of the myeloid cell populations in skin, we have prepared a separate manuscript describing myeloid populations in SSc skin. This is too large a topic to include in this manuscript.
Minor points: 1. The top section on page 12 seems redundant to the bottom section of page 11. Top section on page 12 also seems to skip figure 3. Similarly, Section: "Myofibroblasts show a discrete transcriptome" on page 13 is redundant. Similar claims were already made in page 12. Although there is some overlap between these analyses, the paragraph entitled "Myofibroblasts show a discrete transcriptome" focuses on the transcriptome of myofibroblasts, a critical outcome of the manuscript.
2. All Supplementary figures are missing. I was not able to access them. I'm not sure why these were not available. They will be again submitted with additions as noted above. The lack of availability of these Figures doubtless made it hard to follow parts of the manuscript.
3. There are too many UMAPS to juggle back and fourth to follow the results from the manuscript. Showing a single UMAP with SFRP2, PRSS23, SFRP4 and WIF1 expression will greatly help to understand the flow and intent of the authors. And move other parts into supplementary.

Figure 7c, show figure labels
This has been added 5. Please highlight UMAP with which V1 and V2 kits was used (in the supplementary). We have added this as Figure S5B.

What is a "link list" in page 25. Rank list?
The link list is the output of GENIE3 that tells us each gene's potential regulators and an associated weight based on the input data set. It lets us know, through the weight, how relevant a transcription factor/regulator is in relation to its target. The file has 3 columns with TF (transcription factor), Target (for the target gene), and weight. We create the transcription factor modules from this list.
The authors have not been dramatically responsive but I consider the work sufficiently novel and impactful to be published in Nat Comm.
Reviewer #2 (Remarks to the Author): 1. The authors addressed questions that were raised, however, the reviewer is still concerned regarding the statistics used to predict transcription factors in the SCENIC package and drawing general conclusions based on this.
1. Drop outs are a common problem of single cell RNA-seq thus how the data is filtered will dramatically influence the outcome of the analysis. As the authors noted, identification of specific motif can only be found when the number of input genes are reduced. In such case how do you adjust the background for statistical analysis, ie. is the same background applied for transcriptome and restricted set of genes? Please define background for each comparison and provide explanation how the enrichment is calculated in respect to the background. It should be noted that, while DoRothEA and D-AUCell rely on independent regulons, the SCENIC networks are constructed from the same dataset they are applied to. This poses the risk of overfitting. Please consider other tools for confirmation.
2. All genes showing expression in at least one cell seems very permissive as SCENIC and other regulatory analysis tools are easily influenced by the gene input. Please try with more robust cutoff and show significance of this analysis. Furthermore, the number of cells in each sub cluster are dramatically different. Unequal variance of cell number will influence the statistics to derive meaningful motifs. Please try using similar cell numbers (e.g. downsampling) and see if the conclusions are the same.
3. Why does SMAD3 only come up only when restricted genes are used as input? Again, this may be easily influenced by the input number and how the background is defined. To ensure SMAD3 is not an artifact, functional validation in culture fibroblasts or ChIP-seq validation (e.g. Cut & Tag) from tissue samples would substantially prove this.
2. The immune stating is still not convincing. The antibodies used in past studies differ from the one that is presented in this manuscript. Therefore a proper control is necessary.
In this paper: 3. There is a big discrepancy between the single cell RNA-seq data where SFRP4 is a small percentage of SFRP2, the immune-staining seem to suggest nearly 100% overlap between the two markers. This either indicates problems with the staining background or single cell data is not representative of cell population. Please clarify.
4. Figure 8 panel E and F should be swapped.

REVIEWER COMMENTS
Reviewer #1 (Remarks to the Author): The authors have not been dramatically responsive but I consider the work sufficiently novel and impactful to be published in Nat Comm.
We thank the reviewer for agreeing that the manuscript is novel and impactful.
Reviewer #2 (Remarks to the Author): 1. The authors addressed questions that were raised, however, the reviewer is still concerned regarding the statistics used to predict transcription factors in the SCENIC package and drawing general conclusions based on this.
1. Drop outs are a common problem of single cell RNA-seq thus how the data is filtered will dramatically influence the outcome of the analysis. As the authors noted, identification of specific motif can only be found when the number of input genes are reduced. In such case how do you adjust the background for statistical analysis, ie. is the same background applied for transcriptome and restricted set of genes?
Response: Our the initial SCENIC analyses was carried out using all genes expressed by at least one cell (so these aren't filtered, since we are only removing undetected genes). From our previous response, "For the SCENIC analyses, we used only cells from V2 chemistries, 4 control and 9 SSc samples. To begin clusters 1, 2, 3, and 4 were subsetted from the fibroblast dataset and all genes showing expression in at least one cell were analyzed. A second analysis was then carried out subsetting clusters 3 and 4, with cluster 4 further subsetted to delineate the SFRP4+ myofibroblast group ( Figure S13A). Again, all genes showing expression in at least one cell were included in the analysis. Finally, to focus on changes associated with SSc, a more restrictive gene list (984 genes)was compiled of: 1

) genes increased in SFRP2+ SSc cells (in clusters 3 and 4) compared to control SFRP2+ cells (in clusters 3 and 4, Bonferroni corrected Wilcoxon p<0.05); and 2) genes increased in SFRP2+SFRP4+ myofibroblasts compared to SSc SFRP2+SFRP4-cells (in cluster 3 and 4, Bonferronic corrected Wilcoxon p<0.05)." The validity of the SCENIC analysIs is addressed further below, applying DoRothEA in a parallel analysis.
Please define background for each comparison and provide explanation how the enrichment is calculated in respect to the background.
In SCENIC workflow that we used to predict regulon (matching a TF to co-expressed regulated genes) activity, the input genes are first filtered by their availability in RicisTarget database, and then individual regulons are constructed from scRNA-seq data with GENIE3. In other words, the regulons are refined via RicisTarget by keeping genes that contain the respective TF binding motifs from the current analyzed dataset. Subsequently, statistical method AUCell is applied to score individual cells by assessing for each TF separately to define if the co-expressed genes are enriched in the top quantile of the cell signature. Since the AUCell scoring method is ranking-based, AUCell is independent of the gene expression units and the normalization procedure.
It should be noted that, while DoRothEA and D-AUCell rely on independent regulons, the SCENIC networks are constructed from the same dataset they are applied to. This poses the risk of overfitting. Please consider other tools for confirmation.
We thank the reviewer for the suggestion, and we have addressed this as follows: Since SCENIC networks are constructed from the same scRNA-seq dataset they applied to, this poses the risk of overfitting. Thus, we have performed DoRothEA 1, 2 , which relies on independent TF-targeted gene interactions (regulon activity) curated from various resources, such as literatures, ChIP-seq peaks, TF binding motifs and gene expression inferred interactions, to compare its results with SCENIC. Based on the number of supporting evidences, an interaction confidence level in calculated, ranging from A to E, with A being the most confidence interactions and E the least. VIPER is a statistical method that used in combination with DoRothEA to estimate TF activities from scRNA-seq expression data.
This approach emphasizes the potential importance of some of the TFs seen previously in our SCENIC analysis, prominently, HIF1A, STAT1 and FOSL2, as well as SMAD3. We have added this analysis as Figure  S17, and as mentioned below also on the basis of this and the other additional rigor added in response to the comments below expanded the discussion on these TFs.

1.
Garcia Response: To set a more robust cutoff, we adjusted the filtering to keep genes 1) with at least 6 UMI counts across all cells, and 2) detected in at least 1% of cells. We added figure S14 with SCENIC analyzing these filtered genes matched in RicisTarget database.
Furthermore, the number of cells in each sub cluster are dramatically different. Unequal variance of cell number will influence the statistics to derive meaningful motifs. Please try using similar cell numbers (e.g. downsampling) and see if the conclusions are the same.  Figure S15: Three independent sampling (73 cells for all three clusters) were preformed followed with the same filtering set above for running SCENIC workflow. Regulons up in SFRP4+ showed before, such as FOXP1, IRF7 and RUNX2, were showed up in 2 out of 3 downsampling, while other regulons like STAT1 that showed up twice here wasn't observed before without downsampling. We have added these results as Figure S15.

Response: We agree that the number of cells in subclusters
Finally, we would like to emphasize the results in Figure 8E, showing that several TFs predicted to be important through the regulon analyses, are upregulated at the gene expression level. While we realize increased gene expression does not necessarily mean more protein or enhanced TF activity, the combined data showing enhanced predicted motif activation along with together are supportive of the importance of these highlighted TFs in regulating myofibroblast differentiation.
3. Why does SMAD3 only come up only when restricted genes are used as input? Again, this may be easily influenced by the input number and how the background is defined. To ensure SMAD3 is not an artifact, functional validation in culture fibroblasts or ChIP-seq validation (e.g. Cut & Tag) from tissue samples would substantially prove this. 2. The immune stating is still not convincing. The antibodies used in past studies differ from the one that is presented in this manuscript. Therefore a proper control is necessary.  Figure S10C) showing the SFRP4 staining in the deep reticular dermis in SSc skin and not in normal skin, using the same antibody used in Figure 4B. Unfortunately, the antibody previously provided by Dr. R. Friis is no longer available (he retired several years ago and the antibody has been lost).

This question and comment appear less pertinent in view of the reanalysis using
3. There is a big discrepancy between the single cell RNA-seq data where SFRP4 is a small percentage of SFRP2, the immune-staining seem to suggest nearly 100% overlap between the two markers. This either indicates problems with the staining background or single cell data is not representative of cell population. Please clarify.
Response: This is clarified in the supplemental figure, mentioned in response to comment 2 ( Figure S10C). In Figure 4B we show the overlapping staining between SFRP2, SFPR4 and SMA (smooth muscle actin). The co-staining with these three markers is a key observation in the manuscript, since the other points ( -seq data and subsequently we were able to confirm not only that SFRP4 and SMA cells are the same, but also that SFRP4 (and SMA) staining cells also stain with SFRP2, strongly pointing to their cellular origin. However, most SFRP2 staining cells do not stain with SMA or SFRP4, and SFRP2 fibroblasts can easily be seen to stain throughout the dermis in normal skin.
To this end, we here respond to this critique using the same approach used in a recent Nature paper (1), in which RCisTarget (R implemetation of CisTarget, the same database used by SCENIC) was used to identify putative important TF, but then DoRothEA regulon genes were used to build an AP-1 module for probing the role of the TF in the dataset. As this manuscript is on a similar topic in renal disease, was submitted after ours, and currently published, it is concurrent and could therefore serve as a reasonable benchmark for the use of these bioinformatics methodologies. I have reproduced the methods section of this manuscript following, underlining the most relevant part to our further analysis of our SCENIC results.
"To obtain transcription factor scores in distal and proximal regions, we used the top 200 marker genes for fibroblast, pericyte and myofibroblast cell clusters as input gene lists to RCisTarget 77 . We followed the RCisTarget Vignette to perform the analysis with default parameters (available at https://bioconductor.org/packages/release/bioc/vignettes/RcisTarget/inst/doc/RcisTarget.html). To quantify AP-1 expression, we used all Jun and Fos genes as a geneset and applied the same method to obtain an AP-1 score as we did for ECM score. To quantify AP-1 activity (defined as the expression of putative target genes 78,79 , we defined AP-1 target genes according to the Dorothea regulon database 63,80 and applied the same method as ECM score to obtain a single cell AP-1 activity score" Thus, as an alternative approach to further assess the SMAD3 targets, and the same approach used by the recently published Nature paper on myofibroblasts in renal fibrosis (1), we have now examined expression of SMAD3 target genes in our single cell data, using the DoRothEA regulon database. Namely, we created a SMAD3 activity score using the 33, DoRothEA level A, SMAD3 target genes and applied this to our fibroblast data using the Seurat AddModuleScore, we plotted these activity scores on scRNA-seq UMAP feature plots. This plot supports a role for Smad3 in regulation of the transcriptomes of the SSc SFRP2+ cells as they mature into myofibroblasts. We have added this observation as supplemental Figure S19.
In addition, we have carried out RNA-seq experiments to test genes in myofibroblasts inhibited by SMAD3 siRNA compared to non-targeting siRNA and HRPT1 (control) treated siRNA (Table S10), and selected genes based on two criteria to create SMAD3 activity scores. Namely, we derived SMAD3 activity scores from: 1) SMAD3 siRNA treated myofibroblasts, filtered for absolute gene expression of non-targeting control siRNA >50 TPM, showing expression of SMAD3 siRNA treated cells of less than 0.8 of nontargeting control siRNA, and excluding genes showing expression less than 0.7 in HRPT1 (control) siRNA treated cells compared to non-targeting control siRNA (415 genes); or 2) SMAD3 siRNA treated myofibroblasts, filtered for absolute gene expression in non-targeting control treated siRNA cells of >100 TPM, showing expression of SMAD3 siRNA treated cells of less than 0.7 of non-targeting control siRNA, and excluding genes showing expression less than 0.7 in HRPT1 (control) siRNA treated cells compared to non-targeting control siRNA (74 genes). Using the Seurat AddModuleScore, we plotted these activity scores on scRNA-seq UMAP feature plots, adding these as Figure S20. Both of these data-driven SMAD3 activity scores indicate increased SMAD3 activity in SSc SFRP2+ fibroblasts and SScSFPR2+SFRP4+ myofibroblasts even more clearly than the activity score derived from DoRothEA.