Unraveling a tumor type-specific regulatory core underlying E2F1-mediated epithelial-mesenchymal transition to predict receptor protein signatures

Cancer is a disease of subverted regulatory pathways. In this paper, we reconstruct the regulatory network around E2F, a family of transcription factors whose deregulation has been associated to cancer progression, chemoresistance, invasiveness, and metastasis. We integrate gene expression profiles of cancer cell lines from two E2F1-driven highly aggressive bladder and breast tumors, and use network analysis methods to identify the tumor type-specific core of the network. By combining logic-based network modeling, in vitro experimentation, and gene expression profiles from patient cohorts displaying tumor aggressiveness, we identify and experimentally validate distinctive, tumor type-specific signatures of receptor proteins associated to epithelial–mesenchymal transition in bladder and breast cancer. Our integrative network-based methodology, exemplified in the case of E2F1-induced aggressive tumors, has the potential to support the design of cohort- as well as tumor type-specific treatments and ultimately, to fight metastasis and therapy resistance.

The authors generated a very comprehensive map of E2F1 molecular interactions and using this tool they identified receptor signatures that are related to EMT, tumor invasion and aggressiveness. The interactive map that the authors constructed is a very impressive and useful tool that will be of great use to researchers studying the molecular biology of cancer and in particular the RB/E2F pathway, which is a pivotal pathway in tumorigenesis. This is a novel tool. As the authors themselves indicate, a previous study has generated an E2F1 interaction map a few years ago, however, the current tool is much more advanced and updated and includes additional layers (for example non-coding RNAs). Undoubtedly, this important tool will be of great help to the scientific community. The authors have tested experimentally some of the network's tumor -specific predictions and it is very encouraging and reassuring to see that the experimental data fully support the network's predictions with respect to the involvement of specific proteins in tumor invasion. The authors analyze patients' data and show that differences in the expression levels of the relevant genes (such as E2F1, TGFBR and FGFR1; E2F1, TGFBR and EG FR) affect patient survival. Overall, this is a very important study and its conclusions are novel and very interesting. Importantly, the conclusions are of clinical significance since the application of the model based signatures for patients' classification can affect their anti-cancer treatments.
A number of concerns that should be addressed prior to publication: 1) While the network map is a very important and useful tool it should be improved to make it more user-friendly. A couple of suggestions: one issue is that it is very difficult to follow the lines connecting the nodes and to understand from the map what each line/arrow means. It will be most useful if one could just click on each line and get the two nodes it connects. Also, when one clicks on a node (a protein for example) one often gets a reference but unfortunately this is usually just a reference for that protein and not the paper(s) that explain how this node (protein) is connected/related to E2F1. For some of the nodes there is no refer ence at all. This should be added.
2) In figure 7 some of the differences are not huge and they are based on a small number of patients. It is important to demonstrate the statistical significance of the data and include the statistical analysis in the paper.
A more minor concern: At the end of the result section the authors say "Our analysis reveals that an invasive tumor phenotype in bladder cancer is mainly driven by E2F1, TGFBR and FGFR1, while in case of breast cancer it is driven by E2F1, TGFBR and EGFR." C learly these proteins are very important players in the tumor specific invasion process but saying that the invasive process "is mainly driven " by them might be an over-interpretation of the data and the authors should rephrase this sentence.

Reviewer #2 (Remarks to the Author):
In this manuscript the authors have used varied datasets to generate a comprehensive examination of the role of E2F1 in cellular functions. This is nicely complemented by in vitro experiments to test some of the hypotheses developed after analysis of the modeling. This work clearly illustrates how E2F1 regulates tumor progression and metastasis through a wide variety of functions. On the whole, this is an important approach to show the diversity of E2F1 regulated (and regulating) activities as approached to the traditional view of E2F1 being a cell cycle regulator.

Major Points
The applicability of the method to other networks is unclear given the fact that the E2Fs are involved in such diverse functions, partially as a result of the number of transcriptional targets that they recognize. To validate the method, identification of another network, potentially involved in EMT to complement the manuscript, should be identified. This should be done for a gene with a well-defined role in a small number of functions.
Perturbation of the networks identified in Figure 3 by removal (knockout / knockdown) of a gene, followed by alterations to the network should be included as a validation. The data with knockout is available for many of the genes in GeoDatasets and the authors could present how the networks have changed. Alternatively, the authors could C RISPR out one of the central genes in the network and measure the both the transcriptional and the EMT / MET response.

Minor Points
In Figure 1, I wanted to explore in detail but the link was not active. I had to manually type the link to have it work. Likely just an editing issue since the line number also appeared in the broken link. In addition, is there a long term hosting option available -perhaps on the journal website instead of at the Rostock university site? Too many websites are not maintained following publication and this resource should be. Figure 2a -this should be expanded to numerous additional cell lines to determine if E2F1 expression is commonly observed in the EMT-like lines. One of each, as shown, is interesting but not convincing. This could be combined with gene expression data from the C C LE.
The legend for Figure 2 should be more clear -it wasn't until I read the methods that I found that the authors had used an ER responsive adenovirus.
The findings in Figure 6/7 should be confirmed in a separate (larger) dataset. This confirmation would be appropriate for supplemental data section.
The logic based EMT prediction (Figure 7) should be confirmed. This could be done using the TC GA data where matched histology is available to go with the gene expression data. The EMT nature should be confirmed by a pathologist.
C hromosomal stability studies out of the C hellappin lab could also be tied to Figure 1, potentially in results or discussion.
Grammar should be carefully checked throughout the manuscript as phrasing was awkward in many sentences.

Reviewer #3 (Remarks to the Author):
This manuscript is extremely confusing, for I cannot tell from my reading how the network is constructed and computed. The definition and use of feedback loops requires ascertaining directionalities among all the nodes putatively involved, and there is insufficient inf ormation provided to offer convincing evidence that these directionalities have in fact been determined firmly. Moreover, the nodes are described to comprise mRNA, miRNA, proteins, and phosphoproteins, yet the relationships among these --including time-scales of influences --are not compellingly resolved. Accordingly, the results and conclusions proffered are difficult to accept.

Reviewer #4 (Remarks to the Author):
The work by Khan et al. addresses an important line of research involving the extraction from biological computational models clinically relevant observations. In this specific case, the authors study a system involving the E2F family of Transcription Factors, whose deregulation plays an important role in cancer. In particular, they build an interaction network from literature and different databases, which they later convert to a logic -based model. These models are tailored towards bladder and breast cancer with help from gene expression data. From these models, one signature of receptor proteins is identified for each cancer. This signatures affect invasion assay results on bladder and breast cell lines as predicted. They also show power to discriminate between good and bad progression-free survival outcome of bladder and breast cancer patients in data from two published studies.
However, we could not find a particularly important result neither in terms of the biology of the cancers under study nor their therapeutic opportunities. Methodologically, the paper uses existing methods or small variations of them. In general, the methods chosen along the way have not been compared to existing ones (not even elaborated the reasons to using these over others).
In addition, we have several specific questions and concerns outlined below.
Major comments: 1 -It is not clear how the boolean rules have been decided. In page 10, line 212 -214, it is stated that "We derived Boolean functions (…) based on stoichiometric information (…)". I missed a more in depth explanation of how this is actually achieved. 1

Reviewer #1 (Remarks to the Author):
The authors generated a very comprehensive map of E2F1 molecular interactions and using this tool they identified receptor signatures that are related to EMT, tumor invasion and aggressiveness.
The interactive map that the authors constructed is a very impressive and useful tool that will be of great use to researchers studying the molecular biology of cancer and in particular the RB/E2F pathway, which is a pivotal pathway in tumorigenesis. This is a novel tool. As the authors themselves indicate, a previous study has generated an E2F1 interaction map a few years ago, however, the current tool is much more advanced and updated and includes additional layers (for example non-coding RNAs). Undoubtedly, this important tool will be of great help to the scientific community.
The authors have tested experimentally some of the network's tumor-specific predictions and it is very encouraging and reassuring to see that the experimental data fully support the network's predictions with respect to the involvement of specific proteins in tumor invasion. The authors analyze patients' data and show that differences in the expression levels of the relevant genes (such as E2F1, TGFBR and FGFR1; E2F1, TGFBR and EGFR) affect patient survival.
Overall, this is a very important study and its conclusions are novel and very interesting. Importantly, the conclusions are of clinical significance since the application of the model based signatures for patients' classification can affect their anti-cancer treatments.
A number of concerns that should be addressed prior to publication: 1) While the network map is a very important and useful tool it should be improved to make it more user-friendly. A couple of suggestions: one issue is that it is very difficult to follow the lines connecting the nodes and to understand from the map what each line/arrow means. It will be most useful if one could just click on each line and get the two nodes it connects. Also, when one clicks on a node (a protein for example) one often gets a reference but unfortunately this is usually just a reference for that protein and not the paper(s) that explain how this node (protein) is connected/related to E2F1. For some of the nodes there is no reference at all. This should be added.
Reply: Thank you for your suggestions and comments. Initially, we used an automatic edge routing layout in CellDesigner software where interactions are placed in such a way to not to cross nodes. However, for large networks, interactions are represented by long links which are very difficult to navigate and hard to follow. To enhance visibility and ease to navigate the interactions, we manually changed the layout so that interactions are represented by a direct link between nodes, and made them transparent where they pass through a node. Moreover, to reduce the load of links on the map layout, we have now grouped all E2F1 regulated genes from functional and regulatory modules and placed them close to E2F1 and coloured them according to their modules. We have also rearranged nodes in various regulatory and functional modules to improve the readability. We have included references to all reactions present in the map. If the user will now click on the reaction checkbox in the right panel and select any of the reactions in the map, information about the reaction type (e.g. state transition, positive/negative influence), reactant(s), product(s) and modifier(s) will be shown in a pop-up window. We also provide 2 CellDesigner compatible file for further use. The latest version of the map is now accessible through https://navicell.curie.fr/pages/maps_e2f1.html.
2) In figure 7 some of the differences are not huge and they are based on a small number of patients. It is important to demonstrate the statistical significance of the data and include the statistical analysis in the paper.
Reply: We agree with the reviewer about the limitations of original Figure 7 due to the small number of patients. To address this, we now validated our findings in larger patient cohorts of bladder and breast cancer taken from TCGA and included them as new Figure 8 in the manuscript. More precisely, we selected two subgroups of patients in bladder cancer where the individual gene expression of E2F1, TGFBR1 and FGFR1 was above (signature group) or below (signature* group) the mean expression values, respectively. Using Pearson's chi-squared test, we found that our two signatures are able to distinguish patients into early vs advanced stages in bladder cancer and aggressive vs less-aggressive stages in breast cancer significantly (p-value < 0.005).
At the end of the result section the authors say "Our analysis reveals that an invasive tumor phenotype in bladder cancer is mainly driven by E2F1, TGFBR and FGFR1, while in case of breast cancer it is driven by E2F1, TGFBR and EGFR." Clearly these proteins are very important players in the tumor specific invasion process but saying that the invasive process "is mainly driven" by them might be an overinterpretation of the data and the authors should rephrase this sentence.
Reply: This sentence has been rephrased.

Reviewer #2 (Remarks to the Author):
In this manuscript the authors have used varied datasets to generate a comprehensive examination of the role of E2F1 in cellular functions. This is nicely complemented by in vitro experiments to test some of the hypotheses developed after analysis of the modeling. This work clearly illustrates how E2F1 regulates tumor progression and metastasis through a wide variety of functions. On the whole, this is an important approach to show the diversity of E2F1 regulated (and regulating) activities as approached to the traditional view of E2F1 being a cell cycle regulator.

Major Points
The applicability of the method to other networks is unclear given the fact that the E2Fs are involved in such diverse functions, partially as a result of the number of transcriptional targets that they recognize. Since there were no homogeneous data available with respect to our cell models in GEODatasets for these genes, we performed perturbation experiments by knockdown of these genes (single and double knockdown) followed by determining transcription and invasiveness. Our results are shown as new Figure 6 in the revised manuscript.

Minor Points
In Figure 1, I wanted to explore in detail but the link was not active. I had to manually type the link to have it work. Likely just an editing issue since the line number also appeared in the broken link. In addition, is there a long term hosting option available -perhaps on the journal website instead of at the Rostock university site? Too many websites are not maintained following publication and this resource should be.
Reply: We apologize for the broken link to access our map. The mishap was the result of the conversion of our manuscript into PDF. Meanwhile, an interactive version of our map is now submitted to NaviCell repository and can be accessed from https://navicell.curie.fr/pages/maps_e2f1.html. Besides, xml files of the map are also available as Supplementary materials. Finally, the E2F1 map, the related data and most of the source code for the analysis made in the paper can also be downloaded from https://sourceforge.net/projects/e2f1map.  Figure S5). These results clearly underscore that high E2F1 expression is commonly observed in EMT-like cell lines.
The legend for Figure 2 should be more clear -it wasn't until I read the methods that I found that the authors had used an ER responsive adenovirus.
Reply: We updated the legend for Figure 2.
The findings in Figure 6/7 should be confirmed in a separate (larger) dataset. This confirmation would be appropriate for supplemental data section.
The logic based EMT prediction (Figure 7) should be confirmed. This could be done using the TCGA data where matched histology is available to go with the gene expression data. The EMT nature should be confirmed by a pathologist.
Reply: We have validated our findings in larger patient cohorts of TCGA bladder and breast cancer and included statistical analyses as shown in the revised Figure 8. We used histologic data available from TCGA to classify bladder cancer patients into early and advanced stages. For breast cancer, we found that PAM50 staging correlates well with patient survival and was used by us to distinguish between aggressive and less-aggressive types. Furthermore, the EMT nature of the patients' subgroups was confirmed using expression values of known EMT markers (CDH1, miR-205, CDH2, VIM, SNAI1, SNAI2, TWIST1 and ZEB1) in both cancer types (Supplementary Figure S7). We found that the molecular signatures predicted by our workflow are able to classify patients from the TCGA cohorts into early vs advanced stages in bladder and aggressive vs less-aggressive stages in breast cancer.
Chromosomal stability studies out of the Chellappin lab could also be tied to Figure 1 Commun.) is not connected to E2F and therefore not included.
Grammar should be carefully checked throughout the manuscript as phrasing was awkward in many sentences.
Reply: We have grammatically and linguistically revised the whole manuscript.

Reviewer #3 (Remarks to the Author):
This manuscript is extremely confusing, for I cannot tell from my reading how the network is constructed and computed. The definition and use of feedback loops requires ascertaining directionalities among all the nodes putatively involved, and there is insufficient information provided to offer convincing evidence that these directionalities have in fact been determined firmly.
Reply: The distinctive feature of our map is that the interactions included have been manually curated by experts on cancer biology, while most of the previously published comprehensive networks of similar scale are automatically generated. In line with the reviewer's comment, we have also carefully manually checked the given references to provide directionalities of the interactions. Further, and in order to assess the quality control, we randomly selected ~10% of total interactions and asked two independent domain experts to crosscheck them based on the literature references provided. Over 98% of the interactions were correctly derived from the literature.
Moreover, the nodes are described to comprise mRNA, miRNA, proteins, and phosphoproteins, yet the relationships among these --including time-scales of influences --are not compellingly resolved.
Accordingly, the results and conclusions proffered are difficult to accept.  Table 1. Our results indicate that including time-influence will not change the model output.

Reviewer #4 (Remarks to the Author):
The work by Khan et al. addresses an important line of research involving the extraction from biological computational models clinically relevant observations. In this specific case, the authors study a system involving the E2F family of Transcription Factors, whose deregulation plays an important role in cancer. In particular, they build an interaction network from literature and different databases, which they later convert to a logic-based model. These models are tailored towards bladder and breast cancer with help from gene expression data. From these models, one signature of receptor proteins is identified for each cancer. This signatures affect invasion assay results on bladder and breast cell lines as predicted. They also show power to discriminate between good and bad progression-free survival outcome of bladder and breast cancer patients in data from two published studies.
However, we could not find a particularly important result neither in terms of the biology of the cancers under study nor their therapeutic opportunities. Methodologically, the paper uses existing methods or small variations of them. In general, the methods chosen along the way have not been compared to existing ones (not even elaborated the reasons to using these over others).
Reply: There are several motivations why we think our results are interesting: 1. Our map reflects the diverse crosstalk of the E2F family members with several important signaling pathways associated with cancer progression. The distinctive feature of our map is that the interactions included have been manually curated by experts on cancer biology. In contrast, most of the previously published comprehensive interaction networks with similar scale were automatically generated. In line with this, when we assessed the quality on the network by randomly selecting ~10% of total interactions and asking two independent experts to crosscheck them based on the literature provided, over 98% of the interactions were correctly derived from the literature. We believe that this is hardly achieved by automatically generated maps currently developed.
2. In this manuscript, we especially focused on the newly discovered role of E2F1 in malignant progression and found out how E2F1 co-operates tumor type-specifically to drive invasion and metastasis. In fact, our map has a great potential for dissemination and re-use by the community to investigate other processes related to E2Fs such as chemoresistance or angiogenesis. Further, we here used the map to investigate E2F1-associated malignant progression in two tumor entities, but the workflow proposed can be applied to other tumor entities in which E2F1 can play a similar role, thereby increasing the potential for re-use by the community (comment introduced on page 21).
3. The distinctive feature of the methodology used is that it integrates coherent workflow tools coming from data analysis, bioinformatics and mathematical modelling. Precisely and to the best of our knowledge, we do not find in the literature a precedent of combining network-based high throughput 4. Our methodology includes an innovative element in terms of network reduction, namely the use of an algorithm employing multi-objective optimization concepts to rank and select key regulatory motifs, based on network topology features and expression profiles. To the best of our knowledge, this has not been explored before in the context of cancer.
In addition, we have several specific questions and concerns outlined below.
Major comments: 1 -It is not clear how the boolean rules have been decided. In page 10, line 212-214, it is stated that "We derived Boolean functions (…) based on stoichiometric information (…)". I missed a more in depth explanation of how this is actually achieved.
Reply: The Boolean rules were manually assigned. Precisely, upon the selection of the relevant network motifs, we established the Boolean rules based on the network structure and the inspection of the available literature about the interactions (comment included on page 26). We think this approach is consistent with the one followed to reconstruct the comprehensive regulatory map from which the Boolean models were derived. All Boolean rules are provided in Supplementary Excel file 2.
2 -How do random signatures obtained from the core network predict survival? This is necessary to assess the capability of the model to give a good signature.
Reply: In order to assess the predictive capability of our workflow to find potential molecular signatures, we generated 30 random signatures of three nodes from each of the regulatory cores. We arbitrarily assigned high or low expression states with respect to their mean expression value and identified their capability to distinguish patients into various clinical stages. For each signature (e.g. A high AND B low AND C high) and corresponding signature* (A low AND B high AND C low) set, we calculated the relative  (TGFBR1 and CXCR1   8 for bladder and HMMR, TGFBR2, IL1R1 and THRB for breast)? Also, why use expression fold change to select between TGFBR1 or TGFBR2 but not to select other receptors?
Reply: In order to capture all possible input signals to the regulatory network cores, we expanded the input layers by including additional receptors present in the comprehensive interaction network which are directly connected to nodes constituting the regulatory cores, irrespective of their expression profile.
Thus, we included TGFBR (connected to SMADs) and CXCR1 (connected to ZEB1 and SNAI1) in the bladder cancer model. Similarily, we expanded the breast cancer input layer with HMMR (connected to FN1), TGFBR (connected to SNAI1 and SNAI2) as well as IL1R1 and THRB (both connected to TRAF1 and MYC). As the TGFBR and CXCR families contain several members, we selected those with highest differential expression (ArrayExpress accession number: E-MTAB-2706).  In the revised version the authors have addressed all of the concerns I had regarding the previous version. As I have indicated previously, the interactive map that the authors constructed is a very impressive and useful tool that will be of great use to r esearchers studying the molecular biology of cancer and in particular the RB/E2F pathway, which is a pivotal pathway in tumorigenesis. Also, the experimental work performed by the authors fully supports the network's predictions with respect to the involvement of specific proteins in tumor progression and invasion.

Reviewer #2 (Remarks to the Author):
Accept

Reviewer #3 (Remarks to the Author):
The authors have provided responses to my previous criticisms. Unfortunately, I do not find these responses convincing because they are largely difficult to verify. For instance, with respect to the validity of the foundational network, to the text has now been added the following passage: "In order to assure the accuracy of the network, we randomly selected ~ 10% of the interactions and asked independent domain experts to cross -validate them. Over 98% of the interactions were derived correctly." This does not easily settle the kinds of concerns I raised, because the results could strongly depend on which 10% were chosen and who the domain experts were. What would be preferable for validation would be more system-wide experimental tests of model predictions, rather than the 'cherry-picking' approach shown in Figures 5 and 6. There are logic-based cell signaling model publications in the literature that do a more convincing job of model prediction testing, but interestingly the authors do not cite them. In the Discussion section the authors state that not many relevant logic modeling papers have been publi shed previously, and indeed cite very few. Perhaps they have missed some of this literature.

Reviewer #4 (Remarks to the Author):
We are thankful to the authors for their thorough response and review of the manuscript.
Theis version is significantly enhanced. The new manuscript, along with the response to us and other reviewers, clarifies the content, novelty, and significance of the work.
We are satisfied with the responses and we have no further requests.

Reply to Reviewer's comments
In the following, we address the comments from Reviewer #3. Comments are in black, while our replies appear in blue.
Question 1: the authors have provided responses to my previous criticisms. Unfortunately, I do not find these responses convincing because they are largely difficult to verify. For instance, with respect to the validity of the foundational network, to the text has now been added the following passage: "In order to assure the accuracy of the network, we randomly selected ~ 10% of the interactions and asked independent domain experts to cross-validate them. Over 98% of the interactions were derived correctly." This does not easily settle the kinds of concerns I raised, because the results could strongly depend on which 10% were chosen and who the domain experts were. What would be preferable for validation would be more system-wide experimental tests of model predictions, rather than the 'cherrypicking' approach shown in Figures 5 and 6. Reply 1: the quality control procedure used is analogous to the one followed in a recently published paper, in which it was constructed, annotated and curated a network accounting for activation of macrophages with a similar size in terms of nodes and edges and level of complexity (Journal of Immunology, 2017, PMID: 28137890). Here, 10% randomly selected interactions were assessed by three molecular oncologists from three different laboratories. Three reviewers have at least five years of experience each in molecular oncology. To our knowledge, a common accepted practice for quality control in many fields of science and technology relies on random selection of a significant sample and further independent assessment, as we did here.
We think that the content of Figures 5 and 6 is actually a common practice in molecular biology, in which a subset of relevant predictions are selected for further functional and molecular validation. This strategy has even been used before in the context of validating the predictions of Boolean networks (see for example PLOS Computational Biology, PMID: 17722974).

Question 2:
there are logic-based cell signaling model publications in the literature that do a more convincing job of model prediction testing, but interestingly the authors do not cite them. In the Discussion section the authors state that not many relevant logic modeling papers have been published previously, and indeed cite very few. Perhaps they have missed some of this literature.

Reply 2:
Unfortunately, some journals, including this one, have a strict limitation in the number of papers that can be cited. Thus, we cannot cite many papers that have made use of Boolean modelling for investigating intracellular regulatory networks. However, we have included a relevant recent publication on Boolean modelling and anticancer drugs (PMID: 28381545).