FOXQ1 recruits the MLL complex to activate transcription of EMT and promote breast cancer metastasis

Aberrant expression of the Forkhead box transcription factor, FOXQ1, is a prevalent mechanism of epithelial-mesenchymal transition (EMT) and metastasis in multiple carcinoma types. However, it remains unknown how FOXQ1 regulates gene expression. Here, we report that FOXQ1 initiates EMT by recruiting the MLL/KMT2 histone methyltransferase complex as a transcriptional coactivator. We first establish that FOXQ1 promoter recognition precedes MLL complex assembly and histone-3 lysine-4 trimethylation within the promoter regions of critical genes in the EMT program. Mechanistically, we identify that the Forkhead box in FOXQ1 functions as a transactivation domain directly binding the MLL core complex subunit RbBP5 without interrupting FOXQ1 DNA binding activity. Moreover, genetic disruption of the FOXQ1-RbBP5 interaction or pharmacologic targeting of KMT2/MLL recruitment inhibits FOXQ1-dependent gene expression, EMT, and in vivo tumor progression. Our study suggests that targeting the FOXQ1-MLL epigenetic axis could be a promising strategy to combat triple-negative breast cancer metastatic progression.


Software and code
Policy information about availability of computer code
Data collection
RNA sequencing data were collected on the Illumina HiSeq 2000 platform. ChIP-seq data were collected on the Illumina 400 platform. Flow data were collected with BD FACS Diva 4.0 software. Proteomics data were collected on an Orbitrap Fusion™ Tribrid mass spectrometer (Thermo), with Xcalibur used to operate the instrument.
nature portfolio | reporting summary

March 2021
Data analysis
ChIP-qPCR/q-RT-PCR: The data were analyzed in Microsoft Excel (Version 16.40) and Prism 8 (Version 8.4.3). P values were calculated by unpaired two-sided t-test. For more than two samples, each sample was compared with its respective control group and the P value was adjusted by Bonferroni correction.
ChIP-seq: The data were analyzed on Galaxy (https://usegalaxy.org/), an open-source web-based platform. Reads were mapped with Bowtie2 (Version 2.3.2.2) using the built-in Homo sapiens (b37): hg19 reference genome. ChIP-seq peaks were called from the alignment results for each biological replicate using MACS2 (Galaxy Version 2.1.1.20160309.0) relative to the input control sample. Peak detection was based on an FDR (q value) threshold of 0.001. The resulting bedGraph files were converted to bigWig files using 'Wig/BedGraph-to-bigWig converter' (Galaxy Version 1.1.1). Enrichment on chromosome and annotation (CEAS) was conducted on peak BED files using Galaxy/Cistrome (https://cistrome.org/ap) CEAS version 1.0.0. Motif analysis was conducted using peak summits submitted to MEME Suite (Version 5.4.4) at http://meme-suite.org/tools/meme-chip.
RNA-seq: Data were analyzed using R Studio (Version 1.2.5033) and the Bioconductor package (Version 3.1.0). Paired-end reads were mapped to the hg19 human genome using Bowtie2 v2.2.9. Abundance was estimated using RSEM, and differential expression analysis was done using edgeR v3.12.1 in the Bioconductor package.
Proteomics: Data analysis was performed first with Proteome Discoverer 1.4 (Thermo). Secondary analysis was performed using Scaffold 4.4.5 (Proteome Software).
Flow cytometry: Flow data were analyzed with FlowJo v10 software.
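The Bonferroni adjustment described for the ChIP-qPCR/q-RT-PCR comparisons can be illustrated with a minimal sketch. The analysis itself was run in Excel and Prism, so this is not the authors' code; the function name `bonferroni_adjust` and the raw P values are hypothetical and serve only to show the arithmetic: each raw P value is multiplied by the number of comparisons made against the control and capped at 1.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted P values.

    Each raw P value is multiplied by the number of comparisons
    and capped at 1.0, since a probability cannot exceed 1.
    """
    n_comparisons = len(p_values)
    return [min(1.0, p * n_comparisons) for p in p_values]


# Hypothetical raw P values from three treatment-versus-control t-tests
# (illustrative numbers only, not taken from the study).
raw_p = [0.004, 0.03, 0.2]
adjusted_p = bonferroni_adjust(raw_p)

for raw, adj in zip(raw_p, adjusted_p):
    print(f"raw P = {raw:.3f} -> adjusted P = {adj:.3f}")
```

With three comparisons, a raw P value must fall below 0.05 / 3 to remain below 0.05 after adjustment, which is why the correction is conservative when many groups share one control.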
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Portfolio guidelines for submitting code & software for further information.

Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
-Accession codes, unique identifiers, or web links for publicly available datasets
-A description of any restrictions on data availability
-For clinical datasets or third party data, please ensure that the statement adheres to our policy
The source data underlying Figs
Note that full information on the approval of the study protocol must also be provided in the manuscript.

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Sample size
For in vitro studies, sample sizes were n ≥ 3. Sample size was estimated from previous successful experience and chosen to be large enough to obtain reproducible results. For in vivo studies, our prior studies found an average of ~60 lung lesions per mouse in the MDA-MB-231 xenograft mouse model, with a standard deviation of 5. A sample size of 8 animals per group was selected and was determined to be sufficient to detect a difference of 1 standard deviation unit at 0.95, based on a balanced one-way analysis of variance power calculation. Differences of this magnitude represent a minimum threshold that would provide any biological meaning.
Data exclusions
No data were excluded from analyses.

Replication
All in vitro experiments were performed using at least 3 biological replicates to ensure reproducibility. For in vivo experiments, each finding was confirmed in an independent and different xenograft model.

Randomization
All mice were randomly assigned to the different experimental groups. For in vitro studies, all samples were analyzed equally with no subsampling. Therefore, there was no requirement for randomization.

Blinding
Investigators were generally not blinded as the experimental conditions required investigators to know the identity of the samples.
Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

Validation
All antibodies were validated by the manufacturer. In addition, we validated that all antibodies showed the expected phenotype for a given assay. For almost all antibodies, we validated loss of antibody detection of the protein following knockdown of protein levels. This was done by western blot analysis, FACS, or confocal microscopy. When we did not validate specificity by knockdown, as was the case for certain antibodies used for western blot analysis, we verified that the antibody yielded the expected molecular weight and banding pattern.

anti-FOXQ1
We validated it by western blot in different cell models with FOXQ1 knockdown and overexpression.

anti-RbBP5
We confirmed the RbBP5 band at ~75 kDa upon RbBP5 overexpression and knockdown by western blot.

anti-ASH2L
We confirmed the ASH2L band at ~70 kDa upon ASH2L overexpression and knockdown by western blot.

anti-WDR5
We validated the WDR5 band at ~35 kDa upon WDR5 overexpression and knockdown by western blot.

anti-H3K4me3
We validated this antibody's IP capability by using it in previously used cell lines, and qPCR was performed to confirm that the results matched previous results for a panel of genes.

anti-β-actin
We validated a single band at around 45 kDa in different cell lines by western blot.

anti-N-Cadherin
We observed a single band at the correct molecular weight by western blot.

anti-Vimentin
We observed a single band at the correct molecular weight by western blot.

anti-Fibronectin
We observed a clean band at the correct molecular weight by western blot.

anti-Claudin-1
We observed a single band at the correct molecular weight by western blot.

anti-Occludin
We observed a single band at the correct molecular weight by western blot.

anti-E-cadherin
We observed a single band at the correct molecular weight by western blot.

anti-α-catenin
We observed a single band at the correct molecular weight by western blot.

anti-β-catenin
We observed a single band at the correct molecular weight by western blot.

anti-γ-catenin
We observed a single band at the correct molecular weight by western blot.

anti-FLAG
We validated it by observing the correct molecular weight in western blot analysis for several FLAG-tagged proteins. We also tested the FLAG antibody by immunoprecipitating FLAG-tagged proteins and confirmed the result by western blot analysis.

anti-Myc
We validated it by observing the correct molecular weight in western blot analysis for several Myc-tagged proteins. We also tested the Myc antibody by immunoprecipitating Myc-tagged proteins and confirmed the result by western blot analysis.

anti-HA
We validated it by observing the correct molecular weight in western blot analysis for several HA-tagged proteins. We also tested the HA antibody by immunoprecipitating HA-tagged proteins and confirmed the result by western blot analysis.

anti-V5
We validated it by observing the correct molecular weight in western blot analysis for several V5-tagged proteins. We also tested the V5 antibody by immunoprecipitating V5-tagged proteins and confirmed the result by western blot analysis.

Anti-KMT2A/MLL1 Rabbit
We observed a clean band at the correct molecular weight by western blot.

Anti-KMT2B/MLL2 Rabbit
We observed a clean band at the correct molecular weight by western blot.

Anti-KMT2C/MLL3 Rabbit
We observed a clean band at the correct molecular weight by western blot.

Anti-KMT2D/MLL4 Rabbit
We observed a clean band at the correct molecular weight by western blot.

Anti-KMT2E/SET1A Rabbit
We observed a clean band at the correct molecular weight by western blot.

Anti-KMT2F/SET1B Rabbit
We observed a clean band at the correct molecular weight by western blot.

Authentication
Cells were authenticated by comparing them to the original morphological and growth characteristics and were verified using the GenomeLab short tandem repeat (STR) profiling (Beckman Coulter) with >90% match.
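
How a percent match between two STR profiles might be scored is sketched below. The summary does not state which matching algorithm was used, so this is an assumption: the sketch uses a query-based score (shared alleles divided by the alleles in the query profile, in the style of the Masters algorithm). The function name `percent_match` and the allele calls are hypothetical and do not represent any cell line in the study.

```python
def percent_match(query, reference):
    """Query-based STR percent match (assumed Masters-style scoring).

    Both arguments map a locus name to the alleles called at that locus.
    Only loci typed in both profiles are scored.
    """
    shared = 0
    query_total = 0
    for locus, query_alleles in query.items():
        if locus not in reference:
            continue  # skip loci missing from the reference profile
        query_set = set(query_alleles)
        shared += len(query_set & set(reference[locus]))
        query_total += len(query_set)
    if query_total == 0:
        raise ValueError("no loci in common between the two profiles")
    return 100.0 * shared / query_total


# Hypothetical allele calls, for illustration only.
query_profile = {"TH01": ["6", "9.3"], "D5S818": ["11", "12"], "TPOX": ["8", "11"]}
reference_profile = {"TH01": ["6", "9.3"], "D5S818": ["11", "12"], "TPOX": ["8"]}

score = percent_match(query_profile, reference_profile)
print(f"{score:.1f}% match")
```

Other scoring schemes divide by the combined allele count of both profiles, so the threshold applied to a score depends on which definition is in use.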

Mycoplasma contamination
All cell lines tested negative for mycoplasma by DAPI staining and immunofluorescence microscopy. Only mycoplasma-negative cells were used for research.

Commonly misidentified lines (See ICLAC register)
No cells from this database were used.

Animals and other research organisms
Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research, and Sex and Gender in Research

Laboratory animals
Female NSG mice (8-10 weeks) were purchased from JAX (Jackson Labs).

Wild animals
This study did not involve wild animals.

Reporting on sex
This study only used female mice because breast cancer is mainly a female disease.

Field-collected samples
This study did not involve samples collected in the field.