Genomic studies controvert the existence of the CUX1 p75 isoform

CUX1, encoding a homeodomain-containing transcription factor, is recurrently deleted or mutated in multiple tumor types. In myeloid neoplasms, CUX1 deletion or mutation carries a poor prognosis. We have previously established that CUX1 functions as a tumor suppressor in hematopoietic cells across multiple organisms. Others, however, have described oncogenic functions of CUX1 in solid tumors, often attributed to truncated CUX1 isoforms, p75 and p110, generated by an alternative transcriptional start site or post-translational cleavage, respectively. Given the clinical relevance, it is imperative to clarify these discrepant activities. Herein, we sought to determine the CUX1 isoforms expressed in hematopoietic cells and find that they express the full-length p200 isoform. Through the course of this analysis, we found no evidence of the p75 alternative transcript in any cell type examined. Using an array of orthogonal approaches, including biochemistry, proteomics, CRISPR/Cas9 genomic editing, and analysis of functional genomics datasets across a spectrum of normal and malignant tissue types, we found no data to support the existence of the CUX1 p75 isoform as previously described. Based on these results, prior studies of p75 require reevaluation, including the interpretation of oncogenic roles attributed to CUX1.

Protein isoforms and splice variants often have important and distinct biological functions. Since protein isoforms, by definition, have considerable sequence homology and may be expressed at different levels within the cell, it can be challenging to accurately differentiate between and functionally characterize such isoforms 1 . One such protein reported to have multiple isoforms is CUT-like homeobox 1 (CUX1), a HOX-family transcription factor with critical roles in development and tumorigenesis. In vertebrates, the CUX1 locus contains two distinct genes that partially share exons: CUX1, which encodes a transcription factor localized to the nucleus, and CASP, which encodes a golgi-associated transmembrane protein involved in retrograde transport [2][3][4] . Altered levels and mutations of the CUX1 transcription factor have been implicated in cancer across several tumor types and species 5,6 . CASP, on the other hand, has not been implicated in human disease 7,8 . The RefSeq database documents seven mRNA isoforms for the human CUX1 locus; five of these are CASP transcripts and two are CUX1 (Fig. S1). Due to its relevance to human health, we focus our attention herein on CUX1. For the sake of simplicity, our subsequent references to the CUX1 gene or mRNA allude to those isoforms that encode CUX1, unless stated otherwise.
CUX1 is highly conserved, ubiquitously expressed, and essential for survival in mice and Drosophila 9 . CUX1 controls many cellular processes including determination of cell identity, cell cycle progression, cell-cell communication, and cell motility 9 . In cancer, however, there are conflicting reports of CUX1 acting alternately as an oncogene or tumor suppressor gene 6 . To resolve this discrepancy, we hypothesized that distinct CUX1 protein isoforms explain these disparate functions.
The two RefSeq-annotated CUX1 mRNA transcripts vary only by alternative first exons and encode a fulllength protein of 1505 amino acids length, described in the literature as p200 (Figs. 1a, S1). p200 CUX1 has four DNA-binding domains, comprised of three CUT-repeat domains and one homeodomain (Fig. 1a). A truncated p110 CUX1 isoform is generated by post-translational proteolytic processing of full-length p200 CUX1 by cathepsin L (Fig. 1a) 10 . This cleavage occurs during the S phase in normal cells, and can become constitutive in transformed cells 10,11 . p110 CUX1 lacks one CUT-repeat domain and the N-terminal region but retains the

Results
Human hematopoietic cells only express the p200 CUX1 isoform. Given the relevance of CUX1 to myeloid malignancies, we first sought to identify the CUX1 isoforms expressed in human acute myeloid leukemia (AML) cells. Immunoblotting with an antibody that recognizes an epitope shared across all CUX1 isoforms (clone B-10, Fig. 1b) reveals six of eight AML cell lines express a dominant p200 CUX1 band (Fig. 1c). We also observed a less prominent 75 kDa band in five cell lines (Fig. 1c). p200 was also the predominant isoform in primary human CD34+ hematopoietic stem and progenitor cells (HSPCs), the normal counterpart thought to give rise to myeloid malignancies (Fig. 1d). To discern if the p75 band we observed corresponds to that described previously, we assessed cell lines reported to express p75 or p110: murine NIH-3T3 fibroblasts (p110), and MCF7, T47D, and MDA-MB-231 human breast cancer cell lines (p75) 12,26 . In contrast to prior findings, we did not observe a short isoform protein band in any of these cell lines, using both the B-10 and the PUC antibodies (Fig. 1e, S2). The absence of p110 in NIH-3T3 cells was also previously observed 12 .
We next blotted AML cell lines to determine if we could detect the p75 band with other CUX1 antibodies. We used a polyclonal antibody we previously generated, PUC, that recognizes amino acids 1223-1242 of CUX1 27 , and ABE217, a polyclonal antibody raised against an epitope spanning amino acid 861 of CUX1 (Fig. 1b). There were several background bands observed with PUC, but no dominant p75 band (Fig. 1f). The faint band at 75 kDa seen with the ABE217 antibody cannot be the p75 isoform, as the antibody recognizes an epitope of CUX1 upstream of the p75 protein sequence (Fig. 1b, g). Thus, p200 is the predominant CUX1 isoform in human hematopoietic cells, and we did not detect p75 or p110 in cells previously reported to express short isoforms.
To circumvent potential antibody artefacts, we took an alternative approach to determine if the p75 band is in fact CUX1, by tagging the endogenous CUX1 allele with an in-frame C-terminal GFP tag by CRISPR/Cas9 homology-mediated repair. We used KG-1 cells, which express both p75 and p200 bands. KG-1 cells have a partial deletion of chromosome arm 7q that includes CUX1, thus they are mono-allelic for CUX1, enabling facile Human hematopoietic cells only express the p200 CUX1 isoform. (a) Schematic representation of the CUX1 mRNA. There are two CUX1 mRNA transcripts that vary only by the alternative first exons (1a and 1b). CUX1 encodes a full-length protein of 1505 amino acids which runs at 200 kDa (p200). A truncated p110 CUX1 protein is reported to be generated by proteolytic cleavage by cathepsin L. The p75 CUX1 isoform is reported to arise from an alternative transcription site embedded within intron 20. (b) Schematic representation of the predominant CUX1 protein isoforms, with protein domains indicated, and the CUX1 antibodies used in this study. (c) Immunoblot of CUX1 in the indicated human AML cell lines, using the B-10 antibody (n = 3). 10 μg of protein was loaded for the K562 and Kasumi-1 cell line, and 15 μg of protein was loaded for all other cell lines. (d) Immunoblot of CUX1 in primary human CD34 + HSPCs using the B-10-HRP antibody (n = 3). (e) Immunoblot of CUX1 in the NIH-3T3 fibroblast line and several human breast cancer cell lines previously reported to express p75 CUX1 using the B-10-HRP antibody (n = 3). (f) Immunoblot of CUX1 in indicated human AML cell lines, using the PUC antibody (n = 3). (g) Immunoblot of CUX1 in indicated human AML cell lines, using the ABE217 antibody (n = 3). (h) Immunoblot of GFP in a KG-1 cell line where endogenous CUX1 is C-terminally tagged with GFP. Protein from unedited KG-1 cells is also included (n = 3). Blot is cropped from the same gel to remove an intervening irrelevant lane.  www.nature.com/scientificreports/ CRISPR/Cas9 editing (Fig. S3). GFP-tagged CUX1 migrated at a higher molecular weight than endogenous CUX1, as expected (Fig. S4). Probing these cells with an anti-GFP antibody only identified a single unique p200 CUX1 band in the tagged cell line (Fig. 1h). Together, these data indicate that human AML and primary HSPCs express p200 and not shorter CUX1 isoforms.
CRISPR/Cas9 genomic editing precludes the existence of a CUX1 p75 isoform. To further interrogate if the p75 band is encoded by the CUX1 locus or is a non-specific western blotting artefact, we devised several CRISPR/Cas9 strategies to selectively target KG-1 genomic DNA encoding p75, p200, or both (Figs. 2a, d, 3a). As illustrated in Fig. 2a, we designed a gRNA targeting exon 4 of CUX1 which is only expressed in the genomic region encoding the p200 isoform. This would be expected to selectively introduce a frameshift muta- www.nature.com/scientificreports/ tion in the p200 protein only and thereby induce nonsense-mediated decay of only p200, leaving p75 intact. We identified two single-cell CRISPR-edited clones that had a complete loss of p200 CUX1 (C9 and H9), best appreciated with the ABE217 antibody (Fig. 2b). The B-10 antibody also shows a loss of p200, with a residual non-specific band migrating at a slightly higher molecular weight (Fig. 2c). The expression level of the p75 band was unchanged, as expected based on our targeting strategy (Fig. 2c).
As a control, we designed a gRNA targeting exon 23 of CUX1 (gEx23.1) to introduce frameshift mutations in and edit all CUX1 transcripts (Fig. 2d). As exon 23 is shared by all isoforms, including p75, it would be expected to disrupt all bona fide CUX1 proteins. Transfection with gEx23.1 followed by single cell cloning identified a clone with a single base pair insertion generating a frameshift mutation. We blotted the gEx23.1-edited KG-1 clone with ABE217 and observed that the band for p200 CUX1 shifted downward, consistent with a predicted C-terminal truncation of ~ 28 kDa (Fig. 2e). The B-10 antibody binds an epitope after the exon 23 gRNA cut site, thus all gEx23-edited isoforms will be undetectable with B-10 ( Fig. 1b). Indeed, probing with B-10 demonstrates abolished expression of the p200 CUX1 band, yet persistent p75 (Fig. 2f). The fact that the 75 kDa band remained indicates that it is not encoded by the CUX1 locus.
We similarly targeted CUX1 using two different exon 23 gRNAs in primary human CD34+ HSPCs, and saw no change in expression of any bands other than p200 CUX1 (Fig. S5). The residual p200 protein in the gEx23 edited lanes is consistent with ~ 75-80% editing in these bulk populations. Overall, these data indicate that hematopoietic cells express p200 CUX1 and not p75.
We considered the possibility that p75 does not contain exon 23, perhaps due to alternative splicing. As a different approach, we designed a pair of gRNAs flanking the predicted p75 intronic TSS reported to be ~ 2.5 kb upstream of exon 21 (Fig. 3a) 12 . We reasoned that eliminating the putative intronic TSS would eliminate transcription of the p75 isoform while leaving p200 unperturbed. We deleted approximately 2.56 kb of intronic DNA, leaving 79 base pairs intact proximal to exon 21. Using a PCR strategy to screen the expected deletion, we generated a successfully deleted single-cell clone (KG-1 Δp75 #21, Fig. 3b). Immunoblotting, however, indicated no change in p75 (Fig. 3c). This suggests that this 2.56 kb segment of DNA does not harbor the putative p75 TSS, and is incongruent with the prior report 12 . In summary, these experiments show that the presumptive p75 isoform contains neither exon 23 nor a TSS within 2.56 kb upstream of exon 21 as originally described, further implicating the p75 band as a western blotting artefact.
Proteomics approaches do not support the existence of p75 CUX1. We considered that the p75 artefact results from the denaturing conditions of western blotting. To test this, we performed a CUX1 immunoprecipitation which is performed with proteins in their native state. After immunoprecipitation of CUX1 in KG-1 cells with the B-10 antibody, we probed the resulting blots again with B-10. While the p75 band is present in the input control, we do not observe it in the immunoprecipitate (Fig. 4a). This result is consistent with p75 being an artefact of western blotting.
We turned to unbiased proteomics to search for any CUX1 peptides within the 75 kDa region. We subjected KG-1 whole cell lysates and CUX1 immunoprecipitates to SDS-PAGE. We excised regions corresponding to 200 kDa and 75 kDa and performed LC-MS/MS (Fig. 4b). In the 200 kDa region after CUX1 immunoprecipitation, we observed 51 peptides (21 unique, shown as blue rectangles, Fig. 4c; Table S1) summed across replicates that mapped to CUX1/CASP protein sequences. Because p200, p75 and CASP partially share exons, some peptides are ambiguous and thus map to both p200 and CASP or both p200 and p75. However, as CASP is a 77 kDa protein and p75 is 75 kDa, we can infer that all ambiguous peptides in the 200 kDa region are in fact p200. In the 75 kDa region of the immunoprecipitated samples, we observed one peptide that ambiguously mapped to the N-terminal region of CUX1/CASP, but observed no peptides mapping to the p75 region ( Fig. 4c; Table S2). We think this might be a degradation product from the p200 band for this sample, as it was a low-scoring peptide of low intensity, compared to the corresponding peptide in the p200 sample. These data confirm that p75 CUX1 does not immunoprecipitate with anti-CUX1, consistent with Fig. 4a.
We assessed the proteomic analysis of whole cell lysates, which are agnostic to antibody selection (Fig. 4b, first two lanes). The p200 band contained 8 peptides (5 unique) which, based on a priori knowledge of relative protein migrations, we ascribed to p200 ( Fig. 4d; Table S3). Within the p75 band isolated from lysates, we observed 7 peptides (3 unique) that all unambiguously map to CASP. We did not detect any peptides from the p75 region in the lysates that mapped to p75 CUX1 ( Fig. 4d; Table S4). Taken together, we conclude that there is no proteomic evidence of the p75 isoform in a representative AML cell line that possesses the p75 protein band.
No p75 CUX1 is detected at the RNA level in human AML and breast cancer cell lines. It remained possible that the p75 protein is below the level of detection of mass spectrometry. As a more sensitive test, we looked for the presence of p75 CUX1 at the mRNA level by RT-PCR, using two primer sets (pairs 2 and 4 and pair 3 and 4) spanning intron 20 and exon 22 (Fig. 5a, b). Primer pair 3 and 4 was previously reported to detect the p75 transcript 12 . We assessed cDNA from three AML cell lines (K562, Kasumi-1, and KG-1) and three human breast cancer cell lines described to express p75 (T47D, MDA-MB-231, and MCF-7). Primers for GAPDH and all CUX1 isoforms (primer pair 1 and 4 and pair ex23 and ex24) served as positive controls. We did not detect any bands with the previously reported p75 primers (3 and 4) or with our primers (2 and 4, Fig. 5c). We confirmed that primer 4 was functional, as it successfully amplified p200 (primer pair 1 and 4), indicating that the lack of a p75 transcript was not due to primer design issues. Overall, we do not detect a p75 transcript in either human AML cell lines or the breast cancer cell lines previously reported to express p75 12 .
Functional genomics consortia datasets lack epigenetic or transcriptional evidence for a CUX1 p75 intronic transcriptional start site. Heretofore, our analysis has encompassed thirteen cell types. We www.nature.com/scientificreports/  www.nature.com/scientificreports/ sought to extend our analysis to comprehensively assess additional normal and malignant tissue types. As such, we leveraged consortia-generated functional genomics datasets for evidence of p75 CUX1 across a variety of cell types. We first assessed epigenetic marks that canonically decorate promoters. Promoters and enhancers both have H3K27ac, H3K4me3 and H3K4me1 deposition, although enhancers have higher levels of H3K4me1 28 . All three of these marks are present at the p200 TSS across seven Tier 1 ENCODE cell lines (Fig. 6a). However, we did not observe any H3K4me3 peaks in the intron 20 region of CUX1, including in MCF7, previously reported to express p75 (Fig. 6a). We observed some H3K27ac and H3K4me1 peaks in intron 20, but these peaks were not at the predicted TSS and not in MCF7 cells (Fig. 6a). Additionally, we also looked at the ChromHMM track in MCF7 cells. ChromHMM is a computational approach that integrates experimental ChIP-seq datasets for different histone marks into a hidden Markov model to assign chromatin states genome-wide 29 . The MCF7 ChromHMM track shows an active TSS (red region) at exon 1a and 1b but only weak transcription (dark green)  Fig. 6a). Thus, epigenetic features of promoters are not present at the putative p75 TSS, even in a cell type documented to express p75. Promoters are also characterized by accessible chromatin, transcription factor binding, RNA-polymerase II (POL2RA) occupancy, and CpG islands [30][31][32][33] . In line with this, the p200 TSS has pronounced DNase hypersensitivity, transcription factor occupancy, POLR2A peaks, and a CpG island (Fig. 6a). However, there is a paucity of these signals in intron 20 and they are not enriched in the previously mapped p75 TSS. There is one DNase accessibility site in MCF7 near the predicted p75 TSS, but it does not have any of the other features of a promoter. These data are all inconsistent with an intron 20 alternative TSS.
Efforts have been made to comprehensively annotate promoters by synthesizing functional genomic datasets and curated databases. One such catalog is the Eukaryotic Promoter Database (EPD), which correctly annotates both exon 1a and 1b but not any promoters within intron 20 ( Fig. 6a "EPDnew Promoters" track) 34 .
We confirmed this observation with next-generation sequencing-based cap-analysis gene expression (CAGEseq), which captures 5′ capped mRNA transcripts to map TSSs 35 . CAGE-seq identifies 5′ capped mRNA in 5 different ENCODE cell lines (GM12878, H1-hESC, K562, HepG2 and MCF7) at exons 1a and 1b of CUX1 yet none within intron 20, including in MCF7 (Fig. 6a). Similar results were observed from CAGE-seq data from an atlas of 975 human primary cells and cancer cell lines (Fig. 6b) 36 . In aggregate, these data do not support an alternative intronic TSS.
The presumptive p75 transcript expresses intronic sequence proximal to exon 21 12 . We mined RNA-seq datasets to identify reads spanning this region in MCF7 and T-47D human breast cancer cell lines reported to express p75 6,12,37 . All exon 21 sequencing reads abruptly end at the intron 20/exon 21 border, inconsistent with the p75 transcript (Fig. 6c). In the MCF7 cell line, we observed some sequencing reads within the intron 20 region, but these were not conserved across four different replicates and were not contiguous with exon 21 reads 37 (Fig. 6c). These data do not support the existence of a p75 mRNA containing intronic sequence.

Discussion
Next-generation sequencing and CRISPR/Cas9 genome editing has revolutionized biomedical research in many ways. Perhaps less appreciated, however, is the role of these technologies in establishing the legitimacy of research findings. CRISPR/Cas9 editing, for instance, has invalidated cancer dependencies, drug targets, and viral receptors, as some illustrative examples [43][44][45] . In this report, we employ the power of functional genomics and CRISPR editing to demonstrate that the CUX1 gene does not encode a p75 isoform as described 12 . This conclusion is buttressed via multiple orthogonal approaches including biochemical studies with several antibodies, proteomics and extensive mining of functional genomic datasets across a plethora of cell types. Taken together, our data is inconsistent with a p75 CUX1 isoform arising from an intronic TSS and suggests that prior reports were based on a western blotting artefact 12 .
Other studies support our conclusion. p75 was first identified in HeLa cells, HEK293 cells, breast cancer cell lines and mouse thymus 12 . The original manuscript identifying human CUX1 generated antiserum against the entire CUX1 protein, yet western blotting of HeLa cells identified only p200 46 . Probing HEK293 cells with an antibody against the C-terminus region of CUX1 shows no p75 in another study 19 . In a report that p75 CUX1 causes polycystic kidney disease, the endogenous p75 expression was never documented at the protein level; all subsequent experiments were performed by over-expressing p75 cDNA 47 . Probing a western blot of entire Drosophila embryos for the highly conserved CUX1 ortholog, Cut, only reveals the full-length protein 48 . While p200 CUX1 protein increases after TGF-β treatment in normal lung fibroblasts, p75 does not 49 . Finally, in a study that reported that androgen-resistant prostate cancer cell lines upregulate p200, p75 was unchanged 50 . Collectively, these studies either fail to document an endogenous p75 protein, or uncouple the biology of p200 from p75.
It is unclear what the p75 cDNA product previously reported represents 12 . Perhaps it is a result of recursive splicing, where long introns are spliced in a sequential manner in tissue-specific contexts leading to a lag in www.nature.com/scientificreports/ splicing of intron 20 material 51 . Alternatively, studies of nascent transcription indicate that splicing does not always occur in the order of transcription, and introns that are spliced later temporally tend to be longer and have higher RNA-binding protein occupancy 52 . In keeping with this notion, intron 20 is the sixth longest intron in CUX1, and has an elevated RNA-binding protein occupancy compared to most other introns in the gene and thus may be spliced later than other introns in CUX1 52 . It is conceivable that the p75 cDNA product previously observed represents an intermediate, incompletely spliced p200 cDNA. In our analysis of hematopoietic cells, we only document the expression of the p200 CUX1 isoform. We cannot comment on the validity of p110 or other short isoforms (p80, p90 and p150). As these isoforms are generated post-translationally, functional genomics datasets are ineffective in determining their legitimacy. Future studies of these putative isoforms in other tissue types should employ stringent techniques such as CRISPR/Cas9 mutagenesis to ensure against being misled by western blotting artefacts.
Our finding calls into question studies that ascribed oncogenic functions to p75 CUX1. Many of these publications did not study endogenous p75, but instead employed overexpression models, which can confound results 12,18,19,53,54 . We speculate that overexpression of CUX1 has dominant negative effects. For instance, in mice, both overexpression of p75 and knockout or knockdown of CUX1 leads to myeloproliferative disease 19,24,55 . One interpretation of these seemingly incongruent findings is that artificial overexpression of p75 either interferes with the stoichiometry of endogenous CUX1 protein complexes or blocks full-length CUX1 from binding to its target genes. Indeed, p75 CUX1 has increased DNA-binding affinity compared to endogenous p200 CUX1 56 . The net effect of CUX1 overexpression may be the disruption of endogenous CUX1 tumor suppressor activity.
In support of this model, the p150 CUX1 isoform was found to exert a dominant negative phenotype upon p200 CUX1 14 . In this light, the use of overexpression systems to characterize p110 may also misattribute this isoform with oncogenic properties 16,18,53,57,58 .
There are relatively fewer reports that p200 CUX1 is oncogenic. p200 CUX1 has been shown to promote cell line migration, invasion, and evasion of apoptosis 17,59 . p200 transgenic mice develop organ hyperplasia 60 . 7q copy number gains and CUX1 overexpression has been documented in primary cancers 6,61,62 . However, these later findings should be interpreted with caution. Chromosome 7 also encodes oncogenes, including EGFR (on 7p), and BRAF, CDK6, and EZH2 (on 7q). Thus, in cancers with chromosome 7 copy number gains, the driver may be a true oncogene, while CUX1 is a passenger. Indeed, rigorous pan-cancer gene-level analysis of copy number alterations and mutation patterns in primary patient samples reveal CUX1 genetic changes are significantly characteristic of a tumor suppressor gene [20][21][22] . There is now a growing body of work that CUX1 is tumor suppressive 21,[23][24][25]27,63 .
Given the importance of CUX1 in development and disease across a wide variety of tissue types, it is critical to carefully dissect and understand the genomic structure of the CUX1 locus and encoded protein. The complexity of the gene has led to confusion in the field resulting in serious inaccuracies, most recently by Xu, et. al. 64 . We expect that our current study will help rectify these obstacles going forward.
Human mobilized peripheral blood CD34+ HSPCs from multiple healthy donors were purchased from the Fred Hutchinson Co-operative Center for Excellence in Hematology (Seattle, WA, USA). CD34+ HSPCs were expanded in StemSpan SFEMII base media supplemented with CC110 culture supplement for 1-3 days prior to electroporation.
Generation of CUX1-GFP tagged KG-1 cell line. pCUX1.1.0-gDNA (Addgene plasmid #112434; RRID: Addgene_112434) and pCUX1-donor plasmids (Addgene plasmid #112338; RRID: Addgene_112338) were a gift from Kevin White. The CUX1 homology arms were chemically synthesized by Gibson assembly and comprised 0.6 kb upstream and 1 kb downstream CUX1 sequence flanking the stop codon. The GFP tag is a LAP tag that contains a TEV protease cleavage site, an S peptide, the EGFP coding sequence, followed by an IRES and a kanamycin resistance element 65 . The LAP-GFP tag was PCR amplified from a separate plasmid, and assembled with the CUX1 donor sequence using Gibson assembly. KG-1 cells were transfected with 0.5 μg of each plasmid using the Neon® Transfection System (Invitrogen by Life Technologies, Waltham, MA, USA). The electroporation settings used for transfection was 1650 V, 20 ms pulse width, 1 pulse, and electroporation was performed according to manufacturer instructions. Cells were cultured in 0.5 mg/ml G418 after 7 days for 3 weeks to select for a transfected population. Primers used to confirm correct integration of the GFP tag are listed below: www.nature.com/scientificreports/ for 20 min, with frequent vortexing, and clarified by centrifugation (5000 g, 10 min at 4 °C). Total protein of the resulting supernatant was quantified using the Bradford assay at 595 nm wavelength, with BSA used to generate the standard curve. 10-15 µg of protein was subjected to SDS-PAGE and probed with anti-CUX1 antibody conjugated to HRP (B-10-HRP, mouse mAb derived against aa 1308-1332, Santa Cruz, 1:1000 in 5% milk/TBST) and visualized using ECL substrate. β-Actin was detected with anti-β-actin-HRP (C4, Santa Cruz, 1:3000 in 5% milk/TBST). Other antibodies used to probe for CUX1 expression include ABE217 (rabbit polyclonal antibody derived against aa 861, 1:1000 in 5% milk/TBST), and PUC (rabbit polyclonal antibody generated in-house that recognizes aa 1223-1242, 1:1000 in 5% milk/TBST). GFP (D5.1) rabbit mAb #2956 (Cell Signaling Technology, Product #2956S, 1:1000 in 5% FBS/TBST) was used to probe for GFP-tagged CUX1. Nitrocellulose membrane (Thermo Scientific, Catalog number: 88018) was used for protein transfer and ECL substrate (Thermo Scientific, Catalog number: 34579) was used for visualizing the proteins of interest.
gRNA design. All gRNAs were designed using the Broad Institute's sgRNA designer tool, and generated gRNA sequences were verified using Synthego's Verify Guide Design tool. gRNA sequences used in this study were purchased from Synthego, and the sequences are listed below: Reverse-transcriptase PCR. RNA was extracted from 500,000 cells of the K562, Kasumi-1, KG-1, T47D, MDA-MB-231 and MCF-7 cell lines using Trizol, precipitated using chloroform and 70% ethanol, and purified using the RNeasy Mini kit (QIAGEN cat no 74104). 500 ng of RNA from each cell line was used to synthesize cDNA using the Thermo Scientific Maxima™ H Minus cDNA Synthesis Master Mix Kit (Thermo Scientific cat no M1661). 1 µL of this synthesized cDNA was then used in the qPCR reaction. Primers were used specific to the p200 CUX1 isoform, the p75 CUX1 isoform, and GAPDH primers as a housekeeping control. The primers for the p75 transcript were obtained from the paper that originally described the existence of this isoform 12 . In addition, we also designed our own primers spanning various regions of intron 20 of CUX1. Controls tested include a no template water control, and a no RT enzyme control. 30 cycles of PCR were performed. Primer sequences are listed below: GAPDH forward primer: 5′-ACC ACA GTC CAT GCC ATC AC-3′. GAPDH reverse primer: 5′-TCC ACC ACC CTG TTG CTG TA-3′. p200 + p75 CUX1 forward primer: 5′-CCG GAG GAG AAG GAG GCG CT-3′. p200 + p75 CUX1 reverse primer: 5′-AGC TGT CGC CCT CCG AGC TG-3′. Immunoprecipitation. CUX1 was immunoprecipitated from the KG-1 cell line, and then blotted for CUX1 again to query whether short CUX1 isoforms were able to be pulled down by the anti-CUX1 antibody. 100 × 10 6 cells were spun down for a CUX1 pulldown and a control IgG pulldown each. Cells were lysed in hypotonic buffer (5 mM EDTA, 5 mM EGTA, 5 mM Tris-Cl) with protease inhibitor added (Roche complete mini-EDTA free). Pellets were passed through a 20-gauge needle and incubated on ice, then spun down. The supernatant was removed, and the pellet was resuspended in RIPA buffer with protease inhibitor added (Roche Complete). Pellets were again passed through a 27-gauge needle, incubated on ice and subsequently spun down. The supernatant was collected, and then incubated overnight at 4 °C on a rocker with either 12 µg of the anti-CUX1 antibody (B-10, Santa Cruz) or a rabbit IgG antibody. Protein A/G beads (Santa Cruz) were then added the following day to the supernatant, and incubated at 4 °C on a rocker for 1 h. The immunoprecipitated protein was then spun down, washed in cold PBS, resuspended in loading buffer, and subjected to SDS-PAGE.

LC-MS/MS via MaxQuant.
LC-MS/MS was performed using adapted methods previously published 67 .
In brief, peptide samples were re-suspended in Burdick & Jackson HPLC-grade water containing 0.2% formic acid (Fluka #60-006-17), 0.1% TFA (Pierce #28903), and 0.002% Zwittergent 3-16 (Millipore Sigma #693023), a sulfobetaine detergent that contributes the following distinct peaks at the end of chromatograms: MH+ at 392, and in-source dimer [2 M+ H+] at 783, and some minor impurities of Zwittergent 3-12 seen as MH+ at 336. The peptide samples were loaded onto a 100 μm × 40 cm PicoFrit column self-packed with 2.7 μm Agilent Poroshell 120, EC-C18, washed, then switched in-line with a 0.33 uL Optimize EXP2 Stem Traps, packed spray tip nano column packed with Halo 2.7 μm Pep ES-C18, for a 2-step gradient. Mobile phase A was water/acetonitrile/ formic acid (98/2/0.2) and mobile phase B was acetonitrile/isopropanol/water/formic acid (80/10/10/0.2). Using a flow rate of 350 nL/min, a 90 min, 2-step LC gradient was run from 5% B to 50% B in 60 min, followed by 50-95% B over the next 10 min, hold 10 min at 95% B, back to starting conditions and re-equilibrated. Electrospray tandem mass spectrometry (LC-MS/MS) was performed at the Mayo Clinic Proteomics Core on a Thermo Q-Exactive Orbitrap mass spectrometer, using a 70,000 RP (70 K Resolving Power at 400 Da) survey scan in profile mode, m/z 340-1800 Da, with lockmasses, followed by 20 MSMS HCD fragmentation scans at 17,500 resolution on doubly and triply charged precursors. Single charged ions were excluded, and ions selected for MS/MS were placed on an exclusion list for 60 s. An inclusion list (generated with in-house software) consisting of expected Cux1 sequences was used during the LC-MS/MS runs.