Endogenous HIF2A reporter systems for high-throughput functional screening

Tissue-specific transcriptional programmes control most biological phenotypes, including disease states such as cancer. However, the molecular details underlying transcriptional specificity are largely unknown, hindering the development of therapeutic approaches. Here, we describe novel experimental reporter systems that allow interrogation of the endogenous expression of HIF2A, a critical driver of renal oncogenesis. Using a focused CRISPR-Cas9 library targeting chromatin regulators, we provide evidence that these reporter systems are compatible with high-throughput screening. Our data also suggest redundancy in the control of cancer type-specific transcriptional traits. Reporter systems such as those described here could facilitate large-scale mechanistic dissection of transcriptional programmes underlying cancer phenotypes, thus paving the way for novel therapeutic approaches.


Functional validation of endogenous HIF2A reporter systems. To functionally validate the H2AmC1-3 reporter systems, we evaluated mCherry expression alongside HIF2A mRNA and protein expression. We used lentivirally delivered CRISPR interference (CRISPRi) 20, i.e. sgRNA-mediated recruitment of a repressive dCas9-KRAB protein to a specific genomic locus, to inhibit endogenous HIF2A mRNA expression using three sgRNA constructs (iHIF2A-1, iHIF2A-2, iHIF2A-3) targeting the EPAS1 transcription start site (Fig. 2a). In comparison to the non-targeting sgRNA control, iHIF2A-1 and iHIF2A-2 caused a significant reduction in HIF2A mRNA level while no reduction was observed in iHIF2A-3 transduced cells (Fig. 2b). These effects translated into similar changes at the protein level (Fig. 2c). The reporter lines expressed a strong HIF2A protein band of the expected size, and the HIF2A antibody did not detect any larger bands of ~150 kDa, the expected size of a HIF2A-mCherry fusion protein, indicating that the T2A peptide was efficiently cleaved in our systems (Supplementary Fig. 2a). Importantly, flow cytometry on the H2AmC2 cells showed that mCherry fluorescence in the non-targeting sgRNA control cells stayed close to the level seen in untransduced cells and cells transduced with iHIF2A-3, while iHIF2A-1 and iHIF2A-2 transduced cells showed a clear reduction in mCherry fluorescence, closely mimicking the parental UOK101 cells (Fig. 2d). These results were replicated in the H2AmC3 cells (Fig. 2e). HIF2A targeting in the H2AmC1 cells resulted in only modest changes in mCherry fluorescence, suggesting the possibility that in this clone mCherry was expressed at least partially independently of HIF2A (data not shown). H2AmC1 was therefore excluded from further analysis. To further test the interdependence of HIF2A mRNA, protein and mCherry fluorescence in our systems we transduced both H2AmC2 and H2AmC3 cells with wildtype VHL. As expected, this resulted in a robust reduction in HIF2A protein expression (Fig. 2f). However, no change was observed in HIF2A mRNA expression or mCherry fluorescence (Fig. 2g and Supplementary Fig. 2b,c). These results confirmed that mCherry activity in the H2AmC2 and H2AmC3 systems reflected HIF2A mRNA expression, and they also suggested that HIF2A does not significantly regulate its own expression.

FACS-based evaluation of reporter activity at the single cell level.
To test the possibility of using the H2AmC2 and H2AmC3 cells in pooled CRISPR-Cas9-based screening, we transduced them with the HIF2A-targeting sgRNAs under conditions of low multiplicity of infection (MOI) and evaluated the effects on mCherry fluorescence. For these experiments, the sgRNAs were cloned into vectors that also expressed EGFP, which allowed the evaluation of the relationship between the presence of HIF2A-targeting constructs and mCherry activity at the single cell level in a mixed population. Using a population in which ~10% of the cells were EGFP positive (Fig. 3a), corresponding to an estimated MOI of 0.11 with ~95% of the EGFP positive cells being infected by a single virus, we evaluated the percentage of EGFP positive cells within the population of lowest 10% of mCherry fluorescence (Fig. 3b). As expected, the non-targeting sgRNA or iHIF2A-3 demonstrated no enrichment of EGFP positive cells within the mCherry low population. In contrast, there was a marked increase in EGFP positive cells within the mCherry low fraction in the population that contained 10% of iHIF2A-1 or iHIF2A-2 transduced cells (Fig. 3c,d). Next, we evaluated the effects of different mCherry population cut-offs on the level of enrichment for EGFP positive cells. This confirmed that decreasing levels of mCherry fluorescence selected for an increasing fraction of EGFP positive cells in iHIF2A-1 and iHIF2A-2 transduced cells whereas no changes were observed in cells transduced with the non-targeting sgRNA or iHIF2A-3 (Fig. 3e and Supplementary Fig. 3a). However, even the lowest 1% of mCherry expressing cells had a significant fraction of EGFP negative cells. Finally, we wanted to test the reporter systems in conjunction with CRISPR-Cas9-based mutational gene targeting. In order to achieve uniform Cas9 expression, we first isolated single cell-derived H2AmC2 and H2AmC3 subclones transduced with a constitutively expressed Cas9-EGFP (Fig. 3f). We then tested the effects of single HIF2A-targeting sgRNAs on mCherry fluorescence at low MOI. As expected, a combined analysis of both control and HIF2A-targeted cells revealed an enrichment of HIF2A-targeted cells in the population of lowest 10% of mCherry fluorescence (Fig. 3g and Supplementary Fig. 3b). In sum, these experiments gave proof-of-principle evidence that a pooled CRISPR-Cas9 genetic screen could be coupled to FACS-based isolation of low mCherry cells in order to identify genes that support HIF2A expression in ccRCC.
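The MOI and single-integrant estimates quoted for the ~10% EGFP-positive population are consistent with a simple Poisson model of viral integration. The sketch below assumes that integrations per cell are Poisson distributed and that every infected cell is detected as EGFP positive; the function names are illustrative and not taken from the study.

```python
import math

def moi_from_fraction_infected(fraction_infected: float) -> float:
    """MOI m for which P(at least one integration) = 1 - exp(-m)."""
    return -math.log(1.0 - fraction_infected)

def single_integrant_fraction(moi: float) -> float:
    """Among infected cells, the fraction carrying exactly one virus."""
    p_one = moi * math.exp(-moi)   # Poisson P(k = 1)
    p_any = 1.0 - math.exp(-moi)   # Poisson P(k >= 1)
    return p_one / p_any

moi = moi_from_fraction_infected(0.10)            # ~10% EGFP-positive cells
print(round(moi, 3))                              # 0.105
print(round(single_integrant_fraction(moi), 3))   # 0.948
```

Under these assumptions, 10% positive cells gives an MOI of about 0.105 (0.11 after rounding) with roughly 95% of positive cells carrying a single integration.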
A pooled CRISPR-Cas9 screen for chromatin regulators supporting HIF2A expression. Chromatin factors as supporters of cancer-specific transcriptional programmes are emerging as potential therapeutic target molecules 21. In order to interrogate the chromatin factor dependencies of HIF2A expression in ccRCC, we generated a focused library of 7617 sgRNAs targeting a total of 836 known and potential chromatin regulators as well as HIF2A as a positive control (Supplementary Tables 1 and 2). The library contained on average nine sgRNAs per gene and 100 non-targeting negative controls. A total of 120 million H2AmC2 and H2AmC3 cells were transduced with the chromatin library at a low MOI, resulting in ~12 million infected cells and thus ensuring >1000X representation of sgRNAs. We then used FACS to isolate the population with the lowest mCherry expression and assessed the representation of sgRNAs by high-throughput sequencing (Fig. 4a). Unsorted cells propagated for an equal time were used as controls. We observed a good correlation between the experimental systems (Fig. 4b and Supplementary Table 2). Fewer than 1.5% of sgRNAs had <5 counts/million reads in the initial population, and excluding such sgRNAs retained 100% of the target genes (Fig. 4c) with most genes retaining all sgRNAs (Fig. 4d). The excluded sgRNAs were either lost during library preparation due to technical reasons, or they targeted genes that affected cell viability, such as SF3B3 and PCNA, genes included as control genes in the library. A formal analysis comparing the unsorted control samples to the plasmid that was used for virus production confirmed that known essential genes 22 were highly represented among the most significantly depleted genes (Supplementary Fig. 4). Looking specifically at HIF2A as a positive control, we found that 8/9 sgRNAs showed evidence of enrichment in the mCherry low population with good correlation between the two reporter systems (Fig. 4e); one sgRNA was excluded due to low representation. In a global analysis, HIF2A sgRNAs stood out as being the most enriched, with 6/10 top scoring sgRNAs targeting HIF2A (Fig. 4f). In a gene level analysis combining data from all sgRNAs for each gene, HIF2A was by far the most strongly enriched gene in the mCherry low population (Fig. 4g). To test the statistical significance of the result we used a permutation-based approach to calculate empirical P-values for the gene level enrichment scores. Comparing the observed data to an expected background clearly highlighted HIF2A as the most significant hit with an adjusted P-value of <0.001 (nominal P-value < 1.195e-06) (Fig. 4h,i). However, no other factor reached statistical significance, a result confirmed by manual inspection of the top-scoring sgRNAs: no other gene contained more than one sgRNA within the top 50 of the list whereas non-targeting control sgRNAs appeared twice. In sum, the only significant hit from our screen was the positive control HIF2A, confirming the technical validity of our approach, but we found no evidence for individual chromatin factors that would be required for the strong HIF2A expression in ccRCC.
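The >1000X library representation stated for the screen can be checked with simple arithmetic from the numbers given in the text (120 million transduced cells, ~10% infected, 7617 sgRNAs). The helper names below are illustrative, and the calculation assumes that infected cells are spread evenly across sgRNAs.

```python
def cells_per_sgrna(cells_transduced: float, fraction_infected: float,
                    library_size: int) -> float:
    """Average number of infected cells carrying each sgRNA."""
    return cells_transduced * fraction_infected / library_size

def cells_needed(library_size: int, target_coverage: float,
                 fraction_infected: float) -> float:
    """Cells to transduce so that each sgRNA reaches the target coverage."""
    return library_size * target_coverage / fraction_infected

coverage = cells_per_sgrna(120e6, 0.10, 7617)
print(round(coverage))                             # 1575
print(round(cells_needed(7617, 1000, 0.10) / 1e6, 1))  # 76.2 (million cells)
```

With ~12 million infected cells the average coverage is about 1575 cells per sgRNA, and roughly 76 million cells would be the minimum input for 1000X coverage at 10% infection.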

Discussion
We report the development of endogenous reporter systems for HIF2A expression in ccRCC cells and their application in high-throughput functional CRISPR-Cas9-based loss-of-function screening. HIF2A is a critical mediator of ccRCC development and a validated therapeutic target [23][24][25] . Additionally, varying sensitivity of ccRCC cells to HIF2A inhibition has been attributed to different levels of HIF2A expression as well as acquired mutations in the HIF2A complex 23,25 . Yet, the mechanisms that regulate the high and specific HIF2A expression in ccRCC are incompletely understood. Given the recent evidence that the expression of oncogenic drivers, such as MYC, can be dependent on single chromatin modifiers 3 , we explored the chromatin factor dependencies of HIF2A expression using our systems. Despite strong signal for HIF2A, the positive control in the screen, we found little evidence for the existence of individual chromatin factors that would be essential for HIF2A expression in ccRCC. This suggests redundancy within the HIF2A gene regulatory machinery, possibly complicating efforts that aim at targeting HIF2A expression in the therapeutic setting 26 .
CRISPR-Cas9-based genome editing has opened up unprecedented possibilities for biological research. The general efficiency of CRISPR-Cas9 gene editing across the many possible biological contexts remains to be established, however. For example, the degree to which an intact homology-directed repair (HDR) pathway is required for HDR-targeted gene editing in cancer cell lines remains unclear. ccRCCs often carry mutations in the histone methyltransferase SETD2, which promotes homologous repair [27][28][29]. This suggests that defects in some DNA repair pathways may be a requirement for ccRCC development, possibly hindering the efficiency of HDR-mediated gene editing in ccRCC cells. Our results show that at least some ccRCC cell lines are amenable to HDR-based gene editing, but the efficiency appears to be lower than what has been reported in some other systems 14. Several ccRCC cell lines tested also failed to integrate the fluorescent reporter (data not shown). Alternative strategies that depend on smaller template plasmids and NHEJ-based DNA repair could make gene editing-based development of experimental ccRCC systems more efficient 30.
While the strong signal for HIF2A in our genetic screen gives technical validation to both the reporter systems and the screening approach, it remains possible that the systems are not sensitive enough to identify factors that only subtly affect HIF2A expression. It is also possible that some chromatin factors were not efficiently targeted due to technical reasons. An alternative interpretation suggests, however, that simultaneous targeting of multiple chromatin factors may be required for efficient inhibition of HIF2A expression. Combinatorial CRISPR-Cas9 screens could thus represent an interesting avenue forward. Targeting alternative gene sets or performing genome-wide screens could also give new insight into HIF2A regulation. The endogenous transcriptional reporters developed herein could also be suitable for unbiased functional analysis of gene regulatory element function 31,32 .
In conclusion, we have developed and validated experimental systems for the identification of factors that support HIF2A expression in ccRCC cells. We demonstrate that these systems are compatible with pooled genetic screens. Combined with microscopy they may also be compatible with small molecule screens in an arrayed format. More generally, similar systems could be useful for the identification of transcriptional dependencies in other cancer contexts as well.

Methods
Cells and Reagents. UOK101 cells were obtained from the National Cancer Institute (NCI). The identity of the UOK101 cells was confirmed by Sanger sequencing-based detection of a previously identified homozygous VHL mutation 19,33 in June 2018, and STR analysis confirmed that the UOK101 cells had not been cross-contaminated with other commonly used cell lines. The cancer cell lines were confirmed negative for mycoplasma by biannual tests using the MycoAlert™ Mycoplasma Detection Kit (Lonza, LT07-318). UOK101 cells were cultured in RPMI-1640. HEK293T cells, used for lentivirus production, were cultured in DMEM. Both media were supplemented with 10% fetal bovine serum (FBS), penicillin (100 U ml⁻¹) and streptomycin (100 μg ml⁻¹). Cell lines were used in experiments within 10 passages from thawing.
Quantitative RT-PCR. Total RNA from cells was purified using RNAzol RT (Sigma) according to the manufacturer's protocol. cDNA was synthesized using the High-Capacity cDNA Reverse Transcription Kit (Thermo). qRT-PCR was performed using the StepOnePlus instrument (Thermo) with pre-designed TaqMan gene expression assays (Thermo): EPAS1 (Hs01026149_m1), CXCR4 (Hs00607978_s1), VHL (Hs03046964_s1) and TBP (Hs00427620_m1). Signal was quantified using the double delta Ct method and normalized to TBP as the housekeeping control.
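As a minimal sketch of the double delta Ct calculation named above: the target Ct is first normalised to TBP within each sample, and the treated sample is then expressed relative to the control. The formula assumes roughly 100% amplification efficiency (a doubling per cycle), and the Ct values in the example are hypothetical, not data from the study.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target gene versus control by the 2^-ddCt method."""
    delta_sample = ct_target_sample - ct_ref_sample      # dCt, treated sample
    delta_control = ct_target_control - ct_ref_control   # dCt, control sample
    return 2.0 ** -(delta_sample - delta_control)        # 2^-(ddCt)

# Hypothetical Ct values: EPAS1 and TBP in knockdown versus non-targeting control.
fold = relative_expression(ct_target_sample=27.0, ct_ref_sample=24.0,
                           ct_target_control=25.0, ct_ref_control=24.0)
print(fold)  # 0.25
```

In this made-up example EPAS1 crosses threshold two cycles later after normalisation, which corresponds to a four-fold reduction in mRNA.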
Western Blotting. Whole-cell extract was used for Western blotting by lysing cells in RIPA buffer.
Fluorescence-Activated Cell Sorting (FACS). Cells were analysed for fluorescence on an LSR Fortessa (BD Biosciences). Single cells were detected on the basis of FSC-A, FSC-H and SSC-A. mCherry (561 nm/610 nm), EGFP (488 nm/509 nm), and BFP (383 nm/445 nm) fluorescence was measured. Cell sorting was performed on an Influx cell sorter (BD Biosciences) using the same settings as described above. Single cells were sorted into individual wells of a 96-well plate containing cancer cell-conditioned RPMI-1640 supplemented with 10% fetal bovine serum (FBS), penicillin (100 U ml⁻¹) and streptomycin (100 μg ml⁻¹). FlowJo software was used to analyse the data obtained.
Immunofluorescence. Cells grown on coverslips in 6-well plates were washed with PBS and incubated with 4% PFA in PBS for 10 min at room temperature (RT). Cells were permeabilized with 0.5% Triton X-100 (Sigma). A drop of ProLong Diamond Antifade Mountant with DAPI (Life Technologies) was placed on microscope slides and coverslips with cells were mounted on them. After 5 min incubation at RT, imaging was done on a Leica TCS SP5 confocal microscope.
Chromatin Factor sgRNA Library Production. Genes encoding known or putative chromatin factors were identified from previously curated lists [38][39][40][41] (Supplementary Table 1). The library also contained additional control genes, such as known essential genes, as well as 100 non-targeting control sgRNAs 22 . The sequences for the sgRNAs were taken from Wang et al. 22 with nine sequences selected for each gene. Oligos were ordered from Custom Array Inc. Oligos were amplified and cloned into pKLV2-U6gRNA5(BbsI)-PGKpuro2ABFP-W (Addgene #67974) 2 by Gibson assembly 42 .
Pooled CRISPR-Cas9 Screening. The lentiviral sgRNA library was produced using HEK293T cells as described above. H2AmC2 and H2AmC3 cells were first transduced with the lentiviral library at varying concentrations to determine the amount needed for an MOI of ~0.11. A total of twelve 15 cm dishes were seeded at 5 million cells each. Twenty-four hours later, i.e. after one doubling time, ~120 million cells were transduced with the lentiviral library at a low MOI, leading to ~10% positive cells and ensuring a >1000X representation of the library. Virus was removed the following day. Puromycin treatment, starting 2 days after transduction, was then applied for a total of 4 days. Three weeks post-transduction, the lowest 10% of the mCherry population from both H2AmC2 and H2AmC3 cells was isolated by FACS and the cells were pelleted along with their respective unsorted controls. DNA was extracted and the region containing the sgRNAs was amplified. Samples were then purified using an Agencourt AMPure XP beads purification protocol. Purified samples were pooled and quantified with Qubit before being sent for Illumina sequencing on a HiSeq4000 instrument.
Sequencing results were aligned using Bowtie and normalised to one million reads. sgRNAs with normalised counts of less than five were removed from the analysis. The fold change enrichment was calculated for each sgRNA in both H2AmC2 and H2AmC3 by comparing the sorted samples to their respective unsorted control sample. Next, the median of all the remaining sgRNAs per gene in each system was calculated. These data were normalised to obtain a z-score for each gene. The average of the normalised median enrichment for each gene was then calculated between the two systems. For each gene, empirical P-values were calculated using a resampling-based method with 1000 permutations and multiple testing correction using FDR.
Statistical Analysis. Statistical analyses were conducted in R and Microsoft Excel. P-values lower than 0.05 were considered statistically significant. For qRT-PCR three independent experiments are shown, each of which is the average of three technical replicates, unless stated otherwise in the figure legends.
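The sgRNA-to-gene scoring steps described for the pooled screen (normalisation to one million reads, the five-count filter, per-sgRNA fold change, per-gene median, z-score, averaging of the two systems, resampling-based P-values and FDR correction) can be sketched as follows. This is an illustrative Python outline, not the analysis code used in the study, which was run in R and Excel. Several details are assumptions: the filter is applied to the unsorted control, the null distribution is built by drawing random sgRNA sets of the same size, the population standard deviation is used for the z-score, and FDR is taken to mean the Benjamini-Hochberg procedure. All function names are hypothetical.

```python
import random
from statistics import mean, median, pstdev

def cpm(counts):
    """Scale raw sgRNA read counts to counts per million reads."""
    total = sum(counts.values())
    return {guide: n * 1e6 / total for guide, n in counts.items()}

def guide_fold_changes(sorted_counts, unsorted_counts, min_cpm=5.0):
    """Sorted/unsorted fold change per sgRNA; sgRNAs below min_cpm in the
    unsorted control are dropped (which sample is filtered is an assumption)."""
    s, u = cpm(sorted_counts), cpm(unsorted_counts)
    return {g: s.get(g, 0.0) / u[g] for g in u if u[g] >= min_cpm}

def gene_medians(fold, guide_to_gene):
    """Median fold change over the remaining sgRNAs of each gene."""
    by_gene = {}
    for guide, fc in fold.items():
        by_gene.setdefault(guide_to_gene[guide], []).append(fc)
    return {gene: median(values) for gene, values in by_gene.items()}

def zscores(scores):
    """Standardise gene scores within one reporter system."""
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    return {gene: (x - mu) / sd for gene, x in scores.items()}

def average_systems(z_a, z_b):
    """Average the z-scores of genes present in both reporter systems."""
    return {gene: (z_a[gene] + z_b[gene]) / 2 for gene in z_a if gene in z_b}

def empirical_p(observed, n_guides, fold_values, n_perm=1000, seed=0):
    """One-sided P-value: how often a random sgRNA set of the same size
    reaches a median fold change at least as high as the observed one."""
    rng = random.Random(seed)
    pool = list(fold_values)
    hits = sum(median(rng.sample(pool, n_guides)) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values for a dict of gene -> P-value."""
    ranked = sorted(pvals, key=pvals.get, reverse=True)  # largest P first
    n, running_min, adjusted = len(ranked), 1.0, {}
    for i, gene in enumerate(ranked):
        rank = n - i  # rank of this P-value in ascending order
        running_min = min(running_min, pvals[gene] * n / rank)
        adjusted[gene] = running_min
    return adjusted

# Toy counts for two genes with two sgRNAs each (hypothetical data).
guide_to_gene = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
fold = guide_fold_changes({"a1": 30, "a2": 50, "b1": 10, "b2": 10},
                          {"a1": 10, "a2": 10, "b1": 40, "b2": 40})
scores = gene_medians(fold, guide_to_gene)
print(scores)           # {'A': 4.0, 'B': 0.25}
print(zscores(scores))  # {'A': 1.0, 'B': -1.0}
```

In a two-system analysis, `zscores` would be computed separately for H2AmC2 and H2AmC3 and combined with `average_systems` before the resampling step; the sketch applies `empirical_p` to a single score for brevity.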