Massively parallel reporter assays of melanoma risk variants identify MX2 as a gene promoting melanoma

Genome-wide association studies (GWAS) have identified ~20 melanoma susceptibility loci, most of which are not functionally characterized. Here we report an approach integrating massively-parallel reporter assays (MPRA) with cell-type-specific epigenome and expression quantitative trait loci (eQTL) to identify susceptibility genes/variants from multiple GWAS loci. From 832 high-LD variants, we identify 39 candidate functional variants from 14 loci displaying allelic transcriptional activity, a subset of which corroborates four colocalizing melanocyte cis-eQTL genes. Among these, we further characterize the locus encompassing the HIV-1 restriction gene, MX2 (Chr21q22.3), and validate a functional intronic variant, rs398206. rs398206 mediates the binding of the transcription factor, YY1, to increase MX2 levels, consistent with the cis-eQTL of MX2 in primary human melanocytes. Melanocyte-specific expression of human MX2 in a zebrafish model demonstrates accelerated melanoma formation in a BRAFV600E background. Our integrative approach streamlines GWAS follow-up studies and highlights a pleiotropic function of MX2 in melanoma susceptibility.


Supplementary Note
In the Results section describing sQTL, we used LeafCutter 1 to assess splice junctionlevel QTL focusing on alternative intron exclusion and exon joining within a shared cluster as opposed to estimated isoform-levels. sQTL initially indicated that rs398206 was associated with an alternative intron excision of MX2 in primary melanocytes in the opposite direction of MX2 eQTL (P = 7.79e-7, slope = -0.53; Supplementary Fig 10A-B). To seek additional support of this observation in other tissue types, we performed sQTL analysis in 44 GTEx tissue types. In 21 tissue types including blood, testis, ovary, and fibroblasts, the same pattern of significant sQTLs were observed for rs398206 or correlated SNPs (lowest D' = 0.87, EUR), where the protective, C allele favors an alternative junction usage producing an alternative MX2 transcript (ENST00000418103). Moreover, in 10 tissue types, this sQTL was reciprocated by risk, A allele favoring the junction usage producing the full-length transcript (ENST00000330714), raising a potential hypothesis of alternative promoter usage of MX2 in these tissue types. Thorough inspection of raw data in melanocytes, however, indicated that the finding was driven by the junction reads of low coverage (i.e. < 4% of the samples showed three or more reads mapped to the junction spanning Chr21:42742322:42748763) ( Supplementary Fig 10C-F). Since this junction in melanocytes was not mapped to the reference genome (Ensembl75), nor was it detected in PacBio long-read sequencing (data not shown), we performed isoform-specific qPCR of two other MX2 alternative transcript isoforms (ENST00000543692 and ENST00000418103), which are predicted to use similar junctions. The results for both isoforms displayed similar expression patterns to that of the full-length isoform (ENST00000330714) relative to rs398206 genotypes, suggesting that sQTL finding was false-positive ( Supplementary Fig 10C-F). Together these data suggest the main effect of MX2 eQTL in melanocytes was not driven by alternative isoforms or splicing events.
In the Results section describing immune infiltrates, to explore the possibility that MX2 plays its roles mainly through immune response during melanomagenesis, we asked if MX2 levels are correlated with immune cell infiltration in TCGA melanoma samples. Using cell type deconvolution programs, TIMER 2 and CIBERSORT 3 , we observed weak correlations between MX2 levels and infiltration of CD4+ T cells, neutrophils and dendritic cells among 6 cell type models (TIMER; purity-corrected partial Pearson correlation r = 0.221, 0.279, 0.273, and P = 2.36e-6, 1.68e-9, 4.31e-9, respectively; Supplementary Fig 17A). When we examined correlations with proportions of 22 types of immune cells established by CIBERSORT, we did not observe a significant correlation with MX2 levels (data not shown). Instead, weak correlations between melanoma-associated rs398206 A allele count and fractions of Macrophage M1 and M2 were observed (Pearson correlation r = 0.204 and 0.211, and P = 0.01 and 0.013, respectively; n = 147 samples with deconvolution P < 0.05; Supplementary Fig  17B). Together these data indicate that MX2 expression levels in melanomas (or estimated effect of melanoma cell population in tumors for the case of purity-corrected correlation) are slightly correlated with infiltration of a subset of immune cell types. It is possible that MX2 has roles in melanoma formation and progression through pathways involved in tumor immunity, but these data alone do not provide sufficient evidence.
In the Results section describing MX2 overexpression in melanocytes, to rule out the possibility that we missed immediate changes in melanocyte transcriptome upon MX2 overexpression, we performed RNA-seq of MX2 over-expressed primary melanocytes at an earlier time point. For this analysis, we picked one of the three cell lines originally tested for RNA-seq (C23, the line that displayed growth effect by 100ng/ml doxycycline treatment over 72hr period; Fig 6B). We first tested different durations of doxycycline induction (3,6,24, and 72hr) on this melanocyte line and selected 6hr as a time point for RNA-seq based on level of MX2 induction at the protein level (Supplementary Figure 18D). We then compared transcriptomes of melanocytes with or without 100ng/ml doxycycline treatment at the 6hr time point and additionally assessed the 72hr time point as a control set in three biological replicates for RNA-seq analysis. We did not include empty virus control at this time based on no apparent effect of doxycycline or virus transduction in enriched pathways (Supplementary Table 16).
Differentially expressed genes (DEG) in MX2-overexpressed melanocytes at both 6hr and 72hr were much fewer at FDR > 0.1 cutoff (compared to 158 DEG at 72hr in the previous results). Given that we used only one cell line of a moderate level of MX2 induction for this analysis (6.2-fold at 6hr and 1.6-fold at 72hr at RNA levels; Supplementary Data 5-6; Supplementary Figure 18E), we relaxed significance cutoff to select genes for pathways analyses. At nominal P < 0.05, 258 DEG were identified from 6hr induction (Supplementary Data 5) and 485 DEG from 72hr induction (Supplementary Data 6). Among the original 158 DEG, 132 genes (82%) displayed the same direction of fold change in the 72hr dataset from the second experiment, demonstrating consistency between two sets of experiments even with relatively lower detection levels in the second set. IPA analysis using 258 DEGs from 6hr induction displayed enrichment of pathways including those involved in intracellular second messenger signaling, cancer, cell growth, neurotransmitters, cardiovascular signaling, and cellular immune response (Supplementary Table 13). A similar profile of pathways was commonly enriched in both 6hr and 72hr time points (Supplementary Table 14). Together with our initial finding, these results do not conclusively suggest a role of single dominant pathway but rather multiple alternative hypotheses not excluding immune response-related mechanisms. The current results of earlier time point (6hr) detected much lower differential expression events compared to those from later time point (72hr) even with higher MX2 induction at mRNA and protein levels. These results suggest that short-term transcriptomic changes by MX2 induction in the melanocytes at 6hr are not greatly different from those reflected in the transcriptome at 72hr.

Supplementary
Supplementary Figure 7 MPRA results from all 16 melanoma loci (A) Melanoma GWAS variants were plotted for their inverted regression P-values of allelic transcriptional difference in UACC903 melanoma cells. Twosided Wald test with robust sandwich type variance estimate was used. Multiple comparisons were adjusted using Benjamini & Hochberg method. Allelic effect size is shown on the X-axis as log-transformed allelic fold difference using the ratio of RNA TPM over DNA TPM (TPM ratio). Putative function of 39 significant MPRA variants are shown as activator, repressor, or both (expression levels of either allele is higher, lower, or higher and lower than those of scrambled sequence). Chromosome band and SNP ID are shown for the variant displaying the lowest FDR from each locus. (B) Transcriptional activity was shown as log-transformed RNA TPM over DNA TPM (TPM ratio) for 39 significant variants (FDR < 0.01), non-significant (the rest of tested variants), and negative controls (8 variants) using data from UACC903 cells. Two-sided Mann-Whitney U test was performed. P = 0 ("Significant variants" vs "Non-significant variants") and P = 3.34e-40 ("Significant variants", "Negative control variants").

SFig9
Supplementary Figure 9 (A) Allelic transcriptional activity fold increase (relative to the allele with lower activity) from MPRA was plotted against allelic fold increase of transcription factor binding motif score (relative to the allele with lower score) for 39 MPRA-significant variants (FDR < 0.01) and non-significant variants. Fold changes in both axes are log2-transformed. The scores for the most significant TF for each variant was used (TF motif prediction P < 0.001). Pearson correlation r scores for each group are shown. P = 0.149 for significant variants, and P = 0.556 for non-significant variants. (B) Posterior probabilities (left Y-axis in blue) of MPRA variants are plotted for subsets of variants with increasing FDR cutoffs. Center lines show the medians; box limits indicate the 25th and 75th percentiles; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles; n = 432 (MPRA-tested variants with probability scores available). Red dashed line indicates the median probability score when including up to FDR = 100%. Enrichment ratios of median posterior probability score from each subset over that of FDR = 100% group are shown as dots and a trend line on the right Y-axis in red. (C) A flowchart of variant prioritization using melanocyte eQTL. (D) Individual luciferase activity assays of 145bp sequences encompassing rs398206 is shown for UACC502. pGL4.23 construct including minimal TATA promoter was used. One representative set is shown from three biological replicates. Mean with SEM, n = 6. All constructs are significantly higher than pGL4.23 (TATA) control (P < 0.0001). Two-tailed, unpaired t-test assuming unequal variance.     Western blotting using anti-YY1 and anti-GAPDH antibodies and cell lysates of UACC903 transfected with four different siRNAs targeting YY1. Proteins were isolated at 72hrs following transfection. One representative set from three biological replicates is shown. Non-targeting: non-targeting siRNA, siGAPDH: positive control siRNA targeting GAPDH. (B) CRISPRi using dCAS9-KRAB-MeCP2 and four gRNAs targeting the area immediately surrounding rs398206. Levels of MX1 transcript (GAPDH-normalized) are shown as fold change over those from nontargeting gRNA. Three biological replicates of n = 6 were combined (total n=18, except gRNA-3, n=17). gRNAs 1, 3, and 4 directly overlap rs398206, while gRNA 2 targets ~25bp upstream of rs398206. (C-D) CRISPRi using dCAS9 or dCAS9-KRAB-MeCP2 and gRNAs 1 and 3. Levels of MX2 (C) or MX1 (D) transcript (GAPDH-normalized) are shown as fold change over those from empty gRNA vector, pRC0215, within each set using the same type of dCAS9 constructs. Three biological replicates of n = 5 were combined (total n = 15). A same set of experiments using non-targeting gRNA as a control showed similar results (data not shown). AAVS1 (gRNA targeting adeno-associated virus integration site on Chr19). Box: Median and 25 th to 75 th percentile. Whisker: 10 th to 90 th percentile. P -values are shown from one-sample Wilcoxson test (two-sided) for difference from non-targeting siRNA/gRNA (P < 0.05 are shown in red). Dotted line denotes the MX2 levels in non-targeting siRNA/gRNA control. (E) Western blotting was performed using anti-CAS9 (recognizing both ~160 kDa dCAS9 and ~200 kDa dCAS9-KRAB-MeCP2) and anti-GAPDH antibodies, and cell lysates of UACC903 co-transfected with dCAS9 or dCAS9-KRAB-MeCP2 and indicated gRNAs. Relative positions of protein ladders are shown on the left side for the anti-CAS9 blot. Proteins were collected from one representative set of three sets of total transfections.