Gene expression imputation identifies candidate genes and susceptibility loci associated with cutaneous squamous cell carcinoma

Cutaneous squamous cell carcinoma (cSCC) is a common skin cancer with genetic susceptibility loci identified in recent genome-wide association studies (GWAS). Transcriptome-wide association studies (TWAS) using imputed gene expression levels can identify additional gene-level associations. Here we impute gene expression levels in 6,891 cSCC cases and 54,566 controls in the Kaiser Permanente Genetic Epidemiology Research in Adult Health and Aging (GERA) cohort and 25,558 self-reported cSCC cases and 673,788 controls from 23andMe. In a discovery-validation study, we identify 19 loci containing 33 genes whose imputed expression levels are associated with cSCC at false discovery rate < 10% in the GERA cohort and validate 15 of these candidate genes at Bonferroni significance in the 23andMe dataset, including eight genes in five novel susceptibility loci and seven genes in four previously associated loci. These results suggest genetic mechanisms contributing to cSCC risk and illustrate advantages and disadvantages of TWAS as a supplement to traditional GWAS analyses.
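The two-stage filter described above (an FDR threshold in the discovery cohort, then a Bonferroni threshold in the validation cohort) can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which FDR procedure was used, so the Benjamini-Hochberg step-up procedure is assumed here, the family-wise alpha of 0.05 is likewise assumed, and the p-values are invented for the example.

```python
def benjamini_hochberg(pvalues, fdr=0.10):
    """Indices rejected by the Benjamini-Hochberg step-up procedure (assumed method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * fdr.
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Everything ranked at or below the cutoff is rejected.
    return sorted(order[:cutoff_rank])


def bonferroni(pvalues, alpha=0.05):
    """Indices whose p-value passes alpha divided by the number of tests (alpha is assumed)."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / m]


# Hypothetical discovery-phase p-values for five genes (not from the study).
discovery_p = [0.0001, 0.004, 0.03, 0.2, 0.6]
candidates = benjamini_hochberg(discovery_p, fdr=0.10)

# Hypothetical validation-phase p-values for the candidate genes only.
validation_p = [1e-6, 0.01, 0.3]
validated = [candidates[i] for i in bonferroni(validation_p, alpha=0.05)]
```

Note that the Bonferroni correction in the second stage divides by the number of candidates carried forward, not by the number of genes tested in discovery, which is one reason a two-stage design can retain power.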


Supplementary Figure 1
Manhattan plot of individual SNP cSCC associations at the 1q21 locus. Significance levels from the Kaiser cSCC GWAS¹ are plotted for all SNPs in the region (gray circles) and for the subset of SNPs (red diamonds) with nonzero coefficients in one or more of the prediXcan expression models for the associated genes at this locus. The locations of the associated genes (red text) are also shown, with arrows indicating the transcribed strand and ticks indicating exons. Plotting was done using LocusZoom².
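The SNPs highlighted in these plots are those with nonzero coefficients in an expression model. As a minimal sketch, assuming the model is a linear weighted sum of allele dosages (0 to 2 copies of the effect allele), the highlighted subset and the imputed expression for one individual could be computed as below. The SNP identifiers, weights, and dosages are hypothetical, and the function names are illustrative rather than part of any prediXcan interface.

```python
def nonzero_weight_snps(weights):
    """SNPs that contribute to the model, i.e. those with a nonzero coefficient."""
    return {snp for snp, w in weights.items() if w != 0.0}


def impute_expression(weights, dosages):
    """Imputed expression as the sum of weight * dosage over contributing SNPs.

    Raises KeyError if a contributing SNP has no genotype for this individual.
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if w != 0.0)


# Hypothetical model weights for one gene and dosages for one individual.
weights = {"snp_a": 0.5, "snp_b": 0.0, "snp_c": -0.25}
dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 1}

highlighted = nonzero_weight_snps(weights)          # SNPs that would be drawn as red diamonds
expression = impute_expression(weights, dosages)    # 0.5 * 2 + (-0.25) * 1
```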

Supplementary Figure 6
Manhattan plot of individual SNP cSCC associations at the 6p21 locus. Significance levels from the Kaiser cSCC GWAS¹ are plotted for all SNPs in the region (gray circles) and for the subset of SNPs (red diamonds) with nonzero coefficients in one or more of the prediXcan expression models for the associated genes at this locus. The locations of the associated genes (red text) and other candidate genes from the previous GWAS (black text) are also shown, with arrows indicating the transcribed strand and ticks indicating exons. Plotting was done using LocusZoom².
Manhattan plot of individual SNP cSCC associations at the 15q13 locus. Significance levels from the Kaiser cSCC GWAS¹ for all SNPs in the region (gray circles) and the subset of SNPs (red diamonds) with nonzero coefficients in the prediXcan expression model for the associated gene (red text), plotted as in Supplementary Fig. 6.
Manhattan plot of individual SNP cSCC associations at the 16q24 locus. Significance levels from the Kaiser cSCC GWAS¹ for all SNPs in the region (gray circles) and the subset of SNPs (red diamonds) with nonzero coefficients in one or more of the prediXcan expression models for the associated genes (red text), plotted as in Supplementary Fig. 6.
Manhattan plot of individual SNP cSCC associations at the 20q11 locus. Significance levels from the Kaiser cSCC GWAS¹ for all SNPs in the region (gray circles) and the subset of SNPs (red diamonds) with nonzero coefficients in the prediXcan expression model for the associated gene (red text), plotted as in Supplementary Fig. 6.

Supplementary Table 1 Sun-exposed skin associations with FDR < 10% in the Kaiser GERA cohort.
All genes in sun-exposed skin whose imputed expression levels were associated with cSCC at FDR < 10% in the Kaiser GERA cohort in the discovery phase, ordered by significance in the Kaiser GERA cohort. Their validation results in the 23andMe dataset are also shown.
P-value, from Wald test; Beta SE, standard error in effect size (beta).

Supplementary Table 2 Non-sun-exposed skin associations with FDR < 10% in the Kaiser GERA cohort.
All genes in non-sun-exposed skin whose imputed expression levels were associated with cSCC at FDR < 10% in the Kaiser GERA cohort in the discovery phase, ordered by significance in the Kaiser GERA cohort. Their validation results in the 23andMe dataset are also shown.
P-value, from Wald test; Beta SE, standard error in effect size (beta).

Supplementary Table 3 LCL associations with FDR < 10% in the Kaiser GERA cohort.
All genes in LCLs whose imputed expression levels were associated with cSCC at FDR < 10% in the Kaiser GERA cohort in the discovery phase, ordered by significance in the Kaiser GERA cohort. Their validation results in the 23andMe dataset are also shown.

Supplementary Table 4 Whole blood associations with FDR < 10% in the Kaiser GERA cohort.
All genes in whole blood whose imputed expression levels were associated with cSCC at FDR < 10% in the Kaiser GERA cohort in the discovery phase, ordered by significance in the Kaiser GERA cohort. Their validation results in the 23andMe dataset are also shown.

Supplementary Table 6 Adjusted logistic regressions in the 1q21 locus.
Logistic regressions of cSCC case/control status against one or more imputed gene expression levels (as shown) performed in the Kaiser GERA cohort. † Additional covariates included sex, age, and ten ancestry principal components (not shown), as described in Methods.
SE skin, sun-exposed skin.

Results of a TWAS performed with a Bonferroni-corrected significance threshold in the Kaiser GERA cohort, for comparison with the previous Kaiser GWAS¹ and with the discovery-validation approach used in the main text.
R², squared correlation coefficient for the prediXcan imputation model; P-value, from Wald test; Beta SE, standard error in effect size (beta); exp, exposed; Skin (non-sun-exp), non-sun-exposed skin; Skin (sun-exp), sun-exposed skin.
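The table legends report an effect size (beta), its standard error (Beta SE), and a Wald test P-value. As a sketch of how these three quantities relate, assuming the usual two-sided Wald test with a standard normal reference distribution (the legends do not specify one- or two-sided), the P-value can be recovered from beta and its standard error. The numeric inputs below are hypothetical.

```python
import math


def wald_p_value(beta, se):
    """Two-sided Wald test P-value for the null hypothesis beta = 0.

    Assumes z = beta / se follows a standard normal distribution under the null.
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = beta / se
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical effect size and standard error (not taken from the tables).
p = wald_p_value(beta=0.2, se=0.05)  # z = 4
```

Using `math.erfc` rather than `1 - erf` avoids loss of precision for the very small P-values typical of association studies.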