Kidney cancer is the 12th most common malignancy in the world with an estimated 337,860 new cases and 143,406 deaths in 2012 [1]. Renal cell carcinoma (RCC) accounts for ~90% of all kidney cancers [2]. The incidence differs significantly by sex, with two-fold higher rates for men than women. The 2:1 sex ratio has been consistent over time, across different age groups, geographical locations and ethnic backgrounds; hence, the male excess cannot be explained by differences in environmental or lifestyle exposures and hormonal factors alone [3, 4]. Although there is recent evidence of sexual dimorphism at the genomic level, sex chromosome differences have gained the most attention [5]. The first comprehensive sex-specific somatic alteration analysis of 13 cancer types from The Cancer Genome Atlas (TCGA) revealed extensive sex differences in autosomal gene expression and methylation signatures of kidney cancer, although it did not consider germline variation between sexes [6]. A genetic contribution to RCC susceptibility is well documented. Besides the rare inherited germline variants implicated in some familial RCCs, e.g., VHL (von Hippel-Lindau disease), MET (hereditary papillary renal cancer), FLCN (Birt-Hogg-Dubé syndrome) and FH (hereditary leiomyomatosis and renal cell cancer) genes [7], large genome-wide association studies (GWAS) have identified 13 autosomal RCC susceptibility loci implicating several candidate genes (supplementary table 1) [8,9,10,11,12,13]. A role for sex in modifying genetic susceptibility to RCC is possible, but, unlike many other sexually dimorphic diseases and traits [14,15,16], no genome-wide, systematic effort to study possible sex-specific genetic contributions to kidney cancer risk has been undertaken.

We conducted a sex-specific genome-wide association analysis of kidney GWAS datasets consisting of 13,230 individuals (8143 men, 5087 women) using approximately 6 million genotyped and imputed SNPs in sex-stratified and sex-interaction models, and replicated the top findings in another 8113 men and 2974 women. To explore the possibility of sex-specific gene regulation of the top genotypic variants, we performed an expression quantitative trait loci (eQTL) analysis using paired genotyping and gene expression data from normal and kidney tumour tissues of a subset of the genetic discovery cohort.


Genetic association analysis


The International Agency for Research on Cancer (IARC) kidney cancer GWAS have been previously described [12]. The dataset consisted of two IARC-Centre National de Genotypage (CNG) scans using 11 studies recruited from 18 countries and included a total of 5219 RCC cases (1992 women, 3227 men) and 8011 controls (3095 women, 4916 men) of European descent, the first being genotyped using HumanHap 317k, 550 or 610Q arrays, and the second using Omni5 and OmniExpress arrays. Quality control (QC) assessments applied to the data have been previously described [8, 12]. Briefly, individuals were excluded for a genotype success rate of < 95%, discordant sex, duplication or relatedness (IBD score > 0.185), or < 80% European ancestry. SNP exclusion criteria included call rate < 90%, departure from Hardy-Weinberg equilibrium in controls at P < 10⁻⁷, and MAF < 0.05. Genotypes were imputed with minimac version 3 using 1094 subjects from the 1000 Genomes Project (phase 1 release 3) as the reference panel, and ~6 million SNPs were retained for the final analysis after post-imputation QC (r² > 0.3). Genome Reference Consortium Human Build 37 (GRCh37/hg19) was used to map variants. Population stratification analysis (implemented in EIGENSTRAT using EIGENSOFT software version 5.0.2) [17] on the pooled dataset identified 19 significant (P < 0.05) eigenvectors, which were significantly associated with the country of recruitment. Informed consent from the study participants and approval from the IARC Institutional Review Board (IARC Ethics Committee) were obtained.
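The exclusion criteria above can be read as simple threshold filters. The following is a minimal sketch in Python, assuming per-sample and per-SNP summary statistics are already available; the record types, field names and example values are hypothetical and are not taken from the study pipeline.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    sample_id: str            # hypothetical identifier
    call_rate: float          # fraction of genotypes successfully called
    sex_concordant: bool      # reported sex matches genotype-inferred sex
    max_ibd: float            # highest IBD score against any other sample
    european_ancestry: float  # estimated fraction of European ancestry


@dataclass
class Snp:
    rsid: str                 # hypothetical identifier
    call_rate: float          # fraction of samples with a called genotype
    hwe_p_controls: float     # Hardy-Weinberg test P-value in controls
    maf: float                # minor allele frequency


def sample_passes_qc(s: Sample) -> bool:
    """Keep a sample only if none of the individual-level exclusions apply."""
    return (
        s.call_rate >= 0.95
        and s.sex_concordant
        and s.max_ibd <= 0.185
        and s.european_ancestry >= 0.80
    )


def snp_passes_qc(v: Snp) -> bool:
    """Keep a SNP only if none of the SNP-level exclusions apply."""
    return v.call_rate >= 0.90 and v.hwe_p_controls >= 1e-7 and v.maf >= 0.05


# Made-up records, for illustration only.
good_sample = Sample("S1", 0.99, True, 0.02, 0.97)
related_sample = Sample("S2", 0.99, True, 0.25, 0.97)
rare_snp = Snp("rs_example", 0.99, 0.4, 0.01)
```

Here `sample_passes_qc(good_sample)` is true, `related_sample` fails on the IBD threshold, and `rare_snp` fails on MAF.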

SNP selection

Sexually dimorphic SNPs could have (i) a concordant effect direction (CED), if the association is present (i.e., significant after multiple testing correction) for one sex and nominally significant and directionally concordant for the other, (ii) single sex effect (SSE), if the association is present for one sex only, or (iii) opposite effect direction (OED), if the association is present for one sex, at least nominally significant and in opposite direction for the other sex [16]. Previous studies on sex-specific genetic associations indicated that sex-specific scans had a higher probability to select SNPs with CED or SSE signal, while sex-interaction scans had a higher probability to select SNPs with OED [16]. Therefore, in the discovery phase, we conducted both sex-stratified and sex-interaction scans. For the sex-stratified analysis, a log-additive model using unconditional logistic regression adjusted for age, study and the significant eigenvectors was used to identify associations. For the sex-interaction analysis, a regression model including the main effects of the genotypes, sex, covariates and an interaction term for genotypes and sex was used to detect association. We applied a false-discovery-rate (FDR) approach separately for male and female datasets to account for multiple testing and the difference in sample size. This allows the stratified study design of the discovery stage to be less stringent in identifying hits, while keeping the stringency of the conventional Bonferroni cutoff in the combined (discovery + replication) stage for the final interpretation of results. FDR q-value cut-offs of 5 and 30% were used to detect significant and suggestive SNPs, respectively, in each of the datasets. Accordingly, p-value thresholds of 1 × 10⁻⁶ and 4 × 10⁻⁶ were considered significant (5% FDR) and p-value thresholds of 1.1 × 10⁻⁵ and 5 × 10⁻⁵ were considered suggestive (30% FDR) for female and male datasets, respectively.
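To illustrate how an FDR q-value cut-off translates into a dataset-specific p-value threshold, here is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. It is an assumption that this variant of FDR control matches the one applied to the GWAS scans (the text names the Benjamini–Hochberg procedure only for the eQTL analysis); the function names and the toy p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted q-values in the same order as the input P-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def p_threshold(pvalues, q_cutoff):
    """Largest P-value whose q-value is at or below the cut-off (None if none)."""
    qvalues = benjamini_hochberg(pvalues)
    passing = [p for p, q in zip(pvalues, qvalues) if q <= q_cutoff]
    return max(passing) if passing else None


# Toy P-values standing in for one sex-specific scan.
toy_pvalues = [0.01, 0.04, 0.03, 0.005]
toy_qvalues = benjamini_hochberg(toy_pvalues)
```

Because the adjustment depends on the full distribution of p-values in each scan, the same q-value cut-off can give different p-value thresholds for the male and female datasets.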
In addition to the significant and suggestive sex-specific p-values, a nominally significant (P < 0.05) sex interaction p-value was taken into account in order to identify SNPs showing sex difference. The same FDR cut-offs were used to detect significant and suggestive signals in interaction tests (Supplementary figure S1). All association analyses were conducted using R statistical software version 3.3 on a high-performance computing cluster. In addition, a clear LD cluster (at least one correlated SNP with r² > 0.5 within a 1 Mb window) for the SNP was also considered as a criterion to avoid false positives. Among multiple SNPs in LD (r² > 0.8, with an LD window of 1 Mb) showing an association, we chose the one with the lowest missing rate and p-value. All regional LD plots were generated in LocusZoom using genome build hg19 and 1000 Genomes EUR as LD population [18]. To focus on common SNPs and to avoid spurious association, as a QC step we removed SNPs with MAF < 0.05 and those without an LD cluster (supplementary figure S2).
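The three categories of sexual dimorphism used for SNP selection (CED, SSE, OED) can be expressed as a small decision rule. The sketch below is one possible reading of those definitions, not the authors' code; the function name and the tie-breaking rule when both sexes pass their threshold are assumptions. The first call uses the discovery estimates reported for rs4903064 together with the 5% FDR thresholds stated above; the other inputs are hypothetical.

```python
import math


def classify_dimorphism(beta_f, p_f, beta_m, p_m, thr_f, thr_m, nominal=0.05):
    """Label a SNP as 'CED', 'SSE', 'OED' or 'none' from sex-stratified results.

    beta_* are log odds ratios, p_* the sex-specific P-values and thr_* the
    sex-specific thresholds that define an association as "present".
    """
    sig_f = p_f < thr_f
    sig_m = p_m < thr_m
    if not (sig_f or sig_m):
        return "none"
    # Assumption: if both sexes pass, the sex with the smaller P-value leads.
    lead_is_female = sig_f and (not sig_m or p_f <= p_m)
    lead_beta = beta_f if lead_is_female else beta_m
    other_beta, other_p = (beta_m, p_m) if lead_is_female else (beta_f, p_f)
    if other_p >= nominal:
        return "SSE"  # association present in one sex only
    same_direction = (lead_beta > 0) == (other_beta > 0)
    return "CED" if same_direction else "OED"


# rs4903064 in the discovery series: OR 1.47 (P = 9e-14) in women,
# OR 1.09 (P = 0.02) in men.
label_dpf3 = classify_dimorphism(
    math.log(1.47), 9e-14, math.log(1.09), 0.02, thr_f=1e-6, thr_m=4e-6
)
# Hypothetical SNPs illustrating the other two categories.
label_sse = classify_dimorphism(0.02, 0.60, 0.25, 1e-7, thr_f=1e-6, thr_m=4e-6)
label_oed = classify_dimorphism(-0.10, 0.01, 0.25, 1e-7, thr_f=1e-6, thr_m=4e-6)
```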

In-silico replication and joint meta-analysis

In-silico replication of the top hits from the discovery phase was conducted using 3660 cases (1399 women, 2261 men) and 7427 controls (1575 women, 5852 men) from two previously published National Cancer Institute (NCI, Bethesda, Maryland, USA) and one MD Anderson Cancer Center (MDA, Texas, USA) RCC GWAS scans genotyped using OmniExpress, Omni2.5, HumanHap 550, 610 and 660W beadchip arrays. Quality control and genotype imputation were done as described previously [8, 9, 12]. For each study, sex-stratified and sex-interaction models for all significant and suggestive SNPs were tested assuming a log-additive model of genetic effects using unconditional logistic regression with adjustment for age, study centre, and significant eigenvectors. The odds ratios and 95% confidence intervals per SNP from each study were meta-analysed using fixed-effect models implemented in GWAMA [19] to obtain the combined estimates from the replication series. We also performed a joint meta-analysis of results from the discovery and replication series on 8061 women and 16,256 men to obtain the combined effect estimates of the tested SNPs. Heterogeneity in genetic effects across datasets was assessed using the I² and Cochran’s Q statistics.
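For readers unfamiliar with fixed-effect meta-analysis, the following is a minimal inverse-variance sketch in Python that also computes Cochran's Q and I². It is not GWAMA's implementation, and the helper names are hypothetical. The example inputs are the odds ratios and confidence intervals reported in this article for rs4903064 in women (discovery and replication series); pooling just these two summaries is a simplification, so the output need not equal the published joint estimate.

```python
import math


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover log(OR) and its standard error from a 95% confidence interval."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    return beta, se


def fixed_effect_meta(estimates):
    """Inverse-variance pooling of (beta, se) pairs.

    Returns pooled beta, pooled SE, Cochran's Q and I^2 (in percent).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, pooled_se, q, i2


# rs4903064 in women: discovery and replication estimates from the text.
female_estimates = [
    log_or_and_se(1.47, 1.33, 1.62),
    log_or_and_se(1.24, 1.07, 1.42),
]
pooled_beta, pooled_se, q_stat, i2 = fixed_effect_meta(female_estimates)
pooled_or = math.exp(pooled_beta)
```

The pooled estimate always lies between the input estimates and is pulled towards the one with the smaller standard error, which is why the larger discovery series dominates here.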

Expression QTL analysis of the selected SNPs

To identify gene regulatory effects of the 17 identified SNPs, we examined transcript expression near each of the SNPs in 101 tumour-adjacent normal and 259 tumour kidney tissues in women and 178 tumour-adjacent normal and 385 tumour kidney tissues in men. All of these kidney samples were part of the discovery GWAS study (112 from the first IARC GWAS and 532 samples from the second IARC GWAS) and the eQTL analysis was performed on matched gene expression and GWAS datasets. Expression analysis was conducted using Illumina HumanHT-12 v4 expression BeadChips (Illumina, Inc., San Diego), normalised using variance stabilising transformation (VST) and quantile normalisation. Out of the 17 transcripts, 12 transcripts in normal samples and 14 in tumours were expressed in <10% of the samples. Expression for MIR4472-1 was not available for either tumour or normal samples in our dataset. For the few transcripts showing sex-difference in expression in our dataset, we also downloaded raw counts of RNA-seq data from 60 normal and 459 tumour samples from TCGA kidney renal cell carcinoma (TCGA-KIRC) and used them as a validation cohort. For eQTL analysis, additive linear models were used to test the association between each transcript and SNP with age, country, tumour stage and grade as covariates. All transcripts with expression in <10% of the samples were filtered out from eQTL analysis. All available transcripts mapping to each SNP were evaluated, and an FDR-adjusted p-value < 0.05 using the Benjamini–Hochberg procedure was used as the statistical significance threshold. All probes overlapping SNPs with European-ancestry MAF > 0.01 were filtered out. Colocalization of GWAS and eQTL signals was analysed using eCAVIAR software [20].
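The additive eQTL model regresses normalised expression on the number of copies of the tested allele (0, 1 or 2). The sketch below shows only that core step as a single-predictor least-squares fit in plain Python; it deliberately omits the covariates named above (age, country, tumour stage and grade), and the function name and data are hypothetical.

```python
import math


def additive_eqtl(dosages, expression):
    """Least-squares slope of expression on allele dosage.

    Returns (beta, standard error, t statistic); beta is the change in
    expression per additional copy of the tested allele.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se, beta / se


# Made-up dosages and normalised expression values for six samples.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_expression = [7.0, 7.2, 7.3, 7.5, 7.6, 7.8]
toy_beta, toy_se, toy_t = additive_eqtl(toy_dosages, toy_expression)
```

Fitting the same model separately in samples from women and men gives sex-specific slopes of the kind compared in a sex-stratified eQTL analysis.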


In the discovery phase, sex-specific analysis identified an excess of SNPs with association p-values < 0.05. However, only a few loci reached the significant (5% FDR) or suggestive (30% FDR) association thresholds, among which only 4 loci in women and 7 in men attained the Bonferroni genome-wide significance threshold (P < 5 × 10⁻⁸) (Fig. 1). The association quantile-quantile plots indicated little inflation for either dataset (λfemale = 1.02, λmale = 1.04; supplementary figure S3a, b). Following MAF and LD based QC, a total of 17 sex-specific SNPs (6 significant and 11 suggestive) were selected for follow-up. Among the 17 SNPs, 15 were single-sex signals (SSE), and the other 2 SNPs, rs4903064 and rs6554676, showed CED, being strongly associated in women and nominally in men (Supplementary table 2). Among the 15 single-sex signals, 7 were male-specific and 8 were female-specific (Supplementary table 3). The strongest association was observed for rs4903064 in females (ORfemale = 1.47 [95% CI = 1.33–1.62], Pfemale = 9 × 10⁻¹⁴ compared with ORmale = 1.09 [95% CI = 1.01–1.19], Pmale = 0.02; Pinteraction = 1.7 × 10⁻⁵, Table 1) at 14q24.2 mapping to an intronic region of DPF3 (Fig. 2). Other significant SNPs in the discovery series, rs2121266 at 2p21, rs12930199 at 16p13.3 and rs1548141 at 3q11.2, mapped to the intronic regions of EPAS1, RBFOX1 and OR5H6, respectively. Significant SNPs rs10484683 and rs78971134 mapped to intergenic regions at 6q24.3 and 12q23.3, with the nearest genes being SAMD5 and BTBD11, respectively. For rs10484683 (SAMD5) the minor allele frequencies were similar for male and female cases. Regional LD plots for each of the loci are detailed in Supplementary Figure S4 (a) and (b).
In contrast, the sex-interaction scan did not identify any SNP even at 30% FDR, except for the very rare variant rs141939233 (NC_000003.11:g.94783768 C > G, MAF = 0.001, P = 9.83 × 10⁻⁸), which did not meet the inclusion criteria for SNPs (MAF > 0.05); hence, no SNP could be carried forward (Supplementary figure S5a, b). Overall, all putative variants showed either CED or SSE and no SNP with an OED could be identified from the analysis.
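As a rough plausibility check on a reported sex difference, two sex-stratified odds ratios can be compared with a z-test on the difference of their log odds ratios. This is a different technique from the interaction term in the regression model used in this study, so it need not reproduce the reported Pinteraction; the function name is hypothetical. The inputs below are the discovery-series estimates for rs4903064.

```python
import math


def sex_difference_z_test(or_f, ci_f, or_m, ci_m):
    """z-test for equality of two independent log odds ratios.

    ci_f and ci_m are (lower, upper) bounds of the 95% confidence intervals.
    Returns the z statistic and a two-sided normal P-value.
    """
    beta_f = math.log(or_f)
    beta_m = math.log(or_m)
    se_f = (math.log(ci_f[1]) - math.log(ci_f[0])) / (2 * 1.96)
    se_m = (math.log(ci_m[1]) - math.log(ci_m[0])) / (2 * 1.96)
    z = (beta_f - beta_m) / math.sqrt(se_f ** 2 + se_m ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p


# rs4903064, discovery series: women OR 1.47 [1.33-1.62], men OR 1.09 [1.01-1.19].
z_dpf3, p_dpf3 = sex_difference_z_test(1.47, (1.33, 1.62), 1.09, (1.01, 1.19))
```

A large positive z indicates a stronger effect in women than in men, in line with the direction of the sex difference described for this locus.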

Fig. 1 Sex stratified genome-wide association scan in renal cell carcinoma: Manhattan plots of male and female specific association P-values from the discovery series

Table 1 Significant associations in the replication series and/or final joint meta-analysis

Fig. 2 Regional plot of the most significant sex-specific loci: P-values and LD among SNPs at 14q24.2 mapping to the DPF3 gene in women and men

In the in-silico replication of the 17 selected SNPs, only rs4903064 (at DPF3) independently replicated, with a stronger and significant (p < 0.05) effect in women compared with men (ORfemale = 1.24 [95% CI = 1.07–1.42], Pfemale = 3 × 10⁻³ compared with ORmale = 1.09 [95% CI = 0.98–1.21], Pmale = 0.09). In addition, rs147304092 (BBS9), rs13027293 (STEAP3) and rs6554676 (SLC6A18) showed nominally significant association with RCC risk for either men or women in the follow-up series (Table 1).

In the joint meta-analysis of the discovery and replication series for the selected 17 SNPs, a total of 4 SNPs attained genome-wide significance (Table 1). In addition to the consistent findings for DPF3, we found a stronger association for males for EPAS1 but with significant study heterogeneity in the female dataset. Two additional SNPs that reached genome-wide significance in the joint meta-analysis were rs10484683 at SAMD5 and rs78971134 near BTBD11 showing an association with risk for men but not women (Table 1). The results of replication and final meta-analysis of all the 17 SNPs are listed in supplementary table 3.

We also examined sex-specific expression of genes corresponding to the selected SNPs using expression data in normal and tumour kidney tissues from a subset of the discovery cohort. A significant sex difference in expression was detected for BTBD11 in normal tissues, and SAMD5 expression was higher in tumour tissues of women (Supplementary table 4). We replicated the sex difference in SAMD5 expression in the TCGA KIRC cohort and also observed significant differential expression between tumour and normal samples (Supplementary figure 6). We further tested the effect of the identified SNPs on expression of nearby genes by detecting cis expression quantitative trait loci (eQTL) in kidney tissues. No significant eQTL was identified for any of the 17 SNP-transcript pairs in normal tissues (supplementary table 5), but we identified rs4903064 as the lead cis-eQTL for DPF3 expression in tumours, with the highest colocalisation posterior probability with the GWAS signal (Supplementary figure S7). We further examined sex-specific cis-eQTLs and found a stronger effect of rs4903064 on DPF3 expression for women compared with men (βwomen = 0.06, Pwomen = 2.69 × 10⁻⁶ vs βmen = 0.03, Pmen = 0.004, Psex_interaction = 0.03; Fig. 3). A borderline association was also observed for rs6554676 and SLC6A18 expression in male tumour tissues only (βmale = −0.21, Pmale = 0.05 vs βfemale = −0.01, Pfemale = 0.94).

Fig. 3 cis-eQTL: boxplot displaying expression levels of the DPF3 gene stratified by the risk SNP rs4903064 in female and male kidney tumour tissues


We conducted the first systematic sex-specific genome-wide association analysis of RCC and confirmed sexually dimorphic associations for two previously known risk SNPs on DPF3 and EPAS1 at 14q24 and 2p21, respectively. In a joint meta-analysis of top hits using 8,061 women and 16,256 men, we also identified two additional suggestive SNPs (rs10484683 at SAMD5 and rs78971134 near BTBD11) with possible sex-specific associations – both being associated with a risk for men, and with no strong evidence of association for women.

The SNP rs4903064 at DPF3 gene was previously reported to be associated with increased RCC risk in a large GWAS [12], and our analysis confirms the previous reports of its sex-specific association. We further provide evidence that the association might be mediated through expression of the gene, with the magnitude of the association between the SNP and expression being greater for women than men. Polymorphisms at intron 1 of DPF3 are also associated with increased risk of breast cancer for women of European origin, but the SNPs were not in linkage disequilibrium with rs4903064 [21]. DPF3 is a histone acetylation and methylation reader of the BAF and PBAF chromatin remodelling complexes. Other components of the complexes like BAP1 and PBRM1 are frequently mutated in RCC and show sex differences in their mutation frequency and association with survival [22]. Chromatin-remodelling complexes regulate gene expression and loss of these chromatin modifiers has been associated with characteristic gene expression signatures in RCC [23, 24]. Sexually dimorphic gene expression is frequent in both murine [25] and human [6, 26] kidney normal and tumour tissues, and is hypothesised to contribute to the mechanism underlying sex-difference in kidney diseases including cancer [5, 27]. Therefore, variants of chromatin remodelling complex associated genes might modify RCC risk differently for men and women through sex-specific gene expression but the exact mechanism remains speculative and requires detailed functional studies in vitro.

The SNP rs2121266 mapping to intron 1 of the EPAS1 gene is in strong linkage disequilibrium (r² = 0.97, D′ = 1.00) with the previously described risk SNP rs11894252 at 2p21 [8]. Our finding of a stronger association for men is in agreement with previous findings of stronger associations for the proxy SNP rs11894252 for men (ORmale = 1.18 compared with ORfemale = 1.06, Pinteraction = 0.03) in RCC. Additionally, sexually dimorphic associations for EPAS1 variants were also observed for rs13419896 in lung squamous cell carcinoma [28] and rs4953354 in lung adenocarcinoma [29] in two independent reports from a Japanese population. EPAS1 (HIF2α) is a key gene in RCC and functions as a transcription factor in the VHL–HIF signalling axis [30, 31]. Intron 1 of EPAS1 contains oestrogen response elements (EREs), and oestrogen-dependent downregulation of EPAS1 occurs in invasive breast cancer cells [32]. RCC-related polymorphisms near other important genes such as CCND1 and MYC/PVT1 have been found on enhancers at tissue-specific HIF-binding loci in renal tubular cells [33, 34], implying a role for HIF in transactivation of key oncogenic pathways in RCC. Although rs2121266 and rs11894252 were not eQTLs for EPAS1, it is possible that these polymorphisms play a role in sex hormone mediated regulation of EPAS1 and transactivation of downstream genes, which may result in sex-specific susceptibility to RCC.

Two other SNPs that reached genome-wide significance in the joint analysis of discovery and replication series, namely rs10484683 at SAMD5 and rs78971134 near BTBD11, have not been previously reported to be associated with risk of RCC. For rs10484683 (SAMD5), the sex-specific finding from the discovery stage was driven by MAF differences in the controls only. Hence, the result remains unclear, and this might be the reason that the apparent association did not replicate. The SNP rs10484683 was not a significant cis-eQTL in normal or tumour kidney tissues in our series, but expression of SAMD5 varied significantly between tumour samples from men and women. Also, a significant overexpression of SAMD5 in tumours in both the current and TCGA datasets suggests its potential role in RCC pathogenesis. Although not previously implicated in RCC, SAMD5 overexpression has been found to be associated with bile duct and cholangiocarcinoma [35]. The BTBD11 gene codes for an ankyrin repeat and BTB/POZ domain-containing protein involved in regulation of proteolysis and protein ubiquitination. The functional implications of this gene are not well known in RCC, but SNPs near the BTBD11 gene were previously reported to be associated with kidney function traits [36] and diabetic kidney diseases [37] by large genome-wide studies; however, these SNPs were not in LD with the current risk variant rs78971134.

We confirmed sex-specific genetic associations of known RCC risk SNPs and identified new suggestive associations for one sex or the other. No clear pattern of an increased risk for men or decreased risk for women could be observed in the top sexually dimorphic SNPs, as would be otherwise anticipated for explaining the 2:1 sex ratios. Therefore, these SNPs are not conclusive for untangling the sex-specific genetic susceptibility that might contribute to the sex ratio in incidence. Due to technical constraints we could not examine sex chromosomal associations in the current study. Despite its large sample size, a drawback of the study is its limited statistical power to detect subtle sex-specific associations (SSEs or CEDs), particularly when analysing men and women separately. A male-specific association may simply reflect the lack of power to detect association in women, owing to the smaller sample size for women compared with men. To increase the power to detect sex-specific associations, the combination of results from different GWAS in sex-stratified meta-analyses is warranted. In addition to large, well-powered sex-specific genetic studies, multi-omics approaches studying both autosomes and sex chromosomes and their interaction with sex hormones might help to unravel the endogenous causes of sex bias in sexually dimorphic traits and diseases like RCC.