Introduction

Incidence of cutaneous melanoma, the most aggressive and lethal form of skin cancer, continues to increase in fair-skinned populations1. Ultraviolet radiation (UVR) exposure is the main risk factor for melanoma2. By sequencing lesions at different evolutionary stages, Shain et al.3 found strong UVR-signature mutations at all stages, from benign lesions to invasive melanoma. In some melanoma patients, the tumour suppressor gene TP53 carries mutations with a UVR signature4,5. Still, the most common mutations found in melanomas are the BRAF V600E and V600K mutations6, although these do not carry the typical UVR signature7. The biological pathway from UVR exposure to melanoma onset is not yet fully explained, and may involve epigenetic changes after UVR exposure.

Epigenetic modifications, such as DNA methylation (DNAm), have been associated with all cancers, including melanoma8. However, to our knowledge, previous studies on the epigenetics of melanoma have mostly been based on post-diagnostic sampling9. Epigenetic biomarkers have been proposed for both survival and response to treatment in melanoma patients10,11, but the predictive value of these markers for melanoma risk itself has not been evaluated. Since most melanomas appear without association with a precursor lesion12, evaluation of pre-diagnostic markers should be based on samples of skin, blood, or saliva stored in biobanks. DNAm measured in mixed tissues such as blood or saliva could be confounded by the cell-type composition of the sample. Based on epigenetic data, it is possible to estimate the relative contribution of each cell type in the tissue mix13.

The population-based Norwegian Women and Cancer (NOWAC) cohort study14 has been used to study the importance of host factors and UVR exposure in melanoma risk15,16,17, and to identify pre-diagnostic epigenetic markers for lung and breast cancer18,19,20. In a nested case–control study within the NOWAC cohort, we aimed to identify biomarkers of melanoma risk in pre-diagnostic blood samples of melanoma cases and cancer-free controls, using an epigenome-wide association study (EWAS) as well as a subset EWAS on case-specific characteristics. To complement our analyses, an EWAS was also performed in a publicly available data set from an independent study on melanoma21 and combined with our results.

Materials and methods

Material (NOWAC)

The NOWAC cohort includes over 172,000 women aged 30–70 years at recruitment in 1991–2006 (response 54%)14. Information on host characteristics and lifestyle factors was collected through baseline questionnaires and up to two follow-up questionnaires. The NOWAC study has high external validity, with no major selection bias14,22. Approximately 50,000 women (46–63 years) constitute the post genome cohort within NOWAC and donated a blood sample at inclusion or at the second follow-up (2003–2006)23. By using the unique identity number of Norwegian citizens, NOWAC is linked to the Cancer Registry of Norway (CRN) for follow-up of cancer incidence and vital status. We included all incident melanoma cases (n = 183) with an isolated DNA and RNA sample in the biobank per December 31, 2013, and matched each case with one cancer free control, based on time since blood sampling and year of birth (1943–1947, 1948–1952, 1953–1957). The Norwegian Malignant Melanoma Registry (NMMR) was established under the CRN in 2008, and information on tumour thickness for incident cases since 2008 was obtained from the NMMR. For melanoma cases diagnosed before 2008, information on tumour thickness was extracted manually by the CRN’s experienced melanoma registrars from histopathological reports in the CRN archive24.

All participants gave written informed consent and the Medical Ethical Committees of North Norway has approved the NOWAC study, the storage of human biological material, as well as this study (2016/976/REK Nord). All methods in this study were performed in accordance with the relevant ethical guidelines and regulations.

DNA methylation

Details of the DNAm quality control have been described elsewhere25. Briefly, DNA was treated with bisulfite and hybridised to the Illumina Infinium MethylationEPIC array according to the manufacturer’s protocol. Background subtraction and control normalization were performed with minfi to reduce background noise and dye bias26. Type I and Type II probes were normalized using the beta-mixture quantile normalization method from the wateRmelon R package27. After quality control, 775 528 CpG probes remained in the data set. White blood cell composition was estimated using the Houseman algorithm13,28.

To complement our analysis, we included a publicly available data set from an independent study by Conway et al.21 (hereafter referred to as the GSE120878 study), which used logistic regression to compare the epigenetic profiles of melanoma biopsies (n = 89) and nevi biopsies (n = 73), all taken on suspicion of melanoma and each from a different patient. Their DNA methylation data were deposited in the GEO database in April 2019 (accession number GSE120878)21. In that study, DNA methylation was measured on the Illumina Infinium HumanMethylation450 BeadChip array and processed with the minfi R package.

Covariates

Baseline variables include age at recruitment, birth cohort and recruitment year. Age at blood sample and time in freezer were recorded. Participants reported, by questionnaire, education (≤ 10, 11–13, ≥ 14 years), smoking (never, former, current smoker), hair colour (dark brown/black, brown, blond/yellow, red), freckling after sunbathing (yes, no), and the number of asymmetric nevi > 5 mm on the legs (0, 1, 2–3, 4–6, 7–12, 13–24, ≥ 25; categorized as 0, 1, ≥ 2). On the basis of average ambient ultraviolet radiation29, region of residence (latitudes 70°–58°) was categorized as low UVR exposure (north Norway), medium–low (central Norway), medium (southwest Norway), and highest UVR exposure (southeast Norway)30. In the baseline and follow-up questionnaires, participants reported history of severe sunburns (never, 1, 2–3, 4–5, ≥ 6 times/year), average number of weeks per year spent on sunbathing vacations (never, 1, 2–3, 4–6, ≥ 7 weeks/year), and average use of an indoor tanning device (never; rarely; 1, 2, or 3–4 times/month; > 1 time/week) in childhood (≤ 9 years), adolescence (10–19 years), and adulthood (> 19 years)30. The reported frequencies of sunburns, sunbathing vacations and indoor tanning were transformed and multiplied by the length of each interval for each questionnaire15. The participants were then classified into five categories: non-exposed and quartiles. To capture the tail of the distribution, the upper quartile was further divided into two equally sized groups (i.e. six categories in total), as described in Page et al.25. Cumulative UVR exposure was constructed by summing the category scores (i.e. 0–5) for indoor tanning and sunbathing vacations15. Reproducibility of melanoma risk factors in the NOWAC questionnaire was good (kappa/intraclass correlation coefficient 0.49–0.77) and independent of age and education31.
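The six-category scoring described above can be illustrated with a short Python sketch. It is not the study's code. It assumes that the quartiles are computed among the exposed participants only, that non-exposed means a transformed frequency of zero, and that values equal to a cut point fall in the lower category; the function names exposure_scores and cumulative_uvr are hypothetical.

```python
import bisect
import statistics


def exposure_scores(values):
    """Score each value 0-5: 0 = non-exposed, 1-3 = lower three quartiles
    of the exposed, 4-5 = the two halves of the upper quartile.

    Assumption: cut points are taken among exposed participants only.
    """
    exposed = sorted(v for v in values if v > 0)
    # Octile cut points among the exposed; keep the 25th, 50th, 75th
    # and 87.5th percentiles (the last one splits the upper quartile).
    octiles = statistics.quantiles(exposed, n=8)
    cuts = [octiles[1], octiles[3], octiles[5], octiles[6]]
    # bisect_left counts the cut points strictly below the value.
    return [0 if v <= 0 else 1 + bisect.bisect_left(cuts, v) for v in values]


def cumulative_uvr(tanning_scores, vacation_scores):
    """Sum the 0-5 scores for indoor tanning and sunbathing vacations."""
    return [t + s for t, s in zip(tanning_scores, vacation_scores)]


# Illustrative values only; not taken from the NOWAC data.
example = exposure_scores([0, 0] + list(range(1, 17)))
```

With two 0–5 scores per participant, the cumulative score in this sketch ranges from 0 to 10.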

Statistical methods

Conditional logistic regression was used to study the association between future melanoma status as the outcome and white blood cell composition as a continuous exposure, accounting for time to diagnosis and potential confounders: hair colour, nevi, and UVR exposure. To minimize technical variation and capture unmeasured confounding, we constructed surrogate variables using the sva package in R32,33,34. The surrogate variables were constructed as orthogonal decompositions of the residuals of the DNA methylation data matrix after removing the signal explained by melanoma status33. We used conditional logistic regression, with control for the matching variables (age at blood sample and time in freezer), to assess the associations between future melanoma as the outcome and DNA methylation, adjusting for lifetime history of sunburns (as an indicator of severe UVR exposure)15, hair colour (the best measure of skin sensitivity to UVR exposure in the NOWAC cohort)17,35, and surrogate variables as potential confounders.
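For 1:1 matched pairs, the conditional likelihood reduces to a logistic model without intercept on the case-minus-control difference of each covariate. The Python sketch below illustrates this for a single covariate, fitted by Newton–Raphson. It is a simplified illustration, not the software used in this study, which adjusted for several covariates; the function name fit_matched_pairs and the example values are hypothetical.

```python
import math


def fit_matched_pairs(case_values, control_values, n_iter=50, tol=1e-10):
    """Fit a one-covariate conditional logistic model for 1:1 matched pairs.

    Each pair contributes 1 / (1 + exp(-beta * d)) to the likelihood,
    where d is the case value minus the control value.
    Returns (beta, odds ratio per unit increase in the covariate).
    """
    diffs = [c - k for c, k in zip(case_values, control_values)]
    beta = 0.0
    for _ in range(n_iter):
        probs = [1.0 / (1.0 + math.exp(-beta * d)) for d in diffs]
        # Score (first derivative) and information (negative second derivative).
        gradient = sum(d * (1.0 - p) for d, p in zip(diffs, probs))
        information = sum(d * d * p * (1.0 - p) for d, p in zip(diffs, probs))
        if information == 0.0:
            break  # no discordant pairs: beta is not identifiable
        step = gradient / information
        beta += step
        if abs(step) < tol:
            break
    return beta, math.exp(beta)


# Illustrative data only: three pairs where the case is higher, one where it is lower.
beta_hat, odds_ratio = fit_matched_pairs([1, 1, 1, 0], [0, 0, 0, 1])
```

Pairs with identical covariate values contribute nothing to the fit, and if every difference has the same sign the estimate diverges (complete separation), which this sketch does not guard against.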

To control for multiple testing, we used the false discovery rate (FDR) procedure of Benjamini and Hochberg36. The genes annotated to the top 2000 CpG sites in our main model were included in an enrichment analysis, using the Enrichr web interface37.
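As an illustration of the Benjamini–Hochberg adjustment, the following Python sketch computes adjusted p-values from a list of nominal p-values. It is a minimal stand-alone version for exposition, not the implementation used in the analysis, and the example p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the nominal p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values only; not taken from the study.
example_adjusted = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.60])
```

A CpG site is declared significant at FDR level q when its adjusted p-value is at most q.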

An EWAS was run in the GSE120878 data set using logistic regression without any covariate adjustment, and the resulting log p-values were included as covariates in an adaptive multiple-testing FDR method called AdaPT38 when correcting our main model for multiple testing. This method is based on the covariate-modulated FDR (cmFDR) proposed by Ferkingstad et al.39, which weights the FDR significance by information from the new data set. The combined AdaPT analysis was restricted to CpG sites with a nominal p-value < 10⁻¹⁰ in the GSE120878 EWAS (n = 2176 CpG sites), and the log p-value of each of these sites was included as side information in AdaPT.

A prediction model was trained on the same CpG sites (p-value < 10⁻¹⁰, n = 2176 CpG sites) from the GSE120878 data, using a regression and decision tree algorithm40 similar to that of Onwuka et al.41. The prediction model was then applied to the NOWAC data set of incident melanoma cases and controls.

Lastly, we performed an EWAS including only the melanoma cases, using linear regression with log-transformed tumour thickness as the outcome and adjusting for lifetime history of sunburns and hair colour.

Results

Baseline characteristics of the cases and controls are presented in Table 1. Having higher education, being a non-smoker, having blond/yellow/red hair, freckling when sunbathing, and having large asymmetric nevi on the legs were more common in the melanoma cases than in the controls. Compared to controls, melanoma cases reported more UVR exposure: a lower proportion lived in the region with low ambient UVR, a lower proportion had experienced no sunburns, higher proportions were in the highest categories of sunbathing vacations, and lower proportions were in the two lowest categories of indoor tanning and cumulative UVR exposure (Table 1). Mean age at melanoma diagnosis was 60.2 years (range 49–70) and mean time from blood sampling to diagnosis was 4.4 years (range 0–9.7 years) (Table 1). Estimated cell-type proportions were similar in cases and controls (0.09 ≤ p ≤ 0.94) (Supplementary Table S1). None of the white blood cell fractions differed significantly between the melanoma cases and controls after adjustment for time to diagnosis (0.14 ≤ padj ≤ 0.56) (Supplementary Table S2).

Table 1 Characteristics of the cases and their matched controls.

We did not find any CpGs significantly associated with melanoma risk in the genome-wide analyses (0.85 ≤ padj ≤ 0.99). The top 10 CpG sites are listed in Table 2. Genome-wide, the estimated odds ratios (ORs) were split in equal proportions between the two directions, indicating no global loss of methylation. The pathway enrichment analysis of the top 2000 CpGs did not identify any pathways previously reported for melanoma (Supplementary Table S3).

Table 2 The top 10 CpG sites in the conditional logistic regression analyses of CpG sites for melanoma cases (n = 183) versus controls (n = 183).

In the combined AdaPT analysis, after adjusting our findings with the log p-values from the GSE120878 EWAS, seven CpG sites from the NOWAC study had an FDR-adjusted p-value below 0.15 (Table 3). The distribution of the ORs for these seven CpGs was shifted towards higher risk, with 5/7 CpGs having an OR above 1, compared with the entire set of ORs in our main EWAS, where the ORs were in equal proportions in both directions.

Table 3 The seven CpG sites rejected at FDR level of 0.15 with AdaPT (covariate modulated FDR; cmFDR). p value from the conditional logistic regression analyses and cmFDR adjusted with AdaPT using log p-values from the GSE120878 analysis.

The prediction model trained on the GSE120878 data set did not predict melanoma status well: 48% of the samples were true positives and 49% were false positives, while only 1% were true negatives and 2% were false negatives. We did not find any significant CpG associations in the EWAS of DNAm and tumour thickness in the melanoma cases (0.86 ≤ padj ≤ 0.99).
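To make the reported classification percentages easier to interpret, the Python sketch below derives standard performance measures from them. It assumes that the four percentages are proportions of all classified samples (they sum to 100%); the function name classification_metrics is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive standard performance measures from confusion-matrix counts
    or percentages (any common scale works, since only ratios are used)."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),   # share of cases predicted as cases
        "specificity": tn / (tn + fp),   # share of controls predicted as controls
        "accuracy": (tp + tn) / total,   # share of all samples classified correctly
        "ppv": tp / (tp + fp),           # share of predicted cases that are cases
    }


# Percentages reported in the text: 48% TP, 49% FP, 1% TN, 2% FN.
metrics = classification_metrics(tp=48, fp=49, tn=1, fn=2)
```

Under that assumption the model labels nearly every sample as a case: sensitivity is 48/50 = 0.96, but specificity is only 1/50 = 0.02 and overall accuracy is 0.49, no better than chance in a balanced 1:1 design.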

Discussion

We compared DNAm profiles of incident melanoma cases with those of healthy controls to identify potential biomarkers of melanoma risk. We did not identify any genome-wide significant CpGs related to melanoma risk. However, by combining different data sources to weight the FDR adjustment, we identified seven potentially differentially methylated CpG sites associated with incident melanoma, all previously associated with melanoma in a case–control study21.

Two of the top 10 genes identified in our EWAS (Table 2) have previously been associated with melanoma: RSF1 (ref. 42) and NTN4 (ref. 43). However, they have been associated with more advanced stages of melanoma in case-only studies and cell lines, and not with melanoma risk. We observed an equal number of effect sizes in both directions, while the proportion of hypomethylation was larger in the GSE120878 study (50% vs ~ 57.6%), indicating a global loss of methylation in melanoma biopsies that was not observed in the pre-diagnostic samples. This suggests that the loss of methylation observed in cancers may be a consequence of the disease, and not its cause. The log-odds were consistently over ten times larger in the study including samples from prevalent cases than in this pre-diagnostic study, which is to be expected given the different sample tissues in the two studies.

None of the top genes found in our primary analysis were associated with melanoma risk in the largest GWAS of melanoma to date44, which included almost 37,000 melanoma cases and ten times as many controls. Of the CpGs associated with melanoma risk in Table 3, two are annotated to the genes MIR196B and SH3RF3, which have been observed to be differentially expressed in sun-exposed skin compared with non-exposed skin45. Given the prominent role of UVR exposure in melanoma risk, this is a potentially interesting finding that should be followed up. Further, among the genes indicated, the HOXA9-HOXA10 cluster has been observed to be differentially expressed in multiple cancers46, and upregulation of HOXA9 is related to poor survival in melanoma cases47. Since the analysis was informed by findings from a previous case–control study on melanoma, all findings in this analysis have previously been associated with melanoma.

None of the white blood cell fractions were significantly differently distributed between the melanoma cases and controls, even after adjusting for time to diagnosis. This indicated that the cell type composition would not be a confounder for disease status, and was thus not adjusted for in our analysis.

The discovery of pre-diagnostic biomarkers relies on a large number of samples with biological material stored in biobanks, since the future cancer status of each participant is unknown. Biobanks of the size needed for this type of incident sampling almost exclusively store biological samples derived from blood. The use of blood leukocytes may explain some of the poor performance of the multi-CpG prediction model, which did not predict case status with high accuracy. Additionally, the tissue differences need to be kept in mind when comparing results between pre-diagnostic blood samples and tissue-specific cancer samples. Using results from cancer tissue to inform the FDR correction could help detect cancer-like signals in blood samples early in the disease. Circulating blood leukocytes are constantly in contact with all organ systems in the body and are exposed to the same environment; thus, weak signals from environmental exposures can often be detected in blood leukocytes48. Additionally, pre-clinical tumours are likely to shed DNA fragments into the bloodstream, which can influence the DNAm signature of the blood sample, so that DNA isolated from whole blood may contain weak cancer-specific signals.

Being nested in the NOWAC cohort, this study benefits from a large population-based cohort with well-documented case information and prospective baseline information on major risk factors for melanoma, such as UVR exposure. However, the pre-diagnostic biological material was limited to whole blood, and the sample size was limited. The distribution of T categories was not even across the cases, as it can be in a selected clinical sample, but reflected what is found in the general population (i.e. more T1 than T4 melanomas).

NOWAC is a female-only cohort, while GSE120878 included both sexes. Previous cancer studies have also been restricted to one sex, either females14 or males49. Studies addressing the association between UV exposure and melanoma found no interaction between sun exposure and sex50. The GSE120878 data set was balanced with respect to sex, and the difference in sex distribution between the groups was not significant21.

The lack of genotype information in the cohort is a limitation. Multiple genetic markers have been found to increase melanoma risk44, most notably variants in the CDKN2A gene51. However, the consent given in NOWAC did not permit genotyping of the participants.

We find that covariate-modulated FDR methods, like AdaPT, are a good way of combining our results with public data from a different source.

Conclusion

No epigenome-wide significant associations with melanoma risk were found, but seven CpGs identified by combining data and previous knowledge were suggestive of melanoma risk. Future melanoma status was not well predicted in this study; however, using a more targeted tissue, such as skin biopsies, could have resulted in more informative epigenetic markers for melanoma risk.