Mosaic loss of chromosome Y in peripheral blood is associated with shorter survival and higher risk of cancer

Journal name:
Nature Genetics
Volume:
46
Pages:
624–628
DOI:
doi:10.1038/ng.2966

Incidence and mortality for sex-unspecific cancers are higher among men, a fact that is largely unexplained1, 2. Furthermore, age-related loss of chromosome Y (LOY) is frequent in normal hematopoietic cells3, 4, but the phenotypic consequences of LOY have been elusive5, 6, 7, 8, 9, 10. From analysis of 1,153 elderly men, we report that LOY in peripheral blood was associated with risks of all-cause mortality (hazard ratio (HR) = 1.91, 95% confidence interval (CI) = 1.17–3.13; 637 events) and non-hematological cancer mortality (HR = 3.62, 95% CI = 1.56–8.41; 132 events). LOY affected at least 8.2% of the subjects in this cohort, and median survival times among men with LOY were 5.5 years shorter. Association of LOY with risk of all-cause mortality was validated in an independent cohort (HR = 3.66) in which 20.5% of subjects showed LOY. These results illustrate the impact of post-zygotic mosaicism on disease risk, could explain why males are more frequently affected by cancer and suggest that chromosome Y is important in processes beyond sex determination. LOY in blood could become a predictive biomarker of male carcinogenesis.
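
As a reading aid for the hazard ratios quoted above, the following minimal sketch recovers the standard error of log(HR), the z statistic and a two-sided P value from a reported HR and its 95% CI. It assumes the intervals are Wald-type and symmetric on the log scale, which the abstract does not state; the function name `wald_from_ci` is hypothetical and the output is an approximation, not a value reported by the authors.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate SE of log(HR), z statistic and two-sided P value.

    Assumption: the reported 95% CI is a Wald interval on the log scale,
    i.e. log(HR) +/- z_crit * SE. This is a reader's approximation only.
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


if __name__ == "__main__":
    # Values quoted in the abstract.
    for label, hr, lo, hi in [
        ("all-cause mortality", 1.91, 1.17, 3.13),
        ("non-hematological cancer mortality", 3.62, 1.56, 8.41),
    ]:
        se, z, p = wald_from_ci(hr, lo, hi)
        print(f"{label}: SE={se:.3f}, z={z:.2f}, P~{p:.4f}")
```

Because the inputs are rounded to two decimals, the recovered values can differ slightly from those obtained in the original Cox models.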

At a glance

Figures

  1. Figure 1: Structural genetic variants found in phenotypically normal blood cells from 1,141 elderly men with no previous record of hematological malignancy.

    (a) The circular plot shows the position and frequency of 40 autosomal variants, including 13 deletions (red, outer circle), 16 CNNLOH regions (green, middle circle) and 11 gains (blue, inner circle). The asterisk by chromosome Y indicates that the frequency of LOY is not shown to scale with the autosomal variants. (b) Frequency of LOY, with the percentage of cells affected in each participant plotted on the y axis after sorting subjects by descending mLRR-Y, that is, the median log R ratio (LRR), for ~2,560 SNP probes in the male-specific region of chromosome Y (chr. Y: 2,694,521–59,034,049, hg19/GRCh37). The percentage of cells affected in each participant was calculated as described in Supplementary Figure 3. The solid line indicates the threshold of LOY used in the survival analyses, and the dashed line shows the threshold for the estimation of the frequency of LOY in the studied cohort.

  2. Figure 2: LOY frequency estimation after accounting for experimental variation.

    (a) mLRR-Y observed in 1,141 men with no history of hematological malignancies before blood sampling. Each triangle represents one participant. (b) Distributions of mLRR-Y (gray bars) and experimental noise (white bars) that were used to find the threshold for the estimation of LOY frequency. The latter distribution was generated as described in the Online Methods. The dashed black lines represent the 99% CIs of the distribution of expected experimental background noise. Of the 1,141 men, we found that 168 (14.7%) had a lower median LRR than the lower 99% CI representing LOY in ~13.1% of cells. For the frequency of LOY reported here, we used the lowest value in the noise distribution as the threshold (green line at −0.139).

  3. Figure 3: LOY and its effect on mortality.

    (a–c) Impact of LOY on all-cause mortality (a), cancer mortality (b) and mortality from non-hematological cancers (c) in 982 men with no history of cancer before sampling. The HR, 95% CI, number of events and P value are shown for each model. Results are derived from Cox proportional hazards regression models, with subjects classified into groups 1 and 0 on the basis of their level of LOY. Individuals in the affected group (red curves) had LOY in ≥35% of nucleated blood cells (Fig. 2).

  4. Figure 4: Longitudinal LOY analyses in five elderly men showing progressive accumulation of cells containing LOY with increasing age.

    (a–e) ULSAM subjects 33 (a), 41 (b), 311 (c), 1655 (d) and 102 (e). Each subject was analyzed at two different ages, and the lower part of each panel shows a time axis with ages at genotyping, the time point of cancer diagnosis and cancer type as well as age and cause of death, when applicable. The red line in each panel shows mLRR-Y as estimated from SNP array experiments performed on blood collected at different ages in each subject. The text in red indicates the estimated percentage of nucleated blood cells affected with LOY. This number was calculated using MAD software from the analysis of SNP array data for PAR1 of chromosomes X and Y (Online Methods and Supplementary Fig. 3).

  5. Supplementary Fig. 1: Flowchart describing all steps in the analysis of 1,217 ULSAM participants genotyped on the Illumina 2.5M HumanOmni SNP BeadChip.

    This figure also summarizes major findings and refers to relevant figures and tables showing detailed results. Boxes with numbers 1 and 2 summarize the scoring of aberrations from the cohorts of 1,153 and 1,141 ULSAM subjects. Boxes with numbers 3 and 4 refer to primary survival analyses and exploratory survival analyses using 982 participants, respectively.

  6. Supplementary Fig. 2: Circos plots showing the structural variants found in the entire cohort of 1,153 participants who were successfully genotyped on Illumina BeadChips.

    The numbers of deletions, CNNLOH events and gains are shown with red, green and blue bars, respectively. Panels a and b show 688 subjects without cancer history and 465 cases with cancer diagnoses, respectively. Panels c and d display 12 participants with a history of hematological malignancy before blood sampling and 73 individuals with diagnoses of hematological malignancy after blood sampling. Correspondingly, panels e and f illustrate data from 162 participants diagnosed with non-hematological malignancy before blood sampling and 262 cases who received diagnoses of non-hematological cancer after sampling. Data showing LOY in panels a,b,e,f are not shown to scale (highlighted with an asterisk). The numbers of LOY events in these panels are 55, 38, 13 and 29, respectively.

  7. Supplementary Fig. 3: Estimation of the percentage of blood cells affected by LOY through analysis of SNP array data from PAR1 of chromosomes X and Y using MAD software in the ULSAM cohort.

    PAR1 is the largest of the PARs (regions with homologous sequences on chromosomes X and Y) with coordinates 10,001–2,649,520 on chromosome Y and 60,001–2,699,520 on chromosome X. MAD software is a tool for the detection and quantification of somatic structural variants from SNP array data; it uses diploid BAF values for the identification and LRR values for the quantification of somatic variants and was not originally intended for analyses of chromosome Y data. However, by using the correlation between LRR in the PAR1 region of the Y chromosome and dBAF (absolute deviation from the expected BAF value of 0.5 in heterozygous probes) of the PAR1 region of chromosomes X and Y (panel a), we could use MAD quantification of the diploid PAR1 region on chromosomes X and Y to calculate the percentage of cells affected by LOY (panel b) in a two-step process. For example, the dBAF value at the LRR threshold for survival analyses (mLRR-Y ≤ –0.4) can be found using the equation given in panel a, which yields 0.178. This equation (y = –2.7823x + 0.0954) describes the relationship between mLRR-Y on chromosome Y and dBAF on chromosomes X and Y for the 1,141 subjects. Next, the percentage of cells affected by LOY can be found by applying the equation in panel b that describes the relationship between dBAF and the percentage of cells as estimated by MAD software for 14 cases (y = 1.832x + 0.023). For this example, dBAF of 0.178 translates to LOY in 35% of cells.

  8. Supplementary Fig. 4: Validation of findings of LOY using next-generation sequencing for six candidate subjects.

    Low-coverage whole-genome sequencing was performed on 100 participants from the cohort. Among the 93 subjects with a median LRR in the male-specific region on chromosome Y (mLRR-Y, i.e., the median LRR for ~2,560 SNP probes in the region chr. Y: 2,694,521–59,034,049, hg19/GRCh37) lower than –0.139 (i.e., the threshold for frequency estimation, Fig. 2), whole-genome sequencing was performed in 6 participants. Panel a shows LRR data from the male-specific region on chromosome Y (MSY) in these 6 subjects using box plots. The rightmost box (in all panels) contains data from the 94 sequenced individuals with an mLRR-Y value above the –0.139 threshold for frequency estimation. The red lines in all panels represent the expected normal state. Next-generation sequencing data from the 6 subjects and 94 controls are plotted in panel b. The median read depth in the MSY for the 94 subjects without LOY was 1.6 (s.d. = 0.6). The corresponding read depth in the six subjects with LOY was 1.3 (s.d. = 0.5). In comparison, the median read depth on chromosome 22 was 3.8 (s.d. = 1.4) in the 94 subjects without LOY and 3.8 (s.d. = 1.2) in the 6 subjects with LOY. Read depth data were used to estimate the ploidy of chromosome 22 and the MSY region on chromosome Y in comparison with the rest of the genome using FREEC software39. The estimated ploidy is plotted in panels b and d. FREEC calculates ploidy for the regions of interest as the copy number value in each 5-kb window in the region of interest after GC content read count normalization, given a normal autosomal ploidy of 2. Panels c and d show, using SNP array and next-generation sequencing data, respectively, that the copy number state on chromosome 22 is normal in the participants with LOY who are plotted in panels a and b.

  9. Supplementary Fig. 5: Detailed individual validations of LOY in ULSAM subjects 98 and 279 using low-coverage whole-genome next-generation sequencing.

    In panels a and b are plotted SNP array and next-generation sequencing data from chromosomes 22, X and Y for each subject. LRR (blue dots) and BAF (red dots) values from SNP array data are plotted overlaid, and the percentages of cells affected were calculated using MAD software35. LRR values on sex chromosomes were normalized to a diploid state, and chromosome X probes (residing outside PAR regions) with ambiguous clustering (scored as heterozygotes) are excluded from analyses. The ploidy value, estimated from next-generation sequencing data, was calculated using FREEC software39. Blue lines indicate the normal copy number state, and the red line indicates the observed LOY. SNP array data within MSY include clusters of probes for known highly repetitive genes and loci, e.g., the TSPYA, TSPYB and RBMY1 genes16,17. Therefore, probes covering these loci do not reflect the true copy number state of chromosome Y. These regions are not included in the mapping of next-generation sequencing reads.

  10. Supplementary Fig. 6: Detailed individual validations of LOY in ULSAM subjects 531 and 693 using low-coverage whole-genome next-generation sequencing.

    In panels a and b are plotted SNP array and next-generation sequencing data from chromosomes 22, X and Y for each subject. LRR (blue dots) and BAF (red dots) values from SNP array data are plotted overlaid, and the percentages of cells affected were calculated using MAD software35. LRR values on sex chromosomes were normalized to a diploid state, and chromosome X probes (residing outside PAR regions) with ambiguous clustering (scored as heterozygotes) are excluded from analyses. The ploidy value, estimated from next-generation sequencing data, was calculated using FREEC software39. Blue lines indicate the normal copy number state, and the red line indicates the observed LOY. SNP array data within MSY include clusters of probes for known highly repetitive genes and loci, e.g., the TSPYA, TSPYB and RBMY1 genes16,17. Therefore, probes covering these loci do not reflect the true copy number state of chromosome Y. These regions are not included in the mapping of next-generation sequencing reads.

  11. Supplementary Fig. 7: Detailed individual validations of LOY in ULSAM subjects 1124 and 1238 using low-coverage whole-genome next-generation sequencing.

    In panels a and b are plotted SNP array and next-generation sequencing data from chromosomes 22, X and Y for each subject. LRR (blue dots) and BAF (red dots) values from SNP array data are plotted overlaid, and the percentages of cells affected were calculated using MAD software35. LRR values on sex chromosomes were normalized to a diploid state, and chromosome X probes (residing outside PAR regions) with ambiguous clustering (scored as heterozygotes) are excluded from analyses. The ploidy value, estimated from next-generation sequencing data, was calculated using FREEC software39. Blue lines indicate the normal copy number state, and the red line indicates the observed LOY. SNP array data within MSY include clusters of probes for known highly repetitive genes and loci, e.g., the TSPYA, TSPYB and RBMY1 genes16,17. Therefore, probes covering these loci do not reflect the true copy number state of chromosome Y. These regions are not included in the mapping of next-generation sequencing reads.

  12. Supplementary Fig. 8: Examples of autosomal structural genetic aberrations of ≥ 2 Mb in length in the ULSAM cohort.

    LRR and BAF values from SNP array data as well as validation with next-generation sequencing using low-coverage whole-genome sequencing (panel c) or exome sequencing (panels a,b,d–f) with an average coverage of 17×. Triangles indicate the positions of calls for structural variants, and probes within these positions are plotted in red. Panels c and d show two examples of CNNLOH; in these cases, next-generation sequencing did not, as expected, detect any structural changes in these regions.

  13. Supplementary Fig. 9: Examples of autosomal structural genetic aberrations of ≥ 2 Mb in length in the ULSAM cohort.

    In each panel, LRR and BAF values from SNP array data are plotted. Triangles indicate the positions of calls for structural variants, and probes within these positions are plotted in red.

  14. Supplementary Fig. 10: Examples of autosomal structural genetic aberrations of ≥ 2 Mb in length in the ULSAM cohort.

    In each panel, LRR and BAF values from SNP array data are plotted. Triangles indicate the positions of calls for structural variants, and probes within these positions are plotted in red.

  15. Supplementary Fig. 11: No validation of suggested cases of GOY using low-coverage (~5×) whole-genome next-generation sequencing.

    Of 100 sequenced participants, 3 had a positive median LRR on the SNP array in the male-specific part of chromosome Y (mLRR-Y), indicating possible GOY. In panels a–c are plotted the SNP array and next-generation sequencing data from chromosomes 22, X and Y for each of these three subjects. The LRR and BAF values from SNP array data are plotted overlaid, and the percentages of cells affected were calculated using MAD software35. LRR values on sex chromosomes were normalized to diploid state, and chromosome X probes (residing outside PAR regions) with ambiguous clustering (scored as heterozygotes) are excluded from analyses. The ploidy estimated from next-generation sequencing data was calculated using FREEC software39. Blue lines indicate the normal copy number state, and the red line indicates the observed LOY.

  16. Supplementary Fig. 12: Results from exploratory survival analyses in the ULSAM cohort using Cox proportional hazards regression models with different thresholds for the classification of participants into groups 1 and 0 on the basis of their level of LOY measured as the median LRR in the male-specific part of chromosome Y (mLRR-Y).

    The number of participants (n) with LOY and the minimum percentage of affected cells for each subject are given for each of the tested thresholds. The red and blue curves represent results from analyses with cancer mortality and all-cause mortality as endpoints, respectively. Models with significant effect on mortality (α level of 0.05) are indicated by solid black triangles, and non-significant models are plotted with empty triangles. On the basis of these results, mLRR-Y at –0.4 is the most informative threshold for survival analyses in the studied ULSAM cohort.

  17. Supplementary Fig. 13: LOY frequency estimation in the PIVUS cohort after accounting for experimental variation.

    Panel a shows median LRR in the male-specific part of chromosome Y (mLRR-Y) observed in all men (n = 488) genotyped from this cohort. Each triangle represents one participant. Panel b shows the distribution of mLRR-Y (gray bars) and the experimental noise (white bars) that were used to find the threshold for the estimation of LOY frequency. The latter distribution was generated as described in the Online Methods. The dotted black lines represent the 99% CIs of the distribution of expected experimental background noise. Among the 488 men in PIVUS, we found that 100 subjects (20.5%) had an mLRR-Y value lower than –0.154, which represents the lowest value in the distribution of experimental noise and corresponds to >13% of cells with LOY.

  18. Supplementary Fig. 14: Validation of the result that men with LOY are at a higher risk for all-cause mortality in an independent cohort (PIVUS).

    Panel a shows the results from Cox proportional hazards regression models with all-cause mortality as the endpoint using different thresholds in the analyses of 488 men. The participants were classified into groups 1 and 0 on the basis of their degree of LOY using different thresholds for mLRR-Y, i.e., the median LRR in the male-specific part of chromosome Y. The number of participants (n) with LOY and the minimum percentage of affected cells for each subject are given for each of the tested thresholds. On the basis of these results, mLRR-Y at –0.5 is the most informative threshold for survival analyses in the PIVUS cohort. Panel b shows the results from a Cox proportional hazards regression model testing the effect of LOY on risk for all-cause mortality in 488 PIVUS men at the –0.5 threshold. The survival of men with LOY is represented by the red curve. HRs, P value, 95% CIs and the number of events are shown.

  19. Supplementary Fig. 15: Comparisons of the spectrum of cancer diagnoses for ULSAM participants with and without LOY who were successfully genotyped on Illumina BeadChips and scored for structural genetic variants.

    The cutoff level for LOY used for these comparisons was mLRR-Y <–0.139, corresponding to LOY in >18% of cells (see Fig. 2, Supplementary Fig. 3 and the text). Panels a and b display cancer diagnoses for the entire cohort of 1,153 men with 93 participants scored with LOY and 1,060 subjects who were not scored with LOY, respectively. Panels c and d show the distribution of the cancer diagnoses in the cohort after excluding subjects with cancer before blood sampling. In the remaining 982 men, 80 subjects were scored with LOY and 902 subjects were not scored with LOY. All cancer diagnoses were grouped into 13 categories, and, for each category, the percentage of cases is shown, followed by the absolute number of patients (in parentheses) with the diagnosis category. GI, gastrointestinal; ENT, ear/nose/throat; CNS, central nervous system.
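
The quantities defined in the captions above (mLRR-Y in Figure 1, the noise-based frequency threshold in Figure 2 and the two-step conversion in Supplementary Fig. 3) can be tied together in a minimal sketch. It is a reading aid, not the authors' pipeline. Assumptions: the probe layout `(chromosome, position, lrr)` and all function names are hypothetical; in the panel a equation, x is taken to be dBAF and y to be mLRR-Y, because that assignment reproduces the quoted dBAF of 0.178 at mLRR-Y = –0.4; the panel b equation is taken to return a fraction between 0 and 1; and the noise distribution is supplied by the caller, since its construction is described only in the Online Methods.

```python
from statistics import median

# Male-specific region of chromosome Y (hg19/GRCh37), from the Figure 1 caption.
MSY_START, MSY_END = 2_694_521, 59_034_049

# Coefficients quoted in the Supplementary Fig. 3 caption.
# Assumption: panel a is mLRR-Y = SLOPE_A * dBAF + INTERCEPT_A.
SLOPE_A, INTERCEPT_A = -2.7823, 0.0954
# Assumption: panel b is fraction_of_cells = SLOPE_B * dBAF + INTERCEPT_B.
SLOPE_B, INTERCEPT_B = 1.832, 0.023


def mlrr_y(probes):
    """Median LRR over probes inside the male-specific region of chromosome Y.

    `probes` is a hypothetical layout: iterable of (chromosome, position, lrr).
    """
    values = [lrr for chrom, pos, lrr in probes
              if chrom == "Y" and MSY_START <= pos <= MSY_END]
    if not values:
        raise ValueError("no probes in the male-specific region of chromosome Y")
    return median(values)


def percent_cells_with_loy(mlrr):
    """Two-step conversion: mLRR-Y -> dBAF (panel a) -> percentage of cells (panel b)."""
    dbaf = (mlrr - INTERCEPT_A) / SLOPE_A      # invert the panel a line
    fraction = SLOPE_B * dbaf + INTERCEPT_B    # apply the panel b line
    return 100.0 * fraction


def loy_frequency(mlrr_values, noise_values):
    """Share of subjects whose mLRR-Y lies below the lowest value of the noise distribution."""
    threshold = min(noise_values)
    n_below = sum(1 for value in mlrr_values if value < threshold)
    return threshold, n_below, n_below / len(mlrr_values)


if __name__ == "__main__":
    print(round(percent_cells_with_loy(-0.4)))    # 35, the survival-analysis threshold
    print(round(percent_cells_with_loy(-0.139)))  # 18, the frequency-estimation threshold
    print(round(100 * 168 / 1141, 1))             # 14.7, the share quoted in the Figure 2 caption
```

The linear fits are only meaningful over the range of values they were fitted on, so the conversion should not be extrapolated to mLRR-Y values near or above zero.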

References

  1. Cook, M.B., McGlynn, K.A., Devesa, S.S., Freedman, N.D. & Anderson, W.F. Sex disparities in cancer mortality and survival. Cancer Epidemiol. Biomarkers Prev. 20, 1629–1637 (2011).
  2. Edgren, G., Liang, L., Adami, H.O. & Chang, E.T. Enigmatic sex disparities in cancer incidence. Eur. J. Epidemiol. 27, 187–196 (2012).
  3. Jacobs, P.A., Brunton, M., Court Brown, W.M., Doll, R. & Goldstein, H. Change of human chromosome count distribution with age: evidence for a sex difference. Nature 197, 1080–1081 (1963).
  4. Pierre, R.V. & Hoagland, H.C. Age-associated aneuploidy: loss of Y chromosome from human bone marrow cells with aging. Cancer 30, 889–894 (1972).
  5. Nowinski, G.P. et al. The frequency of aneuploidy in cultured lymphocytes is correlated with age and gender but not with reproductive history. Am. J. Hum. Genet. 46, 1101–1111 (1990).
  6. United Kingdom Cancer Cytogenetics Group. Loss of the Y chromosome from normal and neoplastic bone marrows. United Kingdom Cancer Cytogenetics Group (UKCCG). Genes Chromosom. Cancer 5, 83–88 (1992).
  7. Wiktor, A. et al. Clinical significance of Y chromosome loss in hematologic disease. Genes Chromosom. Cancer 27, 11–16 (2000).
  8. Wong, A.K. et al. Loss of the Y chromosome: an age-related or clonal phenomenon in acute myelogenous leukemia/myelodysplastic syndrome? Arch. Pathol. Lab. Med. 132, 1329–1332 (2008).
  9. Wiktor, A.E., Van Dyke, D.L., Hodnefield, J.M., Eckel-Passow, J. & Hanson, C.A. The significance of isolated Y chromosome loss in bone marrow metaphase cells from males over age 50 years. Leuk. Res. 35, 1297–1300 (2011).
  10. Jacobs, P.A. et al. Male breast cancer, age and sex chromosome aneuploidy. Br. J. Cancer 108, 959–963 (2013).
  11. Forsberg, L.A. et al. Age-related somatic structural changes in the nuclear genome of human blood cells. Am. J. Hum. Genet. 90, 217–228 (2012).
  12. Jacobs, K.B. et al. Detectable clonal mosaicism and its relationship to aging and cancer. Nat. Genet. 44, 651–658 (2012).
  13. Schick, U.M. et al. Confirmation of the reported association of clonal chromosomal mosaicism with an increased risk of incident hematologic cancer. PLoS ONE 8, e59823 (2013).
  14. Forsberg, L.A., Absher, D. & Dumanski, J.P. Non-heritable genetics of human disease: spotlight on post-zygotic genetic variation acquired during lifetime. J. Med. Genet. 50, 1–10 (2013).
  15. Laurie, C.C. et al. Detectable clonal mosaicism from birth to old age and its relationship to cancer. Nat. Genet. 44, 642–650 (2012).
  16. de Carvalho, C.M. & Santos, F.R. Human Y-chromosome variation and male dysfunction. J. Mol. Genet. Med. 1, 63–75 (2005).
  17. Hughes, J.F. & Rozen, S. Genomics and genetics of human and primate Y chromosomes. Annu. Rev. Genomics Hum. Genet. 13, 83–108 (2012).
  18. Lemos, B., Araripe, L.O. & Hartl, D.L. Polymorphic Y chromosomes harbor cryptic variation with manifold functional consequences. Science 319, 91–93 (2008).
  19. Lemos, B., Branco, A.T. & Hartl, D.L. Epigenetic effects of polymorphic Y chromosomes modulate chromatin components, immune response, and sexual conflict. Proc. Natl. Acad. Sci. USA 107, 15826–15831 (2010).
  20. Vijayakumar, S. et al. Detection of recurrent copy number loss at Yp11.2 involving TSPY gene cluster in prostate cancer using array-based comparative genomic hybridization. Cancer Res. 66, 4055–4064 (2006).
  21. Zhang, L.J., Shin, E.S., Yu, Z.X. & Li, S.B. Molecular genetic evidence of Y chromosome loss in male patients with hematological disorders. Chin. Med. J. (Engl.) 120, 2002–2005 (2007).
  22. Bianchi, N.O. Y chromosome structural and functional changes in human malignant diseases. Mutat. Res. 682, 21–27 (2009).
  23. Veiga, L.C.S., Bergamo, N.A., Reis, P.P., Kowalski, L.P. & Rogatto, S.R. Loss of Y-chromosome does not correlate with age at onset of head and neck carcinoma: a case-control study. Braz. J. Med. Biol. Res. 45, 172–178 (2012).
  24. Duijf, P.H., Schultz, N. & Benezra, R. Cancer cells preferentially lose small chromosomes. Int. J. Cancer 132, 2316–2326 (2013).
  25. Eisenberg, M.L., Betts, P., Herder, D., Lamb, D.J. & Lipshultz, L.I. Increased risk of cancer among azoospermic men. Fertil. Steril. 100, 681–685 (2013).
  26. Vijayakumar, S. et al. The human Y chromosome suppresses the tumorigenicity of PC-3, a human prostate cancer cell line, in athymic nude mice. Genes Chromosom. Cancer 44, 365–372 (2005).
  27. Dunn, G.P., Bruce, A.T., Ikeda, H., Old, L.J. & Schreiber, R.D. Cancer immunoediting: from immunosurveillance to tumor escape. Nat. Immunol. 3, 991–998 (2002).
  28. Vial, T. & Descotes, J. Immunosuppressive drugs and cancer. Toxicology 185, 229–240 (2003).
  29. Vajdic, C.M. & van Leeuwen, M.T. Cancer incidence and risk factors after solid organ transplantation. Int. J. Cancer 125, 1747–1754 (2009).
  30. Hedstrand, H. A study of middle-aged men with particular reference to risk factors for cardiovascular disease. Ups. J. Med. Sci. Suppl. 19, 1–61 (1975).
  31. Lind, L., Fors, N., Hall, J., Marttala, K. & Stenborg, A. A comparison of three different methods to evaluate endothelium-dependent vasodilation in the elderly: the Prospective Investigation of the Vasculature in Uppsala Seniors (PIVUS) study. Arterioscler. Thromb. Vasc. Biol. 25, 2368–2375 (2005).
  32. Olshen, A.B., Venkatraman, E.S., Lucito, R. & Wigler, M. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5, 557–572 (2004).
  33. Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
  34. Sheather, S. & Jones, M. A reliable data-based bandwidth selection method for kernel density estimation. J. R. Stat. Soc. Ser. A Stat. Soc. 53, 683–690 (1991).
  35. González, J.R. et al. A fast and accurate method to detect allelic genomic imbalances underlying mosaic rearrangements using SNP array data. BMC Bioinformatics 12, 166 (2011).
  36. Pique-Regi, R., Caceres, A. & Gonzalez, J.R. R-Gada: a fast and flexible pipeline for copy number analysis in association studies. BMC Bioinformatics 11, 380 (2010).
  37. Pique-Regi, R. et al. Sparse representation and Bayesian detection of genome copy number alterations from microarray data. Bioinformatics 24, 309–318 (2008).
  38. Rodríguez-Santiago, B. et al. Mosaic uniparental disomies and aneuploidies as large structural variants of the human genome. Am. J. Hum. Genet. 87, 129–138 (2010).
  39. Boeva, V. et al. Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics 28, 423–425 (2012).
  40. R Development Core Team. R: A Language and Environment for Statistical Computing (Vienna, 2010).

Author information

Affiliations

  1. Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden.

    • Lars A Forsberg,
    • Chiara Rasi,
    • Niklas Malmqvist,
    • Hanna Davies,
    • Saichand Pasupulati,
    • Geeta Pakalapati,
    • Ammar Zaghlool &
    • Jan P Dumanski
  2. Science for Life Laboratory, Uppsala University, Uppsala, Sweden.

    • Lars A Forsberg,
    • Chiara Rasi,
    • Niklas Malmqvist,
    • Hanna Davies,
    • Saichand Pasupulati,
    • Geeta Pakalapati,
    • Ammar Zaghlool,
    • Erik Ingelsson &
    • Jan P Dumanski
  3. Department of Medical Sciences, Uppsala University, Uppsala, Sweden.

    • Niklas Malmqvist,
    • Eva Tiensuu Janson,
    • Erik Ingelsson &
    • Lars Lind
  4. Department of Oncology-Pathology, Cancer Center Karolinska, Karolinska Institutet, Stockholm, Sweden.

    • Johanna Sandgren &
    • Teresita Diaz de Ståhl
  5. Department of Public Health and Caring Sciences, Uppsala University, Uppsala, Sweden.

    • Vilmantas Giedraitis &
    • Lars Lannfelt
  6. Faculty of Medicine, University of Southampton, Southampton, UK.

    • Joannah Score &
    • Nicholas C P Cross
  7. HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, USA.

    • Devin Absher
  8. Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    • Cecilia M Lindgren &
    • Andrew P Morris
  9. Broad Institute of MIT and Harvard University, Cambridge, Massachusetts, USA.

    • Cecilia M Lindgren

Contributions

L.A.F. and J.P.D. conceived the study. L.A.F., C.R., E.T.J., E.I., L. Lind and J.P.D. were involved in study design. D.A., L. Lannfelt, V.G., C.M.L., A.P.M., E.I. and L. Lind provided materials, genotyping data and epidemiological data. H.D., S.P., G.P., A.Z. and J. Score performed wet-lab analyses. C.R., L.A.F., N.M., J. Sandgren, T.D.d.S. and J. Score implemented bioinformatics analyses. L.A.F. performed statistical analyses. L.A.F., C.R., E.I. and L. Lind were involved in survival analyses. L.A.F., N.C.P.C. and J.P.D. analyzed data. L.A.F. and J.P.D. coordinated the work and wrote the manuscript. All authors discussed the results and commented on the manuscript.

Competing financial interests

L.A.F. and J.P.D. have filed for a patent at The Swedish Patent and Registration Office protecting the commercial applications of LOY for the assessment of cancer risk.

Supplementary information

Supplementary Figures

  1. Supplementary Figure 1: Flowchart describing all steps in the analysis of 1,217 ULSAM participants genotyped on the Illumina 2.5M HumanOmni SNP BeadChip. (531 KB)

    This figure also summarizes major findings and refers to relevant figures and tables showing detailed results. Boxes with numbers 1 and 2 summarize the scoring of aberrations from the cohorts of 1,153 and 1,141 ULSAM subjects. Boxes with numbers 3 and 4 refer to primary survival analyses and exploratory survival analyses using 982 participants, respectively.

  2. Supplementary Figure 2: Circos plots showing the structural variants found in the entire cohort of 1,153 participants who were successfully genotyped on Illumina BeadChips. (758 KB)

    The numbers of deletions, CNNLOH events and gains are shown with red, green and blue bars, respectively. Panels a and b show 688 subjects without cancer history and 465 cases with cancer diagnoses, respectively. Panels c and d display 12 participants with a history of hematological malignancy before blood sampling and 73 individuals with diagnoses of hematological malignancy after blood sampling. Correspondingly, panels e and f illustrate data from 162 participants with diagnoses with non-hematological malignancy before blood sampling and 262 cases who received diagnoses of non-hematological cancer after sampling. Data showing LOY in panels a,b,e,f are not shown to scale (highlighted with an asterisk). The numbers of LOY events in these panels are 55, 38, 13 and 29, respectively.

  3. Supplementary Figure 3: Estimation of the percentage of blood cells affected by LOY through analysis of SNP array data from PAR1 of chromosomes X and Y using MAD software in the ULSAM cohort. (242 KB)

    PAR1 is the largest of the PARs (regions with homologous sequences on chromosomes X and Y), with coordinates 10,001–2,649,520 on chromosome Y and 60,001–2,699,520 on chromosome X. MAD software is a tool for the detection and quantification of somatic structural variants from SNP array data; it uses diploid BAF values for the identification and LRR values for the quantification of somatic variants and was not originally intended for analyses of chromosome Y data. However, by using the correlation between LRR in the PAR1 region of the Y chromosome and dBAF (absolute deviation from the expected BAF value of 0.5 in heterozygous probes) of the PAR1 region of chromosomes X and Y (panel a), we could use MAD quantification of the diploid PAR1 region on chromosomes X and Y to calculate the percentage of cells affected by LOY (panel b) in a two-step process. First, the dBAF value at the LRR threshold for survival analyses (mLRR-Y ≤ –0.4) is found using the equation given in panel a (y = –2.7823x + 0.0954), which describes the relationship between mLRR-Y on chromosome Y and dBAF on chromosomes X and Y for the 1,141 subjects; setting y = –0.4 and solving for x gives a dBAF of 0.178. Next, the percentage of cells affected by LOY is found by applying the equation in panel b (y = 1.832x + 0.023), which describes the relationship between dBAF and the percentage of cells as estimated by MAD software for 14 cases. In this example, a dBAF of 0.178 translates to LOY in 35% of cells (1.832 × 0.178 + 0.023 ≈ 0.35).

  4. Supplementary Figure 4: Validation of findings of LOY using next-generation sequencing for six candidate subjects. (263 KB)

    Low-coverage whole-genome sequencing was performed on 100 participants from the cohort. Among the 93 subjects with a median LRR in the male-specific region on chromosome Y (mLRR-Y, i.e., the median LRR for ~2,560 SNP probes in the region chr. Y: 2,694,521–59,034,049, hg19/GRCh37) lower than –0.139 (i.e., the threshold for frequency estimation, Fig. 2), whole-genome sequencing was performed in 6 participants. Panel a shows LRR data from the male-specific region on chromosome Y (MSY) in these 6 subjects using box plots. The rightmost box (in all panels) contains data from the 94 sequenced individuals with an mLRR-Y value above the –0.139 threshold for frequency estimation. The red lines in all panels represent the expected normal state. Next-generation sequencing data from the 6 subjects and 94 controls are plotted in panel b. The median read depth in the MSY for the 94 subjects without LOY was 1.6 (s.d. = 0.6). The corresponding read depth in the 6 subjects with LOY was 1.3 (s.d. = 0.5). In comparison, the median read depth on chromosome 22 was 3.8 (s.d. = 1.4) in the 94 subjects without LOY and 3.8 (s.d. = 1.2) in the 6 subjects with LOY. Read depth data were used to estimate the ploidy of chromosome 22 and the MSY in comparison with the rest of the genome using FREEC software39. The estimated ploidy is plotted in panels b and d. For each region of interest, FREEC calculates ploidy as the copy number value in each 5-kb window after normalization of read counts for GC content, given a normal autosomal ploidy of 2. Panels c and d show, using SNP array and next-generation sequencing data, respectively, that the copy number state on chromosome 22 is normal in the participants with LOY who are plotted in panels a and b.

  5. Supplementary Figure 5: Detailed individual validations of LOY in ULSAM subjects 98 and 279 using low-coverage whole-genome next-generation sequencing. (710 KB)

    In panels a and b are plotted SNP array and next-generation sequencing data from chromosomes 22, X and Y for each subject. LRR (blue dots) and BAF (red dots) values from SNP array data are plotted overlaid, and the percentages of cells affected were calculated using MAD software35. LRR values on sex chromosomes were normalized to a diploid state, and chromosome X probes (residing outside PAR regions) with ambiguous clustering (scored as heterozygotes) are excluded from analyses. The ploidy value, estimated from next-generation sequencing data, was calculated using FREEC software39. Blue lines indicate the normal copy number state, and the red line indicates the observed LOY. SNP array data within MSY include clusters of probes for known highly repetitive genes and loci, e.g., the TSPYA, TSPYB and RBMY1 genes16,17. Therefore, probes covering these loci do not reflect the true copy number state of chromosome Y. These regions are not included in the mapping of next-generation sequencing reads.

  6. Supplementary Figure 6: Detailed individual validations of LOY in ULSAM subjects 531 and 693 using low-coverage whole-genome next-generation sequencing. (656 KB)

    In panels a and b are plotted SNP array and next-generation sequencing data from chromosomes 22, X and Y for each subject. LRR (blue dots) and BAF (red dots) values from SNP array data are plotted overlaid, and the percentages of cells affected were calculated using MAD software35. LRR values on sex chromosomes were normalized to a diploid state, and chromosome X probes (residing outside PAR regions) with ambiguous clustering (scored as heterozygotes) are excluded from analyses. The ploidy value, estimated from next-generation sequencing data, was calculated using FREEC software39. Blue lines indicate the normal copy number state, and the red line indicates the observed LOY. SNP array data within MSY include clusters of probes for known highly repetitive genes and loci, e.g., the TSPYA, TSPYB and RBMY1 genes16,17. Therefore, probes covering these loci do not reflect the true copy number state of chromosome Y. These regions are not included in the mapping of next-generation sequencing reads.

  7. Supplementary Figure 7: Detailed individual validations of LOY in ULSAM subjects 1124 and 1238 using low-coverage whole-genome next-generation sequencing. (664 KB)

    In panels a and b are plotted SNP array and next-generation sequencing data from chromosomes 22, X and Y for each subject. LRR (blue dots) and BAF (red dots) values from SNP array data are plotted overlaid, and the percentages of cells affected were calculated using MAD software35. LRR values on sex chromosomes were normalized to a diploid state, and chromosome X probes (residing outside PAR regions) with ambiguous clustering (scored as heterozygotes) are excluded from analyses. The ploidy value, estimated from next-generation sequencing data, was calculated using FREEC software39. Blue lines indicate the normal copy number state, and the red line indicates the observed LOY. SNP array data within MSY include clusters of probes for known highly repetitive genes and loci, e.g., the TSPYA, TSPYB and RBMY1 genes16,17. Therefore, probes covering these loci do not reflect the true copy number state of chromosome Y. These regions are not included in the mapping of next-generation sequencing reads.

  8. Supplementary Figure 8: Examples of autosomal structural genetic aberrations of ≥ 2 Mb in length in the ULSAM cohort. (1,192 KB)

    LRR and BAF values from SNP array data are plotted together with validation by next-generation sequencing, using either low-coverage whole-genome sequencing (panel c) or exome sequencing (panels a,b,d–f) with an average coverage of 17×. Triangles indicate the positions of calls for structural variants, and probes within these positions are plotted in red. Panels c and d show two examples of CNNLOH; as expected, next-generation sequencing did not detect any structural changes in these regions.

  9. Supplementary Figure 9: Examples of autosomal structural genetic aberrations of ≥ 2 Mb in length in the ULSAM cohort. (990 KB)

    In each panel, LRR and BAF values from SNP array data are plotted. Triangles indicate the positions of calls for structural variants, and probes within these positions are plotted in red.

  10. Supplementary Figure 10: Examples of autosomal structural genetic aberrations of ≥ 2 Mb in length in the ULSAM cohort. (1,006 KB)

    In each panel, LRR and BAF values from SNP array data are plotted. Triangles indicate the positions of calls for structural variants, and probes within these positions are plotted in red.

  11. Supplementary Figure 11: No validation of suggested cases of GOY using low-coverage (~5×) whole-genome next-generation sequencing. (860 KB)

    Of 100 sequenced participants, 3 had a positive median LRR on the SNP array in the male-specific part of chromosome Y (mLRR-Y), indicating possible GOY. In panels a–c are plotted the SNP array and next-generation sequencing data from chromosomes 22, X and Y for each of these three subjects. The LRR and BAF values from SNP array data are plotted overlaid, and the percentages of cells affected were calculated using MAD software35. LRR values on sex chromosomes were normalized to diploid state, and chromosome X probes (residing outside PAR regions) with ambiguous clustering (scored as heterozygotes) are excluded from analyses. The ploidy estimated from next-generation sequencing data was calculated using FREEC software39. Blue lines indicate the normal copy number state, and the red line indicates the observed LOY.

  12. Supplementary Figure 12: Results from exploratory survival analyses in the ULSAM cohort using Cox proportional hazards regression models with different thresholds for the classification of participants into groups 1 and 0 on the basis of their level of LOY measured as the median LRR in the male-specific part of chromosome Y (mLRR-Y). (158 KB)

    The number of participants (n) with LOY and the minimum percentage of affected cells for each subject are given for each of the tested thresholds. The red and blue curves represent results from analyses with cancer mortality and all-cause mortality as endpoints, respectively. Models with significant effect on mortality (α level of 0.05) are indicated by solid black triangles, and non-significant models are plotted with empty triangles. On the basis of these results, mLRR-Y at –0.4 is the most informative threshold for survival analyses in the studied ULSAM cohort.

  13. Supplementary Figure 13: LOY frequency estimation in the PIVUS cohort after accounting for experimental variation. (237 KB)

    Panel a shows median LRR in the male-specific part of chromosome Y (mLRR-Y) observed in all men (n = 488) genotyped from this cohort. Each triangle represents one participant. Panel b shows the distribution of mLRR-Y (gray bars) and the experimental noise (white bars) that were used to find the threshold for the estimation of LOY frequency. The latter distribution was generated as described in the Online Methods. The dotted black lines represent the 99% CIs of the distribution of expected experimental background noise. Among the 488 men in PIVUS, we found that 100 subjects (20.5%) had an mLRR-Y value lower than –0.154, which represents the lowest value in the distribution of experimental noise and corresponds to >13% of cells with LOY.

  14. Supplementary Figure 14: Validation of the result that men with LOY are at a higher risk for all-cause mortality in an independent cohort (PIVUS). (105 KB)

    Panel a shows the results from Cox proportional hazards regression models with all-cause mortality as the endpoint using different thresholds in the analyses of 488 men. The participants were classified into groups 1 and 0 on the basis of their degree of LOY using different thresholds for mLRR-Y, i.e., the median LRR in the male-specific part of chromosome Y. The number of participants (n) with LOY and the minimum percentage of affected cells for each subject are given for each of the tested thresholds. On the basis of these results, mLRR-Y at –0.5 is the most informative threshold for survival analyses in the PIVUS cohort. Panel b shows the results from a Cox proportional hazards regression model testing the effect of LOY on risk for all-cause mortality in 488 PIVUS men at the –0.5 threshold. The survival of men with LOY is represented by the red curve. HRs, P value, 95% CIs and the number of events are shown.

  15. Supplementary Figure 15: Comparisons of the spectrum of cancer diagnoses for ULSAM participants with and without LOY who were successfully genotyped on Illumina BeadChips and scored for structural genetic variants. (496 KB)

    The cutoff level for LOY used for these comparisons was mLRR-Y < –0.139, corresponding to LOY in >18% of cells (see Fig. 2, Supplementary Fig. 3 and the text). Panels a and b display cancer diagnoses for the entire cohort of 1,153 men with 93 participants scored with LOY and 1,060 subjects who were not scored with LOY, respectively. Panels c and d show the distribution of the cancer diagnoses in the cohort after excluding subjects with cancer before blood sampling. In the remaining 982 men, 80 subjects were scored with LOY and 902 subjects were not scored with LOY. All cancer diagnoses were grouped into 13 categories, and, for each category, the percentage of cases is shown, followed by the absolute number of patients (in parentheses) with the diagnosis category. GI, gastrointestinal; ENT, ear/nose/throat; CNS, central nervous system.
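The two-step conversion described in the legend of Supplementary Figure 3 (mLRR-Y to dBAF, then dBAF to percentage of cells with LOY) can be illustrated with a short Python sketch. The coefficients are the ones quoted in that legend for the ULSAM cohort. The function and constant names are hypothetical, and the sketch assumes that in the panel a equation y is mLRR-Y and x is dBAF, since that reading reproduces the quoted dBAF of 0.178 at mLRR-Y = –0.4. It is not the authors' code.

```python
# Coefficients quoted in the Supplementary Figure 3 legend (ULSAM cohort).
# Assumption: panel a is mLRR-Y = PANEL_A_SLOPE * dBAF + PANEL_A_INTERCEPT.
PANEL_A_SLOPE = -2.7823
PANEL_A_INTERCEPT = 0.0954
# Panel b: fraction of cells = PANEL_B_SLOPE * dBAF + PANEL_B_INTERCEPT.
PANEL_B_SLOPE = 1.832
PANEL_B_INTERCEPT = 0.023


def dbaf_from_mlrr_y(mlrr_y: float) -> float:
    """Step 1: invert the panel a line to get dBAF for a given mLRR-Y."""
    return (mlrr_y - PANEL_A_INTERCEPT) / PANEL_A_SLOPE


def fraction_cells_with_loy(mlrr_y: float) -> float:
    """Step 2: apply the panel b line to the dBAF obtained in step 1."""
    return PANEL_B_SLOPE * dbaf_from_mlrr_y(mlrr_y) + PANEL_B_INTERCEPT


if __name__ == "__main__":
    # Thresholds mentioned in the legends: survival analyses and frequency estimation.
    for threshold in (-0.4, -0.139):
        dbaf = dbaf_from_mlrr_y(threshold)
        percent = 100 * fraction_cells_with_loy(threshold)
        print(f"mLRR-Y = {threshold}: dBAF = {dbaf:.3f}, cells with LOY = {percent:.0f}%")
```

Under these assumptions, the survival-analysis threshold of –0.4 gives a dBAF of about 0.178 and roughly 35% of cells, matching the worked example in the legend, and the frequency-estimation threshold of –0.139 gives roughly 18% of cells. Both lines were fitted to ULSAM data, so applying them to other cohorts or far outside the fitted range would be an extrapolation.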

PDF files

  1. Supplementary Text and Figures (5,010 KB)

    Supplementary Figures 1–15 and Supplementary Tables 1–4

Additional data