DNA damage in circulating leukocytes measured with the comet assay may predict the risk of death

The comet assay or single cell gel electrophoresis, is the most common method used to measure strand breaks and a variety of other DNA lesions in human populations. To estimate the risk of overall mortality, mortality by cause, and cancer incidence associated to DNA damage, a cohort of 2,403 healthy individuals (25,978 person-years) screened in 16 laboratories using the comet assay between 1996 and 2016 was followed-up. Kaplan–Meier analysis indicated a worse overall survival in the medium and high tertile of DNA damage (p < 0.001). The effect of DNA damage on survival was modelled according to Cox proportional hazard regression model. The adjusted hazard ratio (HR) was 1.42 (1.06–1.90) for overall mortality, and 1.94 (1.04–3.59) for diseases of the circulatory system in subjects with the highest tertile of DNA damage. The findings of this study provide epidemiological evidence encouraging the implementation of the comet assay in preventive strategies for non-communicable diseases.

The induction of DNA damage following exposure to exogenous and endogenous agents is an important first step in the process of carcinogenesis. Replication of damaged DNA may lead to mutations or structural changes to the chromosomes, events which are critical in the development of cancer 1 . The induction of genome instability resulting from these events is considered to be one of the critical cancer hallmarks 2, 3 . In addition, most noncommunicable diseases (NCDs)-leading causes of death and disability-show characteristic phenotypes, e.g., accelerated aging, which are the result of accumulated DNA damage, telomere capping loss, and oxidative stress. DNA damage caused by oxidative reactions plays a key role in the development of NCDs 4 . The most common method for measuring DNA damage in human populations is the comet assay, or single cell gel electrophoresis. In brief, agarose-embedded cells are lysed, leaving histone-depleted nuclei known as nucleoids, containing supercoiled DNA; breaks in the DNA relax supercoiling and the relaxed DNA loops are free to migrate under electrophoresis, forming a structure resembling a comet, the relative DNA content of the comet tail indicating the frequency of breaks [4][5][6] . Because of its simplicity, sensitivity and speed, it has been widely adopted as a biomarker assay in human studies. Monitoring occupational exposure to mutagens, testing dietary supplementation with antioxidants, or checking levels of oxidative stress in relation to diverse diseases are among the most common objectives in human studies with the comet assay [5][6][7] . Extensive literature has been published on the assay (see references [5][6][7] for a description of the protocol). The assay is generally used as a marker of exposure, but given the wide range of events associated with DNA damage, the comet assay is well suited to evaluate the interaction between lifestyle factors, environmental exposures and genetic background, and as a consequence to contribute to risk assessment and the surveillance of populations at risk of NCDs. Indeed, cross-sectional studies have demonstrated that subjects with the most prevalent NCDs have elevated levels of DNA damage, measured by the comet assay, in circulating leukocytes 5,7 . However, an association of DNA damage with disease does not mean that DNA damage is the cause of disease; it could instead be an effect. The higher level of chromosome damage in cancer patients, or the pre-mortem lowering of systolic blood pressure are paradigmatic examples of this bias 8,9 . Whether the level of DNA damage in healthy individuals is a potential marker of individual risk of cancer and other NCDs is a question best resolved through cohort studies, which circumvent this problem of reverse causality. The historical cohort design, i.e., those studies in which data on the relevant events for each individual (in this case the results of assays measuring genetic damage) are collected from existing records and can immediately be analyzed, has proven to be the best approach for the validation of genetic biomarkers as risk predictors. Examples of biomarkers of genomic instability that have been linked to the risk of cancer with this study design include the frequency of chromosomal aberrations and micronuclei in peripheral lymphocytes 10-16 . Other biomarkers such as DNA adducts have been positively investigated using a nested case-control design with analysis of stored samples 17,18 , although the historical cohort approach remains the most appropriate for those biomarkers that are preferably evaluated on fresh tissue.
Modern epidemiological study of cancer and other chronic diseases depends increasingly on the use of large datasets assembled within the framework of international collaborative studies. This is also the case of the Com-Net initiative which started in 2011 gathering a network of over 100 laboratories working with the comet assay 5,18 . ComNet is now subsumed in the COST Action hCOMET, which has created a database consisting of individual comet assay data from nearly 20,000 subjects, contributed from 105 studies performed in 44 laboratories between 1999 and 2019. A detailed description of the database, including the list of participating laboratories and the baseline frequency of the assay descriptors has been recently published 19 . All laboratories contributing data to the hCOMET database were invited to participate to a historical cohort study, assuming that they had access www.nature.com/scientificreports/ to individual data required for a follow-up of cancer incidence and mortality, and that-after receiving ethical approval-they could interrogate local/national cancer registries or registries of causes of death. The aim of the hCOMET cohort study was to provide an estimate of the association between DNA damage-as measured by the standard comet assay in healthy individuals-and overall mortality, mortality by cause, and cancer incidence for those laboratories with access to cancer registry data.

Results.
To estimate the risk of overall mortality, mortality by cause, and cancer incidence associated with DNA damage, a cohort of 2,403 healthy individuals (25,978 person-years) screened in 16 laboratories using the comet assay between 1996 and 2016 was followed-up. Overall, 308 deaths were registered by the end of the follow-up period (median duration 10.8 years). The most common causes of death were malignant neoplasms (108), and diseases of the circulatory system (77). As regards cancer, gastro-intestinal (20), and genito-urinary (31) were the most frequent locations. One hundred and fifty-one cancer cases were recorded, mostly from laboratories EU31 (31 cases) and EU33 (101 cases).
The DNA damage level measured by the comet assay is reported as DNA migration, using either length of the comet tail (TL), percentage of fluorescence of DNA in the comet tail (%T), the product of the TL and %T (i.e. tail moment, TM) or visual score (VS, based on categorization of the comets into different classes) as primary comet assay descriptors. A preliminary comparison of DNA damage levels between the groups of deceased, incident cancer cases, and subjects still alive is reported in Tables 1, 2. Mean values of the comet descriptors TM, TL, VS and %T were higher for the groups of deceased and cancer cases, although a statistically significant difference with subjects alive was only reached for %T in deceased (18.0 vs 9.0; P < 0.001) and in cancer cases (17.7 vs 12.4; P < 0.019), respectively. The association between mortality risk and tertile of DNA damage was further investigated with the Kaplan-Meier analysis, which indicated a worse outcome in subjects in the medium or high tertile of DNA damage, a result which was consistent over the follow-up time, as shown by Fig. 1  www.nature.com/scientificreports/ A Cox's proportional hazard model evaluating the association between overall mortality and tertile of DNA damage was applied to the whole database, and results are reported in Table 3. The adjusted Hazard Ratio (HR) increased with the tertile of DNA damage (p for trend = 0.02) reaching statistical significance for subjects in the highest tertile when compared with the lowest tertile (HR = 1.42; 95% CI = 1.06-1.90; P = 0.02). Most of the other parameters included in the model, including age, smoking habit, and the presence of NCDs, showed significantly increased mortality risks, providing an intrinsic validation to study findings. Not surprisingly, the presence of NCD was the strongest predictor of mortality (HR = 13.2; 95% CI = 7.12-24.5; P < 0.001). The test for interaction and the consistency in all strata of the risk for the highest tertile of DNA damage ( Table 4), confirmed that the presence of genotoxic exposure or a chronic disease did not modify the association between DNA damage and mortality rates. A complementary analysis of data shown in Table 4 using the medium tertile in the group of unexposed controls as the referent value for the interaction analysis confirmed (although on the basis of small numbers) that also in this group subjects in the highest tertile of DNA damage had the highest risk of death (HR = 6.07; 1.33-27.7; P = 0.02) (Supplementary Table 1). Increasing trends of HR were found also in occupationally exposed subjects, and especially in subjects affected by chronic diseases. An additional analysis evaluated the association between DNA damage and mortality risk after removing deaths due to accidents. The association was a little stronger, with a HR = 1.25; 0.90-1.73 for the medium tertile and a HR = 1.52: 1.11-2.08 (p = 0.010), for  www.nature.com/scientificreports/ the highest tertile. No effect modification due to smoking habit could be found (p = 0.959), and no increase due to interaction was found in the group of subjects in the highest tertile of DNA damage who smoked. To evaluate the consistency of the observed effects over the different laboratories, a cohort-specific analysis was performed. The small number of events in most cohorts allowed a proper analysis only in a few datasets. Results referring to groups with more than 10 deaths, i.e., EU19, EU31, EU33, and EU8 are reported in Table 5. An increased mortality risk associated with DNA damage was present for the highest tertile in all cohorts but EU19, although no single laboratory reached statistical significance. To evaluate if the overall association between DNA damage and mortality was driven by a specific cohort, a sensitivity analysis was performed, sequentially removing a cohort from the analysis. The HR for the highest tertile of DNA damage, ranging from 1.33 to 1.51, remained significantly increased except when EU17, EU31 and EU33 were removed, though no major changes in the HR were observed. The HR for the medium tertile ranged from 0.77 to 1.31, was never significant, and was only once below 1.0, when EU33 was removed. The analysis for specific causes of death showed the strongest association for the subgroup of gastro-intestinal cancers (medium tertile; HR = 3.63; 0.71-18.50, and high tertile; HR = 7.45: 1.55-35.8, P = 0.012). In the group of genito-urinary cancers, the association was more evident for prostate and bladder cancers (high tertile; HR = 1.99: 0.76-5.20). A significant association with diseases of the circulatory system was found with the highest tertile of DNA damage (high tertile; HR = 1.94; 1.04-3.59, P = 0.036) ( Table 6). In general, the small number of events limited the statistical power of these comparisons and the causes of death that could be studied.
To investigate if the proximity to the end of follow-up might have affected the strength of the association a sensitivity analysis was performed stratifying the results by distance from the final event using a cut-off of five   Table 2).

Discussion.
The results of the present study provide support for the hypothesis that an increased level of DNA damage represents a relevant event in the pathway leading to chronic disease and eventually to death. In addition, the use of circulating leukocytes suggests that DNA damage can be suitably measured in this surrogate tissue to estimate mortality risk. The observation that the association between DNA damage and risk of death is not dependent on the proximity with the outcome, i.e., is not driven by the possible presence of the disease, is consistent with the hypothesis that measuring DNA damage in healthy subjects at any time may predict NCD and death. The small size of the cohort calls for longer follow-up period, and the heterogeneity of the primary comet assay descriptors suggests that further insight into collinearity may reduce the risk of misclassification. Genome instability represents an enhanced accumulation of mutations, and has been recognized as a hallmark of several chronic diseases and ageing 20 . The comet assay is considered among the most suitable methods for  www.nature.com/scientificreports/ monitoring DNA damage in population studies. It is a simple, fast, accurate, inexpensive method that requires small sample volumes and can be performed on frozen blood or buffy coat 25 . This assay fulfils all criteria for routine use in large-scale molecular epidemiology studies and clinical disease management, especially with the recent development of high throughput versions; in addition, modifications of the protocol allow the detection of a variety of lesions and DNA repair activity 21 . There have been many reports over the years of elevated DNA damage levels (strand breaks and/or oxidized bases) associated with a wide range of clinical conditions such as coronary artery disease, diabetes, kidney disease, chronic obstructive pulmonary disease, multiple sclerosis, and Alzheimer's disease 4,7,[22][23][24][25] . DNA damage and DNA repair are important elements in the etiology of cancer and in its treatment; Vodicka and colleagues 26 reviewed the DNA damage information available for 17 types of cancer emphasizing the sensitivity and cost-effectiveness of the comet assay in evaluating DNA damage. Several studies have been reported on inflammatory disorders, e.g., arthritis, inflammatory bowel disease, where DNA oxidation damage is often accompanied by depressed antioxidant status 27, 28 . Russo et al. have tested the potential application of the comet assay as a prognostic tool for patients with chronic obstructive pulmonary disease (COPD) undergoing pulmonary rehabilitation 25 . Likewise, Corredor et al. demonstrated in a cohort of hemodialysis patients that deceased subjects at the end of the follow-up period (approximately four years) had higher level of DNA strand breaks at baseline, i.e., at the entry of the study, than subjects still alive 29 . Despite the strong evidence from cross-sectional studies linking the presence of several chronic diseases with an increased level of DNA damage, and the extensive literature on mechanisms, clearly showing that elevated levels of DNA damage are associated with several NCDs, it remains a challenge to prove that DNA damage is a cause rather than an effect of the disease. The most suitable epidemiologic approach to evaluate long-term effects when the biomarker may be affected by the disease, i.e., reverse causality bias, is the cohort study, generally with a nonconcurrent design when biomarkers are involved. The observation reported here of an association between DNA damage and overall mortality lends support to a possible use of DNA damage as risk biomarker, envisioning public health implications, environmental and occupational preventive strategies, and even diagnostic use in clinics. The higher risks of death observed for those covariates known to be associated with mortality, i.e., age, smoking habit, and the presence of chronic diseases, gave an intrinsic validation to results on DNA damage biomarkers. The use of a composite endpoint represented by the tertile of any descriptors measured in the original studies (when tail intensity was not available) is fully justified by the common target of different descriptors, i.e., measuring the extent of DNA damage. This choice was further supported by 1) the presence of correlation between different descriptors of the comet assay reported in the literature and confirmed in the hCOMET dataset, and 2) the substantial overlapping of results between %T and the composite endpoint. On the other hand, the positive association between the composite endpoint and the risk of death and NCD may have been attenuated by the use of TL, which in many instances has been demonstrated to have a lower sensitivity in identifying DNA damage by known genotoxic agents. 19 . The underlying explanation is that tail length increases with DNA break frequency only up to a certain point -corresponding to a quite low level of damage. This reflects the fact that the comet tail is formed by broken DNA loops extending towards the anode, and so the tail length -once established -is defined by the length of the DNA loops. In most laboratories using software image analysis, the TL has a rather narrow dose-response relationship for direct strand-breaking agents such as ionizing radiation 30 . The dynamic range of the TM is mainly influenced by %T as the TL does not differ much. Thus, the TM typically shows a linear dose-response relationship with ionizing radiation 31, 32. A major concern was the possibility that the overall association with mortality was modified by the presence of exposure to DNA-damaging agents, or by the presence of disease. Although this hypothesis can be properly checked only with ad hoc studies 11,33 , the interaction analysis reported in Table 4 and Supplementary Table 1 suggests that an increasing trend in the risk of death by tertile of DNA damage was present in all groups of subjects evaluated, i.e., non-exposed subjects, occupationally exposed subjects, and NCD patients. As regards smoking, the lack of interaction with DNA damage level makes it unlikely that the higher risk of death associated to the tertile of DNA damage could be explained by this habit. Another potential source of confounding or of effect modification was the presence of several sub-cohorts. This issue was addressed with a sensitivity analysis, which demonstrated that no single sub-cohort could change the overall results.
The presence of selection bias is a critical issue in retrospective cohort studies. Subjects may know about their condition, and in some cases, subjects exposed to the disease have a higher interest in participation, especially if the study question was controversial. This may result in a biased assessment or in a selective loss to followup associated to exposure. This is hardly the case in the present study, not only because of the limited number of losses, but because subjects (and investigators) were not informed of the level of DNA damage, which was assessed only during statistical analysis of data, minimizing the possibility that results are affected by selection bias.
Results on specific causes of death were meaningful only for most common causes such as cardiovascular diseases and malignant neoplasm. The strongest association was found for gastro-intestinal cancer, showing a suggestive parallelism with results from similar analyses reported for the micronucleus assay and chromosome aberrations 10- 13 .
This study has a number of strengths, and several weaknesses. The major strength is the cohort design, which is the least affected by bias, especially the reverse causality bias, and has been used for the validation of several biomarkers before. It should be noted also that despite the evidence coming from cross-sectional studies, and the well-known mechanistic models, no epidemiological evidence has been published so far, demonstrating that the extent of DNA damage in healthy subjects may predict the risk of death and NCD, including cancer. As regards limitations, the small size of the cohort and the heterogeneity of the comet assay descriptors are surely the most critical, although the standardization based on tertiles is a well-established approach to deal with laboratory variability. Besides epidemiological weaknesses, there is the need to substantiate the mechanistic link between www.nature.com/scientificreports/ early genomic damage and such a long-term event as overall mortality. On the other hand, the presence of extensive evidence linking DNA damage to NCDs such as cardiovascular diseases and cancer, which in turn are the major causes of death, suggests a plausible pathway linking the extent of DNA damage to the risk of death. The inactivation of tumor suppressor genes or the amplification of oncogenes are well known consequences of DNA breakage, a process leading to the formation of the atherosclerotic plaque [34][35][36] . Another limitation is the possible effect of a decline in DNA repair capacity associated with ageing, oxidative stress and inflammation, which was not evaluated. It should be noted that the DNA repair activity may also be influenced by the same exposures that cause DNA damage, i.e., smoking, dietary factors as well as genotoxic agents in the environment and occupational settings. Thus, upregulation of the DNA repair activity may occur concurrently with a high rate of genotoxic insults due to external exposures.
The results of this analysis suggest a new role for the comet assay as a tool to measure risk biomarkers. The sensitivity analysis that showed that DNA damage can predict the risk of death independently from the duration of the follow-up (Supplementary Table 2) encourages further research on the use of the comet assay in long-term strategies aimed at preventing the risk of diseases as well as in pre-clinical applications.
In conclusion, the findings of this study strengthen the existing evidence that the level of DNA damage in circulating leukocytes of healthy individuals may be predictive of the risk of chronic diseases and mortality, reflecting events such as accelerated aging, telomere capping loss, oxidative stress and more generally genomic instability. These results disclose a number of issues relevant to both ethics and public health policies. However, applicability at the individual level has still to be established because of the low absolute level of risk observed and the large inter-individual variability of these results, not forgetting the role of individual parameters such as DNA repair competence. Nevertheless, from now on the use of assays measuring DNA damage to predict the risk of NCD diseases and mortality in population groups exposed to environmental or occupational risks or bearing a susceptible genetic profile should be considered for inclusion in public health strategies.

Material and methods
Study design and cohorts. The study design replicated the approach followed by similar studies 10-16 . All 44 laboratories contributing data to the hCOMET database within the framework of the hCOMET EU COST Action CA15132 were invited to participate to a pooled historical cohort study. The criteria for inclusion of laboratories in the full-scale study included 1) having had data published in peer-reviewed literature reporting the comet assay protocol used, and 2) providing evidence that personal identification was available for all subjects and the mortality follow-up was feasible through linkage with national/local cause of death registries. Data on cancer incidence were recorded when contact with cancer registry was available. A detailed description of the protocol used for the assay was collected from each laboratory submitting a database and evaluated by the hCOMET steering committee for compliance to standard methodology 6 . In brief, subjects who underwent DNA damage screening with the comet assay applied to circulating leukocytes for biomonitoring purposes were followed from the date of the assay until the date of death, emigration, or the end of a defined follow-up period (December 2017-January 2019, depending on the country), whichever occurred first. The median duration of the follow-up was 10.8 years. Only adult subjects with valid demographic data, without a previous cancer diagnosis at the time of test were included. Overall, 2,403 subjects, corresponding to 25,978 person-years, were contributed to the hCOMET cohort by 16 laboratories from Brazil, Italy, Poland, Russia, Slovakia, Norway, Spain, Croatia, Serbia, Cuba, France, Turkey, and Hungary. All laboratories scored a minimum of 100 comets per individual sample, and all subjects were sampled only once. More detail on the characteristics of the cohort is reported in Table 1. In the large majority of the cohort (nearly 98%), information on cause-specific mortality was obtained through linkage with local/national registries or with hospital records. In the remaining 2% an active system of follow-up for mortality was set up via personal knowledge. Overall, 308 deaths were registered and coded using the ICD-IX classification system. Data on cancer incidence (151 cases) were extremely heterogeneous within different laboratories, with results substantially based on two laboratories, EU31 and EU33, and were affected by a generally poor quality of cancer diagnosis (40.1% of cases were assessed via medical records or by personal knowledge). The limited reliability of case-assessment, and the heterogeneous report of confounders by different centers prevented any further analyses of data relating to cancer incidence, which have not been reported among the results. All individual studies contributing data to the hCOMET cohort had already received approval from local ethical committees, and the informed consent for the collection and analysis of individual coded data was signed by all study subjects. The analysis of individual coded data was run in accordance with the Declaration of Helsinki and with the measures of the General Data Protection Regulation (EU) 2016/679 (GDPR). The pooled analysis of data was approved by the ethics committee of the IRCCS San Raffaele Roma, Rome, Italy (12 December 2015, Prot. N. 10/15), the centre coordinating data collection and running the statistical analysis of data.

The hCOMET database
More detail about the study design and the quality procedures implemented in each study can be found in the original papers (Supplementary list of references) and summarized in the previous publications of the consortium, 5,19,37 which included data from 19,350 subjects, contributed by 44 laboratories. In addition, available information was collected concerning demographic parameters, lifestyle, occupational exposure, smoking, diet, genetic profile, and diagnoses of chronic diseases. To take into consideration the extensive heterogeneity of methods used in different laboratories, special care has been given to the hCOMET questionnaire collecting protocol description and technical features. Separate analyses were performed for each descriptor of the comet assay, namely tail intensity (%T), tail length (TL), tail moment (TM), and visual scoring (VS) to compare the different sensitivity of the four descriptors. To standardize for the large interlaboratory variation, mostly due to www.nature.com/scientificreports/ exposure to genotoxic agents, cell type (isolated lymphocytes or whole blood), and sample processing (fresh/ frozen) 19 , all subjects were classified within each laboratory according to tertile frequency of each descriptor as follows: low (1-33 percentile), medium (34-66 percentile) or high (67-100 percentile). The approach based on tertiles was the most efficient according to the Akaike information criteria when compared in the pooled database with other methods of standardization such as z scores and deviation from the mean 13,38 . To increase the number of events suitable for analysis, the %T sub-cohort, the descriptor with more observations, and generally considered the most reliable, 5,37 , was enriched with data from other descriptors, creating a composite endpoint that represented the primary endpoint of the analysis. Whenever %T was not measured in the original studies, but other descriptors were available, the tertile of DNA damage was estimated using the mean value of tertiles from the other markers, rounded to the nearest integer. This approach was based on the assumption that all descriptors of the comet assay provide suitable measure DNA damage within each of the included studies.

Individual studies.
The individual studies contributing data to the cohort are listed in the Supplementary list of references. Sample size ranged from 17 to 738 subjects, and studies were performed between 1996 and 2016. Information on age and sex was available for all subjects, while smoking status at the time of testing was available for 93.0% of the study subjects, with 32.2% declaring to have smoked during their life. The number of cigarettes smoked was available only for a small proportion of subjects. The number of subjects considered exposed to genotoxic agents or affected by disease in the original studies represented 66.8% of the whole cohort. Nine laboratories investigated various occupational exposures, four studied patients affected by chronic diseases, and three studied non exposed subjects. A description of all exposure groups by laboratory and by type of exposure/disease appears in the Supplementary Table 3.

Statistical analysis
The Kaplan-Meier product-limit method was applied to estimate the survival time after comet assay testing by different tertiles of DNA damage. Survival curves were compared by the log-rank test 39 . The association between events and the tertile of DNA damage was modeled according to the Cox proportional hazard regression model, adjusting for age at test (years), sex (M/F), smoking habit (yes/no), and occupational exposure/disease (unexposed/controls; occupationally exposed, affected by disease). Diagnostic tests did not detect any substantial violation of underlying assumptions of the Cox regression. To evaluate whether the overall association between DNA damage and overall mortality was driven by a specific cohort, a sensitivity analysis was performed sequentially removing cohorts from the analysis. The potential effect modification due to the presence of an occupational exposure to DNA damaging agents or of a chronic disease was explored by means of the likelihood ratio test 40 . Due to the limited number of cases in the lowest and medium tertiles of unexposed/control subjects, the pool of all control subjects was used as the referent value for the interaction analysis. Additional analyses were performed by laboratory and by specific cause of death. Finally, to provide an additional control for confounding, data were analysed after applying the propensity score matching, a statistical matching technique that attempts to estimate the effect of exposure by accounting for the covariates that predict exposure. Results of this analysis overlapped almost perfectly those from traditional regression analysis (data not shown), and therefore only the latter have been reported. To test the possibility that the likely presence of chronic diseases could have influenced the risk associated with the tertile of DNA damage, we stratified the entire cohort according to the median follow-up time, i.e. 0-5 and > 5 years, and tested a possible effect modification due to time since test. Data and code availability. The database supporting the current study has not been deposited in a public repository because of restriction of the ethical clearance (sensitive information), but are available from the corresponding author on request.