Introduction

Silicosis is one of the most important occupational diseases worldwide1. It is characterized as a progressive and irreversible lung disease caused by occupational inhalation of dusts. Silicosis is a preventable disease but its prevalence in China has been keeping increasing over the last decade2. A total of 23,152 new cases of pneumoconiosis was diagnosed in 2013, accounting for 87.72% of all reported occupational diseases in China, of whom about 35% were confirmed to be silicosis (8,095 cases)3. The occurrence of silicosis in China were resulted from prolonged exposures to high levels of silica dust from a variety of industrial activities involving mining, rock drilling, construction activities, steeling rolling, foundry work and abrasive blasting with silica-containing materials. Despite certain engineering and administrative control of measures (e.g., increasing ventilation, wet process and mask provision) have been adopted in many workplace to reduce workers from the contact of exposure, high occurrences of silicosis and the subsequent premature mortality were frequently observed among tin, tungsten and gold miners4,5. Silicosis is an incurable disease with irreversible progressive nature and all treatment strategies are relevant to postponing the progression and control of subsequent medical conditions6, mainly from secondary tuberculosis, functional disability, autoimmune diseases, or lung cancer7,8. The limited value of current treatment toward silicosis leads urgent needs for developing risk prediction models of silicosis for guiding better disease prevention and control among workers exposed to silica dust in China.

Risk prediction of silicosis is an approach of developing statistical models to estimate the probability of developing disease over an adequate time period (latency) that helps clinicians identify workers at higher risk of silicosis, which allows for an earlier counseling of risk factor modifications to decrease risk. Risk prediction model research has been well established in absolute cancer risk prediction, such as breast cancer9,10,11,12,13; however, it is a relatively new area in the occupational health studies14. Although a few of occupational epidemiological studies had explored prediction models to predict the probability of the occurrence of pneumoconiosis or work-related illnesses (e.g., shoulder pain, occupational allergic disease)15,16,17,18,19, most models were developed based on cross-sectional studies and only the Australia study explored the ‘prognostic model’ to project the occurrence of silicosis and lung cancer in the next 40 years but no discriminatory power was reported15.

This study aims to develop a risk model to predict the probability of silicosis occurrence among 3,492 workers exposed to silica dust in China who had been followed up over 33 years. Results from this study provide new insights for risk prediction models among workers exposed to silica dust and thereby offer significant contributions for guiding strategic measures for silicosis prevention in occupational health practice.

Results

By the end of 2008, we observed 298 incident cases of silicosis with a crude cumulative incident rate of 8.53% after an average of 32.90 ± 8.70 years of follow up with a follow-up rate of 95%, contributing an overall 125,814.69 person-years of observation. As described in Supplementary Table 1, a total of 1,347 workers (38.57%) died and the average age at death was 63.13 ± 12.59 years old. Over 96% of workers were married but more than half of them were illiterate. There were 77.38% workers who had ever smoked (current smoker, 43.44%; former smoker, 33.93%). The average age at first exposure to silica dust was 23.73 ± 6.36 years, while the age at entry into cohort was 27.62 ± 7.81 years, which indicated that some workers had dust exposure prior to their entry into the cohort. The mean concentration of respirable silica dust was 0.08 ± 0.04 mg/m3 and the net years of dust exposure was 24.03 ± 9.09 years. Overall, more than half of workers had 2 or more jobs experienced (38.43% had 2 jobs and 23.05% had 3 jobs or more) in the iron ore.

Table 1 presents the results of univariate Cox regression analyses of the potential predictors for the risk of silicosis, including exposure related variables (age at first exposure to silica dust, age of entering the cohort, average exposure to respirable silica dust, cumulative dust exposure time), jobs experienced in the iron ore, socio-demographics (education level, illiteracy; marriage status), smoking habits and history of lung diseases (pulmonary tuberculosis, chronic bronchitis and asthma). Six predictors (Table 1) identified to have significant contributions to the risk prediction were age at entry of the cohort (Wald χ2 = 155.06, p < 0.001), mean concentration of respirable silica dust exposure (Wald χ2 = 282.95, p < 0.001), net years of dust exposure (Wald χ2 = 110.52, p < 0.001), no. of jobs experienced in the iron ore (Wald χ2 = 43.77, p < 0.001), ever smoking (Wald χ2 = 5.08, p = 0.024) and illiteracy (Wald χ2 = 60.75, p < 0.001). Results from Schoenfeld residual plots showed that the hazard ratios of all the predictors (Table 1) were likely to be proportional, while the smoothing-spline curves fluctuated around zero; this phenomenon suggested that the assumption of proportionality was satisfied for all the predictors included in the Cox model (data not shown).

Table 1 Predictors selection for the prognosis of silicosis using univariate Cox regression model.

We included six significant predictors into the multivariate full Cox model, stepwise selection procedure (0.05 for entry and 0.051 for removing the variables) and the LASSO method for further selection. LASSO model is a shrinkage method which was applied to deal with model’s over-fitting and resulted in a greater reduction in the magnitude of coefficients for the weak predictors than those for the strong predictors. Results of comparing coefficients of models of all predictors (Model1) and two selection methods (stepwise selection procedure, model2; the LASSO method, model3) are summarized in Table 2. Six significant predictors (obtained from the Table 1) were all retained in the Cox model using stepwise selection and LASSO selection approach. Although the coefficient of each selected predictor varied less by using three different selection models, we prefer the LASSO coefficients because the LASSO method reduced coefficients more for the predictors with a weaker association than the stronger ones and also presented a smallest AIC value (AIC = 4937.97, Akaike information criterion)20,21.

Table 2 Comparing coefficients of full entry, stepwise and LASSO Cox Regression models for risk prediction of silicosis.

A bootstrap re-sampling method was applied to generate a large bootstrap dataset22, which allows all predictors included in the model becoming statistically significant when 200 bootstraps datasets for LASSO selection was applied. The coefficients of age of entering the cohort, mean concentration of respirable silica dust exposure, net years of dust exposure, no. of jobs experienced in the iron ore and illiteracy were virtually not affected by the LASSO method and the results were consistent with their 100% selection in the bootstrap samples. Smoking was selected in 94% of the bootstrap samples. Further analyses revealed that the coefficients for age at first exposure to silica dust, unmarried, pulmonary tuberculosis, chronic bronchitis and asthma were shrunk towards zero and this provide evidence that the unselected variables (as shown Table 1) had less contributions to the risk prediction of silicosis occurrence. These simulation results showed a clear consistence between the estimated effect of a predictor according to the original LASSO selection and the bootstrapped results.

Figure 1 demonstrates the results of risk score analysis from the final model by LASSO selection (Model3) incorporating with a linear combination of the six risk predictors weighted by their Cox regression coefficients. The mean of the risk score was 6.53 ± 1.15 and 4.87 ± 1.23 among workers with and without silicosis, with an average score of 5.01 ± 1.31 for the entire all cohort members. The predictive accuracy of this score system was regarded as accurate when it was determined by the time-dependent ROC (Receiver Operating Characteristic) curves (Fig. 2), given an AUC (Area Under the Curve) of more than 0.80 (mean ± s.d.: 0.86 ± 0.40; range: 0.83–0.96) at any time point over the entire follow-up period.

Figure 1
figure 1

Distributions of risk score of silicosis calculated with a linear combination of the selected predictors weighted by the Cox regression coefficient.

Note: low risk group (risk score < 3.97), high risk group (risk score >= 5.91).

Figure 2
figure 2

Time-dependent ROC analysis. Figure shows the time-dependent Receiver-Operator Characteristic (ROC) analysis for the risk scores of silicosis along the follow-up time (AUC: Area Under Curves).

We further split the risk score for silicosis into three subgroups according to its quartiles (Q1, Q3): low risk group (risk score < 3.97), moderate risk group (3.97 ≤ risk score < 5.91) and high risk group (risk score ≥ 5.91). In the iron ore cohort and based on the classification of this score system, 875 workers were classified into the low risk group with 4 (0.46%) silicosis cases, 1,745 workers were classified into the moderate risk group with 91 (5.21%) silicosis cases and 872 workers were classified into the high risk group with 203 (23.28%) silicosis cases.

Discussion

This retrospective cohort study with a long follow-up duration (33 years) demonstrated that the risk to silicosis for workers in the iron ore mining could be well predicted based on socio-demographic data and exposure related risk factors according to a newly developed risk score scheme. Our results are thus the novel findings that provide scientific evidence to guide clinicians to differentiate workers at high risk from those at low risk so that targeted prevention can be applied cost efficiently.

The significant predictors ruled in this study are age at the entry of cohort (which stands for age of first exposure to silica dust), smoking, illiteracy, no. of jobs experienced in the iron ore, mean concentration of respirable silica dust exposure and net years of dust exposure. These findings are consistent with many occupational epidemiological studies that a dose-response relationship was indicated between the occurrence of silicosis and the mean concentration of respirable silica dust exposure or net years of dust exposure23,24,25,26,27. There is evidence on a synergistic effect between crystalline silica dust and smoking to enhance the risk of silicosis and lung cancer25,27 and this provides supportive evidence for inclusion of smoking as an independent predictor in our newly developed score scheme to predict the risk of silicosis. On the other hand, age at the first exposure of silica dust might have contributed to the risk of silicosis occurrence, but it was not a significant predictor which was thus not included in our score system. Among 2,396 workers who had past dust exposure prior to the entry of the cohort, the mean dust concentration was quite similar among workers with different age at first dust exposure (Supplementary Table 2A & 2B), which explained the statistical non significance for the relationship between age at first exposure and risk of silicosis, a further possible explanation for this situation should be that those workers were enrolled into this cohort while they were assigned to relatively high-exposure tasks.

Previous studies reported that chronic obstructive pulmonary disease (e.g., chronic bronchitis, emphysema) was associated with a prolonged dust exposure23,24,25,26,228,29,30. A study in Mongolia reported that dust-induced chronic bronchitis and silicosis accounted for 67.8% of occupational diseases24. Prolonged exposure to silica dust may lead to airflow obstruction causing chronic bronchitis and emphysema. We included histories of chronic bronchitis diagnosed by doctors as an independent predictor in the Cox model for the risk prediction of silicosis, but it was not statistically significant. It is likely that insufficient cases of chronic bronchitis observed throughout the follow-up period yielded an inadequate power for detection of a significant result based on the context of our current study. On the other hand, biomass burning is an important indoor air pollutant in the remote areas of China, in particular in the early years of follow-up. Exposure to fumes or particular matters emitted from the biomass burning may be associated with the occurrence of respiratory disorders (e.g., chronic bronchitis) or even enhances the risk of silicosis occurrence. We did not collect information on the use of biomass burning and it thus posed another limitation to the current study.

We developed an unique score system for risk prediction of silicosis using shrinkage estimates with LASSO method in fitting Cox model that enables to adjust for model’s over fitting and avoid extreme predictions21; this approach had been applied in a number of case studies31,32,33 but it is less applied in occupational epidemiological studies. We recognized that results from bootstrap re-sampling procedure presented a good internal validity for the LASSO model applied in this study. On the other hand, risk score values developed from our study are evident to have a good predictive accuracy for prediction the occurrence of silicosis in this study (the AUCs ≥ 80%). Moreover, coefficients of the LASSO model transferred to risk score values demonstrated more convenient application for the clinicians in their routine practices and consulting patients at different levels of risk to silicosis, which offers another major advantage of the current study. Nevertheless, there are concerns on the model’s generalizability, as we used cohort data collected from an iron ore to fit the prediction model, which may not represent the whole profile of silica dust exposure workforce in different industrial circumstances. Further validation and calibration of our score system in other industrial settings with large population is warranted.

In conclusion, this study developed a unique risk score system to classify silica dust exposed workers into different risk to silicosis using a LASSO penalized Cox model, with a good internal validity and high model discrimination. This newly developed score system provides clinicians scientific guidance and convenient approaches to identify high-risk workers in routine consultation, thus has important cost efficient implications. Nevertheless, further larger well-designed cohort studies are warranted to validate our results, while cautions should be born in mind when applying the results from our study to other industrial settings.

Materials and Methods

The cohort and follow-up

The detailed information for the study design and subjects recruitment has been described in the previous studies4. In brief, a total of 3,658 workers who had ever been exposed to silica dust were enrolled in this retrospective cohort from an iron ore mine in China from January 1, 1964 through December 31, 1974 and followed up from 01/01/1964 to 31/12/2008. We included all workers in this cohort with a history of exposure to crystalline silica dust who had worked at least 1 year in the iron ore mining and excluded workers if she was a female worker (139 individuals), or the male workers who had sufficient evidence of silicosis at the entry of cohort (15 individuals) or he had already died at the entry of the cohort (12 individuals). Eligible subjects finally included in the study were 3,492 workers who had completed information on socio-demographics, lifetime occupational silica dust exposure and a history of previous diseases diagnosed by the medical doctors. All study subjects provided informed consent and the Research Ethics Committee of Tongji Medical College of the Huazhong University of Science & Technology approved all procedures and all experiments were conducted in accordance with the approved guidelines.

Diagnosis of silicosis and important predictors

The diagnosis of silicosis was originally made by a diagnostic expert panel comprising of three readers according to the Chinese National Diagnostic Criteria of Pneumoconiosis 1986 (i.e., the routine diagnoses)34, hereas all the chest radiographs were re-evaluated by a panel of experts using the Diagnostic Criteria in 200935 and showed a good agreement between the two diagnostic criteria. Following the criteria, silicosis was classified into three stages: stage one, stage two and stage three, which was very similar with the ILO (International Labour Organization) classification36. Silicosis stage one was approximate to the category 1, silicosis stage two was similar to the category 2 and silicosis stage three was equal to the category 3 in concurrence with the appearance of large opacities in ILO classification. Historical industrial hygiene data were collected from the periodical monitoring data for total dust for the period 1964–2008. The total dust concentrations and exposure duration (number of hours) per shift were summarized for each job title in a 3-year interval during 1950–1986 and then measured on an annual basis afterwards. There were 33 job titles confirmed to have exposure to silica dust, number of jobs experienced for workers were included as a measurement for different dust exposure. The concentration of total dust was monitored in the iron ore three times per month by the local industrial hygienists with an overall 55,333 monitored sampling data in total for the period 1964–2008. The average concentration of total dust for each specific job title in a defined calendar year was calculated by taking arithmetic mean of all the samplings of the specific job measured in the corresponding year. It has been noted that workers used half-face respirator as personal protective equipment in the recent decade but only the wet process was applied in the early years of follow-up.

The local industrial hygienist performed the direct measurement for total dust by placing the samplers in a location near workers’ routine practice in a typical working day and then collected dust samples for a 15–20 minutes interval when the work was in progress according to the Chinese National Sampling Regulations37. All samples collected were then weighed to estimate the total dust level in mg/m3; meanwhile the occupational hygienists transformed the total dust level to the concentration of crystalline silica dust for each specific job title according to a validated approach38.

Statistical methods and prediction model specification

We used the Visual FoxPro (VFP) and Microsoft excel software to establish datasets. We then doubly checked the data and cleaned accordingly using SAS software (9.1.3; SAS Institute, Cary, NC). We transformed relevant continuous variables to the categorical scale for clinical use16,17,39. As shown in Supplementary Figure 1, we performed three steps to assess the risk prediction model for the occurrence of silicosis following the principles of Clinical Prediction Model proposed by Ewout W. Steyerberg21 Step I - Predictors selection. We performed univariate Cox regression analyses to screening out potential predictors by estimating the Hazard Ratios (HR) and their 95% Confidence Intervals (CIs), predictors with p < 0.05 will be retained and evaluated further. Three models were performed to check the potential multicollinearity among predictors: full model predictor selection (Model1, all predictors entered into the Cox model at the same time), stepwise Cox model selection (Model2) and Least Absolute Shrinkage and Selection Operator (LASSO) selection (Model3). The LASSO method was used to shrink the coefficients toward zero by setting a constraint on the sum of the absolute standardized coefficients20,40. Schoenfeld residual plots were used to assess the assumption of proportional hazard ratio according to Harrell’s rho results. If the assumption of proportional hazard ratio was not fulfilled, the extended Cox model was carried out for the development of prediction model41,42,43. Step II - Risk model construction. We selected the optimal model from the Step I for risk model construction according to the smallest Akaike Information Criterion (AIC). We then assessed the effect of potential predictors using a risk score analysis on the basis of a linear combination of the selected predictors weighted by the Cox regression coefficient, using a formula as following: , where S is risk score, x is selected predictors, n is the number of selected predictors. Step III - Risk model evaluation. Bootstrap re-sampling was used to evaluate the model’s internal validity for the optimal model finally selected22,44,45. The discriminative ability of the model was determined according to the time-dependent Receiver-Operator Characteristic (ROC) curves and the corresponding Area Under the Curve (AUC) were calculated to assess the predictive accuracy of the model; the larger the area is under the AUC, the better the model is for the risk prediction46 Sensitivity analyses were conducted to evaluate the performances of different risk scores by plotting (t, AUC (t)) for different values (cutoff points) of follow up time t to test robustness of the results. All statistical analyses were two-sided and performed with R software (Version 3.1.0, 2014-04-10; R Foundation for Statistical Computing).

Additional Information

How to cite this article: Tse, L. A. et al. Prediction models and risk assessment for silicosis using a retrospective cohort study among workers exposed to silica in China. Sci. Rep. 5, 11059; doi: 10.1038/srep11059 (2015).