A novel and simple risk model predicts the prognosis for patients with paraquat poisoning

Background: Acute paraquat (PQ) poisoning is characterized by multi-organ failure and a lack of effective therapies. Identifying risk factors and developing a model that can predict early prognosis for patients with PQ poisoning is therefore of great importance. Methods: This was a retrospective cohort study of patients with acute PQ poisoning (n=1199). Patients with PQ poisoning from 2011 to 2018 (n=913) were randomly divided into 2 mutually exclusive groups: training (609 patients) and test (304 patients). Two external cohorts, 207 cases from Zhengzhou in 2019 and 79 cases from Shenyang, were used for validation at a different time and at a different site, respectively. Risk factors were identified by a logistic model with Markov Chain Monte Carlo (MCMC) simulation and further evaluated by a latent class analysis. The prediction score of this model was developed on the training sample and further evaluated using the test and validation samples. Results: Eight risk factors, namely age, ingestion volume, CK-MB, platelet count (PLT), white blood cell count (WBC), neutrophil count (N), gamma-glutamyl transferase (GGT) and serum creatinine (Cr), were identified as independent risk indicators of in-hospital death events. The risk model had a C statistic of 0.895 (95% CI 0.855-0.928), 0.891 (95% CI 0.848-0.932) and 0.829 (95% CI 0.455-1.000) and a predictive range of 4.6%-98.2%, 2.3%-94.9% and 0%-12.5% for the test, validation_time and validation_site groups, respectively. In the training group, the risk model classified 18.4%, 59.9% and 21.7% of patients into the high-, average- and low-risk groups, with corresponding probabilities of 0.985, 0.365, and 0.03 for in-hospital death events. Conclusion: Eight risk factors were identified in this study, and we developed and evaluated a simple risk model to predict the prognosis of patients with acute PQ poisoning.
This simple and reliable risk score system could help recognize high-risk patients and reduce the in-hospital death rate due to PQ poisoning.


Introduction
As a non-selective contact herbicide, paraquat (PQ) is environmentally harmless and is predominantly used in developing agricultural countries [1,2]. However, it is lethal to humans and animals when ingested, and there is still no effective and specific antidote for PQ. Patients with acute PQ poisoning usually die within several days to weeks of confirmed exposure from hypoxemia or multiple organ failure, which imposes a considerable economic burden and increases the use of medical resources for families and countries [3][4][5][6]. Timely clinical outcome evaluation and risk assessment, especially for critically ill patients, is therefore essential for the wise use of medical resources, and it has become a public health concern for doctors, patients and social security agencies alike.
To our knowledge, several prognostic score systems have been reported to predict the clinical outcomes of patients with PQ poisoning, the major ones being the Acute Physiology and Chronic Health Evaluation II (APACHE II) score [7], the Sequential Organ Failure Assessment (SOFA) score [8], the severity index of PQ poisoning (SIPP) [9], the Poisoning Severity Score (PSS) [10] and some equations and nomograms based on large cohort studies [11][12][13]. Most of them are better suited to critically ill patients than to minimally poisoned or early-stage patients with mild symptoms. Moreover, these score systems are likely to fail to predict mortality and support risk assessment for PQ-poisoned patients promptly, because they are difficult to calculate or rely on laboratory tests that are not available, and so cannot meet the demands of emergency work. Thus, an effective, simple and universal predictive model based on common laboratory items would be of great help in risk stratification and therapeutic regimen adjustment for patients with acute PQ poisoning at all stages.
Accordingly, on the basis of 1199 patients with PQ poisoning from two large academic hospitals in China with a sufficient patient base, we established and evaluated a simple risk model by identifying significant clinical risk indicators to predict in-hospital death. Data for our study were collected from the medical records of acute PQ-poisoned patients treated at different times and in different regions.
The end point of this research was in-hospital death before discharge. The study was designed to generate an easily applied tool that predicts in-hospital death from a combination of simple and clinically relevant variables.

Methods
Study samples and data source

Potential Risk Factors And Outcome
We selected candidate risk factors that were clinically meaningful, reliable, easily collected, and present at a frequency of more than 1%. Initial factors include patient demographic characteristics  (Table S). To facilitate the calculation of the risk score, we examined the nonlinear relationship of each continuous factor with the outcome and categorized it by a cut-off point that took into account both in-hospital death rates and sample sizes (Additional file: Figure S). Factors with missing values were imputed using multiple imputation with 10 imputations. The final imputed value was the average of the 10 imputations. Rates of missing values ranged from 1.1% (age) to 13.3% (CK-MB). The outcome was in-hospital death.
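The two preprocessing steps described above (averaging the 10 imputed values and dichotomizing a continuous factor at a cut-off) can be illustrated with a minimal Python sketch. The function names and the example numbers are hypothetical and are not taken from the study; the cut-off of 20 simply mirrors the WBC category reported later in the text.

```python
from typing import Sequence


def average_imputations(imputed_values: Sequence[float]) -> float:
    """Return the mean of the imputed values for one missing entry."""
    if not imputed_values:
        raise ValueError("at least one imputed value is required")
    return sum(imputed_values) / len(imputed_values)


def categorize(value: float, cutoff: float) -> int:
    """Dichotomize a continuous factor: 1 if value >= cutoff, else 0."""
    return 1 if value >= cutoff else 0


# Hypothetical example: 10 imputed values for one missing measurement.
imputed = [18.0, 22.0, 19.0, 21.0, 20.0, 23.0, 17.0, 20.0, 21.0, 19.0]
final_value = average_imputations(imputed)
high_category = categorize(final_value, cutoff=20.0)
```

Averaging the imputations yields a single completed data set, which is what the text describes; it is a simplification compared with pooling estimates across the imputed data sets.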

Statistical Analyses
Risk factor selection and evaluation
We fitted a logistic model containing all candidate risk factors to the training cohort using Markov Chain Monte Carlo (MCMC) simulation. A posterior probability was then calculated for each factor to assess the strength of the association between the risk factor and the clinical outcome [14,15]. A factor with a posterior probability > 0.95 (or < 0.05 for factors with estimates < 0.0) was considered significant for predicting prognosis and was included in the final risk factor list [16]. Next, using the risk factors selected by the MCMC method, we established the final risk model to predict the outcome by fitting a logistic model to the training group.
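The selection rule can be sketched in Python as follows. This is a minimal illustration and not the SAS procedure used in the study: it assumes that the posterior probability is the share of MCMC draws in which a coefficient is positive, and the factor names and draws are hypothetical.

```python
from typing import Dict, List, Sequence


def posterior_prob_positive(draws: Sequence[float]) -> float:
    """Share of posterior draws of a coefficient that are above zero."""
    return sum(1 for d in draws if d > 0) / len(draws)


def select_factors(
    posterior_draws: Dict[str, Sequence[float]], threshold: float = 0.95
) -> List[str]:
    """Keep factors whose posterior probability is > threshold
    (positive effect) or < 1 - threshold (negative effect)."""
    selected = []
    for name, draws in posterior_draws.items():
        p = posterior_prob_positive(draws)
        if p > threshold or p < 1 - threshold:
            selected.append(name)
    return selected


# Hypothetical posterior draws for three candidate factors.
example_draws = {
    "factor_a": [0.8, 1.1, 0.9, 1.3, 0.7] * 20,       # always positive
    "factor_b": [-0.2, 0.3, -0.1, 0.4, 0.1] * 20,     # sign is uncertain
    "factor_c": [-0.9, -1.2, -0.7, -1.0, -0.8] * 20,  # always negative
}
selected = select_factors(example_draws)
```

In this toy input, only the factors whose draws almost all share one sign pass the rule; the factor with mixed signs is dropped.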
The following indicators were calculated to assess the performance of the risk model: the Harrell C statistic to evaluate overall predictive accuracy [17,18], the McFadden R square to evaluate explained variation [19], and the Hosmer-Lemeshow goodness-of-fit test to evaluate calibration [20]. Discrimination was then assessed by comparing the observed outcomes across strata defined by deciles of the predicted probabilities. As described in previous publications [15], patients in the training group were divided, according to the deciles, into 5 mutually exclusive risk classes ranked from lowest risk (class 1) to highest risk (class 5) for evaluation.
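For a binary outcome, the C statistic is the proportion of (event, non-event) patient pairs in which the event received the higher predicted probability, and the McFadden R square compares the log-likelihood of the model with that of an intercept-only model. The Python sketch below illustrates both under these standard definitions; the predicted probabilities and outcomes are hypothetical, and the code is not the SAS implementation used in the study.

```python
import math
from typing import Sequence


def c_statistic(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Concordance for a binary outcome; tied predictions count as 0.5."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    non_events = [p for p, y in zip(probs, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("both outcome classes must be present")
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def log_likelihood(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Bernoulli log-likelihood of the outcomes given the probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for p, y in zip(probs, outcomes)
    )


def mcfadden_r2(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """1 - LL(model) / LL(null), where the null model predicts the base rate."""
    base_rate = sum(outcomes) / len(outcomes)
    ll_null = log_likelihood([base_rate] * len(outcomes), outcomes)
    return 1.0 - log_likelihood(probs, outcomes) / ll_null


# Hypothetical predictions for six patients (1 = in-hospital death).
outcomes = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.2, 0.3, 0.6, 0.1]
c_value = c_statistic(probs, outcomes)
r2_value = mcfadden_r2(probs, outcomes)
```

The pairwise loop is quadratic in the number of patients, which is acceptable for cohorts of a few hundred to a few thousand records.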
Further, data from the training group were used to evaluate the selected risk factors. Latent class analysis is an unsupervised machine learning algorithm that does not require an outcome [14]. If the selected risk factors are significantly associated with the clinical outcome, they should be able to assign patients to risk classes under an unsupervised learning algorithm. Therefore, we performed latent class analysis to classify patients into 5 mutually exclusive classes and ranked them from lowest risk (class 1) to highest risk (class 5) based on the observed outcome. Five classes were chosen to align with the decile-specific classes based on the risk model described previously. We calculated a Spearman correlation coefficient between the risk classes based on the risk model and the risk classes based on latent class analysis. A high coefficient indicates good agreement between the 2 classifications, which provides information on the robustness of the selected risk factors.
Furthermore, we validated this risk model by comparing its performance in the training group with that in the test group and the 2 independent validation groups.
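The agreement check between the two classifications can be sketched as a Spearman correlation computed on tie-averaged ranks. The Python below is a minimal illustration with hypothetical class assignments; it does not reproduce the latent class analysis itself.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the tie-averaged ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical risk classes (1-5) for ten patients from the two approaches.
model_classes = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
latent_classes = [1, 2, 3, 5, 4, 1, 3, 2, 4, 5]
rho = spearman(model_classes, latent_classes)
```

Because the classes are ordinal with many ties, ranks are averaged within ties before the correlation is taken.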

Risk Score
A simple risk score was constructed from the regression coefficients estimated in the training group, to make the selected risk factors and the risk model easy to apply to each patient. The points for each risk factor were calculated by dividing its coefficient by the sum of all coefficients in the model, multiplying by 100, and rounding to the nearest integer. Patients were stratified into 3 risk groups according to the distribution of the risk score: low (< 25%), average (25%-75%), and high (> 75%).
Statistical analyses were conducted using SAS statistical software version 9.4 (SAS Institute Inc.). The latent class analysis was performed using the PROC LCA procedure (version 1.3.2 beta). The nonlinear relationship was assessed using the PROC GAM procedure. The study followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guideline. Each of the 22 items of the TRIPOD statement was addressed.

There were some differences in the basic clinical characteristics of patients across the training, test and 2 validation samples. These differences between the training group and the test and validation groups allow the predictive ability and generalizability of the model to be assessed (Table 1).
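The scoring rule can be illustrated with a short Python sketch. The coefficients and training scores below are hypothetical, and the stratification assumes that the 25% and 75% thresholds refer to percentiles of the training-score distribution, which is one reading of the description above.

```python
import math
import statistics
from typing import Dict, Iterable


def points_from_coefficients(coefs: Dict[str, float]) -> Dict[str, int]:
    """Points = coefficient / sum of coefficients * 100, rounded half up."""
    total = sum(coefs.values())
    return {
        name: math.floor(100.0 * c / total + 0.5) for name, c in coefs.items()
    }


def risk_score(points: Dict[str, int], present: Iterable[str]) -> int:
    """Sum the points of the risk factors present in one patient."""
    return sum(points[name] for name in present)


def stratify(score: float, low_cut: float, high_cut: float) -> str:
    """Assign low / average / high risk from two score cut-offs."""
    if score < low_cut:
        return "low"
    if score > high_cut:
        return "high"
    return "average"


# Hypothetical coefficients for three factors (not the study's estimates).
coefs = {"factor_a": 1.5, "factor_b": 1.0, "factor_c": 0.5}
points = points_from_coefficients(coefs)
patient_score = risk_score(points, ["factor_a", "factor_c"])

# Assumed cut-offs: quartiles of hypothetical training-group scores.
training_scores = [0, 17, 17, 33, 33, 50, 50, 67, 83, 100]
low_cut, _, high_cut = statistics.quantiles(training_scores, n=4)
group = stratify(patient_score, low_cut, high_cut)
```

Rounding is done with `math.floor(x + 0.5)` rather than Python's built-in `round`, which rounds halves to the nearest even integer.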

Risk Factor Selection And Test
The MCMC simulation selected 8 candidate factors with a posterior probability of at least 0.95 (Table 2): age, ingestion volume, CK-MB, PLT, WBC, N, GGT and Cr (Fig. 2). The risk model based on these 8 risk factors and the training group demonstrated good discrimination, calibration and fit. The overall C statistic was 0.926 (95% CI, 0.891-0.924) for the risk model (Fig. 3). The mean observed in-hospital death rate ranged from 3.3% in the lowest predicted quintile to 99.2% in the highest predicted quintile, a range of 95.9%, and the explained variation was 0.4999 (Fig. 4). Moreover, the P value of the Hosmer-Lemeshow goodness-of-fit test was 0.4262 in the training group and 0.9078 in the test group, indicating that the model fitted the observed cohort well (Fig. 5). Additionally, model performance in the test sample was comparable to that in the training sample.
The overall C statistic was 0.895 (95% CI 0.855-0.928) in the test sample (Fig. 3). The rate of observed in-hospital death ranged from 4.6% in the lowest predicted quintile to 98.2% in the highest predicted quintile, and the explained variation was 0.4182 for the test group (Fig. 4).
Furthermore, in the latent class analysis, the 609 patients in the training group were assigned to 5 classes based on the combination of the 8 risk factors (Fig. 6). For this analysis, the area under the ROC curve was 0.877 (95% CI, 0.832-0.906), and the mean observed outcome rate ranged from 0.0% in the lowest rating group to 99.2% in the highest rating group. The Spearman correlation coefficient between the predicted quintile based on the logistic model and the latent class analysis was 0.754 (95% CI, 0.715-0.874).

Risk Score System
We observed that the risk factor-specific points ranged from 19 (WBC ≥ 20) to 6 (PLT < 80) (Table 2). WBC ≥ 20, CK-MB ≥ 50, N ≥ 80, ingestion volume ≥ 100, and Cr ≥ 150 were the top 5 factors, each with an odds ratio of more than 5.0 (Fig. 2). The training group had a mean (SD) risk score of 26.6 (21.6). For the test group, the mean (SD) score was 24.8 (21.1). In addition, in the training sample, 18.4%, 59.9% and 21.7% of patients were stratified into the high-, average- and low-risk groups, with corresponding probabilities of 0.985, 0.365, and 0.03 for in-hospital death, respectively (Fig. 7). The stratification for the test sample was not markedly different from that of the training group (Fig. 7, Table 3).  (Fig. 3) and the P values of the Hosmer-Lemeshow goodness-of-fit test were 0.9671 and 0.9999 for the two independent groups (Fig. 5).
For the validation groups, the mean (SD) of risk scores were 23 were classified into the low-, average- and high-risk groups, respectively, with corresponding probabilities of 0.98, 0.38 and 0.03 for in-hospital death events (Fig. 7 and Table 3). In the validation_site group, 36.7%, 60.8% and 2.5% were classified into the low-, average- and high-risk groups, with corresponding probabilities of 0.03, 0.33 and 0.97, respectively. The probabilities for in-hospital death events closely matched those of the training group (Fig. 7 and Table 3).

Discussion
In this large study, we first evaluated 609 patients with acute PQ poisoning and identified eight prognostic variables for survival after PQ poisoning: WBC, CK-MB, N, ingestion volume, Cr, age, GGT and PLT. Next, a logistic risk model and score system based on these risk factors was established to predict in-hospital death for patients diagnosed with PQ poisoning. Our model was then verified in both internal and external validation cohorts. The risk factors in this model were drawn from medical records, so they are easy to collect and readily available. The statistical algorithms used in this study are also robust. Importantly, the risk prediction model and its corresponding risk score system may help clinicians distinguish patients at higher risk of in-hospital death after PQ poisoning.
Our research, based on information extracted from the medical records of PQ-poisoned patients, is a large study predicting in-hospital death. PQ poisoning is a major cause of fatal poisoning in most regions of Asia. PQ can cause multiple organ failure, including severe pulmonary fibrosis, which is the main cause of death in paraquat poisoning [21,22]. Although efficacious therapeutic strategies for the management of acute PQ poisoning have been extensively investigated, the early prognosis of PQ poisoning remains poor [23,24]. Generally, the clinical outcomes of acute PQ poisoning are associated with the ingested dose. Previous studies have reported that plasma PQ concentration is a powerful tool for predicting the clinical prognosis of patients with PQ poisoning [25,26]. However, monitoring of serum PQ concentration is not available in most local hospitals. Moreover, the ingested amount of poison is difficult to assess accurately, particularly in potentially valuable prognostic indicator for PQ-poisoned patients [9,27]. The lack of availability of these indicators in many hospitals makes it hard to apply the proposed scoring system on a daily basis, which also limits the accurate evaluation of poisoning severity. The APACHE II scoring system is also applied in clinical practice to evaluate the prognosis of PQ-poisoned patients [28,29]. However, poisoning [30,31]. Elevated Cr, one of the factors associated with poor prognosis in the risk model, may result from direct oxidative injury to the renal tubules [32]. An elevated serum creatinine level is closely associated with acute kidney injury (AKI), and PQ-poisoned patients with AKI had a higher mortality risk than those without [33]. Through risk stratification, we found that in the training group, the probability of in-hospital death after a diagnosis of PQ poisoning was approximately 98% for patients in the high-risk group and 36% for those in the average-risk group.
The 8 risk factors could help clinicians choose reasonable medical interventions that may improve prognosis and reduce unnecessary treatment. To some extent, the economic burden on healthcare might be relieved by improving patients' clinical prognosis and reducing in-hospital death.
In conclusion, the simple 8-factor risk model showed good reliability and validity and provided a basis for clinicians to identify high-risk patients after PQ poisoning. The severity and early prognosis of PQ

Declarations

Figure 1 The design of the study. 932 samples were enrolled during the first two months in 2019.
