Introduction

Since December 2019, cases of unexplained pneumonia have been reported in Wuhan City, Hubei Province. The pathogen that caused the outbreak was subsequently identified as a novel coronavirus. On January 12, 2020, the WHO named the virus 2019 novel coronavirus (2019-nCoV)1. The disease caused by the new coronavirus (COVID-19) can be mild or severe, and some patients have no symptoms at all. Early recognition is extremely important for controlling the prevalence and spread of the disease2,3.

The incubation period after 2019-nCoV infection was 2–14 days, mostly 3–7 days. Most patients with COVID-19 showed fever with or without respiratory symptoms in the early stage of onset. The main respiratory symptom was dry cough; a small number of patients had no obvious fever at onset, or had no fever at all and only a dry cough. Severe patients gradually developed respiratory failure, and even acute respiratory distress syndrome and/or shock4. In clinical practice, it was observed that the symptoms and signs of most mild and common-type patients remained stable, but some patients suddenly became severe or critical 7 to 10 days into the course of the disease and had to be transferred to the ICU for treatment.

The prognosis and risk factors of severe and critical patients have been examined in earlier studies5,6,7. However, the risk factors for, and prediction of, the transition from mild or common-type disease to severe or critical disease have not been reported. We collected clinical data from such patients to analyze the clinical symptoms and the laboratory and imaging characteristics of 2019-nCoV-infected patients, especially the characteristics before the transition from mild or common to severe or critical, and established a predictive model to provide evidence for predicting changes in the disease.

Materials and methods

The study was approved by the medical ethics committees of the Third People's Hospital of Kunming City, Yunnan, and the First People's Hospital of Zaoyang City, Hubei, and conformed to the principles of the Declaration of Helsinki. Patients with COVID-19 diagnosed at these two hospitals from January 20, 2020 to March 15, 2020 were enrolled. The data collected included demographic characteristics, underlying diseases, clinical symptoms and signs, clinical data, laboratory test data, chest CT findings and clinical outcome. The 408 patients with COVID-19 from the First People's Hospital of Zaoyang City were taken as the training set, which was used to construct the predictive model; all 408 were mild or common type at the time of diagnosis, and 60 progressed to severe or critical during treatment. The 78 patients with COVID-19 from the Third People's Hospital of Kunming City were taken as the validation set, which was used to verify the effectiveness of the model; all 78 were mild or common type at the time of diagnosis, and 14 progressed to severe or critical during treatment. Patients who were severe or critical at the time of diagnosis were excluded. The endpoint was the transition from mild or common type to severe or critical. Informed consent was obtained from all participants or, if subjects were under 18, from a parent and/or legal guardian; for patients who died, consent was obtained from a legal guardian.

Suspected cases were defined according to the novel coronavirus pneumonia diagnostic criteria8. (1) Epidemiological history: travel to or residence in Wuhan and surrounding areas, or in other communities with reported cases, within 14 days before onset; contact with persons infected with the new coronavirus (positive nucleic acid test) within 14 days before onset; contact with patients with fever or respiratory symptoms from Wuhan and surrounding areas, or from communities with reported cases, within 14 days before onset; or clustered onset. (2) Clinical manifestations: fever and/or respiratory tract symptoms; imaging features of pneumonia; normal or reduced leukocyte count or lymphocyte count in the early stage of the disease. Anyone with any one item of epidemiological history and any two of the clinical manifestations was diagnosed as a suspected case. Patients were included if they met both of the following criteria: (1) suspected case of 2019-nCoV pneumonia; (2) positive 2019-nCoV nucleic acid on real-time RT-PCR of sputum, throat swab, lower respiratory tract secretion or other specimens.

Clinical classification of disease severity8

All confirmed patients were clinically classified according to the novel coronavirus pneumonia diagnosis and treatment plan (fifth trial version) as: (1) mild: mild clinical symptoms, no pneumonia on imaging; (2) common: fever, respiratory symptoms, imaging findings of pneumonia; (3) severe: any of the following: respiratory distress with respiratory rate ≥ 30 breaths/min; oxygen saturation ≤ 93% at rest; PaO2/FiO2 ≤ 300 mmHg (1 mmHg = 0.133 kPa); (4) critical: any of the following: respiratory failure requiring mechanical ventilation; shock; other organ failure requiring ICU monitoring and treatment.
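The four-grade classification above can be read as a rule cascade that checks the worst grade first. The following is a minimal sketch, not the authors' procedure: the `Findings` container and its field names are hypothetical, the grades are returned as mild, common, severe and critical, and the common grade is simplified to "pneumonia on imaging without any severe or critical criterion".

```python
from dataclasses import dataclass


@dataclass
class Findings:
    """Hypothetical container for one patient's findings (field names are assumptions)."""
    pneumonia_on_imaging: bool
    respiratory_rate: float               # breaths per minute
    spo2_at_rest: float                   # oxygen saturation at rest, percent
    pao2_fio2: float                      # PaO2/FiO2 in mmHg
    mechanical_ventilation: bool = False  # respiratory failure requiring ventilation
    shock: bool = False
    organ_failure_icu: bool = False       # other organ failure requiring ICU care


def classify_severity(f: Findings) -> str:
    """Return 'critical', 'severe', 'common' or 'mild', checking the worst grade first."""
    if f.mechanical_ventilation or f.shock or f.organ_failure_icu:
        return "critical"
    # Any single criterion is enough for the severe grade.
    if f.respiratory_rate >= 30 or f.spo2_at_rest <= 93 or f.pao2_fio2 <= 300:
        return "severe"
    # Simplification: imaging evidence of pneumonia separates common from mild.
    if f.pneumonia_on_imaging:
        return "common"
    return "mild"
```

Checking the grades from worst to best means a patient who meets criteria for several grades is always assigned the most serious one.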

Test methods

Pharyngeal swab, stool and blood samples were collected. Blood samples were tested by the biochemical laboratory according to its operating procedures. Nucleic acid in pharyngeal and stool samples was detected by the molecular laboratory using RT-PCR. Total RNA was extracted within 2 h, and two target genes, ORF1ab and N, were amplified and tested simultaneously. Amplification conditions: reverse transcription at 42 ℃ for 5 min, pre-denaturation at 95 ℃ for 10 s, then 40 cycles of denaturation at 95 ℃ for 10 s and extension at 60 ℃ for 45 s with collection of fluorescence signals. The dual-target detection kit was provided by Shanghai Jienuo Biotechnology Co., Ltd. (gxzz 20203400058). Result interpretation: the cut-off Ct value was 40; a Ct value < 37 was positive, a Ct value > 40 was negative, and 37–40 was a gray zone that required retesting. Fasting venous blood was drawn in the early morning for routine blood tests, blood biochemistry, coagulation function, blood gas analysis, immunoglobulin, hypersensitive protein, procalcitonin, electrolytes, T-lymphocyte subsets, etc.
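The interpretation rule for the RT-PCR cycle threshold can be written as a small function. This is a sketch under stated assumptions: the function name and return labels are hypothetical, the rule is applied to a single Ct value, and how the two targets (ORF1ab and N) are combined into one result is not described above and is therefore not modeled.

```python
def interpret_ct(ct: float) -> str:
    """Map one cycle-threshold (Ct) value to a result label.

    Ct < 37 is positive, Ct > 40 is negative, and the closed range
    37-40 is the gray zone that requires retesting.
    """
    if ct < 37:
        return "positive"
    if ct > 40:
        return "negative"
    return "gray zone"


# Illustrative values only; these are not measurements from the study.
for value in (25.3, 38.0, 41.2):
    print(value, interpret_ct(value))
```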

The T-lymphocyte subset detection kit was produced by BD Biosciences, and the flow cytometer was a BD FACSCanto II. For each patient, 2 mL of whole blood was collected in ethylenediaminetetraacetic acid (EDTA) anticoagulant tubes, and two detection tubes were prepared, each receiving 100 µL of whole blood. Tube 1 received 20 µL of CD3 (FITC)/CD4 CD8 (PE) (Cy5-PE)/CD45PerCP; tube 2 served as the isotype control and received 20 µL each of IgG-Cy5-FITC, IgG-PE and IgG-PerCP. The tubes were vortexed to mix and incubated at room temperature in the dark for 20 min. Then 2 mL of hemolysin was added, and the tubes were mixed again and incubated at room temperature in the dark for more than 10 min. The tubes were centrifuged at 1500 rpm for 5 min and the supernatant was discarded. The cells were then washed twice: 2 mL of PBS was added, the tubes were centrifuged at 1500 rpm for 5 min, and the supernatant was discarded. Finally, the cells were resuspended in 900 µL of PBS and analyzed on the flow cytometer using CellQuest software (Becton Dickinson, San Jose, CA, USA). All operations were carried out in strict accordance with the manufacturer's instructions.

Clinical treatment

From the date of diagnosis, confirmed cases were given the antiviral drug lopinavir/ritonavir (200 mg/50 mg per capsule) orally, two capsules at a time, three times a day; severe or critical patients were additionally given immunoglobulin 20 g/day plus corticosteroids (methylprednisolone 40–80 mg/day). Oxygen was delivered by different routes and at different flow rates according to the severity of hypoxemia (low-flow oxygen, high-flow oxygen, nasal catheter oxygen, mask oxygen, etc.).

Observation index

Baseline laboratory and imaging data were obtained from the hospital electronic medical record system. Laboratory examinations included routine blood tests, blood gas analysis, coagulation function, blood biochemistry, electrolytes, bacterial and fungal culture, infection markers, T-lymphocyte subsets, etc., which were rechecked every 3–5 days, or every 1–2 days if necessary; imaging mainly consisted of chest CT, which was rechecked every 3–5 days. Based on previous research, we focused on factors that might affect the transition from mild or common to severe or critical, such as age, body mass index, underlying diseases, white blood cells, lymphocytes, neutrophils, blood gas analysis, inflammatory factors, liver and kidney function, coagulation function, electrolytes and T-lymphocyte subsets.

Discharge criteria8

The discharge criteria were: no fever for at least three days, significant improvement in both lungs on chest CT, clinical remission of respiratory symptoms, and negative throat-swab tests for 2019-nCoV nucleic acid taken at least 24 h apart.

Statistical analysis

Raw data were collected and organized in Excel. SPSS 19.0 (IBM, Chicago, IL, USA) was used for statistical analysis. Missing values were imputed by regression estimation. Normally distributed measurement data were expressed as mean ± standard deviation; the t test was used to compare the means of two samples, and analysis of variance was used to compare the means of more than two samples. Non-normally distributed measurement data were described by the median and compared with the rank sum test. The χ2 test was used for count data. In the training set, univariate unconditional logistic regression was performed first, and the significant variables were entered into a multivariate unconditional logistic regression. Using the Forward: LR method, variables that remained significant in the multivariate analysis were treated as independent risk factors, and a prediction model was established. The accuracy, precision, recall and F1-score of the model were calculated, the area under the ROC curve was used to test the discrimination of the model, and a calibration curve was drawn from the actual and predicted incidence. The model established in the training set was then validated with the data of the validation set. P < 0.05 was considered statistically significant. In addition, SPSS Modeler 18.0 was used to build a prediction model from the training set by the random forest method; its accuracy, precision, recall and F1-score were calculated, and the model was then verified in the validation set. The prediction models established by unconditional logistic regression and by random forest were compared.
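For reference, the four performance measures named above follow directly from the counts of a 2 × 2 confusion matrix. The sketch below uses the standard definitions; the function name and the counts in the usage example are hypothetical and are not taken from the study.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1-score from confusion-matrix counts.

    tp/fn: progressed patients predicted to progress / predicted stable.
    fp/tn: stable patients predicted to progress / predicted stable.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # same quantity as sensitivity
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
print(classification_metrics(tp=8, fp=2, fn=2, tn=88))
```

Because progression is the minority outcome, accuracy alone can look high even when many progressing patients are missed, which is why precision, recall and F1-score are reported alongside it.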

Results

Basic characteristics

Among the 408 patients in the training set, the incubation period was 2–20 days, with a median of 9 (6, 14) days. There were 196 males (48.0%) and 212 females (52.0%). The median age was 47 (37, 56) years. According to the clinical classification in China's novel coronavirus pneumonia diagnosis and treatment plan (fourth trial version), 64 of the 408 cases were mild and 344 were common type at diagnosis. Within 6–10 days after diagnosis, 60 patients deteriorated: 52 became severe and 8 became critical. See Table 1 for the specific clinical characteristics.

Table 1 Basic characteristics of COVID-19 in mild or common group and severe or critical group in training set.

In this study, the 408 patients with COVID-19 in the training set were divided into two groups: 348 in the mild or common group and 60 in the severe or critical group. There was no significant difference between the two groups in the prevalence of diabetes, cardiovascular disease or chronic obstructive pulmonary disease. Age and BMI differed significantly between the two groups (Z = −2.236, P = 0.025; Z = −4.094, P < 0.001). Fever and cough were the most common clinical symptoms, while diarrhea, palpitation and runny nose were less common.

Biochemical indexes

For the 408 patients in the training set, the indexes measured when the first pharyngeal swab tested positive for 2019-nCoV nucleic acid were taken as baseline. At baseline, the two groups differed significantly in peripheral blood lymphocytes (L) (Z = −2.305, P = 0.021), lactic acid (LA) (Z = −2.463, P = 0.014), albumin (ALB) (Z = −2.868, P = 0.004), Ca (Z = −1.994, P = 0.046) and Fe (Z = −2.849, P = 0.004). There was no significant difference in leukocytes, neutrophils, hemoglobin, platelets, liver and kidney function, muscle enzymes, C-reactive protein or coagulation function. See Table 2 for details.

Table 2 Basic characteristics of COVID-19 in mild or common group and severe or critical group in training set.

Regarding antiviral treatment, of the 348 patients in the mild or common group, 110 (31.61%) were treated with lopinavir/ritonavir, 124 (35.63%) with arbidol and 114 (32.76%) with a combination of the two; of the 60 patients in the severe or critical group, 23 (38.33%) were treated with lopinavir/ritonavir, 17 (28.33%) with arbidol and 20 (33.33%) with a combination of the two. There was no significant difference in the use of antiviral drugs between the two groups. See Table 2.

T-lymphocyte subsets

In the training set, T-lymphocyte subsets were measured in both groups at diagnosis. The counts of CD3+ T cells (Z = 5.621, P < 0.001), CD4+ T cells (Z = −5.617, P < 0.001) and CD8+ T cells (Z = −5.456, P < 0.001) in the severe or critical group were significantly lower than those in the mild or common group. However, CD3+/CD45+, CD4+/CD45+ and CD4+/CD8+ did not differ significantly between the two groups. See Table 3 for specific results.

Table 3 Features of T-lymphocyte subsets in mild or common group and severe or critical group in training set.

Dynamic changes of T-lymphocyte subsets

The counts of CD3+, CD4+ and CD8+ T lymphocytes in the training set were rechecked every 3–4 days. The dynamic changes of T-lymphocyte subsets in the two groups are shown in Fig. 1a–c. CD3+, CD4+ and CD8+ T lymphocyte counts in the severe or critical group were significantly lower than those in the mild or common group at all time points, and they recovered slowly as the patients' condition improved.

Figure 1

Dynamic changes of T-lymphocyte subsets in the training set: CD3+ T cells (a), CD4+ T cells (b) and CD8+ T cells (c).

Univariate and multivariate regression analysis

In the training set, L, LA, ALB, Ca, Fe, CD4+ and CD8+ T cells were entered into univariate and multivariate regression analysis. The results showed that LA (P = 0.042, OR = 1097.983, 95% CI 1.303–924,798.262) and CD8+ T cells (P = 0.010, OR = 0.903, 95% CI 0.835–0.975) were independent factors that could predict, at an early stage, the transition from mild or common disease to severe or critical disease. The cut-off values were 2.05 and 190, respectively. See Table 4 for details.
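An odds ratio from logistic regression describes a one-unit increase in the predictor, so its size depends on the measurement unit; this is one reason the OR for LA can look extreme while the OR for CD8+ T cells sits close to 1. Since OR = exp(β), the odds ratio for a change of Δ units is OR^Δ. The sketch below only illustrates this arithmetic; the function names are hypothetical, and the measurement units are assumed to be those used in Table 4.

```python
import math


def coefficient_from_or(odds_ratio: float) -> float:
    """Logistic regression coefficient (log-odds per unit) for a given odds ratio."""
    return math.log(odds_ratio)


def or_for_change(odds_ratio: float, delta: float) -> float:
    """Odds ratio for a change of `delta` units in the predictor."""
    return math.exp(coefficient_from_or(odds_ratio) * delta)


# Reported per-unit odds ratios, rescaled to other step sizes.
print(or_for_change(0.903, 10))      # CD8+ T cells: a 10-unit increase
print(or_for_change(1097.983, 0.1))  # LA: a 0.1-unit increase
```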

Table 4 Univariate and multivariate analysis of the transition from mild or common to severe or critical in training set.

ROC curve

The two independent factors, LA and CD8+ T cells, were used to draw ROC curves. The area under the curve was 0.754 (0.581, 0.928) for LA and 0.842 (0.713, 0.970) for CD8+ T cells. For LA, the sensitivity and specificity were 0.857 and 0.594, and the accuracy, precision, recall and F1-score were 0.912, 0.750, 0.601 and 0.667; for CD8+ T cells, the sensitivity and specificity were 0.959 and 0.687, and the accuracy, precision, recall and F1-score were 0.923, 0.833, 0.714 and 0.769. See Fig. 2a for the ROC curves.
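The area under the ROC curve has a direct interpretation: it is the probability that a randomly chosen patient who progressed receives a higher risk score than a randomly chosen patient who did not, with ties counted as one half. The sketch below computes it by pairwise comparison; it illustrates the definition and is not the SPSS procedure used in the study. The function name and the example values are hypothetical. Because a lower CD8+ T cell count indicates higher risk, the counts are negated before being used as scores.

```python
from typing import Sequence


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical CD8+ T cell counts, for illustration only.
progressed = [120, 150]
stable = [300, 160, 140]
# Negate so that a lower count maps to a higher risk score.
print(auc([-x for x in progressed], [-x for x in stable]))
```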

Figure 2

ROC curves of the independent factors LA and CD8+ T cells in the training set (a) and ROC curve of CD8+ T cells in the validation set (b). Calibration curves of actual observed values against values predicted by the model established from CD8+ T cells in the training set (c) and in the validation set (d).

In the training set, a calibration curve was drawn from the actual observed values and the values predicted by the model (Fig. 2c); the observed and predicted values were highly consistent.

Validation of the model

In the training set, CD8+ T cell count was an independent factor for early prediction of the transition from mild or common disease to severe or critical disease, with an area under the ROC curve of 0.842 (0.713, 0.970). The area under the ROC curve was also used to evaluate the discrimination of the clinical prediction model in external validation.

A total of 78 novel coronavirus pneumonia patients admitted to the Third People's Hospital of Kunming served as the validation set. The basic characteristics of the validation set are shown in Supplementary Table 1; the validation set and the training set were comparable. Using the regression equation established in the training set, the predicted probability was calculated for the validation set and the ROC curve was drawn (Fig. 2b). The AUC was 0.906 (0.861, 0.981). The prediction model thus showed good discrimination in external validation as well as in the training set. The calibration curve of actual observed values against predicted values in the validation set was also drawn (Fig. 2d); the observed values were highly consistent with the predicted values.

Random forest model

In the training set, the variables that were statistically significant in univariate analysis were included in the random forest model. The number of candidate variables at each tree node was set to the square root of the total number of variables, and the number of trees was set to 500. The variables in descending order of importance were: CD8+ T cells, lactic acid, CD4+, ALB, Fe, L, Ca (see Fig. 3). The accuracy, precision, recall and F1-score of the random forest model were 0.948, 0.917, 0.786 and 0.845. When the random forest model was applied to the validation set, the accuracy, precision, recall and F1-score were 0.935, 0.901, 0.714 and 0.801.

Figure 3

Random forest predictor ordering.

Prognosis

All patients received comprehensive treatment. In the training set, 2 patients died and the others were discharged; in the validation set, all patients were cured. The average length of stay was 15.31 ± 5.64 days in the 348 mild or common patients and 22.76 ± 4.82 days in the 60 severe or critical patients. There was no significant difference in average length of stay between mild or common patients in the validation set and those in the training set (t = 0.732, P = 0.466), nor between severe or critical patients in the two sets (t = 0.046, P = 0.964). The training set and the validation set were therefore comparable in length of stay. See Supplementary Table 2 for details.

Discussion

Since the outbreak of 2019-nCoV infection in December 2019, there were 2000 to 4000 newly diagnosed patients every day in China, including a large number of severe cases. With the global spread of COVID-19, the number of deaths increased significantly. If patients at risk of becoming severe could be identified as early as possible, before they deteriorated, and the necessary interventions carried out early, mortality might be reduced.

Most patients with COVID-19 were not seriously ill at onset: most were mild or common type, a few were severe or critical, and a few were asymptomatic carriers. During treatment, however, some mild and common-type patients rapidly became severe or critical and developed severe pneumonia and acute respiratory failure, or even died. Some studies have shown that elderly people with 2019-nCoV infection are prone to severe or critical disease because most of them have underlying diseases, and that their mortality is also high4,9,10. In addition, a decrease in lymphocyte count was related to disease progression, and lymphocyte-related indexes might be potential predictors11.

All patients in the training set were mild or common type at the time of diagnosis. While receiving oxygen inhalation, vital-sign monitoring and antiviral treatment, 60 patients deteriorated suddenly and without warning within 3–10 days of admission and had to be transferred to the ICU. We compared the baseline indexes of these 60 patients before deterioration with those of the 348 patients whose disease did not progress, to look for early warning factors. BMI, L, LA, blood calcium, and CD3+, CD4+ and CD8+ T cell counts differed significantly between the two groups before the change in disease, but only LA and CD8+ T cell count were independent risk factors. This indicates that LA and CD8+ T cells had already changed markedly before deterioration and were independent risk factors for progression from mild or common disease to severe or critical disease. ROC curve analysis showed that the area under the curve for LA was 75.4%, with a sensitivity of 85.7% and a specificity of 59.4%, while the area under the curve for CD8+ T cells was 84.2%, with a sensitivity of 95.9% and a specificity of 68.7%. The predictions of the model established from CD8+ T cells were also consistent with the observed values. Thus serum LA and CD8+ T cell count not only changed before the exacerbation of mild and common-type novel coronavirus pneumonia but could also predict it, and their predictive performance was good. This gives medical staff time to intervene early and prevent progression of the disease.

The prediction model established from the 408 patients at the First People's Hospital of Zaoyang was validated in 78 mild and common-type patients at the Third People's Hospital of Kunming. The area under the CD8+ T cell curve was 90.6% in the validation set, compared with 84.2% in the training set; both exceeded 70%, indicating that the prediction model had good discrimination. In general, because baseline patient characteristics and levels of medical care differ between centers, the area under the ROC curve in an external validation set will decrease or increase, and a fluctuation within 10% is clinically acceptable. In the training set, the scatter points of the calibration plot fluctuated around the reference line without deviating from it significantly, suggesting that the values predicted by the clinical prediction model were consistent with the actual observed values. When the prediction model was applied to the other medical center, the scatter points likewise fluctuated around the reference line, showing that the prediction model had good accuracy and stability.

The training set data were also analyzed with the random forest method. Both in the training set and in the validation set, all indicators of the resulting model were better than those of the unconditional logistic regression model, and the variables could be ranked by importance, from high to low: CD8+ T cells, lactate, CD4+, ALB, Fe, L, Ca. This shows that random forest adapts well to complex data and can rank the variables while maintaining high prediction accuracy, improving the efficiency of testing.

It should be noted that the dynamic changes of CD4+ and CD8+ T lymphocytes were also valuable for assessing prognosis. As the patients' condition improved, CD4+ and CD8+ T lymphocytes gradually recovered in severe and critical patients. These data support previous findings that lymphopenia, and in particular low T cell counts, correlate with severe disease; this study adds that measuring T cell counts at disease onset (while the disease was still mild in all patients) could predict outcome or disease progression and is therefore useful for patient management.

One study12 suggested that the reduction of CD8+ T cells in peripheral blood was closely related to severe disease, identified T cell apoptosis and migration to inflamed tissues as possible mechanisms driving the reduction of peripheral T lymphocytes, and found that severe COVID-19 patients were characterized by extensive T cell loss and subsequent T cell proliferation. Another study13 suggested that the immune characteristics of hospitalized COVID-19 patients were heterogeneous, and that CD8+ T cell exhaustion and disease severity might be related to changes over time. The cellular origin (T cells, dendritic cells or macrophages) of the inflammatory cytokine storm in novel coronavirus pneumonia patients remains to be determined. Whether the sharp decline or even depletion of CD4+ and CD8+ T cells affects the replication or elimination of the virus remains to be studied further.

The sample of this study was small, and the findings need to be confirmed in a larger sample. The patients came from only 2 hospitals, both in China, which could limit generalizability to other areas of the world. However, our focus was to identify patients at risk early and carry out the necessary interventions in time, so as to reduce the occurrence of critical illness and lower mortality.