Exploration of pathological prediction of chronic kidney diseases by a novel theory of bi-directional probability

In the clinic, the pathological types of chronic kidney diseases (CKD) are considered references for choosing treatment protocols. From a statistical viewpoint, a non-invasive method to predict pathological types of CKD is a focus of our work. In the current study, following a frequency analysis of the clinical indices of 588 CKD patients in the department of nephrology, a third-grade class-A hospital, a novel theory is proposed: “bi-directional cumulative probability dichotomy”. Further, two models for the prediction and differential diagnosis of CKD pathological type are established. The former indicates an occurrence probability of the pathological types, and the latter indicates an occurrence of CKD pathological type according to logistic binary regression. To verify the models, data were collected from 135 patients, and the results showed that the highest accuracy rate on membranous nephropathy (MN-100%), followed by IgA nephropathy (IgAN-83.33%) and mild lesion type (MLN-73.53%), whereas lower prediction accuracy was observed for mesangial proliferative glomerulonephritis (0%) and focal segmental sclerosis type (21.74%). The models of bi-directional probability prediction and differential diagnosis indicate a good prediction value in MN, IgAN and MLN and may be considered alternative methods for the pathological discrimination of CKD patients who are unable to undergo renal biopsy.

novel theory, the "bi-directional cumulative probability dichotomy" is proposed, and two models for the prediction and differential diagnosing of CKD pathological type are established. Further, we apply the CKD sample to verify the accuracy rate of models.

Frequency of clinical indexes in CKD patients with a definite pathology diagnosis.
According to the database of CKD pathological classification, the frequency of each clinical index in different CKD pathological types is detected by calculating the ratio of index' count vs total count, respectively. For example, the frequency of "Proteinuria < 2Y" in MN is highly as 87.9%, while that is relative low as 70.1%. These subtle differences between them indicated the different characteristic of each clinical index in the corresponding CKD pathological types. The frequency of clinical indices in CKD pathological types was shown in the Table 1.

Theory of bi-directional probability accumulation dichotomy. Principles of bi-directional probability
accumulation. Theory of bi-directional probability accumulation. According to the principle of detecting frequency and the probability theory, the different frequencies of clinical indices in the χ 2 test were distributed in different CKD pathological types. If the frequency was as high as P > 0.50 (binominal distribution), P > 0.33 (trinomial distribution), or P > 0.25 (quadrinomial or higher distribution), the frequencies were defined as having a positive probability (p1), which was demonstrated as p1-0.5/p1-0.33/p1-0.25. In contrast, if the frequency was as low as P < 0.50 (binominal distribution), P < 0.33 (trinomial distribution), or P < 0.25 (quadrinomial or higher distribution), the frequencies were defined as having negative probability (p2), which was demonstrated as 0.5-p2/0.33-p2/0.25-p2. Further, the positive and negative probabilities were summed as ∑ positive (P1) or ∑ negative (P2), respectively, and the difference between the positive probability and the negative probability was defined as the excess probability of one CKD pathological type, and the corresponding formula is ∑ positive (P1)-∑ negative (P2). For visualizing the concept, an imagistic perception is shown in Fig. 1.   Principle of dichotomy. The above concept of bi-directional probability accumulation was based on the assumption of a certain CKD pathological type. Generally, we assume that the occurrence of any one CKD pathological type is a random event, so the basic probability of any one CKD pathological type is 50%. Therefore, the predictive probability of CKD pathological type is deduced as ∑ positive (P1) -∑ negative (P2)+ 50%.
Estimation of sampling error of ∑positive (P1) or ∑negative (P2). Because p1/p2 is calculated from the sample data of one pathological type, the sampling error of p1 or p2 should be estimated in the model of bi-directional probability accumulation. For the accumulated value of bi-directional probability ∑ positive (P1) or ∑ negative (P2), the sampling error was calculated as Establishment of the bi-directional probability accumulation dichotomy formula. Following the detection and analysis of bi-directional frequency in different pathological types, the principle of bi-directional probability accumulation dichotomy is applied to establish the prediction model of CKD pathological types, and the predictive formula was expressed as follows: Based on the predictive formula of CKD pathological types, the occurrence probability of CKD pathological types can be calculated by analyzing "yes" or "no" responses to the clinical indexes in one CKD patient. The indices of positive or negative probability (p1 or p2) in 5 pathological types (see Supplement information) Logistic regression model for differential diagnosis of CKD pathological types. Following the establishment of the bi-directional probability accumulation dichotomy, the predictive probability of CKD pathological types can be calculated by the responses of "yes" or "no" to clinical indexes in CKD patients. The results of two adjacent probabilities in a patient will be deduced, so the solution is the establishment of a logistic regression model of differential diagnosis. Six pairs of logistic regression models of differential diagnosis were established between each two CKD pathological types (MLN vs FSGS; IgAN vs FSGS; IgAN vs MLN; IgAN vs MN; MN vs MLN; FSGS vs MN). Based on the OR (odds ratio) in the 6 pairs of models of CKD pathological types, we set up a discrimination method of any one CKD pathological type. First, the propensity factors of each CKD pathological type are calculated and summed as ∑ OR1 (prediction group) or ∑ OR2 (control group); if the difference value of ∑ OR1 -∑ OR2 is > 0, then the judgement is the OR1-indicated pathological type. Conversely, if the difference value of ∑ OR1 -∑ OR2 is < 0, then the judgement is the OR2-indicated pathological type. In addition, due to a small sample number and the fact that models of MsPGN were not established, details are shown in Supplement information.
Verification of models of probability prediction and differential diagnosis. To verify the accuracy of the probability prediction and differential diagnosis models, the verifying software in the Excel program was applied to examine the CKD pathological types. Predicting the results of 135 CKD patients showed that the highest accuracy rate was obtained using MN (100%), followed by IgAN (83.33%) and MLN (73.53%), and a lower prediction accuracy was found for MsPGN (0%) and FSGS (21.74%). However, the MN prediction also demonstrated a relatively high rate of false judgment, as detailed in Table 2.

Discussion
CKD is a common chronic disease in the general population, and its prevalence is estimated to range from 10.2% to 13.0% in the United States, Norway and other countries 2-3 . An epidemiological survey showed that the prevalence of CKD is as high as 10.8% in China 4 ; the number of CKD cases in China is approximately 147.96 million according to the estimation of the overall population of China of 1.37 billion (data from the sixth national population census in 2010), especially for the primary kidney diseases such as primary glomerulonephritis, primary nephrotic syndrome, that are frequent occurred and showed the typical clinical features of proteinuria, hematuria, edema, high blood pressure, etc. At present, the first treatment of primary kidney diseases is very important to subsequent development of nephritic pathology, which is also an important factor that affects the outcome of end-stage renal disease (ESRD). Consequently, patients with ESRD are required to undergo expensive renal replacement therapy (e.g., haemodialysis or peritoneal dialysis or renal transplantation), resulting in a great reduction the quality of life of these patients. Currently, the accurate judgment of pathological type is mainly dependent on renal biopsy. Clinically, not all patients are willing to undergo a traumatic operation for renal biopsy, or they may exhibit clinical contraindications for renal biopsy. In cases the above situation occurs, the pathological discrimination of CKD patients is difficult, which can lead to confusion in choosing reasonable treatments for patients. Therefore, the present of study, we focused on the accurate prediction of the various pathological types in CKD patients with primary glomerular disease by statistical theory, and expect to be applied in the clinic, and to achieve the goal of effectively controlling CKD and delaying the onset of ESRD by the appropriate treatment of patients as much as possible.
In the current study, following the frequency detection of clinical indices of CKD patients with primary glomerular disease and the characteristics of different pathological types are analyzed, the theory of "bi-directional probability accumulation dichotomy" is proposed, and the judgment of relative pathological types is achieved by calculating the total value of bi-directional probability. Specifically, the probability value of predicting the pathological types of the primary glomerular disease can be calculated by collecting the frequency of "yes" answers to matching clinical indexes, such as basic characteristics, medical history, signs and laboratory results for a CKD patient with primary kidney disease. Therefore, based on the theory of "bi-directional probability accumulation dichotomy", the calculating formula was expressed as follows: Further, when the predictive probabilities of the relative pathological types were similar, the 6 pair models of differential diagnosis were established, and the judgement method of CKD pathological type was established, detailed as logistic regression models of MLN vs FSGS, FSGS vs IgAN, IgAN vs MLN, IgAN vs MN, MN vs MLN, and MN vs FSGS.
To apply them in the clinic, the theory of "bi-directional probability accumulation dichotomy" and the models of differential diagnosis must be validated in practice. Software for probability prediction and differential diagnosis was designed and applied to verify the prediction accuracy of the above models. The results showed the highest accuracy rate for MN (100%), followed by IgAN (83.33%) and MLN (73.53%), whereas lower prediction accuracy was shown for MsPGN (0%) and FSGS (21.74%). The results indicated a good prediction value for MN, IgAN and MLN patients. However, the prediction value of MN also showed a relative high rate of false judgment in non-MN patients. In conclusion, the present of study provides a preliminary reference for predicting the pathological types of CKD patients with primary glomerular diseases by a non-invasive method.
Although we got some useful results from this study, the limitations of sample number and prediction accuracy in this study should be considered in future studies. Especially, increasing the prediction accuracy on FSGS or MsPGN and decreasing the misjudgement rate on MN will be a key focus. Therefore, some substantial improvements need to be carried out in next study, such as extending the scope of collecting CKD patients from more hospitals to increase sample number, In addition, some rare pathological types in adult of China, such as sclerosing glomerulonephritis, minimal change nephrosis and MsPGN, which expect to be included in the next improved model. Indeed, the establishment of a comprehensive prediction system is a very valuable for the application in clinic. Noticeably, the present model aim only to the prediction of the pathological types in primary glomerular diseases. For the secondary kidney diseases such as lupus nephritis, amyloidosis, diabetic nephropathy, which have complex clinical characteristic and are identified by clinical various examinations. Thus, another corresponding model should be also established based on other statistical viewpoint in future study. Of course, the later model and the former model should be closely associated and cooperated, together for applying in predicting various kidney diseases pathologic types in clinical.
Summary, the current research is only an exploration and an attempt to predict the pathological types of primary glomerular diseases from a statistical probability perspective. Our ultimate goal is to establish a complete system of probability prediction on the pathological types in various kidney diseases which is applied in the prediction of patients with kidney diseases and uncovers a new field of prediction medicine for internal medicine.

Materials and Methods
Collection of clinical data in patients with CKD. Data were obtained from the archives of 723 CKD cases, including 588 cases for the establishment of models and 135 cases for the verification of models. Archives were from patients seen from July 2011 to August 2014 at the Department of Nephrology at a third-grade class-A hospital. The study was classified as statistics and was approved by medical ethics committee of the Second Xiangya Hospital in China. The criteria for case selection included the following: A. A histopathologic diagnosis according to biopsy and clinical diagnosis is clear; B. Patient age is from 13 to 65 years old, the length of hospital stay was ≥ 4 days, and the detailed medical history records and clinical test results were available; C. Clinical CKD staging  [5][6] . The criteria for case exclusion included the presence of end-stage renal disease or long-term renal replacement therapy such as haemodialysis or peritoneal dialysis. A total of 588 cases were used to build prediction models, and 120 patients were classified as having mild lesion nephrosis (MLN); 122 patients were diagnosed with focal segmental glomerularsclerosis (FSGS); 141 with membranous nephropathy (MN); 137 with IgA nephropathy-IgAN (these IgAN cases were further divided into IgA-mild lesion (IgA-ML) and IgA-focal segmental sclerosis or focal sclerosis or Hyperplasia with sclerosis (IgA-FS)); 9 patients with sclerosing glomerulonephritis (SGN); 27 and with mesangial proliferative glomerulonephritis (MsPGN). After the models were constructed, the remaining 135 CKD cases were used to verify the accuracy rate of predicting CKD pathological types.
Establishing statistical variables and databases. All results of classified or numeric variables in 588 CKD patients were input into Excel 2003 and SPSS 19.0 databases, the frequency of 40 classified variables including basic characteristics, medical history, clinical signs and laboratory data was calculated by the classification of CKD pathological types, and a special database of containing these clinical indexes was established. Following the significant difference was discriminated by the Pearson Chi-square (χ 2 ) test or t test, the significant variables were screened and further analyzed, as shown in Supplement information.
Data availability. All data needed to support the conclusions in this paper are present in the paper and the