J-CKD-DB: a nationwide multicentre electronic health record-based chronic kidney disease database in Japan

The Japan Chronic Kidney Disease (CKD) Database (J-CKD-DB) is a large-scale, nationwide registry based on electronic health record (EHR) data from participating university hospitals. Using a standardized exchangeable information storage, the J-CKD-DB succeeded in efficiently collecting clinical data of CKD patients across hospitals despite their different EHR systems. CKD was defined as dipstick proteinuria ≥1+ and/or estimated glomerular filtration rate (eGFR) <60 mL/min/1.73 m² based on both out- and inpatient laboratory data. As an initial analysis, we analyzed 39,121 CKD outpatients (median age 71 years, 54.7% men, median eGFR 51.3 mL/min/1.73 m²) and observed that the numbers of patients with CKD stage G1, G2, G3a, G3b, G4 and G5 were 1,001 (2.6%), 2,612 (6.7%), 23,333 (59.6%), 8,357 (21.4%), 2,710 (6.9%) and 1,108 (2.8%), respectively. According to the KDIGO risk classification, 30.1% of male and 25.5% of female patients with CKD were at very high risk. As the information from every clinical encounter at the participating hospitals will be continuously updated with an anonymized patient ID, the J-CKD-DB will be a dynamic registry of Japanese CKD patients that can be expanded and linked with other existing databases, and a platform for a number of cross-sectional and prospective analyses to answer important clinical questions in CKD care.

Figure 1. Overview of the Japan Chronic Kidney Disease Database (J-CKD-DB) system. The SS-MIX2 (Standardized Structured Medical Information eXchange) leveraged recent progress made in healthcare information standards in Japan, including code standardization for laboratory data items and prescription data. The university hospitals participating in the J-CKD-DB (left boxes) needed to have electronic health record systems that incorporated SS-MIX2 storage and a template-based structured-data entry function that could transfer the entered data to the SS-MIX2 storage. All data elements are extracted semi-automatically using SS-MIX2 storage and sent to the J-CKD-DB data centre through HTTPS (upper right box). MCDRS (Multipurpose Clinical Data Repository System), a software system developed at the University of Tokyo, is adopted for designing and collecting the data elements. The administrative office of the J-CKD-DB project is at Kawasaki Medical School (lower middle box). The J-CKD-DB is maintained, and data cleaning is carried out, at the office (lower right box). AP, application; DB, database; DMZ, demilitarized zone; HTTPS, hypertext transfer protocol secure; SSH, secure shell; SSL, secure sockets layer; VPN, virtual private network.
Table 1, item 1. Hospital code: issued by the J-CKD-DB administration office (4-digit code).

Results
Development of J-CKD-DB. Figure 1 shows the overview of the J-CKD-DB system. The J-CKD-DB is a nationwide multicentre EHR-based database of CKD in Japan. The inclusion criteria of the J-CKD-DB were as follows: (1) age ≥18 years and (2) proteinuria ≥1+ (dipstick test) and/or estimated glomerular filtration rate (eGFR) <60 mL/min/1.73 m². All patients meeting these inclusion criteria based on outpatient or inpatient information were registered in the J-CKD-DB regardless of whether they were under the care of nephrology or other specialties. Among more than 80 university hospitals in Japan, 21 agreed to join the J-CKD-DB Registry (Supplementary Table S1). Table 1 summarizes the information collected in the J-CKD-DB. The Multipurpose Clinical Data Repository System (MCDRS) for data extraction and registry was developed through the Funding Program for World-Leading Innovative R&D on Science and Technology (FIRST Program) of the Japan Society for the Promotion of Science; this system allows the efficient collection of clinical data using the SS-MIX2 format 11, which is application-independent and is currently used for community healthcare information systems, disaster backup, multi-institutional database development, and clinical research.
Participant selection and baseline characteristics. Supplementary Fig. 1 shows the selection flowchart for the initial analysis, which focused on CKD outpatients. As of January 2018, over 100,000 CKD patients from 11 university hospitals (Phase 1 Database-Building hospitals) had been registered in the database for the period from 1 January 2014 to 31 December 2014. Four university hospitals had no information about admission history as of January 2018 and thus were not included. Finally, 39,121 outpatients without admission history from 7 university hospitals were included in the initial analysis.
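The inclusion criteria and the eGFR-based G staging used above can be illustrated with a minimal Python sketch. The G-stage cut-offs follow the standard KDIGO GFR categories; the function names and the string encoding of dipstick results ("-", "±", "1+", ...) are assumptions made for illustration and are not taken from the J-CKD-DB implementation.

```python
from typing import Optional

# Assumed ordinal encoding of dipstick proteinuria results (illustrative only).
DIPSTICK_ORDER = ["-", "±", "1+", "2+", "3+", "4+"]


def g_stage(egfr: float) -> str:
    """Return the KDIGO GFR category for an eGFR given in mL/min/1.73 m^2."""
    if egfr >= 90:
        return "G1"
    if egfr >= 60:
        return "G2"
    if egfr >= 45:
        return "G3a"
    if egfr >= 30:
        return "G3b"
    if egfr >= 15:
        return "G4"
    return "G5"


def meets_inclusion(age_years: int,
                    egfr: Optional[float] = None,
                    dipstick: Optional[str] = None) -> bool:
    """Apply the two criteria: age >= 18 and (proteinuria >= 1+ and/or eGFR < 60)."""
    if age_years < 18:
        return False
    proteinuria_positive = (
        dipstick is not None
        and DIPSTICK_ORDER.index(dipstick) >= DIPSTICK_ORDER.index("1+")
    )
    low_egfr = egfr is not None and egfr < 60
    return proteinuria_positive or low_egfr
```

Under these assumptions, a patient with preserved eGFR (stage G1 or G2) enters the registry only through the proteinuria criterion, for example `meets_inclusion(40, egfr=95, dipstick="2+")`.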
According to the KDIGO risk classification, very high-risk (red zone in Fig. 4a,b) cases accounted for 30.1% and 25.5% of all cases in male and female patients, respectively. Across all age strata, the proportion of very high-risk (red zone) cases was higher in male than in female patients (Fig. 5a,b). Among young patients aged 18-44 years, the most prevalent risk category was high-risk (orange zone), accounting for 54.3% of male and 57.9% of female patients (Fig. 5c,d). The prevalence of very high-risk (red zone) CKD increased gradually with age in both sexes, reaching 44.2% in male and 40.5% in female patients over 85 years (Fig. 5c,d).
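For readers less familiar with the KDIGO risk classification, the following minimal Python sketch shows how a G stage and an A (proteinuria/albuminuria) stage combine into the four risk zones referred to above. The grid follows the standard KDIGO heat map; the names `RISK_GRID`, `kdigo_risk` and `share_very_high` are hypothetical, and how dipstick results were assigned to A1-A3 in the J-CKD-DB is not specified here, so the A stage is taken as an input.

```python
from typing import Iterable, Tuple

A_STAGES = ("A1", "A2", "A3")

# KDIGO heat map: rows are G stages, columns are A1, A2, A3.
# low = green, moderate = yellow, high = orange, very high = red.
RISK_GRID = {
    "G1":  ("low", "moderate", "high"),
    "G2":  ("low", "moderate", "high"),
    "G3a": ("moderate", "high", "very high"),
    "G3b": ("high", "very high", "very high"),
    "G4":  ("very high", "very high", "very high"),
    "G5":  ("very high", "very high", "very high"),
}


def kdigo_risk(g_stage: str, a_stage: str) -> str:
    """Look up the KDIGO risk category for a (G stage, A stage) pair."""
    return RISK_GRID[g_stage][A_STAGES.index(a_stage)]


def share_very_high(patients: Iterable[Tuple[str, str]]) -> float:
    """Percentage of (G stage, A stage) pairs that fall in the red zone."""
    risks = [kdigo_risk(g, a) for g, a in patients]
    return 100.0 * risks.count("very high") / len(risks)
```

Applied to the staged patients of one sex or one age stratum, `share_very_high` yields the kind of percentage reported for the red zone.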

Discussion
In the present study, we have established the largest registry of CKD patients in Japan, the J-CKD-DB, based on advanced EHR systems that extract data automatically. As an initial analysis, we analyzed 39,121 CKD outpatients and observed that the majority of CKD patients were older than 65 years, with the most prevalent age category being 70-79 years in both sexes. Younger CKD patients in the J-CKD-DB tended to be at more advanced proteinuria (A) stages than older patients.
Table 2. General characteristics of CKD outpatients in the J-CKD-DB at baseline. Continuous variables are described as median [inter-quartile interval (IQI)] or mean (standard deviation, SD). Categorical variables are described as n (%).
A major strength of the J-CKD-DB is that it is the largest registry of CKD patients in Japan. Since all data elements are extracted automatically from EHR data using SS-MIX2 storage 11,12, data collection and analysis will be facilitated by the J-CKD-DB in several ways. First, it is possible to build a multi-layered database by linking with an existing database, such as the J-RBR/J-KDR 7,8, for complementary purposes (Fig. 6). Furthermore, we are planning to connect the J-CKD-DB to biological samples (blood, serum and urine) and genomic information after informed consent, and to establish a multi-layered database that will comprise the J-CKD-DB at the bottom layer, the J-RBR/J-KDR at the middle layer, and a biological sample database at the top layer (Fig. 6). Although other systems that automatically build CKD databases exist elsewhere in the world 14-17, the J-CKD-DB has the advantage over them of being able to expand and link with these multi-layer databases. Second, it is possible to conduct various studies by publicly inviting research questions, which will provide evidence on CKD management. Third, the database can be used to analyse guideline-based quality indicators, compliance rates, and the heterogeneity of medical quality across institutions. By iterating the above processes, the J-CKD-DB can support the development of evidence-based strategies, informing future clinical guidelines and improving the quality of medical treatment for CKD patients.
In the process of establishing the J-CKD-DB, we noticed a few issues to be addressed. Although laboratory and prescription data have generally been described according to the standardized code system, SS-MIX2, throughout Japan, we found variations in the coding systems across the participating hospitals 11. We found it challenging to convert the local codes for medical care and tests to the standardized codes by referencing mapping tables. One solution is to take these local variations into account, which will also contribute to the construction of additional clinical effect databases in the future.
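The mapping-table conversion described above can be pictured with a minimal Python sketch. Everything in it is hypothetical: the hospital codes, local codes, and the placeholder strings standing in for standardized codes are invented for illustration, and the actual J-CKD-DB mapping tables are not reproduced here. The point of the sketch is that records whose local code has no entry in the table are set aside for review rather than silently dropped.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping table: (hospital code, local test code) -> standardized code.
# All values are placeholders, not real standardized codes.
MAPPING_TABLE: Dict[Tuple[str, str], str] = {
    ("0001", "CRE"): "STD-CREATININE",
    ("0002", "S-CREA"): "STD-CREATININE",
    ("0001", "U-PRO"): "STD-URINE-PROTEIN",
}


def map_local_codes(
    records: List[dict],
    mapping: Dict[Tuple[str, str], str],
) -> Tuple[List[dict], List[dict]]:
    """Split records into those with a standardized code and those left unmapped."""
    mapped, unmapped = [], []
    for record in records:
        key = (record["hospital"], record["local_code"])
        standard_code = mapping.get(key)
        if standard_code is None:
            unmapped.append(record)  # local variation not yet covered by the table
        else:
            mapped.append({**record, "standard_code": standard_code})
    return mapped, unmapped


records = [
    {"hospital": "0001", "local_code": "CRE", "value": 1.1},
    {"hospital": "0002", "local_code": "S-CREA", "value": 0.9},
    {"hospital": "0002", "local_code": "CRE2", "value": 1.4},
]
mapped, unmapped = map_local_codes(records, MAPPING_TABLE)
```

Keying the table on the hospital as well as the local code is one way of accommodating the same local code carrying different meanings at different institutions.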
The present study demonstrated the clinical features of 39,121 CKD outpatients in university hospitals in Japan. As anticipated, the prevalence of proteinuria increased as GFR decreased. Male patients in the J-CKD-DB tended to have higher levels of proteinuria than female patients. Accordingly, the prevalence of very high-risk CKD (according to the KDIGO risk classification) was higher in male than in female patients across all age strata. When we looked at the proportion of CKD risk stages within each age category, the highest proportion of high-risk (orange zone) CKD was seen among younger patients aged 18-44 years, with a high proportion of CKD stages G5 and A3 in both sexes. In adolescent and young adult patients with childhood-onset CKD, progressive CKD due to various diseases, including childhood-onset nephrotic syndrome, chronic glomerulonephritis such as IgA nephropathy, and congenital anomalies of the kidney and urinary tract (CAKUT), persists after patients become adults 18. In particular, the median age at which CAKUT progresses to ESKD was reported to be around 35 years 19. Based on these reports, younger patients with advanced CKD are more likely to be referred to and managed in university hospitals. Overall, the prevalence of proteinuria was higher in more severe eGFR stages, although only 48.7% of CKD patients in the J-CKD-DB had data on proteinuria. In future work, we will present the results of the cross-sectional analyses in more detail and will enrol more patients to conduct a longitudinal follow-up study of patients in the J-CKD-DB as a J-CKD-DB extension (J-CKD-DB-Ex).
Weaknesses of the J-CKD-DB are as follows. First, EHR data may provide less information than observational epidemiological studies on some items, e.g. the cause of CKD, body mass index, blood pressure levels, social status, patient-reported experience measures, and the incidence of CVDs, because these elements are not included in SS-MIX2. Hence, some research questions related to these variables could not be answered. Second, eGFR values were based on a single measurement of serum creatinine, which does not meet the chronicity criterion and is prone to misclassification, particularly in CKD G3a without proteinuria. However, this approach has often been used in CKD research 20,21.
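To make the chronicity limitation concrete, the following minimal Python sketch shows the kind of check that repeated measurements would allow and that a single serum creatinine value cannot support. It is not part of the present analysis. The function name, the 90-day interval used as an approximation of "more than 3 months", and the simplification of comparing only the earliest and latest sub-threshold values are all assumptions.

```python
from datetime import date
from typing import List, Tuple


def meets_chronicity(
    measurements: List[Tuple[date, float]],
    threshold: float = 60.0,
    min_days: int = 90,
) -> bool:
    """Return True if two eGFR values below `threshold` lie at least `min_days` apart.

    Simplification: only the earliest and latest sub-threshold dates are compared;
    intermediate values at or above the threshold are not examined.
    """
    low_dates = sorted(d for d, egfr in measurements if egfr < threshold)
    if len(low_dates) < 2:
        return False  # a single measurement can never establish chronicity
    return (low_dates[-1] - low_dates[0]).days >= min_days


single = [(date(2014, 1, 10), 55.0)]
repeated = [(date(2014, 1, 10), 55.0), (date(2014, 5, 1), 52.0)]
```

With these inputs, `meets_chronicity(single)` is false whatever the eGFR value, which is the source of the potential misclassification noted above.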
In conclusion, we launched the J-CKD-DB based on advanced EHR systems to automatically extract data and literally eliminate manual input. As the information from every clinical encounter will be updated with an anonymized patient ID, the J-CKD-DB will be used for both cross-sectional and prospective investigations of patients with CKD in Japan to answer important clinical research questions and contribute to improving the quality of CKD care.

Materials and Methods
Study design. This was an observational retrospective study using electronic health records.
Setting and data source. The J-CKD-DB has been designed as a nationwide multicentre EHR-based database of patients with CKD in collaboration with the Japanese Society of Nephrology (JSN) and the Japan Association for Medical Informatics (JAMI). It was initiated in June 2016 as a comprehensive database of clinical effect information of the Ministry of Health, Labour and Welfare (UMIN trial number, UMIN000026272). To ensure the smooth implementation and maintenance of the systems essential for the J-CKD-DB, the participating facilities needed to have EHR systems that incorporated SS-MIX2 storage (Fig. 1) 11. The ethical committee of Kawasaki Medical School and the JSN comprehensively approved the study (JSN no. 28), and the local committees of the participating university hospitals individually approved the study. Because this is a retrospective study and the data analyzed were anonymized, informed consent from participants
In the future, the J-CKD-DB will be linked to other databases. First, linkage to the J-RBR/J-KDR 7,8, which has over 40,000 participants, is planned for late 2020 (Fig. 2). Therefore, to enhance future analysis, physicians at each university hospital have manually flagged special registrations in the database as follows: (1) haemodialysis cases, (2) peritoneal dialysis cases, (3) kidney transplantation cases, (4) cases in which kidney biopsy was performed, and (5) cases registered in the J-RBR.

Statistical analyses.
Values are presented as median with interquartile interval (IQI), mean (SD), or count with percentage, as appropriate. Distributions of the variables were evaluated by histogram, quantile-quantile plot, and the Kolmogorov-Smirnov test. Clinical parameters were compared among the eGFR or proteinuria categories using the Kruskal-Wallis nonparametric test. All data were statistically analyzed using IBM SPSS Advanced Statistical Version 26.0 (SPSS, Chicago, IL, USA), and p < 0.05 was considered to indicate a significant difference.
It is also planned to connect the J-CKD-DB to biological samples and genomic information after informed consent (top layer) and to establish a multi-layered database that will comprise the J-RBR. Numbers are numbers of patients/measurements.