Establishment of a Chinese critical care database from electronic healthcare records in a tertiary care medical center

Jin, Senjun; Chen, Lin; Chen, Kun; Hu, Chaozhou; Hu, Sheng’an; Zhang, Zhongheng

doi:10.1038/s41597-023-01952-3


Data Descriptor
Open access
Published: 23 January 2023


Senjun Jin¹^na1,
Lin Chen²^na1,
Kun Chen²,
Chaozhou Hu¹^na1,
Sheng’an Hu¹^na1 &
…
Zhongheng Zhang ORCID: orcid.org/0000-0002-2336-5323³

Scientific Data volume 10, Article number: 49 (2023)



Abstract

The medical specialty of critical care, or intensive care, provides emergency medical care to patients suffering from life-threatening complications and injuries. The specialty is characterized by the generation of a huge amount of high-granularity data in routine practice. Currently, these data are archived in the hospital information system primarily to support routine clinical practice. However, data scientists have noticed that in-depth mining of such big data may provide insights into the pathophysiology of underlying diseases and into healthcare practices. Several openly accessible critical care databases have been established, and they have generated hundreds of publications in scientific journals. However, such work is still in its infancy in China. China is a large country with a huge patient population, which contributes to the generation of large healthcare databases in hospitals. In this data descriptor article, we report the establishment of an openly accessible critical care database generated from the hospital information system.


Background & Summary

Critically ill patients managed in the intensive care unit (ICU) are usually monitored closely for organ dysfunction and are treated intensively with a variety of supportive modalities^1,2. Vital signs, laboratory tests, and medical treatments are recorded at a higher frequency for these patients than for those treated in the general ward. Such daily intensive management produces a huge amount of information, including medical orders, imaging studies, laboratory findings, and waveform signals. The data generation mechanisms may reflect key factors related to the healthcare system, the pathophysiology of the underlying disease, and patients’ preferences and cultures³. Thus, in-depth mining of such large databases, through risk factor analysis, predictive analytics, and causal inference^4,5,6, can provide more insight into clinical research questions. Translating the knowledge gained from data mining into clinical practice may improve clinical outcomes^7,8.

Most published scientific reports in the current critical care research community do not make their original raw data freely accessible, partly because of confidentiality issues. This unwillingness to share data makes it difficult to reproduce reported results. Furthermore, exploration of such a large database by a single research group could be biased and limited. Thus, strenuous efforts have been made to encourage the scientific community to share raw data, which is also supported by the open data campaign^9,10. Several openly accessible critical care databases have been established, mainly reflecting the healthcare systems of western countries^11,12,13. China is a large country with a huge patient population. For example, the estimated number of incident sepsis cases in China was about 3 million in 2017, accounting for nearly 10% of global incident cases¹⁴. Chinese hospitals also have hospital information systems that are distinct from those of western countries. However, these systems are mainly used for clinical practice and are far less developed for research purposes. Data sharing is still in its infancy in the Chinese critical care community, which significantly impairs the transparency of scientific work and international collaboration. To the best of our knowledge, two critical care databases have been established in China, which focus on pediatric critically ill patients and those with infections^15,16. Here, we report the establishment of a large critical care database comprising high-granularity data generated from the information system of a tertiary care university hospital. Details of the database are reported in this paper to encourage new research through secondary analysis of the database.

Methods

Study setting and population

The study was conducted at Zhejiang Provincial People’s Hospital, Zhejiang, China, from January 2012 to May 2022. All patients admitted to the ICU of the hospital were eligible. The hospital had two ICUs: the comprehensive central ICU and the emergency ICU (EICU). No exclusion criteria were applied when enrolling subjects, because patients excluded by one study might be eligible for another. Thus, we included all records in the information system related to ICU stays. The study was approved by the ethics committee of Zhejiang Provincial People’s Hospital (approval number: QT2022185). Informed consent was waived by the institutional review board because of the retrospective design of the study. The study was conducted in accordance with the Declaration of Helsinki.

Database structure and development

The database is distributed as comma-separated value (CSV) files that can be imported into any relational database system. Each file contains a single table; the tables are explained in the subsequent sections. Each subject is identified by a serial number (patient_SN), a combination of digits and letters such as “3c74cf74c36241b7082ec35e458279dc”. Each hospital stay is denoted by a Hospital_ID, for example “9432117” or “336688072433”. Individual ICU stays can be identified from the HospitalTransfer table, which contains the intrahospital transfer events for each subject. All tables use Hospital_ID to identify an individual hospital stay, and the HospitalTransfer table can be used to determine the ICU stays linked to the same patient and/or hospitalization.

We recommend the R package tidyverse for the management of the relational database because of its capability to streamline the workflow from data management to statistical analysis and to the training of machine learning models¹⁷. For large files, we recommend the data.table package to process the tabular data.
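The linkage described in this section can be illustrated with a minimal sketch. It uses Python's standard library rather than the R packages recommended above, purely for illustration. Only patient_SN, Hospital_ID, and the table name HospitalTransfer come from the text; the columns seq and department, the unit labels, and the sample rows are hypothetical and should be replaced by the actual column names in the released files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical miniature tables. Only patient_SN, Hospital_ID and the table
# name HospitalTransfer come from the text; every other column is assumed.
ADMISSIONS_CSV = """patient_SN,Hospital_ID
3c74cf74c36241b7082ec35e458279dc,9432117
3c74cf74c36241b7082ec35e458279dc,336688072433
"""

TRANSFER_CSV = """Hospital_ID,seq,department
9432117,1,Emergency
9432117,2,ICU
9432117,3,General ward
336688072433,1,EICU
336688072433,2,General ward
336688072433,3,ICU
"""

ICU_UNITS = {"ICU", "EICU"}  # assumed labels for the two units


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def icu_stays_by_patient(admissions, transfers):
    """Map each patient_SN to its ICU stays as (Hospital_ID, seq, unit)."""
    patient_of = {row["Hospital_ID"]: row["patient_SN"] for row in admissions}
    stays = defaultdict(list)
    for row in transfers:
        if row["department"] in ICU_UNITS:
            patient = patient_of[row["Hospital_ID"]]
            stays[patient].append(
                (row["Hospital_ID"], int(row["seq"]), row["department"])
            )
    return dict(stays)


stays = icu_stays_by_patient(read_rows(ADMISSIONS_CSV), read_rows(TRANSFER_CSV))
for patient, icu_list in stays.items():
    print(patient, icu_list)
```

The csv module returns every field as a string, which suits identifiers such as Hospital_ID: values like “336688072433” are labels rather than quantities, and keeping them as text avoids accidental numeric conversion when joining tables.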

Deidentification

All tables are deidentified according to the Health Insurance Portability and Accountability Act (HIPAA). All protected information is removed, including addresses, date of birth, date of hospital admission, date of discharge, date of medical order, personal identifiers (e.g., name, phone number, social security number, and hospital number), and exact age on admission (age is discretized into bins). When creating the dataset, patients were randomly assigned unique identifiers (patient_SN and Hospital_ID), and the original hospital identifiers were not retained. As a result, the identifiers in the database cannot be linked back to the original, identifiable data. All doctor, nurse, and pharmacist identifiers have also been removed to protect the privacy of contributing providers.
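Two of the steps named above, discretizing age into bins and replacing original identifiers with random ones, can be sketched as follows. This is an illustration under stated assumptions and not the authors' pipeline: the 10-year bin width, the use of uuid4 to generate 32-character hexadecimal identifiers, and the field names of the raw records are all hypothetical.

```python
import uuid

AGE_BIN_WIDTH = 10  # assumed bin width; the text does not state it


def age_bin(age, width=AGE_BIN_WIDTH):
    """Return a label such as '60-69' for the bin containing the age."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def build_id_map(original_ids):
    """Assign each original identifier a random 32-character hex surrogate."""
    return {original: uuid.uuid4().hex for original in original_ids}


def deidentify(record, id_map):
    """Keep only the surrogate ID and the binned age from a raw record."""
    return {
        "patient_SN": id_map[record["hospital_number"]],
        "age_bin": age_bin(record["age"]),
    }


# Hypothetical raw records; the field names are assumptions.
raw = [
    {"hospital_number": "A001", "name": "example", "age": 67},
    {"hospital_number": "A002", "name": "example", "age": 34},
]
id_map = build_id_map(r["hospital_number"] for r in raw)
released = [deidentify(r, id_map) for r in raw]
del id_map  # discarding the mapping prevents re-linking to the originals
print(released)
```

Because the surrogate identifiers are random rather than derived from the originals (for example by hashing), discarding the mapping is what makes re-linking impossible in this sketch.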

Data Records

The database comprises 8180 unique hospital admissions for 7638 individual patients from January 2012 to May 2022 and is available at the PhysioNet repository¹⁸. Table 1 shows the baseline demographics of hospital admissions. Of the 8180 admissions, 2965 were for female patients and 5215 were for male patients. The median length of hospital stay was 17 days (Q1 to Q3: 10 to 28). Male patients had slightly longer hospital stays.

Table 1 Demographics and discharge status of the 8180 hospital admissions in the database.
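A length-of-stay summary reported with Q1 and Q3, as in the text above, can be reproduced per group with a few lines of code. The sketch below uses Python's statistics module on synthetic values that are not taken from the database; the choice of the inclusive quartile method is an assumption, since the text does not say how the quartiles were computed.

```python
import statistics


def median_iqr(values):
    """Return (median, Q1, Q3) using the inclusive quartile method."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q1, q3


# Synthetic lengths of stay in days, NOT taken from the database.
los_by_sex = {
    "female": [5, 9, 12, 16, 20, 27, 40],
    "male": [6, 11, 14, 18, 23, 30, 55],
}

for sex, los in los_by_sex.items():
    med, q1, q3 = median_iqr(los)
    print(f"{sex}: median {med} days (Q1 to Q3: {q1} to {q3})")

# The sex counts reported in the text add up to the number of admissions.
assert 2965 + 5215 == 8180
```

Different quartile conventions can give slightly different Q1 and Q3 values on the same data, so results computed from the released files may not match Table 1 exactly.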
