Electronic healthcare records and external outcome data for hospitalized patients with heart failure

Heart failure is one of the most important reasons for hospitalization among elderly individuals and is associated with significant mortality and morbidity. Epidemiological studies require the establishment of high-quality databases. Several datasets that primarily involve heart failure populations have been established in Western countries and have generated many high-quality studies. However, no such dataset is available from China. Due to differences in genetic background and healthcare systems between China and Western countries, the establishment of a heart failure database for the Chinese population is urgently needed. We performed a retrospective single-center observational study to collect data regarding the characteristics of heart failure patients in China by integrating electronic healthcare records and follow-up outcome data. The study collected information for a total of 2,008 patients with heart failure, containing 166 attributes.

specifically HF patients, and a large amount of data routinely collected during clinical practice were discarded. To the best of our knowledge, this is the largest HF dataset in the world, including 337 cardiology centres from 33 ESC Member countries 12. In practice, many individually minor attributes may act together to influence the clinical outcome, so a dataset covering all aspects of individual patient-level data can help disentangle complex relationships among attributes. In the era of big data, electronic healthcare records can generate a large amount of data for a given HF patient. Not all of these multiparameter relational data will be relevant to a given research question, and different studies and analyses require different variables. Making such a dataset publicly available can encourage data reuse and thereby promote medical knowledge discovery.
Our study aimed to establish an HF database based on electronic healthcare records. This retrospective study enrolled hospitalized patients with heart failure at Zigong Fourth People's Hospital from December 2016 to June 2019, and data were extracted from electronic healthcare records. Data on subsequent hospital admissions and mortality were obtained at mandatory follow-up visits at 28 days, 3 months and 6 months (if the patient was unable to reach the clinical centre, the follow-up visit was replaced by a telephone call). However, this is a single-centre dataset covering only Chinese patients, so findings based on these data alone may not generalize to other populations. Researchers may combine this dataset with other heart failure cohort data for larger-scale studies.

Methods
Study setting and population. The study was conducted at Zigong Fourth People's Hospital, Sichuan, China from December 2016 to June 2019, and was approved by the ethics committee of Zigong Fourth People's Hospital (Approval Number: 2020-010). Informed consent was waived due to the retrospective design of the study. The study complies with the Declaration of Helsinki.
Electronic healthcare records of consecutive patients with a diagnosis of HF were reviewed. We included all types of heart failure, including acute HF, chronic HF, left HF, right HF, or a combination of these. Heart failure was defined according to the European Society of Cardiology (ESC) criteria 13: 1) The presence of symptoms and/or signs of HF. Typical symptoms include breathlessness, orthopnoea, paroxysmal nocturnal dyspnoea, reduced exercise tolerance, fatigue, tiredness, increased time to recover after exercise and ankle swelling. Typical signs include elevated jugular venous pressure, hepatojugular reflux, third heart sound (gallop rhythm) and laterally displaced apical impulse. 2) Elevated levels of natriuretic peptides (BNP >35 pg/mL and/or NT-proBNP >125 pg/mL). 3) Objective evidence of other cardiac functional and structural alterations underlying HF. 4) In case of uncertainty, a stress test or invasively measured elevated LV filling pressure may be needed to confirm the diagnosis.
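The natriuretic peptide criterion (item 2) reduces to a simple threshold check. The sketch below is illustrative only: the function and constant names are hypothetical, it covers just this one criterion, and the study diagnosed HF clinically against all four criteria rather than with code.

```python
from typing import Optional

# Thresholds quoted in the ESC criterion above, in pg/mL.
BNP_THRESHOLD = 35.0
NT_PROBNP_THRESHOLD = 125.0


def natriuretic_peptide_elevated(
    bnp: Optional[float], nt_probnp: Optional[float]
) -> Optional[bool]:
    """Hypothetical helper for ESC criterion 2 only.

    Returns True if any measured peptide exceeds its threshold,
    False if every measured peptide is at or below its threshold,
    and None if neither peptide was measured (criterion not assessable).
    """
    pairs = [(bnp, BNP_THRESHOLD), (nt_probnp, NT_PROBNP_THRESHOLD)]
    measured = [(value, limit) for value, limit in pairs if value is not None]
    if not measured:
        return None
    # "and/or" in the criterion: one elevated peptide is sufficient.
    return any(value > limit for value, limit in measured)
```

Returning None for "not measured" keeps missing laboratory values distinct from normal ones, which matters in a dataset where blanks denote missing data.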
Patients who had a diagnosis of heart failure on hospital admission were enrolled in our study. The diagnosis was recorded using ICD-9 codes in the EHR (Table 1).

Variables and attributes.
Data collected for the dataset fell into six broad categories: demographic data, baseline clinical characteristics, comorbidities, laboratory findings, drugs and outcomes. Demographic data were entered manually into the EMR system by nurses on admission if the patient was visiting our hospital for the first time; otherwise, demographic data could be extracted automatically from previous visits. Missing or erroneous entries identified by the nurses were checked. To ensure the accuracy and consistency of data entry, drop-down lists were used for some variables in our EMR system, such as sex, department of admission and occupation. Laboratory tests and drugs were entered electronically by physicians and/or laboratory staff. Data in the EMR were extracted by SQL query to establish the current database, and the accuracy of the SQL query was checked manually on 50 randomly selected patients. The largest challenge was the language barrier: all laboratory test items, examinations, drug names and diagnoses were recorded in Chinese in the electronic healthcare record database. To address this problem, all Chinese terms were translated into English by the principal investigators (Z.Z., P.X. and L.C.).
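The manual check of the SQL extraction on 50 randomly selected patients can be made reproducible by drawing the sample with a fixed seed. The following is a minimal sketch under stated assumptions: the function name and the seed value are hypothetical, and the source does not describe how the 50 patients were actually drawn.

```python
import random
from typing import Iterable, List


def sample_for_manual_check(
    patient_ids: Iterable[int], n: int = 50, seed: int = 2020
) -> List[int]:
    """Hypothetical helper: draw n distinct patient IDs reproducibly.

    The sampled patients' extracted rows would then be compared by hand
    against their source EMR records. The seed value is an assumption.
    """
    # Deduplicate and sort so the same seed always yields the same sample,
    # regardless of the order in which IDs were supplied.
    ids = sorted(set(patient_ids))
    if n > len(ids):
        raise ValueError("sample size exceeds the number of patients")
    rng = random.Random(seed)  # private generator; global state untouched
    return rng.sample(ids, n)
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the audit sample stable even if other code also draws random numbers.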
The demographic data were obtained from the first sheet of the medical records and included age, sex, height, body weight, admission ward, type of admission (emergency vs. nonemergency), occupation, discharge department, admission date, visit times, and marital status.
Baseline clinical characteristics were measured on the day of hospital admission and included body temperature, pulse, respiration rate, systolic blood pressure, diastolic blood pressure, mean arterial blood pressure, weight, height, body mass index (BMI), type of heart failure, New York Heart Association (NYHA) cardiac function class, Killip grade (grade 1, no rales and no third heart sound; grade 2, rales in less than half of the lung fields or presence of a third heart sound; grade 3, rales in more than half of the lung fields, i.e. pulmonary oedema; grade 4, cardiogenic shock, determined clinically), and Glasgow Coma Scale (GCS) score. Echocardiographic findings included left ventricular ejection fraction (LVEF), left ventricular end-diastolic diameter, mitral valve peak E wave velocity (m/s), mitral valve peak A wave velocity (m/s), E/A ratio, tricuspid valve regurgitation velocity, and tricuspid valve regurgitation pressure.
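Two of the baseline attributes, BMI and mean arterial pressure, are usually derived from other measurements. The source does not state how these columns were computed, so the sketch below is an assumption: it uses the standard BMI definition (weight in kg divided by height in metres squared) and the common bedside approximation for MAP (diastolic pressure plus one third of the pulse pressure). The function names are hypothetical, and heights recorded in centimetres would need converting to metres first.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI = weight (kg) / height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def mean_arterial_pressure(sbp_mmhg: float, dbp_mmhg: float) -> float:
    """Approximate MAP as DBP + (SBP - DBP) / 3, in mmHg.

    This is a common bedside formula, assumed here; the dataset's
    values may instead come from monitor measurements.
    """
    return dbp_mmhg + (sbp_mmhg - dbp_mmhg) / 3
```

For example, a patient weighing 70 kg at 1.75 m has a BMI of about 22.9 kg/m², and a blood pressure of 120/80 mmHg gives an approximate MAP of about 93.3 mmHg.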
Comorbidities included a medical history of myocardial infarction, congestive heart failure, peripheral vascular disease, cerebrovascular disease, dementia, chronic obstructive pulmonary disease (COPD), connective tissue disease, peptic ulcer disease, diabetes, moderate-to-severe chronic kidney disease, hemiplegia, leukaemia,
Outcome variables included the discharge date of the index hospitalization, vital status at hospital discharge, death within 28 days, readmission within 28 days, death within 3 months, readmission within 3 months, death within 6 months, readmission within 6 months, time to death (days from index hospital admission), time to readmission (days from index hospital admission), return to the emergency department within 6 months, and time to emergency department visit within 6 months. The variable "DestinationDischarge" was recorded after hospital discharge, whereas the variable "outcome.during.hospitalization" was recorded after the decision to discharge was made.
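The within-window outcome flags relate directly to the time-to-event variables. The sketch below shows that relationship under assumptions the source does not state: "3 months" and "6 months" are taken as 90 and 180 days, windows are counted from index hospital admission, and a missing time means no event was recorded. The function name is hypothetical. Note that this simple mapping cannot distinguish an event-free patient from one lost to follow-up.

```python
from typing import Dict, Optional

# Assumed window lengths in days; the 90- and 180-day values approximate
# "3 months" and "6 months" and are not specified by the source.
WINDOWS: Dict[str, int] = {"28d": 28, "3m": 90, "6m": 180}


def event_within_windows(days_to_event: Optional[int]) -> Dict[str, bool]:
    """Hypothetical helper: map days from index admission to an event
    (death or readmission) onto binary within-window flags.

    None means no event was recorded, so every flag is False.
    """
    return {
        name: days_to_event is not None and days_to_event <= limit
        for name, limit in WINDOWS.items()
    }
```

For instance, a readmission on day 45 gives False for the 28-day window and True for the 3-month and 6-month windows; the flags are nested, so an event inside a shorter window is always inside the longer ones.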

Data Records
The study generated a single dataset containing 166 attributes for 2,008 patients hospitalized between December 2016 and June 2019. The dataset is available at PhysioNet (https://doi.org/10.13026/8a9e-w734) 15. Missing values are indicated by blanks. Detailed variable specifications are provided in a variable description file.
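Because missing values are stored as blanks, a reader should convert them to an explicit missing marker rather than treating them as empty strings. The following sketch uses only the Python standard library and assumes a comma-separated file with a header row; the file name and column names in the example are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional


def read_records(lines: Iterable[str]) -> List[Dict[str, Optional[str]]]:
    """Parse CSV rows into dicts, turning blank fields into None.

    `lines` can be an open file or any iterable of text lines.
    Short rows get None for absent columns (DictReader's default restval),
    and those are kept as None as well.
    """
    reader = csv.DictReader(lines)
    return [
        {key: (value if value and value.strip() else None) for key, value in row.items()}
        for row in reader
    ]


if __name__ == "__main__":
    # With the real download one would use something like
    # open("hypothetical_file.csv", newline="", encoding="utf-8").
    demo = io.StringIO("id,LVEF,BNP\n1,45,\n2,,830\n")
    for record in read_records(demo):
        print(record)
```

Values are left as strings; converting each column to a number or date is best done afterwards using the types given in the variable description file.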

Technical Validation
The present study had a retrospective design. Information on eligible patients was collected at Zigong Fourth People's Hospital. First, the required data were exported from the electronic healthcare database with the assistance of an information technology technician. The exported data were then checked by expert emergency and critical care physicians; if outliers in any variable or contradictions within the data were detected, the data were validated by another investigator. Outliers and contradictions were judged by expert emergency and critical care physicians. Data on subsequent hospital admissions and mortality were obtained at mandatory follow-up visits at 28 days, 3 months and 6 months (if the patient was unable to reach the clinical centre, the follow-up visit was replaced by a telephone call).
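Outliers and contradictions were judged by physicians, not by an automated rule, but a simple pre-screen can decide which records to send for review. The sketch below is an assumption throughout: the variable names and plausibility ranges are hypothetical illustrations rather than the study's criteria, and the only contradiction checked is a discharge date earlier than the admission date.

```python
from datetime import date
from typing import Any, Dict, List, Tuple

# Illustrative plausibility ranges (hypothetical names and limits).
PLAUSIBLE_RANGES: Dict[str, Tuple[float, float]] = {
    "lvef": (0.0, 100.0),               # percent
    "body_temperature": (30.0, 45.0),   # degrees Celsius
    "systolic_bp": (40.0, 300.0),       # mmHg
}


def flag_for_review(record: Dict[str, Any]) -> List[str]:
    """Return names of variables outside their plausible range.

    Missing values (absent or None) are not flagged by this check.
    """
    flagged = []
    for name, (low, high) in PLAUSIBLE_RANGES.items():
        value = record.get(name)
        if value is not None and not (low <= value <= high):
            flagged.append(name)
    return flagged


def find_contradictions(record: Dict[str, Any]) -> List[str]:
    """Return descriptions of internally inconsistent fields."""
    issues = []
    admitted = record.get("admission_date")
    discharged = record.get("discharge_date")
    if admitted is not None and discharged is not None and discharged < admitted:
        issues.append("discharge before admission")
    return issues
```

Records returned by either function would go to a physician for judgement; an empty result only means the pre-screen found nothing, not that the record is verified.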
Data were finalized and fully anonymized on June 8, 2020.