Heterogeneous graph construction and HinSAGE learning from electronic medical records

Graph representation learning offers a way to construct and learn patient embeddings effectively from electronic medical records. Integrating it into network models can support and advance previous methods for predicting patient prognosis. This study addresses the challenge of working with a complex and highly heterogeneous dataset by (1) demonstrating how to build a multi-attributed and multi-relational graph model and (2) applying the HinSAGE algorithm to a downstream disease prediction task for a patient’s prognosis. We present a bipartite graph schema and the construction of a graph database in detail. The first constructed graph database illustrates a query of a predictive network that provides analytical insights through a graph representation of a patient’s journey. We then demonstrate an alternative bipartite model, to which we apply HinSAGE to perform link prediction of event occurrence. The performance evaluation indicated that our heterogeneous graph model predicted successfully and can serve as a baseline model. Overall, our graph database demonstrated efficient real-time query performance, and the HinSAGE implementation predicted cardiovascular disease event outcomes through supervised link prediction.


Supplementary Data 2. Graph Schema Construction.
The schema was built before the datasets were imported into the database. The graph schema reflects how the entities are related to each other, so even a slight modification to its layout produces different outcomes.
While comparing different schemas, we found that a linear graph cannot create efficient relationships with other nodes; thus, we used a circular structure so that a node can efficiently interact with more than two other nodes. Next, directed edges were selected over undirected edges to describe each individual relationship connecting specific nodes. We then chose a bipartite structure rather than a fully connected schema in order to provide a patient-centric graph: a fully connected network is costly to implement, and query time increases as the number of connections between nodes grows. Lastly, the graph needed to contain feature information, with each patient's detailed medical records and the outcome values attached, to perform the prediction; therefore, an attributed and weighted graph schema was built.
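To make the cost argument concrete, the following minimal sketch compares directed edge counts for a fully connected layout and a patient-centric layout in which each non-patient node has a single edge to the patient. The function names and the ten-node example are illustrative assumptions and are not taken from the study's implementation.

```python
def fully_connected_edges(n_nodes: int) -> int:
    """Directed edges when every node links to every other node."""
    return n_nodes * (n_nodes - 1)


def patient_centric_edges(n_nodes: int) -> int:
    """Directed edges when each non-patient node links only to the patient."""
    return max(n_nodes - 1, 0)


if __name__ == "__main__":
    # Hypothetical example: one patient node plus nine attached entity nodes.
    n = 10
    print(fully_connected_edges(n), patient_centric_edges(n))
```

Under these assumptions the fully connected count grows quadratically with the number of nodes, whereas the patient-centric count grows linearly.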
Thus, significant factors that contributed to the event outcome were used in the prediction stage. The event node, which represents the subject of the study, was created as a separate node so that query output could focus on the properties attached to the event entity. This allowed the association between the outcome and other predictive properties to be evaluated. The detailed steps included deleting and merging entities, creating specific edges between the entities, aligning the node properties to edge properties, and simplifying the data structure.
Our graph model was constructed in a patient-centric manner, with each patient at the center of their own graph. Hence, each patient had a unique graph containing the complete medical records of a visit. In addition, the network was multi-attributed, meaning that each entity type has its own set of attributes or features. Further, it was multi-relational: a heterogeneous set of edges, each with its own edge label, forms the interactions within the network. The graph was also bipartite, meaning that its vertices can be divided into two disjoint groups, so the model is not fully connected.
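The properties described above (patient-centric, multi-attributed, multi-relational, bipartite) can be sketched with a small in-memory structure. This is only an illustration in plain Python, not the graph database used in the study; the class, the edge labels such as HAS_VISIT, and the attribute values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Tiny attributed, multi-relational directed graph (illustrative only)."""
    nodes: dict = field(default_factory=dict)  # node_id -> {"type": ..., attributes}
    edges: list = field(default_factory=list)  # (source, target, edge_label)

    def add_node(self, node_id, node_type, **attrs):
        self.nodes[node_id] = {"type": node_type, **attrs}

    def add_edge(self, source, target, label):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append((source, target, label))

    def is_bipartite_on(self, center_type):
        """True if every edge joins a center-type node to a non-center node."""
        for source, target, _ in self.edges:
            source_is_center = self.nodes[source]["type"] == center_type
            target_is_center = self.nodes[target]["type"] == center_type
            if source_is_center == target_is_center:
                return False
        return True


def build_patient_graph(patient_id, records):
    """Place the patient at the center and attach one node per record."""
    graph = HeteroGraph()
    graph.add_node(patient_id, "Person")
    for index, (node_type, edge_label, attrs) in enumerate(records):
        node_id = f"{patient_id}:{node_type}:{index}"
        graph.add_node(node_id, node_type, **attrs)
        graph.add_edge(patient_id, node_id, edge_label)
    return graph


if __name__ == "__main__":
    # Hypothetical records: (entity type, edge label, attributes).
    example_records = [
        ("Visit", "HAS_VISIT", {"age": 63}),
        ("Smoke", "HAS_SMOKE", {"level": 2}),
        ("Event", "HAS_EVENT", {"value": 0}),
    ]
    patient_graph = build_patient_graph("patient-1", example_records)
    print(len(patient_graph.nodes), patient_graph.is_bipartite_on("Person"))
```

Because every edge runs from the Person node to a node of another type, the vertex set splits into two disjoint groups, which is the bipartite property the text relies on.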

Supplementary Data 3. Entity and Attribute Selection.
Event: The event entity has two values identifying the occurrence of the outcome: 0 indicates no event, and 1 indicates the occurrence of an event.
Person: The person entity is at the core of each patient's graph model, and all other entity types are linked to it. The person entity solely represents the unique patient ID.
Echocardiography: Echocardiography results provide several items of information for the measurement of the heart's pumping strength. The criteria for the measurements are right information rhythm, systolic left ventricular internal diameter, diastolic left ventricular internal diameter, systolic left ventricular wall stress, diastolic left ventricular wall stress, systolic left ventricular summit, diastolic left ventricular summit, left atrium size, aorta diameter, left systolic volume, left diastolic volume, left ventricular blood velocity1, left ventricular blood velocity2, blood velocity ratio, velocity-time, mitral regurgitation grade, aorta velocity, tricuspid valve grade, tricuspid velocity, tricuspid reflux max velocity, pressure difference, plax and psax (11-46), echo window, left ventricular mass index, septal velocity1, septal velocity2, left ventricle ratio, wall motion index, systolic blood pressure, diastolic blood pressure, systolic septal velocity, left ventricular ejection fraction, left ventricular mass.
Surgery: The Surgery entity consists of the surgical procedure code provided at each patient visit.
Smoke: The smoke data are designated as 1, 2, or 3 depending on a patient's smoking status. Here, 1 represents smoking, 2 represents never smoked, and 3 represents having stopped smoking.
Visit: Each patient visit was recorded together with age, duration in days, and account number.
Physical: The physical measurement entity contains gender, height, weight, systolic and diastolic blood pressure, pulse rate, and respiration rate.
Disease: Patients are admitted with an ICD-10 code, which is the designated diagnosis code. The disease entity displays the diagnosis of each patient.
Medication: The generic name and the ingredients of each drug prescribed are provided.
Laboratory: The laboratory exam name and exam results are provided.

Fig. 2. An example of a patient's medical journey as a graph database. The visit node in green represents the age, the smoke node in red represents the level of smoking, the surgery node in gray represents the surgery code that reflects the patient's number, the event node in navy represents the event occurrence, the physical node in purple represents weight, the echocardiography node in pink represents the tricuspid valve regurgitation grade, the diagnosis node in yellow represents the diagnosis name, and the medication node in orange represents the medication name.
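The categorical codes given in the entity list for the Event and Smoke entities can be written down as enumerations so that raw values are validated before they are attached to nodes. This is a minimal sketch; the member names and the decode helper are hypothetical and only mirror the code values stated in the list.

```python
from enum import IntEnum


class Event(IntEnum):
    NO_EVENT = 0
    EVENT = 1


class Smoke(IntEnum):
    SMOKING = 1
    NEVER_SMOKED = 2
    STOPPED_SMOKING = 3


def decode(enum_cls, raw):
    """Convert a raw code to its enum member; raises ValueError if unknown."""
    return enum_cls(int(raw))


if __name__ == "__main__":
    print(decode(Smoke, "2").name, decode(Event, 1).name)
```

A code outside the documented range, such as a smoke value of 4, raises ValueError rather than being stored silently.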