Coronary heart disease prediction method fusing domain-adaptive transfer learning with graph convolutional networks (GCN)

Graph convolutional networks (GCNs) have achieved impressive results in many medical scenarios involving graph node classification tasks. However, transfer learning remains difficult for graph representation learning and graph network models: most GNNs work only in a single domain and cannot transfer the learned knowledge to other domains. Coronary heart disease (CHD) is a high-mortality disease, and current CHD research datasets are non-public and differ significantly from one another, which makes unified transfer learning difficult. Therefore, in this paper, we propose a novel adversarial domain-adaptive multichannel graph convolutional network (DAMGCN) that can perform graph transfer learning on cross-domain tasks to achieve cross-domain medical knowledge transfer between different CHD datasets. First, we use a two-channel GCN model for feature aggregation using local consistency and global consistency. Then, a uniform node representation is generated for different graphs using an attention mechanism. Finally, we provide a domain adversarial module to decrease the discrepancies between the source and target domain classifiers and optimize three loss functions in order to accomplish source and target domain knowledge transfer. The experimental findings demonstrate that our model performs best on three CHD datasets, and its performance is greatly enhanced by graph transfer learning.

We re-extracted two datasets related to all-cause mortality from the original dataset. We constructed graphs for both datasets based on the KNN approach 22. The details of the experimental datasets are shown in Table 1. All-cause death, Heart level, and Mace occurrence are CHD datasets derived from three different branches under the Cardiovascular Division. For each dataset, we extracted a subset and a number of important features relevant to the source task; the segmentation targets were ultimately highly correlated with patient deaths. In our experiments, we treat the graphs as undirected networks, with each edge representing a similarity relationship between patients. We divided each dataset into two categories based on patient outcomes: "Death (yes)" versus "Death (no)", "Cardiac function (high risk)" versus "Cardiac function (low risk)", and "Mace (occurred)" versus "Mace (did not occur)". We investigated six transfer learning tasks: H → D, M → D, D → H, M → H, D → M, and H → M, where D, H, and M denote all-cause death, heart level, and Mace occurrence, respectively. The KNN-based graph construction is sketched below.
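As a concrete illustration of the graph construction step, the following sketch builds a symmetric KNN patient-similarity graph with scikit-learn. The feature matrix X and the neighbor count k are placeholders; the paper's exact preprocessing and choice of k are not specified in this excerpt.

```python
# A minimal sketch of KNN-based graph construction for patient data.
# X (n_patients x n_features) and k are illustrative assumptions.
import numpy as np
from sklearn.neighbors import kneighbors_graph

def build_knn_graph(X: np.ndarray, k: int = 10) -> np.ndarray:
    """Return a symmetric 0/1 adjacency matrix connecting each patient
    to its k nearest neighbors in feature space."""
    A = kneighbors_graph(X, n_neighbors=k, mode="connectivity").toarray()
    A = np.maximum(A, A.T)  # symmetrize: treat edges as undirected
    np.fill_diagonal(A, 0)  # no self-loops at this stage
    return A

# Example: 500 patients with 30 clinical features
X = np.random.rand(500, 30)
A = build_knn_graph(X, k=10)
```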
Baseline. To make a fair comparison, we evaluate the model proposed in this paper against the following four baseline methods. Specifically, we compare our approach with advanced single-domain node classification models to verify the superiority of the cross-domain model. We then make the necessary comparisons between variants of our model to analyze the role of each component.
1. DNN: a multilayer perceptron (MLP) that uses only node features.
2. GCN: GCN 16 is a widely used deep convolutional network for graph-structured data that incorporates network topology and node attributes into a unified learning framework. By default, we use KNN as the graph construction strategy (denoted as k-GCN).
3. p-GCN: a variant of GCN whose graph structure takes the form of a population graph 18.
4. AM-GCN: the best single-domain graph convolution model studied in 18, characterized by adaptive dual-channel graph convolution layers and attention layers that aggregate node representations to accommodate node classification for different tasks.
5. DAMGCN: the model proposed in this paper, a graph convolutional model capable of cross-domain node classification.
Experimental parameters. The experiments use Python 3.8.11 and Scikit-learn 0.24.2 under the PyTorch 1.6.0 framework. Table 2 gives the experimental parameter settings. Our experimental parameters follow a uniform standard. We employ the Adam optimizer for training and a fixed learning rate of 3e-3 for all deep model approaches. We use all labeled source data as well as target data split into training (60%) and testing (40%) sets. We apply the same parameter configuration to all cross-domain node classification tasks. For GCN, AM-GCN, and DAMGCN, the GCNs of both the source and target networks contain two hidden layers with a 128-16 structure. The dropout rate of each GCN layer is set to 0.5. The DNN has similar parameter settings to the GCN. Because the datasets are imbalanced, we also use MCC and balanced accuracy metrics to reflect performance 24. True positive (TP) and true negative (TN) values indicate correctly classified malignant and benign cases, respectively; false positive (FP) and false negative (FN) values indicate incorrectly classified benign and malignant cases, respectively. In our experiments, we use macro-averaging to calculate the Precision, Recall, and F1 score of the model, as sketched below.
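The following sketch shows how the reported metrics (MCC, balanced accuracy, and macro-averaged Precision, Recall, and F1) can be computed with scikit-learn; y_true and y_pred are placeholders for a task's labels and predictions.

```python
# A sketch of the evaluation metrics described above, using scikit-learn.
from sklearn.metrics import (matthews_corrcoef, balanced_accuracy_score,
                             precision_recall_fscore_support)

def evaluate(y_true, y_pred):
    mcc = matthews_corrcoef(y_true, y_pred)
    bacc = balanced_accuracy_score(y_true, y_pred)
    # Macro-averaging gives equal weight to each class, which matters
    # for the imbalanced CHD datasets.
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"MCC": mcc, "BalancedAcc": bacc,
            "Precision": prec, "Recall": rec, "F1": f1}
```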

Model comparison results.
We applied the advanced single-domain models to the three CHD datasets, with the results shown in Table 3. The DNN, p-GCN, k-GCN, and AM-GCN results are taken from previous work 18. Their respective advantages and limitations are summarized in Table 4. We focus mainly on the D (all-cause death) risk classification task for detailed analysis. Among these baselines, DNN has the worst performance (44.6% MCC, 72.2% F1 score). This is because typical DNNs take only node attributes into account and do not exploit the network structure to better represent nodes, whereas graph-based approaches (GCNs) perform better than traditional approaches (63.…% MCC); Precision improved by 3.7%, Recall by 6.8%, and F1 score by 5.3%. Thus, the Mace occurrence and all-cause death data were similarly distributed and highly correlated, and both were highly transferable. Table 5 shows the performance of the proposed model on the different cross-domain node classification tasks. The findings lead to the following conclusions: (1) Compared with the single-domain prediction results in Table 3, the cross-domain model proposed in this paper clearly produces better predictions for the specific CHD tasks (all-cause death, cardiac function, and Mace occurrence); overall, every metric improves. (2) Different source domains affect transfer to the target domain differently. For the D dataset, the knowledge provided by H was more helpful for the Recall metric, while M helped the AUC metric. For the H dataset, the knowledge provided by D was more helpful for the AUC and Recall metrics, while M helped Precision and Accuracy. For the M dataset, the knowledge provided by D was more helpful for the F1 score and Recall metrics, while H helped Precision. (3) Because adversarial domain adaptation seeks an optimal solution during training, both continuously reducing the differences between the source and target classifiers and keeping each classifier optimally trained on its respective task, improving the target domain also improves source domain performance.
The results show that by jointly modeling the consistency relationships of each graph together with the domain, source domain, and target domain information in a unified learning framework, the proposed DAMGCN better captures the underlying representation of graph nodes and reduces the distribution gap across domains.
Table 6 summarizes the graph neural network models used in this paper and their variants, where the symbol √ denotes that the algorithm utilizes the corresponding information. A detailed analysis is presented in the next part.

Analysis of variant components.
Since the proposed DAMGCN contains several key components, in this section we compare variants of DAMGCN to demonstrate its superiority in terms of (1) the effect of the target classifier loss; (2) the effect of the global GCN layer module; and (3) the effect of the domain adversarial loss. The details are as follows.
1. DAMGCN¬t. To demonstrate the effectiveness of cross-domain transfer learning, we designed a variant model, DAMGCN¬t, to simulate direct parameter transfer. The only difference from DAMGCN is that DAMGCN¬t removes the target classifier loss from the model, i.e., it does not use information from the target domain, which is the core of cross-domain learning. The results in Table 7 show that without the target classifier loss, the transfer performance for all-cause death decreases by 15.07% (H → D) and 6.94% (M → D), and the remaining cross-domain tasks also degrade substantially. This indicates that transfer achieved by directly reusing model parameters and reducing domain adversarial differences is poor: the target classifier is pushed to gradually approximate the source classifier when, in fact, the target domain information may be very different from the source domain.
2. DAMGCN¬g. This variant of DAMGCN removes the global GCN layer and uses only the local GCN layer. The effectiveness of the global GCN approach is investigated by comparing the two; the key difference between them is the presence or absence of the PMI matrix. From the results, we found that DAMGCN outperformed DAMGCN¬g, with an improvement of 0.83% (H → D) and 0.55% (M → D) on all-cause death transfer. Similarly, accuracy improved to some degree on the heart level and Mace occurrence cross-domain tasks. This confirms that the ability to extract node information using only the local GCN layer is limited, while adding the global GCN layer module containing the PMI matrix uncovers potentially related nodes and combines local and global relations to capture a comprehensive representation of nodes.
3. DAMGCN¬d. To verify the effectiveness of the domain adversarial loss, we compared DAMGCN with DAMGCN¬d. DAMGCN¬d removes the gradient reversal layer of DAMGCN, i.e., the domain classifier. Without the domain adversarial loss, the source and target domains each train their own model and share some parameters, so knowledge transfer is weak. Figure 1 shows the accuracy of the DAMGCN variants on the different cross-domain tasks. DAMGCN is clearly superior, with a transfer improvement of 0.92% (H → D) and 0.55% (M → D) on all-cause death. Similarly, accuracy improved to some degree on the heart level and Mace occurrence cross-domain tasks. On the one hand, this confirms that better node representations can be learned from different domains using the adversarial domain loss. On the other hand, the adversarial domain loss does help to reduce the difference between the two domains during model training, improving training effectiveness and transferability to the target domain.

Visualization.
To demonstrate the effectiveness of our proposed model more intuitively, we performed a visualization task on the three CHD datasets. We used the t-SNE 25 method to map both the original features and the learned embedding vectors from the last layer of each deep model into a two-dimensional space for each task. The results for the datasets in Figs. 2, 3, and 4 are colored by the true labels. Observing each figure, we find that the original feature distribution is chaotic, making it difficult to distinguish positive- from negative-class nodes; although GCN can distinguish the regions of each class, nodes with different labels remain mixed. DAMGCN clearly has the best visualization effect: it learns a more compact embedding structure with the highest intra-class similarity and clear boundaries between different classes. The plotting step is sketched below.
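A minimal sketch of the visualization step, assuming the learned embeddings Z and labels y are available as arrays; the variable names are illustrative, not the authors' plotting code.

```python
# Project last-layer embeddings (or raw features) to 2-D with t-SNE
# and color points by their true labels.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_embedding(Z, y, title):
    Z2 = TSNE(n_components=2, random_state=0).fit_transform(Z)
    plt.scatter(Z2[:, 0], Z2[:, 1], c=y, cmap="coolwarm", s=8)
    plt.title(title)
    plt.show()
```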

Methods
All methods in this study were performed in accordance with the relevant guidelines and regulations of Fuzhou University (FZU). The experimental protocols were approved by the Institutional Review Board (IRB) of Fujian Medical University Union Hospital. Informed consent was obtained from all participants. In this section, we present our domain-adaptive graph convolutional network for cross-domain node classification.

The proposed model (DAMGCN).
Given a source domain G_S = (V_S, E_S, X_S, Y_S) and a target domain G_T = (V_T, E_T, X_T, Y_T), the goal of our model is to obtain node representations for the source domain graph embedding Z_S and the target domain graph embedding Z_T through supervised training. The training process continuously narrows the domain differences, shares the training parameters from the source domain with the target domain, and improves the classification ability on similar tasks in the target domain. First, we use a dual graph convolutional network structure to capture the local and global consistency relations of each graph separately; the initial inputs to the source and target domains are X_S and X_T, and the outputs are Z_S^A, Z_S^P, Z_T^A, and Z_T^P. Then, we apply the graph attention mechanism to the outputs of each domain to obtain the final node representations Z_S and Z_T. Finally, by employing the source classifier, domain adversary, and target classifier, we can effectively learn domain-invariant and semantic representations to eliminate domain disparities in cross-domain node classification.

Deep transfer learning.
The important mathematical notation in deep transfer learning 9 is defined as follows: a domain can be represented as D = {X, P(X)}, where X = {x_1, ..., x_n} is the feature space and P(X) is the marginal probability distribution. A task can be represented by T = {y, f(x)}; it consists of two parts, the label space y and the target prediction function f(x). f(x) can also be viewed as the conditional probability function P(y|x). The deep transfer task can then be defined as ⟨D_s, D_t, T_s, T_t, f_t(·)⟩. Given a learning task T_t based on D_t, deep transfer learning aims to discover and transfer latent common knowledge from D_s and T_s to improve the performance of the prediction function f_t(·) on the learning task T_t, where it suffices that D_s ≠ D_t or T_s ≠ T_t. The generic transfer learning process is shown in Fig. 6.

Deep domain adaptation.
Domain adaptation is a subtopic of transfer learning that aims to transfer knowledge from a source domain with sufficient labeling information to a target domain with much unlabeled or scarcely labeled data by minimizing domain differences [27][28][29]. Domain adaptation can mitigate the detrimental effects of the domain shift that occurs when knowledge is transferred from source to target by facilitating transfer with models in different but related domains that share the same label space 30,31. Several studies have attempted to apply domain adaptation ideas to graph-structured data. The CDNE 32 algorithm learns transferable node embeddings for cross-network learning tasks by minimizing the maximum mean discrepancy (MMD) loss, but its modeling capability is limited. The AdaGCN algorithm 33 uses graph convolutional networks as feature extractors to learn node representations and uses adversarial learning strategies to learn domain-invariant node representations, but it ignores the semantic information contained in the target domain samples. The UDA-GCN algorithm 26 works similarly to ours, using a composite framework to achieve knowledge adaptation between graphs, but it studies the problem of graph domain adaptation in unsupervised scenarios.
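For contrast with the adversarial strategy adopted in this paper, the following sketch shows a simple linear-kernel MMD between source and target embeddings, the kind of statistic minimized by methods such as CDNE. It is illustrative only and not the loss used by DAMGCN.

```python
# Illustrative only: a linear-kernel maximum mean discrepancy (MMD)
# between source and target node embeddings. Adding this to a training
# objective penalizes the distance between the two domains' mean embeddings.
import torch

def mmd_linear(z_s: torch.Tensor, z_t: torch.Tensor) -> torch.Tensor:
    delta = z_s.mean(dim=0) - z_t.mean(dim=0)
    return (delta * delta).sum()
```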

Pointwise mutual information matrix. Mutual information measures the amount of information that one random variable contains about another; it reflects the reduction of uncertainty in one random variable due to knowledge of the other. Pointwise mutual information (PMI) measures, for a pair of outcomes of X and Y, the discrepancy between their co-occurrence probability under the joint distribution and the probability expected if they were independent (the product of the marginals) 34. The PMI metric performs well in semantic similarity tasks.
To calculate the PMI matrix (denoted P), we first use a random-walk-based method to compute the frequency matrix. Random walks have been used as a similarity metric in semi-supervised graph representation learning and graph classification problems 35. A random walk utilizes the entire graph to estimate the neighborhood probability of each node, so the semantic similarity between different nodes can be computed from the walks. Applying the PMI transformation to this frequency matrix then yields the PMI matrix.

1. Frequency matrix. A random walk is a Markov chain that describes the sequence of nodes visited during the walk. If the random walker is at node x_i at time t − 1, the state at that point is defined as s(t − 1) = x_i. If the walker jumps from the current node x_i to a neighbor x_j at the next moment t, the transition probability is p(s(t) = x_j | s(t − 1) = x_i). Given an arbitrary adjacency matrix A, the transition probabilities of all nodes in A are as follows:

$$p\big(s(t) = x_j \mid s(t-1) = x_i\big) = \frac{A_{i,j}}{\sum_j A_{i,j}} \quad (1)$$

2. P matrix. From the frequency matrix (denoted F) obtained via Eq. (1), we define the context of the input as all nodes in X. A visited path is equivalent to a sentence, and a node corresponds to a word. The i-th row of F is the row vector F_{i,:} and the j-th column is the column vector F_{:,j}; F_{i,:} corresponds to node x_i, and F_{:,j} corresponds to the context c_j. The number of occurrences of x_i in the context c_j is the value of the entry F_{i,j}. Based on the frequency matrix F, we compute P by the following equations:

$$P_{i,j} = \frac{F_{i,j}}{\sum_{i,j} F_{i,j}} \quad (2)$$

$$P_{i,*} = \frac{\sum_j F_{i,j}}{\sum_{i,j} F_{i,j}} \quad (3)$$

$$P_{*,j} = \frac{\sum_i F_{i,j}}{\sum_{i,j} F_{i,j}} \quad (4)$$

$$\mathrm{PMI}_{i,j} = \max\!\left(\log \frac{P_{i,j}}{P_{i,*}\, P_{*,j}},\ 0\right) \quad (5)$$

Through Eqs. (2) to (5), the semantic information in P is encoded. P_{i,j} represents the estimated probability that node x_i occurs in context c_j; P_{i,*} represents the estimated probability of node x_i; and P_{*,j} represents the estimated probability of context c_j. By the definition of statistical independence, if x_i occurs independently and at random with respect to c_j, then P_{i,j} = P_{i,*} P_{*,j}, so PMI_{i,j} = 0. If x_i and c_j have a semantic relationship, then P_{i,j} > P_{i,*} P_{*,j} and PMI_{i,j} is positive. Conversely, if x_i is negatively correlated with c_j, PMI_{i,j} may be negative. Since we are interested only in pairs (x_i, c_j) with semantic relations, our approach uses the non-negative PMI_{i,j}.
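A direct translation of Eqs. (1)-(5) into code might look as follows. The walk count, walk length, and co-occurrence window size are illustrative hyperparameters; the paper does not fix them in this excerpt.

```python
# A sketch of the random-walk frequency matrix and PMI transformation.
import numpy as np

def pmi_matrix(A: np.ndarray, num_walks: int = 10,
               walk_len: int = 40, window: int = 5) -> np.ndarray:
    n = A.shape[0]
    # Eq. (1): row-normalized adjacency gives the transition probabilities
    trans = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)
    F = np.zeros((n, n))
    rng = np.random.default_rng(0)
    for start in range(n):
        for _ in range(num_walks):
            walk, cur = [start], start
            for _ in range(walk_len - 1):
                cur = rng.choice(n, p=trans[cur])
                walk.append(cur)
            # count node-context co-occurrences inside a sliding window
            for i, xi in enumerate(walk):
                for cj in walk[max(0, i - window): i + window + 1]:
                    F[xi, cj] += 1
    total = F.sum()
    p_ij = F / total                            # Eq. (2)
    p_i = F.sum(axis=1, keepdims=True) / total  # Eq. (3)
    p_j = F.sum(axis=0, keepdims=True) / total  # Eq. (4)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij / (p_i @ p_j))
    return np.maximum(np.nan_to_num(pmi, neginf=0.0), 0.0)  # Eq. (5)
```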
Global consistency. Prior work 36 demonstrates that a graph's global information is highly important. Our model trains two GNN channels to extract features, thereby capturing both the local and the global information of the graph network, and encodes the semantic information of each node separately. The node representation learning process consists of a local GNN and a global GNN. Local consistency is implemented by performing the graph convolution 16 operation directly on the initial adjacency matrix; this adjacency matrix can be the default one or constructed separately as a single graph. Global consistency is implemented by the graph convolution operation based on the random walk processing introduced in the previous section. The two channels are described below.

1. ConvA (local consistency network). It adopts the GCN model proposed by Kipf 16. We briefly describe ConvA as a deep feedforward neural network. Given a feature set X and an adjacency matrix A, the embedding Z of the i-th hidden layer of the network is:

$$Z^{(i)} = \sigma\!\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} Z^{(i-1)} W^{(i)}\right)$$

where Ã = A + I_N is the adjacency matrix with self-loops (I_N ∈ R^{N×N} is the identity matrix) and D̃_{i,i} = Σ_j Ã_{i,j}. Therefore, D̃^{-1/2} Ã D̃^{-1/2} is the normalized adjacency matrix. Z^{(i−1)} is the output of the (i − 1)-th layer, with Z^{(0)} = X; W^{(i)} is a trainable parameter of the network, and σ(·) denotes the activation function.

2. ConvP (global consistency network). Here, we introduce the PMI-based convolution to encode global information in a second GNN channel. It extracts more comprehensive semantic information, represented as a matrix P ∈ R^{N×N}. ConvP is derived from the similarity defined by the PMI matrix, reflecting the intuition that graph nodes appearing in similar contexts tend to have the same label. This network is given by:

$$Z^{(i)} = \sigma\!\left(D^{-\frac{1}{2}} P D^{-\frac{1}{2}} Z^{(i-1)} W^{(i)}\right)$$

where P is the PMI matrix and D_{i,i} = Σ_j P_{i,j} is used for normalization. Global consistency is ensured by diffusion based on this node context matrix P. Moreover, ConvP uses a neural network structure similar to that of ConvA, which gives a high degree of model coupling and is convenient for model sharing and parameter transfer. With shared parameters, one node embedding module learns representations using input from both the source and target domains.
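A minimal sketch of the two channels follows: the same two-layer propagation rule (with the 128-16 structure from Table 2) applied once to the normalized self-looped adjacency matrix (ConvA) and once to the normalized PMI matrix (ConvP). Module and variable names are illustrative.

```python
# A sketch of a GCN channel reusable for both ConvA and ConvP.
import torch
import torch.nn as nn
import torch.nn.functional as F

def sym_normalize(M: torch.Tensor) -> torch.Tensor:
    """Symmetric normalization D^{-1/2} M D^{-1/2}."""
    d_inv_sqrt = M.sum(dim=1).clamp(min=1e-12).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * M * d_inv_sqrt.unsqueeze(0)

class GCNChannel(nn.Module):
    """Two-layer GCN applied to a pre-normalized propagation matrix."""
    def __init__(self, in_dim: int, hid_dim: int = 128,
                 out_dim: int = 16, dropout: float = 0.5):
        super().__init__()
        self.W1 = nn.Linear(in_dim, hid_dim, bias=False)
        self.W2 = nn.Linear(hid_dim, out_dim, bias=False)
        self.dropout = dropout

    def forward(self, X: torch.Tensor, M_norm: torch.Tensor) -> torch.Tensor:
        Z = F.relu(M_norm @ self.W1(X))
        Z = F.dropout(Z, self.dropout, training=self.training)
        return M_norm @ self.W2(Z)

# Usage: A_hat = sym_normalize(A + torch.eye(A.size(0)))  # ConvA input
#        P_hat = sym_normalize(P)                          # ConvP input
#        Z_A, Z_P = conv_a(X, A_hat), conv_p(X, P_hat)
```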

Attention mechanism module.
For each domain, since the embeddings from the local and global consistency networks differ, i.e., contribute differently to the learned graph representation, we use the graph attention module to obtain the importance of each channel's output. After performing graph convolution to extract features for the source and target domains, we obtain four embeddings: the local and global source graph embeddings Z_S^A and Z_S^P, and the local and global target graph embeddings Z_T^A and Z_T^P. The next step is to combine these four embeddings from the various graphs into a single representation.
We use the raw inputs X_S and X_T as the keys to the attention mechanism. Then, we apply the attention mechanism to each of the above domain outputs, resulting in two attention coefficients per domain, att_K^A and att_K^P, which are calculated from the attention function f(·) for each domain as follows.
where J is the shared weight matrix, and K indicates whether the output comes from the source domain S or the target domain T; J ensures that the input X has the same dimension as the outputs Z_K^A and Z_K^P. Next, we use the softmax function to normalize the weights att_K^A and att_K^P. After applying the attention mechanism, we obtain the final outputs Z_S and Z_T. One plausible form of this module is sketched below.
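Since the exact attention function f(·) is not reproduced in this excerpt, the sketch below shows one plausible minimal form: project the raw input through the shared weight matrix J, score it against each channel's embedding, normalize the two scores with softmax, and mix the channels. All names are illustrative assumptions.

```python
# A hypothetical sketch of the per-domain channel attention, assuming a
# dot-product scoring between the projected input and each embedding.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, in_dim: int, emb_dim: int):
        super().__init__()
        # J projects the raw input X to the embedding dimension so it can
        # be compared with Z_A and Z_P (cf. the "same dimension" remark above)
        self.J = nn.Linear(in_dim, emb_dim, bias=False)

    def forward(self, X, Z_A, Z_P):
        key = self.J(X)
        s_A = (key * Z_A).sum(dim=1, keepdim=True)  # score for local channel
        s_P = (key * Z_P).sum(dim=1, keepdim=True)  # score for global channel
        att = torch.softmax(torch.cat([s_A, s_P], dim=1), dim=1)
        return att[:, :1] * Z_A + att[:, 1:] * Z_P  # fused representation Z_K
```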

Domain adaptive module for cross-domain node classification.
We propose a model made up of a source classifier, a domain adversarial module, and a target classifier that jointly learn class-discriminative and domain-invariant node representations to improve the classification of nodes in the target network. The model is trained with the overall objective of Eq. (12), where L_S, L_DA, and L_T stand for the source classifier loss, the domain adversarial loss, and the target classifier loss, respectively, and γ_1 is the balance parameter.
1. Loss of the source classifier. The source classifier loss L_S(f_s(Z_S), Y_S) minimizes the cross-entropy between the predictions for labeled data and the true labels in the source domain, as defined by Eq. (13).
where Y_i denotes the true label of the i-th node in the source domain, and Ŷ_i denotes the predicted class of the i-th source-labeled node V_S^i.
2. Loss of the adversarial domain. The goal of the domain adversarial loss L_DA(Z_S, Z_T) is to minimize the difference between the node representations of the source domain network G_s and the target domain network G_t extracted by the convolutional layers, so that the model can perform parameter transfer and adversarial learning across the two domains. To achieve this, we add a domain classifier f_d(Q(Z_S, Z_T); θ_D) with learning parameters θ_D, which is adversarially trained to distinguish whether a node comes from G_t or G_s. On the one hand, we want the source classifier f_s to minimize Eq. (15) during the transfer process and correctly classify each node in the source domain. On the other hand, we want nodes from the two domains to have representations that are as similar as possible, making it impossible for the domain classifier to tell whether a node comes from G_s or G_t (i.e., the generative adversarial idea).
Adversarial deep transfer learning is inspired by generative adversarial networks (GANs) [37][38][39][40], which possess good learning ability and scalability. In our work, we implement adversarial training using the gradient reversal layer (GRL) 27, which is defined in Eqs. (14) and (15) as Q(x) with inverted gradients. The learning process with the GRL is adversarial: inverting the gradient forces the domain loss with respect to the feature representation f_s(Z_S) to be maximized, while θ_D is optimized by minimizing the cross-entropy loss of the domain classifier.
where m̂_i denotes the domain prediction for the i-th node across the source and target domains, and m_i ∈ {0, 1} denotes the ground-truth domain label. A standard implementation of the GRL is sketched below.
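A standard PyTorch gradient reversal layer matching the GRL described above: the identity in the forward pass, gradient negation scaled by λ in the backward pass.

```python
# Gradient reversal layer (GRL) for adversarial domain training.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd: float):
        ctx.lambd = lambd
        return x.view_as(x)  # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # reverse (and scale) the gradient flowing to the feature extractor
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd: float = 1.0):
    return GradReverse.apply(x, lambd)
```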
3. Loss of the target classifier. In the target domain, a dedicated classification loss is placed on the target classifier. Consistent with the source classifier, we use cross-entropy as the label loss for supervised learning in the target domain.
where Y_i denotes the true label of the i-th node in the target domain, and Ŷ_i denotes the predicted class of the i-th labeled node V_T^i in the target domain. The parameters in L_S(Z_S, Y_S), L_DA(Z_S, Z_T), and L_T(Z_T, Y_T) are jointly optimized through the objective function in Eq. (12); all parameters are optimized using the standard back-propagation algorithm. A sketch of this joint objective follows.
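Putting the three losses together, a sketch of the joint objective of Eq. (12) might look as follows; it reuses the grad_reverse helper from the previous sketch, and the classifier submodule names (src_clf, dom_clf, tgt_clf) are illustrative assumptions rather than the authors' API.

```python
# A sketch of the joint objective: source cross-entropy, adversarial
# domain loss through the GRL, and target cross-entropy.
import torch
import torch.nn.functional as F

def total_loss(model, Z_s, y_s, Z_t, y_t, gamma1: float = 1.0):
    L_s = F.cross_entropy(model.src_clf(Z_s), y_s)        # Eq. (13)
    Z = torch.cat([Z_s, Z_t], dim=0)
    dom_logits = model.dom_clf(grad_reverse(Z))           # GRL + domain classifier
    dom_labels = torch.cat([torch.zeros(len(Z_s), dtype=torch.long),
                            torch.ones(len(Z_t), dtype=torch.long)])
    L_da = F.cross_entropy(dom_logits, dom_labels)        # Eq. (15)
    L_t = F.cross_entropy(model.tgt_clf(Z_t), y_t)        # Eq. (17)
    return L_s + gamma1 * L_da + L_t                      # Eq. (12)
```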

Conclusions
In this paper, we study the problem of cross-domain GNN transfer learning in healthcare. Most existing GNNs learn models only on a single graph and do not consider knowledge transfer across graphs. Therefore, we propose a novel adversarial domain-adaptive multichannel graph convolutional network (DAMGCN) to enable knowledge transfer between different CHD datasets. By using a dual graph convolutional network to aggregate the local and global relationships of graphs, we learn better node representations in both the source and target graphs, and the attention mechanism further generates uniform embeddings for downstream node classification tasks. By using cross-entropy loss for source domain classification, a domain adversarial loss for domain discrimination, and cross-entropy loss for target domain classification, we reduce domain differences and achieve effective domain adaptation. We conducted experiments on three real CHD datasets, and the results show that our model outperforms existing single-domain network node classification methods and distinguishes well between high-risk and low-risk CHD patient groups. This predictive performance can aid physicians in diagnosing cardiovascular problems.

Figure 1. Accuracy performance of DAMGCN and its variants on different cross-domain tasks.

Figure 2. Visualization of the D dataset graph embedding learning results using t-SNE (a: the original embedding, b: GCN, c: DAMGCN for H → D, d: DAMGCN for M → D).

Figure 3. Visualization of the H dataset embedding learning results using t-SNE (a: the original embedding, b: GCN, c: DAMGCN for D → H, d: DAMGCN for M → H).

Figure 4. Visualization of the M dataset graph embedding learning results using t-SNE (a: the original embedding, b: GCN, c: DAMGCN for D → M, d: DAMGCN for H → M).

Figure 5. Overall architecture of the proposed adversarial domain-adaptive multichannel graph convolutional network (DAMGCN) for cross-domain node classification.

Figure 6. General deep transfer learning process.

$$\mathcal{L}(Z_S, Y_S, Z_T, Y_T) = \mathcal{L}_S(Z_S, Y_S) + \gamma_1 \mathcal{L}_{DA}(Z_S, Z_T) + \mathcal{L}_T(Z_T, Y_T) \quad (12)$$

$$\mathcal{L}_S\big(f_s(Z_S), Y_S\big) = -\frac{1}{N_S} \sum_{i=1}^{N_S} Y_i \log \hat{Y}_i \quad (13)$$

$$\mathcal{L}_{DA}(Z_S, Z_T) = -\frac{1}{N_S + N_T} \sum_{i=1}^{N_S + N_T} \big[ m_i \log \hat{m}_i + (1 - m_i) \log (1 - \hat{m}_i) \big] \quad (15)$$

$$\mathcal{L}_T\big(f_t(Z_T), Y_T\big) = -\frac{1}{N_T} \sum_{i=1}^{N_T} Y_i \log \hat{Y}_i \quad (17)$$

Table 1. Statistics of the CHD datasets in the experiment.

Table 3. Predicted results of all-cause death (indexes 7-8 are predictions of heart level, 9-10 are predictions of Mace occurrence).

Table 4. Summary of previous models.

Table 5. Comparison of classification performance for six cross-domain tasks.

Table 6. Summary of each graph convolutional network. The symbol √ denotes that the associated information is used by the algorithm.

Table 7. Comparison of the classification accuracy of DAMGCN variants on six cross-domain tasks.