A novel model for relation prediction in knowledge graphs exploiting semantic and structural feature integration

Relation prediction is a critical task in knowledge graph completion and associated downstream tasks that rely on knowledge representation. Previous studies indicate that both structural features and semantic information are meaningful for predicting missing relations in knowledge graphs. This has led to the development of two types of methods: structure-based methods and semantics-based methods. Since these two approaches represent two distinct learning paradigms, it is difficult to fully utilize both sets of features within a single learning model, especially deep features. As a result, existing studies usually focus on only one type of feature. This leads to an insufficient representation of knowledge in current methods and makes them prone to overlooking certain patterns when predicting missing relations. In this study, we introduce a novel model, RP-ISS, which combines deep semantic and structural features for relation prediction. The RP-ISS model utilizes a two-part architecture, with the first component being a RoBERTa module that is responsible for extracting semantic features from entity nodes. The second part of the system employs an edge-based relational message-passing network designed to capture and interpret structural information within the data. To alleviate the computational burden of the message-passing network on the RoBERTa module during the sampling process, RP-ISS introduces a node embedding memory bank, which updates asynchronously to circumvent excessive computation. The model was assessed on three publicly accessible datasets (WN18RR, WN18, and FB15k-237), and the results revealed that RP-ISS surpasses all baseline methods across all evaluation metrics. Moreover, RP-ISS showcases robust performance in graph inductive learning.

on textual semantic information. Nadkarni et al. 22 conducted a systematic investigation into the integration of graph embedding with pre-trained language models for the purpose of knowledge graph completion. They introduced a series of fusion strategies, exemplified by KGE-BERT, which have been applied specifically to the task of medical knowledge graph completion. Fundamentally, this approach amalgamates shallow structural features with deep textual features, thereby achieving a more nuanced and comprehensive completion of the knowledge graph. The synthesis of these distinct facets offers a robust methodology, underscoring the potential of multi-modal integration within the realm of knowledge representation. Shen et al. 23 designed LASS, a novel method that fuses semantic and structural features. To obtain semantic features, they adopted the KG-BERT approach, using pre-trained language models to extract the semantic features of entities. Instead of directly extracting structural features, their method employs a TransE-based scoring function to compute the probability scores of triples.

Problem definition
In a knowledge graph G = (V, E), where V is a set of nodes and E is a set of edges, the graph is composed of multiple triples {h, r, t}, where h and t denote the head and tail nodes and r represents the relation. A representative task within the domain of knowledge graph completion is Link Prediction, which aims to predict the missing component of a triplet {h, r, t}. Some studies further subdivide Link Prediction into several subtasks 24,25. In our work, the relation prediction problem can be defined as predicting the missing relation given the head and tail entities, symbolized as predicting the missing relation in {h, ?, t} 26,27.
Generally speaking, the fundamental concept of knowledge graph completion is to establish a scoring function S(·), which assigns a score to any given triplet {h, r, t}: if the triplet is present within the knowledge graph, it receives a high score, and vice versa. Building upon this idea, for the task of relation prediction, we can transform the scoring function into

S(r) = f(r | h, t, G),

where r, h, and t respectively represent the relation, head entity, and tail entity, G denotes the knowledge graph where the triplets reside, and f(·) represents the method for calculating relation scores, used to compute the likelihood of the existence of relation r between h and t.
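Under this reformulation, relation prediction amounts to scoring every candidate relation for a fixed (h, t) pair and normalizing the scores into a distribution. The following minimal sketch illustrates the idea with hypothetical relation names and scores (not taken from the paper); the learned function f(r | h, t, G) is replaced by a precomputed score table:

```python
import math

def relation_probabilities(scores):
    """Softmax over per-relation scores f(r | h, t, G) for a fixed (h, t) pair."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {r: math.exp(s - m) for r, s in scores.items()}
    z = sum(exps.values())
    return {r: e / z for r, e in exps.items()}

# Hypothetical scores for the incomplete triple {h, ?, t}
scores = {"hypernym": 2.1, "part_of": 0.3, "synonym": -1.0}
probs = relation_probabilities(scores)
best = max(probs, key=probs.get)  # predicted relation: the highest-scoring r
```

The predicted relation is simply the argmax of this distribution, which is how the classification view of relation prediction used later in the paper operates.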

Model architecture
To improve the prediction of missing relations by integrating textual semantic features and graph structural features, we propose the RP-ISS model. The model utilizes the tacit knowledge and semantic feature extraction ability of RoBERTa. Because some advanced models for knowledge graph completion, such as KG-BERT and GilBERT, are end-to-end prediction models that cannot be used as PLMs, we primarily utilize RoBERTa in RP-ISS. To enable the model to process graph structural information, we design and apply a Relation Message Passing Network (as described in Section "Relational message passing"). The structure of RP-ISS is shown in Fig. 1.
To extract the semantic features of entities, we transform the information of an entity node into a set of tokens to serve as input for RoBERTa. For an entity, we transform its name and description into a token sequence of the form [CLS][Entity Name][SEP][Entity Description], where [CLS] is a special classification token, [SEP] signifies the separation of sentences, [Entity Name] represents the tokens of the entity's name, and [Entity Description] represents the tokens of its description. This set of tokens consists of two sentences, S_e and S_d. Sentence S_e contains the tokens of the entity to be encoded, and sentence S_d contains the tokens of its description. To predict the relation between two nodes, we construct the tokens of head entity h and tail entity t separately and input them into the same RoBERTa for encoding. The encoded outputs of h and t from RoBERTa are denoted u and v, respectively.
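The token-sequence construction described above can be sketched as follows. This is a simplified stand-in: whitespace splitting replaces RoBERTa's subword tokenizer, the entity is hypothetical, and the maximum length of 50 matches the input length reported in the experimental settings:

```python
def build_entity_tokens(name, description, max_len=50):
    """Assemble the [CLS][Entity Name][SEP][Entity Description] sequence.

    Whitespace tokenization stands in for RoBERTa's subword tokenizer here;
    a real implementation would use the pretrained tokenizer's special tokens.
    """
    tokens = ["[CLS]"] + name.split() + ["[SEP]"] + description.split()
    return tokens[:max_len]  # truncate to the model's maximum input length

# Hypothetical entity, not taken from the datasets
toks = build_entity_tokens("dog", "a domesticated carnivorous mammal")
```

The head and tail entities each get their own sequence built this way, and both are encoded by the same RoBERTa instance.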
To extract the structural features of the graph, we pass the encoded vectors u and v from RoBERTa through the Relation Message Passing Network (as described in Section "Relational message passing"). In the Relation Message Passing Network, we use subgraph sampling to recursively sample and aggregate the information of neighboring edges related to the current edge, thereby obtaining the representation of the current edge. The edge representation is calculated from the entity embeddings with a fully connected layer. During each sampling hop, multiple nodes are sampled and edge representations are calculated. The vectors of these sampled nodes come from the previous round of RoBERTa encoding stored in the Node Embedding Memory Bank and do not participate in the current backpropagation computation. The encoding process can be expressed as

r_sema = FC([u ; v]),
r^l_strc = U(r_{h,t}, s^{l-1}_{e_h}, s^{l-1}_{e_t}),

where r_sema represents the semantic representation of the edge between an entity pair, and r^l_strc refers to its structural representation. r_{h,t} denotes the initial representation of the edge between head entity h and tail entity t. s^{l-1}_e represents the result of sampling a subgraph and aggregating structural information, which aggregates information separately from the edges connected to the head node e_h and from the edges connected to the tail node e_t. l denotes the number of hops of the sampled subgraph, also known as the sampling depth.

Our goal is to predict the type of relation connecting h and t, that is, to find the most appropriate r for the incomplete triple {h, ?, t}. Once we have calculated the semantic representation vector r_sema and the structural representation vector r^l_strc, we concatenate them into a comprehensive representation e_r. We then pass e_r through a fully connected layer to adjust the dimensions and output a probability distribution over relation types; a softmax activation produces the final classification output.
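The final fusion-and-classification step can be sketched numerically. All dimensions, the random inputs, and the single linear layer below are illustrative stand-ins, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_REL = 8, 4  # toy dimensions, not the paper's settings

def predict_relation(r_sema, r_strc, W, b):
    """Concatenate semantic and structural edge vectors, project, softmax."""
    e_r = np.concatenate([r_sema, r_strc])  # comprehensive edge representation
    logits = W @ e_r + b                    # fully connected layer
    exp = np.exp(logits - logits.max())     # numerically stable softmax
    return exp / exp.sum()                  # probability over relation types

r_sema = rng.normal(size=DIM)  # stand-in for the RoBERTa-derived edge vector
r_strc = rng.normal(size=DIM)  # stand-in for the message-passing output
W = rng.normal(size=(N_REL, 2 * DIM))
b = np.zeros(N_REL)
probs = predict_relation(r_sema, r_strc, W, b)
```

In the actual model the weights W and b are trained end-to-end with cross-entropy loss over the relation types.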

Relational message passing
Our study explores integrating RoBERTa's linguistic capabilities with a GNN's structural insights to improve relation prediction. Naively stacking RoBERTa and a GNN creates excessive computational load, because the GNN must aggregate features from surrounding nodes. We propose a novel method combining RoBERTa's strengths with knowledge graph structures through an edge-based relational message-passing mechanism and a node embedding memory bank. This allows seamless integration of RoBERTa's encoded entity output into the message-passing network, enhancing performance without the computational burden of a stacked model.

Edge-based relational message passing
The goal of the relation prediction task is to predict the type of relation between two nodes. We therefore need to focus on the representation of edges rather than nodes, which are the focus of most GNN models. Thus, we propose an edge-based relational message passing method. As shown in Fig. 2, we provide an example of how the edge-based relational message passing network helps predict missing relations. Suppose there is a missing relation between the entities Alice and Bob, with the relation type Wife_of, meaning that Alice is Bob's wife. There exists another entity, Rick, whose mother and father are Alice and Bob, respectively. When predicting this relation, the edge-based relational message passing network can encode the information of the relations between Alice and Rick and between Bob and Rick, and pass it to the current edge awaiting prediction. Compared to the general node-based message passing method, this method treats edges as nodes for message passing and information aggregation. The information aggregated on an edge is derived from the edges connected to the two entities that the edge links. Nonetheless, considering the potentially large number of edges in a graph, our method does not directly store edge representations but computes them from the nodes. The message passing process can be expressed as

e_{h,t} = FC([n_h ; n_t]),
m_h = Agg({e_{h,k} : k ∈ N(h), k ≠ t}),
m_t = Agg({e_{k,t} : k ∈ N(t), k ≠ h}),
e'_{h,t} = U(e_{h,t}, m_h, m_t),

where e_{h,t} is the representation of the edge between an entity pair, computed from the head embedding n_h and tail embedding n_t by a fully connected layer FC. m_h is the message from all edges connected to the head entity except e_{h,t}, and m_t is the message from all edges connected to the tail entity except e_{h,t}. N(h) and N(t) are the neighbor nodes of entities h and t, and m_h and m_t are calculated through an aggregation function Agg. e'_{h,t} is the updated representation of e_{h,t}, calculated from the original representation e_{h,t} and the neighbor-edge messages m_h and m_t. U is the updating function for edge representations: within U, e_{h,t}, m_h, and m_t are concatenated and passed through a residual connection targeting e_{h,t} to update its representation.
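One step of this edge update can be sketched in plain Python. The learned FC layer and update function U are replaced by a simple averaging combination with a residual connection, and mean pooling stands in for the aggregation function, so this illustrates the data flow rather than the trained model:

```python
def edge_update(e_ht, neighbor_edges_h, neighbor_edges_t,
                agg=lambda es: [sum(col) / len(col) for col in zip(*es)]):
    """One edge-based message-passing step (mean aggregation stand-in).

    m_h / m_t aggregate the neighbor-edge vectors; the update U combines them
    with e_ht and adds a residual connection back to e_ht. The elementwise
    average below is a toy stand-in for the learned fully connected update.
    """
    m_h = agg(neighbor_edges_h)                    # message from head-side edges
    m_t = agg(neighbor_edges_t)                    # message from tail-side edges
    combined = [(a + b + c) / 3 for a, b, c in zip(e_ht, m_h, m_t)]
    return [x + r for x, r in zip(combined, e_ht)]  # residual connection

# Toy 2-dimensional edge vectors
e_ht = [1.0, 0.0]
e_updated = edge_update(e_ht, [[0.5, 0.5], [1.5, -0.5]], [[0.0, 1.0]])
```

Deeper structural context is obtained by applying this update recursively over l sampling hops, matching the subgraph sampling described above.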

Node embedding memory bank
Directly stacking RoBERTa with the edge-based relational message passing network incurs high computational costs, because RoBERTa would need to encode every sampled neighboring node, a cost that grows exponentially with sampling depth. To reduce this cost, especially during backward propagation in training, we implemented a node embedding memory bank (illustrated in Fig. 3). The bank stores RoBERTa-encoded node embeddings from previous training, updating all node embeddings after each training epoch. While sampling, the model uses the previously stored encodings, updates the memory bank with new encodings, and propagates only the current node pair's gradient information to RoBERTa. However, this creates an inconsistency between the memory bank's node embeddings and RoBERTa's current encodings, which we address with a weighted updating strategy for the encoding results. This process can be represented as

n_i' = λ s_i + (1 − λ) n_i,

where n_i' is the updated embedding of node i, s_i represents the embedding of node i encoded by the RoBERTa module, n_i refers to the original node embedding in the memory bank, and λ is a weighting factor employed to regulate the ratio of the new encoding to the original node embedding. In our model, we set the weighting factor to 0.5.
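The weighted update can be sketched as a small class. The dictionary-backed storage and the first-sighting behavior (storing a new node's encoding as-is) are implementation assumptions for illustration; only the blending rule n_i' = λ·s_i + (1 − λ)·n_i with λ = 0.5 comes from the text:

```python
class NodeEmbeddingMemoryBank:
    """Stores node embeddings and blends new encodings with stored ones."""

    def __init__(self, lam=0.5):
        self.lam = lam   # weighting factor λ; 0.5 as in the paper
        self.bank = {}   # node id -> embedding vector

    def update(self, node_id, new_encoding):
        old = self.bank.get(node_id)
        if old is None:
            # First sighting: store the fresh encoding directly (assumption).
            self.bank[node_id] = list(new_encoding)
        else:
            # Weighted update: n' = λ·s + (1 − λ)·n
            self.bank[node_id] = [self.lam * s + (1 - self.lam) * n
                                  for s, n in zip(new_encoding, old)]
        return self.bank[node_id]

bank = NodeEmbeddingMemoryBank(lam=0.5)
bank.update("alice", [1.0, 1.0])             # initial store
blended = bank.update("alice", [0.0, 2.0])   # blended with the stored vector
```

Because sampled neighbors read from the bank rather than re-invoking RoBERTa, only the current node pair participates in backpropagation.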

Datasets and baselines
We carried out experiments on three representative and publicly accessible datasets: WN18, WN18RR, and FB15k-237. WN18 is a knowledge graph that contains concepts and semantic relations of English words, based on WordNet. WN18RR is a subset of WN18 with reverse relations removed. FB15k-237 is a knowledge graph based on large-scale general knowledge from Freebase, also with reverse relations removed. The statistics of these three datasets are presented in Table 1.
To evaluate the performance of our proposed model, we compare it with several baseline methods: TransE 10, DistMult 28, RotatE 29, SimplE 30, ComplEx 31, R-GCN 12, KG-BERT 32, KPE-PTransE 18, KGE-BERT (with routers) 22, GilBERT 21, and LASS 23. TransE, DistMult, RotatE, SimplE, ComplEx, and KPE-PTransE are classic knowledge representation methods that primarily express the structural features of knowledge graphs. R-GCN stands as a highly representative model for relation prediction, utilizing graph structure effectively. Conversely, KG-BERT, KGE-BERT, and GilBERT emerge as equally prominent models in this domain, leveraging pre-trained language models for relation prediction. LASS is one of the state-of-the-art models for knowledge graph completion based on semantic and structural features; we use it to carry out relation prediction tasks, and it serves as one of our baselines.
We used the DGL-KE toolkit (see https://github.com/awslabs/dgl-ke/) to reproduce the results of the TransE, DistMult, RotatE, and ComplEx methods on the experimental datasets; the results of SimplE, R-GCN, KG-BERT, KGE-BERT, GilBERT, and LASS were reproduced using the source code accompanying the original papers. The results of KPE-PTransE are taken from the original paper.

Experiments setting
Relation prediction involves predicting the missing relation between head and tail nodes, i.e., predicting the missing part in {h, ?, t}. We use a multi-class classification model to predict the missing relation. Analogous to the evaluation method employed in link prediction, we utilize the probability distribution generated by the model to compute MRR, Hits@1, and Hits@3. These metrics serve as indicators of the model's performance.
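The two metric families can be computed directly from the rank of the gold relation in each predicted distribution. The ranks below are hypothetical; the formulas are the standard MRR and Hits@k definitions:

```python
def mrr_and_hits(ranks, k=3):
    """Compute MRR and Hits@k from 1-based ranks of the true relation.

    MRR is the mean of reciprocal ranks; Hits@k is the fraction of test
    cases where the true relation is ranked within the top k.
    """
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(1 for r in ranks if r <= k) / len(ranks)
    return mrr, hits

# Hypothetical ranks of the gold relation for three test triples
mrr, hits3 = mrr_and_hits([1, 2, 4], k=3)
```

For example, ranks of 1, 2, and 4 give an MRR of (1 + 1/2 + 1/4) / 3 ≈ 0.583 and a Hits@3 of 2/3.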
Our model was implemented in PyTorch, using RoBERTa for node sequence encoding and the Adam optimizer for training. Through grid search, we determined the optimal learning rates to be 5e−5 for RoBERTa and 3e−4 for the Relation Message Passing Network. A warmup mechanism was applied to avoid overly fast convergence of RoBERTa's final layers, with warmup steps set to 10% of the total training steps, and training was capped at 30 epochs. To save training time and prevent overfitting, we employ early stopping. Batch sizes were set to 1000 for WN18 and WN18RR and 500 for FB15k-237, with a maximum input sequence length of 50. We used mixed precision to save GPU memory and expedite training. The models were trained and tested on 4 Nvidia RTX A5000 GPUs.

Main results
We recorded and compared the experimental outcomes of our proposed model and the established baseline models. The main results are shown in Table 2. RP-ISS-Mean and RP-ISS-Attn are two variants that differ in how they aggregate information from neighboring edges in the edge-based relation message passing network: RP-ISS-Mean aggregates neighboring-edge information through mean pooling, while RP-ISS-Attn uses a multi-head attention mechanism. As shown in Table 2, RP-ISS-Attn exhibits the best performance, outperforming RP-ISS-Mean, and significantly outperforms all the baseline models. Compared to the best of the structure-based models (TransE, DistMult, RotatE, SimplE, ComplEx, KPE-PTransE, R-GCN), our model shows average improvements of 5.7%, 4.2%, and 2.5% in MRR, Hits@1, and Hits@3, respectively. Compared to the best of the semantics-based models (KG-BERT, GilBERT, and KGE-BERT), our model shows average improvements of 2.7%, 4.0%, and 2.0% in MRR, Hits@1, and Hits@3, respectively.
As depicted in Fig. 4, our proposed RP-ISS model, which synthesizes both structural and semantic features, demonstrates notable advancements over established baseline models on the experimental datasets. We benchmarked against three salient models: KPE-PTransE, grounded in graph embedding; GilBERT, which relies predominantly on textual semantic information; and LASS, which amalgamates both semantic and structural attributes. Each of these models epitomizes cutting-edge techniques in its respective domain. Across the datasets, RP-ISS consistently outperforms its counterparts. Specifically, it exceeds KPE-PTransE's performance by averages of 6.1% in MRR, 4.2% in Hits@1, and 2.6% in Hits@3. Against GilBERT, RP-ISS leads by average margins of 3.6% in MRR, 5.4% in Hits@1, and 2.6% in Hits@3. Furthermore, compared with LASS, RP-ISS achieves a lead of 2.1% in MRR, 2.5% in Hits@1, and 2.2% in Hits@3.
RP-ISS demonstrates a significant improvement over the baseline models because it fully leverages both structural and semantic features to predict missing relations. For relations that are predicted through structural information, RP-ISS's edge-based message passing network can effectively learn and represent these structural patterns and make accurate predictions; therefore, compared to models primarily based on semantic features, such as KG-BERT, GilBERT, and KGE-BERT, RP-ISS shows a notable performance boost owing to its utilization of structural features. For relations that require inference from semantic information, RP-ISS's RoBERTa module can effectively learn and represent the relevant semantic features and make correct predictions; hence, compared to models mainly based on structural features, such as R-GCN, ComplEx, and KPE-PTransE, RP-ISS also exhibits significant improvements. Finally, because RP-ISS exploits deep features of both types, it achieves a clear improvement even over advanced models such as LASS that likewise utilize both semantic and structural characteristics.

Ablation study
We conducted an ablation study to further understand the roles of the semantic and structural features; the results are shown in Table 3. The performance of both ablated models is significantly weaker than that of the full RP-ISS model. Across the three datasets, the model without semantic feature fusion (RP-STRC) experienced a 2.87% decrease in MRR, and the model without structural feature fusion (RP-SEM) saw a 2.94% MRR decrease.
On WN18RR, RP-STRC and RP-SEM had MRR reductions of 6.25% and 5.25%, Hits@1 reductions of 10.63% and 9.04%, and Hits@3 reductions of 2.17% and 1.75%, respectively. On WN18, RP-STRC and RP-SEM experienced MRR decreases of 1.21% and 2.80%, Hits@1 decreases of 2.04% and 5.42%, and Hits@3 decreases of 0.54% and 0.14%, respectively. On FB15K-237, RP-STRC and RP-SEM had MRR reductions of 1.16% and 0.79%, Hits@1 reductions of 1.73% and 1.67%, and Hits@3 reductions of 0.68% and 0.67%, respectively. RP-STRC exclusively utilizes the edge-based message passing network from our study, effectively capturing structural features. It adopts randomly initialized embeddings: specifically, we initialize the node embeddings in RP-STRC from a uniform distribution over [−1, 1], and these embeddings are iteratively updated during training. Comparisons with established structure-based approaches, shown in Fig. 5, demonstrate RP-STRC's competitive performance on relation prediction tasks. Notably, RP-STRC, which relies only on structural features, marginally outperforms the semantics-based RP-SEM model (see Table 3), highlighting the importance of structural information for relation prediction in knowledge graphs.
The RP-ISS model uses a hyperparameter λ to balance the weights of structural and semantic embeddings in the Node Embedding Memory Bank. To study the impact of different settings of λ, we conducted experiments with various values. As shown in Table 4, setting λ to 0.5 yields the best results, and increasing λ beyond 0.5 significantly reduces model performance. A larger λ means fewer structural features and more semantic features are considered, which is consistent with the conclusion drawn from the ablation study that structural features are more important for relation prediction.

Inductive prediction
We conducted an inductive graph learning experiment to evaluate our model's performance in inductive prediction, which is crucial in knowledge graph research. In real-world applications, domain knowledge graphs often expand over time, requiring models to handle different graph structures during training and prediction (inductive tasks) 33. Our approach converts node information into token sequences for RoBERTa, which aids inductive learning. To assess this, we performed experiments on the WN18RR dataset by simulating an inductive task: we varied the number of nodes in the training set and trained models with different reduction ratios.
In Fig. 6, we compare four models' performance on inductive tasks and find that our model and KG-BERT are both effective. However, our model's performance falls slightly behind KG-BERT's when more than 40% of the nodes are removed. This is likely due to differences in how the two models handle textual semantics: KG-BERT uses sequential encoding of entities for relation prediction without needing structural data, while RP-ISS combines individual entity encodings with structural information. KG-BERT's performance declines slowly in inductive prediction because it relies less on structural data; RP-ISS, although also based on textual semantics, depends more on the graph's structural integrity, leading to faster performance drops as structural data decreases. We plan to further explore RP-ISS-based techniques for better handling inductive prediction challenges.

Conclusions
Our study introduces RP-ISS, a new model for relation prediction in knowledge graphs that integrates both semantic and structural features for enhanced accuracy. RP-ISS uses RoBERTa for semantic encoding and an edge-based relational message passing network for structural representation, with a node embedding memory bank to reduce the computational cost of combining the two.

Figure 1. The architecture of our model RP-ISS.

Figure 2. Illustration of edge-based relational message passing.

Figure 3. Illustration of the node embedding memory bank.

Figure 4. Comparison of RP-ISS with representative SOTA models.

Figure 5. Comparison of RP-STRC with other structure-based methods.

Table 1. The statistics for the datasets.

Table 2. The main results on WN18RR, WN18 and FB15K-237 in the relation prediction task. *The bold font indicates the best result among all outcomes.

Table 3. Results of the ablation study on the test set.

Table 4. Performance of the model on WN18RR with different hyperparameter λ settings.