FGSI: a distant-supervision relation extraction method based on fine-grained semantic information

Relation extraction is an important step in building a knowledge graph. Its main objective is to extract the semantic relationship between identified entity pairs in sentences, and it plays a crucial role in semantic understanding and knowledge graph construction. Distant supervision for relation extraction aligns knowledge bases with natural language text to generate labeled data, which alleviates the burden of manually annotating datasets. However, the labeled corpus obtained from distant supervision contains a large amount of noisy data, which severely affects the training of relation extraction models. In this paper, we propose the hypothesis that key semantic information within a sentence plays a crucial role in entity relation extraction under distant supervision. Based on this hypothesis, we split each sentence into three segments according to the positions of the entities, and then use an intra-sentence attention mechanism to identify fine-grained semantic features within the sentence and reduce the interference of irrelevant noise. We also improve the intra-bag attention mechanism by setting a threshold gate that filters out low-relevance noisy sentences, minimizing the impact of noise on the relation extraction model while making full use of the available positive semantic information. Experimental results show that the proposed relation extraction model achieves improvements in the precision-recall curve, P@N values, and AUC value compared with existing methods, demonstrating its effectiveness.


Introduction
Relation extraction aims to identify the relationship between entity pairs in plain-text sentences to obtain structured knowledge, i.e., triples of the form (Entity A, Relation, Entity B). It is an important research hotspot in natural language processing [1] and essential preparatory work for constructing knowledge graphs [2]. Machine learning methods for relation extraction can be divided into unsupervised learning [3], supervised learning [4], semi-supervised learning [5], and distant supervision [6], according to whether the required training corpus is annotated. Although supervised relation extraction methods achieve high accuracy and satisfactory overall performance, they require manual annotation of the dataset before model training, which consumes significant human, material, and financial resources. With the continuous development of relation extraction technology, Mintz et al. [6] proposed the idea of distant supervision in 2009, which automatically aligns a knowledge base with plain text to generate annotated data. The main idea rests on a strong assumption: "if two entities have a certain relationship in the knowledge base, then all sentences containing these two entities express this relationship." For example, (Huawei, founder, Ren Zhengfei) is a triple instance in Freebase, so all sentences containing these two entities are labeled with the founder relation. However, the distant supervision method proposed by Mintz et al. [6] still has flaws: its strong assumption produces incorrect labels in the generated dataset, which introduces noise into model training and degrades model performance.
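The labeling procedure behind this strong assumption can be sketched in a few lines. The following is a hypothetical toy illustration (the knowledge base, sentences, and function names are ours, not the paper's); it shows how distant supervision inevitably produces false positives:

```python
# Toy distant-supervision labeling: any sentence mentioning both entities
# of a KB triple inherits the triple's relation label.
KB = {("Huawei", "Ren Zhengfei"): "founder"}  # toy knowledge base

def label_sentences(sentences, kb):
    labeled = []
    for s in sentences:
        for (head, tail), rel in kb.items():
            if head in s and tail in s:
                labeled.append((s, rel))  # may be a FALSE positive
    return labeled

corpus = [
    "Ren Zhengfei is the founder of Huawei.",          # truly expresses the relation
    "Ren Zhengfei visited a Huawei store yesterday.",  # noisy: labeled anyway
]
labeled_data = label_sentences(corpus, KB)
print(labeled_data)
```

Both sentences receive the `founder` label, although only the first actually expresses the relation; this is exactly the noise the proposed model aims to filter.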
One of the main research directions for distant supervision relation extraction is to develop denoising methods for the relation model, as noted by Yang Suizhu et al. [7]. In recent years, scholars have proposed various solutions for sample denoising. Surdeanu et al. [8] addressed the noisy-label problem by adopting a multi-instance learning strategy. Takamatsu et al. [9] designed a generative model to identify patterns of positive and negative samples, discarding negative-pattern samples and retaining positive-pattern samples to improve the overall performance of the relation extraction model. Zeng et al. [10] considered the limitations of traditional natural language processing tools and proposed using convolutional neural networks for relation extraction, taking word vectors and word position vectors as inputs, which achieved better results than classical machine learning models. Nguyen et al. [11] proposed using windows of multiple scales to extract multidimensional features instead of conventional lexical features, outperforming traditional convolutional neural network models. Zeng et al. [12] designed a piecewise convolutional neural network to extract sentence features and used multi-instance learning to eliminate annotation errors, reducing the impact of erroneous samples on overall model performance. Yan Xu et al. [13] first proposed using Long Short-Term Memory (LSTM) networks for relation extraction and extracted key information through the shortest dependency path, enabling better extraction of sentence-level relations. Yankai Lin et al. [14] improved the selection of training sentences in each bag of multi-instance learning by designing a bag-level attention mechanism that scores all sentences in the bag and integrates their information for relation extraction, achieving better results than the baseline model. Guoliang Ji et al. [15] introduced entity description information and a sentence-level attention mechanism for distant supervision relation extraction, further enriching entity information and reducing noise interference, and achieved better results than previous baselines. Peng Zhou et al. [16] proposed hierarchical selective attention for distant supervision relation extraction, where coarse sentence-level attention selects relevant sentences, word-level attention constructs sentence representations, and fine-grained sentence-level attention aggregates the sentence representations as model inputs; their experiments demonstrated the superior performance of this model. Feng Jianzhou et al. [17] proposed an improved attention mechanism for relation extraction, in which the model finds all positive instances that reflect the relation between the same entity pair at the sentence level and then constructs a combined sentence vector to fully utilize the semantic information of positive instances, achieving higher accuracy than the compared models. Ye Yuxin et al. [18] hypothesized that "the label produced by the final sentence alignment is a noisy observation generated by some unknown factors," and learned the transition probability from noisy labels to true labels by training on automatically labeled data, achieving better results than mainstream baselines. Although sentence-level and bag-level attention mechanisms can obtain positive corpus, they do not consider the fine-grained semantic information within the sentence. If there is too much noise inside a positive sentence, the sentence may be treated as a false positive because of its low weight after attention calculation. This is catastrophic for distant supervision datasets with a large number of noisy sentences.
To accurately identify the relationship between two entities in a sentence, we need to focus on the semantic information within the sentence. A complete sentence typically consists of components such as subject, predicate, object, and adverbial. If a sentence can semantically express the relationship between two entities, that expression must come from the key semantic information in the sentence, while the remaining information is irrelevant or interfering noise. Liu et al.'s study [19] showed that in NYT-Freebase, the classic dataset for distant supervision relation extraction, nearly 99.4% of sentences contain a large number of noisy words. If the entire sentence is fed into the model for training without processing its fine-grained semantic features, the model will inevitably be affected by irrelevant intra-sentence noise, degrading overall performance. This paper proposes a distant supervision relation extraction model based on fine-grained semantic information and piecewise convolutional neural networks (PCNN+FGSI). The main contributions of this paper are as follows: (1) a new intra-sentence attention mechanism is proposed, different from coarse-grained attention established at the sentence level; it processes fine-grained semantic features within the sentence, highlighting key semantic information and preventing irrelevant and noisy information from participating in the construction of sentence feature vectors with equal weight; (2) based on (1), after obtaining sentence features that highlight fine-grained semantic information, a bag-level attention mechanism with a threshold gate screens positive training sentences and discards noisy sentences, so as to better distinguish positive and negative instances among all sentences containing the same entity pair and construct a combined feature vector to train the relation classification network; (3) comparative and ablation experiments are designed to verify the performance advantages of the proposed relation extraction method.

Piecewise convolutional neural network model based on fine-grained semantic information
This paper proposes a fine-grained semantic information piecewise convolutional neural network model (PCNN+FGSI) for distant supervision relation extraction. The model consists of four parts: the text embedding layer based on fine-grained semantic information, the single-sentence feature output layer, the multi-sentence combined feature output layer, and the relation classification layer. The overall structure of the model is shown in Figure 1. In the text embedding layer based on fine-grained semantic information, the entire sentence is divided into three parts based on the positions of the two entities, and an intra-sentence attention mechanism then increases the weight of the part containing key semantic information and decreases the weight of the parts containing noise. The resulting representation emphasizes fine-grained semantic information. After this semantic embedding representation is obtained, the single-sentence feature representation is formed through the encoding layer. The bag-level attention mechanism in the multi-sentence combined feature output layer screens positive-instance information from the feature representations of sentences containing the same entity pair; the weights of the positive-instance feature representations are obtained, and the feature vectors are then recombined. The recombined feature vectors are sent to the relation classification layer to train the classifier, which improves the training performance of the model.

Text embedding layer based on fine-grained semantic information
The proposed model relies on neural networks to accomplish the relation extraction task. However, natural language text cannot be used directly by neural networks, so the first step in any neural NLP pipeline is to convert the text into a real-valued vector representation. The text embedding layer based on fine-grained semantic information processes natural language text in three steps: word embedding, the intra-sentence attention mechanism, and relative position embedding. Its structure is shown in Figure 2. After the training corpus passes through the word embedding step, the intra-sentence attention mechanism gives greater weight to the key semantic information, and the relative position embeddings are then concatenated to form the embedding vector representation of the sentence.

Intra-Sentence Attention
Label: Product-Producer. "The director has finished his new film and is hosting a celebration dinner."

Word Embedding
Word embedding is the process of transforming words into computable vectors, which are low-dimensional distributed representations of each word. The effectiveness of word embeddings in many natural language processing tasks has been demonstrated by Socher et al. [20]. Different methods have been proposed to train word embeddings, such as those by Bengio et al. [21] and Mikolov et al. [22]. Currently, the most commonly used pre-trained word vectors are LSA (Latent Semantic Analysis), Word2vec, and GloVe. LSA is an early count-based word vector tool built on a co-occurrence matrix; it uses matrix factorization based on singular value decomposition (SVD) to reduce the dimensionality of large matrices, but the computational cost of SVD is high. Word2vec's major limitation is that it only uses the corpus within a fixed window and does not fully leverage all the available corpus. GloVe combines the advantages of both methods. Figure 3 shows the distribution of the top 100 words by cosine similarity to the word "founder" in the GloVe semantic space. In this model, we use pre-trained word embeddings from Stanford GloVe. Given a sentence s = (w_1, w_2, w_3, w_h, w_5, ⋯, w_i, w_t, w_{i+2}, ⋯, w_n), each word is represented as a d_w-dimensional real-valued vector using the pre-trained word embedding matrix E ∈ R^{|V|×d_w}, where w_h and w_t denote the head and tail entities, |V| is the size of the vocabulary (the number of words in the pre-trained embedding corpus), and n is the length of the sentence.
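The cosine similarity used to produce Figure 3 is a standard measure of the angle between word vectors. A minimal sketch follows; the three-dimensional "word vectors" are invented for illustration (real GloVe vectors have 50-300 dimensions):

```python
import math

def cosine_similarity(u, v):
    # cosine of the angle between two word vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# toy vectors: "creator" points in a similar direction to "founder", "banana" does not
founder = [0.9, 0.1, 0.3]
creator = [0.8, 0.2, 0.4]
banana  = [0.0, 1.0, 0.0]
```

Under these toy vectors, `cosine_similarity(founder, creator)` is close to 1 while `cosine_similarity(founder, banana)` is close to 0, which is the property Figure 3 visualizes for the real embedding space.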

Intra-sentence attention mechanism
Assume a sentence s = (w_1, w_2, w_3, w_h, w_4, ⋯, w_i, w_t, w_{i+2}, ⋯, w_n) contains an entity pair ⟨w_h, w_t⟩ and is labeled with relation r. The word embedding representation S′ of the sentence can be obtained as in Section 2.1.1; it is a matrix of size n × d_w, where n is the number of words in the sentence and d_w is the dimension of the word embedding. In this paper, the word embedding representation of the sentence is divided into three segments S′ = {S′_1, S′_2, S′_3} based on the positions of the two entities ⟨w_h, w_t⟩ in the sentence. If a sentence can express the semantic relationship between its two internal entities, that expression must be related to key semantic information.
After dividing the sentence into three parts according to the positions of the entities, the contributions of different parts to the model's ability to extract the correct entity relation are different.
To enable the model to better understand the key semantic information that expresses different entity relations, different weights are assigned to these three parts to reflect their contribution to relation r.
The weight of each part is calculated as follows:

α_i = exp(e_i) / Σ_{j=1}^{3} exp(e_j)    (1)

where e_i is the contribution of the i-th segment of the sentence to the relation label r after the sentence is divided into three parts, calculated as:

e_i = S′_i · r′    (2)

where S′_i represents the embedded representation of the i-th part of the sentence and r′ represents the embedded representation of the relation label r in the semantic space used by this model. After the contribution of each part is calculated, the final embedded vector of the sentence is:

S = Σ_{i=1}^{3} α_i S′_i    (3)
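The three steps above (score each segment against the relation, softmax-normalize, form the weighted sum) can be sketched in plain Python. This is a minimal illustration under our own toy vectors, not the paper's implementation; segment embeddings would in practice be pooled from the word embeddings of each part:

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def segment_attention(segments, relation_vec):
    # e_i: dot-product relevance of each segment embedding to the relation embedding
    scores = [sum(a * b for a, b in zip(seg, relation_vec)) for seg in segments]
    weights = softmax(scores)
    dim = len(segments[0])
    # weighted sum of segment embeddings -> sentence embedding (equation-style S)
    sent = [sum(w * seg[d] for w, seg in zip(weights, segments)) for d in range(dim)]
    return sent, weights

# toy 2-d segment embeddings; the first segment aligns best with the relation
segments = [[2.0, 0.0], [0.5, 0.0], [0.0, 2.0]]
relation = [1.0, 0.0]
sent_vec, weights = segment_attention(segments, relation)
```

The segment most aligned with the relation embedding receives the largest weight, so its semantics dominate the sentence representation while the noisy segment is down-weighted rather than discarded.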

Position Embedding
Zeng et al. [10] have shown experimentally the importance of positional features in relation extraction tasks. Feng et al. [17] also argue that when judging the relationship between an entity pair in a sentence, words closer to the entities are usually the key information. Therefore, to better capture the structural information of a sentence, this paper introduces positional embeddings in the embedding stage, using positional features to record the relative distance of each word to the two entities. An example of relative distances is shown in Figure 4.
The [director] has finished his new [film] and is hosting a celebration dinner.
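The relative-distance lookup for the sentence above can be sketched as follows (a hypothetical helper, not the paper's code; the resulting integer pairs would then index learned d_p-dimensional position embedding tables):

```python
def relative_positions(tokens, head_idx, tail_idx):
    # signed distance of each token to the head and tail entity;
    # each pair is later mapped to two learned position-embedding vectors
    return [(i - head_idx, i - tail_idx) for i in range(len(tokens))]

tokens = "The director has finished his new film and is hosting a celebration dinner".split()
# "director" is the head entity (index 1), "film" the tail entity (index 6)
pos = relative_positions(tokens, head_idx=1, tail_idx=6)
```

For instance, "finished" (index 3) gets the pair (2, -3): two positions after the head entity and three before the tail entity.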

Single-sentence feature output layer
The effectiveness of the PCNN model for sentence-level feature extraction has been demonstrated in the studies by Zeng et al. [10] and G. Ji et al. [15]. Therefore, this paper adopts the PCNN structure as the single-sentence feature output layer of the model, as shown in Figure 5. After the embedded representation of the sentence is obtained, the embedding vector is fed into the PCNN structure, and the sentence's feature vector representation is produced through convolution and piecewise max-pooling.

Convolution
In entity relation extraction, sentence lengths vary. To address this, sentences are padded to align the corpus, using the longest sentence in each batch of samples as the alignment standard. Additionally, information useful for determining the relationship between the target entities may appear at different positions within a sentence. To capture such information, the model needs to extract local features at different scales to predict the relation class of the entity pair. In deep learning, the convolution operation is commonly used to extract local features of different scales; Dumoulin et al. [24] conducted in-depth research on convolution arithmetic in deep learning.
After the text is embedded with fine-grained semantic information in the text embedding layer, the final embedding representation of the input sentence is defined as q″ = {q_1, q_2, ⋯, q_{|q″|}}. The convolution operation slides n filter kernels of width w over this sequence, where 1 ≤ i ≤ n indexes the kernels and 1 ≤ j ≤ |q″| − w + 1 indexes the window positions, and produces the feature vectors of each sentence, denoted c = {c_1, c_2, ⋯, c_n}.
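A single filter of this convolution is just a dot product between the filter weights and a flattened window of w consecutive token embeddings. The following is a minimal sketch with invented toy values (one kernel; a real layer stacks n of these):

```python
def conv1d(embeddings, kernel):
    # "valid" 1-D convolution over a sequence of d-dimensional token embeddings;
    # the kernel covers w consecutive tokens, applied as a flattened dot product
    d = len(embeddings[0])
    w = len(kernel) // d               # filter width in tokens
    out = []
    for j in range(len(embeddings) - w + 1):
        window = [x for tok in embeddings[j:j + w] for x in tok]
        out.append(sum(a * b for a, b in zip(window, kernel)))
    return out

# 4 tokens with 2-d embeddings, a width-2 kernel of all ones (sums each window)
features = conv1d([[1, 0], [0, 1], [1, 1], [0, 0]], [1, 1, 1, 1])
```

With a sentence of length 4 and w = 2, the output has 4 − 2 + 1 = 3 components, matching the |q″| − w + 1 bound above.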

Piecewise Max Pooling
The max-pooling operation passes only the strongest response in each feature map to the next layer and discards the other elements, which removes a great deal of redundant information and makes the network easier to optimize. However, a single max-pooling operation also has drawbacks: it often loses detailed information in the feature maps. To solve this problem and make the model more robust to shifts in the positions of important features, PCNN divides each sentence instance into three parts based on the positions of the two entities and performs max-pooling on each part separately.
After the convolution operation in Section 2.2.1, the feature vector c_i is obtained; dividing the sentence instance into three parts according to the positions of the given entities, it can be represented as c_i = {c_{i1}, c_{i2}, c_{i3}}. Segmented max pooling is then performed on this vector, i.e., p_{ij} = max(c_{ij}), where 1 ≤ i ≤ n and j = 1, 2, 3. The resulting values are concatenated to obtain p_i = [p_{i1}, p_{i2}, p_{i3}] (i = 1, 2, ⋯, n), and concatenating all p_i yields a vector p ∈ R^{3n}. This is the feature vector of each sentence after processing by the PCNN structure.
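For a single feature map, the piecewise pooling step can be sketched as below (a toy illustration with invented positions, not the paper's code); the full layer applies this to each of the n feature maps and concatenates the triples:

```python
def piecewise_max_pool(feature_map, head_pos, tail_pos):
    # split the convolution output at the two entity positions and
    # keep the maximum response of each of the three pieces
    pieces = [feature_map[:head_pos + 1],
              feature_map[head_pos + 1:tail_pos + 1],
              feature_map[tail_pos + 1:]]
    return [max(p) for p in pieces if p]

# toy feature map of length 6; entities at positions 1 and 3
pooled = piecewise_max_pool([1, 5, 2, 7, 3, 4], head_pos=1, tail_pos=3)
```

Unlike a single global max (which would keep only 7 here), the three piecewise maxima preserve where along the sentence the strong responses occurred relative to the entities.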

Multi-sentence combination feature output layer
To automatically filter out noisy sentences that differ significantly from their labels during distant supervision relation extraction, this layer adopts a multiple-instance learning strategy and an intra-bag attention mechanism. It discards all noisy sentences and combines the features of all positive-instance sentences to form the training positive examples for the final classifier. The structure of this layer is shown in Figure 6. Each sentence feature vector within a bag is subjected to attention calculation with the relation query vector to obtain its weight. Sentences with weights lower than the hyperparameter β are filtered out by a threshold gate, and the bag-level vector representation is then formed from the remaining sentences according to their weights. All sentences that contain the given entity pair ⟨e_1, e_2⟩ and have relation label r are collected into the set T. Assuming n sentences meet this requirement, the set can be written as T = {t_1, t_2, t_3, ⋯, t_n}. After the feature vector of each sentence is obtained as in Section 2.2, the vector set P corresponding to T can be written as P = {p_1, p_2, p_3, ⋯, p_n}. Because of the noise problem in distant supervision, each sentence in this set expresses the relation label r to a different degree. Therefore, an intra-bag attention mechanism assigns each sentence a weight reflecting how well it expresses the relation label r. After threshold gating, the weights (α_1, α_2, α_3, ⋯, α_n) of the sentences involved in forming the bag-level representation are calculated using equation (6):

α_i = exp(e_i) / Σ_{j=1}^{n} exp(e_j)    (6)

Here, e_i represents the relevance of the i-th sentence in T to the relation label r, calculated by equation (7):

e_i = p_i · r    (7)

where p_i is the feature vector of the i-th sentence in T and r is the vector representation of the relation label in the semantic space, acting as the query against which each sentence is weighted.
After the intra-bag attention calculation, each sentence in the set T has a weight expressing the relation label r. This paper holds that different sentences in the same set express the relation label r to different degrees, which is reflected in the weight α: positive instances score high on α, while negative instances score low. Based on this assumption, by setting the hyperparameter β, sentence vectors with weights lower than β are filtered out when forming the combined feature vector of multiple sentences, which prevents noise sentences from participating (even with low weights) in the combined feature vector. Assuming m sentences remain in the set T after noise filtering, the combined feature vector of the set is generated by equation (8):

B = Σ_{i=1}^{m} α_i p_i    (8)
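The threshold-gated bag combination can be sketched end to end. This is a minimal illustration under our own toy vectors and a hypothetical renormalization choice (the paper does not spell out whether surviving weights are renormalized; we renormalize so the kept weights sum to 1):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def bag_representation(sentence_vecs, relation_vec, beta=0.05):
    # attention score of each sentence vector against the relation query vector
    scores = [sum(a * b for a, b in zip(v, relation_vec)) for v in sentence_vecs]
    weights = softmax(scores)
    # threshold gate: drop sentences whose weight falls below beta (treated as noise)
    kept = [(w, v) for w, v in zip(weights, sentence_vecs) if w >= beta]
    total = sum(w for w, _ in kept)  # renormalize over surviving sentences
    dim = len(sentence_vecs[0])
    return [sum((w / total) * v[d] for w, v in kept) for d in range(dim)]

# bag of three toy sentence vectors; the third is clearly off-relation noise
bag = bag_representation([[5.0, 0.0], [4.0, 0.5], [0.0, 5.0]], [1.0, 0.0], beta=0.05)
```

The noisy third sentence receives an attention weight below β and is excluded outright, so it contributes nothing to the bag vector, rather than contributing a small but non-zero amount as in plain intra-bag attention.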

Relation Classification Layer
For the set T in Section 2.3, whose distant supervision relation label is known, a softmax layer is applied in the relation classification layer to compute the probability distribution over relations for the combined feature vector of the set. Assuming the combined feature vector of the i-th set T_i is denoted B_i, the probability distribution of the relation obtained by passing B_i through the softmax layer is given by equation (9):

p(r | B_i; θ) = softmax(W_s B_i + b_s)    (9)

Here, W_s ∈ R^{h×3n}, where h represents the number of pre-defined relations.

Optimization
The model parameters to be optimized are θ = (E, PF_1, PF_2, W, W_s), where E represents the word embeddings, PF_1 the position vectors of words relative to the head entity, PF_2 the position vectors of words relative to the tail entity, W the parameters of the convolution operation, and W_s the parameters of the relation classification layer. The cross-entropy loss function used in this model is defined in equation (10):

J(θ) = − Σ_{i=1}^{N} log p(r_i | B_i; θ)    (10)

where N is the number of sentence sets and B_i represents the combined feature vector of the i-th sentence set. For parameter updates, Li et al. [25] compared four common optimizers on the handwritten-digit MNIST dataset and the FASHION dataset; among them, the Adam optimizer performed well, so Adam is used as the optimizer for the model in this paper. Adam combines the first-order gradient moment of SGD-M with the second-order gradient moment of RMSprop, taking into account both the mean and the variance of the gradient, and adds two bias-correction terms. The update rule is given in equations (11)-(13):

m_t = β_1 m_{t−1} + (1 − β_1) g_t,   v_t = β_2 v_{t−1} + (1 − β_2) g_t²    (11)

m̂_t = m_t / (1 − β_1^t),   v̂_t = v_t / (1 − β_2^t)    (12)

θ_t = θ_{t−1} − η m̂_t / (√v̂_t + ε)    (13)

Here, m̂_t denotes the bias-corrected first-moment estimate and v̂_t the bias-corrected second-moment estimate, β_1, β_2 ∈ [0, 1) are the decay rates of the first- and second-moment estimates, and η denotes the learning rate.
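Equations (11)-(13) translate directly into code. The sketch below updates a single scalar parameter with the standard Adam defaults; it illustrates the bias correction, not the paper's training loop:

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # one Adam update for a single scalar parameter (t is the 1-based step count)
    m = b1 * m + (1 - b1) * grad          # first-moment estimate, eq. (11)
    v = b2 * v + (1 - b2) * grad * grad   # second-moment estimate, eq. (11)
    m_hat = m / (1 - b1 ** t)             # bias correction, eq. (12)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)  # eq. (13)
    return theta, m, v
```

On the very first step the bias correction exactly undoes the (1 − β) scaling, so with a unit gradient the parameter moves by almost exactly the learning rate, regardless of β_1 and β_2.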

Experimentation and Evaluation
To demonstrate the effectiveness of the proposed method, this section presents comparative experiments and ablation experiments that examine its advantages from different perspectives.

Dataset and Evaluation Metrics
The NYT-10 dataset was released by Riedel et al. [12], and many scholars at home and abroad have conducted research on distant supervision relation extraction using this dataset [27]. We adopt the held-out evaluation method for the proposed relation extraction model, evaluating its performance through the precision-recall (PR) curve and P@N (Precision@Top N).
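P@N in held-out evaluation is simply the fraction of correct predictions among the N highest-confidence ones. A minimal sketch (the list of booleans stands in for predictions already sorted by model confidence):

```python
def precision_at_n(ranked_correctness, n):
    # ranked_correctness: True/False per prediction, sorted by descending confidence
    top = ranked_correctness[:n]
    return sum(top) / len(top)

# toy ranking: the first two predictions are correct, the third is not
scores = [True, True, False, True]
```

Sweeping n over the whole ranking and recording precision alongside recall at each cutoff is what produces the PR curves reported later.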

Parameter Settings
In this study, we tested the performance of the model on the test dataset by adjusting parameters such as the maximum length of training sentences, polynomial decay learning rate, hyperparameters, and batch size.The other parameters were the same as those used by Lin et al. [26].Table 1 shows the main parameters used in the experiments of this study.

Comparative experimental results and analysis
To evaluate the proposed method on the NYT-10 dataset, we selected several classic baseline methods for comparison through held-out evaluation. The compared baseline methods are:
• Mintz [6]: Mintz first proposed the idea of distant supervision, combining the advantages of supervised and unsupervised information extraction.
• MultiR [28]: This model, proposed by Hoffmann et al., combines a sentence-level extraction model with a simple corpus-level component for aggregating single facts.
• PCNN+MAX [10]: This method, proposed by Zeng, trains instances with the maximum logistic regression value.
• PCNN+ATT (Sentence-level Selective Attention Model) [26]: This is an improved model based on the PCNN model, proposed by Lin et al., which uses sentence-level attention mechanism.
• PCNN+MIL [10]: This method, proposed by Zeng, combines the advantages of multiinstance learning and the PCNN model.
• PCNN+RL [31]: This method, proposed by Jun Feng et al., applies reinforcement learning to instance selectors to choose high-quality sentences for training the relation classifier.
• APCNNS [15]: This is an extraction method that combines PCNN with entity information, proposed by Ji.
• BGWA [29]: This method, proposed by Jat S et al., uses a word-level attention mechanism for relation extraction.

Although the traditional feature-based methods can complete relation extraction tasks, their experimental results are relatively poor due to the interference of too much noise, which further highlights the importance of denoising when extracting features from large-scale data. (2) Among the models using sentence-level attention, the results of PCNN+RL, PCNN+MAX, and PCNN+MIL are inferior to the PCNN+ATT model, because PCNN+ATT not only makes full use of the information provided by multiple instance sentences but also lowers the weight of noisy sentences to reduce their negative impact throughout training. (3) The APCNNS model, which provides additional entity information, enriches instance features and compensates for noise interference using external knowledge, yielding an improvement over the PCNN+RL model. (4) The proposed PCNN+FGSI model keeps noisy sentences out of the combined feature vector through threshold gating, thereby avoiding their interference. Compared with the BGWA model, the proposed model focuses not only on single words but also on segments composed of multiple words; by judging the semantics of each segment and then magnifying its important semantic information through intra-sentence attention, the internal noise of the training sentence instances is further reduced. Across the entire recall range, the PCNN+FGSI model proposed in this paper achieved the highest precision.
Table 2 compares the P@N values of the proposed relation extraction method and the baseline models. Among all baselines, the BGWA model shows the slowest decline in precision. Although the proposed PCNN+FGSI model does not decline as slowly as BGWA, it performs best on the reported metrics: its average precision is 8 percentage points higher than that of the PCNN+ATT model, which further validates the advantages of the proposed method.

Ablation experiment results and analysis
To investigate the role of the fine-grained semantic information in the text embedding layer, this section presents an ablation experiment. The control group (CG) is the proposed model (PCNN+FGSI), while the experimental group (EG) uses a regular text embedding layer. Figure 9 shows the PR curves of the two groups. As can be seen, the control group performs best on the PR curve, and the performance of the experimental group drops when the regular text embedding layer is used. This is because the text embedding layer based on fine-grained semantic information highlights the semantic information that expresses entity relationships in positive instances, enabling the model to learn this fine-grained information and thereby construct more robust feature vectors.
This paper also uses P@N to compare the two groups, as shown in Table 3. The experimental group with the regular text embedding layer shows a decrease in the P@N (N=100/200/300) metrics compared with the control group. This is consistent with the conclusion drawn from the PR curves, indicating that the text embedding layer based on fine-grained semantic information helps improve model performance.

Conclusion
This paper proposes a distant supervision relation extraction model based on fine-grained semantic information. To reduce the interference of noisy data in the distant supervision dataset and make full use of the fine-grained semantic information in training sentences, the model first works within the sentence, finding the fine-grained semantic information in each segment that reflects the labeled relation, reducing the interference of irrelevant semantics and forming a single-sentence feature vector. It then calculates the relevance of each sentence to the label within the same bag, screens positive instances through a threshold gate, and discards all noisy sentences, forming a high-quality combined feature vector to train the classifier. Extensive experiments show that the proposed model outperforms the baseline models on P@N and other metrics.

Figure 2 :
Figure 2: Structure diagram of the text embedding layer based on fine-grained semantic information.

Figure 3 :
Figure 3: Distribution of Semantics in Space

Figure 4 :
Figure 4: Example of relative distance. The model looks up the relative distance of each word w_i to the two entities and maps the two distances to two d_p-dimensional real-valued vectors (d_i^{e_h}, d_i^{e_t}). For each training sentence, the word embedding and the position embeddings are concatenated to obtain the sentence matrix S = [s_1, s_2, ⋯, s_n] ∈ R^{n×d}, where s_i = [w_i; d_i^{e_h}; d_i^{e_t}], n denotes the length of the sentence, and d is the dimension after concatenating the word and position embeddings, i.e., d = d_w + d_p × 2.

Figure 6 :
Figure 6: Multi-sentence combination feature output layer.

The dataset [32] is aligned with relations in Freebase: sentences obtained from the news corpus of 2005-2006 are used as the training set, while sentences from the 2007 news corpus are used as the test set. The dataset contains 53 relation types, including the special type "NA", which indicates that there is no relation between the two entities. In both the training and test sets, "NA" accounts for the largest proportion of sentences. We set the maximum sentence length in the dataset to 256; Figure 7 shows the distribution of sentence lengths in NYT-10, where most sentence lengths are concentrated within [20, 60].

Figure 8: Precision-recall curves of PCNN+FGSI and the baseline models.

From Figure 8, it can be observed that: (1) traditional relation extraction methods, such as Mintz and MultiR, are feature-based approaches that do not consider the noise problem in the dataset.

Figure 9 :
Figure 9: PR curves of the experimental group and the control group

Figure 5: Single-sentence feature output layer (vector representation of word and position embeddings → convolution → piecewise max pooling → sentence feature vector).
The final embedding of the input sentence is q″ = {q_1, q_2, ⋯, q_{|q″|}}, where q_i denotes the embedding vector of the i-th word in the sentence and q_i ∈ R^d. In this paper, q″_{j:j+w−1} denotes the horizontal concatenation of the embedding sequence [q_j, q_{j+1}, ⋯, q_{j+w−1}] in the sentence, and w represents the length of the filter. The weight matrix of the filter is denoted W ∈ R^{w×d}. The convolution operation filters the embedding representation of the sentence with this filter and obtains a vector c ∈ R^{|q″|−w+1}, as shown in equation (4):

c_j = W ⊗ q″_{(j−w+1):j}    (4)

where 1 ≤ j ≤ |q″| − w + 1. During convolutional feature extraction, different filter kernels are needed to extract feature information at various positions of the sentence instance. Therefore, n different filter kernels are used, with corresponding weight matrices Ŵ = {W_1, W_2, ⋯, W_n}. All convolution operations in the feature extraction process can then be represented by equation (5):

c_{ij} = W_i ⊗ q″_{(j−w+1):j}    (5)

Table 1: Parameter settings

Table 2: P@N comparison of PCNN+FGSI and the baseline models

Table 3: P@N comparison of the experimental group and the control group