Figure 4: Example of the modified Snowball algorithm pipeline. | Scientific Data

From: Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction

To train the algorithm, a training corpus is parsed to find all occurrences of the manually generated seed tuples. Each tuple is a quaternary relation extracted from a sentence. The phrases containing these tuples are then clustered using the similarity measure given in Equation 3. Each phrase cluster generates an extraction pattern: the phrase most similar to all others within the cluster. To learn new relationships from previously unseen text, candidate sentences are compared with the extraction patterns and scored by their level of similarity. A resulting relationship is accepted provided its confidence is above a pre-determined threshold.
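
The pipeline described in the caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Equation 3 is not reproduced here, so a simple token-overlap (Jaccard) similarity stands in for it, the single-pass clustering rule is a guess at one reasonable scheme, and the phrases, the masked tokens `CEM`, `VALUE` and `UNIT`, and both thresholds are made up for the example.

```python
from typing import List


def similarity(a: str, b: str) -> float:
    """Stand-in similarity: Jaccard overlap of lowercase tokens.

    ASSUMPTION: this replaces the measure in Equation 3, which is not
    given in the caption.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def cluster_phrases(phrases: List[str], min_sim: float) -> List[List[str]]:
    """Single-pass clustering (assumed scheme): a phrase joins the first
    cluster whose first member is similar enough, else starts a new one."""
    clusters: List[List[str]] = []
    for phrase in phrases:
        for cluster in clusters:
            if similarity(phrase, cluster[0]) >= min_sim:
                cluster.append(phrase)
                break
        else:
            clusters.append([phrase])
    return clusters


def extraction_pattern(cluster: List[str]) -> str:
    """Pick the phrase with the highest total similarity to the cluster."""
    return max(cluster, key=lambda p: sum(similarity(p, q) for q in cluster))


def score(candidate: str, patterns: List[str]) -> float:
    """Confidence of a candidate: its best similarity to any pattern."""
    return max((similarity(candidate, pat) for pat in patterns), default=0.0)


def extract(candidates: List[str], patterns: List[str], threshold: float) -> List[str]:
    """Keep candidates whose confidence reaches the threshold."""
    return [c for c in candidates if score(c, patterns) >= threshold]


# Hypothetical seed phrases with entities masked as CEM / VALUE / UNIT.
seed_phrases = [
    "the curie temperature of CEM is VALUE UNIT",
    "the curie temperature of CEM was VALUE UNIT",
    "CEM orders antiferromagnetically below VALUE UNIT",
]
clusters = cluster_phrases(seed_phrases, min_sim=0.6)
patterns = [extraction_pattern(c) for c in clusters]

# Hypothetical unseen sentences, masked the same way.
candidates = [
    "the curie temperature of CEM is about VALUE UNIT",
    "samples of CEM were annealed for VALUE UNIT",
]
accepted = extract(candidates, patterns, threshold=0.7)
print(accepted)
```

In this toy run the first candidate closely matches a learned pattern and is kept, while the annealing sentence shares only the masked tokens and falls below the threshold. A real system would also carry the tuple fields through each step and weight patterns by their reliability, which this sketch omits.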
