Topological properties and organizing principles of semantic networks

Interpreting natural language is an increasingly important task in computer algorithms due to the growing availability of unstructured textual data. Natural Language Processing (NLP) applications rely on semantic networks for structured knowledge representation. The fundamental properties of semantic networks must be taken into account when designing NLP algorithms, yet they have not been investigated systematically. We study the properties of semantic networks from ConceptNet, defined by 7 semantic relations from 11 different languages. We find that semantic networks have universal basic properties: they are sparse, highly clustered, and many exhibit power-law degree distributions. Our findings show that the majority of the considered networks are scale-free. Some networks exhibit language-specific properties determined by grammatical rules. For example, networks from highly inflected languages, such as Latin, German, French and Spanish, show peaks in the degree distribution that deviate from a power law. We find that, depending on the semantic relation type and the language, the link formation in semantic networks is guided by different principles. In some networks the connections are similarity-based, while in others the connections are more complementarity-based. Finally, we demonstrate how knowledge of similarity and complementarity in semantic networks can improve NLP algorithms in missing link inference.


Introduction
Due to the explosive increase in the availability of digital content, the demand for computers to efficiently handle textual data has never been greater. Large amounts of data and improved computing power have enabled a vast amount of research on Natural Language Processing (NLP). The goal of NLP is to allow computer programs to interpret and process unstructured text. In computers, text is represented as a string, while in reality, human language is much richer than just a string. People relate text to various concepts based on previously acquired knowledge. To effectively interpret the meaning of a text, a computer must have access to a considerable knowledge base related to the domain of the topic 1.
Semantic networks can represent human knowledge in computers, as first proposed by Quillian in the 1960s 2,3. 'Semantic' means 'relating to meaning in language or logic' and a semantic network is a graph representation of structured knowledge. Such networks are composed of nodes, which represent concepts (e.g., words or phrases), and links, which represent semantic relations between the nodes 4,5. The links are tuples of the format (source, semantic relation, destination) that encode knowledge. For example, the information that a car has wheels is represented as (car, has, wheels). Figure 1 shows a toy example of a semantic network as the subgraph with the neighborhood around the node car. The past two decades have witnessed a rise in the importance of NLP applications 6-8. For instance, Google introduced Google Knowledge Graph to enhance their search engine results 9. A knowledge graph is a specific type of semantic network, in which the relation types are more explicit 10,11. Voice assistants and digital intelligence services, such as Apple Siri 12 and IBM Watson 13, use semantic networks as a knowledge base for retrieving information 14,15. As a result, machines can process information in raw text, comprehend unstructured user input, and achieve the goal of communicating with users, all up to a certain extent. Recently, OpenAI made a great leap forward in user-computer interaction with InstructGPT, better known to the general public as ChatGPT 16.
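The tuple format described above can be illustrated with a minimal Python sketch. Only the triple (car, has, wheels) comes from the text; the other triples and the helper name `neighborhood` are hypothetical and serve purely as illustration.

```python
from collections import defaultdict

# Toy triples in the (source, semantic relation, destination) format.
# Apart from (car, has, wheels), the entries are illustrative assumptions.
TRIPLES = [
    ("car", "has", "wheels"),
    ("car", "is a", "vehicle"),
    ("engine", "part of", "car"),
    ("bicycle", "has", "wheels"),
]

def neighborhood(triples, node):
    """Return the links that touch `node`, grouped by semantic relation."""
    by_relation = defaultdict(list)
    for source, relation, destination in triples:
        if node in (source, destination):
            by_relation[relation].append((source, destination))
    return dict(by_relation)

# Links in the neighborhood of the node 'car'
car_links = neighborhood(TRIPLES, "car")
```

Grouping by relation mirrors the way the networks in this study are separated by link type; a real knowledge base would of course hold far more triples than this toy list.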
Language is a complex system with diverse grammatical rules. To grasp the meaning of a sentence, humans leverage their natural understanding of language and concepts in contexts. Language is still poorly understood from a computational perspective and, hence, it is difficult for computers to utilize similar strategies. Namely, machines operate under unambiguous instructions that are strictly predefined and structured by humans. Though we can argue that human languages are structured by grammar, these grammatical rules often prove to be ambiguous 17. In computer languages, by contrast, there are no synonyms, namesakes, or tones that can lead to misinterpretation 18. Computers rely on external tools to enable the processing of the structure and meaning of texts.
In this paper, we conduct systematic analyses of the topological properties of semantic networks. Our work is motivated by the following purposes:
• Understand fundamental formation principles of semantic networks.
In many social networks, connections between nodes are driven by similarity 19-22. The more similar two nodes are in terms of common neighbors, the more likely they are connected. Thanks to the intensive study of similarity-based networks, many successful tools of data analysis and machine learning were developed, such as link prediction 23 and community detection 24. These tools may not work well for semantic networks, because words in a sentence do not necessarily pair together because of similarity. Sometimes, two words are used in conjunction because they have complementary features. Therefore, we study the principles that drive the formation of links in semantic networks.
• Document language-specific features.
Languages vary greatly between cultures and across time 25. Two languages that originate from two different language families can differ in many types of features since they are based on different rules. It is natural to conjecture that there exist diverse structures in semantic networks for different languages.
• Better inform NLP methods.
Although there have been numerous real-world NLP applications across various domains, existing NLP technologies still have limitations 26. For example, processing texts from a language where single words or phrases can convey more than one meaning is difficult for computers 27,28. Existing, successful algorithms built on top of semantic networks are usually domain-specific, and designing algorithms for broader applications remains an open problem. To design better language models that can handle challenges such as language ambiguity, we first need to gain a better understanding of the topological properties of semantic networks.
Previous studies on semantic networks focused on a few basic properties and relied on multiple datasets with mixed semantic relations, which we discuss in detail in the 'Related work' section. As a result, it is difficult to compare results within one study and between different studies. To our knowledge, there has been no systematic and comprehensive analysis of the topological properties of semantic networks at the semantic relation level.
To sum up, the main objective of this paper is to understand the structure of semantic networks. Specifically, we first study the general topological properties of semantic networks from a single language with distinct semantic relation types. Second, we compare semantic networks with the same relation type between different languages to find language-specific patterns. In addition, we investigate the roles of similarity and complementarity in the link formation principles in semantic networks.
The main contributions of this paper include: triangles and quadrangles in the graphs, following a recent work by Talaga and Nowak 29. Hereby, we show to what extent these networks are similarity- or complementarity-based. We find that the connection principles in semantic networks are mostly related to the type of semantic relation, not the language origin.
This paper is organized in the following manner: In the 'Related work' section, we provide a brief overview of the previous work on the properties of semantic networks. In the 'General properties of semantic networks' section, we study the general topological properties of seven English semantic networks. In the 'Language-specific properties' section, we compare the properties of semantic networks between 11 different languages. The section 'Similarity and complementarity in semantic networks' deals with the fundamental connection principles in semantic networks. We measure and compare the structural similarity and complementarity in the networks in this study and we discuss the patterns that arise. Finally, we summarize our conclusions and findings and give recommendations for future research in the 'Discussion' section.

Related work
Due to the growing interest in semantic networks, related studies were carried out in a wide range of different fields. Based on our scope, we focus on two main aspects in each work: (i) the topological properties that were analyzed in the study and the dataset that was used in the analysis, and (ii) the universal and language-specific patterns that were found and discussed.
The majority of the literature on semantic networks is centered around three link types: co-occurrence, association and semantic relation. In a co-occurrence network, sets of words that co-occur in a phrase, sentence or text form a link. For association networks, participants in a cognitive-linguistic experiment are given a word and asked to give the first word that they think of. There are several association datasets; one example is the University of South Florida Free Association Norms 30. Semantic relations are defined by professionals such as lexicographers; typical examples are synonymy, antonymy, hypernymy and homonymy. The specific instances of the semantic relations are also defined by the lexicographers or extracted computationally from text corpora.
In 2001, Ferrer-i-Cancho and Solé 31 studied undirected co-occurrence graphs constructed from the British National Corpus dataset 32. They measured the average distance between two words and observed the small-world property, which was found in many real-world networks 33. Motter et al. 34 analyzed an undirected conceptual network constructed from an English Thesaurus dictionary 35. They focused on three properties: sparsity (small average degree), average shortest path length and clustering. That same year, Sigman and Cecchi 36 studied undirected lexical networks extracted from the noun subset of WordNet 37, where the nodes are sets of noun synonyms. They grouped networks by three semantic relations: antonymy, hypernymy and meronymy. A detailed analysis of the characteristic length (the median minimal distance between pairs of nodes), degree distributions and clustering of these networks was provided. Semantic networks were also found to possess the small-world property of sparse connectivity, short average path length, and strong local clustering 34,36.
Later, Steyvers and Tenenbaum 38 performed statistical analysis of 3 kinds of semantic networks: word associations 30, WordNet and Roget's Thesaurus 39. Apart from the above-mentioned network properties, they also considered network connectedness and diameter. They pointed out that the small-world property may originate from the scale-free organization of the network, which exists in a variety of real-world systems 40-42.
As for patterns across different languages, Ferrer-i-Cancho et al. 43 built syntactic dependency networks from corpora (collections of sentences) for three European languages: Czech, German and Romanian. They showed that networks from different languages have many non-trivial topological properties in common, such as the small-world property, a power-law degree distribution and disassortative mixing 44.
Existing studies have identified some general network properties in semantic networks such as the small-world property and power-law degree distributions. However, the datasets used in these studies are often different, sometimes even within the same study, rendering direct comparison of results difficult. Some used associative networks generated from experiments and some studied thesauri that were manually created by linguists. In addition, most of the research performed consists of coarse-grained statistical analyses. Specifically, different semantic relations were sometimes treated as identical and the subset of included nodes was often limited (e.g., only words and no phrases or only nouns). Further, there are only very few studies on semantic networks from languages other than English.
Therefore, our analyses focus on semantic networks with different semantic relations (link types) from a single dataset. We consider networks defined by a specific link type, make these networks undirected and unweighted and compare the structural properties between networks with different link types. In addition, we apply similar analyses to semantic networks with the same link type across different languages. Furthermore, we investigate the roles that similarity and complementarity play in the formation of links in semantic networks.
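The construction of one undirected, unweighted network per link type can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `undirected_network`, the toy triples, and the choice to drop self-loops are hypothetical and are not taken from the study's actual preprocessing.

```python
from collections import defaultdict

def undirected_network(triples, relations):
    """Collect the links whose relation is in `relations` as an undirected,
    unweighted simple graph (a set of neighbors per node)."""
    adjacency = defaultdict(set)
    for source, relation, destination in triples:
        # Assumption: self-loops are discarded.
        if relation in relations and source != destination:
            adjacency[source].add(destination)   # direction is dropped
            adjacency[destination].add(source)   # repeated links collapse in the set
    return dict(adjacency)

# Hypothetical toy triples
triples = [
    ("car", "Has-A", "wheel"),
    ("wheel", "Part-Of", "car"),   # same node pair, different link type
    ("car", "Is-A", "vehicle"),
    ("car", "Is-A", "vehicle"),    # repeated link, counted once
]

has_a = undirected_network(triples, {"Has-A"})
union = undirected_network(triples, {"Has-A", "Part-Of", "Is-A"})
num_links_union = sum(len(nbrs) for nbrs in union.values()) // 2
```

Storing neighbors in sets makes the network unweighted by construction, since a node pair linked through several relation types or several repeated triples still contributes a single link.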

General properties of semantic networks
To gain an understanding of the structure of semantic networks, we first study their general topological properties. We introduce the main characteristics of the dataset that we use throughout this study, ConceptNet 45, in the section 'Data' in SI. Next, we list the semantic relations that define the networks in this study in Supplementary Table S1. The overview of the semantic networks is given in Supplementary Table S2. In this section, we compute various topological properties of these networks related to connectedness, degree, assortative mixing and clustering. We summarize the overall descriptive statistics of the semantic networks in Supplementary Table S3.

Connectedness
We measure the connectedness of a network by the size of the largest connected component and the size distribution of all connected components. The complete component size distributions of the English semantic networks are shown in Supplementary Figure S2. Supplementary Table S4 lists the sizes of the largest connected components (LCCs) in absolute numbers as well as relative to the network size. The same statistics are computed after degree-preserving random rewiring of the links for comparison 44. The purpose of random rewiring is to estimate the value of a graph metric that could be expected by chance, solely based on the node degrees (see SI for details on the rewiring process).
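A minimal sketch of both steps, the LCC size and a degree-preserving randomization, is given below. The rewiring shown here uses double-edge swaps, a common choice; whether it matches the exact procedure described in the SI is an assumption, and all function names and the toy edge list are hypothetical.

```python
import random

def node_degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def largest_component_size(edges):
    """Number of nodes in the largest connected component."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen, best = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:                      # depth-first traversal of one component
            node = stack.pop()
            size += 1
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        best = max(best, size)
    return best

def rewire(edges, num_swaps, seed=0):
    """Double-edge swaps (a, b), (c, d) -> (a, d), (c, b).
    Swaps that would create self-loops or multi-links are rejected,
    so every node keeps its degree."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(num_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:            # pick the orientation of the second link at random
            c, d = d, c
        if len({a, b, c, d}) < 4:
            continue
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue
        present.difference_update({frozenset((a, b)), frozenset((c, d))})
        present.update({frozenset((a, d)), frozenset((c, b))})
        edges[i], edges[j] = (a, d), (c, b)
    return edges

# Toy network: two triangles and one isolated link
edges = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4), (7, 8)]
rewired = rewire(edges, num_swaps=100)
lcc_before = largest_component_size(edges)
lcc_after = largest_component_size(rewired)
```

Comparing `lcc_before` with `lcc_after` (ideally averaged over several seeds) gives the kind of chance baseline described above. Note that nodes without any link do not appear in an edge list and would need to be counted separately when reporting relative LCC sizes.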
Based on the percentages of nodes in the LCC, none of the seven semantic networks is fully connected. The networks 'Is-A', 'Related-To' and 'Union' are almost fully connected given that their LCCs contain over 90% of nodes. Networks 'Has-A', 'Part-Of', 'Antonym' and 'Synonym' are largely disconnected, with the percentages of nodes in their LCCs ranging from 22% to 65%. Most of the rewired networks are more connected than the corresponding original networks, especially networks 'Antonym' and 'Synonym'. In other words, the majority of our semantic networks are less connected than what could be expected by chance. For networks 'Related-To' and 'Union', the percentage of nodes in the LCC remains almost unchanged, while the 'Is-A' network is more connected than expected.

Degree distribution
Figure 2 shows that, visually, the densities Pr[D = k] of the degree distributions of our seven English semantic networks all approximately follow power laws in the tail. A more rigorous framework for assessing power laws was proposed by Voitalov et al. 46, who consider networks to have a power-law degree distribution if Pr[D = k] = ℓ(k) k^{−γ} for a slowly varying function ℓ(k); see the section 'Consistent power-law exponent estimators' in SI. Figure 2 includes the estimates of γ based on the slopes of the densities Pr[D = k] on a log-log scale, along with the three consistent estimators from the framework of Voitalov et al. 46,47. According to these estimators, the degree sequences of 5 out of the 7 networks are power-law. The degree sequences of the 'Synonym' and 'Antonym' networks are hardly power-law because at least one of the estimates satisfies γ > 5; therefore, the estimated exponents are not listed.
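To make the idea of a tail-based exponent estimate concrete, the sketch below implements the classical Hill estimator, which uses only the largest observations. It is an illustration under stated assumptions: the number of tail observations `num_tail` is fixed by hand here, the function name is hypothetical, and the sketch is not claimed to reproduce the estimation procedure of the SI.

```python
import math
import random

def hill_gamma(degrees, num_tail):
    """Hill estimate of gamma in Pr[D = k] ~ k^(-gamma), based on the
    `num_tail` largest values. With tail index xi, gamma = 1 + 1/xi."""
    ordered = sorted(degrees, reverse=True)
    if not 0 < num_tail < len(ordered):
        raise ValueError("num_tail must satisfy 0 < num_tail < len(degrees)")
    threshold = ordered[num_tail]        # the (num_tail + 1)-th largest value
    xi = sum(math.log(k / threshold) for k in ordered[:num_tail]) / num_tail
    if xi == 0:
        raise ValueError("tail is constant; the exponent is undefined")
    return 1.0 + 1.0 / xi

# Synthetic check on continuous Pareto samples with density exponent 2.5
rng = random.Random(1)
true_gamma = 2.5
sample = [(1.0 - rng.random()) ** (-1.0 / (true_gamma - 1.0)) for _ in range(20000)]
estimate = hill_gamma(sample, num_tail=2000)
```

On real degree sequences the result depends strongly on how many tail observations are used, which is why a principled, data-driven choice of that number matters in practice.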
For most networks, the estimated exponent γ lies between 2 and 3. Therefore, most semantic networks are scale-free, except for the 'Synonym' and 'Antonym' networks. In the literature, semantic networks were also found to be highly heterogeneous 38,48. Moreover, the word frequencies in several modern languages were found to follow power laws 49. In the section 'Language-specific properties', we will see that the 'Synonym' and 'Antonym' networks in most considered languages are hardly power-law or not power-law networks.
The heterogeneity in the degree distribution seems natural for networks such as the 'Is-A' network: there are many specific or unique words with a small degree that connect to only a few other words, while there are also a few general words that connect to almost anything, resulting in a large degree. Examples of general words with a large degree are 'plant' and 'person', while specific words like 'neotectonic' and 'cofinance' have a small degree. Our results show that many semantic networks have power-law degree distributions, like many other types of real-world networks 50-52.

Degree assortativity
A number of measures have been established to quantify degree assortativity, such as the degree correlation coefficient ρ_D and the Average Nearest Neighbor Degree (ANND) 44. Figure 3 shows the ANND as a function of the degree k for four selected networks, together with the values after random rewiring and the degree correlation coefficient ρ_D. Refer to Supplementary Fig. S3 for the ANND plots of all networks. The randomized networks with preserved degree distribution have no degree-degree correlation. As a result, the function ANND does not vary with k. The randomized networks serve as a reference for the expected ANND values when the links are distributed at random. We find that most semantic networks are disassortative, as ANND is a decreasing function of the degree k and the degree correlation coefficient ρ_D is negative. These networks are 'Has-A', 'Part-Of', 'Is-A', 'Related-To' and 'Union'. In disassortative networks, nodes with larger degrees (general words) tend to connect to nodes with smaller degrees (less general words). This is not surprising. Indeed, if we use these relations in a sentence, then we often relate specific words to more general words. For example, we say 'horse racing is a sport', in which 'horse racing' is a very specific phrase while 'sport' is more general.
Figure 2 (caption). The degree sequences of the networks 'Antonym' and 'Synonym' were estimated to be hardly power-law because at least one of the estimates satisfies γ > 5. The data are logarithmically binned to suppress noise in the tails of the distributions; see the section 'Power-law degree distributions' in SI for details on how the power-law densities are processed and on the power-law exponent estimation procedures.
On the other hand, network 'Synonym' is assortative, as the function ANND increases in the degree k. This indicates that large-degree nodes (general words) connect to nodes that have similar degree (words with the same generality). The same applies to network 'Antonym'. Although the degree correlation is not very pronounced, as reflected by the small correlation coefficient ρ_D = −0.005, we still see a slight upward trend in the curve of ANND.
The function ANND of a rewired network is no longer degree-dependent, as shown by the orange curves in Figure 3. The curve is almost flat for 'Synonym' and 'Related-To'. At larger degrees k, the curve may drop slightly, as for large-degree nodes there are not enough nodes of equal degree to connect to.
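Both assortativity measures discussed in this section can be computed directly from an edge list. The following sketch is a plain-Python illustration with hypothetical function names; it takes ρ_D to be the Pearson correlation of the degrees at the two ends of a link, which is the usual definition but is stated here as an assumption.

```python
from collections import defaultdict

def degrees_of(edges):
    """Degree of every node in an undirected edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def annd_by_degree(edges):
    """Average nearest neighbor degree, averaged over all nodes of degree k."""
    deg = degrees_of(edges)
    neighbor_degree_sum = defaultdict(int)
    for u, v in edges:
        neighbor_degree_sum[u] += deg[v]
        neighbor_degree_sum[v] += deg[u]
    per_degree = defaultdict(list)
    for node, k in deg.items():
        per_degree[k].append(neighbor_degree_sum[node] / k)
    return {k: sum(vals) / len(vals) for k, vals in sorted(per_degree.items())}

def degree_correlation(edges):
    """Pearson correlation of the degrees at the two ends of a link."""
    deg = degrees_of(edges)
    # Each undirected link is counted in both orientations, so both
    # coordinates share the same mean and variance.
    xs = [deg[u] for u, v in edges] + [deg[v] for u, v in edges]
    ys = [deg[v] for u, v in edges] + [deg[u] for u, v in edges]
    n = len(xs)
    mean = sum(xs) / n
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    # The coefficient is undefined for regular graphs; 0.0 is returned by convention.
    return cov / var if var > 0 else 0.0

# A star is the extreme disassortative case: one hub linked to low-degree leaves
star = [("hub", leaf) for leaf in ("a", "b", "c", "d")]
annd = annd_by_degree(star)
rho = degree_correlation(star)
```

In the star example, ANND decreases with the degree and the correlation coefficient is negative, which is the signature of disassortative mixing described above.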

Clustering coefficient
In networks such as social networks, the neighbors of a node are likely to be connected as well, a phenomenon which is known as clustering 33,53. If a person has a group of friends, there is a high chance that these friends also know each other. These networks are characterized by many triangular connections.
Figure 4 shows the average clustering coefficient c_G(i) of nodes with degree d_i = k for four English networks. Refer to Supplementary Fig. S4 for the clustering coefficients of all seven networks. All networks have small clustering coefficients in absolute terms, which, in combination with the small average degree E[D], indicates a local tree-like structure. We find that the networks 'Part-Of', 'Antonym' and 'Synonym' have substantially larger clustering coefficients than their rewired counterparts: there are more triangles in these networks than expected by chance. On the other hand, the network 'Has-A' has lower clustering coefficients c_G(i) than the randomized network, which suggests that the 'Has-A' network is organized in a different way than the other networks. As for the networks 'Is-A', 'Related-To' and 'Union', the clustering coefficients c_G(i) are similar to those of their corresponding rewired networks.
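The degree-dependent clustering curve can be sketched as follows, assuming the standard local clustering coefficient (the number of links among a node's neighbors divided by the number of neighbor pairs). The function names, the convention of assigning 0 to nodes of degree below 2, and the toy graph are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

def local_clustering(adjacency):
    """Local clustering coefficient of every node."""
    coefficients = {}
    for node, nbrs in adjacency.items():
        k = len(nbrs)
        if k < 2:
            coefficients[node] = 0.0     # assumption: no neighbor pairs means 0
            continue
        closed = sum(1 for u, v in combinations(nbrs, 2) if v in adjacency[u])
        coefficients[node] = 2.0 * closed / (k * (k - 1))
    return coefficients

def clustering_by_degree(adjacency):
    """Average local clustering coefficient over all nodes of degree k."""
    per_degree = defaultdict(list)
    for node, c in local_clustering(adjacency).items():
        per_degree[len(adjacency[node])].append(c)
    return {k: sum(vals) / len(vals) for k, vals in sorted(per_degree.items())}

# Toy graph: a triangle a-b-c with a pendant node d attached to a
adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
by_degree = clustering_by_degree(adjacency)
```

Applying the same function to a degree-preserving rewired copy of a network gives the baseline against which an excess or a deficit of triangles can be judged.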
In summary, we find that English semantic networks have power-law degree distributions and most are scale-free, which coincides with the results in previous studies 38,48. In addition, semantic networks with different link types show different levels of degree assortativity and average local clustering. Most works in the literature have identified high clustering coefficients in semantic networks 34,36,38,48. This encourages us to further investigate the organizing principles of these semantic networks, which we will discuss explicitly in the 'Similarity and complementarity in semantic networks' section.

Language-specific properties
Up to this point, we have only considered English semantic networks, while there are thousands of other languages in the world. In this section, we consider semantic networks from 10 other languages contained within ConceptNet: French, Italian, German, Spanish, Russian, Portuguese, Dutch, Japanese, Finnish and Chinese. We group the 11 languages in total based on their language families and again study the topological properties of 7 different semantic relations per language. Finally, we observe peculiarities in the degree distribution densities of the 'Related-To' networks in some languages, which we later explain by grammar.

Language classification
In linguistics, languages can be partitioned in multiple different ways. Mainly, there are two kinds of language classifications: genetic and typological. The genetic classification groups languages according to their level of diachronic relatedness, where languages are categorized into the same family if they evolved from the same root language 54. An example is the Indo-European family, which includes the Germanic, Balto-Slavic and Italic languages 55.
One popular typological classification distinguishes isolating, agglutinating and inflecting languages. It groups languages in accordance with their morphological word formation styles. A morph or morpheme (the Greek word μορφή means 'outer shape, appearance', from which the English 'form' is derived) is the basic unit of a word, such as a stem or an affix 56. For instance, the word 'undoubtedly' consists of three morphs: 'un-', 'doubted' and '-ly'. In an isolating language, like Mandarin Chinese, each word contains only a single morph 54. In contrast, words from an agglutinating language can be divided into morphs with distinctive grammatical categories like tense, person and gender. In an inflecting language, however, there is no exact match between morphs and grammatical categories 54. A word changes its form based on different grammar rules. Most Indo-European languages belong to the inflecting category.
Based on these two types of classifications, we have selected 11 languages to cover different language types, see Supplementary Table S5. Typologically, Chinese is an isolating language, while Japanese and Finnish are agglutinating languages. The rest of the languages under consideration (8 out of 11) belong to the inflecting category. Genetically, French, Italian, Spanish and Portuguese belong to the Italic family, while English, German and Dutch are Germanic. Russian is a Balto-Slavic language, Japanese is Transeurasian, Chinese is Sino-Tibetan and Finnish belongs to the Uralic family. We mainly refer to the typological classification throughout our analyses.

Overview of semantic networks from eleven languages
For every language, we construct seven undirected semantic networks with the link types 'Has-A', 'Part-Of', 'Is-A', 'Related-To', 'Union', 'Antonym' and 'Synonym'. Due to missing data in ConceptNet, only three languages have the 'Has-A' relation. For the remaining languages, the 'Union' network is the union of three link types: 'Part-Of', 'Is-A' and 'Related-To'. In this section, we provide an overview of the numbers of nodes N and numbers of links L of the semantic networks. Again, we restrict our study to the LCCs of these networks.
Regarding the numbers of nodes N, the networks 'Related-To' and 'Union' are generally the largest networks in a language, with the French 'Union' network being the absolute largest with N = 1,296,622, as denoted in Supplementary Table S6. Nevertheless, there are many small networks with size N < 100, particularly for the 'Part-Of' and 'Synonym' networks.
Similar to the English semantic networks, we observe that most networks with more than 100 nodes are sparse. All networks have an average degree between 1 and 6, which is small compared to the network size. Consider the Dutch 'Is-A' network for example, where a node has about 5 connections on average, which is only 2.45% of 191 nodes in the whole network. Supplementary Table S7 lists the average degree E[D] of all our semantic networks.

Degree distribution
Many of the semantic networks in the 11 languages have degree distributions that are approximately power laws. We estimate the power-law exponents only for networks with size N > 1000 because we require a sufficient number of observations to estimate the power-law exponent γ. Supplementary Table S8 lists the estimated power-law exponents γ using the same 4 methods as in Fig. 2 for each semantic network. Refer to the section 'Power-law degree distributions' in SI for details on these estimation procedures.
We find that many networks have power laws in their degree distributions and many of those networks are scale-free (2 < γ < 3). The Chinese 'Related-To' network even has a power-law exponent γ < 2. The degree distributions of all 'Synonym' and 'Antonym' networks are hardly or not power laws, however. The likely reason for this is that nodes in these networks generally have smaller degrees than in other networks. As a result, the slope of the degree distribution is steeper and the distribution is therefore not classified as a power law. This is not unexpected: a given word has only a limited number of synonyms or antonyms, so there are few nodes with high degrees. Another interesting finding is that the densities of the degree distributions of the 'Related-To' and 'Union' networks for French, Spanish, Portuguese and Finnish show notable deviations from a straight line in the log-log plot, which we discuss in depth in the next section.

Language inflection
In some languages, the densities of the degree distributions of the 'Related-To' and 'Union' networks show deviations from a straight line on a log-log scale. An example is the Spanish 'Related-To' network in Fig. 5a, where we observe a peak in the tail of the distribution. To find the cause of the anomaly in the degree distribution, we inspect the words with a degree k located in the peak, referred to as peak words, and their neighbors. Supplementary Table S9 lists a few examples of the peak words, which are almost all verbs and have similar spellings. The links adjacent to these nodes with higher-than-expected degrees might be the result of grammatical inflections of the same root words since Spanish is a highly inflected language. We observe a similar anomaly in the degree distributions of French, Portuguese and Finnish 'Related-To' and 'Union' networks. In Supplementary Table S6 we saw that the network 'Union' is mostly composed of 'Related-To' in these four languages, therefore we restrict the analysis to the 'Related-To' networks.
Two common types of language inflection are conjugation, the inflection of verbs, and declension, the inflection of nouns. The past tense of the verb 'to sleep' is 'slept', an example of conjugation in English. The plural form of the noun 'man' is 'men', an example of declension. The languages Spanish, Portuguese and French are much richer in conjugations than English, while Finnish is rich in declensions.

Part-of-speech tags
In the ConceptNet dataset, only some of the nodes are part-of-speech (POS) tagged with one of four types: verb, noun, adjective and adverb. For French, Spanish and Portuguese, the percentage of verbs in the peak is larger than in the LCC, while for Finnish the percentage of nouns in the peak is larger than in the LCC, see Supplementary Table S10. Remarkably, 100% of the Portuguese peak words are verbs. Most neighbors of the peak words are verbs for Spanish (97%), Portuguese (99%) and French (87%), while most neighbors of the peak words are nouns for Finnish (90%), see Supplementary Table S11. This strengthens our belief that the abnormal number of nodes with a certain degree k is related to language inflection in these four languages.

Merging of word inflections
To investigate whether the peaks in the degree distribution densities are indeed related to word inflections, we leverage the 'Form-Of' relation type in ConceptNet, which connects two words A and B if A is an inflected form of B, or B is the root word of A 57. We merge each node and its neighbors from the 'Form-Of' network (its inflected forms) into a single node in the 'Related-To' network, as depicted in Supplementary Figure S5. Figure 5 shows the degree distribution densities of the 'Related-To' networks before and after node merging. The range of the anomalous peak in the density of the degree distribution is highlighted in yellow. In each panel, the number of grammatical variations m coincides with the center of the peak. As seen in Fig. 5a, the peak completely disappears in the Spanish 'Related-To' network after node merging, thus the peak is explained entirely by connections due to word inflections. We also observe significant reductions in the heights of the peaks for Portuguese and Finnish 'Related-To' networks. However, for the French 'Related-To' network there is only a slight reduction in height after merging, which we believe is likely due to the poor coverage of the French 'Form-Of' network, which contains only 17% of the words in the peak. In contrast, the Spanish 'Form-Of' network covers 97% of the Spanish peak words, while for Portuguese and Finnish approximately 50% of the peak words are covered, see Supplementary Table S12.
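The merging step can be sketched as a node contraction. The version below is a minimal illustration under stated assumptions: it groups words with a union-find structure so that chains of 'Form-Of' links end up in one group, and it discards self-loops and multi-links after merging. The function name and the toy Spanish data are hypothetical, and the sketch is not claimed to be the exact procedure of Supplementary Figure S5.

```python
def merge_inflections(related_to, form_of):
    """Contract every word with its 'Form-Of' neighbors into one node.

    related_to: iterable of undirected links (u, v)
    form_of:    iterable of (inflected_form, root_word) pairs
    Returns the merged links as a set of frozensets
    (no self-loops, no multi-links).
    """
    parent = {}

    def find(word):
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]   # path halving
            word = parent[word]
        return word

    for inflected, root in form_of:
        parent[find(inflected)] = find(root)      # the root word's group absorbs the form

    merged = set()
    for u, v in related_to:
        ru, rv = find(u), find(v)
        if ru != rv:                              # links inside one group vanish
            merged.add(frozenset((ru, rv)))
    return merged

# Toy data: three present-tense forms of the Spanish verb 'amar'
form_of = [("amo", "amar"), ("amas", "amar"), ("ama", "amar")]
related_to = [
    ("amo", "amar"), ("amas", "amar"), ("ama", "amar"),
    ("amar", "querer"), ("amo", "querer"),
]
merged = merge_inflections(related_to, form_of)
```

In this toy example the node 'amar' has degree 4 before merging and degree 1 afterwards, which illustrates how links between a root word and its inflected forms inflate the degree and why merging can remove a peak in the degree distribution.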

The number of inflections
In a language, the number of distinct conjugations of regular verbs is determined by the number of different pronouns and the number of verb tenses, which are grammatical time references 58. In Spanish, there are 6 pronouns and 9 simple verb tenses, resulting in at most m = 6 × 9 = 54 distinct verb conjugations 59,60. Table 1 exemplifies these 54 different conjugations for the verb 'amar', which means 'to love'. There are also irregular verbs that follow different, idiosyncratic grammatical rules, but the majority of the verbs in Spanish are classified as regular, like in most inflecting languages. The number of grammatical variations m = 54 coincides with the center of the peak in Figure 5a. Like Spanish, Portuguese has m = 54 distinct conjugated verb forms 61. In French, there are m = 6 × 7 = 42 distinct verb conjugations 62. In Finnish, there are in total 15 noun declensions or cases with distinct spelling, each having singular and plural forms, resulting in m = 15 × 2 = 30 different forms of a Finnish noun 63. Supplementary Table S13 lists the number of grammatical variations m in French, Spanish, Portuguese and Finnish, along with the minimum and maximum degrees k_min and k_max where the peak starts and ends. Fig. 5 confirms that the number of grammatical variations m coincides with or is close to the center of the peak.
In summary, we observe anomalies in the degree distributions of 'Related-To' networks from the inflecting languages Spanish, French and Portuguese and the agglutinating Finnish. Because of grammatical structures, root words in these languages share many links with their inflected forms, resulting in more nodes with a certain degree than expected. While Finnish is typologically classified as agglutinating, it still has many noun declensions, suggesting that the agglutinating and inflecting language categories are not mutually exclusive.

Similarity and complementarity in semantic networks
Although we have identified several universal characteristics of semantic networks, we also observe notable differences in some of their properties. The clustering coefficient in some semantic networks, for instance, is greater than expected by chance, while in other semantic networks, e.g., the English 'Has-A' network, the clustering coefficient is smaller than expected by chance.
We hypothesize that these semantic networks are organized according to different principles. It is commonly known that one such principle is similarity: all factors being equal, similar nodes are more likely to be connected. Similarity is believed to play a leading role in the formation of ties in social networks and lies at the heart of many network inference methods. At the same time, recent works indicate that many networks may be organized predominantly according to the complementarity principle, which dictates that interactions are preferentially formed between nodes with complementary properties. Complementarity has been argued to play a significant role in protein-protein interaction networks 64 and production networks 65. In addition, a geometric complementarity framework for modeling and learning complementarity representations of real networks was recently formulated by Budel and Kitsak 66. This section aims to assess the relative roles of complementarity and similarity mechanisms in different semantic networks. We utilize the method by Talaga and Nowak 29. The method assesses the relative roles of the two principles by measuring the relative densities of triangle and quadrangle motifs in the networks. Intuitively, the transitivity of similarity (A similar to B and B similar to C implies A similar to C) results in a high density of triangles 20,67,68 (Supplementary Figure S6a). The non-transitivity of complementarity, on the other hand, suppresses the appearance of triangles but enables the appearance of quadrangles in networks 64,69.

We measure and compare the density of triangles and quadrangles with the structural similarity and complementarity coefficients using the framework of Talaga and Nowak 29. After computing the densities of triangles and quadrangles, the framework assesses their significance by comparing the densities to those of configuration models built with matching degree distributions; see the SI for a summary. As a result of the assessment, the network of interest is quantified by two normalized structural coefficients corresponding to complementarity and similarity.
Figure 6 depicts the relative roles of complementarity and similarity in 50 semantic networks. We observe that semantic networks cluster according to semantic relation type rather than language family, indicating that the type of semantic relation matters more for the organizing principles of a semantic network than its language.
Based on the calibrated complementarity and similarity values, we can categorize most semantic networks as (i) predominantly complementarity-based, (ii) predominantly similarity-based, and (iii) networks where both complementarity and similarity are substantially present.
We observe four clustering patterns in Figure 6.
• Cluster 1 (light blue): the 'Synonym' networks are characterized by stronger similarity than complementarity values. This observation is hardly surprising since 'Synonym' networks link words with similar meanings. Since similarity is transitive, the 'Synonym' networks contain significant numbers of connected node triples, leading to large clustering coefficients.
• Cluster 2 (red): the 'Antonym' networks, as observed in Fig. 6, belong to the upper triangle of the scatter plot plane, hence complementarity is more prevalent in these networks than similarity. This observation is as expected, as antonyms are word pairs with opposite meanings that complement each other. In our earlier work 66 we learned a geometric representation of the English 'Antonym' network demonstrating that antonyms indeed complement each other.
More surprisingly, some antonym networks are characterized by substantial similarity values, implying the presence of triangle motifs. This happens because there are groups of three or more words in which each word is opposite in meaning to all other words in the group. One example is the triple of words (man, woman, girl). Each pair of words in the triple is opposite in meaning along a certain dimension, here either gender or age.
• Cluster 3 (purple): the 'Has-A' networks show more complementarity than similarity. Intuitively, words in 'Has-A' complement one another. For instance, 'a house has a roof' describes a complementary relation and these two objects are not similar to one another.
• Cluster 4 (dark blue/yellow): Most of the 'Related-To' and 'Union' networks show more similarity, except for Chinese.
As defined in the 'Related-To' relation, words are connected if there is any sort of positive relationship between them, therefore triangles are easily formed. One exception to that rule is that the Chinese 'Related-To' network (dark blue cross) shows the strongest complementarity of all networks and lower-than-expected similarity. We find that a possible explanation is that the Chinese language has many measure words that are connected to a wide range of nouns. Measure words, also known as numeral classifiers, are used in combination with numerals to describe the quantity of things 70,71. For example, the English 'one apple' translates to 一个苹果 in Chinese, where the measure word 个 (gè) must be added as a unit of measurement between the number 'one', 一 (yī), and the noun 'apple', 苹果 (píng guǒ). The Chinese measure word 个 can be loosely translated to English as 'unit(s) of', as in 'one unit of apple'. This grammatical construct is comparable to the phrase 'one box of apples' in English, where 'box' serves as a measure word, but, contrary to Chinese, measure words are rare in English. In the Chinese 'Related-To' network, there are many measure words that are connected to multiple nouns, and these nouns may have no connection with each other at all. Most of the measure words are not connected with each other either. Hence, the pairings of measure words and nouns lead to quadrangles, a likely explanation of why the Chinese 'Related-To' network shows the highest complementarity.
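To make the measure-word argument concrete, the following minimal sketch counts triangles and quadrangles in two toy graphs. The node names are hypothetical placeholders rather than ConceptNet entries, and quadrangles are counted without diagonal links, which is an assumption about the counting convention.

```python
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_triangles(adj):
    """Number of node triples that are pairwise connected."""
    return sum(1 for a, b, c in combinations(sorted(adj), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])


def count_quadrangles(adj):
    """Number of 4-cycles a-b-c-d-a without the diagonals a-c and b-d."""
    twice = 0
    for a, c in combinations(sorted(adj), 2):
        if c in adj[a]:
            continue  # a and c must be opposite, non-adjacent corners
        common = sorted(adj[a] & adj[c])
        twice += sum(1 for b, d in combinations(common, 2) if d not in adj[b])
    return twice // 2  # each cycle is found once per pair of opposite corners


# Toy 'measure word' pattern: two measure words, each linked to three unrelated nouns.
measure_words = build_graph([(m, n) for m in ("measure_1", "measure_2")
                             for n in ("noun_a", "noun_b", "noun_c")])
# Toy similarity pattern: three mutually related words.
related_triple = build_graph([("w1", "w2"), ("w2", "w3"), ("w1", "w3")])

print(count_triangles(measure_words), count_quadrangles(measure_words))
print(count_triangles(related_triple), count_quadrangles(related_triple))
```

In the measure-word toy graph every pair of nouns shares both measure words, so quadrangles appear while no triangle can form.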

Discussion
In summary, we have conducted an exploratory analysis of the topological properties of semantic networks with 7 distinct semantic relations arising from 11 different languages. We identified both universal and unique characteristics of these networks. We find that semantic networks are sparse and that many are characterized by a power-law degree distribution. We also find that many semantic networks are scale-free. We observe two different patterns of degree-degree mixing in these networks: some networks are assortative, while others are disassortative. In addition, we find that most networks are more clustered than expected.
On the other hand, some semantic networks ('Related-To' in French, Spanish, Portuguese, and Finnish) have unique features that can be explained by rules of grammatical inflection. Because of the grammar in these languages, words have many conjugations or declensions. We have related anomalous peaks in the degree distributions to the language inflections. Notably, we found word inflection not only in inflecting languages but also in Finnish, which is an agglutinating language.
We have also quantified the relative roles of complementarity and similarity principles in semantic networks. The proportions of similarity and complementarity in networks differ depending on the semantic relation type. For example, the 'Synonym' networks exhibit stronger similarity, while the links in the 'Antonym' networks are primarily driven by complementarity. In addition, the Chinese 'Related-To' network has the highest structural complementarity coefficient of all networks, which we attribute to a unique grammatical phenomenon in Chinese: measure words.
Through the analysis of the topological properties of semantic networks, we found that complementarity may play an important role in their formation. Since most of the state-of-the-art network inference methods are either built on or inspired by the similarity principle, we call for a careful re-evaluation of these methods when it comes to inference tasks on semantic networks. One basic example is the prediction of missing links. In a seminal work, Kovács et al. 64 demonstrated that protein interactions should be predicted with complementarity-tailored methods. We expect that similar methods may be appropriate for semantic networks. Instead of using the triangle closure principle, one might benefit from methods based on quadrangle closure (Figure 7). It is not as easy to illustrate quadrangle closure in network embedding methods or NLP methods in general. A plethora of methods use multiple modules and parameters in learning tasks and can, in principle, be better optimized for the complementarity structure of semantic networks. In our recent work, we propose a complementarity learning method and apply it to several networks, including the 'Antonym' semantic network 66.
Recent groundbreaking advances in large language models are attributed to the multi-head attention mechanism of the Transformer, which uses ideas consistent with the complementarity principle 72. We advocate that a better understanding of statistical mechanisms underlying semantic networks can help us improve NLP methods even further.
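The contrast between triangle closure and quadrangle closure in missing-link prediction can be sketched as follows. This is an illustration only: the two scores are plain path counts (common neighbors versus paths of length three) without the degree normalization used in published methods, and the toy graph and node names are hypothetical.

```python
def build_graph(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangle_closure_score(adj, u, v):
    """Number of common neighbors: adding u-v would close this many triangles."""
    return len(adj[u] & adj[v])


def quadrangle_closure_score(adj, u, v):
    """Number of simple paths u-x-y-v: adding u-v would close this many 4-cycles."""
    return sum(1 for x in adj[u] for y in adj[v]
               if x != v and y != u and x != y and y in adj[x])


# Toy complementarity-like graph: links only run between the 'a' and 'b' groups.
toy = build_graph([("a1", "b1"), ("a1", "b2"), ("a2", "b1")])

# Candidate link across the groups versus a candidate link inside one group.
for u, v in [("a2", "b2"), ("a1", "a2")]:
    print(u, v, triangle_closure_score(toy, u, v), quadrangle_closure_score(toy, u, v))
```

In this toy graph the cross-group candidate (a2, b2) receives no support from common neighbors but is supported by a path of length three, while the within-group candidate (a1, a2) shows the opposite pattern.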

Clustering coefficient
We investigate clustering in semantic networks by measuring the clustering coefficient $c_G(i)$ of a node i, which equals the ratio of the number y of links between the neighbors of i to the maximum possible number of such links, as defined by Watts and Strogatz 33,42,
$$c_G(i) = \frac{2y}{d_i (d_i - 1)},$$
where $d_i$ is the degree of node i. The graph clustering coefficient $c_G$ is the average over all node clustering coefficients,
$$c_G = \frac{1}{N} \sum_{i=1}^{N} c_G(i).$$
We also calculate the average clustering coefficient $c_G(i)$ of nodes with degree $d_i = k$. In addition, we calculate $c_G(i)$ after random rewiring for comparison.
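A minimal sketch of these three quantities on a toy graph is given below. Assigning a clustering coefficient of 0 to nodes with fewer than two neighbors is an assumption made here for illustration; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def node_clustering(adj, i):
    """Fraction of neighbor pairs of i that are themselves linked."""
    d = len(adj[i])
    if d < 2:
        return 0.0  # assumption: undefined cases are set to zero
    y = sum(1 for u, v in combinations(adj[i], 2) if v in adj[u])
    return 2 * y / (d * (d - 1))


def graph_clustering(adj):
    """Average of the node clustering coefficients over all nodes."""
    return sum(node_clustering(adj, i) for i in adj) / len(adj)


def clustering_by_degree(adj):
    """Average node clustering coefficient of nodes with degree k, for each k."""
    groups = defaultdict(list)
    for i in adj:
        groups[len(adj[i])].append(node_clustering(adj, i))
    return {k: sum(vals) / len(vals) for k, vals in groups.items()}


# Toy graph: a triangle (0, 1, 2) with one pendant node 3 attached to node 2.
toy = build_graph([(0, 1), (1, 2), (0, 2), (2, 3)])
print(graph_clustering(toy))
print(clustering_by_degree(toy))
```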

Yes bird → wing Manual + Automatic
Part-Of A is a part of B. This is the part meronym relation in WordNet.

Is-A
A is a subtype or a specific instance of B; every A is a B. This can include specific instances; the distinction between subtypes and instances is often blurry in language. This is the hyponym relation in WordNet.

Related-To
The most general relation. There is some positive relationship between A and B, but ConceptNet can't determine what that relationship is based on the data.

No learn ↔ erudition Manual + Automatic
Antonym A and B are opposites in some relevant way, such as being opposite ends of a scale, or fundamentally similar things with a key difference between them. Counterintuitively, two concepts must be quite similar before people consider them antonyms. This is the antonym relation in WordNet.

No black ↔ white Automatic
Synonym A and B have very similar meanings. They may be translations of each other in different languages. This is the synonym relation in WordNet.

No sunlight ↔ sunshine Automatic
Table S1. Definition of the six relations and related information from ConceptNet 57.

Overview of the English semantic networks
In Table S2, we list basic descriptive statistics of the seven semantic networks: the number of nodes N, the number of links L, the maximum degree $d_{\max}$ and the average degree E[D]. Based on the number of nodes, network 'Has-A' is the smallest (N = 7,503) and network 'Union' is the largest (N = 677,426). The number of links L ranges from 5,421 to 1,819,646. Relative to the network sizes, all 7 networks have a small average degree, ranging from 1.45 to 5.43. For instance, in network 'Part-Of', a node is connected on average to only 2 (0.02%) of the 11,839 nodes in total. In other words, the number of links is of the same order as the number of nodes, which indicates that semantic networks are sparse.
Table S2. Basic statistics of the seven English semantic networks extracted from ConceptNet.
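The sparsity argument rests on simple arithmetic, sketched below. Pairing the smallest node count (N = 7,503, 'Has-A') with the smallest link count (L = 5,421) is an assumption made for illustration, since Table S2 itself is not reproduced here; the 'Part-Of' figures are the ones quoted in the text.

```python
def average_degree(num_nodes: int, num_links: int) -> float:
    """Average degree E[D] = 2L / N of an undirected network."""
    return 2 * num_links / num_nodes


def link_density(num_nodes: int, num_links: int) -> float:
    """Fraction of all possible node pairs that are actually linked."""
    return 2 * num_links / (num_nodes * (num_nodes - 1))


# Assumed pairing of the smallest N and smallest L quoted in the text.
has_a_avg_degree = average_degree(7_503, 5_421)
has_a_density = link_density(7_503, 5_421)

# 'Part-Of': an average of about 2 neighbors out of 11,839 nodes.
part_of_share_percent = 100 * 2 / 11_839

print(round(has_a_avg_degree, 2), has_a_density, round(part_of_share_percent, 2))
```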

Descriptive statistics semantic networks
Table S3 shows the overall descriptive statistics of the English semantic networks: the number of nodes N, the number of links L, the maximum degree $d_{\max}$, the average degree E[D], the average nearest neighbor degree (ANND), the graph clustering coefficient $c_G$ and the estimated power-law exponents γ. We rewire all semantic networks using the methods described before, after which the same statistics are calculated for the rewired networks.
For networks obtained by degree-preserving rewiring, only the ANND and the graph clustering coefficient c G change.The average nearest neighbor degree ANND becomes smaller for all randomized semantic networks, except for the 'Synonym' network.
All networks except the 'Has-A' network have a remarkably larger graph clustering coefficient $c_G$ (at least by an order of magnitude) than the randomized networks. Because in random networks links are randomly distributed, there are fewer triangles. On the contrary, the randomized networks of 'Has-A' exhibit a clustering coefficient more than seven times larger than their original networks.
Table S3. Statistics of the LCCs of seven English semantic networks extracted from ConceptNet. A cross (×) indicates that the degree sequence of the corresponding network is not or hardly power-law.
In summary, we find universalities across semantic networks from different languages in the degree distribution, degree assortativity, clustering, sparsity and connectedness. Most semantic networks have power-law degree distributions and most of them are scale-free networks. There are two types of degree mixing patterns in semantic networks: assortative and disassortative. Most networks have higher average clustering coefficients than expected by chance, except for one network, the network 'Has-A', which shows lower clustering. All semantic networks have high sparsity. Most networks have a single connected component containing the majority of the nodes, except for the network 'Has-A', which is more fragmented.
Table S4. Number of nodes in the LCCs of the seven English networks in the original and rewired networks. The LCC sizes of the rewired networks are each the average over 10 rewiring realizations with standard deviation shown.

Degree-preserving network rewiring
Degree-preserving network rewiring randomly rewires the links between nodes without changing the node degrees. To preserve the degrees of all nodes, we randomly select one pair of links (four nodes) and swap the endpoints of these two links. Figure S1 illustrates the rewiring method. To make sure that all links are likely to be rewired at least once, we repeat the random selection of links T times, where we choose T = 4L, four times the number of links. The pseudocode is provided in Algorithm 1.
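The rewiring step described above can be sketched in Python as follows. This is not Algorithm 1 itself: rejecting swaps that would create self-loops or duplicate links, and the random choice between the two possible swap orientations, are assumptions added here, and the function and parameter names are hypothetical.

```python
import random
from collections import Counter


def degree_preserving_rewire(edges, swaps_per_link=4, seed=None):
    """Attempt T = swaps_per_link * L random endpoint swaps on an undirected simple graph."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    for _ in range(swaps_per_link * len(edge_list)):
        i, j = rng.sample(range(len(edge_list)), 2)  # pick two distinct links
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible swap orientations
        if len({a, b, c, d}) < 4:
            continue  # assumption: skip swaps that would create a self-loop
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue  # assumption: skip swaps that would duplicate a link
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


def degree_sequence(edges):
    """Degree of every node that appears in the edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3), (1, 4)]
rewired = degree_preserving_rewire(toy_edges, seed=1)
print(degree_sequence(rewired) == degree_sequence(toy_edges))
```

Each accepted swap replaces links (a, b) and (c, d) by (a, d) and (c, b), so every node keeps its degree.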

Distribution of connected components
We compute all connected components for each network and count the occurrence of the different component sizes. The results are presented in Figure S2. Overall, almost every network has a large connected component that is several orders of magnitude larger than the other connected components, except for the network 'Has-A', which has multiple larger connected components. Hence, network 'Has-A' is more fragmented, having three relatively larger connected components, where the node with the largest degree is not in the LCC but in the second largest one. We inspected each of these three connected components and find that each of the components has a distinct theme. For example, the component with the largest degree node contains all kinds of disease names. We believe that the fragmentation is caused by the partial automatic creation of the dataset.
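Counting component sizes can be sketched with a breadth-first search, as below. The toy graph and the function names are hypothetical and only illustrate the procedure.

```python
from collections import Counter, deque


def build_graph(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def connected_components(adj):
    """All connected components as node sets, largest first."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        components.append(component)
    return sorted(components, key=len, reverse=True)


toy = build_graph([("a", "b"), ("b", "c"), ("c", "d"), ("e", "f"), ("g", "h")])
components = connected_components(toy)
size_counts = Counter(len(c) for c in components)  # component size -> number of occurrences
print(len(components[0]), dict(size_counts))
```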

Descriptive statistics of the semantic networks from different languages
This section shows the descriptive statistics of semantic networks from the eleven languages. Each property is compared among the seven networks for the eleven languages.

Language classifications
Table S5 shows the typological and genetic classifications of the eleven considered languages.

Germanic: English, Dutch, German
Sino-Tibetan: Chinese
Uralic: Finnish
Table S5. Genetic and typological language classifications of the eleven languages.

Overview statistics of semantic networks from different languages
Table S6 shows the number of nodes of each semantic network in the eleven different languages. A blank element in the table indicates that the network does not exist, i.e., a relation is not available in that language.

Estimated power-law exponents for semantic networks from different languages
Table S8 lists the estimated power-law exponents γ for each semantic network in the eleven languages. We consider a network to not have a power-law degree distribution if it is not or hardly power-law according to the method of Voitalov et al. 46.

Node merging procedure
First, we extract the network 'Form-Of' in the same way as for all other networks. Then we merge each root word with its neighbors in the 'Form-Of' network (its inflected forms) and treat the merged group of words as a single word in the 'Related-To' network in the same language. Next, we calculate the number of nodes with degree k in the new 'Related-To' network. Finally, we plot the densities of the degree distributions of the French, Spanish, Portuguese and Finnish networks.
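The merging procedure can be sketched as follows. This is a simplified illustration: each inflected form is mapped to a single root, chains of forms are not followed, and the toy 'Related-To' links are hypothetical examples rather than ConceptNet data.

```python
from collections import Counter


def merge_inflected_forms(related_to_edges, form_of_edges):
    """Collapse every inflected form onto its root and rebuild the 'Related-To' links."""
    root_of = {inflected: root for inflected, root in form_of_edges}
    merged = set()
    for u, v in related_to_edges:
        ru, rv = root_of.get(u, u), root_of.get(v, v)
        if ru != rv:  # links inside a merged group disappear
            merged.add(frozenset((ru, rv)))
    return merged


def degrees(edges):
    """Degree of every node appearing in a collection of undirected links."""
    deg = Counter()
    for edge in edges:
        u, v = tuple(edge)
        deg[u] += 1
        deg[v] += 1
    return deg


# 'Form-Of' pairs (inflected form, root) for three present-tense forms of 'amar'.
form_of = [("amo", "amar"), ("amas", "amar"), ("ama", "amar")]
# Hypothetical 'Related-To' links around the root word.
related_to = [("amar", "amo"), ("amar", "amas"), ("amar", "ama"), ("amar", "querer")]

before = degrees(related_to)
after = degrees(merge_inflected_forms(related_to, form_of))
print(before["amar"], after["amar"])

# Number of nodes with degree k after merging, as used for the degree distribution.
nodes_per_degree = Counter(after.values())
```

In the toy example the root word loses the links to its own inflected forms, which is the mechanism by which merging removes the peak in the degree distribution.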

Figure S5. Illustration of the merging of words in the 'Related-To' network. After merging a root word and its neighbors, all words in a circle are seen as a single word.
Table S13. The maximum number of grammatical variations m for the grammatical rule of interest in French, Spanish, Portuguese and Finnish. The minimum and maximum degrees $k_{\min}$ and $k_{\max}$ where the peak starts and ends in the densities of the degree distributions of the 'Related-To' networks are included for comparison.

Structural similarity coefficient
For the convenience of the reader, here we summarize the main components of the framework for computing structural similarity and complementarity coefficients by Talaga and Nowak 29. The structural similarity coefficient $s_i$ generalizes the local clustering and closure coefficients. The local clustering coefficient $s_i^W$ of a node i is the classic clustering coefficient. It is defined as the fraction of triples centered at i which can be closed to form a triangle,
$$s_i^W = \frac{2 T_i}{t_i^W}, \qquad (5)$$
where $T_i$ is the number of triangles including i and $t_i^W$ is the number of wedge triples (Fig. S6b), or 2-paths with node i in the middle, e.g., (j, i, k). The definition of the local closure coefficient 74 is given as follows
$$s_i^H = \frac{2 T_i}{t_i^H}, \qquad (6)$$
where $t_i^H$ is the number of head triples (Fig. S6c), i.e., 2-paths starting from node i, such as (i, j, k). Both $s_i^W$ and $s_i^H$ are bounded in the range [0, 1], but they capture different parts of the spectrum of similarity-driven structures 29.
Taking the weighted average of these two coefficients results in a more comprehensive measure of local structure, the structural similarity coefficient 29, which captures the full spectrum of structural similarity. It is defined as
$$s_i = \frac{t_i^W s_i^W + t_i^H s_i^H}{t_i^W + t_i^H} = \frac{4 T_i}{t_i^W + t_i^H}. \qquad (7)$$
The coefficient $s_i = 1$ only if node i is in a fully connected network. The structural similarity coefficient of a whole network G is then the average over all nodes
$$s(G) = \frac{1}{N} \sum_{i=1}^{N} s_i. \qquad (8)$$
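The similarity coefficients of this section can be sketched on a toy graph as below. The sketch assumes that wedge and head triples are counted as ordered 2-paths, so that every triangle at node i closes two wedge triples and two head triples, and it sets undefined ratios to zero; both choices are assumptions of this illustration, not statements about the reference implementation.

```python
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def similarity_coefficients(adj, i):
    """Return (s_W, s_H, s) for node i: clustering, closure and structural similarity."""
    d_i = len(adj[i])
    triangles = sum(1 for u, v in combinations(adj[i], 2) if v in adj[u])
    wedge = d_i * (d_i - 1)                      # ordered 2-paths (j, i, k)
    head = sum(len(adj[j]) - 1 for j in adj[i])  # 2-paths (i, j, k) starting at i
    s_w = 2 * triangles / wedge if wedge else 0.0
    s_h = 2 * triangles / head if head else 0.0
    s = 4 * triangles / (wedge + head) if wedge + head else 0.0
    return s_w, s_h, s


def structural_similarity(adj):
    """Network-level coefficient: average of the node-wise values."""
    return sum(similarity_coefficients(adj, i)[2] for i in adj) / len(adj)


# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
toy = build_graph([(0, 1), (1, 2), (0, 2), (2, 3)])
complete = build_graph([(0, 1), (1, 2), (0, 2)])
print(similarity_coefficients(toy, 2), structural_similarity(toy))
```

For the complete graph on three nodes every node reaches the maximum value of 1, in line with the statement above.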

Structural complementarity coefficient
Analogously, the local quadruples clustering coefficient at node i is defined as the fraction of closed quadruples with i at the second position 29
$$c_i^W = \frac{2 Q_i}{q_i^W}, \qquad (9)$$
where $Q_i$ is the number of quadrangles containing node i and $q_i^W$ is the number of wedge quadruples (Fig. S6e), or 3-paths with i at the second node, e.g., (l, i, j, k). Similarly, the local quadruples closure coefficient of a node i is the fraction of closed quadruples beginning at i
$$c_i^H = \frac{2 Q_i}{q_i^H}, \qquad (10)$$
where $q_i^H$ is the number of head quadruples originating from node i (Fig. S6f). Finally, the structural complementarity coefficient is constructed as the weighted average of the local quadruples clustering and closure coefficients 29
$$c_i = \frac{q_i^W c_i^W + q_i^H c_i^H}{q_i^W + q_i^H} = \frac{4 Q_i}{q_i^W + q_i^H}. \qquad (11)$$
The structural complementarity coefficient $c_i \in [0, 1]$ is proven to be a more general measure than using only $c_i^W$ or $c_i^H$ 29. The maximum $c_i = 1$ is attained only if node i belongs to a fully connected bipartite graph. In a bipartite graph, nodes are divided into two groups, and connections are only formed between groups but not within the same group.
The structural complementarity coefficient of a whole network G is then the average over all nodes:
$$c(G) = \frac{1}{N} \sum_{i=1}^{N} c_i. \qquad (12)$$
Table S14 lists the procedures of how we compute the structural similarity and complementarity coefficients to quantify the density of triangles and quadrangles in a network G, respectively.

Procedure | Network G | Similarity (△) | Complementarity (□)
Step 1 | Wedge triple/quadruple | $s_i^W$, Eq. 5 | $c_i^W$, Eq. 9
Step 1 | Head triple/quadruple | $s_i^H$, Eq. 6 | $c_i^H$, Eq. 10
Step 2 | Node-wise | $s_i$, Eq. 7 | $c_i$, Eq. 11
Step 3 | Network-wise | $s(G) = \frac{1}{N} \sum_{i=1}^{N} s_i$, Eq. 8 | $c(G) = \frac{1}{N} \sum_{i=1}^{N} c_i$, Eq. 12
Step 4 | Calibrated network-wise | $C_G(s)$, Eq. 13 | $C_G(c)$, Eq. 13
Table S14. The procedure of calculating the structural similarity coefficient and complementarity coefficient of a network G.
The calibrated structural coefficients in step 4 are obtained by taking the average log ratio of network-wise coefficient over the coefficient of a sampled network G i , see Eq. 13.
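The node-wise complementarity coefficients (steps 1 to 3 for the quadrangle column) can be sketched as below. The sketch makes explicit assumptions that are not stated in the text: quadrangles are counted without diagonal links, quadruples are ordered 3-paths over four distinct nodes, and undefined ratios are set to zero. It is an illustration, not the reference implementation of the framework.

```python
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def complementarity_coefficients(adj, i):
    """Return (c_W, c_H, c) for node i under the assumptions named above."""
    # Quadrangles i-j-k-l-i without the diagonals j-l and i-k.
    quads = 0
    for j, l in combinations(adj[i], 2):
        if l in adj[j]:
            continue
        quads += sum(1 for k in adj[j] & adj[l] if k != i and k not in adj[i])
    # Wedge quadruples: 3-paths (l, i, j, k) with i at the second position.
    wedge = sum(len(adj[j] - {i, l}) for j in adj[i] for l in adj[i] if l != j)
    # Head quadruples: 3-paths (i, j, k, l) starting at i.
    head = sum(len(adj[k] - {i, j}) for j in adj[i] for k in adj[j] if k != i)
    c_w = 2 * quads / wedge if wedge else 0.0
    c_h = 2 * quads / head if head else 0.0
    c = 4 * quads / (wedge + head) if wedge + head else 0.0
    return c_w, c_h, c


def structural_complementarity(adj):
    """Network-level coefficient: average of the node-wise values."""
    return sum(complementarity_coefficients(adj, i)[2] for i in adj) / len(adj)


# Fully connected bipartite toy graph with groups {a1, a2} and {b1, b2, b3}.
bipartite = build_graph([(a, b) for a in ("a1", "a2") for b in ("b1", "b2", "b3")])
triangle = build_graph([(0, 1), (1, 2), (0, 2)])
print(structural_complementarity(bipartite), structural_complementarity(triangle))
```

The fully connected bipartite toy graph attains the maximum value for every node, while the triangle contains no quadrangle at all.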

Structural coefficients
From the previous section, we learn that similarity-based networks are rich in triangles because of the triangle closure principle. The clustering coefficient is a classic measure of the density of triangles in a network. However, we cannot simply compare the number of triangles and quadrangles between two networks, because these networks have different sizes and degree distributions. We need to reliably calculate the statistics of triangles and quadrangles of a network to quantify similarity and complementarity.
To this end, we rely on recent work on complementarity 29. The structural similarity coefficient is a weighted average of two clustering coefficients based on head and wedge triples (Figs. S6c and S6b). Analogous to the clustering coefficient, we can use structural complementarity measures based on quadrangle closure rules (Fig. S6d). Similarly, the structural complementarity coefficient is a weighted average of two coefficients based on head and wedge quadruples (Figs. S6f and S6e). The detailed procedure of calculating the structural similarity coefficient and complementarity coefficient of a network G is explained in the SI.

Calibration
This section presents the configuration model used to calibrate structural coefficients of semantic networks. The details of the calibration process are provided as well.

Undirected Binary Configuration Model
In this paper, we utilize the Undirected Binary Configuration Model (UBCM) 75 to calibrate the structural coefficients. The UBCM generates a maximum entropy probability distribution over networks under the constraint of an expected degree sequence. It is suitable for undirected and unweighted networks. The resulting maximum entropy distributions are maximally unbiased with respect to any other property 76.

Calibration of structural coefficients
First of all, we calculate one structural coefficient (similarity or complementarity) of a given network G. We denote this coefficient as x(G). Second, we sample R randomized copies $G_i$ of the given network from the configuration model. Then, we calculate the structural coefficient $x(G_i)$ for each sampled network. Finally, we take the average log ratio of x(G) and the $x(G_i)$.
As a result, the calibrated coefficient $C_G(x)$ based on R samples from the configuration model is obtained as follows 29
$$C_G(x) = \frac{1}{R} \sum_{i=1}^{R} \log \frac{x(G)}{x(G_i)}. \qquad (13)$$
The calibrated structural coefficient can be less than, equal to or larger than zero. Consider the calibrated structural similarity coefficient $C_G(s)$ for example:
• $C_G(s) < 0$, the structural similarity coefficient s(G) is smaller than $s(G_i)$ of random networks.
• $C_G(s) = 0$, the structural similarity coefficient is comparable to the ones in random networks.
• $C_G(s) > 0$, the structural similarity coefficient is larger than in random networks.
We do not compute the structural coefficients for networks that have fewer than 100 nodes, because there is a high chance that no triangles or quadrangles exist in the sampled networks, in which case the structural coefficient $x(G_i) = 0$. When $x(G_i) = 0$, Eq. 13 is undefined.
Since the runtime of the algorithm depends on the size of a network and the choice of the number of randomized networks R, we do not compute the structural coefficients for the two largest networks, the French 'Related-To' and 'Union' networks with N > 1,200,000 each, as the computation time would be infeasible. We use R = 500 for most networks; for the two largest remaining networks, English 'Related-To' and 'Union', we set R = 100 to avoid long computation time.
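The last step of the calibration, the average log ratio, can be sketched as a small function. The sketch assumes that the coefficients of the observed network and of the R sampled networks have already been computed, uses the natural logarithm (the base is not specified in the text), and raises an error in the undefined case of a zero coefficient; the function name is hypothetical.

```python
import math


def calibrated_coefficient(x_observed, x_sampled):
    """Average log ratio of the observed coefficient to the coefficients of null-model samples."""
    if x_observed <= 0 or any(x <= 0 for x in x_sampled):
        # A zero coefficient makes the log ratio undefined.
        raise ValueError("calibration is undefined when a coefficient is zero")
    return sum(math.log(x_observed / x) for x in x_sampled) / len(x_sampled)


# Illustrative numbers only: an observed coefficient twice as large as in the samples.
print(calibrated_coefficient(0.2, [0.1, 0.1, 0.1]))
# Equal values give a calibrated coefficient of zero.
print(calibrated_coefficient(0.1, [0.1, 0.1]))
```

A positive value indicates that the observed coefficient exceeds the null-model expectation, and a negative value indicates the opposite, matching the three cases listed above.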

Figure 1. Toy example of a semantic network with six concepts and five semantic relations of four different types.

Figure 2. Degree distribution densities Pr[D = k] of the LCCs of the seven English semantic networks. The data is scaled by powers of 1000 to better visualize the power law in each density. The corresponding estimated power-law exponents γ are shown if there is a power law, $\Pr[D = k] \approx \ell(k) k^{-\gamma}$. The degree sequences of the networks 'Antonym' and 'Synonym' were estimated to be hardly power-law because at least one of the estimated γ > 5. The data are logarithmically binned to suppress noise in the tails of the distributions; see the section 'Power-law degree distributions' in the SI for details on how the power-law densities are processed and on the power-law exponent estimation procedures.

Figure 3. Average nearest neighbor degree (ANND) as a function of degree k and degree correlation coefficient $\rho_D$ of four English semantic networks. (a) Network 'Has-A', (b) Network 'Is-A', (c) Network 'Antonym', (d) Network 'Synonym'. See the SI for the results of all seven networks. Data points in circles are the average ANND of nodes with degree k in a network, triangles represent the data after logarithmic binning, and squares are the average ANND of nodes with degree k in the rewired network. Logarithmic binning is used to better visualize the data.

Figure 4. The average clustering coefficient $c_G(i)$ of nodes with degree $d_i = k$ of four English semantic networks. (a) Network 'Has-A', (b) Network 'Is-A', (c) Network 'Related-To', (d) Network 'Synonym'. See the SI for the results of all seven networks. Data points in circles are the original average local clustering coefficients of nodes with degree $d_i = k$, triangles represent data after logarithmic binning, and squares show the average clustering coefficients of nodes with degree $d_i = k$ (logarithmically binned) in the randomized networks.

Figure 5. Densities Pr[D = k] of the degree distributions of the 'Related-To' networks before and after node merging of inflected forms in (a) Spanish, (b) French, (c) Portuguese and (d) Finnish. The logarithmically binned densities after node merging are shown in orange. The peaks are highlighted in yellow. The vertical black lines indicate the number of grammatical variations m for the relevant grammatical rule in the respective language. In each panel the number of grammatical variations m coincides with the center of the peak.

Figure 6. Calibrated average structural coefficients of the LCCs of the 50 semantic networks from 11 languages. Languages that belong to the same family are marked with similar shapes. Triangles represent Italic, quadrilaterals represent Germanic, circles represent Balto-Slavic, a star represents Transeurasian, a cross represents Sino-Tibetan and a pentagon represents Uralic. The marker size scales logarithmically with the number of nodes N in the network and is further adjusted for visibility. The grey lines at x = 0 and y = 0 indicate the expected coefficients based on the configuration model (see SI). The dashed line at y = x indicates that the structural similarity and complementarity coefficients are equal. Networks in the upper left area (shaded in blue) are more complementarity-based, while networks in the lower right area (shaded in yellow) are more similarity-based. We highlight four clusters of networks using different colors.

Figure 7. Examples of similarity and complementarity in semantic networks. (a) Similarity: triangle closure in the 'Synonym' network. (b) Complementarity: quadrangle closure in the 'Antonym' network.

Figure S1. Illustration of degree-preserving rewiring. By randomly swapping the endpoints of two links (a, b) and (c, d), new links can be constructed without changing the node degrees.

Figure S6. Quadrangles and quadruples in comparison with triangles and triples. Wedge and head triples (or quadruples) differ in the position of node i. Node i in a wedge triple (b) is in the middle, while i in a head triple (c) is at the beginning. Similarly, node i in a wedge quadruple (e) is at the second position, while i is at the beginning of a head quadruple (f).

Table 1. Verb conjugation table for the Spanish verb 'amar' (to love). The 6 pronouns and 9 verb tenses result in a maximum of 54 different conjugated forms.

Table S6. Number of nodes N in the LCCs of the semantic networks from the eleven different languages extracted from ConceptNet. A blank element indicates the corresponding network is not available. The 'Union' network is the union of four networks ('Has-A', 'Is-A', 'Part-Of' and 'Related-To'). Because we display the LCC sizes, for some 'Union' networks, the number of nodes exceeds the sum of the sizes of its four constituent networks.

Table S7. Average degree E[D] in the LCCs of the semantic networks from the eleven different languages extracted from ConceptNet. A blank element indicates the corresponding network is unavailable.

Table S8. Estimated power-law exponents γ for the LCCs of the semantic networks in different languages. A blank element indicates the corresponding network is either unavailable or the number of nodes N < 1000. A cross (×) indicates that the degree sequence of that network is not or hardly power-law.

Table S9. Examples of words in the peak and their neighboring words in the Spanish 'Related-To' network.

Table S10. Percentages of POS tags among peak words and in the LCCs of the 'Related-To' networks of four inflecting languages.

Table S11. The mean and standard deviation (SD) of the percentage of verbs and nouns among the neighbors of peak words in the LCC of the 'Related-To' network in four inflecting languages.

Table S12. The percentages of matched words among the peak words of the LCCs of the 'Related-To' networks in four languages.