Top influencers can be identified universally by combining classical centralities

Information flow, opinion, and epidemics spread over structured networks. When using node centrality indicators to predict which nodes will be among the top influencers or superspreaders, no single centrality is a consistently good ranker across networks. We show that statistical classifiers using two or more centralities are instead consistently predictive over many diverse, static real-world topologies. Certain pairs of centralities cooperate particularly well in drawing the statistical boundary between the superspreaders and the rest: a local centrality measuring the size of a node’s neighbourhood gains from the addition of a global centrality such as the eigenvector centrality, closeness, or the core number. Intuitively, this is because a local centrality may rank highly nodes which are located in locally dense, but globally peripheral regions of the network. The additional global centrality indicator guides the prediction towards more central regions. The superspreaders usually jointly maximise the values of both centralities. As a result of the interplay between centrality indicators, training classifiers with seven classical indicators leads to a nearly maximum average precision function (0.995) across the networks in this study.

Figure 1. Comparing the location of the top nodes as ranked by (left) neighbourhood size, (centre) eigenvector centrality, and (right) SIR spread size at the epidemic threshold for the coauthorship network Arxiv GRQC. The network layout is force-directed. The colour of the nodes in each panel shows the value of that metric: darker nodes have higher centrality values or spread size. The top f = 5% of the nodes in each case are encircled.

Results
We run an empirical study over 60 real-world examples of static network topologies (listed in Table 1 in Methods). The networks are directed, unweighted, and fall into six categories: human social networks (separately, online or offline), human networks formed by professional coauthorship or online communication, computer networks, and physical infrastructure. The influence of a node is the SIR spread size when the node is the seed of diffusion, estimated via Monte Carlo simulation (see Methods). Analyses are shown in this section for the SIR influence at the epidemic threshold $\lambda_c$ for every network; they also hold above the epidemic threshold, at $1.5 \cdot \lambda_c$ (with numerical results for these shown in the Supplementary Information).
We study seven classical centrality indicators and their combinations, as follows.
• Local metrics, simple to compute, reflect the density of a node's neighbourhood: the degree, neighbourhood (the sum of the degrees of direct neighbours), and two-hop neighbourhood (the sum of the degrees of neighbours exactly two hops away).
• The core number results from k-shell decomposition.
• Distance-based centralities, such as closeness and betweenness, reflect the importance of nodes by their link distances in the network. Of these two popular centralities, betweenness showed weak predictiveness in prior studies on the SIR model, both as a ranker of nodes in large networks 5,7 and in combinations with other centralities on small networks 30. We therefore study the closeness centrality here.
• Normalised spectral centralities: PageRank and eigenvector centrality.
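The three local metrics can be computed directly from the adjacency structure. The following is a minimal plain-Python sketch, not the code used in this study. It assumes the following:

- The network is a dict mapping each node to the set of its out-neighbours.
- Degrees are out-degrees, as in Methods.
- "Exactly two hops away" means "at shortest-path distance two".

The function name `local_centralities` is hypothetical.

```python
def local_centralities(out_adj):
    """Return {node: (degree, neighbourhood, two_hop)} for a directed graph.

    out_adj maps every node to the set of its out-neighbours (an assumed
    input format); degrees are out-degrees.
    """
    deg = {u: len(vs) for u, vs in out_adj.items()}
    result = {}
    for u, nbrs in out_adj.items():
        # neighbourhood: sum of the degrees of the direct neighbours
        k_sum = sum(deg[v] for v in nbrs)
        # nodes at shortest-path distance exactly two from u
        two_hop = set()
        for v in nbrs:
            two_hop.update(out_adj[v])
        two_hop -= nbrs
        two_hop.discard(u)
        # two-hop neighbourhood: sum of the degrees of those nodes
        k_2sum = sum(deg[w] for w in two_hop)
        result[u] = (deg[u], k_sum, k_2sum)
    return result


# Small hypothetical example network
g = {0: {1, 2}, 1: {2, 3}, 2: {3}, 3: {0}}
print(local_centralities(g))
```

Both neighbourhood metrics need only the degree table and one or two passes over the out-neighbour sets. This is why they are described as simple to compute.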
The predictive power of single centralities is inconsistent across networks.
We first show that the ability of any one centrality indicator to predict the top spreaders across a large number of network cases is too variable to be of universal practical use. Take a network of N nodes, a fraction f, and the task of selecting the best fN spreaders in the network. The standard ranking method has each centrality rank the nodes in this network; the top fN nodes by this ranking are put forward as the best spreaders [5][6][7][8][9] (see Methods). The predictive power of the degree centrality is shown in Fig. 2, across all networks, at the epidemic threshold. This is measured via the recognition rate (also called recall) r(f), the fraction of correctly identified top spreaders (Eq. 1 in Methods). The 95% confidence interval around r(f) is shown as a shaded area. In Fig. 2, for each of the three categories of networks with the lowest recognition rates at f = 20%, the worst-case network is named. The degree-influence scatterplots, also in Fig. 2, show the reason: a correlation between degree and influence does exist even in these worst cases, but with too wide a variance of influence per degree for accurate ranking.
Compared to the degree, the performance of the core number as a ranker is much less consistent across networks (Fig. 3). The same cause holds for the three worst-case networks marked in the figure: all have few k-shells (between 1 and 5), so the core number by itself is not a discriminative variable for a ranking task. In the very worst case (e.g., Gnutella25), the network has a single k-shell, so predicting the top spreaders by ranking the nodes is the same as a random draw. In Fig. 3, three more networks are marked for which ranking by core number gives good recognition rates at f = 20%, but poor rates when f < 5%. The scatterplots between core number and influence show the cause. The nodes with the highest core number in the Twitter Stanford network are poor spreaders. A prior study focused on the core number found a topological reason for this 8: the most effective core in the network depends not only on its core number, but also on its connectivity to other cores. High core numbers do correlate with wide spreading in some other topologies, such as Twitch ES and US Airports. Even there, the highest core contains many nodes of very variable influence, so the core number alone is not a sufficiently discriminative variable when f is low.
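The k-shell decomposition behind the core number can be sketched by iterative peeling:

1. Repeatedly remove the nodes whose remaining degree is at most k, and assign them core number k.
2. Raise k when no such nodes are left.

This is a minimal sketch on an undirected adjacency dict. How directed links are counted is an assumption here; for example, the network could first be symmetrised. The name `core_numbers` is hypothetical. The ring example shows the degenerate case discussed above: every node lies in the same shell, so the core number cannot separate them.

```python
def core_numbers(adj):
    """k-shell decomposition by iterative peeling.

    adj maps each node to the set of its neighbours in an undirected graph
    (assumed input; directed links would first be symmetrised).
    """
    degree = {u: len(vs) for u, vs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # raise k to the smallest degree left in the remaining subgraph
        k = max(k, min(degree[u] for u in remaining))
        # peel every node whose remaining degree is at most k
        stack = [u for u in remaining if degree[u] <= k]
        while stack:
            u = stack.pop()
            if u not in remaining:
                continue
            remaining.remove(u)
            core[u] = k
            for v in adj[u]:
                if v in remaining:
                    degree[v] -= 1
                    if degree[v] <= k:
                        stack.append(v)
    return core


# A triangle with one pendant node: two shells
tri = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(core_numbers(tri))
# A ring: every node is in the same shell, so ranking by core number is all ties
ring = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(set(core_numbers(ring).values()))
```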
Neither the degree centrality nor the core number is universally better than the other across the network space. The core number can be a more accurate ranker in some cases (Fig. 3 for the core number, as was also found in prior studies on selected topologies 5,6). However, it is a poor predictor in absolute terms when f < 5% for many networks. It is also poor across all f values when the network does not have a strong core structure. For online human networks (categories Ca, Cm, and S in this study), and with f > 5%, Figs. 2 and 3 show the two centralities to be comparable, with the core number marginally better. In general, as recognised before 7-9, the predictive power of the core number is not consistently better than that of the degree centrality for SIR influence. Another popular ranker, the eigenvector centrality, was previously found (on average across a set of networks) to be more predictive than the core number 9. By the summary in Fig. 4, this is the case for low values of f, but there is still a wide variance between networks. In some cases (such as Gnutella24 and Euroroad, marked in the figure), the distribution of centrality values is such that ranking is no better than a random draw. In others, such as Adolescent40, there is little correlation between the centrality and influence, so the ranking remains poor. In the best cases (two of which are shown as scatterplots in the figure), this correlation is strong, which explains why the eigenvector centrality can be a very good predictor across the range of f.
www.nature.com/scientificreports/
A second performance metric is also of interest: the precision function p(f) (Eq. 1 in Methods), which compares the SIR influence of the predicted nodes with the SIR influence of the correct top spreaders. A p(f) value close to 1 means that the influence of the identified nodes is close to that of the set of top spreaders, regardless of whether the exact top spreaders were identified. p(f) thus does not penalise node substitutions if the substitutes are similar in influence. For ranking by single centralities, the results for both the recognition rate and the precision function are shown in Fig. 5. Each data point marks the performance of a ranking task, over a given network, for a value of f in {1, 2, ..., 20}%. (To make the data points visible despite many partial overlaps, each data point is drawn as a horizontal line of fixed size; this line does not denote the uncertainty of the data.) The centroid of each data cloud summarises the performance of that centrality over this set of networks. Overall, the neighbourhood centrality makes for the best single ranker, with an average recognition rate of 0.804 and an average precision function of 0.962. The two-hop neighbourhood (not shown in the figure) is only slightly worse (on average 0.781 and 0.942, respectively). PageRank is the least accurate, with an average recognition rate of 0.487 and an average precision function of 0.727. This latter result is not entirely surprising: although PageRank is widely used for ranking nodes in network structures 32, it was previously found not to be a competitive predictor for measured diffusion in various networks 6,9. Next, we show that certain pairs of centrality indicators together carry sufficient topological information about network nodes to improve the accuracy of the prediction tasks.

Pairs of centralities combine into better predictors.
A statistical classifier is now trained with multivariate data from part of the nodes in each network. The result is one trained classifier per network and fraction f. For training, each centrality is one input feature. The target variable (or class) is binary and shows whether or not a node is in the top fraction f in the network by spread size. The two performance metrics for the classifiers are the same as for ranking tasks, with one difference: the recall r(f) is now replaced by the F1 score, the harmonic mean of the precision of classification and the recall (for motivation, see Methods, Eq. 2). Parsimonious statistical models help to gain clear intuition about the results. We report here the most interpretable statistical models that perform well: support-vector machines (SVMs) with second-degree polynomial kernels (see Methods), whose decision boundaries between classes are simple to understand. We verified that other, higher-variance statistical models based on decision trees have similar performance (numerical results for Random Forests are shown in the Supplementary Information). We start by training SVM classifiers with two centralities. We show that, for certain network examples, certain pairs of centralities build on each other's strengths and yield predictive models that are significantly better than either centrality alone.
Combinations with the eigenvector centrality. We show four network examples in Fig. 6. For each network, the left panel maps the distribution of the spread size at the epidemic threshold for all the nodes in the network, against the pairing of the eigencentrality with a neighbourhood indicator. The right panel notes a value for f, and colours the nodes according to their true class: the red nodes are the top f by spread size. Also in the right panel, two dotted lines show the decision boundaries made by the corresponding single-centrality rankers. If f = 1%, these boundaries are the 99th percentiles for either centrality; a ranker will predict as top spreaders all nodes above this boundary. The classifier improves upon these ranking boundaries. Its decision boundary is shown as the transition between background colours, with a blue (or darker) background marking the region of centrality space where top spreaders are predicted. (Note that only part of this centrality space may be occupied by nodes; in other words, not every combination of centrality values may be physically possible.) The optimal decision boundary would leave no nodes misclassified and would lead to values of 1 for both the precision function and the recall or F1 score.
There are clear commonalities among the improved decision boundaries in Fig. 6. For Facebook Artists, Brightkite, and Arxiv GRQC, the joint increase in the values of both centralities in the pair is what determines an effective spreader. For Facebook Artists and Brightkite (both relatively large networks of over 50,000 nodes), ranking the nodes by only one centrality would place some nodes in the wrong class. In contrast, the two-centrality classifier (F1 scores of 0.920 and 0.924, respectively) draws a decision boundary that is much closer to optimal. We illustrated the intuition behind the Arxiv GRQC result (F1 score 0.900) in Fig. 1: the size of the local neighbourhood does affect the spreading ability of nodes, but proximity to the 'hub' of high eigencentrality also helps. There are also exceptions to this. The US Power Grid network (4941 nodes), shown in the same figure, has an outlying cluster of low-eigencentrality nodes as top spreaders, while the lesser spreaders instead follow the expected trend described above. Supplementary Figure S1 shows the cause: a small hub of high eigencentrality values lies at the periphery of the network, while a larger region of nodes with large neighbourhoods (but low eigencentrality) is located far from it. It is the latter, larger region that enables the top 1% of the spreaders. The classifier learns this pattern slightly better, with a 0.162 increase (F1 score 0.509) over the r(f) of ranking by the two-hop neighbourhood alone.
Combinations with the core number. A similar intuition holds when pairing the core number with eigenvector centrality, and also with neighbourhood centralities. (Other pairings with the core number are less effective.) We show two examples in Fig. 7. Again it is the joint increase in both centralities which enables superspreading. For Facebook Politicians (F1 score 0.894), Fig. 7 (bottom) also illustrates the intuition. A number of dense cores are distributed in the network, with the highest core numbers not in close proximity, but isolated by regions of low density. On the other hand, a single region of high eigencentrality exists, and the top 5% of spreaders are located exactly in those cores of highest eigencentrality. Interestingly, pairing the core number with a neighbourhood centrality (GooglePlus, F1 score 0.968) also shows that not all the nodes in dense cores are equally good spreaders, and that their neighbourhood size can help to make a selection.
Combinations with closeness. Closeness plays a role similar to that of the eigencentrality: it guides the selection away from peripheral nodes with dense neighbourhoods and towards the centre of the network, increasing performance. Figure 8 shows two examples. In the Adolescent41 offline social network (1,640 nodes), the best ranker is that by neighbourhood (r(f) = 0.469), but when closeness is also considered, the F1 score rises to 0.598. On the topology of the network (at the bottom of the same figure), closeness values identify only very few of the top spreaders, while the neighbourhood size identifies more. The correct top spreaders, however, again lie in a region where both centralities jointly have high values. In the Gnutella05 computer network, for a similar reason, the best ranker is instead closeness (r(f) = 0.594), but when the two-hop neighbourhood is also considered, the F1 score rises to 0.725.

In the examples from Figs. 6, 7 and 8, each classifier's decision boundary improves upon that of the best ranker, raising r(f) by between 0.090 and 0.213. Among our 60 test cases, we also found other networks for which, at certain values of f, no classifier could improve on the single-centrality rankers. For example, at f = 1% (and only at this value), none of the five Adolescent networks is resolved any better by using two centralities; for these networks too, performance improves as f increases.
From all pairs of centralities, the combination of two-hop neighbourhood and core number has the best average F1 score (0.865) across all the network cases in this study, and across the range of f. On the other hand, the combination of two-hop neighbourhood and eigenvector has the best average precision function (0.992). Figure 9 is a summary for the averages of both performance scores across all single centralities (on the diagonal) and pairs of centralities (the rest of the matrix). All possible pairs of centralities are studied, except for the redundant combinations between degree and neighbourhood, and between the two types of neighbourhood centralities. The six pairs which improve significantly on the most predictive ranker are all composed of one of the neighbourhood centralities, and one of: core number, eigenvector centrality, closeness, or PageRank. These six pairs improve on both recall and precision function.

Multi-centrality predictors and summary of results.
The previous subsection demonstrated that centrality indicators can play on each other's strengths and improve the prediction of top spreaders by the SIR diffusion model at the critical threshold. We now show that classifiers using all seven centralities as features give near-perfect prediction on most network examples. One exception is the offline human social networks (the HS network category), and only at very low fractions f. The networks in this category are not structurally unusual, but they are among the smallest in the study. This leaves very few training data points and thus lowers classification performance.
We train a seven-centrality SVM classifier for each prediction task, and summarise the results in Fig. 10. The centroid of all prediction scores (Fig. 10, left) lies at an average recognition rate of 0.921 and an average precision function of 0.995. The precision function was almost as high (0.992) when the classifier used only the eigenvector centrality and the two-hop neighbourhood as features (Fig. 9). Adding more features to the statistical model, however, further improves the average recognition rate. Not all six network categories are equal. A breakdown of the scores by network category and by the value of the fraction f (Fig. 10, right) shows that recognising the top 1% of spreaders in the Adolescent networks (the HS network category) remains difficult. All other prediction tasks are resolved well, particularly when performance is measured by the precision function, which ranges between 0.969 and 1. These conclusions also hold above the epidemic threshold, at $1.5 \cdot \lambda_c$; numerical results showing very similar prediction scores are in Supplementary Fig. S2. They are also not an artefact of the type of statistical model used in the classifier. Nonlinear Random Forest classifiers are high-variance and so, in general, able to outperform the polynomial SVM. Training them leads to a similar conclusion (Supplementary Fig. S3), so there is no significant advantage to using higher-variance classifiers.

Discussion
Insights gained. We showed that two or more classical centrality indicators can contain sufficient statistical information about the nodes in a real-world network to train an accurate supervised predictor of SIR influence that outperforms node rankers. The decision boundaries between the two classes, as learnt by classifiers, show where the advantage of multi-variate prediction comes from: certain centrality indicators are particularly good complements to others. Notably, there are multiple answers to the question: what is a good pair of centralities? For the degree centrality, the best complement is the eigenvector centrality. For the neighbourhood centrality (the best overall single ranker), three other centralities make good complements: the eigenvector centrality, closeness, and core number (with PageRank also close). Consider the network cases where multi-variate prediction has an advantage. There, the joint distribution of the centralities and the SIR influence is such that one centrality (or a one-dimensional decision boundary) cannot classify the nodes accurately. A multi-dimensional decision boundary, by contrast, can refine the decision in the most important region of centrality values. When the entire set of classical centralities is used, the prediction performance is close to optimal (an average recognition rate of 0.921 and an average precision function of 0.995).
We showed the topological intuition behind this improvement in the prediction of superspreaders. Often, when a subset of the top nodes by local centrality indicators are located in more peripheral regions of the network, global centrality indicators step in and act as a selector and guide towards the effective centre of the network, so that the nodes selected jointly maximise the values of both centralities. In exceptional topologies, when the global centrality has high values at a peripheral location (such as US Power Grid, in Supplementary Fig. S1), the roles reverse: the local centrality becomes the selector, and the statistical model learns that high global centrality values are not beneficial.

Practical use, assumptions, and limitations. The basic insight of jointly maximising the values of two or more centralities can help improve existing, unsupervised node ranking methods. The advantage of ranking algorithms is that they are unsupervised, i.e., they require no ground truth; their disadvantage is lower recall and precision.
Network practitioners can also use supervised classification as presented here, and train a new classifier on a new network. This method delivers good predictions, but it assumes (a) complete knowledge of the network links, and (b) a means to estimate the spread size for a fraction of the network nodes. If historical diffusion data is available (such as the number of retweets on Twitter), it can provide the ground truth for the spread size instead of simulations of a theoretical diffusion model. Only a fraction of nodes need ground truth data, since the statistical classifier is trained on a random sample of the nodes in the network and predicts the class for the others. The size of the training data needed for good predictions depends on the network and on the distributions of centrality and influence values, but is expected to be small. In Supplementary Fig. S4, we measure the required training set size from the learning curves of three of the largest networks in this study. To obtain maximum performance, some of these networks require a training set of only 1% of the network size, while others need around 10%. The set of centralities to use as features can be tailored to the computational budget available. The type of statistical model can also be tailored to the network size: heuristic training algorithms, such as those training Random Forest classifiers, scale better with large networks.

Future work.
There are follow-ups to explore as continuations of this study, at the intersection between real-world network dynamics and machine learning. A method to train a single statistical model for predicting superspreaders across networks is desirable, as long as its performance remains good; this was previously achieved only for small networks 30 . An unsupervised or semi-supervised learning method (for example, based on clustering nodes using the same centrality indicators as features, such as in the related work 33 from the domain of natural-language processing) would lower the computational load required to estimate the spread size of many nodes. Other directions include the prediction of other measures of node influence (such as the measured diffusion of information in large online social networks 6 ) and of node importance (such as the ability of a node to block the diffusion of information), and the study of other types of networks (such as different network categories, networks with node and link attributes, and networks with dynamic structure).

Methods
Networks, centrality indicators, and the estimation of node influence. Most of our network case studies (see Table 1 for the overview) model entire communities at a specific point in time. This is the case for the high-school friendships in the Adolescent networks, the daily Gnutella peer-to-peer file sharing networks, the five sets of institutional email exchanges, and the networks of mutual likes between verified Facebook pages. A minority of the networks (such as the Facebook Stanford friendships, collected from survey participants) are instead bounded samples from a larger community. All are directed, strongly connected, and unweighted networks, or were transformed into such networks. When the original version in the repository had timestamp, attribute, or weight annotations, these were removed. The direction of the edges is reversed when needed to model information flow, so the degree centrality of interest is the out-degree. The closeness centrality 34 computes the lengths of shortest paths, so to study it we kept only the largest strongly connected component (SCC). These networks were selected from public repositories such that (a) they fit into these six categories, and (b) their SCC has more than 1,000 nodes. The upper bound on network size is simply imposed by finite computing resources.
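Closeness needs finite shortest-path distances between all pairs of nodes, which is why only the largest SCC is kept. The following is a minimal sketch under these assumptions:

- Edges are unweighted and directed, stored as an out-adjacency dict (node to set of out-neighbours).
- Distances are measured along out-edges; other conventions use incoming distances.
- Closeness is normalised as (N − 1) divided by the sum of distances.

The function names are hypothetical.

```python
from collections import deque


def bfs_distances(out_adj, source):
    """Hop distances from source, following out-edges (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in out_adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(out_adj):
    """Closeness (N - 1) / (sum of distances to all other nodes).

    Raises ValueError unless every node reaches every other node, i.e.
    unless the graph is strongly connected.
    """
    n = len(out_adj)
    scores = {}
    for u in out_adj:
        dist = bfs_distances(out_adj, u)
        if len(dist) < n:
            raise ValueError("not strongly connected: keep the largest SCC first")
        total = sum(dist.values())
        scores[u] = (n - 1) / total if total else 0.0
    return scores


# Directed 4-cycle: every node is at distances 1, 2, 3 from the others
ring = {0: {1}, 1: {2}, 2: {3}, 3: {0}}
print(closeness(ring))
```

Outside an SCC, some distances would be infinite, and this normalisation would be undefined; the sketch raises an error in that case rather than guessing a convention.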
The following centrality indicators were computed for every node in every network:

- its degree;
- neighbourhood (i.e., the sum of the degrees of the nearest neighbours, previously denoted $k_{sum}$ and found to be a competitive predictor 6);
- two-hop neighbourhood (defined as before 6 for the neighbours exactly two hops away, and previously denoted $k_{2sum}$);
- PageRank 34 with a 0.85 damping factor;
- eigenvector centrality 34;
- closeness centrality 34;
- core number 5.

We also tried an additional set of indicators, the link strength of a node towards upper, equal, or lower shells 8, denoted $r_u$, $r_e$, or $r_l$; these did not provide notable results.
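Both spectral centralities can be obtained by power iteration. The following is a minimal sketch, assuming an out-adjacency dict (node to set of out-neighbours):

- PageRank uses the 0.85 damping factor from the text. The rank held by dangling nodes is spread uniformly, a common convention assumed here.
- For the eigenvector centrality, we assume a node scores highly when its out-neighbours do.
- The eigenvector iteration uses (A + I) rather than A. This keeps the eigenvectors unchanged but lets the iteration converge on periodic strongly connected graphs.

Function names and tolerances are hypothetical.

```python
import math


def pagerank(out_adj, alpha=0.85, tol=1e-12, max_iter=1000):
    """PageRank by power iteration; alpha is the damping factor."""
    nodes = list(out_adj)
    n = len(nodes)
    pr = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # rank held by dangling nodes (no out-links) is spread uniformly
        dangling = sum(pr[u] for u in nodes if not out_adj[u])
        new = {u: (1.0 - alpha) / n + alpha * dangling / n for u in nodes}
        for u in nodes:
            if out_adj[u]:
                share = alpha * pr[u] / len(out_adj[u])
                for v in out_adj[u]:
                    new[v] += share
        err = sum(abs(new[u] - pr[u]) for u in nodes)
        pr = new
        if err < tol:
            break
    return pr


def eigenvector_centrality(out_adj, tol=1e-12, max_iter=10000):
    """Leading eigenvector of the adjacency matrix A, by power iteration on A + I.

    x_u is proportional to the sum of x_v over the out-neighbours v of u
    (an assumed orientation). The shift by I keeps the eigenvectors and
    lets the iteration converge on periodic strongly connected graphs.
    """
    x = {u: 1.0 for u in out_adj}
    for _ in range(max_iter):
        new = {u: x[u] + sum(x[v] for v in out_adj[u]) for u in out_adj}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {u: val / norm for u, val in new.items()}
        err = sum(abs(new[u] - x[u]) for u in out_adj)
        x = new
        if err < tol:
            break
    return x


# Hub 0 linked in both directions to two leaves
star = {0: {1, 2}, 1: {0}, 2: {0}}
print(pagerank(star))
print(eigenvector_centrality(star))
```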
The ultimate influence of a node in a network is estimated numerically, as the average over $10^4$ runs of the susceptible-infectious-recovered (SIR) 4 diffusion model for infectious diseases. In SIR, an infectious node infects a susceptible neighbour at a rate $\beta$ (the number of infection events per time unit, which can therefore be higher than 1). An infectious node recovers at a rate $\mu$. The effective transmission rate is $\lambda = \beta/\mu$. Here, we take $\mu = 1$ and study the normalised rate $\lambda$.
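Each Monte Carlo run of SIR can be sketched as a Gillespie-style event loop. At every step, one of two events happens:

- one susceptible-infectious link transmits (total rate β times the number of such links), or
- one infectious node recovers (total rate µ times the number of infectious nodes).

Only the final outbreak size is needed, so event times are not tracked. This is a minimal, unoptimised sketch, with these assumptions:

- The network is an out-adjacency dict (node to set of out-neighbours), with orderable node ids.
- The spread size is reported as a fraction of N.

The default of 10^4 runs follows the text; the function names are hypothetical.

```python
import random


def sir_spread(out_adj, seed, beta, mu=1.0, rng=None):
    """Final outbreak size (fraction of nodes ever infected) of one
    continuous-time SIR run from a single infectious seed node.

    Events are chosen with probabilities proportional to their rates, as in
    the Gillespie algorithm; times are skipped because only the final size
    is needed.
    """
    rng = rng or random.Random()
    infected = {seed}
    recovered = set()
    while infected:
        # every susceptible-infectious link along the direction of spread
        si_links = [(u, v) for u in infected for v in out_adj[u]
                    if v not in infected and v not in recovered]
        infection_rate = beta * len(si_links)
        recovery_rate = mu * len(infected)
        if rng.random() * (infection_rate + recovery_rate) < infection_rate:
            _, v = rng.choice(si_links)
            infected.add(v)
        else:
            # sorted() makes runs reproducible; assumes orderable node ids
            u = rng.choice(sorted(infected))
            infected.remove(u)
            recovered.add(u)
    return len(recovered) / len(out_adj)


def influence(out_adj, seed, beta, runs=10_000, rng=None):
    """Average final outbreak size over independent SIR runs (mu = 1)."""
    rng = rng or random.Random()
    return sum(sir_spread(out_adj, seed, beta, rng=rng) for _ in range(runs)) / runs


ring = {0: {1}, 1: {2}, 2: {3}, 3: {0}}
print(influence(ring, seed=0, beta=1.0, runs=1000, rng=random.Random(42)))
```

With µ = 1, the `beta` argument equals the normalised rate λ.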
As $\lambda$ increases in SIR simulations, the size of the outbreaks grows from an infinitesimal fraction to a finite fraction of the network size. The regime of interest excludes very low values of $\lambda$, where the diffusion remains localised to the neighbourhood of the seed node. It also excludes very high values, where all nodes would reach a large fraction of the network. Our test cases are both finite in size and diverse (a scenario studied previously 39). We therefore estimate the epidemic threshold $\lambda_c$ numerically, as the peak of the variability measure 39

$$\Delta = \frac{\sqrt{\langle \rho^2 \rangle - \langle \rho \rangle^2}}{\langle \rho \rangle}.$$

Here, $\rho$ denotes the random variable of outbreak size from different seed nodes, and $\langle \cdot \rangle$ denotes the mean. For a given value of $\lambda$, $\Delta$ is estimated by seeding the outbreaks at a random sample of $10^4$ of the nodes in a network (or at all nodes, if the network is smaller). We estimate $\Delta$ over a range of regularly spaced $\lambda$ values, and take $\lambda_c$ to be the position of the peak of $\Delta$. The resulting values are noted in Table 1. The maximum spread size (influence) at $\lambda_c$ in any network is between 0.7% and 6% of the network size. There are two exceptions among the smallest infrastructure networks, where it reaches 8% and 11%.
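The threshold estimate reduces to two steps: compute Δ from a sample of outbreak sizes at each λ, then take the λ with the largest Δ. The following is a minimal sketch. Its input format (a dict from each λ on the grid to a list of outbreak sizes from different seeds) and the function names are assumptions. In practice, the lists would come from SIR simulations seeded at up to 10^4 sampled nodes; the numbers below are made up.

```python
import math


def variability(spreads):
    """Delta = sqrt(<rho^2> - <rho>^2) / <rho> for outbreak sizes rho
    collected from different seed nodes at one value of lambda."""
    n = len(spreads)
    mean = sum(spreads) / n
    mean_sq = sum(r * r for r in spreads) / n
    # clamp tiny negative values caused by floating-point rounding
    return math.sqrt(max(mean_sq - mean * mean, 0.0)) / mean


def estimate_threshold(spreads_by_lambda):
    """Return the grid value of lambda at which Delta peaks."""
    return max(spreads_by_lambda, key=lambda lam: variability(spreads_by_lambda[lam]))


# Hypothetical outbreak sizes on a three-point grid of lambda values
samples = {0.1: [0.01, 0.01, 0.01], 0.2: [0.01, 0.03], 0.3: [0.2, 0.22]}
print(estimate_threshold(samples))
```

Δ is a relative standard deviation. It stays small both in the localised regime, where all seeds give similarly tiny outbreaks, and in the supercritical regime, where all seeds give similarly large outbreaks. This is why its peak marks the transition.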
Ranking by a single centrality. Method. We first predict superspreaders using the single-centrality ranking method common in prior studies [5][6][7][8][9], and also carry forward the performance metrics defined in these studies. This ranking method builds on the assumption that higher centrality values for a node also indicate higher node influence. Given a centrality C, we first compute the value of C for every node. The top fraction f of spreaders is then predicted to be the fraction f of nodes with the highest values for C. At ties between nodes (which occur for discrete-valued centralities such as degree and core number), a random subset of the tied nodes is selected. This random sampling is repeated $10^2$ times as part of a bootstrap technique (described below), which averages the scores of these individual random choices.
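The ranking method with random tie-breaking can be sketched as follows. This is a minimal sketch, assuming centrality values in a dict and rounding fN to the nearest integer; the text does not say how fN is rounded. `rank_top_fraction` is a hypothetical name. Repeated calls with the same random generator produce the different tie-breaking draws that are averaged over.

```python
import random


def rank_top_fraction(centrality, f, rng):
    """Predict the top fraction f of spreaders as the round(f * N) nodes with
    the highest centrality; nodes tied at the cut-off value are sampled at
    random, so repeated calls can return different sets."""
    n_top = round(f * len(centrality))
    if n_top == 0:
        return set()
    ordered = sorted(centrality, key=centrality.get, reverse=True)
    cutoff = centrality[ordered[n_top - 1]]
    above = [u for u in ordered if centrality[u] > cutoff]
    tied = [u for u in ordered if centrality[u] == cutoff]
    return set(above) | set(rng.sample(tied, n_top - len(above)))


# Degree-like values with a three-way tie at the cut-off
degree = {"a": 5, "b": 4, "c": 4, "d": 4, "e": 1}
rng = random.Random(0)
draws = [rank_top_fraction(degree, 0.4, rng) for _ in range(100)]  # 10^2 repeats
print(draws[:3])
```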
Performance metrics. In prior studies, this ranking is evaluated via two metrics. Denote by $I_f$ the set of the top fraction f of nodes as ranked by their SIR influence, and by $C_f$ the set of the top fraction f of nodes as ranked by their centrality values. For a given f, the two sets have equal sizes, $|I_f| = |C_f|$. Also denote by $\rho_i$ the spread size when node i is the seed. The recognition rate r(f) measures the extent to which the identities of the predicted superspreaders match the true identities 6. A synonym for the recognition rate is recall. The precision function p(f) is a weaker, but more practically useful, performance measure comparing the spread of the predicted superspreaders to that of the true top spreaders:

$$r(f) = \frac{|I_f \cap C_f|}{|I_f|}, \qquad p(f) = \frac{\sum_{i \in C_f} \rho_i}{\sum_{i \in I_f} \rho_i}. \qquad (1)$$

Both metrics take values in the interval [0, 1]. An imprecision function ǫ(f) was defined previously 5, such that lower values of ǫ(f) are better. Here, to present the two metrics in a unified fashion, we use instead $p(f) = 1 - \epsilon(f)$. For the bootstrap, for a network of N nodes, we draw, $10^2$ times, a random sample of N nodes uniformly with replacement. The ranking method is applied to each sample, and the resulting prediction is evaluated via r(f) or p(f), as needed. The final value of each performance metric is the average over these samples, together with the 95% confidence interval.
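Eq. 1 translates directly into set operations. The following is a minimal sketch, assuming that the true top set $I_f$ and the predicted set $C_f$ are Python sets of node identifiers and that the spread sizes $\rho_i$ are in a dict. The function names are hypothetical. The example shows why p(f) is the more lenient metric: substituting one top spreader with a node of almost equal influence halves r(f) but barely lowers p(f).

```python
def recognition_rate(true_top, predicted):
    """r(f) = |I_f ∩ C_f| / |I_f|."""
    return len(true_top & predicted) / len(true_top)


def precision_function(spread, true_top, predicted):
    """p(f): total spread of the predicted set over that of the true top set."""
    return sum(spread[i] for i in predicted) / sum(spread[i] for i in true_top)


# Hypothetical spread sizes; node 2 is almost as influential as node 1
spread = {0: 0.060, 1: 0.059, 2: 0.058, 3: 0.010}
true_top = {0, 1}      # I_f for f = 50%
predicted = {0, 2}     # C_f from some ranker
print(recognition_rate(true_top, predicted))            # 0.5
print(precision_function(spread, true_top, predicted))  # about 0.99
```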

Classification by a combination of centralities. Method.
A multi-centrality method learns a discriminative statistical model that classifies network nodes as superspreaders or not. For this, a dataset is formed for every network, in which each record describes a node via its centrality values (the predictors). When training the model to recognise the top fraction f of the nodes, the nodes are ranked by their true SIR spread size. Each node is then assigned one of two target classes, depending on whether or not it is in the top fraction f. The model is trained and tuned on a training fraction t = 0.5 of the nodes (sampled randomly without replacement), and tested on the remaining nodes. A binary statistical classifier learns a decision boundary between the classes. We use a support-vector machine (SVM) 40, which learns optimal separating hyperplanes in the multi-dimensional predictor space, including when the classes overlap in this space. Here, the optimal decision boundary is the one that leaves the largest margin between the classes while still allowing some data points to fall on the wrong side of the boundary. SVMs have two advantages: (a) they are optimal learners rather than heuristics, and (b) the kernel function K and the regularisation parameter C, which ultimately give the shape and variance of the boundary 41, are tunable hyperparameters.
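The dataset construction for one network and one fraction f can be sketched in two steps:

1. Label each node by whether its true spread size is in the top fraction f.
2. Split the nodes at random, without replacement, into a training fraction t = 0.5 and a test set.

This is a minimal sketch. The dict-based inputs, the rounding of fN, and the arbitrary resolution of ties in spread size are assumptions, and all function names are hypothetical. The resulting feature rows and labels are in a form that an SVM implementation could consume.

```python
import random


def label_top_fraction(spread, f):
    """Class 1 for the round(f * N) nodes with the largest true spread size,
    class 0 otherwise (ties in spread size are broken by sort order)."""
    n_top = round(f * len(spread))
    ranked = sorted(spread, key=spread.get, reverse=True)
    top = set(ranked[:n_top])
    return {u: int(u in top) for u in spread}


def split_nodes(nodes, t=0.5, rng=None):
    """Random split without replacement: a fraction t of the nodes for training."""
    rng = rng or random.Random()
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    n_train = round(t * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def to_arrays(features, labels, nodes):
    """Feature rows (one tuple of centralities per node) and class labels."""
    return [features[u] for u in nodes], [labels[u] for u in nodes]


# Hypothetical data: spread sizes and (centrality_1, centrality_2) per node
spread = {i: 0.001 * i for i in range(10)}
features = {i: (i, 2 * i) for i in range(10)}
labels = label_top_fraction(spread, 0.2)
train, test = split_nodes(spread, t=0.5, rng=random.Random(0))
X_train, y_train = to_arrays(features, labels, train)
```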
We aim for the simplest, most interpretable classifier with good performance; higher-variance classifiers bring little performance advantage for this problem and may lose interpretability. The results presented use the following settings:

- a second-degree polynomial kernel K, which gives a low-variance model, less prone to overfitting;
- C tuned in the range [1, 100] with five-fold cross-validation;
- a fixed tolerance of $5 \times 10^{-4}$ for the stopping criterion 42.

No class weights are added to balance the classes artificially. We also tested other, higher-variance statistical models: SVMs with third-degree polynomials for K, and nonlinear models based on decision trees, either boosted or in ensembles 43. Since they performed similarly to the SVM with a second-degree polynomial kernel, we retain and present the results for the latter. We show the decision boundaries learnt by two-centrality models by plotting them in the predictor space.
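A numerical check shows why a second-degree polynomial kernel gives simple, interpretable boundaries. Take two centralities x = (x1, x2). The kernel (x·z + r)^2 equals an ordinary inner product of six quadratic features of x and z. The SVM decision function is a weighted sum of kernel values at the support vectors plus a bias. It is therefore a quadratic polynomial in x1 and x2, and the class boundary is a conic in the centrality plane. This is a minimal sketch; the offset r = 1 and unit scale are an assumed parameterisation, since the text does not give the kernel's coefficients.

```python
import math


def poly2_kernel(x, z, r=1.0):
    """Second-degree polynomial kernel K(x, z) = (x . z + r)^2."""
    return (sum(a * b for a, b in zip(x, z)) + r) ** 2


def poly2_features(x, r=1.0):
    """Explicit map phi with phi(x) . phi(z) == K(x, z), for two features.

    Assumes r >= 0 so that sqrt(r) is real.
    """
    x1, x2 = x
    s = math.sqrt(2.0)
    return (x1 * x1, x2 * x2, s * x1 * x2,
            s * math.sqrt(r) * x1, s * math.sqrt(r) * x2, r)


x = (0.3, 2.0)    # e.g. (scaled eigenvector centrality, scaled neighbourhood)
z = (1.5, -0.7)
lhs = poly2_kernel(x, z)
rhs = sum(a * b for a, b in zip(poly2_features(x), poly2_features(z)))
print(lhs, rhs)   # equal up to floating-point rounding
```

A third-degree kernel would add cubic terms to the same expansion, allowing more flexible, higher-variance boundaries.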
Performance metrics. For a network of size N and the fraction f, a classifier produces a guess for the class of each network node in the test set. We reuse the notation $C_f$ here to mean the set of nodes classified as top spreaders. The classifier decides how many superspreaders to predict, so this number may not equal fN. We measure the overlap between the classifier prediction and the ground truth with metrics similar to Eq. 1. In binary classification, the measure r(f) as defined in Eq. 1 is called recall or sensitivity. It is a useful metric, but insufficient to characterise the classifier. Alongside making many correct choices (a high number of true positives, $|I_f \cap C_f|$), the classifier may also add many false positives. Precision quantifies the false positives. A classical metric combines recall and precision in their harmonic mean, the F1 score 44:

$$\mathrm{precision}(f) = \frac{|I_f \cap C_f|}{|C_f|}, \qquad F_1(f) = \frac{2\,\mathrm{precision}(f)\, r(f)}{\mathrm{precision}(f) + r(f)}. \qquad (2)$$

Note that precision is an established name in Information Retrieval 44. The imprecision function ǫ(f), which gave the precision function p(f), was defined recently 5 for analysing networks. Although the names are unfortunately similar, their meanings differ and should not be confused.
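For a classifier, the predicted set $C_f$ can have any size, so precision and the F1 score of Eq. 2 are computed from its overlap with the true top set. This is a minimal sketch, assuming both sets hold node identifiers restricted to the test nodes; this is an assumption about the evaluation bookkeeping. `classification_scores` is a hypothetical name.

```python
def classification_scores(true_top, predicted):
    """Recall r(f), precision, and F1 (their harmonic mean, Eq. 2).

    Unlike a ranker, a classifier may predict a set of any size.
    """
    tp = len(true_top & predicted)  # true positives
    recall = tp / len(true_top)
    precision = tp / len(predicted) if predicted else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return recall, precision, f1


# Hypothetical test-set outcome: 2 of 4 true top spreaders found, 1 false positive
print(classification_scores({1, 2, 3, 4}, {1, 2, 5}))
```

The F1 score drops when the classifier pads its prediction with false positives, which recall alone would not reveal.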