Introduction

Understanding of the structure and dynamics of complex networks found in nature, society and elsewhere has been greatly facilitated by remarkable advances in the modern science of networks1,2. Fundamental network problems that have garnered interest among natural scientists include the highly skewed (approximately power-law) degree (connectivity) distributions, identification of communities or modules and various critical phenomena and their implications on the functioning and stability of networked systems3,5,6. “Centrality”, the measure of the importance or superiority of a node in the network, is another concept very often studied7,8. The best known modern example is Google's PageRank9, whose algorithm for calculation can be understood as a random walk (surfing) problem along the hyperlinks connecting the web pages on the World Wide Web.

The idea of ranking the nodes based on their relative strengths or relevance can be useful for understanding the underlying dynamics of many networks; in fact, in many complex systems (natural, social, or man-made) the competition-and-reward mechanism is an essential ingredient of their functioning and evolutionary dynamics. The dominance hierarchy or ranking refers to the linear ordering of things from the strongest to the weakest based on the results of competitions or comparisons10. In the case where the things undergo pairwise (one-to-one) competitions, the entire set of competitions can be represented as a directed network where an arrow points from the winner to the loser of a competition (see Fig. 1 (a) and (b)) or vice versa, depending on convention. Food webs (of predators and prey) in ecological systems, the domination-submission interaction networks of animals (observed in birds, American bison, etc.10), sport schedules and certain types of elections or voting systems where the candidates are compared pairwise (such as the Condorcet method, to be discussed later) are widely-known examples of competition networks. The dominance hierarchy may assume different names depending on the domain: “trophic levels” in ecology and “ranking” or “standings” in sports, for example. In the remainder of this letter, for convenience we shall often utilize sports terminology, e.g. ranking, contestant (or player or team), game, match, win, loss, tie and so forth. Among competition networks a tournament, also called a round robin, is one in which every player competes against everybody else. It can be represented as a complete (full) network with a directed edge between every pair of nodes, so in this paper we also call it a complete competition. In such a case, determining the ranking is straightforward: We can simply rank the players in the decreasing order of their total wins, i.e. the out-degree kout.
When there exists a tie in kout, we can employ the following “tie breaker”: We consider the reduced round robin composed of those that are tied and rank them according to their wins therein. This can be applied iteratively to continue breaking the ties that persist. (Note that in some cases a tie cannot be broken further, for instance when three teams i, j and l have the same total wins and i lost to j, j lost to l and l lost to i, i.e. {σij, σjl, σli} = {1, 1, 1} in the adjacency matrix notation. In such a case we may adopt yet another tie breaker such as the total points scored in games.) We call the ranking of nodes obtained this way the “Natural Ranking,” as it results from a tournament and is the fairest: every player competes against every other. Note that this is applicable to multiple round-robins as well, as long as each node pair plays an equal number of times.
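The ranking-with-tie-breaker procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `natural_ranking` and the representation of results as a set of (winner, loser) pairs are assumptions made here, and the sketch assumes a single complete round robin with no drawn games.

```python
from collections import defaultdict

def natural_ranking(players, beats):
    """Rank the players of a complete round robin by total wins.

    `beats` is a set of (winner, loser) pairs (hypothetical input format).
    Ties are broken by wins inside the reduced round robin of the tied
    players, applied recursively. Returns a list of tiers; a tier holding
    several players is a tie that this rule cannot break.
    """
    members = set(players)
    wins = {p: 0 for p in players}
    for winner, loser in beats:
        # Only games inside the current (possibly reduced) group count.
        if winner in members and loser in members:
            wins[winner] += 1

    groups = defaultdict(list)
    for p in players:
        groups[wins[p]].append(p)

    if len(groups) == 1:
        # Everyone has the same number of wins: no further separation.
        return [sorted(players)]

    ranking = []
    for w in sorted(groups, reverse=True):
        tied = groups[w]
        if len(tied) == 1:
            ranking.append(tied)
        else:
            ranking.extend(natural_ranking(tied, beats))
    return ranking
```

In the cyclic case mentioned in the text (i lost to j, j lost to l, l lost to i) the three players come back as one tier, and a further criterion such as total points scored would be needed to separate them.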

Figure 1

(a) The incomplete competition schedule network of US college football. The nodes represent universities and edges represent games. The network is incomplete (partially filled), with its connectance — the fraction of node pairs that are connected — typically ~ 10%. This causes uncertainties in the rankings determined based on the outcomes of the games. The result of a game can be represented as a directed edge pointing from the winner to the loser. (b) The incomplete competition network (middle) can be thought of as an intermediary stage of a schedule that starts as empty (left) and ends as complete, when all possible games have been played (right). The progression of the season is the data acquisition process that forms the basis for inference of the expected natural ranking. (c) The setup of a single strength parameter (ϕ) model for estimating the expected win probability of a team in a potential contest.

We note that the earliest recorded form of this scheme is by Ramon Llull4 and is now also called the Condorcet method for voting (on bills in the parliament, for instance). Llull's original formulation is analogous to the round-robin in which, after a full round of voting between pairs of candidates (also called alternatives), the one that has won the most pairings is chosen. A more common formulation of the Condorcet system is one in which each alternative is given an order of preference by the voters (although the voters are not obligated to give every alternative a preference) and the winner of a pairwise comparison is the one that has been preferred by more voters. Ranking and voting are intimately related and much effort has been put into understanding the issues as well11.

Despite its simplicity and intuitiveness, the natural ranking introduced here is inapplicable to many real-world networks, as they are often incomplete. Expecting a real-world competition to be complete is perhaps unreasonable in practice for several reasons. First, the cost of a complete competition can be very high even for moderately large systems: In a network of n contestants, the minimum number of competitions required is n(n − 1)/2. In the case of the popular US college football with 120 teams, for instance (Fig. 1 (a)), there may simply not be enough weeks in a year for each team to play the 119 games necessary, at least for those who care for the athletes' health (or their education). Second, there may be insurmountable physical constraints as in an ecological food web where the spatial separation between the habitats of two species may hinder them from interacting directly12. In this paper we propose an analytical method to infer (estimate) the natural rankings from an incomplete network.
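To make the cost argument concrete, the short sketch below computes the size of a complete schedule and the connectance of a partial one. The helper names are hypothetical, and it assumes that no pair of teams meets more than once; the figure of 679 games is the number of college football games quoted later in the text.

```python
def complete_games(n):
    """Number of games in a single round robin of n contestants."""
    return n * (n - 1) // 2

def connectance(n, games_played):
    """Fraction of node pairs connected, assuming no pair plays twice."""
    return games_played / complete_games(n)

print(complete_games(120))              # games needed for 120 teams
print(round(connectance(120, 679), 3))  # share of that schedule actually played
```

For 120 teams a full round robin needs 7140 games, so 679 games cover roughly 9.5% of the pairs.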

Results

The final natural ranking from an incomplete network can be estimated by considering the actual (incomplete) competition network as an intermediate stage of the “schedule” of a complete competition, between an empty network and a hidden complete network where all competitions have been made (see Fig. 1 (b)). This then becomes the problem of inferring the future of the system of variables based on current information (i.e. data), for which Bayesian inference is one of the most widely used frameworks13. The Bayesian framework is a process in which the estimate (called the prior) of the distribution π(x) of parameter x is updated into a new estimate (called the posterior) π(x|D) when new data D is made available, via the following Bayes' formula13,14:

$$\pi(x \mid D) = \frac{P(D \mid x)\,\pi(x)}{\int P(D \mid x')\,\pi(x')\,dx'} \tag{1}$$

where P(D|x), called the likelihood, is the probability of D being observed given a parameter value x. At the next round of update with new available data, the posterior becomes the prior. (This is also called Recursive Bayes.)
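The recursive update can be illustrated numerically on a grid. The sketch below is only an illustration: it assumes a flat prior on a win probability x in [0, 1], a likelihood of x for an observed win and 1 − x for an observed loss, and a simple midpoint rule for the normalising integral; the helper names are hypothetical.

```python
def bayes_update(grid, prior, likelihood):
    """One Bayes step on a uniform grid: posterior is likelihood x prior, renormalised."""
    unnorm = [likelihood(x) * p for x, p in zip(grid, prior)]
    dx = grid[1] - grid[0]
    evidence = sum(unnorm) * dx          # numerical normalising integral
    return [u / evidence for u in unnorm]

def grid_mean(grid, density):
    """Mean of a density tabulated on a uniform grid."""
    dx = grid[1] - grid[0]
    return sum(x * d for x, d in zip(grid, density)) * dx

n = 1000
grid = [(i + 0.5) / n for i in range(n)]   # midpoints of [0, 1]
prior = [1.0] * n                          # flat prior (assumption of this sketch)

post1 = bayes_update(grid, prior, lambda x: x)       # data: one win
post2 = bayes_update(grid, post1, lambda x: 1 - x)   # posterior reused as prior; data: one loss

print(grid_mean(grid, post1), grid_mean(grid, post2))
```

After one win the mean shifts from 1/2 to about 2/3, and a subsequent loss brings it back to 1/2, with the posterior of the first step serving as the prior of the second.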

Here we use the Bayes' formula Eq. (1) to estimate Wi, the projected total wins (outdegree) based on an incomplete competition network, in order to infer the natural ranking. Wi of node i is the sum of two quantities: the number of actual wins thus far, and the expected number of wins from unplayed games, that is, the sum of the probabilities of winning those games. Thus our goal becomes estimating pji, the probability that i beats j, given the current state of the competition. We find pji consistent with Eq. (1) via the following steps. First, when we have no basis on which to judge the two teams' strengths, for example at the beginning of the season when the teams have not played any games, we are maximally ignorant or uncertain of pji (and equally of pij = 1 − pji). A naïve guess would be a flat (uniform) prior π(pij) = 1 where all pij values are equiprobable. A more common choice for this type of problem is the Jeffreys prior given as

$$\pi(p_{ij}) = \frac{1}{\pi\sqrt{p_{ij}\,(1-p_{ij})}} \tag{2}$$

due to the desirable property that it is invariant under re-parametrization of the parameter (pij), thanks to being proportional to the square root of the Fisher information. The idea behind the Jeffreys prior and its history of development form an interesting topic in itself, which is discussed more deeply in Refs. 14, 15.

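Now assume that we observe that i loses to j, i.e. D = {σij = 1}, whose probability is simply pij. From Eqs. (1) and (2) we obtain

$$\pi(p_{ij} \mid D) = \frac{p_{ij}\,\pi(p_{ij})}{\int_0^1 p'\,\pi(p')\,dp'} = \frac{2}{\pi}\sqrt{\frac{p_{ij}}{1-p_{ij}}} \tag{3}$$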

The above procedure will serve as the foundation for our method of estimating the π(pij) for a node pair that is yet to play. We introduce a strength parameter ϕi ∈ [0, ∞) for each contestant such that pij between two contestants is

$$p_{ij} = \frac{\phi_j}{\phi_i + \phi_j} \tag{4}$$

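In terms of π(ϕi), the distribution of ϕi, we can now write π(pij) as

$$\pi(p_{ij}) = \int_0^\infty\!\!\int_0^\infty d\phi_i\,d\phi_j\;\pi(\phi_i)\,\pi(\phi_j)\;\delta\!\left(p_{ij} - \frac{\phi_j}{\phi_i+\phi_j}\right) \tag{5}$$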

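An essential step in our formalism is finding the exact form of π(ϕ) that renders this expression consistent with Bayes' formula, Eq. (1). For the initial Jeffreys prior, Eq. (2), for instance, we find that $\pi(\phi) = e^{-\phi}/\sqrt{\pi\phi}$ for both i and j gives the correct Bayes' formula. In the case of Eq. (3) (i.e. D1 = {σij = 1}) we see that the following change is appropriate:

$$\pi(\phi_i) \to \pi(\phi_i)\ \ \text{(unchanged)}, \qquad \pi(\phi_j) \to \frac{2}{\sqrt{\pi}}\,\sqrt{\phi_j}\;e^{-\phi_j} \tag{6}$$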

This agrees with our intuition that σij = 1 means that i is likely to be weaker than j, since using these posteriors we find 〈ϕi〉 = 1/2 and 〈ϕj〉 = 3/2.
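The consistency between the strength distributions and the distribution of the win probability can be checked by simulation. The sketch below rests on an assumption that should be stated plainly: each strength is taken to be gamma-distributed with shape kout + 1/2 and unit scale, a choice made here because the ratio ϕj/(ϕi + ϕj) of two such variables is beta-distributed, and the Beta(1/2, 1/2) case is exactly the Jeffreys prior. The function name is hypothetical.

```python
import random

def mean_win_probability(k_i, k_j, n=100_000, seed=0):
    """Monte Carlo mean of phi_j / (phi_i + phi_j), the probability that j beats i.

    Assumes (for this sketch) phi ~ Gamma(shape = wins + 1/2, scale = 1).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        phi_i = rng.gammavariate(k_i + 0.5, 1.0)
        phi_j = rng.gammavariate(k_j + 0.5, 1.0)
        total += phi_j / (phi_i + phi_j)
    return total / n

# Before any game: symmetric, mean 1/2.
print(mean_win_probability(0, 0))
# After j has beaten i once: the mean moves towards j.
print(mean_win_probability(0, 1))
```

Under this assumption the simulated means should approach (kj + 1/2)/(ki + kj + 1), i.e. 0.5 before any game and 0.75 after j's first win, the latter being the mean of the posterior obtained from the Jeffreys prior after one observed win for j.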

This procedure can be repeated to find a general pattern. Assume that now j, having defeated i, competes against l with . Then, using Eqs. (5) and (6) we find between j and l. Using the Bayes' formula, the possible posteriors are

depending on D2, the outcome of the game between l and j. It turns out that the following update rules for the winner's strength are consistent with the Bayes' formula (again, no changes are necessary for the loser's):

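In general, the strength distribution of a contestant dependent on its accumulated wins kout in the form

$$\pi(\phi \mid k^{\rm out}) = \frac{\phi^{\,k^{\rm out}-1/2}\;e^{-\phi}}{\Gamma\!\left(k^{\rm out}+\tfrac{1}{2}\right)}$$

leads correctly to the following estimate between two teams with $k_i^{\rm out}$ and $k_j^{\rm out}$ wins:

$$\pi(p_{ij}) = \frac{\Gamma\!\left(k_i^{\rm out}+k_j^{\rm out}+1\right)}{\Gamma\!\left(k_i^{\rm out}+\tfrac{1}{2}\right)\Gamma\!\left(k_j^{\rm out}+\tfrac{1}{2}\right)}\;p_{ij}^{\,k_j^{\rm out}-1/2}\,(1-p_{ij})^{\,k_i^{\rm out}-1/2}$$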

With these we can now calculate 〈pji〉, the expected win score (outdegree) gain of i against j:

$$\langle p_{ji} \rangle = \frac{k_i^{\rm out}+\tfrac{1}{2}}{k_i^{\rm out}+k_j^{\rm out}+1}$$

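At any given point in the competition, therefore, the expected final outdegree Wi for team i is

$$W_i = k_i^{\rm out} + \sum_{j \in \Omega_i} \langle p_{ji} \rangle = k_i^{\rm out} + \sum_{j \in \Omega_i} \frac{k_i^{\rm out}+\tfrac{1}{2}}{k_i^{\rm out}+k_j^{\rm out}+1} \tag{11}$$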

where Ωi is the set of nodes that i is yet to play against. In a complete competition network the second term vanishes, while in an incomplete network it differentiates two teams with the same kout. From its functional form we can also tell that having beaten stronger opponents gives one an advantage — the unplayed opponent's kout appears in the denominator — thereby naturally incorporating the “strength of schedule.”
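The projected outdegree can be computed directly from a list of results. The sketch below is an illustration under a stated assumption: the expected gain of i against an unplayed opponent j is taken to be (ki + 1/2)/(ki + kj + 1), the mean win probability that follows if strengths are gamma-distributed with shape kout + 1/2. The function name and the (winner, loser) input format are hypothetical.

```python
def expected_final_outdegree(teams, results):
    """Projected total wins: actual wins plus expected wins in unplayed games.

    `results` is a list of (winner, loser) pairs for games already played.
    Assumes (for this sketch) <p_ji> = (k_i + 1/2) / (k_i + k_j + 1).
    """
    wins = {t: 0 for t in teams}
    played = {t: set() for t in teams}
    for winner, loser in results:
        wins[winner] += 1
        played[winner].add(loser)
        played[loser].add(winner)

    projected = {}
    for i in teams:
        # Sum over Omega_i, the opponents that i has yet to play.
        gain = sum(
            (wins[i] + 0.5) / (wins[i] + wins[j] + 1.0)
            for j in teams
            if j != i and j not in played[i]
        )
        projected[i] = wins[i] + gain
    return projected

teams = ["A", "B", "C", "D"]
results = [("A", "B"), ("C", "D")]
print(expected_final_outdegree(teams, results))
```

Because the assumed 〈pji〉 and 〈pij〉 sum to one, the projected outdegrees add up to the total number of games in the complete schedule (six in this four-team example), which is a convenient sanity check.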

We can also compute the expected variance of the final outdegree given as

We note that the second term is non-vanishing, since 〈σjiσli〉 ≠ 〈σji〉 〈σli〉 due to the shared index i; we call σjiσli connected, analogous to connected Feynman diagrams in quantum field theory, which have also found some uses in network theory16,17. 〈σjiσli〉 is

where Γ(·, ·) is the incomplete gamma function. No closed-form solution for 〈σjiσli〉 exists at the time of this writing, so we resort to numerical evaluation. Finally, the variance is

Application to Real Competition Networks

We now showcase our method by applying it to two well-known competition networks that feature unique challenges, namely American college football and English Premier League soccer.

American college football: Incomplete competitions and the number of playoff rounds

The American college football network is incomplete, with each university playing against merely ~10% of the nodes in each season. The popularity of the sport and the desire for an annual national championship – “Who is the best?” is perhaps sports fans' biggest interest – have resulted in the invention of ranking systems that are purported to overcome the deficiency. The substantial benefits, financial and otherwise, awarded to the champions render a robust and fair ranking method essential. The official ranking system called the Bowl Championship Series (BCS), used until 2013, combined human polls and mathematical formulae to determine the two “best” teams for the annual national championship18,19,20. This is to be succeeded by the College Football Playoff (CFP) system, in which four teams will contend in a two-round playoff series starting in 2014. The increase from two to four is intended to overcome the criticism of the BCS system by dissatisfied fans who argued that choosing only two teams from a pool of more than a hundred is insufficient given the low connectance of ~10% of the schedule network. While it may be too early to truly assess the efficacy of the CFP system, issues raised by the sparsity of a schedule network are worth exploring in their own right, which we tackle with our method.

The result of our method applied to American college football of 2010 is shown in Fig. 2 (a). It shows the estimated final outdegrees, with the error bars indicating the square root of the variance as the measure of the uncertainty. As expected, teams with the same number of actual wins kout are separated by the strength-of-schedule term in Eq. (11). The variance further allows an interesting interpretation: Taking the estimated final outdegree plus or minus the square root of its variance as the reasonable expected range of the final outdegree, Fig. 2 (a) implies that the proposed four-team playoff system may still be quite insufficient: the first-ranked team (University of Texas at Austin) has an overlapping win score range with those of teams up to the 32nd-ranked team (University of Nevada), suggesting that a further expansion of the playoff system to include 32 teams would not be unreasonable.
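The spread of the final outdegree can also be approximated by simulation. The sketch below uses Monte Carlo sampling in place of the numerical evaluation of the integral used in the text, and it assumes gamma-distributed strengths with shape kout + 1/2 and unit scale; both the function name and these modelling choices are assumptions of this sketch rather than the procedure behind Fig. 2.

```python
import random
from statistics import mean, pstdev

def simulate_final_outdegree(k_i, opponents_k, n=20_000, seed=0):
    """Monte Carlo mean and standard deviation of team i's final outdegree.

    k_i         : wins of team i so far
    opponents_k : wins so far of each opponent that i has yet to play
    Assumes (for this sketch) phi ~ Gamma(shape = wins + 1/2, scale = 1).
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n):
        # The same phi_i enters every remaining game of i, which is
        # what correlates the outcomes of those games.
        phi_i = rng.gammavariate(k_i + 0.5, 1.0)
        w = k_i
        for k_j in opponents_k:
            phi_j = rng.gammavariate(k_j + 0.5, 1.0)
            if rng.random() < phi_i / (phi_i + phi_j):
                w += 1
        totals.append(w)
    return mean(totals), pstdev(totals)

m, s = simulate_final_outdegree(1, [1, 0])
print(m, s, (m - s, m + s))   # mean, spread, and the resulting score range
```

The range (m − s, m + s) plays the role of the error bars described above; because one draw of ϕi is shared across all of i's remaining games, the simulated spread includes the correlation between those games.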

Figure 2

(a) The calculated projected final outdegree ranges of American universities that competed in the 2010 network. While the University of Texas (farthest left) has the highest expected final outdegree, its uncertainty is sufficiently large that its range overlaps with those of 31 other teams, indicating the current playoff system involving two or four top teams is insufficient. (b) We can then estimate the appropriate number of playoff participants by counting the number of teams with overlapping score ranges with the then-highest ranking team as a function of the connectance. The data points indicate the actual numbers of overlapping teams up until the actual final connectance (0.095, blue dotted line) and the estimated numbers past it averaged over 1 000 random simulations. Also indicated are four points at which the estimated mean numbers are 16, 8, 4 and 2; the proposed four-team playoff system would have required a connectance of 0.78 for this season.

As the connectance of the network increases the uncertainty is bound to decrease, so it would be interesting to see how the score range overlap changes as a function of the increasing connectance. In Fig. 2 (b) we show the number of teams whose score ranges overlap with that of the first-ranked team. Beyond the actual number of games (679 games), we generated 1 000 simulated complete seasons based on the final fitness. Our method indicates that on average a connectance of 0.29 would be necessary for a 16-team playoff, 0.54 for an 8-team playoff, etc.
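Counting the overlapping teams is a simple operation once each team has a mean and a spread. The sketch below is a hedged illustration: the function name, the input format and the toy numbers are hypothetical, and the convention of counting the first-ranked team itself is inferred from the figures reported in the text (a range overlapping with 31 other teams corresponds to a 32-team playoff).

```python
def playoff_size(scores):
    """Number of teams whose score range overlaps that of the leader.

    `scores` maps team -> (expected final outdegree, square-root variance).
    The leader is counted as well (convention inferred from the text).
    """
    leader = max(scores, key=lambda t: scores[t][0])
    leader_low = scores[leader][0] - scores[leader][1]
    # A team overlaps if the top of its range reaches the leader's lower bound.
    return sum(1 for m, s in scores.values() if m + s >= leader_low)

# Toy numbers for illustration only, not data from the paper.
scores = {"A": (10.0, 1.0), "B": (9.0, 0.5), "C": (7.0, 1.0), "D": (8.2, 1.0)}
print(playoff_size(scores))
```

Repeating this count as games are added (or simulated) gives the kind of curve shown in Fig. 2 (b).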

English Premier League football: Complete network and stabilization of rankings

We now apply our method to the English Premier League (EPL) soccer network. The EPL network presents another interesting opportunity for our method. The EPL network, composed of 20 teams, is complete with every pair of teams playing twice in a given season. As the true natural ranking will be revealed in a complete network, the question of the sufficient size of a playoff system is not relevant here, unlike American college football.

One of the interesting questions in such a complete network is the convergence of the teams' rankings. In Fig. 3 (a) we show the estimated final outdegrees of teams chosen from three distinct tiers, Manchester City from the top tier (with final score 30.5, blue), Liverpool from the middle tier (final score 19, red) and Wolverhampton from the bottom tier (final score 10, green), as a function of games played in the 2011–2012 season. We observe significant fluctuations in the standings in the beginning that attenuate to reveal clear standings as the season progresses. In Fig. 3 (b) we show the number of teams with an overlapping range with the first-ranked team, which reaches a minimum of 2 (Manchester City and Manchester United ended up tying with the same final score) when the first 270 games (connectance = 0.71) were played.

Figure 3

(a) The changes in the estimated final outdegree of three EPL (English Premier League) teams in the 2011–2012 season, representing the top (Manchester City, blue), the middle (Liverpool, red) and the bottom (Wolverhampton, green) tiers. Significant fluctuations and uncertainties dissipate as the season progresses to reveal a clear standing of the teams. (b) The number of teams with overlapping ranges with the incumbent #1 team.

Comparison with Elo and Win-Loss Differential Methods

Of the many ranking methods for competitions, ELO is one of the best known, adopted by the World Chess Federation (FIDE) and online service providers including Yahoo! Games. We now briefly compare, for illustrative purposes, our method and the ELO system devised by physicist Arpad Elo21. While there exist several variations of ELO we consider the most basic version. For reference we also compare the simplest method of win-loss differential, i.e. kout − kin.

In ELO each player is assigned a rating value R. When two teams i and j with ratings Ri and Rj play a game, the expected score Ei of i, i.e. the probability of i winning plus half the probability of drawing, is posited as

$$E_i = \frac{1}{1 + 10^{(R_j - R_i)/400}} \tag{15}$$

An Ei = 0.75, therefore, could mean 0.5 probability of winning and 0.5 probability of drawing, or 0.7 probability of winning and 0.1 probability of drawing. For each 400 rating point difference against the opponent the ratio Ei/Ej is multiplied by a factor of ten, so that if Ri = Rj + 400 we have Ei = 10/11 and Ej = 1/11, if Ri = Rj + 800 we have Ei = 100/101 and Ej = 1/101, and so forth. Elo's original suggestion was that a difference of 200 ratings points means that the stronger player has an expected score of approximately 0.75; as in Eq. (15), $E_i = 1/(1 + 10^{-1/2}) \approx 0.76$. After the game the rating R of i is updated as Ri → Ri + K(σji − Ei), and similarly for Rj, where K is a constant that sets the size of the update and σji is the actual outcome (1 for a win for i, 0.5 for a draw and 0 for a loss). The initial ELO rating is 1400.
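The basic ELO scheme described above can be sketched in a few lines. The function names are hypothetical, and the update constant K = 32 is an assumption of this sketch (a commonly used value); the text does not state which value was used in the comparison.

```python
def elo_expected(r_i, r_j):
    """Expected score of i against j in the basic ELO scheme."""
    return 1.0 / (1.0 + 10.0 ** ((r_j - r_i) / 400.0))

def elo_update(r_i, r_j, outcome_i, k=32.0):
    """Return the updated ratings (R_i, R_j) after one game.

    outcome_i : 1 for a win for i, 0.5 for a draw, 0 for a loss.
    k         : update constant (32 is an assumption of this sketch).
    """
    e_i = elo_expected(r_i, r_j)
    e_j = 1.0 - e_i
    return (r_i + k * (outcome_i - e_i),
            r_j + k * ((1.0 - outcome_i) - e_j))

print(elo_expected(1800, 1400))   # 400-point gap
print(elo_expected(1600, 1400))   # 200-point gap
print(elo_update(1400, 1400, 1))  # two new teams, i wins
```

A 400-point gap gives an expected score of 10/11 and a 200-point gap about 0.76, as stated in the text. With equal ratings the winner gains exactly what the loser gives up, so the sum of the two ratings is conserved.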

In Fig. 4 we compare the prediction accuracies of the three methods – ours (triangle), ELO (square) and the simplest win-loss scheme (circle) – for American college football. We award each method 1 point for a correct prediction (the team with a higher pre-game rating wins) and 0 points for a wrong prediction. We treat the case of “indeterminate prediction” (two teams tied in pre-game ratings) in two separate ways: First, we include it as a half-correct prediction, awarding 0.5 points to a method; second, we exclude it (0 points), so that only determinate predictions can count as correct. In the first case our Bayesian method and ELO both earn 452 points and win-loss earns 446 points, for prediction accuracies of 0.666 and 0.657, respectively. In the second case they earn 424, 423 and 360 points for prediction accuracies of 0.624, 0.623 and 0.530. Since the difference between the two cases reflects the fraction of indeterminate predictions, this demonstrates the comparative limitation of the win-loss scheme in differentiating two teams. As another angle of comparison, we studied how the ratings difference correlates with the points scored in the game, finding Pearson correlation coefficients of 0.756 for our method and 0.731 for ELO.

Which method, then, is preferable? Given the comparable performances we would argue that the ability to estimate errors and the solid theoretical foundations render our method preferable to ELO. There exists, as a matter of fact, a deeper connection between the two methods. We start from Eq. (4), which we rewrite as

$$p_{ji} = \frac{\phi_i}{\phi_i + \phi_j} = \frac{1}{1 + \phi_j/\phi_i}$$

the last form being identical to the foundation of the ELO system, Eq. (15), with a change of variables R = 400 log10 ϕ. The difference is that our method is free of ad hoc constraints and thus more general: our method allows R to be negative as necessary and, furthermore, gives R a Bayes' formula-consistent distribution π(R) that can be calculated from π(R)dR = π(ϕ)dϕ. This is not true of ELO, reflecting its lack of a robust theoretical foundation.
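The change of variables can be written out explicitly. The first two lines below follow directly from R = 400 log10 ϕ and π(R)dR = π(ϕ)dϕ; the last line additionally assumes, purely for illustration, a gamma-distributed strength with shape k + 1/2 and unit scale, where k denotes the number of wins so far.

```latex
\begin{align}
\phi &= 10^{R/400}, \qquad \frac{d\phi}{dR} = \frac{\ln 10}{400}\,\phi, \\
\pi(R) &= \pi(\phi)\,\frac{d\phi}{dR}
        = \frac{\ln 10}{400}\;\phi\,\pi(\phi)\,\Big|_{\phi = 10^{R/400}}, \\
\pi(R) &= \frac{\ln 10}{400}\;
          \frac{\phi^{\,k+1/2}\,e^{-\phi}}{\Gamma\!\left(k+\tfrac{1}{2}\right)}\,\Big|_{\phi = 10^{R/400}}
          \qquad \text{(assumed gamma form for } \pi(\phi)\text{)}.
\end{align}
```

Since ϕ ranges over [0, ∞), R ranges over the whole real line, which is why negative ratings arise naturally in this parametrization.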

Figure 4

The comparison between the prediction accuracies of the three methods: our Bayesian (triangle), ELO (square) and the win-loss differential (circle) as the season is played out in American College Football.

The red dotted line indicates the random baseline of 0.5. We consider two scenarios regarding the indeterminate-prediction cases (i.e. the teams have an equal pre-game rating): The upper band of plot points includes them as half-correct ones and the lower band excludes them. The differences between the bands correspond to the fraction of indeterminate predictions, which is notably larger for the wins–losses scheme than the other methods.

Discussion

In this report we studied the concept of natural ranking, also called the Condorcet method, in competition networks. While straightforward and intuitive, natural ranking can only be applied exactly to complete networks that are rare in the real world, prompting us to propose an analytical framework for inferring the final natural ranking using Bayes' formula. We formulated a single-variable strength parameter model with exact update rules as new information (wins and losses) is gained. Bayesian inference is fundamentally distribution-based, meaning that it produces not one specific value of a variable but a range of values. This allowed us to estimate not only the mean expectation of the final outdegrees of teams in the network but their uncertainties in variance, enabling us to answer important questions of practical value, such as the sufficient size of a playoff system and the convergence speed to the final natural ranking.

We envision a couple of future research directions based on the work presented here, one theoretical and the other practical. First, ranking methods, ours included, would become more useful if we understood more analytically how their performances relate to system parameters such as size and connectance, although these have been studied here to some extent through various means (calculations and simulations). Second, a further improvement of our method to deal with the aspects of competition networks that were not discussed in this paper would be welcome; for instance, our current model uses a node's past performance as the sole basis for estimating its current strength (fitness), while it can be affected by many other factors. The true distribution of node strengths is deeply related to any ranking method, as one can alternatively view a ranking method as a way to uncover the hidden, true strengths of the nodes. Thus a better estimation technique can lead to better performance and prediction accuracy. Giving more weight or importance to more recent records, for example, may be useful. We hope that such efforts will prove useful for understanding many competition networks better.