Benford’s Distribution in Complex Networks

Morzy, Mikołaj; Kajdanowicz, Tomasz; Szymański, Bolesław K.

doi:10.1038/srep34917

Published: 17 October 2016


Scientific Reports volume 6, Article number: 34917 (2016)


Abstract

Many collections of numbers do not have a uniform distribution of the leading digit, but conform to a very particular pattern known as Benford’s distribution. This distribution has been found in numerous areas such as accounting data, voting registers, census data, and even in natural phenomena. Recently it has been reported that Benford’s law applies to online social networks. Here we introduce a set of rigorous tests for adherence to Benford’s law and apply it to verify this claim, extending the scope of the experiment to various complex networks and to artificial networks created by several popular generative models. We find that neither real nor artificial networks provide sufficient evidence that network structural properties commonly conform to Benford’s distribution. We find very weak evidence suggesting that three measures, degree centrality, betweenness centrality and local clustering coefficient, could adhere to Benford’s law for scale-free networks, but only for a very narrow range of their parameters.


Introduction

Benford’s law is a well-documented phenomenon describing the distribution of the most significant digit in many different datasets. Originally noticed by Newcomb¹ and Benford², it states that the probability of the most significant digit of a random element of a real-world numerical dataset being d is given by

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad d \in \{1, 2, \ldots, 9\}.$$

At first, Benford’s law seems very counter-intuitive. Why wouldn’t the leading digits be uniformly distributed in real-world datasets? Yet, this phenomenological law holds for an extraordinary diversity of datasets. Benford’s distribution has been observed in geophysical data³, such as distributions of lengths of rivers, areas of lakes, etc., in the distribution of auction prices on eBay⁴, and in the effects of the introduction of the euro in EU member states⁵. Recently, Benford’s law has been used in fraud detection⁶,⁷,⁸, to indicate vote counting manipulation during elections in the US⁹, Ukraine and Russia¹⁰ (although some researchers claim that Benford’s law is not the right tool to assess the veracity of elections¹¹), and to disclose inconsistencies in census surveys¹². The same distribution has been found in engineering, where failure rates and mean-time-to-failure (MTTF) values of information systems closely follow the logarithmic pattern¹³. It has also been reported that several properties of complex networks (such as centrality indexes) obey Benford’s law as well¹⁴. Even more surprisingly, Benford’s law still applies if the numbers are multiplied by a constant, or expressed in a numeral system other than decimal. In other words, Benford’s law is both scale-invariant and base-invariant.
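As a minimal sketch (not part of the original analysis), the leading-digit law P(d) = log10(1 + 1/d) can be evaluated directly and compared with empirical first-digit frequencies. The helper names below are illustrative assumptions, and the powers of 2 serve only as a well-known sequence whose leading digits closely follow Benford’s distribution.

```python
import math
from collections import Counter


def benford_probability(d: int) -> float:
    """Probability that the most significant digit equals d (d in 1..9)."""
    return math.log10(1 + 1 / d)


def leading_digit(x: float) -> int:
    """Most significant decimal digit of a non-zero number."""
    # Scientific notation puts the leading digit first: 0.00271 -> '2.71...e-03'.
    return int(f"{abs(x):.15e}"[0])


def leading_digit_frequencies(values):
    """Relative frequency of each leading digit 1..9 among the non-zero values."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


if __name__ == "__main__":
    observed = leading_digit_frequencies([2 ** k for k in range(1, 501)])
    for d in range(1, 10):
        print(d, round(benford_probability(d), 4), round(observed[d], 4))
```

Multiplying every value by a constant and recomputing the frequencies is a quick way to probe the scale invariance described above.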

Benford’s law has intrigued both scientists and the general public for over a century. There were many who claimed that it is an inherent property of the universe, an esoteric law of nature which applies to some datasets. It has not been helpful that the original discoverer of this logarithmic rule, American astronomer Simon Newcomb, following the infamous example of Pierre de Fermat, described his discovery as “evident”, without any explanation. His statement was simply that “The law of probability of the occurrence of numbers is such that all mantissae of their logarithms are equally likely”¹. Nearly 60 years later, when Frank Benford, a physicist working at the Corporate Research and Development Center of General Electric, assembled a collection of over 20 000 numbers from many different sources (atomic weights, population sizes, physical constants, street addresses, Reader’s Digest articles) and re-discovered the logarithmic distribution of the leading digit, he claimed that the phenomenon only applied to “anomalous” and “outlaw” numbers.

This does not mean that no serious attempts have been made to come up with a plausible explanation of the origins of Benford’s law. Raimi¹⁵ presents a thorough summary of previous works on the derivation of Benford’s law. He claims that the first robust statistical explanation of Benford’s law has been proposed by Pinkham¹⁶. The argument of Pinkham relied heavily on the scale-invariance property of Benford’s distribution. Today, it is widely accepted that another explanation, given by Hill^17,18 and based on random sampling from a mixture of random distributions, is more correct. An analytical explanation based on the multiplicative nature of fluctuations has been proposed by Pietronero et al.¹⁹.

In this paper we examine whether structural properties of complex networks agree with Benford’s distribution. In order to present our findings, we introduce basic notions and definitions pertaining to network structural properties, and in particular, to centrality measures. Let G = 〈V, E〉 be a network with the set of vertices V = {v₁, v₂, …, v_n} and the set of edges E = {(v_i, v_j):v_i, v_j ∈ V}. Let d(v_i) denote the degree of the vertex v_i, i.e. the number of vertices adjacent to v_i. Let δ(v_i, v_j) be the set of shortest paths between vertices v_i and v_j in the network G, and let δ_k(v_i, v_j) be the set of shortest paths between vertices v_i and v_j which pass through the vertex v_k. Finally, let Δ(v_i, v_j) denote the length of the shortest path between vertices v_i and v_j. A centrality measure is a function which assigns to each vertex a value representing the “importance” of the vertex in the network G. Of course, there are many different ways in which the importance of a vertex can be defined.

degree centrality C_D(v_i) = d(v_i) simply measures the number of vertices adjacent to the vertex v_i. The assumption here is that a vertex is important if it is directly connected to many vertices in the network.
betweenness centrality measures the number of shortest paths between any pair of vertices which pass through the vertex v_i. This interpretation of importance highlights the influence of a vertex on communication pathways through the network.
closeness centrality measures the average distance from the vertex v_i to all other vertices in the network. According to this definition, a vertex is important if it can quickly communicate with all remaining vertices in the network.
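The three centrality measures listed above can be sketched in plain Python for a small unweighted, undirected, connected graph. This is an illustrative sketch under stated assumptions, not the authors’ code: betweenness is taken in its standard form as the sum over vertex pairs of |δ_i(v_j, v_k)| / |δ(v_j, v_k)| and accumulated with Brandes’ algorithm, and closeness is taken as the reciprocal of the average distance Δ(v_i, v_j) to the other reachable vertices, which is one common normalization. All function names are hypothetical.

```python
from collections import deque


def bfs(adj, source):
    """Breadth-first search returning distances, shortest-path counts,
    predecessor lists and the visiting order from `source`."""
    dist = {source: 0}
    sigma = {v: 0 for v in adj}   # number of shortest paths from source
    sigma[source] = 1
    preds = {v: [] for v in adj}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def degree_centrality(adj):
    """C_D(v) = d(v), the number of adjacent vertices."""
    return {v: len(adj[v]) for v in adj}


def closeness_centrality(adj):
    """Reciprocal of the average distance to all reachable vertices (assumed form)."""
    result = {}
    for v in adj:
        dist, _, _, _ = bfs(adj, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def betweenness_centrality(adj):
    """Brandes' dependency accumulation; halved because the graph is undirected."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        _, sigma, preds, order = bfs(adj, s)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return {v: c / 2 for v, c in cb.items()}


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # path graph 0-1-2-3
    print(degree_centrality(path))
    print(betweenness_centrality(path))
    print(closeness_centrality(path))
```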

Apart from these three centrality measures²⁰, vertices in complex networks are commonly described using the local clustering coefficient. This feature describes the local neighborhood of a vertex, also known as the egocentric network of v_i, which consists of the vertex v_i, all its adjacent vertices, and all edges between these vertices. For a given vertex v_i, its local clustering coefficient is defined as the number of edges existing in its egocentric network divided by the maximum number of edges which could exist in this egocentric network (i.e. the number of edges that would exist in a clique of equal size). Local clustering coefficient is a convenient measurement of the completeness of the local neighborhood of a vertex. Figure 1 illustrates centrality measures for vertices. This network has been introduced by Ulrik Brandes and it is the smallest network in which four different vertices attain the maximum value of degree, betweenness, closeness, and local clustering coefficient, respectively. For each of the discussed centrality measures the size and the intensity of color of each vertex correspond to the value of the centrality measure.
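The egocentric definition of the local clustering coefficient given above can be sketched as follows. The first function follows the wording of the text (edges inside the egocentric network, including those incident to the vertex itself, divided by the edges of a clique of equal size). The second function is the more common neighbour-only variant and is included purely for comparison; it is an assumption of this sketch, not something stated in the text. Graphs are assumed to be undirected, without self-loops, and stored as a dictionary of neighbour sets.

```python
def egocentric_clustering(adj, v):
    """Edges in the egocentric network of v divided by the number of edges
    of a clique on the same number of vertices (definition used in the text)."""
    ego = set(adj[v]) | {v}
    size = len(ego)
    if size < 2:
        return 0.0
    # Each undirected edge inside the ego network is counted from both ends.
    edges = sum(len(set(adj[u]) & ego) for u in ego) // 2
    return edges / (size * (size - 1) / 2)


def neighbour_clustering(adj, v):
    """Common alternative: links among the neighbours of v divided by the
    number of neighbour pairs (0.0 when v has fewer than two neighbours)."""
    nbrs = set(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(len(set(adj[u]) & nbrs) for u in nbrs) // 2
    return links / (k * (k - 1) / 2)


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    for v in graph:
        print(v, egocentric_clustering(graph, v), neighbour_clustering(graph, v))
```

The two variants differ most visibly for low-degree vertices: under the egocentric definition a pendant vertex forms a complete two-vertex network, while the neighbour-only variant assigns it zero.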

In this paper we search for Benford’s distribution in various characteristics of complex networks. We investigate both real world networks and artificial networks, generated from popular network models: Erdös-Rényi random network model, Watts-Strogatz small world network model, Albert-Barabási preferential attachment model, and the forest fire model. We compute the distributions of centrality measures and perform multiple tests of agreement of these distributions with Benford’s distribution. Quite surprisingly, we find that despite power law distributions of centrality measures, they do not conform to Benford’s distribution, with a notable exception of betweenness centrality, which, for many of the examined networks, exhibits signs of conformity with Benford’s distribution.

Results

Real world datasets

In our experiments we have used datasets from the Stanford Large Network Dataset Collection²¹, as well as the datasets used by Golbeck¹⁴ and by Zhong et al.²². Table 1 summarizes the main characteristics of these datasets. Since there is no agreed-upon procedure for testing for the presence of Benford’s distribution in a dataset, for each of the considered networks we have performed 11 independent tests described in the Methods section. Each of these tests tries to establish the goodness of fit with Benford’s distribution based on a different criterion. We have observed that none of the approximately 8000 distributions of structural properties of artificial and real world networks was able to pass more than 2 goodness of fit tests, with the notable exception of betweenness. Thus, for the purpose of the evaluation of results within the paper we have decided to use, as a local criterion of agreement with Benford’s distribution, the threshold of 2 passed goodness of fit tests.

Table 1 Real world datasets (the sets used by Golbeck¹⁴ are marked with a † after their name).


Table 2 presents our findings; the last column contains the number of goodness of fit tests with positive results. Out of 15 real world networks only 5 networks have a structural property which passes the local criterion of agreement with Benford’s distribution, and this property is almost exclusively betweenness. Our local criterion is very lenient. Had we used a slightly stricter threshold, only two relatively small datasets (facebook and twitter) would have fulfilled the local criterion.

Table 2 Real world network properties which pass at least 2 goodness of fit tests.


Artificial datasets

Real world datasets are often incomplete, dirty, or biased by the harvesting method. The obvious lack of Benford’s distribution in structural properties of real world networks could be caused by the noise in real world data that distorted the outcomes of our analysis. To eliminate this possibility, we perform the analysis on artificial networks generated from a few popular generative network models. We have used the following artificial network models:

Erdös-Rényi random model²³ creates a network consisting of n vertices, and for each pair of vertices (v_i, v_j) an edge is created between them with the probability p (where n and p are the parameters of the model).
Watts-Strogatz small world model²⁴ creates a network of n vertices organized in a ring topology, where each vertex is connected to its k closest neighbors. After creating the initial ring each edge is randomly rewired with a very small probability p. Vertices in the resulting network tend to have similar degrees and their local clustering coefficients are an order of magnitude greater than in a random network. The rewiring process drastically changes the betweenness of a small number of nodes, which serve as bridges to remote parts of the network.
Albert-Barabási preferential attachment model²⁵ creates a network from an initial complete graph consisting of n₀ vertices. Subsequent vertices are added sequentially, and each new vertex creates k edges. The probability of choosing a vertex v_i as the target vertex for a new edge is proportional to its current degree d(v_i). The resulting network has a power law distribution of vertex degrees and vertex betweennesses.
forest fire model²⁶ also adds vertices sequentially. Upon arrival each vertex creates edges to k uniformly selected vertices, called ambassadors, and then adds more edges to neighbors of ambassadors with the forward burning probability p. The process continues recursively for each vertex to which an edge has been added.
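Three of the generative models described above can be sketched with the standard library alone. This is a simplified illustration, not the generators used in the experiments: the preferential attachment sketch uses purely linear (degree-proportional) attachment and does not expose a power-law exponent parameter, the Watts-Strogatz sketch assumes an even k smaller than n − 1, and the forest fire model is omitted. Function and parameter names are hypothetical.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, rng):
    """Each of the n(n-1)/2 vertex pairs becomes an edge with probability p."""
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def watts_strogatz(n, k, p, rng):
    """Ring lattice where each vertex links to its k closest neighbours,
    followed by rewiring of each lattice edge with probability p."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            w = (v + j) % n
            adj[v].add(w)
            adj[w].add(v)
    for v in range(n):
        for j in range(1, k // 2 + 1):
            w = (v + j) % n
            if w in adj[v] and rng.random() < p:
                candidates = [u for u in range(n) if u != v and u not in adj[v]]
                if candidates:
                    u = rng.choice(candidates)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(u)
                    adj[u].add(v)
    return adj


def preferential_attachment(n, n0, k, rng):
    """Start from a complete graph on n0 >= 2 vertices; every new vertex adds
    k <= n0 edges to distinct targets chosen proportionally to their degree."""
    adj = {v: set(range(n0)) - {v} for v in range(n0)}
    # A vertex appears once per incident edge, so a uniform draw from this
    # list selects vertices with probability proportional to their degree.
    endpoints = [v for v in adj for _ in adj[v]]
    for new in range(n0, n):
        targets = set()
        while len(targets) < k:
            targets.add(rng.choice(endpoints))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints.extend([new, t])
    return adj


if __name__ == "__main__":
    rng = random.Random(42)
    for name, graph in [
        ("random", erdos_renyi(1000, 0.005, rng)),
        ("small world", watts_strogatz(1000, 4, 0.02, rng)),
        ("preferential attachment", preferential_attachment(1000, 3, 2, rng)),
    ]:
        print(name, sum(len(nb) for nb in graph.values()) // 2, "edges")
```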

We have generated 50 instances of networks for each artificial network model and each value of the model parameter, and for each model we have tested 10 different values of the main model parameter. Each network had a constant size of n = 1000 vertices and for each network we have computed four distributions: degree, betweenness, closeness, and local clustering coefficient. Altogether we have tested 4*10*50*4 = 8000 possible distributions for the agreement with Benford’s distribution. Model parameters have been uniformly selected from the following ranges:

Erdös-Rényi random model: random edge probability ep ∈ [0.001, 0.01]
Watts-Strogatz small world model: random edge rewiring probability rp ∈ [0.01, 0.05]
Albert-Barabási preferential attachment model: power law exponent ac ∈ [1, 3]
forest fire model: forward burning probability fb ∈ [0.01, 0.25]

Table 3 presents the results of our experiments on artificially generated networks. Most of the tests failed to discover Benford’s distribution in any of the structural properties of complex networks, and only 5 tests produced any positive results. The number of positive results for each test is presented in Table 4. Despite very weak evidence for the presence of Benford’s distribution in artificial networks, both the Mantissa Arc test and the χ² test signal conformity with Benford’s distribution in networks generated using the preferential attachment process. These networks are known to have a power law distribution of betweenness²⁷ and local clustering coefficient²⁸. As has been shown before²⁹, a distribution is more likely to adhere to Benford’s distribution if it resembles a survival distribution, i.e. it puts most of its mass on small values of the random variable, and the power law distribution fulfills this condition. The Albert-Barabási preferential attachment model generates networks with power law distributions of vertex degrees as well. Yet, surprisingly at first glance, this structural feature is never found to conform to Benford’s distribution. However, the analysis of network properties with the nodes distributed according to the power law provides an explanation.

Table 3 Artificial network properties which pass at least 2 goodness of fit tests.


Table 4 Number of accepted goodness of fit tests from 60 real-world and 320 artificial network centrality measures distributions.


For a series of elements with a power law distribution, the probability that an element takes a given value is a decreasing function of that value, with the maximum probability at the minimum value in the series. An immediate conclusion is that only a series whose minimum value has the leading digit 1 has a chance to conform to Benford’s law. There is also a restriction on the power law exponent, which cannot be too large. As we have empirically checked, it must be no larger than 1.25 when the series has the minimum and maximum values of 1 and 10, respectively. This range is even smaller for larger ranges of the minimum and maximum values. In summary, only a series with the minimum value in the range [10^k, 10^{k+1}), k = 1, …, the maximum value around 10^{k+m}, m = 1, …, and the power law exponent in the range [1, 1.25] may have a distribution of its element values resembling Benford’s distribution.
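The dependence of the leading-digit distribution on the minimum value, the maximum value, and the exponent of a power law can be explored numerically. The sketch below is only an illustration under explicit assumptions: values are taken to be integers between x_min and x_max with P(x) proportional to x^(−α), and the mean absolute deviation from Benford’s probabilities is used as a simple distance. It does not reproduce the empirical check reported above, and the function names are hypothetical.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def leading_digit(n: int) -> int:
    """Most significant decimal digit of a positive integer."""
    return int(str(n)[0])


def power_law_digit_distribution(x_min: int, x_max: int, alpha: float):
    """Leading-digit probabilities for P(x) ~ x**(-alpha) on integers x_min..x_max."""
    weights = [0.0] * 9
    for x in range(x_min, x_max + 1):
        weights[leading_digit(x) - 1] += x ** (-alpha)
    total = sum(weights)
    return [w / total for w in weights]


def mean_absolute_deviation(p, q):
    """Average absolute difference between two digit distributions."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


if __name__ == "__main__":
    for alpha in (1.0, 1.25, 2.0, 3.0):
        dist = power_law_digit_distribution(1, 999, alpha)
        print(alpha, round(mean_absolute_deviation(dist, BENFORD), 4))
```

Changing `x_min` to a value whose leading digit is not 1 moves the most frequent digit away from 1, which illustrates why such a series cannot follow Benford’s distribution.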

This analysis directly applies to the degree centrality measure for networks with a power law distribution of node degrees. Thus, only a network with the minimum degree in the range [10^k, 10^{k+1}), k = 0, 1, …, the natural cut off around 10^{k+m}, m = 1, …, and the power law exponent in the range [1, 1.25] may have the degree centrality measure distributed according to Benford’s law. Betweenness centrality for networks with power law distributed node degrees is also power law distributed²⁷. The minimum number of shortest paths between any two nodes passing through the given node is n − 1. Hence, only a network with the number of nodes in the range [10^k + 1, 10^{k+1}], k = 0, 1, …, the natural cut off around 10^{k+m}, m = 1, …, and the power law exponent of betweenness centrality between 1 and 1.25 may have the betweenness centrality measure distributed according to Benford’s law. The local clustering coefficient of networks with a power law distribution of node degrees also has a power law distribution²⁸. Here, this measure may obey Benford’s law only when the first significant digit of the minimum non-zero local clustering coefficient is 1 and the power law exponent of the distribution of local clustering coefficients is in the range [1, 1.25].

Finally, a similar analysis of the node degree distribution for the Erdös-Rényi random network model may start with the observation that the node degree with the highest probability of appearing in the network is the integer closest to p(n − 1) (where p is the probability of having an edge between any pair of nodes), and it must have the leading digit of 1. On the other hand, the distribution is narrow, with a width of about the square root of the average degree, so to the right of the average degree it does not reach degrees with leading digits larger than 2. Thus, the frequencies for such larger digits have to come from the range to the left of the average degree. Hence the frequencies will be increasing for digits growing from 3 to 9, while in Benford’s distribution those frequencies are decreasing. The conclusion is that an Erdös-Rényi network may have a node degree distribution resembling Benford’s distribution only if its average degree is close to 1, in agreement with our results.
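The argument for the Erdös-Rényi model can be checked numerically, since the degree of a vertex in such a network follows a Binomial(n − 1, p) distribution. The sketch below computes the leading-digit distribution of non-zero degrees from that binomial law. The values n = 1000 and p = 0.012 are illustrative assumptions chosen so that the average degree is about 12; they are not taken from the experiments.

```python
import math


def degree_pmf(n: int, p: float):
    """Binomial(n - 1, p) probabilities of vertex degrees 0..n-1, for 0 < p < 1,
    computed in log space to avoid overflow for large n."""
    log_p, log_q = math.log(p), math.log(1 - p)
    pmf = []
    for k in range(n):
        log_binom = math.lgamma(n) - math.lgamma(k + 1) - math.lgamma(n - k)
        pmf.append(math.exp(log_binom + k * log_p + (n - 1 - k) * log_q))
    return pmf


def leading_digit_distribution(pmf):
    """Leading-digit probabilities of the degree, conditioned on degree > 0."""
    weights = [0.0] * 9
    for k, prob in enumerate(pmf):
        if k > 0:
            weights[int(str(k)[0]) - 1] += prob
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    digits = leading_digit_distribution(degree_pmf(1000, 0.012))
    for d, prob in enumerate(digits, start=1):
        print(d, round(prob, 4), round(math.log10(1 + 1 / d), 4))
```

With these assumed parameters the printed frequencies for digits 3 to 9 grow with the digit, whereas the Benford probabilities in the last column shrink.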

Discussion

Our analysis shows that previously reported presence of Benford’s distribution in complex networks¹⁴ is not supported by the rigorous set of tests that we conducted. A thorough examination using several different statistical tools does not indicate the presence of Benford’s distribution in complex networks. These results allow us to conclude that Benford’s distribution is not commonly present in the structural properties of either empirical or artificial complex networks. We also present here theoretical analysis of networks with power law distribution of node degrees and measures of degree centrality, betweenness centrality, and local clustering coefficient. The analysis demonstrates that for only narrow ranges of the parameters of the power law distribution, specifically the minimum degree, the natural cut off and the power law exponent, the distributions of the considered measures may resemble Benford’s distribution.

The main practical conclusion that can be drawn from our results is that Benford’s Law cannot be used to check the correctness of structural properties of complex networks. However, for the networks with power law distributed node degrees, we show that the distribution of leading digits of these three measures is well defined by the parameters of power law distribution. So these easy to establish distributions can be used instead of Benford’s distribution to discover fraud, incompleteness or manipulation of network structure and such applications will be the subject of our future work.

Methods

The literature provides several methods of testing the conformity of a given distribution with Benford’s distribution. These methods are highly dependent on the area of application; different protocols are used when analyzing financial results, voting registers, or network intrusion records. For instance, Nigrini and Miller³⁰ advocate the use of second order tests for financial data diagnostics (testing frequencies of leading digits of differences between ranked values instead of values themselves) claiming that this method is superior when rounding of data occurs. Other tests include the Distortion Factor Model³¹ and the Bayesian approach proposed by Ley³². In order to perform a thorough verification of the presence of Benford’s distribution in complex networks structural features we employ 11 different tests, summarized below. We have used two R packages, BenfordTests³³ and benford.analysis³⁴. For each of the performed tests we reject the null hypothesis for p-value ≤ 0.05.

χ² test: Pearson’s chi-square goodness of fit test with the statistic defined as $\chi^2 = \sum_{i=1}^{9} (O_i - E_i)^2 / E_i$, where $O_i$ is the observed frequency (count) of the digit i, and $E_i$ is the expected frequency (count) of the digit i. The null hypothesis is that there is no difference between observed and expected frequencies.
Mean Absolute Deviation (MAD): the average deviation of the actual digit distribution from the expected Benford’s distribution. This statistic is defined as $\mathrm{MAD} = \frac{1}{K}\sum_{i=1}^{K} |o_i - e_i|$, where $o_i$ and $e_i$ are the observed and expected proportions in digit bin i and K is the number of bins. We follow the suggestion of Nigrini³⁵ and define MAD ≤ 0.0012 as close conformity, MAD ∈ [0.0012, 0.0018] as acceptable conformity, MAD ∈ [0.0018, 0.0022] as marginally acceptable conformity, and MAD ≥ 0.0022 as non-conformity to Benford’s distribution.
Mantissa Arc Test (MAT): the test computes the center of mass of a set of mantissae distributed on a unit circle. For a number x its coordinates on the circle are defined as follows: $x_c = \cos(2\pi(\log_{10} x \bmod 1))$, $y_c = \sin(2\pi(\log_{10} x \bmod 1))$. If the mantissae of a set of numbers {x₁, x₂, …, x_n} are uniformly distributed on the circle, the center of mass, also known as the mean vector, is at (0, 0); otherwise its squared distance from the center of the circle is L². The MAT test defines the following test statistic: $L^2 = \left(\frac{1}{n}\sum_{i=1}^{n}\cos(2\pi(\log_{10} x_i \bmod 1))\right)^2 + \left(\frac{1}{n}\sum_{i=1}^{n}\sin(2\pi(\log_{10} x_i \bmod 1))\right)^2$, and this statistic is checked for significance against the χ² distribution with 2 degrees of freedom. The MAT test was first proposed by Alexander³⁶.
Distortion Factor: proposed by Nigrini³⁵, this test compares the actual mean of the set of numbers with the mean expected for a Benford’s set of the same size using the standard Z-statistic.
Pearson’s r: traditional Pearson’s product-moment correlation coefficient measuring the linear correlation between the observed frequency of digits and the frequency of digits expected in Benford’s distribution.
Kolmogorov-Smirnov test: traditional test of the distance between cumulative distributions, with the test statistic D defined as the largest absolute difference between the observed and the expected cumulative digit distributions. The result of the test is determined by the p-value of the D statistic.
Freedman-Watson Test: a test to compare discrete distributions, based on the U² statistic. The result of the test is determined by the p-value of the U² statistic.
Chebyshev Distance Test: a simple maximum norm statistic m, defined as the largest absolute difference between the observed and the expected digit frequencies. The result of the test is determined by the p-value of the m statistic.
Euclidean Distance Test: performs a goodness of fit test based on the Euclidean distance d between the observed and the expected digit distributions. The result of the test is determined by the p-value of the d statistic.
Judge-Schechter Mean Deviation Test: a goodness of fit test based on the deviation of mean digits, with the test statistic a* computed from the difference between the observed mean of the chosen k number of digits and the mean expected should the sample conform to Benford’s distribution. The test statistic a* under the null hypothesis has a truncated normal distribution, a* ~ N_T(μ = 0, σ = σ_B, a = 0, b = ∞).
Joenssen’s Test: a sign-preserving squared correlation test between the observed distribution and Benford’s distribution, with the test statistic equal to the squared Pearson correlation coefficient between the observed and the expected digit frequencies, multiplied by the sign of that coefficient. The result of the test is determined by the p-value of the statistic.
Hotelling T² Test: a generalization of the Student’s t statistic to a multivariate case. The T² statistic of this test is computed using the pooled covariance matrix S. Under the null hypothesis the T² statistic follows the F-distribution and the result of the test is determined by the p-value of the T² statistic.
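A few of the simpler statistics from the list above can be sketched with the standard library. This is an illustrative sketch for first digits only, not the implementation in the R packages used for the experiments: the χ² statistic is written in terms of proportions and the sample size n, the Chebyshev and Euclidean distances are left unscaled (packages may rescale them, for example by the square root of n), and the mantissa arc function returns only L² without a p-value. The χ² p-value uses the closed-form survival function that exists for an even number of degrees of freedom (8 here). All function names are assumptions.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def leading_digit(x) -> int:
    """Most significant decimal digit of a non-zero number."""
    return int(f"{abs(x):.15e}"[0])


def observed_proportions(values):
    """First-digit proportions of the non-zero values and the sample size."""
    counts = [0] * 9
    for v in values:
        if v != 0:
            counts[leading_digit(v) - 1] += 1
    n = sum(counts)
    return [c / n for c in counts], n


def chi_square_test(values):
    """Pearson statistic and its p-value under chi-square with 8 degrees of freedom."""
    obs, n = observed_proportions(values)
    stat = n * sum((o - e) ** 2 / e for o, e in zip(obs, BENFORD))
    half = stat / 2
    # For 2m degrees of freedom: SF(x) = exp(-x/2) * sum_{j<m} (x/2)^j / j!
    p_value = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(4))
    return stat, p_value


def mean_absolute_deviation(values):
    obs, _ = observed_proportions(values)
    return sum(abs(o - e) for o, e in zip(obs, BENFORD)) / 9


def chebyshev_distance(values):
    obs, _ = observed_proportions(values)
    return max(abs(o - e) for o, e in zip(obs, BENFORD))


def euclidean_distance(values):
    obs, _ = observed_proportions(values)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, BENFORD)))


def mantissa_arc_l2(values):
    """Squared length of the mean vector of mantissae placed on the unit circle."""
    angles = [2 * math.pi * (math.log10(abs(v)) % 1) for v in values if v != 0]
    mean_x = sum(math.cos(a) for a in angles) / len(angles)
    mean_y = sum(math.sin(a) for a in angles) / len(angles)
    return mean_x ** 2 + mean_y ** 2


if __name__ == "__main__":
    sample = [2 ** k for k in range(1, 501)]
    print("chi-square:", chi_square_test(sample))
    print("MAD:", mean_absolute_deviation(sample))
    print("Chebyshev:", chebyshev_distance(sample))
    print("Euclidean:", euclidean_distance(sample))
    print("L^2:", mantissa_arc_l2(sample))
```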

Having used so many statistical tests to verify the goodness of fit with Benford’s distribution, we need to establish the sensitivity of the tests and their mutual correlation. A simple way to do this is to run a suite of tests on data with varying degrees of conformity with Benford’s distribution and to compute the p-values of these tests. In this experiment we use the following protocol. For each data point we create 50 random samples of 10 000 numbers, and we run all of the above tests on each sample. Then, we compute the average p-value for each test over these 50 samples. There are 100 data points, each representing a different mixture of Benford’s and normal distributions. A summary of these two distributions is presented in Table 5. Initially, all 10 000 numbers are drawn from the normal distribution, and in each step 1% of the sample is replaced by numbers drawn from Benford’s distribution. Figure 2 shows the average p-values (ordinate) depending on the purity of Benford’s distribution (abscissa). Most of the tests behave very similarly and reject the null hypothesis of the presence of Benford’s distribution until the distribution is 95% pure, while the Freedman-Watson U-squared test and the Mantissa Arc test are slightly more conservative. The only exception is the Judge-Schechter Mean Deviation test, which signals the presence of Benford’s distribution already at the 82% purity threshold.
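The mixing protocol described above can be sketched for a single test. The code below is a simplified illustration, not the original experiment: only the χ² test on first digits is run, Benford-distributed numbers are produced as 10^U with U uniform over a whole number of decades (which makes the mantissae uniform), and the normal distribution parameters (mean 50, standard deviation 10) are placeholders, since the actual parameters appear only in Table 5. Function names and the reduced repetition count in the demo are assumptions.

```python
import math
import random

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def leading_digit(x: float) -> int:
    return int(f"{abs(x):.15e}"[0])


def sample_mixture(size, purity, rng, mu=50.0, sigma=10.0):
    """`purity` is the fraction of numbers drawn from Benford's distribution;
    the rest come from a normal distribution (mu and sigma are placeholders)."""
    n_benford = round(size * purity)
    benford_part = [10 ** rng.uniform(0, 6) for _ in range(n_benford)]
    normal_part = [rng.gauss(mu, sigma) for _ in range(size - n_benford)]
    return benford_part + normal_part


def chi_square_p_value(values):
    """p-value of Pearson's chi-square test on first digits (8 degrees of freedom)."""
    counts = [0] * 9
    for v in values:
        if v != 0:
            counts[leading_digit(v) - 1] += 1
    n = sum(counts)
    stat = sum((c - n * e) ** 2 / (n * e) for c, e in zip(counts, BENFORD))
    half = stat / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(4))


def average_p_value(purity, rng, repetitions=50, size=10_000):
    """Average p-value over repeated random samples at a given purity."""
    total = sum(
        chi_square_p_value(sample_mixture(size, purity, rng))
        for _ in range(repetitions)
    )
    return total / repetitions


if __name__ == "__main__":
    rng = random.Random(2016)
    for purity in (0.0, 0.5, 0.9, 1.0):
        print(purity, average_p_value(purity, rng, repetitions=5))
```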

Table 5 Summary of distributions used in tests comparing goodness of fit.


In each test the null hypothesis states that the given set of numbers follows Benford’s distribution. Assuming the standard rejection threshold of the null hypothesis at p-value ≤ 0.05 level, Table 6 presents the average purity of Benford’s distribution accepted by each test. As can be seen, all tests (except for the Distortion Factor and the Judge-Schechter Mean Deviation tests) behave in a very coherent way, requiring a strong goodness of fit before accepting the null hypothesis. These results allow us to conclude that Benford’s distribution is not present in the structural properties of either empirical, or artificial complex networks. We also present here the analysis of networks with power law distribution of node degrees and measures of degree centrality, betweenness centrality, and local clustering coefficient. The analysis demonstrates that for only narrow ranges of the parameters of the power law distribution, specifically the minimum degree, the natural cut off, and the power law exponent, the distributions of the considered measures may resemble Benford’s distribution.

Table 6 Average purity of Benford’s distribution accepted by each test.


Additional Information

How to cite this article: Morzy, M. et al. Benford’s Distribution in Complex Networks. Sci. Rep. 6, 34917; doi: 10.1038/srep34917 (2016).

References

1. Newcomb, S. Note on the frequency of use of the different digits in natural numbers. American Journal of Mathematics 4, 39–40 (1881).
2. Benford, F. The law of anomalous numbers. Proceedings of the American Philosophical Society 551–572 (1938).
3. Nigrini, M. J. & Miller, S. J. Benford’s law applied to hydrology data: results and relevance to other geophysical data. Mathematical Geology 39, 469–490 (2007).
4. Giles, D. E. Benford’s law and naturally occurring prices in certain eBay auctions. Applied Economics Letters 14, 157–161 (2007).
5. El Sehity, T., Hoelzl, E. & Kirchler, E. Price developments after a nominal shock: Benford’s law and psychological pricing after the euro introduction. International Journal of Research in Marketing 22, 471–480 (2005).
6. Bolton, R. J. & Hand, D. J. Statistical fraud detection: A review. Statistical Science 235–249 (2002).
7. Geyer, C. L. & Williamson, P. P. Detecting fraud in data sets using Benford’s law. Communications in Statistics-Simulation and Computation 33, 229–246 (2004).
8. Durtschi, C., Hillison, W. & Pacini, C. The effective use of Benford’s law to assist in detecting fraud in accounting data. Journal of Forensic Accounting 5, 17–34 (2004).
9. Mebane Jr, W. R. Election forensics: Vote counts and Benford’s law. In Summer Meeting of the Political Methodology Society, UC-Davis, July, 20–22 (2006).
10. Deckert, J., Myagkov, M. & Ordeshook, P. C. Benford’s law and the detection of election fraud. Political Analysis 19, 245–268 (2011).
11. Deckert, J., Myagkov, M. & Ordeshook, P. C. The irrelevance of Benford’s law for detecting fraud in elections. Caltech/MIT Voting Technology Project Working Paper (2010).
12. Judge, G. & Schechter, L. Detecting problems in survey data using Benford’s law. Journal of Human Resources 44, 1–24 (2009).
13. Becker, P. W. Patterns in listings of failure-rate & MTTF values and listings of other data. Reliability, IEEE Transactions on 31, 132–134 (1982).
14. Golbeck, J. Benford’s law applies to online social networks. PLoS One 10 (2015).
15. Raimi, R. A. The first digit problem. The American Mathematical Monthly 83, 521–538 (1976).
16. Pinkham, R. S. On the distribution of first significant digits. The Annals of Mathematical Statistics 32, 1223–1230 (1961).
17. Hill, T. P. A statistical derivation of the significant-digit law. Statistical Science 354–363 (1995).
18. Hill, T. P. The significant-digit phenomenon. The American Mathematical Monthly 102, 322–327 (1995).
19. Pietronero, L., Tosatti, E., Tosatti, V. & Vespignani, A. Explaining the uneven distribution of numbers in nature: the laws of Benford and Zipf. Physica A: Statistical Mechanics and its Applications 293, 297–304 (2001).
20. Freeman, L. C. Centrality in social networks conceptual clarification. Social Networks 1, 215–239 (1978).
21. Leskovec, J. & Krevl, A. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data. [Online; accessed 9-May-2016] (2014).
22. Zhong, C. et al. Social bootstrapping: how Pinterest and Last.fm social communities benefit by borrowing links from Facebook. In Proceedings of the 23rd international conference on World wide web, 305–314 (ACM, 2014).
23. Erdös, P. & Rényi, A. On the evolution of random graphs. Publ. Math. Inst. Hungar. Acad. Sci 5, 17–61 (1960).
24. Watts, D. J. & Strogatz, S. H. Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998).
25. Barabási, A.-L. & Albert, R. Emergence of scaling in random networks. Science 286, 509–512 (1999).
26. Leskovec, J., Kleinberg, J. & Faloutsos, C. Graphs over time: densification laws, shrinking diameters and possible explanations. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, 177–187 (ACM, 2005).
27. Barthelemy, M. Betweenness centrality in large complex networks. The European Physical Journal B-Condensed Matter and Complex Systems 38, 163–168 (2004).
28. Bollobás, B. & Riordan, O. M. Mathematical results on scale-free random graphs. Handbook of graphs and networks: from the genome to the internet 1–34 (2003).
29. Formann, A. K. The Newcomb-Benford law in its relation to some common distributions. PLoS One 5, e10541 (2010).
30. Nigrini, M. J. & Miller, S. J. Data diagnostics using second-order tests of Benford’s law. Auditing: A Journal of Practice & Theory 28, 305–324 (2009).
31. Nigrini, M. J. A taxpayer compliance application of Benford’s law. The Journal of the American Taxation Association 18, 72 (1996).
32. Ley, E. On the peculiar distribution of the US stock indexes’ digits. The American Statistician 50, 311–313 (1996).
33. Joenssen, D. W. BenfordTests: Statistical Tests for Evaluating Conformity to Benford’s Law. URL http://CRAN.R-project.org/package=BenfordTests. R package version 1.2.0 (2015).
34. Cinelli, C. benford.analysis: Benford Analysis for Data Validation and Forensic Analytics. URL http://CRAN.R-project.org/package=benford.analysis. R package version 0.1.3 (2015).
35. Nigrini, M. Benford’s Law: Applications for forensic accounting, auditing, and fraud detection, vol. 586 (John Wiley & Sons, 2012).
36. Alexander, J. C. Remarks on the use of Benford’s law. Available at SSRN 1505147 (2009).


Acknowledgements

The work was partially supported by the European Commission under the 7th Framework Programme, Coordination and Support Action, Grant Agreement Number 316097 [ENGINE], the RENOIR project, “Reverse EngiNeering of sOcial Information pRocessing”, the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 691152, by the National Science Centre research project DEC-2013/09/B/ST6/02317, by the Faculty of Computer Science and Management, Wrocław University of Science and Technology statutory funds, and by the Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053 (the Network Science CTA).

Author information

Authors and Affiliations

Institute of Computing Science, Poznań University of Technology, Poznań, 60–965, Poland
Mikołaj Morzy
Faculty of Computer Science & Management, Wrocław University of Science and Technology, Wrocław, 50–370, Poland
Mikołaj Morzy, Tomasz Kajdanowicz & Bolesław K. Szymański
Social and Cognitive Networks Academic Research Center, Rensselaer Polytechnic Institute, Troy, 12180, NY, USA
Bolesław K. Szymański


Contributions

All authors analyzed the data and conceived and designed the experiments. M.M. conducted the experiments and wrote the manuscript. B.S. provided theoretical explanations of the results. All authors edited the manuscript.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/


About this article


Received: 14 May 2016
Accepted: 21 September 2016
Published: 17 October 2016
DOI: https://doi.org/10.1038/srep34917
