Main

The existence of a power law in the growth of the web not only implies the lack of any length scale for the web, but also allows the expected number of sites of any given size to be determined without exhaustively crawling the web. The distribution of site sizes for crawls by Alexa and Infoseek is shown in Fig. 1. Both data sets display a power law over several orders of magnitude, so on a log–log scale the distribution of the number of pages per site appears as a straight line. This distribution should not be confused with Zipf-like distributions1,2, in which a power law arises from rank-ordering the variables3.

Figure 1: Log–log plot of the distribution of pages in sites for Alexa and Infoseek crawls, which covered 259,794 and 525,882 sites, respectively.

There is a drop-off at approximately 10^5 pages because server limitations mean that search engines do not systematically collect more pages per site than this. A linear regression on the variables log(number of sites) and log(number of pages) yielded [1.647, 1.853] as the 95% confidence interval for the exponent β in the Alexa crawl, and [1.775, 1.909] for the Infoseek crawl. These estimates for the power-law slope are consistent across the two data sets and with the model, which predicts that β is greater than 1.
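As a rough illustration of how such an exponent estimate can be obtained, the sketch below fits a straight line to logarithmically binned site-size counts on a log–log scale. The data, binning and random seed are synthetic placeholders, not the Alexa or Infoseek crawl data.

```python
# Sketch: estimating a power-law exponent beta by linear regression of
# log(number of sites) on log(number of pages per site). Synthetic data only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic site sizes drawn from a classical Pareto distribution, P(n) ~ n^-2.
sizes = np.floor(rng.pareto(1.0, size=200_000) + 1.0)

# Logarithmic bins; dividing counts by bin width recovers counts per unit size,
# which is what the per-size regression needs.
edges = np.logspace(0, np.log10(sizes.max()), 30)
counts, _ = np.histogram(sizes, bins=edges)
density = counts / np.diff(edges)
centers = np.sqrt(edges[:-1] * edges[1:])       # geometric bin centres
mask = counts > 0

# Slope of the log-log fit gives -beta; stderr gives a rough 95% half-width.
fit = stats.linregress(np.log(centers[mask]), np.log(density[mask]))
beta, half_width = -fit.slope, 1.96 * fit.stderr
print(f"beta ≈ {beta:.2f}, 95% CI ≈ [{beta - half_width:.2f}, {beta + half_width:.2f}]")
```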

In order to describe the growth process underlying this distribution4, we assume that the day-to-day fluctuations in site size are proportional to the size of the site. One would not be surprised to find that a site with a million pages has lost or gained a few hundred pages on any given day. On the other hand, finding an additional hundred pages on a site with just ten pages within a day would be unusual. So we assume that the number of pages on the site, n, on a given day, is equal to the number of pages on that site on the previous day plus or minus a random fraction of n.

If a set of sites is allowed to grow with the same average growth rate but with individual random daily fluctuations in the number of pages added, their sizes will be distributed log-normally after a sufficiently long period of time5. A log-normal distribution gives high probability to small sizes and small, but significant, probability to very large sizes. But although it is skewed and has a long tail, the log-normal distribution is not a power-law one.
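The multiplicative growth assumption and its log-normal consequence can be illustrated with a short simulation; the mean growth rate and fluctuation size below are arbitrary illustrative choices, not measured values.

```python
# Minimal sketch of purely multiplicative growth: all sites start at the same
# time with the same expected growth rate, and each day a site's size changes
# by a random fraction of its current size.
import numpy as np

rng = np.random.default_rng(1)

n_sites, n_days = 50_000, 1_000
sizes = np.ones(n_sites)                     # every site starts with one page

for _ in range(n_days):
    # Daily change is a random fraction of the current size (multiplicative noise).
    fluctuation = rng.normal(loc=0.001, scale=0.05, size=n_sites)
    sizes *= 1.0 + fluctuation

# After many steps log(size) is approximately normal, i.e. sizes are log-normal.
log_sizes = np.log(sizes)
print(f"mean(log n) = {log_sizes.mean():.2f}, std(log n) = {log_sizes.std():.2f}")
```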

Two additional factors that determine the growth of the web need to be considered: sites appear at different times and grow at different rates. The number of web sites has been growing exponentially since the web's inception, which means that there are many more young sites than old ones. Once the age of the site is factored into the multiplicative growth process, P(n), the probability of finding a site of size n, is a power law, that is, it is proportional to n^−β. Similarly, considering sites with a wide distribution of growth rates yields the same result: a power-law distribution in site size. The simple assumption of stochastic multiplicative growth, combined with the fact that sites appear at different times and/or grow at different rates, therefore leads to an explanation of the observed power-law behaviour.
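A sketch of the effect of different starting times is given below: site ages are drawn from an exponential distribution (many young sites, few old ones, as implied by exponential growth in the number of sites), and each site then grows multiplicatively for its own age. The parameters are illustrative assumptions, not fitted values.

```python
# Sketch: multiplicative growth with exponentially distributed site ages.
# Mixing log-normal sizes over such ages produces a heavy, power-law-like tail.
import numpy as np

rng = np.random.default_rng(2)

n_sites = 50_000
# Ages in days: exponential distribution means most sites are young.
ages = rng.exponential(scale=500.0, size=n_sites).astype(int) + 1

sizes = np.ones(n_sites)
for day in range(ages.max()):
    active = ages > day                           # sites old enough to grow on this day
    noise = rng.normal(0.001, 0.05, size=active.sum())
    sizes[active] *= 1.0 + noise

# A log-log histogram of the resulting sizes is approximately a straight line.
counts, edges = np.histogram(sizes, bins=np.logspace(0, np.log10(sizes.max()), 20))
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    if c:
        print(f"{lo:12.1f} – {hi:12.1f}: {c}")
```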

The existence of this universal power law, which is yet another example of the strong regularities6,7 revealed by studies of the web, also has practical consequences. The expected number of sites of any given size can be estimated, even if a site of that size has not yet been observed. This can be achieved by extrapolating the power law to any large n; for example, P(n_2) = P(n_1) × (n_2/n_1)^−β. The expected number of sites of size n_2 in a crawl of N sites would be N × P(n_2). For instance, from the Alexa data we can infer that, if data were collected from 250,000 sites, the probability of finding a site with a million pages would be 10^−4. This information is not readily available from the crawl alone, as it stops at 10^5 pages per site.
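A minimal numeric sketch of this extrapolation follows; the exponent, the reference size n_1 and its probability P(n_1) are placeholder values chosen only to show the arithmetic, not figures taken from the crawls.

```python
# Sketch of extrapolating a fitted power law to a site size larger than any
# observed in the crawl. beta, n1 and p_n1 below are illustrative placeholders.
beta = 1.75                        # assumed exponent within the reported intervals
n1, p_n1 = 1.0e4, 5.0e-3           # hypothetical reference size and its probability
n2 = 1.0e6                         # target size, beyond the 10^5-page cutoff

p_n2 = p_n1 * (n2 / n1) ** (-beta)          # P(n2) = P(n1) * (n2/n1)^-beta
n_crawled = 250_000
print(f"P(n2) ≈ {p_n2:.2e}")
print(f"expected number of such sites in the crawl ≈ {n_crawled * p_n2:.2f}")
```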