Early onset of structural inequality in the formation of collaborative knowledge in all Wikimedia projects

Abstract

The Wikimedia project, including Wikipedia, is one of the largest communal data sets and has served as a representative medium for conveying collective knowledge in the twenty-first century. Researchers have believed that analysing these collaborative digital data sets provides a unique window into the processes of collaborative knowledge formation; in practice, however, most previous studies have focused on narrow subsets of the data. Here, by analysing all 863 Wikimedia projects (of various types and in different languages), we find evidence for a universal growth pattern in communal data formation. We observe that inequality arises early in the development of Wikimedia projects and stabilizes at high levels. To understand the mechanism behind the observed structural inequality, we develop an agent-based model that considers the characteristics of the editors and successfully reproduces the empirical results. Our findings from the Wikimedia projects data, along with other types of collaboration data, such as patents and academic papers, show that a small number of editors have a disproportionately large influence on the formation of collective knowledge. This analysis offers insights into how various collaboration environments can be sustained in the future.

Fig. 1: Correlation between the number of edits (Ne), editors (Np) and articles (Na), and the total size of the data set (S).
Fig. 2: Variant of the Gini coefficient applied to Wikimedia projects as a function of the number of edits and real time.
Fig. 3: The properties of revisiting editors are characterized by their activity.
Fig. 4: Variant of the Gini coefficient from the model as a function of the number of edits.
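
The captions of Figs. 2 and 4 refer to a variant of the Gini coefficient. The exact variant is not specified in this excerpt, so the following is only a minimal sketch of the standard Gini coefficient applied to per-editor edit counts, to illustrate what kind of inequality is being measured. The function name `gini` and the sample edit counts are hypothetical and are not taken from the study.

```python
from typing import Sequence


def gini(counts: Sequence[float]) -> float:
    """Standard Gini coefficient of non-negative contribution counts.

    Assumption: this is the textbook definition, not the variant used
    in the paper's figures. Returns 0.0 for empty or all-zero input.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum_i(i * x_i) / (n * sum_i(x_i)) - (n + 1) / n,
    # where x_i are the sorted values and i runs from 1 to n.
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical edit counts per editor (illustrative values only).
equal_edits = [10, 10, 10, 10]   # every editor contributes equally
skewed_edits = [1, 1, 1, 97]     # one editor dominates

if __name__ == "__main__":
    print(f"equal:  {gini(equal_edits):.2f}")
    print(f"skewed: {gini(skewed_edits):.2f}")
```

With equal contributions the coefficient is 0, and it approaches (n - 1)/n as a single editor accounts for nearly all edits, which is the sense in which a small number of editors can have a disproportionately large influence.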

Data availability

Wikimedia dumps for the main analysis are available from Wikimedia Downloads (https://dumps.wikimedia.org/). Additional public data sets are available from the following sources: OECD (https://data.oecd.org/); UNESCO (http://data.uis.unesco.org/); and the CIA (https://www.cia.gov/library/publications/the-world-factbook/). The data set of the total number of speakers for each language is owned by SIL International and can be accessed from their website by means of a subscription (https://www.ethnologue.com/). Bibliographic metadata of academic papers and patents were retrieved from the in-house system of the Korea Institute of Science and Technology Information and were licensed from Scopus (https://www.scopus.com/) and the European Patent Office (https://www.epo.org/searching-for-patents/business/patstat.html); distribution is prohibited. The pre-processed data used to create the figures are available from GitHub (https://github.com/bluekura/wikimedia-inequality), along with the code.

Acknowledgements

This work received institutional support from the Korea Institute of Science and Technology Information. It was also supported by National Research Foundation (NRF) of Korea grants funded by the Korean Government (grant nos. NRF-2017R1E1A1A03070975 (J.Y.), NRF-2018R1C1B5083863 (S.H.L.) and NRF-2017R1A2B3006930 (H.J.)). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

All three authors designed the experiment and wrote the manuscript. J.Y. collected and analysed the data.

Corresponding authors

Correspondence to Sang Hoon Lee or Hawoong Jeong.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–27, Supplementary Tables 1–7, Supplementary Methods, Supplementary References

Reporting Summary

Cite this article

Yun, J., Lee, S.H. & Jeong, H. Early onset of structural inequality in the formation of collaborative knowledge in all Wikimedia projects. Nat Hum Behav 3, 155–163 (2019). https://doi.org/10.1038/s41562-018-0488-z
