Information flow reveals prediction limits in online social activity

A Publisher Correction to this article was published on 04 February 2019

This article has been updated

Abstract

Modern society depends on the flow of information over online social networks, and users of popular platforms generate substantial behavioural data about themselves and their social ties [1–5]. However, it remains unclear what fundamental limits exist when using these data to predict the activities and interests of individuals, and to what accuracy such predictions can be made using an individual’s social ties. Here, we show that 95% of the potential predictive accuracy for an individual is achievable using their social ties only, without requiring that individual’s data. We used information-theoretic tools to estimate the predictive information in the writings of Twitter users, providing an upper bound on the available predictive information that holds for any predictive or machine learning method. As few as 8–9 of an individual’s contacts are sufficient to obtain predictability comparable to that of the individual alone. Distinct temporal and social effects are visible by measuring information flow along social ties, allowing us to better study the dynamics of online activity. Our results have distinct privacy implications: information is so strongly embedded in a social network that, in principle, one can profile an individual from their available social ties even when the individual forgoes the platform completely.
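The abstract refers to information-theoretic tools that bound predictability but does not name them on this page. As a rough illustration only, the sketch below assumes a nonparametric match-length entropy-rate estimator in the style of ref. 25 and a Fano-type conversion from entropy to an upper bound on predictability in the style of ref. 26. The function names (`match_lengths`, `entropy_rate`, `max_predictability`) are hypothetical, and nothing here should be read as the authors' code, which is available from them on request.

```python
import math
from typing import Hashable, List, Sequence


def match_lengths(seq: Sequence[Hashable]) -> List[int]:
    """For each position i, return the length of the shortest run starting
    at i that does not occur entirely within the past seq[:i].
    Assumption: matches must lie wholly in the past (no overlap with i)."""
    n = len(seq)
    lambdas = []
    for i in range(n):
        longest = 0
        for j in range(i):
            k = 0
            # Extend the match while both windows stay in bounds and agree.
            while i + k < n and j + k < i and seq[j + k] == seq[i + k]:
                k += 1
            longest = max(longest, k)
        lambdas.append(longest + 1)
    return lambdas


def entropy_rate(seq: Sequence[Hashable]) -> float:
    """Match-length estimate of the entropy rate in bits per symbol:
    h ~ n * log2(n) / sum(Lambda_i). Quadratic-or-worse time; sketch only."""
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two symbols")
    return n * math.log2(n) / sum(match_lengths(seq))


def fano_entropy(p: float, z: int) -> float:
    """Largest entropy (bits) compatible with a best-guess accuracy p
    over z possible outcomes (the right-hand side of Fano's inequality)."""
    if p >= 1.0:
        return 0.0
    binary = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return binary + (1 - p) * math.log2(z - 1)


def max_predictability(h: float, z: int, tol: float = 1e-9) -> float:
    """Solve fano_entropy(p, z) = h for p by bisection on [1/z, 1].
    The result is an upper bound on the accuracy of any predictor."""
    if z < 2:
        raise ValueError("need at least two outcomes")
    if h <= 0:
        return 1.0
    if h >= math.log2(z):
        return 1.0 / z
    lo, hi = 1.0 / z, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # fano_entropy decreases in p, so too much entropy means p is too low.
        if fano_entropy(mid, z) > h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    words = "the cat sat on the mat and the cat sat on the hat".split()
    h = entropy_rate(words)
    print(h, max_predictability(h, z=len(set(words))))
```

Applied to a user's word sequence, a lower estimated entropy rate translates into a higher ceiling on how accurately any method could predict the next word. Extending the idea to social ties would require a cross-entropy between two users' sequences (compare ref. 46), which this sketch does not attempt.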

Fig. 1: Information and predictability in online social activity.
Fig. 2: Recency of information.
Fig. 3: Social interactions are visible in information flow.
Fig. 4: An ‘information homophily’ between egos and alters.

Code availability

The code used to generate the results of this paper is available from the corresponding authors upon request.

Data availability

The data that support the findings of this study are available at Figshare.

Change history

  • 04 February 2019

    The original and corrected figures are shown in the accompanying Publisher Correction.

References

  1. Kossinets, G. & Watts, D. J. Empirical analysis of an evolving social network. Science 311, 88–90 (2006).

  2. Lazer, D. et al. Computational social science. Science 323, 721–723 (2009).

  3. Kwak, H., Lee, C., Park, H. & Moon, S. What is Twitter, a social network or a news media? In Proc. 19th International Conference on the World Wide Web (WWW ’10) 591–600 (ACM, 2010).

  4. Bakshy, E., Messing, S. & Adamic, L. A. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 1130–1132 (2015).

  5. Garcia, D. Leaking privacy and shadow profiles in online social networks. Sci. Adv. 3, e1701172 (2017).

  6. Shirky, C. The political power of social media: technology, the public sphere, and political change. Foreign Aff. 90, 28–41 (2011).

  7. Lotan, G. et al. The revolutions were tweeted: information flows during the 2011 Tunisian and Egyptian revolutions. Int. J. Commun. 5, 1375–1405 (2011).

  8. Del Vicario, M. et al. The spreading of misinformation online. Proc. Natl Acad. Sci. USA 113, 554–559 (2016).

  9. Castellano, C., Fortunato, S. & Loreto, V. Statistical physics of social dynamics. Rev. Mod. Phys. 81, 591–646 (2009).

  10. Kramer, A. D., Guillory, J. E. & Hancock, J. T. Experimental evidence of massive-scale emotional contagion through social networks. Proc. Natl Acad. Sci. USA 111, 8788–8790 (2014).

  11. Mønsted, B., Sapieżyński, P., Ferrara, E. & Lehmann, S. Evidence of complex contagion of information in social media: an experiment using Twitter bots. PLoS ONE 12, e0184148 (2017).

  12. Jurgens, D., Tsvetkov, Y. & Jurafsky, D. in Social Informatics. SocInfo 2017. Lecture Notes in Computer Science Vol. 10540 (eds. Ciampaglia, G. et al.) 537–558 (Springer, Cham, 2017).

  13. Garcia, D., Goel, M., Agrawal, A. K. & Kumaraguru, P. Collective aspects of privacy in the Twitter social network. EPJ Data Sci. 7, 3 (2018).

  14. Gruhl, D., Guha, R., Liben-Nowell, D. & Tomkins, A. Information diffusion through blogspace. In Proc. 13th International Conference on World Wide Web (WWW ’04) 491–501 (ACM, 2004).

  15. Bakshy, E., Rosenn, I., Marlow, C. & Adamic, L. The role of social networks in information diffusion. In Proc. 21st International Conference on World Wide Web (WWW ’12) 519–528 (ACM, 2012).

  16. Aral, S., Muchnik, L. & Sundararajan, A. Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proc. Natl Acad. Sci. USA 106, 21544–21549 (2009).

  17. Centola, D. The spread of behavior in an online social network experiment. Science 329, 1194–1197 (2010).

  18. Aral, S. & Walker, D. Identifying influential and susceptible members of social networks. Science 337, 337–341 (2012).

  19. Ver Steeg, G. & Galstyan, A. Information transfer in social media. In Proc. 21st International Conference on World Wide Web (WWW ’12) 509–518 (ACM, 2012).

  20. Borge-Holthoefer, J. et al. The dynamics of information-driven coordination phenomena: a transfer entropy analysis. Sci. Adv. 2, e1501158 (2016).

  21. Cover, T. M. & Thomas, J. A. Elements of Information Theory (John Wiley & Sons, Hoboken, New Jersey, 2012).

  22. Shannon, C. E. Prediction and entropy of printed English. Bell Syst. Tech. J. 30, 50–64 (1951).

  23. Brown, P. F., Pietra, V. J. D., Mercer, R. L., Pietra, S. A. D. & Lai, J. C. An estimate of an upper bound for the entropy of English. Comput. Linguist. 18, 31–40 (1992).

  24. Schürmann, T. & Grassberger, P. Entropy estimation of symbol sequences. Chaos 6, 414–427 (1996).

  25. Kontoyiannis, I., Algoet, P., Suhov, Y. M. & Wyner, A. Nonparametric entropy estimation for stationary processes and random fields, with applications to English text. IEEE Trans. Inf. Theory 44, 1319–1327 (1998).

  26. Song, C., Qu, Z., Blumm, N. & Barabási, A.-L. Limits of predictability in human mobility. Science 327, 1018–1021 (2010).

  27. Schreiber, T. Measuring information transfer. Phys. Rev. Lett. 85, 461 (2000).

  28. Staniek, M. & Lehnertz, K. Symbolic transfer entropy. Phys. Rev. Lett. 100, 158101 (2008).

  29. Dunbar, R. I. Coevolution of neocortical size, group size and language in humans. Behav. Brain Sci. 16, 681–694 (1993).

  30. Albert, R., Jeong, H. & Barabási, A.-L. Error and attack tolerance of complex networks. Nature 406, 378–382 (2000).

  31. Wasserman, S. & Faust, K. Social Network Analysis: Methods and Applications (Cambridge Univ. Press, Cambridge, 1994).

  32. de Montjoye, Y.-A., Hidalgo, C. A., Verleysen, M. & Blondel, V. D. Unique in the crowd: the privacy bounds of human mobility. Sci. Rep. 3, 1376 (2013).

  33. de Montjoye, Y.-A., Radaelli, L., Singh, V. K. & Pentland, A. Unique in the shopping mall: on the reidentifiability of credit card metadata. Science 347, 536–539 (2015).

  34. Pariser, E. The Filter Bubble: What the Internet is Hiding From You (Penguin, London, 2011).

  35. Mosteller, F. & Wallace, D. L. Inference in an authorship problem: a comparative study of discrimination methods applied to the authorship of the disputed federalist papers. J. Am. Stat. Assoc. 58, 275–309 (1963).

  36. Katz, S. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Trans. Acoust. 35, 400–401 (1987).

  37. Bengio, Y., Ducharme, R., Vincent, P. & Jauvin, C. A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003).

  38. Shalizi, C. R. & Thomas, A. C. Homophily and contagion are generically confounded in observational social network studies. Sociol. Methods Res. 40, 211–239 (2011).

  39. Granger, C. W. J. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37, 424–438 (1969).

  40. Twitter REST APIs (Twitter, accessed 7 July 2016); https://dev.twitter.com/rest/public

  41. Botometer API (Botometer, accessed 7 July 2016); https://botometer.iuni.iu.edu/

  42. Varol, O., Ferrara, E., Davis, C. A., Menczer, F. & Flammini, A. Online human–bot interactions: detection, estimation, and characterization. In Proc. 11th International AAAI Conference on Web and Social Media 280–289 (AAAI, 2017).

  43. Davis, C. A., Varol, O., Ferrara, E., Flammini, A. & Menczer, F. BotOrNot: a system to evaluate social bots. In Proc. 25th International Conference Companion on World Wide Web 273–274 (International World Wide Web Conferences Steering Committee, 2016).

  44. Ferrara, E., Varol, O., Davis, C. A., Menczer, F. & Flammini, A. The rise of social bots. Commun. ACM 59, 96–104 (2016).

  45. Subrahmanian, V. S. et al. The DARPA Twitter bot challenge. Computer 49, 38–46 (2016).

  46. Ziv, J. & Merhav, N. A measure of relative entropy between individual sequences with application to universal classification. IEEE Trans. Inf. Theory 39, 1270–1279 (1993).

Acknowledgements

We gratefully acknowledge the resources provided by the Vermont Advanced Computing Core. This material is based on work supported by the National Science Foundation under grant no. IIS-1447634 (J.P.B.). L.M. acknowledges support from the Data To Decisions Cooperative Research Centre (D2D CRC) and the ARC Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Contributions

J.P.B. and L.M. designed the research. L.M. oversaw data collection and processing. X.L. collected and analysed human rater data. J.P.B. and L.M. analysed the data and wrote the manuscript.

Corresponding authors

Correspondence to James P. Bagrow or Lewis Mitchell.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Text and Figures

Supplementary Notes 1–9, Supplementary Figures 1–13, Supplementary Tables 1–4

Reporting Summary

About this article

Cite this article

Bagrow, J.P., Liu, X. & Mitchell, L. Information flow reveals prediction limits in online social activity. Nat Hum Behav 3, 122–128 (2019). https://doi.org/10.1038/s41562-018-0510-5
