Letter

Information flow reveals prediction limits in online social activity

Abstract

Modern society depends on the flow of information over online social networks, and users of popular platforms generate substantial behavioural data about themselves and their social ties1,2,3,4,5. However, it remains unclear what fundamental limits exist when using these data to predict the activities and interests of individuals, and how accurately such predictions can be made using an individual’s social ties. Here, we show that 95% of the potential predictive accuracy for an individual is achievable using their social ties only, without requiring that individual’s data. We used information-theoretic tools to estimate the predictive information in the writings of Twitter users, providing an upper bound on the available predictive information that holds for any predictive or machine learning method. As few as 8–9 of an individual’s contacts are sufficient to obtain predictability comparable to that of the individual alone. Measuring information flow along social ties reveals distinct temporal and social effects, allowing us to better study the dynamics of online activity. Our results have distinct privacy implications: information is so strongly embedded in a social network that, in principle, one can profile an individual from their available social ties even when the individual forgoes the platform completely.
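The bounds described above rest on entropy rates of word sequences, on cross-entropies between an individual's (ego's) text and a contact's (alter's) text, and on converting an entropy into a predictability. The sketch below is illustrative only and is not the authors' code (which is available on request). It assumes texts are already tokenized into word lists; it uses the nonparametric match-length estimator of Kontoyiannis et al. (ref. 25), a simplified cross-entropy variant in the spirit of Ziv and Merhav (ref. 46) that ignores posting times and matches against the alter's entire text, and a numerical inversion of Fano's inequality as in Song et al. (ref. 26). All function names are hypothetical, and the naive substring search is only practical for short sequences.

```python
import math


def _contains(seq, pattern):
    """True if `pattern` occurs as a contiguous run inside `seq` (naive scan)."""
    m = len(pattern)
    return any(seq[j:j + m] == pattern for j in range(len(seq) - m + 1))


def _match_length(target, i, source):
    """Length of the shortest run of `target` starting at i that is absent from `source`.

    If the whole remainder of `target` is found, returns len(target) - i + 1.
    """
    n = len(target)
    k = 1
    while i + k <= n and _contains(source, target[i:i + k]):
        k += 1
    return k


def lz_entropy(tokens):
    """Match-length entropy-rate estimate in bits per word: N log2 N / sum(Lambda_i).

    Lambda_i is measured against the past tokens[:i] only (non-overlapping variant).
    """
    n = len(tokens)
    if n < 2:
        raise ValueError("need at least two tokens")
    total = sum(_match_length(tokens, i, tokens[:i]) for i in range(n))
    return n * math.log2(n) / total


def lz_cross_entropy(ego, alter):
    """Simplified cross-entropy of `ego` given `alter` (hypothetical helper).

    Matches every position of the ego's text against the alter's full text;
    any temporal ordering of posts is ignored in this sketch.
    """
    n_ego, n_alter = len(ego), len(alter)
    if n_ego < 1 or n_alter < 2:
        raise ValueError("need a non-empty ego and at least two alter tokens")
    total = sum(_match_length(ego, i, alter) for i in range(n_ego))
    return n_ego * math.log2(n_alter) / total


def fano_predictability(entropy_bits, vocab_size, tol=1e-10):
    """Solve S = H(p) + (1 - p) log2(V - 1) for p in [1/V, 1] by bisection.

    The right-hand side decreases from log2 V at p = 1/V to 0 at p = 1.
    """
    if vocab_size < 2 or entropy_bits <= 0:
        return 1.0
    if entropy_bits >= math.log2(vocab_size):
        return 1.0 / vocab_size

    def fano_rhs(p):
        h = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)
        return h + (1.0 - p) * math.log2(vocab_size - 1)

    lo, hi = 1.0 / vocab_size, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fano_rhs(mid) > entropy_bits:
            lo = mid  # right-hand side too large: the root lies at larger p
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, taking the vocabulary size as the number of distinct words (an assumption of this sketch), `fano_predictability(lz_entropy(tokens), len(set(tokens)))` gives an upper bound on how often any method could guess the next word of `tokens`; substituting `lz_cross_entropy(ego, alter)` gives the analogous bound when only the alter's text is available.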


Code availability

The code used to generate the results of this paper is available from the corresponding authors upon request.

Data availability

The data that support the findings of this study are available at Figshare.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Change history

  • 04 February 2019

    The original and corrected figures are shown in the accompanying Publisher Correction.

References

  1. Kossinets, G. & Watts, D. J. Empirical analysis of an evolving social network. Science 311, 88–90 (2006).

  2. Lazer, D. et al. Computational social science. Science 323, 721–723 (2009).

  3. Kwak, H., Lee, C., Park, H. & Moon, S. What is Twitter, a social network or a news media? In Proc. 19th International Conference on the World Wide Web (WWW ’10) 591–600 (ACM, 2010).

  4. Bakshy, E., Messing, S. & Adamic, L. A. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 1130–1132 (2015).

  5. Garcia, D. Leaking privacy and shadow profiles in online social networks. Sci. Adv. 3, e1701172 (2017).

  6. Shirky, C. The political power of social media: technology, the public sphere, and political change. Foreign Aff. 90, 28–41 (2011).

  7. Lotan, G. et al. The revolutions were tweeted: information flows during the 2011 Tunisian and Egyptian revolutions. Int. J. Commun. 5, 1375–1405 (2011).

  8. Del Vicario, M. et al. The spreading of misinformation online. Proc. Natl Acad. Sci. USA 113, 554–559 (2016).

  9. Castellano, C., Fortunato, S. & Loreto, V. Statistical physics of social dynamics. Rev. Mod. Phys. 81, 591–646 (2009).

  10. Kramer, A. D., Guillory, J. E. & Hancock, J. T. Experimental evidence of massive-scale emotional contagion through social networks. Proc. Natl Acad. Sci. USA 111, 8788–8790 (2014).

  11. Mønsted, B., Sapieżyński, P., Ferrara, E. & Lehmann, S. Evidence of complex contagion of information in social media: an experiment using Twitter bots. PLoS ONE 12, e0184148 (2017).

  12. Jurgens, D., Tsvetkov, Y. & Jurafsky, D. in Social Informatics. SocInfo 2017. Lecture Notes in Computer Science Vol. 10540 (eds. Ciampaglia, G. et al.) 537–558 (Springer, Cham, 2017).

  13. Garcia, D., Goel, M., Agrawal, A. K. & Kumaraguru, P. Collective aspects of privacy in the Twitter social network. EPJ Data Sci. 7, 3 (2018).

  14. Gruhl, D., Guha, R., Liben-Nowell, D. & Tomkins, A. Information diffusion through blogspace. In Proc. 13th International Conference on World Wide Web (WWW ’04) 491–501 (ACM, 2004).

  15. Bakshy, E., Rosenn, I., Marlow, C. & Adamic, L. The role of social networks in information diffusion. In Proc. 21st International Conference on World Wide Web (WWW ’12) 519–528 (ACM, 2012).

  16. Aral, S., Muchnik, L. & Sundararajan, A. Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proc. Natl Acad. Sci. USA 106, 21544–21549 (2009).

  17. Centola, D. The spread of behavior in an online social network experiment. Science 329, 1194–1197 (2010).

  18. Aral, S. & Walker, D. Identifying influential and susceptible members of social networks. Science 337, 337–341 (2012).

  19. Ver Steeg, G. & Galstyan, A. Information transfer in social media. In Proc. 21st International Conference on World Wide Web (WWW ’12) 509–518 (ACM, 2012).

  20. Borge-Holthoefer, J. et al. The dynamics of information-driven coordination phenomena: a transfer entropy analysis. Sci. Adv. 2, e1501158 (2016).

  21. Cover, T. M. & Thomas, J. A. Elements of Information Theory (John Wiley & Sons, Hoboken, New Jersey, 2012).

  22. Shannon, C. E. Prediction and entropy of printed English. Bell Syst. Tech. J. 30, 50–64 (1951).

  23. Brown, P. F., Pietra, V. J. D., Mercer, R. L., Pietra, S. A. D. & Lai, J. C. An estimate of an upper bound for the entropy of English. Comput. Linguist. 18, 31–40 (1992).

  24. Schürmann, T. & Grassberger, P. Entropy estimation of symbol sequences. Chaos 6, 414–427 (1996).

  25. Kontoyiannis, I., Algoet, P., Suhov, Y. M. & Wyner, A. Nonparametric entropy estimation for stationary processes and random fields, with applications to English text. IEEE Trans. Inf. Theory 44, 1319–1327 (1998).

  26. Song, C., Qu, Z., Blumm, N. & Barabási, A.-L. Limits of predictability in human mobility. Science 327, 1018–1021 (2010).

  27. Schreiber, T. Measuring information transfer. Phys. Rev. Lett. 85, 461 (2000).

  28. Staniek, M. & Lehnertz, K. Symbolic transfer entropy. Phys. Rev. Lett. 100, 158101 (2008).

  29. Dunbar, R. I. Coevolution of neocortical size, group size and language in humans. Behav. Brain Sci. 16, 681–694 (1993).

  30. Albert, R., Jeong, H. & Barabási, A.-L. Error and attack tolerance of complex networks. Nature 406, 378–382 (2000).

  31. Wasserman, S. & Faust, K. Social Network Analysis: Methods and Applications (Cambridge Univ. Press, Cambridge, 1994).

  32. de Montjoye, Y.-A., Hidalgo, C. A., Verleysen, M. & Blondel, V. D. Unique in the crowd: the privacy bounds of human mobility. Sci. Rep. 3, 1376 (2013).

  33. de Montjoye, Y.-A., Radaelli, L., Singh, V. K. & Pentland, A. Unique in the shopping mall: on the reidentifiability of credit card metadata. Science 347, 536–539 (2015).

  34. Pariser, E. The Filter Bubble: What the Internet is Hiding From You (Penguin, London, 2011).

  35. Mosteller, F. & Wallace, D. L. Inference in an authorship problem: a comparative study of discrimination methods applied to the authorship of the disputed federalist papers. J. Am. Stat. Assoc. 58, 275–309 (1963).

  36. Katz, S. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Trans. Acoust. Speech Signal Process. 35, 400–401 (1987).

  37. Bengio, Y., Ducharme, R., Vincent, P. & Jauvin, C. A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003).

  38. Shalizi, C. R. & Thomas, A. C. Homophily and contagion are generically confounded in observational social network studies. Sociol. Methods Res. 40, 211–239 (2011).

  39. Granger, C. W. J. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37, 424–438 (1969).

  40. Twitter REST APIs (Twitter, accessed 7 July 2016); https://dev.twitter.com/rest/public

  41. Botometer API (Botometer, accessed 7 July 2016); https://botometer.iuni.iu.edu/

  42. Varol, O., Ferrara, E., Davis, C. A., Menczer, F. & Flammini, A. Online human–bot interactions: detection, estimation, and characterization. In Proc. 11th International AAAI Conference on Web and Social Media 280–289 (AAAI, 2017).

  43. Davis, C. A., Varol, O., Ferrara, E., Flammini, A. & Menczer, F. BotOrNot: a system to evaluate social bots. In Proc. 25th International Conference Companion on World Wide Web 273–274 (International World Wide Web Conferences Steering Committee, 2016).

  44. Ferrara, E., Varol, O., Davis, C. A., Menczer, F. & Flammini, A. The rise of social bots. Commun. ACM 59, 96–104 (2016).

  45. Subrahmanian, V. S. et al. The DARPA Twitter bot challenge. Computer 49, 38–46 (2016).

  46. Ziv, J. & Merhav, N. A measure of relative entropy between individual sequences with application to universal classification. IEEE Trans. Inf. Theory 39, 1270–1279 (1993).


Acknowledgements

We gratefully acknowledge the resources provided by the Vermont Advanced Computing Core. This material is based on work supported by the National Science Foundation under grant no. IIS-1447634 (J.P.B.). L.M. acknowledges support from the Data To Decisions Cooperative Research Centre (D2D CRC) and the ARC Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

J.P.B. and L.M. designed the research. L.M. oversaw data collection and processing. X.L. collected and analysed human rater data. J.P.B. and L.M. analysed the data and wrote the manuscript.

Competing interests

The authors declare no competing interests.

Correspondence to James P. Bagrow or Lewis Mitchell.

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Notes 1–9, Supplementary Figures 1–13, Supplementary Tables 1–4

  2. Reporting Summary


Fig. 1: Information and predictability in online social activity.
Fig. 2: Recency of information.
Fig. 3: Social interactions are visible in information flow.
Fig. 4: An ‘information homophily’ between egos and alters.