Abstract
Modern society depends on the flow of information over online social networks, and users of popular platforms generate substantial behavioural data about themselves and their social ties [1–5]. However, it remains unclear what fundamental limits exist when using these data to predict the activities and interests of individuals, and to what accuracy such predictions can be made using an individual’s social ties. Here, we show that 95% of the potential predictive accuracy for an individual is achievable using their social ties only, without requiring that individual’s data. We used information-theoretic tools to estimate the predictive information in the writings of Twitter users, providing an upper bound on the available predictive information that holds for any predictive or machine learning method. As few as 8–9 of an individual’s contacts are sufficient to obtain predictability comparable to that of the individual alone. Distinct temporal and social effects are visible by measuring information flow along social ties, allowing us to better study the dynamics of online activity. Our results have distinct privacy implications: information is so strongly embedded in a social network that, in principle, one can profile an individual from their available social ties even when the individual forgoes the platform completely.
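As an illustration of how an entropy estimate can be turned into an upper bound on predictability, the following is a minimal sketch, not the authors' code. It assumes a match-length (Lempel-Ziv style) entropy-rate estimator of the kind described by Kontoyiannis et al. and a conversion to predictability through Fano's inequality, as used by Song et al. for mobility data; the abstract does not state which estimator was used. The function names `entropy_rate` and `max_predictability` are hypothetical.

```python
import math


def _occurs(history, pattern):
    """Return True if `pattern` appears as a contiguous run inside `history`."""
    m = len(pattern)
    return any(history[j:j + m] == pattern for j in range(len(history) - m + 1))


def entropy_rate(tokens):
    """Match-length estimate of the entropy rate, in bits per token.

    For each position i, find the length of the shortest run starting at i
    that has not been seen anywhere in tokens[:i]; long matches mean the
    sequence is repetitive and therefore has low entropy.
    Naive cubic-time search: fine for a sketch, too slow for real corpora.
    """
    n = len(tokens)
    total = 0
    for i in range(n):
        k = 1
        while i + k <= n and _occurs(tokens[:i], tokens[i:i + k]):
            k += 1
        total += k
    return n * math.log2(n) / total


def max_predictability(entropy, vocab_size, tol=1e-9):
    """Solve Fano's inequality for the largest achievable prediction accuracy.

    Finds p such that H_b(p) + (1 - p) * log2(vocab_size - 1) = entropy,
    where H_b is the binary entropy function. Assumes vocab_size >= 2.
    """
    def fano(p):
        hb = 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)
        return hb + (1 - p) * math.log2(vocab_size - 1)

    lo, hi = 1.0 / vocab_size, 1.0
    if entropy <= 0:
        return 1.0
    if entropy >= fano(lo):
        return lo  # no better than guessing uniformly at random
    # fano() decreases from log2(vocab_size) to 0 on [lo, hi], so bisect.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano(mid) > entropy:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Toy usage on a hypothetical token stream (not data from the study).
words = "the cat sat on the mat and the cat sat on the hat".split()
h = entropy_rate(words)
print(h, max_predictability(h, vocab_size=len(set(words))))
```

A cross-entropy variant, in which matches for an individual's text are searched for in the earlier text of their contacts rather than in their own past, would be the natural extension to social ties; it is not shown here.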
Code availability
The code used to generate the results of this paper is available from the corresponding authors upon request.
Data availability
The data that support the findings of this study are available at Figshare.
Change history
04 February 2019
The original and corrected figures are shown in the accompanying Publisher Correction.
References
Kossinets, G. & Watts, D. J. Empirical analysis of an evolving social network. Science 311, 88–90 (2006).
Lazer, D. et al. Computational social science. Science 323, 721–723 (2009).
Kwak, H., Lee, C., Park, H. & Moon, S. What is Twitter, a social network or a news media? In Proc. 19th International Conference on the World Wide Web (WWW ‘10) 591–600 (ACM, 2010).
Bakshy, E., Messing, S. & Adamic, L. A. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 1130–1132 (2015).
Garcia, D. Leaking privacy and shadow profiles in online social networks. Sci. Adv. 3, e1701172 (2017).
Shirky, C. The political power of social media: technology, the public sphere, and political change. Foreign Aff. 90, 28–41 (2011).
Lotan, G. et al. The revolutions were tweeted: information flows during the 2011 Tunisian and Egyptian revolutions. Int. J. Commun. 5, 1375–1405 (2011).
Del Vicario, M. et al. The spreading of misinformation online. Proc. Natl Acad. Sci. USA 113, 554–559 (2016).
Castellano, C., Fortunato, S. & Loreto, V. Statistical physics of social dynamics. Rev. Mod. Phys. 81, 591–646 (2009).
Kramer, A. D., Guillory, J. E. & Hancock, J. T. Experimental evidence of massive-scale emotional contagion through social networks. Proc. Natl Acad. Sci. USA 111, 8788–8790 (2014).
Mønsted, B., Sapieżyński, P., Ferrara, E. & Lehmann, S. Evidence of complex contagion of information in social media: an experiment using Twitter bots. PLoS ONE 12, e0184148 (2017).
Jurgens, D., Tsvetkov, Y. & Jurafsky, D. in Social Informatics. SocInfo 2017. Lecture Notes in Computer Science Vol. 10540 (eds. Ciampaglia, G. et al.) 537–558 (Springer, Cham, 2017).
Garcia, D., Goel, M., Agrawal, A. K. & Kumaraguru, P. Collective aspects of privacy in the Twitter social network. EPJ Data Sci. 7, 3 (2018).
Gruhl, D., Guha, R., Liben-Nowell, D. & Tomkins, A. Information diffusion through blogspace. In Proc. 13th International Conference on World Wide Web (WWW ‘04) 491–501 (ACM, 2004).
Bakshy, E., Rosenn, I., Marlow, C. & Adamic, L. The role of social networks in information diffusion. In Proc. 21st International Conference on World Wide Web (WWW ‘12) 519–528 (ACM, 2012).
Aral, S., Muchnik, L. & Sundararajan, A. Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proc. Natl Acad. Sci. USA 106, 21544–21549 (2009).
Centola, D. The spread of behavior in an online social network experiment. Science 329, 1194–1197 (2010).
Aral, S. & Walker, D. Identifying influential and susceptible members of social networks. Science 337, 337–341 (2012).
Ver Steeg, G. & Galstyan, A. Information transfer in social media. In Proc. 21st International Conference on World Wide Web (WWW ‘12) 509–518 (ACM, 2012).
Borge-Holthoefer, J. et al. The dynamics of information-driven coordination phenomena: a transfer entropy analysis. Sci. Adv. 2, e1501158 (2016).
Cover, T. M. & Thomas, J. A. Elements of Information Theory (John Wiley & Sons, Hoboken, New Jersey, 2012).
Shannon, C. E. Prediction and entropy of printed English. Bell Syst. Tech. J. 30, 50–64 (1951).
Brown, P. F., Pietra, V. J. D., Mercer, R. L., Pietra, S. A. D. & Lai, J. C. An estimate of an upper bound for the entropy of English. Comput. Linguist. 18, 31–40 (1992).
Schürmann, T. & Grassberger, P. Entropy estimation of symbol sequences. Chaos 6, 414–427 (1996).
Kontoyiannis, I., Algoet, P., Suhov, Y. M. & Wyner, A. Nonparametric entropy estimation for stationary processes and random fields, with applications to English text. IEEE Trans. Inf. Theory 44, 1319–1327 (1998).
Song, C., Qu, Z., Blumm, N. & Barabási, A.-L. Limits of predictability in human mobility. Science 327, 1018–1021 (2010).
Schreiber, T. Measuring information transfer. Phys. Rev. Lett. 85, 461 (2000).
Staniek, M. & Lehnertz, K. Symbolic transfer entropy. Phys. Rev. Lett. 100, 158101 (2008).
Dunbar, R. I. Coevolution of neocortical size, group size and language in humans. Behav. Brain Sci. 16, 681–694 (1993).
Albert, R., Jeong, H. & Barabási, A.-L. Error and attack tolerance of complex networks. Nature 406, 378–382 (2000).
Wasserman, S. & Faust, K. Social Network Analysis: Methods and Applications (Cambridge Univ. Press, Cambridge, 1994).
de Montjoye, Y.-A., Hidalgo, C. A., Verleysen, M. & Blondel, V. D. Unique in the crowd: the privacy bounds of human mobility. Sci. Rep. 3, 1376 (2013).
de Montjoye, Y.-A., Radaelli, L., Singh, V. K. & Pentland, A. Unique in the shopping mall: on the reidentifiability of credit card metadata. Science 347, 536–539 (2015).
Pariser, E. The Filter Bubble: What the Internet is Hiding From You (Penguin, London, 2011).
Mosteller, F. & Wallace, D. L. Inference in an authorship problem: a comparative study of discrimination methods applied to the authorship of the disputed Federalist Papers. J. Am. Stat. Assoc. 58, 275–309 (1963).
Katz, S. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Trans. Acoust. 35, 400–401 (1987).
Bengio, Y., Ducharme, R., Vincent, P. & Jauvin, C. A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003).
Shalizi, C. R. & Thomas, A. C. Homophily and contagion are generically confounded in observational social network studies. Sociol. Methods Res. 40, 211–239 (2011).
Granger, C. W. J. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37, 424–438 (1969).
Twitter REST APIs (Twitter, accessed 7 July 2016); https://dev.twitter.com/rest/public
Botometer API (Botometer, accessed 7 July 2016); https://botometer.iuni.iu.edu/
Varol, O., Ferrara, E., Davis, C. A., Menczer, F. & Flammini, A. Online human–bot interactions: detection, estimation, and characterization. In Proc. 11th International AAAI Conference on Web and Social Media 280–289 (AAAI, 2017).
Davis, C. A., Varol, O., Ferrara, E., Flammini, A. & Menczer, F. BotOrNot: a system to evaluate social bots. In Proc. 25th International Conference Companion on World Wide Web 273–274 (International World Wide Web Conferences Steering Committee, 2016).
Ferrara, E., Varol, O., Davis, C. A., Menczer, F. & Flammini, A. The rise of social bots. Commun. ACM 59, 96–104 (2016).
Subrahmanian, V. S. et al. The DARPA Twitter bot challenge. Computer 49, 38–46 (2016).
Ziv, J. & Merhav, N. A measure of relative entropy between individual sequences with application to universal classification. IEEE Trans. Inf. Theory 39, 1270–1279 (1993).
Acknowledgements
We gratefully acknowledge the resources provided by the Vermont Advanced Computing Core. This material is based on work supported by the National Science Foundation under grant no. IIS-1447634 (J.P.B.). L.M. acknowledges support from the Data To Decisions Cooperative Research Centre (D2D CRC) and the ARC Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author information
Contributions
J.P.B. and L.M. designed the research. L.M. oversaw data collection and processing. X.L. collected and analysed human rater data. J.P.B. and L.M. analysed the data and wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Text and Figures
Supplementary Notes 1–9, Supplementary Figures 1–13, Supplementary Tables 1–4
About this article
Cite this article
Bagrow, J.P., Liu, X. & Mitchell, L. Information flow reveals prediction limits in online social activity. Nat Hum Behav 3, 122–128 (2019). https://doi.org/10.1038/s41562-018-0510-5
This article is cited by
- Contrasting social and non-social sources of predictability in human mobility. Nature Communications (2022)
- Promoting and countering misinformation during Australia’s 2019–2020 bushfires: a case study of polarisation. Social Network Analysis and Mining (2022)
- Exploring the effect of streamed social media data variations on social network analysis. Social Network Analysis and Mining (2021)
- Characterizing reticulation in online social networks during disasters. Applied Network Science (2020)
- Privacy beyond the individual. Nature Human Behaviour (2019)