Democratic classification of free-format survey responses with a network-based framework

A Publisher Correction to this article was published on 30 July 2019

An Author Correction to this article was published on 18 July 2019


Social surveys have been widely used as a method of obtaining public opinion. In some cases, free-response questions are preferable to multiple-choice questions for collecting opinions. Despite their advantages, free-response questions are rarely used in practice because they usually require manual analysis: classifying free-format texts is a formidable task in large-scale surveys and can be influenced by the interpretation of analysts. In this study, we propose a network-based survey framework in which responses are automatically classified in a statistically principled manner. This is possible because each respondent not only provides a text response but also assesses the similarities among responses. We demonstrate our approach using a poll on the 2016 US presidential election and a survey taken by graduates of a particular university. The proposed approach helps analysts interpret the underlying semantics of responses in large-scale surveys.
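
To make the idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each respondent marks some pairs of responses as similar, aggregates those judgements into a weighted network, and groups responses by the connected components of edges that reach a vote threshold. Connected components are only a simple stand-in for the statistically principled graph clustering described above. All names (`build_network`, `classify`, `min_votes`) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def build_network(judgements):
    """Aggregate per-respondent similarity judgements into edge weights.

    judgements: iterable of (respondent_id, response_a, response_b) tuples,
    each meaning "this respondent considers the two responses similar".
    Returns a Counter mapping an unordered response pair to its vote count.
    """
    weights = Counter()
    for _respondent, a, b in judgements:
        if a != b:
            weights[frozenset((a, b))] += 1
    return weights


def classify(responses, weights, min_votes=2):
    """Group responses by connected components of sufficiently supported edges."""
    neighbours = defaultdict(set)
    for pair, votes in weights.items():
        if votes >= min_votes:
            a, b = tuple(pair)
            neighbours[a].add(b)
            neighbours[b].add(a)

    groups, seen = [], set()
    for start in responses:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


# Toy data (hypothetical): response ids and who judged which pairs similar.
responses = ["r1", "r2", "r3", "r4", "r5"]
judgements = [
    ("alice", "r1", "r2"), ("bob", "r1", "r2"),
    ("bob", "r2", "r3"), ("carol", "r2", "r3"),
    ("alice", "r4", "r5"), ("carol", "r4", "r5"),
    ("dave", "r3", "r4"),  # single vote: below the default threshold
]
groups = classify(responses, build_network(judgements))
print(groups)
```

In this toy data the link between `r3` and `r4` has only one vote, so it is ignored at the default threshold and two groups result. Lowering `min_votes` to 1 merges everything into one group, which shows how sensitive a hard threshold is and why a statistical model of the network is the more principled choice.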

Fig. 1: Schematic of our network-based approach.
Fig. 2: Results of the poll on the 2016 US presidential election.
Fig. 3: Results of the survey focusing on graduates of the Faculty of Education.

Data availability

The network datasets that support the findings of this study are available in a GitHub repository. The graph clustering code that supports the findings of this study is also available in a GitHub repository.



The authors thank H. Tokioka and S. Shinomoto for discussions. The authors are also grateful to J. Park and M. Rosvall for their comments. Finally, the authors appreciate all the people who contributed to the poll on the 2016 US presidential election and acknowledge support from the Faculty of Education in Kagawa University and the reunion of the faculty. T.K. was supported by JSPS (Japan) KAKENHI grant no. 26011023. T.A. was supported by the Research Institute for Mathematical Sciences, a joint research centre at Kyoto University, and open collaborative research at the National Institute of Informatics (NII) Japan (FY2017). T.K. and T.A. acknowledge financial support from JSPS KAKENHI grant no. 18K18604.

Author information

Authors and Affiliations



T.K. and T.A. designed the survey framework, analysed the data and wrote the manuscript. T.K. implemented the online survey system. T.K. conducted a survey of the poll on the 2016 US presidential election and T.A. mainly conducted a survey focusing on graduates of the Faculty of Education of a particular university.

Corresponding author

Correspondence to Tatsuro Kawamoto.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Figs. 1–9 and Supplementary Tables 1–3

About this article

Cite this article

Kawamoto, T., Aoki, T. Democratic classification of free-format survey responses with a network-based framework. Nat Mach Intell 1, 322–327 (2019).
