Understanding the prevalence and impact of personal attacks in online discussions is challenging. A method that combines crowdsourcing and machine learning provides a way forward, but caveats must be considered.
Since the late 1990s, a growing body of work has emerged that applies data-mining technologies to language collected from the World Wide Web on a massive scale. The aim of these efforts has been to tackle problems such as monitoring opinions about products, restaurants or films; identifying spam; detecting paedophiles, trolls or other kinds of malicious web users; and, in the past few years, doing things such as predicting who might win a presidential election or discovering when an epidemic might be breaking out. Writing in Proceedings of the 2017 World Wide Web Conference, Wulczyn et al.1 use similar techniques to study the nature of personal attacks in online discussions. The authors illustrate their methodology by analysing the discussion pages of Wikipedia, which contain an ample supply of abusive comments.
The approaches that were initially used for automated text analysis of this kind were not altogether different from techniques that had been developed in the 1990s for search engines2, or even in the 1960s for automated essay scoring3. In these approaches, information was tabulated about features that could be easily extracted from the text, including counts of individual words or sequences of words, particular grammatical forms, unusual words, or instances of special word classes such as profanity. Early successes paved the way for pioneering industries4, first in opinion detection, and later in text analytics more generally. Off-the-shelf solutions became increasingly prevalent, and even became part of standard statistical-modelling software packages (such as the Statistical Analysis System) and available through free, open-source tools5.
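Surface features of this kind are straightforward to tabulate. A minimal sketch, assuming an invented toy lexicon and feature-naming scheme (not taken from any of the systems cited):

```python
from collections import Counter
import re

PROFANITY = {"idiot", "stupid"}  # toy lexicon, for illustration only

def extract_features(text):
    """Tabulate simple surface features of the kind early systems used:
    individual word counts, word-sequence (bigram) counts, and counts of
    a special word class (here, profanity)."""
    words = re.findall(r"[a-z']+", text.lower())
    feats = Counter()
    for w in words:
        feats[f"word={w}"] += 1                 # individual word counts
    for a, b in zip(words, words[1:]):
        feats[f"bigram={a}_{b}"] += 1           # word-sequence counts
    feats["profanity"] = sum(w in PROFANITY for w in words)
    return feats

f = extract_features("You are a stupid idiot")
```

Feature tables such as this one were then fed to downstream scoring or classification models, exactly as in the search-engine and essay-scoring work cited above.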
With each generation of research, more-advanced modelling technologies have been used6, including support vector machines and other systems based on kernel methods (a class of algorithm used for distinguishing complex patterns in data); probabilistic graphical models; and, in the past few years, neural networks and a kind of artificial intelligence called deep learning. These technologies are sometimes accompanied by innovative types of features that can be extracted from text and are designed to reveal insights about the phenomena of interest. Work of this sort constitutes the research area of social-media analysis.
Wulczyn and colleagues' study grows out of this tradition, and focuses on a recognized problem of societal importance — the identification of personal attacks. The authors developed a rigorous annotation method for labelling Wikipedia comments according to whether they matched an agreed definition of a personal attack. The annotation was done using crowdsourcing, which allowed more than 100,000 comments to be analysed efficiently and cheaply by a pool of crowdworkers.
The authors used these data to train machine-learning models to recognize personal attacks. The training involved identifying patterns associated with attacks in the labelled examples, and using these patterns to predict where attacks could be found in unlabelled comments. Wulczyn et al. used models known as multilayer neural networks that enabled the identification of patterns involving combinations of features — consisting of either words or sequences of characters. After the training process, the authors evaluated the models, and those that performed accurately were applied to data on an enormous scale (a total of 63 million comments). The authors found that sequences of characters were more valuable than words in terms of achieving predictive accuracy using the trained models.
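The overall pipeline — extract character n-grams, learn weights from labelled examples, then score unseen comments — can be sketched with a toy linear model. This is an illustrative stand-in, not the authors' multilayer networks, and the training sentences are invented:

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-gram counts; the features the authors found most valuable."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))

def train_perceptron(examples, epochs=10):
    """Learn feature weights from labelled examples (label +1 = attack,
    -1 = benign), updating only on misclassified comments."""
    w = Counter()
    for _ in range(epochs):
        for text, label in examples:
            feats = char_ngrams(text)
            score = sum(w[f] * c for f, c in feats.items())
            pred = 1 if score > 0 else -1
            if pred != label:                   # mistake-driven update
                for f, c in feats.items():
                    w[f] += label * c
    return w

def predict(w, text):
    """Apply the learnt weights to an unlabelled comment."""
    feats = char_ngrams(text)
    return 1 if sum(w[f] * c for f, c in feats.items()) > 0 else -1

train = [("you are an idiot", 1), ("thanks for the fix", -1),
         ("what an idiotic edit", 1), ("nice work on this page", -1)]
w = train_perceptron(train)
```

Because character n-grams overlap across word variants ("idiot", "idiotic"), a model trained on one form generalizes to the other — one reason such features can outperform whole-word counts.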
Finally, Wulczyn et al. analysed the model-labelled data. Their results suggest that Wikipedia comments are six times more likely to contain personal attacks if they are submitted anonymously, but that the bulk of attacks are made by registered users (who are much more numerous). Similarly, about half of attacks come from users who made 5 or fewer comments in 2015, whereas about 30% are from users who contributed more than 100 comments in 2015 (Fig. 1). The authors find that attacks are often clustered in time, suggesting that early intervention by moderators could greatly reduce the prevalence of attacks.
Nevertheless, Wulczyn and colleagues' study falls prey to limitations that are common in current work in this area, and therefore exemplifies ways in which such work could be strengthened. In particular, a deeper understanding of the language of personal attacks from a sociolinguistic perspective, and a more stringent adoption of methodologies from behavioural sciences, would have resulted in a more rigorous approach. For example, one limitation of the authors' work concerns the evaluation of the annotations themselves. The authors fine-tuned a definition of a personal attack to achieve a high level of agreement between independent judgements of the same text by multiple crowdworkers. However, they did not evaluate the validity of the judgements — so it was not clear whether the fine-tuned definition corresponded to the notion of a personal attack that was of interest to the authors, or to some other notion on which agreement happened to be easy.
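Agreement between independent judgements is conventionally quantified with a chance-corrected statistic. As a minimal sketch, here is Cohen's kappa for two annotators' binary labels (the authors' own reliability measure may differ; high values of such a statistic establish only consistency, not validity):

```python
def cohens_kappa(a, b):
    """Chance-corrected agreement between two annotators' binary labels
    (1 = personal attack, 0 = not an attack)."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n   # raw agreement rate
    pa1, pb1 = sum(a) / n, sum(b) / n                  # each annotator's attack rate
    expected = pa1 * pb1 + (1 - pa1) * (1 - pb1)       # agreement expected by chance
    return (observed - expected) / (1 - expected)
```

Two annotators applying the same superficial heuristic would score perfectly here while both missing the intended notion of a personal attack — which is precisely the validity gap described above.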
In addition, although the predictive power of different models was compared, Wulczyn and colleagues do not describe what was learnt by their models. They do not discuss which features were part of learnt patterns, or whether the features that the models were trained to detect made sense. Models that are trained to make predictions from relatively large numbers of fine-grained features (such as those used by the authors) have a common problem: the features end up serving as proxies for more-abstract text characteristics that correlate with the features in the training data, but might not do so in a different corpus. Furthermore, the authors do not provide an analysis of the types of attack that the models were able to predict correctly. We therefore do not know whether the judgements of the trained models were triggered by the observation of the same text characteristics that the crowdworkers used, so we cannot be sure for which situations the models can be trusted.
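One simple way to inspect what a lexical model keys on is to rank features by their association with each class. A sketch using smoothed log-odds over words — an illustrative diagnostic with invented example comments, not the authors' method:

```python
import math
from collections import Counter

def top_indicative_words(pos_texts, neg_texts, k=3):
    """Rank words by the log-odds of appearing in attack versus benign
    comments, exposing which features a lexical model would key on."""
    pos = Counter(w for t in pos_texts for w in t.lower().split())
    neg = Counter(w for t in neg_texts for w in t.lower().split())
    vocab = set(pos) | set(neg)
    def log_odds(w):
        # Add-one smoothing avoids division by zero for one-sided words.
        return math.log((pos[w] + 1) / (neg[w] + 1))
    return sorted(vocab, key=log_odds, reverse=True)[:k]

attacks = ["you idiot", "what an idiot", "stupid idiot"]
benign = ["thanks a lot", "nice edit", "good work"]
top = top_indicative_words(attacks, benign)
```

Inspecting such rankings reveals whether a model has latched onto genuine markers of abuse or onto incidental correlates of the training corpus — the proxy problem described above.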
Since the 1960s, the field of automated essay scoring has turned its attention from simply achieving predictive accuracy to doing so using features that enable the models to be scrutable and trustworthy3,7. Researchers have accomplished this by focusing on validity and interpretability. The field of social-media analysis would benefit from a similar focus, which could be achieved through a deeper understanding of the target-language phenomena themselves. With such a shift in methodology in future work, it would be possible to derive meaningful findings from research that applies computational models to social-media data. For the time being, Wulczyn and colleagues' study is an example of the highest-quality echelon of papers published in this area in terms of generating and analysing data on a huge scale.
Wulczyn, E., Thain, N. & Dixon, L. Proc. 2017 World Wide Web Conf., 1391–1399 (Int. World Wide Web Conf. Committee, 2017). go.nature.com/2pvpizu
Manning, C. D., Raghavan, P. & Schütze, H. Introduction to Information Retrieval (Cambridge Univ. Press, 2008).
Shermis, M. D. & Burstein, J. C. Automated Essay Scoring: A Cross-disciplinary Perspective (Taylor & Francis, 2003).
Pang, B. & Lee, L. Opinion Mining and Sentiment Analysis (Now Publishers, 2008).
Mayfield, E. & Rosé, C. P. in Handbook of Automated Essay Evaluation (eds Shermis, M. D. & Burstein, J.) 124–135 (Routledge, 2013).
Nguyen, D., Doğruöz, A. S., Rosé, C. P. & de Jong, F. M. G. Comput. Linguist. 42, 537–593 (2016).
Shermis, M. D. & Burstein, J. Handbook of Automated Essay Evaluation: Current Applications and New Directions (Routledge, 2013).