Introduction

Society benefits from the responsible use of anonymity; for instance, newspapers and police departments use anonymous tips as a source of information, people in countries with strict political systems hide their identities to avoid persecution for their political views, and individuals are permitted to speak freely about personal matters, such as religious issues.

The Onion Router (Tor) provides online anonymity for millions of internet users every day, and it is often portrayed favourably as a method to avoid surveillance by concealing the origin of communications, resisting web browser fingerprinting, circumventing censorship for unrestricted internet access, and providing anonymous online hosting using onion domains.

On the other hand, online anonymity serves as a catalyst for the dark side of human behaviour: it is one of the principal causes of the online disinhibition effect, characterised by lowered psychological restraints resulting in intensified aggressive, illicit, or unethical behaviour1—including higher levels of harassment, threats, racial agitation, and sexism2. Tor users hosting anonymous onion websites behave accordingly: the websites predominantly host unethical or illicit content, including illegal drug trade, fraud, computer crime, and the distribution of child sexual abuse material (CSAM)1.

We use the term ‘CSAM’ instead of ‘child pornography’ to emphasise the distinction. The term ‘pornography’ implies consent, and whilst it is contested to what extent consent is present in the production and dissemination of adult pornography, in the case of CSAM it is not possible for any child to consent in the first place. CSAM means media, including images, videos, and live streaming, that depict sexual violence against a child.

It is common for those who search for and view CSAM to engage in other related compulsive behaviours, such as collecting and organising CSAM by age, gender, sex act, and fantasy3. In order to encompass all activities, we refer to these individuals as ‘CSAM users’ in this article. This group includes individuals who search for, view, disseminate, and/or trade CSAM. A large portion of this group is likely to have sexual interest in children (i.e., paedophilia or hebephilia)4. These sexual preferences are classified as mental health disorders because they result in self-harm and harm to others, and therapy can improve well-being and prevent harm to children5.

Many users are not just passive observers of CSAM; rather, they are sexually motivated6. Previous research has suggested that problematic use of legal pornography can escalate to violent sexual behaviour and the use of CSAM. Consumers who view legal pornography and engage in masturbation fuel this process of escalation by providing themselves with a ‘powerful neurochemical reward’ through orgasm7. This process, along with repeated exposure, may condition users into continuing to use the material despite possibly wanting to stop6.

The fact that CSAM is easy to access through the Tor network—and other anonymous networks—increases the likelihood that more children will be sexually abused: one study found that 41.8% (N = 647 of 1546) of anonymous people who answered a survey after searching for CSAM on Tor search engines said they had tried to seek direct contact with children online after viewing CSAM, and 57.9% (N = 895 of 1546) said they were afraid that viewing CSAM could lead to sexual acts with a child8. This suggests that roughly half of CSAM users do not expect to become offline offenders, which is relevant for subsequent public health interventions, as this may indicate a separation between the populations of online-only offenders and online and in-person offenders.

Despite abundant evidence of the growing prevalence and severe consequences of CSAM accessible through the Tor network, computer science research on CSAM remains limited. A report to the US Congress in 20229 addresses the lack of research regarding CSAM accessible through the Tor network and states: ‘The ethical failure of computer science researchers with respect to acknowledging the harms against children carried out via Tor and Freenet is vast. Dozens of papers on Tor, Freenet, and I2P have been written in the past decade and published in the flagship security and privacy conferences of the computer science research community: USENIX Security, ACM Computer and Communications Security, ISOC Network and Distributed Systems Security, and the Proceedings of Privacy Enhancing Technology. Virtually none have mentioned the harms of these anonymous services.’

All of these venues have rules for stating harms and disclosing and discussing ethical issues, but the sponsoring organisations, chairs, and reviewers do not strictly enforce these rules9.

Articles in top computer science and security venues do study Tor usage; one even poses the research question, ‘Why do people use Tor?’, yet although one of its interview responses raises the issue, the authors make no mention of child abuse10. Similarly, another article whose subject is sexual abuse and which references Tor Browser usage does not mention child abuse11.

The 2018 USENIX Security Symposium article ‘How Do Tor Users Interact With Onion Services?’ presents an online survey of 517 Tor users in which several respondents express concern about CSAM; however, this does not lead to any further analysis in the paper, which avoids mentioning child abuse12.

Similarly, a 2019 article in the Proceedings of the Web Science Conference (WebSci) titled ‘A Broad Evaluation of the Tor English Content Ecosystem’ omits any mentions of CSAM, despite the authors’ claims to have performed an exhaustive evaluation of the content and use of Tor13.

Articles covering the harms and studying CSAM accessible through the Tor network do exist, but mainly outside of the top computer science venues. In 2020, an article published in The Proceedings of the National Academy of Sciences (PNAS) acknowledged that ‘The Tor anonymity network allows users to protect their privacy and circumvent censorship restrictions but also shields those distributing child abuse content14’.

As early as 2011, research revealed the widespread distribution of CSAM in peer-to-peer networks15. This issue affects not only Tor but also the I2P and Freenet anonymity networks; in a 2022 investigation, a Freenet content analysis revealed that 12.0% of the 7161 analysed freesites contained CSAM16. The first systematic analysis in 2013 indicated that 17.6% (206 of 1171) of the onion services surveyed shared CSAM and concluded ‘The support for the further development of Tor hidden services should hence stop1’. Measurements on onion service visits find CSAM sites to be the most popular ones17. In 2014, an estimated 17% of the onion websites provided sexual material, of which about half was CSAM18. In 2018, additional research pointed in the same direction19.

We collect and analyse web content accessible through onion websites, study user searches on the Tor search engine Ahmia.fi, and demonstrate that CSAM is widely available and that Tor users actively seek this material. We present a questionnaire to those who search for CSAM and analyse 11,470 responses.

Our research questions (RQs) are:

RQ1: What is the distribution volume of CSAM hosted through the Tor network?

RQ2: What is the CSAM search volume, and what exactly are users seeking?

RQ3: What does the survey reveal about CSAM users?

RQ4: How can search engine-based intervention reduce child abuse?

Figure 1

We investigate the availability of CSAM hosted through the Tor network and its users. (a) Tor enables anonymous web publishing through onion domains. These websites host a variety of content, and there are Tor-specific search engines for searching. 21/26 of the popular Tor search engines allow CSAM websites. (b) Our web crawlers collected online content from 176,683 different onion domains from 2018 to 2023. (c) Using text-based CSAM detection methods, we investigate the number of websites sharing CSAM. The identification methods provide us with 404 phrases that accurately identify CSAM content. (d) This enables text-based detection and filtering for Tor search engines. CSAM-related searches are among the most popular of the 239 million total queries. Out of 110 million search sessions, 11.1% are seeking CSAM. (e) The search engine directs CSAM-seekers to self-help websites and asks them to complete our survey. The results indicate that they want assistance and are motivated to stop using CSAM.

Figure 1 shows our approaches for analysing CSAM availability and usage on Tor. The Methods section presents our research methodology in detail. Our contributions are: (1) We measure the CSAM distribution hosted through the Tor network over a five-year period using onion website crawling, which indicates that through the years 2018–2023 about one-fifth of the websites share CSAM. (2) We show that these CSAM websites can be reached directly from the majority (21/26) of the top Tor search engines used today, and four Tor search engines even advertise CSAM. (3) We find that 11.1% (N = 12,270,042 of 110,133,715) of the search sessions are explicitly searching for CSAM; the single phrase ‘child porn’ is one of the top queries. (4) When we prompt those who search for CSAM with a survey, we find that 61.6% (N = 5200 of 8447 who replied to the question) of CSAM users have tried to stop watching CSAM, 48.1% (N = 4120 of 8566 who replied to the question) want to stop using CSAM, and there is an unmet demand for help resources. (5) Search engines are a key part of the solution; hence, we demonstrate the effectiveness of CSAM detection and the ability of search engines to intervene and steer CSAM users towards help.

Results

Surge of CSAM hosted through the Tor network

RQ1: What is the distribution volume of CSAM hosted through the Tor network?

We investigate the years 2018–2023 and use a sample of 10,000 unique onion domains for each year. We then subject the text content to duplicate content filtering, fine-tuned phrase search, and supervised learning classifiers. This returns the detected CSAM percentage for each year, as shown in Fig. 2.

Figure 2

We measure the proportion of CSAM onion websites inside the Tor network in 2018–2023. (a) We use a sample of 10,000 unique onion domains for each year. (b) Many websites have several onion domains. We compare the title and sentences of the pages to detect duplicates, and restrict to a single domain if multiple domains share identical content. (c) We execute text-based CSAM detections against the content of these distinct domains. Some CSAM websites do not use explicit sexual language, and text-based detection fails. Using three separate methods – manual validation, phrase matching, and the naive Bayes classifier – we discover that the detected percentage of websites sharing CSAM is 16.2–23.8% in 2023. Comparing automated methods to human validation by hand yields consistent results (22.1% in 2023 and 19.5% in 2022). We randomly select the plain text representations of 1000 onion websites for each year, 2018–2023, and read the text content of these websites to determine whether they share CSAM and what the English vocabulary is for this type of page.

The phrase matching cannot detect websites that do not describe CSAM with obvious phrases. For instance, some real CSAM websites avoid explicit sexual language and instead refer to content such as ‘baby love videos’; our text-based detection does not match these websites. In addition, there are link directory websites that provide descriptions of CSAM website links; our text-based detection matches the description phrases even though this type of website does not share CSAM itself but merely links to websites that do.

We achieve 93.8% accuracy with a basic naive Bayes classifier (see Supplementary Methods A.2). Some legal adult pornography websites, such as PornHub, provide an alternative onion domain accessible via Tor. We manually review a sample set of PornHub pages and find no indication of anything other than adult material; therefore, we include these pages in our training data to teach the classifier to differentiate between legal and illegal content. The classifier performs well and can distinguish between legitimate pornography websites and unlawful CSAM websites. The accuracy is as expected and consistent with previous research on Tor content classification (93.5%)20.

We anticipate that the phrase matching method will generate few false positives due to the explicit nature of the matching phrases. It is rare for a website to contain these phrases unless it also contains CSAM, and even those exceptions are describing linked CSAM websites. We anticipate – for the same reason – that this method will generate false negatives, as it requires exact CSAM-describing language. Indeed, the matching works accordingly: its accuracy is 85.4% with CSAM websites (some false negatives) and 98.6% with non-CSAM websites (almost no false positives).

When law enforcement has seized control of CSAM servers operating through the Tor network, they have documented terabytes of content with hundreds of thousands of users (see Supplementary Discussion C.2). In a comparable manner, we find indicators that suggest the distribution of extensive CSAM collections. While we read texts from the websites, we see numerous CSAM websites that claim to share gigabytes of media and thousands of videos and images.

Using three separate methods – manual validation, phrase matching, and the naive Bayes classifier – we conclude that around one-fifth of the unique websites hosted through the Tor network share CSAM. Previous research, in 2013, found that 17.6% of onion services shared CSAM, which corresponds with our findings1.

Examining CSAM user behaviour

RQ2: What is the CSAM search volume, and what exactly are users seeking?

11.1% of the search sessions seek CSAM

We examine search chains generated by users who enter consecutive queries. Tracking queries per user, we study 110,133,715 total search sessions and discover that 32.5% (N = 35,751,619) include sexual phrases. We further find that 11.1% (N = 12,270,042) of the search sessions reveal that the user is explicitly searching for content related to the sexual abuse of children. Some of these CSAM search sessions include either ‘girl(s)’ (393,261) or ‘boy(s)’ (289,407); searching for girls is more prevalent, with a ratio of 4:3. Seeking torture material is not typical: 0.5% of CSAM search sessions (57,429) contain the terms ‘pain’, ‘hurt’, ‘torture’, ‘violence’, ‘violent’, ‘destruction’, or ‘destroy’.
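As an illustration of this session-level counting, a minimal sketch follows; the phrase sets are tiny placeholders rather than the full vocabulary used in the study, and each session is assumed to be already reconstructed as a list of lowercase queries.

```python
# Minimal sketch of session-level counting. Phrase sets are illustrative
# placeholders; sessions are lists of lowercase query strings.
CSAM_PHRASES = {"child porn", "childporn", "pthc"}
GIRL_TERMS = {"girl", "girls"}
BOY_TERMS = {"boy", "boys"}

def session_matches(session: list[str], phrases: set[str]) -> bool:
    return any(any(p in query for p in phrases) for query in session)

def session_has_word(session: list[str], words: set[str]) -> bool:
    return any(w in query.split() for query in session for w in words)

def analyse_sessions(sessions: list[list[str]]) -> dict:
    csam = [s for s in sessions if session_matches(s, CSAM_PHRASES)]
    return {
        "total_sessions": len(sessions),
        "csam_sessions": len(csam),
        "csam_share": len(csam) / len(sessions) if sessions else 0.0,
        "girl_sessions": sum(session_has_word(s, GIRL_TERMS) for s in csam),
        "boy_sessions": sum(session_has_word(s, BOY_TERMS) for s in csam),
    }

# Example: analyse_sessions([["teen homemade", "child porn free"], ["linux distro"]])
# -> {'total_sessions': 2, 'csam_sessions': 1, 'csam_share': 0.5,
#     'girl_sessions': 0, 'boy_sessions': 0}
```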

During the COVID-19 pandemic and the first months of lockdowns, there was a significant surge in the user base of legal pornography websites across nations21. Surprisingly, we find that before and after COVID-19 pandemic measures (lockdowns, individuals spending more time at home), there were no significant changes in the behaviour of CSAM users (see more in Supplementary Methods A.8).

54.5% are searching for 12- to 16-year-olds

We determine whether a search session reveals the exact age in which the CSAM user is interested. For example, for a 13-year-old, we count search sessions that include 13y(*), 13+y(*), 13teen, thirteen+year, 13boy(s), or 13girl(s), and we apply the same logic to other ages. In addition, to compare these CSAM searches to searches for legal adult sexual content, we include search sessions seeking 18-year-old (N = 12,347 of 110,133,715) and 19-year-old adults (N = 458 of 110,133,715). In total, we find age information for 479,555 search sessions. Figure 3 illustrates the age distribution of CSAM queries. This age distribution aligns with findings from previous studies22. An article titled ‘Pedophilia, Hebephilia, and the DSM-V’, based on a clinical study of 881 men with problematic sexual behaviour, finds qualitative differences between offenders with a pubertal preference and those with a prepubertal preference23. The authors also note that the majority of child abuse victims are 14 years old. They conclude that the psychiatric diagnoses should be refined to distinguish between sexual attraction to children younger than 11 (paedophilic type), sexual attraction to children aged 11–14 (hebephilic type), and attraction to both (pedohebephilic type). Our data suggest that it is also worth noting the high percentage of individuals who have a sexual interest in 12-year-olds but not 11-year-olds; this finding is consistent with the national average age at menarche in the United States, which is 12.54 years24. Additionally, the decline in interest across the ages of 17, 18, and 19 indicates a distinct sexual interest in those aged 12 to 16 years old.
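To illustrate this age-matching step, a minimal sketch is given below; the regular expressions mirror the example patterns above (13y, 13+y, 13teen, thirteen+year, 13boy(s), 13girl(s)) but are simplified assumptions rather than our exact production rules, and the number-word table covers only the single example age.

```python
import re
from collections import Counter

NUMBER_WORDS = {13: "thirteen"}  # extend for other ages as needed (illustrative)

def age_patterns(age: int) -> list[re.Pattern]:
    """Patterns mirroring the examples in the text for one age, e.g. 13."""
    word = NUMBER_WORDS.get(age, "")
    pats = [
        rf"\b{age}\s*\+?\s*y",       # 13y, 13 y, 13+y, 13yo ...
        rf"\b{age}teen\b",           # 13teen
        rf"\b{age}(boy|girl)s?\b",   # 13boy(s), 13girl(s)
    ]
    if word:
        pats.append(rf"\b{word}\s*\+?\s*year")  # thirteen year, thirteen+year
    return [re.compile(p, re.IGNORECASE) for p in pats]

def ages_in_session(queries: list[str], ages=range(0, 20)) -> set[int]:
    text = " ".join(queries).lower()
    return {a for a in ages if any(p.search(text) for p in age_patterns(a))}

def age_distribution(sessions: list[list[str]]) -> Counter:
    dist = Counter()
    for session in sessions:
        dist.update(ages_in_session(session))  # each age counted once per session
    return dist
```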

Moreover, in Fig. 4, we investigate CSAM search sessions containing age-indicating search terms and find that users are predominantly interested in 12- to 14-year-old teen content; for example, ‘lolita’ is the most popular term when compared to other age-related terms, with 33.2% (N = 746,786 of 2,287,057). In Vladimir Nabokov’s 1955 novel ‘Lolita’, a middle-aged male is sexually attracted to a 12-year-old girl and sexually abuses her. In the 1962 Stanley Kubrick film adaptation of the novel, ‘Lolita’ is 14 years old.

Figure 3

Ages between zero and 17 included in the CSAM search sessions and search sessions seeking 18-year-old and 19-year-old adults as a comparison (N = 479,555). 16-year-old (N = 61,083 of 479,555 – 12.7%) is the top-mentioned age. 54.5% of age-revealing searches (N = 261,162 of 479,555) target those aged 12–16 years old. Outside of this age range, the interest declines.

Figure 4

In the context of explicit CSAM search sessions, there are a total of 2,287,057 broad age-indicating searches with terms ‘toddler’, ‘infant’, ‘baby’, ‘pthc’ (preteen hardcore), ‘preteen’ (preadolescence, ages between nine and 12), ‘lolita’ (refers to a girl around 12–14 years old), and ‘teen’ (when included with CSAM terms).

How accessible is CSAM using Tor?

Based on result click statistics, we rank the 26 most visited Tor search engines that were online on 17 March 2023. To determine whether CSAM is permitted in search results, we run the searches ‘child’, ‘sex’, ‘videos’, ‘love’, and ‘cute’ and study the results. 21 of the 26 search engines return CSAM results, and four search engines even promote and advocate CSAM; one of them states that ‘child porn’ is its number one search phrase. On the positive side, five Tor search engines attempt to block CSAM. Yet a user can utilise these search engines to locate other search engines and ultimately locate CSAM through the latter. Even if search engines block sites that directly share CSAM, it is still possible to find other entry points to onion sites that provide links to CSAM websites. From any major Tor website entry point, search engine, or link directory, a Tor user is only a few clicks away from CSAM content.

Self-help services are reaching users

When we study the searches, we discover that there are a few hundred queries from people who want to cease viewing CSAM and are concerned about their sexual interest in children, including queries ‘overcome child porn addiction’ and ‘how to stop watching child porn’ (see more in Supplementary Methods A.6).

When a person searches for CSAM, three prominent Tor search engines provide only links to self-help programmes for those who are concerned about their thoughts, feelings, or behaviours. This intervention on CSAM searches directs individuals away from CSAM and towards help. Data from one of the self-help websites indicates that CSAM users actively visit the website, and those who start the self-help programme are very likely to continue following the programme (see more in Supplementary Methods A.6). In the next section, we show that when we present a survey to those who search for CSAM, many reply that they are motivated to stop using CSAM.

Intervention for CSAM users

RQ3: What does the survey reveal about CSAM users?

Figure 5

Our anonymous survey received responses from 11,470 individuals who sought CSAM on three popular Tor search engines. (a) The survey results indicate that 65.3% (N = 7199 of 11,030 who replied to the question) of CSAM users first saw the material when they were under 18 years old. 36.7% (N = 4048 of 11,030) first saw CSAM when they were 13 years old or younger. 50.5% (N = 4843 of 9599 who replied to the question) report that they first saw CSAM accidentally. (b) We asked the respondents what types of images and videos they view. Viewing CSAM depicting girls is more prevalent, with a ratio of 7:3. (c) The survey results indicate that 48.1% (N = 4120 of 8566 who replied to the question) of CSAM users are willing to change their behaviour to stop using CSAM, and 61.6% (5200 of 8447 who replied to the question) have tried to stop using CSAM. However, only 14.0% (N = 985 of 7013 who replied to the question) of CSAM users have sought help to stop using CSAM, and an even smaller portion of 3.7% (N = 257) have actually received help. Notably, 21.4% (N = 1498) are afraid to seek help.

Individuals who sought CSAM on Tor search engines answered our survey, which aims to inform the development of a self-help programme for them. Figure 5 aggregates statistics from the responses.

Most CSAM users were first exposed to CSAM while they were children themselves, and half of the respondents (N = 4843 of 9599 who replied to the question) first saw the material accidentally, demonstrating the accessibility and availability of CSAM online. Exposure to sexually explicit material in childhood is associated, inter alia, with risky sexual behaviour in adulthood25, sexual harassment perpetration26, and the normalisation of violent sexual behaviour27, and has been defined as an adverse childhood experience and a form of noncontact sexual abuse28.

We ask for information regarding two age ranges in the survey: zero to three years and four to 13 years. We structured the question with the intention of focusing on prepubescent children, aged 0–13. Respondents were able to specify the age in the option ‘Other violent material, what?’ The majority (60.7%, N = 5342 of 8796 who replied to the question) of respondents say they view CSAM depicting girls or boys aged between four and 13 years, indicating a preference for images and videos depicting prepubescent and pubescent children. Of these respondents, 69.7% (N = 3725 of 5342) say they view girls, compared to 30.3% (N = 1617 of 5342) who view boys. There is also a small group, 5.8% (N = 506), of CSAM users with a preference for CSAM depicting infants and toddlers aged between zero and three years old. Additionally, 25.1% (N = 2205) report viewing images and videos containing violent, sadistic, or brutal material.

8.4% (N = 743) of respondents say they view ‘other violent material’, and 458 provide explanatory open-ended responses. 61.6% (N = 282) of these responses explicitly mention the age of children depicted in the CSAM viewed, providing N = 1637 mentions of age, as Fig. 6 illustrates. Most responses refer to age brackets, for example ‘over 12 years old’, which we define to mean 12–17. The most common age is 15 years old (N = 234), followed by 16 years old (N = 221) and 14 years old (N = 209).

The survey data provides an age distribution that appears similar to, but is distinct from, the search data (see statistical tests in Supplementary Methods A.5). The survey responses give the age ranges that respondents say they are interested in, whereas the search sessions reveal the precise ages that people are most interested in. 54.5% of age-revealing CSAM search sessions target sexual content aimed at 12- to 16-year-olds. This survey’s age distribution yields an almost identical percentage: the range between 12 and 16 years old accounts for 56.8% (N = 929 of all 1637 age mentions).

Figure 6

282 of the open-ended responses explicitly mention the age of children depicted in the CSAM viewed, providing 1637 mentions of age.

Our analysis of 458 open-ended responses for ‘Other violent material, what?’ supports the prevalence of viewing material depicting girls. 33.8% (N = 155) of the 458 open-ended responses explicitly mention the gender of children. 91.6% (N = 142) of the responses that mention gender refer to girls, and 30.3% (N = 47) refer to boys. 21.9% (N = 34) of the responses mention both girls and boys. Considering the quantitative and qualitative data together (N = 5488), viewing CSAM depicting girls (N = 3839, 70.0%) is more prevalent than viewing CSAM depicting boys (N = 1649, 30.0%), with a ratio of 7:3.

Overall, these findings are consistent with the latest Internet Watch Foundation’s Annual Report 2022 (see Supplementary Methods A.9).

CSAM users do want assistance

The Prevention Project Dunkelfeld offers clinical and support services to men who experience sexual attraction towards children and reaches these individuals with media campaigns29. Similarly, in collaboration with some legal adult pornography websites, including Pornhub, the Stop It Now! organisation alerts users who conduct CSAM searches of the illegality of their actions and directs them to help resources21.

Our results verify the feasibility of this type of intervention: a large proportion of CSAM users report that they want and have tried to change their behaviour to stop using CSAM. Almost half of the respondents report wanting to stop viewing CSAM monthly, weekly, or nearly every time (48.1%, N = 4120 of 8566 who replied to the question), and the majority of respondents report having tried to stop (61.6%, N = 5200 of 8447 who replied to the question). 31.0% (N = 2656) say that they do not want to stop, and 20.9% (N = 1790) say they have not thought about it.

Despite self-reported willingness and attempts to change behaviour, the fact that respondents are answering the survey is evidence of their continued search for CSAM – albeit while temporarily stepping away from CSAM to contemplate the questions posed in the survey and provide a response. This raises the question of the commonalities between CSAM use and addiction. While addiction to the internet is not listed as a diagnostic disorder in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, Text Revision (DSM-5-TR)5, there is extensive debate over whether problematic use of the internet – in particular in the context of legal pornography and CSAM use – can be considered an addiction. The International Classification of Diseases 11th Revision (ICD-11) includes compulsive sexual behaviour disorder, characterised by a ‘persistent pattern of failure to control intense, repetitive sexual impulses or urges resulting in repetitive sexual behaviour’ – including repetitive use of legal pornography30.

Repetitive pornography use has similar effects to substance addiction31. Many people continually use CSAM and display addictive behaviours32, and the intensity of CSAM use often has properties that users call addictive33. In the search engine data, we notice that users who seek help often refer to their condition as ‘child porn addiction’. Understanding the commonalities between CSAM use and addiction is beneficial to prevention and treatment.

Help-seeking behaviour among CSAM users

Despite many respondents reporting that they would like to stop using CSAM, help-seeking behaviour among CSAM users remains low. Only 14.0% (N = 985 of 7013 who responded to the question) of CSAM users have sought help. Many report that they feel afraid to seek help (21.4%, N = 1498 of 7013), and the majority, 64.6% (N = 4530 of 7013), report that they have not sought help.

Of those who have actively sought help to change their illicit behaviour, 73.9% (N = 728 of 985) have not been able to get help. This population has an unmet demand for effective intervention resources34. This may be due to a lack of awareness of the resources available35 or because the available resources are not desirable. Recent studies34,35,36 found the following barriers to seeking and receiving psychological services for child sexual offenders and people concerned about their sexual interest in children: fear of legal consequences; fear of stigmatisation; shame; affordability; and a perceived or actual lack of understanding by professionals.

The unmet demand for help resources demonstrates the urgent need for investment and further implementation of perpetration prevention programmes in order to effectively reach those who require intervention37.

Through bivariate analysis, we examine the associations between a number of covariates and the outcome of help-seeking in the survey data. We find the following to be determinants of help-seeking: duration and frequency of CSAM use; depression, anxiety, self-harming thoughts, guilt, and shame. We have only included the respondents with non-missing answers; for all results on determinants of help-seeking, see Supplementary Methods A.4 and Supplementary Tables.

Individuals who have used CSAM for a longer duration, and those who use it more frequently, are more likely to actively seek assistance to change their behaviour. There is an opportunity to intervene with those who have been viewing CSAM for a shorter duration in order to increase help-seeking at an earlier stage. Those who use CSAM more frequently may be more likely to seek help because frequent use can have a detrimental impact on daily life, including impaired social and occupational functioning and deep distress; such consequences may strongly motivate seeking help to reduce CSAM use and improve one’s life situation. Common reasons for seeking help for substance abuse include habitual use, taking a substance for a long time, and a need to take it daily38; the driving factors for help-seeking appear to be similar in this sample of CSAM users. Respondents who face more difficulties in carrying out ordinary daily routines and activities are more likely to have sought help to stop using CSAM, and those who experience such difficulties daily have one of the highest rates of help-seeking.

RQ4: How can search engine-based intervention reduce child abuse?

We demonstrate that CSAM websites are not only widely hosted through the Tor network but also actively sought. However, instead of watching CSAM, some individuals voluntarily participate in the search engine-prompted survey; consequently, even this intervention reduces CSAM usage.

We propose an intervention strategy based on our observation that some CSAM users do indeed recognise their problem. Even when CSAM users are seeking CSAM, they are willingly visiting self-help pages and continuing to study cognitive behavioural therapy information. Search engines, which are the main way people find onion sites, should start filtering CSAM and diverting people towards help to stop seeking CSAM. This is technically possible because of the accurate, text-based detection of CSAM that we demonstrate, and furthermore, the CSAM phrase detection list can be shared between search engines.
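A minimal sketch of such an intervention layer follows; the phrase set, help-resource addresses, and function names are illustrative assumptions rather than the implementation of any particular search engine.

```python
# Sketch of a search-engine intervention layer: queries that match a shared
# CSAM phrase list return help resources instead of search results.
# The phrase set and resource addresses are placeholders.
CSAM_PHRASES = {"child porn", "childporn", "pthc"}

HELP_RESOURCES = [
    "http://selfhelp.example.onion/",   # placeholder self-help programme
    "http://survey.example.onion/",     # placeholder anonymous questionnaire
]

def handle_query(query: str, backend_search) -> dict:
    """Divert CSAM queries towards help; serve normal results otherwise."""
    q = query.lower()
    if any(phrase in q for phrase in CSAM_PHRASES):
        return {"results": [], "help": HELP_RESOURCES}
    return {"results": backend_search(q), "help": None}
```

Because the detection phrases are plain text, the same list can be shared and updated across cooperating search engines.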

Discussion

Lack of interdisciplinary research

Technical and non-technical scientists work in separate silos and publish in separate venues, and these venues – including their peer-review processes – reinforce the silos by focusing on either technical or non-technical research. In the present era of 2024 – when the online environment is ubiquitous and data facilitates innovative research – psychology journals display a hesitancy to publish articles that employ technical methodologies (such as a naive Bayes classifier). This leads to a lack of an overall methodology for seeking solutions to reduce child abuse and CSAM. Interdisciplinary research provides key insights in our work: we bring together survey methods, social scientists, and computer scientists to produce holistic research instead of fragmented views.

Unwillingness to acknowledge CSAM in the top-ranked computer science venues

How is it possible that there are so many studies classifying Tor websites and usage without addressing CSAM? A plausible explanation is that the researchers omitted the CSAM findings from their data without providing an explanation in the articles (e.g., ‘A Broad Evaluation of the Tor English Content Ecosystem13’).

We find a significant gap in relevant research in the top-ranked venues in the field of computer science. Only a few prior investigations of comparable calibre exist, and their results are consistent with our own: in 2013, a systematic analysis determined that 17.6% of onion websites distribute CSAM1, and in 2016, research revealed that CSAM websites are the most popular among Tor users17. Despite these findings, computer scientists have continued to neglect CSAM distribution through the Tor network. Could this be due to the contentious nature of CSAM?

Computer scientists should evolve Tor and other anonymity networks so that the privacy goals are consistent with ethical and legal concerns39,40. Current peer-to-peer networks have essentially no remedies for widespread abuse; this is a problem that we intend to investigate in our future research.

Policy to combat CSAM and implement public health programmes for CSAM users

CSAM provides a paramount example of how technology can be used in harmful ways. As highlighted in the ninth edition of the model legislation and global review report by the International Centre for Missing and Exploited Children41, a sustained global effort is needed to further harmonise the legal and regulatory framework in the international arena.

Policies aimed at preventing the spread and use of online CSAM should be implemented. Situational Crime Prevention (SCP) is a criminological approach that employs five strategies and 25 techniques to reduce crime opportunities42. SCP has been shown to reduce criminal behaviour43, and these tactics might be effective against CSAM44. As an illustration, a study examining individuals who accessed a honeypot website that displayed pornography portraying adult actresses as children found that online warning messages offer an effective and scalable tactic to reduce access to CSAM45.

It is urgent to deploy public health programmes for CSAM users. These individuals are motivated to seek help, but help is currently largely unavailable. There is a growing global epidemic of CSAM usage, and some describe masturbation and pornography as coping mechanisms to alleviate economic strain, feelings of isolation, depression, and anxiety21. Such public health prevention programmes should be initiated, financed, evaluated, and developed not by a single industry or actor but as part of a holistic approach. A broad range of actors must take responsibility for the prevention of sexual violence against children, including but not limited to governments, the technology industry, international organisations, and civil society.

Established in Germany, the Prevention Project Dunkelfeld offers cognitive behavioural therapy to improve coping skills and stress management and to help control sexual attraction towards children29. The impact assessment of the ‘Stop It Now!’ campaign demonstrates the high effectiveness of the public health approach in preventing child sexual abuse. A series of awareness-raising films widely disseminated through media channels can successfully reach people concerned about their own or others’ behaviour and direct them to help services. After establishing trust and committing to treatment, individuals who are sexually attracted to children can gain the ability to consistently regulate their impulses. The favourable confidentiality legislation in Germany, which prohibits therapists from disclosing planned or actual child abuse offences, naturally strengthens this trust. Project evaluation shows that post-treatment recidivism is lower among individuals who commit contact offences than among child sexual abuse material users. This demonstrates the need to develop tailored interventions based on the offender’s background and behaviour.

Previous research indicates that anonymous online therapy reduces the use of CSAM: the Prevent It study, a clinical trial of an online therapist-supported cognitive behavioural therapy, indicates promising results in terms of the feasibility of dark web recruitment and the effectiveness of anonymous online interventions46. These public health programmes should offer in-person psychotherapy, anonymous online self-help material in all languages, anonymous online person-to-person support, and an overall drive towards treatment.

Methods

Participants

The study and its methods are in accordance with relevant guidelines and regulations. The Board of Suojellaan Lapsia, Protect Children ry. approved the experimental protocols with human participants, in accordance with the Declaration of Helsinki Ethical Principles for Medical Research Involving Human Subjects: (i) All participants in the Help us to help you survey have provided informed consent. (ii) These participants received clear information on the purposes of the study before beginning the survey. (iii) Without compensation, they volunteered to respond to the questions. (iv) No personal or identifiable information was collected. (v) The survey data is stored and managed exclusively by the research team at Suojellaan Lapsia, Protect Children ry. without anyone else – not even the co-authors – having access to the collected survey answers.

Intervention for CSAM users

Previous research describes how people find onion websites12: ‘The three most popular ways that almost half of our survey participants discovered onion sites by were via (i) social networking sites such as Twitter and Reddit (48%), (ii) search engines such as Ahmia, (46%) and (iii) randomly encountering links when browsing the Web (46%).’ Since Tor search engines serve as popular entry points to CSAM, we requested that search engines recruit Tor users who access CSAM to answer our survey.

Three prominent Tor search engines – Ahmia.fi, OnionLand, and Onion Search Engine – display our questionnaire to users who have searched for CSAM. In this research, we analyse the responses of users who searched for CSAM on Tor web search engines using at least one of 179 search phrases used to find CSAM. The search phrases, in English, Russian, Spanish, and Swedish, are used only to locate CSAM (e.g., the term ‘childporn’). When a user submits a query containing any of these terms on one of these three Tor search engines, they are instead given the opportunity to voluntarily participate in the survey, which is available in 21 languages.

We may be targeting a specific population, as the demographics of Tor users are probably not representative of all internet users. Furthermore, there is a possibility that the English-speaking population is overrepresented: users who conduct their initial search in English but have a limited English vocabulary may fail to recognise our survey invitation and therefore not respond.

The participants in the sample are Tor users who (i) conducted a search for CSAM and (ii) opted to complete the survey; thus, they constitute a convenience sample. Although the sample is informative, it does not generalise to all CSAM users, and there is a high probability of selection effects at play. The absence of identifying information in the survey permits multiple responses from a single respondent. The trend of decreasing new responses over time suggests that users who have previously encountered the survey are less likely to respond to it.

The Help us to help you survey consists of 32 questions, takes about 15 to 20 minutes to complete, and participants receive no compensation. For this study, we analyse responses to 12 survey questions: 1, 2, 3, 4, 5, 7, 8, 13, 20, 22, 24, and 28 (see Supplementary Tables B).

The survey does not request any personally identifiable information from respondents – such as age, country, or gender – that would put privacy at risk. Questions avoid specifics of criminal conduct (e.g., time, date, place, or victim details). We ask CSAM users about their thoughts, feelings, and actions related to their use of CSAM so that in the future we can build a cognitive behavioural theory-based anonymous rehabilitation programme for CSAM users.

We included the term ‘illegal violent material’ for those respondents who do not categorise the material they view as CSAM but indicated to us via their search terms that they are in fact searching for material depicting sexual abuse of children.

We analyse responses from all participants who answered our Help us to help you survey from 5 May 2021 to 28 February 2023 (N = 11,470) and compare the tendencies and habits of people who searched for CSAM (see Supplementary Methods A.4 and Supplementary Tables).

Measuring CSAM hosted through the Tor network

In our study, we crawl webpages hosted on onion services. According to the Tor Project statistics, there were 693,683 onion domains on 1 January 202347. Onion domains can and do provide any internet service; not all of them host websites.

In practice, we employ parallel crawlers to follow onion links on onion websites, which are subsequently fed to fresh crawlers. This allows us to continue harvesting in both depth and breadth. From 2018 to 2023, we collect online content from 176,683 unique onion domain addresses.
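A simplified sketch of this kind of breadth-first onion crawling is shown below, assuming a local Tor SOCKS proxy at 127.0.0.1:9050 and the requests library with SOCKS support; the real crawlers run in parallel with politeness delays and far more robust error handling than this sketch includes.

```python
# Minimal sketch of link-following onion crawling over a local Tor SOCKS proxy.
# Assumes Tor listens on 127.0.0.1:9050 and requests has SOCKS support
# (pip install requests[socks]).
import re
from collections import deque

import requests

PROXIES = {"http": "socks5h://127.0.0.1:9050", "https": "socks5h://127.0.0.1:9050"}
ONION_RE = re.compile(r"https?://[a-z2-7]{16,56}\.onion", re.IGNORECASE)

def crawl(seeds: list[str], max_domains: int = 1000) -> dict[str, str]:
    """Breadth-first crawl that stores the HTML front page of each onion domain."""
    queue, pages = deque(seeds), {}
    while queue and len(pages) < max_domains:
        url = queue.popleft()
        if url in pages:
            continue
        try:
            html = requests.get(url, proxies=PROXIES, timeout=60).text
        except requests.RequestException:
            continue
        pages[url] = html
        # Newly discovered onion links are fed back into the queue (depth and breadth).
        for link in ONION_RE.findall(html):
            if link not in pages:
                queue.append(link)
    return pages
```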

We investigate the years 2018–2023 and use a random sample of 10,000 unique onion domains for each year. Whenever we refer to random selection in this study, we mean computer-generated random sampling. We then subject the text content to duplicate content filtering, phrase search, and classification. This returns the detected CSAM percentage for each year, as shown in Fig. 2.

To extract only the textual content, we use the html2text Python library to convert the HTML pages to plain text representation. See an example of such a CSAM website in Fig. 2 and more examples in Supplementary Information A.3.
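A minimal sketch of this conversion step with the html2text library is shown below; the configuration flags are illustrative choices rather than our exact settings.

```python
# Sketch of the HTML -> plain text step using the html2text library
# (pip install html2text).
import html2text

def to_plain_text(html: str) -> str:
    converter = html2text.HTML2Text()
    converter.ignore_links = False   # keep link targets, e.g. file names of media
    converter.ignore_images = False  # keep image references and alt/caption text
    converter.body_width = 0         # do not re-wrap lines
    return converter.handle(html)

# Example:
# to_plain_text("<h1>Title</h1><img src='clip001.mp4.jpg' alt='caption'>")
# returns a markdown-like plain text representation containing the
# file name and the caption text.
```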

In our textual representation of websites, we can see the file names for images and videos, and also their corresponding caption text (see detailed examples in Supplementary Methods A.3). Even websites that offer their full content only after authentication (see Supplementary Methods A.3, Fig. 2) or behind a paywall serve limited CSAM samples immediately on the landing page.

Accurate validation would be simple: select a random sample of 1000 distinct domains that – according to the opening lines of the text – are unique websites, and open these in the Tor Browser to see whether they share CSAM. Although this assessment method would yield the ground truth, we do not download, open, or view any media content in this research; we focus solely on textual data. Accessing CSAM websites would raise ethical, safety, and legal concerns, even in the context of academic research.

Web crawling as a method is biased towards websites that are frequently linked, and it cannot locate onion websites to which no links exist. As a separate issue, the sampling includes onion websites that employ multiple alternative onion addresses. Although we eliminate duplicates, a website with multiple publicly linked onion domains on other onion websites has not only a greater chance of being crawled but also a higher likelihood of being selected for measurement through random sampling. Therefore, our methodology favours and estimates popular, well-linked onion websites with several domains – not all possible onion websites. By using a large dataset and continual onion link discovery, we minimise this bias.

Manual investigation

We randomly select the plain text representations of 1000 onion websites for each year, 2018–2023. We read the text content of these websites to determine whether they share CSAM and what the English vocabulary is for this type of page. Websites that share CSAM make this fact abundantly evident on the front page (see Supplementary Methods A.3), as well as through the use of explicit, distinct wording.

We identify 22.1% (N = 221 of 1000) of domain addresses sharing CSAM in January 2023. We repeat this test for the years 2018–2022 (see Supplementary Methods A.3.1) and find that 19.5% (N = 195 of 1000) of onion domains possess CSAM in 2022, 27.2% (N = 272 of 1000) in 2021, 19.0% (N = 190 of 1000) in 2020, 10.8% (N = 108 of 1000) in 2019, and 9.0% (N = 90 of 1000) in 2018.

Basic keyword search

Next, we randomly select 10,000 onion websites that were online in December 2022 and perform a basic case-insensitive keyword search (see Supplementary Methods A.1). This modest matching with 11 explicit CSAM phrases – including ‘child porn’, ‘childxxx’, ‘lolita’, ‘preteen’, and similar phrases – produces 2642 domains from the 10,000 onion domains.
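The keyword matching itself can be sketched as follows; the phrase list shown is a shortened placeholder for the 11 explicit phrases, and `pages` is assumed to map each onion domain to its plain text representation.

```python
# Case-insensitive keyword matching over plain text representations.
# KEYWORDS is a shortened placeholder; the study uses 11 explicit phrases.
KEYWORDS = ["child porn", "childxxx", "lolita", "preteen"]

def matching_domains(pages: dict[str, str], keywords=KEYWORDS) -> set[str]:
    hits = set()
    for domain, text in pages.items():
        lowered = text.lower()
        if any(k in lowered for k in keywords):
            hits.add(domain)
    return hits
```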

When a website has multiple alternate onion domains, we eliminate duplicates. We execute the search against the content of these distinct websites. As expected, the algorithm returns a smaller subset – 2142 domains that present unique websites. The search returns 306 matches from these 2142 domains.
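Duplicate elimination can be sketched by fingerprinting each page from its title and opening sentences, as described in Fig. 2; the exact normalisation and comparison window below are assumptions.

```python
import hashlib

def website_fingerprint(text: str, n_words: int = 300) -> str:
    """Fingerprint a page from its opening text (title plus first sentences)."""
    head = " ".join(text.lower().split()[:n_words])
    return hashlib.sha256(head.encode("utf-8")).hexdigest()

def deduplicate(pages: dict[str, str]) -> dict[str, str]:
    """Keep a single domain per unique website when mirrors share identical content."""
    unique, seen = {}, set()
    for domain, text in pages.items():
        fp = website_fingerprint(text)
        if fp not in seen:
            seen.add(fp)
            unique[domain] = text
    return unique
```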

Manually reading the websites, we estimate the false positive rate (20 of 306, 6.5%) and the false negative rate (6.0%) (see Supplementary Equations D). According to this keyword-based search, with the stated false positive and false negative estimates, 18.5% of the unique websites hosted through the Tor network share CSAM in December 2022.
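As a hedged reconstruction of the arithmetic: assuming the false negative rate refers to the share of non-matching unique websites that nonetheless contain CSAM, the 18.5% estimate can be reproduced as follows (see Supplementary Equations D for the exact derivation).

```python
# Reconstruction of the adjusted prevalence estimate under one reading of the
# false positive / false negative rates (an assumption, not the authoritative derivation).
unique_sites = 2142      # unique websites after duplicate elimination
matches = 306            # keyword matches
false_positives = 20     # manually identified among the matches
fn_rate = 0.06           # assumed: share of non-matching sites that still contain CSAM

true_positives = matches - false_positives
missed = fn_rate * (unique_sites - matches)
estimate = (true_positives + missed) / unique_sites
print(f"{estimate:.1%}")  # -> 18.5%
```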

Text-based CSAM detection classifiers

Using the NLTK Python library, we construct a naive Bayes classifier and a decision tree classifier (see Supplementary Methods A.2). To train the classifiers, we manually produce representative CSAM (positive) and other (negative) website datasets: we curate 1006 pages from 306 unique CSAM websites and 6271 pages from 733 unique non-CSAM websites. These methods have simplistic designs and seemingly unrealistic assumptions but are known to be accurate for text classification; a naive Bayes classifier reaches similar accuracy to, and sometimes even outperforms, sophisticated support vector machines (SVMs)48,49. For us, they offer a clear benefit: we can understand and interpret them, and after training, we can output the detection phrases, combine them, and fine-tune a powerful detection algorithm.
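A minimal sketch of this training step with NLTK is shown below; the bag-of-words features and the `labelled_pages` input are placeholders standing in for the curated 1006 CSAM and 6271 non-CSAM pages, and the printed accuracy depends entirely on the training data.

```python
# Sketch of NLTK naive Bayes / decision tree training on bag-of-words features.
# labelled_pages: list of (plain_text, label) pairs, label in {"csam", "other"}.
# (word_tokenize requires the NLTK 'punkt' tokenizer data.)
import random
from nltk import NaiveBayesClassifier, DecisionTreeClassifier, classify, word_tokenize

def features(text: str) -> dict[str, bool]:
    return {token.lower(): True for token in word_tokenize(text)}

def train_and_evaluate(labelled_pages: list[tuple[str, str]], split: float = 0.8):
    data = [(features(text), label) for text, label in labelled_pages]
    random.shuffle(data)
    cut = int(len(data) * split)
    train, test = data[:cut], data[cut:]
    nb = NaiveBayesClassifier.train(train)
    dt = DecisionTreeClassifier.train(train)
    print("naive Bayes accuracy:", classify.accuracy(nb, test))
    print("decision tree accuracy:", classify.accuracy(dt, test))
    nb.show_most_informative_features(25)  # informative tokens that seed the phrase list
    return nb, dt
```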

Shareable text-based detection for CSAM

Our goal is to provide search engines with shareable matching phrases so that they can filter CSAM and we can continue to update the phrases. The classifiers use obvious phrases without much extra logic to match CSAM websites. This enables us to create a detection algorithm with 404 accurate English phrases (‘childxxx’, ‘childrenxxx’, ‘underage slut’, etc.). This is effective, as the vast majority of onion websites are written in English, and search data indicates that almost all users seek explicit material using English terminology.

While selecting the 404 phrases, we only include those that explicitly refer to sexual activity with children; therefore, we exclude phrases such as ‘baby love’ – although it does not generate false positives in the context of Tor. The inclusion of implicit terminology would give rise to ethical concerns about censorship.

A total of 32.5% (N = 35,751,619) of search sessions on Ahmia.fi include sexual phrases, and many of them might implicitly seek CSAM (e.g., ‘young teen girls sex’). Thus, the creator of the search engine – the first author of this paper – decided after this research, in November 2023, to filter all sexual and suspicious searches, despite the collateral damage. This is a response to the widespread search for and distribution of illegal child sexual abuse content via Tor, as opposed to legal pornography.

Measuring CSAM searches on the Tor search engine

We analyse search queries from a well-known public search engine for onion websites. Ahmia.fi provided us with a list of all search queries from February 2018 to February 2023. During these five years, search engine users performed 238,794,231 queries. We analyse these search phrases to determine what Tor users are seeking primarily from onion services.

We conducted limited initial experiments using small-scale interference techniques with our partner Tor search engine to prevent users from accessing CSAM. Hence, a priori we expected that users would seek little to no CSAM content because the search engine removes detected CSAM from search results; redirects users who search for such material using obvious terminology to seek assistance; and bans any sex-related queries, including legal ones. Nevertheless, in January 2023, 25 of the top 100 queries seek CSAM content, despite these previous interference techniques.

We examine searches (N = 238,794,231, Ahmia.fi, February 2018 – February 2023) from users seeking content from the Tor network and discover that explicit CSAM-related search phrases account for 6.7% of the queries (see separate analysis of individual queries in Supplementary Methods A.7).

Investigating the user’s search sessions

We track queries per user. We examine the entire search history to follow a total of 110,133,715 search sessions and study how many search sessions include at least one search phrase exclusive to underage content (see Supplementary Discussion C.1).

Even without cookies or IP addresses, it is simple to track a user’s searches by examining the HTTP referrer metadata: the HTTP request for a new search includes the previous search as its referrer. We assume that a user enters new searches within five minutes of the last search. See an example snippet from the web server logs in Fig. 7.

Figure 7

User-entered search phrases produce a search session in the HTTP logs.

During this illustrative search session, the user entered seven distinct queries, some of which reveal that the user is interested not only in teen sex involving adults (age eighteen or nineteen) but also in explicit underage sexual material:

16 years old → 16 years old porn → cp free → child porn free → teen homemade → teen homemade free → teen blowjob
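A sketch of how such a session can be reconstructed from the web server logs is given below; the referrer linking and the five-minute window follow the description above, while the log record structure and the query parameter name are assumptions.

```python
# Sketch of session reconstruction: a new query is linked to the previous one
# when its HTTP Referer contains the previous search and the two requests are
# at most five minutes apart. The log record format is an assumption.
from datetime import timedelta
from urllib.parse import urlparse, parse_qs

WINDOW = timedelta(minutes=5)

def query_of(url: str):
    q = parse_qs(urlparse(url).query).get("q")
    return q[0] if q else None

def sessions(log_records: list[dict]) -> list[list[str]]:
    """log_records: [{'time': datetime, 'url': str, 'referer': str}, ...] in time order."""
    chains, open_chains = [], {}   # open_chains: last query -> (chain, time of last query)
    for rec in log_records:
        query, prev = query_of(rec["url"]), query_of(rec.get("referer", ""))
        if query is None:
            continue
        chain = None
        if prev is not None and prev in open_chains:
            old_chain, last_time = open_chains[prev]
            if rec["time"] - last_time <= WINDOW:
                chain = old_chain
        if chain is None:
            chain = []
            chains.append(chain)
        chain.append(query)
        open_chains[query] = (chain, rec["time"])
    return chains
```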