Investigating child sexual abuse material availability, searches, and users on the anonymous Tor network for a public health intervention strategy

Tor is widely used for staying anonymous online and accessing onion websites; unfortunately, it is also popular for distributing and viewing illicit child sexual abuse material (CSAM). We analyse 176,683 onion domains from 2018 to 2023 and find that one-fifth share CSAM. We find that CSAM is easily available through 21 of the 26 most-used Tor search engines. We analyse 110,133,715 search sessions from the Ahmia.fi search engine and discover that 11.1% seek CSAM. Among CSAM search sessions that specify an age, 40.5% search for 11-year-olds and younger; 11.0% for 12-year-olds; 8.2% for 13-year-olds; 11.6% for 14-year-olds; 10.9% for 15-year-olds; and 12.7% for 16-year-olds. We demonstrate accurate CSAM filtering for search engines, introduce an intervention, present a questionnaire to CSAM users, and analyse 11,470 responses. 65.3% of CSAM users first saw the material when they were children themselves, and half of the respondents first saw the material accidentally, demonstrating the availability of CSAM. 48.1% want to stop using CSAM. Some seek help through Tor, and self-help websites are popular. Our survey finds commonalities between CSAM use and addiction. Help-seeking correlates with increasing viewing duration and frequency, depression, anxiety, self-harming thoughts, guilt, and shame. Yet, 73.9% of help seekers have not been able to receive it.


Introduction
Society benefits from the responsible use of anonymity; for instance, newspapers and police departments use anonymous tips as a source of information, people in countries with strict political systems hide their identities to avoid persecution for their political views, and individuals are permitted to speak freely about personal matters, such as religious issues.
The Onion Router (Tor) provides online anonymity for millions of internet users every day, and it is often portrayed favourably as a method to avoid surveillance by concealing the origin of communications, resisting web browser fingerprinting, circumventing censorship to provide unrestricted internet access, and providing anonymous online hosting using onion domains.
On the other hand, online anonymity serves as a catalyst for the dark side of human behaviour: it is one of the principal causes of the online disinhibition effect, characterised by lowered psychological restraints resulting in intensified aggressive, illicit, or unethical behaviour 1, including higher levels of harassment, threats, racial agitation, and sexism 2. Tor users hosting anonymous onion websites behave accordingly: the websites predominantly host unethical or illicit content, including illegal drug trade, fraud, computer crime, and the distribution of child sexual abuse material (CSAM) 1.
We use the term 'CSAM' instead of 'child pornography' to emphasise its distinction from adult pornography. The term 'pornography' implies consent, and whilst the extent to which consent is present in the production and dissemination of adult pornography is contested, in the case of CSAM no child can consent in the first place. CSAM means media, including images, videos, and live streaming, that depict sexual violence against a child.
It is common for those who search for and view CSAM to engage in other related compulsive behaviours, such as collecting and organising CSAM by age, gender, sex act, and fantasy 3. To encompass all these activities, we refer to these individuals as 'CSAM users' in this article. This group includes individuals who search for, view, disseminate, and/or trade CSAM. A large portion of this group is likely to have a sexual interest in children (i.e., paedophilia or hebephilia) 4. These sexual preferences are classified as mental health disorders because they result in self-harm and harm to others, and therapy can improve well-being and prevent harm to children 5.
Many users are not just passive observers of CSAM; rather, they are sexually motivated 6. Previous research has suggested that problematic use of legal pornography can escalate to violent sexual behaviour and the use of CSAM. Consumers who view legal pornography and masturbate fuel this process of escalation by providing themselves with a 'powerful neurochemical reward' through orgasm 7. This process, along with repeated exposure, may condition users to continue using the material despite possibly wanting to stop 6.
The fact that CSAM is easy to access through the Tor network (and other anonymous networks) increases the likelihood that more children will be sexually abused: one study found that 41.8% (N = 647 of 1,546) of anonymous people who answered a survey after searching for CSAM on Tor search engines said they had tried to seek direct contact with children online after viewing CSAM, and 57.9% (N = 895 of 1,546) said they were afraid that viewing CSAM could lead to sexual acts with a child 8. This suggests that roughly half of CSAM users do not expect to become offline offenders, which is relevant for subsequent public health interventions, as it may indicate a separation between the populations of online-only offenders and those who offend both online and in person.
Despite abundant evidence of the growing prevalence and severe consequences of CSAM accessible through the Tor network, computer science research on CSAM remains limited. A 2022 report to the US Congress 9 addresses the lack of research regarding CSAM accessible through the Tor network and states: 'The ethical failure of computer science researchers with respect to acknowledging the harms against children carried out via Tor and Freenet is vast. Dozens of papers on Tor, Freenet, and I2P have been written in the past decade and published in the flagship security and privacy conferences of the computer science research community: USENIX Security, ACM Computer and Communications Security, ISOC Network and Distributed Systems Security, and the Proceedings of Privacy Enhancing Technology. Virtually none have mentioned the harms of these anonymous services.' All of these venues have rules for stating harms and disclosing and discussing ethical issues, but the sponsoring organisations, chairs, and reviewers do not strictly enforce these rules 9.
Articles in top computer science and security venues do study Tor usage; one poses the research question 'Why do people use Tor?', and although one of its interview responses raises the issue, the authors make no mention of child abuse 10. Similarly, even when the subject of an article is sexual abuse and it references Tor Browser usage, child abuse is not mentioned 11.
In 2018, the USENIX Security Symposium article 'How Do Tor Users Interact With Onion Services?' presents an online survey of 517 Tor users in which several respondents express concern about CSAM; however, this does not lead to any further analysis in the paper, which avoids mentioning child abuse 12.
Similarly, a 2019 article in the Proceedings of the Web Science Conference (WebSci) titled 'A Broad Evaluation of the Tor English Content Ecosystem' omits any mentions of CSAM, despite the authors' claims to have performed an exhaustive evaluation of the content and use of Tor 13 .
Articles covering the harms of CSAM accessible through the Tor network do exist, but mainly outside the top computer science venues. In 2020, an article published in the Proceedings of the National Academy of Sciences (PNAS) acknowledged that 'The Tor anonymity network allows users to protect their privacy and circumvent censorship restrictions but also shields those distributing child abuse content' 14.
As early as 2011, research revealed the widespread distribution of CSAM in peer-to-peer networks 15. This issue affects not only Tor but also the I2P and Freenet anonymity networks; in a 2022 investigation, a Freenet content analysis revealed that 12.0% of the 7,161 analysed freesites contained CSAM 16. The first systematic analysis, in 2013, indicated that 17.6% (206 of 1,171) of the onion services surveyed shared CSAM and concluded 'The support for the further development of Tor hidden services should hence stop' 1. Measurements of onion service visits find CSAM sites to be the most popular 17. In 2014, an estimated 17% of onion websites provided sexual material, of which about half was CSAM 18. In 2018, additional research pointed in the same direction 19.
We collect and analyse web content accessible through onion websites, study user searches on the Tor search engine Ahmia.fi, and demonstrate that CSAM is widely available and that Tor users actively seek this material. We present a questionnaire to those who search for CSAM and analyse 11,470 responses.
Our research questions (RQs) are:
Figure 1 shows our approaches for analysing CSAM availability and usage on Tor. The Methods section presents our research methodology in detail. Our contributions are: (1) We measure the CSAM distribution hosted through the Tor network over a five-year period using onion website crawling, which indicates that about one-fifth of the websites shared CSAM throughout 2018-2023. (2) We show that these CSAM websites can be reached directly from the majority (21/26) of the top Tor search engines used today, and four Tor search engines even advertise CSAM. (3) We find that 11.1% (N = 12,270,042 of 110,133,715) of the search sessions explicitly seek CSAM; the single phrase 'child porn' is one of the top queries. (4) When we prompt those who search for CSAM with a survey, we find that 61.6% (N = 5,200 of 8,447 who replied to the question) of CSAM users have tried to stop watching CSAM, 48.1% (N = 4,120 of 8,566 who replied to the question) want to stop using CSAM, and there is an unmet demand for help resources. (5) Search engines are a key part of the solution; hence, we demonstrate the effectiveness of CSAM detection and the ability of search engines to intervene and steer CSAM users towards help.

Results
Surge of CSAM hosted through the Tor network
RQ1: What is the distribution volume of CSAM hosted through the Tor network?
We investigate the years 2018-2023 and use a sample of 10,000 unique onion domains for each year. We then subject the text content to duplicate content filtering, fine-tuned phrase search, and supervised learning classifiers. This returns the detected CSAM percentage for each year, as shown in Figure 2.
Phrase matching fails to detect CSAM websites that do not use the obvious phrases. For example, some real CSAM websites do not use explicit sexual language and instead refer to content such as 'baby love videos'; our text-based detection does not match these websites. In addition, there are link directory websites that provide descriptions of CSAM website links. Our text-based detection matches the description phrases even though this type of website does not share CSAM itself but merely links to websites that do.
Comparing automated methods to human validation by hand yields consistent results (22.1% in 2023 and 19.5% in 2022). We randomly select the plain text representations of 1,000 onion websites for each year, 2018-2023, and read the text content of these websites to determine whether they share CSAM and what English vocabulary is used for this type of page.
We achieve 93.8% accuracy with a basic naive Bayes classifier (see Supplementary Methods A.2). Some legal adult pornography websites, such as Pornhub, provide an alternative onion domain accessible via Tor. We manually review a sample set of Pornhub pages and find no indication of anything other than adult material; therefore, we include these in our training data to teach the classifier to differentiate between legal and illegal content. The classifier performs well and can distinguish between legitimate pornography websites and unlawful CSAM websites. This accuracy is as expected and consistent with previous research on Tor content classification (93.5%) 20.
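To illustrate how a basic naive Bayes text classifier works, the sketch below implements a minimal multinomial variant with Laplace smoothing in plain Python. It is a sketch of the general technique under stated assumptions, not the authors' implementation (their setup is described in Supplementary Methods A.2); the class name, the labels 'flagged' and 'benign', and the toy training set of meaningless placeholder tokens are all hypothetical.

```python
import math
from collections import Counter


class MultinomialNB:
    """Minimal multinomial naive Bayes text classifier with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class, used for priors
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, doc, c):
        """Unnormalised log posterior: log P(c) + sum of log P(word | c)."""
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / n_docs)
        denom = self.totals[c] + self.alpha * len(self.vocab)
        for word in doc.lower().split():
            if word in self.vocab:  # words unseen in training are ignored
                score += math.log((self.word_counts[c][word] + self.alpha) / denom)
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_score(doc, c))


# Hypothetical toy training set of placeholder tokens (not real page text).
train_docs = ["alpha beta beta gamma", "alpha alpha delta",
              "epsilon zeta eta", "zeta eta theta eta"]
train_labels = ["flagged", "flagged", "benign", "benign"]

clf = MultinomialNB().fit(train_docs, train_labels)
print(clf.predict("beta alpha"))  # flagged
print(clf.predict("eta theta"))   # benign
```

Working in log space avoids numerical underflow when multiplying many small word probabilities, and the smoothing constant keeps a single unseen-in-class word from zeroing out a class.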
We anticipate that the phrase matching method will generate few false positives due to the explicit nature of the matching phrases: it is rare for a website to contain these phrases unless it also contains CSAM, and even the exceptions describe linked CSAM websites. For the same reason, we anticipate that this method will generate false negatives, as it requires exact CSAM-describing language. Indeed, the matching behaves accordingly: its accuracy is 85.4% on CSAM websites (some false negatives) and 98.6% on non-CSAM websites (almost no false positives).
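One way to see what the two per-class accuracies imply is to treat them as the phrase matcher's sensitivity (85.4%) and specificity (98.6%). The sketch below, assuming a hypothetical true prevalence of 20% (close to the one-fifth reported), computes the share of sites the matcher would flag and its overall accuracy. The inverse function is the standard Rogan-Gladen prevalence correction, included purely for illustration and not a method reported in this study.

```python
SENSITIVITY = 0.854  # reported accuracy on CSAM websites (true-positive rate)
SPECIFICITY = 0.986  # reported accuracy on non-CSAM websites (true-negative rate)


def expected_detection_rate(prevalence, sens=SENSITIVITY, spec=SPECIFICITY):
    """Share of sites flagged: true positives plus false positives."""
    return prevalence * sens + (1 - prevalence) * (1 - spec)


def overall_accuracy(prevalence, sens=SENSITIVITY, spec=SPECIFICITY):
    """Accuracy over all sites, weighting each class by its share."""
    return prevalence * sens + (1 - prevalence) * spec


def corrected_prevalence(observed, sens=SENSITIVITY, spec=SPECIFICITY):
    """Rogan-Gladen estimator: inverts expected_detection_rate."""
    return (observed + spec - 1) / (sens + spec - 1)


p = 0.20  # hypothetical true prevalence
flagged = expected_detection_rate(p)
print(round(flagged, 4))                        # 0.182
print(round(overall_accuracy(p), 4))            # 0.9596
print(round(corrected_prevalence(flagged), 4))  # 0.2
```

Under these assumptions, the matcher would flag about 18.2% of sites: its false negatives (2.92% of all sites) outweigh its false positives (1.12%), so it slightly under-counts the true share.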
When law enforcement has seized control of CSAM servers operating through the Tor network, they have documented terabytes of content with hundreds of thousands of users (see Supplementary Discussion C.2). Similarly, we find indicators of the distribution of extensive CSAM collections: while reading texts from the websites, we see numerous CSAM websites that claim to share gigabytes of media and thousands of videos and images.
Using three separate methods (manual validation, phrase matching, and the naive Bayes classifier), we conclude that around one-fifth of the unique websites hosted through the Tor network share CSAM. Previous research, in 2013, found that 17.6% of onion services shared CSAM, which corresponds with our findings 1.
Examining CSAM user behaviour
RQ2: What is the CSAM search volume, and what exactly are users seeking?

11.1% of the search sessions seek CSAM
We examine search chains generated by users who enter consecutive queries. Following the queries entered by each user across 110,133,715 search sessions, we discover that 32.5% (N = 35,751,619) include sexual phrases and that 11.1% (N = 12,270,042) reveal that the user is explicitly searching for content related to the sexual abuse of children. Some of these CSAM search sessions include either 'girl(s)' (393,261) or 'boy(s)' (289,407); searching for girls is more prevalent, with a ratio of roughly 4:3. Seeking torture material is not typical: 0.5% of CSAM search sessions (57,429) contain the terms 'pain', 'hurt', 'torture', 'violence', 'violent', 'destruction', or 'destroy'.
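The shares in this paragraph are ratios of the reported counts. A short recomputation using only the figures stated above (the helper `pct` is ours, added for illustration):

```python
TOTAL_SESSIONS = 110_133_715


def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


sexual = 35_751_619   # sessions with sexual phrases
csam = 12_270_042     # sessions explicitly seeking CSAM
girls, boys = 393_261, 289_407
violent = 57_429      # CSAM sessions with violence-related terms

print(round(pct(sexual, TOTAL_SESSIONS), 1))  # 32.5
print(round(pct(csam, TOTAL_SESSIONS), 1))    # 11.1
print(round(girls / boys, 2))                 # 1.36 (an exact 4:3 would be 1.33)
print(round(pct(violent, csam), 1))           # 0.5
```

Note that the violence share uses the CSAM sessions, not all sessions, as its denominator.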
During the COVID-19 pandemic and the first months of lockdowns, there was a significant surge in the user base of legal pornography websites across nations 21. Surprisingly, we find no significant changes in the behaviour of CSAM users between the periods before and after COVID-19 pandemic measures such as lockdowns, when individuals spent more time at home (see Supplementary Methods A.8).

54.5% are searching for 12- to 16-year-olds
We determine whether the search session reveals the exact age in which the CSAM user is interested. For example, for a 13-year-old, we count search sessions that include 13y(*), 13+y(*), 13teen, thirteen+year, 13boy(s) or 13girl(s). We use the same logic for other ages. In addition, to compare these searches for CSAM with legal adult sexual content searches, we include search sessions seeking 18-year-old (N = 12,347 of 110,133,715) and 19-year-old adults (N = 458 of 110,133,715). In total, we find age information in 479,555 search sessions. Figure 3 illustrates the age distribution of CSAM queries. This age distribution aligns with findings from previous studies 22. An article titled 'Pedophilia, Hebephilia, and the DSM-V' finds qualitative differences between offenders who preferred pubertal and those with a prepubertal preference (a clinical trial of 881 men with problematic sexual behavior) 23. The authors also note that the majority of child abuse victims are 14 years old. They conclude that the psychiatric diagnoses should be improved to include the following: sexually attracted to children younger than 11 (paedophilic type), sexually attracted to children aged 11-14 (hebephilic type), or sexually attracted to both (pedohebephilic type). Our data draw attention to the high percentage of individuals who have a sexual interest in 12-year-olds but not 11-year-olds; this is consistent with the national average age at menarche in the United States, which is 12.54 years 24. Additionally, the decline in sexual interest across the ages of 17, 18, and 19 indicates a distinct sexual interest in those aged 12 to 16.
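The 12- to 16-year-old band is the sum of the per-age shares plotted in Figure 3. Adding the rounded per-age values reported in the abstract gives 54.4%, which agrees with the reported 54.5% to within rounding error, since each of the five rounded values may differ from its underlying share by up to 0.05 percentage points. A minimal check (the dictionary keys are ours):

```python
# Per-age shares (%) of age-revealing search sessions, as reported in the abstract.
age_shares = {"11 and under": 40.5, "12": 11.0, "13": 8.2,
              "14": 11.6, "15": 10.9, "16": 12.7}

band_12_16 = sum(age_shares[age] for age in ("12", "13", "14", "15", "16"))
max_rounding_error = 5 * 0.05  # each rounded share is within 0.05 points

print(round(band_12_16, 1))                          # 54.4
print(abs(band_12_16 - 54.5) <= max_rounding_error)  # True
```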
Moreover, in Figure 4, we investigate CSAM search sessions containing age-indicating search terms and find that users are predominantly interested in 12- to 14-year-old teen content; for example, 'lolita' is the most popular term compared with other age-related terms, at 33.2% (N = 746,786 of 2,287,057). In Vladimir Nabokov's 1955 novel 'Lolita', a middle-aged man is sexually attracted to a 12-year-old girl and sexually abuses her. In Stanley Kubrick's 1962 film adaptation of the novel, 'Lolita' is 14 years old.

Self-help services are reaching users
When we study the searches, we discover a few hundred queries from people who want to stop viewing CSAM and are concerned about their sexual interest in children, including the queries 'overcome child porn addiction' and 'how to stop watching child porn' (see Supplementary Methods A.6). When a person searches for CSAM, three prominent Tor search engines return only links to self-help programmes for those who are concerned about their thoughts, feelings, or behaviours. Intercepting CSAM searches in this way directs individuals away from CSAM and towards help. Data from one of the self-help websites indicate that CSAM users actively visit the website, and those who start the self-help programme are very likely to continue following it (see Supplementary Methods A.6). In the next section, we show that when we present a survey to those who search for CSAM, many reply with motivation to stop using CSAM. Individuals who seek CSAM on Tor search engines answered our survey, which aims to inform the development of a self-help programme for them. Figure 5 aggregates statistics from the responses.
Most CSAM users were first exposed to CSAM while they were children themselves, and half of the respondents (N = 4,843 of 9,599 who replied to the question) first saw the material accidentally, demonstrating the accessibility and availability of CSAM online.Exposure to sexually explicit material in childhood is associated, inter alia, with risky sexual behaviour in adulthood 25 , sexual harassment perpetration 26 , and the normalisation of violent sexual behaviour 27 , and has been defined as an adverse childhood experience and a form of noncontact sexual abuse 28 .
We ask about two age ranges in the survey: zero to three years and four to 13 years. We structured the question to focus on pre-pubescent children aged 0-13; respondents could specify other ages in the option 'Other violent material, what?' The majority (60.7%, N = 5,342 of 8,796 who replied to the question) of respondents say they view CSAM depicting girls or boys aged between four and 13 years, indicating a preference for images and videos depicting prepubescent and pubescent children. Of these respondents, 69.7% (N = 3,725 of 5,342) say they view girls, compared to 30.3% (N = 1,617 of 5,342) who view boys. There is also a small group of CSAM users (5.8%, N = 506) with a preference for CSAM depicting infants and toddlers aged between zero and three years. Additionally, 25.1% (N = 2,205) report viewing violent, sadistic, or brutal material. The survey data provide an age distribution that appears similar to, but is distinct from, the search data (see statistical tests in Supplementary Methods A.5): the survey responses give the age ranges that respondents say they are interested in, whereas the search sessions reveal the precise age that people are most interested in. 54.5% of age-revealing CSAM search sessions target sexual content depicting 12- to 16-year-olds. The survey's age distribution yields an almost identical percentage: ages between 12 and 16 account for 56.8% of all mentioned ages (N = 929 of 1,637). Overall, these findings are consistent with the Internet Watch Foundation's latest Annual Report (2022) (see Supplementary Methods A.9).

CSAM users do want assistance
The Prevention Project Dunkelfeld offers clinical and support services to men who experience sexual attraction towards children and reaches these individuals with media campaigns 29. Similarly, in collaboration with some legal adult pornography websites, including Pornhub, the Stop It Now! organisation alerts users who conduct CSAM searches to the illegality of their actions and directs them to help resources 21.
Our results verify the feasibility of this type of intervention: a large proportion of CSAM users report that they want to change, and have tried to change, their behaviour to stop using CSAM. Almost half of the respondents (48.1%, N = 4,120 of 8,566 who replied to the question) report wanting to stop viewing CSAM monthly, weekly, or nearly every time they view it, and the majority (61.6%, N = 5,200 of 8,447 who replied to the question) report having tried to stop. Meanwhile, 31.0% (N = 2,656) say that they do not want to stop, and 20.9% (N = 1,790) say they have not thought about it.
Despite self-reported willingness and attempts to change behaviour, the fact that they are responding to the survey is evidence of their continued search for CSAM, albeit while temporarily stepping away from CSAM to contemplate the concerns posed in the survey and provide a response. This raises the question of the commonalities between CSAM use and addiction. While internet addiction is not listed as a diagnostic disorder in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, Text Revision (DSM-5-TR) 5, there is extensive debate over whether problematic use of the internet, in particular in the context of legal pornography and CSAM use, can be considered an addiction. The International Classification of Diseases 11th Revision (ICD-11) includes compulsive sexual behaviour disorder, characterised by a 'persistent pattern of failure to control intense, repetitive sexual impulses or urges resulting in repetitive sexual behaviour', including repetitive use of legal pornography 30.
Repetitive pornography use has similar effects to substance addiction 31 .Many people continually use CSAM and display addictive behaviours 32 , and the intensity of CSAM use often has properties that users call addictive 33 .In the search engine data, we notice that users who seek help often refer to their condition as 'child porn addiction'.Understanding the commonalities between CSAM use and addiction is beneficial to prevention and treatment.

Help-seeking behaviour among CSAM users
Despite many respondents reporting that they would like to stop using CSAM, help-seeking behaviour among CSAM users remains low. Only 14.0% (N = 985 of 7,013 who responded to the question) of CSAM users have sought help. Many report that they feel afraid to seek help (21.4%, N = 1,498 of 7,013), and the majority (64.6%, N = 4,530 of 7,013) report that they have not sought help.
Of those who have actively sought help to change their illicit behaviour, 73.9% (N = 728 of 985) have not been able to get help.This population has an unmet demand for effective intervention resources 34 .This may be due to a lack of awareness of the resources available 35 or because the available resources are not desirable.Recent studies [34][35][36] found the following barriers to seeking and receiving psychological services for child sexual offenders and people concerned about their sexual interest in children: fear of legal consequences; fear of stigmatisation; shame; affordability; and a perceived or actual lack of understanding by professionals.
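The three answer categories reported for the help-seeking question sum to the 7,013 respondents, so their shares partition that group; the 73.9% figure instead uses the 985 help seekers as its denominator. A minimal recomputation from the reported counts (the `shares` helper and the category labels are ours):

```python
def shares(counts):
    """Percentage share of each answer, rounded to one decimal place."""
    total = sum(counts.values())
    return {answer: round(100 * n / total, 1) for answer, n in counts.items()}


help_seeking = {"sought help": 985, "afraid to seek help": 1_498,
                "not sought help": 4_530}

print(sum(help_seeking.values()))  # 7013
print(shares(help_seeking))
# {'sought help': 14.0, 'afraid to seek help': 21.4, 'not sought help': 64.6}

unable_to_get_help = 728
print(round(100 * unable_to_get_help / help_seeking["sought help"], 1))  # 73.9
```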
The unmet demand for help resources demonstrates the urgent need for investment and further implementation of perpetration prevention programmes in order to effectively reach those who require intervention 37 .
Through bivariate analysis, we examine the associations between a number of covariates and the outcome of help-seeking in the survey data. We find the following to be determinants of help-seeking: duration and frequency of CSAM use; depression, anxiety, self-harming thoughts, guilt, and shame. We include only respondents with non-missing answers; for all results on determinants of help-seeking, see Supplementary Methods A.4 and Supplementary Tables.
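The section does not name the bivariate test used. A common choice for a binary outcome such as help-seeking against a categorical covariate is Pearson's chi-square test of independence; the sketch below implements it for a 2x2 table in plain Python, assuming that test. The counts are invented for illustration and are not survey data. For one degree of freedom, the chi-square survival function equals erfc(sqrt(x / 2)), which avoids a SciPy dependency.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    Returns (statistic, p_value). No continuity correction is applied.
    With one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


# Invented counts: rows = daily vs less-than-daily use,
# columns = sought help vs did not seek help.
table = [[60, 140],
         [40, 260]]
stat, p = chi_square_2x2(table)
print(round(stat, 2))  # 20.83
print(p < 0.001)       # True
```

Larger tables (for example, several frequency categories) need the general degrees of freedom (rows - 1) x (columns - 1) and a chi-square survival function for that value.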
Individuals who have used CSAM for a longer duration and those who use it more frequently are more likely to actively seek assistance to change their behaviour.There is an opportunity to intervene with those who have been viewing CSAM for a shorter duration in order to increase help-seeking at an earlier stage.Those who use CSAM more frequently may be more likely to seek help due to the detrimental impact that frequent use of CSAM may have on an individual's daily life, including impaired social and occupational functioning and deep distress, which may be a strong motivator to seek help to reduce use of CSAM in order to improve life situations.Common reasons for seeking help for substance abuse include habitual use, taking a substance for a long time, and a need to take it daily 38 .Such driving factors for help-seeking appear to be similar in this sample of CSAM users.Respondents who face more difficulties in carrying out ordinary daily routines and activities are more likely to have sought help to stop using CSAM.Those who experience such difficulties daily have one of the highest rates of help-seeking.

RQ4: How can search engine-based intervention reduce child abuse?
We demonstrate that not only are CSAM websites widely hosted through the Tor network, but they are also actively sought. At the same time, many individuals who are shown the search engine-prompted survey voluntarily participate in it instead of watching CSAM. Consequently, even this intervention reduces CSAM usage, at least temporarily.
We propose an intervention strategy based on our observation that some CSAM users do indeed recognise their problem. Even when CSAM users are seeking CSAM, they willingly visit self-help pages and continue to study cognitive behavioural therapy information. Search engines, which are one of the main ways people find onion sites, should start filtering CSAM and diverting people towards help to stop seeking CSAM. This is technically possible because of the accurate, text-based detection of CSAM that we demonstrate, and furthermore, the CSAM phrase detection list can be shared between search engines.

Lack of interdisciplinary research
Technical and non-technical scientists work in separate silos and publish in separate venues, and these venues, including their peer-review processes, reinforce these silos by focusing on either technical or non-technical research. In 2024, when online environments are commonplace and data facilitate innovative research, psychology journals remain hesitant to publish articles that employ technical methodologies (such as a naive Bayes classifier). This leaves the field without an overall methodology for seeking solutions to reduce child abuse and CSAM. Interdisciplinary research provides key insights for our work: we bring together social scientists using survey methods and computer scientists to produce holistic research instead of fragmented views.

Unwillingness to acknowledge CSAM in the top-ranked computer science venues
How is it possible that so many studies classify Tor websites and usage without addressing CSAM? A plausible explanation is that the researchers omitted the CSAM findings from their data without explaining this in the articles (e.g., 'A Broad Evaluation of the Tor English Content Ecosystem' 13).
We find a substantial gap in relevant research in the top-ranked venues in the field of computer science. Only a limited number of prior articles of comparable calibre exist, and these yielded results consistent with our own: in 2013, a systematic analysis determined that 17.6% of onion websites distribute CSAM 1, and in 2016, research revealed that CSAM websites are the most popular among Tor users 17. However, despite these findings, computer scientists have continued to neglect CSAM distribution through the Tor network. Could this be due to the contentious nature of CSAM?
Computer scientists should evolve Tor and other anonymity networks so that the privacy goals are consistent with ethical and legal concerns 39,40 .Current peer-to-peer networks have essentially no remedies for widespread abuse; this is a problem that we intend to investigate in our future research.
Policy to combat CSAM and implement public health programmes for CSAM users
CSAM provides a paramount example of how technology can be used in harmful ways. As highlighted in the ninth edition of the model legislation and global review report by the International Centre for Missing and Exploited Children 41, further global efforts are needed to harmonise the legal and regulatory framework in the international arena.
Policies aimed at preventing the spread and use of online CSAM should be implemented.Situational Crime Prevention (SCP) is a criminological approach that employs five strategies and 25 techniques to reduce crime opportunities 42 .SCP has been shown to reduce criminal behaviour 43 , and these tactics might be effective against CSAM 44 .As an illustration, a study examining individuals who accessed a honeypot website that displayed pornography portraying adult actresses as children found that online warning messages offer an effective and scalable tactic to reduce access to CSAM 45 .
It is urgent to deploy public health programmes for CSAM users. These individuals are motivated to seek help, but help is currently largely unavailable. There is a growing global epidemic of CSAM usage, and some describe masturbation and pornography as coping mechanisms to alleviate economic strain, feelings of isolation, depression, and anxiety 21. Such public health prevention programmes should be initiated, financed, evaluated, and developed not by a single industry or actor but as part of a holistic approach. It must be the task of a broad range of actors to take responsibility for the prevention of sexual violence against children, including but not limited to governments, the technology industry, international organisations, and civil society.
Established in Germany, the Prevention Project Dunkelfeld offers cognitive behavioural therapy to improve coping skills and stress management and to help control sexual attraction towards children 29. The impact assessment of the 'Stop It Now!' campaign demonstrates the high effectiveness of the public health approach in preventing child sexual abuse. A series of awareness-raising films widely disseminated through media channels can successfully reach people concerned about their own or others' behaviour, directing them to help services. After establishing trust and committing to treatment, individuals who are sexually attracted to children can gain the ability to consistently regulate their impulses. The favourable confidentiality legislation in Germany, which prohibits therapists from disclosing planned or actual child abuse offences, naturally strengthens this trust. Project evaluation shows that post-treatment recidivism is lower among individuals who commit contact offences than among CSAM users. This demonstrates the need to develop tailored interventions based on the offender's background and behaviour.
Previous research indicates that anonymous online therapy reduces the use of CSAM: the Prevent It study, a clinical trial of an online therapist-supported cognitive behavioural therapy, indicates promising results in terms of the feasibility of dark web recruitment and the effectiveness of anonymous online interventions 46 .These public health programmes should offer in-person psychotherapy, anonymous online self-help material in all languages, anonymous online person-to-person support, and an overall drive towards treatment.

Participants
The study and its methods are in accordance with relevant guidelines and regulations. The Board of Suojellaan Lapsia, Protect Children ry. approved the experimental protocols with human participants, in accordance with the Declaration of Helsinki Ethical Principles for Medical Research Involving Human Subjects: (i) all participants in the Help us to help you survey provided informed consent; (ii) participants received clear information on the purposes of the study before beginning the survey; (iii) participants volunteered to respond to the questions without compensation; (iv) no personal or identifiable information was collected; and (v) the survey data is stored and managed exclusively by the research team at Suojellaan Lapsia, Protect Children ry., and no one else, not even the co-authors, has access to the collected survey answers.

Intervention for CSAM users
Previous research has examined how people discover onion websites 12: 'The three most popular ways that almost half of our survey participants discovered onion sites by were via (i) social networking sites such as Twitter and Reddit (48%), (ii) search engines such as Ahmia, (46%) and (iii) randomly encountering links when browsing the Web (46%).' Since Tor search engines serve as popular entry points to CSAM, we asked search engines to recruit Tor users who access CSAM to answer our survey.
Three prominent Tor search engines (Ahmia.fi, OnionLand, and Onion Search Engine) display our questionnaire to users who search for CSAM. In this research, we analyse the responses of users who searched for CSAM on these Tor search engines using at least one of 179 search phrases used to find CSAM. These search phrases, in English, Russian, Spanish, and Swedish, are used only to locate CSAM (e.g., the term 'childporn'). When a user submits a query containing any of these terms on one of the three search engines, they are instead given the opportunity to participate voluntarily in the survey, which is available in 21 languages.
We may be sampling a specific population, because the demographics of Tor users are probably not representative of all internet users. Furthermore, the English-speaking population may be overrepresented: users who search in English but have a limited English vocabulary may be unable to understand our survey invitation and therefore may not continue to respond.
The participants in the sample are Tor users who (i) searched for CSAM and (ii) opted to complete the survey; thus, they constitute a convenience sample. Although the sample is informative, it does not generalise to all CSAM users, and selection effects are likely. Because the survey collects no identifying information, a single respondent may have responded more than once. The decreasing rate of new responses over time suggests that users who have previously encountered the survey are less likely to respond to it again.
The survey does not request any personally identifiable information from respondents, such as age, country, or gender, that would put privacy at risk. Questions avoid specifics of criminal conduct (e.g., time, date, place, or victim details). We ask CSAM users about their thoughts, feelings, and actions related to their use of CSAM so that in the future we can build an anonymous rehabilitation programme for CSAM users based on cognitive behavioural therapy.
We included the term 'illegal violent material' for respondents who do not categorise the material they view as CSAM but whose search terms indicate that they are in fact searching for material depicting the sexual abuse of children.
We analyse responses from all participants who answered our Help us to help you survey from 5 May 2021 to 28 February 2023 (N = 11,470) and compare the tendencies and habits of people who searched for CSAM (see Supplementary Methods A.4 and Supplementary Tables).

Measuring CSAM hosted through the Tor network
In our study, we crawl webpages hosted on onion services. According to Tor Project statistics, there were 693,683 onion domains on 1 January 2023 47. Onion domains can and do provide any kind of internet service; not all of them host websites.
In practice, we run parallel crawlers that follow onion links found on onion websites; newly discovered links are then fed to fresh crawlers. This allows us to keep harvesting in both depth and breadth. From 2018 to 2023, we collect online content from 176,683 unique onion domain addresses.
We investigate the years 2018-2023 and use a random sample of 10,000 unique onion domains for each year. Whenever we refer to random selection in this study, we use computer-generated random sampling. We then subject the text content to duplicate content filtering, phrase search, and classification, which yields the detected CSAM percentage for each year, as shown in Figure 2.
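The crawling loop can be sketched as a level-by-level breadth-first traversal: each round fetches the current frontier in parallel, and the newly discovered links form the next round's frontier. This is a minimal sketch, not the authors' crawler. `fetch_links` is a hypothetical caller-supplied function that downloads one onion domain (for example through a Tor proxy) and returns the onion domains it links to; networking, rate limits, and error handling are omitted.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def crawl(
    seeds: Iterable[str],
    fetch_links: Callable[[str], Iterable[str]],
    max_workers: int = 8,
) -> set[str]:
    """Breadth-first crawl over onion domains.

    `fetch_links` is hypothetical: given a domain, it returns the domains
    linked from that domain's pages. Each round fetches the whole frontier
    in parallel; links not seen before become the next frontier.
    """
    seen = set(seeds)
    frontier = list(seen)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            next_frontier = []
            # pool.map runs fetch_links concurrently and yields results in input order.
            for links in pool.map(fetch_links, frontier):
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return seen
```

Passing an in-memory lookup such as `lambda d: graph.get(d, [])` as `fetch_links` makes the traversal easy to test offline.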
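The yearly sampling step can be illustrated with Python's standard `random` module. This is a sketch under assumptions: `domains_by_year` is a hypothetical mapping from each year to the set of onion domains crawled that year, and a fixed seed makes the sample reproducible; the paper does not specify which random number generator it uses.

```python
import random


def sample_domains_per_year(domains_by_year, k=10_000, seed=2023):
    """Draw up to k distinct domains per year, without replacement.

    Domains are sorted first so that, for a fixed seed, the sample does not
    depend on set iteration order.
    """
    rng = random.Random(seed)
    return {
        year: rng.sample(sorted(domains), min(k, len(domains)))
        for year, domains in sorted(domains_by_year.items())
    }
```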
To extract only the textual content, we use the html2text Python library to convert the HTML pages to plain text representation.See an example of such a CSAM website in Figure 2 and more examples in Supplementary Information A.3.
In our textual representation of websites, we can see the file names for images and videos, and also their corresponding caption text (see detailed examples in Supplementary Methods A.3).Even websites that offer their full content only after authentication (see Supplementary Methods A.3, Figure 3) or behind a paywall serve limited CSAM samples immediately on the landing page.
A straightforward way to obtain ground-truth validation would be to select a random sample of 1,000 distinct domains that, according to the opening lines of their text, are unique websites, and to open them in the Tor Browser to check whether they share CSAM. Although this method would yield the ground truth, we do not download, open, or view any media content in this research; we focus solely on textual data. Accessing CSAM websites would raise ethical, safety, and legal concerns, even in the context of academic research.
Web crawling as a method is biased towards frequently linked websites, and it cannot locate onion websites that no other page links to. A separate issue is that the sampling includes onion websites that use multiple alternative onion addresses. Although we eliminate duplicates, a website with multiple onion domains that are publicly linked on other onion websites not only has a greater chance of being crawled but is also more likely to be selected for measurement through random sampling. Our methodology therefore favours popular, well-linked onion websites with several domains, and our estimates describe these rather than all possible onion websites. We mitigate this bias by using a large dataset and continual onion link discovery.

Manual investigation
We randomly select the plain text representations of 1,000 onion websites for each year, 2018-2023.We read the text content of these websites to determine whether they share CSAM and what the English vocabulary is for this type of page.Websites that share CSAM make this fact abundantly evident on the front page (see Supplementary Methods A.3), as well as through the use of explicit, distinct wording.

Basic keyword search
Next, we randomly select 10,000 onion websites that were online in December 2022 and perform a basic case-insensitive keyword search (see Supplementary Methods A.1). This simple matching with 11 explicit CSAM phrases, including 'child porn', 'childxxx', 'lolita', 'preteen', and similar terms, matches 2,642 of the 10,000 onion domains.
When a website has multiple alternative onion domains, we eliminate duplicates and execute the search against the content of the distinct websites only. As expected, this yields a smaller set: 2,142 domains that represent unique websites. The search returns 306 matches among these 2,142 domains.
By manually reading the websites, we estimate the false positive rate (20 of 306 matches, 6.5%) and the false negative rate (6.0%) (see Supplementary Equations D). Correcting the basic keyword search for these estimated false positive and false negative rates, we find that 18.5% of unique websites hosted through the Tor network shared CSAM in December 2022.
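The duplicate filtering (compare the titles and sentences of pages, keep one domain per identical content) can be sketched by fingerprinting each page's title and normalised sentences and keeping one domain per fingerprint. The normalisation rules, the SHA-256 fingerprint, and the `(domain, title, text)` input format are assumptions made for illustration; the paper states what is compared but not how.

```python
import hashlib
import re


def fingerprint(title: str, text: str) -> str:
    """Hash the lower-cased title and sentences, ignoring whitespace differences."""
    sentences = [" ".join(s.split()).lower() for s in re.split(r"[.!?]+", text)]
    parts = [" ".join(title.split()).lower()] + [s for s in sentences if s]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def deduplicate(pages):
    """Keep one domain per distinct (title, sentences) fingerprint.

    `pages` is an iterable of (domain, title, text) tuples. Sorting makes the
    kept representative deterministic: the alphabetically first domain wins.
    """
    kept = {}
    for domain, title, text in sorted(pages):
        kept.setdefault(fingerprint(title, text), domain)
    return sorted(kept.values())
```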
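The exact correction is given in Supplementary Equations D, which are not reproduced here. One reading that reproduces the reported 18.5% assumes that the false positive rate is the share of the 306 matches that do not share CSAM, and that the 6.0% false negative rate is the share of the 2,142 − 306 = 1,836 non-matching websites that nevertheless do share CSAM. Under that assumption the arithmetic is:

```python
def corrected_prevalence(n_sites, n_matches, n_false_positives, false_negative_rate):
    """Estimated share of sites sharing CSAM, under the assumption stated above.

    true positives = keyword matches minus manually identified false positives;
    missed sites   = false_negative_rate x number of non-matching sites.
    """
    true_positives = n_matches - n_false_positives
    missed = false_negative_rate * (n_sites - n_matches)
    return (true_positives + missed) / n_sites


estimate = corrected_prevalence(2142, 306, 20, 0.06)
print(f"{estimate:.1%}")  # 18.5%
```

That is, (306 − 20 + 0.06 × 1,836) / 2,142 = 396.16 / 2,142 ≈ 18.5%, which matches the figure reported in the text.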

Text-based CSAM detection classifiers
Using the NLTK Python library, we construct a naive Bayes classifier and a decision tree classifier (see Supplementary Methods A.2). To train the classifiers, we manually curate representative datasets of CSAM (positive) and other (negative) websites: 1,006 pages from 306 unique CSAM websites and 6,271 pages from 733 unique non-CSAM websites. These methods have simple designs and seemingly unrealistic assumptions, but they are known to be accurate for text classification; a naive Bayes classifier can reach accuracy similar to, or even better than, more sophisticated support vector machines (SVMs) 48,49. For us, they offer a clear benefit: they are interpretable, so after training we can output the detection phrases, combine them, and fine-tune a powerful detection algorithm.
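The interpretability argument can be illustrated with a small naive Bayes text classifier that, after training, lists the tokens whose likelihood ratio most strongly favours one class. The paper uses NLTK's classifiers; to keep this sketch dependency-free, it swaps in a from-scratch multinomial naive Bayes with Laplace smoothing, which is not NLTK's implementation. Whitespace tokenisation, the class labels, and the toy training texts are placeholders.

```python
import math
from collections import Counter


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes over whitespace tokens with Laplace smoothing."""

    def fit(self, docs, labels, alpha=1.0):
        self.alpha = alpha
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.token_counts = {label: Counter() for label in self.labels}
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc.lower().split())
        self.vocab = set().union(*self.token_counts.values())
        self.totals = {label: sum(c.values()) for label, c in self.token_counts.items()}
        return self

    def log_prob(self, token, label):
        """Smoothed log P(token | label)."""
        count = self.token_counts[label][token]
        return math.log((count + self.alpha) / (self.totals[label] + self.alpha * len(self.vocab)))

    def classify(self, doc):
        """Return the label with the highest posterior log-probability."""
        n_docs = sum(self.doc_counts.values())
        tokens = [t for t in doc.lower().split() if t in self.vocab]  # unseen tokens are ignored
        scores = {
            label: math.log(self.doc_counts[label] / n_docs)
            + sum(self.log_prob(t, label) for t in tokens)
            for label in self.labels
        }
        return max(scores, key=scores.get)

    def top_tokens(self, label, other, n=10):
        """Tokens whose log-likelihood ratio most favours `label` over `other`."""
        def ratio(token):
            return self.log_prob(token, label) - self.log_prob(token, other)

        return sorted(sorted(self.vocab), key=ratio, reverse=True)[:n]
```

In the workflow the paper describes, ranked output of this kind is a starting point from which explicit phrases are selected, rather than a finished filter.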

Shareable text-based detection for CSAM
Our goal is to provide search engines with shareable matching phrases so that they can filter CSAM and so that we can keep updating the phrases. The classifiers rely on explicit phrases, with little additional logic, to match CSAM websites. This enables us to create a detection algorithm with 404 accurate English phrases ('childxxx', 'childrenxxx', 'underage slut', etc.). This approach is effective because the vast majority of onion websites are written in English, and search data indicate that almost all users seek explicit material using English terminology.
When selecting the 404 phrases, we include only those that explicitly refer to sexual activity with children; we therefore exclude phrases such as 'baby love', even though it does not generate false positives in the context of Tor. Including implicit terminology would raise ethical concerns about censorship.
A total of 32.5% (N = 35,751,619) of search sessions on Ahmia.fi include sexual phrases, and many of them may implicitly seek CSAM (e.g., 'young teen girls sex'). Thus, in November 2023, after this research, the creator of the search engine (the first author of this paper) decided to filter all sexual and suspicious searches, despite the collateral damage. This decision responds to the widespread search for and distribution of illegal child sexual abuse content via Tor, as opposed to legal pornography.
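A shareable phrase list of this kind can be applied by a search engine with plain case-insensitive substring matching, both to drop matching pages from search results and to flag matching queries for redirection to help resources. This is a minimal sketch: the one-phrase-per-line file format, the `#` comment convention, and the function names are hypothetical, and the placeholder phrases in the usage stand in for the real list.

```python
def load_phrases(lines):
    """Parse a phrase list: one phrase per line; blank lines and '#' comments are skipped."""
    phrases = []
    for line in lines:
        phrase = line.strip().lower()
        if phrase and not phrase.startswith("#"):
            phrases.append(phrase)
    return phrases


def matches(text, phrases):
    """True if any listed phrase occurs in the text, ignoring case."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in phrases)


def filter_results(results, phrases):
    """Remove search results whose page text contains a listed phrase."""
    return [r for r in results if not matches(r["text"], phrases)]
```

The same `matches` check can run on the incoming query itself, so that a matching query is redirected instead of answered.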

Measuring CSAM searches on the Tor search engine
We analyse search queries from a well-known public search engine for onion websites. Ahmia.fi provided us with a list of all search queries from February 2018 to February 2023. During these five years, search engine users performed 238,794,231 queries. We analyse these search phrases to determine what Tor users primarily seek from onion services.
We had conducted limited initial experiments with our partner Tor search engine, using small-scale interference techniques to prevent users from accessing CSAM. A priori, we therefore expected users to seek little to no CSAM content, because the search engine removes detected CSAM from search results, redirects users who search for such material using obvious terminology to seek assistance, and bans all sex-related queries, including legal ones. Nevertheless, in January 2023, 25 of the top 100 queries sought CSAM content despite these interference techniques.
We examine searches (N = 238,794,231, Ahmia.fi, February 2018 to February 2023) from users seeking content from the Tor network and find that explicit CSAM-related search phrases account for 6.7% of the queries (see the separate analysis of individual queries in Supplementary Methods A.7).

Investigating the user's search sessions
We track queries per user. We examine the entire search history, following a total of 110,133,715 search sessions, to study how many search sessions include at least one search phrase exclusive to underage content (see Supplementary Discussion C.1). Even without cookies or IP addresses, a user's searches are simple to track through the HTTP referrer metadata: the HTTP request for a new search includes the previous search. We assume that a user enters each new search within five minutes of the previous one. See an example snippet from the web server logs in Figure 7. During this illustrative search session, the user entered seven distinct queries, some of which reveal that the user is interested not only in sexual content involving adult teenagers (aged eighteen or nineteen) but also in explicit underage sexual material: 16 years old → 16 years old porn → cp free → child porn free → teen homemade → teen homemade free → teen blowjob
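The session reconstruction can be sketched as follows, assuming each log entry provides a timestamp, the requested search URL, and the Referer URL, with the query carried in a `q` parameter (URLs of the form `/search/?q=...`; the log format and parameter name are assumptions here). A request continues an open session when its Referer carries that session's most recent query and it arrives within five minutes of it; otherwise it starts a new session. The hypothetical helper `share_with_match` then computes the fraction of sessions containing at least one listed phrase. As a simplification, two users whose open sessions end on the same query cannot be told apart.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from urllib.parse import parse_qs, urlparse

SESSION_GAP = timedelta(minutes=5)  # the paper's five-minute assumption


def query_of(url: str | None) -> str | None:
    """Extract the (assumed) `q` search parameter from a URL, if present."""
    if not url:
        return None
    values = parse_qs(urlparse(url).query).get("q")
    return values[0] if values else None


def build_sessions(entries):
    """Group time-sorted (timestamp, request_url, referer_url) entries into query sessions."""
    sessions: list[list[str]] = []
    # Maps the latest query of each open session to (session index, time of that query).
    open_sessions: dict[str, tuple[int, datetime]] = {}
    for ts, request_url, referer_url in entries:
        query = query_of(request_url)
        if query is None:
            continue  # not a search request
        previous = query_of(referer_url)
        match = open_sessions.pop(previous, None) if previous is not None else None
        if match is not None and ts - match[1] <= SESSION_GAP:
            idx = match[0]
            sessions[idx].append(query)
        else:
            sessions.append([query])
            idx = len(sessions) - 1
        open_sessions[query] = (idx, ts)
    return sessions


def share_with_match(sessions, phrases):
    """Fraction of sessions with at least one query containing a listed phrase."""
    if not sessions:
        return 0.0
    hits = sum(
        any(phrase in query.lower() for query in session for phrase in phrases)
        for session in sessions
    )
    return hits / len(sessions)
```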

Figure 2. We measure the proportion of CSAM onion websites inside the Tor network in 2018-2023. (a) We use a sample of 10,000 unique onion domains for each year. (b) Many websites have several onion domains. We compare the titles and sentences of the pages to detect duplicates, and keep a single domain when multiple domains share identical content. (c) We run text-based CSAM detection against the content of these distinct domains. Some CSAM websites do not use explicit sexual language, and text-based detection fails for them. Using three separate methods (manual validation, phrase matching, and the naive Bayes classifier), we find that the detected percentage of websites sharing CSAM is 16.2-23.8% in 2023. The automated methods yield results consistent with manual validation (22.1% in 2023 and 19.5% in 2022). For manual validation, we randomly select the plain text representations of 1,000 onion websites for each year, 2018-2023, and read their text content to determine whether they share CSAM and what English vocabulary such pages use.

Figure 3. Ages from zero to 17 mentioned in CSAM search sessions, with search sessions seeking 18-year-old and 19-year-old adults shown for comparison (N = 479,555). Sixteen is the most frequently mentioned age (N = 61,083 of 479,555, 12.7%). 54.5% of age-revealing searches (N = 261,162 of 479,555) target children aged 12-16 years. Outside this age range, interest declines.

Figure 5. Our anonymous survey received responses from 11,470 individuals who sought CSAM on three popular Tor search engines. (a) The survey results indicate that 65.3% (N = 7,199 of 11,030 who replied to the question) of CSAM users first saw the material when they were under 18 years old. 36.7% (N = 4,048 of 11,030) first saw CSAM when they were 13 years old or younger. 50.5% (N = 4,843 of 9,599 who replied to the question) report that they first saw CSAM accidentally. (b) We asked the respondents what types of images and videos they view. Viewing CSAM depicting girls is more prevalent than viewing CSAM depicting boys, at a ratio of 7:3. (c) The survey results indicate that 48.1% (N = 4,120 of 8,566 who replied to the question) of CSAM users are willing to change their behaviour to stop using CSAM, and 61.6% (N = 5,200 of 8,447 who replied to the question) have tried to stop using CSAM. However, only 14.0% (N = 985 of 7,013 who replied to the question) of CSAM users have sought help to stop using CSAM, and an even smaller share, 3.7% (N = 257), have actually received help. Notably, 21.4% (N = 1,498) are afraid to seek help.

8.4% (N = 743) of respondents say they view 'other violent material', and 458 provide explanatory open-ended responses. 61.6% (N = 282) of these responses explicitly mention the age of the children depicted in the CSAM viewed, providing N = 1,637 mentions of age, as Figure 6 illustrates. Most responses refer to age brackets, for example 'over 12 years old', which we define as ages 12-17. The most commonly mentioned age is 15 (N = 234), followed by 16 (N = 221) and 14 (N = 209).

Figure 6.282 of the open-ended responses explicitly mention the age of children depicted in the CSAM viewed, providing 1,637 mentions of age.

Figure 7. User-entered search phrases produce a search session in the HTTP logs.
We investigate the availability of CSAM hosted through the Tor network and its users. (a) Tor enables anonymous web publishing through onion domains. These websites host a variety of content, and there are Tor-specific search engines for searching them. 21 of the 26 popular Tor search engines allow CSAM websites. (b) Our web crawlers collected online content from 176,683 different onion domains from 2018 to 2023. (c) Using text-based CSAM detection methods, we investigate the number of websites sharing CSAM. The identification methods yield 404 phrases that accurately identify CSAM content. (d) This enables text-based detection and filtering for Tor search engines. CSAM-related searches are among the most popular of the 239 million total queries. Out of 110 million search sessions, 11.1% are seeking CSAM. (e) The search engine directs CSAM seekers to self-help websites and asks them to complete our survey. The results indicate that they want assistance and are motivated to stop using CSAM.
RQ1. What is the distribution volume of CSAM hosted through the Tor network?
RQ2. What is the CSAM search volume, and what exactly are users seeking?
RQ3. What does the survey reveal about CSAM users?
RQ4. How can search engine-based intervention reduce child abuse?
Based on result click statistics, we rank the 26 most visited search engines as of 17 March 2023. To determine whether they permit CSAM in search results, we test the searches 'child', 'sex', 'videos', 'love', and 'cute' and study the results. 21 of the 26 search engines return CSAM results. Four search engines even promote and advocate CSAM; one of them states that 'child porn' is its number one search phrase. Encouragingly, five Tor search engines attempt to block CSAM. Yet a user can use these search engines to locate other search engines and ultimately find CSAM through the latter. Even if search engines block sites that directly share CSAM, other entry points remain, such as onion sites that provide links to CSAM websites. From any major Tor entry point, whether a website, search engine, or link directory, a Tor user is only a few clicks away from CSAM content.

Intervention for CSAM users
RQ3: What does the survey reveal about CSAM users?