Unveiling Local Patterns of Child Pornography Consumption in France using Tor

Child pornography represents a severe form of exploitation and victimization of children, leaving the victims with emotional and physical trauma. In this study, we aim to analyze local patterns of child pornography consumption across 1341 French communes in 20 metropolitan regions of France using fine-grained mobile traffic data of Tor network-related web services. We estimate that approx. 0.08 % of Tor mobile download traffic observed in France is linked to the consumption of child sexual abuse materials by correlating it with local-level temporal porn consumption patterns. This compares to 0.19 % of what we conservatively estimate to be the share of child pornographic content in global Tor traffic. In line with existing literature on the link between sexual child abuse and the consumption of image-based content thereof, we observe a positive and statistically significant effect of our child pornography consumption estimates on the reported number of victims of sexual violence and vice versa, which validates our findings, after controlling for a set of spatial and non-spatial features including socio-demographic characteristics, voting behaviour, nearby points of interest and Google Trends queries. While this is a first, exploratory attempt to look at child pornography from a spatial epidemiological angle, we believe this research provides public health officials with valuable information to prioritize target areas for public awareness campaigns as another step to fulfil the global community's pledge to target 16.2 of the Sustainable Development Goals:"End abuse, exploitation, trafficking and all forms of violence and torture against children".

(Behind any exchange of child pornographic images or videos, there is an attacker and an attacked minor.) As pointed out by the French Secretary of State for Child Protection Adrien Taquet in 2021, child sexual abuse materials (CSAM), better known as child pornography, represent both a severe form of exploitation and victimization of children and a criminal offense Assemblée Nationale (2022). Sexual violence leaves affected children with emotional and physical trauma Pinheiro (2006). For France, the National Institute of Health and Medical Research (INSERM) estimated in a general population survey conducted between 2020 and 2021 that 1 in 10 French adults, approx. 5.5 million individuals, have been subject to sexual violence in their childhood Sauvé et al. (2021), with serious health consequences as shown by Brown and Scodellaro (2023). The Independent Commission on Incest and Sexual Violence against Children (CIIVISE), installed by the French president on March 23, 2021, estimates that every year in France alone 160,000 children become victims of sexual violence. Eke et al. (2011) found that 24% of child pornography users in their sample had committed sexual offenses in the past. Similarly, Hall and Hall (2007) reported that 30% to 80% of individuals who viewed child pornography had molested a child. This emphasizes the important link between CSAM consumption and sexual violence against children.

Descriptions of key terms used in this study
• Tor: The Tor network, short for "The Onion Router," is a privacy-focused network that directs internet traffic through a series of volunteer-operated servers, encrypting it at each step and making it difficult to trace back to the user's origin, thereby enhancing online anonymity. It is commonly used to access the internet anonymously.
• Darknet: The darknet is a part of the internet that is intentionally hidden and not indexed by traditional search engines, often accessible only through specialized software like Tor.
• Hidden services: Hidden services, often associated with the Tor network using ".onion" domains, refer to websites and online services that are hosted on servers configured to be accessible only through the Tor network.
• CSAM: CSAM, or Child Sexual Abuse Material, refers to explicit media that involves the sexual exploitation or abuse of minors, including images, videos, or other content.
When it comes to CSAM detection, various automatic approaches have been proposed. Sae-Bae et al. (2014) developed a classifier with a true positive rate of 83% in detecting explicit-like child images and 96.5% in detecting child faces on a test set of 105 images featuring semi-naked children. Vitorino et al. (2017) utilized convolutional neural networks (CNN) to differentiate regular images from adult and child pornographic content, respectively. Macedo et al. (2018) created a region-based annotated child pornography dataset (RCPD) in collaboration with the Brazilian Federal Police. They combined face-based child detection with a pornography detector and achieved an accuracy of 79.84% on the proposed benchmark. Overall, consistently improving CSAM detection algorithms might prompt illegal content creators and distributors to turn to the so-called "darknet" even more, making it harder for the authorities to assess and prevent CSAM circulation on the web. While the advancement of technology has made it easier to moderate and filter abusive and illegal content, it has also provided opportunities for sharing such content with little accountability. CIIVISE states in its interim report that even though France is the fourth largest online host of CSAM in the world, it only employs 1 cyber-crime investigator per 2.2 million people compared to about 1 investigator per 100,000 people in the Netherlands CIIVISE (2021).
With its advanced anonymity and privacy features, the Tor network has been criticized in the past for facilitating illegal activities in the digital space, including the distribution of CSAM Deutsche Welle (2019). Gannon et al. (2023) find that child abuse sites are 2000 times more prevalent in the darknet, for which Tor provides the main entry point. But they also find that CSAM communities use both the darknet and the clearnet for content sharing: While live streams of child sexual abuse (predominantly taking place in developing countries) are mainly hosted in the clearnet, presumably as the risk of law enforcement agencies being aware of live streams is generally perceived to be low, non-live content is predominantly shared via CSAM forums in the darknet. According to Gannon et al. (2023), CSAM-related hidden services usually showcase archaic layouts and do not use high-security technology. Their main protocol to keep the community safe is to share the sites only with like-minded users, typically by invitation from the site administrators or moderators. Some sites require the user to post similar content before they can access the forums. van der Bruggen et al. (2022) found in a study on a large CSAM forum that while only a fraction of the forum members (0.7 %) were responsible for 40 % of the content posted, 9 out of 10 forum members tried to download CSAM at least once.
In this work, we present two major contributions to this field of research: First, to the best of our knowledge, this is the first time that consumption patterns of CSAM are estimated at such a high geographic granularity, which we achieve by correlating Tor traffic with local-level temporal adult porn consumption patterns. Second, we link these fine-granular consumption patterns to small-area socio-demographic characteristics as well as nearby points of interest and Google Trends queries. While local patterns of both the consumption and the production of CSAM are relevant for public health professionals and law enforcement agencies alike, we focus on the consumption of CSAM for two reasons: First, we assume that uploads of CSAM are mainly done via fixed internet lines/Wifi rather than via the mobile network. Since we only observe mobile network traffic, we consequently expect download traffic to carry stronger signals of CSAM-related darknet activities. Second, recalling from above, there is a strong empirical link between the consumption of child pornographic content and being involved in sexual violence against children. As Insoll et al. (2022) point out, 42 % of the survey respondents in their study who had viewed CSAM tried to connect with children online afterwards. Therefore, knowledge about local patterns of CSAM consumption in the darknet may also inform about the prevalence of sexual violence against children in the physical world.
The paper is structured as follows: We describe the data used for this study in Section 2. In Section 3, we explain the methodology applied to derive local-level estimates of CSAM consumption and the assumptions used. Commune-level estimates of CSAM consumption for 20 metropolitan regions in France are presented in Section 4 alongside their links to POIs, Google Trends and other socio-demographic characteristics. Limitations of the study and words of caution are extensively discussed in Section 5.

Data
In this study, we aim to analyze local patterns of child pornography consumption for 1341 communes across 20 metropolitan regions in France. The population sizes of the communes in the sample range from 80 in Mont-Saint-Martin, Grenoble to 498,596 in Toulouse, averaging 14,802 across all areas (INSEE, 2019). The data for Tor usage patterns are derived from geo-referenced, service-level mobile network traffic data measured by the mobile network operator Orange for 20 major cities in France across 77 consecutive days from March 16 to May 31, 2019, provided on a 100×100m spatial grid, also called tiles in the following, through the NetMob 2023 data challenge (Martìnez-Durive et al., 2023). The upload and download data obtained from the mobile network operator is normalized by a random value to conceal the actual traffic of the operator while retaining comparability across web services. Therefore, the actual values do not have any unit, such as GB, attached to them. It is important to note that mobile network traffic data does not include web traffic generated when connected to fixed-line internet or Wifi. The geographic location of a specific user equipment (UE), e.g. a mobile phone, is captured via the base stations of the mobile network the UE is connected with during a given time interval of 15 minutes. The captured web service-specific network traffic is then distributed within the estimated coverage area of the respective base station. For details on the effect of different coverage area estimation approaches we refer to Koebe (2020). For more details on the data preprocessing performed on the Netmob dataset, we refer to Martìnez-Durive et al. (2023).
While data for a variety of web services are provided, we focus on Tor as the main entry point to the darknet.In addition, we consider download traffic from mainly pornographic websites (referred to as 'Web Adult' in the following) as a reference for the consumption of pornographic content and download traffic to YouTube as a reference for general mobile video consumption.Both Web Adult and Tor represent multiple web services grouped into a broader category, respectively.However, details on the exact composition of these categories are not available from Martìnez-Durive et al. (2023).
In order to investigate spatial relationships of CSAM consumption and local points of interest (POI), we build on the recently released Overture Maps Foundation (OMF) Places dataset that provides information on about 3 million points of interest for France derived from Meta and Microsoft products such as Bing Maps and Facebook pages (Overture Maps Foundation, 2023).Using data from OpenStreetMap (OSM) has also been considered, however, OSM provides comparatively little POI information on local businesses.
Furthermore, we use the reported number of victims of sexual violence as our ground truth, retrieved from the Service Statistique Ministériel de la Sécurité Intérieure (SSMSI) database of the interior ministry of France Ministère de l'Intérieur et des Outre-Mer (2022). Socio-demographic information provided by the French National Statistical Office INSEE (INSEE, 2019) and voting outcomes from the 2017 French presidential election are used to control for potential confounders when investigating the link between estimated child pornography consumption and sexual violence.
Lastly, we complement our analysis with information on the relative popularity of search terms from Google Trends.Specifically, we consider the following set of partially community-specific keywords inspired by Owens et al. (2022): pedoporno, porno mineur, porno enfant, site pedoporno, pre-teen hardcore, zoo preteen, zoo pre-teen, pedomom, pedodad, pthc, boylove, girllove, porno jeune ado, video porno ado, ado porno, porno jeune fille, omegle and hurtcore.We extract the relative popularity values of these search terms for each of the 21 regions of France (excluding Corsica, note that Google Trends still uses the regional delineations prior to the 2015 reform) pooled across the years 2017 to 2021 to avoid excessive data sparsity.We map these values to the departments in our sample.While acknowledging that hidden services cannot be found via Google search queries and that the CSAM community actively exchanges "best practices" to stay anonymous (cf.Gannon et al. (2023)), we expect that these keywords may still be able to capture deviances from these practices.

Methodology
In order to narrow down from general Tor usage to child pornography consumption via Tor, we follow a simple, yet effective approach: First, we estimate the global share of CSAM-related Tor traffic by combining three interlinked estimates: i) According to Tor project (2023a), approx. 1.1 % of global Tor traffic went to Onion services during our study period (i.e. March 16 to May 31, 2019). We believe this number to be a conservative estimate for France as Jardine et al. (2020) report that in 'free' countries, as which France classifies according to Freedom House, Tor is used more often to access onion services than in the rest of the world. Specifically, they estimate that approx. 7.8 % of Tor users in free countries use Tor to access onion services vis-à-vis approx. 6.7 % on a global level. ii) Jin et al. (2023) collected 5,437,248 of these .onion pages during the years 2020-22 and observed that the category 'Pornography' accounted for approx. 41.7 % of the collected pages. The authors used the hidden service indexing website Ahmia.fi to collect seed addresses for crawling. On the one hand, since Ahmia.fi explicitly blacklists hidden services related to child abuse, we expect that CSAM sites are potentially under-sampled in this dataset (the blacklist contains 40,875 .onion sites as of August 2023). On the other hand, Cloudflare, a major content delivery network and domain name system service provider, allowed Tor browser users from September 2018 onwards to route some of their visits to clearnet websites via one of the ten .onion addresses of Cloudflare. This could have potentially led to a one-sided increase of onion traffic that may not have been fully captured by Jin et al. (2023). However, we cannot observe a substantial increase in the share of onion traffic to overall Tor traffic between 2017 and the end of 2019 Tor project (2023a), thus we assume this to have a negligible effect on our approximation. iii) Al-Nabki et al. (2019) further disaggregated the category 'Pornography' in their DUTA dataset and classified 41.5 % of .onion websites in this category to be related to child pornography specifically. Consequently, multiplying the three shares (0.011 × 0.417 × 0.415 ≈ 0.0019), we conclude that approx. 0.19 % of global Tor download traffic is linked to the consumption of child pornographic content.
However, commune-level CSAM consumption in France most likely deviates from global estimates. Thus, in order to locally adapt the global estimate to the 1341 French communes in our study, we use web service-level mobile traffic information from the Netmob dataset. Specifically, we replace the global estimate (ii) with a local approximation of the share of Tor traffic related to pornographic content, obtained by correlating the observed activity patterns for Web Adult and Tor for each of the 1341 French communes in our sample on an hourly basis across the whole time window of the study using Pearson's ρ. The underlying assumption is that the consumption of pornographic content, irrespective of whether adults or children are depicted, follows similar temporal patterns. Thus, locations $j$ with a higher temporal correlation are assumed to have a larger fraction of their Tor traffic related to pornography in general, with $\rho_j = 1$ corresponding to 100% pornographic content. Figure 1 illustrates the composition of the estimate for the global level and for France, respectively. The 16.5 % for France represents the mean of the commune-level correlation coefficients $\rho_j$. To avoid non-sensible negative estimates of child pornographic consumption (in the following abbreviated as cpc) due to negative correlation coefficients, we replace negative coefficients with small positive values near zero and denote the result by $\rho'_j$. We choose small non-zero replacements to avoid log-transformed values going to infinity in later analysis. This affects 14 out of 1341 French communes with negligible effects on the overall distribution. Thus, our commune-level correction factor $c_j$ is defined as $c_j = 0.011 \times 0.415 \times \rho'_j$. Table 1 shows the summary statistics of $\rho_j$ and $c_j$.
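The arithmetic behind the global share and the commune-level correction factor can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names and the replacement value `EPSILON` are assumptions (the text only states that negative coefficients are replaced by small positive values), and the hourly series passed in are placeholders.

```python
from math import sqrt

# Shares quoted in the text.
ONION_SHARE = 0.011        # (i) share of Tor traffic going to onion services
PORN_SHARE_GLOBAL = 0.417  # (ii) share of crawled .onion pages labelled 'Pornography'
CSAM_SHARE = 0.415         # (iii) share of that category related to child pornography

# Assumption: the exact replacement value is not given in the text.
EPSILON = 1e-6


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correction_factor(web_adult_hourly, tor_hourly):
    """c_j = 0.011 * 0.415 * rho'_j for one commune."""
    rho = pearson(web_adult_hourly, tor_hourly)
    # Non-positive coefficients are replaced by a small positive value so that
    # later log transforms stay finite (zero is treated like a negative value here).
    rho_prime = rho if rho > 0 else EPSILON
    return ONION_SHARE * CSAM_SHARE * rho_prime


# Global estimate: product of the three shares, roughly 0.19 %.
global_share = ONION_SHARE * PORN_SHARE_GLOBAL * CSAM_SHARE

# France: the mean commune-level correlation of 16.5 % replaces (ii), roughly 0.08 %.
france_mean_share = ONION_SHARE * CSAM_SHARE * 0.165
```

Under these assumptions, a commune whose Tor and Web Adult series are perfectly correlated receives the maximum factor of 0.011 × 0.415 ≈ 0.0046, while a commune with a negative correlation receives a factor close to zero.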
Finally, we define our cpc estimates per 1000 inhabitants for all the $J = 1341$ French communes in our sample by

$$\mathrm{cpc}_j = c_j \times \frac{\mathrm{Tor}^{DL}_j}{\mathrm{pop}_j} \times 1000, \qquad j = 1, \dots, J,$$

where $c_j$ denotes the correction factor as described above, $\mathrm{Tor}^{DL}_j$ the normalized download traffic related to Tor services and $\mathrm{pop}_j$ commune-level population counts. An average $c$ of 0.0008 can therefore be interpreted as an estimated 0.08 % of the observed Tor mobile download traffic in our sample of 20 French metropolitan areas being related to child pornography.
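As a small worked illustration of the per-1000-inhabitants scaling described above, the following sketch applies the correction factor to a commune's normalized Tor download traffic. The function name and the input values are hypothetical; the traffic values in the dataset carry no unit.

```python
def cpc_per_1000(correction_factor, tor_download, population):
    """Estimated cpc traffic per 1000 inhabitants for one commune.

    correction_factor: c_j, the commune-level correction factor
    tor_download: normalized Tor download traffic of the commune (unitless)
    population: commune-level population count
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return correction_factor * tor_download / population * 1000


# Hypothetical commune: c_j = 0.0008, Tor download traffic of 5,000,000
# (normalized units) and 10,000 inhabitants.
example = cpc_per_1000(0.0008, 5_000_000, 10_000)
```

Because the population count enters the denominator, very small communes can produce large per-capita values from modest traffic volumes, which is worth keeping in mind when reading the extremes of the distribution.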
We consider this to be a conservative estimate of CSAM consumption via Tor for multiple reasons: First, the 41.5 % refers to the share of pornographic .onion sites that can be linked to CSAM, not to their share of traffic. Owen and Savage (2015) found that, during their 6-month observation period, sites linked to sexual violence against children accounted for only 2 % of the hidden services screened in the study, but 82 % of all requests made via Tor. Second, we assume that image-based content (such as CSAM) largely drives traffic. This assumption is backed by the fact that the top 5 web services in terms of download traffic in the Netmob dataset are predominantly image- or video-based (namely Instagram, Facebook, Netflix, YouTube, Facebook Live) Martìnez-Durive et al. (2023). Third, France is the fourth-largest host of online CSAM globally. Assuming a positive relationship between hosting and consuming CSAM, this indicates an overall larger share of CSAM consumption compared to the global average. Lastly and importantly, we assume that pornographic content, especially illegal forms thereof, is mainly consumed at home and thus handled via Wifi or a fixed internet line. This suggests that the correction factor for France would be higher for these internet connection types.
While directly validating our estimates with information on the actual commune-level consumption of CSAM in France is not possible due to the lack of ground truth data, we indirectly validate our findings by correlating the cpc estimates with an appropriate proxy indicator, in our case commune-level statistics on the number of victims of sexual violence (both adults and minors) per 1000 inhabitants. Recalling the link between child pornography consumption and sexual violence against children indicated by Eke et al. (2011), Insoll et al. (2022) and Hall and Hall (2007) in Section 1 and assuming that a non-negligible fraction of victims of sexual violence are minors, we expect our cpc estimates to show stronger correlations with our proxy than general mobile consumption patterns of e.g. YouTube. However, we stress that this proxy most likely just captures the tip of the iceberg of sexual child abuse: First, the indicator includes rape, attempted rape, and sexual assault including sexual harassment. However, somewhat surprisingly, it does not include sexual abuse, where abuse is distinguished from assault per definition as "it is carried out without violence, coercion or surprise" Ministère de l'Intérieur et des Outre-Mer (2022). Second, while official numbers report 39,314 victims (minors and adults) of sexual violence in France for the year 2019, CIIVISE (2021) estimates that 160,000 children alone become victims of sexual violence every year in France, as already noted above. Third, the indicator is only reported for those communes with at least five recorded incidences in three consecutive years in total. This statistical disclosure control measure clearly leads to a non-random selection of communes as large communes are more likely to surpass this threshold. Fourth, local variations in reporting behaviour, especially in small communes with low overall reported numbers, may significantly impact the observed spatial patterns.
Since simple correlations in complex social settings most likely suffer from confounding factors, we build a hierarchical multi-level regression model in order to single out the influence on the number of reported cases of sexual violence that can be uniquely attributed to our cpc estimates, while controlling for a set of potentially relevant other socio-demographic and spatial features. To the best of our knowledge, this is the first attempt to look at large-scale local-level CSAM consumption from a spatial epidemiology perspective. We note that this analysis is exploratory and that the presented effects imply neither a causal relationship nor the directionality of any observed relationship. To underline that both directions of influence are possible, we also present analysis results with our cpc estimates as the dependent variable.
In addition, we explore points of interest in the 0.1 % of tiles with the highest levels of estimated CSAM consumption. As some of these tiles are located in close proximity to each other, we remove duplicate entries by their unique place identifier. However, we noticed that some places in the OMF Places dataset may still be listed twice, e.g. in two different languages. Thus, duplicate entries may occur, but we expect their number to be negligible. Overture Maps Foundation classifies each POI into categories. We display only those POI categories with n ≥ 3 in order to limit accidental occurrences on the one hand and not to miss out on relevant, but rare categories on the other. To get an estimate for the average download traffic per POI category, we first divide the observed download traffic of any given tile by the number of POIs located in it. In a second step, we average the resulting per-POI download traffic across all POIs of a given category. This leaves us with the average download traffic per POI category. While we acknowledge this to be a crude approximation for the actual traffic generated at a certain POI, we assume that POI categories across the large number of tiles observed are still indicative of existing spatial relationships.
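The two-step averaging described above can be sketched as follows. The data layout (a list of tiles, each with a `download` value and a list of `(place_id, category)` pairs) and the rule of keeping only the first occurrence of a duplicated place identifier are assumptions made for illustration; the actual OMF Places schema and the deduplication order used in the study may differ.

```python
from collections import defaultdict

MIN_COUNT = 3  # categories with fewer POIs are not displayed (n >= 3 in the text)


def average_traffic_per_category(tiles, min_count=MIN_COUNT):
    """Average per-POI download traffic by POI category.

    tiles: iterable of dicts with keys
        "download": download traffic observed in the tile
        "pois": list of (place_id, category) tuples located in the tile
    """
    seen_place_ids = set()
    per_category = defaultdict(list)

    for tile in tiles:
        pois = tile["pois"]
        if not pois:
            continue
        # Step 1: spread the tile's traffic evenly over its POIs.
        traffic_per_poi = tile["download"] / len(pois)
        for place_id, category in pois:
            # Deduplicate by place identifier (first occurrence wins).
            if place_id in seen_place_ids:
                continue
            seen_place_ids.add(place_id)
            per_category[category].append(traffic_per_poi)

    # Step 2: average across POIs of the same category, keeping only
    # categories that occur at least `min_count` times.
    return {
        category: sum(values) / len(values)
        for category, values in per_category.items()
        if len(values) >= min_count
    }


# Hypothetical input: two neighbouring tiles sharing one POI ("a").
tiles = [
    {"download": 90.0, "pois": [("a", "school"), ("b", "school"), ("c", "cafe")]},
    {"download": 40.0, "pois": [("d", "school"), ("a", "school")]},
]
result = average_traffic_per_category(tiles)
```

In this toy input, the category "cafe" is dropped because it occurs only once, and the shared place "a" contributes only through the first tile.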
Of the 18 search terms we extract from Google Trends, we discard seven due to complete sparsity. On the remaining 11 search terms, we perform a principal component analysis with a varying number of components. We decided to go for three components by balancing the explained variance and the desired level of abstraction. Figure 2 shows how the search terms are associated with each of the three components. Of the three components, we just consider the first (PC1) and the third (PC3) in further analysis as they appear to capture sexual preferences towards children more succinctly.

Results
Estimated CSAM consumption per 1000 inhabitants ranges from 0 in 14 communes to 157,077 in Mondouzil, Toulouse, averaging 3,703 across all areas between March 16 and May 31, 2019. As noted above, there is no actual unit attached to the traffic volume as it is normalized by the mobile phone operator. For comparison, YouTube download traffic per 1,000 inhabitants averages 3,743,939,828 across all areas during the same time window, thus more than a million times the average Tor download traffic estimated to be related to CSAM. Commune-level results are displayed in Figure 5 in the Appendix.
While more fine-granular estimates, e.g. at the tile level (100 m × 100 m) or the census district (IRIS) level, are technically possible, the share of census population estimates close to zero grows dramatically for small areas, thus rendering lower-level estimates per 1,000 inhabitants increasingly volatile. Therefore, we opt to present commune-level estimates in this study. However, as we observe mobile internet traffic only, the locations of (i) the traffic generation and (ii) the place of residence of the user do not necessarily coincide. Although we account for varying population sizes across communes, we observe that tile-level activity patterns are not necessarily propagated to and visible at the commune level. In other words, highly active tiles do not lead to highly active communes in terms of Tor download traffic, especially if these communes are large. This hints at spatially highly concentrated traffic generation. This argument is also supported when looking at Figure 3, which shows the normalized download traffic for YouTube, web adult content, and Tor services summarized by weekday and hour across all cities in the sample.
As one might expect, all of the services analyzed show major peak traffic in the evening hours outside of regular business hours, thus hinting at the private entertainment purpose of these services. Download traffic from YouTube and adult content varies smoothly across the hours of the day with additional subtle peaks around 8am and 1pm during weekdays. Cpc-related traffic in the 10 communes with the highest CSAM consumption estimates shows a stronger concentration of download activity in the evening hours compared to overall Tor download traffic. However, Tor-based traffic appears more coarse-grained in general. A potential explanation for the pixelated appearance of Tor-based download traffic is that Tor services saw approx. 2.5 million daily visitors globally in 2022 (Tor project, 2023b), while the general internet was used by approx. five billion users per day in 2022 (International Telecommunications Union, 2023). The Tor project estimates 100,537 mean daily Tor users for France during the time window of our study. Thus, it is likely that local-level Tor mobile download traffic via one mobile network operator is driven by a comparatively small subscriber base in our sample, so individual users have a larger effect on the aggregate.

Validating estimates against official statistics on sexual violence
While direct validation of our methodology is hardly possible due to the lack of statistical data on child porn consumption habits, we indirectly validate our findings by correlating our cpc estimates with commune-level statistics on the number of victims of sexual violence per 1000 inhabitants as described in Section 3. Looking back at Eke et al. (2011) and Hall and Hall (2007) in Section 1, which link child pornography consumption and sexual violence, we expect our cpc estimates to show a stronger positive association with the reported number of victims of sexual violence than general mobile consumption patterns. Table 2 shows the correlations of the number of victims of sexual violence with download traffic of YouTube, Web Adult, Tor, and cpc estimates, respectively, and whether these correlations are significantly different from zero. In addition, we perform paired-samples tests for dependent correlation coefficients to check whether the correlation coefficient of our cpc estimates with the reported number of victims of sexual violence differs significantly from those of the other three web services. We see that the cpc estimates correlate significantly more strongly with the number of victims of sexual violence (per 1,000 inhabitants) than the other three web services (all three p-values < 1e-08).
However, relying on correlations to investigate complex social phenomena is prone to confounding influences. Consequently, in further analysis, we link our commune-level cpc estimates to socio-demographic characteristics and other expectedly relevant spatial factors. To do so, we collect demographic data at the levels of communes, intercommunalities, and departments in France from the French statistical office INSEE, including data on voting behaviour during the 2017 French presidential election, and combine them with the number of certain POIs per 1000 inhabitants and sets of Google Trends search terms related to (child) pornography. We chose the POIs based on the argument by Sauvé et al. (2021) that sexual violence against children mostly happens in places where a lot of children are, e.g. at home, in schools or in sports clubs. Although child abuse and CSAM consumption may not happen at the same location, it is reasonable to assume that offenders are in most cases not strangers to those places and likely live nearby, i.e. in the same commune. As CIIVISE (2021) states: In France, 8 out of 10 victims of child sexual abuse are victims of incest, in most cases committed by the older brother or father. Although both directions of the effect between our cpc estimates and the reported number of victims of sexual violence are plausible and supported by academic literature (cf. Section 1), we cannot determine the directionality of the relationship in our study design. Thus, we provide results for both directions by fitting one indicator on the other while controlling for a set of potential confounders using an ordinary least squares model with heteroscedasticity-robust standard errors. The results are presented in Table 3.
We observe that both the cpc estimates and the sexual violence indicator have a small, but positive and statistically significant effect on the respective outcome. Also, we see that the overall explained variance measured in (adjusted) R² is higher for the cpc estimates than for the sexual violence indicator. This is expected as we control for download traffic of related web services. Interestingly, the effect of adult porn consumption (log_Web_Adult_per_1000) is negative, which hints at a subtle substitution effect: adult porn consumption in the clearnet is to some extent replaced by CSAM consumption in the darknet. Furthermore, we notice little consistency with regard to the direction, significance, and size of the observed effects of the control variables across the two regression setups. Together, this hints at the fact that our cpc estimates and the sexual violence indicator capture two distinct behaviours. This could either support or undermine the validity of our estimates: either we are able to single out the signal related to CSAM from the noisy sexual violence indicator, or we measure some completely different Tor usage behaviour. However, both the positive and significant association of the two indicators as noted above and the fact that we control for overall Tor download traffic support the validity of our estimates. To investigate this further, we repeat the analysis for various specifications (see Appendix). We use the reported cases of drug abuse per 1000 inhabitants as a proxy for another presumably popular use of the darknet, namely ordering drugs. As Table 5 in the Appendix shows, the drug abuse rate does not inform our cpc estimates, giving further indication that we capture (child) porn-related consumption and not marketplace-related uses of Tor. Further, we observe that the sexual violence indicator is zero for approx. half of the communes in our sample. Zero inflation may bias our parameter estimates as it hints at unmodelled factors causing the zeros in the first place. In Table 6 in the Appendix, we therefore exclude the communes with no reported cases to check for the impact of a zero-inflated setting. Overall, the significance of the observed effects is reduced, which can be to some extent explained by the reduction in sample size, but significant effects do not show a change in sign or size. Lastly, it needs to be pointed out that the level of variation attached to these findings is most likely vastly underestimated, since, among other things, the uncertainty involved in both the approximation of the correction factor and the underreporting of sexual abuse/violence cases is not accounted for. Also, we would like to stress that this analysis does not, in any way, indicate that people of certain demographics participate in child abuse. Rather, our results should be interpreted as a first step into little-charted territory, namely looking at sexual child abuse via CSAM consumption from a spatial epidemiological perspective.

Investigating spatial relationships of child sexual abuse materials
We further investigate the spatial relationship of estimated CSAM consumption with the local environment. In Figure 4, we present the cpc estimates of the commune for which we estimate the highest CSAM consumption per 1,000 inhabitants in our sample of 1341 communes. Tile-specific Tor download traffic is multiplied by the respective commune-level correction factor. The correction factor does not vary over time, but has been calculated for the whole time window of our study.
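The scaling step can be sketched as follows. This is a minimal illustration only: the dictionary layout, the function name, and all numeric values are assumptions made for the example, not the actual data structures or factors used in the study.

```python
def cpc_per_1000(tile_traffic, tile_commune, correction, population):
    """Scale tile-level Tor traffic by a time-invariant commune factor.

    tile_traffic: {tile_id: [traffic per time step]}
    tile_commune: {tile_id: commune_id}
    correction:   {commune_id: correction factor} (hypothetical values)
    population:   {commune_id: number of inhabitants}
    Returns {commune_id: scaled traffic per 1,000 inhabitants}.
    """
    totals = {}
    for tile, series in tile_traffic.items():
        commune = tile_commune[tile]
        # The same factor is applied to every time step of every tile
        # in the commune, because it does not vary over time.
        scaled = correction[commune] * sum(series)
        totals[commune] = totals.get(commune, 0.0) + scaled
    return {c: 1000.0 * total / population[c] for c, total in totals.items()}


estimates = cpc_per_1000(
    tile_traffic={"t1": [10.0, 20.0], "t2": [30.0]},
    tile_commune={"t1": "A", "t2": "A"},
    correction={"A": 0.1},
    population={"A": 2000},
)
```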
The timeline of Tor download traffic in this commune shows that Tor services are used rather irregularly, as mentioned before. Thus, we see Tor usage that is highly concentrated not only spatially, but also temporally. This would be in line with some common CSAM practices as described by Gannon et al. (2023), where CSAM is usually not streamed on-demand, but downloaded and consumed offline. This may explain the "front-loaded" cpc download activity apparent in Figure 3d when compared to adult porn download activity in Figure 3b, and it therefore supports our main assumption that porn consumption follows the same temporal pattern regardless of whether adults or children are depicted. At the same time, it exposes a caveat in that assumption: porn consumption and consumption-related download traffic do not necessarily occur simultaneously, especially in the CSAM community. Based on a visual inspection of Figures 3d and 3b, we determine the potential time lag between download activity and assumed consumption to be around two hours. Consequently, we lag Tor traffic by two hours and re-run both the calculation of the correction factor and the subsequent regression analysis. The lagged Tor traffic improves our pairwise correlation with the sexual violence indicator as reported in Table 2 from 0.28 to 0.34, and it also improves our regression analysis as presented in Table 7 of the Appendix. Even though the patterns observed in the day-of-week by hour-of-day heatmap do not indicate a generalizable usage pattern, one can clearly see that they do not align with regular business hours and therefore indicate private use. This argument is supported by the fact that the most active tiles within the commune displayed here are located in residential or rural neighborhoods, as a visual inspection of the respective tile locations on Google Earth shows.
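The lagging step can be sketched as below. Note that the two-hour lag in this study was chosen by visual inspection; the grid search over candidate lags shown here is a substitute used only for illustration, and the hourly series are made-up toy values.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def lagged_correlation(download, reference, lag):
    """Correlate download[t] with reference[t + lag] (lag in hours, >= 0)."""
    if lag == 0:
        return pearson(download, reference)
    return pearson(download[:-lag], reference[lag:])


def best_lag(download, reference, max_lag):
    """Lag in 0..max_lag with the highest correlation (illustrative only)."""
    return max(
        range(max_lag + 1),
        key=lambda k: lagged_correlation(download, reference, k),
    )


# Toy hourly profiles: downloads precede the reference pattern by two hours.
reference = [0, 0, 1, 3, 7, 3, 1, 0, 0, 2, 5, 2, 0, 0]
download = reference[2:] + [0, 0]
lag = best_lag(download, reference, max_lag=4)
```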
By looking not only at the top 10 communes with the highest estimated CSAM consumption, but at the 0.1 % of all tiles with the highest download traffic (n = 5,259) for the three different web services in our study, we observe distinct sets of adjacent points of interest (POIs) as shown in Table 4. Although one could think of plausible explanations for some of the POIs in Table 4 (e.g. concerning the use of web services related to adult pornography around prisons or the use of YouTube at tourist attractions), drawing more general spatial relationships from Table 4 appears challenging, especially for our cpc estimates. For example, it is unclear whether Tor plays an important role in fulfilling diplomatic duties or whether these high levels of Tor mobile download traffic are simply a geographic coincidence. An argument against the latter is that the pattern is not limited to one larger diplomatic area, but occurs across several cities in France. A detailed look at the POI locations for our cpc estimates reveals that many of the POIs across the mentioned categories are located around Porte de Passy, whose surrounding area represents the largest Tor download traffic hotspot in our sample of 20 urban areas in France. However, throughout the study period, most of the cpc-related traffic in the corresponding commune is generated at the end of or outside regular office hours. It is noticeable that many of the identified POIs are located in densely populated areas. One explanation is that we look at total traffic at the tile level, as tile-level population statistics are on the one hand not readily available and on the other hand potentially misleading, especially in tourist areas. Interestingly, a closer look at the actual POI locations also reveals generally fewer POI locations in the OMF Places dataset than in Google Maps.
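Selecting the highest-traffic tiles can be sketched as follows. The function name, the tile identifiers, and the tie handling are assumptions made for this example; the sketch only shows the ranking step, not the subsequent matching of tiles to nearby POIs.

```python
import math


def top_share_tiles(tile_totals, share=0.001):
    """Return the ids of the `share` of tiles with the highest total traffic.

    tile_totals: {tile_id: total download traffic over the study period}
    share:       fraction of tiles to keep (0.001 corresponds to 0.1 %)
    """
    # Round before taking the ceiling to guard against floating-point noise.
    k = max(1, math.ceil(round(share * len(tile_totals), 9)))
    ranked = sorted(tile_totals, key=tile_totals.get, reverse=True)
    return ranked[:k]


# Toy example with 2,000 hypothetical tiles: the top 0.1 % are two tiles.
toy_totals = {f"t{i}": float(i) for i in range(2000)}
hotspots = top_share_tiles(toy_totals)
```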
Importantly, it needs to be stressed that traffic generated in close proximity to these places is not necessarily generated by the inhabitants, owners, or employees themselves; it may come from any subscriber near the location. In line with the well-known concept of ecological fallacy, area-level correlations do not necessarily imply individual or POI-level causal relationships. As an example, while prostitution occurs mainly in poorer neighborhoods, the clients are not necessarily the poor locals.

Discussion
In this study, we shed light from a novel angle on a topic usually hidden in the dark: we looked at spatial patterns in the consumption of child sexual abuse material using mobile network data for 1341 small areas across 20 metropolitan areas in France for 77 consecutive days in 2019. To the best of our knowledge, this is the first time that spatial CSAM consumption patterns have been mapped at such a high level of geographical detail. Having validated our estimates against the reported numbers of victims of sexual violence at the commune level, we further explored geographic links to local socio-demographic characteristics as well as to nearby points of interest and Google search queries. These insights may contribute to a better understanding of the whereabouts of CSAM consumption and thus inform the targeting of public awareness campaigns such as the one launched in September 2023 by the French government (Le Monde with AFP, 2023).
However, it is important to address the limitations inherent to this study. First, the study analyzes mobile network traffic from one major mobile network provider only; hence, it misses web traffic generated via Wi-Fi or fixed internet connections as well as via other mobile network operators. Structural differences between mobile-only and overall traffic, especially when it comes to the consumption of (child) pornography, need to be expected but cannot be further quantified in this study. Second, our estimates build on assumptions as laid out in Section 3, since detailed information concerning the specific origin of the observed Tor download traffic is not available. While we try to support the assumptions with evidence, they may not hold to their full extent, especially at local levels, as the number of actual Tor users generating the observed traffic might be very small. Third, linking consumption patterns with local phenomena such as socio-demographic characteristics or points of interest is subject to additional uncertainty, as the mobile traffic is assumed to be generated partly out-of-home, i.e. not exclusively by the inhabitants of that area, but potentially by any visitor. Therefore, relationships observed at the area level may not hold at the individual level. Fourth, the sourcing of the POI information is not described in detail by the data provider and may therefore be prone to certain selection biases. In particular, as residential homes are usually not counted as points of interest, information on these might be underrepresented or captured only indirectly by POIs prevalent in residential areas, such as schools. Fifth, our ground-truth indicator, i.e. the reported number of victims of sexual violence, is imperfect in many ways, as laid out throughout the study, but is, to the best of our knowledge, the most suitable proxy for child sexual abuse in France at local levels. Consequently, our CSAM-related consumption estimates need to be considered with caution, especially at the local level.
As described in the Netmob dataset description, data collection, processing and aggregation took place in compliance with GDPR under the supervision of the Data Protection Officer of the mobile network operator Orange (Martínez-Durive et al., 2023). Individual-level traffic has been aggregated to 15-minute intervals and spatially distributed across a network coverage grid. Furthermore, the study authors refrain from any detailed depiction of small areas, e.g. presenting geographic coordinates for single tiles, that could put people or businesses at risk of being accused of wrongdoing. In addition, we have tried to add flags of caution throughout the study so that individual figures or paragraphs are not misinterpreted when taken out of context.
Going forward, we see multiple ways in which this research can be extended. First, the regression could benefit from additional indicators that capture attitudes, behaviours, and opinions in a more nuanced way. This is of particular importance for deriving policy implications from our work. Second, we have not found any major external shock, such as take-downs of large CSAM forums on the darknet, during the time window of analysis. Re-running the analysis around such an event may provide further insights into the agility and resilience of the community in response to external interventions. Third, and relatedly, temporal information on forum activities may help to link specific forum activities (e.g. the release of a new curated CSAM collection) with traffic patterns. Fourth, extending the analysis to fixed internet connections may make it possible to capture the full extent of CSAM consumption online and help to quantify the bias induced by observing mobile traffic only. This would also make it possible to investigate the supply side of the CSAM market, namely the upload traffic, more rigorously. Lastly, we hope that the release of the Netmob dataset will set a precedent for other mobile network operators and internet service providers to provide web service-level network traffic information to researchers in an ethical manner. While the internet has fundamentally transformed the way we behave and communicate, little is still known about how it is actually used in everyday life. Consequently, more such data releases would facilitate research not only on the darknet, but across a wide range of disciplines.
In conclusion, we believe that our study sheds light on the consumption of CSAM from a novel angle using a so far little-tapped data source: large-scale web service mobile traffic. In that way, we hope that our study can contribute to a better understanding of the spatial relationship between CSAM consumption and child sexual abuse and ultimately help to move forward on target 16.2 of the Sustainable Development Goals: "End abuse, exploitation, trafficking and all forms of violence and torture against children".

Figure 1: Composition of CSAM estimates, globally and for France.

Figure 2: Association of Google Trends search terms related to child pornography with their principal components for French regions pooled across the years 2017-2021.

Figure 3: Normalized download traffic aggregated to day-of-week by hour-of-day, by web service.

Figure 4: Commune with the highest cpc estimate. The line plot describes the hourly cpc estimates for each tile within the respective commune. The heatmap shows the cpc estimates by weekday and hour.

Figure 5: Commune-level estimates of child pornography consumption per 1000 inhabitants, by major French metropolitan area. See the Discussion section for limitations of the underlying analysis.

Table 2: Correlation of the reported number of victims of sexual violence with log download traffic, per 1000 inhabitants at the commune-level and by web service.
For n = 731 communes for which official numbers are available.

Table 4: Points of interest in the 0.1 % of tiles with the highest cpc traffic, by web service.
Number of POIs in parentheses.

Table 6: Regression results without zero reported cases, by commune.