Introduction

The mesopelagic zone, extending between 200 m and 1000 m depth in the world’s oceans, constitutes one of the largest yet least understood ecosystems on the planet1,2. Scientific engagement with the mesopelagic zone has a long history, from the Challenger expedition of 1872–18763, to the identification of the “deep scattering layer” with echo-sounding technology in the 20th century4. Deep-sea fishing programmes were run by South Africa and the former USSR in the 1970s and, more recently, by Iceland and Norway in the 2000s5. Scientific interest in the mesopelagic zone has grown rapidly in the 21st century6,7 (see also Supplementary Fig. 1), largely funded by the European Union and United States8,9.

Deep-sea scientists argue that it is necessary to produce new knowledge in order to understand the mesopelagic zone and its current and potential role in supporting human life through climate regulation and marine food systems10. Attempts to quantify these dimensions of the mesopelagic zone have so far been accompanied by wide uncertainty ranges. For example, estimates of atmospheric carbon sequestration through the mesopelagic zone’s biological carbon pump range from 2 to 6 billion metric tons per year globally – equivalent to at least twice all car emissions worldwide5,11. Gross estimates of available fish resources also vary considerably, from 1.8 Gt12 to 19.5 Gt13, with some scientists shaping expectations that fishing these resources could double global fisheries landings6.

Scientists and the work they produce are thus central to current and future decision-making over the deep sea. By measuring and codifying the natural world, science is responsible for rendering nature ‘legible’ to states and publics alike14. A legible natural world is one that is measured and simplified, making it suitable for manipulation, experimentation, and, ultimately, governance14. In making the natural world legible, science, often with state support, plays a role in supporting and legitimising political decisions about environmental management15, contributing to the transformation of the natural world into resources16. In this way, science explicitly interacts with state rules that can steer resource extraction through de jure state-led, legally binding forms of governance17.

In addition to contributing to de jure governance processes, science can also exert de facto governing power. By studying seemingly apolitical technical questions in anticipation of future uses of the natural world, scientists directly shape what can be governed and how. From their authoritative position in society, and their exclusive role in accessing the mesopelagic zone, scientific assessments can (inadvertently) shift governance debates from first-order ‘what, if and whether’ questions to those that normalise and institutionalise certain activities18. By rendering the natural world legible, scientific work becomes a de facto means of shaping how resources are fundamentally understood and valued, as well as the principles that ultimately define state and private sector rules and action for conservation and/or exploitation18. Yet this de facto governing role of science is not subject to the same oversight as de jure state-based forms of regulation. In light of the growing interest in the mesopelagic zone, it is therefore increasingly important to understand and reflect upon the role science is already playing in the governance of this ecosystem.

In order to compare the contents of the scientific and ‘public debate’ about the mesopelagic zone, we collected 2226 scientific abstracts and 4066 social media posts published before 31–12–2020 that mention the “mesopelagic zone” or “ocean twilight zone”. By performing an automated content analysis of these texts, we identify key ‘topics’ – corresponding to what Nunez‐Mir et al.19 call hidden thematic structures within bodies of literature. Next, we build on the theory of network agenda setting20,21, as well as methodological approaches such as framing and agenda setting22, convergence networks23 and semantic networks24, to identify interconnections between these topics. The result is a visualisation that we call a network of meaning, which we define as a visual representation of complex text data, where topics are nodes that represent dominant ideas within texts and interconnections show how these ideas co-occur within texts (see “Methods” section for detail). We use this network of meaning to identify coherent and recurring themes within the body of texts, revealing the dominant discourses surrounding a subject and how they relate to one another.

We find that scientists shape the anticipation and governance of present and future exploitation of the mesopelagic zone both through scientific publications and by using social media to communicate about their work. This network of meaning directly contributes to an anticipatory form of knowledge production that transforms the mesopelagic ecosystem into an object of future governance15. We see this as a de facto governing role of science: by shaping the network of meaning about the mesopelagic zone within the literature but also in the public domain, science makes particular management actions and outcomes more likely in the future. This prompts a discussion about how this de facto governing role of science in the deep sea might be opened to greater public scrutiny and enriched by emerging dynamic ocean management systems25 and governance regimes.

Results

Mesopelagic topics

There are 13 topics (T) within the body of texts related to the mesopelagic zone (Fig. 1). Six of these topics are directly related to fishes: their diets (T3) and reproduction (T5), how scientists explore and discover new species (T7), estimates of populations and their behaviour (T9), their bioluminescent properties (T12) and myctophid (lanternfish) length and growth distributions (T13). Three further topics relate to the mesopelagic zone’s role in carbon cycles, focusing on both particulate (T4) and dissolved organic carbon (T11) as well as the role of the mesopelagic zone in carbon cycles, climate, and fisheries and biodiversity (T2). The remaining topics describe themes related to mesopelagic zone biology and ecology, including distributions of zooplankton and copepods (T1), microbial and genetic diversity (T6), trophic interactions using stable isotope analyses (T8), and the description of new species (T10).

Fig. 1: The 13 latent topics present in Twitter and scientific abstracts.

The y-axis shows the 10 most common terms per topic and the x-axis represents the beta score (per-topic-per-word probability). A high beta score suggests a word is more likely to occur in a given topic. The colour of the bars represents the share of texts (either tweets or scientific abstracts) that the topic is present in.
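As an illustration of how the terms shown per topic can be read off a fitted model, the following Python sketch ranks the words of a single topic by their beta score and keeps the highest-scoring ones. The function name `top_terms` and the beta values are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Tuple


def top_terms(beta: Dict[str, float], n: int = 10) -> List[Tuple[str, float]]:
    """Return the n terms with the highest per-topic-per-word probability."""
    # Sort by descending beta; break ties alphabetically so the result is deterministic.
    return sorted(beta.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical beta values for one topic (illustrative only, not from the study).
toy_topic = {"carbon": 0.08, "organic": 0.05, "dissolved": 0.04, "flux": 0.01}
top_two = top_terms(toy_topic, n=2)
```

With `n=10` the same call would return the ten terms plotted on the y-axis for a topic, given that topic's full word distribution.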

Topics that predominantly feature in tweets (T2, T7, T12) tend to be less internally coherent than those mainly present in scientific abstracts (T1, T3, T5, T11, T13). For example, both “fish” and “carbon” appear in T7, which is 99% tweets, whereas T11 (76% abstracts) is specifically focused on dissolved organic carbon. Twitter topics also mention the organisations conducting and/or funding work in the mesopelagic zone, namely the National Aeronautics and Space Administration (NASA), Woods Hole Oceanographic Institution (WHOI) and the National Science Foundation (NSF), all of which are based in the United States.

Network of meaning

The 13 topics are grouped according to the rate at which they co-occur within a single text. This produces a network of meaning (Fig. 2), within which the thickness of connections between topics indicates the rate at which topics co-occur within texts. This mesopelagic network of meaning contains two clusters of topics (fish and carbon), as well as two bridging and three satellite topics.

Fig. 2: Mesopelagic network of meaning.

The size of the topic represents its prevalence in the literature. The colour of the circles represents the share of texts (either tweets or scientific abstracts) that the topic is present in. The lines connecting the topics represent a high frequency of co-occurrences of topic terms within a single text, with thickness representing frequency. More detailed information about the frequency of co-occurrence is provided in Supplementary Table 1.
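The idea of linking topics by how often they co-occur within a single text can be sketched in Python as follows. This is a minimal illustration, not the procedure used in the study: it assumes each text has already been reduced to the set of topics present in it, and the function name `cooccurrence_edges` and the cut-off `min_count` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def cooccurrence_edges(
    doc_topics: Iterable[Set[str]], min_count: int = 2
) -> Dict[Tuple[str, str], int]:
    """Count how often each pair of topics appears together in one text."""
    counts: Counter = Counter()
    for topics in doc_topics:
        # Sorting makes (a, b) and (b, a) map to the same undirected edge.
        for a, b in combinations(sorted(set(topics)), 2):
            counts[(a, b)] += 1
    # Keep only pairs that co-occur often enough to be drawn as a connection
    # (min_count is an assumed threshold, not one reported in the source).
    return {pair: c for pair, c in counts.items() if c >= min_count}


# Hypothetical topic assignments for four texts (illustrative only).
docs = [{"T3", "T9"}, {"T3", "T5", "T9"}, {"T4", "T11"}, {"T7", "T9"}]
edges = cooccurrence_edges(docs, min_count=2)
```

The resulting counts could then be used as edge weights, with line thickness scaled to the count.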

The largest cluster is made up of a core of four interconnected topics (T3, T5, T9, T13) and one connected topic (T7), all focused on fish. The four core topics (covering fish diets, distribution, recruitment and behaviour) demonstrate an explorative interest in the biology and ecology of mesopelagic fishes. The cluster also contains fisheries-relevant keywords, indicating baseline biomass (T9), reproduction rates (T13), trophic interactions (T3) and location of stocks (T5 and T9).

The connected topic (T7) in this fish-focused cluster is predominantly made up of tweets about exploring the ecosystem and the bioluminescent fishes that live there. T7 joins the cluster of interconnected topics via T9, which is concerned with acoustic studies of biomass and diel vertical migration. The presence of NASA and the NSF in the keywords suggests that these institutions use Twitter as a public communication channel to report their findings about fish living in the mesopelagic zone.

The second cluster of topics focuses on carbon. Here there are two topics (T4 and T11) that emphasise different scales of carbon sequestration by ocean life. T4 focuses on carbon flux and export via particulate organic carbon and T11 is about a similar process at the scale of dissolved organic carbon. This topic (T11) emerges predominantly from scientific texts and connects to another topic (T2) made up predominantly of tweets (see Supplementary Fig. 2 for more detail). Notably, T2 also contains the keyword “fisheries”, but T2 and T11, with their presumably different audiences, share only one top-ten keyword: “carbon”. Thus, the content and location of T2 in the network of meaning represent conversations on Twitter about the mesopelagic ecosystem’s role in the carbon pump, as well as competing uses such as fisheries.

Topics about zooplankton distributions (T1) and microbial communities (T6) form a bridge between the fish cluster and the carbon cluster. Zooplankton such as copepods (T1) are less directly relevant for fisheries management, but, like the fish cluster, this topic is also about quantification. The shared keywords “abundance”, “distribution” and “biomass” quantify and thereby render these organisms legible for food web studies. Linking zooplankton distributions (T1) to the carbon cluster is T6, which is about microbial communities and genetic diversity. Texts about microbial and genetic resources (T6) are therefore likely to have a shared focus on either their role in carbon sequestration (T11) or in supporting the base of the marine food web (T1).

The remaining satellite topics do not connect to the main body of the network of meaning but represent isolated or niche topics related to the mesopelagic zone. Their placement in the network shows that texts about these topics are less likely to contain the keywords of other topics. The smallest topic (T8) contains texts about a technique that uses stable isotope analysis to determine trophic interactions between species. T10 describes the niche of species classification. T12 is a large topic that contains a mix of words relevant to the mesopelagic zone – including “fish”. However, 98% of texts where this is the main topic are tweets, and it appears to be driven by the communications team of WHOI, who use non-scientific language to engage a broad public audience.

The mesopelagic zone in policy documents

There is currently no global policy that specifically governs human activities in the mesopelagic zone, nor are there any single-species management programmes that aim to monitor and restrict catches of mesopelagic species. Instead, the mesopelagic zone is covered by a disconnected range of international conventions and laws that govern the high seas (e.g., the 1958 United Nations Convention on the High Seas), national exclusive economic zones (e.g., the 2013 European Union Common Fisheries Policy and the 2021 Magnuson-Stevens Fishery Conservation and Management Act), and migratory fish stocks (e.g., the 1995 United Nations Conference on Straddling Fish Stocks and Highly Migratory Fish Stocks). More detail about the document selection is provided in Methods and the full list of documents is provided in Supplementary Table 2.

In contrast to the rich discussion of the mesopelagic zone in science and social media, there is no mention of the mesopelagic zone in the 15 key conventions and laws that govern ocean activities (nor is it represented by other terms such as “ocean twilight zone” or “midwater”). By searching for the keywords of the network of meaning (i.e., the 10 words from each of the 13 topics presented above in Figs. 1 and 2) we find a total of 62 of the 130 keywords present in the international conventions. Figure 3 presents the 15 most frequently occurring words (a more detailed frequency analysis is presented in Supplementary Table 3 and Supplementary Fig. 3). The keywords “fisheries” and “fish” were the most prevalent, indicating that these policies are heavily skewed to governing the extraction and protection of fish species and stocks. This is not remarkable, given that fisheries are a socially and economically important activity and are already subject to long-established global governance agreements. The presence of “area” and “data” also appears to indicate the priorities of governments in allocating space for human activities at sea and the increasing role of (digital) scientific input into ocean governance, respectively.
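A keyword search of this kind can be sketched in a few lines of Python. The example below is illustrative only: it assumes exact whole-word matching on lower-cased text (the source does not state whether stemming or partial matches were used), and the function name `keyword_frequencies` and the sample sentence are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable


def keyword_frequencies(text: str, keywords: Iterable[str]) -> Counter:
    """Count exact whole-word occurrences of each keyword in a document."""
    # Split the lower-cased text into alphabetic tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    wanted = set(keywords)
    # Assumption: "fisheries" does not count as a match for "fish".
    return Counter(tok for tok in tokens if tok in wanted)


# Hypothetical policy sentence and a small subset of keywords (illustrative only).
policy_text = "Fisheries data shall be shared. Each area closed to fisheries is listed."
topic_keywords = ["fisheries", "fish", "area", "data", "carbon"]
freq = keyword_frequencies(policy_text, topic_keywords)
present = sorted(k for k in topic_keywords if freq[k] > 0)
```

Summing such counters over all policy documents would give a frequency distribution of the kind summarised in Fig. 3.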

Fig. 3: Frequency distribution of the keywords from the 13 topics present in scientific abstracts and tweets in the selection of policy documents.

The words presented here represent overlaps between the keywords relevant for the mesopelagic zone according to science and public communication and the selected policy documents. A summary of the frequency per document source for each keyword is presented in Supplementary Table 3 and Supplementary Fig. 3.

Other subjects are almost entirely absent from these policy documents. Despite being the focus of several topics in the scientific abstracts and tweets (T2, T4, T11), “carbon” is mentioned only once in the policy documents, in the context of the “ocean’s natural blue carbon function” (in the 2016 European Parliament Joint Communication; see Supplementary Text Box 1). This indicates that carbon sequestration, albeit a major topic in the scientific debate on mesopelagic ecosystem functions, is not established as a topic in ocean policy. There is also no policy coverage for the microbial and genetic material that can be found in the mesopelagic zone (T6).

Discussion

This computational methodology systematically identifies the most important topics about the mesopelagic zone across expert and public domains of communication. We interpret how these topics connect to one another and how this shapes how the mesopelagic zone is understood and thereby governed. Our results reveal fish and carbon as the two main focuses of scientific work, with Twitter mainly being used by scientists to communicate their work to the public and to one another. Meanwhile, the absence of the mesopelagic zone from the key governing policy documents contrasts starkly with the growing discussion about this ecosystem in science and on social media. This demonstrates that the characteristics and uses of the mesopelagic zone are being negotiated in the scientific domain, long before being governed by de jure policies.

The large cluster of topics about fish contains keywords that are directly related to fisheries management concepts (e.g., distribution in T1, T5, T9, biomass in T1, T7, T9 and words such as larval, growth, size, length and weight, which are related to recruitment and growth in T5 and in T13). By quantifying the life cycles and behaviours of these animals, this cluster represents the first steps in rendering mesopelagic fishes legible for management decisions. Once enumerated and abstracted into populations and stocks with predictable behaviours and characteristics, fish become a governable resource. In other words, by defining and helping to populate metrics that are useful for fisheries management instruments (e.g., stock assessments), science plays an anticipatory role in defining and de facto governing the mesopelagic zone. The smaller carbon cluster shows that there is also interest in the biological carbon sequestration function of the mesopelagic zone. This reveals the development of a field interested in constructing the mesopelagic zone as a source of blue carbon26 or even fish carbon27, supporting human life by regulating the climate. The satellite topics show that, in addition to the ecosystem services of food provision and climate regulation, a narrow range of alternative meanings and uses of the mesopelagic are also being explored.

Twitter has reinforced these scientific clusters by enabling scientific communication rather than, as seen in other contested sectors or issues28, democratising meaning-making and/or enabling counter-narratives to emerge. The lack of alternative narratives through Twitter appears to reflect scientists’ virtual monopoly on knowing the mesopelagic, a result of their access to expensive technologies such as submersible vehicles. The lack of attention from the public may also result from the currently low levels of human economic activity in the mesopelagic zone; more activity might generate interest among other parties. With limited knowledge contributed from an established fishery or other industries to date, scientists have a privileged position in both identifying and resolving complex uncertainties in the ecosystem. While there is certainly variation in the roles played by the scientific organisations identified in our analysis (NASA, the NSF and WHOI), their presence in the text analysis shows their important role in communicating about, and thereby constructing, the mesopelagic zone in the public sphere. Attempts to engage the public through ocean literacy programmes29 or via live streams of scientific expeditions (e.g. the NOAA Ocean Exploration initiative30) can introduce the public to the mesopelagic zone and steer attitudes and even behaviours. However, for now the power of constructing the mesopelagic zone remains in the hands of the scientists conducting these outreach activities because they fundamentally shape what is presented to the public and therefore what can possibly be contested outside of the sphere of science.

While the content of the network of meaning reveals the focus of scientific and public knowledge about the mesopelagic zone, what is not included is also worth considering. For example, although some articles published after the data collection of this study have begun to consider the economic viability of fishing in the mesopelagic zone31,32, there are no keywords in the output of the topic model that indicate scientific evaluation of the costs and benefits of exploitation (in social, economic or ecological terms). Similarly, the role of the mesopelagic zone in supporting culturally important species or apex predators is being investigated32 but does not feature prominently in the network of meaning. Furthermore, deep-sea mining may impact the mesopelagic zone33,34 but this is not detected by the topic model in either scientific abstracts or tweets. While some have researched public perceptions about the deep sea35, considered what new institutional arrangements may be required for the mesopelagic zone36 and the high seas27,37, and have called for particular modes of governance in both scientific journals38 and public-facing threads on Twitter39, keywords indicating political and social research are absent from the topic model. Similarly, since the end of 2020 the interest in the deep ocean has continued to accelerate, with deep-sea exploration activities being endorsed by the UN Ocean Decade in late 202140, the treaty negotiations for Biodiversity Beyond National Jurisdiction set to resume their fifth session in March 202341, and the International Seabed Authority continuing to work towards binding regulations for deep-sea mining42. Additionally, the full text of the most recent IPCC report places the mesopelagic zone as being particularly vulnerable to rapid change under climate projections and invokes the importance of the ocean’s ‘blue carbon’ storage capacity43.

Despite these recent examples, the overall lack of explicit attention to the economic and cultural significance of and decision-making over the mesopelagic zone, as well as the absence of explicit attention to the mesopelagic zone in de jure policy and regulation, means that the knowledge produced by predominantly natural scientists directly shapes how the mesopelagic zone is known, classified, quantified and thereby governed. In other words, in the larger process of state-science interaction, we currently see the future of the deep sea being anticipated in ways that are not being made explicit in policy, nor do we see powerful counter-narratives in the public domain of Twitter. As a result, science is a governing actor in the mesopelagic zone. For example, classifying organisms into abstract species and genera (T10)44, and setting distribution estimates (T5 and T9) and biological descriptors (T13) of mesopelagic species directly and explicitly feed into management instruments such as total allowable catches, which depend on fish stock assessments. This also means that highly uncertain or speculative estimates by scientists, such as the mesopelagic zone holding the potential to double global fish landings6, have consequences for how resources and ecosystems may be thought of and ultimately governed. Less overtly, we argue, by selecting and framing ideas and topics of research, scientists may also shape the very questions that are asked about this ecosystem from the outset.

These results indicate a need to reconsider natural science’s de facto governing role in the deep sea. Instead of only relying on science to anticipate a narrow set of use-based narratives, such as fisheries and carbon sequestration, decision-making about the mesopelagic zone would benefit from enhanced transdisciplinary forms of knowledge that broaden our understanding (and imagination) of the mesopelagic zone by engaging diverse groups including marine industries, policy makers at national and international levels, non-government organisations, activists, artists and the general public35,45,46,47. While some international science projects48,49,50, organisations51 and declarations52,53 aspire to and/or support stakeholder engagement, they tend to reinforce science- and state-led framing of the mesopelagic zone, rather than opening up to different forms of knowledge and values associated with the deep sea. Thus diversity and inclusion in science must go beyond multi-disciplinarity within deep ocean science: it must also extend to diverse communities, nations and epistemologies if the coming decade of ocean science is going to lead to equitable outcomes54,55,56,57. A further democratisation of knowledge would as such not negate the role of fundamental science as a crucial way of classifying and quantifying the mesopelagic zone. However, with transdisciplinary approaches that explicitly recognise the de facto governing role of science, it is possible to include additional perspectives by situating fundamental scientific insights alongside or against alternative values, norms and questions. Ultimately, opening up the scientific process in this way may increase the likelihood that more diverse (non-)use cases for the mesopelagic, and similar global frontiers, would be presented to policymakers and the public alike and that resulting policy decisions are viewed as more legitimate.

Democratising the de facto governing role of science in the deep sea remains highly challenging. Instituting deliberation over alternative values and uses beyond fisheries and climate services is, despite the global importance of the mesopelagic, made difficult by the apparent lack of societal engagement with the deep sea58. This is a key difference with other domains, such as food systems, where there are calls for democratised knowledge-governance (rather than science-policy) interfaces that enable diverse perspectives59 and differentiate responsibilities from civil society and private actors60. Creatures from the deep sea have some place in human imagination and popular culture61 and the mesopelagic zone contains prey species for culturally and economically important epipelagic fish such as tuna and swordfish62, and yet the mesopelagic zone has no direct traditional users and only limited cultural significance when compared to coastal marine ecosystems and even the deep seabed63,64. Furthermore, where trade-offs between food security and climate mitigation are already being estimated for terrestrial food production65 and in coastal fisheries66, this has yet to be considered for the mesopelagic zone. As research into its potential contribution to and suitability for food systems moves forward67,68,69,70,71,72, presenting this contribution as a trade-off with climate mitigation may shut down debate about other values and trade-offs before it can take place. Finally, because most of the mesopelagic zone is in the high seas beyond national jurisdiction, and requires expensive technology to interact with, it is likely that a small number of science, private and state actors will be highly influential in these discussions.

The current regime of high seas institutions thus appears inadequate for enabling anything other than the de facto scientific governance of the global ocean commons. Attempts to include diverse understandings of the ocean environment and its resources in policymaking are limited by an ongoing commitment to privilege science-based assessment and monitoring by states, which can “render them unable to accommodate alternative ways of knowing”73. New ways of democratising the de facto governing role of fundamental science are required in order to enable rather than marginalise how marine systems are known and represented.

We argue that there are at least three possible routes for achieving this enhanced democratisation of knowledge for the deep sea. First, existing science-based governance institutions, such as regional fisheries management organisations (RFMOs) or the International Seabed Authority, hold significant potential for accountable and adaptive transboundary governance. However, both have received criticism for their lack of transparency and public engagement and for reinforcing historical use rights through contested and opaque allocation processes74,75. More transparency and public accountability would make their existing monitoring efforts a powerful source of engagement with the private sector, science and the public. Second, the de facto governing role of mesopelagic science could be democratised by linking to open digital platforms that have transparency as a central aim, such as Global Fishing Watch. These online platforms have impressive data analysis capacity through collaborations with companies like Google as well as access to observation technologies through drones, vessel monitoring systems, satellites and sensors placed on floating and stable (public and private) infrastructure76. Such platforms provide an opportunity to include ‘ocean citizens’ beyond scientific, state and private actors in ‘dynamic ocean management’25. Third, RFMOs and private ocean observation can feed into a unified global ocean governance infrastructure. The Global Ocean Observing System has had some success in developing a framework for the collection and use of such data77 and can facilitate new modes of global governance through “informationalising the world ocean”78. In addition, the United Nations treaty on the sustainable use of biodiversity in areas beyond national jurisdiction79 – if ratified – has the potential to mandate and coordinate the consistent collection and distribution of data by RFMOs and other ocean observers at appropriate spatial, temporal and taxonomic resolutions37,80. Combined with the right of all member nations to participate in negotiations, this unified global governance of the high seas can facilitate and democratise deep-sea governance with input from science, industry and the public.

Faced with deep uncertainty and the near total lack of (inter)national governing institutions, science will continue playing a de facto governing role throughout the ocean’s mesopelagic zone by shaping the network of meaning through which future (non-)uses of this ecosystem are determined and justified. In doing so, scientists anticipate and shape the governance of the deep sea. Mobilising existing institutions to improve transparency and public engagement can open this de facto governing role of science to the same public scrutiny usually afforded to governing bodies and thereby democratise future deep-sea governance. This is essential given the sheer size and significance of the mesopelagic zone as a global common resource, and the likelihood that human activity will move forward before scientific certainty can be achieved.

Our results demonstrate the value of computational methods to reliably interpret large volumes of text data in service of sustainability governance questions, which is useful for understanding ever-growing bodies of literature19,81. These methods are developing rapidly to address their inherent limitations. For example, the topic model does not account for synonyms or the meaning held in negations (see Methods). Future work may wish to employ new techniques that allow for topic models to deal with these complexities of language and provide much more coherent analyses of text data82. Furthermore, this work could be scaled up to analyse applied and advocacy-based science by including technical reports, grey literature, popular media articles and documents from non-governmental organisations. As for our policy analysis, a more extensive and multi-lingual analysis of national policy documents could reveal more about the extent to which the mesopelagic zone is currently being governed by de jure instruments. Nevertheless, the work presented here provides a parsimonious analysis of a large sample of documents and reveals key ideas about the mesopelagic zone. When compared to policy, we show the established and anticipatory role of science in the governance of the deep sea.

Methods

Data collection

Our analysis is based on data from three sources: Web of Science, Twitter and public policy databases. Using the search terms “mesopelagic” or “ocean twilight zone” in Web of Science, we retrieved 2395 records published up to 31–12–2020, of which 2226 were eligible for the study. Ineligible records were those without abstracts and those published before 1–1–1990. Using the same search terms as well as the corresponding hashtags “#Mesopelagic” and “#OceanTwilightZone”, we used the R package academictwitteR83 to access the Twitter API. The search returned 8690 original tweets from the earliest record of the use of the search terms (21–03–2009) until 31–12–2020. Of these, 4066 were eligible for the study. Ineligible tweets were duplicates (either retweets or automated bot tweets of the same content) and those that contained only tags of other users and/or only emoticons. We used a purposive sample of policy documents, selecting laws and conventions that govern human activity at sea where one could expect interactions with the mesopelagic zone. We limited these to global agreements (e.g. United Nations) as well as the United States and European Union, given that these two political entities are investing heavily in mesopelagic science projects (e.g., EU H2020 projects MEESO48 and SUMMER49 as well as the WHOI Ocean Twilight Zone project84). We also included documents that might govern the mesopelagic zone in other frameworks (e.g. biodiversity). A full list of the documents and their sources is available in Supplementary Table 2.
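The tweet eligibility rules can be approximated with a simple filter. The Python sketch below is a rough illustration under stated assumptions, not the code used in the study: duplicates are detected by comparing lower-cased, whitespace-normalised text after removing a leading retweet marker, and a tweet is treated as "only tags and/or emoticons" when nothing resembling a word remains once user tags are removed. The function name `is_eligible` and the sample tweets are hypothetical.

```python
import re
from typing import Set

MENTION = re.compile(r"@\w+")
RETWEET_PREFIX = re.compile(r"^RT\s+@\w+:\s*")


def is_eligible(text: str, seen: Set[str]) -> bool:
    """Return True for a tweet that is neither a duplicate nor empty of words."""
    # Drop a leading "RT @user:" marker so retweets compare equal to the original.
    body = RETWEET_PREFIX.sub("", text.strip())
    key = " ".join(body.lower().split())
    if key in seen:
        return False
    # Crude heuristic (an assumption): after removing user tags, require at
    # least one run of two or more letters; emoji-only tweets fail this check.
    if not re.search(r"[A-Za-z]{2,}", MENTION.sub("", body)):
        return False
    seen.add(key)
    return True


# Hypothetical tweets (illustrative only).
tweets = [
    "Lanternfish migrate nightly #mesopelagic",
    "RT @whoi: Lanternfish migrate nightly #mesopelagic",
    "@a @b \U0001F41F\U0001F41F",
    "lanternfish  migrate nightly #Mesopelagic",
]
seen_keys: Set[str] = set()
eligible = [t for t in tweets if is_eligible(t, seen_keys)]
```

Detecting automated bot tweets with slightly varied wording would need fuzzier matching than this exact-text comparison provides.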

Data preparation

The scientific abstracts and tweets were combined to form a corpus. To prepare the corpus for input into the topic model, we transformed all letters to lower case and removed common stopwords using the R package stopwords (v2.3)85. We then removed additional stopwords specific to this dataset (see Supplementary Text Box 2 for the list), as well as numbers and punctuation (including the # used for hashtagging on Twitter). Descriptive statistics showing the time series distribution of abstracts and tweets used in the analysis, as well as the most common words in the dataset, are available in Supplementary Figs. 1 and 4, respectively.
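The preparation steps above can be illustrated with a minimal sketch. It is written in Python rather than the R used in the study, and the stopword set shown is a small hypothetical stand-in, not the list from the stopwords package or from Supplementary Text Box 2.

```python
import re
import string

# Hypothetical, deliberately tiny stopword set for illustration only.
STOPWORDS = {"the", "of", "in", "and", "a", "is", "to", "are"}

# Map every ASCII punctuation character (including '#') to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def preprocess(text, stopwords=STOPWORDS):
    """Lower-case, strip numbers and ASCII punctuation, then drop stopwords."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)      # remove numbers
    text = text.translate(PUNCT_TABLE)    # remove punctuation and hashtags
    return [tok for tok in text.split() if tok not in stopwords]

tokens = preprocess("Fish in the #Mesopelagic zone migrate 400 m daily.")
print(tokens)
```

Note that this sketch only strips ASCII punctuation; typographic characters such as curly quotes would need separate handling.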

Latent Dirichlet allocation topic model

Topic models are probabilistic tools for the unsupervised analysis of large document collections86, of which the most commonly used type is the Latent Dirichlet Allocation (LDA) topic model19. The first task in preparing an LDA topic model is to determine the appropriate number of topics. We used the FindTopicsNumber() function from the ldatuning package in R87 to assist in evaluating the most appropriate number of topics for the LDA topic model. The function iteratively produces LDA models within a range of numbers of topics. We specified a range of 5 to 20 topics. This range was theoretically motivated: from our experience with the subject, the literature is diverse enough that there was no risk of topic saturation at only 5 topics, while an upper limit of 20 ensured feasible interpretation. The function then uses four different metrics to evaluate the coherence of the topics in each model version. According to the output of these four metrics, the most appropriate number of topics (k) for the model was 13 (see Supplementary Fig. 5). This number of topics is most appropriate because it is the iteration at which the distance from the central line is smallest across all four metrics (i.e., there is the smallest disagreement between the metrics for which 1.0 is optimal coherence and the metrics for which 0.0 is optimal coherence).

We produced the LDA model using the function LDA() from the R package topicmodels88. We set a low asymmetric alpha (1/1:k, where k = 13). We used a low alpha because the texts are short (abstracts and tweets) and an asymmetric alpha because we assumed that there are dominant and niche topics in the dataset89. See Supplementary Figs. 6–11 for details of the model robustness checks. We validated our model choice with manual topic intrusion tests using the R package oolong90. We created a Shiny app with an intrusion test, which was completed by 6 third-party testers who were unaware of the aims of the research models. All three models (default, low symmetric alpha, low asymmetric alpha) performed well with human testers (p values < 10^−22), suggesting that the topics in each model version are coherent to human interpretation. The outcomes of these intrusion tests and their p values can be viewed in Supplementary Text Box 3, and the instructions provided to the participants are in Supplementary Text Box 4. We present the top ten keywords and their beta scores for each of the 13 topics in the model output in Fig. 1.
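To make the asymmetric prior concrete, the following Python sketch builds the alpha vector, assuming the notation 1/1:k is read as in R, where it evaluates to the vector 1, 1/2, ..., 1/k. It then computes the expected topic share under a Dirichlet prior with these parameters (alpha_i divided by the sum of alpha). How LDA() in topicmodels consumes such a vector is not shown here; the sketch only illustrates why this prior favours a few dominant topics alongside niche ones.

```python
k = 13

# Asymmetric concentration parameters: 1, 1/2, ..., 1/13.
alpha = [1 / i for i in range(1, k + 1)]

# Under a Dirichlet prior, the expected proportion of topic i
# is alpha_i / sum(alpha).
total = sum(alpha)
expected_share = [a / total for a in alpha]

print(round(expected_share[0], 3), round(expected_share[-1], 3))
```

Under this prior the first topic is expected to account for roughly 31% of a document and the thirteenth for roughly 2%, which matches the assumption of dominant and niche topics.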

An important limitation of the topic model method employed here is that it is a bag-of-words approach and therefore cannot take word order or modifiers (e.g., “not deep”) into account. Unlike approaches that use word embeddings, it also cannot recognise (near-)synonyms as related and treats them as entirely separate words. The key model output is the facet plot (Fig. 1) showing the beta score of the top ten words per topic.
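This limitation can be seen in a short Python sketch with two made-up sentences of opposite meaning. Once reduced to word counts, the two are indistinguishable, and a synonym is counted as an unrelated word.

```python
from collections import Counter

def bag_of_words(text):
    """Represent a text purely by its word counts, discarding word order."""
    return Counter(text.lower().split())

# Opposite meanings, identical bags of words.
a = bag_of_words("the layer is deep not shallow")
b = bag_of_words("the layer is shallow not deep")
same_bag = (a == b)

# A near-synonym shares no count with the word it replaces.
c = bag_of_words("the layer is profound")
overlap_on_deep = c["deep"]

print(same_bag, overlap_on_deep)
```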

Network of meaning

The network of meaning (Fig. 2) was produced by closely following the method presented in Su et al.24. However, where they used manual coders to train a topic model to identify substantive meaning in the texts, we followed the approach proposed by Nunez-Mir et al.19 and transformed the output of the topic model (the top ten keywords for each of the 13 topics) into a dictionary. We then applied this dictionary to the corpus. This produced a matrix of the occurrence of each topic in each text. There are 78 unique pairs of the 13 topics. For each pair, we calculated how frequently a keyword from Topic A occurs in the same text as a keyword from Topic B. We then transformed this table of co-occurrences into an edgelist as used in network analysis, where each topic is a node and the number of co-occurrences of topics in texts produces the edges. To reduce noise in the visualisation (Fig. 2), we retained only the topic pairs where the frequency of shared keywords was >30,000. We also normalised the edge width and node size to account for the different frequencies and lengths of the two text sources (i.e., we had many tweets with few words, and fewer abstracts with more words). A table with the frequency of co-occurrences of topic keywords in texts is provided in the Supplementary Material (Table 1). We visualised the network using the default settings of the plot.igraph() function of the R package igraph91. This function selects the layout algorithm based on the input data for the graph, which in this case is the Fruchterman-Reingold algorithm.
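The dictionary and edgelist steps can be sketched in Python as follows. The topic names, keywords and texts are hypothetical toy data, and the co-occurrence weight used here (the product of the two topics' keyword counts within a text, summed over texts) is an assumption about how the frequency was tallied, not a confirmed detail of the study.

```python
from collections import Counter
from itertools import combinations
from math import comb

# Hypothetical toy dictionary: topic -> set of keywords.
topic_dict = {
    "carbon": {"carbon", "flux", "export"},
    "fish": {"fish", "biomass", "myctophid"},
    "acoustics": {"acoustic", "scattering", "layer"},
}

texts = [
    "carbon flux and fish biomass",
    "acoustic scattering layer of fish",
    "carbon export",
]

def topic_counts(text, dictionary):
    """Count keyword matches per topic in one text (one row of the matrix)."""
    tokens = text.lower().split()
    return {t: sum(tok in kws for tok in tokens) for t, kws in dictionary.items()}

# Accumulate co-occurrence weights for every unordered topic pair.
edges = Counter()
for text in texts:
    counts = topic_counts(text, topic_dict)
    for a, b in combinations(sorted(topic_dict), 2):
        edges[(a, b)] += counts[a] * counts[b]  # assumed weighting

# Keep only edges above a noise threshold (30,000 in the study; 3 here).
threshold = 3
retained = {pair: w for pair, w in edges.items() if w > threshold}

n_pairs_13 = comb(13, 2)  # number of unique pairs among 13 topics
print(retained, n_pairs_13)
```

The last line also confirms the pair count stated above: 13 topics yield 13 × 12 / 2 = 78 unique pairs.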

Policy analysis

The 15 policy documents were loaded into R and transformed into a corpus. The dictionary of the top ten keywords of the 13 topics produced by the topic model was applied to this corpus and the number of matches per document was recorded. To search for particular keywords, we used the kwic() (keywords-in-context) function from the R package quanteda92.
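For readers unfamiliar with keywords-in-context searches, the idea can be sketched in a few lines of Python. This is a simplified illustration, not the quanteda implementation; the function name, the window parameter and the example sentence are all hypothetical.

```python
def kwic_sketch(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for each exact match."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits

# Made-up sentence standing in for a policy document.
doc = "states shall protect the marine environment of the deep seabed".split()
hits = kwic_sketch(doc, "deep", window=2)
print(hits)
```

Viewing each match with its surrounding words makes it possible to check whether a dictionary keyword is actually used in a sense relevant to the mesopelagic zone.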