Sustainable development goals as unifying narratives in large UK firms’ Twitter discussions

To achieve sustainable development worldwide, the United Nations set 17 Sustainable Development Goals (SDGs) for humanity to reach by 2030. Society is involved in the challenge, with firms playing a crucial role. Thus, a key question is to what extent firms engage with the SDGs. Efforts to map firms’ contributions have mainly focused on analysing companies’ reports based on limited samples and non-real-time data. We present a novel interdisciplinary approach based on analysing big data from an online social network (Twitter) with complex network methods from statistical physics. By doing so, we provide a comprehensive and nearly real-time picture of firms’ engagement with SDGs. Results show that: (1) SDGs themes tie conversations among major UK firms together; (2) the social dimension is predominant; (3) the attention to different SDGs themes varies depending on the community and sector firms belong to; (4) stakeholder engagement is higher on posts related to global challenges compared to general ones; (5) large UK companies and stakeholders generally behave differently from Italian ones. This paper provides theoretical contributions and practical implications relevant to firms, policymakers and management education. Most importantly, it provides a novel tool and a set of keywords to monitor the influence of the private sector on the implementation of the 2030 Agenda.


Introduction
Reaching sustainable development is an urgent need for humanity. The term was conceptualised back in 1987 as "the development that meets the needs of the present without compromising the ability of future generations to meet their own needs" 1. Subsequent contributions from the United Nations (UN) continued to clarify it and to outline the dimensions of sustainable development 2. The latest effort was made in 2015: on 27 September 2015, the UN established 17 Sustainable Development Goals (SDGs) and 169 targets to be reached by a joint effort from all members of society by 2030. The goals balance three dimensions of sustainable development (economic, social, and environmental) and encourage action in areas vital for humanity and the world. Firms are considered crucial development players in achieving the SDGs 3, while the goals are coherent with the concept of Corporate Social Responsibility (CSR). In fact, the three dimensions into which the SDGs can be grouped are coherent with the three dimensions of CSR 4: the social dimension (SDGs 1-5, 10, 16, and 17), the economic one (SDGs 7-9, 11, and 12), and the environmental one (SDGs 6, 13-15) 5. The concept of CSR goes back to the 1950s and has been variously defined. A generally accepted definition refers to a company's relationships and responsibilities to society, regarded as the groups of stakeholders with which it interacts 6-8. It comprises all firms' activities beyond what is required by law 9. What CSR means in practice varies with the cultural and historical environment in which a company operates, and may also reflect the difficulties that a company is dealing with at the time 8. Despite being primarily a societal phenomenon, SDGs have the potential to significantly advance CSR research 10, with CSR serving as a theoretical framework to examine how and to what degree businesses contribute to the SDGs. Stakeholder engagement is a related concept which has been differently defined and may be viewed under many different
theoretical viewpoints. It has been conceptualised as "practices the organisation undertakes to involve stakeholders positively in organisational activities" 12 (p. 315). Contributing to SDGs is a new challenge for companies worldwide 13, which can significantly contribute to sustainable development 14. One primary question is to what extent companies engage with global challenges, namely the SDGs. Both conceptual and empirical papers have addressed this question; the next section discusses the primary studies in detail. The problem of capturing all firms' contributions to SDGs is still unsolved, as papers mostly focus on small samples that cannot give a whole picture of the phenomenon. In addition, most papers base their analysis on companies' reports, thus providing a non-real-time picture of companies' contributions. Our study contributes to tackling this issue. In this paper, we aim to answer the following broad research question: To what extent are businesses engaging with SDGs themes? Understanding how businesses contribute to the SDGs is crucial for several reasons. First, we are now mid-way between when the SDGs were set (2015) and when they are due (2030). Enough time has passed since the SDGs' establishment to evaluate progress, while there is still room for improvement in the coming years. Second, businesses are crucial actors and, due to the urgent need to achieve sustainability worldwide 15, it is essential to capture their engagement with global challenges, as defined by the SDGs 14. Third, scholars believe that research in this area is still embryonic 16. Fourth, it is essential to develop novel methods to describe firms' advancements towards SDGs with big data, a quick and low-cost tool 17,18. Indeed, online social networks provide a new, underexploited tool to understand firms' challenges, CSR activities and stakeholder engagement 19. In fact, in the last 15-20 years, online social networks have changed communication, making it cheaper and faster
than before and providing a new channel for businesses to engage and directly interact with their stakeholders. They now represent a crucial means of disseminating firms' CSR activities and involving stakeholders. Online social networks also provide the tools to measure stakeholder engagement, assuming that the users belong to the firm's stakeholders 20. Our research will also investigate stakeholder engagement with SDGs themes.
Legitimacy theory and stakeholder theory are the two primary approaches 21,22 that explain why companies are active in online social networks. On the one hand, legitimacy theory claims that businesses act following society's expectations and ideals. These are not constant and change across time and space. Although several scholars have related legitimacy theory to CSR, it is not necessarily restricted by CSR or stakeholder expectations. According to this perspective, firms use online social networks to justify their social position 23. On the other hand, following stakeholder theory, firms should follow stakeholders' expectations to create long-term value. Consistently with this approach, firms utilise online social networks to communicate with their stakeholders and share their strategies and outcomes 24. We base our analysis on Twitter, which is extensively used in online societal debates since its short messages are particularly suitable for fast communication, such as breaking news or political slogans. In fact, it has been used extensively for investigating political debates in different countries and how they are affected by disinformation (see Refs. 25-41 for an extensive, though not complete, review). The availability of detailed data permits a fine characterization of accounts and their engagement in discussions (see, for instance, Refs. 25,32) and makes it possible to highlight non-trivial structures and dynamics 27,29,37-39. Twitter is therefore an excellent source for investigation, especially in countries such as the UK, where its adoption is particularly high (see the statistics in the following). In contrast to interviews and survey-based data, this method does not depend on response rates or on individuals' willingness to respond to obtain a larger sample. On Twitter, users (stakeholders) can retweet and like posts, which can be considered an endorsement of the message's substance 42,43. For these reasons,
we believe that online social networks, specifically Twitter, are suitable for this study. We focus on large firms, as they have a high social impact 44,45 and are eager to engage in social and environmental activities 24, although other factors, such as the firms' sector 46 and age 47, have an impact. Compared to small and medium enterprises, they often have more stakeholders requiring information 44. Thus, we focus on all large firms in one European country, the UK. As will be discussed in the following section, research on firms' accomplishments towards SDGs has been based chiefly on companies' reports. We propose a different approach, combining data sources that are novel for the management discipline with an interdisciplinary approach for the analysis. Our research is based on 5,859 accounts of large UK firms and 3.1M tweets posted between 2021/02/17 and 2022/02/17. These data are then analysed with complex network methods from statistical physics, showing the communities of discussion that naturally arise. As further discussed in the Concluding Remarks, one limitation of this approach is that it only focuses on the communication dimension. It is beyond the scope of this paper to check to what extent companies are tackling the themes they are discussing on Twitter. In addition to our primary research question, we compare our findings on the UK firms with analogous research investigating large Italian firms' Twitter discussion and CSR orientation. Developing from the social and institutional paradigm, we expect that firms belonging to countries with different institutions, cultures, and values show different behaviours 48,49.
This article develops as follows. First, we review previous findings about firms' contributions to SDGs, providing an overview of the main contributions in the field. Then, we proceed with the results, describing firms' engagement with SDGs topics, the communities of discussions that arise and the engagement of stakeholders on these issues. We continue by presenting some concluding remarks, contributions, practical implications and future research paths. Last, we detail our method.

Previous results
SDGs were established in September 2015 as global objectives for our societies. Instead of focusing on a macro-perspective 50, this paper investigates the micro-dimensions, i.e. firms' contributions to global challenges. The UN explicitly indicates firms as key players in working towards the SDGs: "Private business activity, investment and innovation are major drivers of productivity, inclusive economic growth and job creation. We acknowledge the diversity of the private sector, ranging from micro-enterprises to cooperatives to multinationals. We call upon all businesses to apply their creativity and innovation to solving sustainable development challenges" 15 (p. 29). Being a recent phenomenon, research on the theme is at an embryonic state 16. The first papers dealing with the relationships between firms and SDGs were primarily conceptual. For example, they suggested that managers integrate SDGs in their companies' communication, turn them into actions (tactical level) and consider them in their strategies to contribute to the global challenges 51. Some other frameworks were developed to capture the alignment between existing activities and SDGs 2. However, to what extent companies contribute to SDGs is not a straightforward issue. Each activity that companies engage with has the potential to have both positive and negative effects on SDGs. When setting up a strategy to positively impact specific SDGs, companies need to recognise the diversity of consequences their strategies could have on other SDGs 14. Scholars have advanced operational frameworks to support companies in understanding their impacts on the various SDGs dimensions. These push firms to evaluate their impacts not only in "core activities" but also in the other dimensions of the business 52. More broadly, empirical research is investigating the reasons why and factors that drive firms' SDGs adoption 53,54, their challenges 55, how SDGs are implemented in the firms' strategies and activities 56-58
or taught in management education within business schools 59. A growing number of studies is focusing on how SDGs achievements are communicated 60 and reported 53,55,61, also to map to what extent firms are contributing altogether to the global challenges 13,46,61-64. With this latter aim in mind, case studies 54-56,58,65 and the analysis of websites 65 can also be found. The attempts to map companies' SDG contributions mostly use companies' reports. However, a series of caveats arise. First, (sustainability) reports are usually published once a year, so any mapping of companies' engagement with SDGs lags behind the period in which the contributions took place. Second, the small sample sizes can capture only a tiny proportion of the firms' contributions to SDGs. For example, Ref. 13 mapped firms' SDGs contributions and the factors that explain firms' involvement in SDGs based on their reports. As the authors only considered the companies' 2016 sustainability reports, the study provides a valuable but very early analysis of the phenomenon, as SDGs were established in September 2015. Also, the paper has a limited sample size, considering 408 organisations. Similarly, Ref. 63 investigates the SDGs adoption two years after their introduction, examining the sustainability reports of the 2000 largest stock-listed businesses worldwide, while Ref. 46 analysed a sample of 385 sustainability reports disclosed by 235 companies in the period 2016-2020. Overall, the three studies agree that SDG involvement is quite limited, but this could also be a consequence of the early timing of the studies. Using empirical data about the CSR activities of the top 500 companies listed in the BSE (the largest stock exchange in India) between 2014 and 2016, Ref.
64 finds that companies generally tend to carry out CSR activities aligned with SDGs from the social dimension. These results seem to contradict the part of the literature arguing that social sustainability has been overlooked across businesses and organizations, although it deserves more attention in practice and academic studies 66,67. Differently, Ref. 62 finds that companies from Fortune's Change the World 2019 list concentrate on environmental and social areas. However, the sample it is based on is quite limited, as the empirical analysis is founded on 50 companies and 40 usable reporting documents. Interestingly enough, based on a sample of 385 reports from 235 firms from all over the world, Ref. 46 showed that firms in Healthcare contribute more than other sectors to the SDGs. Though starting in the early years of SDGs adoption, Ref. 68 takes a much broader approach, analysing 14,308 reports from 9,397 organisations between 2016 and 2017 from the GRI dataset. The first findings show that businesses seem to focus on specific SDGs covering the three dimensions of sustainability. The most common ones seem to be SDG 3 (good health and well-being), 4 (quality education), 9 (industry, innovation and infrastructure) and 12 (responsible consumption and production) 68. With a focus on entrepreneurship, Ref.
69 innovates in terms of method, carrying out a semi-automated content analysis of web data on 588 new ventures in Germany in order to understand what role entrepreneurship plays in achieving the SDGs, and finds that most activities align with social and economic goals. Again, though the method is innovative, the sample it is based on is limited. As for now, research is overlooking the role of online social networks in understanding firms' engagement with SDGs. To the best of our knowledge, only a few papers explore to what extent firms discuss SDGs themes on online social networks. They take narrow approaches, considering firms in the FinTech area 70 or CEOs' accounts 71. Regarding stakeholder engagement, while recent research on sustainability reporting finds a low level of stakeholder engagement disclosures 72, studies based on online social networks highlight that stakeholders variously interact on these themes 73-75. As for now, it seems that SDGs have limited relevance in the online debate 70. Also, there seems to be a limited involvement of stakeholders on SDGs posts, consistently with similar studies on CSR 23. However, present studies use limited samples or focus on specific industries. Considering the existing studies on the topic, we believe online social networks are promising tools to map companies' engagement with SDGs in real-time. We aim to contribute to this issue by investigating to what extent firms are engaging with SDGs themes on Twitter.

Data description
In order to highlight firms' engagement with SDGs, we first downloaded from Orbis (Bureau Van Dijk, https://www.bvdinfo.com/) the primary information regarding large companies (i.e. those with more than 250 employees), such as the name, address, number of employees, total assets, NACE code, and the website. Then, we automatically extracted the Twitter account of the related firm from each website, if present. More details about the automatic Twitter account search can be found in the Methods section. We found that Twitter is an excellent tool for this analysis, as it is the online social network most widely adopted by large UK firms. As Fig. 1 shows, 87.3% of the largest UK firms have a Twitter account, a higher share than for any other online social platform.
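The account search described above can be illustrated with a minimal sketch. The actual procedure is detailed in the Methods section and may differ; here we simply assume that a firm's homepage HTML contains a link to its Twitter profile. The pattern `TWITTER_LINK`, the `RESERVED` list and the function `extract_twitter_handle` are hypothetical names introduced only for this illustration.

```python
import re

# Hypothetical pattern: matches links to twitter.com profiles in raw HTML.
# Handles are assumed to be 1-15 characters long (letters, digits, underscore).
TWITTER_LINK = re.compile(
    r"https?://(?:www\.|mobile\.)?twitter\.com/(?:#!/)?@?"
    r"([A-Za-z0-9_]{1,15})(?![A-Za-z0-9_])",
    re.IGNORECASE,
)

# Paths that are Twitter features rather than account handles (illustrative list).
RESERVED = {"share", "intent", "home", "search", "hashtag", "i", "widgets"}


def extract_twitter_handle(html):
    """Return the first plausible Twitter handle linked in `html`, or None."""
    for match in TWITTER_LINK.finditer(html):
        handle = match.group(1)
        if handle.lower() not in RESERVED:
            return handle
    return None
```

A page that only contains "share" or "intent" links yields `None`, which corresponds to the "if present" condition in the text.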
Finally, we downloaded the timeline of each Twitter account using the official API (specifically, using the command: GET /2/users/:id/tweets) via the tweepy Python wrapper. Doing so allows us to access the most recent ∼3200 messages. We focused on the period between 2021/02/17 and 2022/02/17. As in Ref. 19, we consider accounts that were active in the entire period (and not just a portion). While this choice may appear too conservative, it allows concentrating our efforts on subjects that have continuously contributed to creating a shared narrative. Using this time restriction, we ended up with 3.1M tweets, of which 596k are retweets and 609k are replies. As we focus on the interaction between Twitter accounts and hashtags, we excluded 318 accounts out of 6179 of the original dataset since they did not use any hashtag in the considered period.
As a first step, we measured the recurrence of SDG hashtags in our dataset, i.e. we counted how many messages contain hashtags related to SDGs (see Fig. 2). SDGs are a crucial topic in firms' communication: each SDG hashtag appears, on average, in 99.01 messages, compared with 7.56 for the average hashtag in our dataset.
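The recurrence measure above can be sketched as follows. This is a minimal illustration, assuming that tweets are available as plain strings and that a hashtag is counted at most once per message; the function names and the example hashtag set are hypothetical and do not reproduce the keyword list used in the paper.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def messages_per_hashtag(tweets):
    """Count, for each hashtag, the number of messages containing it."""
    counts = Counter()
    for text in tweets:
        # A set ensures a hashtag repeated within one message counts once.
        counts.update({tag.lower() for tag in HASHTAG.findall(text)})
    return counts


def mean_recurrence(counts, hashtags=None):
    """Average number of messages per hashtag, optionally for a subset only."""
    selected = [n for tag, n in counts.items() if hashtags is None or tag in hashtags]
    return sum(selected) / len(selected) if selected else 0.0
```

Calling `mean_recurrence(counts)` gives the average over all hashtags, while `mean_recurrence(counts, sdg_hashtags)` restricts it to a chosen set, which is the comparison reported in the text.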

The validated network of Twitter accounts
To study how UK companies contribute to the evolution of common narratives, we considered the bipartite network composed of the firms' Twitter accounts and their hashtags. In a bipartite network, nodes are divided into two sets, called layers, and links can only connect nodes from different layers. In our application, the two layers include 5,859 accounts and 136,504 different hashtags. To highlight those accounts using the same hashtags, as in Ref. 19, we use the validation projection procedure proposed in Ref. 76. In a nutshell, any pair of accounts is connected if the number of hashtags they share is statistically significant (i.e. it cannot be explained by the accounts' overall hashtag usage and the popularity of the various hashtags). More details can be found in the Methods section. The result of this validation projection is a monopartite network of 3,629 Twitter accounts and 59,158 links. The relative Largest Connected Component (LCC) is represented in Fig. 3.
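The idea of a validated projection can be illustrated with a deliberately simplified sketch. It is not the procedure of Ref. 76: as a stand-in null model we assume that each account draws its hashtags uniformly at random, so the overlap between two accounts follows a hypergeometric distribution. Unlike the method used in the paper, this ignores the popularity of the individual hashtags. The Bonferroni threshold and all function names are likewise assumptions made for the example.

```python
from itertools import combinations
from math import comb


def hypergeom_tail(shared, k_a, k_b, n_total):
    """P(X >= shared) when k_a and k_b hashtags are drawn at random from n_total."""
    upper = min(k_a, k_b)
    favourable = sum(
        comb(k_a, x) * comb(n_total - k_a, k_b - x) for x in range(shared, upper + 1)
    )
    return favourable / comb(n_total, k_b)


def validated_projection(usage, n_total, alpha=0.01):
    """Link two accounts when their hashtag overlap is unlikely under the null.

    usage: dict mapping account -> set of hashtags it used.
    """
    pairs = list(combinations(sorted(usage), 2))
    if not pairs:
        return set()
    threshold = alpha / len(pairs)  # Bonferroni correction, an illustrative choice
    links = set()
    for a, b in pairs:
        shared = len(usage[a] & usage[b])
        if shared == 0:
            continue
        p_value = hypergeom_tail(shared, len(usage[a]), len(usage[b]), n_total)
        if p_value < threshold:
            links.add((a, b))
    return links
```

The output is the edge list of a monopartite network of accounts: pairs sharing only a hashtag or two by chance are discarded, while pairs with a large overlap are kept.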
Figure: Distribution of the number of messages per SDG hashtag. The boxplots compare the distribution of the number of messages in which each hashtag appears for all hashtags (the grey box on the left), for all the SDGs hashtags (the sky blue box on the left), and for the hashtags of the single SDGs (all the boxes beyond the red line; boxes are colored using the official indication from the United Nations, https://www.un.org/sustainabledevelopment/wp-content/uploads/2019/01/SDG_Guidelines_AUG_2019_Final.pdf). The boxplots show the distribution of the logarithm of the number of messages per hashtag, since the distributions are heavy-tailed. In this sense, boxplots may not be the perfect tool for capturing the distribution properties but can effectively deliver the message about the rough differences among the various distributions. In particular, SDGs hashtags appear more frequently than "standard" hashtags in the communication strategy of large firms, thus representing crucial topics.

Before describing the network and its structure, we highlight a few remarks. First, the percentage of validated UK Twitter account nodes is 61.9%. This percentage indicates the fraction of accounts whose usage of hashtags differs substantially from a random behaviour and whose communication strategy presents significant similarities with other accounts. In this sense, a low frequency of validated nodes can mean that many accounts focus on the peculiarities of their communication. In contrast, a high frequency of validated nodes can mean that the communication is more homogeneous and strongly related to common narratives. By comparison, the percentage of validated large Italian firms' Twitter accounts was only 19.2%. Second, SDGs are among the subjects contributing the most to developing common narratives. Validated users (those passing the validation procedure described above) contribute with no less than 85% of the SDGs hashtags of the entire dataset; see Table 1. Otherwise stated, most Twitter accounts using SDGs in their communications pass our filter. This result is remarkable since the validation procedure of Ref. 76

Description of the communities of Twitter accounts
To extract more information, we ran the Louvain community detection algorithm 77 on the validated network of firms, highlighting four main communities displayed in the left panel of Fig. 3. The rationale is to find groups of firms contributing to the same common narrative, as captured by hashtags. Rerunning the same Louvain community detection algorithm inside each community shows a more detailed description, which is represented in the right panel of Fig. 3. The communities in Fig. 3 mostly revolve around social themes, showing that CSR themes are indeed fundamental in firms' communication on Twitter, consistently with Ref. 19 while contradicting Refs. 66,67. Communities are generally coherent with the sector (i.e. the economic activity) the firms belong to, as captured by the 1-digit NACE Rev. 2 code (see Fig. 4; the description of the various codes can be found in Table 3). This coherence reflects the themes discussed: the most addressed CSR themes are the ones closest to the firms' sector, as represented in Fig.
5. This confirms that CSR changes according to the specific context 8. Moreover, we show that the social dimension appears more critical than the environmental one. Although this result contrasts with most previous literature 78, it seems in line with more recent findings 19. Community Cyan is a sort of exception among the various groups, as it comprises three main sectors (Professional, scientific and technical activities; Information and communication; Manufacturing). Its top hashtags reflect digital innovation, environmental sustainability, social and economic themes (see Table 4). They are coherent with the wide range of SDGs mentioned (SDG10 refers to the social dimension; SDG9 to the economic one; SDG12 and SDG13 to the environmental dimension). The other communities revolve around social themes and hashtags. Community Orange-red is composed of firms from two sectors (Human health and social work activities; Education) and focuses on social themes (see Tables 5 and 6). Coherently, hashtags relate to SDGs from the social dimension, namely SDG3, SDG5 and SDG10. Community Yellow comprises firms from one sector (Education) and mainly discusses social themes (see Table 7). The most mentioned SDGs come from the social dimension: SDG1, SDG3, SDG4, SDG5, SDG10, and SDG16. Similarly, community Orchid has companies from only one sector (Human health and social work activities). It is focused on social themes as well (see Table 8). As in community Yellow, its SDGs belong to the social dimension: SDG3, SDG5, and SDG10.
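The Louvain algorithm used above searches for the partition of a network that maximises modularity. The sketch below does not implement Louvain itself; under the assumption of an undirected, unweighted network, it only evaluates the modularity of a given partition, which is the quantity the algorithm optimises. The function name and the input layout are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs; community_of: dict node -> community label.
    """
    edges = list(edges)
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the same community
    degree_sum = defaultdict(int)  # total degree of the nodes in each community
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    # Q = sum over communities of (internal edge share - expected share).
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two triangles joined by a single link, splitting the nodes into the two triangles gives a higher modularity than placing all nodes in one community, which is why a modularity-based method would separate them.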

Hashtag frequency
In this subsection, we will focus on the four major communities, i.e. Cyan, Orange-red, Yellow and Orchid. For each community, we show the most recurring hashtags in the biggest subcommunities (with more than 50 nodes), which reflect the main themes that businesses discuss. The following analysis is based on the results summarised in Tables 4, 5, 6, 7 and 8.

Figure caption (fragment): … 3 in the Methods section. To enhance clarity, we anticipate here the most present sectors, which are: C) Manufacturing; G) Wholesale and retail trade; repair of motor vehicles and motorcycles; J) Information and communication; K) Financial and insurance activities; M) Professional, scientific and technical activities; N) Administrative and support service activities; P) Education; Q) Human health and social work activities.
Compared to the Cyan community, the themes in the Orange-red, Yellow and Orchid communities show more homogeneity. The Orange-red community is highly focused on social themes. All seven subcommunities show hashtags related to social themes, and in six of them social themes are the prevalent ones. For example, subcommunity n. 1, which has a high prevalence of hashtags related to the social dimension, has "internationalwomensday", "mentalhealthawarenessweek", "pridemonth", "blackhistorymonth" among the most frequent hashtags. Something similar happens with subcommunities n. 2, 3, 4 and 5. Conversely, subcommunity n. 0, while having a few hashtags related to social themes, is more focused on festivities (e.g. "Christmas", "valentinesday", "halloween").
Community Yellow discusses social themes in all three subcommunities, which are mostly homogeneous. For example, all three subcommunities mention "mentalhealthawarenessweek" and "internationalwomensday" among their top 10 hashtags, with some differences in other hashtags. Only subcommunity n. 3 comprises themes related to engineering education (including hashtags like "engineering", "education", "construction", "apprenticeship").
Community Orchid is focused on social themes as well. It includes Covid themes among all its subcommunities: in this case, the social dimension is connected to the pandemic. For example, all four subcommunities associate "covid" and "covidvaccine" with "nhs", "internationalwomensday", and "mentalhealthawarenessweek". Overall, the hashtag "covid" is found in many subcommunities, but it is not associated with other related words. It has a higher relevance only in the Orchid community.

Stakeholder engagement
This subsection focuses on stakeholder engagement with the narratives developed by firms' accounts. First, the dataset shows that the average numbers of retweets and likes per hashtag are 15.85 and 23.16, respectively. This shows that UK firms' stakeholders tend to use more likes than retweets when interacting on Twitter. However, further analyses reveal that this pattern is reversed when stakeholders interact with companies on SDGs subjects. As Fig. 6 shows, when stakeholders interact with SDGs hashtags, they give fewer likes but retweet more than they do with non-SDG hashtags. The average numbers of likes and retweets per SDG hashtag are 6.71 and 19.83, respectively, highlighting a higher engagement on SDGs themes. We also highlight that stakeholder engagement with large companies in the UK is different compared to Italy 19, where the average numbers of retweets and likes per hashtag were 5.39 and 14.83, respectively.
Stakeholder engagement on the various SDGs depends on the community and sector. For example, in community Orange-red, SDG5 and SDG16 hashtags (i.e. 'Gender Equality' and 'Peace, Justice, and Strong Institutions') received more retweets, on average, than other SDG hashtags, and more than random hashtags. This community mainly comprises firms in sectors 'P' and 'Q' ('Education' and 'Human health and social work activities'). Analogous considerations can be made for the other communities. In this respect, our findings seem to contradict Refs. 23,72 and integrate Refs. 73-75, showing a higher involvement of stakeholders in SDGs themes compared to non-SDGs related Twitter posts.
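The comparison between SDG and non-SDG hashtags can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout (dictionaries with "hashtags", "likes" and "retweets" keys) is hypothetical, and we assume that the per-hashtag figure is the mean over the messages containing that hashtag, which may differ from the exact definition adopted in the paper.

```python
from collections import defaultdict
from statistics import mean


def engagement_per_hashtag(tweets):
    """Average likes and retweets of the messages containing each hashtag.

    tweets: iterable of dicts with keys "hashtags", "likes" and "retweets"
    (a hypothetical record layout).
    """
    likes, retweets = defaultdict(list), defaultdict(list)
    for tweet in tweets:
        for tag in set(tweet["hashtags"]):
            likes[tag].append(tweet["likes"])
            retweets[tag].append(tweet["retweets"])
    return {tag: (mean(likes[tag]), mean(retweets[tag])) for tag in likes}


def group_average(per_hashtag, hashtags):
    """Mean of the per-hashtag (likes, retweets) averages over a set of hashtags."""
    rows = [per_hashtag[tag] for tag in hashtags if tag in per_hashtag]
    return mean(r[0] for r in rows), mean(r[1] for r in rows)
```

Applying `group_average` once to a set of SDG hashtags and once to the remaining hashtags gives the two pairs of averages that are compared in the text; the function raises an error if none of the requested hashtags occurs in the data.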

Concluding remarks
This paper presents large UK firms' discussions on Twitter, specifically focusing on SDGs. It shows that: 1) SDGs are the themes that unite firms' discussions; 2) the social dimension is prevalent, compared to the environmental and economic ones; 3) the interest in specific SDGs depends on the community and sector a firm belongs to; 4) stakeholders are highly engaged on SDGs themes, using more retweets than likes when interacting with a tweet that contains an SDG-related hashtag; 5) overall, large UK firms and stakeholders show substantially different behaviours compared to the Italian ones. We will discuss these points in the following paragraph.

Figure: The boxplots show the distribution of the logarithm of the number of likes and retweets. We used the logarithms because the distributions are heavy-tailed. In this sense, boxplots may not be the perfect tool for capturing the distribution properties but can effectively deliver the message about the rough differences among the various distributions.

First, communities of discussion naturally arise from the data. These communities are uniform and based on common narratives. Most importantly, the shared narratives are centred around SDG themes. Large UK firms use Twitter to participate in broader discussions on widely acknowledged themes (such as "internationalwomensday"). Thus, we believe that our results for the UK support stakeholder theory: large firms use Twitter to engage in discussions on highly socially relevant themes. This finding
gives a different perspective compared to previous research, which states that CSR themes are overlooked by firms in their communications on online social networks 19,23,79,80 and that companies are scarcely involved in the SDGs 13,46,63. While not contradicting these previous studies, we show that SDG themes unify the firms' discussions, creating different communities in the UK debate on Twitter. In doing so, we show that SDGs themes have indeed entered the firms' communication level 51. We also highlight the importance of integrating different methodologies into business research, uncovering patterns that would not emerge using traditional methods 81. Second, the recurrent themes in the communities mainly focus on the social dimension, with discussions on environmental and economic themes that are present but less relevant. Our findings oppose traditional CSR literature, which maintains that environmental themes are the primary dimension 78. However, they seem consistent with more recent studies on SDGs, finding either that companies are focused on the social dimension 64 or that they do not concentrate on a single dimension only 46,62,68,69. Third, we highlight that the interest in SDGs depends on the community a firm belongs to, and the community mostly depends on the firm's sector. This is consistent with previous works about SDGs, which argue that the interest in SDGs depends on the sector the firm belongs to 46,68. It is also consistent with previous findings about large Italian firms discussing CSR themes 19. Both in the UK and Italy, large firms' dialogue largely depends on the sectors to which they belong. Only community Cyan discusses the themes shared by large Italian firms 19 (i.e. the digital transformation, environmental sustainability, Covid and the economic dimension). Thus, UK firms' behaviour appears to be substantially different from that of large Italian firms, as described in Ref.
19. Fourth, results highlight that stakeholder engagement with retweets is higher on SDG-related tweets than on general tweets. As retweets are a more significant endorsement of the author of the post 42,82,83, the higher number of retweets on SDGs themes highlights a more significant engagement with the global challenges. Overall, our results provide a map of large UK firms' engagement with themes related to SDGs. The results also show a different use of Twitter by UK firms compared to the Italian ones. Thus, we highlight that, consistently with institutional theory 84, different institutional and cultural settings translate into different behaviours, including corporate communications on online social networks. These differences are not limited to localized behaviours. Instead, they appear to relate to the fundamental reasons why companies interact in online social networks, highlighting that the results support different theories in the two countries.

Contributions, practical implications and future research paths
Our paper brings several contributions. First, we contribute to institutional 84, stakeholder and legitimacy theories 21,22, explaining UK firms' attitudes on Twitter and comparing them with Italian firms'. We argue that stakeholder and legitimacy theories coexist and explain firms' behaviours on Twitter in the two countries. Following the institutional theory, we believe that different institutional settings, values and cultures explain these different behaviours and their reasons.
Second, we answer previous calls to map firms' contributions to the SDGs 17,18. As research about firms and SDGs is still at an embryonic stage 16, we contribute to advancing preliminary studies 73 with an interdisciplinary approach on a wide dataset, also providing a novel set of keywords to detect firms' engagement with SDGs on Twitter. Our research highlights that Twitter posts concerning SDG themes unify the firms, naturally creating discussion communities. It also shows the prevalence of the social dimension, as opposed to the environmental and economic ones, and a higher engagement of stakeholders on these themes compared to general posts.
This paper brings several practical implications which might be relevant for firms, policymakers and management education. In short, it provides a novel tool to monitor the influence of the private sector on the implementation of the 2030 Agenda. Knowing what businesses are doing and how they are engaging in global challenges in real time is essential to handle any issues promptly and effectively. On the one hand, managers could use online social network data to understand what their competitors and firms from other sectors contribute to regarding SDGs. On the other hand, policymakers could gain an advantage from the timeliness of using big data to capture firms' engagement in sustainable development objectives. Compared to the analysis of corporate reports, widely used in the academic literature, our research proposes a novel and nearly real-time approach to monitoring firms' engagement with the SDGs. In addition, it can capture a high number of firms at the same time. Thus, our approach could serve as an additional tool for policymakers to monitor SDG progress, with a chance to develop real-time policies to improve the firms' engagement with global objectives. Additionally, this research might be of interest to business schools. As proposed by Ref. 59, business schools must educate future managers with a holistic perspective, integrating sustainable management education with interdisciplinary approaches. Our paper could serve as a tool to increase awareness of the different methodologies a (sustainability) manager can use to understand the environment.
This paper has several limitations that open new paths for future research. The first limitation is the time frame considered, which is a specific year (from 2021/02/17 to 2022/02/17). As SDGs are an evolving phenomenon, it would be interesting to go back in time and check if and how this trend increased in the past years. Also, according to the UN, SDGs should be reached by 2030. A future study could investigate how companies' engagement with SDGs has changed over the whole 15-year period.
The second limitation is geographical. Our results are based on one country, the UK. They are consistent with Ref. 19, but contradict Ref. 78. Thus, it would be interesting to check if the higher interest we found in social themes rather than environmental ones holds for other contexts. Future research should investigate to what extent firms discuss the social and environmental dimensions on online social networks on a broader scale. This kind of research could substantially contribute to the academic literature focused on the perceptions of the responsibilities of businesses towards society 78. Third, while we can assume that online communication reflects the firms' strategies and activities 85, we do not have enough data to claim that companies are actually pursuing the SDGs they are discussing in online social networks. Further research could dig deeper into this issue to unravel how consistent firms' communication about SDG themes is with their real-world activities on global challenges. Fourth, we acknowledge that what companies communicate on Twitter does not represent the whole picture of their contributions to SDGs. A company might focus on sustainable activities in its core business while negatively impacting some SDGs with its ancillary activities 14,52. While this paper aims at mapping companies' engagement towards SDGs, a limitation is that our method tends to map the positive contributions companies declare on online social networks while not revealing the negative impacts that might arise. Fifth, our paper considers a specific category of stakeholders, namely, online Twitter users. While our measure of stakeholder engagement is generally regarded as appropriate 23, it should be noted that a firm's stakeholders are varied and extend beyond online users. Stakeholders might engage in a business' activities offline. Thus, our measure of stakeholder engagement reflects only a part of the phenomenon. Last, our research is only based on large firms and does not consider small and medium-sized enterprises' (SMEs) contributions. Although large firms have a higher social impact 44,45 and engage more in sustainable activities compared to SMEs 24,46, SMEs represent the majority of firms in Europe. Future studies should try to capture SMEs' engagement with SDGs, though facing the difficult challenge of data availability.

From websites to Twitter accounts
As a first step, we downloaded companies' websites from Orbis and, by automatically accessing them, retrieved the related Twitter accounts, when present. In order to test our scraping algorithm, we took a sample of 100 websites and manually extracted the Twitter accounts from them. Then, if the human and the scraping algorithm agreed on the account, we assigned a true positive (TP) to each of them. If neither could find any Twitter account, we assigned a true negative (TN) to both. In other cases (i.e., when one found an account and the other did not, or when they disagreed on the account found), we manually checked directly on Twitter which method returned the correct answer. Even if the performance of the scraping algorithm is not outstanding, it exceeds the human one. The performance of the automated tool for retrieving the Twitter accounts is reported in Table 2.

92.6%
89.3%
Table 2. Scraping algorithm performance vs. human annotation; best performances in bold. Machine performance always exceeds the human one.

From the data above, the primary problem for both the human and the machine seems to be missing existing Twitter accounts (low sensitivity). At the same time, if they find anything, they get the right one in most cases (high precision). The high specificity tells us that both the human and the machine can spot when the Twitter account is absent.
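The four quantities discussed above follow directly from the confusion counts. The snippet below is a minimal sketch of that computation; the function name and the counts in the usage line are hypothetical and are not the values behind Table 2.

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, sensitivity, specificity and accuracy from confusion counts."""
    total = tp + tn + fp + fn
    nan = float("nan")
    return {
        # Share of the accounts found that are the right ones.
        "precision": tp / (tp + fp) if tp + fp else nan,
        # Share of the existing accounts that are found.
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        # Share of the websites without an account that are recognised as such.
        "specificity": tn / (tn + fp) if tn + fp else nan,
        # Share of all websites that are handled correctly.
        "accuracy": (tp + tn) / total if total else nan,
    }


# Hypothetical counts for a sample of 100 websites (illustration only).
metrics = classification_metrics(tp=50, tn=30, fp=5, fn=15)
```

With these made-up counts, a low sensitivity combined with a high precision would reproduce the pattern described in the text: accounts are often missed, but rarely misidentified.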

Hashtag data cleaning: edit distance
In order to properly handle misspelt hashtags and to treat singular and plural nouns as a single word, we used edit distance, as implemented by the py_stringmatching Python module 86. In order to obtain the most effective threshold, we randomly picked pairs of keywords and selected the first 100 pairs with an edit similarity score greater than 0.8. Then, we manually checked whether the hashtags in each pair represent different words or refer to the same concept. For this sample of 100 pairs, we calculated the precision and the accuracy for the various values of the threshold (sensitivity and specificity are trivial and do not carry any relevant information): the most effective threshold for edit similarity is 0.86.
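As an illustration of the threshold selection, the sketch below uses a plain-Python Levenshtein distance as a stand-in for the py_stringmatching implementation and assumes that the similarity is normalised as one minus the distance divided by the length of the longer string. The labelled pairs are invented for the example.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_similarity(a, b):
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def precision_accuracy(labelled_pairs, threshold):
    """labelled_pairs: (hashtag_a, hashtag_b, same_concept) triples checked by hand."""
    tp = fp = tn = fn = 0
    for a, b, same in labelled_pairs:
        predicted = edit_similarity(a, b) > threshold
        if predicted and same:
            tp += 1
        elif predicted:
            fp += 1
        elif same:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return precision, (tp + tn) / len(labelled_pairs)


# Hypothetical hand-labelled pairs (not taken from the paper's sample).
pairs = [
    ("sustainability", "sustainabilty", True),
    ("netzero", "netzeros", True),
    ("marketing", "marketers", False),
    ("womeninstem", "womenintech", False),
]
scores = precision_accuracy(pairs, threshold=0.86)
```

Scanning `threshold` over a grid and keeping the value with the best precision and accuracy mirrors the selection described above.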

Hashtag cleaning
A significant part of hashtags refers to acronyms, and comparisons among them may cause false matches between unrelated terms. Thus, we first removed digits from the hashtags, except for 'Euro2020', since it is an event that was central in the UK in the analysed period and removing its digits would have introduced mismatches and errors. Then we turned all hashtags to lower case and considered their frequencies. In principle, using edit distance for hashtag cleaning, we should have compared all pairs of hashtags, thus performing $O(N^2)$ tests.
In order to limit the effort dedicated to hashtag cleaning to $O(N)$, we implemented the following procedure. First, we selected all hashtags appearing in the dataset more than 50 times, resulting in 922 different hashtags. In this 'benchmark set', we first selected all pairs of words displaying an edit similarity greater than the threshold of 0.86. Among those, we chose the less frequent hashtag of every pair and removed it from the benchmark set, resulting in a total of 916 different hashtags. We finally compared all hashtags with the ones in the benchmark set. All hashtags that displayed an edit similarity greater than the threshold with a hashtag in the benchmark set were then substituted with their more frequent partner. After the cleaning, we have 136,504 different hashtags.
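The benchmark-set procedure can be sketched as follows. This is an illustrative reading of the steps above, not the authors' code: the function names, the toy counts and the choice of mapping each hashtag to its most similar benchmark entry are assumptions, and the similarity is a plain-Python normalised Levenshtein score.

```python
from collections import Counter


def edit_similarity(a, b):
    """1 - Levenshtein distance / length of the longer string (assumed normalisation)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b), 1)


def clean_hashtags(counts, min_count=50, threshold=0.86):
    """Return a dict mapping every hashtag to its cleaned form."""
    # 1. Benchmark set: frequent hashtags, sorted from most to least frequent.
    benchmark = [tag for tag, c in counts.most_common() if c > min_count]
    # 2. Drop a benchmark hashtag if a more frequent benchmark hashtag is too similar.
    kept = [
        tag for i, tag in enumerate(benchmark)
        if all(edit_similarity(tag, other) <= threshold for other in benchmark[:i])
    ]
    # 3. Compare every hashtag with the benchmark only: N x |benchmark| tests.
    mapping = {}
    for tag in counts:
        best = max(kept, key=lambda k: edit_similarity(tag, k), default=None)
        if best is not None and edit_similarity(tag, best) > threshold:
            mapping[tag] = best
        else:
            mapping[tag] = tag
    return mapping


# Hypothetical hashtag frequencies (illustration only).
counts = Counter({"sustainability": 120, "netzero": 80, "euro2020": 70,
                  "sustainabilty": 60, "netzeros": 3})
mapping = clean_hashtags(counts)
```

Since the size of the benchmark set is fixed by the frequency cut, the number of comparisons grows linearly with the number of distinct hashtags.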

Bipartite Configuration Model analysis
After the cleaning, we built a bipartite network in which the two layers represent, respectively, firms' Twitter accounts and the hashtags they used, as in Ref. 19. The two layers include, respectively, 5,859 accounts and 136,504 different hashtags.
In order to have a proper benchmark for our analyses, we leverage the Bipartite Configuration Model (BiCM, 87), i.e. the extension to bipartite networks of the entropy-based null models reviewed in Ref. 88. In a nutshell, the procedure is based on three main steps. First, we define an ensemble of (bipartite) networks, all having the same number of nodes per layer as in the real system, but displaying all possible edge configurations, from the empty graph to the fully connected one. We then maximise the Shannon entropy associated with the ensemble, constraining some topological quantities of the network 89 (this approach replicates that of Jaynes for deriving Statistical Physics from Information Theory 90). In particular, in the Bipartite Configuration Model, we constrain the average (over the ensemble) degree sequences for both layers to the values observed in the real system. Finally, in order to obtain the numerical values of the related Lagrangian multipliers, we maximise the likelihood of the real system, i.e. the probability, according to our null model, of getting the observed network 91.
Using this procedure, we obtain a benchmark that is maximally random (due to the entropy maximization) but still tailored to the real system (due to fixing the degree sequences to the ones observed in the real network). In the following, we first briefly introduce the formalism, then the Bipartite Configuration Model and, finally, its application to the validation of the co-occurrences.

Formalism
Let us call $\top$ and $\bot$ the two layers of the bipartite network and use Latin and Greek indices to indicate elements in the respective sets; we indicate with $N_\top$ and $N_\bot$, respectively, the dimensions of the two layers. The biadjacency matrix $\mathbf{B}$ associated with the bipartite network is an $N_\top \times N_\bot$ matrix whose generic entry $b_{i\alpha}$ is 1 if there is a link connecting node $i$ with node $\alpha$ and 0 otherwise. Therefore, the degree sequences for both layers read $k_i = \sum_\alpha b_{i\alpha}\ \forall i \in \top$ and $h_\alpha = \sum_i b_{i\alpha}\ \forall \alpha \in \bot$.

The Bipartite Configuration Model
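Let us call $\mathcal{G}_{Bi}$ the ensemble of bipartite networks containing all possible graphs in which the dimensions of the layers are respectively $N_\top$ and $N_\bot$. If $S = -\sum_{G_{Bi} \in \mathcal{G}_{Bi}} P(G_{Bi}) \ln P(G_{Bi})$ is the Shannon entropy, its maximization, constraining the average degree sequence on both layers, is equivalent to the maximization of $S'$, defined as

$$S' = S + \sum_{i} \eta_i \Big[ k_i^* - \sum_{G_{Bi} \in \mathcal{G}_{Bi}} P(G_{Bi})\, k_i(G_{Bi}) \Big] + \sum_{\alpha} \theta_\alpha \Big[ h_\alpha^* - \sum_{G_{Bi} \in \mathcal{G}_{Bi}} P(G_{Bi})\, h_\alpha(G_{Bi}) \Big] + \zeta \Big[ 1 - \sum_{G_{Bi} \in \mathcal{G}_{Bi}} P(G_{Bi}) \Big],$$

where quantities with an asterisk $*$ represent the values observed in the real network and $\eta_i$, $\theta_\alpha$ and $\zeta$ are the Lagrangian multipliers associated, respectively, with the degree sequence of layer $\top$, with the degree sequence of layer $\bot$ and with the normalization of the probability $P(G_{Bi})$. The maximization of $S'$ returns the functional form of the probability per graph $P(G_{Bi})$ in terms of the Lagrangian multipliers:

$$P(G_{Bi}) = \prod_{i,\alpha} \frac{e^{-(\eta_i + \theta_\alpha)\, b_{i\alpha}(G_{Bi})}}{1 + e^{-(\eta_i + \theta_\alpha)}}.$$

Therefore, $P(G_{Bi})$ can be interpreted as the product of independent probabilities $p_{i\alpha} = \frac{e^{-(\eta_i + \theta_\alpha)}}{1 + e^{-(\eta_i + \theta_\alpha)}}$ of connecting node $i$ with node $\alpha$. In order to get the numerical values of the Lagrangian multipliers $\eta_i$ and $\theta_\alpha$, we can maximise the likelihood associated with the observed network: it can be shown (see Ref. 91) that this is equivalent to setting

$$\langle k_i \rangle = \sum_\alpha p_{i\alpha} = k_i^* \quad \forall i \in \top, \qquad \langle h_\alpha \rangle = \sum_i p_{i\alpha} = h_\alpha^* \quad \forall \alpha \in \bot.$$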
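To make the last step concrete, the sketch below solves the degree constraints of the Bipartite Configuration Model with a simple fixed-point iteration in NumPy. It is an illustrative solver under stated assumptions, not the bicm module used in the paper: the function name, the starting point, the number of iterations and the toy matrix are all hypothetical, and every node is assumed to have a degree that is neither zero nor maximal.

```python
import numpy as np


def solve_bicm(biadjacency, n_iter=5000):
    """Return the matrix of link probabilities p[i, alpha] of the BiCM."""
    B = np.asarray(biadjacency, dtype=float)
    k = B.sum(axis=1)  # observed degrees of the first layer
    h = B.sum(axis=0)  # observed degrees of the second layer
    # Change of variables: x_i = exp(-eta_i), y_alpha = exp(-theta_alpha),
    # so that p[i, alpha] = x_i * y_alpha / (1 + x_i * y_alpha).
    x = k / np.sqrt(k.sum())
    y = h / np.sqrt(h.sum())
    for _ in range(n_iter):
        # Each update rewrites one degree constraint as a fixed-point equation.
        x = k / (y[None, :] / (1.0 + np.outer(x, y))).sum(axis=1)
        y = h / (x[:, None] / (1.0 + np.outer(x, y))).sum(axis=0)
    xy = np.outer(x, y)
    return xy / (1.0 + xy)


# Toy biadjacency matrix: 3 accounts (rows) and 4 hashtags (columns).
B = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 1]])
P = solve_bicm(B)
```

The row and column sums of `P` should match the observed degrees, which is exactly the condition obtained from the likelihood maximization.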

Validated projection of bipartite networks
Using the Configuration Model defined in the previous subsection, it is possible to validate the projection of the bipartite network on one of the two layers. This procedure aims at stating the statistical significance of the co-occurrences observed in real systems. Consider, for instance, a pair of nodes $(i, j)$ belonging to layer $\top$: the probability that they both link to node $\alpha \in \bot$ is

$$P(V^{ij}_\alpha) = p_{i\alpha}\, p_{j\alpha}, \tag{1}$$

where $V^{ij}_\alpha$ is the event "both $i$ and $j$ are linked to $\alpha$" and $p_{i\alpha}$ is the probability of connecting nodes $i$ and $\alpha$. Using Eq. 1, we can calculate the probability that the total number of co-occurrences between $i$ and $j$, $V_{ij}$, is exactly $n$ as the sum of the contributions from all possible ways to choose $n$ nodes in layer $\bot$. If we call $A_n$ one such choice, the probability of observing $V_{ij} = n$ is

$$P(V_{ij} = n) = \sum_{A_n} \Big[ \prod_{\alpha \in A_n} p_{i\alpha}\, p_{j\alpha} \prod_{\alpha' \notin A_n} \big( 1 - p_{i\alpha'}\, p_{j\alpha'} \big) \Big]. \tag{2}$$

Since, in principle, every $p_{i\alpha}$ is different, the distribution described by Eq. 2 is that of a sum of independent Bernoulli events, each with a different probability given by Eq. 1; it takes the name of Poisson-Binomial distribution 92.
Once we have the BiCM distribution for the number of co-occurrences between nodes $i$ and $j$, we can then calculate the statistical significance of the observed $V^*_{ij}$ via the p-value, i.e.

$$\text{p-value}(V^*_{ij}) = \sum_{n \geq V^*_{ij}} P(V_{ij} = n). \tag{3}$$
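The p-value of the co-occurrences between two nodes can be computed without enumerating all subsets of the opposite layer. The sketch below builds the Poisson-Binomial distribution by repeated convolution, one node of the opposite layer at a time; the function names and the input probabilities are hypothetical, and for very large layers a faster approximation may be preferable.

```python
import numpy as np


def poisson_binomial_pmf(q):
    """PMF of a sum of independent Bernoulli variables with success probabilities q."""
    pmf = np.array([1.0])
    for q_alpha in q:
        # Adding one Bernoulli variable convolves the PMF with [1 - q, q].
        pmf = np.convolve(pmf, [1.0 - q_alpha, q_alpha])
    return pmf


def cooccurrence_pvalue(p_i, p_j, observed):
    """Probability of at least `observed` common neighbours under the null model."""
    # Probability that both nodes link to each node of the opposite layer.
    q = np.asarray(p_i, dtype=float) * np.asarray(p_j, dtype=float)
    pmf = poisson_binomial_pmf(q)
    return float(pmf[observed:].sum())


# Hypothetical link probabilities of two accounts towards two hashtags.
p_value = cooccurrence_pvalue([0.5, 0.5], [1.0, 1.0], observed=1)
```

In this toy case the number of co-occurrences is 0, 1 or 2 with probabilities 0.25, 0.5 and 0.25, so observing at least one co-occurrence is not surprising.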

Iterating the calculation of Eq. 3 for every pair of nodes belonging to layer $\top$ results in $\binom{N_\top}{2}$ p-values; to state the statistical significance of each of them, it is necessary to adopt a multiple hypothesis testing correction. In particular, the False Discovery Rate (FDR, 93) is particularly effective, since it allows controlling the rate of false positives.
The procedure described in this subsection was developed in Ref. 76. For the actual implementation, we used the bicm Python module, available on PyPI and described in Ref. 94 as part of the NEMtropy package.
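A multiple hypothesis testing correction of this kind can be sketched as follows, assuming the FDR correction is the Benjamini-Hochberg step-up procedure. The significance level of 0.05 and the p-values are hypothetical, since the text does not report the level adopted.

```python
import numpy as np


def fdr_validate(p_values, alpha=0.05):
    """Boolean mask of the p-values validated by the Benjamini-Hochberg procedure."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The p-value of rank r (1-based) is compared with alpha * r / m.
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    validated = np.zeros(m, dtype=bool)
    if passed.any():
        # Validate everything up to the largest rank that meets its threshold.
        cutoff = np.max(np.nonzero(passed)[0])
        validated[order[:cutoff + 1]] = True
    return validated


# Hypothetical p-values for eight pairs of nodes.
mask = fdr_validate([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205])
```

Each validated p-value corresponds to a link in the projected network; all other pairs are left unconnected.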

Community detection on the validated projection network
The choice of the community detection algorithm is not a trivial one; see, for instance, Refs. 95,96. In this case, we used Louvain as a descriptive method (using Peixoto's jargon 96), since we intended to describe the mesoscale structure of the validated network of firms. Since Louvain is known to be node-order dependent 95, we ran the algorithm 1000 times after changing the order of the nodes, finally accepting the partition displaying the greatest value of the modularity. For completeness, we compared the Louvain partition with the ones coming from other methods. In particular, we analysed the results from Infomap 97 and WalkTrap 98, since they are built on a different rationale than the modularity used in Louvain 95: for all these algorithms, we used the implementations present in the Python module python-igraph. Both algorithms return communities that are, on average, much smaller than the ones returned by Louvain. Moreover, the communities of Infomap and WalkTrap are nearly completely embedded in the Louvain ones: on average, 82.52% (99.14%) of nodes in each Infomap (WalkTrap) community belong to the same Louvain community. In a way, those methods describe smaller structures than the ones captured by Louvain, whose communities nevertheless include them. In our description, we intend to focus on particularly dense communities, since they represent groups of firms whose social communication is indeed similar: a modularity-based algorithm fits our aims.
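The embedding of one partition into another can be quantified in several ways. The sketch below shows one plausible reading, which is an assumption and not necessarily the measure used here: for each community of the finer partition, take the share of its nodes that fall in its most common community of the coarser partition, then average over communities. The function name and the toy partitions are hypothetical.

```python
from collections import Counter, defaultdict


def mean_embedding(fine, coarse):
    """fine, coarse: dicts mapping each node to its community label."""
    members = defaultdict(list)
    for node, label in fine.items():
        members[label].append(node)
    fractions = []
    for nodes in members.values():
        # Count how the nodes of this fine community split across coarse ones.
        hosts = Counter(coarse[node] for node in nodes)
        fractions.append(hosts.most_common(1)[0][1] / len(nodes))
    return sum(fractions) / len(fractions)


# Toy partitions of six nodes (illustration only).
fine = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1, "f": 1}
coarse = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y", "f": "Y"}
embedding = mean_embedding(fine, coarse)
```

A value close to 1 means that the finer communities are almost entirely contained in the coarser ones.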

Relating hashtags to SDGs
To identify the SDG subjects that UK companies talk about, we used a threefold approach to obtain a good coverage tailored to the available data set. As a list of SDG-related keywords suitable for online social network searches does not exist, we had to create one. Considering the many attempts to map academic articles' contributions to SDGs, we first started from the list of the University of Auckland (available here), which is used in business research 100, and considered the presence of its words in our data set. This list mainly refers to keywords used in research papers in Elsevier's Scopus database, while in the present dataset, we are referring to Twitter's hashtags. For this reason, we gathered multiple words in a single keyword, as is customary for hashtags: for instance, "Child Labor Laws" became "childlaborlaws". Sometimes, the keywords were annotated under more than a single SDG: we disambiguated the multiple identifications manually, focusing on the main target of the various SDGs. At this level, the identified SDG keywords represented less than 0.52%, i.e. quite a limited coverage. Since we did not know whether the limited coverage of SDG subjects was due to low attention to those topics or to an ineffective identification of SDG hashtags, we manually annotated the hashtags related to an SDG among the 300 most frequent ones. The two authors independently performed the identification and agreed on 86.3% of the annotations; when they did not agree on the hashtag categorisation, they discussed each hashtag and finally attributed an SDG when they reached an agreement. Using this approach, we reached 0.60% of all hashtags used by accounts in the validated projection of Fig. 3.
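The conversion of a multi-word keyword into hashtag form, and the resulting coverage, can be sketched as below. The helper names and the example lists are hypothetical, and measuring coverage over distinct hashtags rather than over their occurrences is an assumption.

```python
import re


def to_hashtag(keyword):
    """Lower-case a keyword and drop spaces and punctuation."""
    return re.sub(r"[^a-z0-9]", "", keyword.lower())


def coverage(sdg_keywords, hashtags):
    """Share of the distinct hashtags that match an SDG keyword."""
    sdg_tags = {to_hashtag(keyword) for keyword in sdg_keywords}
    distinct = set(hashtags)
    return len(distinct & sdg_tags) / len(distinct)


# Hypothetical keyword list and hashtags (illustration only).
keywords = ["Child Labor Laws", "Climate Change"]
hashtags = ["childlaborlaws", "euro2020", "climatechange", "bankholiday"]
share = coverage(keywords, hashtags)
```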
To further enlarge the SDG coverage, we used a network approach. Using the bipartite representation of accounts and hashtags already used to obtain the validated projection on the account layer, we projected the network on the layer of hashtags, using the technique described in the subsection above and introduced in Ref. 76. We remind the reader that in the validated projection, two nodes are connected if they share a significant number of nearest neighbours in their bipartite representation. In this paper, two hashtags are connected in the validated projected network if they were both used by a significant number of different users. In this sense, a link in this network represents a non-trivial measure of similarity in how the various Twitter accounts use hashtags. Some might argue that we should be interested in hashtags appearing in the same messages. This point is debatable: an account interested in subjects related to, for example, SDG3 may use hashtags related to different facets of SDG3 in different messages, and focusing on hashtags used in the same messages would miss this information. Moreover, we avoid the risk of validating too many close hashtags since the procedure defined in Ref. 76 is highly restrictive. For instance, in the hashtag-validated projected network, the link density is extremely low, i.e. 0.09%. Nevertheless, even in this case, we had to check the "automatic" annotation manually: in fact, the (validated) link between two hashtags may be due to a different reason than the adherence to the aims of the SDG: for instance, the keyword #worldengineeringday is connected only to the hashtag #inwed, i.e. the acronym for the International Women in Engineering Day, but it is not necessarily related to Gender Equality (SDG5), as its neighbour is. In a way, the validated network represents a hint to spot possible SDG hashtags related to the already labelled ones. Moreover, it permits spotting SDG hashtags specific to the current data set. This is the case, for instance, of the hashtags of the various campaigns of the National Health Systems (all of them have been classified in SDG3), which are not general, or of the ones related to the Covid19 vaccination. We remark that in the validated network, the SDG hashtags represent a greater percentage (8%), signalling that there is collective attention of company accounts on the various subjects.
We focused on all hashtags that were not assigned an SDG but have at least one SDG hashtag among their neighbours, since we expect the former to be related to the SDG of their neighbours. To be more restrictive, we focused on hashtags for which the neighbours that were assigned an SDG represented more than half of their degree. They were then assigned the most frequent SDG among their neighbours. The association was later manually checked to manage the case of ties among the SDGs of the neighbours, resulting in 146 newly annotated hashtags. The annotated hashtags now represent 0.68% of all hashtags in the data set.
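The neighbour-based assignment can be sketched as a single majority vote over the validated hashtag network. The data structures, the function name and the toy network below are hypothetical; ties are only collected, since in the procedure above they are resolved by hand.

```python
from collections import Counter


def propagate_sdg_labels(neighbours, labels):
    """neighbours: dict hashtag -> set of neighbours in the validated projection.
    labels: dict hashtag -> SDG already assigned.
    Returns (new_labels, ties); ties are left for manual checking."""
    new_labels, ties = {}, {}
    for tag, nbrs in neighbours.items():
        if tag in labels or not nbrs:
            continue
        votes = Counter(labels[n] for n in nbrs if n in labels)
        # Labelled neighbours must be more than half of the degree.
        if 2 * sum(votes.values()) <= len(nbrs):
            continue
        top = max(votes.values())
        winners = sorted(sdg for sdg, count in votes.items() if count == top)
        if len(winners) == 1:
            new_labels[tag] = winners[0]
        else:
            ties[tag] = winners
    return new_labels, ties


# Toy validated network and starting annotation (illustration only).
labels = {"health": 3, "wellbeing": 3, "genderequality": 5}
neighbours = {
    "nhscampaign": {"health", "wellbeing", "football"},
    "stemcareers": {"genderequality", "health"},
    "football": {"nhscampaign", "sport"},
}
new_labels, ties = propagate_sdg_labels(neighbours, labels)
```

Only the original annotation is used for the vote, so newly labelled hashtags do not propagate their label further within the same pass.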

Figure 1. Online social network adoption among large firms in the UK. Twitter is the most popular online social network for large firms in the UK, ahead of the others.

Figure 3. The Largest Connected Component of the validated projected network of users. The size of the nodes is proportional to the logarithm of their degree. In the left panel, the different node colors represent the various communities, as detected by the Louvain algorithm; in the right panel, the colors identify the subcommunities (nodes that do not fall in one of the main subcommunities are plotted in white). Please note that the contents of the various subcommunities are indicated in Tables 4, 5, 6, 7 and 8.

Figure 4. The frequency of NACE Rev.2 (1 digit) sectors among the firms of the four greatest communities in the validated network of Fig. 3. The number of firms is reported on the vertical axis; the colors of the different bar charts are the same as in the left panel of Fig. 3. The identification of the various NACE sectors is in Table 3 in the Methods section. To enhance clarity, we anticipate here the most represented sectors, which are: C) Manufacturing; G) Wholesale and retail trade; repair of motor vehicles and motorcycles; J) Information and communication; K) Financial and insurance activities; M) Professional, scientific and technical activities; N) Administrative and support service activities; P) Education; Q) Human health and social work activities.

Figure 5. The SDG activity of the four greatest communities in the validated network of Fig. 3. The colors of the different bar charts are the same as in the left panel of Fig. 3. The identification of the various hashtags with the different SDGs is described in detail in the Methods section.

Figure 6. Stakeholder engagement on the SDGs. These boxplots compare the distribution of the number of likes and retweets for all hashtags (the gray box on the left), for each SDG (all the boxes beyond the red line; boxes are colored using the official indication from UN, https://www.un.org/sustainabledevelopment/wp-content/uploads/2019/01/SDG_Guidelines_AUG_2019_Final.pdf), and for all the SDGs hashtags (the sky blue box on the left). The boxplots show the distribution of the logarithm of the number of likes and retweets. We used the logarithms because the distributions are heavy-tailed. In this sense, boxplots may not be the perfect tool for capturing the distribution properties but can effectively deliver the message about the rough differences among the various distributions.

Table 1. SDG hashtags used by validated users over their usage in the entire dataset. These high percentages indicate that most active users using SDG hashtags are validated by the filtering procedure. In turn, this implies that SDGs are among the main subjects shaping the various common narratives of large firms' accounts on Twitter.
is restrictive, as tested in different contexts: such a strong signal indicates a non-trivial activity on SDG communication.

Table 3. NACE Rev.2, main division. The description of the categories was taken from Ref.