Introduction

Investments in sustainability are becoming paramount as many companies are under constant pressure to reduce the social and environmental impact of their operations (Bai and Sarkis, 2020) and increase accountability toward stakeholders and the wider society (de Ruyter et al., 2022; Serafeim, 2020; Wang et al., 2022). While corporate sustainability efforts tend to focus primarily on external stakeholders (e.g., customers, supply-chain partners, governmental organizations; Gonzalez-Arcos et al., 2021), internal stakeholders (e.g., employees) represent a critical, and sometimes overlooked, target group to ensure effective corporate engagement with the sustainability agenda (Chatzopoulou et al., 2022; Martín-de Castro, 2021; Paine, 2014).

Internal sustainability efforts (ISEs) encompass a wide range of corporate policies directed towards internal stakeholders, including, for example, promoting a healthy employee work–life balance (Kelliher et al., 2019), investing in gender equality and diversity (Nadeem et al., 2017), and ensuring a harassment-free working environment (Cassino and Besen-Cassino, 2019). These ISEs can reduce staff turnover (Giauque et al., 2019) and improve market competitiveness (Wang and Verma, 2012). However, although many companies openly advertise their commitment to internal sustainability, employees often report contrasting accounts of their experience of such efforts (Peloza and Shang, 2011), and the extent to which ISEs successfully propagate throughout the organization remains unclear.

We partly tackle those issues by running a large-scale assessment of organizational practices aligned with ISEs. Since there exists no agreed-upon definition of corporate ISEs, we started from the United Nations (UN) World Commission on Environment and Development (WCED) definition of sustainability as a strategy oriented towards “meeting the needs of the present without compromising the ability of future generations to meet their own needs” (WCED, 1987). This definition is operationalized in the 17 UN Sustainable Development Goals (SDGs), which represent both a framework and a call to action for organizations to invest in addressing critical societal issues such as “good health and well-being”, “decent work and economic growth”, and “peace, justice and strong institutions” (United Nations, 2015). Not all 17 UN SDGs are relevant to a company’s internal stakeholders (this is the case, for example, for the SDG “life below water”). To identify the relevant ones and sharpen their definitions in the internal corporate context, we developed and validated a mixed-method approach that paraphrased the broad UN SDGs into six corporate-relevant ISEs. These ISEs concern health, education, diversity, monetary benefits, infrastructure, and atmosphere (Fig. 1). Core to the approach is a state-of-the-art Natural Language Processing (NLP) framework that processed more than 350K geo-referenced reviews of 104 US-based companies, 80% of which are in the S&P 500.

Fig. 1: The wheel of internal sustainability efforts (ISEs).

The wheel includes the two macro-categories (financial benefits vs. staff welfare) under which the six ISEs are classified. The wheel’s outer layer reports keywords representative of each ISE.

Data

Our aim was to understand and capture the microfoundations of ISEs; we did so in a bottom-up fashion, starting from the perspectives of employees. More specifically, we collected data from a popular company reviewing site, where current and, more likely, former employees write reviews about their own corporate experiences, ranging from job interviews to salaries to workplace culture. These reviews have recently been used in studies exploring corporate culture at scale (Das Swain et al., 2020). As of 2021, the platform had 50M monthly visitors and hosted 70M reviews of 1.3M companies. To ensure quality reviews, the site: a) performs both automatic and manual content moderation; b) grants full access to content only to users who register on the site and write at least one review each (encouraging neutral and unbiased reviews); and c) allows at most one review per employee per year. Our dataset consisted of reviews published over twelve years, from 2008 to 2020.

Each review consists of a title; a ‘pro’ portion (i.e., positive aspects of the company); a ‘con’ portion (i.e., its negative aspects); a set of four ratings on a [0,5] scale scoring the company’s balance, career, culture, and management; and a final overall rating of the company. Since reviewers have the option to include their location, we were able to identify the US state for a subset of the reviews. To ensure the robustness of our text processing method, we retained companies that had at least 1000 reviews and were present in at least 10 states, leaving us with a dataset of 358,527 reviews of 104 US-based companies (which represented 88.7% of the original dataset); 80% of these companies are in the S&P 500. As detailed in Supplementary Information, these 104 companies offer the same level of representativeness as the S&P 500 companies, in terms of the distribution of industry sectors and the geographic distribution across states. In addition to the reviews, we collected yearly stock growth values of the 104 companies from the Yahoo Finance portal.
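The company retention rule above can be illustrated with a minimal sketch. It assumes, hypothetically, that each review is available as a (company, state) pair, with the state set to None when the reviewer did not disclose a location; this layout and the function name are illustrative assumptions, not the actual data schema.

```python
from collections import defaultdict

MIN_REVIEWS = 1000  # minimum number of reviews per company (from the text)
MIN_STATES = 10     # minimum number of distinct states per company (from the text)


def retained_companies(reviews, min_reviews=MIN_REVIEWS, min_states=MIN_STATES):
    """Return the set of companies meeting both retention thresholds.

    `reviews` is an iterable of (company, state) pairs; `state` is None
    when no location was given (hypothetical input layout).
    """
    review_counts = defaultdict(int)
    states_seen = defaultdict(set)
    for company, state in reviews:
        review_counts[company] += 1
        if state is not None:
            states_seen[company].add(state)
    return {
        company
        for company, n_reviews in review_counts.items()
        if n_reviews >= min_reviews and len(states_seen[company]) >= min_states
    }
```

Reviews without a location still count towards a company’s review total in this sketch but not towards its state coverage; whether the original filtering treated them this way is an assumption.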

Methods

The three-step mixed-method approach for defining ISEs

We developed a mixed-method approach to operationalize ISEs. This approach unfolded in three main steps, which are detailed in Supplementary Information and summarized here as follows (Fig. 2):

  • Step 1 - Pre-selection of goals: Using a deductive content analysis (Elo and Kyngäs, 2008), three independent annotators assessed the definition of each of the seventeen UN goals and decided whether it applied to the corporate context or not. We took a conservative approach and discarded only the goals that the annotators unanimously discarded, which ended up being four, leaving us with 13 potentially relevant goals (after step 1 in Fig. 2).

  • Step 2 - Unsupervised discovery of goals: We developed an unsupervised deep-learning framework based on the sentence-level BERT algorithm (Reimers and Gurevych, 2019); its technical architecture is discussed in Supplementary Information. This framework scored each employee’s review against the 13 goals found in the previous step. The framework identified the five reviews most relevant to each goal, and three other independent annotators then manually assessed the relevance of these reviews. To conservatively retain only the goals that were accurately identified by the framework, we discarded any goal for which the majority of the annotators marked fewer than four of the goal’s five reviews as relevant (overall, the agreement among the annotators was high, i.e., Fleiss’ κ = 0.83). As a result, five goals were dropped; these had more to do with environmental sustainability (e.g., “clean water”, “climate change”) than with internal corporate affairs. This left us with eight goals (after step 2 in Fig. 2).

  • Step 3 - Consolidation of goals: Finally, the three annotators assessed if any of the eight goals ended up acquiring very similar meanings in company reviews. Two pairs were merged, ultimately leaving us with six ISEs (after step 3 in Fig. 2). Table 1 reports the names of these ISEs (first column), corresponding original UN SDGs (second column), and related excerpts of real reviews (third column).

    Table 1 The six internal sustainability efforts resulting from the three-step mixed-method approach for defining ISEs.
Fig. 2: Summary of the three-step mixed-method approach for defining ISEs.

Starting with the 17 UN SDGs, three annotators unanimously discarded those that did not apply to the corporate context (step 1, pre-selection): 13 SDGs were left. From these, the subset of SDGs accurately captured by our NLP deep-learning framework was identified (step 2, unsupervised discovery): 8 SDGs were selected. Finally, three annotators merged the goals that, in the context of company reviews, ended up being paraphrased with very similar meanings (step 3, consolidation): this final step resulted in the identification of the six ISEs.
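The discard rule of step 2 can be written down as a short sketch. It assumes that, for each goal, we know how many of the goal’s five top-ranked reviews each annotator marked as relevant; the function name and input layout are hypothetical.

```python
def keep_goal(relevant_counts, min_relevant=4):
    """Decide whether a goal survives step 2.

    `relevant_counts` holds one entry per annotator: the number of the goal's
    five top-ranked reviews that this annotator marked as relevant
    (hypothetical input layout). The goal is discarded when a majority of
    annotators marked fewer than `min_relevant` reviews as relevant.
    """
    too_few = sum(1 for count in relevant_counts if count < min_relevant)
    return 2 * too_few <= len(relevant_counts)


# Illustrative, made-up annotations for two goals and three annotators.
annotations = {"good health and well-being": [5, 4, 3], "clean water": [2, 3, 5]}
retained = [goal for goal, counts in annotations.items() if keep_goal(counts)]
```

With three annotators, a goal is dropped as soon as two of them find fewer than four relevant reviews, which is why the second made-up goal above would not be retained.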

Metrics

We studied the six ISEs at the level of each company u to test whether the commitment to ISEs manifests itself at a micro-level (e.g., in a company’s growth). To that end, we computed the score s(u, i) of the i-th ISE for company u as the average, over u’s reviews, of each review’s thresholded similarity to i (so that reviews not deemed to mention i contribute zero):

$$s(u,i)=\frac{\sum_{p\in R(u)}\mathrm{sim}_{t}(v_{p},v_{i})}{|R(u)|}$$
(1)

where R(u) is the set of u’s reviews, v_i is the SBERT (Sentence-BERT) vector of ISE i (the six vectors/phrases for the ISEs are in Supplementary Information in Table 4), and sim_t(v_p, v_i) is the thresholded SBERT similarity score (Reimers and Gurevych, 2019) between the SBERT vector v_p of review p and the SBERT vector v_i of ISE i. More precisely, sim_t(v_p, v_i) is defined as

$$\mathrm{sim}_{t}(v_{p},v_{i})=\begin{cases}\mathrm{sim}(v_{p},v_{i}), & \text{if } \mathrm{sim}(v_{p},v_{i})>0.31 \text{ and } \mathrm{sim}(v_{p},v_{i})>95\%(i)\\ 0, & \text{otherwise}\end{cases}$$
(2)

We chose the threshold of 0.31 by computing the mean SBERT similarity for each of the 8 goals left after step 2 of our three-step ISE selection procedure, as we had established that the NLP method worked well for these 8 goals, and then averaging the eight means (which gave 0.31). Based on further validation, we also established that the SBERT similarity values were not distributed in the same way across ISEs and, as such, the fixed general threshold of 0.31 had to be paired with an ISE-specific threshold: based on our experiments reported in Supplementary Information, this latter threshold value (denoted as 95%(i)) was the 95th percentile of the ISE’s similarity distribution, which is the very same threshold found in previous studies (Choi et al., 2020). We finally ranked companies by their score s(u, i) for each ISE i. Note that, by review, we mean the ‘pro’ portion of the review. That is because we were mostly interested in positive initiatives (pros) rather than shortcomings (cons). In Supplementary Information, we indeed show that, if we were to instead take cons (or combine cons with pros), our deep-learning framework would perform worse in the two validation steps of our mixed-method approach (steps 2 and 3).
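Equations (1) and (2) can be illustrated with a minimal sketch for a single ISE i. It assumes that the raw similarities sim(v_p, v_i) have already been computed for every review, and that the ISE-specific cutoff 95%(i) is the 95th percentile of those similarities pooled over all reviews; the pooling, the interpolation rule, and all names are assumptions made for illustration.

```python
import math

GLOBAL_THRESHOLD = 0.31  # fixed general threshold reported in the text


def percentile(values, q):
    """Percentile with linear interpolation, for q in [0, 100]."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def thresholded(sim, ise_cutoff, global_threshold=GLOBAL_THRESHOLD):
    """Eq. (2): keep the similarity only if it exceeds both thresholds."""
    return sim if sim > global_threshold and sim > ise_cutoff else 0.0


def ise_scores(sims_by_company, q=95):
    """Eq. (1) for one ISE.

    `sims_by_company` maps each company u to the non-empty list of
    sim(v_p, v_i) values of its reviews (hypothetical input layout).
    """
    pooled = [s for sims in sims_by_company.values() for s in sims]
    cutoff = percentile(pooled, q)  # assumed reading of 95%(i)
    return {
        company: sum(thresholded(s, cutoff) for s in sims) / len(sims)
        for company, sims in sims_by_company.items()
    }
```

Companies can then be ranked by sorting the returned scores in decreasing order.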

Results

We identified each ISE’s keywords from all reviews associated with it (e.g., the keyword ‘salary’ for the ‘monetary’ ISE), and ascertained through a principled linguistic validation that the keywords are semantically related to the ISE (RQ1). After establishing that our ISE scoring is valid, we scored the companies and studied the relationship between a company’s ISE scores and its success in the forms of company ratings and stock growth (RQ2), and uncovered how ISE scores vary across industry sectors (RQ3). Figure 3 summarizes our analyses and the data used for them.

Fig. 3: Overview of the research questions investigated in this work and the sources of data we used for them.

The first block corresponds to the first research question, where we use manual qualitative annotation to validate the ISE scoring technique. The second research question analyzes the association between company success and ISE by looking at company stock growth and ratings. Finally, in the third research question, we assess ISE scores across different industry sectors.

RQ1: Does our machine learning method capture internal sustainability efforts?

We validated our deep learning method for detecting ISEs based on a triangulation approach (Denzin, 2012), during which we first established its face validity by inspecting the language used in reviews, and subsequently examined our results with respect to external reports. We discuss the former next, while the latter is detailed in Supplementary Information.

To establish the face validity of the proposed ISE detection method, we took the linguistic approach explored by Das Swain et al. (2020). First, for each of the six ISEs, we obtained the most frequent keywords (1-, 2-, 3-, and 4-grams) from the reviews deemed relevant by our method. We then computed the TF-IDF scores for such n-grams, where each document comprised all shortlisted reviews for one ISE. Finally, we ranked keywords for each ISE based on their TF-IDF score. This allowed us to find the keywords judged to be important for a certain ISE by our embedding-based method. The top-ranked keywords for the six ISEs are visualized as a heatmap in Fig. 4.

Fig. 4: Top n-grams in sentences expressing ISEs.

Darker colors (higher normalized TF-IDF score) indicate greater relative relevance to a particular ISE.
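The keyword ranking behind Fig. 4 can be sketched as follows, treating all shortlisted reviews of one ISE as a single document. The sketch uses one common TF-IDF variant (relative term frequency times the logarithm of the inverse document frequency) and whitespace tokenization; the exact weighting, tokenization, and normalization used for the figure are not specified here, so these choices are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, max_n=4):
    """Yield all 1- to max_n-grams of a token list as space-joined strings."""
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            yield " ".join(tokens[start:start + n])


def top_keywords(reviews_by_ise, k=5, max_n=4):
    """Rank n-grams per ISE by TF-IDF, with one document per ISE.

    `reviews_by_ise` maps an ISE name to the list of review texts shortlisted
    for it (hypothetical input layout).
    """
    # n-grams are extracted within each review so they never span two reviews.
    counts = {
        ise: Counter(g for review in reviews for g in ngrams(review.lower().split(), max_n))
        for ise, reviews in reviews_by_ise.items()
    }
    n_docs = len(counts)
    doc_freq = Counter(g for c in counts.values() for g in c)
    ranked = {}
    for ise, c in counts.items():
        total = sum(c.values())
        scores = {g: (n / total) * math.log(n_docs / doc_freq[g]) for g, n in c.items()}
        # Sort by decreasing score, breaking ties alphabetically.
        ranked[ise] = sorted(scores, key=lambda g: (-scores[g], g))[:k]
    return ranked
```

Under this weighting, an n-gram that appears in the documents of all ISEs gets a score of zero, which is what makes the surviving keywords discriminative of individual ISEs.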

We observed many keywords to be highly discriminative of the ISE they were associated with. For example, keywords ‘pay good’ and ‘salary’ were (correctly) ranked highly for ISE ‘monetary’ only; ‘health’, ‘health benefits’, and ‘take care’ were ranked highly for ISE ‘health’ instead. Keywords ‘opportunities learn’, ‘experience’, ‘good train’, and ‘program’ were uniquely strongly associated with ISE ‘education’. Keyword ‘flexibility’ was highly discriminative of ISE ‘diversity’; ‘industry’ and ‘technology’ were strongly associated with ISE ‘infrastructure’; lastly, n-grams like ‘positive work environment’ and ‘friendly work environment’ were strongly associated with ISE ‘atmosphere’. Other keywords ranked highly in more than one ISE instead: this was the case, for example, for keywords ‘benefit work–life balance’ and ‘family’, which were highly associated with both the ‘health’ ISE (as one might expect) and the ‘diversity’ ISE. Health-enhancing factors like work–life balance and flexible working options have been shown to facilitate gender equality and improve the diversity of employees (Chung and Van der Lippe, 2020; Lyonette, 2015); it was therefore promising that our method was capable of picking up these semantically related concepts too.

Indeed the six ISEs we identified were not mutually exclusive concerns (and neither are the UN SDGs), and one may wonder to what extent they are semantically related. To shed light on this question, we conducted a principal component analysis (PCA) on s(u, i) at a company level to assess how much of the variance in the data could be explained by different principal components, and how those components related to the six ISEs. We found that, at the company level, just two components explained 88% of the variance—specifically, the first component explained 73% and the second component explained 15%. We report the correlation between the first two PCA components and the six ISEs in the last two columns of Table 2.

Table 2 Cross-correlation between the six ISE scores and the two principal components obtained via PCA at a company level.

We observed that all ISEs with the exception of ‘monetary’ were strongly correlated with the first component and weakly negatively correlated with the second component; on the other hand, ‘monetary’ was moderately correlated with both the first and second principal components. These two findings suggested that the ‘monetary’ ISE was orthogonal to the other five and that these other five were strongly interconnected with one another. Indeed, one may expect that improving work–life balance has a positive impact on both the ‘health’ ISE and the ‘diversity’ ISE; on the other hand, improving monetary conditions may not directly affect other aspects of corporate internal sustainability. Overall, we thus found two main facets of employee-centred sustainability—a staff welfare-related one (PC1) and a financial benefits-related one (PC2). To avoid multicollinearity, we used these two main facets of ISEs (rather than the six individual ones) to answer the following research questions.
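To make the notion of variance explained by a principal component concrete, the following sketch computes the share of variance captured by the first component of a company-by-ISE score matrix. It swaps in plain power iteration on the covariance matrix rather than a library PCA routine, and it works on unstandardized scores; both choices are assumptions made for illustration and are not claimed to match the analysis reported above.

```python
import math


def covariance_matrix(rows):
    """Sample covariance of the columns of `rows` (one row per company)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def top_component(cov, iterations=500):
    """Largest eigenvalue and its eigenvector via power iteration.

    Starts from the uniform vector, so it would fail in the unlikely case
    that the leading eigenvector is exactly orthogonal to that start.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    return eigenvalue, v


def first_component_share(rows):
    """Fraction of total variance explained by the first principal component."""
    cov = covariance_matrix(rows)
    eigenvalue, _ = top_component(cov)
    return eigenvalue / sum(cov[j][j] for j in range(len(cov)))
```

The entries of the returned eigenvector play the role of loadings: ISEs with large entries of the same sign move together across companies, which is the pattern described for the five staff welfare ISEs.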

RQ2: Is sustainability associated with company success?

There are several ways to measure a company’s success. We considered two complementary ones: the online ratings it received from its employees (available from the company reviewing site), and its financial position (measured as stock growth).

Sustainability and company online ratings

Employees have the option to rate the company they are reviewing on four different facets (balance, career, culture, and management), plus a fifth, overall company rating. We thus investigated to what extent a company’s success across these five facets could be predicted based on the company’s commitments to the ISEs. We did so by first aggregating ISE scores and ratings at a company level. The aggregation reduces the endogenous association between company ratings and ISEs in individual reviews. We then conducted an OLS regression using our two main sustainability facets as predictors (‘staff welfare’ and ‘financial benefits’) while also controlling for a company’s total number of reviews. As reported in Table 3, we found that these two sustainability facets could explain up to 64% of the variance in a company’s ratings; particularly noteworthy was that the staff welfare facet of corporate internal sustainability was strongly positively correlated with all aspects of a company’s success, including balance and culture, in line with previous research findings (Isensee et al., 2020; Rao, 2017).

Table 3 Predicting company online ratings from the two main facets of sustainability (staff welfare and financial benefits) using a stepAIC analysis on an OLS regression.

Sustainability and company stock growth

We obtained stock data for 84 of the 104 companies in our dataset, from 2009 to 2019, using the Yahoo Finance portal. For each company, we calculated the geometric mean of its yearly stock growth over that period; we used the geometric mean since the distribution of stock growth values across companies was heavy-tailed (as reported in Supplementary Information). To inspect whether a company’s financial success (measured as stock growth) was associated with its sustainability efforts, we then plotted in Fig. 5 the geometric mean of its stock growth (y-axis) against its ranking in terms of the staff welfare facet of sustainability and the financial benefits facet of sustainability (x-axis). We also included in the figure the total number of reviews, to check whether stock growth was merely associated with the company’s popularity rather than its internal sustainability practices.
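As a small worked example of the geometric mean of stock growth, the sketch below assumes that yearly growth is expressed as a multiplicative factor (the price at the end of a year divided by the price at the end of the previous year); the exact definition of the growth values is not stated above, so this is an assumption, as are the function name and input layout.

```python
from statistics import geometric_mean


def mean_yearly_growth(year_end_prices):
    """Geometric mean of year-over-year growth factors.

    `year_end_prices` is a chronological list of positive year-end prices
    (hypothetical input layout).
    """
    factors = [later / earlier
               for earlier, later in zip(year_end_prices, year_end_prices[1:])]
    return geometric_mean(factors)
```

For instance, a price that doubles in one year and halves in the next has growth factors 2 and 0.5, whose geometric mean is 1 (no net growth), whereas their arithmetic mean of 1.25 would overstate the growth.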

Fig. 5: Geometric mean of stock growth values for increasing ISE score ranking.

While companies with both types of sustainability have high stock growth, staff welfare is more strongly associated with higher growth. We also plot the number of reviews to rule out the role of company popularity.

As shown in Fig. 5, companies that focused on both staff welfare and financial benefits sustainability tended to have high stock growth; between the two facets, it was staff welfare that was most strongly associated with high stock growth, in line with previous research (Diversity Equity and Inclusion Still Matter in a Pandemic). Notably, companies with high stock growth did not invest as heavily in financial sustainability only, bolstering previous work which noted that focusing on staff welfare sustainability could lead to greater stakeholder engagement even without high pay (Ziegler et al., 2007). Overall, our results suggest that a company’s financial success is associated with its investment in internal sustainability practices, but only if the company takes a holistic approach to sustainability that tackles both staff welfare and financial benefits.

RQ3: Is sustainability associated with specific industry sectors?

To examine whether certain industry sectors were leading the corporate sustainability agenda, we plotted in Fig. 6 the distribution of the two facets of sustainability for each industry sector. We further conducted a MANOVA and found the differences in sustainability scores to be significant across sectors. In terms of staff welfare sustainability efforts, we found Industrials and IT to lead, possibly due to recent investment in this type of sustainability initiative (Higón et al., 2017). The Financial sector followed, while the Health Care sector exhibited very high variability. This could be explained by healthcare professionals often sacrificing personal well-being and work–life balance due to the highly demanding nature of their work (Schwartz et al., 2019; Shanafelt et al., 2015). We found Consumer Staples and Consumer Discretionary to lag significantly behind. This was also the case when looking at financial benefits, although differences between sectors were smaller along this facet of internal sustainability efforts.

Fig. 6: Sustainability and industry sector.

Boxplots showing the distribution of the staff welfare and financial benefits ISE scores across different industry sectors.

Figure 6 offered an overview of engagement with sustainability efforts at the industry sector level. To reveal more nuanced variations within the same sector, we plotted individual companies’ engagement with each of the two main sustainability facets in Fig. 7. Notable variations emerged: our sector-based analysis revealed low sustainability scores for consumer discretionary and staples companies overall; upon closer inspection, we found that some companies (e.g., Dollar General, Kmart) indeed scored low on both facets of ISEs, while others (e.g., Costco) scored low on staff welfare but high on the financial benefits facet of sustainability, a phenomenon noted in previous work too (Cascio, 2006). Variations emerged also within the IT sector, previously shown to be leading sustainability efforts on both dimensions: a more nuanced investigation revealed high sustainability scores on both financial benefits and staff welfare ISEs for companies like Microsoft, Google, and Apple; however, more traditional IT companies like Infosys, IBM, and Cognizant scored high on staff welfare sustainability only.

Fig. 7: Scatterplot of the scores of each company’s staff welfare vs. financial benefits.

The size of a company’s dot represents its stock growth. We highlighted in blue some of the companies to assess them qualitatively. Consumer staples and discretionary companies like Kmart, Macy’s, and Kohl’s scored low for both types of sustainability. Traditional IT companies like Infosys, IBM, and Accenture scored high for staff welfare sustainability but not for financial benefits sustainability.

One must be mindful that comparisons among (same-sector) companies were further affected by the type of employees reviewing their employer. In our study, this was apparent for companies like Amazon, which enjoyed high stock growth but surprisingly scored low for both types of sustainability. Although the company employs a large number of software engineers as well as warehouse workers, upon close inspection we found the most common roles of Amazon employees in our reviews to be ‘warehouse associate’ and ‘warehouse worker’. Previous research did find logistics workers at Amazon to face poor working conditions (Amazon’s no show on sustainability; Chan, 2015), corroborating the low sustainability scores that our method computed for this company. Furthermore, previous literature noted that Amazon’s lack of focus on sustainability practices has yet to hurt its profitability (Amazon’s no show on sustainability; Chan, 2015).

Discussion

Many companies are under constant pressure to invest in a wide range of internal sustainability practices designed to enhance working conditions (Barko et al., 2022; Jakob et al., 2022). However, the benefits of such investments for both the company and its stakeholders are often difficult to assess. By examining how employees form perceptions of their company’s engagement with ISEs, this research spells out the microfoundations of internal sustainability and provides evidence of the strategic importance of investing in business practices and policies geared towards ISEs (de Ruyter et al., 2022; Zhao et al., 2022).

By examining how the wider UN SDGs agenda can be translated into diverse internal corporate efforts directed towards employees, our work offers substantive methodological, conceptual, and empirical contributions to internal sustainability research and managerial practice. More specifically, it offers two main theoretical contributions. The first has to do with the conceptualization of ISEs. We have shown how the sustainability agenda brought forward by the introduction of the UN SDGs informs and shapes six sustainability efforts within a company: efforts to do with health, education, diversity, monetary benefits, supporting infrastructure, and a supportive atmosphere. While the existing literature often presents sustainability as a monolithic construct (Chen et al., 2020; Liu et al., 2020), our two-factor conceptualization of ISEs delineated the two core strategic aspects that companies should carefully balance when implementing ISEs: one aspect had to do with traditional financial benefits (e.g., salary, bonuses), and the other had to do with broader aspects of staff welfare (e.g., diversity, atmosphere). The second theoretical contribution enhances the understanding of what makes companies economically successful and how internal sustainability practices differ by sector, especially in emerging sectors like IT.

This work also offers practical implications, and it does so for three main stakeholders. The first stakeholder consists of scholars. Our method is grounded in the UN SDGs and performed consistently well across several rounds of external validation. By providing a robust framework for examining mentions of ISEs through automated text analysis, it enables scholars to study new textual datasets in the future.

The second stakeholder consists of policy makers. We showed that high levels of ISE engagement (not only for financial aspects but also for general staff welfare) were associated with high economic growth. This result supports policies in recent years that have fostered a corporate culture that goes beyond financial rewards and is oriented towards equality and well-being (Triana et al., 2019). Beyond company efforts, policy makers themselves would be able to strategically decide which ISEs to incentivize with taxation schemes or set out a legislative agenda that would attract workers who care about specific ISEs. To inform more targeted interventions, we also showed that the impact of engaging with ISEs varies across sectors: companies in the IT and business-to-business industrial goods sectors outperformed companies that produce and commercialize consumer goods. This finding is noteworthy as previous research shows that sustainability signals tend to be stronger in business-to-consumer than in business-to-business market contexts (cf. Hoejmose et al., 2012).

The third stakeholder consists of company managers. By reflecting employees’ perceptions, our analytical framework represents an invaluable tool to operationalize the microfoundations of internal sustainability, assess how corporate efforts in this area directly impact employees, and quantify and qualify the extent to which corporate engagement with ISEs becomes visible to employees across different organizational levels.

Our work comes with five main limitations, though. The first is that our list of ISEs may not be accurate or exhaustive. While the corporate sustainability literature has focused on initiatives that are external to a company and have an impact on the wider world’s sustainability, the practices that are internal to a company and have an impact on employees have received less attention. As a result, the frameworks for internal sustainability practices that we found in the literature were not comprehensive; some, for example, focused on social aspects only (Baumgartner and Ebner, 2010). To tackle that, we started from the well-grounded definitions of the UN sustainability goals, used a principled mixed-method approach to paraphrase those most relevant to the corporate context, and validated the resulting list with both qualitative and quantitative approaches. These approaches are generalizable, in that they could be used to study other constructs appearing in reviews in the future (e.g., how employees in a company deal with stress).

The second limitation is that, since the reviewing site was founded in 2008, key financial events prior to 2008 (e.g., the dot-com bubble in the late 1990s) may have impacted our results but could not be accounted for because of a lack of data.

The third limitation is that the number of companies under study is invariably limited. We were able to study 104 major companies, largely because the other companies had a limited number of reviews that did not allow for automatic processing. Future work should explore alternative mixed-method approaches (likely qualitative ones) to study ISEs for these companies.

The fourth limitation is the lack of causal claims. Given our data, we could not assess the causal direction between ISEs and socio-economic returns. More specifically, we could not assess whether focusing on ISEs led to better socio-economic returns (e.g., stock growth), whether better socio-economic conditions created a breeding ground for fostering ISEs, or whether these two causal relations were in a self-reinforcing cycle.

The fifth and final limitation has to do with the representativeness of our data. Companies in certain sectors (e.g., IT) may have been reviewed more often than those in other sectors (e.g., consumer discretionary). Despite that possibility, in Supplementary Information, we show that our data was still representative along three major dimensions: (a) the distribution of industry sectors of the S&P 500 companies, which our data matched without over-representing any specific sector; (b) official population in a state, which scaled linearly with the number of employees in the state in our data; and (c) number of company headquarters in a state from official sources, which has a nearly perfect correlation with the number of headquarters per state in our data. Finally, despite the platform’s mechanisms to guarantee review quality, as discussed in the section Data, we acknowledge that potential self-selection bias could cause our reviewers’ sample to be non-representative. To reduce the impact of such a bias and ensure robustness, we restricted our analyses to companies having at least 1000 reviews.