A systematic and interdisciplinary review of mathematical models of language competition

During the last three decades, scientists in formal and natural sciences have been proposing models of language competition. Such models could prove instrumental in informing efforts made towards preserving the world’s linguistic diversity but have yet to gain significant interest among linguists. This situation could be due to a lack of overlap between the concepts and methods used in those models and those used by linguists. In an effort towards promoting interdisciplinary dialogue on the topic of language competition, this study describes the concepts and methods used in mathematical models of language competition and assesses whether these concepts and methods are becoming more similar over time to those used by linguists. To this end, studies that proposed mathematical models of language competition were systematically retrieved and analysed. Change over time in those models was first assessed concerning the way they are specified, including the parameters they contain. Next, it was checked whether models were increasingly fitted to empirical data. Finally, change in the disciplines covered by the journals where those models were published was evaluated. Results show that overall, models have been including few sociolinguistic parameters, have been relying little on empirical data, and have been mostly published in journals covering the fields of mathematics and physics. However, the last years have seen an important turnaround along each of these three axes. A common language seems to be emerging between fields regarding mathematical models of language competition, which should prove instrumental in informing efforts made towards preserving the world’s linguistic diversity.


Introduction
S ince at least the 1950s, linguists have been playing a leading role on the language preservation scene. They were the first to sound the alarm about the perplexing pace at which languages are becoming extinct worldwide (Dorian, 1981;Hale et al., 1992;Krauss, 1992). They have been at the forefront of efforts made towards preserving the world's linguistic diversity, notably within international organisations such as the United Nations Educational, Scientific and Cultural Organization (UNESCO) (Grenoble and Whaley, 2006;Moseley, 2010). Finally, linguists have continued to save moribund languages from oblivion through extensive fieldwork and documentation (Crystal, 2000;Seifart et al., 2018).
Recently, scientists in formal and natural sciences have also started to show interest in the problem of language death, albeit from a different perspective. Starting in the early 1990s, they have been proposing mathematical models that aim to establish the mechanisms through which languages go extinct (Seoane and Mira, 2017). Knowledge gained from such models could be tapped into to help to slow down the rate of language extinction worldwide but have yet to gain significant interest among linguists and other researchers in the humanities and social sciences. This situation could be due to a lack of overlap between the concepts and methods used in those models and those used by linguists. In fact, researchers interested in the question already suggested that it is "because models have not sought to engage with the intellectual framework used by linguists" that "the real influence of mathematical modelling has been severely limited in the field of language revitalization" (Fernando et al., 2010, p. 49).
In an effort towards promoting interdisciplinary dialogue on the topic of language competition, this study aims to describe the concepts and methods used in mathematical models of language competition, assess how models have changed over time, and determine whether there is any common language that is emerging between the different fields of science in modelling how languages become extinct.
To achieve this aim, a systematic review of the literature on mathematical models of language competition was performed. Information was retrieved from relevant studies and organised along three axes. The first axis concerns the methods used for modelling language competition. This point is important because mathematical models of language competition were initially developed based on methods that already existed in formal and natural sciences, but not necessarily in linguistics. However, research on models of language competition could provide practitioners involved in language preservation more practical tools if it relied more on methods that are common to the different disciplines involved. Below, the methods on which models were built are characterised concerning the level of analysis (macroscopic, mesoscopic, microscopic), the form of the model (equation vs. simulation-based), the linguistic composition of the population being modelled, and the parameters included.
The second axis concerns the use of data for validating models. Mathematical models of language competition are often purely theoretical and do not take empirical data as input. Conversely, linguistic analyses of language endangerment are typically datadriven. In the second part of the results section, studies on language competition are analysed concerning whether they validate their models against empirical data and whether there is any trend towards greater use of such data. Also, for each study that fitted its model to empirical data, a detailed account of the languages considered is provided, alongside the regions (or countries) covered.
The third axis concerns the way that research on mathematical models of language competition is communicated among different disciplines. Poor communication has long been recognised as an impediment for successful collaboration between disciplines (Bracken and Oughton, 2006). If publications on language competition appear predominantly in journals covering formal and natural sciences, linguists and social scientists are less likely to take note of them. On the other hand, mathematical models of language competition might more easily reach linguists and social scientists if they are published in multidisciplinary journals or journals covering the fields of linguistics or social sciences. Furthermore, an increase in the number of publications on mathematical models of language competition in journals covering the fields of linguistics and social sciences could be an indication of a certain appropriation of such models by linguists and social scientists. The third part of the results section therefore provides an analysis of the disciplines covered by the journals in which models of language competition were published.
Other reviews have already covered some of the studies that are reviewed here (Gong et al., 2014;Stauffer (2006a, 2006b); Seoane and Mira, 2017;Solé et al., 2010;Vogt, 2009;Wang and Minett, 2005). This study differs from those on at least three points. First, the focus of most of the previous reviews was broader than the focus of the present review. Those reviews discussed studies that used mathematical models to solve different problems allying languages and population dynamics, including-but not limited to-language competition. In contrast, our review concentrates specifically on language competition, which allows a more in-depth discussion of the relevant literature. Second, none of the previous reviews employed a systematic approach at retrieving and analysing articles. As a result, they could not provide a completely balanced and objective picture of the literature, which this study aims at doing. Third, the goal of the previous reviews was primarily to describe the main findings of the studies they covered. The present review mainly focuses on the methodological innovations that have taken place in the field over the years, which we hope will help us achieve our aim formulated above.

Methods
Search strategy. Methods follow the Prisma statement for reporting systematic reviews (Moher et al., 2009). Relevant articles were retrieved using the search engines of Arxiv, Scopus and Web of Science. Articles must have included in their title the word "language" in combination with at least one of the following words: death, competition, extinction, endangerment, shift, disappearance, invasion, revitalization, coexistence, survival, or preservation. To avoid finding too many articles from psycholinguistics-which are not relevant for our purposes-articles must not have included in their title the following words: competence, teaching, learning, processing, disorder, acquisition and comprehension. We further specified that the word "model" should appear in either the title, abstract or keyword list. Truncation was used to allow for different words that share the same root and meaning. To avoid overseeing any important work, articles that cited two seminal studies in the field were searched using the Scopus search engine. These two studies are those of Baggs and Freedman (1990) and Abrams and Strogatz (2003). The reference lists of selected articles were also checked, and experts in the field were consulted for missing articles. Eligible studies must have been research articles, thus excluding book chapters, and must have been written in English. Searches were performed on the 25th of May 2020 and refreshed on the 27th of July 2020. The exact word strings used in searches are provided in Table S1 of the supplementary materials.
Article selection and data extraction. Figure 1 shows how studies were selected for review. The different search streams provided a total of 821 records, of which 624 remained after duplicates were removed. The first round of screening was performed based on titles and abstracts. A total of 134 articles with mathematical models of language competition as a topic was selected for a complete assessment. At this stage, more articles were excluded if they assessed the sensitivity or qualitative properties of existing models without proposing a new one (model analyses), if they were not published in the form of a research article, or if they did not have sufficient information about the type of model they presented. This resulted in 56 studies left for data extraction, to which six were added from the reference lists of the selected studies or based on expert knowledge, resulting in a total of 62 studies selected for analyses.
We extracted from the 62 selected studies information about authors' names, year and journal of publication, the level of analysis (macroscopic, mesoscopic, microscopic), the composition of the population concerning its linguistic groups, the parameters included in the models, and the languages and region or country of interest covered by the data (in case empirical data were used). Model parameters were considered as being the same if they denoted similar concepts despite being called differently. For example, the parameter "Status", which is sometimes used to refer to the socioeconomic position of the speakers of a given language, was considered as equivalent to the parameter "Prestige". A full account of the parameters included in the analyses and what they were called in the different studies is given in Table S2 of the supplementary materials.

Analytical framework
Evolution of methods. The way that methods evolved over the years is accounted for using four characteristics. The first one refers to the level of analysis. We distinguish between the macroscopic, mesoscopic and microscopic levels. Studies on models of language competition, and mathematical models of social processes in general, commonly distinguish between the macroscopic and microscopic approaches (Castellano et al., 2009;Stauffer and Schulze, 2005). Macroscopic models describe population processes from an aggregate level. Results translate mean outcomes and do not account for the variability among individuals. An example of an early macroscopic model is provided by Abrams and Strogatz (2003), who use a differential equation to describe change over time in the proportion of speakers among two competing languages and apply their model to different situations of language competition around the world. Microscopic models, on the other hand, describe population processes considering each individual separately. This approach confers certain advantages compared to the macroscopic approach. More specifically, microscopic models allow to explore the full range of an outcome instead of just its mean. These models further allow for stochasticity, which can play an important role among smaller populations. Finally, they allow to explicitly consider interactions between individuals. An example of an early microscopic model is provided by Castelló et al. (2006), who define a population of agents speaking either language A or B, or both (bilingual agents). Agents interact with each other and acquire a second language under the influence of repeated contact with the speakers of another language, or abandon an already known language due to a lack of interaction with the speakers of that language. Studies less often consider mesoscopic models as a separate approach. In practice, mesoscopic models are similar to microscopic models in that both consider each individual separately and allow for stochasticity and interactions. However, in mesoscopic models, the unit of analysis is not the speaker like in microscopic models, but rather groups of speakers, or even entire languages. An example of an early mesoscopic model is provided by Schulze and Stauffer (2005), who define a set of bit-strings where each different string represents a different language. The strings-or languages-can duplicate themselves, which leads to an increase in the number of speakers of that language, they can undergo mutations, which represents the phenomenon of language birth, or they can disappear, which represents language death. This kind of model is usually used to study the world's current languages distribution concerning their numbers of speakers, i.e., to explain why a few languages are spoken by a large share of the world's population while most languages are spoken by only a fraction of it.
The second characteristic used to account for the way that models have evolved refers to the type of model or the form of the equation used. Six types were identified. We first consider three types of models that are based on ordinary differential equations: ordinary differential equations models (ODE), Lotka-Volterra models (LV), and reaction-diffusion model (RD). ODEs refer to the simpler form of ordinary differential equations models. These models estimate the change in the proportion of speakers belonging to a language as a function of time and some parameters and can be fitted to data that include only limited information. In these models, the total population is normalised, meaning that these models assume that it is not the absolute size of a linguistic group that matters, but its respective proportion in the whole population. It follows that in an ODE model with two linguistic groups, growth in one group means a decline in the other one. LV models are also based on ordinary differential equations, but additionally include information about group size, allowing the size of a linguistic group to change independently from its proportion in the whole population. Diffusion terms can be added to ordinary differential equations (including to LV models) to allow speakers to spread in space, in which case the model will be referred to as an RD model.
We further distinguish models that were developed using a system dynamics (SD) approach. These models offer the same possibilities as an ordinary differential equations model but are expressed in terms of stocks and flows rather than in terms of equations, which may increase their accessibility to nonmathematicians (Wyburn and Hayward, 2009). Furthermore, such models often make explicit some parameters that otherwise remain implicit in ordinary differential equations models.
Studies that rely on ODE, LV, RD and SD models usually model language competition from a macroscopic level. Among those studies that took a mesoscopic or microscopic approach, most relied on agent-based (AB) models or a conversation game (CG) framework, or a combination of both. AB models simulate interactions between speakers according to a set of preestablished rules. Due to the complex nature of these interactions, these models often provide insights that could not be obtained using equation-based models. CG, on the other hand, is a form of game theory that concentrates on decisions made by speakers during conversations. Speakers' information about the world is imperfect and leads them to make unexpected choices in multilingual settings. This approach was used on its own or in combination with agent-based models, in which case language shift depends on a series of one-on-one interactions. One study could not be assigned to any of the model types, presenting results from a simulation based on a series of equations. It will be referred to as "Other simulations" (OS).
The third characteristic refers to the composition of the population concerning its linguistic groups, i.e., their number and whether bilingualism is considered. Among the studies reviewed here, competition is either considered among two, three or four languages, or among hundreds or even thousands of them. While the former refers to competition in a specific country or a region, the latter refers to competition in the whole world. Bilingualism refers to the explicit modelling of speakers that are fluent in two or more languages.
The fourth and final characteristic refers to the parameters included in each model. All models consider at least one parameter that drives growth in the number of speakers of one language to the expense of the number of speakers of another language. The universe of such parameters is presented in Table 1. To facilitate analyses, parameters were assigned to three broad categories. The first category includes parameters that do not directly predict the intensity of the shift between languages but instead specify the form in which it occurs, i.e., whether it occurs through a horizontal, vertical or exogenous transmission. Horizontal transmission occurs from adults to adults, vertical transmission (also called intergenerational transmission) occurs from parents to their children, and exogenous transmission occurs via institutions such as schools. The second category refers to geodemographic parameters. These parameters describe how speakers occupy the space, reproduce, migrate and die. The third category refers to sociolinguistic parameters. These include the prestige conferred to speakers of a given language, the language to which speakers identify, and language use itself, e.g., the level of fluency. Some parameters directly influence shift. For example, speakers are more likely to shift towards languages that are spoken by larger numbers of speakers. Other parameters have an indirect influence on shift. For example, groups of speakers with higher birth rates will have higher numbers of speakers not because more births induce more speakers, but because more births allow for more vertical transmission.
Data usage. Information on data usage includes languages and countries (or regions) covered by the data, for each study. We consider that a study made use of data if it explicitly mentions the languages and country (or regions) covered by the data, and if data minimally reflects the numbers or proportions of speakers of two or more competing languages at two or more points in time, or transitions intensities between languages at different points in time. Conversely, we consider that a study did not make use of data in cases where the model is strictly theoretical, or in cases where data only reflects proportions of speakers at one point in time, which is often the case in studies that aim at reproducing the world's languages distribution according to their number of speakers. A distinction is made between studies that used data on a specific population for the first time or for a second time or more. These will be referred to as first and second analysis, respectively. Populations are considered different when they either speak a different combination of languages or speak a similar combination of languages in different countries (or regions). For example, several studies considered competition between Welsh and English in Wales. The study that first used data on the combination Welsh/English in Wales is considered as a first analysis. All subsequent studies that used data on the combination Welsh/English in Wales are considered as second analyses.
Discipline coverage. Scopus' Subject area classifications were used to assign a scientific discipline to each journal in which studies were published (Scopus, 2020). These "classifications" include a total of 26 disciplines covering the whole of the health, life, physical and social sciences. Scopus may assign more than one discipline to each journal. We only consider the first of those disciplines, which is also the main one. Journals in which the studies analysed here were published cover eleven of Scopus' 26 disciplines, namely: Agricultural and biological sciences; Arts and humanities; Biochemistry, genetics and molecular biology; Chemistry; Decision sciences; Economics, econometrics and finance; Engineering; Mathematics; Multidisciplinary; Physics and astronomy; and Social sciences. To facilitate analyses, these eleven categories were merged into six: Biology; Economics; Linguistics and other humanities; Multidisciplinary; Physics and Mathematics; and Social Sciences. The exact correspondences are presented in Table S3 of the supplementary materials.

Results
Number of publications by year. According to our selection, the first mathematical model of language competition was proposed in 1990. It is however during the mid-2000s that this topic started to gain in popularity, mostly as a result of Abrams and Strogatz' influential study published in 2003. Studies on models of language competition continue to be more or less regularly published to this day, with between zero and eight publications per year since 2005. Information on the number of publications per year can be derived from Figs. 2 and 3 presented below.
Evolution of methods. Table 2 presents a comprehensive overview of how methods for modelling language competition evolved over the last thirty years. Studies are ordered chronologically. References in column 1 are colour-coded to represent the level of analysis. Column three shows the model employed, while column four shows the composition of the population according to its REVIEW ARTICLE HUMANITIES AND SOCIAL SCIENCES COMMUNICATIONS | https://doi.org/10.1057/s41599-020-00683-9 linguistic groups. Column five contains tiles which show the parameters used in each model. Information is also included about data usage, where darker tiles indicate the use of data, and about the journal of publication's main discipline, in column two. These last two points will the focus of the next two subsections and will not be commented on further here.
Early models of language competition took a macroscopic approach, and this approach remained popular throughout the observation. Most of these models are based on a system of ordinary differential equations, though many studies published between 2005 and 2014 used models of the types Lokta-Volterra, reaction-diffusion or system dynamics. Models that took a microscopic approach started to appear from 2005 onwards. This approach continued to be regularly utilised over time, relying either on agent-based or conversation games models or on both. The mesoscopic approach, on the other hand, was mostly used between 2005 and 2007 in the form of agent-based models. These focused on large numbers of languages as they were mostly used to study the world's linguistic diversity.
Turning to the parameters included in each model, a vast majority of models considered horizontal shift between speakers of different languages, while about half as many considered vertical shift. A few studies considered exogenous influences on language shift, while a few did not include any shift parameter but Family formation, mating Captures the fact that speakers who share a common language are more likely to have children with each other than speakers who do not share a common language. Environment Urban/rural, media Captures the possible influences of the environment on language shift, such as the fact of living in a city or a rural area, or the fact of being exposed to media in a given language. Similarity Distance Captures the fact that speakers are more likely to learn each other's languages if the languages that they speak are similar to each other, effectively leading to more bilingualism. Attrition Memory, forget Captures the fact that speakers may forget a language that they once knew upon ageing, effectively leading to less bilingualism. Change Mutations Captures changes in the structure and vocabulary of a language over time. May lead to language birth. Accommodation Language use, personality type, diglossia Captures the efforts made by speakers to adapt to an interlocutor or a specific situation, thus influencing language choice among bilinguals. Competency Fluency, proficiency, skills Captures the variation in fluency among speakers of a second language, which may influence language choice. rather concentrated on the choice of different languages in different contexts using a conversation game framework. The choice of language shift parameters does not appear to have significantly varied over time or to systematically change according to the type of model used.
Regarding geodemographic parameters, the parameter "number of speakers" was included in an overwhelming number of studies. That is, most studies let shift between linguistic groups depend at least in part on the size of those groups: large groups thus tend to become larger and smaller groups tend to become smaller. Many studies additionally considered the births and deaths of speakers as a factor influencing the size of linguistic groups. Many studies considered the role of space, acknowledging the fact that speakers who are separated by a greater physical distance have less influence on each other than speakers who are closer to each other. Among the studies that considered space, more than half considered movement, a feature often modelled through means of reaction-diffusion models. Few studies considered the role of migration or borders. A small number of studies, all based on the mesoscopic approach, assumed that some speakers have a higher propensity to reproduce than others, i.e., they have higher fitness. Five studies distinguished different phases in the lives of speakers by including ageing in their models. The use of features such as borders, fitness and ageing went from rare to almost inexistent towards the end of the period of observation.
The most commonly included sociolinguistic parameter is prestige, followed closely by loyalty. These two parameters can be considered as two opposing forces influencing language shift: speakers shift to a new language due to its prestige but remain loyal to the already known one for convenience or cultural reasons. These two parameters are to be found in most models that took a macroscopic perspective, as well as in some microscopic models, but are absent from mesoscopic models. The parameter loyalty tends to appear more in models that were proposed from 2012 onwards. The parameter language change, on the other hand, appeared in many mesoscopic models published in the mid-2000s but was almost completely abandoned afterwards. In fact, language change was mainly operationalized in the form of random mutations affecting artificially created languages (e.g., bit-strings) in the context of mesoscopic models. Other sociolinguistic parameters have become more common over time. This is the case of the parameters network topology, homophily, environment, accommodation and competency. These were often included in microscopic models, or in models that took a system dynamics approach. To finish, Fig. 2 Studies breakdown by year and data usage. Colours indicate whether a study used data or not, and if so, whether it used data on a specific population for the first time (First analysis) or for a second time or more (Second analysis). Populations are defined according to the languages they speak and the region or country they inhabit.  Table S3 of the supplementary materials).

REVIEW ARTICLE
HUMANITIES AND SOCIAL SCIENCES COMMUNICATIONS | https://doi.org/10.1057/s41599-020-00683-9 Table 2 Chronological list of studies broken down by type of approach, model employed, composition of the population by linguistic groups, use of data and parameters.  the parameter similarity was sporadically used throughout the observation, in the context of ordinary differential equations or mesoscopic agent-based models.

ShiŌ parameters Geodemographic parameters Socio-linguisƟc parameters TOTAL
Data usage. Figure 2 displays for each year of interest counts of studies according to whether the models they presented were fitted to data, and if so, whether they used data on a specific population for the first time (first analysis) or for a second time or more (second analysis). A total of 22 studies fitted their models to empirical data. Of these, five did so before 2010, while seventeen did so in 2010 or later. Strikingly, eight of the nine studies published in 2019 and 2020 were fitted to empirical data. Table 3 lists the languages and regions or countries for which data was used. Panel A lists the languages and the corresponding countries (or regions) that were covered in studies that modelled competition between two languages, while Panel B lists the languages and the corresponding countries (or regions) that were covered in studies that modelled competition between more than two languages. Studies modelled competition among fifteen pairs of languages, five combinations of three languages, and one combination of four languages. Twenty-seven countries or regions were covered. Different studies sometimes modelled competition between similar languages, either in a similar or a different country (or region). The same languages also sometimes appeared in different models in combination with different languages. Notably, eighteen studies considered competition involving the English language. Most studies considered language competition in economically developed countries, including European and North American countries, but also Hong Kong and Singapore.
Discipline coverage. Figure 3 breaks down the selected studies according to the disciplines covered by the journals in which they were published. Journals covering the fields of physics and mathematics have by far been the most popular outlets for studies on models of language competition, covering 36 publications out of a total of 62. Journals covering the field of biology and journals covering the field of social sciences come second, each with seven publications. Journals covering the field of economics, those covering the field of linguistics and other humanities, and multidisciplinary journals each published four studies on mathematical models of language competition. The dominance of journals covering the fields of physics and mathematics seems to be slowly diminishing: their share went from 75% before 2010 to 40% from that year onwards. Initially, publications in journals covering the field of biology tended to replace journals covering the fields of physics and mathematics. However, the picture is more diversified concerning the later period, with a higher number of publications in journals covering the fields of economics, linguistics and other humanities, and social sciences. We note that one particular journal was relied on heavily as an outlet for publishing models of language competition. Namely, Physica A: Statistical Mechanics and its applications published sixteen of the 62 publications reviewed here. In comparison, the second most common outlet is the International Journal of Modern Physics C, with five publications, while most other journals are represented once or twice (Table S3, supplementary materials).

Discussion
In which direction should mathematical models of language competition evolve so that they engage better with the intellectual framework used by linguists, and thus better inform the current efforts made towards preserving the world's linguistic diversity? As seen above, mathematical models of language competition are appearing more and more in journals covering disciplines outside formal and natural sciences. This section aims at providing an answer to the question raised above by looking into how those studies published in the fields of humanities and social sciences differ from those published in the formal and natural sciences. Then, a discussion is provided as to whether models, in general, are becoming more similar over time to those recently developed in the fields of humanities and social sciences.
Comparison of models across disciplines. Results presented above suggest that models published in journals covering the fields of linguistics and other humanities exhibit properties that are different from models published in journals covering other disciplines. Table 4 shows how models differ between those two groups of disciplines which we, respectively, refer to as Humanities and social sciences and Formal and natural sciences. Those differences can be summarised the following way: compared to models published in journals covering formal and natural sciences, models published in journals covering humanities and social sciences rely more on a microscopic rather than a macroscopic approach, consider more often bilingualism as a separate state, privilege the use of sociolinguistic parameters over geodemographic ones, and are slightly more often based on data.
First, the more common use of the microscopic approach among linguists and social scientists might be explained by the fact this approach allows to model interactions between speakers more explicitly and accommodates a larger number of parameters. Some of the models of this class recently proposed by linguists and social scientists indeed include a large number of parameters, allowing for a high degree of precision (Civico, 2019;Karjus and Ehala, 2018). These models, furthermore enhanced by the use of data for calibration, allow to unravel the multiple pathways through which language shift can occur, and what can be done about it.
Second, the importance of bilingualism was recognised relatively early by linguists interested in questions relating to language competition (Minett and Wang, 2008). This recognition comes from observations made on the field stating that speakers rarely shift directly from one language to another, but instead go through a phase of bilingualism (Appel and Muysken, 2005). Increasingly, however, it seems that formal and natural scientists are recognising this fact as more of their models published recently are considering bilingualism (Heinsalu et al., 2014;Seoane and Mira, 2017).
Third, it speaks for itself that linguists and social scientists rely more on sociolinguistic parameters than formal and natural scientists do. Two parameters appear much more often in publications in journals covering humanities and social sciences: homophily and competence. The former refers to the greater likelihood that unions are formed among people who speak a similar language than among people who speak different languages. This is important since bilingual unions often lead to bilingual children and as noted above, bilingualism is an important vector of language shift (Appel and Muysken, 2005). The other parameter, competence, is probably equally important since the degree of fluency of speakers in one language will have a direct impact on the frequency of its usage, and the less a language is used, the more likely it is to become extinct. Other sociolinguistic parameters often encountered in models developed by linguists and social scientists include network typologies, which refers to the fact that not all speakers are equally likely to talk to each other due to, for example, shared interests and acquaintances, and loyalty, which we discuss below.
Fourth, one crucial aspect of the efforts made towards preserving the world's linguistic diversity consists in making inventories of extant languages in terms of their number of Table 3 Languages covered in studies based on empirical data, with corresponding countries (and regions between parentheses, if applicable) and references. Templin (2019) Welsh Wales Abrams and Strogatz (2003); Wyburn and Hayward (2008, 2018; Fort and Perez-Losada (2013); Zhang and Gong (2013); Isern and Fort (2014); Barrett-Walker et al. (2020) Wales ( speakers (Ethnologue, 2020). Language preservation is, in this sense, a highly applied and quantitative field. Though validation against empirical data is not a prerequisite for mathematical models of language competition to be insightful, it remains an important step that has often not been made, especially in studies published in journals covering the fields of physics and mathematics.
Is the discrepancy between sciences in the techniques used for modelling of language competition due to a lack of engagement among formal and natural scientists with the intellectual framework used by linguists? Part of the explanation for this discrepancy, which was previously raised by Fernando et al. (2010), could lie in the origins of mathematical models of language competition. Many of these models were first developed to answer questions relating to natural phenomena, for example about the transmission of infectious diseases among living species, the spread of particles in space, or predator-prey dynamics (Kandler and Unger, 2018;Prochazka and Vogl, 2018). It will not come as a surprise, therefore, that these models focus in the first place on population dynamics and interactions between individuals and their environment-thus on geodemographic parameters-rather than on sociolinguistics phenomena. Obviously, when applying their models to questions relating to language competition, formal and natural scientists included parameters that reflect sociolinguistic realities. For example, early models included, next to population size, the notion of prestige as an important factor influencing shift from one language to another (e.g., Abrams and Strogatz, 2003). However, these models often failed to also consider the fact that speakers sometimes resist to shifting to a more prestigious language due to loyalty towards their heritage language, an important factor in efforts made towards language preservation (Thomason, 2015). Other early models considered language similarity (e.g., Mira and Paredes, 2005) or language change (e.g., Stauffer and Schulze, 2005). However, these concepts were often implemented based more on mathematical convenience than on phonological or syntactical properties of the languages involved, making them highly vulnerable to criticism from the linguistic perspective.
Comparison of models over time. The first mathematical model of language competition was developed thirty years ago, but until ten years ago, almost all models were developed by formal and natural scientists. The last ten years have seen more models developed by linguists and social scientists, but these continue to form only a minority of all models. Taken as a whole, have models evolved to resemble more to those recently published by linguists and social scientists?
Results presented above suggested change over time in a few characteristics inherent to models of language competition. Table 5 quantifies this change concerning the year 2010, allowing to divide models into two roughly equal numbers. The characteristics considered are the same as in Table 3, which was presented in the previous subsection. Change between the two periods is rather minor concerning the approach taken and the consideration of bilingualism as a separate state. However, there are clear increases in the use of sociolinguistic parameters and on the reliance on data. Part of this increase could be explained by the fact that after 2010, linguists and social scientists started to more regularly develop their own models of language competition. As we saw above, models developed by linguists and social scientists tend to differ considerably in the parameters they use and their use of data. And indeed, as shown in Table 4, the number of publications in journals covering the fields of humanities and social sciences doubled between the period pre-2010 and the period post-2010. Linguists have been increasingly relying on methods developed in the formal and natural sciences over the last years (Bromham, 2017) and this trend seems to apply to the field of language competition as well. By doing so, linguists and social scientists may have contributed to changing practices in the field of mathematical modelling of language competition. However, analyses shown in Table S4 of the Supplementary materials suggest that practices have also started to change among formal and natural sciences, as we notice there an increase in the mean number of sociolinguistic parameters over time. We could thus be witnessing the emergence of a common language between fields concerning mathematical models of language competition.
Conclusion: a new wave of research on mathematical models of language competition? Analysing studies published in the last thirty years on models of language competition, two facets can be distinguished. First, mathematical models of language competition, which were initially developed in the formal and natural sciences, have established themselves as a powerful tool for understanding language competition and language death, but have been slow in reaching the linguistic community. This can be seen in the fact that most models are published in journals covering the fields of physics and mathematics, but also in the fact that those models often lack a complete or accurate representation of the key processes affecting language competition, and in the fact that they are rarely validated against data. These assertions, however, seem to represent the reality of models of language competition less and less well. Since the last ten years, the way language competition is modelled has changed considerably. Models published recently typically consider a greater number of sociolinguistic parameters and are more often validated against data than before. This change is in part because more linguists and social scientists have started to show Table 4 Overview of models characteristics with respect to the discipline covered by the journal of publication. d Totals sum up to 64 since they concern the number of models rather than studies (see Table 2).
interest for models of language competition, harnessing their unique set of expertise, but also in part to the fact that formal and natural scientists themselves have started to adopt the concepts and terminology used by linguists. In conclusion, though a gap can still be noticed between disciplines concerning mathematical models of language competition, important steps have been made in recent years towards reducing it. A new wave of research on mathematical models of language competition is forming and is set to prove instrumental in informing efforts made towards preserving the world's linguistic diversity.

Data availability
The list of all selected articles is provided in Tables S2 and S3 of the supplementary materials.
Received: 9 August 2020; Accepted: 19 November 2020; d Totals sum up to 64 since they concern the number of models rather than studies (see Table 2).