This article reviews a range of sociolinguistic concepts and their applications in multimodal studies, in relation to how language has been conceptualized in sociolinguistics. While there are reviews of specific areas of research in sociolinguistics, including prosody and sociolinguistic variation (Holliday, 2021), language and masculinities (Lawson, 2020), and language change across the lifespan (Sankoff, 2018), few review works have set out to delineate the most fundamental ontological questions in sociolinguistic studies: what is language, and what constitutes it? How do sociolinguists perceive language in relation to other semiotic resources that are part and parcel of social meaning-making and social interaction? Relevant discussions are scattered, mostly in passing, across the introductory sections of various sociolinguistic works, such as Blommaert (1999), García and Li (2014) and Makoni and Pennycook (2005). However, no review article has dealt systematically with the changing perceptions of language in sociolinguistic studies.

These issues are worth pursuing because, although sociolinguistics studies language, no review has addressed what actually constitutes language, especially in relation to a wider range of semiotic resources. What makes such a review even more imperative is that, in an increasingly globalized and high-tech world, linguistic practices are complicated by the super-diversity of ethnic fluidity, communications technologies, and globalized cross-cultural art.

Centring on the ontological perception of language in sociolinguistics, this article consists of five sections. After the “Introduction” section, the next section will review traditional (socio)linguistic perceptions of language as written or spoken signs or symbols that people use to communicate or interact with each other. The next section will review representative sociolinguistic approaches that place language in multimodal settings which involve the relationship between language and other semiotic resources. They are categorized as the conceptualizations of “language in multimodal construct” and “language as multimodal construct”. These conceptualizations share the common feature that language is not researched merely in terms of written and spoken signs and symbols, but it is probed (1) in relation to its multimodal contexts and (re)contextualization (regarding language in multimodal construct), (2) in terms of its own materiality and spatiality, and linguistic representations of multimodality, for instance, social (inter)action and “smellscapes” (Pennycook and Otsuji, 2015a) which are in turn conflated with linguistic features (regarding language as multimodal construct). The penultimate section and the last section will present a critical reflection and a conclusion of the review, respectively.

Language as written and spoken signs and symbols

What constitutes language(s)? Saussure (1916) distinguishes between langue and parole. The former refers to the abstract, systematic rules and conventions of the signifying system, while the latter represents language in daily use. Chomsky (1965) refers to them as competence (corresponding to langue) and performance (corresponding to parole). Chomsky (1965) assumes that performance is bound up with “grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of this language in actual performance” (Chomsky, 1965, pp. 3–4). He advocates that the agenda of linguistics should be the study of competence of “an ideal speaker-listener, in a completely homogeneous speech-community, who knows its (the speech community’s) language perfectly” (in brackets original). His conception of the ideal language rules out the “imperfections” arising from the influences of social or pragmatic dimensions in real language use. This can be seen as the conception of language as innate human competence. By contrast, constructionists have argued that language cannot be separated from the societal and social domain; social reality is constructed through languages (Berger and Luckmann, 1966), and linguistics should take social dimensions into account, as shown by Systemic Functional Linguistics developed by Halliday. These approaches to language studies, nevertheless, do not pay much attention to the ontological issues of language or linguistics concerning what constitutes language, whether languages can be separated from each other, and whether there are different conceptions of language(s).

Sociolinguistics, taking as its departure an interdisciplinary attempt to be the sociology regarding linguistic issues or linguistics regarding sociological issues, faces the ambivalent positioning of whether it should be sociologically oriented (that is, more explanatory) or linguistically oriented (that is, more descriptive) (Cameron, 1990). Also, there are contentions regarding whether more attention should be paid to epistemically linguistic minutiae (as in conversation analysis or CA), or to the macro-social interpretation of ideology not necessarily dependent on the evident orientation of the participants (as in critical discourse analysis, or CDA), as debated in Blommaert (2005) and Schegloff (1992, 1998a/1998b, 1999). As such, more sociolinguists than linguists in other disciplines are concerned with the ontology of language regarding its nature and its relation with broader social structures. In other words, such concerns can, firstly, justify the identity of sociolinguistics being either a branch of sociology, or linguistics, or even more broadly, anthropology. They can also delineate the contour of the macro vis-à-vis micro research subjects: are languages seen as separate systems, or inseparable but relatively fixed systems or an integrated construction in relation to their social dimensions of power, ideology and hegemony?

Such ontological concerns are important, because different approaches to research may be engendered accordingly. For instance, variational sociolinguistics is concerned with the linguistic differences within a language (standard language vis-à-vis its variations in dialects) and examines how these differences are linked to social aspects of linguistic practices, such as gender and social status. These differences within a certain category of language may be placed in the changing situations of various language communities or areas (e.g., Labov, 1963, 1966), or in contextualized pragmatic situations (Agha, 2003; Eckert, 2008). Assumptions of separable or separate languages may be well-encapsulated in the works regarding language ideology and linguistic differentiation, such as the studies by Kroskrity (1998), Irvine and Gal (2000), as well as considerable other works on bilingualism or multilingualism. These works treat language as belonging to different standard systems (e.g., English, French, German, and so on) and can be pursued by “enumerating” these categories. In other words, these standard language systems are seen as having clear boundaries between them, and language can be researched by attributing different linguistic resources to (one of) these systems. The stance of the inseparability of language problematizes the enumeration of languages, by discrediting their explanatory potential in linguistic practices. In pedagogical contexts, transnational students are found using language features beyond the boundaries of language systems (Creese and Blackledge, 2010; Lewis et al., 2012). In the context of youth or urban culture, there are loosely fixed assumptions between language and ethnicity (Maher, 2005; Woolard, 1999). In some globalized contexts, new communications technologies as well as globalization itself are changing the traditional power structure in linguistic practices (Jacquemet, 2005; Jørgensen, 2008; Jørgensen et al., 2011). 
Furthermore, Makoni and Pennycook (2005), by advocating the disinvention of languages, problematize the process of “historical amnesia” (Makoni and Pennycook, 2005, p. 149) of bi- and multilingualism, and their tradition of enumerating languages, which reduces sociolinguistics to at best a “pluralization of monolingualism” (Makoni and Pennycook, 2005, p. 148). However, this does not mean that languages cannot be probed as standard categories. The position is more intricate: on the one hand, it problematizes the separation of languages, as language is characterized by fluidity in multi-ethnic settings; on the other hand, it assumes the fixity of the relationship between a given (standard) language and its corresponding identity, ethnicity, and other societal factors (Otsuji and Pennycook, 2010). Fluidity and fixity, however, are not binary attributes that exclude each other; they coexist and mutually influence each other in real-life linguistic practices. By the same token, Blackledge and Creese (2010) and Martin-Jones et al. (2012) also hold a dynamic view on language and identity: while language functions as “heritage” (see Blackledge and Creese, 2010, pp. 164–180) and serves the positioning or maintenance of national identity, this bond frequently loosens, as it is always contested, resisted and “disinvented” (Makoni and Pennycook, 2005). Table 1 illustrates three kinds of sociolinguistic conceptualizations of language.

Table 1 Different conceptualizations of language as a verbal form in social linguistics.

The above discussion briefly delineates how contemporary sociolinguistic studies attempt to capture the complex ways in which the notion of language is construed, resisted or reinvented in and through practices. Most of these approaches are based on the traditional assumption of language as written and spoken signs and symbols in its verbal forms. Other forms of resources are generally seen as contexts in which these verbal signs and symbols occur. They are contextual facets that contribute to the ideological and sociological corollary of language use, but they are not seen as ontological components in linguistics. Later developments, which integrate multimodal studies into sociolinguistics, show differing stances regarding the ontology of language, as shown in the next section.

Language in vis-à-vis as multimodal construct

Jewitt (2013, p. 141) defines multimodality as “an inter-disciplinary approach that understands communication and representation to be more than about language”. This should be seen as a definition oriented toward social semiotics, in which different semiotic resources are seen as various modes of representation or communication through semiosis. For a sociolinguistic version of the definition, we prefer to interpret it as language in vis-à-vis as a multimodal construct. By using the word “construct”, we would like to point out that multimodality or multimodal conventions enter into sociolinguistic studies because they are socially constructed; that is, sociolinguists research these multimodal dimensions because they are semiotic resources and practices which are constructed by social subjects with power, manipulation and ideology. They are not neutral resources by which people communicate information or by which the process of meaning-making, or semiosis, is realized. Instead, they are a social construct that constitutes the type of Foucauldian knowledge in which sociological power and ideology lie at the core. In this sense, the notions, frameworks, and approaches that we discuss as follows are socially critical in nature and are predominantly related to socially constructed ideologies such as hegemony, power, and identity. As Makoni and Pennycook (2005) note, languages are “invented” by the dominant (colonial) groups through classification and naming in history; they are not neutral practices and they are constructed and invested with ideologies, power and inequality. Sociolinguistics thus needs a historically critical perspective. In fact, since its birth, sociolinguistics has been a discipline focusing on language use in relation to socially critical issues, such as gender, race, class and politics. 
This focus dates back at least to Labov’s (1963, 1966) ethnographical research on variation in English on the island of Martha’s Vineyard, Massachusetts and in New York City, in which sound change and phonetic features are studied in relation to ethnicity, social stratification and class. Agha (2003) and Eckert (2008) also probe phonetic features or regional changes of variation in relation to ethnicity and social and economic status.

In fact, the above-mentioned concerns of sociolinguistics are also consistent with CDA (see Wang and Jin, 2022; Wang and Yang, 2022), especially multimodal critical discourse analysis (MCDA), which also contributes to the research trend of language in multimodality. Kress and van Leeuwen (1996) postulate a visual grammar based on systemic functional grammar. Machin (2016), Machin and Mayr (2012) and other scholars have also applied MCDA to various types of discourse. Semiotic resources other than language are analysed to reveal the social construct of power, ideology, and inequality in relation to verbal resources (Wang, 2014, 2016a, 2016b). Language in the multimodal construct in sociolinguistics is quite similar to the social semiotic and critical discourse approach to multimodality: language is seen as one type of resource, amongst other non-language resources (visual, aural, embodied, and spatial) in the meaning-making process. The difference is that sociolinguistic approaches toward language in multimodality focus much more on social interaction, power and ideology, and their research frequently includes ethnographical data and observations. Language as a multimodal construct, by contrast, sees language as a more integral part of multimodal resources, and vice versa; less distinct boundaries are seen as existing between languages and non-languages. These two trends of conceptions are discussed below.

Language in multimodal construct

To place language studies in the multimodal construct is not a new practice in sociolinguistics. Agha (2003, p. 29) analyses the Bainbridge cartoon, treating accent not as “object of metasemiotic scrutiny”, but as an integral element in “the social perils of improper demeanour in many sign modalities” such as dress, posture, gait and gesture. His discussion demonstrates how language studies can be embedded in a larger multimodal scope: language is contextualized by its peripheral multimodal paralinguistic sign systems. In Eckert (2008, p. 25), the process of “bricolage” (Hebdige, 1984), in which “individual resources can be interpreted and combined with other resources to construct a more complex meaningful entity”, is linked to the style and language variations which reflect social meaning. She gives examples of how the clothing of students at Palo Alto High School affords them certain types of styles to convey social meaning. The research of Eckert (2001), Coupland (2003, 2007) and other scholars represents the “third-wave” sociolinguistic studies, which see the use of variation in terms of personal and social styles (Eckert, 2012). Language and other semiotic resources together constitute a stylistic complex that makes social meaning and constructs social styles and identities. Goodwin (2007) extensively encompasses multimodal interaction in the examination of participation, stance and affect in a “homework” interaction between a father and his daughter, where gaze, gesture, and the spatial environment are taken into account. Goodwin’s research is partly premised on Bourdieu’s (1991, pp. 81–89) association of bodily hexis with habitus, a notion that is itself multimodal. The deployment of different bodily modes in different contexts of participation (such as homework, archaeology, and surgery) depends on conventions of various social practices or their respective habitus.

Research regarding language in multimodal construct shares some common ground with the social semiotic approach towards multimodality. First, in communication, there are different modes of resources or semiotic types that convey social meaning and embed ideology. Second, these resources consist of language and “non-language”: the former being written or spoken signs and symbols that social actors use to communicate, and the latter being the visual, aural, or embodied resources in which language is situated. Third, meaning-making is done through the orchestration of these resources.

In contrast to social semiotic approaches, language in the multimodal construct, as a sociological and sociolinguistic approach with an anthropology-oriented concern, usually bases itself on ethnographical observations of social interaction. Language is seen as a component in social interactional discourse; other semiotic modes or resources are also important resources through which language use is contextualized. To be more specific, language in multimodal construct is concerned with language as one type of semiotic resource that is placed in multimodal contexts in the following aspects:

First, meaning-making through other resources is seen as “add-ons” to that of language. In other words, language indexes social meaning and ideology in collaboration with other types of resources. An example is Agha’s (2003) analysis of the Bainbridge cartoon in which clothes, demeanour, and even body shape work in collaboration with accent in conveying register and social status. Second, language as one type of social meaning-making resource can be conceptualized in relation to the meaning-making process of other resources. For example, the process of “bricolage” is probed in relation to variations with their indexed styles and social categorization in terms of “gender and adolescence” (Eckert, 2008, p. 458). This concept is used to offer clues regarding how “the differential use of variables constituted distinct styles associated with different communities of practice” (Eckert, 2008, p. 458). Third, language is one of the communicative modes in social interactional discourse. It does not necessarily take the central role, because other types of resources, such as gestures, gaze, and the environment where these actions take place, jointly constitute the social meaning-making process. This can be best encapsulated in Goodwin’s (2007) analysis of the “homework” interaction between a father and his daughter. In this quite mundane interactional discourse, the father uses different embodied actions to negotiate different moral and affective stances through the “homework interaction” with his daughter. Conversation as a linguistic resource plays a role in the interaction, while embodied actions are key factors in affecting these stances.

Language as a multimodal construct

A slightly different approach to studies of language in multimodal contexts is to view it as a multimodal construct: either in the way that language is considered as autonomously constituting the semiotic texture (e.g., in the art form of the “text art” where text is also seen as picture) or in the way that some traditionally assumed extra-linguistic modes are considered as special forms or dimensions of language. This trend of research includes recent studies on language in space, social interactional multimodal discourse analysis, and new concepts or conceptualizations of language in society, as discussed below.

Language in space: semiotic landscape, place semiotics, and discourse geography

Jaworski and Thurlow (2010) review the notion of spatialization, that is, the semiotics and discursivity of space, and the extension of the notion of the linguistic landscape. In so doing, they frame the concept of semiotic landscape as encapsulating how written discourse interacts with other multimodal discursive resources, with the boundaries between them blurred.

In their opinion, space is “not only physically but also socially constructed, which necessarily shifts absolutist notions of space towards more communicative or discursive conceptualizations” (Jaworski and Thurlow, 2010, p. 7). Sociological research on space thus is more oriented toward spatialization, “the different processes by which space comes to be represented, organized and experienced” (Jaworski and Thurlow, 2010, p. 6). This spatialization—as represented discursively—is intrinsically multimodal:

Echoing the sentiments of Kress and van Leeuwen quoted at the start of this chapter, Markus and Cameron argue that ‘[b]uildings themselves are not representations’ (p. 15), but ways of organizing space for their users; in other words, the way buildings are used and the way people using them relate to one another, is largely dependent on the spoken, written and pictorial texts about these buildings… Architecture and language (spoken and written) may then form an even more complex, multi-layered landscape (or cityscape) combining built environment, writing, images, as well as other semiotic modes, such as speech, music, photography, and movement…(Jaworski and Thurlow, 2010, pp. 19–20)

The “spatial turn” (Jaworski and Thurlow, 2010, p. 6) in sociolinguistics thus adds the analytical dimensions of multimodal resources to the traditional concept of the linguistic landscape. Written language itself does convey social meaning and ideologies, while it is situated in materiality (the materials it is written on) and spatiality (the places where it appears). The concept of the semiotic landscape blurs the traditional boundary between language and non-language.

Unlike social semiotic approaches towards multimodality, researchers of semiotic landscape pay predominant attention to the “metalinguistic or metadiscursive nature of ideologies” (Jaworski and Thurlow, 2010, p. 11). In Kallen’s words, the concept of semiotic landscape starts from the assumption that “signage is indexical of more than the ostensive message of the sign” (Kallen, 2010, p. 41); signage indexes ideologies that are embedded in, or indicated by, different types of space or spatiality: city centre, tourist places, districts and so on. Less interest is invested in the process of semiosis regarding how different modes of signs are orchestrated to communicate information, which is one of the primary endeavours of social semiotics (Li and Wang, 2022; Wang, 2014, 2019; Wang and Li, 2022). As such, in ethnographical studies or data analysis, language, materiality, and spatiality are usually seen as interwoven with each other, with no distinct boundaries in between; or at least, boundary-marking is not the primary concern of semiotic landscape.

In the same vein, Scollon and Scollon (2003, p. 2) coin the term “geosemiotics” (or “place semiotics”) which is “the study of the social meaning of signs and discourses and of our actions in the material world”. Their research objects are signs in public places. The conceptual framework of “geosemiotics” sees language as a multimodal construct in terms of the following aspects. First, verbal language is analysed by using social semiotic approaches to visuals. Code preference (regarding which language is seen as “primary” language) shown on signs or buildings is analysed by using Kress and van Leeuwen’s (1996, p. 208) conception of compositional meaning indexed by different positions in pictures. Second, language is seen as multimodal itself. Language on signs or buildings is analysed in terms of the multimodal inscription (see Scollon and Scollon, 2003, pp. 129–142) that includes fonts, letter form, material quality, layering and state changes. Third, the emplacement (referring to meaning-making through positioning signs in different places) in geosemiotics, similar to Jaworski and Thurlow’s (2010) approach towards the semiotic landscape, is predominantly concerned with spatiality and metalinguistic or metadiscursive ideology, rather than the interaction and orchestration of different modes (language vis-à-vis non-language) in semiosis.

Similar to the concepts of semiotic landscape and place semiotics, Gu (2009, 2012) postulates the framework of four-borne discourse and discourse geography. Based on Blommaert’s (2005, p. 2) view of discourse as “language-in-action”, Gu analyses the language and activities in social actors’ trajectories of time and space in the land-borne situated discourse (LBSD): a type of discourse categorized by Gu (2009) according to different types of spatiality as carriers and places where the discourses take place. In Gu’s (2012) conceptualizations, language and discourse are metaphorically spatialized: language is seen in terms of the place where it takes place. Multimodality is evaluated based on space (Gu, 2009). Though it is arguable to what extent language is seen as a conflation of modes or semiotic attributes in Gu (2009), his work demarcates an ambivalent boundary between language and the “non-language”. Also, in “spatializing” language as discourse geography, it represents language and discourse as a PLACE or SPACE metaphor that is multimodal itself. In addition, it analyses the translation between different modes, for instance, the “modalization” of written language into visuals and sounds; visuals are also seen as forms of “modalized” language and vice versa. As such, Gu (2009) also represents the “spatial turn” of sociolinguistics which can be seen as the research trend that regards language as multimodal construct.

In general, the trend to spatialize language and discourse (or the “spatial turn”), with concepts or frameworks such as semiotic landscape, place semiotics, and discourse geography, treats language as multimodal construct in the following two aspects. First, it focuses on metalinguistic or metadiscursive ideologies that are embedded in different modes of signs or symbols; Gu’s research also metaphorically theorizes social interaction through multimodality. Second, it posits that language itself is multimodal or modalizable in meaning-making. Written language has its multimodal dimensions, such as the facets of its inscription, including fonts, letterform, material quality, layering and state changes (Scollon and Scollon, 2003). Different forms of language are multimodal in terms of spatiality: they can be naturally multimodal and aural-visual, for instance in televised discourse; written language can also be “modalized” into visuals (Gu, 2009, p. 11). Overall, language is considered either as signs in the spatialized system or as actions in trajectories of activities. It is an integral part of multimodal construct, where other modes (visual, gesture, action, and so on) are not peripheral or auxiliary but frequently also belong to linguistic resources, for instance, the visual resources in text arts.

Multimodal studies from the social interactional perspective

There are sociolinguistic approaches towards multimodality that combine social interactional sociolinguistics (Goffman, 1959, 1963, 1974), social semiotic approach towards multimodality (Kress and van Leeuwen, 1996), and intercultural communication (Wertsch, 1998). We summarize these approaches as multimodal studies from the social interactional perspective, which include mediated discourse analysis (Scollon and Scollon, 2003) and multimodal interaction analysis (Norris, 2004); the latter grew out of the former.

Multimodal studies from the social interactional perspective focus on people’s daily actions and interactions, and the environment and technologies with(in) which they take place. This trend of research sees discourse as (embedded in) social interaction and sets out to investigate social action through multimodal resources used in daily interaction, such as gestures, postures, and language (see Jones and Norris, 2005). In Norris’s (2004) framework for multimodal interaction analysis, the units of analysis form a system of layered and hierarchical actions, including lower-level actions, such as an utterance of spoken language, a gesture, or a posture, and higher-level actions, which consist of chains of lower-level actions. Norris (2004) also coins the term “modal density” to refer to the complexity of modes a social actor uses to produce higher-level actions.

The focus on hierarchical levels of actions and the concept of “modal density” entail reflections on the question of what constitutes mode and language. Language in multimodal interaction analysis is seen as a type of lower-level action amongst other embodied resources that are at interactants’ disposal. These embodied resources are seen as different modes, such as gesture, gaze, and proxemics. Arguably, however, gestures and gazes in Norris (2004) are also seen as forms of language in interaction. Furthermore, regarding the mode of spoken language, Norris (2004), as in her other works, methodologically treats it as a multimodal construct where the pitches and intonation are visualized through various fonts in the wave-shaped annotation, along with the policeman’s gestures, as shown in Fig. 1.

Fig. 1: Selected example from Norris (2004, p. 8).

The policeman’s spoken language is treated as a multimodal construct where the pitches and intonation are visualized through various fonts in the wave-shaped annotation, along with his gestures.

Multimodal studies from the social interactional perspective, similar to other sociolinguistic approaches to multimodality, target the meta-modal or metadiscursive facets of ideology. This is done through a bottom-up approach, that is, by examining general social categories such as power, dominance and ideology through people’s daily (inter)action. This trend of research focuses on basic units of action in people’s daily interaction; its conception of mode and language is oriented toward seeing language as multimodal, and its methodological treatment of language also shows this orientation. Multimodal studies from the social interactional perspective are intended to reveal the ideology and power embedded in language as action. Overall, they perceive language as a multimodal construct in social (inter)action.

Metrolingualism, heteroglossia, polylanguaging and multimodality

In the second section of the paper, we mentioned the works on some similar notions such as metrolingualism and polylanguaging. In this section, we will review the latest application of the notion of metrolingualism in multimodal analysis and discuss why other related notions or approaches also encapsulate the conceptualization regarding language as a multimodal construct.

Metrolingualism is a concept postulated by Otsuji and Pennycook (2010) originally referring to “creative linguistic conditions across space and borders of culture, history and politics, as a way to move beyond current terms such as multilingualism and multiculturalism” (Otsuji and Pennycook, 2010, p. 244). Their later works (Pennycook and Otsuji, 2014, 2015a, 2015b) develop the concept and reformulate it as a broader notion encompassing the everyday language use in the city and linguistic landscapes in urban settings.

In Pennycook and Otsuji (2014, 2015b), metrolingualism involves the practice of “metrolingual multitasking” (Pennycook and Otsuji, 2015b, p. 15), in which “linguistic resources, everyday tasks and social space are intertwined” (Pennycook and Otsuji, 2015b, p. 15). Metrolingualism thus is not only concerned with the mixed use of linguistic resources (from different languages), but also with how language use is embedded in broader multimodal practices, such as (embodied) actions accompanying or included in the metrolingual process, the (changing) spaces or places where these actions and language use occur, and the objects in the environment. Pennycook and Otsuji (2015b) include an olfactory mode in their analysis of the metrolingual practices in cities. Smell is represented through linguistic or pictorial signs in the city and suburb to constitute “smellscapes” in relation to social activities, ethnicities, gender and races. Metrolingual smellscapes are represented through the conflation of written and visual signs and symbols (e.g., street signs), social activities (e.g., buying and selling, and riding a bus), objects (e.g., spices), and places or spaces (e.g., suburb markets, coffee shops, buses and trains). The conventional distinction between language and non-language is less important, or not at issue here, as smells have to be represented through language or visuals, and resources other than languages are also conceptualized as metrolingual.

Language in Pennycook and Otsuji’s (2014, 2015a, 2015b) conception of metrolingualism, in this regard, is seen as being integrated into different types of activities and actions; it is also spatialized in the sense that metrolingual practice is seen as involving the organization of space, the relationship between “locution and location” (Pennycook and Otsuji, 2015b, p. 84), and the (historical) layers of cities (Pennycook and Otsuji, 2015b, p. 140). As discussed in earlier sections, this spatialization is intrinsically multimodal.

In relation to metrolingualism, Jaworski (2014) briefly reviews the history of arts and writing and selects the art form of “text art” as his research subject. Referring to the notion of metrolingualism, he sees these art forms as “metrolingual art”, where language interacts with other modes or is seen as part of the visual mode. He suggests that it would be useful to “extend the range of semiotic features amenable to metrolingual usage to include whole multimodal resources” (Jaworski, 2014, p. 151). The multimodal representations in text art are realized by the mixing, meshing and queering of linguistic features, as well as by their relation to a “melange of styles, genres, content, and materiality” (Jaworski, 2014, p. 151). In this regard, the multimodal affordances (Kress, 2010; Jewitt, 2009) realized by materiality (e.g., paper, cloth, or walls on which the language is written), media (e.g., soundtrack, video, moving images, etc.), and styles (e.g., fonts, letterform, layering like add-ons or decorations) are an integral part of metrolingualism. Subsequently, he postulates that it would be useful to align the concept of heteroglossia with metrolingualism, so as “to extend the idea of metrolingualism beyond ‘hybrid and multilingual’ speaker practices (Otsuji and Pennycook, 2010, p. 244) and move towards a more ‘generic’ view of metrolingualism as a form of heteroglossia” (Jaworski, 2014, p. 152). In this way, metrolingualism relates the subject position taken by the producers of text art to their social orientation or alignment as regards power, domination, hegemony, and ideology in a broader social realm.
This is also in line with Bailey’s discussion of heteroglossia: “(a) heteroglossia can encompass socially meaningful forms in both bilingual and monolingual talk; (b) it can account for the multiple meanings and readings of forms that are possible, depending on one’s subject position, and (c) it can connect historical power hierarchies to the meanings and valences of particular forms in the here-and-now” (Bailey, 2007, pp. 266–267; also quoted in Jaworski, 2014, p. 153). Overall, Jaworski (2014) shows how metrolingualism and heteroglossia can be used to analyse features of language and their place in multimodal construct. He also discusses how other notions which are similar to metrolingualism may bear a relationship with multimodality in that they stress “the importance of linguistic features (rather than discrete languages) as resources for speakers to achieve their communicative aims” (Jaworski, 2014, p. 138).

Apart from the concepts of metrolingualism and heteroglossia, Jaworski (2014) touches upon the relationship between polylanguaging and multimodality, but he does not elaborate on it. Jørgensen (2008) demonstrates how polylanguaging is concerned with the use of language features in language practice among adolescents in superdiverse societies. Some of these language features “would be difficult to categorize in any given language” (Jørgensen et al., 2011, p. 25); that is, they do not belong to any standard language system (e.g., English, Chinese, German). In addition, emoticons are frequently used in communication via social networking software. If some of these language features do not belong to any given language, it is difficult to say whether they can be seen as language at all. The attention to features of language hence blurs the boundary between language and other semiotic resources. Of course, these features can be seen as a type of linguistic (lexical, morphemic or phonemic) unit that still belongs to language, but they are frequently used in multimodal meaning-making. Below I use Jørgensen et al.’s (2011, p. 26) example (Fig. 2) to illustrate this.

Fig. 2: Selected example of polylanguaging from Jørgensen et al. (2011, p. 26).

The “majority boy” makes use of resources from the minority’s language (the word “shark”).

Jørgensen et al.’s analysis of this example focuses on the “majority boy” using the word “shark”, which is a loan word from Arabic. As a majority member, he is using the minority’s language, to which he is not entitled. Judging by the interaction, it can be seen that “both interlocutors are aware of the norm and react accordingly” (Jørgensen et al., 2011, p. 25). As such, they note that one feature of polylanguaging is “the use of resources associated with different ‘languages’ even when the speaker knows very little of these” (Jørgensen et al., 2011, p. 25).

What also needs attention, but is not discussed by Jørgensen et al. (2011), is the interlocutors’ creative way of using these features in polylanguaging: the word “shark” is written as a prolonged “shaarkkk”, which alters both its phonetic and its visual effect. This creative configuration of the language feature “shark” functions to draw other interlocutors’ attention toward the polylanguaging practice. The emoticon “:D” following it demonstrates that the speaker knows that he is violating the “normal” rules in using these language features; that is, he is using minority language features to which he is not entitled. The repeated words “cough, cough”, followed by the emoticon “:D”, also demonstrate this.

Polylanguaging, as formulated by Jørgensen et al. (2011), departs from the multilingualism tradition of enumerating languages and focuses instead on language features that may not belong to any given language. In this sense, the emoticons or creative configurations of words can also be seen as language features: features that are creatively used by a virtual community of (young) netizens in communication. These features are multimodal in the following respects. First, they visualize the polylanguaging practice by creating new forms of words, for instance, the prolonged word “shaarkkk”. This creation is itself a process of polylanguaging, in the sense that it uses the features of common language, or language in people’s daily life (that is, non-cyber language), to create a new cyber-language that is used by members of a virtual community. Second, these language features utilize the multimodal resources of embodiment in polylanguaging. For example, emoticons use different letters or punctuation marks (as language features from people’s daily written language) to represent different facial expressions and emotions. The repetition of the words “cough, cough”, as “a reference to a cliché way of expressing doubt or scepticism” (Jørgensen et al., 2011, p. 27), also takes on an embodied stance. It shows that the interlocutors are aware that the majority boy is using the minority’s language, to which he is not entitled. Hence, this embodied stance indexes the polylanguaging practice. To summarize, polylanguaging entails seeing language as a multimodal construct, as interlocutors creatively adapt language features in daily communication (face-to-face or written communication not involving the internet) or utilize embodied language features when polylanguaging in online communication.

Discussion and a critical reflection

In the sections “Language as written and spoken signs and symbols” and “Language in vis-à-vis as multimodal construct” above, we delineated the ontological perceptions of language in sociolinguistics, including language as spoken and written signs and symbols and language in vis-à-vis as a multimodal construct. Teasing out the various approaches shows that language in sociolinguistics has undergone several stages of development. Language as spoken and written signs and symbols has been pursued in variational sociolinguistics, bi- and multilingualism, and the latest theoretical and conceptual trends of research that do not see language as separate and separable systems or codes. Language in sociolinguistics, however, has been predominantly placed in nuanced and complicated relationships with other semiotic resources. Research regarding language in multimodal constructs sees language and non-language resources as different modes, or types of resources. These different modes have boundaries, and efforts are made to see how the modes combine with one another in meaning-making; language itself is a distinctive type of mode, interdependent with but different from other modes. Research regarding language as a multimodal construct sees language itself as multimodal: language is spatialized (that is, probed in relation to the various spatialities and materialities in which it appears); in the social interactional approach to multimodality, it is embodied and seen as embedded in a layered and hierarchical system of modes (including gesture, posture, and intonation) in social interaction; and in the latest concepts built on languaging, language is regarded as “inventions” (Makoni and Pennycook, 2005) and as cross- and trans-cultural practice, instead of separable and enumerable codes or systems.
Language is entangled and integrated with objects (for instance, signage and the materiality where it appears) and is involved in multitasking with embodied resources (gestures, talking, and simultaneously doing other things).

Expanding the ontology of language from verbal resources toward various multimodal constructs has enabled sociolinguists to pursue meaning-making, indexicalities and social variations in their most authentic state. Language itself is multimodal, though it cannot be denied that language and other modes do have boundaries and distinctions (though not always). Whenever a language is spoken, the stresses, intonations, and paralinguistic resources are all integrated into it. Focusing on language per se has generated fruitful outcomes in sociolinguistic studies, but placing language among multi-semiotic resources has innovated the field and has become the dominant trend in contemporary sociolinguistics. Both conceptualizations, language in and language as multimodal construct, have captured the complex ways in which language interacts with multi-subjects, materiality, objects and spatiality. However, the latest research in sociolinguistics increasingly sees language itself as an intricate multimodal construct, as encapsulated by various new concepts and theories including translanguaging, metrolingualism, and polylanguaging, in the contexts of globalization, migration, multi-ethnicity, and new communication technologies. Language is seen not only as separable codes and systems spoken or written by different groups of people; it also entails a wider range of communicative repertoires, including embodied meaning-making, objects and the environment where the written or spoken signs are placed. It may hence be speculated that sociolinguistics will be increasingly less concerned with the boundaries between language and non-language resources, and will focus more on social constructs, social meaning, and language as a force in social change.
The enumerating and separating way of studying language and multimodality (that is, delineating inter-semiotic boundaries and focusing on how modes of communication are combined in meaning-making) has generated various outcomes, especially in the field of grammar-oriented social semiotic research and MCDA. However, contemporary sociolinguistic studies have immensely expanded their scope toward a wider range of areas beyond the discursive, grammatical, and communicative. The three research paradigms regarding language as a multimodal construct reviewed in “Language as multimodal construct” have proved to be feasible approaches toward language in social interaction, geo-semiotics, and language use in ethnographical and multi-ethnic settings. The ontology of language in sociolinguistics, in this regard, may be perceived in terms of the sociological and societal facets of multimodal construct, rather than language placed in a multitude of semiotic types or the verbal resources per se. A critical reflection on the ontology of language is one of the prerequisites of innovation in contemporary linguistics, and it is also the objective of this comprehensive review.


As can be seen from the above discussion, there are several versions of the perception of language in sociolinguistics. First, perceptions of language as a written or verbal system are moving from, or have moved from, the enumerating traditions of bi- or multilingualism towards seeing language as an inseparable entity with fixity and fluidity. In other words, new approaches in sociolinguistics come to see languages as comprising different features, repertoires, or resources, rather than different or discrete standard languages such as English, French, German and so on. The negotiation, construction, or attribution of ethnicity, identity, power and ideologies through language has also taken on a more dynamic and diverse look. Second, there is sociolinguistic research that places language within the multimodal construct. Language is seen as being contextualized by other semiotic resources that are seen as “non-language”. However, more research comes to see language as multimodal construct; that is, language, be it written or spoken, is multimodal in itself as it comprises multimodal elements such as type, font, materiality, intonation, embodied representations and so on. It is also activated (seen as actions or activities) or spatialized in different approaches such as mediated discourse analysis, multimodal interaction analysis, geosemiotics, semiotic landscape, and metrolingualism discussed earlier. Third, these changing perceptions of language in sociolinguistics result from researchers’ innovative efforts to view language from different perspectives. More importantly, they arise from the fact that language itself is also changing as society changes. As mentioned at the beginning, the world has been increasingly globalized and communications technologies have fundamentally changed the ways people interact with each other.
Linguistic practices are complicated by the super-diversity of ethnic fluidity (e.g., the diversity of ethnic groups and the ever-present changes in ethnic structure), communications technologies, and globalized cross-cultural art.

In sum, it can be argued that contemporary sociolinguistics has become increasingly concerned with languaging (trans-, poly-, metro-, pluri- and so on), rather than with languages as a type of (static and fixed) verbal resource with demarcated boundaries separating them from other multimodal resources. Language is multimodal; it is embedded in or represents social activities, places or spaces, objects, and smells. Language in society belongs to and constitutes the “semiotic assemblage” (Pennycook, 2017) that can be better analysed holistically so as to reach an understanding of “how different trajectories of people, semiotic resources and objects meet at particular moments and places” (Pennycook, 2017, p. 269). At a fundamental level of sociolinguistic ontology, this trend of research reflects the changing ways in which sociolinguists come to understand what language is and how it should be understood as part of a more general range of semiotic practices.