Introduction

The flipped classroom (FC), also known as the “inverted classroom”, is a pedagogical approach that first emerged in the 1980s and came into more widespread use in the 2000s (Baker, 2000; Bergmann and Sams, 2012; Khan, 2012). It has gained prominence as advances in technology afford increasing opportunities for ubiquitous access to a variety of online resources. The FC model removes in-class lectures, freeing up classroom time for more in-depth exploration of topics through discussion with peers or problem-solving activities facilitated by instructors. The removed content is often delivered to learners through pre-class materials such as video recordings. As a result, in the FC, learning activities that are active and social occur inside the classroom, while most information transmission occurs outside the classroom. Today, the FC has been implemented in many different disciplines and in schools and universities around the world (Akçayir and Akçayir, 2018).

Proponents of the FC assert its pedagogical merits on several fronts. First, it alleviates the constraints associated with requiring all learning to happen at the same time and place, furnishing learners with an individualized education that enables flexible online study at their own pace as long as an internet connection is available (Hung, 2014). Second, it allocates class time to the cultivation of learners’ higher-order cognitive skills, emphasizing application, analysis, and evaluation, as opposed to lower-order skills such as knowledge and comprehension (Brinks-Lockwood, 2014; Lee and Wallace, 2018). Third, in contrast to traditional lecturing, the FC is a student-centered approach emphasizing engagement and active learning (Steen-Utheim and Foldnes, 2018), fostering students’ autonomy by endowing them with heightened responsibility for their learning (Brinks-Lockwood, 2014; O’Flaherty and Phillips, 2015).

Vygotsky’s social constructivism (1978) has frequently been adopted as a theoretical foundation for designing learning experiences in technologically rich environments (Marzouki et al., 2017), and this framework highlights the particular benefits of technology-enhanced FC pedagogy (Jarvis et al., 2014). As mentioned above, in an FC model, learners can watch pre-recorded videos in their own time before class to remember basic information and understand concepts as they prepare for classroom activities. The higher-order skills of analyzing, applying, evaluating, and creating can then be practiced collaboratively and interactively in class under the guidance of a teacher, thus facilitating progression within the learners’ zone of proximal development.

Since its introduction in foreign language teaching (FLT) in China in 2011, the FC has attracted increasing research attention and has been welcomed by foreign language teachers (Yan and Zhou, 2021). Over the past decade, the Ministry of Education of the People’s Republic of China has exerted increasing pressure on higher education institutions to transition from traditional teacher-centered lecture-style approaches to innovative methods integrating technology and the internet, with the goals of enhancing learning, sustaining student engagement, and improving student satisfaction (Ministry of Education of the People’s Republic of China, 2021). The FC model, which combines traditional face-to-face teaching with personalized online learning, has emerged as a popular strategy in China to meet ministry requirements while delivering cost-effective and learner-centered curricula in response to the increasing student enrollment in higher education.

Despite the wide adoption of FCs in FLT in China, literature reviews about their implementation and effects have been notably scarce in the last decade. A search of the China National Knowledge Infrastructure (CNKI), the largest national research and information publishing company housing China’s most extensive academic database, revealed only three reviews published prior to the end of 2021: Deng (2016), Qu (2019), and Su et al. (2019). These reviews primarily focused on FCs in the context of English as a foreign language (EFL) education, overlooking most of the over 100 foreign languages taught in Chinese higher education. As a result, these reviews fell short of delivering a comprehensive analysis of research pertaining to FCs, and the reliability and generalizability of their findings in non-EFL contexts are questionable. Moreover, the reviews by Deng (2016) and Su et al. (2019) included all published papers without establishing clear inclusion and exclusion criteria. For example, they did not exclude articles that made a passing or token reference to the FC model, short papers of only one or two pages in length, book reviews, or editorials. Qu’s study (2019), on the other hand, was constrained in scope to articles within the Chinese Social Sciences Citation Index (CSSCI), a sub-database developed by Nanjing University of China Academy of Social Sciences Research Evaluation Center and the Hong Kong University of Science and Technology, and thus omitted relevant contributions from other academic journals. The CNKI incorporates both the CSSCI and the Core Journals of China (CJC), an equally significant sub-database overseen by the Peking University Library and experts from relevant institutions. Because Qu’s review excluded the CJC, its scope and potential limitations warrant reevaluation.

Thus, there remains a need for a comprehensive synthesis of the existing studies on FCs in FLT within Chinese higher education over the past decade. Because studies conducted in China are published in Chinese and confined to Chinese academic journals, their visibility is restricted, which makes it difficult for international researchers and practitioners to access and comprehend this body of literature. Such understanding among the global academic community is necessary for exploring both the strengths and limitations of FCs in diverse cultural and linguistic contexts.

Research method

The current study adopts a scoping review approach based on the methodological framework developed by Arksey and O’Malley (2005) to provide both quantitative and qualitative data for researchers and practitioners.

A scoping review is a relatively new approach to synthesizing research data which has been gaining popularity in many disciplines (Davis et al., 2009; Daudt et al., 2013). It is often undertaken as an independent project when a research area is complex, and no review of that area has previously been made available. A scoping review serves to highlight the relevant literature to researchers with the aim of rapidly mapping the key concepts characterizing a research area and the main sources and types of evidence available (Arksey and O’Malley, 2005; Mays et al., 2005; Levac et al., 2010). According to Arksey and O’Malley (2005), this kind of review addresses four goals: to examine the extent, range, and nature of research activity; to determine the value of undertaking a full systematic review; to summarize and disseminate research findings; and to identify research gaps in the existing literature. The scoping review is increasingly being employed in the field of foreign language education to provide a comprehensive view of FLT studies, identify implications for theory and pedagogy, or inform subsequent in-depth reviews and empirical studies (Chan et al., 2022; Hillman et al., 2020; Tullock and Ortega, 2017).

The difference between a scoping review and a narrative or traditional literature review lies in the transparency of the review process. A narrative review usually depends on the author’s own knowledge or experience to describe the studies reviewed and uses an implicit process to provide evidence (Garg et al., 2008). The reader cannot determine how much literature has been consulted or whether certain studies have been ignored due to contradictory findings. A scoping review, in contrast, uses an explicit, rigorous, and systematic approach to retrieve relevant articles to ensure the transparency and replicability of the data extraction process. For example, the methodological framework developed by Arksey and O’Malley (2005) for conducting a scoping study comprises five stages: identifying the research questions; identifying relevant studies; selecting studies for inclusion; charting the data; and collating, summarizing, and reporting the results. By presenting the process and results in an accessible and summarized format, reviewers are in a position to illustrate the field of interest in terms of the volume, nature, and characteristics of the primary research, enabling researchers, practitioners, and policymakers to make effective use of the findings.

Figure 1 presents the process of the scoping review in the current study based on the five-stage methodological framework developed by Arksey and O’Malley (2005).

Fig. 1 Process of the scoping review.

Process of the scoping review

Identifying research questions

This scoping review is driven by four research questions:

RQ1. What is the current state of FC research in FLT within the context of higher education in China?

RQ2. What research methods and instruments have been employed in the included FC studies?

RQ3. What research foci and trends are displayed in the included FC studies?

RQ4. What are the major findings of the included FC studies?

RQ1 aims to provide an overview of studies on FCs in FLT in Chinese higher education by reporting basic information about existing publications, such as the number of publications per year and the distribution of publications by foreign language context. RQ2 leads to a classification of the research methods and instruments used to collect data in FC research. RQ3 explores the topics and trends in FC research over the past decade with the help of the literature visualization and analysis tool CiteSpace 5.8.R3. RQ4 reveals the effects of the FC model on direct and indirect educational outcomes, learners’ satisfaction with FCs, and the factors influencing the impact of FCs, as documented in the reviewed sources.

Searching for relevant studies

To be as comprehensive as possible in identifying primary evidence and to ensure the quality of the published articles, we searched both CSSCI and CJC in the CNKI database. The key search terms were developed and categorized based on two dimensions according to the purpose of the review. One dimension related to teaching or learning in FCs, while the other dimension related to the types of foreign languages. The key search terms and search methods are listed in Table 1.

Table 1 Key search terms.

As the FC approach was introduced into FLT in China in 2011, the search included articles published between 2011 and 2021. Further inclusion and exclusion criteria were developed to focus on the scope of the review; these are outlined in Table 2.

Table 2 Inclusion and exclusion criteria.

Study selection

Figure 2 shows a flowchart of the study selection process, which consisted of four phases: searching the databases; identifying the total number of articles in each database; screening titles, abstracts, and full texts; and selecting eligible articles for inclusion.

Fig. 2 Flowchart diagram for article selection.

The final database search was conducted on January 16, 2022, and resulted in the identification of a total of 333 articles. Subsequently, all potentially relevant articles went through a three-step screening process. The first step excluded 9 duplicates. The second step excluded irrelevant articles by screening titles and abstracts; 37 articles were removed at this stage as they were book reviews, conference proceedings, reports, editorials, or other non-refereed publications. The third step filtered articles by screening full texts; 54 articles were excluded because they made only passing reference to the FC or were not related to higher education. This meticulous selection yielded a corpus of 233 articles suitable for in-depth analysis, each of which was scrutinized by the authors to confirm its suitability for inclusion. During the selection process, the 233 articles were also systematically categorized into two groups: 131 non-empirical and 102 empirical studies. The non-empirical studies were further divided into two subcategories. The first type was literature reviews; the second type was those drawing on personal observations, reflections on current events, or the authority or experience of the author (Dan, 2021). The empirical studies used a variety of systematic methods of collecting materials and analyzing data, including quantitative methods (e.g., survey, correlational research, experimental research) and/or qualitative methods (e.g., interview, case study, record keeping, observation, ethnographic research) (Dan, 2021).
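The screening arithmetic described above can be checked with a short script. This is a minimal sketch that uses only the counts reported in the text; the variable names and the wording of the exclusion reasons are illustrative and are not taken from the study’s own materials.

```python
# Counts reported in the study-selection paragraph; labels are paraphrased.
identified = 333
excluded = {
    "duplicates": 9,
    "non-refereed items (title/abstract screening)": 37,
    "passing reference or not higher education (full-text screening)": 54,
}

# Apply the three screening steps in order and track what remains.
remaining = identified
for reason, count in excluded.items():
    remaining -= count
    print(f"after removing {reason}: {remaining}")

# The included corpus is then split into two groups.
groups = {"non-empirical": 131, "empirical": 102}
assert remaining == sum(groups.values())
```

Running the script confirms that the three exclusion steps and the two-way split are mutually consistent with the reported corpus size.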

Data charting and collation

The fourth stage of Arksey and O’Malley’s scoping review framework is the charting of the selected articles. Summaries of each study were developed. For all studies, these summaries included the author, year of publication, citations per year, foreign language taught, and a brief description of the outcomes. For empirical sources, details related to the research design, study population, and sample size were also provided. Tables 3 and 4 list the top ten most-cited non-empirical and empirical sources. In Table 4, which references experimental and control groups in results summaries, the experimental group (EG) was the group that took courses in the FC model, while the control group (CG) took courses in a traditional classroom.

Table 3 Summary of top ten most-cited non-empirical studies.
Table 4 Summary of top ten most-cited empirical studies.

Results and analysis

In accordance with the fifth stage of Arksey and O’Malley’s framework for a scoping review, the findings from the 233 included studies are summarized and discussed in the following three sections. The first summarizes basic information regarding the included studies; the second presents a holistic analysis of the research foci and trends over time using keyword clustering analysis and keyword burst analysis; and the third offers an in-depth content analysis focusing on the categorization of the included studies and discussion of the major findings.

Basic information on the included studies

Distribution by year of publication

As Fig. 3 shows, the first studies on FCs in the field of FLT in China emerged in 2013. The number of such studies began to steadily increase and reached a peak in 2016 and 2017. Although there was some decrease after that, the FC model has continued to attract research attention, in line with global trends. According to Akçayir and Akçayir’s (2018) review of the literature on FCs published in Social Sciences Citation Index (SSCI) journals as of 31 December 2016, the first article about the FC was published in 2000, but the second was not published until more than a decade later, in 2012; 2013 was also the year that FC studies became popular among scholars. A possible explanation for this increase in interest is the growing availability of internet technologies and the popularity of online learning platforms, such as MOOCs and SPOCs (Small Private Online Courses), along with the view of the FC as a promising model that can open doors to new approaches in higher education in the new century.

Fig. 3 Number of articles published by year.

Distribution by foreign language

Figure 4 shows the distribution of foreign languages discussed in the FC literature. The FC model was mainly implemented in EFL teaching (93%), which reflects the dominance of English in FLT in Chinese higher education. Only five articles discussed the use of FC models in Japanese teaching, while one article was related to French teaching. Ten non-empirical studies (4%) reported the feasibility of FC models in FLT without mentioning a specific foreign language.

Fig. 4 Distribution by foreign language type.

Research methods of the included studies

Figure 5 shows a breakdown of the methodologies adopted by the studies included in our review. Among the 131 non-empirical studies, three were literature reviews, while the remaining 128 (55%) were descriptive studies based on the introduction of the FC model, including descriptions of its strengths and associated challenges and discussions of its design and implementation in FLT.

Fig. 5 Methodological paradigms.

Of the 102 empirical studies, 60 (26%) used quantitative methods for data collection, eight (3%) used qualitative methods, and 34 (15%) used mixed methods. It is interesting to note that although quantitative methods are more common in FC studies, seven of the top ten most-cited empirical studies (as listed above in Table 4) used mixed methods. A potential reason may be that research findings collected with triangulation from various data sources or methods are seen as more reliable and valid and, hence, more accepted by scholars.
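The shares quoted for each methodological paradigm appear to be expressed relative to all 233 included studies rather than to the empirical or non-empirical subsets. The sketch below recomputes them under that assumption; the dictionary keys are illustrative labels, and only the counts come from the text.

```python
TOTAL_STUDIES = 233  # all included studies, empirical and non-empirical

# Study counts per paradigm as reported in the text.
counts = {
    "literature review": 3,
    "descriptive": 128,
    "quantitative": 60,
    "qualitative": 8,
    "mixed methods": 34,
}

def share(count, total=TOTAL_STUDIES):
    """Percentage of the whole corpus, rounded to the nearest integer."""
    return round(100 * count / total)

shares = {name: share(n) for name, n in counts.items()}
for name, pct in shares.items():
    print(f"{name}: {counts[name]} studies ({pct}%)")
```

Under this reading, the rounded shares reproduce the percentages given for the descriptive, quantitative, qualitative, and mixed-methods studies.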

A breakdown of the data collection approaches used in the 102 reviewed empirical studies is displayed in Table 5. It is important to note that most studies used more than one instrument, and therefore, it is possible for percentages to add up to more than 100%. The survey, as a convenient, cost-effective, and reliable research method, was the tool most frequently used to gain a comprehensive picture of the attitudes and characteristics of a large group of learners. Surveys were used in 79 of the 102 studies—73 times with learners and six times with teachers—to explore students’ learning experiences, attitudes, and emotions, as well as teachers’ opinions. Some studies used paper-based surveys, while others used online ones. Interviews with learners were used in 33 studies to provide in-depth information; one study used interviews with teachers. Surveys and interviews were combined in 24 studies to obtain both quantitative and qualitative data. Other research approaches included comparing the test scores between experimental and control groups (used in 25 studies) or using the results of course assessments (17 studies) to investigate the effects of the FC on academic performance. Learners’ self-reports (9 studies) were also used to capture the effects of the FC on learners’ experience and cognitive changes that could not be obtained in other ways, while one study used a case study for a similar purpose. Teachers’ class observations and reflections were used in eight studies to evaluate students’ engagement, interaction, activities, and learning performance.

Table 5 Instruments for the data collection.
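Because a single study can use several instruments, instrument percentages are computed against the 102 empirical studies and need not sum to 100%. The following sketch illustrates this with the counts mentioned in the text; treating learner and teacher interviews as separate entries is an assumption made here for illustration, and the labels are not the exact row names of Table 5.

```python
N_EMPIRICAL = 102  # number of empirical studies reviewed

# Number of studies using each instrument, as reported in the text.
instrument_counts = {
    "survey": 79,
    "interview (learners)": 33,
    "interview (teachers)": 1,
    "EG/CG test score comparison": 25,
    "course assessment": 17,
    "learner self-report": 9,
    "case study": 1,
    "teacher observation/reflection": 8,
}

# Each percentage uses the same denominator, so multi-instrument
# studies are counted once per instrument they employ.
percentages = {
    name: 100 * n / N_EMPIRICAL for name, n in instrument_counts.items()
}
total_pct = sum(percentages.values())

for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}%")
print(f"sum of percentages: {total_pct:.1f}%")
```

The summed percentage exceeds 100% simply because the instrument uses outnumber the studies, which is the multi-label effect described above.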

Holistic analysis of the research foci and the changing trends of the included studies

A holistic analysis of the research foci in studies of FCs in China was conducted using CiteSpace 5.8.R3, a literature visualization and analysis tool developed by Chaomei Chen (http://cluster.cis.drexel.edu/~cchen/citespace/, accessed on 20 February 2022). This software supports co-citation analysis, keyword co-occurrence analysis, keyword clustering analysis, keyword burst analysis, and social network analysis (Chen, 2016). In this study, keyword clustering analysis and keyword burst analysis were chosen to capture important themes and reveal changing trends in FC research.

Keyword clustering analysis primarily serves to identify core topics in a corpus. Figure 6 presents a graph of the top ten keyword clusters identified in the included studies. In this graph, the lower the ID number of a given cluster, the more keywords are in that cluster. As shown in the top left corner of Fig. 6, the modularity Q value is 0.8122, which is greater than the critical value of 0.3, indicating that the clustering effect is good; the mean silhouette value is 0.9412, which exceeds 0.5, indicating that the clustering results are significant and can accurately represent hot spots and topics in FC research (Hu and Song, 2021). The top ten keyword clusters include #0翻转课堂 (flipped classroom), #1大学英语 (college English), #2 MOOC, #3教学模式 (teaching model), #4元认知 (metacognition), #5微课 (micro lecture), #6微课设计 (micro lecture design), #7英语教学 (English teaching), #8 SPOC, and #9 POA (production-oriented approach).

Fig. 6 The graph of the top ten keyword clusters.
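To give a sense of what a modularity value measures, the sketch below computes the standard Newman modularity Q for a toy undirected network. It is an illustration only: the node names and edges are hypothetical, the graph is not drawn from the reviewed corpus, and CiteSpace’s internal computation may differ in detail. The silhouette score is not covered here.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman modularity Q for an undirected, unweighted graph.

    Q = sum over communities of (L_c / m) - (d_c / (2m))^2, where m is the
    number of edges, L_c the number of edges inside community c, and d_c
    the summed degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )

# Hypothetical toy network: two triangles joined by a single bridge edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in two_clusters}

q_split = modularity(edges, two_clusters)
q_single = modularity(edges, one_cluster)
print(f"two clusters: Q = {q_split:.4f}, above 0.3: {q_split > 0.3}")
print(f"one cluster:  Q = {q_single:.4f}")
```

A partition that separates the two triangles scores well above the trivial single-cluster partition, which is the sense in which a higher Q indicates a clearer cluster structure.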

Keyword burst analysis is used to showcase the changes in keyword frequencies over a given period of time. By analyzing the rise and decline of keywords, and in particular, the years in which some keywords suddenly become significantly more prevalent (“burst”), we can identify emerging trends in the evolution of FC research. Figure 7 displays the 11 keywords with the strongest citation bursts. We can roughly divide the evolution of FC research documented in Fig. 7 into two periods. The first period (2014–2017) focused on the introduction of the new model and the analysis of its feasibility in FLT. The keywords that underwent bursts in this period included “MOOC”, “自主学习” (independent learning), “模式” (model), “学习模式” (learning model), “教师话语” (teacher discourse), “茶文化” (tea culture), and “可行性” (feasibility). The keyword “tea culture” appears because three articles discussing the use of FCs in teaching tea culture in an EFL environment were published in the same journal, Tea in Fujian, during this period. The second period (2018–2021) focused on the investigation of the effect of FCs and the design of micro lectures. Keywords undergoing bursts during this period included “互联网+” (internet plus), “课堂环境” (classroom environment), “教学效果” (teaching effect), and “微课设计” (micro lecture design). The last two topics (“teaching effect” and “micro lecture design”) may continue to be prevalent in the coming years.

Fig. 7 Top 11 keywords with the strongest citation bursts.

In-depth content analysis of the included studies

Along with the findings from the keyword clustering analysis and keyword burst analysis, an open coding system was created to categorize the research topics and contents of the 233 articles for in-depth analysis. Non-empirical and empirical studies were classified further into detailed sub-categories based on research foci and findings. It is important to note that some studies reported more than one research focus. For such studies, more than one sub-category or more than one code was applied; therefore, it is possible for percentages to add up to more than 100%. The findings for each category are discussed in detail in the following sections.

Non-empirical studies

The 131 non-empirical studies can be roughly divided into two categories, as shown in Table 6. The first category, literature reviews, has no sub-categories. The second, descriptive studies, includes discussions of how to use FCs in FLT; descriptions of the process of implementing the FC in FLT; and comparisons between FCs and traditional classes or comparisons of FCs in Chinese and American educational contexts.

Table 6 Classification of non-empirical studies.

The sub-categories of “introduction and discussion” and “introduction and description” in Table 6 comprise 91.6% of the non-empirical studies included in our review. They differ in that the former draws only on the FC literature, while the latter draws on both the FC literature and the researchers’ own teaching experience. Studies in the latter sub-category might have qualified as qualitative studies if the researchers had gone further by collecting information systematically or analyzing the impact of FCs.

Empirical studies

The 102 empirical studies were divided into four categories based on the domain of their reported findings: the effect of FCs on learners; learners’ satisfaction with FCs; factors influencing FCs; or other research foci. Each group was further classified into more detailed sub-categories.

Effect of FCs on learners

Studies on the effect of FCs on learners were divided into two types, as presented in Table 7: those concerned with the direct effect of FCs on learning performance and those exploring the indirect effect on learners’ perceptions. Eight codes were applied to categorize the direct effect of FCs on learning performance, which was usually evaluated through test scores; 14 codes were used to categorize the indirect effect of FCs on learners’ perceptions, which were usually investigated through surveys or questionnaires. We do not provide percentages for each code in Tables 7–9 because, given that the total number of empirical studies is 102, the percentages are almost identical to the frequencies.

Table 7 Effect of FCs on learners.
Table 8 Learners’ satisfaction with FCs.
Table 9 Factors influencing the effect of FCs.

The results shown in Table 7 reveal that 84 studies of direct educational outcomes reported that FCs had a positive effect on basic language skills, content knowledge, and foreign language proficiency. Of these, 64 were concerned with the positive effect of FCs on foreign language proficiency, speaking skills, or listening skills. This result might be explained by the features of FCs. The main difference between FCs and traditional classrooms is that the teaching of content in FCs has been removed from the classes themselves and is often delivered to the students through video recordings, which can be viewed repeatedly outside of the class. In-class time can thus be used for discussion, presentations, or the extension of the knowledge provided in the videos. Students therefore have more opportunities to practice listening and speaking in FCs, so gains in foreign language proficiency are to be expected. Only three studies reported that FCs had no effect or a negative effect on the development of foreign language proficiency, speaking, listening, and writing skills. Yan and Zhou (2021) found that after the FC model had been in place for one semester, college students’ reading abilities improved significantly, while there was no significant improvement in their listening and writing abilities. Yin (2016) reported that after the FC had been implemented for one semester, there was no significant difference in college students’ speaking scores.

A total of 96 studies reported positive effects on indirect educational outcomes, including: boosting learners’ motivation, interest, or confidence; enhancing engagement, interaction, cooperation, creativity, independent learning ability, or critical thinking ability; fostering information literacy, learning strategies, learning efficiency, or self-efficacy; or relieving stress or anxiety. The most frequently documented indirect effect of FCs is improvement in students’ independent learning ability. Only one study found that the FC did not significantly increase student interest in the course (Wang, 2015). Similarly, only one study found that students’ anxiety in the FC was significantly higher than that in a traditional class (Gao and Li, 2016).

Learners’ satisfaction with FCs

Table 8 presents the results regarding learners’ satisfaction with FCs. Nine codes were used to categorize the different aspects of learners’ satisfaction investigated in the 102 empirical studies. Some researchers represented learner satisfaction using the percentage of students choosing each answer on a five-point Likert scale from 1 (not at all satisfied) to 5 (very satisfied), while others used average scores based on Likert scale values. For the purposes of our synthesis of findings, if the percentage is above 60% or the average score is above 3, the finding is categorized as satisfied; otherwise, it is categorized as not satisfied.
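The categorization rule just described can be written as a small function. This is a sketch under stated assumptions: the function name and argument names are hypothetical, the percentage is taken to mean the share of students reporting satisfaction, and the sample values in the usage lines are invented for illustration rather than drawn from any reviewed study.

```python
def categorize_satisfaction(percent_satisfied=None, mean_score=None):
    """Apply the synthesis rule: above 60% or above 3 counts as satisfied.

    Exactly one kind of evidence is expected per finding: either a
    percentage (0-100) or a mean score on a five-point Likert scale.
    """
    if percent_satisfied is None and mean_score is None:
        raise ValueError("provide a percentage or a mean Likert score")
    if percent_satisfied is not None:
        return "satisfied" if percent_satisfied > 60 else "not satisfied"
    return "satisfied" if mean_score > 3 else "not satisfied"

# Hypothetical findings, for illustration only.
print(categorize_satisfaction(percent_satisfied=75.0))
print(categorize_satisfaction(mean_score=3.0))
```

Note that both thresholds are strict, so a finding of exactly 60% or a mean of exactly 3 falls into the not satisfied category under this reading of the rule.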

The results in Table 8 show that among the nine aspects investigated, teaching approach and learning outcomes were most frequently asked about in the research, and learners were generally satisfied with both. Only one study (Li and Cao, 2015) reported significant dissatisfaction; in this case, 76.19% of students were not satisfied with the videos used in college English teaching due to their poor quality.

Factors influencing the effect of FCs

Eleven factors were found to influence the effect of FCs; these are categorized in Table 9.

The results shown in Table 9 indicate that learners’ foreign language proficiency and self-regulation or self-discipline abilities are two important factors influencing the effect of FCs. Learners with high foreign language proficiency benefited more from FCs than those with low foreign language proficiency (Lv and Wang, 2016; Li and Cao, 2015; Wang and Zhang, 2014; Qu and Miu, 2016; Wang and Zhang, 2013; Cheng, 2016; Jia et al., 2016; Liu, 2016), and learners with good self-regulation and self-discipline abilities benefited more than those with limited abilities (Wang and Zhang, 2014; Lu, 2014; Lv and Wang, 2016; Dai and Chen, 2016; Jia et al., 2016; Ling, 2018). It is interesting to note that two studies explored the relationship between gender and FCs (Wang and Zhang, 2014; Zhang and He, 2020), and both reported that girls benefited more from FCs because they were generally more self-disciplined than boys.

Studies with other research foci

There were six studies with other research foci, three of which investigated teachers’ attitudes toward FCs (Liao and Zou, 2019; Zhang and Xu, 2018; Zhang et al., 2015). The results of the surveys in these three studies showed that teachers generally held positive attitudes towards FCs and felt that the learning outcomes were better than those of traditional classes. However, some problems were also revealed in these studies. First, 56% of teachers expressed the desire to receive training before using FCs due to a lack of theoretical and practical expertise regarding this new model. Second, 87% of teachers thought that the FC increased their workload, as they were spending a significant amount of time learning to use new technology and preparing online videos or materials, yet no policy was implemented in the schools to encourage them to undertake this work. Third, 72% of teachers felt that the FC increased the academic burden students faced in their spare time (Zhang and Xu, 2018; Zhang et al., 2015). The final three studies include Cheng’s (2016) investigation of the mediative functions of college EFL teachers in the FC, Wang and Ma’s (2017) construction of a model for assessing the teaching quality of classes using the FC model, and Luo’s (2018) evaluation of the learning environment of an FC-model college English MOOC.

Discussion and conclusions

This investigation employed literature visualization to systematically analyze 233 research papers sourced from CSSCI and CJC in the CNKI database, thereby conducting a scoping review delineating the landscape of FC research within the domain of FLT in the context of higher education in China.

Our findings in relation to RQ1 highlight a substantial surge in the number of articles relating to FCs in FLT between 2013 and 2017, followed by a discernible, albeit moderate, decrease. Despite this trend, FC studies continue to be of significant interest to foreign language educators and researchers. This may be attributed to Chinese government policies encouraging higher education reform, increased internet access among educators and learners, and the burgeoning popularity of online courses such as MOOCs and SPOCs. However, the majority of the reviewed FC studies were conducted in college English classes, with only six studies on classes teaching foreign languages other than English. It seems that foreign language education in China (and in much of the world) has become synonymous with the teaching and learning of English, with other languages occupying a marginal position, struggling to find space in educational programs. In a multilingual world in which each language offers different possibilities for understanding others, their cultures, their epistemologies, and their experiences, this monolingual approach to FLT is dangerous (Liddicoat, 2022). The promotion of linguistic diversity in foreign language education policies and research is thus imperative. Another gap that needs to be addressed is the paucity of studies on the implementation of FCs in adult education. The FC model may be effective for teaching adult learners because it is similar in some respects to online distance learning.

In answer to RQ2, we found that the commonly used research methods and instruments in studies of the FC model include surveys, interviews, comparisons of academic measures between EGs and CGs, and course assessments. The case study is the least used method, likely due to limitations such as time demands, researcher bias, and the fact that it provides little basis for the generalization of results to the wider population. However, more case studies are needed in future research on FCs because they can provide detailed and insightful qualitative information that cannot be gathered in other ways.

Our findings regarding RQ3 show that research foci within the FC domain have evolved over time from initial exploration and feasibility discussions to a subsequent focus on the design of FCs incorporating micro-lectures based on MOOC or SPOC structures, and then to the present focus on the examination of FCs’ impacts on learners. The results of the keyword burst analysis indicate that these thematic areas are likely to persist as prominent subjects of research interest for the foreseeable future.

In response to RQ4, our in-depth content analysis found that FCs, on the whole, yield positive outcomes, although isolated studies identify limited negative impacts. FCs are most frequently associated with improved student learning performance, greater independent learning, increased engagement and cooperation, and reduced stress or anxiety. The results of this study suggest that well-designed FCs present a significant opportunity for foreign language educators to revolutionize instructional approaches. Furthermore, well-structured FCs can facilitate the development of learners’ potential while concurrently enabling the seamless integration of digital technology into FLT.

Most learners are satisfied with FCs, particularly with the innovative pedagogical approach of reversing traditional classes. FCs are perceived as beneficial for improving learning outcomes, creating an environment conducive to peer interaction, and gaining immediate teacher feedback and support. In addition, students’ interest in classes is enhanced by the rich and diverse online learning materials uploaded by teachers, which can be accessed conveniently at any time in any place. Furthermore, the dynamic and formative online assessment approach is also welcomed by students because it provides immediate feedback and the opportunity to discuss any problems with teachers or peers, online or offline.

However, it is worth noting that most of the reviewed studies on FCs focused on one course, usually over only one semester. Students’ increase in motivation or improvements in learning outcomes might, therefore, be a result of the Hawthorne effect. Compared with the traditional didactic lecture format, the novelty of FCs, when used for the first time, might generate excitement among students, thus increasing their attention and enhancing learning outcomes, but such benefits may diminish over time. Therefore, there is a need to examine whether this model is suitable for large-scale implementation and whether its effects might be sustained over longer periods of implementation.

Learners’ foreign language proficiency and self-regulation or self-discipline abilities are the two key factors influencing the effect of FCs. These two factors are closely related; self-regulation or self-discipline is a prerequisite for successful foreign language learning in FC contexts and plays a crucial role in students’ success in the pre-class sessions for which they are personally responsible. In addition, factors such as learners’ attitudes toward, expectations of, and adaptability to the FC model, the learning tasks and learning environment, the teaching organization and assessment methods, and learners’ gender also have some impact on the effect of FCs. However, due to the limited number of studies, there is not sufficient evidence to warrant the generalization of any of these effects.

This scoping review highlights some potential challenges that need to be addressed for the effective implementation of FCs.

First, despite the benefits of the FC model, FCs are not equally advantageous to all students due to the self-regulated nature of the model. Many learners have reported difficulties in completing their individual online tasks outside the classroom (Yoon et al., 2021). The non-traditional configuration of FCs poses a formidable challenge, particularly for students less inclined to engage in pre-class online activities characterized by a lack of interactivity and for those who are less self-disciplined. Consequently, students may attend class without having assimilated the pre-assigned material, thereby diminishing the efficacy of this instructional approach. To address this issue, additional support or prompts for students should be provided to remind them of the need to self-regulate their learning. For example, Park and Jo (2015) employed a learning analytics dashboard displaying visual representations of students’ learning patterns derived from login traces, such as login frequency and interval regularity, within the course’s learning management system. These visual indicators allowed students to monitor their learning engagement and performance in comparison to those of their peers.
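To make the two login-trace indicators mentioned above more concrete, the following is a minimal illustrative sketch in Python. It is not the implementation used by Park and Jo (2015); it assumes that login frequency is the number of logins in a trace and that interval regularity can be approximated by the standard deviation of the gaps between consecutive logins (a smaller value indicating more regular study). The function name `login_indicators` and the sample timestamps are hypothetical.

```python
from datetime import datetime
from statistics import pstdev


def login_indicators(timestamps):
    """Summarize one student's login trace.

    Returns (login_frequency, mean_interval_hours, interval_sd_hours).
    Assumption: regularity is approximated by the standard deviation
    of the intervals between consecutive logins.
    """
    ordered = sorted(timestamps)
    frequency = len(ordered)
    # Gaps between consecutive logins, expressed in hours
    intervals = [
        (later - earlier).total_seconds() / 3600
        for earlier, later in zip(ordered, ordered[1:])
    ]
    if not intervals:
        # Fewer than two logins: no interval can be computed
        return frequency, None, None
    mean_interval = sum(intervals) / len(intervals)
    return frequency, mean_interval, pstdev(intervals)


# Hypothetical login trace for a single student
logins = [
    datetime(2024, 3, 1, 9, 0),
    datetime(2024, 3, 2, 9, 0),
    datetime(2024, 3, 3, 9, 0),
    datetime(2024, 3, 5, 9, 0),
]
frequency, mean_interval, interval_sd = login_indicators(logins)
```

In a dashboard of this kind, each student's values could be displayed alongside the class average so that learners can compare their own engagement with that of their peers.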

Second, a persistent problem with FCs is the inability of students to interact with their peers or receive prompt feedback from instructors after completing independent online learning activities. While some researchers identified a need for teachers to provide immediate online feedback or opportunities for peer discussion, our review of the literature shows that scant attention has been given to this issue. Researchers note that under-stimulation, low perceived control over tasks, and delayed or insufficient feedback in online learning contribute significantly to learner boredom or absenteeism (Yazdanmehr et al., 2021; Tao and Gao, 2022). Online pedagogical innovations are needed to solve these new problems. For instance, the establishment of online groups employing chat software like QQ or WeChat could facilitate instantaneous feedback or peer interaction through text-based communication, thereby enhancing learners’ satisfaction with FC courses.

Third, despite recognizing the value of FCs in enhancing the learning experience for students, teachers often lack the requisite training to implement FCs effectively. Insights derived from interviews with teachers, as noted in several of the reviewed studies, reveal a pronounced desire for increased opportunities to learn about the underlying theories of FCs and acquire the skills necessary for the translation of FC concepts into pedagogical practice. Specifically, teachers express a need for guidance in creating engaging instructional videos, determining optimal video length to sustain learner interest, and ascertaining the ideal duration for online quizzes to foster optimal learner performance. Further research is required on strategies and technologies that can help teachers produce high-quality videos despite limited time and technical skills. Support from professional communities, institutions, and technology specialists is thus essential for the provision of effective hybrid offline and online instruction.

Fourth, additional research is required to determine whether workloads for students and teachers are increased by the use of FCs. If this is the case, as found in some of the reviewed studies, then the compelling benefits of FCs would be offset by the extra time needed, making it difficult to conclude that FCs are more efficient than traditional classes. The majority of language teachers, due to limited skills in technology, online environment management, and online interaction, feel too physically and emotionally overworked to expend more time and energy on enhancing teaching effectiveness. With few teachers having spare time, the thought of designing and creating new content might discourage even the most enthusiastic teachers.

Finally, robust empirical evidence is needed to evaluate whether FCs can facilitate students’ higher-order thinking through the use of creative technologies and assessment approaches. Constructs such as creativity and critical thinking are not always easily reduced to measurable items on survey instruments or scores on examinations (Haladyna et al., 2002).

In conclusion, the insights garnered from this study have the potential to enrich the global discourse on the benefits and limitations of FCs in diverse cultural and linguistic contexts. Our review included literature accessible through CSSCI and CJC in the CNKI database, and while this provides a thorough selection of the Chinese literature on the subject, our search approach may have excluded valuable FC-related papers published in other languages or in other countries. Consequently, different search criteria might yield a different selection of studies and different results. Future researchers are encouraged to undertake more comprehensive literature reviews encompassing broader databases to fill the gaps in our work and to augment the depth and breadth of knowledge in this domain.