Introduction

With the advent of the intelligent education era, emerging technologies such as learning analytics, data mining, cloud computing, and artificial intelligence have been integrated into education. Data are the basis for combining education with these technologies, for example in tracking behavior and conducting threat assessments (CDT, 2021a; Ge et al., 2021). However, data ethics, which focuses on how to collect, manage, share, and use data in a secure and equitable way and how to avoid harm to individuals and the public interest, is a widely debated issue when it comes to the combination of technology and education (CDT, 2021b). Recently, many proposals and documents about data ethics have been published by international organizations and institutions. For example, the United Nations has recognized that information and communication technologies (ICTs) create opportunities for illegal data collection, surveillance, interception, and other violations of human privacy and rights. Therefore, a proposal on how to protect data privacy and ethics in the digital age was adopted (United Nations, 2020). UNESCO also published recommendations on the ethics of artificial intelligence (UNESCO, 2021), asserting that artificial intelligence has caused new ethical problems in business, media, and education. In particular, there is an urgent need for countries and related organizations to form laws about data ethics covering data collection, usage, sharing, storage, and deletion. The 2021 EDUCAUSE Horizon Report (Information Security Edition) analyzed six key technologies and practices that may affect higher education information security in the future (EDUCAUSE, 2021). Cloud Vendor Management mainly requires cloud vendors to provide ethical data security protection for software tools, resources, and other aspects. Endpoint Detection and Response is the monitoring and protection of data privacy and security of terminal equipment.
The purpose of Multifactor Authentication (MFA)/Single Sign-On (SSO) is to ensure the security of data ethics involved in user login accounts and identity verification. Preserving Data Authenticity/Integrity mainly emphasizes the guarantee of the authenticity and integrity of data in transmission, storage and processing. Research Security is aimed at ensuring the security of data generated by learners’ participation in teaching research or experiments. Student Data Privacy and Governance is about data privacy and security protection regarding the collection and use of learners’ personal information. This report discusses security and management of student data privacy and other content related to educational data. China has also participated in the formation of the Beijing consensus (UNESCO, 2019) and other proposals. Overall, data ethics is an important factor when technology is combined with education.

For research purposes, it is necessary to define the concept of educational data ethics. Existing research offers several definitions of data ethics. For example, the Center for Democracy & Technology (CDT) defines data ethics as the evolving principles for the collection, management, sharing and use of educational data in a safe and fair manner that avoids harming individual learners or the public interest (CDT, 2021b). The definition given by the General Services Administration (GSA) differs in that it does not target a specific field or situation. Instead, it emphasizes how humans can collect, manage, or use data in the freest way and maximize the public interest on the premise of avoiding risks. Furthermore, data ethics is seen as a basis for judgment and accountability (General Services Administration, 2020). The Open Data Institute (ODI) considers data ethics a branch of ethics that mainly emphasizes the evaluation function and the value of data ethics for data application, such as data collection, sharing, and use. It can also restrict data application that may affect individuals and society (Open Data Institute, 2021). Data ethics is also seen as a constraint on behavior in the process of applying data, such as how scholars’ decision-making was affected by their concepts of educational data ethics, as reviewed by Mandinach. Mandinach and Jimerson (2022) maintain that educational data ethics can ensure the rational use of data and the correct analysis and interpretation of conclusions. In general, educational data ethics refers to the principles to be followed in the process of collecting, managing, sharing and using educational data. These principles help to restrain behavior, assist in decision-making and evaluate practices, so that the application of educational data can securely maximize individual and public interests on the basis of equality.

Furthermore, many countries and regions have focused on data ethics according to their actual cultural situations. In the United States, the General Services Administration (GSA) defined the concept of data ethics in the draft of the Data Ethics Framework (General Services Administration, 2020). The Government Digital Service (GDS) in the United Kingdom also published a Data Ethics Framework (Government Digital Service, 2020). The Open Data Institute (ODI) developed the Data Ethics Canvas to help assess the influence of data collection, data sharing and data usage (Open Data Institute, 2021). Since data is an important cornerstone of the role of science and technology, the Ministry of Science and Technology of China has strengthened the governance of science and technology ethics (Ministry of Science and Technology of the People’s Republic of China, 2021). Meanwhile, China has also promulgated laws to provide legal protection for data ethics, such as the Data Security Law (People’s Republic of China, 2021a), the Personal Information Protection Law (People’s Republic of China, 2021b), and the Internet Data Security Management Regulations (Cyberspace Administration of China, C, 2021). In general, the field of education is only a small part of these frameworks and laws, and there are still many unsolved problems in the actual implementation of related principles and strategies, especially in education. It is therefore particularly important to focus on educational data ethics, as learners are the main subjects of data collection, and they include vulnerable groups such as children (CDT, 2021b). Additionally, the relevant principles and laws of educational data ethics are at present relatively loose when it comes to implementation in actual situations. As a result, it is difficult to obtain detailed information that can guide the solution of educational data ethics problems in a given situation (Rosa et al., 2022).
Different countries and regions have different cultural, political, economic and other backgrounds, so international principles of privacy and data protection need to be adapted (Hoel & Chen, 2018). There is an urgent need for a systematic review of relevant international research to help propose specific and systematic solutions to educational data ethics problems in China, which is a developing and government-oriented nation. The purpose of this study is therefore to review and analyze the international literature using bibliometric analysis, which can provide more information and implications for solving problems related to educational data ethics in China. In this way, the study clarifies the research hotspots, trends, and evolution of educational data ethics, so that the urgent problems and corresponding solutions can be summarized and proposed.

Overall, these studies proposed horizontal solutions or countermeasures to the ethical issues of educational data from different perspectives, but they are relatively one-sided and lack systematization. Therefore, this study aims to propose solutions for educational data ethics in the Chinese context through bibliometric analysis.

The research questions are as follows:

  (a) What are the dilemmas and solutions of educational data ethics in the international context?

  (b) What do dilemmas and solutions tell us about the development of educational data ethics in the Chinese context?

It is hoped that the relevant results obtained can be learner-centered and combined with the actual situation in China to form solutions to problems related to educational data ethics. There are three parts, as follows. (1) First, a bibliometric analysis of the existing literature was carried out. (2) Based on the review of the existing literature, further bibliometric analysis of relevant literature was conducted to sort out the dilemmas and solutions of educational data ethics in China. (3) The third part describes the conclusions and implications that can be drawn from this study.

Research methods

Data sources

The purpose of this research is to clarify the main problems of educational data ethics and to formulate learner-centered solutions to them. The researchers chose keywords related to “Data”, “Education” and “Ethics” as constraints on the themes of the literature. The keywords were taken from a literature review of educational data ethics covering research up to 2019 (Hakimi et al., 2021), and the literature search formula is:

“digital trace data OR digital data OR digital footprint OR learning analytics OR education data mining OR big data OR artificial intelligence OR predictive analytics OR adaptive learning OR critical data OR datafication OR education analytics OR educational data OR data science OR data-driven (Theme) and ethics OR ethic* OR privacy OR surveillance OR data protection OR data ownership OR dataveillance OR data sharing OR bias OR fairness OR accountability OR agency OR autonomy OR vulnerability OR anonymity OR inequality OR justice OR at risk OR governance OR ownership (Theme) and learn* or educat* or school or university or MOOC or distance learning or pre-school or primary school or prekindergarten or kindergarten or junior school or high school or secondary school or college or student or classroom or education technology or early years or instructional systems (Theme) and 2023 or 2022 or 2021 or 2020 or 2019 (Published years) and Thesis or Meeting or Published online or Review paper or Books (Type of literature) and Web of Science Core Collection (Database)”.
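The search formula above joins three groups of topic terms with AND and uses OR inside each group. A minimal Python sketch of how such a string could be assembled is given here. The helper names `or_group` and `build_query` and the shortened term lists are illustrative assumptions, not the authors' tooling, and the output is a generic Boolean string rather than exact Web of Science syntax.

```python
def or_group(terms):
    """Join terms with OR, quoting multi-word phrases, and wrap in parentheses."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*groups):
    """AND together several OR-groups of search terms."""
    return " AND ".join(or_group(g) for g in groups)


# Shortened, illustrative subsets of the three term groups in the search formula.
data_terms = ["learning analytics", "big data", "datafication"]
ethics_terms = ["ethic*", "privacy", "data protection"]
education_terms = ["learn*", "educat*", "MOOC"]

query = build_query(data_terms, ethics_terms, education_terms)
print(query)
```

Keeping each theme as its own list makes it easy to see which group a term belongs to and to rerun the search when a term is added or removed.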

A total of 88,764 search results were obtained from the Web of Science for 2019 to 2023 (2023 was included because some articles had already been published online as e-prints). To improve the efficiency of literature screening, we used ASReview, software that has been used in many review articles, to perform screening based on machine learning algorithms (van de Schoot et al., 2021). ASReview uses active learning to re-rank all records by relevance in real time, based on the researcher’s annotation of papers against the filtering and inclusion criteria. As such, it pushes the records most likely to be relevant to the front for priority annotation. When 20 consecutive records are marked as not relevant to the topic, the remaining literature is assumed not to be relevant, and the filtering can be stopped. After filtering, 385 papers were included in the final analysis, and the ethical issues of educational data were extracted and summarized from them. The filtering process is shown in Fig. 1.
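The stopping rule described above can be sketched in a few lines of Python. This is only an illustration of the "20 consecutive irrelevant records" heuristic applied to an already ranked list. The function name `screen_until_stop` and the toy labels are assumptions, and the sketch does not reproduce ASReview's active-learning model, which re-ranks records after every annotation.

```python
def screen_until_stop(ranked_labels, patience=20):
    """Return (number of records screened, indices of relevant records).

    ranked_labels: iterable of booleans in the order records are presented,
                   True meaning the reviewer marked the record as relevant.
    patience:      number of consecutive irrelevant records that ends screening.
    """
    relevant = []
    consecutive_irrelevant = 0
    screened = 0
    for index, is_relevant in enumerate(ranked_labels):
        screened += 1
        if is_relevant:
            relevant.append(index)
            consecutive_irrelevant = 0
        else:
            consecutive_irrelevant += 1
            if consecutive_irrelevant >= patience:
                break
    return screened, relevant


# Toy ranking: relevant records cluster at the front, with one late straggler.
labels = [True, True, False, True] + [False] * 30 + [True]
screened, relevant = screen_until_stop(labels, patience=20)
print(screened, relevant)
```

In this toy run, screening ends before the last relevant record is reached, which shows that the rule is a heuristic: it trades a small risk of missed papers for a large reduction in screening effort.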

Fig. 1: Flowchart of literature filtering.

Literature filtering based on literature filtering criteria and literature screening tool (https://asreview.nl/).

In order to effectively identify the relevant research on the educational data ethics from the perspective of intelligent education, the filtering and inclusion criteria for papers are as follows:

  a. It should contain ethical issues about educational data, such as students’ privacy.

  b. It can be a theoretical article on educational data ethics, such as proposing a framework to avoid the emergence of issues of educational data ethics.

  c. It can also be an article that uses technology to solve the ethical problems of educational data, such as using unbiased algorithms to analyze educational data to achieve fairer teaching decisions.

  d. It can also be a case article on the solution path in educational data ethics issues, such as exploring the data collection and analytic specifications of digital education applications based on data privacy protection laws or regulations.

This study used the bibliometric method to analyze the published articles. CiteSpace (6.1.R4) was used to show the research hotspots and trends related to educational data ethics. CiteSpace is a bibliometric visualization tool for academic literature review developed by Professor Chen Chaomei of Drexel University. It can analyze the hotspots, evolution, and development trends of a discipline or research field (Wang et al., 2016). By combining these results with in-depth reading and analysis of key literature, the current dilemmas regarding educational data ethics were identified and solutions were proposed.

Analysis framework

In order to effectively respond to the two research questions, the analysis of this study can be divided into two main phases, as shown in Fig. 2.

Fig. 2: Research framework for the entire study.

The first is the bibliometric analysis of the articles to find relevant research hotspots, research trends, and important scholars and research institutions. The second is to further read and analyze related literature in depth based on the results of the bibliometric analysis, and to summarize the dilemmas and solution strategies of educational data ethics.

The bibliometric analysis of educational data ethics based on CiteSpace has three sections. First, the research hotspot analysis confirms the precision and feasibility of the purpose of this research. It is mainly derived from the keyword co-occurrence network, which is clustered according to how often pairs of keywords occur together in the literature. Second, the evolution process and development trends of the related research lay the foundation for the subsequent countermeasures. This is mainly based on the timeline chart, which shows when each keyword included in the bibliometric analysis was first used in the literature. Finally, important scholars and institutions were identified, which helped us clarify which articles published by these scholars and institutions should be analyzed. This was mainly done by drawing the cooperative networks of authors and institutions.
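The first two steps, counting how often keyword pairs occur together and recording when each keyword first appears, can be illustrated with a short Python sketch. The records below are hypothetical, and the sketch covers only the raw counts. It does not reproduce the clustering or visualization that CiteSpace performs on top of them.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: (publication year, author keywords).
records = [
    (2019, ["learning analytics", "privacy", "higher education"]),
    (2020, ["learning analytics", "privacy"]),
    (2021, ["artificial intelligence", "privacy"]),
    (2022, ["artificial intelligence", "gender bias"]),
]

cooccurrence = Counter()  # (keyword_a, keyword_b) -> number of shared articles
first_year = {}           # keyword -> earliest publication year

for year, keywords in records:
    unique = sorted(set(keywords))  # sort so each pair has one canonical order
    for pair in combinations(unique, 2):
        cooccurrence[pair] += 1
    for kw in unique:
        first_year[kw] = min(year, first_year.get(kw, year))

print(cooccurrence.most_common(1))
print(first_year["privacy"])
```

The pair counts are the edge weights of a co-occurrence network, and the first-appearance years are the kind of information a timeline chart places along its horizontal axis.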

Further literature was explored to summarize the dilemmas caused by educational data ethics in the existing research and practice as well as the corresponding solutions, and to put forward adaptive solution strategies for educational data ethics problems in the Chinese context.

Bibliometric analysis of educational data ethics

Research hotspots analysis

In this study, the keyword co-occurrence network was generated with CiteSpace to mine the research hotspots of educational data ethics in the past 5 years. As shown in Fig. 3, these hotspots include “learning analytics”, “data science”, “systematic review”, “artificial intelligence (ai)”, “big data”, “artificial intelligence literacy”, “gender bias”, “attitude” and “educational data analytics”. The keyword co-occurrence network divides the literature into different clusters based on how frequently keywords occur together in different articles. In addition, the size, homogeneity and average publication year of the clusters are presented in detail in Table 1.

Fig. 3: Co-occurrence network of keywords.

From 2018 to 2023 (top 10 clusters).

Table 1 The keyword co-occurrence network with top 10 clusters (generated with CiteSpace 6.1.R4).

“Learning analytics” is based on the collection and examination of learning data to mine the rules of learning and education, which can be further used to improve learners’ learning performance (Baker & Inventado, 2014). As learning analytics becomes a hot research topic, the ethical issues caused by the collection, processing and analysis of sensitive and private data cannot be ignored in its application (Jones, 2019). Figure 3 and Table 1 show that “learning analytics” has been the most prominent hotspot of educational data ethics research in the past 5 years. “# 0 learning analytics” is the largest cluster, covering the largest number of keywords. Moreover, “learning analytics” is the node that lies on the largest number of shortest paths between other nodes, which indicates that it is an important research turning point and a bridge between different clusters. CiteSpace marks nodes with high betweenness centrality in the keyword co-occurrence network with purple circles. In Table 2, the centrality value of “learning analytics” (0.24) is the highest among all keywords, and “learning analytics” is also the most frequently occurring keyword. As mentioned earlier, learning analytics relies on the collection, storage, processing, and analysis of learning-related data. Therefore, the application of learning analytics inevitably involves issues of educational data ethics.
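To make the notion of a bridging node concrete, the following Python sketch computes betweenness centrality for a small, hypothetical keyword network using Brandes' algorithm for unweighted graphs. The toy graph and the normalization to the range [0, 1] are assumptions made for illustration. The sketch is not CiteSpace's implementation, and it does not reproduce the values in Table 2.

```python
from collections import deque


def betweenness_centrality(graph):
    """Normalized betweenness for an unweighted, undirected graph.

    graph: dict mapping each node to the set of its neighbours.
    """
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, passing on each node's path share.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    n = len(graph)
    # Each pair is counted from both endpoints, so (n-1)(n-2) scales to [0, 1].
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: s * scale for v, s in score.items()}


# Hypothetical network: "learning analytics" links two otherwise separate groups.
graph = {
    "learning analytics": {"privacy", "ethics", "artificial intelligence", "bias"},
    "privacy": {"learning analytics", "ethics"},
    "ethics": {"learning analytics", "privacy"},
    "artificial intelligence": {"learning analytics", "bias"},
    "bias": {"learning analytics", "artificial intelligence"},
}
centrality = betweenness_centrality(graph)
print(max(centrality, key=centrality.get))
```

In this toy network every shortest path between the two groups passes through "learning analytics", so it receives the highest score while the other nodes score zero.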

Table 2 The top 10 keywords with details, such as Frequency, Degree, Centrality, etc.

“Data science” is the foundation of educational data ethics. On the one hand, it can provide theoretical and practical support. On the other hand, the ethics education included in data science education also lays the foundation for avoiding educational data ethics issues. Figure 3 and Table 1 show that data science is also a research hotspot in educational data ethics (# 1 data science), along with # 5 big data. This is because big data-related research, especially research that deals with ethical issues, can be applied to educational big data.

Intelligent education is an important trend in the future development of education, and ethical issues are inevitable in the integration of artificial intelligence and education. “Artificial intelligence” refers to the issues of educational data ethics involved in the educational application of artificial intelligence, such as educational inequity due to the inherent bias of intelligent algorithms. “Artificial intelligence literacy” refers to articles that address the ethical issues of educational data by improving and cultivating artificial intelligence literacy. The appearance of the two clusters “# 3 artificial intelligence” and “# 6 artificial intelligence literacy” in the keyword co-occurrence network underlines the importance of solving the ethical issues that arise during the integration of AI and education. Table 3 shows that the former is more focused on the issues arising from the application of AI in the field of education, while the latter concentrates on how to carry out AI education so that students have the literacy and ability to deal with the issues of educational data ethics.

Table 3 The top 10 keywords in #3 artificial intelligence and artificial intelligence literacy.

Two further hotspots are worth noting. “Higher education” means that some studies focused on educational data ethics in the context of higher education. Table 2 shows that researchers have focused on educational data ethics in higher education in the past 5 years (Freq = 46). Gender bias is one type of issue in educational data ethics. “Gender bias” refers to bias in the processing and analysis of educational data that stems from the gender bias of the algorithm designers themselves, such as the assumption that girls are bad at physics. Table 1 shows that “gender bias” is an emerging research hotspot, as the average year of publication for this cluster is 2022 and the homogeneity of its literature is relatively high (Silhouette = 0.995).
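For readers unfamiliar with the silhouette value reported in Table 1, the usual definition is sketched below. It is an assumption, not something stated in the text, that the cluster silhouette reported by CiteSpace follows this standard form. The symbols a(i), b(i) and C are introduced here for illustration only.

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}},
\qquad
S(C) = \frac{1}{|C|} \sum_{i \in C} s(i)
```

Here a(i) is the mean dissimilarity between item i and the other members of its own cluster, b(i) is the smallest mean dissimilarity between i and the members of any other cluster, and S(C) is the average over cluster C. Values close to 1, such as 0.995, indicate that the members of a cluster are much closer to each other than to the members of other clusters.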

Research development evolution and trend analysis

The development trend for the educational data ethics-related research is shown in Fig. 4. Based on this figure, the burst of keywords for educational data ethics in the last 5 years can be divided into three stages.

Fig. 4: Analysis of keyword citation intensity on educational data ethics.

Keywords: these come from the keywords listed in the article; Year: This represents the average year of literature; Strength: The values of strength represent the strength of the keywords' burst, and the larger, the stronger; Begin/End: it represents the year the keyword started/stopped bursting.
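The idea of a keyword "bursting" between a begin year and an end year can be illustrated with a deliberately simple rule. The Python sketch below flags a year when a keyword's share of that year's articles is at least `ratio` times its share over the whole period. This is a simplified threshold rule with hypothetical counts, not the burst-detection algorithm that CiteSpace implements, and it yields no burst strength value.

```python
def burst_intervals(counts, totals, ratio=2.0):
    """Return (begin, end) year pairs where a keyword is unusually frequent.

    counts: dict year -> number of articles using the keyword that year.
    totals: dict year -> number of articles published that year.
    ratio:  a year counts as bursting when the keyword's share of that year's
            articles is at least `ratio` times its share over all years.
    """
    years = sorted(totals)
    baseline = sum(counts.get(y, 0) for y in years) / sum(totals.values())
    intervals = []
    begin = None
    previous = None
    for y in years:
        share = counts.get(y, 0) / totals[y]
        bursting = baseline > 0 and share >= ratio * baseline
        if bursting and begin is None:
            begin = y
        elif not bursting and begin is not None:
            intervals.append((begin, previous))
            begin = None
        previous = y
    if begin is not None:
        intervals.append((begin, years[-1]))
    return intervals


# Hypothetical yearly counts for one keyword and for all screened articles.
totals = {2019: 40, 2020: 50, 2021: 60, 2022: 70, 2023: 80}
counts = {2019: 1, 2020: 8, 2021: 9, 2022: 2, 2023: 2}
print(burst_intervals(counts, totals))
```

Comparing each year with the keyword's own long-run share, rather than with raw counts, keeps a keyword from appearing to burst merely because more articles were published in later years.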

First, the keywords “information”, “big data analytics”, “architecture”, and “academic library” emerged between 2019 and 2020. “Information” and “architecture” mean that there are ethical issues caused by extracting structured information from educational data. “Big data analytics” represents the educational data ethics that arise when applying the paradigm of “big data analytics” to educational data research, such as privacy or sensitive data leakage. The “academic library” refers to educational data ethics that occur in the context of academic libraries, such as permissions for the collection of personal reading records. In this period, the research related to the educational data ethics focuses on the ethical issues related to the extraction of information from data.

Second, between 2020 and 2021, the keywords “privacy principle” and “university” emerged. The “privacy principle” means that researchers focused on forming privacy principles that can be used to protect the privacy of students. “University” means that studies focused on the context of universities. Figure 4 shows that, in this time frame, studies began to focus on how to apply privacy and ethics-related principles and strategies to address issues of educational data ethics, especially in higher education.

Third, between 2021 and 2023, the keywords “systematic review”, “user acceptance”, “educational data mining”, “teacher”, and “decision-making” have emerged, which foreshadows the future research trends of educational data ethics. “Systematic review” means that there is a critical mass of existing research and time sufficient to support the drawing of common conclusions from the available studies. “User acceptance” represents the beginning of research focused on the impact that educational data ethics may have on the user acceptance and experience of learners. “Educational data mining” represents research that applies big data mining and analysis into educational contexts and pays attention to the educational data ethics that can be associated with educational data mining. “Teacher” represents relevant research that focuses on what teachers need to do and what responsibilities they have in protecting educational data ethics. “Decision-making” refers to the ethical issues involved in the application of educational data to the specific context of decision-making. For example, teachers may rely too heavily on the results of educational data analysis, ignoring the learner’s ability to develop. Therefore, it can be seen from Fig. 4 that researchers began to systematically review past studies and tried to find the paths or cases that could be applied to the resolution of educational data ethics issues in this stage.

Moreover, the emergence of these keywords shows that the relevant research at this stage focuses more on the issues of educational data ethics in actual educational contexts and that this trend is expected to continue in the future. The topics explored include the acceptance of educational data collection by teachers and students as educational subjects, the sensitive and private information involved in the mining of educational data, and the explainability and trustworthiness of the results of educational data analysis for educational decision-making. These studies focused on education and teaching practice, and their findings can be applied to future problem-solving.

Analysis of research scholars and institutions

The main goal of analyzing the collaborative relationships among scholars and research institutions is to discover those with important influence, which can help to reach further conclusions. In order to identify scholars with significant influence on educational data ethics in the last 5 years, this study formed an author collaboration network with CiteSpace, as shown in Fig. 5, and counted the publications of the top 12 scholars in the last 5 years, as shown in Table 4.

Fig. 5: Author collaboration network for research related to educational data ethics between 2018 and 2023, generated by CiteSpace.

Note: The larger the node circle, the more citations.

Table 4 The top 12 scholars’ publications in the past 5 years.

The largest collaborative network of authors focuses on the application of learning analytics to higher education, while issues of educational data ethics are considered one of the hindering factors. For example, ethical challenges that can be encountered in learning analytics applications and higher education contexts are explored (Alzahrani et al., 2022). The scholars who are often associated with Jones focused more on the ethical and privacy issues involved in learning analytics (Jones et al., 2020). Scholars who cooperate more with Prinsloo are more concerned about how to solve the ethical problems of educational data and protect students’ privacy. Scholars working with Fonseca David and Amo Daniel focus more on the ethical issues that can be encountered during the implementation of learning analytics in web contexts. Knight Simon and other scholars mainly explored the pedagogue’s perspective about issues related to educational data ethics (Shibani, Knight, & Shum, 2020). Viberg and other scholars, however, focused on exploring the issues of educational data ethics from the learners’ perspective (Viberg, Engstrom, Saqr, & Hrastinski, 2022).

In order to identify research institutions that have had a significant impact on educational data ethics in the last 5 years, this study formed an institutional cooperation network through CiteSpace (as shown in Fig. 6) and counted the publications of the top 11 research institutions in the last 5 years (as shown in Table 5). Combining Fig. 6 and Table 5, it can be seen that Monash University is the research institution with the highest number of publications on educational data ethics in the last 5 years. Collaborations between this institution and other research institutions have focused on researching educational data ethics issues involved in learning analytics, particularly in higher education and in classroom contexts. For example, they explored how educational data ethics issues are addressed and resolved from the perspective of pre-service teachers (Prestigiacomo et al., 2020). Research institutions collaborating with Beijing Normal University focused on reviewing the current state of development and dilemmas of the integration of technology into education and found that educational data ethics is one of the factors causing the dilemma (Tlili et al., 2021). Research institutions collaborating with the University of Eastern Finland and KTH Royal Institute of Technology mainly focused on addressing the educational data ethics issues arising from the integration of AI and education, with particular emphasis on the need to train designers of ethical AI applications (Vanhee & Borit, 2022). The research institutions working with Indiana University have focused not only on the ethical issues raised by the integration of AI into educational contexts (Morley et al., 2021), but also on data ethics and privacy issues in other educational contexts such as libraries (Jones et al., 2020).

Fig. 6: Cooperation network of research institutions.

Related to educational data ethics.

Table 5 The top 11 institutions’ publications in the last 5 years.

In general, the research hotspots of educational data ethics research in the past 5 years are data ethics issues related to learning analytics, mainly concentrating on higher education. The studies also showed a tendency towards educational data ethics issues in specific learning contexts, such as educational data ethics issues arising from the application of artificial intelligence in education. In addition, the study further identifies researchers and research institutions with significant status in educational data ethics studies in the past 5 years. The results of the above bibliometric analysis lay the foundation for summarizing educational data ethical issues and proposing ways to avoid and solve educational data ethical dilemmas in the Chinese context.

Discussion

Based on the above analysis, the reviews of studies on educational data ethics in the past 5 years can be divided into two main groups. The first group focuses on the ethical or privacy issues related to educational data in specific educational contexts. Most of these reviews address ethical and privacy security issues arising from learning analytics, such as the review of the current state of educational data ethics in learning analytics and the research trends (Tzimas & Demetriadis, 2021). The second group focuses on the current state of research for specific technologies and education integration, such as a review of the current state of research and practices regarding AI applied to educational contexts, where ethical issues are one of the challenges caused by AI applications in educational contexts (Zhai et al., 2021). In contrast to these reviews, the present study is not limited to specific educational contexts or the integration of specific technologies into the field of education but includes all theoretical and practical articles related to traditional offline classrooms as well as online education.

Fewer studies have specifically addressed the issues of educational data ethics, although there is one study that addresses the issues related to tracking data collection, processing, and analysis in digital education (Hakimi, Eynon, & Murphy, 2021). In contrast to Hakimi and other scholars’ articles, the present study analyzes articles on educational data ethics between 2019 and 2023 to examine the research hotspots, research trends, and important researchers and research institutions. As such, we read the important literature, identified the dilemmas caused by educational data ethics, and developed adaptive strategies to avoid or solve educational data ethics issues in the Chinese educational context.

This helps to analyze the research related to educational data ethics more comprehensively. In turn, it provides more valuable insights for the avoidance of and solutions to educational data ethics issues in the Chinese context. Therefore, we further reviewed related literature, which can help summarize the current ethical problems and solutions of educational data.

Current educational data ethics issues

Current educational data ethics issues can be summarized as follows: (a) Privacy is violated during the collection, storage, and sharing of educational data. (b) The predictive function of educational data deprives learners and teachers of their ability to choose independently. (c) The application of educational data leads to a preference for evaluation by data standards, but there is a lack of “forgetting ability”. To overcome the barriers caused by the educational data ethics, this study uses China’s context as a backdrop for proposing some solutions to educational data ethics.

Violation of privacy during data collection, storage, and sharing

On the basis of educational data application, there are violations of the privacy of educational subjects in the process of data collection, processing, storage, and sharing. Some questions need to be addressed, such as “why are data collected?” “Is it really necessary to collect data?” “What is the purpose of data use?” As these questions have not been sufficiently answered, the best means of protecting the collected data is still unclear, and there is a lack of systematic and effective norms and guidance regarding practice.

Specifically, violations of the privacy of educational subjects can be classified into three aspects. First, to realize personalized teaching, it is necessary to thoroughly collect a large amount of information from learners in real time. However, there are many problems in the informed consent process before data collection, such as deception and ambiguity (Rubel & Jones, 2016). Second, there are hidden dangers in the process of data storage, such as the lack of anonymization of sensitive information (Kularski & Martin, 2021). Data storage without effective data protection measures may lead to leakage of private information, which causes teachers and learners to become “transparent” and poses a huge threat to them. Third, there may be a risk of privacy leakage during data sharing. When data and analysis results are shared, the private information of learners and teachers can be fully exposed through continuous cross-validation of data from various sources by data mining and related artificial intelligence technologies and algorithms (Yang & Liang, 2017). This poses a threat to the privacy and security of learners and educators. Therefore, researchers and practitioners should attend to the ethical issues at each of these stages and prevent privacy leakage from educational data applications.
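The storage risk described above, namely sensitive information kept without anonymization, can be illustrated with a minimal sketch. It assumes a learner record stored as a dictionary; the field names (`student_id`, `name`, `health_status`, `family_income`) and the function `pseudonymize` are hypothetical and are not drawn from the cited studies. Note that keyed hashing is pseudonymization rather than full anonymization: re-identification may still be possible through the remaining fields.

```python
import hashlib
import hmac

# Hypothetical sensitive fields; a real system would define its own schema.
SENSITIVE_FIELDS = {"name", "health_status", "family_income"}


def pseudonymize(record: dict, secret_key: bytes) -> dict:
    """Return a copy of `record` in which the student ID is replaced by a
    keyed hash and directly sensitive fields are removed before storage."""
    token = hmac.new(
        secret_key,
        record["student_id"].encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    # Keep only fields that are neither sensitive nor the raw identifier.
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in SENSITIVE_FIELDS and key != "student_id"
    }
    cleaned["pseudonym"] = token
    return cleaned
```

Because the same key always maps the same ID to the same pseudonym, records for one learner can still be linked for analysis, while the raw identifier never reaches the stored data set.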

Prediction of educational data deprives people of the ability to choose independently

One of the main functions of educational data analysis and extraction is prediction. However, the use of the prediction function requires correct attitudes and perspectives. That is, the functions of prediction, clustering, and anomaly detection provided in educational applications should be seen as information that aids educational and teaching decision-making. In other words, they cannot directly replace educators and learners in making educational decisions. Excessive dependence on the results of educational data mining and analysis will deprive learners and educators of the ability to make independent decisions.

For students, the possible impact of educational data ethics issues can be divided into two aspects. Firstly, prediction results will reduce the learner’s opportunities for trial and error in the process of learning. Although this greatly improves learning efficiency and accuracy of prediction, it also greatly weakens the possibility of learners’ independent thinking and trial-and-error innovation during learning (Zhong & Tang, 2018). This hinders the development of innovative thinking skills. Secondly, mass dissemination based on data will create an information cocoon, narrow the horizons and minds of learners and teachers, and cause them to be too engrossed in the topics they are interested in. This is because the information recommended by the algorithm lacks diversity (Li, 2021). In turn, this can have a negative impact on fostering morally sound values and outlook in young people.

The predictive function of educational data may help educators free themselves from teaching concerns, but it will push them into another constraint. That is, educators must deal with various types of intelligent teaching systems, analyze various data, and obtain and judge data prediction results. Learning data between different systems and platforms cannot be exchanged, so it is difficult for teachers to directly integrate data between different systems or platforms to gain a comprehensive understanding of learners, which requires teachers to have higher skills and literacy (Holstein et al., 2019). Therefore, this hinders the promotion and development of normal teaching activities. Similarly, if teachers rely too much on the prediction function provided by educational data, it will reduce their deep thinking, which may eventually affect their professional development (Magdy & Dony, 2020). It will also indirectly affect learners’ learning performance. Furthermore, based on educational data processing and analysis results, such as learning analytics, educators’ decision-making can have unconscious or conscious biases towards learners (Rubel & Jones, 2016), which in turn leads to biases in decision-making or even mistakes.

Using data as an evaluation standard but lacking the ability to “forget”

Another ethical problem with educational data application is that evidence-based evaluations may rely too heavily on data. At the same time, there will be a lack of ability to “forget”, because data storage is distributed and permanent (Mayer-Schönberger, 2011).

First of all, the results of educational data analysis cannot fully ensure the accuracy of data prediction because the educational data used for prediction are not complete. Specifically, it is difficult to include all the factors that affect the accuracy of the prediction result (Schouten, 2017). As such, further consideration is still needed as to whether prediction and evaluation of educational data must be accurate, and whether the algorithm can be held responsible for errors in the prediction result (Essa, 2019). As in other fields, data-driven decision-making can lead to discrimination due to private information in the data (Mittelstadt et al., 2016). The results of the prediction and evaluation of educational data contain a large number of private details about the learner, such as disease status and family economic status. This information is likely to cause teachers to be biased or to discriminate against learners. These are unresolved ethical problems that can be brought about by educational data-based evaluation. Moreover, predicting and customizing future learning trajectories based on the results of educational data analysis undoubtedly obliterates the possibility that learners will change through later development (Mayer-Schönberger & Cukier, 2014). As learners are developing people, we cannot box learners into specific evaluations, labels, and categories.

Secondly, a comprehensive collection and permanent storage of educational data means that the labels of learners will be solidified. As Richards and King (2015) proposed in their identity paradox, although analysis of data can help identify and classify identities, it also affects and constrains the formation and change of identities. Since the records of educational data will be stored for a long time, the misbehavior of the learner in the immature period will also be kept for a long time, which may increase the bad influence of the learner’s mistakes on their future performance. It may even obliterate the learner’s motivation for follow-up development (Tang & Zhang, 2020).

Therefore, the integration of technology and education should form human-friendly products and services during the pursuit of intelligence in education. Moreover, it is necessary for the evaluators to judge learners from a developmental perspective and have appropriate tolerance for learners’ faults. This also corresponds to the main idea of this paper that educational data should be learner-centered.

Strategies and recommendations for ethical issues in educational data

With the aim of solving the ethical problems faced by educational data, we conducted an in-depth analysis of relevant literature. In general, it is necessary to mobilize and coordinate all parties to achieve collaborative innovation and then form a new ecology of educational data for symbiosis and co-prosperity. Specifically, strategies and suggestions for solving the current ethical dilemma of educational data can be divided into three directions.

First, at the macro level, government departments need to lead the construction and implementation of a systematic and standard system of educational data. On this basis, constraints on the collection, storage, and sharing of educational data can be realized, the construction of an educational data center or an educational digital base can be promoted, and a foundation for educational data application can be provided. Second, efforts should be made through the “research-practice dual channel”, which can coordinate the effective forces of all parties. In this way, educational data applications can fully play their role and show their value in compliance with ethical norms. Third, it is necessary to carry out education on educational data ethics, mainly to establish correct concepts and attitudes on the use of educational data. This means not only accepting educational data with an open mind, but also fully guaranteeing autonomy and innovation.

Establish systematic standard systems or platforms of educational data from the macro level

The role of educational data in personalized learning is unquestionable, but data security is an important premise. First, different industries and companies use different data for varying purposes, methods, and standards. In order to ensure data exchange, government departments must formulate and promote educational data standards for data collection so that educational data can be standardized and systematized. This will promote the further development of educational data applications, as shown by the successful experience of establishing data standards in Kentucky, USA (Wang & Dan, 2021). At the same time, government departments can establish effective and strong protection frameworks and supervision mechanisms to monitor and restrain educational data so that educational data can be used under controlled conditions (Ma et al., 2017). At present, there are relevant policies and documents around the world, so ethical issues of educational data have been addressed and resolved to a certain extent. For example, the UNESCO Institute for Information Technology in Education (IITE, 2020) jointly issued the “Personal Data Security Technical Guide for Online Education Platforms”. However, far from enough has been done to meet the needs of the era of intelligent education. This study calls on government departments to speed up the establishment of educational data standards, so that the process of collecting, storing and sharing educational data can be rationalized and established as lawful.

This section analyzes the value of educational data from three platforms. With the help of establishing data centers and digital bases, government departments can conduct convenient and fast data aggregation, governance and application. When combined with the education cloud platform, safe and efficient data aggregation and governance can be achieved. Among them, the establishment of digital bases can meet the basic functional requirements of educational data collection, transmission and calculation. These digital bases can also help to develop various educational digital applications.

However, because digital bases have a tendency toward publicity and natural monopoly, construction and operation need to be led by government departments (Gao, 2021). The establishment of data centers can standardize data management, promote integration between various business modules, simplify the educational data sharing and exchange process, and improve efficiency (Li, Shu et al., 2021). This, in turn, helps obtain accurate, intelligent, and personalized decision-making based on educational big data. It is particularly important that establishment of data centers can build a safe and credible data system to ensure the safe circulation of educational data by technological means, such as disaster recovery backup and rights management (Gu & Li, 2021). The core of the education cloud platform mainly provides platforms and services such as single-sign-on, unified identity authentication, unified portal, unified interface, and unified data center. This platform can help to realize the effective integration and management of the authority of educational data. At the same time, the education cloud platform can realize the integration of software and applications, which enables seamless switching of multiple applications (Yang & Yu, 2015).

Additionally, 5G+ can provide a communication foundation for the construction of the digital base, data center, and education cloud platform. Therefore, vigorous development and construction of 5G+ education are needed. For example, 5G+ can be used to build a multi-interactive intelligent communication platform, which can provide better interaction, learning support, immersive experience, etc. (Wang & Wang, 2020). Owing to the large scale of the relevant construction, it needs to be built on standardized and unified data standards, such as interface specifications. Furthermore, the construction and management of 5G+ education require relatively high technical skills and involve a complex structure, extensive coverage, and a long cycle, so it should be led and carried out by the relevant construction departments.

Construction of a new ecology of educational data by the dual path of “research-practice”

With corresponding policy support provided by states and government departments, there are two ways to solve real ethical problems faced by educational data: the practice of applying educational data and academic research. In other words, it is necessary to fully realize the transformation of research outcomes into practical applications. This coincides with the international call for education stakeholders to collaboratively address ethical issues related to educational data (Siemens, 2019).

On the one hand, managers of educational data must strengthen their education in data ethics. They not only need to develop their awareness and understanding of the concepts of data ethics, but they also need to improve their professional quality and data literacy in data management (Onorato, 2013). Suppliers of educational data-related products and services should also receive professional training. For example, at the beginning of product design and development, they should consider how to effectively protect data privacy and security (Robinson & Gran, 2018). At the same time, they need to pay attention to and investigate data privacy needs. Furthermore, the relevant data service personnel also need to improve their awareness and knowledge of educational data ethics (Jones, 2019). They then need to make sure that the storage, use, and sharing of data are open and transparent. Especially in the process of user informed consent, more consideration should be given to the use of words and expressions that users can understand, rather than overly professional agreement content (Siemens, 2019). It is necessary to control data disclosure to some extent in the process of providing products and services made with educational data (Li, Chen et al., 2021), which can help avoid hidden dangers or negative impacts on users’ privacy.

On the other hand, it is also necessary to research disciplines related to ethical issues of educational data, which is the main aim of researchers and practitioners. Research departments and institutions must explore and study relevant ethical disciplines based on educational data, such as investigating and discovering the positive relationship between the standardization of educational data and the utilization rate of educational data applications (Pentland, 2014). Education intelligence, distributed blockchain storage, secret key encryption, and other technologies can protect the privacy of educators and learners (Zeng et al., 2020). These are widely considered important in the field of educational research. The decentralized nature of the blockchain in particular enables credible, reliable, and highly private data storage and sharing.

Many blockchain application scenarios in the field of education have been studied to build a relatively complete architecture and implementation path, such as the combination of blockchain and credit bank (Yuan, 2021). The roles of the technologies involved are as follows. First, chained timestamped blocks and hash values can ensure the traceability and tamper resistance of data. Second, distributed ledger technology can help realize decentralization, form flat system architectures, and form a consensus mechanism. Based on this, all participants can directly access educational data under certain permissions. Third, smart contract technology can help realize the automatic credit transformation that has been determined in the contract. In addition, the combination of blockchain and federated learning can solve the problem of “data silos” between different institutions or organizations. In other words, it can realize data sharing between different organizations based on specific policies or standards. This is mainly achieved through the decentralization feature of the blockchain. At the same time, the blockchain can also support the sharing of differentiated private multiparty data models to ensure communication security and privacy protection in the process of data sharing and use (Li, Yuan et al., 2021).
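The role of chained timestamped blocks and hash values described above can be sketched in a few lines. This is a minimal, single-machine illustration under stated assumptions, not the architecture of the cited studies: the block fields and the function names `block_hash`, `append_block`, and `verify_chain` are hypothetical, and there is no distributed ledger, consensus mechanism, or smart contract here.

```python
import hashlib
import json
import time

GENESIS_PREV_HASH = "0" * 64  # Placeholder "previous hash" for the first block.


def block_hash(index, timestamp, data, prev_hash):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "timestamp": timestamp, "data": data, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data, timestamp=None):
    """Append a timestamped block that points at the hash of its predecessor."""
    timestamp = time.time() if timestamp is None else timestamp
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    block = {
        "index": index,
        "timestamp": timestamp,
        "data": data,
        "prev_hash": prev_hash,
        "hash": block_hash(index, timestamp, data, prev_hash),
    }
    chain.append(block)
    return block


def verify_chain(chain):
    """Return True only if every block's hash and back-link are intact."""
    prev_hash = GENESIS_PREV_HASH
    for position, block in enumerate(chain):
        if block["index"] != position or block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(
            block["index"], block["timestamp"], block["data"], block["prev_hash"]
        )
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True
```

Because each hash covers the previous block's hash, changing an earlier record (for example, a stored credit entry) invalidates that block and breaks the link to every later one, which is what makes the record history traceable and tampering detectable.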

At present, many other technologies can be combined with blockchain to ensure data security and privacy protection, such as Secure Multi-Party Computation (SMC), Zero-Knowledge Protocol, Ring Signature, Homomorphic Encryption and Trusted Execution Environment (TEE) (Zhang, Wang, & Li, 2021).

Moderate application and forgetting of educational data

It is also important to think about how educational data ethics should be guaranteed and realized. This can start with data ethics education. On the one hand, the training and development of educational data ethics should be strengthened in learner-centered education and teaching so as to ensure the appropriate application of educational data. On the other hand, for the data record itself, reasonable algorithms should be used to achieve moderate forgetting during data-based and evidence-based education evaluation.

First of all, teachers and learners are not only the direct beneficiaries of educational data applications, but they are also directly threatened by ethical issues of educational data. Therefore, to effectively enhance the awareness, development and ethical cultivation of educational data, it is necessary to provide ethical education about educational data, which can effectively improve teachers’ and learners’ self-control of data. On the one hand, it should enhance the subject’s awareness of educational data protection (Chen et al., 2018). It is especially important to avoid unintentional violation of privacy in the use of educational data when teachers and learners use products and services during the educational process. On the other hand, to help educational subjects understand the correct way to use educational data, they need to know what educational data is and why it is used. Only by fully understanding the usage and values of educational data can it be better implemented (Hazelbaker, 2016). Moreover, it is necessary to foster correct understanding in the educational subject of how to appropriately use educational data. In other words, it is necessary to control and moderate the use of educational data. Learners must accurately comprehend the concept of being a learner. It is especially important to enhance learners’ ability to resist psychological stress and strengthen the construction of students’ psychological quality when faced with the application of educational data (Zhou & Tang, 2020). Additionally, teachers should not be too data-based or too strict in the implementation of learning evaluations. It is necessary to avoid dimensional dataism and appropriately relax data-based and evidence-based evaluation standards so as to achieve a fair and just evaluation for learners and implement the ethical concept of learner-centered educational data.

Secondly, due to the accessibility, permanence, and comprehensiveness of digital memory in the digital age, a lack of moderate forgetting will result in a panopticon for teachers and learners. That is to say, the failure to moderately forget educational data will confine the educational subject to a digital cage, which is analogous to the iron cage of social development proposed by Weber in The Protestant Ethic and the Spirit of Capitalism. In this situation, it is easy to trigger the “chilling effect”. In other words, learners will reduce related activities and avoid conscious or unconscious mistakes in the learning process because these will be recorded and permanently retained. As such, learners’ opportunities for normal learning behavior will be affected. The moderate forgetting of educational data can be realized from both technical and ethical aspects. In terms of technology, the originality of data forgetting emphasizes that forgetting will make evaluation objects tend to return to the previous state of data generation (Yang, 2020), so related algorithms guided by this core idea can help achieve moderate data forgetting. In terms of ethics, forgetting should be regarded as a virtue in the digital age. Owing to the advent of the self-media era, release, dissemination and diffusion of information are often fast and difficult to monitor, so it is important to form corresponding community norms (Mayer-Schönberger, 2011). Overall, the ethical regulation of all producers and users of educational data is crucial.
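One possible way to make the technical side of “moderate forgetting” concrete is to let the weight of a record decay with its age, so that older records gradually lose influence on an evaluation. The sketch below is an illustrative assumption only, not the algorithm of Yang (2020) or of any cited work; the function name `decayed_score` and the half-life parameter are hypothetical.

```python
def decayed_score(records, now, half_life_days=180.0):
    """Average past scores with weights that halve every `half_life_days`.

    `records` is a list of (score, day_recorded) pairs and `now` is the
    current day on the same scale. Returns None when there is nothing to
    evaluate.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for score, day_recorded in records:
        age = max(0.0, now - day_recorded)
        weight = 0.5 ** (age / half_life_days)  # Older records count less.
        weighted_sum += weight * score
        weight_total += weight
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total
```

With a half-life of 180 days, a failing score recorded 360 days ago carries a quarter of the weight of a score recorded today, so an early mistake fades from the evaluation instead of being retained at full strength. A shorter half-life forgets faster; choosing it is a policy decision rather than a technical one.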

Conclusion

In general, with the continuous integration of emerging technologies such as artificial intelligence and big data into education, educational data has become the core factor that constrains and balances the development of educational intelligence. Educational data ethics is an important obstacle and a common dilemma faced by researchers in related fields. Therefore, through bibliometric analysis and in-depth literature review, this study analyzes research hotspots, the evolution process, and development trends in the field of educational data ethics and confirms that issues of educational data ethics are important factors that affect educational informatization, intelligence, and development. At the same time, the study summarizes the three main dilemmas and the corresponding strategies in current research on educational data ethics, which were sorted out through detailed reading of the literature.

The difficulties are as follows. (a) The privacy of educational subjects during data collection, storage and sharing is violated. (b) The prediction function of educational data deprives educational subjects of their ability to choose independently. (c) Data are used as an evaluation standard but lack the ability to be forgotten. There are three learner-centered strategies, which provide research directions and foundations for researchers and practitioners in related fields: (a) Establish systematic educational data standard systems and related platforms from the macro level. (b) Make efforts to build a new education data ecology through dual “Research-practice” channels. (c) Implement appropriate ethical education and educational data application and forgetting during evaluation. On the one hand, stakeholders must have a correct understanding of data ethics; on the other hand, intelligent technology must be able to automatically guarantee data ethics. These results can provide inspiration for future research and practice in educational data ethics in China.

However, this study also has shortcomings; that is, this study mainly uses the bibliometric analysis of the literature and in-depth literature review to demonstrate viewpoints and lacks the support of specific research practices. Therefore, future research is expected to make breakthroughs in the practical direction of solving ethical problems of educational data. It is important to make theoretical and practical contributions to the application of educational data, which can help break the ethical barrier in education.

Problems of educational data ethics

With the continuous integration of technology and education, the problems of educational data ethics have also received extensive attention from relevant researchers, but they have often focused on specific educational technologies or methods. First, some researchers focused on the ethical issues related to the use of video in education (Peters et al., 2021). Peters et al.’s study emphasized the necessity of carefully considering questions raised by videos in education. For example, why is video needed? How can consent and anonymity be achieved? How can the videos be processed to protect data privacy? Second, learning analysis is mainly based on the analysis of digital traces or footprints generated in the learning process, so there are significant issues surrounding data privacy and security, such as the prediction of learning trends (Mathrani et al., 2021). Based on the literature published from 2011–2018, Tzimas and other scholars summarized data ethics problems caused by learning analysis into three aspects: teaching intervention, the contradiction between learners’ needs and privacy and security, and the mismatch between technology updates and legal regulations (Tzimas & Demetriadis, 2021). Third, big data also involves many ethical issues. Baig et al.’s (2020) literature review of 40 primary studies published from 2014 to 2019 shows that ethics is an important direction of development. Some countries and regions have researched the ethical issues involved in big data as follows. On the one hand, many people do not have an accurate understanding, especially of the predictive ability of big data. On the other hand, stakeholders disregard morality and politics when they use social networks, mobile applications, and other ways to collect and use large amounts of data (Chen & Quan-Haase, 2018). In general, there are many problems with educational data ethics, but most of these studies examine educational data ethics in relation to a specific technology combined with education.

Solutions to educational data ethics

In order to solve the problems of educational data ethics, existing researchers often think about solutions to the ethical problem of educational data from different perspectives but lack systematization. From the perspective of educational leaders and researchers, some researchers have explored how to fairly, ethically, and effectively use AI and other technologies in education. It is crucial for all parties to join together to develop strong educational ethics (Roschelle et al., 2020). Owing to the particularity of China’s political, economic, and cultural background, the central government’s policies play an important role in solving ethical issues of educational data (Knox, 2020). From the perspective of researchers and developers of educational data-related technologies, some researchers have thought about how to realize the protection of educational data ethics. Shum and other researchers explore how developers can avoid or respond to data ethics issues during their work (Shum & Luckin, 2019). Furthermore, from the perspective of educational technology companies, some researchers have focused on the challenges, solutions, and needs regarding data ethics faced by education companies. For example, Kousa and Niemi (2022) claim that research and development of artificial intelligence education products should be preventive, safe, explainable and equal. Similarly, facing the challenges of data ethics caused by artificial intelligence requires the collaboration of multiple stakeholders, including companies, consumers, educational institutions, researchers, funders, and managers. In any case, research on educational data ethics must be learner-centered (Tzimas & Demetriadis, 2021). It is necessary to effectively safeguard data ethics, combine cultural backgrounds, and collaborate with various stakeholders facing different situations.
In general, there is an urgent need to form a more systematic educational data ethics solution, focusing on a specific country or context.