## Main

The COVID-19 pandemic paralyzed education systems worldwide; at one point, school closures forced over 1.6 billion learners out of classrooms1. Moreover, widespread school closures are not unique to COVID-19: teacher strikes, summer breaks, earthquakes, viruses (such as influenza and Ebola) and weather-related events also cause schools to close. School closures result in large learning losses, which have been documented in North America, Western Europe, Asia and Sub-Saharan Africa2,3,4,5. These losses combine knowledge that is forgotten over time with forgone learning that would have occurred had schools been open. To mitigate learning loss in the absence of school, high-income families have access to alternative sources of instruction (books, computers, internet, radio, television and smartphones) that many low-income families do not6,7,8,9.

Stemming learning loss when schools are closed, particularly in areas where households lack learning resources, requires outside-school interventions that can substitute for, rather than complement, ongoing instruction. Doing so at scale requires cheap, low-technology solutions that can reach as many families as possible. One such low-tech option with high reach is the mobile phone: in low- and middle-income countries, 70% to 90% of households own at least one mobile phone, while only 15% to 60% of households have internet access10.

In this paper, we provide experimental estimates of approaches to minimizing the impact of school closures on learning in the context of the COVID-19 pandemic. Specifically, we focus on basic numeracy interventions and assessment. We evaluate two low-tech mobile phone solutions that use short message service (SMS) text messages and direct phone calls to support parents in educating their children. A sample of 4,550 families with primary-school-aged children across nearly all regions of Botswana was randomly assigned to one of two intervention arms or a control arm. The sample's characteristics match national averages on several indicators. In one treatment arm, SMS messages provided a few basic numeracy ‘problems of the week’. A second treatment arm supplemented these weekly SMS messages with a live 15–20 min phone call walking through the problems. Each student in this arm received 3 h of direct instruction spread over 8 weeks.

In Botswana, the government closed schools for a planned 6 months starting 20 March 2020. Schools reopened on 17 June, were subsequently closed again after a new wave of COVID-19 cases and have since reopened. Similar waxing and waning of school closures have occurred throughout the year. Even as students returned to school (our data show that 98% of primary school students eventually returned after initial school closures), a double-shift rotation system, where half of the students attend school in the morning and the other half attend in the afternoon, drastically reduced time in school for each student. While the government launched learning programmes on national television and radio stations during this time, our data show that access to radio is relatively low, with only 20% of the control group listening to radio in the status quo.

The lack of access to education during school closures is likely to exacerbate a pre-existing learning crisis in Botswana. Analysis of data from the Southern African Consortium for Monitoring Education Quality (SACMEQ) found that 88% of grade 6 students are at grade 4 math levels or below. A census of two regions further found that only 10% of students in grade 5 could do 2-digit division and 40% could not read a simple 1-paragraph story11.

In this context of a pressing need to improve learning and limited options to receive an education during school closures, low-tech solutions to reliably provide remote education were in high demand. Over 99% of parents reported demand for continued remote learning services even if schools reopened, probably due to uncertainty around whether schools would remain open, reduced school hours and disrupted learning.

Results of the main intervention in the trial show large, statistically significant learning differences between treatment and control groups. The combined phone and SMS intervention increased learning by 0.121 standard deviations (95% CI 0.031, 0.210; P = 0.008), whereas the SMS intervention alone had no statistically significant effect on learning (β = 0.024, 95% CI −0.066, 0.114; P = 0.602). The difference in learning between the combined phone and SMS group and the SMS only group is 0.097 standard deviations (P = 0.033). We find a 31% reduction in absolute innumeracy (students who cannot do any numerical operations) and an average level gain on the Annual Status of Education Report (ASER) assessment of 0.15 levels (95% CI 0.039, 0.262; P = 0.008). For households that participated in all sessions, instrumental variables analysis shows learning gains of 0.167 standard deviations (95% CI 0.046, 0.289; P = 0.007). The gains from the phone plus SMS intervention also extend to other foundational competencies, such as place value (β = 0.114, 95% CI 0.028, 0.200; P = 0.009). Lastly, our results are robust to a series of sensitivity tests, and we explore how effects vary on the basis of whether instruction is targeted to the learner’s level.
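To make the ASER-based outcomes concrete, the following is a minimal sketch of how an average level and an innumeracy share could be computed from phone assessment results. It rests on labeled assumptions: a child's level is taken to be the number of operations solved in the order addition, subtraction, multiplication, division (0 = innumerate), and the 31% figure is read as a relative reduction against the control share. The function names and data layout are hypothetical, not the study's scoring code.

```python
from statistics import mean

# Assumed ordering of numeracy operations, easiest first (a simplification).
OPERATIONS = ("addition", "subtraction", "multiplication", "division")


def aser_level(results: dict) -> int:
    """Hypothetical scoring rule: count operations solved, stopping at the first failure.

    A level of 0 means the child could not do any operation (innumerate).
    """
    level = 0
    for op in OPERATIONS:
        if not results.get(op, False):
            break
        level += 1
    return level


def summarize(children: list) -> tuple:
    """Return (average level, share of innumerate children) for one group."""
    levels = [aser_level(child) for child in children]
    innumerate_share = sum(1 for lvl in levels if lvl == 0) / len(levels)
    return mean(levels), innumerate_share


def relative_reduction(control_share: float, treated_share: float) -> float:
    """Relative drop in a share (e.g. innumeracy) compared with the control group."""
    return (control_share - treated_share) / control_share


# Usage with three made-up children (levels 2, 0 and 4).
children = [
    {"addition": True, "subtraction": True, "multiplication": False, "division": True},
    {"addition": False},
    {"addition": True, "subtraction": True, "multiplication": True, "division": True},
]
average, innumerate = summarize(children)
print(average, round(innumerate, 3))
```

Under these assumptions, an "average level gain" is simply the difference in `summarize(...)[0]` between a treatment arm and the control arm; the study's estimates additionally come from regressions rather than raw differences.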

These results demonstrate that certain types of instruction through mobile phones can provide an effective and scalable method for education delivery outside the traditional schooling system. The phone and SMS intervention is highly cost-effective, with learning gains of 0.63 to 0.89 standard deviations per US$100. These results also reveal that some level of direct instruction, which can be delivered cheaply and remotely by phone, might be necessary: SMS messages alone are not as effective as phone calls plus SMS messages at producing learning gains (P = 0.033). This is in line with existing evidence showing that SMS messages might best serve as a complement to direct instruction, as in this study, or as an accountability nudge in education systems, such as by helping families track their child’s academic progress12. We further find that parental engagement with the interventions is high: 92% of parents report that their child attempted to solve the problems sent, with slightly higher engagement of 95% in the SMS plus phone group. Parents report 8.7% and 15.2% greater self-efficacy in supporting their child’s learning as a result of the SMS only and phone and SMS interventions, respectively. Parents also update their beliefs about their child’s learning level in tandem with their child’s learning progress, which suggests that they are involved in and aware of their child’s academic progress. We find no statistically significant effects of the interventions on parents’ return to work after lockdown, which alleviates the concern that greater parental engagement in their child’s education might crowd out other activities, such as returning to work.

Remote instruction also compelled several innovations in high-frequency, low-cost remote assessment. To measure learning, we adapted the numeracy portion of a test that has been used consistently in the education literature, the ASER test, into a phone-based learning assessment13,14,15,16.
We also incorporated time limits and a requirement that children explain their work, so as to identify their numeracy levels accurately. To measure the reliability of our assessment, we randomly assigned problems measuring the same proficiency to students, a version of a reliability test used in the psychometric literature17. We further disentangled cognitive skills gains from effort effects, which have been shown to affect test scores18, using a real-effort task. We also tested whether learning gains are a matter of familiarity with content in intervention groups that received exposure to similar material. We tested this by including new content not covered during the intervention, but which is related, such as fractions. The familiarity hypothesis was also partially tested with randomized problems of the same proficiency. Lastly, we demonstrated the value of high-frequency, remote assessment by using a midline assessment to target content to learning levels for a cross-randomized subgroup of students.

Our work contributes to several literatures. The low-tech interventions we test relate to a growing literature on technology and education. Mobile phone SMS messages have been used to supplement adult education programmes in Niger and the USA19,20,21, to help parents teach nascent literacy skills to their children in the USA22,23,24, to enhance parental engagement in children’s education in both Brazil25 and Côte d’Ivoire26, and to help parents monitor their child’s effort and progress in school27,28,29,30,31,32,33,34,35,36 (see ref. 36 for a review). We contribute to this literature by providing evidence on live, direct instruction through phone calls rather than only automated, text-message-based instruction. While mobile phone calls have been used to deliver health interventions37, experimental evaluation of live instruction by phone in education was limited before the COVID-19 pandemic.
Moreover, we contribute to the literature by testing low-tech approaches in a setting where these interventions operate largely as substitutes for schooling rather than as complements (see ref. 38 for a review of the role of technology as a complement or substitute for the traditional schooling system). We also contribute learning data collected via phone-based assessments.

This paper also relates to an emerging global priority to improve learning at low cost and at scale. Even before the pandemic shock to education, student learning levels were low and progress was slow, as highlighted by UNESCO and the World Bank. For example, in Kenya, Tanzania and Uganda, three-quarters of students in grade 3 cannot read a basic sentence such as ‘the name of the dog is Puppy’39. Moreover, a recent review of 150 impact evaluations in education in low- and middle-income countries found that nearly half had no effect on learning40. This trend of limited learning has been referred to as the ‘learning crisis’ by the international education community41. Some effective interventions, such as in-person tutoring programmes, can be expensive. For example, a tutoring programme that yielded 0.19 to 0.31 standard deviation learning gains cost US$2,500 per child42. The intervention in this trial, low-cost remote tutoring via phone calls, has similar effect sizes and is two orders of magnitude cheaper. To address learning shortfalls and gaps in education provision, which have been exacerbated by the COVID-19 pandemic43, approaches are needed that cost-effectively improve learning on a global scale. Beyond the results presented in this study, other cost-effective approaches that have emerged during COVID-19 include those described in refs. 44,45.

Our results have substantial implications for global policy. Recent estimates from the World Bank suggest that current school closures during the pandemic could cost up to US$10 trillion in net present value46. There is a pressing need to mitigate this fallout on education worldwide. Even as schools start to reopen, reopening is often partial, for example with students receiving only half as many hours of instruction. Moreover, as stated earlier, school closures occur in settings beyond the current pandemic, such as refugee settings and adverse weather events. In moments when a substitute for schooling is needed, particularly for families with fewer resources at home, the low-tech solutions tested in this study have unique potential to reach the masses. The results in this paper provide evidence that remote instruction by phone and SMS messages has the potential to improve learning for primary school children using a low-cost and scalable model when schooling is disrupted.

## Results

The study took place in Botswana with 4,550 households. We compare our sample to national-level indicators and find that the final sample has characteristics that match those of a nationally representative sample, as described in the sections below. Supplementary Fig. 1 shows a heat map of the location of the children’s schools to demonstrate the distribution of participants across the country. Supplementary Fig. 2 provides a timeline of each step, from initial phone number collection, piloting and training to programme implementation and waves of data collection. Supplementary Fig. 3 provides an overview of the experimental design.

Of working phone numbers, 71% were reachable and gave consent to participate in the study. We randomized the 4,550 phone numbers into three groups of equal size: a weekly SMS message followed by a phone call, a weekly SMS message only and a control group.
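The assignment of phone numbers to three equal-sized arms, stratified on prior participation in school-based programming (the stratification variable is described immediately below), can be sketched as follows. This is an illustration under assumptions: the arm labels, the `stratified_assign` helper and the synthetic participation flag are hypothetical, and the study's own randomization code is not shown in this paper.

```python
import random
from collections import Counter
from typing import Callable, Hashable, Iterable

# Hypothetical labels for the three study arms.
ARMS = ("sms_plus_phone", "sms_only", "control")


def stratified_assign(
    units: Iterable[Hashable],
    stratum_of: Callable[[Hashable], Hashable],
    seed: int = 0,
) -> dict:
    """Assign units to ARMS in equal shares within each stratum.

    Units are shuffled within each stratum and dealt to the arms in turn, so
    arm sizes within a stratum differ by at most one. The arm order is also
    shuffled per stratum so no arm systematically receives leftover units.
    """
    rng = random.Random(seed)
    strata: dict = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)

    assignment = {}
    for key in sorted(strata, key=repr):  # deterministic stratum order
        members = strata[key]
        rng.shuffle(members)
        arms = list(ARMS)
        rng.shuffle(arms)
        for i, unit in enumerate(members):
            assignment[unit] = arms[i % len(arms)]
    return assignment


def is_prior_participant(household: int) -> bool:
    """Synthetic stand-in for the real stratification flag (one in five households)."""
    return household % 5 == 0


# Usage: 4,550 numbered synthetic households.
groups = stratified_assign(range(4550), is_prior_participant, seed=2020)
print(Counter(groups.values()))
```

Dealing shuffled units round-robin guarantees near-exact balance within each stratum, which is the point of stratifying; a simple coin flip per unit would not.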
We further cross-randomized 2,250 numbers to receive a midline assessment, and approximately 1,600 of these were randomly selected to receive targeted instruction customized to their learning level using the data collected at midline. The initial randomization to SMS, phone calls and SMS, or the control group was stratified on whether at least one child in the household had previously participated in school-based educational programming, a proxy for having recently made substantial learning gains. Each phone number belongs to a caregiver and household.

### Sample characteristics and representativeness

We include a few descriptive statistics showing how our sample, which represents around 15% of all primary schools in Botswana, compares with nationally representative samples. Botswana has nine regions in total, and our sample covers eight of them, including the most remote and low-literacy regions. Extended Data Fig. 1 compares study sample characteristics with national values for a subset of indicators. We find a similar gender split (between 50% and 52%) in our sample and nationwide. The share of rural students in our sample is also similar to the national average of 29%. We find similar distributions of learning: the percentages of students who score an A, B and C are 16%, 21% and 41% in study schools, respectively, and 14%, 17% and 36% for all primary schools in the nation.

In addition, we collect simple descriptive data on child age, grade and gender in surveys. Around 50% of our sample is female; the average age of students is 9.7 years; 28.5% of students are in grade 3, 39.1% in grade 4 and 32.4% in grade 5. The average age of caregivers participating in the randomized trial was 35 years, and 68% of them were female. Our data show that in the control group the median caregiver spends just 1–2 h per week on educational activities with their child (48.5% of caregivers report this amount).
We asked households to nominate the best person to provide educational support to their child during school disruption: 81% of nominated caregivers were parents, 7.6% were grandparents, 7.8% aunts or uncles, and 2.8% siblings. Additional details on the primary caregiver are in Supplementary Information ‘Section A: Sample Characteristics’. For a subsample of parents (n = 209), we also measure parental education level and additional characteristics. This subset is not necessarily representative of the entire sample. However, these were the most responsive parents, suggesting that their education levels probably represent an upper bound for the sample. In this subsample, 29% had completed schooling beyond secondary school, compared with a national average of 26% based on World Bank data. These measures suggest that the parents in the sample have education levels similar to the national average. Moreover, the study sample has moderate literacy rates similar to those in other low- and middle-income countries: while the average secondary schooling completion rate in Europe and Central Asia is over 90%, average completion rates in lower middle-income countries are only just above 70%47.

### Primary outcomes

For our two main learning outcomes focused on foundational numeracy skills, average level and place value, Fig. 1 (see also Table 1) shows large, statistically significant learning differences between treatment and control groups. For the combined phone and SMS group, there was a 0.121 standard deviation (95% CI 0.031, 0.210; P = 0.008) increase in the average level of numerical operations. The learning gains for the combined phone and SMS intervention also extend to other foundational competencies, such as gains in place value of 0.114 standard deviations (95% CI 0.028, 0.200; P = 0.009). For households that participated in all sessions, instrumental variables analysis in Extended Data Fig. 2 shows learning gains of 0.167 standard deviations (95% CI 0.046, 0.289; P = 0.007).
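The relationship between the intention-to-treat estimate (0.121) and the instrumental variables estimate for full participants (0.167) can be illustrated with the simple Wald ratio. This is a minimal sketch assuming no take-up in the control group and no covariates; the study's IV specification (Extended Data Fig. 2) may include controls, so this approximates rather than reproduces the authors' estimator. The implied full-participation share of about 0.72 is derived arithmetic, not a figure reported in the paper.

```python
def wald_estimate(itt: float, takeup_treated: float, takeup_control: float = 0.0) -> float:
    """Treatment-on-the-treated effect: ITT divided by the first-stage take-up difference."""
    first_stage = takeup_treated - takeup_control
    if first_stage <= 0:
        raise ValueError("take-up must be higher in the treated group")
    return itt / first_stage


def implied_takeup(itt: float, tot: float) -> float:
    """First-stage share that would reconcile a given ITT and TOT estimate."""
    return itt / tot


# Reported ITT (0.121) and IV (0.167) estimates for the phone and SMS arm.
share = implied_takeup(0.121, 0.167)  # derived, not a reported take-up rate
print(round(share, 2), round(wald_estimate(0.121, share), 3))
```

The ratio form makes clear why the IV estimate is larger: the same average effect is attributed only to the households that completed every session.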
As we show later, these results are robust to several validity checks. We find no statistically significant average effects of the SMS-only intervention on any of the three learning proficiencies (average level, place value and fractions; P = 0.602, 0.837 and 0.309, respectively). These results reveal that combined phone and SMS low-tech interventions can generate substantial learning gains, whereas SMS messages alone are not as effective (P = 0.033). This suggests that SMS messages on their own might not be as effective as direct instruction; instead, they might be best placed as a complement to direct instruction through phone calls, as in this study, or as an accountability nudge for education systems, for example as reminders for parents to monitor their child’s academic progress12.

To put the effect sizes of the joint phone and SMS treatment in context, ref. 48 provides benchmarks based on a review of 1,942 effect sizes from 747 randomized controlled trials (RCTs) evaluating education interventions with standardized test outcomes. In this review, 0.10 is the median effect size. A review in ref. 49 also finds a median effect size of 0.10 across 130 RCTs in low- and middle-income countries. Our findings show effect sizes around or above the median, from a relatively cheap and scalable intervention. We further include non-standardized effect sizes in Extended Data Fig. 3. We find a 31% reduction in absolute innumeracy (students who cannot do any numerical operations) and an average level gain on the ASER assessment of 0.15 levels (95% CI 0.039, 0.262; P = 0.008). As a benchmark, a highly effective in-school educational programme, Teaching at the Right Level, achieved average improvements in math ASER levels of 0.09 to 0.13 in Bihar, India15. Furthermore, the learning gains observed were achieved with a total dosage of just 3 h of direct instruction spread over 8 weeks.
If effects persist with a higher dosage, a gain of up to 1–2 ASER levels could potentially be achieved with 20–40 h of instruction, a typical educational programme dosage. Note that the observed gains might reflect new learning, reduced learning loss or a combination of both.

In Extended Data Fig. 4, we explore heterogeneous treatment effects along three dimensions: student gender, student grade and baseline school exam performance. These variables are typical predictors of learning and were available at baseline. We find limited evidence of heterogeneity along any of these margins, with no significant interaction effects (see figure for full results). This suggests that the programme works equally well across these subpopulations. One possible explanation for the striking lack of heterogeneity in treatment effects is the intervention's focus on foundational concepts, which applied to nearly all students. Moreover, because the phone calls were one-on-one interactions, they ensured that no student was left behind.

### Validity checks

We run a series of validity checks for our remote assessments and treatment effects. First, we randomize problems that test the same proficiency, a version of a reliability test used in the psychometric literature17. We randomize among five problems for each proficiency: addition, subtraction, multiplication, division and fractions (Table 2). We find that no random problem, across all proficiencies, differs statistically significantly from a base random problem. Relatedly, we find no difference in treatment effects by the random question received for each proficiency. These tests reveal that the phone-based learning assessment has a high level of internal reliability. Details of statistical results, including P values, standard errors and F-tests, are shown in Table 2.
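As an illustration of the randomized-problem reliability check, the sketch below applies a Pearson chi-square test of equal proportions correct across five randomly assigned versions of one problem. This is a swapped-in, simpler technique: the paper reports regression-based comparisons against a base problem with F-tests (Table 2). The counts are hypothetical, and 9.488 is the standard 5% chi-square critical value for 4 degrees of freedom.

```python
def chi_square_equal_proportions(correct: list, totals: list) -> tuple:
    """Pearson chi-square statistic and degrees of freedom for a k x 2 table.

    Tests whether the share of children answering correctly is the same
    across k randomly assigned versions of a problem.
    """
    if len(correct) != len(totals) or len(totals) < 2:
        raise ValueError("need matching counts for at least two versions")
    pooled = sum(correct) / sum(totals)
    if pooled in (0.0, 1.0):
        raise ValueError("all answers identical; the test is undefined")
    stat = 0.0
    for right, n in zip(correct, totals):
        expected_right = n * pooled
        expected_wrong = n * (1 - pooled)
        stat += (right - expected_right) ** 2 / expected_right
        stat += ((n - right) - expected_wrong) ** 2 / expected_wrong
    return stat, len(totals) - 1


CRITICAL_5PCT_DF4 = 9.488  # 95th percentile of chi-square with 4 degrees of freedom

# Hypothetical counts: five versions of an addition problem, 100 children each.
stat, df = chi_square_equal_proportions([60, 62, 58, 61, 59], [100] * 5)
print(f"chi2={stat:.3f}, df={df}, reject={stat > CRITICAL_5PCT_DF4}")
```

Failing to reject equality here is what "no random problem differs from the base problem" looks like in its simplest form; the regression version additionally allows controls and treatment interactions.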
We further disentangle cognitive skills gains from effort effects, which have been shown to affect test scores18. In our context, where learning outcomes are measured remotely in the household, effort might be particularly important. We test for effort effects with a real-effort task that requires the child to spend time thinking about the question and to exert effort or motivation to answer it beyond simple numerical proficiency (see Methods). As shown in column 1 of Extended Data Fig. 5, around 29% of students in the control group could answer this question, and we find no statistically significant changes in effort as a result of either intervention (β = 0.016, 95% CI −0.026, 0.058; P = 0.448 and β = 0.021, 95% CI −0.021, 0.063; P = 0.335). Column 2 shows the effect on average level as a reference. These results indicate that learning gains due to the intervention are largely a function of cognitive skill rather than effort on the test.

It is also possible that learning gains are a matter of familiarity with the content in the intervention groups, which received exposure to material similar to that on the endline assessment. The familiarity hypothesis is partially tested by randomizing problems of the same proficiency, since this exogenously varies the question asked and minimizes overlap with any particular question asked during the intervention; this does not change our results. We also test the familiarity hypothesis by including content not covered during the intervention but related to it, such as place values; as noted earlier, we find that in the phone and SMS group, learning gains translate to this skill.

We further explore a psychometric validity assessment known as the known-groups method. This approach quantifies whether test scores detect signal across groups that are known to differ50.
We explore differences in learning level by student age and grade in the control group, two of the factors known to most affect differences in cognitive skills in the status quo. Extended Data Fig. 6 shows that the assessment detects large and statistically significant differences across both dimensions. For each additional grade, students score around half an ASER level higher (β = 0.476, 95% CI 0.377, 0.576; P < 0.001), demonstrating the assessment’s ability to differentiate among known groups.

We include a series of additional robustness checks in Supplementary Tables 1 and 2, including P values using randomization inference51 and a joint test of significance for key foundational numeracy learning outcomes. P values change only slightly and the overall results hold, probably because the large study sample reduces the likelihood of these P values differing substantially (see Supplementary Tables 1 and 2 for full statistical results).

Lastly, we explore how effects vary on the basis of whether instruction is targeted to the learner’s level. As seen in Table 1, we find an effect on average level of β = 0.076 for targeted content (95% CI −0.014, 0.165; P = 0.097) and β = 0.070 for non-targeted content (95% CI −0.021, 0.160; P = 0.130). The direct comparison between targeted and non-targeted instruction has a P value of 0.896. Compared with the control, targeted instruction improves understanding of place values by 0.098 standard deviations (95% CI 0.012, 0.185; P = 0.026). Targeted instruction also benefits higher-order competencies such as understanding fractions, with gains of 0.093 standard deviations against the control (95% CI 0.004, 0.182; P = 0.041). There were no significant effects on learning for non-targeted instruction against the control. The difference between targeted and non-targeted instruction is not statistically significant (see Table 1).
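The randomization-inference P values mentioned in the robustness checks above (ref. 51) compare the observed treatment-control difference with the distribution of differences obtained by re-drawing treatment labels. The sketch below is a minimal, unstratified version for a difference in means; a faithful replication would re-randomize within the study's strata and use its regression estimator. The function name is hypothetical and the scores in the usage lines are synthetic.

```python
import random
from statistics import mean


def ri_p_value(treated: list, control: list, n_draws: int = 2000, seed: int = 0) -> tuple:
    """Two-sided randomization-inference P value for a difference in means.

    Treatment labels are re-shuffled n_draws times; the P value is the share of
    re-randomized differences at least as large in absolute value as the
    observed one, counting the observed assignment as one extra draw.
    """
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_draws):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if abs(diff) >= abs(observed) - 1e-12:  # small tolerance for float ties
            extreme += 1
    return observed, (extreme + 1) / (n_draws + 1)


# Synthetic ASER-style levels (0-4) for two small hypothetical groups.
rng = random.Random(1)
control_levels = [rng.randint(0, 4) for _ in range(200)]
treated_levels = [rng.randint(0, 4) for _ in range(200)]
print(ri_p_value(treated_levels, control_levels, n_draws=500))
```

Because the reference distribution comes from the assignment mechanism itself, this P value does not rely on large-sample approximations, which is why it serves as a robustness check.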
### Mechanisms

We explore parental demand and engagement mechanisms. Parental engagement in both interventions is high: column 1 of Extended Data Fig. 7 shows that 92.1% of parents in the SMS only group reported that their child attempted to solve any of the problems (95% CI 0.903, 0.938; P < 0.001), with slightly higher engagement of 95.2% in the phone call plus SMS group (95% CI 0.939, 0.966; P < 0.001). Column 3 of Table 3 also shows significant increases in parents’ self-efficacy and perceptions as a result of both interventions. Parents report 4.9 (95% CI 0.7, 9.1; P = 0.023) and 8.6 (95% CI 4.4, 12.9; P < 0.001) percentage points greater self-efficacy in supporting their child’s learning in the SMS only and the phone and SMS groups, respectively. We also find that parents’ confidence that their child made progress in their learning increases by 6.6 (95% CI 2.4, 10.9; P = 0.002) and 10.5 (95% CI 6.2, 14.8; P < 0.001) percentage points in the SMS only and phone and SMS groups, respectively. Moreover, parents of children in the phone call plus SMS group update their beliefs about their child’s learning level in tandem with their child’s learning progress (see Table 3, columns 1 and 2). These results reveal that parents are engaged in the intervention and notice their child’s progress.

Parents’ engagement in their child’s math learning might displace other educational and non-educational activities, such as returning to work when lockdowns were lifted. In column 6 of Table 3, we find no statistically significant educational crowd-out for either intervention, with no reduction in educational engagement overall (β = −0.001, 95% CI −0.019, 0.017; P = 0.933 and β = −0.002, 95% CI −0.020, 0.016; P = 0.809). In column 5, we find no evidence that parental engagement crowds out non-educational activities such as returning to work, with no statistically significant increase in unemployment in the SMS plus phone intervention (β = −2.9, 95% CI −6.3, 0.5; P = 0.092).
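The introduction reports the self-efficacy effects in relative terms (8.7% and 15.2%), whereas this section reports percentage points (4.9 and 8.6). Under the assumption, not stated explicitly in the paper, that each relative figure is the percentage-point effect divided by the control-group mean, both pairs imply a control mean of roughly 56% to 57%. The quick check below is only a consistency sketch; the helper name is hypothetical and the implied control mean is derived, not reported.

```python
def implied_control_mean(pp_effect: float, relative_effect: float) -> float:
    """Control-group mean (in %) implied if relative effect = pp effect / control mean."""
    return pp_effect / relative_effect


# Reported percentage-point effects paired with the reported relative effects.
for arm, pp, rel in [("SMS only", 4.9, 0.087), ("phone and SMS", 8.6, 0.152)]:
    print(f"{arm}: implied control mean = {implied_control_mean(pp, rel):.1f}%")
```

That both arms imply nearly the same control mean suggests the two ways of reporting the result describe the same underlying estimates.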
Altogether, these results show that remote instruction can change parental beliefs and investments, which play an important role in their child’s learning. The Supplementary Information contains details on each of the mechanisms mentioned here, as well as details on other robustness checks performed.

## Discussion

This paper provides experimental estimates on minimizing the fallout of the COVID-19 pandemic on learning. We show that remote instruction and remote assessment can promote learning. We find that low-tech phone calls plus SMS interventions have large and cost-effective effects on household engagement in education and on learning, while SMS messages alone may not.

Both low-tech interventions are relatively low cost: US$5 per child in the SMS group and US$19 per child in the phone and SMS group. Given an average treatment effect in the phone and SMS group of 0.12 standard deviations, this translates to 0.63 standard deviation gains per US$100. For those who engaged in all sessions of the programme, with a treatment effect of 0.17 standard deviations, this translates into 0.89 standard deviations gained per US$100. These estimates are cost-effective relative to the literature. As a comparison, providing additional textbooks in Kenya had no effect on learning; halving class size in Kenya and India also had no effect on learning; and conditional cash transfers in Malawi yielded around 0.1 standard deviation per US$100 (ref. 52). Another relevant cost-effectiveness comparison is tutoring programmes. An evaluation of remote tutoring with college students in Italy during COVID-19 finds large learning gains of 0.26 standard deviations44. A recent review shows that tutoring programmes have been consistently effective across 96 randomized trials53. The phone call intervention in our trial compares closely with some of these tutoring programmes.
A prominent example yielded 0.19 to 0.31 standard deviation learning gains and cost US$2,500 per child42. These comparisons show that the intervention in this study yields effects similar to those of some of the most effective interventions in the education literature; furthermore, the intervention is adapted to low-resource contexts and, in some cases, can be more than an order of magnitude cheaper.
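The cost-effectiveness figures above follow from dividing a standardized effect by the per-child cost and scaling to US$100. The sketch below reproduces that arithmetic with the reported effects (0.12 and 0.17 standard deviations), the reported US$19 cost and the US$2,500 tutoring comparison; the helper name is illustrative, not from the paper.

```python
def sd_per_100_usd(effect_sd: float, cost_per_child_usd: float) -> float:
    """Standard deviations of learning gained per US$100 spent per child."""
    return effect_sd / cost_per_child_usd * 100


PHONE_SMS_COST = 19.0    # US$ per child, phone and SMS arm
TUTORING_COST = 2500.0   # US$ per child, in-person tutoring comparison (ref. 42)

average_effect = sd_per_100_usd(0.12, PHONE_SMS_COST)      # all assigned households
full_participation = sd_per_100_usd(0.17, PHONE_SMS_COST)  # IV estimate
cost_ratio = TUTORING_COST / PHONE_SMS_COST

print(f"{average_effect:.2f} SD per US$100 (average effect)")
print(f"{full_participation:.2f} SD per US$100 (full participation)")
print(f"tutoring costs about {cost_ratio:.0f} times more per child")
```

A cost ratio of roughly 130 is consistent both with the "two orders of magnitude" comparison in the introduction and with "more than an order of magnitude" here.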

Since Botswana is a middle-income country, we note that cost conversions might be necessary when thinking through external validity of cost-effectiveness estimates to a low-income setting54. We consider purchasing power parity conversion rates to assess cost differences across contexts, although this is an imperfect conversion. In 2020, the purchasing power parity conversion to US dollars in Botswana was 4.5 according to World Bank data. In contrast, in Kenya, another example country context, the purchasing power parity conversion is 44. Thus, for the same total cost and assuming similar effectiveness, the cost-effectiveness in Kenya could be up to 10 times higher. Future research could collect cost as well as effectiveness data to directly compare cost-effectiveness of similar approaches across settings.

We also show that mobile phones provide a cheap and scalable way to collect information on student learning levels. We find that learning gains are robust to a variety of phone-based validity checks, including randomized problems across the same proficiency and real-effort tasks that differentiate effort from cognitive skills. We further find that gains persist in the phone and SMS treatment across multiple waves of assessment.

In terms of mechanisms, we find high parental engagement in educational activities with their children, high demand and greater self-efficacy to support their child’s learning, as well as partial gains in accurate perceptions of their child’s level. This finding reveals that parental investments in education can improve their child’s learning outcomes even in a low-resource context.

Of note, this study is limited to one context and tests a subset of potential low-tech interventions. Future research might explore similar studies across contexts to assess how well the approach can be adapted across low- and middle-income countries, with growing evidence on similar approaches already emerging44,45,55,56. In addition, alternative high-access and low-cost technologies could be compared, such as WhatsApp, or phone calls only with no SMS. For example, some studies have compared WhatsApp for English language instruction to in-person instruction57. We present results after a few months in a high-needs school disruption setting. Future research might explore long-run effects. Research shows that short-term school disruption can cause lasting and accumulating damage5. It is possible that stemming learning loss during shocks could have far-reaching benefits, with students able to reintegrate into school instruction quickly; it is also possible that benefits fade if all students catch up as schools reopen.

The results in this study have immediate implications for global policy during the current school disruptions, revealing cost-effective and scalable approaches to stem learning loss during the pandemic. Moreover, school closures occur in settings beyond the COVID-19 pandemic, including teacher strikes, summer holidays, public health crises, adverse weather events, natural disasters, and refugee and conflict settings. When schooling is disrupted, particularly for families with fewer resources at home, outside-school interventions are needed, and delivering them at scale requires cheap, low-technology solutions that can reach as many families as possible. To this end, the results from this study have long-run implications for the role of technology and parents as partial educational substitutes during school disruption, and for providing cost-effective remote instruction and assessment.

## Methods

Our study complies with all relevant research protocols. It received Institutional Review Board (IRB) approval from Teachers College, Columbia University (IRB Protocol No. 20-299) and is registered on the AEA RCT registry (AEARCTR-0006044 on 25 June 2020; https://www.socialscienceregistry.org/trials/6044). IRB approval by Columbia University was deemed appropriate and sufficient for three reasons. First, the intervention was not sensitive. Second, informed consent was provided by adult caregivers, and all participants provided assent. Third, the local research institutions with jurisdiction over education interventions had recently decentralized research approvals to regional units and prioritized direct respondent consent procedures rather than a centralized IRB protocol. No data points were excluded from the analyses.

### Population and sampling

A few days before the government announced that schools would close under the state of emergency, we collected 7,550 phone numbers from primary schools. This effort built on an active presence in schools by Youth Impact (previously known as Young 1ove), one of the largest non-governmental organizations in Botswana, which was conducting educational programming in partnership with the Ministry of Basic Education. These numbers were collected for students in grades 3 to 5. However, only a subset of them were reachable, often for arbitrary reasons such as misrecorded digits. After collecting and verifying the numbers, facilitators called all of them to confirm parents' interest in receiving remote learning support via phone. As described in Supplementary Information Section A, the final sample characteristics match those of a nationally representative sample on a host of indicators, such as learning levels and parent education.

For parents who opted into remote learning support, we provided two low-tech interventions: (1) one-way bulk SMS texts with multiple numeracy 'problems of the week' and (2) the same bulk SMS texts plus a live 15–20 min phone call walking through the problems. Both interventions were intentionally designed to be simple. This made them easy for parents, teachers and students to digest via phone and easy for governments to scale.

### Treatments

The first intervention was a weekly SMS containing several simple math problems. For example: "Sunshine has 23 sweets. She goes to the shops to buy 2 more. How many does she have altogether?" The SMS was sent at the beginning of each week via a bulk texting platform. It contained 160 to 320 characters, so it fit in one or two texts. Supplementary Fig. 4 shows an example weekly message of practice problems focused on place value.
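The one-or-two-text constraint can be illustrated with a short sketch that packs whole words from a weekly message into texts of at most 160 characters. It assumes the 160-character limit of a single plain-text (GSM-7) SMS and that each part is sent as an independent text. Concatenated multipart SMS reserve part of each segment for a header, so a real bulk platform may split messages differently. The function name `split_into_texts` is hypothetical and does not describe the platform used in the study.

```python
from typing import List


def split_into_texts(message: str, limit: int = 160) -> List[str]:
    """Greedily pack whole words into texts of at most `limit` characters.

    Assumes independent single-segment SMS of `limit` characters (hypothetical helper).
    """
    texts: List[str] = []
    current = ""
    for word in message.split():
        if len(word) > limit:
            raise ValueError(f"word longer than {limit} characters: {word!r}")
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate  # the word still fits in the current text
        else:
            texts.append(current)  # close the current text and start a new one
            current = word
    if current:
        texts.append(current)
    return texts


example = ("Sunshine has 23 sweets. She goes to the shops to buy 2 more. "
           "How many does she have altogether?")
parts = split_into_texts(example)
```

A message in the 160 to 320 character range then yields one or two parts, which matches the design constraint in the text.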

The second intervention added a weekly phone call, typically 15–20 min long, to the weekly SMS sent at the beginning of the week. On the call, the facilitator asked the parent to find the student and put the call on speaker. This allowed both the parent and the student to hear the facilitator at the same time and to engage in learning. The facilitator confirmed that the student had received the SMS and answered any questions related to the task. The facilitator also gave the student a math question to go over and practice. The calls provided additional learning support as well as motivation and accountability. Supplementary Fig. 5 includes a subset of a sample phone call script.

A subset of phone numbers also received an additional intervention: instruction targeted to each child's learning level. We used learning-level data from a midline phone-based assessment, collected in week 4, to send tailored text messages to each student in the fifth week. This allowed instruction to be targeted using near real-time data. For example, students who knew addition received subtraction problems, whereas students who knew multiplication received division problems. Additional endline survey data and learning assessments enabled evaluation of the targeted instruction component of the intervention.
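The targeting rule can be sketched as a lookup from a child's highest mastered operation at the week-4 assessment to the problem set sent in week 5. The two mappings stated above (addition to subtraction, multiplication to division) come from the text. The full ordering of levels, the handling of children who could not yet add, and the handling of children who had already mastered division are assumptions for illustration. The function name is hypothetical.

```python
from typing import Optional

# Assumed ordering of numeracy levels (ASER-style); illustrative only.
LEVELS = ["addition", "subtraction", "multiplication", "division"]


def targeted_problem_set(highest_mastered: Optional[str]) -> str:
    """Return the operation to practise next: one level above the child's current level."""
    if highest_mastered is None:
        # Child could not yet do addition: start at the first level (assumption).
        return LEVELS[0]
    idx = LEVELS.index(highest_mastered)  # raises ValueError for unknown labels
    # Children already at the top level keep practising it (assumption).
    return LEVELS[min(idx + 1, len(LEVELS) - 1)]


# Hypothetical midline results keyed by household identifier.
midline = {"hh_001": "addition", "hh_002": "multiplication", "hh_003": None}
week5_sets = {hh: targeted_problem_set(level) for hh, level in midline.items()}
```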

The targeted instruction component of the intervention relates to a broader literature on targeted instruction. 'Teaching at the Right Level' (TaRL) is a classroom-based approach that targets instruction by learning level rather than by age or grade. It has been evaluated over 20 years and has been shown to produce cost-effective learning gains across multiple studies. This approach has worked when delivered by teachers or volunteers13,14,15,16,58 and when using adaptive computer software13,59. We tested a particularly low-cost and scalable approach to targeted instruction using phone-based assessments and instruction.

### Data collection

We conducted two waves of data collection: a midline shortly before the halfway point and an endline after 4 months. The endline survey consisted of 17 questions. These covered a learning assessment, parental engagement in educational activities, and parents' perceptions of their own self-efficacy and of their child's learning. Part of the survey was conducted with the parent, while learning outcomes were collected by directly assessing the child over the phone.

The learning assessment was adapted from the ASER test, which consists of multiple numeracy items, including 2-digit addition, subtraction, multiplication and division problems. Supplementary Fig. 6 shows a sample assessment. We focused on basic numeracy interventions and assessment. Learning gains might also translate to literacy, but we did not measure literacy in this study. To maximize the reliability of the phone-based assessment, we introduced a series of quality-assurance measures. Students had a time cap of 2 min per question to minimize the likelihood of other household members assisting the child. We also asked each child to explain their work and marked a problem correct only if the child could correctly explain how they solved it. We assigned facilitators to phone numbers arbitrarily, in phone number order. On average, each facilitator was assigned about 30 phone numbers. Fewer than 1.5% of facilitators who provided weekly intervention calls surveyed the same household they had called, which supports objective assessment. While imperfect, these measures provide a level of verification that maximizes the likelihood that the test captures the child's own learning. We have previously discussed practical steps to implement learning measurement via phone60. We also conducted several checks to validate measures, as described below.
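The quality-assurance scoring rule above can be written as a small function. An item counts as correct only if the answer is right, is given within the 2-minute cap, and comes with a correct explanation. Treating the child's level as the highest operation scored correct is an assumption based on ASER-style levels. The field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

TIME_CAP_SECONDS = 120  # 2-minute cap per question, as described in the text

# Assumed ASER-style ordering of operations; level 0 means no operation scored correct.
LEVELS = ["addition", "subtraction", "multiplication", "division"]


@dataclass
class ItemResponse:
    operation: str             # e.g. "addition" (hypothetical field names)
    answer_correct: bool       # final answer matched the answer key
    seconds_taken: float       # time the child took to answer
    explained_correctly: bool  # facilitator judged the child's explanation correct


def score_item(r: ItemResponse) -> bool:
    """Correct only if right, within the time cap, and correctly explained."""
    return r.answer_correct and r.explained_correctly and r.seconds_taken <= TIME_CAP_SECONDS


def numeracy_level(responses: List[ItemResponse]) -> int:
    """Highest level (1-4) with a correctly scored item, or 0 if none (assumed rule)."""
    level = 0
    for r in responses:
        if score_item(r):
            level = max(level, LEVELS.index(r.operation) + 1)
    return level
```

For example, a child who answers addition correctly and explains it, but exceeds the cap on subtraction and cannot explain a multiplication answer, would be scored at the addition level.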

In addition to the ASER test, we evaluated the children's ability to answer a simple place value word problem such as "Katlego has 77 apples and organizes them by place value. How many tens does she have?" This captures learning outcomes beyond a core set of mathematical operations. We included a series of additional questions to identify mechanisms driving learning gains. These include a measure of student effort using the following question: "The day before two days from now is Saturday. What day is today?" Note that this question is not necessarily a pure measure of student effort and could also capture other related capabilities. There is no standard measure of student effort. Real-effort tasks range from solving mazes61 to adding series of 2-digit numbers62. Other proxies of effort include the rate of decline in performance as a test progresses or the effort exerted while filling out an additional survey63. Since we aimed to differentiate numerical ability from effort, we built on arithmetic real-effort tasks. However, we chose a problem whose arithmetic (single-digit addition and subtraction) was easier than that in our numeracy assessment, yet which required non-arithmetic effort. We also included a higher-order numeracy question to assess whether learning gains translate to material not covered directly in the intervention. In particular, we asked a question on fractions such as '$$\frac{3}{8} + \frac{5}{8} = ?$$'. We further conducted a reliability assessment by randomizing five different questions for each proficiency (addition, subtraction, multiplication, division and fractions) to formally assess the reliability of the learning assessment questions17. For example, for division, one problem asked students to divide 68 by 5 and another asked them to divide 38 by 3. Both are 2-digit division problems with a remainder. If both problems yield similar results, consistent with their measuring the same latent ability, this increases our confidence in the learning estimates.
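The reliability check can be sketched as follows. Each child is randomly assigned one of five variants per proficiency, and the share answering correctly is compared across variants. Similar shares are consistent with the variants measuring the same latent ability. The two division items come from the text. The seed, the function names and the use of a simple largest-gap summary, rather than a formal test of equal proportions, are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, Tuple

N_VARIANTS = 5  # five versions per proficiency, as in the text

# Two of the five division variants given in the text; the other three are not listed.
DIVISION_VARIANTS_FROM_TEXT = [(68, 5), (38, 3)]


def is_two_digit_with_remainder(dividend: int, divisor: int) -> bool:
    """Check that an item is a 2-digit division problem that leaves a remainder."""
    return 10 <= dividend <= 99 and dividend % divisor != 0


def assign_variants(child_ids: Iterable[str], seed: int = 2020) -> Dict[str, int]:
    """Randomly assign each child a variant index in 0..N_VARIANTS-1 (arbitrary seed)."""
    rng = random.Random(seed)
    return {cid: rng.randrange(N_VARIANTS) for cid in child_ids}


def share_correct_by_variant(results: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Map each variant index to the share of children who answered it correctly."""
    attempts: Dict[int, int] = defaultdict(int)
    correct: Dict[int, int] = defaultdict(int)
    for variant, is_correct in results:
        attempts[variant] += 1
        correct[variant] += int(is_correct)
    return {v: correct[v] / attempts[v] for v in sorted(attempts)}


def largest_gap(shares: Dict[int, float]) -> float:
    """Spread between the highest and lowest share correct; small gaps suggest comparable items."""
    return max(shares.values()) - min(shares.values())
```

In practice, a formal test (for example, a chi-square test of equal proportions across variants) would replace the largest-gap summary.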

We also included questions on parental engagement, perceptions and self-efficacy. We measured learning engagement by asking parents whether they recalled their child attempting any of the problems sent over the last few weeks. We measured the accuracy of a parent's perception of their child's numeracy level by directly comparing the level the parent reported with the child's assessed learning level. Suppose a parent estimates that the highest level their child can do is subtraction, and the child indeed performs up to subtraction level. We code this as 'correct'. If the parent overestimates or underestimates their child's level, we code this as 'incorrect'. We also captured parents' confidence in supporting their child's learning at home and whether they felt their child made progress during the school closure period. For both indicators, we coded a dummy variable for whether parents were 'very confident'. Additional questions asked whether the caregiver had returned to work. Finally, demographic questions recorded the child's age, grade and gender. A sample survey is included in Supplementary Information (Additional Supplementary Information (1)).
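The coding of parental perceptions and confidence can be sketched as below. Levels use an assumed numeric encoding (0 = none, 1 = addition, 2 = subtraction, 3 = multiplication, 4 = division). 'Correct' requires an exact match, and both over- and underestimates are coded 'incorrect'. The 'very confident' label follows the text, while the function names and the direction-of-error helper are hypothetical additions for illustration.

```python
# Assumed numeric encoding of numeracy levels (hypothetical):
# 0 = none, 1 = addition, 2 = subtraction, 3 = multiplication, 4 = division


def perception_accuracy(parent_estimate: int, assessed_level: int) -> str:
    """'correct' only when the parent's estimate equals the assessed level."""
    return "correct" if parent_estimate == assessed_level else "incorrect"


def perception_error(parent_estimate: int, assessed_level: int) -> str:
    """Direction of the error, separating overestimates from underestimates."""
    if parent_estimate > assessed_level:
        return "overestimate"
    if parent_estimate < assessed_level:
        return "underestimate"
    return "exact"


def very_confident(response: str) -> int:
    """Dummy equal to 1 only for the top response category 'very confident'."""
    return int(response.strip().lower() == "very confident")
```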

We also conducted a similar midline assessment to cross-randomize targeted instruction (described above) and asked about demand for remote learning services if schools were to reopen.

### Empirical strategy

We estimated treatment effects of the SMS only and phone and SMS intervention using the following specification:

$$Y_{ij} = \alpha _0 + \beta _1\mathrm{SMS}_j + \beta _2\mathrm{PhoneSMS}_j + \delta _s + \varepsilon _{ij}$$

where $$Y_{ij}$$ is an outcome for child $$i$$ in randomly assigned household $$j$$. $$\mathrm{SMS}_j$$ is an indicator variable coded one for the SMS-only treatment group and zero otherwise. $$\mathrm{PhoneSMS}_j$$ is an indicator variable coded one if a household received both an SMS and a phone call and zero otherwise. $$\delta_s$$ is a strata indicator for whether a child participated in education programming immediately before the intervention, $$\alpha_0$$ is a constant and $$\varepsilon_{ij}$$ is an error term. $$\beta_1$$ and $$\beta_2$$ are the treatment effects of the SMS arm and of the phone and SMS arm, respectively, relative to the control group. We used this specification to measure the impact of each intervention on students' learning levels, engagement, and parents' perceptions of their child's level and self-efficacy. We included the primary child identified for instruction in each household $$j$$. Households are defined by the caregiver's phone number, which is the unit of randomization. The vast majority of households had one child at the time of the programme: only 413 households had 2 children, and only 52 households had 3 or more. Moreover, the number of children was balanced across treatment arms. We regressed the number of children assessed per household, coded as a continuous variable, on an indicator for any treatment group. The coefficient was −0.02 (95% CI −0.052, 0.011; P = 0.329), revealing no statistically significant difference in the number of children per household across arms. As a robustness test, we ran the main learning regression on households with only one participating child. We found average learning gains of 0.136 standard deviations (95% CI 0.041, 0.232; P = 0.005), compared with 0.12 for the full sample.
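The specification can be estimated by ordinary least squares. The sketch below builds the design matrix (a constant, the SMS and PhoneSMS indicators, and a single strata dummy) and solves it with NumPy's `lstsq`. It uses simulated, noise-free data so that the coefficients are recovered exactly. It also omits the standard errors needed for inference. The function name, sample size and simulated effect sizes are hypothetical and are not study data.

```python
import numpy as np


def estimate_effects(y, sms, phone_sms, strata):
    """OLS of y on a constant, the two treatment indicators and a strata dummy.

    Returns the coefficient vector (alpha0, beta1, beta2, delta).
    """
    X = np.column_stack([np.ones(len(y)), sms, phone_sms, strata]).astype(float)
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


# Simulated, noise-free example with hypothetical effect sizes (in SD units).
rng = np.random.default_rng(0)
n = 300
arm = rng.integers(0, 3, size=n)      # 0 = control, 1 = SMS only, 2 = phone + SMS
sms = (arm == 1).astype(int)
phone_sms = (arm == 2).astype(int)
strata = rng.integers(0, 2, size=n)   # 1 = prior education programming (strata)
y = 0.05 * sms + 0.15 * phone_sms + 0.3 * strata
print(estimate_effects(y, sms, phone_sms, strata))
```

With a single binary regressor, as in the household-size balance check, the OLS coefficient reduces to the treatment-minus-control difference in means.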

We also estimated the effect of targeted instruction with the following specification:

$$Y_{ij} = \alpha _0 + \beta _1\mathrm{Targeted}_j + \beta _2\mathrm{NotTargeted}_j + \delta _s + \varepsilon _{ij}.$$

Given randomization and equivalent treatment and control groups, each specification identifies the causal effects of the interventions. In Supplementary Tables 3 and 4, we compare survey response rates between each treatment group and the control group and between treatment groups, and find no statistically significant differences. This suggests that differential attrition does not bias endline comparisons across study groups. We also show no statistically significant differences between groups on baseline characteristics, providing evidence that randomization was successfully implemented. All statistical tests are two-tailed, and given the large sample size in this study, the data distribution is assumed to be approximately normal.

### Statistical power

We conducted power calculations in Stata using estimates based on the literature. We estimated that our study sample is sufficiently large to detect learning gains of 0.075 standard deviations. Median effects in the education literature are approximately 0.10 standard deviations48,49. These initial power calculations therefore suggest that the study had high statistical power to detect effects of the magnitude typically reported in the literature.
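For intuition, a textbook minimum detectable effect (MDE) for comparing two arms on a standardized outcome can be sketched in Python. The sketch assumes individual-level randomization, two-sided testing, an optional share of outcome variance explained by covariates (`r_squared`), and no attrition. The study's Stata calculation drew on literature-based inputs that are not reported here, so this sketch is not expected to reproduce the 0.075 SD figure. The function name and defaults are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n_treat: int, n_control: int, alpha: float = 0.05,
                              power: float = 0.80, sd: float = 1.0,
                              r_squared: float = 0.0) -> float:
    """MDE = (z_{1-alpha/2} + z_{power}) * sd * sqrt((1 - R^2) * (1/n_T + 1/n_C)).

    Assumes individual randomization and a two-sided test (illustrative sketch).
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return multiplier * sd * sqrt((1 - r_squared) * (1 / n_treat + 1 / n_control))


# Hypothetical arm sizes; covariate adjustment (r_squared > 0) shrinks the MDE.
print(minimum_detectable_effect(1500, 1500))
print(minimum_detectable_effect(1500, 1500, r_squared=0.3))
```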

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.