The nutritional quality of cereals varies geospatially in Ethiopia and Malawi

Micronutrient deficiencies (MNDs) remain widespread among people in sub-Saharan Africa1–5, where access to sufficient food from plant and animal sources that is rich in micronutrients (vitamins and minerals) is limited due to socioeconomic and geographical reasons4–6. Here we report the micronutrient composition (calcium, iron, selenium and zinc) of staple cereal grains for most of the cereal production areas in Ethiopia and Malawi. We show that there is geospatial variation in the composition of micronutrients that is nutritionally important at subnational scales. Soil and environmental covariates of grain micronutrient concentrations included soil pH, soil organic matter, temperature, rainfall and topography, which were specific to micronutrient and crop type. For rural households consuming locally sourced food—including many smallholder farming communities—the location of residence can be the largest influencing factor in determining the dietary intake of micronutrients from cereals. Positive relationships between the concentration of selenium in grain and biomarkers of selenium dietary status occur in both countries. Surveillance of MNDs on the basis of biomarkers of status and dietary intakes from national- and regional-scale food-composition data1–7 could be improved using subnational data on the composition of grain micronutrients. Beyond dietary diversification, interventions to alleviate MNDs, such as food fortification8,9 and biofortification to increase the micronutrient concentrations in crops10,11, should account for geographical effects that can be larger in magnitude than intervention outcomes.

Article national or international food-composition tables are routinely used in SSA 2,5 . However, data are typically limited to single values for each food-nutrient combination and are therefore of limited value for exploring spatial drivers of micronutrient intake. This is problematic given the widespread consumption of locally produced staple foods such as cereal grains among smallholder farming communities, and given that food micronutrient composition varies within countries 5, 17 .
Subnational food composition data have not yet been used to inform estimates of the prevalence of MNDs in SSA. This is despite a long history of studies in the fields of medical geology and veterinary sciences, which have linked human and livestock nutritional status to soil and landscape factors. These include studies on the micronutrients I 18 , Se 4, 18,19 , molybdenum (Mo) 20 and cobalt (Co) 21 . Substantial variation in the Ca, Fe, Se and Zn concentrations of staple crops was recently reported from nine districts in Uganda and some of this variation was associated with soil characteristics 17 . The case for using subnational food composition data to estimate the prevalence of MNDs is strengthened by recent evidence of long-range geospatial variation in grain Se concentration being linked to biomarkers of Se dietary status. In parts of the Amhara region of Ethiopia, nutritionally important variation in the grain Se concentration of teff (Eragrostis tef (Zucc.) Trotter) and wheat (Triticum aestivum L.) was associated with soil and landscape covariates at distances extending further than 100 km, which is similar to patterns seen among human biomarkers of Se dietary status 22,23 . In Malawi, human Se dietary status was associated with variation in the grain Se concentration in maize (Zea mays L.) at localized scales 4,5,15,24-26 . Grain sampling for micronutrient quality has not yet been conducted systematically at wider geographical scales in SSA.

Grain surveys of cultivated land in Ethiopia and Malawi
Grain micronutrient concentrations in cereal crops are reported here from 1,389 locations in Ethiopia, based on a spatially balanced sample in the Amhara, Oromia and Tigray regions, during the late-2017 and late-2018 harvest seasons. The sampling frame represents a subset of around 354,000 km 2 of cultivated land-representative of most of the cereal production area in Ethiopia (Fig. 1)-which was constrained to accessible locations that were less than 2.5 km from a known road. At each location, a grain sample and a co-located composite soil sample were taken with the informed consent of the farmer. Grain samples reported from Ethiopia included teff (n = 373), wheat (n = 328), maize (n = 302), sorghum (Sorghum bicolor (L.) Moench; n = 138), barley (Hordeum vulgare L.; n = 181) and finger millet (Eleusine coracana (L.) Gaertn.; n = 39), with a smaller number of triticale (× Triticosecale Wittm. ex A. Camus; n = 20) and rice (Oryza sativa L.; n = 8) samples. In Malawi, grain micronutrient concentrations are reported from 1,812 locations, which were sampled during the April-June 2018 harvest season. The Malawi sampling frame represents around 66,000 km 2 of cultivated land (Fig. 1). Cereal production in Malawi is less diverse than in Ethiopia and grain samples comprised mostly maize (n = 1,608) samples, together with sorghum (n = 117), rice (n = 54), pearl millet (Pennisetum glaucum (L.) R. Br.; n = 32) samples and a single finger millet sample. Sampling designs, grain analyses and geostatistical methods expanded on those described for part of the Amhara region 22 (Extended Data Figs. 1-9 and Extended Data Tables 1-3).

Variation between and within crops
Grain Ca, Fe, Se and Zn concentrations varied substantially between and within crop species (Fig. 2   the lowest concentration of all four micronutrients; people relying on maize-based diets are therefore likely to have the lowest micronutrient intakes. Finger millet is a good potential source of Ca, with a median grain Ca concentration (4,574 mg kg -1 ; range, 3,203-6,264 mg kg -1 ) that is almost two orders of magnitude greater than in maize (median, 64.5 mg kg -1 ) in Ethiopia. The single finger millet sample from Malawi had a grain Ca concentration of 3,564 mg kg -1 , which is consistent with the large Ca concentrations that have been reported for finger millet flour from markets in Malawi 26 (range, 2,900-4,700 mg kg -1 ). A screen of finger millet varieties in India 27 showed a grain Ca concentration range of 1,350-3,120 mg kg -1 (n = 49 genotypes), indicating that the trait is conserved across the species. Although the bioavailability of Ca in finger millet will be limited by phytate, as it is for teff, which also had a high grain Ca concentration in Ethiopia (median, 1,473 mg kg -1 ; range, 46.8-7,925 mg kg -1 ), fermenting flour to produce injera (a thin flat bread that is commonly consumed in Ethiopia) increases the bioavailability of Ca and other mineral micronutrients by stimulating endogenous phytase enzymes 28 . Within-species variation in grain Ca, Fe, Se and Zn concentrations will arise due to spatial variation in soil and landscape factors, and the effects of extrinsic soil dust. The potential effects of soil dust are pronounced for Fe; soil total Fe concentrations (medians, 92,744 and 28,804 mg kg -1 for Ethiopia and Malawi, respectively) are more than three orders of magnitude larger than median grain Fe concentrations of 20.3 and 21.3 mg kg -1 for maize grain in Ethiopia and Malawi, respectively. Grain concentrations of Se and Zn are less sensitive to soil dust due to the much lower total concentrations of these elements in soils (medians, 0.35 and 0.32 mg kg -1 for Se and 100.1 and 33.7 mg kg -1 for Zn, in Ethiopia and Malawi, respectively).

Geospatial mapping and dietary contributions
This study focused on mineral concentrations in grain of teff and wheat for Ethiopia and maize for Malawi. These crops were chosen because they comprise a large proportion of the energy intake in national diets and have good spatial coverage in the survey. Grain concentration maps were based on ordinary kriging. Kriging variances-the expected squared error of the predictions-quantify the uncertainties in the maps (Extended Data Figs. 3, 4). The dietary contribution for each crop-nutrient combination was then mapped as a percentage of dietary requirements and visualized on quartile scales from yellow (small) to dark red (large). These calculations used food balance sheets from the Food and Agriculture Organization 29 and assumed estimated average requirement (EAR) thresholds for a representative demographic group-adult women aged 18-24 years eating an unrefined (that is, high phytate) diet 30 .
There is spatially dependent variation in grain Ca, Fe, Se and Zn concentrations over large distances in Ethiopia and Malawi (Figs. 3, 4). These observations are likely to be of nutritional importance given that most cereals are grown, milled and consumed locally in Ethiopia 31 and Malawi 5 , because they show that the dietary supply and intake of these nutrients varies substantially from one location to another. In Ethiopia, spatial dependencies were seen over distances from 100 to 200 km for the concentration of Ca in teff, for the concentrations of Fe in teff and wheat, and for the concentration of Se in wheat (Extended Data Fig. 2 and Extended Data Table 3). For grain Se and Zn concentrations in teff, and grain Ca and Zn concentrations in wheat, longer-range spatial variation extends beyond 250 km. In Malawi, the spatial dependence of the variation in grain Ca, Fe and Se concentration of maize occurs at distances of 50-80 km. For the concentration of Zn in maize grain in Malawi, more of the variation was attributable to differences over distances that were too short to be resolved by our sampling frame, although the value of the variogram still increased at distances up to 100 km.
In Ethiopia, the Ca concentrations in teff and wheat grain were generally greater in crops sampled from north, northeast and east Tigray region, north and northwest Amhara region and from areas of the Rift

Article
Valley than crops from most of the Amhara region (Fig. 3). A trend of increasing grain Se concentrations in teff and wheat 22 and increasing Se dietary status of children 31 from the west to the east of the Amhara region has been reported previously. The largest grain Fe and Zn concentrations in teff and wheat were in north Amhara, Tigray and northwest and southwest Oromia regions. In Malawi, maize grain Ca, Se, Zn and-to a lesser extent-Fe concentrations are typically greater in the Shire Valley of the southern region, than in the northern and central regions (Fig. 4). These observations are consistent with observations that crops growing on localized soil types (for example, Vertisols) have greater grain micronutrient concentrations than crops growing on more weathered, acidic soils 4,5,25,26 . As in Ethiopia, geographical areas with the largest maize grain Se concentration in Malawi co-located with areas of Se sufficiency that have been reported in both cross-sectional studies 15 and representative surveillance analyses 24 of human biomarkers of Se dietary status.
Links between grain micronutrient concentration and a dietary outcome were observed for Se, for which there is a reliable biomarker of dietary status 7,13 . Biomarker data for Se for 321 enumeration areas in Ethiopia 23 and 101 enumeration areas in Malawi 24 are available from recent National Micronutrient Surveys. For each enumeration area, the nearest grain sample site for any crop type was identified. In both countries, there is a statistically significant positive relationship between the Se concentration in grain and the Se concentration in serum (Ethiopia) and plasma (Malawi) (Extended Data Fig. 5). Direct evidence of links between the grain concentration of other micronutrients, biomarkers of dietary status and health outcomes remains a major research challenge.
Soil and environmental factors that influence variation in grain micronutrients-for teff, wheat and maize in Ethiopia, and for maize in Malawi-were observed (Extended Data Figs. 6-9 and Extended Data Table 3). For example, soil pH was statistically significantly correlated with grain Se concentration for all crops in both countries (positive relationships). Grain Zn concentration was significantly correlated with soil pH for teff in Ethiopia (negative relationship) and for maize in both countries (positive relationships). Further studies of contrasting responses to soil pH, observed in teff and maize, are needed. However, the generally weak predictive value of soil pH on grain Zn concentration (Extended Data Figs. 7, 9) is consistent with a survey of staple crops in Uganda 17 . Soil organic carbon was significantly correlated with grain Se concentration in wheat in Ethiopia (negative relationship), grain Zn concentration in wheat in Ethiopia (positive relationship) and grain Zn concentration in maize in both countries (positive relationships). Grain micronutrient composition was also significantly correlated with at least one environmental covariate for each micronutrient, crop and country. In Ethiopia, grain Se concentration was negatively correlated with mean annual precipitation for each crop and was positively correlated with mean annual temperature for teff and wheat in Ethiopia and maize in Malawi. Mean annual temperature and topographic index-a measure of soil wetness at a site-were positively correlated with grain Zn concentration for maize in Malawi. Multivariate spatial statistical modelling could be informative about soil and environmental factors that jointly influence the variation of grain micronutrient concentration; however, this requires additional statistical assumptions and is beyond the scope of this study.
In Ethiopia, the proportion of dietary Ca requirements that can be met by the consumption of teff and wheat is much greater than for maize in Malawi, however, it is still likely to be less than 25% of the required amount for most of the population (Fig. 3). In Malawi, maize intake provides less than 3% of dietary Ca requirements for many, despite providing more than 50% of dietary energy requirements (Fig. 4). Cereals provide a greater proportion of the dietary Fe, Se and Zn requirements than Ca in both countries. For example, for some rural households in Malawi, more than 100% of dietary Fe, Se and Zn requirements will be met by eating locally produced maize alone. However, most households will receive less than 25% of Se, less than Ca 79.9-1,080 71.7-79. 8  50% of Fe and less than 75% of their Zn requirements from a typical consumption pattern for maize.

Surveillance and interventions
Surveillance of MNDs and food system foresighting activities currently rely on national or international food composition data to estimate micronutrient supply as a proxy for intake 6,7 . Subnational food composition data are not yet used, probably because of the logistical and conceptual challenges in generating data in the required forms. Challenges include sampling crops across large areas during short harvest periods, analysing large numbers of samples and associated data in areas in which access to suitable laboratories and trained personnel is often lacking, and communicating sparse data and associated uncertainties. Spatial dependencies in the grain micronutrient concentrations found in this study can be communicated simply using localized administrative level boundaries. Mean grain Ca and Zn concentrations for teff and wheat show spatial trends when mapped to woreda administrative levels in the Amhara region of Ethiopia using geostatistical models and block kriging (Fig. 5). Ideally, improved access to diverse diets would alleviate many MNDs 1-6 . However, this is unlikely to be feasible in the shorter term for many people for socioeconomic reasons 6 . A crop micronutrient survey within a country could inform shorter-term interventions to alleviate MNDs. Such interventions include supplementation 32 , food fortification 8,9 and biofortification of staple crops through breeding and agronomic approaches 10,11 . For example, Zn-biofortified wheat varieties, which were recently released in India and Pakistan 33,34 , have been bred with a target to increase Zn concentrations by 8-12 mg kg -1 above a notional baseline grain Zn concentration of 25 mg kg -1 . Similarly, increases in grain Zn concentration in new Zn-biofortified hybrid maize varieties of 15% in Guatemala (ICTA HB-18 and ICTA B-15) and 36% in Colombia (BIO-MZN01), with target levels of around 30 mg kg -1 , have been reported 35 . Here, geographical differences in grain Zn concentration, in both wheat (Ethiopia) and maize (Malawi), can exceed these breeding targets. Subnational spatial sources of variation could support priority areas for the release of new crop varieties or micronutrient fertilization strategies and improve impact evaluations of biofortification interventions. The persistence of MNDs is clearly more complex than micronutrient supply in cereal grains 36 . For example, livestock are important components of diverse diets and income; identifying areas of lower concentrations of micronutrients in forage crops could inform strategies to improve livestock health and production 20,21 .
This study did not explore the effects of crop variety (genotype) or farmer management strategies (management), which will contribute substantially to the variation in yield and grain micronutrient concentrations, even over short distances within and between fields of the same farm 37,38 . For example, the preferential use of locally sourced organic materials by smallholder farmers on certain fields can improve the quality and yield of grain micronutrients in maize-based 38 and wheat-based 39 systems in SSA. Temporal variation in environmental (environment) factors, such as the projected decreases in cereal grain micronutrient concentrations due to increased atmospheric CO 2 -of 6% for Fe and 9% for Zn in wheat by the mid-twenty-first century 40 -and increased leaching of soil Se under higher rainfall 41 , should also be considered. However, rising temperatures may compensate for some of these effects in terms of grain micronutrient quality 42 . A better understanding of how the complex interactions between genotype, management and environment drive crop micronutrient quality within diverse, climate-smart farming systems is essential for a more-sustainable global food system.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-021-03559-3.

Ethical approval
Grain and soil samplings from farmers' fields were completed in November-December 2017 (for most of Amhara region) and November 2018-January 2019 (Amhara, Oromia and Tigray regions) in Ethiopia, and in April-June 2018 in Malawi. The work involved sampling of grain and soil from farmers' fields and grain stores with the informed consent of the farmers. The work was conducted under ethical approvals from the University of Nottingham, School of Sociology and Social Policy Research Ethics Committee (REC); BIO-1718-0004 and BIO-1819-001 for Ethiopia and Malawi, respectively. These REC approvals were recognized formally by the Directors of Research at Addis Ababa University (Ethiopia) and Lilongwe University of Agriculture and Natural Resources (Malawi), who also reviewed the study protocols.

Sampling design
The objective of this study was to support the spatial mapping of grain Ca, Fe, Se and Zn concentration of cereal crops. We sought reasonable spatial coverage over the target sample frame and used 'main-site' and 'close-pair' sampling to support the estimation of parameters of the spatial linear mixed model (LMM) 22, 43 .
In Ethiopia (Amhara, Oromia and Tigray regions), target sample frames were constrained to locations at which the probability of the land being under crop production was ≥0.9 based on predictions produced on a 250-m grid. These predictions were derived from the interpretation of high-resolution satellite imagery by trained observers and by machine learning methods applied to multiple covariates derived from remote sensing data and digital elevation models 44,45 . The sample frame was further constrained to include only those locations from a 250-m grid that fell within 2.5 km of a known road. A map indicating nodes on a 250-m grid (with the same origin as the agricultural land-use grid) that met this requirement was prepared. These constraints may introduce possible biases into predictions made at locations outside the designed sample frame, however, it would not otherwise have been possible to visit all of the sample locations in the time available. Information on the distribution of roads was taken from OpenStreetMap (www.openstreetmap.org). Of a total land area of around 1.1 million km 2 in Ethiopia, the total cropland mask represented 354,325 km 2 , of which 220,467 km 2 was within 2.5 km of a known road. In Malawi, the cropland area was determined from the European Space Agency Climate Change Initiative 46 . The agricultural area used was defined as including all raster cells that included the category of 'cropland' in their description. In Malawi, where road access to cropped areas is generally better than in Ethiopia, no constraint to road distance was imposed on sample locations. The mapped cropland areas are shown in Fig. 1.
In Ethiopia, a total of 1,825 primary sample locations were selected a priori, with each 250-m grid node within the sampling frame allocated an equal prior inclusion probability. This was done using the lcube function from the BalancedSampling library for the R platform 47,48 . The lcube function implements the cube method, to enable random sampling according to specified inclusion probabilities while aiming for balance and spread with respect to the specified covariates 49 . Here, inclusion probabilities were uniform across the sample frame and sample locations were selected for spatial balance, which entails that the mean coordinates of sample locations are close to the mean coordinates of all points in the sample frame, and spatial spread, which ensures that the observations are spread out rather than clustered with respect to the spatial coordinates 50 . A subset of 175 of these locations were selected as close-pair sites at which an additional nearby sample was taken to support the estimation of parameters of the spatial LMM 43 .
In Malawi, a different sampling method was used to achieve good spatial coverage of a total of 1,710 main-site locations. These included 820 fixed sample points from the 2015/16 Demographic and Health Survey of Malawi 24,51 . The stratify function in the spcosa library for the R platform 52 divides a sampling domain into Delaunay polygons centred on a set of fixed points and with the remaining polygon centroids selected to partition the domain into approximately equal-area regions. The centroids of the polygons were selected as sample points. An additional 890 sample points were found in addition to the 820 fixed ones, with the stratify function. Once these were obtained, a further 190 locations were selected at random as close-pair sites for an additional nearby sample, as in Ethiopia.

Field sampling
Sampling was conducted by teams who were trained in standard procedures and risk assessments. Each team planned to visit five main-site locations per day. Main-site locations were loaded onto a computer tablet and printed on paper maps for each team. A team would navigate to the target location, using a handheld GPS device for the last few kilometres. At each sample location, the team would identify the nearest field with a mature cereal crop within a 1-km radius, and sample grain and soil, subject to farmer consent. If a field with a standing mature cereal crop was not apparent, that is, the crop had been harvested, or a non-cereal crop had been grown, the team would ask the farmer to identify a field from which a cereal crop had recently been harvested and stored, and from which a sample could be obtained. If sampling was not possible, then the team would either look beyond a 1-km radius for an alternative location, or abandon the location. At designated close-pair locations, a second field was identified ideally within around 500 m (range, 100-1,000 m) of the main-site location. If a close-pair location could not be found, then a close-pair location would be selected at the next sample location that was not already earmarked for a close-pair sample.
Within a selected field, samples were taken from a 100 m 2 (0.01 ha) circular plot. This was centred as close as practical to the middle of the field unless this area was unrepresentative due to disease or crop damage. Five subsample points were located (Extended Data Fig. 1). The first point was at the centre of the plot. Two subsample points were then selected at locations on a line through the plot centre along the crop rows, and two more points on a line orthogonal to the first through the plot centre. Where possible, the central sampling location was fixed between crop rows, and the long axis of the sample array (with sample locations at 5.64 and 4.89 m) was oriented in the direction of crop rows with the short axis perpendicular to the crop rows. A single soil subsample was collected at each of the five subsample points with a Dutch auger with a flight of length of 150 mm and diameter of 50 mm. The auger was inserted vertically to the depth of one flight and the five subsamples were stored in a single bag. Where a mature or ripe crop was still standing in the field, grain samples were taken close to each augering position by a different operator, to minimize further contamination by dust and soil. For maize, a single cob was taken at each of the five points. Maize kernels were stripped from around 50% of each cob lengthways and composited into a single sample envelope for each location. For smaller-grained crops, sufficient stalks were taken so that approximately 20-50% of the sample envelope was filled (dimensions 15 cm × 22 cm), with samples placed grain-first into the sample bag and the stalks were twisted off the grain heads and discarded. If a crop was in field stacks, then a subsample, comprising five cobs for maize, or a representative sample for other crops was taken from each available stack, taking material from inside the stack to minimize contamination by dust and soil (Extended Data Fig. 1). If a crop was in a farmers' store, that is, already averaged from across the field, then a representative sample was taken, while avoiding grain from the store floor if grain was loosely stored and avoiding grain with visible soil or dust contamination.
Photographs at sample locations and of sample bags were recorded for quality assurance along with site GPS locations. In Ethiopia, 1,385 of the 1,389 locations from where grain data are reported had positional uncertainties of ≤8 m as recorded by the GPS. The other four locations had positional uncertainties of 9-16 m. In Malawi, 1,790 of the 1,812 locations had positional uncertainties of ≤9 m. A further 16 locations had positional uncertainties of 10-17 m, and 6 locations had positional uncertainties of 2,900-5,000 m. We took a decision not to exclude any data based on positional uncertainties for this study. We used robust estimators of the variograms (Extended Data Fig. 2), which are resistant to effects of spatial outliers due to a small number of points being in the wrong position 53 and these models were validated. Any effect of position error on the broad mapped pattern of long-range spatial variation at these scales will therefore be very limited and localized.

Sample preparation
Whole-grain samples were air-dried in their sample bags. Each sample was then ground in a domestic stainless-steel coffee grinder, which was wiped clean before use and after each sample with a non-abrasive cloth. Whole grains are consumed in many settings, although more-refined fractions-with concomitant losses of more micronutrient-rich bran and endosperm fractions-are often consumed. All preparation was done away from sources of contamination by soil or by dust. A 20-g subsample of the ground material was then shipped to the University of Nottingham. Soil samples were oven-dried at 40 °C for 24-48 h depending on the moisture content of the soil. Preparation took place in a soil laboratory to avoid cross-contamination with grain samples. Plant material was removed from each soil sample, which was then disaggregated and sieved to pass 2 mm. This material was then coned and quartered to produce subsample splits. A 150-g subsample of soil was poured into a self-seal bag, labelled and shipped to the UK for analysis in the laboratories at the University of Nottingham and Rothamsted Research. Soil pH (water) and soil organic carbon (SOC) content were measured using standard methods 22 .

Grain micronutrient analyses
Grain micronutrient analysis methods followed standard methods 54 . Approximately 0.2 g of each ground sample was weighed and digested using a microwave system. For samples collected in the Amhara region in 2017, a Multiwave 3000 48-vessel MF50 rotor (Anton Paar) was used; digestion vessels were perfluoroalkoxy tubes in polyethylethylketone pressure jackets (Anton Paar). Samples were digested in 2 ml 70% trace-analysis-grade HNO 3 , 1 ml Milli-Q water (18.2 MΩ cm; Fisher Scientific) and 1 ml H 2 O 2 . Settings were: 1,400 W, 140 °C, 2 MPa, for 45 min. For samples collected in 2018-2019, we used a Multiwave Prom 41HVT56 rotor and pressure-activated venting vessels made of modified polytetrafluoroethylene (56-ml 'SMART VENT', Anton Paar). Samples were digested in 6 ml of 70% trace-analysis-grade HNO 3 . Settings were: 1,500 W, 10 min heating to 140 °C, 20 min holding at 140 °C, and 15 min cooling to 55 °C. Two operational blanks were typically included in each digestion run. Duplicate samples of a certified reference material (Wheat flour SRM 1567b, NIST) were included in approximately every fourth digestion run. Following digestion, each tube was made up to a final volume of 15 ml by adding 11 ml Milli-Q water, then transferred to a 25-ml universal tube (Sarstedt) and stored at room temperature. Samples were further diluted 1:5 with Milli-Q water into 13-ml tubes (Sarstedt) before analysis.
Multi-elemental analysis of grain (Ag, Al, As, B, Ba, Be, Ca, Cd, Cr, Co, Cs, Cu, Fe, K, Li, Mg, Mn, Mo, Na, Ni, P, Pb, Rb, S, Sr, Ti, Tl, U, V and Zn) was undertaken using inductively coupled plasma mass spectrometry (ICP-MS; iCAPQ, Thermo Fisher Scientific). The instrument used a helium collision cell with kinetic energy discrimination to reduce polyatomic interferences. Samples were introduced from an autosampler incorporating an ASXpress rapid uptake module (Cetac ASX-520, Teledyne Technologies) through a perfluoroalkoxy Microflow PFA-ST nebuliser (Thermo Fisher Scientific). Internal standards were introduced to the sample stream on a separate line via the ASXpress unit and included Sc (20 μg l -1 ), Rh (10 μg l -1 ), Ge (10 μg l -1 ) and Ir (5 μg l -1 ) in 2% HNO 3 (Primar Plus grade; Fisher Scientific). An external multi-element calibration standard (Claritas-PPT grade CLMS-2; SPEX Certiprep) was used to calibrate Ag, Al, As, B, Ba, Be, Cd, Ca, Co, Cr, Cs, Cu, Fe, K, Li, Mg, Mn, Mo, Na, Ni, P, Pb, Rb, S, Se, Sr, Tl, U, V and Zn, in the range of 0-100 μg l -1 (0, 20, 40, 100 μg l -1 ). A bespoke external multi-element calibration solution (PlasmaCAL, SCP Science) was used to create Ca, K, Mg and Na standards in the range of 0-30 mg l -1 . B, P and S calibration used in-house standard solutions (KH 2 PO 4 , K 2 SO 4 and H 3 BO 3 ); Ti was determined semiquantitatively. Sample processing was undertaken using Qtegra software (Thermo Fisher Scientific) with external cross-calibration between pulse-counting and analogue detector modes when required. Se was determined separately using a triple quadrupole ICP-MS (iCAP TQ; Thermo Fisher Scientific) using an oxygen cell to mass shift the isotope 80 Se to m/z 96 ( 80 Se 16 O) to reduce interference from the 40 Ar dimer. Drift correction was achieved using Rh as an internal standard; calibration used the CLMS-2 multi-element standard (Certiprep).
Analyses were conducted in batches of around 240 samples per run on the ICP-MS instrument (Extended Data Table 1). Individual grain concentration data were corrected for run-specific operational blanks and then converted to concentration on a dry-matter basis. Element-specific limits of detection were reported as 3× the standard deviation of the operational blank concentrations, assuming a notional starting dry weight of 0.2 g of sample material (Extended Data Table 1). For samples for which the grain element concentration was less than the limit of detection, data were removed before the statistical analyses. No adjustment was made for potential contamination of grain samples, for example, with soil dust from the field or store using typical markers (for example, Fe, V). Two sorghum samples, taken from a grain store in Malawi, were excluded from the data analysis based on high concentrations of Ca, Mg and other elements that were considered unlikely to have arisen from soil contamination.

Statistical analyses
Summary statistics were computed for concentrations of all elements in grain and histograms were examined. It is common for geochemical variables to be log-normally distributed, and the coefficient of skewness of the data was examined using octile skewness as a robust measure of asymmetry of the distribution 55 . The data were analysed on the original scales of measurement (mg kg -1 ) if the octile skewness was within the interval [−0.2, 0, 2] as described previously 56 . If the octile skewness fell outside this range (with a positive value), and the absolute value of the octile skewness after log e transformation was smaller than on the original scale, then the data were analysed on the log e scale (Extended Data Tables 1, 2).
Variograms were estimated for each variable using three estimators (Extended Data Table 2): the standard estimator 57 and two alternatives 58,59 . The two alternative estimators are more robust than the standard estimator to outlying data, be these marginal outliers (apparent on the histogram) or spatial outliers (apparent in local spatial context). The variogram estimates were formed from all pairwise comparisons among observations up to a maximum lag distance (330 km in Ethiopia and 100 km in Malawi), and with lag bins with a width of 10 km. The distance between any two locations (specified by latitude and longitude) was computed as the great circle distance using the distVincentySphere function from the geosphere library of the R platform 47,60 . Exponential variogram functions were then fitted to the estimates by weighted least squares 61 . The exponential variogram was selected because it ensures positive definite covariance matrices for distances on the sphere 62 . Each model was then tested by cross-validation. The selected models are shown in Extended Data Fig. 2.
In cross-validation, each observation was removed and predicted from the remaining observations by ordinary kriging. This was done using each variogram model fitted for each variable to the sets of point estimates computed by the three estimators. The standardized squared prediction error (SSPE) was computed for each observation as the squared difference between the observed and predicted values divided by the prediction error variance (kriging variance) as computed in the standard ordinary kriging equations 53 . The median value of the SSPE has an expected value of 0.455 in the case of a valid underlying variogram model with normally distributed kriging errors 53 . The standard estimator due to Matheron 57 is more statistically efficient than the robust alternatives, so if the model fitted to these estimates appeared correct from the cross-validation results (the median SSPE is within the 95% confidence interval) then the alternatives were not considered. If the SSPE suggests that the model fitted to Matheron estimates are affected by outliers, then the models fitted to robust estimates were also cross-validated, and one was selected on the cross-validation results 53 .
Once a variogram model was selected it was used to compute predictions of the grain concentration (with or without transformation) on nodes of a fine square grid. This was done by ordinary kriging 61 . In ordinary kriging, it is assumed that the mean value of the variable of interest is locally constant, but unknown. An estimate is found that is a linear combination of the observations such that the expected squared error of the prediction (the ordinary kriging variance) is minimized. The kriging variance is also computed as a measure of the uncertainty of the prediction. For a given variogram, the kriging variance reflects the proximity of the location at which a prediction is made to observations of the variable of interest. In the case of variables that had been transformed to logarithms, the prediction at each location was computed by exponentiation of the prediction on the log e scale. The resulting back-transformed estimate is a median-unbiased predictor, which is appropriate for variables with a skewed distribution 63 . Note, the kriging variance cannot be meaningfully back-transformed. However, it still indicates how the uncertainty of the predictions varies in space and so it is mapped here on the log e scale for transformed variables. The kriging variance maps are shown in Extended Data Figs. 3, 4. In Ethiopia, where the sample sites are irregularly distributed over the mapped area, the kriging variance differs markedly depending on the local sample density.
In Ethiopia, we mapped the percentage of dietary requirement potentially met by the intake of wheat and teff (Fig. 3); similarly, in Malawi for maize (Fig. 4). In Ethiopia, 97.6 g per capita per day of wheat and 89.3 g per capita per day for teff (based on the Food and Agriculture Organization database item, 'other cereals') were used as reference intakes 29 ; in Malawi, 342.8 g per capita per day was used as a reference maize intake 29 . Estimated average requirement (EAR) thresholds of 860, 22.4, 45 and 10.2 mg per capita per day for Ca, Fe, Se and Zn, respectively, were chosen as a representative threshold, based on an adult woman aged 18-24 years eating an unrefined (that is, high phytate) diet 30 . These thresholds are similar to other demographic groups. Because a comparable measure of uncertainty to the kriging variance for this derived variable is not available, we defined a mask for Ethiopia, where the sparsity of sampling leads to larger uncertainty compared with Malawi. We considered the Zn concentration in teff, a variable with spatial dependence over long distances, and identified those areas in which the kriging variance for this variable exceeded 75% of the variance of the variable itself because the sampling was sparse. These areas defined the mask used for the maps of the percentage of dietary requirement for all variables and is shown in grey in Fig. 3.
The maps shown in Figs. 3, 4 are made by point kriging-that is, they are predictions of a measurement at an unsampled site. Ordinary kriging can also be used to predict the mean value of a variable across a region or 'block' 61 . To illustrate the communication value of this approach, the mean concentrations of Zn and of Ca in grain (teff and wheat) are shown at the woreda level in the Amhara region (Fig. 5). These were obtained by block kriging of woreda means from the same sample data, and with the same variogram models as used to produce the point kriging predictions.

Grain-biomarker links
Determining the links between a dietary biomarker of status and grain micronutrient concentration focused here on Se, for which there is a reliable biomarker of dietary status 13 ; this is not the case for Zn 7 and the other micronutrients analysed in this study. Data on the concentration of Se in the blood of women of reproductive age were available from micronutrient surveys in Ethiopia (serum 23 ) and Malawi (plasma 24 ). Mean concentrations were computed for each enumeration area available-321 in Ethiopia and 101 in Malawi-for which the latitude and longitude for each enumeration area centroid were available. For each enumeration area, the nearest grain sample site (regardless of crop) was found. In Ethiopia, the median distance to the nearest grain sample site over all enumeration areas was 17 km. In Malawi, the median distance was much shorter (1.4 km), which is attributable both to the denser crop sampling in Malawi and to the fact that enumeration areas used for the micronutrient survey were targeted for sampling.
The enumeration areas do not comprise a simple independent random sample, so the relation between the serum or plasma Se concentration and the Se concentration in the nearest grain sample could not be quantified by a statistic such as the correlation coefficient. It was therefore studied with an appropriate LMM incorporating a spatially correlated random effect, modelled with a Matérn correlation function 64 . Exploratory analysis indicated that a log e transformation of all serum or plasma Se concentrations was necessary for the assumption of normal random effects to be plausible. The observed serum or plasma Se concentration was modelled as a linear function of the concentration in grain at the nearest sample location. Residual maximum likelihood was used to estimate the random effects parameters. The fixed effects were then estimated by weighted least squares 65 along with their standard errors. The evidence against the null hypothesis that the regression coefficient for serum or plasma Se on grain Se concentrations was zero was tested by a log-likelihood ratio test.
Plots of the serum or plasma Se concentration in women of reproductive age, in Ethiopia and Malawi, respectively, show a positive correlation between the variables (Extended Data Fig. 5), which is supported by the statistical models. For Ethiopia, the estimated regression coefficient (log[ng serum Se per ml] and log[mg of grain Se per kg]) was 0.08 with a standard error of 0.02. The null hypothesis that the coefficient was zero is rejected on the grounds of the log-likelihood ratio statistic (L = 14.48, P = 1.4 × 10 −4 ). If there was no relationship between the biomarker and the grain Se concentrations, the probability of obtaining an L-statistic this large or larger would be very small. Similarly, for Malawi, the estimated regression coefficient was 0.09 with a standard error of 0.03 (L = 11.56, P = 6.7 × 10 −4 ).

Environmental-soil-grain links
Data on the concentration of Se and of Zn in grain (maize in Malawi; teff, wheat and maize in Ethiopia) were extracted along with the corresponding data on soil pH (water) and SOC. Observations for three environmental covariates were also extracted for these same locations: (1) mean annual temperature; (2) mean annual precipitation values from the CHELSA datasets 66,67 , which are downscaled to a spatial resolution of 30 s; (3) topographic index values from the 30-s resolution MERIT Digital Elevation Model 68 . The topographic index-which is sometimes called the topographic wetness index-is a measure of the tendency for drainage to accumulate at a site and is widely used as a predictive measure for soil properties. Following exploratory analysis, grain Se concentration data were log e -transformed to make the assumption of normal random effects plausible; this was not necessary for grain Zn. Measurements of SOC from Malawi showed a marked positive skew and were therefore log e -transformed.
Data were analysed using a LMM in which a regression-type function of environmental covariates was considered as a fixed effect along with a spatially autocorrelated random effect, as described for the biomarker data. Random-effect parameters were estimated by maximum likelihood, and then the fixed-effect parameters by weighted least squares. The first model was fitted with a constant mean as the only fixed effect. The second model was fitted with mean annual precipitation as a fixed effect. The evidence for this predictor was evaluated by a log-likelihood ratio test of the second model against the first. The predictor was retained initially if the probability of obtaining a value of L as large or larger than observed under the null hypothesis of no effect of the predictor was <0.05. Additional predictors were then considered in the order of mean annual temperature then topographic index. Finally, the P values were evaluated as the outcome of multiple tests, controlling the false-discovery rate (FDR) 69 at 0.05. The FDR is the expected proportion of incorrectly rejected null hypotheses among all rejected ones. For each micronutrient in each crop and country, the predictors were identified that could be regarded as significant with FDR control at 0.05. The same procedure was followed to produce a comparable model based on the two soil properties, considering first soil pH and then SOC.
Plots of grain Se and Zn concentrations against the environmental covariates and soil properties are shown in Extended Data Figs. 6-9. The evidence for environmental covariates or soil properties as predictors of micronutrient content in the grain is summarized in Extended Data Table 3. Extended Data Table 3 also shows the estimated coefficients, and their standard errors, in separate models for each micronutrient; for maize, teff or wheat in Ethiopia and for maize in Malawi. The predictors in these models are only those that were selected with FDR control.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
All data are freely available from the corresponding author and available online at https://github.com/rmlark/GeoNutrition.

Code availability
All code is freely available from the corresponding author and available online at https://github.com/rmlark/GeoNutrition.