Assessing the predictability of existing water-to-enamel geolocation models against known human teeth

Stable isotope analysis of human tissues has become a valuable tool for mapping human geolocation. This study adds to the existing knowledge of the relationship between oxygen stable isotopes in human enamel and drinking water by presenting enamel oxygen values in clinic-extracted human dental enamel with known provenance. The results from this study indicate that the theoretical isotopic relationship between enamel and drinking water oxygen is weak at the city and country-level. Differences of up to 15‰ were observed between predicted drinking water oxygen values using existing models and observed values, highlighting the complexity of using water/enamel conversion equations. The lower isotopic boundary of enamel oxygen values is now understood for Metro Vancouver at δ18Oc(VPDB) = – 11.0‰ and presents the possibility of using stable isotope analysis as an exclusionary tool where individuals falling below threshold value can be identified as non-local. Overall, this study’s results support the development of geographical reference maps for human enamel oxygen.

. Existing water/enamel conversion equations are shown as the relationship between drinking water oxygen values (δ 18 O dw ) and enamel carbonate oxygen values (δ 18 O C(VSMOW) ). This graph aims to provide a visual representation of the various water/enamel conversion equations that exist to date and to show the general spread of slopes and intercepts, rather than for interpretation. Phosphate/carbonate conversion equations were applied to water/enamel equations based on enamel phosphate oxygen. Solid lines indicate the equations that allowed for direct conversions between δ 18 O dw and δ 18 35 . The OIPC can be utilized to estimate precipitation oxygen isotope values for any geographical location by feeding latitudinal, longitudinal, and elevational data. However, an important consideration that must be made is that the degree to which precipitation oxygen values translate to human drinking waters in modern society is dependent on several factors such as the type of drinking water sources, climatic characteristics, and water infrastructure 36,37 . There is a heightened reliance on complex water distribution systems for safe drinking water delivery and is becoming essential for high density and expanding metropolitan regions 38,39 . Tap water distribution systems often draw water from multiple water sources or geographically distant areas with varying physiography 40 to meet growing communities' demands. This ultimately leads to significant offsets in tap water oxygen values compared to that of local precipitation values. Thus, it is essential to analyze water samples that accurately represent the isotopic composition of oxygen in consumed drinking water for pairing with enamel oxygen. This is vital as water/enamel conversion equations are importantly utilized as a means to localize unknown human remains. There is an increased understanding of the spatial variations in stable isotope values of tap water across large geographical areas such as the conterminous United States 41,42 , Mexico 43 , France 44 , South Africa 13 , Korea 45 , and China 46 . However, very little is known for defined metropolitan areas and whether a single tap water oxygen value represents an entire city, especially those relying on multiple water sources. A study of the distribution of hydrogen and oxygen isotope values in tap water across the entire regional district of Metro Vancouver showed that tap water values uniquely followed the tap water distribution system of Metro Vancouver 47 . The study showed the importance of tap water distribution systems on the spread of hydrogen and oxygen isotope values of tap water across a single geographical locale, in which the isotope values closely followed the pattern of the water distribution pipelines. Most Metro Vancouver municipalities rely on the three major watersheds located north of the region, but a smaller proportion of the residents consume tap water from groundwater sources, which was isotopically distinct from the other water sources. Thus, the study indicated the importance of understanding tap water distribution systems for the defined geographical area to ensure that collected tap water samples cover every water source that contributes to the overall tap water system, especially if those sources are isotopically distinct. Another high resolution tap water study for Salt Lake Valley, Utah showed similar results where spatiotemporal variation was noted in tap water isotope values across the region 48 .
An alternative approach to the use of conversion equations has been proposed where Pellegrini et al. 26 created a tissue-based isoscape from enamel phosphate oxygen for Britain based on a large sample of archaeological remains. This approach eliminates the need for water-enamel conversion equations as the tissue isotopes themselves were used for mapping. However, the challenge with such an approach is that it necessitates the extensive sampling of human enamel and therefore requires a significant amount of time for sample collection. And perhaps more importantly, the life-history or provenance of those individuals is unknown during the period of enamel formation. The current approach to the development of tissue isoscapes does not require an extensive collection of enamel samples since it is based on drinking water isoscapes paired with modern samples with known provenance. Again, this approach requires the use of conversion equations with various proposed slopes and intercepts, without a clear understanding as to which equation best represents the water/enamel relationship.
The purpose of this study is to validate and refine the water/enamel conversion model by carefully matching oxygen isotope values in recently extracted enamel with known tap water values from cities across Metro Vancouver. Tap water samples were collected directly from Metro Vancouver tap water sources every month over a 12 month period. It is important to emphasize that water samples were strategically sampled to capture the entire spread of oxygen isotope compositions of drinking water for Metro Vancouver by referring to the detailed isotope study of tap water distribution 47 . Biographically known enamel oxygen results from this study will also be compared against existing water/enamel conversion models to determine how well the existing models perform against known modern enamel and drinking water samples. Collection and analysis of tooth samples. 154 empty vials were sent out to dental clinics across Metro Vancouver during the years 2015 to 2017. 126 vials were returned with extracted human molar samples comprised of first, second, and third molars. Molars were placed in 20 mL scintillation vials labelled with sample identifiers following the extraction and returned to the laboratory for storage. Immense care was taken to obtain information from individuals on location of residence and any relocations from birth to 25 years which covers the entire duration of enamel formation (Table 2). This information was vital to ensure accurate matching of enamel oxygen to drinking water oxygen. Collected molars were cleansed with tap water and dried for at least 24 h before drilling. Core enamel was extracted into powdered form (~ 1 mg) by gentle abrasion with a diamond-tipped hand-held drill (Dremel). The enamel's external surface was mechanically removed and discarded prior to sampling core enamel to ensure that sampled areas were devoid of dental calculus and overt caries. Care was taken to ensure no dentine was sampled, and the drill was cleaned of any enamel powder with compressed air between each sample. Enamel powder was prepared after Lee-Thorp et al. 49 . The powder was soaked in 50% sodium hypochlorite for 45 to 60 min, centrifuged, and rinsed with distilled water. The powder was then reacted with 0.1 M acetic acid for 5 to 15 min, thoroughly rinsed with distilled water then freeze-dried.
Samples were inserted into 12 ml borosilicate glass vials with a septum-contained screw top lid (Labco, Ceredigion UK) and place in a temperature-controlled (72 °C) sampler tray. To remove atmospheric air the tubes were flushed with helium by using the CTC Analytics A200S autosampler (Switzerland). Depending on the sample size, five to seven drops of 72 °C acid (85% orthophosphoric acid and phosphorus pentoxide, specific gravity of solution = 1.92) were added to each vial manually through the septum with a 1 ml syringe. Samples were analyzed after being left to react for three hours 49,50 . The autosampler was used to sample the gas that evolved from each reaction and subsequently passed to the Themo Finnigan model II GasBench (Germany). The sample gas was then went through the Nafion water removal unit. To separate the gas compounds, the sample gas passed through the "Poraplot Q" GC column followed by another Nafion water trap. The gas then passed through the GasBench to a Delta Plus XP isotope ratio mass spectrometer (Thermo electron, Bremen, Germany), which is computer-controlled by Isodat software. The gas flow gave 8 sample peaks and 5 reference peaks. The reference gas (CO 2 ) with 99.995% purity was introduced into the mass spectrometer through the Isodat software-controlled GasBench 50 .
Oxygen analyses were carried out on the structural carbonate of tooth enamel. Isotopic results are reported in conventional δ-notation in units of per mil (‰) with reference to Vienna Pee Dee Belemnite (VPDB) carbonate and calculated according to the equation: where R = ratio of the abundance of heavier 18    MV tap water is delivered to municipalities through interconnected water pipes originating from the three main MV watersheds. Several municipalities additionally draw tap water from wells 52 . Although the water distribution system is well monitored, the source of tap water delivered to the municipalities through the interconnected water pipes can vary even by the hour. As a result, the exact fractional contributions of end-member water sources to the overall municipal tap water supply are unknown. The fractional contribution (f) of each water source (a, b, c, d) to the total water supply can be expressed as: Source information of tap water samples collected and reported in Ueda and Bell's 47 study was used to determine each water source's potential fractional contribution to MV municipality water supply. The fractional contributions of each water source were then utilized to calculate the mean tap water oxygen (δ 18 O tap ) values for each MV municipality: Drinking water from Capilano and Seymour watersheds was treated under two separate facilities until the Seymour-Capilano Twin Tunnel construction in 2015, which now connects water from the Capilano watershed to the Seymour-Capilano Filtration Plan before distribution to the municipalities 52 .

Data collection of drinking water oxygen values for individuals placed outside of Metro Vancouver.
Isotopic data on drinking water oxygen (δ 18 O dw ) values for cities outside of MV were retrieved from existing publications for comparisons with enamel carbonate oxygen (δ 18 O c ) values. The preferred drinking water type was tap water with known source water information. Tap water data without any indications of the actual source of tap water were also collected but with the understanding that it may represent mixed water sources if supplied by multiple water sources. The temporal representation of the data was also noted, whether annual or a one-time collection. Isotopic data of actual or possible source water were taken as an alternative when tap water data did not exist. Average values were calculated if more than one δ 18 O tap value was reported for a given city. Information on tap water sources was collected through municipal or governmental websites and from published studies. The OIPC 35  Statistical analysis. The pairwise t-test was used to test for differences between estimated precipitation water oxygen values retrieved from the two databases at the α = 0.05 level. The ordinary least squares (OLS) method was utilized to measure the correlation between δ 18 O c and δ 18 O dw values with adjusted r-squared ( R 2 ) as the goodness-of-fit measure 54 . Coplen's 55 equation was utilized for any conversions from VPDB to Vienna Standard Mean Ocean Water (VSMOW). One-way analysis of variance (ANOVA) tests were performed to compare means of δ 18 O c between cities, water sources, and countries. When ANOVA assumptions were not met, the Kruskal-Wallis rank sum test 56 was performed as a non-parametric alternative to one-way ANOVA. Shapiro-Wilk's 57 method was used to test for normality of data. All mapping and statistical analyses were conducted on R version 4.0.3 58 .
Water/enamel modelling and assessment of predictability. δ 18 Table 3. The mean squared deviation (MSD) 59,60 between predicted and measured drinking water oxygen values were calculated to measure the predictability of existing water/enamel conversion equations. MSD was calculated as where x = predicted and y = measured values. MSD measures the degree of deviation from the equality line, where slope = 1 and intercept = 0, and was preferred over the more commonly used statistic of mean squared error that measures the mean deviation from the regression line rather than the equality line. MSD can further be partitioned into three separate components and is the sum of squared bias (SB), non-unity slope (NU) where b = x n y n / x 2 n , The models or sets of models were then ranked according to MSD to determine the best performing models for predicting δ 18

Results
The collection of enamel samples from dental clinics across Metro Vancouver (MV) resulted in a globally representative dataset. Samples included individuals from 14 countries (Fig. 2), reflecting Metro Vancouver's high fluidity in international migrations 61 . Of the collected tooth samples, enough enamel powder was generated from 103 samples for isotopic analysis. Two samples returned no oxygen isotopic signatures signifying the absence of hydroxyapatite, which could be attributed to the presence of dental porcelain visually akin to dental enamel. As it is important to obtain information related to the geographical location in which the individual lived during enamel mineralization to ensure the appropriate matching of enamel oxygen values with drinking water values, samples without city-level information were not used in the water/enamel relationship analysis. Enamel and tap water oxygen results for MV. 24 individuals were of MV origin and covered eight municipalities-Burnaby, Coquitlam, Langley, Richmond, New Westminster, North Vancouver, Surrey, and Vancouver (Fig. 3). δ 18 O c(VPDB) values ranged from − 11.0‰ (Burnaby) to − 7.2‰ (Surrey) with a mean value of − 8.7‰ ± 0.8 (Fig. 4a). One-way ANOVA test showed no statistically significant inter-city differences between mean δ 18 (Fig. 4b), although an outlier was identified for the sample with δ 18 O c(VPDB) value of − 11.0‰.
Monthly tap water oxygen (δ 18 O tap ) values ranged from -13.0 to -8.2‰ with an annual mean value of -11.1‰ (n = 22), and -13.2‰ to -8.5‰ with an annual mean value of -11.7‰ (n = 23) for MV tap waters treated at CWTP and SCFP, respectively (Fig. 5). Fractional contributions of each major MV tap water source to the overall municipal water supply were calculated for all municipalities represented by the enamel samples ( Table 4).
The OLS regression for MV δ 18 O c and δ 18 O tap values yielded the result: The negative R 2 value indicates that the data cannot be explained by the model.   (Fig. 7) as not all samples could be directly paired with modern δ 18 O tap values.         Predictability of existing models. The ranges of MV δ 18 O c values that were predicted from each water/ enamel conversion model are shown in Fig. 11. Actual MV δ 18 O tap values were used for the calculations. Water/ enamel equations based on phosphate oxygen were paired with a phosphate/carbonate equation when conversions from phosphate oxygen to drinking water oxygen were necessary. All combinations of models are listed in Table 5 along with the abbreviated codes used in Fig. 11. The minimum MV tap water oxygen value of -11.7‰ was taken from annualized data of SCWT, and the maximum value of -9.3‰ was reported in Ueda and Bell 47 from a groundwater source. The range of values estimated by Dotsika's 17 equation provided the most accurate estimate where 91.7% of MV individuals were correctly identified to have resided in MV during their childhood years. 15 combinations of models failed to identify any MV individuals accurately. The observed ranges of δ 18 O c values for MV were generally more negative than predicted ranges from existing models except for ranges predicted with Posey's 22 equation (Fig. 11).
The offset between predicted and actual drinking water values ranged from -15.3 to + 5.5‰ for tap waters (Fig. 12a) and -12.5 to + 6.5‰ for OIPC (Fig. 12b). The use of the equation given by Dotsika 17 produced the least MSD between predicted and observed drinking water values for both tap water measurements (MSD = 4.00; NU = 0.00; LC = 3.45; SB = 0.55) (Fig. 13a) (Table 5) and include all existing predictive models as well as equations from the current study (U1, U2, U3, U4, U5, U6, U7, UOIPC). Predicted ranges were generally more positive than the actual range for MV. 15 of the 44 predicted ranges fell outside the actual MV range. The best performing predictive model was Dotsika 40  Warner et al. 23 Chenery et al. 24 WaC Warner et al. 23 Iacumin et al. 31 WaI Warner et al. 23 Wright et al. 32

Discussion
The isotopic compositions of enamel samples collected from across Metro Vancouver scattered across an isotopic range of − 11.0 to − 7.2‰. This was similar to the predicted range of values calculated by Dotsika's 17 water/enamel conversion equation (Fig. 11). No statistically significant differences in mean δ 18   www.nature.com/scientificreports/ range of MV enamel values. However, importantly, the dataset represents much of MV δ 18 O c values and most definitely the lower limit of the range. Notably, understanding this lower limit signifies the possibility of oxygen isotope analysis to be used as an exclusionary tool to identify non-MV residents whose enamel oxygen isotope compositions lie below the most negative MV value of − 11.0‰. Sampling across MV enabled the collection of a globally representative dataset due to the regional district's high fluidity in both national and international migration 61 . The global representation of samples collected from one single regional district also highlights the highly mobility of humans and that we need to be cautious of presuming the location of tissue sampling to be the location of tissue formation. Notably, δ 18 O c values in known modern samples from the entire MV region showed a distinct separation from other BC cities (Prince George and Vanderhoof) located at higher latitudes and signifies the possibility of utilizing oxygen isotope analysis to distinguish MV from northern BC cities. However, overlaps in δ 18 O c values were observed with other Canadian cities regardless of the latitudinal differences (Fig. 7). This is contrary to the general latitudinal trend observed at the global level. Mean Canadian δ 18 O c values were slightly more negative than δ 18   www.nature.com/scientificreports/ Overall, the equations derived from this study showed a smaller fractional contribution of drinking water oxygen to enamel oxygen compared to existing water/enamel models. The equations from the global dataset (Eqs. [5][6][7][8] were more comparable to other predictive models than those derived from higher geographical resolutions (Eqs. [1][2][3][4]. Equation (8) derived from the global dataset with the highest R 2 was most comparable to Levinson et al. 's 8 equation. The model performed better as the geographical resolution decreased from city to country to a global dataset. The reason for such a decrease in measured correlation between enamel and drinking water oxygen at a higher geographical resolution may mainly be due to the offsets observed between the isotopic compositions of oxygen in tap water and precipitation water for larger cities like Metro Vancouver relying on multiple water sources. Globally constructed predictive models should then be applied with caution at higher geographical resolutions as individuals can significantly deviate from the predicted regression line. A high-resolution study of tap waters may be necessary to understand the isotopic spread of water sources that supply the targeted regional area if supplied by water sources with isotopically distinct oxygen values. Like MV, enamel samples formed in Hong Kong showed a wide range of δ 18 O c values, which only became evident with the concentrated number of individuals within our sample set that had resided in Hong Kong during the time of tissue formation. Hong Kong relies on both precipitation water and transported surface water from multiple sources 78 with possibly distinct oxygen isotope compositions.
The predictability of existing water/enamel conversion equations was assessed through mean standard deviation calculations between predicted and observed drinking water values. Dotsika's 17 equation proved to best predict δ 18 O dw from measured δ 18 O c values with the lowest mean standard deviation of 4.00 for tap water values and 3.72 for OIPC water values. This could be attributed to the careful sampling of enamel with known geographical data paired with known drinking water data. The equations with the highest mean standard deviation measures were of those from this current study (Eqs. 1-4) with large NU components. NU is a direct measure of the slope where NU > 0 when the slope ≠ 1 60 (D5I) showed the largest MSD amongst the existing predictive models with a high SB score (Fig. 13). SB > 0 when intercept ≠ 0 and thus reflects the vertical shifting of a function and in turn could be a reflection of inter-laboratory differences in measured enamel oxygen values. Inter-laboratory differences between phosphate oxygen values, as discussed by Chenery et al. 18 are generally related to bio-phosphate preparation with either BiPO 4 or Ag 3 PO 4 prior to oxygen extraction. Chenery (Fig. 12). Whereas the differences between predicted and observed δ 18 O dw values generally fell within ± 5‰ for δ 18 O c(VSMOW) values of 25 to 28‰, greater differences were observed for lower δ 18 O c(VSMOW) values, with the highest difference observed for a sample with δ 18 O c(VSMOW) value of 20‰. Interlaboratory differences between δ 18 O c values measured at three separate laboratories using different methodologies and instruments were < 1‰ for carbonate in tooth enamel 79 . Chesson et al. 33 further tested for interlaboratory differences of enamel oxygen from the same set of samples analyzed a few years apart. The differences were mainly attributed to sample preparation and concluded that isotopic differences > 1.6‰ should be considered meaningful. The potential bias associated with inter-laboratory differences in sample preparation, methodology, and the choice of instruments, will further be compounded in the calculations of δ 18 O dw values using water/ enamel conversion equations. The compounding factor is dependent on the slope of the applied water/enamel equation. For example, Ehleringer's 14 equation with a slope of 1.12 would translate to a maximum of 1.8‰ difference in δ 18 O dw values that may be associated with inter-laboratory differences. Ehleringer's equation showed high offsets between observed δ 18 O dw values and predicted δ 18 O dw values for individuals with relatively more negative δ 18 O c values. Such large differences beyond what can be explained by the possible inter-laboratory differences can be concluded to be important.
Differences in slope and intercept of existing equations may be attributed to various methodological approaches, such as in the types of sampled tissues in which the equations were based upon, the period of tissue formation, the spatial resolution of collection sites [6][7][8]14,16,18,21 (Table 1). For example, one of the early models given by Longinelli 6 was based on measurements of human bone bioapatite rather than on human enamel. Some studies lacked information on the specific types of sampled tooth 7,8 while others studied archaeological or forensic samples with inferred geographical information where drinking water sources may be unknown. The oxygen isotope compositions of enamel can also be measured through the analysis of carbonate or phosphate oxygen. Thus, inter-laboratory differences cannot fully explain the observed differences between estimated drinking water values.
Large offsets in predicted values can lead to significant errors when inferring human mobility for identification purposes. The results from this study support the recommendation to standardize the methodology for the development of water/enamel equations, as stated in other papers 33,79 . There is undeniably a much-needed refinement of current predictive models, especially for utilizing conversion equations to develop human enamel oxygen isoscapes. The reliability of predictive models is based on the degree to which human tissues can be assigned to a geographical locale and the understanding of the true source of drinking water in which the individual had consumed during tissue formation. Models should thus be based on the oxygen analysis of modern www.nature.com/scientificreports/ human enamels with known geographical information for pairing with appropriate drinking water data. The preferred study is of known modern samples where geographical information can be obtained directly from the living donor over unknown or archaeological samples where inference of these types of information is required. Drinking water data should also be thoroughly examined, where tap water with known source information can best provide the most accurate drinking water data. Perhaps, an alternative approach to using a model-based approach could be a sample-based approach. Enamel oxygen values can be determined for specified geographical areas, and directly recorded onto a geographical map. The geographical assignment of individuals can then be determined through the use of classification trees or discriminant function models 80 . This approach requires all target regions to be defined a priori as categorical response variables and may require an extensive collection of samples with known sources. However, if the purpose of analyzing stable oxygen isotopes is to determine whether the individual is of interest to the local jurisdiction or not, then the categorical response variables can be determined in such a way that allows for the classification of local or non-local origin. A strategic approach to understanding the isotopic spread of enamel oxygen for that geographical region will then be necessary. A suggested approach would be to (1) gain an understanding of the drinking water source and distribution system for that region, and to (2) strategically obtain reference enamel samples from local areas that receive water from sources with oxygen values that lie on the most distal ends of the isotopic range for that region.

Conclusion
This study aimed to assess the relationships between known biogeographical enamel dataset to a related known city-wide drinking water dataset. The results indicated that the water/enamel model cannot explain the relationship between δ 18 O c and δ 18 O tap values for Metro Vancouver at the city-level and also for Canada at the country-level. Mean standard deviation measurements were carried out to assess the predictability of existing conversion models. The equation provided by Dotsika 17 proved to be the best model for this current set of enamel data. However, large differences between actual and predicted δ 18 O dw values were noted from the use of existing conversion equations. Thus, this study supports the development of sample-based, or rather, tissue-specific δ 18 O c geographical reference maps. Standardization of methodology for isotopic analysis is also vital for the collaborative collection of globally represented data. The development of international enamel oxygen isotope databases would ultimately render conversion equations unnecessary and eliminate any associated errors. A meaningful approach would be to define boundaries confined by the tap water distribution systems rather than by way of political borders, which can be made possible with the detailed understanding of tap water supply and distribution systems for any given geographical area. The development of a reliable database would greatly benefit the forensic community by providing accurate guidance for unidentified individuals' possible regionof-origin determination. Furthermore, understanding the lower boundary of δ 18 O c values for Metro Vancouver (− 11.0‰) now allows for oxygen isotope analysis to be used as an exclusionary tool where individuals with δ 18 O c values falling below the threshold value can be identified as non-local. Such a tool is vital for assisting the forensic community in missing persons investigations.