Quantitative assessment of Pb sources in isotopic mixtures using a Bayesian mixing model

Lead (Pb) isotopes provide valuable insights into the origin of Pb within a sample, typically allowing for reliable fingerprinting of their source. This is useful for a variety of applications, from tracing sources of pollution-related Pb, to the origins of Pb in archaeological artefacts. However, current approaches investigate source proportions via graphical means, or simple mixing models. As such, an approach, which quantitatively assesses source proportions and fingerprints the signature of analysed Pb, especially for larger numbers of sources, would be valuable. Here we use an advanced Bayesian isotope mixing model for three such applications: tracing dust sources in pre-anthropogenic environmental samples, tracking changing ore exploitation during the Roman period, and identifying the source of Pb in a Roman-age mining artefact. These examples indicate this approach can understand changing Pb sources deposited during both pre-anthropogenic times, when natural cycling of Pb dominated, and the Roman period, one marked by significant anthropogenic pollution. Our archaeometric investigation indicates clear input of Pb from Romanian ores previously speculated, but not proven, to have been the Pb source. Our approach can be applied to a range of disciplines, providing a new method for robustly tracing sources of Pb observed within a variety of environments.

. Theoretical examples. Figure displays input (a) in the form of convex hulls; three panels, each displaying a three-isotope plot with the corresponding sources and mixtures to be modelled. Dashed black lines have been added to outline the mixing envelope, as defined by the mean values for each source (black circles). Also displayed are mixtures which fall within the envelope blue crosses) and one which does not (pink circles). Panel (b) displays the output from mixing envelope modelling, and the probability of each sample falling within the mixing envelope (%). (c) Is an example posterior density graph as output by MixSIAR, here from the model of mixture 2. Each field corresponds to the probability of a given source to provide the Pb present in the mixture. (d) Is the overall output of all four mixtures modelled in this example. In each case, the rectangle indicates the range of outputs, with the upper and lower bounds signifying the 2.5% and 97.5% credible intervals. (e) Displays convex hulls of theoretical example 2. Black circles denote the sources used to model the mixture (blue cross). IsoSource examines all possible source contribution configurations, at user-defined intervals (e.g. 1%). These calculations rely upon the following equations, expressed below for a configuration with one isotope system and three sources: where: X = Isotopic signature of the mixture (M) or sources (A,B,C).
f A , f B , f C = Proportion of Pb from each source. X A , X B , X C = Source isotopic signatures. This is an underdetermined system containing three unknowns and two equations, with no unique solution 46 . However, the predicted mixture signature for each source combination can be compared to the observed mixture signature; if it matches, then mass balance is achieved and that source combination is a feasible solution. Because combinations can only be examined in discrete steps, a small tolerance for deviation from exact matches is allowed 46 . These equations make interpretation of datasets with n isotope systems and more than n + 1 sources possible 45,46,49 . Outputs are generally presented as range values, to ensure that all potential combinations are reported 45 .
IsoSource may provide valuable model output, and prior to development of Bayesian isotopic mixing models, it was the main approach for food web studies 55 . However, IsoSource uses only mean isotopic signatures for sources and mixtures and does not directly account for variability in source and mixture ratios, including sampling and measurement errors. Because ore bodies are typically heterogeneous, ascribing a single value is a simplification. Complex models have recently been developed which do incorporate these uncertainties within the framework of rigorous Bayesian statistics, utilising the same basic principles as outlined in equations 1 and 2 48 . Put simply, this approach allows the user to incorporate prior information about the system, including analytical error and source heterogeneity, and using a natural distribution (i.e., Dirichlet distribution 56 ) allows the model to develop solutions, in terms of percentage contributions which sum to unity. The Dirichlet-defined distribution is intentionally vague, allowing for the model to be driven primarily by the data input by the user. Model fits are developed via many Markov Chain Monte Carlo (MCMC) simulations, which produce simulations of plausible source proportions based upon the data, and probability densities. Any mixtures not probabilistically consistent with the data are discarded, and mixtures developed at the end of the run are required to be close to the early ones in terms of proportion, causing a Markov chain to be converged 48 . These, when combined with the prior information (e.g. results of previous studies if available) produce posterior distributions which are true probability distributions for each source 48 .
Models built around a Bayesian approach include MixSIAR 47 , which is a combination of MixSIR 54 and SIAR 48 , as well as FRUITS 57 , IsotopeR 58 , and others, which are slightly varied approaches to the same problem. Here we propose an approach for applying a state-of-the-art Bayesian stable isotope mixing model (MixSIAR) to pollution sourcing and artefact tracing via Pb isotopes. "MixSIAR" is a package which runs in the R framework. Details on downloading and setting up the package may be found at https://cran.r-project.org/web/packages/MixSIAR/ index.html.
Outline of the approach, with two theoretical applications. Pre-Modelling Preparation. Prior to attempting data modelling, the user must ensure that all potential sources (mining regions active or probably active during the period of study, active smelters, potential natural Pb sources) that may have contributed to the Pb isotope signal at the study site are considered in the analysis. It should be noted that for pollution or archaeological applications, the approach is likely complicated by the fact that potential sources will have changed over time, or that ores once actively mined may have been exhausted. When developing source lists, all potential sources (with sufficient data available) must be considered, regardless of locality, due to the reality of long-range transport of Pb 14,59 . Further, for pollution tracing of sediment samples, a local 'natural' Pb isotope signal should be introduced to the model, to ensure any such local input (e.g. erosion products) of Pb is considered.
All data must be collated, and a database of all signatures prepared. As previously suggested, Pb isotope ratios relative to 204 Pb should be used (i.e. 208 Pb/ 204 Pb, 207 Pb/ 204 Pb and 206 Pb/ 204 Pb). This allows for as much of the possible variability in the isotopic signature to be considered during the modelling, and because each isotope originates from a different decay chain 23 , a degree of independence between the three ratios. If several analyses are available for the same site, these data should be averaged before standard deviations are calculated. This allows for errors related to either measurement uncertainty or source heterogeneity to be incorporated into the model.
Because MixSIAR uses standard deviations and means to develop models, multivariate normal distributions of the isotopic compositions of ores are assumed. Previous work appears to suggest this is not necessarily the case 60,61 . This is typically ascribed to the complex origins of most ore bodies, in some cases with multiple depositional phases of Pb or polymetallic (Pb containing) ores at the same site common, resulting in multimodal distributions and a degree of kurtosis. However, it has been argued such multimodal distributions may, when enough data are presented, still be representative of multivariate normal (MVN), as multiple depositional events may even one another out, averaging the data 62  To determine the distribution of the ore bodies considered here, and to ensure MiXSIAR is an appropriate approach, we have performed a number of tests investigating the nature of their isotopic distributions. First, using univariate Shapiro-Wilk and Kolmogorov-Smirnov tests, we determine the normality of each isotope ratio in each field individually, prior to investigation of MVN of the three-ratio system via further tests (Mardia, Henze-Zirkler and Royston). All univariate analysis was performed on SPSS 22, and multivariate analysis using the package "MVN" in R 63 .
Results (see Table 1) indicate normality is typical for the distributions of 206 Pb/ 204 Pb (78% of ore bodies), 207 Pb/ 204 Pb (82%) and 208 Pb/ 204 Pb (69%). MVN examinations ( Table 2) indicate most ore bodies may be considered MVN via at least one of the tests (78%), with smaller databases (e.g. Apuseni Porphyry, Mazarron and Oberlausitz) more likely to be MVN, but with larger databases also displaying MVN (e.g. Almeria, Gaul and Valsugana VMS). Notably, some complex and heterogeneous ore bodies (e.g. Apuseni Epithermal), are not MVN, indicating multiple mineralization phases, as observed at Rosia Montana, part of the Apuseni Epithermal ore body 53 , do not even out the isotope distribution range. The MVN distribution of a majority of ore bodies, and the uncertain nature of a number more appear to indicate an approach, which assumes MVN may be applicable. Crucially, the outcomes from these tests are inconclusive, neither proving nor disproving categorically an MVN nature of lead isotopes. As a result, an approach, which applies means and SDs appears the simplest and most appropriate target for such analyses.
Consequently, grouping, one of the most important parts of this analysis, should be approached robustly and can be done a priori, or a posteriori 64 . A priori grouping should be carried out to align sources (prior to the analysis), which are very similar in isotopic signature, or are all from the same ore field, but ensuring any MVN is not lost. Significance tests 65 may be performed, to ensure that the groups are significantly different. If not, they should be grouped together.

Pre-Run Checks.
Prior to running the model, checks must be performed to ensure the data for the unknown mixture fall within the mixing envelope 46 as determined via the sources identified. Due to the nature of MixSIAR 47 results are produced even when these are not realistic. The mixing envelope can be visualized in each 2-dimensional cross-section of the isotope space (see Fig. 1a). To be explained as simple mixtures of these sources, mixtures must fall within the space defined by the various sources. If they do not, then it must be n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a considered that there is an issue in the working hypotheses: either errors during measurement, or a significant source of Pb has been overlooked in the analysis 46 . Because Bayesian modelling incorporates uncertainties (in this case means and standard deviations), a further Bayesian modelling tool may be used to ascertain exactly which samples fall within the mixing polygon 66 . From this further model, any samples which fall within the polygon in <5% of iterations should be discarded. Further, if a significant number are not constrained within the polygon, this is an indication the model being applied is not correct and should be reconsidered. An instance of this issue may be observed from the example in Fig. 1a, where one sample (shown by a red cross) falls outside the designated potential sources. Model results confirm this (Fig. 1b), indicating there is 0% chance of the sample falling within the polygon, and therefore it is not interpreted.
Post-Run Checks. Once all the preparation steps have been completed, the model employs during its run Markov Chain Monte Carlo (MCMC) statistics, which investigate probability distributions for variables (in this example the proportion of Pb originating from each source). From these 'posterior' distributions (see Fig. 1c), descriptive statistics including mean, standard deviation and range may be produced. It must be noted such outputs are only indicative of proportion of Pb from each source.
To ensure meaningful results, the MCMC chains must be long enough for convergence to be achieved. Convergence of the data may be checked via two diagnostics post-run: Gelman-Rubin and Gewecke tests 47,65 . The Gelman-Rubin test requires at least 1 chain, and values will be near 1 at convergence, therefore high values indicate non-convergence. The Gewecke test compares the two halves of the MCMC chain and develops z-scores to determine convergence. High z-scores are an immediate indicator of non-convergence, and so a longer run must be performed 47 .
In practice, with many sources and three isotope ratios, MixSIAR generally must be set to either 'long' or 'very long' MCMC chains to achieve satisfactory convergence. However, this may vary, and so it is a valuable exercise checking the post-run diagnostics, and in running the model at various lengths. For all models discussed here, 'long' or 'very long' MCMC chains have been run. results may be derived from the raw model outputs (Fig. 1d). However, Pb isotope modelling generally considers many sources, and often the output from a model run could be insufficiently well-defined, sometimes with few clear major contributors. In the second theoretical example (Fig. 1e, Table 3) this is true, with most sources contributing less than 5%, and those with high contributions showing large ranges (e.g. Source 8, Table 3). At this point, we suggest further grouping a posteriori 65 . This combines raw contribution results in order to produce better-constrained relative contributions, but at the expense of specificity within sources. In practice, this means a user will not be able to discern between ore bodies but will be able to indicate more reliably which groups of ore bodies (or major mining regions) are major contributors. As with a priori grouping, this should be performed on groups which are isotopically similar or are located close to each other, allowing for meaningful conclusions to be drawn, despite the simplification this approach causes.
To display the value of such an approach we have performed a posteriori analysis, combining sources which are isotopically similar, for minimising the number of possible end-members (see Table 3). For each possible mixture, all sources to be grouped should be summed for each individual solution 64 . In this way, a new distribution is created for the proportional contributions of the combined sources. The combination of sources 2, 11, 12, 13 and 15 in our theoretical example displays this (Table 4), with the contribution of this grouped source now observable, where before either the sources did not appear to contribute much to the mixture (e.g. Source 3 in Table 3) or displayed large possible ranges (e.g. Source 8 in Table 3). This approach has its advantages in clarifying broad source areas but it does remove some of the certainty to the approach requiring grouping which at times erases distinct sources. Nevertheless, without grouping, it may be difficult to say that any sources provided meaningful contributions to the Pb isotope signature, and so a posteriori grouping is a valuable tool when interpreting data which initially appears unconstrained.
This data, once produced, should be presented either as it is here, in table form (e.g. Table 3) and via figures (e.g. Fig. 1d), but always outlining the entire range of results 65 , and not simply the mean.
Availability of materials and data. All lead isotope data used in this study were published previously and are summarized in tables in Supplementary Information. The model results generated during the current study are available from the corresponding author on request. The tool to allow for Bayesian modelling of a mixing envelope is an R code and can be downloaded from http://www.famer.unsw.edu.au/software/polygon.html.   The MixSIAR model is an R package, which may be downloaded from https://cran.r-project.org/web/packages/ MixSIAR/index.html.

Results and Discussion
Example 1: Pre-Anthropogenic Dust Sourcing. Due to the prevalence of anthropogenic Pb in the biogeochemical system since the inception of metallurgy, Pb isotopes are an imperfect tracer of natural fluxes during the recent past 67 . However, because of the ability of Pb isotopes to record source signatures, they have been applied to dust tracking studies 68 . Using a selection of natural geogenic sources, we attempt to indicate the source of pre-human Pb in the geochemical cycle within Penido Vello (PVO, Fig. 2), a peat bog in north-west Spain 52 , for the period 5000-1300 BCE. Using sources determined in the original publication (local rock, local soil 52 and a Saharan dust value 69 , with the authors assuming negligible anthropogenic input), we have modelled the mixing envelope, indicating only a small number of samples fall inside (Fig. 3a,b). Moreover, a large number of samples (76%) fall a long way from the envelope, due to at least two unconsidered sources (Fig. 3a). As such, by estimating what the composition of these sources must be, and then comparing with published data from plausible sources, we determine two further inputs to the PVO site. The first of these, as speculated by the original publication, appears a volcanic input from the Azores, whilst the second is indicative of a loess-derived input. For demonstration purposes we use a value from a loess field in northern France to represent this source 70 , and data from a selection of lava and ash analysed in the Azores [71][72][73] . Modelling of the mixing space now indicates the majority of mixtures fall within (Fig. 3a,c), and so we have proceeded to model the relative inputs of the five sources. Since only five potential sources are considered, no grouping has been performed a priori (SI Table 1).
Using this new five-source model, a number of conclusions may be drawn. Samples appear to show periodic influx of Saharan dust into the site, with two main periods of higher (>15%) Saharan Pb proportions (3300-2800 BCE and 2100-1300 BCE) corresponding well with the original interpretation, where simple distance-based modelling indicated clear Saharan input midway (3300-2800 BCE) 52 . Outside of these time periods, the Pb isotope signal appears to be related to geogenic weathering, with local rock and soil the major Pb contributors throughout (Fig. 3d). A decreasing trend throughout the whole period for local rock (Fig. 3d) and increasing local soil proportions (Fig. 4c) appears indicative of local soil erosion related to farming and agriculture, with an increase in intensity throughout the time period, echoing evidence from pollen records 74 . The ability of our model to indicate a volcanic influence on the site is clear, with regular pulses in Azore-derived Pb (Fig. 3d).    (See SI Table 2), including a natural pre-pollution aerosol value, as determined by the authors 52 . Here, no a priori grouping was performed prior to modelling. Again, the location of the samples in the mixing envelope has been indicated graphically (Fig. 4a-c), and via a model (Fig. 4d), confirming all samples fall within the mixing envelope.
The first main point of interest is the changing contribution of natural (as represented by pre-pollution aerosol) and anthropogenic (the remainder) sources (Fig. 4f). Early in the period, the anthropogenic sources (all ore bodies) make up ~60% of the Pb, prior to increasing values over the period 350-200 BCE, before plateauing at roughly 90% between 100 BCE-150 CE (Fig. 4f). When compared to conclusions from the original publication, our data suggests a much higher proportion of Pb from anthropogenic sources, perhaps indicative of a previous underestimation of metal production. These values reflect the development and decline of the Roman metal industry, from the acquisition of Hispania during the Punic wars by 146 BCE 75 , through their exploitation, to the combined pressure of source exhaustion 2 and repeated invasions of Hispania from 170 CE onwards 76 . The coherence of the modelled record with the raw Pb concentration data (Fig. 4f) indicates that much of the Pb deposited during this time period was anthropogenic, indicating values of ~90% for the anthropogenic fraction seem sensible.
Unfortunately, due to overlapping fields, many of the modelled ore-related proportion distributions are unconstrained. Some conclusions are still possible, with contributions from Almeria, Andalusia, Mazarron and Cartagena (after a posteriori grouping as south-west Spain), Huelva and the Leonese Zone peaking after the Punic wars (roughly 200 BCE), and falling away by 200 CE (Fig. 5), reflecting the developmental trend indicated by overall anthropogenic proportion changes (Fig. 5). Reocin (of the Basque-Cantabrian basin in Northern Spain) appears to peak just slightly before the Roman invasion (275 BCE), suggesting exploitation by the Iron Age Iberian cultures. Further exploitation of Reocin through the remainder of the period corroborates previous indications of mineral resource exploitation in the region during the Roman period 77,78 , evidence for which has been destroyed by more recent utilisation. Such conclusions indicate the value of such a modelling approach, allowing the user to interpret provenance in much finer detail, tracing pollution back to exact ore fields.
Interestingly, and previously unconsidered by the original publication, mining outside of Spain appears to have also had an impact on the isotopic mixture recorded here, with the contribution of the Mendip hills and Shropshire (grouped as England) appearing to change from very low contribution values at the start of the period, gradually rising after 10 CE, peaking by 150 CE and dropping off after 250 CE (Fig. 5). Such a profile reflects the well-constrained history of British lead mining, with evidence of mining in the Mendips from c.49 CE 79 , prior to considerable fluctuations in lead production throughout the period of Roman rule 80 .
Example 3: Modelling of Artefact Provenance. Another potential application of MixSIAR and isotope mixing is the interpretation of the Pb isotope signal as recorded in individual metal artefacts. Determining the provenance of metal artefacts is of crucial importance in archaeology, and Pb isotopes have previously been used to this end in several studies 50,81,82 . Here we apply MixSIAR, in much the same way as done above, to the Pb isotope mixture of a litharge roll recovered from a Roman-age mining site in Romania 53 . To determine source contributions quantitatively, a dataset of potential sources has been assembled (SI Table 3, Fig. 6).
Due to the large number of sources, initial results indicate a wide range of potential origins for the Pb, but with Moldova Nouă ore field apparently the clearest single source (Table 5). To interpret the data more clearly, a  (Table 6, Fig. 6).
After grouping sources as listed in Table 6, it can be stated that most Pb present in the artefact comes from the Banatites (average of nearly 50%) whilst Baia Mare Pb appears to constitute much of the remainder. Interestingly, the Apuseni group, which contains the Rosia Montana deposit where the litharge roll was discovered, does not contribute significantly (Table 6). This is very much in line with the conclusions reached in the original publication 53 , who suggested that the Pb present in the litharge roll was not from the Apuseni Mountains. Our model appears to suggest a Banat source as dominant, and within that, Moldova Nouă as the most likely mining field. Moldova Nouă, and its porphyry copper deposits 84 has been exploited since Roman times, when Au and Ag were extracted from base metal ores 85 , with the typical method exploiting the chemical properties of litharge to bring the noble metals into the lead prior to further separation 53,86 .
It must be mentioned here that artefacts are much more likely to be composed of a one (or more) Pb ore source(s), in this case likely the Moldova Nouă ore field. The nature of the model, however is that it considers contributions from all sources to satisfy the mass balance, and therefore all sources may have a contribution, even when, as it appears here, the actual source is one ore field. As such, we suggest caution when interpreting such outputs. When one field dominates as it does here, we suggest that the user interprets this as a single, dominant source, and does not over-interpret other contributions. A posteriori grouping may be helpful at this point, as it is in this example, where the combination with other Banatitic ores leads to the clear domination of this signal (Table 6).   Period pollution. One of the further advantages of MixSIAR is that it was developed to investigate populations of species, and not just individuals. As such it allows the user to analyse more than one mixture at a time and provide overall results for the dataset. In practice, this can allow a user not just to obtain proportions at discrete points in time, but to get an overall indication for an entire period. In the example used here, all mixtures have been run separately, and increases and decreases in certain sources may be observed. When all mixtures are run together, a much broader picture of pollution over a longer period may be produced. This is a faster process than modelling each individual mixture, and therefore may be useful for getting a better understanding of the dataset, or for comparing a few datasets covering a short period.

Conclusions
Bayesian mixing models, as developed for food-web studies, may be applied to work focussing on understanding sources of pollution as documented by Pb isotope analyses. In a system where many sources are likely, they are a much more realistic approximation of the real world than the simple binary or ternary mixing currently utilised for such studies. Furthermore, they allow for incorporation of errors, and as such provide a more complex and exhaustive approach. The model results, of course, depend on the quality of the data input. As such, it is vital thorough literature reviews are carried out prior to any analysis, and that all potential sources are considered. This is likely to produce very wide range values, for many sources. An a posteriori grouping approach as done here for the Roman period litharge roll may be applied to clarify what are likely to be the main sources, albeit at the cost of some specificity. Some systems, however, will not produce reliable or clear results. If the isotopic range for sources are similar or overlap significantly, the model may be unable to distinguish separate inputs to the mixture. In other cases, the existence of unknown and uncharacterised sources may result in the mixture falling outside the mixing envelope; in such cases, results are meaningless. A rigorous approach to source characterisation and model parametrisation should rule out errors such as these. Finally, we recommend using such models in tandem with more traditional provenance methods (e.g. 3-isotope plotting of all data); such a dual method approach is likely to produce the most reliable source characterisation. The few examples outlined here indicate the value of such an approach, first by clearly identifying which models require the consideration of further sources, or by identifying exact ore regions as major contributors to a mixture.