Time-dependent taphonomic site loss leads to spatial averaging: implications for archaeological cultures

Archaeologists typically define cultural areas on the basis of similarities between the types of material culture present in sites. The similarity is assessed in order of discovery, with newer sites being evaluated against older ones. Despite evidence for time-dependent site loss due to taphonomy, little attention has been paid to how this impacts archaeological interpretations about the spatial extents of material culture similarity. This paper tests the hypothesis that spatially incomplete data sets result in detection of larger regions of similarity. To avoid assumptions of cultural processes, we apply subsampling algorithms to a naturally occurring, spatially distributed dataset of soil types. We show that there is a negative relationship between the percentage of points used to evaluate similarity across space and the absolute distances to the first minimum in similarity for soil classifications at multiple spatial scales. This negative relationship indicates that incomplete spatial data sets lead to an overestimation of the area over which things are similar. Moreover, the location of the point from which the calculation begins can determine the size of the region of similarity. This has important implications for how we interpret the spatial extent of similarity in material culture over large distances in prehistory.


Introduction
A rchaeologists have a unique interest among historical scientists to understand the spatial structure of past material culture. This often manifests itself in the definition of large regional cultural-technological areas on the basis of similarity in material culture between sites. However, the inherent incompleteness of the archaeological record due to taphonomic and discovery biases can inhibit the search for cultural areas. In recent years, the effects of this incompleteness have been studied in relation to temporal trends (e.g., Surovell and Brantingham, 2007;Surovell et al., 2009;Contreras and Meadows, 2014;Miller-Atkins and Premo, 2018;Perreault, 2018Perreault, , 2019). Yet, comparatively little research has been done to understand how the decreased spatial resolution of the archaeological record due to time-dependent site loss (Surovell and Brantingham, 2007;Surovell et al., 2009) impacts the spatial patterns we are attempting to find.
Cultural classification or the definition of cultural taxonomies is fundamental to anthropological and archaeological research (Barth, 1981;Reynolds and Riede, 2019). This is because the past is often seen as made up of bounded, homogeneous groupings with different ways of life and therefore different material cultures (Shennan, 1994;Lucy, 2005). These cultural units are used to structure research, organise the archaeological record, analyse past cultural dynamics, and communicate results (Reynolds and Riede, 2019;Riede et al., 2019). Cultural classification based on material remains in archaeology has been critiqued from a number of directions. Chief among those is the issue with attempting to draw discrete boundaries in material culture variation as a reference for discrete boundaries among cultural or ethnic groups (Barth, 1981;Shennan, 1994;Furholt, 2008;Riede et al., 2019) or, indeed, subspecies (Villa and Roebroeks, 2014). What has become clear to archaeologists is that variations in material culture cannot be directly mapped onto certain groups and that considering different subsets of the archaeological record can reveal different patterns (Barth, 1981;Shennan, 1994;Furholt, 2008;Hodder, 2012). For this reason, many researchers have been advocating for reformulation of named cultural units either through using attributes that track shared transmission histories of cultural traits , or by considering a more polythetic version of culture to account for cross-cutting patterns of different aspects of material culture (Lucy, 2005;Furholt, 2008).
Despite these critiques, cultural classification remains a mainstay of archaeological practice. Archaeologists routinely compare their excavated materials with those they know from nearby areas and either accept or reject a degree of similarity. These comparisons are sometimes done formally, based on attribute analysis (sometimes supported by multivariate statistics), or, quite often, based on the inspection of published drawings or photographs. Although some contemporary archaeologists use sophisticated mathematical methods to compare artefact shapes (e.g., Eerkens et al., 2006;Lycett, 2016), or indeed, entire assemblages (e.g., Grove and Blinkhorn, 2020), the general classificatory framework is often based on categories that were established earlier using the informal method described above. In addition to the everyday practice of informal assessments of similarity, some studies have turned to quantitative methods to explore the relationship between geographic distance and similarity (i.e., density of particular artefact types, presence/absence of particular traits, etc.) for detecting structure of archaeological cultures (Renfrew, 1977;Kimes et al., 1982;Shennan et al., 2015;Lycett, 2019). Here, the premise is that groups in closer proximity to each other tend to interact and share knowledge more frequently than those farther apart (Johnson et al., 2006;Ross and Atkinson, 2016;Derungs et al., 2018). Similarity based on proximity is also known as Tobler's First Law of Geography, which states that everything is related to everything else, but near things are more related than distant things (Tobler, 1970(Tobler, , 2004Sui, 2004). Incidentally, Tobler's First Law does not only apply to cultural entities, but to all geographic patterns. Examples of this in non-human geography literature most frequently look at the distance decay of similarity in species composition of ecological communities (Nekola and White, 1999;Bjorholm et al., 2008;e.g., Astorga et al., 2012;Wetzel et al., 2012).
Given that archaeologists are still generally interested in finding meaningful spatial structure of material culture despite the low spatial resolution of the archaeological record, we must ask: to what extent are our perceived spatial entities influenced by the incompleteness of the archaeological record? Cultural entities are often defined over vast distances. Although human interaction surely plays a role in the creation of spatially detectable similarity in material culture, the appearance of similar assemblages in distant areas of the world might require a different explanation. Typically, long-distance similarity in material culture is explained by the migration of groups, the vertical transmission of culturaltechnological traditions through interaction, or convergent evolution through independent innovation (Crema et al., 2014;Ross and Atkinson, 2016;O'Brien et al., 2018). However, even if these explanations have validity on certain scales, cultural regions defined especially in the earlier periods of prehistory, such as the Palaeolithic, are massive. Some, such as the Aurignacian or the Gravettian, extend across the European continent and beyond (Reynolds and Riede, 2019). The Acheulian stretches over three continents and lasts over a million years. The ubiquity of such continental-sized cultural regions prompts us to doubt that material culture similarities on such scales were in fact the result of sociocultural processes.
Recent studies have shown that time-and taphonomydependent factors can distort the spatial component of behavioural signals. For example, Miller-Atkins and Premo (2018) demonstrate that time averaging of assemblages can increase the apparent spatial spread of a cultural signature beyond the area actually occupied by that population at any given moment in time. We hypothesise that a similar problem exists for spatial averaging, such that as fewer and fewer points are used to examine similarity across space, the area included in a region of similarity increases. This would mean that cultural regions are systematically overestimated, given the inherent incomplete nature of the archaeological record. To test this hypothesis, we chose to measure the effects of spatial averaging on spatially autocorrelated point data. As any archaeological data set is already incomplete in an uncontrollable way, we chose a contemporary data source that is both spatially autocorrelated and culturally independent: soil classifications. Soil types represent a spatially complete data set with a certain amount of temporal stability that demonstrates spatial autocorrelation but is not the result of cultural processes. For these reasons, it is possible to spatially subsample this data to determine how that affects analysis of similarity without invoking cultural explanations. This allows us to test the effects of spatial averaging without arguing for a similarity between the initial distribution of traits in our toy data set and that of any particular archaeological case.

Methods
Data. The European Soil Database v2.0 (European Soil Data Centre, 2004) is a soil database for all of Europe and Russia, which contains a geographical database of soil polygons with classifications attributes of different soil properties. For our analysis, we converted the polygons to point data using the centroid of the polygons for point placement in ArcGIS 10.4.1 (ESRI, 2016). The analysis was run using two subsets of the data, one which included all of Europe (~25,000 points) and another which contained only the points for a single country (Germany, with~2500 points) given there are often inconsistencies in large-scale maps due to different national mapping systems (Sprafke, 2016).
The categorical attributes used for analysis were classifications of the topsoil and subsoil, slope, accumulate temperature, and depth to rock, as well as the full soil code from the World Reference Base for Soil Resources (see Table 1). These were selected from over 70 total soil attributes for the raw data set to give a wide range of comparisons for calculating the similarity between soil points. Additionally, this range of variables was used to best approximate how informal archaeological comparisons are made: through comparison of absence/presence of particular trait-based variables and/or proportions of different categories of artefacts between sites.
Calculating geographic distance to minimum of similarity. Our spatial averaging methodology consists of repeatedly constructing decay curves for similarity from one random point to a set of other points and then comparing the absolute geographic distances to the first minimum in similarity. All analyses were carried out in R 3.6.1 (R Core Team, 2019) and all scripts can be found on GitHub (https://github.com/cocoemily/spatial-averaging). Geographic distances between the points are calculated in kilometres. Similarity between points is calculated through simple matching of categorical attributes as a percentage of all categories. This means that points that share all of the same values for each of the 13 attributes would have a similarity score of 1, whereas points that share the same values for 12 of the 13 attributes would have a similarity score of 0.92. We modelled similarity in this way because some of the attributes have similar values (as seen in Table 1), and therefore needed to be treated as an ordered set, which standard similarity indices do not account for. Geographic distance is then used as a predictor variable for similarity in fitting a smoothed function to the data using a generalised additive model. The function of this model is then analysed to find the first local minimum (see Fig. 1 for example curves). We chose to analyse the first local minimum of the similarity curve (hereafter, first minimum of similarity) because it best approximates how archaeologists construct cultural zones in practice.
This process is repeated for progressively smaller proportions of the total points. Specifically, we used randomly sampled points at 10% proportion intervals from 90 and 10%, and at 1% intervals from 10 to 1%. These intervals were chosen because the total number of soil data points is~25,000, so therefore these small percentages better approximate the number of sites we might expect in any given archaeological time period.
Additionally, we calculated what proportion of points could theoretically remain at three time intervals, 10 ka BP, 50 ka BP, and 100 ka BP, by fitting an exponential function to radiometric dates for European Palaeolithic sites dating from 7000 to 900,000 BP (Vermeersch, 2019). The database contains radiocarbon, TL, OSL, ESR, Th/U, and AAR dates from the European Lower, Middle, and Upper Palaeolithic from >10,000 sites (Vermeersch, 2019). For this study, we used over 14,000 radiometric dates of the nearly 18,000 dates in the total data set; we did not include dates coded as "unreliable." The best fit function (1) below explains~72% of the variance in date frequency, where n t is the number of radiocarbon dates surviving at time t. n t ¼ 412:5e À4:073 10 À5 t ð1Þ Using the above equation, the number of sites at the three aforementioned times is calculated and divided by the total number of radiometric dates. These proportions were then used as sampling proportions for the methodology described above. This analysis allowed us to model how much data could potentially be lost due to taphonomy using a curve based on empirical archaeological data.
As the points are randomly sampled, we repeated the methodology 100 times for all proportions to capture variation. For each subset of the total points, the distance corresponding with the first minimum in similarity for the fitted function was recorded.

Results
Absolute distance to first minimum for similarity. A negative power function of the form: d = αP β , where d is the distance (in kilometres) to the first similarity minimum, P is the percentage of total points, and α and β are estimated parameters, was fit to the values of absolute distance to the first minimum in similarity by percentage of total points via nonlinear least squares regression (Fig. 2a). This function was determined to be a better fit than a linear regression model and an exponential function using the AIC. The estimates of both α and β were significant to p < 0.001; α was estimated to be 1322 with a 95% confidence interval of [81.34, 183.06], meanwhile β was estimated to be −0.04148 with a 95% confidence interval of [−0.052, −0.031]. The residual standard error of the model is relatively high at 513.1, which makes sense considering the wide range of variation in the distance to the first minimum at each percentage that results from each sample having a different starting point for comparison. That being said, the negative power relationship is still significant, meaning that as fewer points are considered, the distance over which things are similar increases for this particular dataset. This negative power relationship between distance to first minimum of similarity and percentage of points is confirmed by linear regressions on the logged distances at percentages from 100 to 10% and 10 to 1% separately (Fig. 2b, red line and blue line respectively). The negative slope for lower percentages is nearly twice as great as that of the higher percentages (see Table 2), further demonstrating how the effects of spatial averaging are amplified at the smallest amounts of total points on a continental scale.
Results within national boundaries for soil similarity. Because of inconsistencies at the international level among soil classification systems, we applied our model to a data set at the national level as well, in this case Germany. This allows for examination of how spatial averaging affects analysis of similarity at a smaller scale. The results for soil points within Germany demonstrate similar negative relationships between the percentage of total points and the absolute distances to the first minimums of similarities. However, for this smaller data set, a log-linear model better fits the data then the negative power model that was fit to the distance decay results from the total data set (see Fig. 3). The regression results are presented below in Table 3.
Comparison of "aged" samples. We wanted to examine how the areas of similarity compared when the soil types were sampled at rates corresponding to specific ages of deposits. Surovell and colleagues have demonstrated that the expected number of sites surviving in the archaeological record depends on the time since their deposition (Surovell and Brantingham, 2007;Surovell et al., 2009). Using colleagues (2007, 2009) methodology, we fit an exponential function to radiometric dates for European Palaeolithic sites dating from 7000 to 900,000 BP, in order to determine how many points to use for looking at similarity decay corresponding to modelled ages of deposition. We then compared the difference in distance to the first minimum of similarity for deposits aged at 10,000, 50,000, and 100,000 years ago based on 491, 96, and 12 points, respectively. We found again that there were fairly wide ranges of variation for the relative differences in distance to the first minimum at each of the three modelled ages of deposition. However, when comparing these groups using a Mann-Whitney U-test there were significant (p < 0.01) differences between the 10 and 100 ka and between the 10 and 50 ka distance to first similarity minimum. Additionally, as can be seen in Fig. 4, there is an increase in the absolute distances to the first minimum as the age of deposits increases, meaning that as the modelled age of our deposits increases, so does the distance to the first minimum in similarity. That being said, there appears to be a certain proportion of the total data set at which the relationship between the increase in distance to the first minimum of similarity and number of points becomes negligible, as demonstrated by the lack of significant difference between the 50 ka sampling and the 100 ka sampling, which represent~0.004% and~0.0005% of the total soil points, respectively. Absolute distance to first minimum on the curve constructed with 100% of points and that with 1% of points indicated by the red arrows. Fig. 2 Regressions fitted to values of absolute distance to first minimum of similarity by the percentage of total points sampled for the total dataset. In a, negative power function (green line) fitted to all data for the absolute distance to first minimum. In b, linear regressions fitted to subsets of the data; the red line is fitted to samplings up to 10%, whereas the blue line is fitted to samplings from 10 to 1%. Location of reference point. Our results showed that the location of the starting point does not affect the overall trend of increasing distance to similarity with more site loss. However, the identified region of similarity will have different sizes depending on where the initial comparison point is located, as is evident in Fig. 5. The area of similarity at the lowest proportion of points in Fig. 5b is nearly twice times as large as in Fig. 5a, when the reference point is in a different location. This finding is consistent with the large ranges of variation in distance to first minimum of similarity noted above. This finding is of particular importance to archaeological research because the definition of cultural areas is typically based on similarity to 'type assemblages,' which are defined by the order of discovery. Our analysis shows that attempting to define regions of similarity based on comparison to a single starting point not only makes those regions subject to the effects of spatial averaging, but also this means the effects of spatial averaging will differ depending on where that starting point is. This quantitatively demonstrates the well-known limitations of applying the definitions of cultural areas in one area of the world to the archaeological record of another; depending on where the initial definition occurs, the region of similarity will be of a different size, no matter how spatially incomplete the dataset is.

Discussion
The results presented above indicate that as we consider fewer points for determining regions of similarity, the distance to the first minimum in similarity increases. This conclusion should apply to all spatially autocorrelated datasets, whether they are anthropogenic or not, but it is particularly important for anthropogenic ones, where the behavioural signal can be all but erased by spatial averaging. All of the analyses showed that the distance to first minimum in similarity increases by 0.5-3% with each percentage point of reduction in the number of points considered. When we consider that our analysis is on the scale of thousands of kilometres, these small percentage increases in distance to first minimum of similarity result in large overestimates of similarity areas. It should be noted that the Rsquared values on the regressions are small, explaining at most 4% of the variation. This is likely due to the large variation in how similar points are to each other at a given distance (see the large error bars in Figs. 1-3). As stated above, this variation results from different reference points and different comparison points used for each subsampling. Soil types do not exhibit perfect gradients in terms of regions of similarity; instead they are distributed in a mosaic, with similarity decreasing at one distance only to increase again at a farther distance. Despite this variation, the negative relationships presented above are still statistically significant, which means the proportion of points considered does have a significant impact on the size of the region of similarity identified, whether that is at the continental scale or a more regional scale (i.e., within Germany). The goal of this study was to identify if missing point data results in an extension of identified areas of similarity, not to predict the magnitude of this effect. Given that this magnitude is likely dependent on the original signal strength, future studies should use archaeologicallyrelevant data to assess it.

Conclusions
The effect of spatial averaging on our understanding of areas of similarity has two important implications for archaeology. First, the amount of spatial averaging is inversely proportional to the amount of data remaining. This should affect both older periods and geographic regions with histories of significant landscape changes. We should expect to have fewer and fewer data points as  Fig. 4 Comparison of distance to first minimum of similarity between "aged" deposits. Comparison via Mann-Whitney U-tests in absolute difference in distance to first minimum in similarity for expected number of surviving archaeological sites at 10,000, 50,000, and 100,000 years ago. ** = significant to p < 0.01. Fig. 3 Linear regression for soil types within Germany. Linear regression models fitted to data for the absolute distances to first minimum in similarity using soil points from within Germany.
we consider older and older periods of prehistory (Surovell et al., 2009). This means that for older archaeological deposits, especially those from the Stone Age, we are likely vastly overestimating regions of similarity just by virtue of not having the complete data set of points, particularly when comparing similarity on an attribute basis. For large parts of the Stone Age, this effect is compounded by the common use of lithics as the single available type of material culture. Likewise, we can expect the amount of erosion, sediment accumulation, and overprinting to produce different amounts of spatial averaging in different regions (e.g., Wilson, 1988;Fanning and Holdaway, 2004;van Leusen et al., 2011;Lovis et al., 2012;Iovita et al., 2014). Second, the geographic location of the first point of comparison matters. This makes sense if similarity is not originally uniformly distributed, and especially if the information loss will affect the underlying distribution at different rates. As stated above, this is likely to create undetectable biases in classifications, as chance, research history, the availability of funding, and many other factors unrelated to ancient behaviour will determine where any particular type of material culture is first found.
Our report of spatial averaging adds an important dimension to the recent focus on time averaging in archaeological deposits. Several recent papers highlight its role in the formation and interpretive potential of the archaeological record (Davies et al., 2016(Davies et al., , 2018Dibble et al., 2017;Coco et al., 2020;Rezek et al., 2020). Furthermore, the effects of time averaging as a result of the way archaeologists group data has become better established (Miller-Atkins and Premo, 2018;Perreault, 2018). For example, Miller-Atkins and Premo use agent-based simulations to show that considering time-averaged assemblages can increase the apparent spatial spread of a cultural signature beyond the actual occupation region of a population with that particular cultural variant (Miller-Atkins and Premo, 2018). The effects of spatial averaging due to missing data points are similar to those demonstrated for time-averaged assemblages. The results from the study presented here demonstrates that spatial averaging separately from time averaging can also increase the area within a region of similarity. Further research should look into how the combined effects of time and spatial averaging impact archaeological patterns.
Given that both temporal and spatial averaging are inherent biases of the archaeological record, we need to consider more carefully how we use similarity to demonstrate the cultural relatedness of any two sites in archaeology. This is not to say that archaeologists should not be organising the archaeological records into units of analysis based on similarity. Such a goal is probably unrealistic and impractical. Instead, the results presented here demonstrate that the similarity noted in the archaeological record does not necessarily map onto the original spatial organisation of cultures at the time of interest due to timedependent taphonomic information loss. Thus, we as archaeologists need to rethink using the size of a defined material cultural area to discuss population and cultural dynamics. The inherent spatial and temporal incompleteness of the archaeological record and its demonstrated effects on interpretations of cultural areas clearly shows that archaeological "cultural" areas are not true representations of actual living cultures. As such, we need to further examine what behavioural and cultural signals our regions of similarity actually represent and better incorporate that into our interpretations of the archaeological record.

Data availability
The raw soil classification spatial data used for this project is available for download from the European Soil Data Centre at https://esdac.jrc.ec.europa.eu/content/european-soil-database-v20-vector-and-attribute-data. The point data created from the European Soil Database specifically for this project is available via the OSF registration at https://osf.io/vzpyt. Code and spatial data used for spatial subsampling are available in a GitHub repository at https://github.com/cocoemily/spatial-averaging.