Introduction

Unravelling past spatio-temporal patterns and processes of culture change is one of the primary aims of archaeology. Over the last 30 years, genetic analyses have increasingly contributed to this agenda as they promise to disambiguate purely cultural and biological dynamics. A fundamental precondition for recognising such processes is the clear definition of the analytical taxonomic units used for investigation. This entails (i) consistent criteria for their definition and delimitation, the validity of which is established a priori in relation to the questions asked, (ii) a clear taxonomic system into which such archaeological entities can be placed, (iii) agreement on the meaning of the relative ranks within this taxonomic system, and (iv) their meaning within an agreed-upon theoretical framework. These four requirements are essential for conducting comparative and cumulative research at a supra-regional and diachronic scale, and for constructing narratives of deep history.

The definition of archaeological taxonomic units was of great concern to early practitioners, and the typological method (Montelius, 1903) was developed to this end, with evident and explicit inspiration from similar taxonomic efforts in biology (Riede, 2006; Riede, 2010). Taxonomic units—with vernacular labels such as cultures, technocomplexes, groups, industries, traditions or facies—proliferated; these were thought to represent actual past ethnic groups, sometimes implicitly, at other times very much explicitly (Bergsvik, 2003; Clark, 1994; Sackett, 1991; Barton, 1997). The difficulties of inferring group coherence or indeed even ethnicity from archaeological material have re-emerged with new urgency in the wake of recent publications such as David Reich’s (2018) programmatic monograph on archaeogenetics. Foreshadowed by critical reviews of recent archaeogenetic research (Johannsen et al., 2017; Furholt, 2018; Hofmann, 2015), this publication has engendered immediate responses that argue for a more even-handed integration of genomic and archaeological datasets (Linderholm, 2018; Horsburgh, 2018; Vander Linden, 2018; Klein, 2018; Bandelt, 2018; Kirch, 2018). These responses, however, offer little in the way of concrete advice on how such an integration may be achieved methodologically.

Cognisant of the issue of implied ethnicity and after reviewing the current melange of ad hoc naming conventions, Eisenmann et al. (2018) propose a palette of patently pragmatic rather than polemic solutions that would see genetically recognised population clusters preferentially named by geography and relative cultural chronology (e.g., C_Europe_LN for the Central-European Late Neolithic). According to Eisenmann et al., such naming conventions offer the advantages of brevity, coherence, accessibility, flexibility, and stability, and they avoid a simplistic matching of archaeogenomic clusters with archaeological cultures. While useful from the point of view of labelling, however, such taxonomies do not exploit the evidential potential of the archaeological record in relation to past demography. We see the aim of archaeological analysis as tracking patterns and processes of cultural transmission, which can subsequently be brought into dialogue with genetic data under the umbrella of dual-inheritance theory (e.g., Shennan, 2011). In this view, the spatial and temporal coherence of certain material culture attributes is the result of cultural transmission dynamics within given populations. Hence, a definition of cultural taxa based on attributes that can reasonably be linked to the transmission of craft skills offers a more robust theoretical grounding. Rather than abstaining from creating cultural taxonomies, we therefore suggest that evolutionary approaches provide a way forward that is at once epistemologically and analytically viable and suitably ambitious vis-à-vis the epistemic work that the archaeological record can potentially do. Cultural labels need not be abandoned, but current cultural taxonomies must be scrutinised and revised before they can be meaningfully reconciled with their archaeogenomic counterparts.

Evolutionary thinking has a long pedigree in archaeology (Shennan, 2002a), and in the last two decades substantial strides have been made towards a definition of culture as an evolutionary system parallel to other domains of inheritance (i.e., genetic and epigenetic; see Shennan, 2002b; Lipo et al., 2006; Shennan, 2009). In fact, a major boost to definitions of culture as an information transmission system took the form of formal models inspired by population genetics. Known as gene-culture co-evolution models or dual-inheritance theory, these approaches elaborate the point that culture and genetics are linked but must be understood separately and at similar levels of quantitative sophistication (Boyd and Richerson, 1985; Cavalli-Sforza and Feldman, 1981). In the following, we demonstrate the epistemological and methodological differences between traditional definitions of archaeological cultures and evolutionary archaeological definitions thereof. We then show how evolutionary nomenclatures become amenable to analytical approaches (i.e., phylogenetics and formal modelling) that mirror those used in population genetics, and how these in turn offer interpretative avenues that deliver substantially higher epistemic dividends. This paper is a comment on recent taxonomic practice, offered in the hope of stimulating further productive developments.

Ideational versus materialist definitions of culture—and the role of computational methods

Archaeologists developed an ideational, essentialist and top-down typological approach in the late 19th and early 20th century, a time when rapid agricultural and industrial development resulted in a massive increase of archaeological finds that needed to be put in order. Archaeological typology was in fact modelled on biological taxonomies as understood at that time (Riede, 2010). However, biology later went through a conceptual revolution that transformed its essentialist understanding of these key analytical units into a materialist and population-focused one (Mayr, 1959). This shift was followed by the adoption of computers, which opened up novel ways of dealing with ever-larger datasets, and hence ever-larger variation, without resorting to the obvious abstraction of idealisation (Hagen, 2003). In turn, this facilitated the development of precisely those phylogenetic methods now used to partition biological variation at different levels (O’Hara, 1997), including the intra-population divisions into clusters now so clearly revealed for past human populations by recent aDNA genomic studies.

In contrast, and despite the efforts of David Clarke (1968) in the 1960s to develop a ‘polythetic’ definition of archaeological entities, much of archaeology never underwent such a conceptual and methodological overhaul (Lycett and Shennan, 2018). At the time when such a rethinking might have happened, the discipline instead turned towards different concerns (Shennan, 2004; Trigger, 2006). Computers and statistics were slow to make inroads into the discipline (e.g., Aldenderfer, 2005), and cultural taxonomic studies fell radically out of fashion, with the result that culture-historical nomenclatures were largely left unexamined and unrevised (Roberts and Vander Linden, 2011). The current diversity of archaeological taxonomic units and the evident methodological heterogeneity behind their construction and interpretation is a major issue both for later prehistory, as summarised by Eisenmann et al., and for earlier periods (Clark and Riel-Salvatore, 2006; Sauer and Riede, 2019). At least in part, this heterogeneity is the result of an inertia in the revision of epistemologies and analytical methods when it comes to classification and cultural taxonomy (Bisson, 2000), an inertia linked at least in part to the obvious need to communicate within the discipline and with external stakeholders, including the public. In addition, many traditional typological units have become reified in heritage databases. Their epistemological status has nonetheless come under close scrutiny, and one branch of archaeology has begun to address this issue: evolutionary archaeology.
Inspired by the development of gene-culture co-evolutionary models, which view culture as a multi-generational system of information transmission akin to, but also different in its details from, genetic inheritance (Cavalli-Sforza and Feldman, 1981; Boyd and Richerson, 1985), evolutionary archaeologists have for the last 30 years been adapting both micro-evolutionary (population genetic) and macro-evolutionary (phylogenetic) methods to the study of material culture change (Bettinger, 2008; O’Brien and Lyman, 2000; O’Brien, 2008; Shennan, 2009; Shennan, 2008; Lipo et al., 2006; Mace et al., 2005; Shennan, 2002b). Such conceptions of cultural variation are also careful not to conflate phylogenetic branches with biological or ethnic groups, arguing instead that these branches can validly be understood as the outcome of past communities of practice (Fig. 1; O’Brien et al., 2008; Collard and Shennan, 2008; Riede, 2011b).

Fig. 1

A schematic figure outlining the difference between typological thinking and population thinking as implemented in regard to material culture variation. A given population (from which archaeogenetic samples are also taken) is seen as a community of practice (Lave and Wenger, 1991) composed of individuals of different age, sex, ability, and access to knowledge and raw materials, here indicated by size and colour differences. Production processes and pedagogical practices in such past communities can sometimes be inferred in great detail (Donahue and Fischer, 2015; Bodu, 1996; Högberg, 2008). The artefacts produced in such communities vary, which is shown here through the outlines of Final Palaeolithic (15,000–11,000 cal BP) large tanged points from the type site of the so-called Bromme culture (Mathiassen, 1946). Panel a shows how, within the framework of traditional typological thinking, the typological abstraction is thought of as a somehow idealised shared mental template, here represented by the median shape, which, however, has no actual empirical representative. Once defined, such idealised types act as reified stand-ins for the communities of practice. In contrast, panel b shows how, in a materialist approach, further variation is considered to be introduced over generations (g) into the total sample of artefacts (Eerkens and Lipo, 2007), which can subsequently be selected by cultural and natural factors. Here, large samples of artefacts, together with chronological and spatial data, facilitate inferences about transmission processes and hence about changing population dynamics.

We know from many detailed studies of past technologies which traits are best suited for such analysis and which traits are likely to reflect knowledge and know-how acquired as part of apprenticeship processes involving close interaction between learner and teacher (Jordan, 2015; Stark et al., 2008; Tehrani and Riede, 2008; Tostevin, 2013). Instead of placing archaeological material into preconceived and usually rather static ideal categories, such evolutionary approaches apply phylogenetic techniques and isolation-by-distance modelling (Shennan et al., 2015), among other methods, to investigate taxonomic structure in a given dataset empirically. Such techniques can also provide independent estimates of population contact and mixing and could, in the future, be used to create archaeologically based admixture graphs (Pickrell and Pritchard, 2012) that can be tested against genetic data. They can also reveal the nested structure of coherent cultural taxonomic groups and thereby provide robust criteria for differentiating between vernacular categories such as cultures, technocomplexes, groups, industries, traditions or facies, which are often, but rarely consistently, understood to represent different levels of cultural differentiation. Such phylogenetically derived groupings can then be understood as past communities of practice tied together by shared transmission histories of cultural traits; i.e., they represent the methodological complement to palaeogenetic clusters, which are the result of shared biological transmission histories. In principle, these clusters should then display a degree of spatial and temporal coherence, but spatio-temporal coherence is a result of, rather than a necessary feature for, taxonomic coherence.

Selecting the right proxies for the right questions

The current wave of archaeogenetic studies is trying to assess past population relationships and the dynamics that produced them using genetic proxies. Similarly, archaeological taxonomies are tools to infer past cultural dynamics. Yet both spatial and temporal proximity are at best rather indirect proxies for interaction patterns, especially under the conditions of particular interest to such investigations: rapid culture change, dispersal, migration and population contact. Chronology is important but a poor proxy for cultural relatedness because multiple traditions can coexist at any one time. Just as a given population can contain multiple genetic variants, so can its cultural composition be a mixture of many traditions and practices, as has been demonstrated for the Neolithic and Bronze Age Corded Ware and Bell Beaker cultures (Furholt, 2014; Vander Linden, 2016). Spatial proximity is a poor proxy for cultural relatedness because humans can be highly mobile. That said, geographic closeness does play some role in structuring cultural transmission and hence material culture variation, but the degree to which this is so must be assessed empirically on a case-by-case basis (Jordan and O’Neill, 2010; Jordan, 2009). The key characteristic in question, cultural evolutionary descent (i.e., historical relatedness), is more robustly tracked by those traits acquired through social learning in the intimate settings of childhood and apprenticeship (Table 1).
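One widely used way of making that empirical assessment is a Mantel test, which correlates pairwise geographic distances with pairwise cultural (for example, shape) distances and judges the correlation against random relabelling of the objects. The Python sketch below is an illustration chosen by us, not necessarily the method of the studies cited; the function name `mantel_test` and its parameters are hypothetical.

```python
import numpy as np


def mantel_test(dist_a, dist_b, n_perm=999, seed=0):
    """Permutation Mantel test between two square, symmetric distance matrices.

    Illustrative sketch: returns the Pearson correlation of the off-diagonal
    entries and a one-sided permutation p-value (association >= observed).
    """
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    iu = np.triu_indices_from(a, k=1)           # each pair of objects counted once
    r_obs = np.corrcoef(a[iu], b[iu])[0, 1]
    rng = np.random.default_rng(seed)
    n = a.shape[0]
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)                  # shuffle object identities in one matrix
        r = np.corrcoef(a[iu], b[np.ix_(p, p)][iu])[0, 1]
        if r >= r_obs:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)       # include the observed arrangement
    return r_obs, p_value
```

A strong, significant correlation would indicate isolation by distance in the sample; a weak one would caution against using geography as a stand-in for cultural relatedness.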

Table 1 Different broad categories of cultural transmission that produce attendant long-term dynamics of culture change (Hewlett and Cavalli-Sforza, 1986)

Against the background of their very valuable review, Eisenmann et al. accept a diversity in naming practice and argue that workable nomenclatures must satisfy five key criteria: brevity, coherence, accessibility, flexibility, and stability. Phylogenetically derived cultural clusters do not come with ready-made vernacular labels, although particular monophyletic branches occasionally coincide with traditionally named cultures, whose designations could then, in principle, be transferred so long as the new basis for them was clear (Riede, 2011a). Yet, while names can be important, we consider issues of coherence, accessibility, flexibility, and replicability more critical. While the labelling approach of Eisenmann et al. clearly safeguards against naïve juxtapositions of archaeogenomic and archaeological patterns, we worry that it underutilises the epistemic potential of the archaeological record for shedding light on population-level processes of cultural transmission, mobility and contact. In contrast, computational phylogenetic and network-based methods, including the use of admixture trees, offer a transparent, replicable and case-transferable way to construct statistically validated and hence stable archaeological operational taxonomic units, especially as code-sharing and replicability come to the fore within the discipline (Marwick et al., 2017; Marwick, 2017). Archaeological taxonomies could be constructed using a wide range of material culture datasets, ranging from relatively simple presence/absence data or counts of particular object classes in, for instance, graves or settlements to two- or three-dimensional scans of key artefacts; we provide an example of such an approach below.
The ever-more rapid capture of such digital images and scans would swiftly result in large databases; artefact forms could then be interrogated using, for instance, geometric morphometric approaches (Petřík et al., 2018; Schillinger et al., 2016; Serwatka and Riede, 2016; Buchanan et al., 2014) coupled with more traditional trait-based analyses.
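For the simplest kind of dataset mentioned above, presence/absence of object classes per grave or settlement, a common first step is to convert the scores into pairwise Jaccard distances, which any tree-building method can then take as input. The sketch below is our illustration; the function name is hypothetical, and treating two empty assemblages as identical (distance 0) is an assumption. SciPy's `pdist(..., metric="jaccard")` offers an equivalent library route.

```python
import numpy as np


def jaccard_distances(presence):
    """Pairwise Jaccard distances between assemblages.

    presence: (n_assemblages, n_object_classes) array of 0/1 scores.
    Distance = 1 - |shared classes| / |classes present in either|.
    Assumption: two assemblages with no classes present get distance 0.
    """
    P = np.asarray(presence, dtype=bool)
    shared = (P[:, None, :] & P[None, :, :]).sum(axis=-1)
    either = (P[:, None, :] | P[None, :, :]).sum(axis=-1)
    with np.errstate(invalid="ignore", divide="ignore"):
        similarity = np.where(either > 0, shared / either, 1.0)
    return 1.0 - similarity
```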

Once a solid overview of the existing material culture variation is attained and appropriate analytical protocols are established, adding data derived from newly excavated sites or previously untapped museum archives becomes straightforward. It is interesting to note in this context that archaeologists make widespread use of drawings as a way of conveying artefact characteristics (Lopes, 2009). Many extensive catalogues of such drawings exist. More often than not, it is these drawings, rather than the actual objects, that are consulted and absorbed by practitioners when constructing or circumscribing particular cultural units. The digitisation and subsequent computer-aided analysis of such images is now within easy reach.

Computational methods for constructing cultural taxonomies: morphometrics and cultural phylogenetics

To demonstrate the importance of computational phylogenetic and network-based methods in assessing operational taxonomic units, we present a dendrogram of broadly contemporaneous Final Palaeolithic large tanged points as an example. Based on their size, evaluated against the ballistic requirements of different weapon delivery systems, these most likely all served as dart points associated with a spear-thrower propulsion system (Riede, 2009b); in order to avoid confounding factors such as re-sharpening, we include only complete specimens in our analysis. Traditional culture-historical assignments interpret variants of this artefact class as representing local or regional populations descended from a common ancestor (Sinitsyna, 2002; Szymczak, 1987), most notably the southern Scandinavian Bromme culture (Mathiassen, 1946). To evaluate this quasi-ethnic population division, we construct a dendrogram from a two-dimensional geometric morphometric analysis of 226 large tanged point illustrations from Eastern and Northern Europe (Fig. 2). Illustrations from this geographical region include examples from Belarus, Lithuania, Poland, western Russia and Ukraine; we also include illustrations from the Bromme culture itself (Table 2).

Fig. 2

A map of all sites examined throughout the article (n = 56). (1) Baroŭka; (2) Chilczyce; (3) Chvojnaja; (4) Koromka; (5) Krasnasieĺski; (6) Motol; (7) Woronowka; (8) Elemly Sø; (9) Hjarup Mose; (10) Rolykkevej; (11) Rundebakke; (12) Sølystgaard; (13) Bromme; (14) Trollesgave; (15) Bro; (16) Alt Duvenstedt; (17) Dohnsen; (18) Sassenholz; (19) Baltašiškės; (20) Derežnyčia; (21) Duba; (22) Ežerynas; (23) Glūkas; (24) Glyno Pelkė; (25) Gribaša; (26) Kašėtos; (27) Katra; (28) Lieporiai; (29) Marcinkonys; (30) Margių; (31) Maskauka; (32) Merkys-Ūla; (33) Mitriškės; (34) Rudnia; (35) Varėna; (36) Varėnė; (37) Vilnius; (38) Burdeniszki; (39) Dziewule-Piaski; (40) Krzemienne; (41) Maćkowa Ruda; (42) Płaska; (43) Stańkowicze; (44) Suraż; (45) Wolkusz; (46) Zusno; (47) Podol; (48) Ust-Tudovka; (49) Anosovo; (50) Vishegore; (51) Tieply NRuchey; (52) Krasnosillya; (53) Lipa; (54) Liutka; (55) Rudnya; (56) Velyky Midsk

Table 2 The dataset used for the dendrogram (ntotal = 226)

To examine differences between archaeological units through geometric morphometrics, elliptic Fourier analysis (henceforth EFA) was used. EFA is a common method of closed-outline shape analysis based on the decomposition of closed outlines into a series of periodic trigonometric functions (harmonics). In comparison to other two-dimensional closed-outline methods, including coordinate-point eigenshape (Macleod, 1999), Fourier radius variation and Fourier tangent angles (Zahn and Roskies, 1972), and the fitting of polynomial curves (Rogers and Fog, 1989), EFA offers a number of methodological advantages. A notable advantage is that EFA does not require data points to be equal in number or evenly spaced, allowing more closely spaced data points on segments of high curvature and artefact complexity (Rohlf and Archie, 1984; Crampton, 2007). As such, EFA is now commonplace in the statistical analysis of archaeological stone tool shapes (e.g., Saragusti et al., 2005; Iovita, 2009; Cardillo, 2010; Iovita et al., 2017; Serwatka, 2015). For more information on the fundamentals and mathematical framework underpinning EFA, please refer to Caple et al. (2017).
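To make the decomposition concrete, the sketch below computes elliptic Fourier coefficients for a piecewise-linear closed outline using the Kuhl and Giardina (1982) formulation. It is a Python illustration only: the analyses reported here were run in R with Momocs, and the function name `efa_coefficients` and the default of 10 harmonics are our assumptions. It also shows why unevenly spaced points pose no problem, since each segment enters the sums weighted by its own arc length. The normalisation step for rotation and starting point is omitted.

```python
import numpy as np


def efa_coefficients(outline, n_harmonics=10):
    """Elliptic Fourier coefficients of a closed 2D outline (Kuhl-Giardina form).

    outline: (K, 2) array of x, y points; points need not be evenly spaced,
    and the last point is joined back to the first.
    Returns an (n_harmonics, 4) array with rows [a_n, b_n, c_n, d_n].
    """
    pts = np.asarray(outline, dtype=float)
    closed = np.vstack([pts, pts[:1]])            # close the contour
    steps = np.diff(closed, axis=0)               # segment vectors (dx_p, dy_p)
    dt = np.hypot(steps[:, 0], steps[:, 1])       # segment lengths
    steps, dt = steps[dt > 0], dt[dt > 0]         # ignore repeated points
    t = np.concatenate([[0.0], np.cumsum(dt)])    # cumulative arc length t_0..t_K
    perimeter = t[-1]
    dxdt, dydt = steps[:, 0] / dt, steps[:, 1] / dt
    coeffs = np.empty((n_harmonics, 4))
    for n in range(1, n_harmonics + 1):
        scale = perimeter / (2 * n**2 * np.pi**2)
        phi = 2 * n * np.pi * t / perimeter
        dcos = np.cos(phi[1:]) - np.cos(phi[:-1])
        dsin = np.sin(phi[1:]) - np.sin(phi[:-1])
        coeffs[n - 1] = scale * np.array([
            np.sum(dxdt * dcos),  # a_n
            np.sum(dxdt * dsin),  # b_n
            np.sum(dydt * dcos),  # c_n
            np.sum(dydt * dsin),  # d_n
        ])
    return coeffs
```

For a unit circle traced anticlockwise from (1, 0), the first harmonic comes out close to [1, 0, 0, 1] and all higher harmonics close to zero, even when the points are sampled unevenly. Flattening the array gives one feature vector per artefact.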

All illustrations (.png) were first compiled into a single thin-plate spline (.tps) file, a format common in geometric morphometric analyses. This was performed in tpsUtil v.1.69, with Cartesian coordinates and positions for each image created using the ‘Outline object’ function in tpsDig2 v.2.27 (Rohlf, 2015). As the chosen method of analysis does not require outlines to have the same number of points, and in order to capture as much of the original shape as possible, the raw outline was retained. As a result, each tanged point outline comprises on average 1544 Cartesian coordinates. To standardise all outlines prior to EFA, all specimens were centred on a common centroid (0,0) and rescaled by their centroid size (Bonhomme et al., 2017).
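The centring and scaling step can be sketched as follows. This is a minimal Python illustration, not the Momocs routine actually used; the function name is hypothetical, and taking the centroid as the mean of the outline points (rather than the area centroid) is our assumption. With densely and unevenly digitised outlines the two can differ slightly.

```python
import numpy as np


def centre_and_scale(outline):
    """Centre an outline on (0, 0) and divide it by its centroid size.

    Centroid size = square root of the summed squared distances of all
    points from the centroid (here: the mean of the outline points).
    Returns the normalised outline and the original centroid size.
    """
    pts = np.asarray(outline, dtype=float)
    centred = pts - pts.mean(axis=0)
    centroid_size = np.sqrt(np.sum(centred**2))
    return centred / centroid_size, centroid_size
```

After this step, two illustrations of the same shape drawn at different scales or positions yield identical coordinates, so subsequent differences reflect shape alone.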

Normalisation for rotation was unnecessary, as this is incorporated in the subsequent elliptic fitting. A principal component analysis (PCA) was then conducted on the elliptic Fourier coefficients, and the principal component scores were used for agglomerative hierarchical cluster analysis (with the archaeological taxonomic units of the tanged points displayed). All analyses were performed in the R environment (R Core Team, 2017), using Momocs v.1.2.9 (Bonhomme et al., 2014). For visualisation of the dendrogram (Fig. 3), the ggtree v.10.5 package was used (Yu et al., 2016). The .tps file, metadata (in .csv format) and R Markdown files (in .rmd and .html formats), which document the exploratory procedure in this article in detail, can be found on the Open Science Framework (see footnote 1).
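The PCA and clustering steps can be sketched in Python as below, with each row of the input holding one artefact's elliptic Fourier coefficients flattened into a vector. The text does not state the number of harmonics, the number of principal components retained, the distance measure or the linkage criterion; Euclidean distance with average linkage (UPGMA) is our assumption, and all function names are hypothetical. For real data, `scipy.cluster.hierarchy.linkage(..., method="average")` is the standard library route.

```python
import numpy as np


def pca_scores(X, n_components=None):
    """PCA via SVD: principal component scores and explained-variance ratios."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(centred, full_matrices=False)
    scores = U * S                             # equivalent to centred @ Vt.T
    explained = S**2 / np.sum(S**2)
    if n_components is not None:
        scores, explained = scores[:, :n_components], explained[:n_components]
    return scores, explained


def average_linkage(points):
    """Agglomerative clustering, average linkage (UPGMA), Euclidean distances.

    Returns n - 1 merges as (members_a, members_b, height) tuples.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    members = [[i] for i in range(n)]          # objects held by each active slot
    merges = []
    for _ in range(n - 1):
        i, j = sorted(np.unravel_index(np.argmin(D), D.shape))
        merges.append((tuple(members[i]), tuple(members[j]), float(D[i, j])))
        ni, nj = len(members[i]), len(members[j])
        # Lance-Williams update for average linkage: size-weighted mean distance
        merged_row = (ni * D[i] + nj * D[j]) / (ni + nj)
        D[i, :] = merged_row
        D[:, i] = merged_row
        D[i, i] = np.inf
        D[j, :] = np.inf                       # retire slot j
        D[:, j] = np.inf
        members[i] = members[i] + members[j]
        members[j] = []
    return merges


def cut_into_clusters(merges, n, k):
    """Replay the first n - k merges; return a cluster label (0..k-1) per object."""
    labels = np.arange(n)
    for a, b, _ in merges[: n - k]:
        labels[list(b)] = labels[a[0]]
    return np.unique(labels, return_inverse=True)[1]
```

The merge list carries the same information as a dendrogram: the heights record at what distance each branch joins, and cutting the tree at different levels yields nested groupings.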

Fig. 3

An example dendrogram of broadly contemporaneous Final Palaeolithic (15,000–11,000 cal BP) unifacial large tanged points from Europe. All these objects are technologically very similar and are often thought to be historically related. Here, they are analysed by their two-dimensional shape variation using geometric morphometric methods. Colours mark traditional typological labels, which reflect culture-historical divisions of the material into regional units and sub-units (cultures). Such a tree-building analysis reveals cultural taxonomic structure at different levels without a priori idealisation, but it also shows that many traditional units cannot be recovered in this manner. Large numbers of objects can readily be included in the analysis, and the placement of each object can be tracked.

By additionally incorporating dating and stratigraphic information, such dendrograms can be transformed into cladograms and used to infer the cultural evolutionary history of particular communities of practice. The critical result here, however, is that very few of the objects traditionally assigned to different ‘cultures’, defined by region or presumed affinity, cluster consistently (see Fig. 3). This supports recent critiques of this particular artefact class as a valid cultural diagnostic (Kobusiewicz, 2009; Riede, 2017; Serwatka and Riede, 2016) and demonstrates that traditional definitions of archaeological cultures are ripe for reinvestigation using computational methods. Once archaeological groups are defined as operational taxonomic units within a transmission system, they become epistemologically aligned with genetic groups, opening avenues for parallel co-phylogenetic analyses that compare like with like.
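Whether traditional labels "cluster consistently" can also be quantified. One simple option, offered here as our own illustration rather than a measure used in the analysis above, is to cross-tabulate the clusters obtained by cutting the dendrogram against the traditional cultural labels and summarise the table with purity and the adjusted Rand index (ARI). The function names are hypothetical; an ARI near 1 would mean the clusters reproduce the traditional units, and a value near or below 0 would mean agreement no better than chance.

```python
import numpy as np


def contingency(cluster_labels, culture_labels):
    """Cross-tabulate cluster membership (rows) against traditional labels (columns)."""
    _, cl = np.unique(cluster_labels, return_inverse=True)
    _, cu = np.unique(culture_labels, return_inverse=True)
    table = np.zeros((cl.max() + 1, cu.max() + 1), dtype=int)
    np.add.at(table, (cl, cu), 1)
    return table


def purity(table):
    """Share of objects that carry the majority traditional label of their cluster."""
    return table.max(axis=1).sum() / table.sum()


def adjusted_rand_index(table):
    """Chance-corrected agreement between the two partitions (1 = identical)."""
    def pairs(x):
        return x * (x - 1) / 2

    sum_cells = pairs(table).sum()
    sum_rows = pairs(table.sum(axis=1)).sum()
    sum_cols = pairs(table.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / pairs(table.sum())
    maximum = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (maximum - expected)
```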

The promise of a phylogenetic concept of culture

In the above, we have sketched out a roadmap for creating robust operational archaeological units that can, in principle, be meaningfully reconciled with archaeogenetic and, incidentally, palaeoenvironmental datasets (Gamble et al., 2005). No ancient genetic analyses matching our Final Palaeolithic case study are available yet, and we admit that archaeology still has some way to go before such analytical definitions of culture become widespread. It is also critical that, whenever possible, multiple artefact classes are used to build cultural evolutionary trees, as different objects may be responsive to different transmission pathways (related to age, gender, status, or use context). Techniques other than 2D or 3D morphometrics can also be used, for instance technological and attribute analysis; this is particularly relevant for lithic strategies whose products may lack distinctive outlines and thus cannot be captured by EFA, e.g., blade-blank industries. These challenges aside, the similarities and differences between these cultural lineages, and their associated genetic patterns, would be revealing about societal dynamics. Future analyses aiming at a parallel understanding of the diachronic evolution of genetic and cultural frequencies should target sites that may yield aDNA as well as directly associated artefact assemblages, allowing for a rigorous re-analysis of the artefact material.

In sum, we see little value in replacing traditional but problematic cultural nomenclatures with new ones that do not make fuller use of the epistemic potential of the material at hand. Instead, we strongly encourage archaeologists and geneticists together to seize the opportunity provided by parallel revolutions, not just in archaeogenetics and the palaeoenvironmental sciences but also, critically, in computational archaeology, to refurbish archaeological taxonomic approaches more comprehensively. The promise of defining archaeological cultures phylogenetically rests not only in the robustness and transparency of the approach but also in bringing archaeological and genetic approaches methodologically closer. Once, and only once, both genetic clustering and material culture clustering are defined phylogenetically can we explore how to compare the emergent structures in those datasets, and the various drift and selection forces acting to produce them, not just qualitatively but quantitatively; such co-phylogenetic methods are available (Tehrani et al., 2010; Riede, 2009a; Bortolini et al., 2017). Note finally that Marwick and Schmidt (2019) have recently shown how the adoption of new tools is driving substantial scientific advances in archaeology. While the impact of purely natural scientific methods such as palaeogenomics on historical disciplines such as archaeology is beyond doubt (Kristiansen, 2014), Marwick and Schmidt demonstrate specifically how quantitative analytical approaches and code sharing also drive such change.

Our concern voiced here, namely to upgrade our treatment of the archaeological evidence in constructing cultural taxonomies, is, we believe, fully aligned with the intentions of Eisenmann et al. (2018, p. 10), who deliberately offered their contribution as a springboard for “further reflection on the topic of naming conventions in archaeogenetics”. Unlike many other discussions of the perceived and real challenges in bringing archaeology and palaeogenetics together, our comment is directed primarily at our archaeological colleagues. We hope to have shown here that, rather than simply addressing naming conventions, a conceptually and methodologically forward-looking avenue for taking archaeogenetics and archaeology into the future lies in a materialist, population-based re-thinking of the archaeological cultures themselves.

Data sharing

Datasets related to this study are available at the OSF repository at (DOI: 10.17605/OSF.IO/VTDF2): https://osf.io/vtdf2/