Background & Summary

In the Paleolithic, geometric signs are found in parietal art as well as on mobile objects. Most of these signs appear in the period between 100,000 and 10,000 BP, but some examples are known from earlier periods1,2. The Paleolithic is further subdivided into so-called techno-complexes. For instance, the Aurignacian is an Early Upper Paleolithic techno-complex dating to around 43,000 to 30,000 BP3,4,5,6,7,8,9. It roughly corresponds to the time when anatomically modern humans migrated into the Near East and Europe and encountered and lived alongside Neanderthals for several millennia. One of the characteristics of the Aurignacian is the abundant use of osseous material for the production of tools, weapons, ornaments, and artworks (see Fig. 1)10,11,12,13,14,15,16,17,18,19,20,21,22. Many of these objects are decorated with geometric signs.

Fig. 1

Examples of mobile objects with geometric signs from the Aurignacian. 1. Vogelherd, mammoth figurine, ivory, length 5 cm (Lipták © University of Tübingen); 2. Hohlenstein-Stadel, deer-tooth, personal ornament, size 2.9 cm (Dutkiewicz © Landesamt für Denkmalpflege im Regierungspräsidium Stuttgart); 3. Vogelherd, lissoir, bone, size 21 cm (Lipták © University of Tübingen).

Geometric signs are sometimes referred to as abstract motifs, patterns, marks, or jottings in the literature. We use the term “sign” here in line with Peirce’s semiotics23,24, i.e. in the broad sense of a representation of some kind, which is then further subdivided into index, icon, and symbol10,25,26. In this context, the term “abstract” is often used to further underline that the signs are not obviously iconic, that is, they cannot be recognized by modern viewers as figurative depictions. Simple geometric forms such as dots, lines, and crosses, as well as more complex patterns such as grids or overlapping crosses, are interpreted in this sense. However, in some cases, even seemingly abstract signs might bear some iconicity. For instance, when applied to animal figurines, simple dots might reflect patterns on fur which were discernible for the Paleolithic viewer. With this caveat in mind, we here choose to use the attribute “geometric” rather than “abstract”. We thus merely refer to the visual property of signs, without further delving into the issue of whether particular signs are to be seen as indices, icons, or symbols.

There are several studies investigating geometric signs in parietal art27,28,29,30. However, studies scrutinizing signs on mobile objects, such as figurines, tools, or personal ornaments, are rare and mostly limited to either single objects, or to particular assemblages10,31,32,33,34,35,36. SignBase aims to provide extensive data on mobile objects and the geometric signs found on these. A large body of data from the Swabian Aurignacian is available from previous collection efforts10,31,32, involving first-hand analyses by the first author. We complement this data with other Aurignacian assemblages available via the literature3,22,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55. We thereby enable quantitative analyses of this rich material.

Early graphical expressions, such as found on objects from the MSA site Blombos in South Africa, have sometimes been interpreted as symbolism and associated with the cognitive modernity of humans, in some cases even with the presence of “fully syntactic language”56,57. However, a recent study performing experiments with modern-day participants comes to the conclusion that these engravings – while reflecting “socially transmitted cultural traditions” – bear no clear indication of symbolism36. Systematic studies of geometric signs beyond the earliest Paleolithic finds in South Africa will further help to establish their semiotic status. This will also give us a better understanding of their relation to later symbolic behavior such as early writing systems.

Apart from insights into the evolution of human cognition, the geographic distribution of geometric signs – and the associated archaeological cultures – is a second major line of research. Geographic analyses promise to shed light on cultural developments and population turn-overs across the Late Pleistocene, as inferred by the archaeological and human fossil record58,59,60,61. As an example of practical applications in this direction, we give a preliminary clustering analysis of the sign types found across Europe in the Aurignacian.

Methods

Objects

Decorated mobile objects are mostly (though not exclusively) made from osseous material, like ivory, bone, or antler, and usually come from stratified archaeological contexts. In contrast to the chronological difficulties of dating parietal art62,63,64, mobile objects are usually well-dated, at least with reference to the given techno-complex. The data of SignBase is structured according to these archaeological objects. Every artifact that carries geometric signs is assigned an object identifier consisting of a three-letter abbreviation of the excavation site (e.g. Vogelherd Cave: vhc) and a running, four-digit number. The identifier is linked to detailed information about the object. This is firstly the techno-complex, geographic information, stratigraphical unit (layer), and the dating method(s). The year of excavation is also indicated, since very old excavations may lack the relevant information, or it might be inadequate. Secondly, we give information about the object itself: the material, the type of object (object type), the dimensions (length, width, depth) as well as the state of preservation (complete, almost complete, and fragmented). This is followed by a short description of the object, as well as the relevant literature, such as excavation reports. The data file containing all objects and respective information is described in more detail in the section on Data Records. A website displaying this information per object is available online at www.signbase.org.
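The identifier scheme described above can be illustrated with a minimal Python sketch. It is not part of SignBase; the function name `parse_object_id` and the regular expression are assumptions derived only from the stated format (three lower-case letters followed by a four-digit running number).

```python
import re

# Assumed pattern: three lower-case ASCII letters, then exactly four digits.
OBJECT_ID_PATTERN = re.compile(r"([a-z]{3})([0-9]{4})")


def parse_object_id(object_id):
    """Split an identifier such as 'vhc0159' into (site_code, running_number)."""
    match = OBJECT_ID_PATTERN.fullmatch(object_id)
    if match is None:
        raise ValueError(f"not a valid object identifier: {object_id!r}")
    site_code, number = match.groups()
    return site_code, int(number)


site_code, number = parse_object_id("vhc0159")
print(site_code, number)
```

Splitting the identifier this way makes it straightforward to group objects by excavation site without consulting the site name column.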

So far, 531 mobile objects carrying geometric signs from 65 Aurignacian sites in Europe and the Near East are registered in SignBase. The locations of excavation sites and individual artifacts can be seen in Fig. 2. The majority of sites that yield artifacts carrying geometric signs of the Aurignacian derive from four main areas: South-West France (in particular the Dordogne), the Swabian Jura in southern Germany, as well as a series of sites in modern-day Belgium, and the Czech Republic. There are further isolated sites in Southern Spain (El Salitre), Sicily (Riparo di Fontana Nuova), Israel (Hayonim cave), Iraq (Shanidar cave), and by the Black Sea (Muralovka) (Fig. 2a). While the density of sites is highest in the Dordogne, the density of artifacts is highest in the Swabian Jura (in particular the Vogelherd cave with overall around 170 artifacts) (Fig. 2b). Of course, both the distribution of sites and the distribution of objects are influenced by historical factors such as excavation and publication efforts of particular universities and researchers. Note, however, that many Aurignacian sites across Europe have not yielded artifacts with geometric signs22,37. Hence, while the picture can still change as new sites are discovered and new artifacts published, we expect to have uncovered the main tendencies of the Aurignacian.

Fig. 2

Maps of the Aurignacian sites across Europe and the Near East. (a) Each triangle indicates an archaeological site where artifacts carrying geometric signs were found. A density plot is overlaid with high (red) and low (yellow) densities of sites. (b) Zoom into the areas yielding most objects with geometric signs. Individual artifacts are plotted as black dots (with some jitter added to avoid overplotting if many artifacts come from the same site). A density plot is overlaid with high (red) and low (yellow) densities of artifacts (for script see Supplementary File 1).

The majority (n = 450) of decorated mobile objects from the Aurignacian are made from osseous material (see Table 1). Rock materials, such as limestone, flint, or other rock types, were used in 72 cases. In 430 cases, the objects carrying geometric signs are so-called symbolic artifacts (e.g. figurines), while tools account for 72 cases (see Table 2).

Table 1 Overview of the raw materials used for objects with geometric signs from the Aurignacian (n = 531).
Table 2 Overview of the types of Aurignacian objects bearing geometric signs (n = 531).

Sign types

For each object registered in SignBase, we then identify the types of signs represented on it. We define mutually exclusive types, for instance, straight line, oblique line, radial line, notch, dot, cross, and more complex forms like grid, hashtag, and zigzag (see Figs. 3 and 4). Reduced pictograms like vulvae or animal paws are included in this collection as well, since they constitute borderline cases between iconic and geometric. For instance, a paw might be a reduced geometric substitute for the entire animal. Overall, we identify 30 different sign types, while uncertain cases are subsumed under the category “other”. Each of the identified sign types is marked as present/absent by one or zero. In Fig. 4, several examples of Aurignacian objects with different sign types are shown, such as line (aur0001, cas0014, lar0002, gdr0007, gpp0004, gdg0008, bla0014), oblique line (lar0002), notch (aur0001, cas0014, lar0002, gpp0004, msc0005, cel0005), oblique notch (gpp0004), circumferential spiral (cat0002), dot (gdg0003, bla0018, lar0002, bla0014), cross (gdg0008), hatching (gdg0008), and vulva (cel0002). Sometimes several sign types appear on a single object. In future versions of the database, numbers of sign occurrences per object and coding of sign sequences will be included as well.

Fig. 3

Schematic drawings of sign types as identified for the Aurignacian (in brackets as shown in the database): 1. Line (line); 2. Oblique line (obline); 3. Concentric lines (concenline); 4. Dashed line (dashline); 5. Radial line (radline); 6. Circumferential line (circumline); 7. Circumferential spiral (circumspiral); 8. Notch (notch); 9. Oblique notch (obnotch); 10. Radial notch (radnotch); 11. Circumferential notch (circumnotch); 12. Dot (dot); 13. Cupule (cupule); 14. Cross (cross); 15. Rhombus (rhombus); 16. Hashtag (hashtag); 17. Grid (grid); 18. Hatching (hatching); 19. Zigzag (zigzag); 20. Zigzag-row (zigzagrow); 21. Rectangle (rectangle); 22. Maccaroni (maccaroni); 23. V-Sign (v); 24. Pin to the left side (pinleft); 25. Pin to the right side (pinright); 26. Star (star); 27. Vulva-Sign (vulva); 28. Paw-Sign (paw). Not shown in the table: anthropomorph, zoomorph, other.

Fig. 4

Examples of some Aurignacian mobile objects registered in SignBase and the identified sign types (not to scale, for details see database): 1. aur0001: lines, notches; 2. cas0014: lines, notches; 3. cat0002: circumspiral; 4. gdg0003: dot; 5. bla0018: dots; 6. lar0002: line, obline, notch, dot; 7. gdr0007: line; 8. gpp0004: line, notch, obnotch; 9. msc0005: notch; 10. gdg0008: line, cross, hatching; 11. cel0005: notches; 12. cel0002: vulvae; 13. bla0014: dots, lines.

Note that disagreements between codings by different researchers are unavoidable. The description and typology of geometric signs must be based on visual impressions, since we cannot understand the contextual relationships and meanings of such characters from today’s perspective. Any sign typology is hence to some extent subjective and leaves room for discussion. Different sign types can resemble each other, and it is sometimes difficult to identify them with certainty. We openly publish our coding decisions and hope for further input and discussion with other researchers. Furthermore, to estimate the degree of subjectivity in our choices, we have submitted the sign types defined by us to several peers and calculated agreement scores (see the section on Technical Validation).

Frequencies of occurrence

Given our coding decisions, we can assess how often particular sign types occur. Some are frequently found across different sites and objects, while others are rarer or even restricted to a particular object (see Fig. 5). The most frequent sign type is the simple notch (a short incision deeper than a line and in most cases applied on the edge of an object), occurring on almost half of the objects (48%, i.e. 254 of 531), followed by the line (33%), and cross (10%). Dots and cupules (7% and 2% respectively) are less frequent but still well-attested, whereas more elaborate signs such as hashtags, stars, or zigzags are exceedingly rare, and often associated with particular objects and sites. For example, clear instances of star-shaped signs (i.e. more than two lines crossing in a center point) are currently only attested on a figurine from the Vogelherd cave (vhc0159), and an engraved ivory blade from the Grotte de La Princesse Pauline in Belgium (gpp0003).

Fig. 5

Frequencies of 31 different sign types (including “other”) across the 531 Aurignacian objects (for script see Supplementary File 2).

Geographic clusters

Apart from differences in frequencies of occurrence, sign types also differ regarding their geographic spread in the Aurignacian. While the most frequent sign types are spread widely across Europe and the Near East, others are geographically more confined (see Fig. 6). The notch, for instance, is ubiquitous. It is found in some of the most westward (e.g. La Viña in Spain) and eastward sites (e.g. Hayonim Cave in Israel), as well as in the southernmost (Sicily) and northernmost (Belgium) sites in Europe. Others, such as crosses, hatchings, and dots, center around areas of high artifact density such as southwestern France, southern Germany, Belgium, and the Czech Republic. As an extreme example, abstract depictions of vulvae, while being attested a considerable number of times (i.e. on 20 of 531 objects), are strictly limited to caves in the Dordogne, potentially indicating a local practice of graphic expression3,37,42,45,50,53,55.

Fig. 6

Maps for the presence/absence of particular sign types. Black triangles indicate archaeological sites where artifacts carrying geometric signs were found. Red triangles indicate the presence of a particular sign type (for script see Supplementary File 3).

Automated analyses

Beyond visual inspection of geographic maps, we here propose a more systematic way of scrutinizing clusters of artifacts and sign types. This also helps to better understand the type of data represented in SignBase. Take the following examples of binary sign type vectors for two objects from the Hohle Fels cave in the Swabian Jura (hfc0006, hfc0015) and one object from Spy in Belgium (spy0023).

hfc0006: 110100000101000000001000000000

S_hfc0006 = {line, oblique line, dashed line, dot, cross, v-shape}

hfc0015: 111000000100001000000000001000

S_hfc0015 = {line, oblique line, radial line, dot, hatching, concentric line}

spy0023: 100000000001000010001000000000

S_spy0023 = {line, cross, zigzag row, v-shape}

These vectors have 30 binary values – the value for “other” is discarded here, which leaves us with 516 objects (i.e. 15 objects have only sign type “other”). The values reflect whether a particular sign type is present (1) or absent (0). An equivalent representation is to give the set of sign types present on an object. These sets are displayed below the binary vectors. The Jaccard distance65 between any given two sets A and B is then calculated as
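The equivalence between a binary vector and a set can be sketched in a few lines of Python. This is an illustration only, not the authors' R code. Because the text does not give the column order of the sign types, the helper `present_positions` (a hypothetical name) returns 1-based positions rather than sign type names.

```python
def present_positions(bits):
    """Return the 1-based positions that hold a '1' in a presence/absence string."""
    return {i for i, bit in enumerate(bits, start=1) if bit == "1"}


# The three example vectors quoted in the text.
vectors = {
    "hfc0006": "110100000101000000001000000000",
    "hfc0015": "111000000100001000000000001000",
    "spy0023": "100000000001000010001000000000",
}

for object_id, bits in vectors.items():
    positions = present_positions(bits)
    # The set size equals the number of sign types present on the object.
    print(object_id, len(positions), sorted(positions))
```

The set sizes agree with the sets listed under the vectors: six sign types each for the two Hohle Fels objects and four for the object from Spy.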

$${d}_{Jaccard}=1-\frac{\left|A\cap B\right|}{\left|A\cup B\right|},$$

where the numerator is the cardinality of the intersection of the two sets, i.e. the number of shared sign types, and the denominator is the cardinality of the union of the two sets, i.e. the overall number of different sign types occurring on both objects together. Thus, the Jaccard distance between hfc0006 and hfc0015 is

$${d}_{Jaccard}^{hfc0006,hfc0015}=1-\frac{3}{9}\approx 0.67.$$

The Jaccard distance between hfc0006 and spy0023 is

$${d}_{Jaccard}^{hfc0006,spy0023}=1-\frac{3}{7}\approx 0.57,$$

and for hfc0015 and spy0023 we have

$${d}_{Jaccard}^{hfc0015,spy0023}=1-\frac{1}{9}\approx 0.89.$$

Note that while hfc0006 shares three sign types with both hfc0015 and spy0023, the Jaccard distance measure “penalizes” the fact that there are overall more sign types occurring in hfc0006 and hfc0015 together (nine), compared to hfc0006 and spy0023 together (seven). Thus, the Jaccard distance is higher for the former. The rationale behind this is that if there are many different types occurring in two vectors, then it is more likely that the same types occur in both by chance.

Based on pairwise Jaccard distances, we create a distance matrix for these three objects:

$${D}_{Jaccard}=\left(\begin{array}{ccc}0 & 0.67 & 0.57\\ 0.67 & 0 & 0.89\\ 0.57 & 0.89 & 0\end{array}\right).$$
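The matrix can be reproduced from the three sign type sets with a short Python sketch. It is a minimal illustration under the definitions given here, not the authors' R implementation (see Supplementary Files 4 and 5 for that); the function name `jaccard_distance` is an assumption.

```python
def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two sets, at least one of them non-empty."""
    union = a | b
    if not union:
        raise ValueError("Jaccard distance is undefined for two empty sets")
    return 1 - len(a & b) / len(union)


# Sign type sets of the three example objects, as listed in the text.
sign_sets = {
    "hfc0006": {"line", "oblique line", "dashed line", "dot", "cross", "v-shape"},
    "hfc0015": {"line", "oblique line", "radial line", "dot", "hatching",
                "concentric line"},
    "spy0023": {"line", "cross", "zigzag row", "v-shape"},
}

ids = list(sign_sets)
# Symmetric matrix of pairwise distances, rounded to two decimals as in the text.
matrix = [
    [round(jaccard_distance(sign_sets[r], sign_sets[c]), 2) for c in ids]
    for r in ids
]
for object_id, row in zip(ids, matrix):
    print(object_id, row)
```

Rounded to two decimals, the rows are (0, 0.67, 0.57), (0.67, 0, 0.89), and (0.57, 0.89, 0).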

We can then use this distance matrix for cluster analysis.

Building a UPGMA tree

We here choose the so-called Unweighted Pair Group Method with Arithmetic mean (UPGMA) to create a clustering tree. UPGMA is an agglomerative bottom-up clustering method66. In the beginning, each “leaf” (object in our case) constitutes its own cluster. Given a distance matrix between clusters, the two clusters with the smallest distance are merged to yield a new cluster. The average distance of this cluster to all other clusters is computed and compared to distances between the other clusters to decide the next merger. The UPGMA algorithm thus successively merges clusters, until only one overall cluster (i.e. the final tree) is formed. Importantly, we here merely use this method to visualize the clustering of objects based on the Jaccard distances of their sign type presences. We do not claim that this clustering reflects actual evolutionary relationships between objects and their sign types. For further details on calculating Jaccard distances and building UPGMA trees, see the R code in Supplementary Files 4 and 5.

Given the Jaccard distance matrix of our three example objects, the UPGMA method yields the tree in Fig. 7. In this simple example, the object from Belgium (spy0023) is first merged with one of the objects from Hohle Fels in Germany (hfc0006), since they have the lowest distance in terms of sign type presence (0.57). The second object from Hohle Fels (hfc0015) is then joined with them in the second step, yielding the overall tree.
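The two merging steps can be traced with a small pure-Python sketch of average-linkage clustering. It is meant only to illustrate the procedure on the rounded three-object matrix; it is not the authors' R code, and the function name `upgma` and its return format are assumptions.

```python
def upgma(labels, dist):
    """Average-linkage (UPGMA) clustering on a symmetric distance matrix.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of leaf labels.
    """
    clusters = [(label,) for label in labels]
    index = {label: i for i, label in enumerate(labels)}

    def average_distance(a, b):
        # Unweighted mean over all leaf pairs between the two clusters.
        total = sum(dist[index[x]][index[y]] for x in a for y in b)
        return total / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest average distance.
        d, i, j = min(
            (average_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


labels = ["hfc0006", "hfc0015", "spy0023"]
dist = [
    [0.00, 0.67, 0.57],
    [0.67, 0.00, 0.89],
    [0.57, 0.89, 0.00],
]
merges = upgma(labels, dist)
for a, b, d in merges:
    print(a, b, round(d, 2))
```

With the rounded matrix, the sketch first joins hfc0006 and spy0023 at a distance of 0.57 and then attaches hfc0015 at the average distance (0.67 + 0.89) / 2 = 0.78, which matches the merge order described above.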

Fig. 7

Example of a UPGMA tree for three objects. The object identifiers are given on the tips of branches. Colors indicate the country of provenience. A heatmap is given on the right of the tree with black indicating the presence of a sign type, and white indicating its absence for a given object. Only the sign types which occur in at least one of the three objects are considered here (for script see Supplementary File 4).

The same method is applied to all 516 objects (excluding objects which are coded as displaying only the sign type “other”) and their Jaccard distances to generate the UPGMA tree in Fig. 8. This gives a general impression of how objects from sites in different countries cluster together based on the presence/absence of particular sign types.

Fig. 8

UPGMA tree for 516 Aurignacian objects and sign types. This tree is based on Jaccard distances of sign type presences/absences between pairs of objects. Only some sign types are represented in the presence/absence heatmap around the tree tips (line, notch, vulva, cross). With all sign types included, the plot would be too crowded (for code see Supplementary File 5).

The notch, for instance, is not only widespread (as was pointed out with reference to Fig. 6), but it is also often the only sign type present on objects. This is reflected in the large cluster spanning the lower right quarter of the circular tree in Fig. 8. The objects in this cluster come from a wide range of sites – reflected by the different colors representing modern-day countries. A similar picture emerges for the simple line (mainly upper and lower right quarter of the plot). It is also widely represented across sites, and often the only sign type present on an object. For geometric vulva representations, we find the opposite. They are exclusively found in France (Dordogne), and the objects carrying them mainly cluster together (upper right corner of the plot). There are only three objects carrying vulvae that diverge from this cluster since they carry other sign types (e.g. lines and notches) as well. The cross, on the other hand, is represented in various clusters. There is the main cluster of crosses containing mainly objects from Vogelherd Cave (vhc, upper right corner), but there are also other smaller clusters involving objects from France and Belgium.

However, we do not delve further here into the issues of statistical analysis, hypothesis testing, and interpretation of clusters. These are topics for future studies using SignBase.

Data Records

The data sets used for analyses in this article are available at figshare67: Dutkiewicz, Russo, Lee, & Bentz. SignBase: collection and analysis of geometric signs on mobile objects in the Paleolithic. figshare https://doi.org/10.6084/m9.figshare.c.4898643 (2020).

The names of data set files are:

- signBase_exampleObjects.csv (figshare title: “Example Objects”)

- signBase_Version1.0.csv (figshare title: “SignBase Main Data File”)

- TestCoding_000.csv (figshare title: “Test Coding (Coder 0)”)

- TestCoding_001.csv (figshare title: “Test Coding (Coder 1)”)

- TestCoding_002.csv (figshare title: “Test Coding (Coder 2)”)

- TestCoding_003.csv (figshare title: “Test Coding (Coder 3)”)

- TestCoding_004.csv (figshare title: “Test Coding (Coder 4)”)

The main data of the current version of SignBase is given in signBase_Version1.0.csv. As the data set is going to grow, newer versions will be available via www.signbase.org. The column names are given in parentheses and described in the following:

  • Microsoft Access ID (access_id): This is an internal ID used by the Microsoft Access database that the online version of SignBase is based on. For the general public, this ID is not important.

  • Object identifier (object_id): The object ID is created by a three-letter (lower case) abbreviation of the site name, followed by a four-digit running number, e.g. for objects from La Ferrassie: laf0001, laf0002, etc.

  • Techno-complex (techno_complex): Entities in prehistory are based on material cultures. An archaeological culture is a recurring assemblage of artifacts from a specific time and place that may constitute the material culture remains of a past human society. For the Paleolithic, there are mostly artifacts made from lithic or organic material. Characteristic artifact types or assemblages of artifacts define the entity, here referred to as techno-complex. This might be, for instance, the Aurignacian, the Gravettian, or the Magdalenian in Europe, or the Middle Stone Age or the Later Stone Age in Africa.

  • Site and location (site_name, location, country, longitude, latitude): All the objects in SignBase are finds from archaeological excavations. These may be caves or open-air sites. The archaeological site (site_name) is indicated, as well as the closest community or town (location) and the country. For an exact location, the latitude and longitude are also given.

  • Layer (layer): In archaeological excavations, objects are found in units of sediments, which are called stratigraphical units or layers. These units are usually defined by the archaeologists during excavation and give information about the provenience of the object and its assignment to a particular techno-complex.

  • Dating method (dating_method): This describes the method that has been used for dating the find. For absolute dating in Prehistory, mainly radiocarbon dating is used (C14). Other often-used methods are Accelerator Mass Spectrometry (AMS), Thermoluminescence (TL), optically stimulated luminescence (OSL), or Uranium–thorium dating.

  • Date (date_max-min): The absolute dating (uncalibrated before present/BP) of the object or layer it derives from as given in the literature.

  • Excavation year (excavation_year): The year the object was excavated.

  • Material (material): The raw material of the object. For the Paleolithic, mostly osseous material, like ivory, bone, or antler is used. Other organic materials, like shells of mollusks, eggshells, or teeth, but also inorganic materials like rocks, pigments, or ceramics might appear. If the material has not been determined with certainty, this feature takes the value undetermined.

  • Type of object (object_type): Typological determination of the object, usually as indicated in the literature, or revised if needed. This might be a tool, a personal ornament, an art figurine, or any other type of archaeological object. If the type of object has not been determined with certainty, this feature takes the value undetermined.

  • Length, width, depth (length_mm, width_mm, depth_mm): Gives the dimensions of the object in millimeters. Usually as indicated in the literature.

  • Preservation (preservation): The preservation state of the object. Complete – the whole object is preserved; almost complete – the whole form of the object is preserved, with some damage; fragmented – only partly preserved, the original dimension and shape of the object are not preserved. If the preservation has not been determined with certainty, this feature takes the value undetermined.

  • Short description (short_description): This is a very brief (mostly one sentence) impressionistic description of the type of object and – in some cases – the respective signs represented on it. This mostly follows the description authors use in the original articles publishing the object.

  • General literature (general_literature): Literature references about the object, its provenience, excavation report, or dating of the object/stratigraphical unit.
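As a rough guide to working with these columns, the following Python sketch reads a CSV excerpt and groups objects by site. Only the column names are taken from the description above. The rows are invented placeholders and do not correspond to any SignBase record, and the helper name `load_objects` is an assumption. The real file may contain further columns and formatting that this sketch does not handle.

```python
import csv
import io

# Placeholder excerpt standing in for signBase_Version1.0.csv.
# Column names follow the text; every value in the rows is invented.
SAMPLE = """object_id,site_name,country,material,length_mm
xxx0001,Example Site A,Exampleland,bone,52.0
xxx0002,Example Site A,Exampleland,undetermined,
yyy0001,Example Site B,Exampleland,ivory,31.5
"""


def load_objects(handle):
    """Read object rows and convert length_mm to float (None when empty)."""
    objects = []
    for row in csv.DictReader(handle):
        row["length_mm"] = float(row["length_mm"]) if row["length_mm"] else None
        objects.append(row)
    return objects


objects = load_objects(io.StringIO(SAMPLE))

# Group object identifiers by excavation site.
per_site = {}
for obj in objects:
    per_site.setdefault(obj["site_name"], []).append(obj["object_id"])
print(per_site)
```

To read the published file instead, the `io.StringIO` object would be replaced by an opened file handle, e.g. `open("signBase_Version1.0.csv", newline="")`.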

Technical Validation

Establishing a sign type and its presence/absence on a particular object is a non-trivial task. Decisions are based on subjective judgment. We hence expect some disagreement between our “expert” coding and the potential coding of other researchers in the field. To get a first impression of the degree of subjectivity in our coding decisions, we have submitted 30 randomly chosen objects (of the 531 overall objects) to four colleagues at the University of Tübingen, who are familiar with the archaeological material. Familiarity with the objects and the respective literature is necessary since judging surface patterns is not possible without an understanding of their characteristic texture. This is particularly important given that coding decisions have to be taken based on pictures (photographs and/or drawings, in some cases of low resolution). Disturbances of the material due to natural processes (cracks, holes, bite marks, etc.), as well as functionally motivated adaptations, can be easily misinterpreted as intentional geometric signs by a layperson.

Having said this, we did not explicitly teach the four test coders how to determine sign type presence. We merely provided them with the same interface used by the SignBase team to make coding decisions. In this interface, a picture of the respective object, alongside the meta-information described above, is provided. Furthermore, there are 31 predefined sign types to choose from, including a category “other”, in case none seem to fit. Test coders were instructed to choose the sign types they identify on any given object. While several different sign types can occur on a single object, each sign (i.e. pattern on the object’s surface) should only be associated with one sign type. For example, if a line is visible, the coder must decide whether it is a regular (i.e. straight) line or an oblique line. The same pattern cannot be coded as both.

Given the original expert coding by the SignBase team (Test Coding (Coder 0)), and the coding by four test coders (Test Coding (Coder 1), Test Coding (Coder 2), Test Coding (Coder 3), Test Coding (Coder 4)), we firstly calculate the so-called joint-probability agreement, i.e. the percentage overlap in coding decisions. Secondly, we calculate Cohen’s Kappa68. This is a more conservative metric of agreement devised to consider that for any given number of coding decisions, coders might agree just by chance. For example, if coders just choose uniformly at random between presence and absence for all the sign types, the expected agreement of coding decisions between two coders is already 50%. The Kappa metric takes this chance agreement as a baseline. A Kappa value of 0 indicates that there is no coder agreement beyond that predicted by chance, while a Kappa of 1 indicates that there is perfect coder agreement. Additionally, the R package we use to calculate Kappa also provides a p-value. If this p-value is <0.05 for a given Kappa value, then the Kappa value is significantly greater than 0. In Fig. 9, the joint-probability agreement (Agree.), as well as Kappa values for pairwise comparisons between the original SignBase coding and the four test coders, is illustrated.
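The difference between the two metrics can be made concrete with a short Python sketch. It uses the standard two-coder definition of Cohen’s Kappa, (p_o − p_e) / (1 − p_e), and is not the R code of Supplementary File 6. The two coding vectors are invented toy data, not taken from the TestCoding files, and the function names are assumptions.

```python
def joint_agreement(a, b):
    """Share of coding decisions on which two coders agree."""
    if len(a) != len(b) or not a:
        raise ValueError("codings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's Kappa for two equally long lists of 0/1 coding decisions."""
    n = len(a)
    p_observed = joint_agreement(a, b)
    # Chance agreement, based on each coder's own rate of coding "present".
    p_a, p_b = sum(a) / n, sum(b) / n
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    # Undefined (division by zero) if both coders always give the same single value.
    return (p_observed - p_expected) / (1 - p_expected)


# Invented toy codings: 20 decisions, with presences being rare for both coders.
coder_0 = [1, 1] + [0] * 18
coder_1 = [1, 0, 1] + [0] * 17

agreement = joint_agreement(coder_0, coder_1)
kappa = cohens_kappa(coder_0, coder_1)
print(round(agreement, 2), round(kappa, 2))
```

In this toy case the coders agree on 18 of 20 decisions (90%), yet Kappa is only about 0.44. When most sign types are absent on most objects, two coders agree on many absences by chance alone, so a high joint-probability agreement can coexist with a moderate Kappa.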

Fig. 9

Evaluation of coding. 30 randomly selected objects with object identifiers given on the x-axis. In the left panel, the presence of a given type (y-axis) is given in black, grey tiles indicate the absence of the type according to the original coding by the SignBase team. The plots to the right illustrate the agreement with test coders. Green indicates coding agreement, red indicates disagreement. Percentages of agreeing tiles, as well as Cohen’s Kappas, are given in the lower right corner (for code see Supplementary File 6).

The joint-probability agreement ranges from 91% to 94% between our original coding and the coding by test coders. Cohen’s Kappa ranges from 0.29 to 0.44. While there is hence high coding agreement according to the joint-probability measure, the Kappa values indicate that the agreement between coders is rather moderate. However, given that coders were not explicitly instructed on how to make their decisions, even medium-range Kappa values are encouraging. The p-values for all Kappas are <0.05 (in fact, they are output as 0), meaning that there is clearly more agreement between coders than expected by chance. The R code with further explanations is given in Supplementary File 6.