Enclaves of genetic diversity resisted Inca impacts on population history

The Inca Empire is claimed to have driven massive population movements in western South America, and to have spread Quechua, the most widely-spoken language family of the indigenous Americas. A test-case is the Chachapoyas region of northern Peru, reported as a focal point of Inca population displacements. Chachapoyas also spans the environmental, cultural and demographic divides between Amazonia and the Andes, and stands along the lowest-altitude corridor from the rainforest to the Pacific coast. Following a sampling strategy informed by linguistic data, we collected 119 samples, analysed for full mtDNA genomes and Y-chromosome STRs. We report a high indigenous component, which stands apart from the network of intense genetic exchange in the core central zone of Andean civilization, and is also distinct from neighbouring populations. This unique genetic profile challenges the routine assumption of large-scale population relocations by the Incas. Furthermore, speakers of Chachapoyas Quechua are found to share no particular genetic similarity or gene-flow with Quechua speakers elsewhere, suggesting that here the language spread primarily by cultural diffusion, not migration. Our results demonstrate how population genetics, when fully guided by the archaeological, historical and linguistic records, can inform multiple disciplines within anthropology.

include the Quechua spoken (now by just the last few elderly speakers) in small communities around Chachapoyas. Classed with it in the QIIb clade is the Quechua spoken in another of our sample locations, the town of Lamas, near the city of Tarapoto, ~300 km east of Chachapoyas in the Amazonian lowlands (here represented by the genetic sample from the indigenous neighbourhood known as Wayku). Also classed within QIIb is the 'Inga' spoken in one pocket of southern Colombia, and the much more widely spoken Ecuadoran 'Kichwa', in both the Andean highlands and Amazonian lowlands of Ecuador.
Given the multiple possible explanations for the presence of Quechua in any one region, it is still not fully understood exactly when the language reached Chachapoyas and Lamas. There are at least no strong archaeological candidates for significant enough impacts from any Quechua-speaking region before the Inca period, but that does not exclude Spanish colonial impacts. Nor is there consensus on the extent to which the characteristics of Chachapoyas Quechua reflect those that already defined a clear QIIb branch at the time, or arose independently in Chachapoyas by a linguistic substrate effect.
It is relevant also that in northern Peru, Quechua never seems to have been dominantly
To classify certain forms of Quechua all within a putative QIIb branch entails, implicitly, a historical hypothesis: that there once existed a Proto-QIIb language, spoken by some specific, single population, and that all QIIb languages derive from that single source. An alternative hypothesis is possible, however. Indeed there are linguistic doubts as to whether the supposed QIIb varieties actually form a valid clade at all 21,22. QIIb is defined on very few linguistic criteria, of questionable validity. Cerrón-Palomino (20: 239) mentions just two features shared in all QIIb varieties, of which the second is in any case not unique to QIIb, but found also in some varieties classed as QIIa. The first criterion is that the pronunciation distinction between Quechua /k/ and /q/ consonants is lost, since the latter also becomes /k/: so /qiru/ 'wood' and /kiru/ 'tooth' become indistinguishable, as both /kiru/. The second criterion is that voiceless stop consonants turn into their voiced counterparts (not native to Quechua) when they follow a nasal consonant /m/, /n/ or /ɲ/: so /inti/ 'sun' becomes /indi/, /inka/ becomes /inga/, and so on.
Both of these changes are also highly natural, however, and open to arising independently when Quechua is learnt by speakers of other native languages. Relatively few languages make a /k/~/q/ distinction, whereas many, unlike original Quechua, do have voiced stop consonants, particularly natural after nasals. So the changes found shared across those varieties may simply reflect similar contexts, of Quechua spreading culturally, by being learnt by multiple populations in situ, rather than demographically, brought by an incoming population from a single Proto-QIIb source that may never actually have existed. Genetic data from Quechua-speaking populations offer a means to test between these language hypotheses.
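The two sound changes cited as QIIb criteria can be written as simple rewrite rules. The following is a minimal illustrative sketch, not part of the study: the function name `apply_qiib_changes`, the restriction to the stops /p/, /t/, /k/, and the ordering of the two rules (merger first, then voicing) are all assumptions made for the example.

```python
import re

# Voiceless stops and their voiced counterparts.
# ASSUMPTION: only /p/, /t/, /k/ are handled; affricates are ignored.
VOICING = {"p": "b", "t": "d", "k": "g"}
NASALS = "mnɲ"


def apply_qiib_changes(form: str) -> str:
    """Apply the two changes described in the text to a phonemic form."""
    # Change 1: the /k/ vs /q/ distinction is lost, /q/ becoming /k/.
    merged = form.replace("q", "k")
    # Change 2: a voiceless stop becomes voiced directly after a nasal.
    # ASSUMPTION: this rule is applied after the merger.
    return re.sub(
        rf"(?<=[{NASALS}])[ptk]",
        lambda match: VOICING[match.group(0)],
        merged,
    )


# The example forms are those given in the text.
for word in ["qiru", "kiru", "inti", "inka"]:
    print(word, "->", apply_qiib_changes(word))
```

With the forms from the text, /qiru/ and /kiru/ both come out as /kiru/, while /inti/ and /inka/ come out as /indi/ and /inga/. Rules this short are easy to arrive at more than once, which is the point of the independent-origin argument.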

2.a. Geographical Grouping of Samples
A fieldwork expedition was carried out in the provinces of Amazonas and San Martín, collecting samples from various locations, grouped into six geographically coherent subregions for the purposes of our analyses. One of these groups, our only sample from the province of San Martín, was Wayku, a neighbourhood of the town of Lamas where a variety of Quechua is still spoken, traditionally classified as a separate sub-branch of QIIb to that of Chachapoyas. Within Amazonas province, meanwhile, samples were assigned to one of five groups, by ancestry from Chachapoyas city, Huancas, Luya, La Jalca, or Utcubamba South.

• The Chachapoyas city group covers the city itself, and villages immediately to its north.
• Huancas is a village also close to Chachapoyas, but distinguished from it in our analyses in order to explore the village's putative ancestry in populations who migrated from central Peru (including the modern city of Huancayo) before the Spanish conquest 23.
• The Luya group covers villages in the province of Luya, to the west of the Utcubamba.
• The La Jalca group is centred on the town of the same name, to the east of the Utcubamba river.
• The Utcubamba South group covers villages close to the towns of Tingo and Yerbabuena, and further south along the Utcubamba.

2.b. Sampling Policy: Surname Analysis
Sampling was conducted specifically to target individuals with local Chacha surnames, with the aim of reaching descendants of local populations rather than recent migrants there. These surnames were identified from the list first reported by Zevallos Quiñones 14 Table 1). The latter are of course not included in the Y-chromosome analysis, and the numbers reported therefore differ slightly from the Y-chromosome dataset. In Chachapoyas province we identified 22 individuals with surnames of possible Chacha origin, making up 21% of the sample. Most of these surnames were already present in the report by Taylor 13 .
The individuals are distributed as follows: 1 individual in Chachapoyas city, 1 in Huancas, 5 in La Jalca, 12 in Luya, 3 in Utcubamba South. Thirty-one surnames were identified as typically Quechua, making up 30% of the sample. In the province of San Martín, 11 out of 16 individuals had typical local surnames, but ultimately of putative Amazonian linguistic origin.
For both provinces, all remaining individuals had surnames of Spanish origin. The list of individual surnames is not provided, to guarantee anonymity.
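The surname proportions above can be cross-checked with a few lines of arithmetic. This is only a sketch under one loudly stated assumption: the text does not give the denominator for the Chachapoyas-side percentages, so it is taken here to be the 119 collected samples minus the 16 individuals from San Martín. The variable names are illustrative.

```python
# Counts taken from the text.
chacha_by_group = {
    "Chachapoyas city": 1,
    "Huancas": 1,
    "La Jalca": 5,
    "Luya": 12,
    "Utcubamba South": 3,
}
quechua_surnames = 31
san_martin_local, san_martin_n = 11, 16

# ASSUMPTION (not stated in the text): the denominator is the 119
# collected samples minus the 16 individuals from San Martín.
total_samples = 119
amazonas_n = total_samples - san_martin_n


def percent(count: int, total: int) -> int:
    """Return count/total as a whole-number percentage."""
    return round(100 * count / total)


chacha_surnames = sum(chacha_by_group.values())
# Derived under the same assumption: all remaining surnames are Spanish.
spanish_surnames = amazonas_n - chacha_surnames - quechua_surnames

print("Chacha:", chacha_surnames, f"{percent(chacha_surnames, amazonas_n)}%")
print("Quechua:", quechua_surnames, f"{percent(quechua_surnames, amazonas_n)}%")
print("Spanish (derived):", spanish_surnames)
print("San Martín local:", f"{percent(san_martin_local, san_martin_n)}%")
```

Under that assumption the per-group counts sum to the 22 individuals reported, and the 21% and 30% figures are reproduced. The Spanish-surname count is derived rather than reported, and would change with a different denominator.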
The surname origin proportions do not correspond exactly to those of Native American vs. European genetic components, however, at least for the uniparental markers considered: 98% of the maternal and 72% of the paternal lineages belong to a Native American haplogroup.
The highest Native American ancestry is found in Luya, where it reaches 80% in the paternal line.
We researched the regional archives from 1560 to 1700 and found no fixed practice in surname inheritance. Surnames could be received from either the father or mother, or from neither, although in mixed native/Spanish families, only the Spanish surname was retained. In line with this inconsistent inheritance, the Chacha or Quechua surnames in our samples do not correspond to specific Y-chromosome branches in the network (Supplementary Fig. S5).
Nevertheless, Chacha and Quechua surnames are found in branches close to each other, and often correspond to individuals from Quechua-speaking families, particularly in Luya and La Jalca. There is a signal of local ancestry, then, but it conflates the Chachapoya and Inca/Quechua components. Indeed, it is possible that when the Spanish colonial administration first established population registers and required surnames to be assigned, the local indigenous population simply chose their preferred surnames, whether Chacha, Quechua or even Spanish.

2.c. Sampling Policy: Quechua Speakers
We conducted a linguistic survey to assess whether Quechua was (or still is) spoken in each participant's family. In Wayku, Quechua is still actively spoken, although declining in usage among children. All participants from Wayku either speak Quechua themselves, or have a family member who does. As for the Chachapoyas region, we assume that any Quechua reported among our participants is that characteristic of Chachapoyas (traditionally also classified as part of the putative Quechua IIb clade, but a separate sub-branch of it to Lamas Quechua). On some occasions we were able to confirm this by hearing the participant (or a member of his/her family) speaking, but cannot in all cases rule out that family members may in fact have spoken other varieties of Quechua, learnt elsewhere in Peru, rather than Chachapoyas Quechua.
Within Chachapoyas, the area where Quechua is most present, spoken either by some

3.a. Haplogroup composition
Haplogroup composition varies between our geographical groups (Supplementary Table S2).
For these haplogroups too, we find previously undescribed lineages. We now detail how each of these four haplogroups patterns in our data-set.
• Haplogroup A2 is strongly represented in North and Meso-America 27,28. Some lineages in our sample form a characteristic separate branch (marked in grey), with representatives from 5 groups: Chachapoyas city, Utcubamba South, Luya, La Jalca and Huancas. A second branch includes one individual from Luya and three from Wayku, while a third branch sets individuals from Utcubamba South, La Jalca and Luya close to sequences from Amazonia 29.
• Haplogroup B2 is generally reported as prevalent in the Andean highlands, in particular the Central and Southern Andes, where it began to diffuse in the last millennium 25,30. It is also found in pockets of high frequency in some populations of the lowlands, for example in the Mato Grosso and Gran Chaco 31. In the network (Supplementary Fig. S2b), haplogroup B2 displays two distinct branches: one for individuals from North and Meso-America, and one for individuals from the far south. Meanwhile, the remaining lineages (mostly from Ecuador and Peru 32) radiate from the centre of the network, without forming distinct branches. Some of the haplotypes in our sample are found in isolated lineages, while in four cases they are found in branches (marked in grey) that bring together individuals from more than one of our geographical sub-groups, three of them sharing branches with other haplotypes from Ecuador.
Achuar -marked in green in Supplementary Fig. S7). For our Huancas sample itself, however, we find no direct connection to the samples available from any of the supposed highland migration sources. The Huancas sample is instead more similar to its neighbours within the Chachapoyas region.
When allowing for slight differences between haplotypes (similar haplotypes adjusted for mutation rate), sharing patterns remain the same, in line with a high degree of isolation for our entire population sample from the whole Chachapoyas region (Fig. 5b).
populations. For QIIb and for Aymara, sharing is found within ~150 km, while QIIc has a long-distance sharing profile comparable to that of the geographical group South Central Andes, described above, with whom they roughly overlap.
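To make the notion of haplotype sharing with a tolerance for slight differences concrete, here is a rough sketch. It is not the procedure used in the study: the text speaks of similar haplotypes adjusted for mutation rate, whereas this example assumes a plain tolerance on the summed repeat-count difference across loci. The function names and the toy three-locus haplotypes are hypothetical.

```python
from typing import Sequence, Tuple

Haplotype = Tuple[int, ...]  # repeat counts at each Y-STR locus


def step_distance(a: Haplotype, b: Haplotype) -> int:
    """Sum of absolute repeat-count differences across loci."""
    return sum(abs(x - y) for x, y in zip(a, b))


def sharing_fraction(
    source: Sequence[Haplotype],
    target: Sequence[Haplotype],
    max_steps: int = 0,
) -> float:
    """Fraction of source haplotypes with a match in target.

    ASSUMPTION: two haplotypes "match" when their step distance is at
    most max_steps; max_steps=0 means identical haplotypes only.
    """
    if not source:
        return 0.0
    hits = sum(
        1
        for hap in source
        if any(step_distance(hap, other) <= max_steps for other in target)
    )
    return hits / len(source)


# Hypothetical toy data, three loci per haplotype.
pop_a = [(13, 24, 14), (13, 24, 15), (14, 23, 14)]
pop_b = [(13, 24, 14), (15, 25, 16)]

exact = sharing_fraction(pop_a, pop_b)
relaxed = sharing_fraction(pop_a, pop_b, max_steps=1)
print(exact, relaxed)
```

In the toy data, relaxing the match criterion from identity to one repeat step raises the shared fraction from one third to two thirds. An isolated population would be one where this fraction stays low even under the relaxed criterion.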
Sharing frequency was also correlated with population sample size, to illustrate how exceptional the Chachapoyas case is within our database (Supplementary Fig. S8). For this broad comparison, the samples are summarily divided into either Andean or Amazonian. The
Fig S1). Nevertheless, at a finer level this relationship is not matched by any shared haplotypes (Supplementary Fig. S2); rather, we find a distinct pocket of D1 lineages particular to La Jalca. (To explore this further, geneticists must first address the overall paucity of comparative mitogenomic data for South America.)

The origin of Chachapoyas populations and microgeographical diversity patterns
Our results undermine not only the strictly Amazonian or Andean origin hypothesis, but also the frequent depiction of Chachapoyas as a crossroads on the lowest-elevation corridor between Amazonia and the Pacific coast 2. As a caveat, it should be acknowledged that
Fig. S2) and the Y chromosome (Supplementary Fig. S5). In mtDNA only, the two samples differ in internal diversity: Luya has higher nucleotide diversity, and La Jalca a high overall ST pairwise distance to all other populations (Table 1). This implies different maternal demographic histories for these populations: La Jalca more isolated, Luya in greater contact with other populations, although exactly when this admixture may have occurred cannot yet be specified. The BSPs (Fig. 3b) also indicate that Luya had the larger population size. Historical reconstructions do not suggest a specific scenario of isolation for La Jalca that would explain the genetic drift experienced. The higher diversity found in Luya could be a sampling effect: the samples for this population were drawn from small villages scattered over a wider geographic range within the province of Luya. The proposed structure between these two populations could also be the legacy of different identities that survived from before the Inca period, between distinct populations that archaeologists bring together under the umbrella of "Chachapoya culture". Overall, the microgeographical differences found between the populations studied, and the new genetic lineages described for both mtDNA and Y chromosome, define an enclave of Native American genetic diversity that merits further, higher-resolution investigation at the genomic level.
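For readers less familiar with the diversity measure compared between Luya and La Jalca, nucleotide diversity is commonly computed as the average number of pairwise differences per site among aligned sequences. The sketch below illustrates that definition only. The text does not state which software or exact estimator was used, so the function name, the treatment of every position as comparable (no gaps or missing data), and the toy sequences are all assumptions.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Average pairwise differences per site among aligned sequences.

    ASSUMPTION: sequences are already aligned, of equal length, and
    contain no gaps or ambiguous bases.
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned and non-empty")
    # Count mismatching positions over every unordered pair.
    total_differences = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_differences / (n_pairs * length)


# Hypothetical toy alignments, not real mtDNA data.
more_diverse = ["ACGTACGT", "ACGTACGA", "ACGTTCGA"]
less_diverse = ["ACGTACGT", "ACGTACGT", "ACGTACGA"]
print(nucleotide_diversity(more_diverse))
print(nucleotide_diversity(less_diverse))
```

The measure is normalised per site and per pair, so it can be compared across samples of different sizes. It remains sensitive to how samples are drawn, which is relevant to the sampling caveat raised for Luya.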
Supplementary Figure S1: CA plot of mtDNA haplogroup frequencies on a continental scale colored for broad geographic range, with target population highlighted. For the corresponding population codes, see Supplementary