Archeological evidence from prehistoric sites in North China points to the West Liao River Valley and the Yellow River Valley as separately developed Neolithic centers (Figures 1 and 2).1 Ancient communities in North China transitioned from hunting and gathering to domestication as their predominant subsistence strategy 10 000 years ago.1, 2, 3 The processes of population movement and interaction during the Neolithic expansions between the West Liao River Valley and the Yellow River Valley are not well understood, but they have had a major role in the modern distribution of haplogroups across North China.

Figure 1

Geographic location of the valleys and archeological sites used in this study. The study sites in this research: Jiangjialiang site (Xueshan culture) and Sanguan site (Lower Xiajiadian culture) in the Sanggan River Valley. The genetic data were previously published for 11 sites: Niuheliang site (Hongshan culture); Halahaigou site (Xiaoheyan culture); Dadianzi site (Lower Xiajiadian culture); Dashanqian site (Upper Xiajiadian culture); and Jinggouzi site (Northern Steppe culture) in the West Liao River Valley; and Erlitou site (Erlitou culture), Hengbei site (Western Zhou Dynasty) and Taojiazhai site (Han Dynasty) in the Yellow River Valley. The red star represents Beijing, China. A full color version of this figure is available at the Journal of Human Genetics journal online.

Figure 2

Cultures studied here from the West Liao River Valley and the Yellow River Valley. Characteristic artifacts are shown, along with approximate dates for each culture. A full color version of this figure is available at the Journal of Human Genetics journal online.

The northern farming center is in the West Liao River Valley.4 The earliest evidence of crop domestication dates to 6500 BC at the Xinglongwa site.2 Common millet became a staple food during the Hongshan culture period (4500–3000 BC, characterized by its jade artifacts and dragon-shaped relics),5, 6 indicating that agriculture was a chief way of life. Agriculture provided high yields and supported a growing population and flourishing cultures,6 such as the Xiaoheyan culture (3000–2200 BC) and the Lower Xiajiadian culture (2200–1600 BC).7, 8 Following the Lower Xiajiadian culture, paleoclimate records show a regional shift toward a cold, steppe-like environment, which encouraged a nomadic subsistence strategy ~1000 BC.9, 10, 11 Consequently, the widespread agricultural practices disappeared from the archeological records in this region, but some characteristics of this farming center remained in North China, such as the widespread use of jade.12

To the south, the farming center traces to the Yellow River Valley.2 The earliest evidence of crop domestication in this region dates to 8000 BC at the Cishan site.3 The majority of archeological sites in the Yellow River Valley belong to the Yangshao culture (5000–3000 BC), which is known for its distinctive red or black pottery patterned with geometric or animal designs.13 Common millet and foxtail millet were grown throughout this region during that period.2, 14, 15 The Yangshao culture holds significance in Chinese history because scholars argue that it gave rise to the present-day Han Chinese culture.16

The Sanggan River Valley is centrally located and accessible between the West Liao River Valley and the Yellow River Valley (Figure 1). Until ~4300 BC, hunting and gathering was the main way of life for the ancient community, because the arid local climate did not favor intensive agricultural development.17 After this period, archeological evidence indicates a shift toward a farming strategy.17, 18, 19, 20 Regional data from the Sanggan River Valley suggest that technologies from the West Liao River Valley and the Yellow River Valley influenced the development and adaptation of agricultural practices.21 Hence, genetic changes in the populations of this region as their lifestyle shifted toward an agriculturally focused subsistence strategy can provide evidence of Neolithic farming expansions in North China.

The Jiangjialiang (JJL) and Sanguan (SG) sites are located in the Sanggan River Valley (Figure 1), yet display different subsistence economies. The JJL site was one of the largest Neolithic sites in the Sanggan River Valley. It is the oldest Neolithic Sanggan River Valley site with human remains.19 A previous excavation uncovered a total of nine buildings and 78 graves (Supplementary Figure S1).19 Radiocarbon dating of the nine buildings in the lowest stratigraphic layer indicated that they were constructed in 4850±80 BC.19, 22 Artifacts were found in the buildings and graves, including pottery, millstones, stone axes and flake tools. The non-decorative sand inclusion pottery is stylistically similar to the early Xueshan culture (3600–2900 BC).23 A large number of millstones were excavated in buildings and graves at this site, which demonstrated that by ~7000 years ago people had already begun to practice agricultural techniques. However, stone axes and flake tools are also found in the buildings and graves, suggesting that hunting still had a crucial role.22

The SG site (1435±170 BC) is more recent than the JJL site by 1500 years. It is approximately 30 km from the JJL site.24 The SG site is not geographically isolated from the JJL site. Excavations at the SG site and the JJL site yielded bowls with similar characteristics, which suggests a close cultural affinity between the two sites.19 The SG population may have extensively practiced an agricultural subsistence strategy and shared aspects of the Lower Xiajiadian culture (2500–1500 BC).8 For the SG population, agriculture was the primary form of subsistence, replacing the earlier hunting and gathering lifestyle.

JJL people and SG people represent populations in the same region whose subsistence changed from hunting to farming. In this study, mtDNA and the Y chromosome from ancient humans were sampled and analyzed to determine the genetic structure of the JJL and SG populations. By comparing the genetic diversity of the JJL and SG populations with ancient populations from the West Liao River Valley and the Yellow River Valley in North China (Figure 1), we aim to understand the Neolithic farming expansions in North China.

Materials and methods

Sample selection

Forty-one individuals were randomly selected from 74 human skeletal remains from the JJL site (Supplementary Figure S1). The remains were kept at the State Museum of Research Center for Chinese Frontier Archeology at Jilin University. Anthropological sex identification had been previously performed on the majority of these samples (Supplementary Table S1).22 From the SG site, we selected seven samples whose mtDNA had already been observed25 for a Y-chromosome single-nucleotide polymorphism (SNP) analysis. The genetic samples were exclusively taken from teeth. All human remains used in this study were excavated by the Cultural Relics Institute of Hebei Province, with permission from the State Administration of Cultural Heritage of China.

Sample decontamination, grinding and DNA extraction

Precautions were taken to minimize contamination. Soil and tartar were removed from the surface of the teeth from the 41 individuals of the JJL population. The teeth were treated with a 5% sodium hypochlorite solution and then rinsed once with ddH2O and 95% ethanol. These samples were dried on each side in a UV-radiation box for 30 min. For each sample, 200 mg of fine enamel powder was collected using a dental drill (STRONG 90, Taegu, Korea) into a 15 ml tube. The enamel powder was stored at −20 °C. For the seven SG individuals, we used enamel powder collected in the previous study.25 Each sample was digested in 5 ml of EDTA (0.5 M, pH 8.0) and 70 μl of proteinase K (100 mg ml−1) for 12–16 h in a rotating hybridization oven at 55 °C. The supernatant was transferred to an ultrafiltration tube (Centricon YM-10, Merck Millipore, Darmstadt, Germany) and concentrated to ~100 μl at 6300 r.p.m. DNA was then extracted and eluted to 70 μl using the QIAquick PCR Purification Kit (Qiagen, Düsseldorf, Germany), following the manufacturer’s protocol.

Mitochondrial DNA sequencing and SNP typing

Within the mitochondrial genome, the hypervariable region 1 (HVR1) and coding region SNPs are the focus of this study. The HVR1 was amplified with two overlapping primer pairs spanning 375 bp (nucleotide positions 16035–16409). Twelve haplogroup-defining SNPs in the coding region of the mitochondrial genome were typed to determine mtDNA lineages. The primer sequences are shown in Supplementary Table S2. The HVR1 and SNP PCR products were amplified using Takara Ex Taq Hot-Start DNA polymerase. The amplified products were sequenced directly, without additional purification, using forward and reverse primers and an ABI 310 Terminator Sequencing Kit on an ABI PRISM 310 automated DNA sequencer (Applied Biosystems, Foster City, CA, USA). These mitochondrial sequences were then analyzed for mtDNA variability using the program Sequencher 5.0 according to the Cambridge Reference Sequence.26 MitoTool 1.1.2 was used to yield a list of the variants and determine the haplogroup status of each lineage according to Phylotree.28 Haplogroups M/N, B, C, D, D4 and F were detected using product-length polymorphism analysis. Haplogroups A, D5, G, M10, R and Z were identified by sequencing the amplification products. The PCR amplification was performed as described by Li et al.39 with the primer annealing time modified to 30 s. To reveal maternal genetic differences between the ancient populations of North China, Fst genetic distances were calculated from the HVR1 data (nucleotide positions 16051–16400) using Arlequin 3.11 (ref. 29). These populations include JJL, SG, Erlitou (ELT), Hengbei (HB), Niuheliang (NHL), Halahaigou (HLHG) and Dadianzi (DDZ).
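As an illustration of what a pairwise Fst distance between two sets of aligned HVR1 sequences measures, the following minimal Python sketch uses a simple pairwise-difference estimator, Fst ≈ 1 − Hw/Hb, where Hw is the mean number of differences within populations and Hb is the mean number between them. This is an assumption made for illustration and is not the AMOVA-based computation performed by Arlequin; the function names and the toy sequences are hypothetical and are not data from this study.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_within(pop):
    """Mean pairwise difference among sequences of one population."""
    pairs = list(combinations(pop, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop1, pop2):
    """Mean pairwise difference between sequences of two populations."""
    total = sum(hamming(a, b) for a, b in product(pop1, pop2))
    return total / (len(pop1) * len(pop2))


def pairwise_fst(pop1, pop2):
    """Simple estimator: 1 - Hw/Hb (illustrative, not Arlequin's method)."""
    hw = (mean_within(pop1) + mean_within(pop2)) / 2
    hb = mean_between(pop1, pop2)
    if hb == 0:
        return 0.0  # all sequences identical: no differentiation
    return 1 - hw / hb


# Hypothetical toy alignments (4 bp each), for illustration only
pop_a = ["AAAA", "AAAT"]
pop_b = ["TTTT", "TTTA"]
print(pairwise_fst(pop_a, pop_b))
```

Values near zero indicate that sequences from the two populations are about as similar to each other as sequences within each population. With small samples this estimator can be negative, which is conventionally read as no detectable differentiation.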

Sex identification, Y-chromosome SNP and Y-STR

The amelogenin gene and SRY locus were used for sex identification of individuals from the JJL and SG sites.30, 31 The male samples were then selected for Y-chromosome analyses. The most prevalent Eastern Asian Y-chromosomal haplogroup F-M89 was detected in this study. Subsequent analyses were restricted to seven sub-haplogroups (K-M9, NO-M214, N-M231, N1a-M128, N1c-Tat, O-M175 and O3-M122). All PCR amplifications were performed as described by Karafet et al.32 The primer sequences used are shown in Supplementary Table S2.

Short tandem repeats (STRs) on the Y chromosome (Y-STRs) from the JJL samples were analyzed at 18 loci (DYS19, DYS385a/b, DYS389I/II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS447, DYS448, DYS456, DYS458, DYS635 and Y GATA H4) using the AGCU Y18 STR Kit (AGCU ScienTech, Wuxi, China). Y-STR data are not available for the SG population because the DNA extracts from the SG individuals were exhausted. PCR amplification was performed as described by Cui et al.38 The products from the Y-STR reactions were analyzed using GeneMapper software 4.2 (Applied Biosystems) and Network 5.0 (Fluxus Technology Ltd, Kiel, Germany).
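Because some loci may fail to amplify in ancient DNA, Y-STR haplotypes can only be compared at loci typed in both individuals. The sketch below counts mismatching loci between two haplotypes while skipping missing calls. It is a simplified pairwise comparison assumed here for illustration, not the median-joining algorithm implemented in Network, and it does not handle the unordered alleles of the duplicated DYS385a/b locus. The function name and the allele values are hypothetical.

```python
def compare_haplotypes(h1, h2):
    """Return (mismatches, loci_compared) over loci typed in both haplotypes.

    Each haplotype maps a locus name to a repeat count, or to None
    when the locus failed to amplify.
    """
    shared = [
        locus for locus in h1
        if h1[locus] is not None and h2.get(locus) is not None
    ]
    mismatches = sum(h1[locus] != h2[locus] for locus in shared)
    return mismatches, len(shared)


# Hypothetical allele calls, for illustration only (not data from this study)
ind_a = {"DYS19": 14, "DYS390": 23, "DYS391": 10, "DYS392": None}
ind_b = {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS392": 14}

print(compare_haplotypes(ind_a, ind_b))  # (1, 3)
```

Reporting the number of loci compared alongside the mismatch count makes it clear how much evidence supports a match when many loci are missing.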

Precautions against ancient DNA contamination

To prevent contamination of the ancient DNA from modern sources, researchers in the pre-PCR lab wore full-body protective clothing, facemasks and several layers of gloves. Strict workflow protocols were followed during the laboratory work. The pre-PCR and post-PCR labs are located in separate buildings. Sample preparation, DNA extraction, purification and PCR set-up were completed in the pre-PCR laboratory. PCR and sequencing were carried out in the post-PCR laboratory. Every step in the pre-PCR laboratory was executed in hoods that were irradiated with UV for at least 30 min and cleaned with DNA-Off before, between, and after each use. The rooms and other equipment were also treated as described above. Extraction and PCR negative controls yielded negative results, as expected. We selected teeth specimens in situ to avoid sampling duplication. At least two duplicate extractions were performed for 20 individuals. All the results were verified by performing two duplicate amplifications. HVR1 sequences from the SG samples in this study yielded the same results as the previous study conducted by Zhao et al.25 The HVR1 and Y-STRs of all researchers who worked in the lab were sequenced to check for contamination (Supplementary Table S3).

Results


The resulting mtDNA HVR1 sequences from JJL specimens were compared with the revised Cambridge Reference Sequence.26 We identified 23 polymorphic sites from 41 JJL individuals. A total of 36 haplotypes from 10 haplogroups (D, A, G, B, M10, C, Z, F, R and M*) were observed (Supplementary Table S1). Two samples marked M* represent a failure in the typing process. Haplogroups D (34%), A (15%), G (12%), B (10%) and M10 (7%) are present in the JJL population at a higher frequency than haplogroups C (5%), R (5%), Z (5%) and F (2%), which were observed at a low frequency (Table 1). To better understand the regional genetic relationship in the Sanggan River Valley, West Liao River Valley and Yellow River Valley, Fst genetic distances between seven ancient populations were calculated (Figure 3 and Supplementary Table S4).33, 34, 35, 36, 37 The results indicated that the smallest Fst value was between the JJL and SG populations (Fst=0.00009). Samples from the Taojiazhai (TJZ) site were not included in the Fst analysis because the specimens were excavated from a family cemetery with similar mtDNA variation.
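The haplogroup percentages above are simple proportions of the 41 typed individuals. The short Python sketch below shows the arithmetic. The per-haplogroup counts are an assumption: they were back-calculated to be consistent with the rounded percentages reported here and were not taken from Supplementary Table S1. The function name is hypothetical.

```python
from collections import Counter


def haplogroup_percentages(assignments):
    """Percentage of individuals per haplogroup, rounded to whole numbers."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {hg: round(100 * n / total) for hg, n in counts.items()}


# Assumed counts, chosen only to match the reported percentages (n = 41)
assumed_counts = {
    "D": 14, "A": 6, "G": 5, "B": 4, "M10": 3,
    "C": 2, "R": 2, "Z": 2, "F": 1, "M*": 2,
}
assignments = [hg for hg, n in assumed_counts.items() for _ in range(n)]

percentages = haplogroup_percentages(assignments)
print(percentages["D"], percentages["A"], percentages["F"])
```

Because each percentage is rounded independently, the rounded values need not sum to exactly 100.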

Table 1 Distribution of mtDNA haplogroups (%) in the eight ancient populations and the modern North China population
Figure 3

Fst values between ancient populations of North China, based on HVR1 data. The ‘+’ indicates the Fst value is significant, with a P-value less than 0.05; the ‘−’ indicates the Fst was not significant, with a P-value greater than 0.05. A full color version of this figure is available at the Journal of Human Genetics journal online.

Using the HVR1 data from Zhao et al.25 (Supplementary Table S5), we determined that the haplogroups belonging to the SG population include B, D, G, A, M10 and M*, all of which are also present in the JJL population at a higher frequency. Because of the small sample size, the haplogroup frequencies of the SG population were not calculated.

In total, 17 out of 41 individuals from the JJL site were classified as male through sex identification. Twenty-one individuals were classified as female, including two individuals (95JJLM38 and 95JJLM65x) who were identified as females in molecular identification but were considered males through morphological identification.22 The results for three individuals (95JJLM43z, 95JJLM52 and 95JJLM53) were unclear using the molecular identification method (Supplementary Table S1). All JJL males carried the Y haplogroup N-M231 (Supplementary Table S5), the most common haplogroup in North China 3000 years ago.38 Haplogroup N1 (× N1a, × N1c) is present in 10 out of the 17 males, while the haplogroup N1c-Tat was observed in the other seven male individuals (Table 2).

Table 2 Geographic locations and the Y-chromosome haplogroup distribution of ancient populations in this study

For the SG samples, four of seven individuals were classified as male, and they all belonged to the Y haplogroup O3-M122 (Supplementary Tables S2 and S5), which is common in the ancient people of the Yellow River Valley.38, 39 Haplogroup O3-M122 is the main haplogroup observed in the Han Chinese people today;40, 41, 42 however, it was not identified in the JJL population.

Y-STR data were available from the JJL samples (Supplementary Table S5). DYS447, DYS439 and DYS392 failed to amplify in almost all the samples. This is likely because these amplicons are too long for ancient DNA. From the median-joining network analysis, we found that the Y-STRs in the N1c population were extremely variable, whereas the N1 (× N1a, × N1c) population could be divided into three paternal groups (Supplementary Figure S2). JJL3, JJL6, JJL25, JJL40 and JJL63z individuals share one paternal relative, JJL9, JJL10 and JJL11 individuals share another paternal relative, and JJL16 and JJL29 share a third paternal relative.

Discussion


The mtDNA haplogroups D, A, G, B and M10 were present at a high frequency in the JJL population. These are the most common haplogroups found in the ancient and present-day northeastern Chinese populations.33, 36, 37, 39 MtDNA haplogroups C, Z, F and R were present at a low frequency in the JJL population. This is expected because low frequencies of these haplogroups are also observed in most other northeastern Chinese ancient populations (Table 1).33, 39 All haplogroups observed in the JJL population were exclusive to North China, indicating that the JJL population was an ancient local population of North China. No evidence of in-migration was found based on haplogroups present in regions outside of Northeast China, such as the West Eurasian maternal lineages U and W in the Tianshanbeilu site of Xinjiang in northwestern China.43 Unfortunately, only seven samples were excavated from the SG site, so the data set was too small for an mtDNA structure analysis.

Archeological evidence clearly suggests that the ancient JJL and SG communities utilized different subsistence strategies.22, 44 The subsistence strategy changed during this 1500-year period from both hunting and farming at the JJL site to extensive farming at the SG site. The mtDNA data from this study, however, did not reveal a significant difference between the two populations. In fact, all the haplogroups (B, D, G, A and M10) in the SG population were present at a high frequency in the JJL population. This is further supported by the small Fst distance between the JJL and SG populations (Figure 3). The mtDNA results indicate that there might be a close maternal relationship between the two populations. More mtDNA data from relevant spatial and temporal ranges are needed to more accurately describe the maternal model of population dynamics during the subsistence shift observed in the archeological records.

The resulting Y-chromosome analysis indicated the presence of a rare haplogroup N1c-Tat in the JJL population. This is an unexpected result for ancient populations in North China. Searching through previously reported Y-chromosome variations for 119 known ancient individuals from 13 archeological sites (approximately dated to 6000–2000 years ago) in North China, the N1c-Tat was found in only three individuals from the Bronze Age Dashanqian site (~1000 BC) in the West Liao River Valley.38, 39, 45

Considering the rarity of the N1c-Tat haplogroup distribution in the ancient populations throughout the late Neolithic and Bronze Ages, it is possible that N1c-Tat represented the local population of the Sanggan River Valley. These people may be the original settlers who used a hunter-gatherer subsistence strategy in the Sanggan River Valley. During the shift from hunting to farming, they likely embraced new techniques and then began to use farming technology.

The haplogroup N1 (× N1a, × N1c) was widely distributed at a high frequency in ancient populations of North China but occurs at a much lower frequency (<5.96%) in modern East Asian populations.42 Previous studies suggested that haplogroup N1 expanded into North China 12–18 kya.46, 47 High-frequency distributions of N1 (× N1a, × N1c) (Table 2) were observed in populations from the West Liao River Valley during the Neolithic and early Bronze Age,38 showing that it became the dominant Y-chromosome lineage of North China at that time. Researchers have argued that the N1 (× N1a, × N1c) haplogroup migrated through the West Liao River Valley to northwestern China (for example, the Bronze Age Tianshanbeilu site in Xinjiang)43 and the Yellow River Valley (for example, the Hengbei site).48 The presence of the N1 (× N1a, × N1c) haplogroup in the ancient JJL population suggests that this population may have had a close affinity with the West Liao River Valley populations. This suggests possible genetic exchanges between the West Liao River Valley and the Sanggan River Valley populations.

Haplogroup O3-M122 is the main lineage of the Yellow River Valley.49 It was, however, not present in the JJL population examined in this study. Artifacts excavated from the JJL site also showed more stylistic features of the Hongshan culture of the West Liao River Valley.19 Genetic and archeological evidence indicates that people from the Yellow River Valley probably did not reach the Sanggan River Valley before 3000 BC. Therefore, people from the West Liao River Valley, not the Yellow River Valley, most likely first influenced the agricultural practices in the Sanggan River Valley.

Haplogroup O3-M122 was observed in all four males from the SG site. Considering that the SG site is more recent than the JJL site, the presence of haplogroup O3-M122 suggests that people carrying this haplogroup from the Yellow River Valley may have migrated to the Sanggan River Valley sometime between 3000 and 1500 BC. Haplogroups N1 (× N1a, × N1c) and N1c were not present in the SG samples in this research. Their absence may be due to the small sample size of the SG population. Another hypothesis is that the Yellow River Valley population carrying haplogroup O3-M122 completely replaced the local people carrying N1 (× N1a, × N1c) and N1c. This hypothesis is not very compelling because haplogroups N1 (× N1a, × N1c) and N1c were still found in the later populations in North China. The more likely scenario is that the Yellow River Valley populations admixed with the local populations during this expansion. For example, the Dadianzi population, a Lower Xiajiadian cultural group of the West Liao River Valley, carried N1 (× N1a, × N1c) and O3 haplogroups.39, 45 Similarly, the later Dashanqian population, attributed to the Upper Xiajiadian culture, had evidence of N1 (× N1a, × N1c), N1c and O3 haplogroups.38, 45

Combining the Y-chromosome analyses with the archeological evidence,8, 19, 23 we find Neolithic farming expansions from the Yellow River Valley and the West Liao River Valley. The West Liao River Valley population expanded earlier, with agriculturalists spreading south and reaching the Sanggan River Valley before 3000 BC. This migration was accompanied by the appearance of agricultural techniques in the Sanggan River Valley.19, 22 The culture also shifted into a unique transitional culture, the Xueshan culture, which was characterized by both hunting and farming subsistence strategies.23

The later expansion from the Yellow River Valley may have occurred during 3000–1500 BC. This expansion also greatly influenced people in the Sanggan River Valley genetically and culturally.8 During this expansion, people from the Yellow River Valley migrated to the Sanggan River Valley and finally settled there. They also contributed to the change from the Xueshan culture to the Lower Xiajiadian culture, a highly developed agricultural culture. Another consequence of the expansion of the Yellow River Valley population is that a portion of the local Sanggan River Valley people might have migrated to the West Liao River Valley. This hypothesis is supported by the appearance of the haplogroup N1c in individuals from the Dashanqian site.45

More ancient data, especially for the Y chromosome, during adjacent time periods and from the Yellow River Valley and the Sanggan River Valley, are needed for further research regarding farming expansion in North China.