Large-scale lexical and genetic alignment supports a hybrid model of Han Chinese demic and cultural diffusions


The Han Chinese history is shaped by substantial demographic activities and sociocultural transmissions. However, it remains challenging to assess the contributions of demic and cultural diffusion to Han culture and language, primarily due to the lack of rigorous examination of genetic–linguistic congruence. Here we digitized a large-scale linguistic inventory comprising 1,018 lexical traits across 926 dialect varieties. Using phylogenetic analysis and admixture inference, we revealed a north–south gradient of lexical differences that probably resulted from historical migrations. Furthermore, we quantified extensive horizontal language transfers and pinpointed central China as a dialectal melting pot. Integrating genetic data from 30,408 Han Chinese individuals, we compared the lexical and genetic landscapes across 26 provinces. Our results support a hybrid model where demic diffusion predominantly impacts central China, while cultural diffusion and language assimilation occur in southwestern and coastal regions, respectively. This interdisciplinary study sheds light on the complex social-genetic history of the Han Chinese.
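The comparison of lexical and genetic landscapes summarized above can be illustrated with a small, self-contained sketch. It is not the authors' pipeline: it assumes each variety is coded as a vector of discrete lexical traits, measures lexical difference as the proportion of mismatching traits (a normalized Hamming distance), and tests its association with a genetic distance matrix using a Mantel-style permutation test. The region labels, trait vectors, FST-like values and function names are all hypothetical and chosen only for illustration.

```python
import random
from itertools import combinations


def hamming_distance(a, b):
    """Proportion of lexical traits on which two varieties differ."""
    if len(a) != len(b):
        raise ValueError("trait vectors must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(traits):
    """Symmetric matrix of pairwise normalized Hamming distances."""
    n = len(traits)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = hamming_distance(traits[i], traits[j])
    return d


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def upper_triangle(m, order):
    """Upper-triangle entries of m after relabelling rows/columns by order."""
    return [m[order[i]][order[j]]
            for i, j in combinations(range(len(order)), 2)]


def mantel(d1, d2, permutations=999, seed=0):
    """One-sided Mantel test: correlation of d1 and d2, with a p-value
    obtained by permuting the labels of d2."""
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    observed = pearson(x, upper_triangle(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)
        if pearson(x, upper_triangle(d2, perm)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy data: four regions, six binary lexical traits each.
toy_traits = [
    [1, 1, 1, 1, 0, 0],  # "north"
    [1, 1, 1, 0, 0, 1],  # "central"
    [0, 0, 1, 0, 1, 1],  # "south"
    [0, 0, 0, 0, 1, 1],  # "coast"
]
# Hypothetical FST-like genetic distances between the same four regions.
toy_genetic = [
    [0.000, 0.001, 0.004, 0.005],
    [0.001, 0.000, 0.003, 0.004],
    [0.004, 0.003, 0.000, 0.001],
    [0.005, 0.004, 0.001, 0.000],
]

lexical = distance_matrix(toy_traits)
r, p = mantel(lexical, toy_genetic)
print(f"Mantel r = {r:.3f}, permutation p = {p:.3f}")
```

With only four regions there are just 24 possible relabellings, so the permutation p-value in this toy case is coarse; a real analysis would use many more populations and an established implementation.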

Fig. 1: Geographic characteristics of Chinese dialects.
Fig. 2: Manhattan plot of outlier lexical traits.
Fig. 3: Internal structure of Chinese dialects.
Fig. 4: Admixture patterns of Chinese dialects.
Fig. 5: Correlation between the genetic component of Northern Han populations and the linguistic component of northern language (Mandarin) in China.

Data availability

The lexical inventory of Chinese dialects and the other datasets needed to reproduce the results in the paper are available in the Supplementary Tables. The allele frequency data of the genetic variants used in this study are available via the PGG.Han website. The pairwise FST for any two provincial populations and the inferred ancestry composition for each provincial population are available on GitHub or via Zenodo (ref. 112). The use of genetic data in this work was approved by the National Health Commission of the People’s Republic of China (No. 2024BAT00503). Release of the summary statistics of the genetic data is recorded by the National Health Commission (NHC) of the People’s Republic of China at the Open Archive for Miscellaneous Data (OMIX) under accession number OMIX004518. All data generated or analysed during this study are included in this Article, its Supplementary Information and publicly available repositories.

Code availability

The code required to transform the data into the statistics and outputs reported in the paper is available in the Supplementary Information and has been deposited on GitHub and in Zenodo (ref. 113).


References

  1. Pagel, M. Human language as a culturally transmitted replicator. Nat. Rev. Genet. 10, 405–415 (2009).

  2. Diamond, J. & Bellwood, P. Farmers and their languages: the first expansions. Science 300, 597–603 (2003).

  3. Cavalli-Sforza, L. L., Menozzi, P. & Piazza, A. The History and Geography of Human Genes (Princeton Univ. Press, 1994).

  4. Bouckaert, R. et al. Mapping the origins and expansion of the Indo-European language family. Science 337, 957–960 (2012).

  5. Gray, R. D., Drummond, A. J. & Greenhill, S. J. Language phylogenies reveal expansion pulses and pauses in Pacific settlement. Science 323, 479–483 (2009).

  6. Tagore, D., Aghakhanian, F., Naidu, R., Phipps, M. E. & Basu, A. Insights into the demographic history of Asia from common ancestry and admixture in the genomic landscape of present-day Austroasiatic speakers. BMC Biol. 19, 61 (2021).

  7. de Filippo, C., Bostoen, K., Stoneking, M. & Pakendorf, B. Bringing together linguistic and genetic evidence to test the Bantu expansion. Proc. Biol. Sci. 279, 3256–3263 (2012).

  8. Tambets, K. et al. Genes reveal traces of common recent demographic history for most of the Uralic-speaking populations. Genome Biol. 19, 139 (2018).

  9. Robbeets, M. et al. Triangulation supports agricultural spread of the Transeurasian languages. Nature 599, 616–621 (2021).

  10. Ge, J., Wu, S. & Cao, S. Zhongguo Yi Min Shi (History of Migrations in China) (Fujian People’s Publishing House, 1997).

  11. Zhou, Z. & Lo, K. Migrations in Chinese history and their legacy on Chinese dialects. J. Chin. Linguist. Monogr. Ser. 3, 29–49 (1991).

  12. Coblin, W. S. Migration history and dialect development in the lower Yangtze watershed. Bull. Sch. Orient. Afr. Stud. Univ. Lond. 65, 529–543 (2002).

  13. Lee, J. Z. in Annales de demographie historique Vol. 1982 279–304 (Persée, 1982).

  14. Lee, J. & Wong, R. B. Population movements in Qing China and their linguistic legacy. J. Chin. Linguist. Monogr. Ser. 3, 50–75 (1991).

  15. Xu, S. et al. Genomic dissection of population substructure of Han Chinese and its implication in association studies. Am. J. Hum. Genet. 85, 762–774 (2009).

  16. Wen, B. et al. Genetic evidence supports demic diffusion of Han culture. Nature 431, 302–305 (2004).

  17. Deng, W. et al. Evolution and migration history of the Chinese population inferred from Chinese Y-chromosome evidence. J. Hum. Genet. 49, 339–348 (2004).

  18. Ethnologue: Languages of the World (SIL International, 2023).

  19. The Sino-Tibetan Languages (Routledge, 2016).

  20. Norman, J. Chinese (Cambridge Univ. Press, 1988).

  21. Yuan, J. Hanyu Fangyan Gaiyao (Shangwu Yinshuguan, 2003).

  22. Coblin, W. S. A brief history of Mandarin. J. Am. Orient. Soc. 120, 537–552 (2000).

  23. Hamed, M. B. Neighbour-nets portray the Chinese dialect continuum and the linguistic legacy of China’s demic history. Proc. R. Soc. B 272, 1015–1022 (2005).

  24. Zheng, Z. & Xiong, Z. (eds) Language Atlas of China 2nd edition Vol. Chinese Dialects (Shangwu Yinshuguan, 2012).

  25. Kurpaska, M. Chinese Language(s): A Look Through the Prism of the Great Dictionary of Modern Chinese Dialects. Chinese Language(s) (De Gruyter Mouton, 2010).

  26. Ho, D. in The Oxford Handbook of Chinese Linguistics (eds Wang, W. S.-Y. & Sun, C.) 149–160 (Oxford Univ. Press, 2015).

  27. LaPolla, R. J. in Areal Diffusion and Genetic Inheritance: Problems in Comparative Linguistics (eds Aikhenvald, A. Y. & Dixon, R. M. W.) 225–254 (Oxford Univ. Press, 2001).

  28. Xue, F. et al. A spatial analysis of genetic structure of human populations in China reveals distinct difference between maternal and paternal lineages. Eur. J. Hum. Genet. 16, 705–717 (2008).

  29. LaPolla, R. J. in The Cambridge Handbook of Language Contact Vol. 1 (eds Escobar, A. M. & Mufwene, S. S.) 64–83 (Cambridge Univ. Press, 2022).

  30. Zhang, M. Diversity of language structure is shaped by demographic activities: comment on ‘Rethinking foundations of language from a multidisciplinary perspective’ by T. Gong et al. Phys. Life Rev. 26–27, 146–148 (2018).

  31. Cao, Z. et al. (eds) Hanyu Fangyan Dituji (Linguistic Atlas of Chinese Dialects) Vol. Lexicon (Shangwu Yinshuguan, 2008).

  32. Coblin, W. S. Neo-Hakka, Paleo-Hakka, and Early Southern Highlands Chinese. Yuyán Ánjiù Jíkan 21, 175–238 (2018).

  33. Baker, H. D. R. Migration and ethnicity in Chinese history: Hakkas, Pengmin, and their neighbours. By Sow-Theng Leong edited By Tim Wright, pp. xix, 234, 1 fig., 11 maps. Stanford, California, Stanford Univ. Press. 1997. J. R. Asiat. Soc. 9, 350–351 (1999).

  34. Hashimoto, M. J. Origin of the East Asian linguistic structure: latitudinal transitions and longitudinal developments of East and Southeast Asian languages. Comput. Anal. Asian Afr. Lang. 24, 35–42 (1984).

  35. Hashimoto, M. in Contributions to Sino-Tibetan Studies 76–97 (Brill, 1986).

  36. Hashimoto, M. Language diffusion on the Asian continent: problems of typological diversity in Sino-Tibetan. Comput. Anal. Asian Afr. Lang. 3, 49–65 (1976).

  37. Yue-Hashimoto, A. The lexicon in syntactic change: lexical diffusion in Chinese syntax. J. Chin. Linguist. 21, 213–254 (1993).

  38. Weir, B. S. & Cockerham, C. C. Estimating F-statistics for the analysis of population structure. Evolution 38, 1358–1370 (1984).

  39. Bryant, D. & Moulton, V. Neighbor-net: an agglomerative method for the construction of phylogenetic networks. Mol. Biol. Evol. 21, 255–265 (2004).

  40. List, J.-M., Shijulal, N.-S., Martin, W. & Geisler, H. Using phylogenetic networks to model Chinese dialect history. Lang. Dyn. Change 4, 222–252 (2014).

  41. Pulleyblank, E. G. Chinese dialect studies. J. Chin. Linguist. Monogr. Ser. 3, 429–453 (1991).

  42. Zhang, M.-H., Pan, W.-Y., Yan, S. & Jin, L. Phonemic evidence reveals interwoven evolution of Chinese dialects. Preprint at (2018).

  43. Coblin, W. S. A Study of Comparative Gàn (Institute of Linguistics, Academia Sinica, 2015).

  44. Iwata, R. Chinese geolinguistics: history, current trends, and theoretical issues. Dialectologia: revista electrònica 1, 97–121 (2010).

  45. You, R. et al. Hanyu Fangyanxue Daolun (Chinese Dialectology) (Shanghai Jiaoyu Chubanshe, 1992).

  46. Levinson, S. C. & Gray, R. D. Tools from evolutionary biology shed new light on the diversification of languages. Trends Cogn. Sci. 16, 167–173 (2012).

  47. Syrjänen, K., Honkola, T., Lehtinen, J., Leino, A. & Vesakoski, O. Applying population genetic approaches within languages: Finnish dialects as linguistic populations. Lang. Dyn. Change 6, 235–283 (2016).

  48. Dor, D. & Eva, J. From cultural selection to genetic selection: a framework for the evolution of language. Selection 1, 33–56 (2001).

  49. Carling, G., Cronhamn, S., Lundgren, O., Bogren Svensson, V. & Frid, J. The evolution of lexical semantics dynamics, directionality, and drift. Front. Commun. (2023).

  50. Lupyan, G. & Dale, R. Language structure is partly determined by social structure. PLoS ONE 5, e8559 (2010).

  51. Romano, N., Ranacher, P., Bachmann, S. & Joost, S. Linguistic traits as heritable units? Spatial Bayesian clustering reveals Swiss German dialect regions. J. Linguist. Geogr. 10, 11–22 (2022).

  52. Jackson, D. A. Stopping rules in principal components analysis: a comparison of heuristical and statistical approaches. Ecology 74, 2204–2214 (1993).

  53. Shen, R. in The Palgrave Handbook of Chinese Language Studies (ed. Ye, Z.) 441–456 (Palgrave Macmillan, 2021).

  54. Norman, J. Guanyu guanhuafangyan zaoqi fazhan de yixie xiangfa (some thoughts on the early development of Mandarin). Dialect 4, 295–300 (2004).

  55. Liu, X. Zailun hanyu beifanghua de fenqu (On the dialect areas of Northern Chinese). Zhongguo Yuwen 8, 439–452 (1995).

  56. Hashimoto, M. J. The Hakka dialect: a linguistic study of its phonology, syntax and lexicon. Bull. Sch. Orient. Afr. Stud. 37, 278–279 (1974).

  57. Hashimoto, M. J. Hakka in Wellentheorie perspective. J. Chin. Linguist. 20, 1–49 (1992).

  58. Yan, M. M. Introduction to Chinese Dialectology (LINCOM Europa, 2006).

  59. Chappell, H. in Sinitic Grammar: Synchronic and Diachronic Perspectives (ed. Chappell, H.) 3–28 (Oxford Univ. Press, 2001).

  60. Norman, J. The Mǐn dialects in historical perspective. J. Chin. Linguist. Monogr. Ser. 3, 323–358 (1991).

  61. Lipson, M. et al. Efficient moment-based inference of admixture parameters and sources of gene flow. Mol. Biol. Evol. 30, 1788–1802 (2013).

  62. Sagart, L. Gan, Hakka and the Formation of Chinese Dialects (Academia Sinica, 2002).

  63. Szeto, P. Y., Ansaldo, U. & Matthews, S. Typological variation across Mandarin dialects: an areal perspective with a quantitative approach. Linguist. Typol. 22, 233–275 (2018).

  64. You, R. & Zhenhe, Z. Fangyan Yu Zhongguo Wenhua (Dialects and Chinese Culture) (Shanghai Renmin Chubanshe, 2006).

  65. Wang, J., Lin, X., Bloomgarden, Z. T. & Ning, G. The Jiangnan diet, a healthy diet pattern for Chinese. J. Diabetes 12, 365–371 (2020).

  66. He, K., Lu, H., Zhang, J., Wang, C. & Huan, X. Prehistoric evolution of the dualistic structure mixed rice and millet farming in China. Holocene 27, 1885–1898 (2017).

  67. Valliant, J. C. D., Bruce, A. B., Houser, M., Dickinson, S. L. & Farmer, J. R. Product diversification, adaptive management, and climate change: farming and family in the U.S. Corn Belt. Front. Clim. (2021).

  68. Honkola, T. et al. Evolution within a language: environmental differences contribute to divergence of dialect groups. BMC Evol. Biol. 18, 132 (2018).

  69. Mufwene, S. Population movements and contacts in language evolution. J. Lang. Contact 1, 63–92 (2007).

  70. Posth, C. et al. Language continuity despite population replacement in Remote Oceania. Nat. Ecol. Evol. 2, 731–740 (2018).

  71. Szeto, P. Y. & Yurayong, C. Sinitic as a typological sandwich: revisiting the notions of Altaicization and Taicization. Linguist. Typol. 25, 551–599 (2021).

  72. Chappell, H. in Areal Diffusion and Genetic Inheritance: Problems in Comparative Linguistics (eds Aikhenvald, A. Y. & Dixon, R. M. W.) 328–357 (Oxford Univ. Press, 2001).

  73. Jolliffe, I. in Encyclopedia of Statistics in Behavioral Science (eds Everitt, B. S. & Howell, D. C.) (John Wiley & Sons, 2005).

  74. Hastie, T., Tibshirani, R., Narasimhan, B. & Chu, G. impute: imputation for microarray data. R package version 1.76.0 (2023).

  75. Novembre, J., Williams, R., Pourreza, H., Wang, Y. & Carbonetto, P. PCAviz: visualizing principal components analysis. R package version 0.3-37 (2019).

  76. Gower, J. C. Generalized procrustes analysis. Psychometrika 40, 33–51 (1975).

  77. Wang, C. et al. Comparing spatial maps of human population-genetic variation using Procrustes analysis. Stat. Appl. Genet. Mol. Biol. 9, 13 (2010).

  78. Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98–101 (2008).

  79. R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2021).

  80. Hijmans, R. J. Raster: geographic data analysis and modeling. R package version 3.4-8 (CRAN, 2023).

  81. Hamming, R. W. Error detecting and error correcting codes. Bell Syst. Tech. J. 29, 147–160 (1950).

  82. Mantel, N. & Valand, R. S. A technique of nonparametric multivariate analysis. Biometrics 26, 547–558 (1970).

  83. Oksanen, J. et al. Vegan: Community Ecology Package (CRAN, 2022).

  84. Evans, C. et al. The uses and abuses of tree thinking in cultural evolution. Phil. Trans. R. Soc. B 376, 20200056 (2021).

  85. Mace, R. & Holden, C. J. A phylogenetic approach to cultural evolution. Trends Ecol. Evol. 20, 116–121 (2005).

  86. Wu, F. & Huang, Y. in The Palgrave Handbook of Chinese Language Studies (ed. Ye, Z.) 1–28 (Springer Nature, 2020).

  87. Hamed, M. B. & Wang, F. Stuck in the forest: trees, networks and Chinese dialects. Diachronica 23, 29–60 (2006).

  88. Huson, D. H. & Bryant, D. Application of phylogenetic networks in evolutionary studies. Mol. Biol. Evol. 23, 254–267 (2006).

  89. Pritchard, J. K., Stephens, M. & Donnelly, P. Inference of population structure using multilocus genotype data. Genetics 155, 945–959 (2000).

  90. Falush, D., Stephens, M. & Pritchard, J. K. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164, 1567–1587 (2003).

  91. Hubisz, M. J., Falush, D., Stephens, M. & Pritchard, J. K. Inferring weak population structure with the assistance of sample group information. Mol. Ecol. Resour. 9, 1322–1332 (2009).

  92. Evanno, G., Regnaut, S. & Goudet, J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol. Ecol. 14, 2611–2620 (2005).

  93. Reesink, G., Singer, R. & Dunn, M. Explaining the linguistic diversity of Sahul using population models. PLoS Biol. 7, e1000241 (2009).

  94. Auderset, S., Greenhill, S. J., DiCanio, C. T. & Campbell, E. W. Subgrouping in a ‘dialect continuum’: a Bayesian phylogenetic analysis of the Mixtecan language family. J. Lang. Evol. 8, 33–63 (2023).

  95. Jakobsson, M. & Rosenberg, N. A. CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23, 1801–1806 (2007).

  96. Earl, D. A. & vonHoldt, B. M. STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conserv. Genet. Resour. 4, 359–361 (2012).

  97. Caye, K., Deist, T. M., Martins, H., Michel, O. & François, O. TESS3: fast inference of spatial population structure and genome scans for selection. Mol. Ecol. Resour. 16, 540–548 (2016).

  98. Lipson, M. et al. Reconstructing Austronesian population history in Island Southeast Asia. Nat. Commun. 5, 4689 (2014).

  99. Saitou, N. & Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425 (1987).

  100. Reich, D., Thangaraj, K., Patterson, N., Price, A. L. & Singh, L. Reconstructing Indian population history. Nature 461, 489–494 (2009).

  101. Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012).

  102. Sagart, L. in Dialect Variations in Chinese 129–154 (Academia Sinica, 2002).

  103. Lipson, M. New Statistical Genetic Methods for Elucidating the History and Evolution of Human Populations. Ph.D. thesis, Massachusetts Institute of Technology (2014).

  104. MATLAB version 8.6.0 (R2015b) (MathWorks, 2015).

  105. Privé, F., Luu, K., Vilhjálmsson, B. J. & Blum, M. G. B. Performing highly efficient genome scans for local adaptation with R package pcadapt version 4. Mol. Biol. Evol. 37, 2153–2154 (2020).

  106. Storey, J. D., Bass, A. J., Dabney, A. & Robinson, D. Qvalue: Q-value estimation for false discovery rate control. R package version 2.34.0 (2023).

  107. Gao, Y. et al. PGG.Han: the Han Chinese genome database and analysis platform. Nucleic Acids Res. 48, D971–D976 (2019).

  108. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).

  109. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  110. Cong, P.-K. et al. Genomic analyses of 10,376 individuals in the Westlake BioBank for Chinese (WBBC) pilot project. Nat. Commun. 13, 2939 (2022).

  111. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).

  112. Zhang, X. Shuhua-Group/Genetic-characteristics-of-the-Han100K-initiative (v1.0). Zenodo (2024).

  113. Yang, C. JoshuaThieriot/Chinese-dialect-project: the first release of analytical codes for Chinese dialects (v1.0.0). Zenodo (2024).

Acknowledgements

This research was supported by the National Natural Science Foundation of China (T2122007, 32288101, 32070577 and 32030020), National Key R&D Program of China (2023YFC2605400, 2020YFE0201600), National Social Science Foundation (23&ZD317 and 20&ZD301), the Shanghai Science and Technology Commission Program (23JS1410100), the Office of Global Partnerships (Key Projects Development Fund), and the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 883700 TRAM). This work was also sponsored by the ‘Shuguang Program’ supported by the Shanghai Education Development Foundation and Shanghai Municipal Education Commission (20SG06), and by the Fundamental Research Funds for the Central Universities (2022ECNU-XWK-XK005). The computational work in this study was supported by the CFFF Computing Platform and the Human Phenome Data Center of Fudan University. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

C.Y., S. Yan, L.J., S.X. and M.Z. designed the research. C.Y., F.Y., Y.C., N.X., Z.W. and M.Z. collated the linguistic data of the Chinese dialects. X.Z. and S.X. assembled the genetic data of the Han Chinese and performed genetic data analysis. C.Y., S. Yang, B.W. and M.Z. performed the linguistic analyses and interdisciplinary alignment. C.Y., S. Yan, L.J., S.X. and M.Z. discussed the results. C.Y., X.Z., S.X. and M.Z. wrote and revised the paper. All authors approved the final version of the manuscript.

Corresponding authors

Correspondence to Li Jin, Shuhua Xu or Menghan Zhang.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

Peer review

Peer review information

Nature Human Behaviour thanks Randy J. LaPolla and Chuang-Chao Wang for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Note, Discussion, Figs. 1–3, and table of contents for Supplementary Tables 1–13.

Reporting Summary

Peer Review File

Supplementary Tables 1–13

Lexical inventory, datasets generated from statistical analysis, and other relevant analytical results.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, C., Zhang, X., Yan, S. et al. Large-scale lexical and genetic alignment supports a hybrid model of Han Chinese demic and cultural diffusions. Nat Hum Behav (2024).
