World War II, a global military conflict involving more than 100 million people from over 30 countries, was the largest and deadliest war in history. The Chinese government has put significant effort into searching for, repatriating, identifying, and reburying the remains of Chinese soldiers from World War II. Identification of human remains relies on various procedures, such as facial recognition, fingerprinting, dental analysis, hair comparison, and DNA analysis. Considering the temporal gap of some 70 years, DNA analysis seems to be the only applicable approach for identifying the remains of victims dating from the end of World War II.

DNA typing has already been used to identify the skeletal remains of World War II victims in Slovenia [1, 2], Bosnia and Herzegovina [3], Poland [4], and Finland [5]; of Korean War dead, by the Ministry of National Defense Agency for Killed in Action Recovery and Identification (MAKRI) in Korea [6]; and of Vietnam War soldiers, by the Armed Forces DNA Identification Laboratory (AFDIL) in the United States [7]. In our previous study, we tested the Y-chromosomal lineages of 27 Chinese Expeditionary Force remains found in the Myitkyina area, Myanmar [8], which might give hope to the families who have been searching for their loved ones for decades.

The current standard methodology in forensic DNA typing, as used in the studies above, relies mainly on commercially available kits that amplify short tandem repeat (STR) markers by the polymerase chain reaction (PCR) and determine allele sizes (i.e., length-based typing) for each locus by capillary electrophoresis. However, these length-based kits have two obvious disadvantages. First, fewer than 30 forensically relevant loci can be multiplexed and detected simultaneously, so iterative testing of additional targets (e.g., autosomal STRs and Y-STRs or X-STRs or mitochondrial DNA [mtDNA]) requires additional input template from DNA extracts that are sometimes limited. Second, because the loci labeled with each fluorescent dye must be spread across a range of amplicon sizes, some STR amplicons are necessarily long, which makes partial STR profiles or inconclusive results more likely with degraded DNA samples. Next-generation sequencing technology is a feasible option for the identification of old skeletal remains: in addition to sequencing size-reduced amplicons of various markers in parallel, its low input template requirement (1 ng or less) makes processing degraded DNA samples possible.

In this context, we developed a universal procedure, including more efficient DNA extraction methods and more sensitive sequence-based panels, that enables a full identification process using DNA analysis in conjunction with anthropological studies.

(1) The remains of a total of 331 soldiers from seven World War II sites were excavated according to standard archeological procedures during 2015–2018 (Fig. 1a). Prior to genetic analysis, a basic anthropological examination was carried out to individualize the skeletal remains and to obtain basic biological data such as sex, age at death, stature, ante-mortem fractures, and possible cause of death.

Fig. 1

Geographic and genetic information of the Chinese soldier remains from World War II in this study. a The homepage of the National DNA Martyry. Geographic locations of the seven sampling sites: 1 Shidian, Yunnan (DianXi Protecting War) (n = 29), 2 Pingjiang, Hunan (Jiuling Battle) (n = 17), 3 Hengyang, Hunan (Hengyang Battle) (n = 3), 4 Xiangtan, Hunan (Xiangtan Battle) (n = 1), 5 Pingyao, Shanxi (Pingyao Rencounter) (n = 1), 6 Gaotai, Gansu (Gaotai Battle) (n = 14), and 7 Myitkyina area, Myanmar (Myitkyina Battle) (n = 266). b Haplogroup frequencies of 65 samples from six sites based on mtDNA HVI/HVII region sequences; haplogroup nomenclature according to PhyloTree, Build 17.

(2) To avoid contamination, the recommendations for work with ancient DNA were followed [9]. We extracted DNA from bones and teeth using an improved procedure with additional purification [8]. Initially, for each sample, Panel 1 was used to simultaneously analyze the HVI (16,025–16,399; 375 bp) and HVII (65–371; 307 bp) regions of mtDNA for the maternal lineage and three markers (Y-indel, SRY, and Amel) for sex determination. Panel 2, which enables amplification and massively parallel sequencing of 47 Y-STRs and 485 Y-SNPs, was used for the paternal lineage when Panel 1 indicated that the specimen was male. In cases that required the examination of mixed remains collected at battle sites, such as the Changsha Battles, we used Panel 3, which contains 145 ancestry-informative SNPs, to differentiate Chinese from Japanese soldiers [10]. The details of the methods and of the three panels mentioned above are provided in Doc S1.
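The panel selection described in step (2) can be summarized as a short decision rule. The following Python sketch is illustrative only and is not the authors' software: the `Specimen` class, its field names, and the `panels_for` function are hypothetical, and the only values taken from the text are the HVI/HVII coordinates and the order in which the panels are applied. It also checks that the quoted region lengths follow from the inclusive coordinates.

```python
from dataclasses import dataclass

# Inclusive mtDNA coordinates of the two regions targeted by Panel 1,
# as quoted in the text.
HVI = (16025, 16399)
HVII = (65, 371)


def region_length(start: int, end: int) -> int:
    """Length in bp of an inclusive coordinate range."""
    return end - start + 1


@dataclass
class Specimen:
    # All fields are hypothetical names chosen for this illustration.
    sample_id: str
    inferred_sex: str         # result of Panel 1: "male", "female" or "undetermined"
    mixed_site: bool = False  # True if recovered from a site with mixed remains


def panels_for(specimen: Specimen) -> list[str]:
    """Return the panels to run, following the order described in the text."""
    panels = ["Panel 1"]              # always: mtDNA HVI/HVII + sex markers
    if specimen.inferred_sex == "male":
        panels.append("Panel 2")      # Y-STRs and Y-SNPs for the paternal lineage
    if specimen.mixed_site:
        panels.append("Panel 3")      # ancestry-informative SNPs
    return panels


if __name__ == "__main__":
    print(region_length(*HVI), region_length(*HVII))
    print(panels_for(Specimen("example-01", "male", mixed_site=True)))
```

Treating the sex call and the mixed-site flag as independent conditions is an assumption of this sketch; the text does not state whether Panel 3 was applied only to male specimens.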

(3) Finally, all data generated by the three panels were stored on a dedicated website: China National DNA Martyry (CNDM, https://mj.szlongyue.org/). Taking mtDNA testing as an example, typing of the HVI/HVII region by Panel 1 was successful in all 65 samples from six sites, which represents a 100% success rate (Fig. 1b). The genotyping results are provided in Table S1. The 63 haplotypes were assigned to 41 distinct haplogroups according to the mitochondrial phylogenetic tree, Build 17. A detailed list of all haplogroups and their frequencies is given in Fig. 1b. The haplogroup diversity values of the six sites are all at least 0.934.
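The text does not state how haplogroup diversity was calculated. As a minimal sketch, assuming the commonly used unbiased estimator of Nei, h = n/(n − 1) · (1 − Σ pᵢ²), where n is the number of samples and pᵢ is the frequency of the i-th haplogroup, the value for one site could be computed as follows. The function name and the sample labels are hypothetical placeholders, not data from Table S1.

```python
from collections import Counter


def haplogroup_diversity(haplogroups: list[str]) -> float:
    """Nei's unbiased diversity: n / (n - 1) * (1 - sum of squared frequencies).

    Assumption: this estimator is used here for illustration; the source
    does not specify which formula the authors applied.
    """
    n = len(haplogroups)
    if n < 2:
        raise ValueError("at least two samples are required")
    counts = Counter(haplogroups)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


if __name__ == "__main__":
    # Placeholder labels for one hypothetical site (not real results).
    site = ["hg1", "hg1", "hg2", "hg3"]
    print(round(haplogroup_diversity(site), 4))
```

Under this estimator the value approaches 1 when almost every sample carries a different haplogroup, which matches the direction of the high values reported for the six sites.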

CNDM is also in urgent need of reference samples from volunteers who are putative relatives. Recently, the comparison of a reference sample’s profile against the profiles held in the CNDM resulted in one positive identification. A 73-year-old lady had been searching for 38 years for her father, who was lost in the Red Army-Japanese Army Pingyao Rencounter. CNDM confirmed that the unknown remains labeled D02324 belonged to her father, based on her autosomal STR profile and a male paternal relative’s Y-chromosomal profile (further determined with a capillary electrophoresis-based kit; detailed results are shown in Table S2). By overcoming a number of challenges, such as the optimization of DNA extraction protocols, specific laboratory procedures, and communication with the relatives, we successfully identified a number of Chinese soldiers’ remains from World War II. Above all, we have established the CNDM, which facilitates the assignment of uncovered skeletal remains to missing persons after more than half a century and gives families hope of finding their missing relatives. We believe that the CNDM is a beacon of hope illuminating the martyrs’ way home.