Investigation of blood group genotype prevalence in Korean population using large genomic databases

Blood group antigens, which are prominently expressed in red blood cells, are important in transfusion medicine. The advent of high-throughput genome sequencing technology has facilitated the prediction of blood group antigen phenotypes based on genomic data. In this study, we analyzed data from a large Korean population to provide an updated prevalence of blood group antigen phenotypes, including rare ones. A robust dataset comprising 72,291 single nucleotide polymorphism arrays, 5318 whole-exome sequences, and 4793 whole-genome sequences was extracted from the Korean Genome and Epidemiology Study, Genome Aggregation Database, and Korean Variant Archive and then analyzed. The phenotype prevalence of clinically significant blood group antigens, including MNSs, RHCE, Kidd, Duffy, and Diego, was predicted through genotype analysis and corroborated the existing literature. We identified individuals with rare phenotypes, including 369 (0.51%) with Fy(a−b+), 188 (0.26%) with Di(a+b−), and 16 (0.02%) with Jr(a−). Furthermore, we calculated the frequencies of individuals with extremely rare phenotypes, such as p (0.000004%), Kell-null (0.000310%), and Jk(a−b−) (0.000438%), based on allele frequency predictions. These findings offer valuable insights into the distribution of blood group antigens in the Korean population and have significant implications for enhancing the safety and efficiency of blood transfusion.

Predicted blood group phenotype frequencies from WGS data. WGS enabled the analysis of blood groups that could not be examined using SNP array data, such as the MN, Kell, Ok, and other high-frequency antigen groups (Lutheran, Yt, Colton, Landsteiner-Wiener, Cromer, Knops, and Indian). The results of the 2897 WGS analyses are summarized in Table 2. For the blood groups included in the SNP array data analysis, the frequencies were similar to those obtained from the SNP array data. Among the blood groups that could not be predicted based on the SNP array data, the MM and NN phenotypes of the MN blood group were observed in 717 (24.75%) and 870 (30.03%) individuals, respectively. For the Ok blood group, all individuals in the study population were predicted to have the Ok(a+) phenotype. Unlike the SNP array data analysis, no individuals with the Jr(a−) phenotype were observed, possibly because of the smaller sample size.
Predicted extended blood group phenotype frequencies from WGS data. WGS data provide gene sequences for each individual, allowing the prediction of each individual's extended blood group antigen phenotype. These data are shown in Table 3. The most common phenotype combination, observed in 137 cases (4.74%), included MN, ss, CcEe, Fy(a+b−), Jk(a+b+), Di(a−b+), Do(a−b+), and KANNO1+. Individuals whose extended phenotype had a frequency of > 1% accounted for approximately half of the total cases (1403/2897, 48.4%). These individuals were predicted to display the s, Fy(a), Di(b), Do(b), and KANNO1+ antigens. Extended blood group antigen phenotypes with a frequency of less than 1% are shown in Supplementary Table S1.
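The tallying of extended phenotypes can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the tuple representation of a phenotype profile, and the toy cohort are all assumptions introduced here, and the counts are made-up values rather than study data.

```python
from collections import Counter


def summarize_extended_phenotypes(profiles, threshold=0.01):
    """Tally identical antigen profiles and report the share above a threshold.

    profiles: one tuple of predicted phenotypes per individual.
    Returns (most common profile, its count, fraction of individuals whose
    profile frequency exceeds the threshold).
    """
    counts = Counter(profiles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no profiles supplied")
    top_profile, top_count = counts.most_common(1)[0]
    # Sum the individuals whose exact profile is carried by > threshold of the cohort.
    common_total = sum(n for n in counts.values() if n / total > threshold)
    return top_profile, top_count, common_total / total


# Toy cohort with made-up counts (hypothetical, not taken from Table 3).
profile_a = ("MN", "ss", "CcEe", "Fy(a+b-)", "Jk(a+b+)")
profile_b = ("MM", "ss", "CCee", "Fy(a+b-)", "Jk(a+b-)")
profile_c = ("NN", "Ss", "ccEE", "Fy(a-b+)", "Jk(a-b+)")
toy_cohort = [profile_a] * 120 + [profile_b] * 79 + [profile_c] * 1

top, n, share = summarize_extended_phenotypes(toy_cohort)
print(top, n, f"{share:.1%}")
```

Applied to the real WGS predictions, the same tally would be expected to reproduce figures of the kind reported above (the most common combination and the share of individuals whose profile exceeds 1%).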
Genotype-based estimation of rare blood group phenotypes. Investigation of the frequencies of low-frequency antigen alleles from KoGES, gnomAD, and KOVA enabled us to predict the frequencies of low-frequency antigen phenotypes in the Korean population. The predicted phenotype frequencies were calculated using the Hardy-Weinberg equation and are presented in Table 4. For comparison, the corresponding antigen phenotype frequencies of East Asian and European populations, as analyzed from gnomAD, are also provided. The detailed numbers and frequencies of identified alleles for each antigen are shown in Supplementary Table S2.
Alleles associated with the Fy(a−b−), Di(a−b−), and Cr(a−) phenotypes were not observed in any of the three databases. Except for the c.274G>A variant in BSG (Ok(a−)), c.376C>T variant in ABCG2 (Jr(a−)), c.655G>A variant in PRNP (KANNO1−), and c.1396T>C variant in B4GALNT2 (Sd(a−)), all other variant alleles were observed at frequencies of < 0.2%. The frequencies of p, Co(b+), In(a+b−), and Rhnull were predicted to be less than 0.000005%.
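The Hardy-Weinberg step can be sketched as follows. For a recessive (antigen-negative or null) phenotype, the expected phenotype frequency is the square of the summed frequency of the causative alleles, which counts both homozygotes and compound heterozygotes. The function name and the choice to sum several alleles are assumptions made for illustration, not the authors' code. The two input values are the rounded allele frequencies quoted in the Discussion (0.46% for Ok(a−) and 0.18% for Kell-null).

```python
def recessive_phenotype_frequency(allele_freqs):
    """Expected frequency of a recessive phenotype under Hardy-Weinberg equilibrium.

    allele_freqs: iterable of frequencies (0..1) of alleles that each abolish
    the antigen. Summing them covers homozygotes and compound heterozygotes.
    """
    q = sum(allele_freqs)
    if not 0.0 <= q <= 1.0:
        raise ValueError("summed allele frequency must lie in [0, 1]")
    return q * q


def as_percent(freq):
    """Convert a fraction to a percentage."""
    return freq * 100.0


ok_negative = recessive_phenotype_frequency([0.0046])  # Ok(a-) allele, 0.46%
kell_null = recessive_phenotype_frequency([0.0018])    # Kell-null alleles, 0.18%

print(f"Ok(a-):    {as_percent(ok_negative):.4f}%")
print(f"Kell-null: {as_percent(kell_null):.6f}%")
```

With these rounded inputs, the sketch gives about 0.002% for Ok(a−) and about 0.0003% for Kell-null. The small difference from the reported Kell-null value of 0.000310% presumably reflects rounding of the allele frequency to 0.18%.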

Discussion
Previous studies have attempted to elucidate the distribution of RBC antigen phenotypes and genotypes in the Korean population. However, conventional serological methods have proven challenging for examining a wide range of RBC antigens because of the considerable cost, time, and effort required to verify an individual's RBC antigens using antisera for multiple antigens. Similarly, for the genotyping of RBC antigens, most previous studies obtained data by enrolling patients and conducting genetic testing with their blood samples, primarily based on sequencing using PCR methods or commercial allele-specific probes [6][7][8]. The distribution of Korean blood group phenotypes predicted by the genotypes obtained from the three databases showed relatively good agreement with each other.
In addition, the predicted distribution correlated with previous genotype studies (see Supplementary Table S3). The investigation of the frequency of rare RBC antigens is significant because it allows transfusion centers to prepare for unusual transfusion cases. Such preparation mitigates the risk of antigen sensitization and the acquisition of irregular antibodies. However, the prevalence of rare blood group antigens in the Korean population has yet to be adequately determined. Although these data are lacking, there have been several reports of irregular antibodies to high-frequency antigens in Korea, including anti-PP1Pk, anti-Rh17, anti-Ku, anti-Fy(a), anti-Di(b), anti-Ge, anti-Yk(a), anti-Ok(a), anti-JMH, anti-Jr(a), and anti-Sd(a) [17].
This study revealed the presence and number of individuals with the Fy(a−b+), Di(a+b−), Jr(a−), Sd(a−), and KANNO1− phenotypes in the population. In the present study, the Fy(a−b+) phenotype frequency ranged from 0.21 to 0.51%, which was slightly lower than the ranges in previous studies [7,8,18,19]. The Fy(a−b−) phenotype, which is linked to resistance to malaria, was not observed across the three databases investigated in this study. The prevalence of the Di(a+b−) phenotype is consistent with previous studies [7,8,20]. The Jr(a) antigen is a high-frequency antigen. The Jr(a−) phenotype is predominantly reported in Japan, with an estimated prevalence of 0.05% in the Japanese population [21]. Several reports have focused on Jr(a−) and anti-Jr(a) in Korea [17,21]. However, to the best of our knowledge, the prevalence of individuals with the Jr(a−) phenotype in the Korean population has not been investigated. In the KoGES SNP array analysis, 16 individuals exhibited the Jr(a−) phenotype, corresponding to a prevalence of 0.02%. Similarly, the Jr(a−) phenotype prevalence predicted from the allele frequency using the Hardy-Weinberg equation was 0.03%. These frequencies within the Korean population are slightly lower than, but still similar to, the Japanese prevalence data. Antibodies against Duffy, Diego, and JR have previously been associated with hemolytic disease of the fetus and newborn (HDFN) or hemolytic transfusion reaction (HTR) [17]. Although some suspected HTRs have reportedly been caused by an unusually strong Sd(a) antigen (Sd(a++)) [1,22], anti-Sd(a) is generally considered clinically insignificant because it very rarely causes HTRs [1]. Consequently, prevalence data for Sd(a−) had not been identified in the Korean population prior to this study. Anti-KANNO1 has been associated with pregnancy in Japanese women, but it is not related to HTR or HDFN [23]. The KANNO1− phenotype has been reported in 0.44% of the Japanese population [24]. Frequency data for the KANNO1− phenotype among Koreans are currently lacking. In the present study, the prevalence of the KANNO1− phenotype was 0.25% in KoGES SNP array data, 0.17% in KoGES WGS data, and 0.37% in gnomAD data, all of which are slightly lower than the Japanese prevalence.
At the allele level, alleles associated with the p, Kell-null, Ok(a−), Jk(a−b−), and Lan− phenotypes were observed. The Kell blood group is characterized by high immunogenicity and antibodies that evoke HTR or HDFN. K+ is rare among Korean and East Asian populations [18,25,26], and no individuals with the K+ antigen were identified in this study. The Kell-null allele frequency, however, was 0.18% in total, and the predicted phenotype frequency was 0.000310%, indicating that approximately 150 individuals with the Kell-null phenotype are expected in Korea. Anti-Ok(a) is not associated with HTR or HDFN, but it is deemed clinically significant because its reaction with the Ok(a+) antigen decreases the survival of RBCs [27][28][29]. The Ok(a−) allele frequency was predicted to be 0.46%, resulting in a phenotype frequency of 0.002%. Moreover, alleles associated with Co(b+) and Rhnull were identified; the Colton and RHAG systems are also considered clinically significant because they may cause HTR or HDFN [30,31]. Although not observed in the homozygous state, the frequency of Jk(a−b−) alleles was estimated to be more than 0.2% in our investigation. Kidd antigens cause delayed HTR in addition to typical HTR, and the Kidd-null phenotype is rarely observed in most ethnic groups [32]. To date, no reports of the Jk(a−b−) phenotype in Korea are available. In East Asian countries, the frequency of the Jk(a−b−) phenotype is expected to be 0.002% in Japanese, 0.023% in Taiwanese, and 0.008% in Chinese populations [32]. According to the Hardy-Weinberg equation, the prevalence of individuals with the Jk(a−b−) phenotype was predicted in the present study to be 0.0004%, which is much lower than that of neighboring East Asian populations (0.0178%). The overall frequency of alleles associated with Lan− was > 0.1%. Anti-Lan, which causes mild-to-severe HTR and HDFN [33], has not been reported in the Korean population. Among Asian countries, only a few Japanese reports on the Lan− phenotype have been published [34]. In the present study, the prevalence of individuals with the Lan− phenotype was predicted to be 0.0001%.
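Converting a predicted phenotype frequency into an expected number of individuals is simple arithmetic, sketched below. The population figure of 51 million is an assumption introduced here for illustration and is not stated in the study. The function name is likewise hypothetical. The percentages are the predicted frequencies quoted above.

```python
def expected_individuals(phenotype_percent, population):
    """Expected head count for a phenotype given its frequency in percent."""
    if phenotype_percent < 0 or population < 0:
        raise ValueError("inputs must be non-negative")
    return phenotype_percent / 100.0 * population


# Assumption: roughly 51 million residents; this figure is not from the study.
ASSUMED_POPULATION = 51_000_000

predicted_percent = {
    "Kell-null": 0.000310,
    "Jk(a-b-)": 0.0004,
    "Lan-": 0.0001,
}

for phenotype, percent in predicted_percent.items():
    count = expected_individuals(percent, ASSUMED_POPULATION)
    print(f"{phenotype}: about {round(count)} individuals")
```

Under this assumed population size, the Kell-null estimate comes out at roughly 160 individuals, the same order as the approximately 150 stated in the text.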
When determining the pool of potential donors for patients who need transfusions involving rare blood phenotypes, estimating the frequency of these rare phenotypes requires consideration of the concurrent expression of ABO and RhD antigens. Thus, the frequencies of rare blood antigens should be interpreted in the context of the frequencies of ABO and RhD phenotypes, as demonstrated in this study and previous studies [35,36].
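A rough sketch of this adjustment multiplies the rare phenotype frequency by the frequencies of the required ABO and RhD types. It assumes the three traits are inherited independently, and it treats ABO matching as a single required type rather than modeling full compatibility rules. The ABO and RhD values below are placeholders chosen for illustration, not measured Korean frequencies; only the Jr(a−) frequency of 0.02% comes from the text. The function name is hypothetical.

```python
def matched_donor_frequency(rare_freq, abo_freq, rhd_freq):
    """Joint frequency of a rare phenotype with a given ABO and RhD type.

    Assumes independent inheritance of the three traits, so the joint
    frequency is the product of the individual frequencies (all in 0..1).
    """
    for value in (rare_freq, abo_freq, rhd_freq):
        if not 0.0 <= value <= 1.0:
            raise ValueError("frequencies must lie in [0, 1]")
    return rare_freq * abo_freq * rhd_freq


JR_A_NEGATIVE = 0.0002      # 0.02%, from the KoGES SNP array analysis
PLACEHOLDER_ABO = 0.30      # hypothetical share of one ABO type
PLACEHOLDER_RHD_POS = 0.99  # hypothetical share of RhD-positive individuals

joint = matched_donor_frequency(JR_A_NEGATIVE, PLACEHOLDER_ABO, PLACEHOLDER_RHD_POS)
print(f"Matched donors per million: {joint * 1_000_000:.1f}")
```

The point of the sketch is that the usable donor pool for a rare phenotype is several times smaller than the raw phenotype frequency suggests once ABO and RhD matching are required.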
This study has some limitations. First, analyses were limited by each data format. KoGES SNP array data were initially generated to investigate specific gene regions associated with diseases or SNPs common to Koreans. Therefore, only these particular genetic regions were included in the KoGES SNP array data, and variants related to blood group antigens outside of these regions could not be analyzed. gnomAD and KOVA data only provide the frequency of specific variants in the population, complicating the analyses of blood group antigen variations that require two or more variants. Second, there may be discrepancies in phenotype prediction. We predicted blood group antigen phenotypes by investigating specific gene regions associated with blood group antigen expression. Although this investigation was based on established references, such as ISBT working parties, the actual phenotype could not be confirmed. This limitation could cause discrepancies and, thus, inaccuracies in the results to some extent.
In this study, we examined the prevalence of blood group antigens in a Korean population by using various genetic databases. To the best of our knowledge, this study is the most comprehensive blood group genotype analysis conducted in the Korean population. Investigating a large sample size allowed us to provide accurate and representative data on genotype prevalence in Koreans. Furthermore, our study extended beyond the frequency of blood group phenotypes and also explored the extended blood group antigen phenotype frequencies through KoGES WGS data analysis. Importantly, our relatively large sample size enabled the identification of rare

Table 2. Frequencies of blood group antigen phenotypes predicted based on WGS data (n = 2897).

Table 3. Extended blood group antigen phenotypes predicted based on WGS data (n = 2897).
Consequently, large-scale studies are difficult to conduct, and only a limited number of variants have been investigated. Advancements in genome sequencing techniques have facilitated the conduct of WGS and WES, leading to the compilation of several databases. Using these resources, we were able to investigate the distribution of blood group genotypes in Koreans with the largest sample size to date, including the KoGES database with WGS data from 2897 individuals and SNP array data from 72,291 individuals, gnomAD data from 1909 individuals, and KOVA data. The distribution of Korean blood group phenotypes

Table 4. Predicted frequencies of rare blood group phenotypes. *Data analyzed from the Korean Genome and Epidemiology Study, Genome Aggregation Database (gnomAD), and Korean Variant Archive. **Data analyzed from gnomAD.