Expanded carrier screening (ECS) has been demonstrated to increase the detection rate of carriers compared with traditional tests. The aim of this study was to assess the potential value of ECS for clinical application in Southern China, a region with a high prevalence of thalassemia and diverse ethnic groups, and to provide a reference for future implementations in areas with similar population characteristics. A total of 10,476 prenatal/preconception couples from 34 self-reported ethnic groups were simultaneously tested and analyzed anonymously for 11 Mendelian disorders using targeted next-generation sequencing. Overall, 27.49% of individuals without a self-reported family history of disorders were found to be carriers of at least 1 of the 11 conditions, and the carrier frequency varied greatly between ethnic groups, ranging from 4.15% to 81.35%. Furthermore, 255 couples (2.43%) were identified as carrier couples at elevated risk of having an affected baby, 65 of whom would not have been identified through the existing screening strategy, which only detects thalassemia. The modeled risk of fetuses being affected by any of the selected disorders was 531 per 100,000 (95% CI, 497–567 per 100,000). Our data demonstrate the feasibility of ECS and provide evidence that ECS is a promising alternative to traditional one-condition screening strategies. The lessons learned from this experience should be applicable to other countries or regions with diverse ethnic groups.
Prenatal/preconception carrier screening, which aims to identify individuals or couples at risk of passing on recessive monogenic disorders to their offspring, has been accepted as an effective early intervention strategy to reduce the prevalence of disability and disease in the newborn population. A carrier screening program for Tay-Sachs disease started in 1971 and has become the prototype for carrier screening of recessive disorders.
Advances in sequencing technology and decreases in cost have made expanded carrier screening (ECS) feasible and affordable. In 2011, after 40 years of cumulative experience in gene-by-gene carrier screening, screening tests were first expanded to simultaneously test for 448 Mendelian recessive diseases using next-generation sequencing (NGS) technology. Since then, ECS has been implemented in several populations, and NGS with expanded panels has increased detection rates compared with traditional tests [4,5,6,7].
It is known that the carrier frequencies of Mendelian disorders differ considerably between ethnic groups [7, 8]. Scientific guidelines announced by the American College of Obstetricians and Gynecologists (ACOG), the American College of Medical Genetics and Genomics (ACMG), and other professional institutes dictate careful vetting of demographic characteristics, such as ethnic background, during carrier screening [9,10,11,12]. Knowing the frequency of variants in the population being tested facilitates pan-ethnic testing, but this is generally unrealistic to achieve due to costs and concerns about stigmatization. Moreover, geographical mobility is continuously increasing, including in China, which raises the possibility of mixed marriages and the complexity of genetic backgrounds.
China is a populous country with a high incidence of birth defects, and has 56 ethnic groups (the Han and 55 ethnic minorities) [15, 16], which typically have their own distinctive culture and language. Several studies have elucidated some of the genetic variation in these groups [17,18,19]. These population characteristics provide a good opportunity to explore the performance of ECS in an area with diverse ethnicities.
Here, we report the first design and validation of an ECS panel covering 11 Mendelian disorders using targeted NGS in China. The testing of the ECS in 10,476 ethnically diverse couples simultaneously supports its feasibility and provides evidence that ECS is useful in a region with diverse ethnic groups. We also identified several groups at considerable risk of passing genetic variants to their offspring, indicating a potential need for expanded carrier screening.
Materials and methods
The samples were categorized into two groups. The first group included 2238 patients or carriers with known genetic variants and was used for the validation of the ECS panel. The second group consisted of samples from 10,527 couples without a self-identified family history of inherited disorders, who were selected from five provinces in southern China for a hemoglobinopathies research project. Pre-test genetic counseling for the hemoglobinopathies project was freely accessible. Basic genetic knowledge, such as inheritance patterns, was disseminated through posters in hospitals while the hemoglobinopathies research project was underway. These couples were informed that the remaining samples might be used for other research purposes after anonymization and de-identification. Informed consent was required for all individuals.
Fifty-one of these couples were excluded from further analysis because the DNA of either or both partners failed sample quality control, leaving 10,476 couples. Members of a couple underwent the test simultaneously and were required to report their ethnic background. Individuals in China identify as part of an ethnic group only if at least one of their parents belongs to the group. As this study was performed 1 year after sample collection, the test results were not returned to individuals. The Institutional Review Board (IRB) of BGI approved the study.
Disease selection and panel design
The selection of monogenic diseases followed the criteria recommended by the ACMG, including considerations regarding prenatal diagnosis and a validated clinical association between mutation(s) and disease severity. Eleven recessive disorders were included, namely α-thalassemia (OMIM 604131); β-thalassemia (OMIM 613985); Phenylalanine hydroxylase deficiency (PAH deficiency; OMIM 261600); Wilson disease (WD; OMIM 277900); GJB2-related DFNB1 nonsyndromic hearing loss and deafness (DFNB1; OMIM 220290); Deafness, autosomal recessive 4, with enlarged vestibular aqueduct (DFNB4; OMIM 600791)/Pendred syndrome (PDS; OMIM 274600); Pompe disease (PD; OMIM 232300); Autosomal recessive polycystic kidney disease (ARPKD; OMIM 263200); Hyperphenylalaninemia, BH4-deficient, A (HPABH4A; OMIM 261640); Galactosemia (OMIM 230400); and Dystrophinopathies (Duchenne muscular dystrophy; OMIM 310200 & Becker muscular dystrophy; OMIM 300376). The severity of these diseases was classified as moderate, severe or profound (Supplementary Table 1).
Twelve genes were selected for the design of a customized capture array (NimbleGen, Roche) to capture the target regions of interest, including all exons along with 60-bp flanking intronic sequences and certain well-characterized intron sequences. For thalassemia, in addition to the above-mentioned regions, known deleterious copy number variants (CNVs) listed in HbVar (http://globin.cse.psu.edu/hbvar/) outside exons and single nucleotide variants (SNVs)/insertion and deletion variants (indels) were included. Additionally, 21 autosomal SNPs and 2 genes (PS4Y1 and AMELY) on the Y-chromosome were included for internal quality assurance. The total size of targeted regions was 386,605 bp.
NGS and data analysis
Genomic DNA was extracted from peripheral blood with a MagPure Buffy Coat DNA Midi KF Kit (Magen, China). The DNA was then sequenced on a HiSeq 2000 or 4000 (Illumina, San Diego, USA) in 100-bp paired-end reads. The capture of targeted regions, enrichment, elution, and data analysis were performed according to a previously published protocol. The reference genome sequence used was hg19.
Classification and interpretation of variants were performed according to the standards and guidelines recommended by ACMG and published in the literature [22,23,24]. The classification criteria used to support pathogenicity of the variants are listed in Supplementary Table 2. A variant database was constructed as previously described. Our final database included a total of 2319 variants, including 1310 SNVs, 432 indels and 577 CNVs. Individuals were identified as carriers if they were positive either for variants in our database or for loss-of-function (LOF) variants not listed in our database that have an allele frequency lower than 0.001 in the 1000 Genomes Project (www.1000genomes.org/) and dbSNP (www.ncbi.nlm.nih.gov/SNP/). If both the male and the female were carriers of the same autosomal recessive disorder, or the female was a carrier of an X-linked recessive disorder, they were identified as a carrier couple at increased risk of having an affected baby.
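The carrier-couple rule stated above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the names `Partner` and `at_risk_disorders` and the inheritance labels are hypothetical, and the sketch ignores any allele-level refinement of the rule (for example, combinations of mild α-thalassemia alleles).

```python
from dataclasses import dataclass, field

# Hypothetical labels for the two inheritance modes on the panel.
AUTOSOMAL_RECESSIVE = "AR"
X_LINKED_RECESSIVE = "XLR"


@dataclass
class Partner:
    sex: str                                        # "F" or "M"
    carrier_of: set = field(default_factory=set)    # names of disorders carried


def at_risk_disorders(female, male, inheritance):
    """Return the disorders for which the couple counts as a carrier couple.

    inheritance maps a disorder name to "AR" or "XLR".
    """
    at_risk = set()
    for disorder, mode in inheritance.items():
        if mode == AUTOSOMAL_RECESSIVE:
            # Both partners must carry the same autosomal recessive disorder.
            if disorder in female.carrier_of and disorder in male.carrier_of:
                at_risk.add(disorder)
        elif mode == X_LINKED_RECESSIVE:
            # For X-linked recessive disorders only the female's status matters.
            if disorder in female.carrier_of:
                at_risk.add(disorder)
    return at_risk


if __name__ == "__main__":
    inheritance = {"PAH deficiency": "AR", "dystrophinopathy": "XLR"}
    woman = Partner("F", {"PAH deficiency"})
    man = Partner("M", {"PAH deficiency"})
    print(at_risk_disorders(woman, man, inheritance))
```

A couple is reported as a carrier couple whenever the returned set is non-empty; a set with two entries corresponds to a couple at risk for two conditions at once.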
We first evaluated the performance of the NGS-based screening assay. The Yan Huang genome sample was utilized as reference material to assess the sequencing depth needed for detecting variants in the target regions. When the sequencing depth was greater than 100-fold, the sensitivity, specificity, accuracy, positive predictive value, and negative predictive value for SNVs in the targeted regions (≥20-fold depth) reached 99.9% (Supplementary Figure 1). Coverage of the targeted regions (≥20-fold depth) was also above 99% (Supplementary Figure 2). We next tested the reproducibility of the assay and detected no inconsistent variants between intra- and inter-run replicates, giving a concordance rate of 100%. The consistency of the same regions at ≥20-fold coverage was over 99.8% (Supplementary Tables 3 and 4).
We also simulated 100 genomes containing 1743 deleterious variants to appraise the performance of the assay. When the mean sequencing depth was ≥100-fold, the accuracy, specificity, and sensitivity of variant detection were all over 99.9% (Supplementary Table 5). Finally, we analyzed 2238 samples with 4322 known variants to validate the ECS panel (Supplementary Table 6). Overall, 4321 of the 4322 variants were detected. The undetected variant was a partial deletion (30 bp) of exon 52 in the DMD gene, which was removed during quality filtering because of the small size of the deletion. On the basis of these validation results, we aimed for a mean sequencing depth of ≥100-fold in the study of 10,476 couples.
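For readers who want to reproduce the performance figures, the metrics named above follow from the standard confusion-matrix definitions. The sketch below is illustrative only: `confusion_metrics` is a hypothetical helper, and the only study numbers used are the 4321 detected variants out of 4322 known variants.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),              # true positive rate
        "specificity": tn / (tn + fp),              # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),                      # positive predictive value
        "npv": tn / (tn + fn),                      # negative predictive value
    }


# Counts taken from the text: 4321 of the 4322 known variants were detected.
KNOWN_VARIANTS = 4322
DETECTED_VARIANTS = 4321
validation_sensitivity = DETECTED_VARIANTS / KNOWN_VARIANTS

if __name__ == "__main__":
    print(f"Validation sensitivity: {validation_sensitivity:.4%}")
```

Only sensitivity can be derived from the known-variant validation set, because that set contains no information about false positives or true negatives.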
The screening population included 10,476 couples (20,952 individuals) with an average age of 29.23 years and a median age of 28 years. Approximately 96.71% (10,131/10,476) of the couples reported ethnic information for both partners, 2.68% (n = 281) for only one partner, and 0.61% (n = 64) for neither partner. In total, 34 self-reported Chinese ethnic groups were included. The largest population was the Han ethnicity (n = 14,233), comprising 67.93%, with minorities accounting for 30.12% (n = 6310) and individuals without a reported ethnic group for 1.95% (n = 409) (Supplementary Table 7). The top ten minority groups in the south of China were all represented in the study, namely the Zhuang, Hui, Miao, Yi, Tujia, Dong, Buyei, Yao, Bai, and Hani. The fourteen ethnic groups with over one hundred representatives in this study were included for further analysis and discussion.
Disease carrier frequencies
Data about the variants identified in this study has been deposited in Genome Variation Map at http://bigd.big.ac.cn/gvm/getProjectDetail?project=GVM000025. The carrier frequencies of the 11 conditions are presented in Table 1. Of the 20,952 individuals screened, 27.49% (n = 5707) were found to be carriers of at least one condition. The most common disease was α-thalassemia, with a rate of 15.12%, extraordinarily high compared with other diseases detected in this study. This was four times greater than PAH deficiency (n = 753, 3.59%) and three times greater than β-thalassemia (n = 995, 4.75%). The carrier frequencies of five disorders are reported for the first time in the Chinese population: WD (n = 410, 1.96%), ARPKD (n = 118, 0.56%), HPABH4A (n = 52, 0.25%), galactosemia (n = 26, 0.12%), and dystrophinopathies (n = 13, 0.12% of females).
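The percentages above, and the "1 in N" figures used later in the discussion, are simple ratios of carrier counts to the number of people screened. The following sketch recomputes a few of them from the counts given in the text; the function names are hypothetical and the denominators are the 20,952 individuals (or 10,476 females for the X-linked dystrophinopathies).

```python
def carrier_frequency(carriers, screened):
    """Fraction of screened individuals who are carriers."""
    return carriers / screened


def one_in_n(carriers, screened):
    """Express a carrier frequency as '1 in N', rounded to the nearest integer."""
    return round(screened / carriers)


INDIVIDUALS = 20952   # all screened individuals
FEMALES = 10476       # denominator for the X-linked condition

# (carrier count, denominator) pairs taken from the text.
counts = {
    "PAH deficiency": (753, INDIVIDUALS),
    "beta-thalassemia": (995, INDIVIDUALS),
    "Wilson disease": (410, INDIVIDUALS),
    "galactosemia": (26, INDIVIDUALS),
    "dystrophinopathies": (13, FEMALES),
}

if __name__ == "__main__":
    for name, (n, denom) in counts.items():
        freq = carrier_frequency(n, denom)
        print(f"{name}: {freq:.2%}, about 1 in {one_in_n(n, denom)}")
```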
Carrier frequencies categorized by ethnicity
The overall carrier frequencies varied substantially between ethnic groups, ranging from 4.15% of individuals of Hani ethnicity to 81.35% of individuals of Li ethnicity (Table 2). Three ethnic groups (the Dai, Zhuang, and Yao) demonstrated high positive rates for at least one disease on the ECS panel (39.82%, 35.70%, and 29.17%, respectively). The Han ethnicity, which is the largest ethnic group in China, had an overall carrier frequency of 25.28%.
Of the 5707 carriers, ~89.07% (n = 5083) were positive for one disorder and 10.93% (n = 624) for two or more disorders (Table 2). The Li ethnicity again ranked first for multiple-disease carriers (n = 60, 10.66%). Individuals from the Yao and Dai ethnicities were more likely than other ethnicities to be multiple-disease carriers as well (6.25% and 6.19%, respectively). Conversely, no Hani and Va individuals carrying multiple disease-causing variants were identified with this ECS panel.
The data were stratified by ethnicity and disease to explore the variability among ethnic groups (Table 3). The selected diseases are widely spread among most of the ethnic groups, and variation in thalassemia carrier frequency contributed most of the variability. The Li, Dai, and Zhuang had carrier frequencies of 78.33%, 27.43%, and 22.83% for α-thalassemia and 9.59%, 11.06%, and 8.60% for β-thalassemia, respectively. Moreover, 14.32% of individuals from the Miao ethnic group were heterozygous for PAH deficiency, which was almost four times the overall frequency in the screened population (3.59%) and twice the carrier frequency among the Sui (7.04%). Finally, the Buyei group was at a greater than average risk of carrying ARPKD (1.80% vs. 0.56% overall) and the Bai of carrying DFNB1 (3.93% vs. 1.66% overall).
Overall, 254 carrier couples (2.42%) were heterozygous for one condition, and one carrier couple (0.01%) was heterozygous for both PAH deficiency and β-thalassemia (Table 4). Unsurprisingly, α-thalassemia was the top disease in carrier couples (n = 137, 1.31%), followed by β-thalassemia (n = 53, 0.51%) and PAH deficiency (n = 34, 0.32%).
The carrier burdens of couples by ethnicity were analyzed and groups represented by fewer than 20 couples are not listed (Supplementary Table 8). Of the forty-five Dai couples, 11.11% (n = 5) were carrier couples, four times greater than the average (2.43%). Of note, only 2.73% (5/183) of the Li couples were identified as carrier couples, despite a high carrier frequency in the Li ethnicity (81.35%).
Selection of disorders
The aims of carrier screening are to detect carriers of genetic disorders, to enhance reproductive autonomy, and to reduce the prevalence of disability and disease in the newborn population. Therefore, one of the important criteria in our selection was disease severity. All the selected diseases were either profound or severe, with the exception of DFNB1 and DFNB4/PDS, which are considered moderate disorders. These two disorders were nevertheless included due to their high prevalence in the Chinese population, which was corroborated by this study. In addition, screening for hearing loss provides information which can facilitate earlier intervention to improve language development in affected children. Although only 11 conditions were selected for this study, we have expanded our panel to a broader range including hundreds of conditions. A more comprehensive dataset will be published at a later date.
Comparison to published carrier frequencies
The carrier rates of thalassemia were higher in our study than published estimates for the Chinese population based on the analysis of specific variants. This might be due to the adoption of targeted next-generation sequencing, which facilitated the identification of rare novel variants as well as reported variants that affect function [3, 6, 31]. Of the 376 identified deleterious or likely deleterious variants in our study, 82.98% (n = 312) had already been reported in the literature but 17.02% (n = 64) were novel, including nonsense, frameshift, deletion or duplication of exons, and splice site variants (Supplementary Table 9). The carrier frequencies of DFNB1 and DFNB4/PDS were similar to published estimates. While Wilson disease has been investigated in a large Han population, the carrier frequency was not reported. The remaining disorders have been subject to less investigation in the Chinese population, and study populations were either too small or limited to patients rather than including healthy individuals.
The difference in carrier frequencies compared with Western populations was striking. For example, the carrier frequency for dystrophinopathies in our study (1 in 806) was lower than that previously reported (1 in 433.7); likewise, 1 in 806 subjects was a carrier of galactosemia, lower than the reported frequency of 1 in 100. Furthermore, the carrier rate for PAH deficiency (1 in 28) was higher than that reported for individuals of Hispanic descent (1 in 163) and southern Europeans (1 in 75). These differences may be attributed to differences in sample size, ethnic groups or variant detection range. These results also indicate that the disorders recommended for screening in Western societies might not be the correct targets in other populations.
The need for expanding carrier screening
Our data suggest a potential need to implement ECS in China for several reasons. First, 27.49% of individuals were carriers for at least 1 of the 11 selected diseases. Even when thalassemia is removed from our analysis, 10.02% of the individuals were still positive for at least one selected disorder. It is reasonable to believe that the frequency will increase when more diseases are included in the screening panel. Second, routine prenatal/preconception clinical screening for monogenic diseases (with the exception of thalassemia) is a novel practice in Chinese health care [17,18,19,20,33]. ECS would detect carrier couples at risk for monogenic diseases other than thalassemia, who comprised 6.2‰ (65/10,476) of our study population (Table 4). Third, one couple was found to be at risk of having a fetus with both β-thalassemia intermedia and PAH deficiency simultaneously. Traditional screening strategies would require multiple tests to identify two or more disorders. Fourth, according to calculations using a published method, the modeled risk of fetuses being affected by any of the selected disorders was 531 per 100,000 (95% CI, 497–567 per 100,000), which translates into approximately 1 in 188 genetically affected births. This rate is comparable to the frequency of neonates affected by Down syndrome, for which free, government-subsidized screening is offered in routine antenatal tests in China.
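The unit conversions in this paragraph can be checked with a few lines of arithmetic. The sketch below only converts the reported figures between units; it does not reproduce the published risk model, and the helper name `per_100k_to_one_in` is hypothetical.

```python
def per_100k_to_one_in(rate_per_100k):
    """Convert a rate per 100,000 into the N of a '1 in N' statement."""
    return 100_000 / rate_per_100k


# Modeled fetal risk and its 95% CI, as reported in the text (per 100,000).
POINT, CI_LOW, CI_HIGH = 531, 497, 567

one_in_point = per_100k_to_one_in(POINT)
# A higher rate means a smaller N, so the CI bounds swap places.
one_in_range = (per_100k_to_one_in(CI_HIGH), per_100k_to_one_in(CI_LOW))

# Carrier couples missed by thalassemia-only screening, per 1000 couples.
non_thalassemia_per_mille = 65 / 10476 * 1000

if __name__ == "__main__":
    print(f"about 1 in {one_in_point:.0f} "
          f"(range 1 in {one_in_range[0]:.0f} to 1 in {one_in_range[1]:.0f})")
    print(f"{non_thalassemia_per_mille:.1f} per 1000 couples")
```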
We recommend that the screening strategy be universal rather than based on ethnicity. There are five reasons for this recommendation: (1) Thalassemia screening is currently recommended in southern China for all couples who are considering pregnancy or are already pregnant, regardless of their ethnicity, although the ethnic variability of thalassemia carrier frequency has been reported [18, 19]. This serves as a good example which ECS should follow. (2) We have illustrated the variability of disease carrier frequencies in the different ethnic groups (Table 3), the complexity of which makes it unrealistic to develop screening panels for each of the 56 ethnic groups. (3) Information regarding variants in the population to be tested would be needed, but such data are barely available for the minority groups, especially those with a smaller population. (4) Ethnic-based screening may raise the risk of discrimination or stigmatization. (5) Internal geographical mobility is continuously increasing in China, which increases the possibility of mixed marriages and hinders ethnic identification. In our data, there were over 1127 (>11%) couples of mixed ethnicity (Supplementary Table 8).
Challenges for genetic counseling
Undoubtedly, genetic counseling is an essential part of implementing carrier screening. Our study provides valuable information about causative mutation(s) and mutation frequency in different ethnic groups, which will greatly help clinical genetic counselors in risk evaluation. However, challenges remain, especially in the interpretation of variants of uncertain significance (VUS) generated from NGS. For instance, while the pathogenicity of VUS is controversial or the evidence for their classification is insufficient, they cannot be neglected because VUS have been reported as disease etiologies in some patients. In our data, the paternal partner in one couple had the deleterious variant NM_000277.1: c.331C>T p.(Arg111*) in the PAH gene, while the maternal partner had a VUS NM_000277.1: c.1144T>A p.(Phe382Ile) in the same gene. Determining the pathogenicity of p.(Phe382Ile) would be crucial for reproductive advice in this case. Furthermore, for females with an X-linked VUS detected, pathogenicity evaluation is necessary to properly evaluate the implications for reproductive risk.
Furthermore, our data revealed that the variant spectrum of diseases in an ethnicity is uncertain and complex, which complicates genetic counseling. Specifically, around 2.73% of Li couples were found to be carrier couples (Supplementary Table 8), despite a much higher individual carrier rate of 81.35% among the Li (the α-thalassemia carrier frequency was 78.33%). This phenomenon was mainly due to the complicated variant spectrum of α-thalassemia. In the Li ethnicity, the –α3.7 and –α4.2 alleles account for 72.95% of mutated alleles. Since the combination of –α3.7 and –α4.2 results in α-thalassemia trait, a mild phenotype, couples with the two alleles were not classified as carrier couples.
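A quick arithmetic check shows how much this allele-level interpretation matters. The sketch below assumes, purely for illustration, that Li individuals pair at random with respect to carrier status and that every pairing of two α-thalassemia carriers would be counted as a carrier couple; neither assumption comes from the study, and the variable names are hypothetical.

```python
# Reported alpha-thalassemia carrier frequency among Li individuals.
li_alpha_carrier_freq = 0.7833

# Illustrative assumption: random pairing, and every carrier-carrier
# pairing counted as a carrier couple regardless of the alleles involved.
naive_couple_rate = li_alpha_carrier_freq ** 2

# Observed carrier couples among Li couples, from the text (5 of 183).
observed_couple_rate = 5 / 183

if __name__ == "__main__":
    print(f"naive expectation: {naive_couple_rate:.2%}")
    print(f"observed:          {observed_couple_rate:.2%}")
```

Under these assumptions the naive rule would flag well over half of Li couples, far above the observed 2.73%, which is consistent with most pairings involving the mild –α3.7 and –α4.2 alleles.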
Clinical utility of ECS
Our study revealed a wide range of carrier frequencies among the different ethnic groups in Southern China, highlighting the potential value of ECS. However, we only evaluated the clinical utility of ECS in terms of carrier and risk couple frequency distribution and test feasibility and validity. A proposed framework for the assessment of the clinical utility of genetic testing includes the feasibility and validity of the test, diagnostic thinking, therapeutic choices, patient outcomes and societal impacts. Aspects other than test feasibility and validity and carrier and risk couple frequency need to be evaluated in future research for the full assessment of the clinical utility of ECS.
In summary, we report the first expanded carrier screening in China of 10,476 couples with diverse ethnicities. The data not only revealed the carrier frequencies of 11 monogenic disorders, but also suggested the potential utility of ECS in regions with diverse ethnic backgrounds. This experience should be applicable to other countries or regions with diverse ethnic groups. We note that most of the individuals screened were of Han ethnicity (67.93%), and sample numbers of several ethnic groups were limited. However, this is consistent with the demographic characteristics of the Chinese population, in which the Han make up the majority (91.52%) according to the Sixth National Population Census in 2010. Nevertheless, we expect further examinations of these minority groups using a larger study population. Finally, we argue for the necessity of expanded carrier screening as a promising alternative and for genetic counseling, which is indispensable in the responsible implementation of expanded carrier screening.
We are grateful for the contributions of all the employees in Tianjin and Shenzhen Medical Laboratory, BGI. They provided strong support in library construction, sequencing, data processing, etc. The study was supported by Tianjin Municipal Science and Technology Special Funds for Enterprise Development (No. 14ZXLJSY00320), Pearl River Nova Program of Guangzhou (No. 201506010100), Special Foundation for High-level Talents of Guangdong (2016TX03R171) and Natural Science Foundation of Guangdong Province (2017A030313673).