The value of single-molecule real-time technology in the diagnosis of rare thalassemia variants and analysis of phenotype–genotype correlation

To compare single-molecule real-time (SMRT) technology with conventional genetic diagnostic technology for rare types of thalassemia mutations, and to analyze the molecular characteristics and phenotypes of rare thalassemia gene variants, we enrolled 434 cases with positive hematology screening and performed thalassemia gene screening with both SMRT technology and conventional gene diagnosis technology [Gap-PCR, multiplex ligation-dependent probe amplification (MLPA), and PCR-reverse dot blot (RDB)]. Among the 434 enrolled cases, conventional technology identified 318 patients with variants (73.27%) and 116 patients without variants (26.73%), whereas SMRT identified 361 patients with variants (83.18%) and 73 patients without variants (16.82%). The positive detection rate of SMRT was 9.91 percentage points higher than that of conventional technology. Together, the two methods identified 485 positive alleles among 49 types of variants. The genotypes of 354 cases were concordant between the two methods, while 80 cases were discordant. Among the 80 discordant cases, 76 had variants identified only by SMRT, 3 had variants identified only by the conventional methods, and 1 was a false positive result of the conventional PCR detection technology. Except for the three variants at the HS40 and HBG1-HBG2 loci, which were beyond the design of the SMRT assay used in this study, all other discordant variants identified by SMRT were validated by Sanger sequencing or MLPA. The hematological phenotypic parameters of the 80 discordant cases were also analyzed. SMRT technology increased the positive detection rate of thalassemia genes and detected rare thalassemia cases with variable phenotypes, which is of great significance for clinical thalassemia gene screening.


INTRODUCTION
Thalassemia, also known as Mediterranean anemia, is a hereditary hemolytic anemia mainly caused by deletions or point mutations of globin genes. It is one of the most common single-gene diseases in the world. Thalassemia gene carriers comprise ~1.67% of the global population and are mainly distributed along the Mediterranean coast and in North Africa, the Middle East, the Indian subcontinent, Southeast Asia, and southern China [1]. Thalassemia is one of the most common genetic diseases in southern China. The pathogenic variants of thalassemia include single-nucleotide variations (SNVs), indels, large-fragment copy number variants (CNVs), and structural variations (SVs). Among them, α-thalassemia is mainly caused by large-fragment deletions, and β-thalassemia mainly involves point mutations. In China, simple and low-cost red blood cell and hemoglobin tests are used as a first-tier screening strategy, and molecular diagnosis is then performed for individuals with positive blood test results. Conventional molecular diagnosis methods for detecting thalassemia genes include Gap-PCR, PCR-RDB, PCR-flow fluorescence hybridization, and MLPA. Other common technologies used in China include gene chip, Sanger sequencing, and next generation sequencing (NGS). Conventional screening methods can detect only a limited spectrum of gene mutations, which sometimes leads to misdiagnosis. NGS used in thalassemia screening can effectively reduce the need for various types of conventional genetic testing, but diagnoses can still be missed [2]. Although the probe hybridization target capture NGS method can simultaneously detect deletions and SNVs/indels, the detection cost is high and the accuracy is not ideal. Gap-PCR combined with NGS is currently used to compensate for the shortcomings of NGS capture sequencing.
In addition, owing to the high homology between the HBA2 and HBA1 genes, short-read NGS cannot distinguish HBA2 from HBA1 effectively [3,4]. With the advantage of long-read sequencing, PacBio single-molecule real-time (SMRT) sequencing technology has been used for comprehensive and precise thalassemia testing [5,6]. In this study, SMRT technology and conventional methods were applied to 434 suspected carriers of thalassemia to simultaneously detect deletional and non-deletional variants of α-thalassemia and β-thalassemia. Compared with conventional methods, SMRT technology detected more abnormal hemoglobin variant sites on the HBA1, HBA2, and HBB genes, which illustrates the value of SMRT technology in the diagnosis of common and rare types of α-thalassemia and β-thalassemia variants.

PATIENTS AND METHODS
Patients
A total of 434 patients who attended Liuzhou Maternal and Child Health Hospital in Guangxi, China, from January 2018 to December 2020 were included in the study. Enrolled patients met at least one of the following inclusion criteria: (1) routine hematology examination showed abnormal mean corpuscular volume (MCV ≤ 80 fL) and/or mean corpuscular hemoglobin (MCH ≤ 27 pg); (2) hemoglobin electrophoresis showed HbA2 < 2.5% or HbA2 ≥ 3.5%, elevated HbF, or abnormal hemoglobin; (3) the results of conventional genetic diagnosis were inconsistent with the hematology phenotype; (4) the patient had given birth to children with moderate or severe thalassemia; and (5) there may have been abnormalities outside the scope of conventional genetic testing techniques. The exclusion criteria were: (1) incomplete basic clinical data; (2) other blood diseases; and (3) mental abnormalities or cognitive dysfunction. The study group comprised 185 males and 249 females, aged 3 days to 56 years, with an average age of 26.4 ± 12.59 years. This study was approved by the ethics committee of our hospital, and all research subjects or their legal guardians signed an informed consent form.
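The numeric thresholds in inclusion criteria (1) and (2) can be expressed as a simple check. The following is a minimal sketch, not the screening software used in the study: the `ScreeningResult` record and the function name are hypothetical, and criteria (3) to (5) are left out because they depend on clinical judgment rather than fixed cut-offs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScreeningResult:
    """Hypothetical record of first-tier screening values for one patient."""
    mcv_fl: Optional[float] = None    # mean corpuscular volume, fL
    mch_pg: Optional[float] = None    # mean corpuscular hemoglobin, pg
    hba2_pct: Optional[float] = None  # HbA2 fraction, %
    hbf_elevated: bool = False        # elevated HbF reported
    abnormal_hb: bool = False         # abnormal hemoglobin reported


def meets_hematology_criteria(r: ScreeningResult) -> bool:
    """Return True if inclusion criterion (1) or (2) from the text is met."""
    low_mcv = r.mcv_fl is not None and r.mcv_fl <= 80
    low_mch = r.mch_pg is not None and r.mch_pg <= 27
    hba2_out = r.hba2_pct is not None and (r.hba2_pct < 2.5 or r.hba2_pct >= 3.5)
    return low_mcv or low_mch or hba2_out or r.hbf_elevated or r.abnormal_hb


# Illustrative values only; these are not patients from the cohort.
print(meets_hematology_criteria(ScreeningResult(mcv_fl=78.0, mch_pg=29.0)))
print(meets_hematology_criteria(ScreeningResult(mcv_fl=90.0, mch_pg=30.0, hba2_pct=2.8)))
```

Missing measurements are treated as "not abnormal" here, which is a design assumption of the sketch; a real workflow would more likely flag incomplete data for review.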

Methods
Hematology and hemoglobin electrophoresis analysis. An automatic blood cell analyzer was used for routine blood analyses, and high-performance liquid chromatography was used for hemoglobin analysis to detect HbF, HbA2, HbH, and other hemoglobin variants.
Genomic DNA extraction. The magnetic bead method was used to extract nucleic acids (LabAid820; Xiamen Zhishan Biotechnology, Xiamen, China). A nucleic acid analyzer (ASP-2680; ACTGene, Piscataway, NJ, USA) was used to measure DNA concentration and purity. The A260/A280 ratio of extracted DNA was between 1.6 and 1.9, and the concentration was 20-30 ng/µL.
α-thalassemia and β-thalassemia genotyping. Genomic DNA extracted from peripheral blood was used for thalassemia testing. Gap-PCR (Yishengtang, Shenzhen, China) was performed for the four common α-thalassemia deletions [--SEA (Southeast Asia), −α3.7 (rightward), −α4.2 (leftward), and --THAI (Thailand)]. PCR-RDB assay (Yishengtang, Shenzhen, China) was performed for the three common non-deletional α-thalassemia mutations including Hb
SMRT and data analysis. Genomic DNA was extracted from peripheral blood leukocytes using the QIAamp DNA blood mini kit (Qiagen, Hilden, Germany). Purified DNA samples were quantified using the Qubit dsDNA BR assay kit (Thermo Fisher Scientific, Waltham, MA, USA) on a Qubit 2.0 fluorometer (Life Technologies, Carlsbad, CA, USA). Samples were sent to an independent laboratory (Berry Genomics, Beijing, China) for sequencing on the Sequel II platform (PacBio, Menlo Park, CA, USA) and for data analysis. Briefly, genomic DNA samples were subjected to multiplex long-molecule PCR using optimized primers to generate specific amplicons that encompassed currently known structural variation (SV) regions, single-nucleotide variations (SNVs), and indels (insertions and deletions) in the HBA1, HBA2, and HBB genes. After purification and end repair, barcoded adapters were ligated to the 5′ and 3′ ends, and SMRTbell libraries were prepared using the Sequel Binding and Internal Ctrl Kit 3.0 (PacBio).
Primed DNA-polymerase complexes were loaded onto SMRT cells (PacBio), and sequencing was performed on the PacBio Sequel II system to generate 10-25 subreads per molecule. Following alignment of subreads, the circular consensus sequence was mapped to the GRCh38 reference, and variants were identified with FreeBayes software (version 1.2.0). Variant pathogenicity was classified according to general guidelines and information provided in hemoglobin variant databases. Phenotypes were finally assigned from known genotype-phenotype associations. Large deletion variants were confirmed by Gap-PCR or MLPA. SNVs and indels were confirmed by PCR-RDB or Sanger sequencing.
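The idea behind building a consensus from repeated passes over the same molecule can be illustrated with a toy per-position majority vote. This is only a sketch under strong assumptions: it is not the PacBio circular consensus algorithm, it assumes the subreads are already aligned and of equal length, and the function name and the `min_passes` parameter are hypothetical (the default of 10 simply mirrors the lower end of the 10-25 subreads reported above and is not a documented threshold).

```python
from collections import Counter
from typing import List


def majority_consensus(subreads: List[str], min_passes: int = 10) -> str:
    """Toy consensus: most frequent base at each position of pre-aligned subreads."""
    if len(subreads) < min_passes:
        raise ValueError("too few subreads for a consensus call")
    length = len(subreads[0])
    if any(len(read) != length for read in subreads):
        raise ValueError("subreads must be pre-aligned to equal length")
    # zip(*subreads) yields one column of bases per position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*subreads))


# Nine error-free passes and one pass with a single random error (toy data).
passes = ["ACGT"] * 9 + ["ACTT"]
print(majority_consensus(passes))
```

Because sequencing errors in individual passes are largely random, they are outvoted by the correct base when enough passes cover each position, which is the intuition behind the high consensus accuracy of this approach.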
Sanger sequencing for HBA and HBB gene

RESULTS
Comparison of genotyping results between traditional methods and SMRT technology
Of the 49 types of variants, 30 types (61.23%) were detected by SMRT only, two types (4.08%) were detected by conventional methods only, and 17 types (34.69%) were detected by both methods (Fig. 2). Of the 485 positive alleles, 77 (15.88%) were identified by SMRT only, three (0.62%) by conventional methods only, and 405 (83.50%) by both methods (Table 1 and Fig. 2). A total of 354 cases had completely concordant results between the two types of techniques, including 73 negative and 281 positive cases (Table 2 and Fig. 3), while 80 cases had discordant results (Fig. 3). Of the 80 cases, 14 patients had rare deletions and triplicate α-globin genes (Table 3), 16 patients had rare variants in the α-globin gene (Table 4), 49 patients had rare variants in the β-globin gene (Table 5), and one case had a −α3.7 deletion call that differed between the methods. The genotype of sample D141966 was −α3.7/αα by the SMRT method but −α3.7/−α3.7 by conventional Gap-PCR; validation by MLPA agreed with the SMRT result (Fig. 4). IGV plots of selected samples show the thalassemia variants identified by SMRT (Fig. 5).

Hematology examination and hemoglobin electrophoresis results in patients with rare deletions and triplicate α-globin genes
A total of 14 rare cases of α- and β-globin gene deletions or triplicate α-globin genes were found (Table 3). Among them, hemoglobin electrophoresis of the patients with --SEA/−α2.4 and −α3.7/HS-40 deletion showed an HbH peak.
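The case, allele, and variant-type tallies reported in the genotyping comparison can be cross-checked with simple arithmetic. The sketch below only restates counts given in the text; the dictionary labels and helper names are illustrative choices, not terms from the study.

```python
TOTAL_CASES = 434

concordant_cases = {"negative": 73, "positive": 281}
discordant_cases = {
    "rare deletions or triplicate alpha-globin genes": 14,
    "rare alpha-globin variants": 16,
    "rare beta-globin variants": 49,
    "discordant -alpha3.7 call": 1,
}
alleles = {"SMRT only": 77, "conventional only": 3, "both": 405}
variant_types = {"SMRT only": 30, "conventional only": 2, "both": 17}


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, in percent, rounded to two decimals."""
    return round(100 * part / whole, 2)


def shares(counts: dict) -> dict:
    """Percentage share of each category within its own total."""
    whole = sum(counts.values())
    return {label: percent(n, whole) for label, n in counts.items()}


n_concordant = sum(concordant_cases.values())  # 354
n_discordant = sum(discordant_cases.values())  # 80
assert n_concordant + n_discordant == TOTAL_CASES

print(sum(alleles.values()), shares(alleles))
print(sum(variant_types.values()), shares(variant_types))
```

Direct rounding gives 61.22% for 30/49 and 83.51% for 405/485, whereas the text reports 61.23% and 83.50%; the last-digit difference possibly reflects an adjustment so that each set of percentages sums to 100%.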

Hematology examination and hemoglobin electrophoresis results in patients with rare variants in HBB
The c.−100G>A, c.−136C>G, c.315+5G>C, c.380T>G, and c.−81A>C variants were β+ thalassemia alleles (partial loss of function of the β-globin gene), and heterozygotes for these variants showed silent or mild β-thalassemia. c.91A>G was a β0 thalassemia allele (complete loss of function of the β-globin gene), and heterozygotes manifested mild β-thalassemia. Heterozygotes for c.170G>A, c.431A>G, c.232C>T, c.341T>A, and c.431A>G had a normal hematological phenotype, with HbA2 and HbF contents within the reference range, but abnormal hemoglobins were detected. In the case with c.341T>A/c.315+5G>C, abnormal hemoglobin accounted for 93.5%, and HbA was almost undetectable. The HbA2 content of the case with the heterozygous c.431A>G variant was increased. The heterozygous c.−248A>G variant showed a normal hematological phenotype, mainly manifesting as decreased HbA2 content. c.316-45G>C, c.316-179A>C, and c.315+308delA combined with other types of α-thalassemia manifested as silent or mild α-thalassemia (Table 5).

DISCUSSION
Thalassemia is a single-gene disease that is difficult to cure but comparatively straightforward to diagnose and prevent clinically. Its gene mutation types are diverse and complex. As of 2021, the LOVD database (https://databases.lovd.nl/shared/genes) contained more than 2000 thalassemia and abnormal hemoglobin-related variant sites, and most of these sites, especially those of the large deletion variant type, have not been examined by conventional genetic testing methods. At present, common clinical testing techniques for thalassemia genes include Gap-PCR, reverse dot hybridization, PCR-flow fluorescence hybridization, gene chip, MLPA, Sanger sequencing, and next generation sequencing. The conventional screening mode can detect variants only at known gene loci, which is far from sufficient for the detection of other variant loci and leads to missed diagnoses and misdiagnoses. There is therefore an urgent need for more accurate and effective diagnostic techniques to screen thalassemia patients in clinical practice. In recent years, there have been reports of thalassemia missed by conventional genetic testing methods. SMRT technology can analyze the thalassemia genes without fragmenting the DNA and can directly read the full-length gene sequence. The DNA does not need to be amplified by PCR during sequencing, which facilitates individual sequencing of each DNA molecule, and the technology offers very long read lengths (up to 30-100 kb), high accuracy (QV30 > 99.8%), no GC preference, and single-molecule resolution [7]. SMRT technology can facilitate the simultaneous detection of α-thalassemia and β-thalassemia in 1 μL of whole blood or 10-15 mL of amniotic fluid sample.
It can also detect hotspot and rare variant sites and their arrangements with high accuracy, including comprehensive coverage of 2062 variant sites related to thalassemia, and detection of 18 α-globin gene deletion variants, four α-globin gene triplications, and two β-globin gene deletions. It can test 96 samples at a time with high efficiency and accuracy. Xu et al. [5] first used SMRT to sequence full-length thalassemia-related genes (HBA1/2 and HBB) to obtain complete variant information for both alleles, which is difficult to obtain by conventional genetic testing techniques. Twelve hospitals in southern China assessed a comprehensive analysis of thalassemia alleles (CATSA) for identifying both α- and β-thalassemia genetic carrier status by third-generation sequencing (TGS). Compared with standard thalassemia variant PCR panel testing, TGS detected 33 more positive variants and showed that the traditional PCR detection technology had produced 1 false negative and 8 false positive results [6]. The present study used SMRT and conventional technologies to test for thalassemia genes in the screening-positive population in this area. The results showed that the proportion of thalassemia gene variants was high, the genotypes were complex, and the rare variant types and their phenotypes were diverse. Among the 434 cases, 49 variant types were detected, of which 19 were detected by conventional technology and 47 by SMRT technology; SMRT thus detected 28 more variant types than conventional technology. The positive detection rate of SMRT was 9.91 percentage points higher than that of conventional technology. At present, the detection range of the reagents we used includes only 2062 variant sites related to thalassemia on the HBA1/2 and HBB genes. The HS-40 deletion occurs upstream of the α-globin gene cluster, and the HBG1-HBG2 deletion occurs upstream of the HBB gene cluster.
The SMRT assay used in this study focused on detection of variants in the HBA1, HBA2, and HBB genes, which account for the vast majority of thalassemia variants. With additional primer pairs, SMRT technology should be able to detect HS-40 and HBG1-HBG2 deletions; however, the sequencing cost increases with more primer pairs [7]. This was therefore a limitation of the assay design used in this study, not of SMRT technology itself.
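The detection figures discussed above follow directly from the reported counts, and working them through makes the distinction between a percentage-point difference and a relative increase explicit. This is a minimal sketch using only numbers stated in this study; the variable and function names are illustrative.

```python
TOTAL = 434
conventional_positive = 318
smrt_positive = 361


def positive_rate(positives: int, total: int) -> float:
    """Positive detection rate in percent, rounded to two decimals."""
    return round(100 * positives / total, 2)


extra_cases = smrt_positive - conventional_positive
# Absolute difference between the two rates, in percentage points.
point_gain = round(100 * extra_cases / TOTAL, 2)
# Increase relative to the conventional positive count, in percent.
relative_gain = round(100 * extra_cases / conventional_positive, 2)

# Variant types by detecting method, as reported in the genotyping comparison.
both, smrt_only, conventional_only = 17, 30, 2
types_conventional = both + conventional_only
types_smrt = both + smrt_only

print(positive_rate(conventional_positive, TOTAL), positive_rate(smrt_positive, TOTAL))
print(point_gain, relative_gain)
print(types_conventional, types_smrt, types_smrt - types_conventional)
```

The 9.91 figure is the absolute gap between 73.27% and 83.18%; relative to the 318 cases found by conventional methods, the 43 additional positive cases correspond to an increase of about 13.5%.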
This study found 14 cases of rare deletions or triplicate α-globin genes. Among them, the patients with --SEA/−α2.4 and −α3.7/HS-40 deletion all manifested HbH disease [8,9]. Carriers of αααanti3.7 and αααanti4.2 had normal phenotypes, but HbF was significantly increased by 12.3% and 16.2%, respectively, and HbA2 was reduced. When compounded with β-thalassemia, α-globin triplication can manifest as intermediate β-thalassemia because it aggravates the imbalance between the α and β chains, and HbF is also significantly increased [10,11]. In the present study, among the thalassemia carriers genotyped as −α3.7/αα by conventional methods, two were found to be HKαα/αα using SMRT technology, a misdiagnosis rate of 4.17% (2/48). HKαα/αα patients presented with silent α-thalassemia, and HKαα/--SEA patients presented with mild α-thalassemia, which is consistent with past reports [12]. Although the case with the HBG1-HBG2 deletion combined with c.126_129delCTTT/WT had two allelic variants in the HBB gene, HBG1-HBG2 is functionally silenced in adulthood and did not affect the expression of β-globin, so the clinical presentation was mild β-thalassemia. The SMRT method showed that the genotype of sample D141966 was −α3.7/αα, whereas conventional Gap-PCR gave −α3.7/−α3.7. Validation by MLPA confirmed that D141966 had a heterozygous −α3.7 deletion. To investigate the basis of this discordance, we analyzed the SNVs/indels in the αα allele identified by the SMRT method and found three SNPs at the 3′ end of HBA2, which caused dropout of the αα allele in the conventional Gap-PCR method, whose primer is designed in this region.
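The allele dropout reasoning for sample D141966 amounts to asking whether any variant falls inside a primer binding site. The sketch below illustrates that check only; the primer interval and variant positions are entirely hypothetical placeholders, since the actual Gap-PCR primer coordinates and SNP positions are not given in the text.

```python
from typing import List, NamedTuple


class Primer(NamedTuple):
    name: str
    start: int  # 1-based inclusive start of the binding site
    end: int    # 1-based inclusive end of the binding site


def variants_under_primer(primer: Primer, positions: List[int]) -> List[int]:
    """Return the variant positions that fall inside the primer binding site."""
    return sorted(p for p in positions if primer.start <= p <= primer.end)


# Hypothetical coordinates for illustration; not taken from the study.
reverse_primer = Primer("hypothetical_reverse_primer", 1000, 1021)
snv_positions = [1003, 1010, 1018, 1500]

hits = variants_under_primer(reverse_primer, snv_positions)
print(hits)
```

A non-empty result flags a primer whose annealing may be impaired on that allele. In a sample that also carries a deletion on the other chromosome, failed amplification of the normal allele would make a heterozygote look homozygous for the deletion, which is the pattern described for D141966.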
This study found 16 rare HBA gene variants. Among them, c.34A>C, c.51G>C, c.84G>T, and c.19G>T were located in the HBA1 gene. Carriers of c.34A>C and c.51G>C showed normal hematology, and abnormal hemoglobin was detected [13,14]. Carriers with HBA1:c.84G>T, HBA1:c.19G>T, and HBA1:c.55G>C genotypes had a normal blood phenotype; when these variants were compounded with other deletion types, the phenotype could be mild or silent [15-17]. Carriers of these gene variants all showed abnormal hemoglobin, and no HbH phenotype was found when they were compounded with the Southeast Asian deletion. In the --SEA/HBA2:c.2delT, --SEA/HBA2:c.2T>C, and --SEA/HBA2:c.52G>T genotypes, the variants are located in the more functional HBA2 gene and impair α chain synthesis, showing that non-deletional HbH disease was more severe than deletional HbH disease [18-20]. HBA2:c.91G>C has a normal phenotype; the main manifestations are abnormal hemoglobin and reduced HbA2 [21]. Qadah et al. [22] reported that the HBA2:c.−59C>T variant reduced the transcription level of HBA2 by 53.7%. Our study reports, for the first time, compound heterozygous cases of HBA2:c.−59C>T and HBA2:c.91G>C. Hemoglobin electrophoresis detected abnormal hemoglobin peaks at 3.784 min and 4.349 min, and the routine blood phenotype was normal. Because the peak time of the abnormal hemoglobin was very close to that of HbA2, it could easily be misread as a significant increase in HbA2.

HBA2:c.256G>C has been reported previously [23], but there are no related reports of HBA2:c.256G>A. The phenotype of this case was normal; the main manifestations were abnormal hemoglobin and reduced HbA2.
Among the rare variants in the HBB gene, carriers of c.−100G>A, c.−136C>G, c.315+5G>C, c.380T>G, and c.−81A>C manifested silent or mild β-thalassemia. The normal hematological phenotype of some cases is consistent with related reports [24-27]. The carrier of c.91A>G manifested mild β-thalassemia, which is consistent with related reports [28]. Carriers of c.170G>A, c.431A>G, c.232C>T, c.341T>A, and c.431A>G had normal hematological phenotypes; the contents of HbA2 and HbF were within the reference range, and abnormal hemoglobin was detected [29-32]. In the c.341T>A/c.315+5G>C case, reported here for the first time, abnormal hemoglobin accounted for 93.5%, and HbA was not detected. In the carrier of c.431A>G, the peaks of the abnormal hemoglobin and HbA2 overlapped, so the content of each component could not be measured correctly. The routine blood examination of the c.−248A>G carrier was normal, and the main manifestation was a decrease in HbA2 content [33]. c.316-45G>C, c.316-179A>C, and c.315+308delA combined with other types of α-thalassemia may present as silent or mild α-thalassemia.

CONCLUSIONS
In summary, the incidences of rare gene variants and abnormal hemoglobin cases in this region were high. SMRT technology used in the genetic diagnosis of thalassemia had a wide detection spectrum, with improved efficiency and accuracy over conventional methods. The application of this technology has greatly enriched the thalassemia gene mutation bank and hemoglobin gene profiles in the region and provides a reference for better prevention and control of thalassemia. However, the pathogenicity of many rare variants is still unclear, and family analysis is required, which poses great challenges for clinical genetic counseling.

DATA AVAILABILITY
Individual participant data describing the results reported in this article, after deidentification (text, tables, figures, and appendices), together with the study protocol, will be available, beginning 9 months and ending 36 months following article publication. Data will be available for investigators whose proposed use of the data has been approved by an independent review committee identified for this purpose and for individual participant data meta-analysis.