The prevalence of thalassemia in mainland China: evidence from epidemiological surveys

Comprehensive data regarding the epidemiology and prevalence of thalassemia in mainland China are lacking. To assess the prevalence of thalassemia, we performed a meta-analysis including 16 articles published from 1981 to 2015. The overall prevalence of α-thalassemia, β-thalassemia and α + β-thalassemia was 7.88%, 2.21% and 0.48%, respectively. Trends in thalassemia prevalence in mainland China were not steady; a prevalence map based on a geographic information system (GIS) showed that the prevalence of thalassemia was highest in the south of China and decreased from south to north. Additionally, the most common α- and β-globin gene mutations were --SEA and CD41/42, respectively. The current study provides valuable information regarding epidemiology and intervention and supports the planning, implementation and management of prevention programmes for public health.

Table S2 shows the detailed scores of the 16 included studies. Study quality was assessed on five items with a maximum total score of 10. The results indicated that all the included studies were eligible: one study obtained full marks, four studies scored 9, 10 studies scored 8 and one study scored 7.

Meta-regression.
A meta-regression was performed to explore the sources of heterogeneity. We considered several potential factors: total sample size, quality score, diagnostic method, age range, survey date, location, and sampling method. None of these factors was identified as a source of heterogeneity for α-thalassemia (all p > 0.05). However, diagnostic method (p < 0.001) and survey date (NA, p = 0.02) were identified as sources of heterogeneity for β-thalassemia. Tables S3 and S4 present the results of this meta-regression.
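The idea behind a meta-regression on a single study-level factor can be sketched as an inverse-variance weighted regression. This is a simplified fixed-effect illustration, not the Stata procedure used in the study; the function name `weighted_meta_regression` and the numbers in the usage note are hypothetical.

```python
def weighted_meta_regression(effects, variances, covariate):
    """Inverse-variance weighted regression of study effects on one covariate.

    Simplified fixed-effect sketch (assumption): within-study variances are
    treated as known and no between-study variance term is estimated.
    Returns (intercept, slope, z statistic of the slope).
    """
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    # Weighted means of covariate and effect
    x_bar = sum(w * x for w, x in zip(weights, covariate)) / total_w
    y_bar = sum(w * y for w, y in zip(weights, effects)) / total_w
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, covariate))
    if sxx == 0:
        raise ValueError("covariate has no variation across studies")
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, covariate, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = (1.0 / sxx) ** 0.5  # variance of the slope is 1 / Sxx
    return intercept, slope, slope / se_slope
```

For example, with made-up prevalences `[0.02, 0.02, 0.05, 0.05]`, equal variances of `1e-4`, and a binary covariate `[0, 0, 1, 1]` standing in for diagnostic method, the slope is the difference in prevalence between the two groups of studies.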
In addition, removing the study by Zeng et al. 8 markedly changed the pooled prevalence of β-thalassemia. Similarly, removing either of the studies by Xu et al. 17 and Yin et al. 14 altered the overall prevalence of α + β-thalassemia. More information is listed in Table S5.
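Removing one study at a time and re-pooling, as described here, can be sketched as follows. The pooling function is an assumption: a crude pooled proportion stands in for the random-effects estimator, and the names `crude_prevalence` and `leave_one_out` are hypothetical.

```python
def crude_prevalence(events, totals):
    """Crude pooled proportion; a stand-in (assumption) for the study's estimator."""
    return sum(events) / sum(totals)


def leave_one_out(labels, events, totals, pool=crude_prevalence):
    """Re-estimate the pooled prevalence with each study omitted in turn.

    Returns a dict mapping the omitted study's label to the pooled estimate
    computed from the remaining studies.
    """
    results = {}
    for i, label in enumerate(labels):
        remaining_events = events[:i] + events[i + 1:]
        remaining_totals = totals[:i] + totals[i + 1:]
        results[label] = pool(remaining_events, remaining_totals)
    return results
```

A study whose omission shifts the estimate far from the full-data value is the kind of influential study the sensitivity analysis flags. Any other pooling function with the same signature can be passed as `pool`.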

Discussion
To the best of our knowledge, this study is the first meta-analysis of epidemiological studies on the prevalence of thalassemia in mainland China. Our results indicated that the pooled prevalence of α-, β- and α + β-thalassemia was 7.88%, 2.21% and 0.48%, respectively. The geographic distribution of thalassemia showed that the prevalence was highest in the south of China and decreased from south to north. Thalassemia is a genetic disease for which it is possible to detect carriers using haematological indices rather than DNA analysis 26. An increased HbA2 level in peripheral venous blood is the most important feature for identifying heterozygous β-thalassemia 27. For β-thalassemia screening, people with increased HbA2 levels (HbA2 > 3.5%) are diagnosed with β-thalassemia 7. Therefore, three β-thalassemia studies that screened without gene analysis (Zeng et al. in 1987 8, Ma et al. 20 and Liu et al. 21) were included in our meta-analysis. In determining the prevalence of α-thalassemia, cord blood samples were quantified for Hb Bart's before genetic analysis was widely applied to α-thalassemia diagnosis. Pan et al. 6 employed haemoglobin electrophoresis for cord blood and gene analysis simultaneously and found that cases of α-thalassemia, including heterozygous α-thalassemia, were unlikely to be missed when using a 2% cut-off of Hb Bart's. The study by Zeng et al. 8, in which 12,821 cord blood samples from newborns were screened by electrophoresis, was the only included study on α-thalassemia that lacked gene analysis. Although the limitations of the experimental technology at the time prevented the assignment of thalassemia mutation subtypes, the study provides very reliable data on the nationwide incidence. Although gene analysis began to be widely used in the 2000s to ascertain thalassemia mutations in individuals, haematological indices still play a key role in diagnosis.
Scientific Reports | 7: 920 | DOI:10.1038/s41598-017-00967-2
The 643,580 research subjects included in the 16 studies can generally be divided into neonates, children and adults. Although the age ranges of the different studies varied widely, subgroup analysis based on age was not performed because thalassemia is an inherited disease, and the carrier rates of different age groups are consistent within the same area. Five of the included studies 6,8,12,23,24, in which all reported cases were neonates, attempted to determine the prevalence of α-thalassemia. The neonates were randomly selected such that the rates of thalassemia were representative and could be compared with the results for children or adults.
Surveys on thalassemia in China began in the 1980s. In 1987, Zeng calculated that the nationwide incidence of α-thalassemia and β-thalassemia was 2.64% and 0.66%, respectively 8. Compared to the findings of Zeng, our meta-analysis revealed a higher prevalence of α-thalassemia (7.88%) and β-thalassemia (2.21%) in China. One explanation for the large variation in the reported prevalence is that in the previous study, few samples were collected for thalassemia screening in provinces with a high incidence, such as Guangxi, where only 350 samples were collected, which lowered the total incidence. Overlooking some carriers of silent α-thalassemia mutations, such as αWSα/αα and αCSα/αα 18,28,29, may be another reason. In addition, the prevalence of α + β-thalassemia in mainland China was found to be 0.48%, and our study confirmed that double heterozygosity for α- and β-thalassemia is not rare in areas where thalassemia is common. Because there are no significant haematological differences between these double heterozygotes and β-thalassemia carriers, α-thalassemia gene analysis is advisable for β-thalassemia carriers.
Table 2. Summary of the prevalence estimation for thalassemia.
In our study, we generated GIS maps to provide information for 14 provincial regions and illustrate trends in the geographic distribution of thalassemia. Most of the regions of mainland China in which epidemiological surveys have been conducted are located in the south, whereas large areas, mostly in the north, have no epidemiological survey data for thalassemia. Moreover, some provinces in southern China, such as Guizhou and Hunan, have a high prevalence of β-thalassemia but scant data on α-thalassemia. With industrialization over the past 20 years and the availability of jobs in the developed areas of mainland China, many people from the southwest region have migrated to cities in the north. Indeed, population mobility and migration have resulted in a significant increase in thalassemia prevalence on other continents, such as Europe and North America 30,31. However, due to the lack of regional data in provinces in northern China with large population mobility, changes in the epidemiological characteristics of thalassemia in those provinces remain unclear. Therefore, we suggest that high-quality surveys be conducted in the areas that lack data, to support thalassemia prevention.
Based on the present meta-analysis, the most common α-thalassemia mutation in mainland China is --SEA. The high gene frequency of --SEA indicates that the health burden resulting from Hb H disease and Hb Bart's hydrops fetalis may be serious in mainland China. In addition, non-deletional α-thalassemia is not rare. αWSα, which is rather rare in other parts of the world 32, is the most prevalent non-deletion type of α-thalassemia, with a gene frequency of 0.26%. Several studies on different populations have suggested that the non-deletion types of Hb H disease (--/αTα) are usually more severe than the deletion types (--/-α), with greater anaemia, jaundice, splenomegaly and early anaemic symptoms, and a higher proportion of patients who require blood transfusion and splenectomy 28,33. Therefore, the non-deletion types of α-thalassemia should be included in thalassemia prenatal diagnosis.
To explore the sources of heterogeneity, meta-regression was performed, and the diagnostic method (p < 0.001) and survey date (p = 0.02) were identified as potential sources. To mitigate heterogeneity, subgroup analysis of diagnostic methods for β-thalassemia was performed. However, heterogeneity was still high within the subgroups based on diagnostic method (Table 2). Nonetheless, it has been reported that heterogeneity cannot be avoided in a meta-analysis 34, especially in those based on epidemiological surveys 35.
Our findings showed that the results were unstable when some individual studies were removed one at a time. When reviewing these studies in detail, we found that the total sample size may play a role in changing the consistency of the results. Studies with an unusually large or small sample size combined with a relatively high or low prevalence were more likely to alter the results. Given the limited data, we could not identify other factors to verify the robustness of the presented results.
Publication bias existed in our study, even though we searched for related studies comprehensively and systematically. Only studies published in Chinese and/or English were used, which may be one source of this bias. Insufficient data in the included studies may also have affected the results.
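Egger's test, which the statistical analysis section names as the check for publication bias, regresses each study's standardized effect on its precision; an intercept far from zero suggests funnel-plot asymmetry. The following is a minimal sketch, assuming effects and standard errors are already available per study. The function name `egger_regression` is hypothetical, and the p-value (which requires a t distribution with n - 2 degrees of freedom) is left out.

```python
import math


def egger_regression(effects, std_errors):
    """Egger's regression: standardized effect (effect / SE) on precision (1 / SE).

    Returns (intercept, t statistic of the intercept). The t statistic is NaN
    when the fit is exact and the intercept's standard error is zero.
    """
    n = len(effects)
    if n < 3:
        raise ValueError("at least three studies are needed")
    x = [1.0 / s for s in std_errors]                     # precision
    y = [e / s for e, s in zip(effects, std_errors)]      # standardized effect
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    # Residual standard deviation with n - 2 degrees of freedom
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    resid_sd = math.sqrt(sse / (n - 2))
    se_intercept = resid_sd * math.sqrt(1.0 / n + x_bar ** 2 / sxx)
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, t_stat
```

When every study estimates the same effect regardless of its size, the intercept is zero; when smaller studies (larger standard errors) report systematically larger effects, the intercept moves away from zero.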
Several other limitations of this study should also be considered. First, because epidemiological studies on thalassemia have only been conducted in 14 provinces in China, we did not obtain good epidemiological or demographic data from all provinces. Second, the available epidemiological data mainly focus on southern China, particularly regions of minority nationalities, which could affect the results of our study. Third, the strategies and methodology may also have influenced the estimation of the prevalence of thalassemia. Indeed, some carriers of silent α-thalassemia mutations may not be identified by haematological indices without gene analysis. Because of these limitations, caution should be exercised in interpreting the results and in prescribing direct policy recommendations based on this meta-analysis alone. Nevertheless, our meta-analysis covered and combined most of the available epidemiological data to generate a reasonably precise estimate of the prevalence of thalassemia.
We conducted the first meta-analysis of the prevalence of thalassemia in mainland China from 1981 to 2015, revealing the epidemiological characteristics of thalassemia. These results show that the overall prevalence of the disease is still high. Individuals in southern China have a higher risk of having a severe form of thalassemia than those in other regions of China. In the future, epidemiological research in northern China and comprehensive measures for prevention and control in southern China are needed to combat the heavy burden of thalassemia in China.

Study identification.
We performed this meta-analysis on the basis of a systematic and comprehensive search of research on the prevalence of thalassemia in mainland China. Six electronic databases, including the Chinese National Knowledge Infrastructure database (CNKI), the WanFang database, the Chongqing VIP database, the Chinese Biological Medical Literature database (CBM), PubMed, and EMbase, were used for the identification of related studies from their establishment to January 1, 2016. The following key words were used when searching the Chinese databases: 'thalassemia'; 'prevalence'; and 'epidemiology'. In addition, the key words 'China' and 'meta-analysis' were used in the English databases. We also retrieved the reference lists so that we did not overlook a related study.
Selection criteria. We used the studies for this meta-analysis that met the following criteria: (i) Studies were cross-sectional and conducted in mainland China (not including Hong Kong, Macao, and Taiwan); (ii) Studies stated the prevalence of thalassemia or the related available data (the number of the participants and the number of the thalassemia patients) to calculate the prevalence of thalassemia; (iii) Studies were published in Chinese and/or English; and (iv) Studies were based on epidemiological surveys in general populations.
We excluded studies that met any of the following criteria: (i) Studies did not provide the relevant data for the prevalence of thalassemia; (ii) Studies were based on special populations (e.g., the elderly, women, or workers) or special areas (e.g., schools, factories, or earthquake areas); and (iii) Studies were duplicates or were contained within another study.
The selection of the studies was performed by two authors independently. When the authors disagreed and could not reach an agreement after discussion, a third author was involved to reach a consensus.
Data extraction and quality assessment of the included studies. After reaching a consensus on the included studies, the data were extracted and entered into an Excel spreadsheet, including the author, publication year, survey date, age range, location, sampling method, diagnostic method, total sample size, the number of individuals of each gender (males and females), the number of patients (including α-, β- and α + β-thalassemia), and the number and gene frequency of the subtypes (--SEA, -α3.7, -α4.2, αCSα, αWSα and αQSα of α-thalassemia; CD41/42, IVS-2-654, CD71/72, CD26, -28 and CD17 of β-thalassemia). The quality of the included studies was assessed using the 5 items listed in the study by Li et al. 36 according to the "Strengthening the Reporting of Observational Studies in Epidemiology" (STROBE) guidelines 37. Each item was divided into 3 different levels with different scores (high risk or unclear = 0, moderate risk = 1, and low risk = 2).
Two authors finished the work independently and discussed the issues when disagreements occurred. If these authors could not reach a consensus, another author assisted in making the final decision.
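The scoring rule described above (five items, each rated 0, 1 or 2, for a maximum of 10) can be sketched as a small function. The level labels and the name `quality_score` are hypothetical; the content of the five items themselves is defined in the cited study by Li et al. and is not reproduced here.

```python
# Score per risk level, as stated in the text:
# high risk or unclear = 0, moderate risk = 1, low risk = 2.
RISK_SCORES = {"high": 0, "unclear": 0, "moderate": 1, "low": 2}


def quality_score(item_risks):
    """Sum the scores of the five quality items of one study (range 0 to 10)."""
    if len(item_risks) != 5:
        raise ValueError("exactly five items are expected")
    return sum(RISK_SCORES[risk] for risk in item_risks)
```

A study rated low risk on every item receives the full score of 10, and each item rated moderate lowers the total by one point.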
Statistical analysis. The present meta-analysis was conducted using Stata version 12.0 (Stata Corporation, College Station, TX, USA). The DerSimonian and Laird method was used to estimate prevalence, 95% confidence intervals (95%CI), and the proportion of α- or β-thalassemia subtypes. Prevalence is expressed as a percentage; if the number of thalassemia patients was 0, we assigned a value of '0.01' to retain all useful data when conducting calculations. Additionally, for studies conducted in multiple regions of mainland China, the data for each region were extracted independently for later analysis. For example, Zeng et al. conducted a multi-region study in mainland China, and we extracted the available data for each corresponding region. ESRI ArcGIS for Desktop, version 10.0 (http://www.esri.com/software/arcgis/arcgis-for-desktop) was used to assess differences in geographic distribution. Heterogeneity was analysed using Cochran's χ²-based Q test and the I² statistic (which ranges from 0 to 100%). Heterogeneity was considered to be moderate or high at p < 0.1 or I² ≥ 50%, in which case a random-effects model (the DerSimonian and Laird method) was selected for the meta-analysis; otherwise, a Mantel-Haenszel fixed-effects model was used. Meta-regression was conducted to analyse the sources of heterogeneity. Sensitivity analysis was performed to assess the effect of each single study on the consistency of the results by excluding the included studies sequentially. To evaluate publication bias, funnel plots and Egger's test were used.
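The pooling and heterogeneity statistics named above can be illustrated with a short sketch of the DerSimonian and Laird estimator together with Cochran's Q and I². It is not the Stata routine used in the study: it pools untransformed proportions with binomial variances, which is an assumption, and the function name `pool_prevalence` is hypothetical.

```python
import math


def pool_prevalence(events, totals):
    """DerSimonian-Laird random-effects pooling of raw proportions.

    Returns the pooled prevalence, its 95% CI, Cochran's Q, I^2 (%) and tau^2.
    """
    # A zero case count is replaced by 0.01, as described in the text,
    # so that the study keeps a non-zero variance.
    events = [e if e > 0 else 0.01 for e in events]
    p = [e / n for e, n in zip(events, totals)]
    var = [pi * (1 - pi) / n for pi, n in zip(p, totals)]  # binomial variance (assumption)
    w = [1.0 / v for v in var]                              # fixed-effect weights
    fixed = sum(wi * pi for wi, pi in zip(w, p)) / sum(w)
    q = sum(wi * (pi - fixed) ** 2 for wi, pi in zip(w, p))  # Cochran's Q
    df = len(p) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0         # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in var]                  # random-effects weights
    pooled = sum(wi * pi for wi, pi in zip(w_re, p)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }
```

When tau² is estimated as zero the random-effects weights reduce to the fixed-effect weights, so the pooled estimate equals the inverse-variance fixed-effect estimate. Note that this differs from the Mantel-Haenszel fixed-effects model the text selects when heterogeneity is low.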