Association of BET1L and TNRC6B with uterine leiomyoma risk and its relevant clinical features in Han Chinese population

Previous studies have shown that uterine leiomyomas (UL) are benign tumours to which both environmental and genetic factors contribute. We aimed to replicate the associations of two initially reported susceptibility genes, TNRC6B and BET1L, in a Han Chinese population. A total of 2,055 study subjects were recruited, and 55 SNPs mapped to TNRC6B and BET1L were selected and genotyped in samples from these subjects. Genetic associations were analysed at both the single-marker and haplotype levels. Associations between the targeted SNPs and relevant clinical features of UL were analysed in case-only samples. Functional consequences of significant SNPs were evaluated with bioinformatics tools. Two SNPs, rs2280543 from BET1L (χ² = 18.3, OR = 0.64, P = 1.87 × 10⁻⁵) and rs12484776 from TNRC6B (χ² = 19.7, OR = 1.40, P = 8.91 × 10⁻⁶), were significantly associated with UL disease status. SNP rs2280543 was significantly associated with the number of fibroid nodes (P = 0.0007), while rs12484776 was significantly associated with node size (χ² = 54.88, P = 3.44 × 10⁻¹¹). Both SNPs were significant eQTLs for their respective genes. In this study, we have shown that both BET1L and TNRC6B contribute to the risk of UL in Chinese women, and that significant SNPs from BET1L and TNRC6B are associated with the number of fibroid nodes and node size, respectively.

In this study, a total of 2,055 study subjects were recruited, and 55 SNPs mapped to TNRC6B and BET1L were selected and genotyped in samples from these subjects. In addition to the genetic associations between these SNPs and UL disease status, we also examined potential associations between the targeted SNPs and the clinical characteristics of UL. Bioinformatics tools were also used to evaluate the potential biological functions of the targeted SNPs.

Methods
Study subjects. In the present study, a total of 674 women with UL and 1,381 healthy control women without any systemic disease were recruited from the Second Affiliated Hospital of Xi'an Jiaotong University between April 2013 and May 2017. All patients were diagnosed with UL by ultrasonography, with the diagnosis confirmed by at least two senior physicians, and all subjects were screened to exclude other tumours of the female reproductive system, systemic disease, and any history of malignancy. Self-administered questionnaires were used to collect demographic data, and the characteristics of the study subjects are shown in Table 1. All participants were unrelated Han Chinese individuals, and the UL and control groups were matched by age and body mass index (BMI). Significant differences between UL cases and healthy controls were identified for duration of menses (P = 0.005) and menstrual cycle (P = 0.003). UL size was categorized into three groups based on the diameter of the UL: small (≤2 cm), medium (>2 cm and <4 cm), and large (≥4 cm). For subjects diagnosed with multiple UL, the largest one determined the size group. The study protocol was approved by the Ethics Committee of Xi'an Jiaotong University in accordance with the ethical guidelines of the Declaration of Helsinki of 1975 (revised in 2008). Written informed consent was obtained from the participants.
SNP selection and genotyping. We searched for all SNPs with a minor allele frequency (MAF) ≥0.05 within the TNRC6B and BET1L gene regions in the 1000 Genomes Chinese Han in Beijing (CHB) population. Tag SNPs were then selected by pairwise tagging with cut-offs of MAF ≥0.05 and r² ≥0.8, which yielded 27 and 28 tag SNPs within the TNRC6B and BET1L genes, respectively. General information about these 55 selected SNPs is summarized in Supplemental Table S1. Most of the selected SNPs were non-coding.
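The tag SNP selection step can be illustrated with a generic greedy pairwise tagging procedure. This is a minimal sketch under stated assumptions: the text does not name the tagging software, so the algorithm below (repeatedly choose the SNP that captures the most not-yet-captured common SNPs at r² ≥ 0.8) is a stand-in, and the function name `greedy_tag_snps`, its input dictionaries, and the SNP IDs in the example are hypothetical.

```python
from typing import Dict, List, Tuple


def greedy_tag_snps(
    maf: Dict[str, float],
    r2: Dict[Tuple[str, str], float],
    maf_min: float = 0.05,
    r2_min: float = 0.8,
) -> List[str]:
    """Hypothetical greedy pairwise tagger (not the authors' software).

    maf: minor allele frequency per SNP ID.
    r2:  pairwise LD (r^2) keyed by SNP-ID pairs in either order;
         missing pairs are treated as r^2 = 0.
    """
    # Keep only common SNPs (MAF >= maf_min), mirroring the selection criteria.
    snps = sorted(s for s, f in maf.items() if f >= maf_min)

    def ld(a: str, b: str) -> float:
        if a == b:
            return 1.0
        return r2.get((a, b), r2.get((b, a), 0.0))

    # Set of common SNPs each candidate tags (itself included, since r^2 = 1).
    captures = {s: {t for t in snps if ld(s, t) >= r2_min} for s in snps}

    uncaptured = set(snps)
    tags: List[str] = []
    while uncaptured:
        # Pick the SNP capturing the most still-uncaptured SNPs;
        # max() keeps the first tied candidate in sorted order.
        best = max(snps, key=lambda s: len(captures[s] & uncaptured))
        tags.append(best)
        uncaptured -= captures[best]
    return tags


if __name__ == "__main__":
    # Toy, hypothetical input: rsD is dropped for MAF < 0.05.
    maf = {"rsA": 0.30, "rsB": 0.25, "rsC": 0.10, "rsD": 0.02, "rsE": 0.40}
    r2 = {("rsA", "rsB"): 0.90, ("rsB", "rsC"): 0.85, ("rsA", "rsC"): 0.50}
    print(greedy_tag_snps(maf, r2))  # ['rsB', 'rsE']
```

Here rsB alone captures rsA and rsC at r² ≥ 0.8, and rsE, which is in LD with nothing, must tag itself. Greedy selection is a common heuristic for this set-cover-style problem; it does not guarantee the smallest possible tag set.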
Genomic DNA was extracted from peripheral blood leukocytes according to the manufacturer's protocol (Genomic DNA kit, Axygen Scientific Inc., California, USA). All SNPs were genotyped using the Sequenom MassARRAY RS1000 system (Sequenom, San Diego, California, USA), and the results were processed with Typer Analyser software to generate genotype data 17. For quality control, case and control status was blinded throughout genotyping, and a randomly selected 5% of the samples were genotyped in duplicate, with 100% concordance.
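The duplicate-genotyping check can be expressed as a random draw followed by a concordance calculation. A minimal sketch, assuming genotypes are stored as two-letter strings keyed by hypothetical `sample:SNP` identifiers; the helpers `pick_duplicates` and `concordance_rate` are illustrative and are not part of the Typer software. Alleles are sorted before comparison so that, for example, `CT` and `TC` count as the same call. If applied to all 2,055 samples, a 5% draw corresponds to 103 samples.

```python
import random
from typing import Dict, List, Sequence


def pick_duplicates(sample_ids: Sequence[str], fraction: float = 0.05,
                    seed: int = 1) -> List[str]:
    """Randomly choose a fraction of samples to re-genotype (hypothetical helper)."""
    k = max(1, round(fraction * len(sample_ids)))
    return random.Random(seed).sample(list(sample_ids), k)


def _norm(genotype: str) -> str:
    # Treat genotypes as unordered: "TC" and "CT" become the same call.
    return "".join(sorted(genotype))


def concordance_rate(first: Dict[str, str], repeat: Dict[str, str]) -> float:
    """Fraction of calls that agree between the original and repeated runs,
    counted over keys with a non-empty call in both runs."""
    shared = [k for k in repeat if first.get(k) and repeat[k]]
    if not shared:
        raise ValueError("no overlapping genotype calls")
    agree = sum(_norm(first[k]) == _norm(repeat[k]) for k in shared)
    return agree / len(shared)
```

A concordance of 1.0 from `concordance_rate` corresponds to the 100% agreement reported above; any lower value would flag assays or samples for inspection.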

Statistical and Bioinformatics Methods.
Hardy-Weinberg equilibrium was tested for each SNP in the control samples. χ² tests were performed for each SNP to evaluate differences in allelic and genotypic distributions between UL cases and controls. Linkage disequilibrium (LD) blocks were constructed for both genes, and haplotype-based analyses were conducted for each block. PLINK was used for all of the above analyses 18. In addition to the association analyses of disease status, we also analysed potential links between the significant SNPs and four clinical features of UL (bleeding, pain, number of fibroid nodes, and node size) in the subset of UL cases only, using χ² tests. In general, Bonferroni correction was applied to address multiple comparisons; for single-marker association analyses, the threshold P value was 0.05/55 ≈ 9.1 × 10⁻⁴. Genomic control was applied to correct for the potential effects of population stratification 19,20, and the null distribution of the genomic inflation factor λ was constructed from 10,000 bootstrap replicates.
The potential biological functions of the selected SNPs were evaluated with RegulomeDB (http://www.regulomedb.org/) 21, a database that annotates SNPs using known and predicted regulatory element data from the ENCODE project. Each SNP is assigned a score from 1 to 6, with lower scores indicating stronger evidence of regulatory function. In addition, we extracted eQTL data from the GTEx database (https://www.gtexportal.org/home/) 22 to examine differences in gene expression associated with the significant SNPs.
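The χ² tests, Bonferroni threshold and genomic control described in the statistical methods above can be sketched in pure Python. This is an illustrative re-implementation, not PLINK itself, and it rests on assumptions: an uncorrected Pearson χ² test on either an allele-count (2 × 2) table or a genotype-by-feature (r × c) table; an allelic odds ratio computed as (case minor × control major)/(case major × control minor); the threshold 0.05/55; and the standard genomic-control estimate λ = median(χ²)/0.4549, where 0.4549 is the median of a 1-df χ² distribution. The bootstrap used to build the null distribution of λ is not specified in the text and is omitted. All function names are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

BONFERRONI_ALPHA = 0.05 / 55   # single-marker threshold, ~9.1e-4
CHI2_1DF_MEDIAN = 0.4549364    # median of the chi-square(1) distribution


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail P value of a chi-square statistic with integer df >= 1,
    using closed-form series for the regularized upper incomplete gamma."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 0.0
        for i in range(df // 2):
            if i > 0:
                term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sum_{i < (df-1)/2} exp(-x/2) (x/2)^(i+1/2) / Gamma(i+3/2)
    total = math.erfc(math.sqrt(half))
    for i in range(df // 2):
        total += math.exp(-half + (i + 0.5) * math.log(half) - math.lgamma(i + 1.5))
    return total


def pearson_chi2(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson chi-square (no continuity correction) and df for an r x c table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(row_sums) - 1) * (len(col_sums) - 1)


def allelic_test(case_minor: int, case_major: int,
                 ctrl_minor: int, ctrl_major: int) -> Tuple[float, float, float]:
    """Allelic 2 x 2 test: returns (chi2, minor-allele odds ratio, P)."""
    stat, df = pearson_chi2([[case_minor, case_major], [ctrl_minor, ctrl_major]])
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    return stat, odds_ratio, chi2_sf(stat, df)


def genomic_control(stats: List[float]) -> Tuple[float, List[float]]:
    """Estimate lambda from 1-df statistics and deflate them (lambda floored at 1)."""
    lam = max(1.0, median(stats) / CHI2_1DF_MEDIAN)
    return lam, [s / lam for s in stats]
```

As a consistency check, `chi2_sf(19.7, 1)` and `chi2_sf(18.3, 1)` come out close to the reported P values for rs12484776 and rs2280543 (both below `BONFERRONI_ALPHA`), and `chi2_sf(54.88, 4)` evaluates to about 3.44 × 10⁻¹¹, matching the P value reported for rs12484776 and node size. That df = 4 match suggests, but does not confirm, a 3 × 3 genotype-by-size-group table for the case-only analysis.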
Data extracted from RegulomeDB showed that both SNPs, rs2280543 and rs12484776, had a RegulomeDB score of 5 (Supplemental Table S1), indicating very limited evidence for a regulatory role of either SNP. Expression quantitative trait loci (eQTL) data from GTEx for rs2280543 and rs12484776 were also extracted and examined, and significant findings are summarized in Table 5. The Bonferroni-corrected threshold P value was 0.05/47 ≈ 0.001. SNP rs2280543 was significantly associated with BET1L expression in 15 of 47 human tissues, whereas rs12484776 was significant only in the oesophagus muscularis (effect size = −0.17, P = 4.60 × 10⁻⁴). Neither SNP was significantly associated with gene expression in the uterus (Supplemental Table S3).
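The tissue-level eQTL screen amounts to a Bonferroni filter across the 47 GTEx tissues. The sketch below assumes per-tissue P values have already been exported from GTEx into a dictionary; apart from the oesophagus muscularis P value for rs12484776 quoted above, the tissue labels and P values in the example are placeholders rather than real GTEx results, and `significant_tissues` is a hypothetical helper. It uses the exact cut-off 0.05/47 ≈ 1.06 × 10⁻³ rather than the rounded 0.001, so a P value falling between the two would be classified differently.

```python
from typing import Dict, List


def significant_tissues(pvalues: Dict[str, float],
                        n_tissues: int = 47, alpha: float = 0.05) -> List[str]:
    """Return tissues whose eQTL P value passes the Bonferroni cut-off alpha / n_tissues."""
    threshold = alpha / n_tissues  # 0.05 / 47 ~ 1.06e-3
    return sorted(t for t, p in pvalues.items() if p < threshold)


if __name__ == "__main__":
    # Placeholder values, except the oesophagus muscularis P value reported in the text.
    example = {
        "oesophagus_muscularis": 4.60e-4,
        "tissue_B_placeholder": 1.0e-2,
        "tissue_C_placeholder": 0.5,
    }
    print(significant_tissues(example))  # ['oesophagus_muscularis']
```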

Discussion
With the widespread application of sequencing and genetic association analyses to the genetics of complex diseases, candidate gene-based association studies have successfully mapped susceptibility loci for many complex diseases 23-29. Our data, based on ~2,000 study subjects from a Han Chinese population, provide strong evidence for genetic associations between UL and two candidate genes, BET1L and TNRC6B. To the best of our knowledge, this is the first genetic association study of BET1L and TNRC6B with UL in a Chinese population. Our single-marker associations for both rs2280543 and rs12484776 replicate the initial report 13. Because conclusions cannot be drawn from analyses of a limited number of SNPs alone 30-32, we also performed haplotype analyses, which showed a pattern similar to the single-marker associations. However, in a replication study of Saudi women, Bondagji et al. did not report rs2280543 from BET1L as significant 16. This difference might be due to different LD structures arising from different genetic backgrounds: Japanese and Han Chinese populations are both East Asian and are therefore more genetically similar to each other than to Saudi women from the Middle East. Differences in sample size between the two studies might also contribute. We compared our association results for rs2280543 and rs12484776 with those of three previous reports (Supplemental Table S4). Across these studies, the directions of effect for both SNPs were largely consistent; the only exception was rs2280543 in the study by Bondagji et al., which might be due to its small sample size compared with the other three studies.
In the case-only subgroup, we identified significant associations between the two targeted SNPs and relevant clinical features of UL: SNP rs2280543 from BET1L was significantly associated with the number of fibroid nodes, while SNP rs12484776 from TNRC6B was significantly associated with node size. SNP rs12484776 from TNRC6B has been reported to be associated with node size (volume) in at least two previous studies of European populations 14,15. However, to the best of our knowledge, rs2280543 from BET1L has not previously been reported to be associated with the number of fibroid nodes. Our findings indicate that the TT and CT genotypes of rs2280543 were associated with multiple fibroid nodes rather than a single fibroid node in the Han Chinese population. Studies of comparable sample size in other populations are needed to verify these findings.
In this study, we investigated the potential association between UL and two loci, BET1L and TNRC6B. BET1L is a protein-coding gene located at 11p15.5 that encodes a protein facilitating Golgi vesicular membrane trafficking 33. TNRC6B, located at chromosome 22q13.1, encodes trinucleotide repeat-containing 6B protein, which was identified as co-purifying with a cytoplasmic protein complex in HeLa cells. The TNRC6B protein was also reported to be required for microRNA-guided mRNA cleavage in cultured HeLa cells 34. Beyond these preliminary studies, no more specific functions of TNRC6B have been reported. As a population-based study, it is beyond our scope to investigate the biological mechanisms underlying these two loci and relate them to the pathogenesis of UL. Experimental studies based on animal models are needed to unravel the roles of both loci in the onset and development of UL.
Both significant SNPs, rs2280543 and rs12484776, appeared to have very limited functional significance based on their RegulomeDB scores, which are derived from ENCODE-based regulatory element annotations. However, eQTL analyses of GTEx data showed that both SNPs are significantly associated with the expression of their respective genes. This eQTL effect was weaker for rs12484776, for which a significant difference in expression was identified in only 1 of 47 human tissues. By contrast, the effect was broader for rs2280543 and its gene, BET1L: expression of BET1L was significantly associated with rs2280543 in 15 of 47 human tissues, and the most significant association, in skeletal muscle, reached a significance level on the order of 10⁻¹⁸. Interestingly, a similar eQTL pattern was reported in the initial GWAS by Cha et al. 13, which, based on in silico analysis, found rs2280543 to be significantly associated with BET1L transcript levels in three cell types: lymphoblastoid cell lines, peripheral blood mononuclear cells and brain cortex. These functional findings suggest that the candidate SNPs may not be mere surrogate markers but may have genuine biological functions contributing to UL susceptibility. A limitation of our eQTL results is that they are based on normal human tissues rather than tissues from UL patients, so premature conclusions should be avoided. Notably, the protective allele T of rs2280543 was significantly associated with up-regulated BET1L expression in multiple human tissues. This link between UL risk and BET1L expression might point to an underlying pathogenic mechanism of UL, which further studies are needed to unravel.
In this study, we attempted to limit population stratification by recruiting only subjects with a stable area of residence 35,36, but potential population stratification could not be completely ruled out. Moreover, as a candidate gene-based study, we focused on a set of pre-selected, common tag polymorphisms. This strategy minimizes experimental expense at the cost of dropping >90% of the variants in a given gene. Structural variations and low-frequency and rare variants were not assessed in this study, and several recent studies have shown that such variants might play an important role in susceptibility to complex disorders 37. Sequencing-based studies are needed to systematically evaluate the genetic risk of UL.
In conclusion, we showed that both BET1L and TNRC6B contribute to the risk of UL in Chinese women. Significant hits were identified by both single-marker and haplotype-based analyses, and significant SNPs from BET1L and TNRC6B were also associated with the number of fibroid nodes and node size, respectively.