Regional heritability mapping identifies several novel loci (STAT4, ULK4, and KCNH5) for primary biliary cholangitis in the Japanese population

While the advent of GWAS more than a decade ago has ushered in remarkable advances in our understanding of complex traits, the limitations of single-SNP analysis have also led to the development of several other approaches. Simulation studies have shown that the regional heritability mapping (RHM) method, which makes use of multiple adjacent SNPs jointly to estimate the genetic effect of a given region of the genome, generally has higher detection power than single-SNP GWAS. However, thus far its use has been mostly limited to agricultural settings, and its potential for the discovery of new genes in human diseases is yet to be fully exploited. In this study, by applying the RHM method to primary biliary cholangitis (PBC) in the Japanese population, we identified three novel loci (STAT4, ULK4, and KCNH5) at the genome-wide significance level, two of which (ULK4 and KCNH5) have not been found associated with PBC in any population previously. Notably, these genes could not be detected by using conventional single-SNP GWAS, highlighting the potential of the RHM method for the detection of new susceptibility loci in human diseases. These findings thereby provide strong empirical evidence that RHM is an effective and practical complementary approach to GWAS in this context. Also, liver tissue mRNA microarray analysis revealed higher gene expression levels in ULK4 in PBC patients (P < 0.01). Lastly, we estimated the common SNP heritability of PBC in the Japanese population (0.210 ± 0.026).


Introduction
Since the publication of the first GWAS study at the beginning of the twenty-first century, thousands of GWAS analyses have been performed and published; however, with the improvement of genotyping technologies, increased sophistication in study design, and the development of * Olivier Gervais olivier.gervais@genome.med.kyoto-u.ac.jp * Masao Nagasaki nagasaki@genome.med.kyoto-u.ac.jp large-scale DNA biobanks and cohorts containing a wide range of clinical data/phenotypes on hundreds of thousands of individuals, the genome-wide association studies conducted nowadays are often much more elaborate than they were only a decade ago. In spite of the many successes of GWAS, this development has created a need for innovative analytical methods and statistical models to better make sense of this newly available data, for instance by accounting for population structure and relatedness, reducing error rates in unbalanced case/control traits, improving the detection power of rare variants, or analyzing complex immunemediated diseases [1]. These aspects are especially important given that the heterogeneous nature of the genetic architecture of complex traits suggests that increasing the sample size and/or the number of phenotypes analyzed does not always produce the anticipated gains in terms of novel loci discovery [2].
In this context, linear mixed models have received considerable attention; their flexibility enables among others estimating trait heritability while adjusting for environmental factors, or accounting for population stratification and cryptic relatedness in mixed-linear-model association studies. This trend is best illustrated by the development of a number of software applications catering for linear mixed model analysis, such as GCTA [3], BOLT-LMM [4], and DISSECT [5], as well as practical statistical methods, including GRAMMAR [6], EMMAX [7], and GEMMA [8].
Among these, it has been demonstrated that the regional heritability mapping (RHM) method [9,10], which consists in estimating the genetic effect of "windows" (or regions) composed of multiple adjacent SNPs (in contrast to GWAS analysis that focuses on individual SNPs), possesses in a number of cases higher statistical power for the detection of causal loci compared with conventional single-SNP GWAS [10,11], albeit at the expense of computational power. Despite this, research on the RHM method has been thus far mostly limited to simulation studies [12], software development [5], and application to agricultural settings [13]. This can be explained in large part by the fact that (1) historically, most major advances regarding the application of general mixed model methods to genetics have taken place in the field of animal breeding for the purpose of estimating random genetic effects [14], and their application to human genetics is therefore relatively new, and (2) the substantial discoveries that have resulted from the application of GWAS to newly established large-scale genomic cohorts in recent years have overshadowed the benefits and slowed down the spread of other methods, including RHM, which are generally more complex and computationally demanding than single-SNP mapping methods.
In practice, this means that in spite of the firm theoretical foundation and the simulation studies supporting the ability of the RHM method to identify QTLs that cannot be detected by single-SNP GWAS, its potential for the discovery of new susceptibility loci in human diseases is yet to be fully exploited. In this study, we applied the RHM method to primary biliary cholangitis (PBC) in 5643 Japanese individuals, and identified three new loci at the genome-wide significance level (STAT4, ULK4, and KCNH5), two of which (ULK4 and KCNH5) have not been found associated with PBC in any population previously. These findings highlight the importance of polygenic model approaches like the RHM method for the discovery of new susceptibility genes alongside traditional GWAS analyses, given that these genes could not be detected with a conventional single-SNP GWAS approach. We tested these three new loci in an independent data set of 491 individuals for replication, and used liver tissue mRNA microarray data to analyze gene expression levels. We also carried out a univariate GCTA-GREML analysis [15] to estimate the common SNP heritability of PBC in the Japanese population.

Population and genotyping
This study is based on a high-quality clinical data set forming a representative sample of the Japanese population. The samples used in the discovery and replication data sets were collected nationwide from individuals in the Japan PBC-GWAS Consortium; the discovery and replication data sets represent independent samples. General information about the individuals included in this study is presented in Table S1. For the discovery analysis, sample genotyping was performed by using Japonica v1 (660k SNPs) (Toshiba, Japan) [16,17] and Axiom Genome-Wide ASI 1 (600k SNPs) (Affymetrix, USA) genotyping arrays, as described in previous papers [18,19]. All samples used for replication were genotyped with Japonica v1 arrays. Imputation was performed with IMPUTE4 [20] in each SNP array to impute SNPs with no genotype data, by using a phased reference panel of 2049 Japanese individuals from a prospective general population cohort study performed by the Tohoku Medical Megabank Organization, Japan [21]. In the discovery cohort, after imputation we extracted the SNPs with an info score over 0.5 in each array, and the two data sets were merged by using the SNPs common to both genotyping platforms (13.8M SNPs). All chromosome and basepair positions in this paper are given with regard to the GRCh37 (hg19) assembly.
In both the discovery and replication data sets, quality control procedures were performed per individual and per SNP, by using the following criteria in PLINK v1.90 [22]: individuals call rate ≥ 98%, SNPs call rate ≥ 98%, minor allele frequency ≥ 1%, and consistency with Hardy-Weinberg equilibrium (P ≥ 10 −6 ). Linkage disequilibrium pruning was also performed by using the --indep-pairwise 50 5 0.8 command. Identity by descent was calculated for each sample pair to remove related samples (pi-hat > 0.1). After these steps, there were 5643 individuals (1953 PBC cases, 3690 healthy controls) and 1,022,240 autosomal SNPs remaining for analysis in the discovery data set. SNP cluster plots were used locally as a visual tool to confirm the genotyping quality of specific SNPs and remove those displaying genotyping issues. A small number of regions that displayed convergence problems or extreme values were also removed from the analysis. In the replication data set, 491 individuals (220 PBC cases, 271 healthy controls) remained for analysis.

Statistical analysis
Linear mixed model methods were used for analysis. The following model was used to estimate the genome-wide SNP heritability: with Var(u) = Gσ 2 u and Var(e) = Iσ 2 e , where y is a vector that represents the PBC phenotypes, β is a vector of fixed effects that includes the overall mean, sex, genotyping array, as well as the first ten eigenvectors from principal component analysis (PCA), u is a vector of random effects representing the whole (genome-wide) genomic additive effect, and e is a vector of residual effects. X is the design matrix for the fixed effects, G is the whole genomic relationship matrix (GRM) computed by using all SNPs, and I is a unit matrix; σ 2 u and σ 2 e represent the genetic and residual variances, respectively.
The RHM method, which consists in scanning the genome by using windows (regions) of a given number of adjacent SNPs [9], was used to estimate the genetic effect of each region of the genome, as well as their significance compared with the null model (i.e., the model above used to calculate the genome-wide SNP heritability, with no regional component). The model used was as follows: with Var(u) = Gσ 2 u , Var(w) = Qσ 2 w , and Var(e) = Iσ 2 e , where w is a vector of random effects representing the regional genomic additive effect, Q is the regional GRM, and σ 2 w is the regional genomic variance. A window size of 50 SNPs was used in this study. The windows were shifted by 25 SNPs to create a 50% overlap between adjacent windows.
The GRMs were computed by using all of the SNPs to calculate the whole GRM, and by using the SNPs corresponding to each region of the genome to construct the regional GRMs. The genetic relationship between two individuals j and k was calculated as follows [3]: where x ij and x ik are the genotypes of the jth and kth individuals, respectively, at the ith SNP; p i is the frequency of the reference allele at the ith SNP; and N is the total number of SNPs. Additionally, a single-SNP GWAS analysis was performed to provide a point of comparison to the results obtained with the RHM method, using a logistic regression model with covariate adjustment for sex, genotyping array, and the first ten principal components. The replication data set was adjusted for sex and the first ten principal components.

Computer software
The Genetic Complex Trait Analysis (GCTA v1.91.4) software application [3] was used to perform most of the computations and statistical analyses in this study: the computation of the GRMs, the PCA, the univariate Genomic Restricted Maximum Likelihood (GREML) analysis, as well as the RHM analysis. The Average Information Restricted Maximum Likelihood procedure [23] was used for estimation of the variance components. PLINK v1.90 [22] was used to perform quality control as well as the single-SNP GWAS analysis, and R version 3.5.0 [24] was used in combination with the qqman [25] and ggplot2 [26] packages for visualizing the RHM and GWAS results. The limma package v3.42.2 [27] (http://www.bioconductor.org/) was used in R to analyze the mRNA microarray data.

Significance thresholds
The likelihood ratio test (LRT) statistic, LRT = −2ln(L 0 /L 1 ), was used to test for the presence of regional variance, where L 0 represents the likelihood for the null model (i.e., H 0 , with the whole genomic effect but with no regional genomic component), and L 1 represents the likelihood for the alternative model (i.e., H 1 , with both the whole and the regional genomic effects). It was assumed that the distribution of the LRT for regional variance follows a 50-50 mixture of chi-square distributions with degrees of freedom equal to 0 and 1 [28].
Bonferroni correction was performed to adjust for multiple testing; to estimate the Bonferroni-corrected significance thresholds, we used half the number of regions tested to account for overlapping windows [9]. The genome-wide critical P values and corresponding LRT thresholds are provided in Table 1.

PBC prevalence in Japan
To adjust the heritability estimates for ascertainment the heritability estimates were transformed from the observed scale to the liability scale by using disease prevalence estimates for the general population [1]. Because considerable variation has been reported in the prevalence of PBC between countries, we used prevalence estimates based on nationwide epidemiological studies of PBC in Japan. Concretely, given an estimated range of 380-460 cases per million in the literature [29], we assumed a prevalence of 420 per million. All of the heritability estimates reported in this study are given on the liability scale.

mRNA data analysis
Liver biopsy was performed to obtain liver tissue samples for the mRNA microarray analysis. All patients for whom liver biopsy specimens were available were included in this analysis; specifically, patients with PBC (36 individuals), chronic hepatitis C (CHC) (15 individuals), and metastatic liver cancer (5 individuals; normal liver tissue). Details about RNA extraction and preservation have been described previously [21]. Quantitative DNA microarray data were obtained using Agilent Feature Extraction software (Agilent Technologies), and data normalization (excluding lincRNA) was performed with the quantile method. Statistical analysis was performed with the limma package [27] in R, the P value indicated corresponding to PBC versus CHC/normal samples.

Genome-wide SNP heritability
In this study, we calculated the GREML-based common SNP heritability estimate of PBC (±standard error), which is generally thought of as the lower-bound estimate of narrowsense heritability. It was estimated to be 0.210 (±0.026), adjusting for sex, genotyping array, and the first ten principal components.

Regional heritability mapping analysis
The results of the RHM analysis are presented in Fig. 1 (top), which shows the result of the LRT comparing the likelihood of the model including regional genomic effects against the null model (i.e., without regional genomic  effects) for each 50-SNP region of the genome. The corresponding QQ-plot is shown in Fig. 2.
The regions that were found to be significant at the genome-wide level (LRT > 20.88) in this analysis are located on the following genes: STAT4, ULK4, NFKB1/ MANBA, HLA, TNFSF15/TNFSF8, KCNH5, and IKZF3. Information about each significant non-HLA region, as well as detail about the corresponding LRT and regional heritabilities, is provided in Table 2. Along with the HLA complex, the NFKB1/MANBA, TNFSF15/TNFSF8, and IKZF3 genes have already been examined in a number of publications on PBC in the Japanese population [18,[30][31][32] and will therefore not be discussed further here.
To provide a point of comparison to the results obtained with the RHM method, we performed a single-SNP GWAS analysis of the same data ( Fig. 1 (bottom); the corresponding QQ-plot is shown in Fig. S1). Although most genes identified by RHM were also detected by GWAS, the overlap was incomplete as three genes (STAT4, ULK4, and KCNH5) did not reach genome-wide significance (P < 5 × 10 −8 ) in the GWAS analysis, and conversely two genes (POGLUT1 and IL7R) were identified by GWAS but not RHM.
The RHM analysis provides evidence, for the first time, of a statistically significant association between STAT4 and PBC in the Japanese population at the genome-wide level. Beside the discovery of this association, this finding is consequential given that it demonstrates the potential of the RHM method for the detection of new disease-related genes in human populations, as this association would have gone undetected had we only performed a conventional GWAS analysis.
RHM also identified two entirely new susceptibility loci, located on the ULK4 and KCNH5 genes. We performed   [34] The suggestive regions are given in the lower part of the table. The chromosome and base-pair positions are given with regard to the GRCh37 (hg19) assembly. h 2 heritability, LRT likelihood ratio test for regional heritability > 0, SE standard error.
additional analyses for validation and to confirm their involvement in disease pathogenesis by uncovering differences in gene expression through mRNA microarray data analysis (see below). Additionally, the RHM analysis detected two other loci, IQCJ-SCHIP1/IL12A and PPARGC1A, at the suggestive level of significance (19.55 < LRT < 20.88). The IL12A gene is known to share a common pathogenic pathway with STAT4 and play an essential role in the development of PBC [33]. Although less is known about PPARGC1A, at least two Japanese studies have provided evidence of its function in PBC progression through the regulation of bile acid synthesis [34,35].

Replication and differences in gene expression
We used an independent replication data set of 491 Japanese individuals (220 cases, 271 controls) to validate the STAT4, ULK4, and KCNH5 genes. The results, presented in Table 3, confirmed the association of STAT4 and KCNH5 with PBC; while the replication of STAT4 was expected given that its role is known in PBC, the strongest replication was at KCNH5 (P = 5.83 × 10 −25 ), a new candidate gene that only RHM detected. This KCNH5 locus corresponds to rs28608483 (chr14:g.63375059A > G), which had a P value of P = 7.33 × 10 −8 (OR = 1.57) in the discovery data set. (The minor allele frequencies in controls and cases were 0.073 and 0.095, respectively). This analysis, however, failed to replicate the findings for ULK4. Nevertheless, subsequent mRNA analysis performed to examine differences in gene expression in liver tissues provided promising results. While no gene expression was detected for KCNH5, for ULK4 gene expression levels were significantly higher in PBC patients (P < 0.01) (Table S2). This finding is supported by the fact that rs35391137 (chr3: g.41617623A > G), the SNP with the lowest P value that we have identified in the ULK4 region in the discovery data set (P = 3.37 × 10 −6 ; OR = 1.66), corresponds to the binding site for the Hand1:E47 and Smad3 transcription factors [36].
However, in-depth analyses of the biological pathways of ULK4 and KCNH5 are necessary to uncover the mechanisms underlying disease etiology and progression, as well as clarify the potential role of the eQTL effects that have been found for ULK4, through cell-specific eQTL analyses of immune cells, in B, CD4+ T cells, CD8+ T cells, monocytes, and natural killer cells in the Japanese population (Table S3) [37].

Discussion
The genetic component of PBC While autoimmune diseases are considered complex and multifactorial, most have been shown to display high heritabilities [38]. PBC is no exception according to the research on familial occurrence and monozygotic twins, which has confirmed that family history is a strong risk factor for developing the disease [39]. Nevertheless, few estimates of the SNP heritability of PBC are available, as most of the publications providing measures of the heritability of PBC are based on family studies, and in such cases it is generally reported in the form of sibling relative risk estimates or differences in concordance rates between monozygotic twins and dizygotic twins [40].
Our estimate of the common SNP heritability of PBC (0.210 ± 0.026) is somewhat low given that previous studies suggest a strong genetic component to PBC, e.g., it is known to have much higher concordance rates in monozygotic twins (63%, one of the highest among autoimmune diseases) than dizygotic twins (≈general population), and a sibling relative risk of 10.5 [39]. However, these results are not contradictory since estimates from twin studies differ in their assessment of environmental components and include effects that common SNP heritability estimates are not meant to capture [41]. For this reason, a direct comparison of estimates is difficult, and more studies are needed in all human populations to further our understanding of the SNP heritability of PBC.

Identifying STAT4 in the Japanese population
Although STAT4 is a well-known PBC risk locus in populations of European ancestry and its role in the pathogenesis of PBC as well as other autoimmune diseases has already been thoroughly examined [42], previous single-SNP GWAS analyses in the Japanese population had failed to replicate this result and hinted at possible genetic differences between populations [18,31]. In our study, single-SNP GWAS also failed to identify STAT4 at the genome-wide significance level, while RHM detected an association signal; it is therefore possible that the failure of Japanese GWAS to identify STAT4 is simply due to the smaller sample sizes of these analyses, or maybe to untagged causal variants. Although rs7574865 (chr2:g.191964633T > G), the top STAT4 SNP (P = 2.75 × 10 −5 ) in a recent study of 2886 Japanese individuals (1381 PBC cases, 1505 healthy controls) [18], reached genome-wide significance in a GWAS performed in 11,375 individuals of European ancestry (2861 cases, 8514 controls) [43], earlier GWAS studies that reported associations between STAT4 and PBC in populations of European ancestry and that were of a more limited scale, such as those performed by Hirschfield et al. [44] and Liu et al. [45], only provided suggestive evidence of a statistical association between STAT4 and PBC. While sample size is not the only factor at play-differences exist for instance in terms of minor allele frequency between European and Asian populations with respect to rs7574865, one can assume that future GWAS analyses of PBC with larger sample sizes will detect an association signal at the genome-wide significance level for STAT4 in the Japanese population. RHM is therefore in all likelihood an effective approach when sample size matters for the detection of a given locus.

Single-SNP GWAS and RHM: a different set of QTLs?
The fact that STAT4 was detected by the RHM method but did not reach genome-wide significance in the single-SNP GWAS analysis also corroborates the results from simulations studies demonstrating that RHM often has higher power than single-SNP mapping methods [10,11], and is therefore a valuable tool for the discovery of new susceptibility loci in human diseases. On the other hand, our GWAS analysis identified two loci that are already known PBC QTLs in the Japanese population, namely IL7R (rs7717955 (chr5:g.35862841C > T); P = 3.91 × 10 −8 ) and CD80/POGLUT1 (rs13092998 (chr3:g.119245044G > T); P = 2.57 × 10 −10 ) [19,46], but that were not detected by using the RHM method. Even though computer simulations have previously indicated that GWAS and RHM tend to uncover slightly different sets of QTLs depending on the characteristics of the locus considered (such as the minor allele frequency or the number of QTLs of the region) [11,12], our findings represent strong empirical evidence that this is indeed the case for human diseases.
Regarding the characteristics of the loci detected by either one of the two methods, our findings seem to be in line with computer simulations showing that RHM has more power than GWAS when it comes to detecting regions with multiple QTLs, but suggest that single-SNP GWAS may be more efficient in some instances when the susceptibility locus contains only a single causal variant [11]. The case of STAT4, for example, supports this idea, given that it reached genome-wide significance in our RHM analysis but not with GWAS, and results from previous GWAS analyses have revealed that STAT4 comprises several independent association signals with PBC [47]. This may explain why single-SNP mapping methods appear to be slightly underpowered for the detection of this type of locus. Conversely, although many SNPs in the IL7R gene, which was detected with GWAS but not RHM, have been found to be associated with PBC and other autoimmune diseases, it appears that they are all linked to a single locus, rs6897932 (chr5:g.35874575C > T), known to affect the inclusion of exon 6 through splicing regulation. According to the 1000 Genomes Phase 3 database, the top IL7R SNP in our analysis, rs7717955, is in perfect linkage disequilibrium with rs6897932 in the Japanese population (D′ = 1; r 2 = 1) [47]. Our findings therefore support the hypothesis that single-SNP GWAS is likely to be more effective than RHM for the detection of loci for which a single QTL is responsible for the regional genetic effect.
Although more discovery studies using the RHM method are needed to confirm the results of our analysis, it seems that while RHM is an effective and practical method for the detection of susceptibility loci that are not easily identified by GWAS, it should be considered as a complementary approach to single-SNP GWAS analysis rather than a potential replacement.

Replication of RHM results
From a methodological point of view, the process of discovery and replication in GWAS is fairly straightforward: (1) identify a statistically significant SNP in the discovery data set, and then (2) use the same SNP for replication in an independent cohort. The only major caveat is that the SNP to be replicated-or a near-perfect proxy-must exist in the replication cohort. With RHM, several questions arise. First, applying the same standard as single-SNP GWAS-i.e., using the same SNPs in the discovery and replication cohorts-is not realistic in most cases since in RHM a given window contains tens or hundreds of SNPs; replication would therefore entail that all of the SNPs contained in a given window in the discovery cohort must also exist in the replication cohort. A more practical alternative would be to choose windows that cover exactly the same genomic region, i.e. at least the first and last SNPs are the same, with the additional assumption that the SNPs in between capture the regional genomic variance to the same degree in both cohorts. Proceeding this way, however, raises several other issues, such as the difference in marker density between windows in the target cohorts. Another key issue to contend with is sample size, given that replication cohorts are usually much smaller than discovery cohorts. For unrelated individuals and common SNPs, univariate mixed model analyses using REML methodology typically require sample sizes of over 3000 to bring the standard error of the SNP heritability below 0.1 [48], and in many cases, including this study, such thresholds are prohibitive. To validate our results, we therefore chose to use a different approach; among the SNPs contained in the windows identified with RHM in the discovery data set, we selected those with the lowest P value in the GWAS analysis-e.g., for KCNH5 this corresponds to rs28608483 (P = 7.33 × 10 −8 in the discovery data set)-and used them for replication. While we acknowledge that applying different approaches in the discovery and replication data sets is an imperfect strategy, we deemed it the most appropriate in this case.

Pursuing complementary alternatives to single-SNP mapping methods
On a different note, the findings of our study, by validating simulation results showing that the RHM method is able to identify QTLs that cannot be easily detected by single-SNP GWAS, and by demonstrating that RHM is a powerful tool for the discovery of new susceptibility loci, imply that analytical approaches that focus on the joint effect of multiple SNPs are effective in capturing genetic variation for highly polygenic traits and deserve a more prominent place alongside traditional GWAS methods for the detection of new loci. Our results also suggest that their systematic implementation is desirable whenever possible, especially when statistical power is an issue, for instance when the cost of gathering large amounts of data becomes a major hurdle and/or sample size is limited to begin with, such as in the case of rare diseases or for studies targeting people from a given genetic group for which large data sets are unavailable. More broadly, these findings can be extended to the application of a number of methods other than single-SNP GWAS; even though in this study we chose to focus on RHM for the analysis of common SNPs given the dearth of studies on the application of RHM to human traits for the discovery of new susceptibility loci, other approaches such as SKAT-O for the detection of rare variants are also necessary.
In the specific case of PBC in Japan, the value of pursuing complementary analytic methods in parallel with conventional single-SNP GWAS analyses with larger sample sizes is gaining recognition [49], one of the underlying reasons being that GWAS analyses performed in the Japanese population so far have only managed to identify a small number of susceptibility loci in comparison with European populations. We believe that methods that make use of multiple adjacent SNPs jointly can be particularly useful in this context given the increasing diversity of genetic cohorts worldwide.

Conclusion
In summary, this study provides strong empirical evidence that RHM is an effective tool for the identification of new susceptibility loci in human diseases with the ability to identify QTLs that cannot be detected with conventional single-SNP mapping methods. We identified associations at the genome-wide significance level between three new loci (STAT4, ULK4, and KCNH5) and PBC in the Japanese population, two of which (ULK4 and KCNH5) have not been identified in any population previously; this result was obtained by applying RHM and could not be achieved with single-SNP GWAS only. At the same time, this approach is not a replacement for GWAS, as GWAS appears to have its own benefits regarding QTL discovery. Additional research is nonetheless required to deepen our understanding of the underlying mechanisms between these new genes and the development of PBC.

Data availability
The analyses presented in this study were in part based on data accessed through the Tohoku Medical Megabank Organization (https://www.megabank.tohoku.ac.jp/english/). The summary statistics of the RHM and GWAS reported in this paper are available at the National Bioscience Database Center Human Database (NBDC Human Database; https:// humandbs.biosciencedbc.jp/en/) public repository (Research ID: hum0261.v1).
Welfare of Japan with a grant from the Research Program of Intractable Disease (to M. Nakamura).

Compliance with ethical standards
Conflict of interest The authors declare no competing interests.
Ethical approval This project has been conducted in accordance with the Japanese National Ethical Guidelines for Human Genome/Gene Analysis [50], and reviewed and approved by the ethics committees of the University of Tokyo and of all other participating institutions.
Informed consent All of the participants gave their written consent prior to study enrollment.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons. org/licenses/by/4.0/.