## Introduction

Heritability is the fraction of phenotypic variance attributed to genetics. More specifically, assuming that the variance $$\sigma _{\mathrm{P}}^2$$ of a phenotype equals the sum of its genetic $$\sigma _{\mathrm{G}}^2$$ and environmental variance $$\sigma _{\mathrm{e}}^2$$, heritability in its broad sense (H2) is expressed by the ratio $$\sigma _{\mathrm{G}}^2$$/$$\sigma _{\mathrm{P}}^2$$ (Fisher 1918; Wright 1921). The genetic variance $$\sigma _{\mathrm{G}}^2$$ can be further broken down into its additive ($$\sigma _{\mathrm{A}}^2$$), dominant ($$\sigma _{\mathrm{D}}^2$$), and epistatic ($$\sigma _{\mathrm{I}}^2$$) components (Visscher et al. 2008; Zaitlen and Kraft 2012). Most of the existing literature focuses on the fraction of phenotypic variance $${{\upsigma }}_{\mathrm{P}}^2$$ owing to additive effects alone ($$\sigma _{\mathrm{A}}^2$$/$$\sigma _{\mathrm{P}}^2$$)—the so-called narrow-sense heritability (h2).

Heritability is interesting in its own right, but it is also pivotal in quantitative genetic studies with many practical uses. Because, by definition, heritability measures the contribution of genetics to a phenotype, it allows us to gain insights into the genetic architecture of a trait. Moreover, knowledge about the heritability of a trait helps us evaluate the effectiveness of a genome-wide association study (GWAS), as the so-called SNP heritability $$h_g^2$$ of a trait informs us about the maximum discovery potential of a given genotyping platform. Similarly, heritability estimates provide an upper bound to the accuracy of polygenic predictions with predictors potentially having higher performance in more heritable traits.

There exist many ways to estimate narrow-sense heritability h2 and they usually boil down to estimating the variance owing to additive effects ($$\sigma _{\mathrm{A}}^2$$). Assuming an additive model, a classical approach is to use the phenotypic correlation between related individuals (Fisher 1918; Wright 1921; Yang et al. 2010), which for a pair j and k is

$${\mathrm{cor}}\left( {\mathrm{y}}_{\mathrm{j}},{\mathrm{y}}_{\mathrm{k}} \right) = \sigma _{\mathrm{A}}^2{\mathbf{K}}_{{\mathbf{causal}}[\mathrm{j},\mathrm{k}]}.$$

Kcausal is an idealized genetic relationship matrix (GRM) that reflects genetic relationships between individuals at an unknown set of causal variants.

Because the set of causal variants is unknown, Kcausal has been approximated by the expected relatedness in the pedigree matrix KPED, which equals twice the kinship matrix Φ

$${\mathbf{K}}_{{\mathbf{causal}}} \approx {\mathbf{K}}_{{\mathbf{PED}}} = \mathrm{E}\left( {2{\mathbf{\Phi }}} \right).$$

The entries in the kinship matrix Φ are known as kinship coefficients. The kinship coefficient φ is the probability that a random allele from subject j is identical by descent (IBD) to an allele at the same locus from subject k. As an example, for a pair of full siblings j and k, this expected probability equals ¼ and therefore KPED[j,k] = ½. For a design that focuses exclusively on sib pairs and assuming no dominance contribution, an estimate of the additive genetic variance $${\hat{\mathrm \upsigma }}_{\mathrm{A}}^2$$ is therefore two times the phenotypic correlation rP of sib pairs

$$\hat \sigma _{\mathrm{A}}^2 = 2r_{P[\mathrm{sibs}]}.$$
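
As a minimal sketch of the sib-pair estimator, the snippet below computes twice the Pearson correlation between the phenotypes of the first and second sibling of each pair. The function names and the toy phenotype values are hypothetical. The result reads as a heritability only under the stated assumptions (purely additive model, no shared environment, phenotype scaled to unit variance).

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def sib_pair_additive_variance(sib1, sib2):
    """Classical estimate: twice the phenotypic correlation of sib pairs."""
    return 2.0 * pearson(sib1, sib2)

# Toy phenotypes for four sib pairs (made-up numbers, for illustration only).
sib1 = [1.0, 2.0, 3.0, 4.0]
sib2 = [3.0, 1.0, 2.0, 4.0]
estimate = sib_pair_additive_variance(sib1, sib2)  # r = 0.4, so estimate = 0.8
```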

When extended pedigrees are available, the entire KPED can be leveraged in a linear mixed-model (LMM) framework. In this case, the phenotype vector y is modeled as

$${\mathbf{y}}\sim \mathrm{N}\left( {{\mathbf{\mu }},\sigma _{\mathrm{A}}^2{\mathbf{K}}_{{\mathbf{PED}}} + \sigma _{\mathrm{e}}^2{\mathbf{I}}} \right)$$

and the genetic variance $${\hat{\mathrm \upsigma }}_{\mathrm{A}}^2$$ is estimated with restricted maximum likelihood (Shaw 1987; Blangero and Almasy 1997; Lange 2002; Kang et al. 2010).

When genetic data are also available, the total IBD fraction of the genome KIBD[j,k], also referred to as $${\hat{\mathrm \uppi }}$$ ($${\hat{\mathrm \uppi }}$$ = 2φ), can be estimated for j and k and used in the above LMM instead of its expected value from KPED[j,k]. In other words, instead of being approximated with KPED, Kcausal is now approximated with KIBD (Visscher et al. 2006, 2007).

The advent of SNP chips resulted in the use of thousands of markers in the computation of the GRM, thus allowing heritability to be estimated in samples without pedigree information (Yang et al. 2010; Lee et al. 2011). In this case, assuming an N × M genotype matrix $${\bar{\mathbf{G}}}$$ with zero-column mean and unit-column variance, Kcausal is approximated with an identity-by-state (IBS) genetic covariance matrix

$${\mathbf{K}}_{\mathbf{IBS}} = \frac{1}{{\mathrm{M}}}{\bar{\mathbf{G}}}{{\bar{\mathbf{G}}}}^{\mathbf{T}}.$$
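
The computation of this matrix can be sketched in a few lines of plain Python. The sketch assumes that genotypes are coded as 0/1/2 allele counts and that each SNP column is standardized with its sample mean and sample standard deviation; dedicated software may instead standardize with allele frequencies. The function names and the toy genotypes are hypothetical.

```python
from math import sqrt

def standardize_columns(G):
    """Scale every SNP column to zero mean and unit (population) variance."""
    n, m = len(G), len(G[0])
    Z = [[0.0] * m for _ in range(n)]
    for i in range(m):
        column = [row[i] for row in G]
        mean = sum(column) / n
        sd = sqrt(sum((g - mean) ** 2 for g in column) / n)
        if sd == 0.0:
            raise ValueError("monomorphic SNP: apply a MAF filter first")
        for j in range(n):
            Z[j][i] = (G[j][i] - mean) / sd
    return Z

def ibs_grm(G):
    """K_IBS = (1/M) * Z Z^T for an N x M genotype matrix G (list of rows)."""
    Z = standardize_columns(G)
    n, m = len(Z), len(Z[0])
    return [[sum(a * b for a, b in zip(Z[j], Z[k])) / m for k in range(n)]
            for j in range(n)]

# Toy genotypes (0/1/2 allele counts): 4 individuals x 3 SNPs, made up.
G = [[0, 1, 2],
     [1, 1, 0],
     [2, 0, 1],
     [1, 2, 1]]
K_ibs = ibs_grm(G)
```

With this standardization the diagonal of the matrix averages to one and every row sums to zero, which is a convenient sanity check on real data.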

Given that the set of M typed SNPs typically does not include all causal variants and/or includes tag SNPs that are in imperfect linkage disequilibrium (LD) with said variants, the use of KIBS can lead to an underestimation of the total heritability. For unrelated individuals, this estimate reflects only the proportion of phenotypic variance captured directly or indirectly by the typed SNPs, i.e., the so-called SNP heritability $$h_{g}^{2}$$ (note that SNP heritability $$h_{g}^{2}$$ is smaller than total heritability h2 and is based on unrelated individuals). The GCTA software (Yang et al. 2011) uses KIBS on unrelated individuals to estimate $$h_{g}^{2}$$. It has been shown that $$h_{g}^{2}$$ can vary with minor allele frequency (MAF), LD, and genotype certainty (Speed et al. 2017). The LDAK software (Speed et al. 2012) can be used to accommodate these parameters in the computation of KIBS.

More recently, it has been shown that KIBD can be effectively substituted in the LMM by an IBS genetic covariance matrix KIBS>t, in which all KIBS entries below a threshold t are set to zero (Zaitlen et al. 2013). Moreover, the same authors introduced a method for the simultaneous estimation of SNP $$h_{g}^{2}$$ and total heritability h2 by jointly fitting an LMM with KIBD & KIBS (or KIBS>t instead of KIBD). This way, the authors provided heritability estimates with narrower confidence intervals and showed that total heritability estimates under this approach were very similar to those under KIBD (or KIBS>t) alone.

In order for existing methods to produce meaningful heritability estimates, no population structure should be present in the studied samples (Zaitlen and Kraft 2012). Population structure can arise when individuals of different ancestry are found in the same sample and/or when individuals are admixed. Individuals from different populations tend to have different minor allele frequencies as well as different environmental exposures (Zaitlen and Kraft 2012). Because population structure correlates with environmental structure, it can inflate heritability estimates. It has been shown that, for SNP heritability ($$h_{g}^{2}$$) estimates, inclusion of principal components (PCs) as fixed effects cannot fully account for the structure bias (Browning and Browning 2011). Moreover, the differences in ancestral allele frequencies affect the computation of KIBS, which ceases to be proportional to KIBD, even for close relationships. To date, there is no clear strategy for estimating and interpreting heritability in structured/admixed populations.

Nevertheless, an LMM that uses a relationship matrix Kγ based on local ancestry, i.e., the genetic ancestry of an individual at a particular chromosomal location, instead of genotypes has been proposed (Zaitlen et al. 2014). The authors of this method produced accurate estimates of total heritability h2 for several phenotypes in admixed African-American samples by fitting an LMM with Kγ and rescaling its regression coefficient accordingly. However, this method also has limitations, as it relies on accurate knowledge of local ancestry and assumes that samples are unrelated.

In light of the above, finding an efficient framework for total narrow-sense heritability estimates in admixed populations with high levels of relatedness remains an open question. Many experimental designs could benefit from a better understanding of how heritability estimates are affected by the simultaneous presence of population and family structure. In this work, we use extensive simulations to evaluate the performance of existing classical and LMM frameworks for estimating total (h2)—but also SNP ($$h_{g}^{2}$$)—heritability in two population-based cohorts from Greenland. The Greenlanders are a population isolate with many unique characteristics, such as small census size, high levels of relatedness, and extensive population structure with ancestry from both the Inuit and Europeans (Moltke et al. 2015; Pedersen et al. 2017). Our goal is to find a way to estimate and understand narrow-sense heritability in such populations, thus gaining valuable insights into the genetic architecture of complex traits in said populations.

## Materials and methods

### Samples

#### Greenlanders

The Greenlandic subjects (N = 4659) came from two general health surveys. The first survey (Bjerregaard et al. 2003) consisted of Greenlanders living in Denmark (the BHH cohort, N = 546), recruited during 1998–1999, as well as Greenlanders living in Greenland (the B99 cohort, N = 1328), recruited during 1999–2001 as part of a general population health survey. The second survey (Jørgensen et al. 2013) consisted of Greenlanders living in Greenland (the IHIT cohort, N = 2785), recruited during 2005–2010 as part of a population health survey.

#### Danes

The population-based Danish sample (N = 5470) was obtained from Inter99 (Glümer et al. 2004), a randomized intervention study conducted at the Research Centre for Prevention and Health. In addition, 513 Danish sib pairs (N = 1169 individuals) were identified across several available Danish cohorts, namely the population-based cohorts (i) Inter99 (Jørgensen et al. 2003) (N = 294), (ii) Helbred2006 (Thuesen et al. 2014) (N = 121), (iii) Helbred2008 (Byberg et al. 2012) (N = 38), and (iv) Helbred2010 (Aadahl et al. 2014) (N = 57), all recruited from the Research Centre for Prevention and Health, Glostrup Hospital, Denmark, as well as cohorts collected for the study of type 2 diabetes, namely (v) Vejle Diabetes Biobank (Petersen et al. 2016) (N = 570), recruited at the Vejle Hospital, Denmark, (vi) the ADDITION study (Lauritzen et al. 2000) (N = 8), recruited at the Department of General Practice at the University of Aarhus, Denmark, and (vii) SDC (Andreasen et al. 2008) (N = 81), recruited at the outpatient clinic at Steno Diabetes Center, Denmark.

### Genotyping and quality control

Both the Greenlandic and the unrelated Danish samples were typed on Illumina’s Cardio-MetaboChip (Illumina, San Diego, CA, USA). The Cardio-MetaboChip includes 196,725 SNPs selected from genetic studies of cardiovascular, metabolic, and anthropometric traits (Voight et al. 2012). Moreover, the unrelated Danish samples and the Danish sib pairs were typed on Illumina’s Infinium OmniExpress chip, which includes ~710,000 markers. Standard quality control was carried out separately on each dataset with PLINK v1.9 (Chang et al. 2015) and included filtering for per-individual (--mind 0.01) and per-marker (--geno 0.01) genotype missingness above 1%. The datasets passing quality control consisted of (i) 4659 Greenlanders typed on 187,181 Cardio-MetaboChip autosomal SNPs, (ii) 5470 unrelated Danes typed on 186,639 Cardio-MetaboChip and 618,037 OmniExpress autosomal SNPs, and (iii) 1169 Danes forming sib pairs typed on 609,605 OmniExpress autosomal SNPs.

### Phenotype simulations

We simulated 1000 quantitative phenotypes with true total narrow-sense heritability h2 = {0.4, 0.6, 0.8} using real genetic data from 4659 Greenlanders and 6639 (5470 + 1169) Danes. Simulations were carried out separately on each dataset as follows:

First, we defined an N × C causal genotype matrix Gcausal by sampling C = 1500 SNPs from a list of all available SNPs.

Second, we sampled SNP effects, represented by a C × 1 effect vector b, from a standard normal distribution N(0,1). In order to model the relationship between the effect size of SNP i and its allele frequency fi, a genotype matrix G can be standardized according to the formula

$${\bar{\mathbf G}}_{{\mathrm{j}},{\mathrm{i}}} = \left( {{\mathbf{G}}_{{\mathrm{j}},{\mathrm{i}}} - 2f_i} \right)\left[ {2{f_i}\left( {1 - f_i} \right)} \right]^{\frac{\alpha }{2}},$$

(Speed et al. 2017). GCTA assumes a specific inverse relationship between SNP effect size and allele frequency by converting the original matrix Gcausal into a zero-column mean and unit-column variance standard score matrix $${\bar{\mathbf G}}_{{\mathbf{causal}}}$$. This can be seen as a special case of the above standardizing formula for α = −1.
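
The standardizing formula can be illustrated with a short sketch. It assumes 0/1/2 allele counts and estimates each allele frequency fi from the sample itself; the function name and the toy genotypes are hypothetical.

```python
def standardize_genotypes(G, alpha=-1.0):
    """Center each SNP at 2*f_i and scale by [2 f_i (1 - f_i)]^(alpha / 2)."""
    n, m = len(G), len(G[0])
    X = [[0.0] * m for _ in range(n)]
    for i in range(m):
        f = sum(row[i] for row in G) / (2.0 * n)   # sample allele frequency
        if not 0.0 < f < 1.0:
            raise ValueError("monomorphic SNP: apply a MAF filter first")
        scale = (2.0 * f * (1.0 - f)) ** (alpha / 2.0)
        for j in range(n):
            X[j][i] = (G[j][i] - 2.0 * f) * scale
    return X

# Toy genotypes: 4 individuals x 2 SNPs, both with allele frequency 0.5.
G = [[0, 2], [1, 1], [2, 0], [1, 1]]
X_gcta = standardize_genotypes(G, alpha=-1.0)  # unit-variance scaling
X_raw = standardize_genotypes(G, alpha=0.0)    # centering only
```

With α = −1 each column is divided by the square root of 2fi(1 − fi), whereas α = 0 leaves the centered allele counts unscaled.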

Third, we computed a vector of polygenic score S by multiplying $${\bar{\mathbf G}}_{{\mathbf{causal}}}$$ by the corresponding effect vector b

$$\mathbf{S} = {\bar{\mathbf G}}_{{\mathbf{causal}}}{\mathbf{b}},$$

and its additive genetic variance as

$$\sigma _A^2 = \mathrm{var}\left( \mathbf{S} \right).$$

Finally, we computed the phenotype vector P by adding an environmental vector ε of i.i.d. error terms to vector S

$${\mathbf{P}} = {\mathbf{S}} + {\boldsymbol{\varepsilon}}.$$

Error terms were sampled from the distribution $${\mathrm{N}}\left( {0,{\mathrm{var}}\left( {\mathbf{S}} \right)\left( {\frac{1}{{h^2}} - 1} \right)} \right)$$ (Yang et al. 2011).
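
The simulation steps above can be sketched as follows. This is an illustration under simplifying assumptions, not the pipeline used in the study: the causal genotypes are randomly generated with allele frequency 0.5 rather than drawn from real data, only 20 causal SNPs are used, and all names are hypothetical.

```python
import random
from math import sqrt

def pvar(x):
    """Population variance of a sequence."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)

def simulate_phenotype(Z_causal, h2, rng):
    """Return (S, P) for one simulated trait.

    Z_causal: N x C standardized causal genotypes (list of rows).
    h2:       target narrow-sense heritability, 0 < h2 <= 1.
    rng:      a random.Random instance, so that runs are reproducible.
    """
    n_causal = len(Z_causal[0])
    b = [rng.gauss(0.0, 1.0) for _ in range(n_causal)]            # effects ~ N(0, 1)
    S = [sum(z * e for z, e in zip(row, b)) for row in Z_causal]  # polygenic score
    sd_eps = sqrt(pvar(S) * (1.0 / h2 - 1.0))                     # noise SD implied by h2
    P = [s + rng.gauss(0.0, sd_eps) for s in S]
    return S, P

# Toy data: 2000 individuals, 20 causal SNPs with allele frequency 0.5,
# standardized with the alpha = -1 convention (mean 1, variance 0.5).
rng = random.Random(1)
Z = [[(sum(rng.random() < 0.5 for _ in range(2)) - 1.0) / sqrt(0.5)
      for _ in range(20)] for _ in range(2000)]
S, P = simulate_phenotype(Z, h2=0.6, rng=rng)
realized_h2 = pvar(S) / pvar(P)  # close to, but not exactly, 0.6
```

The realized ratio var(S)/var(P) fluctuates around the target h2 because the noise is sampled, which is one reason to simulate many replicate phenotypes.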

In some simulations involving the Greenlanders, we also modeled the interaction between environment and ancestry by adding an interaction vector E×Anc to the sum

$${\mathbf{P}} = {\mathbf{S}} + {\mathbf{E}} \times {\mathbf{Anc}} + {\boldsymbol{\varepsilon}}.$$

In particular, if $$h_{\mathrm{E} \times \mathrm{Anc}}^2$$ is the proportion of phenotypic variance explained by the interaction and θInuit is the vector of proportion of Inuit ancestry, then

$${\mathbf{E}} \times {\mathbf{Anc}} = {\mathbf{\theta }}_{{\mathbf{Inuit}}}\sqrt {\left( {\frac{{h_{\mathrm{E} \times \mathrm{Anc}}^2}}{{h^2}}} \right)\frac{{{\mathrm{var}}({\mathbf{S}})}}{{{\mathrm{var}}({\mathbf{\theta }}_{{\mathbf{Inuit}}})}}}.$$

As a consequence, the noise terms in ε are now sampled from $${\mathrm{N}}\left( {0,{\mathrm{var}}\left( {\mathbf{S}} \right)\left( {\frac{1}{{h^2}} - 1} \right) - {\mathrm{var}}\left( {{\mathbf{E}} \times {\mathbf{Anc}}} \right)} \right)$$.
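
A small numerical sketch of this construction is given below. The function name and the toy vectors are hypothetical; the check that the residual variance stays non-negative is an added safeguard, not part of the description above.

```python
from math import sqrt

def pvar(x):
    """Population variance of a sequence."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)

def ancestry_interaction(S, theta_inuit, h2, h2_exanc):
    """Return the E x Anc vector and the variance of the remaining noise."""
    var_s = pvar(S)
    scale = sqrt((h2_exanc / h2) * var_s / pvar(theta_inuit))
    e_x_anc = [t * scale for t in theta_inuit]
    noise_var = var_s * (1.0 / h2 - 1.0) - pvar(e_x_anc)
    if noise_var < 0.0:
        raise ValueError("h2 + h2_exanc must not exceed 1")
    return e_x_anc, noise_var

# Toy inputs (made up): polygenic scores and Inuit ancestry proportions.
S = [0.0, 1.0, 2.0, 3.0]
theta = [0.0, 0.5, 0.5, 1.0]
e_x_anc, noise_var = ancestry_interaction(S, theta, h2=0.5, h2_exanc=0.2)
```

In this toy case var(S) = 1.25, var(E×Anc) = 0.5, and the noise variance is 0.75, so the three components add up to var(S)/h2 = 2.5 and the interaction explains 20% of the phenotypic variance.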

In other simulations involving the Greenlandic sib pairs, we added a vector E reflecting shared environment—i.e., the household effect (Almasy and Blangero 1998)—between siblings

$${\mathbf{P}} = {\mathbf{S}} + {\mathbf{E}} + {\boldsymbol{\varepsilon}}.$$

In particular, we drew environmental effects from a normal distribution making sure to assign the same value to all individuals belonging to the same sibling cluster. As a consequence, the noise terms in ε are now sampled from $${\mathrm{N}}\left( {0,{\mathrm{var}}\left( {\mathbf{S}} \right)\left( {\frac{1}{{h^2}} - 1} \right) - {\mathrm{var}}\left( {\mathbf{E}} \right)} \right)$$.
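
The shared-environment draw can be sketched as below. The standard deviation of the household effect is treated here as a free parameter, which is an assumption of the sketch; the family identifiers and function name are hypothetical.

```python
import random

def household_effects(family_ids, sd_household, rng):
    """Draw one N(0, sd^2) value per family and share it among its members."""
    per_family = {}
    for fam in family_ids:
        if fam not in per_family:
            per_family[fam] = rng.gauss(0.0, sd_household)
    return [per_family[fam] for fam in family_ids]

# Six individuals in three sibling clusters (made-up identifiers).
family_ids = ["f1", "f1", "f2", "f2", "f2", "f3"]
E = household_effects(family_ids, sd_household=0.5, rng=random.Random(7))
```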

### Linear mixed model

In an LMM, the phenotype y is modeled as a mixture of fixed and random effects (i.e., the effects of the causal variants)

$$\mathbf{y} = {\boldsymbol{\mu}} + {\bar{\mathbf G}}_{{\mathbf{causal}}}{\mathbf{b}} + {\boldsymbol{\varepsilon}}.$$

Assuming that b ~ N(0, $$\frac{{\upsigma}_{\mathrm{A}}^{2}}{{\mathrm{C}}}$$) and ε ~ Ν(0, $${{\upsigma }}_{\mathrm{e}}^2{\mathbf{I}}$$) under the GCTA model with α = −1, y follows a multivariate normal distribution with mean μ and variance

$$\begin{array}{*{20}{l}} {{\mathrm{var}}}({\mathbf{y}}) \hfill & = \hfill & {\mathrm{var}}({\boldsymbol{\mu }} + {\bar{\mathbf G}}_{{\mathbf{causal}}}{\mathbf{b}} + {\boldsymbol{\varepsilon}}) \hfill \\ {} \hfill & = \hfill & {\mathrm{var}}({\bar{\mathbf G}}_{{\mathbf{causal}}}{\mathbf{b}}) + \mathrm{var}({\boldsymbol{\varepsilon}}) \hfill \\ {} \hfill & = \hfill & {\bar{\mathbf G}}_{{\mathbf{causal}}}\mathrm{var}\left( {\mathbf{b}} \right){\bar{\mathbf G}}_{{\mathbf{causal}}}^{\mathbf{T}} + \mathrm{var}\left( {\boldsymbol{\varepsilon}} \right) \hfill \\ {} \hfill & = \hfill & {\bar{\mathbf G}}_{{\mathbf{causal}}}\frac{{{{\upsigma }}_{\mathrm{A}}^2}}{{\mathrm{C}}}{\bar{\mathbf G}}_{{\mathbf{causal}}}^{\mathbf{T}} + {{\upsigma }}_{\mathrm{e}}^2{\mathbf{I}} \hfill \\ {} \hfill & = \hfill & {{\upsigma }}_{\mathrm{A}}^2{\mathbf{{K}}}_{{\mathbf{causal}}} + {{\upsigma }}_{\mathrm{e}}^2{\mathbf{I}} \hfill \end{array},$$

such that

$${\mathbf{y}}\sim {\mathrm{N}}\left({\mathbf{\mu }},{{\upsigma }}_{\mathrm{A}}^2{\mathbf{K}}_{{\mathbf{causal}}} + {{\upsigma }}_{\mathrm{e}}^2{\mathbf{I}} \right).$$

Total narrow-sense heritability is then defined as

$$h^2 = \frac{{{{\upsigma }}_{\mathrm{A}}^2}}{{{{\upsigma }}_{\mathrm{A}}^2 + {{\upsigma }}_{\mathrm{e}}^2}}.$$

Because Kcausal is unknown, we approximated it with other GRMs instead, such as KIBD, KIBS, and KIBS>t.

### Relationship matrices

We computed the KIBD matrix for the entire Greenlandic sample from pairwise kinship coefficients ($${\hat{\mathrm \uppi }}_{{\mathrm{j}},{\mathrm{k}}}$$ = 2φj,k = ½k1,j,k + k2,j,k) using RelateAdmix (Moltke and Albrechtsen 2014) or alternatively REAP (Thornton et al. 2012) on a genotype file with MAF cutoff = 0.01. We note that the total genomic IBD estimates are generally robust to the ascertainment scheme of the array used. We subsequently identified 1465 Greenlandic sib pairs by use of empirical thresholding over k1 (IBD1, 0.3 < k1 ≤ 0.7) and k2 (IBD2, 0.1 < k2 ≤ 0.5) on the RelateAdmix output. We then recomputed the KIBD matrix for the identified sib pairs using RelateAdmix. KIBD for the Danish sib pairs was computed with the PLINK --genome flag, using MAF cutoff = 0.01. For both the Greenlandic and Danish sib pairs, we also computed a KIBD>t matrix, in which all entries below a threshold t = 0.05 were set to zero, and a $${\mathbf{K}}_{{\mathbf{IBD}}}^0$$ matrix, in which all between-sib-pair values were set to zero.
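
The relationship between the IBD sharing probabilities and the matrix entries, together with the thresholding rules quoted above, can be sketched as follows. The function names are hypothetical, and the inputs k1 and k2 are assumed to come from an external IBD inference tool.

```python
def pi_hat(k1, k2):
    """Total IBD fraction: pi_hat = 2 * phi = k1 / 2 + k2."""
    return 0.5 * k1 + k2

def is_sib_pair(k1, k2):
    """Empirical sib-pair window: 0.3 < k1 <= 0.7 and 0.1 < k2 <= 0.5."""
    return 0.3 < k1 <= 0.7 and 0.1 < k2 <= 0.5

def threshold_matrix(K, t=0.05):
    """Set every entry below t to zero, leaving the other entries unchanged."""
    return [[v if v >= t else 0.0 for v in row] for row in K]

# Expected sharing for full siblings: k1 = 0.5, k2 = 0.25, so pi_hat = 0.5.
sib_pi = pi_hat(0.5, 0.25)
# A parent-offspring pair (k1 = 1, k2 = 0) falls outside the sib-pair window.
parent_offspring_flag = is_sib_pair(1.0, 0.0)
K_thresholded = threshold_matrix([[1.0, 0.03], [0.03, 1.0]], t=0.05)
```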

KIBS and KIBS>t (t = 0.05) for both Greenlanders and Danes were computed with GCTA using a MAF cutoff = 0.01. Causal variants were removed from the computation of KIBS and KIBS>t. Because causal variants are selected from the entire list of available SNPs, we assume that they have an allele frequency distribution similar to the genotyped SNPs. We can explicitly control for this by adding --grm-adj 0 to the GCTA command line; however, this setting had no effect on our estimates and was dropped early (data not shown). For some heritability estimations, we also computed $${\mathbf{K}}_{{\mathbf{IBS}}}^ \ast$$ after removing not only the causal variants, but also all variants in LD with those (i.e., applying extreme LD pruning in the vicinity of causal variants). In addition, we estimated heritability by use of a $${\mathbf{K}}_{{\mathbf{IBS}}}^{\mathbf{c}}$$ matrix in which causal variants were included in the computation. Finally, household effects in Greenlandic sib pairs were captured by use of the KHH matrix, whereby 1’s were assigned to all pairs of an individual with itself and its siblings and 0’s otherwise.
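
The household matrix described above is simple to construct. The sketch below assumes that each individual carries a sibling-cluster identifier; the identifiers and function name are hypothetical.

```python
def household_matrix(family_ids):
    """K_HH[j][k] = 1 if j and k share a sibling cluster (or j == k), else 0."""
    n = len(family_ids)
    return [[1.0 if family_ids[j] == family_ids[k] else 0.0 for k in range(n)]
            for j in range(n)]

# Two siblings from cluster "a" and one individual from cluster "b".
K_hh = household_matrix(["a", "a", "b"])
```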

### Heritability estimation

Additive genetic variance $${\hat{\mathrm \upsigma }}_{\mathrm{A}}^2$$ and, subsequently, total narrow-sense heritability h2 were estimated for various GRMs with the GRM-based restricted maximum likelihood (GREML) procedure implemented in GCTA and in LDAK. For the sib pairs in particular, we carried out total narrow-sense heritability estimations using (i) the IBD-based matrices (KIBD, KIBD>t, and $${\mathbf{K}}_{{\mathbf{IBD}}}^0$$) alone, (ii) the IBS-based matrices (KIBS, $${\mathbf{K}}_{{\mathbf{IBS}}}^ \ast$$, and $${\mathbf{K}}_{{\mathbf{IBS}}}^{\mathbf{c}}$$) alone, (iii) KIBS together with KIBD, KIBD>t, $${\mathbf{K}}_{{\mathbf{IBD}}}^0$$, or KIBS>t, and (iv) the classical sib-pair approach. When two relationship matrices were used, $${\hat{\mathrm \upsigma }}_{\mathrm{A}}^2$$ was equal to the sum of the two variance components corresponding to said matrices (Zaitlen et al. 2013). In the particular case of evaluating the household effect, its variance $${\hat{\mathrm \upsigma }}_{{\mathrm{HH}}}^2$$ was subtracted from the final estimation. We also estimated $${\hat{\mathrm \upsigma }}_{\mathrm{A}}^2$$ after adjusting for the first 5, 10, or 20 PCs, or a proportion of Inuit ancestry (where applicable). We note that SNP array ascertainment is not expected to affect total heritability estimates, as those are dependent on robust IBD measures. Conversely, SNP heritability estimates are sensitive to the ascertainment scheme of a given genotyping platform.
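
One way to read the combination of variance components is sketched below. It assumes that the variance components have already been estimated by a REML procedure, and that a household component, when fitted, contributes to the phenotypic variance but not to the additive genetic variance. This is an interpretation for illustration only; the function name and the numbers are hypothetical.

```python
def total_heritability(genetic_components, var_residual, var_household=0.0):
    """h2 = (sum of genetic variance components) / (total phenotypic variance).

    genetic_components: variance components of the fitted GRMs, e.g. the
                        two components of a two-GRM model.
    var_household:      household variance, kept out of the numerator.
    """
    var_a = sum(genetic_components)
    var_p = var_a + var_household + var_residual
    return var_a / var_p

# Made-up variance components for a two-GRM model.
h2_two_grm = total_heritability([0.3, 0.2], var_residual=0.5)
h2_with_household = total_heritability([0.3, 0.2], var_residual=0.5,
                                       var_household=0.25)
```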

### Analysis settings

We ran phenotype simulations and heritability estimates on four groups of Greenlanders: (i) all samples (N = 4659), (ii) sib pairs (N = 1688), (iii) more distantly related individuals (“cousins”; N = 2615), and (iv) unrelated individuals (N = 585), as well as two separate groups of Danes: (i) unrelated individuals (N = 5470), and (ii) sib pairs (N = 1169). In both populations, the $${\hat{\mathrm \uppi }}$$ threshold for identifying unrelated individuals was 0.0625. Note that we did not merge the unrelated Danes with the Danish sib pairs because, unlike the Greenlanders, they do not come from the same population-based study. We estimated heritability on the above groups using different GRMs, without and with covariates (summarized in Table 1). For the sake of simplicity, we do not show results that are nonsensical (e.g., KIBD for unrelated individuals).

### Application to real data

We applied the best-performing model to real phenotypic data from the two population-based Greenlandic cohorts. All phenotypes considered were quantitative and consisted of basic anthropometric traits (height, weight, body mass index, hip circumference, waist circumference, and waist-to-hip ratio), as well as serum lipid levels (total cholesterol, HDL cholesterol, LDL cholesterol, and triglycerides). Data were rank-transformed to the quantiles of a standard normal distribution. Age and sex were included as covariates. We also carried out an empirical investigation of the impact of allele frequency and LD weighting—as defined in the LDAK model (Speed et al. 2017)—on heritability estimates. In particular, we estimated total narrow-sense heritability for the ten available traits assuming seven different genotype standardizations by setting LDAK’s parameter α = {−1.25, −1, −0.75, −0.5, −0.25, 0, 0.25}, and accounting for LD weighting. In this context, the model used by GCTA can be seen as a special case of the LDAK model by setting α = −1 and ignoring LD weighting.

## Results

### Admixture and relatedness in the Greenlandic and Danish data

Principal component analysis (PCA) and ADMIXTURE (Alexander et al. 2009) analysis of 4659 individuals showed that the general Greenlandic population is the result of admixture between Greenlandic Inuit and European populations, and that there is high variance in the admixture profiles of the Greenlanders (Fig. 1a) (Moltke et al. 2015; Pedersen et al. 2017). We note that assuming K = 2 ancestral components is a simplification of the admixture history of the Greenlanders as it has been previously shown that there are up to three distinct Inuit ancestral components with FST values as high as 0.04 (Moltke et al. 2015). Conversely, a sample of 5470 Danish individuals appeared largely unstructured (Fig. 1b), matching previous observations (Athanasiadis et al. 2016). We identified a large number of sib pairs in the Greenlandic sample (1465 pairs, N = 1688 individuals, Fig. S1A). We also confirmed the lack of relatedness in the 5470 unrelated Danish samples (Fig. S1B) and the presence thereof in the 513 Danish sib pairs (Fig. S1C).

### Identity by state in the Greenlandic and Danish data

We illustrate the intrinsic differences of IBS between admixed and unadmixed populations by plotting the IBS-based genetic covariance against the IBD-based $${\hat{\mathrm \uppi }}$$ estimates from the Greenlandic and Danish sib pairs, respectively (Fig. 2). For a given kinship (e.g., full siblings), the corresponding IBS values were far more dispersed in the Greenlanders (Fig. 2a) than in the Danes (Fig. 2b). This is due to the heterogeneous admixture profiles in the Greenlanders (Fig. 1a). In other words, whereas IBS is proportional to IBD in unadmixed individuals, this does not hold for the admixed individuals.

### Heritability estimates in phenotypes with no population-specific environmental effects

We explored a number of approaches for estimating narrow-sense heritability h2 in the admixed Greenlandic and the unadmixed Danish population. We simulated quantitative traits by (i) randomly selecting 1500 causal loci with effect sizes depending on the allele frequency, such that the effect sizes of the standardized genotypes are normally distributed as assumed in the GCTA software, and (ii) adding environmental noise so that the true simulated h2 was 0.4, 0.6, or 0.8. In these simulations, all individuals were assigned the same environmental variance regardless of their ancestry. We then estimated h2 in an LMM framework for different GRMs (e.g., KIBD, KIBS, and KIBS>t) and combinations thereof (Fig. 3; Figs. S2 and S3; Supplementary file). In the following paragraphs, we first report the results based on one GRM, followed by the results based on two GRMs. As a reminder, for heritability estimates to be interpreted as “total” (h2), it is required that the sample includes related individuals.

#### Total heritability estimates in sib pairs using one GRM

The use of KIBD, which captures the fraction of the genome-shared IBD ($${\hat{\mathrm \uppi }}$$), resulted in underestimates of total heritability both in the Greenlandic and the Danish sib pairs (Fig. 3). Total heritability in the Greenlandic sib pairs was also underestimated when we used KIBD>t (Fig. 3a) or $${\mathbf{K}}_{{\mathbf{IBD}}}^0$$ (Fig. S2A). Conversely, using KIBD>t or $${\mathbf{K}}_{{\mathbf{IBD}}}^0$$ in the Danish sib pairs did not result in any significant downward biases (Fig. 3b; Fig. S2B). These results were insensitive to the method of IBD inference: both our method of choice (RelateAdmix) and an alternative (REAP) returned similar results (data not shown).

For closely related unadmixed individuals, IBS is proportional to IBD, and therefore estimates based on KIBS will correspond to the total narrow-sense heritability as well (Hayes et al. 2009). Indeed, true simulated h2 was fully recovered in the Danish sib pairs when KIBS was used (Fig. 3b), but showed consistent downward biases across all simulated h2 values in the Greenlandic sib pairs (Fig. 3a). Removing all SNPs with any LD with the causal variants (r2 = 0) from the GRM (i.e., using the $${\mathbf{K}}_{{\mathbf{IBS}}}^ \ast$$ matrix) returned lower yet still comparable h2 estimates in both the Greenlandic (Fig. S2A) and Danish sib pairs (Fig. S2B). Conversely, when causal variants were included in the computation of the GRM (i.e., using the $${\mathbf{K}}_{{\mathbf{IBS}}}^{\mathbf{c}}$$ matrix), then $${\mathbf{K}}_{{\mathbf{IBS}}}^{\mathbf{c}} \cong$$ Kcausal and, consequently, h2 was recovered (Fig. S4).

#### Total heritability estimates in sib pairs using two GRMs

The use of two GRMs (e.g., KIBD & KIBS) for heritability estimates is meant to leverage datasets in which both closely and more distantly related individuals are present (Zaitlen et al. 2013). However, when we applied this approach to the entire Greenlandic dataset (N = 4659), we observed a downward bias in total heritability estimates for the KIBD & KIBS model, while using KIBS>t & KIBS erroneously returned heritability estimates near 1.00 regardless of the true simulated value (Fig. S5). Nevertheless, when we performed the two-GRM analysis on the 1465 Greenlandic sib pairs alone, the true simulated h2 was almost perfectly recovered for all GRM combinations, with estimates showing only a minor downward bias (Fig. 3a; Fig. S2A). Interestingly, these models outperform the classical sib-pair analysis as evidenced by their lower root-mean-square deviation (Table S1; Fig. S3). Extending the sample to include more distant relatives (“cousins”; $$\hat \pi$$ = [0.15, 0.67]; N = 2615) resulted in underestimates of the total heritability (Fig. S6), implying that the KIBD & KIBS model performs efficiently only on first-degree relatives when admixture is present. We therefore further examined the KIBD & KIBS model in the following section, focusing our attention on the sib pairs.

#### Total heritability estimates of phenotypes with shared environment between siblings

When we performed the two-GRM analysis of phenotypes that included the effect of shared environment on the 1465 Greenlandic sib pairs, the h2 estimates were inflated relative to the true simulated values (squares in Fig. S7). The stronger the household effect, the higher the inflation. Notably, the inclusion of 10 PCs or a proportion of Inuit ancestry did not have any noticeable effect on the estimates (circles and triangles in Fig. S7). However, when we performed the KIBD & KIBS & KHH analysis and subtracted the variance of the household effect, we were able to recover the true simulated total heritability almost perfectly (diamonds in Fig. S7).

#### SNP heritability estimates in unrelated individuals

As previously mentioned, the use of KIBS on unrelated individuals yields SNP ($$h_g^2$$) rather than total heritability (h2) estimates (Yang et al. 2010). Bearing this in mind, we estimated $$h_g^2$$ in both unrelated Greenlanders and unrelated Danes (Fig. S8; Table S2). In all cases, we found that $$h_g^2$$ < h2, as expected. For the Danish samples in particular, the MetaboChip $$h_g^2$$ was smaller than the OmniExpress $$h_g^2$$.

### Heritability estimates in phenotypes with population-specific environmental effects

In all of the above simulations, we assumed that the environmental component was independent of ancestry. However, when we added to the simulated phenotypes an environmental component correlating with ancestry, the use of KIBS in the unrelated Greenlanders led to overestimates of SNP heritability $$h_g^2$$, despite adjusting for population structure (Table S2).

Adjusting the KIBD model for either the first 10 PCs or a proportion of Inuit ancestry produced a consistent yet uninterpretable pattern along the $$\frac{{h_{\mathrm{E} \times \mathrm{Anc}}^2}}{{h^2}}$$ ratio (Fig. S9). On the contrary, adjusting the KIBD & KIBS model for the same covariates produced a predictable as well as interpretable pattern across all choices for the $$\frac{{h_{\mathrm{E} \times \mathrm{Anc}}^2}}{{h^2}}$$ ratio (Fig. 4). In particular, estimates from the KIBD & KIBS model without covariates corresponded to the inflated quantity $$\frac{{\sigma _{\mathrm{A}}^2 \, + \ \sigma _{{\mathrm{E}} \times {\mathrm{Anc}}}^2}}{{\sigma _{\mathrm{A}}^2 \, + \ \sigma _{{\mathrm{E}} \times {\mathrm{Anc}}}^2 + \ \sigma _{\mathrm{e}}^2}}$$ (squares in Fig. 4). After adjustment for ancestry, the resulting estimates corresponded to removing the environmental interaction component ($$\sigma _{{\mathrm{E}} \times {\mathrm{Anc}}}^2$$) from the numerator and denominator of the formula (i.e., $$\frac{{\sigma _{\mathrm{A}}^2}}{{\sigma _{\mathrm{A}}^2 \, + \ \sigma _{\mathrm{e}}^2}}$$; circle and triangle points in Fig. 4). We note that adjustment for 10 PCs was equivalent to adjustment for a proportion of Inuit ancestry (Fig. 4).

Thanks to the interpretability of the resulting “conditional” estimates, we were able to recover the true simulated heritability $$\frac{{{{\upsigma }}_{\mathrm{A}}^2}}{{{{\upsigma }}_{\mathrm{A}}^2 \, + \ {{\upsigma }}_{{\mathrm{E}} \times {\mathrm{Anc}}}^2 \, + \ {{\upsigma }}_{\mathrm{e}}^2}}$$ – i.e., the “marginal” heritability (Weissbrod et al. 2018) (diamond points in Fig. 4). We achieved this by rescaling the conditional estimates by a factor of $$1 - \frac{{{\hat{\mathrm \upsigma }}_{{\mathrm{E}} \times {\mathrm{Anc}}}^2}}{{{{\upsigma }}_{\mathrm{P}}^2}}$$, where $${\hat{\mathrm \upsigma }}_{{\mathrm{E}} \times {\mathrm{Anc}}}^2$$ is an estimate of the environmental interaction variance computed as $${\hat{\mathrm \upsigma }}_{{\mathrm{E}} \times {\mathrm{Anc}}}^2 = {{\upsigma }}_{\mathrm{P}}^2 - {{\upsigma }}_{{\hat{\mathrm P}}}^2$$. $${\hat{\mathrm P}}$$ is the phenotype residuals after regressing out the effect of structure captured either by admixture proportions or the first two principal components.
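
The rescaling step can be sketched as follows, assuming for simplicity that structure is captured by a single covariate (e.g., the proportion of Inuit ancestry) and regressed out with ordinary least squares. The function names and toy values are hypothetical.

```python
def pvar(x):
    """Population variance of a sequence."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)

def residuals_simple_ols(y, x):
    """Residuals of y after a least-squares fit on one covariate x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def marginal_heritability(h2_conditional, phenotype, ancestry):
    """Rescale a conditional estimate by 1 - var(E x Anc) / var(P)."""
    var_p = pvar(phenotype)
    var_resid = pvar(residuals_simple_ols(phenotype, ancestry))
    var_exanc = var_p - var_resid          # estimated interaction variance
    return h2_conditional * (1.0 - var_exanc / var_p)

# Toy data (made up): var(P) = 3.25 and the residual variance is 1.
phenotype = [1.0, 3.0, 4.0, 6.0]
ancestry = [0.0, 0.0, 1.0, 1.0]
h2_marginal = marginal_heritability(0.65, phenotype, ancestry)
```

In this toy case the rescaling factor is 1/3.25, so a conditional estimate of 0.65 corresponds to a marginal heritability of 0.2.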

### Application to real phenotypes

We applied the best model (i.e., KIBD & KIBS & 10 PCs) and the follow-up PCA-based adjustment to ten quantitative traits in the 1465 Greenlandic sib pairs (Table 2). Not all phenotypes were equally sensitive to the PCA-based adjustment of their estimated conditional heritability, implying trait-specific environment-by-ancestry interactions. The GCTA model accommodates only one type of genotype standardization (α = −1), which imposes strong assumptions about the distribution of effect sizes. We therefore also used the LDAK model (Speed et al. 2017) and found that the optimal α value for genotype standardization varied across traits, with most phenotypes supporting α ≥ −0.5 (Fig. S10; Table S3). Total heritability estimates under the LDAK model (Speed et al. 2017) were generally higher than under GCTA (Table 2; Table S3), with the greatest difference observed for height (0.657 ± 0.042 for GCTA versus 0.786 ± 0.041 for LDAK). In addition, heritability estimates for eight of the ten phenotypes were smaller in the Greenlanders than the corresponding estimates obtained with similar models in European or Mexican populations.
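To make the role of α concrete, the sketch below scales centered genotypes by $$[2p(1-p)]^{\alpha /2}$$, the standardization form we take from Speed et al. (2017). It is an illustrative assumption-laden example, not the GCTA or LDAK implementation: the function name `standardize_genotypes` is hypothetical, allele frequencies are estimated from the sample itself, and LD weighting is omitted entirely.

```python
import numpy as np

def standardize_genotypes(genotypes, alpha=-1.0):
    """Center 0/1/2 genotypes and scale each SNP by [2p(1-p)]^(alpha/2).

    Hypothetical helper for illustration. alpha = -1 gives unit expected
    variance per SNP (the GCTA-style choice); alpha = 0 only centers.
    """
    G = np.asarray(genotypes, dtype=float)  # individuals x SNPs
    p = G.mean(axis=0) / 2.0                # sample allele frequencies

    # Drop monomorphic SNPs, for which the scaling is undefined.
    keep = (p > 0.0) & (p < 1.0)
    G, p = G[:, keep], p[keep]

    scale = (2.0 * p * (1.0 - p)) ** (alpha / 2.0)
    return (G - 2.0 * p) * scale
```

Under this form, α = −1 implies that rarer variants have larger per-allele effects, whereas values closer to 0 weaken that relationship, which is why the supported α can differ between traits.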

## Discussion

In this work, we explored the performance of existing methods for heritability estimation in the admixed Greenlandic population. Our goal was to propose a framework for unbiased heritability estimation in datasets where both population and family structure are prominent, as well as a way to interpret the resulting estimates. Even though the main focus is on total narrow-sense heritability (h2), we also report the results for SNP heritability ($$h_g^2$$), a quantity that has gained a lot of attention in the past decade due to the availability of GWAS data (Yang et al. 2010; Lee et al. 2011; Browning and Browning 2011).

Through extensive simulations, we observed that all LMMs using one GRM led to downward biases in total heritability estimates when applied to family data from Greenland. Common choices of GRM, such as KIBD and KIBS, led to underestimates of total heritability in Greenlandic sib pairs, whereas no such biases were generally observed for the Danish sib pairs, indicating that inheriting DNA from different ancestral populations (i.e., admixture) biases both IBD- and IBS-based estimates. This is not surprising for the IBS-based estimates, as the IBS ~ IBD assumption does not hold for the Greenlanders, but it is less clear why IBD-based heritability estimates are also affected by admixture. One possible explanation is that IBD estimates become less accurate for more distantly related pairs, so that including them in the LMM introduces noise, as evidenced by the underperformance of the full KIBD matrix in the Danes.

We also observed that an LMM with two GRMs (KIBD & KIBS), a method designed to work on data with a notable presence of family structure (Zaitlen et al. 2013), also led to downward biases in total heritability estimates when applied to the entire dataset from Greenland. However, when the same analysis was restricted to the Greenlandic sib pairs, it returned nearly unbiased heritability estimates. This could be because, by restricting the analysis to the sib pairs, we controlled more efficiently for the noise that comes from between-sib-pair IBD estimates (Moltke and Albrechtsen 2014). The KIBD & KIBS model performs well under the assumption that shared environment among siblings has a negligible effect. We show that a nonzero household effect can potentially inflate total heritability estimates, but this effect can be accounted for, at least in the simulation setting, by including a shared environment matrix KHH. We note that the KIBD & KIBS model outperformed the KIBD model in the Danes, which suggests that using two GRMs is also advisable for total narrow-sense heritability estimates in unadmixed populations.

When there is no environmental correlation with ancestry, the KIBD & KIBS model (or any other combination of one IBD-based and one IBS-based GRM) provides an accurate estimate of the true heritability, matched only by the classical sib-pair analysis. However, we expect environmental structure to inflate heritability estimates because of its correlation with genetic structure, and we found that adjusting for structure did not remove the inflation. Nevertheless, we provide a way to interpret the total heritability estimates obtained from the KIBD & KIBS & 10 PCs model, as well as a way to adjust for the inflation. In particular, the quantity estimated after adjusting for model covariates, as in our case, is referred to as “conditional heritability” in a recent paper (Weissbrod et al. 2018). We observed that, under the KIBD & KIBS & 10 PCs model, the resulting conditional heritability estimate is inflated by a factor of 1/(1 − $$h_{\mathrm{E} \times \mathrm{Anc}}^2$$) relative to the “marginal heritability” (Weissbrod et al. 2018), and we propose an adjustment that accounts for this inflation in order to retrieve the latter. Finally, we note that the classical sib-pair approach will also produce inflated estimates when there is interaction with the environment, and that adjustment for PCs will not fix the issue.
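The inflation factor follows directly from the variance decomposition used in the simulations. The short derivation below assumes $$\sigma _{\mathrm{P}}^2 = \sigma _{\mathrm{A}}^2 + \sigma _{{\mathrm{E}} \times {\mathrm{Anc}}}^2 + \sigma _{\mathrm{e}}^2$$ and reads $$h_{\mathrm{E} \times \mathrm{Anc}}^2$$ as $$\sigma _{{\mathrm{E}} \times {\mathrm{Anc}}}^2 / \sigma _{\mathrm{P}}^2$$; the labels $$h_{\mathrm{cond}}^2$$ and $$h_{\mathrm{marg}}^2$$ are introduced here only for illustration:

$$h_{\mathrm{cond}}^2 = \frac{\sigma _{\mathrm{A}}^2}{\sigma _{\mathrm{A}}^2 + \sigma _{\mathrm{e}}^2} = \frac{\sigma _{\mathrm{A}}^2}{\sigma _{\mathrm{P}}^2 - \sigma _{{\mathrm{E}} \times {\mathrm{Anc}}}^2} = \frac{\sigma _{\mathrm{A}}^2 / \sigma _{\mathrm{P}}^2}{1 - \sigma _{{\mathrm{E}} \times {\mathrm{Anc}}}^2 / \sigma _{\mathrm{P}}^2} = \frac{h_{\mathrm{marg}}^2}{1 - h_{\mathrm{E} \times \mathrm{Anc}}^2}$$

As a purely hypothetical numerical illustration, a trait with $$h_{\mathrm{marg}}^2 = 0.4$$ and $$h_{\mathrm{E} \times \mathrm{Anc}}^2 = 0.2$$ would yield a conditional estimate of 0.4/0.8 = 0.5, and multiplying that estimate by (1 − 0.2) returns the marginal value.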

As for the total narrow-sense heritability estimates of the real phenotypes obtained with the best model (KIBD & KIBS & 10 PCs), we observe that in some cases these are lower for the Greenlandic population than for European populations. A notable example is height, for which total heritability in the Greenlanders was estimated to be 0.656 ± 0.042 (0.611 after the PCA-based adjustment), whereas in unadmixed Europeans it was estimated at 0.860 (Visscher et al. 2007). We believe that this could be due to the reduced genetic diversity observed in the Greenlanders as a consequence of their particular population history, which included an extreme and prolonged bottleneck in recent times (Moltke et al. 2015; Pedersen et al. 2017). We did, however, observe a notable increase in the estimates when LD weighting was included in the estimation model, as in LDAK (Speed et al. 2017).

Finally, our SNP heritability ($$h_g^2$$) estimates in unrelated Greenlanders could be inflated due to genetic structure, as reported previously (Browning and Browning 2011), even though we could not assess the level of inflation. In any case, SNP heritability estimates in the Greenlanders should be interpreted with caution because, as we saw, IBS measures are affected by admixture, which can lead to artificially increased levels of LD between causal and typed markers.

It is important to note that this work does not solve all problems of heritability estimation in admixed populations. Our work should be viewed as a first attempt to explore the problem, and therefore the insights and solutions we provide here might not apply in all cases. Additional work is warranted in order to, e.g., model complex patterns of environmental stratification more accurately, similar to the household effect (Almasy and Blangero 1998), and to model exposure of the same genetic ancestry to different environmental backgrounds. In addition, even though there are multiple methods for improving heritability estimates using, for example, LD score regression or partitioning SNPs according to allele frequencies (Gazal et al. 2017; Evans et al. 2018), we have not explored them here as they are harder to implement in admixed populations, where LD patterns and allele frequencies can be misspecified.

In summary, we advise against the use of KIBD or KIBS alone for total narrow-sense heritability estimates in populations with substantial levels of population and family structure. Instead, we recommend applying the KIBD & KIBS & 10 PCs model to a subset with high relatedness (preferably sib pairs), given that KIBD can now be efficiently computed for admixed populations (Thornton et al. 2012; Moltke and Albrechtsen 2014), with the caveat that the method could be capturing sizeable levels of shared environment among siblings. In any case, the resulting conditional h2 estimates should be viewed as potentially inflated by a factor that we estimated at 1/(1 − $$h_{\mathrm{E} \times \mathrm{Anc}}^2$$), and an additional PCA-based adjustment should be carried out in order to recover the marginal total heritability estimate.