Modeling assortative mating and genetic similarities between partners, siblings, and in-laws

Assortative mating on heritable traits can have implications for the genetic resemblance between siblings and in-laws in succeeding generations. We studied polygenic scores and phenotypic data from pairs of partners (n = 26,681), siblings (n = 2,170), siblings-in-law (n = 3,905), and co-siblings-in-law (n = 1,763) in the Norwegian Mother, Father and Child Cohort Study. Using structural equation models, we estimated associations between measurement error-free latent genetic and phenotypic variables. We found evidence of genetic similarity between partners for educational attainment (rg = 0.37), height (rg = 0.13), and depression (rg = 0.08). Common genetic variants associated with educational attainment correlated between siblings above 0.50 (rg = 0.68) and between siblings-in-law (rg = 0.25) and co-siblings-in-law (rg = 0.09). Indirect assortment on secondary traits accounted for partner similarity in education and depression, but not in height. Comparisons between the genetic similarities of partners and siblings indicated that genetic variances were in intergenerational equilibrium. This study shows genetic similarities between extended family members and that assortative mating has taken place for several generations.


. The Correlation in Genetic Signal (rGenSi) model: description and simulation
The basic rGenSi model The basic version of the correlation in genetic signal model (rGenSi) is shown in Supplementary Fig. 1. The model uses data on phenotypes and polygenic scores (PGS) from four adult individuals: Two siblings and their partners. Hence, it includes siblings, partners, siblings-in-law (in-laws for short), and co-siblings-in-law (co-in-laws for short). The model takes eight variables as input and reduces the 8x8 correlation matrix to four estimated parameters, s, h, m, and rs. It includes latent genetic factors that influence the phenotype and the polygenic score of each individual. It captures the variance common to the phenotype and the polygenic score and thereby represents the genetic signal of interest. The correlation between an individual's phenotype and polygenic score is decomposed into s, which is an estimate of the genetic signal in the polygenic score, and h, which is the influence of genetic factors on the phenotype. Squared, s 2 is the proportion of variance in the polygenic score explained by the latent genetic signal, and h 2 is an estimate of the heritability. The correlation between the polygenic score and the phenotype for an individual is s * h. Residual variance additionally influences phenotypes and polygenic scores. These are not freely estimated as the variance is fixed to 1 in all variables. With data on individuals, s and h cannot be freely estimated. This is possible when including data on relatives. In-laws In-laws Co-in-laws Supplementary Fig. 1

. The basic rGenSi model
The model assumes that partners assort directly on the phenotype; hence partners are linked by the co-path m. All partner similarity goes through this co-path. Co-paths are an extension of conventional path tracing rules introduced by Cloninger 1 . A summary of rules concerning co-paths are found in Balbona et al. 2 . In short, co-paths have no arrows (direction), and connect other valid chains of paths. They contribute to covariance between variables, but not to variance. This is useful when analyzing assortative mating because the assortment of partners does not change their phenotypes but leads to an association. The basic rGenSi model assumes that the phenotypes on which the assortment takes place are perfectly measured and included. It also assumes that there is no shared environmental variance between siblings. Hence, in this model, siblings can only resemble each other for genetic reasons. The correlation in the genetic signal between siblings is estimated in this model as rs. Under random assortment, the genetic correlation between siblings is expected to be 0.50. However, it can rise above this value due to positive assortment in previous generations. In addition, the polygenic scores can be influenced by factors not related to the phenotype, which are expected to correlate 0.50 between siblings. The correlation between the different individuals' latent genetic factors can be deduced by path tracing rules including co-paths. Hence, the correlation between siblings' genetic signals is rs. The correlation between partners' genetic signal is h * m * h = mh 2 , the correlation between in-laws' genetic signal is h * m * h * rs = rsmh 2 , and the correlation between coin-laws' genetic signal is h * m * h * rs * h * m * h = rsm 2 h 4 . The correlation between one's polygenic score and one's own phenotype is s * h, whereas the correlation between one's polygenic score and one's partner's phenotype is s * h * m. Hence, m can be estimated by dividing the cross-partner phenotype-polygenic correlation (s * h * m) with the within individual phenotype-polygenic correlation (s * h). (Robinson et al. derived the phenotypic correlation in an equivalently way by regressing a phenotype on the partner's genetic predictor 3 .) Other correlations between observed phenotypes or polygenic scores can similarly be deduced with path tracing rules. For instance, the correlation between partners' phenotypes is m, whereas the correlation between co-in-laws' polygenic scores is s * h * m * h * rs * h * m * h * s = rsm 2 h 4 s 2 . As we will show later with empirical data, this model fits excellent for height.

Simulation of the basic rGenSi model
To test whether this model can accurately estimate s, h, m, and rs, we ran 1,000 simulations with a sample size of 50,000 individuals and 1,000 simulations with a sample size of 1,000 individuals. Each extended family consisted of four individuals, so for the sample size of 1,000 individuals, there were 250 extended families which included 250 complete pairs of siblings, 500 complete pairs of partners, 500 complete pairs of in-laws, and 250 complete pairs of co-in-laws. In each simulation, we drew s 2 and h 2 from a uniform distribution ranging from 0.20 to 1.00 and m from a uniform distribution ranging from 0.00 to 1.00. We simulated 100 genetic variants related to a phenotype for a founder population of the stated sample size who mated with each other and produced offspring for 10 generations. The polygenic score was a partial measure of the true underlying genetic signal and included a fraction of the causal SNPs equal to s 2 . For instance, 50 of these 100 genetic variants were included in the polygenic score if s 2 = 0.50 and s = 0.71. Within each generation, individuals within each sex were ranked according to the sum of their phenotype and a random component representing "assortment noise". The "assortment noise" was sampled from a normal distribution with mean zero and variance (1 -m) / m, where m is the correlation among partners. Individuals were then mated according to their rank, i.e., the highest scoring within sex 1 with the highest scoring within sex 2 and so on. The resulting partner correlations were stable over generations. Siblings in the founder generation were the product of random mating and shared on average half of the genetic variants; if estimated in the first generation, rs would therefore be 0.50, but this rises over generations with assortment. The expected value of rs can be derived mathematically from m and h and can also be observed directly in the simulated data. In the simulated data, the genetic factor is not a latent variable but an observed variable including all causal genetic variants, also those excluded from the polygenic score. It is not included in the model but allows one to check if the model correctly estimates the genetic correlations using the imperfect polygenic scores.
In the 10 th generation, we estimated the rGenSi model based on the four simulated phenotypes and the four simulated polygenic scores. This process was repeated 1,000 times for each sample size with random values of s, h, and m, as described above. Supplementary Fig. 2 shows the correspondence between simulated true values and the corresponding values estimated in the rGenSi model. Supplementary Table 2 shows the means, SDs, and medians of the true and the estimated values. The match between true and estimated values was excellent with 50,000 simulated individuals. With 1,000 simulated individuals, the match was good in a majority of datasets. In conclusion, the model can reconstruct the relevant parameters in realistic scenarios.  The rGenSi model with mating on a latent phenotype and influences of shared environment Two additional parameters may be added to the rGenSi model. This version of the model is shown in Supplementary Fig. 3. The first parameter is a (alpha), or the correlation between the measured phenotype and the one on which mating takes place. The assortment between partners may not take place on the measured variable. Like the polygenic scores, the phenotypic variable could be imperfectly measured, or it could be that the assortment takes place on another variable that correlates with the one we have observed (indirect or secondary assortment). Mating on a latent phenotype related to the observed phenotype resembles the Cascade model 4 . For instance, current depression may not be the basis of assortment; rather, individuals may assort based on their tendency to be depressed or how depressed they were when they met. Likewise, educational attainment may not per se be the factor underlying assortment; related variables, such as personality and cognitive abilities, could be relevant. By adding one parameter to the rGenSi model, this can be estimated. This parameter is the association between the observed phenotype and the unobserved S7 and potentially composite phenotype on which mating occurs. The model is shown in Supplementary  Fig. 3. The a parameter can be identified because it can reasonably be assumed to be equal in all relatives. Path tracing shows that adding this parameter reduces correlations between all relationship types to the same degree (all correlations between phenotypes are in the basic rGenSi model, but multiplied by a 2 ; the formulas for correlations between polygenic scores are not changed). As we will later show with empirical data, this model fits well for depression, whereas a freely estimated alpha is unnecessary for height (a=1 for height). In this model, m is not simply the correlation between two partners' observed phenotypes but derived by fitting the 8x8 correlation structure to this model with 5 freely estimated parameters. In this version of the model, where a is freely estimated, the correlation between one's polygenic score and one's own phenotype is a * s * h, whereas the correlation between one's polygenic score and one's partner's phenotype is a * s * h * m. It is still possible to estimate m by dividing the cross-partner phenotype-polygenic correlation with the within individual phenotype-polygenic correlation, as (a * s * h * m) / (a * s * h) = m. However, m now describes the similarity in the latent phenotype underlying assortment rather than the observed phenotypic similarity. As we will show with empirical data, a freely estimated a leads to improved fit for models of depression and educational attainment.  The second parameter that can be added to the model is effects of shared environment, defined as factors that make siblings similar to each other. Shared environment is added to the model as a factor perfectly shared between siblings that influences both siblings' phenotypes and is unrelated to the genetic factors. Hence, if the siblings' phenotypes correlate higher than expected from the correlation between their polygenic scores and between their polygenic scores and phenotypes, we can expect a positive value of c, interpreted as environmental factors that make siblings similar to each other. Note that our estimates of h and c do not rely on the equal environments assumption. These two parameters, a and c, may be added independently of each other. The model described in the previous section corresponds to the extended model where c has been set equal to 0, and a has S8 been set to 1. As we will show with empirical data, the inclusion of shared environment leads to the best fit for educational attainment.

Simulation of the full rGenSi model
To test whether the rGenSi model can estimate c and a simultaneously as it estimates s, h, m, and rs, we simulated 1,000 new datasets with 50,000 individuals and 1,000 new datasets with 1,000 individuals using process described above. Alpha (a) was drawn randomly from a uniform distribution with values from 0.20 to 1.00, whereas c 2 was drawn randomly from a uniform distribution ranging from 0.00 to 1.00. If the sum of c 2 and h 2 exceeded 1.00, both values were redrawn. Otherwise, this simulation followed the same rules as the simulation of the basic rGenSi model, described on page 4 of this supplement. (We drew s 2 and h 2 from uniform distributions ranging from 0.20 to 1.00 and m from a uniform distribution ranging from 0.00 to 1.00. In the first generation rs equals 0.50, but it can rise over generations with assortment, depending on the other parameters. We simulated 100 genetic variants related to a phenotype and assortment for 10 generations before we estimated the rGenSi model.) The correspondence between the true and estimated values is shown in Supplementary Fig. 4.   Supplementary Fig. 4. Simulation of the full rGenSi model. Comparison of true and observed values for h, s, m, rs, a, and c. 1,000 repetitions with a sample size of 50,000 and 1,000 repetitions with a sample size of 1,000.
A low a value reduces the associations between the observed phenotypes -in the extreme scenario that a is close to 0, these would be close to uncorrelated, and it would be near impossible to reconstruct the relations between the mated phenotypes. Therefore, we filtered out the simulations where a was > 0.60 and present these in Supplementary Fig. 5. Supplementary Table 3 shows the means, SDs, and medians of the true expected values and estimated values for the corresponding data. Whereas the model with a and effects of shared environment is less precise than the more restricted model, it provides reasonable estimates of the six parameters in most cases when the sample size is large. It is noteworthy that it can estimate the partner association in the mated phenotypes (m) with high accuracy even when this phenotype is not directly measured. Nevertheless, we recommend using the more restricted model when it has a good fit.  Testing the assumption that the residual genetic correlation between siblings is 0.50 The residual component of the polygenic score is defined as not being associated with the phenotype. Therefore, we assumed that it could not be a basis for partner selection and that is should not correlate between partners. Similarly, we assume that this genetic residual was not associated with partner selection in previous generations and thereby assume that the residual genetic correlation between siblings is 0.50. However, this is not statistically necessary to make this assumption. It is also possible to test. If effects of shared environment (c) are fixed to a specific value (usually zero), it is possible to freely estimate the correlation between siblings' genetic residuals. In the rGenSi model, it is unfortunately not possible to simultaneously estimate both effects of shared S10 environment and the correlation between siblings' genetic residuals. The genetic correlation between siblings' residuals was estimated very close to 0.50 in all tested simulations when we fixed c=0. (In simulations, we also generated half-siblings sharing 0.25 of genetic factors and a fictitious "worker bee" sibling type sharing 0.75 of genetic factors. In these cases, the correlation between genetic residuals matched the expected 0.25 and 0.75, respectively.) Hence, we conclude that the residual correlation between siblings' polygenic scores can be freely estimated if effects of shared environment (c) are not included in the model but that it is 0.50 in realistic scenarios. Supplementary Fig. 6 shows how the genetic variance (g_var) and the genetic correlations between siblings increase after 0 to 15 generations of assortative mating when the assortment (phenotypic correlations between partners) is 0.40, and the initial heritability is 0.60. These numbers are roughly similar to those for educational attainment. As can be seen, values close to equilibrium are reached after relatively few generations. For instance, the curves are close to equilibrium after 5 generations, and no increase is seen after 10 generations.

Increase in genetic variance and sibling correlations over generations
Supplementary Fig. 6. Genetic variance and covariance over generations. Genetic variance and genetic sibling correlations after 0 to 15 generations of assortative mating on an initially 60% heritable phenotype with stable assortment of 0.40.

Testing intergenerational equilibrium with imperfect indicators of genotype and phenotype
We test whether the genetic variance is in intergenerational equilibrium by comparing genetic correlations between siblings and partners. In intergenerational equilibrium, these can be predicted from one another, as the expected ,̂=

S11
In the first generation, before assortment started, the observed siblings correlation is 0.50; hence U=0.00. In intergenerational equilibrium, the observed and expected correlations should match, and U would have the expected value of 1.00.
The rGenSi model allows for both phenotypes and polygenic scores to be imperfect indicators of the phenomena causing partner similarity. When a is less than 1, the phenotype is an imperfect indicator of the phenotype that is the basis for selection, whereas when s is less than 1, the polygenic score is an imperfect indicator of the relevant genetic factors. When polygenic scores do not include all relevant causal genetic variants (s<1), both partner and sibling correlations can be underestimated towards the expected values under no assortment (0.00 for partners and 0.50 for siblings). The genetic correlation between siblings is directly estimated in the model as , = and the genetic correlation between partners is derived by path tracing as , = ℎ ℎ. Hence, , and , are independent of both s and a. The genetic correlations between the latent genetic factors should thus not be expected to be affected by the imperfection of measurement in phenotypes or in polygenic scores, and genetic sibling and partner correlations can be used to test intergenerational equilibrium even when s<1 and a<1. The relative deviation from equilibrium, indicated by U, should be consistent across varying proportions of genetic variance covered by the observed polygenic scores. Thus, one can test intergenerational equilibrium even if the genetic variables are partial, for instance, by using polygenic scores directly. Likewise, the partner and sibling correlations can be used to test whether the genetic variance is in intergenerational equilibrium both under direct and under indirect assortment.
To ensure that this reasoning is correct, we ran simulations similar to those described above. We used a sample size of 50,000. Initial simulations indicated that U was always (close to) 1.00 after 10 generations, even when a and s were <1. This indicates equilibrium. To make the simulations more interesting, each simulation was run for between 1 and 6 generations (drawn from a uniform distribution), so the genetic variance is plausibly not in equilibrium. We set each of s 2 and a to 0.90 and to 0.30, giving four combinations of s 2 and a for each data set. In each dataset, we estimated U four times. We did this in 100 datasets. The results with the four different combination of values for a and s values were similar, as indicated by Supplementary Table 4. The results were robust to low values of a, that is, to indirect selection. A low s 2 led to reduced accuracy in the estimations, in particular in combination with a low a, but the results were not biased in any particular direction.

Partialness versus noise in polygenic scores
An imperfect correlation between the polygenic score and the genetic signal can be due to partialness of the polygenic score (it does not cover all relevant genetic variants) and due to inclusion of irrelevant genetic factors (noise). These two mechanisms both reduce the correlation between the latent genetic variable and the polygenic score, estimated as s. Additional simulations indicated that s is correctly estimated regardless of which these two mechanisms make the polygenic scores imperfect. We have not found a way to distinguish between these in the model while maintaining the other parameters, but as s appears to be correctly estimated, the difference between these two mechanisms is irrelevant to the current research question.

Scripts
Scripts for the rGenSi model and the simulations are provided in Supplementary Software 1 and at https://osf.io/v9ybu/.

Supplementary Note 1. Model fitting of the rGenSi models
Supplementary Table 5 shows the model fit indices for the rGenSi models for the three phenotypes educational attainment, height, and depression. First, the full rGenSi model (with freely estimated a and c) is compared to a saturated model to test how well it fits the data. Then, more restricted versions of the rGenSi model are compared to this full model to see if these simpler models can adequately account for the observed correlational structure.
For educational attainment, the AIC value improved (was lower) in the rGenSi model, compared to the saturated model, indicating a better balance between parsimony and complexity. However, restricting c to 0, a to 1, or both, increased the AIC values, indicating a worse fit. Models with a set to 1 had a dramatically worse fit, indicating that the observed educational attainment is not identical to the (composite) phenotype that is the basis of partner selection and for the observed partner resemblance in educational attainment. For height, the model with no effects of shared environment and a equal to 1 had the best fit. This indicates that measured height is a basis of partner selection. For depression, the best fitting model had no shared environment but a below 1. This indicates that depression assessed at the time of the survey was not alone the basis of partner selection, but rather some highly correlated variable; this could, for example, be risk for depression or maybe manifest depression at the time the partnership was initiated.
In the full model, the genetic partner correlation was 0.37, predicting a genetic sibling correlation of 0.37/2+0.50 = 0.69, which compared well to the estimated sibling correlation of 0.68. Hence, in the full model, we found no deviations from intergenerational equilibrium in the genetic factors for educational attainment (p=0.987). The results would have been different had we based this comparison on the more restricted model. In the most restricted model (with a = 1 and c = 0), the genetic partner correlation was 0.21, predicting a genetic sibling correlation of 0.21/2+0.50 = 0.61, considerably lower than the observed 0.70 (p-value for deviation from equilibrium = 0.047). Hence, had we not used latent phenotypes, we could have concluded that the genetic variance in educational attainment was not in intergenerational equilibrium, but these models had a worse fit.
The model of educational attainment with no effects of shared environment (model 3) was close in fit to the one that included effects of shared environment (model 2). Three indices indicate that model 2 is the correct one. First, educational attainment is known from previous studies to be influenced by shared environment. Second, model 6, with shared environment and equilibrium, exhibits the overall best fit. Third, model 7 (no shared environment and in equilibrium) has a poor fit and does not correspond with the results showing that the crude polygenic scores are in intergenerational equilibrium (presented next). Therefore, the models with shared environment (model 2 and model 7) fit best not only in terms of statistical fit, but also with previous research and parallel analytic approaches.
Supplementary Table 6 shows the model parameters for the four different versions of the rGenSi model and the resulting estimates of genetic correlations between relatives. The most restricted model had the best fit for height and was relatively similar to the best fitting model for depression, with partner correlation and heritability being a bit higher in the model with mating on latent variables. However, for educational attainment, the model that freely estimated a had some noticeable differences from the most restricted model. First, in the restricted model, m is equal to the observed correlation between partners, whereas in the model where a is estimated freely, m is the partner correlation in a latent variable on which partners assort phenotypically. Hence, estimating m based on the observed or latent phenotypes leads to different estimates of partner correlations and thus also of genetic partner resemblance. The more restricted models had narrower confidence intervals than the more open models, indicating that the more restricted models should be preferred when they fit the data well. 1 0.087 Notes: * best fitting model before adding equilibrium constraint. ** previously best fitting model with equilibrium constraint added.The difference in -2 times log likelihood (-2LL) is asymptotically χ 2 -square distributed (with Δdf degrees of freedom), which allows testing for significant differences in χ 2 for nested submodels. If the difference in χ 2 is non-significant, the simpler, more restricted model is preferred. In addition, models with low Akaike Information Criterion are preferred. Table 6. Parameters estimated in rGenSi models and genetic correlations between relatives implied from these models. Parameter  Full (best)  Free a,c=0  a=1, free c  a=1, c=0  est cilo  cihi  est  cilo  cihi  est  cilo cihi  est  cilo

Supplementary Note 2. Polygenic scoring method, adjustment for principal components, and sensitivity analyses using different thresholds
Polygenic scores were calculated using the following parameters (set via the relevant flags in the PRSice software): -Clumping parameters: window size = 500kb; p-value threshold = 1; r2 threshold= 0.25 -Minor allele frequency: 1% -Exclusion of MHC region specified at chr6:25000000-34000000 Supplementary Table 7 describes the correlations between partners in polygenic scores by the threshold for including a SNP in the polygenic score. For educational attainment and height, using higher cut-offs implied higher correlations between partners' polygenic scores, but this levelled off around p=0.0005. For depression, the partners' polygenic scores had low correlations at all thresholds. p-values on the x-axis are not results of this study, but refer to the probability of observing an association between a SNP and the phenotype of equal or greater magnitude in the discovery GWAS under the hypothesis of no association. The p-values are not adjusted for multiple testing; p < 5*10 -8 is considered genomewide significant.

Supplementary
To evaluate the influence of different methods for adjusting for principal components, we calculated associations between relatives' polygenic scores and relatives' phenotypes using three methods. These principal components are described in more detail in Supplementary Methods, page 1. The three sets of correlations between relatives are shown in Supplementary Table 8 and 9. In the first set of results (a), we estimated crude associations between polygenic scores without taking the principal components into account. In the second set of results (b), we first residualized polygenic scores and phenotypes on the principal components and then estimated the associations between these residuals. In the third set of results (c), we included the principal components directly in the model used to estimate the correlations. In Supplementary Table 8, we present results adjusted for 20 principal components, which is the number of principal components used in our analyses. Supplementary Table 9 we present results adjusted for 50 principal components to check if this changes the results. The correlations were estimated in OpenMx, and the principal components were set to influence the means in the model. The results indicate that the three methods for estimating the correlations provide similar results, with differences in the second decimal place. Most differences were 0.00 or 0.01, with the largest being 0.03 (correlations between siblings' polygenic scores for height). We consider these differences to be trivial and of no practical importance. The small consequences of adjusting for the principal components may be due to the homogeneity among participants in the Norwegian Mother, Father and Child Cohort Study (MoBa), from which we draw our sample.
We identified 1,492 pairs of second, third, or fourth-degree relatives in the sample by using the KING (Kinship-based INference for Gwas) software (https://www.kingrelatedness.com/) with the -ibs command (identity by state). Siblings (first-degree relatives) were already a part of the design. As a sensitivity analysis, we randomly excluded one individual from each pair and recalculated the correlations between relatives' polygenic scores and relatives' phenotypes. This led to changes in correlations of at most 0.01 compared to Supplementary  Fig. 7 presents model parameters from rGenSi models using polygenic scores with different thresholds, whereas Supplementary Fig. 8 similarly presents genetic correlations between relatives derived from these models. Using higher cut-offs was associated with increases in the genetic signal for all phenotypes and smaller confidence intervals for the rGenSi estimated genetic correlations. Otherwise, the results remained similar across thresholds.
Supplementary Fig. 7. Parameter estimates by thereshold for including SNPs in polygenic scores. Estimated parameters from rGenSi models using polygenic scores with different cut-offs for inclusion of SNPs, including 95% likelihood-based confidence intervals. p-values on y-axis refer to the probability of observing an association between a SNP and the phenotype of equal or greater magnitude in the discovery GWAS under the hypothesis of no association. The p-values are not adjusted for multiple testing; p < 5*10 -8 is considered genomewide significant. edu=educational attainment at age 30; hcm = height in centimeters; mdd = symptoms of major depressive disorder. Based on genotype data from n=26,681 complete pairs of partners, n=2,170 complete sibling pairs, n=3,905 complete in-law pairs, and n=1,763 complete co-in-law pairs, and phenotype data from n=63,781 complete pairs of partners, n=13,455 complete sibling pairs, n=21,496 complete in-law pairs, and n=8,699 complete co-in-laws pairs. Estimated genetic correlations from rGenSi models using polygenic scores with different cut-offs for inclusion of SNPs, including 95% likelihood-based confidence intervals. p-values on y-axis refer to the probability of observing an association between a SNP and the phenotype of equal or greater magnitude in the discovery GWAS under the hypothesis of no association. The p-values are not adjusted for multiple testing; p < 5*10 -8 is considered genomewide significant. edu=educational attainment at age 30; hcm = height in centimeters; mdd = symptoms of major depressive disorder. Based on genotype data from n=26,681 complete pairs of partners, n=2,170 complete sibling pairs, n=3,905 complete in-law pairs, and n=1,763 complete co-in-law pairs, and phenotype data from n=63,781 complete pairs of partners, n=13,455 complete sibling pairs, n=21,496 complete in-law pairs, and n=8,699 complete co-in-laws pairs.