GWAS of bone size yields twelve loci that also affect height, BMD, osteoarthritis or fractures

Bone area is one measure of bone size that is easily derived from dual-energy X-ray absorptiometry (DXA) scans. In a GWA study of DXA bone area of the hip and lumbar spine (N ≥ 28,954), we find thirteen independent association signals at twelve loci that replicate in samples of European and East Asian descent (N = 13,608 – 21,277). Eight DXA area loci associate with osteoarthritis, including rs143384 in GDF5 and a missense variant in COL11A1 (rs3753841). The strongest DXA area association is with rs11614913[T] in the microRNA MIR196A2 gene that associates with lumbar spine area (P = 2.3 × 10−42, β = −0.090) and confers risk of hip fracture (P = 1.0 × 10−8, OR = 1.11). We demonstrate that the risk allele is less efficient in repressing miR-196a-5p target genes. We also show that the DXA area measure contributes to the risk of hip fracture independent of bone density.


Supplementary
The mature miRNA (miR) sequence derived from MIR196A2 is excised from the 5´ arm of the pre-miR hairpin, then referred to as miR-196a-5p (mirbase.org). This microRNA is tissue-specific and is enriched in chondrocytes and in glycosaminoglycan secreting cells (chondrocytes, tendon cells, fibroblasts) (http://fantom.gsc.riken.jp). The rs11614913 variant resides on the 3´ arm ( Supplementary Fig. 3) and for this reason does not alter the seed sequence. Therefore, the variant is not expected to affect recognition of it´s target gene mRNAs. However, the T-allele in the mature miR duplex introduces a wobble bond, i.e. pairing between uracil and guanine in RNA, at the 5´end of the 5p-arm ( Supplementary   Fig. 3). On the basis of this observation, we hypothesized that the effect of the variant might lie in efficiency, rather than specificity, of target gene repression due to either 1) differences in thermodynamic stability of the mature miR duplex by genotype thereby leading to effects on incorporation of the mature 5p strand into Ago complex 1 or 2) effects of the genotype on miR processing to the mature miR duplex form 2,3 , as previously proposed 4 , which could affect the abundance of the mature miR duplex. In both cases, we expect the variant to have a general effect on transcription levels accross genes targeted for repression by miR-196a-5p.
In order to assess the impact of the rs11614913 variant in MIR196A2 on target gene expression, we carried out an experiment in which we transfected HEK293T cells with over-expression plasmids containing 1) MIR196A2 with the C-allele, 2) MIR196A2 with the T-allele and 3) no insert (empty plasmid). In order to avoid biases introduced by variation in transfection efficiency between cell cultures, we carried out cell sorting by flow cytometry for cells marked by GFP protein expression from plasmid DNA. After cell sorting we isolated RNA and carried out RNA-seq (polyA+) to assess genomewide transcript expression levels. The experiments were independently done in six replicates for each condition (n=18).
The RNA-seq transcript expression was quantified with Kallisto 5 making use of Ensembl transcript sequences (version 87) as a reference transcriptome. Gene expression was computed by aggregating exression across transcripts. We then used DESeq2 for Bioconductor in R to analyse differential expression 6 . In principle, the biological interpretation of differentially expressed genes in our set up can be seen as 1) primary effects and 2) secondary effects. Here, the primary effects are the direct target genes of mir-196a2 whereas the secondary effects include downstream effects caused by downregulation of the target genes, e.g. downregulation of transcription factors (TFs) by miR-196a2 can lead to downstream changes in gene expression. In order to study the genotype effects on the primary target genes, we sought to define a group of high-confidence miR-196a-5p target genes. We downloaded a list of 302 predicted target genes (TargetScan 7.2) and we find that target genes with high conservation scores (PCT; Probability of conserved targeting), defined in TargetScan 7.2, are more likely   to have experimental support (genes identified as plausible direct targets of miR-196a-5p; according to   Tarbase V8 7 ) as compared with those with low conservation score ( Supplementary Fig. 4). In particular, the fraction of genes containing a 3´UTR recognition site, predicted by TargetScan v7.2, with experimental support in Tarbase starts to become particularily high at PCT > 0.86 ( Supplementary Fig. 4).
On the basis of this observation, we defined our high confidence miR-196a-5p target genes as genes with in silico predicted 3´UTR recognition site for miR-196a-5p where the PCT score is > 0.86; resulting in a list of seventeen genes. This PCT score translates into less than 14% probability for the miR-196a-5p binding sites in the target genes being conserved by chance. The biological relevance of these genes to human variation in bone morphology is seen in that the seventeen target genes are significantly enriched for functional roles involved in skeletal system development (13.5 fold overrepresentation; P-value=2.36e-5), see Supplementary Table 11. Fourteen of the high-confidence miR-196a-5p target genes are expressed in the MIR196A2 transfected HEK293T cells.
To further assess the validity of the defined list of miR-196a-5p target genes, we find that these genes are highly enriched among the set of genes found significantly downregulated in cells with MIR196A2 insert (C-or T-alleles) versus empty controls (Supplementary Fig. 5A  (Bioconductor for R). We also carried out a permutation based test to determine significance; we counted how often the observed difference was lower than the differences derived from permutations of samples agnostic to the allele annotation; ( 12 6 ) =924 possible unique permutations. This analysis demonstrates a significant difference in downregulation of the target genes between the two transfected cells types (P = 0.042) showing that the rs11614913[T] allele is less effective, when compared with the C-allele, in repressing expression of miR-196a-5p target genes.
Finally, in order to gain insight into the biological effects of miR-196a2, we carried out a gene set enrichment analysis for functional categories in the comparison between MIR196A2 insert (C-or T-allele) and empty vector controls. This analysis was carried out using fgsea in Bioconductor for R using gene ontology (GO) terms derived from MSigDB v6.2 ref8 using log2 fold-change values as the gene ranking parameter. The top significant functional category (P-adjusted < 0.05) involves downregulation of genes involved in energy metabolism (e.g. GO terms oxidative phosphorylation and electron transport chain).
Also, a functional category relevant to biological processes in bone morphology was found to be enriched (GO term embyonic skeletal system morphogenesis). By looking at the leading edge genes for the embryonic skeletal system morphogenesis category in this analysis, i.e. genes in this category with large effect sizes for downregulation, we note that 7 out of these 21 genes have experimental support as direct target genes of miR-196a-5p. Thus, many of the genes driving the enrichment for this functional category are likely primary targets of miR-196a-5p, but others are not; wheather these are downstream effects caused by miR-196a-5p remains to be fully established.

Supplementary Note 2.
Hip fractures and DXA area, BMD, and BMC We performed a multivariable logistic regression for hip fractures, using age, height and sex adjusted measurements of DXA area, BMD and BMC of lumbar spine, as well as area and BMD of femoral neck, intertrochanteric and trochanteric and BMC of the total hip as explanatory variables. Each of these 10 phenotypes showed independent association with hip fractures, adjusting for the other 9 phenotypes, according to likelihood ratio test (Supplementary Table 17).
Furthermore, five of the loci associated with bone area were also nominally associated with hip fractures (p<0.05). The figure below shows the estimated effect sizes, with 95% confidence intervals, of each of these five variants for the phenotypes listed above. There it can be seen that all of these variants also show significant association with several BMD or BMC measurements so we are not able to see whether the increased risk of fractures is due to bone area rather than BMD or BMC.  SNPs are colored to reflect their linkage disequilibrium (LD) with the marker that was genotyped in the replication samples, in most cases this is the top marker, however, at some loci a functional single-variant genotyping assay could not be made for the top marker and, therefore, a proxy SNP was used. The blue line indicates recombination rates, based on the Icelandic recombination map for males and females combined, with the peaks indicating recombination hotspots. Known genes in the region are shown underneath the plot. The plots were created using a stand-alone version of LocusZoom software 9 . Source data are provided as a Source Data file. Supplementary Figure 4. Fraction of genes with predicted miR-196a-5p 3´UTR recognition sites that have experimental support The fraction of genes with in silico predicted 3´UTR recognition sites for miR-196a-5p that have experimental support (Tarbase V8) increases for predictions with higher PCT scores. Source data are provided as a Source Data file. Supplementary Figure 6. Downregulation in MIR196A2 insert cells for genes with predicted 3'UTR recognition sites.
The mean for log2 fold change values derived from MIR196A2 insert versus empty vector controls for genes with in silico predicted 3´UTR recognition sites shown as grey dots (y-axis) calculated at incremental steps of thresholding the PCT score (x-axis); the lines represent one standard deviation from the mean. with respect to PCT score. Negative log2 fold change values represents downregulation in MIR196A2 insert cells compared with empty controls. Note the change point for the mean in log2 fold change values at PCT > 0.86, highlighting the value of using PCT score for defining a list of high-confidence miR-196a-5p target genes. Source data are provided as a Source Data file. Figure 7. Gene set enrichment for the entire set of genes suppressed by MIR196A insert.

Supplementary
Gene set enrichment for the MIR196A insert versus empty vector control comparison. In this analysis, the entire set of genes tested for differential expression are ranked on the basis of log2 fold change values, and then assessed for whether specific functional gene ontology terms are enriched at the high or low end of the gene rankings. We find significant enrichment (P-adjusted < 0  Tables   Supplementary Table 1. Conditional analysis between variants at three loci in the Icelandic discovery dataset: a) between two variants at 15q25.2 at the ADAMTSL3 and SH3GL3 genes, b) between the two variants at the SOX9 locus that associate with intertrochanteric area in the Icelandic discovery dataset and two recently published variants that associate with hip shape measures, and c) between the 4q31.21 Icelandic variant and a recently published variant associated with hip shape measure  Table 3. Gene function analysis. A) Analysis of gene ontology overrepresentation for the high-confidence miR-196a-5p target genes as carried out in Panther (pantherdb.org); showing results at FDR < 5% ordered according to decreasinging level of overrepresentation ratios. B) The seventeen miR-196a-5p target genes listed out with mean expression in HEK293T (inhouse data) and osteoblasts (Fantom5 data). We tested the correlation between expression of HOXC8 in adipose tissue samples from 746 individuals and genotypes of the two variants: rs11614913 in MIR196A2 and rs371683123 in the promoter region of the HOXC8 gene. The two-sided P values and effect, in standard deviations, is shown before and after conditioning on the other variant: rs11614913 before and after conditioning on rs371683123, and rs371683123 before and after conditioning on rs11614913. EA is the effect allele, OA the other allele, EA Freq. is the effect allele frequency.