Japanese GWAS identifies variants for bust-size, dysmenorrhea, and menstrual fever that are eQTLs for relevant protein-coding or long non-coding RNAs

Traits related to primary and secondary sexual characteristics greatly impact females during puberty and day-to-day adult life. We therefore performed a GWAS of 22 gynecology-related phenotypic variables in 11,348 Japanese female volunteers and identified significant associations for bust-size, menstrual pain (dysmenorrhea) severity, and menstrual fever. Bust-size analysis identified significant association signals in CCDC170-ESR1 (rs6557160; P = 1.7 × 10⁻¹⁶) and KCNU1-ZNF703 (rs146992477; P = 6.2 × 10⁻⁹) and found that one-third of known European-ancestry associations were also present in Japanese. eQTL data point to CCDC170 and ZNF703 as the functional targets of these signals. For menstrual fever, we identified a novel association in OPRM1 (rs17181171; P = 2.0 × 10⁻⁸), for which top variants were eQTLs in multiple tissues. A known dysmenorrhea signal near NGF replicated in our data (rs12030576; P = 1.1 × 10⁻¹⁹) and was associated with expression of RP4-663N10.1, a putative lncRNA enhancer of NGF. A novel dysmenorrhea signal in the IL1 locus (rs80111889; P = 1.9 × 10⁻¹⁶) contained SNPs previously associated with endometriosis, and its GWAS SNPs were most significantly associated with IL1A expression. By combining regional imputation with colocalization analysis of GWAS/eQTL signals and integrated annotation with epigenomic data, this study further refines the sets of candidate causal variants and target genes for these known and novel gynecology-related trait loci.

Phenotype and case/control sample counts/demographic information
Supplementary Datasets
Figure S1. Population structure analysis
Figure S2. Meta-analysis Manhattan plots for phenotypes with significant loci
Figure S3. QQ plots of genome-wide meta-analysis association statistics
Regional association and GWAS/eQTL colocalization plots
Common legends for regional association and GWAS/eQTL colocalization plots

[Figure S1. Population structure analysis. Plot legend labels: Hondo, Okinawan, Chinese.]

Figure S2. Meta-analysis Manhattan plots for phenotypes with significant loci
Manhattan plots of -log10(P value) statistics from DISTMIX summary-statistics-based imputation (using 1000 Genomes Phase 3) of gynecology-related phenotype GWAS analyses. Top association signals were labeled with up to two annotated genes from Supplementary Worksheet S1 for variants with r² > 0.8 to the top SNP. Peaks with more than two genes overlap more than one independent association signal. The green line denotes the nominal significance cutoff (1.21 × 10⁻⁷) and the red line denotes significance after adjustment for multiple testing of 22 phenotypes (P < 5.5 × 10⁻⁹).
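The two cutoffs quoted in this legend are related by simple arithmetic. The sketch below assumes a plain Bonferroni division of the nominal cutoff by the number of phenotypes, which reproduces the quoted value; the legend itself does not name the adjustment method, and the function and variable names are illustrative only.

```python
import math

# Values quoted in the Figure S2 legend.
NOMINAL_CUTOFF = 1.21e-7   # nominal genome-wide significance cutoff (green line)
N_PHENOTYPES = 22          # number of phenotypes tested


def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """Divide a significance cutoff by the number of tests (assumed method)."""
    return alpha / n_tests


adjusted_cutoff = bonferroni_cutoff(NOMINAL_CUTOFF, N_PHENOTYPES)

# Heights at which the two lines would sit on a -log10(P) Manhattan plot.
green_line = -math.log10(NOMINAL_CUTOFF)
red_line = -math.log10(adjusted_cutoff)

print(f"adjusted cutoff: {adjusted_cutoff:.1e}")
print(f"green line at -log10(P) = {green_line:.2f}")
print(f"red line at -log10(P) = {red_line:.2f}")
```

Under this assumption the adjusted cutoff agrees with the 5.5 × 10⁻⁹ stated in the legend, so a signal must exceed the red line to remain significant across all 22 phenotypes.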


Regional association and GWAS/eQTL colocalization plots
Common legends for regional association and GWAS/eQTL colocalization plots: Panels (a): Plot of -log10(P values) around the association signal. The upper sub-panel displays points sized by LD r² to the top SNP. The lower sub-panel shows -log10(P values) with (red circles) and without (black circles) conditioning on the top SNP. The top SNP in each panel is plotted as a purple upright triangle. The subsequent labelled panels show analyses of GTExPortal single-tissue and multi-tissue eQTL data. Each single- or multi-tissue eQTL analysis was processed to identify putative independent signals based on pairwise EUR and AFR LD r². SNPs in each sub-panel were then coloured by signal assignment and rank of the top SNPs (1st ranked = green, 2nd = orange, 3rd = purple, 4th = magenta) and sized by LD r² to each signal's top SNP. Inset left-side

A common description of the regional-association/signal-colocalization plots is provided above. Panel (b) shows analysis of GTExPortal CCDC170 eQTL data, with the two upper panels showing single-tissue analyses for subcutaneous adipose and breast mammary tissue samples.

Mb (KCNU1/ZNF703) locus
A common description of the regional-association/signal-colocalization plots is provided above Figure S4.
Analyses of GTExPortal single-tissue and multi-tissue eQTL data are shown for ZNF703 in Panel (b) and for ncRNA RP11-419C23.13 in Panel (c). ABF colocalization analysis of multi-tissue data for both genes was run using Metasoft FE β-coefficients and standard errors as input. The gene model sub-panel highlights the two target eQTL genes in red and blue. All high-LD GWAS variants shown in Panel (d) also had RSS > 0.8 to top eQTL signal SNPs for each gene-tissue pair that is presented.
[Panel titles: GTEx ZNF703 eQTL analyses; Multi-tissue RE2]

Figure S6. Transformation of pain severity levels to 0-10 Numeric Rating Scale
Dysmenorrhea was mapped from the original 1-5 integer values to points on the 0-10 Numeric Rating Scale for pain. Panels (a) and (b) show that the transformed values better represent a surrogate variable for pain severity, namely, the proportion of subjects in each level who take pain medicine. Panel (a): Untransformed pain severity levels versus proportion of subjects in each level taking pain medicine during menstruation. Panel (b): Transformed pain severity versus proportion of subjects in each level taking pain medicine during menstruation.

Figure S7. Dysmenorrhea (pain severity) chr1:115.80-115.83 Mb (NGF) locus
A common description of the regional-association/signal-colocalization plots is provided above Figure S4. Panel (b) shows analyses of GTExPortal multi-tissue and single-tissue eQTL data for lncRNA RP4-663N10.1. There were no NGF eQTLs that overlapped GWAS SNPs.

Figure S8. Dysmenorrhea pain severity chr2:113.48-113.58 Mb (IL1 gene cluster) locus
A common description of the regional-association/signal-colocalization plots is provided above Figure S4. Analyses for single-tissue and multi-tissue IL1A and IL37 eQTL data are shown in Panel (b) and Panel (c), respectively.

Mb (OPRM1) locus
A common description of the regional-association/signal-colocalization plots is provided above Figure S4. Panel (b) shows analyses of GTExPortal single-tissue and multi-tissue OPRM1 eQTL data. The ABF colocalization method was run using β-coefficients and standard errors. Candidate causal SNPs shown in Panel (c) are SNPs with r² equiv > 0.8 and RSS > 0.8 to top eQTL signal SNPs for brain cerebellar hemisphere and testis tissue samples.
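To illustrate how a colocalization analysis can be driven by β-coefficients and standard errors alone, the following is a minimal sketch. It assumes that "ABF" refers to Wakefield's approximate Bayes factor combined across SNPs under the usual five-hypothesis scheme (H0: no association; H1/H2: association with one trait only; H3: two distinct causal variants; H4: one shared causal variant). The legends do not specify the implementation, so the prior standard deviation (0.15), the prior probabilities (1e-4, 1e-4, 1e-5), all function names, and the toy data are assumptions made here for illustration and are not values from this study.

```python
import math
from typing import List, Sequence, Tuple

Stat = Tuple[float, float]  # (beta, standard error) for one SNP


def log_sum_exp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_diff_exp(a: float, b: float) -> float:
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def wakefield_log_abf(beta: float, se: float, prior_sd: float = 0.15) -> float:
    """Log approximate Bayes factor (association vs. null) for one SNP.

    prior_sd is an assumed prior standard deviation of the effect size.
    """
    z = beta / se
    v = se * se
    r = prior_sd ** 2 / (prior_sd ** 2 + v)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def coloc_posteriors(gwas: Sequence[Stat], eqtl: Sequence[Stat],
                     p1: float = 1e-4, p2: float = 1e-4,
                     p12: float = 1e-5) -> List[float]:
    """Posterior probabilities of H0..H4 for SNPs listed in the same order."""
    l1 = [wakefield_log_abf(b, s) for b, s in gwas]
    l2 = [wakefield_log_abf(b, s) for b, s in eqtl]
    l12 = [a + b for a, b in zip(l1, l2)]
    s1, s2, s12 = log_sum_exp(l1), log_sum_exp(l2), log_sum_exp(l12)
    log_h = [
        0.0,                                                       # H0
        math.log(p1) + s1,                                         # H1
        math.log(p2) + s2,                                         # H2
        math.log(p1) + math.log(p2) + log_diff_exp(s1 + s2, s12),  # H3
        math.log(p12) + s12,                                       # H4
    ]
    total = log_sum_exp(log_h)
    return [math.exp(h - total) for h in log_h]


# Made-up toy statistics for three SNPs (not data from this study).
toy_gwas = [(0.01, 0.02), (0.12, 0.02), (0.02, 0.02)]
toy_eqtl_shared = [(0.05, 0.05), (0.30, 0.05), (0.02, 0.05)]    # same top SNP
toy_eqtl_distinct = [(0.05, 0.05), (0.02, 0.05), (0.30, 0.05)]  # different top SNP

post_shared = coloc_posteriors(toy_gwas, toy_eqtl_shared)
post_distinct = coloc_posteriors(toy_gwas, toy_eqtl_distinct)
print("shared top SNP:  ", [round(p, 4) for p in post_shared])
print("distinct top SNP:", [round(p, 4) for p in post_distinct])
```

In the toy data, the shared-signal case places most posterior weight on H4 and the distinct-signal case on H3, which is the distinction a colocalization plot is meant to convey. A real analysis would also need to align alleles between the GWAS and eQTL datasets before comparing effect estimates.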