ANO7 African-ancestral genomic diversity and advanced prostate cancer

Background Prostate cancer (PCa) is a significant health burden for African men, with mortality rates more than double global averages. The prostate-specific Anoctamin 7 (ANO7) gene, linked with poor patient outcomes, has recently been identified as the target of an African-specific protein-truncating PCa-risk allele. Methods Here we determined the role of ANO7 in a study of 889 men from southern Africa, leveraging exomic genotyping array PCa case-control data (n = 780, 17 ANO7 alleles) and deep-sequenced whole-genome data for germline and tumour ANO7 interrogation (n = 109), while providing clinicopathologically matched European-derived sequence data for comparative analyses (n = 57). Associated predicted deleterious variants (PDVs) were further assessed for impact using computational protein structure analysis. Results We found the common African PDV p.Ile740Leu (rs74804606), notably rare in European patients, to be associated with PCa risk in our case-control analysis (Wilcoxon rank-sum test, false discovery rate/FDR = 0.03), while sequencing revealed co-occurrence with the recently reported African-specific deleterious risk variant p.Ser914* (rs60985508). Additional findings included a novel protein-truncating African-specific frameshift variant p.Asp789Leu, African-relevant PDVs associated with altered protein structure at Ca2+ binding sites, early-onset PCa associated with PDVs and germline structural variants in Africans (linear regression models, −6.42 years, 95% CI = −10.68 to −2.16, P-value = 0.003) and ANO7 as an inter-chromosomal PCa-related gene fusion partner in African-derived tumours. Conclusions Here we provide not only validation for ANO7 as an African-relevant protein-altering PCa-risk locus, but also additional evidence for a role of inherited and acquired ANO7 variance in the observed phenotypic heterogeneity and African-ancestral health disparity.


Annotation of short variants
The annotation of short variants was performed with the online tool SNPnexus (https://www.snpnexus.org/v4/) (1). SNPnexus provides multiple tools and datasets using the publicly accessible Ensembl gene annotation system (Ensembl Variation 95). The databases include HapMap (updated Nov 2018), 1000 Genomes (updated Nov 2018) and gnomAD v2.1 (updated Mar 2019). Predicted effects of single nucleotide variants (SNVs) were merged from two annotation tools, Sorting Intolerant From Tolerant (SIFT, updated Jan 2019) (2) and Polymorphism Phenotyping (PolyPhen, updated Jan 2019) (3). Predicted deleterious variants (PDVs) of the main ANO7 transcript (ENST00000274979) included SIFT-predicted deleterious variants and PolyPhen-predicted probably or possibly damaging variants, as well as indels with stop-gain or frameshift effects predicted with Ensembl (4). Minor allele frequencies (MAFs) of PDVs in African and European populations were obtained from the online Allele Frequency Aggregator (ALFA) (5).
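The PDV definition above can be read as a simple filter over annotated variants. The following Python sketch is only an illustration of that rule, not the pipeline used in the study. The record fields (`type`, `sift`, `polyphen`, `consequence`) and the label strings are assumptions about how an exported annotation table might look, and the two predictors are combined as a union, which is one reading of the text.

```python
# Hypothetical label strings; real SNPnexus/Ensembl exports may differ.
SIFT_DELETERIOUS = {"deleterious"}
POLYPHEN_DAMAGING = {"probably damaging", "possibly damaging"}
TRUNCATING_INDEL = {"stop_gained", "frameshift_variant"}


def is_pdv(record):
    """Return True if a variant record meets the PDV rule described in the text."""
    sift = (record.get("sift") or "").lower()
    polyphen = (record.get("polyphen") or "").lower()
    consequence = (record.get("consequence") or "").lower()
    if record.get("type") == "indel":
        # Indels qualify only with a stop-gain or frameshift effect.
        return consequence in TRUNCATING_INDEL
    # SNVs qualify when either predictor flags them (assumed union).
    return sift in SIFT_DELETERIOUS or polyphen in POLYPHEN_DAMAGING


# Invented toy records, not study data.
variants = [
    {"id": "snv_a", "type": "snv", "sift": "Deleterious", "polyphen": "Benign"},
    {"id": "snv_b", "type": "snv", "sift": "Tolerated", "polyphen": "Possibly damaging"},
    {"id": "snv_c", "type": "snv", "sift": "Tolerated", "polyphen": "Benign"},
    {"id": "indel_a", "type": "indel", "consequence": "frameshift_variant"},
    {"id": "indel_b", "type": "indel", "consequence": "inframe_deletion"},
]
pdv_ids = [v["id"] for v in variants if is_pdv(v)]
print(pdv_ids)
```

If the intended rule were instead that both predictors must agree, the `or` in the last line of `is_pdv` would become `and`.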

Sequence analysis
The sequence and phylogenetic analyses were performed using MEGA (v11) (1), in which multiple algorithms and methods are available for each step. For each of the 166 patients, the germline missense variants were substituted into the original transcript (ENST00000274979), yielding a total of 45 unique amino acid sequences. Multiple sequence alignment of the 45 sequences was performed using MUSCLE in MEGA (2). The best protein model estimated by PhyML (v 3.0) (3) was Jones-Taylor-Thornton (JTT) with invariable sites and evolutionary rates among sites following a discrete Gamma distribution; this model was in turn used for the construction of phylogenetic trees and the estimation of pairwise genetic distances in MEGA. We used the neighbour-joining method with 1,000 bootstrap replicates to construct a phylogenetic tree, which was assessed only for groupings because of low branch support.
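The construction of per-patient altered sequences described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: variants are given in HGVS three-letter protein notation (for example p.Ile740Leu), the reference is a short invented peptide rather than the ANO7 protein, and the function name `apply_missense` is hypothetical.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
HGVS_MISSENSE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def apply_missense(reference, variants):
    """Substitute each missense variant into a reference protein sequence."""
    seq = list(reference)
    for variant in variants:
        match = HGVS_MISSENSE.match(variant)
        if match is None:
            raise ValueError(f"not a simple missense variant: {variant}")
        ref_aa = THREE_TO_ONE[match.group(1)]
        pos = int(match.group(2))  # HGVS positions are 1-based
        alt_aa = THREE_TO_ONE[match.group(3)]
        if seq[pos - 1] != ref_aa:
            raise ValueError(f"reference mismatch at {variant}")
        seq[pos - 1] = alt_aa
    return "".join(seq)


# Invented toy reference and patients, not study data.
toy_reference = "MDILKA"
patients = {
    "P1": ["p.Ile3Leu"],
    "P2": ["p.Ile3Leu"],
    "P3": [],
    "P4": ["p.Asp2Glu", "p.Ile3Leu"],
}
altered = {pid: apply_missense(toy_reference, v) for pid, v in patients.items()}
unique_sequences = sorted(set(altered.values()))
print(len(unique_sequences), unique_sequences)
```

Collapsing identical sequences with a set is what reduces many patients to a smaller number of unique sequences for alignment.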

Pairs of correlated variants and haplotype block analysis
Correlations between ANO7 PDVs and other important variants, including germline structural variants (SVs) and previously reported PCa causal SNVs, were assessed using Spearman's rank correlation coefficient (ρ) from the stats package in R, which makes no assumption about the underlying frequency distribution. Significantly correlated pairs were those with FDR < 0.05 after p-value adjustment for multiple hypothesis testing using the rstatix package (v 0.7.0) (4).
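For readers unfamiliar with the two steps above, the sketch below shows a rank correlation with average ranks for ties and a p-value adjustment in plain Python. It is an illustration only: the study used R, the Benjamini-Hochberg procedure is assumed here as the FDR method (the text does not name it), and the genotype vectors and p-values are invented.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5  # undefined if either vector is constant


def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for step, idx in enumerate(order):
        rank = m - step  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented carrier vectors (0/1 per patient) and raw p-values.
pdv = [0, 1, 0, 1, 1, 0]
sv = [0, 1, 0, 1, 0, 0]
print(spearman_rho(pdv, sv))
print(bh_adjust([0.01, 0.04, 0.03, 0.5]))
```

Pairs would then be called significant where the adjusted value falls below 0.05, as in the text.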
Haplotype block analysis of SNVs within ANO7 was conducted with Haploview (v 4.1) (5). Input data were prepared using VCFtools (v 0.1.14) (6), excluding multi-allelic variants, insertions and deletions. Haploview calculated pairwise measures of linkage disequilibrium (LD) between SNVs with MAF > 0.001 and defined LD blocks using the default confidence-interval method, under which 95% of the SNVs in a block were considered to be in strong LD (5). A pair was defined as being in strong LD if the one-sided lower and upper 95% confidence bounds on D-prime were above 0.7 and 0.98, respectively (7).
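As background to the D-prime criterion above, the sketch below computes the D-prime point estimate from phased haplotype counts and applies the stated thresholds to a pair of confidence bounds. It is a simplified Python illustration, not Haploview's implementation: the confidence bounds themselves are not estimated here, and the function names and counts are hypothetical.

```python
def d_prime(n_ab, n_aB, n_Ab, n_AB):
    """Absolute D' for two biallelic sites from phased haplotype counts.

    n_ab counts haplotypes carrying allele a at site 1 and allele b at site 2,
    and so on for the other three combinations.
    """
    total = n_ab + n_aB + n_Ab + n_AB
    p_ab = n_ab / total
    p_a = (n_ab + n_aB) / total  # frequency of allele a at site 1
    p_b = (n_ab + n_Ab) / total  # frequency of allele b at site 2
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # at least one site is monomorphic
    return abs(d) / d_max


def is_strong_ld(lower_bound, upper_bound):
    """Apply the thresholds stated in the text to given D' confidence bounds."""
    return lower_bound > 0.7 and upper_bound > 0.98


# Invented haplotype counts.
print(d_prime(50, 0, 0, 50))    # only two haplotypes observed
print(d_prime(25, 25, 25, 25))  # alleles combine independently
print(is_strong_ld(0.85, 0.99))
```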

Age at diagnosis and ANO7 variants
The associations between age at diagnosis and selected ANO7 variants were investigated with linear regression models using the stats package in R. Variables were tested using t-tests, with a significance threshold at a P-value of 0.05. A binary indicator was created for whether a patient carried more than two of the selected variants (PDVs and/or germline SVs). One African patient was excluded for lacking age information, and all the European patients were excluded as none presented with more than two selected variants. The analysis cohort (n = 108, all African) consisted of 93 patients with <3 selected variants and 15 patients with ≥3 selected variants. As the linear regression model allows the effects of multiple variables to be assessed simultaneously, the best model was determined by goodness of fit, estimated by Akaike's Information Criterion (AIC) in stepwise selection. Variables in the best model included the indicator of carrying more than two selected variants, the total count of genome-wide short germline variants and PCa risk levels. Genome-wide tumour mutational burden (TMB) was tested but was not selected.
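The idea of comparing candidate models by AIC can be illustrated with a small least-squares fit. The Python sketch below is an assumption-laden illustration, not the R analysis used in the study: the ages and carrier indicator are invented, only one covariate is shown, and AIC is computed in the least-squares form n·ln(RSS/n) + 2k, which differs from a full-likelihood AIC by a constant.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares fit with an intercept; returns (coefficients, AIC)."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2
              for d, yi in zip(design, y))
    n = len(y)
    aic = n * math.log(rss / n) + 2 * k  # least-squares form of AIC
    return beta, aic


# Invented toy data: age at diagnosis and a 0/1 indicator for >2 selected variants.
ages = [70, 68, 72, 66, 61, 63, 74, 60]
carrier = [0, 0, 0, 0, 1, 1, 0, 1]

beta_null, aic_null = fit_ols([[] for _ in ages], ages)
beta_ind, aic_ind = fit_ols([[c] for c in carrier], ages)
print("intercept-only AIC:", aic_null)
print("with indicator AIC:", aic_ind, "effect (years):", beta_ind[1])
```

With a single binary covariate, the fitted coefficient equals the difference in mean age between the two groups; a stepwise procedure repeats this kind of comparison while adding or dropping covariates and keeps the model with the lowest AIC.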

ANO7 variants prevalence in ethnic groups
Differences in allele frequency between ethnic groups (African, n = 109; European, n = 57) were compared for 13 PDVs using logistic regression models from the stats package in R. Because logistic regression models the probability of a binary variable, the genotype information was transformed into a binary variable where "0" means no alternate alleles and "1" means otherwise. Significance was tested using t-tests and corrected by false discovery rate (FDR) using the rstatix package (v 0.7.0) in R (v 4.1.3). Significantly different MAFs were defined by FDR < 0.05.
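The genotype transformation described above can be sketched in a few lines. This Python example is illustrative only: it assumes VCF-style genotype strings (such as "0/1"), treats missing calls as excluded, which the text does not specify, and uses invented genotypes rather than study data.

```python
import re


def carrier_status(genotype):
    """Map a VCF-style genotype to 0 (no alternate allele) or 1 (otherwise)."""
    alleles = re.split(r"[/|]", genotype)
    if any(a == "." for a in alleles):
        return None  # missing call; handling is an assumption of this sketch
    return 0 if all(a == "0" for a in alleles) else 1


def carrier_fraction(genotypes):
    """Fraction of called samples carrying at least one alternate allele."""
    calls = [c for c in map(carrier_status, genotypes) if c is not None]
    return sum(calls) / len(calls)


# Invented genotypes for one variant in two groups.
group_a = ["0/1", "1/1", "0/0", "0|1"]
group_b = ["0/0", "0/0", "0/1", "./."]
print(carrier_fraction(group_a), carrier_fraction(group_b))
```

The resulting 0/1 vector per variant is the kind of binary variable that a logistic regression on ethnic group would take as input.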

Pores identified in all the sequences containing PDV p.Ile740Leu
Ten unique sequences across the study cohort contain p.Ile740Leu, together with any other missense variants that co-occur in the same patient. Pores identified in the ten sequences were classified as Pores 1-2 if they showed the same placement as the pores identified in the original protein, or as one of two new pores, named Pores 3-4. They are listed in Table S7 and presented in Figures S13-S17.
Pores 1-2 with normal radius were both identified in Seq 39, which contains p.Ile740Leu and p.Asp156Glu, while Pore 2 with normal radius was found in Seq 43, which contains p.Ile740Leu and p.Asp70Asn. Apart from these two altered proteins, Pores 1-2 with narrower bottlenecks were observed in altered proteins (Seq 28 for narrower Pore 1, and Seqs 9, 28 and 31 for narrower Pore 2), while neither Pore 1 nor Pore 2 was identified in the other four altered proteins (Seqs 18-20 and 22).
Two new pores were identified repeatedly in altered proteins, and both bypassed the putative Ca2+ binding sites. Pore 3 has one end among helices α1-2 and α9 and the other end among α5-9 (Figure S12a), and was identified in two proteins (Seqs 12 and 18). Pore 4 has one end among helices α5, α7 and α9 and the other end among α5-7 and α9 (Figure S12b), and was identified in three proteins (Seqs 20, 22 and 28). The protein predicted for Seq 19 showed only a broken pore.

Figure S2. MAFs of 17 ANO7 array SNPs in cancer cases (n = 473) and controls (n = 307). Red circles represent the MAF in cases, while green circles represent the MAF in controls. Two SNPs (rs111624461 and rs199829153) were excluded from the lollipop plot because no protein changes were observed.

Figure S5. PPFIA4-ANO7 fusions. Fusions between PPFIA4 on chr1 and ANO7 on chr2 were reported in two SV fusion events in sample KAL0022, at the positions marked by red vertical lines. a, The 3' end of PPFIA4 at chr1:203030949 was connected to the 5' end of ANO7 at chr2:241236618. b, The 5' end of PPFIA4 at chr1:203030713 was fused with the 3' end of ANO7 at chr2:241236638.

Figure S8. Ion conduction of Pore 1 predicted in the Anoctamin 7 protein. The top panel shows the placement of Pore 1 among helices α5-9. Transmembrane domains lie between the blue and red nets. The bottom panel shows the properties of Pore 1. The y values of the bar plots indicate the radii of the pore. The top bar plot shows hydropathy by colour, where blue indicates hydrophilicity and yellow indicates hydrophobicity. The bottom bar plot shows ionisability, where darker purple indicates residues that are more readily ionised.

Figure S9. Alterations of Pore 2 identified in Anoctamin 7 with p.Ala470Val and p.Ile740Leu. Residues 673-694 are in orange. Pore 2 is in purple. PDVs are in red.

Figure S10. Alterations of Pore 2 in Anoctamin 7 with p.Ile740Leu and p.Arg578Cys. Left, residues 673-694 are in orange, Pore 2 is in purple and PDVs are in red. Middle and top right, the reduced bottleneck is shown. Bottom right, characteristics predicted for Pore 2 in the altered protein. More detailed configurations are described in Figure S8.

Figure S11. An example of a reproducible model of Pore 1. Three pores, in green, red and orange, are placed within the same group of helices (α5-9) but diverge from each other within the top compartment of the model.

Figure S12. Pores that are commonly identified in altered proteins. a, Pore 3, in marine, is shown in an altered protein that contains the PDV p.Ile740Leu, in red. b, Pore 4, in green, is shown in an altered protein that contains the PDVs p.Ile740Leu and p.Ala494Val, in red. Putative Ca2+ binding sites are in cyan, shown as sticks. Residues 673-694, which change position, are in orange.

Figure S13. Pore 1 identified in altered proteins. a, Normal Pore 1 for Seq 39. b, Narrower Pore 1 (0.5 Å) for Seq 28. The characteristics of the respective pores are shown in the plots on the right. More detailed configurations are described in Figure S8.

Figure S15. Pore 3 identified in altered proteins. a, Pore 3 for Seq 12. b, Pore 3 for Seq 18. More detailed configurations are described in Figure S8.

Figure S17. Positions and characteristics of broken pores. a, The broken pore in Seq 19. b, The pore identified in Seq 20 (Pore 4-like), which is broken where it passes through helix α7. c, Another broken pore identified in Seq 20. d, The broken pore in Seq 22. e, The pore identified in Seq 28 (Pore 4-like), which is broken where it passes through helix α7. More detailed configurations are described in Figure S8.

Table S2. Results of risk associations for 17 SNPs identified in exome array data.
b Variants that are PDVs are also included in Table 1.

Table S3. Intercorrelations between germline structural variants (SVs) and risk variants.
a rho, Spearman's correlation coefficient. b False discovery rate (FDR). c IC, inclusive correlated: Y denotes inclusive correlated pairs and N denotes non-inclusive correlated pairs. d Genomic changes compared to NG029845.

Table S5. Haploview results of African and European WGS data.
Table in the attached Excel file.

Table S8. Information on samples for the WGS study.
Table in the attached Excel file.