Mapping the genetic basis of diabetes mellitus in the Australian Burmese cat (Felis catus)

Diabetes mellitus, a common endocrinopathy affecting domestic cats, shares many clinical and pathologic features with type 2 diabetes in humans. In Australia and Europe, diabetes mellitus is almost four times more common among Burmese cats than in other breeds. As a genetically isolated population, the diabetic Australian Burmese cat provides a spontaneous genetic model for studying diabetes mellitus in humans. Studying complex diseases in pedigreed breeds facilitates tighter control of confounding factors including population stratification, allelic frequencies and environmental heterogeneity. We used the feline SNV array and whole genome sequence data to undertake a genome wide-association study and runs of homozygosity analysis, of a case–control cohort of Australian and European Burmese cats. Our results identified diabetes-associated haplotypes across chromosomes A3, B1 and E1 and selective sweeps across the Burmese breed on chromosomes B1, B3, D1 and D4. The locus on chromosome B1, common to both analyses, revealed coding and splice region variants in candidate genes, ANK1, EPHX2 and LOX2, implicated in diabetes mellitus and lipid dysregulation. Mapping this condition in Burmese cats has revealed a polygenic spectrum, implicating loci linked to pancreatic beta cell dysfunction, lipid dysregulation and insulin resistance in the pathogenesis of diabetes mellitus in the Burmese cat.

Genome-wide and haplotype association analyses. Samples from eighty-two Burmese cats that were genotyped on the Illumina Infinium Feline 63 k iSelect DNA array passed quality control. Multidimensional scaling (MDS) revealed some geographical clustering of cat populations (Fig. 1a). Cats from Europe (EU) clustered distinctly from Australian (AUS) and British (UK) samples. Given geographical clustering of cases, the GWAS was first run in Australian cats only to control for population stratification (analysis 1). Analysis 1 comprised of 22 cases and 20 controls, Cases and controls within the cohort of samples used in the initial case-control GWAS (analysis 1) which were evenly dispersed across the population cohort (herein referred to as the ' Australian cluster') ( Fig. 1b). For analysis 1, the QQ-plot revealed a deviation from the expected P-value distribution only in the tail (λ = 1.03) (Fig. 1c). This analysis totalled 30,212 SNVs and one marker at E2:6,883,182 (P raw = 4.68 × 10 −6 ) passed the empirical genome-wide significance threshold (Fig. 2a).
Five regions of association on chromosomes A3, B1 and E1 were defined based on LD clumping with r 2 > 0.8 to the top regional SNV and used to identify FDM-associated haplotypes within each of these clumps ( Table 1). The highest associated haplotype spanned the interval chrA3:134,425,431-134,601,739 bp. This region spanned genes; Lipin-1 (LPIN1), Neurotensin receptor 2 (NTSR2) and GREB1 and harboured a SNV within GREB1 (P = 4.33 × 10 −06 ) unique to cases.
The second highest associated haplotype spanned the interval chrB1: 48 www.nature.com/scientificreports/   www.nature.com/scientificreports/ and splice region variants in Cathepsin Z (CTSZ), Negative elongation factor CD (NELFCD), Neuroendocrine secretory protein 55 (NESP55), Ankyrin 1 (ANK1) and UNC-5 netrin receptor (UNC5D) ( Table 2), 708 intronic, two splice region variants, two 5′ UTR variants in GPAT4 and GINS4 and seven 3′ UTR variants in CTSZ and NELFCD matched risk haplotypes across the five FDM-associated haplotype blocks (Table S3). Most notably, variants matching the FDM-risk haplotype on B1 were identified in common T2D candidate gene, ANK1. Of WGS samples, two cases were heterozygous for both ANK1 splice variants, all other individuals were homozygous for the reference allele. SNV, g.43227755C > T, was located within 5 bp of the region of a constitutive donor splice site at the start of intron 4 and g.43235911C > T was located 2 bp upstream of a constitutive acceptor splice site at the end of intron 9.    (Table 2). SNVs within the chrB1:34,395,302-38,202,712 bp ROH were in moderate LD (r 2 = 0.5) to the highest associated SNV in the chrB1:40,408,022-54,187,390 bp ROH (Fig. 4a). ChrB1:40,408,022-54,187,390 overlapped the most significant region of FDM-association identified in analysis 2 (Fig. 4b) and was observed in 80% of controls and 49% in cases and contained 65 genes. FDM-risk variants segregating in cases could not be detected as only one diabetic WGS sample had the chrB1:40408022-54187390 ROH. A homozygous missense variant (c.163A > G) in exon 1 of Epoxide hydrolase 2 (EPHX2) was fixed across all WGS samples. Annotated FDM-risk variants discovered across ROH are presented in table S4.

Discussion
We sought to identify novel genetic loci associated with FDM in the Australian Burmese population. GWAS exploits the non-random coinheritance of genetic variants (LD) to assay thousands of markers for an association with any phenotypic trait. The Burmese breed is characterised by a low fraction of GWAS-informative SNVs on the 63 K array, and only 49% of the available array markers were included in our analysis. However, the low allele frequency spectrum used in array-based association analyses, coupled with the genetic profile of this breed limits www.nature.com/scientificreports/ the need to control for false-positives. Given the extent of LD observed in pedigreed cat breeds, traditionally conservative forms of correction (i.e. Bonferroni correction) are considered excessive. Empirical estimations of genome-wide significance (i.e. 95% CIs from the distribution of raw P values) are more conservative than what is typically used in human GWAS but are adequate in population isolates 16,36 . Genomic clustering of samples was concordant with geographical provenance. Geographic population stratification was expected as the cats in this study were collected from British, European and Australian populations that have each been subjected to population bottlenecks during their local breed development. Population substructure in GWAS cohorts is identified as a cause of potentially spurious associations 37 . Imprecise modelling of genetic relatedness within GWAS sample populations can cause substantial inflation of test statistics and potentially false association signals. Further, limiting study samples to entirely unrelated individuals is difficult in pedigreed breeds, such as the Burmese. To limit the effects of hidden relatedness and population stratification, the linear mixed-model (LMM) approach implemented by EMMAX was used. LMM have been shown to perform well in comparison to traditional family-based association tests 38 .
The Burmese cats in this study displayed high homozygosity and long ROH consistent with previous reports 14 . The distribution of ROH across the Burmese samples was consistent with previous results 39 . Long ROH (> 5 Mb) are indicative of recent inbreeding, unbroken by historical recombination events. No detectable difference in inbreeding coefficients between FDM cases and control groups was identified. Several factors influence the resolution of ROH calling, including marker density, marker distribution and genotype call quality. Therefore, for most populations, medium density genotyping arrays do not lend themselves to high resolution analysis of ROH 40 . For this analysis, minor allele frequency (MAF) and LD pruning were not employed in the ROH detection as both have been shown to mask the detection of true ROH by limiting genome coverage 39 . Long ROH (1-10 Mb) made up the largest size category observed across all populations, accounting for over 70% of ROH observations. Large ROH may persist in a population because of low rates of local recombination particularly in genomic regions that are subject to positive selection pressures. The risk of T2D is increased by moderate inbreeding and consanguinity within isolated populations [41][42][43] . ROH did not implicate autozygosity as a risk www.nature.com/scientificreports/ factor for FDM in Burmese cats, as diabetic cats did not display a higher number of ROH or higher inbreeding coefficient than non-diabetic cats. Selective sweeps, indicative of positive selection, manifest by the reduction of genetic variation surrounding a beneficial mutation, and occur due to positive selection pressure increasing the frequency of the favourable allele over time 44 . In cats, at least eight loci are involved in coat colour determination, with various combinations of these responsible for extensive phenotypic variation. Genes, TYR and TYRP1, inside ROH support strong selection of tyrosine metabolism and common Burmese coat 'ticking' and colour phenotypes [45][46][47] . Tyrosine metabolism influences the development, differentiation and proliferation of melanocytes, the construction and transport of the melanosome and the synthesis of melanin 48 . Short selective sweeps, indicative of extensive LD, have been associated with T2D in isolated human populations 49 . Sweeps can cause a shift in the allelic frequency of a selected allele and the 'hitchhiking' alleles in their vicinity. Pancreatic islets of FDM and T2D patients are characterised by reduced beta cell function and decreased insulin gene expression 50,51 . FDM-risk variants within these selective sweeps identified GRM5 and DLG2 as candidates for beta cell dysregulation, as both are involved in glucose-stimulated insulin secretion and hyperglycaemia susceptibility in rodent and human [52][53][54] . Additionally, PTPRD contained within the chromosome D4 ROH has been associated with progression to diabetes in humans through enhanced insulin resistance 55,56 . The ROH on chromosome B3 overlapped a syntenic loci in humans implicated in Prader Willi Syndrome (PWS). T2D has been found to affect between 7 and 24% of PWS patients, far exceeding the prevalence in the general population 57,58 . This is likely a consequence of insulin resistance resulting from morbid obesity however PWS patients also exhibit a state of hypoinsulinaemia without expected insulin resistance despite their obese state. Deficits in pancreatic islet development may play a role in the PWS phenotype 59,60 . NIPA1 is a magnesium transporter, upregulated in response to reduced magnesium concentration 61 . Variants in NIPA1 have previously been associated with T2D risk 62 and GABRG3 is an early childhood obesity gene contributing to PWS phenotype 63 . The basis of the diabetic state in obese PWS patients is currently unclear but the PWS locus contains epigenetically imprinted genes that serve as viable candidates for FDM.
The most compelling FDM-association was provided by the locus on chromosome B1, identified in both ROH and GWAS analyses. Coding variants matching the risk-haplotype segregated in ANK1, an established T2D candidate gene associated with decreased beta-cell function in humans 64,65 . Feline ANK1 is highly orthologous (93.89%) to human ANK1 sequence and multiple ANK1 isoforms with affinities for various target proteins are expressed in a tissue specific manner [66][67][68] . Transcripts of varying size are present in tissues essential to glucose metabolism: skeletal muscle 69 , pancreas, adipose and liver 70 . Splice region variants identified in this study may be influential in the molecular function of feline ANK1 isoforms, but any potential impact of these will be dependent on tissue-specific expression of ANK1 isoforms. Alternatively-spliced ANK1 transcripts are functionally diverse and variants in regulatory regions have been implicated in altered molecular function of the various human isoforms and their role in T2D 71 . The physiological role of ANK1 in T2D pathogenesis is unconfirmed but Ankyrin B (ANK2) regulates ATP sensitivity in murine pancreatic beta cells 72 . Additionally, SNVs in the ANK1 promoter region have been found to increase intramuscular fat content in pigs 73 and increased muscle-specific ANK1 expression in human skeletal muscle 71 implying SNVs in the ANK1 promoter region may play a role in the development of insulin resistance.
ROH analysis implicated two loci on chromosome B1 as FDM-risk loci, containing risk variants segregating in cases and across the breed, further highlighting the genetic complexity of FDM. Both partially overlapped the semi-dominant Abyssinian 'ticked' coat locus responsible for the homogenous agouti coat with no body markings characteristic of Burmese cats 47 . Across chrB1:34,395,302-38,202,712, variants segregated in cases in LOXL2, previously implicated in nephropathy and retinopathy in T2D patients 74,75 , indicating these may play a critical role in the progression to FDM. Diabetic nephropathy is not routinely recognised in diabetic cats, although diabetic retinopathy has been reported 76 . Dyslipidaemia is a critical factor in the early inflammatory response in the development of retinopathy 77 . LOX2 overexpression is considered a potential target for treatment of vascular changes involved in diabetic retinopathy 78 .
Inherited derangements in lipid metabolism have been described in the Burmese breed 20,79 and may be related to the high prevalence of FDM. Familial hypercholesterolaemia (FH) is a common autosomal dominant disorder of lipoprotein metabolism, with variable phenotypic presentation in humans 80 . Variants in EPHX2 have been found to exacerbate the dysfunction of the Low-density lipoprotein receptor (LDLR) gene in FH 81 and are associated with insulin resistance in T2D patients 82 . Defects in LDLR result in disturbed clearance of low-density lipoprotein cholesterol (LDL-C) and the clinical presentation varies widely depending on the segregation of risk alleles and haplotype. A missense substitution (c.163A > G) in EPHX2 was fixed across all WGS Burmese samples within the chrB1:40,408,022-54,187,390 bp ROH. Phenotypic heterogeneity is present between homozygous and heterozygous FH patients, with a range of increased plasma triglyceride levels, likely influenced by modifying gene-gene interactions. A syndrome similar to FH has been described in Burmese cats with affected individuals exhibiting significantly elevated plasma triglyceride concentrations, lipid aqueous and delayed triglyceride clearance compared with other breeds [83][84][85] . Post-prandial hypertriglyceridemia has been described in cats with inherited Lipoprotein lipase (LPL) factors 86 with variable phenotypic presentation in homozygous and heterozygous individuals. Further studies will be required to determine whether the lipid metabolism defects in Burmese cats can be attributed to EPHX2 dysfunction and any potential contribution to increased risk of FDM.
This study has unveiled genomic regions underlying dysfunctional metabolic processes predisposing Burmese cats to FDM, highlighting lipid metabolism as a contributing factor. Larger cohorts of comprehensively phenotyped individuals across multiple breeds are needed to validate the genetic risk factors presented here. The segregation of polymorphisms across autozygous regions and identification of risk-haplotypes reveal a potential highly penetrant recessive locus on chromosome B1. This strategy for identifying risk-loci across a predisposed population suggests the interaction of multiple genes across a fixed number of loci are likely responsible for FDM www.nature.com/scientificreports/ in Australian Burmese cats. The regions reported here characterise a window of FDM association containing candidate genes for further investigation in larger studies with prospective clinical phenotyping. Detailed characterisation of the genetic risk factors involved in the pathogenesis of FDM, including the allelic variants and candidate genes presented here, will provide a more comprehensive understanding of the molecular mechanisms involved and the genetic interaction profiles responsible for this disease, and for T2D in humans. Building on insights gained from this study, future work can potentially pinpoint population specific allele frequency profiles.

Materials and methods
Clinical diagnosis and sample collection. Eighty-two Burmese cats, comprising 22 diabetic and 60 non-diabetic individuals were submitted for genotyping analysis on the Illumina Infinium iSelect 63 k Cat DNA genotyping array. Genomic DNA was extracted from whole blood using the DNeasy blood and tissue kit (Qiagen GmbH, Germany) or Performagene PG-100 buccal swabs (DNA Genotek, Canada). All cases ( Remapping array variants to feline 9.0 genome assembly. To determine the exact position of feline array single nucleotide variants (SNV) on the Felis_catus_9.0 genome assembly, we performed a Basic Local Alignment Search 87 using SNV probes available in the feline manifest 88 . A FASTA file was created using the SNV identifier and flanking genomic sequence upstream and downstream of each SNV in top orientation. The longest flanking sequence was retained and output in FASTA format. A custom BLAST database was prepared using BLAST+ 87 to make the reference FASTA sequence searchable. A nucleotide BLAST search was performed and output was filtered to collect the single best hit for each sequence, hits that failed to return any result, hits that failed to reach the base position immediately adjacent to the SNV and those that had an E-value greater than 1e−05 were rejected.
Population structure and quality control. Quality control was carried out for 82 samples and 61,386 SNVs using PLINK 1.9 89 . First, we identified possible duplicate samples and outliers based on pairwise genetic distances (-genome) and removed one sample from each pair with an identity by descent (IBD) estimate > 0.65. Population stratification was evaluated using a MDS plot with two dimensions (-mds). Marker-based QC included pruning the total set of SNVs at a MAF (-maf) of 0.05 and a SNV (-mind) and individual call rate (-geno) of > 90%.
Case-control genome-wide association study and haplotype analysis. Testing for association between FDM and markers on the feline genotyping array was performed in two stages using Efficient Mixed-Model Association eXpedited (EMMAX) 90 . Given all cases were of Australian provenance, an initial association analysis (analysis 1) was performed on samples within the unstratified Australian cluster. To validate the associations observed in the Australian cluster, a second association analysis (analysis 2) run in EMMAX included an expanded group of individuals from Australian, UK and EU populations (22 cases, 60 controls) and the top 10% of association signals from analysis 1. The genome-wide significance threshold in both analyses was calculated based on empirical 95% confidence intervals (CIs). The probability distribution was determined by running the GWAS 1000 times with randomly permuted phenotypes. The genome-wide significance threshold was set at the 97.5% upper CI (based on a two-tailed distribution) (P < 7.6 × 10 −5 ) (Karlsson et al. 16 ).
Regions of association were refined using LD clumping (-clump) in PLINK. First, a region of weak LD (r 2 > 0.2) was defined within 5 Mb of each top SNV and then the region was narrowed to a single locus of high LD (r 2 > 0.8) within 2 Mb of the highest associated SNV per locus. The refined regions were submitted for haplotype analysis with HAPLOVIEW 91 . FDM-associated haplotype blocks were examined in all 82 individuals and were defined using the four-gamete rule 92 . Consecutive haplotype blocks with a multiallelic r 2 value of 1 were combined and blocks containing the top associated SNVs were used to define our FDM-associated loci. Significance of FDM-associated haplotype blocks was measured by running 10,000 permutations.
Runs of homozygosity analysis. ROH analysis was performed to identify FDM-associated regions of autozygosity using PLINK (-homozyg function). The input settings for minimal density of SNVs, maximal gap size, scanning window length and threshold settings were determined empirically 39 . The ROH analysis was run using a maximal gap size of 300 kb (-homozyg-gap) and scanning window size setting of minimal density (homozyg-density) at 80 kb/SNV at a genome coverage of 98.9%. Additional settings were; a scanning window hit rate of 0.05 (-homozyg-window-threshold), maximum of one heterozygous SNV per final ROH segment (-homozyg-het) and a minimum of 94 SNVs in each final ROH (-homozyg-SNV) (Table S2). Inbreeding coefficients (F ROH ) for genotyped individuals were calculated based on the length of the autosomal genome and the length of the genome covered by feline array markers (Meyermans et al. 2019). Correlations between inbreeding coefficients F ROHcov and F ROHaut were measured using a Pearson's correlation test. We did not prune SNVs based on MAF or LD. ROH on chromosomes B1, B3, D1 and D4 were examined for syntenic regions using the Ensembl Bioinformatics database and a database search was performed using the BLAST algorithm. Compara-

Scientific Reports
| (2020) 10:19194 | https://doi.org/10.1038/s41598-020-76166-3 www.nature.com/scientificreports/ tive genome analyses of ROH on chromosomes B1 and B3 were undertaken. ROH distributions were compared between the Australian cluster and the expanded group of Burmese and between cases and controls across all autosomes to identify any regions associated with FDM.
Risk variant discovery and annotation. Genomic DNA was extracted from whole blood samples of five Australian Burmese cats (3 cases, 2 controls) included in GWAS analysis, using a phenol-chloroform extraction protocol. DNA samples were submitted for whole genome sequencing (WGS). Illumina paired-end libraries were prepared and sequenced on the Illumina HiSeq 2000 platform with 150 bp paired-end reads (~ 12-21 × coverage). An additional three WGS samples, of unknown diabetic phenotype, available on Sequence Read Archive (SRA) were downloaded (SRX2376210, SRX2669131, SRX2376201) (~ 12-27 × coverage). All samples were aligned to the Felis_catus_9.0 reference assembly using BWA-mem 93 . Base quality score recalibration, indel realignment and duplicate removal was performed using Genome Analysis ToolKit (GATK) 94 . Genome-wide variant detection was performed using GATK's HaplotypeCaller according to best practices and variants filtered for sequence depth (DP > 10), quality of alignment (GQ > 20) and strand bias 95  www.nature.com/scientificreports/