Introduction

Obesity is a complex condition that results from an imbalance between energy intake and expenditure [1]. Body mass index (BMI) percentiles and z-scores relative to age- and gender-specific reference values are frequently used to classify obesity in the pediatric population. According to the World Health Organization (WHO), obesity is defined as having a BMI >99.9%ile (+3 SD) for age and gender in children aged 0–5 years and a BMI >97%ile (+2 SD) in children/adolescents aged 5–19 years [2]. Obesity is associated with an increased risk of multiple chronic conditions, including coronary heart disease, hypertension, and type 2 diabetes, which result in a significant health and economic burden.
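
As an illustration only, the WHO cutoffs quoted above can be written as a small Python sketch. The function names are hypothetical, and assigning a child of exactly 5 years to the older group is an assumption, since the two age ranges as stated share that boundary.

```python
def who_obesity_cutoff_sd(age_years: float) -> float:
    """Return the BMI z-score cutoff (in SD) above which obesity is classified."""
    if not 0 <= age_years <= 19:
        raise ValueError("the cutoffs quoted in the text cover ages 0-19 years")
    # Assumption: age exactly 5 is treated as part of the 5-19 year group.
    return 3.0 if age_years < 5 else 2.0


def is_obese(age_years: float, bmi_z: float) -> bool:
    """True if the BMI z-score exceeds the age-appropriate cutoff."""
    return bmi_z > who_obesity_cutoff_sd(age_years)
```

Under these assumptions, a 12-year-old with a BMI z-score of +2.41 SD exceeds the +2 SD cutoff, whereas a 3-year-old at +2.56 SD falls below the +3 SD cutoff.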

Twin, family, and adoption studies show that BMI is highly heritable and strongly influenced by genetic factors [3]. Genome-wide association studies (GWAS) have been successful in identifying multiple associated loci, many of which are involved in the control of body weight and appetite via the central nervous system [4]. Nevertheless, loci identified through GWAS account for less than 10% of the heritability [4]. In recent years, rare copy number variations (CNVs) have been implicated in the etiology of numerous conditions, including autism spectrum disorder [5], schizophrenia [6], type 1 diabetes [7], congenital heart disease [8], and pediatric growth disorders [9]. CNVs have also been reported in the pathogenesis of obesity, highlighting the potential role of another type of genetic contribution [10], but to date, few studies have performed genome-wide analyses. The CNV most commonly reported in association with early-onset obesity is a highly penetrant deletion at 16p11.2 encompassing SH2B1 (which accounts for ~0.7% of individuals with obesity) [11].

Here, we use high-resolution microarrays to investigate the genome-wide contribution of rare CNVs in a pediatric cohort comprising 67 individuals with obesity of unknown cause, including 22 probands with co-occurring developmental delay (DD).

Materials and methods

Study cohort

Our study included 67 children with obesity, aged 2–17.5 years, from two different cohorts. The first group consisted of 52 individuals who were recruited through the SickKids Team Obesity Management Program (STOMP) at the Hospital for Sick Children (SickKids) in Toronto, ON, Canada. STOMP patients were 12–17.5 years old with a BMI >99%ile or a BMI >97%ile with a minimum of one significant obesity-related co-morbidity or significant co-existing chronic illness, such as technology-dependent sleep-disordered breathing, obesity-associated type 2 diabetes, hypertension, pseudotumor cerebri, dyslipidemia, psychiatric illness, severe psychological impairment, or obesity-associated orthopedic complications. All families that were interested in participating were enrolled. These individuals underwent microarray analysis at The Centre for Applied Genomics (TCAG) at SickKids and were designated the “research microarray cohort”. One additional child (6-0234-03) who was 2 years old at the time of clinical assessment was also included in this group due to an early presentation of obesity, but was not seen through the STOMP program. Whole blood samples were collected from probands during enrollment for DNA extraction. Subsequently, saliva kits were mailed to families to obtain parental samples, but only a small number were returned. A positive family history of obesity was present in 47/51 (92%) of probands from this cohort (family history of obesity is unknown for one proband). This study was approved by the Research Ethics Board at SickKids (REB#1000007909). Informed consent was obtained from all families for participation in this study.

A second group of patients comprised 15 individuals aged 2–6 years who presented with obesity and DD and were enrolled through the Division of Clinical and Metabolic Genetics at SickKids in Toronto, ON, Canada. We designated this group of individuals the “clinical genetics cohort”. Probands within this group all had a BMI >99.9%ile (+3 SD) except for one 3-year-old patient who had a BMI of 99.5%ile (+2.56 SD). All patients underwent assessment by a clinical geneticist; clinical features are presented in Table 1. Clinical microarray analysis was performed in the Genome Diagnostics Cytogenetics Laboratory at SickKids. A positive family history of obesity was present in 10/12 (83%) of probands from this cohort (family history of obesity is unknown for three probands). BMI percentiles and z-scores of probands (according to age and gender) were calculated for both cohorts using the WHO AnthroPlus software [12].

Table 1 Clinical feature summary of the study cohort

The majority of probands did not undergo genetic testing for known pediatric obesity syndromes. For those who did undergo additional genetic testing (N = 14), the results were all negative (details in Supplementary Table 1).

DNA microarray analysis

DNA was extracted from whole blood samples for 52 probands from the research microarray cohort and genotyped on three different microarray platforms: 16 samples with the Affymetrix Genome-wide Human SNP Array 6.0 (Affymetrix, Santa Clara, CA, USA), 31 samples with the Illumina HumanOmni2.5 (Illumina, San Diego, CA, USA), and five samples with the Affymetrix CytoScan HD (Affymetrix, Santa Clara, CA, USA), according to the manufacturer’s guidelines. Microarray data are deposited in Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE101418). The DNA samples from the research microarray cohort were collected over an extended period during which microarray technology evolved and, as in other studies, samples were genotyped using the platform that was current at the time (see Supplementary Table 2 for more details). Multiple algorithms were used for CNV calling (for details regarding algorithms, see also Supplementary Table 2). We defined CNVs as high quality (predicted by at least two algorithms, spanning at least five consecutive probes, greater than 10 kb in size, and with <75% overlap with segmental duplications) and rare (present in <0.1% of population controls) according to previously described criteria [13,14,15,16]. To identify rare CNVs, we first compared proband CNV calls with platform-matched control data sets and subsequently with the remaining control data sets. Our control data set comprises 10,851 unique subjects, the majority of European ancestry, genotyped on multiple platforms including the Affymetrix Genome-wide Human SNP Array 6.0, Illumina HumanOmni2.5, and Affymetrix CytoScan HD. Further details regarding these samples are described elsewhere [15, 16]. DNA was also extracted from the saliva of parents of eight of the 52 probands (both parents were available for three cases and only the mother for five cases).
Parents of proband 6-0282-03 who also had a clinical microarray were tested by fluorescence in situ hybridization (FISH) in the Genome Diagnostics Cytogenetics Laboratory at SickKids. Other parental samples were not available or individuals declined to participate.
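
The high-quality and rarity criteria described above for the research microarray cohort can be sketched as a simple filter. This is a minimal illustration, not the pipeline used in the study: the `CnvCall` fields are hypothetical, the number of control carriers is assumed to have been computed beforehand, and pooling all controls into one frequency simplifies the two-step (platform-matched, then remaining) comparison described in the text.

```python
from dataclasses import dataclass

N_CONTROLS = 10_851  # control subjects reported in the text


@dataclass
class CnvCall:
    """Hypothetical summary of one CNV call; field names are illustrative."""
    n_algorithms: int       # calling algorithms that predicted the CNV
    n_probes: int           # consecutive probes spanned
    size_bp: int            # CNV length in base pairs
    segdup_overlap: float   # fraction (0-1) overlapping segmental duplications
    control_carriers: int   # controls carrying an overlapping CNV (precomputed)


def is_high_quality(call: CnvCall) -> bool:
    """Apply the four quality criteria stated in the text."""
    return (
        call.n_algorithms >= 2
        and call.n_probes >= 5
        and call.size_bp > 10_000
        and call.segdup_overlap < 0.75
    )


def is_rare(call: CnvCall, n_controls: int = N_CONTROLS) -> bool:
    """Rare means present in fewer than 0.1% of population controls."""
    return call.control_carriers / n_controls < 0.001
```

With 10,851 controls, the 0.1% threshold corresponds to at most 10 control carriers.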

Of the 15 probands from the clinical genetics cohort, five were analyzed on the Affymetrix CytoScan HD and 10 on the ISCA 4×180K Oligonucleotide array (Oxford Gene Technologies, Begbroke, UK) according to the manufacturer’s instructions. Chromosome Analysis Suite (Thermo Fisher Scientific, Waltham, MA, USA) was used for the detection of CNVs in the samples analyzed on the CytoScan HD. CytoSure Interpret (Oxford Gene Technologies, Begbroke, UK) was used for identification of CNVs in the group analyzed on the 4×180K array (Supplementary Table 2). For our analysis, we obtained all CNVs that were categorized as being pathogenic or a variant of unknown significance. Rare CNVs were defined as those reported in fewer than two independent studies in the Database of Genomic Variants (DGV) and present in less than 1% of the in-house database.

Prioritization of rare CNVs

We prioritized rare CNVs as clinically relevant if they corresponded with a genomic disorder that has obesity as a known phenotype and as potentially clinically relevant if they satisfied at least one of the following criteria: a locus previously reported in obesity studies (i.e., CNV, GWAS), a genomic locus identified in two or more unrelated probands, a genomic locus overlapping a CNV reported in the DECIPHER database [17] in patients with an obesity phenotype, and/or an impact on gene(s) found to be involved in energy homeostasis or related processes (Supplementary Table 3). We used the NCBI RefSeq numbering scheme for the numbering of exons (detailed information on exon boundaries provided in Supplementary Table 5). For the research microarray cohort, we validated selected CNVs using quantitative real-time PCR (qPCR) according to previously reported procedures [18] and tested parental samples when available. In the clinical genetics cohort, microarray testing of parental samples was conducted following a clinically relevant or variant of unknown significance (potentially clinically relevant) finding in the proband. All CNVs found to be clinically relevant or potentially clinically relevant were confirmed using one of these methods with the exception of the CNV finding in proband P0310 (from the clinical genetics cohort), as her father was unavailable to undergo testing and her mother declined.
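
The two-tier prioritization described above can be expressed as a short decision function. The sketch below is illustrative only: the `CnvEvidence` fields and tier labels are hypothetical names, and each flag is assumed to have been determined by manual review of the literature and databases.

```python
from dataclasses import dataclass


@dataclass
class CnvEvidence:
    """Hypothetical evidence flags for one rare CNV (names are illustrative)."""
    obesity_genomic_disorder: bool = False     # known genomic disorder with obesity
    reported_in_obesity_studies: bool = False  # locus reported in CNV or GWAS studies
    n_unrelated_probands: int = 1              # unrelated probands sharing the locus
    decipher_obesity_overlap: bool = False     # overlaps a DECIPHER CNV with obesity
    energy_homeostasis_gene: bool = False      # impacts a gene in energy homeostasis


def prioritize(evidence: CnvEvidence) -> str:
    """Return the prioritization tier for a rare CNV."""
    if evidence.obesity_genomic_disorder:
        return "clinically relevant"
    if (
        evidence.reported_in_obesity_studies
        or evidence.n_unrelated_probands >= 2
        or evidence.decipher_obesity_overlap
        or evidence.energy_homeostasis_gene
    ):
        return "potentially clinically relevant"
    return "not prioritized"
```

The genomic-disorder check comes first so that a CNV meeting both tiers is reported under the stronger classification.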

Results

The clinical features of the 67 individuals tested are presented in Table 1. All probands included in this study presented with obesity; 22 also exhibited DD. A total of 128 rare CNVs (Supplementary Table 4) were identified in the 52 probands from the research microarray cohort. Of these, 48% (61/128) were duplications and 52% (67/128) were deletions, and 34% (43/128) impacted coding regions. We found variants of clinical or potential clinical relevance in 15% (10/67) of the probands (Table 2). Across both cohorts combined, ~4% (3/67) of probands harbored clinically relevant variants, compared with 20% (3/15) of probands in the clinical genetics cohort alone. In ~10% (7/67) of the probands, we identified CNVs of potential clinical relevance using a stringent set of criteria, as described in Materials and Methods.
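
As a quick arithmetic check, the proportions reported above can be recomputed from their numerators and denominators. The sketch uses only the counts stated in the text; the helper name `percent` is arbitrary.

```python
def percent(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


# (numerator, denominator, percentage as stated in the text)
reported = [
    (61, 128, 48),  # duplications among rare CNVs
    (67, 128, 52),  # deletions among rare CNVs
    (43, 128, 34),  # rare CNVs impacting coding regions
    (10, 67, 15),   # probands with clinically or potentially relevant CNVs
    (3, 67, 4),     # probands with clinically relevant CNVs
    (3, 15, 20),    # same, within the clinical genetics cohort
    (7, 67, 10),    # probands with potentially clinically relevant CNVs
]

for n, d, stated in reported:
    print(f"{n}/{d} = {100 * n / d:.1f}% (stated as {stated}%)")
```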

Table 2 Summary of clinically relevant and potentially clinically relevant CNVs found in our probands

Clinically relevant CNVs

We prioritized rare CNVs as clinically relevant if they had previously been associated with a genomic disorder that has obesity as a known phenotype. We identified three individuals (including two siblings) from two families with pathogenic 16p11.2 deletions encompassing SH2B1, which is known to be the most common genomic variation associated with early-onset obesity [11]. One female proband (P0310), who presented with early-onset obesity and DD, was found to have a 232 kb deletion in this region (Fig. 1). Her father, mother, and maternal grandmother are all reported to be obese. However, because her mother declined testing and her father was unavailable, we were unable to investigate the segregation of this deletion.

Fig. 1 CNVs impacting the 16p11.2 locus. Female proband P0310 harbors a 232 kb deletion, and brothers P0821 and P0652 harbor a 201 kb deletion

A 16p11.2 deletion encompassing 201 kb was also found in two brothers (P0821 and P0652) (Fig. 1). Both presented with early-onset obesity (≤3 years old) and DD. Microarray testing of parental DNA samples revealed the 16p11.2 deletion to be maternally inherited. Their mother’s BMI (~22 kg/m²) was within the normal range according to WHO, which is notable given the high penetrance of 16p11.2 deletions for the obesity trait [11]. However, the mother had a history of psychiatric illness (i.e., eating disorders, anxiety, and depression), which has been observed in some individuals with 16p11.2 deletions [19]. A third family that included two obese siblings (male and female) and an asymptomatic mother with a 16p11.2 deletion was previously reported [20]. These families reflect the variable obesity phenotype among 16p11.2 deletion carriers.

Potentially clinically relevant CNVs

We used the following criteria to prioritize CNVs as potentially clinically significant: (1) locus previously reported in obesity studies (i.e., CNV and GWAS), (2) genomic locus identified in two or more unrelated probands, (3) genomic locus overlapping CNV reported in DECIPHER database in patients with obesity phenotype, or (4) genomic locus reported to be involved in energy homeostasis or related processes. We identified CNVs of potential clinical significance in 10% (7/67) of probands; the majority were found to be duplications (Table 2). In two unrelated male and female probands (6-0208-03 and 6-0279-03), we detected a duplication at 6p22.2 impacting SCGN (Fig. 2). Male proband 6-0208-03 harbors a 619 kb duplication of unknown inheritance encompassing seven genes including SCGN. His BMI plots at +5.85 SD and he has multiple obesity-associated comorbidities including type 2 diabetes, hypertension, obstructive sleep apnea, asthma, fatty liver, and dyslipidemia. Female proband 6-0279-03 carries a 68 kb duplication of unknown inheritance also impacting SCGN. Her BMI plots at +2.96 SD and she has acanthosis nigricans. SCGN encodes secretagogin, a calcium-binding protein that is highly expressed in the neurons of the hypothalamus, neuroendocrine cells, and pancreatic β-cells. Recent studies have found that secretagogin is involved in the secretion of insulin and corticotropin-releasing hormone within pancreatic β-cells and the hypothalamus, respectively [21, 22]. Moreover, secretagogin was determined to be expressed in a fraction of NPY/AgRP neurons found in the leptin–melanocortin pathway of the hypothalamus [23]. This pathway regulates food intake and maintains energy homeostasis, supporting a role for secretagogin in energy balance. SCGN is copy number stable according to the CNV map [24] and the Database of Genomic Variants Gold Standard map [25].

Fig. 2 CNVs impacting the SCGN locus. Male proband 6-0208-03 harbors a 619 kb duplication, and female proband 6-0279-03 harbors a 68 kb duplication

Female proband 6-0197-03 (BMI +3.30 SD) with typical development carries a 730 kb duplication of unknown inheritance within the 22q11DS locus (2.44 Mb) encompassing 12 genes (Supplementary Figure 1). We prioritized this CNV as potentially clinically relevant as obesity has been reported in individuals with 22q11.21 deletion and duplication syndromes (ISCA: 37446) [26]. One study investigating the clinical features of adults with 22q11DS found obesity to be a prevalent phenotype [27]. A subsequent study found that an increased prevalence of obesity in an adult cohort of 22q11DS is independent of psychotropic medication use; however, the degree of obesity is increased in individuals on these medications [28].

Female proband 6-0282-03 (BMI +4.03 SD) with typical development carries a 4.4 Mb deletion at 10q21.2–21.3 encompassing 10 genes (Supplementary Figure 2), which was determined by FISH to be de novo in origin. She also presented with renal tubular acidosis, growth hormone deficiency, delayed puberty, and autoimmune hepatitis. This CNV satisfied our potentially clinically relevant criteria as ARID5B, one of the genes within this deletion, is highly expressed in adipose tissue and is reported to be involved in adipogenesis and lipid metabolism in mice [29]. Further, a larger 15 Mb deletion encompassing the same locus is reported in the DECIPHER database [17] in a patient presenting with obesity, cognitive impairment, delayed speech and language development, sparse hair, and supernumerary nipples.

We identified a maternally inherited 634 kb duplication at 2q21.2 encompassing the promoter region and exon 1 of GPR39 (NM_001508.2) in male proband 6-0210-03 (BMI +2.41 SD), who developed obesity at an early age (<5 years) and has fatty liver and high triglycerides (Supplementary Figure 3). We classified this CNV as potentially clinically relevant as GPR39, which is widely expressed throughout the body with high levels of expression in adipose tissue, stomach, intestine, pancreas, liver, and kidney, has been shown to be involved in energy homeostasis in mice [30]. A positive family history of obesity is reported in the maternal grandfather.

A 65 kb duplication of unknown inheritance at 7q36.3 impacting exons 17–22 of PTPRN2 (NM_001308267.1) was detected in female proband 6-0248-03 (BMI +4.16 SD) (Supplementary Figure 4). She developed cerebral palsy (spastic diplegia) following complications from interferon treatment for a hemangioma during infancy. We considered this CNV to be potentially clinically relevant because PTPRN2 has been reported to be involved in insulin secretion processes [31]. PTPRN2 encodes a tyrosine phosphatase receptor that is found in neuroendocrine cells and is associated with cell growth and differentiation. Moreover, a recent study investigating the role of rare CNVs in obsessive-compulsive disorder implicates a duplication encompassing PTPRN2 [15].

Lastly, male proband 6-0191-03 (BMI +4.82 SD) with typical development carries a maternally inherited 15 kb duplication at 8q21.11 encompassing the last exon of HNF4G (NM_004133.4) (Supplementary Figure 5). Knockout mouse studies suggest that this gene likely plays a role in the regulation of energy balance [32]. Thus, we considered this CNV to be of potential clinical relevance. In addition, HNF4G was identified as a novel obesity-associated locus in a meta-analysis of GWAS investigating anthropometric traits [33].

pLI

We obtained the probability of truncating loss-of-function intolerance (pLI) [34] for all genes within the 14 rare deletions impacting coding regions. Only 4 of these 27 genes were likely to be intolerant of loss-of-function variants (pLI ≥ 0.9). Two of these, ARID5B (pLI = 1) and JMJD1C (pLI = 1), were within the 10q21.2–21.3 deletion in proband 6-0282-03, which was considered likely pathogenic. The other two genes were PPM1A (pLI = 0.99), which has not previously been implicated in disease, and LARGE (pLI = 0.96), which is associated with autosomal recessive congenital muscular dystrophy type 1D. Neither PPM1A nor LARGE has been implicated in obesity to date.
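
The pLI threshold applied above amounts to a one-line filter, sketched here with the scores quoted in this paper. Only the four intolerant genes have stated values; SCGN (pLI 0.04, quoted in the Discussion) is added purely as a low-score contrast and is not one of the 27 deleted genes. The function name is hypothetical.

```python
PLI_THRESHOLD = 0.9  # genes with pLI >= 0.9 are treated as loss-of-function intolerant

# pLI values as quoted in the text; SCGN is included only for contrast.
pli_scores = {
    "ARID5B": 1.0,
    "JMJD1C": 1.0,
    "PPM1A": 0.99,
    "LARGE": 0.96,
    "SCGN": 0.04,
}


def lof_intolerant(scores: dict[str, float], threshold: float = PLI_THRESHOLD) -> list[str]:
    """Return gene symbols whose pLI meets or exceeds the threshold, sorted by name."""
    return sorted(gene for gene, pli in scores.items() if pli >= threshold)


print(lof_intolerant(pli_scores))
```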

Discussion

We describe the identification of CNVs affecting known obesity risk genes/loci or new functional candidate genes in 15% (10/67) of individuals in our study cohorts. In our research microarray cohort, we have a higher proportion of females (63%) than males (37%), and only 15% of these individuals have DD. In contrast, we have a greater proportion of males (67%) in our clinical genetics cohort, which comprised probands with obesity as well as DD; this is consistent with the literature, in which a greater frequency of males is reportedly affected by DD [35]. A positive family history of obesity was observed in 92% (47/51) of probands from the research microarray cohort and 83% (10/12) of probands from the clinical genetics cohort. However, we were unable to investigate CNV segregation patterns in all families due to limited availability of parental samples.

Previously, Vuillaume et al. investigated the contribution of CNVs in 100 individuals (40 females, 60 males) who were 1–18 years old with syndromic obesity, which is generally defined as obesity accompanied by congenital malformations and/or DD [36]. They reported clinically relevant or potentially clinically relevant CNVs in 22% of their patients [10]. This is a higher frequency than we found in our cohort, a difference that could be attributed to the variation in clinical features among cohorts. Their study comprised patients with obesity accompanied mainly by DD (85%) and/or dysmorphic facial features (78%). Rare CNVs are well recognized to contribute significantly to the pathogenesis of DD [37]. When we restrict our cohort to only those probands with comorbid DD, we find clinically relevant or potentially clinically relevant CNVs in 14% (3/21) of unrelated cases.

We identified two families with pathogenic 16p11.2 deletions. This highly penetrant deletion is the most frequently reported (~0.7%) genomic alteration in individuals with early-onset obesity. This CNV was only identified in our clinical genetics cohort, where 14% (2/14) of unrelated probands carry this deletion. This higher than expected frequency likely reflects our small sample size and/or sample ascertainment bias. The potentially clinically relevant CNVs were all identified in our research microarray cohort, with 13% (7/52) of these individuals harboring such CNVs.

We identified multiple candidate genes that are likely biologically relevant to obesity. We found a 10q21.2–21.3 de novo deletion (4.4 Mb) in proband 6-0282-03. ARID5B, a gene found in this region, is essential for the formation of histone H3K9me2 demethylase complexes involved in the regulation of genes required for adipogenesis [29, 38]. Interestingly, a recent study found that the obesity-associated FTO SNP rs1421085 affects a conserved binding motif of ARID5B, leading to disrupted regulation of two genes (i.e., IRX3 and IRX5) involved in adipocyte thermogenesis. This disruption, which leads to decreased thermogenesis and an increased storage of lipids in primary preadipocytes of FTO-risk allele carriers, has been postulated to be responsible for the development of obesity [39].

In another proband, we identified a duplication partially encompassing GPR39. There are compelling data for the involvement of GPR39 in lipolysis via the regulation of hormone-sensitive lipase (HSL) and adipose triglyceride lipase (ATGL). Mice deficient for GPR39 have been shown to have significantly increased body weight compared with wild-type littermates. The authors of that study concluded that the observed body weight increase was the result of decreased lipolytic capacity rather than increased food intake [30].

Another proband was identified with a duplication encompassing the last exon of HNF4G. This gene encodes a nuclear hormone receptor that has been identified through GWAS to be associated with BMI [33]. As well, targeted disruption of Hnf4g in mice leads to increased body weight as a result of decreased physical activity and reduced energy expenditure [32].

Last, our finding of CNVs impacting SCGN in two unrelated probands is potentially the most interesting discovery. SCGN encodes a protein containing six calcium-binding domains. SCGN resides in a copy stable region of the genome, i.e., CNVs are rarely found at this locus in control populations [24]. There is intriguing evidence of the possible role of SCGN in the regulation of energy homeostasis [23] and these observations together make SCGN an attractive candidate gene for obesity. One of the duplications (in proband 6-0208-03) spans the whole SCGN gene as well as seven other genes (619 kb). The second duplication (in proband 6-0279-03) is 68 kb in size and spans the last five exons of SCGN (NM_006998.3). This partial duplication encompasses the last three calcium-binding domains of SCGN. Given that SCGN has a low loss-of-function intolerance score (pLI: 0.04) [34], it is likely not sensitive to haploinsufficiency; so the impact of these CNVs may be due to other mechanisms, such as dosage or gain of function. Additional studies are needed to ascertain the pathogenicity of CNVs involving the SCGN locus with respect to human obesity.

Here, we identify important candidate genomic regions for obesity using a CNV scanning approach. The novel candidate genes identified do not function in the same biological pathways or processes, providing further support for significant heterogeneity in pediatric obesity.

In the current approach, microarrays were used because they are the first-tier test in our clinical laboratories, providing the best-understood data [40]. However, microarrays can accurately detect only deletions and duplications of 10 kb or greater. Further, as not all probands underwent genetic testing for known pediatric obesity syndromes (e.g., obesity gene panel and Prader–Willi syndrome testing), it is also possible that some may have a pediatric obesity syndrome due to sequence-level variant(s) in known disease-associated genes. We anticipate that the application of whole-genome sequencing [41] to detect all classes of genetic variants will lead to additional genetic factors and loci being identified in pediatric obesity, and potentially add further support for the novel loci we have identified.