Chromosomal microarray analyses from 5778 patients with neurodevelopmental disorders and congenital anomalies in Brazil

Chromosomal microarray analysis (CMA) has been recommended and practiced routinely since 2010 both in the USA and Europe as the first-tier cytogenetic test for patients with unexplained neurodevelopmental delay/intellectual disability, autism spectrum disorders, and/or multiple congenital anomalies. However, in Brazil, the use of CMA is still limited, due to its high cost and complexity in integrating the results from both the private and public health systems. Although Brazil has one of the world’s largest single-payer public healthcare systems, nearly all patients referred for CMA come from the private sector, resulting in only a small number of CMA studies in Brazilian cohorts. To date, this study is by far the largest Brazilian cohort (n = 5778) studied by CMA and is derived from a joint collaboration formed by the University of São Paulo and three private genetic diagnostic centers to investigate the genetic bases of neurodevelopmental disorders and congenital abnormalities. We identified 2,279 clinically relevant CNVs in 1886 patients, not including the 26 cases of UPD found. Among detected CNVs, the corresponding frequency of each category was 55.6% Pathogenic, 4.4% Likely Pathogenic and 40% VUS. The diagnostic yield, taking into account Pathogenic, Likely Pathogenic and UPD cases, was 19.7%. Since the rationale for the classification is mostly based on Mendelian or highly penetrant variants, it was not surprising that a second event was detected in 26% of the cases of predisposition syndromes. Although most laboratories around the world routinely test the parents of VUS carriers to determine the inheritance of the variant, our results indicate an extremely low cost–benefit of this approach, and strongly suggest that, when the budget is limited, investigation of the parents of VUS carriers using CMA should not be prioritized.

Chromosomal microarray analysis (CMA): SNP-array. Genomic DNA samples were extracted from peripheral blood cells or saliva following standard procedures. SNP-array experiments were performed using the Illumina Infinium CytoSNP 850 K BeadChip (Illumina, San Diego, USA), except for 810 cases which were carried out using the Affymetrix CytoScan 750 K Array (Affymetrix, Santa Clara, USA). Data were analyzed using either the BlueFuse™ Multi Analysis (Illumina, San Diego, USA) or the Chromosome Analysis Suite-ChAS Software (Affymetrix, Santa Clara, USA). Log2 ratio and B Allele Frequency (BAF) values were plotted along chromosomal coordinates, allowing the detection of both copy number changes and copy neutral regions of homozygosity (ROH).
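As an illustration of how Log2 ratio and BAF jointly distinguish copy number changes from copy neutral ROH, the following minimal Python sketch computes the idealized values expected for a non-mosaic segment. It is a textbook simplification, not the algorithm implemented in BlueFuse Multi or ChAS; the function names are hypothetical, and real array data are noisier and compressed relative to these theoretical values.

```python
import math
from typing import List


def expected_log2_ratio(copy_number: int, reference_copies: int = 2) -> float:
    """Idealized Log2 ratio of a segment relative to a diploid reference."""
    if copy_number <= 0:
        # A homozygous deletion has no signal; the ratio is unbounded below.
        return float("-inf")
    return math.log2(copy_number / reference_copies)


def expected_baf_clusters(copy_number: int, copy_neutral_roh: bool = False) -> List[float]:
    """Idealized BAF clusters: (number of B alleles) / (total copies).

    In a copy neutral ROH the heterozygous cluster at 0.5 is absent,
    so only the homozygous clusters at 0 and 1 remain.
    """
    if copy_number <= 0:
        return []
    if copy_neutral_roh:
        return [0.0, 1.0]
    return [b / copy_number for b in range(copy_number + 1)]


if __name__ == "__main__":
    for label, cn, roh in [
        ("heterozygous deletion", 1, False),
        ("normal diploid", 2, False),
        ("duplication", 3, False),
        ("copy neutral ROH", 2, True),
    ]:
        clusters = [round(v, 2) for v in expected_baf_clusters(cn, roh)]
        print(f"{label}: log2 ratio = {expected_log2_ratio(cn):+.2f}, BAF = {clusters}")
```

In this idealization a copy neutral ROH has the same Log2 ratio as a normal diploid segment and is recognizable only by the missing heterozygous BAF cluster, which is why both tracks are plotted together.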

Variant analysis and clinical interpretation.
Copy number variants were classified for their clinical impact according to the American College of Medical Genetics (ACMG) guidelines [15]. The criteria for variant classification were as follows:
• Pathogenic = when the CNVs (1) were more than 4 Mb in length and harbored genes; (2) overlapped with regions associated with OMIM morbid genes or DECIPHER/ClinGen microdeletion/microduplication syndromes; or (3) deleted haploinsufficient OMIM genes;
• Likely Pathogenic = when the CNVs (1) were deletions partially affecting haploinsufficient OMIM genes; or (2) harbored genes and were 1-4 Mb in length;
• VUS = when the CNVs (1) were duplications containing OMIM genes; (2) were deletions containing recessive OMIM genes; or (3) were larger than 300 kb and harbored genes.
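The criteria above can be read as an ordered rule set. The sketch below is one possible encoding in Python, not the pipeline used in this study: the `Cnv` fields are hypothetical annotation flags assumed to be computed upstream (for example from OMIM, DECIPHER and ClinGen lookups), the rules are evaluated from most to least severe, and the handling of the exact 1 Mb and 4 Mb boundaries is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

KB = 1_000
MB = 1_000_000


@dataclass
class Cnv:
    """Hypothetical, pre-annotated CNV record (field names are illustrative)."""
    kind: str  # "deletion" or "duplication"
    size_bp: int
    has_genes: bool = False
    overlaps_morbid_or_syndrome_region: bool = False
    deletes_haploinsufficient_gene: bool = False
    partially_deletes_haploinsufficient_gene: bool = False
    contains_omim_gene: bool = False
    contains_recessive_omim_gene: bool = False


def classify(cnv: Cnv) -> Optional[str]:
    """Return the first matching category, or None if no criterion applies."""
    is_del = cnv.kind == "deletion"
    is_dup = cnv.kind == "duplication"

    if (
        (cnv.size_bp > 4 * MB and cnv.has_genes)
        or cnv.overlaps_morbid_or_syndrome_region
        or (is_del and cnv.deletes_haploinsufficient_gene)
    ):
        return "Pathogenic"

    if (
        (is_del and cnv.partially_deletes_haploinsufficient_gene)
        or (cnv.has_genes and 1 * MB <= cnv.size_bp <= 4 * MB)
    ):
        return "Likely Pathogenic"

    if (
        (is_dup and cnv.contains_omim_gene)
        or (is_del and cnv.contains_recessive_omim_gene)
        or (cnv.size_bp > 300 * KB and cnv.has_genes)
    ):
        return "VUS"

    return None


if __name__ == "__main__":
    print(classify(Cnv("deletion", 5 * MB, has_genes=True)))
    print(classify(Cnv("duplication", 2 * MB, has_genes=True)))
    print(classify(Cnv("duplication", 400 * KB, has_genes=True)))
    print(classify(Cnv("duplication", 100 * KB)))
```

Evaluating the categories in order of severity means that a CNV satisfying criteria from more than one category is assigned the most clinically relevant one.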
Common variants, i.e., those commonly reported in curated databases (DGV), were disregarded in this study. Chromosomal rearrangements were defined by the presence of more than one large CNV in different chromosomes or in the same chromosome (e.g., chromosomes derived from translocations and inversions). In particular, copy neutral ROHs restricted to a single chromosome known to harbor imprinted regions were considered Pathogenic and likely to represent uniparental disomy (UPD). ROH > 10 Mb or at least two ROH > 5 Mb were considered indicative of consanguinity, albeit not Pathogenic per se.
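The ROH rules can be sketched in the same spirit. The Python example below is a hedged illustration only: the function name and return labels are hypothetical, the set of imprinted chromosomes (6, 7, 11, 14 and 15) is the one used for the UPD cases reported in the Results, and the decision to test "restricted to a single chromosome" before the consanguinity thresholds is an assumption about how the two rules combine.

```python
from typing import List, Tuple

MB = 1_000_000
# Imprinted chromosomes used for the Pathogenic UPD cases in the Results.
IMPRINTED_CHROMOSOMES = {"6", "7", "11", "14", "15"}


def interpret_roh(segments: List[Tuple[str, int]]) -> str:
    """Interpret copy neutral ROH given as (chromosome, size in bp) pairs."""
    # Only segments above the lower 5 Mb threshold are considered.
    large = [(chrom, size) for chrom, size in segments if size > 5 * MB]
    meets_threshold = any(size > 10 * MB for _, size in large) or len(large) >= 2
    if not meets_threshold:
        return "no call"

    chromosomes = {chrom for chrom, _ in large}
    if len(chromosomes) == 1:
        # ROH restricted to one chromosome: UPD is the first hypothesis.
        chrom = next(iter(chromosomes))
        if chrom in IMPRINTED_CHROMOSOMES:
            return "suspected UPD (imprinted chromosome, Pathogenic)"
        return "suspected UPD (non-imprinted chromosome)"

    # ROH spread over several chromosomes: identity by descent.
    return "indicative of consanguinity (not Pathogenic per se)"


if __name__ == "__main__":
    print(interpret_roh([("15", 60 * MB)]))
    print(interpret_roh([("1", 12 * MB), ("4", 8 * MB)]))
    print(interpret_roh([("3", 6 * MB)]))
```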
Ethical approval. This study is in accordance with the ethical standards established in the Declaration of Helsinki (1964), its subsequent revisions, and Resolution 466/2012 of the Brazilian National Health Council. The Research Ethics Committee of the Institute of Biosciences of the University of São Paulo gave ethical approval for this work (CAAE 80921117.5.0000.5464), and informed consent for genetic testing was obtained from the patients' parents or guardians.

Results
Diagnostic rate of the cohort. An overview of the number of individuals with clinically relevant CNVs in the cohort of this study is shown in Fig. 1. Out of the 5778 patients with neurodevelopmental disorders or congenital abnormalities investigated, relevant CNVs were detected in 1886 individuals, who were classified into three main categories: (i) Pathogenic (54%); (ii) Likely Pathogenic (5%) and (iii) Variants of Unknown Significance (VUS; 41%). Taking into account just the Pathogenic, Likely Pathogenic and UPD cases, the overall diagnostic yield in our cohort was 19.7%. In these 1886 individuals, a total of 2,270 relevant CNVs were identified, and the corresponding frequency of each category was 55.6% Pathogenic, 4.4% Likely Pathogenic and 40% VUS (Fig. 2A). As expected, Pathogenic CNVs accounted for the largest proportion of diagnostic alterations; they are divided into seven clinically relevant classes of variants, as presented in Fig. 2B. The description of all individual Pathogenic CNVs, Likely Pathogenic CNVs, and VUS can be found in Supplementary Tables 1-3.
Aneuploidies and marker chromosomes. Sex chromosome aneuploidies (SCA) and autosomal aneuploidies accounted for 34 cases (3.3% of the pathogenic cases) in our cohort. Considered as separate groups, SCA and autosomal trisomies occurred in very similar numbers: 16 and 18 cases, respectively. SCA comprised 47,XXX, 48,XXYY, 47,XYY, 47,XXY, and 45,X, the most frequent being 47,XXY (Klinefelter syndrome), found in 8/34 cases (23.5%). Among the autosomal trisomies, the most common was trisomy 21, found in 12/34 (35.3%), followed by trisomy 13 in 2/34 (5.9%). Excluding the known viable autosomal trisomies, an extra copy of other autosomes was only observed in mosaics, as was the case with chromosomes 8, 9, 14 and 22. The frequency of each aneuploidy is shown in Fig. 3A. Marker chromosomes were identified in 53 patients (5.2% of the pathogenic cases). In this category, we considered only markers seen on karyotype that did not characterize a well-known OMIM syndrome, such as Pallister-Killian, Cat-eye or Emanuel syndromes. Marker chromosome 15 was the most frequent, corresponding to 20/53 (37.8%). Other markers originated from chromosomes 7, 8, 9, 10, 11, 12, 13, 18, 19, 22, X and Y; except for those derived from the sex chromosomes, all were supernumerary marker chromosomes (Fig. 3B).

Figure 1.
An overview of cases with clinically relevant copy number variations (CNVs) identified in the cohort. The figure shows that, from a total of 5778 patients with neurodevelopmental disorders referred for chromosomal microarray analysis (CMA), 1886 carried clinically relevant CNVs, classified into three main categories: (i) Pathogenic CNVs; (ii) Likely Pathogenic CNVs; and (iii) Variants of Unknown Significance (VUS). The total number of cases corresponding to each category is presented in the diagram. Individuals with more than one alteration were classified within the most clinically relevant category.

Uniparental disomy (UPD) and copy neutral regions of homozygosity (ROH). Copy neutral ROH were observed in 259 patients and were divided into two categories, according to the supposed origin of the ROH. Twenty-six were large or whole-chromosome ROH block(s) restricted to a single chromosome, classified as UPD; of these UPD cases, 14 were classified as Pathogenic because they mapped to the imprinted chromosomes 6, 7, 11, 14 and 15. The most common UPD was UPD15 (7/26), followed by UPD14 (3/26) and UPD1 (3/26) (Fig. 6). In the remaining 233 individuals carrying ROHs, the presence of blocks of homozygosity in more than one chromosome was indicative of identity by descent.

Discussion
In this study, we report the largest Brazilian cohort of patients with neurodevelopmental disorders investigated by CMA. An overall diagnostic yield of 19.7% was determined, a result similar to that found in other studies [8][9][10][11]. An extensive study with over 15,000 patients [17] found a lower frequency of diagnosis (~14.2%), but considered only CNVs above 400 kb. The copy number data presented here were deposited in the DECIPHER database and, like previous studies, clearly demonstrate the importance of copy number changes in postnatal diagnosis. CMA has been recommended and practiced routinely in the USA and Europe as the first-tier test for patients with neurodevelopmental disorders and congenital abnormalities since 2010 [1]. However, the use of CMA tests is still limited in Brazil due to their high costs. It is relevant to mention that the healthcare system in Brazil is a complex mixture of public and private funding, with governance and ownership agreements. The Brazilian public health sector is one of the world's largest single-payer healthcare systems. Alongside it, there is a large private sector supported by high investment. It is estimated that only ~26% of Brazilians have private health insurance, and coverage is mainly concentrated in the urban areas of the Southeastern part of the country [18]. Although nearly all patients referred for CMA come from the private sector, health insurers require that G-banded karyotyping be used as the first genetic test. Patients with no structural and/or numerical alterations by karyotyping are subsequently referred for investigation by microarray analysis. In contrast, in the public sector, CMA is not even offered to patients, since the price established by the government for the total genetic investigation of a patient does not cover even the cost of materials for a single CMA.
In practice, CMA is provided to few patients at public universities or institutions, when it is linked to specific projects and research grants. However, this situation is not just a Brazilian peculiarity, since even in countries with better economic conditions, private laboratories can contribute to a class disparity in access to medical analyses.
Although many patients in our cohort had been previously investigated by G-banded karyotyping, we found 34 cases of aneuploidy, among which trisomy 21 was the most frequent chromosomal disorder. The patients with Down syndrome were referred for CMA because they presented autistic features, in order to search for other CNVs associated with autism spectrum disorders; however, additional CNVs were not detected in any of these cases. It is noteworthy that the presence of autism spectrum disorders in individuals with Down syndrome has been well documented for several years [19,20]; therefore, the analysis through CMA in these cases is puzzling. The second most common aneuploidy detected was Klinefelter syndrome; in this case too, behavioral disorders were the main reason for CMA referral. It is well documented that behavior problems, including autism, are relatively common in Klinefelter syndrome [21]. In fact, autistic features may be more common in individuals with sex chromosome aneuploidies than generally believed [22].
Marker chromosomes from both autosomes and sex chromosomes represented 4.6% of the diagnostic alterations (Pathogenic, Likely Pathogenic, UPD). Except for those derived from the X and Y chromosomes, all were supernumerary. The correlation of specific supernumerary marker chromosomes (SMC) with distinct clinical features has been demonstrated for some syndromes; those were classified as syndromes rather than markers, for example i(12p) (Pallister-Killian syndrome, MIM#601803) and Cat eye syndrome (MIM#115470) [23]. The only marker in our dataset that represents a translocation derivative chromosome characterizes Emanuel syndrome (MIM#609029), and results from missegregation of the only known recurrent, non-Robertsonian, constitutional translocation in humans [der(22)t(11;22)(q23;q11.2)]. Although it was not possible to obtain cytogenetic characterization of all markers, it is known that an inverted duplicated chromosome 15 [24] is the most common of the heterogeneous group that constitutes the supernumerary marker chromosomes.
The microdeletion and microduplication syndromes account for the largest proportion of the diagnoses obtained in our cohort (41.8%). Many of these syndromes involve genomic hotspots flanked by homologous segmental duplications prone to unequal crossing over, and have elevated de novo mutation rates, generally with similar CNV sizes [25,26]. In this study, we detected 71 distinct microdeletion/microduplication syndromes in a total of 477 individuals, in which deletions were twice as common as duplications (n = 48 deletions vs n = 23 duplications). Among the five most frequent syndromes, shown in Fig. 4 [...] and 15q11.2 deletion (MIM#615656). Such susceptibility CNVs impose a challenge in genetic counseling since they are present in the normal population but enriched in individuals with various neurodevelopmental disorders. Moreover, these CNVs are often inherited from a normal or mildly affected parent, and they lack phenotypic specificity, being associated with a variety of neuropsychiatric disorders, congenital abnormalities, and variable dysmorphisms. The 22q11.2 deletion was the most frequent syndrome in our cohort, reflecting the same frequency reported by other large population studies [27]. In particular, the 15q13.3 duplication encompassing the CHRNA7 gene was previously associated with several neurodevelopmental disorders [17,28,29]. However, overlapping duplications in this genomic region were also documented in many individuals of the general population (~0.6%; estimated prevalence of 1:174-186 individuals [30]) and, in almost all cases investigated, patients inherited the duplication from clinically normal parents. The high frequency of duplications of this segment in the general population, together with the lack of enrichment in clinical cohorts [31,32], indicates that, if this variant has any clinical impact, the penetrance would be very low.
The recurrent 15q11.2 deletion (BP1-BP2), which includes the CYFIP1, NIPA1, NIPA2 and TUBGCP5 genes, is consistently associated with neurocognitive function. Jonch et al. (2019) [33] performed a comprehensive meta-analysis on individuals with 15q11.2 deletions, comparing data across 20 studies. The case-control study using their clinical cohort compared to controls in the UK Biobank cohort showed enrichment of the deletion in the patient population. Nonetheless, the clinical significance of the reciprocal duplication of the 15q11.2 region has been refuted [31,34]. Duplications of this region are common in the general population, and the majority of case-control studies have observed a lack of enrichment in the clinical population. Recent studies of duplication carriers identified through cohort studies in the general population have also shown that carrier individuals perform similarly to non-carrier controls on neurocognitive tests. Therefore, this segment is considered unlikely to be dosage sensitive when duplicated. Indeed, duplications restricted to 15q11.2 and 15q13.3 were historically classified as pathogenic, and the data presented here reflect a retrospective analysis; currently, they are classified as benign in light of recent evidence, and in accordance with databases such as DECIPHER and ClinGen.
The estimated frequency of the 16p11.2 deletion syndrome is about 1-5/10,000 in the general population. Research based on the ClinGen database suggests that the 16p11.2 deletion is the second most common microdeletion, occurring in one of every 235 individuals tested for intellectual and developmental disabilities. Interestingly, this deletion was identified in nearly 1% of individuals with autism [35][36][37]. Nonetheless, the phenotypic spectrum associated with this deletion is much wider and includes delays in speech or motor development, language impairment, low muscle tone, hypo- or hyperreflexia, a tendency towards obesity, short stature, and several facial dysmorphisms. The syndrome classically involves a heterozygous microdeletion of ~600 kb containing 29 protein-coding genes; although the majority of reported cases are de novo, the deletion is inherited in an autosomal dominant pattern in 20% of the cases; an equal sex ratio has been reported [38].
Considering all evidence from association studies about the susceptibility CNVs for neurodevelopmental disorders, the general consensus is that there must be additional modifiers that influence the expression of these variants. A "two-hit", or second site, model has been suggested for several syndromes [39,40]. Notably, the vast majority of the syndromes with incomplete penetrance, as shown in Fig. 5 [...] individuals with a microdeletion on chromosome 16p12.1 carried additional large CNVs [40]. These data support an oligogenic basis, in which the compound effect of a relatively small number of rare variants of large effect contributes to the heterogeneity of genomic disorders. The authors also identified other known genomic disorders, each defined by a specific CNV, in which the affected children were more likely to carry multiple copy number variants than controls [39]. Overall, a second CNV hit was identified in 10% of their cases. We found that patients with a CNV known to present incomplete penetrance (Fig. 5) carried a second-hit CNV in 26.4% of the cases. It is assumed that some second hits can only be ascertained by sequencing, but SNV/indel second hits were not investigated in this study. It is relevant to mention that investigation of the parents is mandatory in cases of susceptibility CNVs for proper genetic counselling [15]. However, in our study we were unable to obtain information about the inheritance pattern for all cases because the patients and parents were not necessarily investigated at the same centers or in the same period of time.
In cases where the patient presents one or more VUS, the recommendation is to investigate the parents to determine whether the CNV has been inherited or represents a de novo mutation [15]. While the latter may lead to a reclassification of the VUS as Pathogenic or Likely Pathogenic, inherited variants remain classified as VUS.
In our cohort, we have information on segregation in a minority of the cases: among 126 patients who presented an autosomal VUS and for whom segregation was tested, we found that 125 of the variants were inherited. It is important to remember that, regardless of being inherited, these VUS may still contribute to the patients' phenotype, and must be reported. While it is undeniable that the resulting 0.8% of de novo alterations have some impact on diagnosis and genetic counselling, the healthcare context should be considered in the decision to test the parents. In the present cohort, we performed 252 CMA tests in parents, but were able to reclassify the VUS in a single case. For all remaining 125 cases, the parental tests did not add any useful information. When resources are limited, as in Brazil and in many other countries, a better cost-benefit is obtained by testing another 252 patients instead of investigating VUS segregation. Therefore, we would not recommend testing segregation of VUS using CMA in the public healthcare system in Brazil or in other developing countries in the current situation. Nonetheless, segregation analysis can be performed with cheaper techniques such as real-time PCR.
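The cost-benefit argument can be made explicit with the figures reported above. The short Python sketch below is illustrative only and rests on two assumptions that are not stated in the study: that a parental CMA costs the same as a proband CMA, and that newly tested patients would show the same 19.7% diagnostic yield as the present cohort. The function names are hypothetical.

```python
def reclassification_rate(reclassified: int, probands_tested: int) -> float:
    """Fraction of VUS carriers whose variant was reclassified after parental testing."""
    return reclassified / probands_tested


def expected_diagnoses(tests: int, diagnostic_yield: float) -> float:
    """Expected number of diagnoses if the same tests were spent on new patients."""
    return tests * diagnostic_yield


if __name__ == "__main__":
    probands_with_vus = 126   # VUS carriers with segregation data
    parental_tests = 252      # two parents per proband
    reclassified = 1          # single de novo VUS
    cohort_yield = 0.197      # overall diagnostic yield of the cohort

    rate = reclassification_rate(reclassified, probands_with_vus)
    alternative = expected_diagnoses(parental_tests, cohort_yield)

    print(f"VUS reclassified after parental testing: {rate:.1%}")
    print(f"Diagnoses gained per parental CMA: {reclassified / parental_tests:.4f}")
    print(f"Expected diagnoses from {parental_tests} new patients: {alternative:.1f}")
```

Under these assumptions, the 252 parental tests yielded one reclassification, whereas the same number of tests applied to new patients would be expected to yield roughly 50 diagnoses.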
Importantly, with the incorporation of many robust SNP-array platforms in the clinical routine, many studies have identified large ROH in patients with a wide variety of clinical features [41][42][43][44]. Depending on their chromosomal distribution and cumulative extent, ROH may indicate either UPD or parental consanguinity [45]. When ROH > 10 Mb are detected in a single chromosome, the first possibility to be considered is UPD; this event arises as a consequence of trisomy rescue, which may have further implications, such as the presence of an undetected trisomic cell line. In chromosomes subject to imprinting, the presence of two copies of the same chromosome inherited from only one of the parents is considered pathogenic per se. UPD in non-imprinted chromosomes still increases the probability of deleterious mutations in homozygosity [46]. In contrast, the presence of many ROH throughout the genome is an indication of consanguinity, and the chance of inheritance of recessive monogenic disorders increases with the degree of relatedness. It has been demonstrated that the occurrence of multiple congenital anomalies and other significant clinical problems is higher among children of first cousins (4.4%) and second cousins (3.6%), compared to unrelated parents [47]. The rate of consanguinity in the different regions of Brazil is very heterogeneous and estimates are scarce. A recent paper indicates that some degree of inbreeding may be present in 26.5% of patients with developmental disorders from the South of Brazil [2]. However, when the clinically more relevant kinship of 1st-5th degree is considered, the authors found consanguinity in ~8.5% of the cases. In our cohort, 233 of the 259 cases with ROH were interpreted as the result of consanguinity, reflecting a lower frequency of consanguinity of 4.0% (233/5778).
In summary, we reported copy number data from patients with neurodevelopmental disorders and congenital anomalies in the largest Brazilian cohort investigated by CMA so far. These data, available in the DECIPHER database, can be used as a valuable resource for other genomic studies.

Figure 5.
Frequency of syndromes with incomplete penetrance associated with additional variants. The histogram shows the frequency of a secondary CNV associated with a syndrome with incomplete penetrance.

Data availability
The datasets generated and/or analyzed during the current study are available in the DECIPHER database (https://www.deciphergenomics.org/). The reference numbers that correspond to the DECIPHER patient IDs are presented in the Supplementary Tables. All genomic coordinates from the variant data that support the findings of this study are available in the Supplementary Tables. Raw data supporting the findings of this study are available on request from the corresponding author (C.R.). The raw data are not publicly available due to patient privacy/consent restrictions.