Introduction

Neurodevelopmental disorders (NDDs) predominantly encompass developmental delay (DD), intellectual disability (ID), and/or autism spectrum disorders (ASD), impacting approximately 3–4% of the global population1,2. These conditions are classified as non-syndromic when they occur in isolation and syndromic when they co-occur with dysmorphisms or evident congenital anomalies (CAs)3.

With strong genetic underpinnings, ASD holds great significance within the realm of NDDs due to its high prevalence and increasing impact on individuals, families, and communities. The disorder is heterogeneous, spanning a wide spectrum of symptoms and severity, and is usually accompanied by co-occurring conditions; it is characterized by impairment in social interaction and communication. According to the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5), the deficits of individuals with ASD can be understood on the basis of three aspects: socio-emotional reciprocity; non-verbal communicative behaviors used for social interaction and for the development, maintenance and understanding of relationships; and restricted behaviors, such as repetitive movements, repetitive use of objects or speech, unalterable routines or ritualized behaviors (verbal or non-verbal), fixation on singular interests, and abnormal responses to sensory aspects of the environment4. Based on these common deficits, the DSM-5 defines the current diagnosis of ASD, which now incorporates, along with autistic disorder (classical autism), the diagnoses of childhood disintegrative disorder, pervasive developmental disorder not otherwise specified, and Asperger's syndrome.

Sometimes ASD is the main diagnosis; sometimes it is comorbid with other NDDs such as ID, which is frequent in the autistic spectrum. It can also be present in syndromic conditions, when apparent dysmorphic features (DF) or CAs are present3.

It is estimated that ASD has a heritability between 50 and 90%5,6. A recent review covering 74 studies with 30,212,757 participants estimated the global prevalence of ASD at 0.6%, with the highest values in America (1%), Africa (1%) and Australia (1.7%)7. The prevalence of ASD worldwide has increased in recent decades. In the USA, for example, the Centers for Disease Control and Prevention (CDC) reported an overall prevalence of ASD of 1.5% in 2010, 1.4% in 2012, 1.7% in 2014, 1.9% in 2016 and 2.3% in 2018. The overall prevalence of ASD in Europe and Asia has also been gradually increasing8,9. In Brazil, as in Latin America in general, epidemiological data on the prevalence of ASD are scarce. A single study, carried out in the Southeast region of Brazil in 2011, found an estimated prevalence of 0.3%10; however, this is believed to be an underestimate due to methodological issues. If we apply the prevalence of 1% estimated for the American population to the Brazilian population (214 million), ASD should affect approximately 2 million individuals11.

Genetic and/or genomic factors such as single nucleotide polymorphisms (SNPs) and copy number variants (CNVs)12,13,14,15,16,17 have been suggested as the etiological cause in 50–60% of cases of ASD18. SFARI Gene6, one of the leading and constantly updated genetic databases on ASD, associates 1,262 genes and 2,290 CNVs, including rare ones, with the condition (data from December 2022).

CNVs are structural variations in the DNA that involve gains or losses of large segments of genetic material (from hundreds to several million base pairs). They may be inherited or occur spontaneously during the formation of egg or sperm cells, and can affect gene dosage, causing loss of function, haploinsufficiency, or overexpression of genes19. Specific CNVs have been shown to cause or increase the likelihood of developing certain NDDs, such as ID, ASD and schizophrenia, as well as CAs. However, most people with CNVs do not have developmental disorders, and for many disorder-related CNVs the presence of the CNV per se does not necessarily imply the presence of the disorder, because penetrance and expression are influenced by other genetic and/or environmental factors, which makes their interpretation challenging.

For over a decade, chromosomal microarray (CMA) technologies have been clinically recommended as the primary cytogenetic diagnostic test for investigating patients with NDDs20, and in 2020 the American College of Medical Genetics and Genomics (ACMG) reinforced this statement, along with more detailed guidance on interpreting results21.

Most modern microarray platforms integrate, along with genome-wide oligonucleotide probes (depending on the CMA design), high-density SNP probes that test for single nucleotide changes in DNA sequences, allowing the detection of regions of homozygosity that can be associated with disease or with other traits such as ancestry.

Long contiguous stretches of homozygosity (LCSHs) are relatively common in the general population and can arise from chance unions among individuals with a common ancestor; in these cases they are rarely related to disease and likely characterize regions of low recombination in the genome22,23. However, larger LCSHs can also reveal parental consanguinity, uniparental disomy (UPD) or homologous recombinational DNA repair events, and can therefore be associated with an increased risk for certain genetic disorders, particularly those caused by recessive mutations. In population studies, the minimal thresholds for calling LCSH are usually set around 0.5–1.0 Mbp, while in clinical analysis minimal thresholds are more conservatively set at 3–10 Mbp24.

The presence of multiple large LCSHs (≥ 5 Mbp) distributed across several chromosomes suggests consanguinity between the individual’s biological parents, increasing the chance of inheritance of recessive monogenic disorders. However, when large LCSHs reside on only one chromosome, this can reflect the correction of meiotic or early post-meiotic errors that resulted in total or partial uniparental disomy (UPD). UPD occurs when a person receives both copies of a chromosome, or of part of a chromosome, from only one parent25. The two copies can be of maternal (UPDmat) or paternal (UPDpat) origin. A UPD is not necessarily pathogenic; however, it is an important cause of genetic disease because several genes undergo genomic imprinting, which silences one allele of the chromosomal pair in a parent-of-origin-specific manner, and a series of imprinting disorders cause NDDs associated with ID, autistic behavior, DD and seizures. Examples include Angelman syndrome (UPD (15) pat), Prader-Willi syndrome (UPD (15) mat), Beckwith-Wiedemann syndrome (UPD (11) pat), Silver-Russell syndrome (UPD (7) mat), Temple syndrome (UPD (14) mat) and Kagami-Ogata syndrome (UPD (14) pat)26. Even when it does not affect imprinted genes, UPD can uncover recessive mutations in the uniparental homozygous regions for which the sole transmitting parent was heterozygous.

Whole-chromosome UPDs can arise as a consequence of the correction of a meiotic segregation error that resulted in a monosomic or a trisomic zygote, either by duplication of the only chromosome present in the monosomic zygote or by loss of one of the supernumerary chromosomes in the case of trisomy. In monosomy rescue, both chromosomes of the pair come from only one progenitor and are completely homozygous (isodisomic), whereas in trisomy rescue UPD only occurs when the two retained chromosomes are from the same progenitor. In the latter case, they can be totally isodisomic when the non-disjunction of the two sister chromatids occurred in meiosis II; however, when the meiotic error occurred in meiosis I, because of homologous chromosomal recombination they will be partially iso/heterodisomic (one or more LCSHs on the chromosome) or completely heterodisomic (no homozygous regions), since the outer sister chromatids do not recombine27,28. Segmental UPDs can have complex causes, such as rescue of a partial trisomy caused by translocated chromosomes, DNA double-strand breaks, or other events involving a replicative DNA repair mechanism28,29,30,31,32.

The aims of this study were to establish the overall diagnostic rate of CMA in our setting, to verify the contribution of LCSH, to assess the significance of ASD phenotypes in the cohort and whether the diagnostic yield differs when considering only patients with ASD phenotypes, and to provide detailed genetic data on known causal CNVs and on other rare, possibly causal, CNVs identified in the cohort.

Methods

Ethical aspects

The research project was reviewed and approved by the Research Ethics Committee of the Hospital Infantil Joana de Gusmão, the children's hospital in Florianópolis-SC, Brazil, under the reference number 2339104. We further declare that the study was conducted in accordance with the ethical standards and guidelines set forth in resolution No. 466/12 of the Brazilian National Health Council. Patients or their caregivers provided informed consent to participate in the study. In cases where it was not possible to contact the patient for justifiable reasons (such as loss of contact information), the data was still used, and a Justification of Absence of Consent was signed by the research team. The team committed to maintaining the confidentiality and privacy of the patients whose data and/or information was collected in the records.

Cohort

The aim of this study was to investigate a sizeable cohort with developmental disorders from South Brazil. We collected a total of 1120 chromosomal microarray (CMA) read files from tests performed by the Laboratório Neurogene in Florianópolis, Santa Catarina, Brazil, between 2013 and 2019, upon request by medical geneticists and neurologists for investigative/diagnostic purposes, primarily from the Joana de Gusmão Children's Hospital, but also from physicians at the University Hospital Professor Polydoro Ernani de São Thiago and at private clinics in Florianópolis. These also include 420 previously published cases28,33. Of the 1120 cases, 68 were excluded because they belonged to unaffected family members, and 40 were excluded from the statistics of developmental disorders due to insufficient clinical information. The analyzed sample therefore consists of CMA read files and available clinical data from 1012 patients, primarily children with neurodevelopmental disorders, from southern Brazil.

To analyze the significance of ASD in our cohort, we established a sub-cohort that included every patient whose clinical phenotype specifically mentioned ASD, autistic disorder (classical autism), childhood disintegrative disorder, pervasive developmental disorder not otherwise specified or Asperger's syndrome, either as the main reason for referral to testing or as one of the phenotypes of a broader spectrum. We call "syndromic autism" those cases in which dysmorphic features/congenital anomalies (accompanied or not by intellectual disability) were mentioned among the clinical phenotypes. Non-syndromic cases comprise autism with intellectual disability and what we call "isolated autism", that is, non-syndromic autism without intellectual disability. The ASD sub-cohort comprises 333 patients from the south of Brazil for whom CMA read files and clinical data were available, of which 134 are part of a previously published study33.

Collection of clinical data

To establish a correlation between the phenotype and potential causal genes, we gathered the required phenotypic/clinical data from the exam request form and, when possible, supplemented it with information obtained directly from the patients' medical doctors. This was done through a questionnaire that asked for information about the individual's clinical presentation, behavior, history of physical exams, previous genetic and metabolic test results, and prescription medication. No new appointments were arranged with the patients for this study, and clinicians retrieved most of the data from their medical records.

Genomic analysis

The investigative CMA platforms used were CytoScan 750K (59%) and CytoScan HD (41%) and the resulting files were analysed using the Chromosome Analysis Suite (ChAS) Affymetrix software4, which is based on the reference genome sequence of the University of California, Santa Cruz database (https://genome.ucsc.edu/cgi-bin/hgGateway) using the human genome version of February 2009 (GRCh37/hg19). The analysis was retrospective, with the use of the CMA runs obtained from a clinical diagnostic laboratory, with previous consent of the patients.

Typically, the filter criteria for interpreting CNVs for diagnostic purposes are sizes larger than 100 Kbp for deletions and larger than 150 Kbp for duplications, both containing at least 50 markers, according to ACMG recommendations19,20. However, since this is a research study that aims to identify potential new genes involved in developmental disorders, we reduced the filter parameters to > 10 Kbp for both deletions and duplications, with at least ten markers. To interpret the CNVs, we followed the latest recommendations of the ACMG and the Clinical Genome Resource21.
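The two sets of thresholds described above can be contrasted in a short sketch. This is an illustration only, not the authors' pipeline (filtering was done in ChAS); the `CNV` record, the dictionary layout and the example calls are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CNV:
    kind: str         # "deletion" or "duplication"
    size_kbp: float   # segment size in Kbp
    markers: int      # number of array markers in the segment

# (minimum size in Kbp, minimum marker count) per CNV type, as quoted in the text.
DIAGNOSTIC = {"deletion": (100, 50), "duplication": (150, 50)}
RESEARCH = {"deletion": (10, 10), "duplication": (10, 10)}

def passes_filter(cnv, thresholds):
    """True when the CNV is larger than the size cut-off and has enough markers."""
    min_size_kbp, min_markers = thresholds[cnv.kind]
    return cnv.size_kbp > min_size_kbp and cnv.markers >= min_markers

# Hypothetical calls, for illustration only.
calls = [
    CNV("deletion", 45, 22),
    CNV("duplication", 120, 60),
    CNV("deletion", 8, 30),
    CNV("duplication", 400, 75),
]
research_hits = [c for c in calls if passes_filter(c, RESEARCH)]
diagnostic_hits = [c for c in calls if passes_filter(c, DIAGNOSTIC)]
print(len(research_hits), len(diagnostic_hits))
```

With these made-up calls, the relaxed research filter retains more segments than the diagnostic one, which is the intended effect of lowering the thresholds.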

CNVs interpretation and classification

To interpret the CNVs with regard to gene function, dosage effects (known haploinsufficiency or overexpression studies) and effects of mutations, we made extensive use of the UCSC Genome Browser with its integrated databases, mainly ClinVar (NCBI), DECIPHER (Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources), DGV (Database of Genomic Variants), OMIM (Online Mendelian Inheritance in Man), ISCA (International Standards for Cytogenomic Arrays), dbGaP (Database of Genotypes and Phenotypes), dbVAR (Database of Large Scale Genomic Variants), ECARUCA (European Cytogeneticists Association Register of Unbalanced Chromosome Aberrations), PUBMED (Public Medline), ClinGen (Clinical Genome Resource), MGI (Mouse Genome Informatics Database, from The Jackson Laboratory), SFARI (Simons Foundation Autism Research Initiative) and the private database CAGdb (Cytogenomics Array Group CNV Database). We also used the Franklin platform34, based on artificial intelligence, as a tool for the classification and interpretation of genomic variants using scores21.

The variants were classified into four types according to clinical interpretation: benign variants, variants of uncertain significance (VUS), likely pathogenic CNVs (LPCNVs), and pathogenic CNVs (PCNVs). The result in each case was assigned based on the CNV of greatest clinical relevance detected in the genome of the patient21.
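The case-level rule (each case takes the class of its most clinically relevant CNV) can be sketched as follows. The ranking dictionary and the "no CNV" label for arrays without any filtered CNV are assumptions made for this illustration.

```python
# Classes ordered from least to most clinically relevant (shorthand labels).
RELEVANCE = {"benign": 0, "VUS": 1, "LPCNV": 2, "PCNV": 3}

def case_result(cnv_classes):
    """Return the class of the most clinically relevant CNV found in one case."""
    if not cnv_classes:
        return "no CNV"  # hypothetical label for arrays without filtered CNVs
    return max(cnv_classes, key=RELEVANCE.__getitem__)

print(case_result(["benign", "VUS", "benign"]))
print(case_result(["benign", "PCNV", "VUS"]))
```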

Variables such as location, type and size of each CNV, the CNV classification, the number of CNVs detected in each individual, age, gender, clinical descriptions (phenotypes), previous genetic testing results (karyotype, fragile X, etc.), and other relevant clinical data to which we had access were compiled (with coded identification) into a simple Excel sheet for data handling with the R software [version 3.4.2] (R Foundation for Statistical Computing). This was done to determine the phenotypic frequencies, the diagnostic rates, the average age and the gender distribution in the cohort, and the frequency of genomic changes on each chromosome, and to find out whether any phenotypic clues are related to a higher diagnostic probability by CMA (phenotypes predictive of a higher chance of carrying a pathogenic CNV), which could eventually allow the selection of the cases that would benefit the most from CMA as a first-line test in settings with limited financial resources.

Statistics

In addition to the descriptive biostatistical analysis, univariate analysis (Fisher's test) was applied to identify phenotypes predictive of a higher diagnostic result (greater chance of having a pathogenic CNV). To compare the mean sizes, the numbers of covered genes and the numbers of covered OMIM genes in the CNVs by type of CNV found, a multiple mean comparison test (Tukey's multiple comparison test) was applied. A p-value less than 0.05 was considered statistically significant.
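For readers unfamiliar with the univariate test, the following is a minimal pure-Python sketch of a two-sided Fisher's exact test on a 2×2 table (phenotype present/absent versus pathogenic CNV yes/no). It is not the authors' R code; the function name and the example counts are hypothetical, and the two-sided p-value uses the common convention of summing all tables no more probable than the observed one.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value). The p-value sums the
    hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of a table whose top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)  # tolerance for floating-point ties
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)

# Hypothetical counts, for illustration only:
# rows = phenotype present / absent, columns = pathogenic CNV yes / no.
odds_ratio, p_value = fisher_exact_2x2(8, 2, 1, 5)
print(f"OR = {odds_ratio:.1f}, p = {p_value:.4f}")
```

An odds ratio above 1 indicates that the phenotype is more frequent among cases in the first column; the direction therefore depends on how the table is laid out.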

Selection and analysis of LCSH

The analysis and selection of LCSH followed the methodology outlined in Chaves & coworkers (2019), applying a threshold of ≥ 3 megabase pairs (Mbp). This threshold is typically used in clinical investigations, as opposed to population-based studies, where the cut-off is usually considerably lower24. All participants who had LCSHs satisfying this criterion were included, regardless of whether or not they had a pathogenic CNV.

Automation of LCSHs analyses

For investigation of consanguinity and comparative LCSH analysis among cases as well as for calling potential UPD, all the LCSH reported in ChAS for each case were copied with coded identification and compiled into Excel sheets.

For a more adequate and precise analysis, the process was automated: all LCSHs found in the cohort were imported into Google Colab (https://colab.google/) and manipulated using the Python [3.10] programming language. The libraries used for data manipulation and analysis were Pandas [2.2.0] and NumPy [1.24.0] (for numerical computations). The code used for the analysis is available on the project's GitHub page: https://github.com/tiagochavo87/LCSH_analysis.

Analysis of consanguinity

The frequency of consanguinity in the cohort was calculated according to Kearney, Kearney and Conlin (2011). In short, when the homozygosity pattern suggested inbreeding, all regions of homozygosity ≥ 3 Mbp distributed throughout the chromosomes were summed, with the exception of LCSHs located on the sex chromosomes, and the total in Mbp was divided by the size of the autosomal genome, 2,881 Mbp (GRCh37/hg19). The percentage obtained was correlated with the inbreeding coefficient (F), which is: 25% (1/4, first degree: parent/child or full siblings), 12.5% (1/8, second degree: half siblings; uncle/niece or aunt/nephew; double first cousins; grandparent/grandchild), 6% (1/16, third degree: first cousins), 3% (1/32, fourth degree: first cousins once removed), 1.5% (1/64, fifth degree: second cousins), and < 0.5% (1/128, seventh degree: third cousins)24. Kearney and co-workers emphasized that this is a crude calculation, likely to underestimate the actual homozygous proportion because of the applied threshold of LCSHs over 3 Mbp and because CMAs may lack SNP probes in certain regions, such as the acrocentric short arms and the centromeric regions. On the other hand, depending on the degree of inbreeding in the population, these correlations could overestimate the direct kinship relation of the proband.
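The calculation can be sketched as below, taking the autosomal genome size as 2,881 Mbp. This is not the code from the project repository: the function names and the example LCSH calls are hypothetical, and matching the result to the nearest listed percentage is a simplifying assumption made only for this illustration.

```python
AUTOSOMAL_GENOME_MBP = 2881  # autosomal genome size, GRCh37/hg19
MIN_LCSH_MBP = 3             # clinical threshold used in the text

def homozygosity_percent(lcsh_calls):
    """Percentage of the autosomal genome covered by LCSHs >= 3 Mbp.

    lcsh_calls: iterable of (chromosome, size_mbp) tuples for one patient.
    LCSHs on the sex chromosomes are ignored.
    """
    total = sum(
        size for chrom, size in lcsh_calls
        if chrom not in ("X", "Y") and size >= MIN_LCSH_MBP
    )
    return 100 * total / AUTOSOMAL_GENOME_MBP

# Percentages listed in the text, with example parental relationships.
EXPECTED = [
    (25.0, "parent/child or full siblings"),
    (12.5, "half siblings, uncle/niece, double first cousins"),
    (6.0, "first cousins"),
    (3.0, "first cousins once removed"),
    (1.5, "second cousins"),
]

def closest_relationship(percent):
    """Nearest listed category (assumed matching rule, illustration only)."""
    if percent < 0.5:
        return "third cousins or more distant"
    return min(EXPECTED, key=lambda entry: abs(entry[0] - percent))[1]

# Hypothetical patient, for illustration only.
calls = [("1", 40.0), ("4", 62.5), ("7", 30.0), ("11", 47.5), ("X", 20.0), ("15", 2.0)]
percent = homozygosity_percent(calls)
print(round(percent, 2), closest_relationship(percent))
```

In this example the X-chromosome call and the 2 Mbp call are excluded, so 180 Mbp of autosomal homozygosity is divided by 2,881 Mbp.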

Uniparental disomy (UPD)

When only LCSHs 3 to < 5 Mbp were present in the genome, but in one single autosomal chromosome the sum of two or three LCSHs (< 5 Mbp) exceeded 10 Mbp, the homozygous regions were considered a potential isodisomy resulting from a uniparental disomy (UPD) event that underwent previous recombination. When one or more LCSH over 5 Mbp was present in a single chromosome with a size or sum (in the case of multiple LCSHs) ≥ 10 Mbp, it was considered a potential UPD (regardless of eventual LCSHs ≤ 5 Mbp on other chromosomes). If more chromosomes had LCSHs over 5 Mbp, it was not regarded as a potential UPD case28.
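The calling rule above can be sketched as a small function. It is an interpretation of the text rather than the published code: the function name and inputs are hypothetical, and two details are assumptions, namely that the per-chromosome sum includes every LCSH ≥ 3 Mbp on that chromosome and that, when only small LCSHs are present, exactly one chromosome must exceed the 10 Mbp sum.

```python
from collections import defaultdict

def potential_upd(lcsh_calls):
    """Return the chromosome flagged as potential UPD, or None.

    lcsh_calls: iterable of (chromosome, size_mbp) tuples for one patient.
    """
    by_chrom = defaultdict(list)
    for chrom, size in lcsh_calls:
        if chrom not in ("X", "Y") and size >= 3:
            by_chrom[chrom].append(size)

    # Chromosomes carrying at least one large LCSH (5 Mbp or more).
    large = [c for c, sizes in by_chrom.items() if max(sizes) >= 5]
    if len(large) > 1:
        return None  # large LCSHs on several chromosomes: not called as UPD
    if len(large) == 1:
        chrom = large[0]
        return chrom if sum(by_chrom[chrom]) >= 10 else None

    # Only LCSHs of 3 to < 5 Mbp in the genome: look for one chromosome
    # on which several of them add up to more than 10 Mbp.
    hits = [c for c, sizes in by_chrom.items()
            if len(sizes) >= 2 and sum(sizes) > 10]
    return hits[0] if len(hits) == 1 else None

# Hypothetical patients, for illustration only.
print(potential_upd([("15", 12.0), ("3", 3.5)]))
print(potential_upd([("2", 6.0), ("9", 7.0)]))
print(potential_upd([("7", 4.0), ("7", 3.5), ("7", 4.5), ("1", 3.2)]))
```

The sketch does not cover hemizygous regions caused by deletions, which still require manual review.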

The ChAS software does not recognize homozygosity as such, but rather the absence of heterozygosity, which it names loss of heterozygosity (LOH). This includes hemizygous regions generated by a larger deletion. Therefore, all cases with LOHs ≥ 10 Mbp in size on a single autosomal chromosome, regardless of the presence of an additional chromosome with LOH(s) over 10 Mbp in size (or sum of sizes), were manually reviewed to eliminate the confounding effect of hemizygous regions when calling LCSHs and, ultimately, a UPD.

Analysis of the most frequent LCSH

Of the 953 files available for LCSH analysis, we selected 917 microarrays to identify the cytobands that most frequently showed LCSH ≥ 3 Mbp on an autosomal chromosome; LCSHs present in more than 5% of individuals were considered common LCSHs. The frequency of ≥ 1%, which is the usual threshold for defining common SNP polymorphisms in a population, was not considered applicable here because this is an affected cohort. Moreover, others have chosen the same threshold (or a lower one) to consider LCSHs found in an affected cohort as common variation, likely lacking clinical significance for their analysis35,36,37,38,39. In doing so, we believe we have an adequate safety margin for selecting common LCSHs that are due to ancestral haplotypes rather than to consanguinity or other pathogenesis-related mechanisms.

To delineate a more accurate genomic position for the most frequent LCSH, the shared homozygous sections were superimposed, and their genomic positions obtained based on the median of their beginning and end.
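The selection of common LCSHs and the median-based delineation of their positions can be sketched as follows. This is not the code from the project repository: grouping calls by cytoband label, keeping one call per patient and band, and the example coordinates are all assumptions made for this illustration.

```python
from statistics import median

def common_lcsh(calls, n_individuals, min_fraction=0.05):
    """Consensus coordinates for LCSHs seen in more than min_fraction of people.

    calls: iterable of (patient_id, cytoband, start_bp, end_bp).
    Returns {cytoband: (median start, median end)}.
    """
    by_band = {}
    for patient, band, start, end in calls:
        # One call per patient and band; a later call overwrites an earlier one.
        by_band.setdefault(band, {})[patient] = (start, end)

    consensus = {}
    for band, per_patient in by_band.items():
        if len(per_patient) / n_individuals > min_fraction:
            starts = [start for start, _ in per_patient.values()]
            ends = [end for _, end in per_patient.values()]
            consensus[band] = (median(starts), median(ends))
    return consensus

# Hypothetical calls in a cohort of 20 individuals, for illustration only.
calls = [
    ("P1", "3p21.31", 46_000_000, 50_500_000),
    ("P2", "3p21.31", 46_400_000, 51_000_000),
    ("P3", "3p21.31", 45_800_000, 50_200_000),
    ("P4", "11p11.2", 46_000_000, 50_000_000),
]
result = common_lcsh(calls, n_individuals=20)
print(result)
```

Here the first band is carried by 3 of 20 individuals (15%) and is kept, while the second is carried by exactly 5% and is therefore not above the threshold.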

Ethics approval and consent to participate

The project was submitted to and approved by the Research Ethics Committee of the Hospital Infantil Joana de Gusmão, the children's hospital of Florianópolis-SC, Brazil, under the reference number 2339104, and respects the guidelines and criteria established by resolution 466/12 of the Brazilian National Health Council. Patients or their caregivers signed the Informed Consent Form to participate in the study. In cases in which it was not possible to contact the patient for a justifiable reason (mainly loss of contact information), the data was used and a Justification of Absence of Consent was signed by the research team, ensuring the commitment to maintain the confidentiality and privacy of the patients whose data and/or information was collected in the records.

Results

Out of the 1012 cases, 615 (61%) were male and 397 (39%) were female, with ages ranging from 0 to 55 years, and a mean age of 10 years (median = 7.15, standard deviation = 10.2).

Previous karyotyping results were available for 182 patients, with 122 normal and 60 abnormal results (for which CMA was requested to identify the specific sequences involved). However, for most patients no information about previous genetic assessments was available.

From the 1012 microarrays, a total of 7150 CNVs that fulfilled the filtering criteria were selected: 3747 duplications and 3403 deletions, which were interpreted and classified into benign CNVs, pathogenic CNVs (PCNVs), variants of uncertain clinical significance (VUS) and likely pathogenic CNVs (LPCNVs).

Phenotypic characterization

Out of the 1012 cases, four were excluded from the phenotypic characterization due to the unavailability of clinical data.

The cohort is mostly characterized by individuals with neurodevelopmental impairment (85%), and 83% of cases had ID and/or DD. In 56% of cases only DD was present while ID was described in 33%. It should be noted that 420 (42%) were under 5 years of age, which is below the age range for intellectual disability diagnosis.

Phenotypic characterization for cases with ASD

Cases with ASD represent 33% of our cohort. Of these 333 cases, 77 (23%) were under 5 years of age, below the age for diagnosis of ID, and of these, 17 (22%) had DF. Of the other 256 individuals, aged 5 years or older, 68 had ID, of which 36 also had DF; 43 had only DF; and 145 had "isolated" autism (without ID and without dysmorphic features, which range from facial dysmorphisms to CAs; see Cohort in Methods).

Of the 262 male cases, 59 (23%) were below age 5, the diagnostic age for ID, and of these 12 had DF. Of the 203 male cases aged 5 or more, 53 presented ID, of which 29 had DF, whereas 150 (74%) had no ID, of which 30 presented DF and 120 presented what we call “isolated” autism.

Of the 71 female ASD cases, 18 (25%) were under age 5, and of these 5 had DF. Of the 53 females aged 5 or more, 15 had ID, of which 7 also had DF, whereas 38 (72%) had ASD without ID, of which 13 presented DF and 25 presented what we call “isolated” autism.

In Fig. 1 we summarize the phenotypic characterization of the cases that presented ASD in the cohort.

Figure 1 Summary of the phenotypic characterization of the cases with ASD.

Other phenotypes

In addition to the main neurodevelopmental phenotypes, most individuals have syndromic features (56%) such as congenital anomalies or malformations or atypical (dysmorphic) facial features (47% of the cohort). Psychiatric or behavioral problems, variations in height or body weight were less frequent accompanying phenotypes.

The phenotypic characteristics recorded in our cohort are listed in Table 1.

Table 1 The clinical characteristics recorded for patients with negative (only benign CNVs) and pathogenic (only PCNV) CMA results.

Diagnostic rate and interpretation of CNVs

Within our cohort of 1012 individuals (including 420 previously published cases33), we identified 358 rare CNVs (VUS, LPCNVs and PCNVs), of which 203 were interpreted as pathogenic. These were present in 170 individuals (including 75 previously published), representing 17% of the cohort. The PCNVs and the clinical phenotypes of the carrier patients are listed in Table 2 (without ASD) and Table 3 (with ASD); the previously published cases are listed in Chaves & coworkers33.

Table 2 Pathogenic CNVs (PCNV) found in the cohort without ASD.
Table 3 Pathogenic CNVs found in the ASD Cohort. Includes ASD cases of the cohort previously published33.

Among the 170 individuals with pathogenic CNVs in the whole cohort of 1012 cases, including those previously published by Chaves & coworkers33, 26 carried more than one PCNV. Nineteen of them carried two PCNVs (cases #33, #47, #61, #127, #251, #331, #332, #372, #407, #501, #511, #594, #651, #687, #737, #739, #786, #861, and #1080), and seven carried three PCNVs (cases #151, #188, #196, #219, #270, #392, and #995). In three cases (#81, #255, and #331), a pathogenic CNV was accompanied by VUS.

Out of the 204 pathogenic CNVs, 119 were deletions, resulting in only one copy of the involved sequence, except for case #713, in which the deletion involved a genomic region of the boy's single X chromosome. Six cases (#81, #255, #331, #646, #927 and #1109) presented VUS along with a pathogenic deletion.

The other 74 pathogenic CNVs were duplications, which usually result in a total of three copies of the involved sequence. In eight males (#24, #25, #116, #151, #30, #455, #807 and #809), however, the duplication involved a relevant region of a sex chromosome and resulted in two copies (the main reason for pathogenicity being that in males none of the duplicated copies on the X chromosome undergoes inactivation, as it does in females), and in five cases (#306, #422, #443, #511 and #620) the CNV was found in a four-copy state. Figure 2 illustrates the frequency and number of pathogenic CNVs found per chromosome.
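The copy-number arithmetic behind these statements can be made explicit with a small sketch. It is a deliberate simplification for illustration: the function name is hypothetical, and pseudoautosomal regions, mosaicism and multi-copy gains (such as the four-copy state) are not modelled.

```python
def expected_copies(cnv_type, chromosome, sex):
    """Expected copy number of a region after a single-copy loss or gain.

    Simplified: males are assumed to carry one copy of X and Y sequences,
    and every other combination starts from two copies.
    """
    if sex == "female" and chromosome == "Y":
        raise ValueError("no Y chromosome expected in a female karyotype")
    baseline = 1 if sex == "male" and chromosome in ("X", "Y") else 2
    change = {"deletion": -1, "duplication": 1}[cnv_type]
    return baseline + change

print(expected_copies("duplication", "7", "female"))  # autosomal duplication
print(expected_copies("duplication", "X", "male"))    # X-linked duplication in a male
print(expected_copies("deletion", "X", "male"))       # X-linked deletion in a male
```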

Figure 2 Circle plot with the pathogenic CNVs and VUS* detected in our study.

Pathogenic CNVs were found on all chromosomes (see supplementary information 1: Pathogenic CNVs per chromosome), with sizes from 32 Kbp to 71 Mbp (SD = 9992 Kbp, mean = 8365 Kbp), and contained 1 to 581 genes per PCNV (SD = 93, mean = 87), of which 1 to 87 (SD = 13, mean = 9) are genes cited in the OMIM database (OMIM genes) (see supplementary information 2).

Univariate analysis (Fisher's test) indicated the predictive phenotypes for a higher diagnostic outcome (greater chance of having a pathogenic CNV) in our cohort with NDDs: Developmental delay (p-value ≤ 0.001, OR = 0.53); Autism Spectrum Disorder (p-value ≤ 0.001, OR = 2.18); Facial Malformations/Dysmorphisms (p-value ≤ 0.001, OR = 0.42); Upper limb anomalies (p-value ≤ 0.001, OR = 0.36); Lower limb anomalies (p-value = 0.001, OR = 0.41); genitourinary anomalies and malformations (p-value = 0.004, OR = 0.38); Low weight (p-value = 0.01, OR = 0.44); Intellectual disability (p-value = 0.014, OR = 0.65); Heart anomalies and malformations (p-value = 0.018, OR = 0.51); ID or DD (p-value = 0.025, OR = 0.65) and Motor development delay (p-value = 0.036, OR = 0.54). There was no significantly higher diagnostic result by CMA for the other phenotypes (see supplementary information 3).

Following the scoring system, another 155 rare CNVs were interpreted as variants of uncertain significance (VUS; 141 CNVs) (Supplementary Table 1) or as likely pathogenic CNVs (LPCNVs; 14 CNVs) (Table 4), these being the main findings in 13% of the cohort. Of these, 102 are duplications and 53 are deletions. Two VUS were detected in each of cases #635, #658 and #929, and three VUS in each of cases #649 and #937.

Table 4 Likely pathogenic CNVs found in the cohort.

These variants were found on most chromosomes, except for 21 and 22 (see supplementary information 1: VUS per chromosome), with sizes from 30 Kbp to 8 Mbp (SD = 1266 Kbp, mean = 802 Kbp), and contained 1 to 87 genes (SD = 13, mean = 9), of which 1 to 38 (SD = 5, mean = 5) are genes cited in the OMIM database (OMIM genes) (see supplementary information 2). Figure 2 illustrates the frequency and number of VUS per chromosome (in track 2). Fourteen VUS were found to be LPCNVs according to the scoring system (Table 4).

All other CNVs were interpreted as either common genetic polymorphisms or benign variants. They were found on all chromosomes, with sizes that varied from 10 Kbp to 24 Mbp (SD = 586 Kbp, mean = 298 Kbp), and contained zero to 227 genes (SD = 8, mean = 3), of which zero to 144 (SD = 4, mean = 1) are genes cited in the OMIM database (OMIM genes) (see supplementary information 2).

Diagnostic rate and interpretation of CNVs for cases with ASD

When the 333 CMAs from patients in whom ASD (including all definitions of the spectrum) was cited as the main reason for referral or as one of several phenotypes were analyzed separately, a total of 3259 CNVs that met the filtering criteria were detected. Of those, 1494 were duplications and 1765 were deletions, most of them interpreted as benign. In 33 CMAs, no CNVs meeting the filtering criteria were detected. The frequency of the most relevant type of CNV found in each case in the whole cohort and in the sub-cohort with ASD is illustrated in Fig. 3A1, A2. The proportional contribution of each type of CNV per subclass of ASD is illustrated in Fig. 3B.

Figure 3 (A1) Classification of cases per most relevant CNV found in the whole cohort. (A2) Classification of cases per most relevant CNV found in the sub-cohort with ASD. (B) Diagnostic rates per ASD phenotypic categories. ASD autism spectrum disorder, ID intellectual disability, DF dysmorphic features (syndromic), classical autism (including ASD cases high functioning isolated ASD), isolated ASD: ASD without ID and without DF/CAs. (C) Odds ratios for pathogenic CNVs in classes of phenotypes. Odds ratios shown in log2 scale. As can be seen in (B), when comparing ASD with ID to ASD without ID, the diagnostic rate (12% and 10% of PCNVs, respectively) is slightly higher when ID is present. However, the presence of VUS is 5 percentage points higher when ID is present (19% compared to 14% in ASD without ID). Syndromic ASD has a much higher diagnostic rate (16%) than non-syndromic ASD (7%).

In 10% of cases (33/333) we identified a total of 38 rare CNVs that were interpreted as pathogenic (Table 3), 22 deletions and 16 duplications. The particularities of cases #511, #594 and #737, with 2 PCNVs, cases #455 (Y Chromosome), #809 (X chromosome) and cases #443 and #511 (PCNV in a four-copy state) were mentioned before.

In the ASD sub-cohort, pathogenic CNVs were found on 14 of the 24 human chromosomes (1, 2, 7, 9, 13, 14, 15, 16, 17, 18, 21, 22, X and Y), with sizes from 221 Kbp to 22 Mbp (SD = 5561 Kbp, mean = 4926 Kbp), and contained 1 to 342 genes (SD = 63, mean = 60), of which 1 to 83 (SD = 32, mean = 29) are genes cited in the OMIM database (OMIM genes) (see supplementary information 3).

For individuals affected with syndromic ASD (with DF), the diagnostic rate was higher than for the whole ASD cohort (16% versus 10%), a difference confirmed by univariate analysis (p = 0.02, OR 2.43, for pathogenic CNVs) (Fig. 3C).
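As an illustration of how an odds ratio such as the one quoted above can be obtained from a 2 × 2 table, the following minimal sketch computes the odds ratio, a Wald 95% confidence interval and the log2 value used for plotting. The function name and the counts are hypothetical and are not the study's data; the statistical package and test actually used by the authors are not stated in this section.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: group of interest, pathogenic CNV present
    b: group of interest, pathogenic CNV absent
    c: comparison group, pathogenic CNV present
    d: comparison group, pathogenic CNV absent
    """
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper

# Hypothetical counts, for illustration only (not taken from the study)
estimate, lower, upper = odds_ratio(10, 40, 10, 90)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f}), "
      f"log2(OR) = {math.log2(estimate):.2f}")
```

With small cell counts, an exact test (for example Fisher's) may be preferable to the Wald approximation sketched here.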

In cases with ASD, DF and ID, the diagnostic rate was 14%, and for ASD with ID, but without DF, it was 12%. For "isolated" ASD, the diagnostic rate dropped to 7%.

In the 39 cases < 5 years, 5 (13%) had pathogenic CNVs and 6 had only VUS.

For 13% (44/333) of the cases, VUS, which are also rare CNVs, were the only relevant findings, totaling 48 CNVs: 20 deletions and 28 duplications (Supplementary Table 1). These variants were also found on most chromosomes, except for chromosomes 4, 5, 12, 18, 19, 20, 21 and 22, with sizes from 10 Kbp to 5.6 Mbp (SD = 1032 Kbp, mean = 700 Kbp), and contained 1 to 61 genes (SD = 12, mean = 9), of which 1 to 26 (SD = 5, mean = 4) are genes cited in the OMIM database (OMIM genes) (see supplementary information 3). In tracks 3 and 4 of the Circos ideogram (see supplementary information 4), the VUS found per chromosome are plotted.

Four of these VUS (in cases #513, #633, #833 and #1127) were subclassified as LPCNVs, currently without convincing evidence (Table 4).

All other CNVs were interpreted as either benign or common genetic polymorphisms. These submicroscopic variants were found on all chromosomes, with sizes that varied from 10 Kbp to 24 Mbp (SD = 870 Kbp, mean = 228 Kbp), and contained zero to 181 genes (SD = 9, mean = 3), of which zero to 96 (SD = 4, mean = 1) are genes cited in the OMIM database (OMIM genes) (see supplementary information 3).

Long contiguous stretches of homozygosity in the samples

In total, 953 CMA results whose files were available and accessible for the LCSHs study were analyzed. The majority (91%) of CMAs had at least one autosomal LCSH (≥ 3 Mbp), resulting in a total of 3445 LCSH identified in 865 individuals. Only 88 CMAs did not show any LCSH (≥ 3 Mbp). Of the total, 59% (565/953) had only LCSH below 5 Mbp, while 31% (300/953) had one or more LCSH ≥ 5 Mbp.
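The proportions reported above can be reproduced from the raw counts with a short consistency check. This is only a sketch; the variable names are ours.

```python
total_cmas = 953
with_lcsh = 865            # at least one autosomal LCSH >= 3 Mbp
only_below_5mbp = 565      # only LCSH below 5 Mbp
one_or_more_5mbp = 300     # one or more LCSH >= 5 Mbp

def percent(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)

# The two size classes partition the CMAs that carry an LCSH
assert only_below_5mbp + one_or_more_5mbp == with_lcsh
without_lcsh = total_cmas - with_lcsh

print(percent(with_lcsh, total_cmas),
      percent(only_below_5mbp, total_cmas),
      percent(one_or_more_5mbp, total_cmas),
      without_lcsh)
```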

LCSH leading to suspected UPD

In 27 individuals (~ 2.8%) of the 953 CMAs analyzed, which include 11 previously published cases28, the LCSH suggested a potential UPD (Table 5 and Fig. 4).

Table 5 Cases with potential UPDs, where a single autosomal chromosome presented LCSH(s) over 3 Mbp that, alone or summed with other LCSHs ≥ 3 Mbp on the same chromosome, reached a size of ≥ 10 Mbp, with no other LCSH over 5 Mbp on any other autosomal chromosome.
Figure 4

Chromosomal distribution of the 27 cases with LCSH (single or sum) ≥ 10 Mbp restricted to one chromosome, suggesting putative UPDs.
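The selection rule described in the legend of Table 5 can be sketched as a small filter over the autosomal LCSH calls of one sample. This is a minimal sketch under our reading of that legend; the function name, the input format (chromosome label and size in Mbp) and the handling of ambiguous samples are assumptions, not the authors' implementation.

```python
from collections import defaultdict

def putative_upd_chromosome(lcsh_segments, min_segment=3.0,
                            min_total=10.0, max_elsewhere=5.0):
    """Return the autosome suggesting a putative UPD, or None.

    lcsh_segments: iterable of (chromosome, size_mbp) for autosomal LCSH.
    A chromosome qualifies when its LCSH of at least min_segment Mbp sum
    to at least min_total Mbp and no other autosome carries an LCSH
    larger than max_elsewhere Mbp.
    """
    total = defaultdict(float)
    largest = defaultdict(float)
    for chrom, size in lcsh_segments:
        if size >= min_segment:
            total[chrom] += size
            largest[chrom] = max(largest[chrom], size)

    candidates = [c for c, t in total.items() if t >= min_total]
    if len(candidates) != 1:
        # No candidate, or several chromosomes qualify (assumed ambiguous)
        return None
    candidate = candidates[0]
    if any(size > max_elsewhere
           for chrom, size in largest.items() if chrom != candidate):
        return None
    return candidate

# Hypothetical sample: two LCSH on chromosome 14 and a small one elsewhere
print(putative_upd_chromosome([("14", 4.0), ("14", 6.5), ("2", 3.2)]))
```

A large LCSH on a second chromosome makes the function return None, since such a pattern points to parental relatedness rather than to UPD.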

Consanguinity

Analysis of LCSH distributed across multiple chromosomes indicated some degree of inbreeding in 36.5% (348/953) of cases: over 24% suggested seventh- to sixth-degree parentage (such as third cousins); 7.2%, fifth degree (e.g., second cousins); 1.8%, fourth degree (first cousins once removed); 1.8%, third degree (first cousins; half-uncle with niece); 0.6%, second degree (half-siblings, uncle-niece, double first cousins); and in two cases (0.2%) the parental kinship suggested incest, with a coefficient of inbreeding consistent with a first-degree relationship [father (mother)/daughter (son), full siblings].

Clinically more relevant first-to-fifth-degree kinship was suggested by ~ 11.5% of cases.
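This section does not state how the degree of parental relatedness was derived from the LCSH. One commonly used approach, sketched below, estimates the coefficient of inbreeding F as the summed autosomal LCSH divided by the autosomal genome length and compares it with the theoretical values for each degree (1/4 for first degree, halving with every further degree). The autosomal length of about 2881 Mbp (hg19), the function names and the nearest-value assignment are assumptions made for illustration.

```python
import math

AUTOSOMAL_MBP = 2881.0  # assumed autosomal genome length (hg19), in Mbp

# Theoretical F for the offspring of parents related at each degree
EXPECTED_F = {1: 1 / 4, 2: 1 / 8, 3: 1 / 16, 4: 1 / 32, 5: 1 / 64}

def inbreeding_coefficient(lcsh_sizes_mbp):
    """Estimate F as total autosomal LCSH over autosomal length."""
    return sum(lcsh_sizes_mbp) / AUTOSOMAL_MBP

def closest_degree(f):
    """Degree whose expected F is nearest to f on a log2 scale."""
    if f <= 0:
        return None
    return min(EXPECTED_F,
               key=lambda d: abs(math.log2(f) - math.log2(EXPECTED_F[d])))

# Hypothetical sample with 175 Mbp of autosomal LCSH in total
f = inbreeding_coefficient([60.0, 45.0, 40.0, 30.0])
print(round(f, 3), closest_degree(f))
```

Observed values vary around the theoretical expectation, so a real workflow would need explicit cut-offs, including one below which no close relatedness is inferred; the sketch assigns even very small F values to the fifth degree.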

LCSH with frequency ≥ 5%

Due to the scarcity of information about common LCSH in the Brazilian population, in previous work we explored the data from this affected cohort to identify frequent LCSH in the population of Santa Catarina, which we consider to be potentially non-causal for the developmental issues of the patients28. We now revise those findings with a larger sample.

The frequency of 5% or more to consider a recurrent LCSH as a common finding in the population of southern Brazil was decided on an empirical basis. This threshold was established to ensure a significant safety margin compared to the 1% threshold used for considering a Single Nucleotide Polymorphism (SNP) as a common variant in the population. This choice was made because analyzing an affected population can introduce bias. However, it is still possible that certain autozygous haplotypes act in conjunction with other genetic variations to manifest the phenotype.
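The 5% criterion can be sketched as a simple carrier-frequency count across samples. The sketch assumes that each LCSH has already been assigned to a named recurrent region; the region labels, the function name and the toy data are hypothetical. Matching LCSH calls between samples by genomic overlap, which a real analysis would need, is not shown.

```python
from collections import Counter

def common_regions(samples, threshold=0.05):
    """Regions carried by at least `threshold` of the samples.

    samples: list with one set of region labels per CMA result.
    Returns a dict mapping each qualifying region to its frequency.
    """
    n_samples = len(samples)
    # Count each region once per sample
    carriers = Counter(region for regions in samples for region in set(regions))
    return {region: count / n_samples
            for region, count in carriers.items()
            if count / n_samples >= threshold}

# Toy data: 100 samples, region_A in 6 of them and region_B in 2
samples = [{"region_A"}] * 6 + [{"region_B"}] * 2 + [set()] * 92
print(common_regions(samples))
```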

The LCSH identified as frequent, potentially representing regions of low recombination that can maintain ancestral haplotypes identical by descent, are shown in Table 7 and Fig. 5.

Figure 5
figure 5

Visualization of the chromosomal locations of the LCSHs in autosomal chromosomes considered common (frequency ≥ 5%) identified among 917 CMA results.

Discussion

This expanded retrospective cohort study involved 1012 patients with neurodevelopmental disorders (NDDs) and congenital anomalies (CAs) from the state of Santa Catarina. A total of 206 pathogenic copy number variations (CNVs) were identified in 170 individuals, resulting in a diagnostic yield of 17%. This diagnostic yield is almost the same as the 18% obtained in our first study33 and within the range of 15% to 20% of the diagnostic rate reported in the literature for patients with NDDs33,40,41,42,43,44,45,46,47,48,49,50,51,52.

It is important to highlight that out of the 173 cases with pathogenic CNVs, 32 cases had a previous abnormal karyotype result, which prompted the CMAs to identify the DNA sequences involved. Excluding the 32 cases with known abnormal karyotypes, the diagnostic rate drops to 14%. The chromosomal microarray (CMA) was essential in discovering altered sequences in abnormal karyotype results, offering unexpected insights into discrepancies compared to what a karyotype suggests. The CMA allows for scrutiny, and sometimes it reveals deletions in chromosomes where the karyotype suggests additions or additions when the karyotype suggested deletions.
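The diagnostic rates quoted in this part of the discussion can be reproduced with simple arithmetic. The sketch below uses the 170 individuals with pathogenic CNVs among 1012 patients reported above and assumes that the 32 cases with a known abnormal karyotype were removed from both the numerator and the denominator; that assumption is ours.

```python
def diagnostic_yield(positive_cases, total_cases):
    """Diagnostic yield as a percentage."""
    return 100 * positive_cases / total_cases

overall = diagnostic_yield(170, 1012)
# Assumption: known abnormal karyotypes removed from both counts
without_known_karyotypes = diagnostic_yield(170 - 32, 1012 - 32)

print(round(overall), round(without_known_karyotypes))  # prints "17 14"
```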

In our previous work, which includes part of the current cohort, we extensively discussed the usefulness of classical karyotyping as a complement to CMA results (and vice-versa), exemplified by 17 cases with altered chromosomal results and their respective PCNV findings, including case #687 illustrated above33. We can only underscore the importance of having both classical karyotype results and CMA results. They provide valuable clues about the processes leading to pathogenic changes and are crucial for genetic counselling53,54. Unfortunately, as CMA testing becomes more prevalent, classical karyotyping is performed less frequently everywhere. Karyotyping should at least be performed for the child and parents when results indicate a pathogenic CNV or a potential UPD. This goal is desirable but unattainable in most (non-privileged) settings. Few cases will have access to both investigations, and even fewer will have the opportunity to investigate parents and other family members.

CNVs

Our analysis revealed pathogenic CNVs across all human chromosomes, with more than one causative variant identified in 15% of individuals. Deletions accounted for the majority (64%) of all detected pathogenic variants, consistent with the findings of others55, whereas for VUS the deletions represented only 34%.

Our findings indicate a higher incidence of pathogenic variants on chromosomes 1, 3, 19, and X, with 17, 16, 15, and 18 PCNVs, respectively. This contrasts with the results of previous studies23,24,25,26 (see Supplementary information 1- Pathogenic CNVs per chromosome).

The sizes of the PCNVs, the number of genes they covered, and the number of OMIM genes associated with these CNVs, when compared to those of the VUS and non-causative (benign) CNVs, show a statistically significant difference with P < 0.0001 (according to Tukey's multiple comparisons test) (Fig. 3A1 and Supplementary information 2). This is understandable, since larger CNVs with more genes, in particular with more genes related to disease or known to drive important cellular processes, will have a higher impact, which tends to be greater for the absence of gene copies than for their excess.

As depicted in the Circos ideogram (Fig. 2), pathogenic CNVs tend to be situated near telomeres in most chromosomes. This is expected since subtelomeric regions are prone to rearrangements, given that only one chromosomal breakpoint is required to initiate a submicroscopic abnormality56.

Pathogenic CNVs are also classified as recurrent or non-recurrent. Non-recurrent pathogenic CNVs occur sporadically in the genome, with probable origins in replication errors or DNA repair mechanisms; they cover different gene contents and consequently present variable phenotypes55,56,57. Recurrent pathogenic CNVs, in turn, are associated with known and characterized microdeletion and microduplication syndromes. Recurrence of these CNVs is mediated by non-allelic homologous recombination between locus-specific low copy repeats (LCRs)58,59.

We have identified a total of 71 individuals with known syndromes that are associated with 72% of pathogenic CNVs. Among them, the most common were Angelman/Prader-Willi syndrome, DiGeorge syndrome (0.7%), 1p36 deletion syndrome (0.6%), 16p11.2 deletion syndrome, and Cri du Chat syndrome (0.5%) (Supplementary Table 2).

Phenotypic characterization

Characterizing phenotypes is a crucial step in investigating the genetic etiologies of developmental disorders, helping to identify the role of the genes involved, as Moeschler and Shevell (2014)60 emphasized in their systematic review of the investigation of children with global developmental delay and intellectual disability.

In our cohort, the phenotypic characterization revealed a predominance of phenotypes related to NDDs, accounting for 85% of cases, similar to findings reported by others55,59,61, with 83% of the individuals presenting ID and/or DD. In 56% of cases DD was present, while ID was mentioned for 33%. Autism spectrum disorders were present in 33% of the cohort, and 14% of the cohort had “isolated” ASD (without ID and without DF). It is worth noting that 42% of the cohort was under 5 years of age, which is below the typical age range for diagnosing ID, so any deficits at that age are diagnosed as DD. Nevertheless, even considering that many individuals with DD are not necessarily intellectually disabled, it is still possible to estimate the prevalence of intellectual disability (ID) by including individuals with both DD and ID, because it is known that most individuals with DD in early childhood will later receive a diagnosis of ID62.

Along with major neurodevelopmental phenotypes, many individuals exhibit syndromic features (56%), such as congenital anomalies or malformations, and most (47% of all) had atypical facial appearance (facial dysmorphism). Other comorbidities, such as psychiatric or behavioural problems, and variations in physical parameters, like height or body weight, were less frequently reported.

With a larger sample than in our previous study, the univariate analysis confirmed our first findings, showing a significant association of the presence of pathogenic CNVs with autism spectrum disorders (in this case, with a lower presence), facial malformations/dysmorphisms and genitourinary anomalies/malformations. Obesity and short stature, which were significantly related as second relevant phenotypes when the cohort was smaller33, lost their significance in the now larger sample. Conversely, developmental delay, intellectual disability, limb anomalies, low weight, heart anomalies/malformations and motor development delay gained significance (see Supplementary Information 3).

However, even with such an extended sample, there is not one phenotype or group of neurodevelopmental or malformation phenotypes with sufficiently robust evidence as to justify a preferential CMA testing decision. Additionally, we are aware of our limitations in obtaining standardized phenotype data. This is mainly because there is no standardized phenotype collection and annotation among medical doctors, most of whom are not geneticists and have limited access to genetic tests for follow-up genome sequencing or mutation investigation.

In the State of Santa Catarina, which has approximately the size of Hungary and close to 7.6 million inhabitants, there are only a few (about five) medical geneticists, most of whom practice in Florianópolis, the state capital. Consequently, many patients come from distant areas or are referred for testing by medical doctors outside the main city, without the opportunity to consult with a medical geneticist. A comprehensive and standardized reassessment in all cases, which is currently beyond our capabilities, would be crucial for confidently confirming the phenotype findings and, not to mention, aiding in the interpretation of the CNVs found.

ASD cases

For the 333 cases of the cohort who were diagnosed within the autism spectrum, the ages ranged from a few months to 34 years, with a male predominance of 3.7:1. This is interesting because the male to female ratio of the whole cohort is 1.55:1, and when the cases that mention ASD phenotypes in the clinical description are excluded, the ratio is 1.1:1. We are aware that the cases did not undergo a standardized clinical assessment for ASD. However, a male to female ratio of about 4:1 is well established in the literature, and has led to specific reviews on sex differences in ASD63,64,65,66,67,68.

Based on the clinical data which we could obtain, 29% of the individuals (79 aged 5 or more; 17 under 5 years of age) of our ASD cohort also had dysmorphic features (DF), a term that we used to include facial dysmorphia and/or congenital anomalies. When DF were present, we considered them to be syndromic ASD cases, that could have ID or not.

Like the diagnosis of ASD, the diagnosis of ID did not follow a standardized protocol. Some individuals underwent detailed cognitive tests, and others were diagnosed by doctors based on several criteria. This can be seen in Tables 1 and 2, where in most cases only ID is mentioned, without the degree of ID (mild, moderate, severe). Among the 256 individuals with ASD aged 5 or more, 68 (27%) had some degree of ID. Isolated ASD, which we use to define the non-syndromic patients without ID, comprised 44% (145/333) of the ASD cohort.

According to Rosti et al. (2014)69, approximately 75% of ASD cases are essential (non-syndromic), whereas 25% are syndromic. Lovrečić et al. (2018)70 reported a proportion of 41% of isolated ASD, 41% with DD and 19% with complex (syndromic) phenotypes when studying a cohort of 150 ASD cases.

There are wide differences in the published prevalence of ID among autistic individuals. Chiurazzi et al. (2020)71 mention that 70% of cases with ASD also have ID, while 40% of cases with ID have ASD72. The Autism and Developmental Disabilities Monitoring Network (ADDM), funded by the CDC, states that about one third of individuals (35.2%) on the autism spectrum also have some degree of ID (CDC: Autism Spectrum Disorder, last reviewed December 15, 2022).

There are sex differences among the subclasses of ASD. Whereas the male:female ratio for the whole ASD cohort is 3.8:1, for syndromic ASD it is 2.9:1. In syndromic ASD with ID it is 4.1:1; syndromic ASD w/o ID, 2.3:1. For non-syndromic with ID it is 3:1, and for isolated Autism (non-syndromic w/o ID) it is 4.8:1.

CNVs were found in 90% of the 333 CMAs analysed, and 38 CNVs interpreted as pathogenic were detected in 35 cases with ASD, resulting in a diagnostic yield of 10%, lower than the diagnostic rate for the whole cohort (17%), but within the range of 8 to 22% cited in the literature for other ASD cohorts16,70,73,74,75,76,77,78,79,80,81,82,83,84,85. Without the ASD cases, the diagnostic rate of the cohort increases to 20%.

Among the 35 cases with pathogenic CNVs, 4 were among the 9 patients that had previous abnormal karyotype results, for which the CMA test was requested to identify the DNA sequences involved. Excluding the 4 cases with known abnormal karyotypes, the diagnostic rate drops to 9%. However, the diagnostic yield was considered to be 10% because the CMA was essential to discover the altered sequences in the abnormal karyotype results.

Recurrent and rare CNVs in ASD

The pathogenic CNVs found in this study and the reported phenotypes of the respective patients are detailed in Table 3. Among the genetic syndromes identified in our cohort are the most common syndromes in ASD, which involve the chromosomal regions 15q11-q13, 16p11.2 and 22q11.286,87,88,89,90,91,92, such as the 15q13.3 Microdeletion Syndrome (#612001), Chromosome 16p11.2 Deletion Syndrome (OMIM# 611913; n = 2), Distal 16p11.2 Deletion Syndrome (#613444; n = 2), Distal 22q11.2 Microduplication Syndrome (#608363) and Angelman/Prader-Willi Syndrome (*600162).

Rarer syndromes such as 1q21.1 Microduplication Syndrome (#612475), 2q37.3 Microdeletion Syndrome (#600430), Williams-Beuren Region Duplication Syndrome (#609757, n = 2), 9p Deletion Syndrome (#158170), Distal 13q Deletion Syndrome (#613884), Temple Syndrome (#616222), Partial Trisomy 16p13.3 Syndrome, Potocki-Lupski Syndrome (#610883), Distal Chromosome 18q Deletion Syndrome (#601808), 18q Deletion Syndrome (#601808), Schinzel Giedion Syndrome (#269150), 21q22.12 Microdeletion Syndrome, 22q13 microdeletion/Phelan-McDermid syndrome (OMIM# 606232; n = 2), MECP2 Duplication Syndrome (#300260), Triple X Syndrome and XYY Region Syndrome have also been associated with ASD.

Among the pathogenic CNVs detected in our study, the ones with the highest frequency in the literature, based on data from the SFARI database, are the 16p11.2 microdeletion (108 entries), followed by the duplication of 7q11.23 (85 entries), the 16p13.3p12.3 microduplication (73 entries), the Xq28 microduplication (59 entries), the 15q11.2q13.1 microdeletion (56 entries), the 22q13.33 microduplication (54 entries), and the 17p11.2 microduplication (45 entries). As in the findings of Li et al. (2015)93, chromosomes 15, 16 and 22 together contributed more than 25% of the pathogenic CNVs in our study.

Among the rarer findings, based on the SFARI database, we have: case #66, carrying a 22 Mbp microduplication at 15q25.1q26.3(80,304,866–102,429,040), with no SFARI entry for the locus; case #345, with a 2.7 Mbp microdeletion at 14q32.2q32.31(100,095,248–102,755,064), with two entries; case #385, with a 4 Mbp microdeletion at 21q22.12q22.2(35,834,713–39,831,660), with only one entry; case #443, carrying a heterozygous microduplication (4×) of 2.8 Mbp at 22q12.3q13.1(35,888,588–38,692,765), with two entries for duplication and four for deletion of the locus; case #455, with a 26 Mbp duplication at Yp11.31p11.2-Yq11.23(2,650,140–28,799,937), with six entries from a single study91; case #751, with an 8.3 Mbp microdeletion at 18q12.2q21.1(36,210,635–44,530,609), with a single entry; case #873, with a 10 Mbp microdeletion at 13q33.2q34(105,020,842–115,107,733), with 11 entries; and case #1107, with an altered karyotype, as previously mentioned, which presented a 15 Mbp deletion at 9p24.3p22.3(208,454–15,424,987), with two entries, one deletion and one duplication.
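The sizes quoted for these CNVs follow directly from the genomic coordinates given in parentheses. The helper below is a minimal sketch of that arithmetic; the function name is ours, and the size is taken as end minus start, which differs from an inclusive count by a single base pair.

```python
import re

def cnv_size_mbp(region):
    """Size in Mbp from a string such as '15q25.1q26.3(80,304,866–102,429,040)'."""
    # Accept either an en dash or a hyphen between the two coordinates
    match = re.search(r"\(([\d,]+)\s*[–-]\s*([\d,]+)\)", region)
    if match is None:
        raise ValueError(f"no coordinates found in {region!r}")
    start, end = (int(value.replace(",", "")) for value in match.groups())
    return (end - start) / 1e6

for region in ("15q25.1q26.3(80,304,866–102,429,040)",
               "Yp11.31p11.2-Yq11.23(2,650,140–28,799,937)",
               "9p24.3p22.3(208,454–15,424,987)"):
    print(region, round(cnv_size_mbp(region), 1))
```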

When it comes to submicroscopic chromosomal alterations, both deletions and duplications can result in decreased gene expression through gene disruption, whereas duplications can also lead to overexpression of genes.

As discussed by Velinov94, the detection and interpretation of recurrent CNVs, which are often associated with ASD, facilitates post-test genetic counseling, since one can safely conclude the genetic etiology by associating the CNVs with the clinical characteristics of the patient. In most cases, particularly when the parents are unaffected, it is more likely that pathogenic CNVs have their "de novo" origins. This occurs due to events such as errors during meiotic recombination, early illegitimate mitotic recombinations, or due to repairs to DNA double-stranded breaks during the first divisions of embryonic cells95.

On the other hand, pathogenic CNVs can also originate from a balanced chromosomal translocation in the genome of one of the parents. According to Nowakowska et al. (2016)96, it is advisable to test the parents of individuals with large pathogenic CNVs by classical karyotyping, since balanced translocations cannot be identified by CMA and carry a high risk of recurrence.

Influence of dysmorphic features and/or ID in the diagnostic rate

Although the diagnostic rate for several phenotypic groups was higher than the 10% of diagnostic rate found in the ASD cohort, only the diagnostic yield of 16% for syndromic ASD was confirmed as significant by univariate analysis (p ≤ 0.05, OR = 2.43) (Fig. 3C).

Several studies have investigated the diagnostic yield of CMAs and genome sequencing techniques in cohorts with neurodevelopmental disorders. Although the diagnostic yield varies widely when whole genome or exome sequencing is applied, syndromic patients tend to have a significantly higher probability of a positive diagnostic result33,97,98. Specifically for ASD, the mean diagnostic yield is usually lower than for a typical neurodevelopmental cohort. However, among autism subtypes, a higher diagnostic yield usually occurs when ASD is accompanied by other features, that is, in syndromic (or complex) ASD78,99.

LCSHs

Li et al. (2006)35 indicated that LCSH were more common in the human genome than was considered at the time and that they could have an impact on many fields of genetic studies. We now know that LCSH are one of the most common types of genomic traits in humans, being observed throughout the human genome as a consequence of inbreeding or evolutionary forces22,26,100,101,102.

Previously we described the analysis of LCSHs in 430 cases that are part of this cohort28. Now, considering the whole cohort, we found that 91% of the individuals have at least one autosomal LCSH ≥ 3 Mbp as revealed by their CMA tests.

Potential UPDs were found in 2.8% of the CMAs of the cohort, similar to the 2.6% we found in our previous work28. The frequency of potential or confirmed UPDs varies widely among published cohorts. Investigating 214,915 trios from the 23andMe sequencing dataset, representing a non-clinical general population, the authors found 105 cases of UPD, estimating that UPD occurs with an overall prevalence of roughly 1 in 2000 births, or 0.05%103. The frequency of UPDs found in studies that used exome sequencing of patient-parent trios in large clinical populations with all sorts of genetic conditions is higher and oscillates between 0.2 and 0.6%104,105,106. The investigation of UPDs by whole genome sequencing of 164 parent–child trios in a more selected cohort, an Irish cohort with rare disorders, found 3 UPDs, a frequency of 1.8%105.

Using CMA platforms with distinct SNP density and in clinical populations with distinct ethnic backgrounds, the reported potential UPD rate oscillates from 1 to over 4%23,106,107,108,109.

We want to emphasize once again that CMA technology can only detect UPD regions in cases of isodisomy; it cannot identify UPDs with total heterodisomy. In a complete UPD, whether it's isodisomic, iso/heterodisomic, or entirely heterodisomic, both homologous chromosomes will exhibit the gender-specific imprinting of the sole transmitting parent across their entire length. It's also important to remember that long, uninterrupted stretches of homozygosity may also result from homologous repair through a breakage-induced DNA replication mechanism, which, in contrast, can give rise to segmental UPDs110.

When considering the processes that lead to UPD, it's worth noting that among the 27 cases with LCSH suggesting a potential UPD, eight also had PCNVs that were either considered responsible or partially responsible for their clinical conditions. Additionally, three presented VUS, including two with LPCNVs.

One exception is case #584, which had a PCNV spanning 2.8 Mbp (4×) that overlapped with approximately 1 Mbp of the homozygous region associated with the putative UPD, whose complex origin hints at a real segmental UPD. All other CNVs were located on chromosomes unrelated to the identified UPD. We did not detect any traces of mosaicism involving the affected chromosome in any of the cases, which could have suggested a trisomy rescue.

When a potential UPD is found on one of the chromosomes related to imprinting disorders, like chromosomes 6, 7, 11, 14, 15 or 20, and the phenotype of the patient fits the potential imprinting disorder phenotype, the follow-up is straightforward111,112. However, most often the UPDs are on chromosomes without imprinted regions and sequencing of the isodisomic region should be considered because it often unmasks a homozygous deleterious variant inherited from a heterozygous parent107.

Out of the 27 potential UPD cases identified in our study (Table 5 and Fig. 4), only seven were associated with chromosomes known for imprinting disorders110. Cases #169 and #346 on chromosome 7, as well as case #312 on chromosome 14, have been previously discussed28. Among the cases with potential UPD-like LCSH patterns on chromosome 11, case #633 has a PCNV identified as the causal factor for its clinical condition, and cases #569 and #628 do not exhibit the hallmark phenotypes typically associated with Beckwith–Wiedemann overgrowth syndrome caused by UPD(11)pat or Silver-Russell syndrome caused by UPD(11)mat. The same is true for case #907 on chromosome 20, whose available phenotypes do not correlate at all with the imprinting disorders of this chromosome.

Consanguinity

Approximately 24% of the CMAs revealed an LCSH pattern suggesting a distant familial connection (sixth or seventh degree) between the parents of patients affected by NDDs. As we've previously mentioned, these findings may be indicative of regional immigration patterns and intermarriage among immigrants in southern Brazil. When the relationship suggested by the LCSH is distant and more associated with the endogamous characteristics of the population, the likelihood of clinical significance decreases.

More significant is the fact that in 11.5% of the CMAs, the LCSHs indicated a first- to fifth-degree relationship between the parents. These cases are more likely to have a clinical impact because the closer the parentage, the higher the proportion of shared alleles, increasing the risk of inheriting two copies of an autosomal recessive (AR) mutation24. We provide an in-depth discussion of the impacts and relevance of these findings in a previous publication28.

As shown in Table 6, two patients exhibit homozygosity, indicating potential first-degree relatedness among their parents. These results are communicated to the referring physicians by the diagnostic laboratory. It is the responsibility of these physicians to follow the appropriate protocols for these cases.

Table 6 Details of the results for the 4.3% of cases that suggested first- to fourth-degree kinship.

For one patient (#1068) where a second-degree relatedness is suggested among his parents (Table 6) a PCNV was identified in chr 15 (Table 2). This patient presents a complex syndromic phenotype that extends beyond the typical manifestations usually associated with this deletion, which are mainly related to ASD, DD and behavioural issues, suggesting the participation of a causal autosomal recessive development gene.

LCSH considered common (frequency ≥ 5%)

As extensively discussed in Chaves et al. (2019)28, identifying and knowing the most common (recurrent) LCSH allows us to focus the analysis on the most clinically significant LCSH. Following the same reasoning and criteria of our initial study, in this new analysis, we have identified ten LCSH ≥ 3 Mbp occurring at a frequency of 5% or higher, thus considering these LCSH as a possible common variation in our population.

All LCSH identified as frequent in our dataset (Table 7), except for 19q13.2-q13.31 (40,357,663–44,200,928), have been previously recognized as common LCSH by other research groups in clinical investigations involving patients with developmental disorders28,36,37,38,39,108, including our previous work. These LCSH are typically considered low recombination regions, representing blocks of ancestral haplotypes, and are generally interpreted as potentially non-pathogenic.

Table 7 Regions of LCSH considered common (frequency ≥ 5%) identified among 917 CMA results.

Wang et al. (2015)37 identified several of these regions as recurrent LCSH without clinical relevance in a cohort of patients with NDDs, including unaffected parents. Kearney HM39 reported them as findings occurring at a frequency > 5% in CMA readings (CytoScan HD, Affymetrix) from affected individuals. Sanchez P38 in an analysis of a cohort of 278 affected Hispanics reported LCSH as common when their frequency exceeded 3% in CMA samples (CytoScan HD, Affymetrix). Neta et al. (2022)108 reported the region we found on chromosome 16 as occurring at a frequency of 12.7% in a cohort of 100 patients with ID and/or ASD from the Midwest region of Brazil. Pajusalu et al. (2015)36 reported similar findings to ours on chromosomes 3 and 11 as recurrent LCSH with frequencies of 9.3% and 6%, respectively, using a minimum cutoff size of 5 Mbp, in the investigation of 2110 consecutive Estonian patients (including prenatal care and parenting samples).

In our previous research, we identified as common the regions 6p22.2p22.1 (26,340,871–30,006,805) and 20q11.21q11.23 (31,940,638–36,081,725), also reported as common by Sanchez P38, Kearney HM39, and Pajusalu et al. (2015)36, as well as 7q11.22q11.23 (71,997,278–76,128,151), which had no prior report. However, they were not confirmed at a frequency ≥ 5% in this larger sample. Conversely, our previous study did not identify 5q23.3-q31.1 (128,694,241–132,201,418), also found by Kearney HM39, as frequent, whereas in the larger cohort this LCSH now shows up at a frequency above 5%.

We found no previous report of the LCSH in 19q13.2q13.31 (40,357,663–44,200,928) that we identified now. This homozygous region is not associated with any genes known to have an imprinting pattern in humans113. It encompasses 148 known genes, out of which 81 are listed in OMIM, including five genes related to autosomal recessive (AR) disorders: Charcot-Marie-Tooth Disease, Type 4F (#614895), Maple Syrup Urine Disease (#248600), Neurodevelopmental disorder with hypotonia, neuropathy, and deafness (#617519), Ethylmalonic Encephalopathy (#602473), and Agammaglobulinemia 3 (#613501).

The LCSH considered frequent and common in the current study not only support the findings and discussions of our previous research but also raise the possibility that our threshold of considering LCSHs only at a frequency ≥ 5% could be too conservative. It might be a relatively safe alternative to consider a lower threshold, such as LCSHs with a frequency above 4% or 3%, as Sanchez P38 did.

Conclusions

In this retrospective study, we present the largest report of chromosomal microarray (CMA) data in a cohort with neurodevelopmental disorders (NDDs) and/or congenital anomalies (CAs) from Southern Brazil. We achieved a diagnostic rate of 17%, consistent with the literature (15–20%). We characterized the rare copy number variations (CNVs) that we identified and associated them with the main phenotypes presented by each patient. The interpretation of CNVs is challenging and relies on information such as frequency and characterization in affected populations, typically obtained from cohort studies with significant sample sizes.

The primary reasons for referring individuals to CMA testing in this study were developmental delay/intellectual disability and autism spectrum disorder, often accompanied by syndromic features like congenital anomalies or dysmorphic features. Certain phenotypes have been shown to predict a higher likelihood of carrying a pathogenic CNV.

For cases with ASD, our overall diagnostic yield of 10% is within the range reported in the literature (8–21%); it is higher (16%) for syndromic ASD, associated with dysmorphic features, and lower (7%) for "isolated" ASD.

Among the 953 CMAs analysed for long contiguous stretches of homozygosity (LCSH), we observed 27 large LCSH (≥ 10 Mbp, ranging from 10.6 to 88.8 Mbp) on a single autosome, suggesting a potential frequency of uniparental disomy (UPD) of 2.8%. However, CMA likely underestimates the true UPD rate, as it can only suggest UPD when uniparental isodisomy is present. Without methylation tests, these findings cannot be confirmed as true UPDs, nor can complete and segmental UPDs be distinguished.
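The potential UPD frequency quoted above follows directly from the two counts reported in this paragraph. The short sketch below reproduces the arithmetic; the variable names are illustrative only.

```python
# Counts taken from the text: 27 single-autosome large LCSH among 953 CMAs.
n_cmas_analysed = 953
n_large_single_autosome_lcsh = 27

potential_upd_rate = n_large_single_autosome_lcsh / n_cmas_analysed

# Format as a percentage with one decimal place.
print(f"Potential UPD frequency: {potential_upd_rate:.1%}")
```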

Regarding consanguinity, the analysis of LCSHs indicated a possible descent from first- to fifth-degree relatives in approximately 11.5% of the cohort. This information is crucial for genetic counseling, as parental relatedness carries an empirical risk of recurrence, potentially due to autozygous autosomal recessive (AR) mutations. In cases with affected siblings, the analysis of regions that are identical by descent (IBD) can assist in identifying the target region for investigation, particularly when employing whole-exome sequencing (WES).
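One common way to relate LCSH burden to parental relatedness is to estimate an inbreeding coefficient F as the fraction of the autosomal genome covered by LCSH and compare it with the theoretical values for each degree of relationship. The sketch below illustrates that idea only; it is not necessarily the procedure applied in this study. The autosomal length of 2,881 Mb is an assumed approximate value, the LCSH lengths are hypothetical, and picking the nearest theoretical value is a simplification, because the observed F varies considerably around its expectation.

```python
# Theoretical inbreeding coefficient of the offspring of parents related
# in the first to fifth degree (1/4, 1/8, 1/16, 1/32, 1/64).
EXPECTED_F = {1: 1 / 4, 2: 1 / 8, 3: 1 / 16, 4: 1 / 32, 5: 1 / 64}

# Assumption: approximate total autosomal length in Mb.
ASSUMED_AUTOSOMAL_LENGTH_MB = 2881.0


def inbreeding_coefficient(lcsh_lengths_mb, autosomal_length_mb):
    """Estimate F as total LCSH length divided by autosomal genome length."""
    return sum(lcsh_lengths_mb) / autosomal_length_mb


def closest_degree(f_estimate):
    """Return the degree whose theoretical F is nearest to f_estimate."""
    return min(EXPECTED_F, key=lambda degree: abs(EXPECTED_F[degree] - f_estimate))


# Hypothetical LCSH lengths (Mb) for one individual, for illustration only.
example_lcsh_mb = [40.0, 55.5, 30.0, 54.5]

f_example = inbreeding_coefficient(example_lcsh_mb, ASSUMED_AUTOSOMAL_LENGTH_MB)
print(f"F = {f_example:.4f}; closest degree of relationship: {closest_degree(f_example)}")
```

Any such estimate is suggestive rather than conclusive and would need to be interpreted alongside family history.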

We identified ten LCSHs with a frequency above 5% in individuals with NDDs. Nine of these LCSHs had previously been reported as common variants by other research groups, suggesting that they are likely normal population variants in Santa Catarina. Our threshold of considering only LCSHs with a frequency ≥ 5% may be too conservative. While frequency filtering is valuable for prioritizing clinically relevant LCSHs for analysis, a clinical contribution of these homozygous regions cannot be completely ruled out.

Overall, analysing LCSHs detected by CMA with high SNP density provides valuable information to aid in the investigation of neurodevelopmental disorders. However, these findings are mostly theoretical and suggestive, serving as guidelines for further investigations such as methylation analysis, targeted gene sequencing, or WES.