A cohort study of neurodevelopmental disorders and/or congenital anomalies using high-resolution chromosomal microarrays in southern Brazil highlighting the significance of ASD

Chromosomal microarray (CMA) is the reference test for evaluating copy number variations (CNVs) in individuals with neurodevelopmental disorders (NDDs), such as intellectual disability (ID) and/or autism spectrum disorder (ASD), which affect around 3–4% of the world's population. Modern CMA platforms also include probes for single nucleotide polymorphisms (SNPs) that detect homozygous regions in the genome, such as long contiguous stretches of homozygosity (LCSH). These regions result from complete or segmental chromosomal homozygosity and may indicate uniparental disomy (UPD), inbreeding, population characteristics, or replicative DNA repair events. In this retrospective study, we analyzed CMA read files requested by geneticists and neurologists for diagnostic purposes, along with the available clinical data. Our objectives were to interpret CNVs and to assess the frequencies and implications of LCSH detected by the Affymetrix CytoScan HD (41%) or 750K (59%) platforms in 1012 patients from southern Brazil. The patients were mainly children with NDDs and/or congenital anomalies (CAs). A total of 206 CNVs interpreted as pathogenic, comprising 132 deletions and 74 duplications, were found in 17% of the patients and across all chromosomes. Additionally, 12% of patients presented rare variants of uncertain clinical significance, including likely pathogenic CNVs (LPCNVs), as their only clinically relevant CNV. Within the realm of NDDs, ASD is of particular importance owing to its increasing prevalence and its growing repercussions for individuals, families and communities. ASD was one of the clinical phenotypes, if not the main reason for referral, in about one-third of the cohort, and these patients were further analyzed as a sub-cohort. Considering only the patients with ASD, the diagnostic rate was 10%, within the range reported in the literature (8–21%). 
It was higher (16%) when ASD was associated with dysmorphic features and lower (7%) for "isolated" ASD (without ID and without dysmorphic features). In 953 CMAs of the whole cohort, LCSH (≥ 3 Mbp) were analyzed not only for their potential pathogenic significance but also to identify common LCSH in the southern Brazilian population. CMA revealed at least one LCSH in 91% of the patients. In about 11.5% of patients, the LCSH suggested consanguinity from the first to the fifth degree, with a greater probability of clinical impact, and in 2.8% they revealed a putative UPD. LCSH found at a frequency of 5% or more were considered common in the general population, allowing us to delineate 10 regions that potentially represent ancestral haplotypes of negligible clinical significance. The main reasons for referral for CMA were developmental delay (56%), ID (33%), ASD (33%) and syndromic features (56%). Some phenotypes in this population may predict a higher probability of carrying a pathogenic CNV. Here, we present the largest report of CMA data in a cohort with NDDs and/or CAs from southern Brazil. We characterize the rare CNVs found, along with the main phenotypes presented by each patient, show the importance and usefulness of interpreting LCSH in CMA results that incorporate SNPs, and illustrate the value of CMA for investigating CNVs in ASD.


Cohort
The aim of this study was to investigate a large cohort with developmental disorders from southern Brazil. We collected a total of 1120 chromosomal microarray (CMA) read files from tests performed between 2013 and 2019 by the Laboratório Neurogene in Florianópolis, Santa Catarina, Brazil, upon request by medical geneticists and neurologists for investigative/diagnostic purposes. Requests came primarily from the Joana de Gusmão Children's Hospital, but also from physicians at the University Hospital Professor Polydoro Ernani de São Thiago and from private clinics in Florianópolis, State of Santa Catarina. These also include 420 previously published cases 28,33 . Of the 1120 cases, 68 were excluded because they belonged to unaffected family members, and 40 cases were excluded from the statistics on developmental disorders due to insufficient clinical information. The analyzed

Genomic analysis
The CMA platforms used were CytoScan 750K (59%) and CytoScan HD (41%), and the resulting files were analyzed using the Affymetrix Chromosome Analysis Suite (ChAS) software 4 , which is based on the reference genome sequence of the University of California, Santa Cruz database (https://genome.ucsc.edu/cgi-bin/hgGateway), using the February 2009 human genome assembly (GRCh37/hg19). The analysis was retrospective and used CMA runs obtained from a clinical diagnostic laboratory, with the prior consent of the patients.
Typically, the filtering criteria for interpreting CNVs for diagnostic purposes are sizes larger than 100 Kbp for deletions and larger than 150 Kbp for duplications, both containing at least 50 markers, according to ACMG recommendations 19,20 . However, because this is a research study that aims to identify potential new genes involved in developmental disorders, we lowered the filter parameters to > 10 Kbp for both deletions and duplications, each with at least ten markers. To interpret the CNVs, we followed the latest recommendations of the ACMG and the Clinical Genome Resource 21 .
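The two filter settings described above can be sketched as a simple predicate over CNV calls. This is a minimal illustration, assuming each call is available as a record with chromosome, start, end, type and marker count; the `CNV` class, `THRESHOLDS` and `passes_filter` are hypothetical names, not part of ChAS or any ACMG tool, and sizes are computed from inclusive coordinates.

```python
from dataclasses import dataclass

# (minimum size in bp, minimum number of markers) per CNV type, as quoted in the text.
# Sizes are strict lower bounds ("larger than"); marker counts are inclusive ("at least").
THRESHOLDS = {
    "diagnostic": {"del": (100_000, 50), "dup": (150_000, 50)},
    "research": {"del": (10_000, 10), "dup": (10_000, 10)},
}


@dataclass
class CNV:
    """Hypothetical record for one CNV call (GRCh37/hg19 coordinates)."""
    chrom: str
    start: int    # first base of the call
    end: int      # last base of the call (inclusive)
    kind: str     # "del" or "dup"
    markers: int  # number of array probes supporting the call

    @property
    def size(self) -> int:
        return self.end - self.start + 1


def passes_filter(cnv: CNV, mode: str = "research") -> bool:
    """True if the call meets the size and marker thresholds for the given mode."""
    min_size, min_markers = THRESHOLDS[mode][cnv.kind]
    return cnv.size > min_size and cnv.markers >= min_markers


calls = [
    CNV("15", 22_700_000, 22_760_000, "del", 25),  # ~60 kbp, 25 markers
    CNV("7", 72_700_000, 74_200_000, "dup", 600),  # ~1.5 Mbp, 600 markers
    CNV("X", 1_000_000, 1_008_000, "del", 12),     # ~8 kbp, 12 markers
]
research = [c for c in calls if passes_filter(c, "research")]
diagnostic = [c for c in calls if passes_filter(c, "diagnostic")]
```

Under the relaxed research thresholds the first two calls pass, whereas only the 1.5 Mbp duplication meets the diagnostic thresholds, which illustrates how the lower cut-offs enlarge the set of CNVs that must be interpreted.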

CNVs interpretation and classification
To interpret CNVs with regard to gene function, dosage effects (known haploinsufficiency or overexpression studies) and effects of mutations, we made extensive use of the UCSC Genome Browser with its integrated databases, mainly ClinVar (NCBI), DECIPHER (Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources), DGV (Database of Genomic Variants), OMIM (Online Mendelian Inheritance in Man), ISCA (International Standard Cytogenomic Array), dbGaP (Database of Genotypes and Phenotypes), dbVar (Database of Large Scale Genomic Variants), ECARUCA (European Cytogeneticists Association Register of Unbalanced Chromosome Aberrations), PubMed (Public Medline), ClinGen (Clinical Genome Resource), MGI (Mouse Genome Informatics Database, from The Jackson Laboratory), SFARI (Simons Foundation Autism Research Initiative) and the private database CAGdb (Cytogenomics Array Group CNV Database). We also used the Franklin platform 34 , an artificial-intelligence-based tool, to classify and interpret genomic variants using scores 21 .
The variants were classified into four categories according to their clinical interpretation: benign variants, variants of uncertain significance (VUS), likely pathogenic CNVs (LPCNVs) and pathogenic CNVs (PCNVs). The result in each case was assigned based on the CNV of greatest clinical relevance detected in the patient's genome 21 .
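A case-level result can be derived by ranking the classes and keeping the most clinically relevant CNV found in each patient. The sketch below is illustrative only: the point thresholds in `score_to_class` follow our reading of the ACMG/ClinGen CNV scoring standard (pathogenic ≥ 0.99, likely pathogenic 0.90 to 0.98, VUS -0.89 to 0.89), folding the likely benign range into benign is an assumption made to match the four categories used here, and `SEVERITY`, `score_to_class` and `case_result` are hypothetical names.

```python
from typing import Iterable

# Case-level classes ordered by clinical relevance (higher = more relevant).
SEVERITY = {"benign": 0, "VUS": 1, "LPCNV": 2, "PCNV": 3}


def score_to_class(score: float) -> str:
    """Map an ACMG/ClinGen CNV point total to the four classes used in this study.
    Assumption: the likely benign range (-0.90 to -0.98) is merged into benign."""
    if score >= 0.99:
        return "PCNV"
    if score >= 0.90:
        return "LPCNV"
    if score > -0.90:
        return "VUS"
    return "benign"


def case_result(cnv_classes: Iterable[str]) -> str:
    """Assign a case the class of its most clinically relevant CNV (benign if none)."""
    return max(cnv_classes, key=SEVERITY.__getitem__, default="benign")


# Hypothetical patient with three filtered CNVs scored by the point system.
scores = [-1.0, 0.30, 0.45]
result = case_result(score_to_class(s) for s in scores)
```

Here the patient is reported as "VUS", because the two uncertain CNVs outrank the benign one; a single PCNV would override any number of VUS.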
Variables such as the location, type and size of each CNV, the CNV classification, the number of CNVs detected in each individual, age, gender, clinical descriptions (phenotypes), previous genetic test results (karyotype, fragile X, etc.) and other relevant clinical data to which we had access were compiled (with coded identification) into a simple Excel spreadsheet and handled with the R software [version 3.4.2] (R Foundation for Statistical Computing). This allowed us to determine the phenotypic frequencies, the diagnostic rates, the mean age and the gender distribution in the cohort and the frequency of genomic changes on each chromosome, and to look for phenotypic clues related to a higher diagnostic probability by CMA (phenotypes predictive of a higher chance of carrying a pathogenic CNV), which could eventually help select the cases that would benefit most from CMA as a first-line test in resource-limited settings.
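Once the coded spreadsheet exists, the diagnostic rate per phenotype reduces to simple counting. The study itself used R; this is a minimal Python sketch, assuming each case is a record holding its case-level class and a set of phenotype labels. The record layout and `diagnostic_rates` are hypothetical.

```python
from collections import defaultdict


def diagnostic_rates(cases):
    """Per-phenotype diagnostic rate.

    `cases` is an iterable of dicts with a case-level class under "result" and a
    set of phenotype labels under "phenotypes" (hypothetical layout).
    Returns {phenotype: (n_cases, n_with_PCNV, rate)}.
    """
    counts = defaultdict(lambda: [0, 0])  # phenotype -> [cases, cases with a PCNV]
    for case in cases:
        for pheno in case["phenotypes"]:
            counts[pheno][0] += 1
            counts[pheno][1] += case["result"] == "PCNV"  # bool counts as 0 or 1
    return {p: (n, k, k / n) for p, (n, k) in counts.items()}


cases = [
    {"result": "PCNV", "phenotypes": {"ASD", "DD"}},
    {"result": "benign", "phenotypes": {"ASD"}},
    {"result": "VUS", "phenotypes": {"DD"}},
    {"result": "PCNV", "phenotypes": {"DD"}},
]
rates = diagnostic_rates(cases)
```

Because a case can carry several phenotypes, the per-phenotype denominators overlap and do not add up to the cohort size.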

Statistics
In addition to descriptive statistics, univariate analysis (Fisher's exact test) was applied to identify potential phenotypes predictive of a higher diagnostic yield (greater chance of having a pathogenic CNV). To compare the mean sizes, the numbers of genes covered and the numbers of OMIM genes covered by the CNVs across CNV classes, a multiple-comparison test of means (Tukey's test) was applied. A p-value below 0.05 was considered statistically significant.
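For a single phenotype, the univariate test reduces to a 2×2 table (phenotype present or absent × pathogenic CNV found or not). The sketch below computes the two-sided Fisher's exact p-value from hypergeometric probabilities, summing every table with the same margins that is no more probable than the observed one, together with the sample odds ratio. `fisher_exact_2x2` and the table layout are hypothetical; if the analysis used R's `fisher.test`, note that R reports a conditional maximum-likelihood odds ratio, so its estimate can differ slightly from the sample odds ratio returned here.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Assumed layout: rows = phenotype present / absent,
    columns = pathogenic CNV found / not found.
    Returns (sample odds ratio, two-sided p-value).
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with all margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum all tables no more probable than the observed one; the small relative
    # tolerance keeps floating-point ties from being dropped.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

    if b * c:
        odds_ratio = (a * d) / (b * c)
    else:
        odds_ratio = float("inf") if a * d else float("nan")
    return odds_ratio, min(p, 1.0)


# Hypothetical counts: 8 of 10 cases with the phenotype carry a PCNV,
# versus 1 of 6 cases without it.
odds_ratio, p_value = fisher_exact_2x2(8, 2, 1, 5)
```

For these made-up counts the odds ratio is 20 and p = 280/8008 ≈ 0.035; an odds ratio below 1 would indicate a phenotype associated with a lower chance of a pathogenic CNV.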

Selection and analysis of LCSH
The analysis and selection of LCSH followed the methodology outlined by Chaves and coworkers (2019), applying a threshold of ≥ 3 megabase pairs (Mbp). This threshold is typically used in clinical investigations, whereas population-based studies usually apply considerably lower cut-offs 24 .
All participants with LCSH satisfying the above criteria were included, regardless of whether or not they had a pathogenic CNV.

Automation of LCSHs analyses
To investigate consanguinity, compare LCSH among cases and call potential UPD, all LCSH reported in ChAS for each case were copied with coded identification and compiled into Excel sheets. For a more adequate and precise analysis, the process was automated: all LCSH found in the cohort were imported into Google Colab (https://colab.google/) and manipulated using the Python [3.10] programming language. The libraries used for data manipulation and analysis were Pandas [2.2.0] and NumPy [1.24.0] (for numerical computations). The code used for the analysis is available on the project's GitHub page: https://github.com/tiagochavo87/LCSH_analysis.
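The first step of such a pipeline, keeping autosomal LCSH ≥ 3 Mbp and summarizing them per patient, might look like the sketch below. It is an independent illustration, not the code in the authors' repository: the column names (`patient_id`, `chrom`, `start`, `end`), the chromosome naming without a "chr" prefix and the functions `autosomal_lcsh` and `per_patient_summary` are all assumptions.

```python
import pandas as pd

AUTOSOMES = {str(i) for i in range(1, 23)}  # assumes chromosome names without "chr"


def autosomal_lcsh(df: pd.DataFrame, min_mbp: float = 3.0) -> pd.DataFrame:
    """Keep autosomal LCSH of at least `min_mbp` Mbp and add a size_mbp column."""
    out = df.assign(size_mbp=(df["end"] - df["start"]) / 1e6)
    keep = out["chrom"].astype(str).isin(AUTOSOMES) & (out["size_mbp"] >= min_mbp)
    return out[keep]


def per_patient_summary(lcsh: pd.DataFrame) -> pd.DataFrame:
    """Number and total size (Mbp) of the retained LCSH per patient."""
    return lcsh.groupby("patient_id").agg(
        n_lcsh=("size_mbp", "size"), total_mbp=("size_mbp", "sum")
    )


# Hypothetical export: one row per homozygous segment reported for a coded case.
raw = pd.DataFrame({
    "patient_id": ["P1", "P1", "P2", "P3"],
    "chrom": ["2", "X", "11", "6"],
    "start": [10_000_000, 5_000_000, 60_000_000, 30_000_000],
    "end": [14_500_000, 20_000_000, 62_000_000, 34_000_000],
})
lcsh = autosomal_lcsh(raw)
summary = per_patient_summary(lcsh)
# Share of cases with at least one autosomal LCSH >= 3 Mbp.
share_with_lcsh = lcsh["patient_id"].nunique() / raw["patient_id"].nunique()
```

In a real export, cases without any homozygous segment would not appear in the table at all, so the denominator for the share should be the number of CMAs analyzed rather than the number of distinct IDs in the file.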

Analysis of consanguinity
The frequency of consanguinity in the cohort was calculated according to Kearney, Kearney and Conlin (2011). In short, when the homozygosity pattern suggested inbreeding, all regions of homozygosity ≥ 3 Mbp distributed throughout the chromosomes were added up, except for LCSH located on the sex chromosomes, and the total in Mbp was divided by the size of the autosomal genome, 2,881 Mbp (GRCh37/hg19). The resulting percentage was correlated with the inbreeding coefficient (F), which is: 25% (1/4, first degree: parent/child or full siblings), 12.5% (1/8, second degree: half siblings; uncle/niece or aunt/nephew; double first cousins; grandparent/grandchild), 6% (1/16, third degree: first cousins), 3% (1/32, fourth degree: first cousins once removed), 1.5% (1/64, fifth degree: second cousins) and < 0.5% (1/128, seventh degree: third cousins) 24 . Kearney and co-workers emphasized that this is a crude calculation that likely underestimates the actual homozygous proportion, both because of the applied threshold of 3 Mbp and because CMAs may lack SNP probes in certain regions, such as the short arms of acrocentric chromosomes and the centromeric regions. On the other hand, depending on the degree of inbreeding in the population, these correlations could overestimate the proband's direct kinship relation.
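The calculation above is a ratio followed by a comparison with the expected coefficients. The sketch below assumes the input is already restricted to autosomal LCSH sizes in Mbp and uses the percentages listed in the text; mapping the result to the degree whose expected value is closest (`closest_degree`) is a hypothetical rule chosen for illustration, not necessarily how individual cases were assigned in this study.

```python
AUTOSOMAL_GENOME_MBP = 2881.0  # autosomal genome size quoted in the text (GRCh37/hg19)

# Expected autozygosity (%) per degree of relationship, as listed in the text.
EXPECTED_F = {
    "first degree": 25.0,
    "second degree": 12.5,
    "third degree": 6.0,
    "fourth degree": 3.0,
    "fifth degree": 1.5,
}


def percent_autozygous(lcsh_mbp):
    """Sum of autosomal LCSH >= 3 Mbp as a percentage of the autosomal genome."""
    return 100 * sum(s for s in lcsh_mbp if s >= 3) / AUTOSOMAL_GENOME_MBP


def closest_degree(percent):
    """Hypothetical rule: the degree whose expected F is closest to `percent`."""
    return min(EXPECTED_F, key=lambda k: abs(EXPECTED_F[k] - percent))


# Hypothetical autosomal LCSH sizes (Mbp) for one case; the 2.5 Mbp segment is ignored.
sizes = [45.0, 30.5, 22.0, 12.3, 8.2, 4.1, 2.5]
pct = percent_autozygous(sizes)  # 122.1 / 2881 * 100, about 4.24%
degree = closest_degree(pct)
```

For this made-up case the result falls between the third- and fourth-degree values and is assigned to the fourth degree. A real pipeline would also need a floor below which no kinship is inferred, since the function always returns some degree.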

Uniparental disomy (UPD)
When only LCSH of 3 to < 5 Mbp were present in the genome, but on one single autosome the sum of two or three such LCSH exceeded 10 Mbp, the homozygous regions were considered a potential isodisomy resulting from a UPD event that had undergone prior recombination. When one or more LCSH over 5 Mbp were present on a single chromosome with a size or sum (in the case of multiple LCSH) ≥ 10 Mbp, this was considered a potential UPD (regardless of any LCSH ≤ 5 Mbp on other chromosomes). If several chromosomes had LCSH over 5 Mbp, the case was not regarded as a potential UPD 28 .
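The three rules above can be encoded as a small decision function. This is a sketch under stated assumptions: the input is a list of `(chromosome, size_mbp)` pairs for autosomal LCSH ≥ 3 Mbp, a segment of exactly 5 Mbp is treated as "large" because the text leaves that boundary open, and `potential_upd` is a hypothetical name.

```python
from collections import defaultdict


def potential_upd(lcsh, large=5.0, min_total=10.0):
    """lcsh: list of (chromosome, size_mbp) for autosomal LCSH >= 3 Mbp.
    Returns the chromosome flagged as a potential UPD, or None."""
    by_chrom = defaultdict(list)
    for chrom, size in lcsh:
        by_chrom[chrom].append(size)

    # Assumption: segments of exactly `large` Mbp count as large.
    with_large = [c for c, sizes in by_chrom.items() if max(sizes) >= large]
    if len(with_large) > 1:
        return None  # large LCSH on several chromosomes: not regarded as UPD
    if len(with_large) == 1:
        chrom = with_large[0]
        # One chromosome carries the large LCSH; its total must reach 10 Mbp.
        return chrom if sum(by_chrom[chrom]) >= min_total else None

    # Only small LCSH (3 to < 5 Mbp): two or three of them on a single
    # chromosome summing to more than 10 Mbp suggest isodisomy after recombination.
    hits = [c for c, sizes in by_chrom.items()
            if len(sizes) in (2, 3) and sum(sizes) > min_total]
    return hits[0] if len(hits) == 1 else None


print(potential_upd([("15", 12.0), ("2", 3.5)]))              # 15
print(potential_upd([("7", 4.0), ("7", 4.5), ("7", 3.0)]))    # 7
print(potential_upd([("15", 12.0), ("4", 6.0)]))              # None
```

Flagged chromosomes are candidates only; as the next paragraph explains, large LOH calls still need manual review to exclude hemizygosity, and confirmation requires parental studies.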
The ChAS software does not identify homozygosity as such, but only the absence of heterozygosity, which it labels loss of heterozygosity (LOH). LOH calls therefore also include hemizygous regions caused by a large deletion. Hence, all cases with LOH ≥ 10 Mbp in size on a single autosome, regardless of the presence of an additional chromosome with LOH over 10 Mbp in size (or sum of sizes), were manually reviewed to eliminate the confounding effect of hemizygous regions when calling LCSH and, ultimately, UPD.
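A simple pre-screen can help prioritize this manual review by reporting, for each autosome with ≥ 10 Mbp of LOH, how much of that LOH is also covered by a deletion call (and is therefore hemizygous rather than homozygous). This is a hypothetical helper, not a ChAS feature. It assumes half-open `(chrom, start, end)` intervals and non-overlapping deletion calls, since overlapping deletions would be counted twice.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Length of the overlap between two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def needs_review(loh_segments, deletions, min_total_bp=10_000_000):
    """Return {chromosome: LOH bases also covered by a deletion} for every
    autosome whose summed LOH reaches `min_total_bp` (hypothetical pre-screen)."""
    flagged = {}
    for chrom in sorted({c for c, _, _ in loh_segments}):
        segs = [(s, e) for c, s, e in loh_segments if c == chrom]
        if sum(e - s for s, e in segs) < min_total_bp:
            continue  # below the review threshold
        dels = [(s, e) for c, s, e in deletions if c == chrom]
        flagged[chrom] = sum(overlap_bp(s, e, ds, de) for s, e in segs for ds, de in dels)
    return flagged


loh = [("4", 10_000_000, 25_000_000), ("9", 1_000_000, 4_000_000)]
dels = [("4", 12_000_000, 14_000_000)]
review = needs_review(loh, dels)
```

Here chromosome 4 is flagged with 2 Mbp of its 15 Mbp LOH explained by a deletion, while the 3 Mbp LOH on chromosome 9 is below the review threshold. A nonzero overlap does not settle the call by itself; it only points the reviewer to the region.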

Analysis of the most frequent LCSH
Of the 953 files available for LCSH analysis, we used 917 microarrays to identify the cytobands that most frequently showed LCSH ≥ 3 Mbp on an autosome, and LCSH present in more than 5% of individuals were considered common LCSH. This percentage was chosen because the usual ≥ 1% threshold for defining common SNP polymorphisms in a population was not considered applicable to an affected cohort. Others have also chosen the same (or a lower) threshold to consider LCSH found in an affected cohort as common variation, likely lacking clinical significance [35][36][37][38][39] . We therefore believe we have an adequate safety margin for selecting common LCSH attributable to ancestral haplotypes rather than to consanguinity or other pathogenesis-related mechanisms.
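The frequency criterion amounts to counting, for each cytoband, the number of distinct individuals carrying an LCSH there and dividing by the number of microarrays analyzed. A minimal pandas sketch, assuming one row per LCSH with hypothetical `patient_id` and `cytoband` columns; `common_lcsh` is a hypothetical name, and the cytoband labels below are made up, not the regions reported in this study. The comparison uses ≥ 5%, following the "5% or more" wording used elsewhere in the text.

```python
import pandas as pd


def common_lcsh(df: pd.DataFrame, n_individuals: int, min_freq: float = 0.05) -> pd.Series:
    """Fraction of individuals with an LCSH at each cytoband, keeping those
    at or above `min_freq`, sorted from most to least frequent."""
    # nunique() counts each individual once, even with several LCSH in one band.
    freq = df.groupby("cytoband")["patient_id"].nunique() / n_individuals
    return freq[freq >= min_freq].sort_values(ascending=False)


# Hypothetical LCSH table for a cohort of 40 microarrays.
df = pd.DataFrame({
    "patient_id": ["P1", "P2", "P3", "P4", "P1", "P1", "P2", "P5", "P6"],
    "cytoband": ["6p21.1"] * 4 + ["11p11.2"] * 4 + ["2q24.3"],
})
result = common_lcsh(df, n_individuals=40)
```

In this toy example 6p21.1 (10%) and 11p11.2 (7.5%) pass, while 2q24.3 (2.5%) does not; the two LCSH of P1 in 11p11.2 are counted once.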
To delineate a more accurate genomic position for the most frequent LCSH, the shared homozygous segments were superimposed, and the position of each region was defined by the median of their start coordinates and the median of their end coordinates.
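The consensus position described above can be sketched in a few lines. This assumes the overlapping segments have already been grouped into one recurrent region; `consensus_region` is a hypothetical name and the coordinates are made up.

```python
from statistics import median


def consensus_region(segments):
    """segments: (start, end) of overlapping LCSH assigned to one recurrent region.
    Returns (median start, median end) as the region's representative position."""
    starts, ends = zip(*segments)
    return median(starts), median(ends)


# Hypothetical overlapping LCSH from three individuals.
region = consensus_region([
    (30_100_000, 34_000_000),
    (29_800_000, 33_500_000),
    (30_500_000, 35_200_000),
])
```

Unlike the union or the intersection of the segments, the medians are robust to a single unusually long or short LCSH in the group. With an even number of segments, `median` averages the two middle values and may return a non-integer coordinate.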
Previous karyotype results were available for 182 patients, 122 normal and 60 abnormal (for the latter, CMA was requested to identify the specific sequences involved). For most patients, however, no information about previous genetic assessments was available.
From the 1012 microarrays, a total of 7150 CNVs fulfilling the filtering criteria were selected (3747 duplications and 3403 deletions); these were interpreted and classified as benign CNVs, pathogenic CNVs (PCNVs), variants of uncertain clinical significance (VUS) or likely pathogenic CNVs (LPCNVs).

Phenotypic characterization
Out of the 1012 cases, four were excluded from the phenotypic characterization due to the unavailability of clinical data.
The cohort consists mostly of individuals with neurodevelopmental impairment (85%), and 83% of cases had ID and/or DD. DD was present in 56% of cases, while ID was described in 33%. It should be noted that 420 (42%) individuals were under 5 years of age, which is below the age range for diagnosing intellectual disability.

Phenotypic characterization for cases with ASD
Cases with ASD represent 33% of our cohort. Of these 333 cases, 77 (23%) were under 5 years of age, below the age for diagnosing ID, and of these, 17 (22%) had DF. Of the other 256 individuals aged 5 years or older, 68 had ID, of whom 36 also had DF; 43 had only DF; and 145 had "isolated" autism (without ID and without dysmorphic features, which range from facial dysmorphisms to CAs; see Cohort in the methodology).
Of the 262 male cases, 59 (23%) were below age 5, the diagnostic age for ID, and of these, 12 had DF. Of the 203 male cases aged 5 or more, 53 presented ID, of whom 29 had DF, whereas 150 (74%) had no ID; of these, 30 presented DF and 120 presented what we call "isolated" autism.
Of the 71 female ASD cases, 18 (25%) were under age 5, and of these, 5 had DF. Of the 53 females aged 5 or more, 15 had ID, of whom 7 also had DF, and 38 (72%) had ASD without ID, 13 of them with DF and 25 presenting what we call "isolated" autism.
In Fig. 1 we summarize the phenotypic characterization of the cases that presented ASD in the cohort.

Other phenotypes
In addition to the main neurodevelopmental phenotypes, most individuals had syndromic features (56%), such as congenital anomalies or malformations or atypical (dysmorphic) facial features (47% of the cohort). Psychiatric or behavioral problems and variations in height or body weight were less frequent accompanying phenotypes.
The phenotypic characteristics recorded in our cohort are listed in Table 1.
Figure 1. Summary of the phenotypic characterization of the cases with ASD.

Out of the 204 pathogenic CNVs, 119 were deletions, resulting in only one copy of the involved sequence, except in case #713, in which the deletion involved a genomic region of the boy's single X chromosome, leaving no copy. Six cases (#81, #255, #331, #646, #927 and #1109) presented a VUS in addition to a pathogenic deletion.
The other 74 pathogenic CNVs were duplications, which usually result in a total of three copies of the involved sequence. However, in eight males (#24, #25, #116, #151, #30, #455, #807 and #809), the duplication involved a relevant region of a sex chromosome and resulted in two copies (pathogenicity arises mainly because, in males, neither of the duplicated X copies undergoes inactivation, as one would in females), and in five cases (#306, #422, #443, #511 and #620), the CNV was present in a four-copy state. Figure 2 illustrates the frequency and number of pathogenic CNVs found per chromosome.
Univariate analysis (Fisher's exact test) indicated the phenotypes predictive of a higher diagnostic yield (greater chance of having a pathogenic CNV) in our cohort with NDDs: Developmental delay (p-value ≤ 0.001, OR = 0.

Following the scoring system, another 155 rare CNVs were interpreted as 141 variants of uncertain significance (VUS) (Supplementary Table 1) and 14 likely pathogenic CNVs (LPCNVs) (Table 4), these being the main findings in 13% of the cohort. Of these, 102 are duplications and 53 are deletions. Two VUS were detected in each of cases #635, #658 and #929, and three VUS in each of cases #649 and #937. These variants were found on most chromosomes, but not on chromosomes 21 and 22 (see supplementary information 1, VUS per chromosome), with sizes from 30 Kbp to 8 Mbp (SD = 1266, mean = 802), and contained 1 to 87 genes (SD = 13, mean = 9), of which 1 to 38 (SD = 5, mean = 5) are genes cited in the OMIM database (OMIM genes) (see supplementary information 2). Figure 2 illustrates the frequency and number of VUS per chromosome (track 2).
All other CNVs were interpreted as either common genetic polymorphisms or benign variants found on all chromosomes, with sizes ranging from 10 Kbp to 24 Mbp (SD = 586, mean = 298), and contained zero to 227 genes (SD = 8, mean = 3), of which zero to 144 (SD = 4, mean = 1) are genes cited in the OMIM database (OMIM genes) (see supplementary information 2).

Diagnostic rate and interpretation of CNVs for cases with ASD
When the 333 CMAs from patients in whom ASD (including all definitions of the spectrum) was cited as the main reason for referral or as one of several phenotypes were analyzed separately, a total of 3259 CNVs meeting the filtering criteria were detected. Of these, 1494 were duplications and 1765 were deletions, most of them interpreted as benign. In 33 CMAs, no CNVs meeting the filtering criteria were detected. The frequency of the most relevant type of CNV found in each case in the whole cohort and in the ASD sub-cohort is illustrated in Fig. 3A1, A2. The proportional contribution of each type of CNV per ASD subclass is illustrated in Fig. 3B. In 10% of cases (33/333), we identified a total of 38 rare CNVs interpreted as pathogenic (Table 3): 22 deletions and 16 duplications. The particularities of cases #511, #594 and #737 (with two PCNVs each), cases #455 (Y chromosome) and #809 (X chromosome), and cases #443 and #511 (PCNV in a four-copy state) were mentioned above.
In cases with ASD, DF and ID, the diagnostic rate was 14%, and for ASD with ID but without DF, it was 12%. For "isolated" ASD, the diagnostic rate dropped to 7%.
All other CNVs were interpreted as either benign or common genetic polymorphisms, submicroscopic variants found on all chromosomes, with sizes ranging from 10 Kbp to 24 Mbp (SD = 870, mean = 228), and contained zero to 181 genes (SD = 9, mean = 3), of which zero to 96 (SD = 4, mean = 1) are genes cited in the OMIM database (OMIM genes) (see supplementary information 3).

Long contiguous stretches of homozygosity in the samples
In total, 953 CMA results whose files were available and accessible for the LCSH study were analyzed. The majority (91%) of CMAs had at least one autosomal LCSH (≥ 3 Mbp), resulting in a total of 3445 LCSH identified in

Table 4. Likely pathogenic CNVs found in the cohort. Likely pathogenic CNVs (LPCNVs) found in the cohort, with the number of genes present in the region, some of the relevant genes and the available phenotypes for each case. Dup duplication, Del deletion, CAs congenital anomalies, DD developmental delay, ID non-specified intellectual disability, mildID mild intellectual disability, ModID moderate intellectual disability, SevID severe intellectual disability, ASD autism spectrum disorder, FD facial dysmorphisms, SLD speech and/or language delay/impairment, IUGR intrauterine growth restriction, ADHD attention-deficit/hyperactivity disorder, LD learning difficulty, F female, M male.

As can be seen in (B), when comparing ASD with ID to ASD without ID, the diagnostic rate (12% and 10% of PCNVs, respectively) is slightly higher when ID is present. In addition, the proportion of VUS is 5 percentage points higher when ID is present (19% compared with 14% in ASD without ID). Syndromic ASD clearly has a much higher diagnostic rate (16%) than non-syndromic ASD (7%).

LCSH leading to suspected UPD
In 27 individuals (~ 2.8%) of the 953 CMAs analyzed, including 11 previously published cases 28 , the LCSH suggested a potential UPD (Table 5 and Fig. 4).
In ~ 11.5% of cases, the LCSH suggested first- to fifth-degree kinship, which is clinically more relevant.

LCSH with frequency ≥ 5%
Given the scarcity of information about common LCSH in the Brazilian population, in previous work we explored the data from this affected cohort to identify frequent LCSH in the population of Santa Catarina, which we consider potentially non-causal for the patients' developmental issues 28 ; here we revisit these findings with a larger sample. The frequency of 5% or more for considering a recurrent LCSH a common finding in the population of southern Brazil was chosen empirically. This threshold was set to ensure a substantial safety margin relative to the 1% threshold used for considering a single nucleotide polymorphism (SNP) a common variant in the population, because analyzing an affected population can introduce bias. Nevertheless, it remains possible that certain autozygous haplotypes act in conjunction with other genetic variants to produce the phenotype.
The LCSH identified as frequent, potentially representing regions of low recombination that can maintain ancestral haplotypes identical by descent, are shown in Table 7 and Fig. 5.
It is important to highlight that, of the 173 cases with pathogenic CNVs, 32 had a previous abnormal karyotype result, which prompted CMA to identify the DNA sequences involved. Excluding these 32 cases with known abnormal karyotypes, the diagnostic rate drops to 14%. CMA was essential for identifying the altered sequences in abnormal karyotypes and sometimes revealed unexpected discrepancies with what the karyotype suggested: it can reveal deletions where the karyotype suggested additional material, or gains where the karyotype suggested deletions.
In our previous work, which includes part of the current cohort, we discussed extensively the usefulness of classical karyotyping as a complement to CMA results (and vice versa), exemplified by 17 cases with altered chromosomal results and their respective PCNV findings, including case #687 illustrated above 33 . We can only underscore the importance of having both classical karyotype and CMA results: they provide valuable clues about the processes leading to pathogenic changes and are crucial for genetic counselling 53,54 . Unfortunately, as CMA testing becomes more prevalent, classical karyotyping is being performed less frequently everywhere. Karyotyping should at least be performed for the child and parents when results indicate a pathogenic CNV or a potential UPD. This goal is desirable but unattainable in most less privileged settings, where few cases have access to both investigations and even fewer have the opportunity to investigate parents and other family members.

CNVs
Our analysis revealed pathogenic CNVs on all human chromosomes, with more than one causative variant identified in 15% of individuals. Deletions accounted for the majority (64%) of all pathogenic variants detected, consistent with the findings of others 55 , whereas among VUS, deletions represented only 34%.
The sizes of the PCNVs, the numbers of genes they cover and the numbers of OMIM genes they contain differ significantly from those of VUS and non-causative (benign) CNVs (P < 0.0001, Tukey's multiple comparison test) (Fig. 3A1 and Supplementary information 2). This is understandable, since larger CNVs containing more genes, in particular more disease-related genes or genes known to drive important cellular processes, will have a greater impact, which tends to be greater for loss of gene copies than for their excess.
www.nature.com/scientificreports/
As depicted in the Circos ideogram (Fig. 2), pathogenic CNVs tend to be located near the telomeres of most chromosomes. This is expected, since subtelomeric regions are prone to rearrangements, given that only one chromosomal breakpoint is required to produce a submicroscopic abnormality 56 .
Pathogenic CNVs can also be classified as recurrent or non-recurrent. Non-recurrent pathogenic CNVs occur sporadically in the genome, probably originating from replication errors or DNA repair mechanisms; they vary in gene content and consequently present variable phenotypes [55][56][57] . Recurrent pathogenic CNVs, in turn, are associated with known and well-characterized microdeletion and microduplication syndromes. Their recurrence is mediated by non-allelic homologous recombination between locus-specific low copy repeats (LCRs) 58,59 .

Phenotypic characterization
Characterizing phenotypes is a crucial step in investigating the genetic etiologies of developmental disorders and helps to identify the role of the genes involved, as emphasized by Moeschler and Shevell (2014) 60 in their systematic review on the investigation of children with global developmental delay and intellectual disability.
In our cohort, phenotypic characterization revealed a predominance of phenotypes related to NDDs, accounting for 85% of cases, similar to findings reported by others 55,59,61 , with 83% of individuals presenting ID and/or DD. DD was present in 56% of cases, while ID was mentioned for 33%. Autism spectrum disorders were present in 33% of the cohort, and 14% of the cohort had "isolated" ASD (without ID and without DF). It is worth noting that 42% of the cohort was under 5 years of age, which is below the typical age range for diagnosing ID, so any deficits at that age are diagnosed as DD. Nevertheless, even though not every individual with DD will have an intellectual disability, it is still possible to estimate the prevalence of intellectual disability (ID) by including individuals with both DD and ID, because most individuals with DD in early childhood are known to later receive a diagnosis of ID 62 .
Along with major neurodevelopmental phenotypes, many individuals exhibited syndromic features (56%), such as congenital anomalies or malformations, and most of these (47% of all) had an atypical facial appearance (facial

With a larger sample than in our previous study, the univariate analysis confirmed our earlier findings, showing a significant association of pathogenic CNVs with autism spectrum disorders (in this case, with a lower frequency of pathogenic CNVs), facial malformations/dysmorphisms and genitourinary anomalies/malformations. Obesity and short stature, which were significantly associated as secondary phenotypes when the cohort was smaller 33 , lost their significance in the larger sample. Developmental delay, intellectual disability, limb anomalies, low weight, heart anomalies/malformations and motor development delay, by contrast, gained significance (see Supplementary Information 3).
However, even with such an extended sample, no single phenotype or group of neurodevelopmental or malformation phenotypes shows evidence robust enough to justify preferential CMA testing. We are also aware of our limitations in obtaining standardized phenotype data, mainly because there is no standardized phenotype collection and annotation among physicians, most of whom are not geneticists and have limited access to genetic tests for follow-up genome sequencing or mutation investigation.
The State of Santa Catarina, which is approximately the size of Hungary and has close to 7.6 million inhabitants, has only a few (about five) medical geneticists, most of whom practice in Florianópolis, the state capital. Consequently, many patients come from distant areas or are referred for testing by physicians outside the capital, without the opportunity to consult a medical geneticist. A comprehensive and standardized reassessment of all cases, which is currently beyond our capabilities, would be crucial for confidently confirming the phenotype findings and would also aid the interpretation of the CNVs found.

ASD cases
For the 333 cases in the cohort diagnosed within the autism spectrum, ages ranged from a few months to 34 years, with a male predominance of 3.7:1. This is interesting because the male-to-female ratio of the whole cohort is 1.55:1, and when the cases mentioning ASD phenotypes in the clinical description are excluded, it is 1.1:1. We are aware that the cases did not undergo a standardized clinical assessment for ASD. However, a male-to-female ratio of about 4:1 is well established in the literature and has led to specific reviews on sex differences in ASD [63][64][65][66][67][68] .
Based on the clinical data we could obtain, 29% of the individuals in our ASD cohort (79 aged 5 or more; 17 under 5 years of age) also had dysmorphic features (DF), a term we use to include facial dysmorphia and/or congenital anomalies. When DF were present, we considered the case to be syndromic ASD, with or without ID. As with the diagnosis of ASD, the diagnosis of ID did not follow a standardized protocol: some individuals underwent detailed cognitive tests, and others were diagnosed by physicians based on various criteria, as can be seen in Tables 1 and 2, where in most cases only ID is mentioned, without its degree (mild, moderate, severe). Of the 256 individuals with ASD aged 5 or more, 68 (27%) had some degree of ID. Isolated ASD, the term we use for non-syndromic patients without ID, comprised 44% (145/333) of the ASD sub-cohort.
Published estimates of the prevalence of ID among autistic individuals vary widely: Chiurazzi et al. (2020) 71 mention that 70% of cases with ASD also have ID, while 40% of cases with ID have ASD 72 . The Autism and Developmental Disabilities Monitoring Network (ADDM), funded by the CDC, states that about one third (35.2%) of individuals on the autism spectrum also have some degree of ID (CDC-Autism Spectrum Disorder, last reviewed December 15, 2022).
There are sex differences among the ASD subclasses. Whereas the male:female ratio for the whole ASD cohort is 3.8:1, for syndromic ASD it is 2.9:1: 4.1:1 for syndromic ASD with ID and 2.3:1 for syndromic ASD without ID. For non-syndromic ASD with ID it is 3:1, and for isolated autism (non-syndromic without ID) it is 4.8:1.
Of the 35 cases with pathogenic CNVs, 4 were among the 9 patients who had previous abnormal karyotype results, for which the CMA test was requested to identify the DNA sequences involved. Excluding these 4 cases with known abnormal karyotypes, the diagnostic rate drops to 9%; however, the diagnostic yield was considered to be 10% because CMA was essential for discovering the altered sequences in the abnormal karyotypes.
Among the rarer findings, based on the SFARI database, is case #66, carrying a 22 Mbp microduplication at 15q25.1q26.3(80,304, …
When it comes to submicroscopic chromosomal alterations, both deletions and duplications can decrease gene expression through gene disruption, whereas duplications can also lead to gene overexpression.
As discussed by Velinov 94, the detection and interpretation of recurrent CNVs, which are often associated with ASD, facilitate post-test genetic counseling, since the genetic etiology can be established with confidence by associating the CNVs with the clinical characteristics of the patient. In most cases, particularly when the parents are unaffected, pathogenic CNVs are likely to have arisen de novo. Such de novo events result from errors during meiotic recombination, illegitimate mitotic recombination early in development, or the repair of DNA double-strand breaks during the first divisions of embryonic cells 95.
On the other hand, pathogenic CNVs can also arise as a consequence of a balanced chromosomal translocation carried by one of the parents. According to Nowakowska et al. (2016) 96, it is advisable to test the parents of individuals with large pathogenic CNVs by classic karyotyping, since balanced translocations cannot be identified by CMA and carry a high risk of recurrence.
Although the diagnostic rate for several phenotypic groups was higher than the 10% found in the ASD cohort, only the 16% diagnostic yield for syndromic ASD was confirmed as significant by univariate analysis (p ≤ 0.05, OR = 2.43) (Fig. 3C).
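This section does not state how the odds ratio and its p-value were computed. The sketch below therefore uses one standard choice, the cross-product odds ratio from a 2x2 table with a Woolf (log-based) confidence interval. It also reports log2(OR), the scale used in Fig. 3C. The counts are hypothetical and do not reproduce the study's table.

```python
import math


def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a: group PCNV+   b: group PCNV-   (e.g. syndromic ASD)
    c: others PCNV+  d: others PCNV-  (e.g. remaining ASD cases)
    """
    if 0 in (a, b, c, d):
        raise ValueError("zero cell: add a continuity correction (e.g. +0.5) first")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


# Hypothetical counts for illustration only, not the study's data.
or_, lower, upper = odds_ratio_woolf(10, 40, 20, 200)
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f}), log2(OR) = {math.log2(or_):.2f}")
```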
Several studies have investigated the diagnostic yield of CMA and genome sequencing techniques in cohorts with neurodevelopmental disorders. Although diagnostic yields vary widely when whole-genome or exome sequencing is applied, syndromic patients tend to have a significantly higher probability of a positive diagnostic result 33,97,98. Specifically for ASD, the mean diagnostic yield is usually lower than for a typical neurodevelopmental cohort. However, among autism subtypes, the diagnostic yield is usually higher when ASD is accompanied by other features, that is, in syndromic (or complex) ASD 78,99.

LCSHs
Li et al. (2006) 35 showed that LCSH were more common in the human genome than was considered at the time and that they could have an impact on many fields of genetic study. We now know that LCSH are one of the most common types of genomic traits in humans, observed throughout the genome as a consequence of inbreeding or evolutionary forces 22,26,100–102.
We previously described the analysis of LCSHs in 430 cases that are part of this cohort 28. Now, considering the whole cohort, we found that 91% of the individuals have at least one autosomal LCSH ≥ 3 Mbp, as revealed by their CMA tests.
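To illustrate what an LCSH ≥ 3 Mbp means at the level of SNP probes, the following simplified sketch scans ordered genotype calls on one chromosome and reports runs of consecutive homozygous calls that span at least 3 Mbp. It is a hypothetical illustration, not the algorithm used by the CMA analysis software. Commercial tools apply their own, more tolerant criteria (for example, allowing occasional miscalls and requiring minimum marker density). Here a single heterozygous call ends a run, and no-calls are skipped.

```python
def find_lcsh(positions, genotypes, min_size_bp=3_000_000):
    """Return (start, end) runs of consecutive homozygous SNP calls spanning >= min_size_bp.

    positions: sorted marker coordinates (bp) on one chromosome
    genotypes: calls aligned with positions: "AA"/"BB" (homozygous),
               "AB" (heterozygous) or "NC" (no call)
    """
    runs = []
    start = end = None
    for pos, call in zip(positions, genotypes):
        if call in ("AA", "BB"):
            if start is None:
                start = pos
            end = pos
        elif call == "AB":
            # A heterozygous call closes the current run.
            if start is not None and end - start >= min_size_bp:
                runs.append((start, end))
            start = end = None
        # "NC" (no call): neither extends nor breaks the current run.
    if start is not None and end - start >= min_size_bp:
        runs.append((start, end))
    return runs


# Hypothetical markers every 500 kbp with one heterozygous call at 4 Mbp.
positions = list(range(0, 5_000_001, 500_000))
genotypes = ["AA"] * 8 + ["AB"] + ["BB"] * 2
print(find_lcsh(positions, genotypes))  # [(0, 3500000)]
```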
Potential UPDs were found in 2.8% of the CMAs of the cohort, similar to the 2.6% we found in our previous work 28. The frequency of potential or confirmed UPDs varies widely among published cohorts. An investigation of 214,915 trios from the 23andMe genotyping dataset, representing a non-clinical general population, found 105 cases of UPD, corresponding to an estimated overall prevalence of roughly 1 in 2,000 births, or 0.05% 103. Studies that used exome sequencing of patient-parent trios from large clinical populations with all sorts of genetic conditions report higher UPD frequencies, ranging from 0.2 to 0.6% 104–106. A whole-genome sequencing investigation of 164 parent-child trios from a more selected cohort, an Irish cohort with rare disorders, found 3 UPDs, a frequency of 1.8% 105.
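The frequencies quoted above follow directly from the reported counts. The sketch below converts a count into a percentage and a "1 in N" figure. The helper name is illustrative.

```python
def frequency(events: int, total: int):
    """Return (percentage, N), where the frequency is roughly 1 in N."""
    return 100 * events / total, total / events


pct_pop, one_in = frequency(105, 214_915)  # UPDs among general-population trios
pct_irish, _ = frequency(3, 164)           # UPDs among WGS trios with rare disorders
print(f"{pct_pop:.3f}% (about 1 in {one_in:,.0f})")  # 0.049% (about 1 in 2,047)
print(f"{pct_irish:.1f}%")                          # 1.8%
```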
We want to emphasize once again that CMA technology can only detect UPD regions in cases of isodisomy; it cannot identify UPDs with total heterodisomy. In a complete UPD, whether isodisomic, iso/heterodisomic, or entirely heterodisomic, both homologous chromosomes will exhibit the parent-of-origin-specific imprinting of the sole transmitting parent across their entire length. It is also important to remember that long, uninterrupted stretches of homozygosity may also result from homologous repair through a break-induced DNA replication mechanism, which, in contrast, can give rise to segmental UPDs 110.
When considering the processes that lead to UPD, it is worth noting that among the 27 cases with LCSH suggesting a potential UPD, eight also had PCNVs considered fully or partially responsible for their clinical conditions. Additionally, three presented VUS, including two with LPCNVs.
All of these CNVs were located on chromosomes unrelated to the identified UPD, with one exception: case #584, which had a PCNV spanning 2.8 Mbp (4×) that overlapped with approximately 1 Mbp of the homozygous region associated with the putative UPD. This complex origin hints at a true segmental UPD. We did not detect any trace of mosaicism involving the affected chromosome in any of the cases, which could have suggested trisomy rescue.
When a potential UPD is found on one of the chromosomes associated with imprinting disorders, such as chromosomes 6, 7, 11, 14, 15 or 20, and the patient's phenotype fits the corresponding imprinting disorder, the follow-up is straightforward 111,112. However, UPDs are most often found on chromosomes not associated with imprinting disorders. In these cases, sequencing of the isodisomic region should be considered, because it often unmasks a homozygous deleterious variant inherited from a heterozygous parent 107.
Of the 27 potential UPD cases identified in our study (Table 5 and Fig. 4), only seven involved chromosomes known for imprinting disorders 110. Cases #169 and #346 on chromosome 7, as well as case #312 on chromosome 14, have been previously discussed 28. Among the cases with potential UPD-like LCSH patterns on chromosome 11, case #633 has a PCNV identified as the causal factor for its clinical condition. Cases #569 and #628 do not exhibit the hallmark phenotypes of Beckwith-Wiedemann overgrowth syndrome caused by UPD(11)pat or Silver-Russell syndrome caused by UPD(11)mat. The same is true for case #907 on chromosome 20, whose available phenotypes do not correlate at all with the imprinting disorders of this chromosome.
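The screening rule stated in the caption of Table 5 can be written as a short sketch. The rule: LCSH ≥ 3 Mbp on a single autosome summing to ≥ 10 Mbp, with no LCSH > 5 Mbp on any other autosome. The sketch also flags whether the candidate chromosome is one of the imprinting-disorder chromosomes listed in the text (6, 7, 11, 14, 15, 20). The function name and the input layout (autosome number mapped to LCSH sizes in Mbp) are hypothetical. The sketch reflects our reading of the caption, not the laboratory's software.

```python
IMPRINTING_CHROMOSOMES = {6, 7, 11, 14, 15, 20}


def screen_potential_upd(lcsh_by_chrom, min_segment=3.0, min_total=10.0, max_other=5.0):
    """lcsh_by_chrom: {autosome: [LCSH sizes in Mbp]}.

    Returns (chromosome, on_imprinting_chromosome) for a potential UPD, else None.
    """
    # Sum only LCSH >= min_segment on each autosome.
    totals = {
        chrom: sum(s for s in sizes if s >= min_segment)
        for chrom, sizes in lcsh_by_chrom.items()
    }
    candidates = [chrom for chrom, total in totals.items() if total >= min_total]
    if len(candidates) != 1:  # the pattern must be restricted to a single autosome
        return None
    upd_chrom = candidates[0]
    # No LCSH larger than max_other on any other autosome.
    for chrom, sizes in lcsh_by_chrom.items():
        if chrom != upd_chrom and any(s > max_other for s in sizes):
            return None
    return upd_chrom, upd_chrom in IMPRINTING_CHROMOSOMES


# Hypothetical LCSH profile: 12 Mbp on chromosome 7, one small LCSH elsewhere.
print(screen_potential_upd({7: [12.0], 2: [3.5]}))  # (7, True)
```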

Consanguinity
Approximately 24% of the CMAs revealed an LCSH pattern suggesting a distant familial relationship (sixth or seventh degree) between the parents of patients affected by NDDs. As previously mentioned, these findings may reflect regional immigration patterns and intermarriage among immigrants in southern Brazil. When the relationship suggested by the LCSH is distant and more associated with the endogamous characteristics of the population, the likelihood of clinical significance decreases.
More significant is the fact that in 11.5% of the CMAs, the LCSHs indicated a first- to fifth-degree relationship between the parents. These cases are more likely to have a clinical impact because the closer the relationship, the higher the proportion of shared alleles, which increases the risk of inheriting two copies of an autosomal recessive (AR) mutation 24. We provide an in-depth discussion of the impact and relevance of these findings in a previous publication 28. As shown in Table 6, two patients exhibit homozygosity indicating potential first-degree relatedness between their parents. These results are communicated to the referring physicians by the diagnostic laboratory, and it is the responsibility of these physicians to follow the appropriate protocols for these cases.
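The degree of parental relationship is commonly inferred from the fraction of the autosomal genome covered by LCSH. This fraction approximates the offspring's coefficient of inbreeding F, whose expected value halves with each additional degree of relationship: 1/4 for first-degree relatives, 1/8 for second-degree, and 1/16 for third-degree (first cousins). The sketch below applies that approximation. The exact procedure used in this study is described in our previous publication 28 and may differ. The function names are hypothetical, and the autosomal genome length (about 2,881 Mbp, the summed hg19 autosomes including gaps) is an assumption. Laboratories often use array- or build-specific denominators instead.

```python
import math

# Assumption: total length of the hg19 autosomes (chr1-22), including gaps, in Mbp.
AUTOSOMAL_GENOME_MBP = 2881.0


def inbreeding_coefficient(lcsh_sizes_mbp, min_size_mbp=3.0, genome_mbp=AUTOSOMAL_GENOME_MBP):
    """Approximate F as the fraction of the autosomal genome covered by LCSH >= min_size_mbp."""
    covered = sum(size for size in lcsh_sizes_mbp if size >= min_size_mbp)
    return covered / genome_mbp


def estimated_parental_degree(f):
    """Nearest degree d of relationship between the parents, using F = (1/2) ** (d + 1)."""
    if f <= 0:
        return None
    return round(-math.log2(f) - 1)


f = inbreeding_coefficient([120.0, 45.0, 15.0, 2.5])  # hypothetical LCSH sizes, Mbp
print(f"F = {f:.4f}, degree = {estimated_parental_degree(f)}")
```

Estimates for distant relationships (sixth or seventh degree) are inherently imprecise, because background LCSH from population structure make up a larger share of the total at those levels.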
For one patient (#1068), whose LCSH pattern suggests second-degree relatedness between his parents (Table 6), a PCNV was identified on chromosome 15 (Table 2). This patient presents a complex syndromic phenotype that extends beyond the manifestations typically associated with this deletion, which are mainly ASD, DD and behavioural issues. This suggests the involvement of an additional, causal autosomal recessive developmental gene.

LCSH considered common (frequency ≥ 5%)
As extensively discussed in Chaves et al. (2019) 28, identifying the most common (recurrent) LCSH allows us to focus the analysis on the most clinically significant LCSH. Following the same reasoning and criteria as our initial study, in this new analysis we identified ten LCSH ≥ 3 Mbp occurring at a frequency of 5% or higher, and we therefore consider these LCSH possible common variation in our population. All of these LCSH except 19q13.2-q13.31(40,357,663-44,200,928), which was identified as frequent only in our dataset (Table 7), have previously been recognized as common LCSH by other research groups in clinical investigations involving patients with developmental disorders 28,36–39,108, including our previous work. These LCSH are typically considered low-recombination regions, representing blocks of ancestral haplotypes, and are generally interpreted as potentially non-pathogenic.
Wang et al. (2015) 37 identified several of these regions as recurrent LCSH without clinical relevance in a cohort of patients with NDDs that also included unaffected parents. Kearney HM 39 reported them as findings occurring at a frequency > 5% in CMA readings (CytoScan HD, Affymetrix) from affected individuals. Sanchez P 38, in an analysis of a cohort of 278 affected Hispanic individuals, considered LCSH common when their frequency exceeded 3% in CMA samples (CytoScan HD, Affymetrix). Neta et al. (2022) 108 reported the region we found on chromosome 16 at a frequency of 12.7% in a cohort of 100 patients with ID and/or ASD from the Midwest region of Brazil. Pajusalu et al. (2015) 36 reported findings similar to ours on chromosomes 3 and 11 as recurrent LCSH, with frequencies of 9.3% and 6%, respectively. That study used a minimum cutoff size of 5 Mbp and investigated 2110 consecutive Estonian patients (including prenatal and parental samples).
The LCSH considered frequent and common in the current study not only support the findings and discussion of our previous research but also raise the possibility that considering LCSHs common only at a frequency ≥ 5% may be too conservative. A lower threshold, such as 4% or 3% (the latter used by Sanchez P 38), might be a relatively safe alternative.
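To make the effect of the threshold concrete, the sketch below computes the minimum number of carriers needed among the 917 CMA results for an LCSH region to count as common at 5%, 4% and 3%. It also filters regions by carrier frequency. How carriers of a region are counted (the overlap criteria of Chaves et al. 2019) is abstracted away here. The function names, region labels and counts are hypothetical.

```python
import math


def min_carriers(total: int, threshold_pct: int) -> int:
    """Smallest number of carriers for which carriers/total reaches threshold_pct percent."""
    return math.ceil(total * threshold_pct / 100)


def common_regions(carriers_by_region, total, threshold_pct=5):
    """Regions whose carrier frequency is at least threshold_pct percent."""
    return sorted(
        region
        for region, n in carriers_by_region.items()
        if 100 * n / total >= threshold_pct
    )


for pct in (5, 4, 3):
    print(pct, min_carriers(917, pct))  # 46, 37 and 28 of 917 CMAs

# Hypothetical carrier counts for three regions.
counts = {"region A": 46, "region B": 30, "region C": 40}
print(common_regions(counts, 917))     # ['region A']
print(common_regions(counts, 917, 3))  # ['region A', 'region B', 'region C']
```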

Conclusions
In this retrospective study, we present the largest report of chromosomal microarray (CMA) data in a cohort with neurodevelopmental disorders (NDDs) and/or congenital anomalies (CAs) from southern Brazil. We achieved a diagnostic rate of 17%, consistent with the literature (15–20%). We characterized the rare copy number variations (CNVs) identified and associated them with the main phenotypes presented by each patient. The interpretation of CNVs is challenging and relies on information such as their frequency and characterization in affected populations, typically obtained from cohort studies with large sample sizes.

Figure 2. Circle plot with the pathogenic CNVs and VUS* detected in our study.

Figure 3. (A1) Classification of cases by the most relevant CNV found in the whole cohort. (A2) Classification of cases by the most relevant CNV found in the sub-cohort with ASD. (B) Diagnostic rates by ASD phenotypic category. ASD autism spectrum disorder, ID intellectual disability, DF dysmorphic features (syndromic), classical autism (including high-functioning isolated ASD cases), isolated ASD: ASD without ID and without DF/CAs. (C) Odds ratios for pathogenic CNVs in phenotype classes, shown on a log2 scale. As can be seen in (B), the diagnostic rate is slightly higher in ASD with ID than in ASD without ID (12% and 10% PCNVs, respectively). Moreover, VUS are 5 percentage points more frequent when ID is present (19% compared with 14% in ASD w/o ID). Syndromic ASD has a markedly higher diagnostic rate (16%) than non-syndromic ASD (7%).

Table 5. Cases with potential UPDs, in which a single autosomal chromosome presented LCSH(s) ≥ 3 Mbp that, alone or summed with other LCSHs ≥ 3 Mbp, reached a size of ≥ 10 Mbp, with no other LCSH over 5 Mbp on any other autosomal chromosome. *Identified in previous work (Chaves et al., 2019) 28.
2.8 Mbp (× 4), partially overlapping with this probable UPD.* Male, 2 yrs., low weight, short stature, FD, DD, mongolian spots, poor ear development, SLD, ASD, disturbed behavior, aggressive

Figure 4. Chromosomal distribution of the 27 cases with LCSH (single or sum) ≥ 10 Mbp restricted to one chromosome, suggesting putative UPDs.

Figure 5. Visualization of the chromosomal locations of the LCSHs in autosomal chromosomes considered common (frequency ≥ 5%) identified among 917 CMA results.

Table 7.
Regions of LCSH considered common (frequency ≥ 5%) identified among 917 CMA results.When the beginning and/or end of the cytobands were variable, a linear position was obtained based on the median of the beginning or end.All analyses, as well as linear positions, were based on the human reference genome, version GRCh37/hg19.(a) Chaves et al. 2019, (b) Wang et al. 2015, (c) Kearney H. M. (personal communication, 2017), (d) Sanchez P. (personal communication, 2017), (e) Pajusalu et al. 2015, (f) Neta et al. 2022.The bolded LCSH was only found in our study. 33

Table 1.
The clinical characteristics recorded for patients with negative (only benign CNVs) and pathogenic (only PCNV) CMA results.

Table 3.
Pathogenic CNVs found in the ASD cohort. Includes ASD cases of the cohort previously published 33. Pathogenic CNVs found by CMA in the cohort with ASD, with the number of genes present in the region, listing the most relevant genes and phenotypes for each individual. Dup duplication, Del deletion, CAs congenital anomaly, DD developmental delay, MildID mild intellectual disability, ModID moderate intellectual disability, SevID severe intellectual disability, ASD autism spectrum disorder, FD facial dysmorphism, SLD speech and/or language delay or impairment, IUGR intrauterine growth restriction, ADHD attention-deficit/hyperactivity disorder, LDO learning difficulty only, LD learning disability, ND not determined, F female, M male, 1 of 2 pCNVs: 1 of 2 pathogenic CNVs from one individual.

Table 6.
Details of the 4.3% of cases whose LCSH suggested first- to fourth-degree kinship between the parents. LCSH with frequency ≥ 5%.