Exon-focused targeted oligonucleotide microarray design increases detection of clinically relevant variants across multiple NHS genomic centres

In recent years, chromosomal microarrays have been widely adopted by clinical diagnostic laboratories for postnatal constitutional genome analysis and have been recommended as the first-line test for patients with intellectual disability, developmental delay, autism and/or congenital abnormalities. Traditionally, array platforms have been designed with probes evenly spaced throughout the genome and increased probe density in regions associated with specific disorders, with a resolution at the level of whole genes or multiple exons. However, this level of resolution often cannot detect pathogenic intragenic deletions or duplications, which represent a significant disease-causing mechanism. Therefore, new high-resolution oligonucleotide comparative genomic hybridisation arrays (oligo-array CGH) have been developed with probes targeting single exons of disease-relevant genes. Here we present a retrospective study on 27,756 patient samples from a consortium of state-funded diagnostic UK genomic centres assayed by either oligo-array CGH of a traditional design (CytoSure ISCA v2) or by an oligo-array CGH with enhanced exon-level coverage of genes associated with developmental disorders (CytoSure Constitutional v3). The targeted design of the CytoSure v3 array is intended to capture intragenic aberrations that would have been missed on the v2 array. To assess the relative performance of the two array designs, data on a subset of samples (n = 19,675), generated only by laboratories using both array designs, were compared. Our results demonstrate that the new high-density exon-focused targeted array design, which uses updated information from large-scale genomic studies, is a powerful tool for detecting intragenic deletions and duplications and leads to a significant improvement in diagnostic yield.

The number and size of CNVs detected using the two array designs were analysed for each classification (Figure 3.1) for the two labs that used both the v2 and v3 arrays. The CNVs within the benign classification are typically smaller than those in more pathogenic classifications, culminating in a predominance of large CNVs in the pathogenic classification. This is to be expected, as large CNVs are more likely to disrupt normal gene function, leading to a clinical phenotype.
Comparing CNV lengths across the different classifications between the two array designs reveals relatively concordant data, with the largest discrepancies evident in the VOUS and pathogenic classifications. In the VOUS classification, a larger proportion of smaller CNVs was detected by the CytoSure v3 array than by the CytoSure v2 array. In the pathogenic classification, markedly more large CNVs were detected using the CytoSure v2 array than using the CytoSure v3 array.

Supplementary Note 5. Summary of verification of the OGT CytoSure Constitutional v3 array CGH, Wessex Regional Genetics Laboratory, Salisbury NHS Foundation Trust, Salisbury, UK

Estimation of general sensitivity and specificity for array CGH testing of prenatal samples
The data used in this validation includes only samples that were:
• Analysed and found to be normal by array CGH, which was confirmed by parallel conventional karyotyping analysis
• Analysed and found to have an imbalance by array CGH, which was then confirmed by conventional karyotyping, FISH, MLPA or parental follow-up.
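As a reminder of how sensitivity and specificity follow from confirmed results of this kind, the minimal sketch below computes both from counts of concordant and discordant samples. The function name and the counts in the usage lines are placeholders chosen for illustration; they are not results from this validation.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from confirmed-result counts.

    tp: imbalance called by array CGH and confirmed by an orthogonal method
    fn: imbalance missed by array CGH but found by an orthogonal method
    tn: normal by array CGH and confirmed normal by karyotyping
    fp: imbalance called by array CGH but not confirmed
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one abnormal and one normal sample")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Placeholder counts for illustration only (hypothetical, not study data).
sens, spec = sensitivity_specificity(tp=9, fn=1, tn=45, fp=5)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Because the validation set described above contains only samples whose array result was confirmed, discordant counts would be zero by construction, so any such estimate is best read alongside the sample size.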

Results:
In total, 131 prenatal samples have been analysed using array CGH.

Results:
Technical noise for a normal call (2:2 allele ratio), exclusive of any biological variation: SD = 0.13, -2.62 to 0.262 (95%) sample type. Deming regression analysis was used to assess differences between the variance of the data sets.

Supplementary Note 4. Data acquisition and filtering.
Data was acquired, with the permission of the participating laboratories, using a plug-in (software component) for CytoSure Interpret Software. The data was retrieved from each laboratory's relational database in four separate files as detailed below.

CNV_CALLS file
Contains all calls in the database, each as a separate line item. Each line of the text file contains:
• Anonymised sample ID: a random unique string of characters, produced at runtime. No identifiable information is included in the dataset.

QC_METRICS file
Contains QC metrics of the array runs in the database, including:
• Array barcode
• QC metrics for the array (DLR spread, signal intensity, background noise, etc.)

PROTOCOLS file
Contains descriptions of every analysis protocol available in the software, including details such as:
• Number of probes required to make a call
• Segmentation algorithm used
• Thresholds used for defining a segment as loss or gain
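To illustrate how protocol parameters of this kind could interact, the sketch below applies a minimum probe count and loss/gain thresholds to an already segmented region. It is not the CytoSure Interpret implementation: the class, the function, the use of a mean log2 ratio and all numeric values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocol:
    """Hypothetical subset of the fields a PROTOCOLS entry might hold."""
    min_probes: int        # number of probes required to make a call
    loss_threshold: float  # mean log2 ratio at or below this is a loss
    gain_threshold: float  # mean log2 ratio at or above this is a gain


def classify_segment(mean_log2_ratio: float, n_probes: int, protocol: Protocol) -> str:
    """Label one segment as 'loss', 'gain' or 'no call'."""
    if n_probes < protocol.min_probes:
        return "no call"  # too few probes to support a call
    if mean_log2_ratio <= protocol.loss_threshold:
        return "loss"
    if mean_log2_ratio >= protocol.gain_threshold:
        return "gain"
    return "no call"


# Illustrative values only; real protocols define their own thresholds.
example = Protocol(min_probes=4, loss_threshold=-0.6, gain_threshold=0.35)
print(classify_segment(-0.8, n_probes=5, protocol=example))
```

Keeping these parameters alongside the calls matters for a multi-centre comparison, since the same raw segment can be called or ignored depending on the protocol in force.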

ANONYMISED_SAMPLE_IDS
Contains mappings between the anonymised sample ID and any original sample IDs used by the lab. This file is not used in any analysis and is kept by the contributing lab so that anonymised sample information can be matched back to the original sample if deemed appropriate.
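One way such a mapping could be produced is sketched below: each original ID receives a random token generated at runtime, and only the token would appear in the exported files. This is an assumption about the approach, not the plug-in's actual code; the function name, the token length and the example IDs are hypothetical.

```python
import secrets


def anonymise_sample_ids(original_ids):
    """Map each distinct original sample ID to a random, unique token.

    The returned dict (original ID -> anonymised ID) corresponds to the
    mapping file that stays with the contributing laboratory.
    """
    mapping = {}
    used = set()
    for original in original_ids:
        if original in mapping:
            continue  # the same sample keeps a single token
        token = secrets.token_hex(16)  # 32 hex characters, generated at runtime
        while token in used:           # guard against an (unlikely) collision
            token = secrets.token_hex(16)
        used.add(token)
        mapping[original] = token
    return mapping


# Hypothetical laboratory IDs, for illustration only.
mapping = anonymise_sample_ids(["LAB-001", "LAB-002", "LAB-001"])
print(len(mapping))
```

Because the tokens are random rather than derived from the original IDs, they cannot be reversed without the mapping file.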

DATA FILTERING
The initial data set obtained from the four participating laboratories comprised 31,314 unique, anonymised samples with a total of 182,348 calls. This study focuses on postnatal proband samples only, and an initial filtering step was applied to exclude samples that fell outside this criterion. These included samples run on alternative array formats (17 calls) and samples that were not compared against a known reference (i.e. sample-on-sample data; 4,038 calls).
A further 83,020 calls classified by the laboratories as "N/A" were also removed from subsequent analyses. These calls comprise artefacts (known exceptions reflective of the array build), Y-chromosome calls and any unclassified calls (calls where no final classification has been assigned, e.g. on failed samples).
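The exclusion criteria described above can be sketched as a simple predicate over call records. The record fields, the array format labels and the example values are hypothetical; the actual export format and filtering code are not described in this note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical minimal view of one line in the CNV_CALLS file."""
    sample_id: str
    array_format: str          # e.g. "v2", "v3" or another format
    has_known_reference: bool  # False for sample-on-sample hybridisations
    classification: str        # e.g. "benign", "VOUS", "pathogenic", "N/A"


STUDY_ARRAY_FORMATS = {"v2", "v3"}  # assumed labels for the two study designs


def keep_call(call: Call) -> bool:
    """Return True if the call passes all three filters."""
    return (
        call.array_format in STUDY_ARRAY_FORMATS
        and call.has_known_reference
        and call.classification != "N/A"
    )


def filter_calls(calls):
    """Keep only the calls that satisfy every criterion."""
    return [c for c in calls if keep_call(c)]


# Illustrative records only (not study data).
calls = [
    Call("a1", "v3", True, "pathogenic"),
    Call("a2", "other", True, "benign"),
    Call("a3", "v2", False, "VOUS"),
    Call("a4", "v2", True, "N/A"),
]
print([c.sample_id for c in filter_calls(calls)])
```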
After filtering, the final data set comprised 95,273 calls from 27,756 unique samples, hereafter referred to as copy number variations (CNVs) to differentiate from unfiltered data, referred to as calls. This represents a reduction of 47.75% of the total calls and 11.36% of the samples.
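The counts reported in this section can be cross-checked with a few lines of arithmetic. The sketch assumes that the three removal categories are mutually exclusive and together account for every removed call; the variable names are illustrative.

```python
# Counts as reported in the text.
initial_calls = 182_348
initial_samples = 31_314
final_samples = 27_756

removed_calls = {
    "alternative array formats": 17,
    "sample-on-sample data": 4_038,
    'classified "N/A"': 83_020,
}

# Calls remaining after subtracting every removal category.
final_calls = initial_calls - sum(removed_calls.values())

# Percentage reductions relative to the initial data set.
call_reduction = 100 * (initial_calls - final_calls) / initial_calls
sample_reduction = 100 * (initial_samples - final_samples) / initial_samples

print(final_calls)                 # 95273
print(f"{call_reduction:.2f}%")    # 47.75%
print(f"{sample_reduction:.2f}%")  # 11.36%
```

The three categories sum to 87,075 removed calls, which reproduces the reported 95,273 remaining calls and both percentages.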

Supplementary Data 1. Description
Full dataset containing pathogenic and likely pathogenic intragenic CNVs identified by the CytoSure v3 oligo-array in 27,756 unique samples that would have been missed by the v2 oligo-array.