High-resolution structural variant profiling of myelodysplastic syndromes by optical genome mapping uncovers cryptic aberrations of prognostic and therapeutic significance

Chromosome banding analysis (CBA) remains the standard of care for structural variant (SV) assessment in MDS. Optical genome mapping (OGM) is a novel, non-sequencing-based technique for high-resolution genome-wide SV profiling (SVP). We explored the clinical value of SVP by OGM in 101 consecutive, newly diagnosed MDS patients from a single center who underwent standard-of-care cytogenetic and targeted NGS studies. OGM detected 383 clinically significant, recurrent and novel SVs. Of these, 224 (51%) SVs, seen across 34% of patients, were cryptic by CBA (including rearrangements involving MECOM, NUP98::PRRX2, KMT2A partial tandem duplications, among others). SVP decreased the proportion of normal karyotype by 16%, identified complex genomes (17%) and chromothripsis (6%), and generated informative results in both patients with insufficient metaphases. Precise gene/exon-level mapping allowed assessment of clinically relevant biomarkers (TP53 allele status, KMT2A-PTD) without additional testing. SV data was complementary to NGS. When applied in retrospect, OGM results changed the comprehensive cytogenetic scoring system (CCSS) and R-IPSS risk groups in 21% and 17% of patients, respectively, with improved prediction of prognosis. By multivariate analysis, CCSS by OGM only (not CBA), TP53 mutation and BM blasts independently predicted survival. This is the first and largest study reporting the value of combined SVP and NGS for MDS prognostication.

labeled gDNA was mixed with DNA stain, stained overnight at room temperature for backbone visualization and quantified using the Qubit HS dsDNA assay kit (ThermoFisher Scientific, CA). The fluorescently labeled gDNA molecules were loaded on a Saphyr chip G2.3, and linear double-stranded gDNA molecules passing across nanochannels were imaged sequentially on a Saphyr instrument (Bionano Genomics, San Diego, CA). Effective genome coverage of approximately 300X was achieved for every tested sample (1,300 GB of data per sample), in theory enabling detection of aberrations at a 5% allele frequency (equivalent to aberrations in 10% of cells when heterozygous). Standard run quality control parameters [total DNA (≥150 kbp), N50 (≥150 kbp), map rate (≥150 kbp), effective coverage (>300X) and average label density (per 100 kbp)] were evaluated per the manufacturer's guidelines.
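The equivalence between a 5% allele frequency and 10% of cells can be made explicit with a short calculation. The sketch below is illustrative only: the function name and its parameters are assumptions introduced here, and it assumes a diploid locus with the aberration present on one allele in every affected cell.

```python
def cell_fraction_from_vaf(vaf: float, altered_copies: int = 1, ploidy: int = 2) -> float:
    """Fraction of cells carrying an aberration, given its allele fraction.

    Assumes every cell has `ploidy` copies of the locus and that affected
    cells carry the aberration on `altered_copies` of them (1 = heterozygous).
    """
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must lie in [0, 1]")
    # Each affected cell contributes altered_copies / ploidy altered alleles,
    # so the cell fraction is the allele fraction scaled by the inverse ratio.
    return min(1.0, vaf * ploidy / altered_copies)


heterozygous_cells = cell_fraction_from_vaf(0.05)                   # one altered allele per cell
homozygous_cells = cell_fraction_from_vaf(0.05, altered_copies=2)   # both alleles altered
```

Under these assumptions a heterozygous aberration at 5% allele frequency corresponds to 10% of cells, whereas the same allele frequency for a homozygous aberration would correspond to only 5% of cells.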

Data Analysis and Variant Filtering
Data was analyzed using Bionano Access (Bionano Genomics, San Diego, CA) with Genome Reference Consortium GRCh38/hg38 as the reference. Identification of SVs was based on discrepant alignment between the molecules of the sample (following assembly of consensus genome maps from molecule clusters showing the same SV) and the reference (GRCh38/hg38), with no assumption about ploidy. For fractional CN analysis, following alignment of molecules/labels against GRCh38/hg38, the sample's raw label coverage was normalized against relative coverage from normal human controls and segmented, and the baseline CN state was estimated [mode of coverage of all labels; coverage in sex chromosomes was halved if chromosome Y molecules were present]. CN states of segmented genomic intervals were assessed for significant increase/decrease from the baseline.
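The baseline-and-deviation logic of the fractional CN analysis can be sketched as follows. This is a simplified illustration, not the vendor's algorithm: the function names, the histogram bin width and the assumption of a two-copy baseline are all introduced here, the input is assumed to be already normalized against controls, and the sex-chromosome adjustment and the significance testing are not modelled.

```python
from collections import Counter
from statistics import mean


def baseline_coverage(normalized_cov, bin_width=0.05):
    """Estimate the baseline as the mode of per-label normalized coverage.

    The mode is approximated by histogram binning; bin_width is an
    arbitrary illustrative choice.
    """
    bins = Counter(round(c / bin_width) for c in normalized_cov)
    modal_bin, _ = bins.most_common(1)[0]
    return modal_bin * bin_width


def fractional_cn(segment_cov, baseline, baseline_cn=2):
    """Fractional copy number of a segment relative to the baseline state."""
    return baseline_cn * mean(segment_cov) / baseline


# Toy autosomal data: most labels sit at the baseline, with one gained
# and one lost segment.
labels = [1.0] * 8 + [1.5] * 2 + [0.5] * 2
base = baseline_coverage(labels)
gain_cn = fractional_cn([1.5, 1.5], base)
loss_cn = fractional_cn([0.5, 0.5], base)
```

In this toy input the gained segment sits at about three copies and the lost segment at about one copy against a two-copy baseline. A real pipeline would additionally test whether each segment's deviation from the baseline is statistically significant.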
Data analysis was performed in a single-blinded fashion independently by 2 users using the de novo (DN; for detection of SVs >500 bp), rare variant (RV; for detection of SVs >5,000 bp) and copy number (CN; for capturing large CNVs >500,000 bp potentially missed by SV algorithms) pipelines. The RV pipeline enabled detection of SVs occurring at low allelic fractions (~10%). Based on prior sensitivity studies using simulations, serial dilutions and cell lines, a detection sensitivity of ~95% for SVs with an allele fraction of ~10% was achieved (data not shown). The DN pipeline was primarily used for CN-LOH assessment and confirmation of SV calls >5,000 bp detected by the RV pipeline; SV calls between 500 bp and 5,000 bp were not included in this study.
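The size thresholds stated above can be written as a simple lookup. The sketch below uses hypothetical names (`PIPELINE_MIN_SIZE_BP`, `STUDY_MIN_SV_BP`, `is_reportable`) and covers only the size criterion; calls without a meaningful size, such as translocations, and the confidence-score filters are outside its scope.

```python
# Minimum SV/CNV size (bp) detected by each pipeline, as stated in the text.
PIPELINE_MIN_SIZE_BP = {"DN": 500, "RV": 5_000, "CN": 500_000}

# SV calls of 500-5,000 bp were excluded from this study.
STUDY_MIN_SV_BP = 5_000


def is_reportable(pipeline: str, size_bp: int) -> bool:
    """Return True if a call of this size is within the scope of the study."""
    if size_bp <= PIPELINE_MIN_SIZE_BP[pipeline]:
        return False  # below the pipeline's own detection size
    if pipeline in ("DN", "RV") and size_bp <= STUDY_MIN_SV_BP:
        return False  # small SV calls were not analyzed in this study
    return True


examples = {
    ("DN", 3_000): is_reportable("DN", 3_000),      # detectable by DN but excluded
    ("RV", 12_000): is_reportable("RV", 12_000),    # within scope
    ("CN", 400_000): is_reportable("CN", 400_000),  # too small for the CN pipeline
}
```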
For variant filtering, as a first step, we used the recommended size and confidence score filters for each of the three pipelines to generate a list of high-confidence SVs and copy number variants for analysis, as described elsewhere (Supplementary Table S2) [7][8][9][10]. For the second step, we used the OGM data generated from 200 healthy individuals to select only the rare variants that represent pathogenic somatic alterations by filtering out the variants seen in the normal population. Finally, as a third step, in order to select clinically significant SVs, we selected variants that overlapped the coding region of a gene/chromosome locus implicated in myeloid neoplasm, adapted from the publicly available myeloid neoplasm-specific gene list (created through a collaboration between the Cancer Genomics Consortium and the Mayo Clinic; Genomics of Oncology Annotation Team: https://www.cancergenomics.org/gene_lists.php) and the in-house 81-gene NGS mutation panel (Supplementary Table S3). The final interpretation of every call was made after visualizing the sample molecules for changes in the sequence patterns compared to the reference.
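The three filtering steps can be sketched as a single pass over the call list. Everything in this example is hypothetical: the `SVCall` fields, the gene names and coordinates, and the size and confidence cutoffs are placeholders (the actual cutoffs are those in Supplementary Table S2), and matching against the control database is reduced to a precomputed count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    chrom: str
    start: int
    end: int
    confidence: float
    control_hits: int  # healthy controls carrying a matching call (precomputed)


def overlaps(call, region):
    """True if the call shares at least one base with (chrom, start, end)."""
    chrom, start, end = region
    return call.chrom == chrom and call.start <= end and call.end >= start


def filter_calls(calls, gene_regions, min_size_bp, min_confidence):
    kept = []
    for call in calls:
        # Step 1: size and confidence score filters.
        if call.end - call.start < min_size_bp or call.confidence < min_confidence:
            continue
        # Step 2: discard variants seen in the healthy control population.
        if call.control_hits > 0:
            continue
        # Step 3: keep only calls overlapping a listed gene/locus.
        genes = [name for name, region in gene_regions.items() if overlaps(call, region)]
        if genes:
            kept.append((call, genes))
    return kept


# Toy gene list and calls with made-up coordinates.
gene_regions = {"GENE_A": ("chr1", 1_000_000, 1_050_000),
                "GENE_B": ("chr2", 5_000_000, 5_200_000)}
calls = [
    SVCall("chr1", 990_000, 1_010_000, 0.99, 0),    # rare, overlaps GENE_A
    SVCall("chr1", 2_000_000, 2_020_000, 0.99, 0),  # rare, no listed gene
    SVCall("chr2", 5_100_000, 5_150_000, 0.99, 12), # present in controls
    SVCall("chr2", 5_000_500, 5_001_000, 0.99, 0),  # below the size cutoff
]
kept = filter_calls(calls, gene_regions, min_size_bp=5_000, min_confidence=0.9)
```

Only the first toy call survives all three steps. As in the text, any such automated selection would still be followed by visual review of the molecules supporting each call.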

Limit of Detection and Reproducibility/Precision
SV detection by OGM depends on 3 inter-related parameters that can influence the limit of detection (LOD): the size (bp), type and clonal burden of the aberrations. Evaluation of the LOD is challenging due to the limited number of cells available from patient samples for serial dilution.
Therefore, we relied on other evidence to estimate the LOD. Since we set an RV pipeline threshold of >5,000 bp for SV calling in this study, we relied on white-paper data for the RV pipeline, generated from dilutions of different call types in simulated data at 300X effective coverage.
The analysis showed that deletions (≥5,000 bp), duplications (>150 kbp), insertions (5-50 kbp), inversions (>70 kbp) and translocations were detectable at 5% variant allele frequency at least 90% of the time using the RV pipeline [10]. Additionally, the LOD for CNVs and SVs was recently assessed independently in patient samples. Sahajpal et al. evaluated the LOD for deletions, duplications, aneuploidy and translocations, and showed that all variants were detected in triplicate at 5% allele fraction (10% of cells) [11]. Further, the LOD could vary in the presence of other cytogenetic abnormalities, especially a complex karyotype; hence, more systematic studies using multiple samples with 2 or more concomitant aberrations are needed.
To confirm reproducibility/precision over the study interval, we performed duplicate testing on aliquots from 4 different patient samples, 1 with a normal karyotype and the remaining

Statistical Analysis
Overall survival (OS) was calculated as the time from diagnosis to death or the last follow-up date. Patients who were alive at their last follow-up were censored on that date. The Kaplan-Meier product limit method was used to estimate the median OS for each parameter.
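For readers unfamiliar with the product limit method, a minimal pure-Python sketch follows. It is not the software used in the study; the function names and the toy follow-up times (in arbitrary units) are assumptions for illustration. An event flag of 1 denotes death and 0 denotes censoring at last follow-up.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which S(t) drops to 0.5 or below; None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Toy cohort: six patients, two of them censored.
times = [5, 8, 12, 12, 20, 30]
events = [1, 0, 1, 1, 0, 1]
curve = kaplan_meier(times, events)
median_os = median_survival(curve)
```

In this toy cohort the survival estimate falls from 5/6 after the first death to 5/12 at time 12, so the median OS is 12. If the curve never reaches 0.5, the median is not reached and the function returns `None`.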
Univariate Cox proportional hazards regression analysis was used to identify the association of each variable with OS, followed by multivariate analysis. The prognostic fitness of cytogenetic risk calculated from CBA and from OGM was compared using Harrell's concordance index.
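Harrell's concordance index measures how often, among comparable patient pairs, the patient who died earlier also had the higher predicted risk. The sketch below is a plain O(n²) illustration with hypothetical data, not the implementation used in the study; risk scores could be, for example, ordinal risk groups, and tied scores are counted as half-concordant.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event in a pair
        for j in range(n):
            # A pair is comparable when patient i died strictly before
            # patient j's death or censoring time.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the same four patients scored by two hypothetical risk models.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
c_model_a = harrell_c(times, events, [3, 2, 2, 1])
c_model_b = harrell_c(times, events, [1, 2, 2, 3])
```

A value of 0.5 indicates no discrimination and 1.0 perfect ranking. In this toy data the first scoring ranks patients better than the second, which is the kind of comparison made here between CBA-based and OGM-based cytogenetic risk.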

Supplementary Figures
Supplementary Figure S1. Schematic overview of the workflow describing the experiment and analysis for optical genome mapping (OGM). Ultra-high-molecular-weight DNA was extracted from fresh/frozen BM cells, followed by direct label and stain (DLS) labeling, linearization, and sequential imaging. The imaging data was converted to molecules that were assembled de novo to generate consensus genome maps using a reference.