Integrated DNA/RNA targeted genomic profiling of diffuse large B-cell lymphoma using a clinical assay

We sought to define the genomic landscape of diffuse large B-cell lymphoma (DLBCL) using formalin-fixed, paraffin-embedded (FFPE) biopsy specimens. We used targeted sequencing of genes altered in hematologic malignancies, covering the DNA coding sequence of 405 genes, the noncoding sequence of 31 genes, and the RNA coding sequence of 265 genes (FoundationOne-Heme). Short variants, rearrangements, and copy number alterations were determined. We studied 198 samples (114 de novo, 58 previously treated, and 26 large-cell transformations from follicular lymphoma). The median number of genomic alterations (GAs) per case was 6, and 97% of patients harbored at least one alteration. Recurrent GAs were detected in genes with established roles in DLBCL pathogenesis (e.g. MYD88, CREBBP, CD79B, EZH2), and we also observed notable differences from prior studies, such as inactivating mutations in TET2 (5%). Less common GAs identified potential targets for approved or investigational therapies, including BRAF, CD274 (PD-L1), IDH2, and JAK1/2. TP53 mutations were observed more frequently in relapsed/refractory DLBCL and predicted a lack of response to first-line chemotherapy, identifying a subset of patients who could be prioritized for novel therapies. Overall, 90% (n = 169) of patients harbored a GA that could be explored for therapeutic intervention, and 54% (n = 107) harbored more than one putative target.


Description of workflow
DNA and RNA from each patient are extracted and made into barcoded libraries through separate workflow streams. The DNA and cDNA undergo library construction and hybrid selection on independent plates. DNA and RNA samples from the same patient are then matched in the analysis pipeline by plate name and shared specimen ID.
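The convergence step amounts to keying both analyte streams on the shared specimen ID. The sketch below is a minimal illustration of that join, assuming a hypothetical record layout of (plate_name, specimen_id, analyte) tuples; the plate names and IDs are invented, and the real pipeline's data model is not described here.

```python
from collections import defaultdict

def pair_by_specimen(records):
    """Group library records so DNA and RNA from one patient share a specimen key.

    Each record is a hypothetical (plate_name, specimen_id, analyte) tuple,
    where analyte is "DNA" or "RNA".
    """
    paired = defaultdict(dict)
    for plate_name, specimen_id, analyte in records:
        paired[specimen_id][analyte] = plate_name
    return dict(paired)

def ready_for_analysis(paired):
    """Return specimen IDs whose DNA and RNA libraries have both arrived."""
    return sorted(sid for sid, libs in paired.items() if {"DNA", "RNA"}.issubset(libs))

records = [
    ("DNA_PLATE_01", "S001", "DNA"),
    ("RNA_PLATE_07", "S001", "RNA"),
    ("DNA_PLATE_01", "S002", "DNA"),
]
paired = pair_by_specimen(records)
print(ready_for_analysis(paired))  # ['S001']
```

Because the plates are processed independently, a specimen can sit half-paired (S002 above) until its second library completes.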

DNA and RNA extraction
DNA and RNA are extracted from FFPE samples as previously described. 1,2 A 5 µm FFPE section is stained with hematoxylin and eosin and reviewed by a pathologist to confirm ≥20% tumor nuclei and a tissue volume of ≥2 mm³. Macro-dissection is performed when warranted to enrich for tumor content. DNA and RNA are each extracted from 40 µm (typically 4 × 10 µm) of unstained FFPE sections.
DNA extraction: FFPE samples are deparaffinized and then digested in proteinase K buffer for 12-24 h, followed by purification with the Promega Maxwell 16 Tissue LEV DNA kit. Double-stranded DNA is quantified by a PicoGreen fluorescence assay using the provided lambda DNA standards (Invitrogen). Between 50 and 200 ng of dsDNA in 50-100 µl of water in microTUBEs is fragmented to ~200 bp by sonication (3 min, 10% duty, intensity = 5, 200 cycles/burst; Covaris E210) and then purified using a 1.8× volume of AMPure XP beads (Agencourt). Samples yielding <50 ng of extracted DNA are considered failed (estimated failure rate 4.9% 1 ).
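The DNA input rules reduce to a yield gate and two small calculations. In this sketch the 50 ng failure threshold, the 50-200 ng window, and the 1.8× bead ratio come from the text; capping the sonication input at 200 ng when more DNA is available is an assumption, as are the function names.

```python
def dna_extraction_passes(total_ng: float, min_ng: float = 50.0) -> bool:
    """Samples yielding < 50 ng of extracted DNA are considered failed."""
    return total_ng >= min_ng

def shearing_input_ng(total_ng: float, max_ng: float = 200.0) -> float:
    """Amount of dsDNA taken into sonication (the text gives a 50-200 ng window).

    Capping at 200 ng when more DNA is available is an assumption of this sketch.
    """
    if not dna_extraction_passes(total_ng):
        raise ValueError(f"{total_ng} ng is below the 50 ng extraction threshold")
    return min(total_ng, max_ng)

def ampure_bead_volume_ul(sample_volume_ul: float, ratio: float = 1.8) -> float:
    """A 1.8x cleanup adds bead suspension equal to 1.8 times the sample volume."""
    return sample_volume_ul * ratio

print(shearing_input_ng(320.0))     # 200.0
print(ampure_bead_volume_ul(50.0))  # 90.0
```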
RNA extraction: FFPE samples are deparaffinized and then digested in proteinase K lysis buffer at 56 °C for 15 min, followed by 80 °C for 15 min. The lysate is treated with freshly prepared DNase at room temperature for 10 min and then purified using the Promega Maxwell CSC RNA FFPE kit. RNA yield is quantified using RiboGreen (LifeTech). Samples with an RNA yield ≥3.5 ng/µL proceed to cDNA synthesis. RNA is normalized to 500 ng in a volume of 22.7 µL. A cDNA primer mixture of random hexamers (IDT) and oligo dT (IDT) is annealed to the template RNA at 65 °C for 5 min. First-strand synthesis is performed using M-MLV RT RNase(H-) (Promega #M3683) at 25 °C for 10 min, 40 °C for 50 min, and 85 °C for 5 min. Second-strand synthesis follows using the NEB Second strand mRNA synthesis kit (#E6111L), with incubation at 16 °C for 30 min. The entire cDNA product is sheared by sonication to a fragment size of ~200 bp (3 min, 10% duty, intensity = 5, 200 cycles/burst; Covaris E210) and then purified with a 1.8× SPRI cleanup.
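The RNA normalization step is simple volume arithmetic: the RNA volume is the 500 ng target divided by the measured concentration, topped up with water to 22.7 µL. Note that at the 3.5 ng/µL threshold, 22.7 µL holds only about 79 ng, so dilute samples cannot reach 500 ng. This sketch assumes (hypothetically) that such samples are loaded undiluted at the full 22.7 µL; the protocol text does not say how they are handled.

```python
def rna_normalization(conc_ng_per_ul: float,
                      target_ng: float = 500.0,
                      final_volume_ul: float = 22.7,
                      min_conc_ng_per_ul: float = 3.5):
    """Plan the RNA + water volumes for cDNA synthesis.

    Returns None for samples below the 3.5 ng/uL yield threshold; otherwise
    (rna_ul, water_ul, rna_ng). If the sample is too dilute to supply 500 ng
    in 22.7 uL, this sketch assumes the full 22.7 uL is used undiluted.
    """
    if conc_ng_per_ul < min_conc_ng_per_ul:
        return None
    rna_ul = min(target_ng / conc_ng_per_ul, final_volume_ul)
    water_ul = final_volume_ul - rna_ul
    return rna_ul, water_ul, rna_ul * conc_ng_per_ul

rna_ul, water_ul, rna_ng = rna_normalization(50.0)
# 10 uL of a 50 ng/uL sample supplies the 500 ng target; water makes up the rest.
```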

Library construction and hybrid selection
Solution hybridization is performed using pools of 5' biotinylated 120 bp oligonucleotide DNA baits (Integrated DNA Technologies): a pool of 35,845 baits for the DNA libraries and a pool of 22,656 baits for the cDNA libraries. Baits were designed by tiling overlapping 120 bp intervals across target exons (60 bp overlap) and introns (20 bp overlap), with a minimum of three baits per target; SNP targets were allocated one bait each. Intronic baits were filtered for repetitive elements as defined by the UCSC Genome Browser RepeatMasker track. 3 Capture of targets with reproducibly low coverage was boosted by increasing the number of baits for those targets.
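The tiling rule can be sketched as follows: consecutive 120 bp baits advance by 120 minus the overlap (60 bp steps for exons, 100 bp steps for introns) until the target is covered, with at least three baits per target. How short targets are padded is not specified, so centering the tiled block on the target is an assumption here. Repeat filtering and the low-coverage boosting step are not modeled.

```python
import math

def tile_baits(start, end, bait_len=120, overlap=60, min_baits=3):
    """Tile overlapping baits across a target interval [start, end).

    overlap=60 mirrors the exon design and overlap=20 the intron design.
    Centering the tiled block on short targets is an assumption of this sketch.
    """
    step = bait_len - overlap
    length = end - start
    # Fewest baits whose combined span reaches the end of the target.
    n = 1 + math.ceil(max(0, length - bait_len) / step)
    n = max(n, min_baits)
    span = (n - 1) * step + bait_len
    first = start - (span - length) // 2
    return [(first + i * step, first + i * step + bait_len) for i in range(n)]

exon_baits = tile_baits(1000, 1300)                # 60 bp overlap
intron_baits = tile_baits(5000, 5500, overlap=20)  # 20 bp overlap
print(exon_baits)  # [(1000, 1120), (1060, 1180), (1120, 1240), (1180, 1300)]
```

The denser exon tiling gives each exonic base roughly two baits, while the sparser intron tiling keeps the bait count manageable across the much larger intronic territory.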
SPRI purification and subsequent library construction with NEBNext kits (E6040S, NEB), which contain mixes for end repair, dA addition, and ligation, are performed in 96-well plates (Eppendorf) on a Bravo Benchbot (Agilent) using the "with-bead" protocol 4 to maximize reproducibility and library yield. For hybrid selection, 500-2,000 ng of sequencing library is suspended in water, heat denatured at 95 °C for 5 min, and then incubated at 68 °C for 5 min before addition of the bait set reagent together with Cot, salmon sperm, and adaptor-specific blocker DNA in hybridization buffer. After a 24 h incubation, library-bait duplexes are captured on paramagnetic MyOne streptavidin beads (Invitrogen), and off-target library is removed by washing once with 1× SSC at 25 °C and four times with 0.25× SSC at 55 °C. PCR master mix is then added to amplify the captured library directly from the washed beads (12 cycles). 4 After amplification, samples are purified with a 1.8× SPRI cleanup, quantified by qPCR (Kapa), and sized on a LabChip GX (Caliper). Samples yielding <500 ng of sequencing library, or with a mean insert size >400 bp, are considered failed. No size selection is performed. Libraries are normalized to 1.05 nM and pooled at up to four samples per Illumina HiSeq 2500 lane (32 per flowcell), then sequenced with 49 × 49 paired-end reads according to the manufacturer's protocols to ~500× unique coverage for DNA and >3 M unique on-target read pairs for cDNA.
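The post-capture QC gate and the pooling layout can be expressed directly. The thresholds (≥500 ng yield, ≤400 bp mean insert) and the four-samples-per-lane, 32-per-flowcell layout come from the text, which implies eight lanes per flowcell. Filling lanes in input order and the sample values below are assumptions for illustration.

```python
def library_passes(yield_ng: float, mean_insert_bp: float) -> bool:
    """Fail libraries with < 500 ng yield or mean insert size > 400 bp."""
    return yield_ng >= 500 and mean_insert_bp <= 400

def assign_lanes(sample_ids, per_lane=4, lanes_per_flowcell=8):
    """Lay samples out at up to 4 per lane; 8 lanes x 4 = 32 per flowcell.

    Returns (sample_id, flowcell_number, lane_number), both 1-based.
    Filling lanes in input order is an assumption of this sketch.
    """
    per_flowcell = per_lane * lanes_per_flowcell
    layout = []
    for i, sample_id in enumerate(sample_ids):
        flowcell, slot = divmod(i, per_flowcell)
        layout.append((sample_id, flowcell + 1, slot // per_lane + 1))
    return layout

# Hypothetical (yield_ng, mean_insert_bp) values
libraries = {"S001": (820, 310), "S002": (430, 290), "S003": (1500, 420)}
passing = [sid for sid, (ng, insert) in libraries.items() if library_passes(ng, insert)]
print(passing)  # ['S001']
```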

Sequence data processing
Sequence data were mapped to the human reference genome (hg19) using the BWA aligner v0.5.9. 5 PCR duplicate removal and sequence metric collection were performed using Picard 1.47 (http://picard.sourceforge.net/) and Samtools 0.1.12a33. Local alignment optimization was performed using GATK 1.0.4705. 6 Variant calling was restricted to the genomic regions targeted by the test.
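To show what duplicate removal does, here is a simplified sketch in the spirit of Picard's approach: pairs sharing the same chromosome, 5' end coordinates, and strands are treated as PCR copies of one molecule, and the copy with the highest summed base quality is kept. This is not Picard's implementation. It ignores soft-clipping, library IDs, and optical duplicates, and it assumes both mates map to the same chromosome. The ReadPair type is hypothetical.

```python
from typing import NamedTuple

class ReadPair(NamedTuple):
    name: str
    chrom: str      # this simplified sketch assumes both mates map to one chromosome
    pos1: int       # 5' coordinate of read 1
    strand1: str
    pos2: int       # 5' coordinate of read 2
    strand2: str
    qual_sum: int   # summed base qualities of both reads

def remove_duplicates(pairs):
    """Keep one pair per (chromosome, 5' ends, strands) signature."""
    best = {}
    for pair in pairs:
        # Sort the two ends so a pair and its mate-swapped twin share a key.
        ends = tuple(sorted([(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]))
        key = (pair.chrom, ends)
        if key not in best or pair.qual_sum > best[key].qual_sum:
            best[key] = pair
    return list(best.values())

pairs = [
    ReadPair("a", "chr3", 100, "+", 300, "-", 900),
    ReadPair("b", "chr3", 100, "+", 300, "-", 950),  # duplicate of "a", higher quality
    ReadPair("c", "chr3", 120, "+", 300, "-", 800),
]
print(sorted(p.name for p in remove_duplicates(pairs)))  # ['b', 'c']
```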

Base substitutions, indels, and copy number analysis
Samples with a median exon coverage of 150-250× are considered qualified, whereas those with a median coverage <150× are considered failed. Significant non-synonymous variants were defined as any somatic alteration annotated in the COSMIC database (v62), as well as clear inactivating mutations (i.e. truncations or deletions) in established tumor suppressor genes. 7 For base substitutions, the mutant allele frequency (MAF) cutoff was 1% for known somatic variants (based on COSMIC v62) and 5% for novel somatic variants.
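The coverage bands and MAF cutoffs in this section are lookup rules. The sketch below encodes the cutoffs this section gives for base substitutions (1% known, 5% novel) and for indels (3% known, 10% novel). Two details are assumptions: treating coverage above 250× as a plain pass (the text names only the two lower bands), and treating each cutoff as inclusive.

```python
def coverage_status(median_exon_coverage: float) -> str:
    """Median exon coverage < 150x fails; 150-250x is 'qualified'.

    Treating > 250x as a plain pass is an assumption; the text only names
    the two lower bands.
    """
    if median_exon_coverage < 150:
        return "failed"
    if median_exon_coverage <= 250:
        return "qualified"
    return "pass"

# (variant type, known in COSMIC v62?) -> minimum mutant allele frequency
MAF_CUTOFFS = {
    ("substitution", True): 0.01,
    ("substitution", False): 0.05,
    ("indel", True): 0.03,
    ("indel", False): 0.10,
}

def passes_maf(variant_type: str, in_cosmic: bool, maf: float) -> bool:
    # Whether the cutoff is inclusive (>=) is an assumption of this sketch.
    return maf >= MAF_CUTOFFS[(variant_type, in_cosmic)]

print(passes_maf("substitution", True, 0.02))   # True: known, above 1%
print(passes_maf("substitution", False, 0.02))  # False: novel, below 5%
```

Known COSMIC variants get the lower thresholds because prior evidence makes a low-frequency call at a recurrent hotspot more credible than one at a novel site.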
To detect indels, de novo local assembly in each targeted exon was performed using the de Bruijn approach. 8 The key steps are:
1. Collecting all read pairs for which at least one read maps to the target region.
2. Decomposing each read into constituent k-mers and constructing an enumerable graph representation (de Bruijn) of all candidate nonreference haplotypes present.
3. Evaluating the support of each alternate haplotype with respect to the raw read data to generate mutational candidates. All reads are compared to each candidate haplotype through ungapped alignment, and each read's 'vote' is assigned to the candidate with the best match. Ties between candidates are resolved by splitting the read's vote, weighted by the number of reads already supporting each haplotype. This process is iterated until a 'winning' haplotype is selected.
4. Aligning candidates against the reference genome to report mutation calls. Indel candidates arising from direct read alignment were also considered.
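The assembly and voting steps above can be illustrated with a toy sketch. It rests on several simplifications. The reads are plain strings assumed to already overlap the target (step 1). The source and sink k-mers are reference anchors flanking the region. Tied votes are re-split for a fixed number of rounds instead of until a winner is declared, and step 4 (alignment back to the reference) is omitted. The sequences and k = 4 are invented for illustration; production assemblers use much larger k and handle sequencing errors and repeats.

```python
from collections import defaultdict

def build_graph(reads, k):
    """Step 2: link each k-mer to the k-mer that follows it in a read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            graph[read[i:i + k]].add(read[i + 1:i + k + 1])
    return graph

def enumerate_haplotypes(graph, source, sink, max_len=500):
    """Walk every path from the source k-mer to the sink k-mer."""
    haplotypes, stack = [], [(source, source)]
    while stack:
        node, seq = stack.pop()
        if node == sink:
            haplotypes.append(seq)
            continue
        if len(seq) >= max_len:  # guards against cycles in repetitive sequence
            continue
        for nxt in graph.get(node, ()):
            stack.append((nxt, seq + nxt[-1]))
    return haplotypes

def mismatches(read, hap):
    """Fewest mismatches over all ungapped placements of read on hap."""
    best = len(read)
    for off in range(len(hap) - len(read) + 1):
        best = min(best, sum(a != b for a, b in zip(read, hap[off:off + len(read)])))
    return best

def vote(reads, haplotypes, rounds=10):
    """Step 3: one vote per read; tied votes are split by current support."""
    support, tied = [0.0] * len(haplotypes), []
    for read in reads:
        scores = [mismatches(read, h) for h in haplotypes]
        winners = [i for i, s in enumerate(scores) if s == min(scores)]
        if len(winners) == 1:
            support[winners[0]] += 1
        else:
            tied.append(winners)
    unique = support[:]
    for _ in range(rounds):
        updated = unique[:]
        for winners in tied:
            total = sum(support[i] for i in winners)
            for i in winners:
                updated[i] += support[i] / total if total else 1 / len(winners)
        support = updated
    return support

ref = "AACCGGTTACGT"
alt = "AACCGGTACGT"  # ref with one T deleted
reads = [ref] * 3 + [alt] * 5 + ["AACCGG"]  # the last read fits both haplotypes
haps = enumerate_haplotypes(build_graph(reads, k=4), source=ref[:4], sink=ref[-4:])
support = vote(reads, haps)
winner = haps[support.index(max(support))]
print(winner == alt)  # True
```

The ambiguous read "AACCGG" is split 3:5 between the two haplotypes, mirroring the weighting by existing support described in step 3.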
For indels, the MAF cutoff was 3% for known somatic variants and 10% for novel somatic variants. Additional details were described previously. 1
Copy number alteration (CNA) detection was achieved using a comparative genomic hybridization (CGH)-like method. A log-ratio profile of the sample is obtained by normalizing the sequence coverage at all exons against a process-matched normal control. This profile is corrected for GC bias, segmented, and interpreted using the allele frequencies of sequenced SNPs to estimate tumor purity and the copy number of each segment, as previously described. 1 Fitting is performed using Gibbs sampling, which assigns an absolute copy number to every segment. Model quality is reviewed and alternative explanations are considered. 9 Focal amplifications are called at segments with ≥6 copies (≥7 for triploid and ≥8 for tetraploid tumors), and homozygous deletions at segments with 0 copies, in samples with purity >20%.
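The Gibbs-sampling fit itself is beyond a short example. The sketch below instead assumes purity p and ploidy P have already been estimated, and it inverts a commonly used purity/ploidy mixture model, R = (pC + 2(1 − p)) / (pP + 2(1 − p)), to turn a segment's log2 ratio into an integer copy number C. It then applies the focal-call thresholds stated above. The mixture model is an assumption and may not match the pipeline's exact model, as is rounding average ploidy to choose the triploid or tetraploid threshold.

```python
import math

def segment_copy_number(log2_ratio, purity, ploidy):
    """Invert a standard purity/ploidy model for one segment's coverage ratio.

    Observed ratio R = (p*C + 2*(1 - p)) / (p*P + 2*(1 - p)), where p is
    tumor purity, P tumor ploidy and C the segment's absolute copy number.
    """
    ratio = 2 ** log2_ratio
    normal = 2 * (1 - purity)  # contribution of diploid normal cells
    raw = (ratio * (purity * ploidy + normal) - normal) / purity
    return max(0, round(raw))

def amplification_threshold(ploidy):
    # >= 6 copies, or >= 7 for triploid and >= 8 for tetraploid tumors;
    # rounding average ploidy to the nearest integer is an assumption.
    level = round(ploidy)
    if level >= 4:
        return 8
    if level == 3:
        return 7
    return 6

def call_segment(log2_ratio, purity, ploidy):
    if purity <= 0.20:
        return None  # focal calls are made only in samples with purity > 20%
    copies = segment_copy_number(log2_ratio, purity, ploidy)
    if copies == 0:
        return copies, "homozygous deletion"
    if copies >= amplification_threshold(ploidy):
        return copies, "amplification"
    return copies, "no focal call"

# A diploid tumor at 60% purity: an 8-copy segment gives R = 5.6 / 2.0 = 2.8.
print(call_segment(math.log2(2.8), purity=0.6, ploidy=2.0))  # (8, 'amplification')
```

The purity term explains why low-purity samples are excluded: as p shrinks, normal-cell DNA dilutes the ratio toward 1 and small errors in R translate into large errors in C.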

Rearrangement calling methods
Gene rearrangements were detected by identifying clusters of chimeric read pairs in both DNA and RNA. In DNA, chimeric pairs are those mapping >10 kbp apart or to different chromosomes; in RNA, they are pairs mapping to RefSeq sequences of different genes or to genomic loci >10 kbp apart, with suboptimally mapped reads realigned to the whole-genome reference. Alignments to the two references were then merged and calibrated against the full genome reference (hg19) for fusion detection. Chimera clusters were filtered for repetitive sequence (average MAPQ >30) and by the distribution of mapped positions (SD >10). Identified rearrangements were then annotated according to the genomic loci of both clusters and categorized as gene fusions (e.g. BCR-ABL1), gene rearrangements (e.g. IGH-BCL2), or truncating events (e.g. TP53 rearrangement). Finally, rearrangement candidates were filtered by the number of chimeric reads supporting each event: a minimum of 10 for documented fusions and 50 for putative somatic driver rearrangements.
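The DNA chimera definition and the cluster filters translate into a few predicates. The 10 kbp distance, MAPQ >30, SD >10, and the 10/50 read minimums come from the text. Using the population standard deviation, and counting one entry per chimeric read in each cluster, are assumptions, and the coordinates are illustrative.

```python
from statistics import mean, pstdev

def is_chimeric_dna_pair(chrom1, pos1, chrom2, pos2, min_gap_bp=10_000):
    """A DNA read pair is chimeric if its mates map > 10 kbp apart or to different chromosomes."""
    return chrom1 != chrom2 or abs(pos1 - pos2) > min_gap_bp

def cluster_passes(mapqs, positions, documented_fusion):
    """Apply the cluster filters; each list holds one entry per chimeric read."""
    if mean(mapqs) <= 30:        # repetitive sequence: average MAPQ must exceed 30
        return False
    if pstdev(positions) <= 10:  # stacked reads: SD of mapped positions must exceed 10
        return False
    min_reads = 10 if documented_fusion else 50
    return len(mapqs) >= min_reads

print(is_chimeric_dna_pair("chr8", 1_000, "chr14", 1_000))  # True
```

The SD filter removes clusters whose reads all start at the same position, a pattern more consistent with PCR or alignment artifacts than with independent molecules spanning a true breakpoint.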
In addition to the de novo rearrangement detection described above, reads were separately aligned to a custom reference library built from common fusions and rearrangements. Fusions were detected from reads aligned across the junctions of rearrangement breakpoints. Immunoglobulin heavy locus (IGH) rearrangements were detected by targeting rearrangement hotspots in common immunoglobulin fusion partner genes (major and minor translocations involving MYC, BCL2, and CCND1), as well as IGH breakpoint regions.
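Junction-spanning support on a custom fusion contig can be counted by checking whether each alignment extends a minimum distance into both partners. In this sketch the contig layout (partner A on [0, 500), partner B from 500 onward) is hypothetical, and so is the 10 bp minimum anchor; the text does not state an anchor length.

```python
def spans_junction(aln_start, aln_end, breakpoint, min_anchor=10):
    """True if a read aligned to [aln_start, aln_end) on a fusion contig
    extends at least min_anchor bp on both sides of the breakpoint."""
    return breakpoint - aln_start >= min_anchor and aln_end - breakpoint >= min_anchor

def count_junction_reads(alignments, breakpoint, min_anchor=10):
    """Count (start, end) alignments that support the fusion junction."""
    return sum(spans_junction(s, e, breakpoint, min_anchor) for s, e in alignments)

# Hypothetical contig: partner A occupies [0, 500), partner B starts at 500.
alignments = [(450, 549), (495, 594), (300, 399), (480, 579)]
print(count_junction_reads(alignments, breakpoint=500))  # 2
```

Requiring an anchor on both sides guards against reads that only graze the breakpoint, whose few bases in one partner could align there by chance.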