False negatives in GBA1 sequencing due to polymerase dependent allelic imbalance

A variant in the GBA1 gene is one of the most common genetic risk factors to develop Parkinson’s disease (PD). Here the serendipitous finding is reported of a polymerase dependent allelic imbalance when using next generation sequencing, potentially resulting in false-negative results when the allele frequency falls below the variant calling threshold (by default commonly at 30%). The full GBA1 gene was sequenced using next generation sequencing on saliva derived DNA from PD patients. Four polymerase chain reaction conditions were varied in twelve samples, to investigate the effect on allelic imbalance: (1) the primers (n = 4); (2) the polymerase enzymes (n = 2); (3) the primer annealing temperature (Ta) specified for the used polymerase; and (4) the amount of DNA input. Initially, 1295 samples were sequenced using Q5 High-Fidelity DNA Polymerase. 112 samples (8.6%) had an exonic variant and an additional 104 samples (8.0%) had an exonic variant that did not pass the variant frequency calling threshold of 30%. After changing the polymerase to TaKaRa LA Taq DNA Polymerase Hot-Start Version: RR042B, all samples had an allele frequency passing the calling threshold. Allele frequency was unaffected by a change in primer, annealing temperature or amount of DNA input. Sequencing of the GBA1 gene using next generation sequencing might be susceptible to a polymerase specific allelic imbalance, which can result in a large amount of flase-negative results. This was resolved in our case by changing the polymerase. Regions displaying low variant calling frequencies in GBA1 sequencing output in previous and future studies might warrant additional scrutiny.


Scientific Reports
| (2021) 11:161 | https://doi.org/10.1038/s41598-020-80564-y www.nature.com/scientificreports/ in another study 6 , could not be confirmed by our sequencing method. Upon further investigation, the previous finding turned out to be a true positive result, while in our NGS method the variant was present, but it did not pass the default variant calling filter (heterozygous variant detected in more than 30% of the reads). A heterozygous allele should have a variant calling frequency of approximately 50% for both variants and a homozygous allele should have a variant calling frequency of approximately 100%, with very little noise using modern techniques 7 . Upon experimental lowering of this variant calling filter to 2%, the total GBA1 variant hit-rate almost doubled, primarily driven by the relatively common NM_000157.3:c.1093G > A;p.(Glu365Lys) (allelic name E326K) variant. This paper describes how changing the polymerase enzyme normalized all variant frequencies, thereby uncovering the false negative results, by using a structured assessment of different primers, PCR primer annealing temperatures (T a ), amounts of DNA input and two different polymerases.
Based on intronic variants, including known benign variants, many samples without exonic variants were also imbalanced. Due to uncertainty of sequencing in GC-rich and repeat regions, some intronic variants with a low frequency could be sequencing or mapping errors as opposed to imbalanced amplification. Therefore, allelic These are the first 216 samples with a suspected GBA1 exonic variant, initially sequenced using the Q5 polymerase and later repeated with TaKaRa polymerase. The top row shows histograms combining all exonic variants. The bottom row shows dot plots with variant frequencies per specific exonic variant. In normal circumstances, one would only expect a variant frequency around 50% and 100%. 44 samples contained two exonic variants (primarly p.[Asp179His;Glu365Lys] (D140H + E326K)), one sample contained three variants and one sample four. Frequencies below 30% are generally filtered out; the filter was customly set to 2% here. Variant details can be found in our previous publication 5 . Assessment of PCR conditions. PCR yield of human control DNA per T a for Q5 and TaKaRa using all four primer sets can be seen in Fig. 2. PCR yield using TaKaRa was generally higher than using Q5. A T a of 62 °C for Q5 and 63 °C for TaKaRa was chosen. The PCR product increased with increasing DNA input (4 ng, 20 ng, 100 ng) of the human control samples. Using the PD samples, primer set 2 had the lowest yield. Samples using primer set 2, control samples with 4 ng DNA input and negative controls were omitted from library preparation and sequencing. Variant frequencies per polymerase and per primer set of control samples with varying DNA input can be seen in Table 1 and variant frequencies of the PD samples can be seen in Table 2. Difference in DNA input did not affect the variant frequencies, based on control samples with 20 ng or 100 ng input. Choice of primers did not affect the variant frequencies, based on three different primer sets unique to the GBA1 gene. Samples amplified using TaKaRa polymerase showed balanced variant frequencies, including the initially imbalanced samples using Q5 polymerase. PD sample 1 had a low yield after PCR using primer set 1 and TaKaRa polymerase and PD sample 9 had a low yield after PCR using primer set 3 and Q5 polymerase, therefore these two samples could not reliably be analyzed.

Discussion
This paper describes the serendipitous finding of a polymerase dependent allelic imbalance when sequencing the GBA1 gene, resulting in a high number of false negative results. In our cohort, the variant hit rate increased by 93% after changing the polymerase, primarily driven by the relatively common c.1093G > A;p.(Glu365Lys) (E326K) variant and the [c.535G > C];[c.1093G > A];p.[(Asp179His);(Glu365Lys)] (D140H + E326K) complex allele (a likely Dutch founder allele). This artifact was initially disguised by the commonly used variant calling filter set at a frequency of 30%. Our previous publication 5 was based on correct data after this artifact was detected and corrected. Considering this finding, we strongly advise to further explore the lower frequency regions of  Preferential engagement of the polymerase to a specific allele can have multiple causes, like differences in GC content between alleles, heterozygous variants in the primer region, methylation status or altered folding [8][9][10][11] . A different variant in the primer annealing site was excluded as a cause due to equivalent results when using different primer sets. No specific intron or exon variant could be detected that differentiated between balanced and imbalanced samples. At this point, it remains unclear what causes this imbalance. Similarly, it is unclear how this translates to other polymerases or whether this could be resolved using modified PCR conditions. Concentration of the Q5 polymerase enzyme could not be varied, because this is provided in a mastermix solution.
Methylation status can alter the DNA secondary structure and melting properties 10 . Amplification is not prevented by methylation, but it is rather driven preferentially toward the unmethylated allele when the methylation status of the two alleles is distinct 11 . Similarly, altered DNA strand folding, like hairpin and G-Quadruplex structures, can also interfere with PCR 12,13 . Coadjuvants to improve DNA denaturation, and performing PCR amplification in a KCl free buffer have been suggested to circumvent these problems 11,12 . Histone modifications are known to be involved in epigenetic modification 9 , but these are typically removed during DNA purification, so an effect on PCR seems unlikely.
A previous assessment of allele dropout showed a majority to be caused by nonreproducible PCR failures rather than sequence variants 14 , but this seems unlikely in the current report, considering the widespread and reproducible imbalances reported.
Most exonic variants, if abnormal in read frequency, displayed a decrease in frequency to below 30%. Some exonic variants however, like c.1223C > T;p.(Thr408Met) (T369M) and c.1226A > G;p.(Asn409Ser) (N370S), only showed normal or abnormally high read frequencies. Samples with a variant frequency up to 94.5% turned out to be heterozygous (correct frequency ~ 50%) after changing to TaKaRa polymerase. This shows that also marginally abnormal results should be interpreted with caution.
Most exonic variants that occurred in an imbalanced frequency in certain samples, were also seen in a normal frequency in other samples (using Q5 polymerase). This implies that the exonic variant is not exclusively responsible for the imbalance. Conversely, exonic variants that were only seen in a normal frequency, are not precluded from a potential imbalance, because these variants were less prevalent and may therefore have only been detected with balanced frequencies by chance. Using the TaKaRa polymerase, the allelic imbalance was eliminated for all exonic variants and most intronic variants, but the imbalance could still be seen for some remaining intronic variants ( Supplementary Figs. 2 and 3). These intronic variants mostly were in high-repeat intronic regions, or in other regions with a relatively low coverage, considered technical noise. In samples containing these imbalanced intronic variants, other intronic (and sometimes exonic) variants did have balanced frequencies, strengthening the notion that the imbalanced intronic variants were a technical artifact.
These findings could also explain discrepancies between multiple reports from the same or nearby populations. Both in the United Kingdom and in Spain, the c.1093G > A;p.(Glu365Lys) (E326K) variant was initially not found 15,16 , but in later cohorts it was reported [17][18][19] .
Considering the ongoing drug development targeting the GBA1 pathway, more and more people with Parkinson's disease will be screened for GBA1 variants. Both for counseling purposes and for adequate enrollment in upcoming clinical trials, a reliable sequencing method is essential. Genotyping. Saliva was obtained from patients using Oragene DNA OG-500 tubes (DNA Genotek). DNA isolation, next generation sequencing (NGS) and data analysis was performed by GenomeScan B.V., Leiden, the Netherlands, as described previously 5 . The GBA1 gene was unambiguously amplified using primers unique to the functional gene; these primers were used previously 20 . Initially, the Q5 Hot Start High-Fidelity 2X Master Mix (New England BioLabs Inc.) was used. After the imbalanced variant frequencies were detected, a structured assessment of various PCR conditions was conducted. DNA was isolated according to standardized procedures using the QIAsymphony DSP DNA Midi Kit (Qiagen). The DNA concentration of the samples was determined using Picogreen (Invitrogen) measurement prior to amplification. Afte PCR, the long-range PCR product was fragmented using the Bioruptor Pico (Diagenode) to an average size of 300-500 bp before sequencing on an Illumina sequencer. Library preparation was performed using the NEBNext Ultra II DNA Library Prep kit (New England Biolabs E7370S/L). End repair/A-tailing, ligation of sequencing adapters and PCR amplification was performed according to the procedure described in the NEBNext Ultra DNA Library Prep kit instruction manual. The quality and yield of the library preparation was determined by Fragment Analyzer analysis. Clustering and DNA sequencing (paired-end 150 bp) using the Illumina cBot and HiSeq 4000 was performed according to the manufacturer's protocols. Image analysis, base calling and quality check was performed with the Illumina data analysis pipeline RTA v2.7.7 and Bcl2fastq v2.17.
Data analysis was performed using a standardized in-house pipeline developed by GenomeScan B.V., based on the Genome Analysis Toolkit's (GATK) best practice recommendations 21 , including instructions for raw data quality control, adapter trimming, quality filtering, alignment of short reads, and frequency calculation. During the alignment step (using Burrows-Wheeler Aligner v0.7.4) to the human reference (hg19), the GBAP1 gene was masked due to the high homology with GBA1. By masking GBAP1, mapping quality of GBA1 reads increased, especially at the 3′-prime of the gene, where the homology is the highest.
GBA1 variants are described based on the amino acid position excluding the 39-residue signal sequence at the start (also known as "allelic nomenclature"), which is used historically in GBA1 research (format: E326K). The recommended nomenclature by the Human Genome Variation Society (HGVS) is also given (format: p.(Glu365Lys)) and further variant details can be found in our previous publication 5 . The used NCBI Reference Sequence is NM_000157.3, NP_000148.2, assembly GRCh37.
Assessment of PCR conditions. To investigate the cause of the variant frequency imbalance, four PCR conditions were varied: (1) the primers (n = 4); (2) the polymerase enzymes (n = 2); (3) the primer annealing temperature (T a ) specified for the used polymerase; and (4) the amount of DNA input.
Twelve samples with a GBA1 variant of varying variant frequencies were further analyzed for this purpose: two homozygous samples, four heterozygous samples (with balanced variant frequencies) and six samples with an abnormal variant frequency, see Table 3. Additionally, commercial human DNA and negative controls were used.
The four primers investigated can be found in Table 4. Primer set 1 was used throughout the original genotyping project. The two polymerases used were Q5 Hot Start High-Fidelity 2X Master Mix (New England BioLabs Inc.) and TaKaRa LA Taq DNA Polymerase Hot-Start Version: RR042B. Q5 DNA Polymerase is composed Table 3. Selection of twelve PD samples with varying GBA1 variant frequencies. Samples 5,8,9,10 and 11 have an abnormally low variant frequency and sample 7 has a frequency too high for a normal heterozygous and too low for a normal homozygous variant. Sample 6 is compound heterozygous (T369M on one allele and L324P on the other allele). Both the HGVS nomenclature is given and the GBA1 allelic name, which excludes the 39-amino acid signaling peptide, both using accession NP_000148.2  23 . First, the optimal primer annealing temperature (T a ) was assessed for both polymerases, using all four primer sets and 100 ng commercial human DNA. A standard three-step PCR cycle was performed according to the instructions of the supplier. For the Q5 polymerase, a T a of 58-66 °C with increments of 2 °C was investigated and for the TaKaRa polymerase, a T a of 60-68 °C with increments of 2 °C was investigated. Concentrations of the PCR products were measured using Picogreen (Invitrogen). The size of the PCR products was determined by Fragment Analyzer analysis.
Using the determined T a per polymerase, the four primers were assessed using the twelve DNA samples from PD patients, commercial human control DNA and a No Template Control. The commercial human DNA was used to vary the DNA input, using 4 ng, 20 ng and 100 ng. For all PD samples, 100 ng was used. See Table 5 for an overview.

Confirmation of genotyping.
All samples with a GBA1 exonic variant, either balanced or imbalanced, were rerun in the new conditions based on the above mentioned assessment. Table 4. Four different GBA1 primer sets that will lead to amplification of the functional gene and not the pseudogene GBAP1. Genomic position based on Hg19.   Table 5. Analysis setup to investigate the effect of four primers sets and of DNA input on variant frequencies.
This setup was performed once using Q5 polymerase and once using TaKaRa polymerase. Samples one to twelve were previously assessed to have a GBA1 variant using Q5 polymerase, some with abnormal variant frequencies, see Table 3. Negative control samples contained no DNA.