Misaligned sequencing reads from the GNAQ-pseudogene locus may yield GNAQ artefact variants


GNAQ deficiency led to enhanced NK cell survival in conditional knockout mice (Ncr1-Cre-Gnaq fl/fl) via the inhibition of AKT and MAPK signalling pathways. The finding was also reported to be clinically important, as patients with GNAQ p.T96S had inferior survival, and it could be relevant for the development of therapies.
As the Zhaoming Li et al. 2 study used FFPE materials for all their sequencing work, we investigated the recurrent GNAQ mutations encoding p.T96S and p.Y101X.
It was of particular interest to us that the two GNAQ hotspot somatic mutations (p.T96S and p.Y101X) reported in the study were not reported in other NKTCL studies that also used NGS [4][5][6][7][8][9]. We analyzed the Sanger sequences provided in Supplementary Fig. 4 of the work in question and realized that the single-nucleotide variant (SNV) encoding p.T96S had a minor allele frequency (MAF) of 1.18% (1386/117782, ExAC v1.0 10 database; dbSNP151 11, rs753716491), which we considered too common for a variant that contributes substantially to the pathogenesis of NKTCL. Moreover, the authors wrote in the published work that the GNAQ somatic mutation encoding p.Y101X tended to co-occur with p.T96S. However, the mutation encoding p.Y101X was not marked as a common SNP by germline databases, and it would be functionally redundant for a stop-gain mutation (p.Y101X) to co-occur with a missense mutation (p.T96S) in the same gene. This suggested to us that the alignments to the GNAQ locus that encoded both p.T96S and p.Y101X were erroneous.
https://doi.org/10.1038/s41467-022-28115-z
In an attempt to reproduce the findings of Zhaoming Li et al., we analyzed the sequencing data of the GNAQ-mutant cases from the original paper. The original sample IDs are 9622, 9634, 8186, 9626 and 8188. The read depths supporting the GNAQ-mutant allele over the total read depths were 3/37, 9/71, 10/69, 7/69 and 7/44, respectively. However, all the mutant reads could be non-uniquely aligned to both the GNAQ and GNAQP loci. Of these five samples, 9626 and 9622 had matched normal samples, whose reads were longer (125 bp) than those of the matched FFPE tumor samples (<~100 bp) at the GNAQ locus concerned. This allowed the artefact variants from the tumors to leak through the germline filter during somatic variant calling.
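As an illustration of how non-uniquely aligned reads can be flagged, the following minimal sketch inspects a single SAM record. It assumes the aligner reports equally good placements with a mapping quality (MAPQ) of 0 and lists alternative hits in an `XA:Z:` tag, as BWA-MEM does to our understanding. The function names and the example record are hypothetical and are not taken from the original data.

```python
def alternative_hits(sam_line):
    """Return (mapq, hits) for one SAM record.

    hits is a list of (chrom, strand, pos, cigar, edit_distance) tuples
    parsed from an XA:Z: tag of the form 'chr,+pos,CIGAR,NM;...'.
    """
    fields = sam_line.rstrip("\n").split("\t")
    mapq = int(fields[4])  # column 5 of a SAM record is MAPQ
    hits = []
    for tag in fields[11:]:  # optional tags start at column 12
        if tag.startswith("XA:Z:"):
            for entry in tag[5:].split(";"):
                if not entry:
                    continue
                chrom, pos, cigar, nm = entry.split(",")
                hits.append((chrom, pos[0], int(pos[1:]), cigar, int(nm)))
    return mapq, hits


def is_ambiguous(sam_line, min_mapq=1):
    """Treat a read as ambiguous if MAPQ is below min_mapq or XA hits exist."""
    mapq, hits = alternative_hits(sam_line)
    return mapq < min_mapq or bool(hits)


# Hypothetical record: primary hit on chr9 with an alternative hit on chr2.
example = "\t".join([
    "read1", "0", "chr9", "80537082", "0", "100M", "*", "0", "0",
    "A" * 100, "I" * 100, "NM:i:2", "XA:Z:chr2,+132182125,100M,3;",
])

# Hypothetical record: confidently and uniquely placed read.
unique = "\t".join([
    "read2", "0", "chr9", "80537300", "60", "100M", "*", "0", "0",
    "A" * 100, "I" * 100, "NM:i:0",
])

print(is_ambiguous(example), is_ambiguous(unique))
```

Variant calls supported only by reads for which `is_ambiguous` is true would deserve manual inspection before being interpreted as somatic mutations.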
Next, we further analysed the NGS reads that encoded both the p.T96S and p.Y101X somatic mutations and found that they were indeed misaligned. We simulated 100-bp NGS reads that would encode both p.T96S and p.Y101X from the genomic locus of GNAQ, using the same hg19 reference that the authors had used, and realigned the in silico reads back to the same reference (Fig. 1a). The reads were multi-mapped to the genomic loci of GNAQ and GNAQ-pseudogene-1 (GNAQP) at chr9q21.2 and chr2q21.1, respectively. As expected, the read was realigned back to the GNAQ locus that it was simulated from and recapitulated the two simulated SNVs: chr9:80537095[G>T] (p.Y101X) and chr9:80537112[T>A] (p.T96S, rs753716491). The same read also aligned to GNAQP with three SNPs, rs3730150, rs3730148 and rs3730153 (Fig. 1b). We performed linkage disequilibrium (LD-LDlink) analysis 12 of all three possible pairwise combinations of the three SNPs within GNAQP and found that they were likely to co-occur as a triplet of SNPs within GNAQP (Fig. 1b, D′ = 1, R² ≥ 0.9403). As such, NGS reads carrying these SNPs would be misaligned to GNAQ and misinterpreted as somatic mutations encoding p.T96S and p.Y101X.
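For readers unfamiliar with the two LD statistics quoted above, the sketch below computes |D′| and r² for a pair of biallelic SNPs from the counts of the four possible haplotypes, using the standard textbook definitions. The function name and the haplotype counts are hypothetical and chosen only to show that D′ can equal 1 while r² stays slightly below 1; they are not the LDlink values.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) from counts of the four two-locus haplotypes.

    A/a are the alleles at the first SNP, B/b at the second.
    D   = p_AB - p_A * p_B
    D'  = D / D_max, where D_max depends on the sign of D
    r^2 = D^2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both SNPs must be polymorphic")
    d = n_AB / total - p_A * p_B
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2


# Hypothetical counts: one of the four haplotypes is never observed,
# so |D'| is 1 even though the allele frequencies differ slightly.
d, d_prime, r2 = ld_stats(n_AB=30, n_Ab=0, n_aB=2, n_ab=68)
print(round(d_prime, 4), round(r2, 4))
```

A |D′| of 1 means that at least one of the four haplotypes is absent, which is consistent with the three GNAQP SNPs being inherited together as a block.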
By performing a pairwise Smith-Waterman alignment 13 between the genomic sequences of GNAQ and GNAQP, we found that chr9:80537082-80537222 and chr2:132182125-132182265 were homologous and encompassed all the SNPs and variants that bear on the validity of the reported GNAQ somatic mutations (Fig. 1c). To confirm the reported mutations, the following two criteria need to be satisfied: 1) the alignment must represent GNAQ mutations that encode p.T96S and p.Y101X; 2) the alignment must extend without error beyond chr9:80537082-80537222. If either of the two criteria cannot be satisfied, then the validity of the reported GNAQ somatic mutations in NKTCL is questionable.
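The second criterion can be illustrated with a minimal Smith-Waterman scorer. This is a sketch under stated assumptions rather than the procedure used for Fig. 1c: the scoring scheme (match 2, mismatch -1, linear gap -2) is an arbitrary choice, and the sequences are short hypothetical strings in which two "loci" share a homologous core but have unique flanks.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score of query against target (linear gaps)."""
    prev = [0] * (len(target) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            step = match if query[i - 1] == target[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + step, # align the two bases
                prev[j] + gap,      # gap in the target
                curr[j - 1] + gap,  # gap in the query
            )
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical toy sequences: a shared core flanked by unique sequence.
core = "ACGTTGCAAGGCTTACGATC"
locus_a = "GATTACAGGA" + core + "CCATGGTCAA"  # stands in for the gene
locus_b = "TCCGGAATTC" + core + "AGTTCAGTCC"  # stands in for the pseudogene

short_read = core                # lies entirely inside the homologous core
long_read = core + "CCATGGTCAA"  # extends into the unique flank of locus_a

for name, read in (("short", short_read), ("long", long_read)):
    print(name,
          smith_waterman_score(read, locus_a),
          smith_waterman_score(read, locus_b))
```

The short read scores identically against both toy loci, whereas the read that extends into the unique flank scores strictly higher against the locus it came from, which is why an alignment reaching beyond the homologous interval is more informative.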
The 127 NKTCLs studied by Zhaoming Li et al. 2 were all FFPE archival materials, and 101 of them had matched whole blood as the germline counterpart. DNA extracted from whole blood is typically less fragmented and tends to yield longer NGS reads than DNA extracted from FFPE archival materials. This allows NGS reads sequenced from whole blood to align to a reference genome more accurately than those sequenced from FFPE archival materials. It also means that shorter FFPE-derived reads that originated from one genomic locus could be mapped to more than one genomic locus and yield variant artefacts in downstream analyses.
Fig. 1 GNAQ p.T96S and p.Y101X mutations could be the results of misaligned sequencing reads from GNAQ-pseudogene-1. a Reference sequence from the GNAQ locus (top), in silico simulated read that would encode the GNAQ p.T96S and p.Y101X mutations (middle, green box) and in silico read that represents the co-occurring SNPs rs3730150, rs3730148 and rs3730153 (bottom, orange box; the co-occurring SNPs are in red). b Top-scoring alignments of the read that would encode GNAQ p.T96S and p.Y101X. The read aligns to both GNAQ (with one mismatch and one SNP) and GNAQP (with three SNPs) simultaneously. Linkage disequilibrium analysis of the three SNPs from the GNAQP locus also showed that they tend to co-occur and cause a misalignment to the GNAQ locus. This misalignment would yield erroneous calls of the GNAQ p.T96S and p.Y101X mutations. c GNAQ-GNAQP homologous regions that contain p.T96S and p.Y101X, and rs3730150, rs3730148 and rs3730153, in the GNAQ and GNAQP loci, respectively. The regions immediately outside chr9:80537082-80537173 are unique to GNAQ and would help Zhaoming Li et al. to further validate their current findings.
In an analysis for somatic mutations, germline variants are subtracted from the variants called in the tumor. In this case, the GNAQ p.T96S and p.Y101X artefacts may have leaked through the subtraction step because reads sequenced from the GNAQ and GNAQP loci were aligned differently in the FFPE archival tumor samples and the normal whole-blood samples. Thus, the combination of the following three factors may have contributed to the GNAQ p.T96S and p.Y101X artefacts: 1) short tumor reads that failed to align correctly; 2) long germline reads that aligned correctly; and 3) a SNP-rich genomic region from which the tumor reads were sequenced.
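The leak through the subtraction step can be pictured with a toy set difference. This is only a schematic sketch: the `Variant` type, the `somatic_calls` helper and the call sets are hypothetical, and real somatic callers model read evidence rather than subtracting lists of calls.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    label: str


def somatic_calls(tumor, normal):
    """Naive somatic filter: tumor calls that are absent from the normal."""
    return set(tumor) - set(normal)


# Short FFPE tumor reads from GNAQP misaligned to GNAQ: calls appear on chr9.
tumor = {
    Variant("chr9", "80537095 G>T"),  # read as p.Y101X
    Variant("chr9", "80537112 T>A"),  # read as p.T96S
}

# Long whole-blood reads aligned correctly: the SNPs are called in GNAQP.
normal_long_reads = {
    Variant("chr2", "rs3730150"),
    Variant("chr2", "rs3730148"),
    Variant("chr2", "rs3730153"),
}

# Hypothetical counterfactual: a normal sample misaligned like the tumor.
normal_short_reads = set(tumor)

leaked = somatic_calls(tumor, normal_long_reads)
filtered = somatic_calls(tumor, normal_short_reads)
print(len(leaked), len(filtered))
```

When tumor and normal reads are placed at different loci, the same underlying germline SNPs never cancel out, so the chr9 artefacts survive the subtraction.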

Methods
Realignment of sequencing reads from GNAQ-pseudogene locus. Genomic aligner BWA-MEM (v0.7.17-r1188) and reference genome hg19 were used to realign the sequencing data described in this study 2