Myeloproliferative neoplasms (MPNs) are clonal hematological malignancies characterized by excessive production of terminally differentiated myeloid cells. MPNs comprise nine disease entities, three of which, namely polycythemia vera (PV), essential thrombocythemia (ET) and primary myelofibrosis (PMF), share major molecular and pathological features.1, 2 The MPN phenotype is mainly defined by mutually exclusive oncogenic mutations in the genes JAK2,3, 4, 5, 6, 7 MPL,8, 9 and, as recently discovered, CALR.10, 11 In most cases, the JAK2 and MPL mutations are single-nucleotide substitutions at key amino-acid positions. JAK2-V617F is the most common mutation in MPN, accounting for >60% of cases. CALR mutations consist exclusively of insertions and deletions in the last exon of the gene, resulting in a frameshift to a specific alternative reading frame.10

A germline haplotype spanning the JAK2 gene (GGCC) is a major predisposition factor for developing JAK2 mutation-positive MPN.12, 13, 14 Interestingly, in heterozygous patients, the JAK2-V617F mutation is acquired preferentially on the GGCC haplotype of JAK2.14 This observation led to two competing hypotheses, hypermutability and fertile ground, to explain the germline predisposition.12, 14 However, these hypotheses have been neither proven nor disproven to date.

After the recent discovery of CALR mutations in MPN,10, 11 we hypothesized that an allelic preference in mutation acquisition similar to that of JAK2-V617F may also occur at the CALR locus. Moreover, as all CALR somatic mutations in MPN are insertions and deletions, specific mutational mechanisms might be of relevance. As is the case for JAK2, sequence variants such as single-nucleotide polymorphisms (SNPs) proximal to the mutational hotspot in the CALR gene may also predispose to MPN.

A common SNP (rs1049481) is located close to the CALR mutational hotspot, 54 bp downstream of the CALR stop codon. Allele frequencies in the general population are about 40% G and 60% T. To investigate the possibility of preferential acquisition of CALR mutations on one of the alleles, we designed a PCR-based assay that detects which allele of rs1049481 acquired the somatic mutation. The unlabeled forward primer (5′-GGCAAGGCCCTGAGGTGT-3′) binds to a sequence in intron 8 (last intron, 5′ to CALR mutations), whereas a pair of 6-FAM-labeled reverse primers (5′-AGACATTATTTGGCGCGGCC-3′ and 5′-TTTAGACATTATTTGGCGCGGCA-3′) bind in the 3′ untranslated region, with the most 3′ base binding either to the G or the T allelic variant of rs1049481 (Figure 1a). As the PCR product includes the entire exon 9, somatic insertions and deletions can be detected by measuring the size of the labeled PCR product. In addition to the base difference, the reverse primer specific to the G allele is three nucleotides shorter than the T allele primer. Thus, the assay simultaneously genotypes rs1049481 and discriminates between mutant and wild-type CALR (Figure 1b). In all cases, the results of the assay were concordant with CALR mutation data previously obtained through Sanger sequencing.
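The size logic of the assay can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the wild-type product length for the T allele (`wt_t_size`) is a hypothetical parameter, since the actual amplicon size is not given in the text, and the function names are invented for this example. The primer sequences and the indel sizes (52-bp deletion, 5-bp insertion) are those quoted in the text.

```python
# Reverse primers quoted in the text (5'->3'); the 3'-most base
# discriminates the rs1049481 allele.
REVERSE_PRIMER_G = "AGACATTATTTGGCGCGGCC"
REVERSE_PRIMER_T = "TTTAGACATTATTTGGCGCGGCA"

# The T-allele primer carries extra 5' bases, so its product is longer.
PRIMER_OFFSET = len(REVERSE_PRIMER_T) - len(REVERSE_PRIMER_G)


def expected_size(wt_t_size, allele, indel=0):
    """Expected labeled product length in bp.

    wt_t_size: ASSUMED wild-type product length for the T allele.
    allele:    "G" or "T" allele of rs1049481.
    indel:     net length change of the CALR mutation
               (-52 for the 52-bp deletion, +5 for the 5-bp insertion).
    """
    if allele not in ("G", "T"):
        raise ValueError("allele must be 'G' or 'T'")
    offset = 0 if allele == "T" else -PRIMER_OFFSET
    return wt_t_size + offset + indel


def classify_peak(size, wt_t_size, known_indels=(0, 5, -52)):
    """Return every (allele, indel) pair consistent with an observed size."""
    return [
        (allele, indel)
        for allele in ("G", "T")
        for indel in known_indels
        if expected_size(wt_t_size, allele, indel) == size
    ]


if __name__ == "__main__":
    WT_T = 300  # hypothetical value, chosen only for illustration
    for peak in (300, 297, 248, 302):
        print(peak, classify_peak(peak, WT_T))
```

Because the primer length offset differs from the length changes of the common mutations, each peak size maps to a single allele and mutation combination in this sketch. Real capillary electrophoresis sizes are not exact integers, so a practical version would match peaks within a tolerance.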

Figure 1

Investigation of potential allelic imbalances in CALR mutational acquisition and gene expression. (a) Assay used for simultaneous genotyping of rs1049481 and allelic localization of CALR mutations. The two allele-specific labeled reverse primers bind and amplify the chromosomal region bearing the complementary allele of the SNP, resulting in products of different sizes. In addition, the size difference of CALR insertions and deletions allows for identification of the allele on which the mutations are acquired. (b) Capillary electrophoresis tracks of the amplified labeled product for different cases of rs1049481 genotypes and CALR mutations. Below each peak, the size, the corresponding genotype (in brackets) and the CALR mutational status are indicated. Coexistence of mutated and wild-type CALR on the T allele suggests incomplete clonality (bottom right panel). (c) The relationship between rs1049481 genotypes and CALR mRNA expression levels. RNA-Seq data on peripheral blood from 173 AML patients were downloaded from The Cancer Genome Atlas (TCGA) project (LAML data set). Normalized expression values were grouped by rs1049481 genotype. There is no statistically significant difference in genotype-specific gene expression (P=0.19; Kruskal–Wallis test). (d) Evidence for allelic expression imbalance at the CALR locus. RNA-Seq and exome sequencing alignment data were downloaded from the TCGA data platform and the 1000 Genomes Project, respectively. Genomic DNA for exome sequencing was derived from normal control tissues, whereas RNA-Seq data were generated from AML peripheral blood enriched for myeloid progenitor cells. Allelic read depth ratios at rs1049481 (T/G) of heterozygous samples that passed genotyping quality filtering are shown. While the T/G ratio of exome sequencing calls is around 1, allelic ratios at the RNA level deviate significantly from allelic balance (P=2.2 × 10⁻¹⁶; Mann–Whitney U-test).

If the CALR gene mutates randomly, somatic mutations are expected to be equally distributed between CALR alleles. We initially included 386 MPN patients and 163 healthy controls from Austria in this study. Of the 386 MPN cases, 70 had CALR mutations, of which 37 (52.9%) harbored type 1, 19 (27.1%) type 2 and 14 (20.0%) other mutation types. In patients with the GT genotype of rs1049481, CALR mutations were unequally distributed between the two alleles: they were acquired more frequently on the T allele (N=23) than on the G allele (N=10; Table 1). As the number of CALR-positive cases heterozygous for rs1049481 was low in the initial cohort, we expanded the CALR-mutated cohort by testing an additional 129 heterozygous cases from different Austrian and Italian cohorts with the same assay.

Table 1 The distribution of rs1049481 SNP genotypes in CALR mutation-positive and -negative MPN cases and controls

Overall, we observed a significantly higher number of CALR mutations on the T allele (100 cases) than on the G allele (62 cases) of rs1049481. Thus, the T allele of CALR mutates 1.61 times more frequently than the G allele (P=0.0028; one-proportion z-test). We subsequently performed a subtype analysis, although the small sample sizes limited the statistical power to detect trends. Nevertheless, we observed a distinct pattern of allelic bias across CALR mutation types. The allelic bias was most pronounced for CALR type 2 mutations (T/G: 32/15 cases, ratio 2.13, P=0.013), followed by type 1 mutations (T/G: 50/30 cases, ratio 1.67, P=0.025), and was nearly absent for all other mutations combined (T/G: 18/17 cases, ratio 1.06, P=0.866).
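The reported ratios and P values can be checked with a short calculation. The sketch below assumes a two-sided one-proportion z-test against a null proportion of 0.5 with no continuity correction; the text names the test but not these details, so they are assumptions. The function name is invented for this example, and the counts are those given above.

```python
import math


def one_proportion_z_test(successes, n, p0=0.5):
    """Two-sided one-proportion z-test (normal approximation,
    no continuity correction; both are assumptions of this sketch)."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# (T-allele cases, G-allele cases) as reported in the text.
COUNTS = {
    "all mutations": (100, 62),
    "type 1": (50, 30),
    "type 2": (32, 15),
    "other": (18, 17),
}

if __name__ == "__main__":
    for label, (t_cases, g_cases) in COUNTS.items():
        z, p = one_proportion_z_test(t_cases, t_cases + g_cases)
        print(f"{label}: T/G ratio {t_cases / g_cases:.2f}, "
              f"z = {z:.2f}, P = {p:.4f}")
```

Under these assumptions the computed P values are close to those reported for the overall comparison and for each mutation subtype.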

As a similar allelic bias in JAK2 mutation acquisition confers predisposition to JAK2-positive MPN, we next examined whether rs1049481_T confers susceptibility to CALR-positive MPN. As shown in Table 1, genotypic frequencies did not differ significantly between controls and MPN cases (P=0.8407, χ² test for independence). Thus, despite the presence of an allelic bias in somatic mutagenesis of CALR (Table 1), rs1049481 does not exhibit a statistically significant association with MPN in our patient cohort. Notably, a larger cohort might be necessary for CALR (allelic bias: 1.6-fold) than for JAK2 (allelic bias: 7.2-fold)14 to have sufficient statistical power for observing disease association as a consequence of the mutation acquisition bias. Genotype frequencies were similar in different MPN mutational subtypes (CALR positive, JAK2 positive) and diagnostic classes (PV, ET or PMF; data not shown). Although rs1049481 serves as a suitable tagging SNP for the CALR locus and allows for screening of a large number of heterozygous individuals owing to its high minor allele frequency, another variant in linkage disequilibrium might be causative for differential mutation acquisition. However, with the present cohort size it was not possible to dissect the haplotype to identify variants with stronger linkage to CALR mutations than rs1049481 (data not shown).

The allelic bias of CALR mutation acquisition may be a result of specific mutational mechanisms responsible for CALR mutagenesis. Particularly interesting is the fact that CALR somatic mutations are restricted to insertions and deletions. The most frequent CALR mutations are the 52-bp deletion (type 1) and the 5-bp insertion (type 2). A close examination of the sequence of CALR exon 9 reveals a complex repetitive region with both trinucleotide repeats and longer repeat elements. The nucleotide sequences around the breakpoints of the 52-bp deletion consist of two imperfect direct repeats separated by a spacer sequence. The start positions of the two repeats are 52 bp apart. In the case of the 5-bp insertion, the inserted sequence is the inverted complementary copy of the preceding five bases and thus creates a 10-bp palindromic sequence. These facts indicate that the mutagenesis of CALR might be recombination mediated. In such a scenario, the actual sequence context of the region may have an important role in facilitating or impeding mutation generation. Thus, owing to the complex mutational mechanisms, CALR mutations may occur more frequently on the T allele of rs1049481 than on the G allele. This might be more relevant for type 1 and type 2 mutations, as specific mutational mechanisms resulting in an increased mutation rate may be responsible for their high frequency. The observed trend in allelic bias specificity toward type 1 and type 2 mutations provides some support for this hypothesis.
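The palindrome property described for the 5-bp insertion can be illustrated with a short Python sketch. The five-base string used here is an illustrative placeholder and is not quoted from the text; the point is only that appending the reverse complement of any five bases yields a 10-bp sequence identical to its own reverse complement. The function names are invented for this example.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_reverse_palindrome(seq):
    """True if the sequence equals its own reverse complement."""
    normalized = seq.upper()
    return normalized == reverse_complement(normalized)


if __name__ == "__main__":
    preceding = "GACAA"  # illustrative placeholder, not taken from the text
    insertion = reverse_complement(preceding)
    site = preceding + insertion
    print(preceding, insertion, site, is_reverse_palindrome(site))
```

The same check applied to a candidate insertion site would confirm whether an observed insertion fits the inverted-copy pattern described above.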

An alternative mechanism potentially explaining the observed mutational imbalance might be allelic bias in gene expression. Higher expression from one of the alleles can be associated with a more open chromatin state, thus being more exposed to mutagenic stimuli. Furthermore, mutations occurring on the allele producing higher transcript levels can be more likely to result in clonal outgrowth, a concept underlying the ‘fertile ground hypothesis’.12, 13, 14 To test for potential allelic expression bias at the CALR locus, we made use of the combined genotyping and RNA-Seq data on peripheral blood from 173 acute myeloid leukemia (AML) patients provided by The Cancer Genome Atlas project.15 The AML peripheral blood is enriched for myeloid progenitors and might therefore represent an adequate tissue for this analysis. Grouping CALR expression values by genotypes did not yield any statistically significant evidence for differential expression (Figure 1c). However, when examining the RNA-Seq data on a per-individual level, heterozygotes showed a modest but significant expression bias toward the T allele, compatible with allelic expression imbalance (Figure 1d). Whether allelic expression bias is responsible for the allelic bias in CALR mutational acquisition needs to be the subject of further investigation.
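The per-individual comparison can be sketched as follows: compute a T/G read depth ratio for each heterozygous sample, then compare the RNA-derived ratios with the DNA-derived ratios using the Mann–Whitney U statistic. The read counts below are made-up toy values for illustration only, not data from the study, and the function names are invented. The sketch computes only the U statistic, not the P value.

```python
def allelic_ratio(t_reads, g_reads):
    """T/G read depth ratio at a heterozygous site."""
    if g_reads == 0:
        raise ValueError("G-allele read depth must be non-zero")
    return t_reads / g_reads


def mann_whitney_u(xs, ys):
    """U statistic for xs versus ys: the number of (x, y) pairs
    with x > y, counting ties as one half."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


if __name__ == "__main__":
    # Toy (T reads, G reads) pairs; hypothetical values for illustration.
    rna_counts = [(60, 40), (55, 45), (66, 44), (58, 42)]
    dna_counts = [(50, 50), (48, 52), (51, 49), (49, 51)]

    rna_ratios = [allelic_ratio(t, g) for t, g in rna_counts]
    dna_ratios = [allelic_ratio(t, g) for t, g in dna_counts]

    u = mann_whitney_u(rna_ratios, dna_ratios)
    print(f"U = {u} out of {len(rna_ratios) * len(dna_ratios)} pairs")
```

A U value near the total number of pairs indicates that the RNA ratios are shifted above the DNA ratios. A statistics library would be needed to convert U into a P value for realistic sample sizes.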

Cancer-associated genes showing allelic bias in their somatic mutagenesis have not been commonly reported. Interestingly, after JAK2, the allelic imbalance of CALR mutations that we observed is the second such case among genes involved in MPN pathogenesis. It remains to be seen whether other cancer-associated loci exhibit biases in the acquisition of somatic mutations similar to those of JAK2 and CALR in MPN.