Introduction

Hydatidiform mole (HM) is an aberrant human pregnancy characterized by abnormal embryonic development, hydropic degeneration of chorionic villi and excessive trophoblastic proliferation. Hydatidiform moles affect 1 in 600 pregnancies in western countries,1 but have higher frequencies in Southern and Asian countries.2 Most moles are sporadic and do not recur. Recurrent hydatidiform moles (RHMs) are defined by the occurrence of at least two moles in the same patient and affect 1.5–9.3% of women with a prior HM.3, 4, 5, 6, 7

To date, two genes responsible for RHMs, NLRP7 and KHDC3L, have been identified.8, 9 Studies by several groups in various populations have shown that NLRP7 is a major gene for RHMs: NLRP7 pathogenic variants are found in 48–80% of patients with RHMs, depending on the population and the patient inclusion criteria.10, 11, 12, 13, 14, 15 To date, 48 distinct NLRP7 pathogenic variants found in a recessive state have been reported in a total of 131 patients.16 NLRP7 encodes NOD-like receptor, pyrin domain-containing protein 7, which plays roles in the inflammatory response,17, 18 trophoblastic tissue differentiation19, 20 and proliferation,20 and is part of the oocyte cortical cytoskeleton.21

KHDC3L is the second gene responsible for RHMs and is mutated in 14% of patients who are negative for NLRP7 mutations.9, 22 So far, four different pathogenic variants in KHDC3L have been reported in a total of six patients. KHDC3L encodes a KH-domain-containing protein; another member of this family, KHDC1B, has been shown to bind mRNAs and is believed to regulate their translation during oocyte maturation.23

The closest NLRP gene to NLRP7 is NLRP2, which is located 25-kb distal to NLRP7 on human chromosome 19q13.4 in a head-to-head orientation. NLRP7 does not have rodent or bovine orthologues and is believed to have originated from a common NLRP2/7 ancestor in primates.24

In this study, we report 11 novel protein-truncating variants in NLRP7, five of which are large deletions. We characterized the effects of four variants at the transcriptional level. We show that six of the large-deletion breakpoints occurred in intron 5 and that eight of the nine large deletions reported so far in NLRP7 may have been mediated by Alu repeats and microhomology of 16-bp to 44-bp. Our analysis also shows that human NLRP7 is highly rich in Alu repeats, which account for about half of its intronic sequences and may have accumulated in the primate NLRP2/7 ancestor before its duplication into two distinct genes, NLRP2 and NLRP7.

Materials and methods

Patients

Patients included in this study were referred to our laboratories at the Research Institute of the McGill University Health Centre, the ‘Laboratoire de génétique des maladies rares et autoinflammatoires’, or the Genetics Department of the Shiraz University of Medical Sciences for NLRP7 mutation analysis. All patients included in the mutation analysis were new, with the exception of patient 1424 and her two affected sisters, 1426 and 1428, who were previously reported as heterozygous for a 60-kb deletion encompassing portions of NLRP7 and NLRP2.25 Two other patients whose mutations were previously reported, 1200 and 1074, were included in the transcriptional analysis of the consequences of the variants.20 The reproductive history of the new patients is summarized in Table 1. The products of conception (POCs) of the patients listed in Table 1 were not available for analysis, with the exception of one molar tissue from patient 1428, which we genotyped and found to be diploid biparental, thereby confirming its previously reported genotype.25 All patients had RHMs except patient 1291, who had four spontaneous abortions (SAs), two of which were found to be triploid by karyotype analysis. Her POCs were not available to us for histopathological review and genotyping, but were analyzed as part of her clinical care. One of her triploid POCs was genotyped with microsatellite DNA markers and reported to be digynic. Her two other SAs were diploid, one with trisomy 9 and one with monosomy X (Table 1).

Table 1 Recapitulation of the 11 identified novel protein truncating variants and the reproductive outcomes of the patients

Mutation analysis and characterization of the deletions

NLRP7 mutation analysis was performed as previously described, by PCR amplification of the 11 exons of NLRP7 from genomic DNA followed by Sanger sequencing of the PCR products in both directions.8 Several additional strategies were used to identify the exact variants in patients with suspected deletions, including the design of additional primers for various genomic regions, genotyping of NLRP7 fragments containing known single nucleotide polymorphisms (SNPs) in all family members to establish haplotypes, regular and long-range PCR, and genomic DNA dosage by quantitative real-time PCR. The primer pairs used to amplify the junction fragment of each large deletion are provided in Supplementary Table S1. Variants are described with reference to transcript NM_001127255.1, protein NP_001120727.1 and genomic sequence NG_008056.1, with nucleotide c.1 corresponding to the A of the ATG initiation codon. All variants detected have been deposited in the Leiden Open Variation Database (http://databases.lovd.nl/shared/genes/NLRP7) (patient IDs 00054739-00054741, 00054760-00054768 and 00054773-00054779) and INFEVERS (http://fmf.igh.cnrs.fr/ISSAID/infevers/). Exons are numbered as previously described in Messaed et al.26

Quantitative real time PCR

Quantitative real-time PCR to define the deletions in patients 1359 and 1424 was performed with the QuantiFast SYBR Green PCR kit (Qiagen, Toronto, ON, Canada) on 25-ng of genomic DNA. Each sample was tested in duplicate on the Bio-Rad MiniOpticon Real-Time PCR system (Bio-Rad Laboratories, Mississauga, ON, Canada) and analyzed with the Opticon Monitor software (Bio-Rad Laboratories). The comparative CT method (ΔΔCT method) was used for relative quantification, and data were normalized against an endogenous control sequence: exon 11 of NLRP7 in patient 1359 and exon 8 of NLRP7 in patient 1424. Both control amplicons are present in two normal copies, as documented by each patient's heterozygosity for SNPs within them. Melting curve analysis was also performed to verify PCR specificity.
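The comparative CT calculation can be sketched as follows. This is a minimal illustration, not the Opticon Monitor implementation: it assumes roughly 100% amplification efficiency for both amplicons (each cycle doubles the product) and a calibrator DNA known to carry two copies of the target. The function name and the CT values in the example are hypothetical, not data from this study.

```python
from statistics import mean


def copy_number_ddct(target_patient, reference_patient,
                     target_control, reference_control,
                     control_copies=2):
    """Estimate target copy number with the comparative CT (ddCT) method.

    Each argument is a list of replicate CT values (e.g. duplicates).
    Assumes ~100% efficiency for target and reference amplicons and a
    calibrator (control) DNA carrying `control_copies` copies of the target.
    """
    # Normalize the target to the two-copy reference amplicon in each sample.
    dct_patient = mean(target_patient) - mean(reference_patient)
    dct_control = mean(target_control) - mean(reference_control)
    # Express the patient relative to the calibrator.
    ddct = dct_patient - dct_control
    relative_quantity = 2 ** (-ddct)
    return control_copies * relative_quantity


# Hypothetical duplicate CT values: relative to the reference amplicon, the
# target amplifies one cycle later in the patient than in the control.
estimate = copy_number_ddct([26.0, 26.2], [24.0, 24.2], [25.0, 25.2], [24.0, 24.2])
print(round(estimate, 2))  # about one copy, as expected for a heterozygous deletion
```

A one-cycle delay halves the relative quantity, so a heterozygous deletion is expected to give an estimate near 1 copy and a normal region near 2.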

Reverse transcription PCR

For reverse transcription followed by PCR (RT-PCR), RNA was extracted with Trizol (Invitrogen, Carlsbad, CA, USA) from Epstein-Barr virus (EBV)-transformed cells of patients 1074, 1200, 1243, 1291 and 1341, and its quality was verified by electrophoresis. cDNA was synthesized using a reverse transcription kit (Life Technologies, Thermo Scientific, Carlsbad, CA, USA). PCR was then performed with primers located in exons 5 and 7 of NLRP7 for patients 1291 and 1200, in exons 9 and 11 for patients 1074 and 1243, and with various primers located in exons 5 to 11 for patient 1341. The amplified fragments were separated on 1.5% agarose gels, purified using the QIAquick PCR purification kit (Qiagen, Toronto, ON, Canada) and Sanger sequenced. To document the absence of NLRP7 transcripts in patient 1341, primers from exons 5 and 7 of the ZNF28 gene were combined with the various NLRP7 primers in the same reaction as a technical positive control for RNA quality and quantity and for the success of the RT-PCR reactions. All RT-PCR primers are provided in Supplementary Table S2.

Results

Identification of 11 novel protein truncating variants including five large deletions

NLRP7 mutation analysis was performed by PCR amplification of all exons from genomic DNA of 12 unrelated patients and direct sequencing of the PCR products. The identified variants are listed in Table 1, and some are shown in Supplementary Figure S1. They include two previously reported missense variants, a previously reported splice variant, a novel splice variant, two novel single nucleotide substitutions creating stop codons, two small duplications (one previously reported and one novel), two small deletions and five large deletions. All patients were either homozygous for one, or compound heterozygous for two, potentially pathogenic variants, with the exception of patient 1291, who carried a small deletion in a heterozygous state and no other variant believed to be pathogenic. This variant, c.2130-6_2132del, removes the last six nucleotides of intron 5 and the first three nucleotides of exon 6.

In patient 6202, regular PCR primers designed to amplify exon 6 failed to yield any product, suggesting a homozygous deletion. Additional primers were designed and allowed the amplification of a fragment spanning the junction (Supplementary Table S1). Sequencing of this fragment revealed a homozygous 1219-bp deletion, c.2130-266_2300+782del.
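Deletion lengths can be checked against their c. descriptions when both breakpoints are anchored to the same exon, as for c.2130-6_2132del (patient 1291) and c.2130-266_2300+782del (patient 6202). The helpers below are hypothetical and are not a full HGVS parser: they assume the nucleotides between the two anchors are contiguous exonic sequence and handle only simple c.START_ENDdel forms (no 3' UTR `*` positions or deleted-sequence suffixes). Deletions whose anchors lie in different exons, such as c.-39-231_2130-510del, would need intron lengths from the genomic reference NG_008056.1.

```python
import re

# A c. position: an anchor cDNA coordinate (negative in the 5' UTR),
# optionally followed by an intronic offset such as -266 or +782.
C_POSITION = re.compile(r"^(-?\d+)([+-]\d+)?$")


def parse_c_position(pos: str) -> tuple[int, int]:
    """Split an HGVS c. position into (anchor, intronic offset)."""
    match = C_POSITION.match(pos)
    if match is None:
        raise ValueError(f"unsupported c. position: {pos!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def same_exon_deletion_length(variant: str) -> int:
    """Length (bp) of c.START_ENDdel when both anchors lie in the same exon."""
    if not (variant.startswith("c.") and variant.endswith("del")):
        raise ValueError(f"expected c.START_ENDdel, got {variant!r}")
    start, end = variant[2:-3].split("_")
    start_anchor, start_offset = parse_c_position(start)
    end_anchor, end_offset = parse_c_position(end)
    # c. numbering has no position 0 (c.-1 is followed by c.1).
    exonic = end_anchor - start_anchor - (1 if start_anchor < 0 < end_anchor else 0)
    # Exonic distance plus the intronic overhangs on both sides, inclusive.
    return exonic + (end_offset - start_offset) + 1


for v in ("c.2130-6_2132del", "c.2130-266_2300+782del"):
    print(v, same_exon_deletion_length(v))
```

The loop prints 9 and 1219, matching the 9-bp and 1219-bp deletions described above.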

Similarly, in patient P120 and her affected sister, P140, no products were obtained with primers designed to amplify exons 1–5. A series of primers mapping around the suspected deleted region was used to amplify the junction fragment (Supplementary Table S1), and sequencing of this fragment defined the deletion at the nucleotide level as c.-3998_2130-668del.

In patient 1341, no PCR fragments were amplified with the primers for exons 2–6. We designed new primers and performed primer walking using regular and long-range PCR amplification on the patient's genomic DNA. This resulted in the amplification of an ~1-kb fragment in the patient but not in controls (Supplementary Table S1 and Supplementary Figure S2). Sequencing of this fragment revealed a 5041-bp deletion extending from intron 1 to intron 5, defined at the nucleotide level as c.-39-231_2130-510del. The same abnormal 1-kb junction fragment was also amplified from the DNA of both parents. The corresponding normal fragment, which does not contain the deletion, is ~6-kb and was not amplified from the DNA of the parents or controls under our experimental conditions (Supplementary Figure S2).

Patients 1424, 1426 and 1428 are sisters who were previously reported as heterozygous carriers of a 60-kb deletion removing substantial portions of NLRP2 and NLRP7, with no other NLRP7 pathogenic variant.25 NLRP7 sequencing of DNA from patient 1424 revealed a previously reported mutation, c.2571dup, p.(Ile858Hisfs*11), in a heterozygous state. To define the deletion accurately, we sequenced NLRP7 amplicons containing common SNPs in all family members, genotyped additional SNPs in the family, established the haplotypes, and used quantitative real-time PCR to locate the 5’ breakpoint close to NLRP7 exon 1 (Supplementary Figure S2) and the 3’ breakpoint in NLRP7 intron 5. Using primers flanking the deletion, we amplified an ~1.6-kb fragment in the patient but not in controls (Supplementary Table S1). Sanger sequencing defined the deletion at the nucleotide level as c.-40+251_2130-681del (Table 1). The two protein-truncating variants are most likely pathogenic and segregated, one from each parent, to all three affected sisters.

Patient 1359 was found to be homozygous for a previously reported missense mutation, p.(Leu750Val), which was present in a heterozygous state in her father but absent from her mother. This raised the possibility that the patient is hemizygous for the region containing p.(Leu750Val). A high-density SNP microarray (CytoScan HD; Affymetrix, Santa Clara, CA, USA) suggested that the patient is heterozygous for a deletion spanning introns 5 to 10 (Supplementary Figure S2). Sequencing of NLRP7 amplicons containing common SNPs in all family members, genotyping of additional SNPs from the region, haplotype establishment and genomic DNA dosage analysis by quantitative real-time PCR with various genomic primers placed the 5’ breakpoint ~13-kb upstream of exon 1 and the 3’ breakpoint in intron 10. We then amplified and sequenced the junction fragment in the patient and defined the deletion at the nucleotide level as c.-13413_2982-344del (Table 1). Because this deletion is not detectable by PCR amplification of the 11 NLRP7 exons from genomic DNA, we tested by PCR for its junction fragment in three other unrelated patients of the same (Mexican) origin as patient 1359, in whom homozygosity for an NLRP7 mutation had been detected but whose parents were not available for DNA testing. The junction fragment was not detected in any of these three patients.
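The inference that prompted the search for a deletion in patient 1359 can be written as a simple trio check: an apparently homozygous allele carried by only one parent is incompatible with Mendelian inheritance of two intact copies. The sketch below is illustrative only; the function names are hypothetical, and a flag indicates a possible deletion without excluding other causes of the same pattern, such as uniparental disomy, allele dropout at a primer or probe site, or sample errors.

```python
from itertools import product
from typing import Optional

Genotype = tuple[str, str]


def mendelian_consistent(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """True if one paternal plus one maternal allele can give the child's genotype."""
    return any(sorted((p, m)) == sorted(child) for p, m in product(father, mother))


def parent_with_possible_deletion(child: Genotype, father: Genotype,
                                  mother: Genotype) -> Optional[str]:
    """Name the parent who may have transmitted a deleted allele, else None.

    Only apparent homozygotes that break Mendelian expectations are flagged.
    """
    if child[0] != child[1] or mendelian_consistent(child, father, mother):
        return None
    allele = child[0]
    if allele in father and allele not in mother:
        return "mother"  # the maternal copy may be missing (hemizygosity)
    if allele in mother and allele not in father:
        return "father"
    return None


# Genotype pattern described for patient 1359 at p.(Leu750Val).
print(parent_with_possible_deletion(("Val", "Val"), ("Leu", "Val"), ("Leu", "Leu")))
```

For the pattern described above, the check points to the maternal allele, which is where the heterozygous deletion was subsequently sought.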

Analysis of the effects of four variants on NLRP7 transcription

The 5041-bp deletion identified in patient 1341, c.-39-231_2130-510del, is the first reported homozygous deletion in NLRP7 that removes the translation initiation codon (c.1) together with exons 2 to 5. To analyze its effect on NLRP7 transcription, we performed RT-PCR on RNA extracted from the patient's EBV-transformed lymphoblastoid cell line, using three primer combinations that amplify three cDNA fragments spanning exons 5 to 11 (Figure 1a). No cDNA fragment was amplified with any of the three primer combinations, demonstrating the complete absence of NLRP7 transcripts in the cell line from this patient.

Figure 1

Transcriptional characterization of four NLRP7 variants by RT-PCR on RNA from EBV-transformed patient cells. Red arrows denote missing or abnormal fragments observed only in the patients. (a) RT-PCR on RNA from patient 1341 showing complete loss of NLRP7 transcripts with various combinations of primers located in exons 5–11. Primers from exons 5 and 7 of the ZNF28 gene were used together with those of NLRP7 as positive controls for the RT-PCR. (b) RT-PCR on RNA from patient 1291, who carries a 9-bp deletion removing the invariant splice acceptor site of exon 6, showing skipping of exon 6 in the cDNA. (c) RT-PCR on RNA from patient 1200, who carries an invariant splice acceptor site variant, showing skipping of exon 6. (d, e) RT-PCR on RNA from patients 1074 (d) and 1243 (e), who carry the same variant affecting the invariant splice donor site of exon 9. This variant leads to a common abnormal 673-bp fragment in the two patients, resulting from the insertion of 162-bp of intron 9 between exons 9 and 10. In patient 1243, other faint fragments due to minor aberrant splicing were observed; one of them, smaller than 673-bp, is shown and indicated by a red arrow.

Patient 1291 carries a novel heterozygous 9-bp deletion that removes the splice acceptor site of exon 6, the same site at which we previously reported an invariant splice site mutation, c.2130-2A>G, in a heterozygous state in patient 1200.20 To determine the effects of these two variants on the splicing of exon 6, we performed RT-PCR on RNA from EBV-transformed cells of the two patients, 1291 and 1200. This revealed a normal 298-bp cDNA fragment and an abnormal, shorter 127-bp fragment in both patients (Figure 1b, c). Gel purification and sequencing of the short fragments showed that they had identical sequences lacking exon 6, presumably as a result of the splice variants (Figure 1).
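The sizes of the two RT-PCR products can be reconciled with the exon 6 boundaries implied by the variant nomenclature: c.2130-2A>G places the first base of exon 6 at c.2130, and c.2130-266_2300+782del places its last base at c.2300. The minimal check below relies on those inferred boundaries; the helper names are illustrative.

```python
def exon_length(first_c: int, last_c: int) -> int:
    """Exon length from the c. numbers of its first and last (coding) bases."""
    return last_c - first_c + 1


def skipped_product_size(normal_bp: int, first_c: int, last_c: int) -> int:
    """Expected RT-PCR product size when the exon is skipped."""
    return normal_bp - exon_length(first_c, last_c)


# Exon 6 boundaries inferred from c.2130-2A>G and c.2130-266_2300+782del.
EXON6_FIRST, EXON6_LAST = 2130, 2300

length = exon_length(EXON6_FIRST, EXON6_LAST)
print(length, length % 3 == 0)                             # exon length, in frame?
print(skipped_product_size(298, EXON6_FIRST, EXON6_LAST))  # product lacking exon 6
```

The inferred 171-bp exon predicts a 127-bp product, matching the observed abnormal fragment, and because 171 is a multiple of three the skip preserves the reading frame.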

We also assessed the effects of a previously reported variant affecting the invariant splice donor site of exon 9, c.2810+2T>G, in two patients, 1074 and 1243, and found that it leads to different splicing patterns in the two patients. In patient 1074, the variant led to an abnormal 673-bp cDNA fragment that contains the first 162-bp of intron 9 and is expected to result in an in-frame insertion of 54 amino acids in the protein. In patient 1243, the same variant yielded, in addition to the 673-bp fragment, a smaller cDNA fragment of ~650-bp (Figure 1e) and other faint fragments larger than 673-bp (data not shown); these could not be cloned and were not observed in more than 10 control subjects.

Eight Alu mediated large deletions and rearrangements in NLRP7

For all five novel large deletions described in this study, the breakpoints were within Alu elements and were flanked on both sides by microhomology of 16-bp to 44-bp, with one of the two microhomology elements completely or partially removed by the deletion (Figure 2). In addition, the same microhomology element was present in more than one region of the NLRP7 genomic structure. For instance, the 37-bp microhomology element flanking the deletion in patient 6202 is also involved in a previously reported variant, c.2130-312_2300+736del,14 and the microhomology element present at the 5’ and 3’ breakpoints of the deletion in patient 1424 is also present at the breakpoints of a deletion previously reported by Kou et al.10 We therefore analyzed the whole genomic structure of NLRP7, a total of 25,991-bp extending from 1-kb upstream of exon 1 to 1-kb downstream of exon 11 (hg19 at https://genome.ucsc.edu/), for repetitive elements using CENSOR,27 a publicly available program that screens query sequences against a reference collection of repeats (http://www.girinst.org/censor/). This analysis identified Alu repetitive elements surrounding all five deletions reported in this study (Figure 2) and the three previously reported ones10, 14 (Figure 3). We also found that NLRP7 is highly rich in Alu elements, which account for ~48% of all its intronic sequences, with introns 2, 5 and 10 being the richest, with Alu contents of 70%, 66% and 60%, respectively (Figure 3). In addition, the 3’ breakpoints of four of the novel deletions fell within a 1-kb interval of intron 5 that also contains the breakpoints of two previously reported deletions and rearrangements,10, 14, 28 defining a hotspot of Alu instability, deletions and rearrangements (Figure 3).
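The per-intron Alu percentage (defined in the Figure 3 legend as the length of Alu sequences divided by the intron length) can be computed from repeat-annotation intervals. This is a minimal sketch, assuming the Alu hits have already been extracted from the CENSOR output as half-open genomic intervals; that parsing step is not shown, and the coordinates in the example are made up. Overlapping hits are merged so that no base is counted twice, and hits are clipped to the intron boundaries.

```python
def covered_bases(intervals, start, end):
    """Number of bases in [start, end) covered by at least one half-open interval."""
    clipped = sorted(
        (max(s, start), min(e, end)) for s, e in intervals if s < end and e > start
    )
    total = 0
    run_start = run_end = None
    for s, e in clipped:
        if run_end is None or s > run_end:
            # Disjoint from the current run: close it and start a new one.
            if run_end is not None:
                total += run_end - run_start
            run_start, run_end = s, e
        else:
            # Overlapping or adjacent: extend the current run.
            run_end = max(run_end, e)
    if run_end is not None:
        total += run_end - run_start
    return total


def percent_alu(alu_hits, intron_start, intron_end):
    """Alu length in an intron as a percentage of the intron length."""
    covered = covered_bases(alu_hits, intron_start, intron_end)
    return 100 * covered / (intron_end - intron_start)


# Hypothetical 1,000-bp intron: two overlapping hits and one crossing its 3' end.
print(percent_alu([(100, 400), (350, 600), (900, 1200)], 0, 1000))
```

Merging before summing matters because repeat annotators can report overlapping or nested hits, which would otherwise inflate the percentage.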

Figure 2

Characterization of five large NLRP7 deletions identified in this study and schematic representation of the Alu elements at the breakpoints. The microhomology sequences surrounding the deletions are shown in capital letters and the unique sequences on each side of the microhomology sequences are shown in small letters. Red letters indicate differences between the two microhomology elements or the flanking sequences. Dashed red lines delimit the deletions. The orientation of the Alu repeats is indicated by arrows above each repeat.

Figure 3

Recapitulation of all NLRP7 large deletions and distribution of Alu elements in its intronic sequences. To date, eight Alu-repeat-mediated deletions and rearrangements in NLRP7 have been described. Colored boxes represent Alu subfamilies, each with a specific color. Novel and previously reported mutations are shown in red and black, respectively. Arrows above the colored boxes indicate the orientation of the Alu elements. Six of the 16 deletion and rearrangement breakpoints occurred in intron 5 and define a hotspot of Alu instability, deletions and rearrangements (boxed). N, number; seq., sequences; % of Alu seq., total length of Alu sequences in an intron as a percentage of the total length of that intron.

Discussion

Here we describe 11 novel protein-truncating variants in NLRP7 in unrelated patients, three familial and eight simplex cases. All patients had RHMs and were found to have two defective alleles, with the exception of patient 1291, who had four SAs and a single heterozygous 9-bp deletion. The presence of a single NLRP7 protein-truncating variant in this patient supports previous observations by our group26 and others29, 30 that NLRP7 contributes to the genetic susceptibility to recurrent SAs in a minority of cases.

We also characterized the consequences of four different variants on NLRP7 transcription. We demonstrated the absence of NLRP7 transcripts in EBV-transformed cells from patient 1341, who carries a homozygous 5041-bp deletion that includes the translation initiation codon. This patient is therefore the first with a demonstrated complete absence of NLRP7 transcripts. The fact that she is in good health apart from her reproductive problem supports the current view that NLRP7 is required mainly for female reproduction. In patients 1291 and 1200, we showed that two different splice variants, c.2130-6_2132del and c.2130-2A>G, have the same consequence: skipping of exon 6 from the mature RNA. Skipping of exon 6 does not disrupt the open reading frame but is expected to delete 56 amino acids located between the second and third leucine-rich repeats (LRRs) and part of the third LRR. In patients 1074 (reported previously8) and 1243, we found that mutation of the donor splice site of exon 9, c.2810+2T>G, activates a cryptic splice site 162-bp into intron 9 that lacks the canonical GT donor dinucleotide, and leads mainly to an in-frame insertion of 162-bp between exons 9 and 10. Other minor aberrant splice isoforms were observed only in patient 1243 and could be due to the intrinsically degenerate nature of the activated non-canonical splice site. Our data on the consequences of this splice variant in patient 1074 agree with those reported by Kou et al.10 for the same variant.
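The count of 56 deleted amino acids follows from codon phase. Skipping the 171-nucleotide exon 6 removes 57 codons' worth of sequence, but the exon begins at c.2130, the third base of codon 710, and ends at c.2300, the second base of codon 767 (boundaries inferred from the variant names above). Codons 711 to 766 (56 codons) are therefore lost entirely, while the split codons 710 and 767 fuse into a single new codon, which may or may not change that residue. The sketch below encodes this reasoning for a coding exon; the function names are hypothetical. The same frame arithmetic gives 54 extra residues for the 162-nucleotide intron 9 retention, assuming the retained sequence contains no in-frame stop codon.

```python
def codon_and_position(c: int) -> tuple[int, int]:
    """Codon number and position within it (1-3) for coding position c >= 1."""
    return (c - 1) // 3 + 1, (c - 1) % 3 + 1


def exon_skip_effect(first_c: int, last_c: int):
    """Effect of skipping a coding exon spanning c.first_c..c.last_c.

    Returns None for a frameshift, otherwise a tuple
    (whole codons removed, fused junction codons: 0 or 1).
    """
    length = last_c - first_c + 1
    if length % 3 != 0:
        return None
    first_codon, first_pos = codon_and_position(first_c)
    last_codon, _ = codon_and_position(last_c)
    if first_pos == 1:
        # Exon boundaries coincide with codon boundaries: clean deletion.
        return length // 3, 0
    # Split codons at both ends fuse into one; codons strictly between are lost.
    return last_codon - first_codon - 1, 1


def insertion_residues(inserted_nt: int):
    """Residues added by an in-frame insertion, or None if it shifts the frame.

    Assumes the inserted sequence contains no in-frame stop codon.
    """
    return inserted_nt // 3 if inserted_nt % 3 == 0 else None


print(codon_and_position(2130), codon_and_position(2300))  # exon 6 boundaries
print(exon_skip_effect(2130, 2300))                        # exon 6 skipping
print(insertion_residues(162))                             # intron 9 retention
```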

Including this study, 59 distinct pathogenic variants in a recessive state have been reported in NLRP7, including four previously reported large deletions10, 14, 28 and the five novel ones described here. Analysis of the sequences flanking these nine deletions showed that seven have both breakpoints within Alu elements, one has only one breakpoint within an Alu element,28 and the remaining one has no breakpoint in Alu elements, its breakpoints lying in exonic sequence (exon 4).14 Careful analysis revealed that seven of the deletions have microhomology of 16 to 44 nucleotides adjacent to their 5’ and 3’ breakpoints. In the eight Alu-associated large deletions, the involved microhomology repeats, as well as the Alu elements to which they belong, have the same orientation. Of the 15 breakpoints located within Alu elements, 11 occurred in the left arms of the Alu elements, two in the right arms and two in the middle. By location, one occurred in intron 10, two several kilobases upstream of exon 1, three in intron 1, three in intron 6 and six in intron 5, three of which fell within the same AluY element. These six intron 5 breakpoints, from different deletions in unrelated subjects, define a hotspot of Alu instability and deletions (Figure 3).

Because all the intronic breakpoints of the large NLRP7 deletions occurred within Alu repeats, we screened the NLRP7 genomic sequence for Alu repeat elements. This analysis revealed that ~48% of the total NLRP7 genomic structure (25,997-bp) is occupied by Alu sequences, which corresponds to one Alu insertion every 450-bp, much higher than the known Alu density in the human genome (one insertion per 3,000-bp).31, 32, 33 Alu elements are well known to be primate-specific,34 and so is NLRP7, which is believed to have arisen by duplication of an NLRP2/7 ancestor in primates.24 To better understand NLRP7 evolution, we examined the Alu content of human NLRP2 and found that, like NLRP7, it is highly rich in Alu elements. Indeed, NLRP2 and NLRP7, along with NLRP12, are the NLRP genes richest in Alu elements (Figure 4). In addition, human NLRP2 and NLRP7 display proportions of the various Alu subfamilies similar to those of their orthologues in chimpanzee, gorilla, orangutan, rhesus macaque, baboon and marmoset (data not shown), suggesting that Alu insertion into primate NLRP2 and NLRP7 occurred in their common NLRP2/7 ancestor before its duplication into two genes. Because of the known role of Alu elements in segmental duplications on many chromosomes,25, 28 Alu insertion and expansion in the primate NLRP2/7 ancestor may have mediated, or facilitated, its duplication into two distinct genes.
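Taking the two spacing figures above at face value, the local excess of Alu insertions in NLRP7 relative to the genome-wide average is roughly:

```latex
\[
\text{fold enrichment} \approx
\frac{1/450\ \text{bp}^{-1}}{1/3000\ \text{bp}^{-1}}
= \frac{3000}{450} \approx 6.7
\]
```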

Figure 4

Alu distribution in human NLRP genes. For all genes, the genomic structures were downloaded from the UCSC Genome Browser (hg19), and each included 1-kb upstream of the first exon and 1-kb downstream of the last exon. The presence of Alu repeats was investigated using CENSOR (http://www.girinst.org/censor/). The percentage of Alu sequences in each gene is the total length of its Alu sequences divided by the length of its genomic structure. Alu elements were found mostly in introns, in agreement with the known distribution of Alu sequences.

Alu elements are the most abundant and successful short interspersed elements in the human genome. They are estimated to contribute to 0.075% of human mutations35 and form a major part of extensive genomic structural variation.33, 36, 37 Several factors predispose Alu elements to recombination, including the number of Alu elements in a chromosome or region, their close proximity to each other, the high GC content of their sequences and the high sequence similarity between the different Alu subfamilies (70–100%).33 Including the current study, the total number of distinct NLRP7 mutations is 59, of which eight (13.6%) are mediated by Alu recombination. Alu-mediated mutations therefore appear to be ~180 times more frequent in NLRP7 than among human mutations overall. This suggests that the genomic architecture of NLRP7 may predispose it to Alu-mediated deletions and rearrangements, which may escape detection by Sanger sequencing or next-generation exome sequencing. Our data call for caution when reporting DNA test results in patients with apparently homozygous NLRP7 mutations, mainly when the parents are not available for testing. In such patients, compound heterozygosity for a mutation detectable by Sanger sequencing and a large deletion may appear as homozygosity for the detectable mutation and consequently lead to an inaccurate molecular diagnosis and inaccurate reproductive genetic counseling.
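The fold excess quoted above is a ratio of two percentages: the share of NLRP7 variants that are Alu-mediated over the estimated share of all human mutations attributed to Alu elements. A minimal sketch of that arithmetic, using only the counts and rate given in this section; the function name is illustrative.

```python
def alu_enrichment(alu_mediated: int, total_variants: int,
                   genome_wide_percent: float) -> tuple[float, float]:
    """Percentage of Alu-mediated variants and fold excess over a genome-wide rate."""
    percent = 100 * alu_mediated / total_variants
    return percent, percent / genome_wide_percent


percent, fold = alu_enrichment(8, 59, 0.075)
print(f"{percent:.1f}% of NLRP7 variants, ~{fold:.0f}-fold the genome-wide 0.075%")
```

With 8 of 59 variants this gives about 13.6% and a fold excess of about 181, in line with the ~180-fold estimate.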