Assessment of the gene mosaicism burden in blood and its implications for immune disorders

Solís-Moruno, Manuel; Mensa-Vilaró, Anna; Batlle-Masó, Laura; Lobón, Irene; Bonet, Núria; Marquès-Bonet, Tomàs; Aróstegui, Juan I.; Casals, Ferran

doi:10.1038/s41598-021-92381-y

Download PDF

Article
Open access
Published: 21 June 2021

Assessment of the gene mosaicism burden in blood and its implications for immune disorders

Manuel Solís-Moruno^1,2,
Anna Mensa-Vilaró^3,4,
Laura Batlle-Masó^1,2,
Irene Lobón¹,
Núria Bonet²,
Tomàs Marquès-Bonet^1,5,6,7,
Juan I. Aróstegui^3,4,8 &
…
Ferran Casals^2,9

Scientific Reports volume 11, Article number: 12940 (2021) Cite this article

1543 Accesses
5 Citations
3 Altmetric
Metrics details

Subjects

A Publisher Correction to this article was published on 17 August 2021

This article has been updated

Abstract

There are increasing evidences showing the contribution of somatic genetic variants to non-cancer diseases. However, their detection using massive parallel sequencing methods still has important limitations. In addition, the relative importance and dynamics of somatic variation in healthy tissues are not fully understood. We performed high-depth whole-exome sequencing in 16 samples from patients with a previously determined pathogenic somatic variant for a primary immunodeficiency and tested different variant callers detection ability. Subsequently, we explored the load of somatic variants in the whole blood of these individuals and validated it by amplicon-based deep sequencing. Variant callers allowing low frequency read thresholds were able to detect most of the variants, even at very low frequencies in the tissue. The genetic load of somatic coding variants detectable in whole blood is low, ranging from 1 to 2 variants in our dataset, except for one case with 17 variants compatible with clonal haematopoiesis under genetic drift. Because of the ability we demonstrated to detect this type of genetic variation, and its relevant role in disorders such as primary immunodeficiencies, we suggest considering this model of gene mosaicism in future genetic studies and considering revisiting previous massive parallel sequencing data in patients with negative results.

The impact of rare and low-frequency genetic variants in common variable immunodeficiency (CVID)

Article Open access 15 April 2021

Blood transcriptomics analysis offers insights into variant-specific immune response to SARS-CoV-2

Article Open access 02 February 2024

Novel causative variants of VEXAS in UBA1 detected through whole genome transcriptome sequencing in a large cohort of hematological malignancies

Article Open access 23 February 2023

Introduction

The distribution and effect of somatic genetic variants in disease has been studied mostly in cancer. However, in the past years, they have also been identified in a wide spectrum of syndromes including neurological disorders as schizophrenia¹, autism spectrum disorder², Alzheimer^3,4,5,6 or Huntington disease⁷, coronary heart disease and stroke⁸ and kidney diseases such as the Alport syndrome^9,10,11. In fact, at least theoretically, all monogenic diseases could be originated by a postzygotic mutation and the resulting somatic mosaicism. In the field of immune-related diseases, a remarkable number of somatic variants have been described in monogenic autoinflammatory diseases^{12,13,14,15,16,17,18,19,20}, and a recent work has shown its important contribution to these disorders and other primary immunodeficiencies (PIDs)²¹.

Understanding the relative abundance of somatic variants in health is critical to design efficient tools for mosaicism detection in disease studies. Different studies have measured the presence of somatic variation in normal tissues, most assessing the presence of mutations in cancer-driver genes, such as NOTCH1 mutations, which undergo expansion through positive selection^22,23,24. They reported the colonization of the tissue by mutant clones increasing with age and exposure to mutagenic agents (sun radiation, tobacco). Other studies, based on single cell²⁵ or transcriptome analysis²⁶ revealed tissue-specific patterns of somatic variant distribution, as well as negative selection of functional variants in non-cancer samples.

The detection of somatic variants from massive parallel sequencing (MPS) data presents some difficulties. Standard variant calling methods are based on the presence of germline heterozygous mutations in about 50% of the sequencing reads, and may fail to detect somatic variants in allelic imbalance and lower frequencies. Most of the algorithms developed for somatic variant analyses have been optimized for cancer studies where a tumour sample is compared with the healthy tissue from the same individual^27,28,29,30. Of note, studies comparing the output of different variant callers have revealed low levels of overlap^29,30. The tumour vs. healthy tissue approach is not suitable for somatic variant detection in mosaicisms, where the same postzygotic variant might be present in several tissues at similar frequencies. Alternatively, other variant calling tools can be applied to non-matched samples^31,32. In this case, allelic imbalance thresholds will need to be relaxed to detect low frequency variants, at the cost of substantially increasing the number of candidate variants. Then, an adequate filtering strategy will be essential to differentiate sequencing artefacts from true genetic variants. These filters are based both on technical criteria to exclude sequencing or mapping errors and biological knowledge to restrict the analysis to a set of candidate regions. A validation step, such as amplicon-based deep sequencing (ADS), will be ultimately required to confirm the presence of a somatic variant and better determine its frequency.

In the present study we aim to assess the load of somatic coding variants in peripheral blood at detectable frequencies from MPS data, which is relevant to detect somatic causal variants in monogenic Mendelian diseases, in particular PIDs. These diseases represent a privileged scenario for the study of the somatic pathogenic variation because of the needed presence of the causal variant in blood, as well as probably in other easily accessible tissues, and the reported important contribution of somatic mutation in these disorders²¹. For this, we initially performed whole-exome sequencing (WES) in a total of 16 samples belonging to 12 individuals. All individuals carry a pathogenic and previously described somatic mutation related to a PID while one patient carries a germline variant. We then selected the best candidate somatic variants, based on read quality and mapping information, to be validated with ADS. With this analysis we have tested the ability to detect causal somatic variation in PID as well as estimated the actual number of functional coding variants in blood at detectable frequencies from WES data.

Material and methods

Ethical approval

Written informed consents for genetic analyses and participation in the study were obtained from each enrolled individual. The Ethics Committees of Hospital Clínic and Universitat Pompeu Fabra (reference number 7HCB/2019/0631), both located in Barcelona, approved the study, which was carried out in accordance with the principles and last amendments of the Declaration of Helsinki.

Samples

The present study included both unique and matched samples from peripheral blood (PB), oral mucosa (OM) and urine (UR) for 12 individuals: (i) 11 unrelated PID patients carrying a pathogenic and previously described somatic variant, and ii) one of the descendants with the same pathogenic variant in germline status (Table 1). In eight individuals, the only analysed sample was PB (S2, S4a, S6, S8, S9, S10 and S11) or OM (S4). In four individuals, we analysed samples from paired tissues: from PB and OM in three patients (S1a-S1b, S3a-S3b and S5a-S5b) and, in the remaining patient, from PB and UR (S7a-S7b).

Table 1 Samples and mutations included in the study.

Full size table

All of the PID mutations are missense single nucleotide variants (SNVs), and are the disease causing mutation either in the proband or in its offspring, where they are germline variants. The range of variant allele frequencies (VAFs) for the somatic variants previously estimated by ADS²¹ ranges from 2.3 to 34.8%.

For patient S5 we included additional samples from urine, oral mucosa, whole blood (before and after anti-IL-1 treatment), and different cell type populations previously isolated by flow cytometry²⁰: neutrophils, monocytes, B cells, T CD4 + cells and T CD8 + cells (all pre-treatment).

Sequencing and genomic analysis

After DNA extraction, library preparation and exome capture were performed with the Nextera Rapid Capture kit (Illumina) according to the manufacturer’s instructions. The libraries were sequenced in a NextSeq Illumina platform in three High Output 2 × 150 paired-end cycles runs to a mean coverage of 245X. We used BWA-mem version 0.7.16a-r1181³³ (https://github.com/lh3/bwa) to map the samples to the human reference genome hg38 (UCSC). We marked duplicated reads using Picard version 2.18.6 (https://github.com/broadinstitute/picard) MarkDuplicates and realigned indels using GATK’s version 3.7³⁵ (https://github.com/broadgsa/gatk) IndelRealigner. We also performed base quality score recalibration using GATK’s BaseRecalibrator.

We used eight publicly available tools to call genetic variants: FreeBayes version 0.9.14-8-g1618f7e³⁴ (https://github.com/freebayes/freebayes), HaplotypeCaller version 3.7³⁵ (https://github.com/broadgsa/gatk), LoFreq version 2.1.2³⁶ (https://github.com/CSB5/lofreq), MuTect2 version 3.7³⁵ (https://github.com/broadgsa/gatk), SomVarIUS version 1.1³⁷ (https://github.com/kylessmith/SomVarIUS), Strelka2 version 2.7.1³⁸ (https://github.com/Illumina/strelka), VarDict version 1.0³⁹ (https://github.com/AstraZeneca-NGS/VarDict) and VarScan2 version 2.4.3⁴⁰ (https://sourceforge.net/projects/varscan/files/). FreeBayes and HaplotypeCaller are purely germline callers. SomVarIUS is a caller designed to detect somatic variants in unpaired samples. The rest of them support a single mode and a paired mode. Although in our study we were not analysing cancer samples, we tested the behaviour of variant callers’ paired mode in this context with the matched PB-OM and PB-UR samples. We used default parameters for all the callers except for VarScan2, where we lowered the allele frequency threshold of 20% and set the p-value to 1 to retrieve all the possible calls. For HaplotypeCaller, we first used the default ploidy parameter of 2 and next we considered other ploidy values: 4, 5, 6 and 10.

For variant calling, the manufacturer’s targeted regions were intersected with our VCF files to retrieve the on target genetic variants, and we restricted our analysis to these regions. We annotated the variants using SnpEff version 4.3t⁴¹ (https://sourceforge.net/projects/snpeff/files/) and SnpSift version 4.3t⁴² (https://sourceforge.net/projects/snpeff/files/). Using the database dbNSFP version 4.0b1a⁴³, we added parameters of interest such as CADD score⁴⁴, GERP score, ExAC⁴⁵ and gnomAD allele frequencies. We also added two functional predictions, gene haploinsufficiency values⁴⁶ and Residual Variation Intolerance Score (RVIS)⁴⁷.

We performed ADS with rhAmpSeq from Integrated DNA Technologies (IDT, Coralville, USA) to validate the candidate somatic variants. We sequenced every selected position to a mean coverage > 20,000X in a NextSeq Illumina platform in a High Output 2 × 150 paired-end cycles run. The confirmed in blood plus 19 additional candidate somatic variants in S5 were analysed for validation in different tissues and cell population samples. They were sequenced in a MiSeq v3 run (2 × 300) to a final depth > 155,000X. We used BWA-mem version 0.7.16a-r1181 to map the fastq files to the human reference genome hg38 (UCSC). We then used pysam version 0.15.2 (https://github.com/pysam-developers/pysam) to count the number of reads supporting every allele, requiring a minimum mapping quality of 20 to calculate VAFs.

Results

Detection of somatic pathogenic variants from WES in PID patients

We performed WES in all DNA samples to a mean coverage of 245X (Table 1). The total number of genetic variants differs among the different callers (Supplementary Fig. S1), mostly because of VarDict and VarScan2, the two callers with relaxed allelic imbalance parameters, which called more than 200,000 variants each. These two callers also show high heterogeneity across samples, which correlates with sequencing depth, as expected in MPS experiments. The amount of overlapping variants across the different callers is uneven, especially for SomVarIUS, due to the low number of variants it calls. The number of concordant variants between VarDict and VarScan2 is also low, probably because VarDict calls 3–4 times the number of indels of Varscan2 and because of discrepancies calling low frequency variants (Supplementary Fig. S2).

Figure 1 shows which known causal somatic variants (Table 1) are detected by each software. FreeBayes and HaplotypeCaller have the lowest detection ratios. For the rest, the ability of detection is similar and seems to depend on the frequency of the mutations, along with the coverage of the sample and the mapping quality. The S1a causal variant has not been called by any software, but visual inspection of the mapped reads revealed that none of them supported the alternative allele (Supplementary Fig. S3). Excluding it, VarDict and VarScan2 were able to detect all the causal variants. To increase the power of detection of HaplotypeCaller, we explored the effect of modifying the ploidy parameter. We used ploidy 2 (default), 4, 5, 6 and 10 in order to call variants with lower frequencies than expected in a germline scenario. This parameter is normally tuned when working with organisms with ploidies different than 2. For instance, decaploid plants have been reported^48,49, and genotypes 0/0/0/0/0/0/0/0/0/1 are possible. This way, the increase of the ploidy parameter makes HaplotypeCaller more sensible to low frequency variants. The percentage of detected variants increased sequentially with the ploidy parameter, although some remained undetected. HaplotypeCaller seems to be sensitive to mapping quality as in the case of the ELANE region (Supplementary Fig. S4), where a variant with moderate frequency is not detected by this caller. Interestingly, we lost one variant using ploidy 10 while it was previously detected with ploidies 5 and 6 due to memory reasons (Fig. 1, expanded in Supplementary Fig. S5).

Next, we assessed the performance of the five variant callers including a paired mode in the four cases with available paired samples (S1, S3, S5 and S7), where the same variant is present in two tissues with different frequencies. As a general trend, there is no improvement of the detection rate when using the paired mode compared to the single mode, probably because of the small differences in allele frequency between tissues. The use of one or the other paired sample as cancer/healthy tissue does not seem to affect the capacity of detection. Again, VarDict and VarScan2 showed the best detection ratios (Supplementary Table S1).

Filtering strategies for the identification of true causal variants

Once genetic variants have been called, a set of different filters is commonly applied to reduce the number of false positives. This is a crucial issue in the study of monogenic syndromes, where the aim is moving from the approximately 20,000 genetic variants identified in a typical WES to one or a few candidate variants. Relaxing or disabling the VAF filters to increase the ability to detect causal somatic variants, as we did in this study, produces an important increase of the number of mutations per individual, making this process highly recommended.

We evaluated the ability to identify the known pathogenic variants after applying the standard filters to the variants called by VarDict and VarScan2, the most successful programs in calling them (Fig. 1). We started by intersecting the two VCF files for every individual, given that in all cases the true variants were retained by both of them. Next, we applied a set of additional filters sequentially (see below), checking in every step if the causal variant was retained or filtered out (Table 2). First, we filtered out SNPs located 6 bp around indels. Second, as suggested previously⁵⁰, we restricted our analysis to the 1000 Genomes Project strict mask filter. Third, we required the positions to be covered by, at least, 50 reads (DP > 50) and to show a minimum quality value of 25 (QUAL > 25). Fourth, we only kept loss of function and missense variants. Fifth, we applied a stringent population allele frequency threshold of 0.001 in gnomAD. With a high probability, a somatic variant will be absent in the population because of its de novo nature, although the possibility of having a recurrent mutation cannot be excluded. Sixth, following the recommendations in the literature, we kept variants with a likely damaging predicted effect (CADD > 15⁴⁴) and a high evolutionary conservation score, as an indicator of its functional importance (GERP > 2⁵¹). Seventh, we required at least three reads supporting the alternative allele (VD ≥ 3) in every call. Finally, we used the list of 333 genes of the International Union Of Immunological Societies (IUIS, updated in February 2018)⁵² as a set of candidate genes for PIDs. Excluding the causal somatic variant of sample S1a, which was not detected in the sequencing process, 13 out of the 14 somatic mutations were included in the final list of candidate variants. The remaining one (S6), was filtered out because of a GERP value lower than 2.

Table 2 Numer of called variants and after sequential variant filtering process for each sample.

Full size table

Mosaicism abundance detection in whole blood

As mentioned above, the consideration of genetic variants deviating from the approximate expectation of 50% read frequency increases substantially the number of called variants. In the previous analyses we assessed how many of the true causal variants in 11 PID samples were detected. Now we wonder what proportion of the called variants in these samples corresponds to real postzygotic mutations, and not to sequencing, mapping or calling errors. We restricted the analysis to coding variants, more prone to have a functional impact and to be related to monogenic disorders. For this, we applied the following filters to select the variants more plausible to be validated as true: we intersected the SNPs called by VarDict and VarScan2, removed SNPs located 6 bps around indels, applied 1000G strict mask, required a minimum depth of 50 and a minimum quality of 25, removed variants classified as common in dbSNP and those shared among samples in the study, removed SNPs located within homopolymers, and removed SNPs in positions where the mappability was not perfect. We also performed a binomial test to exclude potential heterozygous mutations, to estimate the possibility of the observed number of reads supporting the alternative allele given the total number of reads. We finally required a minimum number of reads supporting the alternative allele of 7, due to the large number of variants below this threshold in our dataset (Supplementary Fig. S6). After this filtering, we moved from the approximately 250,000 variants called per individual to around 40. (Fig. 2), representing a total of 461 candidate somatic variants (Supplementary Table S2) for the 11 blood samples. 327 (70%) of the variants were missense, while 92 (20%) were synonymous and 19 (4%) were stop-gain. The remaining 23 variants were annotated as structural interaction variants and splice variants. Remarkably 30 of the variants were located in zinc finger proteins, 20 of them located in chromosome 19, and none of them were validated.

The 461 candidate variants were analysed by ADS with the rhAmpSeq technology (see Methods). All candidate positions were resequenced in the individual in which they were called and in the rest of individuals, plus two healthy individuals as controls. The average coverage per position was 22,500X (max = 272,401, min = 0, sd = 21,296). The overall validation ratio was very low. For five individuals (S6, S7, S8, S9 and S10), only the initial pathogenic variant was validated, with none of the other additional candidate variants confirmed. In other six individuals, including the individual with no somatic variants (S4a), we validated one additional variant: one missense variant in ODF2 (S1), SHISHA2 (S2), STRIP1 (S3) and IL2RG (S11), and one synonymous variant in CACNAS1 (S4) and ROBO4 (S11). Of note, in patient S5 we validated a total of eleven variants: seven missense, being one of them the causal variant in NLRP3, and four synonymous. The twelve variants seemed to cluster in two frequency groups: one with variants of about 25% (including the pathogenic variant) and other with variants about 4.5% (Supplementary Table S2).

Cell type distribution of somatic variants in S5 patient

Given the high number of validated somatic variants in patient S5, we expanded the analysis selecting nine additional candidate genetic variants. These variants were analysed for validation, along with the twelve previously confirmed, both in the whole blood sample and different cell populations separated by flow cytometry²⁰ (Table 3). We also added a whole blood DNA extraction obtained after the anti-IL-1 treatment this patient received. In this experiment, the average coverage per position was 158,000X (max = 484,219, min = 16,689, sd = 80,940). We considered that a somatic variant was validated in a given cell type or tissue when the proportion of reads supporting the alternative allele was above 0.30%, a value close to the average error type of sequencing by synthesis technologies, which also varies with features such as sequence context or the specific nucleotide change^53,54. Six of the nine new genetic variants were validated, with one (chr7:157,614,060) being a germline variant according to its frequency (Table 3).

Table 3 VAF of the 20 somatic candidate variants studied in S5 patient.

Full size table

Overall, we detected 17 somatic variants in this patient, 16 protein coding and one intronic (Table 3), now clustered in three groups with similar VAFs around 24%, 4.5% and 1.5% in whole-blood pre-treatment (Fig. 3) and cell type distribution. VAFs changes across different cell types and tissues are coordinated within each group, being the two main groups only present in the myeloid line as well as in urine and cell mucosa, but absent in the lymphoid line. In general, we found higher allele frequencies in monocytes and lower in oral mucosa. The presence of the somatic variants in oral mucosa and urine was produced by leukocyte infiltration, which was detected by flow cytometry²⁰. On the other hand, the lowest VAF group of variants are detected in myeloid cells and B cells, but not in T cells. The VAF of all the somatic variants is reduced in the whole-blood sample after the anti-IL-1 treatment (Whole blood 3 post, in Table 3). This decrease is more important for the variants restricted to the myeloid line, and it is likely observed because of the increased proliferation of inflammatory cells, which is now controlled with the treatment²⁰.

Discussion

We performed WES of DNA samples from patients with PIDs, carrying variable degree of gene mosaicism and assessed the ability to detect the somatic causal genetic variants by using different tools. Among the eight variant callers tested, VarDict and VarScan showed the higher detection rates of the causal somatic variants. The rest of the callers designed for somatic variant detection (MuTect2, SomVarIUS and Strelka2) mainly showed some limitations with the lower frequency variants at lower coverage. FreeBayes and HaplotypeCaller, designed for germline variant detection, failed to detect most of the somatic mutations. However, the performance of HaplotypeCaller increases when modifying the ploidy parameter, devised for non-diploid organisms and which allowed retrieving variants with less frequency than the expected 50% in the germline. Of interest, the efficiency of the five callers including a paired mode did not increase when using paired samples, probably because of the small frequency difference between the two samples carrying the same mutation.

Allele frequency is the main limitation for calling a somatic variant, with the risk of non-capturing the mutation because of its low frequency and/or insufficient coverage. To capture these low frequency variants, sequencing depths should ideally be higher than the commonly average depths achieved in WES studies (60-100X). However, the average coverage value might not be informative enough on the sequencing performance for all genomic regions, given the non-uniformity of the capture process. The use of new metrics including this information has been proposed⁵⁵, which should help to reduce false-negative results. As an example, the NOD2 region is clearly captured more efficiently than the NLRP3 region in our study (Table 1). On the other side, only a few reads supporting the alternative allele seems enough to detect the variant, with as few as 3 (out of 128) for the S10 variant or 7 (out of 97) for the S1b variant (Table 1). Thus, an increase of the sequencing depth to 100-200X is recommendable in cases in which somatic variation is suspected. Higher coverage facilitates the detection of very low frequency variants, but increases the risk of enlarging the list of candidate variants because of approaching the error rate of MPS technologies⁵⁶.

Genetic studies usually implement a set of filters to reduce the number of candidate variants to the causal one or to a small group. This process is a trade-off between reducing the number of false positives (either sequencing or mapping artefacts, and non-causal variants) and false negatives (called but filtered true causal variants). At the risk of missing the causal variant, these filters are essential to determine, at least, a reduced list of candidate genes for monogenic syndromes. In the case of studies like this, where the relaxation of allele frequency thresholds generates a list of up to hundreds of thousands of variants per sample (Supplementary Fig. S1), this step can be especially critical. After applying commonly used filtering parameters both for sequencing and biological features, only the causal variant in one patient was discarded because of low conservation score (GERP for S6 causal variant: -8.07). In the case of applying more stringent filters, two more variants (S1b and S5a-b) would be missed due to GERP score vale lower than 4^57,58. On the other side, only S6 causal variant would not pass a CADD threshold of 20.

The final number of candidate genetic variants exceeds by about ten times the number of variants in studies analysing germline variants. Considering the IUIS list of 333 candidate genes for PIDs, this is still quite high, with approximately 0.5 variants per gene in each individual. Therefore, it seems recommendable to restrict the analysis to a reduced set of candidate genes according to the clinical phenotype of each patient. Alternatively, the use of some gene features could also help to reduce the list of candidate variants if there is not any a priori clear candidate. Several gene indexes have been developed to measure their possible contribution to human disease. Among them, haploinsufficiency predictions could seem useful for identifying candidate genes in a somatic variant disease model expecting to follow a dominant inheritance pattern. However, all the genes with somatic causal variants included in this study show haploinsufficiency values below the consensus threshold of 0.5, with NLRP3, a gene that is proven to be mutated in different autoinflammatory diseases⁵⁹, showing the highest value of 0.465. In contrast, NLRP3 has been reported as a gene with a high level of intolerance to functional variation (RVIS = − 0.95, in the top 9.38% of genes)²¹.

It is important to consider that exome sequencing was performed in DNA samples obtained from peripheral blood. Therefore, only somatic variants present in the major cell populations in blood can be detected. Neutrophils represent more than half of the nucleated blood cells (55–75%) in healthy individuals, while lymphocytes represent around 20% (from which T cells are ~ 70%, B cells are just ~ 20%, and NK cells ~ 10%)⁶⁰. Thus, for early postzygotic mutations, the capacity of detection will most probably not be affected by the cell type implicated in the disorder, since the variant will have similar frequencies in all cell populations. In contrast, for later onset mutations restricted to particular lineages, the mutation will only be detectable if present in the major cell populations of the analysed tissue. Therefore, for immune disorders, the probability of detecting a causal variant from whole-blood extraction analysis will be much higher in those produced by alteration in the myeloid cells, such as in autoinflammatory disorders, than in the lymphoid cells. This fact can partially explain the larger number of reported cases in autoinflammatory disorders²¹ compared to other PIDs, as well as the lack of success in the identification of somatic variants in lymphoid immunodeficiencies such as CVID⁶¹. In these latter situations, it is expected that a big proportion of somatic causal variants would only be detectable if the analysis is restricted to particular cell types. Thus, cell subsets isolation can be essential to the identification and/or the validation of somatic genetic variants in these less represented cell types.

Beyond the detection of the known causal variants, the detected load of coding variants per exome was very low. Except for S5, all the individuals carry none or only one somatic variant additionally to the causal variant. The vast majority of candidate variants were false positives, even if they passed the mapping and quality filters. Comparing our results to other studies is not straightforward because of the differences in the methodologies used and the scanned VAFs, as well as the conceptual approach and targeted regions (see “Introduction”). A whole-genome sequencing (WGS) data analysis of 11,262 blood samples revealed a median number of three mosaic mutations for younger individuals, increasing after 35–45 years of age, and considering 20 somatic variants as the threshold for clonal expansion, that affecting 12.5% of the individuals⁶². Although the minimum detectable VAF of the study was limited because of the 34.8X mean coverage, the results seem concordant with the low number of somatic variants described in our WES deep sequencing approach. In addition to scanning a wide range of VAFs, we validated our results by ADS, which confirmed the low number of somatic coding variants detectable in blood. At a finer level, the total number of somatic variants per cell has been estimated in single-cell studies^25,26, although most of this variation would remain undetected when the whole tissue is analysed. In fact, when much lower frequencies have been scanned (VAF ≥ 0.0001), it has been shown that clonal haematopoiesis is present in up to 97% of middle-aged people⁶³. However, in absence of positive selection on a given mutation, only those that occurred earlier would reach detectable frequencies.

We identified a particular patient with an excess of validated variants compared to the others. S5 is the oldest individual of our dataset (64 years old), although another individual of similar age was also included in this study. Especially for the higher VAF group of five variants (which includes the causal one in NLRP3), the frequency pattern is quite uniform, except for one of the variants in chromosome X (chrX:71,537,899), with lower frequency in monocytes. The presence of the genetic variants in the lowest frequency cluster in cells of the myeloid lineage and in B cells, but not in T cells, could be explained by its origin in adult hematopoietic stem cells generating multineage outputs⁶⁴. Because of the seemingly aggrupation in three different clusters of frequencies and cell type distribution, we propose simultaneous occurrence and clonal expansion as the most parsimonious explanation. However, none of the genes with somatic variants in S5 (Table 3) seems to be related with cellular proliferation that could be linked to an adaptive advantage of a clone of cells, and we also discarded the presence of additional candidate variant in DNMT3A, TET2 and ASXL1 genes, known to be implicated in hematologic malignancies^8,65. In fact, in the aforementioned study of WGS of 11,262 individuals⁶² only 12.6% of the cases of clonal haematopoiesis had detectable cancer driver mutations. Thus, on the rest of cases as well as for S5, clonal haematopoiesis could be produced by genetic drift, as suggested in simulation analysis⁶⁶. In contrast, a recent study⁶⁷ proposes positive selection being the major driving force of clonal haematopoiesis, and that it would take more than 2000 years for a mutation to reach a VAF > 1% by only drift. However, our results do not seem to fit to this explanation, because of the abovementioned gene location as well as the presence of synonymous and intronic variants.

Finally, although we believe that our study contributes to the understanding of the burden of functional somatic mutations in blood and provides some practical advice on its detection, we would like to acknowledge some limitations of our approach. Allele frequency and sequencing depth are the two main limiting factors to detect a somatic variant as shown in our case by the failure to detect a variant with VAF < 3%. Also, the number of genetic variants depends on the selected software, that show a limited level of overlapping among them. In this sense, we recommend an inclusive strategy by using the less stringent callers or parameters, followed by a filtering strategy based on sequencing and mapping features. However, even by using stringent filters, the capacity of detection of causal variants will be mostly limited to previously known candidate or related genes, given the excessive number of variants when considering the whole exome. Gene functional relevance or mutation tolerance indexes could be used to reduce the number of candidate genes, but they also show limited applicability. Of importance, we also acknowledge the limitations derived from the small size of our cohort which, while allowing the study of somatic variant discovery, makes it difficult to draw conclusions in terms of dynamics of somatic variation.

Conclusions

The detectable genetic load of somatic coding variants in blood is low. A moderate increase of the commonly achieved depths in exome sequencing analyses can be enough to detect most of these variants at frequencies above the technology error rate, for which we recommend using variant callers sensitive to low VAF. Of importance, the high proportion of false positives makes mandatory their validation which will also provide a better estimation of the VAF. Given both the feasibility of this approach and the reported contribution of gene mosaicism to PIDs²¹, we think that this model should be considered in future sequencing studies. It can be of special interest for those disorders related to major cell populations in blood, such as autoinflammatory diseases. We also suggest reanalysing data of undiagnosed patients, especially those where the inheritance pattern in the pedigree and/or the clinical features of the patient might fit this model. Because of the high number of possible somatic variants called per individual, even after applying stringent filters, it is advisable to restrict the analysis to a set of candidate genes defined according to the clinical phenotype. Finally, our results are in agreement with the existence of clonal haematopoiesis produced by drift, and that can be related to non-cancer disorders.

Data availability

The datasets generated during and analysed during the current study are available in the European Nucleotide Archive (ENA) repository under accession code PRJEB44742 (https://www.ebi.ac.uk/ena/browser/view/PRJEB44742).

Change history

17 August 2021
A Correction to this paper has been published: https://doi.org/10.1038/s41598-021-96172-3

References

Bundo, M. et al. Increased L1 retrotransposition in the neuronal genome in schizophrenia. Neuron 81, 306–313 (2014).
CAS PubMed Google Scholar
D’Gama, A. M. et al. Targeted DNA sequencing from autism spectrum disorder brains implicates multiple genetic mechanisms. Neuron 88, 910–917 (2015).
MathSciNet PubMed PubMed Central Google Scholar
Bushman, D. M. et al. Genomic mosaicism with increased amyloid precursor protein (APP) gene copy number in single neurons from sporadic Alzheimer’s disease brains. Elife 2015, 1–26 (2015).
Google Scholar
Parcerisas, A. et al. Somatic signature of brain-specific single nucleotide variations in sporadic alzheimer’s disease. J. Alzheimer’s Dis. 42, 1357–1382 (2014).
CAS Google Scholar
Beck, J. A. et al. Somatic and germline mosaicism in sporadic early-onset Alzheimer’s disease. Hum. Mol. Genet. 13, 1219–1224 (2004).
CAS PubMed Google Scholar
Sala Frigerio, C. et al. On the identification of low allele frequency mosaic mutations in the brains of Alzheimer’s disease patients. Alzheimer’s Dement. 11, 1265–1276 (2015).
Google Scholar
Swami, M. et al. Somatic expansion of the Huntington’s disease CAG repeat in the brain is associated with an earlier age of disease onset. Hum. Mol. Genet. 18, 3039–3047 (2009).
CAS PubMed PubMed Central Google Scholar
Jaiswal, S. et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).
PubMed PubMed Central Google Scholar
Krol, R. P. et al. Somatic mosaicism for a mutation of the COL4A5 gene is a cause of mild phenotype male Alport syndrome. Nephrol. Dial. Transplant. 23, 2525–2530 (2008).
CAS PubMed Google Scholar
Bruttini, M. et al. Mosaicism in alport syndrome and genetic counseling. J. Med. Genet. 37, 717–719 (2000).
CAS PubMed PubMed Central Google Scholar
Plant, K. E., Boye, E., Green, P. M., Vetrie, D. & Flinter, F. A. Somatic mosaicism associated with a mild Alport syndrome phenotype. J. Med. Genet. 37, 238–239 (2000).
CAS PubMed PubMed Central Google Scholar
Kawasaki, Y. et al. Identification of a high-frequency somatic NLRC4 mutation as a cause of autoinflammation by pluripotent cell-based phenotype dissection. Arthritis Rheumatol. 69, 447–459 (2017).
CAS PubMed Google Scholar
Bessler, M. et al. Paroxysmal nocturnal haemoglobinuria (PNH) is caused by somatic mutations in the PIG-A gene. EMBO J. 13, 110–117 (1994).
CAS PubMed PubMed Central Google Scholar
Saito, M. et al. Disease-associated CIAS1 mutations induce monocyte death, revealing low-level mosaicism in mutation-negative cryopyrin-associated periodic syndrome patients. Blood 111, 2132–2141 (2008).
CAS PubMed Google Scholar
Takeda, J. et al. Deficiency of the GPI anchor caused by a somatic mutation of the PIG-A gene in paroxysmal nocturnal hemoglobinuria. Cell 73, 703–711 (1993).
CAS PubMed Google Scholar
Zhou, Q. et al. Cryopyrin-associated periodic syndrome caused by a myeloid-restricted somatic NLRP3 mutation. Arthritis Rheumatol. 67, 2482–2486 (2015).
CAS PubMed PubMed Central Google Scholar
Tanaka, N. et al. High incidence of NLRP3 somatic mosaicism in patients with chronic infantile neurologic, cutaneous, articular syndrome: Results of an international multicenter collaborative study. Arthritis Rheum. 63, 3625–3632 (2011).
CAS PubMed PubMed Central Google Scholar
Saito, M. et al. Somatic mosaicism of CIAS1 in a patient with chronic infantile neurologic, cutaneous, articular syndrome. Arthritis Rheum. 52, 3579–3585 (2005).
CAS PubMed Google Scholar
Mensa-Vilaro, A. et al. First Identification of intrafamilial recurrence of blau syndrome due to gonosomal NOD2 mosaicism. Arthritis Rheumatol. 68, 1039–1044 (2016).
CAS PubMed Google Scholar
Mensa-Vilaro, A. et al. Brief report: Late-onset cryopyrin-associated periodic syndrome due to myeloid-restricted somatic NLRP3 mosaicism. Arthritis Rheumatol. 68, 3035–3041 (2016).
CAS PubMed Google Scholar
Mensa-Vilaró, A. et al. Unexpected relevant role of gene mosaicism in primary immunodeficiency diseases. J. Allergy Clin. Immunol. https://doi.org/10.1016/j.jaci.2018.09.009 (2018).
Article PubMed Google Scholar
Yokoyama, A. et al. Age-related remodelling of oesophageal epithelia by mutated cancer drivers. Nature 565, 312–317 (2019).
ADS CAS PubMed Google Scholar
Martincorena, I. et al. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).
ADS CAS PubMed PubMed Central Google Scholar
Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).
ADS CAS PubMed PubMed Central Google Scholar
Lodato, M. A. et al. Somatic mutation in single human neurons tracks developmental and transcriptional history. Science 350, 94–98 (2015).
García-Nieto, P. E., Morrison, A. J. & Fraser, H. B. The somatic mutation landscape of the human body. Genome Biol. 20, 1–20 (2019).
Google Scholar
Hofmann, A. L. et al. Detailed simulation of cancer exome sequencing data reveals differences and common limitations of variant callers. BMC Bioinform. 18, 1–15 (2017).
Google Scholar
Xu, H., DiCarlo, J., Satya, R. V., Peng, Q. & Wang, Y. Comparison of somatic mutation calling methods in amplicon and whole exome sequence data. BMC Genom. 15, 1–10 (2014).
CAS Google Scholar
Cai, L., Yuan, W., Zhang, Z., He, L. & Chou, K. C. In-depth comparison of somatic point mutation callers based on different tumor next-generation sequencing depth data. Sci. Rep. 6, 1–9 (2016).
CAS Google Scholar
Krøigård, A. B., Thomassen, M., Lænkholm, A. V., Kruse, T. A. & Larsen, M. J. Evaluation of nine somatic variant callers for detection of somatic mutations in exome and targeted deep sequencing data. PLoS ONE 11, 1–15 (2016).
Google Scholar
Sandmann, S. et al. Evaluating variant calling tools for non-matched next-generation sequencing data. Sci. Rep. 7, 1–12 (2017).
Google Scholar
Teer, J. K. et al. Evaluating somatic tumor mutation detection without matched normal samples. Hum. Genom. 11, 1–13 (2017).
ADS Google Scholar
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. [q-bio.GN] (2013).
Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. arXiv Prepr. arXiv1207.3907 [q-bio.GN] (2012).
Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. bioRxiv https://doi.org/10.1101/201178 (2017).
Article Google Scholar
Wilm, A. et al. LoFreq: A sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res. 40, 11189–11201 (2012).
CAS PubMed PubMed Central Google Scholar
De, S. SomVarIUS : Somatic variant identification from unpaired tissue samples Genome analysis samples. Bioinformatics https://doi.org/10.1093/bioinformatics/btv685 (2015).
Article PubMed Google Scholar
Kim, S. et al. Strelka2: Fast and accurate calling of germline and somatic variants. Nat. Methods 15, 591–594 (2018).
CAS PubMed Google Scholar
Lai, Z. et al. VarDict: A novel and versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Res. 44, 1–11 (2016).
Google Scholar
Koboldt, D. C. et al. VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).
CAS PubMed PubMed Central Google Scholar
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly (Austin) 6, 80–92 (2012).
CAS Google Scholar
Cingolani, P. et al. Using Drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, SnpSift. Front. Genet. 3, 35 (2012).
PubMed PubMed Central Google Scholar
Liu, X., Wu, C., Li, C. & Boerwinkle, E. dbNSFP v3.0: A one-stop database of functional predictions and annotations for human nonsynonymous and splice-site SNVs. Hum. Mutat. 37, 235–241 (2016).
PubMed PubMed Central Google Scholar
Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).
CAS PubMed PubMed Central Google Scholar
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
CAS PubMed PubMed Central Google Scholar
Huang, N., Lee, I., Marcotte, E. M. & Hurles, M. E. Characterising and predicting haploinsufficiency in the human genome. PLoS Genet. 6, 1–11 (2010).
Google Scholar
Petrovski, S., Wang, Q., Heinzen, E. L., Allen, A. S. & Goldstein, D. B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, 1003709 (2013).
Google Scholar
Ahmadi, H. & Bringhurst, R. S. Breeding Strawberries at the Decaploid Level. J. Am. Soc. Hortic. Sci. 117, 856–862 (2019).
Google Scholar
Hummer, K. E., Nathewet, P. & Yanagi, T. Decaploidy in Fragaria iturupensis (Rosaceae). Am. J. Bot. 96, 713–716 (2009).
PubMed Google Scholar
Bae, T. et al. Different mutational rates and mechanisms in human cells at pregastrulation and neurogenesis. Science 555, 550–555 (2018).
ADS Google Scholar
Myers, R. M. et al. Evolutionary constraint facilitates interpretation of genetic variation in resequenced human genomes. Genome Res. https://doi.org/10.1101/gr.102210.109 (2010).
Article PubMed PubMed Central Google Scholar
Picard, C. et al. International Union of Immunological Societies: 2017 Primary immunodeficiency diseases committee report on inborn errors of immunity. J. Clin. Immunol. 38, 96–128 (2018).
PubMed Google Scholar
Salk, J. J., Schmitt, M. W. & Loeb, L. A. Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations. Nat. Rev. Genet. 19, 269–285 (2018).
CAS PubMed PubMed Central Google Scholar
Pfeiffer, F. et al. Systematic evaluation of error rates and causes in short samples in next-generation sequencing. Sci. Rep. 8, 1–14 (2018).
Google Scholar
Wang, Q., Shashikant, C. S., Jensen, M., Altman, N. S. & Girirajan, S. Novel metrics to measure coverage in whole exome sequencing datasets reveal local and global non-uniformity. Sci. Rep. 7, 1–11 (2017).
PubMed PubMed Central Google Scholar
Schirmer, M. et al. Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform. Nucleic Acids Res. 43, e37 (2015).
PubMed PubMed Central Google Scholar
Amendola, L. M. et al. Actionable exomic incidental findings in 6503 participants: Challenges of variant classification. Genome Res. 25, 305–315 (2015).
CAS PubMed PubMed Central Google Scholar
de Valles-Ibáñez, G. et al. Genetic load of loss-of-function polymorphic variants in Great Apes. Genome Biol. Evol. 8, 871–877 (2016).
PubMed PubMed Central Google Scholar
de Torre-Minguela, C., del Castillo, P. M. & Pelegrín, P. The NLRP3 and pyrin inflammasomes: Implications in the pathophysiology of autoinflammatory diseases. Front. Immunol. 8, 43 (2017).
PubMed PubMed Central Google Scholar
Berrington, J. E., Barge, D., Fenton, A. C., Cant, A. J. & Spickett, G. P. Lymphocyte subsets in term and significantly preterm UK infants in the first year of life analysed by single platform flow cytometry. Clin. Exp. Immunol. 140, 289–292 (2005).
CAS PubMed PubMed Central Google Scholar
de Valles-Ibáñez, G. et al. Evaluating the genetics of common variable immunodeficiency: Monogenetic model and beyond. Front. Immunol. 9, 1–15 (2018).
Google Scholar
Zink, F. et al. Clonal hematopoiesis, with and without candidate driver mutations, is common in the elderly. Blood 130, 742–752 (2017).
CAS PubMed PubMed Central Google Scholar
Young, A. L., Tong, R. S., Birmann, B. M. & Druley, T. E. Clonal hematopoiesis and risk of acute myeloid leukemia. Haematologica 104, 2410 (2019).
CAS PubMed PubMed Central Google Scholar
Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).
ADS CAS PubMed PubMed Central Google Scholar
Genovese, G. et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N. Engl. J. Med. 371, 2477–2487 (2014).
PubMed PubMed Central Google Scholar
Klein, A. M. & Simons, B. D. Universal patterns of stem cell fate in cycling adult tissues. Development 138, 3103–3111 (2011).
CAS PubMed Google Scholar
Watson, C. J., Papula, A. L., Poon, G. Y. P., Wong, W. H. & Young, A. L. The evolutionary dynamics and fitness landscape of clonal hematopoiesis. Science 1454, 1449–1454 (2020).
ADS Google Scholar

Download references

Acknowledgements

This study was funded by grants SAF2015-68472-C2-2-R from the Ministerio de Economía y Competitividad (Spain), RTI2018-096824-B-C22 grant from the Spanish Ministry of Science, Innovation and Universities co-financed by FEDER and by Direcció General de Recerca, Generalitat de Catalunya (2017SGR-702) to F.C. M.S.-M. is supported by the Ministerio de Economía y Competitividad, Spain (Maria de Maetzu grant MDM-2014-0370-16-3). L.B.-M. is supported by a Formació de personal Investigador fellowship from Generalitat de Catalunya (2018_FI_B00072). T.M-B. is supported by BFU2017-86471-P (MINECO/FEDER, UE), U01 MH106874 grant, Howard Hughes International Early Career, Obra Social "La Caixa" and Secretaria d’Universitats i Recerca and CERCA Programme del Departament d’Economia i Coneixement de la Generalitat de Catalunya (GRC 2017 SGR 880). Supported in part by CERCA Programme/Generalitat de Catalunya (JI.A.), SAF2015-68472-C2-1-R grant from the Ministerio de Economía y Competitividad (Spain) co-financed by European Regional Development Fund (ERDF) (JI.A.), RTI2018-096824-B-C21 grant from the Ministerio de Ciencia, Innovación y Universidades (Spain) co-financed by ERDF (JI.A.), AC15/00027 grant from the Instituto de Salud Carlos III / Transnational Research Projects on Rare Diseases (JI.A.).

Author information

Authors and Affiliations

Institut de Biologia Evolutiva (CSIC-UPF), Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Doctor Aiguader 88, Barcelona, Spain
Manuel Solís-Moruno, Laura Batlle-Masó, Irene Lobón & Tomàs Marquès-Bonet
Genomics Core Facility, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, 08003, Barcelona, Spain
Manuel Solís-Moruno, Laura Batlle-Masó, Núria Bonet & Ferran Casals
Department of Immunology, Hospital Clínic, Barcelona, Spain
Anna Mensa-Vilaró & Juan I. Aróstegui
Institut d’Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
Anna Mensa-Vilaró & Juan I. Aróstegui
Catalan Institution of Research and Advanced Studies (ICREA), Passeig de Lluís Companys, 23, 08010, Barcelona, Spain
Tomàs Marquès-Bonet
CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028, Barcelona, Spain
Tomàs Marquès-Bonet
Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, C/ Columnes S/N, Cerdanyola del Vallès, 08193, Barcelona, Spain
Tomàs Marquès-Bonet
Universitat de Barcelona, Barcelona, Spain
Juan I. Aróstegui
Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Barcelona, Catalonia, Spain
Ferran Casals

Authors

Manuel Solís-Moruno
View author publications
You can also search for this author in PubMed Google Scholar
Anna Mensa-Vilaró
View author publications
You can also search for this author in PubMed Google Scholar
Laura Batlle-Masó
View author publications
You can also search for this author in PubMed Google Scholar
Irene Lobón
View author publications
You can also search for this author in PubMed Google Scholar
Núria Bonet
View author publications
You can also search for this author in PubMed Google Scholar
Tomàs Marquès-Bonet
View author publications
You can also search for this author in PubMed Google Scholar
Juan I. Aróstegui
View author publications
You can also search for this author in PubMed Google Scholar
Ferran Casals
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

F.C., JI.A, T.M.-B and M.S.-M. conceived and designed the study. M.S.-M., A.M.-V., L.B.-M. and I.L. analysed data. M.S.-M., A.M.-V. and N.B. performed laboratory work. All authors participated in the writing and correction of the manuscript.

Corresponding author

Correspondence to Ferran Casals.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this Article was revised: The original PDF version of this Article omitted an affiliation for Ferran Casals. The correct affiliations for Ferran Casals are ‘Genomics Core Facility, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, 08003, Barcelona, Spain’ and ‘Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Barcelona, Catalonia, Spain’.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Solís-Moruno, M., Mensa-Vilaró, A., Batlle-Masó, L. et al. Assessment of the gene mosaicism burden in blood and its implications for immune disorders. Sci Rep 11, 12940 (2021). https://doi.org/10.1038/s41598-021-92381-y

Download citation

Received: 14 May 2020
Accepted: 09 June 2021
Published: 21 June 2021
DOI: https://doi.org/10.1038/s41598-021-92381-y

This article is cited by

Somatic genetic variation in healthy tissue and non-cancer diseases
- Manuel Solís-Moruno
- Laura Batlle-Masó
- Ferran Casals
European Journal of Human Genetics (2023)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

The impact of rare and low-frequency genetic variants in common variable immunodeficiency (CVID)

Blood transcriptomics analysis offers insights into variant-specific immune response to SARS-CoV-2

Novel causative variants of VEXAS in UBA1 detected through whole genome transcriptome sequencing in a large cohort of hematological malignancies

Introduction

Material and methods

Ethical approval

Samples

Sequencing and genomic analysis

Results

Detection of somatic pathogenic variants from WES in PID patients

Filtering strategies for the identification of true causal variants

Mosaicism abundance detection in whole blood

Cell type distribution of somatic variants in S5 patient

Discussion

Conclusions

Data availability

Change history

17 August 2021

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher's note

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Somatic genetic variation in healthy tissue and non-cancer diseases

Comments

Search

Quick links