Ultrasensitive amplicon barcoding for next-generation sequencing facilitating sequence error and amplification-bias correction

Ahmed, Ibrahim; Tucci, Felicia A.; Aflalo, Aure; Smith, Kenneth G. C.; Bashford-Rogers, Rachael J. M.

doi:10.1038/s41598-020-67290-1

Article
Open access
Published: 29 June 2020

Ibrahim Ahmed¹^nAff2,
Felicia A. Tucci³,
Aure Aflalo^1,4,
Kenneth G. C. Smith^1,5 &
…
Rachael J. M. Bashford-Rogers^1,3

Scientific Reports volume 10, Article number: 10570 (2020)

An Author Correction to this article was published on 07 October 2020

This article has been updated

Abstract

The ability to accurately characterize DNA variant proportions using PCR amplification is key to many genetic studies, including studying tumor heterogeneity, 16S microbiome, viral and immune receptor sequencing. We develop a novel generalizable ultrasensitive amplicon barcoding approach that significantly reduces the inflation/deflation of DNA variant proportions due to PCR amplification biases and sequencing errors. This method was applied to immune receptor sequencing, where it significantly improves the quality and estimation of diversity of the resulting library.

Introduction

Amplicon sequencing is often the basis for characterizing DNA variant proportions, and is routinely used in many areas including tumor heterogeneity¹, 16S microbiome², viral³, CRISPR/Cas9 library screens⁴ and immune receptor sequencing⁵. However, the ability to accurately quantify the proportions of DNA variants is hampered by amplification biases that lead to inflation/deflation of some DNA amplicons, as well as the inability to correct sequencing errors (Fig. S1a). To overcome the amplification biases in DNA-based amplification and sequencing, we developed a novel generalizable ultrasensitive amplicon barcoding approach that significantly reduces the inflation/deflation of DNA variant proportions from PCR amplification biases and sequencing errors.

Amplification biases from RNA starting material have been largely addressed by the introduction of unique molecular identifiers (UMIs) in the reverse transcription primers (barcoded primers), so that the subsequent PCR amplification of each cDNA molecule can be quantified and corrected through the capture of the UMI barcode. However, when starting from a DNA template, this approach cannot be used. Previous attempts at generating barcoded PCR amplicons from DNA using barcoded primers via standard exponential PCR amplification led to the preferential amplification of PCR amplicons rather than template⁶, resulting in significant amplification biases (Fig. S1b).

To overcome these issues, we established sUMI-seq (secondary structure-assisted UMI incorporation, amplification and sequencing), a PCR amplification using barcoded primers that generate self-annealing amplicons (denoted sUMI-seq primers; Fig. 1a). These sUMI-seq primers contain three key regions: (1) the target gene-specific region, (2) a UMI primer barcode (8 bp), and (3) a region based on the multiple annealing and looping-based amplification cycles (MALBAC) methodology⁷, in which the PCR products are able to self-anneal, forming MALBAC amplicon loops. These amplicon loops are largely not amplified further, because loop closure is thermodynamically and kinetically favoured over primer annealing to the loop, whereas the original DNA template remains open and available for primer annealing (Fig. S2). This results in a close-to-linear amplification of template DNA, rather than standard exponential amplification, because the MALBAC amplicon loops are unavailable for further amplification. This first PCR (PCR1) is followed by a cleanup step to remove unbound primers and primer dimers. Then a second PCR (PCR2), with primers annealing to the common MALBAC region of PCR1 amplicons, generates linearized amplicons that are amenable to library preparation and high-throughput sequencing. A bioinformatics pipeline was developed to identify the primer barcodes, to correct for amplification frequency and to correct sequencing errors through alignment of sequences sharing the same barcode (code made available at https://github.com/rbr1/sUMI_processing_pipeline).
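The difference between exponential and close-to-linear amplification can be illustrated with an idealized model. The sketch below is not part of the published pipeline: it assumes perfect per-cycle efficiency, that standard PCR products always serve as templates, and that MALBAC loops are never re-amplified. The function names are hypothetical.

```python
def exponential_copies(templates: int, cycles: int) -> int:
    """Idealized standard PCR: every product is itself a template,
    so the number of molecules doubles in each cycle."""
    return templates * 2 ** cycles


def linear_copies(templates: int, cycles: int) -> int:
    """Idealized looped-amplicon PCR: only the original templates are
    copied, giving one new amplicon per template per cycle."""
    return templates * cycles


# Compare both models for the PCR1 cycle numbers tested in the text.
for cycles in (5, 10, 20):
    print(cycles, exponential_copies(1, cycles), linear_copies(1, cycles))
```

Under these assumptions, doubling the number of cycles doubles the linear yield, whereas the exponential yield grows by a factor of 2 for every additional cycle.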

sUMI-seq is made possible by two key innovations. Firstly, the PCR amplification step (PCR1) using the sUMI-seq primers allows for preferential amplification of the template DNA and minimal amplification of the MALBAC looped amplicons. Secondly, linearization of the self-annealed PCR amplicons in PCR2 leads to increased sensitivity in the face of low template DNA input. The use of barcoded sample primers in PCR2 allows for sample pooling, and efficient library preparation and sequencing.

One area in which quantification of DNA variants is important is B cell receptor sequencing. B cell receptors (BCRs) are membrane-bound immunoglobulins (Igs), which are secreted as antibodies by antibody-secreting cells (plasma cells) that differentiate from naïve or memory B cells upon antigen activation. The huge diversity of the antibody repertoire is due to DNA recombination of variable gene segments (V, (D), J) at the Ig heavy (IgH) and light (IgL) chain loci during B cell ontogeny, and subsequent acquisition of somatic hypermutation (SHM) in activated B cells. Both IgH and IgL variable regions are further subdivided into four framework regions (FWR 1–4), determining the antibody folding, and three complementarity determining regions (CDR 1–3), involved in antigen binding. BCRs represent unique markers for each B cell clone (Fig. 1b,c). BCR repertoire analysis by deep sequencing of BCR genes allows measurement of the diversity and complexity of the B cell response, and identification of clonally related B cells (B cell clones), which correlate with different immunological conditions. BCR repertoire analysis from blood or tissues by high-throughput sequencing has been used to provide powerful insights into B cell biology and to track B cell clones in the context of health^6,8, autoimmunity⁹, cancer^10,11, infection¹², vaccination¹³ and in other diseases.

BCR sequencing from RNA has been established using UMIs of each RNA molecule to accurately quantify relative BCR RNA frequencies⁶. Despite the successes of RNA-based BCR repertoire sequencing, RNA can be a sub-optimal substrate for BCR sequencing. The level of Ig transcripts is upregulated through B cell maturation and activation upon antigen encounter, with plasma cells having the highest amount of Ig mRNA per cell. This may lead to inflation or deflation of the detected B cell clonotypes of a repertoire. This was shown clearly in the detection of B cell acute lymphocytic leukaemia¹⁰, where lower numbers of BCR RNA molecules per leukemic cell compared to non-leukemic B cells led to a significant underrepresentation of the patient tumor proportion. This major limitation can be overcome through sequencing from DNA, as B cells carry one functional BCR allele per cell (one B cell – one antibody). However, no reliable method for molecular barcoding during PCR amplification of DNA has yet been established, thus leading to potential amplification biases in the resulting sequencing data.

Here we develop and validate a novel generalizable ultrasensitive amplicon barcoding approach and apply it to BCR sequencing, where it significantly improves the quality and estimation of diversity of the resulting library.

Results and Discussion

To test the effectiveness of sUMI-seq PCR, a synthetic DNA fragment library was designed containing an internal DNA barcode (referred to as synthetic DNA-UMI) unique to each DNA molecule (Figs. 2a, S3). This synthetic DNA fragment UMI library design was based on a BCR sequence, so that both the synthetic DNA-UMI library and the BCR repertoire from clinical samples can be amplified using the same primer sets. Specifically, forward primers anneal to the IgH V genes (FR3) and reverse primers anneal to the IgH J genes (Table S1), allowing the amplification of the IgH variable region encompassing the CDR3, which is the major determinant of antibody-binding specificity (Fig. 5ci). Together, this synthetic DNA fragment UMI library design facilitates quantification of the relative amplification of each unique DNA template between methods. The sUMI-seq PCR was applied to the synthetic DNA fragment library using either 5, 10, or 20 PCR cycles in PCR1, to test the effect of different numbers of PCR cycles, followed by PCR2 (20 cycles). In addition, a standard non-barcoded PCR using standard non-barcoded primers (i.e. containing the gene-specific annealing region only) was performed on the same synthetic DNA-UMI template with an equivalent approach (see Methods). Each reaction condition successfully generated PCR amplicons that were subsequently sequenced by MiSeq (sequencing information in Table S2).

Firstly, we quantified the frequency of further amplification of PCR1 self-annealing MALBAC loop products depending on the number of PCR cycles. This is achieved through the assessment of the frequencies of the synthetic DNA-UMI per primer barcode pair, i.e. 1000*(“number of identical synthetic DNA-UMIs”/“number of sUMI-seq primer barcode pairs”) (Fig. 2b). As expected, the frequency of duplicated synthetic DNA-UMIs per sUMI-seq primer barcode pair was low (mean rate per sample of 0.392–1.335 per 1000 reads). The rate of duplicated synthetic DNA-UMIs per sUMI-seq primer barcode pair increased with the number of PCR1 cycles and appeared to asymptote at 1.291 (Fig. 2b, 75% confidence intervals 0.939–2.536, p-value = 0.0168). This demonstrated only a low level of further amplification of the MALBAC loop amplicon in PCR1, and this level can be tailored depending on the number of PCR cycles. Importantly, between 5–10 cycles and between 10–20 cycles, less than a 2-fold increase in MALBAC-loop-specific amplification was observed. This suggests that the PCR1 step MALBAC-loop-specific amplification is non-exponential, and the PCR1 step predominantly amplifies the original DNA template.

Next, we quantified and compared the relative amplification biases between sUMI-seq and standard non-barcoded PCR through amplification of the synthetic DNA-UMIs. To determine the effectiveness of the sUMI-seq primers in reducing the amplification biases, we also compared the relative amplification biases after filtering using or ignoring the sUMI-seq barcode information (Fig. 2ci). To account for differences in read depths between the different methods, each filtered dataset was subsampled to the same read depth across all samples (3000 reads), and the relative amplification bias was calculated, defined as the maximum number of reads containing the same synthetic DNA-UMI. The mean relative amplification bias was calculated from 500 repeats per sample (Fig. 2cii). Indeed, the mean relative biases were equivalent between sUMI-seq PCR ignoring the barcode information and the standard non-barcoded PCR. However, the mean relative biases were significantly lower in the sUMI-seq PCR using the barcode information compared to ignoring the barcode information (p-value = 0.005). Together, this highlights the need to account for amplification biases.

Finally, a quantitative amplicon barcoding method should have a linear correlation between DNA template input and sequence output. To test this, we performed a dilution series of a peripheral blood (PB) DNA sample mixed with the synthetic UMI-DNA library at varying ratios, and sUMI-seq PCR was applied, again using either 5, 10, or 20 PCR1 cycles (Fig. 3a). The PB DNA sample was from a chronic lymphocytic leukaemia (CLL) patient, characterised by a clonal expansion of a single B cell clone, where >50% of all peripheral B cells contain a single IgH VDJ rearrangement (IGHV1–69*14-IGHJ6*02), as previously published⁵. Indeed, there was a strong linear relationship between CLL DNA input and proportion of sequencing reads after accounting for barcodes (Figs. 3b,c, S4). This suggests that sUMI-seq primers can be used to accurately correct amplification bias.

We next applied sUMI-seq PCR to BCR sequencing of clinical samples, namely a well-characterized cohort of peripheral blood mononuclear cell (PBMC) samples from 11 healthy individuals and 4 chronic lymphocytic leukemia (CLL) patients that had previously been sequenced using the conventional BCR non-barcoded amplification method¹⁴. Subsequent linearization of the amplicons in PCR2 with the inclusion of a sample-specific barcode in the linearization primers was used to facilitate efficient sample pooling before library preparation and high-throughput sequencing (Fig. 4a). This yielded between 4228–29372 unique sUMI-barcodes per sample after filtering for BCRs, comprising between 1055–4688 unique IgH V-D-J rearrangements per sample (Table S1). As previously observed, the healthy individual samples yielded diverse BCR repertoires (Figs. 4b and S6), whereas the CLL samples were characterized by the clonal expansion of a single malignant B cell clone (Fig. 4b), demonstrated by the increased maximum clone size and clonal diversification indices. The BCR sequences of dominant malignant BCR clones identified by sUMI-seq were identical to those obtained by conventional BCR non-barcoded amplification methods and by BCR amplification from RNA, as previously published⁵ (Fig. S7). Furthermore, the frequency of each B cell clone, as defined by the CDR3 of the BCR sequence, was highly correlated with that of the conventional DNA amplification method (Fig. 4c). Together, this demonstrated that sUMI-seq PCR could be used to efficiently capture BCR repertoire data from DNA sources.

We next determined whether the capture of BCR repertoires was significantly improved using sUMI-seq. Given that sUMI-seq benefits from both error correction and amplification-bias correction (see Methods), we hypothesized that the estimation of (1) clonal diversity, (2) the level of somatic hypermutation and (3) mean amplicon length would be improved compared to standard non-barcoded BCR PCR amplification methods.

Firstly, the relative clonal diversity of all clones representing >1% of the total repertoire in each sample was compared between filtering using sUMI-seq barcode information to correct for amplification biases and filtering ignoring the sUMI-seq barcode information, whilst accounting for read depth (Fig. 5a,bi). Indeed, the use of the sUMI-seq barcode information resulted in a significant reduction in estimated clonal diversity in all clones tested in both healthy (diverse) and CLL (clonal) BCR repertoires (p-values < 1e-10, Fig. 5bii). This suggests that standard PCR amplification methods overestimate the diversity of DNA pools due to the introduction of PCR amplification and sequencing errors, which can be corrected through the use of sUMI-seq primer barcoding. Secondly, the estimated level of somatic hypermutation (SHM) was significantly reduced when the sUMI-seq barcode information was used for filtering. This was demonstrated by a significantly higher proportion of unmutated BCR sequences (i.e. the IGHV region within 1 bp difference from the closest germline reference gene) when using the sUMI-seq primer barcoding (Fig. 5cii). The nature of the mutations, often reported in BCR sequencing studies¹⁵, was also significantly different when using error and amplification bias correction. This was shown both in terms of the lower silent-to-non-silent mutation ratio (p-value = 0.033, Fig. 5biii) and the locations of the mutations, namely a higher proportion of mutations occurring in the CDRs compared to the FWRs (p-value = 0.00059, Fig. 5civ). The latter is in agreement with previous studies in which mutations are known to preferentially occur in the CDRs compared to the FWRs^16,17.

Furthermore, PCR amplification is known to preferentially amplify shorter amplicons. The CDR3 is the most variable region of the BCR sequence, driven in part by the combinations of different IGHV-D-J regions that are recombined during B cell maturation (Fig. 5ci). Indeed, longer CDR3 lengths (longer than ~20 amino acids) are associated with both auto- and poly-reactivity and are often interrogated in BCR repertoire studies¹⁸. The mean CDR3 region length can be determined from the BCR sequencing data, and, indeed, significantly increased mean CDR3 lengths were observed using amplification-bias correction via sUMI-seq compared to when error correction was not used (Fig. 5cv).

Together, these data suggest that the sUMI-seq barcoded approach gives a closer representation of the “ground truth” of the BCR repertoire than the non-barcoded repertoires. This demonstrates that the estimation of diversity, mutation and amplicon lengths of a mixed DNA pool are all significantly improved by sUMI-seq compared to conventional non-barcoded methods.

In summary, the sUMI-seq strategy allows for ultrasensitive barcoding of PCR amplicons from DNA for high-throughput sequencing, benefiting from significantly reduced PCR and sequencing errors and amplification biases, leading to more accurate characterization of mixed DNA samples. We applied this method to immune receptor repertoire (BCR) profiling, where sUMI-seq captured both highly diverse and highly clonal B cell repertoires from healthy individuals and CLL patients, respectively. sUMI-seq allowed for more accurate estimation of diversity, mutation and amplicon lengths, which are key analyses in many studies of mixed DNA variant pools. sUMI-seq can be easily applied to any PCR amplicons, and benefits from simplicity of primer design, a straightforward amplification protocol with few steps, and a streamlined method for incorporating sample barcodes in the second PCR. We have demonstrated the utility and power of this method in the characterization of complex immune receptor repertoire profiles, and it may be applied to a wide range of other applications in which characterizing DNA variants may be obscured by amplification bias or sequence error.

Materials and Methods

Samples

Peripheral blood mononuclear cells (PBMCs) were isolated from 10 mL of whole blood from healthy volunteers and CLL patients using Ficoll gradients (GE Healthcare). Total RNA was isolated using TRIzol and purified using RNeasy Mini Kit (Qiagen), including on-column DNase digestion according to the manufacturer’s instructions. Ethical approval for this study was obtained from the Eastern NHS Multi Research Ethics Committee (07/MRE05/44). Informed consent was obtained from all subjects enrolled and all experiments were performed in accordance with relevant guidelines and regulations.

Design and amplification with barcoded primers

Gene specific sUMI-seq primers were designed according to Fig. S8.

sUMI-seq PCR 1 amplification with barcoded MALBAC primers

PCR1 was performed using 15 μL KAPA buffer (2×) (KAPA HIFI Hotstart PCR kit, Kapa Biosystems), 1 μL MALBAC IgH V (FR3) forward primer mix (10 μM) (containing 7 family-specific primers designed to target the FR3 regions of VH1 through VH7 variable gene families) and 1 μL MALBAC reverse primer JH_(10 μM) (consensus sequence), 8 μL nuclease-free water, 5 μL DNA template (20 ng/μL), in a final volume of 30 μL. sUMI-seq primer sequences that amplify the BCR repertoire are provided in Table S1. The synthetic DNA library (UMI_DNA) was designed to be amplified with the same primer sets. The thermal cycling conditions for sUMI-seq PCR 1 were as follows: 1 cycle (95 °C–5 min); 5 cycles (98 °C–5 sec; 72 °C–2 min); 5 cycles (65 °C–10 sec, 72 °C–2 min); 5, 10, 20, or 30 cycles (98 °C–20 sec, 60 °C–1 min, 72 °C–2 min); 1 step (72 °C–10 min). PCR1 amplicons were then cleaned up using 0.8× Agencourt AMPure XP beads (Beckman Coulter).
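As a quick consistency check on the PCR1 reaction set-up, the listed component volumes can be summed and the derived quantities computed. This is a minimal sketch using only the volumes and concentrations stated above; the dictionary keys and function names are hypothetical, and the derived DNA mass and final primer concentration are simple arithmetic rather than values reported by the authors.

```python
# Volumes (in microlitres) for one PCR1 reaction, as listed in the text.
pcr1_mix_ul = {
    "KAPA buffer (2x)": 15.0,
    "forward primer mix (10 uM)": 1.0,
    "reverse primer (10 uM)": 1.0,
    "nuclease-free water": 8.0,
    "DNA template (20 ng/uL)": 5.0,
}


def total_volume_ul(mix: dict) -> float:
    """Sum of all component volumes in the reaction."""
    return sum(mix.values())


def template_mass_ng(volume_ul: float, concentration_ng_per_ul: float) -> float:
    """Mass of template DNA added to the reaction."""
    return volume_ul * concentration_ng_per_ul


def final_concentration_um(stock_um: float, added_ul: float, total_ul: float) -> float:
    """Concentration of a component after dilution into the full reaction."""
    return stock_um * added_ul / total_ul


total = total_volume_ul(pcr1_mix_ul)
print(total)                                   # total reaction volume in uL
print(template_mass_ng(5.0, 20.0))             # DNA input in ng
print(final_concentration_um(10.0, 1.0, total))  # primer mix concentration in uM
```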

sUMI-seq PCR 2 amplification (without sample IDs)

The PCR2 reaction was performed using 17.5 μL of KAPA buffer (2×) (KAPA HIFI Hotstart PCR kit, Kapa Biosystems), 1 μL of 10 μM IgH V (FR3) forward primer mix and 1 μL of 10 μM MALBAC_UNI primers, 5.5 μL of nuclease-free water, 10 μL of DNA template (from PCR1), in a final volume of 35 μL. The thermal cycling conditions were as follows: 1 cycle (95 °C–5 min); 5 cycles (98 °C–5 sec; 72 °C–2 min); 5 cycles (65 °C–10 sec, 72 °C–2 min); 20 cycles (98 °C–20 sec, 60 °C–1 min, 72 °C–2 min); 1 step (72 °C–10 min).

sUMI-seq PCR 2 amplification (with sample barcode IDs)

The PCR2 reaction was performed using 17.5 μL KAPA buffer (2×) (KAPA HIFI Hotstart PCR kit, Kapa Biosystems), 1 μL MALBAC IgH V (FR3) forward primer mix (10 μM), and 1 μL MALBAC_UNI_Ind primer (10 μM) (choice of 1–12 barcodes) (Table S1), 5.5 μL nuclease-free water, 10 μL DNA template (from PCR1), for a total volume of 35 μL. The thermal cycling conditions were as follows: 1 cycle (95 °C–5 min); 5 cycles (98 °C–5 sec; 72 °C–2 min); 5 cycles (65 °C–10 sec, 72 °C–2 min); 20 cycles (98 °C–20 sec, 60 °C–1 min, 72 °C–2 min); 1 step (72 °C–10 min).

Standard non-barcoded PCR amplification

This was performed using 15 μL KAPA buffer (2×) (KAPA HIFI Hotstart PCR kit, Kapa Biosystems), 1 μL IgH V (FR3) forward primer mix (10 μM) (standard non-barcoded primers), and 1 μL reverse IgH-J (10 μM) (standard primers), 8 μL nuclease-free water, 5 μL DNA template (20 ng/μL), for a total volume of 30 μL. The thermal cycling conditions were as follows: 1 cycle (95 °C–5 min); 5 cycles (98 °C–5 sec; 72 °C–2 min); 5 cycles (65 °C–10 sec, 72 °C–2 min); 5, 10, or 20 cycles (98 °C–20 sec, 60 °C–1 min, 72 °C–2 min); 1 step (72 °C–10 min).

High-throughput sequencing and QC

PCR2 DNA amplicons were cleaned up using 0.8× Agencourt AMPure XP beads (bead-based size selection) and checked by electrophoresis on a 2% agarose gel. MiSeq libraries were prepared using KAPA protocols (KK8722 and KK8504) and sequenced using 300 bp paired-end MiSeq (Illumina). Raw MiSeq reads were filtered for base quality (median Phred score >32) using QUASR (http://sourceforge.net/projects/quasr/)³.
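The median Phred filter can be sketched as follows. This is an illustration of the stated criterion (median Phred score >32), not QUASR's implementation; it assumes FASTQ quality strings in Phred+33 encoding, and the function names are hypothetical.

```python
from statistics import median


def median_phred(quality_string: str, offset: int = 33) -> float:
    """Median Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return median(ord(ch) - offset for ch in quality_string)


def passes_quality(quality_string: str, threshold: float = 32) -> bool:
    """Keep a read only if its median Phred score exceeds the threshold."""
    if not quality_string:
        return False  # an empty read carries no usable bases
    return median_phred(quality_string) > threshold
```

In Phred+33 encoding the character `I` corresponds to Q40 and `5` to Q20, so a read with qualities `IIIII` would be retained and one with `55555` discarded.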

MiSeq forward and reverse reads were merged together if they contained an identical overlapping region of >50 bp, or otherwise discarded.
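The merging rule above can be sketched as an exact suffix-prefix overlap search. This is a minimal illustration under stated assumptions, not the authors' code: the reverse read is reverse-complemented first, the longest exact overlap is preferred, and "greater than 50 bp" is taken as a minimum of 51 identical bases. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(forward: str, reverse: str, min_overlap: int = 51):
    """Merge a read pair if the end of the forward read is identical to the
    start of the reverse-complemented reverse read over at least
    `min_overlap` bases; return None (pair discarded) otherwise."""
    rev = reverse_complement(reverse)
    longest_possible = min(len(forward), len(rev))
    # Try the longest candidate overlap first and shrink towards the minimum.
    for overlap in range(longest_possible, min_overlap - 1, -1):
        if forward[-overlap:] == rev[:overlap]:
            return forward + rev[overlap:]
    return None
```

Because only identical overlaps are accepted, a single sequencing error in the overlapping region causes the pair to be discarded rather than merged with a mismatch.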

For the sUMI-seq filtering pipeline

Universal barcoded regions were identified in reads and orientated to read from the forward (IgH V) primer to the reverse (IgH-J) primer region. The barcoded region within each primer was identified and checked for conserved bases. Error correction and amplification-bias correction: groups of sequencing reads containing the same sUMI-seq primer UMIs originate from the same DNA template, and therefore a consensus sequence was generated from these groups. This reduces amplification biases (i.e. the effect of differential amplification of DNA templates), as well as correcting potential PCR/sequencing errors. Consensus sequences were retained only if there was a per-base agreement of 80% between all sequencing reads containing the same UMI. For groups of 4 or fewer sequencing reads containing the same UMI, there needed to be complete agreement between sequences after alignment; otherwise they were discarded. This is summarised in Fig. S5.
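The consensus rule described above can be sketched as follows. This is a simplified illustration, not the published pipeline (which is available at the repository cited in the text): it assumes reads within a UMI group are already aligned and of equal length, interprets "per-base agreement of 80%" as at least 80% of reads sharing the majority base at every position, and uses hypothetical function names.

```python
from collections import Counter, defaultdict


def consensus_for_group(reads, min_agreement=0.8, small_group=4):
    """Return the consensus of reads sharing one UMI, or None if rejected.

    Groups of `small_group` or fewer reads must agree completely;
    larger groups need `min_agreement` at every position.
    """
    if len({len(r) for r in reads}) != 1:
        return None  # sketch assumes pre-aligned, equal-length reads
    required = 1.0 if len(reads) <= small_group else min_agreement
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        if count / len(reads) < required:
            return None
        consensus.append(base)
    return "".join(consensus)


def collapse_by_umi(tagged_reads):
    """Collapse (umi, sequence) pairs to one consensus sequence per UMI."""
    groups = defaultdict(list)
    for umi, seq in tagged_reads:
        groups[umi].append(seq)
    collapsed = {}
    for umi, reads in groups.items():
        cons = consensus_for_group(reads)
        if cons is not None:
            collapsed[umi] = cons
    return collapsed
```

Each retained UMI contributes exactly one sequence to downstream counts, which is what removes the effect of differential amplification between templates.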

For the standard filtering pipeline

The primer regions within the sequencing reads were determined. All sequences without identifiable primer annealing regions were discarded.

Quantifying the frequency of further amplification of PCR1 self-annealing MALBAC loop products

For each sequence within the synthetic UMI-DNA datasets, the synthetic DNA-UMIs and primer barcode pairs were identified. From this, the proportion of sequences which contained DNA-UMIs associated with more than one primer barcode pair was determined, and normalised to the total number of reads (provided as a rate per 1000 reads):

$$1000 \times \left(\frac{\text{number of identical synthetic DNA-UMIs} \times \text{number of sUMI-seq primer barcode pairs}}{\text{number of BC-MALBAC primer pairs} / \text{number of reads}}\right)$$
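One way to compute such a rate is sketched below. It follows the prose description (reads whose synthetic DNA-UMI is seen with more than one primer barcode pair, per 1000 reads) rather than reproducing the formula term by term, so it should be read as one possible interpretation. The input format, a list of (DNA-UMI, barcode pair) tuples with one entry per read, and the function name are assumptions.

```python
from collections import defaultdict


def duplicated_umi_rate_per_1000(reads):
    """Rate per 1000 reads of reads whose synthetic DNA-UMI is associated
    with more than one primer barcode pair.

    `reads` is a list of (dna_umi, (forward_barcode, reverse_barcode)).
    """
    if not reads:
        raise ValueError("no reads supplied")
    pairs_by_umi = defaultdict(set)
    for dna_umi, barcode_pair in reads:
        pairs_by_umi[dna_umi].add(barcode_pair)
    # Count reads carrying a DNA-UMI that was seen with several barcode pairs.
    flagged = sum(1 for dna_umi, _ in reads if len(pairs_by_umi[dna_umi]) > 1)
    return 1000 * flagged / len(reads)
```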

Quantifying amplification bias using the synthetic DNA-UMIs library and comparing sUMI PCR to standard PCR amplification

The relative amplification biases were compared between (1) the sUMI-seq method using sUMI-seq barcode information (using the sUMI-seq filtering pipeline), (2) the sUMI-seq method ignoring the sUMI-seq barcode information (using the standard filtering pipeline), and (3) the standard non-barcoded PCR amplification method with the standard filtering pipeline. To account for differences in read depths between the different methods, each filtered dataset was subsampled to the same read depth across all samples (3000 reads), and the relative amplification bias was calculated for each subsample, defined as the maximum number of reads containing the same synthetic DNA-UMI:

$$\text{amplification bias (per subsample)} = \max_{i}\,(n_{i})$$

where $n_i$ is the number of reads containing synthetic DNA-UMI $i$. The mean relative amplification bias was calculated from 500 repeats per sample. Wilcoxon tests were performed in R.
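The subsampling procedure can be sketched as follows. This is an illustrative implementation of the definition above, not the authors' code: it assumes one synthetic DNA-UMI string per filtered read, sampling without replacement, and a fixed random seed for reproducibility. The function names are hypothetical.

```python
import random
from collections import Counter


def amplification_bias(dna_umis):
    """Maximum number of reads sharing the same synthetic DNA-UMI."""
    return max(Counter(dna_umis).values())


def mean_amplification_bias(dna_umis, depth=3000, repeats=500, seed=0):
    """Mean amplification bias over repeated subsamples of fixed depth.

    `dna_umis` is a list with one synthetic DNA-UMI per read.
    """
    if len(dna_umis) < depth:
        raise ValueError("dataset is shallower than the subsampling depth")
    rng = random.Random(seed)
    total = 0
    for _ in range(repeats):
        total += amplification_bias(rng.sample(dna_umis, depth))
    return total / repeats
```

A library in which every template is sequenced exactly once gives a value of 1, while stronger over-amplification of individual templates pushes the value towards the subsampling depth.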

BCR sequence filtering

Sequences without complete reading frames and non-immunoglobulin sequences were removed, and only reads with significant similarity to reference IgH variable genes (V-D-J) from the IMGT database, as determined using BLAST¹⁹, were retained. Sequence annotation, including somatic hypermutation, CDR3 regions and IGHV gene usage, was defined via IMGT V-QUEST. Repertoire comparisons were performed using custom scripts in Python, and statistics were performed in R using Wilcoxon tests for significance.

BCR repertoire generation and network analysis

The network generation algorithm and network properties were calculated as in Bashford-Rogers et al.⁵: each vertex represents a unique sequence, where relative vertex size is proportional to the number of identical reads. Edges join vertices that differ by single nucleotide non-indel differences and clusters are collections of related, connected vertices.

A clone (cluster) refers to clonally-related B cells, containing BCRs with identical CDR3 regions and IgH gene usage, or differing by single point mutations, such as through somatic hypermutation.
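The network construction described above can be sketched in a few lines. This is a simplified illustration, not the algorithm of Bashford-Rogers et al.: it works on raw sequence strings, uses an all-against-all comparison that scales quadratically with the number of unique sequences, and ignores CDR3 and IgH gene annotation. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def differs_by_one_substitution(a: str, b: str) -> bool:
    """True if two sequences have equal length and exactly one mismatch."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def build_network(reads):
    """Vertices are unique sequences (sized by read count); edges join
    sequences that differ by a single non-indel nucleotide difference."""
    vertices = Counter(reads)
    edges = {seq: set() for seq in vertices}
    for a, b in combinations(vertices, 2):
        if differs_by_one_substitution(a, b):
            edges[a].add(b)
            edges[b].add(a)
    return vertices, edges


def clusters(edges):
    """Connected components of the network, i.e. clones in this sketch."""
    seen = set()
    components = []
    for start in edges:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(edges[node] - component)
        seen |= component
        components.append(component)
    return components
```

For example, the reads `ACGT`, `ACGT`, `ACGA` and `TTTT` give three vertices, with `ACGT` (size 2) and `ACGA` joined in one cluster and `TTTT` forming its own.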

Clonal diversity refers to the relative number of clonally-related, but distinct, B cells within a clone. In the context of BCR sequencing, this is a measure of the number of unique clonally-related BCRs (clone members). Sequence repertoire parameters that were dependent on sequencing depth were generated by subsampling each sequencing sample to a specified clone depth. This includes the Clonal Diversification index, which was measured by the cluster Renyi Index as defined in Bashford-Rogers et al.⁵. This is calculated from the distribution of the number of unique VDJ region sequences per clone within subsampled BCR repertoires at a specified depth of 1000 clones. The mean of 100 repeats of the resulting Clonal Diversification indices was determined. Clone size distributions were also calculated from the same subsamples and a mean of 100 repeats was determined.

BCR network sampling to preserve the overall clonal structure of visual representation

To obtain a representative subgraph of a network that preserves the overall relative clonal architecture whilst providing visual representations that distinguish between samples of different clonalities, clone subsampling was used as described previously¹⁵. One thousand clones were subsampled and a network was generated from all BCRs from these clones. Subsampling was performed 100 times, and the sample that contained a maximum clone size closest to the median of all subsamples greater than the unsampled maximum clone size was chosen.

Ethics approval and consent to participate

Ethical approval for this study was obtained from the Eastern NHS Multi Research Ethics Committee (07/MRE05/44). Informed consent was obtained from all subjects enrolled.

Data availability

Code is made available at https://github.com/rbr1/sUMI_processing_pipeline. The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request. All sequencing data will be uploaded to the EGA.

Change history

07 October 2020
An amendment to this paper has been published and can be accessed via a link at the top of the paper.

References

McKerrell, T. et al. Leukemia-associated somatic mutations drive distinct patterns of age-related clonal hemopoiesis. Cell Rep 10, 1239–1245, https://doi.org/10.1016/j.celrep.2015.02.005 (2015).
Article CAS PubMed PubMed Central Google Scholar
Human Microbiome Project, C. A framework for human microbiome research. Nature 486, 215–221, https://doi.org/10.1038/nature11209 (2012).
Article ADS CAS Google Scholar
Watson, S. J. et al. Viral population analysis and minority-variant detection using short read next-generation sequencing. Philos Trans R Soc Lond B Biol Sci 368, 20120205, https://doi.org/10.1098/rstb.2012.0205 (2013).
Article PubMed PubMed Central Google Scholar
Wei, L. et al. Genome-wide CRISPR/Cas9 library screening identified PHGDH as a critical driver for Sorafenib resistance in HCC. Nat Commun 10, 4681, https://doi.org/10.1038/s41467-019-12606-7 (2019).
Article ADS CAS PubMed PubMed Central Google Scholar
Bashford-Rogers, R. J. M. et al. Network properties derived from deep sequencing of human B-cell receptor repertoires delineate B-cell populations. Genome Res 23, 1874–1884, https://doi.org/10.1101/gr.154815.113 (2013).
Article CAS PubMed PubMed Central Google Scholar
Petrova, V. N. et al. Combined Influence of B-Cell Receptor Rearrangement and Somatic Hypermutation on B-Cell Class-Switch Fate in Health and in Chronic Lymphocytic Leukemia. Frontiers in Immunology 9, https://doi.org/10.3389/fimmu.2018.01784 (2018).
Zong, C., Lu, S., Chapman, A. R. & Xie, X. S. Genome-wide detection of single-nucleotide and copy-number variations of a single human cell. Science 338, 1622–1626, https://doi.org/10.1126/science.1229164 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Galson, J. D. et al. In-Depth Assessment of Within-Individual and Inter-Individual Variation in the B Cell Receptor Repertoire. Front Immunol 6, 531, https://doi.org/10.3389/fimmu.2015.00531 (2015).
Article CAS PubMed PubMed Central Google Scholar
Bashford-Rogers, R. J. M., Smith, K. G. C. & Thomas, D. C. Antibody repertoire analysis in polygenic autoimmune diseases. Immunology, https://doi.org/10.1111/imm.12927 (2018).
Bashford-Rogers, R. J. M. et al. Eye on the B-ALL: B-cell receptor repertoires reveal persistence of numerous B-lymphoblastic leukemia subclones from diagnosis to relapse. Leukemia, https://doi.org/10.1038/leu.2016.142 (2016).
Bashford-Rogers, R. J. M. et al. Dynamic variation of CD5 surface expression levels within individual chronic lymphocytic leukaemia clones. Exp Hematol, https://doi.org/10.1016/j.exphem.2016.09.010 (2016).
Galson, J. D. et al. Analysis of B Cell Repertoire Dynamics Following Hepatitis B Vaccination in Humans, and Enrichment of Vaccine-specific Antibody Sequences. EBioMedicine 2, 2070–2079, https://doi.org/10.1016/j.ebiom.2015.11.034 (2015).
Article PubMed Central PubMed Google Scholar
Ma, L. et al. Characteristics Peripheral Blood IgG and IgM Heavy Chain Complementarity Determining Region 3 Repertoire before and after Immunization with Recombinant HBV Vaccine. PLoS One 12, e0170479, https://doi.org/10.1371/journal.pone.0170479 (2017).
Article CAS PubMed PubMed Central Google Scholar
Bashford-Rogers, R. J. et al. Capturing needles in haystacks: a comparison of B-cell receptor sequencing methods. BMC Immunol 15, 29, https://doi.org/10.1186/s12865-014-0029-0 (2014).
Bashford-Rogers, R. J. M. et al. Analysis of the B cell receptor repertoire in six immune-mediated diseases. Nature, https://doi.org/10.1038/s41586-019-1595-3 (2019).
Lin, M. M., Zhu, M. & Scharff, M. D. Sequence dependent hypermutation of the immunoglobulin heavy chain in cultured B cells. Proc Natl Acad Sci USA 94, 5284–5289 (1997).
Yaari, G. et al. Models of somatic hypermutation targeting and substitution based on synonymous mutations from high-throughput immunoglobulin sequencing data. Front Immunol 4, 358, https://doi.org/10.3389/fimmu.2013.00358 (2013).
Meffre, E. et al. Immunoglobulin heavy chain expression shapes the B cell receptor repertoire in human B cell development. J Clin Invest 108, 879–886, https://doi.org/10.1172/JCI13051 (2001).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J Mol Biol 215, 403–410, https://doi.org/10.1016/S0022-2836(05)80360-2 (1990).

Acknowledgements

We would like to thank the Cambridge Cancer Trials Centre and nurse specialists Gwyn Stafford, Rosie Tween, Lisa Walbridge and Joanna Baxter, and the patients and staff of Addenbrooke’s Haematology Translational Research Laboratory. This work was supported by the Wellcome Trust (grant WT106068AIA) and the Amgen Foundation.

Author information

Ibrahim Ahmed
Present address: Faculty of Biology, Medicine and Health, University of Manchester, Michael Smith Building, Oxford Road, Manchester, M13 9PT, UK

Authors and Affiliations

Department of Medicine, University of Cambridge, Cambridge, United Kingdom
Ibrahim Ahmed, Aure Aflalo, Kenneth G. C. Smith & Rachael J. M. Bashford-Rogers
Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
Felicia A. Tucci & Rachael J. M. Bashford-Rogers
Department of Pathology, University of Cambridge, Cambridge, United Kingdom
Aure Aflalo
Cambridge Institute of Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, United Kingdom
Kenneth G. C. Smith

Authors

Ibrahim Ahmed
Felicia A. Tucci
Aure Aflalo
Kenneth G. C. Smith
Rachael J. M. Bashford-Rogers

Contributions

R.J.M.B.-R. and K.G.C.S. planned the study. R.J.M.B.-R., I.A., A.A. and F.A.T. performed BCR amplification and R.J.M.B.-R. analysed sequencing data. All authors provided intellectual contributions to experiments and/or analyses. R.J.M.B.-R. wrote the manuscript. All authors edited the manuscript.

Corresponding author

Correspondence to Rachael J. M. Bashford-Rogers.

Ethics declarations

Competing interests

R.J.M.B.-R. is a co-founder and consultant for Alchemab Therapeutics Ltd and a consultant for Imperial College London and VHSquared. F.A.T. is a consultant for Alchemab Therapeutics Ltd. K.G.C.S. is a co-founder of Rheos Medicines and PredictImmune. I.A. and A.A. declare no conflicts of interest.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Ahmed, I., Tucci, F.A., Aflalo, A. et al. Ultrasensitive amplicon barcoding for next-generation sequencing facilitating sequence error and amplification-bias correction. Sci Rep 10, 10570 (2020). https://doi.org/10.1038/s41598-020-67290-1

Received: 08 November 2019
Accepted: 01 June 2020
Published: 29 June 2020
DOI: https://doi.org/10.1038/s41598-020-67290-1
