Computational correction of index switching in multiplexed sequencing libraries

Larsson, Anton J M; Stanley, Geoff; Sinha, Rahul; Weissman, Irving L; Sandberg, Rickard

doi:10.1038/nmeth.4666

Correspondence
Published: 27 April 2018


Nature Methods volume 15, pages 305–307 (2018)

To the Editor:

Contemporary sequencers allow large-scale multiplexing of sequencing libraries. In the single-cell field, hundreds or thousands of libraries are typically sequenced in the same lane of an Illumina sequencer^1,2. The integrity of each library is usually assumed to depend only on the experimental conditions before sequencing, but cross-library contamination can also occur during sequencing^3. Newer Illumina sequencers that use exclusion amplification (e.g., HiSeq 3000, HiSeq 4000 and NovaSeq) seem especially vulnerable to index switching (a read acquiring the index of another library) caused by free index primers, which generates 2–10% contaminated reads^4–6. The most affected libraries are those in which a single swapping event can assign sequence reads to the wrong cell (or sample); these include single-cell libraries multiplexed by combinatorial barcoding with i5 and i7 Nextera primers. Here we describe a strategy to estimate the fraction of affected counts and introduce a computational correction procedure for affected data. The correction removed most of the cross-contamination signal, and the corrected expression data no longer clustered by index or plate, which allowed previously sequenced libraries to be rescued.
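The correction itself is provided by the authors as Supplementary Software and is not spelled out here. As a rough illustration only, the sketch below assumes a simplified linear model of combinatorial indexing: each read independently swaps only its i5 index (probability `p_i5`, landing uniformly in another well with the same i7) or only its i7 index (probability `p_i7`), double swaps are ignored, and the observed counts are therefore a known linear mixture of the true counts that can be inverted. The function names, the plate layout and the swap rates are hypothetical and are not taken from the paper.

```python
import numpy as np

def mixing_matrix(n_i7, n_i5, p_i7, p_i5):
    """Hypothetical linear model of index switching on a combinatorial plate.

    Wells are ordered as w = i7 * n_i5 + i5. Entry [dst, src] is the expected
    fraction of reads from well `src` that end up assigned to well `dst`.
    Assumes single swaps only (i5 with prob. p_i5, i7 with prob. p_i7).
    """
    n = n_i7 * n_i5
    # Wells sharing the i7 index but differing in i5 (reached by an i5 swap).
    same_i7 = np.kron(np.eye(n_i7), np.ones((n_i5, n_i5)) - np.eye(n_i5))
    # Wells sharing the i5 index but differing in i7 (reached by an i7 swap).
    same_i5 = np.kron(np.ones((n_i7, n_i7)) - np.eye(n_i7), np.eye(n_i5))
    return ((1.0 - p_i7 - p_i5) * np.eye(n)
            + p_i5 / (n_i5 - 1) * same_i7
            + p_i7 / (n_i7 - 1) * same_i5)

def correct_counts(observed, mixing):
    """Invert the mixing model for all genes at once (wells x genes) and clip
    negative estimates, which real, noisy counts can produce, to zero."""
    corrected = np.linalg.solve(mixing, observed)
    return np.clip(corrected, 0.0, None)

# Usage on simulated data: a 24 (i7) x 16 (i5) plate, i.e. 384 wells.
rng = np.random.default_rng(0)
true_counts = rng.poisson(5.0, size=(384, 50)).astype(float)
M = mixing_matrix(24, 16, p_i7=0.02, p_i5=0.02)   # illustrative rates
observed = M @ true_counts          # expected counts after index switching
corrected = correct_counts(observed, M)
```

Because the simulated `observed` matrix is the noiseless expectation, the inversion recovers `true_counts` exactly; with real counts it is only approximate. Under this model, `1 - np.diag(M)` is the fraction of each well's reads that are moved elsewhere, one simple notion of the "fraction of affected counts".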


**Figure 1: Computational correction of index switching.**

References

1. Segerstolpe, Å. et al. Cell Metab. 24, 593–607 (2016).
2. Cao, J. et al. Science 357, 661–667 (2017).
3. Illumina Inc. Effects of Index Misassignment on Multiplexing and Downstream Analysis. Publication No. 770-2017-004-C QB # 5420 (Illumina Inc., 2017).
4. Sinha, R. et al. bioRxiv preprint at https://www.biorxiv.org/content/early/2017/04/09/125724 (2017).
5. Griffiths, J.A., Richard, A.C., Bach, K., Lun, A.T.L. & Marioni, J.C. bioRxiv preprint at https://www.biorxiv.org/content/early/2018/01/30/177048 (2018).
6. Costello, M. et al. bioRxiv preprint at https://www.biorxiv.org/content/early/2017/10/10/200790 (2017).

Acknowledgements

This work was supported by the Swedish Research Council (grant 2017-01062 to R. Sandberg), the Bert L. and N. Kuggie Vallee Foundation (to R. Sandberg), a National Science Foundation Graduate Research Fellowship (grant DGE 1147470 to G.S.), the NIH (grant R01CA86065 to I.L.W.), the California Institute for Regenerative Medicine (grant GC1R-06673-A to M. Snyder; Collaborative Research Project to I.L.W.) and the Virginia and D.K. Ludwig Fund for Cancer Research (to I.L.W.).

Author information

Authors and Affiliations

Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
Anton J M Larsson & Rickard Sandberg
Department of Bioengineering, Stanford University School of Engineering, Stanford, California, USA
Geoff Stanley
Institute for Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine, Stanford, California, USA
Rahul Sinha & Irving L Weissman

Corresponding author

Correspondence to Rickard Sandberg.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Read count distributions at different thresholds of filtering.

(A) Histograms of gene read count distributions (post-correction HiSeq 4000 data) for all 384 cells of the HSC plate, after applying read count filtering at the indicated threshold (1 to 10 reads). (B) Read count distribution of the same plate sequenced on the NextSeq 500, without correction or read count filtering. Corrected data still had an excess of low read counts, which we eliminated by applying a read count threshold. Note that a read count threshold alone is not sufficient to remove cross-contamination signal (Supplementary Fig. 2).
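The thresholding and the low-count tally in panel (A) can be sketched as follows. This is a minimal sketch that assumes "filtering at threshold t" means treating any gene with fewer than t reads in a cell as undetected (set to zero); the function names and the toy matrix are hypothetical, and corrected (possibly fractional) counts are rounded to integers before tallying.

```python
import numpy as np

def apply_read_threshold(counts, min_reads):
    """Set per-cell gene counts below `min_reads` to zero (assumed filtering rule)."""
    counts = np.asarray(counts)
    return np.where(counts >= min_reads, counts, 0)

def low_count_histogram(counts, max_count=10):
    """Number of (cell, gene) entries with exactly k reads, for k = 1..max_count."""
    flat = np.rint(np.asarray(counts, dtype=float)).astype(int).ravel()
    flat = flat[(flat >= 1) & (flat <= max_count)]
    return np.bincount(flat, minlength=max_count + 1)[1:]

# Toy cells x genes matrix (hypothetical values).
counts = np.array([[0, 1, 2, 7],
                   [3, 0, 1, 12]])
filtered = apply_read_threshold(counts, min_reads=3)
hist = low_count_histogram(counts)
```

Here `filtered` is `[[0, 0, 0, 7], [3, 0, 0, 12]]`, and `hist` counts two entries with 1 read and one entry each with 2, 3 and 7 reads.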

Supplementary Figure 2 Evaluation of read count threshold for index switching correction.

(A) Histogram of false-positive gene expression, defined as expression detected on the HiSeq 4000 before (blue) and after (yellow) correction in cases where no expression was detected in the same cell on the NextSeq 500. (B) Histogram of the fraction of false-positive expression signals removed per gene. (C) Histogram of false-negative gene expression, defined as no detectable expression on the HiSeq 4000 before (blue) and after (yellow) correction in cases where expression was detected in the same cell on the NextSeq 500.
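The false-positive and false-negative definitions above translate directly into boolean masks over cells × genes matrices, with the NextSeq 500 counts as the reference. The sketch below is an illustrative reading of those definitions, not the authors' evaluation code; the function names and toy matrices are hypothetical, and "detected" is assumed to mean a count greater than zero.

```python
import numpy as np

def false_positive_mask(test_counts, reference_counts):
    """Detected in the test data (e.g. HiSeq 4000) but not in the reference
    (e.g. NextSeq 500) for the same cell and gene."""
    return (np.asarray(test_counts) > 0) & (np.asarray(reference_counts) == 0)

def false_negative_mask(test_counts, reference_counts):
    """Not detected in the test data but detected in the reference."""
    return (np.asarray(test_counts) == 0) & (np.asarray(reference_counts) > 0)

def fraction_fp_removed_per_gene(before, after, reference):
    """Per gene: fraction of pre-correction false positives that disappear
    after correction. NaN for genes without any false positives before."""
    fp_before = false_positive_mask(before, reference)
    fp_after = false_positive_mask(after, reference)
    removed = fp_before & ~fp_after
    with np.errstate(invalid="ignore", divide="ignore"):
        return removed.sum(axis=0) / fp_before.sum(axis=0)

# Toy data: 3 cells x 2 genes (hypothetical values).
reference = np.array([[0, 5], [0, 0], [3, 0]])   # NextSeq 500
before = np.array([[2, 4], [1, 1], [3, 2]])      # HiSeq 4000, uncorrected
after = np.array([[0, 4], [1, 0], [3, 0]])       # HiSeq 4000, corrected
frac_removed = fraction_fp_removed_per_gene(before, after, reference)
```

For gene 0, one of its two false positives is removed (fraction 0.5); for gene 1, both are removed (fraction 1.0).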

Supplementary Figure 3 Corrected data rescued index-driven HSC sub-clusters.

(A) Robust PCA (rPCA) of HSCs based on HiSeq 4000 sequence counts, colored by Nextera i5 index. The rPCA algorithm assigns the most prominent outlier observations (i.e., cells) to the last principal components, which corresponded to wells that had been amplified using the same i7 and i5 indices (n = 384 cells). (B) Loadings of cells on PC24, stratified by i5 Nextera primer (n = 384 cells, 16 groups). (C) Loadings of cells on PC25, stratified by i7 Nextera primer (n = 384 cells, 24 groups). Box plots: center line, median; hinges, first and third quartiles; whiskers, 1.5× interquartile range (IQR). (D) rPCA of HSCs (as in A) for corrected HiSeq 4000 sequence counts, colored by Nextera i5 index primer. (E,F) As in (B,C) for corrected HiSeq 4000 sequence counts.
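To check whether a principal component tracks an index, as in panels (B) and (C), cell scores can be grouped by primer and summarized per group. The sketch below substitutes ordinary PCA (SVD of log1p-transformed, centered counts) for the robust PCA used in the figure, so a strong index-driven artifact tends to appear in the leading components rather than the last ones. The plate layout (well w has i5 index `w % 16`), the simulated artifact and the function names are all hypothetical.

```python
import numpy as np

def pc_scores(expr, n_components):
    """Ordinary PCA scores (cells x components) via SVD of the centered,
    log1p-transformed matrix. NOTE: plain PCA, not robust PCA (rPCA)."""
    x = np.log1p(np.asarray(expr, dtype=float))
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def median_score_by_index(scores, index_labels):
    """Median score on one PC for each index label (e.g. each i5 primer)."""
    labels = np.asarray(index_labels)
    return {lab: float(np.median(scores[labels == lab])) for lab in np.unique(labels)}

# Simulated 24 (i7) x 16 (i5) plate with an artifact tied to i5 index 3.
rng = np.random.default_rng(0)
wells = np.arange(24 * 16)
i5_label = wells % 16                       # hypothetical well -> i5 layout
expr = rng.poisson(5.0, size=(wells.size, 200)).astype(float)
expr[i5_label == 3, :20] += 100             # simulated index-driven signal
scores = pc_scores(expr, n_components=2)
medians = median_score_by_index(scores[:, 0], i5_label)
```

In this simulation the i5 group carrying the artifact has by far the largest absolute median score on PC1; after a successful correction, no single index group should stand out in this way.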

Supplementary Figure 4 Corrected data rescued index-driven plate-associated clustering.

(A) PCA of HSCs from two plates of libraries sequenced on the HiSeq 4000, colored by library plate. PC2 scores separate cells by plate (n = 384 cells per plate). (B) As in (A) for corrected transcriptome data.

About this article

Cite this article

Larsson, A., Stanley, G., Sinha, R. et al. Computational correction of index switching in multiplexed sequencing libraries. Nat Methods 15, 305–307 (2018). https://doi.org/10.1038/nmeth.4666

Issue Date: 01 May 2018
DOI: https://doi.org/10.1038/nmeth.4666

Supplementary information

Supplementary Text and Figures

Life Sciences Reporting Summary (PDF 67 kb)

Supplementary Software
