Sample preparation

Human PBMC thawing and nuclei isolation

Cryopreserved human PBMCs from one male donor and one female donor were purchased from AllCells and distributed across institutes to generate the following samples: v1 V1, v1 V2, v1.1 C1–v1.1 C3, v1.1 St1, v1.1 St2, v1.1c C1, v1.1c C2, v2 V1, v2 V2, v2 C1, v2 C2, MO Sa1, MO Sa2, MO C1, MO C2, MO V1, MO V2, mt M1, mt M2, mt C1, mt C2, mt* Br1, mt* Br2, ddS H1, ddS H2, s3 O1, Hy E1–Hy E4, Hy V1, Hy V2 and Hy C1–Hy C3. For the remaining samples (v1.1 T1, v2 T1, v2 T2, ddS Bi1–ddS Bi4, ddS U1, ddS U2 and s3 O2), locally available cryopreserved PBMCs were used. In these short identifiers, the first part indicates the technology used, the second part indicates the first one or two letters from the center where the experiment was performed, and the number identifies each technical replicate.
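The short identifier scheme can be parsed programmatically. The following is a minimal Python sketch, assuming every identifier consists of a technology token, a single space, one or two center letters and a replicate number; the names `SampleId` and `parse_sample_id` are hypothetical and not part of any published code.

```python
import re
from typing import NamedTuple


class SampleId(NamedTuple):
    technology: str  # e.g. "v1.1", "mt*", "ddS"
    center: str      # first one or two letters of the center, e.g. "C", "St"
    replicate: int   # technical replicate number


# Assumed layout: technology token, one space, 1-2 letters, replicate digits.
_ID_PATTERN = re.compile(r"^(\S+) ([A-Za-z]{1,2})(\d+)$")


def parse_sample_id(identifier: str) -> SampleId:
    """Split a short identifier such as 'v1.1 St2' into its three parts."""
    match = _ID_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a valid short identifier: {identifier!r}")
    technology, center, replicate = match.groups()
    return SampleId(technology, center, int(replicate))
```

For example, `parse_sample_id("mt* Br1")` separates the technology (`mt*`), the center (`Br`) and the replicate number (1).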

Unless specified otherwise in technology-specific methods sections (for Bio-Rad ddSEQ, mtscATAC-seq, 10x v1.1 control runs and s3-ATAC), cryopreserved PBMCs were thawed according to the 10x Genomics demonstrated protocol CG00039 (‘Fresh Frozen Human Peripheral Blood Mononuclear Cells for Single Cell RNA Sequencing’). Briefly, 1 ml of frozen cells was rapidly thawed in a water bath at 37 °C and transferred to a 50-ml tube using a 1,000-µl wide-bore tip. Next, 1 ml of medium prewarmed to 37 °C and supplemented with 10% fetal bovine serum (FBS; Thermo Fisher Scientific) was added dropwise with gentle swirling of the sample. After 1 min of incubation at room temperature, 2, 4, 8 and 16 ml of medium with 10% FBS were added dropwise with 1 min of incubation at room temperature in between. The cell suspension was then centrifuged at 300g for 5 min at room temperature. The pellet was resuspended in 10 ml of medium supplemented with 10% FBS, and cells were counted. Unless specified otherwise in the technology-specific methods sections, the isolation of nuclei was performed according to the 10x Genomics demonstrated protocol ‘Nuclei Isolation for Single Cell ATAC Sequencing’. Briefly, 1 million cells from the cell mix were transferred to a 1.5-ml microcentrifuge tube and centrifuged at 500g for 5 min at 4 °C. The supernatant was removed without disrupting the cell pellet, and 100 µl of chilled lysis buffer (10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl2, 0.1% Tween 20, 0.1% NP-40 substitute, 0.01% digitonin and 1% bovine serum albumin (BSA)) was added and mixed by pipetting ten times. Samples were then incubated on ice for 3 min. Following lysis, 1 ml of chilled wash buffer (10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl2, 0.1% Tween 20 and 1% BSA) was added and mixed by pipetting. Nuclei were centrifuged at 500g for 5 min at 4 °C, and the supernatant was removed without disrupting the nuclei pellet.
Based on the starting number of cells and assuming a 50% loss during the procedure, nuclei were resuspended into the appropriate volume of chilled diluted Nuclei Buffer (10x Genomics) to achieve a concentration of 925–2,300 nuclei per µl, suitable for a target recovery of 3,000 nuclei. This combination of PBMC thawing and nuclei isolation was used for all 10x samples (except mtscATAC-seq protocols, v1.1 control runs and v1.1 St1 and v1.1 St2 samples) and all HyDrop samples, but not for s3-ATAC and Bio-Rad ddSEQ samples. The cell counting method differed by center. For all samples generated in VIB, cells and nuclei were counted using a LUNA automated cell counter (Logos Biosystems). For Stanford and Sanger samples, cells and nuclei were counted manually using a hemocytometer. For all Bio-Rad ddSEQ and s3-ATAC samples and all CNAG samples, cells and nuclei were counted using a TC20 cell counter (Bio-Rad). For Broad samples and samples generated by the company 10x Genomics, cells and nuclei were counted using a Countess II or III FL automated cell counter (Thermo Fisher).
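The resuspension volume follows directly from the starting cell number, the assumed loss and the target concentration. The sketch below illustrates this arithmetic in Python; the function name and its parameters are hypothetical, and the 50% loss is the assumption stated above rather than a measured value.

```python
def resuspension_volume_ul(
    starting_cells: float,
    target_nuclei_per_ul: float,
    expected_loss: float = 0.5,  # 50% loss assumed during nuclei isolation
) -> float:
    """Volume of buffer (in µl) needed to reach the target nuclei concentration."""
    if not 0 <= expected_loss < 1:
        raise ValueError("expected_loss must be in [0, 1)")
    if target_nuclei_per_ul <= 0:
        raise ValueError("target_nuclei_per_ul must be positive")
    expected_nuclei = starting_cells * (1 - expected_loss)
    return expected_nuclei / target_nuclei_per_ul


# Starting from 1 million cells, the 925-2,300 nuclei per µl range
# corresponds to roughly 541 µl down to 217 µl of buffer.
volumes = [resuspension_volume_ul(1_000_000, c) for c in (925, 2300)]
```

In practice, nuclei would be recounted after resuspension, because the true loss may differ from the assumed 50%.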

10x ATAC v1 (short identifiers v1 V1 and v1 V2)

PBMCs were thawed, and nuclei were isolated as described above. Two technical replicates were generated on the same day starting from the same freshly thawed nuclei suspension. scATAC-seq libraries were prepared according to the Chromium Single Cell ATAC reagent kits v1.0 user guide (10x Genomics, CG000001 Rev D). Briefly, the transposition reaction was prepared by mixing the desired number of nuclei with ATAC Buffer (10x Genomics) and ATAC Enzyme (10x Genomics) and incubated for 60 min at 37 °C; 4,590 nuclei were loaded with the goal of recovering 3,000 nuclei. Nuclei were partitioned into Gel Bead-in-Emulsions (GEMs) by using the Chromium Controller (Chip E). DNA linear amplification was then performed by incubating the GEMs under the following thermal cycling conditions: 72 °C for 5 min, 98 °C for 30 s and 12 cycles of 98 °C for 10 s, 59 °C for 30 s and 72 °C for 1 min. GEMs were broken using Recovery Agent (10x Genomics), and the resulting DNA was purified by sequential Dynabeads and SPRIselect reagent beads cleanups. Libraries were indexed by PCR using a Single Index kit (Plate N) and incubating under the following thermal cycling conditions: 98 °C for 45 s and ten cycles of 98 °C for 20 s, 67 °C for 30 s and 72 °C for 20 s with a final extension of 72 °C for 1 min. Sequencing libraries were subjected to a final bead cleanup with SPRIselect reagent.

Samples v1 V1 and v1 V2 were sequenced on a NovaSeq 6000 using a NovaSeq SP kit (100 cycles; 20028401, Illumina), and sequencing was performed using the following read protocol: 50 cycles (read 1), 8 cycles (i7 index read), 16 cycles (i5 index read) and 49 cycles (read 2).

10x ATAC v1.1 (short identifiers v1.1 C1–v1.1 C3, v1.1 T1, v1.1 St1 and v1.1 St2)

PBMCs were thawed, and nuclei were isolated as described above for samples v1.1 C1–v1.1 C3 and v1.1 T1. For samples v1.1 St1 and v1.1 St2, a different thawing/isolation protocol was used. Here, each cryopreserved PBMC sample was thawed in 50 ml of thaw medium (IMDM, 10% FBS and 200 Kunitz U ml–1 DNase) preheated to 37 °C and incubated for 15 min at 37 °C. DNase was ordered from Worthington Biochem (LS002007) and resuspended in HBSS at 20,000 U ml–1 (100× stock). Cells were pelleted at 300g (1,200 r.p.m.), resuspended in 5 ml of thaw medium and layered over 5 ml of Ficoll in a 15-ml conical tube. Cells were then spun at 500g (1,500 r.p.m.) with no brake for 30 min at room temperature in a swinging-bucket centrifuge. Next, 2 ml of the mononuclear cell layer was collected and diluted with 10 ml of room temperature PBS. Cells were put on ice and maintained at 4 °C until use.

Technical replicates were generated on the same day starting from the same freshly thawed nuclei suspension. scATAC-seq libraries were prepared according to the Chromium Single Cell ATAC reagent kit v1.1 user guide (10x Genomics, CG000209 Rev D). Briefly, the transposition reaction was prepared by mixing the desired number of nuclei with ATAC Buffer (10x Genomics) and ATAC Enzyme (10x Genomics) and was then incubated for 60 min at 37 °C; 4,590 nuclei were loaded with the goal of recovering 3,000 nuclei. For sample ‘10x v1.1 V2’, 9,180 nuclei were loaded instead of 4,590 due to a counting error. Nuclei were partitioned into GEMs by using a Chromium Controller with Chip H. DNA linear amplification was then performed by incubating the GEMs under the following thermal cycling conditions: 72 °C for 5 min, 98 °C for 30 s and 12 cycles of 98 °C for 10 s, 59 °C for 30 s and 72 °C for 1 min. GEMs were broken using Recovery Agent (10x Genomics), and the resulting DNA was purified by sequential Dynabeads and SPRIselect reagent beads cleanups. Libraries were indexed by PCR using a Single Index kit N set A (10x Genomics, PN-1000212) and incubated under the following thermal cycling conditions: 98 °C for 45 s and ten cycles of 98 °C for 20 s, 67 °C for 30 s and 72 °C for 20 s with a final extension of 72 °C for 1 min. Sequencing libraries were subjected to a final bead cleanup with SPRIselect reagent.

Samples v1.1 St1 and v1.1 St2 were sequenced on an Illumina NextSeq 500 machine using a high-output flow cell with 34 bp paired-end reads. Samples v1.1 C1–v1.1 C3 and v1.1 T1 were sequenced on an Illumina NovaSeq 6000 with the following sequencing conditions: 50 bp (read 1), 8 bp (i7 index), 16 bp (i5 index) and 49 bp (read 2).

10x ATAC v2 (short identifiers v2 V1, v2 V2, v2 T1, v2 T2, v2 C1 and v2 C2)

PBMCs were thawed, and nuclei were isolated as described above. Technical replicates were generated on the same day starting from the same freshly thawed nuclei suspension. scATAC-seq libraries were prepared according to the Chromium Single Cell ATAC reagent kits v2 user guide (10x Genomics, CG000496 Rev B). Briefly, the transposition reaction was prepared by mixing the desired number of nuclei with ATAC Buffer (10x Genomics) and ATAC Enzyme (10x Genomics) and was then incubated for 30 min at 37 °C; 4,590 nuclei were loaded with a goal of recovering 3,000 nuclei. Nuclei were partitioned into GEMs by using a Chromium Controller with Chip H. Sample v2 T1 was the only sample for which Chromium X was used. DNA linear amplification was then performed by incubating the GEMs under the following thermal cycling conditions: 72 °C for 5 min, 98 °C for 30 s and 12 cycles of 98 °C for 10 s, 59 °C for 30 s and 72 °C for 1 min. GEMs were broken using Recovery Agent (10x Genomics), and the resulting DNA was purified by sequential Dynabeads and SPRIselect reagent beads cleanups. Libraries were indexed by PCR using a Single Index kit N set A and incubated under the following thermal cycling conditions: 98 °C for 45 s and eight cycles of 98 °C for 20 s, 67 °C for 30 s and 72 °C for 20 s with a final extension of 72 °C for 1 min. Sequencing libraries were subjected to a final bead cleanup with SPRIselect reagent.

Samples v2 V1 and v2 V2 were sequenced on an Illumina NextSeq 2000 under the following sequencing conditions: 50 bp (read 1), 8 bp (i7 index), 16 bp (i5 index) and 50 bp (read 2). Samples v2 C1, v2 C2, v2 T1 and v2 T2 were sequenced on an Illumina NovaSeq 6000 under the following sequencing conditions: 50 bp (read 1), 8 bp (i7 index), 16 bp (i5 index) and 49 bp (read 2).

10x multiome (short identifiers MO Sa1, MO Sa2, MO C1, MO C2, MO V1 and MO V2)

PBMCs were thawed as described above. The isolation of nuclei was slightly different, including the use of RNase inhibitors to ensure RNA quality. Briefly, two pools of cells (technical replicates) were generated from the two donors by mixing 500,000 cells per donor, totaling 1 million cells per pool. Cells were pelleted for 5 min at 300g and 4 °C and were washed twice in 1 ml of wash buffer (10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl2, 1% BSA, 0.1% Tween 20, 1 mM DTT and 1 U µl–1 RNase inhibitor). After the second wash and final centrifugation, cells were resuspended in 0.1 ml of chilled lysis buffer (10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl2, 0.1% Tween 20, 0.1% NP-40, 0.01% digitonin, 1% BSA and 1 mM DTT) and incubated for 3 min on ice. Nuclei were washed three times in 1 ml of wash buffer (10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl2, 1% BSA, 0.1% Tween 20, 1 mM DTT and 1 U µl–1 RNase inhibitor) by centrifuging at 500g for 5 min.

After the last centrifugation, nuclei were resuspended in chilled Nuclei Buffer (1× Nuclei Buffer, 1 mM DTT and 1 U µl–1 RNase inhibitor), with volumes calculated and nuclei loaded according to the Chromium Next GEM Single Cell Multiome ATAC + GEX user guide (protocol CG000338 Rev A); 4,590 nuclei were loaded with the goal of recovering 3,000 nuclei. Following the isolation of nuclei and transposition, GEMs were generated using GEM Chip J. GEM cleanup and preamplification PCR were performed as per the user guide. For the ATAC-seq library, eight cycles of PCR were run, while seven cycles of PCR were performed for cDNA amplification. Of the amplified cDNA, 25% of the material was used for gene expression library construction with 15 cycles of PCR for both technical replicates.

For samples MO Sa1, MO Sa2, MO C1 and MO C2, ATAC libraries were sequenced on an Illumina NovaSeq 6000 using the following read protocol: 50 cycles (read 1), 8 cycles (i7 index read), 24 cycles (i5 index read) and 49 cycles (read 2). ATAC libraries from MO V1 and MO V2 were sequenced according to the same parameters but on a NextSeq 2000. For samples MO Sa1, MO Sa2, MO C1 and MO C2, RNA libraries were sequenced on an Illumina NovaSeq 6000 using the following read protocol: 28 cycles (read 1), 10 cycles (i7 index read), 10 cycles (i5 index read) and 90 cycles (read 2). RNA libraries from MO V1 and MO V2 were sequenced according to the same parameters but on a NextSeq 2000.

10x mtscATAC (short identifiers mt M1, mt M2, mt C1, mt C2, mt* Br1 and mt* Br2)

Cryopreserved PBMCs were thawed as described above. For samples mt* Br1 and mt* Br2, cells were also washed, and 250,000 live cells were sorted using SytoxBlue at a 1:1,000 dilution as a live/dead cell stain. Samples mt M1, mt M2, mt C1 and mt C2 were not sorted. Cells from each donor were subsequently pooled at a 1:1 ratio, and, after washing, cells were fixed in 1% formaldehyde (Thermo Fisher, 28906) in PBS for 10 min at room temperature, quenched with glycine solution to a final concentration of 0.125 M and washed twice in PBS via centrifugation at 400g for 5 min at 4 °C. Cells were subsequently treated with lysis buffer (10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl2, 0.1% NP-40 and 1% BSA) for 3 min on ice, followed by the addition of 1 ml of chilled wash buffer (10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl2 and 1% BSA) and inversion before centrifugation at 500g for 5 min at 4 °C. The supernatant was discarded, and cells were diluted in 1× diluted Nuclei Buffer before counting using trypan blue and a Countess II FL automated cell counter. Subsequently, mtscATAC-seq libraries were generated using the Chromium Next GEM Single Cell ATAC Library & Gel Bead kit (v1.1, 1000175) according to the manufacturer’s instructions (CG000209); 4,590 nuclei were loaded with the goal of recovering 3,000 nuclei. Briefly, following tagmentation, cells were loaded onto a Chromium Controller Single Cell instrument to generate single-cell GEMs, followed by linear PCR, as described in the protocol using a C1000 Touch thermal cycler with the 96-Deep Well Reaction Module (Bio-Rad). After breaking the GEMs, barcoded tagmented DNA was purified and further amplified to enable sample indexing (11 cycles of PCR) and enrichment of mtscATAC-seq libraries. The final libraries were quantified using a Qubit double-stranded DNA high-sensitivity assay kit (Invitrogen) and a high-sensitivity DNA chip run on a Bioanalyzer 2100 system (Agilent).

Samples mt C1 and mt C2 were sequenced on an Illumina NovaSeq 6000 under the following sequencing conditions: 50 bp (read 1), 8 bp (i7 index), 16 bp (i5 index) and 49 bp (read 2). Samples mt M1 and mt M2 were sequenced on an Illumina NovaSeq 6000 under the following sequencing conditions: 150 bp (read 1), 8 bp (i7 index), 16 bp (i5 index) and 150 bp (read 2). Samples mt* Br1 and mt* Br2 were sequenced on an Illumina NextSeq 550 with paired-end reads (2 × 72 cycles), 8 cycles for index 1 and 16 cycles for index 2.

Bio-Rad SureCell ATAC (short identifiers ddS Bi1–ddS Bi4, ddS H1, ddS H2, ddS U1 and ddS U2)

Cryopreserved PBMCs were quickly thawed in a water bath at 37 °C, rinsed with culture medium (RPMI supplemented with 15% FBS) and treated with 0.2 U μl−1 DNase I (Thermo Fisher Scientific) in 5 ml of culture medium at 37 °C for 30 min. After DNase I treatment, cells were washed once with medium and twice with ice-cold 1× PBS supplemented with 0.1% BSA. Cells were then filtered with a 35-μm cell strainer (Corning), and cell viability and concentration were measured with trypan blue on a TC20 automated cell counter (Bio-Rad).

For a detailed description of tagmentation protocols and buffer formulations, refer to the SureCell ATAC-Seq Library Prep kit user guide (17004620, Bio-Rad). Collected cells and tagmentation buffers were chilled on ice. Lysis was performed simultaneously with tagmentation. After washing, equal numbers of cells from each donor were mixed with Whole-Cell Tagmentation Mix containing 0.1% Tween 20, 0.01% digitonin and 1× PBS supplemented with 0.1% BSA, ATAC Tagmentation Buffer and ATAC Tagmentation Enzyme (ATAC Tagmentation Buffer and ATAC Tagmentation Enzyme are both included in the SureCell ATAC-Seq Library Prep kit (17004620, Bio-Rad)). The mix was split into two technical replicates, and cells were then mixed and agitated on a ThermoMixer (5382000023, Eppendorf) for 30 min at 37 °C. Tagmented cells were kept on ice before encapsulation.

Tagmented cells were loaded onto a ddSEQ Single-Cell Isolator (12004336, Bio-Rad). For samples ddS H1, ddS H2, ddS U1 and ddS U2, 5,000 nuclei were loaded with the goal of recovering 3,000 nuclei. scATAC-seq libraries were prepared using a SureCell ATAC-Seq Library Prep kit (17004620, Bio-Rad) and SureCell ddSEQ Index kit (12009360, Bio-Rad). Bead barcoding and sample indexing were performed in a C1000 Touch thermal cycler with a 96-Deep Well Reaction Module (1851197, Bio-Rad). The following PCR conditions were used: 37 °C for 30 min; 85 °C for 10 min; 72 °C for 5 min; 98 °C for 30 s; eight cycles of 98 °C for 10 s, 55 °C for 30 s and 72 °C for 60 s and a single 72 °C extension for 5 min to finish. Emulsions were broken, and products were cleaned up using Ampure XP beads (A63880, Beckman Coulter). Barcoded amplicons were further amplified using a C1000 Touch thermal cycler with a 96-Deep Well Reaction Module. The following PCR conditions were used: 98 °C for 30 s and seven cycles of 98 °C for 10 s, 55 °C for 30 s and 72 °C for 60 s and a single 72 °C extension for 5 min to finish. PCR products were purified using Ampure XP beads and quantified on an Agilent Bioanalyzer (G2939BA, Agilent) using a high-sensitivity DNA kit (5067-4626, Agilent).

For samples ddS H1, ddS H2, ddS Bi3, ddS Bi4, ddS U1 and ddS U2, libraries were sequenced on a NextSeq 550 (SY-415-1002, Illumina) using a NextSeq High-Output kit (150 cycles; 20024907, Illumina) and the following read protocol: 118 cycles (read 1), 8 cycles (i7 index) and 40 cycles (read 2). For ddS Bi1 and ddS Bi2, samples were sequenced on a NovaSeq according to the same protocol. A custom sequencing primer was required for read 1 (16005986, Bio-Rad; included in the kit).

HyDrop ATAC (short identifiers Hy E1–Hy E4, Hy V1, Hy V2 and Hy C1–Hy C3)

PBMCs were thawed, and nuclei were isolated as described above. HyDrop was performed as previously described12 but with an updated barcoded hydrogel bead design and minor improvements in nuclei handling. Briefly, barcoded hydrogel beads were produced as described previously but using 384 × 384 combinations of primers instead of the original method using 96 × 96 × 96 combinations, resulting in a barcode sequence of 30 bp instead of 50 bp. One million PBMCs were counted, pelleted and resuspended in 200 μl of ATAC Lysis Buffer (1% BSA, 10 mM Tris-HCl (pH 7.5), 10 mM NaCl, 0.1% Tween 20, 0.1% NP-40, 3 mM MgCl2, 70 μM Pitstop in DMSO and 0.01% digitonin) for 5 min on ice. One milliliter of ATAC Nuclei Wash Buffer (1% BSA, 10 mM Tris-HCl (pH 7.5), 0.1% Tween 20, 10 mM NaCl and 3 mM MgCl2) was added, and nuclei were pelleted at 500g at 4 °C for 5 min. The resulting pellet was resuspended in 100 μl of ice-cold PBS and filtered with a 40-μm strainer (Flowmi); 25,000 PBMC nuclei were resuspended in 25 μl of ATAC Reaction Mix (10% dimethylformamide, 10% Tris-HCl (pH 7.4), 5 mM MgCl2, 5 ng μl–1 Tn5, 70 μM Pitstop in DMSO, 0.1% Tween 20 and 0.01% digitonin) and incubated at 37 °C for 1 h without shaking. To recover a target of 3,000 nuclei, 5,625 tagmented nuclei were added to 48 μl of PCR mix (1.3× Phusion HF buffer, 15% OptiPrep, 1.3 mM dNTPs, 39 mM DTT, 0.065 U μl–1 Phusion HF polymerase, 0.065 U μl–1 Deep Vent polymerase and 0.013 U μl–1 ET SSB). PCR mix was coencapsulated with 35 μl of freshly thawed HyDrop ATAC beads in hydrofluoroether 7500 Novec oil with EA-008 surfactant (RAN Biotech) on an Onyx microfluidics platform (Droplet Genomics). The resulting emulsion was collected in aliquots of 25 μl in total volume and thermocycled according to the linear amplification program (72 °C for 15 min; 98 °C for 3 min; 12 amplification cycles of 98 °C for 10 s, 63 °C for 30 s and 72 °C for 1 min and a final hold at 4 °C).
One hundred and twenty-five microliters of Recovery Agent (20% perfluorooctanol in hydrofluoroether 7500), 55 μl of guanidinium thiocyanate buffer (5 M guanidinium thiocyanate, 25 mM EDTA and 50 mM Tris-HCl (pH 7.4)) and 5 μl of 1 M DTT were added to each separate aliquot of 50 μl of thermocycled emulsion and incubated on ice for 5 min. Five microliters of Dynabeads was added to the aqueous phase and incubated for 10 min. Dynabeads were pelleted on a neodymium magnet and washed twice with 80% ethanol. Elution was performed in 50 μl of elution buffer (10 mM Tris-HCl, pH 8.5) supplemented with 10 mM DTT and 0.1% Tween 20. A 1× Ampure bead purification was performed according to manufacturer’s recommendations. Elution was performed in 30 μl of elution buffer supplemented with 10 mM DTT. Eluted library was further amplified in 100 μl of PCR mix (1× KAPA HiFi, 1 μM index i7 primer and 1 μM index i5 primer). The final library was purified in a 0.4–1.2× double-sided Ampure purification, eluted in 25 μl of elution buffer supplemented with 10 mM DTT and quality controlled on an Agilent Bioanalyzer high-sensitivity chip (Agilent Technologies).

Samples Hy V1, Hy V2 and Hy E1–Hy E4 were loaded at 750 pM on a NextSeq 2000 using a NextSeq 2000 P2 kit (100 cycles; 20046811, Illumina), and sequencing was performed using the following read protocol: 49 cycles (read 1), 10 cycles (i7 index read), 31 cycles (i5 index read) and 48 cycles (read 2). Samples Hy C1–Hy C3 were sequenced on a NovaSeq 6000 using the same parameters.

s3-ATAC (short identifiers s3 O1 and s3 O2)

Samples s3 O1 and s3 O2 were generated on different days according to the following protocol. Only sample s3 O1 was performed on the reference PBMC sample of two donors. The PBMC pellet was thawed and suspended in NIB-HEPES (pH 7.2; 10 mM HEPES-KOH (BP310-500 (Fisher Scientific) and 1050121000 (Sigma-Aldrich), respectively), 10 mM NaCl, 3 mM MgCl2 (Fisher Scientific, AC223210010), 0.1% (vol/vol) IGEPAL CA-630 (Sigma-Aldrich, I3021) and 0.1% (vol/vol) Tween (Sigma-Aldrich, P-7949)) before Dounce homogenization. s3-ATAC was then performed as described previously13. Two plates were prepared for a total of 2,880 nuclei per sample. Briefly, nuclei were flow sorted via a Sony SH800 to remove debris and attain an accurate count per well before PCR in 1× TD buffer. Immediately following sorting completion, the plate was sealed and centrifuged for 5 min at 500g and 4 °C to ensure that nuclei were within the buffer. Nucleosomes and remaining transposases were then denatured with the addition of 1 µl of 0.1% SDS (roughly 0.01% final concentration) per well. Then, 4 µl of NPM (Nextera XT kit, Illumina) per well was subsequently added to perform gap-fill on tagmented genomic DNA, with an incubation at 72 °C for 10 min. Next, 1.5 µl of 1 µM A14-LNA-ME oligonucleotides was added to supply the template for adapter switching. The polymerase-based adapter switching was then performed under the following conditions: initial denaturation at 98 °C for 30 s and ten cycles of 98 °C for 10 s, 59 °C for 20 s and 72 °C for 10 s. The plate was then held at 10 °C. After adapter switching, 1% (vol/vol) Triton X-100 in ultrapure water (Sigma, 93426) was added to quench persisting SDS.
The following was then combined per well for PCR: 16.5 µl of sample, 2.5 µl of indexed i7 primer at 10 µM, 2.5 µl of indexed i5 primer at 10 µM, 3 µl of ultrapure water, 25 µl of NEBNext Q5U 2× master mix (New England Biolabs, M0597S) and 0.5 µl of 100× SYBR Green I (Thermo Scientific, S7563) for a total of 50 µl of reaction per well. Real-time PCR was performed on a Bio-Rad CFX under the following conditions measuring SYBR fluorescence every cycle: 98 °C for 30 s and 16–18 cycles of 98 °C for 10 s, 55 °C for 20 s and 72 °C for 30 s, fluorescent reading and 72 °C for 10 s. After fluorescence passed an exponential growth and began to inflect, the samples were held at 72 °C for another 30 s and stored at 4 °C. Amplified libraries were then cleaned by pooling 25 µl per well into a 15-ml conical tube and cleaning via a QIAquick PCR purification column following the manufacturer’s protocol (Qiagen, 28106). The pooled sample was eluted in 50 µl of 10 mM Tris-HCl (pH 8.0). Library molecules then went through size selection via SPRI selection beads (Mag-Bind TotalPure NGS Omega Biotek, M1378-01). Next, 50 µl of vortexed and fully suspended room temperature SPRI beads was combined with the 50-µl library (one cleanup) and incubated at room temperature for 5 min. The reaction was then placed on a magnetic rack, and, once cleared, the supernatant was removed. The remaining pellet was rinsed twice with 100 µl of fresh 80% ethanol. After the ethanol was pipetted out, the tube was spun down and placed back on the magnetic rack to remove any lingering ethanol. Next, 31 µl of 10 mM Tris-HCl (pH 8.0) was used to resuspend the beads off the magnetic rack, followed by an incubation for 5 min at room temperature. The tube was again placed on the magnetic rack, and, once cleared, the full volume of supernatant was moved to a clean tube. DNA was then quantified by Qubit double-stranded DNA high-sensitivity assay following the manufacturer’s instructions (Thermo Fisher, Q32851). 
Libraries were diluted to 2 ng µl−1 and run on an Agilent Tapestation 4150 D5000 tape (Agilent, 5067-5592). Library molecule concentration within the range of 100 to 1,000 bp was then used for final library dilution of 1 nM.
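For readers who need to derive a molar concentration from a mass concentration before diluting to 1 nM, the following Python sketch shows the standard conversion. It assumes an average mass of 660 g mol–1 per base pair of double-stranded DNA, which is a common convention and not a value stated in this protocol; the function names are hypothetical, and an instrument-reported region molarity can be used directly instead.

```python
def library_molarity_nm(
    conc_ng_per_ul: float,
    mean_fragment_bp: float,
    g_per_mol_per_bp: float = 660.0,  # assumed average mass of one dsDNA base pair
) -> float:
    """Convert a dsDNA mass concentration (ng/µl) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)


def dilution_volumes_ul(stock_nm: float, target_nm: float, final_volume_ul: float):
    """Return (stock volume, diluent volume) in µl to reach target_nm."""
    if target_nm > stock_nm:
        raise ValueError("stock is already below the target concentration")
    stock_ul = final_volume_ul * target_nm / stock_nm
    return stock_ul, final_volume_ul - stock_ul


# Illustrative numbers only: a 2 ng/µl library with a 500-bp mean fragment size.
stock_nm = library_molarity_nm(2.0, 500)
stock_ul, diluent_ul = dilution_volumes_ul(stock_nm, target_nm=1.0, final_volume_ul=20.0)
```

With these illustrative inputs, the stock is about 6.06 nM, so about 3.3 µl of library in a final volume of 20 µl gives 1 nM.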

Samples s3 O1 and s3 O2 were sequenced on a NovaSeq S2 flow cell following the manufacturer’s recommendations (Illumina, 20028315) as paired-end libraries with 10-cycle index reads and 85 cycles (O1) or 90 cycles (O2) for reads 1 and 2.

10x v1.1 control runs (short identifiers v1.1c C1 and v1.1c C2)

Two additional control runs were performed on the same day as v1.1 C3. Control run v1.1c C1 was performed using the standard 10x nuclei extraction lysis buffer with the omission of NP-40 to simulate the whole-cell protocol used in Bio-Rad ddSEQ experiments. Control run v1.1c C2 was performed using the Dounce homogenization protocol as described in the s3-ATAC experiments but without FACS. Starting from permeabilized cells or Dounce-extracted nuclei, both control runs were performed exactly according to the standard 10x v1.1 protocol simultaneously with v1.1 C3.

Samples v1.1c C1 and v1.1c C2 were sequenced on a NovaSeq 6000 with 50 cycles for read 1, 49 cycles for read 2, 8 cycles for index 1 and 16 cycles for index 2.

10x scRNA-seq

Cryopreserved PBMCs were thawed as described above, and equal numbers of cells from each donor were mixed. The cell mix was partitioned into GEMs by using the Chromium Controller system (10x Genomics), with a target recovery of 5,000 total cells. We generated three technical replicates by loading three channels of Chip G with the same cell mix. cDNA sequencing libraries were prepared using the Next GEM Single Cell 3′ reagent kit v3.1 (10x Genomics, PN-1000268), following the manufacturer’s instructions. Briefly, after GEM-RT cleanup, cDNA was amplified for 12 cycles, and cDNA quality control and quantification were performed on an Agilent Bioanalyzer high-sensitivity chip (Agilent Technologies). cDNA libraries were indexed by PCR using the PN-220103 Chromium i7 Sample Index Plate. Size distribution and concentration of 3′ cDNA libraries were verified on an Agilent Bioanalyzer high-sensitivity chip (Agilent Technologies).

Sequencing of cDNA libraries was performed on an Illumina NovaSeq 6000 using the following sequencing conditions to obtain approximately 40,000 reads per cell: 28 bp (read 1), 8 bp (i7 index), 0 bp (i5 index) and 89 bp (read 2).

Data preprocessing

Unified scATAC-seq data analysis pipeline (PUMATAC)

We developed PUMATAC, a unified Nextflow v21.04.3 (ref. 29) pipeline, to align samples from multiple technologies to the reference genome and write fragments files from these reference genome alignments (https://github.com/aertslab/PUMATAC). The steps implemented in PUMATAC are described briefly in the text below and in detail with examples at https://github.com/aertslab/scATAC-seq_benchmark. All code necessary to reproduce our analyses and graphics is present in notebooks in this repository.

Barcode correction and FASTQ processing (singlecelltoolkit in PUMATAC)

Each barcode was compared to the whitelist barcodes and kept (with bam tag ‘CB’) if it was a perfect match or if changing any single base resulted in a match (Hamming distance of 1). Barcodes that could not be corrected were retained with the ‘CR’ bam tag. The barcode tag information, including the original barcode quality scores (‘CY’), was added to the comments field in each of the two paired-end FASTQ files. Adapter trimming was then performed using TrimGalore (version 0.6.6)30 with the ‘--paired’ option, which in turn runs Cutadapt31.
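The whitelist matching rule can be illustrated with a minimal Python sketch. This is not the singlecelltoolkit implementation; the function name is hypothetical, and the handling of ambiguous barcodes (those within one mismatch of more than one whitelist entry), which are left uncorrected here, is an assumption.

```python
from typing import Optional, Set

BASES = "ACGT"


def correct_barcode(raw: str, whitelist: Set[str]) -> Optional[str]:
    """Return the whitelist barcode matching `raw` exactly or at Hamming distance 1.

    Returns None when no match exists or, as an assumption of this sketch,
    when more than one whitelist barcode is one mismatch away.
    """
    if raw in whitelist:
        return raw
    candidates = set()
    for i, base in enumerate(raw):
        for alt in BASES:
            if alt == base:
                continue
            candidate = raw[:i] + alt + raw[i + 1:]
            if candidate in whitelist:
                candidates.add(candidate)
    return candidates.pop() if len(candidates) == 1 else None


whitelist = {"ACGT", "TTTT"}
corrected = correct_barcode("ACGA", whitelist)    # one mismatch from "ACGT"
uncorrected = correct_barcode("AAAA", whitelist)  # too far from any entry
```

A corrected barcode would be written to the ‘CB’ tag, whereas a `None` result corresponds to a barcode retained only under ‘CR’.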

Reference genome alignment and fragments writing (bwa-mem in PUMATAC)

We first aligned full sequencing datasets of all samples to the GRCh38 or mm10 reference genome using PUMATAC. We then filtered cells (described later) and downsampled all sequencing data to a common sequencing depth of 40,796 reads per cell and realigned these downsampled FASTQ files. In PUMATAC, alignment was performed using bwa-mem2 (v2.2.1)32 with the ‘mem’ method and default mapping parameters. The ‘-C’ option was used to copy the barcode tag information from the FASTQ file to the resulting bam file. Read group information was taken from the FASTQ name field in the first line of each input file and added with the ‘-R’ option in bwa-mem2. The ‘fixmate’ tool from SAMtools (version 1.12)33 was used to add mate coordinates and insert sizes to the file. Reads were aligned to the GRCh38 reference for the PBMC samples and to mm10 for mouse public data. From the resulting aligned reads in .bam format, fragments were written in the bed-like fragments.tsv.gz format using a combination of SAMtools34 and AWK, according to the base pair shift rules described in the CellRanger manual (https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/output/fragments).
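To illustrate the fragments-writing step, the following Python sketch collapses aligned read pairs into fragments.tsv-style records. It is not the SAMtools and AWK code used in PUMATAC. The function name and the input tuple layout are hypothetical, and the default shifts of +4 bp at the fragment start and −5 bp at the fragment end are an assumption that should be checked against the CellRanger documentation linked above.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical input: one tuple per properly paired read pair, giving the
# chromosome, leftmost aligned position (0-based), rightmost aligned position
# (exclusive) and the corrected cell barcode.
ReadPair = Tuple[str, int, int, str]
Fragment = Tuple[str, int, int, str, int]


def write_fragments(
    pairs: Iterable[ReadPair],
    start_shift: int = 4,   # assumed Tn5 offset applied to the fragment start
    end_shift: int = -5,    # assumed Tn5 offset applied to the fragment end
) -> List[Fragment]:
    """Shift read-pair coordinates and count duplicates per barcode."""
    counts: Counter = Counter()
    for chrom, start, end, barcode in pairs:
        counts[(chrom, start + start_shift, end + end_shift, barcode)] += 1
    # One record per unique fragment; the last column is the duplicate count.
    return sorted((c, s, e, b, n) for (c, s, e, b), n in counts.items())


pairs = [
    ("chr1", 100, 300, "AAAC"),
    ("chr1", 100, 300, "AAAC"),  # PCR duplicate of the first pair
    ("chr1", 150, 400, "GGGT"),
]
fragments = write_fragments(pairs)
lines = ["\t".join(map(str, record)) for record in fragments]
```

Each resulting line has five tab-separated columns: chromosome, start, end, barcode and the number of read pairs supporting the fragment.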

Barcode multiplet detection (barcard in PUMATAC)

For each sample, we detected barcode multiplets using barcard, our own reimplementation of bap (https://github.com/caleblareau/bap). Similar to bap, barcard subsets fragments files to barcodes associated with at least 1,000 unique fragments. For each remaining barcode, a Jaccard index is calculated with every other barcode: the number of unique fragments that share their beginning and end coordinates between the two barcodes divided by the total number of unique fragments found in both barcodes combined. The Jaccard indices for these barcode pairs are then ranked and thresholded using Otsu’s algorithm to identify barcode multiplets. Following the identification of barcode multiplets in each sample, a new tag (‘DB’) was added to the bam file to represent droplet barcodes. This tag contained either the original corrected barcode from the CB tag (in the case of singlets) or an underscore-separated concatenation of each corrected barcode that forms the multiplet. This step, and others, was parallelized using GNU Parallel35. Similarly, fragments.tsv.gz files were rewritten to merge detected barcode multiplets. While this step is only necessary for Bio-Rad ddSEQ samples, we detected and merged multiplets in all samples.
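The multiplet-calling logic can be sketched in pure Python as follows. This is a simplified illustration and not the barcard code: all function names are hypothetical, the Otsu threshold is computed on a plain 256-bin histogram of all pairwise Jaccard indices, and connected barcodes are merged with a small union-find, which is an assumption about how multiplets of more than two barcodes are grouped.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Shared unique fragments divided by unique fragments in both barcodes combined."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def otsu_threshold(values, bins=256):
    """Histogram-based Otsu threshold maximizing between-class variance."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total, sum_all = len(values), sum(c * h for c, h in zip(centers, hist))
    best_var, best_thr, w0, sum0 = -1.0, lo, 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        between = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if between > best_var:
            best_var, best_thr = between, lo + (i + 1) * width
    return best_thr


def call_multiplets(fragments_by_barcode, min_fragments=1000):
    """Map each retained barcode to its droplet barcode ('DB'-style label)."""
    kept = {bc: fr for bc, fr in fragments_by_barcode.items() if len(fr) >= min_fragments}
    scores = {(x, y): jaccard(kept[x], kept[y]) for x, y in combinations(sorted(kept), 2)}
    parent = {bc: bc for bc in kept}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    if scores:
        threshold = otsu_threshold(list(scores.values()))
        for (x, y), score in scores.items():
            if score > threshold:
                parent[find(x)] = find(y)
    groups = {}
    for bc in kept:
        groups.setdefault(find(bc), []).append(bc)
    # Singlets keep their barcode; multiplets get an underscore-joined label.
    return {bc: "_".join(sorted(m)) for m in groups.values() for bc in m}


toy = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {10, 11, 12}, "D": {20, 21, 22}}
droplets = call_multiplets(toy, min_fragments=3)
```

In the toy example, barcodes A and B share three of five unique fragments and are merged, whereas C and D remain singlets. Otsu's method always splits a distribution with at least two distinct values, so real data should be inspected before the threshold is trusted.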

PUMATAC validation using CellRanger

We realigned all 10x v1, v1.1, v2 and multiome data using CellRanger-arc. We then subset fragments files generated by CellRanger and PUMATAC on barcodes identified as cell barcodes. For each barcode, we then calculated the number of unique fragments that were attributed to that barcode by both CellRanger and PUMATAC based on beginning and ending coordinates. The Jaccard index was calculated based on this number.

Downstream analyses

Starting from the fragments files generated by PUMATAC and merged using barcard, we then performed further analyses such as clustering, cell-type annotation, differential region calling and transcription factor motif analysis using a combination of bioinformatics packages.

Single-cell-level quality control and barcode filtering (pycisTopic)

We used the Python implementation of cisTopic16 (pycisTopic; https://github.com/aertslab/pycisTopic) to collect single-cell-level quality control statistics and filter barcodes starting from the PUMATAC fragments files. For quality control purposes, we considered all barcodes with at least ten unique fragments. We used the GRCh38 or mm10 BioMart36 gene annotation to calculate TSS enrichment. We followed the current ENCODE recommendations in calculating TSS enrichment by examining read depth in a 2,000-bp window on each side of the TSS (https://www.encodeproject.org/data-standards/terms/#enrichment). Cells were then filtered from barcodes using Otsu algorithm-defined thresholds on TSS enrichment and number of unique fragments per cell. In the first pass, we counted fragments in SCREEN regions18. This count matrix was then used to annotate cells with known cell types, after which consensus peaks could be called over all cell types present (see later). We then used this consensus peak set to recount the fragments and perform further downstream analyses in the second pass.
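Otsu thresholding picks the cutoff that maximizes the between-class variance when a histogram of values is split into two classes. The sketch below shows the idea on a plain list of per-barcode values. The function name and the bin count are assumptions, and pycisTopic may bin or transform the values differently before thresholding.

```python
def otsu_threshold(values, bins=256):
    """Return the histogram cutoff that maximizes between-class variance."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo
    width = (hi - lo) / bins

    # Histogram of the input values.
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]

    total = len(values)
    total_sum = sum(c * m for c, m in zip(counts, centers))

    best_score, best_bin = -1.0, 0
    n_low, sum_low = 0, 0.0
    for i in range(bins - 1):
        n_low += counts[i]
        sum_low += counts[i] * centers[i]
        n_high = total - n_low
        if n_low == 0 or n_high == 0:
            continue
        mean_low = sum_low / n_low
        mean_high = (total_sum - sum_low) / n_high
        # Proportional to the between-class variance.
        score = n_low * n_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_score, best_bin = score, i
    # Upper edge of the last bin assigned to the low class.
    return lo + (best_bin + 1) * width


def filter_cells(metric_by_barcode):
    """Keep barcodes whose metric lies at or above the Otsu cutoff."""
    cutoff = otsu_threshold(list(metric_by_barcode.values()))
    return {bc for bc, v in metric_by_barcode.items() if v >= cutoff}
```

In practice, one such cutoff would be derived per sample and per metric (TSS enrichment and number of unique fragments), and a barcode would be retained only if it passes both.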

Donor identification (Freemuxlet in VSN/PUMATAC)

We used Freemuxlet to identify barcodes belonging to each of the two mixed individuals in each sample and simultaneously identify doublets on the basis of barcodes with mixed genotypes. We first ran a prefiltering step to filter the bam file for only the selected cells after the initial quality control. We then used the popscle suite of tools (https://github.com/statgen/popscle), first ‘dsc-pileup’ to quantify reads overlapping known variants and then Freemuxlet to call sample and doublet identity in each barcode. Freemuxlet requires a list of known variants in the genome along with their allele frequencies. To obtain this, we used the 1000 Genomes Phase 3 dataset37 and applied filtering steps to keep only SNPs with a minor allele frequency of at least 10%. This step was automated using VSN/PUMATAC14.

Doublet identification (Scrublet)

We used the Python implementation of Scrublet19 to identify doublets among the barcodes selected based on the fragments/regions count matrix during the initial quality control steps. Doublet thresholds for each sample were set manually, and doublets were removed for downstream analysis.

Sequencing depth downsampling (seqtk)

After the initial quality control steps to select cells, we identified the sample with the lowest number of reads per cell and used this sample as the reference to which the others were downsampled. After identifying the target number of reads per cell in the reference sample (ddS Bi3, with 40,796 reads per cell), each of the FASTQ files for the other samples was downsampled to that depth. Downsampling was performed with seqtk (https://github.com/lh3/seqtk, version 1.3-r106). The seed parameter (‘-s’) was set to the same value for all files to ensure that the reads remained paired across the paired-end and barcode files. Following downsampling of each FASTQ file, the mapping procedure was repeated to produce new downsampled fragments and bam files. This was repeated for 35,000, 30,000, 25,000, 20,000, 15,000, 10,000 and 5,000 reads per cell, and these FASTQ files were further processed as described in the other sections of these methods.
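The downsampling arithmetic and the role of the shared seed can be sketched as follows. This is an illustration only: it assumes that reads per cell is the total number of sequenced reads divided by the number of filtered cells, and it uses Python's random module rather than seqtk's own sampling routine.

```python
import random


def downsample_fraction(total_reads, n_cells, target_reads_per_cell):
    """Fraction of reads to keep so that a sample reaches the target depth."""
    current_reads_per_cell = total_reads / n_cells
    return min(1.0, target_reads_per_cell / current_reads_per_cell)


def subsample(records, fraction, seed):
    """Keep each record with probability `fraction`.

    One random draw is consumed per record, so files that list the same
    reads in the same order retain the same reads when given the same seed.
    """
    rng = random.Random(seed)
    return [record for record in records if rng.random() < fraction]


# Hypothetical sample: 1,000,000 reads over 10 cells, target 40,796 reads per cell.
fraction = downsample_fraction(1_000_000, 10, 40_796)

read1 = [f"read{i}/1" for i in range(1000)]
read2 = [f"read{i}/2" for i in range(1000)]
kept1 = subsample(read1, fraction, seed=42)
kept2 = subsample(read2, fraction, seed=42)
```

Using a different seed for the second file would select a different subset of reads and break the pairing, which is why the seed is held constant across the paired-end and barcode files.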

Cell-type identification (Seurat)

Label transfer was performed using an annotated PBMC reference dataset21 consisting of nine independent technology types and batches. We used Seurat (v4.0.3) to perform the label transfer steps in an R (v4.1.0) environment. Label transfer was performed using methods outlined in the Seurat vignettes and associated publication22. In brief, each of the nine already annotated PBMC reference datasets was compared pairwise to find cells that serve as anchors between them and then used to generate an integrated reference that minimized technical differences.

For the scATAC-seq data, we used this integrated PBMC reference to predict cell types. A gene activity matrix was first estimated, and label transfer was performed by assigning query cells based on the local neighborhood around each anchor in the integrated reference, with the highest scoring cell type being assigned. Following prediction of the scATAC-seq cell types, we refined these classifications by using the clusters identified in the scATAC-seq data. Clustering was first performed on cells with the Leiden algorithm using a high resolution to generate many fine-grained clusters. For each cluster, we then assigned a consensus cell-type identity to the entire cluster based on the majority cell type identified by label transfer. In this way, the ATAC-based clusters were labeled with the most likely cell type, while peak information was retained for later analysis. Where multiple clusters existed for one cell type, these were merged and used to generate cell-type-specific peak sets in downstream steps.
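The cluster-level refinement amounts to a majority vote within each cluster. A minimal sketch, with hypothetical function and variable names and no particular tie-breaking rule implied by the text:

```python
from collections import Counter, defaultdict


def consensus_cell_types(cluster_of, predicted_type_of):
    """Relabel every cell with the majority predicted type of its cluster.

    cluster_of        -- dict: cell barcode -> cluster identifier
    predicted_type_of -- dict: cell barcode -> label-transfer prediction
    Ties are resolved in favor of the label seen first in the cluster.
    """
    votes = defaultdict(Counter)
    for cell, cluster in cluster_of.items():
        votes[cluster][predicted_type_of[cell]] += 1
    cluster_label = {cluster: counts.most_common(1)[0][0]
                     for cluster, counts in votes.items()}
    return {cell: cluster_label[cluster]
            for cell, cluster in cluster_of.items()}


clusters = {"c1": 0, "c2": 0, "c3": 0, "c4": 1, "c5": 1}
predicted = {"c1": "B cell", "c2": "B cell", "c3": "NK cell",
             "c4": "NK cell", "c5": "NK cell"}
consensus = consensus_cell_types(clusters, predicted)
```

Here cell c3 is predicted as an NK cell but sits in a cluster dominated by B cells, so it receives the B cell label. Clusters that end up with the same label can then be pooled for peak calling.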

Two-pass dimensionality reduction (pycisTopic)

Fragments were first counted in ENCODE SCREEN regions to generate a preliminary count matrix. This count matrix was used to filter cells based on TSS enrichment and number of unique fragments. The SCREEN regions count matrix was then used to train cisTopic’s latent Dirichlet allocation models, and the model with an optimal number of topics was selected as described in ref. 16. Based on Seurat cell-type identification and high-resolution Leiden clustering, a consensus cell type was assigned to each cell. Cell-type-specific peaks (see later) were called based on these consensus cell types and aggregated into a new consensus peak set. The fragments were recounted in this new peak set to generate a consensus peak count matrix. This count matrix was used to retrain cisTopic’s latent Dirichlet allocation models, and an optimal model was chosen for the second pass and used to reduce the dimensionality of the data.

Consensus peak calling (pycisTopic)

In pycisTopic, the ‘export_pseudobulk’ function was used to create cell-type-specific fragments and bigwig files using the consensus cell types. These were in turn used to generate cell-type-specific consensus peaks for each sample by calling peaks on each subset of cells with MACS2 (ref. 23). Peak calling for quality control purposes was performed using MACS2 with settings specific to ATAC-seq experiments (‘genome_size = hs’, ‘shift = 73’, ‘ext_size = 146’ and ‘q_value = 0.01’). We used ‘Version 2’ of the ENCODE candidate cis-Regulatory Elements with blacklisted regions removed38 (Supplementary Fig. 1b). These regions were used after the quality control steps to create the cisTopic object, perform the first-pass clustering and obtain consensus peaks. Duplicate rates were calculated for each barcode by dividing the number of duplicate fragments by the total number of fragments.

Cell-type-specific peak sets were generated for each sample. For each sample, the cell-type-specific peak sets were then merged into a final sample-specific consensus set. Each sample’s FRIP was calculated using these final consensus peaks. The consensus sets of all 47 samples were then merged into one master set, in which all data were counted to form the merged datasets.
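FRIP is the fraction of a sample's fragments that fall in peaks. The sketch below counts a fragment as being in a peak when it overlaps a peak by at least 1 bp. That overlap criterion, the function name and the data layout are assumptions, and pycisTopic's own calculation may differ in detail.

```python
from bisect import bisect_right


def frip(fragments, peaks):
    """Fraction of fragments overlapping at least one peak.

    fragments -- iterable of (chrom, start, end), half-open coordinates
    peaks     -- dict: chrom -> sorted list of non-overlapping (start, end)
    """
    peak_starts = {chrom: [start for start, _ in intervals]
                   for chrom, intervals in peaks.items()}
    total = in_peaks = 0
    for chrom, start, end in fragments:
        total += 1
        intervals = peaks.get(chrom)
        if not intervals:
            continue
        # Index of the last peak that begins before the fragment ends.
        i = bisect_right(peak_starts[chrom], end - 1) - 1
        # Because peaks are sorted and disjoint, only this peak can overlap
        # if any does; it overlaps when it ends after the fragment starts.
        if i >= 0 and intervals[i][1] > start:
            in_peaks += 1
    return in_peaks / total if total else 0.0
```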

Region overlap calculation (HOMER)

The mergePeaks function of the HOMER suite (v4.11)39 with default parameters plus the ‘-d given’ and ‘-venn’ options was used to find overlap between several region sets, notably the overlap between consensus DARs and peaks recovered in the merged cell-type fair set and in the individual cell-type fair sets.

DAR calling (pycisTopic)

A Wilcoxon rank-sum test was used to calculate significance of enrichment of regions (fold change) between each specified contrast using cisTopic’s imputed region accessibility based on cell topic and topic region probabilities. We contrasted cell types (type 1 versus all), technologies (cells from cell type 1 from technique A versus cells from cell type 1 from technique B) and donors (cells from cell type 1 from donor A versus cells from cell type 1 from donor B). Because high-quality samples produce many DARs that can skew the distribution of DAR enrichment scores to lower numbers, we chose to show the distributions of the top 2,000 DARs for each cell type contrast and the top 200 for male–female contrasts. For cell-type contrasts, differential accessibility was thresholded at a minimum of 1.5× fold change enrichment. For male/female contrasts, a minimum threshold of 1.2× fold change enrichment was used.

Transcription factor motif enrichment analysis (cisTarget)

Cell-type-specific and male/female-specific DARs were analyzed for transcription factor motif enrichment using cisTarget24,25 and standard parameters and settings for the human genome.

Price calculation and sequencing saturation

We calculated the price of a hypothetical 5,000-cell experiment based on US list prices for commercial methods and original manuscripts for open-source methods as follows. The 10x scATAC-seq v2 assay was quoted as $1,750 for 48 Next Gem Chip H (1000161), $930 for 96 indices (1000212) and $24,300 for 16 Chromium Next GEM Single Cell ATAC v2 (1000390) for a weighted total of $1,565 per lane. For the 10x multiome assay, the price was $1,750 for 48 Next Gem Chip J (1000234), $930 for 96 indices (1000215) and $44,760 for 16 Chromium Next GEM Single Cell Multiome ATAC + Gene Expression (1000283) for a weighted total of $2,843 per lane. For Bio-Rad ddSEQ SureCell ATAC, the price was $8,800 for a complete SureCell ATAC-Seq library prep kit (17004620), which accommodates eight samples. For s3-ATAC, the cost per plate is ~$200, and each plate can accommodate 1,440 cells. For HyDrop, the cost per run is ~$100 and can recover 8,000–10,000 cells.
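The weighted totals follow from dividing each list price by the number of lanes or samples the item covers and summing the results. A short sketch of that arithmetic, using only the prices quoted above (the function and variable names are illustrative):

```python
def cost_per_lane(items):
    """Sum of price / units over (price_usd, lanes_or_samples_covered) pairs."""
    return sum(price / units for price, units in items)


tenx_v2 = [(1750, 48), (930, 96), (24300, 16)]        # chip, indices, reagent kit
tenx_multiome = [(1750, 48), (930, 96), (44760, 16)]  # chip, indices, reagent kit
ddseq = [(8800, 8)]                                   # complete kit, eight samples

v2_lane = cost_per_lane(tenx_v2)              # about 1,564.90
multiome_lane = cost_per_lane(tenx_multiome)  # about 2,843.65
ddseq_sample = cost_per_lane(ddseq)           # 1,100 per sample

# s3-ATAC: four plates at ~$200 each, 1,440 cells per plate.
s3_cost = 4 * 200
s3_cells = 4 * 1440
```

The 10x v2 figure rounds to the quoted $1,565, and the multiome figure matches the quoted $2,843 when truncated to whole dollars.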

Bio-Rad ddSEQ reports a doublet rate of 3.76% at a recovery of 5,000 cells (https://www.bio-rad.com/sites/default/files/webroot/web/pdf/lsr/literature/ATAC-Seq_Poster.pdf). 10x Chromium supports a recovery of up to 10,000 cells but at a doublet rate of 8%. At an expected doublet rate of 4%, 5,000 cells can be recovered (https://kb.10xgenomics.com/hc/en-us/articles/360001378811-What-is-the-maximum-number-of-cells-that-can-be-profiled-). HyDrop reports 6% doublets on 8,000 recovered cells12. These three methods use the same microfluidic concepts to encapsulate single cells, and doublet rates are similar when correcting for the number of cells recovered. We therefore reasoned that the fairest comparison would be to assume a recovery of 5,000 cells per 10x and ddSEQ lane or HyDrop run. For s3-ATAC, we assumed 1,440 cells per plate, for a total of 5,760 cells across four plates.
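The similarity of the reported doublet rates becomes apparent when they are expressed per 1,000 recovered cells. The sketch below assumes that the doublet rate scales linearly with the number of recovered cells, which is consistent with the two 10x figures quoted above but is otherwise an approximation.

```python
# (reported doublet rate in percent, number of recovered cells)
reported = {
    "Bio-Rad ddSEQ": (3.76, 5000),
    "10x Chromium": (8.0, 10000),
    "HyDrop": (6.0, 8000),
}


def doublet_percent_per_1000_cells(percent, cells):
    """Doublet rate normalized to 1,000 recovered cells (linear assumption)."""
    return percent / (cells / 1000)


def expected_doublet_percent(percent, cells, target_cells=5000):
    """Doublet rate expected at `target_cells` under the linear assumption."""
    return doublet_percent_per_1000_cells(percent, cells) * target_cells / 1000


normalized = {name: doublet_percent_per_1000_cells(*values)
              for name, values in reported.items()}
at_5000_cells = {name: expected_doublet_percent(*values)
                 for name, values in reported.items()}
```

Under this assumption, all three methods come out at roughly 0.75 to 0.8% doublets per 1,000 cells, or just under 4% at a recovery of 5,000 cells.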

Full sequencing depth fragments files were subset to filtered cell barcodes (before doublet filtering and minimum TSS enrichment threshold, that is, only filtered by Otsu thresholding on minimum number of reads). We then subsampled these fragments files using a range of fractions and fitted a Michaelis–Menten kinetic model on the resulting duplication rate by the number of reads per cell. We defined the saturation sequencing depth as the depth at which each technology is expected to reach 50% duplicate fragments after sequencing.
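A Michaelis–Menten curve relates the duplication rate d to the sequencing depth x (reads per cell) as d(x) = Vmax · x / (Km + x). The sketch below fits this curve through the linearized form 1/d = 1/Vmax + (Km/Vmax) · (1/x) with ordinary least squares and then solves for the depth at a target duplication rate. The fitting procedure is not specified in the text, so the linearized fit is a stand-in chosen for brevity, and all names are hypothetical.

```python
def fit_michaelis_menten(depths, duplication_rates):
    """Estimate (vmax, km) by least squares on the double-reciprocal form."""
    xs = [1.0 / depth for depth in depths]
    ys = [1.0 / rate for rate in duplication_rates]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept   # asymptotic duplication rate
    km = slope * vmax        # depth at which half of vmax is reached
    return vmax, km


def depth_at_duplication(vmax, km, target=0.5):
    """Solve target = vmax * x / (km + x) for the depth x."""
    if target >= vmax:
        raise ValueError("target duplication rate is never reached")
    return target * km / (vmax - target)


# Synthetic example generated from vmax = 0.9 and km = 20,000 reads per cell.
depths = [5000, 10000, 20000, 40000, 80000]
rates = [0.9 * d / (20000 + d) for d in depths]
vmax, km = fit_michaelis_menten(depths, rates)
saturation_depth = depth_at_duplication(vmax, km, target=0.5)
```

The double-reciprocal fit weights low-depth points heavily, so a nonlinear least-squares fit would be preferable for noisy data.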

scRNA-seq analysis (scanpy)

After aligning scRNA-seq data to the reference genome using CellRanger or CellRanger-arc, scanpy40 was used to calculate single-cell quality control metrics for each sample. True cells were filtered from noise using Otsu-derived cutoffs on minimum number of UMIs.

Public mouse brain data reanalysis

Public mouse scATAC-seq data were downloaded from the following sources: 10x Genomics scATAC-seq v1.0 on Chromium (https://s3-us-west-2.amazonaws.com/10x.files/samples/cell-atac/1.2.0/atac_v1_adult_brain_fresh_5k/atac_v1_adult_brain_fresh_5k_fastqs.tar), 10x Genomics scATAC-seq v1.1 on Chromium (https://s3-us-west-2.amazonaws.com/10x.files/samples/cell-atac/2.1.0/8k_mouse_cortex_ATACv1p1_nextgem_Chromium_X/8k_mouse_cortex_ATACv1p1_nextgem_Chromium_X_fastqs.tar), 10x Genomics scATAC-seq v2 on Chromium X (https://s3-us-west-2.amazonaws.com/10x.files/samples/cell-atac/2.1.0/8k_mouse_cortex_ATACv2_nextgem_Chromium_X/8k_mouse_cortex_ATACv2_nextgem_Chromium_X_fastqs.tar), 10x Genomics scATAC-seq v2 on Chromium (https://s3-us-west-2.amazonaws.com/10x.files/samples/cell-atac/2.1.0/8k_mouse_cortex_ATACv2_nextgem_Chromium_Controller/8k_mouse_cortex_ATACv2_nextgem_Chromium_Controller_fastqs.tar), 10x Genomics Multiome ATAC (https://s3-us-west-2.amazonaws.com/10x.files/samples/cell-arc/2.0.0/e18_mouse_brain_fresh_5k/e18_mouse_brain_fresh_5k_fastqs.tar), Bio-Rad ddSEQ (SRA accession number SRR14494477), s3-ATAC (SRA accession number SRX10841853) and HyDrop (SRA accession number PRJNA733185).

These data were reanalyzed in a similar manner as the PBMC datasets. Briefly, cells were filtered from noise as described above. All datasets were then downsampled to the highest common depth and further intervals of 5,000 reads per cell. These downsampled sets were realigned to the mm10 reference genome using PUMATAC and counted in the SCREEN regions. Cells were clustered, and per cluster peaks were called and aggregated into a consensus peak set per sample. Datasets were recounted in these consensus peak sets to produce the quality metrics shown in our manuscript.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.