Precision and accuracy of single-molecule FRET measurements—a multi-laboratory benchmark study

Single-molecule Förster resonance energy transfer (smFRET) is increasingly being used to determine distances, structures, and dynamics of biomolecules in vitro and in vivo. However, generalized protocols and FRET standards to ensure the reproducibility and accuracy of measurements of FRET efficiencies are currently lacking. Here we report the results of a comparative blind study in which 20 labs determined the FRET efficiencies (E) of several dye-labeled DNA duplexes. Using a unified, straightforward method, we obtained FRET efficiencies with s.d. between ±0.02 and ±0.05. We suggest experimental and computational procedures for converting FRET efficiencies into accurate distances, and discuss potential uncertainties in the experiment and the modeling. Our quantitative assessment of the reproducibility of intensity-based smFRET measurements and a unified correction procedure represents an important step toward the validation of distance networks, with the ultimate aim of achieving reliable structural models of biomolecular systems by smFRET-based hybrid methods.

FRET 1 , also known as fluorescence resonance energy transfer, is a well-established method for studying biomolecular conformations and dynamics at both the ensemble 2-4 and the single-molecule level [5][6][7][8][9][10] . In such experiments, the energy transfer between donor and acceptor fluorophores is quantified with respect to their proximity 1 . The fluorophores are usually attached via flexible linkers to defined positions of the system under investigation. The transfer efficiency depends on the interdye distance, which is well described by Förster's theory for distances > 30 Å 11,12 . Accordingly, FRET has been termed a 'spectroscopic ruler' for measurements on the molecular scale 2 , capable of determining distances in vitro, and even in cells 13 , with potentially ångström-level accuracy and precision. In its single-molecule implementation, FRET largely overcomes ensemble-averaging and time-averaging and can uncover individual species in heterogeneous and dynamic biomolecular complexes, as well as transient intermediates 5 .
The two most popular smFRET approaches for use in determining distances are confocal microscopy of freely diffusing molecules in solution and total internal reflection fluorescence (TIRF) microscopy of surface-attached molecules. Various fluorescence-intensity- and lifetime-based procedures have been proposed with the aim of determining FRET efficiencies 10,[14][15][16][17][18][19][20] . Here we focus on intensity-based measurements in which the FRET efficiency E is determined from donor and acceptor photon counts and subsequently used to calculate the interfluorophore distance according to Förster's theory.
So far most intensity-based smFRET studies have characterized relative changes in FRET efficiency. This ratiometric approach is often sufficient to distinguish different conformations of a biomolecule (e.g., an open conformation with low FRET efficiency versus a closed conformation with high FRET efficiency) and to determine their interconversion kinetics. However, knowledge about distances provides additional information that can be used, for example, to compare an experimental structure with known structures, or to assign conformations to different structural states. In combination with other structural measurements and computer simulations,
However, it is difficult to compare and validate distance measurements from different labs, especially when detailed methodological descriptions are lacking. In addition, different methods for data acquisition and analysis, which often involve custom-built microscopes and in-house software, can have very different uncertainties and specific pitfalls. To overcome these issues, here we describe general methodological recommendations and well-characterized standard samples for FRET that can enable researchers to validate results and estimate the accuracy and precision of distance measurements. This approach should allow the scientific community to confirm the consistency of smFRET-derived distances and structural models. To facilitate data validation across the field, we recommend the use of a unified nomenclature to report specific FRET-related parameters.
The presented step-by-step procedure for obtaining FRET efficiencies and relevant correction parameters was tested in a worldwide, comparative, blind study by 20 participating labs. We show that, for standardized double-stranded DNA FRET samples, FRET efficiencies can be determined with an s.d. value of less than ± 0.05.
To convert the measured smFRET efficiencies to distances, we used the Förster equation (equation (3); all numbered equations cited in this paper can be found in the Methods section), which critically depends on the dye-pair-specific Förster radius, R 0 . We discuss the measurements required to determine R 0 and the associated uncertainties. Additional uncertainty arises from the fact that many positions are sampled by the dye relative to the biomolecule to which it is attached. Therefore, specific models are used to describe the dynamic movement of the dye molecule during the recording of each FRET-efficiency measurement 22,23 . The investigation of the uncertainties in FRET-efficiency determination and the conversion into distance measurements enabled us to specify uncertainties for individual FRET-derived distances.

Results
Benchmark samples and approaches. We chose double-stranded DNA as a FRET standard for several reasons: DNA sequences can be synthesized, FRET dyes can be specifically tethered at desired positions, the structure of B-form DNA is well characterized, and the samples are stable at room temperature long enough that they can be shipped to labs around the world. The donor and acceptor dyes were attached via C2 or C6 amino linkers to thymidines of opposite strands (Supplementary Fig. 1). These thymidines were separated by 23 Table 1, and Supplementary Note 1). The attachment positions were known only to the reference lab that designed the samples. The samples were designed in such a way that we were able to determine all correction parameters and carry out a self-consistency test (described below).
In this study we used Alexa Fluor and Atto dyes because of their high quantum yields and well-studied characteristics (Supplementary Note 2). Eight hybridized double-stranded FRET samples were shipped to all participating labs. In the main text, we focus on four FRET samples that were measured by most labs in our study: To avoid dye stacking 28,29 , we designed the DNA molecules such that the dyes were attached to internal positions sufficiently far from the duplex ends. As a first test for the suitability of the labels, we checked the fluorescence lifetimes and time-resolved anisotropies (Supplementary Table 2) of all donor-only and acceptor-only samples. The results indicated that there was no significant quenching or stacking and that all dyes were sufficiently mobile at these positions (Supplementary Note 2).
Most measurements were carried out on custom-built setups that featured at least two separate spectral detection channels for donor and acceptor emission (Supplementary Figs. 3 and 4). Results obtained with different fluorophores (samples 3 and 4) and different FRET methods (ensemble lifetime 30 , single-molecule lifetime 16 , and a phasor approach 31 ) are presented in Supplementary Fig. 2 and Supplementary Notes 1 and 2.
A robust correction procedure to determine absolute fluorescence intensities is needed. The ideal solution is a ratiometric approach that, for intensity-based confocal FRET measurements, was pioneered by Weiss and coworkers and uses alternating two-color laser excitation (ALEX) with microsecond pulses 17,32 . In this approach the fluorescence signal after donor excitation is divided by the total fluorescence signal after donor and acceptor excitation (referred to as apparent stoichiometry; see equation (16)), to correct for dye and instrument properties 17 . The ALEX approach was also adapted for TIRF measurements 20 . To increase time resolution and to enable time-resolved spectroscopy, Lamb and coworkers introduced pulsed interleaved excitation with picosecond pulses 33 .
Procedure to determine the experimental FRET efficiency 〈E〉. In both confocal and TIRF microscopy, the expectation value of the FRET efficiency 〈 E〉 is computed from the corrected FRET efficiency histogram. In this section, first we outline a concise and robust procedure to obtain 〈 E〉 . Then we describe distance and uncertainty calculations, assuming a suitable model for the interdye distance distribution and dynamics 6,11,34 . Finally, we derive self-consistency

Nature Methods
arguments and comparisons to structural models to confirm the accuracy of this approach. Our general procedure is largely based on a previous approach 17 , with modifications to establish a robust workflow and standardize the nomenclature. Intensity-based determination of FRET efficiencies requires consideration of the following correction factors (details in the Methods section): background signal correction (BG) from donor and acceptor channels; α, a factor for spectral cross-talk arising from donor fluorescence leakage in the acceptor channel; δ, a factor for direct excitation of the acceptor with the donor laser; and a detection correction factor (γ). The optimal way to determine these factors is to alternate the excitation between two colors, which allows for determination of the FRET efficiency (E) and the relative stoichiometry (S) of donor and acceptor dyes, for each single-molecule event. This requires the additional excitation correction factor β to normalize the excitation rates.
The following step-by-step guide presents separate instructions for confocal and TIRF experiments; notably, the order of the steps is crucial (Methods).
Diffusing molecules: confocal microscopy. Photon arrival times from individual molecules freely diffusing through the laser focus of a confocal microscope are registered. Signal threshold criteria are applied, and bursts are collected and analyzed. From the data, first a 2D histogram of the uncorrected FRET efficiency (${}^{i}E_{app}$) versus the uncorrected stoichiometry (${}^{i}S_{app}$) is generated (Fig. 2a). Then the average number of background photons is subtracted for each channel separately (Fig. 2b). Next, to obtain the FRET-sensitized acceptor signal ($F_{A|D}$), one must subtract the donor leakage ($\alpha\,{}^{ii}I_{Dem|Dex}$) and direct excitation ($\delta\,{}^{ii}I_{Aem|Aex}$) from the acceptor signal after donor excitation. As samples never comprise 100% photoactive donor and acceptor dyes, the donor-only and acceptor-only populations are selected from the measurement and used to determine the leakage and direct excitation (Fig. 2c). After this correction step, the donor-only population should have an average FRET efficiency of 0, and the acceptor-only population should have an average stoichiometry of 0.
The last step deals with the detection correction factor γ and the excitation correction factor β. If at least two species (two different samples or two populations within a sample) with different interdye distances are present, they can be used to obtain the 'global γ-correction'. If one species with substantial distance fluctuations (e.g., from intrinsic conformational changes) is present, a 'single-species γ-correction' may be possible. Both correction schemes assume that the fluorescence quantum yields and extinction coefficients of the dyes are independent of the attachment point. The correction factors obtained by the reference lab are compiled in Supplementary Table 3. The final corrected FRET efficiency histograms are shown in Fig. 2d. The expected efficiencies 〈 E〉 are obtained as the mean of a Gaussian fit to the respective efficiency distributions. After correction, we noted a substantial shift of the FRET-efficiency peak positions, especially for the low-FRET-efficiency peak (E ~ 0.25 uncorrected to E ~ 0.15 when fully corrected).
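The order of the correction steps for a single burst can be sketched as follows. This is a minimal illustration, not the reference lab's software: the `Burst` container and function name are hypothetical, and the convention of scaling the donor signal by γ and the acceptor-excitation signal by 1/β is an assumption based on common ALEX practice.

```python
from dataclasses import dataclass


@dataclass
class Burst:
    """Photon counts of one burst (field names are illustrative)."""
    i_dem_dex: float  # donor emission after donor excitation
    i_aem_dex: float  # acceptor emission after donor excitation
    i_aem_aex: float  # acceptor emission after acceptor excitation


def correct_burst(b, bg, alpha, delta, gamma, beta):
    """Return (E, S) after background, leakage/direct-excitation,
    and gamma/beta correction, applied in that order."""
    # Step 1: subtract the mean background of each channel.
    dd = b.i_dem_dex - bg.i_dem_dex
    ad = b.i_aem_dex - bg.i_aem_dex
    aa = b.i_aem_aex - bg.i_aem_aex
    # Step 2: remove donor leakage and direct acceptor excitation.
    f_ad = ad - alpha * dd - delta * aa
    # Step 3 (assumed convention): donor signal times gamma,
    # acceptor-excitation signal divided by beta.
    f_dd = gamma * dd
    f_aa = aa / beta
    e = f_ad / (f_dd + f_ad)
    s = (f_dd + f_ad) / (f_dd + f_ad + f_aa)
    return e, s


burst = Burst(i_dem_dex=52, i_aem_dex=42, i_aem_aex=102)
background = Burst(i_dem_dex=2, i_aem_dex=2, i_aem_aex=2)
e, s = correct_burst(burst, background, alpha=0.1, delta=0.05, gamma=1.0, beta=1.0)
```

With γ = β = 1 the last step leaves the intensities unchanged, which makes it easy to check the first two steps in isolation before adding the detection and excitation corrections.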
Surface-attached molecules: TIRF microscopy. The correction procedure for TIRF-based smFRET experiments is similar to the
procedure for confocal-based experiments. In the procedure used for ALEX data 20 , a 2D histogram of the uncorrected FRET efficiency versus the uncorrected stoichiometry is generated (Fig. 2e). The background subtraction is critical in TIRF microscopy, as it can contribute substantially to the measured signal. Different approaches can be used to accurately determine the background signal, such as measuring the background in the vicinity of the selected particle or measuring the intensity after photobleaching (Fig. 2f). After background correction, the leakage and direct excitation can be calculated from the ALEX data as for confocal microscopy (Fig. 2g).
Again, determination of the correction factors β and γ is critical 15 . As with confocal microscopy, one can use the stoichiometry information available from ALEX when multiple populations are present to determine an average detection correction factor (global γ-correction). In TIRF microscopy, the detection correction factor can also be determined on a molecule-by-molecule basis, provided the acceptor photobleaches before the donor (individual γ-correction). In this case, the increase in the fluorescence of the donor can be directly compared to the intensity of the acceptor before photobleaching. A 2D histogram of corrected FRET efficiency versus corrected stoichiometry is shown in Fig. 2h.
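The individual γ-correction can be illustrated with a short sketch. It assumes background- and leakage-corrected intensity traces after donor excitation and a known acceptor bleaching frame; the function name and arguments are hypothetical, and γ is taken as the acceptor signal lost divided by the donor signal gained at the bleaching step.

```python
from statistics import mean


def individual_gamma(donor, acceptor, bleach_frame):
    """Estimate gamma for one trace from an acceptor photobleaching step.

    donor, acceptor: corrected intensities per frame after donor excitation.
    bleach_frame: index of the first frame after acceptor bleaching.
    """
    d_pre, d_post = mean(donor[:bleach_frame]), mean(donor[bleach_frame:])
    a_pre, a_post = mean(acceptor[:bleach_frame]), mean(acceptor[bleach_frame:])
    # Acceptor signal lost per unit of donor signal gained.
    return (a_pre - a_post) / (d_post - d_pre)


donor_trace = [100] * 5 + [300] * 5
acceptor_trace = [160] * 5 + [0] * 5
gamma = individual_gamma(donor_trace, acceptor_trace, bleach_frame=5)
```

The estimate is only meaningful when the donor survives for several frames after the acceptor bleaches, so that both plateaus can be averaged.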
In the absence of alternating laser excitation, the following problems occasionally arose during this study: (i) the low-FRET-efficiency values were shifted systematically to higher efficiencies, because FRET-efficiency values at the lower edge were overlooked owing to noise; (ii) the direct excitation was difficult to detect and correct because of its small signal-to-noise ratio; and (iii) acceptor bleaching was difficult to detect for low FRET efficiencies. Therefore, we strongly recommend implementing ALEX in order to obtain accurate FRET data.
Nine of the twenty participating labs determined FRET efficiencies by confocal methods for samples 1 and 2 (Fig. 3a). Seven of the twenty participating labs determined FRET efficiencies by TIRF-based methods (Fig. 3b). The combined data from all labs for measurements of samples 1 and 2 agree very well, with s.d. for the complete dataset of Δ E < ± 0.05. This is a remarkable result, considering that different setup types were used (confocal- and TIRF-based setups) and different correction procedures were applied (e.g., individual, global, or single-species γ-correction).
Distance determination. The ultimate goal of this approach is to derive distances from FRET efficiencies. The efficiency-to-distance conversion requires knowledge of the Förster radius, R 0 , for the specific FRET pair used and of a specific dye model describing the behavior of the dye attached to the macromolecule 22,23 . In the following, we describe (i) how R 0 can be determined and (ii) how to use a specific dye model to calculate two additional values, R 〈E〉 and R MP . R 〈E〉 is the apparent distance between the donor and the acceptor, which is directly related to the experimental FRET efficiency 〈 E〉 that is averaged over all sampled donor-acceptor distances R DA (equation (5)), but it is not a physical distance. R MP is the real distance between the center points (mean positions) of the accessible volumes and deviates from R 〈E〉 because of the different averaging in distance and efficiency space. R MP cannot be measured directly but is important, for example, for mapping the physical distances required for structural modeling 34 .
R 0 is given by equation (7) and depends on the index of refraction of the medium between the two fluorophores (n im ), the spectral overlap integral (J), the fluorescence quantum yield of the donor (Φ F,D ), and the relative dipole orientation factor (κ 2 ) (an estimate of their uncertainties is provided in the Methods section). Our model assumes that the FRET rate (k FRET ) is much slower than the rotational relaxation rate (k rot ) of the dye and that the translational diffusion rate (k diff ) allows the dye to sample the entire accessible volume within the experimental integration time (1/k int ), that is, k rot ≫ k FRET ≫ k diff ≫ k int . The validity of these assumptions is justified by experimental observables discussed in the Methods. Example correction factors are given in Supplementary Table 3.

The determined Förster radii for samples 1 and 2 are given in Supplementary Table 4. Note that literature values differ mainly because donor fluorescence quantum yields are not specified and the refractive index of water is often assumed, whereas we used n im = 1.40 here. Our careful error analysis led to an error estimate of 7% for the determined R 0 , which is relatively large (mainly owing to the uncertainty in κ 2 ).
We used the measured smFRET efficiencies and the calculated Förster radii to compute the apparent distance R 〈E〉 from each lab's data (equation (5)). Figure 4a,b shows the calculated values for these apparent distances for samples 1 and 2 for each data point in Fig. 3.
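The conversion from a mean efficiency to an apparent distance is the inverted Förster relation, R 〈E〉 = R 0 (1/〈E〉 − 1)^(1/6). A minimal sketch follows; the function name is hypothetical, and the example inputs reuse the R 0 = 62.6 Å and E ~ 0.15 values quoted elsewhere in the text purely for illustration.

```python
def apparent_distance(e_mean, r0):
    """Apparent distance R_<E> (same units as r0) from a mean FRET efficiency."""
    if not 0.0 < e_mean < 1.0:
        raise ValueError("mean efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / e_mean - 1.0) ** (1.0 / 6.0)


# At E = 0.5 the apparent distance equals the Foerster radius by definition.
r_half = apparent_distance(0.5, 62.6)
# A low-FRET population yields a distance longer than R0.
r_low = apparent_distance(0.15, 62.6)
```

Because of the sixth-root dependence, the result is most sensitive to errors in E near the extremes of the efficiency range.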
The average values for all labs are given in Supplementary Table 4, together with model values based on knowledge of the dye attachment positions, the static DNA structure, and the mobile dye model (Supplementary Note 3). Considering the error ranges, the experimental and model values agree very well with each other (the deviations range between 0 and 8%).
Although this study focused on measurements on DNA, the described FRET analysis and error estimation are fully generalizable to other systems (e.g., proteins), assuming mobile dyes are used. What becomes more difficult with proteins is specific dye labeling, and the determination of an appropriate dye model, if the dyes are not sufficiently mobile (Supplementary Note 3). R 〈E〉 corresponds to the real distance R MP only in the hypothetical case in which both dyes are unpolarized point sources, with zero accessible volume (AV). In all other cases, R MP is the only physical distance. It can be calculated in two ways: (i) if the dye model and the local environment of the dye are known, simulation tools such as the FRET Positioning and Screening tool 8 can be used to compute R MP from R 〈E〉 for a given pair of AVs; or (ii) if the structure of the investigated molecule is unknown a priori, a sphere is a useful assumption for the AV. In both cases, a lookup table is used to convert R 〈E〉 to R MP for defined AVs and R 0 values (Supplementary Note 5). Our results for these calculations, given as distances determined via the former approach, are shown in Fig. 4c.
Distance uncertainties. We estimated all uncertainty sources arising from both the measurement of the corrected FRET efficiencies and the determination of the Förster radius, and propagated them into distance uncertainties. We discuss the error in determining the distance between two freely rotating but spatially fixed dipoles, R DA , with the Förster equation (equation (26)). Figure 5a shows how uncertainty in each of the correction factors (α, γ, and δ) and the background signals (BG D , BG A ) is translated into the uncertainty of R DA (Supplementary Note 6). The uncertainty of R MP is similar but depends on the dye model and the AVs. The solid gray line in Fig. 5a shows the sum of these efficiency-dependent uncertainties, which are mainly setup-specific quantities.
For the extremes of the distances, the largest contribution to the uncertainty in R DA arises from background photons in the donor and acceptor channels. In the presented example with R 0 = 62.6 Å, the total uncertainty Δ R DA based on the setup-specific uncertainties is less than 4 Å for 35 Å < R DA < 90 Å. Notably, in confocal measurements, larger intensity thresholds can decrease this uncertainty further. The uncertainty in R DA arising from errors in R 0 (blue line in Fig. 5b) is added to the efficiency-related uncertainty in R DA (bold gray line in Fig. 5b) to estimate the total experimental uncertainty in R DA (black line in Fig. 5b). The uncertainties for determining R 0 are dominated by the dipole orientation factor κ 2 and the refractive index n im (Methods). Including the uncertainty in R 0 , the error Δ R DA,total for a single smFRET-based distance between two freely rotating point dipoles is less than 6 Å for 35 Å < R DA < 80 Å. The uncertainty is considerably reduced when multiple distances are calculated and self-consistency in distance networks is exploited 9 . Besides background contributions, an R DA shorter than 30 Å may lead to larger errors due to (i) potential dye-dye interactions and (ii) the dynamic averaging of the dipole orientations being reduced owing to an increased FRET rate.
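How the two error sources enter the distance can be sketched with first-order propagation through R = R 0 (1/E − 1)^(1/6). The function below is hypothetical, and combining the two contributions in quadrature is an assumption (the text states only that they are added); the 7% relative error in R 0 and R 0 = 62.6 Å are taken from the text, while ΔE = 0.03 is an illustrative input.

```python
import math


def distance_uncertainty(e, de, r0, dr0):
    """Return (R, dR_from_E, dR_from_R0, dR_total) for R = r0 * (1/e - 1)**(1/6)."""
    r = r0 * (1.0 / e - 1.0) ** (1.0 / 6.0)
    # |dR/dE| = R / (6 E (1 - E)), from differentiating the Foerster relation.
    d_r_e = r * de / (6.0 * e * (1.0 - e))
    # R scales linearly with R0, so relative errors carry over directly.
    d_r_r0 = r * dr0 / r0
    # Assumption: independent error sources, combined in quadrature.
    return r, d_r_e, d_r_r0, math.hypot(d_r_e, d_r_r0)


r, from_e, from_r0, total = distance_uncertainty(0.5, 0.03, 62.6, 0.07 * 62.6)
```

The factor 1/(E(1 − E)) in the efficiency term shows why the uncertainty grows toward very short and very long distances, where E approaches 1 or 0.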
Comparing distinct dye pairs. To validate the model assumption of a freely rotating and diffusing dye, we developed a self-consistency argument based on the relationship between different dye pairs that bypasses several experimental uncertainties. We define the ratio R rel for two dye pairs as the ratio of their respective R 〈E〉 values (Methods, equation (30)). This ratio is quasi-independent of R 0 , because all dye parameters that are contained in R 0 are approximately eliminated by our DNA design. Therefore, these ratios should be similar for all investigated dye pairs, which we indeed found was the case (Supplementary Table 4). When comparing, for example, the low- to mid-distances for three dye pairs with E > 0.1, we obtained a mean R rel of 1.34 and a maximum deviation of 2.7%. This is a relative error of 2.3%, which is less than the estimated error of our measured distances of 2.8% (Fig. 5a). This further demonstrates the validity of the assumptions for the dye model and averaging regime used here. This concept is discussed further in the Methods.
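The cancellation of R 0 in this ratio can be made explicit with a short sketch. The function name is hypothetical, and taking the longer (lower-efficiency) distance as the numerator is an assumption consistent with the reported mean ratio being larger than 1.

```python
def r_rel(e_low, e_high):
    """Ratio R_<E>(low-FRET sample) / R_<E>(high-FRET sample).

    When both samples share the same R0, it cancels from the ratio,
    so only the two mean efficiencies are needed.
    """
    return ((1.0 / e_low - 1.0) / (1.0 / e_high - 1.0)) ** (1.0 / 6.0)


ratio = r_rel(e_low=0.2, e_high=0.5)
```

Comparing this ratio across dye pairs therefore tests the dye model without relying on the absolute value of any Förster radius.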
Although calculated model distances are based on a static model for the DNA structure, DNA is known not to be completely rigid 35 . We tested our DNA model by carrying out molecular dynamics simulations using the DNA molecule (without attached dye molecules; Supplementary Note 7) and found that the averaged expected FRET efficiency obtained with the computed dynamically varying slightly bent DNA structure led to comparable but slightly longer distances   Table 4) for those cases where we observed larger deviations with static models.

Discussion
Despite differences in the setups used, the reported intensity-based FRET efficiencies were consistent between labs in this study. We attribute this remarkable consistency (Δ E < ± 0.05) to the use of a general step-by-step procedure for the experiments and data analysis.
We also showed that the factors required for the correction of FRET efficiency can be determined with high precision, regardless of the setup type and acquisition software used. Together the measurement errors caused an uncertainty in R DA of less than 5%, which agrees well with the variations between the different labs. Ultimately, we were interested in the absolute distances derived from these FRET efficiencies. Figure 5 shows that any distance between 0.6 R 0 and 1.6 R 0 could be determined with an uncertainty of less than ± 6 Å. This fits well with the distance uncertainty measured across the labs and corresponds to a distance range from 35 to 80 Å for the dye pairs used in sample 1. This estimation is valid if the dyes are sufficiently mobile, as has been supported by time-resolved anisotropy measurements and further confirmed by a self-consistency argument. The s.d. for sample 2 was slightly larger than that for sample 1 (Fig. 5a), which could be explained by specific photophysical properties. The values for samples 3 and 4 (Supplementary Table 4) showed similar precision, considering the smaller number of measurements.
For the samples 1-hi and 2-hi, which were measured after each lab verified its setup and procedure, the precision was further increased by almost a factor of two (Supplementary Table 4), possibly owing to the thorough characterization during this study.
We also tested the accuracy of the experimentally derived distances by comparing them with distances in the static model. For every single FRET pair we found excellent agreement between 0.1% and 4.1% (0.4-2.4 Å) for sample 1 and agreement mostly within the range of experimental error between 3.1% and 9.0% (2.7-5.5 Å) for sample 2. The deviations could be even smaller for dynamic DNA models. For sample 2, which had the cyanine-based dye Alexa Fluor 647 instead of the carbopyronine-based dye Atto 647N as an acceptor, the lower accuracy could be explained by imperfect sampling of the full AV or dye-specific photophysical properties (details are presented in Supplementary Table 2). It was shown previously that cyanine dyes are sensitive to their local environment 36 and therefore require especially careful characterization for each newly labeled biomolecule.
For future work, it will be powerful to complement intensity-based smFRET studies with single-molecule lifetime studies, as the picosecond time resolution could provide additional information on calibration and fast dynamic biomolecular exchange. In addition, it will be important to establish appropriate dye models for more complex (protein) systems in which the local chemistry may affect dye mobility (Supplementary Note 4). However, when used with mobile dyes (which can be checked via anisotropy and lifetime experiments; Supplementary Note 2), the dye model here is fully generalizable to any biomolecular system 8,9 .
The results from different labs and the successful self-consistency test clearly show the great potential of absolute smFRET-based distances for investigations of biomolecular conformations and dynamics, as well as for integrative structural modeling. The ability to accurately determine distances on the molecular scale with smFRET experiments and to estimate the uncertainty of the measurements provides the groundwork for smFRET-based structural and hybrid approaches. Together with the automated selection of the most informative pairwise labeling positions 23 and fast analysis procedures 8-10 , we anticipate that smFRET-based structural methods will become an important tool for de novo structural determination and structure validation, especially for large and flexible structures with which the application of other structural biology methods is difficult.

Methods
Methods, including statements of data availability and any associated accession codes and references, are available at https://doi.org/10.1038/s41592-018-0085-0.

Nomenclature and definitions. See Supplementary Table 5 for a summary of the following section. The FRET efficiency E is defined as
where F is the signal. The stoichiometry S is defined as
The FRET efficiency for a single donor-acceptor distance R DA is defined as
The mean FRET efficiency for a discrete distribution of donor-acceptor distances with the position vectors R D(i) and R A(j) is calculated as
The apparent donor-acceptor distance R 〈E〉 is computed from the average FRET efficiency for a distance distribution. It is a FRET-averaged quantity that is also referred to as the FRET-averaged distance 〈 R DA 〉 E (ref. 37 ):
where I is the experimentally observed intensity; F indicates the corrected fluorescence intensity; Φ F,A and Φ F,D are the fluorescence quantum yields of the acceptor and the donor, respectively; g R|A and g G|D represent the detection efficiency of the red detector (R) if only the acceptor was excited or of the green detector (G) if the donor was excited (analogously for other combinations); and σ A|G is the excitation cross-section for the acceptor when excited with the green laser (analogously for the other combinations).
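For orientation, the quantities named in this section are commonly written as below, in the order in which the text introduces them. This is a reconstruction from standard FRET conventions and the surrounding definitions, not a copy of the typeset equations; the counts n and m of sampled donor and acceptor positions are symbols introduced here for illustration.

```latex
\begin{align}
E &= \frac{F_{A|D}}{F_{D|D} + F_{A|D}} \\
S &= \frac{F_{D|D} + F_{A|D}}{F_{D|D} + F_{A|D} + F_{A|A}} \\
E(R_{DA}) &= \frac{1}{1 + \left(R_{DA}/R_0\right)^{6}} \\
\langle E \rangle &= \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m}
  \frac{1}{1 + \left|\mathbf{R}_{A(j)} - \mathbf{R}_{D(i)}\right|^{6} / R_0^{6}} \\
R_{\langle E \rangle} &= R_0 \left(\langle E \rangle^{-1} - 1\right)^{1/6}
\end{align}
```

The last line is simply the third relation solved for the distance and evaluated at the mean efficiency, which is why R 〈E〉 is an apparent rather than a physical distance.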
The Förster radius (in angstroms) for a given J in the units shown below is given by  Table 1 and Supplementary Note 1. We ordered them from IBA GmbH (Göttingen), which synthesized and labeled the single DNA strands and then carried out HPLC purification. Here the dyes were attached to a thymidine (dT), which is known to cause the least fluorescence quenching of all nucleotides 26 .
Most labs measured the four DNA samples listed in Supplementary Table 1. Therefore, we focus on these four samples in the main text of this paper. The additional samples and the corresponding measurements are described in Supplementary Note 1, Supplementary Fig. 2, and Supplementary Table 4. A buffer consisting of 20 mM MgCl 2 , 5 mM NaCl, 5 mM Tris, pH 7.5, was requested for all measurements, with de-gassing just before the measurement at room temperature.
The linker lengths were chosen in such a way that all dyes had about the same number of flexible bonds between the dipole axis and the DNA. Atto 550, Alexa Fluor 647, and Atto 647N already have an intrinsic flexible part before the C-linker starts ( Supplementary Fig. 1). In addition, the DNAs were designed such that the distance ratio between the high-FRET-efficiency and low-FRET-efficiency samples should be the same for all samples, largely independent of R 0 .
Details on all used setups and analysis software are presented in Supplementary Note 8.
General correction procedure. The FRET efficiency E and stoichiometry S are defined in equations (1) and (2). Determination of the corrected E and S is based largely on the approach of Lee et al. 17 and consists of the following steps: (1) data acquisition, (2) generation of uncorrected 2D histograms for E versus S, (3) background subtraction, (4) correction for position-specific excitation in TIRF experiments, (5) correction for leakage and direct acceptor excitation, and (6) correction for excitation intensities and absorption cross-sections, quantum yields, and detection efficiencies. For the confocal setups, a straightforward burst identification is carried out in which the trace is separated into 1-ms bins. Usually a minimum threshold (e.g., 50 photons) is applied to the sum of the donor and acceptor signals after donor excitation for each bin. This threshold is used again in every step, such that the number of bursts used may change from step to step (if the γ correction factor is not equal to 1). Some labs use sophisticated burst-search algorithms. For example, the dual-channel burst search 38,39 recognizes the potential bleaching of each dye within bursts. Note that the choice of burst-search algorithm can influence the γ correction factor. For standard applications, the simple binning method is often sufficient, especially for well-characterized dyes and low laser powers. This study shows that the results do not depend heavily on these conditions (if they are applied properly), as every lab used its own setup and procedure at this stage. The number of photon bursts per measurement was typically between 1,000 and 10,000.
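The simple binning method can be sketched as follows, assuming photon arrival times in seconds for the two channels after donor excitation. The function name and return format are hypothetical; the 1-ms bin width and 50-photon threshold are the example values given above.

```python
def bin_and_threshold(donor_times, acceptor_times, bin_width=1e-3, min_photons=50):
    """Bin photon arrival times and keep bins whose summed donor + acceptor
    counts reach min_photons.

    Returns a list of (bin_index, donor_count, acceptor_count).
    """
    counts = {}
    for channel, times in enumerate((donor_times, acceptor_times)):
        for t in times:
            idx = int(t // bin_width)
            counts.setdefault(idx, [0, 0])[channel] += 1
    return [(idx, d, a) for idx, (d, a) in sorted(counts.items())
            if d + a >= min_photons]


# One bright bin (55 photons in bin 0) and one sparse bin (3 photons in bin 5).
donor = [0.0001 + 1e-6 * k for k in range(30)]
acceptor = [0.0002 + 1e-6 * k for k in range(25)] + [0.0051] * 3
bursts = bin_and_threshold(donor, acceptor)
```

Because the threshold is re-applied after each correction step, a selection like this would be repeated on the corrected counts rather than computed once.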
For the TIRF setups, traces with one acceptor and one donor are selected, defined by a bleaching step. In addition, only the relevant range of each trajectory (i.e., prior to photobleaching of either dye) is included in all subsequent steps. The mean length of the time traces analyzed by the reference lab was 47 frames (18.8  For confocal measurements, one can determine the background by averaging the photon count rate for all time bins that are below a certain threshold, which is defined, for example, by the maximum in the frequency-versus-intensity plot (the density of bursts should not be too high). Note that a previous measurement of only the buffer can uncover potential fluorescent contaminants, but may differ substantially from the background of the actual measurement. The background intensity is then subtracted from the intensity of each burst in each channel (equation (10)). Typical background values are 0.5-1 photon/ms (Fig. 2b).
For TIRF measurements, various trace-wise or global background corrections can be applied. The most common method defines background as the individual offset (time average) after photobleaching of both dyes in each trace. Other possibilities include selecting the darkest spots in the illuminated area and subtracting an average background time trace from the data, or using a local background, for example, with a mask around the particle. The latter two options have the advantage that possible (exponential) background bleaching is also corrected. We did not investigate the influence of the kind of background correction during this study, but a recent study showed that not all background estimators are suitable for samples with a high molecule surface coverage 41 .
In summary, background correction is essential, but several different approaches can perform it well.
Position-specific excitation correction (optional for TIRF). The concurrent excitation profiles of both lasers are key for accurate measurements (Supplementary Fig. 5). Experimental variations across the field of view are accounted for by a position-specific normalization (equation (11)).

Leakage (α) and direct excitation (δ). After the background correction, the leakage fraction of the donor emission into the acceptor detection channel and the fraction of the direct excitation of the acceptor by the donor-excitation laser are determined. The correction factor for leakage (α) is determined by equation (12), using the FRET efficiency of the donor-only population ("D only" in Fig. 2b,f). The correction factor for direct excitation (δ) is determined by equation (13) from the stoichiometry of the acceptor-only population ("A only" in Fig. 2b,f).
$$\alpha = \frac{\langle {}^{ii}E_{app}^{(DO)} \rangle}{1 - \langle {}^{ii}E_{app}^{(DO)} \rangle} \qquad (12)$$
$$\delta = \frac{\langle {}^{ii}S_{app}^{(AO)} \rangle}{1 - \langle {}^{ii}S_{app}^{(AO)} \rangle} \qquad (13)$$
where ${}^{ii}E_{app}^{(DO)}$ and ${}^{ii}S_{app}^{(AO)}$ are calculated from the background-corrected intensities ${}^{ii}I$ of the corresponding population, donor-only or acceptor-only, respectively. This correction, together with the previous background correction, results in the donor-only population being located at E = 0, S = 1 and the acceptor-only population at S = 0, E = 0…1. The corrected acceptor fluorescence after donor excitation, $F_{A|D}$, is given by equation (14), which yields the updated expressions for the FRET efficiency and stoichiometry, equations (15) and (16).
In principle, the leaked donor signal could be added back to the donor emission channel 42 . However, this would require precise knowledge about spectral detection efficiencies, which is not otherwise required, and has no effect on the final accuracy of the measurement. As the determination of α and δ influences the γ and β correction in the next step, both correction steps can be repeated in an iterative manner if required (e.g., if the γ and β factors deviate largely from 1).
Error propagation, however, is more straightforward if equation (17) is used. If there is a complex dependence between properties of dyes and efficiencies, the homogeneous approximation is no longer applicable. In this case, the relationship between ${}^{iii}S_{app}^{(DA)}$ and ${}^{iii}E_{app}^{(DA)}$ for different populations (or even subpopulations for the same single species) cannot be described by equation (17) with a single γ correction factor. Here, γ can be determined for a single species. We call this 'single-species γ-correction'. This works only if the efficiency broadening is dominated by distance fluctuations. The reason for this assumption is the dependency of these correction factors on both the stoichiometry and the distance-dependent efficiency. In our study, global and local γ-correction yielded similar results.
Therefore, the homogeneous approximation, with distance fluctuations as the main cause for efficiency broadening, can be assumed for samples 1 and 2. Systematic variation of the γ correction factor yields an error of about 10%.
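The leakage, direct-excitation, and global γ/β corrections can be sketched in a few lines. Equations (14)-(17) are not reproduced in this text, so the sketch assumes the forms commonly used in the ALEX literature following Lee et al.: $F_{A|D} = I_{Aem|Dex} - \alpha I_{Dem|Dex} - \delta I_{Aem|Aex}$, and a linear relation $1/S = (1 + \gamma\beta) + (1 - \gamma)\beta E$ between the population means after the α/δ correction. All function and variable names are hypothetical.

```python
from statistics import mean


def leakage_factor(e_app_donor_only):
    """alpha from the apparent E values of the donor-only population."""
    e = mean(e_app_donor_only)
    return e / (1.0 - e)


def direct_excitation_factor(s_app_acceptor_only):
    """delta from the apparent S values of the acceptor-only population."""
    s = mean(s_app_acceptor_only)
    return s / (1.0 - s)


def apply_alpha_delta(i_dd, i_da, i_aa, alpha, delta):
    """Return (E, S) after leakage and direct-excitation correction.

    i_dd: donor emission after donor excitation
    i_da: acceptor emission after donor excitation
    i_aa: acceptor emission after acceptor excitation
    All intensities are background-corrected. Assumed standard ALEX form.
    """
    f_ad = i_da - alpha * i_dd - delta * i_aa
    e = f_ad / (i_dd + f_ad)
    s = (i_dd + f_ad) / (i_dd + f_ad + i_aa)
    return e, s


def fit_gamma_beta(e_means, s_means):
    """Least-squares line 1/S = a + b*E over the means of several FRET
    populations; then beta = a + b - 1 and gamma = (a - 1) / beta."""
    inv_s = [1.0 / s for s in s_means]
    e_bar, y_bar = mean(e_means), mean(inv_s)
    sxx = sum((e - e_bar) ** 2 for e in e_means)
    sxy = sum((e - e_bar) * (y - y_bar) for e, y in zip(e_means, inv_s))
    b = sxy / sxx
    a = y_bar - b * e_bar
    beta = a + b - 1.0
    gamma = (a - 1.0) / beta
    return gamma, beta
```

The linear fit needs at least two populations with clearly different efficiencies; with a single population the slope is undefined, which is one reason a single-species or trace-wise γ determination is used in that case.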
Alternatively, determination of γ and β factors can be done trace-wise, as in, for example, msALEX experiments 43 , where the γ factor is determined as the ratio of the decrease in acceptor signal and the increase in donor signal after acceptor bleaching. We call such an alternative correction individual γ-correction 15 . The analysis of local distributions can provide valuable insights about properties of the studied system.
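A trace-wise γ estimate from an acceptor bleaching step can be illustrated as follows. This is a sketch under simplifying assumptions: the intensities are taken to be already corrected for background, leakage, and direct excitation, the bleaching frame is assumed to be known, and the function name and example trace are hypothetical.

```python
from statistics import mean


def tracewise_gamma(donor, acceptor, bleach_frame):
    """Ratio of the acceptor signal decrease to the donor signal increase
    at the acceptor bleaching step of a single trace."""
    donor_rise = mean(donor[bleach_frame:]) - mean(donor[:bleach_frame])
    acceptor_drop = mean(acceptor[:bleach_frame]) - mean(acceptor[bleach_frame:])
    if donor_rise <= 0:
        raise ValueError("donor signal does not increase after acceptor bleaching")
    return acceptor_drop / donor_rise


# Hypothetical noise-free trace: the acceptor bleaches at frame 5.
donor_trace = [100] * 5 + [300] * 5
acceptor_trace = [160] * 5 + [0] * 5
gamma = tracewise_gamma(donor_trace, acceptor_trace, bleach_frame=5)
```

With real data, short segments before or after the bleaching step make the two means noisy, so the resulting γ values scatter from trace to trace.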

Nature Methods
Plots of the E-versus-S histogram are shown in Fig. 2d,h. The FRET population should now be symmetric about the line S = 0.5. The donor-only population should still be located at E = 0, and the acceptor-only population should be at S = 0. Finally, the corrected FRET efficiency histogram is generated from events with a stoichiometry of 0.3 < S < 0.7 (histograms in Fig. 2). The expected value of the corrected FRET efficiency E is taken as the center of a Gaussian fit to the efficiency histogram. This is a good approximation for FRET efficiencies in the range from about 0.1 to 0.9. In theory, the shot-noise-limited efficiencies follow a binomial distribution if the photon number per burst is constant. For extreme efficiencies or data with a small average number of photons per burst, the efficiency distribution can no longer be approximated with a Gaussian. In this case, and also in the case of efficiency broadening due to distance fluctuations, a detailed analysis of the photon statistics can be useful 38,44-46 .
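The stoichiometry filter and the shot-noise argument can be made concrete with a short sketch. It assumes the simplest case of a constant photon number N per burst, for which the binomial standard deviation of the efficiency is $\sqrt{E(1-E)/N}$; the 2σ criterion for judging whether a Gaussian is a reasonable approximation is an illustrative choice, not a rule from this study, and the function names are hypothetical.

```python
import math


def select_fret_events(events, s_min=0.3, s_max=0.7):
    """Keep (E, S) pairs with s_min < S < s_max (doubly labeled molecules)."""
    return [(e, s) for e, s in events if s_min < s < s_max]


def shot_noise_sigma(e, n_photons):
    """Binomial standard deviation of E for a constant photon number per burst."""
    return math.sqrt(e * (1.0 - e) / n_photons)


def gaussian_is_reasonable(e, n_photons, n_sigma=2.0):
    """Heuristic: the interval E +/- n_sigma*sigma must stay inside [0, 1]."""
    sigma = shot_noise_sigma(e, n_photons)
    return e - n_sigma * sigma >= 0.0 and e + n_sigma * sigma <= 1.0


# Hypothetical events: one donor-only and one acceptor-only event are removed.
events = [(0.50, 0.50), (0.00, 0.98), (0.60, 0.10), (0.52, 0.45)]
fret_events = select_fret_events(events)
```

For E = 0.5 and 100 photons the shot-noise width is 0.05, whereas for E close to 0 or 1 the distribution is cut off at the boundary and becomes visibly asymmetric.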
Uncertainty in distance due to $R_0$. According to Förster theory 1 , the FRET efficiency E and the distance R are related by equation (3). In this study, we focused on the comparison of E values across different labs in a blind study. Many excellent reviews have been published on how to determine the Förster radius $R_0$ (refs. 16,47,48), and a complete discussion would be beyond the scope of the current study. In the following, we estimate and discuss the different sources of uncertainty in $R_0$ by utilizing standard error propagation (see also Supplementary Note 6 and ref. 26). $R_0$ is given by equation (7).
The sixth power of the Förster radius is proportional to the relative dipole orientation factor $\kappa^2$, the donor quantum yield $\Phi_{F,D}$, the overlap integral $J$, and $n^{-4}$, where $n$ is the refractive index of the medium:
$$R_0^6 \propto \kappa^2\,\Phi_{F,D}\,J\,n^{-4}$$
For Fig. 5b, we used a total Förster radius related distance uncertainty of 7%, which is justified by the following estimate. Please note that the error in the dipole orientation factor is always specific for the investigated system, whereas the errors in the donor quantum yield, overlap integral and refractive index are more general, although their mean values do also depend on the environment.
The refractive index. Different values for the refractive index in FRET systems have been used historically, but ideally the refractive index of the donor-acceptor intervening medium, $n_{im}$, should be used. Some experimental studies suggest that the use of the refractive index of the solvent may be appropriate, but this is still open for discussion (see, e.g., the discussion in ref. 49). In the worst case, $n_{im}$ might lie anywhere between the refractive index of the solvent ($n_{water} = 1.33$) and a refractive index for the dissolved molecule ($n < n_{oil} = 1.52$) (ref. 50), that is, $n_{water} < n_{im} < n_{oil}$. This would result in a maximum uncertainty of $\Delta n_{im} < 0.085$. As recommended by Clegg 51 , we used $n_{im} = 1.40$ to minimize this uncertainty (Supplementary Table 6). Because $R_0^6 \propto n^{-4}$, the distance uncertainty propagated from the uncertainty of the refractive index can then be assumed to be
$$\Delta R_0(n_{im}) \approx \frac{4}{6}\,\frac{\Delta n_{im}}{n_{im}}\,R_0 \approx 0.04\,R_0$$
The donor quantum yield $\Phi_{F,D}$ is position dependent; therefore we measured the fluorescence lifetimes and quantum yields of the free dye Atto 550 and the 1-hi, 1-mid, and 1-lo donor-only labeled samples (Supplementary Table 2).
In agreement with the work of Sindbert et al. 37 , the uncertainty of the quantum yield is estimated at $\Delta\Phi'_{F,D} = 5\%$, arising from the uncertainties of the $\Phi_F$ values of reference dyes and the precision of the absorption and fluorescence measurements. Because $R_0^6 \propto \Phi_{F,D}$, the distance uncertainty due to the quantum yield is estimated as
$$\Delta R_0(\Phi_{F,D}) \approx \frac{1}{6}\,\Delta\Phi'_{F,D}\,R_0 \approx 0.008\,R_0$$
The overlap integral $J$ was measured for the unbound dyes in solution (Atto 550 and Atto 647N), as well as for samples 1-lo and 1-mid. This resulted in a deviation of about 10% for $J$ when we used the literature values for the extinction coefficients. All single-stranded labeled DNA samples used in this study were purified with HPLC columns providing a labeling efficiency of at least 95%. The labeling efficiencies of the single-stranded singly labeled DNA and of the double-stranded singly labeled DNA samples were determined by the ratio of the absorption maxima of the dye and the DNA and were all above 97%. This indicates an error of the assumed extinction coefficient of less than 3%. Thus, the distance uncertainty due to the overlap spectra and a correct absolute acceptor extinction coefficient can be estimated by equation (26). However, the uncertainty in the acceptor extinction coefficient might be larger for other environments, such as when bound to a protein.
In addition to the above uncertainty estimation, the J-related uncertainty can also be obtained through verification of the self-consistency of a β-factor network 9 . Finally, we found little uncertainty when we used the well-tested dye Atto 647N. Fluorescence spectra were measured on a Fluoromax4 spectrafluorimeter (Horiba, Germany). Absorbance spectra were recorded on a Cary5000 UV-VIS spectrometer (Agilent, USA) ( Supplementary Fig. 6).
The $\kappa^2$ factor and model assumptions. The uncertainty in the distance depends on the dye model used 22 . Several factors need to be considered, given the model assumptions of unrestricted dye rotation, equal sampling of the entire accessible volume, and the rate inequality $k_{rot} \gg k_{FRET} \gg k_{diff} \gg k_{int}$.
First, the use of $\kappa^2 = 2/3$ is justified if $k_{rot} \gg k_{FRET}$, because then there is rotational averaging of the dipole orientation during energy transfer. $k_{rot}$ is determined from the rotational correlation time $\rho_1 < 1$ ns, and $k_{FRET}$ is determined from the fluorescence lifetimes 1 ns < $\tau_{fl}$ < 5 ns. Hence the condition $k_{rot} \gg k_{FRET}$ is not strictly fulfilled. We estimate the error this introduces into $\kappa^2$ from the time-resolved anisotropies of donor and acceptor dyes. If the transfer rate is smaller than the fast component of the anisotropy decay (rotational correlation time) of donor and acceptor, then the combined anisotropy, $r_C$, is given by the residual donor and acceptor anisotropies ($r_{D,\infty}$ and $r_{A,\infty}$, respectively). In theory, the donor and the acceptor anisotropy should be determined at the time of energy transfer. If the transfer rate is much slower than the fast component of the anisotropy decay of donor and acceptor, the residual anisotropy can be used (Supplementary Fig. 7) 9 . Also, the steady-state anisotropy values can give an indication of the rotational freedom of the dyes on the relevant time scales, if the inherent effect by the fluorescence lifetimes is taken into account (refer to the Perrin equation, $r(\tau) = r_0/(1 + \tau/\phi)$, where $r$ is the observed anisotropy, $r_0$ is the intrinsic anisotropy of the molecule, $\tau$ is the fluorescence lifetime, and $\phi$ is the rotational time constant; Supplementary Table 2 and Supplementary Fig. 8).
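The Perrin equation quoted above can be evaluated directly to see how the fluorescence lifetime affects the steady-state anisotropy. The sketch below is illustrative only: the function names are hypothetical, and the numerical values (an intrinsic anisotropy of 0.4, a lifetime of 4 ns, a rotational time constant of 1 ns) are assumptions chosen for the example rather than measurements from this study.

```python
def perrin_anisotropy(r0, tau_ns, phi_ns):
    """Steady-state anisotropy r = r0 / (1 + tau / phi)."""
    return r0 / (1.0 + tau_ns / phi_ns)


def rotational_time_constant(r0, r_observed, tau_ns):
    """Invert the Perrin equation: phi = tau / (r0 / r - 1)."""
    return tau_ns / (r0 / r_observed - 1.0)


# Assumed example values, not data from this study.
r = perrin_anisotropy(r0=0.4, tau_ns=4.0, phi_ns=1.0)
phi = rotational_time_constant(r0=0.4, r_observed=r, tau_ns=4.0)
```

A short lifetime raises the steady-state anisotropy even for a freely rotating dye, which is why steady-state values of dyes with different lifetimes should not be compared without this correction.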
If the steady-state anisotropy and $r_C$ are low (< 0.2), one can assume (but not prove) sufficient isotropic coupling (rotational averaging), that is, $\kappa^2 = 2/3$, with an uncertainty of about 5% (ref. 9).
Spatial sampling. In addition, it is assumed that both dyes remain in a fixed location for the duration of the donor lifetime, that is, $k_{FRET} \gg k_{diff}$, where $k_{diff}$ is defined as the inverse of the diffusion time through the complete AV. Recently the diffusion coefficient for a tethered Alexa Fluor 488 dye was determined to be D = 10 Å²/ns (ref. 30). Therefore, $k_{diff}$ is smaller than $k_{FRET}$. For short distances (< 5 Å) the rates become similar, but the effect on the interdye distance distribution within the donor's lifetime is small, as has been observed in time-resolved experiments. We also assumed that, in the experiment, the efficiencies are averaged over all possible interdye positions. This is the case when $k_{diff} \gg k_{int}$, which is a very good assumption for TIRF experiments with integration times $1/k_{int}$ > 100 ms, and also for confocal experiments with integration times around 1 ms.
Overall uncertainty in $R_0$. Time-resolved anisotropy measurements of samples 1 and 2 resulted in combined anisotropies less than 0.1. Therefore, we assumed isotropic coupling to obtain $R_{MP}$. The $R_{MP}$ values matched the model distances very well, further supporting these assumptions. Finally, an experimental study of $\kappa^2$ distributions also yielded typical errors of 5% (ref. 37). Combining the individual contributions, the overall uncertainty for the Förster radius then amounts to about 7%, the value used for Fig. 5b. The absolute values determined for this study are summarized in Supplementary Table 6. Please note that the photophysical properties of dyes vary in different buffers and when the dyes are attached to different biomolecules. Therefore, all four quantities that contribute to the uncertainty in $R_0$ should be measured for the system under investigation. When supplier values or values from other studies are applied, the uncertainty can be much larger. The values specified here could be further evaluated and tested in another blind study.
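The error budget for the Förster radius can be checked numerically with standard error propagation. The sketch below rests on several assumptions: the four contributions are treated as independent and added in quadrature; the 10% deviation found for J is used as its relative uncertainty; and the 5% figure for $\kappa^2$ is applied directly to the distance. The function name is hypothetical. Under these assumptions the result is close to the 7% used for Fig. 5b.

```python
import math


def rel_r0_uncertainty(rel_dx, exponent):
    """If R0^6 is proportional to x^exponent, then
    dR0/R0 = |exponent| / 6 * dx/x (first-order error propagation)."""
    return abs(exponent) / 6.0 * rel_dx


# Refractive index: delta n < 0.085 at n = 1.40, with R0^6 ~ n^-4.
n_term = rel_r0_uncertainty(0.085 / 1.40, exponent=-4)
# Donor quantum yield: 5% relative uncertainty, R0^6 ~ Phi.
qy_term = rel_r0_uncertainty(0.05, exponent=1)
# Overlap integral: ASSUMPTION, the 10% deviation is taken as the uncertainty.
j_term = rel_r0_uncertainty(0.10, exponent=1)
# Orientation factor: ASSUMPTION, 5% applied directly to the distance.
kappa_term = 0.05

terms = (n_term, qy_term, j_term, kappa_term)
total = math.sqrt(sum(t * t for t in terms))
```

The orientation factor and the refractive index dominate the budget, so reducing the uncertainty of the quantum yield alone would barely change the total.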
Comparing distinct dye pairs. Even though time-resolved fluorescence anisotropy can show whether dye rotation is fast, the possibility of dyes interacting with the DNA cannot be fully excluded. Thus, it is not clear whether the dye molecule is completely free to sample the computed AV (free diffusion), or whether there are sites of attraction (preferred regions) or sites of repulsion (disallowed regions). To validate the model assumption of a freely rotating and diffusing dye, we define the
Nature Research reporting summary


Software and code
Policy information about availability of computer code

Data collection
No common software was used and no special code was developed; each group used its own custom software. The protocol was detailed enough that the results did not depend on the particular software used, which is one of the strengths of this study.

Data analysis
A major part of this manuscript describes the data analysis in great detail. The analysis is independent of the software used, which is one of the main outcomes of this study.
Data
Policy information about availability of data
All data are available upon request from the authors. The data for Fig. 2 are available on Zenodo at http://doi.org/10.5281/zenodo.1249497.


Life sciences study design

Sample size
We invited 22 labs, of which 20 responded. All resulting uncertainties are given and take the sample size into account. Similar results were obtained for different samples measured by different numbers of labs (not all labs could measure all samples).

Data exclusions
No data were excluded from the analysis.

Replication
Each lab was asked to perform experiments according to standards used in the lab. Reproducibility in this respect is already given by the multi-laboratory approach used in this study.