Introduction

Microscopical methods for the analysis of DNA contours and the mapping of their intrinsic curvature and flexibility have been developed by several groups1,2,3,4,5,6,7. These methods have been exploited for different purposes, such as the experimental validation of models for DNA adsorption and bending1,2,4,5,6,7,8,9 or the correlation of DNA shape and flexibility with melting10, ligand interactions11,12,13, replication14, genomic packaging and transcription regulation15,16. The wide applicability of DNA conformational studies demands simple experimental methods, characterized by few processing steps for specimen preparation and minimal experimental bias on curvature and flexibility measurements. These requirements are crucial for the introduction of effective assays for DNA analysis fully based on high-resolution imaging, e.g. sizing17, genotyping and haplotyping18, and expression profiling19,20. Furthermore, they may open a key role for nanoscale conformational analysis within more complex protocols, such as population-based genetic disease studies or genomic screening at the single-molecule level21.

Current studies on DNA structure and flexibility involve the high-resolution imaging of adsorbed species and the use of image-analysis software to reconstruct the molecular profiles and analyze the signed curvature associated with segments of given location and length1,2,3,4,5,6,7,8,9,22,23,24. Tracing algorithms represent each molecule as a chain of xy pairs separated by a fixed distance l along the contour (Fig. 1). The curvature analysis proceeds through the calculation of the signed bending angles θi formed by adjacent units, which are obtained from the vector product of the two local tangent vectors meeting at unit i (i = 1,2,…, N − 1, with N the total number of units)4. From the θi values one can define the global curvature Cj,m for a segment of m units, located at j units from one of the ends, as:

with j = 1,2,…,N and m = 1,2,…,N − j. It is common practice to neglect statistical correlations among neighbouring θi variables and to represent each of them as the sum of a static and a dynamic contribution, where the thermally-induced angular fluctuations occur around the constant sequence-dependent angles and are normally distributed with zero mean1,2,3,4,22,23,24. Thus the average value of the Cj,m curvature is:

where the angle brackets 〈〉 denote an ensemble average conducted over the accessible chain conformations. Equation (2) proves that the average curvature 〈Cj,m〉 equals the intrinsic curvature of the segment. Furthermore it suggests a route for comparing the experimental values of intrinsic curvature with the theoretical ones: in fact the left hand side (l.h.s.) term might be experimentally accessed by averaging the Cj,m realizations over a large pool of imaged molecular contours1,3,4,5,25, whereas the right hand side (r.h.s.) term should be predicted computationally by well-consolidated methods (e.g. the static dinucleotide wedge models by Bolshoy et al., Gorin et al., Olson et al. or De Santis et al. highlighted in Ref. 26).
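The tracing and averaging steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the software used in the cited studies: the function names, the 0-based indexing and the convention that a segment curvature is the plain sum of m consecutive bending angles are assumptions made for the example.

```python
import math

def bending_angles(xy):
    """Signed bending angles between consecutive tangent vectors of a traced chain.

    xy is a list of (x, y) points; the sign comes from the z-component of the
    vector product of adjacent tangents.
    """
    tangents = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(xy, xy[1:])]
    angles = []
    for (ax, ay), (bx, by) in zip(tangents, tangents[1:]):
        cross = ax * by - ay * bx   # z-component of the vector product
        dot = ax * bx + ay * by
        angles.append(math.atan2(cross, dot))
    return angles

def segment_curvature(theta, j, m):
    """Assumed convention: sum of m consecutive angles starting at 0-based index j."""
    return sum(theta[j:j + m])

def ensemble_average(values):
    """Plain arithmetic mean over a pool of molecules."""
    return sum(values) / len(values)

# Hypothetical test contour: six points on a circular arc, 0.1 rad apart.
arc = [(math.cos(0.1 * k), math.sin(0.1 * k)) for k in range(6)]
theta = bending_angles(arc)
curvature = segment_curvature(theta, 1, 2)
```

For the arc used here every bending angle equals the angular spacing of the points, so the curvature of a two-unit segment is twice that spacing. Averaging `segment_curvature` over many oriented contours with `ensemble_average` gives the experimental estimate of the intrinsic curvature discussed in the text.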

Figure 1

Schematic illustration of the main steps of DNA conformational analysis, and presentation of a comparative assay exploiting a symmetric curvature descriptor.

(a) A molecule is imaged by high-resolution microscopy, then traced by image-analysis software and represented as a chain of xy pairs separated by a contour length l. The signed bending angle θi is obtained from the vector product of the two adjacent local tangent vectors (top left panel). Four different spatial orientations can be ascribed to the extracted contour of a label-free molecule, according to the end chosen as the starting point of the nucleotide sequence (red dot) and the molecular face exposed to the substrate. As a result, the signed curvature Cj,m changes in modulus and/or sign according to the chosen orientation (right panel). A generic symmetric curvature descriptor Fj,m couples the signed curvatures of two m-unit-long segments, symmetrically placed at j units from the chain ends (bottom left panel). The main advantage is that this quantity remains the same for each of the possible orientations of the extracted contour. (b) The characteristic patterns of variation of 〈Fs,L〉 for some specimens, named (1), (2) and (3), make it possible to highlight DNA regions with different intrinsic curvature and/or flexibility (grey boxes). This represents an original strategy for the comparative analysis of bent duplexes under label-free conditions.

In line with the above arguments, the experimental curvature variance can be related to the theoretical chain flexibility4,25, where the variance is defined in the usual way:

The practical estimation of 〈Cj,m〉 and of the curvature variance is however a nontrivial task, since it requires orienting the sampled molecular contours in order to evaluate the curvature averages at corresponding points of the nucleotide sequence. In general, for each molecular contour extracted from a high-resolution image there are four possible spatial orientations, depending on which of the two contour ends corresponds to the starting point of the base-pair sequence (the 5′-3′ direction) and on which of the two chemically-different faces is exposed by the molecule to the substrate when collapsing on it from the bulk solution (Fig. 1a). In the case of unlabeled chains, the orientation uncertainty cannot be resolved deterministically, because no distinctive topographical feature distinguishes the beginning from the end of a DNA molecule. Chain polarity has traditionally been discriminated by end-labelling with bulky tags1,3,5,18. Alternatively, palindromic dimers can be constructed starting from the target molecules4. An uncertainty however remains between the two orientations with mirror curvature profiles, which describe DNA adsorption on chemically-different faces. Scipioni et al.25 and Sampaolese et al.27 demonstrated that such orientations are not statistically equivalent if the molecules are deposited onto freshly-cleaved mica, because of a preferential adsorption of T-rich faces. This fact ultimately justifies the nonzero intrinsic curvature 〈Cj,m〉 obtained from the analysis of an ensemble of palindromic dimers, or even from labelled chains after proper orientation to have the same polarity.

One readily recognizes that it would be more desirable to characterize the local DNA curvature without any assumption on the adsorption mechanism and preferential orientation of target chains on a given substrate. Furthermore, the preparation of end-labelled duplexes and palindromic constructs is a time-consuming, labor-intensive part of the whole experiment that hampers the broad applicability of such studies. To address these limitations, we propose to perform conformational analysis of label-free duplexes using Symmetric Curvature Descriptors (SCDs). We define an SCD as a stochastic function that couples the apparent global curvatures of two segments, symmetrically placed along the DNA molecular backbone, in such a way that its realizations depend neither in modulus nor in sign on the orientation arbitrarily assigned to the analyzed chains. In other words, the SCDs are (centro)symmetric. The reader is referred to the Supplementary Note for a general discussion of their mathematical and statistical properties. Using segmental chain notation, a generic SCD Fj,m ≡ Fj,m(Cj,m, CN− 1 − (j + m),m) couples the signed curvatures Cj,m and CN− 1 − (j + m),m of two m-long segments, placed at j units from the chain ends (Fig. 1a). In a previous work24 we investigated the DNA intrinsic curvature using one specific descriptor, namely Pj,m ≡ Cj,m·CN− 1 − (j + m),m. In the present case, we use SCDs to probe both intrinsic curvature and differential flexibility. As several different descriptors Fj,m might in principle be introduced for this purpose, we explicitly choose a few descriptors defined according to criteria of simplicity and convenience. It is known that quantities such as |Cj,m| or cosCj,m always depend on the intrinsic curvature and flexibility of the m-long segment4,26, therefore we have considered basic algebraic operations involving such quantities in a symmetric arrangement. In detail, we complement Pj,m with COSj,m ≡ cosCj,m·cosCN− 1 − (j + m),m and with a third descriptor, SSj,m.
It is trivial to demonstrate that these descriptors are symmetric. Accordingly, 〈Pj,m〉, 〈COSj,m〉 and 〈SSj,m〉 can be estimated by ensemble averages obtained from a large pool of molecular profiles with arbitrary relative orientation. In agreement with this picture, neither end-labelled molecules nor palindromic constructs are required.
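The symmetry property can be checked numerically on a synthetic contour. The sketch below is only an illustration under stated assumptions: the chain is a hypothetical random walk, indices are 0-based, a segment curvature is taken as the sum of m consecutive bending angles, and only the two descriptors whose definitions are spelled out in the text (P and COS) are tested.

```python
import math
import random

def bending_angles(xy):
    """Signed bending angles from the vector product of adjacent tangents."""
    tangents = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(xy, xy[1:])]
    return [math.atan2(ax * by - ay * bx, ax * bx + ay * by)
            for (ax, ay), (bx, by) in zip(tangents, tangents[1:])]

def pair_curvatures(theta, j, m):
    """Curvatures of two m-unit segments placed j units from either chain end."""
    n = len(theta)
    return sum(theta[j:j + m]), sum(theta[n - j - m:n - j])

def descriptor_p(theta, j, m):
    c_a, c_b = pair_curvatures(theta, j, m)
    return c_a * c_b

def descriptor_cos(theta, j, m):
    c_a, c_b = pair_curvatures(theta, j, m)
    return math.cos(c_a) * math.cos(c_b)

# Hypothetical contour: 41 points of a planar random walk with unit steps.
rng = random.Random(0)
heading, x, y = 0.0, 0.0, 0.0
xy = [(x, y)]
for _ in range(40):
    heading += rng.gauss(0.0, 0.3)
    x += math.cos(heading)
    y += math.sin(heading)
    xy.append((x, y))

# Four orientations: either end as starting point, either face on the substrate.
mirrored = [(-px, py) for px, py in xy]
orientations = [xy, xy[::-1], mirrored, mirrored[::-1]]

p_values = [descriptor_p(bending_angles(o), 3, 5) for o in orientations]
cos_values = [descriptor_cos(bending_angles(o), 3, 5) for o in orientations]
signed = [pair_curvatures(bending_angles(o), 3, 5)[0] for o in orientations]
```

The signed curvature in `signed` takes different values across the four orientations, whereas `p_values` and `cos_values` are the same for all of them up to rounding. This is the property that allows ensemble averages to be taken over contours with arbitrary relative orientation.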

For the case of non-overlapping fragments (j = 1,2,…,N/2, m = 1,…,N/2 − j):

where 〈Cj,m〉 and the curvature variance are given respectively by equations (2) and (3), and the remaining coefficients are numeric constants. The last equality of each equation holds under specific conditions discussed in the Supplementary Note. Equation (4) shows that 〈Pj,m〉 depends only on the intrinsic curvatures of the two chosen segments. On the contrary, equations (5) and (6) indicate that 〈COSj,m〉 and 〈SSj,m〉 include two qualitatively different contributions, from the intrinsic curvatures of the segments (〈Cj,m〉, 〈CN− 1 − (j + m),m〉) and from their flexibility (the curvature variances).

We carried out the DNA conformational analysis by introducing the curvilinear distance s ≡ jl and by plotting s vs 〈Fs,L〉 at fixed L (L ≡ ml), which corresponds to probing the emergence of intrinsic curvature and/or flexibility effects for pairs of segments of fixed length L, located at a given distance s from the ends. By definition, we expect to observe remarkable variations of 〈Fs,L〉 whenever large intrinsic curvature and/or enhanced flexibility affect the trajectory of the chosen fragments. Overall, these features generate a characteristic pattern of variation of 〈Fs,L〉 that can be exploited to set up the comparative analysis of bent duplexes (Fig. 1b). Accordingly, we explored the characteristic patterns of variation of 〈Ps,L〉, 〈COSs,L〉 and 〈SSs,L〉, i.e. we plotted s vs 〈Ps,L〉, s vs 〈COSs,L〉 and s vs 〈SSs,L〉 at fixed L.
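A pattern of variation can be assembled from a pool of traced molecules along the following lines. This is a hedged sketch: the intrinsic angle profile, the noise level and the unit length l = 3.4 nm are hypothetical, indices are 0-based, and the descriptor is passed in as a function of the two segment curvatures.

```python
import math
import random

def pattern(ensemble, m, l, descriptor):
    """Return [(s, <F_{s,L}>)] with s = j*l, for non-overlapping segment pairs."""
    n = len(ensemble[0])
    curve = []
    for j in range(0, n // 2 - m + 1):
        values = [descriptor(sum(th[j:j + m]), sum(th[n - j - m:n - j]))
                  for th in ensemble]
        curve.append((j * l, sum(values) / len(values)))
    return curve

def random_orientation(theta, rng):
    """Apply one of the four contour orientations to a list of bending angles."""
    face = theta if rng.random() < 0.5 else [-t for t in theta]       # exposed face
    return face if rng.random() < 0.5 else [-t for t in reversed(face)]  # polarity

def p_descriptor(c_a, c_b):
    return c_a * c_b

def cos_descriptor(c_a, c_b):
    return math.cos(c_a) * math.cos(c_b)

# Hypothetical ensemble: static angles plus independent Gaussian fluctuations.
rng = random.Random(1)
theta0 = [0.05 * math.sin(2.0 * math.pi * i / 20.0) for i in range(60)]
oriented = [[t + rng.gauss(0.0, 0.25) for t in theta0] for _ in range(200)]
scrambled = [random_orientation(th, rng) for th in oriented]

p_oriented = pattern(oriented, 5, 3.4, p_descriptor)
p_scrambled = pattern(scrambled, 5, 3.4, p_descriptor)
cos_scrambled = pattern(scrambled, 5, 3.4, cos_descriptor)
```

Because each descriptor is unchanged by the four orientations, `p_scrambled` coincides with `p_oriented` up to rounding, which is what makes the analysis of label-free molecules possible.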

Results

We validated the performance of the proposed descriptors using both simulations and real Atomic Force Microscopy (AFM) data of 1332 bp strands from the promoter region of the human Osteopontin (OPN) coding gene.

Theoretical patterns of variation for human OPN coding gene

We predicted the average shape of the target DNA with the static dinucleotide wedge model of De Santis et al.28 and treated room-temperature bending through sequence-independent, Worm-Like-Chain (WLC) flexibility22 (see Methods section). The simulated 〈Ps,L〉 and 〈COSs,L〉 patterns are reported in Fig. 2a. For simplicity, we focus on the contour lengths L = 17 nm/50 bp and L = 34 nm/100 bp. These are larger than the DNA apparent width estimated by AFM (≈10 nm, see below), hence they also provide experimental patterns of variation free from AFM tip convolution artefacts. The s vs 〈Ps,L〉 curves are characterized by marked oscillations that persist for both L values. In particular, for L = 34 nm one recognises three negative peaks of ≈0.10–0.8 rad² (conventionally named 1, 3 and 5) at s1 = 16 nm, s3 = 59 nm and s5 = 165 nm, whereas smaller local maxima (named 2 and 4) occur at s2 = 35 nm (≈0.07 rad²) and s4 = 78 nm (≈0.04 rad²) respectively. Consistently with equation (4), local peaks of the s vs 〈Ps,L〉 plot are related to pairs of segments with large intrinsic curvature. This is confirmed by direct inspection of the static curvature profile of the 2D trajectory of the OPN-related DNA (Fig. 2b). One can notice that the signed angle shows a few marked oscillations taking place over a length scale of ~100 nm. Here we highlighted the 34 nm-long tracts involved in the calculation of 〈Ps,L〉 at the sites s1,…,s5, demonstrating that each of them holds appreciable curvature, with magnitude in the range 0.02 rad–0.10 rad. Fig. 2a reveals that the characteristic pattern of variation of 〈COSs,L〉 closely resembles the s vs 〈Ps,L〉 one, with five local peaks at the curvilinear positions s1,…,s5 for L = 34 nm/100 bp. This seems at first sight to contradict equation (5), which states that both the intrinsic curvature and the flexibility affect such a conformational average.
However, it should be observed that our theoretical predictions are entirely based on the WLC model, which assumes a sequence-independent flexibility. Therefore, the theoretical s vs 〈COSs,L〉 pattern reflects local variations of the intrinsic curvature, whereas the flexibility produces a bias of the absolute values. Thus, the local intrinsic curvature ultimately drives the modulation of the 〈COSs,L〉 pattern. In Fig. 3 we explore further how the WLC flexibility contributes to the predicted patterns of variation. In Fig. 3a we consider the s vs 〈COSs,L〉 plots and contrast the reference pattern for an ideally-rigid DNA with the case of a flexible profile. We intentionally exaggerate the comparison by introducing remarkably different contour lengths, i.e. L = 6.8 nm/20 bp and L = 51 nm/150 bp. For L = 6.8 nm/20 bp the pattern is dominated by variations of the local intrinsic curvature, and the flexibility determines merely an overall, small vertical shift of the pattern of about 0.13. This is predicted by theory (see Supplementary equation (S.7) and the Supplementary Note) and corresponds to the r.h.s. term of equation (5) with c* ≈ 0. For L = 51 nm/150 bp the role of flexibility is more prominent and twofold: it determines a vertical shift of the s vs 〈COSs,L〉 pattern with respect to the case of an ideally-rigid chain and it also reduces the amplitude of the pattern modulation. As explained above, the flexibility affects the patterns uniformly along the curvilinear coordinate s because we assumed a sequence-independent flexibility (curvature variance L/ξ, with ξ = 52 nm persistence length) according to the WLC model.
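The order of magnitude of the vertical shift can be verified with a short calculation. The sketch below does not reproduce equation (5) or Supplementary equation (S.7); it assumes that the curvatures of the two non-overlapping segments fluctuate independently and normally around their intrinsic values with variance L/ξ, in which case the average of the cosine product is the rigid-chain value multiplied by exp(−L/ξ). Function names are hypothetical.

```python
import math
import random

XI = 52.0  # persistence length in nm, as quoted in the text

def mean_cos_gauss(c_a0, c_b0, L, xi=XI):
    """<cos C_a * cos C_b> for independent Gaussian curvatures of variance L/xi."""
    return math.cos(c_a0) * math.cos(c_b0) * math.exp(-L / xi)

def cos_shift(L, xi=XI):
    """Rigid minus flexible value of <COS> for two intrinsically straight segments."""
    return 1.0 - math.exp(-L / xi)

def mean_cos_mc(c_a0, c_b0, L, xi=XI, n=100_000, seed=0):
    """Monte Carlo check of the Gaussian formula (same assumptions)."""
    rng = random.Random(seed)
    sigma = math.sqrt(L / xi)
    total = 0.0
    for _ in range(n):
        total += math.cos(c_a0 + rng.gauss(0.0, sigma)) * math.cos(c_b0 + rng.gauss(0.0, sigma))
    return total / n

shift_short = cos_shift(6.8)    # L = 6.8 nm / 20 bp
damping_long = math.exp(-51.0 / XI)  # amplitude factor for L = 51 nm / 150 bp
```

Under these assumptions the shift for L = 6.8 nm is slightly above 0.12, close to the value of about 0.13 quoted above (which equals L/ξ to first order), while for L = 51 nm the modulation amplitude is reduced to less than half of its rigid-chain value, in line with the twofold role of flexibility described in the text.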

Figure 2

Simulated conformational analysis of a bent, flexible 1332 bp duplex from the promoter region of the OPN coding gene: descriptors 〈Ps,L〉 and 〈COSs,L〉, and intrinsic curvature profile.

(a) Patterns of variation of 〈Ps,L〉 and 〈COSs,L〉. Five local peaks are highlighted by circles. (b) Intrinsic curvature profile. The 34 nm-long tracts involved in the calculation of conformational averages at the sites s1,…,s5 are reported for completeness.

Figure 3

Simulated conformational analysis for the 1332 bp duplex from the promoter region of the OPN coding gene: descriptors 〈COSs,L〉, 〈SSs,L〉.

(a) Patterns of variation of 〈COSs,L〉. Results for an ideally-rigid chain are compared with WLC predictions to evaluate the role of flexibility at two different L values, i.e. 6.8 nm/20 bp (top panel) and 51 nm/150 bp (bottom panel). (b) Patterns of variation of 〈SSs,L〉. Results for an ideally-rigid chain are contrasted with WLC predictions to show the flexibility contribution at two L values, i.e. 6.8 nm/20 bp (top panel) and 34 nm/100 bp (bottom panel). As a general remark, the WLC data shown as grey dots are obtained by direct ensemble averaging over simulated 2D chain trajectories, whereas the red dashed WLC line is obtained by approximating the ensemble average with Supplementary equation (S.10).

In Fig. 3b we consider the s vs 〈SSs,L〉 case. The patterns for an ideally rigid chain are well structured. In particular, for L = 34 nm/100 bp one recognizes three peaks at the curvilinear coordinates s1, s4 and s5 that we already introduced in Fig. 2a for the s vs 〈Ps,L〉 and 〈COSs,L〉 plots. Moreover, the WLC flexibility determines an overall vertical shift of the pattern. This agrees with the general predictions of equation (6). Thus, by analysing the descriptor 〈SSs,L〉 one can easily decouple, for any length L, the intrinsic curvature (which tunes the modulation of 〈SSs,L〉 along the curvilinear coordinate s) from the flexibility (which determines a shift of the whole pattern). We underline that in the present case L = 6.8 nm/20 bp is not strictly appropriate for evaluating the experimental patterns of variation, as these might be partially affected by AFM tip convolution effects. Therefore, data analysis is carried out below only for L = 17 nm/50 bp and L = 34 nm/100 bp.

Experimental patterns of variation for the human OPN coding gene

In Fig. 4 we report the results from experiments on OPN-related DNA (see Methods section for details on sample preparation, AFM imaging and DNA tracing). A quantitative AFM analysis of molecular profiles was routinely performed to test the reproducibility of the imaging conditions and to evaluate deviations of the adsorbed DNA superstructure from the canonical B-form. Typically, measured DNA molecules displayed an average width of ≈10 nm and a height of 0.8 nm–1.0 nm, due respectively to AFM probe convolution effects and to the elastic deformation of the soft molecule under the repulsive forces exerted by the scanning tip29. The molecule surface density was in the range 2–5 μm⁻². The analysis of the contour lengths for a large number of traced molecules (≈400) attested a DNA contraction of 5% with respect to the B-form. This corresponds to a helix rise per base pair of 0.32 nm, in excellent agreement with the results of similar studies2,9,11,22,30,31. Standard checks on global statistical parameters (e.g. the mean-squared end-to-end distance) proved the thermodynamic equilibration of chains on mica and allowed us to estimate ξ = 52 nm, a figure that agrees with the DNA flexibility reported in other AFM experiments2,22,23. In Fig. 4a we show a representative topography of the target DNA. As expected, it reveals the large variety of shapes assumed by DNA under the thermal stochastic perturbation of its molecular environment. By visual inspection one can notice the persistence of bends at a few sites, namely in close proximity to both ends and within the central region of the chain. This suggests the presence of non-null intrinsic curvatures at the same places.
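As an illustration of how a persistence length can be extracted from the mean-squared end-to-end distance, the sketch below inverts the standard expression for a worm-like chain equilibrated in two dimensions, 〈R²〉 = 4ξL[1 − (2ξ/L)(1 − exp(−L/2ξ))]. The text does not state which expression was actually used, so the choice of formula, the bisection bounds and the function names are assumptions; the contour length follows from 1332 bp at 0.32 nm per base pair.

```python
import math

def msd_2d(L, xi):
    """Mean-squared end-to-end distance of a 2D-equilibrated worm-like chain."""
    return 4.0 * xi * L * (1.0 - (2.0 * xi / L) * (1.0 - math.exp(-L / (2.0 * xi))))

def fit_xi(r2, L, lo=1.0, hi=1000.0):
    """Persistence length (nm) reproducing a measured <R^2>, found by bisection.

    msd_2d increases monotonically with xi at fixed L, so bisection is sufficient.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if msd_2d(L, mid) < r2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

contour_length = 1332 * 0.32          # nm, from the measured helix rise
r2_reference = msd_2d(contour_length, 52.0)
xi_recovered = fit_xi(r2_reference, contour_length)
```

Here the reference value of 〈R²〉 is generated from ξ = 52 nm only to show that the inversion is self-consistent; with experimental data, `r2_reference` would be replaced by the average of the squared end-to-end distances of the traced molecules.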

Figure 4

Experimental conformational analysis of OPN-related DNA.

(a) Representative AFM topography of the 1332 bp target DNA. It shows the persistence of bends at a few locations along the molecular backbone, marked by arrows, suggesting the presence of a significant intrinsic curvature at the same places. (b) Pattern of variation of 〈Ps,L〉 compared with simulations (top panel). The related s vs RSS graph reveals that substantial agreement exists along the curvilinear axis, with a residual peak below 50 nm (bottom panel). (c) Pattern of variation of 〈COSs,L〉 compared with simulations (top panel). The related s vs RSS graph reveals discrepancies at three distinct regions of the curvilinear axis (bottom panel). (d) Pattern of 〈SSs,L〉 compared with simulations (top panel). The s vs RSS graph is very similar to that reported in (c) for the 〈COSs,L〉 descriptor and suggests that the failure of sequence-independent flexibility very likely increases residual errors for s > 50 nm.

In Fig. 4b we show the experimental s vs 〈Ps,L〉 pattern with L = 34 nm/100 bp, calculated for an ensemble of 160 molecular profiles extracted from several AFM topographies. It displays oscillations that confirm the presence of intrinsic curvatures along the studied contours. We recognize four peaks for s < 80 nm, which concern pairs of segments close to the DNA contour ends, and an additional negative peak for s ≈ 150 nm–175 nm, which on the contrary involves pairs of tracts located around the middle portion of the strands. In the range s ≈ 80 nm–130 nm the s vs 〈Ps,L〉 curve is almost completely flat and 〈Ps,L〉 ≈ 0, which means that at least one of the two symmetrically placed segments has a negligible intrinsic curvature. Notably, the curvilinear positions of the main peaks of 〈Ps,L〉 agree with the visual inspection of DNA bends from several AFM topographies (see for example Fig. 4a).

The comparison of the experimental s vs 〈Ps,L〉 pattern with the theoretical one of Fig. 2a can be effectively carried out by comparing the curvilinear positions and amplitudes of the main experimental peaks with the theoretical peaks located at s1,…, s5. In detail, we recognize peak 1 also in the experimental data, biased by a small horizontal shift δs1 of about 7 nm. Moreover, an 8 nm horizontal shift δs5 affects peak 5, whereas the shifts for the remaining peaks are negligible compared to the positional errors (<5 nm) affecting the molecular trajectories extracted from the tip-convoluted AFM images. An appreciable difference exists between the magnitude of the experimental peaks and the theoretical ones. The protocol adopted for sample preparation certainly contributes to such discrepancies. In particular, the horizontal shift δs1 might be ascribed to a structural reorganization of adsorbed DNA at one or both ends, involving local variations of the helix rise, nanosized deletions or out-of-equilibrium alterations that are not properly resolved by AFM imaging and that can be induced by sample drying24,25,27. On the other hand, the reduced magnitude of the peaks in the experimental pattern with respect to the theoretical counterpart (mostly at s1 and s5) may be attributed to the rinsing of the samples with pure water after DNA adsorption on mica. This step reduces the ionic strength of the solution and consequently enhances the electrostatic repulsion of charged phosphate groups. A net decrease of the absolute curvature of the adsorbed molecules is therefore highly probable25,27. Zuccheri et al.4 and Scipioni et al.25 showed that the theoretical predictions of the wedge model of De Santis et al.28 are in quantitative agreement with experiments provided that the theoretical curvature modulus is rescaled by the empirical numerical factor ≈2.5 (see Fig. 3 in ref. 4 and Fig. 7 in ref. 25 for L = 26 bp and 62 bp respectively).
We exploited this evidence to improve the agreement between theoretical and experimental patterns of the SCDs. We rescaled Cs,L of equations (4)–(6) by the empirical factor ≈2.0 before calculating the ensemble averages. Fig. 4b attests that in doing so we substantially improved the agreement between theory and experiment for the s vs 〈Ps,L〉 pattern. In fact, the experimental amplitudes of peaks 1 and 5 were correctly predicted after this additional normalization step. The error between the normalized wedge model and the experimental data was quantified by introducing the Residual Sum of Squares (RSS). The s vs RSS plot (Fig. 4b bottom panel) shows that the main discrepancy between experimental data and the normalized wedge model is ≤0.04 and concerns a nanosized region with s ≈ 14 nm–37 nm, whereas the remaining part of the curvilinear axis is affected by comparatively smaller errors. In view of these results, we recognize that our analysis, based on the wedge model of De Santis et al.28 with the additional amplitude normalization step, describes the relevant features of the s vs 〈Ps,L〉 plot accurately.
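The effect of the amplitude normalization on the residuals can be illustrated as follows. The sketch rests on two assumptions that are not spelled out in the text: that the residual plotted against s is the squared difference between experiment and model at each position, and that for a rigid chain the 〈P〉 pattern, being a product of two curvatures, scales with the square of the rescaling factor. All numerical values are hypothetical.

```python
def rescale_p_pattern(p_theory, k):
    """Rigid-chain <P> pattern after multiplying every curvature by k (P ~ C * C)."""
    return [k * k * p for p in p_theory]

def squared_residuals(experiment, theory):
    """Squared residual at each curvilinear position s (assumed RSS definition)."""
    return [(e - t) ** 2 for e, t in zip(experiment, theory)]

# Hypothetical patterns sampled at four positions s (values in rad^2).
p_theory = [-0.02, 0.01, -0.01, 0.0]
p_experiment = [-0.07, 0.05, -0.04, 0.0]

p_normalized = rescale_p_pattern(p_theory, 2.0)
rss_before = sum(squared_residuals(p_experiment, p_theory))
rss_after = sum(squared_residuals(p_experiment, p_normalized))
```

With these illustrative numbers the total residual drops by more than an order of magnitude after normalization, which mirrors qualitatively the improvement reported for the experimental 〈Ps,L〉 pattern.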

The SCDs considered next probe both DNA intrinsic curvature and flexibility. The experimental s vs 〈COSs,L〉 pattern is shown in Fig. 4c for L = 34 nm/100 bp. It displays oscillations indicating that intrinsic curvature and/or varying flexibility play a relevant role along the studied molecular contours. Comparison with the normalized model gives excellent agreement between the amplitudes of the main peaks and their relative positions (δs1 ≈ δs4 ≈ δs5 ≈ 7 nm, δs2 ≈ δs3 ≤ 5 nm). A more careful examination of the related s vs RSS plot reveals that residuals are localized within three distinct regions of the curvilinear axis. This contrasts with the s vs RSS curve of the 〈Ps,L〉 pattern and led us to ascribe different origins to the three regions. The first region (s ≈ 14 nm–37 nm) also occurs for the 〈Ps,L〉 pattern, thus it reflects an inherent inability of the wedge model to predict the actual intrinsic curvature of the probed chains close to their ends. On the contrary, the two other regions (s ≈ 75 nm–100 nm and s ≈ 140 nm–160 nm) are peculiar to the 〈COSs,L〉 pattern and likely reflect the local failure of the second main assumption of our theoretical analysis, i.e. the sequence-independent WLC flexibility. In other words, one has to assume that the nucleotide sequence of those two regions affects the persistence length ξ and causes appreciable fluctuations with respect to the mean value (52 nm). Similar effects have been reported for the well-known model system pBR322 DNA in AFM experiments on palindromic dimers4.

The experimental s vs 〈SSs,L〉 pattern is compared with its theoretical counterpart for L = 34 nm/100 bp in Fig. 4d. The curvilinear positions and magnitudes of peaks 1, 4 and 5 are again useful to drive the comparison. We find an excellent correspondence of the curvilinear positions of the experimental and theoretical peaks at s1, s4 and s5 (δs1 ≈ 7 nm, δs4 ≈ δs5 ≤ 5 nm) and we observe full overlap of the patterns in the s ranges 0–18 nm and 145 nm–180 nm. However, the two graphs have different trends and amplitudes on the remaining part of the curvilinear axis. This is evident when considering that the experimental maximum at ~122 nm and minima at ~100 nm and 140 nm do not have a clear correspondence in the theoretical curve. Importantly, the s vs RSS curve closely resembles the corresponding graph for the 〈COSs,L〉 descriptor, with the residuals localized within the same three regions of the curvilinear axis. Again, the two regions for s > 50 nm very likely indicate that the assumption of constant WLC flexibility breaks down at those places.

To investigate the sequence-dependent flexibility of the OPN-related DNA we focused on the s vs 〈SSs,L〉 pattern and we used the following equation:

The quantity defined by equation (7) is the average curvature variance for a pair of tracts placed symmetrically along the studied contours. Equation (7) shows that this variance can be obtained by subtracting from the experimental average 〈SSs,L〉 the corresponding theoretical quantity calculated for an ideally-rigid molecular profile, 〈SSs,L〉Rigid. The latter corresponds to the profiles of Fig. 3b (solid lines) with the additional amplitude renormalization of intrinsic curvature by the factor 2.0 (see above). The results of this analysis are reported in Fig. 5a for L = 34 nm/100 bp and L = 17 nm/50 bp. We note that in both cases the variance fluctuates around a constant value that depends on L and corresponds to the WLC flexibility term L/ξ, i.e. ~0.6 for L = 34 nm/100 bp and ~0.33 for L = 17 nm/50 bp respectively. The fluctuations are in the range of 5%–20% of the WLC term and indicate that variations of the local flexibility take place along the molecular backbone. To prove that the fluctuations are a robust manifestation of the chain flexibility, rather than the result of trivial statistical discrepancies between the theoretical and experimental patterns 〈SSs,L〉Rigid and 〈SSs,L〉, we reported in the same graph the profile of the cumulative local frequency of AA, TT, AT and TA dinucleotide steps. The comparison shows that the modulation of the variance follows very closely the variations in the cumulative content of dinucleotide steps, i.e. the local chain flexibility is modulated by the local content of the AA, AT, TA and TT dinucleotide steps. This finding agrees with the results originally reported for palindromic constructs, in that it demonstrates that AT-rich regions are more flexible than GC-rich sequences4. Scipioni et al.25 have explained such a correlation from the point of view of the thermodynamic stability of the DNA chain, i.e. they have connected the sequence-dependent flexibility of a DNA tract to the dinucleotide melting temperatures. Fig. 5b shows the comparison of the experimental pattern of variation of the curvature variance and the theoretical one, calculated as in25 for L = 34 nm/100 bp. We recognize several similarities in the trends of the two graphs, with local maxima/minima at corresponding curvilinear positions. Notably, the theory predicts an overall fluctuation of ~0.15 rad² whereas the experimental differential flexibility estimated by our analysis fluctuates by ~0.80 rad². An enhancement by a factor ~3 of the experimental differential flexibility with respect to the theoretical predictions has been reported by others4,25, therefore we are in line with previous reports.
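A profile of the AT-type dinucleotide content for symmetrically placed tracts can be computed along the following lines. This is a sketch under stated assumptions: the frequency is taken as the fraction of AA, TT, AT and TA steps within a window of fixed length in base pairs, the two symmetric windows are simply averaged, and the sequence used here is a hypothetical toy string rather than the OPN promoter.

```python
AT_STEPS = {"AA", "TT", "AT", "TA"}

def at_step_fraction(seq, start, length):
    """Fraction of AA/TT/AT/TA dinucleotide steps in seq[start:start + length]."""
    window = seq[start:start + length]
    steps = [window[i:i + 2] for i in range(len(window) - 1)]
    return sum(step in AT_STEPS for step in steps) / len(steps)

def symmetric_at_profile(seq, length, stride=1):
    """[(offset, mean fraction)] for window pairs placed symmetrically from both ends."""
    n = len(seq)
    profile = []
    for s in range(0, n // 2 - length + 1, stride):
        left = at_step_fraction(seq, s, length)
        right = at_step_fraction(seq, n - s - length, length)
        profile.append((s, 0.5 * (left + right)))
    return profile

# Hypothetical 80 bp sequence: AT-rich head, GC-rich body and tail.
toy_sequence = "AATT" * 5 + "GCGC" * 10 + "GGCC" * 5
profile = symmetric_at_profile(toy_sequence, 10)
```

For the toy sequence the first entry of `profile` is 0.5, because the window at the AT-rich end contains only AT-type steps and its symmetric partner contains none, while windows deep inside the GC-rich body give 0. Offsets are expressed in base pairs and can be converted to the curvilinear distance s with the measured helix rise.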

Figure 5

Evaluation of differential flexibility of OPN-related DNA.

(a) Curvature variance estimated from experimental data and theory using equation (7). The variance fluctuates around a constant value that depends on L and corresponds to the WLC flexibility term L/ξ. The fluctuations are a robust manifestation of differential flexibility since they correlate with the profile of the cumulative local frequency of AA, TT, AT and TA dinucleotide steps. (b) Comparison of the experimental pattern of variation of the curvature variance and the theoretical one, calculated as in25 for L = 34 nm/100 bp. Local maxima/minima at corresponding curvilinear positions are highlighted by circles. An enhancement of the experimental differential flexibility with respect to the theoretical predictions emerges and agrees with previous reports.

Discussion

The reported results demonstrate that the oscillations of the s vs 〈Ps,L〉, s vs 〈COSs,L〉 and s vs 〈SSs,L〉 curves can be used to locate the most significant bending sites of the DNA backbone. Moreover, the vertical shifts of the s vs 〈COSs,L〉 pattern (for short L values) and of the s vs 〈SSs,L〉 pattern (for any L value) can be exploited to investigate flexibility effects. We underline that informative patterns are obtained for DNA templates that do not contain extended strings of phased A-tracts, or other prominent nucleotide sequences responsible for large stereo-specific bends. Apart from the system discussed above, we already demonstrated that random DNA (~25% content of A, T, G and C respectively) is still characterized by an appreciable (nonzero) intrinsic curvature over 20 bp–100 bp long fragments and provides relevant signals in terms of the SCD patterns24. The same conclusions hold for the 937 bp EcoRV-PstI fragment of pBR322 DNA1,3,4,25,27, a well-known template discussed in the Supplementary Note and Supplementary Fig. S2. This confirms that the applicability of the proposed method goes beyond a few specific cases and extends to 10²–10³ bp long fragments that can be readily prepared and imaged at the nanoscale by various microscopy techniques. Figs. 4 and 5 give evidence of the rich information achieved by complementing the 〈Ps,L〉 and 〈SSs,L〉 patterns. It is clear that the characterization of the differential flexibility in terms of s vs curvature variance patterns (equation (7)) ultimately relies on the use of models to predict the sequence-dependent intrinsic curvature and to simulate DNA adsorption on the solid substrate. As shown above, one can preliminarily estimate the degree of accuracy of a given model by comparing the experimental 〈Ps,L〉 pattern with the theoretical one. In the present work and in24 we have demonstrated that static dinucleotide wedge models are in good agreement with the experimental results. An improved analysis will certainly benefit from future developments of the models dealing with DNA sequence-dependent curvature and surface adsorption.

We finally note that there is a loss of information in the study of SCDs with respect to the case of the s vs 〈Cs,L〉 (or s vs curvature variance) data highlighted in previous works3,4,5,25,27,32. This loss arises from coupling pairs of DNA tracts in the definition of Pj,m, COSj,m and SSj,m. Nevertheless, this choice provides a number of advantages, overcoming some fundamental and practical limitations of earlier protocols. First, the novel method can be implemented on label-free molecules, therefore specimen preparation is reduced to standard protocols for DNA deposition onto atomically smooth substrates. Second, the patterns of variation of SCDs lend themselves to an effective comparison with theoretical models (used to predict the r.h.s. of equations (4), (5) and (6)), which give access to the physics of DNA sequence-dependent curvature and flexibility.

We foresee several challenging applications for the use of SCDs. One interesting possibility might be the systematic use of 〈Ps,L〉 and 〈SSs,L〉 maps to explore in detail the predictions of DNA adsorption and bending models. An insight into this topic is provided in the present work, and significant improvements are expected to come from state-of-the-art modelling (such as Brownian dynamics and molecular dynamics simulations) going beyond the nearest-neighbour approximation in conformational analysis or describing the non-equilibrium processes of DNA adsorption and relaxation on the atomically flat substrate23,33,34,35,36,37,38. For example, a tight comparison of experimental and theoretical patterns might allow us to identify nanosized regions where out-of-equilibrium alterations of the chain architecture systematically take place during adsorption. This information might eventually be related to the local base-pair sequence and/or exploited to tune DNA adsorption according to the needs of novel comparative assays. Another challenge might involve the use of 〈Ps,L〉 and 〈SSs,L〉 patterns to automatically detect small conformational changes in large numbers of samples. The capability of relating DNA structural variations to physical or biological causes (e.g. mutations at one or more base pairs) might eventually contribute to the development of new assays and even genetic screening protocols for highly-bent duplexes. Interestingly, some studies might explore the ultimate sensitivity of such patterns to point mutations and mismatched base pairs and contribute substantially to the discovery of physical methodologies for molecular haplotyping18,39. Within this context, we have already offered a concrete example of the sensitivity of the 〈Ps,L〉 patterns to single nucleotide polymorphisms in the OPN encoding gene24. 
Further attractive developments might come from the evaluation of 〈Ps,L〉 and 〈SSs,L〉 patterns to address the structural properties of DNA fragments complexed with intercalating dyes and binding drugs13,40 or even proteins. In fact, the patterns of variation might be useful to complement current microscopy studies on the formation of protein-DNA complexes (e.g.41), where the position distribution of protein binding along unlabelled DNA fragments is calculated relative to the closest DNA terminus. Indeed, this choice statistically couples binding events occurring on symmetrically placed tracts, in analogy with the coupling of curvatures contained in the SCDs definition. As a result, a visual correlation of s vs 〈Ps,L〉 (or 〈SSs,L〉) and s vs protein-binding-frequency plots would easily point out the existence of helix sites where local intrinsic curvature (or flexibility) drives the so-called ‘indirect’ DNA recognition or competes with other binding mechanisms25,42. This is of great interest for fundamental investigations addressing the ability of proteins to locate specific sites or structures among a vast excess of non-specific, intrinsically bent DNA, as in the relevant case of mismatch repair proteins interrogating DNA to detect biosynthetic errors and promote strand-specific repair12. One might also apply SCDs to support investigations on the effect of DNA-intercalating agents on DNA intrinsic curvature43,44 and on the enhancement of DNA flexibility by sequence non-specific DNA-binding proteins45, and to complement DNA investigations recently addressed by high-speed, real-time or hybrid AFM techniques46,47,48.

In conclusion, we presented a novel method to characterize the intrinsic curvature and flexibility of adsorbed DNA molecules, starting from the topographies obtained by high-resolution microscopy. The method relies on mapping along the molecular backbone a selection of symmetric curvature descriptors, i.e. stochastic functions that couple pairs of segments symmetrically placed along the helix chain in such a way that their realizations do not depend on chain polarity. We demonstrated the practical applicability of this approach for the relevant case of AFM-imaged DNA, through theoretical and experimental arguments. Our strategy provides a number of advantages, overcoming some fundamental and practical limitations of early protocols, e.g. there is no need to prepare end-labelled molecules or palindromes and no assumptions are made on the preferential DNA adsorption mechanisms. More importantly, our approach can readily manage comparative assays involving a large number of samples. We suggested examples where curvature studies based on SCDs might complement several existing investigations.

Methods

Modeling DNA intrinsic curvature, adsorption & room-temperature bending

Model chains representing the average three-dimensional (3D) shape of DNA specimens were generated by the 3DNA software49 exploiting nearest-neighbor, static dinucleotide wedge models28 (see Supplementary Methods and Supplementary Fig. S3,S4).

We developed a custom algorithm (LabView, National Instruments) that flattens the 3D model chain to simulate deposition24. Briefly, it divides the chain into a discrete number of fragments originally lying on different planes and projects them individually. The output is a two-dimensional (2D) chain formed by the geometric projections connected at their ends according to local continuity criteria. This procedure assumes that the 3D → 2D transformation takes place at the expense of a few local twists of the molecular backbone; as a consequence, it reasonably implies a minimum increase of the conformational energy of the flattened molecule with respect to the 3D counterpart.

The algorithm was implemented as follows. Geometric projection starts at one of the 3D chain ends and involves the longest fragment that can be projected onto a best-fit plane while maintaining its overall fluctuations (relative to that plane) below a given threshold. Once such a fragment is found, the algorithm is iterated on the remaining part of the 3D chain until the whole curve is flattened onto a unique set of preferential planes. The threshold value is chosen to match the typical range of chain-surface interaction forces, i.e. a few nanometres. The results of the above algorithm for the target DNA were found to be consistent with those obtained by a different theoretical approach, originally proposed by Scipioni et al.25.
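The fragment-search step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LabView implementation: the function names are hypothetical, the "overall fluctuations" are taken to be the RMS out-of-plane distance from a least-squares plane, and the search is greedy (each fragment is extended until the threshold is first exceeded). The projection of each fragment and the joining of the projections by continuity criteria are not shown.

```python
import numpy as np

def plane_fit_rms(points):
    """RMS distance of the points from their least-squares plane
    (assumed measure of the out-of-plane fluctuations)."""
    centered = points - points.mean(axis=0)
    # The smallest singular value measures the spread along the plane normal.
    s = np.linalg.svd(centered, compute_uv=False)
    return s[-1] / np.sqrt(len(points))

def greedy_planar_fragments(chain, threshold):
    """Split a 3D chain (N x 3 array) into consecutive fragments, each
    extended point by point while its out-of-plane RMS stays below
    `threshold`. Returns (start, stop) index pairs, stop exclusive."""
    fragments = []
    start, n = 0, len(chain)
    while start < n:
        stop = min(start + 3, n)  # three points always lie on a plane
        while stop < n and plane_fit_rms(chain[start:stop + 1]) < threshold:
            stop += 1
        fragments.append((start, stop))
        start = stop
    return fragments
```

For a chain made of two planar arcs lying on different planes, the function is expected to return one index range per arc; the threshold plays the role of the few-nanometre interaction range mentioned in the text.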

The 2D chain formed by geometric projection was used to simulate the room-temperature bending of DNA, describing the lateral motion of the chain on the mica surface. According to the WLC model, DNA can be modelled by a chain of virtual bonds of length lWLC connected by torsional-spring vertices, which are energetically uncorrelated and characterized by a harmonic local bending-energy function (with kB the Boltzmann constant, T the absolute temperature, ξ the persistence length, and thermally-induced angular fluctuations occurring around the constant sequence-dependent angles)2. We sampled the 2D chain at the spacing lWLC = 0.32 nm (corresponding to the experimentally found helix rise per base pair, see Results section), and thermal effects on bending were implemented by adding to the angles between neighbouring segments a fluctuation chosen by a Monte Carlo (MC) method from normally distributed numbers with zero mean and variance lWLC/ξ. The new trajectories were superimposed on a flat substrate with random roughness (0.1 nm; grid spacing 1.95 nm) and dilated by a parabolic tip50 in order to generate topographies resembling as closely as possible those obtained by AFM. These were finally analysed with the tracing algorithm (see below) to ensure a bias (due to random and systematic angular distortions) comparable to that affecting experimental data.
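The Monte Carlo step can be illustrated with the following minimal Python sketch. It is not the original implementation: the function name is hypothetical, the first bond is arbitrarily placed along the x axis, and positive angles are taken as counterclockwise. Drawing the fluctuations from a normal distribution with variance lWLC/ξ corresponds to Boltzmann sampling of a harmonic bending energy (kB T ξ / 2 lWLC) φ² per vertex, where φ is used here only as a placeholder symbol for the angular fluctuation.

```python
import numpy as np

def thermalize_chain(static_angles, bond_length, persistence_length, rng):
    """Add Gaussian bending fluctuations to the static angles of a 2D chain
    and rebuild its xy trajectory.

    static_angles: signed angles (radians) between consecutive bonds,
    counterclockwise positive in this sketch.
    Returns an array of shape (len(static_angles) + 2, 2)."""
    sigma = np.sqrt(bond_length / persistence_length)  # variance = l / xi
    angles = static_angles + rng.normal(0.0, sigma, size=len(static_angles))
    # Bond headings: the first bond lies along x, each angle turns the next.
    headings = np.concatenate(([0.0], np.cumsum(angles)))
    steps = bond_length * np.column_stack((np.cos(headings), np.sin(headings)))
    return np.vstack(([0.0, 0.0], np.cumsum(steps, axis=0)))

# Example: an intrinsically straight chain of 1332 bonds of 0.32 nm.
# The 50 nm persistence length is an assumed, illustrative value.
rng = np.random.default_rng(0)
xy = thermalize_chain(np.zeros(1331), 0.32, 50.0, rng)
```

Repeated calls with the same static angles generate an ensemble of thermally bent trajectories sharing the same intrinsic shape.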

We used the procedure above to predict the theoretical patterns of variations reported in Fig. 2,3 and Supplementary Fig. S2.

Sample preparation

The 1332 bp DNA fragments were obtained by PCR amplification of the regulatory region of the OPN encoding gene51; amplicons were purified in 1% (w/w) agarose gel and electroeluted, then the solution was treated with phenol/chloroform followed by ethanol precipitation. The pellet was stabilized in Tris-EDTA buffer and stored at −20°C. Importantly, this template does not contain extended strings of phased A-tracts or other prominent sequences (e.g. periodic An/Tn groups) that could introduce anomalously large bends in the adhered DNA molecules and bias our proof-of-concept investigation (see Supplementary Methods and Supplementary Fig. S3 for the base-pair sequence)3,4,5,9. DNA adsorption was carried out onto freshly cleaved muscovite mica according to the standard protocols reported in the literature22.

AFM imaging & analysis

Samples were imaged in air at room temperature and humidity with a Dimension 3100 AFM equipped with the closed-loop Hybrid XYZ scanner and the Nanoscope IVa control unit (Digital Instruments, Veeco). The AFM was operated in tapping mode and silicon probes (OMCL-AC160TS, Olympus) were used. The AFM images were collected with a dimension of 1024 × 1024 pixels and a typical scan size of 2 μm.

Our image-analysis software allowed a semi-automatic reconstruction of molecular trajectories and a straightforward analysis of the signed curvature associated with segments of given location and length. The tracing algorithm was developed in LabView and molecules were represented as chains of xy pairs separated by a contour length l = 2 nm. Briefly, the AFM images were processed by a first-order, line-by-line flattening followed by filtering with a 3 × 3 median pixel filter30,50. Chain tracing was initiated at a set of user-defined trial points, which were located along the molecular backbone by hand and linearly interpolated to obtain a constant spacing. The coordinates of each point were then automatically adjusted as explained in23. After this transformation the trajectory consisted of a series of points with non-integer coordinates on the digitized grid. The final processing step consisted of interpolating the trajectory with a cubic polynomial curve, with a squared error below a user-defined threshold (typically 0.1 nm). Positive values of the signed bending angles θi were arbitrarily assigned to clockwise rotations, i.e. to the case in which the chain turns to the right at θi when progressing along the trajectory. The signed curvatures Cj,m were estimated from equation (1), whereas the average quantities 〈Pj,m〉, 〈COSj,m〉, 〈SSj,m〉 were evaluated respectively from the conformational average of Cj,m·CN−1−(j+m),m, cos Cj,m·cos CN−1−(j+m),m and the corresponding expression for SSj,m over a given set of AFM-imaged molecular profiles.
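The last steps of this analysis can be sketched in Python as follows. This is a minimal illustration, not the LabView code: function names are hypothetical, the y axis is assumed to point upwards (so that a right turn is clockwise), and Cj,m is assumed to be the sum of the m bending angles that follow the first j angles, a form chosen because it matches the mirror index N − 1 − (j + m) used in the text. Only the single-molecule values of Pj,m and COSj,m are computed; SSj,m is omitted.

```python
import numpy as np

def signed_bending_angles(xy):
    """Signed angles (radians) between consecutive bonds of a traced chain.
    Clockwise (right-hand) turns are positive, assuming the y axis is up."""
    t = np.diff(xy, axis=0)  # bond (tangent) vectors
    cross = t[:-1, 0] * t[1:, 1] - t[:-1, 1] * t[1:, 0]
    dot = np.einsum("ij,ij->i", t[:-1], t[1:])
    return -np.arctan2(cross, dot)  # minus sign: clockwise > 0

def curvature(theta, j, m):
    """C_{j,m} as the sum of the m angles following the first j angles
    (assumed form of equation (1))."""
    return theta[j:j + m].sum()

def symmetric_descriptors(theta, j, m):
    """Single-molecule values of P_{j,m} and COS_{j,m}: the segment is
    paired with its mirror image with respect to the chain centre."""
    c = curvature(theta, j, m)
    c_mirror = curvature(theta, len(theta) - (j + m), m)
    return c * c_mirror, np.cos(c) * np.cos(c_mirror)
```

Reading a trajectory from the opposite end reverses the order of the bending angles and changes their sign, so both returned values are unchanged; this is the polarity independence that makes end labelling unnecessary. Conformational averages follow by averaging the returned values over all traced molecules, e.g. with `np.mean`.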

We preliminarily tested the tracing algorithm by analysis of intrinsically bent 2D chains that were computer-generated using WLC statistics (see Supplementary Methods and Supplementary Fig. S5).