Anxiety-related experience-dependent white matter structural differences in adolescence: A monozygotic twin difference approach

Anxiety is linked to deficits in structural and functional connectivity between limbic structures and pre-frontal cortices. We employed a monozygotic (MZ) twin difference design to examine the relationship between structural characteristics of the uncinate fasciculus (UF) measured by Diffusion Tensor Imaging (DTI) and anxiety symptoms in a sample of N = 100 monozygotic (genetically identical), adolescent twins. The MZ difference design allowed us focus on environmental factors that vary within twin pairs while controlling for genetic and environmental factors shared by twin pairs. Twins aged 13–18 years reported on symptoms of generalized anxiety and social phobia prior to participating in a neuroimaging visit. Regions of interest from the JHU ICBM atlas, including uncinate fasciculus and sagittal stratum as a control tract, were registered to the study template. We incorporated multiple diffusion tensor measures to characterize the white matter differences. Within twin pairs, the more anxious twin exhibited decreased fractional anisotropy (t = −2.22, p = 0.032) and axial diffusivity (t = −2.38, p = 0.022) in the left UF compared to the less anxious twin, controlling for age and gender. This study demonstrated the feasibility and advantages of adopting the MZ twin design for DTI measures in neuroimaging research.


Introduction
We start with one of the simplest examples and build towards regression all using linear models.

Single group
Let's look at how mean for a measurement for a single group is estimated from a linear model point of view.

The solution (empirical)
Since the system is overdetermined (more equations than unknowns) we can pick a "democratic" solution by just summing the left hand side and the right hand side of the equations as follows y 1 + y 2 + · · · + y n = nβ 0 + ε 1 + ε 2 + · · · + ε n .

The solution (empirical) without pairing
Again since the system is overdetermined (more equations than unknowns) we need a "democratic" solution again so we just use all pairs of points (n 1 × n 2 ). Below we show a couple of pairs so we can see a pattern.
A key observation is that unless the ys from both groups are significantly separated the numerator is going to be small due to cancellations (see Fig. 1 (a)). Also notice again that this is just the classic "add-and-divide" within each group and taking a difference by simple subtraction.

The solution (empirical) with pairing
If we have the a priori knowledge that n 1 = n 2 and that y 1 corresponds to y n1+1 etc. we can simply subtract the corresponding pairs and the solution would be 1 Again by just summing left and right hand sides we get Now we can observe that the requirement is that not every point in the other group has to be far from every other point. Just enough number of corresponding points have to be far (see Fig. 1 (b,c)). We can see that pairing helps enhance the effect size ("signal"). (a) Distribution of samples along a measure (x). There are two groups colored differently. A few random samples are shown. The effect size would be proportional to the distance between the two peaks.(b) By acknowledging the fact that there are dependencies between samples in the two groups we can look at the distribution of sample-pairs along the differential measure (∆x). (c) Distribution of sample-pairs along ∆x. The effect size is proportional to the distance of the peak from 0.

Correlation
Now let's focus on identifying relationships between two measurements (x and y) (correlations) using linear models.

Correlation without pairing
3.1.1 The model (abstract) 1 In a fully technical sense the model and the equations are also adjusted to reflect these correspondences and this simple subtraction is forced on us as the parsimonious approach. A full treatment of the correspondence effect is demonstrated in section 3 with the regression example.
Let us say there are n samples collected in an experiment/study. Then

The solution (empirical)
We know from "high-school" that to a fit a line (in 2D space) we need two points. Here again we have many more points than necessary in other words the system of equations is overdetermined. Hence again we go with the "democratic" solution of using all pairs of points which in this case would be n(n − 1) = n 2 − n = O(n 2 ). Let us see a couple of pairs to see a pattern Now simply summing the left and right hand sides again gives us Again as in the case of mean if every pair of subtraction in the numerator and denominator are not consistently contributing there would be many cancellations and the value of |β 1 | will be lower and many data points would not be in the confidence limits of the models. This effect is demonstrated in Fig. 2 (a). In the interest of brevity we only focus onβ 1 and not includeβ 0 but similar observations and arguments can be made about that as well. In the next section we will see that by pairing we will relax that requirement to have consistency only among the predetermined pairs which is the fundamental reason for increase in the statistical power.

The equations (data)
Let us say there are 2n samples (n pairs). Then (53) Linear model between ∆x and ∆y. We can clearly notice the enhancement in the effect size.

The solution (empirical)
Now we fit a "line" (technically a plane) in 3D space since there are three parameters that describe the linear model. Again the system is overdetermined and we will go with "democratic" solution.
At first thought we would think we need to use all triplets of points to estimate the parameters. However we can take advantage of the fact that the third variable repeats to reduce the number of parameters to be estimated. By subtracting out the paired equations on left and right hand sides we get the following n equations Now simply summing the LHS and RHS gives uŝ Here we simply require that the pairwise differences are consistent. This effect can be seen in Figs.  2 (b,c).

Genetic vs. environmental
Now that we have seen above how to go from an abstract model to an empirical solution and how using prior knowledge (in this case pairing) about the data in an experiment can help derive statistically more confident (less risky) empirical solutions let us see just in the abstract space how we can tease out the differential effects of environment (e) vs. genetic (g) on outcome variables in a twin design study. The basic model would be But let us say we know a priori that Combining Eqs. (64) and (65) we get Now just as we saw in section 3 we can get much better signal with where ∆ here is the operator that subtracts paired measurements. But for monozygotic twins the assumption is that ∆g ≈ 0. Therefore we are really estimating the differential effects of environment on y. We just reaped two benefits (1) enhancing the statistical power by pairing (2) getting at the environmental effects.
3.3.1 Inference about the role(s) (α 2 , ψ 2 ) of non-shared environmental differences (∆e) First, we would like to note that the models described in here and used in our real experiments are correlation models and not causal/conditional models. The inference according to these models would be as follows. The non-shared environmental factors (∆e) are correlated with both ∆x (measure of anxiety discordance, in our real experiments) as well as ∆y (measure of uncinate fasciculus (UF) discordance, in our real experiments). We can see that algebraically as (assuming ∆g ≈ 0), ∆x = α 2 ∆e, from Eq. (65) and (70) ∆y = ψ 2 ∆e, from Eq. (69).
But since we do not directly measure ∆e in our studies, our equations in real experiments are derived from the model = β 1 ∆x.
That is we can only estimate the role of non-shared environmental differences indirectly (via β 1 ) as the association between anxiety differences and UF differences, since we cannot estimate either α 2 or ψ 2 . We would like to make a final note that these models represent phenomenological ansatz and based on further investigations and studies might have to adapted to include non-linear relationships between environmental factors and anxiety and white matter.

Introduction
Our starting point for this discussion is the diffusion tensor (D) which represents the diffusion (rate of area-of-spread 1 ) of water molecules in the three standard orthogonal planes (with origin at the center of a voxel). It can be represented using a two dimensional matrix (also known as rank-2 tensor) as where D ij refers to the diffusion in the plane spanned by orthogonal axes {i, j}. If i = j, then it refers to the square of the diffusion length along the axis i. Since the MR signal measured is not sensitive which side of the plane the water is diffusing, the matrix is symmetric i.e. D ij = D ji . We will now, 1. dissect it and understand the commonly used DTI measures and 2. visualize to derive qualitative biological interpretations.

Dissecting the diffusion tensor
One could work directly with D if the goal was only getting at diffusivities in the standard planes. In fact if one were only interested in the average diffusivity known commonly as mean diffusivity (MD) or apparent diffusion coefficient (ADC), we can just read it off D as D xx + D yy + D zz 3 due to a linear algebraic 2 fact that the trace of a matrix is equal to the sum of the eigenvalues of the matrix. Eigenvalues and eigenvectors capture one of the most basic symmetries 3 in vector spaces under linear transformations. But one of the main goals of diffusion imaging is to reveal tissue microstructure including its orientation. Hence we need to extract which direction D is oriented in. A key linear algebraic tool that allows us to extract this is through eigen factorization (decomposition, dissection) 4 of D, (2) 1 As a reference to the physical units, in a typical MRI scan, the observed diffusion in biological tissue without hindrance or restriction is on the order of 3 µm 2 ·ms −1 .
2 Linear algebra is the study of vector spaces under linear transformations. The notion of treating matrices as collections of vectors is a powerful convenience that allows us to glean patterns much more effectively compared to treating them just as a collection of numbers.
3 Finding the set of v, λs that satisfy the equation Av = λv is essentially asking to find the vectors that remain the same (i.e. symmetric (invariant), upto scaling) under the transformation A. 4 Another, now well known tool, called principal component analysis (PCA) which extracts the primary orientation of the spread of data fundamentally relies on the eigen decomposition of a covariance matrix.
Where v 1 , v 2 , v 3 are the three main (orthonormal 5 ) directions (in R 3 ) of spread and the λ 1 , λ 2 , λ 3 are the extent of spread in those directions. Without loss of generality we can order the vs and λs such that λ 1 ≥ λ 2 ≥ λ 3 . The dissections of a sample diffusion tensor are shown in Fig. 1. Figure 1: The dissected diffusion tensor. The three eigenvalues (λ 1 , λ 2 , λ 3 ), eigenvectors (v 1 , v 2 , v 3 ) and the three main planes of diffusion with the corresponding sections of the elliposoids (in red, green and blue) are shown.

Understanding the dissected material (λs and vs)
i) Axial diffusivity (AD). Let's first start with understanding the largest (max) of them all which is λ 1 . This indicates the diffusivity in the direction of v 1 which we can interpret as the primary orientation of the fiber in that voxel.
ii) Radial diffusivity (RD). If we just focus on diffusivity in the plane perpendicular to the main fiber orientation (v 1 ), i.e. the plane spanned by v 2 and v 3 ), we have the two eigenvalues λ 2 , λ 3 and their mean is λ 2 + λ 3 2 which is called the radial diffusivity (RD).
iii) Mean diffusivity (MD). Now let us focus on all the three eigenvalues (λs) jointly. If we have a collection of numbers and want to "understand" them we can start by computing their average (mean) (µ) which in this case is known as the mean diffusivity (MD)= λ 1 + λ 2 + λ 3 3 .
iv) Fractional anisotropy (FA). Similarly we can also compute the (second order) variability in these measures (variance) (σ 2 ) Now if we want to normalize σ 2 such that its range is between 0 and 1, we would simply need to apply the following linear transformation, σ 2 norm is the FA. min(σ 2 ) = 0 since variance is always positive. Now let us calculate max(σ 2 ) as this will reveal a few things about FA.
Now let us actually see when FA actually hits the maximum value. This will happen when such that λ 2 1 + λ 2 2 + λ 2 3 > 0.
Since we already have the order λ 1 ≥ λ 2 ≥ λ 3 ≥ 0, this will happen only when λ 1 > 0 and λ 2 = λ 3 = 0 (i.e. fully restricted diffusion in the direction perpendicular to the fiber orientation). In fact it is precisely the ordering of the λs that imposes dependency (covariability) between FA and AD. In some other words this dependency is by design, that is standard deviations (FA) of a set of numbers (eigenvalues) is driven higher by large values and by definition the large (eigen) values are called axial diffusivities. We also show this relationship using contour plots in Fig. 2 so that we can easily see that FA gets higher when λ 1 (largest eigenvalue i.e. AD) is higher (and crosses) for fixed MD and RD.

Plausible biological interpretations
While pinning down biological interpretations of DTI measures definitively from in vivo data is quite difficult, we can provide some qualitative discussions of plausible changes to fibers when effects on FA and AD are detected but those on MD and RD are harder to detect. The biggest tool we have for this exercise is visualization of the DTI ellipsoids as a function of the eigenvalues  We can notice from both the plots that for fixed values of MD and RD there is quite a bit of co-variability in FA and AD. More strongly so for contours of RD compared to those of MD, since λ 1 does not contribute to RD thus not saturating FA to its maximum as rapidly. More over nor these dependencies are linear nor monotonic (see for e.g. RD contour=2).
(λs). For this we need to pick a set of eigenvalues (λs) for which we have fixed MD and RD values, in other words we need to get the λs for MD and RD contours. Fig. 3 shows these contours. Note that while Fig. 2 shows the contours as a function of FA and AD, for visualization we need those as a function of the λs.    Fig. 4 (bottom box). These would be the examples of cases where one would be able to detect the effects on FA and AD but neither on MD nor on RD. Figure 4: Sample shapes of the diffusion tensors along MD and RD contours. The corresponding AD and FA values are listed below the corresponding tensors. Top box. All the ellipsoids have same MD value of 1.5 µm 2 ·ms −1 . We can notice the "elongation" of tensors as AD increases and also increase in FA. The decrease in AD (and FA) as the anxiety measure increases could perhaps be related to some sort of "shrinkage" in the tract in a voxel. Bottom box. All the ellipsoids have same RD value of 0.5 µm 2 ·ms −1 . You can see that biologically the fiber can be in qualitatively quite different conditions. As AD and FA increases we can see a "squishing" effect along one of the directions perpendicular (here we are showing squishing in the direction of v 3 ) to that of fiber orientation. The decrease in such an effect could perhaps be related to some sort of "unbinding" of the fibers in a voxel.