Variable structures in M87* from space, time and frequency resolved interferometry

The immediate vicinity of an active supermassive black hole—with its event horizon, photon ring, accretion disk and relativistic jets—is an appropriate place to study physics under extreme conditions, particularly general relativity and magnetohydrodynamics. Observing the dynamics of such compact astrophysical objects provides insights into their inner workings, and the recent observations of M87* by the Event Horizon Telescope [1–6] using very-long-baseline interferometry techniques allow us to investigate the dynamical processes of M87* on timescales of days. Compared with most radio interferometers, very-long-baseline interferometry networks typically have fewer antennas and low signal-to-noise ratios. Furthermore, the source is variable, prohibiting integration over time to improve the signal-to-noise ratio. Here, we present an imaging algorithm [7,8] that copes with the data scarcity and temporal evolution, while providing an uncertainty quantification. Our algorithm views the imaging task as a Bayesian inference problem of a time-varying brightness, exploits the correlation structure in time and reconstructs (2 + 1 + 1)-dimensional time-variable and spectrally resolved images. We apply this method to the Event Horizon Telescope observations of M87* [9] and validate our approach on synthetic data. The time- and frequency-resolved reconstruction of M87* confirms variable structures on the emission ring and indicates extended and time-variable emission structures outside the ring itself.

To arrive at a time-dependent reconstruction, the measurement data have to be subdivided into multiple separate image frames along the temporal axis, leading to an extremely sparse Fourier-space coverage in every frame. In the case of the EHT observation of M87*, data were taken during four 8-hour cycles spread throughout seven days. All missing image information needs to be restored by the imaging algorithm, exploiting implicit and explicit assumptions about the source structure.
Physical sources, including M87*, evolve continuously in time. Images of these sources separated by time intervals that are short compared to the evolutionary time scale are thus expected to be strongly correlated. Imposing these expected correlations during the image reconstruction process can inform image degrees of freedom (DOFs) that are not directly constrained by the data.
In radio interferometric imaging, spatial correlations can be enforced by convolving the image with a kernel, either during imaging, as part of the regularisation, or as a post-processing step. In our algorithm, we use a kernel as part of a forward model, where an initially uncorrelated image is convolved with the kernel to generate a proposal for the logarithmic sky brightness distribution, which is later adjusted to fit the data. The specific structure of such a kernel can have a substantial impact on the image reconstruction. We infer this kernel in a non-parametric fashion, simultaneously with the image. This substantially reduces the risk of biasing the result by choosing an inappropriate kernel, at the cost of introducing redundancies between DOFs of the convolution kernel and those of the pre-convolution image.
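The core of this forward construction can be illustrated in a few lines (a minimal sketch, not the actual NIFTy implementation; the periodic Gaussian kernel and its three-pixel width are our own illustrative choices): an uncorrelated excitation field is convolved with the kernel via FFT and exponentiated, yielding a strictly positive brightness with the kernel's correlations imprinted.

```python
import numpy as np

rng = np.random.default_rng(0)

def correlated_log_sky(xi, kernel):
    """Convolve an uncorrelated excitation field `xi` with `kernel`
    (both periodic, same shape) via FFT and exponentiate, yielding a
    strictly positive brightness carrying the kernel's correlations."""
    conv = np.fft.ifft2(np.fft.fft2(xi) * np.fft.fft2(kernel)).real
    return np.exp(conv)

n = 64
x = np.fft.fftfreq(n)[:, None]
y = np.fft.fftfreq(n)[None, :]
# illustrative smooth kernel: periodic Gaussian bump, width ~3 pixels
kernel = np.exp(-0.5 * (x**2 + y**2) * (n / 3.0) ** 2)
kernel /= kernel.sum()

xi = rng.standard_normal((n, n))   # uncorrelated pre-convolution image
sky = correlated_log_sky(xi, kernel)
```

The white-noise field has no pixel-to-pixel correlation, while the log-brightness of the output is strongly correlated between neighbouring pixels.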
Metric Gaussian Variational Inference (MGVI) is a Bayesian inference algorithm that is capable of tracking uncertainty correlations between all involved DOFs, which is crucial for models with redundancies, while having memory requirements that grow only linearly with the number of DOFs [13]. It represents uncertainty correlation matrices implicitly, without the need to store their entries explicitly, and provides uncertainty quantification of the final reconstruction in terms of samples drawn from an approximate Bayesian posterior distribution, with a moderate level of approximation. Compared to methods that provide a best-fit reconstruction, our approach provides a probability distribution, capturing uncertainty.
A limitation of the Gaussian approximation is its unimodality, as the posterior distribution is multi-modal [14]. Representing multi-modal posteriors in high dimensions is hard, if not infeasible. Therefore, our results describe a typical mode of this distribution, taking the probability mass into account.
MGVI is the central inference engine of the Python package NIFTy (Numerical Information Field Theory) [8,15,16], which we use to implement our imaging algorithm, as it permits the flexible implementation of hierarchical Bayesian models. NIFTy turns a forward model into the corresponding backward inference of the model parameters by means of automatic differentiation and MGVI. For time-resolved VLBI imaging, we therefore need to define a data model that encodes all relevant physical knowledge of the measurement process and of the brightness distribution of the sky.
This forward model describes in one part the sky brightness, and in another part the measurement process. For the sky brightness, we require strictly positive structures with characteristic correlations in space, time, and frequency. These brightness fluctuations can vary exponentially over linear distances and time intervals, which is represented by a log-normal prior with a Gaussian process kernel. The correlation structure of this process is assumed to be statistically homogeneous and isotropic for space, time, and frequency individually, and decoupled for each sub-domain. Consequently, the correlations are represented by a direct outer product of rotationally symmetric convolution kernels, or equivalently by a product of one-dimensional, isotropic power spectra in the Fourier domain. We assume the power spectra to be close to power laws, with deviations modelled as integrated Wiener processes on a double-logarithmic scale [17]. The DOFs that finally determine the spatio-temporal correlation kernel are inferred by MGVI alongside the sky brightness distribution. While the adopted model can only describe homogeneous and isotropic correlations, this symmetry is broken for the sky image itself by the data, which in general enforce heterogeneous and anisotropic structures.
The EHT collaboration has published data averaged down to two frequency bands at 227 GHz and 229 GHz. Therefore, we employ a simplified model for the frequency axis: we reconstruct two separate, but correlated, images for these bands, with an a priori assumed log-normal deviation at the 1 % level, which amounts to spectral indices of ±1 within one standard deviation. Our algorithm does not constrain the absolute flux of the two channels. Thus, we can recover the relative spectral index changes throughout the source, but not the absolute ones. A detailed description of the sky model is given in the Methods section.
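The quoted correspondence between the prior deviation and the spectral index can be checked by simple arithmetic (a sketch; the spectral-index convention s ∝ ν^α is the standard one, not quoted from the paper): a 1 % log-brightness deviation divided by the logarithmic separation of the two bands gives a spectral index of order unity.

```python
import numpy as np

# spectral index alpha defined via s ~ nu^alpha, so between two bands
# alpha = d(log s) / d(log nu)
delta_log_s = 0.01                      # 1 % a-priori log-brightness deviation
delta_log_nu = np.log(229.0 / 227.0)    # logarithmic band separation
alpha = delta_log_s / delta_log_nu      # one-standard-deviation spectral index
```

With the two bands only 2 GHz apart, the logarithmic separation is about 0.0088, so the implied one-sigma spectral index is slightly above 1.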
We further require an accurate model of the instrument response. Just as the prior model is informed by our physical knowledge of the source, the instrument model is informed by our knowledge of the instrument. We consider two sources of measurement noise that cause the observed visibilities to differ from the perfect sky visibilities. The first is additive Gaussian thermal noise, whose magnitude is provided by the EHT collaboration in the data set. The other component consists of multiplicative, systematic measurement errors, which are mainly caused by antenna-based effects, e.g. differences in the measurement equipment, atmospheric phase shifts, and absorption of the incoming electromagnetic waves. All these effects can be summarized in one complex, possibly time-variable, number per telescope, containing the antenna gain factor and antenna phase. This source of errors can be conveniently eliminated by basing the model on derived quantities (closure amplitudes and phases), which are not affected by it.
For VLBI on µas scales, these effects can be prohibitively large. Fortunately, certain combinations of visibilities are invariant under antenna-based systematic effects, the so-called closure phases and closure amplitudes [18]. These quantities serve as the data for our reconstruction (for details, refer to the Methods section).
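The invariance can be verified numerically (a self-contained sketch with random visibilities; the antenna labels and the gain convention V_AB → g_A g_B* V_AB are the standard ones, not taken from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)

# toy visibilities on all baselines of four antennas 0..3
ants = range(4)
vis = {(a, b): rng.standard_normal() + 1j * rng.standard_normal()
       for a in ants for b in ants if a < b}

def closure_phase(v, a, b, c):
    """Phase of V_ab * V_bc * conj(V_ac): antenna phases cancel."""
    return np.angle(v[(a, b)] * v[(b, c)] * np.conj(v[(a, c)]))

def closure_amplitude(v, a, b, c, d):
    """log(|V_ab| |V_cd| / (|V_ac| |V_bd|)): antenna amplitudes cancel."""
    return (np.log(abs(v[(a, b)])) + np.log(abs(v[(c, d)]))
            - np.log(abs(v[(a, c)])) - np.log(abs(v[(b, d)])))

# corrupt with arbitrary complex antenna gains: V_ab -> g_a conj(g_b) V_ab
gains = rng.standard_normal(4) + 1j * rng.standard_normal(4)
corrupted = {(a, b): gains[a] * np.conj(gains[b]) * v
             for (a, b), v in vis.items()}
```

Both closure quantities evaluate to the same value on the clean and the gain-corrupted visibilities, because the gain factors cancel exactly within each closure.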
We apply this method to the EHT data of the supermassive black hole M87*. With a shadow approximately four light days in size and reported superluminal proper motions of 6c [19], its immediate vicinity is expected to be highly dynamic and subject to change on a time scale of days. The exceptional angular resolution of the EHT made it possible for the first time to image the shadow of this supermassive black hole directly and to confirm its variability on horizon scales.
In this letter, we present a time- and frequency-resolved reconstruction of the shadow of M87* over the entire observational cycle of seven days, utilizing correlation in all four dimensions (see fig. 1). The closure quantities do not contain information on the total flux and the absolute position of the source. Therefore, we normalize our results such that the flux in the entire ring is constant in time and agrees with the results of the EHT collaboration for the first frame of our reconstruction. To achieve an alignment of the source even in the absence of absolute position information, we start the inference with the data of only the first two observation days and continue with all data until convergence.
Figure 2 displays the frequency-averaged sample mean image for the first observing day, together with its pixel-wise uncertainty. In full agreement with the EHT result, our image shows an emission ring that is brighter on its southern part, most likely due to relativistic beaming effects. Additionally, we obtain two faint extended structures, positioned opposite to each other along the south-western and north-eastern directions. They do not have the shape of typical VLBI imaging artefacts, i.e. they are not faint copies of the source itself, and similar structures do not appear in any of our validation examples. We conclude that these structures are either of physical origin or due to effects of the measurement that are unmodelled in our algorithm. These include baseline-based calibration artefacts such as polarization leakage [3], and extended emission outside the field of view. The latter likely has only a small effect, as we do not use closures that contain intra-site baselines, and all others should be insensitive to the large-scale jet emission [4]. The detection of additional significant source features, compared to the results by the EHT collaboration, is enabled by using the data of all four observation days at once and thereby partially integrating the information.
Since our reconstruction is based on closure quantities that are not sensitive to absolute flux, the absolute spectral dependency is not constrained. Still, the relative spectral index variations with respect to an overall spectrum can be explored (see top row of fig. 3). The map exhibits a higher relative spectral index in the southern portion of the ring, which coincides with its brightest emission spot. However, the uncertainty map indicates that this feature is not significant, and similar features falsely appear in the validation (see bottom row of fig. 3). Therefore, we do not report any significant structures in the spectral behaviour of M87* and continue our analysis with frequency-averaged time frames.
The sky brightness for each day of the observation, together with the absolute and relative differences between adjacent days, is displayed in fig. 1. We report mild temporal brightness changes of up to 6 % per day, in particular within the western and southern parts of the ring, validating the observations made by [4]. Figure 4 shows the detailed temporal evolution of a selected number of locations and areas. Our method consistently interpolates between observations. Supplementary Video 1 also demonstrates the continuous evolution. In several locations our reconstruction agrees with the EHT's imaging results, whereas others clearly deviate. Especially at location 7, which corresponds to the extended structure in the south-western direction, the brightness decreases by about 5 % between adjacent days throughout the entire observation. This hints at a real and non-trivial temporal evolution.
Following the analysis of [4], we compute empirical characteristics of the asymmetric ring, i.e. diameter d, width w, orientation angle η, azimuthal brightness asymmetry A, and floor-to-ring contrast ratio f_C. All findings are summarized in table 1 and compared to the results of the EHT collaboration [4]: we can confirm the stationary values for the diameter d, width w, azimuthal brightness asymmetry A, and floor-to-ring contrast ratio f_C during the seven days, and a significant temporal evolution of the orientation angle η. The latter might be caused by flickering of emission spots [20]. We report a slightly larger diameter d = (45 ± 3) µas, which does not significantly deviate from the result published by the EHT collaboration of d = (42 ± 3) µas [1].
A collection of six validation examples has been assembled to assess the accuracy and robustness of our method (figs. 5 and 6). Figure 7 shows spatial correlation spectra for our scientific and validation images. Figure 8 displays the results of the imaging methods used by the EHT collaboration, together with our posterior mean and two samples for all observation periods. The temporal evolution of several samples is illustrated in Supplementary Video 2.
In conclusion, we present and validate the first Bayesian imaging method that is capable of simultaneously reconstructing emission over spatial, temporal and spectral dimensions from closure quantities, utilizing correlation and quantifying uncertainties via posterior samples. We provide the first independent confirmation of the overall morphology of the emission ring around M87* and an apparent evolution of its orientation as published by the EHT collaboration. The frequency resolution allows us to obtain a relative spectral-index map, together with an uncertainty estimation. For the data set at hand, significant spectral features could not be found. In addition to the emission ring, we resolve significant and potentially dynamic emission structures along the south-western and north-eastern direction. With future observations, our method may help to explore the intricate structure of M87* and other variable sources in the spatial, spectral, and temporal domains. To achieve this, the model can be extended with inference of the prior spectral correlation structure.

Table 1: Ring parameters as defined in [7], computed for images published by the EHT collaboration (first three sections of the table) as well as for our reconstruction (last two sections). Section four provides the results of the estimators and their standard deviations, as defined by [4], applied to our posterior mean. Section five provides means and standard deviations obtained by processing our posterior samples individually through the estimators and computing the mean and 1-σ standard deviation of these results.

Methods
The reconstruction algorithm relies on Bayesian statistics. Thus, it consists of three essential components: the likelihood, the prior, and an inference scheme.

The likelihood is a probabilistic description of the measurement process, including details on the measurement device. We choose to describe the measurement in terms of closure quantities that are invariant under antenna-based calibration effects.

The prior model captures all assumptions on the sky brightness distribution. Here we assume positivity at all times, correlation along the temporal, spatial, and spectral directions, as well as the possibility of variations on an exponential scale. This is implemented with the help of a Gaussian process prior on the logarithmic brightness distribution with unknown kernel. Below, a non-parametric kernel model is derived that assumes a stochastic process along each dimension individually.

This constitutes a Bayesian inference problem that is approximately solved by applying Metric Gaussian Variational Inference (MGVI) as the inference scheme. This method requires a generative model formulation in which all model parameters are standard-normal distributed a priori. The generative function defined below associates these with the physical quantities (see fig. 9).

We describe all implementation details and give the reasoning behind our choice of hyperparameters and the inference heuristic. The method is validated on six simulated sources with a varying degree of dynamics, ranging from simple shapes to realistic black holes. To demonstrate the effect of hyperparameter choices, we perform 100 reconstructions of both a synthetic example and M87* with randomized hyperparameters within a certain range. All validation efforts show that the algorithm is able to reconstruct synthetic examples successfully and is stable under changes in the hyperparameters.

Likelihood
The likelihood of the measured visibilities given the sky brightness distribution s is computed independently for each time frame. The visibilities for all measured data points are assumed to follow the measurement equation in the flat-sky approximation:

R(s)_AB := ∫ e^(−2πi(u_AB x + v_AB y)) s(x, y) dx dy =: e^(ρ_AB) e^(iφ_AB).   (2)

Here AB runs through all ordered pairs of antennas A and B for all non-flagged baselines, u_AB and v_AB are the coordinates of the measured Fourier points, s(x, y) is the sky brightness distribution as a function of the sky angles x and y, and R is called the measurement response. The visibilities R(s)_AB are complex numbers, and we represent them in polar coordinates via phases φ_AB(s) ∈ R and logarithmic amplitudes ρ_AB(s) ∈ R, i.e. R(s)_AB = exp(ρ_AB(s) + iφ_AB(s)). We assume the thermal noise of the phase and logarithmic amplitude to be independently Gaussian distributed with covariance

N := diag(σ² / |d|²),

where d is the reported visibility data and σ is the reported thermal noise level. The operation diag(x) denotes a diagonal matrix with x on its diagonal. This is approximately valid for a signal-to-noise ratio larger than 5 [21], which is true for most of our data.
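This Gaussian approximation can be probed with a small Monte Carlo experiment (our own sketch; the convention that σ is the per-component standard deviation of the complex noise is an assumption for this illustration): at a signal-to-noise ratio of 10, the scatter of both log-amplitude and phase matches σ/|d|, and the two are nearly uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(8)
v_true = 10.0 + 0.0j   # |v| / sigma = 10, above the quoted SNR threshold
sigma = 1.0            # assumed per-component thermal noise std

n_samp = 50_000
v = v_true + sigma * (rng.standard_normal(n_samp)
                      + 1j * rng.standard_normal(n_samp))
log_amp = np.log(np.abs(v))   # rho
phase = np.angle(v)           # phi
```

Both marginal scatters come out close to sigma / |v_true| = 0.1, as the Gaussian approximation predicts.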
To avoid antenna-based systematic effects, we compute closure quantities from these visibilities [21]. Closure phases are obtained by combining a triplet of complex phases of visibilities via

φ_cl(ABC) := φ_AB + φ_BC − φ_AC.   (4)

Closure amplitudes are formed by combining the logarithmic absolute values of four visibilities:

ρ_cl(ABCD) := ρ_AB − ρ_AC − ρ_BD + ρ_CD.

These closure quantities are invariant under antenna-based visibility transformations of the form

R(s)_AB → c_A c*_B R(s)_AB

for all antennas and multiplicative calibration errors c_A and c_B, where * denotes the complex conjugate. Note that forming the closure phases is a linear operation on the complex phase, while forming the closure amplitudes is linear in the logarithmic absolute value. We can thus represent these operations using matrices:

φ_cl = M φ,  ρ_cl = L ρ.

The closure matrices L and M are sparse and contain in every row ±1 for visibilities associated with the closure, and zero elsewhere.
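The phase-closure matrix for a small array can be written down explicitly (a sketch; the five-antenna setup and triangle ordering are illustrative): each row holds +1, +1, −1 for the three baselines of a triangle, and any antenna-based phase pattern lies in its null space.

```python
import numpy as np
from itertools import combinations

n_ant = 5
baselines = list(combinations(range(n_ant), 2))
idx = {bl: i for i, bl in enumerate(baselines)}

# phase-closure matrix M: one row per triangle (a, b, c), with
# +1 for phi_ab, +1 for phi_bc and -1 for phi_ac
triangles = list(combinations(range(n_ant), 3))
M = np.zeros((len(triangles), len(baselines)))
for r, (a, b, c) in enumerate(triangles):
    M[r, idx[(a, b)]] += 1
    M[r, idx[(b, c)]] += 1
    M[r, idx[(a, c)]] -= 1
```

For five antennas the matrix has rank 6, i.e. the number of baselines minus the number of free antenna phases, and arbitrary antenna phase offsets are annihilated exactly.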
The noise covariances N_ρ and N_φ of the closure quantities are related to N via

N_ρ := L N L†  and  N_φ := M N M†,   (8)

where † denotes the adjoint of the operator; the closure noise then follows the Gaussian distribution N(n|0, N_ρ), where N(n|0, N) denotes a Gaussian distribution over n with mean 0 and covariance N. The mixing introduced by applying L and M leads to non-diagonal noise covariance matrices of the closure quantities. For a given antenna setup (of five or more antennas), more closure quantities can be constructed than visibilities are available, and therefore they provide a redundant description of the data. For the logarithmic amplitudes ρ, we first construct all possible closure quantities and then map to a non-redundant set using the eigendecomposition of N_ρ. Specifically, we construct a unitary transformation U_ρ where each column of the matrix is an eigenvector corresponding to a non-zero eigenvalue of N_ρ. This transformation provides a map from the space of all possible closure amplitudes to the space of maximal non-redundant sets, with the additional property that the transformed noise covariance becomes diagonal. Specifically,

U_ρ† N_ρ U_ρ = Λ_ρ,

where Λ_ρ denotes a diagonal matrix with the non-zero eigenvalues of N_ρ on its diagonal. We can combine L and U_ρ to form an operation that maps the logarithmic amplitudes of visibilities ρ directly to the space of non-redundant closure amplitudes via

ρ_cl(s) := U_ρ† L ρ(s),

and use it to compute the observed, non-redundant closure amplitudes from the published visibility data:

d_ρ := U_ρ† L ρ_d.

The resulting likelihood for closure amplitudes reads

P(d_ρ|s) = N(d_ρ | ρ_cl(s), Λ_ρ).

Closure phases are constructed differently to avoid problems induced by phase wraps. Adding or subtracting 2π from a phase does not change the result, and we need to preserve this symmetry in our algorithm. We thus can only add integer multiples of phases such as eq. (4), and this prohibits using a direct matrix decomposition to find a maximal non-redundant closure set.
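The amplitude-side construction can be sketched numerically (our own toy example: a five-antenna array, all three quadrangle pairings per antenna quadruple, random diagonal visibility noise, and an ad hoc eigenvalue threshold):

```python
import numpy as np
from itertools import combinations

n_ant = 5
baselines = list(combinations(range(n_ant), 2))
idx = {bl: i for i, bl in enumerate(baselines)}

# all (redundant) log closure amplitudes: 3 pairings per antenna quadruple
rows = []
for a, b, c, d in combinations(range(n_ant), 4):
    for p, q, r, s in [((a, b), (c, d), (a, c), (b, d)),
                       ((a, c), (b, d), (a, d), (b, c)),
                       ((a, d), (b, c), (a, b), (c, d))]:
        row = np.zeros(len(baselines))
        row[idx[p]] += 1; row[idx[q]] += 1
        row[idx[r]] -= 1; row[idx[s]] -= 1
        rows.append(row)
L = np.array(rows)

rng = np.random.default_rng(2)
N = np.diag(rng.uniform(0.5, 2.0, len(baselines)))  # visibility noise covariance
N_rho = L @ N @ L.T                                 # redundant closure covariance

# keep eigenvectors with non-zero eigenvalue -> maximal non-redundant set,
# in which the transformed noise covariance is diagonal
eigval, eigvec = np.linalg.eigh(N_rho)
keep = eigval > 1e-10
U = eigvec[:, keep]
Lambda = U.T @ N_rho @ U
```

The number of retained eigenvectors reproduces the known count of independent closure amplitudes, n(n−3)/2, and the transformed covariance is diagonal.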
We build the closure sets to be used in the imaging with the help of a greedy algorithm that processes closure phases in order of decreasing signal-to-noise ratio, as defined by the inverse of the diagonal of N_φ (eq. (8)). The algorithm collects closure sets into M until rank(M) = dim(φ), ensuring that φ_cl consists of a maximal non-redundant set. In principle, all maximal non-redundant closure sets are equivalent, as long as one takes the non-diagonal noise covariance into account. The concrete choice might have a minor impact because of our approximation of the closure phase likelihood.
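The greedy construction can be sketched as follows (our own minimal implementation; the toy triangle rows and the random stand-in for the signal-to-noise ordering are illustrative):

```python
import numpy as np
from itertools import combinations

def greedy_closure_set(rows, snr, target_rank):
    """Pick closure rows in order of decreasing SNR, keeping only rows
    that raise the rank of the selection, until the target rank is met."""
    order = np.argsort(snr)[::-1]
    selected = []
    for i in order:
        trial = np.array(selected + [rows[i]])
        if np.linalg.matrix_rank(trial) > len(selected):
            selected.append(rows[i])
        if len(selected) == target_rank:
            break
    return np.array(selected)

# toy setup: all closure-phase rows of a five-antenna array
n_ant = 5
baselines = list(combinations(range(n_ant), 2))
idx = {bl: i for i, bl in enumerate(baselines)}
rows = []
for a, b, c in combinations(range(n_ant), 3):
    row = np.zeros(len(baselines))
    row[idx[(a, b)]] += 1; row[idx[(b, c)]] += 1; row[idx[(a, c)]] -= 1
    rows.append(row)

snr = np.random.default_rng(10).uniform(1.0, 10.0, len(rows))
maximal_rank = len(baselines) - (n_ant - 1)      # 6 for five antennas
M_sel = greedy_closure_set(rows, snr, maximal_rank)
```

The selection always starts with the highest-SNR triangle and stops once a maximal non-redundant set has been assembled.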
Within our closure set, we can decompose the noise covariance N_φ into a unitary matrix U_φ and its eigenvalues Λ_φ. Instead of working with the phases φ_cl directly, we use their positions on the complex unit circle e^(iφ_cl) to define

ϕ(s) := U_φ† e^(iMφ(s)).

This mitigates the problem of phase wraps at the price of approximating the corresponding covariance. This approximation yields errors below the 1 % level if the signal-to-noise ratio is larger than 10. Most of the data points are above that threshold, and the error decreases quadratically with increasing signal-to-noise ratio. Since data with the lowest standard deviation are also the most informative, we believe the impact of the approximation on the reconstruction to be negligible. Given the closure phases on the unit circle ϕ, the corresponding phase likelihood can be written as

P(ϕ_d|s) = N(ϕ_d | ϕ(s), Λ_φ),   (13)

where ϕ_d = U_φ† e^(iMφ_d). Note that eq. (13) is a Gaussian distribution on complex numbers, with probability density function

N(x|m, X) ∝ e^(−(x−m)† X^(−1) (x−m))

for mean m and Hermitian covariance X. Complex and real Gaussian distributions only differ in their normalization constant. We do not distinguish between them explicitly, as the normalization is irrelevant for our variational approach.
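The wrap problem and its mitigation can be demonstrated directly (a sketch with an assumed noise level): for phases near ±π, the naive standard deviation explodes at the wrap, while the scatter of e^(iφ) around its mean reproduces the true phase scatter.

```python
import numpy as np

rng = np.random.default_rng(3)
true_phase = np.pi - 0.01                 # right next to the +/- pi wrap
noise_std = 0.05
phases = np.angle(np.exp(1j * (true_phase
                               + noise_std * rng.standard_normal(10_000))))

naive_std = phases.std()                  # blown up by the wrap discontinuity
on_circle = np.exp(1j * phases)           # wrap-free embedding
residual = on_circle - on_circle.mean()
circle_std = np.sqrt((np.abs(residual) ** 2).mean())  # ~ noise_std
```

For small noise the unit-circle scatter agrees with the injected phase scatter to second order, which is the quadratic error behaviour quoted above.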

Modelling the sky brightness
The sky brightness distribution s_xtν is defined within a fixed field of view Ω_x, observing period Ω_t, and frequency range Ω_ν ⊂ R, which renders it a field defined in space, time, and frequency. We assume s to be a priori log-normally distributed:

s_xtν = e^(τ_xtν),  with P(τ|T) := N(τ|0, T),   (15)

for x ∈ Ω_x, t ∈ Ω_t, and ν ∈ Ω_ν. The a priori correlation structure of the logarithmic sky brightness τ is encoded in the covariance T. Choosing a log-normal model allows the sky brightness to vary exponentially on linear spatial, temporal, and frequency scales, and ensures the positivity of the reconstructed intensity, similarly to [22,23].
We perform a basis transformation to a standardized Gaussian distribution P(ξ_s) = N(ξ_s|0, 1), which allows us to separate the correlation structure from its realization [24]. The new coordinates ξ_s have the same dimension as the original parameters, but are a priori independent:

s(ξ_s) = e^(A ξ_s),  with A A† = T.

This defines a generative model which turns standard normal distributed DOFs ξ_s into random variables s that are distributed according to eq. (15). Although the information encoded in a distribution is invariant under coordinate transformations, MGVI depends on the choice of coordinates. Therefore, reformulating the entire inference problem in terms of standardized generative models is important to ensure that the prior information is fully captured by an approximation via MGVI. We visualize our generative model in fig. 9.
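In one dimension, this standardization can be sketched as follows (the toy power spectrum and Fourier conventions are our assumptions, not the paper's): standard-normal parameters ξ_s are pushed through a fixed square root A of the covariance, and the resulting samples reproduce the target variance.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 32

# toy 1-D power spectrum: k^-2 power law, regularized at the zero mode
k = np.abs(np.fft.fftfreq(n, d=1.0 / n))
p = np.where(k > 0, k ** -2.0, 1.0)

def generative(xi):
    """tau = A xi: apply the Fourier-diagonal square root of the
    covariance T to standard-normal parameters xi."""
    return np.fft.ifft(np.sqrt(p) * np.fft.fft(xi)).real

samples = np.array([generative(rng.standard_normal(n)) for _ in range(4000)])
```

With this FFT convention the per-pixel variance of tau is the mean of the power spectrum, which the sample statistics reproduce; neighbouring pixels come out positively correlated, as the red spectrum dictates.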

Correlations in space, time, and frequency
We do not know the correlation structure of the logarithmic sky brightness a priori, so we include it as part of the model, to be inferred from the data. The different dimensions of the sky brightness are governed by completely distinct physical phenomena, which should be reflected in the model. Setting up such correlations involves a number of intricate technicalities. The main idea is to model the correlations in space, time, and frequency independently, using the same underlying model, and to combine them via outer products. Doing this naively results in degenerate and highly unintuitive model parameters. The model we introduce in the following avoids these issues, but unfortunately requires a certain complexity.
For now, we consider the correlation structure along the different sub-domains individually. A priori we do not want to single out any specific location or direction of the logarithmic sky brightness, which corresponds to statistical homogeneity and isotropy. According to the Wiener-Khinchin theorem, such correlation structures T^(i) with i ∈ {Ω_x, Ω_t, Ω_ν} are diagonal in the Fourier domain and can be expressed in terms of a power spectrum p_T^(i)(|k|):

(F^(i) T^(i) (F^(i))†)_kk′ = (2π)^D^(i) δ_kk′ p_T^(i)(|k|),

where F^(i) and k denote the Fourier transformation and Fourier coordinates associated with the space i, D^(i) is the dimension of i, δ denotes the Kronecker delta, and |k| is the Euclidean norm of the vector k. We choose our Fourier convention such that no factors of 2π enter the transformation F^(i), and thus its inverse has a factor of 1/(2π)^D^(i). As we build the model in terms of standardized coordinates ξ_s, we work with the square root A of the correlation matrix, which converts those into the logarithmic brightness τ = A ξ_s. The amplitude spectrum p^(i)(|k|), the square root of the power spectrum, depends on the characteristic length scales of the underlying physical processes, which we do not know precisely. Our next task is to develop a flexible model for this spectrum that expresses our uncertainty and is compatible with a wide range of possible systems. We model the amplitude spectrum in terms of its logarithm:

p^(i)(|k|) = e^(γ^(i)(|k|)).

We do not want to impose any functional basis for this logarithmic amplitude spectrum γ^(i)(|k|), so we describe it non-parametrically, using an integrated Wiener process in logarithmic coordinates l = log|k|. This corresponds to a smooth, i.e. differentiable, function with exponential scale dependence [25]. In the logarithmic coordinates l, the zero-mode |k| = 0 is infinitely far away from all other modes. Later on we deal with it separately and continue with all remaining modes for now.
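A toy sketch of how independent one-dimensional amplitude spectra combine into a separable space-time amplitude that acts diagonally in Fourier space (the power-law spectra and grid sizes are illustrative assumptions):

```python
import numpy as np

n_x, n_t = 16, 8
kx = np.abs(np.fft.fftfreq(n_x, 1.0 / n_x))
kt = np.abs(np.fft.fftfreq(n_t, 1.0 / n_t))

# independent 1-D amplitude spectra for space and time (assumed power laws),
# with the zero modes removed for separate treatment
p_x = np.where(kx > 0, kx ** -1.5, 0.0)
p_t = np.where(kt > 0, kt ** -1.0, 0.0)

# outer product: separable (space x time) amplitude, diagonal in Fourier space
amplitude = np.outer(p_x, p_t)

xi = np.random.default_rng(5).standard_normal((n_x, n_t))
tau = np.fft.ifft2(amplitude * np.fft.fft2(xi)).real
```

Because both zero modes are zeroed out here, the resulting field has a vanishing total sum, which is one way to see that the zero-mode amplitude controls the overall offset and needs the separate treatment described in the text.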
The integrated Wiener process in logarithmic coordinates γ^(i)(l) reads

γ^(i)(l) = m^(i) (l − l_0) + η^(i) ∫_(l_0)^(l) ∫_(l_0)^(l′) ξ_W^(i)(l″) dl″ dl′,   (20)

where l_0 is the logarithm of the first mode greater than zero. Without loss of generality, we set the initial offset to zero; later on we explicitly parameterize it in terms of a more intuitive quantity. The parameter m^(i) is the slope of the amplitude on a double-logarithmic scale. It is a highly influential quantity, as it controls the overall smoothness of the logarithmic sky brightness distribution. Specifically, after exponentiation, the spectrum is given as a power law with multiplicative deviations, and the exponent of this power law is given by the slope. Therefore, a spectrum with slope zero indicates the absence of any spatial correlation in the image, whereas a slope of −1 indicates continuous, and −2 differentiable, brightness distributions along the respective axis [26]. The parameter η^(i) describes how much the amplitude spectrum deviates from the power law. These deviations follow the smooth integrated Wiener process and can capture characteristic length scales of the logarithmic brightness distribution. Their precise shape is encoded in the realization ξ_W^(i), with P(ξ_W^(i)) = N(ξ_W^(i)|0, 1), which is also a parameter of our model and a priori follows the standard Gaussian distribution. We do not want to fix the slope and deviations, and therefore impose Gaussian and log-normal priors for j ∈ {m, η}, respectively, with preference for a certain value µ_j^(i) and expected deviations σ_j^(i) thereof:

m^(i) = µ_m^(i) + σ_m^(i) ξ_m^(i),  η^(i) = e^(µ_η^(i) + σ_η^(i) ξ_η^(i)),

with standard Gaussian parameters ξ_m^(i) and ξ_η^(i). The amplitude spectrum defines the expected variation U^(i) of the log-brightness around its offset via

U^(i) := Σ_(k≠0) (p^(i)(|k|))².

The relation between γ^(i) and U^(i) is unintuitive, but it is critical to constrain the expected variation to reasonable values, as it has a severe impact on a priori plausible brightness distributions. Therefore, we replace the variance amplitude (i.e. the square root of U^(i)) with a new parameter a^(i):

p^(i)(|k|) → a^(i) p^(i)(|k|) / √U^(i)  for k ≠ 0.

Note that this step implicitly determines the offset of the Wiener process in terms of a^(i). We elevate a^(i) to be a free model parameter and impose a log-normal model analogous to η^(i), with hyperparameters µ_a^(i) and σ_a^(i). Next, we combine correlation structures in independent sub-domains. For every one of those, i.e. in our case space, time, and frequency, we use an instance of the model described above. We have not yet specified how to deal with the amplitudes of the zero-modes p^(i)(0); their treatment emerges from the combination of the sub-domains. The overall correlation structure including all sub-domains is given by the outer product of the sub-spaces:

A = A^(x) ⊗ A^(t) ⊗ A^(ν).

This product introduces a degeneracy: A^(i) ⊗ A^(j) = (α A^(i)) ⊗ (α^(−1) A^(j)) for all α ∈ R_+. With every additional sub-domain we add one additional degenerate degree of freedom. We can use this freedom to constrain the zero-modes of the amplitude spectra, and thus remove the degeneracy up to a global factor. For this we normalize the amplitudes in real space such that the zero-mode of the normalized amplitude Ã^(i) is fixed to the total volume V^(i) of the space Ω^(i). Consequently, the overall correlation structure is expressed as

A = α (Ã^(x) ⊗ Ã^(t) ⊗ Ã^(ν)).   (26)

The remaining multiplicative factor α globally sets the scale in all sub-domains and has to be inferred from the data. We put a log-normal prior on this quantity, with logarithmic mean µ_α and standard deviation σ_α as hyperparameters and a corresponding standard Gaussian parameter ξ_α. This was the last ingredient for the correlation structure along multiple independent sub-domains, and it serves as a generative prior for inferring the correlation structure in a space-time-frequency imaging problem. For the specific application to the EHT observations, however, only data averaged down to two narrow frequency channels are available. As we do not expect to be able to infer a sensible frequency correlation structure from only two channels, we simplify eq. (26) by explicitly parameterizing the frequency correlations; the remaining hyperparameter steers the a priori correlation between the two frequency channels.
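A minimal numerical sketch of the amplitude-spectrum model (our own discretization of the integrated Wiener process; the grid, slope, and fluctuation values are illustrative): with the deviations switched off, the spectrum reduces to a pure power law whose exponent is the slope.

```python
import numpy as np

rng = np.random.default_rng(6)

def amplitude_spectrum(k, slope, eta, xi_w):
    """Integrated Wiener process for the log-amplitude gamma(l) in
    l = log k: a linear part with slope `slope` plus smooth deviations of
    strength `eta`, driven by standard-normal increments `xi_w` that are
    cumulatively summed twice (Wiener process, then its integral)."""
    l = np.log(k)
    l0 = l[0]
    dl = np.diff(l, prepend=l0)              # spacing in log k (first entry 0)
    wiener = np.cumsum(np.sqrt(dl) * xi_w)   # Wiener process in l
    integrated = np.cumsum(wiener * dl)      # integrated Wiener process
    gamma = slope * (l - l0) + eta * integrated
    return np.exp(gamma)

k = np.arange(1, 65, dtype=float)
p = amplitude_spectrum(k, slope=-1.5, eta=0.2, xi_w=rng.standard_normal(64))
```

The exponentiation guarantees a positive spectrum, and setting eta to zero recovers k^slope exactly, matching the power-law interpretation of the slope given in the text.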
We briefly summarise all the required hyperparameters and how the generative model for the correlation structure is built. We start with the correlations in the individual sub-domains, which we describe in terms of their amplitude spectra A^(i)(ξ^(i)). Four distinct standardized model parameters are inferred from the data, ξ^(i) := (ξ_m^(i), ξ_η^(i), ξ_W^(i), ξ_a^(i)). The first describes the slope of the linear contribution to the integrated Wiener process. The second is related to the strength of the smooth deviations from this linear part. The third parameter describes the actual form of these deviations. Finally, the last one describes the real-space fluctuations of the associated field.
The hyperparameters are µ_j^(i) and σ_j^(i) for j ∈ {m, η, a}, specifying the expected mean and standard deviation of the slope m^(i), and the expected mean and standard deviation of ln(η^(i)) and ln(a^(i)), which are thereby enforced to be positive. In addition to these, we have to determine the global scale parameter α(ξ_α), for which we also specify a logarithmic mean µ_α and standard deviation σ_α. We determine the values for the hyperparameters of the logarithmic quantities through an additional moment-matching step, by explicitly specifying the mean and standard deviation of the log-normal distribution. The generative model for the correlation structure is therefore

A(ξ_A) = α(ξ_α) (Ã^(x)(ξ^(x)) ⊗ Ã^(t)(ξ^(t)) ⊗ Ã^(ν)(ξ^(ν))),

with ξ_A := (ξ_α, ξ^(x), ξ^(t), ξ^(ν)). Combining this with the generative model for the sky brightness itself, we end up with the full model

s(ξ) = e^(A(ξ_A) ξ_s),

with ξ := (ξ_A, ξ_s). Our model is now standardized, and all its parameters ξ = (ξ_A, ξ_s) follow a multivariate standard Gaussian distribution a priori. The Bayesian inference problem is fully characterized by the negative logarithm (or information) of the joint probability distribution of data and parameters. Combining the closure likelihoods with the described sky brightness model therefore yields

H(d_ρ, ϕ_d, ξ) = H_0 + ½ ξ†ξ + ½ (d_ρ − ρ_cl(ξ))† Λ_ρ^(−1) (d_ρ − ρ_cl(ξ)) + (ϕ_d − ϕ(ξ))† Λ_φ^(−1) (ϕ_d − ϕ(ξ)),

where H_0 is a constant that is independent of the latent variables ξ.

Metric Gaussian Variational Inference
So far, we have developed a probabilistic model in the generative form of the joint distribution of data and model parameters. Ultimately, we want to know what the data tell us about the model parameters, as given by the posterior distribution according to Bayes' theorem.
Our model is non-conjugate, so we cannot solve for the posterior analytically. Instead, we approximate it with a Gaussian using variational inference. This is fundamentally problematic, as we are approximating a multimodal posterior, which has multiple local optima, with a unimodal distribution. In the end, only one mode of the posterior is captured by the variational distribution, underestimating the overall uncertainty. Some of these modes can be considered equivalent. For example, the absolute source location is constrained neither by the closure phases nor by the prior, but it is also irrelevant for the analysis. However, this shift invariance also introduces several unphysical and pathological modes into the posterior, which may carry little probability mass but are nevertheless local optima. An example is the appearance of multiple or partial copies of the source all over the image.
Every reconstruction method that performs local optimization on closure quantities potentially runs into these issues, and our approach is no exception. Our chosen method and several procedures in our inference heuristic partially mitigate them and provide robust results. While we do not observe these pathological features in our main results, they do occur in the hyperparameter validation (see below). One principled way to overcome them is posterior sampling, but the scale of the envisioned inference task, with 7.4 × 10^6 parameters, is prohibitively large.
We use Metric Gaussian Variational Inference (MGVI), which allows us to capture posterior correlations between all model parameters despite the large scale of the inference problem. MGVI is an iterative scheme that performs a series of Gaussian approximations N(ξ | ξ̄, Ξ) to the posterior distribution. Instead of inferring a parametrised covariance, an expression based on the Fisher information metric evaluated at the intermediate mean approximation is used, i.e. Ξ ≈ I(ξ̄)^(−1). The first two terms of this metric originate from the likelihood and the last from the prior. All of these are expressed in terms of computer routines and we do not have to store the matrix explicitly. It is a non-diagonal matrix capturing correlations between all parameters. To infer the mean parameter ξ̄ we minimise the Kullback-Leibler (KL) divergence between the true posterior and our approximation. This quantity is an expectation value over the Gaussian approximation and measures the overlap between the true posterior and our approximation. As we minimise it, the normalisation of the posterior distribution is irrelevant and we can work with the joint distribution of data and model parameters, as given by eq. (32). We estimate the KL divergence stochastically by replacing the expectation value with a set of samples from the approximation. The structure of the implicit covariance approximation allows us to draw independent samples from the Gaussian for a given location. Using the mean of the Gaussian plus and minus samples corresponds to antithetic sampling [27], which reduces the sampling variance significantly, leading to performance increases. MGVI then alternates between drawing samples for a given mean parameter and optimising the mean given the set of samples. The main meta-parameters of this procedure are the number of samples and how accurately the intermediate approximations are performed. The procedure converges once the mean estimate ξ̄ is self-consistent with the approximate covariance. To minimise the KL divergence, we rely on an efficient quasi-second-order Newton-Conjugate-Gradient method in a natural gradient descent scheme. At the beginning of the procedure, the accuracy of the KL and gradient estimates, as well as the overall approximation fidelity, is less important. In practice we gradually increase the accuracy with the number of MGVI iterations to gain overall speedups.
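The antithetic stochastic KL estimate can be illustrated with a toy negative log joint. This is an illustrative sketch, not the NIFTy implementation; all names are ours:

```python
import numpy as np

def stochastic_kl(neg_log_joint, mean, residuals):
    """Stochastic KL estimate with antithetic sample pairs: every residual
    sample r drawn from the approximate Gaussian is used twice, as
    mean + r and mean - r."""
    values = [neg_log_joint(mean + r) for r in residuals]
    values += [neg_log_joint(mean - r) for r in residuals]
    return np.mean(values)

rng = np.random.default_rng(0)
residuals = rng.normal(size=(3, 5))   # 3 sample pairs in 5 dimensions
mean = np.zeros(5)

# for any linear term in the negative log joint, the antithetic
# contributions cancel exactly -- the source of the variance reduction
kl_linear = stochastic_kl(lambda x: x.sum(), mean, residuals)
```

For a purely linear integrand the estimate is exact with zero sampling variance; for general integrands the cancellation of odd terms still reduces the variance substantially.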

Implementation Details
We implement the generative model in NIFTy [15], which also provides an implementation of MGVI using auto-differentiation. We represent the spatial domain with 256 × 256 pixels, each with a side length of 1 µas. In the time domain we choose a resolution of 6 hours for the entire observation period of 7 days, thus obtaining 28 time frames. The implementation of the generative model uses the Fast Fourier Transform and thus defines the resulting signal on a periodic domain. To avoid artefacts in the time domain, we append another 28 frames after the observed interval, resulting in a temporal domain of twice the size.
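The doubling of the temporal domain can be sketched as follows. The power-law slope used here is arbitrary (in the actual method the spectrum is inferred); only the grid sizes follow the text:

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs = 28                     # 6-hour frames covering the 7-day period
n_tot = 2 * n_obs              # doubled periodic FFT domain

# stationary random field drawn from an (arbitrary) power-law spectrum
k = np.fft.fftfreq(n_tot)
power = np.zeros(n_tot)
power[1:] = np.abs(k[1:]) ** -2.0

xi = rng.normal(size=n_tot) + 1j * rng.normal(size=n_tot)
field = np.fft.ifft(np.sqrt(power) * xi).real

# only the first half is ever compared to data; the padding frames
# absorb the periodic wrap-around enforced by the FFT
frames = field[:n_obs]
```

Because the FFT makes the last frame of the grid the neighbour of the first, the unobserved padding decouples the end of the observation from its beginning.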
For the frequency domain, only two channels are available, and we do not expect them to differ much from each other. Instead of inferring the correlation along this direction, as we do for the spatial and temporal axes, we assume a correlation between the two channels at the 99 % level a priori, i.e. we set the expected frequency deviation to 0.01.
This adds another factor of 2 to the required number of pixels in the reconstruction. For future reconstructions with deeper frequency sampling we can extend the model and treat this domain equivalently to the space and time domains. Overall we have to constrain 256 × 256 × 56 × 2 pixel values plus the power-spectrum degrees of freedom, about 7.4 × 10^6 parameters in total, with the data.
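The quoted parameter count follows directly from the grid sizes:

```python
# 256 x 256 spatial pixels, 56 time frames (28 observed + 28 padding),
# 2 frequency channels; the power-spectrum parameters come on top
n_pixel_dofs = 256 * 256 * 56 * 2   # 7_340_032, i.e. roughly 7.4 million
```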
The Gaussian approximation to the closure likelihoods is only valid in high signal-to-noise regimes [21]. We increase the signal-to-noise ratio by means of an averaging procedure, which subdivides each individual scan into equally sized bins with a length of approximately 2 min. To validate that this averaging is justified, we compare the empirical standard deviation of the averaged data values with the corresponding thermal noise standard deviation and find their ratio to be 1.48 on average, consistent with the √2 expected for complex-valued data.
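The consistency check on the averaged data can be reproduced on simulated thermal noise. The bin size and noise level below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_th = 0.1                 # thermal noise std of real and imaginary parts
n_bins, n_per_bin = 2000, 30   # ~2-minute bins of visibility samples

vis = (rng.normal(scale=sigma_th, size=(n_bins, n_per_bin))
       + 1j * rng.normal(scale=sigma_th, size=(n_bins, n_per_bin)))

# empirical scatter of the complex samples inside each bin
dev = vis - vis.mean(axis=1, keepdims=True)
emp_std = np.sqrt(np.mean(np.abs(dev) ** 2, axis=1))

# for complex-valued data, |z - <z>|^2 collects both quadratures,
# so the expected ratio to the per-quadrature thermal std is sqrt(2)
ratio = emp_std.mean() / sigma_th
```

A measured ratio noticeably above √2 (such as the 1.48 quoted in the text) would indicate genuine intra-bin variation on top of the thermal noise.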
The intra-site baselines of ALMA-APEX and SMT-JCMT probe the sky at scales larger than our field of view.To avoid contamination from external sources, we flag these intra-site baselines and exclude closure quantities that involve the respective pair.

Hyperparameters
The hyperparameter choices for the presented reconstruction are given in table 2. All hyperparameters except the expected frequency deviation come in pairs of mean µ and standard deviation σ, parametrizing a Gaussian or log-normal distribution for the respective parameter. This indirect hyperparameter setting induces a form of parameter search for each parameter, restricting it to within a few standard deviations of the mean.
An exception to this is the frequency domain, for which we only have two channels available. Here, we set an a priori difference of 1 %, which is of the same order of magnitude as the relative difference in frequency, 0.9 %. The posterior can deviate from this value, governed by the overall scale α. This parameter controls the a priori expected variance of the average logarithmic sky brightness and of the difference between the two frequencies. For this overall scale α, we set the mean to 0.2 with a standard deviation of 0.1. Since we normalize the flux of the final model, this parameter only controls the expected deviations between the frequency channels and has no other major effect. A deviation of about half an e-fold would be expected with these hyperparameter settings, as it corresponds to the sum of the two means.
Our choices for the remaining hyperparameters are motivated by being maximally agnostic with respect to the magnitude and shape of spatial correlations, while fixing the temporal correlations to be moderate. By constraining the a priori slope of the spatial amplitude to µ_m^(x) = −1.5 with a standard deviation of σ_m^(x) = 0.5, we allow the model to express structures ranging from the rough Wiener process to the smooth integrated Wiener process within one standard deviation. The overall variance of the logarithmic sky brightness with respect to its spatial mean is set to be a priori log-normal distributed with mean 0.7 and standard deviation 1. A standard deviation larger than the mean induces a log-normal distribution with a heavy tail, thus allowing for potentially large posterior spatial fluctuations.
The flexibility parameter η specifies the degree to which the power spectrum can deviate from a power-law shape and thereby introduce characteristic length- or time-scales. We choose small values for its mean (0.01) and standard deviation (0.001), discouraging such characteristic scales in both time and space. Still, strong deviations from a power law remain possible if the data demand it (see fig. 7).
In the time domain we do not expect strong variability, owing to the physical scale of the system, which extends over several light-days. We express this through the slope of the temporal amplitude, setting its expected mean to a steep value (see table 2) that imposes long correlations in time. The overall temporal fluctuations are again relatively unconstrained, with mean 0.2 and standard deviation 1. To test the sensitivity of our method, we perform a dedicated hyperparameter study in a later paragraph.

Inference Heuristic
Here we motivate the choices in our inference heuristic, as described in table 3. They are ad hoc, but the described procedure provides robust results for all examples, given the described set of hyperparameters.
Our initial parametrization corresponds to a signal configuration that is constant in time and shows a Gaussian shape centred in the field of view with a standard deviation of 30 µas. This breaks the translation symmetry of the posterior distribution, concentrating the flux towards the centre. It does not fully prevent the appearance of multiple source copies, but they are no longer scattered throughout the entire image. A similar trick is also employed in the EHT-Imaging pipeline.
The next issue we face is "source teleportation". Nearby frames are well constrained by our assumed correlation, but the data gap of four days allows for solutions in which the source disappears at one place and re-appears at another. This is also due to the lack of absolute position information and is not prevented by our dynamics prior. To avoid such solutions, we initially use only the data of the first two days. For these we recover one coherent source, which is extrapolated in time. Once we include the data of the remaining two days, the absolute location is already fixed and only deviations from, and information additional to, the earlier times have to be recovered.
The appearance of multiple source copies can be attributed to the multi-modality of the posterior. The stochastic nature of MGVI helps, to some degree, to escape these modes towards more plausible solutions. Nevertheless, this is not enough for strongly separated optima. We therefore employ a tempering scheme during the inference. The closure phases constrain the relative locations in the image, whereas the closure amplitudes constrain the relative brightness. Smoothly aligning source copies while keeping the amplitudes fixed is either impossible or numerically stiff. Allowing the reconstruction to violate the observed closure amplitudes for a short period makes it easier to merge all copies into a single instance. We achieve this by not considering the closure-amplitude likelihood during one intermediate MGVI step. The analogous issue arises with the roles reversed, so we alternate between a phase-only and an amplitude-only likelihood, always performing a step with both likelihoods in between. We start this procedure after a fixed number of steps, allowing a rough source shape to form beforehand, and in the end we use the full likelihood for several steps. MGVI requires specifying the number of sample pairs used to approximate the KL divergence. The more samples we use, the more accurate the estimate, but the larger the overall computational load. We steadily increase the number of samples throughout the inference for two reasons. Initially, the covariance estimate describes the posterior mode only inaccurately, and a large number of samples would be a waste of computational resources. Additionally, fewer samples increase the stochasticity of the inference, which makes it more likely to escape pathological modes of the posterior. Towards the end, it is worth investing computational power in a large number of samples in order to obtain accurate uncertainty estimates.
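The alternation between the partial likelihoods can be written as a simple schedule. The step counts, labels, and ordering below are made up for this sketch; the actual schedule is specified in table 3:

```python
def likelihood_for_step(step, warmup=10, tempering_end=30):
    """Illustrative tempering schedule: full likelihood during warm-up,
    then alternating phase-only and amplitude-only steps with
    full-likelihood steps in between, then full likelihood to the end."""
    if step < warmup or step >= tempering_end:
        return "phases+amplitudes"
    cycle = ["phases", "phases+amplitudes", "amplitudes", "phases+amplitudes"]
    return cycle[(step - warmup) % 4]
```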
Finally, we have to specify how, and how well, the KL is optimized in every MGVI step. In the beginning, we do not want to optimize too aggressively, as we only use a limited number of samples and want to avoid over-fitting to the sample realizations. We therefore use the L-BFGS method [28] with an increasing number of steps. For the last period, where we have accurate KL estimates, we employ the more aggressive natural gradient descent, equivalent to SciPy's Newton-CG algorithm [29], to achieve deep convergence.
To demonstrate the robustness of this procedure, we perform the reconstruction of M87* and of the six validation examples (see below) for five different random seeds, in total 35 full reconstructions. Using the described heuristic, we do not encounter any of the discussed pitfalls and obtain consistent results. This corresponds to a success rate of at least 97 %.

Method validation
Synthetic observations. We validate our method on six synthetic examples, three of which exhibit temporal variation. The first two time-variable examples are crescents with an evolution of the angular asymmetry on time scales similar to what was measured by the EHT collaboration for M87*. They are toy models of the vicinity of the black hole and are defined analogously to [4, Section C.2], where r0 is the ring radius, A the ring asymmetry, w the full width at half maximum of the ring, and x, y, and t are the space and time coordinates. We choose two sets of parameters. The first, called eht-crescent, follows the validation analysis of the EHT collaboration [4]: r0 = 22 µas, A = 0.23, and w = 10 µas. The second, called slim-crescent, has a smaller radius, a more pronounced asymmetry, and a sharper ring: r0 = 20 µas, A = 0.5, and w = 3 µas. As a third example, called double-sources, we choose two Gaussian shapes with full width at half maximum r = 20 µas that approach each other: b(t, x, y) ∝ b₁(α sin φ, α cos φ; t, x, y) + b₁(−α sin φ, −α cos φ; t, x, y), where x, y, and t are the space and time coordinates, α is the time-dependent distance, and φ the time-dependent angle. The static examples consist of a uniform disk with blurred edges and two simulations of black holes, challenge1 and challenge2, taken from the EHT imaging challenge [30]. The blurred disk has a diameter of 40 µas and its brightness depends only on the spatial coordinates x and y.
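A crescent of this kind can be sketched numerically. The azimuthal modulation 1 + 2A sin(φ − φ0) is one common parametrization of such toy crescents and may differ in detail from the paper's exact definition; the grid and φ0 are our choices:

```python
import numpy as np

def crescent(x, y, r0=22.0, A=0.23, w=10.0, phi0=0.0):
    """Blurred asymmetric ring: radius r0, asymmetry A, FWHM w (all in uas)."""
    r = np.hypot(x, y)
    phi = np.arctan2(y, x)
    sigma = w / (2.0 * np.sqrt(2.0 * np.log(2.0)))   # FWHM -> Gaussian std
    ring = np.exp(-0.5 * ((r - r0) / sigma) ** 2)    # blurred thin ring
    asymmetry = 1.0 + 2.0 * A * np.sin(phi - phi0)   # brightness asymmetry
    return ring * asymmetry

# eht-crescent parameters from the text on a 256 x 256 grid
xs = np.linspace(-64.0, 64.0, 256)
X, Y = np.meshgrid(xs, xs)
img = crescent(X, Y)
```

For A < 0.5 the modulation stays positive everywhere, so the image is a valid (non-negative) brightness distribution.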
For our validation we simulate the M87* observation, using the identical uv-coverage, frequencies, and time sampling. We set the total flux of the example sources to 0.5 Jy and add the reported thermal noise from the original observation. We do not add non-closing errors, such as polarization leakage. We also ignore the existence of large-scale emission around the source, as would be expected for M87* [31]. This kind of emission only contributes significantly to the intra-site baselines [4]; by excluding these, we make sure that the large-scale emission does not affect our results. The reconstruction follows the identical procedure as for M87*, using the same hyperparameters and pixel resolution.
The results of the dynamic examples versus the ground truth, together with the pixel-wise uncertainty, are shown in fig. 5. For the static examples, we do not find time variability in the reconstructions; thus, we only show the first frame versus the ground truth, the smoothed ground truth, and the pixel-wise uncertainty. As the likelihood is invariant under shifts, offsets in the reconstruction are to be expected. We are able to recover the shapes of the different examples, irrespective of whether the source is static or not.
The ring-parameter analysis is applied to the two crescent scenarios as well. The results for the recovered diameter d, width w, and orientation angle η are shown in tables 4 and 5. Here we compare the ground truth to the analysis of the mean reconstruction, following the approach of the EHT collaboration. To propagate the uncertainty estimate of our reconstruction directly, we can extract the crescent parameters of all samples individually and obtain a mean estimate with an associated uncertainty. The variational approximation has a tendency to under-estimate the true variance, so this uncertainty should be regarded more as a lower limit. For the estimation of the ring diameter we adopt the approach described in Appendix G of [4] to correct the diameter for the bias due to finite resolution. We further discuss the recovered spatial correlation structures of the log-brightness in the section on the reconstruction of the correlation structure.
Starting with the first crescent, we recover the diameter d, orientation angle η, and asymmetry A well. The ground truth lies within the uncertainty of both procedures. The width w of the crescent is below the angular resolution of the telescope, so it is not surprising that we do not fully resolve it in the reconstruction; neither way of calculating the uncertainty accounts for this discrepancy. Interestingly, all quantities except the orientation angle are static in time. For this example, we additionally show the temporal evolution of selected points in fig. 10, analogously to M87*. The reconstruction follows the dynamics of the ground truth, as indicated by the dashed line.
The reconstruction of the slim-crescent proves more challenging. Due to the weak signal, we do not recover the faint part of the circle. For an accurate extraction of the ring parameters, however, this area is vital to constrain the radius, and here we only recover the orientation angle well. The diameter estimate has large error bars when following the approach of the EHT collaboration; in this scenario the uncertainty estimate appears too conservative. In contrast, using samples for the uncertainty provides significantly smaller error bars. This could be due to the variational approximation, which tends to under-estimate the true uncertainty.
The dynamics of the two Gaussian shapes are recovered accurately and our model correctly interpolates through the gap of three days without data.
Overall, our method is capable of accurately resolving dynamics that are comparable to the ones expected in M87*.Therefore, our findings regarding the temporal evolution of M87* may be trusted.
Figure 3 shows the relative spectral index of M87*, as well as of the eht-crescent validation example. In both cases, an increased spectral index coincides with the brightest spot on the ring. This is not a feature of the validation example, as we use a constant spectral index throughout the source. The apparent feature could emerge from differences in coverage, as well as from a bias due to the unimodal approximation. Nevertheless, these features are insignificant, as our reported posterior uncertainty is large enough to be consistent with a constant spectral index throughout the image. This finding is not surprising, given the small separation of the two channels. In principle our method is capable of providing a spectral index, but in this application the data are inconclusive.
The reconstructions of the three static examples are shown in fig. 6. For illustrative purposes we show not only the ground truth but also a blurred image of it, obtained by convolving with a Gaussian beam of 12 µas. Overall we recover the general shape and main features of the sources.
None of the validation reconstructions yield imaging artefacts that appear in any way similar to the elongated structure that our algorithm recovers in the south-western and north-eastern directions of M87*.Especially the eht-crescent model is accurately recovered without a trace of spurious structures.We conclude that the elongated features of M87* are either of physical origin or due to baseline-based errors and that they are not an artefact introduced by our imaging technique.

Hyperparameter validation
To study the sensitivity of our results with regard to the hyperparameters, we repeat the reconstruction of M87*, as well as of eht-crescent, with 100 randomized but shared configurations. We do not vary the standard-deviation-related hyperparameters, but sample the corresponding mean hyperparameters uniformly within three respective standard deviations. For the expected frequency deviation we sample log-uniformly between 0.001 and 0.5. Some of these configurations are numerically unstable and result in errors. Other configurations do not facilitate the emergence of a single source and exhibit typical VLBI artefacts, especially source doubling throughout the image. This behaviour is easy to detect and we label the results manually.
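The randomization scheme can be sketched as follows. The dictionary keys and the listed means and standard deviations are illustrative placeholders for the entries of table 2:

```python
import numpy as np

rng = np.random.default_rng(0)

# (mean, std) pairs standing in for the mean-type hyperparameters of table 2
hyper = {"spatial_slope": (-1.5, 0.5),
         "fluctuations": (0.7, 1.0),
         "scale": (0.2, 0.1)}

configs = []
for _ in range(100):
    # mean hyperparameters: uniform within three standard deviations
    cfg = {name: rng.uniform(mu - 3 * s, mu + 3 * s)
           for name, (mu, s) in hyper.items()}
    # expected frequency deviation: log-uniform between 0.001 and 0.5
    cfg["freq_dev"] = 10.0 ** rng.uniform(np.log10(0.001), np.log10(0.5))
    configs.append(cfg)
```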
The resulting mean sky brightness distributions can be found in figs. 11 and 12. The algorithm fails in 30 % of the cases, results in artefacts 5.5 % of the time, and facilitates the emergence of a single ring in 64.5 % of all cases. All of the latter cases exhibit extended structures for M87*, whereas we do not observe any similar features for eht-crescent. We are therefore confident that these structures do not originate from the choice of hyperparameters.
For a significant portion of the parameter configurations we do not find the shift of the brightness asymmetry. However, the results with a static source all exhibit a significantly higher reduced χ² value than the reconstructions that feature a shift in brightness asymmetry. In 23 % of all test cases we obtain reconstructions with shifting asymmetry, all of which are consistent with the main result of this paper. Figure 13 shows that all reported ring-fit parameters of our main result, including their uncertainties, are fully consistent with the hyperparameter validation.
There are two possible explanations for the absence of asymmetry shifts. First, the prior on the temporal evolution already favours slow dynamics, and sampling even more extreme values for this validation might lead to static reconstructions. Second, the inference heuristic was optimized for parameter sets similar to the one used for the main result, not for this large variety of cases. These pose completely different numerical challenges and might converge more slowly or exhibit different optima. Improvements to the heuristic would most probably lead to more robust behaviour over a larger parameter range.

Data consistency
The time-resolved residual χ² values of the closure phases and amplitudes for all validation examples, as well as for M87*, are shown in table 6. Additionally, in fig. 14, we display the noise-weighted residuals of the M87* reconstruction for the four observation periods as a function of time. We show the residual values for all posterior samples and for both frequency channels. In fig. 15 we show residuals for three baselines on April 11th, similar to fig. 13 of [6]. Note that the apparent time evolution is largely due to the rotation of the Earth, not to intrinsic source variability. Our inspection of the residuals validates that temporal changes in the data are captured by the reconstruction, as there is no systematic change of the residuals as time progresses in any of the four periods.
By using only closure quantities, station-dependent calibration terms have been fully projected out of our reconstruction. Since [3] not only performs partial calibration but also estimates the magnitude of the residual gains, performing self-calibration on our reconstruction provides an important consistency check. Our reconstruction is not a single result but a collection of approximate posterior samples, so individual calibration solutions need to be computed for each of them. Thereby, we obtain an uncertainty estimate on the gains, which we expect to be consistent with the pre-calibrated gains from the telescopes.
The negative log-amplitude gains for all stations and days are shown in fig. 16 and, for reference, fig. 17 depicts the sample-averaged dirty images of the calibrated data, overlaid with contours of the posterior mean image. We can reproduce the issues with the calibration of the LMT station that have been reported
by the EHT collaboration [4]. Apart from those, the pre-calibrated visibilities agree with our result within the uncertainty.

Reconstruction of the Correlation Structure
The recovered spatial correlation structures for the log-brightness, as well as for the brightness itself, are shown in fig. 7. The power spectrum of the brightness, P_s, is related to the amplitude spectrum |A|² of the log-brightness through the Fourier transformation F as defined in eq. (31). On large scales, the recovered spectra agree with the ground truth within the error bounds. Our examples do not have prominent small-scale features, so the ground-truth power spectra drop off rapidly.
We have only limited data on these scales due to the measurement setup, so the reconstruction is primarily informed by the prior distribution. As the prior favours power-law-like behaviour, the large-scale information about the slope of the spectrum is extrapolated as a straight line towards small-scale modes. Deviations from a straight line can therefore not be captured in these regions, and the variability of these deviations is limited by the prior variance.
In addition, the posterior statistical properties of the power spectrum cannot be fully captured by the variational approximation of MGVI.In particular, for small-scale features, the posterior uncertainty is asymmetric since deviations above and below the mean have an asymmetric effect on the observed data: if the mean power of these scales is small compared to the power on large scales, further decreasing the power on these scales has almost no effect on the observed data whereas increasing the small-scale power has a significant impact.The forced symmetry of the posterior uncertainty can lead to an over-estimation of the small-scale power as the uncertainty towards less power is underestimated (see fig. 7).
On large image scales, where good constraints from the data are available, the correlation matches the ground truth exceptionally well, including characteristic features such as the disk diameter.The spectra of the two simulations based on the EHT imaging challenge, challenge1 and challenge2, are an exception.We believe that the mismatch is explained by the diverse and pronounced structure of the simulations on all scales that cannot be resolved by the data.

Data Availability
The data this work is based on have been published by the Event Horizon Telescope Collaboration [3, 4] and are available at [9]. We provide a set of 160 antithetic sample pairs of the sky brightness from the approximate posterior distribution, which can be used to propagate uncertainty to any derived quantity. The samples are available at [32].

Figure 1 :
Figure 1: Temporal evolution of the brightness distribution. All panels are constrained to half the reconstructed field of view. The first row shows time frames of the image cube, one for each day. The second row visualises the brightness of day N+1 minus day N; red and blue indicate increasing and decreasing brightness over time, respectively. The third row visualises the relative difference in brightness over time. The over-plotted contour lines show brightness in multiplicative steps of 1/√2, starting at the maximum of the posterior mean of our reconstruction; the solid lines correspond to factors of powers of two from the maximum.

Figure 2 :
Figure 2: Brightness distribution on the first day. The top row shows the reconstructed mean and relative error. Note that the small-scale structure in regions of high uncertainty in the error map is an artefact of the limited number of samples. The bottom left shows a saturated plot of the approximate posterior mean, revealing the emission zones outside the ring. The bottom right shows the result of the EHT-imaging pipeline for comparison, saturated to the same scale and with over-plotted contour lines. The contour lines show brightness in multiplicative steps of 1/√2, starting at the maximum of the posterior mean of our reconstruction; the solid lines correspond to factors of powers of two from the maximum.

Figure 3 :
Figure 3: Relative spectral index. Mean and pixel-wise uncertainty of the relative spectral index, as calculated from the 227 GHz and 229 GHz channels, for M87* (top) and the eht-crescent example (bottom).

Figure 4 :
Figure 4: Temporal evolution at selected locations. The brightness and flux of approximate posterior samples and their ensemble mean at specific sky locations and areas, as indicated in the central panel. The peripheral panels show brightness and flux values of samples (thin lines) and their mean (thick lines). Of those, the bottom-right one displays the flux inside (red) and outside (green) the circle, as well as the sum of the two (blue). For comparability, only brightness within the field of view of the EHT collaboration image, indicated by the black box in the central plot, is integrated. The remaining panels give the local brightness at the different locations labelled by numbers in the central panel. The single-day results from EHT-imaging are indicated as points.

Figure 5 :
Figure 5: Validation on synthetic observations of time-variable sources. Time runs from left to right, showing slices through the image cube for the first time bin of each day. Different source models are shown from top to bottom: eht-crescent, slim-crescent, and double-sources. For each source the ground truth, the approximate posterior mean of the reconstruction, and the relative standard deviation, clipped to the interval [0, 1], are displayed (from top to bottom). The central three columns show moments in time for which no data are available, since data were taken only during the first and last two days of the week-long observation period.

Figure 6:
Figure 6: Validation for static sources. We show two scenarios from the EHT imaging challenge and a uniform disk. The rows depict the ground truth, the smoothed ground truth, the approximate posterior mean, and the relative standard deviation for our three static validation examples. The plots in the first three rows are normalized to their respective maximum, are not clipped, and the minimum of the colour bar is zero. In the last row the colour bar is clipped to the interval [0, 1].

Figure 9 :
Figure 9: Graphical structure of our model. The hierarchical model used as a prior on the four-dimensional (frequency, time and space) image s, as described in the methods section. The round dashed nodes represent the inferred latent parameters, which are independently normal-distributed a priori. The solid rectangular nodes represent computation steps. Arrows denote dependencies. All hyperparameters are marked in teal. The upper half of the diagram describes our non-parametric model of the power spectra in the temporal and spatial domains. The lower half specifies how the four-dimensional image is obtained from additional latent parameters and the power spectra.

Figure 10:
Figure 10: The time evolution of the validation data set eht-crescent. Analogous to fig. 4. The dashed lines represent the ground truth. In subfigures 5 to 7 the ground truth is constantly zero.

Figure 12:
Figure 12: Approximate posterior mean of the eht-crescent validation for 100 different hyperparameter settings, analogous to Supplementary fig. 11.

Figure 13:
Figure 13: Histograms of the crescent parameters over all hyperparameter validation runs with a total reduced χ² value below 1.2. The vertical lines and shaded areas display the approximate posterior means and 1-σ standard deviations as reported in table 1 and Supplementary table 4.

Figure 14:
Figure 14: Absolute noise-weighted residuals for the closure phases ϕ (left column) and closure amplitudes (right column) for the two frequency channels (227 GHz in red, and 229 GHz in blue), for the M87* reconstruction for all samples. Each row corresponds to one of the four observational days, and the x-axis of each panel corresponds to the time (in hours) elapsed on that day.

Figure 15:
Figure 15: Closure phases for selected antenna triplets in time. The plot shows observations made at 229 GHz (black dots) on 11 April, with black lines indicating the 1-σ measurement error, as well as synthetic measurements from 20 uncurated posterior samples (transparent maroon lines) and the corresponding posterior mean (solid maroon line). Note that these data are not directly employed in our algorithm, since we further process the data to take correlations between different closure phases into account, as described in the likelihood subsection of the methods section.
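For reference, the closure phase of an antenna triplet is the argument of the product of the three complex visibilities around the triangle; station-based phase errors cancel in this combination, which is what makes closure phases robust observables. A minimal numerical check (illustrative only, not our processing pipeline):

```python
import numpy as np

def closure_phase(v_ab, v_bc, v_ca):
    """Closure phase of an antenna triplet: the phase of the product of the
    complex visibilities around the triangle (a, b, c)."""
    return np.angle(v_ab * v_bc * v_ca)

rng = np.random.default_rng(0)
# Three true visibilities on the baselines (a,b), (b,c), (c,a):
v_ab, v_bc, v_ca = rng.normal(size=3) + 1j * rng.normal(size=3)
# Corrupt each baseline with station-based phase errors g_a, g_b, g_c:
g = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=3))
corrupted = (g[0] * np.conj(g[1]) * v_ab,
             g[1] * np.conj(g[2]) * v_bc,
             g[2] * np.conj(g[0]) * v_ca)
# The station phases cancel around the closed triangle:
same = np.isclose(closure_phase(*corrupted), closure_phase(v_ab, v_bc, v_ca))
```

In the corrupted product, each station gain appears once and once conjugated, so its phase drops out of the closed loop.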

Figure 16:
Figure 16: Normalised negative log-amplitude residual gains. Mean and 1-σ standard deviation of the normalised negative log-amplitude residual gains g. The a priori noise budget is taken from [4, table 14].

Table 2:
Hyperparameters of the generative model. The first column indicates the symbol of the parameter as used in the manuscript. The second and third columns denote the a priori mean and standard deviation. All quantities for which positivity is enforced are modelled with a log-normal distribution; their corresponding logarithmic mean and logarithmic standard deviation are reported in the fourth and fifth columns. The expected frequency deviation is fixed and not inferred.
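The logarithmic parameters in the fourth and fifth columns follow from the a priori mean and standard deviation by standard log-normal moment matching. A short sketch (the function name is ours):

```python
import numpy as np

def lognormal_params(mean, std):
    """Convert the a priori mean and standard deviation of a positive
    hyperparameter into the logarithmic mean and logarithmic standard
    deviation of the log-normal distribution with those two moments."""
    log_var = np.log1p((std / mean) ** 2)     # sigma_log^2 = ln(1 + std^2/mean^2)
    log_mean = np.log(mean) - 0.5 * log_var   # mu_log = ln(mean) - sigma_log^2/2
    return log_mean, np.sqrt(log_var)

# E.g. a fluctuation hyperparameter with a priori mean 0.2 and std 1:
mu, sigma = lognormal_params(0.2, 1.0)
samples = np.exp(np.random.default_rng(1).normal(mu, sigma, size=1_000_000))
```

Exponentiating normal draws with these logarithmic parameters reproduces the tabulated mean and standard deviation, which can be verified by the sample moments above.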

Table 3:
Minimisation scheme used for the inference. In addition to the mentioned samples, their antithetic counterparts were used as well.

Table 4:
The crescent parameters recovered from the eht-crescent validation example versus ground truth. Analogous to table 1.

Table 5:
The crescent parameters recovered from the slim-crescent validation example versus ground truth. Analogous to table 1.

Figure 11:
Approximate posterior mean of M87* for 100 different hyperparameter settings within ±3 standard deviations around our chosen hyperparameters. The coloured borders indicate the quality of the result: green for results that are visually similar to the main run, red for unphysical results, and black (with plain white image content) for runs in which the algorithm diverged numerically. Thick and thin green borders correspond to results with a total reduced χ² value below and above 1.2, respectively.

Table 6:
Sample-averaged reduced χ² values. The left and right values are the reduced χ² values for the closure-phase and closure-amplitude likelihoods, respectively. The reduced χ² values for the EHT-imaging results are computed with the likelihood presented in this work. Note that the EHT-imaging results themselves were obtained using a different likelihood, and their reduced χ² value is not a posterior average but is computed from the mean image.
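For reference, the reduced χ² reported here is the noise-weighted sum of squared residuals divided by the number of data points; values near 1 indicate that the residuals are consistent with the reported noise level. A toy sketch (our own simplification, ignoring fitted degrees of freedom):

```python
import numpy as np

def reduced_chi2(data, model, sigma):
    """Noise-weighted reduced chi-squared over all data points.
    Values near 1 mean the residuals match the assumed noise level."""
    data, model, sigma = map(np.asarray, (data, model, sigma))
    return np.sum(((data - model) / sigma) ** 2) / data.size

# Toy check: residuals drawn from the assumed noise give chi2_red close to 1.
rng = np.random.default_rng(2)
sigma = np.full(10_000, 0.3)     # assumed per-point noise level
model = np.zeros(10_000)         # model prediction
data = rng.normal(model, sigma)  # data consistent with the noise budget
chi2 = reduced_chi2(data, model, sigma)
```

Values well above 1, as used for the 1.2 cut in the hyperparameter validation, signal residuals larger than the noise budget allows; values well below 1 signal overfitting or overestimated errors.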