Posterior samples of the parameters of binary black holes from Advanced LIGO, Virgo’s second observing run

This paper presents a parameter estimation analysis of the seven binary black hole mergers—GW170104, GW170608, GW170729, GW170809, GW170814, GW170818, and GW170823—detected during the second observing run of the Advanced LIGO and Virgo observatories using the gravitational-wave open data. We describe the methodology for parameter estimation of compact binaries using gravitational-wave data, and we present the posterior distributions of the inferred astrophysical parameters. We release our samples of the posterior probability density function with tutorials on using and replicating our results presented in this paper.

used in the likelihood are generated using the IMRPhenomPv2 30,31 waveform model implemented in the LIGO Algorithm Library (LAL) 32 . The parameters $\vec{\vartheta}$ measured in the ensemble MCMC for these seven events are: right ascension $\alpha$, declination $\delta$, polarization $\psi$, component masses in the detector frame $m_1^{\mathrm{det}}$ and $m_2^{\mathrm{det}}$, luminosity distance $d_L$, inclination angle $\iota$, coalescence time $t_c$, magnitudes of the spin vectors $a_1$ and $a_2$, azimuthal angles of the spin vectors $\theta_1^a$ and $\theta_2^a$, and polar angles of the spin vectors $\theta_1^p$ and $\theta_2^p$. We analytically marginalize over the fiducial phase $\phi$. For efficient sampling of the parameter space and faster convergence of the Markov chains, we apply a transformation from the mass parameters that define the prior $(m_1^{\mathrm{det}}, m_2^{\mathrm{det}})$ to the chirp mass $\mathcal{M}^{\mathrm{det}}$ and mass ratio $q$. While sampling, we allow the mass ratio $q$ to be both greater and less than 1.
For GW170104, we assume uniform priors for detector-frame component masses $m_{1,2}^{\mathrm{det}} \in [5.5, 160)\,M_\odot$. When generating the waveform in the MCMC, the masses are transformed to the detector-frame chirp mass $\mathcal{M}^{\mathrm{det}}$ and $q$ with a restriction [...]
www.nature.com/scientificdata
[...] for GW170809, $d_L \in [10, 1500)$ Mpc for GW170814, $d_L \in [10, 3000)$ Mpc for GW170818, and $d_L \in [10, 5000)$ Mpc for GW170823. The priors for the remaining parameters are the same for all the events. For spin magnitudes, we use uniform priors $a_{1,2} \in [0.0, 0.99)$. We use a uniform solid angle prior for the spin angles, assuming a uniform distribution for the spin azimuthal angles $\theta_{1,2}^a \in [0, 2\pi)$ and a sine-angle distribution for the spin polar angles $\theta_{1,2}^p$. We use uniform priors for the arrival time $t_c \in [t_s - 0.1\,\mathrm{s}, t_s + 0.1\,\mathrm{s})$, where $t_s$ is the trigger time of the event being analyzed, reported in refs 2-4,6 . For the sky location parameters, we use a uniform prior for $\alpha \in [0, 2\pi)$ and a cosine-angle prior for $\delta$. We use a uniform prior for the polarization angle $\psi \in [0, 2\pi)$ and a sine-angle prior for the inclination angle $\iota$. The mass and spin priors for GW170104 are the same as those mentioned for the final analysis using the "effective precession" model in ref. 2 .
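The angular priors described above can be illustrated with a minimal sketch that draws from them by inverse-transform sampling. The function names are hypothetical and are not part of PyCBC Inference; the sketch only assumes the stated distributions (uniform azimuth, sine-angle polar angle, cosine-angle declination, uniform spin magnitude).

```python
import math
import random

def draw_isotropic_angles(rng):
    """Draw (azimuthal, polar) angles that are uniform in solid angle."""
    azimuthal = 2.0 * math.pi * rng.random()      # uniform over [0, 2*pi)
    polar = math.acos(1.0 - 2.0 * rng.random())   # density proportional to sin(polar)
    return azimuthal, polar

def draw_declination(rng):
    """Draw a declination with density proportional to cos(dec) on [-pi/2, pi/2]."""
    return math.asin(2.0 * rng.random() - 1.0)

def draw_spin_magnitude(rng):
    """Draw a dimensionless spin magnitude, uniform over [0, 0.99)."""
    return 0.99 * rng.random()

rng = random.Random(1234)  # fixed seed so the sketch is reproducible
spin_angles = [draw_isotropic_angles(rng) for _ in range(20000)]
declinations = [draw_declination(rng) for _ in range(20000)]
spin_magnitudes = [draw_spin_magnitude(rng) for _ in range(20000)]
```

A sine-angle prior on the polar angle is equivalent to a uniform prior on its cosine, which is why the draw uses `acos` of a uniform variate.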
The parameter estimation analyses of the events produce samples of the posterior probability density function in the form of Markov chains. Successive states of these chains are not independent, as Markov processes depend on the previous state 33 . Independent samples are obtained from the full Markov chains by "thinning" or drawing samples from chains of the coldest temperature, with an interval of the autocorrelation length 11,33 . These independent samples are used to calculate estimates for the model parameters from the analysis.
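As a rough illustration of thinning, the sketch below estimates an autocorrelation length for a single chain and keeps every ACL-th state. The estimator used here (summing the normalized autocorrelation until it first becomes non-positive) is one simple convention chosen for this example; it is an assumption and not necessarily the estimator used in PyCBC Inference.

```python
import math
import random

def autocorrelation_length(chain, max_lag=None):
    """Estimate an integer autocorrelation length, 1 + 2 * sum(rho_k)."""
    n = len(chain)
    mean = sum(chain) / n
    centered = [x - mean for x in chain]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        return 1
    if max_lag is None:
        max_lag = n // 10
    tau = 1.0
    for lag in range(1, max_lag + 1):
        rho = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0.0:  # stop once the correlation is lost in the noise
            break
        tau += 2.0 * rho
    return max(1, int(math.ceil(tau)))

def thin_chain(chain, acl):
    """Keep one state per autocorrelation length."""
    return chain[::acl]

# Toy correlated chain: an AR(1) process standing in for one Markov chain.
rng = random.Random(0)
chain = [0.0]
for _ in range(4999):
    chain.append(0.9 * chain[-1] + rng.gauss(0.0, 1.0))

acl = autocorrelation_length(chain)
independent = thin_chain(chain, acl)
```

In an ensemble sampler the same thinning would be applied to each walker's chain after burn-in, and the thinned chains combined.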
Posterior probability density functions. Independent samples from the ensemble MCMC chains from the analyses of all seven events are available for download at the data release repository for this work 34 . We encourage use of these data in derivative works. The repository also contains IPython notebooks 35 that demonstrate how to read and manipulate the data in the files, and that provide examples of reconstructing the figures presented in this paper.
Samples of the varied parameters in the MCMC can be combined to obtain posteriors for other derivable parameters. We map the values for the detector-frame masses $(m_1^{\mathrm{det}}, m_2^{\mathrm{det}})$ and the luminosity distance $d_L$ from the runs to source-frame masses $(m_1^{\mathrm{src}}, m_2^{\mathrm{src}})$ using the standard Λ-CDM cosmology 36,37 . While visualizing and quoting the detector-frame and source-frame masses, we use $q = m_1^{\mathrm{det}}/m_2^{\mathrm{det}} = m_1^{\mathrm{src}}/m_2^{\mathrm{src}}$, where $m_1^{\mathrm{det}}$ and $m_1^{\mathrm{src}}$ refer to the more massive black hole, and $m_2^{\mathrm{det}}$ and $m_2^{\mathrm{src}}$ refer to the less massive black hole in the binary; i.e., we present our results with $q \geq 1$. We also map the component masses to parameters such as the chirp mass $\mathcal{M}$ and the mass ratio $q$, and map the component masses and spins to the effective inspiral spin parameter $\chi_{\mathrm{eff}}$ and the effective precession spin parameter $\chi_p$ 30,31 . Our measurements show that all the events are consistent with being binary black hole sources.
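The mappings described above can be sketched for a single posterior sample as follows. This is a minimal illustration, not the conversion code used for the paper: the cosmological parameters `H0` and `OMEGA_M` are illustrative assumptions, a flat Λ-CDM model is assumed, and the function names are hypothetical. It uses the standard relations $m^{\mathrm{src}} = m^{\mathrm{det}}/(1+z)$, $\mathcal{M} = (m_1 m_2)^{3/5}/(m_1+m_2)^{1/5}$, and $\chi_{\mathrm{eff}} = (m_1 a_1 \cos\theta_1^p + m_2 a_2 \cos\theta_2^p)/(m_1+m_2)$.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s
H0 = 67.9            # assumed Hubble constant in km/s/Mpc (illustrative value)
OMEGA_M = 0.3065     # assumed matter density parameter (illustrative value)

def luminosity_distance(z, h0=H0, omega_m=OMEGA_M, steps=200):
    """Luminosity distance in Mpc for flat Lambda-CDM, via Simpson's rule."""
    def inv_e(x):
        return 1.0 / math.sqrt(omega_m * (1.0 + x) ** 3 + 1.0 - omega_m)
    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * inv_e(k * h)
    return (1.0 + z) * (C_KM_S / h0) * total * h / 3.0

def redshift_from_distance(d_l, z_max=10.0, tol=1e-10):
    """Invert luminosity_distance by bisection (it is monotonic in z)."""
    lo, hi = 0.0, z_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if luminosity_distance(mid) < d_l:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def derived_parameters(mass1, mass2, distance,
                       spin1_a, spin1_polar, spin2_a, spin2_polar):
    """Map one sample (masses in solar masses, distance in Mpc, angles in rad)."""
    if mass2 > mass1:  # relabel so that index 1 is the more massive black hole
        mass1, mass2 = mass2, mass1
        spin1_a, spin2_a = spin2_a, spin1_a
        spin1_polar, spin2_polar = spin2_polar, spin1_polar
    z = redshift_from_distance(distance)
    total = mass1 + mass2
    return {
        "redshift": z,
        "mass1_src": mass1 / (1.0 + z),
        "mass2_src": mass2 / (1.0 + z),
        "q": mass1 / mass2,  # q >= 1 by construction
        "mchirp_det": (mass1 * mass2) ** 0.6 / total ** 0.2,
        "chi_eff": (mass1 * spin1_a * math.cos(spin1_polar)
                    + mass2 * spin2_a * math.cos(spin2_polar)) / total,
    }
```

Applying `derived_parameters` to every independent sample yields posterior samples of the derived quantities, since each sample is transformed independently.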
In order to obtain an estimate for a particular parameter, the other parameters that were varied in the ensemble MCMC can be marginalized over in the posterior probability density function. Table 1 summarizes the median and 90% credible interval values of the main parameters of interest obtained from the analyses of all seven O2 binary black hole events. The marginalized distributions in the $(m_1^{\mathrm{src}}, m_2^{\mathrm{src}})$, $(q, \chi_{\mathrm{eff}})$, and $(d_L, \iota)$ planes for the seven events are shown in Figs 1, 2 and 3, respectively. The two-dimensional plots in these figures show 90% credible regions for the respective parameters.
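With independent samples in hand, marginalizing over the other parameters amounts to looking only at the samples of the parameter of interest. The sketch below computes a median and a 90% credible interval from such samples. It assumes an equal-tailed interval (5th to 95th percentile) with linear interpolation between order statistics; the function names are hypothetical.

```python
def percentile(sorted_values, p):
    """Percentile p (0 to 100) of pre-sorted values, with linear interpolation."""
    rank = (len(sorted_values) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = rank - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac

def summarize(samples, interval=90.0):
    """Return (median, lower error, upper error) for an equal-tailed interval."""
    xs = sorted(samples)
    tail = (100.0 - interval) / 2.0
    median = percentile(xs, 50.0)
    low = percentile(xs, tail)
    high = percentile(xs, 100.0 - tail)
    return median, median - low, high - median
```

The returned triple corresponds to the usual way of quoting a measurement as a median with upper and lower errors.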
Our results show that GW170729 is the highest-mass binary black hole signal and GW170608 is the lowest-mass binary black hole signal among the detections during O1 and O2. Parameter estimates of the binary black holes observed during O1 were presented in refs 7,11 . GW170814 appears to have less support for asymmetric mass ratios than the other events. All the events have low effective spin values. GW170814 has more support for face-on systems, whereas GW170809 and GW170818 have a preference for face-off systems. For GW170608, there is a preference for both face-on ($\iota = 0$) and face-off ($\iota = 180°$). GW170104, GW170729, and GW170823 have support for face-on ($\iota = 0$), face-off ($\iota = 180°$) and edge-on ($\iota = 90°$). Face-on systems are those for which the inclination angle $\iota = 0$; i.e., the line of sight is parallel to the binary's orbital angular momentum. Face-off systems are those for which $\iota = \pi$ (that is, 180°); i.e., the line of sight is anti-parallel to the binary's orbital angular momentum. We also computed $\chi_p$ for each of the events and found no significant measurements of precession. GW170608 appears to be observed at the closest luminosity distance and GW170729 at the farthest among the O2 binary black holes.
Figure 4 shows the 90% credible regions for the sky location posterior distributions of all seven binary black hole events in a Mollweide projection and celestial coordinates. GW170818 and GW170814 have substantially smaller sky localization areas, as they were detected by the H1L1V1 three-detector network with a significant signal-to-noise ratio (SNR) contribution from all the detectors. The GW170729 and GW170809 parameter estimation analyses use data from all three detectors in the network. However, the SNR in Virgo is not significant, causing the sky localization area to be broader than in the cases of GW170814 and GW170818. The sky localization area of GW170809 is smaller than that of GW170729, as the former has a higher network SNR than the latter; the sky localization area varies inversely as the square of the SNR. The events observed by the H1L1 two-detector network (GW170104, GW170608, and GW170823) have poor sky localization, with GW170823 having the lowest network SNR and broadest sky localization area, and GW170608 having the highest network SNR and smallest sky localization area.

Fig. 1 Posterior probabilities of the source-frame primary mass $m_1^{\mathrm{src}}$ and secondary mass $m_2^{\mathrm{src}}$ from the PyCBC Inference analyses of the seven gravitational-wave signals from binary black hole mergers in Advanced LIGO-Virgo's second observing run (O2). Plotted are the 90% credible contours in the 2D plane. The measurements suggest that GW170729 has the highest masses and GW170608 has the lowest masses among all black hole binaries observed in O1 and O2. Parameter estimates of the O1 binary black holes were presented in refs 7,11 .

Fig. 2 Posterior probabilities of the asymmetric mass ratio $q$ and the effective inspiral spin $\chi_{\mathrm{eff}}$ from the PyCBC Inference analyses of the seven gravitational-wave signals from binary black hole mergers in Advanced LIGO-Virgo's second observing run. Plotted are the 90% credible contours in the 2D plane. All the events have low $\chi_{\mathrm{eff}}$ values. GW170814 has less support for asymmetric mass ratios than the other events.
Estimates of the parameters for these events were previously published in the LIGO-Virgo Collaboration (LVC) detection papers for these events 2-4,6 . The results from our analyses are overall in agreement with the estimates published by the LVC within the statistical errors of measurement of the parameters. Any small discrepancies in the measurement of the parameters would be due to differences in the analysis methods. One of the differences is the method of PSD estimation. Another such difference is that we do not marginalize over calibration uncertainties of the measured strain 38 , whereas the LVC analyses use a spline model to fit the calibration uncertainties. The true impact of calibration errors on the parameter estimates should be evaluated using a physical model of the calibration, which does not currently exist in any analysis. This will be revisited in future work.

Fig. 3 [...] have support for face-on ($\iota = 0$), face-off ($\iota = 180°$) and edge-on ($\iota = 90°$). For GW170608, there is a stronger preference for the system being either face-on ($\iota = 0$) or face-off ($\iota = 180°$). For GW170814, there is a stronger preference for the system being face-on ($\iota = 0$). For GW170809 and GW170818 there is a stronger preference for face-off ($\iota = 180°$). GW170608 is observed at the closest luminosity distance and GW170729 at the farthest.

Fig. 4 Posterior probabilities for the sky location parameters, right ascension and declination, from the PyCBC Inference analyses of the seven gravitational-wave signals from binary black hole mergers in Advanced LIGO-Virgo's second observing run. Plotted are the 90% credible contours in Mollweide projection and celestial coordinates; the right ascension is expressed in hours and the declination in degrees. GW170818 and GW170814 have substantially smaller sky localization areas, being detected by the H1L1V1 three-detector network with considerable SNR in all the detectors. The GW170729 and GW170809 analyses used data from the three-detector network. However, the sky localization area is broad due to low Virgo SNR. Between GW170729 and GW170809, the latter has a higher network SNR, leading to a smaller sky localization area. GW170104, GW170608, and GW170823 have poor sky localization, as they were detected by the H1L1 two-detector network; GW170823 has the lowest network SNR and broadest sky localization area, and GW170608 has the highest network SNR and therefore the smallest sky localization area of the three.

Data records
The data products from the parameter estimation analyses for the seven events are stored in seven HDF 39 files, available within the Zenodo data release repository 34 for this work. The locations of these HDF files within the repository are listed in Table 2. In this section, we describe the contents of these seven HDF files.
The top-level of each HDF file contains attributes named ifos, variable_args, posterior_only, and lognl. variable_args is a list of the inferred model parameters. For these seven analyses this includes: the coalescence time (tc), distance (distance), inclination angle (inclination), polarization angle (polarization), right ascension (ra), declination (dec), detector-frame component masses (mass1 and mass2), azimuthal angles of the spin vector (spin1_azimuthal and spin2_azimuthal), polar angles of the spin vector (spin1_polar and spin2_polar), and magnitudes of the spin vector (spin1_a and spin2_a). mass1, spin1_a, spin1_polar, spin1_azimuthal in the files refer to the primary black hole in the binary. mass2, spin2_a, spin2_polar, spin2_azimuthal refer to the secondary black hole in the binary.
ifos stores the list of the names of interferometers from which data has been analyzed in each run. The attribute posterior_only is a Boolean where a True value indicates that the posterior samples and likelihood statistics are stored as flattened arrays in the files. lognl stores the value of the noise likelihood, which is described below.
The independent samples of the model parameters are stored in a top-level HDF group, named ['samples']. For each parameter listed in the variable_args attribute, the ['samples'] HDF group contains an HDF dataset that is a one-dimensional array indexed by the independent samples. Therefore, the set of parameters for the i-th independent sample is the i-th element of each array. For example, ['samples/mass1'][32] and ['samples/mass2'][32] are the masses for the 32nd independent sample. Samples in the mass1 and mass2 datasets are in solar mass units, those in distance are in Mpc, those in tc are in seconds, and those in spin1_a and spin2_a are dimensionless. Samples in the spin1_polar, spin2_polar, spin1_azimuthal, spin2_azimuthal, inclination, ra, dec, and polarization datasets are in radians.
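The layout described above means that one posterior sample is assembled by taking the same index from every dataset in the ['samples'] group. A minimal sketch, written against any mapping of parameter names to arrays so that it runs without the data files; the function name `get_sample` and the file name in the comment are hypothetical, and the commented lines assume the h5py package is installed.

```python
def get_sample(samples, variable_args, i):
    """Collect the i-th independent sample as a {parameter: value} dictionary."""
    return {name: float(samples[name][i]) for name in variable_args}

# With h5py (assumed installed) and a placeholder file name, usage could look like:
#   import h5py
#   with h5py.File("posterior_samples.hdf", "r") as f:
#       params = [p.decode() if isinstance(p, bytes) else p
#                 for p in f.attrs["variable_args"]]
#       print(get_sample(f["samples"], params, 32))

# Stand-in for the ['samples'] group, with two parameters and three samples.
mock_samples = {"mass1": [35.1, 36.4, 34.8], "mass2": [29.0, 27.5, 30.2]}
second = get_sample(mock_samples, ["mass1", "mass2"], 1)
```

Because every dataset shares the same sample index, correlations between parameters are preserved as long as the same index is used across datasets.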
The second top-level HDF group is ['prior_samples'], which stores prior samples in a similar format as the ['samples'] group described above. For each of the parameters listed in the variable_args attribute, the ['prior_samples'] HDF group contains an HDF dataset that is a one-dimensional array of samples of that parameter drawn from the prior distribution.
The third top-level HDF group, named ['likelihood_stats'], contains quantities to obtain the prior $p(\vec{\vartheta} \mid H)$ and likelihood $p(\vec{d}(t) \mid \vec{\vartheta}, H)$ from Eq. 1 for each independent sample. In order to obtain the prior for each independent sample, the ['likelihood_stats'] HDF group contains a dataset of the natural logarithm of the prior probabilities called ['likelihood_stats/prior']. The datasets in the ['likelihood_stats'] HDF group are also one-dimensional arrays indexed by the independent sample (e.g., the i-th element corresponds to the prior probability of the i-th independent sample). In order to obtain the likelihood for each independent sample, there is a dataset containing the natural logarithm of the likelihood ratio Λ called ['likelihood_stats/loglr']. The likelihood ratio Λ is defined as 11

$$\log \Lambda = \log p(\vec{d}(t) \mid \vec{\vartheta}, H) - \log p(\vec{d}(t) \mid \vec{n}), \qquad (4)$$

where $\log p(\vec{d}(t) \mid \vec{n})$ is the natural logarithm of the noise likelihood, as defined in ref. 11 . The natural logarithm of the noise likelihood is a constant for each analysis. Therefore, from Eq. 4, in order to compute the natural logarithm of the likelihood, $\log p(\vec{d}(t) \mid \vec{\vartheta}, H)$, the user adds lognl to each element of ['likelihood_stats/loglr'].
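The arithmetic described above is small enough to sketch directly. The example takes the stored log likelihood ratios, log priors, and the lognl attribute as plain Python values; the function names are hypothetical, and the second function (picking the sample with the highest unnormalized log posterior) is an added illustration rather than something described in the paper.

```python
def log_likelihood(loglr, lognl):
    """Log likelihood per sample: stored log likelihood ratio plus lognl."""
    return [value + lognl for value in loglr]

def max_posterior_index(loglr, log_prior, lognl):
    """Index of the sample with the largest log likelihood + log prior."""
    scores = [lr + lognl + lp for lr, lp in zip(loglr, log_prior)]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy values standing in for ['likelihood_stats/loglr'],
# ['likelihood_stats/prior'], and the lognl attribute.
toy_loglr = [10.0, 12.0, 11.0]
toy_log_prior = [-1.0, -5.0, -1.0]
toy_lognl = -100.0
toy_loglike = log_likelihood(toy_loglr, toy_lognl)
best = max_posterior_index(toy_loglr, toy_log_prior, toy_lognl)
```

Since lognl is a constant within one analysis, it does not change which sample maximizes the posterior; it matters only when absolute likelihood values are needed.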
The fourth top-level HDF group is ['psds']. For each interferometer from which data has been used in the analysis, the ['psds'] HDF group contains a dataset storing a frequency series of the PSD multiplied by the square of the dynamic range factor. The dynamic range factor is a large constant used to reduce the dynamic range of the strain; here, we use $2^{69}$ rounded to 17 significant figures (precisely $5.9029581035870565 \times 10^{20}$). The first entry in each PSD frequency series corresponds to frequency f = 0 Hz, and the last entry corresponds to f = 1024 Hz. Attached as attributes to each interferometer's PSD frequency series dataset object are the frequency resolution (delta_f) and the low frequency cutoff used for that interferometer in the PSD estimation and likelihood computation (low_frequency_cutoff).
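To recover a PSD in physical strain units from a stored frequency series, the stored values are divided by the square of the dynamic range factor, and the k-th entry is assigned the frequency k × delta_f. A minimal sketch under those assumptions; the function name is hypothetical, and discarding bins below the low frequency cutoff is a choice made for this example.

```python
DYN_RANGE_FAC = 5.9029581035870565e20  # 2**69, as quoted in the text

def physical_psd(stored_psd, delta_f, low_frequency_cutoff):
    """Return (frequency in Hz, PSD in strain^2/Hz) pairs at or above the cutoff."""
    scale = DYN_RANGE_FAC ** 2
    pairs = []
    for k, value in enumerate(stored_psd):
        frequency = k * delta_f  # the first entry corresponds to 0 Hz
        if frequency >= low_frequency_cutoff:
            pairs.append((frequency, value / scale))
    return pairs

# Toy stored series: a flat PSD of 1e-46 strain^2/Hz scaled by DYN_RANGE_FAC**2.
toy_stored = [1e-46 * DYN_RANGE_FAC ** 2] * 5
toy_psd = physical_psd(toy_stored, delta_f=10.0, low_frequency_cutoff=20.0)
```

With f = 0 Hz as the first entry and f = 1024 Hz as the last, a series with resolution delta_f has 1024/delta_f + 1 entries.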

Technical Validation
The analyses in this paper were performed using the PyCBC Inference software 11 with the parallel-tempered emcee sampler 15,16 (https://github.com/dfm/emcee/tree/v2.2.1), hereafter referred to as emcee_pt, as the sampling algorithm. A validation study of PyCBC Inference with the emcee_pt sampler was presented in Sec. 4 of ref. 11 . The validation study in ref. 11 used the same version of the PyCBC code, waveform model, sampler settings, data conditioning settings, and burn-in test as used in our analyses in this paper, and therefore demonstrates the credibility of the results presented in this paper. In this section, we summarize the validation study.
We have tested the performance of this setup (i.e., code version, waveform model, sampler settings, etc.) using analytic likelihood functions such as the multivariate normal, Rosenbrock, eggbox, and volcano functions. The emcee_pt sampler successfully sampled the underlying analytical distributions. The recovery of parameters of a four-dimensional normal distribution using the emcee_pt sampler is shown in Fig. 2 of ref. 11 .
Reference 11 also describes a test performed using simulated binary black hole signals to validate the reliability of parameter estimates generated by PyCBC Inference with the emcee_pt sampler. The test is carried out by generating 100 realizations of stationary Gaussian noise colored by the power spectral densities of the Advanced LIGO detectors around the time of observation of GW150914 40 . A unique simulated binary black hole signal, whose parameters were sampled from the prior probability density function, is injected into each simulated noise realization. For the population of 100 simulated binary black hole signals, the network signal-to-noise ratios range from 5 to 160, and lie predominantly between 10 and 40. PyCBC Inference, using the emcee_pt sampler, was then run on each simulated binary black hole signal to produce samples of the posterior probability density function and compute credible intervals that estimate the modeled parameter values. For each parameter, we then calculate the percentage of the runs (x%) in which the true value of the parameter was recovered within a certain credible interval (y%). In the ideal case, there should be a 1-to-1 relation between these percentiles, i.e., x should equal y for any value of the percentile y. The percentile-percentile curves obtained for each parameter in the test are plotted in Fig. 3 of ref. 11 . To evaluate the deviation of the percentile-percentile curve for each parameter from a 1-to-1 relation, a Kolmogorov-Smirnov (KS) test is performed. Using the set of p-values obtained for all the parameters, another KS test is performed, expecting the p-values to follow a uniform distribution. The p-value obtained from this calculation is 0.7, which is sufficiently high to infer that PyCBC Inference, with its implementation of the emcee_pt sampler, provides unbiased estimates of the modeled binary black hole parameters.
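The logic of a percentile-percentile test can be sketched with a toy problem whose posterior is known analytically. The example below is not the test from ref. 11: it assumes a one-parameter model with a unit normal prior and unit normal noise, uses the one-sided percentile of the true value within the posterior samples (a common variant of the credible-interval formulation), and computes only the KS statistic against a uniform distribution, not a p-value. All function names are hypothetical.

```python
import math
import random

def credible_percentile(posterior_samples, true_value):
    """Fraction of posterior samples lying below the true value."""
    below = sum(1 for s in posterior_samples if s < true_value)
    return below / len(posterior_samples)

def pp_curve(percentiles, grid):
    """Fraction of runs whose true value falls at or below each percentile y."""
    n = len(percentiles)
    return [sum(1 for p in percentiles if p <= y) / n for y in grid]

def ks_statistic_uniform(values):
    """Largest distance between the empirical CDF and the uniform CDF on [0, 1]."""
    xs = sorted(values)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

def toy_percentiles(n_runs, n_samples, rng):
    """Simulate runs: truth ~ N(0, 1), datum = truth + N(0, 1) noise."""
    out = []
    for _ in range(n_runs):
        truth = rng.gauss(0.0, 1.0)
        datum = truth + rng.gauss(0.0, 1.0)
        # The exact posterior for this model is N(datum / 2, variance 1/2).
        posterior = [rng.gauss(datum / 2.0, math.sqrt(0.5)) for _ in range(n_samples)]
        out.append(credible_percentile(posterior, truth))
    return out

rng = random.Random(42)
percentiles = toy_percentiles(n_runs=400, n_samples=200, rng=rng)
curve = pp_curve(percentiles, [0.1 * k for k in range(11)])
ks = ks_statistic_uniform(percentiles)
```

For an unbiased sampler the percentiles are uniformly distributed, so the curve follows the diagonal and the KS statistic is small; a biased posterior would bend the curve away from the diagonal.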
In addition to the aforementioned tests using analytical distributions and simulated signals, the 90% credible interval measurements of the binary black hole parameters from our analyses presented in this paper are in agreement with the LIGO-Virgo Collaboration estimates 2-4,6 which used a different inference code. This further validates the results presented here.

Usage Notes
When citing the data associated with this paper and released in the data release repository 34 , please cite this paper for describing the data and the analyses that generated them. Please also cite ref. 11 which describes and validates the PyCBC Inference parameter estimation toolkit that was used for generating the data. The samples of the posterior probability density function for each analysis presented in this paper are stored in separate HDF files, and the location of each HDF file is listed in Table 2. We direct users to the tools available in PyCBC Inference to read these files and visualize the data. Figures 1, 2 and 3 in this paper were generated using these tools from the PyCBC version 1.12.3 release. The data release repository also includes scripts to execute pycbc_inference and reproduce the analysis and resulting samples.
The data release repository for this work 34 includes an IPython notebook that demonstrates how to read, manipulate, and visualize the samples of the posterior probability density function. The samples' credible intervals are visualized as marginalized one-dimensional histograms and two-dimensional credible contour regions. We include commands in this notebook to reproduce Figs 1, 2 and 3 in this paper. PyCBC Inference also includes an executable called pycbc_inference_plot_posterior to render these visualizations. The IPython notebook o2_bbh_pe_skymaps.ipynb demonstrates a method of visualizing the sky location posterior distributions, as presented in Fig. 4 in this paper. We use tools from the open source ligo.skymap package (https://pypi.org/project/ligo.skymap/) for writing the sky location posterior samples from our analyses into FITS files, reading them, and generating probability density contours on a Mollweide projection.
The released data are freely available under the Creative Commons License: CC BY.