An open-source, end-to-end workflow for multidimensional photoemission spectroscopy

Xian, R. Patrick; Acremann, Yves; Agustsson, Steinn Y.; Dendzik, Maciej; Bühlmann, Kevin; Curcio, Davide; Kutnyakhov, Dmytro; Pressacco, Federico; Heber, Michael; Dong, Shuo; Pincelli, Tommaso; Demsar, Jure; Wurth, Wilfried; Hofmann, Philip; Wolf, Martin; Scheidgen, Markus; Rettig, Laurenz; Ernstorfer, Ralph

doi:10.1038/s41597-020-00769-8

Open access
Published: 17 December 2020

Scientific Data volume 7, Article number: 442 (2020)

Abstract

Characterization of the electronic band structure of solid state materials is routinely performed using photoemission spectroscopy. Recent advancements in short-wavelength light sources and electron detectors give rise to multidimensional photoemission spectroscopy, allowing parallel measurements of the electron spectral function simultaneously in energy, two momentum components and additional physical parameters with single-event detection capability. Efficient processing of the photoelectron event streams at a rate of up to tens of megabytes per second will enable rapid band mapping for materials characterization. We describe an open-source workflow that allows user interaction with billion-count single-electron events in photoemission band mapping experiments, compatible with beamlines at 3^rd and 4^th generation light sources and table-top laser-based setups. The workflow offers an end-to-end recipe from distributed operations on single-event data to structured formats for downstream scientific tasks and storage to materials science database integration. Both the workflow and processed data can be archived for reuse, providing the infrastructure for documenting the provenance and lineage of photoemission data for future high-throughput experiments.

Introduction

Many disciplines in the natural sciences are increasingly dealing with densely sampled multidimensional datasets. The scientific workflows to obtain and process them are becoming increasingly complex due to the provenance and structure of the data and the information needed to be extracted and analyzed^1,2. In materials science and condensed matter physics, various spectroscopic and structural characterization techniques produce experimental data of distinct formats and characteristics. Their creation and understanding require customized processing and analysis pipelines designed by specialists in the respective fields. The growing incentive for building experimental materials science databases³ that complement established theoretical counterparts⁴ calls for open-source and reusable workflows for data processing^5,6 that transform raw data to shareable formats for downstream query, analysis and comparison by non-specialists of the experimental techniques^7,8. Among the various properties associated with materials, the electronic band structure (EBS) of condensed matter systems is of vital importance to the understanding of their electronic properties in and out of equilibrium. Multidimensional photoemission spectroscopy (MPES)^9,10,11 is an emerging technique that bears the potential of high-throughput EBS characterization through band mapping experiments and holds promise as an enabling technology for building experimental EBS databases, where data integration requires traceable knowledge of the processing steps between the archived and the raw format. Here we present an open-source workflow that focuses on band mapping data from MPES. In the following, we briefly introduce the technology of MPES and the associated data processing, before providing details on the workflow from raw data to database integration.

MPES, also called momentum microscopy (MM), is born out of the recent integration of time-of-flight (TOF) electron spectrometers with delay-line detectors (DLDs) and improved electron-optic lens designs^12,13,14,15. Compared with the earlier generations of angle-resolved photoemission spectroscopy (ARPES)^16,17 using hemispherical analyzers to measure the 2D energy-momentum distribution of the photoemitted electrons¹⁸, MPES is capable of recording single-electron events simultaneously sorted into the (k_x, k_y, E) coordinates (E: electron energy, k_x, k_y: parallel momentum components) in band mapping experiments, obviating the need for scanning across sample orientations and subsequent data merging as is the case for similar experiments using a single hemispherical analyzer. Operation of the TOF DLD in MPES requires a pulsed photon source and is directly compatible with 3^rd and 4^th generation light sources¹⁹ as well as laboratory-based table-top setups^20,21,22,23, harnessing their high repetition rates in the range of multi-kilohertz to megahertz to drastically improve the detection speed and efficiency. Mapping of the 3D band structure with sufficient signal-to-noise ratio (SNR) may be achieved on the timescale of minutes. The technological convergence opens up the possibility of recording 3D datasets as a function of one or more additional parameters, such as spatial location, I(x, y, k_x, k_y, E), probe photon energy, I(k_x, k_y, E, k_z)¹⁰, spin polarization, I(k_x, k_y, E, S)⁹, or pump-probe time in time-resolved MM, I(k_x, k_y, E, t)²⁴, within a reasonable time frame.

From the data perspective, the pulsed sources with high repetition rates generate densely sampled data at rates of multiple megabytes per second (MB/s), which has brought about challenges in data processing and management compared with conventional ARPES experiments. The raw data in MPES are single photoelectron events registered by the DLD and the physical quantities related to the detected events are streamed in parallel to the storage files in a hierarchical file format (e.g. HDF5²⁵). A typical dataset involves 10⁷–10¹⁰ detected events with a total size of up to a few hundred gigabytes (GBs), depending on the number of coordinates measured (3D or 4D) and the required SNR. Unlike the large 2D or 3D image-based datasets, such as those obtained in various forms of optical^26,27 and electron microscopy techniques^28,29, processing and conversion of tabulated single-event data requires additional steps of statistical computing for conversion into standard images. This motivates the current workflow development for efficient data processing and analysis. In data processing and calibration, experiments performed at different facilities share similar procedures going from the raw events to the multidimensional hypervolume with calibrated axes, which is the basis for archiving and downstream analysis. To maintain reproducibility for the particular data source characteristics, data structure and processing procedure, we have summarized the workflow (see Fig. 1) into two open-source software packages (hextof-processor³⁰ and mpes³¹), with similar design principles for coping with large-scale facility and table-top experiments, respectively. The core of our approach includes distributed statistical processing at the single-event level using parameters calibrated and determined from preprocessed volumetric datasets, which enables effective instrument diagnostics, artifact correction, and sample condition monitoring. 
The algorithms involved balance physical knowledge and existing methods in image processing and computer vision. The workflow is illustrated next with data obtained at some of the electron momentum microscopes currently in operation, such as the HEXTOF (high energy X-ray time-of-flight) measurement system²⁴ at the free-electron laser source FLASH³² at DESY, and the table-top high harmonic generation-based setup at the Fritz Haber Institute (FHI)²¹ involving a commercial TOF and DLD (METIS 1000, SPECS GmbH). We use the material example of tungsten diselenide (WSe₂) measured at both experimental setups to demonstrate the workflow execution, because in momentum space, the patent features of the WSe₂ band structure and the nonequilibrium dynamics initiated by optical excitation of WSe₂ have been thoroughly studied in the past (see Methods)^24,33,34,35,36. We expect the workflow described here to serve as a blueprint for upcoming software platforms in similar setups to be installed in other facilities or laboratories worldwide.

Results

Workflow description

The workflow schematic shown in Fig. 1 starts with raw single-event data from measurements. The data are (i) binned in a distributed fashion in the measurement coordinates, including each photoelectron's position on the detector (X, Y), its TOF, a digital encoder (ENC) axis, and others, if more than four dimensions are acquired in parallel. The binned histogram is (ii) used to estimate the numerical transforms for distortion correction and axis calibration. Next, these transforms are (iii) applied to the raw single-event data to convert the measurement coordinates to the physical axes, (k_x, k_y, E, t_pp) and others for higher dimensions (see also Fig. 2). Finally, the single-event data are (iv) binned in the transformed grid to yield 3D, 3D + t or other higher-dimensional data with the correct axis values. The outcome may be exported in different formats for storage, visualization and downstream analysis.
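The four steps can be illustrated with a minimal NumPy sketch. It is not the implementation in hextof-processor or mpes: the synthetic events, the function `to_physical` and all calibration constants are hypothetical, and the linear TOF-to-energy map is only a toy stand-in for the calibrations described in Methods.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw events: detector position (X, Y) in pixels and TOF in ns.
n_events = 100_000
events = np.column_stack([
    rng.uniform(0, 512, n_events),     # X
    rng.uniform(0, 512, n_events),     # Y
    rng.uniform(600, 700, n_events),   # TOF
])

# (i) Bin the raw events in the measurement coordinates.
raw_hist, raw_edges = np.histogramdd(
    events, bins=(64, 64, 50), range=[(0, 512), (0, 512), (600, 700)]
)

# (ii) Estimate the transforms from the binned data. Here the parameters are
# simply assumed; in practice they are derived from features in raw_hist.
def to_physical(ev, kx0=256.0, ky0=256.0, k_per_px=0.01, a=(-60.0, 0.1)):
    kx = (ev[:, 0] - kx0) * k_per_px
    ky = (ev[:, 1] - ky0) * k_per_px
    energy = a[0] + a[1] * ev[:, 2]    # toy linear TOF-to-energy map
    return np.column_stack([kx, ky, energy])

# (iii) Apply the transforms to every single event.
phys = to_physical(events)

# (iv) Bin again on the calibrated grid.
hist, edges = np.histogramdd(
    phys, bins=(64, 64, 50), range=[(-2.56, 2.56), (-2.56, 2.56), (0.0, 10.0)]
)
```

Binning twice is deliberate in this sketch: the first histogram only serves to determine the transforms, while the final histogram is built from the transformed events themselves, so the calibrated volume is not interpolated from an already binned image.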

Tasks and software infrastructure

Processing billion-count single-event data requires user interaction for data checking and distributed processing to reduce the time consumption. The general tasks in the workflow include the transformation of data streams to multidimensional histograms, artifact correction and axis calibration. These operations can be efficiently decomposed into column-wise operations of the distributed dataframe format offered by the dask package³⁷ in Python. While the use of dask dataframes provides the common foundation for interactivity with single events in hextof-processor and mpes, they distinguish themselves by the experimental requirements.
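Why histogramming lends itself to distributed execution can be seen in a small sketch. To stay self-contained it uses plain NumPy arrays split into chunks as a stand-in for dask dataframe partitions, which is an assumption of this example; the function names `bin_partition` and `bin_partitions` are hypothetical.

```python
import numpy as np

def bin_partition(part, bins, ranges):
    """Histogram one partition (array of shape [n_events, n_axes])."""
    hist, _ = np.histogramdd(part, bins=bins, range=ranges)
    return hist

def bin_partitions(partitions, bins, ranges):
    """Sum per-partition histograms. The partial results are independent,
    so each call to bin_partition could run on a separate worker."""
    total = np.zeros(bins)
    for part in partitions:
        total += bin_partition(part, bins, ranges)
    return total

rng = np.random.default_rng(1)
events = rng.uniform(0, 1, size=(50_000, 3))   # hypothetical (X, Y, TOF) columns
partitions = np.array_split(events, 8)         # stand-in for dataframe partitions
bins, ranges = (32, 32, 16), [(0, 1)] * 3

distributed = bin_partitions(partitions, bins, ranges)
direct, _ = np.histogramdd(events, bins=bins, range=ranges)
```

Because bin counts are additive, the summed partial histograms equal the histogram of the full event list, and only the small binned arrays need to be exchanged between workers.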

At large-scale facilities, experiments often record a large number of machine parameters that need to be stored, though only a small number of relevant parameters are needed for downstream processing. Therefore, the hextof-processor package includes a parameter sampling step to retrieve intermediate tabulated data in the Apache Parquet format (https://parquet.apache.org/), a column-based data structure optimized for computational efficiency. This approach reduces the processing overhead of searching through the raw data files each time data are queried during the subsequent processing. As an open-source project, other beamtime-specific functionalities are added by users to the existing framework at every new experimental run. The mpes package adapts to the much simpler file structure produced at table-top experimental setups and makes direct use of the HDF5 raw data. It comes with added functionalities motivated by the existing issues encountered in data acquisition and downstream processing such as axis calibration, masking, alignment and different forms of artifact correction. The software packages come with detailed documentation and examples online for users to gain familiarity (see Code availability).

Artifact correction

Artifacts in MPES data come from mechanical imperfections, stray fields (electric and magnetic), uncertainties in the alignment of the sample, light beams and the multistage electron-optic lens systems as well as the data digitization process. Minimizing and correcting instrumental imperfections plays an important role in the validity of downstream analysis. We carry out artifact correction sequentially at the level of single photoelectron events or the data hypervolume obtained from multidimensional histogramming (see Fig. 2). The outcomes are illustrated using the correction of (1) digitization artifact (see Fig. 3) and (2) spherical timing aberration artifact (see Fig. 4), with technical details in Methods.

Axis calibration

To transform the measurement axes of the DLD into physically relevant axes for electronic band mapping, calibrations are required, as shown in Fig. 2. The calibration functions are constructed with parameters derived from comparing physical knowledge of the materials (e.g. Brillouin zone size, Fermi level position) with the corresponding scales in the data. They are applied either to the binned data hypervolume, or to the raw single-electron events individually in a distributed fashion before binning. Details on the calibration data transforms are provided in Methods.

Data storage and format

The simplest form of the output data hypervolume derived from single-electron events includes non-negative scalar values of the photoemission intensity and the calibrated real-valued axis coordinates, including k_x, k_y, E, and other parameter dependencies such as the pump-probe time delay t_pp. These values are exported as HDF5, MAT or TIFF, with the metadata included as attributes of the files.

Workflow archiving and reuse

Computational workflows are valued by their reproducibility³⁸. Archiving and sharing the workflow parameters among users of the beamlines or facilities allow comparison between experimental runs and reuse for the simultaneous benefits of machine diagnostics and experimental efficiency. To achieve this, we store critical parameters generated within the workflow in a separate file as workflow parameters (see Fig. 1) during each step, including the numerical values used in binning, the intermediate parameters and coefficients of the correction and calibration functions, etc. They can be loaded and reused when processing other datasets.
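A minimal sketch of such parameter archiving is given below, assuming a JSON file as the container. The file name, the parameter names and their values are hypothetical; the files actually written by the packages may be organized differently.

```python
import json
import os
import tempfile

# Hypothetical workflow parameters collected during processing.
workflow_params = {
    "binning": {"axes": ["X", "Y", "TOF"], "nbins": [256, 256, 400]},
    "energy_calibration": {"poly_coeffs": [1.2, -3.4e-3, 5.6e-7], "offset": 0.0},
    "momentum_calibration": {"f_X": 0.011, "f_Y": 0.011, "center_px": [256, 256]},
}

def save_params(params, path):
    """Write the parameters to a human-readable JSON file."""
    with open(path, "w") as fh:
        json.dump(params, fh, indent=2, sort_keys=True)

def load_params(path):
    """Read the parameters back for reuse with another dataset."""
    with open(path) as fh:
        return json.load(fh)

path = os.path.join(tempfile.mkdtemp(), "workflow_params.json")
save_params(workflow_params, path)
reloaded = load_params(path)
```

A text-based format keeps the archived parameters easy to inspect and to compare between experimental runs.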

Data visualization

The adaptation of established scientific visualization methods in the physical sciences^39,40 to band mapping data should incorporate the requirements and knowledge of the data characteristics in this field of research. The band mapping data in 3D (multi-megavoxel) and 3D + t (multi-gigavoxel) include the inherent symmetries from the electronic band structure of the material, but the intensity modulations in the photoemission process⁴¹, dynamics and sample condition disrupt the original symmetry. The overall goal is to emphasize the features of interest while exploiting the symmetry to simplify the visualization (see Methods). The output files from the processing pipeline are compatible with open-source visualization software such as matplotlib⁴², ParaView³⁹ and Blender⁴³.

Downstream analysis integration

Typical photoemission data analysis involves extracting electronic band structure parameters, physical coupling constants and lifetimes via fitting of lineshapes¹⁶ or dynamical models⁴⁴, often carried out specific to the material under study. At the end of our distributed workflow, the data size is on the order of a few to tens of gigabytes, which can be directly loaded into memory on users’ local machines for downstream data analysis with custom routines.

Experimental metadata

The metadata of the data files have a tree structure and contain information of the experimental setting, parameters of the pulsed light source, the detector and the sample under study. A list of top-level metadata parameters is presented in Table 1. A full and current list of all metadata parameters, including the top-level parameters and their constituent lower-level parameters, along with their definitions, units and values, is provided in Supplementary Tables 1–4. For database integration, an accompanying data parser (parser-mpes, see Code availability) for MPES data has been written in accordance with existing standards⁴⁵ for computational materials science in NOMAD⁸, featuring an electronic version of the metadata parameter list in the file mpes.nomadmetainfo.json online. The metadata parameter list and the data parser are versioned and are updated based on the corresponding changes in the data structure for photoemission spectroscopy experiments. The existing WSe₂ photoemission data have been integrated into the experimental section of the materials science database NOMAD (see Data availability).

Table 1 Top-level metadata parameters.

Discussion

We have designed and implemented an open-source, end-to-end workflow for processing single-event data produced in multidimensional photoemission spectroscopy, linking to downstream tasks, providing guidelines and software for integrating processed data into the NOMAD experimental materials science database. The distributed processing takes full advantage of the single-event data streams directly accessible from the TOF delay-line detector for event-wise correction and calibration and converts the raw events to the calibrated data hypervolume for project-specific downstream analysis. The functionalities within the workflow are publicly accessible through the software packages we have developed (hextof-processor³⁰ and mpes³¹). The processing workflow is archived at each step of operation and the processed data may be integrated into an experimental database with user-specified metadata. The methods described here are applicable to all existing types of multidimensional photoemission band mapping measurements beyond the static and time-dependent settings described here.

Our end-to-end workflow from raw data to processed data to database integration provides a fast-track and all-in-one solution to the demands for open experimental data and reproducible research in the materials science community^7,8. The public repositories for the software packages are the foundations for phased future extension and integration with existing analytical tools in the photoemission spectroscopy community. The modular structure of the packages introduced here allows targeted upgrades by both temporary and dedicated maintainers and users. Casting the workflow in the Python programming environment provides the foundation for convenient incorporation of existing image processing and machine-learning resources⁴⁶ for further exploration and understanding of the band mapping datasets, which contain rich information owing to the complex nature of the photoemission process^16,18. This is especially beneficial for broader adoption of photoemission since the interpretation of photoemission data is often linked to the observed or extracted outstanding features such as local intensity extrema, dispersion kinks and satellites, lineshape parameters and pattern symmetry¹⁶. Access to experimental data and the potential integration with existing electronic structure-related software^5,47,48,49 will therefore facilitate method development and the direct comparison between experimental results and theoretical band structure calculations within the same programming platform.

Methods

Sample preparation

Single-crystalline samples of 2D bulk WSe₂ (2H stacking) were purchased from HQ Graphene. Crystals of size around 5 mm × 5 mm × 1 mm were used directly for the measurements. To prepare a clean surface by cleaving, we attached a cleaving pin upright to the sample surface using conducting epoxy (EPOTEC H20) outside the vacuum chamber and removed the pin by mechanical force in ultrahigh vacuum.

Photoemission experiments

The measurements were conducted using the HEXTOF instrument²⁴ at the DESY FLASH PG-2 beamline⁵⁰ with the free-electron laser (FEL) as well as a laboratory source²¹ with a METIS electron momentum microscope (SPECS METIS 1000) installed at the FHI. In the measurements at FLASH, the FEL was tuned to 36.5 eV (or 34.0 nm) and 109 eV (or 11.4 nm); the optical pump pulse had a center wavelength of 775 nm. The measurements at the FHI used a 21.7 eV home-built extreme UV source based on high harmonic generation in Ar gas driven by an optical parametric chirped-pulse amplifier operating at 500 kHz repetition rate⁵¹. The optical pump pulse was centered at 800 nm. In both FEL and laboratory experiments, the near-infrared light pulses promote the electronic population at the K and K′ high-symmetry points (corresponding to the $\bar{\mathrm{K}}$ and $\bar{\mathrm{K}}'$ points, respectively, in the projected Brillouin zone obtained from photoemission, as shown in Fig. 5) in momentum space to the excited states via direct optical transitions. The nonequilibrium electronic dynamics are probed via valence and conduction band photoemission³⁵ as well as core-level photoemission³⁶, using s-polarized extreme UV and soft X-ray probe pulses, respectively.

Digitization artifact

The time-to-digital converter (TDC) outputs digitized data according to the binning width of the on-board electronics. Data conversion from one digitized format to another in a rebinning process often creates a picket fence-like effect (see Fig. 3). This phenomenon originates from the incommensurate bin sizes in the two rounds of sampling (binning and rebinning). To solve the problem, we add a small amount of uniformly distributed noise, with an amplitude equal to half of the original bin size, to the single-event values when carrying out the bin counts. This is similar to the histogram jittering (or dithering) technique^52,53 used in statistical visualization and computer graphics. Mathematically, the uniformly distributed noise U(−1,1), bounded in the range [−1,1], is added before binning to a univariate data stream, S = {S_i}, via

$$S'_i = S_i + \frac{w_b}{2} \times U(-1,1).$$

(1)

Here, w_b is the bin width. For binning of multivariate data streams, such as the detector X position (or k_x), Y position (or k_y), and the photoelectron TOF (or E), we adopt the same approach individually for each dimension. The effect of jittering in reducing the digitization artifact is demonstrated in Fig. 3.
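The effect of jittering can be reproduced with synthetic data. The sketch below assumes symmetric uniform noise spanning [−w_b/2, w_b/2], i.e. an amplitude of half the bin width, and digitized values placed at bin centres; the function `jitter` and all numbers are hypothetical and chosen only to make the picket-fence pattern visible.

```python
import numpy as np

def jitter(values, bin_width, rng):
    """Add uniform noise of amplitude bin_width / 2 to digitized values."""
    return values + 0.5 * bin_width * rng.uniform(-1.0, 1.0, size=values.shape)

rng = np.random.default_rng(0)
w_b = 1.0                                    # original bin width (arbitrary units)
# Digitized values sitting at the centres of the original bins.
raw = np.floor(rng.uniform(0, 100, 200_000)) * w_b + 0.5 * w_b

# Rebin with an incommensurate bin width (0.75 instead of 1.0).
edges = np.arange(0, 100.01, 0.75)
counts_raw, _ = np.histogram(raw, bins=edges)
counts_jit, _ = np.histogram(jitter(raw, w_b, rng), bins=edges)

# Relative spread of the counts: large for the picket-fence pattern,
# small once the values are spread over their original bins.
spread_raw = counts_raw.std() / counts_raw.mean()
spread_jit = counts_jit.std() / counts_jit.mean()
```

Without jittering, some of the new bins contain no digitized value at all while their neighbours contain a full original bin, which produces the alternating pattern; after jittering the counts follow the underlying flat distribution up to counting noise.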

Spherical timing aberration

Electrons entering the TOF tube at different lateral positions travel through different path lengths to reach the detector, which is the origin of the spherical timing aberration as illustrated in Fig. 4. The lateral position-dependent time delay may be expressed as,

$$\Delta \mathrm{TOF}_{\mathrm{sph}}(r) = \left(\sqrt{1 + r^{2}/d^{2}} - 1\right)\mathrm{TOF}_{0},$$

(2)

where r is the radial distance from the center of the DLD, d is the length of the field-free region in the TOF tube and TOF₀ is the TOF normalization constant. For a typical field-free region length of d ∼ 1 m and a DLD screen radius of r = 50 mm, $\Delta \mathrm{TOF}/\mathrm{TOF}_{0} \approx 1.25 \times 10^{-3}$. Assuming TOF₀ = 0.5 μs, the spherical timing aberration in TOF scale is $\Delta \mathrm{TOF}_{\mathrm{sph}} \approx 0.62$ ns, which is larger than the DLD’s temporal resolution of ∼0.15 ns. The effect of the spherical timing aberration is visible over an energy range of a few eV with fine bins, but is small over a large energy range. To illustrate this effect, we use the W 4f core-level data presented in Fig. 4b. For every (X, Y) position on the detector, the peak of W 4f_7/2 was fitted with a Voigt profile and the peak positions are shown in Fig. 4c. As the spectra from deep core levels typically do not show dispersion, the deviation from fitting corresponds to the spherical timing aberration of the electron optics. In order to compensate for the spherical timing aberration, we first transform the data from Cartesian to polar coordinates (see Fig. 4c), and then fit the radially averaged peak position to a polynomial function of the radius,

$$\Delta \mathrm{TOF}_{\mathrm{sph}}(r) = \frac{r^{2}\,\mathrm{TOF}_{0}}{2d^{2}} - \frac{r^{4}\,\mathrm{TOF}_{0}}{8d^{4}} + \mathrm{O}(r^{6}).$$

(3)

The fitting results together with the corrected radial distribution are presented in Fig. 4d.
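The numbers quoted above, and the size of the terms neglected in Eq. (3), can be checked with a short sketch. The values of d, r and TOF₀ are those given in the text; the synthetic event positions and the event-wise subtraction are assumptions made for illustration and do not reproduce the fitting procedure applied to the W 4f data.

```python
import numpy as np

def delta_tof_sph(r, d, tof0):
    """Exact path-length delay of Eq. (2); r and d in the same length unit."""
    return (np.sqrt(1.0 + (r / d) ** 2) - 1.0) * tof0

def delta_tof_series(r, d, tof0):
    """Expansion of Eq. (3), truncated after the r^4 term."""
    return tof0 * (r**2 / (2 * d**2) - r**4 / (8 * d**4))

d, tof0 = 1.0, 500.0            # d in m, TOF_0 in ns (0.5 microseconds)
r_edge = 0.05                   # 50 mm screen radius, in m
ratio = delta_tof_sph(r_edge, d, 1.0)       # relative delay at the screen edge
shift_ns = delta_tof_sph(r_edge, d, tof0)   # absolute delay in ns

# Event-wise correction on synthetic data: every event has the same
# nominal TOF plus the radius-dependent delay.
rng = np.random.default_rng(0)
x, y = rng.uniform(-0.035, 0.035, (2, 1000))   # hypothetical positions in m
r = np.hypot(x, y)
tof = 500.0 + delta_tof_sph(r, d, tof0)
tof_corrected = tof - delta_tof_series(r, d, tof0)
residual = np.max(np.abs(tof_corrected - 500.0))
```

For these parameters the residual of the truncated series is far below the detector's temporal resolution, which suggests that a low-order polynomial in r is an adequate fitting function.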

Symmetry distortion

Photoemission patterns in the (k_x, k_y) plane (i.e. an energy slice) may exhibit distorted symmetry due to the influence of various factors from the instrument, the sample and the experimental geometry on the trajectory of low-energy photoelectrons. Correction of the symmetry distortion yet preserving the intensity features requires the use of symmetry-related landmarks to solve for the symmetrization coordinate transform in the framework of nonrigid image registration⁵⁴. In typical situations with an excellent electron lens alignment, the energy dependence of the momentum distortion within the focused phase space volume covering an energy range of several eV is negligible, so the same coordinate transform can be applied to all energy slices in the volumetric data (including both valence and conduction bands) or simultaneously to all single events.

Other single-experiment artifacts

(1) Momentum center shift: The momentum center of the emergent photoelectrons travelling through the electron-optic system may experience an energy-dependent shift owing to the slight misalignment in the system or the influence of stray fields. Correction of the center shift requires an energy-dependent center alignment of energy slices. The shift along the energy (or TOF) axis may be estimated using phase correlation⁵⁵ or mutual information-based⁵⁶ sequential image registration methods, in which the series of energy slices are treated as an image sequence. In a well-shielded and well-aligned electron-optic lens system, generally, the momentum center shift is negligible in the focused photoelectron energy range. (2) Space-charge effect (SCE): The secondary photoelectron clouds originating from the probe and pump pulses cause a “doming effect” of the photoemission intensity distribution around the momentum center of the band structure. This is especially visible in systems with a clear Fermi edge^9,11 or non-dispersing shallow core levels, which may be used as references for calibrating the parameters used for the flattening transform by applying a momentum-dependent shift $\Delta \mathrm{TOF}_{\mathrm{sc}}(k_x, k_y)$ in the TOF (or the calibrated energy) coordinate of the single-event data.
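The phase correlation estimate of the momentum center shift between two energy slices can be sketched as follows. This is a generic integer-pixel implementation written with NumPy FFTs and not the routine used in the packages; the function name, the small regularization constant and the synthetic slices are assumptions.

```python
import numpy as np

def phase_correlation_shift(ref, img):
    """Integer-pixel (row, col) translation that maps ref onto img."""
    cross = np.fft.fft2(img) * np.conj(np.fft.fft2(ref))
    cross /= np.abs(cross) + 1e-12            # keep the phase only
    corr = np.fft.ifft2(cross).real
    peak = np.array(np.unravel_index(np.argmax(corr), corr.shape))
    size = np.array(corr.shape)
    # Peaks beyond half the image size correspond to negative shifts.
    return np.where(peak > size // 2, peak - size, peak)

# Synthetic energy slices: a Gaussian blob and a copy displaced by (3, -2) pixels.
yy, xx = np.mgrid[0:64, 0:64]
ref = np.exp(-((yy - 30) ** 2 + (xx - 34) ** 2) / 50.0)
img = np.roll(ref, shift=(3, -2), axis=(0, 1))

shift = phase_correlation_shift(ref, img)
aligned = np.roll(img, shift=tuple(-shift), axis=(0, 1))
```

Applied sequentially to neighbouring energy slices, the estimated shifts give the energy-dependent center trajectory that is then removed from the volume or from the single events.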

Momentum calibration

The scaling factors for momentum calibration are computed by comparing the positions of known high symmetry points in the band structure with their corresponding locations in an energy slice. Suppose A and B are two high symmetry points identifiable (e.g. as local extrema) from the experimental data with pixel positions (X_A, Y_A) and (X_B, Y_B), and momentum positions ($k_x^A$, $k_y^A$) and ($k_x^B$, $k_y^B$), respectively. We calculate the pixel-to-momentum scaling ratios, f_X and f_Y, along the X (column) and Y (row) directions of a 2D k-space image, respectively. Then, the momentum coordinate (k_x, k_y) at each pixel position (X, Y) may be derived:

$$f_D = \left(k_d^A - k_d^B\right)/\left(D_A - D_B\right)$$

(4)

$$k_d = f_D \times (D - D_A)\quad (D, d = X, x\ \text{or}\ Y, y)$$

(5)
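A short sketch of the two-point momentum calibration is shown below. The function name and the landmark values are hypothetical. Eq. (5) as written places the momentum origin at point A; the sketch adds the offset k^A, which is an assumption that reduces to Eq. (5) when A lies at zero momentum (e.g. the zone center).

```python
import numpy as np

def momentum_axes(shape, px_a, px_b, k_a, k_b):
    """Calibrated (k_x, k_y) axes from two landmarks A and B.

    px_a, px_b: (X, Y) pixel positions; k_a, k_b: (k_x, k_y) momenta.
    A and B must differ in both pixel coordinates.
    """
    n_rows, n_cols = shape
    f_x = (k_a[0] - k_b[0]) / (px_a[0] - px_b[0])   # Eq. (4) with D = X
    f_y = (k_a[1] - k_b[1]) / (px_a[1] - px_b[1])   # Eq. (4) with D = Y
    # Eq. (5) plus the offset k^A, which vanishes when A is the origin.
    kx = k_a[0] + f_x * (np.arange(n_cols) - px_a[0])
    ky = k_a[1] + f_y * (np.arange(n_rows) - px_a[1])
    return kx, ky, (f_x, f_y)

# Illustrative landmarks: A at the image center with zero momentum,
# B at a second high-symmetry point with an assumed momentum.
kx, ky, (f_x, f_y) = momentum_axes(
    shape=(512, 512), px_a=(256, 256), px_b=(356, 316),
    k_a=(0.0, 0.0), k_b=(1.0, 0.6),
)
```

The same scaling factors can be applied to the X and Y columns of the single-event data before binning, in line with the event-wise calibration described in the workflow.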

Energy calibration

The calibration requires a set of band mapping data measured at different bias voltages (applied between the material sample and the ground), usually sampled with a spacing of 0.5 V in a range of ± 3–5 V around the normally applied bias voltage for a particular sample. The calibration proceeds by finding the TOF feature (e.g. local extrema) correspondences in the 1D energy distribution curves (EDCs) at different biases using the dynamic time warping algorithm⁵⁷. The transformation from the TOF to the photoelectron energy E is approximated as a polynomial function,

$$E(\mathrm{TOF}) = \sum_{i=0}^{n} a_i\,\mathrm{TOF}^{i}$$

(6)

The approximation is accurate within a range of ∼20 eV, sufficient to cover the entire valence band and some low-lying conduction bands of typical materials. The polynomial coefficients are determined by least squares, solving the linear system $\Delta T \cdot \boldsymbol{a} = \Delta E$, in which $\boldsymbol{a} = (a_1, a_2, \ldots)^{T}$ is the coefficient vector, while the constant offset a₀ is determined by manual alignment to an energy reference, such as the Fermi level or valence band maximum. The vector ΔE and the matrix ΔT contain, respectively, the pairwise differences of the bias voltages and the polynomial terms of differential TOF values. To calibrate a large energy range including multiple core levels, a piecewise polynomial may be used¹¹.
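The construction of ΔT and ΔE from pairwise differences can be sketched with synthetic data. The function names, the TOF units, the polynomial order and the sign convention relating bias shifts to energy shifts are assumptions of this example; the feature matching by dynamic time warping is not included, and the TOF positions of the matched feature are taken as given.

```python
import numpy as np
from itertools import combinations

def energy(tof, coeffs, a0):
    """Evaluate E(TOF) = a0 + sum_i coeffs[i-1] * TOF**i for i >= 1."""
    powers = np.arange(1, len(coeffs) + 1)
    return a0 + np.sum(coeffs * np.power.outer(tof, powers), axis=-1)

def fit_energy_polynomial(tof_peaks, biases, order=2):
    """Least-squares solution of dT . a = dE for a = (a_1, ..., a_n)."""
    tof_peaks = np.asarray(tof_peaks, dtype=float)
    pairs = list(combinations(range(len(tof_peaks)), 2))
    powers = np.arange(1, order + 1)
    # Each row holds the differences of the polynomial terms for one pair.
    d_t = np.array([tof_peaks[j] ** powers - tof_peaks[k] ** powers
                    for j, k in pairs])
    d_e = np.array([biases[j] - biases[k] for j, k in pairs])
    coeffs, *_ = np.linalg.lstsq(d_t, d_e, rcond=None)
    return coeffs

# Synthetic example: TOF position of one feature at seven bias settings.
true_a = np.array([-8.0, 0.4])
tof_peaks = np.linspace(6.2, 6.8, 7)            # hypothetical TOF units
biases = energy(tof_peaks, true_a, a0=0.0)      # energy shift equals bias shift
coeffs = fit_energy_polynomial(tof_peaks, biases, order=2)

# The constant offset a0 follows from a reference, here a feature at
# tof_ref that is assigned E = 0.
tof_ref = 6.5
a0 = -energy(tof_ref, coeffs, a0=0.0)
```

Taking pairwise differences removes the unknown offset a₀ from the fit, which is why it has to be fixed separately by alignment to an energy reference.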

Pump-probe delay calibration

The time origin (“time zero”) in time-resolved photoemission spectroscopy, i.e. the temporal overlap of pump and probe pulses, is determined by fitting of a characteristic trace extracted from the data. Since the readings of the digital encoder (see Fig. 2) are sampled linearly, equally-spaced pump-probe delays are directly convertible from the readings using linear interpolation, given the boundary values of the translation stage positions and the corresponding delay times. For unequally-spaced delays, a delay marker is first added to each data point as a separate column after data acquisition to group together the encoder reading ranges that correspond to the specific time delays. The data binning is carried out over the delay marker column instead of the equally-sampled encoder readings.
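Both cases can be sketched in a few lines. The encoder readings, stage boundary values and encoder ranges below are hypothetical, and the function names are not taken from the packages.

```python
import numpy as np

def encoder_to_delay(enc, enc_bounds, delay_bounds):
    """Linear map from encoder readings to pump-probe delays."""
    return np.interp(enc, enc_bounds, delay_bounds)

def delay_markers(enc, enc_ranges):
    """Index of the encoder range containing each reading, -1 if none."""
    markers = np.full(np.shape(enc), -1)
    for idx, (lo, hi) in enumerate(enc_ranges):
        markers[(enc >= lo) & (enc < hi)] = idx
    return markers

enc = np.array([1000.0, 1250.0, 1500.0, 2000.0])   # hypothetical encoder readings

# Equally spaced delays: interpolate between the boundary values (in ps).
delays = encoder_to_delay(enc, (1000.0, 2000.0), (-1.0, 3.0))

# Unequally spaced delays: group readings into ranges and bin over the marker.
markers = delay_markers(enc, [(900, 1100), (1200, 1600), (1900, 2100)])
```

The marker array plays the role of the additional column mentioned above: binning over it collects all events recorded within the same encoder range into one delay step.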

Visualization strategies

We discuss here three methods for the display of volumetric band mapping data, which are, at the same time, the basis for visualizing 3D + t data with time as an animated axis. (1) The orthoslice representation includes orthogonal 2D planes selected in specific regions in the 3D volume³⁹, which highlights specific slices deep within the data that are less visible in a volumetrically rendered view (see Fig. 5a). Along this line, we have developed a software package, 4Dview⁵⁸, to explore 4D data using simultaneously linked orthoslices, which also features contrast adjustment and data integration within a hypervolume of interest. (2) The band-path plot (see Fig. 5b) is a 2D representation of the 3D band mapping volume generated by combining a series of 2D cuts along selected momentum paths (or k-paths) traversing a list of so-called high-symmetry points^59,60. This representation captures the largest dispersion within the band structure. For volumetric data, the same path may be sampled over the full energy range to produce the plot shown in Fig. 5b. The analysis and visualization modules in the mpes package include functionalities to compose customized band-path plots. (3) The cut-out view (see Fig. 5c) exposes a specific part of interest in the volumetric data, while not losing the rest³⁹. The analysis module in the mpes package provides ways to generate precise cut-outs using position landmarks (e.g. high-symmetry points labelled in Fig. 5) and inequalities.
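How a band-path plot is assembled from a volume can be illustrated with a nearest-pixel sketch. It is a simplified stand-in for the functionality in the mpes package: the function name, the random test volume and the pixel positions of the path nodes are hypothetical, and no interpolation between pixels is performed.

```python
import numpy as np

def band_path(volume, nodes, n_per_segment=50):
    """Nearest-pixel sampling of a (k_x, k_y, E) volume along a k-path.

    nodes: sequence of (row, col) pixel positions of high-symmetry points.
    Returns an array of shape (n_path_points, n_energy).
    """
    nodes = np.asarray(nodes, dtype=float)
    pieces = []
    for start, stop in zip(nodes[:-1], nodes[1:]):
        t = np.linspace(0.0, 1.0, n_per_segment, endpoint=False)[:, None]
        pieces.append(start + t * (stop - start))
    pieces.append(nodes[-1:])                 # close the path at the last node
    path = np.rint(np.vstack(pieces)).astype(int)
    # Take the full energy axis at every point along the path.
    return volume[path[:, 0], path[:, 1], :]

rng = np.random.default_rng(0)
volume = rng.random((64, 64, 20))             # stand-in for calibrated data
nodes = [(32, 32), (32, 60), (10, 50), (32, 32)]   # hypothetical closed path
cut = band_path(volume, nodes)
```

The resulting 2D array has the path coordinate along one axis and energy along the other, and can be displayed directly as an image, for example with matplotlib.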

Data availability

The single-event photoemission data used for demonstrating the workflow are available on the Zenodo platform at https://doi.org/10.5281/zenodo.2704787 (valence and conduction band photoemission at FEL)⁶¹, https://doi.org/10.5281/zenodo.3945432 (core-level photoemission at FEL)⁶² and https://doi.org/10.5281/zenodo.3987303 (valence band photoemission from laboratory setup)⁶³. The preprocessed data are being integrated into the NOMAD database in the domain for experimental materials science data accessible at https://nomad-lab.eu/prod/rae/gui/search?domain=ems.

Code availability

The code, including documentation and examples in Jupyter notebooks for implementing the data transformations in the workflow, is available as hextof-processor (https://github.com/momentoscope/hextof-processor)³⁰ and mpes (https://github.com/mpes-kit/mpes)³¹. The parser for integrating preprocessed experimental data into the NOMAD database is available as parser-mpes (https://gitlab.mpcdf.mpg.de/rpx/parser-mpes)⁶⁴.

References

Pruneau, C. Data Analysis Techniques for Physical Scientists (Cambridge University Press, 2017).
Deelman, E. et al. The future of scientific workflows. The International Journal of High Performance Computing Applications 32, 159–175 (2018).
Zakutayev, A. et al. An open experimental database for exploring inorganic materials. Scientific Data 5, 180053 (2018).
Himanen, L., Geurts, A., Foster, A. S. & Rinke, P. Data-Driven Materials Science: Status, Challenges, and Perspectives. Advanced Science 1900808 (2019).
Pizzi, G., Togo, A. & Kozinsky, B. Provenance, workflows, and crystallographic tools in materials science: AiiDA, spglib, and seekpath. MRS Bulletin 43, 696–702 (2018).
Perkel, J. M. Workflow systems turn raw data into scientific knowledge. Nature 573, 149–150 (2019).
Hill, J. et al. Materials science with large-scale data and informatics: Unlocking new opportunities. MRS Bulletin 41, 399–409 (2016).
Draxl, C. & Scheffler, M. NOMAD: The FAIR concept for big data-driven materials science. MRS Bulletin 43, 676–682 (2018).
Schönhense, G., Medjanik, K. & Elmers, H.-J. Space-, time- and spin-resolved photoemission. Journal of Electron Spectroscopy and Related Phenomena 200, 94–118 (2015).
Medjanik, K. et al. Direct 3D mapping of the Fermi surface and Fermi velocity. Nature Materials 16, 615–621 (2017).
Schönhense, B. et al. Multidimensional photoemission spectroscopy—the space-charge limit. New Journal of Physics 20, 033004 (2018).
Krömker, B. et al. Development of a momentum microscope for time resolved band structure imaging. Review of Scientific Instruments 79, 053702 (2008).
Ovsyannikov, R. et al. Principles and operation of a new type of electron spectrometer – ArTOF. Journal of Electron Spectroscopy and Related Phenomena 191, 92–103 (2013).
Damm, A. et al. Application of a time-of-flight spectrometer with delay-line detector for time- and angle-resolved two-photon photoemission. Journal of Electron Spectroscopy and Related Phenomena 202, 74–80 (2015).
Tusche, C., Krasyuk, A. & Kirschner, J. Spin resolved bandstructure imaging with a high resolution momentum microscope. Ultramicroscopy 159, 520–529 (2015).
Damascelli, A., Hussain, Z. & Shen, Z.-X. Angle-resolved photoemission studies of the cuprate superconductors. Reviews of Modern Physics 75, 473–541 (2003).
Yang, H. et al. Visualizing electronic structures of quantum materials by angle-resolved photoemission spectroscopy. Nature Reviews Materials 3, 341–353 (2018).
Suga, S. & Sekiyama, A. Photoelectron Spectroscopy: Bulk and Surface Electronic Structures (Springer, 2014).
Couprie, M. New generation of light sources: Present and future. Journal of Electron Spectroscopy and Related Phenomena 196, 3–13 (2014).
Chiang, C.-T. et al. Boosting laboratory photoelectron spectroscopy by megahertz high-order harmonics. New Journal of Physics 17, 013035 (2015).
Puppin, M. et al. Time- and angle-resolved photoemission spectroscopy of solids in the extreme ultraviolet at 500 kHz repetition rate. Review of Scientific Instruments 90, 023104 (2019).
Corder, C. et al. Ultrafast extreme ultraviolet photoemission without space charge. Structural Dynamics 5, 054301 (2018).
Buss, J. H. et al. A setup for extreme-ultraviolet ultrafast angle-resolved photoelectron spectroscopy at 50-kHz repetition rate. Review of Scientific Instruments 90, 023105 (2019).
Kutnyakhov, D. et al. Time- and momentum-resolved photoemission studies using time-of-flight momentum microscopy at a free-electron laser. Review of Scientific Instruments 91, 013109 (2020).
Folk, M., Heber, G., Koziol, Q., Pourmal, E. & Robinson, D. An overview of the HDF5 technology suite and its applications. In Proceedings of the EDBT/ICDT 2011 Workshop on Array Databases - AD ’11, 36–47 (ACM Press, New York, New York, USA, 2011).
Weiler, N. C., Collman, F., Vogelstein, J. T., Burns, R. & Smith, S. J. Synaptic molecular imaging in spared and deprived columns of mouse barrel cortex with array tomography. Scientific Data 1, 140046 (2014).
Ker, D. F. E. et al. Phase contrast time-lapse microscopy datasets with automated and manual cell tracking annotations. Scientific Data 5, 180237 (2018).
Levin, B. D. et al. Nanomaterial datasets to advance tomography in scanning transmission electron microscopy. Scientific Data 3, 160041 (2016).
Aversa, R., Modarres, M. H., Cozzini, S., Ciancio, R. & Chiusole, A. The first annotated set of scanning electron microscopy images for nanoscience. Scientific Data 5, 180172 (2018).
Acremann, Y. et al. hextof-processor. https://github.com/momentoscope/hextof-processor (2020).
Xian, R. P. & Rettig, L. mpes. https://github.com/mpes-kit/mpes (2020).
Ackermann, W. et al. Operation of a free-electron laser from the extreme ultraviolet to the water window. Nature Photonics 1, 336–342 (2007).
Riley, J. M. et al. Direct observation of spin-polarized bulk bands in an inversion-symmetric semiconductor. Nature Physics 10, 835–839 (2014).
Shallenberger, J. R. 2D tungsten diselenide analyzed by XPS. Surface Science Spectra 25, 014001 (2018).
Bertoni, R. et al. Generation and Evolution of Spin-, Valley-, and Layer-Polarized Excited Carriers in Inversion-Symmetric WSe2. Physical Review Letters 117, 277201 (2016).
Dendzik, M. et al. Observation of an Excitonic Mott Transition Through Ultrafast Core-cum-Conduction Photoemission Spectroscopy. Physical Review Letters 125, 096401 (2020).
Dask Development Team. Dask: Library for dynamic task scheduling https://dask.org (2016).
Stodden, V. et al. Enhancing reproducibility for computational methods. Science 354, 1240–1241 (2016).
Hansen, C. D. & Johnson, C. R. (eds.) The Visualization Handbook (Elsevier Butterworth-Heinemann, 2005).
Lipşa, D. R. et al. Visualization for the Physical Sciences. Computer Graphics Forum 31, 2317–2347 (2012).
Moser, S. An experimentalist’s guide to the matrix element in angle resolved photoemission. Journal of Electron Spectroscopy and Related Phenomena 214, 29–52 (2017).
Hunter, J. D. Matplotlib: A 2d graphics environment. Computing in Science & Engineering 9, 90–95 (2007).
Community, B. O. Blender - a 3D modelling and rendering package. Blender Foundation, Stichting Blender Foundation, Amsterdam http://www.blender.org (2018).
Weinelt, M. Time-resolved two-photon photoemission from metal surfaces. Journal of Physics: Condensed Matter 14, R1099–R1141 (2002).
Ghiringhelli, L. M. et al. Towards efficient data exchange and sharing for big-data driven materials science: metadata and data formats. npj Computational Materials 3, 46 (2017).
Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O. & Walsh, A. Machine learning for molecular and materials science. Nature 559, 547–555 (2018).
Ong, S. P. et al. Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Computational Materials Science 68, 314–319 (2013).
Hjorth Larsen, A. et al. The atomic simulation environment—a Python library for working with atoms. Journal of Physics: Condensed Matter 29, 273002 (2017).
Ganose, A. M., Jackson, A. J. & Scanlon, D. O. sumo: Command-line tools for plotting and analysis of periodic ab initio calculations. Journal of Open Source Software 3, 717 (2018).
Gerasimova, N., Dziarzhytski, S. & Feldhaus, J. The monochromator beamline at FLASH: performance, capabilities and upgrade plans. Journal of Modern Optics 58, 1480–1485 (2011).
Puppin, M. et al. 500 kHz OPCPA delivering tunable sub-20 fs pulses with 15 W average power based on an all-ytterbium laser. Optics Express 23, 1491 (2015).
Chambers, J. M., Cleveland, W. S., Tukey, P. A. & Kleiner, B. Graphical Methods for Data Analysis (Wadsworth International Group, 1983).
Novo, D. & Wood, J. Flow cytometry histograms: Transformations, resolution, and display. Cytometry Part A 73A, 685–692 (2008).
Xian, R. P., Rettig, L. & Ernstorfer, R. Symmetry-guided nonrigid registration: The case for distortion correction in multidimensional photoemission spectroscopy. Ultramicroscopy 202, 133–139 (2019).
Guizar-Sicairos, M., Thurman, S. T. & Fienup, J. R. Efficient subpixel image registration algorithms. Optics Letters 33, 156 (2008).
Viola, P. & Wells, W. M. Alignment by Maximisation of Mutual Information. International Journal of Computer Vision 24, 137–154 (1997).
Salvador, S. & Chan, P. Toward accurate dynamic time warping in linear time and space. Intelligent Data Analysis 11, 561–580 (2007).
Dendzik, M. mdendzik/4Dview 1.0. Zenodo https://doi.org/10.5281/zenodo.3360817 (2019).
Setyawan, W. & Curtarolo, S. High-throughput electronic band structure calculations: Challenges and tools. Computational Materials Science 49, 299–312 (2010).
Hinuma, Y., Pizzi, G., Kumagai, Y., Oba, F. & Tanaka, I. Band structure diagram paths based on crystallography. Computational Materials Science 128, 140–184 (2017).
Xian, R. P. et al. Multidimensional photoemission spectra of tungsten diselenide. Zenodo https://doi.org/10.5281/zenodo.2704787 (2020).
Dendzik, M. et al. Time-resolved core-level photoemission data of tungsten diselenide. Zenodo https://doi.org/10.5281/zenodo.3945432 (2020).
Xian, R. P. et al. Datasets for the computational workflow of multidimensional photoemission spectroscopy. Zenodo https://doi.org/10.5281/zenodo.3987303 (2020).
Xian, R. P. & Scheidgen, M. parser-mpes. https://gitlab.mpcdf.mpg.de/rpx/parser-mpes (2019).


Acknowledgements

We thank G. Schönhense for support on the photoelectron detector, S. Grunewald, S. Schülke and G. Schnapka for support on the computing infrastructures. We thank G. Brenner, H. Redlin and S. Dziarzhytski at FLASH, DESY, and H. Meyer and S. Gieschen from the University of Hamburg for beamline and instrumentation support. The work was partially supported by BiGmax, the Max Planck Society’s Research Network on Big-Data-Driven Materials-Science, the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant No. ERC-2015-CoG-682843), and the German Research Foundation (DFG) through the Emmy Noether program under grant number RE 3977/1 and the SFB/TRR 227 “Ultrafast Spin Dynamics” (projects A09 and B07). D. Kutnyakhov, M. Heber and W. Wurth acknowledge funding by the DFG within the framework of the Collaborative Research Centre SFB 925 - 170620586 (project B2). F. Pressacco acknowledges funding from the excellence cluster EXC 1074 “The Hamburg Centre for Ultrafast Imaging - Structure, Dynamics and Control of Matter at the Atomic Scale” of the DFG. S. Y. Agustsson and J. Demsar acknowledge the financial support by the DFG in the framework of the Collaborative Research Centre SFB TRR 173 - 268565370 (project A5). D. Curcio and P. Hofmann acknowledge funding from VILLUM FONDEN via the Centre of Excellence for Dirac Materials (Grant No. 11744). T. Pincelli thanks the Alexander von Humboldt Foundation for financial support. Open Access funding enabled and organized by Projekt DEAL.

Author information

Deceased: Wilfried Wurth.

Authors and Affiliations

Fritz Haber Institute of the Max Planck Society, 14195, Berlin, Germany
R. Patrick Xian, Maciej Dendzik, Shuo Dong, Tommaso Pincelli, Martin Wolf, Markus Scheidgen, Laurenz Rettig & Ralph Ernstorfer
Laboratory for Solid State Physics, ETH Zurich, 8093, Zurich, Switzerland
Yves Acremann & Kevin Bühlmann
Institute of Physics, University of Mainz, 55128, Mainz, Germany
Steinn Y. Agustsson & Jure Demsar
Department of Physics and Astronomy, Interdisciplinary Nanoscience Center (iNANO), Aarhus University, 8000, Aarhus C, Denmark
Davide Curcio & Philip Hofmann
DESY Photon Science, 22607, Hamburg, Germany
Dmytro Kutnyakhov, Federico Pressacco, Michael Heber & Wilfried Wurth
Department of Physics, University of Hamburg, 22761, Hamburg, Germany
Federico Pressacco & Wilfried Wurth
Department of Physics, Humboldt University of Berlin, 12489, Berlin, Germany
Markus Scheidgen

Contributions

Y.A., K.B., S.Y.A., D.C., R.P.X. and M.D. wrote the hextof-processor package. R.P.X. and L.R. wrote the mpes package. D.K., Y.A., F.P., R.P.X., S.Y.A., D.C., M.D., M.H., S.D., P.H., L.R., R.E. and W.W. participated in the experiments at the FLASH PG-2 beamline using the HEXTOF instrument in Hamburg. S.D. and L.R. conducted the experiment at the Fritz Haber Institute using the METIS electron momentum microscope. R.P.X., M.D., L.R., R.E., M.S. and T.P. constructed the metadata format; R.P.X. and M.S. implemented it in parser-mpes. R.P.X. wrote the initial manuscript with contributions from M.D. and Y.A. All authors contributed to discussions to bring the manuscript to its final form.

Corresponding authors

Correspondence to R. Patrick Xian, Laurenz Rettig or Ralph Ernstorfer.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Xian, R.P., Acremann, Y., Agustsson, S.Y. et al. An open-source, end-to-end workflow for multidimensional photoemission spectroscopy. Sci Data 7, 442 (2020). https://doi.org/10.1038/s41597-020-00769-8


Received: 24 August 2020
Accepted: 13 November 2020
Published: 17 December 2020
DOI: https://doi.org/10.1038/s41597-020-00769-8
