Master Thesis: A comprehensive solution for big data handling in material research

Jülich Research Centre (FZJ)

Jülich, Germany

Work group:

IEK-9 – Grundlagen der Elektrochemie

Area of research:

Diploma & Master Thesis

Starting date:


Contract time limit:


Job description:

Background

Recent, rapid progress in electronics has opened new horizons in experimental techniques, especially in imaging. For example, beamline ID15 at the ESRF can now record 2016×2016 X-ray diffraction images at a rate of 1 kHz [1]. With continuing progress in networking devices and detectors/scintillators, a breakthrough into the GHz range is only a matter of time [2]. In 3D tomographic imaging, dozens of acquisitions can be performed in a single in situ / in operando experiment, resulting in hundreds of thousands of features being tracked over time to understand a particular process [3]. As a result, researchers have to deal with a growing amount of information; a single experiment may now produce terabytes of data. Moreover, understanding the physical phenomena of interest requires multimodal analysis, in which experimental data are collected with very different techniques, often in exotic data formats. This leads to a big-data problem that requires modern methods for data processing and meta-analysis, e.g. machine learning. Existing solutions for handling imaging data are usually specific to a few modalities within a single research field. For example, XNAT [4] is an open-source archive system for neuroimaging that supports both MRI and XCT scanners.

Unfortunately, in materials science the variety of modalities and data types (from text files through images to proprietary formats) makes this more challenging. The aim of this work is to provide an open-source, comprehensive and expandable (i.e. plugin-based) software solution for multimodal data produced by the experimental techniques used in materials science. These include XCT, MRI, SEM, PDF, XRD, XRS, SAXS, SANS, DCT, XANES, XAS and more. The software will archive the experimental data and store it at different levels of processing, starting from raw images, on a dedicated RAID system. A search engine, together with an API following the well-established REST architectural style [5], will give researchers easy access to the database. Browsing for relevant information will be possible from any platform (Linux, Windows, Mac) via any interface, e.g. direct REST queries, a simple web interface or Matlab/Python scripts. Finally, the meta-analysis will link the multiple modalities to enhance the statistical power of the results.
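To illustrate what "plugin-based" could mean in practice, the following is a minimal Python sketch of a parser registry, in which each modality contributes one parser function. All names here (`PARSERS`, `register`, `parse_sem`, `ingest`) are hypothetical and chosen only for illustration; the actual design is part of the thesis work and may look quite different.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a modality name (e.g. "SEM") to a parser function.
PARSERS: Dict[str, Callable[[bytes], dict]] = {}


def register(modality: str):
    """Decorator that adds a parser plugin to the registry."""
    def wrap(func: Callable[[bytes], dict]) -> Callable[[bytes], dict]:
        PARSERS[modality.upper()] = func
        return func
    return wrap


@register("SEM")
def parse_sem(raw: bytes) -> dict:
    # Illustrative only: a real SEM plugin would decode a vendor format here.
    return {"modality": "SEM", "size_bytes": len(raw)}


def ingest(modality: str, raw: bytes) -> dict:
    """Dispatch raw data to the plugin registered for its modality."""
    try:
        parser = PARSERS[modality.upper()]
    except KeyError:
        raise ValueError(f"no plugin registered for modality {modality!r}")
    return parser(raw)
```

In such a design, support for a further modality would be added by decorating a new function with, say, `@register("XCT")`, without changing `ingest` itself.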

References

[1] W. U. Mirihanage, M. Di Michiel, A. Reiten, L. Arnberg, H. B. Dong, and R. H. Mathiesen, “Time-resolved X-ray diffraction studies of solidification microstructure evolution in welding,” Acta Mater., vol. 68, pp. 159–168, Apr. 2014.
[2] C. Hu et al., “Ultrafast inorganic scintillator-based front imager for Gigahertz Hard X-ray imaging,” Nucl. Instrum. Methods Phys. Res. Sect. A: Accel. Spectrometers Detect. Assoc. Equip., vol. 940, pp. 223–229, Oct. 2019.
[3] K. Dzieciol et al., “Void growth in copper during high-temperature power-law creep,” Acta Mater., vol. 59, no. 2, pp. 671–677, 2011.
[4] D. S. Marcus, T. R. Olsen, M. Ramaratnam, and R. L. Buckner, “The Extensible Neuroimaging Archive Toolkit: an informatics platform for managing, exploring, and sharing neuroimaging data,” Neuroinformatics, vol. 5, no. 1, pp. 11–34, 2007.
[5] R. T. Fielding, “Architectural Styles and the Design of Network-based Software Architectures,” PhD Thesis, University of California, Irvine, 2000.

Your tasks

  • Getting familiar with the experimental techniques: literature studies and assisting researchers in the labs.
  • Researching existing solutions for experimental data handling.
  • Researching machine learning applications for experimental data coming from multiple modalities.
  • Software design: deciding on the programming language/framework to use, and designing the structure and user interface based on the experience gained in the labs.
  • Creating basic data parsers for proprietary data formats.
  • Designing the relational database (SQL).
  • Investigating the possibility of connecting directly to the instruments in order to pull the raw data automatically.
  • Implementing the REST API.
  • Writing plugins for the modalities used at IEK-9 (e.g. SEM, XCT).
  • Creating the documentation, with an emphasis on plugin development.
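
As a rough idea of the database task, the sketch below uses Python's built-in `sqlite3` module to link experiments to datasets stored at different processing levels. The table and column names (`experiment`, `dataset`, `level`, `path`) and the helper functions are assumptions made for illustration only, not a prescribed schema.

```python
import sqlite3

# Hypothetical schema: one experiment owns many datasets, each tagged with
# its modality and processing level (e.g. "raw", "processed").
SCHEMA = """
CREATE TABLE experiment (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE dataset (
    id            INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    modality      TEXT NOT NULL,
    level         TEXT NOT NULL,
    path          TEXT NOT NULL
);
"""


def create_db() -> sqlite3.Connection:
    """Create an in-memory database with the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_experiment(conn: sqlite3.Connection, name: str) -> int:
    cur = conn.execute("INSERT INTO experiment (name) VALUES (?)", (name,))
    return cur.lastrowid


def add_dataset(conn: sqlite3.Connection, experiment_id: int,
                modality: str, level: str, path: str) -> int:
    cur = conn.execute(
        "INSERT INTO dataset (experiment_id, modality, level, path) "
        "VALUES (?, ?, ?, ?)",
        (experiment_id, modality, level, path),
    )
    return cur.lastrowid


def modalities_for(conn: sqlite3.Connection, experiment_id: int) -> list:
    """Return the distinct modalities recorded for one experiment, sorted."""
    rows = conn.execute(
        "SELECT DISTINCT modality FROM dataset "
        "WHERE experiment_id = ? ORDER BY modality",
        (experiment_id,),
    )
    return [row[0] for row in rows]
```

A query such as `modalities_for` is the kind of lookup a REST endpoint or search engine could build on when linking several modalities of the same experiment.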

Your profile

  • Very good grades in computer science and programming.
  • Some practical experience in programming (e.g. a project on GitHub in any programming language).
  • Knowledge of database concepts.
  • Basic familiarity with REST APIs, database query languages and the experimental techniques used in materials science is welcome but not required.
  • Fluency in English is very important due to the international environment at IEK-9.

Our offer

  • Access to state-of-the-art infrastructure
  • Friendly, supportive environment

This research center is part of the Helmholtz Association of German Research Centers. With more than 42,000 employees and an annual budget of over € 5 billion, the Helmholtz Association is Germany’s largest scientific organisation.

Please apply via the recruiter’s website.

Quote Reference: 2019M-101