Institute for Data Processing and Electronics (IPE)
Area of research:
Progress in virtually all areas of physics research relies on recording and analyzing enormous amounts of data. This is equally true for high-energy physics at the LHC and planned future lepton and neutrino detectors as for experiments at high-intensity light sources such as the EU-XFEL or PETRA III. Recent improvements in detector instrumentation provide unprecedented detail to researchers. At the same time, data rates far outpace improvements in the performance of storage systems. Online data reduction is therefore crucial for the next generation of detectors.
However, the real-time extraction of knowledge from these vast data streams is challenging and often extremely computationally intensive. Sophisticated data processing workflows and novel hardware-aware parallel algorithms are essential to leverage heterogeneous computational resources and facilitate data-driven control of high-throughput detectors. Even then, no single computer will be able to deliver the required performance. Intelligent flow management relying on high-speed interconnects and RDMA (Remote Direct Memory Access) mechanisms is key to achieving good scalability.
We aim to establish a closer integration of data acquisition workflows with cloud-enabled HPC centers. The goal is to build infrastructure that pushes data from the detector directly into the local HPC data center and relies on cloud resources for data processing and reduction. Rapid advances in Ethernet technology provide sufficient readout bandwidth, but efficient data distribution methods relying on RDMA technologies are required to utilize the network capacity efficiently. One of the major challenges is to develop mechanisms that prevent data loss due to unavoidable hardware failures. A good compromise between system reliability and resource overhead may be achieved by enabling cooperation between the detector firmware and the cloud middleware: additional operational information from the detectors will help the middleware manage resources and steer the data flow efficiently.

Furthermore, a distributed data processing framework is required to simplify the development of scalable data reduction modules optimized for heterogeneous architectures. In particular, the framework should facilitate the deployment of extremely complex machine learning models that can be executed across multiple nodes and accelerated using FPGAs, GPUs, and/or custom neuro-computers. Building on existing cloud-native workflow engines, we need to contribute enhancements for data-intensive workloads and devise efficient data storage strategies. Facilitating zero-copy and RDMA mechanisms in container communication is one of the top priorities here. Finally, integration with existing control and monitoring systems is expected.
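To make the zero-copy idea concrete, the following is a minimal sketch of handing a detector buffer between processes through POSIX shared memory, so that the consumer attaches to the producer's segment by name instead of receiving a copy. This is only a loose single-host stand-in for the RDMA mechanisms discussed above; the frame size, payload, and function name are illustrative assumptions, not part of any existing framework.

```python
# Illustrative sketch only: shared-memory hand-off as a single-host
# analogue of zero-copy data distribution (sizes/names are hypothetical).
from multiprocessing import shared_memory

FRAME_BYTES = 4096  # hypothetical detector frame size


def handoff(payload: bytes) -> bytes:
    """Write payload into a named shared segment, re-attach to the same
    segment by name (no copy of the underlying pages), and read it back."""
    shm = shared_memory.SharedMemory(create=True, size=FRAME_BYTES)
    try:
        shm.buf[: len(payload)] = payload
        # "Consumer" side: attach by name -- both views alias the same memory.
        peer = shared_memory.SharedMemory(name=shm.name)
        try:
            return bytes(peer.buf[: len(payload)])
        finally:
            peer.close()
    finally:
        shm.close()
        shm.unlink()  # release the segment once all views are closed
```

In a real deployment the attach step would span container or node boundaries and be mediated by the middleware; here it merely illustrates that the consumer reads the producer's buffer without an intermediate copy.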
As a pilot project, you will apply the introduced concepts to build a control system for beam-diagnostic instrumentation. High-precision instruments such as KAPTURE and KALYPSO characterize short-pulse electron beams with unprecedented resolution. Using this information for control tasks requires real-time reduction and classification of data streams with rates in the 10 – 20 GB/s range.
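As a toy illustration of what "online data reduction" means for such a stream, the sketch below keeps only samples above a threshold from each incoming chunk and reports the achieved reduction ratio. The actual KAPTURE/KALYPSO pipelines are far more sophisticated; the threshold criterion, function names, and output format here are purely hypothetical.

```python
# Toy online data-reduction step: per-chunk thresholding.
# (Illustrative only -- not the actual KAPTURE/KALYPSO algorithm.)

def reduce_chunk(samples, threshold):
    """Keep only (index, value) pairs whose value exceeds the threshold,
    discarding the baseline samples that dominate the raw stream."""
    return [(i, v) for i, v in enumerate(samples) if v > threshold]


def reduction_ratio(samples, threshold):
    """Fraction of samples retained after reduction (0.0 .. 1.0)."""
    return len(reduce_chunk(samples, threshold)) / len(samples)
```

For example, a chunk `[0, 5, 1, 9]` reduced with threshold `4` retains only the two above-threshold samples, i.e. half of the input. At 10 – 20 GB/s, even such a simple per-chunk step must be parallelized across nodes and accelerators, which is precisely what the framework described above is meant to support.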
You have a university degree (diploma (Uni) / Master) in computer science, mathematics, or physics, as well as a good background in high-performance computing, cloud computing, distributed storage, and networking. Familiarity with the C and Python programming languages is required.
limited to 3 years
Application up to
Contact person in line-management