Area of research:
Diploma & Master Thesis
The position is suitable for part-time employment.
Modern scientific experiments require complex, distributed detector and control systems. The instrumentation integrates custom and commercial components from various sources and generates ever-increasing amounts of data. A variety of formats, underlying storage engines, and data workflows are in use. Proper manual data interpretation and quality assurance are often difficult or even impossible because of the tremendous increase in both the number and size of datasets. This raises the need for novel automatic or semi-automatic data analysis methods and tools. Information on operation and scientific meaning needs to be extracted from the data stream and presented to users in a visual, easy-to-interpret form.
The work is embedded in a project that aims to develop a novel platform for handling the data management tasks of mid-range scientific experiments. We plan to build tools that integrate the data recorded by different subsystems and make it available to users in a uniform, comprehensible, and easy-to-use fashion. The thesis focuses on the data storage subsystem. The student is expected to review novel database technologies and provide a detailed evaluation of several candidate solutions optimized for storing high volumes of time series data. The selected engine should be integrated with the existing data management system operating at the Aragats Space Environmental Center.
The ideal database will:
Reliably store high-bandwidth data streams;
Scale well in a cluster environment;
Include intelligent caching mechanisms to speed up queries;
Extract standard statistical information and provide a programming interface for computing custom properties;
Support geo-distributed operation modes;
Integrate with data analysis tools such as Apache Spark.