"The modern development approach of SIDESTREAM made me curious. I was convinced by the confident communication with the many stakeholders in the research center and the flexible handling of varying requirements during the course of the project. SIDESTREAM's software meets our scientific standards and makes our everyday research work much easier."
Background
Digitalization is also making its way into research. Our customer, ZEA-3, is a facility of Forschungszentrum Jülich (Research Center Jülich) with a focus on analytical services in the field of compositional analysis.
Using a variety of methods, procedures and technologies, ZEA-3 analyzes physical samples from all over the world. The results of these laboratory processes serve as the basis for scientific studies and thus form an important part of research. The data is also essential for external customers working on innovations and new products.
The challenge: Complex processes and lots of data
These scientific laboratory processes are laborious and complex. Before samples can be analyzed, they must first be prepared for the procedures being used. In addition, each manufacturer of laboratory instruments uses its own data set formats, which leads to a large number of results in many different formats.
For uniform structuring, the data is therefore transferred to Excel tables. This presentation of the data is all the more important because different research areas need access to the results.
Error-free assignment of the specific results is therefore crucial, and it demands a clear structure, since several hundred samples are often examined in parallel. The individual procedures and results build on one another and form a coherent laboratory process.
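To make the formatting problem concrete, here is a minimal pandas sketch of the kind of normalization involved. The vendor file layout and all column names are invented for illustration; they are not ZEA-3's actual formats:

```python
# Minimal sketch, not ZEA-3's actual code: the vendor file layout and all
# column names below are invented for illustration.
import pandas as pd

# Hypothetical instrument export: semicolon-separated, German decimal comma.
raw = pd.read_csv("vendor_a_run_042.csv", sep=";", decimal=",")

# Map the vendor-specific column names onto one shared schema.
uniform = raw.rename(columns={
    "Probe": "sample_id",
    "Messwert": "value",
    "Einheit": "unit",
})[["sample_id", "value", "unit"]]
uniform["instrument"] = "vendor_a"  # keep provenance for error-free assignment

# One uniform sheet per run keeps results unambiguously assignable.
uniform.to_excel("run_042_normalized.xlsx", index=False)
```

Repeating such a mapping for every instrument yields one shared schema that all research areas can read.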
The requirement: step by step toward a fully autonomous laboratory
The complex processing and preparation of the measurement results is time-consuming and error-prone. ZEA-3 therefore turned to SIDESTREAM to develop an automated solution. The goal was comprehensive software that would, in the long term, make it possible to carry out the entire examination process fully autonomously. It is particularly important that no errors occur, because the measurement results must meet scientific standards and the institute's customers trust the accuracy of the data. The efficiency and user-friendliness of the software must therefore not come at the expense of data governance.
Complex processes, simple solution
Our approach was to analyze the overall process and identify its fundamental steps. Close communication with the scientists and high test coverage of the end product were essential.
The focus of our work was on data quality and the FAIR principles. The acronym FAIR stands for Findable, Accessible, Interoperable and Reusable.
We had to:
- output the individual formats of the analysis technologies and procedures error-free,
- calculate measured values correctly,
- and adhere to scientific standards.
We succeeded in presenting the various formats and results clearly in a web application. The starting point was the result files of the individual processes: from these we reverse-engineered the inputs, outputs and the underlying process function of each step. Based on our findings, we programmed a first version of an analysis procedure. Test runs with the scientists were an important aspect of this project. In addition to automated data processing, the application presents results and formats in a clear, structured manner so that the institute's employees can use the data efficiently. In the future, the entire analysis process will be fully automated, allowing scientists to concentrate on interpreting the data.
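To give an idea of what such a derived process function looks like, here is a hedged sketch with a generic linear calibration; the model and the numbers are illustrative assumptions, not one of ZEA-3's actual procedures:

```python
# Hedged sketch of a derived "process function": a generic linear
# calibration, not one of ZEA-3's actual procedures.
from dataclasses import dataclass

@dataclass(frozen=True)
class Calibration:
    slope: float      # instrument signal per unit of concentration
    intercept: float  # signal of the blank sample

def concentration(signal: float, cal: Calibration) -> float:
    """The derived input -> output mapping: raw signal in, value out."""
    return (signal - cal.intercept) / cal.slope

cal = Calibration(slope=2.5, intercept=0.1)
assert abs(concentration(5.1, cal) - 2.0) < 1e-9  # (5.1 - 0.1) / 2.5 == 2.0
```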
Technology Deep Dive
In addition to the large number of steps and data sets, repetition loops also make the entire analysis process very complex. The process steps are not strictly linear: certain measurement results from the laboratory instruments can trigger a repetition of the respective analysis. These verification loops must be covered by the software to guarantee the accuracy of the results. To achieve this, we modeled the entire process as a Finite State Machine (FSM). This gives us absolute control over the process: an FSM cannot be bypassed, and the process only continues once the required conditions have been fully met. This approach is essential for success, particularly in data-centric processes.
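As a rough illustration of this idea, a single verification loop can be modeled as follows. The states and the QC threshold are our assumptions for the sketch, not the institute's real process model:

```python
# Illustrative FSM for one verification loop; the states and the 5 % QC
# threshold are assumptions, not the institute's real process model.
from enum import Enum, auto

class State(Enum):
    PREPARED = auto()   # sample ready for measurement
    MEASURED = auto()   # raw result available, awaiting QC
    REPEAT = auto()     # QC failed, measurement must be repeated
    VERIFIED = auto()   # QC passed, result accepted (terminal)

def next_state(state: State, relative_deviation: float) -> State:
    """The process only advances once the guard condition is fully met."""
    if state in (State.PREPARED, State.REPEAT):
        return State.MEASURED
    if state is State.MEASURED:
        return State.VERIFIED if relative_deviation <= 0.05 else State.REPEAT
    return state  # VERIFIED has no outgoing transitions

# Two measurements fail QC and loop back; the third one is accepted.
state = State.PREPARED
for deviation in (0.12, 0.09, 0.03):
    state = next_state(state, deviation)   # -> MEASURED
    state = next_state(state, deviation)   # -> REPEAT or VERIFIED
print(state)  # State.VERIFIED
```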
This FSM-based solution is a perfect example of a human-in-the-loop process: the scientists carry out only a few expert steps and feed the final results back into the system. The system consists of several Docker microservices, deployed locally at the research center. The data is managed in a PostgreSQL database. The majority of the code forms a Python backend, while the user-facing application is a modern VueJS web application. The entire codebase is guarded by high test coverage: at least 90 percent must be reached at every point in the code, otherwise the automated test and merge pipeline does not allow new features to be merged into master. This eliminates potential errors at an early stage and results in an application that meets scientific standards.
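To sketch how such a gate can be enforced (a typical setup we assume here, not necessarily the project's exact pipeline): with the pytest-cov plugin, a CI command such as pytest --cov --cov-fail-under=90 fails the run, and with it the merge, whenever coverage drops below 90 percent. The tests themselves are plain pytest functions, for example against the FSM sketch above:

```python
# Hedged sketch of tests behind the coverage gate. In CI, a command like
#   pytest --cov --cov-fail-under=90
# (pytest with the pytest-cov plugin) fails the run below 90 % coverage and
# thereby blocks the merge into master. The module name below is hypothetical.
from fsm import State, next_state  # the FSM sketch above, saved as a module

def test_failed_qc_triggers_repetition():
    assert next_state(State.MEASURED, relative_deviation=0.2) is State.REPEAT

def test_passed_qc_verifies_result():
    assert next_state(State.MEASURED, relative_deviation=0.01) is State.VERIFIED
```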