Event Data Management Services
In HERDOS, data management services are the services responsible for memory management and data input/output. Most data processing software relies on them, which makes them an important component of HERDOS. During data processing, they manage the layout of data in memory and the lifecycle of data model objects, provide interfaces for reading and storing data, and implement the input and output of data files.
In HERDOS, several SNiPER services implement the data management functionality:
PodioDataSvc: Manages the EventStore and provides the service for managing data model objects and collections in memory.
PodioLocalDataSvc: Similar to PodioDataSvc, but designed specifically for parallel computing. It accesses the GlobalStore when the program runs in parallel and provides interfaces for worker threads to access and store data.
PodioInputSvc: Responsible for loading data from input ROOT files and converting it into the podio data model format.
PodioOutputSvc: Responsible for writing data model objects in memory to ROOT files.
In addition to these four services, DataHandle is provided as the interface through which all user algorithms obtain data. The relationships between them are shown in the following figure. During data processing, PodioInputSvc and PodioOutputSvc are hooked into SNiPER's event loop control: before an event starts, PodioInputSvc automatically reads an event from the input file; after the event ends, PodioOutputSvc writes the data in memory to an output file and cleans up the memory. Algorithm developers only need to access the data in memory through DataHandle, without worrying about implementation details such as memory management and data input/output.
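The ordering of the input and output hooks around each event can be illustrated with a minimal Python sketch. It is not the SNiPER or HERDOS API: the function `run_event_loop`, the dictionary used as the event store, and the algorithm `add_energy_sum` are hypothetical stand-ins chosen only to show the sequence read, process, write, clean up.

```python
def run_event_loop(input_records, algorithms):
    """Serial event loop with input/output hooks around each event (toy model)."""
    store = {}      # stands in for the in-memory event store
    written = []    # stands in for the output file
    for record in input_records:
        store.update(record)           # before the event: the input service fills memory
        for algorithm in algorithms:
            algorithm(store)           # algorithms only touch in-memory data
        written.append(dict(store))    # after the event: the output service writes a copy
        store.clear()                  # ...and cleans up the memory
    return written


def add_energy_sum(store):
    """Hypothetical user algorithm: reads one collection, stores another."""
    store["EnergySum"] = sum(store["Hits"])
```

Calling `run_event_loop([{"Hits": [1, 2]}, {"Hits": [3]}], [add_energy_sum])` processes two events in turn; the algorithm never opens a file or frees memory itself.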

GlobalStore
During data processing, multiple SNiPER worker threads typically process data from the same batch of files and write the processed data to the same output file, so they must share a single data input stream and a single data output stream. GlobalStore was developed to provide this concurrent data management.

GlobalStore maintains an internal list of data slots that caches multiple physics events. During operation, the input thread connected to GlobalStore (GlobalInputAlg) continuously attempts to read events from files and place them in idle data slots. The cached events are distributed to the worker threads on demand, and once a worker thread has processed an event, the data is returned to GlobalStore. The output thread connected to GlobalStore (GlobalOutputAlg) continuously attempts to write processed data to files. In this way, with the cooperation of one input thread, one output thread, multiple worker threads, and GlobalStore, each worker thread can complete its data processing tasks independently and without interference. GlobalStore assigns each data slot one of four status flags: idle, ready, occupied, or finished. The meaning of each status is as follows:
Idle: The data slot holds no data and is waiting for the input thread to fill it.
Ready: Data input is complete and the slot is waiting for algorithm processing.
Occupied: A worker thread is already processing the data.
Finished: Data processing is complete and the slot is waiting for the output thread to write the data out.
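The four statuses form a cycle, which the following Python sketch makes explicit. The enum and the transition table are illustrative names only, not HERDOS types; the thread named for each transition is inferred from the description of the input, worker, and output threads above.

```python
from enum import Enum


class SlotState(Enum):
    IDLE = "idle"
    READY = "ready"
    OCCUPIED = "occupied"
    FINISHED = "finished"


# current state -> (thread that moves the slot on, next state)
TRANSITIONS = {
    SlotState.IDLE: ("input thread", SlotState.READY),
    SlotState.READY: ("worker thread", SlotState.OCCUPIED),
    SlotState.OCCUPIED: ("worker thread", SlotState.FINISHED),
    SlotState.FINISHED: ("output thread", SlotState.IDLE),
}


def advance(state):
    """Return the only state a slot may enter next."""
    return TRANSITIONS[state][1]
```

Each slot has exactly one legal successor state, so a slot that starts idle returns to idle after four transitions and can then be reused for the next event.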
This design has two significant advantages. First, data input and output are decoupled from the worker threads, so input and output each run serially, which avoids the trouble of data synchronization. Second, each worker thread still follows SNiPER's serial-mode workflow, so the parallel and serial operating modes differ only in configuration while the code is almost identical, which greatly reduces the difficulty of developing parallel applications.
As shown in the following figure, the input thread, the output thread, and the worker threads access GlobalStore through four interfaces: addElement, clearElement, setIn, and setDone. Because these interfaces change the internal state of GlobalStore, three condition variables are used inside GlobalStore to protect the internal data and avoid deadlocks and race conditions. Each interface function requires certain conditions to be met before it executes; otherwise the calling thread is suspended. After an interface function completes, it notifies the corresponding condition variable to try to wake up other suspended threads. Specifically:
addElement: Requires an idle data slot in GlobalStore before it executes; attempts to wake up the worker threads after it executes.
clearElement: Requires data in the finished state in GlobalStore before it executes; attempts to wake up the input thread after it executes.
setIn: Requires data in the ready state in GlobalStore before it executes.
setDone: Attempts to wake up the output thread after it executes.
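The wait and wake-up rules listed above can be sketched with Python's `threading.Condition`. This is a toy model under stated assumptions, not the HERDOS implementation: the class `ToyStore`, the snake_case method names (mirroring addElement, setIn, setDone, and clearElement), the `close` shutdown hook, and the mapping of each interface to a slot transition are all assumptions made for illustration.

```python
import threading

IDLE, READY, OCCUPIED, FINISHED = "idle", "ready", "occupied", "finished"


class ToyStore:
    """Hypothetical stand-in for GlobalStore, not the HERDOS code."""

    def __init__(self, n_slots):
        self._state = [IDLE] * n_slots
        self._data = [None] * n_slots
        self._closed = False
        lock = threading.Lock()
        # Three condition variables sharing one mutex, one per awaited state.
        self._has_idle = threading.Condition(lock)
        self._has_ready = threading.Condition(lock)
        self._has_finished = threading.Condition(lock)

    def _find(self, state):
        return next((i for i, s in enumerate(self._state) if s == state), None)

    def add_element(self, event):
        """Input thread: wait for an idle slot, fill it, wake one worker."""
        with self._has_idle:
            self._has_idle.wait_for(lambda: self._find(IDLE) is not None)
            i = self._find(IDLE)
            self._data[i], self._state[i] = event, READY
            self._has_ready.notify()

    def set_in(self):
        """Worker: wait for a ready slot and claim it; None once closed."""
        with self._has_ready:
            self._has_ready.wait_for(
                lambda: self._closed or self._find(READY) is not None)
            i = self._find(READY)
            if i is None:
                return None
            self._state[i] = OCCUPIED
            return i, self._data[i]

    def set_done(self, i, result):
        """Worker: no precondition; store the result, wake the output thread."""
        with self._has_finished:
            self._data[i], self._state[i] = result, FINISHED
            self._has_finished.notify()

    def clear_element(self):
        """Output thread: wait for a finished slot, empty it, wake the input thread."""
        with self._has_finished:
            self._has_finished.wait_for(lambda: self._find(FINISHED) is not None)
            i = self._find(FINISHED)
            result, self._data[i], self._state[i] = self._data[i], None, IDLE
            self._has_idle.notify()
            return result

    def close(self):
        """Assumed shutdown hook: release workers blocked in set_in."""
        with self._has_ready:
            self._closed = True
            self._has_ready.notify_all()


def run(events, n_workers=3, n_slots=2):
    """Square every event using one reader, one writer, and several workers."""
    store, written = ToyStore(n_slots), []

    def reader():
        for event in events:
            store.add_element(event)

    def worker():
        while (item := store.set_in()) is not None:
            slot, event = item
            store.set_done(slot, event * event)

    def writer():
        for _ in range(len(events)):
            written.append(store.clear_element())

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    threads += [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    threads[1].join()   # the writer returns once every event has been written
    store.close()
    for t in threads:
        t.join()
    return written
```

With only two slots and three workers, the reader blocks whenever no slot is idle and workers block whenever no slot is ready, so the slots are recycled throughout the run. Results are written in completion order, which is not necessarily the input order.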
Building on GlobalStore, we have integrated the serial and parallel data management services. The integrated design is shown in the following figure. PodioDataSvc and PodioLocalDataSvc share a unified interface class, and user algorithm code accesses data uniformly through DataHandle. This makes the implementation of data management completely transparent to users and lets the same user code run consistently in both serial and parallel modes.
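The idea of a unified interface class can be sketched in Python as follows. Every name here (`DataSvcBase`, `SerialDataSvc`, `LocalDataSvc`, `Handle`, `calibrate`, and the collection names) is hypothetical and only plays the role of the corresponding HERDOS component; the real class names and method signatures are not given in this section.

```python
from abc import ABC, abstractmethod


class DataSvcBase(ABC):
    """Hypothetical common interface for the serial and parallel services."""

    @abstractmethod
    def get(self, name): ...

    @abstractmethod
    def put(self, name, collection): ...


class SerialDataSvc(DataSvcBase):
    """Plays the role of PodioDataSvc: owns a single in-memory store."""

    def __init__(self):
        self._store = {}

    def get(self, name):
        return self._store[name]

    def put(self, name, collection):
        self._store[name] = collection


class LocalDataSvc(DataSvcBase):
    """Plays the role of PodioLocalDataSvc: a view on one slot of a shared store."""

    def __init__(self, shared_slots, slot_index):
        self._slot = shared_slots[slot_index]  # dict owned by the shared store

    def get(self, name):
        return self._slot[name]

    def put(self, name, collection):
        self._slot[name] = collection


class Handle:
    """Plays the role of DataHandle: algorithms see only the common interface."""

    def __init__(self, svc, name):
        self._svc, self._name = svc, name

    def get(self):
        return self._svc.get(self._name)

    def put(self, collection):
        self._svc.put(self._name, collection)


def calibrate(svc):
    """A user algorithm whose code is identical in both modes."""
    raw = Handle(svc, "RawHits").get()
    Handle(svc, "CalibHits").put([2 * x for x in raw])
```

Because `calibrate` depends only on the handle, switching between the two services is a matter of which object is passed in, which corresponds to the configuration-only difference between serial and parallel modes described above.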
