Event Data Model (EDM)

EDM design based on podio

The event data model (EDM) defines the format of event data in memory and in data files, and therefore has a profound impact on the functionality and performance of the entire offline software. Over the past few decades, many high-energy physics experiments, including ATLAS, LHCb, and China’s BESIII, have used the inheritance and encapsulation features of C++ to build multi-level nested data models. As the volume of data in modern high-energy physics experiments has grown significantly, this design pattern has often shown poor I/O performance. In this context, a European Horizon project proposed a flattened data model design, podio (plain old data I/O). Its design concept and performance have quickly attracted the attention of many high-energy physics experiments. Several experiments currently under research and development internationally, including FCC, ILC, CEPC, and STCF, base their data models on podio.

In terms of design, podio adopts a flattened structure organized in three layers, which preserves both performance and flexibility, as shown in the following figure. The topmost user layer (User Layer) provides two basic types, containers and ordinary objects; its implementation is extremely lightweight and serves only as the user interface. The middle object layer (Object Layer) implements the relations between data model objects. The bottommost data layer (POD Layer) stores the actual variables, whose types and names are chosen by the developer (e.g., particle type, momentum, energy).

Fig. Three-layer design of podio
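
The three layers can be illustrated with a minimal C++ sketch. It is not podio's actual code: the names HitData, HitObj, Hit and sumEnergy are hypothetical, and the reference counting is deliberately simplified (single-threaded, no collections). It only shows how a plain data struct, an object carrying bookkeeping, and a thin user handle could fit together.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// POD layer: plain data only (no pointers, no virtual functions),
// so a whole vector of these can be handled as one flat buffer.
struct HitData {
    float energy;
    float time;
};
static_assert(std::is_trivially_copyable<HitData>::value,
              "the POD layer must stay trivially copyable");

// Object layer: owns one POD plus the bookkeeping for shared ownership.
struct HitObj {
    HitData data{};
    int refCount = 0;
};

// User layer: a thin handle; copying it shares the underlying object.
class Hit {
public:
    Hit() : m_obj(new HitObj) { ++m_obj->refCount; }
    Hit(const Hit& other) : m_obj(other.m_obj) { ++m_obj->refCount; }
    Hit& operator=(const Hit& other) {
        if (m_obj != other.m_obj) {
            ++other.m_obj->refCount;
            release();
            m_obj = other.m_obj;
        }
        return *this;
    }
    ~Hit() { release(); }

    float energy() const { return m_obj->data.energy; }
    void setEnergy(float e) { m_obj->data.energy = e; }
    int useCount() const { return m_obj->refCount; }

private:
    // Drop one reference and free the object when the last handle goes away.
    void release() {
        assert(m_obj->refCount > 0);
        if (--m_obj->refCount == 0) delete m_obj;
    }
    HitObj* m_obj;
};

// Container side of the user layer, reduced here to a loop over the flat POD array.
inline float sumEnergy(const std::vector<HitData>& hits) {
    float total = 0.0f;
    for (const HitData& h : hits) total += h.energy;
    return total;
}
```

In this sketch, copying a Hit increases useCount() instead of duplicating the data, and the object is freed when the last handle is destroyed. That is the kind of bookkeeping that relieves the developer of manual memory management.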

Besides its flattened design, podio also has the following features that provide strong support for designing the HERD data models:

  • Automated memory management: podio implements automatic reference counting and garbage collection internally, so developers generally do not need to manage object lifetimes by hand, which largely avoids memory allocation errors and memory leaks.

  • Mixed programming with Python and C++: podio supports data exchange between C++ and Python code, so a data model defined in C++ can be accessed from Python and vice versa.

  • Version evolution mechanism: It provides a flexible backward compatibility mechanism that allows newer code to access older data, providing a guarantee for long-lived experiments.

  • Thread safety: podio was designed with thread safety in mind, so parallel computing can be applied easily.

  • Flexible relations: podio implements relations between data model objects through easily identifiable identifiers and serial numbers (indices), so that the relations are not lost during data I/O, which facilitates complex physics analysis.
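
The idea of keeping relations as identifiers rather than pointers can be sketched as follows. This is an illustration under assumptions, not podio's implementation: ObjectID, EventStore and resolve are hypothetical names, and the two data structs are cut-down stand-ins for a simulated cell and a digitized cell.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical identifier: which collection, and which element inside it.
struct ObjectID {
    std::uint32_t collectionID;
    std::int32_t index;
};

struct SimCellData {
    float edep;
};

// The relation is stored as an identifier, never as a raw pointer,
// so the struct stays plain data and survives a write/read cycle.
struct DigiCellData {
    unsigned int cellCode;
    ObjectID siminfo;
};

struct EventStore {
    std::map<std::uint32_t, std::vector<SimCellData>> simCells;

    // Turn an identifier back into an object; nullptr if it points nowhere.
    const SimCellData* resolve(const ObjectID& id) const {
        const auto it = simCells.find(id.collectionID);
        if (it == simCells.end() || id.index < 0) return nullptr;
        const auto i = static_cast<std::size_t>(id.index);
        return i < it->second.size() ? &it->second[i] : nullptr;
    }
};
```

Because the link is only a pair of numbers, a byte-wise copy of the data (as happens when an event is written to file and read back) keeps it valid. A pointer stored in the same place would dangle after the round trip.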

EDM definition based on yaml

Because the data model code is large and complex, and because the HERD experiment has a long lifecycle, the data models will be modified frequently during development. Maintaining and upgrading the data model code by hand is therefore difficult and error-prone. To alleviate this issue, the software framework provides a method for defining data models in yaml files. Application software developers describe the types they need in readable yaml files. During compilation of the offline software, a code generator automatically produces the corresponding C++ code from the content of the yaml file and pre-defined rules.

Fig. Auto-generation of C++ code based on yaml definition

The above figure shows an example of using yaml files to define data models: for each type defined in the yaml file, the code generator produces the corresponding header and source files. When defining data models, developers generally only need to specify the types and names of the data to be stored; the remaining boilerplate is produced by the code generator. All data models in HERDOS are commonly defined in one or more yaml files. If the overall design of the data models needs to be upgraded, only the rules in the code generator need to change, without manually modifying the C++ code.
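
To make the generation step concrete, here is a toy generator written in C++. It is a sketch under assumptions and not the real podio generator: parseMember, setterName and generateClass are hypothetical helpers, and the emitted class is far simpler than real generated code. It only shows how a member line of the form "type name // comment" can be turned mechanically into accessors. The setter naming (ix becomes setIx) follows the calls that appear in the MutableExtraCode of the reference listing.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

struct Member {
    std::string type;
    std::string name;
    std::string comment;
};

// Strip leading and trailing blanks and tabs.
inline std::string trim(const std::string& s) {
    const char* ws = " \t";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Parse one entry of a Members list: "<type> <name> // <comment>".
// The name is the last blank-separated token, so multi-word types work.
inline Member parseMember(const std::string& decl) {
    Member m;
    const auto slash = decl.find("//");
    const std::string head = trim(decl.substr(0, slash));
    if (slash != std::string::npos) m.comment = trim(decl.substr(slash + 2));
    const auto split = head.find_last_of(" \t");
    assert(split != std::string::npos && "expected '<type> <name>'");
    m.type = trim(head.substr(0, split));
    m.name = head.substr(split + 1);
    return m;
}

// "ix" -> "setIx"
inline std::string setterName(const std::string& name) {
    assert(!name.empty());
    std::string s = "set" + name;
    s[3] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[3])));
    return s;
}

// Emit a small class with one getter, one setter and one field per member.
inline std::string generateClass(const std::string& className,
                                 const std::vector<std::string>& decls) {
    std::string pub;
    std::string priv;
    for (const std::string& decl : decls) {
        const Member m = parseMember(decl);
        pub += "  " + m.type + " " + m.name + "() const { return m_" + m.name + "; }\n";
        pub += "  void " + setterName(m.name) + "(" + m.type + " value) { m_" +
               m.name + " = value; }\n";
        priv += "  " + m.type + " m_" + m.name + "{};  // " + m.comment + "\n";
    }
    return "class " + className + " {\npublic:\n" + pub + "private:\n" + priv + "};\n";
}
```

Changing how every class looks (for example, adding a getter prefix) then means editing generateClass once rather than touching each data type. This is the maintenance benefit described above.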

HERDOS EDM reference

Currently, based on this mechanism, the EDM classes for detector simulation, digitization, and reconstruction for all sub-detectors in HERD have been defined.

The yaml definition is listed below for reference.

components:
   # A component is a POD designed to be embedded in another POD.
   # Components cannot be stored in a collection.
   edm::Vector3f: 
      Members:
         - float x
         - float y
         - float z
      ExtraCode:
         declaration: "
            Vector3f() = default;\n
            Vector3f(float xx, float yy, float zz): x(xx),y(yy),z(zz) {}\n
            /// enable to construct from array interface \n
            template<typename T> Vector3f(const T* v): x(v[0]),y(v[1]),z(v[2]) {}\n
            /// enable to construct from G4ThreeVector or TVector3 or other 3-vector classes \n
            template<class T> Vector3f(T const& other): x(other.x()), y(other.y()), z(other.z()) {} \n
            /// comparison op \n
            bool operator==(const Vector3f& v) const { return (x==v.x && y==v.y && z==v.z); }\n
            /// array-like interface \n
            float operator[](unsigned i) const { return *( &x + i ) ; }\n
            float& operator[](unsigned i) { return *( &x + i ) ; }\n
            /// raw array interface \n
            operator float const* () const { return &x; }\n
            /// TODO: need a way to construct other 3-vector types...\n
            /// e.g. G4ThreeVector(CLHEP::Hep3Vector) can only construct from x,y,z numbers\n
            /// TVector3 accepts double*, but XYZVector need object with x() y() z() ...\n
            "

# Datatypes are components that can be stored in a Collection
datatypes :
   edm::MCEvent:
      Description: "info dedicated for simulation"
      Author: "Z.Tang"
      Members:
         - int randomSeed // the random seed for this event
         - int status     // reserved for status bits

   edm::Event:
      Description: "event info"
      Author: "Z.Tang"
      Members :
         - int run       // run id
         - int event     // event id in the run
         - int localtime // reserved for DAQ computer local time in seconds from 2020-1-1
         - double utc    // reserved for UTC time from DAQ computer or offline

   edm::MCParticle:
      Description: "MC particle"
      Author: "Z.Tang"
      Members: 
         - int pdgID                    // PDG code of the particle
         - int trackID                  // index of the particle
         - int parentID                 // index of parent particle
         - edm::Vector3f momentum       // particle 3-momentum in [GeV]
         - edm::Vector3f vertex         // production vertex of the particle in [cm].
         - short charge                 // atomic charge
         - uint16_t mass                // atomic mass, 0 for e+ e-
         - float time                   // creation time of the particle in [ns] w.r.t. the event
         - uint32_t simstat             // (opt) status of the particle from the simulation

   edm::CaloSimCell:
      Description: "MC truth info in calo cell"
      Author: "Z.Tang"
      Members:
         # unsigned char not supported... not sure if we shall use char
         - short ix                // X index of the cell    
         - short iy                // Y index of the cell    
         - short iz                // Z index of the cell 
         - float edep              // energy of the hit in [GeV].
         - uint16_t fracEM         // int(frac*65535)
         - uint16_t fracBS         // both, int(em*255) | int(h*255)<<8
         - edm::Vector3f pos       // CoG of the hits in the cell in [cm]
         - edm::Vector3f localpos  // as above, in local coo [cm]
         - float tfirst            // first hit time in the cell [ns] (or t0?)
         - float tmean             // mean time of hits in the cell [ns] 
         - float tsigma            // sigma of times of hits in the cell [ns]
      MutableExtraCode:
         declaration: "
             void setCellIndex(short x, short y, short z){ setIx(x);setIy(y);setIz(z); }
            "

   edm::TrackingSimHit:
      Description: "Simulated tracking hit, record individual truth step in cells"
      Author: "Z.Tang"
      Members:
         - unsigned int  cellCode    // encoded cell numbering, e.g. something like 10000*ilayer+100*iladder+isensor
         - float         time        // incident time of the hit [ns].
         - edm::Vector3f pos         // the hit position in [cm].
         - edm::Vector3f localpos    // the local step start position in [cm].
         - edm::Vector3f localend    // the local step ending position in [cm].
         - float         pathlen     // path length of the hit(track) in the sensitive material [cm]
         - float         edep        // energy deposited in the hit [GeV].
         - edm::Vector3f momentum    // the 3-momentum of the particle at the hit position in [GeV]
         - int           trackID     // the track ID of the particle that produced the hit
         - int           pdgID       // PDG ID of the particle that created this hit
           #- edm::Vector3f dir         // try to replace momentum with dir and P=|momentum|
           #- float P                    // |momentum| or maybe betagamma better?
           #- float dir_theta //
           #- float dir_phi //

   edm::CaloDigiCell:
      Description: "calo cell digitization"
      Author: "Z.Tang Z.Quan"
      Members:
         - unsigned int cellCode        // encoded cell code
         - std::array<int, 3> grayHR    // total grayscale value in CMOS for high range, 3 radius
         - std::array<int, 3> grayLR    // total grayscale value in CMOS for low range, 3 radius
      OneToOneRelations:
         - edm::CaloSimCell siminfo    // if there is any...

   edm::CaloRecoCell:
      Description: "calo cell after calibration"
      Author: "Z.Tang Z.Quan"
      Members:
         - float edep       // cell edep [GeV] after calibration
         - short ix         // X index of the cell in calo
         - short iy         // Y index of the cell in calo
         - short iz         // Z index of the cell in calo
      OneToOneRelations:
         - edm::CaloDigiCell  digiinfo // link to digi object
           #maybe also need other links in the future...

   edm::CaloClusters: 
      Description: "Calo Clusters"
      Author: "C.Zhang et al"
      Members:
         - int id          // cluster ID
         - float edep      // cluster total energy deposition
      VectorMembers:
         - uint16_t ihits  // hit id in the cluster

   edm::CaloShowerAxis: 
      Description: "Calo Shower Axis"
      Author: "C.Zhang et al"
      Members:
         - int id                  // Shower ID
         - edm::Vector3f dir       // [dx,dy,dz]
         - edm::Vector3f entryCoor // [x1,y1,z1]
         - edm::Vector3f cog       // [x0,y0,z0]
         - edm::Vector3f exitCoor  // [x2,y2,z2]
      VectorMembers:
         - uint16_t iclusters      // cluster ids
        #- uint16_t iHits       // hit ids <optional>

   edm::SiliconDigiHit:
      Description: "digi hit (strip) for silicon detectors"
      Author: "Z.Tang"
      Members:
         - unsigned int cellCode        // the cell (sensor) code as in TrackingSimHit, but only the part down to the ladder makes sense
         - uint16_t stripID             // strip ID
         - uint32_t ADC                 // strip ADC value
         #- unsigned int TDC            // reserved for TDC
      OneToManyRelations:
         - edm::TrackingSimHit siminfos // link to lots of sim hits, do we need?

   edm::SCDCluster:
      Description: "SCD Cluster"
      Author: "D.Guo et al"
      Members:
         - unsigned int ladderCode      // global id (sensor cellcode/100)
         - double       centroid        // centroid wrt ladder
         - double       clusterTotalADC //
         - float        clusterCharge   //
         - short        clusterSize     //
         - double       maxStripADC     //
         - uint16_t     maxStripID      //
         - int          clusterType     // simple cluster, or split from a large cluster (1to2, even 1to3...)
         - int          sharedType      // 0 no sharedstrip, 1, first strip shared, 2 last strip shared, 3 both first and last strip shared
         - edm::Vector3f posGlobal      // global position of the cluster 
         - char         vertAligned     // flag for cell projection. See DirectionInfo in GeometrySvc for details
      VectorMembers:
         - double stripADC              // ADCs of strips 
         - uint16_t stripID             // corresponding strip IDs

   edm::SCDTrack:
      Description: "SCD Track"
      Author: "郭东亚 et al"
      Members:
         - int           seedType        // truthseed,  blindseed,caloseed
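         - int           impactSector    // which sector (unit) the track belongs to, 0~4
         - int           impactPlane     // plane of the initial hit, 0~3
         - edm::Vector3f m_impactPoint   // position of the initial impact point (computed from the filtered and smoothed track)
         - float         particleZ       // 0~gamma, >1 charged particle
         - float         chi2            // chi2 after smoothing
         - int           ndf             // number of degrees of freedom of the fit
         - int           nHitX           // number of X-direction hits used in the fit
         - int           nHitY           // number of Y-direction hits used in the fit
         - int           nHitXY          // hits used in the fit that have both X and Y [?]
      VectorMembers:
         - int   clusterXIndex          // indices of the clusterX entries used in track reconstruction
         - int   clusterYIndex          // indices of the clusterY entries used in track reconstruction
         - float covxx                  // xx element of the covariance matrix after filtering and smoothing
         - float covyy                  // yy element of the covariance matrix after filtering and smoothing
         - float covxy                  // xy element of the covariance matrix after filtering and smoothing
         - float x                      // x coordinate after filtering and smoothing
         - float y                      // y coordinate after filtering and smoothing
         - float xMeas                  // measured x coordinate of clusterX (after alignment)
         - float yMeas                  // measured y coordinate of clusterY (after alignment)
         - float zxMeas                 // measured z coordinate of clusterX (after alignment)
         - float zyMeas                 // measured z coordinate of clusterY (after alignment)
         - float covMeasxx              // xx element of the residual covariance matrix after filtering and smoothing
         - float covMeasyy              // yy element of the residual covariance matrix after filtering and smoothing
         - short plane                  // plane ID
         - float trackSlopex            // slope in the X direction after filtering and smoothing
         - float trackSlopey            // slope in the Y direction after filtering and smoothing
         - float covSlopexx             // xx element of the slope covariance matrix after filtering and smoothing
         - float covSlopeyy             // yy element of the slope covariance matrix after filtering and smoothing
         - float covSlopexy             // xy element of the slope covariance matrix after filtering and smoothing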
         - short vecClusters            // cluster index, maybe link is better?

   edm::PSDDigiCell:
      Description: "digi hit for psd bar v0"
      Author: "Z.Tang, Q.Wu"
      Members:
         - unsigned int cellCode        // the same bar/tile code as in TrackingSimHit
         - int ADC0                     // SiPM ADC value, one for each side
         - int ADC1                     // 
         #//- short TDC[2]          // reserved for TDC
         #OneToManyRelations:
         #- edm::TrackingSimHit siminfos // link to lots of sim hits, do we need?

   edm::PSDRecoCell:
      Description: "reco hit for psd bar v0"
      Author: "Z.Tang, Q.Wu"
      Members:
         - unsigned int cellCode        //
         - float edep                   // reconstructed edep
         - float edepBirksCorr          // with Birks law correction (experimental)
         - float nPE0                   // convert ADC to nPE at SiPM w/ corrections
         - float nPE1                   // convert ADC to nPE at SiPM w/ corrections
         - float hitpos                 // reco hit local position from bar alone

   edm::FITDigiCell:
      Description: "digi hit for FIT v0"
      Author: "Z.Tang, J Wang"
      Members:
         - unsigned int cellCode        // the same mat code as in TrackingSimHit
         - uint16_t channelID           // channel id inside a MAT (3 SiPM at the moment)
         - int ADC                      // SiPM channel ADC value
         #//- short TDC          // reserved for TDC
      OneToManyRelations:
         - edm::TrackingSimHit siminfos // link to lots of sim hits, do we need?

   edm::FITRecoCell:
      Description: "reco hit for FIT v0"
      Author: "Z.Tang, J.Wang"
      Members:
         - unsigned int cellCode        //
         - uint16_t channelID           // channel id inside a MAT (3 SiPM)
         - float edep                   // reconstructed edep
           #- float edepBirksCorr          // with Birks law correction (experimental)

   edm::FITCluster:
      Description: "FIT Cluster"
      Author: "J.Wang et al"
      Members:
         - unsigned int cellCode        // cell code of the mat
         - bool         ifOddLayer      // flag of whether on an odd numbered layer
         - char         vertAligned     // flag of cell projection, see DirectionInfo in GeometrySvc
         - uint16_t     peakChannel     // channel ID of the peak
         - float        peakChannelAmp  // after calibration, in unit of p.e. [float]
         - uint16_t     peakLeft        // how many channels are on the left side of the peak channel?
         - uint16_t     peakRight       // how many channels are on the right side of the peak channel?
         - float        clusterAmp      // total channel amp
         - uint16_t     clusterSize     // number of channels with amp > neighbor threshold
         - float        clusterCharge   // reconstructed cluster |z| if m_trackType==calo seed (default -1)
         - edm::Vector3f posLocal       //  wrt layer  [?]
         - edm::Vector3f position       // cluster position in global coo, use mat coo for the unknown direction
         - int          type            // simple cluster, or split from a large cluster (1to2, even 1to3...)
      VectorMembers:
         - uint16_t     channels        // channel id within a mat
         - float        channelAmps     // corrected?

   edm::FITTrack:
      Description: "FIT Track"
      Author: "J.Wang et al"
      Members:
         - int          seedType        // blind search, calo seed, scd seed
         - edm::Vector3f fittedDir      //
         - edm::Vector3f fittedPos      // impact point closest to CALO
         - int          impactSector    // 0~4
         - int          impactLayer     // initial hit from outer layers
         - double       particleZ       //0~gamma, 1 charged particle
         - double       chi2x           //
         - double       chi2y           //
         - double       ndfx            // int?
         - double       ndfy            //
         - int          nHitX           // number of y hit in X layer
         - int          nHitY           // number of x hit in Y layer
         - int          nHitXY          // number of x&y pair in physical geo
         - double trackLength           // ?
      VectorMembers:
         - short clusterIDs             // maybe podio relation
         - short planes                 // layer# 
         #- float fittedCov //

   edm::TRDDigiCell:
      Description: "digi hit for TRD v0"
      Author: "Z.Tang, C.Dai"
      Members:
         - unsigned int cellCode       // the same mat code as in TrackingSimHit
         - std::array<short,128> ADC     // ADC values for all channels in a module
           #//- std::array<short,128> TDC          // reserved for TDC
         #OneToManyRelations:
         #- edm::TrackingSimHit siminfos // link to lots of sim hits, do we need?

   edm::TRDRecoCell:
      Description: "reco hit for TRD v0"
      Author: "Z.Tang, C.Dai"
      Members:
         - unsigned int cellCode          // cell code (module)
         - float edep                     // reconstructed edep of selected channels 
         - std::array<float,128> rawEdep  // raw edep from ADC
         - std::array<float,128> corrEdep // considering also lateral position correction

   edm::FastTrigger:
      Description: "fast trigger digi & calculation in regions, for subdets (CALO, PSD, ...)"
      Author: "Z.Tang M.Xu"
      Members:
         - char det     // calo pmt, calo pd, psd, ...
         - char type    // det specific, e.g. calo= LEE,LEG,...
         - char region  // triggered region (core, shell1,shell2, ...)
         - float time    // trigger time simulation, to be used in high level trigger

   edm::Trigger:
      Description: "high level trigger results based on fast triggers"
      Author: "Z.Tang M.Xu"
      Members:
         - unsigned int pattern               // trigger pattern
      OneToManyRelations:
         - edm::FastTrigger fastTrg // link to fast triggers

   edm::GlobalTrack:
      Description: "Reconstructed track from SCD cluster and FIT clusters"
      Author : "ZYQU"
      Members: 
         - edm::Vector3f position       // position
         - edm::Vector3f direction      // direction
         - int nHit                     // number of hits used in this track
         - float chi2                   // fitted chi2 with default measurement error
         - float charge                 // Global Charge from all Clusters
         - float chargeSCD              // Charge estimated from SCD
         - float chargeFIT              // Charge estimated from FIT
         - int   type                   // type of track, 0 for half track, 1 for whole track 
      VectorMembers:
         - edm::Vector3f coos           // coordinate used in the fit
      OneToManyRelations:
         - edm::SCDCluster  scdcls      // SCD cluster collection
         - edm::FITCluster  fitcls      // FIT cluster collection
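
Several members in the listing store packed values whose encoding is given only in the comments (cellCode, ladderCode, fracEM, fracBS). The helpers below sketch those encodings in C++ so that the arithmetic is explicit. All function and struct names are hypothetical. The cellCode scheme is the one the listing itself introduces with "e.g.", so the encoding actually used in HERDOS may differ, and the range checks are an added assumption.

```cpp
#include <cassert>
#include <cstdint>

struct CellAddress {
    unsigned int layer;
    unsigned int ladder;
    unsigned int sensor;
};

// cellCode = 10000*ilayer + 100*iladder + isensor (scheme from the cellCode comment).
// Assumes ladder and sensor each fit in two decimal digits.
inline unsigned int encodeCellCode(const CellAddress& a) {
    assert(a.ladder < 100u && a.sensor < 100u);
    return 10000u * a.layer + 100u * a.ladder + a.sensor;
}

inline CellAddress decodeCellCode(unsigned int code) {
    CellAddress a;
    a.layer = code / 10000u;
    a.ladder = (code / 100u) % 100u;
    a.sensor = code % 100u;
    return a;
}

// ladderCode in SCDCluster: "sensor cellcode/100", i.e. drop the sensor digits.
inline unsigned int ladderCodeOf(unsigned int cellCode) {
    return cellCode / 100u;
}

// fracEM: int(frac*65535), a fraction in [0,1] stored in 16 bits (truncating).
inline std::uint16_t packFraction(float frac) {
    assert(frac >= 0.0f && frac <= 1.0f);
    return static_cast<std::uint16_t>(static_cast<int>(frac * 65535.0f));
}

inline float unpackFraction(std::uint16_t packed) {
    return packed / 65535.0f;
}

// fracBS: int(em*255) | int(h*255)<<8, two 8-bit fractions in one 16-bit word
// (em in the low byte, h in the high byte).
inline std::uint16_t packFractionPair(float em, float h) {
    assert(em >= 0.0f && em <= 1.0f && h >= 0.0f && h <= 1.0f);
    const int low = static_cast<int>(em * 255.0f);
    const int high = static_cast<int>(h * 255.0f);
    return static_cast<std::uint16_t>(low | (high << 8));
}
```

With this scheme, layer 3, ladder 12, sensor 7 gives cellCode 31207 and ladderCode 312. The 16-bit fraction has a resolution of about 1.5e-5, compared with about 4e-3 for each 8-bit half of fracBS.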