Event Data Model (EDM)
EDM design based on podio
The event data model (EDM) defines the format of event data in memory and in data files, and therefore has a profound impact on the functionality and performance of the entire offline software. Over the past few decades, many high-energy physics experiments, including ATLAS, LHCb, and China's BESIII, have used C++ inheritance and encapsulation to build multi-level nested data models. As data volumes in modern high-energy physics experiments have grown, this design pattern has often shown poor I/O performance. In this context, the European Horizon project proposed a flattened data model design, podio (plain old data I/O). Its design and performance quickly attracted the attention of many high-energy physics experiments. Several experiments currently under research and development, including FCC, ILC, CEPC, and STCF, base their data models on podio.
podio combines a flattened structure with a three-layer design that preserves both performance and flexibility, as shown in the following figure. The topmost user layer (User Layer) provides two basic types: containers and ordinary objects. Its implementation is extremely lightweight and serves only as the user interface. The middle object layer (Object Layer) implements the relations between data model objects. The bottommost data layer (POD Layer) stores the actual variables; the developer decides their types and names (e.g., particle type, momentum, energy).
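
The three layers can be pictured with a small hand-written C++ sketch. It is not podio's generated code: the names `HitData`, `HitObj`, and `Hit` are hypothetical, and the reference count is a simplified, non-thread-safe stand-in for whatever bookkeeping podio does internally.

```cpp
#include <cassert>

// POD layer: plain data only, so it can be written to and read from files in bulk.
struct HitData {
  float edep;    // energy deposit
  int cellCode;  // encoded cell number
};

// Object layer: owns the POD and carries the bookkeeping shared by all handles.
struct HitObj {
  HitData data{};
  int refCount = 0;  // number of user-layer handles pointing at this object
};

// User layer: a lightweight handle that only forwards to the object layer.
class Hit {
public:
  Hit() : m_obj(new HitObj()) { acquire(); }
  Hit(const Hit& other) : m_obj(other.m_obj) { acquire(); }
  Hit& operator=(const Hit& other) {
    if (m_obj != other.m_obj) {
      release();
      m_obj = other.m_obj;
      acquire();
    }
    return *this;
  }
  ~Hit() { release(); }

  float edep() const { return m_obj->data.edep; }
  void setEdep(float value) { m_obj->data.edep = value; }
  int useCount() const { return m_obj->refCount; }

private:
  void acquire() { ++m_obj->refCount; }
  void release() {
    assert(m_obj->refCount > 0);
    if (--m_obj->refCount == 0) delete m_obj;  // the last handle frees the object
  }
  HitObj* m_obj;
};
```

Copying a `Hit` copies only the handle: both handles see the same data, and the object is freed when the last handle goes out of scope.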

Besides its flattened design, podio has the following features, which provide strong support for designing the HERD data models:
Automated memory management: podio implements automatic reference counting and garbage collection internally, so developers generally do not need to release objects by hand, which avoids most memory allocation errors and memory leaks.
Mixed programming with Python and C++: data can be exchanged between C++ and Python code, so a podio data model defined in C++ can be accessed from Python and vice versa.
Version evolution mechanism: podio provides a flexible backward compatibility mechanism that allows newer code to read older data, which is essential for long-lived experiments.
Thread safety: podio was designed with thread safety in mind, so it can readily be used in parallel computing.
Flexible relations: podio implements relations between data model objects through easily identifiable identifiers and indices, so the relations are not lost during data I/O and complex physics analysis is easier to implement.
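
How a relation can survive I/O is easiest to see in a sketch. The code below is an illustration under assumptions, not podio's implementation: `ObjectID`, `SimCell`, `DigiCell`, `SimStore`, and `resolve` are hypothetical names. The relation is stored as a (collection identifier, index) pair instead of a pointer, so it can be written to a file and resolved again after reading.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Persistent reference: which collection, and which element inside it.
struct ObjectID {
  std::uint32_t collectionID;
  std::int32_t index;
};

struct SimCell {
  float edep;
};

struct DigiCell {
  int adc;
  ObjectID siminfo;  // one-to-one relation stored as an identifier, not a pointer
};

// Collections of SimCell keyed by their collection identifier.
using SimStore = std::unordered_map<std::uint32_t, std::vector<SimCell>>;

// Turn an identifier back into an object; returns nullptr if it cannot be resolved.
inline const SimCell* resolve(const SimStore& store, const ObjectID& id) {
  const auto it = store.find(id.collectionID);
  if (it == store.end() || id.index < 0) return nullptr;
  const auto i = static_cast<std::size_t>(id.index);
  return i < it->second.size() ? &it->second[i] : nullptr;
}
```

Because the pair contains no memory address, it stays valid regardless of where the collections are placed in memory after reading.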
EDM definition based on YAML
Because the data model code is large and complex, and the HERD experiment has a long lifecycle, the data models will be modified frequently during development. Maintaining and upgrading the data model code by hand would therefore be difficult and error-prone. To alleviate this, the software framework provides a way to define data models in YAML files. Application developers describe the data model types they need in human-readable YAML files. When the offline software is compiled, a code generator produces the corresponding C++ code from the content of the YAML files and a set of predefined rules.
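
To make the idea of rule-based generation concrete, here is a toy sketch of a single rule. It is not the actual generator: `Member`, `trim`, `parseMember`, `setterName`, and `emitSetter` are hypothetical helpers, and the naming rule ("set" plus the capitalized member name) is only inferred from calls such as `setIx` in the EDM listing. The sketch takes one member line of the form `type name // comment` and emits a setter declaration.

```cpp
#include <cctype>
#include <string>

// One parsed member line of the form "type name // comment".
struct Member {
  std::string type;
  std::string name;
  std::string comment;
};

// Strip leading and trailing blanks.
inline std::string trim(const std::string& s) {
  const auto first = s.find_first_not_of(" \t");
  if (first == std::string::npos) return "";
  const auto last = s.find_last_not_of(" \t");
  return s.substr(first, last - first + 1);
}

// Split a member line; the name is the last word before the comment, so
// multi-word types such as "unsigned int" stay intact.
inline Member parseMember(const std::string& line) {
  Member m;
  const auto slash = line.find("//");
  const std::string decl = trim(line.substr(0, slash));
  if (slash != std::string::npos) m.comment = trim(line.substr(slash + 2));
  const auto split = decl.find_last_of(" \t");
  if (split == std::string::npos) {  // malformed line: no type given
    m.name = decl;
    return m;
  }
  m.type = trim(decl.substr(0, split));
  m.name = decl.substr(split + 1);
  return m;
}

// Setter name rule: "set" + member name with its first letter capitalized.
inline std::string setterName(const std::string& name) {
  if (name.empty()) return "set";
  std::string result = "set" + name;
  result[3] = static_cast<char>(std::toupper(static_cast<unsigned char>(name[0])));
  return result;
}

// Emit the C++ declaration a generator could write for this member.
inline std::string emitSetter(const Member& m) {
  return "void " + setterName(m.name) + "(" + m.type + " value);";
}
```

Changing a rule such as `setterName` in one place changes every generated class, which is the maintenance benefit described above.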

The above figure shows an example of defining data models in a YAML file: for each type defined in the file, the code generator produces the corresponding header and source files. When defining data models, developers generally only need to specify the types and names of the data to be stored; everything else is produced by the code generator. All data models in HERDOS are typically defined in one or more YAML files. If the overall design of the data models needs to be upgraded, only the rules in the code generator have to change; the C++ code does not need to be modified by hand.
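
As an illustration of the YAML-to-C++ mapping, the sketch below approximates by hand what a generator could emit for the `edm::Vector3f` component defined in the HERDOS EDM reference. It is an assumption about the shape of the output, not the generated header itself: only part of the `ExtraCode` is reproduced, the index operator uses a branch instead of the pointer arithmetic in the YAML so that it stays within defined behavior, and `ToyVector` is a hypothetical stand-in for an external 3-vector class.

```cpp
#include <cassert>

namespace edm {

// Approximation of the struct a generator could emit for the edm::Vector3f component.
struct Vector3f {
  float x{0.f};
  float y{0.f};
  float z{0.f};

  // Constructors taken from the ExtraCode declaration.
  Vector3f() = default;
  Vector3f(float xx, float yy, float zz) : x(xx), y(yy), z(zz) {}

  // Construct from any 3-vector class offering x(), y(), z() accessors.
  template <class T>
  Vector3f(const T& other) : x(other.x()), y(other.y()), z(other.z()) {}

  bool operator==(const Vector3f& v) const { return x == v.x && y == v.y && z == v.z; }

  // Array-like read access, written with a branch instead of pointer arithmetic.
  float operator[](unsigned i) const {
    assert(i < 3);
    return i == 0 ? x : (i == 1 ? y : z);
  }
};

}  // namespace edm

// Hypothetical stand-in for an external 3-vector class with x(), y(), z() accessors.
struct ToyVector {
  double x() const { return 1.0; }
  double y() const { return 2.0; }
  double z() const { return 3.0; }
};
```

The `Members` entries become the data fields, and the `ExtraCode` declaration is pasted into the struct body, which is why the YAML author only has to write the parts that are specific to the type.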
HERDOS EDM reference
Based on this mechanism, the EDM classes for detector simulation, digitization, and reconstruction of all HERD sub-detectors have been defined.
The YAML definition is listed below for reference.
components:
  # A component is a POD designed to be embedded in another POD,
  # Components cannot be stored in a collection
  edm::Vector3f:
    Members:
      - float x
      - float y
      - float z
    ExtraCode:
      declaration: "
        Vector3f() = default;\n
        Vector3f(float xx, float yy, float zz): x(xx),y(yy),z(zz) {}\n
        /// enable to construct from array interface \n
        template<typename T> Vector3f(const T* v): x(v[0]),y(v[1]),z(v[2]) {}\n
        /// enable to construct from G4ThreeVector or TVector3 or other 3-vector classes \n
        template<class T> Vector3f(T const& other): x(other.x()), y(other.y()), z(other.z()) {} \n
        /// comparison op \n
        bool operator==(const Vector3f& v) const { return (x==v.x && y==v.y && z==v.z); }\n
        /// array-like interface \n
        float operator[](unsigned i) const { return *( &x + i ) ; }\n
        float& operator[](unsigned i) { return *( &x + i ) ; }\n
        /// raw array interface \n
        operator float const* () const { return &x; }\n
        /// TODO: need a way to construct other 3-vector types...\n
        /// e.g. G4ThreeVector(CLHEP::Hep3Vector) can only construct from x,y,z numbers\n
        /// TVector3 accepts double*, but XYZVector need object with x() y() z() ...\n
        "
# Datatypes are components that can be stored in a Collection
datatypes:
  edm::MCEvent:
    Description: "info dedicated for simulation"
    Author: "Z.Tang"
    Members:
      - int randomSeed // the random seed for this event
      - int status // reserved for status bits
  edm::Event:
    Description: "event info"
    Author: "Z.Tang"
    Members:
      - int run // run id
      - int event // event id in the run
      - int localtime // reserved for DAQ computer local time in seconds from 2020-1-1
      - double utc // reserved for UTC time from DAQ computer or offline
  edm::MCParticle:
    Description: "MC particle"
    Author: "Z.Tang"
    Members:
      - int pdgID // PDG code of the particle
      - int trackID // index of the particle
      - int parentID // index of parent particle
      - edm::Vector3f momentum // particle 3-momentum in [GeV]
      - edm::Vector3f vertex // production vertex of the particle in [cm].
      - short charge // atomic charge
      - uint16_t mass // atomic mass, 0 for e+ e-
      - float time // creation time of the particle in [ns] w.r.t. the event
      - uint32_t simstat // (opt) status of the particle from the simulation
  edm::CaloSimCell:
    Description: "MC truth info in calo cell"
    Author: "Z.Tang"
    Members:
      # unsigned char not supported... not sure if we shall use char
      - short ix // X index of the cell
      - short iy // Y index of the cell
      - short iz // Z index of the cell
      - float edep // energy of the hit in [GeV].
      - uint16_t fracEM // int(frac*65535)
      - uint16_t fracBS // both, int(em*255) | int(h*255)<<8
      - edm::Vector3f pos // CoG of the hits in the cell in [cm]
      - edm::Vector3f localpos // as above, in local coo [cm]
      - float tfirst // first hit time in the cell [ns] (or t0?)
      - float tmean // mean time of hits in the cell [ns]
      - float tsigma // sigma of times of hits in the cell [ns]
    MutableExtraCode:
      declaration: "
        void setCellIndex(short x, short y, short z){ setIx(x);setIy(y);setIz(z); }
        "
  edm::TrackingSimHit:
    Description: "Simulated tracking hit, record individual truth step in cells"
    Author: "Z.Tang"
    Members:
      - unsigned int cellCode // encoded cell numbering, e.g. something like 10000*ilayer+100*iladder+isensor
      - float time // incident time of the hit [ns].
      - edm::Vector3f pos // the hit position in [cm].
      - edm::Vector3f localpos // the local step start position in [cm].
      - edm::Vector3f localend // the local step ending position in [cm].
      - float pathlen // path length of the hit(track) in the sensitive material [cm]
      - float edep // energy deposited in the hit [GeV].
      - edm::Vector3f momentum // the 3-momentum of the particle at the hit position in [GeV]
      - int trackID // the track ID of the particle that produced the hit
      - int pdgID // PDG ID of the particle that created this hit
      #- edm::Vector3f dir // try to replace momentum with dir and P=|momentum|
      #- float P // |momentum| or maybe betagamma better?
      #- float dir_theta //
      #- float dir_phi //
  edm::CaloDigiCell:
    Description: "calo cell digitization"
    Author: "Z.Tang Z.Quan"
    Members:
      - unsigned int cellCode // encoded cell code
      - std::array<int, 3> grayHR // total grayscale value in CMOS for high range, 3 radius
      - std::array<int, 3> grayLR // total grayscale value in CMOS for low range, 3 radius
    OneToOneRelations:
      - edm::CaloSimCell siminfo // if there is any...
  edm::CaloRecoCell:
    Description: "calo cell after calibration"
    Author: "Z.Tang Z.Quan"
    Members:
      - float edep // cell edep [GeV] after calibration
      - short ix // X index of the cell in calo
      - short iy // Y index of the cell in calo
      - short iz // Z index of the cell in calo
    OneToOneRelations:
      - edm::CaloDigiCell digiinfo // link to digi object
      #maybe also need other links in the future...
  edm::CaloClusters:
    Description: "Calo Clusters"
    Author: "C.Zhang et al"
    Members:
      - int id // cluster ID
      - float edep // cluster total energy deposition
    VectorMembers:
      - uint16_t ihits // hit id in the cluster
  edm::CaloShowerAxis:
    Description: "Calo Shower Axis"
    Author: "C.Zhang et al"
    Members:
      - int id // Shower ID
      - edm::Vector3f dir // [dx,dy,dz]
      - edm::Vector3f entryCoor // [x1,y1,z1]
      - edm::Vector3f cog // [x0,y0,z0]
      - edm::Vector3f exitCoor // [x2,y2,z2]
    VectorMembers:
      - uint16_t iclusters // cluster ids
      #- uint16_t iHits // hit ids <optional>
  edm::SiliconDigiHit:
    Description: "digi hit (strip) for silicon detectors"
    Author: "Z.Tang"
    Members:
      - unsigned int cellCode // the cell(sensor) code as in TrackingSimHit, but only the part down to the ladder makes sense
      - uint16_t stripID // strip ID
      - uint32_t ADC // strip ADC value
      #- unsigned int TDC // reserved for TDC
    OneToManyRelations:
      - edm::TrackingSimHit siminfos // link to lots of sim hits, do we need?
  edm::SCDCluster:
    Description: "SCD Cluster"
    Author: "D.Guo et al"
    Members:
      - unsigned int ladderCode // global id (sensor cellcode/100)
      - double centroid // centroid wrt ladder
      - double clusterTotalADC //
      - float clusterCharge //
      - short clusterSize //
      - double maxStripADC //
      - uint16_t maxStripID //
      - int clusterType // simple cluster, or split from a large cluster (1to2, even 1to3...)
      - int sharedType // 0 no shared strip, 1 first strip shared, 2 last strip shared, 3 both first and last strip shared
      - edm::Vector3f posGlobal // global position of the cluster
      - char vertAligned // flag for cell projection. See DirectionInfo in GeometrySvc for details
    VectorMembers:
      - double stripADC // ADCs of strips
      - uint16_t stripID // corresponding strip IDs
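  edm::SCDTrack:
    Description: "SCD Track"
    Author: "Guo Dongya et al"
    Members:
      - int seedType // truthseed, blindseed, caloseed
      - int impactSector // which sector the track belongs to, 0~4
      - int impactPlane // which plane was hit first, 0~3
      - edm::Vector3f m_impactPoint // position of the initial impact point (computed from the filtered and smoothed track)
      - float particleZ // 0~gamma, >1 charged particle
      - float chi2 // chi2 after smoothing
      - int ndf // number of degrees of freedom of the fit
      - int nHitX // number of X-direction hits used in the fit
      - int nHitY // number of Y-direction hits used in the fit
      - int nHitXY // number of hits with both X and Y used in the fit [?]
    VectorMembers:
      - int clusterXIndex // indices of the X clusters used in track reconstruction
      - int clusterYIndex // indices of the Y clusters used in track reconstruction
      - float covxx // covariance matrix element xx after filtering and smoothing
      - float covyy // covariance matrix element yy after filtering and smoothing
      - float covxy // covariance matrix element xy after filtering and smoothing
      - float x // x coordinate after filtering and smoothing
      - float y // y coordinate after filtering and smoothing
      - float xMeas // measured x coordinate of the X cluster (after alignment)
      - float yMeas // measured y coordinate of the Y cluster (after alignment)
      - float zxMeas // measured z coordinate of the X cluster (after alignment)
      - float zyMeas // measured z coordinate of the Y cluster (after alignment)
      - float covMeasxx // covariance matrix element xx of the residuals after filtering and smoothing
      - float covMeasyy // covariance matrix element yy of the residuals after filtering and smoothing
      - short plane // plane ID
      - float trackSlopex // track slope in X after filtering and smoothing
      - float trackSlopey // track slope in Y after filtering and smoothing
      - float covSlopexx // covariance matrix element xx of the slopes after filtering and smoothing
      - float covSlopeyy // covariance matrix element yy of the slopes after filtering and smoothing
      - float covSlopexy // covariance matrix element xy of the slopes after filtering and smoothing
      - short vecClusters // cluster index, maybe link is better?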
  edm::PSDDigiCell:
    Description: "digi hit for psd bar v0"
    Author: "Z.Tang, Q.Wu"
    Members:
      - unsigned int cellCode // the same bar/tile code as in TrackingSimHit
      - int ADC0 // SiPM ADC value, one for each side
      - int ADC1 //
      #//- short TDC[2] // reserved for TDC
    #OneToManyRelations:
      #- edm::TrackingSimHit siminfos // link to lots of sim hits, do we need?
  edm::PSDRecoCell:
    Description: "reco hit for psd bar v0"
    Author: "Z.Tang, Q.Wu"
    Members:
      - unsigned int cellCode //
      - float edep // reconstructed edep
      - float edepBirksCorr // with Birks law correction (experimental)
      - float nPE0 // convert ADC to nPE at SiPM w/ corrections
      - float nPE1 // convert ADC to nPE at SiPM w/ corrections
      - float hitpos // reco hit local position from bar alone
  edm::FITDigiCell:
    Description: "digi hit for FIT v0"
    Author: "Z.Tang, J Wang"
    Members:
      - unsigned int cellCode // the same mat code as in TrackingSimHit
      - uint16_t channelID // channel id inside a MAT (3 SiPM at the moment)
      - int ADC // SiPM channel ADC value
      #//- short TDC // reserved for TDC
    OneToManyRelations:
      - edm::TrackingSimHit siminfos // link to lots of sim hits, do we need?
  edm::FITRecoCell:
    Description: "reco hit for FIT v0"
    Author: "Z.Tang, J.Wang"
    Members:
      - unsigned int cellCode //
      - uint16_t channelID // channel id inside a MAT (3 SiPM)
      - float edep // reconstructed edep
      #- float edepBirksCorr // with Birks law correction (experimental)
  edm::FITCluster:
    Description: "FIT Cluster"
    Author: "J.Wang et al"
    Members:
      - unsigned int cellCode // cell code of the mat
      - bool ifOddLayer // flag of whether on an odd numbered layer
      - char vertAligned // flag of cell projection, see DirectionInfo in GeometrySvc
      - uint16_t peakChannel // channel ID of the peak
      - float peakChannelAmp // after calibration, in unit of p.e. [float]
      - uint16_t peakLeft // how many channels are on the left side of the peak channel?
      - uint16_t peakRight // how many channels are on the right side of the peak channel?
      - float clusterAmp // total channel amp
      - uint16_t clusterSize // number of channels with amp > neighbor threshold
      - float clusterCharge // reconstructed cluster |z| if m_trackType==calo seed (default -1)
      - edm::Vector3f posLocal // wrt layer [?]
      - edm::Vector3f position // cluster position in global coo, use mat coo for the unknown direction
      - int type // simple cluster, or split from a large cluster (1to2, even 1to3...)
    VectorMembers:
      - uint16_t channels // channel id within a mat
      - float channelAmps // corrected?
  edm::FITTrack:
    Description: "FIT Track"
    Author: "J.Wang et al"
    Members:
      - int seedType // blind search, calo seed, scd seed
      - edm::Vector3f fittedDir //
      - edm::Vector3f fittedPos // impact point closest to CALO
      - int impactSector // 0~4
      - int impactLayer // initial hit from outer layers
      - double particleZ // 0~gamma, 1 charged particle
      - double chi2x //
      - double chi2y //
      - double ndfx // int?
      - double ndfy //
      - int nHitX // number of y hit in X layer
      - int nHitY // number of x hit in Y layer
      - int nHitXY // number of x&y pair in physical geo
      - double trackLength // ?
    VectorMembers:
      - short clusterIDs // maybe podio relation
      - short planes // layer#
      #- float fittedCov //
  edm::TRDDigiCell:
    Description: "digi hit for TRD v0"
    Author: "Z.Tang, C.Dai"
    Members:
      - unsigned int cellCode // the same mat code as in TrackingSimHit
      - std::array<short,128> ADC // ADC values for all channels in a module
      #//- std::array<short,128> TDC // reserved for TDC
    #OneToManyRelations:
      #- edm::TrackingSimHit siminfos // link to lots of sim hits, do we need?
  edm::TRDRecoCell:
    Description: "reco hit for TRD v0"
    Author: "Z.Tang, C.Dai"
    Members:
      - unsigned int cellCode // cell code (module)
      - float edep // reconstructed edep of selected channels
      - std::array<float,128> rawEdep // raw edep from ADC
      - std::array<float,128> corrEdep // considering also lateral position correction
  edm::FastTrigger:
    Description: "fast trigger digi & calculation in regions, for subdets (CALO, PSD, ...)"
    Author: "Z.Tang M.Xu"
    Members:
      - char det // calo pmt, calo pd, psd, ...
      - char type // det specific, e.g. calo= LEE,LEG,...
      - char region // triggered region (core, shell1,shell2, ...)
      - float time // trigger time simulation, to be used in high level trigger
  edm::Trigger:
    Description: "high level trigger results based on fast triggers"
    Author: "Z.Tang M.Xu"
    Members:
      - unsigned int pattern // trigger pattern
    OneToManyRelations:
      - edm::FastTrigger fastTrg // link to fast triggers
  edm::GlobalTrack:
    Description: "Reconstructed track from SCD cluster and FIT clusters"
    Author: "ZYQU"
    Members:
      - edm::Vector3f position // position
      - edm::Vector3f direction // direction
      - int nHit // number of hits used in this track
      - float chi2 // fitted chi2 with default measurement error
      - float charge // Global Charge from all Clusters
      - float chargeSCD // Charge estimated from SCD
      - float chargeFIT // Charge estimated from FIT
      - int type // type of track, 0 for half track, 1 for whole track
    VectorMembers:
      - edm::Vector3f coos // coordinate used in the fit
    OneToManyRelations:
      - edm::SCDCluster scdcls // SCD cluster collection
      - edm::FITCluster fitcls // FIT cluster collection
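
Several members in the listing above store packed values that are described only in their comments. The helpers below sketch those conventions so they can be checked in isolation. All function and type names are hypothetical. The `cellCode` layout is an assumption, since the listing only says "something like 10000*ilayer+100*iladder+isensor", and the clamping of fractions to [0, 1] is an addition not stated in the listing.

```cpp
#include <cassert>
#include <cstdint>

// Indices packed into a cellCode (layout assumed from the TrackingSimHit comment).
struct CellIndex {
  unsigned layer;
  unsigned ladder;
  unsigned sensor;
};

// cellCode = 10000*ilayer + 100*iladder + isensor; ladder and sensor must stay below 100.
inline unsigned encodeCellCode(const CellIndex& c) {
  assert(c.ladder < 100 && c.sensor < 100);
  return 10000u * c.layer + 100u * c.ladder + c.sensor;
}

inline CellIndex decodeCellCode(unsigned code) {
  return CellIndex{code / 10000u, (code / 100u) % 100u, code % 100u};
}

// SCDCluster::ladderCode is described as "sensor cellcode/100".
inline unsigned ladderCode(unsigned cellCode) { return cellCode / 100u; }

// Clamp to [0, 1] so out-of-range inputs cannot overflow the packed field.
inline float clamp01(float f) { return f < 0.f ? 0.f : (f > 1.f ? 1.f : f); }

// fracEM: int(frac*65535), one fraction stored in 16 bits.
inline std::uint16_t packFraction16(float frac) {
  return static_cast<std::uint16_t>(clamp01(frac) * 65535.f);
}

inline float unpackFraction16(std::uint16_t packed) { return packed / 65535.f; }

// fracBS: int(em*255) | int(h*255)<<8, two 8-bit fractions in one 16-bit word.
inline std::uint16_t packFracBS(float em, float h) {
  const unsigned low = static_cast<unsigned>(clamp01(em) * 255.f);
  const unsigned high = static_cast<unsigned>(clamp01(h) * 255.f);
  return static_cast<std::uint16_t>(low | (high << 8));
}

// Recover the two 8-bit fractions from a packed fracBS word.
inline float unpackFracBSLow(std::uint16_t packed) { return (packed & 0xFFu) / 255.f; }
inline float unpackFracBSHigh(std::uint16_t packed) { return (packed >> 8) / 255.f; }
```

Packing truncates toward zero, so a round trip loses up to one step of resolution: 1/65535 for `fracEM` and 1/255 for each half of `fracBS`.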