Local HDF5 Backend

Local HDF5 Backend Implementation, Identifier: HDF5_00

Backend Identifiers

Data is written to specific subarray indexes inside an HDF5 "dataset" in a single HDF5 file.
In each HDF5 file there are COLLECTION_COUNT "datasets", named "0" through "{COLLECTION_COUNT - 1}". This name is referred to as the "dataset number".
Each dataset is a zero-initialized array with:
- dtype: {schema_dtype}; e.g. np.float32 or np.uint8
- shape: (COLLECTION_SIZE, *{schema_shape}); e.g. (500, 10) or (500, 4, 3). The index along the first axis of a dataset is referred to as the collection index.
Compression filters and chunking options are applied globally to all datasets in a file at dataset creation time.
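The addressing scheme above can be modeled in plain Python. This is a minimal sketch, not the backend's actual implementation: the constant values, the names `dataset_shape` and `SlotAllocator`, and the fill order (every collection index of dataset "0" before moving on to dataset "1") are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Tuple

# Hypothetical values; the real COLLECTION_COUNT / COLLECTION_SIZE are backend settings.
COLLECTION_COUNT = 100
COLLECTION_SIZE = 500


def dataset_shape(schema_shape: Tuple[int, ...],
                  collection_size: int = COLLECTION_SIZE) -> Tuple[int, ...]:
    """Shape of every dataset in a file: (COLLECTION_SIZE, *schema_shape)."""
    return (collection_size, *schema_shape)


@dataclass
class SlotAllocator:
    """Hands out (dataset number, collection index) slots within one file.

    Assumed fill order: all collection indexes of dataset "0", then "1", etc.
    """
    collection_count: int = COLLECTION_COUNT
    collection_size: int = COLLECTION_SIZE
    _used: int = field(default=0, init=False)

    def is_full(self) -> bool:
        return self._used >= self.collection_count * self.collection_size

    def allocate(self) -> Tuple[str, int]:
        if self.is_full():
            # Assumption: a writer would start a new file at this point.
            raise RuntimeError("no free slots left in this file")
        dataset_number, collection_index = divmod(self._used, self.collection_size)
        self._used += 1
        # Dataset names are the string form of the dataset number.
        return str(dataset_number), collection_index
```

With these assumptions, a fresh allocator's first slot is ("0", 0), the same dataset number and collection index used in the example that follows, and dataset_shape((4, 3)) gives (500, 4, 3).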

Examples

Adding the first piece of data to a file:
- Array shape (Subarray Shape): (10)
- File UID: “2HvGf9”
- Dataset Number: “0”
- Collection Index: 0

Record Data => "00:2HvGf9$0 0*10"
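Reading the single example above, a record appears to consist of the format code "00", a ":", the file UID, a "$", the dataset number and collection index separated by a space, a "*", and the subarray shape. The sketch below encodes and decodes that layout. It is inferred from this one example only; the space-separated encoding of multi-dimensional shapes and the names `HDF5Spec`, `encode_record` and `decode_record` are assumptions.

```python
from typing import NamedTuple, Tuple

FORMAT_CODE = "00"  # prefix taken from the example record


class HDF5Spec(NamedTuple):
    uid: str                 # file UID, e.g. "2HvGf9"
    dataset: str             # dataset number, e.g. "0"
    collection_index: int    # index along the dataset's first axis
    shape: Tuple[int, ...]   # subarray shape


def encode_record(spec: HDF5Spec) -> str:
    # Assumption: multi-dimensional shapes are joined with spaces.
    shape = " ".join(str(dim) for dim in spec.shape)
    return f"{FORMAT_CODE}:{spec.uid}${spec.dataset} {spec.collection_index}*{shape}"


def decode_record(record: str) -> HDF5Spec:
    code, rest = record.split(":", 1)
    if code != FORMAT_CODE:
        raise ValueError(f"not an {FORMAT_CODE} record: {record!r}")
    uid, rest = rest.split("$", 1)
    location, shape_text = rest.split("*", 1)
    dataset, index = location.split(" ")
    shape = tuple(int(dim) for dim in shape_text.split())
    return HDF5Spec(uid, dataset, int(index), shape)
```

Encoding HDF5Spec("2HvGf9", "0", 0, (10,)) reproduces the record string shown above.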

Files are read-only after their initial creation and writes. Only a write-enabled checkout can open an HDF5 file in "w" or "a" mode; writer checkouts create a new file on every checkout and make no attempt to fill in unset locations in previous files. This is not an issue, since no disk space is used until data is actually written to the initially created, zero-initialized collection datasets.
On write: Single Writer Multiple Reader (SWMR) mode is enabled so that improper closing (i.e. not calling the .close() method) does not corrupt any data that had previously been flushed to the file.
On read: files are opened in SWMR mode to allow multiple readers (in different threads or processes) to read from the same file. File handle serialization is handled via custom Python pickle serialization/reduction logic, implemented by the standard pickle protocol methods __getstate__() and __setstate__().
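The pickle reduction pattern described above can be illustrated without HDF5. In this sketch, the `ReadHandle` class and `demo` function are hypothetical and a plain binary file stands in for an HDF5 file: `__getstate__` drops the open file object, which cannot be pickled, and keeps only its path, and `__setstate__` reopens the file when the object is unpickled, for example in a worker process. A real HDF5 backend would presumably reopen the file with h5py in SWMR read mode at that point; that detail is an assumption here.

```python
import os
import pickle
import tempfile


class ReadHandle:
    """Read-only file handle that survives pickling by reopening its file."""

    def __init__(self, path: str):
        self.path = path
        self._fh = open(path, "rb")

    def read(self) -> bytes:
        self._fh.seek(0)
        return self._fh.read()

    def close(self) -> None:
        self._fh.close()

    def __getstate__(self):
        # Open file objects cannot be pickled; keep only what is needed to reopen.
        state = self.__dict__.copy()
        del state["_fh"]
        return state

    def __setstate__(self, state):
        # Restore attributes, then reopen the file read-only on the receiving side.
        # (An HDF5 backend would open the HDF5 file here instead; assumption.)
        self.__dict__.update(state)
        self._fh = open(self.path, "rb")


def demo() -> bytes:
    """Pickle a handle, unpickle it, and read through the reopened copy."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(b"hello")
        original = ReadHandle(path)
        clone = pickle.loads(pickle.dumps(original))  # triggers __setstate__
        try:
            return clone.read()
        finally:
            original.close()
            clone.close()
```

Keeping only the path in the pickled state means each process ends up with its own independent read-only handle, which is what allows many readers to share one file.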