Local HDF5 Backend¶
Local HDF5 Backend Implementation, Identifier: HDF5_00
Backend Identifiers¶
- Backend: `0`
- Version: `0`
- Format Code: `00`
- Canonical Name: `HDF5_00`
Storage Method¶
- Data is written to specific subarray indexes inside an HDF5 "dataset" in a single HDF5 file.
- Each HDF5 file contains `COLLECTION_COUNT` "datasets" (named `"0"` through `"{COLLECTION_COUNT}"`). A dataset's name is referred to as its "dataset number".
- Each dataset is a zero-initialized array of:
  - dtype: `{schema_dtype}`; i.e. `np.float32` or `np.uint8`
  - shape: `(COLLECTION_SIZE, *{schema_shape})`; i.e. `(500, 10)` or `(500, 4, 3)`. The first index into a dataset is referred to as a "collection index".
- Compression filters and chunking configuration/options are applied globally to all datasets in a file at dataset creation time.
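The layout above can be sketched with `h5py`. This is a minimal illustration, not the backend's actual implementation; the constant values, file name, and chunk/compression settings below are assumptions chosen for the example.

```python
import numpy as np
import h5py

# Hypothetical values standing in for the schema-derived constants
# described above; the real backend computes these from the schema.
COLLECTION_COUNT = 5         # number of "datasets" per file
COLLECTION_SIZE = 500        # subarrays per dataset
schema_shape = (10,)         # {schema_shape}
schema_dtype = np.float32    # {schema_dtype}

with h5py.File("example_data.h5", "w") as f:
    for dset_num in range(COLLECTION_COUNT):
        # Zero-initialized dataset; the same chunking and compression
        # options are applied to every dataset at creation time.
        f.create_dataset(
            str(dset_num),
            shape=(COLLECTION_SIZE, *schema_shape),
            dtype=schema_dtype,
            chunks=(1, *schema_shape),
            compression="gzip",
        )

    # Writing one array to dataset number "0", collection index 0:
    f["0"][0] = np.arange(10, dtype=np.float32)
```

Because the datasets are chunked and zero-initialized, the untouched collection indexes consume no disk space until data is actually written to them.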
Record Format¶
Fields Recorded for Each Array¶
- Format Code
- File UID
- Dataset Number (`0:COLLECTION_COUNT`, dataset selection)
- Collection Index (`0:COLLECTION_SIZE`, dataset subarray selection)
- Subarray Shape
Separators used¶
SEP_KEY: ":"
SEP_HSH: "$"
SEP_LST: " "
SEP_SLC: "*"
Examples¶
- Adding the first piece of data to a file:
- Array shape (Subarray Shape): (10)
- File UID: “2HvGf9”
- Dataset Number: “0”
- Collection Index: 0
Record Data => "00:2HvGf9$0 0*10"
- Adding a piece of data to the middle of a file:
- Array shape (Subarray Shape): (20, 2, 3)
- File UID: “WzUtdu”
- Dataset Number: “3”
- Collection Index: 199
Record Data => "00:WzUtdu$3 199*20 2 3"
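The record layout and separators can be illustrated with a small encoder/decoder. The helper names below are hypothetical, for illustration only; the backend's real function names and signatures may differ.

```python
# Hypothetical helpers illustrating the record layout described above.
FMT_CODE = "00"
SEP_KEY, SEP_HSH, SEP_LST, SEP_SLC = ":", "$", " ", "*"

def encode_record(uid, dataset_num, collection_idx, shape):
    """Compose a record string from the fields recorded for each array."""
    shape_str = SEP_LST.join(str(dim) for dim in shape)
    return (f"{FMT_CODE}{SEP_KEY}{uid}{SEP_HSH}{dataset_num}"
            f"{SEP_LST}{collection_idx}{SEP_SLC}{shape_str}")

def decode_record(record):
    """Split a record string back into its component fields."""
    fmt, rest = record.split(SEP_KEY, 1)
    uid, rest = rest.split(SEP_HSH, 1)
    location, shape_str = rest.split(SEP_SLC, 1)
    dataset_num, collection_idx = location.split(SEP_LST, 1)
    shape = tuple(int(dim) for dim in shape_str.split(SEP_LST))
    return fmt, uid, dataset_num, int(collection_idx), shape

# Reproduces the two examples above:
print(encode_record("2HvGf9", "0", 0, (10,)))         # 00:2HvGf9$0 0*10
print(encode_record("WzUtdu", "3", 199, (20, 2, 3)))  # 00:WzUtdu$3 199*20 2 3
```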
Technical Notes¶
- Files are read-only after their initial creation/writes. Only a write-enabled checkout can open an HDF5 file in `"w"` or `"a"` mode; writer checkouts create new files on every checkout and make no attempt to fill in unset locations in previous files. This is not an issue, as no disk space is used until data is written to the initially created "zero-initialized" collection datasets.
- On write: Single Writer Multiple Reader (SWMR) mode is set to ensure that improper closing (not calling the `.close()` method) does not corrupt any data which had previously been flushed to the file.
- On read: SWMR mode is set to allow multiple readers (in different threads / processes) to read from the same file. File handle serialization is handled via custom Python `pickle` serialization/reduction logic, implemented through the high-level pickle reduction `__setstate__()` / `__getstate__()` class methods.
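The SWMR behavior described above can be sketched with `h5py`. This is a standalone illustration under assumed names, not the backend's checkout machinery; the real backend opens and flushes files through its checkout objects.

```python
import numpy as np
import h5py

# Writer side. SWMR requires the latest HDF5 file format version.
with h5py.File("swmr_demo.h5", "w", libver="latest") as f:
    dset = f.create_dataset("0", shape=(500, 10), dtype=np.float32)
    f.swmr_mode = True   # readers may now open the file concurrently
    dset[0] = 1.0
    dset.flush()         # flushed data survives even an improper close

# Reader side. swmr=True allows reading the file even while a writer
# process still holds it open.
with h5py.File("swmr_demo.h5", "r", swmr=True) as f:
    print(f["0"][0][0])
```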