Local NP Memmap Backend¶
Local Numpy memmap Backend Implementation, Identifier: NUMPY_10
Backend Identifiers¶
Backend: 1
Version: 0
Format Code: 10
Canonical Name: NUMPY_10
Storage Method¶
Data is written to specific subarray indexes inside a numpy memmapped array on disk.
Each file is a zero-initialized array with:
dtype: {schema_dtype}; e.g. np.float32 or np.uint8
shape: (COLLECTION_SIZE, *{schema_shape}); e.g. (500, 10) or (500, 4, 3)
The first index in the array is referred to as a “collection index”.
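The storage layout described above can be sketched in a few lines. This is a minimal illustration rather than the backend's actual code: the values of COLLECTION_SIZE, schema_shape and schema_dtype, the file name, and the sample data are all assumptions chosen for the example.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

COLLECTION_SIZE = 500      # assumed number of slots per file
schema_shape = (10, 10)    # assumed schema shape
schema_dtype = np.float32  # assumed schema dtype

# Hypothetical file location used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "example.npy")

# Create the backing file; mode "w+" produces a zero-filled array on disk.
collection = open_memmap(
    path, mode="w+", dtype=schema_dtype,
    shape=(COLLECTION_SIZE, *schema_shape),
)

# Store one sample at collection index 488.
sample = np.arange(100, dtype=schema_dtype).reshape(schema_shape)
collection[488] = sample
collection.flush()

stored = np.array(collection[488])   # copy the written slot back into memory
untouched = np.array(collection[0])  # a slot that was never written
```

Slots that have not been written remain all zeros, so only the record for a sample says whether a given collection index holds real data.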
Compression Options¶
Does not accept any compression options. No compression is applied.
Record Format¶
Fields Recorded for Each Array¶
Format Code
File UID
xxhash64_hexdigest
Collection Index (0:COLLECTION_SIZE subarray selection)
Subarray Shape
Examples
Adding the first piece of data to a file:
Array shape (Subarray Shape): (10, 10)
File UID: “K3ktxv”
xxhash64_hexdigest: 94701dd9f32626e2
Collection Index: 488
Record Data => "10:K3ktxv:94701dd9f32626e2:488:10 10"
Adding a piece of data to the middle of a file:
Array shape (Subarray Shape): (20, 2, 3)
File UID: “Mk23nl”
xxhash64_hexdigest: 1363344b6c051b29
Collection Index: 199
Record Data => "10:Mk23nl:1363344b6c051b29:199:20 2 3"
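To make the field layout concrete, the sketch below builds and parses record strings in the form shown in the examples. The names Numpy10Record, encode_record and decode_record are hypothetical helpers introduced for illustration; they are not taken from the backend's source.

```python
from typing import NamedTuple, Tuple

FORMAT_CODE = "10"  # format code of the NUMPY_10 backend


class Numpy10Record(NamedTuple):
    """Hypothetical container for the fields recorded for each array."""
    uid: str
    checksum: str
    collection_idx: int
    shape: Tuple[int, ...]


def encode_record(uid: str, checksum: str, collection_idx: int,
                  shape: Tuple[int, ...]) -> str:
    # Fields are joined with ":"; shape dimensions are joined with spaces.
    shape_str = " ".join(str(dim) for dim in shape)
    return f"{FORMAT_CODE}:{uid}:{checksum}:{collection_idx}:{shape_str}"


def decode_record(raw: str) -> Numpy10Record:
    code, uid, checksum, idx, shape_str = raw.split(":")
    if code != FORMAT_CODE:
        raise ValueError(f"unexpected format code: {code!r}")
    shape = tuple(int(dim) for dim in shape_str.split())
    return Numpy10Record(uid, checksum, int(idx), shape)


first = encode_record("K3ktxv", "94701dd9f32626e2", 488, (10, 10))
second = decode_record("10:Mk23nl:1363344b6c051b29:199:20 2 3")
```

Here first reproduces the record string from the first example, and second recovers the fields listed in the second example.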
Technical Notes¶
A typical numpy memmap file persisted to disk does not retain information about its datatype or shape, so both must be provided when the file is re-opened after being closed. In order to persist a memmap in .npy format, we use the special function open_memmap imported from np.lib.format, which can open a memmap file and persist the necessary header info to disk in .npy format.
On each write, an xxhash64_hexdigest checksum is calculated. This is not for use as the primary hash algorithm, but is instead stored in the local record format itself to serve as a quick way to verify that no disk corruption occurred. This is required since numpy has no built-in data integrity validation methods when reading from disk.
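The two notes above can be illustrated together: a file written through open_memmap can be re-opened without restating its dtype or shape, and a digest stored at write time can be compared against the data read back. This is a sketch under assumptions, not the backend's implementation. The file name, array sizes and the digest helper are hypothetical, hashing the raw bytes of the array is an assumed choice, and if the third-party xxhash package is not installed the sketch falls back to a hashlib stand-in that is not xxhash64.

```python
import hashlib
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

try:
    import xxhash  # third-party package providing the xxh64 algorithm

    def digest(arr):
        # Assumed approach: hash the raw bytes of the array.
        return xxhash.xxh64(arr.tobytes()).hexdigest()
except ImportError:
    def digest(arr):
        # Stand-in so the sketch still runs without xxhash (NOT xxhash64).
        return hashlib.blake2b(arr.tobytes(), digest_size=8).hexdigest()

# Hypothetical file location used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "example.npy")

# Write: open_memmap stores dtype and shape in the .npy header.
writer = open_memmap(path, mode="w+", dtype=np.uint8, shape=(500, 4, 3))
sample = np.arange(12, dtype=np.uint8).reshape(4, 3)
writer[199] = sample
writer.flush()
recorded = digest(sample)  # would be stored in the record at write time
del writer

# Read: dtype and shape are recovered from the header, not supplied again.
reader = open_memmap(path, mode="r")
restored = np.array(reader[199])
is_intact = digest(restored) == recorded
```

A mismatch between the recomputed digest and the recorded one would indicate that the bytes on disk changed after the write.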