Local NP Memmap Backend
Local Numpy memmap Backend Implementation, Identifier: NUMPY_10
Backend Identifiers
Backend: 1
Version: 0
Format Code: 10
Canonical Name: NUMPY_10
Storage Method
Data is written to specific subarray indexes inside a numpy memmapped array on disk. Each file is a zero-initialized array of:
dtype: {schema_dtype}; e.g. np.float32 or np.uint8
shape: (COLLECTION_SIZE, *{schema_shape}); e.g. (500, 10) or (500, 4, 3)
The first index in the array is referred to as a “collection index”.
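As a minimal sketch of this layout, a collection file can be created with numpy's np.lib.format.open_memmap in "w+" mode and a single sample written at one collection index. The helpers create_collection_file and write_subarray are hypothetical, and COLLECTION_SIZE = 500 is an illustrative value chosen to match the example shapes above, not a value taken from the backend.

```python
import os
import tempfile

import numpy as np

# Illustrative value matching the (500, ...) example shapes; the backend's
# actual COLLECTION_SIZE is not stated in this document.
COLLECTION_SIZE = 500


def create_collection_file(path, schema_shape, schema_dtype,
                           collection_size=COLLECTION_SIZE):
    """Hypothetical helper: create a zero-initialized .npy memmap of shape
    (collection_size, *schema_shape) with dtype schema_dtype."""
    # mode="w+" writes the .npy header (dtype, shape, order) and then maps
    # the data region; the newly extended file reads back as zeros.
    return np.lib.format.open_memmap(
        path,
        mode="w+",
        dtype=np.dtype(schema_dtype),
        shape=(collection_size, *schema_shape),
    )


def write_subarray(mm, collection_index, data):
    """Hypothetical helper: store one sample at a collection index."""
    mm[collection_index] = data  # the first axis is the collection index
    mm.flush()                   # push dirty pages to the file on disk


def main():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "collection.npy")
        mm = create_collection_file(path, schema_shape=(4, 3),
                                    schema_dtype=np.uint8)
        print(mm.shape, mm.dtype)  # (500, 4, 3) uint8
        write_subarray(mm, 199, np.ones((4, 3), dtype=np.uint8))
        print(int(mm[199].sum()), int(mm[0].sum()))  # 12 0
        del mm  # release the mapping before the directory is removed


if __name__ == "__main__":
    main()
```

Because every slot already exists on disk from creation, a write only touches the pages of one subarray; nothing else in the file is rewritten.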
Compression Options
Does not accept any compression options. No compression is applied.
Record Format
Fields Recorded for Each Array
Format Code
File UID
xxhash64_hexdigest
Collection Index (0:COLLECTION_SIZE subarray selection)
Subarray Shape
Examples
Adding the first piece of data to a file:
Array shape (Subarray Shape): (10, 10)
File UID: “K3ktxv”
xxhash64_hexdigest: 94701dd9f32626e2
Collection Index: 488
Record Data => "10:K3ktxv:94701dd9f32626e2:488:10 10"
Adding a piece of data to the middle of a file:
Array shape (Subarray Shape): (20, 2, 3)
File UID: “Mk23nl”
xxhash64_hexdigest: 1363344b6c051b29
Collection Index: 199
Record Data => "10:Mk23nl:1363344b6c051b29:199:20 2 3"
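The two examples above follow a colon-delimited layout in the field order listed, with the subarray shape written as space-separated integers. A minimal sketch of encoding and decoding such records, assuming exactly that layout; NumpyRecord, encode_record and decode_record are hypothetical names, not the backend's own API.

```python
from typing import NamedTuple, Tuple


class NumpyRecord(NamedTuple):
    """Hypothetical container for the fields of a NUMPY_10 record."""
    format_code: str
    uid: str
    checksum: str          # xxhash64 hex digest
    collection_index: int  # 0 <= index < COLLECTION_SIZE
    shape: Tuple[int, ...]


def encode_record(rec: NumpyRecord) -> str:
    # Fields are joined by ":"; shape dimensions are joined by single spaces.
    shape_str = " ".join(str(d) for d in rec.shape)
    return (f"{rec.format_code}:{rec.uid}:{rec.checksum}:"
            f"{rec.collection_index}:{shape_str}")


def decode_record(raw: str) -> NumpyRecord:
    # maxsplit=4 keeps the space-separated shape as one trailing field.
    format_code, uid, checksum, index, shape_str = raw.split(":", 4)
    shape = tuple(int(d) for d in shape_str.split())
    return NumpyRecord(format_code, uid, checksum, int(index), shape)


rec = NumpyRecord("10", "Mk23nl", "1363344b6c051b29", 199, (20, 2, 3))
print(encode_record(rec))  # 10:Mk23nl:1363344b6c051b29:199:20 2 3
```

Splitting with a bounded maxsplit means the shape field may contain spaces without any escaping, as long as the UID and checksum never contain a colon.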
Technical Notes
A typical numpy memmap file persisted to disk does not retain information about its datatype or shape, so both must be provided when the file is re-opened after being closed. To persist a memmap in .npy format, we use the special function open_memmap imported from np.lib.format, which can open a memmap file and persist the necessary header info to disk in .npy format.
On each write, an xxhash64_hexdigest checksum is calculated. It is not intended as the primary hash algorithm; rather, it is stored in the local record format itself to serve as a quick way to verify that no disk corruption occurred. This is required since numpy has no built-in data integrity validation methods when reading from disk.
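The sketch below illustrates both points: re-opening the file with open_memmap without passing dtype or shape, and detecting corruption by recomputing a checksum. The helper names subarray_checksum and verify_subarray, hashing the raw C-order bytes of the subarray, and assuming a sample occupies the leading corner of its slot are all hypothetical choices, not taken from the backend. It uses the third-party xxhash package (xxhash.xxh64) when installed; otherwise it falls back to an 8-byte hashlib.blake2b digest, a different algorithm substituted only so the example still runs.

```python
import hashlib
import os
import tempfile

import numpy as np

try:
    import xxhash  # third-party package providing xxh64
except ImportError:
    xxhash = None  # fall back so the sketch still runs without it


def subarray_checksum(arr):
    """Hypothetical helper: hex digest of the array's raw C-order bytes."""
    data = np.asarray(arr).tobytes()
    if xxhash is not None:
        return xxhash.xxh64(data).hexdigest()
    # Stand-in only: an 8-byte blake2b digest, NOT xxhash64.
    return hashlib.blake2b(data, digest_size=8).hexdigest()


def verify_subarray(path, collection_index, subarray_shape, expected):
    """Hypothetical helper: recompute a checksum and compare to the record."""
    # No dtype or shape is passed: open_memmap reads both from the .npy header.
    mm = np.lib.format.open_memmap(path, mode="r")
    # Assumption: a sample occupies the leading corner of its slot.
    corner = tuple(slice(0, d) for d in subarray_shape)
    return subarray_checksum(mm[collection_index][corner]) == expected


def main():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "collection.npy")
        mm = np.lib.format.open_memmap(
            path, mode="w+", dtype=np.float32, shape=(500, 20, 2, 3))
        sample = np.arange(120, dtype=np.float32).reshape(20, 2, 3)
        mm[199] = sample
        mm.flush()
        digest = subarray_checksum(sample)  # would be stored in the record
        del mm
        print(verify_subarray(path, 199, sample.shape, digest))  # True

        # Simulate silent on-disk corruption of a single value.
        bad = np.lib.format.open_memmap(path, mode="r+")
        bad[199, 0, 0, 0] += 1
        bad.flush()
        del bad
        print(verify_subarray(path, 199, sample.shape, digest))  # False


if __name__ == "__main__":
    main()
```

A fast non-cryptographic hash is a reasonable fit here because the goal is catching accidental corruption on read, not resisting deliberate tampering.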