Local NP Memmap Backend¶
Local Numpy memmap Backend Implementation, Identifier: NUMPY_10
Backend Identifiers¶
- Backend: 1
- Version: 0
- Format Code: 10
- Canonical Name: NUMPY_10
Storage Method¶
- Data is written to specific subarray indexes inside a numpy memmapped array on disk.
- Each file is a zero-initialized array of
  dtype: {schema_dtype}; e.g. np.float32 or np.uint8
  shape: (COLLECTION_SIZE, *{schema_shape}); e.g. (500, 10) or (500, 4, 3).
  The first index in the array is referred to as a “collection index”.
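The layout described above can be sketched as follows. This is a minimal illustration, not the backend's actual code: the COLLECTION_SIZE value, the schema shape and dtype, and the file name are all assumptions chosen to match the example shapes given here.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

COLLECTION_SIZE = 500        # assumed value, matching the example shapes
schema_shape = (4, 3)        # hypothetical schema shape
schema_dtype = np.float32    # hypothetical schema dtype

# Hypothetical file location inside a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "example.npy")

# mode="w+" creates the file; untouched regions read back as zeros.
arr = open_memmap(
    path,
    mode="w+",
    dtype=schema_dtype,
    shape=(COLLECTION_SIZE, *schema_shape),
)

# Write one sample into the subarray selected by its collection index.
collection_index = 199
sample = np.arange(12, dtype=schema_dtype).reshape(schema_shape)
arr[collection_index] = sample
arr.flush()

# Copy the subarray back out of the memmap.
stored = np.array(arr[collection_index])
```

Each sample occupies exactly one slot along the first axis, so reading it back only requires the file and its collection index.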
Compression Options¶
Does not accept any compression options. No compression is applied.
Record Format¶
Fields Recorded for Each Array¶
- Format Code
- File UID
- xxhash64_hexdigest
- Collection Index (0:COLLECTION_SIZE subarray selection)
- Subarray Shape
Examples
Adding the first piece of data to a file:
- Array shape (Subarray Shape): (10, 10)
- File UID: “K3ktxv”
- xxhash64_hexdigest: 94701dd9f32626e2
- Collection Index: 488
Record Data => "10:K3ktxv:94701dd9f32626e2:488:10 10"
Adding a piece of data to the middle of a file:
- Array shape (Subarray Shape): (20, 2, 3)
- File UID: “Mk23nl”
- xxhash64_hexdigest: 1363344b6c051b29
- Collection Index: 199
Record Data => "10:Mk23nl:1363344b6c051b29:199:20 2 3"
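The record strings in the examples above appear to join the five fields with colons and the shape dimensions with spaces. A minimal sketch of encoding and decoding under that assumption follows; the names `Record`, `encode_record`, and `decode_record` are hypothetical and are not part of the backend's API.

```python
from typing import NamedTuple, Tuple


class Record(NamedTuple):
    """Hypothetical container for the fields of one record."""
    format_code: str
    file_uid: str
    checksum: str
    collection_index: int
    shape: Tuple[int, ...]


def encode_record(file_uid, checksum, collection_index, shape, format_code="10"):
    # Shape dimensions are space separated; the fields are colon separated.
    shape_str = " ".join(str(dim) for dim in shape)
    return f"{format_code}:{file_uid}:{checksum}:{collection_index}:{shape_str}"


def decode_record(raw):
    # Assumes no field contains a colon, as in the examples shown.
    format_code, file_uid, checksum, index, shape_str = raw.split(":")
    shape = tuple(int(dim) for dim in shape_str.split())
    return Record(format_code, file_uid, checksum, int(index), shape)


first = encode_record("K3ktxv", "94701dd9f32626e2", 488, (10, 10))
middle = decode_record("10:Mk23nl:1363344b6c051b29:199:20 2 3")
```

Here `first` reproduces the first record string shown above, and `middle` recovers the collection index 199 and shape (20, 2, 3) from the second.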
Technical Notes¶
- A typical numpy memmap file persisted to disk does not retain information
about its datatype or shape, so both must be provided when the file is re-opened
after close. To persist a memmap in .npy format, we use the special function open_memmap, imported from np.lib.format, which can open a memmap file and persist the necessary header info to disk in .npy format.
- On each write, an xxhash64_hexdigest checksum is calculated. This is not for use as the primary hash algorithm, but rather is stored in the local record format itself to serve as a quick way to verify that no disk corruption occurred. This is required since numpy has no built-in data integrity validation methods when reading from disk.
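Both notes can be illustrated together. The sketch below is an assumption-laden example rather than the backend's implementation: it hashes the raw bytes of the subarray (the exact input the backend hashes is not stated here), uses the third-party `xxhash` package if it is installed, and otherwise falls back to a `hashlib` stand-in that is not xxhash64. The file name and the helper names `digest` and `verify` are hypothetical.

```python
import hashlib
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

try:
    import xxhash  # third-party package; assumed to be installed

    def digest(data: bytes) -> str:
        return xxhash.xxh64(data).hexdigest()
except ImportError:
    # Stand-in so the sketch still runs: an 8-byte BLAKE2b digest.
    # This is NOT xxhash64 and yields different values.
    def digest(data: bytes) -> str:
        return hashlib.blake2b(data, digest_size=8).hexdigest()


def verify(array: np.ndarray, expected: str) -> bool:
    """Return True if the array's bytes still match the stored checksum."""
    return digest(array.tobytes()) == expected


path = os.path.join(tempfile.mkdtemp(), "example.npy")  # hypothetical name

# open_memmap writes the .npy header (dtype and shape) when creating the file.
writer = open_memmap(path, mode="w+", dtype=np.uint8, shape=(500, 10))
sample = np.arange(10, dtype=np.uint8)
writer[42] = sample
checksum = digest(sample.tobytes())  # value that would go into the record
writer.flush()
del writer

# Re-open without passing dtype or shape: both are read from the header.
reader = open_memmap(path, mode="r")
restored = np.array(reader[42])
intact = verify(restored, checksum)

# Simulate corruption by flipping the bits of one byte.
corrupted = restored.copy()
corrupted[0] ^= 0xFF
detected = not verify(corrupted, checksum)
```

A plain `np.memmap` call would need the dtype and shape passed again on re-open, which is the gap the `.npy` header closes.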