Local NP Memmap Backend
Local Numpy memmap Backend Implementation, Identifier: NUMPY_10
Backend Identifiers
- Backend: 1
- Version: 0
- Format Code: 10
- Canonical Name: NUMPY_10
Storage Method
- Data is written to specific subarray indexes inside a numpy memmapped array on disk.
- Each file is a zero-initialized array of
  dtype: {schema_dtype}; e.g. np.float32 or np.uint8
  shape: (COLLECTION_SIZE, *{schema_shape}); e.g. (500, 10) or (500, 4, 3).
  The first index in the array is referred to as a “collection index”.
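The storage layout described above can be sketched roughly as follows. This is a minimal illustration, not the backend's actual code: the file name, the COLLECTION_SIZE value, and the schema shape and dtype are all assumed for the example, and writing a smaller-than-schema sample through leading slices is only one plausible reading of "subarray".

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

# All of these values are assumptions made for the example.
COLLECTION_SIZE = 500
schema_shape = (4, 3)
schema_dtype = np.float32
path = os.path.join(tempfile.mkdtemp(), "example_uid.npy")  # hypothetical file name

# mode="w+" creates the file on disk; the new array starts out zero-filled.
mm = open_memmap(
    path, mode="w+", dtype=schema_dtype, shape=(COLLECTION_SIZE, *schema_shape)
)

# Write one sample into the slot selected by its collection index.
# The sample may be smaller than the schema shape, so it is written
# through leading slices; the shape must be remembered to read it back.
sample = np.arange(6, dtype=schema_dtype).reshape(2, 3)
collection_idx = 2
index = (collection_idx, *(slice(0, n) for n in sample.shape))
mm[index] = sample
mm.flush()
del mm

# Reopen read-only and select the same subarray again.
reopened = open_memmap(path, mode="r")
restored = np.array(reopened[index])
```

Under these assumptions, `restored` holds the same values as `sample`, while every untouched slot of the file remains zero.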
Record Format
Fields Recorded for Each Array
- Format Code
- File UID
- Adler32 Checksum
- Collection Index (0:COLLECTION_SIZE subarray selection)
- Subarray Shape
Separators used
SEP_KEY: ":"
SEP_HSH: "$"
SEP_LST: " "
SEP_SLC: "*"
Examples
Adding the first piece of data to a file:
- Array shape (Subarray Shape): (10,)
- File UID: “NJUUUK”
- Adler32 Checksum: 900338819
- Collection Index: 2
Record Data => '10:NJUUUK$900338819$2*10'
Adding a piece of data to the middle of a file:
- Array shape (Subarray Shape): (20, 2, 3)
- File UID: “Mk23nl”
- Adler32 Checksum: 2546668575
- Collection Index: 199
Record Data => "10:Mk23nl$2546668575$199*20 2 3"
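The two example records can be reproduced with a small encoder and decoder. This is a sketch inferred from the examples and the separator table only; the function names `encode_record` and `decode_record` are hypothetical and are not taken from the backend's source.

```python
SEP_KEY = ":"
SEP_HSH = "$"
SEP_LST = " "
SEP_SLC = "*"
FMT_CODE = "10"


def encode_record(uid, checksum, collection_idx, shape):
    """Build a record string from its fields (hypothetical helper)."""
    shape_str = SEP_LST.join(str(dim) for dim in shape)
    return (
        f"{FMT_CODE}{SEP_KEY}{uid}{SEP_HSH}{checksum}"
        f"{SEP_HSH}{collection_idx}{SEP_SLC}{shape_str}"
    )


def decode_record(record):
    """Split a record string back into its fields (hypothetical helper)."""
    fmt_code, rest = record.split(SEP_KEY, 1)
    uid, checksum, location = rest.split(SEP_HSH)
    collection_idx, shape_str = location.split(SEP_SLC)
    shape = tuple(int(dim) for dim in shape_str.split(SEP_LST))
    return fmt_code, uid, int(checksum), int(collection_idx), shape


first = encode_record("NJUUUK", 900338819, 2, (10,))
middle = encode_record("Mk23nl", 2546668575, 199, (20, 2, 3))
fields = decode_record(middle)
```

Here `first` and `middle` match the two record strings shown in the examples, and `fields` recovers the format code, file UID, checksum, collection index and subarray shape of the second one.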
Technical Notes
- A typical numpy memmap file persisted to disk does not retain information
  about its datatype or shape, so both must be provided when the file is
  re-opened after close. In order to persist a memmap in .npy format, we use
  the special function open_memmap imported from np.lib.format, which can open
  a memmap file and persist the necessary header info to disk in .npy format.
- On each write, an adler32 checksum is calculated. This is not for use as the primary hash algorithm, but rather is stored in the local record format itself to serve as a quick way to verify that no disk corruption occurred. This is required since numpy has no built-in data integrity validation methods when reading from disk.
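As a rough illustration of the checksum note, the sketch below computes an Adler-32 value with the standard library's `zlib.adler32` over an array's raw bytes and compares it after a simulated read. Hashing the output of `tobytes()` and the helper name `adler32_of` are assumptions made for the example; the backend may feed the data to the checksum differently.

```python
import zlib

import numpy as np


def adler32_of(arr):
    """Adler-32 of an array's raw bytes (hypothetical helper)."""
    return zlib.adler32(np.ascontiguousarray(arr).tobytes())


# Checksum computed at write time and kept alongside the record.
original = np.arange(60, dtype=np.float32).reshape(20, 3)
stored_checksum = adler32_of(original)

# A clean read produces the same checksum.
read_back = original.copy()
clean_read_ok = adler32_of(read_back) == stored_checksum

# Simulate on-disk corruption by altering a single element.
corrupted = original.copy()
corrupted[7, 1] += 1.0
corruption_detected = adler32_of(corrupted) != stored_checksum
```

Adler-32 is cheap to compute, which suits a per-write check, but it is not a cryptographic hash; that is consistent with the note that it is not the primary hash algorithm.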