Local NP Memmap Backend
Local Numpy memmap Backend Implementation, Identifier: NUMPY_10
Backend Identifiers
- Backend: 1
- Version: 0
- Format Code: 10
- Canonical Name: NUMPY_10
Storage Method
- Data is written to specific subarray indexes inside a numpy memmapped array on disk.
- Each file is a zero-initialized array of
  - dtype: {schema_dtype}; e.g. np.float32 or np.uint8
  - shape: (COLLECTION_SIZE, *{schema_shape}); e.g. (500, 10) or (500, 4, 3). The first index in the array is referred to as a “collection index”.
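The storage layout described above can be sketched roughly as follows. This is a minimal illustration, not the backend's actual code: the COLLECTION_SIZE value of 500 is taken from the example shapes, while the schema, the temporary directory, and the file name are hypothetical.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

COLLECTION_SIZE = 500        # assumed, matching the example shapes above
schema_shape = (4, 3)        # hypothetical schema shape
schema_dtype = np.float32    # hypothetical schema dtype

# Hypothetical location; the real backend chooses its own paths and file UIDs.
path = os.path.join(tempfile.mkdtemp(), 'NJUUUK.npy')

# mode='w+' creates the file. On common filesystems the new array reads back
# as zeros until a slot is written.
collection = open_memmap(
    path, mode='w+', dtype=schema_dtype,
    shape=(COLLECTION_SIZE, *schema_shape))

# Write one sample into a single slot, addressed by its collection index.
sample = np.arange(12, dtype=schema_dtype).reshape(schema_shape)
collection_index = 2
collection[collection_index] = sample
collection.flush()
del collection  # release the memmap
```

Only the slot at the chosen collection index is touched; every other slot in the file keeps its initial contents.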
Record Format
Fields Recorded for Each Array
- Format Code
- File UID
- Adler32 Checksum
- Collection Index (0:COLLECTION_SIZE subarray selection)
- Subarray Shape
Separators used
- SEP_KEY: ":"
- SEP_HSH: "$"
- SEP_LST: " "
- SEP_SLC: "*"
Examples
Adding the first piece of data to a file:
- Array shape (Subarray Shape): (10,)
- File UID: “NJUUUK”
- Adler32 Checksum: 900338819
- Collection Index: 2
Record Data => '10:NJUUUK$900338819$2*10'
Adding a piece of data to the middle of a file:
- Array shape (Subarray Shape): (20, 2, 3)
- File UID: “Mk23nl”
- Adler32 Checksum: 2546668575
- Collection Index: 199
Record Data => "10:Mk23nl$2546668575$199*20 2 3"
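The record layout in the two examples can be reproduced with a small encoder and decoder. This is a sketch inferred from the examples and separators listed above; the function names encode_record and decode_record are hypothetical and are not taken from the backend's source.

```python
SEP_KEY, SEP_HSH, SEP_LST, SEP_SLC = ':', '$', ' ', '*'
FMT_CODE = '10'


def encode_record(uid, checksum, collection_idx, shape):
    """Build a record string: fmt:uid$checksum$index*dim dim dim."""
    shape_str = SEP_LST.join(str(dim) for dim in shape)
    return (f'{FMT_CODE}{SEP_KEY}{uid}{SEP_HSH}{checksum}'
            f'{SEP_HSH}{collection_idx}{SEP_SLC}{shape_str}')


def decode_record(record):
    """Split a record string back into its fields.

    Only handles shapes with at least one dimension, as in the examples.
    """
    fmt_code, rest = record.split(SEP_KEY, 1)
    uid, checksum, location = rest.split(SEP_HSH)
    collection_idx, shape_str = location.split(SEP_SLC)
    shape = tuple(int(dim) for dim in shape_str.split(SEP_LST))
    return fmt_code, uid, int(checksum), int(collection_idx), shape


first = encode_record('NJUUUK', 900338819, 2, (10,))
second = encode_record('Mk23nl', 2546668575, 199, (20, 2, 3))
```

With these inputs, first and second match the two Record Data strings shown in the examples.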
Technical Notes
- A typical numpy memmap file persisted to disk does not retain information
  about its datatype or shape, so both must be provided when the file is
  re-opened after close. In order to persist a memmap in .npy format, we use
  the special function open_memmap imported from np.lib.format, which can
  open a memmap file and persist the necessary header info to disk in .npy
  format.
- On each write, an adler32 checksum is calculated. This is not for use as
  the primary hash algorithm, but rather stored in the local record format
  itself to serve as a quick way to verify no disk corruption occurred. This
  is required since numpy has no built-in data integrity validation methods
  when reading from disk.
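Both notes can be illustrated together in a short sketch. It assumes the checksum is taken over the raw bytes of the written array using zlib.adler32; exactly which bytes the backend feeds to the checksum is an assumption here, and the file name and array contents are hypothetical.

```python
import os
import tempfile
import zlib

import numpy as np
from numpy.lib.format import open_memmap

path = os.path.join(tempfile.mkdtemp(), 'example.npy')  # hypothetical file

# Write: store the array in one slot and compute its checksum.
arr = np.arange(30, dtype=np.uint8).reshape(10, 3)
store = open_memmap(path, mode='w+', dtype=arr.dtype, shape=(500, *arr.shape))
store[199] = arr
store.flush()
del store
checksum = zlib.adler32(arr.tobytes())  # assumed checksum input: raw bytes

# Read: the .npy header supplies dtype and shape, so neither is passed here.
store = open_memmap(path, mode='r')
loaded = np.array(store[199])  # copy the slot out of the memmap

# Recompute the checksum and compare it with the value kept in the record.
if zlib.adler32(loaded.tobytes()) != checksum:
    raise RuntimeError('checksum mismatch: data may be corrupted on disk')
```

By contrast, a plain np.memmap file would require the caller to supply the dtype and shape again on every re-open.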