Change Log
v0.4.0 (2019-11-21)
New Features
- Added ability to delete branch names/pointers from a local repository via both API and CLI. (#128) @rlizzo
- Added `local` keyword arg to arrayset key/value iterators to return only locally available samples. (#131) @rlizzo
- Ability to change the backend storage format and options applied to an `arrayset` after initialization. (#133) @rlizzo
- Added blosc compression to HDF5 backend by default on PyPI installations. (#146) @rlizzo
- Added Benchmarking Suite to Test for Performance Regressions in PRs. (#155) @rlizzo
- Added new backend optimized to increase speeds for fixed size arrayset access. (#160) @rlizzo
Improvements
- Removed `msgpack` and `pyyaml` dependencies. Cleaned up and improved remote client/server code. (#130) @rlizzo
- Multiprocess Torch DataLoaders allowed on Linux and MacOS. (#144) @rlizzo
- Added CLI options `commit`, `checkout`, `arrayset create`, & `arrayset remove`. (#150) @rlizzo
- Plugin system revamp. (#134) @hhsecond
- Documentation Improvements and Typo-Fixes. (#156) @alessiamarcolini
- Removed implicit removal of arrayset schema from checkout if every sample was removed from arrayset. This could potentially result in dangling accessors which may or may not self-destruct (as expected) in certain edge-cases. (#159) @rlizzo
- Added type codes to hash digests so that the calculation function can be updated in the future without breaking repos written in previous Hangar versions. (#165) @rlizzo
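The idea behind type-coded digests can be illustrated with a minimal sketch. The type codes, the `=` separator, and the choice of hash functions below are assumptions made for illustration only; they are not Hangar's actual on-disk format.

```python
import hashlib

# Hypothetical registry mapping a type code to a digest function.
# New codes can be added later without invalidating old digests.
HASHERS = {
    "0": lambda data: hashlib.sha1(data).hexdigest(),
    "1": lambda data: hashlib.blake2b(data, digest_size=20).hexdigest(),
}


def tagged_digest(data: bytes, type_code: str) -> str:
    """Return the digest of `data`, prefixed with the code of the function used."""
    return f"{type_code}={HASHERS[type_code](data)}"


def verify(data: bytes, tagged: str) -> bool:
    """Recompute the digest with whichever function the stored code names."""
    type_code, _, digest = tagged.partition("=")
    return HASHERS[type_code](data) == digest


old = tagged_digest(b"sample bytes", "0")  # written by an "older" version
new = tagged_digest(b"sample bytes", "1")  # written by a "newer" version
print(verify(b"sample bytes", old), verify(b"sample bytes", new))
```

Because the code travels with the digest, a reader can dispatch to the right function for each record, so digests written under an older scheme stay verifiable after the default changes.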
Bug Fixes
- Programmatic access to repository log contents now returns branch heads alongside other log info. (#125) @rlizzo
- Fixed minor bug in types of values allowed for `Arrayset` names vs `Sample` names. (#151) @rlizzo
- Fixed issue where using checkout object to access a sample in multiple arraysets would try to create a `namedtuple` instance with invalid field names. Now incompatible field names are automatically renamed with their positional index. (#161) @rlizzo
- Explicitly raise error if `commit` argument is set while checking out a repository with `write=True`. (#166) @rlizzo
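The `namedtuple` renaming described in #161 matches the behavior of the `rename=True` option in Python's standard library, sketched below. The arrayset names are hypothetical, and whether Hangar uses this exact option internally is an assumption rather than something stated in this change log.

```python
from collections import namedtuple

# Hypothetical arrayset names; "2d-labels" is not a valid identifier
# and "class" is a reserved keyword, so neither can be a field name.
arrayset_names = ["images", "2d-labels", "class"]

# Without rename=True, invalid field names raise ValueError.
try:
    namedtuple("Sample", arrayset_names)
    strict_failed = False
except ValueError:
    strict_failed = True

# With rename=True, each invalid name is replaced by an underscore
# followed by its positional index.
Sample = namedtuple("Sample", arrayset_names, rename=True)
print(Sample._fields)  # ('images', '_1', '_2')
```

Values for the renamed fields remain reachable by position, or by the generated names `_1` and `_2`.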
Breaking changes
- New commit reference serialization format is incompatible with repositories written in version 0.3.0 or earlier.
v0.3.0 (2019-09-10)
New Features
Improvements
- Added tutorial on working with remote data. (#113) @rlizzo
- Added Tutorial on Tensorflow and PyTorch Dataloaders. (#117) @hhsecond
- Large performance improvement to diff/merge algorithm (~30x previous). (#112) @rlizzo
- New commit hash algorithm which is much more reproducible in the long term. (#120) @rlizzo
- HDF5 backend updated to increase speed of reading/writing variable sized dataset compressed chunks (#120) @rlizzo
Bug Fixes
Breaking changes
- New commit hash algorithm is incompatible with repositories written in version 0.2.0 or earlier.
v0.2.0 (2019-08-09)
New Features
- Numpy memory-mapped array file backend added. (#70) @rlizzo
- Remote server data backend added. (#70) @rlizzo
- Selection heuristics to determine appropriate backend from arrayset schema. (#70) @rlizzo
- Partial remote clones and fetch operations now fully supported. (#85) @rlizzo
- CLI has been placed under test coverage, added interface usage to docs. (#85) @rlizzo
- TensorFlow and PyTorch Machine Learning Dataloader Methods (Experimental Release). (#91) lead: @hhsecond, co-author: @rlizzo, reviewed by: @elistevens
Improvements
- Record format versioning and standardization so as not to break backwards compatibility in the future. (#70) @rlizzo
- Backend addition and update developer protocols and documentation. (#70) @rlizzo
- Read-only checkout arrayset sample `get` methods are now multithread and multiprocess safe. (#84) @rlizzo
- Read-only checkout metadata sample `get` methods are thread safe if used within a context manager. (#101) @rlizzo
- Samples can be assigned integer names in addition to `string` names. (#89) @rlizzo
- Forgetting to close a `write-enabled` checkout before terminating the python process will close the checkout automatically for many situations. (#101) @rlizzo
- Repository software version compatibility methods added to ensure upgrade paths in the future. (#101) @rlizzo
- Many tests added (including support for Mac OSX on Travis-CI). lead: @rlizzo, co-author: @hhsecond
Bug Fixes
Breaking changes
- Renamed all references to `datasets` in the API / world-view to `arraysets`.
- These are backwards incompatible changes. For all versions > 0.2, repository upgrade utilities will be provided if breaking changes occur.