Hangar is version control for tensor data. Commit, branch, merge, revert, and collaborate in the data-defined software era.
Free software: Apache 2.0 license
What is Hangar?
Hangar is based on the belief that too much time is spent collecting, managing,
and creating home-brewed version control systems for data. At its core, Hangar
is designed to solve many of the same problems faced by traditional code version
control systems (e.g. Git), just adapted for numerical data:
Time travel through the historical evolution of a dataset.
Zero-cost Branching to enable exploratory analysis and collaboration
Cheap Merging to build datasets over time (with multiple collaborators)
Completely abstracted organization and management of data files on disk
Ability to retrieve only a small portion of the data (as needed) while still maintaining a complete historical record
Ability to push and pull changes directly to collaborators or a central server (i.e. a truly distributed version control system)
The ability of version control systems to perform these tasks for codebases is largely taken for granted by almost every developer today. However, we are in fact standing on the shoulders of giants: decades of engineering have produced these phenomenally useful tools. Now that a new era of “Data-Defined software” is taking hold, there is a strong need for analogous version control systems designed to handle numerical data at large scale… Welcome to Hangar!
The Hangar Workflow:
Checkout Branch
|
▼
Create/Access Data
|
▼
Add/Remove/Update Samples
|
▼
Commit
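The workflow above can be sketched with a toy, in-memory model. This is not Hangar's API: `ToyRepo`, `Checkout`, and every method name below are hypothetical, chosen only to illustrate the checkout, modify, commit cycle and why creating a branch can be nearly free when data is stored by content hash. Samples are plain `bytes` here, standing in for tensors.

```python
import hashlib


class ToyRepo:
    """Hypothetical in-memory model of the workflow (NOT Hangar's API)."""

    def __init__(self):
        self.blobs = {}                   # content digest -> raw bytes, stored once
        self.commits = {}                 # commit id -> (parent id, message, {name: digest})
        self.branches = {"master": None}  # branch name -> head commit id

    def create_branch(self, name, base="master"):
        # A branch is only a pointer to a commit, so no sample data is copied.
        self.branches[name] = self.branches[base]

    def checkout(self, branch):
        head = self.branches[branch]
        samples = dict(self.commits[head][2]) if head is not None else {}
        return Checkout(self, branch, samples)


class Checkout:
    """A working view of one branch: edit samples, then commit."""

    def __init__(self, repo, branch, samples):
        self.repo, self.branch, self.samples = repo, branch, samples

    def __setitem__(self, name, data):
        # Add or update a sample; identical content is stored only once.
        digest = hashlib.sha1(data).hexdigest()
        self.repo.blobs.setdefault(digest, data)
        self.samples[name] = digest

    def __getitem__(self, name):
        return self.repo.blobs[self.samples[name]]

    def __delitem__(self, name):
        # Removing a sample only drops the reference from this view.
        del self.samples[name]

    def commit(self, message):
        parent = self.repo.branches[self.branch]
        payload = repr((parent, message, sorted(self.samples.items()))).encode()
        commit_id = hashlib.sha1(payload).hexdigest()[:6]
        self.repo.commits[commit_id] = (parent, message, dict(self.samples))
        self.repo.branches[self.branch] = commit_id  # advance the branch head
        return commit_id


# Checkout branch -> add samples -> commit
repo = ToyRepo()
co = repo.checkout("master")
co["train/0"] = bytes([1, 2, 3])
first = co.commit("Initial commit adding training images and labels")

# Branch, then repeat the same cycle in isolation
repo.create_branch("add-validation-data")
co = repo.checkout("add-validation-data")
co["val/0"] = bytes([4, 5, 6])
second = co.commit("Add validation data in isolated branch")
```

In this sketch the `master` branch still points at the first commit after the second one is made, so work on `add-validation-data` stays isolated until it is merged (merging is not modeled here).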
Log Style Output:
* 5254ec (master) : merge commit combining training updates and new validation samples
|\
| * 650361 (add-validation-data) : Add validation labels and image data in isolated branch
* | 5f15b4 : Add some metadata for later reference and add new training samples received after initial import
|/
* baddba : Initial commit adding training images and labels
Learn more about what Hangar is all about at https://hangar-py.readthedocs.io/
Hangar is in early alpha development release!
pip install hangar
To run all the tests, run:
tox
Note, to combine the coverage data from all the tox environments run:
set PYTEST_ADDOPTS=--cov-append
tox
- Hangar Core Concepts
- What Is Hangar?
- How Hangar Thinks About Data
- Implications of the Hangar Data Philosophy
- What’s Next?
- Python API
- Hangar Tutorial
- Quick Start Tutorial
- Part 1: Creating A Repository And Working With Data
- Part 2: Checkouts, Branching, & Merging
- Part 3: Working With Remote Servers
- Dataloaders for Machine Learning (Tensorflow & PyTorch)
- “Real World” Quick Start Tutorial
- Hangar Under The Hood
- Hangar CLI Documentation
- Hangar External
- Frequently Asked Questions
- Backend selection
- Contributing to Hangar
- Contributor Code of Conduct
- Hangar Performance Benchmarking Suite
- Change Log
- v0.5.2 (2020-05-08)
- v0.5.1 (2020-04-05)
- v0.5.0 (2020-04-04)
- v0.4.0 (2019-11-21)
- v0.3.0 (2019-09-10)
- v0.2.0 (2019-08-09)
- v0.1.1 (2019-05-24)
- v0.1.0 (2019-05-24)
- v0.0.0 (2019-04-15)