Part 3: Working With Remote Servers¶

This tutorial will introduce how to start a remote Hangar server, and how to work with remotes from the client side.

Particular attention is paid to the concept of *partial fetch* / *partial clone* operations. This is a key component of the Hangar design: it lets you work quickly and efficiently with data in remote repositories whose full size would be prohibitive for local use under most circumstances.

Note:

At the time of writing, the API, user-facing functionality, client-server negotiation protocols, and test coverage of the remotes implementation are generally adequate for this to serve as an “alpha” quality preview. However, please be warned that significantly less time has been spent in this module to optimize speed, refactor for simplicity, and assure stability under heavy loads than in the rest of the Hangar core. While we can guarantee that your data is secure on disk, you may experience crashes from time to time when working with remotes. In addition, sending data over the wire should NOT be considered secure in ANY way. No in-transit encryption, user authentication, or secure access limitations are implemented at this moment. We realize the importance of these types of protections, and they are on our radar for the next release cycle. If you are interested in making a contribution to Hangar, this module contains a lot of low-hanging fruit which would provide drastic improvements and act as a good intro to the internal Hangar data model. Please get in touch with us to discuss!

Starting a Hangar Server¶

To start a Hangar server, navigate to the command line and simply execute:

$ hangar server

This will get a local server instance running at localhost:50051. The IP and port can be configured by setting the --ip and --port flags to the desired values in the command line.

A blocking process will begin in that terminal session. Leave it running while you experiment with connecting from a client repo.
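Before connecting from a client, it can be handy to confirm that something is actually listening at the server address. The following is a minimal sketch using only the Python standard library; the helper name `server_is_listening` is hypothetical and not part of Hangar. It only checks that a TCP connection can be opened, not that the process on the other end is a Hangar server.

```python
import socket


def server_is_listening(host: str = "localhost", port: int = 50051,
                        timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper (not part of Hangar). The defaults mirror the
    localhost:50051 address used in this tutorial.
    """
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `server_is_listening()` with no arguments probes the default address; pass the values you gave to --ip and --port if you changed them.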

Using Remotes with a Local Repository¶

The CLI is the easiest way to interact with the remote server from a local repository, though all functionality is mirrored via the repository API (more on that later).

Before we begin we will set up a repository with some data, a few commits, two branches, and a merge.

Set Up a Test Repo¶

As normal, we shall begin with creating a repository and adding some data. This should be familiar to you from previous tutorials.

[1]:

from hangar import Repository
import numpy as np
from tqdm import tqdm

testData = np.loadtxt('/Users/rick/projects/tensorwerk/hangar/dev/data/dota2Dataset/dota2Test.csv', delimiter=',', dtype=np.uint8)
trainData = np.loadtxt('/Users/rick/projects/tensorwerk/hangar/dev/data/dota2Dataset/dota2Train.csv', delimiter=',', dtype=np.uint16)

testName = 'test'
testPrototype = testData[0]
trainName = 'train'
trainPrototype = trainData[0]

[2]:

repo = Repository('/Users/rick/projects/tensorwerk/hangar/dev/intro/')
repo.init(user_name='Rick Izzo', user_email='rick@tensorwerk.com', remove_old=True)
co = repo.checkout(write=True)

Hangar Repo initialized at: /Users/rick/projects/tensorwerk/hangar/dev/intro/.hangar

/Users/rick/projects/tensorwerk/hangar/hangar-py/src/hangar/context.py:94: UserWarning: No repository exists at /Users/rick/projects/tensorwerk/hangar/dev/intro/.hangar, please use `repo.init()` method
  warnings.warn(msg, UserWarning)

[3]:

co.add_ndarray_column(testName, prototype=testPrototype)
testcol = co.columns[testName]

pbar = tqdm(total=testData.shape[0])
with testcol as tcol:
    for gameIdx, gameData in enumerate(testData):
        if (gameIdx % 500 == 0):
            pbar.update(500)
        tcol.append(gameData)
pbar.close()

co.commit('initial commit on master with test data')

repo.create_branch('add-train')
co.close()
repo.log()

10500it [00:02, 4286.17it/s]

* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 (add-train) (master) : initial commit on master with test data

[6]:

co = repo.checkout(write=True, branch='add-train')

co.add_ndarray_column(trainName, prototype=trainPrototype)
traincol = co.columns[trainName]

pbar = tqdm(total=trainData.shape[0])
with traincol as trcol:
    for gameIdx, gameData in enumerate(trainData):
        if (gameIdx % 500 == 0):
            pbar.update(500)
        trcol.append(gameData)
pbar.close()

co.commit('added training data on another branch')
co.close()
repo.log()

93000it [00:22, 4078.73it/s]

* a=957d20e4b921f41975591cc8ee51a4a6912cb919 (add-train) : added training data on another branch
* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 (master) : initial commit on master with test data

[7]:

co = repo.checkout(write=True, branch='master')
co.metadata['earaea'] = 'eara'
co.commit('more changes here')
co.close()
repo.log()

* a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d (master) : more changes here
* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data

Pushing to a Remote¶

We will use the API remote add() method to add a remote; however, this can also be done with the CLI command:

$ hangar remote add origin localhost:50051

[8]:

repo.remote.add('origin', 'localhost:50051')

[8]:

RemoteInfo(name='origin', address='localhost:50051')

Pushing is as simple as running the push() method from the API or CLI:

$ hangar push origin master

Push the master branch:

[9]:

repo.remote.push('origin', 'master')

counting objects: 100%|██████████| 2/2 [00:00<00:00,  5.47it/s]
pushing schemas: 100%|██████████| 1/1 [00:00<00:00, 133.74it/s]
pushing data:  97%|█████████▋| 10001/10294 [00:01<00:00, 7676.23it/s]
pushing metadata: 100%|██████████| 1/1 [00:00<00:00, 328.50it/s]
pushing commit refs: 100%|██████████| 2/2 [00:00<00:00, 140.73it/s]

[9]:

'master'

Push the add-train branch:

[10]:

repo.remote.push('origin', 'add-train')

counting objects: 100%|██████████| 1/1 [00:01<00:00,  1.44s/it]
pushing schemas: 100%|██████████| 1/1 [00:00<00:00, 126.05it/s]
pushing data:  99%|█████████▉| 92001/92650 [00:12<00:00, 7107.60it/s]
pushing metadata: 0it [00:00, ?it/s]
pushing commit refs: 100%|██████████| 1/1 [00:00<00:00, 17.05it/s]

[10]:

'add-train'

Details of the Negotiation Process¶

The following details are not necessary to use the system, but may be of interest to some readers.

When we push data, we perform a negotiation with the server, which basically goes like this:

1. Hi, I would like to push this branch, do you have it?
2. If yes, what is the latest commit you record on it?
   - Is that the same commit I’m trying to push? If yes, abort.
   - Is that a commit I don’t have? If yes, someone else has updated that branch, abort.
3. Here are the commit digests which are parents of my branch’s head. Which commits are you missing?
4. Ok great, I’m going to scan through each of those commits to find the data hashes they contain. Tell me which ones you are missing.
5. Thanks, now I’ll send you all of the data corresponding to those hashes. It might be a lot of data, so we’ll handle this in batches so that if my connection cuts out, we can resume this later.
6. Now that you have the data, I’m going to send the actual commit references for you to store. This isn’t that much information, but you’ll be sure to verify that I’m not trying to pull any funny business and send you incorrect data.
7. Now that you’ve received everything, and have verified it matches what I told you it is, go ahead and make those commits I’ve pushed available as the HEAD of the branch I just sent. It’s some good work that others will want!

When we want to fetch updates to a branch, essentially the exact same thing happens in reverse. Instead of asking the server what it doesn’t have, we ask it what it does have, and then request the stuff that we are missing!
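The push negotiation described above can be sketched as a small in-memory simulation. This is a toy model under stated assumptions, not Hangar's implementation or wire protocol: `ToyServer`, `push`, and `ancestry` are hypothetical names, history is assumed to be linear, and SHA-1 over raw bytes stands in for whatever hashing Hangar really uses.

```python
import hashlib
from typing import Dict, List, Optional


def digest(blob: bytes) -> str:
    """Content hash identifying one piece of data (assumed scheme)."""
    return hashlib.sha1(blob).hexdigest()


class ToyServer:
    """Hypothetical stand-in for the remote; everything lives in memory."""

    def __init__(self) -> None:
        self.heads: Dict[str, str] = {}      # branch name -> head commit id
        self.commits: Dict[str, dict] = {}   # commit id -> {"parent", "hashes"}
        self.data: Dict[str, bytes] = {}     # data hash -> raw bytes

    def head_of(self, branch: str) -> Optional[str]:
        return self.heads.get(branch)

    def missing_commits(self, ids: List[str]) -> List[str]:
        return [c for c in ids if c not in self.commits]

    def missing_hashes(self, hashes: List[str]) -> List[str]:
        return [h for h in hashes if h not in self.data]

    def receive_data(self, batch: Dict[str, bytes]) -> None:
        for data_hash, blob in batch.items():
            if digest(blob) != data_hash:    # verify content before storing
                raise ValueError("data does not match its declared hash")
            self.data[data_hash] = blob

    def receive_commits(self, branch: str, commits: Dict[str, dict],
                        new_head: str) -> None:
        for commit in commits.values():      # refuse refs to data we never got
            if any(h not in self.data for h in commit["hashes"]):
                raise ValueError("commit references data that was not sent")
        self.commits.update(commits)
        self.heads[branch] = new_head        # publish only after verification


def ancestry(commits: Dict[str, dict], head: Optional[str]) -> List[str]:
    """Commit ids from head back to the root (linear history only)."""
    chain = []
    while head is not None:
        chain.append(head)
        head = commits[head]["parent"]
    return chain


def push(server: ToyServer, branch: str, commits: Dict[str, dict],
         data: Dict[str, bytes], head: str, batch_size: int = 2) -> str:
    remote_head = server.head_of(branch)
    if remote_head == head:
        return "up-to-date"                  # nothing to push
    if remote_head is not None and remote_head not in commits:
        return "rejected"                    # someone else updated the branch
    to_send = server.missing_commits(ancestry(commits, head))
    wanted = sorted({h for c in to_send for h in commits[c]["hashes"]})
    needed = server.missing_hashes(wanted)
    for start in range(0, len(needed), batch_size):   # send data in batches
        chunk = needed[start:start + batch_size]
        server.receive_data({h: data[h] for h in chunk})
    server.receive_commits(branch, {c: commits[c] for c in to_send}, head)
    return "pushed"


blobs = [b"sample-0", b"sample-1", b"sample-2"]
data = {digest(b): b for b in blobs}
hashes = list(data)
commits = {
    "c1": {"parent": None, "hashes": hashes[:2]},
    "c2": {"parent": "c1", "hashes": hashes},
}
server = ToyServer()
first = push(server, "master", commits, data, "c2")
second = push(server, "master", commits, data, "c2")
```

In this sketch the first push transfers three data pieces and two commits, while the second finds the remote head already equal to the local head and stops early. A fetch would run the same questions in the opposite direction.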

Partial Fetching and Clones¶

Now we will introduce one of the most important and unique features of Hangar remotes: Partial fetch/clone of data!

There is a very real problem with keeping the full history of data: **it’s huge!** The size of the data can very easily exceed what can fit on (most) contributors’ laptops or personal workstations. This section explains how Hangar can handle working with columns which are prohibitively large to download or store on a single machine.

As mentioned in High Performance From Simplicity, under the hood Hangar deals with “Data” and “Bookkeeping” completely separately. We’ve previously covered what exactly we mean by Data in How Hangar Thinks About Data, so we’ll briefly cover the second major component of Hangar here. In short, “Bookkeeping” describes everything about the repository. By everything, we do mean that the Bookkeeping records describe everything: all commits, parents, branches, columns, samples, data descriptors, schemas, commit messages, etc. Though complete, these records are fairly small (tens of MB in size for decently sized repositories with decent history), and are highly compressed for fast transfer between a Hangar client/server.

A brief technical interlude

There is one very important (and rather complex) property which gives Hangar Bookkeeping massive power: the existence of some data piece is always known to Hangar and stored immutably once committed. However, the access pattern, backend, and locating information for this data piece may (and over time, will) be unique in every Hangar repository instance.

Though the details of how this works are well beyond the scope of this document, the following example may provide some insight into the implications of this property:

If you clone some Hangar repository, Bookkeeping says that “some number of data pieces exist” and that they should be retrieved from the server. However, the bookkeeping records transferred in a fetch / push / clone operation do not include information about where that piece of data existed on the client (or server) computer. Two synced repositories can use completely different backends to store the data, in completely different locations, and it does not matter - Hangar only guarantees that when collaborators ask for a data sample in some checkout, they will be provided with identical arrays, not that the arrays will come from the same place or be stored in the same way. Only when data is actually retrieved is the “locating information” set for that repository instance.

Because Hangar makes no assumptions about how/where it should retrieve some piece of data, or even an assumption that it exists on the local machine, and because records are small and completely describe history, once a machine has the Bookkeeping, it can decide what data it actually wants to materialize on its local disk! These partial fetch / partial clone operations can materialize any desired data, whether it be for a few records at the head branch, for all data in a commit, or for the entire historical data. A future release will even include the ability to stream data directly to a Hangar checkout and materialize the data in memory without having to save it to disk at all!

More importantly: since Bookkeeping describes all history, merging can be performed between branches which may contain partial (or even no) actual data. In other words, you don’t need data on disk to merge changes into it. It’s an odd concept, which will be shown in this tutorial.
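To make the separation between bookkeeping and locating information concrete, here is a toy model. It is only a sketch under assumptions: `ToyRepo`, the backend labels, and the use of SHA-1 over raw bytes are all hypothetical and do not reflect Hangar's internal record format. Two repository instances share identical bookkeeping, store data under different backends, and one of them materializes only part of the data.

```python
import hashlib
from typing import Dict, Tuple


def digest(blob: bytes) -> str:
    """Content hash used as the identity of a data piece (assumed scheme)."""
    return hashlib.sha1(blob).hexdigest()


class ToyRepo:
    """Hypothetical stand-in for one repository instance (not the Hangar API)."""

    def __init__(self, records: Dict[str, str], backend: str) -> None:
        self.records = dict(records)   # bookkeeping: sample key -> data hash
        self.backend = backend         # label for how this instance stores data
        # Locating information is private to this instance:
        # data hash -> (backend label, stored bytes)
        self.located: Dict[str, Tuple[str, bytes]] = {}

    def materialize(self, key: str, blob: bytes) -> None:
        """Store data locally, but only if it matches the recorded hash."""
        expected = self.records[key]
        if digest(blob) != expected:
            raise ValueError(f"content for {key!r} does not match its hash")
        self.located[expected] = (self.backend, blob)

    def is_local(self, key: str) -> bool:
        return self.records[key] in self.located

    def read(self, key: str) -> bytes:
        data_hash = self.records[key]
        if data_hash not in self.located:
            raise FileNotFoundError(f"{key!r} is reference-only; fetch it first")
        return self.located[data_hash][1]


samples = {"0": b"game-0", "1": b"game-1"}
records = {key: digest(blob) for key, blob in samples.items()}

alice = ToyRepo(records, backend="backend-A")
bob = ToyRepo(records, backend="backend-B")

alice.materialize("0", samples["0"])
alice.materialize("1", samples["1"])
bob.materialize("0", samples["0"])     # bob keeps only part of the data
```

Both instances agree on what exists (`alice.records == bob.records`) and return identical bytes for sample "0", even though each tracks its own storage location. Sample "1" exists in bob's bookkeeping but has no local data until it is fetched.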

Cloning a Remote Repo¶

$ hangar clone localhost:50051

[11]:

cloneRepo = Repository('/Users/rick/projects/tensorwerk/hangar/dev/dota-clone/')

/Users/rick/projects/tensorwerk/hangar/hangar-py/src/hangar/context.py:94: UserWarning: No repository exists at /Users/rick/projects/tensorwerk/hangar/dev/dota-clone/.hangar, please use `repo.init()` method
  warnings.warn(msg, UserWarning)

When we perform the initial clone, we will only receive the master branch by default.

[12]:

cloneRepo.clone('rick izzo', 'rick@tensorwerk.com', 'localhost:50051', remove_old=True)

fetching commit data refs:   0%|          | 0/2 [00:00<?, ?it/s]

Hangar Repo initialized at: /Users/rick/projects/tensorwerk/hangar/dev/dota-clone/.hangar

fetching commit data refs: 100%|██████████| 2/2 [00:00<00:00,  5.73it/s]
fetching commit spec: 100%|██████████| 2/2 [00:00<00:00, 273.30it/s]

Hard reset requested with writer_lock: 27634b20-3c5b-4ee0-aac3-b5ce6cb7daf0

[12]:

'master'

[13]:

cloneRepo.log()

* a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d (master) (origin/master) : more changes here
* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data

[14]:

cloneRepo.list_branches()

[14]:

['master', 'origin/master']

To get the add-train branch, we fetch it from the remote:

[15]:

cloneRepo.remote.fetch('origin', 'add-train')

fetching commit data refs: 100%|██████████| 1/1 [00:01<00:00,  1.51s/it]
fetching commit spec: 100%|██████████| 1/1 [00:00<00:00, 35.85it/s]

[15]:

'origin/add-train'

[16]:

cloneRepo.list_branches()

[16]:

['master', 'origin/add-train', 'origin/master']

[17]:

cloneRepo.log(branch='origin/add-train')

* a=957d20e4b921f41975591cc8ee51a4a6912cb919 (origin/add-train) : added training data on another branch
* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data

We will create a local branch from the origin/add-train branch, just like in Git.

[18]:

cloneRepo.create_branch('add-train', 'a=957d20e4b921f41975591cc8ee51a4a6912cb919')

[18]:

BranchHead(name='add-train', digest='a=957d20e4b921f41975591cc8ee51a4a6912cb919')

[19]:

cloneRepo.list_branches()

[19]:

['add-train', 'master', 'origin/add-train', 'origin/master']

[20]:

cloneRepo.log(branch='add-train')

* a=957d20e4b921f41975591cc8ee51a4a6912cb919 (add-train) (origin/add-train) : added training data on another branch
* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data

Checking out a Partial Clone/Fetch¶

When we fetch/clone, the transfers are very quick, because only the commit records/history were retrieved. The data was not sent, because the complete data across all of history may be very large.

When you check out a commit with partial data, you will be shown a warning indicating that some data is not available locally. An error is raised if you try to access that particular sample data. Otherwise, everything will appear as normal.

[21]:

co = cloneRepo.checkout(branch='master')

 * Checking out BRANCH: master with current HEAD: a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d

/Users/rick/projects/tensorwerk/hangar/hangar-py/src/hangar/columns/constructors.py:45: UserWarning: Column: test contains `reference-only` samples, with actual data residing on a remote server. A `fetch-data` operation is required to access these samples.
  f'operation is required to access these samples.', UserWarning)

[22]:

co

[22]:

Hangar ReaderCheckout
    Writer       : False
    Commit Hash  : a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d
    Num Columns  : 1
    Num Metadata : 1

We can see from the repr that the columns contain partial remote references:

[23]:

co.columns

[23]:

Hangar Columns
    Writeable         : False
    Number of Columns : 1
    Column Names / Partial Remote References:
      - test / True

[24]:

co.columns['test']

[24]:

Hangar FlatSampleReader
    Column Name              : test
    Writeable                : False
    Column Type              : ndarray
    Column Layout            : flat
    Schema Type              : fixed_shape
    DType                    : uint8
    Shape                    : (117,)
    Number of Samples        : 10294
    Partial Remote Data Refs : True

[25]:

testKey = next(co.columns['test'].keys())

[26]:

co.columns['test'][testKey]

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-26-cb069e761eb3> in <module>
----> 1 co.columns['test'][testKey]

~/projects/tensorwerk/hangar/hangar-py/src/hangar/columns/layout_flat.py in __getitem__(self, key)
    222         """
    223         spec = self._samples[key]
--> 224         return self._be_fs[spec.backend].read_data(spec)
    225
    226     def get(self, key: KeyType, default=None):

~/projects/tensorwerk/hangar/hangar-py/src/hangar/backends/remote_50.py in read_data(self, hashVal)
    172     def read_data(self, hashVal: REMOTE_50_DataHashSpec) -> None:
    173         raise FileNotFoundError(
--> 174             f'data hash spec: {REMOTE_50_DataHashSpec} does not exist on this machine. '
    175             f'Perform a `data-fetch` operation to retrieve it from the remote server.')
    176

FileNotFoundError: data hash spec: <class 'hangar.backends.specs.REMOTE_50_DataHashSpec'> does not exist on this machine. Perform a `data-fetch` operation to retrieve it from the remote server.
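Since accessing a reference-only sample raises FileNotFoundError (as the traceback above shows), code that iterates over a partially fetched checkout may want to guard each access. The following is a minimal sketch: `read_if_local` is a hypothetical helper, and `ReferenceOnlyColumn` is a stand-in used here only so the example runs without a repository. With a real checkout you would pass a column object such as `co.columns['test']` instead.

```python
from typing import Any


def read_if_local(column: Any, key: Any, default: Any = None) -> Any:
    """Return column[key], or `default` when the sample's data is not on disk.

    Hypothetical helper: it relies only on the behavior shown above, namely
    that reading a reference-only sample raises FileNotFoundError.
    """
    try:
        return column[key]
    except FileNotFoundError:
        return default


class ReferenceOnlyColumn:
    """Stand-in for a column whose samples all live on the remote."""

    def __getitem__(self, key: Any) -> Any:
        raise FileNotFoundError(f"sample {key!r} does not exist on this machine")


missing = read_if_local(ReferenceOnlyColumn(), "0")
present = read_if_local({"0": [1, 2, 3]}, "0")
```

A KeyError for a sample that does not exist at all is deliberately not caught, so a genuinely wrong key still fails loudly.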

Fetching Data from a Remote¶

To retrieve the data, we use the fetch_data() method (accessible via the API, or as fetch-data via the CLI).

The amount / type of data to retrieve is extremely configurable via the following options:

Remotes.fetch_data(remote: str, branch: str = None, commit: str = None, *, column_names: Optional[Sequence[str]] = None, max_num_bytes: int = None, retrieve_all_history: bool = False) → List[str]

Retrieve the data for some commit which exists in a partial state.

Parameters

remote (str) – name of the remote to pull the data from
branch (str, optional) – The name of a branch whose HEAD will be used as the data fetch point. If None, commit argument expected, by default None
commit (str, optional) – Commit hash to retrieve data for, If None, branch argument expected, by default None
column_names (Optional[Sequence[str]]) – Names of the columns which should be retrieved for the particular commits, any columns not named will not have their data fetched from the server. Default behavior is to retrieve all columns
max_num_bytes (Optional[int]) – If you wish to limit the amount of data sent to the local machine, set a max_num_bytes parameter. This will retrieve only this amount of data from the server to be placed on the local disk. Default is to retrieve all data regardless of how large.
retrieve_all_history (Optional[bool]) – if data should be retrieved for all history accessible by the parents of this commit HEAD. by default False

Returns

commit hashes of the data which was returned.

Return type

List[str]

Raises

ValueError – if branch and commit args are set simultaneously.
ValueError – if specified commit does not exist in the repository.
ValueError – if branch name does not exist in the repository.
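The way these options interact can be sketched in plain Python. This is an illustration of the documented argument rules, not Hangar's source: `resolve_fetch_point` and `select_within_budget` are hypothetical names, raising ValueError when neither branch nor commit is given is an assumption, and the first-come selection under `max_num_bytes` is only one plausible policy (the order in which Hangar actually retrieves data is not specified here).

```python
from typing import Dict, List, Optional, Set


def resolve_fetch_point(branch: Optional[str], commit: Optional[str],
                        heads: Dict[str, str], known_commits: Set[str]) -> str:
    """Pick the commit whose data should be fetched.

    Mirrors the documented ValueError cases; the 'neither given' case is
    an assumption made for this sketch.
    """
    if branch is not None and commit is not None:
        raise ValueError("branch and commit cannot both be set")
    if branch is not None:
        if branch not in heads:
            raise ValueError(f"branch {branch!r} does not exist")
        return heads[branch]
    if commit is None:
        raise ValueError("one of branch or commit is required")
    if commit not in known_commits:
        raise ValueError(f"commit {commit!r} does not exist")
    return commit


def select_within_budget(sizes: Dict[str, int],
                         max_num_bytes: Optional[int]) -> List[str]:
    """Choose data hashes to download without exceeding the byte budget."""
    if max_num_bytes is None:
        return list(sizes)             # default: retrieve everything
    chosen: List[str] = []
    used = 0
    for data_hash, num_bytes in sizes.items():
        if used + num_bytes > max_num_bytes:
            break                      # stop once the next piece would not fit
        chosen.append(data_hash)
        used += num_bytes
    return chosen


heads = {"master": "c2"}
known = {"c1", "c2"}
fetch_point = resolve_fetch_point("master", None, heads, known)
limited = select_within_budget({"a": 100, "b": 100, "c": 100}, 250)
```

Here `fetch_point` resolves to the head of master, and with a 250-byte budget only the first two 100-byte pieces are selected.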

This will retrieve all the data on the master branch, but not on the add-train branch.

[29]:

cloneRepo.remote.fetch_data('origin', branch='master')

counting objects: 100%|██████████| 1/1 [00:00<00:00, 27.45it/s]
fetching data: 100%|██████████| 10294/10294 [00:01<00:00, 6664.60it/s]

[29]:

['a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d']

[30]:

co = cloneRepo.checkout(branch='master')

 * Checking out BRANCH: master with current HEAD: a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d

[31]:

co

[31]:

Hangar ReaderCheckout
    Writer       : False
    Commit Hash  : a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d
    Num Columns  : 1
    Num Metadata : 1

Unlike before, we see that there are no partial references in the repr:

[32]:

co.columns

[32]:

Hangar Columns
    Writeable         : False
    Number of Columns : 1
    Column Names / Partial Remote References:
      - test / False

[33]:

co.columns['test']

[33]:

Hangar FlatSampleReader
    Column Name              : test
    Writeable                : False
    Column Type              : ndarray
    Column Layout            : flat
    Schema Type              : fixed_shape
    DType                    : uint8
    Shape                    : (117,)
    Number of Samples        : 10294
    Partial Remote Data Refs : False

*When we access the data this time, it is available and retrieved as requested!*

[34]:

co['test', testKey]

[34]:

array([255, 223,   8,   2,   0, 255,   0,   0,   0,   0,   0,   0,   1,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   1,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   1,   0,   0,   0, 255,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   1,   0,   0,   0, 255,   0,   0,
         0,   0,   0,   0,   0, 255,   0,   0,   0,   0,   1,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0],
      dtype=uint8)

[35]:

co.close()

Working with mixed local / remote checkout Data¶

If we were to check out the add-train branch now, we would see that there is no data for the "train" column, but the data common to the ancestor that master and add-train share is available.

[36]:

cloneRepo.log('add-train')

* a=957d20e4b921f41975591cc8ee51a4a6912cb919 (add-train) (origin/add-train) : added training data on another branch
* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data

In this case, the common ancestor is commit: a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0

To show that there is no data on the add-train branch:

[37]:

co = cloneRepo.checkout(branch='add-train')

 * Checking out BRANCH: add-train with current HEAD: a=957d20e4b921f41975591cc8ee51a4a6912cb919

/Users/rick/projects/tensorwerk/hangar/hangar-py/src/hangar/columns/constructors.py:45: UserWarning: Column: train contains `reference-only` samples, with actual data residing on a remote server. A `fetch-data` operation is required to access these samples.
  f'operation is required to access these samples.', UserWarning)

[38]:

co

[38]:

Hangar ReaderCheckout
    Writer       : False
    Commit Hash  : a=957d20e4b921f41975591cc8ee51a4a6912cb919
    Num Columns  : 2
    Num Metadata : 0

[39]:

co.columns

[39]:

Hangar Columns
    Writeable         : False
    Number of Columns : 2
    Column Names / Partial Remote References:
      - test / False
      - train / True

[40]:

co['test', testKey]

[40]:

array([255, 223,   8,   2,   0, 255,   0,   0,   0,   0,   0,   0,   1,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   1,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   1,   0,   0,   0, 255,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   1,   0,   0,   0, 255,   0,   0,
         0,   0,   0,   0,   0, 255,   0,   0,   0,   0,   1,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0],
      dtype=uint8)

[41]:

trainKey = next(co.columns['train'].keys())

[42]:

co.columns['train'][trainKey]

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-42-549d3e1dc7a1> in <module>
----> 1 co.columns['train'][trainKey]

~/projects/tensorwerk/hangar/hangar-py/src/hangar/columns/layout_flat.py in __getitem__(self, key)
    222         """
    223         spec = self._samples[key]
--> 224         return self._be_fs[spec.backend].read_data(spec)
    225
    226     def get(self, key: KeyType, default=None):

~/projects/tensorwerk/hangar/hangar-py/src/hangar/backends/remote_50.py in read_data(self, hashVal)
    172     def read_data(self, hashVal: REMOTE_50_DataHashSpec) -> None:
    173         raise FileNotFoundError(
--> 174             f'data hash spec: {REMOTE_50_DataHashSpec} does not exist on this machine. '
    175             f'Perform a `data-fetch` operation to retrieve it from the remote server.')
    176

FileNotFoundError: data hash spec: <class 'hangar.backends.specs.REMOTE_50_DataHashSpec'> does not exist on this machine. Perform a `data-fetch` operation to retrieve it from the remote server.

[43]:

co.close()

Merging Branches with Partial Data¶

Even though we don’t have the actual data referenced in the add-train branch, it is still possible to merge the two branches!

This is possible because Hangar doesn’t use the data contents in its internal model of checkouts / commits, but instead thinks of a checkout as a sequence of columns / metadata / keys & their associated data hashes (which are very small text records; i.e. “bookkeeping”). To show this in action, let’s merge the two branches master (containing all data locally) and add-train (containing partial remote references for the train column) together and push the result to the remote!

[44]:

cloneRepo.log('master')

* a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d (master) (origin/master) : more changes here
* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data

[45]:

cloneRepo.log('add-train')

* a=957d20e4b921f41975591cc8ee51a4a6912cb919 (add-train) (origin/add-train) : added training data on another branch
* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data

Perform the Merge

[46]:

cloneRepo.merge('merge commit here', 'master', 'add-train')

Selected 3-Way Merge Strategy

[46]:

'a=ace3dacbd94f475664ee136dcf05430a2895aca3'

IT WORKED!

[47]:

cloneRepo.log()

*   a=ace3dacbd94f475664ee136dcf05430a2895aca3 (master) : merge commit here
|\
* | a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d (origin/master) : more changes here
| * a=957d20e4b921f41975591cc8ee51a4a6912cb919 (add-train) (origin/add-train) : added training data on another branch
|/
* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data
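The reason no data is needed can be seen in a small sketch of a three-way merge that operates purely on records. This is a generic three-way merge over dictionaries of key to data hash, written under assumptions: the function name, the "column/key" naming of record keys, and the placeholder hash strings are hypothetical, and Hangar's real merge logic and conflict rules may differ.

```python
from typing import Dict

Records = Dict[str, str]   # record key (e.g. "test/0") -> data hash


def three_way_merge(base: Records, ours: Records, theirs: Records) -> Records:
    """Merge two sets of records relative to their common ancestor.

    Only hashes are compared; the data behind them is never read.
    """
    merged: Records = {}
    for key in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            winner = o          # both sides agree (or both removed the key)
        elif o == b:
            winner = t          # only their side changed this key
        elif t == b:
            winner = o          # only our side changed this key
        else:
            raise ValueError(f"conflict on {key!r}")
        if winner is not None:
            merged[key] = winner
    return merged


# Shapes loosely follow the tutorial: the ancestor has test data, master
# adds a metadata key, and add-train adds a train sample.
base = {"test/0": "hash-test-0"}
master = {"test/0": "hash-test-0", "metadata/earaea": "hash-meta"}
add_train = {"test/0": "hash-test-0", "train/0": "hash-train-0"}

merged = three_way_merge(base, master, add_train)
```

The merged records contain the test sample, the metadata key, and the train sample, even though nothing in the sketch ever held the bytes behind "hash-train-0".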

We can view the summary of the master commit to verify that the contents are what we expect (containing both test and train columns).

[48]:

cloneRepo.summary()

Summary of Contents Contained in Data Repository

==================
| Repository Info
|-----------------
|  Base Directory: /Users/rick/projects/tensorwerk/hangar/dev/dota-clone
|  Disk Usage: 42.03 MB

===================
| Commit Details
-------------------
|  Commit: a=ace3dacbd94f475664ee136dcf05430a2895aca3
|  Created: Tue Feb 25 19:18:30 2020
|  By: rick izzo
|  Email: rick@tensorwerk.com
|  Message: merge commit here

==================
| DataSets
|-----------------
|  Number of Named Columns: 2
|
|  * Column Name: ColumnSchemaKey(column="test", layout="flat")
|    Num Data Pieces: 10294
|    Details:
|    - column_layout: flat
|    - column_type: ndarray
|    - schema_type: fixed_shape
|    - shape: (117,)
|    - dtype: uint8
|    - backend: 10
|    - backend_options: {}
|
|  * Column Name: ColumnSchemaKey(column="train", layout="flat")
|    Num Data Pieces: 92650
|    Details:
|    - column_layout: flat
|    - column_type: ndarray
|    - schema_type: fixed_shape
|    - shape: (117,)
|    - dtype: uint16
|    - backend: 10
|    - backend_options: {}

==================
| Metadata:
|-----------------
|  Number of Keys: 1

Pushing the Merge back to the Remote¶

To push this merge back to our original copy of the Repository (repo), we just push the master branch back to the remote via the API or CLI.

[49]:

cloneRepo.remote.push('origin', 'master')

counting objects: 100%|██████████| 1/1 [00:00<00:00,  1.02it/s]
pushing schemas: 0it [00:00, ?it/s]
pushing data: 0it [00:00, ?it/s]
pushing metadata: 0it [00:00, ?it/s]
pushing commit refs: 100%|██████████| 1/1 [00:00<00:00, 34.26it/s]

[49]:

'master'

Looking at the current state of our other instance of the repository (repo), we see that the merge changes haven’t yet propagated to it (since it hasn’t fetched from the remote yet).

[50]:

repo.log()

* a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d (master) (origin/master) : more changes here
* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data

To fetch the merged changes, just fetch() the branch as normal. Like all fetches, this will be a fast operation, as it is a partial fetch which does not actually transfer the data.

[51]:

repo.remote.fetch('origin', 'master')

fetching commit data refs: 100%|██████████| 1/1 [00:01<00:00,  1.33s/it]
fetching commit spec: 100%|██████████| 1/1 [00:00<00:00, 37.61it/s]

[51]:

'origin/master'

[52]:

repo.log('origin/master')

*   a=ace3dacbd94f475664ee136dcf05430a2895aca3 (origin/master) : merge commit here
|\
* | a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d (master) : more changes here
| * a=957d20e4b921f41975591cc8ee51a4a6912cb919 (add-train) (origin/add-train) : added training data on another branch
|/
* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data

Bringing our master branch up to date is a simple fast-forward merge.

[53]:

repo.merge('ff-merge', 'master', 'origin/master')

Selected Fast-Forward Merge Strategy

[53]:

'a=ace3dacbd94f475664ee136dcf05430a2895aca3'
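A fast-forward is possible whenever the current branch head is already an ancestor of the incoming head, so the branch pointer can simply move forward. The sketch below checks that condition on the commit graph from this tutorial, using shortened digests as labels. The function names are hypothetical, and this illustrates the general rule rather than Hangar's own strategy selection code.

```python
from typing import Dict, List, Set


def ancestors(parents: Dict[str, List[str]], commit: str) -> Set[str]:
    """All commits reachable from `commit` via parent links, itself included."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


def can_fast_forward(parents: Dict[str, List[str]],
                     current_head: str, incoming_head: str) -> bool:
    """True when moving the branch to incoming_head loses no local history."""
    return current_head in ancestors(parents, incoming_head)


# Commit graph from this tutorial, with digests shortened to 8 characters.
parents = {
    "ace3dacb": ["bb1b108e", "957d20e4"],   # merge commit
    "bb1b108e": ["b98f6b65"],               # more changes here
    "957d20e4": ["b98f6b65"],               # added training data
    "b98f6b65": [],                         # initial commit
}

master_to_origin = can_fast_forward(parents, "bb1b108e", "ace3dacb")
master_to_train = can_fast_forward(parents, "bb1b108e", "957d20e4")
```

Local master (bb1b108e) is a parent of the fetched merge commit, so `master_to_origin` is True. The earlier merge of master and add-train had diverged histories (`master_to_train` is False), which is why a 3-way merge was selected there.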

[54]:

repo.log()

*   a=ace3dacbd94f475664ee136dcf05430a2895aca3 (master) (origin/master) : merge commit here
|\
* | a=bb1b108ef17b7d7667a2ff396f257d82bad11e1d : more changes here
| * a=957d20e4b921f41975591cc8ee51a4a6912cb919 (add-train) (origin/add-train) : added training data on another branch
|/
* a=b98f6b65c0036489e53ddaf2b30bf797ddc40da0 : initial commit on master with test data

Everything is as it should be! Now, try it out for yourself!

[55]:

repo.summary()

Summary of Contents Contained in Data Repository

==================
| Repository Info
|-----------------
|  Base Directory: /Users/rick/projects/tensorwerk/hangar/dev/intro
|  Disk Usage: 77.43 MB

===================
| Commit Details
-------------------
|  Commit: a=ace3dacbd94f475664ee136dcf05430a2895aca3
|  Created: Tue Feb 25 19:18:30 2020
|  By: rick izzo
|  Email: rick@tensorwerk.com
|  Message: merge commit here

==================
| DataSets
|-----------------
|  Number of Named Columns: 2
|
|  * Column Name: ColumnSchemaKey(column="test", layout="flat")
|    Num Data Pieces: 10294
|    Details:
|    - column_layout: flat
|    - column_type: ndarray
|    - schema_type: fixed_shape
|    - shape: (117,)
|    - dtype: uint8
|    - backend: 10
|    - backend_options: {}
|
|  * Column Name: ColumnSchemaKey(column="train", layout="flat")
|    Num Data Pieces: 92650
|    Details:
|    - column_layout: flat
|    - column_type: ndarray
|    - schema_type: fixed_shape
|    - shape: (117,)
|    - dtype: uint16
|    - backend: 10
|    - backend_options: {}

==================
| Metadata:
|-----------------
|  Number of Keys: 1
