{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 1: Creating A Repository And Working With Data\n", "\n", "This tutorial will review the first steps of working with a Hangar repository.\n", "\n", "To fit with the beginner's theme, we will use the MNIST dataset. Later examples will show how to work with much more complex data." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from hangar import Repository\n", "\n", "import numpy as np\n", "import pickle\n", "import gzip\n", "import matplotlib.pyplot as plt\n", "\n", "from tqdm import tqdm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating & Interacting with a Hangar Repository\n", "\n", "Hangar is designed to “just make sense” in every operation you have to perform.\n", "As such, there is a single interface which all interaction begins with: the\n", "[Repository](api.rst#hangar.repository.Repository) object.\n", "\n", "Whether a Hangar repository exists at the path you specify or not, just tell\n", "Hangar where it should live!\n", "\n", "#### Initializing a repository\n", "\n", "The first time you want to work with a new repository, the repository\n", "[init()](api.rst#hangar.repository.Repository.init) method\n", "must be called. This is where you provide Hangar with your name and email\n", "address (to be used in the commit log), as well as implicitly confirming that\n", "you do want to create the underlying data files Hangar uses on disk." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hangar Repo initialized at: /Users/rick/projects/tensorwerk/hangar/dev/mnist/.hangar\n" ] }, { "data": { "text/plain": [ "'/Users/rick/projects/tensorwerk/hangar/dev/mnist/.hangar'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "repo = Repository(path='/Users/rick/projects/tensorwerk/hangar/dev/mnist/')\n", "\n", "# Needed only the first time a repository is accessed!\n", "# Note: if you feed a path to the `Repository` which does not contain a pre-initialized hangar repo,\n", "# when the Repository object is initialized it will let you know that you need to run `init()`\n", "\n", "repo.init(user_name='Rick Izzo', user_email='rick@tensorwerk.com', remove_old=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Checking out the repo for writing\n", "\n", "A repository can be checked out in two modes:\n", "\n", "1. [write-enabled](api.rst#hangar.checkout.WriterCheckout): applies all operations to the staging area’s current\n", "   state. Only one write-enabled checkout can be active at a time, and it\n", "   must be closed after its last use, or manual intervention will be needed to remove\n", "   the writer lock.\n", "\n", "2. [read-only](api.rst#read-only-checkout): checks out a commit or branch to view the repository state as it\n", "   existed at that point in time.\n", "\n", "#### Lots of useful information is in the IPython `__repr__`\n", "\n", "If you're ever in doubt about the state of the object you're working\n", "on, just display its `__repr__`, and the most relevant information will be\n", "sent to your screen!" 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Hangar WriterCheckout \n", " Writer : True \n", " Base Branch : master \n", " Num Columns : 0\n" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "co = repo.checkout(write=True)\n", "co" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### A checkout allows access to [columns](api.rst#hangar.columns.column.Columns)\n", "\n", "The [columns](api.rst#hangar.checkout.WriterCheckout.columns) attribute\n", "of a checkout provides the interface to working with all of the data on disk!" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Hangar Columns \n", " Writeable : True \n", " Number of Columns : 0 \n", " Column Names / Partial Remote References: \n", " - " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "co.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Before data can be added to a repository, a column must be initialized.\n", "\n", "We're going to first load the pickled MNIST dataset so it can be added to\n", "the repo!" 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Load the dataset\n", "with gzip.open('/Users/rick/projects/tensorwerk/hangar/dev/data/mnist.pkl.gz', 'rb') as f:\n", "    train_set, valid_set, test_set = pickle.load(f, encoding='bytes')\n", "\n", "def rescale(array):\n", "    # Scale the float pixel values up and store them as unsigned 8-bit integers\n", "    array = array * 256\n", "    rounded = np.round(array)\n", "    return rounded.astype(np.uint8)\n", "\n", "sample_trimg = rescale(train_set[0][0])\n", "sample_trlabel = np.array([train_set[1][0]])\n", "trimgs = rescale(train_set[0])\n", "trlabels = train_set[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Before data can be added to a repository, a column must be initialized.\n", "\n", "A \"Column\" is a named grouping of data samples where each sample shares a\n", "number of similar attributes and array properties.\n", "\n", "See the docstrings below or in [add_ndarray_column()](api.rst#hangar.checkout.WriterCheckout.add_ndarray_column)\n", "\n", ".. include:: ./noindexapi/apiinit.rst" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "col = co.add_ndarray_column(name='mnist_training_images', prototype=trimgs[0])" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Hangar FlatSampleWriter \n", " Column Name : mnist_training_images \n", " Writeable : True \n", " Column Type : ndarray \n", " Column Layout : flat \n", " Schema Type : fixed_shape \n", " DType : uint8 \n", " Shape : (784,) \n", " Number of Samples : 0 \n", " Partial Remote Data Refs : False\n" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "col" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Interaction\n", "\n", "#### Through columns attribute\n", "\n", "When a column is initialized, a column accessor object is returned.\n", "However, depending on your use case, this may or may not be the most 
convenient\n", "way to access a column.\n", "\n", "In general, we have implemented a full `dict` mapping interface on top of all\n", "objects. To access the `'mnist_training_images'` column you can just use\n", "dict-style access like the following (note: if operating in IPython/Jupyter, the\n", "column keys will autocomplete for you).\n", "\n", "The column objects returned here contain many useful introspection methods which\n", "we will review over the rest of the tutorial." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Hangar FlatSampleWriter \n", " Column Name : mnist_training_images \n", " Writeable : True \n", " Column Type : ndarray \n", " Column Layout : flat \n", " Schema Type : fixed_shape \n", " DType : uint8 \n", " Shape : (784,) \n", " Number of Samples : 0 \n", " Partial Remote Data Refs : False\n" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "co.columns['mnist_training_images']" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Hangar FlatSampleWriter \n", " Column Name : mnist_training_images \n", " Writeable : True \n", " Column Type : ndarray \n", " Column Layout : flat \n", " Schema Type : fixed_shape \n", " DType : uint8 \n", " Shape : (784,) \n", " Number of Samples : 0 \n", " Partial Remote Data Refs : False\n" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_aset = co.columns['mnist_training_images']\n", "\n", "# OR an equivalent way using the `.get()` method\n", "\n", "train_aset = co.columns.get('mnist_training_images')\n", "train_aset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Through the checkout object (column and sample access)\n", "\n", "In addition to the standard `co.columns` access methods, we have implemented a convenience mapping to [columns](api.rst#hangar.columns.column.Columns) 
and [flat samples](api.rst#hangar.columns.layout_flat.FlatSampleWriter) or [nested samples](api.rst#hangar.columns.layout_nested.NestedSampleWriter) / [nested subsamples](api.rst#hangar.columns.layout_nested.FlatSubsampleWriter) (i.e. data) for both reading and writing from the [checkout](api.rst#hangar.checkout.WriterCheckout) object itself.\n", "\n", "To get the same column object from the checkout, simply use:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Hangar FlatSampleWriter \n", " Column Name : mnist_training_images \n", " Writeable : True \n", " Column Type : ndarray \n", " Column Layout : flat \n", " Schema Type : fixed_shape \n", " DType : uint8 \n", " Shape : (784,) \n", " Number of Samples : 0 \n", " Partial Remote Data Refs : False\n" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_asets = co['mnist_training_images']\n", "train_asets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Though that works as expected, most use cases will take advantage of adding and reading data from multiple columns / samples at a time. This is shown in the next section." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Adding Data\n", "\n", "To add data to a named column, we can use dict-style setting\n", "(refer to the `__setitem__`, `__getitem__`, and `__delitem__` methods),\n", "or the `update()` method. Sample keys can be either `str` or `int` type." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "train_aset['0'] = trimgs[0]\n", "\n", "data = {\n", "    '1': trimgs[1],\n", "    '2': trimgs[2],\n", "}\n", "train_aset.update(data)\n", "\n", "train_aset[51] = trimgs[51]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the checkout object:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "co['mnist_training_images', 60] = trimgs[60]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### How many samples are in the column?" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(train_aset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Containment Testing" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'hi' in train_aset" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'0' in train_aset" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "60 in train_aset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Dictionary Style Retrieval for known keys" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAPsAAAD4CAYAAAAq5pAIAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAOYElEQVR4nO3dbYxc5XnG8euKbUwxJvHGseMQFxzjFAg0Jl0ZkBFQoVCCIgGKCLGiiFBapwlOQutKUFoVWtHKrRIiSimSKS6m4iWQgPAHmsSyECRqcFmoAROHN+MS4+0aswIDIfZ6fffDjqsFdp5dZs68eO//T1rNzLnnzLk1cPmcmeeceRwRAjD5faDTDQBoD8IOJEHYgSQIO5AEYQeSmNrOjR3i6XGoZrRzk0Aqv9Fb2ht7PFatqbDbPkfS9ZKmSPrXiFhVev6hmqGTfVYzmwRQsDE21K01fBhve4qkGyV9TtLxkpbZPr7R1wPQWs18Zl8i6fmI2BoReyXdJem8atoCULVmwn6kpF+Nery9tuwdbC+33We7b0h7mtgcgGY0E/axvgR4z7m3EbE6InojoneapjexOQDNaCbs2yXNH/X445J2NNcOgFZpJuyPSlpke4HtQyR9SdK6atoCULWGh94iYp/tFZJ+rJGhtzUR8XRlnQGoVFPj7BHxgKQHKuoFQAtxuiyQBGEHkiDsQBKEHUiCsANJEHYgCcIOJEHYgSQIO5AEYQeSIOxAEoQdSIKwA0kQdiAJwg4kQdiBJAg7kARhB5Ig7EAShB1IgrADSRB2IAnCDiRB2IEkCDuQBGEHkiDsQBKEHUiCsANJNDWLK7qfp5b/E0/5yOyWbv+ZPz+6bm34sP3FdY9auLNYP+wbLtb/97pD6tYe7/1+cd1dw28V6yffs7JYP+bPHinWO6GpsNveJukNScOS9kVEbxVNAaheFXv234+IXRW8DoAW4jM7kESzYQ9JP7H9mO3lYz3B9nLbfbb7hrSnyc0BaFSzh/FLI2KH7TmS1tv+ZUQ8PPoJEbFa0mpJOsI90eT2ADSoqT17ROyo3e6UdJ+kJVU0BaB6DYfd9gzbMw/cl3S2pM1VNQagWs0cxs+VdJ/tA69zR0T8qJKuJpkpxy0q1mP6tGJ9xxkfKtbfPqX+mHDPB8vjxT/9dHm8uZP+49czi/V/+OdzivWNJ95Rt/bi0NvFdVcNfLZY/9hPD75PpA2HPSK2Svp0hb0AaCGG3oAkCDuQBGEHkiDsQBKEHUiCS1wrMHzmZ4r16269sVj/5LT6l2JOZkMxXKz/9Q1fLdanvlUe/jr1nhV1azNf3ldcd/qu8tDcYX0bi/VuxJ4dSIKwA0kQdiAJwg4kQdiBJAg7kARhB5JgnL0C05/ZUaw/9pv5xfonpw1U2U6lVvafUqxvfbP8U9S3LvxB3drr+8vj5HP/6T+L9VY6+C5gHR97diAJwg4kQdiBJAg7kARhB5Ig7EAShB1IwhHtG1E8wj1xss9q2/a6xeAlpxbru88p/9zzlCcPL9af+MYN77unA67d9bvF+qNnlMfRh197vViPU+v/APG2bxVX1YJlT5SfgPfYGBu0OwbHnMuaPTuQBGEHkiDsQBKEHUiCsANJEHYgCcIOJME4exeYMvvDxfrwq4PF+ot31B8rf/r0NcV1l/z9N4v1OTd27ppyvH9NjbPbXmN7p+3No5b12F5v+7na7awqGwZQvYkcxt8q6d2z3l8paUNELJK0ofYYQBcbN+wR8bCkdx9Hnidpbe3+WknnV9wXgIo1+gXd3Ijol6Ta7Zx6T7S93Haf7b4h7WlwcwCa1fJv4yNidUT0RkTvNE1v9eYA1NFo2Adsz5Ok2u3O6loC0AqNhn2dpItr9y+WdH817QBolXF/N972nZLOlDTb9nZJV0taJelu25dKeknSha1scrIb3vVqU+sP7W58fvdPffkXxforN00pv8D+8hzr6B7jhj0iltUpcXYMcBDhdFk
gCcIOJEHYgSQIO5AEYQeSYMrmSeC4K56tW7vkxPKgyb8dtaFYP+PCy4r1md9/pFhH92DPDiRB2IEkCDuQBGEHkiDsQBKEHUiCsANJMM4+CZSmTX7168cV131p3dvF+pXX3las/8UXLyjW478/WLc2/+9+XlxXbfyZ8wzYswNJEHYgCcIOJEHYgSQIO5AEYQeSIOxAEkzZnNzgH55arN9+9XeK9QVTD21425+6bUWxvujm/mJ939ZtDW97smpqymYAkwNhB5Ig7EAShB1IgrADSRB2IAnCDiTBODuKYuniYv2IVduL9Ts/8eOGt33sg39UrP/O39S/jl+Shp/b2vC2D1ZNjbPbXmN7p+3No5ZdY/tl25tqf+dW2TCA6k3kMP5WSeeMsfx7EbG49vdAtW0BqNq4YY+IhyUNtqEXAC3UzBd0K2w/WTvMn1XvSbaX2+6z3TekPU1sDkAzGg37TZIWSlosqV/Sd+s9MSJWR0RvRPRO0/QGNwegWQ2FPSIGImI4IvZLulnSkmrbAlC1hsJue96ohxdI2lzvuQC6w7jj7LbvlHSmpNmSBiRdXXu8WFJI2ibpaxFRvvhYjLNPRlPmzinWd1x0TN3axiuuL677gXH2RV9+8exi/fXTXi3WJ6PSOPu4k0RExLIxFt/SdFcA2orTZYEkCDuQBGEHkiDsQBKEHUiCS1zRMXdvL0/ZfJgPKdZ/HXuL9c9/8/L6r33fxuK6Byt+ShoAYQeyIOxAEoQdSIKwA0kQdiAJwg4kMe5Vb8ht/2nln5J+4cLylM0nLN5WtzbeOPp4bhg8qVg/7P6+pl5/smHPDiRB2IEkCDuQBGEHkiDsQBKEHUiCsANJMM4+ybn3hGL92W+Vx7pvXrq2WD/90PI15c3YE0PF+iODC8ovsH/cXzdPhT07kARhB5Ig7EAShB1IgrADSRB2IAnCDiTBOPtBYOqCo4r1Fy75WN3aNRfdVVz3C4fvaqinKlw10FusP3T9KcX6rLXl353HO427Z7c93/aDtrfYftr2t2vLe2yvt/1c7XZW69sF0KiJHMbvk7QyIo6TdIqky2wfL+lKSRsiYpGkDbXHALrUuGGPiP6IeLx2/w1JWyQdKek8SQfOpVwr6fxWNQmgee/rCzrbR0s6SdJGSXMjol8a+QdB0pw66yy33We7b0h7musWQMMmHHbbh0v6oaTLI2L3RNeLiNUR0RsRvdM0vZEeAVRgQmG3PU0jQb89Iu6tLR6wPa9WnydpZ2taBFCFcYfebFvSLZK2RMR1o0rrJF0saVXt9v6WdDgJTD36t4v1139vXrF+0d/+qFj/kw/dW6y30sr+8vDYz/+l/vBaz63/VVx31n6G1qo0kXH2pZK+Iukp25tqy67SSMjvtn2ppJckXdiaFgFUYdywR8TPJI05ubuks6ptB0CrcLoskARhB5Ig7EAShB1IgrADSXCJ6wRNnffRurXBNTOK6359wUPF+rKZAw31VIUVL59WrD9+U3nK5tk/2Fys97zBWHm3YM8OJEHYgSQIO5AEYQeSIOxAEoQdSIKwA0mkGWff+wflny3e+6eDxfpVxzxQt3b2b73VUE9VGRh+u27t9HUri+se+1e/LNZ7XiuPk+8vVtFN2LMDSRB2IAnCDiRB2IEkCDuQBGEHkiDsQBJpxtm3nV/+d+3ZE+9p2bZvfG1hsX79Q2cX6x6u9+O+I4699sW6tUUDG4vrDhermEzYswNJEHYgCcIOJEHYgSQIO5AEYQeSIOxAEo6I8hPs+ZJuk/RRjVy+vDoirrd9jaQ/lvRK7alXRUT9i74lHeGeONlM/Aq0ysbYoN0xOOaJGRM5qWafpJUR8bjtmZIes72+VvteRHynqkYBtM5E5mfvl9Rfu/+G7S2Sjmx1YwCq9b4+s9s+WtJJkg6cg7nC9pO219ieVWed5bb7bPcNaU9TzQJo3ITDbvtwST+UdHlE7JZ0k6SFkhZrZM//3bHWi4jVEdEbEb3TNL2ClgE0YkJhtz1
NI0G/PSLulaSIGIiI4YjYL+lmSUta1yaAZo0bdtuWdIukLRFx3ajl80Y97QJJ5ek8AXTURL6NXyrpK5Kesr2ptuwqSctsL5YUkrZJ+lpLOgRQiYl8G/8zSWON2xXH1AF0F86gA5Ig7EAShB1IgrADSRB2IAnCDiRB2IEkCDuQBGEHkiDsQBKEHUiCsANJEHYgCcIOJDHuT0lXujH7FUn/M2rRbEm72tbA+9OtvXVrXxK9NarK3o6KiI+MVWhr2N+zcbsvIno71kBBt/bWrX1J9NaodvXGYTyQBGEHkuh02Fd3ePsl3dpbt/Yl0Vuj2tJbRz+zA2ifTu/ZAbQJYQeS6EjYbZ9j+xnbz9u+shM91GN7m+2nbG+y3dfhXtbY3ml786hlPbbX236udjvmHHsd6u0a2y/X3rtNts/tUG/zbT9oe4vtp21/u7a8o+9doa+2vG9t/8xue4qkZyV9VtJ2SY9KWhYRv2hrI3XY3iapNyI6fgKG7dMlvSnptog4obbsHyUNRsSq2j+UsyLiii7p7RpJb3Z6Gu/abEXzRk8zLul8SV9VB9+7Ql9fVBvet07s2ZdIej4itkbEXkl3STqvA310vYh4WNLguxafJ2lt7f5ajfzP0nZ1eusKEdEfEY/X7r8h6cA04x197wp9tUUnwn6kpF+Nerxd3TXfe0j6ie3HbC/vdDNjmBsR/dLI/zyS5nS4n3cbdxrvdnrXNONd8941Mv15szoR9rGmkuqm8b+lEfEZSZ+TdFntcBUTM6FpvNtljGnGu0Kj0583qxNh3y5p/qjHH5e0owN9jCkidtRud0q6T903FfXAgRl0a7c7O9zP/+umabzHmmZcXfDedXL6806E/VFJi2wvsH2IpC9JWteBPt7D9ozaFyeyPUPS2eq+qajXSbq4dv9iSfd3sJd36JZpvOtNM64Ov3cdn/48Itr+J+lcjXwj/4Kkv+xED3X6+oSkJ2p/T3e6N0l3auSwbkgjR0SXSvqwpA2Snqvd9nRRb/8u6SlJT2okWPM61NtpGvlo+KSkTbW/czv93hX6asv7xumyQBKcQQckQdiBJAg7kARhB5Ig7EAShB1IgrADSfwfs4RxaLJFjqkAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "out1 = train_aset['0']\n", "# OR\n", "out2 = co['mnist_training_images', '0']\n", "\n", "print(np.allclose(out1, out2))\n", "\n", "plt.imshow(out1.reshape(28, 28))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dict style iteration supported out of the box" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "1\n", "2\n", "51\n", "60\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlAAAACBCAYAAAAPH4TmAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAZWUlEQVR4nO3deZgV1ZkG8Pf0AnSzN9BsIg1Cs4kBaRSQJQYRNa4ji8QIIThmNC4oKkicSaIYMZNHRgVUVECNwd2IjqLCdAwisquADYIsgiCbIMja3ffMH7Tn1HfTRd+6a93q9/c8Pv2d/ureOvbXdftQdeqU0lqDiIiIiCKXkeoOEBEREaUbDqCIiIiIPOIAioiIiMgjDqCIiIiIPOIAioiIiMgjDqCIiIiIPIppAKWUukgptV4ptVEpNSFenaLUYD2Dg7UMFtYzOFjL4FDRrgOllMoE8CWAQQC2A1gGYITW+ov4dY+ShfUMDtYyWFjP4GAtgyUrhteeA2Cj1noTACilXgRwBQDXX4Qaqqauhdox7JJicQyHcUIfVy5pT/VkLVPvEPbv1Vo3qSTFYzPN8NgMFh6bwXGqYzOWAVRLANsc7e0Azg3fSCl1A4AbAKAWcnGuGhjDLikWS/SCU6WrrCdr6S/z9atbXVI8NtMMj81g4bEZHKc6NmOZA1XZiOxfrgdqrWdorYu01kXZqBnD7ijBqqwna5k2eGwGC4/N4OCxGSCxDKC2A2jlaJ8GYEds3aEUYj2Dg7UMFtYzOFjLAIllALUMQHulVBulVA0A1wCYG59uUQqwnsHBWgYL6xkcrGWARD0HSmtdppS6GcB7ADIBzNRar41bzyipWM/gYC2DhfUMDtYyWGKZRA6t9TsA3olTXyjFWM/gYC2DhfUMDtYyOLgSOREREZFHHEARERERecQBFBEREZFHHEARERERecQBFBEREZFHHEAREREReRTTMgZEQVH2sx4m3nnTcZH7rPezJv7J4lEi12JaDRNnFq9MUO+IiMhveAaKiIiIyCMOoIiIiIg84iW8Sqgs+2PJbNI4otesv7NAtMtzQyZufcZukcu9yT6Q+9uHa4jcyqKXTLy3/LDInfvKOBO3u+OTiPpFlQsN6C7aj86cauJ22fKwCDniVb1nidz6onIT31XQK34dpJQ7PORc0X7oz4+b+P5hI0VOL1+TlD4FVahvNxPv6Jcrcp/fPDV884hkKnt+oOTEEZEbO+w3trF0dVTvXx2UXtBDtLPnr0hRT4B9/97bxE3nbRO5sm3bk90dADwDRUREROQZB1BEREREHnEARURERORRoOdAZXZqb2JdM1vkdgxoYOKjveRco7z6tr3wJy8hVu8eqSvaD029yMRLuv5N5DaXHjXx5F2DRK7FQh1zX6qz0guLTHz39
OdFrjDbzkULiVlPwKbSUhN/H6opct0dzeMX9xS5nGI7tyJ07Jj3DqeBo1ecY+NGmSKXN3NxsrsTV7uL5L8v799yWYp6EgyqexfR3jS0noknDbGfg1fX3i+2CyG6z72QtvMT22XL43bwzEUmfvuOn4lc9vvLo9pfUBy6xs7lnP7gIyK36Gg7E7/VvYXI6eNy+ZdY7bmxt2gX/+5hEw++6jqRq39JXHcdMZ6BIiIiIvKIAygiIiIijwJ1Ca/8p2eL9sOzp5nYeYkmGUodp4//67FfiVzWYXtKuvcrN4tc3W/KTFxz71GRy12+JI49DKbMevVE+3D/jia+fYq9THB+zg9hr3T/t8Ts/X1MvGC6PK286A+PmviDp58Quc5/tbVtOz69L2e52dHf/txyzzggkzOT3Jl4yLCXIfXp8vgbmL/OxAtUH1DVMhs3MnGnp0tE7q1my1xepVy+Hz+3NNxg4if7Dxa5gvcTvntfOThCLr8y9U/2M61rDTn1pWuNrSZ+W7UUuXhPMMkMuyJYqu3UilmdnxO5YW9cb+IWV30R55644xkoIiIiIo84gCIiIiLyiAMoIiIiIo8CNQeq5vodor3iWCsTF2bvivn9x+2U14o3/WAf8zL7jFdF7vuQvSLc9NGPo9ofFy3wbvtz8rr8sp7TXLaM3H35dq7GvDpy7svoLRea+NmC+SJXr/O+mPftd3+89BUTP1Ry4Sm2TA+ZZ7Q28boBchJXt6W/NHGLZXz8x48ym+abeOv0JiL3wtn2Z9ilRvz/3OwP2eVBVp+Q8x/71zoR9/0FRWaD+ibuf7d8LFg3R53KUC5ynRbYR+C0P/FZgnp3UvgyKPPHn2bioXXkZ+u4Tvaz9+VGZ4pc+b7vEtC7k3gGioiIiMgjDqCIiIiIPArUJbyynd+K9mMPDTXxAxfJ1cYzP69j4s9uesz1PSftPcvEGy+QTwkvP7DTxL/ofZPIbbnVxm2Q2FOd1V3Zz+wTw+d0k09uz0Dly1eM3jpQtJfP72Ti1WPkexQfrWXi/OXy1vaN++0yCdl/Kpb7Tvzd2CmXrcqq3iiNZD19xDV39Kt6rrnqbNt1dnXqT3uFf5Ym9k/MSwc7m3jGrJ+L3Mrb3T/Xq7stT9vpLW/lF7tu1+3jX4t2+5ErE9anWFxX1/7tnzJyiMg1mxLdFJpI8AwUERERkUccQBERERF5VOUASik1Uym1Wym1xvG9PKXUB0qpDRVfGya2mxQvrGegFLCWwcFjM1B4bFYDkVygng1gKgDn2ukTACzQWk9WSk2oaI+Pf/dikzfL3gbZ5K1GIue8tbHLmfI679r+9tbbuTMGmDj/gPu1VLVYznNq498nd8xGmtbzR6EB3UX70Zl2zlK7bPkrHYJd/v/ydVeZOHOInBPX4Od20YjOz8vH6xRO22bijG2rRK7hQhuXPiBv+X3tLPt79OvzbxW5zOK4zCXYC+AXSGItQ327iXa/Wh/F6619oaC2+9ITreaXu+biZDbS4NjMalsg2r2Gxj7Hs8Nrdg5pnS2ZIldr4B4TL+r2osg986Rj3lNyn9ZVlaQfm6dSfr58zNnzPZyPnZKfmatPlJq45XT5KBeSqjwDpbX+J4DwhRSuAPBsRfwsgCvj3C9KENYzUH4AaxkYPDYDhcdmNRDtHKimWuudAFDxNd9tQ6XUDUqp5Uqp5aU47rYZpVZE9WQt0wKPzWDhsRkcPDYDJuHLGGitZwCYAQD1VF7KFtcu3+t+ar70oPu53y7X2ic773lcnlpGKOGn9H0llbVUPbqYeO8dcimBwmxbvxVhnzX/94O9zXnfi/bW3Ub75TXW+n+1q/HWhxTtjfpNM2vafY+Vt8ef4s7hpImmnlsvzRHt/Mxcly3TQ1bB6aI9JG+u67Y5m/eb2G9HfjKPzXPe+FK0JzZ2X5W9VNuf1Ocn5OfntX//rYk7/H6tiUOHDontsuY0M/FlLUaKXLPPlpo4o6GcUtR/4DAT//Osl1376EfxqKdzeZfHZ
z0qcmdk5YRvbox5cKyJGxf7Zy7KpLWXmHjouc+7bvfIzU+I9oNTznLZMnbRnoHapZRqDgAVX3fHr0uUAqxncLCWwcJ6BgdrGTDRDqDmAhhVEY8C8GZ8ukMpwnoGB2sZLKxncLCWARPJMgZzACwG0EEptV0pNQbAZACDlFIbAAyqaFMaYD0DpQ1Yy8DgsRkoPDargSrnQGmtR7ikBrp8P+10Gi+v6Y/uav/XZrVeYOIBQ38rtqv7knyKdTpIl3pm5Mq5NWV/PmjiTzq+LnKby+xT1++YOE7kGi782sT5te0Z82TPYTmn+VbR3hKft92stS6q5PsJq2VWu0OuuWPrGiRqtwmz7X9qi/Z5Ne2yF88cPE1ufOAgEsnPx+aJwfbXbESDR8KyteDGOe/p9217iFw72M/PENyJR3SFPa5LaCLnQLWsE34TXFIl/dgMt+0COzf0VHOe7tvbVbTz55ilq05Zl2RrNcp+lk/66EyRu7ex7XMtVYpk4UrkRERERB5xAEVERETkUcKXMUgH5Qe+F+19N3Yy8ddz7S3zEyY9J7a7Z5hd2Vqvkje/t3rAcfunTtnqDWnr6IAuov1ex+mu215/2+0mrvt3eVk12iUIyLv85f454Z/Z2D55YNfVhSKXN2y7iT8sfCbslfZy1OPT5DqH+bsS91R337vTrgbeJsv9kl0451IFzkt2ibDtksaivbLtnITuz+9mDH/SNbfihJ3E8MGf+olc3UP+nJriXN7iYJn772D9DLmWTWaXDiYuX7s+rn3iGSgiIiIijziAIiIiIvKIl/AqEfqsxMTX/PEuE7/w+7+I7T7t5bik10u+R5fa9oG07Z/aKXJlm7bE3smAO+v+T0U7wzHWH71V3siS8/el8INsJVdaLnVcuc1Uwb+MezRP/nustst24UL95MOhdaYy8bYLaorciRb2DpuMGvYyxPv9HhPbZdu3wLfl8j3+c5O99P5dSF52zM2w79l0ibzjMPgVjJ3zocBA2Arjye5MNffTHPsTLw/75f3dpn8zsZ/uJs9q09rEx1s3ct2uZc1/uOYKs+Xlvatf/dDEL3dqFr55THgGioiIiMgjDqCIiIiIPOIAioiIiMgjzoGqQt5MuxzBzevlSuT1Jtvboee0fU/k1o6cauKOra4XuQ5/tOPW8g2b4tLPIDhwXW8T39tUzjcLwa6qu+L9ziJ3Ovxxe7nzyfMAEHLM+phXIvvcHiuT0qd4O34sW7RDjplBsyZOEbm5N3eL6D3HN3patDNgJzAd1SdEbke5/RlP3fNTE18wf6zYrsEq+/vS/P1dIqe22uN2T4lcoblppp1jpZetrqrrgbX5wd6iXdJ5mqOlRO4Tx13j+UtlznnreaK1+Iv8HDjr3F+ZeE2fZ91fqNxTQXVX63km/s3jo0Wu05R9Ub3nvl75Ji4dEt0q8MPb2M/FO/Pis+RAnxz7N/ZlcA4UERERUUpxAEVERETkES/heaAWyVvrjwyxpyx7Dr9F5JaMtw/cXHe+vERxbcGFJv6+bzx7mN7KHFdT6mfUELnFx+yt6G2f2yFfl9BeSeEPOV73F+dDLVeI3LWbLjZxx9s2i1yyH2YcL+1+uUq0uzxol+to1fObqN6zeLdcKXzPu/Yhvo3WygeD1pi3zNGyuUIsd33/8J/1N+P7mLhnzcUi9+IPLavobTURdtt76BSLOIxe8msTt/mrf26JD4XstblT9T+o61P0X22X6yg+8zWRG5hjr7tuvPwJ+cLLE9qthPu67Iho3zL6VhNnxnnqBM9AEREREXnEARQRERGRRxxAEREREXnEOVAxKN+128RNH90tcsfutjNzcpWcz/NUwdsmvvQqeft17htL4tnFwNhXXsfEyX4UjnPe0/rJXUVu3RV2uYp3j9QXuR3T2pm47n7/zA2Jpzb3LK56I4+a4+u4v6dTbv89rrl7i682cSH88YggomjkDLbzLs956xqRW3r2i8nuTkRu2NbfxMVLznTd7omfPyPazjldV
382RuSaFCduyRiegSIiIiLyiAMoIiIiIo94Cc+DUF+5svJXQ+1Tn8/stkXkwi/bOT32nX36fO6b7rdfk3XnoqEmLgxbLiDeQgO6i/buO46auKRoqsgNXD3cxLUvkqvK10UwL9sFWes3A3pPewScn2+ThvwthT2JnPPy+qaJPxG5hX2cTzOoJXKXrbf36rd9aI3IhRA8TYZsFe3L6lxg4o3jOohcqPWxiN6z7mK5in/dbXbayu6z7dCi7aPrIu6nPmr33f6I++fne33lVIqBOfbvaHlxo4j3FyuegSIiIiLyiAMoIiIiIo84gCIiIiLyiHOgKqGK7O2TX95q5zI9dZ58onf/WvJJ8W6Oa/k4ik++a2MboZ1R9DCgHE9Fzwgb2z/Sd46Jp0E++iMett5nnz7/2siHRa4w2/4OnL10lMi1uOqLuPeFKBUyPrKPqrr31V+I3FWjpoZv7gvOeU9rRof3sRbcHCm1x3TOoUPx7pbv6OPHRbvc0W4zMf5LkbR+07GvOL2nPs/O0buswXNxetfY8AwUERERkUdVDqCUUq2UUsVKqRKl1Fql1G0V389TSn2glNpQ8bVh4rtLsQghBNYyULJZz2DgsRk4PDargUgu4ZUBGKe1XqmUqgtghVLqAwC/ArBAaz1ZKTUBwAQA4xPX1fjKatPaxF+NbiFyfxhuV2m9us7eqN5/4q4iE3/4SC+Ra/hs/E+ZeuDfWjruIA+F3Uw8IGeficfO7iFyZ8yy22Z/K0/H7xrQxMR5w7eb+JbTF4jtLs61SyPMPdxU5EauvsjEjZ+s7dr9FPFvPdNAppL/htxfmG3iZu8muzesZVW+/q8+or1w1H87Wu6X7DaXyVvzQ0/lO7Nx6FmlWM84UovsJea3Dsglhfo1S81yQFWegdJa79Rar6yIDwEoAdASwBUAfpwU9CyAKxPVSYqPDGSAtQyUUtYzGHhsBg6PzWrA0xwopVQBgO4AlgBoqrXeCZwcZAHId3nNDUqp5Uqp5aU4XtkmlAKsZbCwnsHBWgYL6xlcEQ+glFJ1ALwGYKzW+mCkr9Naz9BaF2mti7JRM5o+UpyxlsHCegYHaxksrGewRbSMgVIqGyd/CV7QWr9e8e1dSqnmWuudSqnmAHYnqpPRyio43cTf92gucsPvm2fi/2jwOqIxbqed27R4epHI5c22T3JvGErpnCchXWtZS9lf1ZJBT4jcR/3s3IcNx5uJ3Oj6WyJ6/9t29DPxvI/l9fX2t/n3kSzpWk+/KNdhD+5I4X3J6VrLN3rb43HhF+1E7tWbBpu45sZdEb3f9+eeJtq/vP9tEw+q/WeRa5hhHyeyt/yoyG0ts7m7xt0hcrXfWBJRX2KRrvVMB58faCnaEzPscdzyf+XvWbyWUahMJHfhKQDPACjRWjsXyJkL4MdFcUYBeDP8teQv+uQsbdYyWFjPAOCxGUisZ8BFcgbqPADXAVitlPpxGvxEAJMBvKyUGgPgawBDXV5PPlF+cizOWgZHHbCegcBjM3B4bFYDVQ6gtNYfQawRLQyMb3e8y2puL9l8N1PeYn5jmw9NPKJuZKePw938TV8Tr3xcXtpp/Kp9infeIf9cpnOThSxorX1by6b/sGezx/+mt8g91Mz95+tcEb5vrS2u2606bk+4jvjwBpErHG2XMWgP/16yC/ODn+uZjo70PJKS/frt2Kz3lWz/85hduTv8CQzOlfoL638tcmNeeMrzvjPC/tyEnOubIEfknMsTXDnjbpFr9cDHJs5F4i/ZheGxmUA1rpcXz1Znn2Hi8i+/Ct88YbgSOREREZFHHEARERERecQBFBEREZFHES1jkGonBtslAk7c/p3ITWz3jokvzDkc1fvvCrv9tf/ccSbueO86E+cdkPNwwm6Aphg5r11vGFogcp1vucXEXwx7LOL37PjOTSbuMN3ObylctaKyzamaCX+UC53U6Bn5WTfpmktN/H7n6JZ9SYQrn7rLxM45TxRsZZu3proLAHgGioiIiMgzD
qCIiIiIPEqLS3hbrrTjvC+7vhLx66YdsLc2PvLhhSKnyu0dph0nyadxt99lb3lN5Cqm5K5s0xbRbne7bV9+e8+I36cQy0ysT7EdVR/H5zcxcXk3XoiPRM2J9Uy88RX5bLZ22Yl91Ej3JSNN3PipXJE7ff5yE/P4pmTjGSgiIiIijziAIiIiIvKIAygiIiIij9JiDlThjUtNfOmNPaJ7Dyx1zXGeE1H10WyKvd39kilni1xbfBq+OQHQy1abeGxBn6TuuyXWuuY474lSiWegiIiIiDziAIqIiIjIIw6giIiIiDziAIqIiIjIIw6giIiIiDziAIqIiIjIIw6giIiIiDziAIqIiIjIIw6giIiIiDxSWidvLVel1B4AWwE0BrA3aTt2V9360Vpr3aTqzarGWp4S6xm76tYP1jI50rWeh1H9foZVSXktkzqAMjtVarnWuijpO2Y/4s4vffdLPwB/9cUrv/Sd/YidX/rul34A/uqLF37qt1/64od+8BIeERERkUccQBERERF5lKoB1IwU7Tcc+xE7v/TdL/0A/NUXr/zSd/Yjdn7pu1/6AfirL174qd9+6UvK+5GSOVBERERE6YyX8IiIiIg84gCKiIiIyKOkDqCUUhcppdYrpTYqpSYked8zlVK7lVJrHN/LU0p9oJTaUPG1YRL60UopVayUKlFKrVVK3ZaqvsQqVfVkLeOPx2Zw6slaBqeWAOtZsU9f1jNpAyilVCaAaQAuBtAZwAilVOdk7R/AbAAXhX1vAoAFWuv2ABZUtBOtDMA4rXUnAL0A/Lbi55CKvkQtxfWcDdYybnhsGmlfT9bSSPtaAqyngz/rqbVOyn8AegN4z9G+B8A9ydp/xT4LAKxxtNcDaF4RNwewPpn9qdjvmwAG+aEv6VRP1jI4tWQ9WUvWkvVMx3om8xJeSwDbHO3tFd9LpaZa650AUPE1P5k7V0oVAOgOYEmq+xIFv9WTtYye32oJsJ7RYi3DpHEtAdbzX/ipnskcQKlKvldt11BQStUB8BqAsVrrg6nuTxRYzwqsZbCkeT1ZS4c0ryXAegp+q2cyB1DbAbRytE8DsCOJ+6/MLqVUcwCo+Lo7GTtVSmXj5C/BC1rr11PZlxj4rZ6sZfT8VkuA9YwWa1khALUEWE/Dj/VM5gBqGYD2Sqk2SqkaAK4BMDeJ+6/MXACjKuJROHldNaGUUgrAMwBKtNYPp7IvMfJbPVnL6PmtlgDrGS3WEoGpJcB6AvBxPZM88esSAF8C+ArA75K87zkAdgIoxclR/RgAjXBy5v6Giq95SehHX5w8Bfs5gE8r/rskFX1J13qylsGpJevJWrKWrGe61pOPciEiIiLyiCuRExEREXnEARQRERGRRxxAEREREXnEARQRERGRRxxAEREREXnEARQRERGRRxxAEREREXn0/6qK5FZQqcBNAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# iterate normally over keys\n", "\n", "for k in train_aset:\n", "    # equivalent method: for k in train_aset.keys():\n", "    print(k)\n", "\n", "# iterate over values (plot results)\n", "\n", "fig, axs = plt.subplots(nrows=1, ncols=5, figsize=(10, 10))\n", "\n", "for idx, v in enumerate(train_aset.values()):\n", "    axs[idx].imshow(v.reshape(28, 28))\n", "plt.show()\n", "\n", "# iterate over items, store k, v in dict\n", "\n", "myDict = {}\n", "for k, v in train_aset.items():\n", "    myDict[k] = v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Performance\n", "\n", "Once you’ve completed an interactive exploration, be sure to use the context\n", "manager form of the `update()` and `get()` methods!\n", "\n", "In order to make sure that all your data is always safe in Hangar, the backend\n", "diligently ensures that all contexts (operations which can somehow interact\n", "with the record structures) are opened and closed appropriately. When you use the\n", "context manager form of a column object, we can offload a significant amount of\n", "work to the Python runtime, and dramatically increase read and write speeds.\n", "\n", "Most columns we’ve tested see a throughput increase of 250% to 500% for\n", "writes and 300% to 600% for reads when comparing the context\n", "manager form with the naked form!" 
] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Beginning non-context manager form\n", "----------------------------------\n", "Finished non-context manager form in: 78.54769086837769 seconds\n", "Hard reset requested with writer_lock: 8910b50e-1f9d-4cb1-986c-b99ea84c8a54\n", "\n", "Beginning context manager form\n", "--------------------------------\n", "Finished context manager form in: 11.608536720275879 seconds\n", "Hard reset requested with writer_lock: ad4a2ef9-8494-49f8-84ef-40c3990b1e9b\n" ] } ], "source": [ "import time\n", "\n", "# ----------------- Non Context Manager Form ----------------------\n", "\n", "co = repo.checkout(write=True)\n", "aset_trimgs = co.add_ndarray_column(name='train_images', prototype=sample_trimg)\n", "aset_trlabels = co.add_ndarray_column(name='train_labels', prototype=sample_trlabel)\n", "\n", "print(f'Beginning non-context manager form')\n", "print('----------------------------------')\n", "start_time = time.time()\n", "\n", "for idx, img in enumerate(trimgs):\n", " aset_trimgs[idx] = img\n", " aset_trlabels[idx] = np.array([trlabels[idx]])\n", "\n", "print(f'Finished non-context manager form in: {time.time() - start_time} seconds')\n", "\n", "co.reset_staging_area()\n", "co.close()\n", "\n", "# ----------------- Context Manager Form --------------------------\n", "\n", "co = repo.checkout(write=True)\n", "aset_trimgs = co.add_ndarray_column(name='train_images', prototype=sample_trimg)\n", "aset_trlabels = co.add_ndarray_column(name='train_labels', prototype=sample_trlabel)\n", "\n", "print(f'\\nBeginning context manager form')\n", "print('--------------------------------')\n", "start_time = time.time()\n", "\n", "with aset_trimgs, aset_trlabels:\n", " for idx, img in enumerate(trimgs):\n", " aset_trimgs[idx] = img\n", " aset_trlabels[idx] = np.array([trlabels[idx]])\n", "\n", "print(f'Finished context manager form in: {time.time() - 
start_time} seconds')\n", "\n", "co.reset_staging_area()\n", "co.close()\n", "\n", "print(f'Finished context manager with checkout form in: {time.time() - start_time} seconds')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Clearly, the context manager form is far and away superior. However, we feel that\n", "for the purposes of interactive use the \"naked\" form is valuable to the\n", "average user!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Committing Changes\n", "\n", "Once you have made a set of changes you want to commit, simply call the [commit()](api.rst#hangar.checkout.WriterCheckout.commit) method (and pass in a message)!" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'a=8eb01eaf0c657f8526dbf9a8ffab0a4606ebfd3b'" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "co.commit('hello world, this is my first hangar commit')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The returned value (`'a=8eb01eaf0c657f8526dbf9a8ffab0a4606ebfd3b'`) is the commit hash of this commit. It\n", "may be useful to assign this to a variable and follow this up by creating a\n", "branch from this commit!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Don't Forget to Close the Write-Enabled Checkout to Release the Lock!\n", "\n", "We mentioned in `Checking out the repo for writing` that when a\n", "`write-enabled` checkout is created, it places a lock on writers until it is\n", "closed. 
If for whatever reason the program terminates via a non-Python `SIGKILL` or fatal\n", "interpreter error without closing the\n", "write-enabled checkout, this lock will persist (forever technically, but\n", "realistically until it is manually freed).\n", "\n", "Luckily, preventing this issue from occurring is as simple as calling\n", "[close()](api.rst#hangar.checkout.WriterCheckout.close)!\n", "\n", "If you forget, normal interpreter shutdown should trigger an `atexit` hook automatically;\n", "however, this behavior should not be relied upon. It is better to just call\n", "[close()](api.rst#hangar.checkout.WriterCheckout.close)." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "co.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### But if you did forget, and you receive a `PermissionError` next time you open a checkout\n", "\n", "```\n", "PermissionError: Cannot acquire the writer lock. Only one instance of\n", "a writer checkout can be active at a time. If the last checkout of this\n", "repository did not properly close, or a crash occured, the lock must be\n", "manually freed before another writer can be instantiated.\n", "```\n", "\n", "You can manually free the lock with the following method. However!\n", "\n", "This is a dangerous operation, and it's one of the only ways a user can put\n", "data in their repository at risk! If another Python process is still holding the\n", "lock, do NOT force the release. Kill the process (that's totally fine to do at\n", "any time), then force the lock release." 
 ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "repo.force_release_writer_lock()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reading Data\n", "\n", "Two different styles of access are considered below. In general, the context manager form\n", "is recommended (though at best only marginal performance improvements are expected)." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " Neither BRANCH or COMMIT specified.\n", " * Checking out writing HEAD BRANCH: master\n", "\n", "Beginning Key Iteration\n", "-----------------------\n", "completed in 5.838773965835571 sec\n", "\n", "Beginning Items Iteration with Context Manager\n", "---------------------------------------------\n", "completed in 5.516948938369751 sec\n" ] } ], "source": [ "co = repo.checkout()\n", "\n", "trlabel_col = co['train_labels']\n", "trimg_col = co['train_images']\n", "\n", "print(f'\\nBeginning Key Iteration')\n", "print('-----------------------')\n", "start = time.time()\n", "\n", "for idx in trimg_col.keys():\n", " image_data = trimg_col[idx]\n", " label_data = trlabel_col[idx]\n", "\n", "print(f'completed in {time.time() - start} sec')\n", "\n", "print(f'\\nBeginning Items Iteration with Context Manager')\n", "print('---------------------------------------------')\n", "start = time.time()\n", "\n", "with trlabel_col, trimg_col:\n", " for index, image_data in trimg_col.items():\n", " label_data = trlabel_col[index]\n", "\n", "print(f'completed in {time.time() - start} sec')\n", "\n", "co.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inspecting state from the top!\n", "\n", "After your first commit, the summary and log methods will begin to work, and you can either print the stream to the console (as 
shown below), or you can\n", "dig deep into the internals of how hangar thinks about your data! (To be covered in an advanced tutorial later on).\n", "\n", "The point is, regardless of your level of interaction with a live hangar repository, every level of state is accessible from the top, and the top-level interface has, in general, been built to be the only way to access it directly!" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Summary of Contents Contained in Data Repository \n", " \n", "================== \n", "| Repository Info \n", "|----------------- \n", "| Base Directory: /Users/rick/projects/tensorwerk/hangar/dev/mnist \n", "| Disk Usage: 57.29 MB \n", " \n", "=================== \n", "| Commit Details \n", "------------------- \n", "| Commit: a=8eb01eaf0c657f8526dbf9a8ffab0a4606ebfd3b \n", "| Created: Tue Feb 25 19:03:06 2020 \n", "| By: Rick Izzo \n", "| Email: rick@tensorwerk.com \n", "| Message: hello world, this is my first hangar commit \n", " \n", "================== \n", "| DataSets \n", "|----------------- \n", "| Number of Named Columns: 2 \n", "|\n", "| * Column Name: ColumnSchemaKey(column=\"train_images\", layout=\"flat\") \n", "| Num Data Pieces: 50000 \n", "| Details: \n", "| - column_layout: flat \n", "| - column_type: ndarray \n", "| - schema_type: fixed_shape \n", "| - shape: (784,) \n", "| - dtype: uint8 \n", "| - backend: 00 \n", "| - backend_options: {'complib': 'blosc:lz4hc', 'complevel': 5, 'shuffle': 'byte'} \n", "|\n", "| * Column Name: ColumnSchemaKey(column=\"train_labels\", layout=\"flat\") \n", "| Num Data Pieces: 50000 \n", "| Details: \n", "| - column_layout: flat \n", "| - column_type: ndarray \n", "| - schema_type: fixed_shape \n", "| - shape: (1,) \n", "| - dtype: int64 \n", "| - backend: 10 \n", "| - backend_options: {} \n", " \n", "================== \n", "| Metadata: \n", "|----------------- \n", "| Number of Keys: 0 \n", "\n" ] } ], "source": [ 
"repo.summary()" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "* a=8eb01eaf0c657f8526dbf9a8ffab0a4606ebfd3b (\u001B[1;31mmaster\u001B[m) : hello world, this is my first hangar commit\n" ] } ], "source": [ "repo.log()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 4 }