Hangar External

High level interaction interface between hangar and everything external.

High Level Methods

High-level methods let users interact with hangar without diving into its internal methods. Four basic entry points are exposed as high-level methods:

  1. load()
  2. save()
  3. show()
  4. board_show()

These entry points are not capable of doing anything by themselves. Instead, they dispatch to the methods of the same name in the hangar.external plugins available on PyPI. The CLI uses these high-level entry points for import, export and view operations, and the hangarboard uses them for visualization (via board_show).

board_show(arr: numpy.ndarray, plugin: str = None, extension: str = None, **plugin_kwargs)

Wrapper to convert the numpy array using the board_show method of the plugin to make it displayable in the web UI

Parameters:
  • arr (numpy.ndarray) – Data to process into some human understandable representation.
  • plugin (str, optional) – Name of the plugin to use. By default, the preferred plugins for the given file format are tried until a suitable one is found. This cannot be None if extension is also None.
  • extension (str, optional) – Format of the file. This is used to infer which plugin to use in case the plugin name is not provided. This cannot be None if plugin is also None.
Other Parameters:
  • plugin_kwargs (dict) – Plugin-specific keyword arguments. If the function is called from the command line, all unknown keyword arguments are collected as plugin_kwargs.
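
The selection rule described for plugin and extension can be sketched as follows. This is only an illustration under stated assumptions, not hangar's actual implementation: the registry PREFERRED, the plugin names in it, and the helper resolve_plugin are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical registry: file extension -> plugin names in order of preference.
PREFERRED: Dict[str, List[str]] = {
    "jpg": ["image_plugin_a", "image_plugin_b"],
    "csv": ["tabular_plugin"],
}


def resolve_plugin(method: str,
                   available: Dict[str, List[str]],
                   plugin: Optional[str] = None,
                   extension: Optional[str] = None) -> str:
    """Pick a plugin name for `method` ('load', 'save', 'show', 'board_show').

    `available` maps a plugin name to the list of methods it provides.
    """
    if plugin is None and extension is None:
        raise ValueError("`plugin` and `extension` cannot both be None")
    if plugin is not None:
        # An explicitly named plugin must provide the requested method.
        if method not in available.get(plugin, []):
            raise ValueError(f"plugin {plugin!r} does not provide {method!r}")
        return plugin
    # No explicit plugin: try the preferred plugins for this extension in order.
    for name in PREFERRED.get(extension.lstrip(".").lower(), []):
        if method in available.get(name, []):
            return name
    raise ValueError(f"no plugin provides {method!r} for extension {extension!r}")


# Hypothetical set of installed plugins and the methods each one provides.
available = {
    "image_plugin_a": ["load", "save"],
    "image_plugin_b": ["load", "save", "board_show"],
    "tabular_plugin": ["load", "save"],
}
print(resolve_plugin("board_show", available, extension="jpg"))
```

Here the first preferred plugin for jpg does not provide board_show, so the second one is selected.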

load(fpath: str, plugin: str = None, extension: str = None, **plugin_kwargs) → Tuple[numpy.ndarray, str]

Wrapper to load data from file into memory as numpy arrays using plugin’s load method

Parameters:
  • fpath (str) – Data file path, e.g. path/to/test.jpg
  • plugin (str, optional) – Name of the plugin to use. By default, the preferred plugins for the given file format are tried until a suitable one is found. This cannot be None if extension is also None.
  • extension (str, optional) – Format of the file. This is used to infer which plugin to use in case the plugin name is not provided. This cannot be None if plugin is also None.
Other Parameters:
  • plugin_kwargs (dict) – Plugin-specific keyword arguments. If the function is called from the command line, all unknown keyword arguments are collected as plugin_kwargs.

Returns:

img_array – Data returned from the given plugin. Per the signature above, it is returned together with a str, the sample name.

Return type:

Tuple[numpy.ndarray, str]

save(arr: numpy.ndarray, outdir: str, sample_det: str, extension: str, plugin: str = None, **plugin_kwargs)

Wrapper around the plugin save methods, which dump a numpy.ndarray to disk.

Parameters:
  • arr (numpy.ndarray) – Numpy array to be saved to file
  • outdir (str) – Target directory
  • sample_det (str) – Sample name and type of the sample name formatted as sample_name_type:sample_name
  • extension (str) – Format of the file. This is used to infer which plugin to use in case plugin name is not provided. This cannot be None if plugin is also None
  • plugin (str, optional) – Name of the plugin to use. By default, the preferred plugins for the given file format are tried until a suitable one is found. This cannot be None if extension is also None.
Other Parameters:
  • plugin_kwargs (dict) – Plugin-specific keyword arguments. If the function is called from the command line, all unknown keyword arguments are collected as plugin_kwargs.

Notes

Neither the CLI nor this method creates the file name to save to. Instead, once they have verified that the given outdir is a valid directory, they pass the required details downstream to the plugin, which builds the file name. This is because some data entries map one entry to one file (like images), while others place multiple entries in a single file (like CSV). Given these ambiguous cases, it is more sensible to let the plugin handle the files accordingly.

show(arr: numpy.ndarray, plugin: str = None, extension: str = None, **plugin_kwargs)

Wrapper to display numpy.ndarray via plugin show method.

Parameters:
  • arr (numpy.ndarray) – Data to process into some human understandable representation.
  • plugin (str, optional) – Name of the plugin to use. By default, the preferred plugins for the given file format are tried until a suitable one is found. This cannot be None if extension is also None.
  • extension (str, optional) – Format of the file. This is used to infer which plugin to use in case the plugin name is not provided. This cannot be None if plugin is also None.
Other Parameters:
  • plugin_kwargs (dict) – Plugin-specific keyword arguments. If the function is called from the command line, all unknown keyword arguments are collected as plugin_kwargs.

Plugin System

Hangar’s external plugin system is designed to make it easy for users to write custom plugins for custom data formats. External plugins should be installable Python packages and should make themselves discoverable using package metadata. Detailed documentation can be found in the official Python documentation. To give a head start and to avoid going through this somewhat complex process, we have made a cookiecutter package. All hangar plugins follow a naming standard similar to Flask plugins, i.e. hangar_pluginName.

class BasePlugin(provides, accepts)

Base plugin class from which all external plugins should inherit.

Child classes can expose four methods: load, save, show and board_show. These are considered valid methods and should be passed as the first argument when initializing the parent from the child. The child should also inform the parent about the acceptable file formats by passing them as the second argument. BasePlugin accepts provides and accepts on init and exposes them, and the plugin manager then uses them while loading the modules. BasePlugin also provides the sample_name function to figure out the sample name from the file path. The load method uses this function to return the sample name, which hangar then uses as a key to save the data.
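
The way a child class reports its methods and file formats to the parent can be sketched as below. To keep the sketch self-contained, BasePlugin here is a minimal stand-in written only for this example, not hangar's class, and TextPlugin is a hypothetical plugin.

```python
class BasePlugin:
    """Stand-in for hangar's BasePlugin, written only for this sketch."""

    def __init__(self, provides, accepts):
        self.provides = provides  # names of the methods the plugin exposes
        self.accepts = accepts    # file formats the plugin can handle


class TextPlugin(BasePlugin):
    """Hypothetical plugin that exposes load and save for .txt files."""

    def __init__(self):
        # First argument: the exposed methods. Second: the accepted formats.
        super().__init__(provides=["load", "save"], accepts=["txt"])

    def load(self, fpath, *args, **kwargs):
        # A real plugin would return a numpy.ndarray and a sample name here.
        raise NotImplementedError

    def save(self, arr, outdir, sample_detail, extension, *args, **kwargs):
        # A real plugin would build the output path and write `arr` to disk.
        raise NotImplementedError


plugin = TextPlugin()
print(plugin.provides, plugin.accepts)
```

A plugin manager could then inspect provides and accepts to decide which plugin handles a given method and file format.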

board_show(arr, *args, **kwargs)

Show/display data in hangarboard format.

Hangarboard is capable of displaying the three most common data formats: image, text and audio. This function should process the input numpy.ndarray data and convert it to one of the supported formats.

load(fpath, *args, **kwargs)

Load some data file on disk to recover it in numpy.ndarray form.

Loads the data from disk for the given file path and returns it as a numpy.ndarray along with the name of the data sample. Names returned from this function will be used by the import CLI system as the key for the returned data. This function can return either a single (numpy.ndarray, sample name) combination or a generator that produces such combinations. The generator form helps when the input file is not a single data entry like an image but holds multiple data points, as CSV files do.

An example implementation that returns a single data point:

def load(self, fpath, *args, **kwargs):
    # create_np_array and create_sample_name are placeholders for plugin logic
    data = create_np_array(fpath)
    name = create_sample_name(fpath)  # could use `self.sample_name`
    return data, name

An example implementation that returns a generator could look like this:

def load(self, fpath, *args, **kwargs):
    # Iterate over the lines of the file, not the characters of its name
    with open(fpath) as f:
        for i, line in enumerate(f):
            data = create_np_array(line)
            name = create_sample_name(fpath, i)
            yield data, name

static sample_name(fpath: os.PathLike) → str

Derive the sample name from the file path.

This function comes in handy since the load() method needs to yield or return both data and sample name. If there are no specific requirements regarding sample name creation, you can use this function, which removes the extension from the file name and returns just the name. For example, if the file path is /path/to/myfile.ext, it returns myfile.

Parameters:
  • fpath (os.PathLike) – Path to the file which is being loaded by load

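
The behavior described for sample_name can be reproduced with the standard library. The function below is a sketch of that behavior written for this example, not a copy of hangar's code.

```python
import os


def sample_name(fpath: os.PathLike) -> str:
    """Return the file name without its directory or its extension."""
    return os.path.splitext(os.path.basename(fpath))[0]


print(sample_name("/path/to/myfile.ext"))  # myfile
```

Note that only the last extension is removed in this sketch, so archive.tar.gz would give archive.tar.
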
save(arr, outdir, sample_detail, extension, *args, **kwargs)

Save data in a numpy.ndarray to a specific file format on disk.

If the plugin is developed for files like CSV, JSON, etc., where multiple data entries go to the same file, this function should check whether the file already exists and whether it should modify or append the new data entry to the structure, instead of overwriting it or throwing an exception.

Note

The name of the file and the whole path to save the data should be constructed by this function. This can be done using the information it receives as arguments, such as outdir, sample_detail and extension. This has been offloaded to this function instead of being handled earlier because decisions like whether multiple data entries should go to a single file or to multiple files cannot be predicted beforehand, as they are always data specific (and hence plugin specific).
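
A minimal sketch of a save step that builds its own path and appends to an existing file rather than overwriting it is shown below. It uses only the standard library, takes a plain sequence of numbers as a stand-in for a 1-D numpy.ndarray, and the helper name save_row, the file name data.csv and the column headers are all assumptions made for illustration.

```python
import csv
import os
import tempfile


def save_row(row, outdir, sample_detail, extension="csv"):
    """Append one data entry to a shared CSV file, creating it if needed."""
    if not os.path.isdir(outdir):
        raise NotADirectoryError(outdir)
    # Hypothetical layout: every entry goes into one `data.<extension>` file.
    fpath = os.path.join(outdir, f"data.{extension}")
    is_new = not os.path.exists(fpath)
    # Mode "a" appends to an existing file instead of overwriting it.
    with open(fpath, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            # Write a header only when the file is first created.
            writer.writerow(["sample"] + [f"v{i}" for i in range(len(row))])
        writer.writerow([sample_detail] + list(row))
    return fpath


with tempfile.TemporaryDirectory() as outdir:
    save_row([1.0, 2.0], outdir, "str:sample1")
    path = save_row([3.0, 4.0], outdir, "str:sample2")
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    print(len(rows))  # one header row plus two data entries
```

A plugin for one-entry-per-file formats such as images would instead derive a distinct file name from sample_detail for each call.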

Note

If the call to this function is initiated by the CLI, the sample_detail argument will be a string formatted as sample_name_type:sample_name. For example, if the sample name is sample1 (and the type of the sample name is str), then sample_detail will be str:sample1. This avoids the ambiguity that could arise from having both the integer and the string form of a number as sample names (e.g. if column[123] and column["123"] both exist). Formatting sample_detail into a proper file name (not required) is up to the plugin developer.
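
One way a plugin might handle sample_detail is sketched below. The helpers parse_sample_detail and to_filename are hypothetical, and the type token int is an assumption based on the str example above; only the sample_name_type:sample_name layout comes from this page.

```python
import re


def parse_sample_detail(sample_detail: str):
    """Split 'type:name' and return the name converted to the given type."""
    # partition splits at the first ':' only, so the name may contain colons.
    type_name, sep, name = sample_detail.partition(":")
    if not sep:
        raise ValueError(f"expected 'type:name', got {sample_detail!r}")
    if type_name == "int":  # assumed token for integer sample names
        return int(name)
    if type_name == "str":
        return name
    raise ValueError(f"unsupported sample name type {type_name!r}")


def to_filename(sample_detail: str, extension: str) -> str:
    """Build a file name, replacing characters such as ':' with '_'."""
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", sample_detail)
    return f"{safe}.{extension}"


print(repr(parse_sample_detail("str:123")), repr(parse_sample_detail("int:123")))
print(to_filename("str:sample1", "jpg"))
```

Keeping the type prefix in the file name preserves the distinction between the integer and string forms of the same sample name.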

show(arr, *args, **kwargs)

Show/Display the data to the user.

This function should process the input numpy.ndarray and show it to the user using a data-dependent display mechanism. A good example of such a system is matplotlib.pyplot’s plt.show, which displays the image data inline in the running terminal / kernel UI.