MLCube concepts¶
Command Line Arguments¶
MLCube® runtime and MLCube runners accept multiple command line arguments. They can be classified into two categories:
- Fixed command-specific parameters such as
--mlcube
,--platform
and--task
for the MLCube'srun
command, orcreate_platform
andrename_platform
for theconfig
command. - Parameters that override system settings or MLCube configuration. These parameters start with
-P
and should follow OmegaConf's format (MLCube uses this library to manage its settings and configurations). For instance, to override docker runner settings and instruct it to always build MLCube images, one should provide the following command line argument:-Pdocker.build_strategy=always
.
These command line arguments override system settings and MLCube configuration parameters. The Command Line Interface section provides detailed description of MLCube commands and their arguments.
Effective MLCube configuration¶
Effective MLCube configuration
is the actual configuration that MLCube runners use to run MLCubes. This effective
configuration is built using the following algorithm. A platform configuration for a runner is retrieved from system
settings (users provide the platform name on a command line by passing --platform=PLATFORM_NAME
argument). Then,
the MLCube configuration is loaded from the MLCube project to run, and this configuration can override default
configuration retrieved from system settings. The source for this configuration is specified by a user on a command line
using --mlcube=MLCUBE_PATH
argument. The third source of configuration is the command line. Users can provide
configuration parameters that override default behavior of a MLCube runner, or default parameters of the MLCube project.
These parameters start with -P
, for instance, -Pdocker.build_strategy=always
. These parameters have the highest
priority and override any other parameters loaded so far.
MLCube Configuration¶
MLCube Configuration
provide MLCube-specific configuration, such as implemented tasks
and, optionally, specific
platform (hardware) requirements, for instance, GPU and host memory required to run the tasks. This configuration
overrides system settings. For MLCubes that are distributed via GitHub (with source files), this configuration is
stored in a YAML file with default location being ${MLCUBE_ROOT}/mlcube.yaml
. The
MLCube configuration
section provides detailed description.
MLCube Configuration Parameter¶
MLCube configuration paramter
is a configuration parameter for MLCube runners or MLCube projects that has (usually)
a type from the following set: (integer, floating point number, string, bool). Every MLCube runner and all MLCube
projects have such parameters, usually organized in a hierarchy. Users can also provide these parameters on a command
line when they interact with MLCube runtime to override default values for these parameters. MLCube uses
OmegaConf library to manage its configuration. When users provide these parameters
on a command line, they need to follow OmegaConf rules, in particular, nested parameters should use .
symbol. Also,
when providing these parameters on a command line, these parameters must have -P
prefix. Several examples:
# Overriding top level parameter. Here, the `description` is a
# parameter in a global namespace.
-Pdescription="MLCube project description"
# Overriding nested parameter. Here, the `build_strategy` is a
# parameter defined in the `docker` namespace.
-Pdocker.build_strategy=always
MLCube Home Directory¶
MLCube home directory
is the synonym for MLCube Root Directory.
MLCube Runtime¶
The MLCube Runtime
term is used to describe the core MLCube library with MLCube runners. MLCube runtime is responsible
for managing MLCube system settings and MLCube configurations, and run MLCubes in various environments.
MLCube Root Directory¶
MLCube root directory
is the directory that contains MLCube configuration file (mlcube.yaml
). This definition
applies to MLCubes that are distributed, for instance, via GitHub.
MLCubes¶
The term MLCubes
(or MLCube project
in singular form) refers to Machine Learning projects packaged and distributed
using MLCube stack.
Platform¶
A platform
is a configured instance of a MLCube runner. Every runner has a default configured instance with the same
name as runner. For instance, the MLCube docker runner (named docker
) has a default platform named docker
as well.
Users may find it useful to have multiple configured instances (platforms) of one MLCube runner. For instance, if a user
has personal and corporate accounts in some cloud provider, they can have multiple platforms - one for each account.
Users directly interact with platforms via command line argument --platform
. System settings provide platform
configurations, and users can manually edit system settings to add new platforms, or use MLCube's runtime config
command to perform basic operations with system settings, such as creating a new platform. See System Settings
description for more detailed explanation of MLCube runners and platforms, and how they relate to each other.
Runner¶
MLCube runners
are workhorses of MLCube runtime. They run MLCubes in different environments, such as docker and
singularity, remote on-prem or cloud compute nodes, and orchestration engines such as Kubernetes and KubeFlow. As part
of MLCube ecosystem, we provide multiple MLCube reference runners. Users do not directly interact with MLCube runners,
instead they interact with platforms
, which are configured instances of MLCube runners.
System Settings¶
MLCube System Settings
configure MLCube and MLCube runners at a system level. The term system level
here implies
that these settings are not tied to particular MLCubes (MLCube compliant ML projects). Instead, these settings are
used by MLCube runners on every machine where MLCube runtime is configured to use these settings. By default, system
settings are stored in a YAML file with default location being ${MLCUBE_ROOT}/mlcube.yaml
. The location can be
overriden by exporting the MLCUBE_SYSTEM_SETTINGS
environment variable. Detailed description of system settings
is here.
Task¶
MLCube projects expose their functionality via tasks
. A task implements one particular function, such as downloading
a machine learning dataset, preprocessing this dataset, training a machine learning model, testing a machine learning
model or registering a machine learning model with the external model registry. It's up to developers to decide how they
want to organize their projects into tasks. A close analogy would be machine learning pipelines composed of multiple
steps organized into directed acyclic graph. Tasks in MLCubes are like steps on these pipelines, except that MLCube
runtime at this point is not aware about task dependencies, and so the MLCube task model could be described as
bag of tasks
(similarly to bag of words
term used in natural language processing to describe machine learning models
that do not take into account positions of words in sentences).
The MLCube examples project implements several MLCubes, including
MNIST MLCube that implements two tasks:
download
(download MNIST dataset) and train
(train a simple classifier).
Users can can instruct MLCube runtime to execute a particular task or tasks by providing --task
command line argument:
mlcube run --mlcube=. --task=download --platform=docker
: execute one task (download
).mlcube run --mlcube=. --task=download,train --platform=docker
: execute two tasks named(download
andtrain
).
MLCube runtime executes tasks in the order provided by users. In the above example, MLCube will run the download
task,
and then - the train
task.
Workspace¶
A workspace
is a directory where input and output artifacts are stored. By default, its location is
${MLCUBE_ROOT}/workspace
. Users can override this parameter on a command line by providing the --workspace
argument.
Users need to provide this parameter each time they run MLCube task, even when these tasks are logically grouped into
one execution. A better alternative would be to run multiple tasks at the same time (see task section).