MLCube Configuration

The MLCube® configuration provides information about an MLCube's authors, requirements, and tasks. Here is an example configuration for the MNIST MLCube:
```yaml
name: mnist
description: MLCommons MNIST MLCube example
authors:
  - {name: "First Second", email: "first.second@company.com", org: "Company Inc."}

platform:
  accelerator_count: 0
  accelerator_maker: NVIDIA
  accelerator_model: A100-80GB
  host_memory_gb: 40
  need_internet_access: True
  host_disk_space_gb: 100

docker:
  image: mlcommons/mnist:0.0.1

singularity:
  image: mnist-0.0.1.sif

tasks:
  download:
    parameters:
      inputs:
        data_config: {type: file, default: data.yaml}
      outputs:
        data_dir: {type: directory, default: data}
        log_dir: {type: directory, default: logs}
  train:
    parameters:
      inputs:
        data_dir: {type: directory, default: data}
        train_config: {type: file, default: train.yaml}
      outputs:
        log_dir: {type: directory, default: logs}
        model_dir: {type: directory, default: model}
```
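Since the configuration is plain YAML, it can be inspected with any YAML parser. Below is a quick sketch using PyYAML (an assumption — any YAML library works) that walks the `tasks` section of a fragment of the configuration above:

```python
import yaml  # PyYAML (third-party; `pip install pyyaml`)

# A fragment of the MNIST MLCube configuration shown above.
CONFIG = """
name: mnist
tasks:
  download:
    parameters:
      inputs:
        data_config: {type: file, default: data.yaml}
      outputs:
        data_dir: {type: directory, default: data}
  train:
    parameters:
      inputs:
        data_dir: {type: directory, default: data}
      outputs:
        model_dir: {type: directory, default: model}
"""

config = yaml.safe_load(CONFIG)

# Each task maps to a parameter specification with optional inputs/outputs.
for task_name, task in config["tasks"].items():
    params = task.get("parameters", {})
    print(task_name,
          sorted(params.get("inputs", {})),
          sorted(params.get("outputs", {})))
```

This prints one line per task with its input and output parameter names, mirroring the structure described in the Tasks section below.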
Metadata

The MLCube configuration can contain metadata about MLCube developers. The following fields are allowed:

- `name` (type=string): MLCube name.
- `description` (type=string): MLCube description.
- `authors` (type=list): List of MLCube developers / authors. Each item is a dictionary with the following fields:
    - `name` (type=string): Author full name.
    - `email` (type=string): Author email.
    - `org` (type=string): Author affiliation.
Resources

The optional `platform` section can provide information about resources that MLCubes require.

Warning: Parameters defined in this section are not supported yet by MLCube runners.

This section is intended to be used by MLCube runners. For instance, cloud runners can use information about accelerators, disk space, and memory to provision appropriate resources. The exact fields of this section are to be defined.
Tasks

The `tasks` section describes what is implemented in an MLCube. This section is a dictionary that maps a task name to a task configuration. In the example above, two tasks are defined: `download` and `train`.
Each task configuration is a dictionary with two parameters:

- `entrypoint` (type=string): Optional task-specific entrypoint (e.g., an executable script inside the MLCube container). If not present, it is assumed that a global entry point is defined (for instance, via Docker's entry point configuration; see the getting-started example referenced below).
- `parameters` (type=dictionary): Optional specification of input and output parameters. If present, it can contain two optional fields, `inputs` and `outputs`, specifying the task's input and output parameters. Each specification is a dictionary mapping a parameter name to a parameter description. In the example above, the `download` task defines one input parameter (`data_config`) and two output parameters (`data_dir` and `log_dir`). Each parameter description is a dictionary with the following fields:
    - `type` (type=string): The parameter type; must be one of `file` or `directory`.
    - `default` (type=string):
      The parameter value: a path to a directory or a path to a file.
        - Paths can contain `~` (the user home directory) and environment variables (e.g., `${HOME}`). MLCube does not encourage the use of environment variables, since they make an MLCube less portable and reproducible; the use of `~` should be fine, though.
        - Paths can be absolute or relative. Relative paths are always relative to the current MLCube workspace directory. In the example above, the `data_config` parameter's default value for the `download` task is a short form of `${workspace}/data.yaml`.
    - `opts` (type=string): This optional field specifies file or path access options (e.g., mount options for container runtimes). Valid values are `rw` (read and write) and `ro` (read only). When a parameter is a file, these options are set for the volume associated with the file's parent directory. When the read-only option is specified for an output parameter, the MLCube runner will use it and will log a message. When conflicting options are found, MLCube will log a warning message and will use the `rw` option.
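Putting the `entrypoint`, path, and `opts` fields together, a task definition might look like the following sketch. Note that the task name, script path, and parameter names here are illustrative, not part of the MNIST example above:

```yaml
tasks:
  evaluate:
    # Task-specific entrypoint overriding the image's global entry point.
    entrypoint: /workspace/evaluate.sh
    parameters:
      inputs:
        # Short form of ${workspace}/model; mounted read-only.
        model_dir: {type: directory, default: model, opts: ro}
        # Paths may use ~ (user home directory), though this is less portable.
        raw_data: {type: directory, default: ~/datasets/mnist}
      outputs:
        # Relative path; resolves to ${workspace}/reports/metrics.txt.
        metrics_file: {type: file, default: reports/metrics.txt}
```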
Examples
More example configurations of MLCubes can be found in the mlcube_examples repository. In particular, the getting-started example shows the use of the entrypoint specification.