MLC "script" automation specification
Please check the MLC documentation for more details about the MLCflow interface.
See the automatically generated catalog of all the MLC scripts.
Understanding MLC scripts
- An MLC script is identified by a set of tags and by a unique ID.
- Each MLC script can also have multiple variations. These are identified by variation tags, which are treated in the same way as tags but carry a _ prefix.
MLC script execution flow
graph TD
MLC -->|env = incoming env + env_from_meta| B[Script]
B -->|env - local_env_keys| C[List of Dependencies]
C --> D[Preprocess]
D -->|env - local_env_keys| E[Prehook dependencies]
E -->F[Run script]
F -->|env - clean_env_keys_post_deps| G[Posthook dependencies]
G --> H[Postprocess]
H -->|env - clean_env_keys_post_deps| I[Post dependencies]
I -->|"env(new_env_keys)"| J[Script return]
- When an MLC script is invoked (either by tags or by unique ID), its meta.yaml is processed first. If it lists any deps scripts, they are executed in order.
- Once all the deps scripts have been executed, the customize.py file is checked. If it exists and contains a preprocess function, that function is executed.
- Then any prehook_deps scripts mentioned in meta.yaml are executed, in the same way as deps.
- After this, the keys in the env dictionary are exported as ENV variables and the run file, if it exists, is executed.
- Once the run file has finished, any posthook_deps scripts mentioned in meta.yaml are executed, in the same way as deps.
- Then the postprocess function inside customize.py is executed, if present.
- After this stage, any post_deps scripts mentioned in meta.yaml are executed.
** If a script is already cached, the preprocess, run file and postprocess steps are not executed. Only the dependencies marked as dynamic in deps, prehook_deps, posthook_deps and post_deps are executed.
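The ordering above, including the cached case, can be sketched in plain Python. This is a simplified model rather than MLCflow's own code: the function name execution_stages, the dict layout of each dependency (a "tags" string plus an optional "dynamic" flag) and the example tags are assumptions made for illustration.

```python
# Simplified model of the stage ordering described above (not MLCflow code).

def execution_stages(meta, cached=False):
    """Return the ordered list of steps for a script described by `meta`.

    `meta` is a plain dict standing in for meta.yaml. Each dependency is a
    dict with a "tags" string and an optional "dynamic" flag (assumed layout).
    """
    def dep_steps(stage):
        steps = []
        for dep in meta.get(stage, []):
            # For a cached script only dynamic dependencies are run.
            if cached and not dep.get("dynamic", False):
                continue
            steps.append(f"{stage}:{dep['tags']}")
        return steps

    def own_step(name):
        # preprocess, run and postprocess are skipped for a cached script.
        return [] if cached else [name]

    return (
        dep_steps("deps")
        + own_step("preprocess")
        + dep_steps("prehook_deps")
        + own_step("run")
        + dep_steps("posthook_deps")
        + own_step("postprocess")
        + dep_steps("post_deps")
    )


# Hypothetical script meta used only for this example.
meta = {
    "deps": [
        {"tags": "get,example-a"},
        {"tags": "get,example-b", "dynamic": True},
    ],
    "post_deps": [{"tags": "save,example-results"}],
}
print(execution_stages(meta))
print(execution_stages(meta, cached=True))
```

With cached=True only the dynamic dependency remains in the returned list, which mirrors the note on cached scripts.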
Input flags
When we run an MLC script we can also pass inputs to it. Any input listed in the input_mapping dictionary inside meta.yaml is converted to the corresponding ENV variable.
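As a rough illustration of this mapping, the sketch below copies input flags into an env dictionary. The helper apply_input_mapping, the flag batch_size and the key MLC_EXAMPLE_BATCH_SIZE are hypothetical names, and the assumption that unmapped inputs are simply ignored is ours.

```python
def apply_input_mapping(inputs, input_mapping, env=None):
    """Copy each mapped input flag into env under its mapped key."""
    env = dict(env or {})
    for flag, env_key in input_mapping.items():
        if flag in inputs:
            env[env_key] = inputs[flag]
    return env


# Hypothetical mapping, as it might appear under input_mapping in meta.yaml.
input_mapping = {"batch_size": "MLC_EXAMPLE_BATCH_SIZE"}

# "unmapped" has no entry in input_mapping, so it does not reach env here.
env = apply_input_mapping({"batch_size": 8, "unmapped": "x"}, input_mapping)
print(env)
```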
Conditional execution of any deps, post_deps
We can use the skip_if_env dictionary inside any deps, prehook_deps, posthook_deps or post_deps entry to make its execution conditional.
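The text does not spell out how skip_if_env is evaluated, so the following is only one plausible reading: each entry maps an env key to a list of values, and the dependency is skipped when every listed key is present in env with one of those values. The helper should_skip and the key MLC_EXAMPLE_SKIP are hypothetical.

```python
def should_skip(dep, env):
    """Return True if `dep` should be skipped under the assumed semantics."""
    conditions = dep.get("skip_if_env")
    if not conditions:
        return False
    # Assumption: every listed key must be set to one of its listed values.
    return all(
        key in env and str(env[key]) in [str(v) for v in values]
        for key, values in conditions.items()
    )


# Hypothetical dependency entry with a skip condition.
dep = {
    "tags": "get,example-dep",
    "skip_if_env": {"MLC_EXAMPLE_SKIP": ["yes", "on"]},
}
print(should_skip(dep, {"MLC_EXAMPLE_SKIP": "yes"}))
print(should_skip(dep, {}))
```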
Versions
We can request a specific version of a script using version. version_max and version_min are also supported.
- When version_min is given, any version above it that is present in the cache or detected in the system can be chosen. If nothing is detected, default_version is used for installation, provided it is present and above version_min. Otherwise version_min is used as version.
- When version_max is given, any version below it that is present in the cache or detected in the system can be chosen. If nothing is detected, default_version is used for installation, provided it is present and below version_max. Otherwise version_max_usable (an additional input required with version_max) is used as version.
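The selection rules above can be modelled with a small function. This is a sketch under stated assumptions, not the actual resolver: versions are assumed to be purely numeric dotted strings, the bounds are treated as inclusive, the newest acceptable detected version is preferred, and the names parse_version and choose_version are hypothetical.

```python
def parse_version(text):
    """Turn '3.10.2' into (3, 10, 2); assumes numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def choose_version(detected, version_min=None, version_max=None,
                   version_max_usable=None, default_version=None):
    """Pick a version following the version_min / version_max rules.

    `detected` lists versions found in the cache or on the system.
    Returns (version, needs_install).
    """
    def in_range(candidate):
        parsed = parse_version(candidate)
        if version_min and parsed < parse_version(version_min):
            return False
        if version_max and parsed > parse_version(version_max):
            return False
        return True

    usable = [v for v in detected if in_range(v)]
    if usable:
        # Assumption: prefer the newest acceptable version already available.
        return max(usable, key=parse_version), False
    if default_version and in_range(default_version):
        return default_version, True
    if version_min:
        return version_min, True
    if version_max:
        return version_max_usable, True
    return default_version, True


print(choose_version(["3.8.1", "3.11.0"], version_min="3.9"))
print(choose_version([], version_max="3.9", version_max_usable="3.9.18",
                     default_version="3.12.0"))
```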
Variations
- Variations are used to customize an MLC script, and each unique combination of variations uses a unique cache entry. Each variation can turn on env keys as well as any other meta specific to it, including dependencies. Variations are turned on like tags but with a _ prefix. For example, if a script has the tags "get,myscript", the variation "test" inside it is called with the tags "get,myscript,_test".
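A minimal sketch of how such a tag string separates into script tags and variation tags, using the example from the text. The helper name split_tags is hypothetical and this is not MLCflow's own parser.

```python
def split_tags(tag_string):
    """Split 'get,myscript,_test' into script tags and variation tags."""
    script_tags, variation_tags = [], []
    for tag in tag_string.split(","):
        tag = tag.strip()
        if not tag:
            continue
        if tag.startswith("_"):
            # Variation tags carry a leading underscore, dropped here.
            variation_tags.append(tag[1:])
        else:
            script_tags.append(tag)
    return script_tags, variation_tags


print(split_tags("get,myscript,_test"))
```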
Variation groups
group is a key that maps variations into a group. At any time, only one variation from a group can be used in the variation tags. For example, cpu and cuda can both be variations under the device group, and the user can use either cpu or cuda as a variation tag, but not both.
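The one-variation-per-group rule can be checked as follows. The layout of variations_meta (variation name mapped to a dict with an optional "group" key), the helper check_groups and the ungrouped variation "fast" are assumptions for illustration.

```python
def check_groups(variations_meta, selected):
    """Return {group: variation}; raise ValueError on a group conflict."""
    chosen = {}
    for name in selected:
        group = variations_meta.get(name, {}).get("group")
        if group is None:
            continue  # ungrouped variations never conflict
        if group in chosen:
            raise ValueError(
                f"'{name}' and '{chosen[group]}' both belong to group '{group}'"
            )
        chosen[group] = name
    return chosen


# cpu and cuda share the device group; "fast" is a hypothetical ungrouped one.
variations_meta = {
    "cpu": {"group": "device"},
    "cuda": {"group": "device"},
    "fast": {},
}
print(check_groups(variations_meta, ["cpu", "fast"]))

try:
    check_groups(variations_meta, ["cpu", "cuda"])
except ValueError as err:
    print("rejected:", err)
```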
Dynamic variations
Sometimes it is impractical to list every variation a script needs. batch_size, for example, can take many different values. To handle this case, we support dynamic variations using '#', where '#' can be dynamically replaced by any string. For example, "_batch_size.8" can be used as a tag to turn on the dynamic variation "_batch_size.#".
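One way to picture the matching is shown below. The sketch assumes that '#' appears only at the end of a declared variation name and that an exact match wins over a dynamic one. The helper match_variation is hypothetical.

```python
def match_variation(tag, declared):
    """Resolve a variation tag against the declared variation names.

    Returns (declared_name, dynamic_value), where dynamic_value is None for
    an exact match, or None if no declared variation matches.
    """
    name = tag[1:] if tag.startswith("_") else tag
    if name in declared:
        return name, None
    for candidate in declared:
        if candidate.endswith("#"):
            prefix = candidate[:-1]
            # The part after the prefix is the value substituted for '#'.
            if name.startswith(prefix) and len(name) > len(prefix):
                return candidate, name[len(prefix):]
    return None


declared = ["batch_size.#", "test"]
print(match_variation("_batch_size.8", declared))
print(match_variation("_test", declared))
```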
ENV flow during MLC script execution
- During a given script execution, the incoming env dictionary is saved (saved_env) and all updates happen on a copy of it.
- Once a script execution is over (which includes all the dependent script executions as well), newly created keys and any updated keys are merged into saved_env, provided the keys are mentioned in new_env_keys.
- The same behaviour applies to the state dictionary.
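The copy-then-merge behaviour can be sketched like this. The helper run_with_env and the example keys are hypothetical, and matching new_env_keys entries with shell-style wildcards (via fnmatch) is an assumption. Entries without wildcards match as exact key names.

```python
import copy
from fnmatch import fnmatchcase


def run_with_env(saved_env, new_env_keys, script):
    """Run `script` on a copy of the env and merge back only allowed keys."""
    work_env = copy.deepcopy(saved_env)
    script(work_env)  # the script may add or change anything in its copy
    for key, value in work_env.items():
        changed = key not in saved_env or saved_env[key] != value
        # Assumption: new_env_keys entries may use shell-style wildcards.
        allowed = any(fnmatchcase(key, pattern) for pattern in new_env_keys)
        if changed and allowed:
            saved_env[key] = value
    return saved_env


def example_script(env):
    # Stand-in for a real script run; both keys are hypothetical.
    env["MLC_EXAMPLE_PATH"] = "/opt/example"
    env["MLC_TMP_SCRATCH"] = "scratch"


env = run_with_env({"MLC_INPUT": "a"}, ["MLC_EXAMPLE_*"], example_script)
print(env)
```

Only MLC_EXAMPLE_PATH is merged back here, because MLC_TMP_SCRATCH does not match any entry in new_env_keys.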
Special env keys
- Any env key matching MLC_TMP_* or MLC_GIT_* is not passed by default to any dependency. Such keys can be force passed by adding them to the force_env_keys list of the concerned dependency.
- Similarly, we can prevent any env key from being passed to a given dependency by adding the prefix of the key to the clean_env_keys list of the concerned dependency.
- --input is automatically converted to the MLC_INPUT env key.
- version is converted to MLC_VERSION, version_min to MLC_VERSION_MIN and version_max to MLC_VERSION_MAX.
- If env['MLC_GH_TOKEN']=TOKEN_VALUE is set, then git URLs (specified by MLC_GIT_URL) are changed to add this token.
- If env['MLC_GIT_SSH']=yes, then git URLs are changed from HTTPS to SSH.
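The first two rules amount to a filter on the env handed to a dependency, sketched below. The helper env_for_dependency and the MLC_EXAMPLE_* keys are hypothetical, and letting force_env_keys take precedence over clean_env_keys is an assumption the text does not confirm.

```python
# Prefixes that are withheld from dependencies by default, per the list above.
DEFAULT_BLOCKED_PREFIXES = ("MLC_TMP_", "MLC_GIT_")


def env_for_dependency(env, force_env_keys=(), clean_env_keys=()):
    """Return the subset of env that a dependency would receive."""
    passed = {}
    for key, value in env.items():
        if key in force_env_keys:
            # Assumption: an explicitly forced key always gets through.
            passed[key] = value
            continue
        if key.startswith(DEFAULT_BLOCKED_PREFIXES):
            continue
        if any(key.startswith(prefix) for prefix in clean_env_keys):
            continue
        passed[key] = value
    return passed


env = {
    "MLC_TMP_DIR": "/tmp/x",
    "MLC_GIT_URL": "https://example.com/repo.git",
    "MLC_EXAMPLE_A": "1",
    "MLC_OTHER": "2",
}
dep_env = env_for_dependency(
    env,
    force_env_keys=["MLC_GIT_URL"],
    clean_env_keys=["MLC_EXAMPLE_"],
)
print(dep_env)
```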
Script Meta
Special keys in script meta
- TBD: reuse_version, inherit_variation_tags, update_env_tags_from_env
How does the cache work?
- If cache=true is set in a script meta, the result of the script execution is cached for further use.
- For a cached script, env and state updates are done using the new_env and new_state dictionaries, which are stored in the cm-cached.json file inside the cached folder.
- By using the --new input, a new cache entry can be forced even when an old one exists.
- By default, no dependencies are run for a cached entry unless the dynamic key is set for them.
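Reusing a cached result can be pictured as reading cm-cached.json and applying its contents. The sketch assumes new_env and new_state are top-level keys in that file and that a shallow update is sufficient. The helper apply_cached_result and the example values are hypothetical.

```python
import json
import os
import tempfile


def apply_cached_result(cached_dir, env, state):
    """Update env and state from the cm-cached.json file in `cached_dir`."""
    with open(os.path.join(cached_dir, "cm-cached.json")) as f:
        cached = json.load(f)
    # Assumption: new_env and new_state are top-level keys of the file.
    env.update(cached.get("new_env", {}))
    state.update(cached.get("new_state", {}))
    return env, state


# Build a throwaway cache folder so the example is self-contained.
with tempfile.TemporaryDirectory() as cached_dir:
    with open(os.path.join(cached_dir, "cm-cached.json"), "w") as f:
        json.dump(
            {
                "new_env": {"MLC_EXAMPLE_PATH": "/opt/example"},
                "new_state": {"example": {"ready": True}},
            },
            f,
        )
    env, state = apply_cached_result(cached_dir, {"MLC_INPUT": "a"}, {})

print(env)
print(state)
```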
Please see here for trying MLC scripts.
© 2022-25 MLCommons