get-dataset-openorca

Automatically generated README for this automation recipe: get-dataset-openorca

Category: AI/ML datasets

License: Apache 2.0

CM meta description for this script: _cm.json
Output cached? True

Reuse this script in your project

Install MLCommons CM automation meta-framework

Pull CM repository with this automation recipe (CM script)

cm pull repo mlcommons@cm4mlops

Print CM help from the command line

cmr "get dataset openorca language-processing original" --help

Run this script


Run this script via CLI

cm run script --tags=get,dataset,openorca,language-processing,original[,variations]

Run this script via CLI (alternative)

cmr "get dataset openorca language-processing original [variations]"

Run this script from Python

import cmind

r = cmind.access({'action': 'run',
                  'automation': 'script',
                  'tags': 'get,dataset,openorca,language-processing,original',
                  'out': 'con',
                  # ...
                  # (other input keys for this script)
                  # ...
                  })

# A non-zero 'return' code signals failure; the message is in r['error']
if r['return'] > 0:
    print(r['error'])
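
The return-code check above can be wrapped in a small helper. The following is a minimal sketch, not part of CM: the name check_cm_result is hypothetical, and it assumes only the convention visible in the snippet, namely that the result is a dictionary with an integer 'return' key and, on failure, an 'error' message. The result dictionaries below are stand-ins so that the sketch runs without CM installed.

```python
def check_cm_result(r):
    """Raise RuntimeError if a CM-style result dictionary reports failure.

    Hypothetical helper: assumes only the {'return': int, 'error': str}
    convention used in the snippet above.
    """
    if r.get('return', 0) > 0:
        raise RuntimeError(r.get('error', 'unknown error'))
    return r


# Stand-in for a successful call: the dictionary is passed through unchanged.
ok = check_cm_result({'return': 0})

# Stand-in for a failed call: the error text is surfaced as an exception.
try:
    check_cm_result({'return': 1, 'error': 'example error message'})
except RuntimeError as exc:
    print('CM call failed:', exc)
```

In real use, the argument would be the value returned by cmind.access(...).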

Run this script via Docker (beta)

cm docker script "get dataset openorca language-processing original [variations]"


Variations

Group "dataset-type"
- _calibration
  - ENV variables:
    - CM_DATASET_CALIBRATION: yes
- _validation (default)
  - ENV variables:
    - CM_DATASET_CALIBRATION: no
Group "size"
- _500
  - ENV variables:
    - CM_DATASET_SIZE: 500
- _60 (default)
  - ENV variables:
    - CM_DATASET_SIZE: 60
- _full
  - ENV variables:
    - CM_DATASET_SIZE: 24576
- _size.#
  - ENV variables:
    - CM_DATASET_SIZE: #

Default variations

_60,_validation
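
To make the variation table concrete, the sketch below mirrors it in plain Python. It is an illustration only, not CM's own resolver: the names VARIATION_ENV, DEFAULTS, group_of, resolve_env and build_tags are hypothetical, and the handling of _size.# assumes that the text after the dot is substituted for # in CM_DATASET_SIZE.

```python
BASE_TAGS = "get,dataset,openorca,language-processing,original"

# Environment variables set by each fixed variation, copied from the table.
VARIATION_ENV = {
    "_calibration": {"CM_DATASET_CALIBRATION": "yes"},
    "_validation": {"CM_DATASET_CALIBRATION": "no"},
    "_500": {"CM_DATASET_SIZE": "500"},
    "_60": {"CM_DATASET_SIZE": "60"},
    "_full": {"CM_DATASET_SIZE": "24576"},
}

GROUPS = {
    "dataset-type": {"_calibration", "_validation"},
    "size": {"_500", "_60", "_full"},
}

# One default per group, as listed under "Default variations".
DEFAULTS = {"dataset-type": "_validation", "size": "_60"}


def group_of(variation):
    """Return the group a variation belongs to."""
    if variation.startswith("_size."):
        return "size"
    for group, members in GROUPS.items():
        if variation in members:
            return group
    raise ValueError(f"unknown variation: {variation}")


def resolve_env(variations):
    """Apply the defaults, then let each requested variation replace its group's default."""
    chosen = dict(DEFAULTS)
    for variation in variations:
        chosen[group_of(variation)] = variation
    env = {}
    for variation in chosen.values():
        if variation.startswith("_size."):
            # Assumption: "#" stands for whatever follows "_size."
            env["CM_DATASET_SIZE"] = variation[len("_size."):]
        else:
            env.update(VARIATION_ENV[variation])
    return env


def build_tags(variations):
    """Append variations to the base tags, as in the --tags=... form."""
    return ",".join([BASE_TAGS, *variations])


print(build_tags(["_calibration", "_500"]))
print(resolve_env(["_calibration", "_500"]))
```

With no variations requested, resolve_env([]) yields the values of the two defaults, _60 and _validation.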

Default environment

These keys can be updated via --env.KEY=VALUE, via the env dictionary in @input.json, or via script flags.

CM_DATASET_CALIBRATION: no
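
As a rough illustration of the first two override routes, the sketch below renders one override both as an input dictionary with an env entry and as --env.KEY=VALUE flags. The helper names make_input and env_flags are hypothetical, and the exact set of keys CM expects in @input.json is an assumption here: only tags and env are shown.

```python
import json

TAGS = "get,dataset,openorca,language-processing,original"


def make_input(env_overrides):
    """Build an input dictionary carrying env overrides (hypothetical helper)."""
    return {"tags": TAGS, "env": dict(env_overrides)}


def env_flags(env_overrides):
    """Render the same overrides in the --env.KEY=VALUE form."""
    return [f"--env.{key}={value}" for key, value in env_overrides.items()]


overrides = {"CM_DATASET_CALIBRATION": "yes"}

# Candidate content for an input.json file.
print(json.dumps(make_input(overrides), indent=2))

# The same override expressed as command-line flags.
print(" ".join(env_flags(overrides)))
```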

Script output

cmr "get dataset openorca language-processing original [variations]"  -j