get-preprocessed-dataset-squad
Automatically generated README for this automation recipe: get-preprocessed-dataset-squad
Category: AI/ML datasets
License: Apache 2.0
- CM meta description for this script: _cm.yaml
- Output cached? True
Reuse this script in your project
Install MLCommons CM automation meta-framework
Pull CM repository with this automation recipe (CM script)
cm pull repo mlcommons@cm4mlops
Print CM help from the command line
cmr "get dataset preprocessed tokenized squad" --help
Run this script
Run this script via CLI
cm run script --tags=get,dataset,preprocessed,tokenized,squad[,variations]
Run this script via CLI (alternative)
cmr "get dataset preprocessed tokenized squad [variations]"
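In both CLI forms, the [variations] placeholder is filled with underscore-prefixed variation names such as _packed or _seq-length.512. The sketch below only builds the command strings and does not run CM. It assumes that variations are appended as extra comma-separated tags in the --tags form and as extra space-separated words in the cmr form. The helper names cli_tags and cmr_command are hypothetical.

```python
# Hypothetical helpers that only build command strings; they do not run CM.
BASE_TAGS = ["get", "dataset", "preprocessed", "tokenized", "squad"]


def cli_tags(variations):
    """Return the value for `cm run script --tags=...`.

    Each variation must already carry its leading underscore, e.g. "_packed".
    """
    for v in variations:
        if not v.startswith("_"):
            raise ValueError(f"variation {v!r} must start with '_'")
    return ",".join(BASE_TAGS + list(variations))


def cmr_command(variations):
    """Return the equivalent `cmr "..."` command line as a string."""
    words = " ".join(BASE_TAGS + list(variations))
    return f'cmr "{words}"'


print("cm run script --tags=" + cli_tags(["_packed", "_seq-length.512"]))
print(cmr_command(["_packed", "_seq-length.512"]))
```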
Run this script from Python
import cmind

r = cmind.access({'action': 'run',
                  'automation': 'script',
                  'tags': 'get,dataset,preprocessed,tokenized,squad',
                  'out': 'con',
                  # ...
                  # (other input keys for this script)
                  # ...
                  })

if r['return'] > 0:
    print(r['error'])
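Variations can be selected from Python by appending the same underscore-prefixed names to the 'tags' value of the call above. This is a sketch that assumes the cmind package is installed and the repository has been pulled. It uses only the input keys shown above. The names build_run_input and run_squad_preprocessing are hypothetical.

```python
def build_run_input(variations=(), out="con"):
    """Build the input dict for cmind.access() with optional variation tags."""
    tags = ["get", "dataset", "preprocessed", "tokenized", "squad"]
    tags.extend(variations)  # e.g. "_packed", "_calib1"
    return {
        "action": "run",
        "automation": "script",
        "tags": ",".join(tags),
        "out": out,
    }


def run_squad_preprocessing(variations=()):
    """Run the recipe via the CM Python API (requires the cmind package)."""
    import cmind  # imported lazily so build_run_input() works without CM

    r = cmind.access(build_run_input(variations))
    if r["return"] > 0:
        print(r["error"])
    return r


# Only builds the request; call run_squad_preprocessing(...) to execute it.
print(build_run_input(["_packed", "_calib1"])["tags"])
```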
Run this script via Docker (beta)
cm docker script "get dataset preprocessed tokenized squad [variations]"
Variations
Group "calibration-set"
_calib1
- ENV variables:
  - CM_DATASET_SQUAD_CALIBRATION_SET: one
_calib2
- ENV variables:
  - CM_DATASET_SQUAD_CALIBRATION_SET: two
_no-calib (default)
- ENV variables:
  - CM_DATASET_SQUAD_CALIBRATION_SET: `` (empty)
Group "doc-stride"
_doc-stride.# (replace # with a number, e.g. _doc-stride.64)
- ENV variables:
  - CM_DATASET_DOC_STRIDE: #
_doc-stride.128 (default)
- ENV variables:
  - CM_DATASET_DOC_STRIDE: 128
Group "packing"
_packed
- ENV variables:
  - CM_DATASET_SQUAD_PACKED: yes
Group "raw"
_pickle
- ENV variables:
  - CM_DATASET_RAW: no
_raw (default)
- ENV variables:
  - CM_DATASET_RAW: yes
Group "seq-length"
_seq-length.# (replace # with a number, e.g. _seq-length.512)
- ENV variables:
  - CM_DATASET_MAX_SEQ_LENGTH: #
_seq-length.384 (default)
- ENV variables:
  - CM_DATASET_MAX_SEQ_LENGTH: 384
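The doc-stride and seq-length settings typically control how long SQuAD contexts are split into fixed-length model inputs. The sketch below assumes the BERT-style sliding-window scheme from Google's reference run_squad.py. In that scheme, each window holds at most max_seq_length - len(question_tokens) - 3 context tokens, with the 3 reserved for [CLS] and two [SEP] markers, and successive windows start doc_stride tokens apart. Whether this recipe follows exactly that scheme is an assumption, and doc_spans is a hypothetical name.

```python
# Assumption: BERT run_squad.py-style windowing; not taken from this recipe.
def doc_spans(num_doc_tokens, num_query_tokens, max_seq_length=384, doc_stride=128):
    """Return (start, length) windows over the context tokens."""
    # Reserve room for the question plus [CLS] and two [SEP] tokens.
    max_tokens_for_doc = max_seq_length - num_query_tokens - 3
    if max_tokens_for_doc <= 0:
        raise ValueError("question too long for max_seq_length")
    spans = []
    start = 0
    while start < num_doc_tokens:
        length = min(num_doc_tokens - start, max_tokens_for_doc)
        spans.append((start, length))
        if start + length == num_doc_tokens:
            break  # this window reaches the end of the context
        start += min(length, doc_stride)
    return spans


# Example: 800 context tokens, 20 question tokens, default 384/128 settings.
print(doc_spans(800, 20))
```

With the defaults, consecutive windows overlap by 361 - 128 = 233 context tokens. A smaller doc stride therefore produces more, more heavily overlapping windows per context.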
Default variations
_doc-stride.128,_no-calib,_raw,_seq-length.384
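The groups and defaults above can be read as a lookup table. At most one variation per group is active, and a group without an explicit choice falls back to its default. The packing group has no default, so CM_DATASET_SQUAD_PACKED stays unset unless _packed is given. The sketch below models that reading and assumes # is replaced by the suffix after the last dot. VARIATION_ENV, DEFAULTS and resolve_env are hypothetical names, not CM's own implementation.

```python
# Hypothetical model of the variation tables above (not CM's own code).
VARIATION_ENV = {
    "calibration-set": {
        "_calib1": {"CM_DATASET_SQUAD_CALIBRATION_SET": "one"},
        "_calib2": {"CM_DATASET_SQUAD_CALIBRATION_SET": "two"},
        "_no-calib": {"CM_DATASET_SQUAD_CALIBRATION_SET": ""},
    },
    "doc-stride": {"_doc-stride.#": {"CM_DATASET_DOC_STRIDE": "#"}},
    "packing": {"_packed": {"CM_DATASET_SQUAD_PACKED": "yes"}},
    "raw": {
        "_pickle": {"CM_DATASET_RAW": "no"},
        "_raw": {"CM_DATASET_RAW": "yes"},
    },
    "seq-length": {"_seq-length.#": {"CM_DATASET_MAX_SEQ_LENGTH": "#"}},
}
DEFAULTS = ["_doc-stride.128", "_no-calib", "_raw", "_seq-length.384"]


def _match(group, variation):
    """Return the env dict for a variation in a group, or None if absent."""
    table = VARIATION_ENV[group]
    if variation in table:
        return table[variation]
    prefix, dot, suffix = variation.rpartition(".")
    wildcard = table.get(prefix + ".#") if dot else None
    if wildcard is None:
        return None
    # Substitute the '#' placeholder with the suffix, e.g. "512".
    return {k: v.replace("#", suffix) for k, v in wildcard.items()}


def resolve_env(variations=()):
    """Merge env vars; explicit variations override their group's default."""
    chosen = {}
    # Later entries win within a group (a simplification of CM's rules).
    for v in list(DEFAULTS) + list(variations):
        for group in VARIATION_ENV:
            if _match(group, v) is not None:
                chosen[group] = v
                break
        else:
            raise ValueError(f"unknown variation {v!r}")
    env = {}
    for group, v in chosen.items():
        env.update(_match(group, v))
    return env


print(resolve_env(["_packed", "_seq-length.512"]))
```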
Native script being run
No run file exists for Windows
Script output
cmr "get dataset preprocessed tokenized squad [variations]" -j