get-preprocessed-dataset-squad
Automatically generated README for this automation recipe: get-preprocessed-dataset-squad
Category: AI/ML datasets
License: Apache 2.0
- CM meta description for this script: _cm.yaml
- Output cached? True
Reuse this script in your project
Install MLCommons CM automation meta-framework
Pull CM repository with this automation recipe (CM script)
cm pull repo mlcommons@cm4mlops
Print CM help from the command line
cmr "get dataset preprocessed tokenized squad" --help
Run this script
Run this script via CLI
cm run script --tags=get,dataset,preprocessed,tokenized,squad[,variations] 
Run this script via CLI (alternative)
cmr "get dataset preprocessed tokenized squad [variations]" 
Run this script from Python
import cmind

r = cmind.access({'action': 'run',
                  'automation': 'script',
                  'tags': 'get,dataset,preprocessed,tokenized,squad',
                  'out': 'con',
                  # ...
                  # (other input keys for this script)
                  # ...
                  })

if r['return'] > 0:
    print(r['error'])
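To select variations from Python, the variation tags can be appended to the base tags, mirroring the `--tags=...[,variations]` form of the CLI. The following is a minimal sketch under that assumption; `build_tags` and `run_script` are hypothetical helper names, not part of CM, and `_seq-length.256` is only an illustrative value for the `_seq-length.#` variation.

```python
# Base tags of this script, as used in the CLI examples above.
BASE_TAGS = ["get", "dataset", "preprocessed", "tokenized", "squad"]


def build_tags(variations=()):
    """Join the base tags and any variation tags into one comma-separated string."""
    for v in variations:
        if not v.startswith("_"):
            raise ValueError(f"variation tags start with '_': {v!r}")
    return ",".join(BASE_TAGS + list(variations))


def run_script(variations=()):
    """Hypothetical wrapper: run the script through cmind with the given variations."""
    import cmind  # imported lazily so build_tags() works without CM installed

    r = cmind.access({'action': 'run',
                      'automation': 'script',
                      'tags': build_tags(variations),
                      'out': 'con'})
    if r['return'] > 0:
        raise RuntimeError(r['error'])
    return r


print(build_tags(["_calib1", "_seq-length.256"]))
# prints: get,dataset,preprocessed,tokenized,squad,_calib1,_seq-length.256
```

Calling `run_script(["_calib1"])` would then request the first calibration set, provided CM and this repository are installed.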
Run this script via Docker (beta)
cm docker script "get dataset preprocessed tokenized squad [variations]"
Variations
- Group "calibration-set"
  - _calib1
    - ENV variables:
      - CM_DATASET_SQUAD_CALIBRATION_SET: one
  - _calib2
    - ENV variables:
      - CM_DATASET_SQUAD_CALIBRATION_SET: two
  - _no-calib (default)
    - ENV variables:
      - CM_DATASET_SQUAD_CALIBRATION_SET: ``
- Group "doc-stride"
  - _doc-stride.#
    - ENV variables:
      - CM_DATASET_DOC_STRIDE: #
  - _doc-stride.128 (default)
    - ENV variables:
      - CM_DATASET_DOC_STRIDE: 128
- Group "packing"
  - _packed
    - ENV variables:
      - CM_DATASET_SQUAD_PACKED: yes
- Group "raw"
  - _pickle
    - ENV variables:
      - CM_DATASET_RAW: no
  - _raw (default)
    - ENV variables:
      - CM_DATASET_RAW: yes
- Group "seq-length"
  - _seq-length.#
    - ENV variables:
      - CM_DATASET_MAX_SEQ_LENGTH: #
  - _seq-length.384 (default)
    - ENV variables:
      - CM_DATASET_MAX_SEQ_LENGTH: 384
Default variations
_doc-stride.128,_no-calib,_raw,_seq-length.384
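The way selected and default variations combine into ENV variables can be illustrated with a small lookup. This is a sketch of the mapping documented in this README, not CM's actual resolution logic: the table is transcribed from the variation groups listed here, `lookup` and `resolve_env` are hypothetical names, and it assumes at most one variation per group and that the suffix after the dot replaces `#` in wildcard variations.

```python
# variation tag -> (group, ENV key, value); "#" marks a wildcard value.
VARIATIONS = {
    "_calib1": ("calibration-set", "CM_DATASET_SQUAD_CALIBRATION_SET", "one"),
    "_calib2": ("calibration-set", "CM_DATASET_SQUAD_CALIBRATION_SET", "two"),
    "_no-calib": ("calibration-set", "CM_DATASET_SQUAD_CALIBRATION_SET", ""),
    "_doc-stride.#": ("doc-stride", "CM_DATASET_DOC_STRIDE", "#"),
    "_packed": ("packing", "CM_DATASET_SQUAD_PACKED", "yes"),
    "_pickle": ("raw", "CM_DATASET_RAW", "no"),
    "_raw": ("raw", "CM_DATASET_RAW", "yes"),
    "_seq-length.#": ("seq-length", "CM_DATASET_MAX_SEQ_LENGTH", "#"),
}

# Default variations as listed in this README.
DEFAULTS = ["_doc-stride.128", "_no-calib", "_raw", "_seq-length.384"]


def lookup(tag):
    """Return (group, env_key, value) for a tag, expanding '#' wildcards."""
    if tag in VARIATIONS:
        return VARIATIONS[tag]
    prefix, dot, suffix = tag.partition(".")
    wildcard = prefix + ".#"
    if dot and suffix and wildcard in VARIATIONS:
        group, key, _ = VARIATIONS[wildcard]
        return group, key, suffix
    raise KeyError(f"unknown variation: {tag}")


def resolve_env(selected=()):
    """Map selected variations to ENV values; unselected groups fall back to defaults."""
    env, seen = {}, set()
    for tag in selected:
        group, key, value = lookup(tag)
        if group in seen:
            raise ValueError(f"more than one variation from group {group!r}")
        seen.add(group)
        env[key] = value
    for tag in DEFAULTS:
        group, key, value = lookup(tag)
        if group not in seen:
            env[key] = value
    return env


print(resolve_env(["_calib1", "_seq-length.256"]))
```

Note that the "packing" group has no default, so in this sketch `CM_DATASET_SQUAD_PACKED` is set only when `_packed` is selected explicitly.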
Native script being run
No run file exists for Windows
Script output
cmr "get dataset preprocessed tokenized squad [variations]"  -j