app-mlperf-inference
Automatically generated README for this automation recipe: app-mlperf-inference
Category: Modular MLPerf inference benchmark pipeline
License: Apache 2.0
Developers: Arjun Suresh, Thomas Zhu, Grigori Fursin
- Notes from the authors, contributors and users: README-extra
This CM script provides a unified interface to prepare and run a modular version of the MLPerf inference benchmark across diverse ML models, data sets, frameworks, libraries, run-time systems and platforms using the cross-platform automation meta-framework (MLCommons CM).
It is assembled from reusable and interoperable CM scripts for DevOps and MLOps being developed by the open MLCommons taskforce on automation and reproducibility.
It is a higher-level wrapper to several other CM scripts modularizing the MLPerf inference benchmark:
- Reference Python implementation
- Universal C++ implementation
- TFLite C++ implementation
- NVidia optimized implementation
See this SCC'23 tutorial, which uses this script to run a reference (unoptimized) Python implementation of the MLPerf object detection benchmark with the RetinaNet model, the Open Images dataset, ONNX Runtime and a CPU target.
See this CM script to automate and validate your MLPerf inference submission.
Get in touch with the open taskforce on automation and reproducibility at MLCommons if you need help with your submission or if you would like to participate in further modularization of MLPerf and collaborative design space exploration and optimization of ML Systems.
- CM meta description for this script: _cm.yaml
- Output cached? False
Reuse this script in your project
Install MLCommons CM automation meta-framework
Pull CM repository with this automation recipe (CM script)
cm pull repo mlcommons@cm4mlops
Print CM help from the command line
cmr "app vision language mlcommons mlperf inference generic" --help
Run this script
Run this script via CLI
cm run script --tags=app,vision,language,mlcommons,mlperf,inference,generic[,variations] [--input_flags]
Run this script via CLI (alternative)
cmr "app vision language mlcommons mlperf inference generic [variations]" [--input_flags]
Run this script from Python
import cmind
r = cmind.access({'action':'run',
                  'automation':'script',
                  'tags':'app,vision,language,mlcommons,mlperf,inference,generic',
                  'out':'con',
                  ...
                  (other input keys for this script)
                  ...
                 })
if r['return']>0:
    print(r['error'])
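For a concrete call, variation tags are appended to the base tags and input flags become dictionary keys. The following is an illustrative sketch; the variation tags and input keys mirror the CLI examples and the Input Flags documented below:

```python
import cmind

# Illustrative sketch: run the reference ResNet50 setup on CPU via ONNX Runtime.
# Variation tags are comma-appended to the base tags; input flags (scenario,
# mode, quiet, ...) are passed as dictionary keys.
r = cmind.access({'action': 'run',
                  'automation': 'script',
                  'tags': 'app,vision,language,mlcommons,mlperf,inference,generic,'
                          '_reference,_resnet50,_onnxruntime,_cpu',
                  'out': 'con',
                  'scenario': 'Offline',
                  'mode': 'accuracy',
                  'quiet': True})
if r['return'] > 0:
    print(r['error'])
```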
Run this script via Docker (beta)
cm docker script "app vision language mlcommons mlperf inference generic[variations]" [--input_flags]
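For example, using the same illustrative variation tags as above:

```bash
cm docker script "app vision language mlcommons mlperf inference generic _reference,_resnet50,_onnxruntime,_cpu" --quiet
```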
Variations
Group "implementation"
- _cpp
  - Aliases: _mil,_mlcommons-cpp
  - ENV variables:
    - CM_MLPERF_CPP: yes
    - CM_MLPERF_IMPLEMENTATION: mlcommons_cpp
    - CM_IMAGENET_ACCURACY_DTYPE: float32
    - CM_OPENIMAGES_ACCURACY_DTYPE: float32
- _intel-original
  - Aliases: _intel
  - ENV variables:
    - CM_MLPERF_IMPLEMENTATION: intel
- _kilt
  - Aliases: _qualcomm
  - ENV variables:
    - CM_MLPERF_IMPLEMENTATION: qualcomm
- _nvidia-original
  - Aliases: _nvidia
  - ENV variables:
    - CM_MLPERF_IMPLEMENTATION: nvidia
    - CM_SQUAD_ACCURACY_DTYPE: float16
    - CM_IMAGENET_ACCURACY_DTYPE: int32
    - CM_CNNDM_ACCURACY_DTYPE: int32
    - CM_LIBRISPEECH_ACCURACY_DTYPE: int8
- _reference (default)
  - Aliases: _mlcommons-python,_python
  - ENV variables:
    - CM_MLPERF_PYTHON: yes
    - CM_MLPERF_IMPLEMENTATION: mlcommons_python
    - CM_SQUAD_ACCURACY_DTYPE: float32
    - CM_IMAGENET_ACCURACY_DTYPE: float32
    - CM_OPENIMAGES_ACCURACY_DTYPE: float32
    - CM_LIBRISPEECH_ACCURACY_DTYPE: float32
    - CM_CNNDM_ACCURACY_DTYPE: int32
- _tflite-cpp
  - Aliases: _ctuning-cpp-tflite
  - ENV variables:
    - CM_MLPERF_TFLITE_CPP: yes
    - CM_MLPERF_CPP: yes
    - CM_MLPERF_IMPLEMENTATION: ctuning_cpp_tflite
    - CM_IMAGENET_ACCURACY_DTYPE: float32
Group "backend"
Click here to expand this section.
_deepsparse
- ENV variables:
- CM_MLPERF_BACKEND:
deepsparse
- CM_MLPERF_BACKEND:
- ENV variables:
_glow
- ENV variables:
- CM_MLPERF_BACKEND:
glow
- CM_MLPERF_BACKEND:
- ENV variables:
_ncnn
- ENV variables:
- CM_MLPERF_BACKEND:
ncnn
- CM_MLPERF_BACKEND:
- ENV variables:
_onnxruntime
- ENV variables:
- CM_MLPERF_BACKEND:
onnxruntime
- CM_MLPERF_BACKEND:
- ENV variables:
_pytorch
- ENV variables:
- CM_MLPERF_BACKEND:
pytorch
- CM_MLPERF_BACKEND:
- ENV variables:
_ray
- ENV variables:
- CM_MLPERF_BACKEND:
ray
- CM_MLPERF_BACKEND:
- ENV variables:
_tensorrt
- ENV variables:
- CM_MLPERF_BACKEND:
tensorrt
- CM_MLPERF_BACKEND:
- ENV variables:
_tf
- ENV variables:
- CM_MLPERF_BACKEND:
tf
- CM_MLPERF_BACKEND:
- ENV variables:
_tflite
- ENV variables:
- CM_MLPERF_BACKEND:
tflite
- CM_MLPERF_BACKEND:
- ENV variables:
_tvm-onnx
- ENV variables:
- CM_MLPERF_BACKEND:
tvm-onnx
- CM_MLPERF_BACKEND:
- ENV variables:
_tvm-pytorch
- ENV variables:
- CM_MLPERF_BACKEND:
tvm-pytorch
- CM_MLPERF_BACKEND:
- ENV variables:
_tvm-tflite
- ENV variables:
- CM_MLPERF_BACKEND:
tvm-tflite
- CM_MLPERF_BACKEND:
- ENV variables:
-
Group "device"
Click here to expand this section.
_cpu
(default)- ENV variables:
- CM_MLPERF_DEVICE:
cpu
- CM_MLPERF_DEVICE:
- ENV variables:
_cuda
- ENV variables:
- CM_MLPERF_DEVICE:
gpu
- CM_MLPERF_DEVICE:
- ENV variables:
_qaic
- ENV variables:
- CM_MLPERF_DEVICE:
qaic
- CM_MLPERF_DEVICE:
- ENV variables:
_rocm
- ENV variables:
- CM_MLPERF_DEVICE:
rocm
- CM_MLPERF_DEVICE:
- ENV variables:
_tpu
- ENV variables:
- CM_MLPERF_DEVICE:
tpu
- CM_MLPERF_DEVICE:
- ENV variables:
-
Group "model"
Click here to expand this section.
_3d-unet-99
- ENV variables:
- CM_MODEL:
3d-unet-99
- CM_MODEL:
- ENV variables:
_3d-unet-99.9
- ENV variables:
- CM_MODEL:
3d-unet-99.9
- CM_MODEL:
- ENV variables:
_bert-99
- ENV variables:
- CM_MODEL:
bert-99
- CM_MODEL:
- ENV variables:
_bert-99.9
- ENV variables:
- CM_MODEL:
bert-99.9
- CM_MODEL:
- ENV variables:
_dlrm-v2-99
- ENV variables:
- CM_MODEL:
dlrm-v2-99
- CM_MODEL:
- ENV variables:
_dlrm-v2-99.9
- ENV variables:
- CM_MODEL:
dlrm-v2-99.9
- CM_MODEL:
- ENV variables:
_efficientnet
- ENV variables:
- CM_MODEL:
efficientnet
- CM_MODEL:
- ENV variables:
_gptj-99
- ENV variables:
- CM_MODEL:
gptj-99
- CM_MODEL:
- ENV variables:
_gptj-99.9
- ENV variables:
- CM_MODEL:
gptj-99.9
- CM_MODEL:
- ENV variables:
_llama2-70b-99
- ENV variables:
- CM_MODEL:
llama2-70b-99
- CM_MODEL:
- ENV variables:
_llama2-70b-99.9
- ENV variables:
- CM_MODEL:
llama2-70b-99.9
- CM_MODEL:
- ENV variables:
_mobilenet
- ENV variables:
- CM_MODEL:
mobilenet
- CM_MODEL:
- ENV variables:
_resnet50
(default)- ENV variables:
- CM_MODEL:
resnet50
- CM_MODEL:
- ENV variables:
_retinanet
- ENV variables:
- CM_MODEL:
retinanet
- CM_MODEL:
- ENV variables:
_rnnt
- ENV variables:
- CM_MODEL:
rnnt
- CM_MODEL:
- ENV variables:
_sdxl
- ENV variables:
- CM_MODEL:
stable-diffusion-xl
- CM_MODEL:
- ENV variables:
-
Group "precision"
Click here to expand this section.
_bfloat16
- ENV variables:
- CM_MLPERF_QUANTIZATION:
False
- CM_MLPERF_MODEL_PRECISION:
float32
- CM_MLPERF_QUANTIZATION:
- ENV variables:
_float16
- ENV variables:
- CM_MLPERF_QUANTIZATION:
False
- CM_MLPERF_MODEL_PRECISION:
float32
- CM_MLPERF_QUANTIZATION:
- ENV variables:
_float32
(default)- Aliases:
_fp32
- ENV variables:
- CM_MLPERF_QUANTIZATION:
False
- CM_MLPERF_MODEL_PRECISION:
float32
- CM_MLPERF_QUANTIZATION:
- Aliases:
_int4
- ENV variables:
- CM_MLPERF_QUANTIZATION:
True
- CM_MLPERF_MODEL_PRECISION:
int4
- CM_MLPERF_QUANTIZATION:
- ENV variables:
_int8
- Aliases:
_quantized
- ENV variables:
- CM_MLPERF_QUANTIZATION:
True
- CM_MLPERF_MODEL_PRECISION:
int8
- CM_MLPERF_QUANTIZATION:
- Aliases:
_uint8
- ENV variables:
- CM_MLPERF_QUANTIZATION:
True
- CM_MLPERF_MODEL_PRECISION:
uint8
- CM_MLPERF_QUANTIZATION:
- ENV variables:
-
Group "execution-mode"
Click here to expand this section.
_fast
- ENV variables:
- CM_FAST_FACTOR:
5
- CM_OUTPUT_FOLDER_NAME:
fast_results
- CM_MLPERF_RUN_STYLE:
fast
- CM_FAST_FACTOR:
- ENV variables:
_test
(default)- ENV variables:
- CM_OUTPUT_FOLDER_NAME:
test_results
- CM_MLPERF_RUN_STYLE:
test
- CM_OUTPUT_FOLDER_NAME:
- ENV variables:
_valid
- ENV variables:
- CM_OUTPUT_FOLDER_NAME:
valid_results
- CM_MLPERF_RUN_STYLE:
valid
- CM_OUTPUT_FOLDER_NAME:
- ENV variables:
-
Group "reproducibility"
Click here to expand this section.
_r2.1_default
- ENV variables:
- CM_SKIP_SYS_UTILS:
yes
- CM_TEST_QUERY_COUNT:
100
- CM_SKIP_SYS_UTILS:
- ENV variables:
_r3.0_default
- ENV variables:
- CM_SKIP_SYS_UTILS:
yes
- CM_SKIP_SYS_UTILS:
- ENV variables:
_r3.1_default
_r4.0_default
- ENV variables:
- CM_ENV_NVMITTEN_DOCKER_WHEEL_PATH:
/opt/nvmitten-0.1.3-cp38-cp38-linux_x86_64.whl
- CM_ENV_NVMITTEN_DOCKER_WHEEL_PATH:
- ENV variables:
_r4.1_default
- ENV variables:
- CM_ENV_NVMITTEN_DOCKER_WHEEL_PATH:
/opt/nvmitten-0.1.3b0-cp38-cp38-linux_x86_64.whl
- CM_ENV_NVMITTEN_DOCKER_WHEEL_PATH:
- ENV variables:
-
No group (any combination of variations can be selected)
- _power
  - ENV variables:
    - CM_MLPERF_POWER: yes
    - CM_SYSTEM_POWER: yes
Group "batch_size"
Click here to expand this section.
_batch_size.#
- ENV variables:
- CM_MLPERF_LOADGEN_MAX_BATCHSIZE:
#
- CM_MLPERF_LOADGEN_MAX_BATCHSIZE:
- ENV variables:
-
Group "loadgen-scenario"
Click here to expand this section.
_multistream
- ENV variables:
- CM_MLPERF_LOADGEN_SCENARIO:
MultiStream
- CM_MLPERF_LOADGEN_SCENARIO:
- ENV variables:
_offline
(default)- ENV variables:
- CM_MLPERF_LOADGEN_SCENARIO:
Offline
- CM_MLPERF_LOADGEN_SCENARIO:
- ENV variables:
_server
- ENV variables:
- CM_MLPERF_LOADGEN_SCENARIO:
Server
- CM_MLPERF_LOADGEN_SCENARIO:
- ENV variables:
_singlestream
- ENV variables:
- CM_MLPERF_LOADGEN_SCENARIO:
SingleStream
- CM_MLPERF_LOADGEN_SCENARIO:
- ENV variables:
Default variations
_cpu,_float32,_offline,_reference,_resnet50,_test
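At most one variation can be selected from each group, and variations from different groups are combined with commas. As a hypothetical example, the following run would request the NVIDIA implementation of BERT-99 on CUDA with TensorRT in the Server scenario, with a maximum batch size of 32 (the `#` in `_batch_size.#` is replaced by a concrete number):

```bash
cm run script --tags=app,vision,language,mlcommons,mlperf,inference,generic,_nvidia-original,_bert-99,_cuda,_tensorrt,_server,_batch_size.32
```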
Input Flags
- --scenario: MLPerf inference scenario {Offline,Server,SingleStream,MultiStream} (Offline)
- --mode: MLPerf inference mode {performance,accuracy} (accuracy)
- --test_query_count: Specifies the number of samples to be processed during a test run
- --target_qps: Target QPS
- --target_latency: Target Latency
- --max_batchsize: Maximum batchsize to be used
- --num_threads: Number of CPU threads to launch the application with
- --hw_name: Valid value - any system description which has a config file (under same name) defined here
- --output_dir: Location where the outputs are produced
- --rerun: Redo the run even if previous run files exist (True)
- --regenerate_files: Regenerates measurement files including accuracy.txt files even if a previous run exists. This option is redundant if --rerun is used
- --adr.python.name: Python virtual environment name (optional) (mlperf)
- --adr.python.version_min: Minimal Python version (3.8)
- --adr.python.version: Force Python version (must have all system deps)
- --adr.compiler.tags: Compiler for loadgen (gcc)
- --adr.inference-src-loadgen.env.CM_GIT_URL: Git URL for MLPerf inference sources to build LoadGen (to enable non-reference implementations)
- --adr.inference-src.env.CM_GIT_URL: Git URL for MLPerf inference sources to run benchmarks (to enable non-reference implementations)
- --quiet: Quiet run (select default values for all questions) (False)
- --readme: Generate README with the reproducibility report
- --debug: Debug MLPerf script
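For example, the --adr.* flags customize the dependencies resolved by this script; the following illustrative invocation pins the Python virtual environment name, the minimal Python version and the LoadGen compiler:

```bash
cmr "app vision language mlcommons mlperf inference generic" \
    --adr.python.name=mlperf --adr.python.version_min=3.8 --adr.compiler.tags=gcc --quiet
```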
Script flags mapped to environment
- --clean=value → CM_MLPERF_CLEAN_SUBMISSION_DIR=value
- --count=value → CM_MLPERF_LOADGEN_QUERY_COUNT=value
- --debug=value → CM_DEBUG_SCRIPT_BENCHMARK_PROGRAM=value
- --docker=value → CM_RUN_DOCKER_CONTAINER=value
- --gpu_name=value → CM_NVIDIA_GPU_NAME=value
- --hw_name=value → CM_HW_NAME=value
- --imagenet_path=value → IMAGENET_PATH=value
- --max_amps=value → CM_MLPERF_POWER_MAX_AMPS=value
- --max_batchsize=value → CM_MLPERF_LOADGEN_MAX_BATCHSIZE=value
- --max_volts=value → CM_MLPERF_POWER_MAX_VOLTS=value
- --mode=value → CM_MLPERF_LOADGEN_MODE=value
- --multistream_target_latency=value → CM_MLPERF_LOADGEN_MULTISTREAM_TARGET_LATENCY=value
- --ntp_server=value → CM_MLPERF_POWER_NTP_SERVER=value
- --num_threads=value → CM_NUM_THREADS=value
- --offline_target_qps=value → CM_MLPERF_LOADGEN_OFFLINE_TARGET_QPS=value
- --output_dir=value → OUTPUT_BASE_DIR=value
- --power=value → CM_MLPERF_POWER=value
- --power_server=value → CM_MLPERF_POWER_SERVER_ADDRESS=value
- --readme=value → CM_MLPERF_README=value
- --regenerate_files=value → CM_REGENERATE_MEASURE_FILES=value
- --rerun=value → CM_RERUN=value
- --scenario=value → CM_MLPERF_LOADGEN_SCENARIO=value
- --server_target_qps=value → CM_MLPERF_LOADGEN_SERVER_TARGET_QPS=value
- --singlestream_target_latency=value → CM_MLPERF_LOADGEN_SINGLESTREAM_TARGET_LATENCY=value
- --target_latency=value → CM_MLPERF_LOADGEN_TARGET_LATENCY=value
- --target_qps=value → CM_MLPERF_LOADGEN_TARGET_QPS=value
- --test_query_count=value → CM_TEST_QUERY_COUNT=value
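Each flag is therefore shorthand for setting the corresponding environment key, so the two invocations below should be equivalent (a sketch based on the documented --count mapping and the generic --env.KEY=VALUE mechanism described in the next section):

```bash
cmr "app vision language mlcommons mlperf inference generic" --count=100 --quiet
cmr "app vision language mlcommons mlperf inference generic" --env.CM_MLPERF_LOADGEN_QUERY_COUNT=100 --quiet
```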
Default environment
These keys can be updated via --env.KEY=VALUE, an env dictionary in @input.json, or using script flags.
- CM_MLPERF_LOADGEN_MODE: accuracy
- CM_MLPERF_LOADGEN_SCENARIO: Offline
- CM_OUTPUT_FOLDER_NAME: test_results
- CM_MLPERF_RUN_STYLE: test
- CM_TEST_QUERY_COUNT: 10
- CM_MLPERF_QUANTIZATION: False
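For example, to raise the default test query count from 10 to 50 by overriding the key directly (illustrative):

```bash
cmr "app vision language mlcommons mlperf inference generic" --env.CM_TEST_QUERY_COUNT=50 --quiet
```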
Native script being run
No run file exists for Windows
Script output
cmr "app vision language mlcommons mlperf inference generic [variations]" [--input_flags] -j