Text Summarization using GPT-J
MLPerf Reference Implementation in Python
Tip
- MLCommons reference implementations are meant only to provide a rules-compliant reference for submitters and in most cases are not the best performing. If you want to benchmark a system, it is advisable to use the vendor MLPerf implementation for that system (for example, Nvidia or Intel).
GPTJ-99
Edge category
In the edge category, gptj-99 has the Offline and SingleStream scenarios, and all of them are mandatory for a closed division submission.
Pytorch framework
CPU device
Minimum system requirements for running the benchmark:
- Disk Space: 50GB
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Docker Container Build and Performance Estimation for Offline Scenario
Tip
- The number of threads can be adjusted using --threads=#, where # is the desired number of threads. This option works only if the implementation in use supports threading.
- The batch size can be adjusted using --batch_size=#, where # is the desired batch size. This option works only if the implementation in use supports the given batch size.
- _r4.1-dev can be given instead of _r5.0-dev if you want to run the benchmark with MLPerf version 4.1.
- Add --adr.mlperf-implementation.tags=_branch.master,_repo.<CUSTOM_INFERENCE_REPO_LINK> if you are modifying the official MLPerf Inference implementation in a custom fork.
- Add --adr.inference-src.tags=_repo.<CUSTOM_INFERENCE_REPO_LINK> if you are modifying the model config accuracy script in the submission checker within a custom fork.
- Add --adr.inference-src.version=custom if you are using the modified MLPerf Inference code or the accuracy script of the submission checker within a custom fork.
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=test \
--device=cpu \
--docker --quiet \
--test_query_count=10
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
The above command should take you to an interactive shell inside the docker container and do a quick test run for the Offline scenario. Once inside the docker container, run the commands below to do the accuracy and performance runs for each scenario.
More options for the docker launch:
- --docker_cm_repo=<Custom CM GitHub repo URL in username@repo format>: to use a custom fork of the cm4mlops repository inside the docker image
- --docker_cm_repo_branch=<Custom CM GitHub repo Branch>: to check out a custom branch of the cloned cm4mlops repository inside the docker image
- --docker_cache=no: to not use the docker cache during the image build
- --docker_os=ubuntu: ubuntu and rhel are supported
- --docker_os_version=20.04: [20.04, 22.04] are supported for Ubuntu and [8, 9] for RHEL
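The supported OS and version pairs for the docker launch can be checked before starting an image build. The following is a minimal sketch under stated assumptions: the helper `docker_os_flags` and the table `SUPPORTED_DOCKER_OS` are hypothetical names and are not part of CM. Only the flag names and the supported values are taken from the option list on this page.

```python
from __future__ import annotations

# Supported values as stated in the docker launch options on this page.
SUPPORTED_DOCKER_OS = {
    "ubuntu": ("20.04", "22.04"),
    "rhel": ("8", "9"),
}


def docker_os_flags(os_name: str, os_version: str) -> list[str]:
    """Return the two docker OS flags, or raise ValueError for an unsupported pair."""
    versions = SUPPORTED_DOCKER_OS.get(os_name)
    if versions is None:
        raise ValueError(f"unsupported docker OS: {os_name!r}")
    if os_version not in versions:
        raise ValueError(
            f"unsupported {os_name} version {os_version!r}; expected one of {versions}"
        )
    return [f"--docker_os={os_name}", f"--docker_os_version={os_version}"]


if __name__ == "__main__":
    # Flags that could be appended to the docker launch command.
    print(" ".join(docker_os_flags("ubuntu", "22.04")))
```

The returned flags would be appended to the docker launch command. An unsupported pair fails early instead of partway through the image build.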
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
SingleStream
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=SingleStream \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
More options for the RUN command:
- Use --division=closed to do a closed division submission, which includes compliance runs
- Use --rerun to do a rerun even when a valid run exists
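The run commands on this page differ only in a few values (scenario, device, category, execution mode) plus optional flags such as --division=closed and --rerun. The following is a minimal sketch of assembling such a command line in Python. `build_run_command` is a hypothetical helper, not part of CM, and it only joins flags that appear on this page.

```python
from __future__ import annotations

import shlex


def build_run_command(
    scenario: str | None,
    device: str = "cpu",
    category: str = "edge",
    execution_mode: str = "valid",
    version_tag: str = "_r5.0-dev",
    extra_flags: tuple[str, ...] = (),
) -> str:
    """Assemble a `cm run script` command line; scenario=None selects all scenarios."""
    tags = ["run-mlperf", "inference", version_tag]
    if scenario is None:
        tags.append("_all-scenarios")
    args = [
        "cm", "run", "script",
        "--tags=" + ",".join(tags),
        "--model=gptj-99",
        "--implementation=reference",
        "--framework=pytorch",
        f"--category={category}",
    ]
    if scenario is not None:
        args.append(f"--scenario={scenario}")
    args += [f"--execution_mode={execution_mode}", f"--device={device}", "--quiet"]
    args += list(extra_flags)
    # shlex.join quotes any argument that the shell would otherwise split.
    return shlex.join(args)


if __name__ == "__main__":
    print(build_run_command("Offline", extra_flags=("--division=closed", "--rerun")))
```

Building the command from one function keeps the per-scenario invocations consistent and avoids line-continuation mistakes when flags are added by hand.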
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Setup a virtual environment for Python
cm run script --tags=install,python-venv --name=mlperf
export CM_SCRIPT_EXTRA_CMD="--adr.python.name=mlperf"
# Performance Estimation for Offline Scenario
Tip
- The number of threads can be adjusted using --threads=#, where # is the desired number of threads. This option works only if the implementation in use supports threading.
- The batch size can be adjusted using --batch_size=#, where # is the desired batch size. This option works only if the implementation in use supports the given batch size.
- _r4.1-dev can be given instead of _r5.0-dev if you want to run the benchmark with MLPerf version 4.1.
- Add --adr.mlperf-implementation.tags=_branch.master,_repo.<CUSTOM_INFERENCE_REPO_LINK> if you are modifying the official MLPerf Inference implementation in a custom fork.
- Add --adr.inference-src.tags=_repo.<CUSTOM_INFERENCE_REPO_LINK> if you are modifying the model config accuracy script in the submission checker within a custom fork.
- Add --adr.inference-src.version=custom if you are using the modified MLPerf Inference code or the accuracy script of the submission checker within a custom fork.
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=test \
--device=cpu \
--quiet \
--test_query_count=10
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
The above command should do a test run of the Offline scenario and record the estimated offline_target_qps.
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
SingleStream
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=SingleStream \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
More options for the RUN command:
- Use --division=closed to do a closed division submission, which includes compliance runs
- Use --rerun to do a rerun even when a valid run exists
CUDA device
Minimum system requirements for running the benchmark:
- Device Memory: 80GB (fp32), 40GB (fp16)
- Disk Space: 50GB
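The device memory figures above suggest a simple rule for choosing a precision on a given GPU. The sketch below encodes that rule. `pick_precision` is a hypothetical helper, and the thresholds are only the minimums quoted above (80GB for fp32, 40GB for fp16), not measured values.

```python
from __future__ import annotations

# Minimum device memory figures quoted in the requirements above.
FP32_MIN_GB = 80
FP16_MIN_GB = 40


def pick_precision(device_memory_gb: float) -> str | None:
    """Return "fp32" or "fp16" based on the stated minimums, or None if neither fits."""
    if device_memory_gb >= FP32_MIN_GB:
        return "fp32"
    if device_memory_gb >= FP16_MIN_GB:
        return "fp16"
    return None


if __name__ == "__main__":
    for memory_gb in (80, 48, 24):
        print(memory_gb, pick_precision(memory_gb))
```

When the result is fp16, the --precision=float16 option from the tips on this page applies. When it is None, the only remaining lever this page mentions is a smaller --beam-size, which is not valid for a closed division submission.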
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Docker Container Build and Performance Estimation for Offline Scenario
Tip
- The number of threads can be adjusted using --threads=#, where # is the desired number of threads. This option works only if the implementation in use supports threading.
- The batch size can be adjusted using --batch_size=#, where # is the desired batch size. This option works only if the implementation in use supports the given batch size.
- _r4.1-dev can be given instead of _r5.0-dev if you want to run the benchmark with MLPerf version 4.1.
- Add --adr.mlperf-implementation.tags=_branch.master,_repo.<CUSTOM_INFERENCE_REPO_LINK> if you are modifying the official MLPerf Inference implementation in a custom fork.
- Add --adr.inference-src.tags=_repo.<CUSTOM_INFERENCE_REPO_LINK> if you are modifying the model config accuracy script in the submission checker within a custom fork.
- Add --adr.inference-src.version=custom if you are using the modified MLPerf Inference code or the accuracy script of the submission checker within a custom fork.
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--docker --quiet \
--test_query_count=50
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
The above command should take you to an interactive shell inside the docker container and do a quick test run for the Offline scenario. Once inside the docker container, run the commands below to do the accuracy and performance runs for each scenario.
More options for the docker launch:
- --docker_cm_repo=<Custom CM GitHub repo URL in username@repo format>: to use a custom fork of the cm4mlops repository inside the docker image
- --docker_cm_repo_branch=<Custom CM GitHub repo Branch>: to check out a custom branch of the cloned cm4mlops repository inside the docker image
- --docker_cache=no: to not use the docker cache during the image build
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
SingleStream
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=SingleStream \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
More options for the RUN command:
- Use --division=closed to do a closed division submission, which includes compliance runs
- Use --rerun to do a rerun even when a valid run exists
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
Tip
- It is advisable to use the commands in the Docker tab for CUDA. Run the below native command only if you are already on a CUDA setup with cuDNN and TensorRT installed.
# Setup a virtual environment for Python
cm run script --tags=install,python-venv --name=mlperf
export CM_SCRIPT_EXTRA_CMD="--adr.python.name=mlperf"
# Performance Estimation for Offline Scenario
Tip
- The number of threads can be adjusted using --threads=#, where # is the desired number of threads. This option works only if the implementation in use supports threading.
- The batch size can be adjusted using --batch_size=#, where # is the desired batch size. This option works only if the implementation in use supports the given batch size.
- _r4.1-dev can be given instead of _r5.0-dev if you want to run the benchmark with MLPerf version 4.1.
- Add --adr.mlperf-implementation.tags=_branch.master,_repo.<CUSTOM_INFERENCE_REPO_LINK> if you are modifying the official MLPerf Inference implementation in a custom fork.
- Add --adr.inference-src.tags=_repo.<CUSTOM_INFERENCE_REPO_LINK> if you are modifying the model config accuracy script in the submission checker within a custom fork.
- Add --adr.inference-src.version=custom if you are using the modified MLPerf Inference code or the accuracy script of the submission checker within a custom fork.
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--quiet \
--test_query_count=50
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
The above command should do a test run of the Offline scenario and record the estimated offline_target_qps.
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
SingleStream
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=SingleStream \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
More options for the RUN command:
- Use --division=closed to do a closed division submission, which includes compliance runs
- Use --rerun to do a rerun even when a valid run exists
ROCm device
Minimum system requirements for running the benchmark:
- Disk Space: 50GB
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Setup a virtual environment for Python
cm run script --tags=install,python-venv --name=mlperf
export CM_SCRIPT_EXTRA_CMD="--adr.python.name=mlperf"
# Performance Estimation for Offline Scenario
Tip
- The number of threads can be adjusted using --threads=#, where # is the desired number of threads. This option works only if the implementation in use supports threading.
- The batch size can be adjusted using --batch_size=#, where # is the desired batch size. This option works only if the implementation in use supports the given batch size.
- _r4.1-dev can be given instead of _r5.0-dev if you want to run the benchmark with MLPerf version 4.1.
- Add --adr.mlperf-implementation.tags=_branch.master,_repo.<CUSTOM_INFERENCE_REPO_LINK> if you are modifying the official MLPerf Inference implementation in a custom fork.
- Add --adr.inference-src.tags=_repo.<CUSTOM_INFERENCE_REPO_LINK> if you are modifying the model config accuracy script in the submission checker within a custom fork.
- Add --adr.inference-src.version=custom if you are using the modified MLPerf Inference code or the accuracy script of the submission checker within a custom fork.
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=test \
--device=rocm \
--quiet \
--test_query_count=10
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
The above command should do a test run of the Offline scenario and record the estimated offline_target_qps.
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=valid \
--device=rocm \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
SingleStream
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=SingleStream \
--execution_mode=valid \
--device=rocm \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--execution_mode=valid \
--device=rocm \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
More options for the RUN command:
- Use --division=closed to do a closed division submission, which includes compliance runs
- Use --rerun to do a rerun even when a valid run exists
Datacenter category
In the datacenter category, gptj-99 has the Offline and Server scenarios, and all of them are mandatory for a closed division submission.
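Because every scenario in a category is mandatory for a closed division submission, it can help to check which runs are still outstanding. The following is a minimal sketch: the scenario lists come from this page, while `SCENARIOS` and `missing_scenarios` are hypothetical names rather than part of CM.

```python
from __future__ import annotations

# Scenarios per category for gptj-99, as listed on this page.
SCENARIOS = {
    "edge": ("Offline", "SingleStream"),
    "datacenter": ("Offline", "Server"),
}


def missing_scenarios(category: str, completed: list[str]) -> list[str]:
    """Return the mandatory scenarios of `category` that are not yet in `completed`."""
    if category not in SCENARIOS:
        raise ValueError(f"unknown category: {category!r}")
    return [name for name in SCENARIOS[category] if name not in completed]


if __name__ == "__main__":
    # Only the Offline run has finished, so Server is still required.
    print(missing_scenarios("datacenter", ["Offline"]))
```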
Pytorch framework
CPU device
Minimum system requirements for running the benchmark:
- Disk Space: 50GB
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Docker Container Build and Performance Estimation for Offline Scenario
Tip
- The number of threads can be adjusted using --threads=#, where # is the desired number of threads. This option works only if the implementation in use supports threading.
- The batch size can be adjusted using --batch_size=#, where # is the desired batch size. This option works only if the implementation in use supports the given batch size.
- _r4.1-dev can be given instead of _r5.0-dev if you want to run the benchmark with MLPerf version 4.1.
- Add --adr.mlperf-implementation.tags=_branch.master,_repo.<CUSTOM_INFERENCE_REPO_LINK> if you are modifying the official MLPerf Inference implementation in a custom fork.
- Add --adr.inference-src.tags=_repo.<CUSTOM_INFERENCE_REPO_LINK> if you are modifying the model config accuracy script in the submission checker within a custom fork.
- Add --adr.inference-src.version=custom if you are using the modified MLPerf Inference code or the accuracy script of the submission checker within a custom fork.
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cpu \
--docker --quiet \
--test_query_count=10
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
The above command should take you to an interactive shell inside the docker container and do a quick test run for the Offline scenario. Once inside the docker container, run the commands below to do the accuracy and performance runs for each scenario.
More options for the docker launch:
- --docker_cm_repo=<Custom CM GitHub repo URL in username@repo format>: to use a custom fork of the cm4mlops repository inside the docker image
- --docker_cm_repo_branch=<Custom CM GitHub repo Branch>: to check out a custom branch of the cloned cm4mlops repository inside the docker image
- --docker_cache=no: to not use the docker cache during the image build
- --docker_os=ubuntu: ubuntu and rhel are supported
- --docker_os_version=20.04: [20.04, 22.04] are supported for Ubuntu and [8, 9] for RHEL
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
Server
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the specified value is higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
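Because <SERVER_TARGET_QPS> has to be found by trial, one approach is to start near 80% of the measured Offline QPS and step down until a run is valid. The sketch below only generates candidate values to try. The function name and the step fractions are assumptions, not part of CM or the MLPerf rules.

```python
from __future__ import annotations


def server_qps_candidates(
    offline_qps: float,
    fractions: tuple[float, ...] = (0.8, 0.7, 0.6, 0.5),
) -> list[float]:
    """Return target QPS values to try, highest first, as fractions of the Offline QPS."""
    if offline_qps <= 0:
        raise ValueError("offline_qps must be positive")
    # Round to avoid floating-point noise in values passed on the command line.
    return [round(offline_qps * fraction, 3) for fraction in fractions]


if __name__ == "__main__":
    # Hypothetical Offline result of 10 QPS, used only for illustration.
    for qps in server_qps_candidates(10.0):
        print(f"--server_target_qps={qps}")
```

Each candidate would be passed as --server_target_qps in the Server command, stopping at the first value whose run meets the latency constraint. Values below 50% of the Offline QPS may be needed on some systems, as noted in the tip.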
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the specified value is higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
More options for the RUN command:
- Use --division=closed to do a closed division submission, which includes compliance runs
- Use --rerun to do a rerun even when a valid run exists
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Setup a virtual environment for Python
cm run script --tags=install,python-venv --name=mlperf
export CM_SCRIPT_EXTRA_CMD="--adr.python.name=mlperf"
# Performance Estimation for Offline Scenario
Tip
- The number of threads can be adjusted using --threads=#, where # is the desired number of threads. This option works only if the implementation in use supports threading.
- The batch size can be adjusted using --batch_size=#, where # is the desired batch size. This option works only if the implementation in use supports the given batch size.
- _r4.1-dev can be given instead of _r5.0-dev if you want to run the benchmark with MLPerf version 4.1.
- Add --adr.mlperf-implementation.tags=_branch.master,_repo.<CUSTOM_INFERENCE_REPO_LINK> if you are modifying the official MLPerf Inference implementation in a custom fork.
- Add --adr.inference-src.tags=_repo.<CUSTOM_INFERENCE_REPO_LINK> if you are modifying the model config accuracy script in the submission checker within a custom fork.
- Add --adr.inference-src.version=custom if you are using the modified MLPerf Inference code or the accuracy script of the submission checker within a custom fork.
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cpu \
--quiet \
--test_query_count=10
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
The above command should do a test run of the Offline scenario and record the estimated offline_target_qps.
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
Server
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the specified value is higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the specified value is higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
More options for the RUN command:
- Use --division=closed to do a closed division submission, which includes compliance runs
- Use --rerun to do a rerun even when a valid run exists
CUDA device
Minimum system requirements for running the benchmark:
- Device Memory: 80GB (fp32), 40GB (fp16)
- Disk Space: 50GB
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Docker Container Build and Performance Estimation for Offline Scenario
Tip
- The number of threads can be adjusted using --threads=#, where # is the desired number of threads. This option works only if the implementation in use supports threading.
- The batch size can be adjusted using --batch_size=#, where # is the desired batch size. This option works only if the implementation in use supports the given batch size.
- _r4.1-dev can be given instead of _r5.0-dev if you want to run the benchmark with MLPerf version 4.1.
- Add --adr.mlperf-implementation.tags=_branch.master,_repo.<CUSTOM_INFERENCE_REPO_LINK> if you are modifying the official MLPerf Inference implementation in a custom fork.
- Add --adr.inference-src.tags=_repo.<CUSTOM_INFERENCE_REPO_LINK> if you are modifying the model config accuracy script in the submission checker within a custom fork.
- Add --adr.inference-src.version=custom if you are using the modified MLPerf Inference code or the accuracy script of the submission checker within a custom fork.
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--docker --quiet \
--test_query_count=50
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
The above command should take you to an interactive shell inside the docker container and do a quick test run for the Offline scenario. Once inside the docker container, run the commands below to do the accuracy and performance runs for each scenario.
More options for the docker launch:
- --docker_cm_repo=<Custom CM GitHub repo URL in username@repo format>: to use a custom fork of the cm4mlops repository inside the docker image
- --docker_cm_repo_branch=<Custom CM GitHub repo Branch>: to check out a custom branch of the cloned cm4mlops repository inside the docker image
- --docker_cache=no: to not use the docker cache during the image build
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
Server
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the specified value is higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help in running the model on GPUs with lower device memory.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value specified is too high, the latency constraint will not be met and the run will be considered invalid.
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to do a rerun even when a valid run exists.
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
Tip
- It is advisable to use the commands in the Docker tab for CUDA. Run the below native command only if you are already on a CUDA setup with cuDNN and TensorRT installed.
# Setup a virtual environment for Python
cm run script --tags=install,python-venv --name=mlperf
export CM_SCRIPT_EXTRA_CMD="--adr.python.name=mlperf"
# Performance Estimation for Offline Scenario
Tip
- The number of threads can be adjusted using --threads=#, where # is the desired number of threads. This option works only if the implementation in use supports threading.
- The batch size can be adjusted using --batch_size=#, where # is the desired batch size. This option works only if the implementation in use supports the given batch size.
- _r4.1-dev can be given instead of _r5.0-dev if you want to run the benchmark with MLPerf version 4.1.
- Add --adr.mlperf-implementation.tags=_branch.master,_repo.<CUSTOM_INFERENCE_REPO_LINK> if you are modifying the official MLPerf Inference implementation in a custom fork.
- Add --adr.inference-src.tags=_repo.<CUSTOM_INFERENCE_REPO_LINK> if you are modifying the model config accuracy script in the submission checker within a custom fork.
- Add --adr.inference-src.version=custom if you are using the modified MLPerf Inference code or accuracy script on submission checker within a custom fork.
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--quiet \
--test_query_count=50
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
The above command should do a test run of the Offline scenario and record the estimated offline_target_qps.
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
Server
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value specified is too high, the latency constraint will not be met and the run will be considered invalid.
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value specified is too high, the latency constraint will not be met and the run will be considered invalid.
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to do a rerun even when a valid run exists.
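The run commands in this section differ only in a few options. As a convenience, the sketch below assembles such a command in Python from flags that appear on this page. build_cm_command is a hypothetical helper, not part of CM, and it only returns the command string rather than running it.

```python
import shlex


def build_cm_command(model, category, scenario, device,
                     server_target_qps=None, division=None, rerun=False):
    """Assemble a `cm run script` command line as a single string.

    Only flags that appear on this page are used. The function itself is
    an illustration and makes no claim about how CM parses its arguments.
    """
    args = [
        "cm", "run", "script",
        "--tags=run-mlperf,inference,_r5.0-dev",
        f"--model={model}",
        "--implementation=reference",
        "--framework=pytorch",
        f"--category={category}",
        f"--scenario={scenario}",
    ]
    if scenario == "Server":
        # The Server scenario needs a manually chosen target QPS.
        if server_target_qps is None:
            raise ValueError("Server scenario needs server_target_qps")
        args.append(f"--server_target_qps={server_target_qps}")
    args += ["--execution_mode=valid", f"--device={device}", "--quiet"]
    if division:
        args.append(f"--division={division}")
    if rerun:
        args.append("--rerun")
    # shlex.join quotes any argument that the shell would otherwise split.
    return shlex.join(args)


if __name__ == "__main__":
    print(build_cm_command("gptj-99", "datacenter", "Offline", "cuda",
                           division="closed"))
```

Building the argument list first and joining it at the end avoids the line-continuation mistakes that are easy to make when editing multi-line shell commands by hand.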
ROCm device
Please click here to see the minimum system requirements for running the benchmark
- Disk Space: 50GB
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Setup a virtual environment for Python
cm run script --tags=install,python-venv --name=mlperf
export CM_SCRIPT_EXTRA_CMD="--adr.python.name=mlperf"
# Performance Estimation for Offline Scenario
Tip
- The number of threads can be adjusted using --threads=#, where # is the desired number of threads. This option works only if the implementation in use supports threading.
- The batch size can be adjusted using --batch_size=#, where # is the desired batch size. This option works only if the implementation in use supports the given batch size.
- _r4.1-dev can be given instead of _r5.0-dev if you want to run the benchmark with MLPerf version 4.1.
- Add --adr.mlperf-implementation.tags=_branch.master,_repo.<CUSTOM_INFERENCE_REPO_LINK> if you are modifying the official MLPerf Inference implementation in a custom fork.
- Add --adr.inference-src.tags=_repo.<CUSTOM_INFERENCE_REPO_LINK> if you are modifying the model config accuracy script in the submission checker within a custom fork.
- Add --adr.inference-src.version=custom if you are using the modified MLPerf Inference code or accuracy script on submission checker within a custom fork.
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=rocm \
--quiet \
--test_query_count=10
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
The above command should do a test run of the Offline scenario and record the estimated offline_target_qps.
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=rocm \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
Server
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=rocm \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value specified is too high, the latency constraint will not be met and the run will be considered invalid.
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=rocm \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value specified is too high, the latency constraint will not be met and the run will be considered invalid.
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to do a rerun even when a valid run exists.
GPTJ-99.9
Edge category
In the edge category, gptj-99.9 has the Offline and SingleStream scenarios, and all of them are mandatory for a closed division submission.
Pytorch framework
CPU device
Please click here to see the minimum system requirements for running the benchmark
- Disk Space: 50GB
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
You can reuse the same environment as described for gptj-99.
Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=test \
--device=cpu \
--quiet \
--test_query_count=10
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
The above command should do a test run of the Offline scenario and record the estimated offline_target_qps.
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
SingleStream
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=SingleStream \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to do a rerun even when a valid run exists.
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
You can reuse the same environment as described for gptj-99.
Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=test \
--device=cpu \
--quiet \
--test_query_count=10
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
The above command should do a test run of the Offline scenario and record the estimated offline_target_qps.
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
SingleStream
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=SingleStream \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to do a rerun even when a valid run exists.
CUDA device
Please click here to see the minimum system requirements for running the benchmark
- Device Memory: 80GB (fp32), 40GB (fp16)
- Disk Space: 50GB
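As a rough illustration of why float16 halves the memory requirement, the sketch below estimates the memory taken by the model weights alone. It assumes roughly 6 billion parameters for GPT-J (an approximation introduced here, not a figure from this page) and ignores activations, the attention cache, and beam search state, so the results are lower bounds and not the minimums listed above.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params, precision):
    """Return the memory needed for the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    approx_params = 6e9  # assumed approximate GPT-J parameter count
    for precision in ("float32", "float16"):
        size = weight_memory_gb(approx_params, precision)
        print(f"{precision}: about {size:.0f} GB of weights")
```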
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
You can reuse the same environment as described for gptj-99.
Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--quiet \
--test_query_count=50
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
The above command should do a test run of the Offline scenario and record the estimated offline_target_qps.
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
SingleStream
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=SingleStream \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to do a rerun even when a valid run exists.
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
Tip
- It is advisable to use the commands in the Docker tab for CUDA. Run the below native command only if you are already on a CUDA setup with cuDNN and TensorRT installed.
You can reuse the same environment as described for gptj-99.
Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--quiet \
--test_query_count=50
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
The above command should do a test run of the Offline scenario and record the estimated offline_target_qps.
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
SingleStream
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=SingleStream \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to do a rerun even when a valid run exists.
ROCm device
Please click here to see the minimum system requirements for running the benchmark
- Disk Space: 50GB
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
You can reuse the same environment as described for gptj-99.
Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=test \
--device=rocm \
--quiet \
--test_query_count=10
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
The above command should do a test run of the Offline scenario and record the estimated offline_target_qps.
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=valid \
--device=rocm \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
SingleStream
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--scenario=SingleStream \
--execution_mode=valid \
--device=rocm \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=edge \
--execution_mode=valid \
--device=rocm \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to do a rerun even when a valid run exists.
Datacenter category
In the datacenter category, gptj-99.9 has the Offline and Server scenarios, and all of them are mandatory for a closed division submission.
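The category rules stated on this page can be captured in a small lookup, which may be handy when scripting runs. This is an illustrative sketch only: the dictionary and function names are hypothetical, and the scenario lists are copied from the edge and datacenter descriptions on this page.

```python
# Scenarios that are mandatory for a closed division submission,
# as listed on this page for each category.
REQUIRED_SCENARIOS = {
    "edge": ("Offline", "SingleStream"),
    "datacenter": ("Offline", "Server"),
}


def missing_scenarios(category, completed):
    """Return the required scenarios that have not been completed yet."""
    done = set(completed)
    return [s for s in REQUIRED_SCENARIOS[category] if s not in done]


if __name__ == "__main__":
    # Example: only the Offline run has finished in the datacenter category.
    print(missing_scenarios("datacenter", ["Offline"]))
```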
Pytorch framework
CPU device
Please click here to see the minimum system requirements for running the benchmark
- Disk Space: 50GB
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
You can reuse the same environment as described for gptj-99.
Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cpu \
--quiet \
--test_query_count=10
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
The above command should do a test run of the Offline scenario and record the estimated offline_target_qps.
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
Server
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value specified is too high, the latency constraint will not be met and the run will be considered invalid.
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value specified is too high, the latency constraint will not be met and the run will be considered invalid.
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to do a rerun even when a valid run exists.
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
You can reuse the same environment as described for gptj-99.
Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cpu \
--quiet \
--test_query_count=10
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
The above command should do a test run of the Offline scenario and record the estimated offline_target_qps.
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
Server
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value specified is too high, the latency constraint will not be met and the run will be considered invalid.
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value specified is too high, the latency constraint will not be met and the run will be considered invalid.
- --precision=bfloat16 can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to do a rerun even when a valid run exists.
CUDA device
Please click here to see the minimum system requirements for running the benchmark
- Device Memory: 80GB (fp32), 40GB (fp16)
- Disk Space: 50GB
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
You can reuse the same environment as described for gptj-99.
Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--quiet \
--test_query_count=50
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
The above command should do a test run of the Offline scenario and record the estimated offline_target_qps.
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
Server
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value specified is too high, the latency constraint will not be met and the run will be considered invalid.
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value specified is too high, the latency constraint will not be met and the run will be considered invalid.
- --precision=float16 can help run on GPUs with less RAM and can give better performance.
- --beam-size=1: a beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help run the model on GPUs with lower device memory.
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to do a rerun even when a valid run exists.
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
Tip
- It is advisable to use the commands in the Docker tab for CUDA. Run the below native command only if you are already on a CUDA setup with cuDNN and TensorRT installed.
You can reuse the same environment as described for gptj-99.
Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--quiet \
--test_query_count=50
Tip
- --precision=float16 can help the model run on GPUs with less memory and can give better performance.
- --beam-size=1: A beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help the model run on GPUs with lower device memory.
The above command performs a test run of the Offline scenario and records the estimated offline_target_qps.
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- --precision=float16 can help the model run on GPUs with less memory and can give better performance.
- --beam-size=1: A beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help the model run on GPUs with lower device memory.
Server
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the specified value is higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
- --precision=float16 can help the model run on GPUs with less memory and can give better performance.
- --beam-size=1: A beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help the model run on GPUs with lower device memory.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the specified value is higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
- --precision=float16 can help the model run on GPUs with less memory and can give better performance.
- --beam-size=1: A beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help the model run on GPUs with lower device memory.
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission which includes compliance runs
- Use --rerun to do a rerun even when a valid run exists
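If the commands are assembled from a script rather than typed by hand, the two optional RUN flags can be appended as in the minimal sketch below. The helper `add_run_options` is hypothetical and not part of CM; it only builds an argument list from flags that appear in this page and does not run anything.

```python
def add_run_options(base_args, closed_division=False, rerun=False):
    """Return a copy of base_args with the optional RUN flags appended."""
    args = list(base_args)
    if closed_division:
        # Closed division submission, which includes compliance runs.
        args.append("--division=closed")
    if rerun:
        # Force a rerun even when a valid run already exists.
        args.append("--rerun")
    return args


# Base command copied from the Offline example for the CUDA device.
base = [
    "cm", "run", "script",
    "--tags=run-mlperf,inference,_r5.0-dev",
    "--model=gptj-99.9",
    "--implementation=reference",
    "--framework=pytorch",
    "--category=datacenter",
    "--scenario=Offline",
    "--execution_mode=valid",
    "--device=cuda",
    "--quiet",
]
print(" ".join(add_run_options(base, closed_division=True, rerun=True)))
```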
ROCm device
Please click here to see the minimum system requirements for running the benchmark
- Disk Space: 50GB
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
You can reuse the same environment as described for gptj-99.
Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=rocm \
--quiet \
--test_query_count=10
Tip
- --precision=float16 can help the model run on GPUs with less memory and can give better performance.
- --beam-size=1: A beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help the model run on GPUs with lower device memory.
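The beam size rule from the tip can be expressed as a small pre-flight check, sketched below. The function `beam_size_allowed` and the constant name are hypothetical; the only facts taken from this page are that a beam size of 4 is mandatory for the closed division and that smaller values are an option otherwise.

```python
CLOSED_DIVISION_BEAM_SIZE = 4  # mandatory value stated in the tip


def beam_size_allowed(beam_size, division):
    """Return True if beam_size is acceptable for the given division."""
    if beam_size < 1:
        raise ValueError("beam_size must be at least 1")
    if division == "closed":
        return beam_size == CLOSED_DIVISION_BEAM_SIZE
    # Outside the closed division a reduced beam size can be used,
    # for example to fit GPUs with less device memory.
    return True


print(beam_size_allowed(1, "closed"))
print(beam_size_allowed(1, "open"))
```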
The above command performs a test run of the Offline scenario and records the estimated offline_target_qps.
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=rocm \
--quiet
Tip
- --precision=float16 can help the model run on GPUs with less memory and can give better performance.
- --beam-size=1: A beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help the model run on GPUs with lower device memory.
Server
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=rocm \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the specified value is higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
- --precision=float16 can help the model run on GPUs with less memory and can give better performance.
- --beam-size=1: A beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help the model run on GPUs with lower device memory.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=rocm \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the specified value is higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
- --precision=float16 can help the model run on GPUs with less memory and can give better performance.
- --beam-size=1: A beam size of 4 is mandatory for a closed division submission, but reducing the beam size can help the model run on GPUs with lower device memory.
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission which includes compliance runs
- Use --rerun to do a rerun even when a valid run exists
- If you want to download the official MLPerf model and dataset for gptj-99.9 you can follow this README.
Nvidia MLPerf Implementation
GPTJ-99
Edge category
In the edge category, gptj-99 has Offline, SingleStream scenarios and all the scenarios are mandatory for a closed division submission.
TensorRT framework
CUDA device
Please click here to see the minimum system requirements for running the benchmark
- Device Memory: 16GB
- Disk Space: 50GB
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Docker Container Build and Performance Estimation for Offline Scenario
Tip
- Number of threads could be adjusted using --threads=#, where # is the desired number of threads. This option works only if the implementation in use supports threading.
- Batch size could be adjusted using --batch_size=#, where # is the desired batch size. This option works only if the implementation in use supports the given batch size.
- _r4.1-dev could also be given instead of _r5.0-dev if you want to run the benchmark with the MLPerf version being 4.1.
Tip
- Default batch size is assigned based on GPU memory or the specified GPU. See the additional options for the docker launch or RUN command for how to specify the GPU name.
- When run with --all_models=yes, all the benchmark models of NVIDIA implementation can be executed within the same container.
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99 \
--implementation=nvidia \
--framework=tensorrt \
--category=edge \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--docker --quiet \
--test_query_count=50
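The --tags value in these commands follows a fixed pattern: the base tags, an optional performance-estimation pair, the MLPerf version tag, and an optional all-scenarios tag. A minimal sketch of that pattern is shown below; `build_tags` is a hypothetical helper, and it only reproduces tag strings that already appear in this page.

```python
def build_tags(version="r5.0-dev", find_performance=False, all_scenarios=False):
    """Build the --tags argument used by the run commands in this page."""
    tags = ["run-mlperf", "inference"]
    if find_performance:
        # Used by the performance estimation commands.
        tags += ["_find-performance", "_full"]
    # Version tag, for example _r5.0-dev or _r4.1-dev.
    tags.append(f"_{version}")
    if all_scenarios:
        tags.append("_all-scenarios")
    return "--tags=" + ",".join(tags)


print(build_tags(find_performance=True))
print(build_tags(version="r4.1-dev", all_scenarios=True))
```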
Please click here to see more options for the docker launch
- --docker_cm_repo=<Custom CM GitHub repo URL in username@repo format>: to use a custom fork of cm4mlops repository inside the docker image
- --docker_cm_repo_branch=<Custom CM GitHub repo Branch>: to checkout a custom branch of the cloned cm4mlops repository inside the docker image
- --docker_cache=no: to not use docker cache during the image build
- --gpu_name=<Name of the GPU>: The GPUs with supported configs in CM are orin, rtx_4090, rtx_a6000, rtx_6000_ada, l4, t4 and a100. For other GPUs, default configuration as per the GPU memory will be used.
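When scripting the launch, it can be useful to pass --gpu_name only for GPUs that have a supported config and to fall back to the memory-based default otherwise. The sketch below assumes exactly the names listed above; the helper `gpu_name_flag` and the lowercase normalization are assumptions for illustration, not CM behavior.

```python
# GPU names with supported configs, as listed in this page.
SUPPORTED_GPU_NAMES = (
    "orin", "rtx_4090", "rtx_a6000", "rtx_6000_ada", "l4", "t4", "a100",
)


def gpu_name_flag(gpu_name):
    """Return ['--gpu_name=<name>'] for a supported GPU, else an empty list.

    An empty list means no flag is passed, so the default configuration
    based on GPU memory is used.
    """
    normalized = gpu_name.strip().lower()
    if normalized in SUPPORTED_GPU_NAMES:
        return [f"--gpu_name={normalized}"]
    return []


print(gpu_name_flag("L4"))
print(gpu_name_flag("some_other_gpu"))
```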
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=nvidia \
--framework=tensorrt \
--category=edge \
--scenario=Offline \
--execution_mode=valid \
--device=cuda \
--quiet
SingleStream
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=nvidia \
--framework=tensorrt \
--category=edge \
--scenario=SingleStream \
--execution_mode=valid \
--device=cuda \
--quiet
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99 \
--implementation=nvidia \
--framework=tensorrt \
--category=edge \
--execution_mode=valid \
--device=cuda \
--quiet
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission which includes compliance runs
- Use --rerun to do a rerun even when a valid run exists
- --gpu_name=<Name of the GPU>: The GPUs with supported configs in CM are orin, rtx_4090, rtx_a6000, rtx_6000_ada, l4, t4 and a100. For other GPUs, default configuration as per the GPU memory will be used.
Datacenter category
In the datacenter category, gptj-99 has Offline, Server scenarios and all the scenarios are mandatory for a closed division submission.
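Since every scenario of a category is mandatory for a closed division submission, a quick check of which scenarios are still missing can look like the sketch below. The mapping mirrors the edge and datacenter scenario lists given in this page; `missing_scenarios` is a hypothetical helper name.

```python
# Mandatory scenarios per category for gptj-99 and gptj-99.9,
# as stated in the category descriptions.
REQUIRED_SCENARIOS = {
    "edge": ("Offline", "SingleStream"),
    "datacenter": ("Offline", "Server"),
}


def missing_scenarios(category, completed):
    """Return the mandatory scenarios not yet in the completed list."""
    required = REQUIRED_SCENARIOS[category]
    return [scenario for scenario in required if scenario not in completed]


print(missing_scenarios("datacenter", ["Offline"]))
```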
TensorRT framework
CUDA device
Please click here to see the minimum system requirements for running the benchmark
- Device Memory: 16GB
- Disk Space: 50GB
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Docker Container Build and Performance Estimation for Offline Scenario
Tip
- Number of threads could be adjusted using --threads=#, where # is the desired number of threads. This option works only if the implementation in use supports threading.
- Batch size could be adjusted using --batch_size=#, where # is the desired batch size. This option works only if the implementation in use supports the given batch size.
- _r4.1-dev could also be given instead of _r5.0-dev if you want to run the benchmark with the MLPerf version being 4.1.
Tip
- Default batch size is assigned based on GPU memory or the specified GPU. See the additional options for the docker launch or RUN command for how to specify the GPU name.
- When run with --all_models=yes, all the benchmark models of NVIDIA implementation can be executed within the same container.
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99 \
--implementation=nvidia \
--framework=tensorrt \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--docker --quiet \
--test_query_count=50
Please click here to see more options for the docker launch
- --docker_cm_repo=<Custom CM GitHub repo URL in username@repo format>: to use a custom fork of cm4mlops repository inside the docker image
- --docker_cm_repo_branch=<Custom CM GitHub repo Branch>: to checkout a custom branch of the cloned cm4mlops repository inside the docker image
- --docker_cache=no: to not use docker cache during the image build
- --gpu_name=<Name of the GPU>: The GPUs with supported configs in CM are orin, rtx_4090, rtx_a6000, rtx_6000_ada, l4, t4 and a100. For other GPUs, default configuration as per the GPU memory will be used.
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=nvidia \
--framework=tensorrt \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cuda \
--quiet
Server
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=nvidia \
--framework=tensorrt \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the specified value is higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99 \
--implementation=nvidia \
--framework=tensorrt \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the specified value is higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission which includes compliance runs
- Use --rerun to do a rerun even when a valid run exists
- --gpu_name=<Name of the GPU>: The GPUs with supported configs in CM are orin, rtx_4090, rtx_a6000, rtx_6000_ada, l4, t4 and a100. For other GPUs, default configuration as per the GPU memory will be used.
GPTJ-99.9
Edge category
In the edge category, gptj-99.9 has Offline, SingleStream scenarios and all the scenarios are mandatory for a closed division submission.
TensorRT framework
CUDA device
Please click here to see the minimum system requirements for running the benchmark
- Device Memory: 16GB
- Disk Space: 50GB
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
You can reuse the same environment as described for gptj-99.
Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99.9 \
--implementation=nvidia \
--framework=tensorrt \
--category=edge \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--quiet \
--test_query_count=50
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=nvidia \
--framework=tensorrt \
--category=edge \
--scenario=Offline \
--execution_mode=valid \
--device=cuda \
--quiet
SingleStream
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=nvidia \
--framework=tensorrt \
--category=edge \
--scenario=SingleStream \
--execution_mode=valid \
--device=cuda \
--quiet
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99.9 \
--implementation=nvidia \
--framework=tensorrt \
--category=edge \
--execution_mode=valid \
--device=cuda \
--quiet
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission which includes compliance runs
- Use --rerun to do a rerun even when a valid run exists
- --gpu_name=<Name of the GPU>: The GPUs with supported configs in CM are orin, rtx_4090, rtx_a6000, rtx_6000_ada, l4, t4 and a100. For other GPUs, default configuration as per the GPU memory will be used.
Datacenter category
In the datacenter category, gptj-99.9 has Offline, Server scenarios and all the scenarios are mandatory for a closed division submission.
TensorRT framework
CUDA device
Please click here to see the minimum system requirements for running the benchmark
- Device Memory: 16GB
- Disk Space: 50GB
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
You can reuse the same environment as described for gptj-99.
Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99.9 \
--implementation=nvidia \
--framework=tensorrt \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--quiet \
--test_query_count=50
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=nvidia \
--framework=tensorrt \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cuda \
--quiet
Server
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99.9 \
--implementation=nvidia \
--framework=tensorrt \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the specified value is higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99.9 \
--implementation=nvidia \
--framework=tensorrt \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the specified value is higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission which includes compliance runs
- Use --rerun to do a rerun even when a valid run exists
- --gpu_name=<Name of the GPU>: The GPUs with supported configs in CM are orin, rtx_4090, rtx_a6000, rtx_6000_ada, l4, t4 and a100. For other GPUs, default configuration as per the GPU memory will be used.
Intel MLPerf Implementation
Tip
- Intel MLPerf inference implementation is available only for datacenter category and has been tested only on a limited number of systems. Most of the benchmarks using Intel implementation require at least Intel Sapphire Rapids or higher CPU generation.
GPTJ-99
Edge category
In the edge category, gptj-99 has Offline, SingleStream scenarios and all the scenarios are mandatory for a closed division submission.
Pytorch framework
CPU device
Please click here to see the minimum system requirements for running the benchmark
- Disk Space: 50GB
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Docker Container Build and Performance Estimation for Offline Scenario
Tip
- Number of threads could be adjusted using --threads=#, where # is the desired number of threads. This option works only if the implementation in use supports threading.
- Batch size could be adjusted using --batch_size=#, where # is the desired batch size. This option works only if the implementation in use supports the given batch size.
- _r4.1-dev could also be given instead of _r5.0-dev if you want to run the benchmark with the MLPerf version being 4.1.
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99 \
--implementation=intel \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=test \
--device=cpu \
--docker --quiet \
--test_query_count=10
Please click here to see more options for the docker launch
- --docker_cm_repo=<Custom CM GitHub repo URL in username@repo format>: to use a custom fork of cm4mlops repository inside the docker image
- --docker_cm_repo_branch=<Custom CM GitHub repo Branch>: to checkout a custom branch of the cloned cm4mlops repository inside the docker image
- --docker_cache=no: to not use docker cache during the image build
- --docker_os=ubuntu: ubuntu and rhel are supported.
- --docker_os_version=20.04: [20.04, 22.04] are supported for Ubuntu and [8, 9] for RHEL
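The supported OS and version pairs can be checked before launching a build, as in the minimal sketch below. The table reproduces the combinations listed above; `docker_os_flags` is a hypothetical helper and does not call CM or Docker.

```python
# Supported base images for the docker launch, as listed in this page.
SUPPORTED_DOCKER_OS = {
    "ubuntu": ("20.04", "22.04"),
    "rhel": ("8", "9"),
}


def docker_os_flags(os_name, os_version):
    """Return the two docker OS flags, or raise ValueError if unsupported."""
    versions = SUPPORTED_DOCKER_OS.get(os_name)
    if versions is None:
        raise ValueError(f"unsupported docker OS: {os_name}")
    if os_version not in versions:
        raise ValueError(f"unsupported {os_name} version: {os_version}")
    return [f"--docker_os={os_name}", f"--docker_os_version={os_version}"]


print(" ".join(docker_os_flags("ubuntu", "22.04")))
```

Versions are passed as strings so that values such as "20.04" keep their trailing zero.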
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=intel \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=valid \
--device=cpu \
--quiet
SingleStream
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=intel \
--framework=pytorch \
--category=edge \
--scenario=SingleStream \
--execution_mode=valid \
--device=cpu \
--quiet
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99 \
--implementation=intel \
--framework=pytorch \
--category=edge \
--execution_mode=valid \
--device=cpu \
--quiet
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission which includes compliance runs
- Use --rerun to do a rerun even when a valid run exists
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Setup a virtual environment for Python
cm run script --tags=install,python-venv --name=mlperf
export CM_SCRIPT_EXTRA_CMD="--adr.python.name=mlperf"
# Performance Estimation for Offline Scenario
Tip
- Number of threads could be adjusted using --threads=#, where # is the desired number of threads. This option works only if the implementation in use supports threading.
- Batch size could be adjusted using --batch_size=#, where # is the desired batch size. This option works only if the implementation in use supports the given batch size.
- _r4.1-dev could also be given instead of _r5.0-dev if you want to run the benchmark with the MLPerf version being 4.1.
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99 \
--implementation=intel \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=test \
--device=cpu \
--quiet \
--test_query_count=10
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=intel \
--framework=pytorch \
--category=edge \
--scenario=Offline \
--execution_mode=valid \
--device=cpu \
--quiet
SingleStream
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=intel \
--framework=pytorch \
--category=edge \
--scenario=SingleStream \
--execution_mode=valid \
--device=cpu \
--quiet
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99 \
--implementation=intel \
--framework=pytorch \
--category=edge \
--execution_mode=valid \
--device=cpu \
--quiet
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission which includes compliance runs
- Use --rerun to do a rerun even when a valid run exists
Datacenter category
In the datacenter category, gptj-99 has Offline, Server scenarios and all the scenarios are mandatory for a closed division submission.
Pytorch framework
CPU device
Please click here to see the minimum system requirements for running the benchmark
- Disk Space: 50GB
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Docker Container Build and Performance Estimation for Offline Scenario
Tip
- Number of threads could be adjusted using --threads=#, where # is the desired number of threads. This option works only if the implementation in use supports threading.
- Batch size could be adjusted using --batch_size=#, where # is the desired batch size. This option works only if the implementation in use supports the given batch size.
- _r4.1-dev could also be given instead of _r5.0-dev if you want to run the benchmark with the MLPerf version being 4.1.
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cpu \
--docker --quiet \
--test_query_count=10
Please click here to see more options for the docker launch
- --docker_cm_repo=<Custom CM GitHub repo URL in username@repo format>: to use a custom fork of cm4mlops repository inside the docker image
- --docker_cm_repo_branch=<Custom CM GitHub repo Branch>: to checkout a custom branch of the cloned cm4mlops repository inside the docker image
- --docker_cache=no: to not use docker cache during the image build
- --docker_os=ubuntu: ubuntu and rhel are supported.
- --docker_os_version=20.04: [20.04, 22.04] are supported for Ubuntu and [8, 9] for RHEL
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cpu \
--quiet
Server
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the specified value is higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the specified value is higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission which includes compliance runs
- Use --rerun to do a rerun even when a valid run exists
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Setup a virtual environment for Python
cm run script --tags=install,python-venv --name=mlperf
export CM_SCRIPT_EXTRA_CMD="--adr.python.name=mlperf"
# Performance Estimation for Offline Scenario
Tip
- Number of threads could be adjusted using --threads=#, where # is the desired number of threads. This option works only if the implementation in use supports threading.
- Batch size could be adjusted using --batch_size=#, where # is the desired batch size. This option works only if the implementation in use supports the given batch size.
- _r4.1-dev could also be given instead of _r5.0-dev if you want to run the benchmark with the MLPerf version being 4.1.
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev \
--model=gptj-99 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cpu \
--quiet \
--test_query_count=10
Offline
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cpu \
--quiet
Server
cm run script --tags=run-mlperf,inference,_r5.0-dev \
--model=gptj-99 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the specified value is higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
All Scenarios
cm run script --tags=run-mlperf,inference,_r5.0-dev,_all-scenarios \
--model=gptj-99 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
- <SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the specified value is higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
Please click here to see more options for the RUN command
- Use --division=closed to do a closed division submission which includes compliance runs
- Use --rerun to do a rerun even when a valid run exists
Qualcomm AI100 MLPerf Implementation
WIP