Recommendation using DLRM v2
Benchmark Implementations
MLPerf Reference Implementation in Python
Tip
- MLCommons reference implementations are meant only to provide a rules-compliant reference for submitters and, in most cases, are not the best performing. To benchmark a system, it is advisable to use the vendor MLPerf implementation for that system, such as NVIDIA or Intel.
DLRM-V2-99
Datacenter category
In the datacenter category, dlrm-v2-99 has Offline and Server scenarios, both of which are mandatory for a closed division submission.
PyTorch framework
CPU device
Minimum system requirements for running the benchmark:
- Disk Space: 500GB
- System Memory (RAM+SWAP): 512GB
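Before starting a run, it can help to confirm that the host meets these minimums. The following is a minimal sketch, assuming a Linux host with GNU df and /proc/meminfo. The helper names (meets_minimum, disk_gb, mem_plus_swap_gb, report) are illustrative and not part of CM, and the path checked ($HOME) is an assumption; point it at the filesystem where your CM cache and datasets will live.

```shell
#!/usr/bin/env bash
# Illustrative helpers (not part of CM): compare a host against the listed minimums.

# meets_minimum HAVE NEED -> succeeds when HAVE >= NEED (integers, same unit).
meets_minimum() {
  [ "$1" -ge "$2" ] 2>/dev/null
}

# Free space in whole GiB on the filesystem holding the given path (GNU df assumed).
disk_gb() {
  df -BG --output=avail "$1" | tail -n 1 | tr -dc '0-9'
}

# MemTotal + SwapTotal from /proc/meminfo, converted from kB to whole GiB.
mem_plus_swap_gb() {
  awk '/^(MemTotal|SwapTotal):/ { kb += $2 } END { print int(kb / 1024 / 1024) }' /proc/meminfo
}

# report LABEL HAVE NEED -> prints one status line.
report() {
  if meets_minimum "$2" "$3"; then
    echo "$1: ${2}GB found, ${3}GB required: OK"
  else
    echo "$1: ${2}GB found, ${3}GB required: INSUFFICIENT"
  fi
}

report "Disk space" "$(disk_gb "$HOME")" 500
report "System memory (RAM+SWAP)" "$(mem_plus_swap_gb)" 512
```

The check treats GB and GiB as interchangeable, which is close enough for a quick pre-flight test but not an exact accounting.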
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Docker Container Build and Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cpu \
--docker --quiet \
--test_query_count=50
More options for the docker launch:
- --docker_cm_repo=<Custom CM repo URL>: to use a custom fork of cm4mlops repository inside the docker image
- --docker_cache=no: to not use docker cache during the image build
- --docker_os=ubuntu: ubuntu and rhel are supported
- --docker_os_version=20.04: [20.04, 22.04] are supported for Ubuntu and [8, 9] for RHEL
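As an illustration, the options listed above can be appended to the Offline estimation command for this section. The sketch below only assembles and prints the command for review rather than running it. The flags come from the list above, while the chosen values (no cache, Ubuntu 22.04) and the DOCKER_OPTS variable name are example choices.

```shell
#!/usr/bin/env bash
# Extra docker launch options from the list above (values are example choices).
DOCKER_OPTS="--docker_cache=no --docker_os=ubuntu --docker_os_version=22.04"

# Print the full command instead of executing it.
# $DOCKER_OPTS is intentionally unquoted so it splits into separate flags.
echo cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
  --model=dlrm-v2-99 \
  --implementation=reference \
  --framework=pytorch \
  --category=datacenter \
  --scenario=Offline \
  --execution_mode=test \
  --device=cpu \
  --docker --quiet \
  --test_query_count=50 \
  $DOCKER_OPTS
```

Dropping the leading echo turns the printed line into the actual invocation.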
Offline
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cpu \
--quiet
Server
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
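The rule of thumb in this tip can be turned into concrete starting values. The following is a minimal sketch: scale_qps is a hypothetical helper, and the OFFLINE_QPS value is a placeholder to be replaced with the throughput reported by your own Offline run.

```shell
#!/usr/bin/env bash
# Illustrative helper (not part of CM): scale a measured Offline QPS by a percentage.
# Usage: scale_qps OFFLINE_QPS PERCENT
scale_qps() {
  awk -v qps="$1" -v pct="$2" 'BEGIN { printf "%.1f\n", qps * pct / 100 }'
}

OFFLINE_QPS=1000   # placeholder: use the QPS reported by your Offline run

echo "First guess (80% of Offline): $(scale_qps "$OFFLINE_QPS" 80)"
echo "Lower bound to try (50% of Offline): $(scale_qps "$OFFLINE_QPS" 50)"
```

If the run at the first guess fails the latency constraint, lower the target toward the second value and rerun. The right value still has to be found by experiment.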
All Scenarios
cm run script --tags=run-mlperf,inference,_r4.1-dev,_all-scenarios \
--model=dlrm-v2-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
More options for the RUN command:
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to force a rerun even when a valid run exists.
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Set up a virtual environment for Python
cm run script --tags=install,python-venv --name=mlperf
export CM_SCRIPT_EXTRA_CMD="--adr.python.name=mlperf"
# Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cpu \
--quiet \
--test_query_count=50
Offline
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cpu \
--quiet
Server
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
All Scenarios
cm run script --tags=run-mlperf,inference,_r4.1-dev,_all-scenarios \
--model=dlrm-v2-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
More options for the RUN command:
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to force a rerun even when a valid run exists.
CUDA device
Minimum system requirements for running the benchmark:
- Device Memory: 2x80GB
- Disk Space: 500GB
- System Memory (RAM+SWAP): 512GB
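The device memory requirement can be checked with nvidia-smi before launching. This is a rough sketch under stated assumptions: count_gpus_with_at_least is a hypothetical helper, and the 80000 MiB threshold is an approximation of an "80GB" device rather than an official cutoff.

```shell
#!/usr/bin/env bash
# Illustrative helper (not part of CM): count GPUs whose total memory meets a threshold.
# Reads one MiB value per line on stdin; the threshold (MiB) is the first argument.
count_gpus_with_at_least() {
  awk -v min="$1" '$1 + 0 >= min { n++ } END { print n + 0 }'
}

MIN_MIB=80000   # assumption: rough MiB threshold for an "80GB" device
NEEDED=2        # from the 2x80GB requirement

if command -v nvidia-smi >/dev/null 2>&1; then
  found=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits \
    | count_gpus_with_at_least "$MIN_MIB")
  echo "GPUs with at least ${MIN_MIB} MiB: ${found} (need ${NEEDED})"
else
  echo "nvidia-smi not found; cannot check device memory"
fi
```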
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Docker Container Build and Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--docker --quiet \
--test_query_count=50
More options for the docker launch:
- --docker_cm_repo=<Custom CM repo URL>: to use a custom fork of cm4mlops repository inside the docker image
- --docker_cache=no: to not use docker cache during the image build
Offline
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cuda \
--quiet
Server
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
All Scenarios
cm run script --tags=run-mlperf,inference,_r4.1-dev,_all-scenarios \
--model=dlrm-v2-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
More options for the RUN command:
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to force a rerun even when a valid run exists.
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Set up a virtual environment for Python
cm run script --tags=install,python-venv --name=mlperf
export CM_SCRIPT_EXTRA_CMD="--adr.python.name=mlperf"
# Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--quiet \
--test_query_count=50
Offline
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cuda \
--quiet
Server
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
All Scenarios
cm run script --tags=run-mlperf,inference,_r4.1-dev,_all-scenarios \
--model=dlrm-v2-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
More options for the RUN command:
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to force a rerun even when a valid run exists.
ROCm device
Minimum system requirements for running the benchmark:
- Disk Space: 500GB
- System Memory (RAM+SWAP): 512GB
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Set up a virtual environment for Python
cm run script --tags=install,python-venv --name=mlperf
export CM_SCRIPT_EXTRA_CMD="--adr.python.name=mlperf"
# Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=rocm \
--quiet \
--test_query_count=50
Offline
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=rocm \
--quiet
Server
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=rocm \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
All Scenarios
cm run script --tags=run-mlperf,inference,_r4.1-dev,_all-scenarios \
--model=dlrm-v2-99 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=rocm \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
More options for the RUN command:
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to force a rerun even when a valid run exists.
DLRM-V2-99.9
Datacenter category
In the datacenter category, dlrm-v2-99.9 has Offline and Server scenarios, both of which are mandatory for a closed division submission.
PyTorch framework
CPU device
Minimum system requirements for running the benchmark:
- Disk Space: 500GB
- System Memory (RAM+SWAP): 512GB
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
You can reuse the same environment as described for dlrm-v2-99.
Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cpu \
--quiet \
--test_query_count=50
Offline
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cpu \
--quiet
Server
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
All Scenarios
cm run script --tags=run-mlperf,inference,_r4.1-dev,_all-scenarios \
--model=dlrm-v2-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
More options for the RUN command:
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to force a rerun even when a valid run exists.
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
You can reuse the same environment as described for dlrm-v2-99.
Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cpu \
--quiet \
--test_query_count=50
Offline
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cpu \
--quiet
Server
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
All Scenarios
cm run script --tags=run-mlperf,inference,_r4.1-dev,_all-scenarios \
--model=dlrm-v2-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
More options for the RUN command:
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to force a rerun even when a valid run exists.
CUDA device
Minimum system requirements for running the benchmark:
- Device Memory: 2x80GB
- Disk Space: 500GB
- System Memory (RAM+SWAP): 512GB
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
You can reuse the same environment as described for dlrm-v2-99.
Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--quiet \
--test_query_count=50
Offline
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cuda \
--quiet
Server
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
All Scenarios
cm run script --tags=run-mlperf,inference,_r4.1-dev,_all-scenarios \
--model=dlrm-v2-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
More options for the RUN command:
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to force a rerun even when a valid run exists.
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
You can reuse the same environment as described for dlrm-v2-99.
Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--quiet \
--test_query_count=50
Offline
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cuda \
--quiet
Server
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
All Scenarios
cm run script --tags=run-mlperf,inference,_r4.1-dev,_all-scenarios \
--model=dlrm-v2-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
More options for the RUN command:
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to force a rerun even when a valid run exists.
ROCm device
Minimum system requirements for running the benchmark:
- Disk Space: 500GB
- System Memory (RAM+SWAP): 512GB
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
You can reuse the same environment as described for dlrm-v2-99.
Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=rocm \
--quiet \
--test_query_count=50
Offline
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=rocm \
--quiet
Server
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=rocm \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
All Scenarios
cm run script --tags=run-mlperf,inference,_r4.1-dev,_all-scenarios \
--model=dlrm-v2-99.9 \
--implementation=reference \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=rocm \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
More options for the RUN command:
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to force a rerun even when a valid run exists.
- If you want to download the official MLPerf model and dataset for dlrm-v2-99.9 you can follow this README.
Nvidia MLPerf Implementation
DLRM-V2-99
Datacenter category
In the datacenter category, dlrm-v2-99 has Offline and Server scenarios, both of which are mandatory for a closed division submission.
TensorRT framework
CUDA device
Minimum system requirements for running the benchmark:
- Device Memory: 24GB
- Disk Space: 500GB
- System Memory (RAM+SWAP): 512GB
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Docker Container Build and Performance Estimation for Offline Scenario
Tip
If run with --all_models=yes, all the benchmark models of the NVIDIA implementation can be run within the same container.
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=nvidia \
--framework=tensorrt \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--docker --quiet \
--test_query_count=50 \
--criteo_day23_raw_data_path=<PATH_TO_CRITEO_DAY23_RAW_DATA>
More options for the docker launch:
- --docker_cm_repo=<Custom CM repo URL>: to use a custom fork of cm4mlops repository inside the docker image
- --docker_cache=no: to not use docker cache during the image build
Offline
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=nvidia \
--framework=tensorrt \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cuda \
--quiet \
--criteo_day23_raw_data_path=<PATH_TO_CRITEO_DAY23_RAW_DATA>
Server
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=nvidia \
--framework=tensorrt \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet \
--criteo_day23_raw_data_path=<PATH_TO_CRITEO_DAY23_RAW_DATA>
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
All Scenarios
cm run script --tags=run-mlperf,inference,_r4.1-dev,_all-scenarios \
--model=dlrm-v2-99 \
--implementation=nvidia \
--framework=tensorrt \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet \
--criteo_day23_raw_data_path=<PATH_TO_CRITEO_DAY23_RAW_DATA>
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
More options for the RUN command:
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to force a rerun even when a valid run exists.
DLRM-V2-99.9
Datacenter category
In the datacenter category, dlrm-v2-99.9 has Offline and Server scenarios, both of which are mandatory for a closed division submission.
TensorRT framework
CUDA device
Minimum system requirements for running the benchmark:
- Device Memory: 24GB
- Disk Space: 500GB
- System Memory (RAM+SWAP): 512GB
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
You can reuse the same environment as described for dlrm-v2-99.
Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=nvidia \
--framework=tensorrt \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--quiet \
--test_query_count=50 \
--criteo_day23_raw_data_path=<PATH_TO_CRITEO_DAY23_RAW_DATA>
Offline
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=nvidia \
--framework=tensorrt \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cuda \
--quiet \
--criteo_day23_raw_data_path=<PATH_TO_CRITEO_DAY23_RAW_DATA>
Server
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=nvidia \
--framework=tensorrt \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet \
--criteo_day23_raw_data_path=<PATH_TO_CRITEO_DAY23_RAW_DATA>
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
All Scenarios
cm run script --tags=run-mlperf,inference,_r4.1-dev,_all-scenarios \
--model=dlrm-v2-99.9 \
--implementation=nvidia \
--framework=tensorrt \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cuda \
--quiet \
--criteo_day23_raw_data_path=<PATH_TO_CRITEO_DAY23_RAW_DATA>
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
More options for the RUN command:
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to force a rerun even when a valid run exists.
Intel MLPerf Implementation
Tip
- The Intel MLPerf inference implementation is available only for the datacenter category and has been tested on only a limited number of systems. Most of the benchmarks using the Intel implementation require an Intel Sapphire Rapids or newer CPU generation.
DLRM-V2-99
Datacenter category
In the datacenter category, dlrm-v2-99 has Offline and Server scenarios, both of which are mandatory for a closed division submission.
PyTorch framework
CPU device
Minimum system requirements for running the benchmark:
- Disk Space: 500GB
- System Memory (RAM+SWAP): 512GB
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Docker Container Build and Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cpu \
--docker --quiet \
--test_query_count=50
More options for the docker launch:
- --docker_cm_repo=<Custom CM repo URL>: to use a custom fork of cm4mlops repository inside the docker image
- --docker_cache=no: to not use docker cache during the image build
- --docker_os=ubuntu: ubuntu and rhel are supported
- --docker_os_version=20.04: [20.04, 22.04] are supported for Ubuntu and [8, 9] for RHEL
Offline
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cpu \
--quiet
Server
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
All Scenarios
cm run script --tags=run-mlperf,inference,_r4.1-dev,_all-scenarios \
--model=dlrm-v2-99 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
More options for the RUN command:
- Use --division=closed to do a closed division submission, which includes compliance runs.
- Use --rerun to force a rerun even when a valid run exists.
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
# Set up a virtual environment for Python
cm run script --tags=install,python-venv --name=mlperf
export CM_SCRIPT_EXTRA_CMD="--adr.python.name=mlperf"
# Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cpu \
--quiet \
--test_query_count=50
Offline
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cpu \
--quiet
Server
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set too high, the latency constraint will not be met and the run will be considered invalid.
All Scenarios
cm run script --tags=run-mlperf,inference,_r4.1-dev,_all-scenarios \
--model=dlrm-v2-99 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
More options for the RUN command
- Use --division=closed to do a closed division submission which includes compliance runs
- Use --rerun to do a rerun even when a valid run exists
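As a sketch of where these optional flags go, the snippet below assembles the All Scenarios command from this section with both --division=closed and --rerun appended, and prints it for review instead of running it. The SERVER_TARGET_QPS value of 9600 is a hypothetical placeholder, and the CMD variable is only an illustration device.

```shell
# Hypothetical placeholder; substitute the value determined for your system.
SERVER_TARGET_QPS=9600

# Assemble the All Scenarios command plus the two optional flags.
# Inside double quotes, backslash-newline joins the lines into one.
CMD="cm run script --tags=run-mlperf,inference,_r4.1-dev,_all-scenarios \
--model=dlrm-v2-99 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--server_target_qps=${SERVER_TARGET_QPS} \
--execution_mode=valid \
--device=cpu \
--division=closed \
--rerun \
--quiet"

# Print the command for inspection; nothing is executed here.
echo "$CMD"
```

Once the printed command looks right, it can be pasted into a shell where CM is installed.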
DLRM-V2-99.9
Datacenter category
In the datacenter category, dlrm-v2-99.9 has the Offline and Server scenarios, and both are mandatory for a closed division submission.
Pytorch framework
CPU device
Minimum system requirements for running the benchmark
- Disk Space: 500GB
- System Memory (RAM+SWAP): 512GB
Docker Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
You can reuse the same environment as described for dlrm-v2-99.
Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cpu \
--quiet \
--test_query_count=50
Offline
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cpu \
--quiet
Server
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
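Because the sustainable Server rate can sit anywhere from roughly 80% down to below 50% of the Offline QPS, one possible approach is to list a few candidate targets up front and try them from highest to lowest. The sketch below only generates the candidates. The Offline QPS of 12000 and the 10% step size are assumptions for illustration, not values from the benchmark rules.

```shell
# Hypothetical Offline result; replace with your measured value.
OFFLINE_QPS=12000

# Build a space-separated list of candidate Server targets, highest first.
CANDIDATES=""
for PCT in 80 70 60 50; do
  QPS=$(( OFFLINE_QPS * PCT / 100 ))
  # Append with a separating space only when the list is non-empty.
  CANDIDATES="${CANDIDATES}${CANDIDATES:+ }${QPS}"
  echo "${PCT}% of Offline -> --server_target_qps=${QPS}"
done
```

Each candidate would then be passed to the Server command above in turn, stopping at the first one that produces a valid run.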
All Scenarios
cm run script --tags=run-mlperf,inference,_r4.1-dev,_all-scenarios \
--model=dlrm-v2-99.9 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
More options for the RUN command
- Use --division=closed to do a closed division submission which includes compliance runs
- Use --rerun to do a rerun even when a valid run exists
Native Environment
Please refer to the installation page to install CM for running the automated benchmark commands.
You can reuse the same environment as described for dlrm-v2-99.
Performance Estimation for Offline Scenario
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cpu \
--quiet \
--test_query_count=50
Offline
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--scenario=Offline \
--execution_mode=valid \
--device=cpu \
--quiet
Server
cm run script --tags=run-mlperf,inference,_r4.1-dev \
--model=dlrm-v2-99.9 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--scenario=Server \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
All Scenarios
cm run script --tags=run-mlperf,inference,_r4.1-dev,_all-scenarios \
--model=dlrm-v2-99.9 \
--implementation=intel \
--framework=pytorch \
--category=datacenter \
--server_target_qps=<SERVER_TARGET_QPS> \
--execution_mode=valid \
--device=cpu \
--quiet
Tip
<SERVER_TARGET_QPS> must be determined manually. It is usually around 80% of the Offline QPS, but on some systems it can drop below 50%. If the value is set higher than the system can sustain, the latency constraint will not be met and the run will be considered invalid.
More options for the RUN command
- Use --division=closed to do a closed division submission which includes compliance runs
- Use --rerun to do a rerun even when a valid run exists