
NVIDIA Deep Learning Inference Performance Reproduction Guide

This repository provides instructions to reproduce the inference performance data from the NVIDIA Deep Learning Performance - AI Inference page.

Prerequisites

Before configuring the orchestrator, ensure you have downloaded the required NVFP4 model weights from Hugging Face.

Environment Setup

Benchmarking is orchestrated using srt-slurm, a command-line tool for distributed LLM inference benchmarks on SLURM clusters. (Support for benchmarking on Kubernetes clusters is coming soon.)
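Before starting the steps below, it can help to confirm the basic tools they rely on are present. The snippet below is a hypothetical pre-flight check (not part of srt-slurm); `uv` is also required, but may live outside the default `PATH` on fresh nodes.

```shell
# Hypothetical pre-flight check: confirm the tools used in the setup steps
# are installed before cloning and building.
set -e
for tool in git make; do
  command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool" >&2; exit 1; }
done
echo "prerequisite tools found"
```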

1. **Clone and Install:**

   ```shell
   # Enter a directory on NFS, accessible by all nodes of your cluster.
   git clone https://github.com/ishandhanani/srt-slurm.git
   cd srt-slurm
   git checkout recipes/moe

   # Initialize the virtual environment and install dependencies.
   uv venv
   uv pip install -e .
   ```
2. **Initialize SLURM Workspace:** Execute the setup command below. You will be prompted to specify your SLURM account and partition.

   ```shell
   # One-time setup (downloads NATS/ETCD, creates srtslurm.yaml)
   make setup ARCH=aarch64  # or ARCH=x86_64
   ```
3. **Configure Model Paths:** The setup script generates an srtslurm.yaml file. Edit this file to append your local model paths:

   ```yaml
   model_path:
     dsr1: /path/to/local/dsr1
     qwen3.5-397b: /path/to/local/qwen3.5-397b
     kimi-k2.5: /path/to/local/kimi-k2.5
   ```

Depending on your cluster configuration, you may need to specify additional arguments in srtslurm.yaml. See https://github.com/ishandhanani/srt-slurm/blob/main/srtslurm.yaml.example for details.
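Because the model weights must be reachable from every node, a quick sanity check of the configured paths before launching a benchmark can save a failed SLURM job. The helper below is illustrative (not part of srt-slurm); it assumes the simple two-level `model_path` layout shown above and uses only the Python standard library rather than a full YAML parser.

```python
"""Check that every path under `model_path` in srtslurm.yaml exists locally.

Hypothetical convenience script, not part of the srt-slurm tooling.
"""
import os
import sys


def read_model_paths(config_path):
    """Parse the `model_path:` mapping from a flat srtslurm.yaml-style file."""
    paths = {}
    in_section = False
    with open(config_path) as f:
        for raw in f:
            line = raw.rstrip("\n")
            if line.strip() == "model_path:":
                in_section = True
                continue
            if in_section:
                # The section ends at the first non-indented, non-empty line.
                if line and not line.startswith((" ", "\t")):
                    break
                if ":" in line:
                    name, _, path = line.strip().partition(":")
                    paths[name.strip()] = path.strip()
    return paths


if __name__ == "__main__" and os.path.exists("srtslurm.yaml"):
    missing = {name: path
               for name, path in read_model_paths("srtslurm.yaml").items()
               if not os.path.isdir(path)}
    for name, path in missing.items():
        print(f"MISSING {name}: {path}")
    if missing:
        sys.exit(1)
    print("all model paths found")
```

Run it from the srt-slurm checkout after editing srtslurm.yaml; a nonzero exit code means at least one configured path is not a directory on the current node.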

Running the Benchmarks

To execute a benchmark, apply the target configuration file using the srtctl CLI:

```shell
srtctl apply -f <path-to-config-file>
```

Available benchmarking configurations for published performance data are mapped below. Select the recipe that matches your target performance profile.

| Model | 1K/1K | 8K/1K | 1K/8K | 128K/8K |
| --- | --- | --- | --- | --- |
| DSR1 | GB300 | GB300 | GB300 | |
| gpt-oss-120b | B200, H200 | B200, H200 | | |
| Kimi-K2.5 | B200 | B200 | B200 | |
| Qwen3.5-397B | B200 | B200 | B200 | |

Support

Terminology used in recipes is explained in the Appendix.

For questions or to provide feedback, please contact LLMBenchmarks@nvidia.com.