NVIDIA Performance Benchmarking

Optimize AI workload performance on NVIDIA AI infrastructure.

Overview

Measure What Matters With NVIDIA Performance Benchmarking

NVIDIA Performance Benchmarking is a suite of tools, recipes, and services that takes the guesswork out of measuring the performance of AI workloads and infrastructure. NVIDIA Performance Benchmarking provides a standardized and objective means of gauging performance across platforms, essential to optimizing AI workloads and speeding outcomes.

How Well Does Your Infrastructure Perform for the Latest AI Workloads?

Achieve higher performance from NVIDIA AI infrastructure and AI workloads with a suite of tools, recipes, and services.

Explore NVIDIA Performance Benchmarking Recipes

Optimize your AI workload with performance recipes, and quickly set up and run standardized benchmarking methodologies in your environment.

What Is NVIDIA Performance Benchmarking?

Optimize AI workload performance on any NVIDIA accelerated infrastructure with NVIDIA Performance Benchmarking’s suite of tools, services, and recipes.

Optimize AI Workloads With NVIDIA Performance Benchmarking

Using Performance Explorer, users can identify, for a given workload, the GPU count that maximizes throughput while minimizing total training time and cost, across projects and teams.
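The trade-off described above can be sketched with back-of-envelope arithmetic. This is an illustrative model, not the Performance Explorer API: the scaling-efficiency formula, prices, and baseline hours are all hypothetical assumptions.

```python
import math

def speedup(gpus, efficiency=0.95):
    # Back-of-envelope scaling model: each doubling of GPU count retains
    # `efficiency` of ideal linear scaling (hypothetical number).
    return gpus * efficiency ** math.log2(gpus)

def plan(gpus, base_hours=10_000.0, price_per_gpu_hour=2.0):
    # Returns (wall-clock training hours, total dollar cost) for a job that
    # would take `base_hours` on a single GPU (hypothetical numbers).
    hours = base_hours / speedup(gpus)
    return hours, hours * gpus * price_per_gpu_hour

# Cheapest power-of-two GPU count that still trains within a 2-week deadline.
deadline_h = 14 * 24
feasible = [g for g in (2**k for k in range(11)) if plan(g)[0] <= deadline_h]
best = min(feasible, key=lambda g: plan(g)[1])
print(best)  # under this model, the smallest count that meets the deadline
```

Under this toy model, per-GPU efficiency falls as the cluster grows, so total cost rises with GPU count while wall-clock time falls; the cost-optimal choice is the smallest count that meets the time constraint. Real benchmarking replaces the assumed efficiency curve with measured end-to-end data.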

Features

Optimize Your AI Workloads With NVIDIA Performance Benchmarking

The Ingredients to Improve AI Workloads

Explore Performance Recipes, ready-to-use templates for evaluating the performance of specific AI workloads across hardware and software combinations.

Connect With Our Experts

Get guidance from NVIDIA advisors to benchmark and optimize your AI workloads when you join the NVIDIA Performance Benchmarking Program.

NVIDIA Exemplar Cloud

The NVIDIA Exemplar Cloud initiative raises the bar across security, usability, performance, and resiliency, matching NVIDIA’s cloud reference architecture, DGX Cloud.

Benefits

Improve AI Workload Performance Across Platforms

Get the most out of your AI workload environments and unlock the full potential of your AI infrastructure with NVIDIA Performance Benchmarking.

Measure True AI Workload TCO

Determine which platform delivers the fastest time to train, or your desired GPU scale, and at what cost, using real-time, end-to-end performance data.

Accelerate AI Outcomes

Tune and optimize your AI workloads according to end-to-end metrics tailored to the performance of modern generative AI applications.

Benchmark Beyond the FLOPS

Evaluate beyond the GPUs, including infrastructure software, cloud platforms, and application configurations, to gain a holistic view of workload performance.

Align to a Definition of “Good”

Get a standardized and objective means of gauging platform performance, and understand the expected performance for given workloads or use cases.

Resources

Learn More About NVIDIA Performance Benchmarking

Performance Benchmark FAQs

In MLPerf Inference v6.0 (April 2026), systems powered by NVIDIA Blackwell Ultra GPUs (GB300 NVL72) delivered the highest throughput across the widest range of models and scenarios. On DeepSeek-R1, GB300 NVL72 delivered 2.5 million tokens per second—up to 2.7x higher token throughput compared to GB300 NVL72 debut submissions just six months prior, as a result of TensorRT-LLM software updates.

When measuring AI inference cost-effectiveness, it is important to look beyond compute pricing or FLOPS per dollar, because those metrics give an incomplete view. The most important metric for AI inference cost-effectiveness is cost per token, or the price-performance actually delivered, especially on MoE and reasoning models. NVIDIA GB300 NVL72 delivers AI inference at $0.123 per million tokens at 116 TPS/user interactivity using NVIDIA Dynamo and TensorRT-LLM—the lowest cost per token among major platforms, according to SemiAnalysis InferenceX benchmarks as of April 2026.
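The cost-per-token metric above follows directly from instance pricing and sustained system throughput. A minimal sketch, using hypothetical numbers rather than any published NVIDIA or SemiAnalysis methodology:

```python
def cost_per_million_tokens(price_per_hour, tokens_per_second):
    """Dollars per million generated tokens for a system that costs
    `price_per_hour` to run and sustains `tokens_per_second` throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1_000_000

# Hypothetical example: a $98/hr system sustaining 250,000 tokens/s.
print(round(cost_per_million_tokens(98.0, 250_000), 3))  # -> 0.109
```

The same formula explains why software optimization alone can cut cost per token: raising sustained throughput lowers the metric even when the hourly price of the hardware is unchanged.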

NVIDIA Blackwell B200 achieves $0.02 per million tokens on GPT-OSS-120B using TensorRT-LLM, according to SemiAnalysis InferenceX benchmarks as of April 2026—a 5x improvement from launch-day costs of $0.11/M tokens achieved through software optimization alone.

NVIDIA B300 (Blackwell Ultra) was designed to meet the increased compute and memory capacity demands of long-context and reasoning AI inference. With a 1.5x increase in dense FP4 performance, 2x attention performance, and 1.5x more HBM capacity compared with NVIDIA B200, B300 boosts AI reasoning throughput at the largest context lengths. GB300 NVL72 delivers AI inference at $0.123 per million tokens at 116 TPS/user interactivity using NVIDIA Dynamo and TensorRT-LLM—the lowest cost per token among major platforms, according to SemiAnalysis InferenceX benchmarks as of April 2026.

Several third-party, independent AI inference benchmarks are widely used across the industry today. MLPerf Inference is the industry-standard benchmark from MLCommons, measuring throughput and latency across standardized workloads. InferenceX, by SemiAnalysis, is the first independent benchmark for measuring total cost of compute across diverse models and real-world scenarios, and InferenceX v2 extends this to benchmark the full Pareto frontier. As of April 2026, NVIDIA Blackwell Ultra (GB300 NVL72) leads across all three benchmark suites.

Next Steps

Ready to Get Started?

Supercharge AI workloads on NVIDIA AI infrastructure with NVIDIA Performance Benchmarking Recipes.

Become an Exemplar Cloud

Partner with NVIDIA to achieve optimal AI workload performance per TCO, validated with data-driven benchmarks.

Explore NVIDIA Cloud Accelerator Documentation

Access technical documentation about NVIDIA Cloud Accelerator.