# GPT-OSS-120B Recipes

Production-ready deployment recipes for GPT-OSS-120B using TensorRT-LLM on Blackwell (GB200/B200) hardware.

## Available Configurations

| Configuration | GPUs | Mode | Description |
|---|---|---|---|
| `trtllm/agg` | 4x GB200 | Aggregated | WideEP, ARM64 |
| `trtllm/disagg` | 5x Blackwell (GB200/B200) | Disaggregated | Prefill/Decode split |

## Prerequisites

- Dynamo Platform installed (see the Kubernetes Deployment Guide)
- GPU cluster with Blackwell GPUs: GB200 for `trtllm/agg`, GB200 or B200 for `trtllm/disagg`
- HuggingFace token with access to the model

## Quick Start

```shell
# Set namespace
export NAMESPACE=dynamo-demo
kubectl create namespace ${NAMESPACE}

# Create HuggingFace token secret
kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN="your-token-here" \
  -n ${NAMESPACE}

# Download model (update storageClassName in model-cache/model-cache.yaml first!)
kubectl apply -f model-cache/ -n ${NAMESPACE}
kubectl wait --for=condition=Complete job/model-download -n ${NAMESPACE} --timeout=3600s

# Deploy
kubectl apply -f trtllm/agg/deploy.yaml -n ${NAMESPACE}
```
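The model download only works if `storageClassName` in `model-cache/model-cache.yaml` names a storage class that exists in your cluster (`kubectl get storageclass` lists them). Below is a minimal sketch for setting it from a script. It assumes the field appears on its own line as `storageClassName: <value>`. `set_storage_class` is a hypothetical helper and is not part of the recipe.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the recipe): set every storageClassName
# value in a YAML file, keeping the original indentation.
set_storage_class() {
  local file="$1" class="$2" tmp rc
  tmp="$(mktemp)" || return 1
  # Storage class names are DNS subdomain names (lowercase letters, digits,
  # '-' and '.'), so they need no escaping in the sed replacement.
  sed 's/^\([[:space:]]*storageClassName:\).*$/\1 '"${class}"'/' "$file" > "$tmp" \
    && cat "$tmp" > "$file"   # overwrite in place, preserving file permissions
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Example:
#   kubectl get storageclass                 # list available classes
#   set_storage_class model-cache/model-cache.yaml <your-storage-class>
```

Run it before `kubectl apply -f model-cache/`. Editing the file by hand works just as well; the helper only helps in scripted setups.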

## Test the Deployment

```shell
# Port-forward the frontend (this command blocks; run the curl below in a second terminal)
kubectl port-forward svc/gpt-oss-agg-frontend 8000:8000 -n ${NAMESPACE}

# Send a test request
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 50
  }'
```
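Because `kubectl port-forward` blocks, and the frontend may not answer immediately after the pods start, a scripted test can run the port-forward in the background and poll before sending the request. This is a minimal sketch. `wait_for_http` is a hypothetical helper, and polling `/v1/models` assumes the frontend serves that OpenAI-compatible route. Any URL that returns 2xx once the server is up will work.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the recipe): poll URL until it returns a
# 2xx response or roughly TIMEOUT seconds of retries have passed.
# Returns 0 on success, 1 on timeout.
wait_for_http() {
  local url="$1" timeout="${2:-600}" waited=0
  # -f: treat HTTP errors as failures; -s: quiet; cap each attempt at 5 seconds.
  until curl -sf -o /dev/null --max-time 5 "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s waiting for ${url}" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}

# Example (assumes NAMESPACE is set as in Quick Start):
#   kubectl port-forward svc/gpt-oss-agg-frontend 8000:8000 -n "${NAMESPACE}" &
#   wait_for_http http://localhost:8000/v1/models 900
#   # ...then send the test request shown above.
```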

## Notes

- Update `storageClassName` in `model-cache/model-cache.yaml` before deploying.
- The `trtllm/agg` recipe requires ARM64 (GB200) nodes; it will not run on x86 Hopper or Ampere hardware.
- Update the container image tag in `deploy.yaml` to match your Dynamo release version.
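You can check the ARM64 requirement before deploying `trtllm/agg` by comparing each GPU node's `kubernetes.io/arch` label (a standard Kubernetes node label) against `arm64`. This is a minimal sketch. `non_arm64_gpu_nodes` is a hypothetical helper, and it assumes GPUs are advertised as the `nvidia.com/gpu` resource, as they are with the NVIDIA device plugin or GPU Operator.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of the recipe). Reads lines of
# "NAME ARCH GPUS" on stdin, prints every node that has GPUs but is not arm64,
# and returns non-zero if it printed any.
non_arm64_gpu_nodes() {
  awk 'NF >= 3 && $3 != "<none>" && $3 != "0" && $2 != "arm64" { print $1; bad = 1 }
       END { exit bad }'
}

# Usage against a live cluster (kubectl prints <none> for missing fields):
#   kubectl get nodes --no-headers \
#     -o 'custom-columns=NAME:.metadata.name,ARCH:.metadata.labels.kubernetes\.io/arch,GPUS:.status.allocatable.nvidia\.com/gpu' \
#     | non_arm64_gpu_nodes && echo "all GPU nodes are arm64"
```

Nodes without GPUs are skipped, so CPU-only x86 nodes in a mixed cluster do not trigger a failure.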
