Smaug

Warning

The convert_checkpoint.py / trtllm-build / run.py workflow described below is legacy and will not receive new features. New projects should use trtllm-serve or the LLM Python API instead.

This document describes how to build the Smaug-72B-v0.1 model into runnable engines on a multi-GPU node and how to run a summarization task with those engines.

Overview

TensorRT LLM supports Smaug-72B-v0.1 through its LLaMA implementation, which can be found in tensorrt_llm/models/llama/model.py. The Smaug model closely resembles LLaMA, except that it uses a bias term in its attention module. This example therefore reuses the LLaMA example code:

convert_checkpoint.py to convert the model into the TensorRT LLM checkpoint format.

In addition, there are two shared files in the examples folder for inference and evaluation:

../../../run.py to run inference on an input text;
../../../summarize.py to summarize the articles in the cnn_dailymail dataset.
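
To make the architectural difference noted above concrete, here is a minimal pure-Python sketch of a linear projection with and without a bias term. It is only an illustration: the helper name `linear`, the toy weights, and the choice of the query projection are assumptions, and this document does not specify which attention projections carry a bias.

```python
def linear(x, weight, bias=None):
    """Compute y = weight @ x (+ bias) for one token vector x.

    weight is a list of output rows, each with len(x) entries.
    """
    y = [sum(xi * wi for xi, wi in zip(x, row)) for row in weight]
    if bias is not None:
        # The extra additive term is the only difference from the bias-free case.
        y = [yj + bj for yj, bj in zip(y, bias)]
    return y


# Toy values, chosen only for illustration.
x = [1.0, 2.0]
w_q = [[0.5, 0.0], [0.0, 0.5]]
b_q = [0.1, -0.1]

q_without_bias = linear(x, w_q)       # bias-free projection
q_with_bias = linear(x, w_q, b_q)     # projection with an added bias term
print(q_without_bias, q_with_bias)
```

Because the bias is a single additive term per projection, the rest of the LLaMA code path can be shared unchanged, which is why the LLaMA example code is reused here.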

Support Matrix

FP16
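
As a rough sizing aid for FP16, the sketch below estimates the memory taken by the weights alone. It assumes about 72 billion parameters (inferred from the model name), 2 bytes per FP16 parameter, and an even split across the 8 tensor-parallel ranks used in the Usage section. It ignores the KV cache, activations, and runtime overhead, so actual usage will be higher. The function name `fp16_weight_gib` is hypothetical.

```python
def fp16_weight_gib(num_params, tp_size=1, bytes_per_param=2):
    """Approximate per-GPU weight memory in GiB, assuming an even split."""
    return num_params * bytes_per_param / tp_size / 2**30


NUM_PARAMS = 72e9  # assumption: taken from the "72B" in the model name

total_gib = fp16_weight_gib(NUM_PARAMS)
per_gpu_gib = fp16_weight_gib(NUM_PARAMS, tp_size=8)
print(f"total weights: {total_gib:.1f} GiB, per GPU at tp_size=8: {per_gpu_gib:.1f} GiB")
```

Under these assumptions the FP16 weights exceed the memory of a single GPU, which is consistent with building the engines for a multi-GPU node.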

Usage

This section walks through the full process: converting the HF model, building TensorRT LLM engines, and finally running summarization.

Build TensorRT engine(s)

The following commands first convert the HF model into the TensorRT LLM checkpoint format, then build a TensorRT engine from that checkpoint:

python ../../../llama/convert_checkpoint.py \
    --model_dir ./Smaug-72B-v0.1 \
    --output_dir ./tllm_checkpoint_8gpu_tp8 \
    --dtype float16 \
    --tp_size 8

trtllm-build --checkpoint_dir ./tllm_checkpoint_8gpu_tp8 \
    --output_dir ./Smaug_72B_tp8 \
    --gemm_plugin float16 \
    --gpt_attention_plugin float16 \
    --context_fmha=enable \
    --max_batch_size 64 \
    --remove_input_padding=enable
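
The two commands above must agree on the checkpoint directory, and the tensor-parallel size chosen at conversion time determines how many ranks the engine expects at run time. The sketch below assembles both commands from a single `tp_size` value so those settings cannot drift apart. The helper `build_commands` is hypothetical; it uses only the flags and the directory naming pattern shown above, and it prints the commands rather than executing them.

```python
import shlex


def build_commands(tp_size=8, model_dir="./Smaug-72B-v0.1",
                   dtype="float16", max_batch_size=64):
    """Return (convert_cmd, build_cmd) as argument lists; nothing is executed."""
    # Directory names follow the pattern used in the commands above.
    ckpt_dir = f"./tllm_checkpoint_{tp_size}gpu_tp{tp_size}"
    engine_dir = f"./Smaug_72B_tp{tp_size}"

    convert_cmd = [
        "python", "../../../llama/convert_checkpoint.py",
        "--model_dir", model_dir,
        "--output_dir", ckpt_dir,
        "--dtype", dtype,
        "--tp_size", str(tp_size),
    ]
    build_cmd = [
        "trtllm-build",
        "--checkpoint_dir", ckpt_dir,  # must match the converter's --output_dir
        "--output_dir", engine_dir,
        "--gemm_plugin", dtype,
        "--gpt_attention_plugin", dtype,
        "--context_fmha=enable",
        "--max_batch_size", str(max_batch_size),
        "--remove_input_padding=enable",
    ]
    return convert_cmd, build_cmd


for cmd in build_commands(tp_size=8):
    print(shlex.join(cmd))
```

The returned lists can be passed to `subprocess.run(cmd, check=True)` if you prefer to drive the build from Python instead of copying the printed commands into a shell.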

Run Summarization

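After building the TensorRT engine, we can use it to perform various tasks. TensorRT LLM provides a script that runs summarization on the cnn_dailymail dataset and reports ROUGE scores. The ROUGE-1 score can be used to validate the model implementation, for example by comparing the score of the TensorRT LLM engine (--test_trt_llm) with that of the HF model (--test_hf).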

mpirun -n 8 -allow-run-as-root python ../../../summarize.py \
    --hf_model_dir ./Smaug-72B-v0.1 \
    --engine_dir ./Smaug_72B_tp8 \
    --data_type fp16 \
    --test_hf \
    --hf_device_map_auto \
    --test_trt_llm
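
For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a generated summary and a reference summary. The sketch below is a simplified illustration that assumes lowercase whitespace tokenization and no stemming. It is not the scoring code used by summarize.py, so its numbers will not match the script's output exactly. The function name `rouge1` is hypothetical.

```python
from collections import Counter


def rouge1(candidate, reference):
    """Return ROUGE-1 precision, recall, and F1 from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())

    # Counter intersection clips each word's count to the smaller of the two.
    overlap = sum((cand & ref).values())

    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge1("the cat sat", "the cat sat on the mat")
print(scores)
```

In the toy example every candidate word appears in the reference, so precision is 1.0, while only three of the six reference words are covered, so recall is 0.5.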
