TRTUtils CLI Documentation

This document provides a comprehensive guide to the TRTUtils command-line interface (CLI).

Overview

TRTUtils provides a command-line interface with several subcommands for working with TensorRT engines and models. The main commands are:

  • benchmark: Benchmark a TensorRT engine

  • build: Build a TensorRT engine from an ONNX model

  • build_dla: Automatically build a TensorRT engine with mixed GPU/DLA layers and precision

  • can_run_on_dla: Evaluate whether a model can run on a DLA, with optional per-layer and per-chunk compatibility details

  • inspect: Inspect a TensorRT engine

  • trtexec: Run trtexec with the provided options

  • yolo: Run YOLO inference with TensorRT

Commands

Benchmark

Benchmark a TensorRT engine to measure its performance metrics.

python3 -m trtutils benchmark --engine model.engine --iterations 2000 --warmup_iterations 200

Options

  • --engine, -e: Path to the engine file (required)

  • --iterations, -i: Number of iterations to measure over (default: 1000)

  • --warmup_iterations, -wi: Number of iterations to warm up the model before measuring (default: 100)

  • --jetson, -j: Use the Jetson-specific benchmarker to record energy and power draw metrics

Output

The benchmark command will output:

  • Latency metrics (mean, median, min, max) in milliseconds

  • Energy consumption metrics (if using Jetson) in Joules

  • Power draw metrics (if using Jetson) in Watts
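To make the roles of --iterations and --warmup_iterations concrete, here is a minimal Python sketch of a warmup-then-measure loop that reports the same four latency statistics. It is an illustration only, not the trtutils implementation: the benchmark function and its run_once callable are hypothetical names, and a real benchmark would call the engine's inference routine in place of run_once.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], object],
              iterations: int = 1000,
              warmup_iterations: int = 100) -> Dict[str, float]:
    """Time `run_once` (a hypothetical stand-in for one inference call)."""
    # Warmup calls are executed but never timed.
    for _ in range(warmup_iterations):
        run_once()

    # Measured calls: record each latency in milliseconds.
    latencies_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean": statistics.mean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "min": min(latencies_ms),
        "max": max(latencies_ms),
    }
```

Warmup iterations are excluded from the statistics so that one-time costs such as lazy initialization do not skew the reported latency.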

Build

Build a TensorRT engine from an ONNX model.

# Basic build with FP16 precision
python3 -m trtutils build --onnx model.onnx --output model.engine --fp16 --workspace 8.0

# Build with INT8 quantization using calibration
python3 -m trtutils build \
    --onnx model.onnx \
    --output model.engine \
    --int8 \
    --calibration_dir ./calibration_images \
    --input_shape 640 640 3 \
    --input_dtype float32 \
    --batch_size 8 \
    --data_order NCHW \
    --resize_method letterbox \
    --input_scale 0.0 1.0

Options

  • --onnx, -o: Path to the ONNX model file (required)

  • --output, -out: Path to save the TensorRT engine file (required)

  • --device, -d: Device to use for the engine (choices: gpu, dla, GPU, DLA; default: gpu)

  • --timing_cache, -tc: Path to store timing cache data (default: ‘timing.cache’)

  • --workspace, -w: Workspace size in GB (default: 4.0)

  • --dla_core: Specify the DLA core (default: engine built for GPU)

  • --calibration_cache, -cc: Path to store calibration cache data (default: ‘calibration.cache’)

  • --calibration_dir, -cd: Directory containing images for INT8 calibration

  • --input_shape, -is: Input shape in HWC format (height, width, channels)

  • --input_dtype, -id: Input data type (choices: float32, float16, int8)

  • --batch_size, -bs: Batch size for calibration (default: 8)

  • --data_order, -do: Data ordering expected by the network (choices: NCHW, NHWC; default: NCHW)

  • --max_images, -mi: Maximum number of images to use for calibration

  • --resize_method, -rm: Method to resize images (choices: letterbox, linear; default: letterbox)

  • --input_scale, -sc: Input value range (default: [0.0, 1.0])

  • --gpu_fallback: Allow GPU fallback for unsupported layers when building for DLA

  • --direct_io: Use direct IO for the engine

  • --prefer_precision_constraints: Prefer precision constraints

  • --reject_empty_algorithms: Reject empty algorithms

  • --ignore_timing_mismatch: Allow different CUDA device timing caches to be used

  • --fp16: Quantize the engine to FP16 precision

  • --int8: Quantize the engine to INT8 precision

  • --verbose: Verbose output from can_run_on_dla

Note

When using INT8 quantization with calibration, you must provide:

  • --calibration_dir: Directory containing calibration images

  • --input_shape: Expected input shape in HWC format

  • --input_dtype: Expected input data type
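If you generate build commands from a script, it can help to check this requirement before invoking the CLI. The sketch below is a hypothetical pre-flight helper (missing_int8_flags is not part of trtutils); it only inspects an argument list for the long and short flag names documented above.

```python
from typing import List, Sequence

# Long flag -> short alias, as listed in the build options above.
INT8_REQUIRED_FLAGS = {
    "--calibration_dir": "-cd",
    "--input_shape": "-is",
    "--input_dtype": "-id",
}


def missing_int8_flags(argv: Sequence[str]) -> List[str]:
    """Return required calibration flags absent from a `build` argument list."""
    # The requirement only applies when INT8 quantization is requested.
    if "--int8" not in argv:
        return []
    return [
        long_flag
        for long_flag, short_flag in INT8_REQUIRED_FLAGS.items()
        if long_flag not in argv and short_flag not in argv
    ]
```

An empty result means the argument list satisfies the note above; otherwise the returned names can be reported to the user before any build time is spent.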

Build DLA

Build a TensorRT engine for DLA, supporting mixed GPU/DLA layers and precision.

python3 -m trtutils build_dla \
    --onnx model.onnx \
    --output model.engine \
    --dla_core 0 \
    --max_chunks 1 \
    --min_layers 20 \
    --image_dir ./calibration_images \
    --shape 640 640 3 \
    --dtype float32 \
    --batch_size 8 \
    --order NCHW \
    --resize_method letterbox \
    --input_scale 0.0 1.0

Options

  • --onnx, -o: Path to the ONNX model file (required)

  • --output, -out: Path to save the TensorRT engine file (required)

  • --image_dir: Path to the directory containing images for calibration (required)

  • --dla_core: Specify the DLA core (default: 0)

  • --max_chunks: The number of DLA-compatible chunks to use in the compiled model (default: 1)

  • --min_layers: The minimum number of layers for a chunk to be scheduled on DLA (default: 20)

  • --shape: Input shape in HWC format (height, width, channels; default: 640 640 3)

  • --dtype: Input data type (choices: float32, float16, int8; default: float32)

  • --batch_size: Batch size for calibration (default: 8)

  • --order: Data ordering expected by the network (choices: NCHW, NHWC; default: NCHW)

  • --max_images: Maximum number of images to use for calibration

  • --resize_method: Method to resize images (choices: letterbox, linear; default: letterbox)

  • --input_scale: Input value range (default: [0.0, 1.0])

  • --timing_cache, -tc: Path to store timing cache data (default: ‘timing.cache’)

  • --verbose: Verbose output from can_run_on_dla
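The interaction between --max_chunks and --min_layers can be pictured with a small Python sketch. This is one plausible reading of the option descriptions above, not the actual trtutils selection logic: the Chunk tuple, the select_dla_chunks helper, and the choice to prefer the largest chunks are all assumptions made for illustration.

```python
from typing import List, Tuple

# (start_layer, end_layer, dla_compatible) with inclusive indices.
# A hypothetical representation, not a trtutils data structure.
Chunk = Tuple[int, int, bool]


def chunk_size(chunk: Chunk) -> int:
    """Number of layers in a chunk with inclusive start/end indices."""
    return chunk[1] - chunk[0] + 1


def select_dla_chunks(chunks: List[Chunk],
                      max_chunks: int = 1,
                      min_layers: int = 20) -> List[Chunk]:
    """Pick which chunks would be scheduled on DLA under the two limits."""
    # Only DLA-compatible chunks with enough layers are eligible.
    eligible = [c for c in chunks if c[2] and chunk_size(c) >= min_layers]
    # Assumption: when more chunks are eligible than allowed, keep the largest.
    largest = sorted(eligible, key=chunk_size, reverse=True)[:max_chunks]
    # Return the selection in layer order.
    return sorted(largest)
```

Under this reading, raising --max_chunks moves more of the model onto the DLA at the cost of more GPU/DLA transitions, while raising --min_layers filters out chunks too small to be worth a transition.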

Can Run on DLA

Evaluate if a model can run on a DLA (Deep Learning Accelerator).

# Basic compatibility check
python3 -m trtutils can_run_on_dla --onnx model.onnx

# Detailed layer information
python3 -m trtutils can_run_on_dla --onnx model.onnx --verbose_layers

# Detailed chunk information
python3 -m trtutils can_run_on_dla --onnx model.onnx --verbose_chunks

# Full detailed output
python3 -m trtutils can_run_on_dla --onnx model.onnx --verbose_layers --verbose_chunks

Options

  • --onnx, -o: Path to the ONNX model file (required)

  • --verbose_layers: Print detailed information about each layer’s DLA compatibility

  • --verbose_chunks: Print detailed information about layer chunks and their device assignments

Output

The command will output:

  • Whether the model is fully DLA compatible

  • The percentage of layers that are compatible with DLA

  • If --verbose_layers is enabled:

    • Detailed information about each layer including name, type, precision, and metadata

    • DLA compatibility status for each layer

  • If --verbose_chunks is enabled:

    • Number of layer chunks found

    • For each chunk:

      • Start and end layer indices

      • Number of layers in the chunk

      • Device assignment (DLA or GPU)
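The reported quantities relate to each other in a simple way, which the following Python sketch illustrates. It assumes per-layer compatibility is available as a list of booleans and that a chunk is a maximal run of consecutive layers with the same device assignment; summarize_dla_compat is a hypothetical helper, not a trtutils function.

```python
from itertools import groupby
from typing import List, Tuple

# (start_index, end_index, layer_count, device) with inclusive indices.
ChunkInfo = Tuple[int, int, int, str]


def summarize_dla_compat(layer_on_dla: List[bool]) -> Tuple[bool, float, List[ChunkInfo]]:
    """Summarize per-layer DLA compatibility flags."""
    total = len(layer_on_dla)
    fully_compatible = total > 0 and all(layer_on_dla)
    percent = 100.0 * sum(layer_on_dla) / total if total else 0.0

    # Group consecutive layers that share the same device assignment.
    chunks: List[ChunkInfo] = []
    index = 0
    for on_dla, group in groupby(layer_on_dla):
        count = len(list(group))
        chunks.append((index, index + count - 1, count, "DLA" if on_dla else "GPU"))
        index += count

    return fully_compatible, percent, chunks
```

For example, four layers where only the third is incompatible yield 75% compatibility and three chunks: two layers on DLA, one on GPU, and one on DLA.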

Inspect

Inspect a TensorRT engine for metadata and IO information.

python3 -m trtutils inspect --engine model.engine

Options

  • --engine, -e: Path to the engine file (required)

Output

The inspect command will output:

  • Engine size in MB

  • Max batch size

  • Input and output tensor names, shapes, and dtypes

TRTExec

Run trtexec with the provided options.

python3 -m trtutils trtexec [options]

For detailed information about trtexec options, please refer to the NVIDIA TensorRT documentation.

YOLO

Run YOLO inference with TensorRT.

# Run inference on a single image
python3 -m trtutils yolo --engine model.engine --input image.jpg --conf_thres 0.25 --preprocessor cuda

# Run inference on a video with custom settings
python3 -m trtutils yolo \
    --engine model.engine \
    --input video.mp4 \
    --conf_thres 0.3 \
    --input_range 0.0 255.0 \
    --preprocessor cpu \
    --resize_method letterbox \
    --warmup \
    --warmup_iterations 20

Options

  • --engine, -e: Path to the TensorRT engine file (required)

  • --input, -i: Path to the input image or video file (required)

  • --conf_thres, -c: Confidence threshold for detections (default: 0.1)

  • --input_range, -r: Input value range (default: [0.0, 1.0])

  • --preprocessor, -p: Preprocessor to use (choices: cpu, cuda; default: cuda)

  • --resize_method, -rm: Method to resize images (choices: letterbox, linear; default: letterbox)

  • --warmup, -w: Perform warmup iterations

  • --warmup_iterations, -wi: Number of warmup iterations (default: 10)

  • --show: Show the detections

  • --verbose, -v: Output additional debugging information

Output

The YOLO command will output:

  • Number of detections found in the image, or per frame for video input
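For readers unfamiliar with the letterbox option of --resize_method, the Python sketch below shows the geometry of letterboxing as it is commonly defined: the image is scaled uniformly to fit the target size and the remainder is padded. This is an assumption about the general technique, not a description of the trtutils preprocessor, and letterbox_geometry is a hypothetical helper.

```python
from typing import Tuple


def letterbox_geometry(src_w: int, src_h: int,
                       dst_w: int, dst_h: int) -> Tuple[float, Tuple[int, int], Tuple[int, int]]:
    """Return (scale, resized size, top-left padding) for a letterbox resize."""
    # A single scale factor preserves the aspect ratio.
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    # Center the resized image; the remaining border is padding.
    pad_x, pad_y = (dst_w - new_w) // 2, (dst_h - new_h) // 2
    return scale, (new_w, new_h), (pad_x, pad_y)
```

A 1280x720 frame resized to a 640x640 input is scaled by 0.5 to 640x360, with 140 pixels of padding above the image. Detection boxes produced in the padded frame have to be shifted by the padding and divided by the scale to map back to the original image.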

Examples

Benchmarking an Engine

python3 -m trtutils benchmark --engine model.engine --iterations 2000 --warmup_iterations 200

Building an Engine from ONNX

# Basic build with FP16 precision
python3 -m trtutils build --onnx model.onnx --output model.engine --fp16 --workspace 8.0

# Build with INT8 quantization using calibration
python3 -m trtutils build \
    --onnx model.onnx \
    --output model.engine \
    --int8 \
    --calibration_dir ./calibration_images \
    --input_shape 640 640 3 \
    --input_dtype float32 \
    --batch_size 8 \
    --data_order NCHW \
    --resize_method letterbox \
    --input_scale 0.0 1.0

Building a DLA Engine

python3 -m trtutils build_dla \
    --onnx model.onnx \
    --output model.engine \
    --dla_core 0 \
    --max_chunks 1 \
    --min_layers 20 \
    --image_dir ./calibration_images \
    --shape 640 640 3 \
    --dtype float32 \
    --batch_size 8 \
    --order NCHW \
    --resize_method letterbox \
    --input_scale 0.0 1.0

Checking DLA Compatibility

# Basic compatibility check
python3 -m trtutils can_run_on_dla --onnx model.onnx --fp16

# Detailed layer information
python3 -m trtutils can_run_on_dla --onnx model.onnx --fp16 --verbose_layers

# Detailed chunk information
python3 -m trtutils can_run_on_dla --onnx model.onnx --fp16 --verbose_chunks

# Full detailed output
python3 -m trtutils can_run_on_dla --onnx model.onnx --fp16 --verbose_layers --verbose_chunks

Inspecting an Engine

python3 -m trtutils inspect --engine model.engine

Running YOLO Inference

# Run inference on a single image
python3 -m trtutils yolo --engine model.engine --input image.jpg --conf_thres 0.25 --preprocessor cuda

# Run inference on a video with custom settings
python3 -m trtutils yolo \
    --engine model.engine \
    --input video.mp4 \
    --conf_thres 0.3 \
    --input_range 0.0 255.0 \
    --preprocessor cpu \
    --resize_method letterbox \
    --warmup \
    --warmup_iterations 20

Notes

  • All paths can be specified as relative or absolute paths

  • The CLI automatically sets the log level to INFO when running

  • For Jetson-specific features, make sure you’re running on a Jetson device

  • When using INT8 quantization, ensure you have the appropriate calibration data