TRTUtils CLI Documentation
This document provides a comprehensive guide to the TRTUtils command-line interface (CLI).
Overview
TRTUtils provides a command-line interface with several subcommands for working with TensorRT engines and models. The main commands are:
* benchmark: Benchmark a TensorRT engine
* build: Build a TensorRT engine from an ONNX model
* build_dla: Build a TensorRT engine with GPU/DLA layer placement and precision assigned automatically
* can_run_on_dla: Evaluate whether a model can run on a DLA, including per-layer and per-chunk compatibility
* inspect: Inspect a TensorRT engine
* trtexec: Run trtexec with the provided options
* yolo: Run YOLO inference with TensorRT
Commands
Benchmark
Benchmark a TensorRT engine to measure its performance metrics.
python3 -m trtutils benchmark --engine model.engine --iterations 2000 --warmup_iterations 200
Options
* --engine, -e: Path to the engine file (required)
* --iterations, -i: Number of iterations to measure over (default: 1000)
* --warmup_iterations, -wi: Number of iterations to warm up the model before measuring (default: 100)
* --jetson, -j: Use the Jetson-specific benchmarker to record energy and power draw metrics
Output
The benchmark command outputs:
* Latency metrics (mean, median, min, max) in milliseconds
* Energy consumption metrics in joules (only with --jetson)
* Power draw metrics in watts (only with --jetson)
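The latency and energy figures listed above can be reproduced conceptually with the sketch below. It is a minimal illustration, not trtutils' implementation: `run_once` is a hypothetical stand-in for one engine execution, warmup runs are executed but discarded, and the energy helper assumes evenly spaced power samples in watts integrated with the trapezoidal rule.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(run_once: Callable[[], None], iterations: int = 1000,
              warmup_iterations: int = 100) -> Dict[str, float]:
    """Time run_once (hypothetical stand-in for inference) and summarize latency in ms."""
    # Warmup runs are executed but not recorded.
    for _ in range(warmup_iterations):
        run_once()
    latencies_ms: List[float] = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean": statistics.mean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "min": min(latencies_ms),
        "max": max(latencies_ms),
    }


def energy_joules(power_watts: List[float], interval_s: float) -> float:
    """Integrate evenly spaced power samples (W) over time to get energy (J)."""
    if len(power_watts) < 2:
        return 0.0
    # Trapezoidal rule: each interval contributes the mean of its two endpoints.
    return sum((a + b) / 2.0 * interval_s
               for a, b in zip(power_watts, power_watts[1:]))


if __name__ == "__main__":
    stats = benchmark(lambda: time.sleep(0.001), iterations=50, warmup_iterations=5)
    print({name: round(value, 3) for name, value in stats.items()})
    print(energy_joules([10.0, 12.0, 14.0], interval_s=0.5))  # 12.0
```

Timing each call with `time.perf_counter` measures wall-clock latency including any host-side synchronization inside `run_once`; a GPU-side timer would isolate kernel time instead.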
Build
Build a TensorRT engine from an ONNX model.
# Basic build with FP16 precision
python3 -m trtutils build --onnx model.onnx --output model.engine --fp16 --workspace 8.0
# Build with INT8 quantization using calibration
python3 -m trtutils build \
--onnx model.onnx \
--output model.engine \
--int8 \
--calibration_dir ./calibration_images \
--input_shape 640 640 3 \
--input_dtype float32 \
--batch_size 8 \
--data_order NCHW \
--resize_method letterbox \
--input_scale 0.0 1.0
Options
* --onnx, -o: Path to the ONNX model file (required)
* --output, -out: Path to save the TensorRT engine file (required)
* --device, -d: Device to use for the engine (choices: gpu, dla, GPU, DLA; default: gpu)
* --timing_cache, -tc: Path to store timing cache data (default: ‘timing.cache’)
* --workspace, -w: Workspace size in GB (default: 4.0)
* --dla_core: Specify the DLA core (default: engine built for GPU)
* --calibration_cache, -cc: Path to store calibration cache data (default: ‘calibration.cache’)
* --calibration_dir, -cd: Directory containing images for INT8 calibration
* --input_shape, -is: Input shape in HWC format (height, width, channels)
* --input_dtype, -id: Input data type (choices: float32, float16, int8)
* --batch_size, -bs: Batch size for calibration (default: 8)
* --data_order, -do: Data ordering expected by the network (choices: NCHW, NHWC; default: NCHW)
* --max_images, -mi: Maximum number of images to use for calibration
* --resize_method, -rm: Method to resize images (choices: letterbox, linear; default: letterbox)
* --input_scale, -sc: Input value range (default: [0.0, 1.0])
* --gpu_fallback: Allow GPU fallback for unsupported layers when building for DLA
* --direct_io: Use direct IO for the engine
* --prefer_precision_constraints: Prefer precision constraints
* --reject_empty_algorithms: Reject empty algorithms
* --ignore_timing_mismatch: Allow different CUDA device timing caches to be used
* --fp16: Quantize the engine to FP16 precision
* --int8: Quantize the engine to INT8 precision
* --verbose: Verbose output from can_run_on_dla
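The --input_shape, --resize_method, --input_scale, and --data_order options describe how calibration images become network inputs. The sketch below shows the usual arithmetic behind these terms; it is not trtutils' code. It assumes that letterbox means an aspect-preserving resize followed by symmetric padding, that 8-bit pixel values (0 to 255) are mapped linearly onto the --input_scale range, and it uses nested Python lists in place of real image arrays to stay dependency-free.

```python
from typing import List, Tuple

Nested = List[List[List[float]]]  # a 3-D image stored as nested lists


def letterbox_geometry(src_hw: Tuple[int, int], dst_hw: Tuple[int, int]):
    """Return (scale, (new_h, new_w), (pad_top, pad_left)) for a letterbox resize."""
    src_h, src_w = src_hw
    dst_h, dst_w = dst_hw
    # One scale factor for both axes keeps the aspect ratio intact.
    scale = min(dst_h / src_h, dst_w / src_w)
    new_h, new_w = round(src_h * scale), round(src_w * scale)
    # Leftover space is split between both sides as padding.
    pad_top = (dst_h - new_h) // 2
    pad_left = (dst_w - new_w) // 2
    return scale, (new_h, new_w), (pad_top, pad_left)


def scale_pixel(value: int, low: float = 0.0, high: float = 1.0) -> float:
    """Map an 8-bit pixel value linearly onto the [low, high] input range."""
    return low + (value / 255.0) * (high - low)


def hwc_to_chw(image: Nested) -> Nested:
    """Reorder an HWC image ([row][col][channel]) to CHW, the per-image part of NCHW."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [[[image[h][w][c] for w in range(width)] for h in range(height)]
            for c in range(channels)]


if __name__ == "__main__":
    # A 1280x720 frame letterboxed into a 640x640 input.
    print(letterbox_geometry((720, 1280), (640, 640)))  # (0.5, (360, 640), (140, 0))
    print(scale_pixel(255), scale_pixel(0, 0.0, 255.0))  # 1.0 0.0
```

With NHWC ordering the HWC layout is kept as-is and only the batch dimension is prepended; with NCHW the channel axis moves to the front as shown.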
Note
When using INT8 quantization with calibration, you must provide:
* --calibration_dir: Directory containing calibration images
* --input_shape: Expected input shape in HWC format
* --input_dtype: Expected input data type
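The calibration options above (--calibration_dir, --batch_size, --max_images) together decide which images the INT8 calibrator sees and how they are grouped. The sketch below shows one plausible way to form those batches; the accepted file extensions, the sorted ordering, and dropping a trailing partial batch are all assumptions made for illustration, not documented trtutils behavior.

```python
import tempfile
from pathlib import Path
from typing import List, Optional

# Assumed set of image extensions; the real CLI may accept others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def calibration_batches(calibration_dir: str, batch_size: int = 8,
                        max_images: Optional[int] = None) -> List[List[Path]]:
    """Group image paths found in calibration_dir into fixed-size batches."""
    images = sorted(p for p in Path(calibration_dir).iterdir()
                    if p.suffix.lower() in IMAGE_EXTENSIONS)
    if max_images is not None:
        images = images[:max_images]
    # Assumption: a trailing partial batch is dropped so every batch is full.
    usable = len(images) - len(images) % batch_size
    return [images[i:i + batch_size] for i in range(0, usable, batch_size)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # 20 placeholder images plus one non-image file that should be skipped.
        for i in range(20):
            (Path(tmp) / f"img_{i:02d}.jpg").touch()
        (Path(tmp) / "notes.txt").touch()
        batches = calibration_batches(tmp, batch_size=8)
        print(len(batches), [len(b) for b in batches])  # 2 [8, 8]
```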
Build DLA
Build a TensorRT engine for DLA, supporting mixed GPU/DLA layers and precision.
python3 -m trtutils build_dla \
--onnx model.onnx \
--output model.engine \
--dla_core 0 \
--max_chunks 1 \
--min_layers 20 \
--image_dir ./calibration_images \
--shape 640 640 3 \
--dtype float32 \
--batch_size 8 \
--order NCHW \
--resize_method letterbox \
--input_scale 0.0 1.0
Options
* --onnx, -o: Path to the ONNX model file (required)
* --output, -out: Path to save the TensorRT engine file (required)
* --image_dir: Path to the directory containing images for calibration (required)
* --dla_core: Specify the DLA core (default: 0)
* --max_chunks: The number of DLA compatible chunks to use in the compiled model (default: 1)
* --min_layers: The minimum number of layers for a chunk to be scheduled on DLA (default: 20)
* --shape: Input shape in HWC format (height, width, channels; default: 640 640 3)
* --dtype: Input data type (choices: float32, float16, int8; default: float32)
* --batch_size: Batch size for calibration (default: 8)
* --order: Data ordering expected by the network (choices: NCHW, NHWC; default: NCHW)
* --max_images: Maximum number of images to use for calibration
* --resize_method: Method to resize images (choices: letterbox, linear; default: letterbox)
* --input_scale: Input value range (default: [0.0, 1.0])
* --timing_cache, -tc: Path to store timing cache data (default: ‘timing.cache’)
* --verbose: Verbose output from can_run_on_dla
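The --max_chunks and --min_layers options point to a chunk-based placement: contiguous runs of DLA-compatible layers form candidate chunks, runs shorter than --min_layers stay on the GPU, and at most --max_chunks runs are scheduled on the DLA. The sketch below illustrates that idea on a list of per-layer compatibility flags. Preferring the largest chunks is an assumption made here for illustration; the actual selection rule used by build_dla is not documented above.

```python
from typing import List, Optional, Tuple

Chunk = Tuple[int, int]  # inclusive (start, end) layer indices


def find_dla_chunks(compatible: List[bool]) -> List[Chunk]:
    """Return contiguous runs of DLA-compatible layers."""
    chunks: List[Chunk] = []
    start: Optional[int] = None
    for idx, ok in enumerate(compatible):
        if ok and start is None:
            start = idx
        elif not ok and start is not None:
            chunks.append((start, idx - 1))
            start = None
    if start is not None:
        chunks.append((start, len(compatible) - 1))
    return chunks


def select_dla_chunks(compatible: List[bool], max_chunks: int = 1,
                      min_layers: int = 20) -> List[Chunk]:
    """Pick up to max_chunks chunks that each contain at least min_layers layers."""
    eligible = [c for c in find_dla_chunks(compatible)
                if c[1] - c[0] + 1 >= min_layers]
    # Assumption: prefer the largest chunks, then report them in layer order.
    eligible.sort(key=lambda c: c[1] - c[0] + 1, reverse=True)
    return sorted(eligible[:max_chunks])


if __name__ == "__main__":
    flags = [True] * 25 + [False] * 3 + [True] * 5 + [False] + [True] * 30
    print(select_dla_chunks(flags, max_chunks=1, min_layers=20))  # [(34, 63)]
```

Every layer outside the selected chunks would then run on the GPU, which is why GPU fallback matters for mixed placements.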
Can Run on DLA
Evaluate if a model can run on a DLA (Deep Learning Accelerator).
# Basic compatibility check
python3 -m trtutils can_run_on_dla --onnx model.onnx
# Detailed layer information
python3 -m trtutils can_run_on_dla --onnx model.onnx --verbose_layers
# Detailed chunk information
python3 -m trtutils can_run_on_dla --onnx model.onnx --verbose_chunks
# Full detailed output
python3 -m trtutils can_run_on_dla --onnx model.onnx --verbose_layers --verbose_chunks
Options
* --onnx, -o: Path to the ONNX model file (required)
* --verbose_layers: Print detailed information about each layer’s DLA compatibility
* --verbose_chunks: Print detailed information about layer chunks and their device assignments
Output
The command outputs:
* Whether the model is fully DLA compatible
* The percentage of layers that are compatible with DLA
* If --verbose_layers is enabled:
  * Detailed information about each layer, including name, type, precision, and metadata
  * DLA compatibility status for each layer
* If --verbose_chunks is enabled:
  * Number of layer chunks found
  * For each chunk:
    * Start and end layer indices
    * Number of layers in the chunk
    * Device assignment (DLA or GPU)
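The summary fields listed above can all be derived from per-layer compatibility flags. The sketch below assumes such a list is already available (how trtutils obtains it from TensorRT is not shown) and splits the layers into alternating DLA and GPU chunks, reporting the fields listed for --verbose_chunks: start and end indices, layer count, and device. The `summarize` function and `ChunkInfo` type are hypothetical names used only for this illustration.

```python
from itertools import groupby
from typing import List, NamedTuple, Tuple


class ChunkInfo(NamedTuple):
    start: int
    end: int  # inclusive
    num_layers: int
    device: str  # "DLA" or "GPU"


def summarize(compatible: List[bool]) -> Tuple[bool, float, List[ChunkInfo]]:
    """Return (fully_compatible, percent_compatible, chunks) for per-layer flags."""
    if not compatible:
        raise ValueError("model has no layers")
    percent = 100.0 * sum(compatible) / len(compatible)
    chunks: List[ChunkInfo] = []
    index = 0
    # groupby yields consecutive runs of equal flags, i.e. device chunks.
    for is_dla, run in groupby(compatible):
        length = len(list(run))
        chunks.append(ChunkInfo(index, index + length - 1, length,
                                "DLA" if is_dla else "GPU"))
        index += length
    return all(compatible), percent, chunks


if __name__ == "__main__":
    fully, pct, chunks = summarize([True, True, True, False, True])
    print(f"Fully DLA compatible: {fully}")
    print(f"DLA compatible layers: {pct:.1f}%")
    for c in chunks:
        print(f"Layers {c.start}-{c.end} ({c.num_layers} layers): {c.device}")
```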
Inspect
Inspect a TensorRT engine for metadata and IO information.
python3 -m trtutils inspect --engine model.engine
Options
* --engine, -e: Path to the engine file (required)
Output
The inspect command outputs:
* Engine size in MB
* Max batch size
* Input and output tensor names, shapes, and dtypes
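The engine size and tensor information reported by inspect relate through simple arithmetic. The sketch below, which is not trtutils' code, computes a file size in megabytes (assuming 1 MB = 2**20 bytes; the CLI may use a different convention) and the memory one tensor needs from its shape and dtype. The byte-size table covers only the dtypes mentioned in this document.

```python
import os
from math import prod
from typing import Sequence

# Bytes per element for the dtypes used elsewhere in this document.
DTYPE_BYTES = {"float32": 4, "float16": 2, "int8": 1}


def file_size_mb(path: str) -> float:
    """Size of a file in megabytes, assuming 1 MB = 2**20 bytes."""
    return os.path.getsize(path) / (1024 * 1024)


def tensor_bytes(shape: Sequence[int], dtype: str) -> int:
    """Memory needed for a dense tensor of the given shape and dtype."""
    return prod(shape) * DTYPE_BYTES[dtype]


if __name__ == "__main__":
    # An NCHW input with batch size 1 and a 640x640 RGB image.
    print(tensor_bytes((1, 3, 640, 640), "float32"))  # 4915200
```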
TRTExec
Run trtexec with the provided options.
python3 -m trtutils trtexec [options]
For detailed information about trtexec options, please refer to the NVIDIA TensorRT documentation.
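The description suggests that the options are forwarded to NVIDIA's trtexec binary. The sketch below shows a generic pass-through wrapper built on subprocess; it assumes trtexec is on PATH and is not necessarily how trtutils locates or launches the binary. The example flags --onnx, --saveEngine, and --fp16 are standard trtexec options.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_trtexec_command(options: Sequence[str], binary: str = "trtexec") -> List[str]:
    """Prepend the trtexec binary to the user-supplied options, unchanged."""
    return [binary, *options]


def run_trtexec(options: Sequence[str]) -> int:
    """Run trtexec with the given options and return its exit code."""
    binary = shutil.which("trtexec")
    if binary is None:
        raise FileNotFoundError("trtexec was not found on PATH")
    completed = subprocess.run(build_trtexec_command(options, binary), check=False)
    return completed.returncode


if __name__ == "__main__":
    print(build_trtexec_command(["--onnx=model.onnx", "--saveEngine=model.engine", "--fp16"]))
```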
YOLO
Run YOLO inference with TensorRT.
# Run inference on a single image
python3 -m trtutils yolo --engine model.engine --input image.jpg --conf_thres 0.25 --preprocessor cuda
# Run inference on a video with custom settings
python3 -m trtutils yolo \
--engine model.engine \
--input video.mp4 \
--conf_thres 0.3 \
--input_range 0.0 255.0 \
--preprocessor cpu \
--resize_method letterbox \
--warmup \
--warmup_iterations 20
Options
* --engine, -e: Path to the TensorRT engine file (required)
* --input, -i: Path to the input image or video file (required)
* --conf_thres, -c: Confidence threshold for detections (default: 0.1)
* --input_range, -r: Input value range (default: [0.0, 1.0])
* --preprocessor, -p: Preprocessor to use (choices: cpu, cuda; default: cuda)
* --resize_method, -rm: Method to resize images (choices: letterbox, linear; default: letterbox)
* --warmup, -w: Perform warmup iterations
* --warmup_iterations, -wi: Number of warmup iterations (default: 10)
* --show: Show the detections
* --verbose, -v: Output additional debugging information
Output
The YOLO command outputs:
* The number of detections found in the image, or per frame for video input
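The detection count depends on --conf_thres, and with --resize_method letterbox the boxes come out in the padded network-input coordinates. The sketch below shows both steps on plain tuples. The (x1, y1, x2, y2, score, class_id) layout, the inclusive threshold comparison, and the scale and padding values (which would come from the letterbox step) are assumptions for illustration; the actual detection format used by trtutils is not documented here.

```python
from typing import List, Tuple

# Assumed layout: x1, y1, x2, y2, score, class_id
Detection = Tuple[float, float, float, float, float, int]


def filter_detections(dets: List[Detection], conf_thres: float = 0.1) -> List[Detection]:
    """Keep detections whose confidence score is at least conf_thres."""
    return [d for d in dets if d[4] >= conf_thres]


def unletterbox(det: Detection, scale: float, pad_top: int, pad_left: int) -> Detection:
    """Map a box from letterboxed input coordinates back to the original image."""
    x1, y1, x2, y2, score, cls = det
    return ((x1 - pad_left) / scale, (y1 - pad_top) / scale,
            (x2 - pad_left) / scale, (y2 - pad_top) / scale, score, cls)


if __name__ == "__main__":
    raw = [(100.0, 200.0, 300.0, 400.0, 0.9, 0), (10.0, 150.0, 50.0, 190.0, 0.05, 2)]
    kept = filter_detections(raw, conf_thres=0.25)
    print(len(kept))  # 1
    # A 1280x720 frame letterboxed to 640x640: scale 0.5, 140 px of padding on top.
    print(unletterbox(kept[0], scale=0.5, pad_top=140, pad_left=0))
```

Raising --conf_thres trades recall for fewer false positives; the count printed per frame is simply the length of the filtered list.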
Examples
Benchmarking an Engine
python3 -m trtutils benchmark --engine model.engine --iterations 2000 --warmup_iterations 200
Building an Engine from ONNX
# Basic build with FP16 precision
python3 -m trtutils build --onnx model.onnx --output model.engine --fp16 --workspace 8.0
# Build with INT8 quantization using calibration
python3 -m trtutils build \
--onnx model.onnx \
--output model.engine \
--int8 \
--calibration_dir ./calibration_images \
--input_shape 640 640 3 \
--input_dtype float32 \
--batch_size 8 \
--data_order NCHW \
--resize_method letterbox \
--input_scale 0.0 1.0
Building a DLA Engine
python3 -m trtutils build_dla \
--onnx model.onnx \
--output model.engine \
--dla_core 0 \
--max_chunks 1 \
--min_layers 20 \
--image_dir ./calibration_images \
--shape 640 640 3 \
--dtype float32 \
--batch_size 8 \
--order NCHW \
--resize_method letterbox \
--input_scale 0.0 1.0
Checking DLA Compatibility
# Basic compatibility check
python3 -m trtutils can_run_on_dla --onnx model.onnx --fp16
# Detailed layer information
python3 -m trtutils can_run_on_dla --onnx model.onnx --fp16 --verbose_layers
# Detailed chunk information
python3 -m trtutils can_run_on_dla --onnx model.onnx --fp16 --verbose_chunks
# Full detailed output
python3 -m trtutils can_run_on_dla --onnx model.onnx --fp16 --verbose_layers --verbose_chunks
Inspecting an Engine
python3 -m trtutils inspect --engine model.engine
Running YOLO Inference
# Run inference on a single image
python3 -m trtutils yolo --engine model.engine --input image.jpg --conf_thres 0.25 --preprocessor cuda
# Run inference on a video with custom settings
python3 -m trtutils yolo \
--engine model.engine \
--input video.mp4 \
--conf_thres 0.3 \
--input_range 0.0 255.0 \
--preprocessor cpu \
--resize_method letterbox \
--warmup \
--warmup_iterations 20
Notes
* All paths can be specified as relative or absolute paths.
* The CLI automatically sets the log level to INFO when running.
* For Jetson-specific features (such as benchmark --jetson), make sure you are running on a Jetson device.
* When using INT8 quantization, ensure you have appropriate calibration data.