YOLOv9 Tutorial

This tutorial will guide you through using trtutils with YOLOv9 models. We will cover:

  1. Exporting ONNX weights from YOLOv9

  2. Building a TensorRT engine

  3. Running inference with the engine

  4. Advanced features and optimizations

Exporting ONNX Weights

YOLOv9 shares its lead authors with YOLOv7 and supports similar export options. It differs in one respect: the input size is explicitly marked as dynamic in the exported ONNX weights. Here's how to export:

# Clone the YOLOv9 repository
$ git clone https://github.com/WongKinYiu/yolov9.git
$ cd yolov9

# Export the ONNX weights
# Adjust parameters according to your needs:
# - topk-all: Maximum number of detections kept per image
# - iou-thres: IoU threshold for NMS
# - conf-thres: Minimum confidence score for NMS
# - img-size: Input image size as height width (marked as dynamic in the ONNX weights)
$ python3 export.py \
    --weights PATH_TO_WEIGHTS \
    --include onnx_end2end \
    --simplify \
    --iou-thres 0.5 \
    --conf-thres 0.25 \
    --topk-all 100 \
    --img-size 640 640
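Before building an engine, it can help to confirm which input dimensions actually ended up dynamic. The sketch below reads the graph inputs of an ONNX model, assuming the `onnx` Python package is installed (an extra dependency not used elsewhere in this tutorial). The helper names `input_dims`, `dynamic_axes`, and `load_input_dims` are hypothetical, and `yolov9.onnx` is assumed to be the exported file.

```python
"""Report which ONNX graph-input dimensions are fixed and which are dynamic."""


def input_dims(model):
    """Return {input_name: [dim, ...]} for an onnx.ModelProto-like object.

    Each dim is an int for a fixed size, the symbolic name (str) for a named
    dynamic axis, or None when neither a value nor a name is set.
    """
    # Older exports may list weights (initializers) as graph inputs; skip them.
    initializer_names = {init.name for init in model.graph.initializer}
    shapes = {}
    for value_info in model.graph.input:
        if value_info.name in initializer_names:
            continue
        dims = []
        for dim in value_info.type.tensor_type.shape.dim:
            if dim.dim_param:  # named dynamic axis, e.g. "height"
                dims.append(dim.dim_param)
            elif dim.dim_value > 0:  # concrete size
                dims.append(dim.dim_value)
            else:  # nothing set: size unknown, also dynamic
                dims.append(None)
        shapes[value_info.name] = dims
    return shapes


def dynamic_axes(dims):
    """Indices of dims that need a concrete value at engine build time."""
    return [i for i, d in enumerate(dims) if not isinstance(d, int)]


def load_input_dims(path="yolov9.onnx"):
    """Load an ONNX file and return its input dims (requires `pip install onnx`)."""
    import onnx  # imported lazily so the helpers above work without it

    return input_dims(onnx.load(path))
```

For example, `dims = load_input_dims("yolov9.onnx")` followed by `dynamic_axes(dims["images"])` lists the axes you must pin with the `shapes` argument when building the engine.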

Building TensorRT Engine

When building the TensorRT engine for YOLOv9, you need to explicitly specify the input shape since it’s dynamic in the ONNX weights:

from trtutils.trtexec import build_engine

# Build the engine with FP16 precision
# Note: The input shape must match the img-size used during export
build_engine(
    weights="yolov9.onnx",
    output="yolov9.engine",
    fp16=True,
    shapes=[("images", (1, 3, 640, 640))],  # Must match export img-size
    workspace_size=1 << 30,  # 1GB workspace
)

# For Jetson devices with DLA support
build_engine(
    weights="yolov9.onnx",
    output="yolov9_dla.engine",
    fp16=True,
    shapes=[("images", (1, 3, 640, 640))],
    dla_core=0,  # Use DLA core 0
    workspace_size=1 << 30,
)
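Since the `shapes` tuple must agree with the `--img-size` passed at export, deriving one from the other avoids silent mismatches. The sketch below is a hypothetical helper (not part of trtutils). It assumes the export size is given as (height, width), that a single value means a square input, that the input tensor is named `images` in NCHW layout as in the calls above, and that sizes must be multiples of 32, the largest stride of standard YOLOv9 models.

```python
MAX_STRIDE = 32  # assumption: largest feature-map stride of standard YOLOv9 models


def make_shapes(img_size, batch=1, input_name="images"):
    """Turn an export-style img-size into a build_engine `shapes` list.

    img_size: (height, width), or a single value for a square input.
    Returns [(input_name, (batch, 3, height, width))] in NCHW order.
    """
    img_size = tuple(img_size)
    if len(img_size) == 1:  # a single value means a square input
        img_size = (img_size[0], img_size[0])
    height, width = img_size  # raises ValueError for anything but 1 or 2 values
    for side in (height, width):
        if side <= 0 or side % MAX_STRIDE:
            raise ValueError(f"{side} is not a positive multiple of {MAX_STRIDE}")
    # batch, 3 color channels, height, width
    return [(input_name, (batch, 3, height, width))]
```

With it, the FP16 build above becomes `build_engine(weights="yolov9.onnx", output="yolov9.engine", fp16=True, shapes=make_shapes((640, 640)))`, so the engine shape is derived from the same numbers you passed to `--img-size`.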

Running Inference

The YOLO class provides a high-level interface for running YOLOv9 inference:

import cv2
from trtutils.impls.yolo import YOLO, YOLO9

# Load the YOLOv9 model
yolo = YOLO("yolov9.engine")

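# OR, use the YOLOv9-specific YOLO9 class instead
# (uncomment to use; loading both keeps two copies of the engine in memory)
# yolo = YOLO9("yolov9.engine")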

# Read and process an image
img = cv2.imread("example.jpg")
detections = yolo.end2end(img)

# Print results
for bbox, confidence, class_id in detections:
    print(f"Class: {class_id}, Confidence: {confidence}")
    print(f"Bounding Box: {bbox}")
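The confidence threshold (0.25 in the export command) is applied inside the end2end ONNX graph, so filtering in Python can only make it stricter, not looser. A minimal sketch of such post-filtering, assuming each detection unpacks as `(bbox, confidence, class_id)` as in the loop above; the helpers `top_detections` and `count_by_class` are hypothetical:

```python
from collections import Counter


def top_detections(detections, min_conf=0.5):
    """Keep detections with confidence >= min_conf, highest confidence first."""
    kept = [det for det in detections if det[1] >= min_conf]
    return sorted(kept, key=lambda det: det[1], reverse=True)


def count_by_class(detections):
    """Count detections per class id."""
    return Counter(class_id for _bbox, _conf, class_id in detections)
```

For instance, `for bbox, confidence, class_id in top_detections(detections, 0.5): ...` iterates only the confident boxes, and `count_by_class(detections)` gives a quick per-class tally.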

Advanced Features

Parallel Execution

You can run multiple YOLOv9 models in parallel:

import cv2
from trtutils.impls.yolo import ParallelYOLO

# Create a parallel YOLO instance with multiple engines
yolo = ParallelYOLO(["yolov9_1.engine", "yolov9_2.engine"])

# Run inference on multiple images
images = [cv2.imread(f"image{i}.jpg") for i in range(2)]
results = yolo.end2end(images)
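The example above passes exactly as many images as there are engines. To process a longer list, one option is to feed it in engine-sized chunks. The sketch below assumes (not confirmed by this tutorial) that `end2end` expects one image per engine and returns one result per image in order; `run_in_chunks` is a hypothetical helper that takes the callable so it can be tested without a GPU.

```python
def run_in_chunks(end2end, images, num_engines):
    """Feed images to a ParallelYOLO-style end2end callable num_engines at a time.

    A short final chunk is padded by repeating its last image so every call
    receives exactly num_engines inputs; results for the padding are dropped.
    """
    if num_engines < 1:
        raise ValueError("num_engines must be >= 1")
    results = []
    for start in range(0, len(images), num_engines):
        chunk = list(images[start:start + num_engines])
        real = len(chunk)  # number of genuine images in this chunk
        chunk += [chunk[-1]] * (num_engines - real)  # pad the last chunk
        results.extend(end2end(chunk)[:real])
    return results
```

Usage would look like `all_results = run_in_chunks(yolo.end2end, images, num_engines=2)`. Padding keeps every call the same size at the cost of some redundant work on the final chunk.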

Benchmarking

Measure performance with the built-in benchmarking utilities:

from trtutils import benchmark_engine

# Run 1000 iterations
results = benchmark_engine("yolov9.engine", iterations=1000)
print(f"Average latency: {results.latency.mean:.2f}ms")
print(f"Throughput: {1000/results.latency.mean:.2f} FPS")

# On Jetson devices, measure power consumption
from trtutils.jetson import benchmark_engine as jetson_benchmark

results = jetson_benchmark(
    "yolov9.engine",
    iterations=1000,
    tegra_interval=1  # More frequent power measurements
)
print(f"Average power draw: {results.power_draw.mean:.2f}W")
print(f"Total energy used: {results.energy.sum:.2f}J")
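The throughput line above divides 1000 by the mean latency in milliseconds, which assumes one image per inference run back to back. The same arithmetic extends to batches and to a rough energy-per-image estimate. This is a sketch with a hypothetical `derived_metrics` helper; it assumes latency is in milliseconds and power in watts, and that the power figure was sampled while inference ran continuously, so treat the energy value as an estimate.

```python
def derived_metrics(latency_ms, power_w, batch_size=1):
    """Convert mean latency (ms) and mean power (W) into throughput and energy.

    throughput [images/s] = batch_size * 1000 / latency_ms
    energy per image [J]  = power_w * (latency_ms / 1000) / batch_size
    """
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    latency_s = latency_ms / 1000.0
    fps = batch_size / latency_s
    joules_per_image = power_w * latency_s / batch_size
    return fps, joules_per_image
```

For example, a 10 ms mean latency at 15 W corresponds to 100 images/s and about 0.15 J per image.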

Troubleshooting

Common issues and solutions:

  1. Engine Creation Fails
     - Ensure the input shape matches the img-size used during export
     - Check that you have enough GPU memory (see the workspace_size parameter)
     - Verify the ONNX weights are valid

  2. Incorrect Detections
     - Verify that the input image preprocessing matches the preprocessing used during training
     - Check that the confidence and IoU thresholds chosen at export are appropriate (they are baked into the end2end ONNX weights)

  3. Performance Issues
     - Try enabling FP16 precision
     - On Jetson devices, use the MAXN power mode and enable jetson_clocks

  4. Dynamic Shape Issues
     - Always specify the input shape when building the engine
     - The shape must match the img-size used during export
     - If you need multiple input sizes, build separate engines
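When you need several input sizes, it is easiest to generate one build configuration per size. The sketch below is a hypothetical helper that assumes you exported one ONNX file per size and named it `yolov9_{height}x{width}.onnx`. The naming scheme is an assumption; the `weights`, `output`, and `shapes` keys mirror the `build_engine` arguments used earlier.

```python
def engine_specs(sizes, stem="yolov9"):
    """Map each (height, width) to ONNX/engine file names and a shapes list.

    Assumes one ONNX export per size, named f"{stem}_{height}x{width}.onnx".
    """
    specs = []
    for height, width in sizes:
        tag = f"{height}x{width}"
        specs.append({
            "weights": f"{stem}_{tag}.onnx",
            "output": f"{stem}_{tag}.engine",
            "shapes": [("images", (1, 3, height, width))],
        })
    return specs
```

Each entry can then be passed straight to the builder, for example `for spec in engine_specs([(640, 640), (480, 640)]): build_engine(fp16=True, **spec)`.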