trtutils.builder package
Module contents
Submodule containing tools for building TensorRT engines.
Classes
EngineCalibrator – Calibrates an engine during quantization.
ImageBatcher – Batches images for calibration during engine building.
ProgressBar – Progress bar implementation for TensorRT engine building.
Functions
build_engine() – Build a TensorRT engine from an ONNX file.
build_dla_engine() – Build an efficient TensorRT engine for DLA.
can_run_on_dla() – Evaluate if the model can run on a DLA.
read_onnx() – Read an ONNX file and get TensorRT objects.
- class trtutils.builder.EngineCalibrator(calibration_cache: Path | str | None = None)
Bases: IInt8EntropyCalibrator2
Implements the trt.IInt8EntropyCalibrator2.
- get_batch_size() → int
Get the batch size.
Overrides from trt.IInt8EntropyCalibrator2.
- Returns:
The batch size
- Return type:
int
- get_batch(names: list[str]) → list[int] | None
Get the next batch of data.
Overrides from trt.IInt8EntropyCalibrator2.
- class trtutils.builder.ImageBatcher(image_dir: Path | str, shape: tuple[int, int, int], dtype: np.dtype, batch_size: int = 8, order: str = 'NCHW', max_images: int | None = None, resize_method: str = 'letterbox', input_scale: tuple[float, float] = (0.0, 1.0), *, verbose: bool | None = None)
Bases: AbstractBatcher
Creates image batches for calibrating TensorRT engines.
- trtutils.builder.build_dla_engine(onnx: Path | str, output_path: Path | str, data_batcher: AbstractBatcher, dla_core: int, max_chunks: int = 1, min_layers: int = 20, timing_cache: Path | str | None = None, *, verbose: bool | None = None) → None
Build a TensorRT engine for DLA with automatic layer assignments.
This function will:
1. Check which layers can run on DLA
2. Find the largest chunk of DLA-compatible layers
3. Assign those layers to DLA with INT8 precision
4. Assign remaining layers to GPU with FP16 precision
- Parameters:
onnx (Path, str) – The path to the ONNX model or a pre-made TensorRT network
output_path (Path, str) – The path where the engine should be saved
data_batcher (AbstractBatcher) – The data batcher instance for INT8 calibration
dla_core (int) – The DLA core to use
max_chunks (int, optional) – The maximum number of DLA-compatible chunks to assign to the DLA. By default 1, which assigns the first compatible chunk. Set to 0 to assign all chunks that meet min_layers.
min_layers (int, optional) – The minimum number of layers a chunk must contain to be assigned to DLA. By default 20. Set to 0 to assign all chunks regardless of size.
timing_cache (Path, str, optional) – The path to the timing cache file
verbose (bool, optional) – Whether to print verbose output, by default False
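How max_chunks and min_layers interact can be illustrated with a small pure-Python sketch. This is not the library's implementation: select_dla_chunks is a hypothetical helper, chunks are represented only by their layer counts, and they are assumed to be considered in network order (following the max_chunks description above); the library may rank chunks differently, for example by size.

```python
def select_dla_chunks(chunk_sizes, max_chunks=1, min_layers=20):
    """Return indices of DLA-compatible chunks that would be assigned to DLA.

    chunk_sizes: layer count of each DLA-compatible chunk, in network order.
    Hypothetical helper for illustration only; not part of trtutils.
    """
    # Keep only chunks that meet the minimum size (min_layers=0 keeps all).
    eligible = [i for i, size in enumerate(chunk_sizes) if size >= min_layers]
    # max_chunks=0 means "no limit"; otherwise keep the first max_chunks.
    if max_chunks == 0:
        return eligible
    return eligible[:max_chunks]
```

With chunk sizes of 5, 30, and 25 layers, the defaults would select only the 30-layer chunk, while max_chunks=0 would select both chunks of at least 20 layers.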
- trtutils.builder.build_engine(onnx: Path | str, output: Path | str, default_device: trt.DeviceType | str = <DeviceType.GPU: 0>, timing_cache: Path | str | None = None, workspace: float = 4.0, dla_core: int | None = None, calibration_cache: Path | str | None = None, data_batcher: AbstractBatcher | None = None, layer_precision: list[tuple[int, trt.DataType | None]] | None = None, layer_device: list[tuple[int, trt.DeviceType | None]] | None = None, shapes: list[tuple[str, tuple[int, ...]]] | None = None, input_tensor_formats: list[tuple[str, trt.DataType, trt.TensorFormat]] | None = None, output_tensor_formats: list[tuple[str, trt.DataType, trt.TensorFormat]] | None = None, hooks: list[Callable[[trt.INetworkDefinition], trt.INetworkDefinition]] | None = None, *, gpu_fallback: bool = False, direct_io: bool = False, prefer_precision_constraints: bool = False, reject_empty_algorithms: bool = False, ignore_timing_mismatch: bool = False, fp16: bool | None = None, int8: bool | None = None, cache: bool | None = None, verbose: bool | None = None) → None
Build a TensorRT engine from an ONNX model.
The order in which operations occur inside build_engine:
1. Parse the ONNX model
2. Apply any network hooks
3. Create optimization profile and apply any manual shapes
4. Apply builder flags (precision constraints, empty algorithms, direct I/O)
5. Configure tensor formats if specified
6. Configure precision (FP16, INT8)
7. Set default device and DLA core
8. Apply individual layer precision and device settings
9. Set up timing cache
10. Build the engine
11. Save timing cache and engine
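Because hooks run right after parsing, they see the network before any other build step touches it. A minimal sketch of a hook follows; log_layers_hook is a hypothetical name, and the sketch assumes only the num_layers, get_layer(), and name members of the TensorRT Python network and layer objects.

```python
def log_layers_hook(network):
    """Print the index and name of every layer, then return the network.

    Hypothetical example hook; it inspects the network without modifying it.
    """
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        print(f"{i:4d} {layer.name}")
    # Matching the hooks type signature, hand the network back to the builder.
    return network
```

Such a function could be passed as build_engine(..., hooks=[log_layers_hook]). A hook that edits the network (for example, marking extra outputs) would follow the same shape and return the modified network.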
- Parameters:
onnx (Path, str) – The path to the onnx model.
output (Path, str) – The location to save the TensorRT engine.
default_device (trt.DeviceType, str, optional) – The device to use for the engine. By default, trt.DeviceType.GPU. Options are trt.DeviceType.GPU, trt.DeviceType.DLA, or a string of “gpu” or “dla”.
timing_cache (Path, str, optional) – Where to store the timing cache data. Default is None.
workspace (float) – The size of the workspace in gigabytes. Default is 4.0 GiB.
calibration_cache (Path, str, optional) – The path to the calibration cache.
data_batcher (AbstractBatcher, optional) – The data batcher to use for calibration.
dla_core (int, optional) – The DLA core to build the engine for. By default None, which builds the engine for the GPU.
layer_precision (list[tuple[int, trt.DataType | None]], optional) – The precision to use for specific layers. By default, None.
layer_device (list[tuple[int, trt.DeviceType | None]], optional) – The device to use for specific layers. By default, None.
shapes (list[tuple[str, tuple[int, ...]]], optional) – A list of (input_name, shape) pairs specifying the shapes of the input layers. For example, shapes=[("images", (1, 3, imgsz, imgsz))] will set the input "images" to a fixed shape. This shape will be used as the min, optimal, and max shape for the binding. By default, None.
input_tensor_formats (list[tuple[str, trt.DataType, trt.TensorFormat]], optional) – A list of (name, dtype, format) tuples to allow deep specification of input layers. For example, input_tensor_formats=[("input", trt.DataType.UINT8, trt.TensorFormat.HWC)]. By default, None.
output_tensor_formats (list[tuple[str, trt.DataType, trt.TensorFormat]], optional) – A list of (name, dtype, format) tuples to allow deep specification of output layers. For example, output_tensor_formats=[("output", trt.DataType.HALF, trt.TensorFormat.LINEAR)]. By default, None.
hooks (list[Callable[[trt.INetworkDefinition], trt.INetworkDefinition]], optional) – An optional list of ‘hook’ functions to modify the TensorRT network before the remainder of the build phase occurs. By default, None
gpu_fallback (bool) – Whether or not to allow GPU fallback for unsupported layers when building the engine for DLA. By default, False
direct_io (bool) – Use direct IO for the engine. By default, False
prefer_precision_constraints (bool) – Whether or not to prefer precision constraints. By default, False
reject_empty_algorithms (bool) – Whether or not to reject empty algorithms. By default, False
ignore_timing_mismatch (bool) – Whether or not to allow different CUDA device generated timing caches to be used in the building of engines. By default, False
fp16 (bool, optional) – If True, quantize the engine to FP16 precision.
int8 (bool, optional) – If True, quantize the engine to INT8 precision.
cache (bool, optional) – Whether or not to cache the engine in the trtutils engine cache. If an existing cached version is found, it will be used instead of rebuilding. The name of the output file is used to determine whether the engine has been compiled before, so generic output names such as 'engine', 'model', or similar will result in unintended caching behavior. By default None, which will not cache the engine.
verbose (bool, optional) – If True, print verbose output. By default, None or False
- Raises:
RuntimeError – If the ONNX model cannot be parsed
RuntimeError – If the TensorRT engine fails to build
ValueError – If a layer is manually assigned to DLA, DLA is not supported, and gpu_fallback is False
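A typical call can be sketched as follows. The helper names make_build_kwargs and build are hypothetical, the input name "images" is borrowed from the shapes example above, and the model is assumed to have a single 3-channel image input. The output name is derived from the ONNX file name so that it stays distinctive if cache is enabled.

```python
from pathlib import Path


def make_build_kwargs(onnx, imgsz=640):
    """Assemble keyword arguments for build_engine (hypothetical helper)."""
    onnx = Path(onnx)
    return {
        "onnx": onnx,
        # A model-specific output name avoids collisions in the engine cache.
        "output": onnx.with_suffix(".engine"),
        # Fixed shape: used as the min, optimal, and max shape for the input.
        "shapes": [("images", (1, 3, imgsz, imgsz))],
        "fp16": True,
    }


def build(onnx, imgsz=640):
    """Build an FP16 engine next to the ONNX file; requires TensorRT."""
    # Imported lazily so the helper above can be used without TensorRT.
    from trtutils.builder import build_engine

    build_engine(**make_build_kwargs(onnx, imgsz))
```

Calling build("models/yolo_640.onnx") would then write the engine to models/yolo_640.engine, assuming that ONNX file exists.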
- trtutils.builder.can_run_on_dla(onnx: Path | str | trt.INetworkDefinition, config: trt.IBuilderConfig | None = None, *, verbose_layers: bool | None = None, verbose_chunks: bool | None = None) → tuple[bool, list[tuple[list[trt.ILayer], int, int, bool]]]
Check whether the entire model can be run on a DLA.
- Parameters:
onnx (Path, str, or trt.INetworkDefinition) – The path to the onnx file or a pre-made TensorRT network.
config (trt.IBuilderConfig, optional) – The TensorRT builder config. Required if onnx is a network.
verbose_layers (bool, optional) – Whether to print verbose output for individual layers, by default None
verbose_chunks (bool, optional) – Whether to print verbose output for layer chunks, by default None
- Returns:
A tuple of (1) whether the entire model can run on DLA and (2) the list of layer blocks, where each block can run on a single device, either DLA or GPU.
- Return type:
tuple[bool, list[tuple[list[trt.ILayer], int, int, bool]]]
- Raises:
ValueError – If config is not provided when onnx is a network
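The returned chunk list can be summarized with a few lines of Python. The sketch below is hedged: summarize_chunks is a hypothetical helper, and it assumes that the two integers in each chunk are layer indices and that the final boolean marks whether the chunk can run on DLA. The documentation above does not spell out those fields, so check the source before relying on this interpretation.

```python
def summarize_chunks(chunks):
    """Return (dla_layer_count, gpu_layer_count) for a list of chunks.

    Each chunk is unpacked as (layers, start, end, on_dla). The meaning of
    the last three fields is an assumption, not documented behavior.
    """
    dla = sum(len(layers) for layers, _start, _end, on_dla in chunks if on_dla)
    gpu = sum(len(layers) for layers, _start, _end, on_dla in chunks if not on_dla)
    return dla, gpu
```

Given full_dla, chunks = can_run_on_dla("model.onnx"), calling summarize_chunks(chunks) would give a quick sense of how much of the model is DLA-compatible before choosing min_layers and max_chunks for build_dla_engine.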
- trtutils.builder.read_onnx(onnx: Path | str, workspace: float = 4.0) → tuple[trt.INetworkDefinition, trt.IBuilder, trt.IBuilderConfig, trt.IOnnxParser]
Open an ONNX model and generate TensorRT network, builder, config, and parser.
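- Parameters:
onnx (Path, str) – The path to the ONNX model.
workspace (float, optional) – The size of the workspace in gigabytes. By default 4.0.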
- Returns:
The network, builder, config, and parser.
- Return type:
tuple[trt.INetworkDefinition, trt.IBuilder, trt.IBuilderConfig, trt.IOnnxParser]
- Raises:
FileNotFoundError – If the onnx model does not exist
IsADirectoryError – If the onnx model path is a directory
ValueError – If the onnx model path does not have .onnx extension
RuntimeError – If the ONNX model cannot be parsed
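The three path-related exceptions can be anticipated before calling read_onnx. The following is a sketch of equivalent checks using only pathlib; check_onnx_path is a hypothetical helper, and the order of the checks is an assumption rather than the library's actual logic.

```python
from pathlib import Path


def check_onnx_path(onnx):
    """Validate an ONNX path, raising the same exception types as documented.

    Hypothetical pre-flight check; not part of trtutils.
    """
    path = Path(onnx)
    if not path.exists():
        raise FileNotFoundError(f"ONNX model not found: {path}")
    if path.is_dir():
        raise IsADirectoryError(f"Expected a file, got a directory: {path}")
    if path.suffix != ".onnx":
        raise ValueError(f"Expected a .onnx file, got: {path.name}")
    return path
```

Running such a check up front separates path mistakes from genuine parse failures, which surface as RuntimeError.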
- class trtutils.builder.ProgressBar
Bases: IProgressMonitor
A progress bar for building TensorRT engines.
- phase_start(phase_name: str, parent_phase: str | None, num_steps: int) → None
Start a new phase.