trtutils package
Subpackages
- trtutils.builder package
- trtutils.core package
- Submodules
- Module contents
- Submodules
- Classes
- Functions
- Binding, Kernel, TRTEngineInterface, TRTEngineInterface.name, TRTEngineInterface.engine, TRTEngineInterface.context, TRTEngineInterface.logger, TRTEngineInterface.stream, TRTEngineInterface.memsize, TRTEngineInterface.dla_core, TRTEngineInterface.pagelocked_mem, TRTEngineInterface.unified_mem, TRTEngineInterface.input_spec, TRTEngineInterface.input_shapes, TRTEngineInterface.input_dtypes, TRTEngineInterface.output_spec, TRTEngineInterface.output_shapes, TRTEngineInterface.output_dtypes, TRTEngineInterface.input_bindings, TRTEngineInterface.output_bindings, TRTEngineInterface.execute(), TRTEngineInterface.direct_exec(), TRTEngineInterface.get_random_input(), TRTEngineInterface.mock_execute(), TRTEngineInterface.warmup()
- allocate_bindings(), allocate_managed_memory(), allocate_pinned_memory(), allocate_to_device(), compile_and_load_kernel(), compile_kernel(), create_binding(), create_context(), create_engine(), create_kernel_args(), create_stream(), cuda_call(), cuda_free(), cuda_host_free(), cuda_malloc(), destroy_context(), destroy_stream(), free_device_ptrs(), init_cuda(), launch_kernel(), load_kernel(), memcpy_device_to_host(), memcpy_device_to_host_async(), memcpy_host_to_device(), memcpy_host_to_device_async(), nvrtc_call(), stream_synchronize()
- trtutils.impls package
- trtutils.inspect package
- trtutils.jetson package
- trtutils.trtexec package
Module contents
A package for enabling high-level usage of TensorRT in Python.
This package provides a high-level interface for using TensorRT in Python. It provides a class for creating TensorRT engines from serialized engine files, a class for running inference on those engines, and a variety of other utilities.
Submodules
builder – A module for building TensorRT engines.
core – A module for the core functionality of the package.
jetson – A module implementing additional functionality for Jetson devices.
impls – A module containing implementations for different neural networks.
inspect – A module for inspecting TensorRT engines.
trtexec – A module for utilities related to the trtexec tool.
Classes
BenchmarkResult – A dataclass for storing profiling information from benchmarking engines.
Metric – A dataclass storing specific metric information from benchmarking.
TRTEngine – A class for creating TensorRT engines from serialized engine files.
TRTModel – A class for running inference on TensorRT engines.
ParallelTRTEngines – A class for running many TRTEngines in parallel.
ParallelTRTModels – A class for running many TRTModels in parallel.
QueuedTRTEngine – A class for running a TRTEngine asynchronously in a separate thread.
QueuedTRTModel – A class for running a TRTModel asynchronously in a separate thread.
Functions
benchmark_engine() – Benchmark a TensorRT engine.
benchmark_engines() – Benchmark TensorRT engines in parallel or serially.
build_engine() – Build a TensorRT engine.
find_trtexec() – Find an instance of the trtexec binary on the system.
inspect_engine() – Inspect a TensorRT engine.
run_trtexec() – Run a command with trtexec.
set_log_level() – Set the log level of the trtutils package.
enable_jit() – Enable just-in-time compilation using Numba.
disable_jit() – Disable just-in-time compilation using Numba.
register_jit() – Decorator for registering functions for potential JIT compilation.
Objects
FLAGS – The flag storage object for trtutils.
LOG – The TensorRT compatible logger for trtutils.
JIT – A context manager for enabling just-in-time compilation using Numba.
- class trtutils.BenchmarkResult(latency: Metric)
Bases: object
A dataclass to store the results of a benchmark.
- class trtutils.Metric(raw: list[float | int], mean: float | int = -1.0, median: float | int = -1.0, min: float | int = -1.0, max: float | int = -1.0)
Bases: object
A dataclass to store a specific metric from a benchmark.
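The Metric signature pairs the raw samples with summary fields that default to -1.0. How trtutils fills those fields is not shown here; the sketch below is one plausible way to do it, using a hypothetical stand-in class (`MetricSketch`) and a hypothetical helper (`summarize`) rather than the library's own code.

```python
import statistics
from dataclasses import dataclass
from typing import List


@dataclass
class MetricSketch:
    """Hypothetical stand-in mirroring the documented Metric fields."""

    raw: List[float]
    mean: float = -1.0
    median: float = -1.0
    min: float = -1.0
    max: float = -1.0


def summarize(raw: List[float]) -> MetricSketch:
    """Fill the summary fields from the raw samples (assumed behaviour)."""
    if not raw:
        # Nothing to summarize: keep the -1.0 placeholders.
        return MetricSketch(raw=[])
    return MetricSketch(
        raw=list(raw),
        mean=statistics.mean(raw),
        median=statistics.median(raw),
        min=min(raw),
        max=max(raw),
    )


latency = summarize([1.0, 2.0, 6.0])
```

Keeping `raw` alongside the summaries lets callers compute further statistics later without rerunning the benchmark.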
- class trtutils.ParallelTRTEngines(engines: Sequence[TRTEngine | Path | str | tuple[TRTEngine | Path | str, int]], warmup_iterations: int = 5, *, warmup: bool | None = None)
Bases: object
Handle many TRTEngines in parallel.
- get_random_input(*, new: bool | None = None) -> list[list[np.ndarray]]
Get a random input to the underlying TRTEngines.
- submit(inputs: list[list[np.ndarray]]) -> None
Submit data to be processed by the engines.
- Parameters:
inputs (list[list[np.ndarray]]) – The inputs to pass to the engines. Should be a list with the same length as the number of engines created.
- Raises:
ValueError – If the inputs are not the same size as the engines.
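The length requirement on `inputs` can be illustrated with a small check. This is only a sketch of the documented contract (one input list per engine, otherwise ValueError); `check_submission` is a hypothetical helper, not part of trtutils.

```python
from typing import List, Sequence


def check_submission(inputs: Sequence[List[object]], num_engines: int) -> None:
    """Raise ValueError unless there is exactly one input list per engine."""
    if len(inputs) != num_engines:
        msg = f"Expected {num_engines} input lists, got {len(inputs)}"
        raise ValueError(msg)


# Two engines, two input lists: accepted.
check_submission([["a"], ["b"]], num_engines=2)
```

Running the same check before calling submit gives an earlier, more specific error when batches are assembled far from the call site.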
- class trtutils.ParallelTRTModels(engine_paths: Sequence[Path | str], preprocess: Callable[[list[np.ndarray]], list[np.ndarray]] | list[Callable[[list[np.ndarray]], list[np.ndarray]]] = <function _identity>, postprocess: Callable[[list[np.ndarray]], list[np.ndarray]] | list[Callable[[list[np.ndarray]], list[np.ndarray]]] = <function _identity>, warmup_iterations: int = 5, *, warmup: bool | None = None)
Bases: object
Handle many TRTModels in parallel.
- class trtutils.QueuedTRTEngine(engine: TRTEngine | Path | str, warmup_iterations: int = 5, dla_core: int | None = None, *, warmup: bool | None = None)
Bases: object
Interact with a TRTEngine over a Thread and Queue.
- property input_spec: list[tuple[list[int], np.dtype]]
Get the specs for the input tensor of the network. Useful to prepare memory allocations.
- property input_dtypes: list[np.dtype]
Get the datatypes for the input tensors of the network.
- Returns:
A list with the datatype of each input tensor.
- Return type:
list[np.dtype]
- property output_spec: list[tuple[list[int], np.dtype]]
Get the specs for the output tensor of the network. Useful to prepare memory allocations.
- property output_shapes: list[tuple[int, ...]]
Get the shapes for the output tensors of the network.
- property output_dtypes: list[np.dtype]
Get the datatypes for the output tensors of the network.
- Returns:
A list with the datatype of each output tensor.
- Return type:
list[np.dtype]
- get_random_input(*, new: bool | None = None) -> list[np.ndarray]
Get a random input to the underlying TRTEngine.
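The thread-and-queue pattern that QueuedTRTEngine is described as using can be sketched with the standard library alone. Everything below is an illustrative assumption: `QueuedWorker`, its `submit`, `retrieve`, and `stop` methods, and the doubling function stand in for the real class and engine, whose methods are not listed in this section.

```python
import queue
import threading
from typing import Callable, List


class QueuedWorker:
    """Hypothetical stand-in: run a callable on a worker thread via queues."""

    def __init__(self, fn: Callable[[List[float]], List[float]]) -> None:
        self._fn = fn
        self._inputs: queue.Queue = queue.Queue()
        self._outputs: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self) -> None:
        while True:
            item = self._inputs.get()
            if item is None:  # sentinel: stop the worker
                break
            self._outputs.put(self._fn(item))

    def submit(self, data: List[float]) -> None:
        """Queue work without blocking the caller."""
        self._inputs.put(data)

    def retrieve(self, timeout: float = 5.0) -> List[float]:
        """Block until the next result is available."""
        return self._outputs.get(timeout=timeout)

    def stop(self) -> None:
        """Ask the worker to exit and wait for it."""
        self._inputs.put(None)
        self._thread.join()


worker = QueuedWorker(lambda xs: [2 * x for x in xs])
worker.submit([1.0, 2.0])
result = worker.retrieve()
worker.stop()
```

Because only the worker thread ever calls the wrapped function, the caller can overlap its own work with inference without sharing the engine across threads.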
- class trtutils.QueuedTRTModel(engine_path: Path | str, preprocess: Callable[[list[np.ndarray]], list[np.ndarray]] = <function _identity>, postprocess: Callable[[list[np.ndarray]], list[np.ndarray]] = <function _identity>, warmup_iterations: int = 5, engine_type: type[TRTEngine] | None = None, *, warmup: bool | None = None)
Bases: object
Interact with a TRTModel over a Thread and Queue.
- class trtutils.TRTEngine(engine_path: Path | str, warmup_iterations: int = 5, backend: str = 'auto', stream: cuda.cudaStream_t | None = None, dla_core: int | None = None, *, warmup: bool | None = None, pagelocked_mem: bool | None = None, unified_mem: bool | None = None, no_warn: bool | None = None, verbose: bool | None = None)
Bases: TRTEngineInterface
Implements a generic interface for TensorRT engines.
Creating multiple TRTEngines is thread- and process-safe, and a TRTEngine may be created in one thread and used in another. Each TRTEngine has its own CUDA context, and the class implements no safeguards against data races. As such, a single TRTEngine should not be used in multiple threads or processes.
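One way to respect the threading guidance above is to give every thread its own engine rather than sharing one. The sketch below shows that pattern with a stand-in factory; `run_per_thread` and `make_engine` are hypothetical names, and with trtutils the factory might look like `lambda: TRTEngine(path).execute`.

```python
import threading
from typing import Callable, List


def run_per_thread(
    make_engine: Callable[[], Callable[[int], int]],
    jobs: List[int],
) -> List[int]:
    """Give every thread its own engine instead of sharing one."""
    results = [0] * len(jobs)

    def work(index: int, job: int) -> None:
        engine = make_engine()  # created and used by this thread only
        results[index] = engine(job)

    threads = [
        threading.Thread(target=work, args=(i, job)) for i, job in enumerate(jobs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Stand-in factory: each "engine" just adds one to its input.
outputs = run_per_thread(lambda: (lambda x: x + 1), [1, 2, 3])
```

Each thread writes to its own slot in `results`, so no lock is needed for the outputs either.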
- execute(data: list[np.ndarray], *, no_copy: bool | None = None, verbose: bool | None = None, debug: bool | None = None) -> list[np.ndarray]
Execute the network with the given inputs.
- Parameters:
data (list[np.ndarray]) – The inputs to the network.
no_copy (bool, optional) – If True, the outputs will not be copied out from the CUDA-allocated host memory. Instead, the host memory will be returned directly. This memory WILL BE OVERWRITTEN IN PLACE by future inferences.
verbose (bool, optional) – Whether or not to output additional information to stdout. If not provided, defaults to the engine's overall verbose setting.
debug (bool, optional) – Enable intermediate stream synchronize for debugging.
- Returns:
The outputs of the network.
- Return type:
list[np.ndarray]
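The no_copy warning describes an aliasing hazard: the returned arrays share memory with buffers the engine reuses. The sketch below reproduces that hazard with a hypothetical stand-in (`ReusedBufferEngine`, using plain lists instead of NumPy arrays); it does not reflect how TRTEngine manages memory internally.

```python
from typing import List


class ReusedBufferEngine:
    """Hypothetical stand-in that writes every result into one shared buffer."""

    def __init__(self) -> None:
        self._out: List[float] = [0.0]

    def execute(self, data: List[float], *, no_copy: bool = False) -> List[float]:
        self._out[0] = sum(data)  # overwrite the shared buffer in place
        return self._out if no_copy else list(self._out)


engine = ReusedBufferEngine()
view = engine.execute([1.0, 2.0], no_copy=True)  # aliases the internal buffer
kept = list(view)                                # snapshot before the next call
safe = engine.execute([1.0, 2.0])                # already a private copy
engine.execute([10.0], no_copy=True)             # clobbers what `view` points at
```

After the last call, `view` reflects the newest inference while `kept` and `safe` still hold the earlier result, which is why outputs obtained with no_copy should be consumed or copied before the next inference.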
- direct_exec(pointers: list[int], *, no_warn: bool | None = None, verbose: bool | None = None, debug: bool | None = None) -> list[np.ndarray]
Execute the network with the given GPU memory pointers.
The outputs of this function are not copied on return. The data will be updated in place if execute or direct_exec is called again. Passing invalid pointers to this method will crash the CUDA runtime and, with it, the program.
- Parameters:
- Returns:
The outputs of the network.
- Return type:
list[np.ndarray]
- raw_exec(pointers: list[int], *, no_warn: bool | None = None, verbose: bool | None = None, debug: bool | None = None) -> list[int]
Execute the network with the given GPU memory pointers.
The outputs of this function are the direct GPU pointers of the output allocations.
- Parameters:
- Returns:
The pointers to the network outputs.
- Return type:
list[int]
- class trtutils.TRTModel(engine_path: Path | str, preprocess: Callable[[list[np.ndarray]], list[np.ndarray]] = <function _identity>, postprocess: Callable[[list[np.ndarray]], list[np.ndarray]] = <function _identity>, warmup_iterations: int = 5, engine_type: type[TRTEngine] | None = None, *, warmup: bool | None = None)
Bases: object
A wrapper around a TensorRT engine that handles the device memory.
Creating multiple TRTModels is thread- and process-safe, and a TRTModel may be created in one thread and used in another. Each TRTModel has its own CUDA context, and the class implements no safeguards against data races. As such, a single TRTModel should not be used in multiple threads or processes.
- property stream: cudart.cudaStream_t
Access the underlying CUDA stream.
- property preprocessor: Callable[[list[np.ndarray]], list[np.ndarray]]
The preprocessing function used in this model.
- property postprocessor: Callable[[list[np.ndarray]], list[np.ndarray]]
The postprocessing function used in this model.
- mock_run(data: list[np.ndarray] | None = None) -> list[np.ndarray]
Execute the model with random inputs.
- run(inputs: list[np.ndarray], *, preprocessed: bool | None = None, postprocess: bool | None = None) -> list[np.ndarray]
Execute the model with the given inputs.
- Parameters:
inputs (list[np.ndarray]) – The inputs to the model
preprocessed (bool, optional) – Whether the inputs are already preprocessed, by default None. If None, the inputs will be preprocessed.
postprocess (bool, optional) – Whether or not to postprocess the outputs, by default None. If None, the outputs will be postprocessed.
- Returns:
The outputs of the model
- Return type:
list[np.ndarray]
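The way the `preprocessed` and `postprocess` flags gate the pipeline can be sketched as follows. This is an illustration of the documented defaults (None means the step runs), with hypothetical names (`run_pipeline`, `postprocess_fn`) and plain lists standing in for NumPy arrays; it is not the TRTModel implementation.

```python
from typing import Callable, List, Optional

Arrays = List[float]  # stand-in for list[np.ndarray]


def run_pipeline(
    inputs: Arrays,
    engine: Callable[[Arrays], Arrays],
    preprocess: Callable[[Arrays], Arrays],
    postprocess_fn: Callable[[Arrays], Arrays],
    *,
    preprocessed: Optional[bool] = None,
    postprocess: Optional[bool] = None,
) -> Arrays:
    """Mirror the documented defaults: None means 'do the step'."""
    if not preprocessed:  # None or False: inputs still need preprocessing
        inputs = preprocess(inputs)
    outputs = engine(inputs)
    if postprocess is None or postprocess:
        outputs = postprocess_fn(outputs)
    return outputs


double = lambda xs: [2 * x for x in xs]   # toy preprocess
add_one = lambda xs: [x + 1 for x in xs]  # toy engine
negate = lambda xs: [-x for x in xs]      # toy postprocess

full = run_pipeline([1.0], add_one, double, negate)
bare = run_pipeline([1.0], add_one, double, negate, preprocessed=True, postprocess=False)
```

Passing `preprocessed=True` is useful when the same preprocessed batch is fed to several models, so the work is done once.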
- trtutils.benchmark_engine(engine: TRTEngine | Path | str, iterations: int = 1000, warmup_iterations: int = 50, dla_core: int | None = None, *, warmup: bool | None = None, verbose: bool | None = None) -> BenchmarkResult
Benchmark a TensorRT engine.
- Parameters:
engine (TRTEngine | Path | str) – The engine to benchmark. Either a TRTEngine object or path to the engine file. If a path is given, then a TRTEngine will be created automatically.
iterations (int, optional) – The number of iterations to run the benchmark for, by default 1000.
warmup_iterations (int, optional) – The number of warmup iterations to run before the benchmark, by default 50.
dla_core (int, optional) – The DLA core to assign DLA layers of the engine to. Default is None. If None, any DLA layers will be assigned to DLA core 0.
warmup (bool, optional) – Whether to do warmup iterations, by default None. If None, warmup will be set to True.
verbose (bool, optional) – Whether or not to output additional information to stdout. Default None/False.
- Returns:
A dataclass containing the results of the benchmark.
- Return type:
BenchmarkResult
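A benchmark result can be reported by reading the `latency` Metric documented above. The sketch below is a usage outline under assumptions: `format_latency` and `benchmark_and_report` are hypothetical helpers, the engine path is supplied by the caller, and the units of the latency values are not stated in this section, so none are printed.

```python
from types import SimpleNamespace
from typing import Any


def format_latency(result: Any) -> str:
    """Render the documented latency fields of a BenchmarkResult-like object."""
    lat = result.latency
    return (
        f"mean={lat.mean:.2f} median={lat.median:.2f} "
        f"min={lat.min:.2f} max={lat.max:.2f}"
    )


def benchmark_and_report(engine_path: str) -> str:
    # Imported lazily so this sketch loads on machines without TensorRT.
    from trtutils import benchmark_engine

    result = benchmark_engine(engine_path, iterations=1000, warmup_iterations=50)
    return format_latency(result)


# Stand-in result with the same attribute layout, for demonstration only.
fake = SimpleNamespace(latency=SimpleNamespace(mean=1.5, median=1.0, min=0.5, max=3.0))
report = format_latency(fake)
```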
- trtutils.benchmark_engines(engines: Sequence[TRTEngine | Path | str | tuple[TRTEngine | Path | str, int]], iterations: int = 1000, warmup_iterations: int = 50, *, warmup: bool | None = None, parallel: bool | None = None, verbose: bool | None = None) -> list[BenchmarkResult]
Benchmark TensorRT engines in parallel or serially.
- Parameters:
engines (Sequence[TRTEngine | Path | str | tuple[TRTEngine | Path | str, int]]) – The engines to benchmark, given as TRTEngine objects or paths to the engine files.
iterations (int, optional) – The number of iterations to run the benchmark for, by default 1000.
warmup_iterations (int, optional) – The number of warmup iterations to run before the benchmark, by default 50.
warmup (bool, optional) – Whether to do warmup iterations, by default None. If None, warmup will be set to True.
parallel (bool, optional) – Whether or not to process the engines in parallel. Useful for assessing concurrent execution performance. Will execute the engines in lockstep. If None, will benchmark each engine individually.
verbose (bool, optional) – Whether or not to output additional information to stdout. Default None/False.
- Returns:
A list of dataclasses containing the results of the benchmark. If parallel was True, will only contain one item.
- Return type:
list[BenchmarkResult]
- trtutils.build_engine(onnx: Path | str, output: Path | str, default_device: trt.DeviceType | str = <DeviceType.GPU: 0>, timing_cache: Path | str | None = None, workspace: float = 4.0, dla_core: int | None = None, calibration_cache: Path | str | None = None, data_batcher: AbstractBatcher | None = None, layer_precision: list[tuple[int, trt.DataType | None]] | None = None, layer_device: list[tuple[int, trt.DeviceType | None]] | None = None, shapes: list[tuple[str, tuple[int, ...]]] | None = None, input_tensor_formats: list[tuple[str, trt.DataType, trt.TensorFormat]] | None = None, output_tensor_formats: list[tuple[str, trt.DataType, trt.TensorFormat]] | None = None, hooks: list[Callable[[trt.INetworkDefinition], trt.INetworkDefinition]] | None = None, *, gpu_fallback: bool = False, direct_io: bool = False, prefer_precision_constraints: bool = False, reject_empty_algorithms: bool = False, ignore_timing_mismatch: bool = False, fp16: bool | None = None, int8: bool | None = None, cache: bool | None = None, verbose: bool | None = None) -> None
Build a TensorRT engine from an ONNX model.
The order in which operations occur inside build_engine:
1. Parse the ONNX model
2. Apply any network hooks
3. Create optimization profile and apply any manual shapes
4. Apply builder flags (precision constraints, empty algorithms, direct I/O)
5. Configure tensor formats if specified
6. Configure precision (FP16, INT8)
7. Set default device and DLA core
8. Apply individual layer precision and device settings
9. Set up timing cache
10. Build the engine
11. Save timing cache and engine
- Parameters:
onnx (Path, str) – The path to the onnx model.
output (Path, str) – The location to save the TensorRT engine.
default_device (trt.DeviceType, str, optional) – The device to use for the engine. By default, trt.DeviceType.GPU. Options are trt.DeviceType.GPU, trt.DeviceType.DLA, or a string of “gpu” or “dla”.
timing_cache (Path, str, optional) – Where to store the timing cache data. Default is None.
workspace (float) – The size of the workspace in gigabytes. Default is 4.0 GiB.
calibration_cache (Path, str, optional) – The path to the calibration cache.
data_batcher (AbstractBatcher, optional) – The data batcher to use for calibration.
dla_core (int, optional) – The DLA core to build the engine for. By default None, which builds the engine for the GPU.
layer_precision (list[tuple[int, trt.DataType | None]], optional) – The precision to use for specific layers. By default, None.
layer_device (list[tuple[int, trt.DeviceType | None]], optional) – The device to use for specific layers. By default, None.
shapes (list[tuple[str, tuple[int, ...]]], optional) – A list of (input_name, shape) pairs to specify the shapes of the input layers. For example, shapes=[("images", (1, 3, imgsz, imgsz))] will set the input "images" to a fixed shape. This shape will be used as the min, optimal, and max shape for the binding. By default, None.
input_tensor_formats (list[tuple[str, trt.DataType, trt.TensorFormat]], optional) – A list of (name, dtype, format) tuples to allow deep specification of input layers. For example, input_tensor_formats=[("input", trt.DataType.UINT8, trt.TensorFormat.HWC)]. By default, None.
output_tensor_formats (list[tuple[str, trt.DataType, trt.TensorFormat]], optional) – A list of (name, dtype, format) tuples to allow deep specification of output layers. For example, output_tensor_formats=[("output", trt.DataType.HALF, trt.TensorFormat.LINEAR)]. By default, None.
hooks (list[Callable[[trt.INetworkDefinition], trt.INetworkDefinition]], optional) – An optional list of 'hook' functions to modify the TensorRT network before the remainder of the build phase occurs. By default, None.
gpu_fallback (bool) – Whether or not to allow GPU fallback for unsupported layers when building the engine for DLA. By default, False
direct_io (bool) – Use direct IO for the engine. By default, False
prefer_precision_constraints (bool) – Whether or not to prefer precision constraints. By default, False
reject_empty_algorithms (bool) – Whether or not to reject empty algorithms. By default, False
ignore_timing_mismatch (bool) – Whether or not to allow different CUDA device generated timing caches to be used in the building of engines. By default, False
fp16 (bool, optional) – If True, quantize the engine to FP16 precision.
int8 (bool, optional) – If True, quantize the engine to INT8 precision.
cache (bool, optional) – Whether or not to cache the engine in the trtutils engine cache. If an existing version is found, it will be used. The name of the output file is used to assess whether the engine has been compiled before. As such, naming the output ‘engine’, ‘model’ or similar will result in unintended caching behavior. By default None, will not cache the engine.
verbose (bool, optional) – If True, print verbose output. By default, None or False
- Raises:
RuntimeError – If the ONNX model cannot be parsed
RuntimeError – If the TensorRT engine fails to build
ValueError – If layer is manually assigned to DLA and DLA is not supported and gpu_fallback is False
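A fixed-shape FP16 build can be assembled from the parameters documented above. The sketch below is a usage outline, not a recommended configuration: the helper names (`make_build_kwargs`, `build`), the input name "images", the file names, and the 640 image size are all assumptions chosen for illustration.

```python
from pathlib import Path
from typing import Any, Dict


def make_build_kwargs(onnx: Path, imgsz: int) -> Dict[str, Any]:
    """Collect build_engine arguments for a fixed-shape FP16 build."""
    return {
        "onnx": onnx,
        "output": onnx.with_suffix(".engine"),
        # One shape serves as min, optimal, and max for the "images" input.
        "shapes": [("images", (1, 3, imgsz, imgsz))],
        "timing_cache": onnx.with_name("timing.cache"),
        "fp16": True,
    }


def build(onnx: Path, imgsz: int = 640) -> Path:
    # Imported lazily so the sketch loads on machines without TensorRT.
    from trtutils import build_engine

    kwargs = make_build_kwargs(onnx, imgsz)
    build_engine(**kwargs)
    return kwargs["output"]


kwargs = make_build_kwargs(Path("yolo.onnx"), 640)
```

Deriving the output name from the ONNX file keeps engine names distinct, which matters if the `cache` option is later enabled, since the cache keys on the output file name.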
- trtutils.find_trtexec() -> Path
Find an instance of the trtexec binary on the system.
Requires the locate command to be installed on the system. As such, only works on Unix-like systems.
- Returns:
The path to the trtexec binary
- Return type:
Path
- Raises:
FileNotFoundError – If the trtexec binary is not found on the system
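Since find_trtexec depends on the locate command, callers on systems without it may want a fallback. The sketch below tries find_trtexec first and then searches PATH with `shutil.which`; the PATH lookup and both helper names (`find_on_path`, `find_trtexec_or_path`) are assumptions added for illustration and are not how trtutils itself locates the binary.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_on_path(name: str = "trtexec") -> Optional[Path]:
    """Look a binary up on PATH; returns None when it is absent."""
    found = shutil.which(name)
    return Path(found) if found else None


def find_trtexec_or_path() -> Path:
    """Prefer trtutils.find_trtexec, then fall back to a PATH lookup."""
    try:
        # Lazy import so the sketch loads without trtutils installed.
        from trtutils import find_trtexec

        return find_trtexec()
    except (ImportError, FileNotFoundError):
        fallback = find_on_path()
        if fallback is None:
            raise FileNotFoundError("trtexec not found via locate or PATH")
        return fallback


missing = find_on_path("no-such-binary-xyz-12345")
```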
- trtutils.inspect_engine(engine: Path | str | ICudaEngine, *, verbose: bool | None = None) -> tuple[int, int, list[tuple[str, tuple[int, ...], DataType, TensorFormat]], list[tuple[str, tuple[int, ...], DataType, TensorFormat]]]
Inspect a TensorRT engine.
- trtutils.register_jit(*, fastmath: bool = False, parallel: bool = False, nogil: bool = False, cache: bool = False, inline: str = 'never') -> Callable[[Callable[_P, _R]], Callable[_P, _R]]
Register a function to be re-imported whenever JIT status changes.
- Parameters:
func (Callable[_P, _R], optional) – The function to optionally JIT compile. If None, the decorator returns a partially applied function.
fastmath (bool, optional) – If True, enable fastmath during jit. Default is False.
parallel (bool, optional) – If True, enable parallel jit. Default is False.
nogil (bool, optional) – If True, disable the GIL when running jit compiled functions. Default is False.
cache (bool, optional) – If True, cache jit compiled functions to disk. Default is False.
inline (str, optional) – Whether or not to inline functions at the Numba IR level. Default is ‘never’. Options are: [‘never’, ‘always’]
- Returns:
The registered and optionally JIT-compiled function.
- Return type:
Callable[[Callable[_P, _R]], Callable[_P, _R]]
Examples
>>> @register_jit(fastmath=True, parallel=True)
... def my_func(x):
...     return x * x
- trtutils.run_trtexec(command: str, trtexec_path: Path | str | None = None) -> tuple[bool, str, str]
Run a command using trtexec.
The goal of this function is to make it easier to use trtexec within Python scripts. Returning the stdout and stderr streams to the Python program as strings can simplify logic in scripts that use trtexec.
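The idea of handing a process's output back as strings can be sketched with `subprocess`. This illustrates the general technique rather than run_trtexec itself: `run_captured` is a hypothetical helper, it takes an argument list instead of a command string, and the (success, stdout, stderr) ordering is an assumption based on the documented `tuple[bool, str, str]` return type.

```python
import subprocess
import sys
from typing import List, Tuple


def run_captured(argv: List[str]) -> Tuple[bool, str, str]:
    """Run a command and return (success, stdout, stderr) as strings."""
    proc = subprocess.run(argv, capture_output=True, text=True, check=False)
    return proc.returncode == 0, proc.stdout, proc.stderr


# Demonstrated with the Python interpreter instead of trtexec.
ok, out, err = run_captured([sys.executable, "-c", "print('hello')"])
```

Returning a success flag alongside the captured text lets a script branch on failure and still log the full stderr output.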