load_trt_model

inference_models.models.common.trt.load_trt_model
Load a TensorRT engine from a serialized engine file.
Deserializes a TensorRT engine from a .plan file and returns the engine object ready for inference. Handles errors during deserialization and provides detailed error messages.
Parameters:
- model_path (str) – Path to the serialized TensorRT engine file (.plan).
- engine_host_code_allowed (bool, default: False) – Allow the engine to execute host code. Security risk: only enable if you trust the engine source.
Returns:
- ICudaEngine – TensorRT CUDA engine ready for creating execution contexts and running inference.
Raises:
- CorruptedModelPackageError – If the engine file cannot be loaded due to: file not found, incompatible TensorRT version, incompatible CUDA version, a corrupted engine file, or runtime deserialization errors.
- MissingDependencyError – If TensorRT is not installed.
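Several of the failure causes above (missing file, wrong or empty file) can be surfaced before the TensorRT runtime is even invoked. The helper below is a hypothetical sketch, not part of the library: it mirrors the documented CorruptedModelPackageError causes that are detectable from the filesystem alone.

```python
from pathlib import Path


def preflight_engine_path(model_path: str) -> list[str]:
    # Hypothetical pre-flight helper (not part of inference_models):
    # catches the filesystem-level causes of CorruptedModelPackageError
    # before load_trt_model() is called. Returns a list of warnings;
    # raises immediately on a missing file.
    warnings = []
    p = Path(model_path)
    if not p.is_file():
        raise FileNotFoundError(f"TensorRT engine file not found: {model_path}")
    if p.suffix not in (".plan", ".engine"):
        warnings.append(
            f"Unexpected extension {p.suffix!r}; engines typically use .plan or .engine"
        )
    if p.stat().st_size == 0:
        warnings.append("Engine file is empty; deserialization will fail")
    return warnings
```

Version or architecture mismatches, by contrast, only surface when the runtime actually deserializes the engine, so they still need the exception handling described above.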
Examples:
Load TensorRT engine:
>>> from inference_models.developer_tools import load_trt_model
>>>
>>> engine = load_trt_model("model.plan")
>>> print(f"Engine loaded: {engine.name}")
>>>
>>> # Create execution context
>>> context = engine.create_execution_context()
Load engine with host code allowed:
>>> # Only if you trust the engine source!
>>> engine = load_trt_model(
... "custom_model.plan",
... engine_host_code_allowed=True
... )
Complete inference setup:
>>> from inference_models.developer_tools import (
... load_trt_model,
... get_trt_engine_inputs_and_outputs
... )
>>>
>>> # Load engine
>>> engine = load_trt_model("yolov8n.plan")
>>>
>>> # Get input/output info
>>> inputs, outputs = get_trt_engine_inputs_and_outputs(engine)
>>> print(f"Inputs: {inputs}")
>>> print(f"Outputs: {outputs}")
>>>
>>> # Create context for inference
>>> context = engine.create_execution_context()
Note
- Requires TensorRT to be installed (TensorRT 10.x recommended)
- Engine files are platform and TensorRT version specific
- Engines built on one GPU architecture may not work on another
- Engine files typically have .plan or .engine extension
- Provides detailed error messages from TensorRT runtime
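Since engine files are tied to the TensorRT version that built them, it can help to log the installed runtime version before loading. A minimal sketch with a guarded import (the absent-TensorRT case is exactly when load_trt_model raises MissingDependencyError):

```python
def trt_runtime_version():
    """Return the installed TensorRT version string, or None when TensorRT
    is not installed (the case in which load_trt_model raises
    MissingDependencyError)."""
    try:
        import tensorrt as trt  # optional dependency; may not be present
        return trt.__version__
    except ImportError:
        return None
```

Comparing this string against the version recorded when the .plan file was built is one way to diagnose "incompatible TensorRT version" deserialization failures.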
See Also
get_trt_engine_inputs_and_outputs(): Inspect engine inputs/outputs
infer_from_trt_engine(): Run inference with a loaded engine