load_trt_model

inference_models.models.common.trt.load_trt_model

load_trt_model(model_path, engine_host_code_allowed=False)

Load a TensorRT engine from a serialized engine file.

Deserializes a TensorRT engine from a .plan file and returns an engine object that is ready for inference. Errors raised during deserialization are caught and re-raised with detailed error messages.

Parameters:

  • model_path

    (str) –

    Path to the serialized TensorRT engine file (.plan).

  • engine_host_code_allowed

    (bool, default: False ) –

    Allow the engine to execute host code. This is a security risk; enable it only if you trust the engine source.

Returns:

  • ICudaEngine

    TensorRT CUDA engine (ICudaEngine) ready for creating execution contexts and running inference.

Raises:

  • CorruptedModelPackageError

    If the engine file cannot be loaded due to: file not found, incompatible TensorRT version, incompatible CUDA version, corrupted engine file, or runtime deserialization errors.

  • MissingDependencyError

    If TensorRT is not installed.

Examples:

Load a TensorRT engine:

>>> from inference_models.developer_tools import load_trt_model
>>>
>>> engine = load_trt_model("model.plan")
>>> print(f"Engine loaded: {engine.name}")
>>>
>>> # Create execution context
>>> context = engine.create_execution_context()

Load engine with host code allowed:

>>> # Only if you trust the engine source!
>>> engine = load_trt_model(
...     "custom_model.plan",
...     engine_host_code_allowed=True
... )
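
Handle load failures (a sketch; assumes both exception classes are importable from inference_models.errors, which may differ in your installation):

>>> from inference_models.errors import (
...     CorruptedModelPackageError,
...     MissingDependencyError,
... )
>>>
>>> try:
...     engine = load_trt_model("model.plan")
... except MissingDependencyError:
...     print("TensorRT is not installed")
... except CorruptedModelPackageError as e:
...     print(f"Engine could not be loaded: {e}")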

Complete inference setup:

>>> from inference_models.developer_tools import (
...     load_trt_model,
...     get_trt_engine_inputs_and_outputs
... )
>>>
>>> # Load engine
>>> engine = load_trt_model("yolov8n.plan")
>>>
>>> # Get input/output info
>>> inputs, outputs = get_trt_engine_inputs_and_outputs(engine)
>>> print(f"Inputs: {inputs}")
>>> print(f"Outputs: {outputs}")
>>>
>>> # Create context for inference
>>> context = engine.create_execution_context()
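
Inspect I/O tensors directly with the TensorRT API (a sketch; assumes TensorRT 10.x, where the tensor-name-based methods below are available on ICudaEngine):

>>> import tensorrt as trt
>>>
>>> for i in range(engine.num_io_tensors):
...     name = engine.get_tensor_name(i)
...     mode = engine.get_tensor_mode(name)  # trt.TensorIOMode.INPUT or OUTPUT
...     print(name, mode, engine.get_tensor_shape(name))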

Note
  • Requires TensorRT to be installed (TensorRT 10.x recommended)
  • Engine files are platform and TensorRT version specific
  • Engines built on one GPU architecture may not work on another
  • Engine files typically have .plan or .engine extension
  • Provides detailed error messages from TensorRT runtime

See Also
  • get_trt_engine_inputs_and_outputs(): Inspect engine inputs/outputs
  • infer_from_trt_engine(): Run inference with loaded engine