AutoModel

inference_models.AutoModel

Functions

from_pretrained classmethod

from_pretrained(model_id_or_path, weights_provider='roboflow', api_key=None, model_package_id=None, backend=None, batch_size=None, quantization=None, onnx_execution_providers=None, device=DEFAULT_DEVICE, default_onnx_trt_options=True, max_package_loading_attempts=None, verbose=False, model_download_file_lock_acquire_timeout=FILE_LOCK_ACQUIRE_TIMEOUT, allow_untrusted_packages=False, trt_engine_host_code_allowed=True, allow_local_code_packages=True, verify_hash_while_download=True, download_files_without_hash=False, use_auto_resolution_cache=True, auto_resolution_cache=None, allow_direct_local_storage_loading=True, model_access_manager=None, nms_fusion_preferences=None, model_type=None, task_type=None, allow_loading_dependency_models=True, dependency_models_params=None, point_model_directory=None, forwarded_kwargs=None, weights_provider_extra_query_params=None, weights_provider_extra_headers=None, **kwargs)

Load and initialize a computer vision model with automatic backend selection.

This is the primary entry point for loading models in inference-models. It automatically:

  • Downloads model weights from the specified provider (default: Roboflow)
  • Selects the optimal backend (TensorRT > PyTorch > ONNX > Hugging Face > others)
  • Configures the model for your hardware (CPU/GPU)
  • Handles caching of downloaded artefacts

Parameters:

  • model_id_or_path
    (str) –

    Model identifier or local path. Can be:

    - Pre-trained model ID (e.g., "yolov8n-640", "rfdetr-base", "resnet50")
    - Custom Roboflow model (e.g., "my-project/2")
    - Local directory path containing model files
    - Local checkpoint file path (e.g., "/path/to/checkpoint.pth")

  • weights_provider
    (str, default: 'roboflow' ) –

    Source for model weights. Options:

    - "roboflow" (default): Download from the Roboflow platform
    - "local": Load from the local filesystem
    - Custom provider name (if registered via register_model_provider())

  • api_key
    (Optional[str], default: None ) –

    Roboflow API key for accessing private models. If not provided, uses the ROBOFLOW_API_KEY environment variable. Not required for public pre-trained models.

  • model_package_id
    (Optional[str], default: None ) –

    Specific model package to load (advanced). If not provided, automatically selects the best package based on your environment and requested backend/quantization. Use AutoModel.describe_model() to see available packages.

  • backend
    (Optional[Union[str, BackendType, List[Union[str, BackendType]]]], default: None ) –

    Preferred inference backend(s). Can be:

    - Single backend: "torch", "onnx", "trt" (TensorRT), "hugging-face"
    - List of backends: ["trt", "torch"] (tried in order; see the Examples below)
    - BackendType enum value(s)
    - None (default): Automatic selection (TensorRT > PyTorch > ONNX > Hugging Face)

  • batch_size
    (Optional[Union[int, Tuple[int, int]]], default: None ) –

    Preferred batch size for inference. Can be:

    - Single integer: Fixed batch size (e.g., 1, 8, 16)
    - Tuple: Range of batch sizes (e.g., (1, 8) for dynamic batching)
    - None (default): Use the model's default batch size

    Note: Only affects models with multiple batch size variants.

  • quantization
    (Optional[Union[str, Quantization, List[Union[str, Quantization]]]], default: None ) –

    Model quantization level(s). Can be:

    - Single value: "fp32", "fp16", "bf16", "int8"
    - List: ["fp16", "fp32"] (tried in order)
    - Quantization enum value(s)
    - None (default): Automatic selection based on device capabilities

  • onnx_execution_providers
    (Optional[List[Union[str, tuple]]], default: None ) –

    ONNX Runtime execution providers (ONNX backend only). Examples:

    - ["CUDAExecutionProvider", "CPUExecutionProvider"]
    - [("TensorrtExecutionProvider", {"trt_fp16_enable": True})]

    If not provided, automatically selects providers based on available hardware.

  • device
    (Union[device, str], default: DEFAULT_DEVICE ) –

    PyTorch device for model execution. Can be:

    - String: "cpu", "cuda", "cuda:0", "cuda:1", "mps"
    - torch.device object

    Default: "cuda" if available, otherwise "cpu".

  • default_onnx_trt_options
    (bool, default: True ) –

    Whether to use default TensorRT optimization options for ONNX Runtime's TensorRT execution provider. Default: True.

  • max_package_loading_attempts
    (Optional[int], default: None ) –

    Maximum number of model packages to try before failing. Useful when multiple packages are available. Default: Try all matching packages.

  • verbose
    (bool, default: False ) –

    Enable detailed logging during model loading. Useful for debugging package selection and download issues. Default: False.

  • model_download_file_lock_acquire_timeout
    (int, default: FILE_LOCK_ACQUIRE_TIMEOUT ) –

    Timeout in seconds for acquiring file locks during concurrent downloads. Default: 10.

  • allow_untrusted_packages
    (bool, default: False ) –

    Allow loading model packages with custom code that haven't been verified. Security risk - only enable for trusted sources. Default: False.

  • trt_engine_host_code_allowed
    (bool, default: True ) –

    Allow TensorRT engines to execute host code. Required for some TensorRT optimizations. Default: True.

  • allow_local_code_packages
    (bool, default: True ) –

    Allow loading models with custom Python code from local directories. Default: True.

  • verify_hash_while_download
    (bool, default: True ) –

    Verify file integrity using checksums during download. Recommended for production. Default: True.

  • download_files_without_hash
    (bool, default: False ) –

    Allow downloading files that don't have checksums. Security risk - only enable for trusted sources. Default: False.

  • use_auto_resolution_cache
    (bool, default: True ) –

    Enable caching of model resolution results to speed up subsequent loads. Default: True.

  • auto_resolution_cache
    (Optional[AutoResolutionCache], default: None ) –

    Custom cache implementation. If None, uses default file-based cache. Advanced usage only.

  • allow_direct_local_storage_loading
    (bool, default: True ) –

    Allow loading models directly from local paths without going through the weights provider. Default: True.

  • model_access_manager
    (Optional[ModelAccessManager], default: None ) –

    Custom model access control manager. If None, uses permissive default. Advanced usage only.

  • nms_fusion_preferences
    (Optional[Union[bool, dict]], default: None ) –

    Non-Maximum Suppression fusion preferences for ONNX models. Can be:

    - True: Enable NMS fusion with default settings
    - False: Disable NMS fusion
    - dict: Custom NMS fusion configuration
    - None (default): Use the model's default settings

  • model_type
    (Optional[str], default: None ) –

    Override model architecture type (advanced). Only needed when loading local models without metadata. Examples: "yolov8", "rfdetr".

  • task_type
    (Optional[str], default: None ) –

    Override task type (advanced). Only needed when loading local models without metadata. Examples: "object-detection", "classification".

  • allow_loading_dependency_models
    (bool, default: True ) –

    Allow loading models that depend on other models (e.g., some VLMs depend on separate vision encoders). Default: True.

  • dependency_models_params
    (Optional[dict], default: None ) –

    Parameters to pass to dependency models. Dict mapping dependency names to parameter dicts. Advanced usage only.

  • point_model_directory
    (Optional[Callable[[str], None]], default: None ) –

    Callback function called with the model directory path after loading. Advanced usage only.

  • forwarded_kwargs
    (Optional[List[str]], default: None ) –

    List of kwargs to forward to dependency models. Advanced usage only.

  • weights_provider_extra_query_params
    (Optional[List[Tuple[str, str]]], default: None ) –

    Extra query parameters to pass to the weights provider. Advanced usage only.

  • weights_provider_extra_headers
    (Optional[Dict[str, str]], default: None ) –

    Extra headers to pass to the weights provider. Advanced usage only.

  • **kwargs

    Additional model-specific parameters passed to the model's from_pretrained() method. Varies by model type.

Returns:

  • AnyModel

    Loaded model instance. The specific type depends on the model's task:

    - ObjectDetectionModel: For object detection (YOLO, RF-DETR, etc.)
    - ClassificationModel: For single-label classification
    - MultiLabelClassificationModel: For multi-label classification
    - InstanceSegmentationModel: For instance segmentation
    - KeyPointsDetectionModel: For keypoint detection
    - DepthEstimationModel: For depth estimation
    - StructuredOCRModel: For OCR with structured output
    - TextImageEmbeddingModel: For vision-language embeddings (CLIP, etc.)
    - OpenVocabularyObjectDetectionModel: For open-vocabulary detection

Raises:

  • UnauthorizedModelAccessError

    If API key is invalid or model access is denied.

  • ModelPackageNotFoundError

    If no compatible model package is found for your environment and requested parameters.

  • CorruptedModelPackageError

    If model files are corrupted or incomplete.

  • InvalidParameterError

    If provided parameters are invalid.

  • DirectLocalStorageAccessError

    If local path loading is disabled but a local path was provided.

Examples:

Basic usage with pre-trained model:

>>> from inference_models import AutoModel
>>> model = AutoModel.from_pretrained("yolov8n-640")
>>> predictions = model(image)

Load custom Roboflow model:

>>> model = AutoModel.from_pretrained(
...     "my-project/2",
...     api_key="your_api_key"
... )

Force specific backend and device:

>>> model = AutoModel.from_pretrained(
...     "rfdetr-base",
...     backend="torch",
...     device="cuda:1"
... )

Load with quantization:

>>> model = AutoModel.from_pretrained(
...     "yolov8n-640",
...     quantization="fp16"
... )

Load from local checkpoint:

>>> model = AutoModel.from_pretrained(
...     "/path/to/checkpoint.pth",
...     model_type="rfdetr-base",
...     labels=["cat", "dog"]
... )

Enable verbose logging:

>>> model = AutoModel.from_pretrained(
...     "yolov8n-640",
...     verbose=True
... )
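
Combine fallback lists and a dynamic batch range (a sketch based on the backend, quantization, and batch_size parameters documented above; each list is tried in order):

>>> model = AutoModel.from_pretrained(
...     "yolov8n-640",
...     backend=["trt", "torch"],
...     quantization=["fp16", "fp32"],
...     batch_size=(1, 8)
... )

Select ONNX Runtime execution providers explicitly (provider names as shown for the onnx_execution_providers parameter above):

>>> model = AutoModel.from_pretrained(
...     "yolov8n-640",
...     backend="onnx",
...     onnx_execution_providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
... )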
See Also
  • AutoModel.describe_model(): View model metadata before loading
  • AutoModel.describe_model_package(): View specific package details
  • AutoModel.describe_compute_environment(): Check available backends
  • AutoModel.list_available_models(): List all registered models

describe_model classmethod

describe_model(model_id, weights_provider='roboflow', api_key=None, pull_artefacts_size=False, weights_provider_extra_query_params=None, weights_provider_extra_headers=None)

Display comprehensive metadata and available packages for a model.

Shows detailed information about a model without loading it, including:

  • Model architecture and variant
  • Task type (object detection, classification, etc.)
  • Available model packages (different backends, quantizations, batch sizes)
  • Package requirements and compatibility
  • Model dependencies (if any)
  • Package sizes (optional, requires network requests)

This is useful for:

  • Exploring available models before loading
  • Understanding which backends are available for a model
  • Checking model requirements and compatibility
  • Debugging model loading issues
  • Selecting the right package for your environment

Parameters:

  • model_id
    (str) –

    Model identifier. Can be:

    - Pre-trained model ID (e.g., "yolov8n-640", "rfdetr-base")
    - Custom Roboflow model (e.g., "my-project/2")

  • weights_provider
    (str, default: 'roboflow' ) –

    Source for model metadata. Options:

    - "roboflow" (default): Query the Roboflow platform
    - Custom provider name (if registered)

  • api_key
    (Optional[str], default: None ) –

    Roboflow API key for accessing private models. If not provided, uses the ROBOFLOW_API_KEY environment variable. Not required for public pre-trained models.

  • pull_artefacts_size
    (bool, default: False ) –

    Whether to calculate and display the total size of each model package. This requires making network requests to check file sizes, so it's slower. Default: False.

  • weights_provider_extra_query_params
    (Optional[List[Tuple[str, str]]], default: None ) –

    Extra query parameters to pass to the weights provider. Advanced usage only.

  • weights_provider_extra_headers
    (Optional[Dict[str, str]], default: None ) –

    Extra headers to pass to the weights provider. Advanced usage only.

Returns:

  • None

    None. Prints formatted tables to the console showing:

    1. Model overview table with architecture, task type, and dependencies
    2. Available packages table with backend, quantization, and batch size info

Raises:

  • UnauthorizedModelAccessError

    If API key is invalid or model access is denied.

  • ModelNotFoundError

    If the model ID doesn't exist in the weights provider.

Examples:

View model information:

>>> from inference_models import AutoModel
>>> AutoModel.describe_model("yolov8n-640")

View with package sizes:

>>> AutoModel.describe_model("rfdetr-base", pull_artefacts_size=True)
# Same as above, but includes a "Size" column showing package sizes

View private model:

>>> AutoModel.describe_model(
...     "my-workspace/my-model/2",
...     api_key="your_api_key"
... )
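
Inspect a model before loading it (describe_model() lists the available packages, which can guide the backend and quantization arguments to from_pretrained()):

>>> AutoModel.describe_model("rfdetr-base")
>>> model = AutoModel.from_pretrained("rfdetr-base", backend="torch")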
See Also
  • AutoModel.describe_model_package(): View detailed info for a specific package
  • AutoModel.describe_compute_environment(): Check your runtime environment
  • AutoModel.from_pretrained(): Load a model after inspecting it

describe_model_package classmethod

describe_model_package(model_id, package_id, weights_provider='roboflow', api_key=None, pull_artefacts_size=True, weights_provider_extra_query_params=None, weights_provider_extra_headers=None)

Display detailed information about a specific model package.

Shows comprehensive details for a single model package, including:

  • Backend type (PyTorch, ONNX, TensorRT, etc.)
  • Quantization level (FP32, FP16, INT8, etc.)
  • Batch size configuration (fixed or dynamic)
  • Required dependencies and environment
  • Package artifacts (model files, configs, etc.)
  • Total package size (optional)
  • Hardware requirements (CUDA version, TensorRT version, etc.)

This is useful for:

  • Understanding package requirements before loading
  • Debugging compatibility issues
  • Checking package size before download
  • Verifying package contents

Parameters:

  • model_id
    (str) –

    Model identifier. Can be:

    - Pre-trained model ID (e.g., "yolov8n-640", "rfdetr-base")
    - Custom Roboflow model (e.g., "my-project/2")

  • package_id
    (str) –

    Specific package identifier to inspect. Get this from AutoModel.describe_model() output.

  • weights_provider
    (str, default: 'roboflow' ) –

    Source for model metadata. Options:

    - "roboflow" (default): Query the Roboflow platform
    - Custom provider name (if registered)

  • api_key
    (Optional[str], default: None ) –

    Roboflow API key for accessing private models. If not provided, uses the ROBOFLOW_API_KEY environment variable. Not required for public pre-trained models.

  • pull_artefacts_size
    (bool, default: True ) –

    Whether to calculate and display the size of each artifact in the package. This requires making network requests to check file sizes, so it's slower. Default: True.

  • weights_provider_extra_query_params
    (Optional[List[Tuple[str, str]]], default: None ) –

    Extra query parameters to pass to the weights provider. Advanced usage only.

  • weights_provider_extra_headers
    (Optional[Dict[str, str]], default: None ) –

    Extra headers to pass to the weights provider. Advanced usage only.

Returns:

  • None

    None. Prints a formatted table to the console showing package details.

Raises:

  • UnauthorizedModelAccessError

    If API key is invalid or model access is denied.

  • ModelNotFoundError

    If the model ID doesn't exist in the weights provider.

  • NoModelPackagesAvailableError

    If the specified package_id doesn't exist for this model.

Examples:

View package details:

>>> from inference_models import AutoModel
>>> # First, see available packages
>>> AutoModel.describe_model("yolov8n-640")
>>> # Then inspect a specific package
>>> AutoModel.describe_model_package("yolov8n-640", "pkg-trt-fp16-1-32")

View without artifact sizes (faster):

>>> AutoModel.describe_model_package(
...     "rfdetr-base",
...     "pkg-torch-fp32",
...     pull_artefacts_size=False
... )
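
After inspecting a package, you can pin it when loading via the model_package_id parameter of from_pretrained() (the package ID below is illustrative):

>>> model = AutoModel.from_pretrained(
...     "rfdetr-base",
...     model_package_id="pkg-torch-fp32"
... )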
See Also
  • AutoModel.describe_model(): View all available packages for a model
  • AutoModel.describe_compute_environment(): Check your runtime environment
  • AutoModel.from_pretrained(): Load a model with a specific package

describe_compute_environment classmethod

describe_compute_environment()

Inspect and display the current runtime environment and available backends.

Performs a comprehensive scan of your system to detect:

  • Hardware: GPU availability, GPU models, compute capability
  • CUDA: Driver version, CUDA toolkit version
  • TensorRT: TensorRT version and availability
  • PyTorch: PyTorch and torchvision versions
  • ONNX Runtime: Version and available execution providers
  • Other backends: Hugging Face Transformers, Ultralytics, MediaPipe
  • Platform: OS version, Jetson type (if applicable), L4T version

This is useful for:

  • Debugging model loading issues
  • Verifying backend installations
  • Checking hardware compatibility
  • Understanding which model packages will work in your environment
  • Troubleshooting performance issues

Returns:

  • None

    None. Prints a formatted table to the console showing all detected environment information.

Examples:

Check your environment:

>>> from inference_models import AutoModel
>>> AutoModel.describe_compute_environment()
# Displays output like:
                            Compute environment details
Detected GPUs:                      N/A
Detected GPUs CUDA CC:              N/A
NVIDIA driver:                      N/A
CUDA version:                       N/A
TRT version:                        N/A
TRT Python package available:       False
OS version:                         macos-26.2-arm64-arm-64bit
torch version:                      2.6.0
torchvision version:                0.21.0
ONNX runtime version:               1.21.0
Detected ONNX execution providers:  CoreMLExecutionProvider, AzureExecutionProvider, CPUExecutionProvider
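
Act on the report (a sketch: on a CPU-only machine like the one above, you might request the ONNX backend explicitly using the documented from_pretrained() parameters):

>>> model = AutoModel.from_pretrained(
...     "yolov8n-640",
...     backend="onnx",
...     device="cpu"
... )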
See Also
  • AutoModel.describe_model(): View model metadata and requirements
  • AutoModel.from_pretrained(): Load a model (uses this environment info)