AutoModel

inference_models.AutoModel

Functions

from_pretrained classmethod

from_pretrained(model_id_or_path, weights_provider='roboflow', api_key=None, model_package_id=None, backend=None, batch_size=None, quantization=None, onnx_execution_providers=None, device=DEFAULT_DEVICE, default_onnx_trt_options=True, max_package_loading_attempts=None, verbose=False, model_download_file_lock_acquire_timeout=FILE_LOCK_ACQUIRE_TIMEOUT, allow_untrusted_packages=False, trt_engine_host_code_allowed=True, allow_local_code_packages=True, verify_hash_while_download=True, download_files_without_hash=False, use_auto_resolution_cache=True, auto_resolution_cache=None, allow_direct_local_storage_loading=True, model_access_manager=None, nms_fusion_preferences=None, model_type=None, task_type=None, allow_loading_dependency_models=True, dependency_models_params=None, point_model_directory=None, forwarded_kwargs=None, weights_provider_extra_query_params=None, weights_provider_extra_headers=None, **kwargs)

Load and initialize a computer vision model with automatic backend selection.

This is the primary entry point for loading models in inference-models. It automatically:

  • Downloads model weights from the specified provider (default: Roboflow)
  • Selects the optimal backend (TensorRT > PyTorch > ONNX > Hugging Face > others)
  • Configures the model for your hardware (CPU/GPU)
  • Handles caching of downloaded artefacts

Parameters:

  • model_id_or_path
    (str) –

    Model identifier or local path. Can be:

    - Pre-trained model ID (e.g., "yolov8n-640", "rfdetr-base", "resnet50")
    - Custom Roboflow model (e.g., "my-project/2")
    - Local directory path containing model files
    - Local checkpoint file path (e.g., "/path/to/checkpoint.pth")

  • weights_provider
    (str, default: 'roboflow' ) –

    Source for model weights. Options:

    - "roboflow" (default): Download from the Roboflow platform
    - "local": Load from the local filesystem
    - Custom provider name (if registered via register_model_provider())

  • api_key
    (Optional[str], default: None ) –

    Roboflow API key for accessing private models. If not provided, uses the ROBOFLOW_API_KEY environment variable. Not required for public pre-trained models.

  • model_package_id
    (Optional[str], default: None ) –

    Specific model package to load (advanced). If not provided, automatically selects the best package based on your environment and requested backend/quantization. Use AutoModel.describe_model() to see available packages.

  • backend
    (Optional[Union[str, BackendType, List[Union[str, BackendType]]]], default: None ) –

    Preferred inference backend(s). Can be:

    - Single backend: "torch", "onnx", "trt" (TensorRT), "hugging-face"
    - List of backends: ["trt", "torch"] (tried in order; see the Examples below)
    - BackendType enum value(s)
    - None (default): Automatic selection (TensorRT > PyTorch > ONNX > Hugging Face)

  • batch_size
    (Optional[Union[int, Tuple[int, int]]], default: None ) –

    Preferred batch size for inference. Can be:

    - Single integer: Fixed batch size (e.g., 1, 8, 16)
    - Tuple: Range of batch sizes (e.g., (1, 8) for dynamic batching)
    - None (default): Use the model's default batch size

    Note: Only affects models with multiple batch size variants.

  • quantization
    (Optional[Union[str, Quantization, List[Union[str, Quantization]]]], default: None ) –

    Model quantization level(s). Can be:

    - Single value: "fp32", "fp16", "bf16", "int8"
    - List: ["fp16", "fp32"] (tried in order)
    - Quantization enum value(s)
    - None (default): Automatic selection based on device capabilities

  • onnx_execution_providers
    (Optional[List[Union[str, tuple]]], default: None ) –

    ONNX Runtime execution providers (ONNX backend only). Examples:

    - ["CUDAExecutionProvider", "CPUExecutionProvider"]
    - [("TensorrtExecutionProvider", {"trt_fp16_enable": True})]

    If not provided, automatically selects providers based on available hardware.

  • device
    (Union[device, str], default: DEFAULT_DEVICE ) –

    PyTorch device for model execution. Can be:

    - String: "cpu", "cuda", "cuda:0", "cuda:1", "mps"
    - torch.device object

    Default: "cuda" if available, otherwise "cpu".

  • default_onnx_trt_options
    (bool, default: True ) –

    Whether to use default TensorRT optimization options for ONNX Runtime's TensorRT execution provider. Default: True.

  • max_package_loading_attempts
    (Optional[int], default: None ) –

    Maximum number of model packages to try before failing. Useful when multiple packages are available. Default: Try all matching packages.

  • verbose
    (bool, default: False ) –

    Enable detailed logging during model loading. Useful for debugging package selection and download issues. Default: False.

  • model_download_file_lock_acquire_timeout
    (int, default: FILE_LOCK_ACQUIRE_TIMEOUT ) –

    Timeout in seconds for acquiring file locks during concurrent downloads. Default: 10.

  • allow_untrusted_packages
    (bool, default: False ) –

    Allow loading model packages with custom code that haven't been verified. Security risk - only enable for trusted sources. Default: False.

  • trt_engine_host_code_allowed
    (bool, default: True ) –

    Allow TensorRT engines to execute host code. Required for some TensorRT optimizations. Default: True.

  • allow_local_code_packages
    (bool, default: True ) –

    Allow loading models with custom Python code from local directories. Default: True.

  • verify_hash_while_download
    (bool, default: True ) –

    Verify file integrity using checksums during download. Recommended for production. Default: True.

  • download_files_without_hash
    (bool, default: False ) –

    Allow downloading files that don't have checksums. Security risk - only enable for trusted sources. Default: False.

  • use_auto_resolution_cache
    (bool, default: True ) –

    Enable caching of model resolution results to speed up subsequent loads. Default: True.

  • auto_resolution_cache
    (Optional[AutoResolutionCache], default: None ) –

    Custom cache implementation. If None, uses default file-based cache. Advanced usage only.

  • allow_direct_local_storage_loading
    (bool, default: True ) –

    Allow loading models directly from local paths without going through the weights provider. Default: True.

  • model_access_manager
    (Optional[ModelAccessManager], default: None ) –

    Custom model access control manager. If None, uses permissive default. Advanced usage only.

  • nms_fusion_preferences
    (Optional[Union[bool, dict]], default: None ) –

    Non-Maximum Suppression fusion preferences for ONNX models. Can be:

    - True: Enable NMS fusion with default settings
    - False: Disable NMS fusion
    - dict: Custom NMS fusion configuration
    - None (default): Use the model's default settings

  • model_type
    (Optional[str], default: None ) –

    Override model architecture type (advanced). Only needed when loading local models without metadata. Examples: "yolov8", "rfdetr".

  • task_type
    (Optional[str], default: None ) –

    Override task type (advanced). Only needed when loading local models without metadata. Examples: "object-detection", "classification".

  • allow_loading_dependency_models
    (bool, default: True ) –

    Allow loading models that depend on other models (e.g., some VLMs depend on separate vision encoders). Default: True.

  • dependency_models_params
    (Optional[dict], default: None ) –

    Parameters to pass to dependency models. Dict mapping dependency names to parameter dicts. Advanced usage only.

  • point_model_directory
    (Optional[Callable[[str], None]], default: None ) –

    Callback function called with the model directory path after loading. Advanced usage only.

  • forwarded_kwargs
    (Optional[List[str]], default: None ) –

    List of kwargs to forward to dependency models. Advanced usage only.

  • weights_provider_extra_query_params
    (Optional[List[Tuple[str, str]]], default: None ) –

    Extra query parameters to pass to the weights provider. Advanced usage only.

  • weights_provider_extra_headers
    (Optional[Dict[str, str]], default: None ) –

    Extra headers to pass to the weights provider. Advanced usage only.

  • **kwargs

    Additional model-specific parameters passed to the model's from_pretrained() method. Varies by model type.

Returns:

  • AnyModel

    Loaded model instance. The specific type depends on the model's task:

    - ObjectDetectionModel: For object detection (YOLO, RF-DETR, etc.)
    - ClassificationModel: For single-label classification
    - MultiLabelClassificationModel: For multi-label classification
    - InstanceSegmentationModel: For instance segmentation
    - KeyPointsDetectionModel: For keypoint detection
    - DepthEstimationModel: For depth estimation
    - StructuredOCRModel: For OCR with structured output
    - TextImageEmbeddingModel: For vision-language embeddings (CLIP, etc.)
    - OpenVocabularyObjectDetectionModel: For open-vocabulary detection

Raises:

  • UnauthorizedModelAccessError

    If API key is invalid or model access is denied.

  • ModelPackageNotFoundError

    If no compatible model package is found for your environment and requested parameters.

  • CorruptedModelPackageError

    If model files are corrupted or incomplete.

  • InvalidParameterError

    If provided parameters are invalid.

  • DirectLocalStorageAccessError

    If local path loading is disabled but a local path was provided.

Examples:

Basic usage with pre-trained model:

>>> from inference_models import AutoModel
>>> model = AutoModel.from_pretrained("yolov8n-640")
>>> predictions = model(image)

Load custom Roboflow model:

>>> model = AutoModel.from_pretrained(
...     "my-project/2",
...     api_key="your_api_key"
... )

Force specific backend and device:

>>> model = AutoModel.from_pretrained(
...     "rfdetr-base",
...     backend="torch",
...     device="cuda:1"
... )

Load with quantization:

>>> model = AutoModel.from_pretrained(
...     "yolov8n-640",
...     quantization="fp16"
... )

Load from local checkpoint:

>>> model = AutoModel.from_pretrained(
...     "/path/to/checkpoint.pth",
...     model_type="rfdetr-base",
...     labels=["cat", "dog"]
... )

Enable verbose logging:

>>> model = AutoModel.from_pretrained(
...     "yolov8n-640",
...     verbose=True
... )
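
Combine fallback lists and a dynamic batch range (a sketch based on the backend, quantization, and batch_size parameters documented above; each list is tried in order):

>>> model = AutoModel.from_pretrained(
...     "yolov8n-640",
...     backend=["trt", "torch"],
...     quantization=["fp16", "fp32"],
...     batch_size=(1, 8)
... )

Select ONNX Runtime execution providers explicitly (provider names as shown for the onnx_execution_providers parameter above):

>>> model = AutoModel.from_pretrained(
...     "yolov8n-640",
...     backend="onnx",
...     onnx_execution_providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
... )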
See Also
  • AutoModel.describe_model(): View model metadata before loading
  • AutoModel.describe_model_package(): View specific package details
  • AutoModel.describe_compute_environment(): Check available backends
  • AutoModel.list_available_models(): List all registered models

describe_model classmethod

describe_model(model_id, weights_provider='roboflow', api_key=None, pull_artefacts_size=False, weights_provider_extra_query_params=None, weights_provider_extra_headers=None)

Display comprehensive metadata and available packages for a model.

Shows detailed information about a model without loading it, including:

  • Model architecture and variant
  • Task type (object detection, classification, etc.)
  • Available model packages (different backends, quantizations, batch sizes)
  • Package requirements and compatibility
  • Model dependencies (if any)
  • Package sizes (optional, requires network requests)

This is useful for:

  • Exploring available models before loading
  • Understanding which backends are available for a model
  • Checking model requirements and compatibility
  • Debugging model loading issues
  • Selecting the right package for your environment

Parameters:

  • model_id
    (str) –

    Model identifier. Can be:

    - Pre-trained model ID (e.g., "yolov8n-640", "rfdetr-base")
    - Custom Roboflow model (e.g., "my-project/2")

  • weights_provider
    (str, default: 'roboflow' ) –

    Source for model metadata. Options:

    - "roboflow" (default): Query the Roboflow platform
    - Custom provider name (if registered)

  • api_key
    (Optional[str], default: None ) –

    Roboflow API key for accessing private models. If not provided, uses the ROBOFLOW_API_KEY environment variable. Not required for public pre-trained models.

  • pull_artefacts_size
    (bool, default: False ) –

    Whether to calculate and display the total size of each model package. This requires making network requests to check file sizes, so it's slower. Default: False.

  • weights_provider_extra_query_params
    (Optional[List[Tuple[str, str]]], default: None ) –

    Extra query parameters to pass to the weights provider. Advanced usage only.

  • weights_provider_extra_headers
    (Optional[Dict[str, str]], default: None ) –

    Extra headers to pass to the weights provider. Advanced usage only.

Returns:

  • None

    None. Prints formatted tables to the console showing:

    1. Model overview table with architecture, task type, and dependencies
    2. Available packages table with backend, quantization, and batch size info

Raises:

  • UnauthorizedModelAccessError

    If API key is invalid or model access is denied.

  • ModelNotFoundError

    If the model ID doesn't exist in the weights provider.

Examples:

View model information:

>>> from inference_models import AutoModel
>>> AutoModel.describe_model("yolov8n-640")

View with package sizes:

>>> AutoModel.describe_model("rfdetr-base", pull_artefacts_size=True)
# Same as above, but includes a "Size" column showing package sizes

View private model:

>>> AutoModel.describe_model(
...     "my-workspace/my-model/2",
...     api_key="your_api_key"
... )
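
Inspect a model before loading it (describe_model() lists the available packages, which can guide the backend and quantization arguments to from_pretrained()):

>>> AutoModel.describe_model("rfdetr-base")
>>> model = AutoModel.from_pretrained("rfdetr-base", backend="torch")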
See Also
  • AutoModel.describe_model_package(): View detailed info for a specific package
  • AutoModel.describe_compute_environment(): Check your runtime environment
  • AutoModel.from_pretrained(): Load a model after inspecting it

describe_model_package classmethod

describe_model_package(model_id, package_id, weights_provider='roboflow', api_key=None, pull_artefacts_size=True, weights_provider_extra_query_params=None, weights_provider_extra_headers=None)

Display detailed information about a specific model package.

Shows comprehensive details for a single model package, including:

  • Backend type (PyTorch, ONNX, TensorRT, etc.)
  • Quantization level (FP32, FP16, INT8, etc.)
  • Batch size configuration (fixed or dynamic)
  • Required dependencies and environment
  • Package artifacts (model files, configs, etc.)
  • Total package size (optional)
  • Hardware requirements (CUDA version, TensorRT version, etc.)

This is useful for:

  • Understanding package requirements before loading
  • Debugging compatibility issues
  • Checking package size before download
  • Verifying package contents

Parameters:

  • model_id
    (str) –

    Model identifier. Can be:

    - Pre-trained model ID (e.g., "yolov8n-640", "rfdetr-base")
    - Custom Roboflow model (e.g., "my-project/2")

  • package_id
    (str) –

    Specific package identifier to inspect. Get this from AutoModel.describe_model() output.

  • weights_provider
    (str, default: 'roboflow' ) –

    Source for model metadata. Options:

    - "roboflow" (default): Query the Roboflow platform
    - Custom provider name (if registered)

  • api_key
    (Optional[str], default: None ) –

    Roboflow API key for accessing private models. If not provided, uses the ROBOFLOW_API_KEY environment variable. Not required for public pre-trained models.

  • pull_artefacts_size
    (bool, default: True ) –

    Whether to calculate and display the size of each artifact in the package. This requires making network requests to check file sizes, so it's slower. Default: True.

  • weights_provider_extra_query_params
    (Optional[List[Tuple[str, str]]], default: None ) –

    Extra query parameters to pass to the weights provider. Advanced usage only.

  • weights_provider_extra_headers
    (Optional[Dict[str, str]], default: None ) –

    Extra headers to pass to the weights provider. Advanced usage only.

Returns:

  • None

    None. Prints a formatted table to the console showing package details.

Raises:

  • UnauthorizedModelAccessError

    If API key is invalid or model access is denied.

  • ModelNotFoundError

    If the model ID doesn't exist in the weights provider.

  • NoModelPackagesAvailableError

    If the specified package_id doesn't exist for this model.

Examples:

View package details:

>>> from inference_models import AutoModel
>>> # First, see available packages
>>> AutoModel.describe_model("yolov8n-640")
>>> # Then inspect a specific package
>>> AutoModel.describe_model_package("yolov8n-640", "pkg-trt-fp16-1-32")

View without artifact sizes (faster):

>>> AutoModel.describe_model_package(
...     "rfdetr-base",
...     "pkg-torch-fp32",
...     pull_artefacts_size=False
... )
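
After inspecting a package, you can pin it when loading via the model_package_id parameter of from_pretrained() (the package ID below is illustrative):

>>> model = AutoModel.from_pretrained(
...     "rfdetr-base",
...     model_package_id="pkg-torch-fp32"
... )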
See Also
  • AutoModel.describe_model(): View all available packages for a model
  • AutoModel.describe_compute_environment(): Check your runtime environment
  • AutoModel.from_pretrained(): Load a model with a specific package

describe_compute_environment classmethod

describe_compute_environment()

Inspect and display the current runtime environment and available backends.

Performs a comprehensive scan of your system to detect:

  • Hardware: GPU availability, GPU models, compute capability
  • CUDA: Driver version, CUDA toolkit version
  • TensorRT: TensorRT version and availability
  • PyTorch: PyTorch and torchvision versions
  • ONNX Runtime: Version and available execution providers
  • Other backends: Hugging Face Transformers, Ultralytics, MediaPipe
  • Platform: OS version, Jetson type (if applicable), L4T version

This is useful for:

  • Debugging model loading issues
  • Verifying backend installations
  • Checking hardware compatibility
  • Understanding which model packages will work in your environment
  • Troubleshooting performance issues

Returns:

  • None

    None. Prints a formatted table to the console showing all detected environment information.

Examples:

Check your environment:

>>> from inference_models import AutoModel
>>> AutoModel.describe_compute_environment()
# Displays output like:
                            Compute environment details
Detected GPUs:                      N/A
Detected GPUs CUDA CC:              N/A
NVIDIA driver:                      N/A
CUDA version:                       N/A
TRT version:                        N/A
TRT Python package available:       False
OS version:                         macos-26.2-arm64-arm-64bit
torch version:                      2.6.0
torchvision version:                0.21.0
ONNX runtime version:               1.21.0
Detected ONNX execution providers:  CoreMLExecutionProvider, AzureExecutionProvider, CPUExecutionProvider
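
Act on the report (a sketch: on a CPU-only machine like the one above, you might request the ONNX backend explicitly using the documented from_pretrained() parameters):

>>> model = AutoModel.from_pretrained(
...     "yolov8n-640",
...     backend="onnx",
...     device="cpu"
... )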
See Also
  • AutoModel.describe_model(): View model metadata and requirements
  • AutoModel.from_pretrained(): Load a model (uses this environment info)