AutoModel

inference_models.AutoModel

Functions

from_pretrained (classmethod)
from_pretrained(model_id_or_path, weights_provider='roboflow', api_key=None, model_package_id=None, backend=None, batch_size=None, quantization=None, onnx_execution_providers=None, device=DEFAULT_DEVICE, default_onnx_trt_options=True, max_package_loading_attempts=None, verbose=False, model_download_file_lock_acquire_timeout=FILE_LOCK_ACQUIRE_TIMEOUT, allow_untrusted_packages=False, trt_engine_host_code_allowed=True, allow_local_code_packages=True, verify_hash_while_download=True, download_files_without_hash=False, use_auto_resolution_cache=True, auto_resolution_cache=None, allow_direct_local_storage_loading=True, model_access_manager=None, nms_fusion_preferences=None, model_type=None, task_type=None, allow_loading_dependency_models=True, dependency_models_params=None, point_model_directory=None, forwarded_kwargs=None, weights_provider_extra_query_params=None, weights_provider_extra_headers=None, **kwargs)
Load and initialize a computer vision model with automatic backend selection.

This is the primary entry point for loading models in inference-models. It automatically:

- Downloads model weights from the specified provider (default: Roboflow)
- Selects the optimal backend (TensorRT > PyTorch > ONNX > Hugging Face > others)
- Configures the model for your hardware (CPU/GPU)
- Handles caching of artifacts
Parameters:

- model_id_or_path (str) – Model identifier or local path. Can be:
  - Pre-trained model ID (e.g., "yolov8n-640", "rfdetr-base", "resnet50")
  - Custom Roboflow model (e.g., "my-project/2")
  - Local directory path containing model files
  - Local checkpoint file path (e.g., "/path/to/checkpoint.pth")
- weights_provider (str, default: 'roboflow') – Source for model weights. Options:
  - "roboflow" (default): Download from the Roboflow platform
  - "local": Load from the local filesystem
  - Custom provider name (if registered via register_model_provider())
- api_key (Optional[str], default: None) – Roboflow API key for accessing private models. If not provided, uses the ROBOFLOW_API_KEY environment variable. Not required for public pre-trained models.
- model_package_id (Optional[str], default: None) – Specific model package to load (advanced). If not provided, automatically selects the best package based on your environment and requested backend/quantization. Use AutoModel.describe_model() to see available packages.
- backend (Optional[Union[str, BackendType, List[Union[str, BackendType]]]], default: None) – Preferred inference backend(s). Can be:
  - Single backend: "torch", "onnx", "trt" (TensorRT), "hugging-face"
  - List of backends: ["trt", "torch"] (tried in order)
  - BackendType enum value(s)
  - None (default): Automatic selection (TensorRT > PyTorch > ONNX > HF)
- batch_size (Optional[Union[int, Tuple[int, int]]], default: None) – Preferred batch size for inference. Can be:
  - Single integer: Fixed batch size (e.g., 1, 8, 16)
  - Tuple: Range of batch sizes (e.g., (1, 8) for dynamic batching)
  - None (default): Use the model's default batch size
  Note: Only affects models with multiple batch size variants.
- quantization (Optional[Union[str, Quantization, List[Union[str, Quantization]]]], default: None) – Model quantization level(s). Can be:
  - Single value: "fp32", "fp16", "bf16", "int8"
  - List: ["fp16", "fp32"] (tried in order)
  - Quantization enum value(s)
  - None (default): Automatic selection based on device capabilities
- onnx_execution_providers (Optional[List[Union[str, tuple]]], default: None) – ONNX Runtime execution providers (ONNX backend only). Examples:
  - ["CUDAExecutionProvider", "CPUExecutionProvider"]
  - [("TensorrtExecutionProvider", {"trt_fp16_enable": True})]
  If not provided, automatically selects based on available hardware.
- device (Union[device, str], default: DEFAULT_DEVICE) – PyTorch device for model execution. Can be:
  - String: "cpu", "cuda", "cuda:0", "cuda:1", "mps"
  - torch.device object
  Default: "cuda" if available, otherwise "cpu".
- default_onnx_trt_options (bool, default: True) – Whether to use default TensorRT optimization options for ONNX Runtime's TensorRT execution provider. Default: True.
- max_package_loading_attempts (Optional[int], default: None) – Maximum number of model packages to try before failing. Useful when multiple packages are available. Default: Try all matching packages.
- verbose (bool, default: False) – Enable detailed logging during model loading. Useful for debugging package selection and download issues. Default: False.
- model_download_file_lock_acquire_timeout (int, default: FILE_LOCK_ACQUIRE_TIMEOUT) – Timeout in seconds for acquiring file locks during concurrent downloads. Default: 10.
- allow_untrusted_packages (bool, default: False) – Allow loading model packages with custom code that haven't been verified. Security risk – only enable for trusted sources. Default: False.
- trt_engine_host_code_allowed (bool, default: True) – Allow TensorRT engines to execute host code. Required for some TensorRT optimizations. Default: True.
- allow_local_code_packages (bool, default: True) – Allow loading models with custom Python code from local directories. Default: True.
- verify_hash_while_download (bool, default: True) – Verify file integrity using checksums during download. Recommended for production. Default: True.
- download_files_without_hash (bool, default: False) – Allow downloading files that don't have checksums. Security risk – only enable for trusted sources. Default: False.
- use_auto_resolution_cache (bool, default: True) – Enable caching of model resolution results to speed up subsequent loads. Default: True.
- auto_resolution_cache (Optional[AutoResolutionCache], default: None) – Custom cache implementation. If None, uses the default file-based cache. Advanced usage only.
- allow_direct_local_storage_loading (bool, default: True) – Allow loading models directly from local paths without going through the weights provider. Default: True.
- model_access_manager (Optional[ModelAccessManager], default: None) – Custom model access control manager. If None, uses a permissive default. Advanced usage only.
- nms_fusion_preferences (Optional[Union[bool, dict]], default: None) – Non-Maximum Suppression fusion preferences for ONNX models. Can be:
  - True: Enable NMS fusion with default settings
  - False: Disable NMS fusion
  - dict: Custom NMS fusion configuration
  - None (default): Use the model's default settings
- model_type (Optional[str], default: None) – Override model architecture type (advanced). Only needed when loading local models without metadata. Examples: "yolov8", "rfdetr".
- task_type (Optional[str], default: None) – Override task type (advanced). Only needed when loading local models without metadata. Examples: "object-detection", "classification".
- allow_loading_dependency_models (bool, default: True) – Allow loading models that depend on other models (e.g., some VLMs depend on separate vision encoders). Default: True.
- dependency_models_params (Optional[dict], default: None) – Parameters to pass to dependency models. Dict mapping dependency names to parameter dicts. Advanced usage only.
- point_model_directory (Optional[Callable[[str], None]], default: None) – Callback function called with the model directory path after loading. Advanced usage only.
- forwarded_kwargs (Optional[List[str]], default: None) – List of kwargs to forward to dependency models. Advanced usage only.
- weights_provider_extra_query_params (Optional[List[Tuple[str, str]]], default: None) – Extra query parameters to pass to the weights provider. Advanced usage only.
- weights_provider_extra_headers (Optional[Dict[str, str]], default: None) – Extra headers to pass to the weights provider. Advanced usage only.
- **kwargs – Additional model-specific parameters passed to the model's from_pretrained() method. Varies by model type.
Returns:

- AnyModel – Loaded model instance. The specific type depends on the model's task:
  - ObjectDetectionModel: For object detection (YOLO, RF-DETR, etc.)
  - ClassificationModel: For single-label classification
  - MultiLabelClassificationModel: For multi-label classification
  - InstanceSegmentationModel: For instance segmentation
  - KeyPointsDetectionModel: For keypoint detection
  - DepthEstimationModel: For depth estimation
  - StructuredOCRModel: For OCR with structured output
  - TextImageEmbeddingModel: For vision-language embeddings (CLIP, etc.)
  - OpenVocabularyObjectDetectionModel: For open-vocabulary detection
Raises:

- UnauthorizedModelAccessError – If the API key is invalid or model access is denied.
- ModelPackageNotFoundError – If no compatible model package is found for your environment and requested parameters.
- CorruptedModelPackageError – If model files are corrupted or incomplete.
- InvalidParameterError – If provided parameters are invalid.
- DirectLocalStorageAccessError – If local path loading is disabled but a local path was provided.
Examples:
Basic usage with pre-trained model:
>>> from inference_models import AutoModel
>>> model = AutoModel.from_pretrained("yolov8n-640")
>>> predictions = model(image)
Load custom Roboflow model:
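(A sketch; "my-project/2" and the API key are placeholders for your own values.)
>>> model = AutoModel.from_pretrained(
...     "my-project/2",
...     api_key="YOUR_ROBOFLOW_API_KEY"  # or set the ROBOFLOW_API_KEY env var
... )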
Force specific backend and device:
>>> model = AutoModel.from_pretrained(
... "rfdetr-base",
... backend="torch",
... device="cuda:1"
... )
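Try backends in preference order (illustrative, using the documented list form of the backend parameter):
>>> model = AutoModel.from_pretrained(
...     "yolov8n-640",
...     backend=["trt", "torch"]  # tries TensorRT first, then PyTorch
... )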
Load with quantization:
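(A sketch using the documented quantization values; the fallback order is illustrative.)
>>> model = AutoModel.from_pretrained(
...     "yolov8n-640",
...     quantization=["fp16", "fp32"]  # prefer FP16, fall back to FP32
... )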
Load from local checkpoint:
>>> model = AutoModel.from_pretrained(
... "/path/to/checkpoint.pth",
... model_type="rfdetr-base",
... labels=["cat", "dog"]
... )
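Fall back when no compatible package exists (a sketch; a broad except is shown because the exception import path is not part of this reference):
>>> try:
...     model = AutoModel.from_pretrained("yolov8n-640", backend="trt")
... except Exception as err:  # e.g., ModelPackageNotFoundError on unsupported hardware
...     print(f"Could not load a TensorRT package: {err}")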
Enable verbose logging:
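(Reusing the pre-trained model ID from the basic example above.)
>>> model = AutoModel.from_pretrained("yolov8n-640", verbose=True)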
See Also

- AutoModel.describe_model(): View model metadata before loading
- AutoModel.describe_model_package(): View specific package details
- AutoModel.describe_compute_environment(): Check available backends
- AutoModel.list_available_models(): List all registered models
describe_model (classmethod)

describe_model(model_id, weights_provider='roboflow', api_key=None, pull_artefacts_size=False, weights_provider_extra_query_params=None, weights_provider_extra_headers=None)
Display comprehensive metadata and available packages for a model.
Shows detailed information about a model without loading it, including:
- Model architecture and variant
- Task type (object detection, classification, etc.)
- Available model packages (different backends, quantizations, batch sizes)
- Package requirements and compatibility
- Model dependencies (if any)
- Package sizes (optional, requires network requests)
This is useful for:
- Exploring available models before loading
- Understanding which backends are available for a model
- Checking model requirements and compatibility
- Debugging model loading issues
- Selecting the right package for your environment
Parameters:

- model_id (str) – Model identifier. Can be:
  - Pre-trained model ID (e.g., "yolov8n-640", "rfdetr-base")
  - Custom Roboflow model (e.g., "my-project/2")
- weights_provider (str, default: 'roboflow') – Source for model metadata. Options:
  - "roboflow" (default): Query the Roboflow platform
  - Custom provider name (if registered)
- api_key (Optional[str], default: None) – Roboflow API key for accessing private models. If not provided, uses the ROBOFLOW_API_KEY environment variable. Not required for public pre-trained models.
- pull_artefacts_size (bool, default: False) – Whether to calculate and display the total size of each model package. This requires making network requests to check file sizes, so it's slower. Default: False.
- weights_provider_extra_query_params (Optional[List[Tuple[str, str]]], default: None) – Extra query parameters to pass to the weights provider. Advanced usage only.
- weights_provider_extra_headers (Optional[Dict[str, str]], default: None) – Extra headers to pass to the weights provider. Advanced usage only.
Returns:

- None – Prints formatted tables to the console showing:
  1. Model overview table with architecture, task type, and dependencies
  2. Available packages table with backend, quantization, and batch size info
Raises:

- UnauthorizedModelAccessError – If the API key is invalid or model access is denied.
- ModelNotFoundError – If the model ID doesn't exist in the weights provider.
Examples:
View model information:
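(The model ID here mirrors the size example below.)
>>> from inference_models import AutoModel
>>> AutoModel.describe_model("rfdetr-base")
# Prints the model overview and available packages tables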
View with package sizes:
>>> AutoModel.describe_model("rfdetr-base", pull_artefacts_size=True)
# Same as above, but includes a "Size" column showing package sizes
View private model:
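(A sketch; the project ID and API key are placeholders.)
>>> AutoModel.describe_model("my-project/2", api_key="YOUR_ROBOFLOW_API_KEY")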
See Also

- AutoModel.describe_model_package(): View detailed info for a specific package
- AutoModel.describe_compute_environment(): Check your runtime environment
- AutoModel.from_pretrained(): Load a model after inspecting it
describe_model_package (classmethod)

describe_model_package(model_id, package_id, weights_provider='roboflow', api_key=None, pull_artefacts_size=True, weights_provider_extra_query_params=None, weights_provider_extra_headers=None)
Display detailed information about a specific model package.
Shows comprehensive details for a single model package, including:
- Backend type (PyTorch, ONNX, TensorRT, etc.)
- Quantization level (FP32, FP16, INT8, etc.)
- Batch size configuration (fixed or dynamic)
- Required dependencies and environment
- Package artifacts (model files, configs, etc.)
- Total package size (optional)
- Hardware requirements (CUDA version, TensorRT version, etc.)
This is useful for:
- Understanding package requirements before loading
- Debugging compatibility issues
- Checking package size before download
- Verifying package contents
Parameters:

- model_id (str) – Model identifier. Can be:
  - Pre-trained model ID (e.g., "yolov8n-640", "rfdetr-base")
  - Custom Roboflow model (e.g., "my-project/2")
- package_id (str) – Specific package identifier to inspect. Get this from AutoModel.describe_model() output.
- weights_provider (str, default: 'roboflow') – Source for model metadata. Options:
  - "roboflow" (default): Query the Roboflow platform
  - Custom provider name (if registered)
- api_key (Optional[str], default: None) – Roboflow API key for accessing private models. If not provided, uses the ROBOFLOW_API_KEY environment variable. Not required for public pre-trained models.
- pull_artefacts_size (bool, default: True) – Whether to calculate and display the size of each artifact in the package. This requires making network requests to check file sizes, so it's slower. Default: True.
- weights_provider_extra_query_params (Optional[List[Tuple[str, str]]], default: None) – Extra query parameters to pass to the weights provider. Advanced usage only.
- weights_provider_extra_headers (Optional[Dict[str, str]], default: None) – Extra headers to pass to the weights provider. Advanced usage only.
Returns:

- None – Prints a formatted table to the console showing package details.
Raises:

- UnauthorizedModelAccessError – If the API key is invalid or model access is denied.
- ModelNotFoundError – If the model ID doesn't exist in the weights provider.
- NoModelPackagesAvailableError – If the specified package_id doesn't exist for this model.
Examples:
View package details:
>>> from inference_models import AutoModel
>>> # First, see available packages
>>> AutoModel.describe_model("yolov8n-640")
>>> # Then inspect a specific package
>>> AutoModel.describe_model_package("yolov8n-640", "pkg-trt-fp16-1-32")
View without artifact sizes (faster):
>>> AutoModel.describe_model_package(
... "rfdetr-base",
... "pkg-torch-fp32",
... pull_artefacts_size=False
... )
See Also

- AutoModel.describe_model(): View all available packages for a model
- AutoModel.describe_compute_environment(): Check your runtime environment
- AutoModel.from_pretrained(): Load a model with a specific package
describe_compute_environment (classmethod)

describe_compute_environment()

Inspect and display the current runtime environment and available backends.
Performs a comprehensive scan of your system to detect:
- Hardware: GPU availability, GPU models, compute capability
- CUDA: Driver version, CUDA toolkit version
- TensorRT: TensorRT version and availability
- PyTorch: PyTorch and torchvision versions
- ONNX Runtime: Version and available execution providers
- Other backends: Hugging Face Transformers, Ultralytics, MediaPipe
- Platform: OS version, Jetson type (if applicable), L4T version
This is useful for:
- Debugging model loading issues
- Verifying backend installations
- Checking hardware compatibility
- Understanding which model packages will work in your environment
- Troubleshooting performance issues
Returns:

- None – Prints a formatted table to the console showing all detected environment information.
Examples:
Check your environment:
>>> from inference_models import AutoModel
>>> AutoModel.describe_compute_environment()
# Displays output like:
Compute environment details
Detected GPUs: N/A
Detected GPUs CUDA CC: N/A
NVIDIA driver: N/A
CUDA version: N/A
TRT version: N/A
TRT Python package available: False
OS version: macos-26.2-arm64-arm-64bit
torch version: 2.6.0
torchvision version: 0.21.0
ONNX runtime version: 1.21.0
Detected ONNX execution providers: CoreMLExecutionProvider, AzureExecutionProvider, CPUExecutionProvider
See Also

- AutoModel.describe_model(): View model metadata and requirements
- AutoModel.from_pretrained(): Load a model (uses this environment info)