Backends and Installation Options¶
The inference-models package uses a composable extras system: install only the backends and models you need. This page explains all available backends, their use cases, and installation options.
Philosophy¶
Rather than forcing you to install all dependencies upfront, inference-models lets you compose your installation based on:
- Your hardware - CPU vs GPU, CUDA version
- Your models - Which model architectures you'll use
- Your deployment target - Development vs production, edge vs cloud
This approach:
- ✅ Reduces installation size and time
- ✅ Avoids dependency conflicts
- ✅ Gives you full control over your environment
- ✅ Enables reproducible builds
See Understand Core Concepts for more details on this design philosophy.
Available Backends¶
PyTorch (Torch)¶
Default backend for maximum flexibility and development.
- ✅ Pros: Widest model support, easy debugging, dynamic graphs
- ⚠️ Cons: Slower than optimized backends, larger memory footprint
- 🎯 Best for: Development, prototyping, models without ONNX/TRT support
Installation:
Using uv:

# CPU only
uv pip install inference-models
# GPU with CUDA 12.8
uv pip install "inference-models[torch-cu128]"
# GPU with CUDA 12.6
uv pip install "inference-models[torch-cu126]"
# GPU with CUDA 12.4
uv pip install "inference-models[torch-cu124]"
# GPU with CUDA 11.8
uv pip install "inference-models[torch-cu118]"
# Jetson (JetPack 6, CUDA 12.6)
uv pip install "inference-models[torch-jp6-cu126]"

Using pip:

# CPU only
pip install inference-models
# GPU with CUDA 12.8
pip install "inference-models[torch-cu128]"
# GPU with CUDA 12.6
pip install "inference-models[torch-cu126]"
# GPU with CUDA 12.4
pip install "inference-models[torch-cu124]"
# GPU with CUDA 11.8
pip install "inference-models[torch-cu118]"
# Jetson (JetPack 6, CUDA 12.6)
pip install "inference-models[torch-jp6-cu126]"
Supported Models: All PyTorch-based models (YOLOv8, RFDetr, SAM, Florence-2, etc.)
ONNX Runtime¶
Cross-platform compatibility with good performance.
- ✅ Pros: Good CPU/GPU performance, cross-platform, required for Roboflow-trained models
- ⚠️ Cons: Not as fast as TensorRT on GPU, limited to static graphs
- 🎯 Best for: Production CPU deployments, Roboflow-trained models, cross-platform compatibility
Installation:
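The install commands for the ONNX backend were not preserved here. Based on the extra names that appear in the combined-extras example later on this page (`onnx-cpu`, `onnx-cu12`), the installation likely looks like the following sketch:

```shell
# CPU execution provider
uv pip install "inference-models[onnx-cpu]"

# GPU execution provider for CUDA 12
uv pip install "inference-models[onnx-cu12]"
```

Verify the exact extra names against the package's published extras before relying on them.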
Supported Models: YOLO family (v8-v12), YOLO-NAS, Roboflow-trained models
Required for Roboflow Models
Models trained on the Roboflow platform are exported to ONNX format. You must install the ONNX backend to use them.
TensorRT (TRT)¶
Maximum GPU performance for NVIDIA hardware.
- ✅ Pros: Fastest inference on NVIDIA GPUs, optimized kernels, low latency
- ⚠️ Cons: NVIDIA-only, requires exact environment match, longer first load
- 🎯 Best for: Production GPU deployments, real-time applications, maximum throughput
Installation:
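The install command for the TensorRT backend was not preserved here. The combined-extras example later on this page uses the `trt10` extra with the `tensorrt` wheel installed alongside it, so the installation is likely:

```shell
# TensorRT 10 backend; pin tensorrt to the exact version of your target environment
uv pip install "inference-models[trt10]" tensorrt
```

Pinning `tensorrt` to a specific version (see the environment-matching note below) is what makes compiled engines loadable at runtime.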
Supported Models: YOLO family (v8-v12), YOLO-NAS, RFDetr
Environment Matching Required
TensorRT engines are compiled for specific environments. The runtime environment must exactly match:
- TensorRT version
- CUDA version
- GPU architecture (compute capability)
Mismatches will cause loading failures. Install the TensorRT version compatible with your target environment by specifying the exact version: tensorrt==x.y.z.
Hugging Face Transformers¶
Access to transformer-based models from Hugging Face Hub.
- ✅ Pros: Huge model ecosystem, latest research models, easy fine-tuning
- ⚠️ Cons: Larger models, slower than specialized backends
- 🎯 Best for: Vision-language models, transformers, Hugging Face ecosystem
Installation:
Included in base installation - no extra required.
Supported Models: OWLv2, TrOCR, and other Hugging Face models
MediaPipe¶
Optimized for mobile and edge devices.
- ✅ Pros: Highly optimized, mobile-friendly, efficient
- ⚠️ Cons: Limited model selection, specific use cases
- 🎯 Best for: Face detection, mobile deployment, edge devices
Installation:
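The install command for the MediaPipe backend was not preserved here. A `mediapipe` extra is a plausible name, but this is an assumption not confirmed anywhere on this page:

```shell
# Assumed extra name — verify against the package's extras list
uv pip install "inference-models[mediapipe]"
```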
Supported Models: MediaPipe Face Detection
Model-Specific Extras¶
Some models require additional dependencies beyond backends:
SAM (Segment Anything)¶
SAM2 (Segment Anything 2)¶
CLIP¶
DocTR (Document Text Recognition)¶
Grounding DINO¶
YOLO-World¶
CogVLM¶
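The per-model install commands were not preserved here. The extras `sam`, `sam2`, `clip`, and `doctr` appear in the combined-extras example below; the remaining names are assumptions derived from the section headings:

```shell
uv pip install "inference-models[sam]"     # SAM (Segment Anything)
uv pip install "inference-models[sam2]"    # SAM2 (Segment Anything 2)
uv pip install "inference-models[clip]"    # CLIP
uv pip install "inference-models[doctr]"   # DocTR

# The extra names below are assumptions based on the headings above:
uv pip install "inference-models[grounding-dino]"
uv pip install "inference-models[yolo-world]"
uv pip install "inference-models[cogvlm]"
```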
Combining Extras¶
You can combine multiple extras in a single installation:
# GPU setup with multiple backends and models
uv pip install "inference-models[torch-cu128,onnx-cu12,trt10,sam,clip]" tensorrt
# CPU setup with ONNX and specific models
uv pip install "inference-models[onnx-cpu,sam2,doctr]"
Recommended Installations¶
Development (CPU)¶
Includes ONNX for Roboflow models and popular model extras.
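The recommended command was not preserved here. A sketch using extras that appear elsewhere on this page (the exact recommended set is an assumption):

```shell
uv pip install "inference-models[onnx-cpu,sam,clip]"
```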
Development (GPU)¶
Includes PyTorch and ONNX backends with popular models.
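The recommended command was not preserved here. A sketch combining the PyTorch and ONNX GPU extras shown elsewhere on this page (the model extras chosen are an assumption):

```shell
uv pip install "inference-models[torch-cu128,onnx-cu12,sam,clip]"
```

Swap `torch-cu128` for the extra matching your CUDA version (`torch-cu126`, `torch-cu124`, or `torch-cu118`).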
Production (CPU)¶
Minimal installation with ONNX for best CPU performance.
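The recommended command was not preserved here. Given the `onnx-cpu` extra used in the combined example above, a minimal CPU production install is likely:

```shell
uv pip install "inference-models[onnx-cpu]"
```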
Production (GPU - Maximum Performance)¶
All GPU backends for maximum flexibility and performance.
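The recommended command was not preserved here. A sketch based on the combined-extras example above, installing all three GPU backends (pin `tensorrt` to your target environment's version):

```shell
uv pip install "inference-models[torch-cu128,onnx-cu12,trt10]" tensorrt
```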
Edge/Embedded (Jetson)¶
Optimized for NVIDIA Jetson devices with JetPack 6.
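The recommended command was not preserved here; the Jetson extra shown in the PyTorch section above is:

```shell
# JetPack 6, CUDA 12.6
uv pip install "inference-models[torch-jp6-cu126]"
```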
Backend Selection Priority¶
When multiple backends are installed, AutoModel selects backends in this order:
1. TensorRT (if a GPU is available and the model supports it)
2. PyTorch (default, widest compatibility)
3. ONNX (good performance, cross-platform)
4. Hugging Face (for transformer models)
5. MediaPipe (for specific models)
You can override this by passing the backend argument:
from inference_models import AutoModel
# Force ONNX backend
model = AutoModel.from_pretrained("yolov8n-640", backend="onnx")
# Force TensorRT backend
model = AutoModel.from_pretrained("yolov8n-640", backend="trt")
Checking Installed Backends¶
See what backends are available in your environment:
This shows:
- Installed backends
- CUDA version and availability
- TensorRT version
- Available extras
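The package's own command for this check was not preserved here. As a stand-in, you can probe for each backend's underlying package with the standard library; the module names below are the usual PyPI import names, and mapping them to this package's backends is an assumption:

```python
import importlib.util

# Import names commonly associated with each backend (assumed mapping)
backend_modules = {
    "torch": "torch",
    "onnx": "onnxruntime",
    "trt": "tensorrt",
    "hf": "transformers",
    "mediapipe": "mediapipe",
}

for backend, module in backend_modules.items():
    status = "installed" if importlib.util.find_spec(module) else "missing"
    print(f"{backend}: {status}")
```

This does not report CUDA details; for those, check the version attributes of the installed packages themselves (e.g. torch.version.cuda).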
Next Steps¶
- Installation Guide - Detailed installation instructions
- Understand Core Concepts - Design philosophy
- Quick Overview - Get started with your first model
- Supported Models - Browse available models