📦 Installation Guide

This guide covers all installation options for the inference-models package.

🧩 Composable Extras

The inference-models package uses composable extras to give you fine-grained control over dependencies. Instead of installing everything at once, you can mix and match components based on your needs:

  • Backend extras (torch-cu128, onnx-cpu, trt10) - Choose your inference runtime
  • Model extras (mediapipe) - Add support for specific model families

This modular approach keeps installations lightweight and avoids dependency conflicts. For example, you can combine torch-cu128 (PyTorch with CUDA 12.8), onnx-cu12 (ONNX Runtime for CUDA 12.x), and trt10 (TensorRT 10) in a single installation, as shown below.
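For instance, the following single command installs all three backends (the same combination appears in the GPU installation section below):

uv pip install "inference-models[torch-cu128,onnx-cu12,trt10]"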

✅ Prerequisites

  • Python 3.10 - 3.12
  • pip or uv package manager
  • For GPU support: CUDA-compatible GPU with appropriate drivers
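You can quickly confirm these prerequisites from a shell (the nvidia-smi check only applies to GPU setups):

python --version   # should report 3.10 - 3.12
nvidia-smi         # GPU setups: confirms the NVIDIA driver is installed and the GPU is visible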

We recommend using uv for faster and more reliable installations:

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

Learn more about uv at docs.astral.sh/uv.

📋 What Gets Installed

Base Installation

The base inference-models package includes:

  • PyTorch (CPU) - Deep learning framework
  • Hugging Face Transformers - Transformer models support
  • OpenCV - Computer vision utilities
  • Supervision - Vision utilities and annotations

For information about which extras are required for specific model architectures, see the Supported Models documentation.

Optional Extras

Install additional backends and specialized models using extras:

Extra            What It Provides              When to Use
---------------  ----------------------------  --------------------------------------------------

Backend Extras
torch-cpu        PyTorch CPU-only              CPU-only environments, development
torch-cu118      PyTorch + CUDA 11.8           NVIDIA GPUs with CUDA 11.8 (legacy)
torch-cu124      PyTorch + CUDA 12.4           NVIDIA GPUs with CUDA 12.4
torch-cu126      PyTorch + CUDA 12.6           NVIDIA GPUs with CUDA 12.6
torch-cu128      PyTorch + CUDA 12.8           NVIDIA GPUs with CUDA 12.8
torch-jp6-cu126  PyTorch for Jetson JetPack 6  NVIDIA Jetson devices (see Hardware Compatibility)
onnx-cpu         ONNX Runtime CPU              CPU inference, Roboflow models
onnx-cu118       ONNX Runtime + CUDA 11.8      GPU inference with CUDA 11.8
onnx-cu12        ONNX Runtime + CUDA 12.x      GPU inference with CUDA 12.x
onnx-jp6-cu126   ONNX Runtime for Jetson       NVIDIA Jetson devices (see Hardware Compatibility)
trt10            TensorRT 10                   Maximum GPU performance, production

Model Extras
mediapipe        MediaPipe models              Face detection, pose estimation

💻 Basic Installation

CPU Installation

For CPU-only environments:

# Using uv (recommended)
uv pip install inference-models

# Using pip
pip install inference-models

This installs the base package with PyTorch CPU support.
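To confirm the base dependencies are importable, a quick check (assuming the standard import names cv2 for OpenCV and supervision for Supervision):

python -c "import torch, transformers, cv2, supervision; print('base dependencies OK')"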

CPU with ONNX Backend

For running models trained on the Roboflow platform (the recommended backend for CPU inference):

# Using uv
uv pip install "inference-models[onnx-cpu]"

# Using pip
pip install "inference-models[onnx-cpu]"

🎮 GPU Installation

TensorRT Version Compatibility

TensorRT engines are version-sensitive: an engine compiled with one TensorRT version may not load under a different runtime version.

  • The Roboflow platform provides TensorRT packages compiled with TensorRT 10.12.0.36 and maintains forward compatibility within the 10.x series
  • Custom compiled engines are not guaranteed to be forward compatible - match the exact TensorRT version used during compilation
  • Best practice: pin your TensorRT runtime to the version your engines were compiled with

When installing the trt10 extra, we recommend pinning to tensorrt==10.12.0.36 for compatibility with Roboflow-provided engines.
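After installing, you can verify that the TensorRT runtime matches the pinned release:

python -c "import tensorrt; print(tensorrt.__version__)"   # expect 10.12.0.36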

CUDA 12.8

# Using uv (recommended)
uv pip install "inference-models[torch-cu128,onnx-cu12,trt10]" "tensorrt==10.12.0.36"

# Using pip
pip install "inference-models[torch-cu128,onnx-cu12,trt10]" "tensorrt==10.12.0.36"

CUDA 12.6

uv pip install "inference-models[torch-cu126,onnx-cu12,trt10]" "tensorrt==10.12.0.36"

CUDA 12.4

uv pip install "inference-models[torch-cu124,onnx-cu12,trt10]" "tensorrt==10.12.0.36"

CUDA 11.8 (Legacy)

uv pip install "inference-models[torch-cu118,onnx-cu118,trt10]" "tensorrt==10.12.0.36"

🤖 Jetson Installation

For NVIDIA Jetson devices, see the Hardware Compatibility guide for detailed installation instructions and platform-specific requirements.

Jetson with JetPack 6 (CUDA 12.6)

uv pip install "inference-models[torch-jp6-cu126,onnx-jp6-cu126]"

Jetson TensorRT

Jetson installations should use the pre-compiled TensorRT package shipped with JetPack. Do not install the trt10 extra on Jetson devices.
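To confirm which TensorRT version JetPack provides (a sketch, assuming a stock JetPack install where TensorRT ships as Debian packages):

dpkg -l | grep -i nvinfer                                  # lists the JetPack-provided TensorRT packages
python -c "import tensorrt; print(tensorrt.__version__)"   # runtime version visible to Python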

🔧 Additional Features

MediaPipe Models

Enables MediaPipe-based models including Face Detection:

uv pip install "inference-models[mediapipe]"

SAM2 Real-Time

SAM2 Real-Time requires manual installation from GitHub:

# First install inference-models with any CUDA backend
pip install "inference-models[torch-cu128]"  # or torch-cu126, torch-cu124, etc.

# Then install SAM2 Real-Time
pip install git+https://github.com/Gy920/segment-anything-2-real-time.git

PyPI Restriction

PyPI does not allow published packages to declare direct Git dependencies, so SAM2 Real-Time cannot be bundled as an extra and must be installed separately.

🔗 Combining Extras

You can combine multiple extras in a single installation:

# GPU with multiple backends and additional models
uv pip install "inference-models[torch-cu128,onnx-cu12,trt10,mediapipe]" "tensorrt==10.12.0.36"

Conflicting Extras

Some extras cannot be installed together:

  • Only one torch-* extra at a time
  • Only one onnx-* extra at a time

When you install with uv, conflicting combinations are rejected at dependency-resolution time, so mistakes are caught up front.
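For example:

# OK: one torch backend, one onnx backend, plus TensorRT
uv pip install "inference-models[torch-cu128,onnx-cu12,trt10]"

# Not OK: two torch backends in one install - dependency resolution will fail
# uv pip install "inference-models[torch-cu124,torch-cu128]"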

🔒 Reproducible Installations

For production deployments requiring strict dependency control, use the uv.lock file:

# Clone the repository
git clone https://github.com/roboflow/inference.git
cd inference/inference_models

# Install from lock file
uv sync --frozen
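If you need a pip-compatible pinned list (for example, in a Docker build stage that does not use uv), the lock file can be exported; a sketch, assuming a recent uv release that supports uv export:

# Export the lock file to a pinned requirements.txt, then install with pip
uv export --frozen -o requirements.txt
pip install -r requirements.txt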

See the official Docker builds for examples.

✅ Verifying Installation

Test your installation:

from inference_models import AutoModel

# This will show available backends
AutoModel.describe_compute_environment()

# Try loading a model
model = AutoModel.from_pretrained("rfdetr-base")
print("Installation successful!")

🔧 Troubleshooting

Missing Dependencies Error

If you see an error about missing dependencies when loading a model:

  1. Check which backend the model requires (see the Supported Models documentation)
  2. Install the appropriate extra (e.g., onnx-cpu, trt10)

CUDA Version Mismatch

Rule of thumb: match the major CUDA version between your system and the installed extras. Do not install packages built for a newer CUDA version than the one on your system; they may require symbols from CUDA *.so libraries that older installations do not provide.

Check your CUDA version:

# Check CUDA compiler version (most reliable)
nvcc --version

# Check where CUDA is installed
ls -la /usr/local/cuda

nvidia-smi Can Be Misleading

nvidia-smi shows the driver version and maximum supported CUDA version, not the actual CUDA toolkit version installed. Always verify with nvcc --version or check the /usr/local/cuda symlink.
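A quick side-by-side check of the two numbers:

nvcc --version | grep release    # the toolkit version your packages must match
nvidia-smi | head -n 4           # header shows the driver's maximum supported CUDA version, not the toolkit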

Install matching extras:

# For CUDA 12.x
uv pip install "inference-models[torch-cu128,onnx-cu12]"

# For CUDA 11.8
uv pip install "inference-models[torch-cu118,onnx-cu118]"

🚀 Next Steps