📦 Installation Guide

This guide covers all installation options for the inference-models package.

🧩 Composable Extras

The inference-models package uses composable extras to give you fine-grained control over dependencies. Instead of installing everything at once, you can mix and match components based on your needs:

  • Backend extras (torch-cu128, onnx-cpu, trt10) - Choose your inference runtime
  • Model extras (mediapipe) - Add support for specific model families

This modular approach keeps installations lightweight and avoids dependency conflicts. For example, you can combine torch-cu128 (PyTorch with CUDA 12.8) + onnx-cu12 (ONNX Runtime) + trt10 (TensorRT) in a single installation for maximum flexibility.

Backend-specific extras and PyPI package installation

The granular extras defined in pyproject.toml use a dependency control mechanism that works seamlessly when building inference-models locally with uv. Packages published to PyPI, however, are standard Python wheels, so the additional package indexes that some dependencies require must be configured manually. For example, installing torch with CUDA 12.8 support requires the dedicated index https://download.pytorch.org/whl/cu128. For this reason, when installing from PyPI we recommend first installing dependencies such as torch, onnxruntime, or tensorrt at the specific versions you need. The inference-models package only defines loose constraints for these dependencies (e.g. torch>=2.0.0,<3.0.0), giving you full control over your build. Detailed instructions are provided in the sections below; you can also inspect pyproject.toml for the exact dependency requirements and the additional indexes that provide them.
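
As a concrete sketch of that workflow (the CUDA version, index URL, and pins here are just one example; pick the ones matching your environment, as detailed in the GPU Installation section below):

# Example only: pre-install pinned backends from their dedicated indexes first
uv pip install --index-url https://download.pytorch.org/whl/cu128 torch torchvision
uv pip install "tensorrt==10.12.0.36"

# Then install inference-models; its loose constraints accept the versions above
uv pip install "inference-models[torch-cu128,onnx-cu12,trt10]"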

✅ Prerequisites

  • Python 3.10 - 3.12
  • pip or uv package manager
  • For GPU support: CUDA-compatible GPU with appropriate drivers

We recommend using uv for faster and more reliable installations:

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

Learn more about uv at docs.astral.sh/uv.
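
After installation, you can confirm uv is on your PATH:

# Print the installed uv version
uv --version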

📋 What Gets Installed

Base Installation

The base inference-models package includes:

  • PyTorch (CPU) - Deep learning framework
  • Hugging Face Transformers - Transformer models support
  • OpenCV - Computer vision utilities
  • Supervision - Vision utilities and annotations

For information about which extras are required for specific model architectures, see the Supported Models documentation.
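
If you want a quick sanity check that the base dependencies listed above are importable, a minimal sketch (the import names are the standard ones for these libraries):

import cv2
import supervision
import torch
import transformers

# Print the versions of the base dependencies
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("opencv:", cv2.__version__)
print("supervision:", supervision.__version__)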

Optional Extras

Install additional backends and specialized models using extras:

Backend Extras

Extra            What It Provides              When to Use
torch-cpu        PyTorch CPU-only              CPU-only environments, development
torch-cu118      PyTorch + CUDA 11.8           NVIDIA GPUs with CUDA 11.8 (legacy)
torch-cu124      PyTorch + CUDA 12.4           NVIDIA GPUs with CUDA 12.4
torch-cu126      PyTorch + CUDA 12.6           NVIDIA GPUs with CUDA 12.6
torch-cu128      PyTorch + CUDA 12.8           NVIDIA GPUs with CUDA 12.8
torch-cu130      PyTorch + CUDA 13.0           NVIDIA GPUs with CUDA 13.0
torch-jp6-cu126  PyTorch for Jetson JetPack 6  NVIDIA Jetson devices (see Hardware Compatibility)
onnx-cpu         ONNX Runtime CPU              CPU inference, Roboflow models
onnx-cu118       ONNX Runtime + CUDA 11.8      GPU inference with CUDA 11.8
onnx-cu12        ONNX Runtime + CUDA 12.x      GPU inference with CUDA 12.x
onnx-jp6-cu126   ONNX Runtime for Jetson       NVIDIA Jetson devices (see Hardware Compatibility)
trt10            TensorRT 10                   Maximum GPU performance, production

Model Extras

Extra            What It Provides              When to Use
mediapipe        MediaPipe models              Face detection, pose estimation

💻 Basic Installation

CPU Installation

For CPU-only environments:

# Using uv (recommended)
uv pip install inference-models

# Using pip
pip install inference-models

This installs the base package with CPU-only PyTorch support.
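
To confirm you got the CPU build of PyTorch, a quick check (on a CPU-only install this should print False):

import torch

# CPU-only builds report no CUDA support
print(torch.cuda.is_available())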

CPU with ONNX Backend

For running models trained on the Roboflow platform (recommended for CPU inference):

# Using uv
uv pip install "inference-models[onnx-cpu]"

# Using pip
pip install "inference-models[onnx-cpu]"

🎮 GPU Installation

TensorRT Version Compatibility

TensorRT engines are sensitive to version compatibility. A TensorRT engine compiled with a specific TensorRT version may not work with a different runtime version.

  • Roboflow platform provides TensorRT packages compiled with TensorRT 10.12.0.36 and maintains forward compatibility within the 10.x series
  • Custom compiled engines are not guaranteed to be forward compatible - match the exact TensorRT version used during compilation
  • Best practice: pin the installed TensorRT version to the version used to compile the engines you plan to run, and keep it consistent with the rest of your environment

When installing the trt10 extra, we recommend pinning to tensorrt==10.12.0.36 for compatibility with Roboflow-provided engines.
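
After installing, you can confirm which TensorRT version is actually present (useful when matching Roboflow-provided engines):

# Print the installed TensorRT version
python -c "import tensorrt; print(tensorrt.__version__)"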

CUDA 13.0

# Using uv (recommended)
uv pip install --index-url https://download.pytorch.org/whl/cu130 torch torchvision
uv pip install "tensorrt==10.12.0.36"
uv pip install "inference-models[torch-cu130,onnx-cu12,trt10]"

# Using pip
pip install --index-url https://download.pytorch.org/whl/cu130 torch torchvision
pip install "tensorrt==10.12.0.36"
pip install "inference-models[torch-cu130,onnx-cu12,trt10]"

CUDA 12.8

# Using uv (recommended)
uv pip install --index-url https://download.pytorch.org/whl/cu128 torch torchvision
uv pip install "tensorrt==10.12.0.36"
uv pip install "inference-models[torch-cu128,onnx-cu12,trt10]"

# Using pip
pip install --index-url https://download.pytorch.org/whl/cu128 torch torchvision
pip install "tensorrt==10.12.0.36"
pip install "inference-models[torch-cu128,onnx-cu12,trt10]"

CUDA 12.6

# Using uv (recommended)
uv pip install --index-url https://download.pytorch.org/whl/cu126 torch torchvision
uv pip install "tensorrt==10.12.0.36"
uv pip install "inference-models[torch-cu126,onnx-cu12,trt10]"

# Using pip
pip install --index-url https://download.pytorch.org/whl/cu126 torch torchvision
pip install "tensorrt==10.12.0.36"
pip install "inference-models[torch-cu126,onnx-cu12,trt10]"

CUDA 12.4

# Using uv (recommended)
uv pip install --index-url https://download.pytorch.org/whl/cu124 torch torchvision
uv pip install "tensorrt==10.12.0.36"
uv pip install "inference-models[torch-cu124,onnx-cu12,trt10]"

# Using pip
pip install --index-url https://download.pytorch.org/whl/cu124 torch torchvision
pip install "tensorrt==10.12.0.36"
pip install "inference-models[torch-cu124,onnx-cu12,trt10]"

🤖 Jetson Installation

For NVIDIA Jetson devices, see the Hardware Compatibility guide for detailed installation instructions and platform-specific requirements.

Jetson with JetPack 6 (CUDA 12.6)

# Using uv (recommended)
uv pip install --index-url https://pypi.jetson-ai-lab.io/jp6/cu126/+simple torch torchvision onnxruntime-gpu
uv pip install "inference-models[torch-jp6-cu126,onnx-jp6-cu126]"

# Using pip
pip install --index-url https://pypi.jetson-ai-lab.io/jp6/cu126/+simple torch torchvision onnxruntime-gpu
pip install "inference-models[torch-jp6-cu126,onnx-jp6-cu126]"

Jetson TensorRT

Jetson installations should use the pre-compiled TensorRT package shipped with JetPack. Do not install the trt10 extra on Jetson devices.

🔧 Additional Features

MediaPipe Models

Enables MediaPipe-based models including Face Detection:

uv pip install "inference-models[mediapipe]"

SAM2 Real-Time

SAM2 Real-Time requires manual installation from GitHub:

# First install inference-models with any CUDA backend
pip install "inference-models[torch-cu128]"  # or torch-cu126, torch-cu124, etc.

# Then install SAM2 Real-Time
pip install git+https://github.com/Gy920/segment-anything-2-real-time.git

PyPI Restriction

Due to PyPI restrictions on Git dependencies, SAM2 Real-Time must be installed separately.

🔗 Combining Extras

You can combine multiple extras in a single installation:

# GPU with multiple backends and additional models
uv pip install "inference-models[torch-cu128,onnx-cu12,trt10,mediapipe]" "tensorrt==10.12.0.36"

Conflicting Extras

Some extras cannot be installed together:

  • Only one torch-* extra at a time
  • Only one onnx-* extra at a time

The library will prevent conflicting installations when using uv.

🔒 Reproducible Installations

For production deployments requiring strict dependency control, use the uv.lock file:

# Clone the repository
git clone https://github.com/roboflow/inference.git
cd inference/inference_models

# Install from lock file
uv sync --frozen

See the official Docker builds for examples.

✅ Verifying Installation

Test your installation:

from inference_models import AutoModel

# This will show available backends
AutoModel.describe_compute_environment()

# Try loading a model
model = AutoModel.from_pretrained("rfdetr-base")
print("Installation successful!")

🔧 Troubleshooting

Missing Dependencies Error

If you see an error about missing dependencies when loading a model:

  1. Check which backend the model requires
  2. Install the appropriate extra (e.g., onnx-cpu, trt10)

CUDA Version Mismatch

Rule of thumb: Match the major CUDA version between your system and the installed extras. Do not install packages built for a newer CUDA version than what's installed on your system, as they may require CUDA symbols from *.so libraries that aren't available in older installations.

Check your CUDA version:

# Check CUDA compiler version (most reliable)
nvcc --version

# Check where CUDA is installed
ls -la /usr/local/cuda

nvidia-smi Can Be Misleading

nvidia-smi shows the driver version and maximum supported CUDA version, not the actual CUDA toolkit version installed. Always verify with nvcc --version or check the /usr/local/cuda symlink.

Install matching extras:

# For CUDA 12.x
uv pip install "inference-models[torch-cu128,onnx-cu12]"

# For CUDA 11.8
uv pip install "inference-models[torch-cu118,onnx-cu118]"

🚀 Next Steps