📦 Installation Guide

This guide covers all installation options for the inference-models package.

🧩 Composable Extras

The inference-models package uses composable extras to give you fine-grained control over dependencies. Instead of installing everything at once, you can mix and match components based on your needs:

  • Backend extras (torch-cu128, onnx-cpu, trt10) - Choose your inference runtime
  • Model extras (mediapipe) - Add support for specific model families

This modular approach keeps installations lightweight and avoids dependency conflicts. For example, you can combine torch-cu128 (PyTorch with CUDA 12.8) + onnx-cu12 (ONNX Runtime) + trt10 (TensorRT) in a single installation for maximum flexibility.

Backend-specific extras and PyPI package installation

The granular extras defined in pyproject.toml use a dependency control mechanism that works seamlessly when building inference-models locally with uv. Packages published to PyPI, however, are standard Python wheels, so the additional package indexes that some dependencies require must be configured manually. For example, installing torch with CUDA 12.8 support requires the dedicated index https://download.pytorch.org/whl/cu128. For this reason, when installing from PyPI we recommend first installing dependencies such as torch, onnxruntime, or tensorrt at the specific versions you need. The inference-models package only defines loose constraints for these dependencies (e.g. torch>=2.0.0,<3.0.0), giving you full control over your build. Detailed instructions are provided in the sections below; you can also inspect pyproject.toml for the exact dependency requirements and the additional indexes that provide them.
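
As a concrete sketch of that workflow (the CUDA version, index URL, and pins here are just one example; pick the ones matching your environment, as detailed in the GPU Installation section below):

# Example only: pre-install pinned backends from their dedicated indexes first
uv pip install --index-url https://download.pytorch.org/whl/cu128 torch torchvision
uv pip install "tensorrt==10.12.0.36"

# Then install inference-models; its loose constraints accept the versions above
uv pip install "inference-models[torch-cu128,onnx-cu12,trt10]"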

✅ Prerequisites

  • Python 3.10 - 3.12
  • pip or uv package manager
  • For GPU support: CUDA-compatible GPU with appropriate drivers

We recommend using uv for faster and more reliable installations:

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

Learn more about uv at docs.astral.sh/uv.
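
After installation, you can confirm uv is on your PATH:

# Print the installed uv version
uv --version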

📋 What Gets Installed

Base Installation

The base inference-models package includes:

  • PyTorch (CPU) - Deep learning framework
  • Hugging Face Transformers - Transformer models support
  • OpenCV - Computer vision utilities
  • Supervision - Vision utilities and annotations

For information about which extras are required for specific model architectures, see the Supported Models documentation.
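
If you want a quick sanity check that the base dependencies listed above are importable, a minimal sketch (the import names are the standard ones for these libraries):

import cv2
import supervision
import torch
import transformers

# Print the versions of the base dependencies
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("opencv:", cv2.__version__)
print("supervision:", supervision.__version__)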

Optional Extras

Install additional backends and specialized models using extras:

Backend Extras

Extra            What It Provides              When to Use
torch-cpu        PyTorch CPU-only              CPU-only environments, development
torch-cu118      PyTorch + CUDA 11.8           NVIDIA GPUs with CUDA 11.8 (legacy)
torch-cu124      PyTorch + CUDA 12.4           NVIDIA GPUs with CUDA 12.4
torch-cu126      PyTorch + CUDA 12.6           NVIDIA GPUs with CUDA 12.6
torch-cu128      PyTorch + CUDA 12.8           NVIDIA GPUs with CUDA 12.8
torch-cu130      PyTorch + CUDA 13.0           NVIDIA GPUs with CUDA 13.0
torch-jp6-cu126  PyTorch for Jetson JetPack 6  NVIDIA Jetson devices (see Hardware Compatibility)
onnx-cpu         ONNX Runtime CPU              CPU inference, Roboflow models
onnx-cu118       ONNX Runtime + CUDA 11.8      GPU inference with CUDA 11.8
onnx-cu12        ONNX Runtime + CUDA 12.x      GPU inference with CUDA 12.x
onnx-jp6-cu126   ONNX Runtime for Jetson       NVIDIA Jetson devices (see Hardware Compatibility)
trt10            TensorRT 10                   Maximum GPU performance, production

Model Extras

Extra            What It Provides              When to Use
mediapipe        MediaPipe models              Face detection, pose estimation

💻 Basic Installation

CPU Installation

For CPU-only environments:

# Using uv (recommended)
uv pip install inference-models

# Using pip
pip install inference-models

This installs the base package with CPU-only PyTorch support.
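
To confirm you got the CPU build of PyTorch, a quick check (on a CPU-only install this should print False):

import torch

# CPU-only builds report no CUDA support
print(torch.cuda.is_available())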

CPU with ONNX Backend

For running models trained on the Roboflow platform (recommended for CPU inference):

# Using uv
uv pip install "inference-models[onnx-cpu]"

# Using pip
pip install "inference-models[onnx-cpu]"

🎮 GPU Installation

TensorRT Version Compatibility

TensorRT engines are sensitive to version compatibility. A TensorRT engine compiled with a specific TensorRT version may not work with a different runtime version.

  • Roboflow platform provides TensorRT packages compiled with TensorRT 10.12.0.36 and maintains forward compatibility within the 10.x series
  • Custom compiled engines are not guaranteed to be forward compatible - match the exact TensorRT version used during compilation
  • Best practice: pin the installed TensorRT version to the version used to compile the engines you plan to run, and keep it consistent with the rest of your environment

When installing the trt10 extra, we recommend pinning to tensorrt==10.12.0.36 for compatibility with Roboflow-provided engines.
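
After installing, you can confirm which TensorRT version is actually present (useful when matching Roboflow-provided engines):

# Print the installed TensorRT version
python -c "import tensorrt; print(tensorrt.__version__)"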

CUDA 13.0

# Using uv (recommended)
uv pip install --index-url https://download.pytorch.org/whl/cu130 torch torchvision
uv pip install "tensorrt==10.12.0.36"
uv pip install "inference-models[torch-cu130,onnx-cu12,trt10]"

# Using pip
pip install --index-url https://download.pytorch.org/whl/cu130 torch torchvision
pip install "tensorrt==10.12.0.36"
pip install "inference-models[torch-cu130,onnx-cu12,trt10]"

CUDA 12.8

# Using uv (recommended)
uv pip install --index-url https://download.pytorch.org/whl/cu128 torch torchvision
uv pip install "tensorrt==10.12.0.36"
uv pip install "inference-models[torch-cu128,onnx-cu12,trt10]"

# Using pip
pip install --index-url https://download.pytorch.org/whl/cu128 torch torchvision
pip install "tensorrt==10.12.0.36"
pip install "inference-models[torch-cu128,onnx-cu12,trt10]"

CUDA 12.6

# Using uv (recommended)
uv pip install --index-url https://download.pytorch.org/whl/cu126 torch torchvision
uv pip install "tensorrt==10.12.0.36"
uv pip install "inference-models[torch-cu126,onnx-cu12,trt10]"

# Using pip
pip install --index-url https://download.pytorch.org/whl/cu126 torch torchvision
pip install "tensorrt==10.12.0.36"
pip install "inference-models[torch-cu126,onnx-cu12,trt10]"

CUDA 12.4

# Using uv (recommended)
uv pip install --index-url https://download.pytorch.org/whl/cu124 torch torchvision
uv pip install "tensorrt==10.12.0.36"
uv pip install "inference-models[torch-cu124,onnx-cu12,trt10]"

# Using pip
pip install --index-url https://download.pytorch.org/whl/cu124 torch torchvision
pip install "tensorrt==10.12.0.36"
pip install "inference-models[torch-cu124,onnx-cu12,trt10]"

🤖 Jetson Installation

For NVIDIA Jetson devices, see the Hardware Compatibility guide for detailed installation instructions and platform-specific requirements.

Jetson with JetPack 6 (CUDA 12.6)

# Using uv (recommended)
uv pip install --index-url https://pypi.jetson-ai-lab.io/jp6/cu126/+simple torch torchvision onnxruntime-gpu
uv pip install "inference-models[torch-jp6-cu126,onnx-jp6-cu126]"

# Using pip
pip install --index-url https://pypi.jetson-ai-lab.io/jp6/cu126/+simple torch torchvision onnxruntime-gpu
pip install "inference-models[torch-jp6-cu126,onnx-jp6-cu126]"

Jetson TensorRT

Jetson installations should use the pre-compiled TensorRT package shipped with JetPack. Do not install the trt10 extra on Jetson devices.

🔧 Additional Features

MediaPipe Models

Enables MediaPipe-based models including Face Detection:

uv pip install "inference-models[mediapipe]"

SAM2 Real-Time

SAM2 Real-Time requires manual installation from GitHub:

# First install inference-models with any CUDA backend
pip install "inference-models[torch-cu128]"  # or torch-cu126, torch-cu124, etc.

# Then install SAM2 Real-Time
pip install git+https://github.com/Gy920/segment-anything-2-real-time.git

PyPI Restriction

Due to PyPI restrictions on Git dependencies, SAM2 Real-Time must be installed separately.

🔗 Combining Extras

You can combine multiple extras in a single installation:

# GPU with multiple backends and additional models
uv pip install "inference-models[torch-cu128,onnx-cu12,trt10,mediapipe]" "tensorrt==10.12.0.36"

Conflicting Extras

Some extras cannot be installed together:

  • Only one torch-* extra at a time
  • Only one onnx-* extra at a time

The library will prevent conflicting installations when using uv.

🔒 Reproducible Installations

For production deployments requiring strict dependency control, use the uv.lock file:

# Clone the repository
git clone https://github.com/roboflow/inference.git
cd inference/inference_models

# Install from lock file
uv sync --frozen

See the official Docker builds for examples.

✅ Verifying Installation

Test your installation:

from inference_models import AutoModel

# This will show available backends
AutoModel.describe_compute_environment()

# Try loading a model
model = AutoModel.from_pretrained("rfdetr-base")
print("Installation successful!")

🔧 Troubleshooting

Missing Dependencies Error

If you see an error about missing dependencies when loading a model:

  1. Check which backend the model requires
  2. Install the appropriate extra (e.g., onnx-cpu, trt10)

CUDA Version Mismatch

Rule of thumb: Match the major CUDA version between your system and the installed extras. Do not install packages built for a newer CUDA version than what's installed on your system, as they may require CUDA symbols from *.so libraries that aren't available in older installations.

Check your CUDA version:

# Check CUDA compiler version (most reliable)
nvcc --version

# Check where CUDA is installed
ls -la /usr/local/cuda

nvidia-smi Can Be Misleading

nvidia-smi shows the driver version and maximum supported CUDA version, not the actual CUDA toolkit version installed. Always verify with nvcc --version or check the /usr/local/cuda symlink.

Install matching extras:

# For CUDA 12.x
uv pip install "inference-models[torch-cu128,onnx-cu12]"

# For CUDA 11.8
uv pip install "inference-models[torch-cu118,onnx-cu118]"

🚀 Next Steps