Models Overview

The inference-models library supports a wide range of computer vision models across multiple tasks and backends.

Supported Tasks

  • Object Detection: Detect and localize objects in images
  • Instance Segmentation: Detect objects with pixel-level masks
  • Semantic Segmentation: Classify every pixel in an image
  • Classification: Classify entire images or image regions
  • Embeddings: Generate vector representations for images and text
  • OCR & Document Parsing: Extract text and structure from documents
  • Interactive Segmentation: Segment objects from user prompts or automatically
  • Vision-Language Models: Multi-modal understanding and generation
  • Depth Estimation: Predict depth maps from images
  • Specialized: Gaze detection, face detection, and more

Model Catalog

Legend: ✅ Available | ❌ Not available | 🔑 Requires API key | 📤 Upload only

Object Detection

Model | Backends | License | Commercial License in RF Plan | Pre-trained Weights | Trainable at RF
RF-DETR | torch, onnx, trt | Apache 2.0 | N/A
YOLOv5 | onnx | AGPL-3.0 | 📤
YOLOv8 | onnx, torch-script, trt | AGPL-3.0
YOLOv9 | onnx, torch-script, trt | GPL-3.0 | 📤
YOLOv10 | onnx, trt | AGPL-3.0 | 📤
YOLOv11 | onnx, torch-script, trt | AGPL-3.0
YOLOv12 | onnx, torch-script, trt | AGPL-3.0
YOLO-NAS | onnx, trt | Apache 2.0 | N/A
Grounding DINO | torch | Apache 2.0 | N/A 🔑
OWLv2 | hugging-face | Apache 2.0 | N/A 🔑
Roboflow Instant | hugging-face | Roboflow
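
When choosing a model, backend and license are usually the first two filters. The sketch below shows one way to express that selection in code. It transcribes the Model, Backends, and License columns of the Object Detection table into plain Python data. `CatalogEntry` and `find_models` are hypothetical names used for illustration only and are not part of the inference-models API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One catalog row: model name, supported backends, license (hypothetical type)."""
    model: str
    backends: frozenset
    license: str


# Transcribed from the Object Detection table (first three columns only).
OBJECT_DETECTION = [
    CatalogEntry("RF-DETR", frozenset({"torch", "onnx", "trt"}), "Apache 2.0"),
    CatalogEntry("YOLOv5", frozenset({"onnx"}), "AGPL-3.0"),
    CatalogEntry("YOLOv8", frozenset({"onnx", "torch-script", "trt"}), "AGPL-3.0"),
    CatalogEntry("YOLOv9", frozenset({"onnx", "torch-script", "trt"}), "GPL-3.0"),
    CatalogEntry("YOLOv10", frozenset({"onnx", "trt"}), "AGPL-3.0"),
    CatalogEntry("YOLOv11", frozenset({"onnx", "torch-script", "trt"}), "AGPL-3.0"),
    CatalogEntry("YOLOv12", frozenset({"onnx", "torch-script", "trt"}), "AGPL-3.0"),
    CatalogEntry("YOLO-NAS", frozenset({"onnx", "trt"}), "Apache 2.0"),
    CatalogEntry("Grounding DINO", frozenset({"torch"}), "Apache 2.0"),
    CatalogEntry("OWLv2", frozenset({"hugging-face"}), "Apache 2.0"),
    CatalogEntry("Roboflow Instant", frozenset({"hugging-face"}), "Roboflow"),
]


def find_models(entries, backend, licenses=None):
    """Return names of models that support `backend`.

    If `licenses` is given, keep only models whose license is in that set.
    Results preserve the order of the catalog.
    """
    return [
        e.model
        for e in entries
        if backend in e.backends and (licenses is None or e.license in licenses)
    ]


if __name__ == "__main__":
    # Every object detection model with a TensorRT backend.
    print(find_models(OBJECT_DETECTION, "trt"))
    # Same, restricted to Apache 2.0 licensed models.
    print(find_models(OBJECT_DETECTION, "trt", licenses={"Apache 2.0"}))
```

The same pattern extends to the other task tables by adding their rows to separate lists. The sketch deliberately leaves out the commercial license, pre-trained weights, and trainability columns.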

Instance Segmentation

Model | Backends | License | Commercial License in RF Plan | Pre-trained Weights | Trainable at RF
RF-DETR Seg | torch | Apache 2.0 | N/A
YOLOv5 Seg | onnx | AGPL-3.0
YOLOv7 Seg | onnx | AGPL-3.0
YOLOv8 Seg | onnx, torch-script, trt | AGPL-3.0
YOLOv11 Seg | onnx, torch-script, trt | AGPL-3.0
YOLACT | onnx | MIT | N/A

Classification

Model | Backends | License | Commercial License in RF Plan | Pre-trained Weights | Trainable at RF
ResNet | torch, onnx, trt | Apache 2.0 | N/A
ViT | torch | Apache 2.0 | N/A
DINOv3 | torch | Meta DINO | N/A
YOLOv8 Cls | onnx, trt | AGPL-3.0

Embeddings

Model | Backends | License | Commercial License in RF Plan | Pre-trained Weights | Trainable at RF
CLIP | torch, onnx | MIT | N/A
Perception Encoder | torch | FAIR Noncommercial

Semantic Segmentation

Model | Backends | License | Commercial License in RF Plan | Pre-trained Weights | Trainable at RF
DeepLabV3+ | torch, onnx, trt | MIT | N/A

OCR & Document Parsing

Model | Backends | License | Commercial License in RF Plan | Pre-trained Weights | Trainable at RF
DocTR | torch | Apache 2.0 | N/A 🔑
EasyOCR | torch | Apache 2.0 | N/A 🔑
TrOCR | hugging-face | MIT | N/A 🔑

Interactive Segmentation

Model | Backends | License | Commercial License in RF Plan | Pre-trained Weights | Trainable at RF
SAM | torch | Apache 2.0 | N/A 🔑
SAM2 | torch | Apache 2.0 | N/A 🔑
SAM2 RT | torch | Apache 2.0 | N/A

Vision-Language Models

Model | Backends | License | Commercial License in RF Plan | Pre-trained Weights | Trainable at RF
Florence-2 | torch | MIT | N/A
PaliGemma | torch | Gemma License | N/A
Qwen2.5-VL | torch | Apache 2.0 | N/A
Qwen3-VL | torch | Apache 2.0 | N/A
Qwen3.5 | torch | Apache 2.0 | N/A
SmolVLM | torch | Apache 2.0 | N/A
Moondream2 | torch | Apache 2.0 | N/A

Depth Estimation

Model | Backends | License | Commercial License in RF Plan | Pre-trained Weights | Trainable at RF
Depth Anything V2 | torch | Apache 2.0 | N/A 🔑
Depth Anything V3 | torch | Apache 2.0 | N/A 🔑

Specialized Models

Model | Backends | License | Commercial License in RF Plan | Pre-trained Weights | Trainable at RF
L2CS | torch | MIT | N/A
MediaPipe Face | mediapipe | Apache 2.0 | N/A

Next Steps