Work with Model Predictions¶
This guide shows you how to work with predictions from inference-models, including converting to different formats, accessing prediction data, filtering results, and visualizing outputs.
Understanding Prediction Formats¶
inference-models vs Supervision¶
inference-models uses its own prediction dataclasses that are similar but not identical to Supervision's format. This may seem confusing at first, but there are important reasons for this design:
Why separate formats?
- Performance:
inference-modelsuses PyTorch tensors internally for maximum throughput - Minimal metadata: Predictions carry only essential data, reducing memory overhead
- Backend optimization: Different backends (TensorRT, PyTorch, ONNX) can optimize tensor operations
Easy conversion: Use .to_supervision() to convert to Supervision format when you need rich visualization and analysis tools.
from inference_models import AutoModel
import supervision as sv
model = AutoModel.from_pretrained("yolov8n-640")
predictions = model(image)
# inference-models format (lightweight, tensor-based)
result = predictions[0] # Detections
# Convert to Supervision format (rich features, numpy-based)
detections = result.to_supervision() # sv.Detections
When to convert
- Keep
inference-modelsformat for high-throughput pipelines and filtering - Convert to Supervision when you need visualization, tracking, or advanced analysis
Work in Progress
The prediction format is evolving. We're working on making essential properties like class_names and other metadata transferrable to Supervision objects. However, these will likely use references to the model instance rather than duplicating data in each prediction.
For example, model.class_names is the source of truth, rather than each prediction holding a separate copy of the class names list. This reduces memory overhead and ensures consistency.
Prediction Types¶
Different model types return different prediction formats:
Supervision-Compatible Formats¶
These formats can be converted to Supervision using .to_supervision():
| Model Type | inference-models Class |
Supervision Class | Key Attributes |
|---|---|---|---|
| Object Detection | Detections |
sv.Detections |
xyxy, confidence, class_id |
| Instance Segmentation | InstanceDetections |
sv.Detections |
Same as detection + mask |
| Keypoint Detection | KeyPoints |
sv.KeyPoints |
xy, confidence, class_id |
Other Formats¶
These formats have specialized structures:
| Model Type | inference-models Class |
Key Attributes |
|---|---|---|
| Classification | ClassificationPrediction |
class_id, confidence |
| Multi-Label Classification | MultiLabelClassificationPrediction |
class_ids, confidence |
| Semantic Segmentation | SemanticSegmentationResult |
segmentation_map, confidence |
Other supported formats include embeddings (CLIP, SAM), depth estimation, OCR, gaze detection, and vision-language models (VLM/LMM).
Working with inference-models Format¶
Object Detection¶
from inference_models import AutoModel
import cv2
# Load model and run inference
model = AutoModel.from_pretrained("yolov8n-640")
image = cv2.imread("path/to/image.jpg")
predictions = model(image)
# Access first image predictions (batch of 1)
result = predictions[0] # Detections
# Access tensor data (PyTorch tensors for performance)
print(f"Boxes (xyxy): {result.xyxy}") # torch.Tensor shape [N, 4]
print(f"Confidence: {result.confidence}") # torch.Tensor shape [N]
print(f"Class IDs: {result.class_id}") # torch.Tensor shape [N]
# Access model metadata (source of truth)
print(f"All classes: {model.class_names}") # Reference to model's class list
# Iterate over detections
for i in range(len(result.xyxy)):
print(f"Detection {i}:")
print(f" Box: {result.xyxy[i]}")
print(f" Confidence: {result.confidence[i]:.2f}")
print(f" Class ID: {result.class_id[i]}")
print(f" Class name: {model.class_names[result.class_id[i]]}")
Classification¶
from inference_models import AutoModel
model = AutoModel.from_pretrained("resnet50")
predictions = model(image)
# Access prediction
result = predictions[0] # ClassificationPrediction
print(f"Class ID: {result.class_id}") # torch.Tensor
print(f"Confidence: {result.confidence:.2f}") # torch.Tensor
print(f"Class name: {model.class_names[result.class_id]}")
Multi-Label Classification¶
model = AutoModel.from_pretrained("multi-label-model")
predictions = model(image)
result = predictions[0] # MultiLabelClassificationPrediction
print(f"Predicted class IDs: {result.class_ids}") # torch.Tensor
print(f"Confidences: {result.confidence}") # torch.Tensor
for class_id, conf in zip(result.class_ids, result.confidence):
print(f"{model.class_names[class_id]}: {conf:.2f}")
Converting to Supervision Format¶
When you need rich visualization and analysis features, convert to Supervision format using .to_supervision():
Object Detection¶
import supervision as sv
from inference_models import AutoModel
model = AutoModel.from_pretrained("yolov8n-640")
predictions = model(image)
# Convert to Supervision Detections
detections = predictions[0].to_supervision()
# Now you have sv.Detections with numpy arrays
print(f"Number of detections: {len(detections)}")
print(f"Boxes: {detections.xyxy}") # numpy array [[x1, y1, x2, y2], ...]
print(f"Confidence: {detections.confidence}") # numpy array [0.95, 0.87, ...]
print(f"Class IDs: {detections.class_id}") # numpy array [0, 1, 2, ...]
print(f"Class names: {detections.data['class_name']}") # numpy array ['person', 'car', ...]
Format Differences
inference-models: Uses PyTorch tensors (torch.Tensor) for speed, references model metadata- Supervision: Uses NumPy arrays (
np.ndarray) for compatibility, includes metadata in object
The .to_supervision() method handles the conversion automatically, copying necessary metadata from the model instance.
Instance Segmentation¶
model = AutoModel.from_pretrained("yolov8n-seg-640")
predictions = model(image)
# Convert to Supervision Detections (includes masks)
detections = predictions[0].to_supervision()
# Access masks
print(f"Masks shape: {detections.mask.shape}") # (N, H, W)
print(f"Has masks: {detections.mask is not None}")
Filtering Predictions¶
By Confidence Threshold¶
import supervision as sv
from inference_models import AutoModel
# Run inference
model = AutoModel.from_pretrained("yolov8n-640")
predictions = model(image)
# Convert to Supervision format
detections = predictions[0].to_supervision() # sv.Detections
# Filter by confidence threshold
high_confidence = detections[detections.confidence > 0.7]
print(f"High confidence detections: {len(high_confidence)}")
By Bounding Box Area¶
import supervision as sv
from inference_models import AutoModel
# Run inference
model = AutoModel.from_pretrained("yolov8n-640")
predictions = model(image)
# Convert to Supervision format
detections = predictions[0].to_supervision() # sv.Detections
# Calculate areas and filter
areas = detections.box_area
large_objects = detections[areas > 10000] # Filter objects larger than 10000 pixels
print(f"Large objects: {len(large_objects)}")
Visualization¶
Use Supervision's annotators to visualize predictions:
import cv2
import supervision as sv
from inference_models import AutoModel
# Load model and run inference
model = AutoModel.from_pretrained("yolov8n-640")
image = cv2.imread("path/to/image.jpg")
predictions = model(image)
# Convert to Supervision format
detections = predictions[0].to_supervision() # sv.Detections
# Create annotator and visualize
box_annotator = sv.BoundingBoxAnnotator()
annotated_image = box_annotator.annotate(
scene=image.copy(),
detections=detections
)
# Save result
cv2.imwrite("output.jpg", annotated_image)
Next Steps¶
- Supervision Documentation - Learn more about Supervision annotators and utilities
- Choose the Right Backend - Optimize performance for your use case
- Supported Models - Browse available models