SAM2 RT - Video Object Tracking & Segmentation¶
SAM2 RT (Real-Time) is an optimized fork of SAM2 designed for efficient video object tracking and instance segmentation.
SAM2 RT provides:
- Video object tracking - Track and segment objects across video frames
- Stateful tracking - Maintain object identities across frames
- Prompt-based initialization - Start tracking with bounding boxes
- Instance segmentation - Segment objects in single images
- Optimized inference - Faster than standard SAM2 for video use cases
License & Attribution
License: Apache 2.0 Source: Segment Anything 2 Real Time fork by Gy920 Original: Based on Meta's Segment Anything 2
Model IDs¶
SAM2 RT models do not require a Roboflow API key.
| Model Size | Model ID |
|---|---|
| Tiny | Gy920/sam2-1-hiera-tiny |
| Small | Gy920/sam2-1-hiera-small |
| Base+ | Gy920/sam2-1-hiera-base-plus |
| Large | Gy920/sam2-1-hiera-large |
Installation¶
SAM2 RT requires special installation from GitHub:
# First install inference-models with a CUDA backend (GPU required)
pip install "inference-models[torch-cu128]" # or torch-cu126, torch-cu124, etc.
# Then install SAM2 Real-Time
pip install git+https://github.com/Gy920/segment-anything-2-real-time.git
GPU Required
SAM2 RT requires a CUDA-capable GPU. CPU-only installation is not supported.
PyPI Restriction
Due to PyPI restrictions on Git dependencies, SAM2 Real-Time must be installed separately.
Supported Backends¶
| Backend | Extras Required |
|---|---|
torch |
torch-cu118, torch-cu124, torch-cu126, torch-cu128 |
Roboflow Platform Compatibility¶
| Feature | Supported |
|---|---|
| Training | ❌ No custom training |
| Upload Weights | ❌ Not applicable |
| Serverless API (v2) | ❌ Not available |
| Workflows | ❌ Not available |
| Edge Deployment (Jetson) | ❌ Not supported |
| Self-Hosting | ✅ Deploy with inference-models |
Video Object Tracking¶
Track objects across video frames with persistent IDs:
import cv2
import supervision as sv
from inference_models import AutoModel
# Load model (no API key needed)
model = AutoModel.from_pretrained("Gy920/sam2-1-hiera-tiny")
mask_annotator = sv.MaskAnnotator(opacity=0.7, color_lookup=sv.ColorLookup.TRACK)
vid = cv2.VideoCapture("video.mp4")
frame_num = 0
while True:
is_ok, frame = vid.read()
if not is_ok:
break
if frame_num == 0:
# Initialize tracking with bounding boxes on first frame
object_ids, masks, state = model.prompt(
frame,
bboxes=[(477, 337, 560, 529), (633, 570, 843, 804)]
)
frame_num += 1
else:
# Track objects in subsequent frames
object_ids, masks, state = model.track(frame)
# Visualize results
detections = sv.Detections(
xyxy=sv.mask_to_xyxy(masks=masks),
mask=masks,
tracker_id=object_ids
)
annotated_frame = mask_annotator.annotate(scene=frame, detections=detections)
cv2.imshow("Tracking", annotated_frame)
cv2.waitKey(1)
Instance Segmentation¶
Segment objects in a single image:
import cv2
import numpy as np
import supervision as sv
from inference_models import AutoModel
# Load model
model = AutoModel.from_pretrained("sam2-rt-hiera-tiny")
mask_annotator = sv.MaskAnnotator()
box_annotator = sv.BoxAnnotator(color=sv.Color.BLACK)
# Load image
image = cv2.imread("image.jpg")
# Segment with bounding box prompt
masks, object_ids, state = model.prompt(image, bboxes=[(117, 303, 670, 650)])
# Create detections
classes = np.array([0 for _ in masks])
detections = sv.Detections(
xyxy=sv.mask_to_xyxy(masks=masks),
mask=masks,
class_id=classes,
)
# Visualize
annotated_frame = mask_annotator.annotate(scene=image, detections=detections)
annotated_frame = box_annotator.annotate(scene=annotated_frame, detections=detections)
cv2.imshow("Segmentation", annotated_frame)
cv2.waitKey(0)
API Reference¶
prompt(image, bboxes, ...)¶
Initialize tracking or segment an image with bounding box prompts. Returns (masks, object_ids, state).
track(image, ...)¶
Track previously initialized objects in a new frame. Returns (masks, object_ids, state).
Must Call prompt() First
You must call prompt() at least once before calling track(). The model needs initial prompts to know what to track.
Use Cases¶
SAM2 RT is ideal for:
- ✅ Video object tracking - Track multiple objects across video frames
- ✅ Sports analytics - Track players, balls, and equipment in sports videos
- ✅ Surveillance - Monitor and track objects in security footage
- ✅ Traffic analysis - Track vehicles and pedestrians
- ✅ Wildlife monitoring - Track animals in nature videos
- ✅ Interactive video annotation - Quickly annotate video datasets
Key Differences from SAM2¶
| Feature | SAM2 | SAM2 RT |
|---|---|---|
| Primary Use Case | Interactive image segmentation | Video object tracking |
| API Key | Required | Not required |
| Stateful Tracking | ❌ No | ✅ Yes |
| Video Optimization | ❌ No | ✅ Yes |
| Caching Support | ✅ Yes | ❌ No |
| Point Prompts | ✅ Yes | ❌ No (boxes only) |
| Multi-mask Output | ✅ Yes | ❌ No |
Performance Tips¶
- Use smaller models for speed -
hiera-tinyis fastest for tracking - GPU is essential - Video tracking requires GPU for acceptable performance
- Batch processing - Process video frames sequentially, don't skip frames
- State management - Keep the state dictionary if you need to pause/resume tracking