
ViT - Classification

Vision Transformer (ViT) applies the transformer architecture directly to a sequence of image patches and performs strongly on image classification tasks.

Overview

ViT for classification treats an image as a sequence of patches and classifies it with a transformer. Key features include:

  • Transformer architecture - Self-attention mechanisms for global context
  • Patch-based processing - Images divided into patches and processed as sequences
  • Scalable design - Performance improves with model size and data
  • Strong transfer learning - Excellent feature representations for fine-tuning
  • Multiple variants - Different sizes and patch configurations available
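To make the patch-based processing concrete, here is a minimal sketch of the patch arithmetic. It assumes non-overlapping square patches and uses the commonly cited 224x224 RGB input with 16x16 patches as an example; the function name `patch_grid` is illustrative and not part of any library.

```python
def patch_grid(height, width, patch_size, channels=3):
    """Return (num_patches, values_per_patch) for a non-overlapping patch grid."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch size")
    # Number of patches along each axis, multiplied together
    num_patches = (height // patch_size) * (width // patch_size)
    # Each patch is flattened into one vector before being embedded
    values_per_patch = patch_size * patch_size * channels
    return num_patches, values_per_patch


num_patches, values_per_patch = patch_grid(224, 224, 16)
print(num_patches, values_per_patch)  # 196 768
```

Halving the patch size quadruples the number of patches, which is why the patch configuration matters for compute cost.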

License

Apache 2.0

Open Source License

Vision Transformer is licensed under Apache 2.0, which permits both commercial and non-commercial use, subject to the license's attribution and notice conditions.

Learn more: Apache 2.0 License

Pre-trained Model IDs

ViT models must be trained on Roboflow or uploaded as custom weights. There are no pre-trained public model IDs available for classification tasks.

Custom model ID format: project-url/version (e.g., my-project-abc123/2)
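A small validation helper can catch malformed IDs before they reach the model loader. The sketch below is hypothetical: `parse_model_id` is not part of inference-models, and it assumes the version is always a positive integer, as in the example above.

```python
def parse_model_id(model_id: str) -> tuple[str, int]:
    """Split 'project-url/version' into (project_url, version)."""
    project, sep, version = model_id.rpartition("/")
    # Assumption: exactly one slash, a non-empty project, and a numeric version
    if not sep or not project or "/" in project:
        raise ValueError(f"expected 'project-url/version', got {model_id!r}")
    if not (version.isascii() and version.isdigit()):
        raise ValueError(f"version must be an integer, got {version!r}")
    return project, int(version)


print(parse_model_id("my-project-abc123/2"))  # ('my-project-abc123', 2)
```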

Supported Backends

Backend       | Extras Required
onnx          | onnx-cpu, onnx-cu12, onnx-cu118, onnx-jp6-cu126
hugging-face  | torch-cpu, torch-cu118, torch-cu124, torch-cu126, torch-cu128, torch-jp6-cu126
trt           | trt10

Roboflow Platform Compatibility

Feature                   | Supported
Training                  | ✅ Train custom models on Roboflow
Upload Weights            | ✅ Upload pre-trained weights (guide)
Serverless API (v2)       | Deploy via hosted API
Workflows                 | ✅ Use in Workflows via Classification block
Edge Deployment (Jetson)  | ✅ Deploy on NVIDIA Jetson devices
Self-Hosting              | ✅ Deploy with inference-models

Installation

Install with one of the following extras depending on your backend:

  • ONNX: onnx-cpu, onnx-cu12, onnx-cu118, onnx-jp6-cu126
  • TensorRT: trt10 (requires CUDA 12.x)
  • Hugging Face (PyTorch): torch-cpu, torch-cu118, torch-cu124, torch-cu126, torch-cu128, torch-jp6-cu126
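As a rough illustration of how an extra is selected, the sketch below builds a pip command from the backend names and extras listed in the Supported Backends table. The PyPI package name `inference-models` is an assumption inferred from the `inference_models` import name, and `pip_install_command` is a hypothetical helper; check the official installation guide for the exact command.

```python
# Extras per backend, copied from the Supported Backends table
EXTRAS = {
    "onnx": ["onnx-cpu", "onnx-cu12", "onnx-cu118", "onnx-jp6-cu126"],
    "hugging-face": ["torch-cpu", "torch-cu118", "torch-cu124",
                     "torch-cu126", "torch-cu128", "torch-jp6-cu126"],
    "trt": ["trt10"],
}


def pip_install_command(backend, extra, package="inference-models"):
    """Return a pip command string for one backend extra.

    The default package name is an assumption, not confirmed by this page.
    """
    if extra not in EXTRAS.get(backend, []):
        raise ValueError(f"{extra!r} is not a listed extra for backend {backend!r}")
    return f'pip install "{package}[{extra}]"'


print(pip_install_command("onnx", "onnx-cpu"))
# pip install "inference-models[onnx-cpu]"
```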

Usage Example

import cv2
from inference_models import AutoModel

# Load your custom model (requires Roboflow API key)
model = AutoModel.from_pretrained(
    "my-project-abc123/2",
    api_key="your_roboflow_api_key"
)
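# Load the image; cv2.imread returns None if the file cannot be read
image = cv2.imread("path/to/image.jpg")
if image is None:
    raise FileNotFoundError("Could not read image at path/to/image.jpg")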

# Run inference
prediction = model(image)

# Get top prediction
top_class_id = prediction.class_id[0].item()
top_class = model.class_names[top_class_id]
confidence = prediction.confidence[0][top_class_id].item()

print(f"Class: {top_class}")
print(f"Confidence: {confidence:.2f}")

# Get all class confidences
for idx, class_name in enumerate(model.class_names):
    conf = prediction.confidence[0][idx].item()
    print(f"{class_name}: {conf:.3f}")
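When a model has many classes, printing every confidence is noisy, and a top-k view is often more useful. The sketch below works on plain Python lists so it runs without a model; `top_k` is a hypothetical helper, and the sample scores and class names are made up for illustration.

```python
def top_k(confidences, class_names, k=3):
    """Return the k (class_name, confidence) pairs with the highest confidence."""
    if len(confidences) != len(class_names):
        raise ValueError("confidences and class_names must have the same length")
    ranked = sorted(zip(class_names, confidences), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up scores for a three-class model
scores = [0.05, 0.80, 0.15]
names = ["cat", "dog", "bird"]
print(top_k(scores, names, k=2))  # [('dog', 0.8), ('bird', 0.15)]
```

With the example above, and assuming `prediction.confidence` is a PyTorch tensor (as the `.item()` calls suggest), the first argument could be `prediction.confidence[0].tolist()` and the second `model.class_names`.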