Architecture Overview¶
This guide provides a deep dive into the inference-models architecture for contributors and advanced users.
System Architecture¶
┌─────────────────────────────────────────────────────────────┐
│ User Application │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Public API Layer │
│ - AutoModel.from_pretrained() │
│ - AutoModelPipeline.from_pretrained() │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Auto-Loading & Resolution Layer │
│ - Model metadata retrieval │
│ - Backend selection & negotiation │
│ - Package filtering & ranking │
│ - Dependency resolution │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Weights Provider Layer │
│ - Roboflow API client │
│ - Local filesystem provider │
│ - Custom providers (extensible) │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Download & Cache Layer │
│ - File download with retry │
│ - Hash verification │
│ - Local caching │
│ - File locking for concurrent access │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Model Registry │
│ Maps (architecture, task, backend) → Model Class │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Model Implementation Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ PyTorch │ │ ONNX │ │ TensorRT │ │
│ │ Models │ │ Models │ │ Models │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Hugging Face │ │ MediaPipe │ │
│ │ Models │ │ Models │ │
│ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
Core Components¶
1. AutoModel Class¶
Location: inference_models/models/auto_loaders/core.py
The main entry point for model loading. Key responsibilities:
- Validate user inputs
- Coordinate the loading process
- Manage caching
- Handle errors gracefully
Key Methods:
class AutoModel:
    @classmethod
    def from_pretrained(cls, model_name_or_path, **kwargs) -> AnyModel:
        """Load a model with automatic backend selection"""

    @classmethod
    def describe_model(cls, model_id, **kwargs) -> None:
        """Show model information without loading"""

    @classmethod
    def describe_runtime(cls) -> None:
        """Show runtime environment information"""
2. Model Registry¶
Location: inference_models/models/auto_loaders/models_registry.py
Maps model specifications to implementation classes:
REGISTERED_MODELS: Dict[
    Tuple[ModelArchitecture, TaskType, BackendType],
    Union[LazyClass, RegistryEntry],
] = {
    ("yolov8", "object-detection", BackendType.ONNX): LazyClass(
        module_name="inference_models.models.yolov8.yolov8_object_detection_onnx",
        class_name="YOLOv8ForObjectDetectionOnnx",
    ),
    ("yolov8", "object-detection", BackendType.TRT): LazyClass(
        module_name="inference_models.models.yolov8.yolov8_object_detection_trt",
        class_name="YOLOv8ForObjectDetectionTRT",
    ),
    # ... hundreds more entries
}
LazyClass: Defers imports until needed, reducing startup time.
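A minimal sketch of how such a lazy wrapper can work (the library's actual implementation may differ in detail):

import importlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LazyClass:
    module_name: str
    class_name: str

    def resolve(self) -> type:
        # The import happens only when the class is first needed,
        # not when the registry module itself is imported.
        module = importlib.import_module(self.module_name)
        return getattr(module, self.class_name)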
3. Weights Providers¶
Location: inference_models/weights_providers/
Retrieve model metadata and download URLs.
Roboflow Provider¶
def get_roboflow_model(model_id: str, api_key: Optional[str]) -> ModelMetadata:
    """Fetch model metadata from Roboflow API"""
    # 1. Call API endpoint
    # 2. Parse response
    # 3. Validate model packages
    # 4. Return structured metadata
API Endpoint: https://api.roboflow.com/models/v1/external/weights
Response Structure:
{
    "modelMetadata": {
        "modelId": "yolov8n-640",
        "modelArchitecture": "yolov8",
        "taskType": "object-detection",
        "modelPackages": [
            {
                "type": "external-model-package-v1",
                "packageId": "yolov8n-640-onnx-fp32",
                "packageManifest": {...},
                "packageFiles": [...]
            }
        ]
    }
}
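A hedged parsing sketch against this structure (the dataclass is an illustrative stand-in for the library's ModelMetadata type):

from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ParsedMetadata:  # stand-in, not the library's ModelMetadata
    model_id: str
    model_architecture: str
    task_type: str
    model_packages: List[Dict[str, Any]]


def parse_metadata(payload: Dict[str, Any]) -> ParsedMetadata:
    meta = payload["modelMetadata"]
    return ParsedMetadata(
        model_id=meta["modelId"],
        model_architecture=meta["modelArchitecture"],
        task_type=meta["taskType"],
        model_packages=meta["modelPackages"],
    )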
4. Backend Selection¶
Location: inference_models/models/auto_loaders/core.py
The backend selection algorithm:
def select_best_backend(
    available_packages: List[ModelPackageMetadata],
    installed_backends: Set[BackendType],
    hardware_info: HardwareInfo,
    user_preference: Optional[BackendType],
) -> ModelPackageMetadata:
    """
    1. Filter packages by installed backends
    2. Filter by hardware compatibility
    3. Apply user preference if specified
    4. Rank by default preference order
    5. Return best match or raise error
    """
Default Preference Order:
1. TensorRT (only on compatible NVIDIA GPUs)
2. ONNX
3. PyTorch
4. Hugging Face
5. MediaPipe
6. Others
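Combining the docstring steps with this preference order, a simplified routine could look like the following sketch (the package attributes backend and is_compatible_with, the partial RANK table, and the error type are assumptions for illustration):

RANK = {BackendType.TRT: 0, BackendType.ONNX: 1}  # lower is better; extend per backend


def select_best_backend_sketch(available_packages, installed_backends,
                               hardware_info, user_preference=None):
    candidates = [
        p for p in available_packages
        if p.backend in installed_backends and p.is_compatible_with(hardware_info)
    ]
    if user_preference is not None:
        candidates = [p for p in candidates if p.backend == user_preference]
    if not candidates:
        raise RuntimeError("No model package matches this environment")
    # Unranked backends sort last via the default value.
    return min(candidates, key=lambda p: RANK.get(p.backend, len(RANK)))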
5. Model Base Classes¶
Location: inference_models/models/base/
Abstract base classes defining model interfaces:
class ObjectDetectionModel(ABC):
    @classmethod
    @abstractmethod
    def from_pretrained(cls, model_name_or_path: str, **kwargs):
        """Load model from path"""

    @abstractmethod
    def pre_process(self, images, **kwargs):
        """Preprocess inputs"""

    @abstractmethod
    def forward(self, pre_processed_images, **kwargs):
        """Run model inference"""

    @abstractmethod
    def post_process(self, model_results, **kwargs):
        """Post-process outputs"""

    def __call__(self, images, **kwargs):
        """Convenience method for inference"""
        return self.infer(images, **kwargs)
All models follow this pattern:
1. pre_process: Resize, normalize, convert to tensor
2. forward: Run backend-specific inference
3. post_process: Parse outputs, apply NMS, convert to standard format
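A toy subclass makes the contract concrete (the zero-output "backend" and the (x1, y1, x2, y2, score, class_id) row layout are illustrative, not the library's actual output format):

import numpy as np


class DummyDetector(ObjectDetectionModel):
    @classmethod
    def from_pretrained(cls, model_name_or_path: str, **kwargs):
        return cls()

    def pre_process(self, images, **kwargs):
        # Real models also resize and pad; here we only normalize to [0, 1].
        batch = np.stack([np.asarray(i, dtype=np.float32) / 255.0 for i in images])
        metadata = [{"original_shape": np.asarray(i).shape} for i in images]
        return batch, metadata

    def forward(self, pre_processed_images, **kwargs):
        # Stand-in for backend-specific inference (ONNX Runtime, TensorRT, ...).
        return np.zeros((len(pre_processed_images), 0, 6), dtype=np.float32)

    def post_process(self, model_results, metadata=None, **kwargs):
        # Parse raw rows of (x1, y1, x2, y2, score, class_id) per image.
        return list(model_results)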
6. Caching System¶
Auto-Resolution Cache¶
Caches backend selection decisions:
class AutoResolutionCache:
    def __init__(self):
        self._cache: Dict[str, Tuple[ModelClass, str]] = {}

    def get(self, cache_key: str):
        """Retrieve cached resolution"""

    def set(self, cache_key: str, model_class, package_dir):
        """Store resolution result"""
Cache Key: Hash of (model_id, backend_preferences, environment)
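One way to derive such a key (the choice of SHA-256 and the exact field set are assumptions for illustration):

import hashlib
import json


def build_cache_key(model_id: str, backend_preferences: list, environment: dict) -> str:
    payload = json.dumps(
        {
            "model_id": model_id,
            "backend_preferences": [str(b) for b in backend_preferences],
            "environment": environment,
        },
        sort_keys=True,  # stable serialization yields a stable key
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()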
Model Package Cache¶
Downloaded files are stored at:
~/.cache/inference-models/
├── {model_id}/
│ ├── {package_id}/
│ │ ├── model.{pt,onnx,engine}
│ │ ├── class_names.txt
│ │ └── model_config.json
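Resolving a package directory from this layout is a one-liner with pathlib (the helper name is illustrative):

from pathlib import Path

CACHE_ROOT = Path.home() / ".cache" / "inference-models"


def package_dir(model_id: str, package_id: str) -> Path:
    # e.g. ~/.cache/inference-models/yolov8n-640/yolov8n-640-onnx-fp32
    return CACHE_ROOT / model_id / package_id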
7. Download System¶
Location: inference_models/utils/download.py
Features:
- Parallel downloads
- Retry with exponential backoff
- MD5 hash verification
- Atomic file operations (download to temp, then rename)
- File locking for concurrent access
def download_files_to_directory(
    files: List[FileDownloadSpecs],
    target_directory: str,
    verify_hash: bool = True,
    on_file_created: Optional[Callable] = None,
):
    """Download files with verification and callbacks"""
Data Flow¶
Model Loading Flow¶
User calls AutoModel.from_pretrained("yolov8n-640")
↓
Check auto-resolution cache
↓ (cache miss)
Retrieve model metadata from Roboflow API
↓
Parse metadata → ModelMetadata object
↓
Filter packages by installed backends
↓
Rank packages by preference
↓
Select best package → ModelPackageMetadata
↓
Check local cache for package files
↓ (cache miss)
Download package files to cache
↓
Verify file hashes
↓
Resolve model class from registry
↓
Instantiate model class
↓
Load weights into model
↓
Return model instance
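Condensed into Python-flavored pseudocode, the same flow reads as follows (every helper name here is illustrative rather than the library's internal API):

def load_model_sketch(model_id: str):
    cached = resolution_cache.get(model_id)  # auto-resolution cache
    if cached is None:
        metadata = get_roboflow_model(model_id, api_key=None)
        packages = [p for p in metadata.model_packages
                    if p.backend in detect_installed_backends()]
        package = rank_by_preference(packages)[0]  # best match
        model_class = REGISTERED_MODELS[
            (metadata.model_architecture, metadata.task_type, package.backend)
        ].resolve()
        resolution_cache.set(model_id, (model_class, package))
    else:
        model_class, package = cached
    package_path = ensure_package_downloaded(package)  # download + hash check
    return model_class.from_pretrained(str(package_path))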
Inference Flow¶
User calls model(images)
↓
model.infer(images)
↓
pre_processed, metadata = model.pre_process(images)
↓
raw_predictions = model.forward(pre_processed)
↓
predictions = model.post_process(raw_predictions, metadata)
↓
Return predictions
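In code, this chain is just the three abstract methods composed (a simplified sketch of infer):

def infer(self, images, **kwargs):
    # metadata carries per-image details (e.g. original shapes) so that
    # post-processing can map predictions back to input coordinates.
    pre_processed, metadata = self.pre_process(images, **kwargs)
    raw_predictions = self.forward(pre_processed, **kwargs)
    return self.post_process(raw_predictions, metadata, **kwargs)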
Extension Points¶
Adding a New Model¶
- Implement a model class inheriting from the appropriate base
- Register it in the model registry (see the sketch below)
- Add it to a weights provider (if pre-trained weights exist)
- Write tests
- Document it
See Adding Models for details.
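For step 2, a registration entry mirrors the ones shown earlier (every "my-arch" name below is a placeholder):

REGISTERED_MODELS[("my-arch", "object-detection", BackendType.ONNX)] = LazyClass(
    module_name="inference_models.models.my_arch.my_arch_object_detection_onnx",
    class_name="MyArchForObjectDetectionOnnx",
)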
Adding a New Backend¶
- Define the backend type in the BackendType enum
- Implement model classes for the backend
- Add backend detection logic
- Update package parsing in the weights provider
- Add the backend to the extras in pyproject.toml
Adding a New Weights Provider¶
from typing import Optional

from inference_models.weights_providers.core import WEIGHTS_PROVIDERS


def my_provider(model_id: str, api_key: Optional[str]) -> ModelMetadata:
    # Implement retrieval logic and return structured metadata
    return ModelMetadata(...)


WEIGHTS_PROVIDERS["my-provider"] = my_provider
Next Steps¶
- Multi-Backend System - Backend implementation details
- Dependency Management - Managing extras and conflicts
- Adding Models - Step-by-step guide
- Testing - Testing strategies