Writing Tests¶
This guide covers best practices for writing tests in inference-models.
Test Types¶
The inference_models library uses three types of tests, each serving a different purpose:
1. Integration Tests¶
Location: inference_models/tests/integration_tests/models/
Integration tests verify that your model produces correct predictions with actual model packages. They:
- Load models using `from_pretrained()` with real model packages
- Run inference on test images
- Assert that predictions match expected outputs (bounding boxes, confidence scores, class IDs, etc.)
- Test different input formats (numpy arrays, torch tensors, single images, batches)
- Test model-specific features (custom image sizes, preprocessing options, etc.)
Example: test_yolov8_object_detection_predictions_onnx.py tests YOLOv8 ONNX models with various configurations.
When to add: Always add integration tests when implementing a new model. These are the most important tests.
2. E2E Platform Tests¶
Location: inference_models/tests/e2e_platform_tests/
E2E (end-to-end) platform tests verify the full auto-loading pipeline using AutoModel.from_pretrained():
- Test the complete workflow: weights provider → package download → model instantiation
- Verify that models work correctly when loaded through the platform
- Typically simpler than integration tests, focusing on basic functionality
Example: test_moondream2_e2e.py tests loading Moondream2 via AutoModel and running basic inference.
When to add: Add an E2E test to verify your model integrates correctly with the AutoModel system.
3. Unit Tests¶
Location: inference_models/tests/unit_tests/
Unit tests verify individual components in isolation:
- Auto-loader logic (`models/auto_loaders/`)
- Model registry functionality
- Utilities and helper functions
- Weights provider logic
When to add: Add unit tests if you created new utilities, helper functions, or custom auto-loader logic.
Test Organization¶
Directory Structure¶
```
inference_models/tests/
├── integration_tests/
│   └── models/
│       ├── test_yolov8_object_detection_predictions_onnx.py
│       ├── test_yolov8_object_detection_predictions_torch.py
│       └── test_<your_model>_predictions_<backend>.py
├── e2e_platform_tests/
│   ├── test_moondream2_e2e.py
│   └── test_<your_model>_e2e.py
└── unit_tests/
    ├── models/
    │   └── auto_loaders/
    └── weights_providers/
```
Test File Naming¶
- Integration tests: `test_<model_name>_predictions_<backend>.py`
- E2E tests: `test_<model_name>_e2e.py`
- Unit tests: `test_<component>.py`
- Test functions: `test_*`
- Test classes: `Test*`
Writing Integration Tests¶
Integration tests are the most important tests for model implementations. Here's how to write them:
Basic Integration Test Structure¶
```python
import numpy as np
import pytest
import torch


@pytest.mark.slow
@pytest.mark.onnx_extras
def test_onnx_package_with_dynamic_batch_size_and_letterbox_numpy(
    coin_counting_yolov8n_onnx_dynamic_bs_letterbox_package: str,
    coins_counting_image_numpy: np.ndarray,
) -> None:
    # given
    from inference_models.models.yolov8.yolov8_object_detection_onnx import (
        YOLOv8ForObjectDetectionOnnx,
    )

    model = YOLOv8ForObjectDetectionOnnx.from_pretrained(
        model_name_or_path=coin_counting_yolov8n_onnx_dynamic_bs_letterbox_package,
        onnx_execution_providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )

    # when
    predictions = model(coins_counting_image_numpy)

    # then
    assert torch.allclose(
        predictions[0].confidence.cpu(),
        torch.tensor([0.9608, 0.9449, 0.9339]).cpu(),
        atol=0.01,
    )
    assert torch.allclose(
        predictions[0].class_id.cpu(),
        torch.tensor([4, 1, 1], dtype=torch.int32).cpu(),
    )
    expected_xyxy = torch.tensor(
        [[1296, 528, 3024, 1979], [1172, 2632, 1376, 2847]],
        dtype=torch.int32,
    )
    assert torch.allclose(
        predictions[0].xyxy.cpu(),
        expected_xyxy.cpu(),
        atol=2,
    )
```
Key Elements of Integration Tests¶
- Use fixtures for test data - Model packages and test images should be provided via pytest fixtures (see the sketch after this list)
- Test multiple input formats - Test with numpy arrays, torch tensors, single images, and batches
- Assert on actual outputs - Compare predictions against known-good outputs
- Use appropriate tolerances - Use `atol` for floating-point comparisons
- Mark tests appropriately - Use `@pytest.mark.slow`, `@pytest.mark.onnx_extras`, etc.
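The package and image fixtures referenced in the integration test above are typically defined in the test suite's shared `conftest.py`. A minimal sketch of what such fixtures might look like (the fixture names match the example, but the paths and bodies here are hypothetical; the real fixtures may download the assets instead):

```python
import cv2
import numpy as np
import pytest


@pytest.fixture(scope="module")
def coins_counting_image_numpy() -> np.ndarray:
    # Hypothetical asset path; the real fixture may fetch the image instead.
    image = cv2.imread("tests/assets/coins_counting.jpg")
    assert image is not None, "test asset not found"
    return image


@pytest.fixture(scope="module")
def coin_counting_yolov8n_onnx_dynamic_bs_letterbox_package() -> str:
    # Hypothetical local path to an unpacked model package.
    return "tests/assets/packages/yolov8n_onnx_dynamic_bs_letterbox"
```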
Testing Different Input Formats¶
```python
import torch

# `YourModel` and the fixtures below are placeholders for your model class
# and test data.


# Test single numpy array
def test_single_numpy(model_package, test_image_numpy):
    model = YourModel.from_pretrained(model_package)
    predictions = model(test_image_numpy)
    assert len(predictions) == 1


# Test batch of numpy arrays
def test_batch_numpy(model_package, test_image_numpy):
    model = YourModel.from_pretrained(model_package)
    predictions = model([test_image_numpy, test_image_numpy])
    assert len(predictions) == 2


# Test single torch tensor
def test_single_torch(model_package, test_image_torch):
    model = YourModel.from_pretrained(model_package)
    predictions = model(test_image_torch)
    assert len(predictions) == 1


# Test stacked torch tensors
def test_batch_torch_stacked(model_package, test_image_torch):
    model = YourModel.from_pretrained(model_package)
    batch = torch.stack([test_image_torch, test_image_torch], dim=0)
    predictions = model(batch)
    assert len(predictions) == 2


# Test list of torch tensors
def test_batch_torch_list(model_package, test_image_torch):
    model = YourModel.from_pretrained(model_package)
    predictions = model([test_image_torch, test_image_torch])
    assert len(predictions) == 2
```
Writing E2E Platform Tests¶
E2E tests verify the AutoModel integration:
Basic E2E Test Structure¶
```python
import pytest
import cv2

from inference_models import AutoModel


@pytest.mark.e2e_model_inference
def test_moondream2_e2e():
    """Test Moondream2 loads and runs via AutoModel."""
    # Load model through AutoModel
    model = AutoModel.from_pretrained("vikhyatk/moondream2")

    # Run basic inference
    image = cv2.imread("path/to/test/image.jpg")
    prompt = "Describe this image"
    result = model(image, prompt)

    # Basic assertions
    assert result is not None
    assert isinstance(result, str)
    assert len(result) > 0
```
Writing Unit Tests¶
Unit tests verify individual components:
Basic Unit Test Structure¶
```python
import pytest
import numpy as np

from inference_models.models.your_model import YourModel


def test_model_initialization():
    """Test that model can be initialized."""
    model = YourModel(
        weights_path="path/to/model.pt",
        device="cpu",
    )

    assert model is not None
    assert model.device == "cpu"


def test_single_image_inference():
    """Test inference on a single image."""
    model = YourModel(weights_path="path/to/model.pt", device="cpu")
    image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)

    predictions = model(image)

    assert len(predictions) == 1
    assert hasattr(predictions[0], 'xyxy')
    assert hasattr(predictions[0], 'confidence')
    assert hasattr(predictions[0], 'class_id')
```
Testing Error Handling¶
```python
import pytest

from inference_models.models.your_model import YourModel


def test_missing_dependency_error():
    """Test that missing dependencies raise appropriate errors."""
    from inference_models.exceptions import MissingDependencyError

    with pytest.raises(MissingDependencyError):
        # Code that requires missing dependency
        pass


def test_invalid_input():
    """Test that invalid inputs raise ValueError."""
    model = YourModel(weights_path="path/to/model.pt", device="cpu")

    with pytest.raises(ValueError):
        model(None)  # Invalid input
```
Fixtures¶
Using Fixtures¶
```python
import numpy as np
import pytest


@pytest.fixture
def sample_image():
    """Provide a sample image for testing."""
    return np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)


@pytest.fixture
def model():
    """Provide a model instance for testing."""
    return YourModel(weights_path="path/to/model.pt", device="cpu")


def test_with_fixtures(model, sample_image):
    """Test using fixtures."""
    predictions = model(sample_image)
    assert len(predictions) == 1
```
Fixture Scope¶
@pytest.fixture(scope="module")
def expensive_model():
"""Load model once per test module."""
model = YourModel(weights_path="path/to/model.pt", device="cpu")
yield model
# Cleanup code here if needed
Parametrized Tests¶
Test multiple scenarios efficiently:
```python
import numpy as np
import pytest
import torch


@pytest.mark.parametrize("device", ["cpu", "cuda"])
def test_different_devices(device):
    """Test model on different devices."""
    if device == "cuda" and not torch.cuda.is_available():
        pytest.skip("CUDA not available")

    model = YourModel(weights_path="path/to/model.pt", device=device)
    assert model.device == device


@pytest.mark.parametrize("batch_size", [1, 2, 4, 8])
def test_different_batch_sizes(batch_size):
    """Test model with different batch sizes."""
    model = YourModel(weights_path="path/to/model.pt", device="cpu")
    images = [
        np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
        for _ in range(batch_size)
    ]

    predictions = model(images)
    assert len(predictions) == batch_size
```
Pytest Markers¶
Use markers to categorize and control test execution:
Available Markers¶
- `@pytest.mark.slow` - Tests that take significant time (e.g., loading large models, processing many images)
- `@pytest.mark.gpu_only` - Tests that require GPU hardware
- `@pytest.mark.cpu_only` - Tests that should only run on CPU
- `@pytest.mark.onnx_extras` - Tests for ONNX-specific functionality
- `@pytest.mark.torch_models` - Tests for PyTorch-specific functionality
- `@pytest.mark.e2e_model_inference` - End-to-end platform tests
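Custom markers like these are normally registered in the project's pytest configuration. If you introduce a new marker, a minimal sketch of how it could be declared from a `conftest.py` hook (the project may already handle this elsewhere):

```python
# conftest.py (sketch): register a custom marker so pytest does not warn
# about it and --strict-markers accepts it.
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "my_new_marker: describe what this marker selects"
    )
```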
Using Markers¶
```python
import pytest
import torch


@pytest.mark.slow
@pytest.mark.onnx_extras
def test_onnx_large_batch():
    """Test ONNX model with large batch (slow)."""
    # Test code here
    pass


@pytest.mark.gpu_only
def test_gpu_inference():
    """Test GPU inference."""
    if not torch.cuda.is_available():
        pytest.skip("CUDA not available")
    # Test code here
    pass


@pytest.mark.e2e_model_inference
def test_automodel_loading():
    """Test loading via AutoModel."""
    # Test code here
    pass
```
Running Tests by Marker¶
```bash
# Skip slow tests
uv run pytest -m "not slow" tests/

# Run only GPU tests
uv run pytest -m "gpu_only" tests/

# Run only ONNX tests
uv run pytest -m "onnx_extras" tests/

# Run E2E tests
uv run pytest -m "e2e_model_inference" tests/
```
Running Tests¶
Run All Tests¶
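For example, from the repository root:

```bash
uv run pytest inference_models/tests/
```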
Run Specific Test File¶
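For example:

```bash
uv run pytest inference_models/tests/integration_tests/models/test_your_model.py
```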
Run Specific Test Function¶
```bash
uv run pytest inference_models/tests/integration_tests/models/test_your_model.py::test_model_initialization
```
Run with Coverage¶
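A typical invocation, assuming the pytest-cov plugin is installed:

```bash
uv run pytest --cov=inference_models inference_models/tests/
```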
Run in Parallel¶
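A typical invocation, assuming the pytest-xdist plugin is installed:

```bash
uv run pytest -n auto inference_models/tests/
```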
Best Practices¶
- ✅ Test one thing per test - Each test should verify a single behavior
- ✅ Use descriptive names - Test names should explain what they test
- ✅ Keep tests independent - Tests should not depend on each other
- ✅ Use fixtures for setup - Avoid duplicating setup code
- ✅ Test edge cases - Empty inputs, large batches, invalid data
- ✅ Mark slow tests - Allow fast test runs during development
- ✅ Clean up resources - Use fixtures with cleanup or context managers
- ❌ Don't test implementation details - Test behavior, not internals
- ❌ Don't skip tests without reason - Fix or remove broken tests
- ❌ Don't use real API calls - Mock external dependencies (see the sketch below)
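One minimal way to keep a test offline is to hand the code under test a stand-in object instead of a real external dependency. Everything named below is hypothetical; adapt it to whatever your code actually calls:

```python
from unittest.mock import MagicMock


def test_auto_loader_with_stubbed_provider():
    # Hypothetical stand-in for a weights provider, so no real download
    # or API call happens during the test.
    fake_provider = MagicMock()
    fake_provider.get_package.return_value = "/tmp/fake_package"

    # Exercise your code under test with the mock injected; here we only
    # show the call and the assertions on how the mock was used.
    package_path = fake_provider.get_package("some-workspace/some-model/1")

    assert package_path == "/tmp/fake_package"
    fake_provider.get_package.assert_called_once()
```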
Next Steps¶
- Development Environment - Set up your test environment
- Adding a Model - Add tests for new models
- Core Architecture - Understand what to test