Changelog
0.28.0
Removed (BREAKING)
- MediaPipe is no longer supported. The mediapipe extra and every symbol coupled to it have been removed. Consumers comparing against BackendType.MEDIAPIPE will hit AttributeError. Roboflow Universe payloads of type mediapipe-model-package-v1 are now silently filtered out by MODEL_PACKAGE_PARSERS.get(...). Removed symbols:
  - inference_models.models.mediapipe_face_detection.MediaPipeFaceDetector
  - inference_models.model_pipelines.face_and_gaze_detection.FaceAndGazeDetectionMPAndL2CS
  - BackendType.MEDIAPIPE
  - mediapipe_package_matches_runtime_environment and its entry in MODEL_TO_RUNTIME_COMPATIBILITY_MATCHERS
  - Models registry entry for ("mediapipe-face-detector", KEYPOINT_DETECTION_TASK, BackendType.MEDIAPIPE)
  - BACKEND_PRIORITY[BackendType.MEDIAPIPE]
  - Pipelines registry's face-and-gaze-detection entry + mediapipe/face-detector default parameter
  - MediapipeModelPackageV1, parse_mediapipe_model_package, and the "mediapipe-model-package-v1" entry in MODEL_PACKAGE_PARSERS
  - RuntimeXRayResult.mediapipe_available and is_mediapipe_available()
  - INFERENCE_MODELS_MEDIAPIPE_FACE_DETECTOR_DEFAULT_CONFIDENCE
  - The [project.optional-dependencies] mediapipe extra in pyproject.toml
The standalone L2CSNetOnnx (under inference_models.models.l2cs) is
unaffected and remains supported.
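Code that still has to run against both pre- and post-0.28.0 versions can avoid the AttributeError by looking the member up defensively. A minimal sketch; the BackendType enum below is a hypothetical stand-in for the library's enum after the removal, and its members and values are illustrative only.

```python
from enum import Enum
from typing import Optional


class BackendType(Enum):
    # Hypothetical stand-in for the library enum after 0.28.0:
    # the MEDIAPIPE member no longer exists.
    ONNX = "onnx"
    TORCH = "torch"
    TRT = "trt"


def is_mediapipe_backend(backend: Optional[BackendType]) -> bool:
    # getattr with a default returns None instead of raising AttributeError
    # when the MEDIAPIPE member has been removed.
    mediapipe = getattr(BackendType, "MEDIAPIPE", None)
    return mediapipe is not None and backend is mediapipe


print(is_mediapipe_backend(BackendType.ONNX))  # False
```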
Fixed
- RFDetr pre- and post-processing aligned with training transforms. Pre-processing was replaced with a dedicated PIL → F.resize → F.to_tensor → F.normalize chain matching the training pipeline. For model packages with non-stretch dataset_version_resize_dimensions, the dataset-version resize (cv2 letterbox / center-crop) runs first, followed by the PIL stretch to training_input_size. Post-processing now selects a flat top-k across (queries × classes) via the shared select_topk_predictions. This fixes a cross-backend divergence at low confidence thresholds.
- Fixed a bug where the 'best' and 'default' confidence modes were not correctly handled by RoboflowInstantHF models.
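Flat top-k means every (query, class) pair competes for the k slots, instead of first reducing each query to its best class. A pure-Python sketch, assuming the scores are already per-class confidences laid out as a [queries × classes] grid; it is not the library's select_topk_predictions, and torch.topk over the flattened tensor would be the natural tensor equivalent.

```python
from typing import List, Tuple


def select_topk_flat(
    scores: List[List[float]], k: int
) -> List[Tuple[int, int, float]]:
    """Pick the k highest (query, class) scores from a [queries x classes] grid.

    Unlike a per-query argmax, every (query, class) pair competes, so one
    query can contribute several classes to the result.
    """
    num_classes = len(scores[0])
    # Flatten row-major: flat index = query * num_classes + class_id.
    flat = [s for row in scores for s in row]
    order = sorted(range(len(flat)), key=lambda i: flat[i], reverse=True)[:k]
    results = []
    for flat_index in order:
        # Map the flat index back to its (query, class) coordinates.
        query_id, class_id = divmod(flat_index, num_classes)
        results.append((query_id, class_id, flat[flat_index]))
    return results


scores = [
    [0.9, 0.8, 0.1],  # query 0
    [0.2, 0.3, 0.7],  # query 1
]
print(select_topk_flat(scores, k=3))
# [(0, 0, 0.9), (0, 1, 0.8), (1, 2, 0.7)]
```

Note that query 0 appears twice in the result, which a per-query argmax followed by top-k would never allow.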
0.27.2
Fixed
- Temporarily disabled flash-attention in GLM-OCR on Jetson devices due to an incompatibility detected before release.
0.27.1
Added
- Improved logging for auto-negotiation of model packages.
0.27.0
Added
- COCO RLE masks format for all instance segmentation predictions; all models supported in the library were patched. The InstanceDetections mask can now be an InstancesRLEMasks object, which follows the structure of pycocotools masks (providing a memory-efficient alternative to the dense representation). Clients who want to use this format should pass mask_format="rle" in the **kwargs of the model forward pass.
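For reference, COCO-style RLE scans the mask in column-major order and stores alternating run lengths, always starting with a run of 0s (of length 0 if the first pixel is set). A minimal sketch of the uncompressed form; it only illustrates the encoding and is not how InstancesRLEMasks is built, since its exact fields are not described here.

```python
from typing import Dict, List


def encode_uncompressed_rle(mask: List[List[int]]) -> Dict[str, List[int]]:
    """Encode a binary mask (rows of 0/1) as COCO-style uncompressed RLE."""
    height, width = len(mask), len(mask[0])
    # COCO RLE walks pixels in column-major (Fortran) order.
    pixels = [mask[row][col] for col in range(width) for row in range(height)]
    counts: List[int] = []
    current, run = 0, 0  # runs always start with background (0) pixels
    for value in pixels:
        if value == current:
            run += 1
        else:
            counts.append(run)
            current, run = value, 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}


mask = [
    [0, 1, 1],
    [0, 1, 0],
]
print(encode_uncompressed_rle(mask))
# {'size': [2, 3], 'counts': [2, 3, 1]}
```

With pycocotools installed, pycocotools.mask.encode applied to a Fortran-ordered uint8 array produces the compressed string variant of the same counts.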
Changed
- The RLE masks format yielded a change to the base interface of instance segmentation models: a new abstract property supported_mask_formats was added, which is a breaking change for local-code instance segmentation models. Given the current maturity of the library, we are not aware of anyone using it in this mode, so we are introducing this change such that the interface does not implicitly enforce a supported format.
- The representation of InstanceDetections changed: the new InstancesRLEMasks format is now an alternative to the torch.Tensor used for the dense mask representation. This change is considered non-breaking, as the alternative representation must be requested by the caller.
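For local-code models, the breaking part is that a subclass which does not implement the new abstract property can no longer be instantiated. A minimal sketch using the standard abc module; the base class is a hypothetical stand-in that models only the new property, and the format strings "dense" and "rle" are assumptions based on the wording above.

```python
from abc import ABC, abstractmethod
from typing import Set


class InstanceSegmentationModelBase(ABC):
    # Hypothetical stand-in for the library's base class; only the new
    # abstract property is modelled here.
    @property
    @abstractmethod
    def supported_mask_formats(self) -> Set[str]:
        ...


class MyLocalSegmentationModel(InstanceSegmentationModelBase):
    # A local-code model must now declare which mask formats it can return.
    @property
    def supported_mask_formats(self) -> Set[str]:
        return {"dense"}


class IncompleteModel(InstanceSegmentationModelBase):
    pass  # does not implement supported_mask_formats


model = MyLocalSegmentationModel()
print("rle" in model.supported_mask_formats)  # False

try:
    IncompleteModel()
except TypeError as error:
    # abc refuses to instantiate a class with unimplemented abstract members.
    print(f"cannot instantiate: {error}")
```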
0.26.1
Changed
- For the Roboflow weights provider, the Roboflow License Server proxy transitioned into Roboflow Secure Gateway, altering the naming conventions of all helper functions that are considered the private interface of the weights provider (hence this should not be considered breaking for any clients). Along with this change, the LICENSE_SERVER environment variable controlling the proxy address was replaced by SECURE_GATEWAY; the old variable will be deleted in the first release following the end of Q3 2026.
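Deployment scripts that must work on both sides of the rename could resolve the address as sketched below. This is a hypothetical caller-side helper; the entry above does not say whether the library itself falls back to LICENSE_SERVER, and the address shown is illustrative.

```python
import os
import warnings
from typing import Mapping, Optional


def resolve_gateway_address(env: Mapping[str, str] = os.environ) -> Optional[str]:
    # Prefer the new variable name.
    address = env.get("SECURE_GATEWAY")
    if address:
        return address
    # Fall back to the deprecated name while it is still honoured.
    legacy = env.get("LICENSE_SERVER")
    if legacy:
        warnings.warn(
            "LICENSE_SERVER is deprecated; set SECURE_GATEWAY instead.",
            DeprecationWarning,
        )
        return legacy
    return None


print(resolve_gateway_address({"SECURE_GATEWAY": "http://gateway.internal:9001"}))
# http://gateway.internal:9001
```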
0.26.0
Added
- Brought back the filtering changes proposed in the retracted 0.25.0 release, along with fixes for the bugs that caused the retraction.
0.25.2
Fixed
- OWLv2 compilation procedure clash with transformers~=5.5, which was brought into dependencies along with the 0.25.1 release and Gemma 4.
0.25.1
Added
- Documentation for Gemma 4 multimodal models (Gemma4HF/gemma4_hf.py): dedicated model page, catalog and site navigation updates, home page pointer, and environment variables for INFERENCE_MODELS_GEMMA4_* defaults.
0.25.0 (retracted)
Added
- post_process(...) on object detection, instance segmentation, keypoint detection, classification, and semantic segmentation models now accepts confidence as "best" (use per-class or global thresholds from RecommendedParameters when available), "default" (the model's built-in default), or a float override. Shared NMS helpers accept a per-class torch.Tensor for single-pass per-class filtering.
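A minimal sketch of how the three accepted confidence values could be turned into one threshold per class and applied in a single filtering pass. The function name, the fallback rule for classes missing from the per-class recommendations, and the >= comparison are assumptions; the library works with a per-class torch.Tensor, for which plain lists stand in here.

```python
from typing import Dict, List, Optional, Union

Confidence = Union[str, float]


def resolve_confidence(
    confidence: Confidence,
    default_confidence: float,
    class_names: List[str],
    recommended_per_class: Optional[Dict[str, float]] = None,
    recommended_global: Optional[float] = None,
) -> List[float]:
    """Return one threshold per class for "best", "default" or a float."""
    if isinstance(confidence, (int, float)) and not isinstance(confidence, bool):
        return [float(confidence)] * len(class_names)
    if confidence == "default":
        return [default_confidence] * len(class_names)
    if confidence == "best":
        fallback = (
            recommended_global if recommended_global is not None else default_confidence
        )
        if recommended_per_class:
            # Assumed: classes without a recommendation use the global/default value.
            return [recommended_per_class.get(name, fallback) for name in class_names]
        return [fallback] * len(class_names)
    raise ValueError(f"Unsupported confidence value: {confidence!r}")


thresholds = resolve_confidence("best", 0.4, ["cat", "dog"], {"cat": 0.6})
detections = [(0, 0.5), (1, 0.5), (0, 0.7)]  # (class_id, score)
# Single pass: each detection is compared against its own class threshold.
kept = [d for d in detections if d[1] >= thresholds[d[0]]]
print(kept)  # [(1, 0.5), (0, 0.7)]
```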
0.24.4
Changed
- The behavior of the Roboflow weights provider was changed: instead of throwing an error each time a known model package is fetched with a manifest that does not pass validation, it now warns about this fact and skips the package. This change is dictated by the potential negative impact on stability that malformed manifests could have, in light of a broader change on the Roboflow platform making it possible to register packages externally. Sanitization and validation are enabled on the registry API side, but we introduce this defensive change to prevent potential instability.
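The warn-and-skip policy can be sketched as a filter over fetched manifests. The manifest schema (REQUIRED_MANIFEST_KEYS) and the function name are hypothetical; real validation in the weights provider is more thorough than a key check.

```python
import warnings
from typing import Any, Dict, Iterable, List

# Hypothetical minimal schema used only for this illustration.
REQUIRED_MANIFEST_KEYS = ("package_id", "backend")


def filter_valid_manifests(manifests: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    valid = []
    for manifest in manifests:
        missing = [key for key in REQUIRED_MANIFEST_KEYS if key not in manifest]
        if missing:
            # Warn and skip this package instead of failing the whole load.
            warnings.warn(f"Skipping model package with malformed manifest (missing {missing}).")
            continue
        valid.append(manifest)
    return valid


manifests = [{"package_id": "a", "backend": "onnx"}, {"package_id": "b"}]
print([m["package_id"] for m in filter_valid_manifests(manifests)])  # ['a']
```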
Added
- RF-DETR NAS capabilities for instance segmentation
0.24.3
Changed
- Added sigmoid smoothing for instance-segmentation masks in the YOLOv8, YOLOv11, and YOLOv12 model families. Smoothing can be enabled / disabled via the masks_smoothing_enabled parameter of the post_process(...) method (which can be passed as a **kwarg to forward(...)), with the default set by INFERENCE_MODELS_YOLO_ULTRALYTICS_DEFAULT_MASKS_SMOOTHING_ENABLED (set to True). Additionally, the binarization threshold for masks can be controlled via the masks_binarization_threshold parameter, with the default controlled by INFERENCE_MODELS_YOLO_ULTRALYTICS_DEFAULT_MASKS_BINARIZATION_THRESHOLD (set to 0.5 or 0.0 depending on INFERENCE_MODELS_YOLO_ULTRALYTICS_DEFAULT_MASKS_SMOOTHING_ENABLED).
Instance-segmentation masks will change
Due to smoothing, a slight change to segmentation masks is expected, mainly at the edges of predictions, which should now be smoother. The change is dictated by alignment with the behaviour of older inference versions, effectively drifting from ultralytics post-processing.
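The two default thresholds follow from the sigmoid being monotonic with sigmoid(0) = 0.5: for a single pixel, thresholding the probability at 0.5 is identical to thresholding the logit at 0.0. The modes only diverge once a mask is resampled, because interpolating probabilities is not the same as interpolating logits. The sketch below assumes (the entry above does not say) that smoothing means applying the sigmoid before upsampling; it uses a 1-D linear interpolation between two coarse mask pixels, and the function and values are illustrative.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def upsample_and_binarize(
    left_logit: float,
    right_logit: float,
    positions: List[float],
    smoothing_enabled: bool,
) -> List[bool]:
    """Linearly interpolate between two coarse mask pixels, then binarize.

    With smoothing, interpolation happens in probability space (after the
    sigmoid) and the threshold is 0.5; without it, interpolation happens on
    raw logits and the threshold is 0.0.
    """
    if smoothing_enabled:
        left, right, threshold = sigmoid(left_logit), sigmoid(right_logit), 0.5
    else:
        left, right, threshold = left_logit, right_logit, 0.0
    return [(1.0 - t) * left + t * right > threshold for t in positions]


positions = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
print(upsample_and_binarize(-1.0, 6.0, positions, smoothing_enabled=False))
# [False, False, True, True, True, True]
print(upsample_and_binarize(-1.0, 6.0, positions, smoothing_enabled=True))
# [False, False, False, False, True, True]
```

The endpoints agree in both modes; only the location of the mask edge between them moves.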
0.24.2
Fixed
- Issue with INFERENCE_HOME-derived paths when running on Windows (/tmp/cache was not resolved to a Windows path).
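A portable way to derive such paths is to build the fallback from the platform's temporary directory and join sub-paths with pathlib. A minimal sketch, assuming /tmp/cache was the hard-coded default; the helper names and the models-cache subdirectory are hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping


def resolve_inference_home(env: Mapping[str, str] = os.environ) -> Path:
    # Use the OS temp dir instead of a literal "/tmp/cache",
    # which has no meaning on Windows.
    default = Path(tempfile.gettempdir()) / "cache"
    return Path(env.get("INFERENCE_HOME", default))


def models_cache_dir(env: Mapping[str, str] = os.environ) -> Path:
    # pathlib joins with the separator native to the current OS.
    return resolve_inference_home(env) / "models-cache"


print(models_cache_dir({}))
```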
0.24.1
Changed
- Added an optional field alternatives_errors to ModelPackageAlternativesExhaustedError, making it possible to report to the caller which types of errors happened during loading, so that the caller can deduce whether the loading problem is recoverable.
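A caller might use the new field to decide whether a retry makes sense, as sketched below. The exception class is a hypothetical stand-in, alternatives_errors is assumed to be a list of exceptions (its real type is not stated here), and the set of "recoverable" error types is an illustrative policy.

```python
from typing import List, Optional


class ModelPackageAlternativesExhaustedError(Exception):
    # Hypothetical stand-in; only the new optional field is modelled.
    def __init__(
        self, message: str, alternatives_errors: Optional[List[Exception]] = None
    ) -> None:
        super().__init__(message)
        self.alternatives_errors = alternatives_errors


# Illustrative policy: transient failures are worth retrying.
RECOVERABLE_ERRORS = (ConnectionError, TimeoutError)


def is_recoverable(error: ModelPackageAlternativesExhaustedError) -> bool:
    if not error.alternatives_errors:
        # The field is optional; without details, assume not recoverable.
        return False
    return any(isinstance(e, RECOVERABLE_ERRORS) for e in error.alternatives_errors)


error = ModelPackageAlternativesExhaustedError(
    "all packages failed", [ValueError("bad manifest"), TimeoutError("download")]
)
print(is_recoverable(error))  # True
```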
0.24.0
Added
- Support for the Roboflow License Server proxy in the Roboflow weights provider
0.23.0
Added
- Support for CUDA 13.0 on x86 architecture, as a result of the torch 2.11 release, which makes CUDA 13.0 the default version
0.22.1
Added
- Ability to restrict the maximum input resolution for models
- Restriction of input resolution for RF-DETR, giving the caller the ability to avoid OOM when loading models with large input resolutions
- New error type ModelPackageRestrictedError, signalling that a package is restricted by the runtime environment
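The idea of a resolution cap can be sketched as a filter over candidate packages that raises when nothing fits. Everything here is hypothetical: the error class is a stand-in, and PackageCandidate and the height/width limits do not reflect the library's real parameters.

```python
from dataclasses import dataclass
from typing import List, Tuple


class ModelPackageRestrictedError(Exception):
    """Hypothetical stand-in for the error introduced in this release."""


@dataclass
class PackageCandidate:
    package_id: str
    input_size: Tuple[int, int]  # (height, width)


def filter_by_max_resolution(
    candidates: List[PackageCandidate], max_height: int, max_width: int
) -> List[PackageCandidate]:
    # Keep only packages whose input fits inside the caller's limit.
    allowed = [
        c for c in candidates
        if c.input_size[0] <= max_height and c.input_size[1] <= max_width
    ]
    if not allowed:
        raise ModelPackageRestrictedError(
            f"No package fits within the {max_height}x{max_width} input limit."
        )
    return allowed


candidates = [PackageCandidate("small", (560, 560)), PackageCandidate("large", (1280, 1280))]
print([c.package_id for c in filter_by_max_resolution(candidates, 640, 640)])  # ['small']
```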
0.22.0
Added
- GLM-OCR model added to the model zoo
0.21.1
Fixed
- Missing model package features in auto-negotiation cache entries caused errors when re-initializing models that have required_features declared in the model registry.
0.21.0
Added
- Support for CUDA Graphs in the TRT backend: all TRT models were upgraded with the ability to run with CUDA graphs, at the expense of additional VRAM allocation, with the caller controlling how many execution contexts (for different input shapes) are allowed.
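Capping per-shape execution contexts behaves like a bounded cache keyed by input shape. A minimal sketch, assuming least-recently-used eviction (the entry above does not state the policy); the build callable stands in for capturing a CUDA graph, and the class name is hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, List, TypeVar

T = TypeVar("T")


class ShapeKeyedContextCache(Generic[T]):
    """Keep at most `max_contexts` per-shape contexts, evicting the least recently used."""

    def __init__(self, max_contexts: int, build: Callable[[Hashable], T]) -> None:
        self._max_contexts = max_contexts
        self._build = build
        self._contexts: "OrderedDict[Hashable, T]" = OrderedDict()

    def get(self, shape: Hashable) -> T:
        if shape in self._contexts:
            self._contexts.move_to_end(shape)  # mark as most recently used
            return self._contexts[shape]
        context = self._build(shape)  # stands in for capturing a new graph
        self._contexts[shape] = context
        if len(self._contexts) > self._max_contexts:
            self._contexts.popitem(last=False)  # drop the least recently used shape
        return context

    def cached_shapes(self) -> List[Hashable]:
        return list(self._contexts.keys())


cache = ShapeKeyedContextCache(max_contexts=2, build=lambda shape: f"graph{shape}")
for shape in [(1, 3, 640, 640), (2, 3, 640, 640), (1, 3, 640, 640), (4, 3, 640, 640)]:
    cache.get(shape)
print(cache.cached_shapes())  # [(1, 3, 640, 640), (4, 3, 640, 640)]
```

Larger caps trade VRAM for fewer re-captures when input shapes vary.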
0.20.2
Added
- Ability to override certain aspects of model pre-processing (such as center-crop, contrast enhancement, or grayscale conversion, which may instead be performed by the caller).
0.20.1
Fixed
- AnyModel typing for semantic segmentation models
0.20.0
Added
- Support for transformers>=5
- Model registry feature allowing specific model features to be treated as required during auto-negotiation
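Treating features as required amounts to a subset check during package selection. A minimal sketch; PackageCandidate, the function name, and the feature name "fused-nms" are hypothetical and do not reflect the registry's real data model.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class PackageCandidate:
    package_id: str
    features: FrozenSet[str] = field(default_factory=frozenset)


def filter_by_required_features(
    candidates: List[PackageCandidate], required_features: FrozenSet[str]
) -> List[PackageCandidate]:
    # A package qualifies only if it declares every required feature.
    return [c for c in candidates if required_features <= c.features]


candidates = [
    PackageCandidate("with-feature", frozenset({"fused-nms"})),
    PackageCandidate("plain"),
]
print([c.package_id for c in filter_by_required_features(candidates, frozenset({"fused-nms"}))])
# ['with-feature']
```

This is also why the 0.21.1 fix matters: a cached package entry without its features would fail the subset check.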
0.19.4
Fixed
- CUDA stream synchronization issues in TRT models.
0.19.3
Fixed
- Post-processing for the RF-DETR segmentation model: missing class ID remapping for masks.
0.19.2
Fixed
- Changed the default ranking of model packages in AutoLoader: ONNX is now preferred over Torch.
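A ranking like this can be expressed as a priority table used as a sort key. A minimal sketch; only the ONNX-over-Torch ordering comes from the entry above, while the table, the handling of unknown backends, and the function name are assumptions.

```python
from typing import Dict, List

# Hypothetical priorities: lower value = preferred.
BACKEND_RANK: Dict[str, int] = {"onnx": 0, "torch": 1}


def rank_packages(backends: List[str]) -> List[str]:
    # Unknown backends sort last; sorted() keeps input order for equal ranks.
    return sorted(backends, key=lambda b: BACKEND_RANK.get(b, len(BACKEND_RANK)))


print(rank_packages(["torch", "onnx", "custom"]))  # ['onnx', 'torch', 'custom']
```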
0.19.1
Fixed
- Fixed an issue with RF-DETR model post-processing (TRT implementation) that caused all results to be empty
0.19.0
First stable release of the inference-models library.
Added
- Locks for thread safety of torch models
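The general pattern behind such locks is to serialize calls into a model that is not safe to run concurrently. A minimal sketch with threading.Lock; the ThreadSafeModel wrapper is hypothetical and not how the library attaches its locks.

```python
import threading
from typing import Any, Callable


class ThreadSafeModel:
    """Serialize forward passes of a wrapped model with a lock."""

    def __init__(self, forward_fn: Callable[[Any], Any]) -> None:
        self._forward_fn = forward_fn
        self._lock = threading.Lock()

    def __call__(self, inputs: Any) -> Any:
        # Only one thread at a time runs the underlying forward pass.
        with self._lock:
            return self._forward_fn(inputs)


model = ThreadSafeModel(lambda x: x * 2)
results = []
threads = [threading.Thread(target=lambda i=i: results.append(model(i))) for i in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(sorted(results))  # [0, 2, 4, 6]
```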
Maintenance
- Established documentation hosting
- Provided documentation links to error messages
- Fixed bugs spotted during tests
0.18.5 and earlier versions
Added
- Initial releases of the inference-models library
- Support for 50+ computer vision models
- Multi-backend support (ONNX, PyTorch, TensorRT)
- AutoModel API for automatic model loading
- AutoModelPipeline for multi-model workflows
- Comprehensive model package system
- Automatic backend negotiation
- Model caching and optimization
- Support for object detection, instance segmentation, classification, OCR, keypoint detection, and more
- Vision-language models (Florence-2, PaliGemma, Qwen2.5-VL, etc.)
- Interactive segmentation (SAM, SAM2)
- Depth estimation models
- Gaze detection
- Open-vocabulary object detection
- Embeddings models (CLIP, Perception Encoder)
Documentation
- Complete API reference documentation
- Getting started guides
- Model-specific documentation for all supported models
- How-to guides for common tasks
- Contributors guide
- Error reference documentation
Backends
- ONNX Runtime support (CPU and GPU)
- PyTorch support (CPU, CUDA, MPS)
- TensorRT support for NVIDIA GPUs
- Automatic backend selection based on hardware
Features
- Automatic model package negotiation
- Multi-device support (CPU, CUDA, MPS)
- Batch processing support
- Quantization support (FP32, FP16, INT8)
- Model dependency resolution
- Custom weights provider support
- Local model loading
- Docker support with pre-built images