download_files_to_directory
inference_models.developer_tools.download_files_to_directory
download_files_to_directory(target_dir, files_specs, verbose=True, response_codes_to_retry=None, request_timeout=None, max_parallel_downloads=8, max_threads_per_download=8, file_lock_acquire_timeout=FILE_LOCK_ACQUIRE_TIMEOUT, verify_hash_while_download=True, download_files_without_hash=False, name_after='file_handle', on_file_created=None, on_file_renamed=None)
Download multiple files to a directory with parallel downloads and hash verification.
Downloads files from URLs to a target directory with support for parallel downloads, automatic retries, hash verification, and progress tracking. Skips files that already exist in the target directory.
Parameters:
- target_dir (str) – Absolute path to the directory where files should be downloaded. Will be created if it doesn't exist.
- files_specs (List[Tuple[FileHandle, DownloadUrl, MD5Hash]]) – List of tuples, each containing:
  - file_handle (str): Logical name for the file (used as filename by default)
  - download_url (str): URL to download the file from
  - md5_hash (Optional[str]): Expected MD5 hash for verification (None if unknown)
- verbose (bool, default: True) – Show progress bars during download. Default: True.
- response_codes_to_retry (Optional[Set[int]], default: None) – HTTP status codes that should trigger a retry. Default: Uses library defaults (typically 429, 500, 502, 503, 504).
- request_timeout (Optional[int], default: None) – Timeout in seconds for HTTP requests. Default: Uses library default.
- max_parallel_downloads (int, default: 8) – Maximum number of files to download simultaneously. Default: 8.
- max_threads_per_download (int, default: 8) – Maximum number of threads to use for downloading a single large file. Default: 8.
- file_lock_acquire_timeout (int, default: FILE_LOCK_ACQUIRE_TIMEOUT) – Timeout in seconds for acquiring file locks during concurrent downloads. Default: 10.
- verify_hash_while_download (bool, default: True) – Verify MD5 hash during download. Default: True.
- download_files_without_hash (bool, default: False) – Allow downloading files without MD5 hashes. Security risk. Default: False.
- name_after (Literal['file_handle', 'md5_hash'], default: 'file_handle') – How to name downloaded files. Default: "file_handle". Options:
  - "file_handle": Use the file_handle from files_specs
  - "md5_hash": Use the MD5 hash as filename
- on_file_created (Optional[Callable[[str], None]], default: None) – Optional callback called when a file is created. Receives the file path as argument.
- on_file_renamed (Optional[Callable[[str, str], None]], default: None) – Optional callback called when a file is renamed. Receives old and new paths as arguments.
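The two callbacks can be used to record what a download wrote to disk. The following is a minimal sketch that relies only on the callback signatures documented above. `DownloadTracker` and `download_with_tracking` are hypothetical names introduced for illustration, and the order in which the library invokes the callbacks is not specified here.

```python
from typing import Dict, List, Optional, Tuple


class DownloadTracker:
    """Hypothetical helper that records the paths reported by the callbacks."""

    def __init__(self) -> None:
        self.created: List[str] = []
        self.renamed: List[Tuple[str, str]] = []

    def on_file_created(self, path: str) -> None:
        # Shape matches Callable[[str], None]
        self.created.append(path)

    def on_file_renamed(self, old_path: str, new_path: str) -> None:
        # Shape matches Callable[[str, str], None]
        self.renamed.append((old_path, new_path))


def download_with_tracking(
    target_dir: str,
    files_specs: List[Tuple[str, str, Optional[str]]],
    tracker: DownloadTracker,
) -> Dict[str, str]:
    # Imported lazily so the tracker can be used and tested
    # without the package being installed.
    from inference_models.developer_tools import download_files_to_directory

    return download_files_to_directory(
        target_dir=target_dir,
        files_specs=files_specs,
        on_file_created=tracker.on_file_created,
        on_file_renamed=tracker.on_file_renamed,
    )
```

After the call returns, `tracker.created` and `tracker.renamed` hold whatever paths the callbacks received, which can be useful for cleanup or logging.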
Returns:
- Dict[str, str] – Dictionary mapping file handles to their absolute paths in the target directory.
Raises:
- UntrustedFileError – If download_files_without_hash=False and files without hashes are encountered.
- FileHashSumMissmatch – If downloaded file's hash doesn't match expected hash.
- RetryError – If download fails after all retry attempts.
- InvalidParameterError – If name_after has an invalid value.
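Two of these errors depend only on the arguments, so they can be anticipated before any network request is made. The sketch below mirrors the documented conditions in plain Python. `preflight_check` and `ALLOWED_NAME_AFTER` are hypothetical names, not part of the library, and the check does not replace the library's own validation.

```python
from typing import List, Optional, Tuple

# (file_handle, download_url, md5_hash), as described for files_specs
FileSpec = Tuple[str, str, Optional[str]]

# Values documented for the name_after parameter
ALLOWED_NAME_AFTER = {"file_handle", "md5_hash"}


def preflight_check(
    files_specs: List[FileSpec],
    download_files_without_hash: bool = False,
    name_after: str = "file_handle",
) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    if name_after not in ALLOWED_NAME_AFTER:
        problems.append(f"name_after={name_after!r} is not a documented option")
    if not download_files_without_hash:
        for file_handle, _url, md5_hash in files_specs:
            if md5_hash is None:
                problems.append(f"{file_handle}: no MD5 hash provided")
    return problems
```

Running the check before calling `download_files_to_directory` lets a caller report every offending entry at once instead of stopping at the first raised error.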
Examples:
Download model files:
>>> from inference_models.developer_tools import download_files_to_directory
>>>
>>> files_to_download = [
... ("model.onnx", "https://example.com/model.onnx", "abc123..."),
... ("config.json", "https://example.com/config.json", "def456..."),
... ]
>>>
>>> file_paths = download_files_to_directory(
... target_dir="/path/to/cache",
... files_specs=files_to_download,
... verbose=True
... )
>>>
>>> print(file_paths["model.onnx"]) # /path/to/cache/model.onnx
>>> print(file_paths["config.json"]) # /path/to/cache/config.json
Download without hash verification (not recommended):
>>> files_to_download = [
... ("weights.pt", "https://example.com/weights.pt", None),
... ]
>>>
>>> file_paths = download_files_to_directory(
... target_dir="/path/to/cache",
... files_specs=files_to_download,
... download_files_without_hash=True, # Allow files without hashes
... verify_hash_while_download=False
... )
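When a trusted copy of the file is available locally, its MD5 can be computed and passed in `files_specs` instead of disabling verification. This is a minimal sketch using the standard `hashlib` module. `md5_of_file` and `build_file_spec` are hypothetical helpers, and it is an assumption that the library expects the hash as a lowercase hexadecimal digest.

```python
import hashlib
from typing import Tuple


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are not loaded into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_file_spec(
    file_handle: str, download_url: str, reference_copy: str
) -> Tuple[str, str, str]:
    """Build one (file_handle, download_url, md5_hash) tuple from a trusted local copy."""
    return (file_handle, download_url, md5_of_file(reference_copy))
```

The resulting tuples can be collected into a list and passed as `files_specs` with the default `download_files_without_hash=False`.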
Note
- Files are downloaded in parallel for better performance
- Large files (>32MB) are downloaded using multiple threads
- Existing files are skipped automatically
- Progress bars are disabled if DISABLE_INTERACTIVE_PROGRESS_BARS env var is set
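To give an intuition for how a single large file can be fetched by several threads, the sketch below splits a file size into contiguous byte ranges suitable for HTTP `Range` headers. This illustrates the general technique only; how the library actually partitions large files is not documented here, and `split_into_byte_ranges` and `range_header` are hypothetical names.

```python
from typing import Dict, List, Tuple


def split_into_byte_ranges(total_size: int, parts: int) -> List[Tuple[int, int]]:
    """Split [0, total_size) into up to `parts` contiguous, inclusive (start, end) ranges."""
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)  # never create empty ranges
    base, extra = divmod(total_size, parts)
    ranges: List[Tuple[int, int]] = []
    start = 0
    for i in range(parts):
        # The first `extra` ranges take one additional byte each
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(start: int, end: int) -> Dict[str, str]:
    """Build an HTTP Range header for an inclusive byte range."""
    return {"Range": f"bytes={start}-{end}"}
```

Each range could then be requested by a separate worker and written at its offset in the destination file, with the number of workers bounded by a setting such as `max_threads_per_download`.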