generate_batch_chunks

inference_models.models.common.torch.generate_batch_chunks

generate_batch_chunks(input_batch, chunk_size)

Generate fixed-size chunks from a batch tensor with automatic padding.

Splits a batch tensor into fixed-size chunks along the batch dimension (dim 0). If the last chunk is smaller than chunk_size, it is automatically padded with zeros to maintain consistent chunk sizes.

This is useful for processing large batches through models with fixed batch-size requirements, or for keeping peak GPU memory usage bounded.
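The documented behavior can be approximated with a short sketch. This is illustrative only, not the library's actual implementation, and `generate_batch_chunks_sketch` is a hypothetical name:

```python
import torch
from typing import Iterator, Tuple


def generate_batch_chunks_sketch(
    input_batch: torch.Tensor, chunk_size: int
) -> Iterator[Tuple[torch.Tensor, int]]:
    """Yield (chunk, padding_size) pairs of fixed size along dim 0 (sketch)."""
    batch_size = input_batch.shape[0]
    for start in range(0, batch_size, chunk_size):
        # Slicing returns a view when no padding is needed
        chunk = input_batch[start:start + chunk_size]
        padding = chunk_size - chunk.shape[0]
        if padding > 0:
            # Zero-pad the last chunk, matching the input dtype and device
            pad = torch.zeros(
                (padding, *input_batch.shape[1:]),
                dtype=input_batch.dtype,
                device=input_batch.device,
            )
            chunk = torch.cat([chunk, pad], dim=0)
        yield chunk, padding
```

Every yielded chunk has `chunk_size` rows, so downstream code never has to special-case a short final batch.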

Parameters:

  • input_batch

    (Tensor) –

    Input tensor with batch dimension as the first dimension. Shape: (batch_size, ...).

  • chunk_size

    (int) –

    Size of each chunk. All chunks will have this size in the batch dimension, with the last chunk padded if necessary.

Yields:

  • Tuple[Tensor, int]

    Tuples of (chunk, padding_size) where:

      • chunk: Tensor of shape (chunk_size, ...) containing the batch chunk
      • padding_size: Number of padding elements added (0 for full chunks)

Examples:

Process large batch in chunks:

>>> from inference_models.developer_tools import generate_batch_chunks
>>> import torch
>>>
>>> # Large batch of images
>>> batch = torch.randn(100, 3, 640, 640)
>>>
>>> results = []
>>> for chunk, padding in generate_batch_chunks(batch, chunk_size=16):
...     # Process chunk through model
...     output = model(chunk)
...
...     # Remove padding from results
...     if padding > 0:
...         output = output[:-padding]
...
...     results.append(output)
>>>
>>> # Concatenate all results
>>> final_output = torch.cat(results, dim=0)

Handle models with static batch size:

>>> # Model requires exactly batch size of 8
>>> batch = torch.randn(20, 3, 224, 224, device="cuda")
>>>
>>> for chunk, padding in generate_batch_chunks(batch, chunk_size=8):
...     # chunk.shape[0] is always 8 (padded if needed)
...     output = model(chunk)
...
...     # The last chunk has padding=4 (20 = 2*8 + 4); strip it
...     if padding > 0:
...         output = output[:-padding]

Note
  • Chunks are created as views when possible (no padding needed)
  • Padding is added with zeros matching the input dtype and device
  • The last chunk is always padded to chunk_size if needed
  • Padding size is 0 for all chunks except potentially the last one
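From the notes above, the chunk count and the final chunk's padding follow directly from the batch size. A minimal check (plain Python, no torch; `chunk_layout` is a hypothetical helper, not part of the library):

```python
import math


def chunk_layout(batch_size: int, chunk_size: int):
    """Return (num_chunks, last_padding) implied by fixed-size chunking."""
    num_chunks = math.ceil(batch_size / chunk_size)
    # Padding is 0 when batch_size divides evenly, hence the outer modulo
    last_padding = (chunk_size - batch_size % chunk_size) % chunk_size
    return num_chunks, last_padding
```

For the examples above: a batch of 100 with chunk_size=16 gives 7 chunks with 12 padding elements in the last one, and a batch of 20 with chunk_size=8 gives 3 chunks with padding=4.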

See Also
  • run_onnx_session_with_batch_size_limit(): Uses this internally for ONNX models