grain.DataLoader

class grain.DataLoader(*, data_source, sampler, operations=(), worker_count=0, worker_buffer_size=1, shard_options=None, read_options=None, enable_profiling=False)

DataLoader loads and transforms input data.

Parameters:
  • data_source (RandomAccessDataSource)

  • sampler (Sampler)

  • operations (Sequence[transforms.Transformation | Operation])

  • worker_count (int | None)

  • worker_buffer_size (int)

  • shard_options (sharding.ShardOptions | None)

  • read_options (options.ReadOptions | None)

  • enable_profiling (bool)

__init__(*, data_source, sampler, operations=(), worker_count=0, worker_buffer_size=1, shard_options=None, read_options=None, enable_profiling=False)

Loads and transforms input data.

Parameters:
  • data_source (RandomAccessDataSource) – Responsible for retrieving individual records based on their indices.

  • sampler (Sampler) – Provides the index of the next record to read and transform.

  • operations (Sequence[Batch | MapTransform | RandomMapTransform | TfRandomMapTransform | Filter | FlatMapTransform | MapWithIndex | Operation]) – Sequence of operations (e.g. Map, Filter) applied to the data.

  • worker_count (int | None) – Number of child processes launched to parallelize the transformations. Zero means processing runs in the same process. None lets the Python backend choose the value.

  • worker_buffer_size (int) – Number of output batches each worker produces in advance, so that batches are ready when the consumer requests them.

  • shard_options (ShardOptions | None) – Options for how data should be sharded when using data parallelism across multiple machines (approximately, JAX processes).

  • read_options (ReadOptions | None) – Options to use for reading. See ReadOptions.

  • enable_profiling (bool) – If True, profiling info is logged. Currently supported only when worker_count >= 1.
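
The parameter descriptions above imply a simple flow: the sampler picks an index, the data source returns the record at that index, and the operations are applied in order. The sketch below is a toy, pure-Python model of that flow for the single-process case (worker_count=0). It is not Grain's implementation: `toy_load` is a hypothetical name, the sampler is assumed to be a plain iterator of integers, and only map-style operations are modeled (no Filter or Batch). The real Sampler and operation types are richer than this.

```python
from typing import Any, Callable, Iterator, Sequence


def toy_load(
    data_source: Sequence[Any],
    sampler: Iterator[int],
    operations: Sequence[Callable[[Any], Any]] = (),
) -> Iterator[Any]:
    """Hypothetical stand-in for the loader's single-process flow."""
    for index in sampler:            # the sampler decides which record comes next
        record = data_source[index]  # the data source fetches that record by index
        for op in operations:        # operations run in the order they were given
            record = op(record)
        yield record


records = ["a", "b", "c", "d"]
order = iter([2, 0, 3, 1])  # a fixed index order standing in for a real Sampler
out = list(toy_load(records, order, operations=[str.upper, lambda s: s * 2]))
print(out)  # ['CC', 'AA', 'DD', 'BB']
```

Keeping index selection (sampler) separate from record retrieval (data_source) means the read order can change without touching the storage layer, which is the split the constructor's arguments suggest.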

Methods

__init__(*, data_source, sampler[, ...]) – Loads and transforms input data.

Attributes

multiprocessing_options