grain.DataLoader#
- class grain.DataLoader(*, data_source, sampler, operations=(), worker_count=0, worker_buffer_size=1, shard_options=None, read_options=None, enable_profiling=False)#
DataLoader loads and transforms input data.
- Parameters:
data_source (RandomAccessDataSource)
sampler (Sampler)
operations (Sequence[transforms.Transformation | Operation])
worker_count (Optional[int])
worker_buffer_size (int)
shard_options (sharding.ShardOptions | None)
read_options (options.ReadOptions | None)
enable_profiling (bool)
- __init__(*, data_source, sampler, operations=(), worker_count=0, worker_buffer_size=1, shard_options=None, read_options=None, enable_profiling=False)#
Loads and transforms input data.
- Parameters:
data_source (RandomAccessDataSource) – Responsible for retrieving individual records based on their indices.
sampler (Sampler) – Sampler is responsible for providing the index of the next record to read and transform.
operations (Sequence[Batch | MapTransform | RandomMapTransform | TfRandomMapTransform | Filter | FlatMapTransform | MapWithIndex | Operation]) – Sequence of operations (e.g. Map, Filter) applied to the data.
worker_count (int | None) – Number of child processes across which the transformations are parallelized. Zero means processing runs in the same process. None lets the Python backend choose the value.
worker_buffer_size (int) – Count of output batches to produce in advance per worker. This ensures batches are ready when the consumer requests them.
shard_options (ShardOptions | None) – Options for how data should be sharded when using multiple machines (i.e., JAX processes) and data parallelism.
read_options (ReadOptions | None) – Options to use for reading. See ReadOptions.
enable_profiling (bool) – If True, profiling info is logged. Note that this currently only supports worker_count >= 1.
Methods
- __init__(*, data_source, sampler[, ...]) – Loads and transforms input data.
Attributes
multiprocessing_options