grain.experimental.ParquetIterDataset

class grain.experimental.ParquetIterDataset(path, **read_kwargs)

An IterDataset that reads its elements from a Parquet file.

Parameters:
  • path (str)

__init__(path, **read_kwargs)

Initializes ParquetIterDataset.

Parameters:
  • path (str) – Path to a Parquet file.

  • **read_kwargs – Keyword arguments to pass to pyarrow.parquet.ParquetFile.
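A minimal sketch of constructing the dataset, not a definitive recipe. It assumes that grain and pyarrow are installed, that `import grain` exposes `grain.experimental`, and that each element is a dict keyed by column name. The column names and the helpers `write_example_file` and `read_rows` are hypothetical. `memory_map` is an option of `pyarrow.parquet.ParquetFile`, used here only to illustrate what `**read_kwargs` forwards.

```python
import os
import tempfile


def write_example_file(path):
    """Writes a tiny two-column Parquet file (hypothetical schema)."""
    # Imported inside the function so this module can be loaded
    # even where pyarrow is not installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table({"id": [0, 1, 2, 3], "text": ["a", "b", "c", "d"]})
    pq.write_table(table, path)


def read_rows(path, **read_kwargs):
    """Reads every element of the file through ParquetIterDataset."""
    import grain  # assumption: top-level package exposes grain.experimental

    # Extra keyword arguments are passed on to pyarrow.parquet.ParquetFile.
    ds = grain.experimental.ParquetIterDataset(path, **read_kwargs)
    return list(ds)
```

With both packages available, calling `write_example_file(p)` followed by `read_rows(p, memory_map=True)` on a temporary path `p` should return one element per row of the file.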

Methods

__init__(path, **read_kwargs)

Initializes ParquetIterDataset.

apply(transformations)

Returns a dataset with the given transformation(s) applied.

batch(batch_size, *[, drop_remainder, batch_fn])

Returns a dataset of elements batched along a new first dimension.

filter(transform)

Returns a dataset containing only the elements that match the filter.

map(transform)

Returns a dataset containing the elements transformed by transform.

map_with_index(transform)

Returns a dataset of the elements transformed by transform, which is called with each element's index and the element.

mp_prefetch([options, worker_init_fn])

Returns a dataset prefetching elements in multiple processes.

pipe(func, /, *args, **kwargs)

Syntactic sugar for applying a callable to this dataset.

prefetch(multiprocessing_options)

Deprecated; use mp_prefetch instead.

random_map(transform, *[, seed])

Returns a dataset containing the elements transformed by transform, which also receives a random number generator.

seed(seed)

Returns a dataset that uses the seed for default seed generation.
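The methods listed above can be chained. The following is a sketch under assumptions rather than the library's canonical usage. It assumes the file has an integer `id` column and a string `text` column, and that each element is a dict keyed by column name. The functions `keep_even_id`, `add_length`, and `build_pipeline` are hypothetical names introduced for illustration.

```python
def keep_even_id(row):
    """Filter predicate: keeps rows whose (hypothetical) `id` column is even."""
    # bool(...) normalizes NumPy booleans to a plain Python bool.
    return bool(row["id"] % 2 == 0)


def add_length(row):
    """Map transform: adds the length of the (hypothetical) `text` column."""
    return {**row, "length": len(row["text"])}


def build_pipeline(path, batch_size=2):
    """Chains filter, map, and batch on a ParquetIterDataset."""
    # Imported here so the transforms above stay free of dependencies.
    import grain  # assumption: top-level package exposes grain.experimental

    return (
        grain.experimental.ParquetIterDataset(path)
        .filter(keep_even_id)  # drop rows with an odd id
        .map(add_length)  # add a derived field to each row
        .batch(batch_size, drop_remainder=False)  # stack along a new first dimension
    )
```

Iterating over `build_pipeline(path)` would then yield batches of the filtered, transformed rows. Keeping the transforms as plain functions makes them easy to check in isolation before wiring them into the dataset.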

Attributes

parents