plaid.storage.reader

PLAID storage reader module.

This module provides high-level functions for loading PLAID datasets from local disk or Hugging Face Hub. It supports multiple storage backends including CGNS, HF Datasets, and Zarr, providing a unified interface for data access and conversion.

Key features:
  • Unified interface for loading datasets across different backends

  • Local disk and streaming Hub access

  • Automatic backend detection and converter creation

  • Sample conversion between storage formats and PLAID objects

Classes

Converter

Converter class for transforming samples between storage and PLAID formats.

Functions

init_from_disk(→ tuple[dict[str, Any], dict[str, ...)

Initialize dataset and converters from local disk.

download_from_hub(repo_id, local_dir[, split_ids, ...])

Download a PLAID dataset from Hugging Face Hub to local disk.

init_streaming_from_hub(→ tuple[dict[str, Any], ...)

Initialize streaming datasets from Hugging Face Hub.

Module Contents

class Converter(backend: str, flat_cst: Any, cgns_types: Any, variable_features: Any, constant_features: Any, num_samples: Any)

Converter class for transforming samples between storage and PLAID formats.

This class provides methods to convert samples between backend-specific storage formats and PLAID Sample objects. It handles the schema transformations and metadata required for proper data conversion.

Initialize a Converter.

Parameters:
  • backend – The storage backend (‘hf_datasets’, ‘zarr’, or ‘cgns’).

  • flat_cst – Flattened constants for the dataset.

  • cgns_types – CGNS type information.

  • variable_features – Set of variable feature names.

  • constant_features – Set of constant feature names.

  • num_samples – Mapping providing the number of samples for each split.

backend
backend_spec
flat_cst
cgns_types
variable_features
constant_features
num_samples
to_dict(dataset: Any, idx: int, features: list[str] | None = None) → dict[float, dict[str, Any]]

Convert a dataset sample to dictionary format.

Parameters:
  • dataset – The dataset object containing the sample.

  • idx – Index of the sample to convert.

  • features – Optional list of feature names to include from the variable fields. If None, all variable features available for the backend are included.

Returns:

Sample data in dictionary format.

Return type:

dict

Raises:

ValueError – If called with CGNS backend.
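
Because to_dict raises ValueError for the CGNS backend, code that handles several backends may want to check the backend before choosing a conversion method. The following is a minimal sketch of that dispatch. The helper name read_sample is hypothetical (it is not part of this module), and the sketch assumes that converter.backend holds the same string that was passed to the constructor, i.e. 'cgns' for CGNS.

```python
from typing import Any


def read_sample(converter: Any, dataset: Any, idx: int) -> Any:
    """Return a dictionary when the backend supports it, else a PLAID Sample.

    Hypothetical helper, not part of plaid.storage.reader.
    """
    # Assumption: converter.backend is the backend string given at construction.
    if converter.backend == "cgns":
        # to_dict is documented to raise ValueError for CGNS, so use to_plaid.
        return converter.to_plaid(dataset, idx)
    return converter.to_dict(dataset, idx)
```

Checking the backend up front keeps the control flow explicit; catching the ValueError instead would also work but could hide unrelated errors.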

to_plaid(dataset: Any, idx: int, features: list[str] | None = None) → plaid.Sample

Convert a dataset sample to PLAID Sample object.

Parameters:
  • dataset – The dataset object containing the sample.

  • idx – Index of the sample to convert.

  • features – Optional list of feature names to include from the variable fields. If None, all variable features available for the backend are included. The requested features are reprocessed against self.constant_features and self.variable_features so that the result satisfies CGNS conventions.

Returns:

A PLAID Sample object.

Return type:

Sample
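
Before passing a features list, it can help to see how the requested names line up with the converter's two feature sets. The sketch below only inspects the variable_features and constant_features attributes; it makes no claim about which names to_plaid accepts. The helper name partition_features is hypothetical, and the sketch assumes both attributes support membership tests, as their description as sets of names suggests.

```python
from typing import Any, Iterable, List, Tuple


def partition_features(
    converter: Any, requested: Iterable[str]
) -> Tuple[List[str], List[str], List[str]]:
    """Split requested names into (variable, constant, unknown) lists.

    Hypothetical helper, not part of plaid.storage.reader.
    """
    variable: List[str] = []
    constant: List[str] = []
    unknown: List[str] = []
    for name in requested:
        if name in converter.variable_features:
            variable.append(name)
        elif name in converter.constant_features:
            constant.append(name)
        else:
            unknown.append(name)
    return variable, constant, unknown
```

A non-empty unknown list usually points to a misspelled feature name and is worth reporting before any conversion is attempted.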

sample_to_dict(sample: plaid.Sample) → dict[float, dict[str, Any]]

Convert a PLAID Sample to dictionary format.

Parameters:

sample – The PLAID Sample object to convert.

Returns:

Sample data in dictionary format.

Return type:

dict

Raises:

ValueError – If called with CGNS backend.

sample_to_plaid(sample: plaid.Sample) → plaid.Sample

Convert a sample to PLAID format (identity function for most backends).

Parameters:

sample – The sample object to convert.

Returns:

A PLAID Sample object.

Return type:

Sample

plaid_to_dict(plaid_sample: plaid.Sample) → dict[str, Any]

Convert a PLAID Sample to dictionary format for storage.

Parameters:

plaid_sample – The PLAID Sample object to convert.

Returns:

Sample data in dictionary format suitable for storage.

Return type:

dict

init_from_disk(local_dir: pathlib.Path | str, splits: list[str] | None = None) → tuple[dict[str, Any], dict[str, Converter]]

Initialize dataset and converters from local disk.

This function loads a previously saved PLAID dataset from local disk, automatically detecting the backend and creating appropriate converters for sample transformation.

Parameters:
  • local_dir – Path to the local directory containing the saved dataset.

  • splits – Optional list of split names to load converters for. If None, converters are created for all splits present in the dataset.

Returns:

A tuple (datasetdict, converterdict), where datasetdict maps split names to dataset objects and converterdict maps split names to Converter objects.

Return type:

tuple
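
As an illustration of how the two returned dictionaries fit together, the sketch below loads one split and converts a single sample with Converter.to_plaid. The helper name load_plaid_sample and its loader parameter are hypothetical: the parameter exists only so that the function can be exercised without plaid installed, and by default it falls back to init_from_disk. The split name is whatever the saved dataset uses.

```python
from pathlib import Path
from typing import Any, Callable, Optional


def load_plaid_sample(
    local_dir: str,
    split: str,
    idx: int = 0,
    loader: Optional[Callable[..., Any]] = None,
) -> Any:
    """Load `split` from disk and convert sample `idx` to a PLAID Sample.

    Hypothetical helper, not part of plaid.storage.reader.
    """
    if loader is None:
        # Imported lazily so the sketch can be read and tested without plaid.
        from plaid.storage.reader import init_from_disk as loader

    # Both dictionaries are keyed by split name.
    datasetdict, converterdict = loader(Path(local_dir), splits=[split])
    return converterdict[split].to_plaid(datasetdict[split], idx)
```

Each split has its own converter, so the dataset and the converter should always be looked up with the same key.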

download_from_hub(repo_id: str, local_dir: str | pathlib.Path, split_ids: dict[str, Iterable[int]] | None = None, features: list[str] | None = None, overwrite: bool = False)

Download a PLAID dataset from Hugging Face Hub to local disk.

This function downloads a dataset from Hugging Face Hub, including data, metadata, infos, and problem definitions, saving everything to local disk.

Parameters:
  • repo_id – Hugging Face repository ID (e.g., ‘username/dataset-name’).

  • local_dir – Local directory path where the dataset will be downloaded.

  • split_ids – Optional dictionary mapping split names to iterables of sample IDs to download.

  • features – Optional list of features to download.

  • overwrite – If True, overwrites existing local directory.
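
The split_ids argument can be used to fetch only part of each split. The sketch below builds such a mapping for the first n samples per split and passes it to download_from_hub. Several things here are assumptions rather than documented behaviour: the helper names first_n_ids and download_subset are hypothetical, sample IDs are assumed to be contiguous integers starting at 0, and the split sizes must be known by the caller. The downloader parameter exists only so the sketch can run without plaid installed.

```python
from typing import Any, Callable, Dict, Optional


def first_n_ids(split_sizes: Dict[str, int], n: int) -> Dict[str, range]:
    """Build a split_ids mapping selecting the first `n` IDs of each split.

    Assumption: sample IDs are contiguous integers starting at 0.
    """
    return {split: range(min(n, size)) for split, size in split_sizes.items()}


def download_subset(
    repo_id: str,
    local_dir: str,
    split_sizes: Dict[str, int],
    n: int,
    downloader: Optional[Callable[..., Any]] = None,
) -> None:
    """Download at most `n` samples per split (hypothetical helper)."""
    if downloader is None:
        # Imported lazily so the sketch can be read and tested without plaid.
        from plaid.storage.reader import download_from_hub as downloader

    downloader(
        repo_id,
        local_dir,
        split_ids=first_n_ids(split_sizes, n),
        overwrite=False,  # keep any existing local copy
    )
```

A small subset like this is convenient for smoke tests before committing to a full download.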

init_streaming_from_hub(repo_id: str, split_ids: dict[str, Iterable[int]] | None = None, features: list[str] | None = None) → tuple[dict[str, Any], dict[str, Converter]]

Initialize streaming datasets from Hugging Face Hub.

This function creates streaming dataset objects from a Hugging Face Hub repository, along with converters for sample transformation.

Parameters:
  • repo_id – Hugging Face repository ID (e.g., ‘username/dataset-name’).

  • split_ids – Optional dictionary mapping split names to iterables of sample IDs to stream.

  • features – Optional list of features to stream.

Returns:

A tuple (datasetdict, converterdict), where datasetdict maps split names to streaming dataset objects and converterdict maps split names to Converter objects.

Return type:

tuple
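
A streaming dataset is typically consumed by iteration rather than by index. The sketch below takes the first n items from a stream and converts each one with Converter.sample_to_plaid. This rests on two assumptions that the reference above does not state: that the streaming dataset objects are iterable, and that each yielded item is a valid argument to sample_to_plaid. The helper name take_plaid_samples is hypothetical.

```python
from itertools import islice
from typing import Any, Iterable, List


def take_plaid_samples(stream: Iterable[Any], converter: Any, n: int) -> List[Any]:
    """Convert the first `n` items of `stream` to PLAID Samples.

    Hypothetical helper, not part of plaid.storage.reader.
    """
    # islice stops after n items, so the rest of the stream is never fetched.
    return [converter.sample_to_plaid(raw) for raw in islice(stream, n)]
```

With the dictionaries returned by init_streaming_from_hub, a call would look like take_plaid_samples(datasetdict[split], converterdict[split], 5) for a split name present in the repository.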