plaid.storage.reader

PLAID storage reader module.

This module provides high-level functions for loading PLAID datasets from local disk or Hugging Face Hub. It supports multiple storage backends including CGNS, HF Datasets, and Zarr, providing a unified interface for data access and conversion.

Key features:

- Unified interface for loading datasets across different backends
- Local disk and streaming Hub access
- Automatic backend detection and converter creation
- Sample conversion between storage formats and PLAID objects

Classes

Converter

Converter class for transforming samples between storage and PLAID formats.

Functions

`init_from_disk`(→ tuple[dict[str, Any], dict[str, ...)	Initialize dataset and converters from local disk.
`download_from_hub`(repo_id, local_dir[, split_ids, ...])	Download a PLAID dataset from Hugging Face Hub to local disk.
`init_streaming_from_hub`(→ tuple[dict[str, Any], ...)	Initialize streaming datasets from Hugging Face Hub.

Module Contents

class Converter(backend: str, flat_cst: Any, cgns_types: Any, variable_features: Any, constant_features: Any, num_samples: Any)

Converter class for transforming samples between storage and PLAID formats.

This class provides methods to convert samples between backend-specific storage formats and PLAID Sample objects. It handles the schema transformations and metadata required for proper data conversion.

Initialize a Converter.

Parameters:

backend – The storage backend (‘hf_datasets’, ‘zarr’, or ‘cgns’).
flat_cst – Flattened constants for the dataset.
cgns_types – CGNS type information.
variable_features – Set of variable feature names.
constant_features – Set of constant feature names.
num_samples – Mapping providing the number of samples for each split.
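The `backend` parameter accepts one of three documented values. The sketch below shows one way a constructor might validate it. `SUPPORTED_BACKENDS` and `validate_backend` are hypothetical names, not part of `plaid.storage.reader`, and the real Converter may check its backend differently.

```python
# Hypothetical sketch, not the plaid implementation.
# The three values are the backends listed for Converter.
SUPPORTED_BACKENDS = ("hf_datasets", "zarr", "cgns")


def validate_backend(backend: str) -> str:
    """Return backend unchanged if it is a documented value, else raise ValueError."""
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"Unsupported backend {backend!r}; expected one of {SUPPORTED_BACKENDS}"
        )
    return backend
```

For example, `validate_backend("zarr")` returns `"zarr"`, while `validate_backend("hdf5")` raises `ValueError`.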

backend

backend_spec

flat_cst

cgns_types

variable_features

constant_features

num_samples

to_dict(dataset: Any, idx: int, features: list[str] | None = None) → dict[float, dict[str, Any]]

Convert a dataset sample to dictionary format.

Parameters:

dataset – The dataset object containing the sample.
idx – Index of the sample to convert.
features – Optional list of feature names to include from the variable fields. If None, all variable features available for the backend are included.

Returns:

Sample data in dictionary format.

Return type:

dict

Raises:

ValueError – If called with CGNS backend.
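The `features` argument and the CGNS restriction together define which fields `to_dict` returns. The hypothetical helper `resolve_features` below restates that documented contract. Rejecting unknown feature names (rather than silently dropping them) is an assumption of this sketch, not documented behavior.

```python
from collections.abc import Iterable
from typing import Optional


def resolve_features(
    backend: str,
    variable_features: Iterable[str],
    features: Optional[list[str]] = None,
) -> list[str]:
    """Hypothetical helper mirroring the documented to_dict contract."""
    # to_dict is documented to raise ValueError with the CGNS backend.
    if backend == "cgns":
        raise ValueError("to_dict is not supported for the CGNS backend")
    available = set(variable_features)
    # features=None: include every variable feature (sorted for determinism).
    if features is None:
        return sorted(available)
    # Assumption: names outside the variable features are rejected,
    # not silently dropped.
    unknown = [name for name in features if name not in available]
    if unknown:
        raise ValueError(f"Not variable features: {unknown}")
    return list(features)
```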

to_plaid(dataset: Any, idx: int, features: list[str] | None = None) → plaid.Sample

Convert a dataset sample to PLAID Sample object.

Parameters:

dataset – The dataset object containing the sample.
idx – Index of the sample to convert.
features – Optional list of feature names to include from the variable fields. If None, all variable features available for the backend are included. Requested features are processed according to self.constant_features and self.variable_features so that the resulting sample follows CGNS conventions.

Returns:

A PLAID Sample object.

Return type:

Sample
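One plausible reading of how `to_plaid` uses `constant_features` and `variable_features` is that it sorts requested names into the two groups, since constants and per-sample fields are stored separately. The hypothetical `partition_features` helper below sketches only that explicit-list case. The function name, the `KeyError` on unknown names, and the example feature names are all assumptions.

```python
def partition_features(
    features: list[str],
    constant_features: set[str],
    variable_features: set[str],
) -> tuple[list[str], list[str]]:
    """Split requested feature names into (constant, variable) lists.

    Hypothetical sketch: the order of the requested names is preserved
    within each group, and names in neither set raise KeyError.
    """
    constant: list[str] = []
    variable: list[str] = []
    unknown: list[str] = []
    for name in features:
        if name in variable_features:
            variable.append(name)
        elif name in constant_features:
            constant.append(name)
        else:
            unknown.append(name)
    if unknown:
        raise KeyError(f"Features not found in dataset schema: {unknown}")
    return constant, variable
```

For instance, with hypothetical names, `partition_features(["p", "mesh", "u"], {"mesh"}, {"p", "u"})` returns `(["mesh"], ["p", "u"])`.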

sample_to_dict(sample: plaid.Sample) → dict[float, dict[str, Any]]

Convert a PLAID Sample to dictionary format.

Parameters:

sample – The PLAID Sample object to convert.

Returns:

Sample data in dictionary format.

Return type:

dict

Raises:

ValueError – If called with CGNS backend.

sample_to_plaid(sample: plaid.Sample) → plaid.Sample

Convert a sample to PLAID format (identity function for most backends).

Parameters:

sample – The sample object to convert.

Returns:

A PLAID Sample object.

Return type:

Sample

plaid_to_dict(plaid_sample: plaid.Sample) → dict[str, Any]

Convert a PLAID Sample to dictionary format for storage.

Parameters:

plaid_sample – The PLAID Sample object to convert.

Returns:

Sample data in dictionary format suitable for storage.

Return type:

dict

init_from_disk(local_dir: pathlib.Path | str, splits: list[str] | None = None) → tuple[dict[str, Any], dict[str, Converter]]

Initialize dataset and converters from local disk.

This function loads a previously saved PLAID dataset from local disk, automatically detecting the backend and creating appropriate converters for sample transformation.

Parameters:

local_dir – Path to the local directory containing the saved dataset.
splits – Optional list of split names to load converters for. If None, converters are created for all splits present in the dataset.

Returns:

A tuple (datasetdict, converterdict), where datasetdict maps split names to dataset objects and converterdict maps split names to Converter objects.

Return type:

tuple
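The two dictionaries returned by `init_from_disk` share the same split keys, so a typical loop pairs each dataset with its converter. The generator below is a hypothetical sketch of that pattern. It assumes, without confirmation from the source, that valid indices for a split run from 0 to `num_samples[split] - 1`. In real use, the inputs would come from `datasetdict, converterdict = init_from_disk(local_dir)`.

```python
from collections.abc import Iterator, Mapping
from typing import Any


def iter_plaid_samples(
    datasetdict: Mapping[str, Any],
    converterdict: Mapping[str, Any],
) -> Iterator[tuple[str, int, Any]]:
    """Yield (split, idx, sample) for every split in datasetdict.

    Hypothetical helper. Assumes each converter exposes num_samples[split]
    and to_plaid(dataset, idx), as documented for Converter, and that
    indices are 0-based and contiguous.
    """
    for split, dataset in datasetdict.items():
        converter = converterdict[split]
        for idx in range(converter.num_samples[split]):
            yield split, idx, converter.to_plaid(dataset, idx)
```

Because the function is a generator, samples are converted lazily, one at a time. This avoids materializing a whole split in memory.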

download_from_hub(repo_id: str, local_dir: str | pathlib.Path, split_ids: dict[str, Iterable[int]] | None = None, features: list[str] | None = None, overwrite: bool = False)

Download a PLAID dataset from Hugging Face Hub to local disk.

This function downloads a dataset from Hugging Face Hub, including data, metadata, infos, and problem definitions, saving everything to local disk.

Parameters:

repo_id – Hugging Face repository ID (e.g., ‘username/dataset-name’).
local_dir – Local directory path where the dataset will be downloaded.
split_ids – Optional dictionary mapping split names to iterables of sample IDs to download.
features – Optional list of features to download.
overwrite – If True, overwrites an existing local directory.
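`split_ids` is typed `dict[str, Iterable[int]]`. An iterable such as a generator can be consumed only once, so a caller may prefer to materialize the IDs before passing them. The hypothetical helper `normalize_split_ids` below does that and also sorts and de-duplicates the IDs. Whether `download_from_hub` requires sorted or unique IDs is not documented, and the split names used here are illustrative.

```python
from collections.abc import Iterable


def normalize_split_ids(
    split_ids: dict[str, Iterable[int]],
) -> dict[str, list[int]]:
    """Materialize each split's sample IDs as a sorted, de-duplicated list."""
    normalized: dict[str, list[int]] = {}
    for split, ids in split_ids.items():
        id_list = sorted(set(ids))
        # Sample IDs are assumed to be non-negative integers.
        if any(i < 0 for i in id_list):
            raise ValueError(f"Negative sample ID in split {split!r}")
        normalized[split] = id_list
    return normalized


# Example: the first ten training samples plus a hand-picked test subset.
split_ids = normalize_split_ids({"train": range(10), "test": (5, 0, 5)})
```

The result could then be passed as `download_from_hub(repo_id, local_dir, split_ids=split_ids)`.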

init_streaming_from_hub(repo_id: str, split_ids: dict[str, Iterable[int]] | None = None, features: list[str] | None = None) → tuple[dict[str, Any], dict[str, Converter]]

Initialize streaming datasets from Hugging Face Hub.

This function creates streaming dataset objects from a Hugging Face Hub repository, along with converters for sample transformation.

Parameters:

repo_id – Hugging Face repository ID (e.g., ‘username/dataset-name’).
split_ids – Optional dictionary mapping split names to iterables of sample IDs to stream.
features – Optional list of features to stream.

Returns:

A tuple (datasetdict, converterdict), where datasetdict maps split names to streaming dataset objects and converterdict maps split names to Converter objects.

Return type:

tuple