plaid.storage.reader
PLAID storage reader module.
This module provides high-level functions for loading PLAID datasets from local disk or Hugging Face Hub. It supports multiple storage backends including CGNS, HF Datasets, and Zarr, providing a unified interface for data access and conversion.
Key features:
- Unified interface for loading datasets across different backends
- Local disk and streaming Hub access
- Automatic backend detection and converter creation
- Sample conversion between storage formats and PLAID objects
Classes
- Converter: Converter class for transforming samples between storage and PLAID formats.
Functions
- init_from_disk: Initialize dataset and converters from local disk.
- download_from_hub: Download a PLAID dataset from Hugging Face Hub to local disk.
- init_streaming_from_hub: Initialize streaming datasets from Hugging Face Hub.
Module Contents
- class Converter(backend: str, flat_cst: Any, cgns_types: Any, variable_features: Any, constant_features: Any, num_samples: Any)
Converter class for transforming samples between storage and PLAID formats.
This class provides methods to convert samples between backend-specific storage formats and PLAID Sample objects. It handles the schema transformations and metadata required for proper data conversion.
Initialize a Converter.
- Parameters:
backend – The storage backend (‘hf_datasets’, ‘zarr’, or ‘cgns’).
flat_cst – Flattened constants for the dataset.
cgns_types – CGNS type information.
variable_features – Set of variable feature names.
constant_features – Set of constant feature names.
num_samples – Mapping providing the number of samples for each split.
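Converters are normally obtained from init_from_disk or init_streaming_from_hub rather than built by hand, so code that only consumes them can be typed structurally without importing PLAID. The sketch below is a hypothetical helper, not part of plaid.storage.reader: ConverterLike mirrors the three documented method signatures, and check_backend validates a backend name against the three values listed above.

```python
from __future__ import annotations

from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ConverterLike(Protocol):
    """Hypothetical structural type mirroring the documented Converter methods."""

    def to_dict(
        self, dataset: Any, idx: int, features: list[str] | None = None
    ) -> dict[float, dict[str, Any]]: ...

    def to_plaid(
        self, dataset: Any, idx: int, features: list[str] | None = None
    ) -> Any: ...  # returns a plaid.Sample in the real class

    def sample_to_dict(self, sample: Any) -> dict[float, dict[str, Any]]: ...


# Backend identifiers accepted by the constructor, as listed in these docs.
KNOWN_BACKENDS = frozenset({"hf_datasets", "zarr", "cgns"})


def check_backend(backend: str) -> str:
    """Fail early on a backend name that the Converter docs do not list."""
    if backend not in KNOWN_BACKENDS:
        raise ValueError(
            f"unknown backend {backend!r}; expected one of {sorted(KNOWN_BACKENDS)}"
        )
    return backend
```

Note that runtime_checkable only checks that the methods exist, not their signatures.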
- to_dict(dataset: Any, idx: int, features: list[str] | None = None) -> dict[float, dict[str, Any]]
Convert a dataset sample to dictionary format.
- Parameters:
dataset – The dataset object containing the sample.
idx – Index of the sample to convert.
features – Optional list of feature names to include from the variable fields. If None, all variable features available for the backend are included.
- Returns:
Sample data in dictionary format.
- Return type:
dict[float, dict[str, Any]]
- Raises:
ValueError – If called with CGNS backend.
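Because to_dict raises ValueError with the CGNS backend while to_plaid is documented for every backend, calling code may want a single entry point that degrades gracefully. The following is a minimal sketch; read_sample is a hypothetical helper name, and it relies only on the ValueError behaviour stated above.

```python
from __future__ import annotations

from typing import Any


def read_sample(
    converter: Any, dataset: Any, idx: int, features: list[str] | None = None
) -> Any:
    """Return sample `idx` as a dict, or as a PLAID Sample when to_dict is refused."""
    try:
        return converter.to_dict(dataset, idx, features)
    except ValueError:
        # CGNS backend: dictionary export is not supported, so fall back
        # to building a PLAID Sample instead.
        return converter.to_plaid(dataset, idx, features)
```

Catching ValueError broadly could also hide unrelated errors raised inside to_dict; if you know the backend up front, branching on it explicitly is the stricter choice.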
- to_plaid(dataset: Any, idx: int, features: list[str] | None = None) -> plaid.Sample
Convert a dataset sample to PLAID Sample object.
- Parameters:
dataset – The dataset object containing the sample.
idx – Index of the sample to convert.
features – Optional list of feature names to include from the variable fields. If None, all variable features available for the backend are included. The selected features are processed according to self.constant_features and self.variable_features so that the resulting sample satisfies the CGNS conventions.
- Returns:
A PLAID Sample object.
- Return type:
plaid.Sample
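When only part of a split is needed, converting samples on demand avoids materializing every plaid.Sample at once. The sketch below is a hypothetical generator (iter_plaid_samples is not part of the module) that simply calls the documented to_plaid(dataset, idx, features) once per requested index; the indices you pass are assumed to be valid for the dataset.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from typing import Any


def iter_plaid_samples(
    converter: Any,
    dataset: Any,
    indices: Iterable[int],
    features: list[str] | None = None,
) -> Iterator[Any]:
    """Yield one PLAID Sample per index, converting lazily.

    `features` restricts the variable fields that are loaded; the converter
    may still adjust that selection to keep the sample CGNS-compliant.
    """
    for idx in indices:
        # Nothing is converted until the caller asks for the next sample.
        yield converter.to_plaid(dataset, idx, features)
```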
- sample_to_dict(sample: plaid.Sample) -> dict[float, dict[str, Any]]
Convert a PLAID Sample to dictionary format.
- Parameters:
sample – The PLAID Sample object to convert.
- Returns:
Sample data in dictionary format.
- Return type:
dict[float, dict[str, Any]]
- Raises:
ValueError – If called with CGNS backend.
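The dictionary format returned by to_dict and sample_to_dict is typed dict[float, dict[str, Any]]. This page does not say what the float key represents; the sketch below assumes it is a time value and flattens the nested mapping into (key, feature_name, value) records in a deterministic order. to_records is a hypothetical helper, and the input in the test is illustrative only.

```python
from __future__ import annotations

from typing import Any


def to_records(sample_dict: dict[float, dict[str, Any]]) -> list[tuple[float, str, Any]]:
    """Flatten {key: {feature_name: value}} into sorted (key, name, value) records.

    ASSUMPTION: the float outer key is treated as a time value.
    """
    records: list[tuple[float, str, Any]] = []
    for t in sorted(sample_dict):
        for name in sorted(sample_dict[t]):
            records.append((t, name, sample_dict[t][name]))
    return records
```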
- init_from_disk(local_dir: pathlib.Path | str, splits: list[str] | None = None) -> tuple[dict[str, Any], dict[str, Converter]]
Initialize dataset and converters from local disk.
This function loads a previously saved PLAID dataset from local disk, automatically detecting the backend and creating appropriate converters for sample transformation.
- Parameters:
local_dir – Path to the local directory containing the saved dataset.
splits – Optional list of split names to load converters for. If None, converters are created for all splits present in the dataset.
- Returns:
A tuple (datasetdict, converterdict), where datasetdict maps split names to dataset objects and converterdict maps split names to Converter objects.
- Return type:
tuple[dict[str, Any], dict[str, Converter]]
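A typical pattern is to open a saved dataset, pick the converter for one split, and turn selected indices into PLAID samples. The sketch below assumes the import path plaid.storage.reader (taken from this module's name); load_split_samples is a hypothetical helper, and the loader is injectable so the logic can be exercised with a stub when PLAID is not installed.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from pathlib import Path
from typing import Any


def load_split_samples(
    local_dir: Path | str,
    split: str,
    indices: Iterable[int],
    features: list[str] | None = None,
    init_fn: Callable[..., tuple[dict[str, Any], dict[str, Any]]] | None = None,
) -> list[Any]:
    """Open a saved dataset and convert the requested samples of one split."""
    if init_fn is None:
        # Real loader documented on this page, imported lazily.
        from plaid.storage.reader import init_from_disk as init_fn
    # Only request a converter for the split we need.
    datasetdict, converterdict = init_fn(local_dir, splits=[split])
    dataset, converter = datasetdict[split], converterdict[split]
    return [converter.to_plaid(dataset, idx, features) for idx in indices]
```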
- download_from_hub(repo_id: str, local_dir: str | pathlib.Path, split_ids: dict[str, Iterable[int]] | None = None, features: list[str] | None = None, overwrite: bool = False)
Download a PLAID dataset from Hugging Face Hub to local disk.
This function downloads a dataset from Hugging Face Hub, including data, metadata, infos, and problem definitions, saving everything to local disk.
- Parameters:
repo_id – Hugging Face repository ID (e.g., ‘username/dataset-name’).
local_dir – Local directory path where the dataset will be downloaded.
split_ids – Optional dictionary mapping split names to iterables of sample IDs to download.
features – Optional list of features to download.
overwrite – If True, overwrite the existing local directory.
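The split_ids argument lets you fetch only a subset of each split. The sketch below normalizes that mapping (deduplicated, sorted, non-negative IDs) before forwarding it to download_from_hub. The helpers normalize_split_ids and download_subset are hypothetical, the assumption that sample IDs are non-negative integers is ours, and "username/dataset-name" is only the placeholder repo ID used in these docs.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from pathlib import Path


def normalize_split_ids(split_ids: dict[str, Iterable[int]]) -> dict[str, list[int]]:
    """Deduplicate and sort sample IDs per split, rejecting negative IDs (assumption)."""
    normalized: dict[str, list[int]] = {}
    for split, ids in split_ids.items():
        unique = sorted(set(ids))
        if unique and unique[0] < 0:
            raise ValueError(f"negative sample ID in split {split!r}: {unique[0]}")
        normalized[split] = unique
    return normalized


def download_subset(
    repo_id: str,
    local_dir: str | Path,
    split_ids: dict[str, Iterable[int]],
    features: list[str] | None = None,
    download_fn: Callable[..., object] | None = None,
) -> dict[str, list[int]]:
    """Download only the listed samples (and optionally features) of a Hub dataset."""
    if download_fn is None:
        from plaid.storage.reader import download_from_hub as download_fn
    ids = normalize_split_ids(split_ids)
    # Keep any existing local copy; pass overwrite=True to replace it.
    download_fn(repo_id, local_dir, split_ids=ids, features=features, overwrite=False)
    return ids
```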
- init_streaming_from_hub(repo_id: str, split_ids: dict[str, Iterable[int]] | None = None, features: list[str] | None = None) -> tuple[dict[str, Any], dict[str, Converter]]
Initialize streaming datasets from Hugging Face Hub.
This function creates streaming dataset objects from a Hugging Face Hub repository, along with converters for sample transformation.
- Parameters:
repo_id – Hugging Face repository ID (e.g., ‘username/dataset-name’).
split_ids – Optional dictionary mapping split names to iterables of sample IDs to stream.
features – Optional list of features to stream.
- Returns:
A tuple (datasetdict, converterdict), where datasetdict maps split names to streaming dataset objects and converterdict maps split names to Converter objects.
- Return type:
tuple[dict[str, Any], dict[str, Converter]]
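For streaming, it is often convenient to get back just the dataset and converter of one split. The sketch below is a hypothetical wrapper (open_stream) around the documented init_streaming_from_hub(repo_id, split_ids, features); it restricts split_ids to the requested split when IDs are given and reports a clear error if that split is absent. How samples are then read from the streaming object is not covered here. The streaming function is injectable so the wrapper can be tested without network access.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from typing import Any


def open_stream(
    repo_id: str,
    split: str,
    ids: Iterable[int] | None = None,
    features: list[str] | None = None,
    init_fn: Callable[..., tuple[dict[str, Any], dict[str, Any]]] | None = None,
) -> tuple[Any, Any]:
    """Return (streaming_dataset, converter) for a single split of a Hub repo."""
    if init_fn is None:
        from plaid.storage.reader import init_streaming_from_hub as init_fn
    # None streams every sample; otherwise only the listed IDs of this split.
    split_ids = None if ids is None else {split: list(ids)}
    datasetdict, converterdict = init_fn(repo_id, split_ids=split_ids, features=features)
    if split not in datasetdict:
        raise KeyError(f"split {split!r} not found; available: {sorted(datasetdict)}")
    return datasetdict[split], converterdict[split]
```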