plaid.storage.cgns.reader
CGNS dataset reader module for PLAID.
This module provides functionality for reading and streaming CGNS datasets for the PLAID library. It includes utilities for loading datasets from local disk or streaming directly from Hugging Face Hub, with support for selective loading of splits and samples.
Key features:
- Local dataset loading from disk via the CGNSDataset class
- Streaming datasets from Hugging Face Hub
- Selective loading of splits and sample IDs
- Integration with PLAID Sample objects
Classes
CGNSDataset | CGNS dataset class for local disk access.
Functions
sample_generator | Generate Sample objects from a Hugging Face Hub repository.
create_CGNS_iterable_dataset | Create an iterable dataset from CGNS samples on Hugging Face Hub.
init_datasetdict_from_disk | Initialize a dataset dictionary from local disk.
download_datasetdict_from_hub | Download a CGNS dataset from Hugging Face Hub to local disk.
init_datasetdict_streaming_from_hub | Initialize streaming datasets from Hugging Face Hub.
Module Contents
- class CGNSDataset(path: str | pathlib.Path)
CGNS dataset class for local disk access.
This class represents a CGNS dataset stored on local disk, providing access to individual samples and associated metadata. It supports iteration over samples and attribute access to extra fields.
Initialize a CGNSDataset.
- Parameters:
path – Path to the dataset directory.
- sample_generator(repo_id: str, split: str, ids: Iterable[int]) -> Iterator[plaid.Sample]
Generate Sample objects from a Hugging Face Hub repository.
This function downloads individual samples from a CGNS dataset stored on Hugging Face Hub and yields PLAID Sample objects. Each sample is downloaded to a temporary directory and loaded as a Sample.
- Parameters:
repo_id – The Hugging Face repository ID (e.g., ‘username/dataset-name’).
split – The dataset split name (e.g., ‘train’, ‘test’).
ids – Iterable of sample IDs to generate.
- Yields:
Sample – A PLAID Sample object for each requested ID.
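The download-then-load pattern described above can be sketched with the standard library alone. This is a minimal illustration, not PLAID's implementation: `generate_samples`, `fetch_sample`, and `load_sample` are hypothetical names, and the two stand-in callables replace the Hub download and the Sample loading step so the sketch runs offline.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def generate_samples(
    ids: Iterable[int],
    fetch_sample: Callable[[int, Path], None],
    load_sample: Callable[[Path], T],
) -> Iterator[T]:
    """Yield one loaded object per ID, using a fresh temporary directory each time."""
    for sample_id in ids:
        # The temporary directory is removed when the `with` block exits, which
        # happens when the consumer requests the next item or closes the generator.
        with tempfile.TemporaryDirectory() as tmp:
            tmp_path = Path(tmp)
            fetch_sample(sample_id, tmp_path)  # hypothetical: download files for this ID
            yield load_sample(tmp_path)  # hypothetical: build the in-memory object


# Stand-ins (assumptions) so the sketch runs without network access.
def fake_fetch(sample_id: int, target: Path) -> None:
    (target / "meta.txt").write_text(f"sample {sample_id}")


def fake_load(source: Path) -> str:
    return (source / "meta.txt").read_text()


loaded = list(generate_samples([0, 2, 5], fake_fetch, fake_load))
```

One consequence of this design is that whatever `load_sample` returns must be fully in memory, since the files behind it disappear before the next item is produced.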
- create_CGNS_iterable_dataset(repo_id: str, split: str, ids: Iterable[int]) -> datasets.IterableDataset
Create an iterable dataset from CGNS samples on Hugging Face Hub.
This function creates a Hugging Face IterableDataset that streams CGNS samples from a repository. The dataset can be used for efficient streaming access without loading all samples into memory.
- Parameters:
repo_id – The Hugging Face repository ID (e.g., ‘username/dataset-name’).
split – The dataset split name (e.g., ‘train’, ‘test’).
ids – Iterable of sample IDs to include in the dataset.
- Returns:
A Hugging Face IterableDataset for streaming access.
- Return type:
IterableDataset
- init_datasetdict_from_disk(path: str | pathlib.Path) -> CGNSDatasetDict
Initialize a dataset dictionary from local disk.
This function scans a local directory structure and creates CGNSDataset objects for each split found in the data directory.
- Parameters:
path – Path to the root directory containing the dataset. Should contain a ‘data’ subdirectory with split subdirectories.
- Returns:
Dictionary mapping split names to CGNSDataset objects.
- Return type:
CGNSDatasetDict
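The directory scan described for init_datasetdict_from_disk can be illustrated with a small standalone sketch. It assumes only the layout stated above (a 'data' subdirectory containing one subdirectory per split); `discover_splits` is a hypothetical helper, and it returns plain paths, whereas the library function wraps each split in a CGNSDataset.

```python
import tempfile
from pathlib import Path
from typing import Dict, Union


def discover_splits(root: Union[str, Path]) -> Dict[str, Path]:
    """Map each split name to its directory under ``<root>/data``."""
    data_dir = Path(root) / "data"
    if not data_dir.is_dir():
        raise FileNotFoundError(f"expected a 'data' subdirectory in {root}")
    # Every subdirectory of 'data' is treated as one split; plain files are ignored.
    return {p.name: p for p in sorted(data_dir.iterdir()) if p.is_dir()}


# Build a throwaway layout to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    for split in ("train", "test"):
        (Path(tmp) / "data" / split).mkdir(parents=True)
    split_names = list(discover_splits(tmp))
```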
- download_datasetdict_from_hub(repo_id: str, local_dir: str | pathlib.Path, split_ids: dict[str, Iterable[int]] | None = None, features: list[str] | None = None, overwrite: bool = False) -> str
Download a CGNS dataset from Hugging Face Hub to local disk.
This function downloads selected parts or the entire CGNS dataset from a Hugging Face repository to a local directory. Supports selective downloading of specific splits and samples.
- Parameters:
repo_id – The Hugging Face repository ID (e.g., ‘username/dataset-name’).
local_dir – Local directory path where the dataset will be downloaded.
split_ids – Optional dictionary mapping split names to iterables of sample IDs to download. If None, downloads all splits and samples.
features – Optional list of features to download (currently unused).
overwrite – If True, removes existing local directory before downloading.
- Returns:
Path to the local directory where the dataset has been downloaded.
- Return type:
str
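A selective download might be requested as in the sketch below. The wrapper `download_subset`, the split names, and the sample IDs are illustrative assumptions; only the keyword names come from the signature above. The optional `download_fn` argument is a hypothetical hook that lets the example be exercised without PLAID installed or network access.

```python
from typing import Callable, Dict, Iterable, Optional


def download_subset(
    repo_id: str,
    local_dir: str,
    download_fn: Optional[Callable[..., str]] = None,
) -> str:
    """Download ten training samples and two test samples (illustrative IDs)."""
    if download_fn is None:
        # Imported lazily so the sketch can be loaded without PLAID installed.
        from plaid.storage.cgns.reader import download_datasetdict_from_hub

        download_fn = download_datasetdict_from_hub

    # Assumed split names and IDs; adapt them to the target repository.
    split_ids: Dict[str, Iterable[int]] = {"train": range(10), "test": [0, 3]}
    # overwrite=False leaves an existing local directory in place.
    return download_fn(repo_id, local_dir, split_ids=split_ids, overwrite=False)
```

Passing `split_ids=None` instead would, per the parameter description, download all splits and samples.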
- init_datasetdict_streaming_from_hub(repo_id: str, split_ids: dict[str, Iterable[int]] | None = None, features: list[str] | None = None) -> dict[str, datasets.IterableDataset]
Initialize streaming datasets from Hugging Face Hub.
This function creates a dictionary of streaming IterableDataset objects for CGNS data stored on Hugging Face Hub. Supports selective streaming of specific splits and samples.
- Parameters:
repo_id – The Hugging Face repository ID (e.g., ‘username/dataset-name’).
split_ids – Optional dictionary mapping split names to iterables of sample IDs to stream. If None, streams all available samples for each split.
features – Optional list of features to stream (currently unused).
- Returns:
Dictionary mapping split names to IterableDataset objects for streaming access.
- Return type:
dict[str, datasets.IterableDataset]
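Streaming a few samples from one split could look like the following sketch. `peek_split` is a hypothetical helper, and the `init_fn` hook is an assumption added so the example can run against a stand-in; the call itself uses only the `repo_id` and `split_ids` parameters documented above.

```python
from itertools import islice
from typing import Any, Callable, List, Optional


def peek_split(
    repo_id: str,
    split: str,
    n: int = 2,
    init_fn: Optional[Callable[..., dict]] = None,
) -> List[Any]:
    """Return the first ``n`` items streamed from one split."""
    if init_fn is None:
        # Imported lazily so the sketch can be loaded without PLAID installed.
        from plaid.storage.cgns.reader import init_datasetdict_streaming_from_hub

        init_fn = init_datasetdict_streaming_from_hub

    # Restrict the stream to the first n sample IDs of the requested split.
    streams = init_fn(repo_id, split_ids={split: range(n)})
    # islice stops iterating after n items.
    return list(islice(streams[split], n))
```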