plaid.storage.cgns.reader
=========================

.. py:module:: plaid.storage.cgns.reader

.. autoapi-nested-parse::

   CGNS dataset reader module for PLAID.

   This module provides functionality for reading and streaming CGNS datasets
   for the PLAID library. It includes utilities for loading datasets from local
   disk or streaming directly from Hugging Face Hub, with support for selective
   loading of splits and samples.

   Key features:

   - Local dataset loading from disk via CGNSDataset class
   - Streaming datasets from Hugging Face Hub
   - Selective loading of splits and sample IDs
   - Integration with PLAID Sample objects



Classes
-------

.. autoapisummary::

   plaid.storage.cgns.reader.CGNSDataset


Functions
---------

.. autoapisummary::

   plaid.storage.cgns.reader.sample_generator
   plaid.storage.cgns.reader.create_CGNS_iterable_dataset
   plaid.storage.cgns.reader.init_datasetdict_from_disk
   plaid.storage.cgns.reader.download_datasetdict_from_hub
   plaid.storage.cgns.reader.init_datasetdict_streaming_from_hub


Module Contents
---------------

.. py:class:: CGNSDataset(path: Union[str, pathlib.Path], **kwargs)

   CGNS dataset class for local disk access.

   This class represents a CGNS dataset stored on local disk, providing access
   to individual samples and associated metadata. It supports iteration over
   samples and attribute access to extra fields.

   Initialize a :class:`CGNSDataset`.

   :param path: Path to the dataset directory.
   :param \*\*kwargs: Optional keyword metadata to attach to the dataset instance.
                      All provided kwargs are stored in ``self._extra_fields`` and are
                      accessible as attributes via ``__getattr__`` / ``__setattr__``.


   .. py:attribute:: path


.. py:function:: sample_generator(repo_id: str, split: str, ids: Iterable[int]) -> Iterator[plaid.Sample]

   Generate Sample objects from a Hugging Face Hub repository.

   This function downloads individual samples from a CGNS dataset stored on
   Hugging Face Hub and yields PLAID Sample objects.

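
   The download-and-yield loop can be pictured with the minimal sketch below.
   It is not this module's implementation: ``sample_generator_sketch``,
   ``fetch``, ``load``, ``fake_fetch`` and ``fake_load`` are hypothetical names
   standing in for the Hub download and the PLAID ``Sample`` loading step.

   .. code-block:: python

      import tempfile
      from pathlib import Path
      from typing import Callable, Iterable, Iterator, TypeVar

      T = TypeVar("T")


      def sample_generator_sketch(
          ids: Iterable[int],
          fetch: Callable[[int, Path], Path],
          load: Callable[[Path], T],
      ) -> Iterator[T]:
          """Yield one loaded object per ID, each fetched into its own temporary directory."""
          for sample_id in ids:
              with tempfile.TemporaryDirectory() as tmp:
                  # Hypothetical download step: place one sample's files under ``tmp``.
                  sample_path = fetch(sample_id, Path(tmp))
                  # Load fully into memory; the directory is deleted when the block exits.
                  sample = load(sample_path)
              # Yield only after cleanup, so no temporary files outlive this iteration.
              yield sample


      # Stand-ins for the Hub download and the Sample loader (hypothetical).
      def fake_fetch(sample_id: int, dest: Path) -> Path:
          path = dest / f"sample_{sample_id}.txt"
          path.write_text(f"sample {sample_id}")
          return path


      def fake_load(path: Path) -> str:
          return path.read_text()


      print(list(sample_generator_sketch([0, 1, 2], fake_fetch, fake_load)))
      # ['sample 0', 'sample 1', 'sample 2']

   Yielding only after the ``with`` block exits means each temporary directory
   is removed before the caller receives the sample, so ``load`` must read
   everything it needs into memory.
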

   Each sample is downloaded to a temporary directory and loaded as a Sample.

   :param repo_id: The Hugging Face repository ID (e.g., 'username/dataset-name').
   :param split: The dataset split name (e.g., 'train', 'test').
   :param ids: Iterable of sample IDs to generate.

   :Yields: *Sample* -- A PLAID Sample object for each requested ID.


.. py:function:: create_CGNS_iterable_dataset(repo_id: str, split: str, ids: Iterable[int]) -> datasets.IterableDataset

   Create an iterable dataset from CGNS samples on Hugging Face Hub.

   This function creates a Hugging Face IterableDataset that streams CGNS
   samples from a repository. The dataset can be used for efficient streaming
   access without loading all samples into memory.

   :param repo_id: The Hugging Face repository ID (e.g., 'username/dataset-name').
   :param split: The dataset split name (e.g., 'train', 'test').
   :param ids: Iterable of sample IDs to include in the dataset.

   :returns: A Hugging Face IterableDataset for streaming access.
   :rtype: IterableDataset


.. py:function:: init_datasetdict_from_disk(path: Union[str, pathlib.Path]) -> dict[str, CGNSDataset]

   Initialize a dataset dictionary from local disk.

   This function scans a local directory structure and creates a CGNSDataset
   object for each split found in the data directory.

   :param path: Path to the root directory containing the dataset. It should
                contain a ``data`` subdirectory with one subdirectory per split.

   :returns: Dictionary mapping split names to CGNSDataset objects.
   :rtype: dict[str, CGNSDataset]


.. py:function:: download_datasetdict_from_hub(repo_id: str, local_dir: Union[str, pathlib.Path], split_ids: Optional[dict[str, Iterable[int]]] = None, features: Optional[list[str]] = None, overwrite: bool = False) -> None

   Download a CGNS dataset from Hugging Face Hub to local disk.

   This function downloads either selected parts of a CGNS dataset or the
   entire dataset from a Hugging Face repository to a local directory. It
   supports selective downloading of specific splits and samples.

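
   Selective downloading can be thought of as turning ``split_ids`` into
   file-name filters. The sketch below is hypothetical: ``build_allow_patterns``
   is not part of this module, and the repository layout
   ``data/<split>/<sample_id>/...`` is an assumption (only the ``data/<split>``
   level of the local layout is documented, in :func:`init_datasetdict_from_disk`).

   .. code-block:: python

      from typing import Iterable, Optional


      def build_allow_patterns(
          split_ids: Optional[dict[str, Iterable[int]]] = None,
      ) -> Optional[list[str]]:
          """Translate a ``split_ids`` selection into glob patterns (hypothetical layout)."""
          if split_ids is None:
              # No selection: return no filter, i.e. download every split and sample.
              return None
          patterns = []
          for split, ids in split_ids.items():
              for sample_id in ids:
                  # The trailing "/" keeps ID 1 from also matching IDs 10, 11, ...
                  patterns.append(f"data/{split}/{sample_id}/*")
          return patterns


      print(build_allow_patterns({"train": [0, 1], "test": range(1)}))
      # ['data/train/0/*', 'data/train/1/*', 'data/test/0/*']

   Returning ``None`` for ``split_ids=None`` mirrors the documented default of
   downloading everything. A list of this shape is what the ``allow_patterns``
   argument of ``huggingface_hub.snapshot_download`` accepts; whether this
   module relies on that function is not stated here.
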

   :param repo_id: The Hugging Face repository ID (e.g., 'username/dataset-name').
   :param local_dir: Local directory path where the dataset will be downloaded.
   :param split_ids: Optional dictionary mapping split names to iterables of sample IDs
                     to download. If ``None``, all splits and samples are downloaded.
   :param features: Optional list of features to download (currently unused).
   :param overwrite: If ``True``, the existing local directory is removed before downloading.


.. py:function:: init_datasetdict_streaming_from_hub(repo_id: str, split_ids: Optional[dict[str, Iterable[int]]] = None, features: Optional[list[str]] = None) -> dict[str, datasets.IterableDataset]

   Initialize streaming datasets from Hugging Face Hub.

   This function creates a dictionary of streaming IterableDataset objects for
   CGNS data stored on Hugging Face Hub. It supports selective streaming of
   specific splits and samples.

   :param repo_id: The Hugging Face repository ID (e.g., 'username/dataset-name').
   :param split_ids: Optional dictionary mapping split names to iterables of sample IDs
                     to stream. If ``None``, all available samples of each split are streamed.
   :param features: Optional list of features to stream (currently unused).

   :returns: Dictionary mapping split names to IterableDataset objects for streaming access.
   :rtype: dict[str, IterableDataset]
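
   The documented ``split_ids`` default and the lazy nature of streaming can be
   illustrated with plain generators. This is a minimal sketch under stated
   assumptions, not the module's implementation: ``init_streams_sketch``,
   ``available``, ``load`` and ``fake_load`` are hypothetical, and splits
   missing from ``split_ids`` are assumed to be left out of the result.

   .. code-block:: python

      from typing import Callable, Iterable, Iterator, Optional, TypeVar

      T = TypeVar("T")


      def init_streams_sketch(
          available: dict[str, list[int]],
          load: Callable[[str, int], T],
          split_ids: Optional[dict[str, Iterable[int]]] = None,
      ) -> dict[str, Iterator[T]]:
          """Return one lazy iterator per requested split."""
          if split_ids is None:
              # Documented default: stream every available sample of every split.
              split_ids = available
          # Assumption: splits absent from ``split_ids`` are simply left out.

          def stream(split: str, ids: Iterable[int]) -> Iterator[T]:
              for sample_id in ids:
                  # Nothing is loaded until the caller asks for the next item.
                  yield load(split, sample_id)

          return {split: stream(split, ids) for split, ids in split_ids.items()}


      calls = []


      def fake_load(split: str, sample_id: int) -> str:
          calls.append((split, sample_id))
          return f"{split}/{sample_id}"


      streams = init_streams_sketch({"train": [0, 1, 2], "test": [0]}, fake_load, {"train": [2]})
      print(calls)  # []
      print(list(streams["train"]))  # ['train/2']

   Plain generators can be consumed only once, whereas a
   ``datasets.IterableDataset`` restarts its underlying generator each time it
   is iterated, which is one reason to return that type for each split.
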