plaid.storage.reader
====================

.. py:module:: plaid.storage.reader

.. autoapi-nested-parse::

   PLAID storage reader module.

   This module provides high-level functions for loading PLAID datasets from local disk or
   from the Hugging Face Hub. It supports multiple storage backends (CGNS, HF Datasets, and
   Zarr) behind a unified interface for data access and conversion.

   Key features:

   - Unified interface for loading datasets across different backends
   - Local disk and streaming Hub access
   - Automatic backend detection and converter creation
   - Sample conversion between storage formats and PLAID objects


Classes
-------

.. autoapisummary::

   plaid.storage.reader.Converter


Functions
---------

.. autoapisummary::

   plaid.storage.reader.init_from_disk
   plaid.storage.reader.download_from_hub
   plaid.storage.reader.init_streaming_from_hub


Module Contents
---------------

.. py:class:: Converter(backend: str, flat_cst: Any, cgns_types: Any, variable_features: Any, constant_features: Any, num_samples: Any)

   Converter class for transforming samples between storage and PLAID formats.

   This class provides methods to convert samples between backend-specific storage formats
   and PLAID Sample objects. It handles the schema transformations and metadata required
   for proper data conversion.

   Initialize a :class:`Converter`.

   :param backend: The storage backend ('hf_datasets', 'zarr', or 'cgns').
   :param flat_cst: Flattened constants for the dataset.
   :param cgns_types: CGNS type information.
   :param variable_features: Set of variable feature names.
   :param constant_features: Set of constant feature names.
   :param num_samples: Mapping providing the number of samples for each split.


   .. py:attribute:: backend


   .. py:attribute:: backend_spec


   .. py:attribute:: flat_cst


   .. py:attribute:: cgns_types


   .. py:attribute:: variable_features


   .. py:attribute:: constant_features


   .. py:attribute:: num_samples


   .. py:method:: to_dict(dataset: Any, idx: int, features: Optional[list[str]] = None) -> dict[float, dict[str, Any]]

      Convert a dataset sample to dictionary format.

      :param dataset: The dataset object containing the sample.
      :param idx: Index of the sample to convert.
      :param features: Optional list of feature names to include from the variable fields.
                       If None, all variable features available for the backend are included.

      :returns: Sample data in dictionary format.
      :rtype: dict

      :raises ValueError: If called with CGNS backend.


   .. py:method:: to_plaid(dataset: Any, idx: int, features: Optional[list[str]] = None) -> plaid.Sample

      Convert a dataset sample to a PLAID Sample object.

      :param dataset: The dataset object containing the sample.
      :param idx: Index of the sample to convert.
      :param features: Optional list of feature names to include from the variable fields.
                       If None, all variable features available for the backend are included.
                       Features are reprocessed based on ``self.constant_features`` and
                       ``self.variable_features`` to satisfy the CGNS conventions.

      :returns: A PLAID Sample object.
      :rtype: Sample


   .. py:method:: sample_to_dict(sample: plaid.Sample) -> dict[float, dict[str, Any]]

      Convert a PLAID Sample to dictionary format.

      :param sample: The PLAID Sample object to convert.

      :returns: Sample data in dictionary format.
      :rtype: dict

      :raises ValueError: If called with CGNS backend.


   .. py:method:: sample_to_plaid(sample: plaid.Sample) -> plaid.Sample

      Convert a sample to PLAID format (identity function for most backends).

      :param sample: The sample object to convert.

      :returns: A PLAID Sample object.
      :rtype: Sample


   .. py:method:: plaid_to_dict(plaid_sample: plaid.Sample) -> dict[str, Any]

      Convert a PLAID Sample to dictionary format for storage.

      :param plaid_sample: The PLAID Sample object to convert.

      :returns: Sample data in dictionary format suitable for storage.
      :rtype: dict


.. py:function:: init_from_disk(local_dir: Union[pathlib.Path, str], splits: Optional[list[str]] = None) -> tuple[dict[str, Any], dict[str, Converter]]

   Initialize dataset and converters from local disk.

   This function loads a previously saved PLAID dataset from local disk, automatically
   detecting the backend and creating the appropriate converters for sample transformation.

   :param local_dir: Path to the local directory containing the saved dataset.
   :param splits: Optional list of split names to load converters for. If None, converters
                  are created for all splits present in the dataset.

   :returns: A tuple (datasetdict, converterdict), where datasetdict maps split names to
             dataset objects and converterdict maps split names to Converter objects.
   :rtype: tuple


.. py:function:: download_from_hub(repo_id: str, local_dir: Union[str, pathlib.Path], split_ids: Optional[dict[str, Iterable[int]]] = None, features: Optional[list[str]] = None, overwrite: bool = False)

   Download a PLAID dataset from Hugging Face Hub to local disk.

   This function downloads a dataset from Hugging Face Hub, including data, metadata,
   infos, and problem definitions, and saves everything to local disk.

   :param repo_id: Hugging Face repository ID (e.g., 'username/dataset-name').
   :param local_dir: Local directory path where the dataset will be downloaded.
   :param split_ids: Optional dictionary mapping split names to iterables of sample IDs to download.
   :param features: Optional list of features to download.
   :param overwrite: If True, overwrites the existing local directory.


.. py:function:: init_streaming_from_hub(repo_id: str, split_ids: Optional[dict[str, Iterable[int]]] = None, features: Optional[list[str]] = None) -> tuple[dict[str, Any], dict[str, Converter]]

   Initialize streaming datasets from Hugging Face Hub.

   This function creates streaming dataset objects from a Hugging Face Hub repository,
   along with converters for sample transformation.

   :param repo_id: Hugging Face repository ID (e.g., 'username/dataset-name').
   :param split_ids: Optional dictionary mapping split names to iterables of sample IDs to stream.
   :param features: Optional list of features to stream.

   :returns: A tuple (datasetdict, converterdict), where datasetdict maps split names to
             streaming dataset objects and converterdict maps split names to Converter objects.
   :rtype: tuple
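
The loaders above return a ``(datasetdict, converterdict)`` pair keyed by split name, and ``Converter.to_plaid`` turns one stored sample into a PLAID Sample. The sketch below shows one way the two might be combined. The helper names ``iter_plaid_samples`` and ``load_plaid_samples`` are hypothetical and not part of the module, and the sketch assumes the requested split is present in both returned dictionaries. Only the documented signatures of ``init_from_disk`` and ``to_plaid`` are relied on.

```python
from typing import Any, Iterator, List, Mapping, Optional, Sequence


def iter_plaid_samples(
    datasetdict: Mapping[str, Any],
    converterdict: Mapping[str, Any],
    split: str,
    indices: Sequence[int],
    features: Optional[List[str]] = None,
) -> Iterator[Any]:
    """Yield one converted sample per index (hypothetical helper, not part of PLAID)."""
    # Both dictionaries are keyed by split name, as described for the loaders.
    dataset = datasetdict[split]
    converter = converterdict[split]
    for idx in indices:
        # Documented call shape: to_plaid(dataset, idx, features=None).
        yield converter.to_plaid(dataset, idx, features=features)


def load_plaid_samples(
    local_dir: str,
    split: str,
    indices: Sequence[int],
    features: Optional[List[str]] = None,
) -> List[Any]:
    """Load a dataset from disk and convert selected samples (hypothetical helper)."""
    # Imported lazily so that iter_plaid_samples stays usable without PLAID installed.
    from plaid.storage.reader import init_from_disk

    # Assumption: restricting `splits` still leaves `split` available in datasetdict.
    datasetdict, converterdict = init_from_disk(local_dir, splits=[split])
    return list(iter_plaid_samples(datasetdict, converterdict, split, indices, features))
```

A call such as ``load_plaid_samples("path/to/dataset", "train", [0, 1])`` would then return the first two samples of a split named ``train``; both the path and the split name are placeholders. Since ``init_streaming_from_hub`` returns a pair of the same shape, ``iter_plaid_samples`` may apply there too, although how streaming datasets are indexed is not specified here.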