plaid.storage.reader
====================

.. py:module:: plaid.storage.reader

.. autoapi-nested-parse::

   PLAID storage reader module.

   This module provides high-level functions for loading PLAID datasets from local disk or
   Hugging Face Hub. It supports multiple storage backends including CGNS, HF Datasets,
   and Zarr, providing a unified interface for data access and conversion.

   Key features:

   - Unified interface for loading datasets across different backends
   - Local disk and streaming Hub access
   - Automatic backend detection and converter creation
   - Sample conversion between storage formats and PLAID objects


Classes
-------

.. autoapisummary::

   plaid.storage.reader.Converter


Functions
---------

.. autoapisummary::

   plaid.storage.reader.init_from_disk
   plaid.storage.reader.download_from_hub
   plaid.storage.reader.init_streaming_from_hub


Module Contents
---------------

.. py:class:: Converter(backend: str, flat_cst: Any, cgns_types: Any, variable_features: Any, constant_features: Any, num_samples: Any)

   Converter class for transforming samples between storage and PLAID formats.

   This class provides methods to convert samples between backend-specific storage formats
   and PLAID Sample objects. It handles the schema transformations and metadata required
   for proper data conversion.

   Initialize a :class:`Converter`.

   :param backend: The storage backend ('hf_datasets', 'zarr', or 'cgns').
   :param flat_cst: Flattened constants for the dataset.
   :param cgns_types: CGNS type information.
   :param variable_features: Set of variable feature names.
   :param constant_features: Set of constant feature names.
   :param num_samples: Mapping providing the number of samples for each split.


   .. py:attribute:: backend


   .. py:attribute:: backend_spec


   .. py:attribute:: flat_cst


   .. py:attribute:: cgns_types


   .. py:attribute:: variable_features


   .. py:attribute:: constant_features


   .. py:attribute:: num_samples


   .. py:method:: to_dict(dataset: Any, idx: int, features: Optional[list[str]] = None) -> dict[float, dict[str, Any]]

      Convert a dataset sample to dictionary format.

      :param dataset: The dataset object containing the sample.
      :param idx: Index of the sample to convert.
      :param features: Optional list of feature names to include from the variable fields.
                       If None, all variable features available for the backend are included.

      :returns: Sample data in dictionary format.
      :rtype: dict

      :raises ValueError: If called with CGNS backend.
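
      A minimal sketch of guarding against the documented ``ValueError``. The helper
      name ``sample_as_dict`` and the choice to return ``None`` for CGNS are
      illustrative assumptions, not part of the PLAID API. The sketch relies only on
      the ``backend`` attribute and the ``to_dict`` signature shown above.

      .. code-block:: python

         from typing import Any, Optional


         def sample_as_dict(
             converter: Any,
             dataset: Any,
             idx: int,
             features: Optional[list[str]] = None,
         ) -> Optional[dict]:
             """Return sample ``idx`` as a dict, or None when the backend is CGNS."""
             # to_dict is documented to raise ValueError for the CGNS backend,
             # so check the backend attribute before calling it.
             if converter.backend == "cgns":
                 return None
             return converter.to_dict(dataset, idx, features=features)

      A caller that needs CGNS data could fall back to ``to_plaid`` instead of
      returning ``None``.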


   .. py:method:: to_plaid(dataset: Any, idx: int, features: Optional[list[str]] = None) -> plaid.Sample

      Convert a dataset sample to PLAID Sample object.

      :param dataset: The dataset object containing the sample.
      :param idx: Index of the sample to convert.
      :param features: Optional list of feature names to include from the variable fields.
                       If None, all variable features available for the backend are included.
                       Features are reprocessed according to ``self.constant_features`` and ``self.variable_features`` to satisfy the CGNS conventions.

      :returns: A PLAID Sample object.
      :rtype: Sample


   .. py:method:: sample_to_dict(sample: plaid.Sample) -> dict[float, dict[str, Any]]

      Convert a PLAID Sample to dictionary format.

      :param sample: The PLAID Sample object to convert.

      :returns: Sample data in dictionary format.
      :rtype: dict

      :raises ValueError: If called with CGNS backend.


   .. py:method:: sample_to_plaid(sample: plaid.Sample) -> plaid.Sample

      Convert a sample to PLAID format (identity function for most backends).

      :param sample: The sample object to convert.

      :returns: A PLAID Sample object.
      :rtype: Sample


   .. py:method:: plaid_to_dict(plaid_sample: plaid.Sample) -> dict[str, Any]

      Convert a PLAID Sample to dictionary format for storage.

      :param plaid_sample: The PLAID Sample object to convert.

      :returns: Sample data in dictionary format suitable for storage.
      :rtype: dict


.. py:function:: init_from_disk(local_dir: Union[pathlib.Path, str], splits: Optional[list[str]] = None) -> tuple[dict[str, Any], dict[str, Converter]]

   Initialize dataset and converters from local disk.

   This function loads a previously saved PLAID dataset from local disk, automatically
   detecting the backend and creating appropriate converters for sample transformation.

   :param local_dir: Path to the local directory containing the saved dataset.
   :param splits: Optional list of split names to load converters for. If None, converters
                  are created for all splits present in the dataset.

   :returns: A tuple ``(datasetdict, converterdict)``, where ``datasetdict`` maps split names
             to dataset objects and ``converterdict`` maps split names to Converter objects.
   :rtype: tuple
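
   The returned pair can be used as in the following sketch, which loads one split and
   converts its first sample to a PLAID Sample. The helper ``load_first_sample`` and
   its ``loader`` parameter are hypothetical (the parameter exists only so the sketch
   can be exercised without PLAID installed). The sketch also assumes that the
   requested split exists in the saved dataset.

   .. code-block:: python

      from pathlib import Path
      from typing import Any, Callable, Optional, Union


      def load_first_sample(
          local_dir: Union[Path, str],
          split: str,
          loader: Optional[Callable[..., tuple]] = None,
      ) -> Any:
          """Load a saved dataset and convert sample 0 of ``split`` to PLAID."""
          if loader is None:
              # Imported lazily; defaults to the function documented above.
              from plaid.storage.reader import init_from_disk as loader

          # Both dictionaries are keyed by split name.
          datasetdict, converterdict = loader(local_dir, splits=[split])
          dataset = datasetdict[split]
          converter = converterdict[split]
          return converter.to_plaid(dataset, 0)

   For example, ``load_first_sample("path/to/dataset", "train")`` would return the
   first sample of a split named ``train``, if the dataset has one.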


.. py:function:: download_from_hub(repo_id: str, local_dir: Union[str, pathlib.Path], split_ids: Optional[dict[str, Iterable[int]]] = None, features: Optional[list[str]] = None, overwrite: bool = False)

   Download a PLAID dataset from Hugging Face Hub to local disk.

   This function downloads a dataset from Hugging Face Hub, including data, metadata,
   infos, and problem definitions, saving everything to local disk.

   :param repo_id: Hugging Face repository ID (e.g., 'username/dataset-name').
   :param local_dir: Local directory path where the dataset will be downloaded.
   :param split_ids: Optional dictionary mapping split names to iterables of sample IDs to download.
   :param features: Optional list of features to download.
   :param overwrite: If True, overwrites existing local directory.
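
   A minimal sketch of a download-then-load workflow. It assumes, without confirmation
   from this page, that a directory written by ``download_from_hub`` can be read back
   with ``init_from_disk``. The helper ``fetch_and_load`` and its ``downloader`` and
   ``loader`` parameters are hypothetical and exist only to keep the sketch testable
   without network access.

   .. code-block:: python

      from pathlib import Path
      from typing import Any, Callable, Iterable, Optional, Union


      def fetch_and_load(
          repo_id: str,
          local_dir: Union[str, Path],
          split_ids: Optional[dict[str, Iterable[int]]] = None,
          overwrite: bool = False,
          downloader: Optional[Callable[..., Any]] = None,
          loader: Optional[Callable[..., tuple]] = None,
      ) -> tuple:
          """Download a Hub dataset to ``local_dir``, then load it from disk."""
          if downloader is None or loader is None:
              # Imported lazily; defaults to the functions documented on this page.
              from plaid.storage import reader

              downloader = downloader or reader.download_from_hub
              loader = loader or reader.init_from_disk

          # split_ids restricts the download to the listed sample IDs per split.
          downloader(repo_id, local_dir, split_ids=split_ids, overwrite=overwrite)
          # Assumption: the downloaded directory is a valid input for the loader.
          return loader(local_dir)

   Keeping ``overwrite=False`` as the default avoids replacing an existing local copy
   by accident.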


.. py:function:: init_streaming_from_hub(repo_id: str, split_ids: Optional[dict[str, Iterable[int]]] = None, features: Optional[list[str]] = None) -> tuple[dict[str, Any], dict[str, Converter]]

   Initialize streaming datasets from Hugging Face Hub.

   This function creates streaming dataset objects from a Hugging Face Hub repository,
   along with converters for sample transformation.

   :param repo_id: Hugging Face repository ID (e.g., 'username/dataset-name').
   :param split_ids: Optional dictionary mapping split names to iterables of sample IDs to stream.
   :param features: Optional list of features to stream.

   :returns: A tuple ``(datasetdict, converterdict)``, where ``datasetdict`` maps split names
             to streaming dataset objects and ``converterdict`` maps split names to Converter objects.
   :rtype: tuple
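
   A hedged sketch of consuming a streaming split. It assumes that each streaming
   dataset is iterable and that the items it yields can be passed to
   ``Converter.sample_to_plaid``; neither point is stated on this page. The helper
   ``stream_head`` and its ``streamer`` parameter are hypothetical.

   .. code-block:: python

      from itertools import islice
      from typing import Any, Callable, Optional


      def stream_head(
          repo_id: str,
          split: str,
          n: int = 3,
          streamer: Optional[Callable[..., tuple]] = None,
      ) -> list[Any]:
          """Convert the first ``n`` streamed items of ``split`` to PLAID samples."""
          if streamer is None:
              # Imported lazily; defaults to the function documented above.
              from plaid.storage.reader import init_streaming_from_hub as streamer

          datasetdict, converterdict = streamer(repo_id)
          converter = converterdict[split]
          # Assumption: iterating the streaming dataset yields one raw sample at a time.
          # islice stops after n items, so the full split is never materialized.
          return [converter.sample_to_plaid(raw) for raw in islice(datasetdict[split], n)]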