plaid.storage.reader
====================

.. py:module:: plaid.storage.reader

.. autoapi-nested-parse::

   PLAID storage reader module.

   This module provides high-level functions for loading PLAID datasets from local disk or
   Hugging Face Hub. It supports multiple storage backends including CGNS, HF Datasets, and Zarr,
   providing a unified interface for data access and conversion.

   Key features:

   - Unified interface for loading datasets across different backends
   - Local disk and streaming Hub access
   - Automatic backend detection and converter creation
   - Sample conversion between storage formats and PLAID objects


Attributes
----------

.. autoapisummary::

   plaid.storage.reader.init_datasetdict_from_disk
   plaid.storage.reader.download_datasetdict_from_hub
   plaid.storage.reader.init_datasetdict_streaming_from_hub
   plaid.storage.reader.to_var_sample_dict
   plaid.storage.reader.sample_to_var_sample_dict


Classes
-------

.. autoapisummary::

   plaid.storage.reader.Converter


Functions
---------

.. autoapisummary::

   plaid.storage.reader.init_from_disk
   plaid.storage.reader.download_from_hub
   plaid.storage.reader.init_streaming_from_hub


Module Contents
---------------

.. py:data:: init_datasetdict_from_disk

.. py:data:: download_datasetdict_from_hub

.. py:data:: init_datasetdict_streaming_from_hub

.. py:data:: to_var_sample_dict

.. py:data:: sample_to_var_sample_dict

.. py:class:: Converter(backend: str, flat_cst: Any, cgns_types: Any, variable_schema: Any, constant_schema: Any)

   Converter class for transforming samples between storage and PLAID formats.

   This class provides methods to convert samples between backend-specific storage formats
   and PLAID Sample objects. It handles the schema transformations and metadata required
   for proper data conversion.

   Initialize a :class:`Converter`.

   :param backend: The storage backend ('hf_datasets', 'zarr', or 'cgns').
   :param flat_cst: Flattened constants for the dataset.
   :param cgns_types: CGNS type information.
   :param variable_schema: Schema for variable fields.
   :param constant_schema: Schema for constant fields.


   .. py:attribute:: backend


   .. py:attribute:: flat_cst


   .. py:attribute:: cgns_types


   .. py:attribute:: variable_schema


   .. py:attribute:: constant_schema


   .. py:method:: to_dict(dataset: Any, idx: int) -> dict[str, Any]

      Convert a dataset sample to dictionary format.

      :param dataset: The dataset object containing the sample.
      :param idx: Index of the sample to convert.

      :returns: Sample data in dictionary format.
      :rtype: dict

      :raises ValueError: If called with CGNS backend.


   .. py:method:: to_plaid(dataset: Any, idx: int) -> plaid.Sample

      Convert a dataset sample to PLAID Sample object.

      :param dataset: The dataset object containing the sample.
      :param idx: Index of the sample to convert.

      :returns: A PLAID Sample object.
      :rtype: Sample


   .. py:method:: sample_to_dict(sample: plaid.Sample) -> dict[str, Any]

      Convert a PLAID Sample to dictionary format.

      :param sample: The PLAID Sample object to convert.

      :returns: Sample data in dictionary format.
      :rtype: dict

      :raises ValueError: If called with CGNS backend.


   .. py:method:: sample_to_plaid(sample: plaid.Sample) -> plaid.Sample

      Convert a sample to PLAID format (identity function for most backends).

      :param sample: The sample object to convert.

      :returns: A PLAID Sample object.
      :rtype: Sample


   .. py:method:: plaid_to_dict(plaid_sample: plaid.Sample) -> dict[str, Any]

      Convert a PLAID Sample to dictionary format for storage.

      :param plaid_sample: The PLAID Sample object to convert.

      :returns: Sample data in dictionary format suitable for storage.
      :rtype: dict


.. py:function:: init_from_disk(local_dir: Union[pathlib.Path, str]) -> tuple[dict[str, Any], dict[str, Converter]]

   Initialize dataset and converters from local disk.

   This function loads a previously saved PLAID dataset from local disk,
   automatically detecting the backend and creating appropriate converters
   for sample transformation.

   :param local_dir: Path to the local directory containing the saved dataset.

   :returns:

             A tuple containing (datasetdict, converterdict) where datasetdict maps
             split names to dataset objects and converterdict maps split names to
             Converter objects.
   :rtype: tuple


.. py:function:: download_from_hub(repo_id: str, local_dir: Union[str, pathlib.Path], split_ids: Optional[dict[str, Iterable[int]]] = None, features: Optional[list[str]] = None, overwrite: bool = False)

   Download a PLAID dataset from Hugging Face Hub to local disk.

   This function downloads a dataset from Hugging Face Hub, including data,
   metadata, infos, and problem definitions, saving everything to local disk.

   :param repo_id: Hugging Face repository ID (e.g., 'username/dataset-name').
   :param local_dir: Local directory path where the dataset will be downloaded.
   :param split_ids: Optional dictionary mapping split names to iterables of sample IDs to download.
   :param features: Optional list of features to download.
   :param overwrite: If True, overwrites existing local directory.


.. py:function:: init_streaming_from_hub(repo_id: str, split_ids: Optional[dict[str, Iterable[int]]] = None, features: Optional[list[str]] = None) -> tuple[dict[str, Any], dict[str, Converter]]

   Initialize streaming datasets from Hugging Face Hub.

   This function creates streaming dataset objects from a Hugging Face Hub repository,
   along with converters for sample transformation.

   :param repo_id: Hugging Face repository ID (e.g., 'username/dataset-name').
   :param split_ids: Optional dictionary mapping split names to iterables of sample IDs to stream.
   :param features: Optional list of features to stream.

   :returns:

             A tuple containing (datasetdict, converterdict) where datasetdict maps
             split names to streaming dataset objects and converterdict maps split names
             to Converter objects.
   :rtype: tuple
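

The loaders above return a ``(datasetdict, converterdict)`` pair keyed by split name, and ``Converter.to_plaid(dataset, idx)`` turns one stored sample into a PLAID Sample. The sketch below shows one way the two dictionaries might be combined to iterate over a split. It is an illustration only: ``iter_plaid_samples`` and ``_StubConverter`` are hypothetical names that are not part of ``plaid``, the stub merely mimics the documented ``to_plaid`` signature so the snippet runs without the library, and the helper assumes the dataset object supports ``len()``, which may not hold for streaming datasets.

```python
from typing import Any, Dict, Iterator, Optional


def iter_plaid_samples(
    datasetdict: Dict[str, Any],
    converterdict: Dict[str, Any],
    split: str,
    limit: Optional[int] = None,
) -> Iterator[Any]:
    """Yield converted samples for one split (hypothetical helper).

    Relies only on the documented ``to_plaid(dataset, idx)`` method.
    """
    dataset = datasetdict[split]
    converter = converterdict[split]
    # Assumption: the dataset supports len(); streaming datasets may not.
    count = len(dataset)
    if limit is not None:
        count = min(count, limit)
    for idx in range(count):
        yield converter.to_plaid(dataset, idx)


class _StubConverter:
    """Hypothetical stand-in for Converter, used only to make this runnable."""

    def to_plaid(self, dataset: Any, idx: int) -> Any:
        # A real Converter would build a plaid.Sample here.
        return {"sample": dataset[idx]}


# Stand-in data shaped like the documented return value: split name -> object.
datasetdict = {"train": [10, 20, 30]}
converterdict = {"train": _StubConverter()}

samples = list(iter_plaid_samples(datasetdict, converterdict, "train", limit=2))
```

With the real library, the two dictionaries would instead come from a call such as ``datasetdict, converterdict = init_from_disk(local_dir)``, where ``local_dir`` points at a previously saved dataset.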