plaid.storage.common.reader
===========================

.. py:module:: plaid.storage.common.reader

.. autoapi-nested-parse::

   Common storage reader utilities.

   This module provides common utilities for reading dataset metadata, problem definitions,
   and other auxiliary files from disk or downloading them from Hugging Face Hub.


Functions
---------

.. autoapisummary::

   plaid.storage.common.reader.load_infos_from_disk
   plaid.storage.common.reader.load_problem_definitions_from_disk
   plaid.storage.common.reader.load_constants_from_disk
   plaid.storage.common.reader.load_metadata_from_disk
   plaid.storage.common.reader.load_infos_from_hub
   plaid.storage.common.reader.load_problem_definitions_from_hub
   plaid.storage.common.reader.load_metadata_from_hub


Module Contents
---------------

.. py:function:: load_infos_from_disk(path: Union[str, pathlib.Path]) -> dict[str, Any]

   Load dataset information from a YAML file stored on disk.

   :param path: Directory path containing the ``infos.yaml`` file.
   :type path: Union[str, Path]

   :returns: Dictionary containing dataset infos.
   :rtype: dict[str, dict[str, str]]


.. py:function:: load_problem_definitions_from_disk(path: Union[str, pathlib.Path]) -> dict[str, plaid.ProblemDefinition]

   Load ProblemDefinitions from a local dataset directory.

   This function reads all serialized ``ProblemDefinition`` files located in the
   ``problem_definitions/`` subdirectory under ``path`` and reconstructs them
   into ``ProblemDefinition`` objects. Each file is loaded using
   ``ProblemDefinition._load_from_file_`` and inserted into a dictionary keyed
   by the problem definition name.

   Expected local layout::

       /
           problem_definitions/
               ...

   :param path: Root dataset directory containing the ``problem_definitions/`` folder.
   :type path: Union[str, Path]

   :returns: Mapping from problem definition names to loaded ``ProblemDefinition`` objects.
   :rtype: dict[str, ProblemDefinition]

   :raises ValueError: If the ``problem_definitions/`` directory does not exist.


.. py:function:: load_constants_from_disk(path)

   Load constant features stored under a dataset's "constants" directory.

   The function expects the following layout under /constants/:

   - one folder per split (e.g. "train", "test", ...) each containing:

     * layout.json : mapping constant_name -> {'offset': int, 'shape': [..]} or None
     * constant_schema.yaml : YAML describing dtype for each constant (dtype string or "string")
     * data.mmap : raw bytes memory-mapped file containing packed constant data

   :param path: Root dataset directory that contains the "constants" folder.
   :type path: str | Path

   :returns: - flat_cst (dict[str, dict[str, Any]]): Mapping split -> {constant_name: numpy array | None}.

               - Numeric constants are returned as ``np.memmap`` arrays backed by
                 ``data.mmap`` in the dataset directory.
               - String constants are returned as 1-element numpy arrays of Python str
                 decoded using ASCII.
               - If layout entry for a key is None, the value is returned as None.
             - constant_schema (dict[str, dict[str, Any]]): Mapping split -> loaded constant schema (from YAML).
   :rtype: tuple

   :raises FileNotFoundError: If the expected "constants" directory or required files are missing.


.. py:function:: load_metadata_from_disk(path: Union[str, pathlib.Path]) -> tuple[dict[str, Any], dict[str, Any], dict[str, Any], dict[str, Any]]

   Load dataset metadata from disk.

   :param path: Directory path containing the metadata files.
   :type path: Union[str, Path]

   :returns: - flat_cst: constant features dictionary (numeric constants kept as file-backed ``np.memmap``)
             - variable_schema: variable schema dictionary
             - constant_schema: constant schema dictionary
             - cgns_types: CGNS types dictionary
   :rtype: tuple[dict[str, Any], dict[str, Any], dict[str, Any], dict[str, Any]]


.. py:function:: load_infos_from_hub(repo_id: str) -> dict[str, Any]

   Load dataset infos from the Hugging Face Hub.

   Downloads the infos.yaml file from the specified repository and parses it as a dictionary.

   :param repo_id: The repository ID on the Hugging Face Hub.
   :type repo_id: str

   :returns: Dictionary containing dataset infos.
   :rtype: dict[str, dict[str, str]]


.. py:function:: load_problem_definitions_from_hub(repo_id: str) -> Optional[dict[str, plaid.ProblemDefinition]]

   Load ProblemDefinitions from Hugging Face Hub.

   :param repo_id: The repository ID on the Hugging Face Hub.
   :type repo_id: str

   :returns: Mapping from problem definition names to loaded ``ProblemDefinition`` objects, or ``None`` if not found.
   :rtype: Optional[dict[str, ProblemDefinition]]


.. py:function:: load_metadata_from_hub(repo_id: str) -> tuple[dict[str, Any], dict[str, Any], dict[str, Any], dict[str, Any]]

   Load dataset metadata from Hugging Face Hub.

   :param repo_id: The repository ID on the Hugging Face Hub.
   :type repo_id: str

   :returns: - flat_cst: constant features dictionary (numeric constants are materialized to in-memory arrays)
             - variable_schema: variable schema dictionary
             - constant_schema: constant schema dictionary
             - cgns_types: CGNS types dictionary
   :rtype: tuple[dict[str, Any], dict[str, Any], dict[str, Any], dict[str, Any]]
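
The per-split layout described for ``load_constants_from_disk`` (a ``layout.json`` holding an offset and shape for each constant, plus a packed ``data.mmap`` file) can be illustrated with a small round-trip. This is a minimal sketch and not the library's implementation. It assumes offsets are counted in bytes and that every constant is little-endian float64. It uses the standard-library ``mmap`` and ``struct`` modules in place of ``np.memmap``, and it leaves out ``constant_schema.yaml`` because parsing YAML needs a third-party package. The helper names ``write_split``, ``read_split`` and ``read_constants`` are hypothetical.

```python
import json
import mmap
import struct
import tempfile
from pathlib import Path


def write_split(split_dir, constants):
    """Pack constants into data.mmap and describe them in layout.json.

    `constants` maps a name to a flat list of floats, or to None.
    Assumption: values are stored as little-endian float64, offsets in bytes.
    """
    split_dir.mkdir(parents=True, exist_ok=True)
    layout, blob = {}, bytearray()
    for name, values in constants.items():
        if values is None:
            layout[name] = None  # a None layout entry carries no data
            continue
        layout[name] = {"offset": len(blob), "shape": [len(values)]}
        blob += struct.pack(f"<{len(values)}d", *values)
    (split_dir / "layout.json").write_text(json.dumps(layout))
    (split_dir / "data.mmap").write_bytes(bytes(blob))


def read_split(split_dir):
    """Return {constant_name: list of floats, or None} for one split folder."""
    layout = json.loads((split_dir / "layout.json").read_text())
    out = {}
    with open(split_dir / "data.mmap", "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            for name, entry in layout.items():
                if entry is None:
                    out[name] = None
                    continue
                count = 1
                for dim in entry["shape"]:
                    count *= dim
                # Decode `count` float64 values starting at the recorded offset.
                out[name] = list(
                    struct.unpack_from(f"<{count}d", buf, entry["offset"])
                )
    return out


def read_constants(root):
    """Return {split: {constant_name: values}} for every folder under constants/."""
    constants_dir = Path(root) / "constants"
    return {
        d.name: read_split(d)
        for d in sorted(constants_dir.iterdir())
        if d.is_dir()
    }


# Round-trip demo in a temporary dataset directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_split(
        root / "constants" / "train",
        {"gravity": [0.0, 0.0, -9.81], "unused": None},
    )
    flat_cst = read_constants(root)
```

In this sketch the values are copied into Python lists before the file is closed. The documented disk loader instead keeps numeric constants as file-backed ``np.memmap`` arrays, which avoids that copy.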