plaid.storage.common.reader

Common storage reader utilities.

This module provides utilities for reading dataset metadata, problem definitions, and other auxiliary files, either from disk or by downloading them from the Hugging Face Hub.

Functions

load_infos_from_disk(→ dict[str, Any])

Load dataset information from a YAML file stored on disk.

load_problem_definitions_from_disk(→ dict[str, ...)

Load ProblemDefinitions from a local dataset directory.

load_constants_from_disk(path)

Load constant features stored under a dataset's "constants" directory.

load_metadata_from_disk(→ tuple[dict[str, Any], ...)

Load dataset metadata from disk.

load_infos_from_hub(→ dict[str, Any])

Load dataset infos from the Hugging Face Hub.

load_problem_definitions_from_hub(→ Optional[dict[str, ...)

Load ProblemDefinitions from Hugging Face Hub.

load_metadata_from_hub(→ tuple[dict[str, Any], ...)

Load dataset metadata from Hugging Face Hub.

Module Contents

load_infos_from_disk(path: str | pathlib.Path) → dict[str, Any]

Load dataset information from a YAML file stored on disk.

Parameters:

path (Union[str, Path]) – Directory path containing the infos.yaml file.

Returns:

Dictionary containing dataset infos.

Return type:

dict[str, dict[str, str]]

load_problem_definitions_from_disk(path: str | pathlib.Path) → dict[str, plaid.ProblemDefinition]

Load ProblemDefinitions from a local dataset directory.

This function reads all serialized ProblemDefinition files located in the problem_definitions/ subdirectory under path and reconstructs them into ProblemDefinition objects.

Each file is loaded using ProblemDefinition._load_from_file_ and inserted into a dictionary keyed by the problem definition name.

Expected local layout:
<path>/
  problem_definitions/
    <problem_name_1>
    <problem_name_2>
    …

Parameters:

path (Union[str, Path]) – Root dataset directory containing the problem_definitions/ folder.

Returns:

Mapping from problem definition names to loaded ProblemDefinition objects.

Return type:

dict[str, ProblemDefinition]

Raises:

ValueError – If the problem_definitions/ directory does not exist.
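The directory walk described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `load_definitions_sketch` and its `load_one` callback are hypothetical names standing in for the real per-file loader, and keying each entry by its name without extension is an assumption.

```python
from pathlib import Path
from typing import Any, Callable, Union


def load_definitions_sketch(
    path: Union[str, Path],
    load_one: Callable[[Path], Any],
) -> dict[str, Any]:
    """Collect one object per entry under <path>/problem_definitions/.

    `load_one` is a hypothetical stand-in for the real per-file loader.
    """
    pb_def_dir = Path(path) / "problem_definitions"
    if not pb_def_dir.is_dir():
        # Mirrors the documented behaviour: a missing directory is a ValueError.
        raise ValueError(f"No problem_definitions/ directory under {path}")

    definitions: dict[str, Any] = {}
    for entry in sorted(pb_def_dir.iterdir()):
        # Assumption: the key is the entry name without any file extension.
        definitions[entry.stem] = load_one(entry)
    return definitions
```

Passing the loader as a callback keeps the sketch independent of plaid, so the directory handling can be tried out on its own.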

load_constants_from_disk(path)

Load constant features stored under a dataset’s “constants” directory.

The function expects the following layout under <path>/constants/:
  • one folder per split (e.g. “train”, “test”, …) each containing:

    • layout.json : mapping constant_name -> {‘offset’: int, ‘shape’: [..]} or None

    • constant_schema.yaml : YAML describing dtype for each constant (dtype string or “string”)

    • data.mmap : raw bytes memory-mapped file containing packed constant data

Parameters:

path (str | Path) – Root dataset directory that contains the “constants” folder.

Returns:

flat_cst (dict[str, dict[str, Any]]): Mapping split -> {constant_name: numpy array | None}.
  • Numeric constants are returned as np.memmap arrays backed by data.mmap in the dataset directory.

  • String constants are returned as 1-element numpy arrays of Python str decoded using ASCII.

  • If the layout entry for a key is None, the value is returned as None.

constant_schema (dict[str, dict[str, Any]]): Mapping split -> loaded constant schema (from YAML).

Return type:

tuple

Raises:

FileNotFoundError – If the expected “constants” directory or required files are missing.
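To illustrate how layout.json and data.mmap fit together for a single split, here is a minimal sketch covering numeric constants and None entries only. It rests on assumptions the text above does not confirm: `offset` is taken to be a byte offset, and `schema` is a simplified {constant_name: dtype string} mapping rather than the actual contents of constant_schema.yaml. The helper name `read_split_constants` is hypothetical, and string constants are left out.

```python
import json
from pathlib import Path
from typing import Optional

import numpy as np


def read_split_constants(
    split_dir: Path, schema: dict[str, str]
) -> dict[str, Optional[np.ndarray]]:
    """Map the numeric constants of one split onto data.mmap without copying.

    Assumptions: `offset` is in bytes, and `schema` is a simplified
    {constant_name: dtype string} view of constant_schema.yaml.
    """
    layout = json.loads((split_dir / "layout.json").read_text())
    constants: dict[str, Optional[np.ndarray]] = {}
    for name, entry in layout.items():
        if entry is None:
            # A None layout entry yields a None value.
            constants[name] = None
            continue
        # Read-only, file-backed view into the packed data file.
        constants[name] = np.memmap(
            split_dir / "data.mmap",
            dtype=np.dtype(schema[name]),
            mode="r",
            offset=entry["offset"],
            shape=tuple(entry["shape"]),
        )
    return constants
```

Because each value is a view into data.mmap, the arrays stay valid only while that file remains in place.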

load_metadata_from_disk(path: str | pathlib.Path) → tuple[dict[str, Any], dict[str, Any], dict[str, Any], dict[str, Any]]

Load dataset metadata from disk.

Parameters:

path (Union[str, Path]) – Directory path containing the metadata files.

Returns:

  • flat_cst: constant features dictionary (numeric constants kept as file-backed np.memmap)

  • variable_schema: variable schema dictionary

  • constant_schema: constant schema dictionary

  • cgns_types: CGNS types dictionary

Return type:

tuple[dict[str, Any], dict[str, Any], dict[str, Any], dict[str, Any]]

load_infos_from_hub(repo_id: str) → dict[str, Any]

Load dataset infos from the Hugging Face Hub.

Downloads the infos.yaml file from the specified repository and parses it as a dictionary.

Parameters:

repo_id (str) – The repository ID on the Hugging Face Hub.

Returns:

Dictionary containing dataset infos.

Return type:

dict[str, dict[str, str]]

load_problem_definitions_from_hub(repo_id: str) → dict[str, plaid.ProblemDefinition] | None

Load ProblemDefinitions from Hugging Face Hub.

Parameters:

repo_id (str) – The repository ID on the Hugging Face Hub.

Returns:

Mapping from problem definition names to loaded ProblemDefinition objects, or None if not found.

Return type:

Optional[dict[str, ProblemDefinition]]

load_metadata_from_hub(repo_id: str) → tuple[dict[str, Any], dict[str, Any], dict[str, Any], dict[str, Any]]

Load dataset metadata from Hugging Face Hub.

Parameters:

repo_id (str) – The repository ID on the Hugging Face Hub.

Returns:

  • flat_cst: constant features dictionary (numeric constants are materialized to in-memory arrays)

  • variable_schema: variable schema dictionary

  • constant_schema: constant schema dictionary

  • cgns_types: CGNS types dictionary

Return type:

tuple[dict[str, Any], dict[str, Any], dict[str, Any], dict[str, Any]]
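The two metadata loaders differ in how numeric constants are held: load_metadata_from_disk keeps them as file-backed np.memmap arrays, while load_metadata_from_hub returns in-memory arrays. The sketch below shows one way to get the in-memory form from a split -> {constant_name: value} mapping. The helper name `materialize_constants` is hypothetical, and this is not necessarily how the hub loader does it internally.

```python
from typing import Any

import numpy as np


def materialize_constants(
    flat_cst: dict[str, dict[str, Any]],
) -> dict[str, dict[str, Any]]:
    """Copy every np.memmap value into a plain in-memory ndarray.

    Non-memmap values (including None) are passed through unchanged.
    """
    return {
        split: {
            # np.array copies the data and returns a base ndarray,
            # so the result no longer depends on the backing file.
            name: np.array(value) if isinstance(value, np.memmap) else value
            for name, value in constants.items()
        }
        for split, constants in flat_cst.items()
    }
```

Materializing costs memory proportional to the size of the constants, but the resulting arrays remain usable after the source files are moved or deleted.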