plaid.storage.common.reader
Common storage reader utilities.
This module provides common utilities for reading dataset metadata, problem definitions, and other auxiliary files from disk, or for downloading them from the Hugging Face Hub.
Functions
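| load_infos_from_disk | Load dataset information from a YAML file stored on disk. |
| load_problem_definitions_from_disk | Load ProblemDefinitions from a local dataset directory. |
| load_constants_from_disk | Load constant features stored under a dataset's "constants" directory. |
| load_metadata_from_disk | Load dataset metadata from disk. |
| load_infos_from_hub | Load dataset infos from the Hugging Face Hub. |
| load_problem_definitions_from_hub | Load ProblemDefinitions from Hugging Face Hub. |
| load_metadata_from_hub | Load dataset metadata from Hugging Face Hub. |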
Module Contents
- load_infos_from_disk(path: str | pathlib.Path) → dict[str, Any]
Load dataset information from a YAML file stored on disk.
- load_problem_definitions_from_disk(path: str | pathlib.Path) → dict[str, plaid.ProblemDefinition]
Load ProblemDefinitions from a local dataset directory.
This function reads all serialized ProblemDefinition files located in the problem_definitions/ subdirectory under path and reconstructs them into ProblemDefinition objects.
Each file is loaded using ProblemDefinition._load_from_file_ and inserted into a dictionary keyed by the problem definition name.
- Expected local layout:
<path>/
    problem_definitions/
        <problem_name_1>
        <problem_name_2>
        …
- Parameters:
path (Union[str, Path]) – Root dataset directory containing the problem_definitions/ folder.
- Returns:
Mapping from problem definition names to loaded ProblemDefinition objects.
- Return type:
dict[str, plaid.ProblemDefinition]
- Raises:
ValueError – If the problem_definitions/ directory does not exist.
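The expected layout and the documented ValueError can be checked before any file is deserialized. The following is a minimal sketch using only the standard library; `list_problem_definition_names` is a hypothetical helper, not part of plaid, and it only lists directory entries rather than loading them.

```python
from pathlib import Path


def list_problem_definition_names(path):
    """Hypothetical helper (not part of plaid).

    Lists the entries under <path>/problem_definitions/ without
    deserializing them.
    """
    pb_dir = Path(path) / "problem_definitions"
    if not pb_dir.is_dir():
        # Mirrors the documented behaviour: a missing problem_definitions/
        # directory is reported as a ValueError.
        raise ValueError(f"Missing directory: {pb_dir}")
    # Assumption: each entry name corresponds to one problem definition name.
    return sorted(entry.name for entry in pb_dir.iterdir())
```

Calling the helper on a dataset root gives a quick inventory of what the loader is expected to find under that root.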
- load_constants_from_disk(path)
Load constant features stored under a dataset’s “constants” directory.
- The function expects the following layout under <path>/constants/:
one folder per split (e.g. “train”, “test”, …) each containing:
layout.json : mapping constant_name -> {‘offset’: int, ‘shape’: [..]} or None
constant_schema.yaml : YAML describing dtype for each constant (dtype string or “string”)
data.mmap : raw bytes memory-mapped file containing packed constant data
- Parameters:
path (str | Path) – Root dataset directory that contains the “constants” folder.
- Returns:
flat_cst (dict[str, dict[str, Any]]): Mapping split -> {constant_name: numpy array | None}. Numeric constants are returned as np.memmap arrays backed by data.mmap in the dataset directory. String constants are returned as 1-element numpy arrays of Python str decoded using ASCII. If the layout entry for a key is None, the value is returned as None.
constant_schema (dict[str, dict[str, Any]]): Mapping split -> loaded constant schema (from YAML).
- Return type:
tuple[dict[str, dict[str, Any]], dict[str, dict[str, Any]]]
- Raises:
FileNotFoundError – If the expected “constants” directory or required files are missing.
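To illustrate how a layout entry can map onto the packed data file, here is a minimal sketch for numeric constants only. It is not plaid's implementation: `read_split_constants` is a hypothetical helper, the `offset` field is assumed to be a byte offset into data.mmap, and the schema is passed as a plain dict of NumPy dtype strings rather than parsed from constant_schema.yaml. String constants are left out because their on-disk encoding is not described here.

```python
import json
from pathlib import Path

import numpy as np


def read_split_constants(split_dir, schema):
    """Hypothetical sketch: decode the numeric constants of one split.

    Assumptions: 'offset' is a byte offset into data.mmap, and `schema`
    maps each constant name directly to a NumPy dtype string.
    """
    split_dir = Path(split_dir)
    layout = json.loads((split_dir / "layout.json").read_text())
    constants = {}
    for name, entry in layout.items():
        if entry is None:
            # A None layout entry yields a None value, as documented.
            constants[name] = None
            continue
        # Read-only, file-backed view: no data is copied into memory here.
        constants[name] = np.memmap(
            split_dir / "data.mmap",
            dtype=schema[name],
            mode="r",
            offset=entry["offset"],
            shape=tuple(entry["shape"]),
        )
    return constants
```

Because the arrays are opened with `mode="r"`, they stay tied to data.mmap on disk and cannot be modified in place.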
- load_metadata_from_disk(path: str | pathlib.Path) → tuple[dict[str, Any], dict[str, Any], dict[str, Any], dict[str, Any]]
Load dataset metadata from disk.
- Parameters:
path (Union[str, Path]) – Directory path containing the metadata files.
- Returns:
flat_cst: constant features dictionary (numeric constants kept as file-backed np.memmap)
variable_schema: variable schema dictionary
constant_schema: constant schema dictionary
cgns_types: CGNS types dictionary
- Return type:
tuple[dict[str, Any], dict[str, Any], dict[str, Any], dict[str, Any]]
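The four dictionaries come back in a fixed order, so they are typically unpacked positionally. The sketch below shows that unpacking on stand-in data; `describe_metadata` is a hypothetical helper, not part of plaid, and it only counts top-level entries.

```python
def describe_metadata(metadata):
    """Hypothetical helper: count top-level entries of the metadata 4-tuple."""
    # Order follows the documented return value.
    flat_cst, variable_schema, constant_schema, cgns_types = metadata
    return {
        "flat_cst": len(flat_cst),
        "variable_schema": len(variable_schema),
        "constant_schema": len(constant_schema),
        "cgns_types": len(cgns_types),
    }


# Stand-in tuple with the documented shape; in practice this would be the
# value returned by load_metadata_from_disk(path).
example = ({"train": {}, "test": {}}, {"field_a": {}}, {"train": {}}, {})
summary = describe_metadata(example)
```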
- load_infos_from_hub(repo_id: str) → dict[str, Any]
Load dataset infos from the Hugging Face Hub.
Downloads the infos.yaml file from the specified repository and parses it as a dictionary.
- load_problem_definitions_from_hub(repo_id: str) → dict[str, plaid.ProblemDefinition] | None
Load ProblemDefinitions from Hugging Face Hub.
- Parameters:
repo_id (str) – The repository ID on the Hugging Face Hub.
- Returns:
Mapping from problem definition names to loaded ProblemDefinition objects, or None if not found.
- Return type:
Optional[dict[str, ProblemDefinition]]
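Since the signature allows a None result, callers may want to handle that case explicitly before looking up a name. A minimal sketch follows; `select_problem_definition` is a hypothetical helper, not part of plaid, and it works on whatever the loader returned.

```python
def select_problem_definition(problem_definitions, name):
    """Hypothetical helper: pick one entry from the loader's result.

    `problem_definitions` is a dict keyed by problem definition name,
    or None when nothing was found in the repository.
    """
    if problem_definitions is None:
        raise LookupError("No problem definitions were found in the repository")
    if name not in problem_definitions:
        available = ", ".join(sorted(problem_definitions)) or "<none>"
        raise KeyError(f"Unknown problem definition {name!r}; available: {available}")
    return problem_definitions[name]
```

Raising distinct errors for "nothing found" and "name not present" keeps the two failure modes easy to tell apart in calling code.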
- load_metadata_from_hub(repo_id: str) → tuple[dict[str, Any], dict[str, Any], dict[str, Any], dict[str, Any]]
Load dataset metadata from Hugging Face Hub.
- Parameters:
repo_id (str) – The repository ID on the Hugging Face Hub.
- Returns:
flat_cst: constant features dictionary (numeric constants are materialized to in-memory arrays)
variable_schema: variable schema dictionary
constant_schema: constant schema dictionary
cgns_types: CGNS types dictionary
- Return type:
tuple[dict[str, Any], dict[str, Any], dict[str, Any], dict[str, Any]]
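The disk loader keeps numeric constants as file-backed np.memmap arrays, while the hub loader returns in-memory arrays. If in-memory copies are also wanted for disk-loaded data, a conversion along the following lines is one option. This is a sketch, not plaid's implementation: `materialize_constants` is a hypothetical helper and assumes the split -> {constant_name: value} structure described for flat_cst.

```python
import numpy as np


def materialize_constants(flat_cst):
    """Hypothetical helper: copy file-backed arrays into memory.

    Assumes flat_cst maps split -> {constant_name: array | None}.
    """
    materialized = {}
    for split, constants in flat_cst.items():
        materialized[split] = {
            # np.array copies the data and returns a plain ndarray, so the
            # result no longer depends on the underlying file.
            name: np.array(value) if isinstance(value, np.memmap) else value
            for name, value in constants.items()
        }
    return materialized
```

Values that are not memmaps, including None and string arrays, are passed through unchanged.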