plaid.storage.common.writer

Common storage writer utilities.

This module provides common utilities for writing dataset metadata, problem definitions, and other auxiliary files to disk, or for uploading them to the Hugging Face Hub. It handles serialization of infos, problem definitions, and dataset tree structures.

Functions

  • save_infos_to_disk(…) → None – Save dataset infos as a YAML file to disk.

  • save_problem_definitions_to_disk(…) → None – Save ProblemDefinitions to disk.

  • save_constants_to_disk(path, constant_schema, flat_cst) – Write constant features to disk under <path>/constants/.

  • save_metadata_to_disk(…) → None – Save the structure of a dataset tree to disk.

  • push_infos_to_hub(…) → None – Upload dataset infos.yaml to a Hugging Face dataset repository.

  • push_local_problem_definitions_to_hub(…) → None – Upload local ProblemDefinitions to a Hugging Face dataset repository.

  • push_local_metadata_to_hub(…) → None – Upload locally stored dataset metadata to a Hugging Face dataset repository.

Module Contents

save_infos_to_disk(path: str | pathlib.Path, infos: dict[str, dict[str, str]]) → None

Save dataset infos as a YAML file to disk.

Parameters:
  • path (Union[str, Path]) – The directory path where the infos file will be saved.

  • infos (dict[str, dict[str, str]]) – Dictionary containing dataset infos.

save_problem_definitions_to_disk(path: str | pathlib.Path, pb_defs: dict[str, plaid.ProblemDefinition] | plaid.ProblemDefinition) → None

Save ProblemDefinitions to disk.

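Parameters:
  • path (Union[str, Path]) – The path on disk where the problem definitions will be saved.

  • pb_defs (dict[str, ProblemDefinition] | ProblemDefinition) – A single ProblemDefinition, or a dictionary mapping names to ProblemDefinitions.

save_constants_to_disk(path, constant_schema, flat_cst)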

Write constant features to disk under <path>/constants/.

For each split in flat_cst this creates a directory:
<path>/constants/<split>/
  • data.mmap : concatenated raw bytes of all constants for that split

  • layout.json : mapping constant_name -> {'offset': int, 'shape': […]} or None

  • constant_schema.yaml : the provided schema for that split (dtype and ndim)

Behavior:
  • Numeric constants are written as their C-order bytes.

  • String constants support two cases:
    • CGNS string scalar: a 1-element array of Python str -> written as ASCII bytes, shape recorded as [len].

    • CGNS char array: multi-char arrays -> converted to fixed-width bytes and written.

  • If a schema entry’s dtype is None, the layout entry is set to None and no bytes are written.
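The two string cases above can be illustrated with a minimal, standard-library-only sketch. The helper names `encode_string_scalar` and `encode_char_array` are hypothetical and not part of this module; the null-byte padding and the `[count, width]` shape for char arrays are assumptions modeled on NumPy's fixed-width byte strings, not confirmed behavior of save_constants_to_disk.

```python
def encode_string_scalar(value: str) -> tuple[bytes, list[int]]:
    """Encode a single string as ASCII bytes; the shape is [len]."""
    raw = value.encode("ascii")
    return raw, [len(raw)]


def encode_char_array(values: list[str]) -> tuple[bytes, list[int]]:
    """Encode several strings as fixed-width ASCII records.

    Assumption: shorter entries are right-padded with null bytes and the
    recorded shape is [count, width].
    """
    width = max(len(v) for v in values)
    raw = b"".join(v.encode("ascii").ljust(width, b"\x00") for v in values)
    return raw, [len(values), width]


scalar_bytes, scalar_shape = encode_string_scalar("Kelvin")
array_bytes, array_shape = encode_char_array(["x", "yz"])
```

Fixed-width records keep every entry at a predictable byte position, which is what makes a flat byte file addressable by offset and shape alone.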

Parameters:
  • path (str | Path) – Root dataset directory where "constants" will be created.

  • constant_schema (dict) – Mapping split -> {constant_name: {'dtype': str | None, 'ndim': int, …}}.

  • flat_cst (dict) – Mapping split -> {constant_name: numpy array | None} containing values to save.

Returns:

None

Raises:
  • AssertionError – If a numeric array does not match the expected ndim.

  • OSError / IOError – On file system write errors.
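To make the data.mmap / layout.json pairing concrete, here is a rough sketch of the idea: concatenate raw bytes into one file and record each constant's byte offset and shape. It is not the module's implementation. It uses the standard-library `array` type in place of NumPy arrays (whose `tobytes()` would play the same role), handles only float64 values, and the helper names `write_constants` and `read_constant` are hypothetical.

```python
import json
import tempfile
from array import array
from math import prod
from pathlib import Path


def write_constants(split_dir: Path, constants: dict) -> dict:
    """Write data.mmap and layout.json for one split.

    `constants` maps a name to (array('d') values, shape), or to None
    when nothing should be written for that constant.
    """
    split_dir.mkdir(parents=True, exist_ok=True)
    layout = {}
    offset = 0
    with open(split_dir / "data.mmap", "wb") as f:
        for name, entry in constants.items():
            if entry is None:
                layout[name] = None  # no bytes written for this constant
                continue
            values, shape = entry
            raw = values.tobytes()
            f.write(raw)
            layout[name] = {"offset": offset, "shape": list(shape)}
            offset += len(raw)
    (split_dir / "layout.json").write_text(json.dumps(layout, indent=2))
    return layout


def read_constant(split_dir: Path, name: str):
    """Read one constant back using its recorded offset and shape."""
    layout = json.loads((split_dir / "layout.json").read_text())
    entry = layout[name]
    if entry is None:
        return None
    values = array("d")
    with open(split_dir / "data.mmap", "rb") as f:
        f.seek(entry["offset"])
        values.frombytes(f.read(prod(entry["shape"]) * values.itemsize))
    return values.tolist()


with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "constants" / "train"
    demo_layout = write_constants(
        demo_dir,
        {
            "pressure_ref": (array("d", [101325.0]), (1,)),
            "unused": None,
            "gravity": (array("d", [0.0, 0.0, -9.81]), (3,)),
        },
    )
    demo_gravity = read_constant(demo_dir, "gravity")
```

Because every constant is located by offset and shape, a reader can seek to or memory-map a single value without loading the whole file.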

save_metadata_to_disk(path: str | pathlib.Path, flat_cst: dict[str, Any], variable_schema: dict[str, Any], constant_schema: dict[str, Any], cgns_types: dict[str, Any]) → None

Save the structure of a dataset tree to disk.

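This function writes the constant part of the tree and the associated schemas to files in the specified directory. The constant part is written under <path>/constants/ in the layout described for save_constants_to_disk, while the variable schema and CGNS types are saved in YAML format (variable_schema.yaml and cgns_types.yaml).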

Parameters:
  • path (Union[str, Path]) – Directory path where the tree structure files will be saved.

  • flat_cst (dict) – Dictionary containing the constant part of the tree.

  • variable_schema (dict) – Dictionary containing the variable schema.

  • constant_schema (dict) – Dictionary containing the constant schema.

  • cgns_types (dict) – Dictionary containing CGNS types.

Returns:

None

push_infos_to_hub(repo_id: str, infos: dict[str, dict[str, str]]) → None

Upload dataset infos.yaml to a Hugging Face dataset repository.

Serializes the provided infos mapping to YAML and uploads it as infos.yaml to the target repo_id using the HfApi.

Parameters:
  • repo_id (str) – Hugging Face dataset repository identifier (e.g. "user/repo").

  • infos (dict[str, dict[str, str]]) – Dataset infos mapping to serialize and upload.

Raises:
  • ValueError – If infos is empty.

  • OSError / IOError – If the upload fails due to I/O errors or network problems.

Notes

  • The function uses HfApi.upload_file and constructs the file contents in memory.

  • Not covered by unit tests (pragma: no cover).
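The in-memory upload described above could look roughly like the sketch below. It is an approximation, not the module's code: the function name `push_infos_sketch` is hypothetical, `repo_type="dataset"` is assumed from the other upload functions, and PyYAML and huggingface_hub are imported lazily so that the empty-input check can run without either package or a network connection.

```python
def push_infos_sketch(repo_id: str, infos: dict) -> None:
    """Serialize `infos` to YAML in memory and upload it as infos.yaml."""
    if not infos:
        raise ValueError("infos must not be empty")

    # Lazy imports: only needed once there is something to upload.
    import yaml
    from huggingface_hub import HfApi

    # yaml.dump returns a str when no stream is given.
    payload = yaml.dump(infos).encode("utf-8")
    HfApi().upload_file(
        path_or_fileobj=payload,   # raw bytes are accepted here
        path_in_repo="infos.yaml",
        repo_id=repo_id,
        repo_type="dataset",       # assumption, mirrors the other uploads
    )
```

Passing bytes to `upload_file` avoids writing a temporary file just to upload it.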

push_local_problem_definitions_to_hub(repo_id: str, path: pathlib.Path | str) → None

Upload local ProblemDefinitions to a Hugging Face dataset repository.

This function uploads the entire local problem_definitions/ directory located under path to the target Hugging Face dataset repository using HfApi.upload_folder.

Expected local layout:
<path>/
  problem_definitions/
    <name_1>
    <name_2>
    …

Each problem definition is assumed to already be serialized on disk (e.g. via ProblemDefinition.save_to_file). The function performs a directory-level upload and does not inspect, validate, or re-serialize individual problem definitions.

Parameters:
  • repo_id (str) – Hugging Face dataset repository identifier (e.g. "username/dataset_name").

  • path (Union[Path, str]) – Root dataset directory containing the problem_definitions/ folder.

Notes

  • Upload is atomic at the folder level.

  • Existing files in problem_definitions/ on the Hub may be overwritten.

  • Uses repo_type="dataset".

  • Not covered by unit tests (pragma: no cover).

Raises:
  • OSError / IOError – If the local folder does not exist or an upload error occurs.
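A directory-level upload of this kind might be sketched as follows. The function name `push_problem_definitions_sketch` and the explicit existence check are illustrative assumptions; only `HfApi.upload_folder` and `repo_type="dataset"` come from the description above. The Hub client is imported lazily so the local check can be exercised offline.

```python
from pathlib import Path


def push_problem_definitions_sketch(repo_id: str, path) -> None:
    """Upload <path>/problem_definitions/ as-is, without inspecting its contents."""
    folder = Path(path) / "problem_definitions"
    if not folder.is_dir():
        # FileNotFoundError is a subclass of OSError.
        raise FileNotFoundError(f"missing folder: {folder}")

    from huggingface_hub import HfApi  # lazy import, only needed for the upload

    HfApi().upload_folder(
        folder_path=str(folder),
        path_in_repo="problem_definitions",
        repo_id=repo_id,
        repo_type="dataset",
    )
```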

push_local_metadata_to_hub(repo_id: str, path: pathlib.Path | str) → None

Upload locally stored dataset metadata to a Hugging Face dataset repository.

This function uploads the structural metadata of a PLAID dataset from disk to a Hugging Face Hub dataset repository. The upload consists of:

  1. The constants/ directory, organized per dataset split and containing:
    • data.mmap files with concatenated constant values,
    • layout.json files describing byte offsets and shapes,
    • constant_schema.yaml files describing constant dtypes and dimensions.

  2. variable_schema.yaml, describing the schema of variable (sample-dependent) features.

  3. cgns_types.yaml, describing CGNS node types associated with dataset paths.

All metadata files are assumed to have been previously generated on disk (e.g. via save_metadata_to_disk). This function performs no validation, transformation, or serialization; it strictly uploads existing files.

Expected local layout:
<path>/
  constants/
    <split>/
      data.mmap
      layout.json
      constant_schema.yaml
  variable_schema.yaml
  cgns_types.yaml

Parameters:
  • repo_id (str) – Hugging Face dataset repository identifier (e.g. "username/dataset_name").

  • path (Union[Path, str]) – Root dataset directory containing the metadata files.

Notes

  • Uploads use repo_type="dataset".

  • Folder uploads may overwrite existing files on the Hub.

  • The operation is atomic per uploaded artifact (constants/ folder, individual YAML files).

  • Not covered by unit tests (pragma: no cover).

Raises:
  • OSError / IOError – If required local files are missing or if an upload error occurs.
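Since the upload performs no validation, a caller may want to check the expected layout first. The sketch below is one possible pre-flight check using only the standard library. The helper name `missing_metadata_files` is hypothetical and not part of this module; the file names are taken from the layout described above.

```python
from pathlib import Path

# File names taken from the expected local layout.
SPLIT_FILES = ("data.mmap", "layout.json", "constant_schema.yaml")
ROOT_FILES = ("variable_schema.yaml", "cgns_types.yaml")


def missing_metadata_files(path) -> list[str]:
    """Return the relative paths of expected metadata files that are absent."""
    root = Path(path)
    missing = [name for name in ROOT_FILES if not (root / name).is_file()]

    constants = root / "constants"
    splits = []
    if constants.is_dir():
        splits = sorted(p for p in constants.iterdir() if p.is_dir())
    if not splits:
        missing.append("constants/<split>/")

    for split in splits:
        for name in SPLIT_FILES:
            if not (split / name).is_file():
                missing.append(f"constants/{split.name}/{name}")
    return missing
```

An empty list means the directory matches the expected layout; anything else names what to regenerate before uploading.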