plaid.storage.common.writer
===========================

.. py:module:: plaid.storage.common.writer

.. autoapi-nested-parse::

   Common storage writer utilities.

   This module provides common utilities for writing dataset metadata, problem
   definitions, and other auxiliary files to disk or uploading them to Hugging Face Hub.
   It handles serialization of infos, problem definitions, and dataset tree structures.


Functions
---------

.. autoapisummary::

   plaid.storage.common.writer.save_infos_to_disk
   plaid.storage.common.writer.save_problem_definitions_to_disk
   plaid.storage.common.writer.save_constants_to_disk
   plaid.storage.common.writer.save_metadata_to_disk
   plaid.storage.common.writer.push_infos_to_hub
   plaid.storage.common.writer.push_local_problem_definitions_to_hub
   plaid.storage.common.writer.push_local_metadata_to_hub


Module Contents
---------------

.. py:function:: save_infos_to_disk(path: Union[str, pathlib.Path], infos: dict[str, dict[str, str]]) -> None

   Save dataset infos as a YAML file to disk.

   :param path: The directory path where the infos file will be saved.
   :type path: Union[str, Path]
   :param infos: Dictionary containing dataset infos.
   :type infos: dict[str, dict[str, str]]


.. py:function:: save_problem_definitions_to_disk(path: Union[str, pathlib.Path], pb_defs: Union[dict[str, plaid.ProblemDefinition], plaid.ProblemDefinition]) -> None

   Save ProblemDefinitions to disk.

   :param path: The directory path for saving.
   :type path: Union[str, Path]
   :param pb_defs: The problem definitions to save.
   :type pb_defs: Union[dict[str, ProblemDefinition], ProblemDefinition]


.. py:function:: save_constants_to_disk(path, constant_schema, flat_cst)

   Write constant features to disk under /constants/.

   For each split in flat_cst this creates a directory /constants// containing:

   - ``data.mmap``: concatenated raw bytes of all constants for that split
   - ``layout.json``: mapping constant_name -> ``{'offset': int, 'shape': [...]}`` or None
   - ``constant_schema.yaml``: the provided schema for that split (dtype and ndim)

   Behavior:

   - Numeric constants are written as their C-order bytes.
   - String constants support two cases:

     * CGNS string scalar: a 1-element array of Python str -> written as ASCII bytes,
       shape recorded as [len].
     * CGNS char array: multi-char arrays -> converted to fixed-width bytes and written.

   - If a schema entry's dtype is None, the layout entry is set to None and no bytes
     are written.

   :param path: Root dataset directory where "constants" will be created.
   :type path: str | Path
   :param constant_schema: Mapping split -> {constant_name: {'dtype': str | None, 'ndim': int, ...}}.
   :type constant_schema: dict
   :param flat_cst: Mapping split -> {constant_name: numpy array | None} containing values to save.
   :type flat_cst: dict

   :returns: None

   :raises AssertionError: if a numeric array does not match the expected ndim.
   :raises OSError / IOError: on file system write errors.


.. py:function:: save_metadata_to_disk(path: Union[str, pathlib.Path], flat_cst: dict[str, Any], variable_schema: dict[str, Any], constant_schema: dict[str, Any], cgns_types: dict[str, Any]) -> None

   Save the structure of a dataset tree to disk.

   This function writes the constant part of the tree and its key mappings to files
   in the specified directory. The constant part is serialized as a pickle file,
   while the key mappings are saved in YAML format.

   :param path: Directory path where the tree structure files will be saved.
   :type path: Union[str, Path]
   :param flat_cst: Dictionary containing the constant part of the tree.
   :type flat_cst: dict
   :param variable_schema: Dictionary containing the variable schema.
   :type variable_schema: dict
   :param constant_schema: Dictionary containing the constant schema.
   :type constant_schema: dict
   :param cgns_types: Dictionary containing CGNS types.
   :type cgns_types: dict

   :returns: None


.. py:function:: push_infos_to_hub(repo_id: str, infos: dict[str, dict[str, str]]) -> None

   Upload dataset infos.yaml to a Hugging Face dataset repository.

   Serializes the provided `infos` mapping to YAML and uploads it as
   `infos.yaml` to the target `repo_id` using the HfApi.

   :param repo_id: Hugging Face dataset repository identifier (e.g. "user/repo").
   :type repo_id: str
   :param infos: Dataset infos mapping to serialize and upload.
   :type infos: dict[str, dict[str, str]]

   :raises ValueError: If `infos` is empty.
   :raises OSError / IOError: If the upload fails due to I/O errors or network problems.

   .. rubric:: Notes

   - The function uses HfApi.upload_file and constructs the file contents in-memory.
   - Not covered by unit tests (pragma: no cover).


.. py:function:: push_local_problem_definitions_to_hub(repo_id: str, path: Union[pathlib.Path, str]) -> None

   Upload local ProblemDefinitions to a Hugging Face dataset repository.

   This function uploads the entire local ``problem_definitions/`` directory
   located under ``path`` to the target Hugging Face dataset repository using
   ``HfApi.upload_folder``.

   Expected local layout::

      /
        problem_definitions/
          ...

   Each problem definition is assumed to already be serialized on disk
   (e.g. via ``ProblemDefinition.save_to_file``). The function performs a
   directory-level upload and does not inspect, validate, or re-serialize
   individual problem definitions.

   :param repo_id: Hugging Face dataset repository identifier
                   (e.g. ``"username/dataset_name"``).
   :type repo_id: str
   :param path: Root dataset directory containing the
                ``problem_definitions/`` folder.
   :type path: Union[Path, str]

   .. rubric:: Notes

   - Upload is atomic at the folder level.
   - Existing files in ``problem_definitions/`` on the Hub may be overwritten.
   - Uses ``repo_type="dataset"``.
   - Not covered by unit tests (``pragma: no cover``).

   :raises OSError / IOError: If the local folder does not exist or an upload error occurs.


.. py:function:: push_local_metadata_to_hub(repo_id: str, path: Union[pathlib.Path, str]) -> None

   Upload locally stored dataset metadata to a Hugging Face dataset repository.

   This function uploads the structural metadata of a PLAID dataset from disk
   to a Hugging Face Hub *dataset* repository. The upload consists of:

   1. The ``constants/`` directory, containing:

      - ``data.mmap`` files with concatenated constant values,
      - ``layout.json`` files describing byte offsets and shapes,
      - ``constant_schema.yaml`` files describing constant dtypes and dimensions,

      organized per dataset split.

   2. ``variable_schema.yaml``, describing the schema of variable (sample-dependent)
      features.

   3. ``cgns_types.yaml``, describing CGNS node types associated with dataset paths.

   All metadata files are assumed to have been previously generated on disk
   (e.g. via ``save_metadata_to_disk``). This function performs no validation,
   transformation, or serialization; it strictly uploads existing files.

   Expected local layout::

      /
        constants/
          /
            data.mmap
            layout.json
            constant_schema.yaml
        variable_schema.yaml
        cgns_types.yaml

   :param repo_id: Hugging Face dataset repository identifier
                   (e.g. ``"username/dataset_name"``).
   :type repo_id: str
   :param path: Root dataset directory containing the metadata files.
   :type path: Union[Path, str]

   .. rubric:: Notes

   - Uploads use ``repo_type="dataset"``.
   - Folder uploads may overwrite existing files on the Hub.
   - The operation is atomic per uploaded artifact
     (``constants/`` folder, individual YAML files).
   - Not covered by unit tests (``pragma: no cover``).

   :raises OSError / IOError: If required local files are missing or if an upload error occurs.
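
The ``constants/`` layout documented above pairs a ``data.mmap`` file of concatenated raw bytes with a ``layout.json`` that records each constant's byte offset and shape. The following is a minimal, standard-library-only sketch of that idea, not the PLAID implementation. The helpers ``write_constants`` and ``read_constant``, the split name ``train``, and the use of ``struct`` format characters in place of numpy dtypes are all assumptions made for illustration. In the real layout the dtype would come from ``constant_schema.yaml``.

```python
import json
import mmap
import struct
import tempfile
from pathlib import Path


def write_constants(split_dir, constants):
    """Hypothetical helper: concatenate raw bytes and record offset/shape per constant.

    `constants` maps name -> (struct format char, shape, flat C-order values) or None.
    """
    split_dir.mkdir(parents=True, exist_ok=True)
    layout = {}
    offset = 0
    with open(split_dir / "data.mmap", "wb") as f:
        for name, entry in constants.items():
            if entry is None:
                # Mirrors the documented rule: no dtype means no bytes, layout entry is None.
                layout[name] = None
                continue
            fmt, shape, values = entry
            raw = struct.pack(f"<{len(values)}{fmt}", *values)
            f.write(raw)
            layout[name] = {"offset": offset, "shape": list(shape)}
            offset += len(raw)
    (split_dir / "layout.json").write_text(json.dumps(layout))
    return layout


def read_constant(split_dir, name, fmt):
    """Hypothetical helper: read one constant back as a flat list via a memory map."""
    layout = json.loads((split_dir / "layout.json").read_text())
    entry = layout[name]
    if entry is None:
        return None
    count = 1
    for dim in entry["shape"]:
        count *= dim
    size = struct.calcsize(f"<{count}{fmt}")
    with open(split_dir / "data.mmap", "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            raw = mm[entry["offset"]:entry["offset"] + size]
    return list(struct.unpack(f"<{count}{fmt}", raw))


with tempfile.TemporaryDirectory() as tmp:
    split_dir = Path(tmp) / "constants" / "train"  # "train" is an assumed split name
    constants = {
        "gravity": ("d", (3,), [0.0, 0.0, -9.81]),  # three float64 values
        "grid": ("i", (2, 2), [1, 2, 3, 4]),        # 2x2 int32 values in C order
        "unused": None,                             # no dtype, so nothing is written
    }
    layout = write_constants(split_dir, constants)
    gravity = read_constant(split_dir, "gravity", "d")
    grid = read_constant(split_dir, "grid", "i")
    unused = read_constant(split_dir, "unused", "d")
```

Because each constant is located by offset and shape alone, a reader can map the file once and slice out individual constants without parsing the rest. The element type is not stored in ``layout.json``, which is why this sketch has the caller pass it in.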