plaid.storage.cgns.writer

CGNS dataset writer module.

This module provides functionality for writing datasets in CGNS format for the PLAID library. It includes utilities for generating datasets from sample generators, saving to disk, uploading to Hugging Face Hub, and configuring dataset cards.

Functions

generate_datasetdict_to_disk(→ None)

Generates and saves a dataset to disk in CGNS format.

push_local_datasetdict_to_hub(→ None)

Pushes a local dataset directory to Hugging Face Hub.

configure_dataset_card(→ None)

Configures and pushes a dataset card to Hugging Face Hub for a CGNS backend dataset.

Module Contents

generate_datasetdict_to_disk(output_folder: str | pathlib.Path, generators: dict[str, Callable[..., Generator[plaid.Sample, None, None]]], variable_schema: dict[str, dict] | None = None, gen_kwargs: dict[str, dict[str, list[plaid.types.IndexType]]] | None = None, num_proc: int = 1, verbose: bool = False) → None

Generates and saves a dataset to disk in CGNS format.

Parameters:
  • output_folder – Base directory to save the dataset.

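  • generators – Dict mapping each split name to a callable that returns a generator of plaid.Sample objects.

  • variable_schema – Variable schema; not used by this function.

  • gen_kwargs – Optional keyword arguments for the generators, used for parallel processing.

  • num_proc – Number of processes to use.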

  • verbose – Whether to show progress.
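The shapes of generators and gen_kwargs implied by the signature can be sketched as follows. This is a minimal illustration, not PLAID's own example: FakeSample is a hypothetical stand-in for plaid.Sample so the snippet runs without PLAID installed, the keyword name "ids" is invented, and the assumption that both dicts are keyed by split name comes only from the type hints. How PLAID distributes the lists across processes is not stated here.

```python
from typing import Callable, Dict, Generator, List


class FakeSample:
    """Hypothetical stand-in for plaid.Sample so the sketch runs without PLAID."""

    def __init__(self, index: int) -> None:
        self.index = index


def generate_samples(ids: List[int]) -> Generator[FakeSample, None, None]:
    """Yield one sample per requested index ("ids" is a hypothetical name)."""
    for index in ids:
        yield FakeSample(index)


# One generator callable per split (assumed to be keyed by split name).
generators: Dict[str, Callable[..., Generator[FakeSample, None, None]]] = {
    "train": generate_samples,
    "test": generate_samples,
}

# Keyword arguments per split, matching dict[str, dict[str, list[...]]].
gen_kwargs: Dict[str, Dict[str, List[int]]] = {
    "train": {"ids": [0, 1, 2, 3]},
    "test": {"ids": [4, 5]},
}

# Dry run: consume each generator with its kwargs and count the samples.
sample_counts = {
    split: sum(1 for _ in generators[split](**gen_kwargs[split]))
    for split in generators
}
print(sample_counts)

# With PLAID installed and generators yielding real plaid.Sample objects,
# the call would follow the signature documented above:
# from plaid.storage.cgns.writer import generate_datasetdict_to_disk
# generate_datasetdict_to_disk("output_dir", generators,
#                              gen_kwargs=gen_kwargs, num_proc=1, verbose=True)
```

A dry run like this is a cheap way to check that every split has matching keyword arguments before starting a multi-process write.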

push_local_datasetdict_to_hub(repo_id: str, local_dir: str | pathlib.Path, num_workers: int = 1) → None

Pushes a local dataset directory to Hugging Face Hub.

Parameters:
  • repo_id – The repository ID.

  • local_dir – Local directory path.

  • num_workers – Number of upload workers.
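A call might be wrapped as in the sketch below, which checks that the local directory exists before uploading. The helper push_if_present and its push_fn parameter are hypothetical and exist only for illustration; the wrapped call uses the three parameters documented above. The import is guarded so the snippet still loads when PLAID is not installed.

```python
from pathlib import Path
from typing import Callable, Optional, Union

try:
    from plaid.storage.cgns.writer import push_local_datasetdict_to_hub
except ImportError:
    # PLAID is not installed; callers must then pass push_fn explicitly.
    push_local_datasetdict_to_hub = None


def push_if_present(
    repo_id: str,
    local_dir: Union[str, Path],
    num_workers: int = 1,
    push_fn: Optional[Callable[..., None]] = push_local_datasetdict_to_hub,
) -> None:
    """Hypothetical helper: verify local_dir exists, then delegate the upload."""
    local_dir = Path(local_dir)
    if not local_dir.is_dir():
        raise FileNotFoundError(f"Local dataset directory not found: {local_dir}")
    if push_fn is None:
        raise RuntimeError("PLAID is not installed and no push_fn was provided")
    push_fn(repo_id, local_dir, num_workers=num_workers)
```

Passing the upload function as an argument keeps the check testable without network access or credentials.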

configure_dataset_card(repo_id: str, infos: dict[str, dict[str, str]], local_dir: str | pathlib.Path, viewer: bool | None = None, pretty_name: str | None = None, dataset_long_description: str | None = None, illustration_urls: list[str] | None = None, arxiv_paper_urls: list[str] | None = None) → None

Configures and pushes a dataset card to Hugging Face Hub for a CGNS backend dataset.

This function generates a dataset card in YAML format with metadata, features, splits information, and usage examples. It automatically detects splits and sample counts from the local directory structure, then pushes the card to the specified Hugging Face repository.

Parameters:
  • repo_id (str) – The Hugging Face repository ID where the dataset card will be pushed.

  • infos (dict[str, dict[str, str]]) – Dictionary containing dataset metadata, including legal information like license.

  • local_dir (Union[str, Path]) – Path to the local directory containing the dataset files, expected to have a ‘data’ subdirectory with split folders.

  • variable_schema (Optional[dict]) – Schema describing the variables/features in the dataset, used to generate the features section in the card. This parameter is documented here but does not appear in the function signature above.

  • viewer (Optional[bool]) – Unused parameter for viewer configuration.

  • pretty_name (Optional[str]) – A human-readable name for the dataset to display in the card.

  • dataset_long_description (Optional[str]) – A detailed description of the dataset to include in the card.

  • illustration_urls (Optional[list[str]]) – List of URLs to images that illustrate the dataset, displayed in the card.

  • arxiv_paper_urls (Optional[list[str]]) – List of arXiv URLs for papers related to the dataset, included as sources.

Returns:

This function does not return a value; it pushes the dataset card directly to Hugging Face Hub.

Return type:

None
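The split and sample-count detection described above can be illustrated with a small directory walk. This is a sketch under stated assumptions, not the library's implementation: it assumes each folder under local_dir/data is one split and that every direct child of a split folder is one sample. The helper count_samples_per_split and the sample_* folder names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict, Union


def count_samples_per_split(local_dir: Union[str, Path]) -> Dict[str, int]:
    """Hypothetical helper: map each split folder under 'data' to its entry count."""
    data_dir = Path(local_dir) / "data"
    if not data_dir.is_dir():
        raise FileNotFoundError(f"Expected a 'data' subdirectory in {local_dir}")
    counts: Dict[str, int] = {}
    for split_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        # Assumption: every direct child of a split folder is one sample.
        counts[split_dir.name] = sum(1 for _ in split_dir.iterdir())
    return counts


# Build a throwaway layout with two splits and count the samples in each.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for split, n_samples in {"train": 3, "test": 2}.items():
        for i in range(n_samples):
            (root / "data" / split / f"sample_{i}").mkdir(parents=True)
    split_counts = count_samples_per_split(root)

print(split_counts)  # {'test': 2, 'train': 3}
```

Running a check like this before calling configure_dataset_card helps confirm that local_dir has the expected 'data' subdirectory with one folder per split.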