plaid.storage.zarr.bridge

Zarr bridge utilities.

This module provides utility functions for bridging between PLAID samples and Zarr storage format. It includes functions for key transformation and sample data conversion.

plaid.storage.zarr.bridge.to_var_sample_dict

to_var_sample_dict(
    zarr_dataset, idx, features, indexers=None
)

Extracts a sample dictionary from a Zarr dataset by index.

Parameters:

  • zarr_dataset (Group) –

    The dataset to read from. Samples are looked up through its zarr_group attribute, under keys of the form sample_{idx:09d}.

  • idx (int) –

    The sample index to extract.

  • features (Optional[list[str]]) –

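    Feature names (keys) to extract from the dataset. If None, every array stored in the sample is extracted. Requested features that are not present in the sample are skipped without raising an error.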

  • indexers (Optional[dict[str, Any]], default: None ) –

    Optional mapping feature_path -> indexer used to select feature values along the last axis. Features without an entry are returned in full as NumPy arrays.

Returns:

  • dict[str, Any]

    Dictionary of variable features for the sample, keyed by feature name.
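To make the phrase "select feature values along the last axis" concrete, here is a minimal pure-Python sketch. It is not the library's `_apply_indexer`, whose implementation is not shown on this page. The helper name `take_last_axis`, the use of nested lists instead of arrays, and the assumption that an indexer is either a `slice` or a list of integer positions are all hypothetical.

```python
from typing import Any, Union

Indexer = Union[slice, list]


def take_last_axis(values: list, indexer: Indexer) -> list:
    """Select entries along the innermost (last) axis of a nested list.

    Hypothetical stand-in for illustration only: the real helper most
    likely operates on Zarr/NumPy arrays rather than Python lists.
    """
    # Recurse until we reach the innermost lists, which form the last axis.
    if values and isinstance(values[0], list):
        return [take_last_axis(row, indexer) for row in values]
    if isinstance(indexer, slice):
        return values[indexer]
    return [values[i] for i in indexer]


# A feature with shape (2, 4): two rows, four values on the last axis.
field = [[10, 11, 12, 13], [20, 21, 22, 23]]

# Mapping feature_path -> indexer, in the spirit of the `indexers` argument.
indexers: dict[str, Any] = {"field": [0, 2]}

picked = take_last_axis(field, indexers["field"])
sliced = take_last_axis(field, slice(1, 3))

print(picked)  # [[10, 12], [20, 22]]
print(sliced)  # [[11, 12], [21, 22]]
```

In both calls the leading axis (the two rows) is left intact and only the positions on the last axis are filtered.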

Source code in plaid/storage/zarr/bridge.py
def to_var_sample_dict(
    zarr_dataset: Any,
    idx: int,
    features: Optional[list[str]],
    indexers: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Extracts a sample dictionary from a Zarr dataset by index.

    Args:
        zarr_dataset (zarr.Group): The Zarr group containing the dataset.
        idx (int): The sample index to extract.
        features: Iterable of feature names (keys) to extract from the dataset.
        indexers: Optional mapping ``feature_path -> indexer`` used to select
            feature values along the last axis.

    Returns:
        dict[str, Any]: Dictionary of variable features for the sample.
    """
    zarr_sample = zarr_dataset.zarr_group[f"sample_{idx:09d}"]

    if features is None:
        features = [unflatten_path(p) for p in zarr_sample.array_keys()]

    flattened = {feat: flatten_path(feat) for feat in features}
    # missing = set(flattened.values()) - set(zarr_sample.array_keys())
    # if missing:  # pragma: no cover
    #     raise KeyError(f"Missing features in sample {idx}: {sorted(missing)}")

    indexers = indexers or {}
    out = {}
    for feat, flat_feat in flattened.items():
        if flat_feat not in zarr_sample.array_keys():
            continue

        arr = zarr_sample[flat_feat]
        if feat in indexers:
            out[feat] = _apply_indexer(arr, indexers[feat], feat)
        else:
            out[feat] = np.asarray(arr)

    return out
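The lookup logic in the source above can be sketched without Zarr installed. The example below uses plain nested dictionaries as a stand-in for the Zarr group, and the helper names `sample_key` and `read_features` are hypothetical. It illustrates only what the source shows: the zero-padded sample key, the `features=None` default, and the silent skipping of missing features. Path flattening and indexers are left out.

```python
from typing import Optional


def sample_key(idx: int) -> str:
    """Build the per-sample group name: 'sample_' plus a 9-digit, zero-padded index."""
    return f"sample_{idx:09d}"


# Stand-in for the Zarr hierarchy (plain dicts, not a real zarr.Group).
fake_store = {
    sample_key(0): {"pressure": [1.0, 2.0, 3.0], "velocity": [0.5, 0.6, 0.7]},
    sample_key(12): {"pressure": [4.0, 5.0, 6.0]},
}


def read_features(store: dict, idx: int, features: Optional[list] = None) -> dict:
    """Mimic the lookup: None selects every stored array; unknown names are skipped."""
    sample = store[sample_key(idx)]
    if features is None:
        features = list(sample)
    return {name: sample[name] for name in features if name in sample}


all_features = read_features(fake_store, 0)
partial = read_features(fake_store, 12, ["pressure", "velocity"])

print(sample_key(12))        # sample_000000012
print(sorted(all_features))  # ['pressure', 'velocity']
print(partial)               # {'pressure': [4.0, 5.0, 6.0]}
```

Because missing features are skipped rather than reported, a caller who needs every requested feature has to compare the returned keys against the request.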

plaid.storage.zarr.bridge.sample_to_var_sample_dict

sample_to_var_sample_dict(zarr_sample)

Converts a Zarr sample to a variable sample dictionary.

Parameters:

  • zarr_sample (dict[str, Any]) –

    The raw Zarr sample data.

Returns:

  • dict[str, Any]

    The variable sample dictionary. The current implementation returns zarr_sample unchanged.

Source code in plaid/storage/zarr/bridge.py
def sample_to_var_sample_dict(zarr_sample: dict[str, Any]) -> dict[str, Any]:
    """Converts a Zarr sample to a variable sample dictionary.

    Args:
        zarr_sample (dict[str, Any]): The raw Zarr sample data.

    Returns:
        dict[str, Any]: The processed variable sample dictionary.
    """
    return zarr_sample