Infos¶
This Jupyter Notebook demonstrates the usage of the Infos class for defining dataset metadata using the PLAID library. It includes examples of:
- Initializing Infos from structured fields
- Configuring metadata and retrieving data
- Saving and loading infos
Each section is documented and explained.
Section 1: Initializing Infos¶
This section demonstrates how to initialize Infos with the current API.
Initialize and print Infos¶
print("#---# Infos")
infos = Infos(
    owner="PLAID",
    license="MIT",
    data_production={
        "type": "simulation",
        "physics": "fluid dynamics",
        "simulator": "ExampleSolver",
    },
    data_description="ExampleDescription",
)
print(f"{infos = }")
#---# Infos
infos = Infos(owner='PLAID', license='MIT', data_production=DataProduction(type='simulation', physics='fluid dynamics', simulator='ExampleSolver', hardware=None, computation_duration=None, script=None, contact=None), data_description='ExampleDescription', num_samples={}, storage_backend=None)
Print available Infos fields¶
Infos fields:
- owner
- license
- data_production
  subfields:
    - type
    - physics
    - simulator
    - hardware
    - computation_duration
    - script
    - contact
- data_description
- num_samples
  note: automatically filled when calling save_to_disk
- storage_backend
  note: automatically filled when calling save_to_disk
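The cell that produced the listing above is not shown. As a rough sketch of how such a nested field listing could be generated, the example below walks a pair of stand-in dataclasses. `InfosSketch`, `DataProductionSketch`, and `list_fields` are hypothetical names used only for illustration, and the field types are guesses; this is not the PLAID `Infos` class or its API.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Optional


@dataclass
class DataProductionSketch:
    # Hypothetical stand-in; field names copied from the listing, types assumed.
    type: Optional[str] = None
    physics: Optional[str] = None
    simulator: Optional[str] = None
    hardware: Optional[str] = None
    computation_duration: Optional[str] = None
    script: Optional[str] = None
    contact: Optional[str] = None


@dataclass
class InfosSketch:
    # Hypothetical stand-in for the metadata container shown in this notebook.
    owner: Optional[str] = None
    license: Optional[str] = None
    data_production: DataProductionSketch = field(
        default_factory=DataProductionSketch
    )
    data_description: Optional[str] = None
    num_samples: dict = field(default_factory=dict)
    storage_backend: Optional[str] = None


def list_fields(cls, indent=0):
    """Return one line per field, recursing into nested dataclass fields."""
    lines = []
    for f in fields(cls):
        lines.append(" " * indent + f"- {f.name}")
        if is_dataclass(f.type):
            lines.append(" " * (indent + 2) + "subfields:")
            lines.extend(list_fields(f.type, indent + 4))
    return lines


print("Infos fields:")
print("\n".join(list_fields(InfosSketch)))
```

The recursion only descends into fields whose annotation is itself a dataclass, which is why `data_production` is the only entry that gets a `subfields:` block.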
Section 2: Modifying Infos and retrieving data¶
This section demonstrates how to update fields on an Infos object and read its metadata back.
Set data description¶
infos.data_description = "Example dataset generated for the Infos example."
print(f"{infos.data_description = }")
print(f"{infos.num_samples = }") # Populated by save_to_disk for saved datasets.
print(f"{infos.storage_backend = }") # Populated by save_to_disk for saved datasets.
infos.data_description = 'Example dataset generated for the Infos example.'
infos.num_samples = {}
infos.storage_backend = None
Retrieve data with Pydantic attributes¶
print(f"{infos.owner = }")
print(f"{infos.license = }")
print(f"{infos.storage_backend = }")
print(f"{infos.model_dump(exclude_none=True) = }")
infos.owner = 'PLAID'
infos.license = 'MIT'
infos.storage_backend = None
infos.model_dump(exclude_none=True) = {'owner': 'PLAID', 'license': 'MIT', 'data_production': {'type': 'simulation', 'physics': 'fluid dynamics', 'simulator': 'ExampleSolver'}, 'data_description': 'Example dataset generated for the Infos example.', 'num_samples': {}}
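In the dump above, the `None`-valued entries (`hardware`, `computation_duration`, `script`, `contact`, `storage_backend`) are absent, while the empty `num_samples` dict is kept. The sketch below reproduces that filtering on a plain dictionary so the effect is easy to see in isolation. `drop_none` is a hypothetical helper written for this illustration; it is not part of Pydantic or PLAID and only mimics the visible behaviour.

```python
def drop_none(value):
    """Recursively remove dict entries whose value is None.

    Empty containers are kept: only None itself is filtered out.
    """
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    return value


# Plain-dict version of the metadata printed earlier in this section.
raw = {
    "owner": "PLAID",
    "license": "MIT",
    "data_production": {
        "type": "simulation",
        "physics": "fluid dynamics",
        "simulator": "ExampleSolver",
        "hardware": None,
        "computation_duration": None,
        "script": None,
        "contact": None,
    },
    "data_description": "Example dataset generated for the Infos example.",
    "num_samples": {},
    "storage_backend": None,
}

cleaned = drop_none(raw)
print(cleaned)
```

Dropping unset fields keeps the dumped metadata compact, at the cost of no longer distinguishing "unset" from "not part of the schema" in the output.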
Section 3: Saving and Loading Infos¶
This section demonstrates how to save Infos to a YAML file and load it back.
Save Infos to a YAML file¶
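from pathlib import Path

import numpy as np

# Scratch directory with a random suffix; safe to delete after the example.
test_pth = Path(
    f"/tmp/test_safe_to_delete_{np.random.randint(low=1, high=2_000_000_000)}"
)
infos_save_fname = test_pth / "infos.yaml"
test_pth.mkdir(parents=True, exist_ok=True)
print(f"saving path: {infos_save_fname}")
# Set here by hand; save_to_disk normally fills these two fields.
infos.num_samples = {"train": 0}
infos.storage_backend = "zarr"
infos.save_to_file(infos_save_fname)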
saving path: /tmp/test_safe_to_delete_102332384/infos.yaml
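The scratch directory above is built from a random integer under `/tmp`, which is not portable and is never cleaned up. One possible alternative, sketched below with the standard-library `tempfile` module, creates a unique directory and removes it automatically. The `write_text` call is only a placeholder standing in for `infos.save_to_file(infos_save_fname)`, and the file content is made up for the sketch.

```python
import tempfile
from pathlib import Path

# TemporaryDirectory picks a unique name and deletes the directory on exit.
with tempfile.TemporaryDirectory(prefix="test_safe_to_delete_") as tmp_dir:
    infos_save_fname = Path(tmp_dir) / "infos.yaml"
    print(f"saving path: {infos_save_fname}")

    # Placeholder write; in the notebook this would be
    # infos.save_to_file(infos_save_fname).
    infos_save_fname.write_text("owner: PLAID\n", encoding="utf-8")
    existed_inside = infos_save_fname.exists()

# After the with-block the directory and its contents are gone.
removed_after = not Path(tmp_dir).exists()
print(f"{existed_inside = }, {removed_after = }")
```

With this approach, any cell that loads the file back has to run inside the `with` block, since the directory no longer exists afterwards.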
Load Infos from a YAML file¶
owner='PLAID' license='MIT' data_production=DataProduction(type='simulation', physics='fluid dynamics', simulator='ExampleSolver', hardware=None, computation_duration=None, script=None, contact=None) data_description='Example dataset generated for the Infos example.' num_samples={'train': 0} storage_backend='zarr'
Load Infos from an explicit infos.yaml path¶
loaded_infos_from_explicit_path = Infos.from_path(test_pth / "infos.yaml")
print(loaded_infos_from_explicit_path)
owner='PLAID' license='MIT' data_production=DataProduction(type='simulation', physics='fluid dynamics', simulator='ExampleSolver', hardware=None, computation_duration=None, script=None, contact=None) data_description='Example dataset generated for the Infos example.' num_samples={'train': 0} storage_backend='zarr'