{ "cells": [ { "cell_type": "markdown", "id": "5f1d134f", "metadata": {}, "source": [ "# Dataset Examples\n", "\n", "This Jupyter Notebook demonstrates common use cases for the Dataset class, including:\n", "\n", "1. Initializing an empty Dataset and constructing Samples\n", "2. Adding Samples and information to a Dataset and accessing its data\n", "3. Merging Datasets, adding tabular scalars, and setting information\n", "4. Saving and loading Datasets from directories or files\n", "\n", "The examples show how to use the Dataset class to manage data, Samples, and information within a PLAID Dataset. The notebook is intended as documentation and as a way to get familiar with the PLAID library.\n", "\n", "**Each section is documented and explained.**" ] }, { "cell_type": "code", "execution_count": null, "id": "8ac54630", "metadata": {}, "outputs": [], "source": [ "# Import required libraries\n", "import copy\n", "import platform\n", "from pathlib import Path\n", "\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "id": "dd96f0f9", "metadata": {}, "outputs": [], "source": [ "# Import Muscat mesh tools and the PLAID classes used below\n", "import Muscat.MeshContainers.ElementsDescription as ElementsDescription\n", "from Muscat.Bridges.CGNSBridge import MeshToCGNS\n", "from Muscat.MeshTools import MeshCreationTools as MCT\n", "\n", "import plaid\n", "from plaid import Dataset\n", "from plaid import Sample" ] }, { "cell_type": "code", "execution_count": null, "id": "1c9902a0", "metadata": {}, "outputs": [], "source": [ "# Utility: print a dict with one key/value pair per line\n", "def dprint(name: str, dictio: dict, end: str = \"\\n\"):\n", "    print(name, \"{\")\n", "    for key, value in dictio.items():\n", "        print(\" \", key, \":\", value)\n", "\n", "    print(\"}\", end=end)" ] }, { "cell_type": "markdown", "id": "70e75212", "metadata": {}, "source": [ "## Section 1: Initializing an Empty Dataset and Constructing Samples\n", "\n", "This section demonstrates how to initialize an empty Dataset and how to build Samples." 
] }, { "cell_type": "markdown", "id": "09fc2800", "metadata": {}, "source": [ "### Initialize an empty Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "bf2a58ee", "metadata": {}, "outputs": [], "source": [ "print(\"#---# Empty Dataset\")\n", "dataset = Dataset()\n", "print(f\"{dataset=}\")" ] }, { "cell_type": "markdown", "id": "88e28693", "metadata": {}, "source": [ "### Create Sample" ] }, { "cell_type": "code", "execution_count": null, "id": "2ce7d870", "metadata": {}, "outputs": [], "source": [ "# Create Sample\n", "points = np.array(\n", " [\n", " [0.0, 0.0],\n", " [1.0, 0.0],\n", " [1.0, 1.0],\n", " [0.0, 1.0],\n", " [0.5, 1.5],\n", " ]\n", ")\n", "\n", "triangles = np.array(\n", " [\n", " [0, 1, 2],\n", " [0, 2, 3],\n", " [2, 4, 3],\n", " ]\n", ")\n", "\n", "bars = np.array([[0, 1], [0, 2]])\n", "\n", "Mesh = MCT.CreateMeshOfTriangles(points, triangles)\n", "elbars = Mesh.GetElementsOfType(ElementsDescription.Bar_2)\n", "elbars.AddNewElements(bars, [1, 2])\n", "cgns_mesh = MeshToCGNS(Mesh)\n", "\n", "# Initialize an empty Sample\n", "print(\"#---# Empty Sample\")\n", "sample_01 = Sample()\n", "print(f\"{sample_01 = }\")" ] }, { "cell_type": "code", "execution_count": null, "id": "0d7ef051", "metadata": {}, "outputs": [], "source": [ "# Add a CGNS tree structure to the Sample\n", "sample_01.features.add_tree(copy.deepcopy(cgns_mesh))\n", "print(f\"{sample_01 = }\")" ] }, { "cell_type": "code", "execution_count": null, "id": "9f019091", "metadata": {}, "outputs": [], "source": [ "# Add a scalar to the Sample\n", "sample_01.add_scalar(\"rotation\", np.random.randn())\n", "print(f\"{sample_01 = }\")" ] }, { "cell_type": "markdown", "id": "f723972f", "metadata": {}, "source": [ "### Print Sample general data" ] }, { "cell_type": "code", "execution_count": null, "id": "c2bd75c1", "metadata": {}, "outputs": [], "source": [ "# Initialize another empty Sample\n", "print(\"#---# Empty Sample\")\n", "sample_02 = Sample()\n", "print(f\"{sample_02 = 
}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "dbb2d1e3", "metadata": {}, "outputs": [], "source": [ "# Add a scalar to the second Sample\n", "sample_02.add_scalar(\"rotation\", np.random.randn())\n", "print(f\"{sample_02 = }\")" ] }, { "cell_type": "markdown", "id": "b7b004c4", "metadata": {}, "source": [ "### Display Sample CGNS tree" ] }, { "cell_type": "code", "execution_count": null, "id": "b7d74731", "metadata": {}, "outputs": [], "source": [ "# Initialize a third Sample with two scalars and a CGNS tree\n", "print(\"#---# Empty Sample\")\n", "sample_03 = Sample()\n", "sample_03.add_scalar(\"speed\", np.random.randn())\n", "sample_03.add_scalar(\"rotation\", sample_01.get_scalar(\"rotation\"))\n", "sample_03.features.add_tree(copy.deepcopy(cgns_mesh))\n", "\n", "# Show Sample CGNS content\n", "sample_03.show_tree()" ] }, { "cell_type": "code", "execution_count": null, "id": "d30c36e2", "metadata": {}, "outputs": [], "source": [ "# Add a field to the third Sample\n", "sample_03.add_field(\"temperature\", np.random.rand(5), zone_name=\"Zone\", base_name=\"Base_2_2\")\n", "sample_03.show_tree()" ] }, { "cell_type": "markdown", "id": "edd8e98a", "metadata": {}, "source": [ "### Get Sample data" ] }, { "cell_type": "code", "execution_count": null, "id": "d714fe01", "metadata": {}, "outputs": [], "source": [ "# Print sample general data\n", "print(f\"{sample_03 = }\", end=\"\\n\\n\")\n", "\n", "# Print sample scalar data\n", "print(f\"{sample_03.get_scalar_names() = }\")\n", "print(f\"{sample_03.get_scalar('speed') = }\")\n", "print(f\"{sample_03.get_scalar('rotation') = }\", end=\"\\n\\n\")\n", "\n", "# Print sample field data\n", "print(f\"{sample_03.get_field_names() = }\")\n", "print(f\"{sample_03.get_field('temperature') = }\")" ] }, { "cell_type": "markdown", "id": "a65f5242", "metadata": {}, "source": [ "## Section 2: Performing Operations on the Dataset\n", "\n", "This section demonstrates how to add Samples to the Dataset, add information, and access data." 
] }, { "cell_type": "markdown", "id": "af7f1ac0", "metadata": {}, "source": [ "### Add Samples to the Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "a745d9dc", "metadata": {}, "outputs": [], "source": [ "# Add Samples to the Dataset under explicit ids\n", "dataset.set_sample(id=0, sample=sample_01)\n", "dataset.set_sample(1, sample_02)\n", "\n", "# Add a single Sample and let the Dataset create its id\n", "added_sample_id = dataset.add_sample(sample_03)\n", "print(f\"{added_sample_id = }\")" ] }, { "cell_type": "markdown", "id": "ae2842be", "metadata": {}, "source": [ "### Iterate through Samples in the Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "2099592e", "metadata": {}, "outputs": [], "source": [ "for sample in dataset:\n", "    print(sample)" ] }, { "cell_type": "markdown", "id": "53661ac4", "metadata": {}, "source": [ "### Add information to the Dataset and display it" ] }, { "cell_type": "code", "execution_count": null, "id": "809de07a", "metadata": {}, "outputs": [], "source": [ "# Add node information to the Dataset\n", "dataset.add_info(\"legal\", \"owner\", \"Safran\")\n", "\n", "# Retrieve dataset information\n", "import json\n", "\n", "dataset_info = dataset.get_infos()\n", "print(\"dataset info =\", json.dumps(dataset_info, sort_keys=False, indent=4), end=\"\\n\\n\")\n", "\n", "# Overwrite information (logger will display warnings)\n", "infos = {\"legal\": {\"owner\": \"Safran\", \"license\": \"CC0\"}}\n", "dataset.set_infos(infos)\n", "\n", "# Retrieve dataset information\n", "dataset_info = dataset.get_infos()\n", "print(\"dataset info =\", json.dumps(dataset_info, sort_keys=False, indent=4), end=\"\\n\\n\")\n", "\n", "# Add tree information to the Dataset (logger will display warnings)\n", "dataset.add_infos(\"data_description\", {\"number_of_samples\": 0, \"number_of_splits\": 0})\n", "\n", "# Pretty print dataset information\n", "dataset.print_infos()" ] }, { "cell_type": "markdown", "id": "88d909a4", "metadata": {},
"source": [ "### Get a list of specific Samples in a Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "691044dc", "metadata": {}, "outputs": [], "source": [ "get_samples_from_ids = dataset.get_samples(ids=[0, 1])\n", "dprint(\"get samples from ids =\", get_samples_from_ids)" ] }, { "cell_type": "markdown", "id": "fbc28116", "metadata": {}, "source": [ "### Get the list of Sample ids in a Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "040d28e0", "metadata": {}, "outputs": [], "source": [ "# Print sample IDs\n", "print(\"get_sample_ids =\", dataset.get_sample_ids())" ] }, { "cell_type": "markdown", "id": "9904745f", "metadata": {}, "source": [ "### Print Dataset general data" ] }, { "cell_type": "code", "execution_count": null, "id": "9bbb2f74", "metadata": {}, "outputs": [], "source": [ "# Print the Dataset\n", "print(f\"{dataset = }\")\n", "print(\"length of dataset =\", len(dataset))" ] }, { "cell_type": "code", "execution_count": null, "id": "e78ba31d", "metadata": {}, "outputs": [], "source": [ "# Create a new dataset with the sample list and ids\n", "# (ids can be provided optionally)\n", "new_dataset = Dataset(samples=[sample_01, sample_02, sample_03], sample_ids=[3, 5, 7])\n", "print(f\"{new_dataset = }\")\n", "print(\"new dataset sample ids =\", new_dataset.get_sample_ids())" ] }, { "cell_type": "markdown", "id": "5912d80d", "metadata": {}, "source": [ "### Add a list of Sample to a Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "f8397653", "metadata": {}, "outputs": [], "source": [ "# Create a new Dataset and add multiple samples\n", "dataset = Dataset()\n", "samples = [sample_01, sample_02, sample_03]\n", "added_ids = dataset.add_samples(samples)\n", "print(f\"{added_ids = }\")\n", "print(f\"{dataset = }\")" ] }, { "cell_type": "markdown", "id": "534bec2a", "metadata": {}, "source": [ "### Access to Samples data through Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": 
"829a9e09", "metadata": {}, "outputs": [], "source": [ "# Access Sample data with indexes through the Dataset\n", "print(f\"{dataset(0) = }\") # call strategy\n", "print(f\"{dataset[1] = }\") # getitem strategy\n", "print(f\"{dataset[2] = }\", end=\"\\n\\n\")\n", "\n", "print(\"scalar of the first sample = \", dataset[0].get_scalar_names())\n", "print(\"scalar of the second sample = \", dataset[1].get_scalar_names())\n", "print(\"scalar of the third sample = \", dataset[2].get_scalar_names())" ] }, { "cell_type": "code", "execution_count": null, "id": "420006c8", "metadata": {}, "outputs": [], "source": [ "# Access dataset information\n", "print(f\"{dataset[0].get_scalar('rotation') = }\")\n", "print(f\"{dataset[1].get_scalar('rotation') = }\")\n", "print(f\"{dataset[2].get_scalar('rotation') = }\")" ] }, { "cell_type": "markdown", "id": "040346a4", "metadata": {}, "source": [ "### Get Dataset scalars to tabular" ] }, { "cell_type": "code", "execution_count": null, "id": "d096d8bd", "metadata": {}, "outputs": [], "source": [ "# Print scalars in tabular format\n", "print(f\"{dataset.get_scalar_names() = }\", end=\"\\n\\n\")\n", "\n", "dprint(\"get rotation scalar = \", dataset.get_scalars_to_tabular([\"rotation\"]))\n", "dprint(\"get speed scalar = \", dataset.get_scalars_to_tabular([\"speed\"]), end=\"\\n\\n\")\n", "\n", "# Get specific scalars in tabular format\n", "dprint(\"get specific scalars =\", dataset.get_scalars_to_tabular([\"speed\", \"rotation\"]))\n", "dprint(\"get all scalars =\", dataset.get_scalars_to_tabular())" ] }, { "cell_type": "code", "execution_count": null, "id": "128076ae", "metadata": {}, "outputs": [], "source": [ "# Get specific scalars np.array\n", "print(\"get all scalar arrays = \", dataset.get_scalars_to_tabular(as_nparray=True))" ] }, { "cell_type": "markdown", "id": "c26d676b", "metadata": {}, "source": [ "### Get Dataset fields" ] }, { "cell_type": "code", "execution_count": null, "id": "8f69ff44", "metadata": {}, "outputs": [], 
"source": [ "# Print fields in the Dataset\n", "print(\"fields in the dataset = \", dataset.get_field_names())" ] }, { "cell_type": "markdown", "id": "bc7ffab8", "metadata": {}, "source": [ "## Section 3: Various operations on the Dataset\n", "\n", "This section demonstrates operations like merging datasets, adding tabular scalars, and setting information." ] }, { "cell_type": "markdown", "id": "dd8ef570", "metadata": {}, "source": [ "### Initialize a Dataset with a list of Samples" ] }, { "cell_type": "code", "execution_count": null, "id": "5fc47920", "metadata": {}, "outputs": [], "source": [ "# Create another Dataset\n", "other_dataset = Dataset()\n", "nb_samples = 3\n", "samples = []\n", "for _ in range(nb_samples):\n", " sample = Sample()\n", " sample.add_scalar(\"rotation\", np.random.rand() + 1.0)\n", " sample.add_scalar(\"random_name\", np.random.rand() - 1.0)\n", " samples.append(sample)\n", "\n", "# Add a list of Samples\n", "other_dataset.add_samples(samples)\n", "print(f\"{other_dataset = }\")" ] }, { "cell_type": "markdown", "id": "8672c679", "metadata": {}, "source": [ "### Merge two Datasets" ] }, { "cell_type": "code", "execution_count": null, "id": "9adc6305", "metadata": {}, "outputs": [], "source": [ "# Merge the other dataset with the main dataset\n", "print(f\"before merge: {dataset = }\")\n", "dataset.merge_dataset(other_dataset)\n", "print(f\"after merge: {dataset = }\", end=\"\\n\\n\")\n", "\n", "dprint(\"dataset scalars = \", dataset.get_scalars_to_tabular())" ] }, { "cell_type": "markdown", "id": "dddf2602", "metadata": {}, "source": [ "### Add tabular scalars to a Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "226365f1", "metadata": {}, "outputs": [], "source": [ "# Adding tabular scalars to the dataset\n", "new_scalars = np.random.rand(3, 2)\n", "dataset.add_tabular_scalars(new_scalars, names=[\"Tu\", \"random_name\"])\n", "\n", "print(f\"{dataset = }\")\n", "dprint(\"dataset scalars =\", 
dataset.get_scalars_to_tabular())" ] }, { "cell_type": "markdown", "id": "e8a9cbe4", "metadata": {}, "source": [ "### Set additional information to a dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "72367a88", "metadata": { "lines_to_next_cell": 2 }, "outputs": [], "source": [ "infos = {\n", " \"legal\": {\"owner\": \"Safran\", \"license\": \"CC0\"},\n", " \"data_production\": {\"type\": \"simulation\", \"simulator\": \"dummy\"},\n", "}\n", "dataset.set_infos(infos)\n", "dataset.print_infos()" ] }, { "cell_type": "markdown", "id": "611ec28a", "metadata": {}, "source": [ "## Section 4: Saving and Loading Dataset\n", "\n", "This section demonstrates how to save and load a Dataset from a directory or file." ] }, { "cell_type": "markdown", "id": "996b7fbf", "metadata": {}, "source": [ "### Save a Dataset as a file tree" ] }, { "cell_type": "code", "execution_count": null, "id": "12aa5436", "metadata": { "lines_to_next_cell": 2 }, "outputs": [], "source": [ "tmpdir = f\"/tmp/test_safe_to_delete_{np.random.randint(low=1, high=2_000_000_000)}\"\n", "print(f\"Save dataset in: {tmpdir}\")\n", "\n", "dataset.save_to_dir(tmpdir)" ] }, { "cell_type": "markdown", "id": "8e83717e", "metadata": {}, "source": [ "### Get the number of Samples that can be loaded from a directory" ] }, { "cell_type": "code", "execution_count": null, "id": "8b192468", "metadata": {}, "outputs": [], "source": [ "nb_samples = plaid.get_number_of_samples(tmpdir)\n", "print(f\"{nb_samples = }\")" ] }, { "cell_type": "markdown", "id": "aa90917b", "metadata": {}, "source": [ "### Load a Dataset from a directory via initialization" ] }, { "cell_type": "code", "execution_count": null, "id": "8cd05b03", "metadata": {}, "outputs": [], "source": [ "loaded_dataset_from_init = Dataset(tmpdir)\n", "print(f\"{loaded_dataset_from_init = }\")\n", "\n", "if platform.system() == \"Linux\":\n", " multi_process_loaded_dataset = Dataset(tmpdir, processes_number=3)\n", " 
print(f\"{multi_process_loaded_dataset = }\")" ] }, { "cell_type": "markdown", "id": "0c30ce8e", "metadata": {}, "source": [ "### Load a Dataset from a directory via the Dataset class" ] }, { "cell_type": "code", "execution_count": null, "id": "970d0eec", "metadata": {}, "outputs": [], "source": [ "loaded_dataset_from_class = Dataset.load_from_dir(tmpdir)\n", "print(f\"{loaded_dataset_from_class = }\")\n", "\n", "if platform.system() == \"Linux\":\n", " multi_process_loaded_dataset = Dataset.load_from_dir(tmpdir, processes_number=3)\n", " print(f\"{multi_process_loaded_dataset = }\")" ] }, { "cell_type": "markdown", "id": "658e6ddc", "metadata": {}, "source": [ "### Load the dataset from a directory via a Dataset instance" ] }, { "cell_type": "code", "execution_count": null, "id": "fcc8690d", "metadata": {}, "outputs": [], "source": [ "loaded_dataset_from_instance = Dataset()\n", "loaded_dataset_from_instance.load(tmpdir)\n", "\n", "print(f\"{loaded_dataset_from_instance = }\")\n", "\n", "if platform.system() == \"Linux\":\n", " multi_process_loaded_dataset = Dataset()\n", " multi_process_loaded_dataset.load(tmpdir, processes_number=3)\n", " print(f\"{multi_process_loaded_dataset = }\")" ] }, { "cell_type": "markdown", "id": "19452ae7", "metadata": {}, "source": [ "### Save the dataset to a TAR (Tape Archive) file" ] }, { "cell_type": "code", "execution_count": null, "id": "295005ab", "metadata": {}, "outputs": [], "source": [ "tmpdir = Path(f\"/tmp/test_safe_to_delete_{np.random.randint(low=1, high=2_000_000_000)}\")\n", "tmpfile = tmpdir / \"test_file.plaid\"\n", "\n", "print(f\"Save dataset in: {tmpfile}\")\n", "dataset.save(tmpfile)" ] }, { "cell_type": "markdown", "id": "c9ded521", "metadata": {}, "source": [ "### Load the dataset from a TAR (Tape Archive) file via Dataset instance" ] }, { "cell_type": "code", "execution_count": null, "id": "fb5e2651", "metadata": {}, "outputs": [], "source": [ "new_dataset = Dataset()\n", "new_dataset.load(tmpfile)\n", "\n", 
"print(f\"{dataset = }\")\n", "print(f\"{new_dataset = }\")" ] }, { "cell_type": "markdown", "id": "b62d20c9", "metadata": {}, "source": [ "### Load the dataset from a TAR (Tape Archive) file via initialization" ] }, { "cell_type": "code", "execution_count": null, "id": "085244d4", "metadata": {}, "outputs": [], "source": [ "new_dataset = Dataset(tmpfile)\n", "\n", "print(f\"{dataset = }\")\n", "print(f\"{new_dataset = }\")" ] } ], "metadata": { "jupytext": { "custom_cell_magics": "kql", "formats": "ipynb,py:percent" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }