{ "cells": [ { "cell_type": "markdown", "id": "a94ebf49", "metadata": {}, "source": [ "# Initializing a Dataset with Tabular Data\n", "\n", "1. Initializing a Dataset with Tabular Data:\n", "- Generate random tabular data for multiple scalars.\n", "- Initialize a dataset with the tabular data.\n", "\n", "2. Accessing and Manipulating Data in the Dataset:\n", "- Retrieve and print the dataset and specific samples.\n", "- Access and display the value of a particular scalar within a sample.\n", "- Retrieve tabular data from the dataset based on scalar names.\n", "\n", "This example demonstrates how to initialize a dataset with tabular data, access specific samples, retrieve scalar values, and extract tabular data based on scalar names." ] }, { "cell_type": "code", "execution_count": null, "id": "902b295d", "metadata": {}, "outputs": [], "source": [ "# Import required libraries\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "id": "49593cf3", "metadata": {}, "outputs": [], "source": [ "# Import the helper that builds a dataset from tabular data\n", "from plaid.utils.init_with_tabular import initialize_dataset_with_tabular_data" ] }, { "cell_type": "code", "execution_count": null, "id": "b595c81a", "metadata": {}, "outputs": [], "source": [ "# Utility to pretty-print a dictionary, one key per line\n", "def dprint(name: str, dictio: dict):\n", "    print(name, \"{\")\n", "    for key, value in dictio.items():\n", "        print(\" \", key, \":\", value)\n", "\n", "    print(\"}\")" ] }, { "cell_type": "markdown", "id": "0318269d", "metadata": {}, "source": [ "## Section 1: Initializing a Dataset with Tabular Data" ] }, { "cell_type": "code", "execution_count": null, "id": "570a380e", "metadata": {}, "outputs": [], "source": [ "# Generate random tabular data for multiple scalars\n", "nb_scalars = 7\n", "nb_samples = 10\n", "names = [f\"scalar_{j}\" for j in range(nb_scalars)]\n", "\n", "# Each scalar maps to a 1D array holding one value per sample\n", "tabular_data = {}\n", "for name in names:\n", "    tabular_data[name] = np.random.randn(nb_samples)\n", "\n", 
"dprint(\"tabular_data\", tabular_data)" ] }, { "cell_type": "code", "execution_count": null, "id": "48f2add5", "metadata": {}, "outputs": [], "source": [ "# Initialize a dataset with the tabular data\n", "dataset = initialize_dataset_with_tabular_data(tabular_data)\n", "print(\"Initialized Dataset: \", dataset)" ] }, { "cell_type": "markdown", "id": "dfe4ca5f", "metadata": {}, "source": [ "## Section 2: Accessing and Manipulating Data in the Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "b66aaf5a", "metadata": {}, "outputs": [], "source": [ "# Retrieve and print a specific sample (the one at index 1)\n", "sample_1 = dataset[1]\n", "print(f\"{sample_1 = }\")" ] }, { "cell_type": "code", "execution_count": null, "id": "b1bfbf33", "metadata": {}, "outputs": [], "source": [ "# Access and display the value of a particular scalar within a sample\n", "scalar_value = sample_1.get_scalar(\"scalar_0\")\n", "print(\"Scalar 'scalar_0' in Sample 1:\", scalar_value)" ] }, { "cell_type": "code", "execution_count": null, "id": "382f0cf9", "metadata": {}, "outputs": [], "source": [ "# Retrieve tabular data from the dataset based on scalar names\n", "scalar_names = [\"scalar_1\", \"scalar_3\", \"scalar_5\"]\n", "tabular_data_subset = dataset.get_scalars_to_tabular(scalar_names)\n", "print(\"Tabular Data Subset for Scalars 1, 3, and 5:\")\n", "dprint(\"tabular_data_subset\", tabular_data_subset)" ] } ], "metadata": { "jupytext": { "formats": "ipynb,py:percent" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }