{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a94ebf49",
   "metadata": {},
   "source": [
    "# Initializing a Dataset with Tabular Data\n",
    "\n",
    "1. Initializing a dataset with tabular data:\n",
    "   - Generate random tabular data for several scalars.\n",
    "   - Initialize a dataset from the tabular data.\n",
    "\n",
    "2. Accessing data in the dataset:\n",
    "   - Retrieve and print a specific sample.\n",
    "   - Access and display the value of a particular scalar within that sample.\n",
    "   - Retrieve tabular data from the dataset for a chosen list of scalar names.\n",
    "\n",
    "This example demonstrates how to initialize a dataset with tabular data, access a specific sample, retrieve a scalar value from it, and extract tabular data for selected scalar names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "902b295d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import NumPy (used to generate random tabular data)\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "49593cf3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the plaid helper that builds a dataset from tabular data\n",
    "from plaid.utils.init_with_tabular import initialize_dataset_with_tabular_data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b595c81a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Utility: print a dictionary with one key-value pair per line\n",
    "def dprint(name: str, dictio: dict):\n",
    "    print(name, \"{\")\n",
    "    for key, value in dictio.items():\n",
    "        print(\"    \", key, \":\", value)\n",
    "\n",
    "    print(\"}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0318269d",
   "metadata": {},
   "source": [
    "## Section 1: Initializing a Dataset with Tabular Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "570a380e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate random tabular data: one column of nb_samples values per scalar\n",
    "nb_scalars = 7\n",
    "nb_samples = 10\n",
    "names = [f\"scalar_{j}\" for j in range(nb_scalars)]\n",
    "\n",
    "rng = np.random.default_rng(seed=0)  # fixed seed so reruns give the same data\n",
    "tabular_data = {name: rng.standard_normal(nb_samples) for name in names}\n",
    "\n",
    "dprint(\"tabular_data\", tabular_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "48f2add5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize a dataset with the tabular data\n",
    "dataset = initialize_dataset_with_tabular_data(tabular_data)\n",
    "print(\"Initialized dataset:\", dataset)"
   ]
  },
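  {
   "cell_type": "markdown",
   "id": "7d2e4c10",
   "metadata": {},
   "source": [
    "`initialize_dataset_with_tabular_data` presumably turns the column-oriented `tabular_data` dictionary (one array per scalar) into individual samples. The next cell is a plain-Python sketch of that mapping, not `plaid`'s actual implementation: it assumes that sample `i` takes entry `i` of every scalar column, and the `columns_to_rows` helper and `toy_columns` data are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7d2e4c11",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plain-Python sketch (hypothetical, not plaid's implementation):\n",
    "# each sample i is assumed to take entry i of every scalar column.\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def columns_to_rows(columns: dict) -> list:\n",
    "    \"\"\"Split {name: 1-D array} into one {name: value} dict per sample (hypothetical helper).\"\"\"\n",
    "    lengths = {len(values) for values in columns.values()}\n",
    "    if len(lengths) > 1:\n",
    "        raise ValueError(f\"columns have different lengths: {sorted(lengths)}\")\n",
    "    n_samples = lengths.pop() if lengths else 0\n",
    "    return [{name: values[i] for name, values in columns.items()} for i in range(n_samples)]\n",
    "\n",
    "\n",
    "# Small fixed example so the mapping can be checked by eye\n",
    "toy_columns = {\n",
    "    \"scalar_0\": np.array([0.0, 1.0, 2.0]),\n",
    "    \"scalar_1\": np.array([10.0, 11.0, 12.0]),\n",
    "}\n",
    "toy_rows = columns_to_rows(toy_columns)\n",
    "print(len(toy_rows))\n",
    "print(toy_rows[1])  # sample 1 holds entry 1 of every column"
   ]
  },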
  {
   "cell_type": "markdown",
   "id": "dfe4ca5f",
   "metadata": {},
   "source": [
    "## Section 2: Accessing Data in the Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b66aaf5a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Retrieve and print a specific sample (the one at index 1)\n",
    "sample_1 = dataset[1]\n",
    "print(f\"{sample_1 = }\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b1bfbf33",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Access and display the value of a particular scalar within a sample\n",
    "scalar_value = sample_1.get_scalar(\"scalar_0\")\n",
    "print(\"Scalar 'scalar_0' in Sample 1:\", scalar_value)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "382f0cf9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Retrieve tabular data from the dataset based on scalar names\n",
    "scalar_names = [\"scalar_1\", \"scalar_3\", \"scalar_5\"]\n",
    "tabular_data_subset = dataset.get_scalars_to_tabular(scalar_names)\n",
    "print(\"Tabular Data Subset for Scalars 1, 3, and 5:\")\n",
    "dprint(\"tabular_data_subset\", tabular_data_subset)"
   ]
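  },
  {
   "cell_type": "markdown",
   "id": "9f1a6b20",
   "metadata": {},
   "source": [
    "In the previous cell, `get_scalars_to_tabular` appears to return a dictionary keyed by scalar name, since its result is passed to `dprint`. The next cell sketches that column selection in plain Python, assuming each sample can be viewed as a dictionary of scalar values; the `records` data and the `select_columns` helper are hypothetical and not part of `plaid`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9f1a6b21",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plain-Python sketch (hypothetical, not plaid's implementation):\n",
    "# rebuild one column per requested scalar from per-sample records.\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical per-sample records, one dict of scalar values per sample\n",
    "records = [\n",
    "    {\"scalar_1\": 0.5, \"scalar_3\": -1.0, \"scalar_5\": 2.0, \"scalar_6\": 9.9},\n",
    "    {\"scalar_1\": 1.5, \"scalar_3\": -2.0, \"scalar_5\": 4.0, \"scalar_6\": 9.9},\n",
    "]\n",
    "\n",
    "\n",
    "def select_columns(rows: list, names: list) -> dict:\n",
    "    \"\"\"Return {name: array of that scalar over all samples} (hypothetical helper).\"\"\"\n",
    "    missing = [name for name in names if any(name not in row for row in rows)]\n",
    "    if missing:\n",
    "        raise KeyError(f\"unknown scalar names: {missing}\")\n",
    "    return {name: np.array([row[name] for row in rows]) for name in names}\n",
    "\n",
    "\n",
    "subset = select_columns(records, [\"scalar_1\", \"scalar_3\", \"scalar_5\"])\n",
    "print(subset)"
   ]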
  }
 ],
 "metadata": {
  "jupytext": {
   "formats": "ipynb,py:percent"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}