{ "cells": [ { "cell_type": "markdown", "id": "d451f1ca", "metadata": {}, "source": [ "# Problem Definition Examples\n", "\n", "This Jupyter Notebook demonstrates the usage of the ProblemDefinition class for defining machine learning problems using the PLAID library. It includes examples of:\n", "\n", "1. Initializing an empty ProblemDefinition\n", "2. Configuring problem characteristics and retrieving data\n", "3. Saving and loading problem definitions\n", "\n", "**Each section is documented and explained.**" ] }, { "cell_type": "code", "execution_count": null, "id": "dc4a1247", "metadata": {}, "outputs": [], "source": [ "# Import required libraries\n", "from pathlib import Path\n", "\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "id": "962c905d", "metadata": {}, "outputs": [], "source": [ "# Import PLAID classes and functions\n", "from plaid import Dataset, Sample\n", "from plaid import ProblemDefinition\n", "from plaid.utils.split import split_dataset\n", "from plaid.types import FeatureIdentifier" ] }, { "cell_type": "markdown", "id": "40a3538b", "metadata": {}, "source": [ "## Section 1: Initializing an Empty ProblemDefinition\n", "\n", "This section demonstrates how to initialize a Problem Definition and add inputs / outputs."
] }, { "cell_type": "markdown", "id": "4c27d177", "metadata": {}, "source": [ "### Initialize and print ProblemDefinition" ] }, { "cell_type": "code", "execution_count": null, "id": "ffa825a0", "metadata": {}, "outputs": [], "source": [ "print(\"#---# Empty ProblemDefinition\")\n", "problem = ProblemDefinition()\n", "print(f\"{problem = }\")" ] }, { "cell_type": "code", "execution_count": null, "id": "7d5e3e67", "metadata": {}, "outputs": [], "source": [ "# Initialize some feature identifiers\n", "scalar_1_feat_id = FeatureIdentifier({\"type\":\"scalar\", \"name\":\"scalar_1\"})\n", "scalar_2_feat_id = FeatureIdentifier({\"type\":\"scalar\", \"name\":\"scalar_2\"})\n", "scalar_3_feat_id = FeatureIdentifier({\"type\":\"scalar\", \"name\":\"scalar_3\"})\n", "field_1_feat_id = FeatureIdentifier({\"type\":\"field\", \"name\":\"field_1\", \"base_name\":\"Base_2_2\"})\n", "field_2_feat_id = FeatureIdentifier({\"type\":\"field\", \"name\":\"field_2\", \"base_name\":\"Base_2_2\", \"location\":\"Vertex\"})" ] }, { "cell_type": "markdown", "id": "9b29e1b0", "metadata": {}, "source": [ "### Add inputs / outputs to a Problem Definition" ] }, { "cell_type": "code", "execution_count": null, "id": "ba79b181", "metadata": {}, "outputs": [], "source": [ "# Add a single input and a single output feature identifier\n", "problem.add_in_feature_identifier(scalar_1_feat_id)\n", "problem.add_out_feature_identifier(scalar_2_feat_id)\n", "\n", "# Add lists of input and output feature identifiers\n", "problem.add_in_features_identifiers([scalar_3_feat_id, field_1_feat_id])\n", "problem.add_out_features_identifiers([field_2_feat_id])\n", "\n", "print(f\"{problem.get_in_features_identifiers() = }\")\n", "print(\n", " f\"{problem.get_out_features_identifiers() = }\",\n", ")" ] }, { "cell_type": "markdown", "id": "08f1222a", "metadata": {}, "source": [ "## Section 2: Configuring Problem Characteristics and Retrieving Data\n", "\n", "This section demonstrates how to handle and configure ProblemDefinition 
objects and access data." ] }, { "cell_type": "markdown", "id": "471f1c26", "metadata": {}, "source": [ "### Set Problem Definition task" ] }, { "cell_type": "code", "execution_count": null, "id": "481588c3", "metadata": {}, "outputs": [], "source": [ "# Set the task type (e.g., regression)\n", "problem.set_task(\"regression\")\n", "print(f\"{problem.get_task() = }\")" ] }, { "cell_type": "markdown", "id": "338e73a1", "metadata": {}, "source": [ "### Set Problem Definition split" ] }, { "cell_type": "code", "execution_count": null, "id": "841ebf8e", "metadata": {}, "outputs": [], "source": [ "# Initialize an empty Dataset\n", "dataset = Dataset()\n", "print(f\"{dataset = }\")\n", "\n", "# Add four empty Samples\n", "dataset.add_samples([Sample(), Sample(), Sample(), Sample()])\n", "print(f\"{dataset = }\")" ] }, { "cell_type": "code", "execution_count": null, "id": "e2210789", "metadata": {}, "outputs": [], "source": [ "# Set strategy options for the split\n", "options = {\n", " \"shuffle\": False,\n", " \"split_sizes\": {\n", " \"train\": 2,\n", " \"val\": 1,\n", " },\n", "}\n", "\n", "split = split_dataset(dataset, options)\n", "print(f\"{split = }\")" ] }, { "cell_type": "code", "execution_count": null, "id": "6001bcff", "metadata": {}, "outputs": [], "source": [ "problem.set_split(split)\n", "print(f\"{problem.get_split() = }\")" ] }, { "cell_type": "markdown", "id": "2f79a032", "metadata": {}, "source": [ "### Retrieve Problem Definition split indices" ] }, { "cell_type": "code", "execution_count": null, "id": "fbeea403", "metadata": {}, "outputs": [], "source": [ "# Get all split indices\n", "print(f\"{problem.get_all_indices() = }\")" ] }, { "cell_type": "markdown", "id": "b21cb478", "metadata": {}, "source": [ "### Filter Problem Definition inputs / outputs by feature identifiers" ] }, { "cell_type": "code", "execution_count": null, "id": "140c3ea9", "metadata": {}, "outputs": [], "source": [ "all_feature_ids = [scalar_1_feat_id, scalar_2_feat_id, scalar_3_feat_id, 
field_1_feat_id, field_2_feat_id]\n", "print(f\"{problem.filter_in_features_identifiers(all_feature_ids) = }\")\n", "print(f\"{problem.filter_out_features_identifiers(all_feature_ids) = }\")" ] }, { "cell_type": "markdown", "id": "25ddaf3d", "metadata": {}, "source": [ "## Section 3: Saving and Loading Problem Definitions\n", "\n", "This section demonstrates how to save a Problem Definition to a directory and load it back." ] }, { "cell_type": "markdown", "id": "7d2c9145", "metadata": {}, "source": [ "### Save a Problem Definition to a directory" ] }, { "cell_type": "code", "execution_count": null, "id": "b8dfbb6a", "metadata": {}, "outputs": [], "source": [ "test_pth = Path(f\"/tmp/test_safe_to_delete_{np.random.randint(low=1, high=2_000_000_000)}\")\n", "pb_def_save_fname = test_pth / \"test\"\n", "test_pth.mkdir(parents=True, exist_ok=True)\n", "print(f\"saving path: {pb_def_save_fname}\")\n", "\n", "problem.save_to_dir(pb_def_save_fname)" ] }, { "cell_type": "markdown", "id": "69f0b1d4", "metadata": {}, "source": [ "### Load a ProblemDefinition from a directory via initialization" ] }, { "cell_type": "code", "execution_count": null, "id": "9ce604a2", "metadata": {}, "outputs": [], "source": [ "problem = ProblemDefinition(pb_def_save_fname)\n", "print(problem)" ] }, { "cell_type": "markdown", "id": "b24a6141", "metadata": {}, "source": [ "### Load from a directory via the ProblemDefinition class" ] }, { "cell_type": "code", "execution_count": null, "id": "a63a2dcd", "metadata": {}, "outputs": [], "source": [ "problem = ProblemDefinition.load(pb_def_save_fname)\n", "print(problem)" ] }, { "cell_type": "markdown", "id": "2570f767", "metadata": {}, "source": [ "### Load from a directory via a ProblemDefinition instance" ] }, { "cell_type": "code", "execution_count": null, "id": "e731988b", "metadata": {}, "outputs": [], "source": [ "problem = ProblemDefinition()\n", "problem.load(pb_def_save_fname)\n", "print(problem)" ] } ], "metadata": { "jupytext": { "formats": 
"ipynb,py:percent" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }