{ "cells": [ { "cell_type": "markdown", "id": "6660f0f8", "metadata": {}, "source": [ "# Statistics Calculation Examples\n", "\n", "1. OnlineStatistics Class:\n", "- Initialize an OnlineStatistics object.\n", "- Calculate statistics for an empty dataset.\n", "- Add the first batch of samples and update statistics.\n", "- Add the second batch of samples and update statistics.\n", "- Combine and recompute statistics for all samples.\n", "\n", "2. Stats Class:\n", "- Initialize a Stats object to collect statistics.\n", "- Create and add samples with scalar and field data.\n", "- Retrieve and display the calculated statistics.\n", "- Add more samples with varying field sizes and update statistics.\n", "- Retrieve and display the updated statistics.\n", "\n", "This notebook provides examples of using the OnlineStatistics and Stats classes to compute statistics from sample data, including scalars and fields. It demonstrates the functionality and usage of these classes.\n", "\n", "**Each section is documented and explained.**" ] }, { "cell_type": "code", "execution_count": null, "id": "7b055122", "metadata": {}, "outputs": [], "source": [ "# Import required libraries\n", "import numpy as np\n", "import rich" ] }, { "cell_type": "code", "execution_count": null, "id": "112f5851", "metadata": {}, "outputs": [], "source": [ "# Import necessary libraries and functions\n", "from plaid import Sample\n", "from plaid.utils.stats import OnlineStatistics, Stats" ] }, { "cell_type": "code", "execution_count": null, "id": "bbae2766", "metadata": {}, "outputs": [], "source": [ "def sprint(stats: dict):\n", " print(\"Stats:\")\n", " for k in stats:\n", " print(\" - {} -> {}\".format(k, stats[k]))" ] }, { "cell_type": "markdown", "id": "bfe3efe3", "metadata": {}, "source": [ "## Section 1: OnlineStatistics Class\n", "\n", "In this section, we demonstrate the usage of the OnlineStatistics class. We initialize an OnlineStatistics object and calculate statistics for an empty dataset. Then, we add the first and second batches of samples and update the statistics. Finally, we combine and recompute statistics for all samples." ] }, { "cell_type": "markdown", "id": "c6fb35d6", "metadata": {}, "source": [ "### Initialize and empty OnlineStatistics" ] }, { "cell_type": "code", "execution_count": null, "id": "096d7380", "metadata": {}, "outputs": [], "source": [ "print(\"#---# Initialize OnlineStatistics\")\n", "stats_computer = OnlineStatistics()\n", "stats = stats_computer.get_stats()\n", "\n", "sprint(stats)" ] }, { "cell_type": "markdown", "id": "8287566c", "metadata": {}, "source": [ "### Add sample batches" ] }, { "cell_type": "code", "execution_count": null, "id": "fc03469f", "metadata": {}, "outputs": [], "source": [ "# First batch of samples\n", "first_batch_samples = 3.0 * np.random.randn(100, 3) + 10.0\n", "print(f\"{first_batch_samples.shape = }\")\n", "\n", "stats_computer.add_samples(first_batch_samples)\n", "stats = stats_computer.get_stats()\n", "\n", "sprint(stats)" ] }, { "cell_type": "code", "execution_count": null, "id": "06bca5f0", "metadata": {}, "outputs": [], "source": [ "second_batch_samples = 10.0 * np.random.randn(1000, 3) - 1.0\n", "print(f\"{second_batch_samples.shape = }\")\n", "\n", "stats_computer.add_samples(second_batch_samples)\n", "stats = stats_computer.get_stats()\n", "\n", "sprint(stats)" ] }, { "cell_type": "markdown", "id": "e497dd2a", "metadata": {}, "source": [ "### Combine and recompute statistics" ] }, { "cell_type": "code", "execution_count": null, "id": "b38f94ce", "metadata": {}, "outputs": [], "source": [ "total_samples = np.concatenate((first_batch_samples, second_batch_samples), axis=0)\n", "print(f\"{total_samples.shape = }\")\n", "\n", "new_stats_computer = OnlineStatistics()\n", "new_stats_computer.add_samples(total_samples)\n", "stats = new_stats_computer.get_stats()\n", "\n", "sprint(stats)" ] }, { "cell_type": "markdown", "id": "3beb22c3", "metadata": {}, "source": [ "## Section 2: Stats Class\n", "\n", "In this section, we explore the Stats class. We initialize a Stats object to collect statistics, create and add samples with scalar and field data. We retrieve and display the calculated statistics. We also add more samples with varying field sizes and update the statistics, followed by retrieving and displaying the updated statistics." ] }, { "cell_type": "markdown", "id": "48987fa5", "metadata": {}, "source": [ "### Initalize an empty Stats object" ] }, { "cell_type": "code", "execution_count": null, "id": "12bf734a", "metadata": {}, "outputs": [], "source": [ "print(\"#---# Initialize Stats\")\n", "stats = Stats()\n", "print(f\"{stats.get_stats() = }\")" ] }, { "cell_type": "markdown", "id": "ca99f4b7", "metadata": {}, "source": [ "### Feed Stats with Samples" ] }, { "cell_type": "code", "execution_count": null, "id": "c5e6bf58", "metadata": {}, "outputs": [], "source": [ "print(\"#---# Feed Stats with samples\")\n", "\n", "# Init 11 samples\n", "nb_samples = 11\n", "samples = [Sample() for _ in range(nb_samples)]\n", "\n", "spatial_shape_max = 20\n", "#\n", "for sample in samples:\n", " sample.add_scalar(\"test_scalar\", np.random.randn())\n", " sample.init_base(2, 3, \"test_base\")\n", " zone_shape = np.array([0, 0, 0])\n", " sample.init_zone(zone_shape, zone_name=\"test_zone\")\n", " sample.add_field(\"test_field\", np.random.randn(spatial_shape_max))\n", "\n", "stats.add_samples(samples)" ] }, { "cell_type": "markdown", "id": "403e327f", "metadata": {}, "source": [ "### Get and print stats" ] }, { "cell_type": "code", "execution_count": null, "id": "a6aa0d7c", "metadata": {}, "outputs": [], "source": [ "rich.print(\"stats.get_stats():\")\n", "rich.print(stats.get_stats())" ] }, { "cell_type": "markdown", "id": "9f889479", "metadata": {}, "source": [ "### Feed Stats with more Samples" ] }, { "cell_type": "code", "execution_count": null, "id": "0045a539", "metadata": {}, "outputs": [], "source": [ "nb_samples = 11\n", "spatial_shape_max = 20\n", "samples = [Sample() for _ in range(nb_samples)]\n", "\n", "for sample in samples:\n", " sample.add_scalar(\"test_scalar\", np.random.randn())\n", " sample.init_base(2, 3, \"test_base\")\n", " zone_shape = np.array([0, 0, 0])\n", " sample.init_zone(zone_shape, zone_name=\"test_zone\")\n", " sample.add_field(\"test_field_same_size\", np.random.randn(7))\n", " sample.add_field(\n", " \"test_field\",\n", " np.random.randn(np.random.randint(spatial_shape_max // 2, spatial_shape_max)),\n", " )\n", "\n", "stats.add_samples(samples)" ] }, { "cell_type": "markdown", "id": "52e2703a", "metadata": {}, "source": [ "### Get and print stats" ] }, { "cell_type": "code", "execution_count": null, "id": "afb2866e", "metadata": {}, "outputs": [], "source": [ "rich.print(\"stats.get_stats():\")\n", "rich.print(stats.get_stats())" ] } ], "metadata": { "jupytext": { "formats": "ipynb,py:percent" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }