Statistics Calculation Examples

OnlineStatistics Class:

Initialize an OnlineStatistics object.
Calculate statistics for an empty dataset.
Add the first batch of samples and update statistics.
Add the second batch of samples and update statistics.
Combine and recompute statistics for all samples.

Stats Class:

Initialize a Stats object to collect statistics.
Create and add samples with scalar and field data.
Retrieve and display the calculated statistics.
Add more samples with varying field sizes and update statistics.
Retrieve and display the updated statistics.

This notebook shows how to use the OnlineStatistics and Stats classes to compute statistics (sample count, minimum, maximum, mean, variance, and standard deviation) from sample data, including scalars and fields.

Each section below pairs a code cell with its printed output.

# Third-party libraries
import numpy as np
import rich

# PLAID classes used in this notebook
from plaid import Sample
from plaid.utils.stats import OnlineStatistics, Stats

def sprint(stats: dict) -> None:
    """Print each statistic on its own line as ' - name -> value'."""
    print("Stats:")
    for key, value in stats.items():
        print(f" - {key} -> {value}")

Section 1: OnlineStatistics Class

This section demonstrates the OnlineStatistics class. We create an OnlineStatistics object and query it while it is still empty, add two batches of samples and update the statistics after each one, then recompute the statistics from all samples at once to compare the two results.

Initialize an empty OnlineStatistics

print("#---# Initialize OnlineStatistics")
stats_computer = OnlineStatistics()
stats = stats_computer.get_stats()

sprint(stats)

#---# Initialize OnlineStatistics
Stats:
 - n_samples -> 0
 - n_points -> None
 - n_features -> None
 - min -> None
 - max -> None
 - mean -> None
 - var -> None
 - std -> None

Add sample batches

# First batch of samples
first_batch_samples = 3.0 * np.random.randn(100, 3) + 10.0
print(f"{first_batch_samples.shape = }")

stats_computer.add_samples(first_batch_samples)
stats = stats_computer.get_stats()

sprint(stats)

first_batch_samples.shape = (100, 3)
Stats:
 - n_samples -> 100
 - n_points -> 300
 - n_features -> 3
 - min -> [[1.52553357 2.72288592 0.54073117]]
 - max -> [[18.77884738 16.76221839 19.34546095]]
 - mean -> [[ 9.91381289  9.55158194 10.32446268]]
 - var -> [[10.98921588  8.78911533 10.12932221]]
 - std -> [[3.31499862 2.96464422 3.18265961]]

# Second batch of samples
second_batch_samples = 10.0 * np.random.randn(1000, 3) - 1.0
print(f"{second_batch_samples.shape = }")

stats_computer.add_samples(second_batch_samples)
stats = stats_computer.get_stats()

sprint(stats)

second_batch_samples.shape = (1000, 3)
Stats:
 - n_samples -> 1100
 - n_points -> 3300
 - n_features -> 3
 - min -> [[-36.38923563 -43.75221979 -35.44641847]]
 - max -> [[31.55917246 27.4569715  28.74978493]]
 - mean -> [[ 0.0624286  -0.32900861  0.05273231]]
 - var -> [[102.08006206  98.52259556 105.17751073]]
 - std -> [[10.10346782  9.9258549  10.25560874]]
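
The pooled mean near 0 and variance near 102 printed above follow from how the two batches were generated. As a quick sanity check (plain arithmetic, not part of the PLAID API), the expected values for 100 draws with mean 10 and standard deviation 3 mixed with 1000 draws with mean -1 and standard deviation 10 can be computed directly. All variable names here are illustrative.

```python
# Parameters of the two batches generated above.
n1, mu1, sigma1 = 100, 10.0, 3.0    # first batch: 3.0 * randn + 10.0
n2, mu2, sigma2 = 1000, -1.0, 10.0  # second batch: 10.0 * randn - 1.0

n = n1 + n2

# Expected pooled mean: size-weighted average of the batch means.
expected_mean = (n1 * mu1 + n2 * mu2) / n

# Expected pooled variance: weighted within-batch variance plus the
# spread of the batch means around the pooled mean.
within = (n1 * sigma1**2 + n2 * sigma2**2) / n
between = (n1 * (mu1 - expected_mean) ** 2 + n2 * (mu2 - expected_mean) ** 2) / n
expected_var = within + between

print(f"{expected_mean = :.4f}")
print(f"{expected_var = :.4f}")
```

The result is a mean of 0 and a variance of roughly 101.7, close to the empirical values reported for each of the three features.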

Combine and recompute statistics

total_samples = np.concatenate((first_batch_samples, second_batch_samples), axis=0)
print(f"{total_samples.shape = }")

new_stats_computer = OnlineStatistics()
new_stats_computer.add_samples(total_samples)
stats = new_stats_computer.get_stats()

sprint(stats)

total_samples.shape = (1100, 3)
Stats:
 - n_samples -> 1100
 - n_points -> 3300
 - n_features -> 3
 - min -> [[-36.38923563 -43.75221979 -35.44641847]]
 - max -> [[31.55917246 27.4569715  28.74978493]]
 - mean -> [[ 0.0624286  -0.32900861  0.05273231]]
 - var -> [[102.08006206  98.52259556 105.17751073]]
 - std -> [[10.10346782  9.9258549  10.25560874]]
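
The statistics computed from the concatenated array match those obtained by adding the two batches one after the other. One common way to get this property is to keep a count, a mean, and a sum of squared deviations per batch and merge them with a pairwise update. The sketch below illustrates that idea in pure Python on one-dimensional data. It is an assumption for illustration, not a description of how OnlineStatistics is implemented, and `merge` and `summarize` are hypothetical helpers. It also assumes the population variance (division by the sample count).

```python
import random
import statistics


def summarize(values):
    """Return (count, mean, sum of squared deviations) for one batch."""
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values)
    return len(values), mean, m2


def merge(count_a, mean_a, m2_a, count_b, mean_b, m2_b):
    """Combine the summaries of two batches without revisiting the data."""
    count = count_a + count_b
    delta = mean_b - mean_a
    mean = mean_a + delta * count_b / count
    m2 = m2_a + m2_b + delta**2 * count_a * count_b / count
    return count, mean, m2


random.seed(0)
first = [random.gauss(10.0, 3.0) for _ in range(100)]
second = [random.gauss(-1.0, 10.0) for _ in range(1000)]

count, mean, m2 = merge(*summarize(first), *summarize(second))
merged_var = m2 / count  # population variance (assumption)

# One-shot computation on the concatenated data for comparison.
full = first + second
print(abs(mean - statistics.fmean(full)) < 1e-9)
print(abs(merged_var - statistics.pvariance(full)) < 1e-9)
```

Both comparisons print True: merging the summaries gives the same mean and variance as a single pass over all 1100 values, up to floating-point rounding.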

Section 2: Stats Class

This section explores the Stats class. We create a Stats object, feed it samples that carry scalar and field data, and display the resulting statistics. We then add more samples whose field sizes vary and display the updated statistics.

Initialize an empty Stats object

print("#---# Initialize Stats")
stats = Stats()
print(f"{stats.get_stats() = }")

#---# Initialize Stats
stats.get_stats() = {}

Feed Stats with Samples

print("#---# Feed Stats with samples")

# Create 11 empty samples
nb_samples = 11
samples = [Sample() for _ in range(nb_samples)]

spatial_shape_max = 20
# Give each sample one scalar and one field of fixed size
for sample in samples:
    sample.add_scalar("test_scalar", np.random.randn())
    sample.init_base(2, 3, "test_base")
    zone_shape = np.array([0, 0, 0])
    sample.init_zone(zone_shape, zone_name="test_zone")
    sample.add_field("test_field", np.random.randn(spatial_shape_max))

stats.add_samples(samples)

#---# Feed Stats with samples

Get and print stats

rich.print("stats.get_stats():")
rich.print(stats.get_stats())

stats.get_stats():

{
    'test_base/test_zone/Vertex/test_field': {
        'n_samples': 11,
        'n_points': 220,
        'n_features': 20,
        'min': array([[-1.13624337, -1.95220382, -0.69190978, -1.38688929, -1.14885963,
        -0.17358256, -1.79724049, -1.6339858 , -2.30592454, -2.88340202,
        -2.6596943 , -1.05386429, -0.92206688, -0.93964417, -0.38797546,
        -0.72601155, -1.85044563, -1.58922411, -1.19229203, -1.67633984]]),
        'max': array([[1.68721182, 1.23780705, 1.6485124 , 1.47617457, 1.59495211,
        1.58235002, 1.7834908 , 1.94994104, 0.41389081, 1.69014925,
        1.52161595, 2.2444526 , 1.51949616, 1.62512551, 1.54798939,
        1.24861287, 1.71253262, 2.36947934, 2.03294793, 1.12510761]]),
        'mean': array([[ 0.26391411, -0.32888221,  0.48826475, -0.06428235,  0.25721342,
         0.61124879, -0.12747306, -0.24358602, -0.69891615, -0.00842234,
        -0.06665774,  0.57419084, -0.00693747,  0.08412853,  0.35731243,
         0.27656149, -0.28908652,  0.08298516, -0.08630824,  0.01797193]]),
        'var': array([[0.49667896, 0.70346184, 0.66870456, 0.70492344, 0.9131578 ,
        0.31955758, 1.35060404, 1.18393176, 1.10579469, 1.76046788,
        1.34335696, 1.03502602, 0.5679275 , 0.63627227, 0.34560579,
        0.36300865, 1.0478891 , 1.56351179, 0.85944039, 1.14295586]]),
        'std': array([[0.70475454, 0.83872632, 0.81774358, 0.83959719, 0.9555929 ,
        0.56529424, 1.16215491, 1.08808629, 1.05156773, 1.32682624,
        1.15903277, 1.01736229, 0.75360965, 0.79766676, 0.58788246,
        0.60250199, 1.02366455, 1.25040465, 0.92706008, 1.06909114]])
    },
    'test_scalar': {
        'n_samples': 11,
        'n_points': 11,
        'n_features': 1,
        'min': array([[-0.94428715]]),
        'max': array([[0.66796919]]),
        'mean': array([[0.06405618]]),
        'var': array([[0.19009011]]),
        'std': array([[0.43599325]])
    }
}
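
In the output above, every sample contributes a field of the same length, so the statistics are reported per field entry: 11 samples times 20 features gives 220 points. The following pure-Python sketch reproduces that bookkeeping on stand-in data. It does not call the PLAID API, and `rows`, `columns`, and `summary` are illustrative names.

```python
import random
import statistics

random.seed(0)
n_samples, field_size = 11, 20

# One row per sample, one column per field entry
# (hypothetical stand-in for the "test_field" values).
rows = [[random.gauss(0.0, 1.0) for _ in range(field_size)] for _ in range(n_samples)]

# zip(*rows) transposes the data: each column gathers one entry
# position across all samples.
columns = list(zip(*rows))

summary = {
    "n_samples": len(rows),
    "n_points": sum(len(row) for row in rows),
    "n_features": len(columns),
    "min": [min(column) for column in columns],
    "max": [max(column) for column in columns],
    "mean": [statistics.fmean(column) for column in columns],
}
print(summary["n_samples"], summary["n_points"], summary["n_features"])
```

The counts come out as 11, 220, and 20, and each of the min, max, and mean lists has one value per feature, mirroring the shape of the arrays printed by `stats.get_stats()`.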

Feed Stats with more Samples

nb_samples = 11
spatial_shape_max = 20
samples = [Sample() for _ in range(nb_samples)]

# Each sample gets a fixed-size field and a field whose size varies between samples
for sample in samples:
    sample.add_scalar("test_scalar", np.random.randn())
    sample.init_base(2, 3, "test_base")
    zone_shape = np.array([0, 0, 0])
    sample.init_zone(zone_shape, zone_name="test_zone")
    sample.add_field("test_field_same_size", np.random.randn(7))
    sample.add_field(
        "test_field",
        np.random.randn(np.random.randint(spatial_shape_max // 2, spatial_shape_max)),
    )

stats.add_samples(samples)

Get and print stats

rich.print("stats.get_stats():")
rich.print(stats.get_stats())

stats.get_stats():

{
    'test_base/test_zone/Vertex/test_field': {
        'n_samples': 22,
        'n_points': np.int64(378),
        'n_features': 1,
        'min': array([[-2.88340202]]),
        'max': array([[2.36947934]]),
        'mean': array([[0.03703246]]),
        'var': array([[0.99106652]]),
        'std': array([[0.99552324]])
    },
    'test_base/test_zone/Vertex/test_field_same_size': {
        'n_samples': 11,
        'n_points': 77,
        'n_features': 7,
        'min': array([[-1.83609074, -0.98614982, -1.45687969, -1.87371414, -1.6995121 ,
        -1.88094645, -1.77993506]]),
        'max': array([[1.4072362 , 1.78567416, 1.24330349, 0.81705508, 2.52243342,
        1.16381406, 2.78630007]]),
        'mean': array([[-0.02979795,  0.19580167, -0.1223869 , -0.37642349,  0.47490914,
        -0.35683485,  0.18609708]]),
        'var': array([[1.36121599, 0.79416286, 0.71983646, 0.72180118, 1.23660867,
        0.88930425, 2.07797125]]),
        'std': array([[1.16671161, 0.89115816, 0.84843177, 0.84958883, 1.11202908,
        0.94302929, 1.441517  ]])
    },
    'test_scalar': {
        'n_samples': 22,
        'n_points': np.int64(22),
        'n_features': 1,
        'min': array([[-0.94428715]]),
        'max': array([[1.062049]]),
        'mean': array([[-0.0009593]]),
        'var': array([[0.33420932]]),
        'std': array([[0.57810839]])
    }
}
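
Once samples with different field sizes are added, the printed statistics for test_field drop to a single feature, and n_points (378) counts every field value across the 22 samples. This is consistent with all values being pooled into one feature when the fields can no longer be aligned entry by entry. That reading is an interpretation of the output above, not a statement about the internals of Stats. A minimal pure-Python sketch of such pooling, with illustrative names, could look like this:

```python
import random
import statistics

random.seed(0)

# 11 fixed-size fields followed by 11 fields whose length varies per sample
# (10 to 19 values, matching np.random.randint(10, 20) in the cell above).
fields = [[random.gauss(0.0, 1.0) for _ in range(20)] for _ in range(11)]
fields += [
    [random.gauss(0.0, 1.0) for _ in range(random.randint(10, 19))]
    for _ in range(11)
]

# Rows of unequal length cannot be compared entry by entry,
# so every value is pooled into a single feature.
pooled = [value for field in fields for value in field]

summary = {
    "n_samples": len(fields),
    "n_points": len(pooled),
    "n_features": 1,
    "min": min(pooled),
    "max": max(pooled),
    "mean": statistics.fmean(pooled),
    "var": statistics.pvariance(pooled),  # population variance (assumption)
}
print(summary["n_samples"], summary["n_points"], summary["n_features"])
```

Here n_points is the sum of all field lengths, so it depends on the random sizes drawn, while n_samples stays at 22 and n_features at 1.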