plaid.containers.dataset

Implementation of the Dataset container.

Attributes

Classes

Dataset

A set of samples, and optionally some other information about the Dataset.

Functions

process_sample(→ tuple)

Load Sample from path.

Module Contents

Self

process_sample(path: str | pathlib.Path) → tuple

Load Sample from path.

Parameters:

path (Union[str,Path]) – The path to the Sample.

Returns:

The loaded Sample and its ID.

Return type:

tuple

class Dataset(path: str | pathlib.Path | None = None, directory_path: str | pathlib.Path | None = None, verbose: bool = False, processes_number: int = 0)

Bases: object

A set of samples, and optionally some other information about the Dataset.

Initialize a Dataset.

If path is not specified, an empty Dataset is initialized; it should then be fed with Samples.

Use add_sample or add_samples to feed the Dataset.

Parameters:
  • path (Union[str,Path], optional) – The path from which to load PLAID dataset files.

  • directory_path (Union[str,Path], optional) – Deprecated, use path instead.

  • verbose (bool, optional) – Explicitly displays the operations performed. Defaults to False.

  • processes_number (int, optional) – Number of processes used to load files (-1 to use all available resources, 0 to disable multiprocessing). Defaults to 0.

Example

from plaid import Dataset

# 1. Create empty instance of Dataset
dataset = Dataset()
print(dataset)
>>> Dataset(0 samples, 0 scalars, 0 fields)
print(len(dataset))
>>> 0

# 2. Load dataset and create Dataset instance
dataset = Dataset("path_to_plaid_dataset") # .plaid or directory
print(dataset)
>>> Dataset(3 samples, 2 scalars, 5 fields)
print(len(dataset))
>>> 3
for sample in dataset:
    print(sample)
>>> Sample(1 scalar, 0 time series, 1 timestamp, 2 fields)
    Sample(1 scalar, 0 time series, 0 timestamps, 0 fields)
    Sample(2 scalars, 0 time series, 1 timestamp, 2 fields)

Caution

It is assumed that you provided a compatible PLAID dataset.

_samples: dict[int, plaid.containers.sample.Sample]

_infos: dict[str, dict[str, str]]

copy() → Self

Create a deep copy of the dataset.

Returns:

A new Dataset instance with all internal data (samples, infos) deeply copied to ensure full isolation from the original.

Notes

This operation may be memory-intensive for large datasets.

get_samples(ids: list[int] | None = None, as_list: bool = False) → list[plaid.containers.sample.Sample] | dict[int, plaid.containers.sample.Sample]

Return the samples whose ids are listed in ids if specified, else all samples.

Parameters:
  • ids (list[int], optional) – If None, take all samples. Defaults to None.

  • as_list (bool, optional) – If False, return a dict id -> sample, else return a list of Samples in the same order as ids. Defaults to False.

Returns:

Samples with corresponding ids.

Return type:

dict[int,Sample] or list[Sample]

add_sample(sample: plaid.containers.sample.Sample, id: int | None = None) → int

Add a new Sample to the Dataset.

Parameters:
  • sample (Sample) – The sample to add.

  • id (int, optional) – An optional ID for the new sample. If not provided, the ID will be automatically generated based on the current number of samples in the dataset.

Raises:

TypeError – If sample is not a Sample.

Returns:

Id of the newly added Sample.

Return type:

int

Example

from plaid import Dataset
dataset = Dataset()
dataset.add_sample(sample)
print(dataset)
>>> Dataset(1 sample, 0 scalars, 2 fields)

del_sample(sample_id: int) → None

Delete a Sample from the Dataset and reorganize the remaining sample IDs to eliminate gaps.

Parameters:

sample_id (int) – The ID of the sample to delete.

Raises:

ValueError – If the provided sample ID is not present in the dataset.

Returns:

The new list of sample ids.

Return type:

list[int]

Example

from plaid import Dataset
dataset = Dataset()
dataset.add_samples(samples)
print(dataset)
>>> Dataset(1 samples, y scalars, x fields)
dataset.del_sample(0)
print(dataset)
>>> Dataset(0 samples, 0 scalars, 0 fields)

add_samples(samples: list[plaid.containers.sample.Sample], ids: list[int] | None = None) → list[int]

Add new Samples to the Dataset.

Parameters:
  • samples (list[Sample]) – The list of samples to add.

  • ids (list[int], optional) – An optional list of IDs for the new samples. If not provided, the IDs will be automatically generated based on the current number of samples in the dataset.

Raises:
  • TypeError – If samples is not a list or if one of the samples is not a Sample.

  • ValueError – If samples list is empty.

  • ValueError – If the length of ids list (if provided) is not equal to the length of samples list.

  • ValueError – If provided ids are not unique.

Returns:

Ids of added Samples.

Return type:

list[int]
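
The id generation and the checks listed under Raises can be pictured with a plain dictionary standing in for the Dataset's sample store. This is only a sketch: `add_samples_sketch` is a hypothetical helper, the samples are arbitrary objects rather than real Sample instances, and the library's actual id assignment may differ in details (for example, in how it treats ids that already exist).

```python
def add_samples_sketch(store, samples, ids=None):
    """Add samples to a dict[int, object] store and return the ids used."""
    if not isinstance(samples, list):
        raise TypeError("samples must be a list")
    if len(samples) == 0:
        raise ValueError("samples list is empty")
    if ids is None:
        # Assumption: new ids continue from the current number of samples.
        start = len(store)
        ids = list(range(start, start + len(samples)))
    else:
        if len(ids) != len(samples):
            raise ValueError("ids and samples must have the same length")
        if len(set(ids)) != len(ids):
            raise ValueError("ids must be unique")
    for sample_id, sample in zip(ids, samples):
        store[sample_id] = sample
    return list(ids)


store = {}
print(add_samples_sketch(store, ["a", "b"]))       # [0, 1]
print(add_samples_sketch(store, ["c"], ids=[10]))  # [10]
```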

Example

from plaid import Dataset
dataset = Dataset()
dataset.add_samples(samples)
print(len(samples))
>>> n
print(dataset)
>>> Dataset(n samples, 0 scalars, x fields)

del_samples(sample_ids: list[int]) → None

Delete Samples from the Dataset and reorganize the remaining sample IDs to eliminate gaps.

Parameters:

sample_ids (list[int]) – The list of IDs of samples to delete.

Raises:
  • TypeError – If sample_ids is not a list.

  • ValueError – If sample_ids list is empty.

  • ValueError – If any of the sample_ids does not exist in the dataset.

  • ValueError – If the provided IDs are not unique.

Returns:

The new list of sample ids.

Return type:

list[int]
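
One way to picture "reorganize the remaining sample IDs to eliminate gaps" is the sketch below, which uses a plain dictionary as the sample store. `del_samples_sketch` is a hypothetical helper, and the renumbering rule (survivors keep their relative order and receive ids 0..n-1) is an assumption, not a statement about the library's internals.

```python
def del_samples_sketch(store, sample_ids):
    """Remove ids from a dict[int, object] store, then renumber from 0."""
    if not isinstance(sample_ids, list):
        raise TypeError("sample_ids must be a list")
    if len(sample_ids) == 0:
        raise ValueError("sample_ids list is empty")
    if len(set(sample_ids)) != len(sample_ids):
        raise ValueError("sample_ids must be unique")
    missing = [i for i in sample_ids if i not in store]
    if missing:
        raise ValueError(f"unknown sample ids: {missing}")
    to_delete = set(sample_ids)
    # Assumption: survivors keep their relative order and get ids 0..n-1.
    survivors = [store[i] for i in sorted(store) if i not in to_delete]
    store.clear()
    store.update(enumerate(survivors))
    return list(store)


store = {i: f"sample_{i}" for i in range(6)}
print(del_samples_sketch(store, [1, 3, 5]))  # [0, 1, 2]
print(store)  # {0: 'sample_0', 1: 'sample_2', 2: 'sample_4'}
```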

Example

from plaid import Dataset
dataset = Dataset()
# Assume samples are already added to the dataset
print(dataset)
>>> Dataset(6 samples, y scalars, x fields)
dataset.del_samples([1, 3, 5])
print(dataset)
>>> Dataset(3 samples, y scalars, x fields)

get_sample_ids() → list[int]

Return list of sample ids.

Returns:

List of sample ids.

Return type:

list[int]

get_scalar_names(ids: list[int] | None = None) → list[str]

Return the union of scalar names in all samples whose id is in ids.

Parameters:

ids (list[int], optional) – Select scalars depending on sample id. If None, take all samples. Defaults to None.

Returns:

List of all scalar names.

Return type:

list[str]
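
The "union of names" semantics can be sketched with each sample reduced to a dictionary of scalars. `union_of_scalar_names` is a hypothetical helper, and the sorted output order is an assumption made here for determinism.

```python
def union_of_scalar_names(samples, ids=None):
    """samples: dict[int, dict[str, float]] mapping sample id -> scalars."""
    selected = samples.keys() if ids is None else ids
    names = set()
    for sample_id in selected:
        names.update(samples[sample_id])
    # Assumption: sorting gives a deterministic order.
    return sorted(names)


samples = {
    0: {"Pr": 1.0, "Q": 2.0},
    1: {"Q": 3.0, "power": 4.0},
    2: {"angle_in": 5.0},
}
print(union_of_scalar_names(samples))          # ['Pr', 'Q', 'angle_in', 'power']
print(union_of_scalar_names(samples, [0, 1]))  # ['Pr', 'Q', 'power']
```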

get_time_series_names(ids: list[int] | None = None) → list[str]

Return the union of time series names in all samples whose id is in ids.

Parameters:

ids (list[int], optional) – Select time series depending on sample id. If None, take all samples. Defaults to None.

Returns:

List of all time series names.

Return type:

list[str]

get_field_names(ids: list[int] | None = None, location: str | None = None, zone_name: str | None = None, base_name: str | None = None, time: float | None = None) → list[str]

Return the union of field names in all samples whose id is in ids.

Parameters:
  • ids (list[int], optional) – Select fields depending on sample id. If None, take all samples. Defaults to None.

  • location (str, optional) – If provided, only field names from this location will be included. Defaults to None.

  • zone_name (str, optional) – If provided, only field names from this zone will be included. Defaults to None.

  • base_name (str, optional) – If provided, only field names containing this base name will be included. Defaults to None.

  • time (float, optional) – If provided, only field names from this time will be included. Defaults to None.

Returns:

List of all field names.

Return type:

list[str]

get_all_features_identifiers(ids: list[int] | None = None) → list[plaid.types.FeatureIdentifier]

Get all feature identifiers from the dataset.

Parameters:

ids (list[int], optional) – Sample ids from which to return feature identifiers. If None, take all samples. Defaults to None.

Returns:

A list of dictionaries containing the identifiers of all features in the dataset.

Return type:

list[FeatureIdentifier]

get_all_features_identifiers_by_type(feature_type: Literal['scalar', 'nodes', 'field', 'time_series'], ids: list[int] = None) → list[plaid.types.FeatureIdentifier]

Get all feature identifiers of a given type from the dataset.

Parameters:
  • feature_type (str) – Type of features to return.

  • ids (list[int], optional) – Sample ids from which to return feature identifiers. If None, take all samples. Defaults to None.

Returns:

A list of dictionaries containing the identifiers of all features of a given type in the dataset.

Return type:

list[FeatureIdentifier]

add_tabular_scalars(tabular: numpy.ndarray, names: list[str] | None = None) → None

Add tabular scalar data to the summary.

Parameters:
  • tabular (np.ndarray) – A 2D NumPy array containing tabular scalar data.

  • names (list[str], optional) – A list of column names for the tabular data. Defaults to None.

Raises:
  • ShapeError – Raised if the input tabular array does not have the correct shape (2D).

  • ShapeError – Raised if the number of columns in the tabular data does not match the number of names provided.

Notes

If no names are provided, names are generated automatically following the pattern ‘X{number}’.
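
The shape checks and the automatic naming can be illustrated on a nested list standing in for the 2D NumPy array. Everything here is a sketch under assumptions: `tabular_to_scalars` and the local `ShapeError` class are hypothetical, each row is assumed to describe one sample, and the generated names are assumed to be numbered from 0.

```python
class ShapeError(Exception):
    """Local stand-in for the ShapeError named in the Raises section."""


def tabular_to_scalars(tabular, names=None):
    """Turn a 2D table (list of rows) into one {name: value} dict per row."""
    if not tabular or not all(isinstance(row, (list, tuple)) for row in tabular):
        raise ShapeError("tabular must be 2D (a sequence of rows)")
    n_cols = len(tabular[0])
    if any(len(row) != n_cols for row in tabular):
        raise ShapeError("all rows must have the same number of columns")
    if names is None:
        # Assumption: generated names are numbered from 0.
        names = [f"X{j}" for j in range(n_cols)]
    if len(names) != n_cols:
        raise ShapeError("number of names must match number of columns")
    return [dict(zip(names, row)) for row in tabular]


print(tabular_to_scalars([[1.0, 2.0], [3.0, 4.0]]))
# [{'X0': 1.0, 'X1': 2.0}, {'X0': 3.0, 'X1': 4.0}]
```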

get_scalars_to_tabular(scalar_names: list[str] | None = None, sample_ids: list[int] | None = None, as_nparray=False) → dict[str, numpy.ndarray] | numpy.ndarray

Return scalar values as tabulars/arrays, either as a dict or as a single array.

Parameters:
  • scalar_names (list[str], optional) – Scalars to work on. If None, all scalars will be returned. Defaults to None.

  • sample_ids (list[int], optional) – Filter by sample id. If None, take all samples. Defaults to None.

  • as_nparray (bool, optional) – If True, return the data as a single numpy ndarray. If False, return a dictionary mapping scalar names to their respective tabular values. Defaults to False.

Returns:

If as_nparray is True, a single array of tabular values. If as_nparray is False, a dict mapping scalar name -> tabular values.

Return type:

np.ndarray or dict[str,np.ndarray]
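
The two return shapes can be compared in a small pure-Python sketch. `scalars_to_tabular` is a hypothetical helper: nested lists stand in for NumPy arrays, the `as_rows` flag plays the role of as_nparray, and reporting a missing scalar as None is an assumption.

```python
def scalars_to_tabular(samples, scalar_names=None, sample_ids=None, as_rows=False):
    """samples: dict[int, dict[str, float]]; returns columns or rows."""
    ids = sorted(samples) if sample_ids is None else sample_ids
    if scalar_names is None:
        scalar_names = sorted({n for i in ids for n in samples[i]})
    # Assumption: a scalar missing from a sample is reported as None.
    columns = {n: [samples[i].get(n) for i in ids] for n in scalar_names}
    if not as_rows:
        return columns
    return [[columns[n][k] for n in scalar_names] for k in range(len(ids))]


samples = {0: {"Pr": 1.0, "Q": 2.0}, 1: {"Pr": 3.0, "Q": 4.0}}
print(scalars_to_tabular(samples))                # {'Pr': [1.0, 3.0], 'Q': [2.0, 4.0]}
print(scalars_to_tabular(samples, as_rows=True))  # [[1.0, 2.0], [3.0, 4.0]]
```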

get_feature_from_string_identifier(feature_string_identifier: str) → dict[int, plaid.types.Feature]

Get features from the dataset based on the provided feature string identifier.

Parameters:

feature_string_identifier (str) – A string identifier for the feature.

Returns:

A dictionary, keyed by integer, of the features matching the provided string identifier.

Return type:

dict[int, Feature]

get_feature_from_identifier(feature_identifier: plaid.types.FeatureIdentifier) → dict[int, plaid.types.Feature]

Get features from the dataset based on the provided feature identifier.

Parameters:

feature_identifier (FeatureIdentifier) – A dictionary containing the feature identifier.

Returns:

A dictionary, keyed by integer, of the features matching the provided identifier.

Return type:

dict[int, Feature]

get_features_from_identifiers(feature_identifiers: list[plaid.types.FeatureIdentifier]) → dict[int, list[plaid.types.Feature]]

Get features from the dataset based on the provided feature identifiers.

Parameters:

feature_identifiers (list[FeatureIdentifier]) – A list of dictionaries, each containing a feature identifier.

Returns:

A dictionary, keyed by integer, of the lists of features matching the provided identifiers.

Return type:

dict[int, list[Feature]]

update_features_from_identifier(feature_identifiers: plaid.types.FeatureIdentifier | list[plaid.types.FeatureIdentifier], features: dict[int, plaid.types.Feature | list[plaid.types.Feature]], in_place: bool = False) → Self

Update one or several features of the dataset by their identifier(s).

This method applies updates to scalars, time series, fields, or nodes using feature identifiers, and corresponding feature data. When in_place=False, a deep copy of the dataset is created before applying updates, ensuring full isolation from the original.

Parameters:
  • feature_identifiers (dict or list of dict) – One or more feature identifiers.

  • features (dict) – Dict with sample index as keys and one or more features as values.

  • in_place (bool, optional) – If True, modifies the current dataset in place. If False, returns a deep copy with updated features. Defaults to False.

Returns:

The updated dataset (either the current instance or a new copy).

Return type:

Self

Raises:

AssertionError – If types are inconsistent or identifiers contain unexpected keys.
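
The difference between in_place=True and in_place=False comes down to whether a deep copy is taken before the update. The sketch below shows that pattern on a plain nested dictionary; `update_scalars` is a hypothetical helper, not the library's method, and it handles scalars only.

```python
import copy


def update_scalars(dataset, updates, in_place=False):
    """dataset and updates: dict[int, dict[str, float]] keyed by sample id."""
    # A deep copy isolates the result from the original when in_place is False.
    target = dataset if in_place else copy.deepcopy(dataset)
    for sample_id, new_values in updates.items():
        target[sample_id].update(new_values)
    return target


original = {0: {"Pr": 1.0}, 1: {"Pr": 2.0}}
updated = update_scalars(original, {0: {"Pr": 9.0}})
print(original[0]["Pr"], updated[0]["Pr"])  # 1.0 9.0
same = update_scalars(original, {0: {"Pr": 9.0}}, in_place=True)
print(same is original, original[0]["Pr"])  # True 9.0
```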

extract_dataset_from_identifier(feature_identifiers: plaid.types.FeatureIdentifier | list[plaid.types.FeatureIdentifier]) → Self

Extract features of the dataset by their identifier(s) and return a new dataset containing these features.

This method extracts scalars, time series, fields, or nodes using feature identifiers.

Parameters:

feature_identifiers (dict or list of dict) – One or more feature identifiers.

Returns:

New dataset containing the provided feature identifiers.

Return type:

Self

Raises:

AssertionError – If types are inconsistent or identifiers contain unexpected keys.

from_features_identifier(feature_identifiers: plaid.types.FeatureIdentifier | list[plaid.types.FeatureIdentifier]) → Self

DEPRECATED: Use extract_dataset_from_identifier() instead.

get_tabular_from_homogeneous_identifiers(feature_identifiers: list[plaid.types.FeatureIdentifier]) → plaid.types.Array

Extract features of the dataset by their identifier(s) and return an array containing these features.

Features must have identical sizes to be cast into an array. The first dimension of the array is the number of samples in the dataset. This method extracts the features designated by the feature identifiers.

Parameters:

feature_identifiers (list of dict) – Feature identifiers.

Returns:

An array containing the provided feature identifiers, of size (nb_sample, nb_features, dim_features).

Return type:

Array

Notes

Does not currently work with time_series (a time series has two elements: time_sequence and values).

Raises:

AssertionError – If feature sizes are inconsistent.

get_tabular_from_stacked_identifiers(feature_identifiers: list[plaid.types.FeatureIdentifier]) → plaid.types.Array

Extract features of the dataset by their identifier(s), stack them and return an array containing these features.

After stacking, each sample has one feature of dimension dim_stacked_features.

Parameters:

feature_identifiers (list of dict) – Feature identifiers.

Returns:

An array containing the provided feature identifiers, of size (nb_sample, dim_stacked_features).

Return type:

Array
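
The homogeneous layout (nb_sample, nb_features, dim_features) and the stacked layout (nb_sample, dim_stacked_features) can be contrasted with nested lists in place of arrays. Both helpers below are hypothetical, and features are looked up by plain name rather than by a real feature identifier.

```python
def homogeneous_tabular(samples, names):
    """Result shape: (nb_sample, nb_features, dim_features)."""
    dims = {len(s[n]) for s in samples for n in names}
    assert len(dims) == 1, "features must all have the same size"
    return [[list(s[n]) for n in names] for s in samples]


def stacked_tabular(samples, names):
    """Result shape: (nb_sample, dim_stacked_features)."""
    # Each sample's selected features are concatenated end to end.
    return [[v for n in names for v in s[n]] for s in samples]


samples = [
    {"mach": [0.1, 0.2], "nut": [1.0, 2.0]},
    {"mach": [0.3, 0.4], "nut": [3.0, 4.0]},
]
print(homogeneous_tabular(samples, ["mach", "nut"]))
# [[[0.1, 0.2], [1.0, 2.0]], [[0.3, 0.4], [3.0, 4.0]]]
print(stacked_tabular(samples, ["mach", "nut"]))
# [[0.1, 0.2, 1.0, 2.0], [0.3, 0.4, 3.0, 4.0]]
```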

add_features_from_tabular(tabular: plaid.types.Array, feature_identifiers: plaid.types.FeatureIdentifier | list[plaid.types.FeatureIdentifier], restrict_to_features: bool = True) → Self

Add or update features in the dataset from tabular data using feature identifiers.

This method takes tabular data and applies it to the dataset, either by updating existing features or adding new ones based on the provided feature identifiers. The method can either:

  1. Extract only the specified features and return a new dataset with just those features (if restrict_to_features=True).

  2. Update the specified features in the current dataset while keeping all other existing features (if restrict_to_features=False).

Parameters:
  • tabular (Array) – of size (nb_sample, nb_features) or (nb_sample, nb_features, dim_feature) if dim_feature>1

  • feature_identifiers (dict or list of dict) – One or more feature identifiers specifying which features to update/add.

  • restrict_to_features (bool, optional) – If True, only returns the features from feature identifiers, otherwise keep the other features as well. Defaults to True.

Returns:

A new dataset with features updated/added from the tabular data. If restrict_to_features=True, contains only the specified features. If restrict_to_features=False, contains all original features plus the updated/added ones.

Return type:

Self

Raises:

AssertionError – If the number of rows in tabular does not match the number of samples in the dataset, or if the number of feature identifiers does not match the number of columns in tabular.

from_tabular(tabular: plaid.types.Array, feature_identifiers: plaid.types.FeatureIdentifier | list[plaid.types.FeatureIdentifier], restrict_to_features: bool = True) → Self

DEPRECATED: Use add_features_from_tabular() instead.

add_info(cat_key: str, info_key: str, info: str) → None

Add information to the Dataset, overwriting existing information if there’s a conflict.

Parameters:
  • cat_key (str) – Category key, choose among “legal”, “data_production”, and “data_description”.

  • info_key (str) – Information key, depending on the chosen category key, choose among “owner”, “license”, “type”, “physics”, “simulator”, “hardware”, “computation_duration”, “script”, “contact”, “location”, “number_of_samples”, “number_of_splits”, “DOE”, “inputs” and “outputs”.

  • info (str) – Information content.

Example

from plaid import Dataset
dataset = Dataset()
infos = {"legal":{"owner":"CompX", "license":"li_X"}}
dataset.set_infos(infos)
print(dataset.get_infos())
>>> {'legal': {'owner': 'CompX', 'license': 'li_X'}}
dataset.add_info("data_production", "type", "simulation")
print(dataset.get_infos())
>>> {'legal': {'owner': 'CompX', 'license': 'li_X'}, 'data_production': {'type': 'simulation'}}

add_infos(cat_key: str, infos: dict[str, str]) → None

Add information to the Dataset, overwriting existing information if there’s a conflict.

Parameters:
  • cat_key (str) – Category key, choose among “legal”, “data_production”, and “data_description”.

  • infos (dict[str, str]) – Information keys with their related content.

Example

from plaid import Dataset
dataset = Dataset()
infos = {"legal":{"owner":"CompX", "license":"li_X"}}
dataset.set_infos(infos)
print(dataset.get_infos())
>>> {'legal': {'owner': 'CompX', 'license': 'li_X'}}
new_info = {"type":"simulation", "simulator":"Z-set"}
dataset.add_infos("data_production", new_info)
print(dataset.get_infos())
>>> {'legal': {'owner': 'CompX', 'license': 'li_X'}, 'data_production': {'type': 'simulation', 'simulator': 'Z-set'}}

set_infos(infos: dict[str, dict[str, str]]) → None

Set the information of the Dataset, overwriting any existing information.

Parameters:

infos (dict[str,dict[str,str]]) – Information to associate with this data set (Dataset).

Raises:
  • KeyError – Invalid category key format in provided infos.

  • KeyError – Invalid info key format in provided infos.
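
The two KeyError cases can be sketched with the category keys and info keys listed for add_info. `validate_infos` is a hypothetical helper; since the mapping from each category to its allowed info keys is not spelled out here, the sketch checks info keys against the full list only.

```python
CATEGORY_KEYS = {"legal", "data_production", "data_description"}
INFO_KEYS = {
    "owner", "license", "type", "physics", "simulator", "hardware",
    "computation_duration", "script", "contact", "location",
    "number_of_samples", "number_of_splits", "DOE", "inputs", "outputs",
}


def validate_infos(infos):
    """Raise KeyError on an unknown category key or info key."""
    for cat_key, cat_infos in infos.items():
        if cat_key not in CATEGORY_KEYS:
            raise KeyError(f"invalid category key: {cat_key!r}")
        for info_key in cat_infos:
            # Simplification: keys are not checked against their category.
            if info_key not in INFO_KEYS:
                raise KeyError(f"invalid info key: {info_key!r}")


validate_infos({"legal": {"owner": "CompX", "license": "li_X"}})  # passes
try:
    validate_infos({"legals": {"owner": "CompX"}})
except KeyError as err:
    print("rejected:", err)
```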

Example

from plaid import Dataset
dataset = Dataset()
infos = {"legal":{"owner":"CompX", "license":"li_X"}}
dataset.set_infos(infos)
print(dataset.get_infos())
>>> {'legal': {'owner': 'CompX', 'license': 'li_X'}}

get_infos() → dict[str, dict[str, str]]

Get information from an instance of Dataset.

Returns:

Information associated with this data set (Dataset).

Return type:

dict[str,dict[str,str]]

Example

from plaid import Dataset
dataset = Dataset()
infos = {"legal":{"owner":"CompX", "license":"li_X"}}
dataset.set_infos(infos)
print(dataset.get_infos())
>>> {'legal': {'owner': 'CompX', 'license': 'li_X'}}

print_infos() → None

Prints information in a readable format (pretty print).

merge_dataset(dataset: Self) → list[int]

Merges samples of another dataset into this one.

Parameters:
  • dataset (Dataset) – The data set to be merged into this one (self).

  • in_place (bool, optional) – If True, modifies the current dataset in place.

Returns:

Ids of the Samples added from the input Dataset.

Return type:

list[int]

Raises:

ValueError – If the provided dataset value is not an instance of Dataset

merge_features(dataset: Self, in_place: bool = False) → Self

Merge features of another dataset into this one.

Parameters:
  • dataset (Dataset) – The dataset to be merged into this one (self).

  • in_place (bool, optional) – If True, modifies the current dataset in place. If False, returns a deep copy with the merged features.

Returns:

A dataset containing all samples from the input datasets.

Return type:

Dataset

classmethod merge_dataset_by_features(datasets_list: list[Self]) → Self

Merge the features of a list of datasets.

Parameters:

datasets_list (list[Dataset]) – The list of datasets to be merged.

Returns:

A new dataset containing all samples from the input datasets.

Return type:

Dataset

save(path: str | pathlib.Path) → None

Saves the data set to a TAR (Tape Archive) file.

It creates a temporary intermediate directory to store temporary files during the saving process.

Parameters:

path (Union[str,Path]) – The path to which the data set will be saved.

Raises:

ValueError – If the randomly generated temporary dir name is already used (extremely unlikely!).
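
The "temporary directory, then TAR archive" pattern can be sketched with the standard library alone. This is not the library's on-disk format: `save_sketch` and `load_sketch` are hypothetical, JSON files stand in for the real sample and infos files, and the file names are invented for illustration.

```python
import json
import tarfile
import tempfile
from pathlib import Path


def save_sketch(samples, infos, path):
    """Write samples/ and an infos file to a temp dir, then archive it."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        (tmp_dir / "samples").mkdir()
        for sample_id, sample in samples.items():
            # Hypothetical file naming; JSON stands in for the real format.
            sample_file = tmp_dir / "samples" / f"sample_{sample_id}.json"
            sample_file.write_text(json.dumps(sample))
        (tmp_dir / "infos.json").write_text(json.dumps(infos))
        with tarfile.open(path, "w") as tar:
            for child in sorted(tmp_dir.iterdir()):
                tar.add(child, arcname=child.name)


def load_sketch(path):
    """Extract the archive to a temp dir and read everything back."""
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(path, "r") as tar:
            tar.extractall(tmp)
        tmp_dir = Path(tmp)
        infos = json.loads((tmp_dir / "infos.json").read_text())
        samples = {}
        for f in sorted((tmp_dir / "samples").glob("sample_*.json")):
            samples[int(f.stem.split("_")[1])] = json.loads(f.read_text())
    return samples, infos


with tempfile.TemporaryDirectory() as work:
    archive = Path(work) / "dataset.tar"
    save_sketch({0: {"Pr": 1.0}}, {"legal": {"owner": "CompX"}}, archive)
    print(load_sketch(archive))
# ({0: {'Pr': 1.0}}, {'legal': {'owner': 'CompX'}})
```

Both helpers let the temporary directory clean itself up on exit, so no intermediate files are left behind whether or not an error occurs.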

summarize_features() → str

Show the name of each feature and the number of samples containing it.

Returns:

A summary of features across the dataset.

Return type:

str
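
The counting behind such a summary (how many samples contain each feature, and what share of the dataset that is) can be sketched as follows. `summarize_names` is a hypothetical helper working on plain sets of names; the line layout is modelled on the summary format documented for this method, and the sketch assumes a non-empty dataset.

```python
from collections import Counter


def summarize_names(samples):
    """samples: dict[int, set[str]] mapping sample id -> feature names."""
    total = len(samples)
    counts = Counter(name for names in samples.values() for name in names)
    lines = []
    for name in sorted(counts):
        share = 100.0 * counts[name] / total
        lines.append(f"- {name}: {counts[name]}/{total} samples ({share:.1f}%)")
    return "\n".join(lines)


samples = {0: {"Pr", "Q"}, 1: {"Pr"}, 2: {"Pr", "Q"}, 3: {"Pr", "Q"}}
print(summarize_names(samples))
# - Pr: 4/4 samples (100.0%)
# - Q: 3/4 samples (75.0%)
```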

Example

Dataset Feature Summary:
==================================================
Scalars (8 unique):
- Pr: 30/32 samples (93.8%)
- Q: 30/32 samples (93.8%)
- Tr: 30/32 samples (93.8%)
- angle_in: 32/32 samples (100.0%)
- angle_out: 30/32 samples (93.8%)
- eth_is: 30/32 samples (93.8%)
- mach_out: 32/32 samples (100.0%)
- power: 30/32 samples (93.8%)

Time Series (0 unique):
None

Fields (8 unique):
- M_iso: 30/32 samples (93.8%)
- mach: 30/32 samples (93.8%)
- nut: 30/32 samples (93.8%)
- ro: 30/32 samples (93.8%)
- roe: 30/32 samples (93.8%)
- rou: 30/32 samples (93.8%)
- rov: 30/32 samples (93.8%)
- sdf: 32/32 samples (100.0%)

check_feature_completeness() → str

Detect and notify if some Samples don’t contain all features.

Returns:

A report on feature completeness across the dataset.

Return type:

str

Example

Dataset Feature Completeness Check:
========================================
Complete samples: 30/32 (93.8%)
Incomplete samples: 2/32 (6.2%)

Samples with missing features:
Sample 671: missing 13 features
    - scalar:Tr
    - scalar:angle_out
    - scalar:power
    - scalar:Pr
    - scalar:Q
    ... and 8 more
Sample 672: missing 13 features
    - scalar:Tr
    - scalar:angle_out
    - scalar:power
    - scalar:Pr
    - scalar:Q
    ... and 8 more

classmethod from_list_of_samples(list_of_samples: list[plaid.containers.sample.Sample], ids: list[int] | None = None) → Self

Initialise a dataset from a list of samples.

Parameters:
  • list_of_samples (list[Sample]) – The list of samples.

  • ids (list[int], optional) – An optional list of IDs for the new samples. If not provided, the IDs will be automatically generated based on the current number of samples in the dataset.

Returns:

The initialized dataset (Dataset).

Return type:

Self

classmethod load_from_file(path: str | pathlib.Path, verbose: bool = False, processes_number: int = 0) → Self

Load data from a specified TAR (Tape Archive) file.

Parameters:
  • path (Union[str,Path]) – The path to the data file to be loaded.

  • verbose (bool, optional) – Explicitly displays the operations performed. Defaults to False.

  • processes_number (int, optional) – Number of processes used to load files (-1 to use all available resources, 0 to disable multiprocessing). Defaults to 0.

Returns:

The loaded dataset (Dataset).

Return type:

Self

classmethod load_from_dir(path: str | pathlib.Path, ids: list[int] | None = None, verbose: bool = False, processes_number: int = 0) → Self

Load data from a specified directory.

Parameters:
  • path (Union[str,Path]) – The path from which to load files.

  • ids (list, optional) – The specific sample IDs to load from the dataset. Defaults to None.

  • verbose (bool, optional) – Explicitly displays the operations performed. Defaults to False.

  • processes_number (int, optional) – Number of processes used to load files (-1 to use all available resources, 0 to disable multiprocessing). Defaults to 0.

Returns:

The loaded dataset (Dataset).

Return type:

Self

load(path: str | pathlib.Path, verbose: bool = False, processes_number: int = 0) → None

Load data from a specified TAR (Tape Archive) file.

It creates a temporary intermediate directory to store temporary files during the loading process.

Parameters:
  • path (Union[str,Path]) – The path to the data file to be loaded.

  • verbose (bool, optional) – Explicitly displays the operations performed. Defaults to False.

  • processes_number (int, optional) – Number of processes used to load files (-1 to use all available resources, 0 to disable multiprocessing). Defaults to 0.

Raises:

ValueError – If a randomly generated temporary directory already exists, indicating a potential conflict during the loading process (extremely unlikely).

add_to_dir(sample: plaid.containers.sample.Sample, path: str | pathlib.Path | None = None, save_dir: str | pathlib.Path | None = None, verbose: bool = False) → None

Add a sample to the dataset and save it to the specified directory.

Notes

If path is None, self.path is used; it is set by the most recent call to load or save. A path given as argument takes precedence over self.path and overwrites it.

Parameters:
  • sample (Sample) – The sample to add.

  • path (Union[str,Path], optional) – The directory in which to save the sample. Defaults to None.

  • save_dir (Union[str,Path], optional) – Deprecated, use path instead.

  • verbose (bool, optional) – If True, will print additional information. Defaults to False.

Raises:

ValueError – If both self.path and path are None.

_save_to_dir_(path: str | pathlib.Path, verbose: bool = False) → None

Saves the dataset into a sub-directory samples and creates an ‘infos.yaml’ file to store additional information about the dataset.

Parameters:
  • path (Union[str,Path]) – The path in which to save the files.

  • verbose (bool, optional) – Explicitly displays the operations performed. Defaults to False.

_load_from_dir_(path: str | pathlib.Path, ids: list[int] | None = None, verbose: bool = False, processes_number: int = 0) → None

Loads a dataset from a sample directory and retrieves additional information about the dataset from an ‘infos.yaml’ file, if available.

Parameters:
  • path (Union[str,Path]) – The path from which to load files.

  • ids (list, optional) – The specific sample IDs to load from the dataset. Defaults to None.

  • verbose (bool, optional) – Explicitly displays the operations performed. Defaults to False.

  • processes_number (int, optional) – Number of processes used to load files (-1 to use all available resources, 0 to disable multiprocessing). Defaults to 0.

Raises:
  • FileNotFoundError – Triggered if the provided directory does not exist.

  • FileExistsError – Triggered if the provided path is a file instead of a directory.

  • ValueError – Triggered if the number of processes is < -1.
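
The documented meaning of processes_number (-1 for all available resources, 0 for no multiprocessing, anything below -1 rejected) can be summarized in a few lines. `resolve_processes_number` is a hypothetical helper, and mapping "all available resources" to os.cpu_count() is an assumption.

```python
import os


def resolve_processes_number(processes_number):
    """Return the number of worker processes to use (0 means none)."""
    if processes_number < -1:
        raise ValueError("processes_number must be >= -1")
    if processes_number == -1:
        # os.cpu_count() can return None; fall back to a single process.
        return os.cpu_count() or 1
    return processes_number


print(resolve_processes_number(0))  # 0: load sequentially
print(resolve_processes_number(4))  # 4
```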

static _load_number_of_samples_(_path: str | pathlib.Path) → int

Warning: This method is deprecated; use plaid.get_number_of_samples instead.

This function counts the number of sample files in a specified directory, which is useful for determining the total number of samples in a dataset.

Parameters:

_path (Union[str,Path]) – The path to the directory where sample files are stored.

Returns:

The number of sample files found in the specified directory.

Return type:

int

set_samples(samples: dict[int, plaid.containers.sample.Sample]) → None

Set the samples of the data set, overwriting the existing ones.

Parameters:

samples (dict[int,Sample]) – A dictionary of samples to set inside the dataset.

Raises:
  • TypeError – If the ‘samples’ parameter is not of type dict[int, Sample].

  • TypeError – If the ‘id’ inside a sample is not of type int.

  • ValueError – If the ‘id’ inside a sample is negative (id >= 0 is required).

  • TypeError – If the values inside the ‘samples’ dictionary are not of type Sample.
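
The four checks above can be written as one validation pass over the candidate dictionary. This is a sketch only: `validate_samples` and `FakeSample` are hypothetical, and rejecting bool keys (which are technically ints in Python) is a choice made here rather than documented behaviour.

```python
def validate_samples(samples, sample_type):
    """Check a candidate dict[int, sample_type] before storing it."""
    if not isinstance(samples, dict):
        raise TypeError("samples must be a dict")
    for sample_id, sample in samples.items():
        # bool is a subclass of int, so it is rejected explicitly here.
        if isinstance(sample_id, bool) or not isinstance(sample_id, int):
            raise TypeError(f"id {sample_id!r} is not an int")
        if sample_id < 0:
            raise ValueError(f"id {sample_id} is negative")
        if not isinstance(sample, sample_type):
            raise TypeError(f"value for id {sample_id} is not a {sample_type.__name__}")


class FakeSample:
    """Hypothetical stand-in for plaid's Sample class."""


validate_samples({0: FakeSample(), 1: FakeSample()}, FakeSample)  # passes
```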

set_sample(id: int, sample: plaid.containers.sample.Sample, warning_overwrite: bool = True) → None

Set a sample with id in the Dataset, overwriting existing samples if there’s a conflict.

Parameters:
  • id (int) – The chosen id of the sample.

  • sample (Sample) – The sample to set inside the dataset.

  • warning_overwrite (bool, optional) – Show a warning if a preexisting sample is being overwritten. Defaults to True.

Raises:
  • TypeError – If the ‘id’ inside the sample is not of type int.

  • ValueError – If the ‘id’ inside a sample is negative (id >= 0 is required).

  • TypeError – If ‘sample’ parameter is not of type Sample.

Caution

In case of conflict, the existing samples will be overwritten.