plaid.containers.dataset
========================

.. py:module:: plaid.containers.dataset

.. autoapi-nested-parse::

   Implementation of the `Dataset` container.


Attributes
----------

.. autoapisummary::

   plaid.containers.dataset.Self


Classes
-------

.. autoapisummary::

   plaid.containers.dataset.Dataset


Module Contents
---------------

.. py:data:: Self

.. py:class:: Dataset(path: Optional[Union[str, pathlib.Path]] = None, verbose: bool = False, processes_number: int = 0, samples: Optional[list[plaid.containers.sample.Sample]] = None, sample_ids: Optional[list[int]] = None)

   Bases: :py:obj:`object`


   A set of samples, and optionally some other information about the Dataset.

   Initialize a :class:`Dataset <plaid.containers.dataset.Dataset>`.

   If `path` is not specified, an empty :class:`Dataset <plaid.containers.dataset.Dataset>` is initialized; it should then be populated with :class:`Samples <plaid.containers.sample.Sample>`.

   Use :meth:`add_sample <plaid.containers.dataset.Dataset.add_sample>` or :meth:`add_samples <plaid.containers.dataset.Dataset.add_samples>` to populate the :class:`Dataset`.

   :param path: The path from which to load PLAID dataset files.
   :type path: Union[str, Path], optional
   :param verbose: Explicitly displays the operations performed. Defaults to False.
   :type verbose: bool, optional
   :param processes_number: Number of processes used to load files (-1 to use all available resources, 0 to disable multiprocessing). Defaults to 0.
   :type processes_number: int, optional
   :param samples: A list of :class:`Samples <plaid.containers.sample.Sample>` to initialize the :class:`Dataset <plaid.containers.dataset.Dataset>`. Defaults to None.
   :type samples: list[Sample], optional
   :param sample_ids: An optional list of IDs for the new samples. If not provided, the IDs will be automatically generated based on the current number of samples in the dataset.
   :type sample_ids: list[int], optional

   .. rubric:: Example

   .. code-block:: python

       from plaid import Dataset
       from plaid import Sample

       # 1. Create empty instance of Dataset
       dataset = Dataset()
       print(dataset)
       >>> Dataset(0 samples, 0 scalars, 0 fields)
       print(len(dataset))
       >>> 0

       # 2. Load dataset and create Dataset instance
       dataset = Dataset("path_to_plaid_dataset") # .plaid or directory
       print(dataset)
       >>> Dataset(3 samples, 2 scalars, 5 fields)
       print(len(dataset))
       >>> 3
       for sample in dataset:
           print(sample)
       >>> Sample(1 scalar, 1 timestamp, 2 fields)
           Sample(1 scalar, 0 timestamps, 0 fields)
           Sample(2 scalars, 1 timestamp, 2 fields)

       # 3. Create Dataset instance from a list of Samples
       dataset = Dataset(samples=[sample1, sample2, sample3])
       print(dataset)
       >>> Dataset(3 samples, 0 scalars, 2 fields)

       # 4. Create Dataset instance from a list of Samples with specific ids
       dataset = Dataset(samples=[sample1, sample2, sample3], sample_ids=[3, 5, 7])
       print(dataset)
       >>> Dataset(3 samples, 0 scalars, 2 fields)

   .. caution:: It is assumed that you provided a compatible PLAID dataset.


   .. py:method:: copy() -> Self

      Create a deep copy of the dataset.

      :returns: A new `Dataset` instance with all internal data (samples, infos)
                deeply copied to ensure full isolation from the original.

      .. note:: This operation may be memory-intensive for large datasets.


   .. py:method:: get_samples(ids: Optional[list[int]] = None, as_list: bool = False) -> Union[list[plaid.containers.sample.Sample], dict[int, plaid.containers.sample.Sample]]

      Return the samples whose ids are listed in :code:`ids` if specified, else all samples, as a dictionary or a list.

      :param ids: If None, take all samples. Defaults to None.
      :type ids: list[int], optional
      :param as_list: If False, return a dict ``id -> sample``, else return a list of ``Sample`` in the same order as ``ids``. Defaults to False.
      :type as_list: bool, optional

      :returns: Samples with corresponding ids.
      :rtype: Union[list[Sample], dict[int, Sample]]
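
      The two return shapes can be illustrated with a minimal sketch. It does not use PLAID itself: ``select_samples`` is a hypothetical helper, and plain strings stand in for ``Sample`` objects.

      .. code-block:: python

          from typing import Optional, Union


          def select_samples(
              samples: dict[int, str],
              ids: Optional[list[int]] = None,
              as_list: bool = False,
          ) -> Union[list[str], dict[int, str]]:
              """Hypothetical stand-in for the selection logic described above."""
              if ids is None:
                  ids = list(samples)  # None means "take all samples"
              if as_list:
                  # List form: same order as the requested ids
                  return [samples[i] for i in ids]
              # Dict form: id -> sample
              return {i: samples[i] for i in ids}


          store = {0: "s0", 1: "s1", 2: "s2"}
          picked_dict = select_samples(store, ids=[2, 0])
          picked_list = select_samples(store, ids=[2, 0], as_list=True)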


   .. py:method:: add_sample(sample: plaid.containers.sample.Sample, id: Optional[int] = None) -> int

      Add a new :class:`Sample <plaid.containers.sample.Sample>` to the :class:`Dataset <plaid.containers.dataset.Dataset>`.

      :param sample: The sample to add.
      :type sample: Sample
      :param id: An optional ID for the new sample. If not provided, the ID will be automatically generated based on the current number of samples in the dataset.
      :type id: int, optional

      :raises TypeError: If ``sample`` is not a :class:`Sample <plaid.containers.sample.Sample>`.

      :returns: ID of the newly added :class:`Sample <plaid.containers.sample.Sample>`.
      :rtype: int

      .. rubric:: Example

      .. code-block:: python

          from plaid import Dataset
          dataset = Dataset()
          dataset.add_sample(sample)
          print(dataset)
          >>> Dataset(1 sample, 0 scalars, 2 fields)


   .. py:method:: del_sample(sample_id: int) -> None

      Delete a :class:`Sample <plaid.containers.sample.Sample>` from the :class:`Dataset <plaid.containers.dataset.Dataset>` and reorganize the remaining sample IDs to eliminate gaps.

      :param sample_id: The ID of the sample to delete.
      :type sample_id: int

      :raises ValueError: If the provided sample ID is not present in the dataset.

      :returns: The new list of sample ids.
      :rtype: list[int]

      .. rubric:: Example

      .. code-block:: python

          from plaid import Dataset
          dataset = Dataset()
          dataset.add_samples(samples)
          print(dataset)
          >>> Dataset(1 sample, y scalars, x fields)
          dataset.del_sample(0)
          print(dataset)
          >>> Dataset(0 samples, 0 scalars, 0 fields)


   .. py:method:: add_samples(samples: list[plaid.containers.sample.Sample], ids: Optional[list[int]] = None) -> list[int]

      Add new :class:`Samples <plaid.containers.sample.Sample>` to the :class:`Dataset <plaid.containers.dataset.Dataset>`.

      :param samples: The list of samples to add.
      :type samples: list[Sample]
      :param ids: An optional list of IDs for the new samples. If not provided, the IDs will be automatically generated based on the current number of samples in the dataset.
      :type ids: list[int], optional

      :raises TypeError: If ``samples`` is not a list or if one of the ``samples`` is not a :class:`Sample <plaid.containers.sample.Sample>`.
      :raises ValueError: If samples list is empty.
      :raises ValueError: If the length of ids list (if provided) is not equal to the length of samples list.
      :raises ValueError: If provided ids are not unique.

      :returns: Ids of added :class:`Samples <plaid.containers.sample.Sample>`.
      :rtype: list[int]

      .. rubric:: Example

      .. code-block:: python

          from plaid import Dataset
          dataset = Dataset()
          dataset.add_samples(samples)
          print(len(samples))
          >>> n
          print(dataset)
          >>> Dataset(n samples, 0 scalars, x fields)
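
      The ID rules listed above can be modelled with a short sketch. This is not PLAID's implementation: ``assign_ids`` is a hypothetical helper, and "based on the current number of samples" is read here as "start counting at the current dataset size", which is an assumption.

      .. code-block:: python

          from typing import Optional


          def assign_ids(
              existing_ids: list[int],
              n_new: int,
              ids: Optional[list[int]] = None,
          ) -> list[int]:
              """Hypothetical model of the checks documented for ``add_samples``."""
              if n_new == 0:
                  raise ValueError("samples list is empty")
              if ids is None:
                  # Assumption: automatic ids continue from the current sample count
                  start = len(existing_ids)
                  return list(range(start, start + n_new))
              if len(ids) != n_new:
                  raise ValueError("ids and samples must have the same length")
              if len(set(ids)) != len(ids):
                  raise ValueError("ids must be unique")
              return list(ids)


          auto_ids = assign_ids(existing_ids=[0, 1, 2], n_new=2)
          explicit_ids = assign_ids(existing_ids=[], n_new=3, ids=[3, 5, 7])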


   .. py:method:: del_samples(sample_ids: list[int]) -> None

      Delete :class:`Samples <plaid.containers.sample.Sample>` from the :class:`Dataset <plaid.containers.dataset.Dataset>` and reorganize the remaining sample IDs to eliminate gaps.

      :param sample_ids: The list of IDs of samples to delete.
      :type sample_ids: list[int]

      :raises TypeError: If ``sample_ids`` is not a list.
      :raises ValueError: If sample_ids list is empty.
      :raises ValueError: If any of the sample_ids does not exist in the dataset.
      :raises ValueError: If the provided IDs are not unique.

      :returns: The new list of sample ids.
      :rtype: list[int]

      .. rubric:: Example

      .. code-block:: python

          from plaid import Dataset
          dataset = Dataset()
          # Assume samples are already added to the dataset
          print(dataset)
          >>> Dataset(6 samples, y scalars, x fields)
          dataset.del_samples([1, 3, 5])
          print(dataset)
          >>> Dataset(3 samples, y scalars, x fields)
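
      The documented checks and the ID reorganization can be sketched in plain Python. This is a model under stated assumptions, not PLAID's implementation: ``delete_many`` is a hypothetical helper, strings stand in for samples, and "eliminate gaps" is read here as renumbering the remaining samples consecutively from 0 in ascending ID order.

      .. code-block:: python

          def delete_many(samples: dict[int, str], sample_ids: list[int]) -> dict[int, str]:
              """Hypothetical model: validate ``sample_ids``, drop them, renumber the rest."""
              if not isinstance(sample_ids, list):
                  raise TypeError("sample_ids must be a list")
              if not sample_ids:
                  raise ValueError("sample_ids is empty")
              if len(set(sample_ids)) != len(sample_ids):
                  raise ValueError("sample_ids are not unique")
              missing = [i for i in sample_ids if i not in samples]
              if missing:
                  raise ValueError(f"unknown sample ids: {missing}")
              to_drop = set(sample_ids)
              kept = [samples[i] for i in sorted(samples) if i not in to_drop]
              # Assumption: remaining samples get consecutive ids starting at 0
              return dict(enumerate(kept))


          store = {i: f"sample{i}" for i in range(6)}
          after = delete_many(store, [1, 3, 5])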


   .. py:method:: get_sample_ids() -> list[int]

      Return list of sample ids.

      :returns: List of sample ids.
      :rtype: list[int]


   .. py:method:: get_scalar_names(ids: Optional[list[int]] = None) -> list[str]

      Return the union of scalar names across all samples whose id is in ``ids``.

      :param ids: Select scalars depending on sample id. If None, take all samples. Defaults to None.
      :type ids: list[int], optional

      :returns: List of all scalar names.
      :rtype: list[str]


   .. py:method:: get_field_names(ids: Optional[list[int]] = None, location: Optional[str] = None, zone_name: Optional[str] = None, base_name: Optional[str] = None, time: Optional[float] = None) -> list[str]

      Return the union of field names across all samples whose id is in ``ids``.

      :param ids: Select fields depending on sample id. If None, take all samples. Defaults to None.
      :type ids: list[int], optional
      :param location: If provided, only field names from this location will be included. Defaults to None.
      :type location: str, optional
      :param zone_name: If provided, only field names from this zone will be included. Defaults to None.
      :type zone_name: str, optional
      :param base_name: If provided, only field names containing this base name will be included. Defaults to None.
      :type base_name: str, optional
      :param time: If provided, only field names from this time will be included. Defaults to None.
      :type time: float, optional

      :returns: List of all field names.
      :rtype: list[str]


   .. py:method:: get_all_features_identifiers(ids: Optional[list[int]] = None) -> list[plaid.containers.feature_identifier.FeatureIdentifier]

      Get all feature identifiers from the dataset.

      :param ids: Sample ids from which to return feature identifiers. If None, take all samples. Defaults to None.
      :type ids: list[int], optional

      :returns: A list of dictionaries containing the identifiers of all features in the dataset.
      :rtype: list[FeatureIdentifier]


   .. py:method:: get_all_features_identifiers_by_type(feature_type: Literal['scalar', 'nodes', 'field'], ids: list[int] = None) -> list[plaid.containers.feature_identifier.FeatureIdentifier]

      Get all feature identifiers of a given type from the dataset.

      :param feature_type: Type of features to return.
      :type feature_type: str
      :param ids: Sample ids from which to return feature identifiers. If None, take all samples. Defaults to None.
      :type ids: list[int], optional

      :returns: A list of dictionaries containing the identifiers of all features of a given type in the dataset.
      :rtype: list[FeatureIdentifier]


   .. py:method:: add_tabular_scalars(tabular: numpy.ndarray, names: Optional[list[str]] = None) -> None

      Add tabular scalar data to the summary.

      :param tabular: A 2D NumPy array containing tabular scalar data.
      :type tabular: np.ndarray
      :param names: A list of column names for the tabular data. Defaults to None.
      :type names: list[str], optional

      :raises ShapeError: Raised if the input tabular array does not have the correct shape (2D).
      :raises ShapeError: Raised if the number of columns in the tabular data does not match the number of names provided.

      .. note:: If no names are provided, names are created automatically following the pattern 'X{number}'.
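
      The naming rule and the two shape checks can be sketched without PLAID or NumPy. Everything here is illustrative: ``resolve_names`` is a hypothetical helper, nested lists stand in for a 2D ``np.ndarray``, ``ValueError`` stands in for ``ShapeError``, and the zero-based numbering of the 'X{number}' pattern is an assumption.

      .. code-block:: python

          from typing import Optional


          def resolve_names(
              rows: list[list[float]],
              names: Optional[list[str]] = None,
          ) -> list[str]:
              """Return column names for a 2D table, generating defaults if needed."""
              n_columns = len(rows[0])
              if any(len(row) != n_columns for row in rows):
                  raise ValueError("tabular data must be rectangular (2D)")
              if names is None:
                  # Assumed zero-based numbering for the 'X{number}' pattern
                  return [f"X{i}" for i in range(n_columns)]
              if len(names) != n_columns:
                  raise ValueError("number of names does not match number of columns")
              return list(names)


          generated = resolve_names([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
          provided = resolve_names([[1.0, 2.0]], names=["Pr", "Q"])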


   .. py:method:: get_scalars_to_tabular(scalar_names: Optional[list[str]] = None, sample_ids: Optional[list[int]] = None, as_nparray=False) -> Union[dict[str, numpy.ndarray], numpy.ndarray]

      Return a dict containing scalar values as tabulars/arrays.

      :param scalar_names: Scalars to work on. If None, all scalars will be returned. Defaults to None.
      :type scalar_names: list[str], optional
      :param sample_ids: Filter by sample id. If None, take all samples. Defaults to None.
      :type sample_ids: list[int], optional
      :param as_nparray: If True, return the data as a single numpy ndarray. If False, return a dictionary mapping scalar names to their respective tabular values. Defaults to False.
      :type as_nparray: bool, optional

      :returns: A single array if ``as_nparray`` is True; otherwise a dictionary mapping scalar name -> tabular values.
      :rtype: Union[dict[str, np.ndarray], np.ndarray]


   .. py:method:: get_feature_from_string_identifier(feature_string_identifier: str) -> dict[int, plaid.types.Feature]

      Get a feature from each sample of the dataset based on the provided feature string identifier.

      :param feature_string_identifier: A string identifier for the feature.
      :type feature_string_identifier: str

      :returns: A dictionary mapping each sample id to the feature matching the provided string identifier.
      :rtype: dict[int, Feature]


   .. py:method:: get_feature_from_identifier(feature_identifier: plaid.containers.feature_identifier.FeatureIdentifier) -> dict[int, plaid.types.Feature]

      Get a feature from each sample of the dataset based on the provided feature identifier.

      :param feature_identifier: A dictionary containing the feature identifier.
      :type feature_identifier: FeatureIdentifier

      :returns: A dictionary mapping each sample id to the feature matching the provided identifier.
      :rtype: dict[int, Feature]


   .. py:method:: get_features_from_identifiers(feature_identifiers: list[plaid.containers.feature_identifier.FeatureIdentifier]) -> dict[int, list[plaid.types.Feature]]

      Get a list of features from each sample of the dataset based on the provided feature identifiers.

      :param feature_identifiers: A list of feature identifiers.
      :type feature_identifiers: list[FeatureIdentifier]

      :returns: A dictionary mapping each sample id to the list of features matching the provided identifiers.
      :rtype: dict[int, list[Feature]]


   .. py:method:: update_features_from_identifier(feature_identifiers: Union[plaid.containers.feature_identifier.FeatureIdentifier, list[plaid.containers.feature_identifier.FeatureIdentifier]], features: dict[int, Union[plaid.types.Feature, list[plaid.types.Feature]]], in_place: bool = False) -> Self

      Update one or several features of the dataset by their identifier(s).

      This method applies updates to scalars, fields, or nodes using feature identifiers and the corresponding feature data.
      When ``in_place=False``, a deep copy of the dataset is created before applying updates, ensuring full isolation from the original.

      :param feature_identifiers: one or more feature identifiers.
      :type feature_identifiers: dict or list of dict
      :param features: dict with sample index as keys and one or more features as values.
      :type features: dict
      :param in_place: If True, modifies the current dataset in place.
                       If False, returns a deep copy with updated features.
      :type in_place: bool, optional

      :returns: The updated dataset (either the current instance or a new copy).
      :rtype: Self

      :raises AssertionError: If types are inconsistent or identifiers contain unexpected keys.
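
      The ``in_place`` contract can be illustrated with a small stand-in class. ``TinyDataset`` and ``update_scalar`` are hypothetical names used only for this sketch; they mimic the copy-or-mutate behaviour described above with ``copy.deepcopy`` and are not part of PLAID.

      .. code-block:: python

          import copy


          class TinyDataset:
              """Hypothetical container: sample id -> {scalar name: value}."""

              def __init__(self, scalars: dict[int, dict[str, float]]) -> None:
                  self.scalars = scalars

              def update_scalar(
                  self, name: str, values: dict[int, float], in_place: bool = False
              ) -> "TinyDataset":
                  # Work on self when in_place=True, otherwise on an isolated deep copy
                  target = self if in_place else copy.deepcopy(self)
                  for sample_id, value in values.items():
                      target.scalars[sample_id][name] = value
                  return target


          original = TinyDataset({0: {"Pr": 1.0}, 1: {"Pr": 2.0}})
          updated = original.update_scalar("Pr", {0: 10.0})  # original untouched
          same = original.update_scalar("Pr", {1: 20.0}, in_place=True)  # original modified

      With ``in_place=False`` the caller must use the returned object; the original keeps its previous values.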


   .. py:method:: extract_dataset_from_identifier(feature_identifiers: Union[plaid.containers.feature_identifier.FeatureIdentifier, list[plaid.containers.feature_identifier.FeatureIdentifier]]) -> Self

      Extract features of the dataset by their identifier(s) and return a new dataset containing these features.

      This method selects scalars, fields, or nodes using feature identifiers.

      :param feature_identifiers: One or more feature identifiers.
      :type feature_identifiers: dict or list of dict

      :returns: New dataset containing the provided feature identifiers
      :rtype: Self

      :raises AssertionError: If types are inconsistent or identifiers contain unexpected keys.


   .. py:method:: from_features_identifier(feature_identifiers: Union[plaid.containers.feature_identifier.FeatureIdentifier, list[plaid.containers.feature_identifier.FeatureIdentifier]]) -> Self

      DEPRECATED: Use :meth:`Dataset.extract_dataset_from_identifier` instead.


   .. py:method:: get_tabular_from_homogeneous_identifiers(feature_identifiers: list[plaid.containers.feature_identifier.FeatureIdentifier]) -> plaid.types.Array

      Extract features of the dataset by their identifier(s) and return an array containing these features.

      Features must have identical sizes to be cast into an array. The first dimension of the array is the number of samples in the dataset.
      This method selects scalars, fields, or nodes using feature identifiers.

      :param feature_identifiers: Feature identifiers.
      :type feature_identifiers: list of dict

      :returns: An array containing the provided feature identifiers, size (nb_sample, nb_features, dim_features)
      :rtype: Array

      :raises AssertionError: If feature sizes are inconsistent.


   .. py:method:: get_tabular_from_stacked_identifiers(feature_identifiers: list[plaid.containers.feature_identifier.FeatureIdentifier]) -> tuple[plaid.types.Array, plaid.types.Array]

      Extract features of the dataset by their identifier(s), stack them and return an array containing these features.

      After stacking, each sample has one feature of dimension ``dim_stacked_features``.

      :param feature_identifiers: Feature identifiers.
      :type feature_identifiers: list of dict

      :returns: A tuple of two arrays: an array containing the provided feature identifiers, of size (nb_sample, dim_stacked_features),
                and an array containing the cumulated feature dimensions, starting with 0, of size (len(feature_identifiers)+1, ).
      :rtype: tuple[Array, Array]
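
      The cumulated-dimensions array can be understood as a list of slice boundaries into each stacked row. The sketch below is illustrative only: the feature dimensions and values are made up, plain lists stand in for arrays, and using the cumulated dimensions as slice boundaries is an assumption about their intended use.

      .. code-block:: python

          from itertools import accumulate

          # Hypothetical per-feature dimensions for three stacked features
          feature_dims = [1, 3, 2]

          # Starts with 0 and has len(feature_dims) + 1 entries
          cumulated_dims = [0, *accumulate(feature_dims)]

          # One stacked row of length dim_stacked_features = 6
          stacked_row = [0.5, 1.0, 2.0, 3.0, 7.0, 8.0]

          # Recover each feature by slicing between consecutive boundaries
          features = [
              stacked_row[start:stop]
              for start, stop in zip(cumulated_dims[:-1], cumulated_dims[1:])
          ]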


   .. py:method:: add_features_from_tabular(tabular: plaid.types.Array, feature_identifiers: list[plaid.containers.feature_identifier.FeatureIdentifier], restrict_to_features: bool = True) -> Self

      Add or update features in the dataset from tabular data using feature identifiers.

      This method takes tabular data and applies it to the dataset, either by updating existing features
      or adding new ones based on the provided feature identifiers. The method can either:

      1. Extract only the specified features and return a new dataset with just those features (if ``restrict_to_features=True``)
      2. Update the specified features in the current dataset while keeping all other existing features (if ``restrict_to_features=False``)

      :param tabular: Tabular data of size (nb_sample, nb_features) or (nb_sample, nb_features, dim_feature) if dim_feature>1.
      :type tabular: Array
      :param feature_identifiers: One or more feature identifiers specifying which features to update/add.
      :type feature_identifiers: list of dict
      :param restrict_to_features: If True, only returns the features from feature identifiers, otherwise keep the other features as well. Defaults to True.
      :type restrict_to_features: bool, optional

      :returns: A new dataset with features updated/added from the tabular data. If ``restrict_to_features=True``,
                it contains only the specified features. If ``restrict_to_features=False``, it contains all original
                features plus the updated/added ones.
      :rtype: Self

      :raises AssertionError: If the number of rows in `tabular` does not match the number of samples in the dataset,
          or if the number of feature identifiers does not match the number of columns in `tabular`.


   .. py:method:: from_tabular(tabular: plaid.types.Array, feature_identifiers: Union[plaid.containers.feature_identifier.FeatureIdentifier, list[plaid.containers.feature_identifier.FeatureIdentifier]], restrict_to_features: bool = True) -> Self

      DEPRECATED: Use :meth:`Dataset.add_features_from_tabular` instead.


   .. py:method:: add_info(cat_key: str, info_key: str, info: str) -> None

      Add information to the :class:`Dataset <plaid.containers.dataset.Dataset>`, overwriting existing information if there's a conflict.

      :param cat_key: Category key, choose among "legal", "data_production", and "data_description".
      :type cat_key: str
      :param info_key: Information key, depending on the chosen category key, choose among "owner", "license", "type", "physics", "simulator", "hardware", "computation_duration", "script", "contact", "location", "number_of_samples", "number_of_splits", "DOE", "inputs" and "outputs".
      :type info_key: str
      :param info: Information content.
      :type info: str

      :raises KeyError: Invalid category key.
      :raises KeyError: Invalid info key.

      .. rubric:: Example

      .. code-block:: python

          from plaid import Dataset
          dataset = Dataset()
          infos = {"legal":{"owner":"CompX", "license":"li_X"}}
          dataset.set_infos(infos)
          print(dataset.get_infos())
          >>> {'legal': {'owner': 'CompX', 'license': 'li_X'}}
          dataset.add_info("data_production", "type", "simulation")
          print(dataset.get_infos())
          >>> {'legal': {'owner': 'CompX', 'license': 'li_X'}, 'data_production': {'type': 'simulation'}}
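
      The overwrite-on-conflict behaviour can be modelled with a nested dictionary. This is a sketch, not PLAID's code: the module-level ``add_info`` function and ``ALLOWED_CATEGORIES`` are hypothetical, and only the category key is validated here because the mapping from category to allowed info keys is not spelled out above.

      .. code-block:: python

          ALLOWED_CATEGORIES = ("legal", "data_production", "data_description")


          def add_info(
              infos: dict[str, dict[str, str]], cat_key: str, info_key: str, info: str
          ) -> None:
              """Hypothetical model: validate the category, then set or overwrite the entry."""
              if cat_key not in ALLOWED_CATEGORIES:
                  raise KeyError(f"invalid category key: {cat_key!r}")
              # Create the category on first use, then overwrite any existing value
              infos.setdefault(cat_key, {})[info_key] = info


          infos = {"legal": {"owner": "CompX", "license": "li_X"}}
          add_info(infos, "data_production", "type", "simulation")
          add_info(infos, "legal", "owner", "CompY")  # conflict: the old value is replaced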


   .. py:method:: add_infos(cat_key: str, infos: dict[str, str]) -> None

      Add information to the :class:`Dataset <plaid.containers.dataset.Dataset>`, overwriting existing information if there's a conflict.

      :param cat_key: Category key, choose among "legal", "data_production", and "data_description".
      :type cat_key: str
      :param infos: Information keys with their related content.
      :type infos: dict[str, str]

      :raises KeyError: Invalid category key.
      :raises KeyError: Invalid info key.

      .. rubric:: Example

      .. code-block:: python

          from plaid import Dataset
          dataset = Dataset()
          infos = {"legal":{"owner":"CompX", "license":"li_X"}}
          dataset.set_infos(infos)
          print(dataset.get_infos())
          >>> {'legal': {'owner': 'CompX', 'license': 'li_X'}}
          new_info = {"type":"simulation", "simulator":"Z-set"}
          dataset.add_infos("data_production", new_info)
          print(dataset.get_infos())
          >>> {'legal': {'owner': 'CompX', 'license': 'li_X'}, 'data_production': {'type': 'simulation', 'simulator': 'Z-set'}}


   .. py:method:: set_infos(infos: dict[str, dict[str, str]], warn: bool = True) -> None

      Set the information of the :class:`Dataset <plaid.containers.dataset.Dataset>`, overwriting any existing information.

      :param infos: Information to associate with this data set (Dataset).
      :type infos: dict[str,dict[str,str]]
      :param warn: If True, warns when replacing existing infos. Defaults to True.
      :type warn: bool, optional

      :raises KeyError: Invalid category key format in provided infos.
      :raises KeyError: Invalid info key format in provided infos.

      .. rubric:: Example

      .. code-block:: python

          from plaid import Dataset
          dataset = Dataset()
          infos = {"legal":{"owner":"CompX", "license":"li_X"}}
          dataset.set_infos(infos)
          print(dataset.get_infos())
          >>> {'legal': {'owner': 'CompX', 'license': 'li_X'}}


   .. py:method:: get_infos() -> dict[str, dict[str, str]]

      Get information from an instance of :class:`Dataset <plaid.containers.dataset.Dataset>`.

      :returns: Information associated with this data set (Dataset).
      :rtype: dict[str,dict[str,str]]

      .. rubric:: Example

      .. code-block:: python

          from plaid import Dataset
          dataset = Dataset()
          infos = {"legal":{"owner":"CompX", "license":"li_X"}}
          dataset.set_infos(infos)
          print(dataset.get_infos())
          >>> {'legal': {'owner': 'CompX', 'license': 'li_X'}}


   .. py:method:: print_infos() -> None

      Prints information in a readable format (pretty print).


   .. py:method:: merge_dataset(dataset: Self) -> list[int]

      Merges samples of another dataset into this one.

      :param dataset: The dataset to be merged into this one (self).
      :type dataset: Dataset

      :returns: ids of added :class:`Samples <plaid.containers.sample.Sample>` from input :class:`Dataset <plaid.containers.dataset.Dataset>`.
      :rtype: list[int]

      :raises ValueError: If the provided dataset value is not an instance of Dataset.


   .. py:method:: merge_features(dataset: Self, in_place: bool = False) -> Self

      Merge features of another dataset into this one.

      :param dataset: The dataset to be merged into this one (self).
      :type dataset: Dataset
      :param in_place: If True, modifies the current dataset in place.
                       If False, returns a deep copy with the merged features.
      :type in_place: bool, optional

      :returns: A dataset containing all samples from the input datasets.
      :rtype: Dataset


   .. py:method:: merge_dataset_by_features(datasets_list: list[Self]) -> Self
      :classmethod:


      Merge features of a list of datasets.

      :param datasets_list: The list of datasets to be merged.
      :type datasets_list: list[Dataset]

      :returns: A new dataset containing all samples from the input datasets.
      :rtype: Dataset


   .. py:method:: save(path: Union[str, pathlib.Path]) -> None

      DEPRECATED: use :meth:`Dataset.save_to_file` instead.


   .. py:method:: save_to_file(path: Union[str, pathlib.Path]) -> None

      Saves the data set to a TAR (Tape Archive) file.

      It creates a temporary intermediate directory to store temporary files during the saving process.

      :param path: The path to which the data set will be saved.
      :type path: Union[str, Path]

      :raises ValueError: If the randomly generated temporary dir name is already used (extremely unlikely!).


   .. py:method:: save_to_dir(path: Union[str, pathlib.Path], verbose: bool = False) -> None

      Saves the dataset into a sub-directory `samples` and creates an 'infos.yaml' file to store additional information about the dataset.

      :param path: The path in which to save the files.
      :type path: Union[str, Path]
      :param verbose: Explicitly displays the operations performed. Defaults to False.
      :type verbose: bool, optional


   .. py:method:: summarize_features() -> str

      Show the name of each feature and the number of samples containing it.

      :returns: A summary of features across the dataset.
      :rtype: str

      .. rubric:: Example

      .. code-block:: text

          Dataset Feature Summary:
          ==================================================
          Scalars (8 unique):
          - Pr: 30/32 samples (93.8%)
          - Q: 30/32 samples (93.8%)
          - Tr: 30/32 samples (93.8%)
          - angle_in: 32/32 samples (100.0%)
          - angle_out: 30/32 samples (93.8%)
          - eth_is: 30/32 samples (93.8%)
          - mach_out: 32/32 samples (100.0%)
          - power: 30/32 samples (93.8%)

          Fields (8 unique):
          - M_iso: 30/32 samples (93.8%)
          - mach: 30/32 samples (93.8%)
          - nut: 30/32 samples (93.8%)
          - ro: 30/32 samples (93.8%)
          - roe: 30/32 samples (93.8%)
          - rou: 30/32 samples (93.8%)
          - rov: 30/32 samples (93.8%)
          - sdf: 32/32 samples (100.0%)
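
      The per-feature lines of such a summary can be reproduced with a short counting sketch. It is illustrative only: ``summarize`` is a hypothetical helper, each sample is reduced to a set of feature names, and the line format is copied from the example output above.

      .. code-block:: python

          from collections import Counter


          def summarize(samples: dict[int, set[str]]) -> list[str]:
              """Count, for each feature name, how many samples contain it."""
              total = len(samples)
              counts = Counter(name for names in samples.values() for name in names)
              return [
                  f"- {name}: {count}/{total} samples ({100 * count / total:.1f}%)"
                  for name, count in sorted(counts.items())
              ]


          # 30 complete samples plus 2 samples that lack the "Pr" scalar
          samples = {i: {"angle_in", "Pr"} for i in range(30)}
          samples.update({30: {"angle_in"}, 31: {"angle_in"}})
          lines = summarize(samples)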


   .. py:method:: check_feature_completeness() -> str

      Detect and notify if some Samples don't contain all features.

      :returns: A report on feature completeness across the dataset.
      :rtype: str

      .. rubric:: Example

      .. code-block:: text

          Dataset Feature Completeness Check:
          ========================================
          Complete samples: 30/32 (93.8%)
          Incomplete samples: 2/32 (6.2%)

          Samples with missing features:
          Sample 671: missing 13 features
              - scalar:Tr
              - scalar:angle_out
              - scalar:power
              - scalar:Pr
              - scalar:Q
              ... and 8 more
          Sample 672: missing 13 features
              - scalar:Tr
              - scalar:angle_out
              - scalar:power
              - scalar:Pr
              - scalar:Q
              ... and 8 more
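
      A completeness check of this kind amounts to comparing each sample's features with the union over all samples. The sketch below assumes that definition of "all features"; ``missing_features`` is a hypothetical helper and the feature names are placeholders.

      .. code-block:: python

          def missing_features(samples: dict[int, set[str]]) -> dict[int, list[str]]:
              """Map each incomplete sample id to the sorted features it lacks."""
              # Assumption: the reference set is the union of features over all samples
              all_features = set().union(*samples.values())
              return {
                  sample_id: sorted(all_features - names)
                  for sample_id, names in samples.items()
                  if names != all_features
              }


          samples = {
              670: {"scalar:Pr", "scalar:Q", "field:sdf"},
              671: {"field:sdf"},
              672: {"scalar:Q", "field:sdf"},
          }
          report = missing_features(samples)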


   .. py:method:: from_list_of_samples(list_of_samples: list[plaid.containers.sample.Sample], ids: Optional[list[int]] = None) -> Self
      :classmethod:


      DEPRECATED: use `Dataset(samples=..., sample_ids=...)` instead.


   .. py:method:: load_from_file(path: Union[str, pathlib.Path], verbose: bool = False, processes_number: int = 0) -> Self
      :classmethod:


      Load data from a specified TAR (Tape Archive) file.

      :param path: The path to the data file to be loaded.
      :type path: Union[str, Path]
      :param verbose: Explicitly displays the operations performed. Defaults to False.
      :type verbose: bool, optional
      :param processes_number: Number of processes used to load files (-1 to use all available resources, 0 to disable multiprocessing). Defaults to 0.
      :type processes_number: int, optional

      :returns: The loaded dataset (Dataset).
      :rtype: Self


   .. py:method:: load_from_dir(path: Union[str, pathlib.Path], ids: Optional[list[int]] = None, verbose: bool = False, processes_number: int = 0) -> Self
      :classmethod:


      Load data from a specified directory.

      :param path: The path from which to load files.
      :type path: Union[str, Path]
      :param ids: The specific sample IDs to load from the dataset. Defaults to None.
      :type ids: list, optional
      :param verbose: Explicitly displays the operations performed. Defaults to False.
      :type verbose: bool, optional
      :param processes_number: Number of processes used to load files (-1 to use all available resources, 0 to disable multiprocessing). Defaults to 0.
      :type processes_number: int, optional

      :returns: The loaded dataset (Dataset).
      :rtype: Self
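
      The ``processes_number`` convention can be summarized with a small resolver. This is only a sketch of one plausible interpretation: ``resolve_processes`` is a hypothetical helper, and mapping "all available resources" to ``os.cpu_count()`` is an assumption rather than documented behaviour.

      .. code-block:: python

          import os


          def resolve_processes(processes_number: int) -> int:
              """Translate the documented convention into a worker count (0 = no multiprocessing)."""
              if processes_number < -1:
                  raise ValueError("processes_number must be -1, 0, or a positive integer")
              if processes_number == -1:
                  # Assumption: "all available resources" means one worker per CPU
                  return os.cpu_count() or 1
              return processes_number


          sequential = resolve_processes(0)
          four_workers = resolve_processes(4)
          all_workers = resolve_processes(-1)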


   .. py:method:: load(path: Union[str, pathlib.Path], verbose: bool = False, processes_number: int = 0) -> None

      Load data from a specified file or directory.

      .. note:: If path is a file, it creates a temporary intermediate directory to extract the files from the archive during the loading process.

      .. note:: This method overwrites the content of the calling instance.

      :param path: The path to the data file to be loaded.
      :type path: Union[str, Path]
      :param verbose: Explicitly displays the operations performed. Defaults to False.
      :type verbose: bool, optional
      :param processes_number: Number of processes used to load files (-1 to use all available resources, 0 to disable multiprocessing). Defaults to 0.
      :type processes_number: int, optional

      :raises ValueError: If a randomly generated temporary directory already exists,
          indicating a potential conflict during the loading process (extremely unlikely).


   .. py:method:: add_to_dir(sample: plaid.containers.sample.Sample, path: Optional[Union[str, pathlib.Path]] = None, verbose: bool = False) -> None

      Add a sample to the dataset and save it to the specified directory.

      .. note::

         If `path` is None, `self.path` is used instead; it is set by the most recent call to load or save.
         `path` given in argument will take precedence over `self.path` and overwrite it.

      :param sample: The sample to add.
      :type sample: Sample
      :param path: The directory in which to save the sample. Defaults to None.
      :type path: Union[str, Path], optional
      :param verbose: If True, will print additional information. Defaults to False.
      :type verbose: bool, optional

      :raises ValueError: If both self.path and path are None.


   .. py:method:: set_samples(samples: dict[int, plaid.containers.sample.Sample]) -> None

      Set the samples of the data set, overwriting the existing ones.

      :param samples: A dictionary of samples to set inside the dataset.
      :type samples: dict[int,Sample]

      :raises TypeError: If the 'samples' parameter is not of type dict[int, Sample].
      :raises TypeError: If the 'id' inside a sample is not of type int.
      :raises ValueError: If the 'id' inside a sample is negative (id >= 0 is required).
      :raises TypeError: If the values inside the 'samples' dictionary are not of type Sample.


   .. py:method:: set_sample(id: int, sample: plaid.containers.sample.Sample, warning_overwrite: bool = True) -> None

      Set a :class:`Sample <plaid.containers.sample.Sample>` with :code:`id` in the Dataset, overwriting the existing sample if there's a conflict.

      :param id: The chosen id of the sample.
      :type id: int
      :param sample: The sample to set inside the dataset.
      :type sample: Sample
      :param warning_overwrite: Show a warning if a preexisting sample is being overwritten. Defaults to True.
      :type warning_overwrite: bool, optional

      :raises TypeError: If the 'id' inside the sample is not of type int.
      :raises ValueError: If the 'id' inside a sample is negative (id >= 0 is required).
      :raises TypeError: If 'sample' parameter is not of type Sample.

      .. caution:: In case of conflict, the existing samples will be overwritten.