DataSet

class dtaianomaly.data.DataSet(X_test: ndarray, y_test: array, X_train: ndarray = None, y_train: array = None, feature_names: list[str] = None, time_steps_test: array = None, time_steps_train: array = None)

Class for time series datasets.

A class for time series anomaly detection data sets. These consist of the raw data for training and testing anomaly detectors, as well as the respective ground truth labels.

Parameters:
X_test : array-like of shape (n_samples_test, n_attributes)

The test time series data.

y_test : array-like of shape (n_samples_test)

The ground truth anomaly labels of the test data.

X_train : array-like of shape (n_samples_train, n_attributes), default=None

The train time series. If not given, then the test data will be used for training and the data is only compatible with unsupervised anomaly detectors.

y_train : array-like of shape (n_samples_train), default=None

The ground truth anomaly labels of the training data. If not given, then either the train data must also be omitted, or the train data is assumed to consist of only normal data.

feature_names : list of str, default=None

The name of each feature in the data. The number of names must be identical to the number of actual features. If None, then the data is assumed to be unnamed.

time_steps_test : array-like of shape (n_samples_test), default=None

The time steps corresponding to the test data. If None, then no time steps are known.

time_steps_train : array-like of shape (n_samples_train), default=None

The time steps corresponding to the train data. If None, then no time steps are known. Can only be provided if training data is actually given (i.e., X_train is not None).

static check_is_valid(X_test: ndarray, y_test: ndarray, X_train: ndarray | None, y_train: ndarray | None) -> None

Check if the given elements refer to a valid DataSet.

Check whether the given elements would form a valid DataSet, and raise a ValueError if they would not.

Parameters:
X_test : array-like of shape (n_samples_test, n_attributes)

The test time series data.

y_test : array-like of shape (n_samples_test)

The ground truth anomaly labels of the test data.

X_train : array-like of shape (n_samples_train, n_attributes) or None

The train time series data. Note that X_train may be None, but the argument must still be passed explicitly because it has no default value.

y_train : array-like of shape (n_samples_train) or None

The ground truth anomaly labels of the train data. Note that y_train may be None, but the argument must still be passed explicitly because it has no default value.

Raises:
ValueError

If the given variables would not lead to a valid DataSet. This is the case if:

  • If X_test or y_test are not valid array-like.

  • If y_test is not univariate or has a value different from 0 or 1.

  • If X_test and y_test consist of a different number of samples.

  • If X_train is not None, but it is not a valid array-like.

  • If X_train is not None and consists of a different number of attributes than X_test.

  • If y_train is not None but X_train is None.

  • If y_train is not None but it is not a valid array-like.

  • If y_train is not None, but it is not univariate or has a value different from 0 or 1.

  • If y_train is not None but consists of a different number of samples than X_train.
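The order of these checks can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the name check_is_valid_sketch is hypothetical, plain lists and tuples stand in for array-likes, and treating an empty sequence as invalid is an assumption made only to keep the example short.

```python
def check_is_valid_sketch(X_test, y_test, X_train, y_train):
    """Raise ValueError if the inputs break one of the rules listed above.

    Hypothetical stand-in: lists/tuples play the role of array-likes.
    """

    def is_sequence(value):
        # Assumption: an empty sequence is treated as invalid.
        return isinstance(value, (list, tuple)) and len(value) > 0

    def n_attributes(X):
        # A flat sequence is treated as a series with a single attribute.
        first = X[0]
        return len(first) if isinstance(first, (list, tuple)) else 1

    def check_labels(y, name):
        # Labels must be a flat sequence containing only 0 and 1.
        if any(isinstance(v, (list, tuple)) for v in y):
            raise ValueError(f"{name} must be univariate")
        if any(v not in (0, 1) for v in y):
            raise ValueError(f"{name} may only contain 0 and 1")

    # Rules on the test data.
    if not is_sequence(X_test) or not is_sequence(y_test):
        raise ValueError("X_test and y_test must be valid array-likes")
    check_labels(y_test, "y_test")
    if len(X_test) != len(y_test):
        raise ValueError("X_test and y_test differ in number of samples")

    # Rules on the (optional) training data.
    if X_train is None:
        if y_train is not None:
            raise ValueError("y_train given without X_train")
        return
    if not is_sequence(X_train):
        raise ValueError("X_train must be a valid array-like")
    if n_attributes(X_train) != n_attributes(X_test):
        raise ValueError("X_train and X_test differ in number of attributes")

    # Rules on the (optional) training labels.
    if y_train is None:
        return
    if not is_sequence(y_train):
        raise ValueError("y_train must be a valid array-like")
    check_labels(y_train, "y_train")
    if len(y_train) != len(X_train):
        raise ValueError("X_train and y_train differ in number of samples")
```

For instance, check_is_valid_sketch([[1.0], [2.0]], [0, 1], None, None) returns without error, whereas passing y_train=[0, 0] together with X_train=None raises a ValueError.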

compatible_supervision() -> list[Supervision]

Get the compatible supervisions.

Get the compatible supervision types for this data set.

Returns:
list of Supervision

A list containing the compatible types for this dataset. The following supervision types can be compatible:

  • Supervision.UNSUPERVISED: Always compatible.

  • Supervision.SEMI_SUPERVISED: Compatible if and only if there is some training data given (which is assumed to be normal).

  • Supervision.SUPERVISED: Only compatible if both training data and training labels are provided.
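The three rules above can be sketched as a small function. This is an illustration under stated assumptions, not the library's code: the Supervision enum below is a local stand-in with the same member names, and compatible_supervision_sketch and its two boolean parameters are hypothetical.

```python
from enum import Enum, auto


class Supervision(Enum):
    """Local stand-in for the library's Supervision enum (assumption)."""
    UNSUPERVISED = auto()
    SEMI_SUPERVISED = auto()
    SUPERVISED = auto()


def compatible_supervision_sketch(has_train_data, has_train_labels):
    """Return the supervision types allowed by the documented rules."""
    # Unsupervised detection is always possible.
    compatible = [Supervision.UNSUPERVISED]
    if has_train_data:
        # Training data (assumed normal) enables semi-supervised detection.
        compatible.append(Supervision.SEMI_SUPERVISED)
        if has_train_labels:
            # Supervised detection needs both training data and labels.
            compatible.append(Supervision.SUPERVISED)
    return compatible
```

A data set with training data but no training labels would, under these rules, yield the unsupervised and semi-supervised types only.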

is_compatible(detector: BaseDetector) -> bool

Check if the given detector is compatible.

Check if the given anomaly detector is compatible with this DataSet.

Parameters:
detector : BaseDetector

The anomaly detector to check if it is compatible with this DataSet.

Returns:
bool

True if and only if the given anomaly detector is compatible with this DataSet. Compatibility depends on the data that is available:

  • If this DataSet contains neither training data nor training labels, then only unsupervised anomaly detectors are compatible.

  • If this DataSet contains training data but no training labels, then unsupervised and semi-supervised anomaly detectors are compatible.

  • If this DataSet contains training data and training labels, then supervised, unsupervised and semi-supervised anomaly detectors are compatible.
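One way to read these cases is as an ordering: a detector is compatible when it asks for no more supervision than the data can provide. The sketch below encodes that reading. All names in it (SUPERVISION_LEVEL, data_level, is_compatible_sketch) are hypothetical, and how the library reads a detector's supervision type is not shown here, so it is passed in as a plain string.

```python
# Hypothetical ranking: a higher level requires more from the data.
SUPERVISION_LEVEL = {
    "unsupervised": 0,
    "semi-supervised": 1,
    "supervised": 2,
}


def data_level(has_train_data, has_train_labels):
    """Highest supervision level that the available data can support."""
    if has_train_data and has_train_labels:
        return 2
    if has_train_data:
        return 1
    return 0


def is_compatible_sketch(detector_supervision, has_train_data, has_train_labels):
    """True if the detector needs no more supervision than the data offers."""
    required = SUPERVISION_LEVEL[detector_supervision]
    return required <= data_level(has_train_data, has_train_labels)
```

With training data but no labels, is_compatible_sketch("semi-supervised", True, False) is True, while is_compatible_sketch("supervised", True, False) is False.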

is_valid() -> bool

Check whether this DataSet is valid.

Check if this dataset object is valid.

Returns:
bool

True if and only if this instance is valid, i.e., if the attributes X_test, y_test, X_train and y_train of this instance pass all the checks of check_is_valid().