Anomaly detection module

This module contains functionality to detect anomalies. It can be imported as follows:

>>> from dtaianomaly import anomaly_detection

We refer to the documentation for more information regarding detecting anomalies using dtaianomaly.

API cheatsheet

Below is a quick overview of the most essential methods for detecting anomalies:

dtaianomaly.anomaly_detection.BaseDetector.fit(). Fit the anomaly detector. This method accepts the time series X and, optionally, the anomaly labels y. In practice, most anomaly detectors do not use the ground-truth labels; the parameter y is present for API consistency and defaults to None.
dtaianomaly.anomaly_detection.BaseDetector.decision_function(). Compute, for a given time series X, a decision score for each observation that indicates how anomalous it is. Returns an array with one entry per observation in the time series. Note that this score is not normalized and depends on the specific anomaly detector. However, for all detectors, a higher score means more anomalous.
dtaianomaly.anomaly_detection.BaseDetector.predict_proba(). Compute the probability of each observation being an anomaly. This is similar to the decision_function() method, but the computed scores are normalized to the interval [0, 1], which allows them to be interpreted as probabilities.
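As a rough illustration of the calling pattern above, the sketch below defines ToyZScoreDetector, a hypothetical stand-in that is not part of dtaianomaly and does not extend BaseDetector. It only mirrors the fit() and decision_function() contract described in this cheatsheet. For brevity it treats the time series as a flat list of values rather than an array of shape (n_samples, n_attributes).

```python
from statistics import mean, pstdev


class ToyZScoreDetector:
    """Hypothetical detector mirroring the fit/decision_function contract.

    Not part of dtaianomaly; it scores each value by its distance to the mean.
    """

    def fit(self, X, y=None):
        # y is accepted for API consistency but ignored, as with most detectors.
        self.mean_ = mean(X)
        # Fall back to 1.0 so a constant series does not cause a division by zero.
        self.std_ = pstdev(X) or 1.0
        return self  # returning the instance allows method chaining

    def decision_function(self, X):
        # One unnormalized score per observation; higher means more anomalous.
        return [abs(x - self.mean_) / self.std_ for x in X]


series = [1.0, 1.1, 0.9, 1.0, 5.0, 1.0]
scores = ToyZScoreDetector().fit(series).decision_function(series)

# Index of the observation with the highest anomaly score.
most_anomalous = max(range(len(series)), key=scores.__getitem__)
```

Because fit() returns the detector itself, fitting and scoring can be chained. In this toy series the highest score belongs to the observation with value 5.0.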

Implemented anomaly detectors

BaseDetector

class dtaianomaly.anomaly_detection.BaseDetector

Abstract base class for time series anomaly detection.

This base class defines the method signatures used to build specific anomaly detectors. User-defined detectors can be used throughout dtaianomaly by extending this base class.

abstract decision_function(X: ndarray) → ndarray

Abstract method, compute anomaly scores.

Parameters: X (array-like of shape (n_samples, n_attributes)) – Input time series.
Returns: decision_scores – The computed anomaly scores.
Return type: array-like of shape (n_samples)

abstract fit(X: ndarray, y: ndarray | None = None) → BaseDetector

Abstract method, fit this detector to the given data.

Parameters:

X (array-like of shape (n_samples, n_attributes)) – Input time series.
y (array-like, default=None) – Ground-truth information.

Returns:

self – Returns the instance itself.

Return type:

BaseDetector

predict_proba(X: ndarray) → ndarray

Predict anomaly probabilities.

Estimate the probability of a sample of X being anomalous, based on the anomaly scores obtained from decision_function by rescaling them to the range of [0, 1] via min-max scaling.

Parameters:

X (array-like of shape (n_samples, n_attributes)) – Input time series.

Returns:

anomaly_scores – 1D array with the same length as X, with values in the interval [0, 1], in which a higher value implies that the instance is more likely to be anomalous.

Return type:

array-like of shape (n_samples)

Raises:

ValueError – If scores is not a valid array.
ValueError – If the scores from decision_function are constant but not in the interval [0, 1], because such values cannot unambiguously be transformed into an anomaly probability.
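The min-max rescaling described above maps each score s to (s - min) / (max - min). The following is a minimal sketch of that idea on plain Python lists, not the library's implementation. The function name min_max_probabilities is hypothetical, and returning constant scores unchanged when they already lie in [0, 1] is an assumption inferred from the second ValueError.

```python
def min_max_probabilities(scores):
    """Rescale anomaly scores to [0, 1] via min-max scaling (illustrative only)."""
    lowest, highest = min(scores), max(scores)

    if lowest == highest:
        # Constant scores carry no ranking information, so min-max scaling
        # would divide by zero. Outside [0, 1] there is no sensible mapping.
        if not 0.0 <= lowest <= 1.0:
            raise ValueError("Constant scores outside [0, 1] cannot be rescaled.")
        # Assumption: constant scores already in [0, 1] are kept as they are.
        return list(scores)

    return [(s - lowest) / (highest - lowest) for s in scores]


probabilities = min_max_probabilities([2.0, 4.0, 6.0])
```

The smallest score maps to 0 and the largest to 1, so the ordering of the observations is preserved while the values become comparable across detectors.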

save(path: str | Path) → None

Save the detector to disk as a pickle file with extension .dtai. If the given path contains subdirectories that do not exist yet, they are created.

Parameters: path (str or Path) – Location where to store the detector.

Utilities

dtaianomaly.anomaly_detection.load_detector(path: str | Path) → BaseDetector

Load a detector from disk.

Warning: this method relies on pickle. Only load files you trust!

Parameters: path (str or Path) – Location of the stored detector.
Returns: detector – The loaded detector.
Return type: BaseDetector
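The save and load behaviour described above can be approximated with the standard pickle module. The helpers save_object and load_object below are hypothetical names used for illustration, not dtaianomaly functions, and a plain dictionary stands in for a fitted detector.

```python
import pickle
import tempfile
from pathlib import Path


def save_object(obj, path):
    """Pickle obj to path, creating missing parent directories first."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_object(path):
    """Unpickle an object. pickle can run arbitrary code: only load trusted files."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    # Neither "models" nor "run-1" exists yet; save_object creates both.
    target = Path(tmp) / "models" / "run-1" / "detector.dtai"
    save_object({"window_size": 16}, target)
    restored = load_object(target)
```

The warning about trusted files applies to any pickle-based format: unpickling reconstructs objects by executing instructions stored in the file, so a tampered file can run code on load.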

dtaianomaly.anomaly_detection.sliding_window(X: ndarray, window_size: int, stride: int) → ndarray

Constructs a sliding window for the given time series.

Parameters:

X (array-like of shape (n_samples, n_attributes)) – The time series.
window_size (int) – The window size for the sliding windows.
stride (int) – The stride, i.e., the step size for the windows.

Returns:

windows – The windows as a 2D numpy array. Each row corresponds to a window. Windows of multivariate time series are flattened into a 1D array whose length equals the number of attributes multiplied by the window size.

Return type:

np.ndarray of shape ((n_samples - window_size)/stride + 1, n_attributes * window_size)
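To make the output shape concrete, here is a small dependency-free sketch of the windowing, using nested lists in place of NumPy arrays. The name sliding_window_sketch is hypothetical. The flattening order (all attributes of the first observation, then the second, and so on) and the handling of a trailing remainder when (n_samples - window_size) is not divisible by stride are assumptions, and the library may treat them differently.

```python
def sliding_window_sketch(X, window_size, stride):
    """Return one flattened row per window of a (n_samples, n_attributes) series."""
    windows = []
    # Window starts at 0, stride, 2 * stride, ... while a full window still fits.
    for start in range(0, len(X) - window_size + 1, stride):
        window = X[start:start + window_size]
        # Flatten the (window_size, n_attributes) block row by row.
        windows.append([value for observation in window for value in observation])
    return windows


# Five observations with two attributes each.
X = [[0, 10], [1, 11], [2, 12], [3, 13], [4, 14]]
windows = sliding_window_sketch(X, window_size=3, stride=2)
```

With n_samples = 5, window_size = 3 and stride = 2, this gives (5 - 3) / 2 + 1 = 2 windows, each of length n_attributes * window_size = 6.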

dtaianomaly.anomaly_detection.reverse_sliding_window(per_window_anomaly_scores: ndarray, window_size: int, stride: int, length_time_series: int) → ndarray

Reverses the sliding window, to convert the per-window anomaly scores into per-observation anomaly scores.

For non-overlapping sliding windows, converting the per-window anomaly scores to per-observation scores is trivial, because each observation is linked to only one window. For overlapping windows, some observations are linked to multiple windows (depending on the window size and stride), so the per-window anomaly score cannot simply be copied to each observation. When an observation is covered by multiple windows, its anomaly score is set to the mean of the corresponding per-window anomaly scores.

Parameters:

per_window_anomaly_scores (array-like of shape (n_windows)) – The anomaly scores computed for each window.
window_size (int) – The window size used for creating windows.
stride (int) – The stride, i.e., the step size used for creating windows.
length_time_series (int) – The original length of the time series.

Returns:

anomaly_scores – The per-observation anomaly scores.

Return type:

np.ndarray of shape (length_time_series)
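The averaging rule for overlapping windows can be sketched as follows. This is an illustrative version on plain lists, not the library's code. The name reverse_sliding_window_sketch is hypothetical, window i is assumed to start at index i * stride, and assigning NaN to observations that no window covers is likewise an assumption.

```python
def reverse_sliding_window_sketch(per_window_scores, window_size, stride, length_time_series):
    """Average the scores of all windows that cover each observation."""
    sums = [0.0] * length_time_series
    counts = [0] * length_time_series

    for i, score in enumerate(per_window_scores):
        start = i * stride  # assumption: window i starts at i * stride
        for t in range(start, min(start + window_size, length_time_series)):
            sums[t] += score
            counts[t] += 1

    # Observations not covered by any window get NaN (an assumption).
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


# Two windows of size 3 with stride 2 over a series of length 5:
# window 0 covers observations 0-2, window 1 covers observations 2-4.
per_observation = reverse_sliding_window_sketch(
    [0.0, 1.0], window_size=3, stride=2, length_time_series=5
)
```

Only observation 2 lies in both windows, so it receives the mean of the two window scores, while every other observation inherits the score of its single window.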