Baselines

class dtaianomaly.anomaly_detection.baselines.AlwaysNormal[source]

Baseline anomaly detector that predicts all observations to be normal. Use this detector only as a sanity check, not to actually detect anomalies in time series data.
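A minimal sketch of why such a baseline is a useful sanity check. It uses plain Python only; the labels and the accuracy helper are made up for illustration and are not part of dtaianomaly. Because anomalies are rare, predicting "normal" everywhere already reaches a high accuracy without finding a single anomaly, so a real detector should be compared against this number.

```python
# Hypothetical ground truth: 1 marks an anomaly, 0 a normal observation.
y_true = [0] * 95 + [1] * 5

# An always-normal baseline predicts 0 for every observation.
y_pred = [0] * len(y_true)


def accuracy(truth, predicted):
    """Fraction of observations whose predicted label equals the true label."""
    matches = sum(t == p for t, p in zip(truth, predicted))
    return matches / len(truth)


baseline_accuracy = accuracy(y_true, y_pred)
detected = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))

print(f"accuracy: {baseline_accuracy:.2f}, anomalies found: {detected}")
```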

check_is_fitted() None

Check whether this anomaly detector is fitted or not.

Raises:

NotFittedError – If this detector is not fitted yet.

decision_function(X: ndarray) array

Abstract method, compute anomaly scores.

Parameters:

X (array-like of shape (n_samples, n_attributes)) – Input time series.

Returns:

decision_scores – The computed anomaly scores.

Return type:

array-like of shape (n_samples)

fit(X: ndarray, y: ndarray | None = None, **kwargs) BaseDetector

Abstract method, fit this detector to the given data.

Parameters:
  • X (array-like of shape (n_samples, n_attributes)) – Input time series.

  • y (array-like, default=None) – Ground-truth information.

Returns:

self – Returns the instance itself.

Return type:

BaseDetector

is_fitted() bool

Return whether this anomaly detector is fitted.

Returns:

is_fitted – True if and only if this detector is fitted, and can be used for detecting anomalies.

Return type:

bool

predict_confidence(X: ndarray, X_train: ndarray = None, contamination: float = 0.05, decision_scores_given: bool = False)

Predict the confidence of the anomaly scores on the given test data.

This method implements ExCeeD [perini2020quantifying] (Example-wise Confidence of anomaly Detectors) to estimate the confidence. ExCeeD transforms the predicted decision scores into probability estimates using a Bayesian approach. This makes it possible to assign each prediction a confidence score that captures the anomaly detector's uncertainty in that prediction.

Parameters:
  • X (array-like of shape (n_samples, n_attributes)) – The test time series for which the confidence of anomaly scores should be predicted.

  • X_train (array-like of shape (n_samples_train, n_attributes), default=None) – The training time series, which can be used as reference. If X_train=None, the test set is used as reference set.

  • contamination (float, default=0.05) – The (estimated) contamination rate of the data, i.e., the expected fraction of anomalies.

  • decision_scores_given (bool, default=False) – Whether the given X and X_train represent time series data or decision scores. If decision_scores_given=False (default), then the given arrays are interpreted as time series. Otherwise, they are interpreted as decision scores, as computed by decision_function().

Returns:

confidence – The confidence of this anomaly detector in each prediction in the given test time series.

Return type:

array-like of shape (n_samples)

References

[perini2020quantifying]

Perini, L., Vercruyssen, V., Davis, J. Quantifying the Confidence of Anomaly Detectors in Their Example-Wise Predictions. In: Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Springer, Cham, doi: 10.1007/978-3-030-67664-3_14.
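The idea behind ExCeeD can be sketched roughly as follows. This is a simplified, dependency-free illustration based on the published description, not the code used by dtaianomaly. The function name exceed_confidence, the way the decision threshold is derived from the training scores, and the example scores are all assumptions made for this sketch.

```python
from math import comb


def exceed_confidence(test_scores, train_scores, contamination=0.05):
    """Simplified ExCeeD-style confidence for each test score (illustrative only)."""
    n = len(train_scores)
    n_anomalies = int(n * contamination)

    # Assumption: a score is predicted anomalous if it exceeds the training
    # score at rank n - n_anomalies (counting from the lowest score).
    threshold = sorted(train_scores)[n - n_anomalies - 1]

    confidences = []
    for score in test_scores:
        # Bayesian estimate of the probability that a training score is at most `score`.
        n_lower = sum(s <= score for s in train_scores)
        p = (1 + n_lower) / (2 + n)

        # Binomial tail: probability that more than n - n_anomalies
        # out of n scores are at most `score`.
        p_anomalous = sum(
            comb(n, i) * p**i * (1 - p) ** (n - i)
            for i in range(n - n_anomalies + 1, n + 1)
        )

        # Confidence refers to the predicted class, so flip it for "normal".
        is_anomalous = score > threshold
        confidences.append(p_anomalous if is_anomalous else 1 - p_anomalous)
    return confidences


# Hypothetical decision scores: 100 training scores and two clear-cut test scores.
train = [float(v) for v in range(1, 101)]
confidence = exceed_confidence([0.0, 1000.0], train)
```

In this sketch a test score far below or far above all training scores yields a confidence close to 1, while scores near the threshold yield values closer to 0.5.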

predict_proba(X: ndarray) ndarray

Predict anomaly probabilities.

Estimate the probability that each sample of X is anomalous by rescaling the anomaly scores obtained from decision_function to the range [0, 1] via min-max scaling.

Parameters:

X (array-like of shape (n_samples, n_attributes)) – Input time series.

Returns:

anomaly_scores – 1D array with the same length as X, with values in the interval [0, 1], in which a higher value implies that the instance is more likely to be anomalous.

Return type:

array-like of shape (n_samples)

Raises:
  • ValueError – If the anomaly scores are not a valid array.

  • ValueError – If the prediction scores from decision_function are constant but not in the interval [0, 1], because such values cannot be unambiguously transformed into an anomaly probability.
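The rescaling and the constant-score case described above can be sketched as follows. The helper min_max_probabilities is hypothetical and works on plain Python lists. Returning constant scores that already lie in [0, 1] unchanged is an assumption suggested by the error description, not something the documentation states.

```python
def min_max_probabilities(scores):
    """Rescale anomaly scores to [0, 1] with min-max scaling (illustrative helper)."""
    lowest, highest = min(scores), max(scores)
    if lowest == highest:
        # Constant scores: min-max scaling would divide by zero.
        if not 0.0 <= lowest <= 1.0:
            raise ValueError("Constant scores outside [0, 1] are ambiguous.")
        # Assumption: constant scores already in [0, 1] are kept as they are.
        return [float(s) for s in scores]
    return [(s - lowest) / (highest - lowest) for s in scores]


probabilities = min_max_probabilities([2.0, 4.0, 10.0])  # [0.0, 0.25, 1.0]
constant = min_max_probabilities([0.0, 0.0, 0.0])        # [0.0, 0.0, 0.0]
```

The constant case matters for the baselines on this page: a detector that gives every observation the same score has no score range to rescale.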

save(path: str | Path) None

Save the detector to disk as a pickle file with extension .dtai. If the given path contains subdirectories that do not exist yet, they are created.

Parameters:

path (str or Path) – Location where to store the detector.
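The described behaviour (pickling to a file and creating missing subdirectories) might look roughly like the sketch below. The helper save_object and the placeholder dictionary are hypothetical; the sketch only uses the standard pickle and pathlib modules and does not reproduce how dtaianomaly handles the .dtai extension.

```python
import pickle
import tempfile
from pathlib import Path


def save_object(obj, path):
    """Pickle `obj` to `path`, creating missing parent directories (illustrative helper)."""
    path = Path(path)
    # Create every missing directory on the way to the target file.
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)
    return path


with tempfile.TemporaryDirectory() as tmp:
    # Neither "models" nor "baselines" exists yet inside the temporary directory.
    target = Path(tmp) / "models" / "baselines" / "detector.dtai"
    save_object({"name": "placeholder"}, target)
    with open(target, "rb") as handle:
        restored = pickle.load(handle)
```

As with any pickle file, only load files from sources you trust.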

class dtaianomaly.anomaly_detection.baselines.AlwaysAnomalous[source]

Baseline anomaly detector that predicts all observations to be anomalous. Use this detector only as a sanity check, not to actually detect anomalies in time series data.
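A small sketch of what this baseline implies for common metrics, using made-up labels in plain Python (nothing here is part of dtaianomaly). Flagging everything yields perfect recall, while precision drops to the fraction of anomalies in the data. A real detector should therefore beat this precision.

```python
# Hypothetical ground truth: 1 marks an anomaly, 0 a normal observation.
y_true = [0] * 95 + [1] * 5

# An always-anomalous baseline flags every observation.
y_pred = [1] * len(y_true)

true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
precision = true_positives / sum(y_pred)  # share of flagged observations that are anomalous
recall = true_positives / sum(y_true)     # share of anomalies that were flagged

print(f"precision: {precision:.2f}, recall: {recall:.2f}")
```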

check_is_fitted() None

Check whether this anomaly detector is fitted or not.

Raises:

NotFittedError – If this detector is not fitted yet.

decision_function(X: ndarray) array

Abstract method, compute anomaly scores.

Parameters:

X (array-like of shape (n_samples, n_attributes)) – Input time series.

Returns:

decision_scores – The computed anomaly scores.

Return type:

array-like of shape (n_samples)

fit(X: ndarray, y: ndarray | None = None, **kwargs) BaseDetector

Abstract method, fit this detector to the given data.

Parameters:
  • X (array-like of shape (n_samples, n_attributes)) – Input time series.

  • y (array-like, default=None) – Ground-truth information.

Returns:

self – Returns the instance itself.

Return type:

BaseDetector

is_fitted() bool

Return whether this anomaly detector is fitted.

Returns:

is_fitted – True if and only if this detector is fitted, and can be used for detecting anomalies.

Return type:

bool
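The interplay between fit, is_fitted, and check_is_fitted can be illustrated with a hypothetical stand-in class. How dtaianomaly tracks the fitted state internally is not described here, so the boolean flag, the class TinyDetector, and the locally defined NotFittedError are assumptions made for this sketch.

```python
class NotFittedError(Exception):
    """Local stand-in for the NotFittedError named in this reference."""


class TinyDetector:
    """Hypothetical detector that only tracks whether fit() has been called."""

    def __init__(self):
        self._fitted = False

    def fit(self, X, y=None, **kwargs):
        self._fitted = True
        return self  # returning the instance allows chaining calls

    def is_fitted(self):
        return self._fitted

    def check_is_fitted(self):
        # Raise instead of returning a boolean, so callers can fail fast.
        if not self.is_fitted():
            raise NotFittedError("Call fit() before using this detector.")


detector = TinyDetector()
before = detector.is_fitted()                      # False
after = detector.fit([[0.0], [1.0]]).is_fitted()   # True
```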

predict_confidence(X: ndarray, X_train: ndarray = None, contamination: float = 0.05, decision_scores_given: bool = False)

Predict the confidence of the anomaly scores on the given test data.

This method implements ExCeeD [perini2020quantifying] (Example-wise Confidence of anomaly Detectors) to estimate the confidence. ExCeeD transforms the predicted decision scores into probability estimates using a Bayesian approach. This makes it possible to assign each prediction a confidence score that captures the anomaly detector's uncertainty in that prediction.

Parameters:
  • X (array-like of shape (n_samples, n_attributes)) – The test time series for which the confidence of anomaly scores should be predicted.

  • X_train (array-like of shape (n_samples_train, n_attributes), default=None) – The training time series, which can be used as reference. If X_train=None, the test set is used as reference set.

  • contamination (float, default=0.05) – The (estimated) contamination rate of the data, i.e., the expected fraction of anomalies.

  • decision_scores_given (bool, default=False) – Whether the given X and X_train represent time series data or decision scores. If decision_scores_given=False (default), then the given arrays are interpreted as time series. Otherwise, they are interpreted as decision scores, as computed by decision_function().

Returns:

confidence – The confidence of this anomaly detector in each prediction in the given test time series.

Return type:

array-like of shape (n_samples)

References

[perini2020quantifying]

Perini, L., Vercruyssen, V., Davis, J. Quantifying the Confidence of Anomaly Detectors in Their Example-Wise Predictions. In: Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Springer, Cham, doi: 10.1007/978-3-030-67664-3_14.

predict_proba(X: ndarray) ndarray

Predict anomaly probabilities.

Estimate the probability that each sample of X is anomalous by rescaling the anomaly scores obtained from decision_function to the range [0, 1] via min-max scaling.

Parameters:

X (array-like of shape (n_samples, n_attributes)) – Input time series.

Returns:

anomaly_scores – 1D array with the same length as X, with values in the interval [0, 1], in which a higher value implies that the instance is more likely to be anomalous.

Return type:

array-like of shape (n_samples)

Raises:
  • ValueError – If the anomaly scores are not a valid array.

  • ValueError – If the prediction scores from decision_function are constant but not in the interval [0, 1], because such values cannot be unambiguously transformed into an anomaly probability.

save(path: str | Path) None

Save the detector to disk as a pickle file with extension .dtai. If the given path contains subdirectories that do not exist yet, they are created.

Parameters:

path (str or Path) – Location where to store the detector.

class dtaianomaly.anomaly_detection.baselines.RandomDetector(seed: int | None = None)[source]

Baseline anomaly detector that assigns random anomaly scores. Use this detector only as a sanity check, not to actually detect anomalies in time series data.

Parameters:

seed (int, default=None) – The seed to use for generating anomaly scores. If None, no seed will be used.
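The effect of the seed can be sketched with the standard random module. The function random_scores is a hypothetical stand-in, and drawing scores uniformly from [0, 1) is an assumption; the distribution actually used by RandomDetector is not stated here.

```python
import random


def random_scores(n_samples, seed=None):
    """Draw one uniform random score per observation (illustrative stand-in)."""
    # With seed=None the generator is seeded from the operating system or the clock.
    generator = random.Random(seed)
    return [generator.random() for _ in range(n_samples)]


# The same seed reproduces the same scores, which keeps experiments repeatable.
first = random_scores(5, seed=42)
second = random_scores(5, seed=42)
```

Fixing the seed is useful when the random baseline appears in a reported comparison, since the result can then be reproduced exactly.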

check_is_fitted() None

Check whether this anomaly detector is fitted or not.

Raises:

NotFittedError – If this detector is not fitted yet.

decision_function(X: ndarray) array

Abstract method, compute anomaly scores.

Parameters:

X (array-like of shape (n_samples, n_attributes)) – Input time series.

Returns:

decision_scores – The computed anomaly scores.

Return type:

array-like of shape (n_samples)

fit(X: ndarray, y: ndarray | None = None, **kwargs) BaseDetector

Abstract method, fit this detector to the given data.

Parameters:
  • X (array-like of shape (n_samples, n_attributes)) – Input time series.

  • y (array-like, default=None) – Ground-truth information.

Returns:

self – Returns the instance itself.

Return type:

BaseDetector

is_fitted() bool

Return whether this anomaly detector is fitted.

Returns:

is_fitted – True if and only if this detector is fitted, and can be used for detecting anomalies.

Return type:

bool

predict_confidence(X: ndarray, X_train: ndarray = None, contamination: float = 0.05, decision_scores_given: bool = False)

Predict the confidence of the anomaly scores on the given test data.

This method implements ExCeeD [perini2020quantifying] (Example-wise Confidence of anomaly Detectors) to estimate the confidence. ExCeeD transforms the predicted decision scores into probability estimates using a Bayesian approach. This makes it possible to assign each prediction a confidence score that captures the anomaly detector's uncertainty in that prediction.

Parameters:
  • X (array-like of shape (n_samples, n_attributes)) – The test time series for which the confidence of anomaly scores should be predicted.

  • X_train (array-like of shape (n_samples_train, n_attributes), default=None) – The training time series, which can be used as reference. If X_train=None, the test set is used as reference set.

  • contamination (float, default=0.05) – The (estimated) contamination rate of the data, i.e., the expected fraction of anomalies.

  • decision_scores_given (bool, default=False) – Whether the given X and X_train represent time series data or decision scores. If decision_scores_given=False (default), then the given arrays are interpreted as time series. Otherwise, they are interpreted as decision scores, as computed by decision_function().

Returns:

confidence – The confidence of this anomaly detector in each prediction in the given test time series.

Return type:

array-like of shape (n_samples)

References

[perini2020quantifying]

Perini, L., Vercruyssen, V., Davis, J. Quantifying the Confidence of Anomaly Detectors in Their Example-Wise Predictions. In: Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Springer, Cham, doi: 10.1007/978-3-030-67664-3_14.

predict_proba(X: ndarray) ndarray

Predict anomaly probabilities.

Estimate the probability that each sample of X is anomalous by rescaling the anomaly scores obtained from decision_function to the range [0, 1] via min-max scaling.

Parameters:

X (array-like of shape (n_samples, n_attributes)) – Input time series.

Returns:

anomaly_scores – 1D array with the same length as X, with values in the interval [0, 1], in which a higher value implies that the instance is more likely to be anomalous.

Return type:

array-like of shape (n_samples)

Raises:
  • ValueError – If the anomaly scores are not a valid array.

  • ValueError – If the prediction scores from decision_function are constant but not in the interval [0, 1], because such values cannot be unambiguously transformed into an anomaly probability.

save(path: str | Path) None

Save the detector to disk as a pickle file with extension .dtai. If the given path contains subdirectories that do not exist yet, they are created.

Parameters:

path (str or Path) – Location where to store the detector.