Thresholding module

This module contains thresholding functionality. It can be imported as follows:

>>> from dtaianomaly import thresholding

Thresholding converts the raw anomaly scores of a detector, obtained via dtaianomaly.anomaly_detection.BaseDetector.decision_function(), into binary predictions (anomalous or normal).

Custom thresholders can be implemented by extending the base dtaianomaly.thresholding.Thresholding class.

class dtaianomaly.thresholding.Thresholding
abstract threshold(scores: ndarray) → ndarray

Apply the thresholding operation to the given anomaly scores.

Parameters:

scores (array-like of shape (n_samples)) – The continuous anomaly scores to convert to binary anomaly labels.

Returns:

anomaly_labels – The discrete anomaly labels, in which a 0 indicates normal and a 1 indicates anomalous.

Return type:

array-like of shape (n_samples)
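As an illustration of the interface described above, the following is a minimal sketch of a custom thresholder. To keep it self-contained, it defines a stand-in Thresholding base class that mirrors only the documented abstract threshold method; in practice one would extend dtaianomaly.thresholding.Thresholding instead. The MeanPlusStd class and its k parameter are hypothetical and not part of the library.

```python
from abc import ABC, abstractmethod

import numpy as np


class Thresholding(ABC):
    """Stand-in for dtaianomaly.thresholding.Thresholding (assumption:
    only the documented abstract ``threshold`` method is mirrored)."""

    @abstractmethod
    def threshold(self, scores: np.ndarray) -> np.ndarray:
        """Convert continuous anomaly scores to binary anomaly labels."""


class MeanPlusStd(Thresholding):
    """Hypothetical thresholder: flag scores above mean + k * std."""

    def __init__(self, k: float = 2.0):
        self.k = k

    def threshold(self, scores: np.ndarray) -> np.ndarray:
        scores = np.asarray(scores, dtype=float)
        cutoff = scores.mean() + self.k * scores.std()
        # 1 marks an anomalous sample, 0 a normal one.
        return (scores > cutoff).astype(int)


labels = MeanPlusStd(k=1.0).threshold(np.array([0.1, 0.2, 0.15, 0.9]))
print(labels)  # [0 0 0 1]
```

Only the last score exceeds the mean by more than one standard deviation, so it is the only one labeled anomalous.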

class dtaianomaly.thresholding.FixedCutoff(cutoff: float)

Thresholding based on a fixed cut-off.

Values higher than the cut-off are considered anomalous (1), whereas values below the cut-off are considered normal (0).

Parameters:

cutoff (float) – The cutoff above which the given anomaly scores indicate an anomaly.

threshold(scores: ndarray)

Apply the cut-off thresholding.

Parameters:

scores (array-like of shape (n_samples)) – Raw anomaly scores.

Returns:

anomaly_labels – Integer array of 1s and 0s, representing anomalous and normal samples, respectively.

Return type:

array-like of shape (n_samples)

Raises:

ValueError – If scores is not a valid array.
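The fixed cut-off rule can be sketched with plain NumPy. This is an illustration of the described behavior, not the library's implementation; the fixed_cutoff helper is hypothetical, and treating a score exactly equal to the cut-off as normal is an assumption, since the description above only covers scores above and below it.

```python
import numpy as np


def fixed_cutoff(scores, cutoff: float) -> np.ndarray:
    """Label scores above ``cutoff`` as anomalous (1), the rest as normal (0)."""
    scores = np.asarray(scores, dtype=float)
    if scores.ndim != 1:
        raise ValueError("scores must be a one-dimensional array")
    # Assumption: a score exactly equal to the cutoff is treated as normal.
    return (scores > cutoff).astype(int)


fixed_labels = fixed_cutoff([0.1, 0.4, 0.8, 0.95], cutoff=0.5)
print(fixed_labels)  # [0 0 1 1]
```

A fixed cut-off is only meaningful when the scale of the anomaly scores is known in advance, for instance when the scores are normalized to a fixed range.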

class dtaianomaly.thresholding.ContaminationRate(contamination_rate: float)

Thresholding based on a contamination rate.

The top contamination_rate proportion of anomaly scores are considered anomalous (1), whereas the other (lower) scores are considered normal (0).

Parameters:

contamination_rate (float) – The contamination rate, i.e., the proportion of instances that are anomalous.

threshold(scores: ndarray)

Apply the contamination-rate thresholding.

Parameters:

scores (array-like of shape (n_samples)) – Raw anomaly scores.

Returns:

anomaly_labels – Integer array of 1s and 0s, representing anomalous and normal samples, respectively.

Return type:

array-like of shape (n_samples)

Raises:

ValueError – If scores is not a valid array.
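One way to realize contamination-rate thresholding is to place the cut-off at the (1 - contamination_rate) quantile of the scores. The sketch below assumes this quantile-based approach and a rate expressed as a fraction in [0, 1]; the contamination_rate_labels helper is hypothetical, and the library may handle ties and interpolation differently.

```python
import numpy as np


def contamination_rate_labels(scores, contamination_rate: float) -> np.ndarray:
    """Flag roughly the top ``contamination_rate`` fraction of the scores."""
    if not 0.0 <= contamination_rate <= 1.0:
        raise ValueError("contamination_rate must lie in [0, 1]")
    scores = np.asarray(scores, dtype=float)
    # Assumption: the cutoff is the (1 - rate) quantile of the scores.
    cutoff = np.quantile(scores, 1.0 - contamination_rate)
    return (scores > cutoff).astype(int)


scores = [0.2, 0.9, 0.1, 0.4, 1.0, 0.3, 0.5, 0.8, 0.6, 0.7]
rate_labels = contamination_rate_labels(scores, contamination_rate=0.2)
print(rate_labels)  # [0 1 0 0 1 0 0 0 0 0]
```

With ten scores and a rate of 0.2, the two largest scores (0.9 and 1.0) are labeled anomalous.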

class dtaianomaly.thresholding.TopN(n: int)

Thresholding based on a top-N strategy.

The top n anomaly scores are considered anomalous (1), whereas the other (lower) scores are considered normal (0).

Parameters:

n (int) – The number of instances that should be flagged as anomalous.

threshold(scores: ndarray)

Apply the top-N thresholding.

Parameters:

scores (array-like of shape (n_samples)) – Raw anomaly scores.

Returns:

anomaly_labels – Integer array of 1s and 0s, representing anomalous and normal samples, respectively.

Return type:

array-like of shape (n_samples)

Raises:

ValueError – If scores is not a valid array.
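The top-N strategy can be sketched as selecting the indices of the n largest scores. The top_n_labels helper below is hypothetical and is not the library's implementation. It assumes that exactly n samples are flagged and that ties at the boundary are broken arbitrarily.

```python
import numpy as np


def top_n_labels(scores, n: int) -> np.ndarray:
    """Flag the ``n`` highest scores as anomalous (1), the rest as normal (0)."""
    scores = np.asarray(scores, dtype=float)
    if scores.ndim != 1:
        raise ValueError("scores must be a one-dimensional array")
    if not 0 <= n <= scores.shape[0]:
        raise ValueError("n must be between 0 and the number of scores")
    labels = np.zeros(scores.shape[0], dtype=int)
    if n > 0:
        # argpartition places the indices of the n largest scores last.
        top_indices = np.argpartition(scores, -n)[-n:]
        labels[top_indices] = 1
    return labels


top_labels = top_n_labels([0.3, 0.9, 0.1, 0.7, 0.5], n=2)
print(top_labels)  # [0 1 0 1 0]
```

Unlike a fixed cut-off, this strategy always flags the same number of samples regardless of the scale of the scores.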