Thresholding module
This module contains thresholding functionality. It can be imported as follows:
>>> from dtaianomaly import thresholding
Thresholding converts the raw anomaly scores of a detector,
obtained via dtaianomaly.anomaly_detection.BaseDetector.decision_function(),
into binary predictions (anomalous or normal).
Custom thresholders can be implemented by extending the base dtaianomaly.thresholding.Thresholding class.
- class dtaianomaly.thresholding.Thresholding[source]
- abstract threshold(scores: ndarray) → ndarray[source]
Apply the thresholding operation to the given anomaly scores
- Parameters:
scores (array-like of shape (n_samples)) – The continuous anomaly scores to convert to binary anomaly labels.
- Returns:
anomaly_labels – The discrete anomaly labels, in which a 0 indicates normal and a 1 indicates anomalous.
- Return type:
array-like of shape (n_samples)
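As a rough illustration of extending the base class, the sketch below defines a custom thresholder that flags scores lying more than a chosen number of standard deviations above the mean. To keep the example self-contained, `Thresholding` here is a hypothetical stand-in that only mirrors the documented `threshold(scores)` interface; in practice one would subclass `dtaianomaly.thresholding.Thresholding` instead. `MeanPlusStd` and its `n_std` parameter are illustrative names and not part of the library.

```python
from abc import ABC, abstractmethod

import numpy as np


class Thresholding(ABC):
    """Hypothetical stand-in mirroring the documented base-class interface."""

    @abstractmethod
    def threshold(self, scores: np.ndarray) -> np.ndarray:
        """Convert continuous anomaly scores to binary labels (0 or 1)."""


class MeanPlusStd(Thresholding):
    """Illustrative custom thresholder (an assumption, not part of dtaianomaly)."""

    def __init__(self, n_std: float = 1.0):
        self.n_std = n_std

    def threshold(self, scores: np.ndarray) -> np.ndarray:
        scores = np.asarray(scores, dtype=float)
        # Cut-off derived from the scores themselves: mean + n_std * std.
        cutoff = scores.mean() + self.n_std * scores.std()
        return (scores > cutoff).astype(int)


labels = MeanPlusStd(n_std=1.0).threshold(np.array([0.1, 0.2, 0.1, 0.3, 5.0]))
print(labels)  # only the last score lies above the cut-off
```

Because the cut-off is computed from the data, this kind of thresholder adapts to the scale of the scores, unlike a fixed cut-off.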
- class dtaianomaly.thresholding.FixedCutoff(cutoff: float)[source]
Thresholding based on a fixed cut-off.
Values higher than the cut-off are considered anomalous (1); values below the cut-off are considered normal (0).
- Parameters:
cutoff (float) – The cutoff above which the given anomaly scores indicate an anomaly.
- threshold(scores: ndarray)[source]
Apply the cut-off thresholding.
- Parameters:
scores (array-like of shape (n_samples)) – Raw anomaly scores
- Returns:
anomaly_labels – Integer array of 1s and 0s, representing anomalous samples and normal samples respectively
- Return type:
array-like of shape (n_samples)
- Raises:
ValueError – If scores is not a valid array
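The fixed cut-off rule can be sketched in plain NumPy as follows. This is a minimal re-implementation of the idea, not the library's code: the function name `fixed_cutoff` is hypothetical, and treating a score exactly equal to the cut-off as normal is an assumption, since the description above only covers values higher or lower than the cut-off.

```python
import numpy as np


def fixed_cutoff(scores, cutoff: float) -> np.ndarray:
    """Label scores strictly above `cutoff` as anomalous (1), the rest as normal (0)."""
    scores = np.asarray(scores, dtype=float)
    if scores.ndim != 1:
        raise ValueError("scores must be a one-dimensional array")
    # Assumption: a score equal to the cut-off is treated as normal.
    return (scores > cutoff).astype(int)


fixed_labels = fixed_cutoff([0.1, 0.8, 0.4, 0.95], cutoff=0.5)
print(fixed_labels)  # 0.8 and 0.95 exceed the cut-off
```

A fixed cut-off is only meaningful when the scores have a known, stable scale, for example when they are normalized to [0, 1].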
- class dtaianomaly.thresholding.ContaminationRate(contamination_rate: float)[source]
Thresholding based on a contamination rate.
The top contamination_rate proportion of anomaly scores are considered anomalous (1); other (lower) scores are considered normal (0).
- Parameters:
contamination_rate (float) – The contamination rate, i.e., the proportion of instances that are anomalous.
- threshold(scores: ndarray)[source]
Apply the contamination-rate thresholding.
- Parameters:
scores (array-like of shape (n_samples)) – Raw anomaly scores
- Returns:
anomaly_labels – Integer array of 1s and 0s, representing anomalous samples and normal samples respectively
- Return type:
array-like of shape (n_samples)
- Raises:
ValueError – If scores is not a valid array
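One way to picture contamination-rate thresholding is as a quantile-based cut-off. The sketch below is an assumption about how such a rule could be implemented and may differ from the library's internals: it takes the (1 - contamination_rate) quantile of the scores as the cut-off and flags everything at or above it. The function name `contamination_rate_threshold` is hypothetical.

```python
import numpy as np


def contamination_rate_threshold(scores, contamination_rate: float) -> np.ndarray:
    """Flag roughly the top `contamination_rate` fraction of scores as anomalous."""
    if not 0.0 < contamination_rate <= 1.0:
        raise ValueError("contamination_rate must lie in (0, 1]")
    scores = np.asarray(scores, dtype=float)
    # Cut-off at the (1 - rate) quantile, using NumPy's default linear interpolation.
    cutoff = np.quantile(scores, 1.0 - contamination_rate)
    return (scores >= cutoff).astype(int)


# Ten evenly spaced scores 0.1, 0.2, ..., 1.0 with a 20% contamination rate.
rate_labels = contamination_rate_threshold(np.arange(1, 11) / 10, 0.2)
print(rate_labels)  # the two highest scores are flagged
```

With tied scores around the cut-off, a quantile-based rule can flag slightly more or fewer instances than the requested fraction.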
- class dtaianomaly.thresholding.TopN(n: int)[source]
Thresholding based on a top N strategy.
The top n anomaly scores are considered anomalous (1); other (lower) scores are considered normal (0).
- Parameters:
n (int) – The number of instances that should be flagged as an anomaly
- threshold(scores: ndarray)[source]
Apply the top-N thresholding.
- Parameters:
scores (array-like of shape (n_samples)) – Raw anomaly scores
- Returns:
anomaly_labels – Integer array of 1s and 0s, representing anomalous samples and normal samples respectively
- Return type:
array-like of shape (n_samples)
- Raises:
ValueError – If scores is not a valid array
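The top-N strategy can be illustrated with a short NumPy sketch. It is not the library's implementation: the function name `top_n` is hypothetical, and the choice to flag exactly n instances (breaking ties arbitrarily by sort order) is an assumption. An implementation that derives a cut-off from the n-th highest score could flag more than n instances when scores are tied.

```python
import numpy as np


def top_n(scores, n: int) -> np.ndarray:
    """Label the n highest scores as anomalous (1) and all others as normal (0)."""
    scores = np.asarray(scores, dtype=float)
    if scores.ndim != 1:
        raise ValueError("scores must be a one-dimensional array")
    if not 0 <= n <= scores.size:
        raise ValueError("n must be between 0 and the number of scores")
    labels = np.zeros(scores.size, dtype=int)
    # argsort is ascending, so the last n indices belong to the n highest scores.
    labels[np.argsort(scores)[scores.size - n:]] = 1
    return labels


top_labels = top_n([0.2, 0.9, 0.1, 0.7, 0.4], n=2)
print(top_labels)  # the scores 0.9 and 0.7 are flagged
```

Unlike the contamination-rate rule, the number of flagged instances here does not grow with the length of the time series.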