ContaminationRateThreshold

class dtaianomaly.thresholding.ContaminationRateThreshold(contamination_rate: float)

Thresholding based on a contamination rate.

The top contamination_rate proportion of anomaly scores is considered anomalous (1); all other (lower) scores are considered normal (0).

Parameters:
contamination_rate: float

The contamination rate, i.e., the fraction of instances that are anomalous (for example, 0.25 means that 25% of the instances are labeled anomalous).

Examples

>>> from dtaianomaly.thresholding import ContaminationRateThreshold
>>> thresholder = ContaminationRateThreshold(0.25)
>>> thresholder.threshold([0.1, 0.2, 0.3, 0.6, 0.8, 0.5, 0.3, 0.3])
array([0, 0, 0, 1, 1, 0, 0, 0])
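
The idea behind the example above can be sketched with plain NumPy. This is a minimal illustration, not the library's implementation: it assumes the cutoff is the (1 - contamination_rate) quantile of the scores and that scores at or above the cutoff are labeled anomalous. The function name contamination_threshold_sketch is hypothetical.

```python
import numpy as np


def contamination_threshold_sketch(scores, contamination_rate):
    """Label roughly the top `contamination_rate` fraction of scores as 1.

    Hypothetical helper for illustration only; the actual behavior of
    ContaminationRateThreshold (e.g. tie handling) may differ.
    """
    scores = np.asarray(scores, dtype=float)
    # Assumption: the cutoff is the (1 - contamination_rate) quantile.
    cutoff = np.quantile(scores, 1.0 - contamination_rate)
    # Scores at or above the cutoff become 1 (anomalous), the rest 0 (normal).
    return (scores >= cutoff).astype(int)


labels = contamination_threshold_sketch(
    [0.1, 0.2, 0.3, 0.6, 0.8, 0.5, 0.3, 0.3], 0.25
)
print(labels.tolist())  # [0, 0, 0, 1, 1, 0, 0, 0]
```

With 8 scores and a rate of 0.25, the 0.75 quantile falls between 0.5 and 0.6, so only the two highest scores (0.6 and 0.8) are labeled anomalous.
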
threshold(scores: ndarray) → ndarray

Threshold the given anomaly scores.

Apply the thresholding operation to the given anomaly scores. This function performs the necessary checks and formatting on the anomaly scores before actually applying the thresholding.

Parameters:
scores: array-like of shape (n_samples)

The continuous anomaly scores to convert to binary anomaly labels.

Returns:
array-like of shape (n_samples)

The discrete anomaly labels, in which a 0 indicates normal and a 1 indicates anomalous.

Raises:
ValueError

If scores is not a valid array.

ValueError

If scores is not one-dimensional. No error is raised if all dimensions but one have a size of 1.
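
The input checks described above might look roughly like the following sketch. It is an assumption about the behavior, not the library's code: the helper name check_scores_sketch is hypothetical, and "valid array" is interpreted here as "convertible to a numeric NumPy array".

```python
import numpy as np


def check_scores_sketch(scores):
    """Validate anomaly scores and return them as a flat 1-D array.

    Hypothetical helper for illustration only; the checks performed by
    the library's threshold() method may differ.
    """
    arr = np.asarray(scores)
    # Assumption: a "valid array" is one with a numeric dtype.
    if not np.issubdtype(arr.dtype, np.number):
        raise ValueError("scores must be a numeric array")
    # Allow shapes such as (n,), (n, 1) or (1, n): at most one
    # dimension may have a size greater than 1.
    if sum(dim > 1 for dim in arr.shape) > 1:
        raise ValueError("scores must be one-dimensional")
    return arr.reshape(-1)


column = np.array([[0.1], [0.2], [0.3]])  # shape (3, 1) is accepted
flat = check_scores_sketch(column)        # flattened to shape (3,)

try:
    check_scores_sketch(np.zeros((2, 2)))  # two dimensions larger than 1
except ValueError as error:
    message = str(error)
```
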