Volume Under the Surface (VUS)
Implementation of the Volume Under the Surface (VUS) metrics proposed by [19]. The implementations are adopted from [27], whose authors slightly modified the original implementations as follows:
For the recall (TPR) existence reward, anomalies are counted as separate events, even if the added slopes overlap;
Overlapping slopes do not sum up in their anomaly weight; instead, each point in the ground truth receives the maximum weight of the slopes that cover it;
The original slopes are asymmetric: the slopes at the end of anomalies are one point shorter than those at the beginning. Symmetric slopes are used instead, with the same size at the beginning and end of anomalies;
A linear approximation of the slopes is used instead of the convex slope shape presented in the paper.
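The slope modifications listed above can be illustrated with a minimal pure-Python sketch. It assumes binary ground-truth labels and one plausible linear slope, in which the weight drops by 1 / (half + 1) for each point away from the anomaly. The helper names anomalous_events and soft_labels are hypothetical and are not part of the dtaianomaly API. The exact slope values used by the library may differ.

```python
def anomalous_events(labels):
    """Return (start, end) index pairs (inclusive) of consecutive anomalous points."""
    events, start = [], None
    for i, y in enumerate(labels):
        if y and start is None:
            start = i
        elif not y and start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:
        events.append((start, len(labels) - 1))
    return events


def soft_labels(labels, buffer_size):
    """Hypothetical helper: weight 1 inside anomalies, plus symmetric linear slopes.

    A slope of buffer_size // 2 points is added on both sides of every event.
    Overlapping slopes are combined with max() instead of being summed.
    """
    half = buffer_size // 2
    weights = [float(y) for y in labels]
    for start, end in anomalous_events(labels):
        for d in range(1, half + 1):
            w = (half + 1 - d) / (half + 1)  # linear decay with distance d
            for i in (start - d, end + d):  # same slope length on both sides
                if 0 <= i < len(labels):
                    weights[i] = max(weights[i], w)
    return weights


# Two single-point anomalies whose slopes overlap in the middle.
print(soft_labels([1, 0, 0, 1], buffer_size=4))
```

Running it prints [1.0, 0.6666666666666666, 0.6666666666666666, 1.0]. Each middle point takes the larger of its two overlapping slope weights (2/3 and 1/3). Summing them instead would give 1.0, which would make the gap indistinguishable from the anomalies themselves.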
By default, the adjusted version of each metric is used. To use the original implementations, set compatibility_mode=True when initializing the metrics.
In addition, we compiled the most expensive part of the code with Numba (i.e., computing the recall, precision and false positive rate for every threshold). This yields a speedup of more than 25x on the demonstration time series.
- class dtaianomaly.evaluation.RangeAreaUnderPR(buffer_size: int = None, compatibility_mode: bool = False, max_samples: int = 250)
Computes the area under the range-based precision-recall curve [19].
A slope of length buffer_size // 2 is added at the beginning and end of each anomalous event. Next, the precision and recall are computed. These take the slopes in the ground truth labels into account, which allows for small misalignments between the predicted and actual anomalous events. Then, max_samples thresholds are sampled uniformly from the anomaly scores, and the precision and recall are computed at each threshold. The area under the resulting curve is the final evaluation score.
- Parameters:
buffer_size (int, default=None) – Size of the buffer region around an anomaly. An increasing slope of size buffer_size // 2 is added to the beginning of each anomaly, and a decreasing slope of size buffer_size // 2 to its end. By default (when buffer_size is None), buffer_size is the median length of the anomalies in the time series. However, you can also set it to the period of the dominant frequency or any other desired value.
compatibility_mode (bool, default=False) – When set to True, produces exactly the same output as the metric implementation by the original authors. Otherwise, a slightly improved implementation is used that fixes some bugs and uses linear slopes.
max_samples (int, default=250) – Calculating precision and recall for many thresholds is slow. Therefore, thresholds are sampled uniformly from the available score space. This parameter controls the maximum number of thresholds; too low a value degrades the quality of the metric.
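The steps of RangeAreaUnderPR can be sketched in pure Python as follows. This is a simplified illustration, not the dtaianomaly implementation, and it omits the existence reward and the other terms of the range-based recall in [19]. It makes these assumptions:

- the max_samples thresholds are evenly spaced between the minimum and maximum score;
- precision and recall use the sloped labels as soft weights;
- the curve is anchored at (recall=0, precision=1).

All function names are hypothetical.

```python
import statistics


def anomalous_events(labels):
    """(start, end) index pairs (inclusive) of consecutive anomalous points."""
    events, start = [], None
    for i, y in enumerate(labels):
        if y and start is None:
            start = i
        elif not y and start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:
        events.append((start, len(labels) - 1))
    return events


def soft_labels(labels, buffer_size):
    """Binary labels plus symmetric, max-combined linear slopes of buffer_size // 2 points."""
    half = buffer_size // 2
    weights = [float(y) for y in labels]
    for start, end in anomalous_events(labels):
        for d in range(1, half + 1):
            for i in (start - d, end + d):
                if 0 <= i < len(labels):
                    weights[i] = max(weights[i], (half + 1 - d) / (half + 1))
    return weights


def range_pr_auc(labels, scores, buffer_size=None, max_samples=250):
    """Hypothetical, simplified range-based PR AUC (no existence reward)."""
    if buffer_size is None:
        # Default: median length of the anomalous events.
        lengths = [end - start + 1 for start, end in anomalous_events(labels)]
        buffer_size = int(statistics.median(lengths))
    weights = soft_labels(labels, buffer_size)
    lo, hi = min(scores), max(scores)
    steps = max_samples - 1
    # Evenly spaced thresholds from the highest down to the lowest score.
    thresholds = [hi - (hi - lo) * k / steps for k in range(steps)] + [lo]
    total = sum(weights)
    curve = [(0.0, 1.0)]  # (recall, precision) anchor
    for t in thresholds:
        hits = [w for w, s in zip(weights, scores) if s >= t]  # never empty: t <= hi
        curve.append((sum(hits) / total, sum(hits) / len(hits)))
    # Trapezoidal rule; recall is non-decreasing because thresholds decrease.
    return sum((r1 - r0) * (p0 + p1) / 2
               for (r0, p0), (r1, p1) in zip(curve, curve[1:]))


labels = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
scores = [0.1, 0.0, 0.3, 0.9, 0.8, 0.7, 0.4, 0.2, 0.1, 0.0]
print(range_pr_auc(labels, scores))
```

Iterating the thresholds from high to low keeps the curve ordered without sorting. That ordering matters when several thresholds share the same recall.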
- class dtaianomaly.evaluation.RangeAreaUnderROC(buffer_size: int = None, compatibility_mode: bool = False, max_samples: int = 250)
Computes the area under the range-based ROC curve [19].
A slope of length buffer_size // 2 is added at the beginning and end of each anomalous event. Next, the false positive rate (FPR) and true positive rate (TPR) are computed. These take the slopes in the ground truth labels into account, which allows for small misalignments between the predicted and actual anomalous events. Then, max_samples thresholds are sampled uniformly from the anomaly scores, and the FPR and TPR are computed at each threshold. The area under the resulting curve is the final evaluation score.
- Parameters:
buffer_size (int, default=None) – Size of the buffer region around an anomaly. An increasing slope of size buffer_size // 2 is added to the beginning of each anomaly, and a decreasing slope of size buffer_size // 2 to its end. By default (when buffer_size is None), buffer_size is the median length of the anomalies in the time series. However, you can also set it to the period of the dominant frequency or any other desired value.
compatibility_mode (bool, default=False) – When set to True, produces exactly the same output as the metric implementation by the original authors. Otherwise, a slightly improved implementation is used that fixes some bugs and uses linear slopes.
max_samples (int, default=250) – Calculating the FPR and TPR for many thresholds is slow. Therefore, thresholds are sampled uniformly from the available score space. This parameter controls the maximum number of thresholds; too low a value degrades the quality of the metric.
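RangeAreaUnderROC can be sketched the same way. This is again a simplified pure-Python illustration rather than the dtaianomaly implementation, and it omits the existence reward from [19]. It assumes linear, max-combined slopes and evenly spaced thresholds between the minimum and maximum score. It also assumes soft counts for the two rates:

- TPR: the total weight of the detected points divided by the total weight;
- FPR: the sum of one minus the weights of the detected points, divided by the sum of one minus all weights.

The function names are hypothetical.

```python
import statistics


def anomalous_events(labels):
    """(start, end) index pairs (inclusive) of consecutive anomalous points."""
    events, start = [], None
    for i, y in enumerate(labels):
        if y and start is None:
            start = i
        elif not y and start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:
        events.append((start, len(labels) - 1))
    return events


def soft_labels(labels, buffer_size):
    """Binary labels plus symmetric, max-combined linear slopes of buffer_size // 2 points."""
    half = buffer_size // 2
    weights = [float(y) for y in labels]
    for start, end in anomalous_events(labels):
        for d in range(1, half + 1):
            for i in (start - d, end + d):
                if 0 <= i < len(labels):
                    weights[i] = max(weights[i], (half + 1 - d) / (half + 1))
    return weights


def range_roc_auc(labels, scores, buffer_size=None, max_samples=250):
    """Hypothetical, simplified range-based ROC AUC (assumes both classes occur)."""
    if buffer_size is None:
        # Default: median length of the anomalous events.
        lengths = [end - start + 1 for start, end in anomalous_events(labels)]
        buffer_size = int(statistics.median(lengths))
    weights = soft_labels(labels, buffer_size)
    lo, hi = min(scores), max(scores)
    steps = max_samples - 1
    # Evenly spaced thresholds from the highest down to the lowest score.
    thresholds = [hi - (hi - lo) * k / steps for k in range(steps)] + [lo]
    pos = sum(weights)
    neg = sum(1.0 - w for w in weights)
    curve = [(0.0, 0.0)]  # (FPR, TPR) anchor
    for t in thresholds:
        hits = [w for w, s in zip(weights, scores) if s >= t]
        curve.append((sum(1.0 - w for w in hits) / neg, sum(hits) / pos))
    # Trapezoidal rule; both rates are non-decreasing as thresholds decrease.
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(curve, curve[1:]))


labels = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
scores = [0.1, 0.0, 0.3, 0.9, 0.8, 0.7, 0.4, 0.2, 0.1, 0.0]
print(range_roc_auc(labels, scores))
```

The lowest threshold flags every point, so the curve always ends at (1, 1).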
- class dtaianomaly.evaluation.VolumeUnderPR(max_buffer_size: int = 500, compatibility_mode: bool = False, max_samples: int = 250)
Computes the volume under the range-based precision-recall surface [19].
For each buffer size in the range [0, max_buffer_size], a buffer is created around the anomalous events (similar to RangeAreaUnderPR). Then, max_samples thresholds are sampled uniformly from the anomaly scores, and the precision and recall are computed for each buffer size. Because the buffer size is varied as well, this results in a surface (instead of a curve). The final evaluation score is the volume under this surface.
- Parameters:
max_buffer_size (int, default=500) – Maximum size of the buffer region around an anomaly. All buffer sizes from 0 to max_buffer_size are iterated over to create the surface.
compatibility_mode (bool, default=False) – When set to True, produces exactly the same output as the metric implementation by the original authors. Otherwise, a slightly improved implementation is used that fixes some bugs and uses linear slopes.
max_samples (int, default=250) – Calculating precision and recall for many thresholds is slow. Therefore, thresholds are sampled uniformly from the available score space. This parameter controls the maximum number of thresholds; too low a value degrades the quality of the metric.
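Turning an area into a volume mainly requires looping over the buffer sizes. The sketch below is a hypothetical, simplified illustration (no existence reward), not the dtaianomaly implementation. It assumes linear, max-combined slopes of buffer_size // 2 points, evenly spaced thresholds, and soft-label precision and recall. It also assumes that the volume is normalized by averaging the per-buffer areas, so the score stays in [0, 1]. Because of the integer division buffer_size // 2, buffer sizes 2k and 2k + 1 produce identical slopes; the sketch does not deduplicate them.

```python
def anomalous_events(labels):
    """(start, end) index pairs (inclusive) of consecutive anomalous points."""
    events, start = [], None
    for i, y in enumerate(labels):
        if y and start is None:
            start = i
        elif not y and start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:
        events.append((start, len(labels) - 1))
    return events


def soft_labels(labels, buffer_size):
    """Binary labels plus symmetric, max-combined linear slopes of buffer_size // 2 points."""
    half = buffer_size // 2
    weights = [float(y) for y in labels]
    for start, end in anomalous_events(labels):
        for d in range(1, half + 1):
            for i in (start - d, end + d):
                if 0 <= i < len(labels):
                    weights[i] = max(weights[i], (half + 1 - d) / (half + 1))
    return weights


def pr_area(weights, scores, max_samples):
    """Area under the soft-label PR curve for one fixed buffer size."""
    lo, hi = min(scores), max(scores)
    steps = max_samples - 1
    thresholds = [hi - (hi - lo) * k / steps for k in range(steps)] + [lo]
    total = sum(weights)
    curve = [(0.0, 1.0)]  # (recall, precision) anchor
    for t in thresholds:
        hits = [w for w, s in zip(weights, scores) if s >= t]
        curve.append((sum(hits) / total, sum(hits) / len(hits)))
    return sum((r1 - r0) * (p0 + p1) / 2
               for (r0, p0), (r1, p1) in zip(curve, curve[1:]))


def volume_under_pr(labels, scores, max_buffer_size=500, max_samples=250):
    """Hypothetical VUS-PR: mean PR area over buffer sizes 0..max_buffer_size."""
    areas = [pr_area(soft_labels(labels, b), scores, max_samples)
             for b in range(max_buffer_size + 1)]
    # Normalize the volume by the number of buffer sizes.
    return sum(areas) / len(areas)


labels = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
scores = [0.1, 0.0, 0.3, 0.9, 0.8, 0.7, 0.4, 0.2, 0.1, 0.0]
print(volume_under_pr(labels, scores, max_buffer_size=10))
```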
- class dtaianomaly.evaluation.VolumeUnderROC(max_buffer_size: int = 500, compatibility_mode: bool = False, max_samples: int = 250)
Computes the volume under the range-based ROC surface [19].
For each buffer size in the range [0, max_buffer_size], a buffer is created around the anomalous events (similar to RangeAreaUnderROC). Then, max_samples thresholds are sampled uniformly from the anomaly scores, and the FPR and TPR are computed for each buffer size. Because the buffer size is varied as well, this results in a surface (instead of a curve). The final evaluation score is the volume under this surface.
- Parameters:
max_buffer_size (int, default=500) – Maximum size of the buffer region around an anomaly. All buffer sizes from 0 to max_buffer_size are iterated over to create the surface.
compatibility_mode (bool, default=False) – When set to True, produces exactly the same output as the metric implementation by the original authors. Otherwise, a slightly improved implementation is used that fixes some bugs and uses linear slopes.
max_samples (int, default=250) – Calculating the FPR and TPR for many thresholds is slow. Therefore, thresholds are sampled uniformly from the available score space. This parameter controls the maximum number of thresholds; too low a value degrades the quality of the metric.
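VolumeUnderROC can be sketched analogously by averaging a soft-label ROC area over all buffer sizes. This is a hypothetical, simplified illustration (no existence reward), not the dtaianomaly implementation. It assumes linear, max-combined slopes of buffer_size // 2 points, evenly spaced thresholds, and soft counts for the two rates:

- TPR: the total weight of the detected points divided by the total weight;
- FPR: the sum of one minus the weights of the detected points, divided by the sum of one minus all weights.

It also assumes that the volume is normalized by averaging the per-buffer areas. Both classes must occur in the labels.

```python
def anomalous_events(labels):
    """(start, end) index pairs (inclusive) of consecutive anomalous points."""
    events, start = [], None
    for i, y in enumerate(labels):
        if y and start is None:
            start = i
        elif not y and start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:
        events.append((start, len(labels) - 1))
    return events


def soft_labels(labels, buffer_size):
    """Binary labels plus symmetric, max-combined linear slopes of buffer_size // 2 points."""
    half = buffer_size // 2
    weights = [float(y) for y in labels]
    for start, end in anomalous_events(labels):
        for d in range(1, half + 1):
            for i in (start - d, end + d):
                if 0 <= i < len(labels):
                    weights[i] = max(weights[i], (half + 1 - d) / (half + 1))
    return weights


def roc_area(weights, scores, max_samples):
    """Area under the soft-label ROC curve for one fixed buffer size."""
    lo, hi = min(scores), max(scores)
    steps = max_samples - 1
    thresholds = [hi - (hi - lo) * k / steps for k in range(steps)] + [lo]
    pos = sum(weights)
    neg = sum(1.0 - w for w in weights)
    curve = [(0.0, 0.0)]  # (FPR, TPR) anchor
    for t in thresholds:
        hits = [w for w, s in zip(weights, scores) if s >= t]
        curve.append((sum(1.0 - w for w in hits) / neg, sum(hits) / pos))
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(curve, curve[1:]))


def volume_under_roc(labels, scores, max_buffer_size=500, max_samples=250):
    """Hypothetical VUS-ROC: mean ROC area over buffer sizes 0..max_buffer_size."""
    areas = [roc_area(soft_labels(labels, b), scores, max_samples)
             for b in range(max_buffer_size + 1)]
    # Normalize the volume by the number of buffer sizes.
    return sum(areas) / len(areas)


labels = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
scores = [0.1, 0.0, 0.3, 0.9, 0.8, 0.7, 0.4, 0.2, 0.1, 0.0]
print(volume_under_roc(labels, scores, max_buffer_size=6))
```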