Quantitative evaluation with a workflow
It is crucial to quantitatively evaluate the performance of anomaly detectors
to know their capabilities. For this, dtaianomaly offers the Workflow, which
detects anomalies in a large set of time series using various detectors and measures
their performance using multiple evaluation criteria. The Workflow
facilitates the validation of anomaly detectors, because you only need to define
the different components.
There are two ways to run a Workflow: from Python
or from a configuration file.
Note
You can also evaluate custom components
in dtaianomaly via a Workflow in Python. However,
this is not possible via a configuration file without extending the functionality of
the workflow_from_config() function!
Run a workflow from Python
We first need to initialize the different components of the Workflow.
We start by creating a list of LazyDataLoader objects. Here, we manually select
two time series for evaluation; alternatively, you can load all datasets in a directory using
the from_directory() method in the data module.
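The idea behind evaluating every time series in a folder can be sketched with the standard library. The helper `collect_series_paths` and the `*.txt` pattern are hypothetical: this is not the implementation or the signature of from_directory(), only an illustration of turning a directory into a list of file paths.

```python
from pathlib import Path


def collect_series_paths(directory, pattern="*.txt"):
    """Hypothetical helper: sorted paths of all regular files in `directory` matching `pattern`."""
    # Sorting makes the evaluation order deterministic across operating systems.
    return sorted(str(path) for path in Path(directory).glob(pattern) if path.is_file())
```

For example, `[UCRLoader(path) for path in collect_series_paths('../data/UCR-time-series-anomaly-archive')]` would create one UCRLoader per text file in that folder.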
from dtaianomaly.data import UCRLoader

dataloaders = [
    UCRLoader('../data/UCR-time-series-anomaly-archive/001_UCR_Anomaly_DISTORTED1sddb40_35000_52000_52620.txt'),
    UCRLoader('../data/UCR-time-series-anomaly-archive/002_UCR_Anomaly_DISTORTED2sddb40_35000_56600_56900.txt')
]
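Next, we initialize a number of preprocessors. We include the Identity preprocessor to analyze what happens if no preprocessing
is applied, Z-normalization, and two chained preprocessors that first smooth the time series (with a moving average or an exponential moving average) and then Z-normalize it.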
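from dtaianomaly.preprocessing import Identity, ZNormalizer, ChainedPreprocessor, MovingAverage, ExponentialMovingAverage

preprocessors = [
    Identity(),  # no preprocessing
    ZNormalizer(),
    ChainedPreprocessor([MovingAverage(10), ZNormalizer()]),
    ChainedPreprocessor([ExponentialMovingAverage(0.8), ZNormalizer()])
]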
We will now initialize our anomaly detectors. Each anomaly detector will be combined with each preprocessor, and applied to each time series.
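from dtaianomaly.anomaly_detection import LocalOutlierFactor, IsolationForest
detectors = [LocalOutlierFactor(50), IsolationForest(50)]  # both with a window size of 50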
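Because every detector is combined with every preprocessor and applied to every time series, the number of detection runs grows multiplicatively. The snippet below enumerates that grid with itertools.product, using plain strings as stand-ins for the components defined above; it only illustrates the combinatorics and is not how the Workflow is implemented internally.

```python
from itertools import product

# Plain-string stand-ins for the components defined above (not dtaianomaly objects).
series_names = ["001_UCR_Anomaly_DISTORTED1sddb40", "002_UCR_Anomaly_DISTORTED2sddb40"]
preprocessor_names = [
    "Identity",
    "ZNormalizer",
    "MovingAverage(10) -> ZNormalizer",
    "ExponentialMovingAverage(0.8) -> ZNormalizer",
]
detector_names = ["LocalOutlierFactor(50)", "IsolationForest(50)"]

# Every (time series, preprocessor, detector) triple is one detection run.
runs = list(product(series_names, preprocessor_names, detector_names))
print(len(runs))  # 2 * 4 * 2 = 16
```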
Finally, we need to define the evaluation metrics. Both BinaryMetric and ProbaMetric objects
can be provided. However, the workflow evaluates the continuous scores obtained by predict_proba().
Therefore, to evaluate a BinaryMetric, a number of thresholding strategies must
be provided to convert the continuous anomaly probabilities to discrete anomaly labels. Each thresholding
strategy is combined with each binary metric. The thresholds have no effect on the
ProbaMetric objects.
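To make the role of a thresholding strategy concrete, here is a minimal stand-alone sketch of two strategies in the spirit of TopN and FixedCutoff, followed by precision computed on the resulting labels. The functions `top_n`, `fixed_cutoff` and `precision` are hypothetical re-implementations for illustration; in particular, whether dtaianomaly's FixedCutoff uses `>=` or `>`, and how TopN breaks ties, are assumptions here.

```python
def top_n(scores, n):
    """Label the n highest-scoring observations as anomalies (1), the rest as normal (0)."""
    # sorted() is stable, so equal scores keep their original order (ties favor earlier indices).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:n])
    return [1 if i in top else 0 for i in range(len(scores))]


def fixed_cutoff(scores, cutoff):
    """Label every observation whose score is at least `cutoff` as an anomaly."""
    return [1 if s >= cutoff else 0 for s in scores]


def precision(y_true, y_pred):
    """Fraction of predicted anomalies that are true anomalies (0.0 if nothing is predicted)."""
    predicted = sum(y_pred)
    if predicted == 0:
        return 0.0
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return true_positives / predicted


# Toy anomaly probabilities and ground-truth labels (illustrative values only).
y_true = [0, 0, 1, 1, 0, 0]
y_proba = [0.05, 0.20, 0.90, 0.60, 0.40, 0.10]

labels_top = top_n(y_proba, 2)              # [0, 0, 1, 1, 0, 0]
labels_cutoff = fixed_cutoff(y_proba, 0.3)  # [0, 0, 1, 1, 1, 0]
print(precision(y_true, labels_top))     # 1.0
print(precision(y_true, labels_cutoff))  # 0.6666666666666666
```

The same continuous scores yield different binary labels, and hence a different precision, for each thresholding strategy, which is why every threshold is paired with every binary metric.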
Note
To save computational resources, the anomaly detector is applied only once to each time series, and the predicted anomaly scores are reused to compute all evaluation metrics. Hence, providing more metrics adds no detection overhead; the only additional cost is computing the metrics themselves.
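from dtaianomaly.thresholding import TopN, FixedCutoff
from dtaianomaly.evaluation import Precision, AreaUnderPR, AreaUnderROC
thresholds = [TopN(20), FixedCutoff(0.1)]
metrics = [Precision(), AreaUnderPR(), AreaUnderROC()]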
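The reuse of anomaly scores can be illustrated with a small sketch in which the detector is called once and its scores are reused for every metric and for every combination of threshold and binary metric. The `detect` function, its call counter and the toy metrics are hypothetical placeholders, not dtaianomaly code. Under the combination rule described above, the metrics defined here (one binary metric, Precision, and two probabilistic metrics, AreaUnderPR and AreaUnderROC) together with two thresholds give 2 × 1 + 2 = 4 evaluation results per detection run; the toy setup below mirrors those counts.

```python
# Counts how often the (expensive) detector is invoked.
detect_calls = 0


def detect(series):
    """Hypothetical stand-in for predict_proba(): scales the values to [0, 1]."""
    global detect_calls
    detect_calls += 1
    peak = max(series)
    return [x / peak for x in series]


def precision(y_true, y_pred):
    """Fraction of predicted anomalies that are real anomalies (0.0 if none are predicted)."""
    predicted = sum(y_pred)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / predicted if predicted else 0.0


def evaluate(series, y_true, thresholds, binary_metrics, proba_metrics):
    """Detect anomalies once, then reuse the same scores for every metric."""
    scores = detect(series)
    # Score-based metrics use the continuous scores directly.
    results = {name: metric(y_true, scores) for name, metric in proba_metrics.items()}
    # Every threshold is combined with every binary metric.
    for t_name, threshold in thresholds.items():
        labels = threshold(scores)
        for m_name, metric in binary_metrics.items():
            results[f"{m_name}[{t_name}]"] = metric(y_true, labels)
    return results


toy_thresholds = {
    "cutoff_0.5": lambda s: [int(x >= 0.5) for x in s],
    "cutoff_0.9": lambda s: [int(x >= 0.9) for x in s],
}
toy_binary_metrics = {"precision": precision}
# Placeholder score-based metrics; they are NOT AreaUnderPR or AreaUnderROC.
toy_proba_metrics = {
    "mean_score_on_anomalies": lambda y, s: sum(v for v, t in zip(s, y) if t) / sum(y),
    "max_score_on_normals": lambda y, s: max(v for v, t in zip(s, y) if not t),
}

run_results = evaluate([1, 2, 10, 8, 3], [0, 0, 1, 1, 0],
                       toy_thresholds, toy_binary_metrics, toy_proba_metrics)
print(len(run_results), detect_calls)  # 4 1
```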
Once all components are defined, we initialize the Workflow. We also
define additional parameters, such as n_jobs, to allow multiple anomaly detectors to detect anomalies
in parallel. Then, we execute the workflow by calling the run()
method, which returns a dataframe with the results.
from dtaianomaly.workflow import Workflow

workflow = Workflow(
    dataloaders=dataloaders,
    metrics=metrics,
    thresholds=thresholds,
    preprocessors=preprocessors,
    detectors=detectors,
    n_jobs=4  # number of parallel jobs
)
results = workflow.run()  # dataframe with the results
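The effect of n_jobs can be pictured as a pool of workers that processes independent combinations concurrently. The sketch below uses concurrent.futures with a hypothetical `run_job` function; threads are used only to keep the example portable, and this is not necessarily how the Workflow parallelizes its work internally.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_job(job):
    """Hypothetical stand-in for processing one (time series, detector) combination."""
    series_name, detector_name = job
    # A real job would load and preprocess the series, detect anomalies and compute the metrics.
    return f"{detector_name} on {series_name}"


jobs = list(product(["series_001", "series_002"], ["LocalOutlierFactor", "IsolationForest"]))
n_jobs = 4

# Up to n_jobs combinations are processed concurrently.
with ThreadPoolExecutor(max_workers=n_jobs) as pool:
    # map() yields results in submission order, regardless of which job finishes first.
    job_results = list(pool.map(run_job, jobs))

for line in job_results:
    print(line)
```

Because each combination is independent, the results can be gathered in a fixed order even though the jobs may finish in any order.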
Run a workflow from a configuration file
Alternatively, you can define a workflow using JSON configuration files. The file
Config.json illustrates how the workflow defined above can be written as a
configuration file. More details regarding the syntax are provided below. Using the
workflow_from_config() function, you can pass the path
to a configuration file to create the corresponding Workflow,
as shown in the example below. Then, you can run the Workflow
via its run() method.
from dtaianomaly.workflow import workflow_from_config
workflow = workflow_from_config("Config.json")
workflow.run()
A configuration file is built from different entries, with each entry representing a
component of the Workflow. These entries are built
as follows:
{"type": "<name-of-component>", "optional-param": <value-optional-parameter>}
The 'type' equals the name of the component, for example 'LocalOutlierFactor'
or 'ZNormalizer'. This string must exactly match the class name of the component
you want to add to the workflow. In addition, you can define hyperparameters
of each component. For example, for 'LocalOutlierFactor' you must define a
'window_size', but you can optionally also define a 'stride'. An error is
raised if an entry is missing required parameters or contains unknown parameters.
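The mapping from an entry to an object can be sketched as a lookup of 'type' in a registry of classes, followed by a call with the remaining keys as keyword arguments. In this sketch, inspect.signature(...).bind() rejects missing required and unknown parameters, mirroring the errors described above, and nested entries (such as the base_preprocessors of ChainedPreprocessor in Config.json) are built recursively. The registry, the stub classes and `build_component` are hypothetical; this is not the code of workflow_from_config(), and the stub's default stride of 1 is an assumption.

```python
import inspect


class StubLocalOutlierFactor:
    """Stand-in for a detector with one required and one optional parameter."""

    def __init__(self, window_size, stride=1):
        self.window_size = window_size
        self.stride = stride


class StubZNormalizer:
    """Stand-in for a preprocessor without parameters."""

    def __init__(self):
        pass


class StubChainedPreprocessor:
    """Stand-in for a preprocessor that wraps other preprocessors."""

    def __init__(self, base_preprocessors):
        self.base_preprocessors = base_preprocessors


# Hypothetical registry: maps each 'type' string to the class it denotes.
REGISTRY = {
    "LocalOutlierFactor": StubLocalOutlierFactor,
    "ZNormalizer": StubZNormalizer,
    "ChainedPreprocessor": StubChainedPreprocessor,
}


def is_entry(value):
    return isinstance(value, dict) and "type" in value


def build_component(entry):
    """Instantiate the component described by one configuration entry."""
    params = dict(entry)
    type_name = params.pop("type")
    if type_name not in REGISTRY:
        raise ValueError(f"Unknown component type: {type_name!r}")
    cls = REGISTRY[type_name]
    # Nested entries are built first, recursively.
    for key, value in list(params.items()):
        if is_entry(value):
            params[key] = build_component(value)
        elif isinstance(value, list) and all(is_entry(v) for v in value):
            params[key] = [build_component(v) for v in value]
    try:
        # bind() raises TypeError for missing required or unexpected keyword arguments.
        inspect.signature(cls).bind(**params)
    except TypeError as error:
        raise ValueError(f"Invalid parameters for {type_name}: {error}") from error
    return cls(**params)


lof = build_component({"type": "LocalOutlierFactor", "window_size": 50})
chain = build_component({"type": "ChainedPreprocessor",
                         "base_preprocessors": [{"type": "ZNormalizer"}]})
print(lof.window_size, lof.stride)  # 50 1
```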
The configuration file itself is also a dictionary, in JSON format. The keys of this
dictionary correspond to the parameters of the Workflow.
The corresponding values can be either a single entry (if one component is requested)
or a list of entries (if multiple components are requested).
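Because a value may be either one entry or a list of entries, a loader has to normalize both forms before iterating over them. A minimal sketch with the standard json module follows; the function `normalize_config` and the set of component keys are hypothetical, and passing non-component keys such as n_jobs through unchanged is an assumption.

```python
import json

# Keys whose values describe components (a single entry or a list of entries).
COMPONENT_KEYS = {"dataloaders", "metrics", "thresholds", "preprocessors", "detectors"}


def normalize_config(text):
    """Parse a JSON configuration and wrap every single component entry in a one-element list."""
    config = json.loads(text)
    for key in COMPONENT_KEYS:
        if key in config and isinstance(config[key], dict):
            config[key] = [config[key]]
    return config


example = """
{
    "detectors": {"type": "IsolationForest", "window_size": 50},
    "metrics": [{"type": "AreaUnderPR"}, {"type": "AreaUnderROC"}],
    "n_jobs": 4
}
"""
config = normalize_config(example)
print(len(config["detectors"]), len(config["metrics"]), config["n_jobs"])  # 1 2 4
```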
Below, we show a simplified version of the configuration in Config.json.
{
    "dataloaders": [
        {
            "type": "UCRLoader",
            "path": "../data/UCR-time-series-anomaly-archive/001_UCR_Anomaly_DISTORTED1sddb40_35000_52000_52620.txt"
        },
        {
            "type": "UCRLoader",
            "path": "../data/UCR-time-series-anomaly-archive/002_UCR_Anomaly_DISTORTED2sddb40_35000_56600_56900.txt"
        }
    ],
    "metrics": [
        {"type": "Precision"},
        {"type": "AreaUnderPR"},
        {"type": "AreaUnderROC"}
    ],
    "thresholds": [
        {"type": "TopN", "n": 20},
        {"type": "FixedCutoff", "cutoff": 0.5}
    ],
    "preprocessors": [
        {"type": "Identity"},
        {"type": "ZNormalizer"},
        {"type": "ChainedPreprocessor", "base_preprocessors": [
            {"type": "MovingAverage", "window_size": 10},
            {"type": "ZNormalizer"}
        ]},
        {"type": "ChainedPreprocessor", "base_preprocessors": [
            {"type": "ExponentialMovingAverage", "alpha": 0.8},
            {"type": "ZNormalizer"}
        ]}
    ],
    "detectors": [
        {"type": "LocalOutlierFactor", "window_size": 50},
        {"type": "IsolationForest", "window_size": 50}
    ],
    "n_jobs": 4
}