Quantitative evaluation with a workflow

It is crucial to quantitatively evaluate the performance of anomaly detectors to know their capabilities. For this, dtaianomaly offers the Workflow, which detects anomalies in a large set of time series using various detectors and measures their performance using multiple evaluation criteria. The Workflow facilitates the validation of the anomaly detectors, because you only need to define the different components.

There are two ways to run a Workflow: from Python or from a configuration file.

Note

You can also evaluate custom components in dtaianomaly via a Workflow in Python. However, this is not possible via a configuration file without extending the functionality of the workflow_from_config() function!

Run a workflow from Python

We first need to initialize the different components of the Workflow. We start by creating a list of LazyDataLoader objects. Here, we manually select two time series to use for evaluation. Alternatively, you can load all datasets in a directory using the from_directory() function in the data module.

from dtaianomaly.data import UCRLoader
# Each loader reads its time series lazily from the given file.
dataloaders = [
    UCRLoader('data/UCR-time-series-anomaly-archive/001_UCR_Anomaly_DISTORTED1sddb40_35000_52000_52620.txt'),
    UCRLoader('data/UCR-time-series-anomaly-archive/002_UCR_Anomaly_DISTORTED2sddb40_35000_56600_56900.txt')
]

Next, we initialize a number of preprocessors. We include the Identity preprocessor to analyze what happens if no preprocessing is applied.

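from dtaianomaly.preprocessing import (
    Identity, StandardScaler, ChainedPreprocessor, MovingAverage, ExponentialMovingAverage
)
preprocessors = [
    Identity(),  # baseline: no preprocessing
    StandardScaler(),
    ChainedPreprocessor([MovingAverage(10), StandardScaler()]),
    ChainedPreprocessor([ExponentialMovingAverage(0.8), StandardScaler()])
]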

We will now initialize our anomaly detectors. Each anomaly detector will be combined with each preprocessor, and applied to each time series.

from dtaianomaly.anomaly_detection import LocalOutlierFactor, IsolationForest
detectors = [LocalOutlierFactor(50), IsolationForest(50)]
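
To get a feel for how many detector runs this setup produces, the "each with each" combination described above can be sketched with itertools.product. The string labels below are placeholders rather than dtaianomaly objects, and the snippet only mirrors the description, not the internal implementation of the Workflow.

```python
from itertools import product

# Placeholder labels standing in for the components defined above.
dataloaders = ["001_UCR_Anomaly", "002_UCR_Anomaly"]
preprocessors = [
    "Identity",
    "StandardScaler",
    "MovingAverage->StandardScaler",
    "ExponentialMovingAverage->StandardScaler",
]
detectors = ["LocalOutlierFactor", "IsolationForest"]

# Every time series is paired with every preprocessor and every detector.
combinations = list(product(dataloaders, preprocessors, detectors))
print(len(combinations))  # 2 * 4 * 2 = 16
```

Adding one more preprocessor or detector therefore multiplies the number of runs, which is worth keeping in mind when choosing n_jobs.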

Finally, we need to define the metrics used to evaluate the detectors. Both BinaryMetric and ProbaMetric objects can be provided. However, the workflow evaluates the continuous anomaly scores obtained from predict_proba(). To evaluate a BinaryMetric, a number of thresholding strategies must therefore be provided to convert the continuous anomaly probabilities into discrete anomaly labels. Each thresholding strategy is combined with each BinaryMetric. The thresholds have no effect on the ProbaMetric objects.

Note

To save computational resources, each anomaly detector is used only once to detect anomalies in a time series, and the predicted anomaly scores are reused to compute all metrics. This means that providing more metrics adds no computational overhead beyond the resources required to compute the metrics themselves.

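from dtaianomaly.thresholding import TopN, FixedCutoff
from dtaianomaly.evaluation import Precision, AreaUnderPR, AreaUnderROC
thresholds = [TopN(20), FixedCutoff(0.5)]
metrics = [Precision(), AreaUnderPR(), AreaUnderROC()]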
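
The interplay between thresholds and a binary metric can be illustrated in plain Python. The functions top_n, fixed_cutoff, and precision below are hypothetical stand-ins written for this illustration; they are not the dtaianomaly implementations (for example, the use of >= at the cutoff is an assumption), and the scores and labels are made-up toy values. Note that the scores are computed once and reused for every threshold.

```python
def top_n(scores, n):
    """Flag the n highest scores as anomalies (ties at the cutoff are also flagged)."""
    cutoff = sorted(scores, reverse=True)[n - 1]
    return [int(s >= cutoff) for s in scores]

def fixed_cutoff(scores, cutoff):
    """Flag every score at or above the cutoff as an anomaly."""
    return [int(s >= cutoff) for s in scores]

def precision(y_true, y_pred):
    """Fraction of flagged observations that are true anomalies."""
    flagged = sum(y_pred)
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return true_positives / flagged if flagged else 0.0

# Toy ground truth and anomaly scores (illustrative values only).
y_true = [0, 0, 1, 0, 1, 0]
scores = [0.1, 0.4, 0.9, 0.6, 0.8, 0.2]

# One thresholding strategy per name; each is combined with the binary metric.
thresholds = {
    "TopN(2)": lambda s: top_n(s, 2),
    "FixedCutoff(0.5)": lambda s: fixed_cutoff(s, 0.5),
}
results = {name: precision(y_true, threshold(scores)) for name, threshold in thresholds.items()}
print(results)
```

Here the top-2 strategy flags exactly the two true anomalies, whereas the fixed cutoff also flags one normal observation and therefore yields a lower precision.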

Once all components are defined, we initialize the Workflow. We also define additional parameters, such as n_jobs, which allows multiple anomaly detectors to detect anomalies in parallel. Then, we can execute the workflow by calling the run() method, which returns a dataframe with the results.

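from dtaianomaly.workflow import Workflow
workflow = Workflow(
    dataloaders=dataloaders,
    metrics=metrics,
    thresholds=thresholds,
    preprocessors=preprocessors,
    detectors=detectors,
    n_jobs=4  # run up to four detection jobs in parallel
)
results = workflow.run()  # dataframe with the evaluation results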

Run a workflow from a configuration file

Alternatively, you can define a workflow using JSON configuration files. The file Config.json illustrates how the workflow defined above can be written as a configuration file. More details regarding the syntax are provided below. You can pass the path to a configuration file to the workflow_from_config() function to create the corresponding Workflow, as shown in the example below. Then, you can run the Workflow via its run() method.

from dtaianomaly.workflow import workflow_from_config
# Build the Workflow described in the configuration file and execute it.
workflow = workflow_from_config("Config.json")
results = workflow.run()

A configuration file is built from different entries, with each entry representing a component of the Workflow. These entries are built as follows:

{ "type": <name-of-component>, "optional-param": <value-optional-parameter>}

The 'type' equals the name of the component, for example 'LocalOutlierFactor' or 'StandardScaler'. This string must exactly match the class name of the component you want to add to the workflow. In addition, it is possible to define the hyperparameters of each component. For example, for 'LocalOutlierFactor' you must define a 'window_size', but you can optionally also define a 'stride'. An error is raised if an entry is missing required parameters or contains unknown parameters.
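
One way to picture how such an entry could be turned into an object is a small registry that maps the 'type' string to a class and passes the remaining keys as keyword arguments. This is only a sketch under stated assumptions: build_component and REGISTRY are hypothetical names, the two classes are simplified stand-ins rather than the dtaianomaly components, and the actual workflow_from_config() function may work differently.

```python
class LocalOutlierFactor:
    """Hypothetical stand-in: window_size is required, stride is optional."""
    def __init__(self, window_size, stride=1):
        self.window_size = window_size
        self.stride = stride

class StandardScaler:
    """Hypothetical stand-in without hyperparameters."""
    def __init__(self):
        pass

# Map each class name to its class, so 'type' must match the name exactly.
REGISTRY = {cls.__name__: cls for cls in (LocalOutlierFactor, StandardScaler)}

def build_component(entry):
    params = dict(entry)  # copy, so the caller's dictionary is left untouched
    type_name = params.pop("type")
    if type_name not in REGISTRY:
        raise ValueError(f"Unknown component type: {type_name!r}")
    # Python raises a TypeError if a required parameter is missing
    # or if an unknown keyword argument is passed.
    return REGISTRY[type_name](**params)

detector = build_component({"type": "LocalOutlierFactor", "window_size": 50})
print(detector.window_size, detector.stride)
```

In this sketch, omitting 'window_size' or adding a misspelled key fails when the object is constructed, which matches the strict behavior described above.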

The configuration file itself is also a dictionary, in JSON format. The keys of this dictionary correspond to the parameters of the Workflow. The corresponding values can be either a single entry (if one component is requested) or a list of entries (if multiple components are requested).
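
Because a value may be either a single entry or a list of entries, code that inspects a configuration typically normalizes both forms to a list first. The snippet below is a minimal sketch using only the standard json module; the helper as_entry_list is a hypothetical name, and the inline configuration is a shortened example rather than the full Config.json.

```python
import json

# Shortened example: one detector given as a single entry, two metrics given as a list.
config_text = """
{
    "detectors": {"type": "IsolationForest", "window_size": 50},
    "metrics": [{"type": "Precision"}, {"type": "AreaUnderROC"}]
}
"""
config = json.loads(config_text)

def as_entry_list(value):
    """Wrap a single entry in a list; leave a list of entries unchanged."""
    return value if isinstance(value, list) else [value]

detectors = as_entry_list(config["detectors"])
metrics = as_entry_list(config["metrics"])
print([entry["type"] for entry in detectors + metrics])
```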

Below, we show a simplified version of the configuration in Config.json.

{
    "dataloaders": [
        {
            "type": "UCRLoader",
            "path":"../data/UCR-time-series-anomaly-archive/001_UCR_Anomaly_DISTORTED1sddb40_35000_52000_52620.txt"
        },
        {
            "type": "UCRLoader",
            "path":"../data/UCR-time-series-anomaly-archive/002_UCR_Anomaly_DISTORTED2sddb40_35000_56600_56900.txt"
        }
    ],
    "metrics": [
        {"type": "Precision"},
        {"type": "AreaUnderPR"},
        {"type": "AreaUnderROC"}
    ],
    "thresholds": [
        {"type": "TopN", "n": 20},
        {"type": "FixedCutoff", "cutoff": 0.5}
    ],
    "preprocessors": [
        {"type": "Identity"},
        {"type": "StandardScaler"},
        {"type": "ChainedPreprocessor", "base_preprocessors":  [
            {"type": "MovingAverage", "window_size": 10},
            {"type": "StandardScaler"}
        ]},
        {"type": "ChainedPreprocessor", "base_preprocessors":  [
            {"type": "ExponentialMovingAverage", "alpha": 0.8},
            {"type": "StandardScaler"}
        ]}
    ],
    "detectors": [
        {"type": "LocalOutlierFactor", "window_size": 50},
        {"type": "IsolationForest", "window_size": 50}
    ],
    "n_jobs": 4
}
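
If you prefer to generate such a file from Python instead of writing it by hand, the standard json module is sufficient. The following sketch writes a reduced configuration to a temporary directory and reads it back; the reduced set of entries and the temporary location are choices made for this illustration only.

```python
import json
import tempfile
from pathlib import Path

# Reduced configuration: single entries where one component suffices.
config = {
    "dataloaders": {
        "type": "UCRLoader",
        "path": "../data/UCR-time-series-anomaly-archive/001_UCR_Anomaly_DISTORTED1sddb40_35000_52000_52620.txt",
    },
    "metrics": [{"type": "Precision"}, {"type": "AreaUnderPR"}],
    "thresholds": {"type": "TopN", "n": 20},
    "preprocessors": {"type": "Identity"},
    "detectors": {"type": "IsolationForest", "window_size": 50},
    "n_jobs": 4,
}

with tempfile.TemporaryDirectory() as directory:
    config_path = Path(directory) / "Config.json"
    # Write the dictionary as indented JSON.
    config_path.write_text(json.dumps(config, indent=4))
    # Read it back to confirm the file contains valid JSON with the same content.
    reloaded = json.loads(config_path.read_text())

print(reloaded == config)
```

The resulting path could then be passed to workflow_from_config(), provided the referenced data files exist relative to where the workflow is run.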