Workflow module

This module contains the functionality for configuring and running anomaly detection workflows.

>>> from dtaianomaly import workflow

Below we illustrate how to initialize a simple workflow that applies Matrix Profile and Isolation Forest to a dataset from the UCR archive and computes the area under the ROC and PR curves:

>>> from dtaianomaly.data import UCRLoader
>>> from dtaianomaly.anomaly_detection import MatrixProfileDetector, IsolationForest
>>> from dtaianomaly.evaluation import AreaUnderROC, AreaUnderPR
>>> wf = workflow.Workflow(  # a distinct name avoids shadowing the imported module
...     dataloaders=[
...         UCRLoader(path='data/UCR-time-series-anomaly-archive/001_UCR_Anomaly_DISTORTED1sddb40_35000_52000_52620.txt'),
...     ],
...     detectors=[MatrixProfileDetector(window_size=100), IsolationForest(15)],
...     metrics=[AreaUnderROC(), AreaUnderPR()]
... )

We refer to the documentation for more information regarding the configuration and use of a Workflow.

class dtaianomaly.workflow.Workflow(dataloaders: LazyDataLoader | List[LazyDataLoader], metrics: Metric | List[Metric], detectors: BaseDetector | List[BaseDetector], preprocessors: Preprocessor | List[Preprocessor] = None, thresholds: Thresholding | List[Thresholding] = None, n_jobs: int = 1, trace_memory: bool = False)

Run anomaly detection experiments.

Run all combinations of dataloaders, preprocessors, detectors, and metrics. The metrics requiring a thresholding operation are combined with every element of thresholds.
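
As a rough illustration of what "all combinations" means, the sketch below enumerates the grid with itertools.product. It does not use dtaianomaly itself: the string labels are hypothetical stand-ins for real dataloaders, preprocessors, and detectors, and enumerate_runs is an illustrative helper, not part of the package.

```python
from itertools import product

# Hypothetical stand-in labels; a real workflow holds dtaianomaly objects.
dataloaders = ["dataset_a", "dataset_b"]
preprocessors = ["no_preprocessing", "scaling"]
detectors = ["matrix_profile", "isolation_forest"]
metrics = ["AreaUnderROC", "AreaUnderPR"]


def enumerate_runs(dataloaders, preprocessors, detectors):
    """Return one (dataloader, preprocessor, detector) tuple per run."""
    return list(product(dataloaders, preprocessors, detectors))


runs = enumerate_runs(dataloaders, preprocessors, detectors)
print(len(runs))  # 2 * 2 * 2 = 8 runs

# Every run is scored with every metric, so each metric becomes a column.
for dataloader, preprocessor, detector in runs:
    row = {"dataset": dataloader, "preprocessor": preprocessor, "detector": detector}
    row.update({metric: None for metric in metrics})  # placeholder scores
```

Under this reading, the number of runs grows multiplicatively with each list, which is worth keeping in mind when choosing n_jobs.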

Parameters:
  • dataloaders (LazyDataLoader or list of LazyDataLoader) – The dataloaders used to load the data on which this workflow is evaluated.

  • metrics (Metric or list of Metric) – The metrics to evaluate within this workflow.

  • detectors (BaseDetector or list of BaseDetector) – The anomaly detectors to evaluate.

  • thresholds (Thresholding or list of Thresholding, default=None) – The thresholds used to convert continuous anomaly scores into binary anomaly predictions. Each threshold is combined with each BinaryMetric passed via the metrics parameter. Thresholds are not applied to a ProbaMetric. If None or an empty list, then all metrics passed via the metrics parameter must be of type ProbaMetric; otherwise, a ValueError is raised.

  • preprocessors (Preprocessor or list of Preprocessor, default=None) – The preprocessors to apply before evaluating the model. If None or an empty list, then no preprocessing is done, which is equivalent to using dtaianomaly.preprocessing.Preprocessor as the preprocessor for each pipeline.

  • n_jobs (int, default=1) – Number of processes to run in parallel while evaluating all combinations.

  • trace_memory (bool, default=False) – Whether or not memory usage of each run is reported. While this might give additional insights into the models, their runtime will be higher due to additional internal bookkeeping.
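
The rule for combining thresholds with metrics can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: ProbaMetric and BinaryMetric here are small stand-in classes, expand_metrics is a hypothetical helper, and the metric and threshold labels are made up.

```python
from itertools import product


class ProbaMetric:
    """Stand-in for a metric evaluated on continuous anomaly scores."""

    def __init__(self, name):
        self.name = name


class BinaryMetric:
    """Stand-in for a metric evaluated on binary anomaly predictions."""

    def __init__(self, name):
        self.name = name


def expand_metrics(metrics, thresholds=None):
    """Pair every BinaryMetric with every threshold; keep ProbaMetrics as-is."""
    thresholds = thresholds or []  # None and [] are treated the same
    proba = [m for m in metrics if isinstance(m, ProbaMetric)]
    binary = [m for m in metrics if isinstance(m, BinaryMetric)]
    if binary and not thresholds:
        raise ValueError("A BinaryMetric was given, but no thresholds.")
    combined = [f"{t}->{m.name}" for t, m in product(thresholds, binary)]
    return [m.name for m in proba] + combined


columns = expand_metrics(
    [ProbaMetric("area_under_roc"), BinaryMetric("precision")],
    thresholds=["cutoff_a", "cutoff_b"],  # hypothetical threshold labels
)
print(columns)  # ['area_under_roc', 'cutoff_a->precision', 'cutoff_b->precision']
```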

run() -> DataFrame

Run the experimental workflow. Evaluate each pipeline within this workflow on each dataset within this workflow in a grid-like manner.

Returns:

results – A pandas dataframe with the results of this workflow. Each row represents an execution of an anomaly detector on a given dataset with some preprocessing steps. The columns correspond to the different evaluation metrics, running time and potentially also the memory usage.

Return type:

pd.DataFrame
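
To make the running-time and memory columns concrete, the sketch below measures one run using the standard library. That the package relies on tracemalloc internally is an assumption; timed_run and the column names runtime_s and peak_memory_bytes are hypothetical and chosen for illustration only.

```python
import time
import tracemalloc


def timed_run(func, trace_memory=False):
    """Call func() and return a result row with runtime and optional peak memory."""
    if trace_memory:
        tracemalloc.start()  # tracing adds bookkeeping, so runs get slower
    start = time.perf_counter()
    try:
        value = func()
        row = {"value": value, "runtime_s": time.perf_counter() - start}
        if trace_memory:
            _, peak = tracemalloc.get_traced_memory()
            row["peak_memory_bytes"] = peak
    finally:
        if trace_memory:
            tracemalloc.stop()
    return row


def workload():
    """Toy computation standing in for fitting and scoring a detector."""
    return sum([i * i for i in range(10_000)])


row = timed_run(workload, trace_memory=True)
print(sorted(row))  # ['peak_memory_bytes', 'runtime_s', 'value']
```

The memory column only appears when tracing is requested, which mirrors the "potentially also the memory usage" wording above.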

dtaianomaly.workflow.workflow_from_config(path: str, max_size: int = 1000000)

Construct a Workflow instance based on a JSON file. The file is first parsed and then interpreted to obtain a Workflow.

Parameters:
  • path (str) – Path to the config file in JSON format.

  • max_size (int, optional) – Maximum size of the config file in bytes. Defaults to 1 MB.

Returns:

workflow – The parsed workflow from the given config file.

Return type:

Workflow

Raises:
  • TypeError – If the given path is not a string.

  • FileNotFoundError – If the given path does not correspond to an existing file.

  • ValueError – If the given path does not refer to a JSON file.
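
The checks listed above can be sketched with the standard library as follows. load_config_checked is a hypothetical helper, not the package's function. Testing the extension with endswith and raising ValueError for an oversized file are both assumptions, since the documentation does not state how max_size is enforced.

```python
import json
import os
import tempfile


def load_config_checked(path, max_size=1_000_000):
    """Validate the path, then parse the JSON file into a dict."""
    if not isinstance(path, str):
        raise TypeError("The path must be a string.")
    if not os.path.exists(path):
        raise FileNotFoundError(f"No file found at {path!r}.")
    if not path.endswith(".json"):
        raise ValueError("The path must refer to a JSON file.")
    if os.path.getsize(path) > max_size:
        # Assumption: an oversized file is rejected with a ValueError.
        raise ValueError(f"The config file exceeds {max_size} bytes.")
    with open(path, "r") as file:
        return json.load(file)


# Usage: write a tiny config to a temporary directory and load it back.
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "config.json")
    with open(config_path, "w") as file:
        json.dump({"name": "demo"}, file)
    config = load_config_checked(config_path)

print(config)  # {'name': 'demo'}
```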

dtaianomaly.workflow.interpret_config(config: dict)

Interpret a parsed config and build the corresponding Workflow. This function holds the actual parsing and interpretation logic.

The various _interpret_* helper functions check the config for the corresponding dtaianomaly objects. These functions should be extended whenever the package is extended.

Parameters:

config (dict) – The config to parse.

Returns:

workflow – The workflow containing all the components specified in the config.

Return type:

Workflow
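
One common way to structure such interpretation logic is a registry that maps a type name in the config to a class. The sketch below shows that pattern only. The config schema (a "type" key plus keyword arguments), the DemoDetector class, the REGISTRY dict, and both helper functions are hypothetical and do not describe the actual dtaianomaly config format.

```python
from dataclasses import dataclass


@dataclass
class DemoDetector:
    """Hypothetical component standing in for a real detector class."""

    window_size: int = 16


# Hypothetical registry: maps a type name in the config to a class.
REGISTRY = {"DemoDetector": DemoDetector}


def interpret_entry(entry, registry):
    """Build one object from a {'type': name, **kwargs} entry."""
    kwargs = dict(entry)  # copy, so the caller's config is left untouched
    type_name = kwargs.pop("type")
    if type_name not in registry:
        raise ValueError(f"Unknown type in config: {type_name!r}")
    return registry[type_name](**kwargs)


def interpret_section(section, registry):
    """Accept a single entry or a list of entries and return a list of objects."""
    entries = section if isinstance(section, list) else [section]
    return [interpret_entry(entry, registry) for entry in entries]


detectors = interpret_section(
    [{"type": "DemoDetector", "window_size": 100}, {"type": "DemoDetector"}],
    REGISTRY,
)
print(detectors)  # [DemoDetector(window_size=100), DemoDetector(window_size=16)]
```

With this design, supporting a new component amounts to adding one registry entry, which matches the note that the interpretation functions must be extended alongside the package.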