Workflow module
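This module provides the Workflow class for running anomaly detection experiments, together with functions to construct a Workflow from a JSON configuration file.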
>>> from dtaianomaly import workflow
The example below initializes a simple workflow that applies Matrix Profile and Isolation Forest to a dataset from the UCR archive and computes the area under the ROC and PR curves:
>>> from dtaianomaly.data import UCRLoader
>>> from dtaianomaly.anomaly_detection import MatrixProfileDetector, IsolationForest
>>> from dtaianomaly.evaluation import AreaUnderROC, AreaUnderPR
>>> my_workflow = workflow.Workflow(
...     dataloaders=[
...         UCRLoader(path='data/UCR-time-series-anomaly-archive/001_UCR_Anomaly_DISTORTED1sddb40_35000_52000_52620.txt'),
...     ],
...     detectors=[MatrixProfileDetector(window_size=100), IsolationForest(15)],
...     metrics=[AreaUnderROC(), AreaUnderPR()]
... )
We refer to the documentation for more information regarding the configuration and use of a Workflow.
- class dtaianomaly.workflow.Workflow(dataloaders: LazyDataLoader | List[LazyDataLoader], metrics: Metric | List[Metric], detectors: BaseDetector | List[BaseDetector], preprocessors: Preprocessor | List[Preprocessor] = None, thresholds: Thresholding | List[Thresholding] = None, n_jobs: int = 1, trace_memory: bool = False)
Run anomaly detection experiments.
Run all combinations of dataloaders, preprocessors, detectors, and metrics. The metrics requiring a thresholding operation are combined with every element of thresholds.
- Parameters:
dataloaders (LazyDataLoader or list of LazyDataLoader) – The dataloaders used to load the data on which the pipelines in this workflow are evaluated.
metrics (Metric or list of Metric) – The metrics to evaluate within this workflow.
detectors (BaseDetector or list of BaseDetector) – The anomaly detectors to evaluate.
thresholds (Thresholding or list of Thresholding, default=None) – The thresholds used for converting continuous anomaly scores to binary anomaly predictions. Each threshold is combined with each BinaryMetric given via the metrics parameter. The thresholds do not apply to a ProbaMetric. If None or an empty list, then all metrics given via the metrics argument must be of type ProbaMetric; otherwise, a ValueError is raised.
preprocessors (Preprocessor or list of Preprocessor, default=None) – The preprocessors to apply before evaluating the model. If None or an empty list, then no preprocessing is done, i.e., dtaianomaly.preprocessing.Preprocessor is used as the preprocessor for each pipeline.
n_jobs (int, default=1) – Number of processes to run in parallel while evaluating all combinations.
trace_memory (bool, default=False) – Whether to report the memory usage of each run. This can give additional insight into the models, but increases their runtime because of the additional internal bookkeeping.
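As a rough illustration of how the grid of experiments described above could be enumerated, the sketch below uses plain strings as hypothetical stand-ins for dataloaders, preprocessors, detectors, thresholds, and metrics, with a "binary"/"proba" tag standing in for the BinaryMetric/ProbaMetric distinction. It is not the library's implementation; `expand_metrics` and `enumerate_runs` are illustrative names only.

```python
from itertools import product

# Hypothetical stand-ins: plain strings play the role of dtaianomaly objects.
dataloaders = ["ucr_001"]
detectors = ["matrix_profile", "isolation_forest"]
thresholds = ["fixed_cutoff", "top_n"]
# Each metric is tagged "binary" (needs thresholded predictions) or
# "proba" (works on continuous scores), mimicking BinaryMetric / ProbaMetric.
metrics = {"auc_roc": "proba", "auc_pr": "proba", "f1": "binary"}


def expand_metrics(metrics, thresholds=None):
    """Pair every binary metric with every threshold; keep proba metrics as-is."""
    thresholds = thresholds or []
    binary = [name for name, kind in metrics.items() if kind == "binary"]
    if binary and not thresholds:
        # Mirrors the documented rule: without thresholds, all metrics must be proba.
        raise ValueError("Binary metrics require at least one threshold")
    expanded = [name for name, kind in metrics.items() if kind == "proba"]
    expanded += [f"{t} -> {m}" for t, m in product(thresholds, binary)]
    return expanded


def enumerate_runs(dataloaders, detectors, preprocessors=None):
    """One run per (dataloader, preprocessor, detector) combination."""
    # No preprocessors given: fall back to a single no-op step.
    preprocessors = preprocessors or ["identity"]
    return list(product(dataloaders, preprocessors, detectors))


runs = enumerate_runs(dataloaders, detectors)
columns = expand_metrics(metrics, thresholds)
print(len(runs), "runs, metric columns:", columns)
```

With one dataset, two detectors, and the no-op preprocessing fallback, this yields two runs, each scored on the two proba metrics plus one column per (threshold, binary metric) pair.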
- run() → DataFrame
Run the experimental workflow. Evaluate each pipeline within this workflow on each dataset within this workflow in a grid-like manner.
- Returns:
results – A pandas DataFrame with the results of this workflow. Each row represents one execution of an anomaly detector on a given dataset with some preprocessing steps. The columns contain the evaluation metrics, the running time, and, if trace_memory is enabled, the memory usage.
- Return type:
pd.DataFrame
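The following sketch suggests how a single row of such a results table might be produced: a detector is timed with `time.perf_counter` and, when memory tracing is requested, its peak allocation is recorded with the standard library's `tracemalloc`. Whether dtaianomaly measures memory this way is an assumption; `run_one`, `toy_detector`, and the column names are hypothetical.

```python
import statistics
import time
import tracemalloc


def run_one(detector_fn, data, metric_fns, trace_memory=False):
    """Hypothetical single run: score `data`, time it, optionally trace memory."""
    if trace_memory:
        tracemalloc.start()
    start = time.perf_counter()
    scores = detector_fn(data)
    runtime = time.perf_counter() - start
    row = {}
    if trace_memory:
        # Peak bytes allocated by Python code while the detector ran.
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        row["peak memory [bytes]"] = peak
    row["runtime [s]"] = runtime
    # Metrics are computed outside the timed and traced region.
    for name, fn in metric_fns.items():
        row[name] = fn(scores)
    return row


def toy_detector(xs):
    """Toy anomaly score: absolute deviation from the mean."""
    mean = statistics.fmean(xs)
    return [abs(x - mean) for x in xs]


data = [0.0] * 999 + [10.0]
row = run_one(toy_detector, data, {"max score": max}, trace_memory=True)
print(row)
```

Collecting one such dict per run and passing the list to `pandas.DataFrame` would give a table shaped roughly like the one `run()` returns. Tracing is opt-in in the sketch for the same reason given for trace_memory: the bookkeeping adds overhead.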
- dtaianomaly.workflow.workflow_from_config(path: str, max_size: int = 1000000)
Construct a Workflow instance based on a JSON file. The file is first parsed, and then interpreted to obtain a Workflow.
- Parameters:
path (str) – Path to the config file in JSON format.
max_size (int, optional) – Maximal size of the config file in bytes. Defaults to 1 MB.
- Returns:
workflow – The parsed workflow from the given config file.
- Return type:
Workflow
- Raises:
TypeError – If the given path is not a string.
FileNotFoundError – If the given path does not correspond to an existing file.
ValueError – If the given path does not refer to a json file.
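The documented error conditions can be sketched with the standard library alone. In the sketch below, detecting "not a json file" from the `.json` extension, rejecting oversized files with a ValueError, and the name `load_config_sketch` are assumptions; the real function additionally interprets the parsed content into a Workflow, which is omitted here.

```python
import json
import os
import tempfile


def load_config_sketch(path, max_size=1_000_000):
    """Hypothetical re-creation of the checks documented for workflow_from_config."""
    if not isinstance(path, str):
        raise TypeError("path must be a string")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No such file: {path}")
    # Assumption: "not a json file" is detected from the file extension.
    if not path.endswith(".json"):
        raise ValueError("path must refer to a .json file")
    # Assumption: an oversized file is rejected with a ValueError.
    if os.path.getsize(path) > max_size:
        raise ValueError(f"config file is larger than {max_size} bytes")
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Usage: write a placeholder config to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "config.json")
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump({"placeholder": 1}, f)
    config = load_config_sketch(config_path)

print(config)
```

The placeholder content is not a valid dtaianomaly config; the sketch only shows the file-level checks that happen before interpretation.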
- dtaianomaly.workflow.interpret_config(config: dict)
Actual parsing and interpretation logic.
The internal _interpret_* functions check the config for the corresponding dtaianomaly objects. These functions should be extended whenever the package itself is extended.
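One way to organize such _interpret_* functions, so that extending the package only means registering new classes, is a name-to-class registry. In the sketch below, the `type` key, the registries, the stand-in classes, and `interpret_config_sketch` are all hypothetical (the real config schema is not shown in this section), and the result is a plain dict rather than a Workflow so that it runs without dtaianomaly installed.

```python
from dataclasses import dataclass


@dataclass
class DummyDetector:
    """Hypothetical stand-in for a dtaianomaly detector."""

    window_size: int


@dataclass
class DummyMetric:
    """Hypothetical stand-in for a dtaianomaly metric."""


# Hypothetical registries: supporting a new class means adding one entry.
DETECTORS = {"DummyDetector": DummyDetector}
METRICS = {"DummyMetric": DummyMetric}


def _interpret_entries(entries, registry):
    """Build objects from one spec or a list of specs of the form {"type": ..., **kwargs}."""
    if isinstance(entries, dict):
        entries = [entries]
    objects = []
    for entry in entries:
        kwargs = dict(entry)  # copy so the caller's config is not mutated
        name = kwargs.pop("type")
        if name not in registry:
            raise ValueError(f"Unknown component type: {name}")
        objects.append(registry[name](**kwargs))
    return objects


def interpret_config_sketch(config):
    """Map each top-level key to a list of constructed components."""
    if not isinstance(config, dict):
        raise TypeError("config must be a dict")
    return {
        "detectors": _interpret_entries(config["detectors"], DETECTORS),
        "metrics": _interpret_entries(config["metrics"], METRICS),
    }


parsed = interpret_config_sketch({
    "detectors": [{"type": "DummyDetector", "window_size": 100}],
    "metrics": {"type": "DummyMetric"},
})
print(parsed)
```

Accepting either a single spec or a list mirrors the "X or list of X" parameters of Workflow.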
- Parameters:
config (dict) – The config to parse
- Returns:
Containing all the components specified in the config
- Return type: