JobBasedWorkflow

class dtaianomaly.workflow.JobBasedWorkflow(jobs: list[Job], metrics: Metric | list[Metric], thresholds: Thresholding | list[Thresholding] = None, n_jobs: int = 1, trace_memory: bool = False, anomaly_scores_path: str = None, error_log_path: str = './error_logs', fit_unsupervised_on_test_data: bool = False, fit_semi_supervised_on_test_data: bool = False, show_progress: bool = False)

Run anomaly detection experiments.

Runs an experiment for each given Job. If an error occurs while loading data or executing an anomaly detector, the error is written to an error file. This file is an executable Python file that reproduces the error.
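Because each error file is itself a Python script, a failed run can be replayed in isolation. The sketch below assumes that the error files use the `.py` extension and sit directly inside the `error_log_path` directory. The helpers `list_error_logs` and `reproduce` are hypothetical names for illustration and are not part of dtaianomaly.

```python
import runpy
from pathlib import Path


def list_error_logs(error_log_path="./error_logs"):
    """Return the Python files in the error-log directory, sorted by name.

    Assumption: error files use the .py extension.
    """
    directory = Path(error_log_path)
    if not directory.is_dir():
        return []
    return sorted(directory.glob("*.py"))


def reproduce(error_file):
    """Execute one error file and return the exception it raises, or None."""
    try:
        runpy.run_path(str(error_file), run_name="__main__")
    except Exception as exc:
        return exc
    return None
```

Calling `reproduce(path)` for each entry of `list_error_logs()` gives a quick overview of which failures still occur after a fix.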

Parameters:

jobs (list of Job): The jobs to execute within this workflow.
metrics (Metric or list of Metric): The metrics to evaluate within this workflow.
thresholds (Thresholding or list of Thresholding, default=None): The thresholds used for converting continuous anomaly scores to binary anomaly predictions. Each threshold is combined with each BinaryMetric given via the metrics parameter. Thresholds are not applied to a ProbaMetric. If thresholds is None or an empty list, then every metric given via the metrics parameter must be a ProbaMetric. Otherwise, a ValueError is raised.
n_jobs (int, default=1): Number of processes to run in parallel while evaluating all combinations.
trace_memory (bool, default=False): Whether the memory usage of each run is reported. While this can give additional insight into the models, their runtime will be higher due to additional internal bookkeeping.
anomaly_scores_path (str, default=None): The path where the anomaly scores should be saved. If None, the anomaly scores are not saved.
error_log_path (str, default='./error_logs'): The path in which the error logs should be saved.
fit_unsupervised_on_test_data (bool, default=False): Whether to fit the unsupervised anomaly detectors on the test data. If True, then the test data is used both to fit and to evaluate the detector. This is not an issue, since unsupervised detectors do not use labels and can deal with anomalies in the training data.
fit_semi_supervised_on_test_data (bool, default=False): Whether to fit the semi-supervised anomaly detectors on the test data. If True, then the test data is used both to fit and to evaluate the detector. This breaks the assumption of semi-supervised methods that the training data is normal, but it is not a major issue because these methods do not use the training labels themselves.
show_progress (bool, default=False): Whether to show the progress using a TQDM progress bar.

Note

Ensure tqdm is installed when using show_progress (tqdm is not part of the core dependencies of dtaianomaly). Otherwise, no progress bar will be shown.
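The fallback described in this note is a common pattern for optional dependencies. A minimal sketch, assuming a hypothetical helper named `with_progress` (this is not dtaianomaly's internal code):

```python
def with_progress(iterable, show_progress=False):
    """Wrap an iterable in a tqdm progress bar when requested and available."""
    if not show_progress:
        return iterable
    try:
        from tqdm import tqdm  # optional dependency, imported lazily
    except ImportError:
        # tqdm is missing: iterate without a progress bar instead of failing
        return iterable
    return tqdm(iterable)
```

The lazy import keeps tqdm out of the required dependencies: code that never asks for a progress bar never touches it.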

Examples

>>> from dtaianomaly.data import DemonstrationTimeSeriesLoader
>>> from dtaianomaly.anomaly_detection import IsolationForest
>>> from dtaianomaly.evaluation import AreaUnderROC, AreaUnderPR
>>> from dtaianomaly.preprocessing import StandardScaler, MinMaxScaler
>>> from dtaianomaly.workflow import JobBasedWorkflow, Job
>>> workflow = JobBasedWorkflow(
...     jobs=[
...         Job(
...             dataloader=DemonstrationTimeSeriesLoader(),
...             detector=IsolationForest(15),
...         ),
...         Job(
...             dataloader=DemonstrationTimeSeriesLoader(),
...             preprocessor=StandardScaler(),
...             detector=IsolationForest(15),
...         ),
...         Job(
...             dataloader=DemonstrationTimeSeriesLoader(),
...             preprocessor=MinMaxScaler(),
...             detector=IsolationForest(15),
...         ),
...     ],
...     metrics=[AreaUnderROC(), AreaUnderPR()]
... )
>>> workflow.run()
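
The example above passes no thresholds, which the description of the thresholds parameter allows only when every metric is a ProbaMetric. The rule for combining thresholds with metrics can be sketched in plain Python. The two classes and the `expand_metrics` helper below are stand-ins written for illustration; they are not dtaianomaly's own classes, and thresholds are represented by plain strings.

```python
from itertools import product


class ProbaMetric:
    """Stand-in for a metric evaluated on continuous anomaly scores."""

    def __init__(self, name):
        self.name = name


class BinaryMetric:
    """Stand-in for a metric that needs binary anomaly predictions."""

    def __init__(self, name):
        self.name = name


def expand_metrics(metrics, thresholds=None):
    """Return the (threshold, metric name) pairs that would be evaluated."""
    thresholds = list(thresholds or [])
    proba = [m for m in metrics if isinstance(m, ProbaMetric)]
    binary = [m for m in metrics if isinstance(m, BinaryMetric)]
    if binary and not thresholds:
        # Binary metrics cannot be computed without a threshold
        raise ValueError("Binary metrics require at least one threshold.")
    # Proba metrics are evaluated once, without any threshold
    combos = [(None, m.name) for m in proba]
    # Every threshold is paired with every binary metric
    combos += [(t, m.name) for t, m in product(thresholds, binary)]
    return combos
```

With one ProbaMetric, two BinaryMetric objects and two thresholds, this yields five evaluations per job: one for the ProbaMetric and four threshold and metric pairs.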

jobs: alias of list[Job]

run(**kwargs) → DataFrame

Run the experimental workflow.

Execute each job within this workflow, evaluating the pipeline of the job on the dataset of the job.

Parameters:

**kwargs: Additional parameters to be passed to the fit method of the anomaly detector.

Returns:

pd.DataFrame: A pandas dataframe with the results of this workflow. Each row represents an execution of an anomaly detector on a given dataset with some preprocessing steps. The columns correspond to the different evaluation metrics, running time and potentially also the memory usage.
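
Running time and peak memory of the kind reported in these columns can be measured with the standard library. The sketch below uses `time.perf_counter` and `tracemalloc`; whether dtaianomaly measures them this way is an assumption, and `measure` is a hypothetical helper.

```python
import time
import tracemalloc


def measure(func, *args, trace_memory=False, **kwargs):
    """Call func and return (result, runtime in seconds, peak bytes or None)."""
    if trace_memory:
        tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        runtime = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes
        peak = tracemalloc.get_traced_memory()[1] if trace_memory else None
    finally:
        if trace_memory:
            tracemalloc.stop()
    return result, runtime, peak
```

Tracing every allocation adds overhead, which is consistent with the higher runtime documented for trace_memory=True.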