Workflow
- class dtaianomaly.workflow.Workflow(dataloaders: LazyDataLoader | list[LazyDataLoader], metrics: Metric | list[Metric], detectors: BaseDetector | list[BaseDetector], preprocessors: Preprocessor | list[Preprocessor] = None, thresholds: Thresholding | list[Thresholding] = None, n_jobs: int = 1, trace_memory: bool = False, anomaly_scores_path: str = None, error_log_path: str = './error_logs', fit_unsupervised_on_test_data: bool = False, fit_semi_supervised_on_test_data: bool = False, show_progress: bool = False)
Run anomaly detection experiments.
Run all combinations of dataloaders, preprocessors, detectors, and metrics. The metrics requiring a thresholding operation are combined with every element of thresholds. If an error occurs while loading data or running an anomaly detector, the error is written to an error file, which is an executable Python file that reproduces the error.
- Parameters:
- dataloaders: LazyDataLoader or list of LazyDataLoader
The dataloaders used to load the data on which this workflow evaluates the detectors.
- metrics: Metric or list of Metric
The metrics to evaluate within this workflow.
- detectors: BaseDetector or list of BaseDetector
The anomaly detectors to evaluate.
- preprocessors: Preprocessor or list of Preprocessor, default=None
The preprocessors to apply before evaluating the model. If None or an empty list, no preprocessing is done, i.e., dtaianomaly.preprocessing.Preprocessor is used as the preprocessor for each pipeline.
- thresholds: Thresholding or list of Thresholding, default=None
The thresholds used to convert continuous anomaly scores into binary anomaly predictions. Each threshold is combined with each BinaryMetric given via the metrics parameter. The thresholds do not apply to a ProbaMetric. If None or an empty list, then all metrics given via the metrics argument must be of type ProbaMetric; otherwise, a ValueError is raised.
- n_jobs: int, default=1
Number of processes to run in parallel while evaluating all combinations.
- trace_memory: bool, default=False
Whether or not memory usage of each run is reported. While this might give additional insights into the models, their runtime will be higher due to additional internal bookkeeping.
- anomaly_scores_path: str, default=None
The path where the anomaly scores should be saved. If None, the anomaly scores will not be saved.
- error_log_path: str, default='./error_logs'
The path in which the error logs should be saved.
- fit_unsupervised_on_test_data: bool, default=False
Whether to fit the unsupervised anomaly detectors on the test data. If True, then the test data will be used to fit the detector and to evaluate the detector. This is no issue, since unsupervised detectors do not use labels and can deal with anomalies in the training data.
- fit_semi_supervised_on_test_data: bool, default=False
Whether to fit the semi-supervised anomaly detectors on the test data. If True, then the test data will be used to fit the detector and to evaluate the detector. This is not really an issue: it only breaks the assumption of semi-supervised methods that the training data is normal, and these methods do not use the training labels themselves.
- show_progress: bool, default=False
Whether to show the progress using a tqdm progress bar.
Note
Ensure tqdm is installed for this (it is not part of the core dependencies of dtaianomaly). Otherwise, no progress bar will be shown.
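The rule for the thresholds parameter described above can be illustrated with a small standalone sketch. The classes ProbaMetric and BinaryMetric below are simplified stand-ins defined only for this example (they are not the library's classes), expand_metrics is a hypothetical helper, and the threshold labels are placeholder strings. It shows the documented behaviour only: binary metrics are paired with every threshold, probabilistic metrics are used as-is, and a ValueError is raised when a binary metric is given without any threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbaMetric:
    """Stand-in (assumption): a metric evaluated on continuous anomaly scores."""
    name: str


@dataclass(frozen=True)
class BinaryMetric:
    """Stand-in (assumption): a metric that needs binary predictions."""
    name: str


def expand_metrics(metrics, thresholds=None):
    """Hypothetical helper: pair each binary metric with every threshold."""
    thresholds = thresholds or []  # None and [] are treated the same
    if not thresholds and any(not isinstance(m, ProbaMetric) for m in metrics):
        raise ValueError("Binary metrics were given, but no thresholds.")

    expanded = []
    for metric in metrics:
        if isinstance(metric, BinaryMetric):
            # One entry per (threshold, binary metric) combination
            expanded.extend((threshold, metric.name) for threshold in thresholds)
        else:
            # Thresholds do not apply to probabilistic metrics
            expanded.append((None, metric.name))
    return expanded


metrics = [ProbaMetric("AreaUnderROC"), BinaryMetric("Precision")]
combos = expand_metrics(metrics, thresholds=["cutoff_0.5", "top_10"])
for threshold, metric_name in combos:
    print(threshold, metric_name)
```

With one probabilistic metric, one binary metric, and two thresholds, this yields three evaluated quantities per run.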
Examples
>>> from dtaianomaly.data import DemonstrationTimeSeriesLoader
>>> from dtaianomaly.anomaly_detection import MatrixProfileDetector, IsolationForest
>>> from dtaianomaly.evaluation import AreaUnderROC, AreaUnderPR
>>> from dtaianomaly.workflow import Workflow
>>> workflow = Workflow(
...     dataloaders=DemonstrationTimeSeriesLoader(),
...     detectors=[MatrixProfileDetector(window_size=100), IsolationForest(15)],
...     metrics=[AreaUnderROC(), AreaUnderPR()]
... )
>>> workflow.run()
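To get a feel for how many runs a workflow of this kind performs, the grid of combinations can be enumerated with itertools.product. This is only a sketch of the counting, not the library's implementation: the components are represented by plain strings, enumerate_runs is a hypothetical helper, and "no preprocessing" is a placeholder label for the case where no preprocessors are given.

```python
from itertools import product


def enumerate_runs(dataloaders, preprocessors, detectors):
    """Hypothetical helper: list every (dataset, preprocessor, detector) run."""
    # An empty or missing preprocessor list still yields one pipeline per detector
    preprocessors = preprocessors or ["no preprocessing"]
    return list(product(dataloaders, preprocessors, detectors))


# Placeholder labels mirroring the example above (assumption: strings only)
dataloaders = ["demonstration_time_series"]
detectors = ["MatrixProfileDetector", "IsolationForest"]
metrics = ["AreaUnderROC", "AreaUnderPR"]

runs = enumerate_runs(dataloaders, None, detectors)
for dataset, preprocessor, detector in runs:
    # Every run is evaluated with every metric
    print(dataset, preprocessor, detector, metrics)
```

Under these assumptions the example above corresponds to two runs (one dataset, no preprocessing, two detectors), each evaluated with both metrics.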
- run(**kwargs) → DataFrame
Run the experimental workflow.
Evaluate each pipeline within this workflow on each dataset within this workflow in a grid-like manner.
- Parameters:
- **kwargs
Additional parameters to be passed to the fit method of the anomaly detector.
- Returns:
- pd.DataFrame
A pandas dataframe with the results of this workflow. Each row represents an execution of an anomaly detector on a given dataset with some preprocessing steps. The columns correspond to the different evaluation metrics, running time and potentially also the memory usage.