Pipeline module

This module contains functionality to combine preprocessing, anomaly detection, and evaluation in a single wrapper object.

>>> from dtaianomaly import pipeline

Users are not expected to extend the base Pipeline objects as they are wrappers of underlying dtaianomaly objects. Custom functionality is better achieved by implementing the dtaianomaly.preprocessing.Preprocessor, dtaianomaly.anomaly_detection.BaseDetector or dtaianomaly.evaluation.Metric objects.

class dtaianomaly.pipeline.Pipeline(preprocessor: Preprocessor | List[Preprocessor], detector: BaseDetector)

Pipeline to combine preprocessing and anomaly detection.

The pipeline accepts either a single Preprocessor object or a list of Preprocessor objects. A list is converted into a ChainedPreprocessor. At the moment, the Pipeline always requires a Preprocessor object to be passed at construction. If no preprocessing is desired, you must explicitly pass an Identity preprocessor.

Parameters:
  • preprocessor (Preprocessor or list of Preprocessors) – The preprocessor, or list of preprocessors, to include in this pipeline.

  • detector (BaseDetector) – The anomaly detector to include in this pipeline.
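
The list handling described above can be illustrated with a minimal sketch. It shows one way a single preprocessor and a list of preprocessors could both be reduced to one object, and why passing an identity preprocessor amounts to "no preprocessing". All names here (IdentitySketch, ScaleSketch, ChainSketch, as_single_preprocessor) are hypothetical stand-ins written for this example; they are not the dtaianomaly implementation.

```python
from typing import List, Optional, Tuple

Data = List[float]
Labels = Optional[List[int]]


class IdentitySketch:
    """Hypothetical stand-in for an identity preprocessor: input is returned unchanged."""

    def transform(self, X: Data, y: Labels = None) -> Tuple[Data, Labels]:
        return X, y


class ScaleSketch:
    """Hypothetical stand-in preprocessor that multiplies every value by a constant."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def transform(self, X: Data, y: Labels = None) -> Tuple[Data, Labels]:
        return [x * self.factor for x in X], y


class ChainSketch:
    """Hypothetical stand-in for a chained preprocessor: applies its steps in order."""

    def __init__(self, steps) -> None:
        self.steps = list(steps)

    def transform(self, X: Data, y: Labels = None) -> Tuple[Data, Labels]:
        for step in self.steps:
            X, y = step.transform(X, y)
        return X, y


def as_single_preprocessor(preprocessor):
    """Wrap a list of preprocessors in a chain; pass a single preprocessor through."""
    if isinstance(preprocessor, list):
        return ChainSketch(preprocessor)
    return preprocessor


# A single preprocessor is used as-is; a list becomes one chained object.
single = as_single_preprocessor(IdentitySketch())
chained = as_single_preprocessor([ScaleSketch(2.0), ScaleSketch(0.5), ScaleSketch(3.0)])

X = [1.0, 2.0, 4.0]
X_same, _ = single.transform(X)      # identity: data is untouched
X_scaled, _ = chained.transform(X)   # steps applied left to right
```

Normalizing the argument once at construction means the rest of the pipeline only ever has to deal with a single preprocessor object.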

class dtaianomaly.pipeline.EvaluationPipeline(preprocessor: Preprocessor | List[Preprocessor], detector: BaseDetector, metrics: ProbaMetric | List[ProbaMetric])

Pipeline to combine a base pipeline with a set of evaluation metrics. It is used in the workflow. The given Preprocessor and BaseDetector are combined into a Pipeline object.

Parameters:
  • preprocessor (Preprocessor or list of Preprocessors) – The preprocessor, or list of preprocessors, to include in this evaluation pipeline.

  • detector (BaseDetector) – The anomaly detector to include in this evaluation pipeline.

  • metrics (ProbaMetric or list of ProbaMetric objects) – The evaluation metric, or list of evaluation metrics, to compute in this evaluation pipeline.

evaluate(y_test_: array, y_pred: array) → Dict[str, float]

Evaluate this pipeline by computing the evaluation scores.

Parameters:
  • y_test_ (array-like of shape (n_samples)) – The formatted ground truth anomaly labels.

  • y_pred (array-like of shape (n_samples)) – The predicted anomaly scores.

Returns:

performances – The computed performance metrics. The keys are string descriptors of the performance metrics, and the values are the corresponding performance scores.

Return type:

Dict[str, float]
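
To make the shape of the returned dictionary concrete, here is a minimal sketch of an evaluation step. The metric classes and the helper evaluate_sketch are hypothetical stand-ins written for this example, and the assumption that a metric exposes a compute method and a string descriptor via str() is made for illustration only; it is not taken from the dtaianomaly source.

```python
from typing import Dict, Sequence


class AreaUnderROCSketch:
    """Hypothetical stand-in metric: chance that an anomaly outscores a normal sample."""

    def __str__(self) -> str:
        return "AreaUnderROCSketch"

    def compute(self, y_true: Sequence[int], y_pred: Sequence[float]) -> float:
        positives = [s for label, s in zip(y_true, y_pred) if label == 1]
        negatives = [s for label, s in zip(y_true, y_pred) if label == 0]
        # Count correctly ordered (anomaly, normal) pairs; ties count as half.
        wins = sum(
            1.0 if p > n else 0.5 if p == n else 0.0
            for p in positives
            for n in negatives
        )
        return wins / (len(positives) * len(negatives))


class MeanAbsoluteErrorSketch:
    """Hypothetical stand-in metric: mean absolute gap between labels and scores."""

    def __str__(self) -> str:
        return "MeanAbsoluteErrorSketch"

    def compute(self, y_true: Sequence[int], y_pred: Sequence[float]) -> float:
        return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_sketch(metrics, y_test_, y_pred) -> Dict[str, float]:
    """Map each metric's string descriptor to its score."""
    return {str(metric): metric.compute(y_test_, y_pred) for metric in metrics}


y_test_ = [0, 0, 1, 0, 1]            # formatted ground truth labels
y_pred = [0.1, 0.4, 0.8, 0.2, 0.3]   # predicted anomaly scores

performances = evaluate_sketch(
    [AreaUnderROCSketch(), MeanAbsoluteErrorSketch()], y_test_, y_pred
)
for name, score in performances.items():
    print(f"{name}: {score:.3f}")
```

Keying the result by a string descriptor lets several metrics be reported side by side without the caller tracking their order.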

fit(X_train: ndarray, y_train: array | None, **kwargs) → None

Apply the fit stage of this evaluation pipeline.

Parameters:
  • X_train (array-like of shape (n_samples, n_attributes)) – The train time series data.

  • y_train (array-like of shape (n_samples) or None) – The ground truth anomaly labels of the train data. Note that, even though y_train may be None, the argument must always be passed explicitly (i.e., it has no default value).

format_y_test(X_test: ndarray, y_test: array) → array

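Format the test labels using the preprocessor in this pipeline. This is necessary when the pipeline contains a preprocessor that undersamples the data, because the labels must then be reduced in the same way to stay aligned with the predictions.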

Parameters:
  • X_test (array-like of shape (n_samples, n_attributes)) – The test time series data.

  • y_test (array-like of shape (n_samples)) – The ground truth anomaly labels of the test data.

Returns:

y_test_ – The formatted ground truth labels.

Return type:

array-like of shape (n_samples_)
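
The need for label formatting can be sketched as follows. The example assumes a preprocessor whose transform step returns both the transformed data and the transformed labels; EveryKthSampleSketch and format_y_test_sketch are hypothetical stand-ins written for this example, not the dtaianomaly implementation.

```python
from typing import List, Tuple


class EveryKthSampleSketch:
    """Hypothetical stand-in undersampler that keeps every k-th sample of X and y."""

    def __init__(self, k: int) -> None:
        self.k = k

    def transform(
        self, X: List[List[float]], y: List[int]
    ) -> Tuple[List[List[float]], List[int]]:
        # Data and labels are subsampled together so they stay aligned.
        return X[:: self.k], y[:: self.k]


def format_y_test_sketch(preprocessor, X_test, y_test) -> List[int]:
    """Run the preprocessor and keep only the transformed labels."""
    _, y_test_ = preprocessor.transform(X_test, y_test)
    return y_test_


X_test = [[0.1], [0.2], [5.0], [0.3], [0.2], [0.1]]  # n_samples = 6
y_test = [0, 0, 1, 0, 0, 0]

y_test_ = format_y_test_sketch(EveryKthSampleSketch(2), X_test, y_test)
print(len(y_test), len(y_test_))
```

With a stride of 2, a detector applied to the undersampled data would produce half as many scores as there are raw labels, so the scores can only be compared against the formatted labels.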

predict(X_test: ndarray)

Apply the predict stage of this evaluation pipeline.

Parameters:

X_test (array-like of shape (n_samples, n_attributes)) – The test time series data.

Returns:

y_pred – The predicted anomaly scores.

Return type:

array-like of shape (n_samples)