:orphan:

.. testsetup::

   X, y = None, None
   X_, y_ = None, None
   preprocessor = None
   detector = None
   y_pred = None
   pipeline = None
   thresholding = None
   y_pred_binary = None
   precision = None
   recall = None
   f_1 = None
   auc_roc, auc_pr = None, None

Anomaly detection
=================

The core functionality of ``dtaianomaly`` is to offer a simple interface
for time series anomaly detection. Below, we illustrate how anomalies can
be detected in time series using ``dtaianomaly``.

.. note::
   For clarity, some of the code has been omitted from this page. The full code
   can be found in the `anomaly detection notebook `_.

.. note::
   The example below illustrates how to detect anomalies in a simple demonstration
   time series. It is also possible to use ``dtaianomaly`` on industrial datasets,
   as shown in the `industrial anomaly detection notebook `_.

Load the data
-------------

We will illustrate how to detect anomalies with ``dtaianomaly`` using the
demonstration time series. This time series can be loaded using the
:py:func:`~dtaianomaly.data.demonstration_time_series` function and then plotted
using the :py:func:`~dtaianomaly.visualization.plot_time_series_colored_by_score`
function.

.. doctest::

   >>> from dtaianomaly.data import demonstration_time_series
   >>> from dtaianomaly.visualization import plot_time_series_colored_by_score
   >>> X, y = demonstration_time_series()
   >>> plot_time_series_colored_by_score(X, y, figsize=(10, 2))  # doctest: +SKIP

.. image:: /../notebooks/Demonstration-time-series.svg
   :align: center
   :width: 100%

Anomaly detection
-----------------

Before detecting anomalies, we can preprocess the time series. In this case, we
apply :py:class:`~dtaianomaly.preprocessing.MovingAverage` to remove some of the
noise from the time series.

.. doctest::

   >>> from dtaianomaly.preprocessing import MovingAverage
   >>> preprocessor = MovingAverage(window_size=10)

In general, `any anomaly detector `_ in ``dtaianomaly`` can be used to detect
anomalies in this time series.
Here, we use the :py:class:`~dtaianomaly.anomaly_detection.MatrixProfileDetector`.

.. doctest::

   >>> from dtaianomaly.anomaly_detection import MatrixProfileDetector
   >>> detector = MatrixProfileDetector(window_size=100)

Now that the components have been initialized, we can preprocess the time series
and detect anomalies. Note that the preprocessor returns two values: the processed
data ``X_`` and the processed ground truth ``y_``. While
:py:class:`~dtaianomaly.preprocessing.MovingAverage` does not modify the ground
truth, other preprocessors may change it slightly. For example,
:py:class:`~dtaianomaly.preprocessing.SamplingRateUnderSampler` samples both the
time series ``X`` and the labels ``y``.

.. doctest::

   >>> X_, y_ = preprocessor.fit_transform(X)
   >>> y_pred = detector.fit(X_).predict_proba(X_)

Now we can plot the data along with the anomaly scores, and see that the
predictions align well with the anomaly.

.. image:: /../notebooks/Demonstration-time-series-detected-anomalies.svg
   :align: center
   :width: 100%

Anomaly detection with a Pipeline
---------------------------------

Above, we manually preprocessed the data and detected anomalies within the
processed data. In ``dtaianomaly``, these steps can be performed automatically
using a :py:class:`~dtaianomaly.pipeline.Pipeline`. Upon initialization, we pass
the preprocessor we want to apply, as well as the detector. The fit and predict
methods will then automatically preprocess the data before detecting anomalies.
Note that it is also possible to pass a list of preprocessors to apply multiple
preprocessing steps before detecting anomalies.

.. doctest::

   >>> from dtaianomaly.pipeline import Pipeline
   >>> pipeline = Pipeline(
   ...     preprocessor=preprocessor,
   ...     detector=detector
   ... )
   >>> y_pred = pipeline.fit(X).predict_proba(X)

Quantitative evaluation
-----------------------

Besides visually checking the performance of an anomaly detector, it is also
important to quantitatively measure how accurately the anomalies are detected.
Below, we first compute the :py:class:`~dtaianomaly.evaluation.Precision` and
:py:class:`~dtaianomaly.evaluation.Recall`. Note, however, that precision and
recall require binary labels, while the predicted anomaly scores are continuous.
For this reason, we apply :py:class:`~dtaianomaly.thresholding.FixedCutoffThreshold`
thresholding to convert all scores above 0.85 to 1 ("anomaly") and the scores
below 0.85 to 0 ("normal"). At this threshold, we see that all anomalous
observations are detected (recall=1.0), at the cost of some false positives near
the borders of the ground truth anomaly (precision<1).

.. doctest::

   >>> from dtaianomaly.thresholding import FixedCutoffThreshold
   >>> from dtaianomaly.evaluation import Precision, Recall
   >>> thresholding = FixedCutoffThreshold(0.85)
   >>> y_pred_binary = thresholding.threshold(y_pred)
   >>> precision = Precision().compute(y, y_pred_binary)
   >>> recall = Recall().compute(y, y_pred_binary)

As an alternative to manually applying a threshold to convert the continuous
scores to binary predictions, you can initialize a
:py:class:`~dtaianomaly.evaluation.ThresholdMetric`, which automatically applies
a specified thresholding strategy before using a binary evaluation metric. Below,
we use the same thresholding as above, but compute the
:py:class:`~dtaianomaly.evaluation.FBeta` score with :math:`\beta = 1`.

.. doctest::

   >>> from dtaianomaly.evaluation import ThresholdMetric, FBeta
   >>> f_1 = ThresholdMetric(thresholding, FBeta(1.0)).compute(y, y_pred)

Lastly, we also compute the :py:class:`~dtaianomaly.evaluation.AreaUnderROC` and
:py:class:`~dtaianomaly.evaluation.AreaUnderPR`.
Because these metrics are computed from a curve over all possible thresholds, we
can directly pass the continuous predicted anomaly scores, as shown below.

.. doctest::

   >>> from dtaianomaly.evaluation import AreaUnderROC, AreaUnderPR
   >>> auc_roc = AreaUnderROC().compute(y, y_pred)
   >>> auc_pr = AreaUnderPR().compute(y, y_pred)

.. doctest::
   :hide:

   >>> round(precision, 2)
   0.64
   >>> round(recall, 2)
   1.0
   >>> round(f_1, 2)
   0.78
   >>> round(auc_roc, 2)
   0.99
   >>> round(auc_pr, 2)
   0.68

The table below shows the computed performance metrics for this example.

.. list-table::
   :align: center
   :widths: 25 25 25 25 25
   :header-rows: 1

   * - Precision
     - Recall
     - F1
     - AUC-ROC
     - AUC-PR
   * - 0.64
     - 1.0
     - 0.78
     - 0.99
     - 0.68
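
To make the thresholded metrics above more concrete, the following is a minimal sketch in plain Python of how a fixed cutoff turns continuous scores into binary predictions, and how precision, recall and F1 follow from those predictions. The helper names ``apply_cutoff`` and ``precision_recall_f1`` and the toy data are hypothetical and serve only as illustration; this is not how ``dtaianomaly`` implements these metrics internally. Treating a score exactly equal to the cutoff as an anomaly is also an assumption of this sketch.

```python
def apply_cutoff(scores, cutoff):
    """Map continuous anomaly scores to binary labels (1 = anomaly, 0 = normal)."""
    # Assumption of this sketch: a score equal to the cutoff counts as an anomaly.
    return [1 if score >= cutoff else 0 for score in scores]


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 from binary labels and binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator > 0 else 0.0
    return precision, recall, f1


# Hypothetical toy data: the true anomaly spans indices 3 to 5, and the
# scores are also high just outside its borders (indices 2 and 6).
toy_labels = [0, 0, 0, 1, 1, 1, 0, 0]
toy_scores = [0.10, 0.20, 0.90, 0.95, 0.99, 0.90, 0.86, 0.30]

toy_binary = apply_cutoff(toy_scores, cutoff=0.85)
toy_precision, toy_recall, toy_f1 = precision_recall_f1(toy_labels, toy_binary)
```

In this toy setting, every anomalous observation is flagged, but two points near the borders of the anomaly are flagged as well, which lowers precision while recall stays at 1.0. The same relation links the values in the table: with a precision of 0.64 and a recall of 1.0, F1 = 2 · 0.64 · 1.0 / (0.64 + 1.0) ≈ 0.78.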