Kernel Principal Component Analysis

class dtaianomaly.anomaly_detection.KernelPrincipalComponentAnalysis(window_size: str | int, stride: int = 1, **kwargs)

Anomaly detector based on Kernel Principal Component Analysis (KPCA).

Standard PCA maps the data to a lower-dimensional space through linear projections. Deviations in this lower-dimensional space are then considered to be anomalies. KPCA [hoffmann2007kernel] is a non-linear extension of PCA: it maps the data into a kernel space and learns the principal components there.
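The idea can be sketched in plain NumPy. The code below is an illustration of the reconstruction-error score described in [hoffmann2007kernel], not the PyOD or dtaianomaly implementation. The function names, the RBF kernel, and the gamma value are assumptions made for this example.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


def kpca_reconstruction_error(train, test, n_components=2, gamma=0.5):
    """Squared distance in kernel space between each test point and its
    projection onto the leading kernel principal components of train."""
    K = rbf_kernel(train, train, gamma)
    row_mean = K.mean(axis=1)
    total_mean = K.mean()

    # Centre the kernel matrix (equivalent to centring in kernel space).
    K_centered = K - row_mean[:, None] - row_mean[None, :] + total_mean

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    eigvals = eigvals[::-1][:n_components]
    eigvecs = eigvecs[:, ::-1][:, :n_components]
    # Rescale so each principal component has unit norm in kernel space.
    alphas = eigvecs / np.sqrt(eigvals)

    K_test = rbf_kernel(test, train, gamma)
    K_test_centered = (
        K_test - K_test.mean(axis=1, keepdims=True) - row_mean[None, :] + total_mean
    )
    projections = K_test_centered @ alphas

    # Squared distance to the training mean; k(z, z) = 1 for the RBF kernel.
    spherical = 1.0 - 2.0 * K_test.mean(axis=1) + total_mean
    return spherical - (projections ** 2).sum(axis=1)


rng = np.random.default_rng(0)
train = rng.normal(size=(50, 2))
outlier = np.array([[10.0, 10.0]])

train_scores = kpca_reconstruction_error(train, train)
outlier_score = kpca_reconstruction_error(train, outlier)[0]
```

A point far from the training data is almost orthogonal to every training point in kernel space, so its reconstruction error exceeds the average error of the training points.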

Notes

KPCA inherits from PyODAnomalyDetector.

Parameters:

window_size (int or str) – The window size to use for extracting sliding windows from the time series. This value will be passed to compute_window_size().
stride (int, default=1) – The stride, i.e., the step size for extracting sliding windows from the time series.
**kwargs – Arguments to be passed to the PyOD KPCA.
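How window_size and stride interact can be illustrated with a small sketch. The helper sliding_windows below is hypothetical and only shows the general idea of extracting flattened windows; the library's own windowing may differ, for instance in how it treats the end of the series.

```python
import numpy as np


def sliding_windows(x, window_size, stride=1):
    """Return an array of shape (n_windows, window_size * n_attributes)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a univariate series as one attribute
    # Start positions of all windows that fit completely in the series.
    starts = range(0, x.shape[0] - window_size + 1, stride)
    # Flatten each window so every row is one feature vector.
    return np.array([x[s:s + window_size].ravel() for s in starts])


series = np.arange(10.0)
windows = sliding_windows(series, window_size=4, stride=2)
```

With 10 samples, a window size of 4, and a stride of 2, the windows start at positions 0, 2, 4, and 6. A larger stride yields fewer windows and therefore fewer samples for the underlying detector.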

window_size_

The window size actually used by this anomaly detector.

Type: int

pyod_detector_

A PyOD KPCA detector.

Type: KPCA

Examples

>>> from dtaianomaly.anomaly_detection import KernelPrincipalComponentAnalysis
>>> from dtaianomaly.data import demonstration_time_series
>>> x, y = demonstration_time_series()
>>> kpca = KernelPrincipalComponentAnalysis(10, n_components=2).fit(x)
>>> kpca.decision_function(x)
array([0.03151377, 0.03697829, 0.04415575, ..., 0.03345565, 0.0330048 ,
       0.03089501])

References

[hoffmann2007kernel]

Heiko Hoffmann. Kernel PCA for novelty detection. Pattern Recognition, 40(3):863–874, 2007. doi: 10.1016/j.patcog.2006.07.009.

decision_function(X: ndarray) → ndarray

Compute decision scores.

Parameters:

X (array-like of shape (n_samples, n_attributes)) – Input time series.

Returns:

decision_scores – The decision scores of the anomaly detector. Higher indicates more anomalous.

Return type:

array-like of shape (n_samples)

Raises:

ValueError – If X is not a valid array.
NotFittedError – If this method is called before fitting the anomaly detector.

fit(X: ndarray, y: ndarray | None = None, **kwargs) → BaseDetector

Fit this PyOD anomaly detector on the given data.

Parameters:

X (array-like of shape (n_samples, n_attributes)) – Input time series.
y (ignored) – Not used, present for API consistency by convention.
kwargs – Additional parameters to be passed to compute_window_size().

Returns:

self – Returns the instance itself.

Return type:

PyODAnomalyDetector

Raises:

ValueError – If X is not a valid array.

predict_proba(X: ndarray) → ndarray

Predict anomaly probabilities.

Estimate the probability of a sample of X being anomalous, based on the anomaly scores obtained from decision_function by rescaling them to the range of [0, 1] via min-max scaling.

Parameters:

X (array-like of shape (n_samples, n_attributes)) – Input time series.

Returns:

anomaly_scores – 1D array with the same length as X, with values in the interval [0, 1], in which a higher value implies that the instance is more likely to be anomalous.

Return type:

array-like of shape (n_samples)

Raises:

ValueError – If X is not a valid array.
ValueError – If the prediction scores from decision_function are constant but not in the interval [0, 1], because such values cannot be transformed unambiguously into an anomaly probability.
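The min-max rescaling and the constant-score case can be sketched as follows. The function min_max_scale is a hypothetical helper written for illustration, not the library's code. Returning constant scores unchanged when they already lie in [0, 1] is an assumption.

```python
import numpy as np


def min_max_scale(scores):
    """Rescale anomaly scores to the interval [0, 1]."""
    scores = np.asarray(scores, dtype=float)
    lowest, highest = scores.min(), scores.max()
    if lowest == highest:
        # Constant scores carry no ranking information. They are only
        # interpretable if they already are valid probabilities.
        if not 0.0 <= lowest <= 1.0:
            raise ValueError("Constant scores outside [0, 1] cannot be rescaled.")
        return scores.copy()
    return (scores - lowest) / (highest - lowest)


proba = min_max_scale([0.2, 0.5, 1.1])
```

Because the scaling depends on the minimum and maximum of the scores for the given X, the resulting values are relative to that input rather than calibrated probabilities.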

save(path: str | Path) → None

Save the detector to disk as a pickle file with extension .dtai. If the given path contains subdirectories that do not exist yet, they are created.

Parameters:

path (str or Path) – Location where to store the detector.
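The described behaviour, pickling to a path and creating missing parent directories, can be sketched with the standard library alone. The helpers save_object and load_object are hypothetical and stand in for the detector's own methods. A plain dictionary replaces a fitted detector to keep the example self-contained.

```python
import pickle
import tempfile
from pathlib import Path


def save_object(obj, path):
    """Pickle obj to path, creating missing parent directories first."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_object(path):
    """Read back an object written by save_object."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    # None of the nested directories exist before saving.
    target = Path(tmp) / "models" / "kpca" / "detector.dtai"
    save_object({"window_size": 10, "stride": 1}, target)
    restored = load_object(target)
```

As with any pickle file, a saved detector should only be loaded from a trusted source, since unpickling can execute arbitrary code.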