\(K\)-Nearest Neighbor
- class dtaianomaly.anomaly_detection.KNearestNeighbors(window_size: str | int, stride: int = 1, **kwargs)
Anomaly detector based on K-nearest neighbors [ramaswamy2000efficient].
Given some distance metric \(dist\), the \(K\)-nearest neighbor of an instance \(x\) is the sample \(y\) such that there exist exactly \(K-1\) other samples \(z\) with \(dist(x, z) < dist(x, y)\). The \(K\)-nearest neighbor distance of \(x\) equals the distance to this \(K\)-th nearest neighbor. The larger this \(K\)-nearest neighbor distance of a sample is, the further away it is from the other instances. \(K\)-nearest neighbor uses this distance as an anomaly score, and thus detects distance-based anomalies.
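The definition above can be illustrated with a small, dependency-free sketch. The helper `knn_distance` and the toy data are hypothetical and only mirror the definition; they are not the PyOD implementation that this detector wraps, which operates on sliding windows rather than on raw points.

```python
import math


def knn_distance(points, query_index, k):
    """Distance from points[query_index] to its k-th nearest other point."""
    query = points[query_index]
    # Distances to every other sample (the query itself is excluded).
    distances = sorted(
        math.dist(query, other)
        for i, other in enumerate(points)
        if i != query_index
    )
    # The k-th smallest distance sits at position k - 1.
    return distances[k - 1]


# Four clustered 1-D samples and one isolated sample.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
scores = [knn_distance(points, i, k=2) for i in range(len(points))]
# The isolated sample receives the largest score.
most_anomalous = max(range(len(points)), key=lambda i: scores[i])
```

With this toy data the isolated sample at 10.0 has by far the largest \(K\)-nearest neighbor distance, which is the behavior the anomaly score relies on.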
Notes
The K-nearest neighbors detector inherits from
PyodAnomalyDetector.
- Parameters:
window_size (int or str) – The window size to use for extracting sliding windows from the time series. This value will be passed to
compute_window_size().
stride (int, default=1) – The stride, i.e., the step size for extracting sliding windows from the time series.
**kwargs – Arguments to be passed to the PyOD K-nearest neighbors detector.
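To make the roles of window_size and stride concrete, the following is a minimal sketch of sliding-window extraction. The function `sliding_windows` is a hypothetical helper written for illustration. How the library treats samples left over at the end of the series is not described here, so this sketch simply drops them.

```python
def sliding_windows(series, window_size, stride=1):
    """Return the windows of length window_size, taken every stride steps."""
    # Last index at which a full window still fits in the series.
    last_start = len(series) - window_size
    return [
        series[start:start + window_size]
        for start in range(0, last_start + 1, stride)
    ]


series = [0, 1, 2, 3, 4, 5, 6]
windows = sliding_windows(series, window_size=3, stride=2)
# windows == [[0, 1, 2], [2, 3, 4], [4, 5, 6]]
```

A larger stride yields fewer, less overlapping windows, which reduces the number of samples the detector has to compare.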
- window_size_
The window size effectively used by this anomaly detector.
- Type:
int
- pyod_detector_
A K-nearest neighbors detector of PyOD.
- Type:
KNN
Examples
>>> from dtaianomaly.anomaly_detection import KNearestNeighbors
>>> from dtaianomaly.data import demonstration_time_series
>>> x, y = demonstration_time_series()
>>> knn = KNearestNeighbors(10).fit(x)
>>> knn.decision_function(x)
array([0.2527578 , 0.26430228, 0.2728953 , ..., 0.26269151, 0.26798469, 0.26139759])
References
[ramaswamy2000efficient] Ramaswamy, Sridhar, Rajeev Rastogi, and Kyuseok Shim. “Efficient algorithms for mining outliers from large data sets.” Proceedings of the 2000 ACM SIGMOD international conference on Management of data. 2000, doi: 10.1145/342009.335437.
- decision_function(X: ndarray) → ndarray
Compute decision scores.
- Parameters:
X (array-like of shape (n_samples, n_attributes)) – Input time series.
- Returns:
decision_scores – The decision scores of the anomaly detector. Higher indicates more anomalous.
- Return type:
array-like of shape (n_samples)
- Raises:
ValueError – If X is not a valid array.
NotFittedError – If this method is called before fitting the anomaly detector.
- fit(X: ndarray, y: ndarray | None = None, **kwargs) → BaseDetector
Fit this PyOD anomaly detector on the given data.
- Parameters:
X (array-like of shape (n_samples, n_attributes)) – Input time series.
y (ignored) – Not used, present for API consistency by convention.
kwargs – Additional parameters to be passed to compute_window_size().
- Returns:
self – Returns the instance itself.
- Return type:
BaseDetector
- Raises:
ValueError – If X is not a valid array.
- predict_proba(X: ndarray) → ndarray
Predict anomaly probabilities.
Estimate the probability of a sample of X being anomalous, based on the anomaly scores obtained from decision_function by rescaling them to the range of [0, 1] via min-max scaling.
- Parameters:
X (array-like of shape (n_samples, n_attributes)) – Input time series.
- Returns:
anomaly_scores – 1D array with the same length as X, with values in the interval [0, 1], in which a higher value implies that the instance is more likely to be anomalous.
- Return type:
array-like of shape (n_samples)
- Raises:
ValueError – If scores is not a valid array.
ValueError – If the prediction scores from ‘decision_function’ are constant but not in the interval [0, 1], because such values cannot be transformed unambiguously into an anomaly probability.
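The min-max rescaling described for predict_proba can be sketched as follows. The function `min_max_scale` is a hypothetical stand-in, not the library's code. Returning constant scores unchanged when they already lie in [0, 1] is an assumption inferred from the error condition above.

```python
def min_max_scale(scores):
    """Rescale scores to [0, 1]; higher still means more anomalous."""
    lowest, highest = min(scores), max(scores)
    if lowest == highest:
        # Constant scores cannot be rescaled: the denominator would be zero.
        if 0.0 <= lowest <= 1.0:
            # Assumption: constant scores already in [0, 1] are kept as-is.
            return list(scores)
        raise ValueError("Constant scores outside [0, 1] are ambiguous.")
    return [(s - lowest) / (highest - lowest) for s in scores]


probabilities = min_max_scale([2.0, 4.0, 6.0])
# probabilities == [0.0, 0.5, 1.0]
```

Because the scaling depends on the minimum and maximum of the given scores, the resulting values are relative to the time series passed in rather than calibrated probabilities.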
- save(path: str | Path) → None
Save the detector to disk as a pickle file with extension .dtai. If the given path contains subdirectories that do not exist yet, they are created.
- Parameters:
path (str or Path) – Location where to store the detector.
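The behavior described for save, pickling an object and creating missing parent directories, can be approximated with the standard library alone. The helper `save_pickle` below is hypothetical and is not the library's implementation. It assumes the caller supplies the full file name, including the .dtai extension, and it stores a plain dictionary in place of a fitted detector.

```python
import pickle
import tempfile
from pathlib import Path


def save_pickle(obj, path):
    """Pickle obj to path, creating missing parent directories first."""
    path = Path(path)
    # Create every directory on the way to the file, if not present yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(obj, f)


with tempfile.TemporaryDirectory() as tmp:
    # Neither "models" nor "knn" exists before the call.
    target = Path(tmp) / "models" / "knn" / "detector.dtai"
    save_pickle({"window_size": 10}, target)
    with open(target, "rb") as f:
        restored = pickle.load(f)
```

As with any pickle file, the stored detector should only be loaded from sources that are trusted.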