Preprocessing module
This module contains preprocessing functionality.
>>> from dtaianomaly import preprocessing
Custom preprocessors can be implemented by extending the base Preprocessor class.
- preprocessing.check_preprocessing_inputs(X: ndarray, y: ndarray | None = None) None
Check if the given X and y arrays are valid.
- Parameters:
X (array-like of shape (n_samples, n_attributes)) – Raw time series
y (array-like, default=None) – Ground-truth information
- Raises:
ValueError – If inputs are not valid numeric arrays
ValueError – If inputs have a different size in the first dimension (n_samples)
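The two checks listed above can be illustrated with a minimal NumPy sketch. The function name `check_inputs_sketch` and its error messages are hypothetical; this is not the library's implementation, only an assumption about what such validation might look like.

```python
import numpy as np


def check_inputs_sketch(X, y=None):
    """Hypothetical stand-in for the checks described above."""
    X = np.asarray(X)
    # Reject non-numeric data such as strings or generic objects.
    if not np.issubdtype(X.dtype, np.number):
        raise ValueError("X must be a numeric array")
    if y is not None:
        y = np.asarray(y)
        if not np.issubdtype(y.dtype, np.number):
            raise ValueError("y must be a numeric array")
        # X and y must agree on the number of samples (first dimension).
        if X.shape[0] != y.shape[0]:
            raise ValueError("X and y must have the same number of samples")


check_inputs_sketch(np.zeros((5, 2)), np.zeros(5))  # valid inputs pass silently
```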
- class dtaianomaly.preprocessing.Preprocessor[source]
Base preprocessor class.
- fit(X: ndarray, y: ndarray | None = None) Preprocessor[source]
First checks the inputs with check_preprocessing_inputs(), and then fits this preprocessor.
- Parameters:
X (array-like of shape (n_samples, n_attributes)) – Raw time series
y (array-like, default=None) – Ground-truth information
- Returns:
self – Returns the fitted instance self.
- Return type:
Preprocessor
- fit_transform(X: ndarray, y: ndarray | None = None) Tuple[ndarray, ndarray | None][source]
First checks the inputs with check_preprocessing_inputs(), and then chains the fit and transform methods on the given data, i.e., first fit this preprocessor on the given X and y, after which the given X and y will be transformed.
- Parameters:
X (array-like of shape (n_samples, n_attributes)) – Raw time series
y (array-like of shape (n_samples), default=None) – Ground-truth information
- Returns:
X_transformed (np.ndarray of shape (n_samples, n_attributes)) – Preprocessed raw time series
y_transformed (np.ndarray of shape (n_samples)) – The transformed ground truth. If no ground truth was provided (y=None), then None will be returned as well.
- transform(X: ndarray, y: ndarray | None = None) Tuple[ndarray, ndarray | None][source]
First checks the inputs with check_preprocessing_inputs(), and then transforms (i.e., preprocesses) the given time series.
- Parameters:
X (array-like of shape (n_samples, n_attributes)) – Raw time series
y (array-like of shape (n_samples), default=None) – Ground-truth information
- Returns:
X_transformed (np.ndarray of shape (n_samples, n_attributes)) – Preprocessed raw time series
y_transformed (np.ndarray of shape (n_samples)) – The transformed ground truth. If no ground truth was provided (y=None), then None will be returned as well.
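The fit / transform / fit_transform contract described above can be sketched with a small stand-alone class. `MeanCentererSketch` is a hypothetical example that does not inherit from the library's Preprocessor and skips the input checks; it only mirrors the documented return values.

```python
import numpy as np


class MeanCentererSketch:
    """Hypothetical stand-in (not a dtaianomaly class) following the contract above."""

    def fit(self, X, y=None):
        # Learn per-attribute statistics from the training series.
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self  # fit returns the fitted instance

    def transform(self, X, y=None):
        # Return the transformed series together with the unchanged ground truth.
        return np.asarray(X, dtype=float) - self.mean_, y

    def fit_transform(self, X, y=None):
        # Chain fit and transform on the same data.
        return self.fit(X, y).transform(X, y)


X_train = np.array([[1.0, 10.0], [3.0, 30.0]])
X_new, y_new = MeanCentererSketch().fit_transform(X_train)
```

Because no ground truth is passed here, `y_new` is None, matching the documented behaviour for y=None.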
- class dtaianomaly.preprocessing.ChainedPreprocessor(*base_preprocessors: Preprocessor | List[Preprocessor])[source]
Wrapper chaining multiple Preprocessor objects.
- Parameters:
base_preprocessors (list of Preprocessor objects) – The preprocessors to chain. These preprocessors can be passed as a single list argument or as multiple independent arguments to the constructor.
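As a rough illustration of chaining, the sketch below applies two toy steps in sequence. The classes `AddOne` and `Double` and the helper `chained_fit_transform` are hypothetical, and the sketch assumes that each step is fitted on the output of the previous step, which the text above does not state explicitly.

```python
import numpy as np


class AddOne:
    """Hypothetical toy step: adds 1 to every value."""

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        return X + 1.0, y


class Double:
    """Hypothetical toy step: multiplies every value by 2."""

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        return X * 2.0, y


def chained_fit_transform(steps, X, y=None):
    # Fit each step on the output of the previous one, then transform with it.
    for step in steps:
        X, y = step.fit(X, y).transform(X, y)
    return X, y


X_out, _ = chained_fit_transform([AddOne(), Double()], np.array([[1.0], [2.0]]))
```

The order of the steps matters: adding one and then doubling yields a different result than doubling first.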
- class dtaianomaly.preprocessing.Identity[source]
Identity preprocessor. A dummy preprocessor which does not do any processing at all.
- class dtaianomaly.preprocessing.MinMaxScaler[source]
Rescale raw time series to the range [0, 1] via min-max scaling. The minimum and maximum are computed on a training set, after which these values can be used to transform a new time series. Therefore, there is no guarantee that the values of the transformed test set will actually be in the range [0, 1].
For multivariate time series, each attribute will be normalized independently, i.e., the minimum and maximum of each attribute in the transformed time series will be 0 and 1, respectively.
If the minimum and maximum of an attribute are the same (the time series consists of only one value), then the transformation will not do anything.
- min_
The minimum value in each attribute of the training data.
- Type:
array-like of shape (n_attributes)
- max_
The maximum value in each attribute of the training data.
- Type:
array-like of shape (n_attributes)
- Raises:
NotFittedError – If the transform method is called before fitting this MinMaxScaler.
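The behaviour described above can be reproduced with plain NumPy. This is a sketch under the assumption that constant attributes are simply left unchanged; the variable names are illustrative and the code is not the library's implementation.

```python
import numpy as np

X_train = np.array([[0.0, 5.0], [10.0, 5.0], [5.0, 5.0]])
X_test = np.array([[20.0, 7.0]])

# "Fit": per-attribute minimum and maximum of the training data.
min_ = X_train.min(axis=0)
max_ = X_train.max(axis=0)

# Attributes with min == max are constant and are left untouched.
varying = max_ > min_
X_scaled = X_test.copy()
X_scaled[:, varying] = (X_test[:, varying] - min_[varying]) / (max_[varying] - min_[varying])
```

The first attribute of the test series lies outside the training range, so its scaled value falls outside [0, 1]. The second attribute was constant during training and therefore keeps its original value.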
- class dtaianomaly.preprocessing.ZNormalizer(min_std: float = 1e-09)[source]
Rescale to zero mean, unit variance.
The mean and standard deviation are computed on a training set, after which these values can be used to transform a new time series. Therefore, there is no guarantee that the values of the transformed test set will actually have zero mean and unit variance.
For multivariate time series, each attribute will be normalized independently, i.e., the mean and std of each attribute in the transformed time series will be 0.0 and 1.0, respectively.
- Parameters:
min_std (float, default = 1e-9) – The minimum std required to actually Z-normalize an attribute. If the standard deviation is below this value, then no normalization will be applied. This prevents amplifying noise in the data.
- mean_
The mean value in each attribute of the training data.
- Type:
array-like of shape (n_attributes)
- std_
The standard deviation in each attribute of the training data.
- Type:
array-like of shape (n_attributes)
- Raises:
NotFittedError – If the transform method is called before fitting this ZNormalizer.
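A minimal NumPy sketch of per-attribute Z-normalization with a min_std threshold follows. It assumes the population standard deviation (NumPy's default, ddof=0) and leaves attributes below the threshold unchanged; whether the library uses the same estimator is an assumption.

```python
import numpy as np

min_std = 1e-9
X_train = np.array([[1.0, 3.0], [3.0, 3.0], [5.0, 3.0]])

# "Fit": per-attribute mean and standard deviation of the training data.
mean_ = X_train.mean(axis=0)
std_ = X_train.std(axis=0)

# Only attributes whose standard deviation reaches min_std are normalized.
normalize = std_ >= min_std
X_norm = X_train.copy()
X_norm[:, normalize] = (X_train[:, normalize] - mean_[normalize]) / std_[normalize]
```

The second attribute is constant, so its standard deviation is below min_std and its values are returned as-is.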
- class dtaianomaly.preprocessing.MovingAverage(window_size: int)[source]
Computes the moving average of a time series. This is the unweighted average of the observations within a window.
To compute the moving average at time \(t\), the window is centered at position \(t\). For an odd window size, the number of measurements taken before and after \(t\) is equal, namely (window_size - 1) / 2. For an even window size, one additional observation is taken before \(t\) to ensure a correct window size.
For multivariate time series, the moving average is computed within each attribute independently.
- Parameters:
window_size (int) – Length of the window in which the average should be computed.
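The window placement described above can be sketched as follows. The function `centered_moving_average` is hypothetical, and truncating the window at the start and end of the series is an assumption, since the text does not specify how boundaries are handled.

```python
import numpy as np


def centered_moving_average(x, window_size):
    """Hypothetical sketch; windows are truncated at the edges (an assumption)."""
    x = np.asarray(x, dtype=float)
    before = window_size // 2       # one extra observation before t for even sizes
    after = (window_size - 1) // 2
    out = np.empty_like(x)
    for t in range(len(x)):
        start = max(0, t - before)
        stop = min(len(x), t + after + 1)
        # axis=0 averages each attribute independently for multivariate input.
        out[t] = x[start:stop].mean(axis=0)
    return out


smoothed = centered_moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window_size=3)
```

For window_size=3 the interior points average one neighbour on each side; for window_size=4 the window at \(t\) covers two observations before \(t\) and one after.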
- class dtaianomaly.preprocessing.ExponentialMovingAverage(alpha: float)[source]
Compute exponential moving average. For a given input \(x\), the exponential moving average \(y\) is computed as
\[\begin{split}y_0 &= x_0 \\ y_t &= \alpha \cdot x_t + (1 - \alpha) \cdot y_{t-1}\end{split}\]
with \(0 < \alpha < 1\) the smoothing factor. Lower values of \(\alpha\) result in more smoothing, because less weight is placed on the current observation \(x_t\).
- Parameters:
alpha (float) – The decaying factor to be used in the exponential moving average.
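The recurrence above translates directly into a short loop. The function name `exponential_moving_average` is illustrative; this is a sketch of the formula, not the library's code.

```python
import numpy as np


def exponential_moving_average(x, alpha):
    """Direct implementation of the recurrence y_t = alpha * x_t + (1 - alpha) * y_{t-1}."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]  # y_0 = x_0
    for t in range(1, len(x)):
        y[t] = alpha * x[t] + (1 - alpha) * y[t - 1]
    return y


ema = exponential_moving_average([0.0, 10.0, 10.0], alpha=0.5)
```

With alpha=0.5, the output moves halfway towards each new observation: 0.0, then 5.0, then 7.5.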
- class dtaianomaly.preprocessing.SamplingRateUnderSampler(sampling_rate: int)[source]
Undersample time series with sampling rate sampling_rate. This means that every sampling_rate-th element is taken from the time series. After undersampling, only a fraction of roughly 1/sampling_rate of the original samples will remain.
- Parameters:
sampling_rate (int) – The rate at which the time series should be sampled.
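Undersampling at a fixed rate corresponds to strided slicing in NumPy. The sketch below assumes that sampling starts at the first observation and that the ground truth is undersampled in the same way; both are assumptions rather than documented behaviour.

```python
import numpy as np

sampling_rate = 3
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 1, 0, 0, 0, 0, 0])

# Keep one observation out of every sampling_rate, starting at index 0 (assumption).
X_under = X[::sampling_rate]
y_under = y[::sampling_rate]
```

Here 4 of the 10 original samples remain, at indices 0, 3, 6 and 9.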
- class dtaianomaly.preprocessing.NbSamplesUnderSampler(nb_samples: int)[source]
Undersample time series such that exactly nb_samples samples remain in the transformed time series. This makes it possible to manually set the size of the transformed time series, independent of the size of the original time series.
- Parameters:
nb_samples (int) – The number of samples remaining.
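One way to obtain a fixed number of samples is to select evenly spaced indices. The sketch below uses `np.linspace` for this; how the library actually chooses which samples to keep is not stated above, so the selection strategy is an assumption.

```python
import numpy as np

nb_samples = 4
X = np.arange(10, dtype=float).reshape(-1, 1)

# Pick nb_samples evenly spaced positions between the first and last observation (assumption).
indices = np.linspace(0, X.shape[0] - 1, nb_samples, dtype=int)
X_under = X[indices]
```

The result always has nb_samples rows, regardless of the length of the input series.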