CustomDataLoader

class dtaianomaly.data.CustomDataLoader(test_path: str | Path, train_path: str | Path = None, do_caching: bool = False)

A data loader for loading custom data.

The training and testing data are located in separate files, both of which must be readable through pandas.read_csv(path). The test data must contain a column named ‘label’ that marks the anomalies (1 for anomaly, 0 for normal). The test data may contain an optional column ‘time’, which is interpreted as the time step of each observation. All other columns are assumed to be part of the time series data. The ‘label’ column is optional for the training set; if it is not present, the training data is assumed to be completely normal. The training data may also contain an optional ‘time’ column, as in the test data. All remaining columns are time series data. The column names of the training and test sets must match exactly, although their order may differ.
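The file layout described above can be illustrated with a small sketch that writes a matching pair of CSV files using only the standard library. The column names ‘temperature’ and ‘pressure’, the sample values, and the helpers write_csv, read_header and data_columns are hypothetical and not part of dtaianomaly; only the reserved names ‘time’ and ‘label’ come from the description above.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, header, rows):
    """Write a header row followed by the data rows (hypothetical helper)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_header(path):
    """Return the column names found on the first line of a CSV file."""
    with open(path, newline="") as f:
        return next(csv.reader(f))


def data_columns(header):
    """Everything except the reserved 'time' and 'label' columns is time series data."""
    return {name for name in header if name not in ("time", "label")}


tmp_dir = Path(tempfile.mkdtemp())
test_path = tmp_dir / "test.csv"
train_path = tmp_dir / "train.csv"

# Test file: the 'label' column is required (1 = anomaly, 0 = normal), 'time' is optional.
write_csv(
    test_path,
    ["time", "temperature", "pressure", "label"],
    [[0, 20.1, 1.01, 0], [1, 20.3, 1.02, 0], [2, 35.7, 0.80, 1]],
)

# Training file: no 'label' column (assumed completely normal), columns in a different order.
write_csv(
    train_path,
    ["pressure", "time", "temperature"],
    [[1.00, 0, 20.0], [1.01, 1, 20.2]],
)

test_header = read_header(test_path)
train_header = read_header(train_path)

labels_present = "label" in test_header
columns_match = data_columns(test_header) == data_columns(train_header)
```

Files written this way are plain comma-separated text with a header row, so they should be readable through pandas.read_csv(path), and the two paths can then be given to CustomDataLoader as test_path and train_path.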

Parameters:

test_path : str
    The path at which the test data is located.
train_path : str, default=None
    The path at which the train data is located. If None, then there will be no training data in the loaded dataset.
do_caching : bool, default=False
    Whether to cache the loaded data or not.

Examples

>>> from dtaianomaly.data import CustomDataLoader
>>> train_path = "path-to-training-data.csv"
>>> test_path = "path-to-testing-data.csv"
>>> data_set_train_and_test = CustomDataLoader(test_path, train_path).load()
>>> data_set_only_test = CustomDataLoader(test_path).load()  # No training data

load() → DataSet

Load the time series.

Load the dataset. If do_caching is True, the loaded data is saved in the cache when no cache is available yet, and the cached data is returned.

Returns:

DataSet: The loaded dataset.
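The caching behaviour described for load() follows a common load-once pattern, which the sketch below illustrates. The class CachingLoaderSketch, its read_fn argument and its read_count counter are hypothetical and exist only for this illustration; this is not the dtaianomaly implementation, which may store its cache differently.

```python
class CachingLoaderSketch:
    """Hypothetical stand-in that mimics the documented do_caching behaviour."""

    def __init__(self, read_fn, do_caching=False):
        self.read_fn = read_fn        # callable that actually reads the data
        self.do_caching = do_caching
        self._cache = None            # filled on the first load when caching is on
        self.read_count = 0           # counts how often the data was really read

    def load(self):
        # Return the cached data if caching is enabled and a cache exists.
        if self.do_caching and self._cache is not None:
            return self._cache
        data = self.read_fn()
        self.read_count += 1
        # Save the freshly loaded data only when caching is enabled.
        if self.do_caching:
            self._cache = data
        return data


cached = CachingLoaderSketch(lambda: [1.0, 2.0, 3.0], do_caching=True)
first = cached.load()
second = cached.load()    # served from the cache, no second read

uncached = CachingLoaderSketch(lambda: [1.0, 2.0, 3.0], do_caching=False)
uncached.load()
uncached.load()           # reads the data again
```

With caching enabled, repeated calls return the same object without reading the data again, which is mainly useful when the same loader is used several times in one run.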