UCRLoader
- class dtaianomaly.data.UCRLoader(path: str | Path, do_caching: bool = False)
Lazy dataloader for the UCR suite of anomaly detection data sets [39].
The UCR time series anomaly archive consists of 250 time series, published to mitigate four common issues in existing time series anomaly detection benchmarks: (1) Triviality: many benchmarks are easily solved without any fancy algorithms; (2) Unrealistic anomaly density: the number of ground truth anomalies is relatively high, even though anomalies should be rare observations; (3) Mislabeling: the ground truth labels might not be perfectly aligned with the actual anomalies in the data; (4) Run-to-failure bias: most anomalies are located near the end of the time series.
- Parameters:
- path : str
The path at which the data set is located.
- do_caching : bool, default=False
Whether to cache the loaded data or not.
Notes
This implementation expects the file names to contain the start and stop time stamps of the single anomaly in the time series as:
*_<train-test-split>_<start>_<stop>.txt.
Examples
>>> from dtaianomaly.data import UCRLoader
>>> path_to_ucr = "001_UCR_Anomaly_DISTORTED1sddb40_35000_52000_52620.txt"
>>> ucr_data_set = UCRLoader(path_to_ucr).load()
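The file name convention described in the Notes can be illustrated with a minimal sketch. The helper `parse_ucr_file_name` below is hypothetical and not part of dtaianomaly; it only assumes that the last three underscore-separated fields of the file name are integers. How `UCRLoader` itself parses the name may differ.

```python
from pathlib import Path


def parse_ucr_file_name(path):
    """Hypothetical helper (not part of dtaianomaly): extract the
    train-test split, anomaly start, and anomaly stop from a file name
    of the form ``*_<train-test-split>_<start>_<stop>.txt``."""
    # Path.stem drops the directory and the ".txt" extension. Splitting
    # from the right keeps underscores in the leading part of the name.
    _prefix, split, start, stop = Path(path).stem.rsplit("_", 3)
    return int(split), int(start), int(stop)


train_test_split, start, stop = parse_ucr_file_name(
    "001_UCR_Anomaly_DISTORTED1sddb40_35000_52000_52620.txt"
)
print(train_test_split, start, stop)  # 35000 52000 52620
```

A file name with fewer than four underscore-separated fields, or with non-integer trailing fields, raises a `ValueError` in this sketch, which makes a misnamed file fail early rather than yield wrong labels.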