On Heterogeneous Ensembles for Anomaly Detection - Evaluation Tests
Creators
- 1. TU Wien
Description
Context and methodology
This repository accompanies our study on heterogeneous ensembles for anomaly detection. Anomaly detection algorithms vary in their implicit assumptions and definitions of anomalies. When the distribution and characteristics of anomalies are largely unknown, combining diverse methods can improve robustness and overall detection performance. Motivated by this, we focus on ensembles that combine heterogeneous approaches, addressing three key aspects: the impact of method combination, the effect of score normalization and aggregation, and the influence of evaluation metrics on ensemble design.
We evaluate 14 established anomaly detection algorithms: ABOD, K-NN, SDO, LODA, HBOS, OCSVM, PCA, IForest, LOF, GLOSH, AE, DeepSVDD, and two feature bagging ensembles nesting K-NN (FBkNN) and LOF (FBLOF). The evaluation uses two collections of structured tabular datasets: GLCA (100 synthetic datasets) and ADBench-Classic (47 real datasets). We additionally carry out validation tests with the CIC-IDS2017 dataset.
Experiments test different ensemble configurations, including varying ensemble sizes, homogeneous versus heterogeneous ensembles, and multiple score normalization and aggregation strategies. This repository provides the datasets, algorithm implementations in Python, and experiment configurations used in our study. It enables replication of our experiments, further evaluation, and comparison of anomaly detection methods and ensemble strategies. It is framed within the broader research domains of anomaly detection, ensemble learning, unsupervised learning, machine learning, and data mining.
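To make the idea of score normalization and aggregation concrete, the following minimal sketch combines the raw scores of two detectors into one ensemble score. It is an illustration only: the function names, the toy score values, and the choice of z-score and min-max normalization with mean and max aggregation are assumptions made for this example, not necessarily the strategies or code used in the study.

```python
from statistics import mean, pstdev


def zscore(scores):
    """Standardize one detector's scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # A constant score vector carries no ranking information.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def minmax(scores):
    """Rescale one detector's scores to the interval [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def aggregate(score_lists, normalize=zscore, combine=mean):
    """Normalize each detector's scores, then combine them per data point.

    score_lists: one list of scores per detector, all of equal length.
    normalize:   function applied to each detector's score list.
    combine:     function applied to the normalized scores of one data point.
    """
    normalized = [normalize(scores) for scores in score_lists]
    return [combine(point_scores) for point_scores in zip(*normalized)]


# Hypothetical raw scores of two detectors on four points (higher = more anomalous).
# The two detectors use very different scales, which is why normalization matters.
detector_a = [0.1, 0.2, 0.15, 0.9]
detector_b = [10.0, 12.0, 11.0, 30.0]

ensemble_avg = aggregate([detector_a, detector_b], normalize=minmax, combine=mean)
ensemble_max = aggregate([detector_a, detector_b], normalize=zscore, combine=max)
```

Without normalization, the detector with the larger score range would dominate any average, so the normalization step determines how much each member contributes to the ensemble.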
Docker
This repository includes the Docker files required to build a containerized test environment, as well as a pre-built Docker image that is ready to use.
Technical details
Experiments are conducted in Python v3.11. The file and folder structure is as follows:
- "AD_ensembles_Apr2026.tar" contains the pre-built Docker container image to run experiments in a secure, tested environment.
- "docker.zip" contains the Docker configuration files, including Python dependencies, needed to build the Docker environment of the pre-built image.
- "HetEns_Apr2026.zip" contains the source files with datasets (ADBench and GLCA collections) and Python scripts.
- "results_Apr2026.zip" contains results, performance files, log files, tables, figures, and all extracted statistics and material of the experiments as published in the paper.
Most algorithms used are implementations from the PyOD library/repository (https://pyod.readthedocs.io/en/latest/index.html), presented in:
Han, S., Hu, X., Huang, H., Jiang, M., & Zhao, Y. (2022). ADBench: Anomaly detection benchmark. Advances in Neural Information Processing Systems, 35, 32142-32159.
License
The CC-BY license applies to all data generated with the GLCA tool.
All distributed code is under the GPLv3+ license.
ADBench datasets were originally published in the ADBench repository (https://github.com/Minqi824/ADBench/tree/main), in particular in the Classical collection (https://github.com/Minqi824/ADBench/tree/main/adbench/datasets/Classical).
These datasets are © the original authors and are licensed under the BSD 2-Clause "Simplified" License. No endorsement by the original authors is implied.
The CIC-IDS2017 dataset (Kaggle version by Laurens D'hooge, https://www.kaggle.com/datasets/dhoogla/cicids2017) was originally published by the Canadian Institute for Cybersecurity (CIC), University of New Brunswick. This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.
Files
Files (9.3 GiB)
| MD5 checksum | Size |
|---|---|
| md5:e04d6e5f0df79d70090ec7efb1fa3a7a | 8.8 GiB |
| md5:eaec4a907a21a8edd2061a4ff6119522 | 1.6 KiB |
| md5:aee38d94a07fc9aa39fbcedac2d4ffd5 | 359.5 MiB |
| md5:32667d399a0b7fbafd1e42181fac5c3e | 1.5 KiB |
| md5:df5028d58e823a3c6cf74058b1b6c9e0 | 614 Bytes |
| md5:b59a561aa0649c4fedccaea5f6126e74 | 4.2 KiB |
| md5:f53ffd225ae18a368e485dfa9b1d55b5 | 214.5 MiB |
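The md5 digests listed in the table can be used to check that a download is complete and uncorrupted. The following is a minimal sketch using only the Python standard library. The helper names and the placeholder path are hypothetical, and since this page does not state which digest belongs to which file, the expected value must be taken from the record for the file actually downloaded.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1024 * 1024):
    """Return the hexadecimal md5 digest of a binary stream, read in chunks.

    Chunked reading keeps memory use low for multi-gigabyte archives.
    """
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_md5):
    """Return True if the file at `path` matches `expected_md5`.

    A leading "md5:" prefix, as shown in the table, is accepted and ignored.
    """
    expected = expected_md5.lower().removeprefix("md5:")
    with open(path, "rb") as handle:
        return md5_of_stream(handle) == expected


# Self-contained demonstration on an in-memory stream instead of a real download.
demo_digest = md5_of_stream(io.BytesIO(b"hello"))

# Hypothetical usage with a downloaded file (path and digest are placeholders):
# ok = verify_file("path/to/downloaded_file", "md5:<digest from the table>")
```

A mismatch usually indicates an interrupted or corrupted download, in which case the file should be fetched again before building or loading the Docker image.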
Additional details
Related works
- Is supplement to
- Journal Article: 10.1016/j.eswa.2026.133468 (DOI)