Code, datasets, and experimental results for reusing and replicating the paper:
Félix Iglesias, Tanja Zseby, Conrado Martínez, Arthur Zimek. On Heterogeneous Ensembles for Anomaly Detection: Empirical Insights and Guidelines for the Design. Expert Systems with Applications, 2026, 133468. DOI: 10.1016/j.eswa.2026.133468
Jun, 2026
To replicate the experiments, make sure the following tools are installed:
If you want to try the code with your own data, you must either:
run.sh script accordingly, or
We recommend placing your own dataset under the data folder, e.g.:
data/mycollection/
Each dataset must be a tabular CSV or NPZ file. See the load_dataset function in modules/utils.py for details on the expected format.
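
As a rough illustration of preparing your own data, the sketch below writes a small toy dataset as a tabular CSV file under a collection folder. The column names (`f0`, `f1`, ..., `label`), the 0/1 label encoding, and the file name `mydata.csv` are assumptions made for this example only and are not taken from the repository; check load_dataset in modules/utils.py for the layout the code actually expects and adapt the sketch accordingly.

```python
import csv
import random
from pathlib import Path


def write_toy_csv(folder, name="mydata.csv", n_samples=100, n_features=4, seed=0):
    """Write a toy tabular dataset as CSV (hypothetical layout, see lead-in)."""
    rng = random.Random(seed)
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / name
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Assumed header: one column per feature plus a final label column.
        writer.writerow([f"f{i}" for i in range(n_features)] + ["label"])
        for _ in range(n_samples):
            features = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            # Assumed encoding: 1 = anomaly, 0 = normal (about 5% anomalies).
            label = 1 if rng.random() < 0.05 else 0
            writer.writerow(features + [label])
    return path
```

Calling `write_toy_csv("data/mycollection")` would create data/mycollection/mydata.csv, matching the folder suggested above.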
compiled_image/ # Precompiled Docker image
docker/ # Files to build the image and container
README.md # This file
HetEns_Apr2026.zip # Source code and datasets
results_Apr2026.zip # All performances, tables and images generated
# and shown in the paper
First of all, unzip HetEns_Apr2026.zip in place to create the HetEns_Apr2026/ folder.
You need sudo/root privileges to run the following commands.
$ docker load -i AD_ensembles_Apr2026.tar
/docker$ make build
/docker$ make run # Runs run.sh
The complete process may take a few days, depending on your machine.
/docker$ make test # Runs run_test.sh
/docker$ make shell
/docker$ make clean
/docker$ make nuke # Force deletion
When running run.sh, the following folders will be created or overwritten:
data/globloc/
results/
results/globloc/
results/adbench/
results/cic2017bin/
Each subfolder contains:

| Folder           | Description                                                                                  |
| ---------------- | -------------------------------------------------------------------------------------------- |
| ad_performances/ | Tables with anomaly detection performance per dataset, algorithm, and metric.                |
| ad_scores/       | Raw anomaly detection scores per algorithm and dataset.                                      |
| summaries/       | Tables, figures, measurements, statistical tests, CDD, etc., extracted from all experiments. |
| metadata/        | (Only in globloc) Meta-data from the data generation process.                                |
| results/         | Extracted statistics and performance related to all the different ensemble options tested.   |
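
To check what a run produced, a small helper along these lines could count the files in each output subfolder. It is only a sketch: the collection and subfolder names are the ones listed above, while the function name `summarize_results` and the assumption that each subfolder sits directly under results/<collection>/ are illustrative and should be verified against your own output.

```python
from pathlib import Path

# Collection and subfolder names as listed in this README.
COLLECTIONS = ("globloc", "adbench", "cic2017bin")
EXPECTED_SUBFOLDERS = ("ad_performances", "ad_scores", "summaries", "metadata", "results")


def summarize_results(root="results"):
    """Return {collection: {subfolder: number of files}} for folders that exist."""
    root = Path(root)
    summary = {}
    for collection in COLLECTIONS:
        base = root / collection
        if not base.is_dir():
            continue  # this collection has not been generated (yet)
        counts = {}
        for sub in EXPECTED_SUBFOLDERS:
            folder = base / sub
            if folder.is_dir():
                # Count regular files, including those in nested folders.
                counts[sub] = sum(1 for p in folder.rglob("*") if p.is_file())
        summary[collection] = counts
    return summary
```

Run from the folder that contains results/, `summarize_results()` returns a nested dictionary; missing collections or subfolders are simply left out, so an empty dictionary means nothing has been generated yet.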