Published February 9, 2024 | Version 1.0.0
Dataset Open

Trusted Research Environments: Analysis of Characteristics and Data Availability

  • 1. TU Wien

Description

Trusted Research Environments (TREs) enable analysis of sensitive data under strict security assertions that protect the data, through technical, organizational, and legal measures, from being leaked (even accidentally) outside the facility. While many TREs exist in Europe, little information is publicly available on their architecture, their building blocks, and the slight technical variations among them. To shed light on these problems, we give an overview of existing, publicly described TREs and a bibliography linking to their system descriptions. We further analyze their technical characteristics, especially their commonalities and variations, and provide insight into their data type characteristics and availability. Our literature study shows that 47 TREs worldwide provide access to sensitive data, of which two-thirds provide data themselves, predominantly via secure remote access. Statistical offices make available the majority of the sensitive data records included in this study.

Methodology

We performed a literature study covering 47 TREs worldwide using scholarly databases (Scopus, Web of Science, IEEE Xplore, ScienceDirect), a computer science library (dblp.org), Google, and grey literature, focusing on retrieving the following source material:

  • Peer-reviewed articles where available,
  • TRE websites,
  • TRE metadata catalogs.

The goal of this literature study is to discover existing TREs and to analyze their characteristics and data availability, giving an overview of the available infrastructure for sensitive data research, as many European initiatives have emerged in recent months.

Technical details

This dataset consists of five comma-separated values (.csv) files describing our inventory:

  • countries.csv: Table of countries with columns id (number), name (text) and code (text, ISO 3166-1 alpha-3 code, optional)
  • tres.csv: Table of TREs with columns id (number), name (text), countryid (number, referring to column id of table countries), structureddata (bool, optional), datalevel (one of [1=de-identified, 2=pseudonymized, 3=anonymized], optional), outputcontrol (bool, optional), inceptionyear (date, optional), records (number, optional), datatype (one of [1=claims, 2=linked records], optional), statistics_office (bool), size (number, optional), source (text, optional), comment (text, optional)
  • access.csv: Table of access modes of TREs with columns id (number), suf (bool, optional), physical_visit (bool, optional), external_physical_visit (bool, optional), remote_visit (bool, optional)
  • inclusion.csv: Table of TREs considered for inclusion in the literature study with columns id (number), included (bool), exclusion reason (one of [peer review, environment, duplicate], optional), comment (text, optional)
  • major_fields.csv: Table of data categorization into the major research fields with columns id (number), life_sciences (bool, optional), physical_sciences (bool, optional), arts_and_humanities (bool, optional), social_sciences (bool, optional).
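The column layout described above can be read with standard CSV tooling. The following is a minimal sketch, not the authors' analysis code: it assumes the files carry a header row with the documented column names, and it uses hypothetical sample rows (the TRE names and values are invented for illustration) covering only a subset of the columns. It resolves countryid against countries.csv and decodes the datalevel codes.

```python
import csv
import io

# Hypothetical sample rows that follow the documented column layout.
# They are NOT taken from the dataset; a header row is assumed.
COUNTRIES_CSV = """id,name,code
1,Austria,AUT
2,Germany,DEU
"""

TRES_CSV = """id,name,countryid,datalevel
1,Example TRE A,1,2
2,Example TRE B,2,
"""

# Code mapping as documented for the datalevel column.
DATALEVEL = {1: "de-identified", 2: "pseudonymized", 3: "anonymized"}


def read_rows(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def join_tres_with_countries(tres_rows, country_rows):
    """Resolve countryid to a country name and decode datalevel."""
    countries = {row["id"]: row for row in country_rows}
    joined = []
    for tre in tres_rows:
        country = countries.get(tre["countryid"])
        level = tre["datalevel"]  # optional column, may be empty
        joined.append({
            "name": tre["name"],
            "country": country["name"] if country else None,
            "datalevel": DATALEVEL[int(level)] if level else None,
        })
    return joined


result = join_tres_with_countries(read_rows(TRES_CSV), read_rows(COUNTRIES_CSV))
for row in result:
    print(row)
```

To run this against the published files, the in-memory strings would be replaced by the contents of countries.csv and tres.csv; how optional and boolean values are encoded in those files should be checked first.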

Additionally, a schema definition (.sql) file for MariaDB (10.5 or higher) is needed to model the database schema:

  • schema.sql: Schema definition file to create the tables and views used in the analysis.

The analysis was carried out in a Jupyter Notebook, which can be found in our source code repository: https://gitlab.tuwien.ac.at/martin.weise/tres/-/blob/master/analysis.ipynb

Files

Files (11.0 KiB)

Checksum Size
md5:23151c2fa6ecc45d0b4d973cd13a5baf 609 Bytes
md5:755d4491e578e846eb076d5a4b939d84 332 Bytes
md5:050571a817fc70dfe7d3a11e1a56a0fe 665 Bytes
md5:704eda134897285d94a231aac8ef683c 617 Bytes
md5:002b01bd67eb251bd85fb0755809de27 3.2 KiB
md5:05b35fce7a0c726f9824a3ff428dbee9 5.6 KiB
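
The MD5 checksums listed in this section can be used to verify a downloaded file. The following is a minimal sketch using Python's standard hashlib module; the helper names md5_of_file and matches_checksum are hypothetical, and because the listing does not say which checksum belongs to which file, the caller has to supply the matching pair.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as matches_checksum("tres.csv", "md5:...") returns True only if the local file is byte-identical to the published one. MD5 is suitable here for detecting transfer errors, not for protecting against deliberate tampering.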

Additional details