Published October 27, 2025 | Version v1
Dataset Open

Dataset for "APD direct detection receiver OEIC operating 14.1 dB above the shot noise quantum limit"

Description

Overview

This repository provides measurement data and evaluated data related to our manuscript "APD direct detection receiver OEIC operating 14.1 dB above the shot noise quantum limit" by Simon Michael Laube, Christoph Gasser, Kerstin Schneider-Hornstein, and Horst Zimmermann, published in Optics Express, 2025, DOI: 10.1364/OE.577195.

Context

In our study, we present the design and experimental verification of two optoelectronic integrated circuits (OEICs). The main difference between the OEICs is the photodiode. The two photodiodes are:

  1. Avalanche photodiode, denoted by "APD"
  2. p-i-n photodiode, denoted by "PIN"

We measured the transient output voltage of the OEICs over a range of optical input powers with an oscilloscope and stored the waveforms in HDF5 files. The bit error probability (BER) was evaluated from the transient measurements by post-processing in Python, as explained in our manuscript. A data rate of 100 Mb/s with 80% return-to-zero (RZ) on-off keying (OOK) modulation was used for all BER measurements.

File structure

All measurement data is provided separately for each OEIC sample. Two samples were measured for each photodiode. Please note that the OEIC samples have individual sample identifiers that are part of the file names. The sample identifiers are:

  1. APD 1:  C4_4
  2. APD 2:  B4_4
  3. PIN 1:  G2_2
  4. PIN 2:  H2_2

The main folders /BER, /powermeter, and /waveforms are provided.

Waveforms

/waveforms contains raw waveform (transient measurement) data. Waveforms are stored as HDF5 files (.h5 file ending) that contain an internal file system with metadata and data, generated by the oscilloscope. HDF5 files can be read using the free HDFView program, the h5py Python library, or other software.

The internal file system within our HDF5 files has the following structure:
/FileType/KeysightH5FileType
/Frame/TheFrame
/Waveforms
    /Channel 1/Channel 1 Data
    /Channel 2/Channel 2 Data
    /Channel 3/Channel 3 Data
    /Function 1/Function 1 Data

KeysightH5FileType and TheFrame are oscilloscope metadata. The Channel 1 sub-folder contains metadata and Channel 1 Data. Channel 1 Data is the raw waveform data of the pseudo-random bit sequence (PRBS) that was used as the input signal of our OEICs. The Channel 2 sub-folder contains metadata and Channel 2 Data. The Channel 3 sub-folder contains metadata and Channel 3 Data. Channel 2 Data and Channel 3 Data are the raw waveform data of the OEIC output voltage. The output voltage is the difference between Channel 3 Data and Channel 2 Data, that is
    output voltage = Channel 3 Data - Channel 2 Data
The Function 1 sub-folder contains metadata and Function 1 Data. Function 1 Data is the OEIC output voltage computed on the oscilloscope that was used for debugging only. Not all HDF5 files may contain the Function 1 sub-folder.
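
A minimal sketch of how the output voltage could be recomputed with h5py, following the internal paths listed above. The function names are hypothetical and not part of our evaluation script. The sketch assumes that the datasets already hold voltages; if they hold unscaled oscilloscope samples, the scaling stored in the channel metadata would have to be applied first.

```python
WAVEFORM_GROUP = "/Waveforms"


def dataset_path(source):
    """Internal HDF5 path of a waveform dataset, e.g. source='Channel 2'."""
    return f"{WAVEFORM_GROUP}/{source}/{source} Data"


def load_output_voltage(h5_path):
    """Return Channel 3 Data - Channel 2 Data for one waveform file.

    Assumption: both datasets are 1-D arrays of equal length.
    """
    import h5py  # third-party; imported here so dataset_path() works without it

    with h5py.File(h5_path, "r") as f:
        # [()] reads the whole dataset into a NumPy array
        ch2 = f[dataset_path("Channel 2")][()].astype(float)
        ch3 = f[dataset_path("Channel 3")][()].astype(float)
    return ch3 - ch2
```

The same helper gives the path of the reference PRBS via dataset_path("Channel 1"). Since Function 1 may be missing from some files, recomputing the difference from Channel 2 and Channel 3 is the safer route.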

The file name structure of the HDF5 files is
  <sample identifier>_<measurement identifier>x<VSUB>x1x<optical power identifier>x1.h5

Here, the sample identifier is the same as explained above; the measurement identifier is an arbitrary text or number; VSUB is the substrate bias identifier; and the optical power identifier connects the power measurement (see below) with the corresponding waveform. Note that the VSUB field can be any number. For the actual voltage values of VSUB, please refer to the manuscript. As an example, the file "G2_2_17Lx1x1x5x1.h5" is the raw waveform of the PIN 1 sample, measurement "17L", VSUB identifier 1, recorded for the 5th optical power setting.
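
The file names can be split into their fields with a regular expression. The following is a sketch only: the pattern is derived from the example file name "G2_2_17Lx1x1x5x1.h5", the names SAMPLES and parse_name are hypothetical, and it is assumed that VSUB and the optical power identifier are plain integers.

```python
import re

# Sample identifiers listed in this description
SAMPLES = {"C4_4": "APD 1", "B4_4": "APD 2", "G2_2": "PIN 1", "H2_2": "PIN 2"}

# Assumption: VSUB and the optical power identifier are plain integers.
_NAME = re.compile(
    r"(?P<sample>C4_4|B4_4|G2_2|H2_2)_(?P<measurement>.+?)"
    r"x(?P<vsub>\d+)x1x(?P<power_id>\d+)x1\.(?P<ext>h5|log|png)"
)


def parse_name(file_name):
    """Split a waveform, log, or image file name into its fields."""
    m = _NAME.fullmatch(file_name)
    if m is None:
        raise ValueError(f"unexpected file name: {file_name!r}")
    return {
        "sample": m["sample"],
        "label": SAMPLES[m["sample"]],
        "measurement": m["measurement"],
        "vsub": int(m["vsub"]),
        "power_id": int(m["power_id"]),
        "ext": m["ext"],
    }


print(parse_name("G2_2_17Lx1x1x5x1.h5"))
```

The returned power_id is the key that links a waveform to its row in the optical power CSV file.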

 

Optical Power

/powermeter contains the raw optical power measurement results, as well as the calibration factor that was used to calculate the power incident on the chip. In other words,
  chip power = raw power * calibration factor

The optical power measurements are provided in CSV files (.csv file ending), separately for each OEIC and measurement. Within the CSV files, the first column is the optical power identifier of the measurement (see above), and the second column is the respective raw optical power. The file name structure of the CSV files is
  power_<sample identifier>_<measurement identifier>.csv

The calibration factor is provided in a TXT file (.txt file ending), separately for each OEIC and measurement. The TXT file contains only a single floating-point number that is the calibration factor. The file name structure of the TXT file is
  calibration_<sample identifier>_<measurement identifier>.txt

Here, the sample identifier and measurement identifier are the same as for the waveform files, as explained above. For example, the files "calibration_G2_2_17L.txt" and "power_G2_2_17L.csv" correspond to all waveforms with the prefix "G2_2_17L", such as the abovementioned "G2_2_17Lx1x1x5x1.h5".
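
A short sketch of how the chip power could be computed from one CSV/TXT pair. The function names are hypothetical. The sketch assumes comma-separated columns and integer optical power identifiers, and it skips lines that do not parse as numbers (for example a possible header line). The result is in the same unit as the raw power.

```python
import csv
from pathlib import Path


def read_calibration(txt_path):
    """Read the single floating-point calibration factor from the TXT file."""
    return float(Path(txt_path).read_text().strip())


def read_chip_power(csv_path, calibration):
    """Map optical power identifier -> power incident on the chip.

    Assumptions: comma-separated columns, integer identifiers; lines that do
    not parse (for example a possible header line) are skipped.
    """
    chip_power = {}
    with open(csv_path, newline="") as fh:
        for row in csv.reader(fh):
            try:
                power_id, raw_power = int(row[0]), float(row[1])
            except (ValueError, IndexError):
                continue
            # chip power = raw power * calibration factor
            chip_power[power_id] = raw_power * calibration
    return chip_power
```

For the example above, read_chip_power("power_G2_2_17L.csv", read_calibration("calibration_G2_2_17L.txt"))[5] would give the chip power that belongs to "G2_2_17Lx1x1x5x1.h5".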

Bit error probability

/BER contains the evaluated bit error probability of the OEICs. All files within this folder are generated from the raw data provided in /waveforms and /powermeter, using our Python script. Three file types are provided for each sample and data rate:

  1. Log files (.log file ending) that document the result of the evaluation. These log files were used to plot Fig. 5 in our manuscript.
  2. Image files (.png file ending) that illustrate the result of the evaluation.
  3. A CSV file that contains a results summary (.csv file ending).

The log file contains metadata about the evaluation process, the evaluation result (BER), as well as the input file (waveform) and output files of the evaluation. Note that the .tab output files are not provided because they were only used for debugging of our Python script. While most of the log file contents should be self-explanatory, some require special attention:

  • In the "User settings" section, we provide settings for the evaluation of the reference PRBS (Channel 1 Data in the HDF5 files). The boolean flag "PRBS inverted" shows whether the PRBS waveform was processed as is or was logically inverted. The "PRBS detection threshold" is the threshold voltage that was used to digitize the (analog) PRBS waveform. Because the SNR of the PRBS is very high, the exact threshold value is not critical and was auto-detected by our Python script. The "PRBS detection offset" marks the start of the PRBS with respect to the recorded waveform. This is necessary because the recording may start at an arbitrary time, so the first recorded bit is incomplete. The start of the PRBS was auto-detected by our Python script using rising-edge detection. The "PRBS detection delay" shows at which time instant each PRBS bit is sampled, with respect to the start of a bit. Typically, the bit should be sampled at the center of the pulse. For 100 Mb/s with 80% RZ modulation, the pulse occupies 8 ns of the 10 ns bit period, so its center is 4 ns (= PRBS detection delay) after the start of the bit.
  • In the "Results" section, the result and the optimized settings for the evaluation of the chip output (Channel 2 Data and Channel 3 Data in the HDF5 files) are provided. "Decision threshold" is the threshold (voltage) for bit decision. "CDS delta time" is the time between the two sample instants of correlated double sampling (CDS). "Best BER" is the BER result. The "Static delay" is the coarse delay between PRBS and chip output waveform, given in multiples of the bit period (10 ns at 100 Mb/s). The "Inter-bit delay" is the fine delay between PRBS and the chip output waveform, which is always less than the bit period. The sum of static delay and inter-bit delay is the total delay between PRBS and the chip output waveform.
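
The timing quantities described above can be illustrated with a few lines of Python. This is a sketch under stated assumptions and not our evaluation script: all function names are hypothetical, times are taken in nanoseconds, the static delay is taken as an integer number of bit periods, and the CDS step is left out.

```python
BIT_PERIOD_NS = 10.0  # bit period at 100 Mb/s


def prbs_sample_times_ns(offset_ns, delay_ns, n_bits, bit_period_ns=BIT_PERIOD_NS):
    """Time instants at which PRBS bits 0..n_bits-1 are sampled.

    offset_ns: start of the first complete bit ("PRBS detection offset")
    delay_ns:  sampling instant within a bit ("PRBS detection delay")
    """
    return [offset_ns + k * bit_period_ns + delay_ns for k in range(n_bits)]


def digitize(voltages, threshold, inverted=False):
    """Turn sampled PRBS voltages into bits; optionally invert them logically."""
    return [int((v > threshold) != inverted) for v in voltages]


def total_delay_ns(static_delay_bits, inter_bit_delay_ns, bit_period_ns=BIT_PERIOD_NS):
    """Total delay between PRBS and chip output: static plus inter-bit delay."""
    return static_delay_bits * bit_period_ns + inter_bit_delay_ns
```

For example, with a detection delay of 4 ns, consecutive PRBS bits are sampled 10 ns apart, each at the center of its 8 ns pulse.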

The file name structure of the log files is
  <sample identifier>_<measurement identifier>x<VSUB>x1x<optical power identifier>x1.log
The file name structure of the image files is
  <sample identifier>_<measurement identifier>x<VSUB>x1x<optical power identifier>x1.png

The results summary CSV contains all optical power settings, the BER results, and the underlying dataset file names.

The file name structure of the CSV is
  BER_<sample>_100Mbps_80RZ.csv
where sample is APD_1, APD_2, PIN_1, or PIN_2, as defined above.
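
A small sketch for locating and reading the results summary. The function names are hypothetical. The column names of the summary CSV are not listed in this description, so the sketch only separates the first row from the remaining rows, assuming that the first row is a header and that the columns are comma-separated.

```python
import csv

SUMMARY_SAMPLES = ("APD_1", "APD_2", "PIN_1", "PIN_2")


def summary_file_name(sample):
    """File name of the results summary for one sample."""
    if sample not in SUMMARY_SAMPLES:
        raise ValueError(f"unknown sample: {sample!r}")
    return f"BER_{sample}_100Mbps_80RZ.csv"


def read_summary(csv_path):
    """Return (header, rows) of a summary CSV.

    Assumption: the first non-empty row is a header row.
    """
    with open(csv_path, newline="") as fh:
        rows = [row for row in csv.reader(fh) if row]
    if not rows:
        return [], []
    return rows[0], rows[1:]
```

Printing the returned header is a quick way to see which columns hold the optical power, the BER, and the underlying file names.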

Other data

Some raw data is given in the manuscript in tabular form. This data is not included in this dataset.



Licensing

The dataset consists of raw measurement data and processed data.
Raw data is licensed under the Creative Commons Zero 1.0 Universal (CC0) license.
Processed data is copyrighted and licensed under the Creative Commons Attribution 4.0 International (CC-BY) license.
All metadata is licensed under the Creative Commons Attribution 4.0 International (CC-BY) license.

The following list shows the license attached to the individual files:

  • All files and sub-folders within /waveforms: CC0 license
  • All files and sub-folders within /powermeter: CC0 license
  • All files and sub-folders within /BER: CC-BY license
  • /README.txt: CC-BY license

Files

Dataset_APD_direct_detection_receiver_OEIC_operating_14_1dB_above_the_shot_noise_quantum_limit.zip

Additional details

Related works

Is part of
Journal Article: 10.1364/OE.577195 (DOI)

Funding

FWF Austrian Science Fund
Ultra-sensitive PIN and avalanche photodiode receivers P34649