Published October 17, 2025 | Version v1
Dataset Open

Dataset for "Highly-Sensitive Integrating Optical Receiver With Large PIN Photodiode"

Description

Overview

This repository provides measurement data and evaluated data related to our manuscript "Highly-Sensitive Integrating Optical Receiver With Large PIN Photodiode" by Simon Michael Laube, Christoph Gasser, Kerstin Schneider-Hornstein, and Horst Zimmermann, published in IEEE Photonics Journal, 2024, DOI: 10.1109/JPHOT.2024.3487302.

Context

In our study, we present the design and experimental verification of three optoelectronic integrated circuits (OEICs). The main difference between the OEICs is the integrated photodiode. The three photodiodes are:

  1. 7WPD, a 7-dot honeycomb PIN photodiode
  2. 3X3PD, a 3-by-3 matrix PIN photodiode
  3. 6X6PD, a 6-by-6 matrix PIN photodiode

We measured the responsivity and capacitance of the photodiodes. Moreover, we measured the transient output voltage of the OEICs across optical input power with an oscilloscope, and stored the waveforms in HDF5 files. The bit error probability (BER) was evaluated from the transient measurements using post-processing in Python, as explained in our manuscript. A data rate of 20 Mb/s with 80% return-to-zero (RZ) on-off keying (OOK) modulation was used for all BER measurements.

File structure

All measurement data is provided separately for each of the three photodiodes. Please note that the OEIC samples have individual sample identifiers that are part of the file names. The sample identifiers are:

  1. 7WPD:   H6_1
  2. 3X3PD:  D3_7
  3. 6X6PD:  E6_6

The main folders /BER, /powermeter, /responsivity, and /waveforms are provided.

Waveforms

/waveforms contains raw waveform (transient measurement) data. Waveforms are stored as HDF5 files (.h5 file ending) that contain an internal file system with metadata and data, generated by the oscilloscope. HDF5 files can be read using the free HDFView program, h5py Python library, or other software.

The internal file system within our HDF5 files has the following structure:
/FileType/KeysightH5FileType
/Frame/TheFrame
/Waveforms
    /Channel 1/Channel 1 Data
    /Channel 2/Channel 2 Data

KeysightH5FileType and TheFrame are oscilloscope metadata. The Channel 1 sub-folder contains metadata and Channel 1 Data. Channel 1 Data is the raw waveform data of the pseudo-random bit sequence (PRBS) that was used as the input signal of our OEICs. The Channel 2 sub-folder contains metadata and Channel 2 Data. Channel 2 Data is the raw waveform data of the OEIC output voltage.
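
The internal structure above can be accessed with h5py roughly as follows. This is a minimal sketch, not the script used for the manuscript: the function names are hypothetical, and the sketch does not say whether the stored values are already in volts or must be scaled using the channel metadata, so please check the metadata before interpreting amplitudes.

```python
def channel_data_path(channel: int) -> str:
    """Return the internal HDF5 path of a channel's raw waveform dataset."""
    name = f"Channel {channel}"
    return f"/Waveforms/{name}/{name} Data"


def read_waveforms(h5_path):
    """Return (prbs, oeic_output) read from one waveform file.

    Hypothetical helper: h5py is a third-party library and is imported
    lazily so that channel_data_path() works without it.
    """
    import h5py

    with h5py.File(h5_path, "r") as f:
        prbs = f[channel_data_path(1)][()]    # Channel 1: PRBS reference
        output = f[channel_data_path(2)][()]  # Channel 2: OEIC output voltage
    return prbs, output
```

Calling read_waveforms() on a file from /waveforms loads both channels completely into memory, which may be slow for long recordings.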

The file name structure of the HDF5 files is
  <sample identifier>_<measurement identifier>x1x<optical power identifier>x<vG identifier>.h5

Here, the sample identifier is the same as explained above; the measurement identifier is an arbitrary text/number; the optical power identifier connects the power measurement (see below) with the corresponding waveform; and the vG identifier connects the reference voltage setting (see below) with the corresponding waveform. For example, the file "E6_6_01Hx1x10x5.h5" is the raw waveform of the 6X6PD OEIC, measurement "01H", recorded for the 10th optical power setting and 5th vG setting. Note that the vG settings are not consistent across measurements, e.g. setting number 5 does not always correspond to the same voltage.
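
The file name structure can be split into its parts with a regular expression. The following is a sketch under the assumptions that the power and vG identifiers are decimal integers and that the literal "x1x" separator is fixed; the function name and dictionary keys are hypothetical. The same pattern also covers the .log and .png files in /BER, which use the same naming.

```python
import re

# Sample identifiers and their photodiodes, as listed in this README.
SAMPLE_TO_PHOTODIODE = {"H6_1": "7WPD", "D3_7": "3X3PD", "E6_6": "6X6PD"}

_NAME_RE = re.compile(
    r"^(?P<sample>H6_1|D3_7|E6_6)_(?P<measurement>.+)"
    r"x1x(?P<power>\d+)x(?P<vg>\d+)\.(?P<ext>h5|log|png)$"
)


def parse_name(file_name):
    """Split a dataset file name into its identifiers (hypothetical helper)."""
    m = _NAME_RE.match(file_name)
    if m is None:
        raise ValueError(f"unexpected file name: {file_name!r}")
    return {
        "sample": m["sample"],
        "photodiode": SAMPLE_TO_PHOTODIODE[m["sample"]],
        "measurement": m["measurement"],
        "power_id": int(m["power"]),  # optical power identifier
        "vg_id": int(m["vg"]),        # vG identifier
        "extension": m["ext"],
    }
```

For the example above, parse_name("E6_6_01Hx1x10x5.h5") yields sample "E6_6", photodiode "6X6PD", measurement "01H", power identifier 10, and vG identifier 5.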

Optical Power

/powermeter contains the raw optical power measurement results, as well as the calibration factor that was used to calculate the power incident on the chip. In other words,
  chip power = raw power * calibration factor.

The optical power measurements are provided in CSV files (.csv file ending), separately for each OEIC and measurement. Within the CSV files, the first column is the optical power identifier of the measurement (see above), and the second column is the respective raw optical power. The file name structure of the CSV files is
  power_<sample identifier>_<measurement identifier>.csv

The calibration factor is provided in a TXT file (.txt file ending), separately for each OEIC and measurement. The TXT file contains only a single floating point number that is the calibration factor. The file name structure of the TXT file is
  calibration_<sample identifier>_<measurement identifier>.txt

Here, the sample identifier and measurement identifier are the same as for the waveform files, as explained above. For example, the files "calibration_E6_6_01H.txt" and "power_E6_6_01H.csv" correspond to all waveforms with the prefix "E6_6_01H", such as the abovementioned "E6_6_01Hx1x10x5.h5".
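
The chip power calculation described above can be sketched as follows. The sketch assumes comma-separated values and integer power identifiers, and it skips any line that does not parse as numbers (for example a header line, if present); these are assumptions, not documented properties of the files. The function names are hypothetical, and the power unit is whatever unit the CSV file uses.

```python
import csv
import io


def read_raw_power(csv_text):
    """Map optical power identifier -> raw optical power from CSV content."""
    raw = {}
    for row in csv.reader(io.StringIO(csv_text)):
        try:
            raw[int(row[0])] = float(row[1])
        except (ValueError, IndexError):
            continue  # assumed header or blank line: skip it
    return raw


def chip_power(csv_text, calibration_text):
    """Apply: chip power = raw power * calibration factor."""
    factor = float(calibration_text.strip())
    return {pid: p * factor for pid, p in read_raw_power(csv_text).items()}
```

In practice, csv_text and calibration_text would be the contents of a matching pair such as "power_E6_6_01H.csv" and "calibration_E6_6_01H.txt".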

Bit error probability

/BER contains the evaluated bit error probability of the OEICs. All files within this folder are generated from the raw data provided in /waveforms and /powermeter, using our Python script. Three file types are provided for each photodiode:

  1. Log files (.log file ending) that document the result of the evaluation. These log files were used to plot Fig. 9 in our manuscript.
  2. Image files (.png file ending) that illustrate the result of the evaluation, similar to Fig. 10 in our manuscript.
  3. A CSV file that contains a results summary (.csv file ending).

The log file contains metadata about the evaluation process, the evaluation result (BER), as well as the input file (waveform) and output files of the evaluation. Note that the .tab output files are not provided because they were only used for debugging of our Python script. While most of the log file contents should be self-explanatory, some require special attention:

  • In the "User settings" section we provide settings for the evaluation of the reference PRBS (Channel 1 Data in the HDF5 files). The boolean flag "PRBS inverted" shows whether the PRBS waveform was processed as is or was logically inverted. The "PRBS detection threshold" is the threshold voltage that was used to digitize the (analog) PRBS waveform. Because the SNR of the PRBS is very high, the exact threshold is not critical and was auto-detected by our Python script. The "PRBS detection offset" marks the start of the PRBS with respect to the recorded waveform. This is necessary because the recording may start at an arbitrary time, so the first recorded bit may be incomplete. The start of the PRBS was auto-detected by our Python script using rising-edge detection. The "PRBS detection delay" shows at which time instant each PRBS bit is sampled, with respect to the start of a bit. Typically, the bit should be sampled at the center of the pulse. For 20 Mb/s with 80% RZ modulation, the bit period is 50 ns and the pulse lasts 40 ns, so the center is 20 ns (= PRBS detection delay) after the start of the bit.
  • In the "Results" section, the result and the optimized settings for the evaluation of the chip output (Channel 2 Data in the HDF5 files) are provided. "Decision threshold" is the threshold (voltage) for bit decision. "CDS delta time" is the time between the two sample instants of correlated double sampling (CDS). "Best BER" is the BER result. The "Static delay" is the coarse delay between PRBS and chip output waveform, given in multiples of the bit period (50 ns at 20 Mb/s). The "Inter-bit delay" is the fine delay between PRBS and chip output waveform, which is always less than the bit period. The sum of static delay and inter-bit delay is the total delay between PRBS and chip output waveform.
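
The timing quantities in the two items above are related by simple arithmetic, sketched below. The function names are hypothetical, and the sketch assumes that all times are in seconds, that the RZ pulse starts at the beginning of the bit, and that the static delay is an integer number of bit periods; please check the units used in the log files before applying it.

```python
def bit_period(data_rate):
    """Bit period in seconds for a data rate in bit/s."""
    return 1.0 / data_rate


def rz_center_delay(data_rate, duty_cycle):
    """Center of the RZ pulse, measured from the start of the bit.

    Assumes the pulse occupies the first duty_cycle fraction of the bit.
    """
    return 0.5 * duty_cycle * bit_period(data_rate)


def total_delay(static_delay_bits, inter_bit_delay, data_rate):
    """Total delay = static delay (in bit periods) + inter-bit delay (in s)."""
    return static_delay_bits * bit_period(data_rate) + inter_bit_delay
```

For 20 Mb/s and 80% RZ, bit_period(20e6) corresponds to the 50 ns bit period and rz_center_delay(20e6, 0.8) to the 20 ns detection delay quoted above.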

The file name structure of the log files is
  <sample identifier>_<measurement identifier>x1x<optical power identifier>x<vG identifier>.log
The file name structure of the image files is
  <sample identifier>_<measurement identifier>x1x<optical power identifier>x<vG identifier>.png

The results summary CSV contains all optical power and vG settings, the BER results, and the underlying dataset file names. For the meaning of vG, please refer to Fig. 4 of our manuscript.

The file name structure of the CSV is
  BER_<photodiode>_20Mbps_80RZ.csv
where the photodiode is 7WPD, 3X3PD, or 6X6PD, as defined above.

Responsivity

/responsivity contains the raw spectral responsivity data of the photodiodes, which is plotted in Fig. 6 of our manuscript. A single CSV file (.csv file ending) is provided for each photodiode. The first column of the CSV file is the optical wavelength (lambda); the second column is the measured responsivity (R).

The file name structure of the CSV files is
  responsivity_<photodiode>.csv
where the photodiode is 7WPD, 3X3PD, or 6X6PD, as defined above.
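
A responsivity file can be read into two lists as sketched below. The sketch assumes comma-separated values and skips any line that does not parse as two numbers (for example a header line, if present); the function names are hypothetical, and the units of both columns are those used in the CSV files.

```python
import csv
import io

PHOTODIODES = ("7WPD", "3X3PD", "6X6PD")


def responsivity_file(photodiode):
    """Build the CSV file name for one photodiode."""
    if photodiode not in PHOTODIODES:
        raise ValueError(f"unknown photodiode: {photodiode!r}")
    return f"responsivity_{photodiode}.csv"


def parse_responsivity(csv_text):
    """Return (wavelengths, responsivities) from the CSV content."""
    wavelengths, responsivities = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        try:
            lam, resp = float(row[0]), float(row[1])
        except (ValueError, IndexError):
            continue  # assumed header or blank line: skip it
        wavelengths.append(lam)
        responsivities.append(resp)
    return wavelengths, responsivities
```

The two returned lists have equal length and can be passed directly to a plotting library to reproduce a curve of R over lambda.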

Capacitance

Because the raw photodiode capacitance data is given in the manuscript, no data is provided in this dataset.


Licensing

The dataset consists of raw measurement data and processed data.
Raw data is licensed under the Creative Commons Zero 1.0 Universal (CC0) license.
Processed data is copyrighted and licensed under the Creative Commons Attribution 4.0 International (CC-BY) license.
All metadata is licensed under the Creative Commons Attribution 4.0 International (CC-BY) license.

The following list shows the license attached to the individual files:

  • All files and sub-folders within /waveforms: CC0 license
  • All files and sub-folders within /powermeter: CC0 license
  • All files and sub-folders within /BER: CC-BY license
  • All files and sub-folders within /responsivity: CC0 license
  • /README.txt: CC-BY license

Files

Dataset_Highly_Sensitive_Integrating_Optical_Receiver_With_Large_PIN_Photodiode.zip

Files (73.0 GiB)

Additional details

Related works

Is part of
Journal Article: 10.1109/JPHOT.2024.3487302 (DOI)

Funding

FWF Austrian Science Fund
Ultra-sensitive PIN and avalanche photodiode receivers P34649