BED: Biometric EEG dataset
- Arnau-González, Pablo (1)
- Katsigiannis, Stamos (2)
- Arevalillo-Herráez, Miguel (3)
- Ramzan, Naeem (1)

(1) University of the West of Scotland
(2) Durham University
(3) Universitat de València
Publisher: Zenodo
Year of publication: 2020
Type: Dataset
Abstract
<strong>The BED dataset</strong><br>
<strong>Version 1.0.0</strong>

<strong>Please cite as:</strong> Arnau-González, P., Katsigiannis, S., Arevalillo-Herráez, M., Ramzan, N., "BED: A new dataset for EEG-based biometrics", IEEE Internet of Things Journal, vol. 8, no. 15, pp. 12219-12230, 2021.

<strong>Disclaimer</strong><br>
While every care has been taken to ensure the accuracy of the data included in the BED dataset, the authors and the University of the West of Scotland, Durham University, and Universitat de València make no guarantees and disclaim all responsibility and all liability (including, without limitation, liability in negligence) for all expenses, losses, damages (including indirect or consequential damage) and costs which you might incur as a result of the provided data being inaccurate or incomplete in any way and for any reason. 2020, University of the West of Scotland, Scotland, United Kingdom.

<strong>Contact</strong><br>
For inquiries regarding the BED dataset, please contact:<br>
Dr Pablo Arnau-González, arnau.pablo [*AT*] gmail.com<br>
Dr Stamos Katsigiannis, stamos.katsigiannis [*AT*] durham.ac.uk<br>
Prof. Miguel Arevalillo-Herráez, miguel.arevalillo [*AT*] uv.es<br>
Prof. Naeem Ramzan, Naeem.Ramzan [*AT*] uws.ac.uk

<strong>Dataset summary</strong><br>
BED (Biometric EEG Dataset) is a dataset specifically designed for testing EEG-based biometric approaches that use relatively inexpensive consumer-grade devices, in this case the Emotiv EPOC+. The dataset includes EEG responses from 21 subjects to 12 different stimuli, across 3 chronologically disjoint sessions. We also included stimuli aimed at eliciting different affective states, so as to facilitate future research on the influence of emotions on EEG-based biometric tasks. In addition, we provide a baseline performance analysis to outline the potential of consumer-grade EEG devices for subject identification and verification.
It must be noted that, in this work, EEG data were acquired in a controlled environment in order to reduce the variability in the acquired data stemming from external conditions. The stimuli include:
- Images selected to elicit specific emotions
- Mathematical computations (2-digit additions)
- Resting-state with eyes closed
- Resting-state with eyes open
- Visual Evoked Potentials at 2, 5, 7, 10 Hz - Standard checker-board pattern with pattern reversal
- Visual Evoked Potentials at 2, 5, 7, 10 Hz - Flashing with a plain colour, set as black

For more details regarding the experimental protocol and the design of the dataset, please refer to the associated publication: Arnau-González, P., Katsigiannis, S., Arevalillo-Herráez, M., Ramzan, N., "BED: A new dataset for EEG-based biometrics", IEEE Internet of Things Journal, vol. 8, no. 15, pp. 12219-12230, 2021.

<strong>Dataset structure and contents</strong><br>
The BED dataset contains EEG recordings from 21 subjects, acquired during 3 similar sessions per subject, spaced one week apart. The BED dataset includes:
- The raw EEG recordings with no pre-processing and the log files of the experimental procedure, in text format
- The EEG recordings with no pre-processing, segmented, structured and annotated according to the presented stimuli, in Matlab format
- The features extracted from each EEG segment, as described in the associated publication

The dataset is organised in 3 folders: RAW, RAW_PARSED, and Features.

<strong>RAW/</strong> Contains the RAW files.<br>
<strong>RAW/sN/</strong> Contains the RAW files associated with subject <em>N</em>.<br>
Each folder <strong>sN</strong> contains the following files:<br>
<strong>- sN_s1.csv, sN_s2.csv, sN_s3.csv</strong> -- Files containing the EEG recordings for subject <em>N</em> and sessions 1, 2, and 3, respectively. These files contain 39 columns:<br>
COUNTER INTERPOLATED F3 FC5 AF3 F7 T7 P7 O1 O2 P8 T8 F8 AF4 FC6 F4 ...UNUSED DATA...
UNIX_TIMESTAMP<br>
<strong>- subject_N_session_1_time_X.log, subject_N_session_2_time_X.log, subject_N_session_3_time_X.log</strong> -- Log files containing the sequence of events for subject <em>N</em> and sessions 1, 2, and 3, respectively.

<strong>RAW_PARSED/</strong><br>
Contains Matlab files named <strong>sN_sM.mat</strong>, holding the recordings for subject <em>N</em> in session <em>M</em>. Each file consists of two variables:<br>
<strong>- <em>recording</em></strong>: size (time@256Hz x 17). Columns: COUNTER INTERPOLATED F3 FC5 AF3 F7 T7 P7 O1 O2 P8 T8 F8 AF4 FC6 F4 UNIX_TIMESTAMP<br>
<strong>- <em>events</em></strong>: cell array of size (events x 3): START_UNIX END_UNIX ADDITIONAL_INFO<br>
START_UNIX is the UNIX timestamp at which the event starts.<br>
END_UNIX is the UNIX timestamp at which the event ends.<br>
ADDITIONAL_INFO contains a struct with additional information regarding the specific event: in the case of the images, the expected score and the voted score; in the case of the cognitive task, the input; in the case of the VEP, the pattern and the frequency; etc.

<strong>Features/</strong><br>
<strong>Features/Identification/</strong><br>
<strong>Features/Identification/[ARRC|MFCC|SPEC]/</strong>: Each of these folders contains the extracted features, ready for classification, for each of the stimuli. Each file consists of the following variables:<br>
<strong><em>- feat</em></strong>: the feature matrix, N x number of features<br>
<strong><em>- Y</em></strong>: the label matrix, N x 2 (the #subject and the #session)<br>
<strong><em>- INFO</em></strong>: details about the event, same as ADDITIONAL_INFO<br>
<strong>Features/Verification/</strong>: This folder consists of 3 different files, each with a different set of extracted features.
Each file consists of one struct array composed of:<br>
<strong><em>- data</em></strong>: the time-series features, as described in the paper<br>
<strong><em>- y</em></strong>: the #subject<br>
<strong><em>- stimuli</em></strong>: the stimuli by name<br>
<em><strong>- session</strong></em>: the #session<br>
<strong><em>- INFO</em></strong>: details about the event<br>
The features provided are in sequential order, i.e., index 1, index 2, etc. are sequential in time if they belong to the same stimulus.

<strong>Additional information</strong><br>
For additional information regarding the creation of the BED dataset, please refer to the associated publication: Arnau-González, P., Katsigiannis, S., Arevalillo-Herráez, M., Ramzan, N., "BED: A new dataset for EEG-based biometrics", IEEE Internet of Things Journal, vol. 8, no. 15, pp. 12219-12230, 2021.
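As a starting point, the RAW session CSVs described above can be loaded with pandas and reduced to the 14 named EEG channels. This is a minimal sketch, not part of the dataset itself: the `load_raw_session` helper is hypothetical, the exact CSV header row is an assumption, and a tiny in-memory CSV stands in for a real file such as `RAW/s1/s1_s1.csv`.

```python
import io
import pandas as pd

# The 14 EEG channel columns named in the RAW file description above.
# The remaining columns (COUNTER, INTERPOLATED, unused data, UNIX_TIMESTAMP)
# are dropped here.
EEG_CHANNELS = ["F3", "FC5", "AF3", "F7", "T7", "P7", "O1",
                "O2", "P8", "T8", "F8", "AF4", "FC6", "F4"]

def load_raw_session(path_or_buffer):
    """Load one raw session CSV and keep only the EEG channel columns."""
    df = pd.read_csv(path_or_buffer)
    return df[EEG_CHANNELS]

# Synthetic one-sample CSV standing in for e.g. RAW/s1/s1_s1.csv.
csv = io.StringIO(
    "COUNTER,INTERPOLATED," + ",".join(EEG_CHANNELS) + ",UNIX_TIMESTAMP\n"
    "0,0," + ",".join(["4200.0"] * 14) + ",1577836800.0\n"
)
eeg = load_raw_session(csv)
print(eeg.shape)  # (rows, 14): one row per sample, one column per channel
```

With the real files, pass the path (e.g. `"RAW/s1/s1_s1.csv"`) instead of the `StringIO` buffer.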
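For the RAW_PARSED files, a typical operation is slicing the `recording` matrix by an event's START_UNIX/END_UNIX interval via the UNIX_TIMESTAMP column (the 17th column). The sketch below assumes that layout and the stated 256 Hz sampling rate; a synthetic array stands in for a real file, which would be loaded with something like `scipy.io.loadmat("RAW_PARSED/s1_s1.mat")`. The `segment` helper is hypothetical.

```python
import numpy as np

FS = 256  # sampling rate stated in the dataset description

# Synthetic (time@256Hz x 17) recording standing in for the "recording"
# variable of a RAW_PARSED .mat file: 4 s of zeros, with the last column
# filled in as the UNIX_TIMESTAMP.
n = FS * 4
recording = np.zeros((n, 17))
recording[:, 16] = 1000.0 + np.arange(n) / FS

def segment(recording, start_unix, end_unix):
    """Return the rows whose timestamp falls inside [start_unix, end_unix)."""
    ts = recording[:, 16]
    return recording[(ts >= start_unix) & (ts < end_unix)]

# Extract a two-second "event", as if taken from one row of the events array.
seg = segment(recording, 1001.0, 1003.0)
print(seg.shape)  # (512, 17): 2 s at 256 Hz, all 17 columns
```

With real data, `start_unix` and `end_unix` would come from the first two entries of a row of the `events` cell array.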
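Finally, the `feat`/`Y` layout of the identification features lends itself to a session-disjoint split: train on sessions 1-2, test on session 3, mirroring the chronologically disjoint sessions. This is a sketch under the documented shapes only (`feat`: N x features, `Y`: N x 2 with subject and session); the random matrices below stand in for a real file loaded from `Features/Identification/[ARRC|MFCC|SPEC]/`, whose individual file names are not specified here.

```python
import numpy as np

# Synthetic stand-ins for the documented variables:
# feat is N x n_features, Y is N x 2 (#subject, #session).
rng = np.random.default_rng(0)
feat = rng.normal(size=(30, 8))                     # 30 segments, 8 features
Y = np.column_stack([np.repeat(np.arange(10), 3),   # subject ids 0..9
                     np.tile([1, 2, 3], 10)])       # sessions 1..3

# Session-disjoint split: fit on sessions 1-2, evaluate on session 3.
train = Y[:, 1] < 3
X_train, y_train = feat[train], Y[train, 0]
X_test, y_test = feat[~train], Y[~train, 0]
print(X_train.shape, X_test.shape)  # (20, 8) (10, 8)
```

Splitting by session rather than at random avoids leaking within-session correlations between training and test data, which is the realistic setting for biometric identification.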