detection_of_IoT_botnet_attacks_N_BaIoT
Donated on 3/18/2018
This dataset addresses the lack of public botnet datasets, especially for the IoT. It provides *real* traffic data, gathered from 9 commercial IoT devices that were actually infected with Mirai and BASHLITE.
Dataset Characteristics
Multivariate, Sequential
Subject Area
Computer Science
Associated Tasks
Classification, Clustering
Feature Type
Real
# Instances
7062606
# Features
-
Dataset Information
Additional Information
(a) Attribute being predicted:
-- Originally we aimed to distinguish between benign and malicious traffic data by means of anomaly detection techniques.
-- However, as the malicious data can be divided into 10 attacks carried out by 2 botnets, the dataset can also be used for multi-class classification: 10 classes of attacks, plus 1 class of 'benign'.
(b) The study's results:
-- For each of the 9 IoT devices we trained and optimized a deep autoencoder on 2/3 of its benign data (i.e., the training set of each device). This was done to capture normal network traffic patterns.
-- The test data of each device comprised the remaining 1/3 of benign data plus all the malicious data. On each test set we applied the respective trained (deep) autoencoder as an anomaly detector. The detection of anomalies (i.e., the cyberattacks launched from each of the above IoT devices) concluded with 100% TPR.
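The evaluation protocol in (b) can be sketched roughly as follows. This is an illustration only, not the authors' code: the deep autoencoder is swapped for a much simpler stand-in detector (mean absolute z-score against benign training statistics), the data are synthetic, and the names `split_benign`, `fit_detector`, `anomaly_score`, as well as the threshold rule, are assumptions made for the example.

```python
import random
import statistics


def split_benign(rows):
    """Use the first 2/3 of the benign rows for training, the rest for testing."""
    cut = len(rows) * 2 // 3
    return rows[:cut], rows[cut:]


def fit_detector(train_rows):
    """Stand-in for the autoencoder: per-feature mean and std of benign data."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # avoid division by zero
    return means, stds


def anomaly_score(row, model):
    """Mean absolute z-score; plays the role of the reconstruction error."""
    means, stds = model
    return sum(abs(x - m) / s for x, m, s in zip(row, means, stds)) / len(row)


# Synthetic stand-in data (NOT the real dataset): 3 features per row.
rng = random.Random(0)
benign = [[rng.gauss(10, 1) for _ in range(3)] for _ in range(300)]
malicious = [[rng.gauss(50, 5) for _ in range(3)] for _ in range(100)]

train, benign_test = split_benign(benign)
model = fit_detector(train)

# Assumed threshold rule: the largest score seen on the benign training set.
threshold = max(anomaly_score(r, model) for r in train)

# Test set = remaining 1/3 of benign data plus all the malicious data.
test = [(r, 0) for r in benign_test] + [(r, 1) for r in malicious]
flags = [(anomaly_score(r, model) > threshold, label) for r, label in test]

tp = sum(1 for flagged, label in flags if flagged and label == 1)
fp = sum(1 for flagged, label in flags if flagged and label == 0)
tpr = tp / len(malicious)
fpr = fp / len(benign_test)
print(f"TPR={tpr:.3f} FPR={fpr:.3f}")
```

In this toy setting the two classes are far apart by construction, so the result says nothing about the real data. The point is only the shape of the protocol: fit on benign traffic, then score a mixed test set per device.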
Has Missing Values?
No
Introductory Paper
By Yair Meidan, Michael Bohadana, Yael Mathov, Yisroel Mirsky, Dominik Breitenbacher, A. Shabtai, Y. Elovici. 2018
Published in IEEE Pervasive Computing
Variables Table
The source lists 115 variables. The rows shown give no names, roles, types, or descriptions; each row reports only that there are no missing values.
Additional Variable Information
-- The following describes each of the feature headers:
* Stream aggregation:
  H: Stats summarizing the recent traffic from this packet's host (IP)
  HH: Stats summarizing the recent traffic going from this packet's host (IP) to the packet's destination host.
  HpHp: Stats summarizing the recent traffic going from this packet's host+port (IP) to the packet's destination host+port. Example: 192.168.4.2:1242 -> 192.168.4.12:80
  HH_jit: Stats summarizing the jitter of the traffic going from this packet's host (IP) to the packet's destination host.
* Time-frame (the decay factor Lambda used in the damped window): how much recent history of the stream is captured in these statistics
  L5, L3, L1, ...
* The statistics extracted from the packet stream:
  weight: The weight of the stream (can be viewed as the number of items observed in recent history)
  mean: ...
  std: ...
  radius: The root squared sum of the two streams' variances
  magnitude: The root squared sum of the two streams' means
  cov: An approximated covariance between two streams
  pcc: An approximated Pearson correlation coefficient between two streams
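To make the statistics above more concrete, here is a minimal sketch of a damped-window incremental statistic. It is written under assumptions and is not the dataset authors' implementation: the decay form `2 ** (-Lambda * elapsed_time)`, the class name `DampedStat`, and the helper names are hypothetical. The `radius` and `magnitude` helpers follow the literal wording above (root of the sum of the squared variances, and of the squared means, respectively).

```python
import math


class DampedStat:
    """Exponentially decayed weight, mean, and std of a 1-D stream."""

    def __init__(self, decay):
        self.decay = decay       # Lambda: larger values forget history faster
        self.weight = 0.0        # decayed count of observed items
        self.linear_sum = 0.0    # decayed sum of values
        self.square_sum = 0.0    # decayed sum of squared values
        self.last_time = None

    def update(self, value, timestamp):
        if self.last_time is not None:
            # Assumed decay function: 2 ** (-Lambda * elapsed time)
            factor = 2.0 ** (-self.decay * (timestamp - self.last_time))
            self.weight *= factor
            self.linear_sum *= factor
            self.square_sum *= factor
        self.last_time = timestamp
        self.weight += 1.0
        self.linear_sum += value
        self.square_sum += value * value

    @property
    def mean(self):
        return self.linear_sum / self.weight

    @property
    def variance(self):
        # abs() guards against tiny negative values from rounding
        return abs(self.square_sum / self.weight - self.mean ** 2)

    @property
    def std(self):
        return math.sqrt(self.variance)


def magnitude(a, b):
    """Root of the sum of the two streams' squared means."""
    return math.sqrt(a.mean ** 2 + b.mean ** 2)


def radius(a, b):
    """Root of the sum of the two streams' squared variances."""
    return math.sqrt(a.variance ** 2 + b.variance ** 2)


# Hypothetical packet sizes (bytes) in each direction, one packet per second.
outbound = DampedStat(decay=1.0)
inbound = DampedStat(decay=1.0)
for t, (size_out, size_in) in enumerate([(60, 1500), (60, 1500), (120, 1500)]):
    outbound.update(size_out, t)
    inbound.update(size_in, t)

print(outbound.weight, outbound.mean, outbound.std)
print(magnitude(outbound, inbound), radius(outbound, inbound))
```

If the `L5`, `L3`, `L1` suffixes denote Lambda values of 5, 3, and 1 (an assumption, not stated in the description), a larger number would correspond to a shorter memory of the stream. The covariance and correlation statistics need paired updates across two streams and are left out of this sketch.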
Dataset Files
File | Size |
---|---|
Philips_B120N10_Baby_Monitor/benign_traffic.csv | 204.4 MB |
Danmini_Doorbell/mirai_attacks.rar | 177.9 MB |
Philips_B120N10_Baby_Monitor/mirai_attacks.rar | 166.4 MB |
SimpleHome_XCS7_1003_WHT_Security_Camera/mirai_attacks.rar | 163.2 MB |
Ecobee_Thermostat/mirai_attacks.rar | 162.7 MB |
Showing 5 of 27 files.
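The listed paths suggest a `<device>/<traffic file>` layout. Assuming that layout holds for the remaining files, the sketch below derives a device name and a coarse label from each path. The function name `describe_file` and the label strings are hypothetical, and the `.rar` archives would still need to be extracted with an external tool before their contents can be read.

```python
from pathlib import PurePosixPath


def describe_file(path):
    """Return (device, label, is_archive) for a '<device>/<file>' path."""
    p = PurePosixPath(path)
    device = p.parent.name
    stem = p.stem  # file name without its extension
    if stem == "benign_traffic":
        label = "benign"
    elif stem.endswith("_attacks"):
        label = stem[: -len("_attacks")]  # e.g. "mirai"
    else:
        label = "unknown"
    return device, label, p.suffix == ".rar"


# Paths copied from the file listing above.
files = [
    "Philips_B120N10_Baby_Monitor/benign_traffic.csv",
    "Danmini_Doorbell/mirai_attacks.rar",
    "Ecobee_Thermostat/mirai_attacks.rar",
]
for f in files:
    print(describe_file(f))
```

Keeping the device name alongside the label matters here because the study trains one detector per device rather than one model over all traffic.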
    pip install ucimlrepo

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    detection_of_iot_botnet_attacks_n_baiot = fetch_ucirepo(id=442)

    # data (as pandas dataframes)
    X = detection_of_iot_botnet_attacks_n_baiot.data.features
    y = detection_of_iot_botnet_attacks_n_baiot.data.targets

    # metadata
    print(detection_of_iot_botnet_attacks_n_baiot.metadata)

    # variable information
    print(detection_of_iot_botnet_attacks_n_baiot.variables)
Meidan, Y., Bohadana, M., Mathov, Y., Mirsky, Y., Breitenbacher, D., , A., & Shabtai, A. (2018). detection_of_IoT_botnet_attacks_N_BaIoT [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5RC8J.
Creators
Yair Meidan
Michael Bohadana
Yael Mathov
Yisroel Mirsky
Dominik Breitenbacher
Asaf
Asaf Shabtai
DOI
10.24432/C5RC8J
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.