Kitsune Network Attack Dataset Data Set

Abstract: A cybersecurity dataset containing nine different network attacks on a commercial IP-based surveillance system and an IoT network. The dataset includes reconnaissance, MitM, DoS, and botnet attacks.

Data Set Characteristics: Multivariate, Sequential, Time-Series
Associated Tasks: Classification, Clustering, Causal-Discovery

Yisroel Mirsky, Tomer Doitshman, Yuval Elovici, and Asaf Shabtai.
Ben-Gurion University of the Negev, Department of Information Systems Engineering

Yisroel Mirsky

Data Set Information:

==== Overview ====
There are 9 network capture datasets in total, listed below. Viol. is the security violation (C = Confidentiality, I = Integrity, A = Availability).

# | Attack Name | Tool | Viol. | Description: The attacker...
Attack Type: Reconnaissance
-1 | OS Scan | Nmap | C | ...scans the network for hosts and their operating systems to reveal possible vulnerabilities.
-2 | Fuzzing | SFuzz | C | ...searches for vulnerabilities in the camera's web servers by sending random commands to their CGIs.
Attack Type: Man in the Middle
-3 | Video Injection | Video Jack | C,I | ...injects a recorded video clip into a live video stream.
-4 | ARP MitM | Ettercap | C | ...intercepts all LAN traffic via an ARP poisoning attack.
-5 | Active Wiretap | R.PI 3B | C | ...intercepts all LAN traffic via an active wiretap (network bridge) covertly installed on an exposed cable.
Attack Type: Denial of Service
-6 | SSDP Flood | Saddam | A | ...overloads the DVR by causing cameras to spam the server with UPnP advertisements.
-7 | SYN DoS | Hping3 | A | ...disables a camera's video stream by overloading its web server.
-8 | SSL Reneg. | THC | A | ...disables a camera's video stream by sending many SSL renegotiation packets to the camera.
Attack Type: Botnet Malware
-9 | Mirai | Telnet | C,I | ...infects IoT devices with the Mirai malware by exploiting default credentials, and then scans the network for new vulnerable victims.

-For more details on the attacks themselves, please refer to our paper.

==== Data Organization ====
For each attack (network capture) above we provide (1) a csv of the features used in our paper where each row is a network packet, (2) the corresponding labels [benign, malicious], and (3) the original network capture in truncated pcap format.

-Each attack dataset is located in a separate directory
-Each directory contains three files:
_pcap.pcapng : A raw pcap capture of the original N packets. The packets have been truncated to 200 bytes for privacy reasons.
_dataset.csv : An N-by-M matrix of M-sized feature vectors, each describing the packet and the context of that packet's channel (see our paper for details).
_labels.csv : An N-by-1 vector of 0-1 values which indicate whether each packet in _pcap.pcapng (and _dataset.csv) is malicious ('1') or not ('0'). For the Man-in-the-Middle attacks, all packets which have passed through the MitM are marked as '1'.
-Every attack dataset begins with benign traffic, and then at some point (1) the attacker connects to the network and (2) initiates the given attack.
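
Because every capture starts with benign traffic, the labels alone are enough to locate the point where the attack begins. The following is a minimal sketch, not the authors' code. It assumes the labels file holds a single 0/1 column with no header row; the function names `read_labels` and `attack_onset` are illustrative, and the demo input is synthetic.

```python
import csv
import io


def read_labels(label_file):
    """Read one 0/1 label per packet from a file-like object.

    Assumption: a single column with no header row; adapt this if the
    file you downloaded carries a header or an index column.
    """
    return [int(row[0]) for row in csv.reader(label_file) if row]


def attack_onset(labels):
    """Index of the first malicious packet, or None if all are benign."""
    for index, label in enumerate(labels):
        if label == 1:
            return index
    return None


# Tiny synthetic stand-in for a labels file (not real data).
demo = io.StringIO("0\n0\n0\n1\n1\n0\n")
labels = read_labels(demo)
onset = attack_onset(labels)

# Everything before the onset is the purely benign prefix of the capture.
benign_prefix = labels[:onset]
```

Row i of the labels corresponds to row i of the feature CSV and packet i of the capture, so the same onset index can be used to slice all three.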

Attribute Information:

=== The features in the csv files ===
Each row in the csv is a captured packet, in chronological order. For a deeper explanation, please see our paper.
In general, each row (feature vector) consists of recent (temporal) statistics which describe the context of the packet's channel and its communicating parties:

Whenever a packet arrives, we extract a behavioral snapshot of the hosts and protocols which communicated the given packet. The snapshot consists of 115 traffic statistics capturing a small temporal window into: (1) the packet's sender in general, and (2) the traffic between the packet's sender and receiver.

Specifically, the statistics summarize all of the traffic...
...originating from this packet's source MAC and IP address (denoted SrcMAC-IP).
...originating from this packet's source IP (denoted SrcIP).
...sent between this packet's source and destination IPs (denoted Channel).
...sent between this packet's source and destination TCP/UDP Socket (denoted Socket).
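
The four aggregation levels above can be pictured as four lookup keys derived from each packet. The sketch below is illustrative only: the `Packet` record, its field names, and `stream_keys` are hypothetical and do not come from the feature extraction code. It also assumes directional keys (source first), which the description above does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    # Hypothetical parsed-packet record; field names are illustrative.
    src_mac: str
    src_ip: str
    dst_ip: str
    src_port: Optional[int] = None  # None when there is no TCP/UDP datagram
    dst_port: Optional[int] = None


def stream_keys(pkt):
    """Return the key identifying each of the four traffic streams for pkt."""
    keys = {
        "SrcMAC-IP": (pkt.src_mac, pkt.src_ip),
        "SrcIP": (pkt.src_ip,),
        "Channel": (pkt.src_ip, pkt.dst_ip),
        "Socket": None,  # stays None for packets without TCP/UDP ports
    }
    if pkt.src_port is not None and pkt.dst_port is not None:
        keys["Socket"] = (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port)
    return keys
```

Statistics would then be maintained per key, so that two packets from the same sender update the same SrcIP entry even when they go to different destinations.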

A total of 23 features (capturing the above) can be extracted from a single time window λ (see Table II in our paper). The feature extractor (FE) extracts the same set of 23 features from each of five time-damped windows covering approximately 100ms, 500ms, 1.5sec, 10sec, and 1min into the past (λ = 5, 3, 1, 0.1, 0.01), thus totaling 23 × 5 = 115 features.
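
To make the idea of a damped window concrete, here is a minimal sketch of one incrementally updated statistic. This page does not state the damping formula, so the decay factor 2^(-λ·Δt) with Δt in seconds is an assumption of this sketch, and the class name `DampedStat` is illustrative. It tracks only a damped count, mean, and standard deviation, not the full set of 23 features.

```python
import math

# Decay rates listed in the text; a larger lambda forgets faster.
LAMBDAS = (5, 3, 1, 0.1, 0.01)
FEATURES_PER_WINDOW = 23
assert FEATURES_PER_WINDOW * len(LAMBDAS) == 115


class DampedStat:
    """Exponentially damped count/mean/std of a stream of values.

    Assumption: old observations are scaled by 2 ** (-lam * dt),
    where dt is the time in seconds since the previous update.
    """

    def __init__(self, lam):
        self.lam = lam
        self.weight = 0.0      # damped number of observations
        self.linear_sum = 0.0  # damped sum of values
        self.square_sum = 0.0  # damped sum of squared values
        self.last_time = None

    def update(self, value, timestamp):
        if self.last_time is not None:
            decay = 2.0 ** (-self.lam * (timestamp - self.last_time))
            self.weight *= decay
            self.linear_sum *= decay
            self.square_sum *= decay
        self.last_time = timestamp
        self.weight += 1.0
        self.linear_sum += value
        self.square_sum += value * value

    def mean(self):
        return self.linear_sum / self.weight if self.weight else 0.0

    def std(self):
        if not self.weight:
            return 0.0
        variance = self.square_sum / self.weight - self.mean() ** 2
        return math.sqrt(max(variance, 0.0))
```

One such object per stream and per λ would be updated with, for example, the packet size and arrival time, which keeps memory constant regardless of how many packets have been seen.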

We note that not every packet applies to every channel type (e.g., there is no socket if the packet does not contain a TCP or UDP datagram). In these cases, these features are zeroed. Thus, the final feature vector x, which the FE passes to the FM, is always a member of R^n, where n = 115.
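
Since every vector has the same width, a reader of the feature CSV can check each row as it is parsed. The following is a minimal sketch under the assumption that the file has no header row and that every cell is numeric; `read_feature_rows` is an illustrative name and the demo row is synthetic.

```python
import csv
import io

N_FEATURES = 115


def read_feature_rows(dataset_file, n_features=N_FEATURES):
    """Yield one list of floats per packet, checking the row width.

    Assumption: no header row; every cell parses as a float.
    """
    for line_no, row in enumerate(csv.reader(dataset_file), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != n_features:
            raise ValueError(
                f"row {line_no}: expected {n_features} values, got {len(row)}"
            )
        yield [float(cell) for cell in row]


# Synthetic one-row stand-in for a feature file (all features zeroed).
demo = io.StringIO(",".join(["0"] * N_FEATURES) + "\n")
rows = list(read_feature_rows(demo))
```

Reading row by row keeps memory use flat, which matters because each capture contains one row per packet.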

The feature extraction code (pcap to csv) is available at: [Web Link]

Citation Request:

If you use this dataset, please cite:
Yisroel Mirsky, Tomer Doitshman, Yuval Elovici, and Asaf Shabtai, 'Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection', Network and Distributed System Security Symposium 2018 (NDSS'18)
