Detect Malware Types
Donated on 6/2/2019
Dataset Characteristics
Multivariate, Time-Series, Text
Subject Area
Computer Science
Associated Tasks
Classification
Feature Type
-
# Instances
7107
# Features
280
Dataset Information
Additional Information
This study aims to provide data that helps address gaps in machine-learning-based malware research. Its specific objective is to build a benchmark dataset of Windows operating system API calls made by various malware. This is the first study to use metamorphic malware to build sequential API calls. It is hoped that this research will contribute to a deeper understanding of how metamorphic malware changes its behavior (i.e., its API calls) by adding meaningless opcodes with its own disassembler/assembler parts. In our research, we translated the families reported by each piece of software into 8 main malware families: Trojan, Backdoor, Downloader, Worms, Spyware, Adware, Dropper, and Virus. Table 1 shows the number of malware samples belonging to each family in our dataset. As the table shows, the sample counts of all families except Adware are quite close to each other. The difference arises because we could not find many samples from the Adware family.
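Because the description notes that Adware is under-represented, a classifier may benefit from class weighting. The following is a minimal sketch, not the authors' method: it computes weights inversely proportional to class frequency. The helper name `class_weights` and the toy labels are hypothetical, and the toy counts are not the real per-family counts from Table 1.

```python
from collections import Counter

# The eight family names listed in the dataset description.
FAMILIES = ["Trojan", "Backdoor", "Downloader", "Worms",
            "Spyware", "Adware", "Dropper", "Virus"]

def class_weights(labels):
    """Return weights inversely proportional to class frequency.

    weight(c) = n_samples / (n_classes * count(c)), so a perfectly
    balanced label set gives every class a weight of 1.0.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

# Hypothetical toy labels (NOT the real per-family counts):
# Adware is deliberately under-represented to mimic the imbalance.
toy_labels = ["Trojan"] * 4 + ["Virus"] * 4 + ["Adware"] * 2
weights = class_weights(toy_labels)
print(weights)
```

With these toy labels, the rare class receives a larger weight than the common ones, which is the intended effect of the weighting.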
Has Missing Values?
No
Variable Information
Various Windows API calls
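Since the variables are API calls recorded in sequence, one common way to feed them to a classifier is to count contiguous n-grams of calls. The sketch below illustrates this under stated assumptions: each sample is assumed to be a Python list of API call names, the function names `ngram_counts` and `vectorize` are hypothetical, and the two toy sequences are invented for illustration rather than taken from the dataset.

```python
from collections import Counter

def ngram_counts(calls, n=2):
    """Count contiguous n-grams in one sequence of API call names."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))

def vectorize(sequences, n=2):
    """Map each sequence to a count vector over a shared, sorted n-gram vocabulary."""
    per_seq = [ngram_counts(s, n) for s in sequences]
    vocab = sorted(set().union(*per_seq))
    # A Counter returns 0 for n-grams that a sequence does not contain.
    return vocab, [[c[g] for g in vocab] for c in per_seq]

# Invented toy sequences (assumption: not real samples from the dataset).
toy_sequences = [
    ["CreateFileW", "ReadFile", "CloseHandle"],
    ["CreateFileW", "ReadFile", "ReadFile", "CloseHandle"],
]
vocab, matrix = vectorize(toy_sequences)
for gram in vocab:
    print(gram)
print(matrix)
```

Bigrams keep some ordering information that plain per-call counts lose, while still producing fixed-length vectors. Larger values of `n` capture longer patterns at the cost of a much larger vocabulary.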
Dataset Files
-
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
detect_malware_types = fetch_ucirepo(id=533)

# data (as pandas dataframes)
X = detect_malware_types.data.features
y = detect_malware_types.data.targets

# metadata
print(detect_malware_types.metadata)

# variable information
print(detect_malware_types.variables)
Detect Malware Types [Dataset]. (2019). UCI Machine Learning Repository. https://doi.org/10.24432/C57S5W.
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.