Detect Malacious Executable(AntiVirus)
Donated on 3/2/2016
I extract features from malicious and non-malicious executables and use them to create a training dataset for an SVM classifier. The dataset is used to decide whether an unknown executable is a virus or a normal, safe executable.
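The page does not say which SVM solver or kernel was used. The testing procedure mentions `svmpredict`, which is also the name of LIBSVM's MATLAB/Octave prediction function, but that is not confirmed. The sketch below is a self-contained stand-in: a linear SVM trained with a Pegasos-style sub-gradient method on sparse binary feature vectors. It is not LIBSVM's solver. All names (`train_linear_svm`, `predict`, `BIAS`) and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (label: +1 non-malicious / -1 malicious, set of 1-based indices of present features)
Sample = Tuple[int, Set[int]]
BIAS = 0  # hypothetical: index 0 is reserved for an always-on bias feature


def score(w: Dict[int, float], feats: Set[int]) -> float:
    """Linear decision value w.x for a binary feature vector x (bias included)."""
    return sum(w.get(j, 0.0) for j in feats | {BIAS})


def train_linear_svm(samples: List[Sample], lam: float = 0.01, epochs: int = 50) -> Dict[int, float]:
    """Pegasos-style sub-gradient descent on the L2-regularised hinge loss.

    Samples are visited in a fixed cyclic order instead of being drawn at
    random, so repeated runs produce identical weights.
    """
    w: Dict[int, float] = {}
    t = 0
    for _ in range(epochs):
        for y, feats in samples:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * score(w, feats)  # computed with the weights before this step
            for j in w:  # regulariser step: shrink every weight
                w[j] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss is active: move towards y * x
                for j in feats | {BIAS}:
                    w[j] = w.get(j, 0.0) + eta * y
    return w


def predict(w: Dict[int, float], feats: Set[int]) -> int:
    """Return +1 (non-malicious) or -1 (malicious), matching the dataset's labels."""
    return 1 if score(w, feats) >= 0.0 else -1


# Toy usage with made-up feature indices.
train = [(1, {4, 5}), (-1, {1, 2}), (1, {5, 6}), (-1, {2, 3})]
w = train_linear_svm(train)
print(predict(w, {4}))  # → 1
print(predict(w, {3}))  # → -1
```

A linear model is a natural first choice for sparse 0/1 features like these; a kernel SVM (as LIBSVM offers) may behave differently.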
Dataset Characteristics
Multivariate
Subject Area
Computer Science
Associated Tasks
Classification
Feature Type
Real
# Instances
373
# Features
-
Dataset Information
Additional Information
TRAINING file: I created a training file with 100+ non-malicious samples and 250+ malicious samples. Non-malicious files are labeled +1 and malicious files are labeled -1. Based on comparison and analysis, I selected the 500 features that occur most commonly in malicious and non-malicious files, and compared the features extracted from each file against this set of best features. The file is saved with a .train extension.

TESTING file: We select an unknown, possibly malicious executable and carry out the same procedure on it. Its label can be set to either class (+1 or -1), because svmpredict determines the class from the features, not from the label written in the test file. We save this testing file with a .test extension.
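The selection-and-comparison step can be sketched as follows. This is a minimal sketch under stated assumptions: "most commonly occurring" is read as the number of files that contain a feature, ties are broken alphabetically, and features are plain strings standing in for the real hex and DLL features. The helpers `select_top_features` and `encode_file` and the sample feature strings are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def select_top_features(files: Iterable[Set[str]], k: int) -> Dict[str, int]:
    """Map the k features present in the most files to 1-based indices.

    Ties are broken alphabetically so the vocabulary is deterministic.
    """
    doc_freq: Counter = Counter()
    for feats in files:
        doc_freq.update(set(feats))  # count each feature at most once per file
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return {feat: i for i, (feat, _) in enumerate(ranked, start=1)}


def encode_file(label: int, feats: Set[str], vocab: Dict[str, int]) -> str:
    """Write one file as '<label> idx:1 idx:1 ... -1'; features outside vocab are ignored."""
    indices = sorted(vocab[f] for f in feats if f in vocab)
    tokens = [f"{label:+d}"] + [f"{i}:1" for i in indices] + ["-1"]
    return " ".join(tokens)


# Hypothetical per-file feature sets (hex patterns and DLL names).
files = [{"4d5a", "kernel32.dll", "ff25"}, {"4d5a", "user32.dll"}, {"4d5a", "ff25"}]
vocab = select_top_features(files, k=2)
print(encode_file(-1, files[0], vocab))  # → -1 1:1 2:1 -1
print(encode_file(1, files[1], vocab))   # → +1 1:1 -1
```

To mirror the split into 500 hex and 13 DLL features, one could call `select_top_features` separately per feature type and offset the DLL indices by 500; the page does not say how the indices are actually ordered. For a test file, any label can be passed to `encode_file`, as described above.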
Has Missing Values?
Yes
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
no | | | | | |
no | | | | | |
no | | | | | |
no | | | | | |
no | | | | | |
no | | | | | |
no | | | | | |
no | | | | | |
no | | | | | |
no | | | | | |
Showing 10 of 513 variables.
Additional Variable Information
For best results I used hybrid features (hexdump and DLL features) extracted from each executable. After extracting these features, I identified the top 500 hex features and the top 13 DLL features that occur most commonly, and prepared a file of these best features. Each of these features that is found in an individual file is listed in that file's entry with the value 1, absent features are omitted, and each feature list ends with -1. For example, in ( +1 2:1 5:1 45:1 ... -1 ), +1 marks a non-malicious file and 2:1 states that the 2nd feature exists, and likewise for features 5 and 45; features that do not occur are simply omitted. A malicious executable is written as ( -1 6:1 56:1 ... -1 ). In short, each attribute that exists is written as its index followed by :1.
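A line in the format described above can be read back into a label and a binary vector with a small parser. This is a sketch that assumes the layout is exactly: label first, then `index:1` tokens, then an optional trailing `-1` terminator. The function names `parse_line` and `to_dense` and the constant `NUM_FEATURES` are hypothetical.

```python
from typing import List, Set, Tuple

NUM_FEATURES = 513  # 500 hex features + 13 DLL features, per the description


def parse_line(line: str) -> Tuple[int, Set[int]]:
    """Parse '<label> idx:1 idx:1 ... -1' into (label, set of present feature indices)."""
    tokens = line.split()
    label = int(tokens[0])  # int("+1") == 1
    body = tokens[1:]
    if body and body[-1] == "-1":  # drop the trailing terminator described above
        body = body[:-1]
    present: Set[int] = set()
    for tok in body:
        idx, value = tok.split(":")
        if int(value) != 1:
            raise ValueError(f"unexpected feature value in {tok!r}")
        present.add(int(idx))
    return label, present


def to_dense(present: Set[int], n: int = NUM_FEATURES) -> List[int]:
    """Expand 1-based indices into a 0/1 vector of length n; absent features are 0."""
    vec = [0] * n
    for idx in present:
        vec[idx - 1] = 1
    return vec


label, present = parse_line("+1 2:1 5:1 45:1 -1")
print(label, sorted(present))  # → 1 [2, 5, 45]
```

Keeping only the set of present indices matches the sparse encoding; `to_dense` is only needed by tools that expect a fixed-length vector.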
Dataset Files
File | Size |
---|---|
Dataset.rar | 12.5 KB |
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
detect_malacious_executable_antivirus = fetch_ucirepo(id=355)

# data (as pandas dataframes)
X = detect_malacious_executable_antivirus.data.features
y = detect_malacious_executable_antivirus.data.targets

# metadata
print(detect_malacious_executable_antivirus.metadata)

# variable information
print(detect_malacious_executable_antivirus.variables)
Rumao, P. (2016). Detect Malacious Executable(AntiVirus) [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5531V.
Creators
Piyush Rumao
DOI
10.24432/C5531V
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.