Detect Malacious Executable(AntiVirus)

Donated on 3/2/2016

I extract features from malacious and non-malacious and create and training dataset to teach svm classifier.Dataset made of unknown executable to detect if it is virus or normal safe executable.

Dataset Characteristics

Multivariate

Subject Area

Computer Science

Associated Tasks

Classification

Feature Type

Real

# Instances

373

# Features

-

Dataset Information

Additional Information

TRAINING File : I have created training file with 100+ non malacious examples and 250+ malacious samples. NON-MALACIOUS dataset is represented by +1 while MALACIOUS datset is represented by -1 as label. Based on comparison and analysis I have selected 500 most commonly occuring features in MALACIOUS and NON-MALACIOUS file and compared extracted features of each file with this best features. The file is saved with .train extension. TESTING file: We select a unknown malacious executable and carry out same procedure on it ( however we can put it in any class +1/ -1) cuz svmpredict will any way corretly find it for us. We save this testing file with .test extension.

Has Missing Values?

Yes

Variables Table

Variable NameRoleTypeDescriptionUnitsMissing Values
no
no
no
no
no
no
no
no
no
no

0 to 10 of 513

Additional Variable Information

For best results I have used Hybrid Features ( hexdump and DLL) from an executable. After extracting this features I find out the top 500 hex features and top 13 DLL features which are most commonly occuring and prepare file with best features.Now feature amoung this which are found in individual file is been stated in dataset along with 1 while rest are ignored and feature set ends with -1 ie say ( +1 2:1 5:1 45:1 .............. -1) so here +1 states a NON-malacious file while 2:1 states 2nd feature exists similarly for 5,45 while features which do not occur are simply ignored. For MALACIOUS executable we write it as ( -1 6:1 56:1 ............ -1) so Attribute which exists is given a colon 1 ahead of it (:1)

Dataset Files

FileSize
Dataset.rar12.5 KB

Reviews

There are no reviews for this dataset yet.

Login to Write a Review
Download (12.6 KB)
0 citations
1614 views

Creators

Piyush Rumao

License

By using the UCI Machine Learning Repository, you acknowledge and accept the cookies and privacy practices used by the UCI Machine Learning Repository.

Read Policy