Center for Machine Learning and Intelligent Systems

REWEMA Data Set
Download: Data Folder, Data Set Description

Abstract: REWEMA (Retrieval of 32-bit Windows Architecture Executables Applied to Malware Analysis) is a data set that can be used to build artificial-intelligence-based antivirus software.

Data Set Characteristics: Multivariate
Number of Instances: 6272
Area: Computer
Attribute Characteristics: Integer
Number of Attributes: 632
Date Donated: 2020-12-13
Associated Tasks: Classification
Missing Values? N/A
Number of Web Hits: 1518


Source:

Sidney M. L. de Lima
Electronics and Systems Department, Federal University of Pernambuco, Arquitetura Avenue (no number), Block A, 4th Floor, Cidade Universitária, Recife, Brazil

Heverton K. de L. Silva & João H. da S. Luz
Tempest Security Intelligence, Alfândega Street n.35, Bairro Do Recife, Recife, Brazil

Hercília J. do N. Lima, Samuel L. de P. Silva, Anna B. A. de Andrade & Alisson M. da Silva
São Miguel Faculty, Conde da Boa Vista Avenue n. 1410, Boa Vista, Recife, Brazil


Data Set Information:

Features are extracted from the executables by disassembling them. The algorithm underlying each executable can then be studied and later classified by a neural network. There are 3136 malicious executables and 3136 benign executables. The REWEMA base is therefore well suited to learning with artificial intelligence, since the two classes are balanced.
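The class counts stated above can be checked with a few lines of Python. This is a minimal sketch, not part of the data set's tooling: the helper name `class_balance` is hypothetical, and the labels are synthesized here from the stated counts rather than read from the data file.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class counts and the minority/majority ratio (1.0 means balanced)."""
    counts = Counter(labels)
    ratio = min(counts.values()) / max(counts.values())
    return dict(counts), ratio


# Synthetic labels built from the counts stated in the description:
# 3136 malware ("M") and 3136 benign ("B") executables.
labels = ["M"] * 3136 + ["B"] * 3136
counts, ratio = class_balance(labels)
print(counts, ratio)
```

A ratio of 1.0 means neither class dominates, so plain accuracy is a reasonable first metric for a classifier trained on this base.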
As for malicious executables, REWEMA combines several malware databases. The samples were extracted from databases provided by enthusiast study groups such as Vxheaven and TheZoo. The malware contained in REWEMA is widely used in various malicious activities. REWEMA includes Trojan horses, worms, constructors, exploits, hack tools, hoaxes, backdoors, rootkits, viruses and spyware. In addition, REWEMA contains botnets of the following types: Flooder, CC (command-and-control server), DoS (Denial of Service), IRC (Internet Relay Chat), CF (Click Fraud) and spam e-mail.
As for benign executables, they were acquired from benign application repositories such as SourceForge, GitHub and Sysinternals. All benign executables were submitted to VirusTotal, and all were attested as benign by the main commercial antivirus products worldwide. The VirusTotal diagnostics for both the benign and the malware executables are available at the web address of the REWEMA database ¹.

1. REWEMA (Retrieval of 32-bit Windows Architecture Executables Applied to Malware Analysis). [Web Link]. Accessed on Feb 2020


Attribute Information:

1) Application name
2) Class (M = malware, B = benign)
3) Input Attribute (3-632).
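The column layout above (name, class, then integer inputs) could be read as follows. This is a sketch under stated assumptions: the page does not specify the file format, so a comma-separated file is assumed, the function name `load_rewema` and the `has_header` flag are hypothetical, and the two sample rows are invented and carry only three inputs each for brevity (the real rows would carry columns 3 to 632).

```python
import csv
import io


def load_rewema(file_obj, has_header=False):
    """Split each row into name, label and features, following the column layout above.

    Assumes comma-separated values; the actual file format is not stated on the page.
    """
    reader = csv.reader(file_obj)
    if has_header:
        next(reader, None)  # skip a header row if the file has one
    names, labels, features = [], [], []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        names.append(row[0])                               # column 1: application name
        labels.append(1 if row[1].strip() == "M" else 0)   # column 2: M = malware, B = benign
        features.append([int(v) for v in row[2:]])         # remaining columns: integer inputs
    return names, labels, features


# Invented two-row sample standing in for the real file.
sample = io.StringIO("app_a.exe,M,3,0,1\napp_b.exe,B,0,2,0\n")
names, labels, features = load_rewema(sample)
print(names, labels, features)
```

Encoding malware as 1 and benign as 0 is a choice made here for convenience; any consistent encoding works for a binary classifier.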
The groups of features extracted from the investigated executables are detailed below.
• Histogram of assembly instructions, counted by mnemonic.
• Number of subroutines invoking TLS (transport layer security).
• Number of subroutines responsible for exporting data (exports).
• APIs (application programming interfaces) used by the executable.
• Features related to clues that the computer's hard disk has suffered fragmentation, as well as accumulated invalid boot attempts.
• Application execution mode. There are two options: software with a graphical interface (GUI) or software running on the console.
• Features related to the operating system.
• Features related to the Windows Registry (Regedit).
• Features related to spyware such as keyloggers (capture of keyboard input in order to steal passwords and logins) and screenloggers (screenshots of the victim's screen).
• Features related to digital anti-forensics, that is, techniques for removing, hiding and subverting evidence in order to reduce the effectiveness of forensic analyses.
• Features related to the creation of the GUI (Graphical User Interface) of the suspicious program.
• Features related to illicit forensics of the RAM (main memory) of the local system.
• Features related to network traffic.
• Features related to utility application programs.
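To illustrate the first feature group, the sketch below counts mnemonics in a list of disassembled instruction lines and maps the counts onto a fixed vocabulary. It is not the authors' extraction pipeline: the function name `mnemonic_histogram`, the five-entry vocabulary and the sample disassembly are all assumptions made for illustration.

```python
from collections import Counter


def mnemonic_histogram(instructions, vocabulary):
    """Count how often each mnemonic in `vocabulary` appears in the disassembly.

    The first whitespace-separated token of each line is taken as the mnemonic,
    so instruction prefixes such as "rep" would be counted as their own token.
    """
    counts = Counter(
        line.split()[0].lower() for line in instructions if line.strip()
    )
    return [counts.get(m, 0) for m in vocabulary]


# Invented disassembly lines and vocabulary, for illustration only.
disassembly = ["push ebp", "mov ebp, esp", "MOV eax, [ebp+8]", "call 0x401000", "ret"]
vocabulary = ["mov", "push", "call", "jmp", "ret"]
hist = mnemonic_histogram(disassembly, vocabulary)
print(hist)
```

Fixing the vocabulary in advance keeps the feature vector the same length for every executable, which is what a fixed-width integer attribute table requires.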


Relevant Papers:

de Lima, S.M.L., Silva, H.K.d.L., Luz, J.H.d.S. et al. Artificial intelligence-based antivirus in order to detect malware preventively. Prog Artif Intell (2020). [Web Link]


