Abstract: REJAFADA (Retrieval of Jar Files Applied to Dynamic Analysis) is intended as a benchmark for evaluating the quality of Jar malware detection.

Sidney M. L. de Lima: sidney.lima '@'
Electronics and Systems Department, Federal University of Pernambuco, Arquitetura Avenue, no number, Block A, 4th Floor, Cidade Universitária, Recife, Brazil

Ricardo Pinheiro, Danilo Souza, Sthéfano Silva, Petrônio Lopes, Rafael Lima, Jemerson Oliveira, Thyago Monteiro, Sérgio Murilo, Edison Albuquerque: {rpp3, dms2, shmts, pgl, rdtl, jro, tam, smurilo, edison}
Computer Engineering, University of Pernambuco, Benfica Avenue 45, Recife, Brazil.

Data Set Information:

REJAFADA (Retrieval of Jar Files Applied to Dynamic Analysis) is a dataset for classifying files with the Jar extension as benign or malware. It is composed of 998 malware Jar files and 998 benign Jar files. Because both classes (malware and benign) contain the same number of files, the dataset is well suited to training AI (Artificial Intelligence) classifiers. The goal is that classifiers biased toward a particular class do not have their success rates favored.
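To illustrate why equal class sizes matter, the minimal sketch below computes the accuracy of a degenerate classifier that always predicts the same class. On the 998/998 split described above it scores 0.5, so a bias toward one class brings no advantage. The function name and the imbalanced comparison set are illustrative assumptions, not part of the dataset.

```python
from collections import Counter


def constant_classifier_accuracy(labels, predicted_class):
    """Accuracy of a classifier that always outputs `predicted_class`."""
    counts = Counter(labels)
    return counts[predicted_class] / len(labels)


# Class sizes stated in the dataset description: 998 malware (M), 998 benign (B).
rejafada_labels = ["M"] * 998 + ["B"] * 998

# Hypothetical imbalanced set, for comparison only (not real data).
imbalanced_labels = ["M"] * 1796 + ["B"] * 200

balanced_acc = constant_classifier_accuracy(rejafada_labels, "M")
imbalanced_acc = constant_classifier_accuracy(imbalanced_labels, "M")

print(f"balanced:   {balanced_acc:.2f}")
print(f"imbalanced: {imbalanced_acc:.2f}")
```

On the hypothetical imbalanced set the same trivial classifier appears far more accurate, which is the effect the balanced design is meant to avoid.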
For the malware samples, REJAFADA extracted malicious Jar files from VirusShare, a repository of malware samples that provides security researchers, incident responders, forensic analysts, and the morbidly curious with access to samples of live malicious code. The benign Jar files were collected from application repositories. All of the benign files have been audited by VirusTotal, so the benign Jar files contained in REJAFADA have had their harmlessness attested by the world's main commercial antivirus products. The results of the VirusTotal audit for both the benign and the malware Jar files are available for consultation at the REJAFADA web address ¹.
The features of the Jar files are obtained through dynamic analysis of the suspicious files. In our methodology, the malware is therefore executed in order to intentionally infect the Java Virtual Machine installed on Windows 7, which is audited in real time (dynamically) by the Cuckoo Sandbox.
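As a rough sketch of how sandbox output could be turned into input attributes, the example below maps the behaviors flagged in one analysis report to a 0/1 vector. It assumes the report is available as a dictionary with a top-level "signatures" list whose entries carry a "name" field; the description above does not state how the authors encoded their features, and the feature names used here are hypothetical.

```python
def signatures_to_vector(report, feature_names):
    """Map a sandbox report dict to a 0/1 vector over `feature_names`.

    Assumption: `report["signatures"]` is a list of dicts with a "name" key.
    """
    triggered = {sig["name"] for sig in report.get("signatures", [])}
    return [1 if name in triggered else 0 for name in feature_names]


# Hypothetical feature vocabulary and report fragment, for illustration only.
feature_names = ["antisandbox_cuckoo_files", "ransomware_files", "network_dns"]
report = {
    "signatures": [
        {"name": "network_dns"},
        {"name": "antisandbox_cuckoo_files"},
    ]
}

vector = signatures_to_vector(report, feature_names)
print(vector)  # [1, 0, 1]
```

A fixed, ordered feature vocabulary keeps every file's vector the same length, which is what a tabular dataset with a fixed number of attribute columns requires.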

1. REJAFADA (Retrieval of Jar Files Applied to Dynamic Analysis). Available at: [Web Link]. Accessed in June 2018.

Attribute Information:

1) Application name
2) Class (M = malware, B = benign)
3) Input attributes (3-6826).
The groups of features are detailed below:
• Features related to virtual machines.
• Features related to malware.
• Features related to Backdoors.
• Features related to the banking threats (Trojan horses).
• Features related to Bitcoin.
• Features related to bots (machines that perform automatic network tasks, malicious or not, without the knowledge of their owners).
• Features related to browsers.
• Features related to Firewall.
• Features related to cloud computing.
• Features related to DDoS (Distributed Denial of Service) attacks.
• Features related to attempts to disable functions of Windows 7 OS and other utilities.
• Features associated with the network traffic of Windows 7 OS, in PCAP format.
• Features related to DNS servers (Domain Name System, servers responsible for translating URL addresses into IP addresses).
• Features related to native Windows 7 OS programs.
• Features related to the Windows 7 OS boot.
• Features related to the Windows 7 OS registry (Regedit).
• Features related to the use of sandboxes. The digital forensics examines whether the file tries to detect whether the sandboxes Cuckoo, Joe, Anubis, Sunbelt, ThreatTrack / GFI / CW or Fortinet are being used, through the presence of their own files.
• Features related to antivirus. Checks whether the file under investigation looks for registry keys, in Regedit, belonging to Chinese antivirus products.
• Features related to Ransomware (a type of malware that uses encryption to leave the victim's files unusable and then demands a ransom in exchange for restoring normal use of the files, usually paid in a non-traceable way, such as bitcoins).
• Features related to exploits, that is, malware attempting to exploit known or unpatched vulnerabilities, faults or defects in the system or in one or more of its components in order to cause unforeseen instabilities and behavior in both hardware and software.
• Features related to Infostealers, malicious programs that collect confidential information from the affected computer.
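The attribute layout above can be read programmatically along the following lines. This is a minimal sketch that assumes a comma-separated file, numeric feature values, and that the range 3-6826 refers to column positions (6824 feature columns); the sample rows and file names are made up for illustration and contain only three features.

```python
import csv
import io


def parse_row(row):
    """Split one record into (name, label, features) using the described layout."""
    name, label = row[0], row[1]
    if label not in ("M", "B"):
        raise ValueError(f"unexpected class label: {label!r}")
    # Columns 3 onward are the input attributes (assumed numeric here).
    features = [float(value) for value in row[2:]]
    return name, label, features


# Tiny made-up sample with 3 feature columns instead of the full set.
sample = io.StringIO("app_one.jar,M,1,0,1\napp_two.jar,B,0,0,0\n")
records = [parse_row(row) for row in csv.reader(sample)]

for name, label, features in records:
    print(name, label, features)
```

Validating the class label while parsing catches misaligned rows early, before they reach a classifier.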

Relevant Papers:

PINHEIRO, RICARDO; LIMA, SIDNEY; FERNANDES, SERGIO; ALBUQUERQUE, EDISON; MEDEIROS, SERGIO; SOUZA, DANILO; MONTEIRO, THYAGO; LOPES, PETRONIO; LIMA, RAFAEL; OLIVEIRA, JEMERSON; SILVA, STHEFANO. Next Generation Antivirus Applied to Jar Malware Detection based on Runtime Behaviors using Neural Networks. In: 2019 IEEE 23rd International Conference on Computer Supported Cooperative Work in Design (CSCWD), Porto, 2019. doi: [Web Link].
