Browse Datasets

Micro Gas Turbine Electrical Energy Prediction

This dataset consists of measurements of electrical power corresponding to an input control signal over time, collected from a 3-kilowatt commercial micro gas turbine.

Printed Circuit Board Processed Image

This CSV dataset was originally used to retrieve test-pad coordinates from PCB images. It also lends itself to classification (e.g., grey test-pad detection), anomaly detection (e.g., fake test pads), and clustering to discover grey test pads. Each record gives X and Y pixel positions and R, G, B color values (min-max normalized from the 0-255 range). A 'Grey' field flags approximately grey pixels. The dataset was originally used for a two-stage discovery of a large number (>100) of test-pad clusters, as presented in: @article{Tan2016FastRO, title={Fast retrievals of test-pad coordinates from photo images of printed circuit boards}, author={Swee Chuan Tan and Schumann Tong Wei Kit}, journal={2016 International Conference on Advanced Mechatronic Systems (ICAMechS)}, year={2016}, pages={464-467}, url={} } This dataset contains more pixels than the one in the paper because a different extraction method was used.
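The min-max normalization of the color channels mentioned above can be sketched as follows. This is only an illustration: the helper names and the grey tolerance are assumptions, and the description does not state how the dataset's 'Grey' field was actually computed.

```python
def normalize_channel(value, lo=0, hi=255):
    """Min-max normalize a raw channel value from [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)


def is_approx_grey(r, g, b, tolerance=0.1):
    """Hypothetical grey test: the three normalized channels lie close together.

    The tolerance is an illustrative assumption, not the dataset's rule.
    """
    return max(r, g, b) - min(r, g, b) <= tolerance


# Example pixel given as raw 0-255 values (made up for illustration).
raw_pixel = (128, 130, 126)
r, g, b = (normalize_channel(c) for c in raw_pixel)
grey_flag = is_approx_grey(r, g, b)
```

A pixel whose normalized channels differ by no more than the tolerance is flagged as grey; a strongly colored pixel such as pure red is not.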

PhiUSIIL Phishing URL (Website)

PhiUSIIL Phishing URL Dataset is a substantial dataset comprising 134,850 legitimate and 100,945 phishing URLs. Most of the URLs analyzed while constructing the dataset are recent. Features are extracted from the webpage source code and from the URL. Some features, such as CharContinuationRate, URLTitleMatchScore, URLCharProb, and TLDLegitimateProb, are derived from existing features.

UR3 CobotOps

The UR3 CobotOps Dataset is a collection of multi-dimensional time-series data from the UR3 cobot, offering insights into operational parameters and faults for machine learning in robotics and automation. It features electrical currents, temperatures, and speeds across joints (J0-J5), as well as gripper current, operation cycle count, protective stops, and grip losses, collected via the MODBUS and RTDE protocols. This dataset supports research in fault detection, predictive maintenance, and operational optimization, providing a detailed operational snapshot of a leading cobot model for industrial applications.


RT-IoT2022

RT-IoT2022 is a proprietary dataset derived from a real-time IoT infrastructure that integrates a diverse range of IoT devices and sophisticated network attack methodologies. It encompasses both normal and adversarial network behaviours, providing a general representation of real-world scenarios. It incorporates data from IoT devices such as ThingSpeak-LED, Wipro-Bulb, and MQTT-Temp, as well as simulated attack scenarios involving Brute-Force SSH attacks, DDoS attacks using Hping and Slowloris, and Nmap patterns. The bidirectional attributes of network traffic are captured using the Zeek network monitoring tool and the Flowmeter plugin. Researchers can use RT-IoT2022 to advance the capabilities of Intrusion Detection Systems (IDS) and to develop robust, adaptive security solutions for real-time IoT networks.

Regensburg Pediatric Appendicitis

This repository holds data from a cohort of pediatric patients with suspected appendicitis who were admitted with abdominal pain to Children’s Hospital St. Hedwig in Regensburg, Germany, between 2016 and 2021. Each patient has one or more ultrasound (US) images (views); tabular data comprising laboratory, physical examination, and scoring results, plus ultrasonographic findings extracted manually by experts; and three target variables: diagnosis, management, and severity.

National Poll on Healthy Aging (NPHA)

This is a subset of the NPHA dataset filtered down to develop and validate machine learning algorithms for predicting the number of doctors a survey respondent sees in a year. This dataset’s records represent seniors who responded to the NPHA survey.

Infrared Thermography Temperature

The Infrared Thermography Temperature Dataset contains temperatures read from various locations on infrared images of patients, together with the oral temperature measured for each individual. The 33 features consist of gender, age, ethnicity, ambient temperature, humidity, distance, and other temperature readings from the thermal images. The dataset is intended for a regression task: predicting the oral temperature from the environment information and the thermal image readings.

Jute Pest

This dataset has 17 classes. The data are divided into three partitions: train, val, and test. The classes are:
0: Beet Armyworm
1: Black Hairy
2: Cutworm
3: Field Cricket
4: Jute Aphid
5: Jute Hairy
6: Jute Red Mite
7: Jute Semilooper
8: Jute Stem Girdler
9: Jute Stem Weevil
10: Leaf Beetle
11: Mealybug
12: Pod Borer
13: Scopula Emissaria
14: Termite
15: Termite odontotermes (Rambur)
16: Yellow Mite
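As a convenience, the class indices given above can be held in a simple lookup. The sketch below is illustrative only: the names `JUTE_PEST_CLASSES` and `class_name` are hypothetical and not part of the dataset's distribution.

```python
# Class names in index order, copied from the dataset description.
JUTE_PEST_CLASSES = [
    "Beet Armyworm",
    "Black Hairy",
    "Cutworm",
    "Field Cricket",
    "Jute Aphid",
    "Jute Hairy",
    "Jute Red Mite",
    "Jute Semilooper",
    "Jute Stem Girdler",
    "Jute Stem Weevil",
    "Leaf Beetle",
    "Mealybug",
    "Pod Borer",
    "Scopula Emissaria",
    "Termite",
    "Termite odontotermes (Rambur)",
    "Yellow Mite",
]


def class_name(index):
    """Return the class name for a label index in the range 0-16."""
    if not 0 <= index < len(JUTE_PEST_CLASSES):
        raise ValueError(f"class index out of range: {index}")
    return JUTE_PEST_CLASSES[index]
```

Rejecting out-of-range indices explicitly avoids Python's negative indexing silently mapping a bad label such as -1 to the last class.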

Differentiated Thyroid Cancer Recurrence

This data set contains 13 clinicopathologic features aiming to predict recurrence of well-differentiated thyroid cancer. The data set was collected over a period of 15 years, and each patient was followed for at least 10 years.

