Center for Machine Learning and Intelligent Systems

26 Data Sets



1. Auto MPG: Revised from CMU StatLib library, data concerns city-cycle fuel consumption

2. Automobile: From 1985 Ward's Automotive Yearbook

3. Breast Cancer Wisconsin (Prognostic): Prognostic Wisconsin Breast Cancer Database

4. Computer Hardware: Relative CPU Performance Data, described in terms of its cycle time, memory size, etc.

5. Concrete Slump Test: Concrete is a highly complex material. The slump flow of concrete is determined not only by the water content but also by other concrete ingredients.

6. DrivFace: The DrivFace dataset contains image sequences of subjects while driving in real scenarios. It is composed of 606 samples of 640×480, acquired over different days from 4 drivers with several facial features.

7. Early biomarkers of Parkinson’s disease based on natural connected speech: Predict a pattern of neurodegeneration in the dataset of speech features obtained from patients with early untreated Parkinson’s disease and patients at high risk of developing Parkinson’s disease.

8. Energy efficiency: This study looked into assessing the heating load and cooling load requirements of buildings (that is, energy efficiency) as a function of building parameters.

9. Facebook metrics: Facebook performance metrics of a renowned cosmetics brand's Facebook page.

10. Forest Fires: This is a difficult regression task, where the aim is to predict the burned area of forest fires, in the northeast region of Portugal, by using meteorological and other data (see details at: http://www.dsi.uminho.pt/~pcortez/forestfires).

11. Gas sensor array exposed to turbulent gas mixtures: A chemical detection platform composed of 8 chemoresistive gas sensors was exposed to turbulent gas mixtures generated naturally in a wind tunnel. The acquired time series of the sensors are provided.

12. ISTANBUL STOCK EXCHANGE: Data set includes returns of the Istanbul Stock Exchange with seven other international indices: SP, DAX, FTSE, NIKKEI, BOVESPA, MSCE_EU, MSCI_EM, from Jun 5, 2009 to Feb 22, 2011.

13. Las Vegas Strip: This dataset includes quantitative and categorical features from online reviews of 21 hotels located on the Las Vegas Strip, extracted from TripAdvisor (http://www.tripadvisor.com).

14. Optical Interconnection Network: This dataset contains 640 performance measurements from a simulation of a 2-Dimensional Multiprocessor Optical Interconnection Network.

15. QSAR aquatic toxicity: Data set containing values for 8 attributes (molecular descriptors) of 546 chemicals used to predict quantitative acute aquatic toxicity towards Daphnia Magna.

16. QSAR Bioconcentration classes dataset: Dataset of manually-curated Bioconcentration factor (BCF, fish) and mechanistic classes for QSAR modeling.

17. QSAR fish toxicity: Data set containing values for 6 attributes (molecular descriptors) of 908 chemicals used to predict quantitative acute aquatic toxicity towards the fish Pimephales promelas (fathead minnow).

18. Real estate valuation data set: “Real estate valuation” is a regression problem. The historical market data set of real estate valuation was collected from Sindian Dist., New Taipei City, Taiwan.

19. Residential Building Data Set: Data set includes construction cost, sale prices, project variables, and economic variables corresponding to real estate single-family residential apartments in Tehran, Iran.

20. Servo: Data from a simulation of a servo system

21. Stock portfolio performance: The data set of performances of weighted scoring stock portfolios is obtained with mixture design from the US stock market historical database.

22. Student Performance: Predict student performance in secondary education (high school).

23. Tennis Major Tournament Match Statistics: This is a collection of 8 files containing the match statistics for both women and men at the four major tennis tournaments of the year 2013. Each file has 42 columns and a minimum of 76 rows.

24. Twin gas sensor arrays: 5 replicates of an 8-MOX gas sensor array were exposed to different gas conditions (4 volatiles at 10 concentration levels each).

25. wiki4HE: Survey of faculty members from two Spanish universities on teaching uses of Wikipedia

26. Yacht Hydrodynamics: Delft data set, used to predict the hydrodynamic performance of sailing yachts from dimensions and velocity.

