Center for Machine Learning and Intelligent Systems

Filters applied: Default Task: Regression; Data Type: Multivariate; # Attributes: 10 to 100; # Instances: 100 to 1000

21 Data Sets



1. wiki4HE: Survey of faculty members from two Spanish universities on teaching uses of Wikipedia

2. QSAR Bioconcentration classes dataset: Dataset of manually-curated Bioconcentration factor (BCF, fish) and mechanistic classes for QSAR modeling.

3. Automobile: From 1985 Ward's Automotive Yearbook

4. Student Performance: Predict student performance in secondary education (high school).

5. Facebook metrics: Facebook performance metrics of a renowned cosmetics brand's Facebook page.

6. CSM (Conventional and Social Media Movies) Dataset 2014 and 2015: 12 features categorized as conventional and social media features. Includes both conventional features, collected from movie databases on the Web, and social media features (YouTube, Twitter).

7. Heart failure clinical records: This dataset contains the medical records of 299 patients who had heart failure, collected during their follow-up period, where each patient profile has 13 clinical features.

8. South German Credit: 700 good and 300 bad credits with 20 predictor variables. Data from 1973 to 1975. Stratified sample from actual credits with bad credits heavily oversampled. A cost matrix can be used.

9. Tennis Major Tournament Match Statistics: This is a collection of 8 files containing the match statistics for both women and men at the four major tennis tournaments of the year 2013. Each file has 42 columns and a minimum of 76 rows.

10. Bone marrow transplant: children: The data set describes pediatric patients with several hematologic diseases who underwent unmanipulated allogeneic unrelated donor hematopoietic stem cell transplantation.

11. South German Credit (UPDATE): 700 good and 300 bad credits with 20 predictor variables. Data from 1973 to 1975. Stratified sample from actual credits with bad credits heavily oversampled. A cost matrix can be used.

12. Early biomarkers of Parkinson’s disease based on natural connected speech: Predict a pattern of neurodegeneration in the dataset of speech features obtained from patients with early untreated Parkinson’s disease and patients at high risk of developing Parkinson’s disease.

13. Optical Interconnection Network: This dataset contains 640 performance measurements from a simulation of a 2-Dimensional Multiprocessor Optical Interconnection Network.

14. Behavior of the urban traffic of the city of Sao Paulo in Brazil: The database was created with records of behavior of the urban traffic of the city of Sao Paulo in Brazil.

15. Breast Cancer Wisconsin (Prognostic): Prognostic Wisconsin Breast Cancer Database

16. Algerian Forest Fires Dataset: The dataset includes 244 instances that combine data from two regions of Algeria.

17. GPS Trajectories: The dataset was fed by an Android app called Go!Track, which is available on the Google Play Store (https://play.google.com/store/apps/details?id=com.go.router).

18. Stock portfolio performance: The performances of weighted scoring stock portfolios are obtained with a mixture design from the US stock market historical database.

19. Forest Fires: This is a difficult regression task, where the aim is to predict the burned area of forest fires, in the northeast region of Portugal, by using meteorological and other data (see details at: http://www.dsi.uminho.pt/~pcortez/forestfires).

20. Concrete Slump Test: Concrete is a highly complex material. The slump flow of concrete is not only determined by the water content, but is also influenced by other concrete ingredients.

21. Fertility: 100 volunteers provide a semen sample analyzed according to the WHO 2010 criteria. Sperm concentration is related to socio-demographic data, environmental factors, health status, and life habits.

