Breast Cancer
Donated on 7/10/1988
This breast cancer domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. It is one of three domains provided by the Oncology Institute that have repeatedly appeared in the machine learning literature. (See also lymphography and primary-tumor.)
Dataset Characteristics
Multivariate
Subject Area
Health and Medicine
Associated Tasks
Classification
Feature Type
Categorical
# Instances
286
# Features
9
Dataset Information
Additional Information
This data set includes 201 instances of one class (no-recurrence-events) and 85 instances of the other (recurrence-events). The instances are described by 9 attributes, some of which are linear and some nominal.
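Because the two classes are unevenly sized, it can help to know what accuracy a trivial classifier would reach before judging any model. The sketch below works only from the counts quoted above and assumes the 201-instance class is no-recurrence-events; the variable names are illustrative.

```python
# Class counts as stated in the dataset description.
# Assumption: the larger class is "no-recurrence-events".
class_counts = {"no-recurrence-events": 201, "recurrence-events": 85}

total = sum(class_counts.values())  # 286 instances overall
majority_class = max(class_counts, key=class_counts.get)

# Accuracy of a classifier that always predicts the majority class.
majority_baseline = class_counts[majority_class] / total

print(f"total instances: {total}")
print(f"majority class: {majority_class}")
print(f"majority-class accuracy: {majority_baseline:.3f}")
```

Under these counts the majority-class baseline is about 0.703, so a model's accuracy is only informative to the extent that it exceeds that figure.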
Has Missing Values?
Yes
Variables Table
Variable Name | Role | Type | Demographic | Description | Units | Missing Values |
---|---|---|---|---|---|---|
Class | Target | Binary | | no-recurrence-events, recurrence-events | | no |
age | Feature | Categorical | Age | 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90-99 | years | no |
menopause | Feature | Categorical | | lt40, ge40, premeno | | no |
tumor-size | Feature | Categorical | | 0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59 | | no |
inv-nodes | Feature | Categorical | | 0-2, 3-5, 6-8, 9-11, 12-14, 15-17, 18-20, 21-23, 24-26, 27-29, 30-32, 33-35, 36-39 | | no |
node-caps | Feature | Binary | | yes, no | | yes |
deg-malig | Feature | Integer | | 1, 2, 3 | | no |
breast | Feature | Binary | | left, right | | no |
breast-quad | Feature | Categorical | | left-up, left-low, right-up, right-low, central | | yes |
irradiat | Feature | Binary | | yes, no | | no |
Additional Variable Information
1. Class: no-recurrence-events, recurrence-events
2. age: 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90-99
3. menopause: lt40, ge40, premeno
4. tumor-size: 0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59
5. inv-nodes: 0-2, 3-5, 6-8, 9-11, 12-14, 15-17, 18-20, 21-23, 24-26, 27-29, 30-32, 33-35, 36-39
6. node-caps: yes, no
7. deg-malig: 1, 2, 3
8. breast: left, right
9. breast-quad: left-up, left-low, right-up, right-low, central
10. irradiat: yes, no
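Several of these attributes (age, tumor-size, inv-nodes) are interval labels with a natural order, which plain one-hot encoding would discard. As one possible preprocessing step, the sketch below maps a label either to its position among the listed bins or to the interval midpoint. The helper names `ordinal_code` and `range_midpoint` are hypothetical, not part of the dataset or of any library.

```python
# Bin labels copied from the tumor-size entry in the variable list.
TUMOR_SIZE_BINS = [
    "0-4", "5-9", "10-14", "15-19", "20-24", "25-29",
    "30-34", "35-39", "40-44", "45-49", "50-54", "55-59",
]


def ordinal_code(label: str, bins: list[str]) -> int:
    """Return the 0-based position of an interval label within its ordered bins."""
    return bins.index(label)  # raises ValueError for an unknown label


def range_midpoint(label: str) -> float:
    """Return the midpoint of an interval label such as '30-34'."""
    low, high = label.split("-")
    return (int(low) + int(high)) / 2


print(ordinal_code("10-14", TUMOR_SIZE_BINS))
print(range_midpoint("30-34"))
```

The ordinal code preserves only rank, while the midpoint also preserves approximate spacing. Which is preferable depends on the model, and both remain coarse because the original measurements are not available.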
Class Labels
no-recurrence-events, recurrence-events
Dataset Files
File | Size |
---|---|
breast-cancer.data | 18.2 KB |
breast-cancer.names | 3.1 KB |
Index | 132 Bytes |
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
breast_cancer = fetch_ucirepo(id=14)

# data (as pandas dataframes)
X = breast_cancer.data.features
y = breast_cancer.data.targets

# metadata
print(breast_cancer.metadata)

# variable information
print(breast_cancer.variables)
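For readers who prefer to work from the downloaded breast-cancer.data file, the following standard-library sketch shows one way to read the rows and count missing values per column. It rests on several assumptions not stated on this page: that the file is comma-separated with no header row, that its columns follow the order of the variable list (Class first), and that missing values are written as "?". The two sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Assumed column order: the numbered variable list, with Class first.
COLUMNS = [
    "Class", "age", "menopause", "tumor-size", "inv-nodes",
    "node-caps", "deg-malig", "breast", "breast-quad", "irradiat",
]
MISSING = "?"  # assumed missing-value marker in the raw file


def read_rows(handle):
    """Parse comma-separated records into dicts, mapping the marker to None."""
    rows = []
    for record in csv.reader(handle):
        if not record:  # skip blank lines
            continue
        rows.append({
            name: (None if value.strip() == MISSING else value.strip())
            for name, value in zip(COLUMNS, record)
        })
    return rows


def missing_counts(rows):
    """Count None values per column, listing only columns that have any."""
    counts = Counter()
    for row in rows:
        for name, value in row.items():
            if value is None:
                counts[name] += 1
    return dict(counts)


# Made-up rows in the assumed format, for illustration only.
SAMPLE = """\
no-recurrence-events,30-39,premeno,30-34,0-2,no,3,left,left-low,no
recurrence-events,50-59,ge40,25-29,3-5,?,2,right,?,yes
"""

rows = read_rows(io.StringIO(SAMPLE))
print(missing_counts(rows))
```

To use this with the real file, pass an open file handle (for example one opened with `newline=""`) to `read_rows` in place of the in-memory sample. If the assumptions hold, only node-caps and breast-quad should report missing values, matching the Variables Table.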
Zwitter, M. & Soklic, M. (1988). Breast Cancer [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C51P4M.
Creators
Matjaz Zwitter
Milan Soklic
DOI
10.24432/C51P4M
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.