Breast Cancer

Donated on 7/10/1988

This breast cancer domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. It is one of three domains provided by the Oncology Institute that have repeatedly appeared in the machine learning literature (see also lymphography and primary-tumor).

Dataset Characteristics

Multivariate

Subject Area

Health and Medicine

Associated Tasks

Classification

Feature Type

Categorical

# Instances

286

# Features

9

Dataset Information

Additional Information

This data set includes 201 instances of one class and 85 instances of another class. The instances are described by 9 attributes, some of which are linear and some nominal.
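
The class counts above imply a simple reference point: a classifier that always predicts the larger class. The sketch below only reworks the two counts from the description; the key names `majority_class` and `minority_class` are placeholders, since the text does not say which label has 201 instances.

```python
# Class counts as stated in the dataset description.
# The key names are placeholders, not labels from the data.
counts = {"majority_class": 201, "minority_class": 85}

total = sum(counts.values())

# Accuracy of always predicting the larger class.
baseline_accuracy = max(counts.values()) / total

print(total)                        # 286
print(round(baseline_accuracy, 3))  # 0.703
```

Any model evaluated on this data should be compared against roughly 70% accuracy rather than 50%.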

Has Missing Values?

Yes

Variables Table

Variable Name	Role	Type	Demographic	Description	Units	Missing Values
Class	Target	Binary		no-recurrence-events, recurrence-events		no
age	Feature	Categorical	Age	10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90-99	years	no
menopause	Feature	Categorical		lt40, ge40, premeno		no
tumor-size	Feature	Categorical		0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59		no
inv-nodes	Feature	Categorical		0-2, 3-5, 6-8, 9-11, 12-14, 15-17, 18-20, 21-23, 24-26, 27-29, 30-32, 33-35, 36-39		no
node-caps	Feature	Binary		yes, no		yes
deg-malig	Feature	Integer		1, 2, 3		no
breast	Feature	Binary		left, right		no
breast-quad	Feature	Categorical		left-up, left-low, right-up, right-low, central		yes
irradiat	Feature	Binary		yes, no		no
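
A minimal sketch of reading records with these variables, using only the Python standard library. It assumes the data file is comma-separated, that the columns follow the order of the table above, and that missing values are written as `?`; none of this is stated on this page, so check `breast-cancer.names` before relying on it. The two rows in `SAMPLE` are made up from the listed categories for illustration and are not copied from the dataset.

```python
import csv
import io

# Column order assumed to match the variables table.
COLUMNS = ["Class", "age", "menopause", "tumor-size", "inv-nodes",
           "node-caps", "deg-malig", "breast", "breast-quad", "irradiat"]

# Hypothetical rows built from the listed categories (not real records).
SAMPLE = """\
no-recurrence-events,40-49,premeno,20-24,0-2,no,2,left,left-up,no
recurrence-events,50-59,ge40,15-19,0-2,?,2,right,?,yes
"""

def read_rows(text, missing="?"):
    """Parse comma-separated records; map the assumed missing marker to None."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        row = {name: (None if value == missing else value)
               for name, value in zip(COLUMNS, fields)}
        # deg-malig is the only integer-typed variable in the table.
        if row["deg-malig"] is not None:
            row["deg-malig"] = int(row["deg-malig"])
        rows.append(row)
    return rows

rows = read_rows(SAMPLE)

# Count missing entries per column.
missing_per_column = {name: sum(r[name] is None for r in rows)
                      for name in COLUMNS}
print(missing_per_column["node-caps"], missing_per_column["breast-quad"])
```

To read the actual file, the same function could be given the file's contents, for example `read_rows(open("breast-cancer.data").read())`.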

Additional Variable Information

1. Class: no-recurrence-events, recurrence-events
2. age: 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90-99
3. menopause: lt40, ge40, premeno
4. tumor-size: 0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59
5. inv-nodes: 0-2, 3-5, 6-8, 9-11, 12-14, 15-17, 18-20, 21-23, 24-26, 27-29, 30-32, 33-35, 36-39
6. node-caps: yes, no
7. deg-malig: 1, 2, 3
8. breast: left, right
9. breast-quad: left-up, left-low, right-up, right-low, central
10. irradiat: yes, no
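
Several of the attributes listed here (age, tumor-size, inv-nodes) are ranges stored as category labels. One possible way to use them numerically is to map each label to its position in the ordered list or to the midpoint of its range. The sketch below is one such encoding, not a method prescribed by the dataset; the helper names `range_bounds` and `range_midpoint` are hypothetical.

```python
# tumor-size bins in the order given in the variable list.
TUMOR_SIZE_BINS = ["0-4", "5-9", "10-14", "15-19", "20-24", "25-29",
                   "30-34", "35-39", "40-44", "45-49", "50-54", "55-59"]

def range_bounds(label):
    """Split a label such as '30-34' into integer bounds (30, 34)."""
    low, high = label.split("-")
    return int(low), int(high)

def range_midpoint(label):
    """Return the midpoint of a range label as a float."""
    low, high = range_bounds(label)
    return (low + high) / 2

# Ordinal encoding: position of each bin in the ordered list.
tumor_size_ordinal = {label: i for i, label in enumerate(TUMOR_SIZE_BINS)}

print(range_bounds("30-34"))        # (30, 34)
print(range_midpoint("30-34"))      # 32.0
print(tumor_size_ordinal["30-34"])  # 6
```

Nominal attributes such as menopause or breast-quad have no natural order, so an ordinal or midpoint encoding would not be appropriate for them.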

Class Labels

no-recurrence-events, recurrence-events

Dataset Files

File	Size
breast-cancer.data	18.2 KB
breast-cancer.names	3.1 KB
Index	132 Bytes

Papers Citing this Dataset

Online Data Poisoning Attack

By Xuezhou Zhang, Xiaojin Zhu, Laurent Lessard. 2019

Published in ArXiv.

QUOTIENT: Two-Party Secure Neural Network Training and Prediction

By Nitin Agrawal, Ali Shamsabadi, Matt Kusner, Adrià Gascón. 2019

Published in ArXiv.

Optimized Realization of Bayesian Networks in Reduced Normal Form using Latent Variable Model

By Giovanni Gennaro, Amedeo Buonanno, Francesco Palmieri. 2019

Published in ArXiv.

Target-Focused Feature Selection Using a Bayesian Approach

By Orpaz Goldstein, Mohammad Kachuee, Kimmo Karkkainen, Majid Sarrafzadeh. 2019

Published in

A Novel Hyperparameter-free Approach to Decision Tree Construction that Avoids Overfitting by Design

By Rafael Leiva, Antonio Anta, Vincenzo Mancuso, Paolo Casari. 2019

Published in ArXiv.

147 citations

113259 views

Keywords

cancer health

Creators

Matjaz Zwitter

Milan Soklic

DOI

10.24432/C51P4M

Notes

License

This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.
