Mammographic Mass
Donated on 10/28/2007
Discrimination of benign and malignant mammographic masses based on BI-RADS attributes and the patient's age.
Dataset Characteristics
Multivariate
Subject Area
Health and Medicine
Associated Tasks
Classification
Feature Type
Integer
# Instances
961
# Features
5
Dataset Information
Additional Information
Mammography is the most effective method for breast cancer screening available today. However, the low positive predictive value of breast biopsy resulting from mammogram interpretation leads to approximately 70% unnecessary biopsies with benign outcomes. To reduce the high number of unnecessary breast biopsies, several computer-aided diagnosis (CAD) systems have been proposed in recent years. These systems help physicians decide whether to perform a breast biopsy on a suspicious lesion seen in a mammogram or to perform a short-term follow-up examination instead.

This data set can be used to predict the severity (benign or malignant) of a mammographic mass lesion from BI-RADS attributes and the patient's age. It contains a BI-RADS assessment, the patient's age, and three BI-RADS attributes, together with the ground truth (the severity field), for 516 benign and 445 malignant masses identified on full-field digital mammograms collected at the Institute of Radiology of the University Erlangen-Nuremberg between 2003 and 2006.

Each instance has an associated BI-RADS assessment ranging from 1 (definitely benign) to 5 (highly suggestive of malignancy), assigned in a double-review process by physicians. Assuming that all cases with a BI-RADS assessment greater than or equal to a given value (varying from 1 to 5) are malignant and the other cases benign, sensitivities and associated specificities can be calculated. These can indicate how well a CAD system performs compared to the radiologists.

Class Distribution: benign: 516; malignant: 445
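The threshold-based comparison described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `sens_spec_by_threshold` and the ten toy cases are hypothetical, and rows with a missing BI-RADS assessment are simply skipped, which is an assumption rather than something the dataset description prescribes.

```python
from typing import List, Optional, Tuple


def sens_spec_by_threshold(
    birads: List[Optional[int]],
    severity: List[int],
    thresholds=range(1, 6),
) -> List[Tuple[int, float, float]]:
    """Return (threshold, sensitivity, specificity) for each threshold.

    A case counts as predicted malignant when its BI-RADS assessment is
    greater than or equal to the threshold. Cases whose BI-RADS value is
    None (missing) are skipped; this handling is an assumption.
    """
    results = []
    for t in thresholds:
        tp = fn = tn = fp = 0
        for b, s in zip(birads, severity):
            if b is None:
                continue
            predicted_malignant = b >= t
            if s == 1 and predicted_malignant:
                tp += 1
            elif s == 1:
                fn += 1
            elif predicted_malignant:
                fp += 1
            else:
                tn += 1
        sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
        specificity = tn / (tn + fp) if (tn + fp) else float("nan")
        results.append((t, sensitivity, specificity))
    return results


# Hypothetical toy cases, NOT taken from the dataset.
birads = [2, 3, 4, 4, 5, 5, 4, 3, None, 5]
severity = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1]

results = sens_spec_by_threshold(birads, severity)
for threshold, sensitivity, specificity in results:
    print(f"BI-RADS >= {threshold}: "
          f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

Each (sensitivity, specificity) pair is one operating point of the radiologists' assessment, so a CAD system evaluated on the same cases can be compared against those points.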
Has Missing Values?
Yes
Introductory Paper
By M. Elter, R. Schulz-Wendtland, T. Wittenberg. 2007
Published in Medical Physics (Lancaster)
Variables Table
Variable Name | Role | Type | Demographic | Description | Units | Missing Values |
---|---|---|---|---|---|---|
BI-RADS | Feature | Integer | | | | yes |
Age | Feature | Integer | Age | | | yes |
Shape | Feature | Integer | | | | yes |
Margin | Feature | Integer | | | | yes |
Density | Feature | Integer | | | | yes |
Severity | Target | Binary | | | | no |
Additional Variable Information
6 attributes in total (1 goal field, 1 non-predictive, 4 predictive attributes):

1. BI-RADS assessment: 1 to 5 (ordinal, non-predictive!)
2. Age: patient's age in years (integer)
3. Shape: mass shape: round=1, oval=2, lobular=3, irregular=4 (nominal)
4. Margin: mass margin: circumscribed=1, microlobulated=2, obscured=3, ill-defined=4, spiculated=5 (nominal)
5. Density: mass density: high=1, iso=2, low=3, fat-containing=4 (ordinal)
6. Severity: benign=0 or malignant=1 (binominal, goal field!)

Missing Attribute Values:

- BI-RADS assessment: 2
- Age: 5
- Shape: 31
- Margin: 48
- Density: 76
- Severity: 0
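The integer codes and missing-value counts above can be handled with a small parser like the sketch below. It rests on assumptions not stated on this page: that the raw data file is comma-separated, that columns appear in the order of the attribute list, and that a missing value is written as `?`. The helper names and the four sample rows are made up for illustration.

```python
import csv
import io

# Column order assumed to follow the attribute list above.
COLUMNS = ["BI-RADS", "Age", "Shape", "Margin", "Density", "Severity"]

# Code-to-label mappings copied from the attribute descriptions.
SHAPE = {1: "round", 2: "oval", 3: "lobular", 4: "irregular"}
MARGIN = {
    1: "circumscribed",
    2: "microlobulated",
    3: "obscured",
    4: "ill-defined",
    5: "spiculated",
}


def parse_rows(text):
    """Parse comma-separated rows, mapping '?' (assumed marker) to None."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        values = [None if f.strip() == "?" else int(f) for f in record]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def count_missing(rows):
    """Count missing (None) entries per column."""
    return {name: sum(row[name] is None for row in rows) for name in COLUMNS}


# Made-up sample rows, NOT taken from the dataset.
sample = "4,52,1,1,3,0\n5,67,4,5,?,1\n?,43,2,?,3,0\n4,?,3,4,2,1\n"

rows = parse_rows(sample)
missing = count_missing(rows)
print(missing)
print(SHAPE.get(rows[0]["Shape"]), MARGIN.get(rows[1]["Margin"]))
```

Run against the full data file, `count_missing` would be expected to reproduce the per-attribute counts listed above if the stated assumptions hold. Because Shape and Margin are nominal, decoding them to labels (or one-hot encoding them) avoids treating their codes as ordered quantities.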
Dataset Files
File | Size |
---|---|
mammographic_masses.data | 13.1 KB |
mammographic_masses.names | 3.3 KB |
Papers Citing this Dataset
By Minghao Gu, Shiliang Sun. 2019
Published in ArXiv.
By Harris Papadopoulos. 2011
Published in EANN/AIAI.
Install the package:

    pip install ucimlrepo

Import the dataset in Python:

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    mammographic_mass = fetch_ucirepo(id=161)

    # data (as pandas dataframes)
    X = mammographic_mass.data.features
    y = mammographic_mass.data.targets

    # metadata
    print(mammographic_mass.metadata)

    # variable information
    print(mammographic_mass.variables)
Elter, M. (2007). Mammographic Mass [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C53K6Z.
Creators
Matthias Elter
DOI
10.24432/C53K6Z
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.