CDC Diabetes Health Indicators
Linked on 9/25/2023
The Diabetes Health Indicators Dataset contains healthcare statistics and lifestyle survey information about individual respondents, along with their diabetes diagnosis. The 21 features consist of demographics, lab test results, and answers to survey questions for each patient. The classification target is whether a patient has diabetes, is prediabetic, or is healthy; in the target variable Diabetes_binary, diabetes and prediabetes share the label 1.
Dataset Characteristics
Tabular, Multivariate
Subject Area
Health and Medicine
Associated Tasks
Classification
Feature Type
Categorical, Integer
# Instances
253680
# Features
21
Dataset Information
For what purpose was the dataset created?
To better understand the relationship between lifestyle and diabetes in the US
Who funded the creation of the dataset?
The CDC
What do the instances in this dataset represent?
Each row represents a person participating in this study.
Are there recommended data splits?
Cross validation or a fixed train-test split could be used.
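For a fixed train-test split, stratifying by the target keeps the class proportions the same in both parts. The following is a minimal standard-library sketch, assuming the labels arrive as a plain list (for example the values of Diabetes_binary). The function name `stratified_split`, the 20% test fraction and the seed are illustrative choices, not something the dataset prescribes.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that every class is split in the same proportion.

    `labels` is any sequence of class labels; indices refer to positions in it.
    """
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [0] * 80 + [1] * 20
train, test = stratified_split(labels)
print(len(train), len(test))
```

With scikit-learn, `train_test_split(..., stratify=y)` covers the fixed-split case and `StratifiedKFold` applies the same idea to cross validation.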
Does the dataset contain data that might be considered sensitive in any way?
- Gender
- Income
- Education level
Was there any data preprocessing performed?
Bucketing of age
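The page does not say which age buckets were used. A minimal sketch, assuming the 13-level, mostly five-year groups of the BRFSS `_AGEG5YR` variable (1 = 18-24, 2 = 25-29, ..., 12 = 75-79, 13 = 80 or older); the helper name `age_bucket` is hypothetical.

```python
def age_bucket(age: int) -> int:
    """Map an age in years to an assumed 13-level category (BRFSS _AGEG5YR style).

    1 = 18-24, 2 = 25-29, ..., 12 = 75-79, 13 = 80 or older.
    """
    if age < 18:
        raise ValueError("ages below 18 fall outside the assumed buckets")
    if age <= 24:
        return 1
    # Five-year groups starting at 25, capped at the open-ended 80+ group.
    return min(13, (age - 25) // 5 + 2)


print([age_bucket(a) for a in (18, 25, 60, 80)])  # [1, 2, 9, 13]
```

Coarse buckets like these reduce how identifying an exact age is, at the cost of some resolution.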
Additional Information
Dataset link: https://www.cdc.gov/brfss/annual_data/annual_2014.html
Has Missing Values?
No
Introductory Paper
By Nilka Rios Burrows, MPH; Israel Hora, PhD; Linda S. Geiss, MA; Edward W. Gregg, PhD; Ann Albright, PhD. 2017
Published in Morbidity and Mortality Weekly Report
Variables Table
| Variable Name | Role | Type | Demographic | Description | Units | Missing Values |
|---|---|---|---|---|---|---|
| ID | ID | Integer | | Patient ID | | no |
| Diabetes_binary | Target | Binary | | 0 = no diabetes; 1 = prediabetes or diabetes | | no |
| HighBP | Feature | Binary | | 0 = no high BP; 1 = high BP | | no |
| HighChol | Feature | Binary | | 0 = no high cholesterol; 1 = high cholesterol | | no |
| CholCheck | Feature | Binary | | 0 = no cholesterol check in 5 years; 1 = cholesterol check in 5 years | | no |
| BMI | Feature | Integer | | Body Mass Index | | no |
| Smoker | Feature | Binary | | Have you smoked at least 100 cigarettes in your entire life? (5 packs = 100 cigarettes) 0 = no; 1 = yes | | no |
| Stroke | Feature | Binary | | (Ever told) you had a stroke. 0 = no; 1 = yes | | no |
| HeartDiseaseorAttack | Feature | Binary | | Coronary heart disease (CHD) or myocardial infarction (MI). 0 = no; 1 = yes | | no |
| PhysActivity | Feature | Binary | | Physical activity in past 30 days, not including job. 0 = no; 1 = yes | | no |
Only the first 10 of the 23 variables are listed above.
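The binary codings in the variables table can be turned into readable labels when inspecting individual rows. A minimal sketch, assuming a row is given as a plain dict keyed by variable name; only the binary variables listed in the table are decoded, and the names `BINARY_CODINGS` and `decode_row` are hypothetical.

```python
# (label for 0, label for 1), copied from the variables table.
BINARY_CODINGS = {
    "Diabetes_binary": ("no diabetes", "prediabetes or diabetes"),
    "HighBP": ("no high BP", "high BP"),
    "HighChol": ("no high cholesterol", "high cholesterol"),
    "CholCheck": ("no cholesterol check in 5 years", "cholesterol check in 5 years"),
    "Smoker": ("no", "yes"),
    "Stroke": ("no", "yes"),
    "HeartDiseaseorAttack": ("no", "yes"),
    "PhysActivity": ("no", "yes"),
}


def decode_row(row: dict) -> dict:
    """Replace 0/1 codes with their table labels; other values pass through."""
    decoded = {}
    for name, value in row.items():
        if name in BINARY_CODINGS:
            if value not in (0, 1):
                raise ValueError(f"{name} must be 0 or 1, got {value!r}")
            decoded[name] = BINARY_CODINGS[name][int(value)]
        else:
            decoded[name] = value  # e.g. ID and BMI are integers, kept as-is
    return decoded


print(decode_row({"ID": 7, "HighBP": 1, "BMI": 28, "Smoker": 0}))
```

Converting with `int(value)` also accepts codes stored as floats such as `1.0`.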
Additional Variable Information
- Diabetes diagnosis
- Demographics (race, sex)
- Personal information (income, education)
- Health history (drinking, smoking, mental health, physical health)
Class Labels
- Diabetes
- Pre-diabetes
- Healthy
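The three class labels map onto the two values of Diabetes_binary, where diabetes and prediabetes are both coded 1. A minimal sketch, assuming "Healthy" corresponds to 0 (no diabetes); the mapping and the helper `to_binary` are illustrative, not part of the published data.

```python
from collections import Counter

# Assumed correspondence between the class labels and Diabetes_binary
# (0 = no diabetes, 1 = prediabetes or diabetes).
LABEL_TO_BINARY = {"Healthy": 0, "Pre-diabetes": 1, "Diabetes": 1}


def to_binary(labels):
    """Collapse three-way class labels into the 0/1 coding of Diabetes_binary."""
    try:
        return [LABEL_TO_BINARY[label] for label in labels]
    except KeyError as err:
        raise ValueError(f"unknown class label: {err.args[0]!r}") from None


labels = ["Healthy", "Pre-diabetes", "Diabetes", "Healthy"]
print(Counter(to_binary(labels)))  # Counter({0: 2, 1: 2})
```

Collapsing the labels this way discards the distinction between prediabetes and diabetes, which is the trade-off the binary target already makes.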
Install the ucimlrepo package:

```shell
pip install ucimlrepo
```

Import the dataset in Python:

```python
from ucimlrepo import fetch_ucirepo

# fetch dataset
cdc_diabetes_health_indicators = fetch_ucirepo(id=891)

# data (as pandas dataframes)
X = cdc_diabetes_health_indicators.data.features
y = cdc_diabetes_health_indicators.data.targets

# metadata
print(cdc_diabetes_health_indicators.metadata)

# variable information
print(cdc_diabetes_health_indicators.variables)
```
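After loading, a quick sanity check can compare the frames with the figures stated on this page (253680 instances, 21 features, no missing values). A minimal sketch, assuming `X` and `y` are the pandas DataFrames returned by `fetch_ucirepo`; the function `check_against_metadata` is hypothetical, and whether the ID column is counted among the features is not confirmed here, so a shape mismatch is reported rather than treated as fatal.

```python
import pandas as pd

# Figures stated on this page ("# Instances", "# Features", "Has Missing Values?").
EXPECTED_ROWS = 253680
EXPECTED_FEATURES = 21


def check_against_metadata(X: pd.DataFrame, y: pd.DataFrame,
                           n_rows: int = EXPECTED_ROWS,
                           n_features: int = EXPECTED_FEATURES) -> list:
    """Return a list of mismatches between the loaded frames and the page metadata."""
    problems = []
    if X.shape != (n_rows, n_features):
        problems.append(f"features have shape {X.shape}, expected {(n_rows, n_features)}")
    if len(y) != n_rows:
        problems.append(f"targets have {len(y)} rows, expected {n_rows}")
    # The page states there are no missing values in the data.
    if X.isna().any().any() or y.isna().any().any():
        problems.append("missing values found")
    return problems
```

Calling `check_against_metadata(X, y)` right after the loading snippet returns an empty list when the download agrees with the metadata.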
CDC Diabetes Health Indicators [Dataset]. (2017). UCI Machine Learning Repository. https://doi.org/10.24432/C53919.
Citations/Acknowledgements
If you use this dataset, please follow the acknowledgment policy on the original dataset website.