CDC Diabetes Health Indicators


Linked on 9/25/2023

The Diabetes Health Indicators Dataset contains healthcare statistics and lifestyle survey information about individuals, along with their diabetes diagnosis. The 35 features consist of demographics, lab test results, and survey answers for each patient. The target variable for classification is whether a patient has diabetes, is pre-diabetic, or is healthy.

Dataset Characteristics

Tabular, Multivariate

Subject Area

Health and Medicine

Associated Tasks


Feature Type

Categorical, Integer

# Instances


# Features


Dataset Information

For what purpose was the dataset created?

To better understand the relationship between lifestyle and diabetes in the US.

Who funded the creation of the dataset?


What do the instances in this dataset represent?

Each row represents a person participating in this study.

Are there recommended data splits?

Cross validation or a fixed train-test split could be used.
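Both options can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a recommended protocol from the dataset authors: the helper names `stratified_split` and `k_fold_indices`, the 20% test fraction, and the toy label list are all assumptions made for the example. The split is stratified so that the rarer positive class keeps roughly the same proportion in both parts.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), keeping class proportions similar in both."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for plain k-fold cross-validation."""
    rng = random.Random(seed)
    order = list(range(n_rows))
    rng.shuffle(order)
    # Deal the shuffled row indices into k folds of near-equal size.
    folds = [order[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(val)


# Toy labels standing in for the Diabetes_binary column (made-up values).
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
```

Fixing the seed keeps the split reproducible, which matters when results from different models are compared on the same held-out rows.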

Does the dataset contain data that might be considered sensitive in any way?

- Gender
- Income
- Education level

Was there any data preprocessing performed?

Bucketing of age
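The page does not state which bucket boundaries were used. As a rough sketch of what age bucketing looks like, the following assumes a hypothetical scheme of 13 ordinal levels: 18 to 24, then five-year bands, then 80 and older. The function name `age_bucket` and the boundaries are assumptions for illustration only; check the original dataset documentation before relying on them.

```python
def age_bucket(age):
    """Map an age in years to an ordinal bucket 1..13 (assumed scheme)."""
    if age < 18:
        raise ValueError("this assumed scheme covers adults only")
    if age <= 24:
        return 1   # 18-24
    if age >= 80:
        return 13  # 80 and older
    # Five-year bands: 25-29 -> 2, 30-34 -> 3, ..., 75-79 -> 12.
    return (age - 25) // 5 + 2


buckets = [age_bucket(a) for a in (18, 27, 63, 85)]
```

Bucketing of this kind trades precision for privacy and robustness: a model sees an ordered category rather than an exact age.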

Additional Information

Dataset link:

Has Missing Values?


Introductory Paper

Incidence of End-Stage Renal Disease Attributed to Diabetes Among Persons with Diagnosed Diabetes — United States and Puerto Rico, 2000–2014

By Nilka Rios Burrows, MPH; Israel Hora, PhD; Linda S. Geiss, MA; Edward W. Gregg, PhD; Ann Albright, PhD. 2017

Published in Morbidity and Mortality Weekly Report

Variables Table

| Variable Name | Role | Type | Demographic | Description | Units | Missing Values |
|---|---|---|---|---|---|---|
| ID | ID | Integer | | Patient ID | | no |
| Diabetes_binary | Target | Binary | | 0 = no diabetes, 1 = prediabetes or diabetes | | no |
| HighBP | Feature | Binary | | 0 = no high BP, 1 = high BP | | no |
| HighChol | Feature | Binary | | 0 = no high cholesterol, 1 = high cholesterol | | no |
| CholCheck | Feature | Binary | | 0 = no cholesterol check in 5 years, 1 = yes cholesterol check in 5 years | | no |
| BMI | Feature | Integer | | Body Mass Index | | no |
| Smoker | Feature | Binary | | Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes] 0 = no, 1 = yes | | no |
| Stroke | Feature | Binary | | (Ever told) you had a stroke. 0 = no, 1 = yes | | no |
| HeartDiseaseorAttack | Feature | Binary | | Coronary heart disease (CHD) or myocardial infarction (MI). 0 = no, 1 = yes | | no |
| PhysActivity | Feature | Binary | | Physical activity in past 30 days, not including job. 0 = no, 1 = yes | | no |

Showing the first 10 of 23 variables.
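The types in the table above translate directly into a row-level sanity check. The sketch below covers only the ten variables listed here, not the remaining thirteen; the function name `validate_row` and the sample row are hypothetical, and the values are made up for illustration.

```python
# Column names taken from the variables table; only the ten listed rows are covered.
BINARY_COLUMNS = [
    "Diabetes_binary", "HighBP", "HighChol", "CholCheck",
    "Smoker", "Stroke", "HeartDiseaseorAttack", "PhysActivity",
]
INTEGER_COLUMNS = ["ID", "BMI"]


def validate_row(row):
    """Return a list of problems found in one record (empty list if none)."""
    errors = []
    for name in BINARY_COLUMNS:
        if row.get(name) not in (0, 1):
            errors.append(f"{name}: expected 0 or 1, got {row.get(name)!r}")
    for name in INTEGER_COLUMNS:
        if not isinstance(row.get(name), int):
            errors.append(f"{name}: expected an integer, got {row.get(name)!r}")
    return errors


# Hypothetical record with made-up values.
sample = {
    "ID": 1, "Diabetes_binary": 0, "HighBP": 1, "HighChol": 0,
    "CholCheck": 1, "BMI": 27, "Smoker": 0, "Stroke": 0,
    "HeartDiseaseorAttack": 0, "PhysActivity": 1,
}
problems = validate_row(sample)
```

Running a check like this right after loading catches encoding surprises early, for example binary columns stored as floats or strings.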

Additional Variable Information

- Diabetes diagnosis
- Demographics (race, sex)
- Personal information (income, education)
- Health history (drinking, smoking, mental health, physical health)

Class Labels

- Diabetes
- Pre-diabetes
- Healthy




If you use this dataset, please follow the acknowledgment policy on the original dataset website.


