
Diabetes 130-US Hospitals for Years 1999-2008
Donated on 5/2/2014
The dataset represents ten years (1999-2008) of clinical care at 130 US hospitals and integrated delivery networks. Each row is the hospital record of a patient diagnosed with diabetes who underwent laboratory tests, received medications, and stayed up to 14 days. The goal is to determine whether the patient is readmitted early, that is, within 30 days of discharge. The problem is important for the following reasons. Despite high-quality evidence showing improved clinical outcomes for diabetic patients who receive various preventive and therapeutic interventions, many patients do not receive them. This can be partially attributed to arbitrary diabetes management in hospital environments, which fails to attend to glycemic control. Failure to provide proper diabetes care not only increases management costs for hospitals (as patients are readmitted) but also affects the morbidity and mortality of patients, who may face complications associated with diabetes.
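The stated goal can be framed as a binary label. The sketch below is illustrative only: it assumes the raw readmission label takes the values "<30", ">30", and "NO", which this page does not state, and the function name `is_early_readmission` is hypothetical.

```python
def is_early_readmission(readmitted: str) -> int:
    """Return 1 for a readmission within 30 days of discharge, else 0.

    Assumption (not stated on this page): the raw label is one of
    "<30", ">30", or "NO".
    """
    label = readmitted.strip().upper()
    if label not in {"<30", ">30", "NO"}:
        raise ValueError(f"unexpected readmission label: {readmitted!r}")
    return 1 if label == "<30" else 0


labels = ["NO", ">30", "<30", "NO"]
binary = [is_early_readmission(v) for v in labels]
print(binary)  # [0, 0, 1, 0]
```

Treating ">30" as a negative is a modelling choice that follows the "within 30 days" wording above; other groupings are possible.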
Dataset Characteristics
Multivariate
Subject Area
Health and Medicine
Associated Tasks
Classification, Clustering
Feature Type
Categorical, Integer
# Instances
101766
# Features
47
Dataset Information
What do the instances in this dataset represent?
The instances represent hospital records of patients diagnosed with diabetes.
Are there recommended data splits?
No recommendation. A standard train-test split can be used. A three-way holdout split (i.e., train-validation-test) can be used when doing model selection.
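A three-way holdout split could be sketched as follows, using only the standard library. The function name `three_way_split` and the 70/15/15 proportions are assumptions for illustration, not a recommendation from the dataset authors.

```python
import random


def three_way_split(n_rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle row indices and cut them into train/validation/test lists."""
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = int(n_rows * test_frac)
    n_val = int(n_rows * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


# 101766 is the number of instances reported for this dataset.
train_idx, val_idx, test_idx = three_way_split(101766)
print(len(train_idx), len(val_idx), len(test_idx))
```

This splits by row. Since the variables include both `encounter_id` and `patient_nbr`, the same patient may appear in more than one encounter; if so, splitting by patient rather than by row may be preferable to keep a patient out of both train and test.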
Does the dataset contain data that might be considered sensitive in any way?
Yes. The dataset contains information about the age, gender, and race of the patients.
Additional Information
The dataset represents ten years (1999-2008) of clinical care at 130 US hospitals and integrated delivery networks. It includes over 50 features representing patient and hospital outcomes. Information was extracted from the database for encounters that satisfied the following criteria. (1) It is an inpatient encounter (a hospital admission). (2) It is a diabetic encounter, that is, one during which any kind of diabetes was entered into the system as a diagnosis. (3) The length of stay was at least 1 day and at most 14 days. (4) Laboratory tests were performed during the encounter. (5) Medications were administered during the encounter. The data contains such attributes as patient number, race, gender, age, admission type, time in hospital, medical specialty of admitting physician, number of lab tests performed, HbA1c test result, diagnosis, number of medications, diabetic medications, number of outpatient, inpatient, and emergency visits in the year before the hospitalization, etc.
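The five inclusion criteria can be expressed as a single predicate. This is a sketch only: the `Encounter` class and all of its field names are hypothetical and do not correspond to the schema of the source database.

```python
from dataclasses import dataclass


@dataclass
class Encounter:
    # All field names are hypothetical, chosen only for this illustration.
    is_inpatient: bool
    has_diabetes_diagnosis: bool
    length_of_stay_days: int
    num_lab_tests: int
    num_medications: int


def meets_inclusion_criteria(e: Encounter) -> bool:
    """Apply the five inclusion criteria listed in the text."""
    return (
        e.is_inpatient                        # (1) inpatient encounter
        and e.has_diabetes_diagnosis          # (2) diabetes entered as a diagnosis
        and 1 <= e.length_of_stay_days <= 14  # (3) stay of 1 to 14 days
        and e.num_lab_tests > 0               # (4) laboratory tests performed
        and e.num_medications > 0             # (5) medications administered
    )


candidates = [
    Encounter(True, True, 3, 40, 12),
    Encounter(True, True, 20, 55, 18),  # fails (3): stay longer than 14 days
    Encounter(True, False, 2, 10, 4),   # fails (2): no diabetes diagnosis
]
kept = [e for e in candidates if meets_inclusion_criteria(e)]
print(len(kept))  # 1
```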
Has Missing Values?
Yes
Introductory Paper
By Beata Strack, Jonathan DeShazo, Chris Gennings, Juan Olmo, Sebastian Ventura, Krzysztof Cios, John Clore. 2014
Published in BioMed Research International, vol. 2014
Variables Table
| Variable Name | Role | Type | Demographic | Description | Units | Missing Values |
|---|---|---|---|---|---|---|
| encounter_id | ID | | | Unique identifier of an encounter | | no |
| patient_nbr | ID | | | Unique identifier of a patient | | no |
| race | Feature | Categorical | Race | Values: Caucasian, Asian, African American, Hispanic, and other | | yes |
| gender | Feature | Categorical | Gender | Values: male, female, and unknown/invalid | | no |
| age | Feature | Categorical | Age | Grouped in 10-year intervals: [0, 10), [10, 20),..., [90, 100) | | no |
| weight | Feature | Categorical | | Weight in pounds. | | yes |
| admission_type_id | Feature | Categorical | | Integer identifier corresponding to 9 distinct values, for example, emergency, urgent, elective, newborn, and not available | | no |
| discharge_disposition_id | Feature | Categorical | | Integer identifier corresponding to 29 distinct values, for example, discharged to home, expired, and not available | | no |
| admission_source_id | Feature | Categorical | | Integer identifier corresponding to 21 distinct values, for example, physician referral, emergency room, and transfer from a hospital | | no |
| time_in_hospital | Feature | Integer | | Integer number of days between admission and discharge | | no |
Showing the first 10 of 50 variables.
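Because `age` is stored as a categorical 10-year bracket, numeric models may need it converted. The sketch below parses a bracket into its bounds and midpoint. The helper names are hypothetical, and accepting either a comma or a hyphen as the separator is an assumption, since the exact string format in the file is not shown here.

```python
import re
from typing import Tuple

# Matches "[0, 10)" as written in the table; a hyphen separator such as
# "[0-10)" is also accepted as an assumption about the raw file.
_BRACKET = re.compile(r"^\[(\d+)\s*[-,]\s*(\d+)\)$")


def age_bracket_bounds(bracket: str) -> Tuple[int, int]:
    """Return (lower, upper) for a half-open age bracket string."""
    match = _BRACKET.match(bracket.strip())
    if match is None:
        raise ValueError(f"unrecognized age bracket: {bracket!r}")
    return int(match.group(1)), int(match.group(2))


def age_bracket_midpoint(bracket: str) -> float:
    """Return the midpoint of the bracket as a rough numeric age."""
    lower, upper = age_bracket_bounds(bracket)
    return (lower + upper) / 2


print(age_bracket_bounds("[0, 10)"))     # (0, 10)
print(age_bracket_midpoint("[90-100)"))  # 95.0
```

Using the midpoint is one convention among several; an ordinal index (0 to 9) would preserve the ordering without implying a specific age.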
Additional Variable Information
A detailed description of all the attributes is provided in Table 1 of Beata Strack, Jonathan P. DeShazo, Chris Gennings, Juan L. Olmo, Sebastian Ventura, Krzysztof J. Cios, and John N. Clore, “Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records,” BioMed Research International, vol. 2014, Article ID 781670, 11 pages, 2014. http://www.hindawi.com/journals/bmri/2014/781670/
Dataset Files
| File | Size | 
|---|---|
| diabetic_data.csv | 18.3 MB | 
| IDS_mapping.csv | 2.5 KB | 
    pip install ucimlrepo

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    diabetes_130_us_hospitals_for_years_1999_2008 = fetch_ucirepo(id=296)

    # data (as pandas dataframes)
    X = diabetes_130_us_hospitals_for_years_1999_2008.data.features
    y = diabetes_130_us_hospitals_for_years_1999_2008.data.targets

    # metadata
    print(diabetes_130_us_hospitals_for_years_1999_2008.metadata)

    # variable information
    print(diabetes_130_us_hospitals_for_years_1999_2008.variables)
Clore, J., Cios, K., DeShazo, J., & Strack, B. (2014). Diabetes 130-US Hospitals for Years 1999-2008 [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5230J.
Creators
John Clore
Krzysztof Cios
Jon DeShazo
Beata Strack
DOI
10.24432/C5230J
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.