National Health and Nutrition Health Survey 2013-2014 (NHANES) Age Prediction Subset

Donated on 9/21/2023

The National Health and Nutrition Examination Survey (NHANES), administered by the Centers for Disease Control and Prevention (CDC), collects extensive health and nutritional information from a diverse U.S. population. Though expansive, the dataset is often too broad for specific analytical purposes. In this sub-dataset, we narrow our focus to predicting respondents' age by extracting a subset of features from the larger NHANES dataset. These selected features include physiological measurements, lifestyle choices, and biochemical markers, which were hypothesized to have strong correlations with age.

Dataset Characteristics


Subject Area

Health and Medicine

Associated Tasks


Feature Type

Real, Categorical, Integer

# Instances


# Features


Dataset Information

For what purpose was the dataset created?

The NHANES dataset was created to assess the health and nutritional status of adults and children in the United States.

Who funded the creation of the dataset?

Centers for Disease Control and Prevention (CDC), specifically through its National Center for Health Statistics (NCHS)

What do the instances in this dataset represent?

Survey respondents throughout the United States. Data were gathered through interviews, physical examinations, and laboratory tests.

Was there any data preprocessing performed?

For this subset, respondents aged 65 and older were labeled “senior” and all respondents under 65 were labeled “non-senior.”
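The labeling rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the donors' preprocessing code: the names `SENIOR_CUTOFF` and `label_age_group` are hypothetical, ages are assumed to be in years (as in RIDAGEYR), and the exact label strings stored in the distributed file may differ from the ones used here.

```python
SENIOR_CUTOFF = 65  # years; threshold stated in the preprocessing note


def label_age_group(age_years: float) -> str:
    """Return "senior" for ages 65 and older, otherwise "non-senior"."""
    return "senior" if age_years >= SENIOR_CUTOFF else "non-senior"


# Illustrative ages only, not rows taken from the dataset.
example_ages = [23, 64, 65, 80]
labels = [label_age_group(age) for age in example_ages]
print(labels)
```

Because the target is derived directly from age, the boundary case matters: a respondent aged exactly 65 falls in the “senior” group.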

Additional Information

The original full dataset can be found at:

Has Missing Values?


Introductory Paper

A data-driven approach to predicting diabetes and cardiovascular disease with machine learning

By An Dinh, Stacey Miertschin, Amber Young, S. Mohanty. 2019

Published in BMC Medical Informatics and Decision Making

Variables Table

Variable Name | Role | Type | Demographic | Description | Units | Missing Values
SEQN | ID | Continuous |  | Respondent Sequence Number |  | no
age_group | Target | Categorical | Age | Respondent's Age Group (senior/non-senior) |  | no
RIDAGEYR | Other | Continuous | Age | Respondent's Age |  | no
RIAGENDR | Feature | Continuous | Gender | Respondent's Gender |  | no
PAQ605 | Feature | Continuous |  | If the respondent engages in moderate or vigorous-intensity sports, fitness, or recreational activities in the typical week |  | no
BMXBMI | Feature | Continuous |  | Respondent's Body Mass Index |  | no
LBXGLU | Feature | Continuous |  | Respondent's Blood Glucose after fasting |  | no
DIQ010 | Feature | Continuous |  | If the Respondent is diabetic |  | no
LBXGLT | Feature | Continuous |  | Respondent's Oral |  | no
LBXIN | Feature | Continuous |  | Respondent's Blood Insulin Levels |  | no
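
The roles listed in the variables table suggest how the columns might be separated before modeling. The following is a minimal sketch, assuming each record is available as a Python dict keyed by variable name; the names `ROLES`, `FEATURES`, `TARGET`, and `split_row` are hypothetical and not part of the dataset or any library.

```python
# Variable roles copied from the variables table.
ROLES = {
    "SEQN": "ID",
    "age_group": "Target",
    "RIDAGEYR": "Other",
    "RIAGENDR": "Feature",
    "PAQ605": "Feature",
    "BMXBMI": "Feature",
    "LBXGLU": "Feature",
    "DIQ010": "Feature",
    "LBXGLT": "Feature",
    "LBXIN": "Feature",
}

# Keep only the columns whose role is "Feature"; dicts preserve insertion order.
FEATURES = [name for name, role in ROLES.items() if role == "Feature"]
TARGET = next(name for name, role in ROLES.items() if role == "Target")


def split_row(row: dict):
    """Return (feature values in FEATURES order, target value) for one record."""
    return [row[name] for name in FEATURES], row[TARGET]
```

RIDAGEYR is left out of the feature list on purpose: the target is derived from it, so including it would leak the answer into the inputs.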


Additional Variable Information

Class Labels

RIAGENDR: a 1 represents Male and a 2 represents Female.
PAQ605: a 1 represents that the respondent takes part in weekly moderate or vigorous-intensity physical activity and a 2 represents that they do not.
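
The codes described above can be turned into readable labels with a small lookup. This is a sketch under assumptions: the dictionary and function names are hypothetical, the short activity labels are invented for illustration, and codes may be stored as floats (such as 1.0), so the value is cast to an integer before lookup. Only codes 1 and 2 are documented here, so any other code maps to `None`.

```python
GENDER_LABELS = {1: "Male", 2: "Female"}            # RIAGENDR codes
ACTIVITY_LABELS = {1: "active", 2: "not active"}    # PAQ605 codes; label text is illustrative


def decode(code, labels):
    """Map a numeric code to its label; return None for any code not listed."""
    return labels.get(int(code))


print(decode(1, GENDER_LABELS))
print(decode(2.0, ACTIVITY_LABELS))
```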







National Center for Health Statistics (NCHS) at the Centers for Disease Control and Prevention (CDC)

