National Health and Nutrition Health Survey 2013-2014 (NHANES) Age Prediction Subset
Donated on 9/21/2023
The National Health and Nutrition Examination Survey (NHANES), administered by the Centers for Disease Control and Prevention (CDC), collects extensive health and nutritional information from a diverse U.S. population. Though expansive, the dataset is often too broad for specific analytical purposes. In this sub-dataset, we narrow our focus to predicting respondents' age by extracting a subset of features from the larger NHANES dataset. These selected features include physiological measurements, lifestyle choices, and biochemical markers, which were hypothesized to have strong correlations with age.
Dataset Characteristics
Tabular
Subject Area
Health and Medicine
Associated Tasks
Classification
Feature Type
Real, Categorical, Integer
# Instances
6287
# Features
7
Dataset Information
For what purpose was the dataset created?
The NHANES dataset was created to assess the health and nutritional status of adults and children in the United States.
Who funded the creation of the dataset?
Centers for Disease Control and Prevention (CDC), specifically through its National Center for Health Statistics (NCHS)
What do the instances in this dataset represent?
Survey respondents throughout the United States. Data was gathered through interviews, physical examinations, and laboratory tests.
Was there any data preprocessing performed?
For this subset, respondents 65 years old and older were labeled as “senior” and all individuals under 65 years old as “non-senior.”
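The labeling rule above can be sketched in a few lines of pandas. This is only an illustration of the stated 65-year cutoff, not the donors' actual preprocessing code: the helper name `label_age_group` and the sample ages are hypothetical, and the label strings follow the wording of this page, so the strings stored in the CSV may differ.

```python
import pandas as pd

# Cutoff taken from the preprocessing note: 65 and older counts as senior.
SENIOR_AGE_CUTOFF = 65


def label_age_group(ages: pd.Series) -> pd.Series:
    """Map ages in years to "senior" / "non-senior" (hypothetical helper)."""
    is_senior = ages.ge(SENIOR_AGE_CUTOFF)
    return is_senior.map({True: "senior", False: "non-senior"})


# Made-up ages for illustration only; RIDAGEYR is the age column in the subset.
ages = pd.Series([23, 64, 65, 80], name="RIDAGEYR")
age_group = label_age_group(ages)
print(age_group.tolist())
```

Note that the boundary is inclusive: a respondent aged exactly 65 falls in the senior group.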
Additional Information
The original full dataset can be found at: https://wwwn.cdc.gov/nchs/nhanes/search/DataPage.aspx?Component=Questionnaire&CycleBeginYear=2013
Has Missing Values?
No
Introductory Paper
By An Dinh, Stacey Miertschin, Amber Young, S. Mohanty. 2019
Published in BMC Medical Informatics and Decision Making
Variables Table
Variable Name | Role | Type | Demographic | Description | Units | Missing Values |
---|---|---|---|---|---|---|
SEQN | ID | Continuous | | Respondent Sequence Number | | no |
age_group | Target | Categorical | Age | Respondent's Age Group (senior/non-senior) | | no |
RIDAGEYR | Other | Continuous | Age | Respondent's Age | | no |
RIAGENDR | Feature | Continuous | Gender | Respondent's Gender | | no |
PAQ605 | Feature | Continuous | | If the respondent engages in moderate or vigorous-intensity sports, fitness, or recreational activities in the typical week | | no |
BMXBMI | Feature | Continuous | | Respondent's Body Mass Index | | no |
LBXGLU | Feature | Continuous | | Respondent's Blood Glucose after fasting | | no |
DIQ010 | Feature | Continuous | | If the Respondent is diabetic | | no |
LBXGLT | Feature | Continuous | | Respondent's Oral | | no |
LBXIN | Feature | Continuous | | Respondent's Blood Insulin Levels | | no |
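As a rough sketch of how the Role column might be used, the snippet below rebuilds the name/role pairs from the table by hand and selects columns by role. The helper `columns_with_role` is hypothetical. One point worth noting: RIDAGEYR has the role Other rather than Feature, and because age_group is derived directly from age, using RIDAGEYR as a predictor would leak the target.

```python
import pandas as pd

# Name/role pairs copied by hand from the variables table above.
variables = pd.DataFrame(
    [
        ("SEQN", "ID"),
        ("age_group", "Target"),
        ("RIDAGEYR", "Other"),
        ("RIAGENDR", "Feature"),
        ("PAQ605", "Feature"),
        ("BMXBMI", "Feature"),
        ("LBXGLU", "Feature"),
        ("DIQ010", "Feature"),
        ("LBXGLT", "Feature"),
        ("LBXIN", "Feature"),
    ],
    columns=["name", "role"],
)


def columns_with_role(table: pd.DataFrame, role: str) -> list:
    """Return the variable names whose role matches (hypothetical helper)."""
    return table.loc[table["role"] == role, "name"].tolist()


feature_columns = columns_with_role(variables, "Feature")
target_columns = columns_with_role(variables, "Target")
print(len(feature_columns), target_columns)
```

The seven names returned for the Feature role match the feature count listed at the top of the page.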
Additional Variable Information
Class Labels
RIAGENDR: 1 represents Male and 2 represents Female.
PAQ605: 1 represents that the respondent takes part in weekly moderate or vigorous-intensity physical activity, and 2 represents that they do not.
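The codes above could be decoded into readable labels along these lines. This is a minimal sketch: the helper `decode_codes`, the short activity labels, and the sample rows are illustrative assumptions, and any code not listed on this page is left as a missing value rather than guessed at.

```python
import pandas as pd

# Code-to-label mappings as described under "Class Labels".
GENDER_LABELS = {1: "Male", 2: "Female"}
# Short labels chosen here for illustration; the page only describes them in prose.
ACTIVITY_LABELS = {1: "active", 2: "not active"}


def decode_codes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with RIAGENDR and PAQ605 decoded (hypothetical helper)."""
    out = df.copy()
    out["RIAGENDR"] = out["RIAGENDR"].map(GENDER_LABELS)
    out["PAQ605"] = out["PAQ605"].map(ACTIVITY_LABELS)  # unlisted codes become NaN
    return out


# Made-up rows for illustration only.
sample = pd.DataFrame({"RIAGENDR": [1, 2, 2], "PAQ605": [2, 1, 2]})
decoded = decode_codes(sample)
print(decoded)
```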
Dataset Files
File | Size |
---|---|
NHANES_age_prediction.csv | 116.8 KB |
    pip install ucimlrepo

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    national_health_and_nutrition_health_survey_2013_2014_nhanes_age_prediction_subset = fetch_ucirepo(id=887)

    # data (as pandas dataframes)
    X = national_health_and_nutrition_health_survey_2013_2014_nhanes_age_prediction_subset.data.features
    y = national_health_and_nutrition_health_survey_2013_2014_nhanes_age_prediction_subset.data.targets

    # metadata
    print(national_health_and_nutrition_health_survey_2013_2014_nhanes_age_prediction_subset.metadata)

    # variable information
    print(national_health_and_nutrition_health_survey_2013_2014_nhanes_age_prediction_subset.variables)
NA, N. (2019). National Health and Nutrition Health Survey 2013-2014 (NHANES) Age Prediction Subset [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5BS66.
Creators
NA NA
National Center for Health Statistics (NCHS) at the Centers for Disease Control and Prevention (CDC)
DOI
10.24432/C5BS66
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.