Census Income

Donated on 4/30/1996

Predict whether income exceeds $50K/yr based on census data. Also known as the Adult dataset.

Dataset Characteristics


Subject Area

Social Science

Associated Tasks

Classification

Feature Type

Categorical, Integer

# Instances


# Features


Dataset Information

Additional Information

Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0)). The prediction task is to determine whether a person makes over 50K a year.
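The extraction conditions above can be read as a simple record filter. The following is a minimal sketch, not the original extraction code: it assumes each record is a dictionary keyed by the census field names that appear in the condition (AAGE, AGI, AFNLWGT, HRSWK), and the sample records are hypothetical values made up for illustration.

```python
from typing import Mapping


def is_clean_record(rec: Mapping[str, float]) -> bool:
    """Return True when a record passes all four extraction conditions."""
    return (
        rec["AAGE"] > 16         # age above 16
        and rec["AGI"] > 100     # AGI above 100
        and rec["AFNLWGT"] > 1   # final weight above 1
        and rec["HRSWK"] > 0     # works at least some hours per week
    )


# Hypothetical records, for illustration only (not taken from the census data).
records = [
    {"AAGE": 39, "AGI": 30000, "AFNLWGT": 77516, "HRSWK": 40},
    {"AAGE": 15, "AGI": 500, "AFNLWGT": 1200, "HRSWK": 10},  # fails AAGE > 16
    {"AAGE": 52, "AGI": 80, "AFNLWGT": 2000, "HRSWK": 35},   # fails AGI > 100
]

clean = [r for r in records if is_clean_record(r)]
```

All four comparisons are strict, so a record with HRSWK equal to 0 or AAGE equal to 16 is excluded.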

Has Missing Values?

Yes

Variables Table

| Variable Name | Role | Type | Demographic | Description | Units | Missing Values |
| --- | --- | --- | --- | --- | --- | --- |
| workclass | Feature | Categorical | Income | Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked. | | yes |
| education | Feature | Categorical | Education Level | Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool. | | no |
| education-num | Feature | Integer | Education Level | | | no |
| marital-status | Feature | Categorical | Other | Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse. | | no |
| occupation | Feature | Categorical | Other | Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces. | | yes |
| relationship | Feature | Categorical | Other | Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried. | | no |
| race | Feature | Categorical | Race | White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black. | | no |
| sex | Feature | Binary | Sex | Female, Male. | | no |
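The table above marks workclass and occupation as having missing values. A minimal sketch of counting missing entries per column follows. It assumes rows are already parsed into dictionaries of strings and that a missing value is encoded as the string "?"; that marker is an assumption not stated on this page, so check the data files before relying on it. The sample rows are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping

MISSING_MARKER = "?"  # assumed encoding of a missing value; verify against the files


def count_missing(
    rows: Iterable[Mapping[str, str]], marker: str = MISSING_MARKER
) -> Counter:
    """Count, per column, how many rows hold the missing-value marker."""
    counts: Counter = Counter()
    for row in rows:
        for column, value in row.items():
            if value.strip() == marker:
                counts[column] += 1
    return counts


# Hypothetical rows, for illustration only.
rows = [
    {"workclass": "Private", "occupation": "Sales", "sex": "Male"},
    {"workclass": "?", "occupation": "?", "sex": "Female"},
    {"workclass": "State-gov", "occupation": "?", "sex": "Female"},
]

missing = count_missing(rows)
```

Columns with no missing entries do not appear as keys, but a Counter returns 0 when they are looked up.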

Additional Variable Information

Listing of attributes:

>50K, <=50K.

age: continuous.
workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
fnlwgt: continuous.
education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
education-num: continuous.
marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
sex: Female, Male.
capital-gain: continuous.
capital-loss: continuous.
hours-per-week: continuous.
native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.
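The attribute listing above can serve as the column names when reading the data files. The sketch below rests on several assumptions that this page does not confirm: that the files are comma-separated with no header row, that columns appear in the order listed with the class label last, and that the label may carry a trailing period in some files. The column name "income" and the two sample lines are hypothetical.

```python
import csv
import io

# Column order assumed to follow the attribute listing, with the label last.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",  # hypothetical name for the >50K / <=50K class label
]

# Attributes the listing describes as continuous, parsed here as integers.
INTEGER_COLUMNS = {
    "age", "fnlwgt", "education-num",
    "capital-gain", "capital-loss", "hours-per-week",
}


def parse_rows(text: str) -> list:
    """Parse comma-separated records into dicts with a 0/1 income target."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        row = dict(zip(COLUMNS, (f.strip() for f in fields)))
        for name in INTEGER_COLUMNS:
            row[name] = int(row[name])
        # 1 when income is >50K, else 0; tolerate a trailing period on the label.
        row["income"] = 1 if row["income"].rstrip(".") == ">50K" else 0
        rows.append(row)
    return rows


# Hypothetical sample lines, formatted the way the files are assumed to be.
sample = (
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "52, Self-emp-inc, 100000, Masters, 14, Married-civ-spouse, "
    "Exec-managerial, Husband, White, Male, 0, 0, 45, United-States, >50K\n"
)

parsed = parse_rows(sample)
```

Categorical values are kept as strings so that any later encoding step (one-hot, ordinal, or other) stays a separate choice.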

Dataset Files

adult.data (3.8 MB)
adult.test (1.9 MB)
adult.names (5.1 KB)
old.adult.names (4.2 KB)
Index (140 Bytes)

Papers Citing this Dataset

The What-If Tool: Interactive Probing of Machine Learning Models

By James Wexler, Mahima Pushkarna, Tolga Bolukbasi, Martin Wattenberg, Fernanda Viegas, Jimbo Wilson. 2019

Published in ArXiv.

Paired-Consistency: An Example-Based Model-Agnostic Approach to Fairness Regularization in Machine Learning

By Yair Horesh, Noa Haas, Elhanan Mishraky, Yehezkel Resheff, Shir Lador. 2019

Published in ArXiv.

Distributed generation of privacy preserving data with user customization

By Xiao Chen, Thomas Navidi, Stefano Ermon, Ram Rajagopal. 2019

Published in ArXiv.

Information-Theoretic Privacy through Chaos Synchronization and Optimal Additive Noise

By Carlos Murguia, Iman Shames, Farhad Farokhi, Dragan Nesic. 2019

Published in ArXiv.

Automated Data Slicing for Model Validation: A Big data - AI Integration Approach

By Yeounoh Chung, Tim Kraska, Neoklis Polyzotis, Ki Tae, Steven Whang. 2018

Published in IEEE Transactions on Knowledge and Data Engineering.


24 citations



Ron Kohavi


