Gender by Name
Donated on 3/14/2020
This dataset attributes first names to genders, giving counts and probabilities. It combines open-source government data from the US, UK, Canada, and Australia.
Dataset Characteristics
Text
Subject Area
Social Science
Associated Tasks
Classification, Clustering
Feature Type
-
# Instances
147270
# Features
4
Dataset Information
Additional Information
This dataset combines raw counts of first (given) names of male and female babies over the periods listed below, then calculates a probability for each name from the aggregate count. The source datasets come from government authorities:
- US: Baby Names from Social Security Card Applications - National Data, 1880 to 2019
- UK: Baby names in England and Wales Statistical bulletins, 2011 to 2018
- Canada: British Columbia 100 Years of Popular Baby names, 1918 to 2018
- Australia: Popular Baby Names, Attorney-General's Department, 1944 to 2019
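The aggregation described above can be sketched as follows. This is a minimal illustration, not the donor's actual pipeline: the function name `combine_counts` is hypothetical, the toy counts are invented, and the probability is assumed to be each row's share of the grand total, which the description does not state explicitly.

```python
import pandas as pd

def combine_counts(sources):
    """Sum per-source (Name, Gender, Count) tables and add a Probability column."""
    combined = (
        pd.concat(sources, ignore_index=True)
        .groupby(["Name", "Gender"], as_index=False)["Count"]
        .sum()
    )
    # Assumption: probability = this row's share of the aggregate count.
    combined["Probability"] = combined["Count"] / combined["Count"].sum()
    return combined.sort_values("Count", ascending=False, ignore_index=True)

# Invented toy counts, for illustration only (not taken from the dataset).
us = pd.DataFrame({"Name": ["Alex", "Alex", "Mary"],
                   "Gender": ["M", "F", "F"],
                   "Count": [60, 20, 100]})
uk = pd.DataFrame({"Name": ["Alex", "Mary"],
                   "Gender": ["M", "F"],
                   "Count": [10, 10]})

combined = combine_counts([us, uk])
print(combined)
```

Under this assumption the probabilities sum to 1 over the whole table rather than per name.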
Has Missing Values?
No
Variables Table
Variable Name | Role | Type | Demographic | Description | Units | Missing Values |
---|---|---|---|---|---|---|
Name | Feature | Categorical | | | | no |
Gender | Feature | Categorical | Gender | | | no |
Count | Feature | Integer | | | | no |
Probability | Feature | Continuous | | | | no |
Additional Variable Information
- Name: String
- Gender: M/F (category/string)
- Count: Integer
- Probability: Float
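The types above map directly onto pandas dtypes when reading the CSV. The sketch below assumes the file has a header row with the four column names listed; `load_name_gender` and `COLUMN_TYPES` are hypothetical names, and the sample rows are invented. `keep_default_na=False` is a precaution in case a name coincides with one of pandas' default missing-value markers (such as `NA`).

```python
import io
import pandas as pd

# Assumed header names, taken from the variables table.
COLUMN_TYPES = {"Name": str, "Gender": "category",
                "Count": "int64", "Probability": "float64"}

def load_name_gender(path_or_buffer):
    """Read the CSV with explicit dtypes and check the documented value ranges."""
    df = pd.read_csv(path_or_buffer, dtype=COLUMN_TYPES, keep_default_na=False)
    if not df["Gender"].isin(["M", "F"]).all():
        raise ValueError("unexpected Gender value")
    if (df["Count"] < 0).any() or not df["Probability"].between(0, 1).all():
        raise ValueError("Count or Probability out of range")
    return df

# Invented sample rows standing in for name_gender_dataset.csv.
sample = io.StringIO(
    "Name,Gender,Count,Probability\n"
    "Alex,M,70,0.35\n"
    "NA,F,20,0.10\n"
)
names = load_name_gender(sample)
print(names.dtypes)
```

With the real file, the call would be `load_name_gender("name_gender_dataset.csv")`.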
Dataset Files
File | Size |
---|---|
name_gender_dataset.csv | 3.6 MB |
    pip install ucimlrepo

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    gender_by_name = fetch_ucirepo(id=591)

    # data (as pandas dataframes)
    X = gender_by_name.data.features
    y = gender_by_name.data.targets

    # metadata
    print(gender_by_name.metadata)

    # variable information
    print(gender_by_name.variables)
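Since the variables table lists all four columns as features, `y` may well be empty or `None` (an assumption about how ucimlrepo handles datasets without a target), so most work happens on the features frame. As one possible use for the classification task, the sketch below turns a frame with the documented `Name`, `Gender`, and `Count` columns into a per-name lookup of gender shares. The function `gender_lookup` is hypothetical and the toy rows are invented.

```python
import pandas as pd

def gender_lookup(features):
    """Return per-name shares of F and M counts plus the more frequent gender."""
    counts = features.pivot_table(index="Name", columns="Gender",
                                  values="Count", aggfunc="sum", fill_value=0)
    # Make sure both columns exist even if one gender never appears.
    counts = counts.reindex(columns=["F", "M"], fill_value=0)
    shares = counts.div(counts.sum(axis=1), axis=0)
    shares["Likely"] = shares[["F", "M"]].idxmax(axis=1)
    return shares

# Invented toy rows, for illustration only (not taken from the dataset).
toy = pd.DataFrame({"Name": ["Alex", "Alex", "Mary"],
                    "Gender": ["M", "F", "F"],
                    "Count": [70, 30, 100]})

lookup = gender_lookup(toy)
print(lookup)
```

With the fetched data, `gender_lookup(X)` would apply the same steps to the full table.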
Gender by Name [Dataset]. (2020). UCI Machine Learning Repository. https://doi.org/10.24432/C55G7X.
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.