Gender by Name
Donated on 3/14/2020
This dataset attributes first names to genders, giving counts and probabilities. It combines open-source government data from the US, UK, Canada, and Australia.
Dataset Characteristics: Text
Subject Area: Social Science
Associated Tasks: Classification, Clustering
Feature Type: -
# Instances: 147270
# Features: 4
Dataset Information
Additional Information
This dataset combines raw counts of first (given) names for male and female babies over the periods listed below, then calculates a probability for each name from the aggregate count. The source datasets come from government authorities:
- US: Baby Names from Social Security Card Applications - National Data, 1880 to 2019
- UK: Baby names in England and Wales Statistical bulletins, 2011 to 2018
- Canada: British Columbia 100 Years of Popular Baby names, 1918 to 2018
- Australia: Popular Baby Names, Attorney-General's Department, 1944 to 2019
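The aggregation described above can be sketched as follows. This is a minimal illustration, not the dataset author's actual pipeline: the per-source counts are made-up numbers, the helper names `combine_counts` and `add_probabilities` are hypothetical, and the probability is assumed to be each (name, gender) count divided by the grand total of all counts, which the description does not state explicitly.

```python
from collections import Counter

# Hypothetical per-source counts keyed by (name, gender); illustrative numbers only.
us_counts = {("Alex", "M"): 300, ("Alex", "F"): 20, ("Maria", "F"): 500}
uk_counts = {("Alex", "M"): 60, ("Alex", "F"): 20, ("Maria", "F"): 100}


def combine_counts(*sources):
    """Sum the raw counts for each (name, gender) pair across all sources."""
    total = Counter()
    for source in sources:
        total.update(source)  # Counter.update adds counts rather than replacing them
    return total


def add_probabilities(counts):
    """Attach a probability to each pair.

    ASSUMPTION: probability = pair count / grand total of all counts.
    """
    grand_total = sum(counts.values())
    return {key: (count, count / grand_total) for key, count in counts.items()}


combined = combine_counts(us_counts, uk_counts)
table = add_probabilities(combined)

for (name, gender), (count, prob) in sorted(table.items()):
    print(name, gender, count, round(prob, 4))
```

Under this assumed definition the probabilities sum to 1 over the whole table, so a value reflects how common a name is overall rather than how strongly it is associated with one gender.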
Has Missing Values?
No
Variable Information
Name: String
Gender: M/F (category/string)
Count: Integer
Probability: Float
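One way to represent a row with these four variables is a small typed record. The sketch below is only illustrative: the `NameRecord` class, the `parse_row` helper, and the sample CSV text are hypothetical, and it assumes the column headers match the variable names listed above.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class NameRecord:
    name: str           # Name: String
    gender: str         # Gender: "M" or "F"
    count: int          # Count: Integer
    probability: float  # Probability: Float


def parse_row(row):
    """Convert one CSV row (a dict of strings) into a typed NameRecord."""
    gender = row["Gender"].strip().upper()
    if gender not in {"M", "F"}:
        raise ValueError(f"unexpected gender value: {row['Gender']!r}")
    return NameRecord(
        name=row["Name"].strip(),
        gender=gender,
        count=int(row["Count"]),
        probability=float(row["Probability"]),
    )


# Made-up sample rows; ASSUMPTION: headers are Name, Gender, Count, Probability.
SAMPLE_CSV = "Name,Gender,Count,Probability\nAlex,M,360,0.36\nAlex,F,40,0.04\n"

records = [parse_row(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for record in records:
    print(record)
```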
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
gender_by_name = fetch_ucirepo(id=591)

# data (as pandas dataframes)
X = gender_by_name.data.features
y = gender_by_name.data.targets

# metadata
print(gender_by_name.metadata)

# variable information
print(gender_by_name.variables)
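Once the rows are loaded, a simple classification use is to look up the more frequent gender for a given name. The sketch below works on an in-memory list of (name, gender, count) tuples with made-up values rather than on the fetched DataFrames, whose exact column layout is not shown here; `build_lookup` and `likely_gender` are hypothetical helper names, and the per-name share is computed from counts rather than taken from the Probability column.

```python
from collections import defaultdict


def build_lookup(rows):
    """Group counts by lowercase name, then by gender."""
    lookup = defaultdict(dict)
    for name, gender, count in rows:
        by_gender = lookup[name.lower()]
        by_gender[gender] = by_gender.get(gender, 0) + count
    return lookup


def likely_gender(lookup, name):
    """Return (gender, share of that name's count), or None if the name is unknown."""
    counts = lookup.get(name.lower())  # .get avoids creating an empty entry
    if not counts:
        return None
    total = sum(counts.values())
    gender = max(counts, key=counts.get)
    return gender, counts[gender] / total


# Made-up rows for illustration only.
rows = [("Alex", "M", 360), ("Alex", "F", 40), ("Maria", "F", 600)]
lookup = build_lookup(rows)

print(likely_gender(lookup, "alex"))
print(likely_gender(lookup, "Zed"))
```

Names used for both genders get a share below 1, which can serve as a rough confidence value for the lookup.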
Gender by Name. (2020). UCI Machine Learning Repository. https://doi.org/10.24432/C55G7X.
@misc{misc_gender_by_name_591,
  title        = {{Gender by Name}},
  year         = {2020},
  howpublished = {UCI Machine Learning Repository},
  note         = {{DOI}: https://doi.org/10.24432/C55G7X}
}
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This license permits sharing and adapting the dataset for any purpose, provided that appropriate credit is given.