Gender by Name

Donated on 3/14/2020

This dataset attributes first names to genders, giving counts and probabilities. It combines open-source government data from the US, UK, Canada, and Australia.
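As a rough illustration of how name/gender counts like these might be used, the sketch below builds a lookup from a name to its most common gender. The rows are made-up values, not taken from the dataset. The function name `build_lookup` is hypothetical. The share it computes (a gender's fraction of one name's total count) is derived here for illustration and is not necessarily the dataset's own Probability column.

```python
from collections import defaultdict

# Hypothetical (name, gender, count) rows; values are made up for illustration.
rows = [("Alex", "M", 1200), ("Alex", "F", 100), ("Mary", "F", 2700)]


def build_lookup(rows):
    """Map each name to (most common gender, that gender's share of the name's count)."""
    by_name = defaultdict(dict)
    for name, gender, count in rows:
        # Sum counts in case the same (name, gender) pair appears more than once.
        by_name[name][gender] = by_name[name].get(gender, 0) + count
    lookup = {}
    for name, counts in by_name.items():
        gender = max(counts, key=counts.get)
        lookup[name] = (gender, counts[gender] / sum(counts.values()))
    return lookup


lookup = build_lookup(rows)
```

Names that appear under both genders keep a share below 1.0, which gives a simple confidence signal for a classification baseline.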

Dataset Characteristics

Text

Subject Area

Social Science

Associated Tasks

Classification, Clustering

Feature Type

# Instances

147270

# Features

Dataset Information

Additional Information

This dataset combines raw counts of first/given names of male and female babies over the time periods listed below, and then calculates a probability for each name from the aggregate count. Source datasets are from government authorities:
- US: Baby Names from Social Security Card Applications - National Data, 1880 to 2019
- UK: Baby names in England and Wales Statistical bulletins, 2011 to 2018
- Canada: British Columbia 100 Years of Popular Baby names, 1918 to 2018
- Australia: Popular Baby Names, Attorney-General's Department, 1944 to 2019
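The aggregation described above could be sketched roughly as follows. The per-source rows are made-up numbers, and the helper names `aggregate_counts` and `add_probability` are hypothetical. The page does not state the exact probability formula, so this sketch assumes it is each name/gender pair's share of the grand total count.

```python
from collections import Counter

# Hypothetical per-source counts as (name, gender, count) tuples.
# These numbers are made up for illustration; they are not taken from the dataset.
us_rows = [("Alex", "M", 900), ("Alex", "F", 100), ("Mary", "F", 2000)]
uk_rows = [("Alex", "M", 300), ("Mary", "F", 700)]


def aggregate_counts(*sources):
    """Sum raw counts for each (name, gender) pair across all sources."""
    totals = Counter()
    for rows in sources:
        for name, gender, count in rows:
            totals[(name, gender)] += count
    return totals


def add_probability(totals):
    """Attach each pair's share of the grand total (an assumed definition)."""
    grand_total = sum(totals.values())
    return {key: (count, count / grand_total) for key, count in totals.items()}


combined = add_probability(aggregate_counts(us_rows, uk_rows))
```

If the published probability is instead conditioned on the name (the share of one gender within a name), only the denominator in `add_probability` would change.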

Has Missing Values?

No

Variables Table

Variable Name	Role	Type	Demographic	Missing Values
Name	Feature	Categorical		no
Gender	Feature	Categorical	Gender	no
Count	Feature	Integer		no
Probability	Feature	Continuous		no

Additional Variable Information

Name: String
Gender: M/F (category/string)
Count: Integer
Probability: Float
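A minimal sketch of reading rows with these types, using only the Python standard library. The sample text is made up, and the header names are assumed to match the variable names in the table; the actual file's header has not been checked here. The function name `read_rows` is hypothetical.

```python
import csv
import io

# Hypothetical sample in the layout the variables table describes.
# Header names are assumed to match the variable names; the values are made up.
SAMPLE = """Name,Gender,Count,Probability
Alex,M,1200,0.3
Alex,F,100,0.025
Mary,F,2700,0.675
"""


def read_rows(handle):
    """Yield one dict per row with Count as int and Probability as float."""
    for row in csv.DictReader(handle):
        # The page documents only the codes M and F, so reject anything else.
        if row["Gender"] not in ("M", "F"):
            raise ValueError(f"unexpected gender code: {row['Gender']!r}")
        yield {
            "Name": row["Name"],
            "Gender": row["Gender"],
            "Count": int(row["Count"]),
            "Probability": float(row["Probability"]),
        }


rows = list(read_rows(io.StringIO(SAMPLE)))
```

To read the downloaded file instead of the sample, a handle from `open("name_gender_dataset.csv", newline="", encoding="utf-8")` could be passed to `read_rows`, assuming the file is UTF-8 encoded.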

Dataset Files

File	Size
name_gender_dataset.csv	3.6 MB

0 citations

23714 views

DOI

10.24432/C55G7X

License

This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.