Gender by Name Data Set
Abstract: This dataset attributes first names to genders, giving counts and probabilities. It combines open-source government data from the US, UK, Canada, and Australia.
Data Set Characteristics: Text
Number of Instances: 147270
Area: Social
Attribute Characteristics: N/A
Number of Attributes: 4
Date Donated: 2020-03-15
Associated Tasks: Classification, Clustering
Missing Values? N/A
Number of Web Hits: 24813
Source:
Dataset creator and donor: Arun Rao, e-mail: hermesfeet '@' gmail.com, Institution: Skydeck, UC Berkeley, Berkeley, CA
Source institutional websites:
-US: https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-level-data
-UK: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/bulletins/babynamesenglandandwales/previousReleases
-Canada: https://www2.gov.bc.ca/gov/content/life-events/statistics-reports/bc-s-most-popular-baby-names
-Australia: https://data.gov.au/dataset/ds-sa-9849aa7f-e316-426e-8ab5-74658a62c7e6/details?q=
Data Set Information:
This dataset combines raw counts of first/given names of male and female babies over the time periods listed below, and then calculates a probability for each name from the aggregate count. The source datasets come from government authorities:
-US: Baby Names from Social Security Card Applications - National Data, 1880 to 2019
-UK: Baby names in England and Wales Statistical bulletins, 2011 to 2018
-Canada: British Columbia 100 Years of Popular Baby names, 1918 to 2018
-Australia: Popular Baby Names, Attorney-General's Department, 1944 to 2019
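The aggregation described above can be sketched roughly as follows. This is a minimal illustration, not the creator's actual pipeline: the sample counts are made up, the function names `aggregate_counts` and `to_probabilities` are hypothetical, and treating the probability as each (name, gender) pair's share of the grand total is an assumption, since the description does not spell out the formula.

```python
from collections import Counter


def aggregate_counts(sources):
    """Sum (name, gender) counts across several per-country tables."""
    totals = Counter()
    for table in sources:
        for (name, gender), count in table.items():
            totals[(name, gender)] += count
    return totals


def to_probabilities(totals):
    """Assumed formula: each pair's count divided by the grand total."""
    grand_total = sum(totals.values())
    return {key: count / grand_total for key, count in totals.items()}


# Made-up sample counts, for illustration only (not real figures).
us = {("Alex", "M"): 60, ("Alex", "F"): 10}
uk = {("Alex", "M"): 20, ("Mary", "F"): 10}

totals = aggregate_counts([us, uk])
probabilities = to_probabilities(totals)
```

Under this assumed definition the probabilities sum to 1 across all (name, gender) pairs. A different definition, such as the share of one gender within a single name, would need a per-name denominator instead.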
Attribute Information:
Name: String
Gender: M/F (category/string)
Count: Integer
Probability: Float
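As a rough sketch of how the four attributes might be consumed, the snippet below parses rows into typed records and looks up the more frequent gender for a name. The header row, the sample values, and the helper names `load_rows` and `most_likely_gender` are assumptions made for illustration; the actual file layout should be checked against the downloaded data.

```python
import csv
import io

# Made-up rows in the four-attribute layout (assumed header names).
# A real file would be passed in via open(path, newline="") instead.
SAMPLE = """Name,Gender,Count,Probability
Alex,M,80,0.8
Alex,F,10,0.1
Mary,F,10,0.1
"""


def load_rows(handle):
    """Parse CSV text into dicts with typed Count and Probability."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append({
            "Name": record["Name"],
            "Gender": record["Gender"],
            "Count": int(record["Count"]),
            "Probability": float(record["Probability"]),
        })
    return rows


def most_likely_gender(rows, name):
    """Return the gender with the highest count for a name, or None."""
    matches = [r for r in rows if r["Name"].lower() == name.lower()]
    if not matches:
        return None
    return max(matches, key=lambda r: r["Count"])["Gender"]


rows = load_rows(io.StringIO(SAMPLE))
```

Comparing names case-insensitively is a design choice of this sketch. For the full 147270-row file, indexing rows by lowercased name in a dictionary would avoid rescanning the list on every lookup.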