Center for Machine Learning and Intelligent Systems

Gender by Name Data Set

Abstract: This dataset attributes first names to genders, giving counts and probabilities. It combines open-source government data from the US, UK, Canada, and Australia.

Data Set Characteristics: Text
Number of Instances: 147270
Area: Social
Attribute Characteristics: N/A
Number of Attributes: 4
Date Donated: 2020-03-15
Associated Tasks: Classification, Clustering
Missing Values? N/A
Number of Web Hits: 24813


Source:

Dataset creator and donor: Arun Rao, e-mail: hermesfeet '@' gmail.com, Institution: Skydeck, UC Berkeley, Berkeley, CA

Source institutional websites:
-US: https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-level-data
-UK: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/bulletins/babynamesenglandandwales/previousReleases
-Canada: https://www2.gov.bc.ca/gov/content/life-events/statistics-reports/bc-s-most-popular-baby-names
-Australia: https://data.gov.au/dataset/ds-sa-9849aa7f-e316-426e-8ab5-74658a62c7e6/details?q=


Data Set Information:

This dataset combines raw counts of first (given) names of male and female babies over the time periods listed below, and then calculates a probability for each name from the aggregate count. The source datasets come from government authorities:
-US: Baby Names from Social Security Card Applications - National Data, 1880 to 2019
-UK: Baby names in England and Wales Statistical bulletins, 2011 to 2018
-Canada: British Columbia 100 Years of Popular Baby names, 1918 to 2018
-Australia: Popular Baby Names, Attorney-General's Department, 1944 to 2019
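
The aggregation described above can be sketched in a few lines of Python. This is a minimal illustration, not the donor's actual pipeline. The page does not give the exact formula, so the sketch assumes the probability is each (name, gender) count divided by the grand total of all counts. The function name `combine_counts` and the toy rows are hypothetical.

```python
from collections import Counter


def combine_counts(sources):
    """Sum (name, gender) counts across sources and attach a probability.

    `sources` is a list of iterables of (name, gender, count) tuples,
    one iterable per country-level source.
    """
    totals = Counter()
    for rows in sources:
        for name, gender, count in rows:
            totals[(name, gender)] += count

    grand_total = sum(totals.values())
    # Assumption: probability = this pair's count / sum of all counts.
    # The dataset description does not state the formula explicitly.
    return [
        (name, gender, count, count / grand_total)
        for (name, gender), count in sorted(totals.items())
    ]


# Made-up toy counts, not taken from the real source files.
us = [("Alex", "M", 60), ("Alex", "F", 10)]
uk = [("Alex", "M", 20), ("Robin", "F", 10)]
combined = combine_counts([us, uk])
```

Under this assumption the probabilities over all rows sum to 1. A per-name normalisation (the share of each gender within a single name) would be an equally plausible reading and would need a different denominator.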


Attribute Information:

Name: String
Gender: M/F (category/string)
Count: Integer
Probability: Float
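
As a rough sketch of how these four attributes might be read into typed records, the following uses only the Python standard library. It assumes the data is a CSV file with a header row named exactly `Name,Gender,Count,Probability`, which this page does not confirm. `NameRecord`, `read_records`, `likely_gender`, and the sample rows are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class NameRecord:
    name: str           # Name: String
    gender: str         # Gender: "M" or "F"
    count: int          # Count: Integer
    probability: float  # Probability: Float


def read_records(handle):
    """Parse rows from a file-like object into NameRecord objects."""
    records = []
    # Assumption: the file has a header row with these four column names.
    for row in csv.DictReader(handle):
        gender = row["Gender"].strip()
        if gender not in ("M", "F"):
            raise ValueError(f"unexpected gender value: {gender!r}")
        records.append(
            NameRecord(
                name=row["Name"].strip(),
                gender=gender,
                count=int(row["Count"]),
                probability=float(row["Probability"]),
            )
        )
    return records


def likely_gender(records, name):
    """Return the gender with the larger count for `name`, or None if absent."""
    matches = [r for r in records if r.name == name]
    if not matches:
        return None
    return max(matches, key=lambda r: r.count).gender


# Made-up sample rows standing in for the real file.
sample = io.StringIO(
    "Name,Gender,Count,Probability\n"
    "Alex,M,80,0.8\n"
    "Alex,F,10,0.1\n"
    "Robin,F,10,0.1\n"
)
records = read_records(sample)
```

With a real file, `read_records(open(path, newline=""))` would be used in place of the in-memory sample. Because a name can appear once per gender, a lookup such as `likely_gender` has to compare rows rather than assume one row per name.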

