Center for Machine Learning and Intelligent Systems

Gender by Name Data Set
Download: Data Folder, Data Set Description

Abstract: This dataset attributes first names to genders, giving counts and probabilities. It combines open-source government data from the US, UK, Canada, and Australia.

Data Set Characteristics:
Number of Instances:
Attribute Characteristics:
Number of Attributes:
Date Donated:
Associated Tasks: Classification, Clustering
Missing Values?
Number of Web Hits:

Dataset creator and donor: Arun Rao, e-mail: hermesfeet '@', Institution: Skydeck, UC Berkeley, Berkeley, CA

Source institutional websites:

Data Set Information:

This dataset combines raw counts of first (given) names of male and female babies over the time periods listed below, then calculates a probability for each name from the aggregate count. The source datasets come from government authorities:
-US: Baby Names from Social Security Card Applications - National Data, 1880 to 2019
-UK: Baby names in England and Wales Statistical bulletins, 2011 to 2018
-Canada: British Columbia 100 Years of Popular Baby names, 1918 to 2018
-Australia: Popular Baby Names, Attorney-General's Department, 1944 to 2019
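
The page does not spell out how the counts are combined or how the probability is derived. The following is a minimal sketch, assuming that counts for the same (name, gender) pair are summed across sources and that the probability is a row's count divided by the grand total of all counts. The function names and the toy figures are hypothetical, not taken from the dataset.

```python
from collections import Counter


def combine_counts(sources):
    """Sum (name, gender) counts across several per-country tables."""
    totals = Counter()
    for table in sources:
        for (name, gender), count in table.items():
            totals[(name, gender)] += count
    return totals


def add_probabilities(totals):
    """Pair each count with count / grand_total (assumed formula)."""
    grand_total = sum(totals.values())
    return {key: (count, count / grand_total) for key, count in totals.items()}


# Made-up toy counts for illustration only, not real figures.
us = {("Alex", "M"): 60, ("Alex", "F"): 20}
uk = {("Alex", "M"): 15, ("Sam", "F"): 5}

rows = add_probabilities(combine_counts([us, uk]))
for (name, gender), (count, prob) in sorted(rows.items()):
    print(name, gender, count, prob)
```

Under a different reading, the probability could instead be conditioned on the name (share of each gender among babies with that name); the source text alone does not settle which is meant.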

Attribute Information:

Name: String
Gender: M/F (category/string)
Count: Integer
Probability: Float
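
As an illustration of the four attributes above, here is a minimal sketch that parses rows with this schema and estimates the share of each gender for a given name from the Count column. It assumes a CSV layout with a header row named Name, Gender, Count, Probability; the header names, the inline sample rows, and the helper names are assumptions for illustration, not confirmed by this page.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the assumed layout; the values are made up.
SAMPLE = """Name,Gender,Count,Probability
Alex,M,75,0.75
Alex,F,20,0.20
Sam,F,5,0.05
"""


def load_rows(handle):
    """Parse rows into (name, gender, count, probability) tuples."""
    rows = []
    for rec in csv.DictReader(handle):
        rows.append(
            (rec["Name"], rec["Gender"], int(rec["Count"]), float(rec["Probability"]))
        )
    return rows


def gender_share(rows, name):
    """Return each gender's share of the counts for one name (case-insensitive)."""
    counts = defaultdict(int)
    for row_name, gender, count, _prob in rows:
        if row_name.lower() == name.lower():
            counts[gender] += count
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


rows = load_rows(io.StringIO(SAMPLE))
shares = gender_share(rows, "alex")
print(shares)
```

To run this against a real file, replace `io.StringIO(SAMPLE)` with an opened file handle, for example `open(path, newline="")`, where `path` points to the downloaded data file.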

