Center for Machine Learning and Intelligent Systems

GitHub MUSAE Data Set

Abstract: A social network of GitHub users with user-level attributes, connectivity data and a binary target variable.

Data Set Characteristics: Multivariate
Number of Instances: 37700
Area: Social
Attribute Characteristics: N/A
Number of Attributes: 4006
Date Donated: 2019-10-07
Associated Tasks: Classification
Missing Values?: N/A
Number of Web Hits: 5221


Source:

Benedek Rozemberczki
The University of Edinburgh
United Kingdom
benedek.rozemberczki '@' gmail.com
https://github.com/benedekrozemberczki


Data Set Information:

A large social network of GitHub developers, collected from the public API in June 2019. Nodes are developers who have starred at least 10 repositories, and edges are mutual follower relationships between them. Vertex features are extracted from each user's location, starred repositories, employer and e-mail address. The associated task is binary node classification: predict whether a GitHub user is a web developer or a machine learning developer. The target variable was derived from each user's job title.
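
As a rough illustration of the node classification task, the sketch below builds an undirected graph from a toy edge list and predicts a node's label by majority vote over its labeled neighbors. The edge list, the labels, the 0/1 encoding of the two developer classes, and the helper names `build_adjacency` and `neighbor_vote` are all hypothetical and chosen only for this example; this is a naive baseline, not the method used in the cited paper.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) node-id pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        # A mutual follower relationship is symmetric, so store both directions.
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def neighbor_vote(node, adjacency, known_labels, default=0):
    """Predict a binary label by majority vote over labeled neighbors.

    Falls back to `default` when the node has no labeled neighbors
    or when the vote is tied.
    """
    votes = [known_labels[n] for n in adjacency.get(node, ()) if n in known_labels]
    ones = sum(votes)
    zeros = len(votes) - ones
    if ones == zeros:
        return default
    return 1 if ones > zeros else 0


# Toy data (hypothetical): five developers, node 2 is unlabeled.
# The 0/1 encoding of the two classes is an assumption for this sketch.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (2, 3)]
known_labels = {0: 1, 1: 1, 3: 0, 4: 0}

adjacency = build_adjacency(edges)
prediction = neighbor_vote(2, adjacency, known_labels)
print(prediction)
```

A neighbor vote uses only the connectivity data; a stronger model would also use the node attributes.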


Attribute Information:

Attributes are binary indicators extracted based on the location, repositories starred, employer and e-mail address.
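
To make "binary indicators" concrete, the sketch below expands a per-node list of active feature indices into fixed-length 0/1 vectors. It assumes the features are available as a mapping from node id to the indices that are switched on; that storage layout, the toy values, and the helper name `to_indicator_rows` are assumptions for illustration and are not confirmed by this page.

```python
def to_indicator_rows(sparse_features, num_features):
    """Expand {node: [active feature indices]} into {node: 0/1 vector}."""
    rows = {}
    for node, active in sparse_features.items():
        row = [0] * num_features
        for idx in active:
            if not 0 <= idx < num_features:
                raise ValueError(f"feature index {idx} out of range for node {node}")
            row[idx] = 1  # mark this attribute as present
        rows[node] = row
    return rows


# Toy input (hypothetical): three nodes and four possible attributes.
sparse_features = {0: [1, 3], 1: [0], 2: []}
rows = to_indicator_rows(sparse_features, num_features=4)
print(rows[0])
```

With several thousand attributes per node, a dense list per node is wasteful; a sparse matrix representation would be the more usual choice at full scale.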


Relevant Papers:

[Web Link]



Citation Request:

@misc{rozemberczki2019multiscale,
title = {Multi-scale Attributed Node Embedding},
author = {Benedek Rozemberczki and Carl Allen and Rik Sarkar},
year = {2019},
eprint = {1909.13021},
archivePrefix = {arXiv},
primaryClass = {cs.LG}
}

