GitHub MUSAE

Donated on 10/6/2019

A social network of GitHub users with user-level attributes, connectivity data and a binary target variable.

Dataset Characteristics

Multivariate

Subject Area

Social Science

Associated Tasks

Classification

Feature Type

-

# Instances

37700

# Features

4006

Dataset Information

Additional Information

A large social network of GitHub developers, collected from the public API in June 2019. Nodes are developers who have starred at least 10 repositories, and edges are mutual follower relationships between them. The vertex features are extracted from each user's location, starred repositories, employer and e-mail address. The task is binary node classification: predict whether a GitHub user is a web developer or a machine learning developer. The target variable was derived from each user's job title.
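Because edges are mutual follower relationships, the graph can be treated as undirected. The following is a minimal sketch of that idea, not the dataset authors' code: the helper names `build_adjacency` and `neighbor_label_fraction` are hypothetical, and the toy edges and labels are invented for illustration (which class is coded as 1 is also an assumption here).

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) node-id pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # skip self-loops, should any appear
        # A mutual follow is symmetric, so store the edge in both directions.
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


def neighbor_label_fraction(adjacency, labels, node):
    """Return the fraction of a node's labeled neighbors whose target is 1.

    Returns None when the node has no labeled neighbors.
    """
    neighbors = [n for n in adjacency.get(node, ()) if n in labels]
    if not neighbors:
        return None
    return sum(labels[n] for n in neighbors) / len(neighbors)


# Toy data (hypothetical): four users, binary target per user.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
labels = {0: 0, 1: 1, 2: 0, 3: 1}

adjacency = build_adjacency(edges)
fraction_for_node_2 = neighbor_label_fraction(adjacency, labels, 2)
```

A neighbor-label fraction like this is only a simple descriptive statistic for a node classification task. It is one possible starting point and not a method described by the dataset.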

Has Missing Values?

No

Variable Information

Attributes are binary indicators extracted from each user's location, starred repositories, employer and e-mail address.
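Binary indicators are often stored sparsely, as a list of the feature indices that are "on" for each node. The sketch below assumes that layout (it is not confirmed by this page) and expands such lists into dense 0/1 vectors. The function name `to_indicator_vector` and the toy values are hypothetical.

```python
def to_indicator_vector(active_indices, num_features):
    """Expand a list of active feature indices into a dense 0/1 vector."""
    vector = [0] * num_features
    for idx in active_indices:
        if not 0 <= idx < num_features:
            raise ValueError(f"feature index {idx} out of range")
        vector[idx] = 1
    return vector


# Toy sparse features (hypothetical): node id -> indices of active features.
sparse_features = {"0": [1, 3], "1": [0], "2": []}

# Assumed feature-space size for the toy example only.
NUM_FEATURES = 5

dense_features = {
    int(node_id): to_indicator_vector(indices, NUM_FEATURES)
    for node_id, indices in sparse_features.items()
}
```

With several thousand features per node, a dense list per user is wasteful. For the full dataset a sparse matrix representation would likely be a better choice.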

Dataset Files

File                                  Size
git_web_ml/musae_git_features.json    4.2 MB
git_web_ml/musae_git_edges.csv        3.2 MB
git_web_ml/musae_git_target.csv       660.7 KB
git_web_ml/README.txt                 881 Bytes
git_web_ml/citing.txt                 485 Bytes
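The three data files can be parsed with the Python standard library alone. The sketch below makes assumptions that this page does not state: that the edge file has columns `id_1` and `id_2`, that the target file has columns `id`, `name` and `ml_target`, and that the JSON file maps node ids to lists of feature indices. Check these against README.txt before relying on them. In-memory strings stand in for the real files so the example is self-contained.

```python
import csv
import io
import json


def read_edges(fh):
    """Read (source, target) node-id pairs; column names are assumed."""
    return [(int(row["id_1"]), int(row["id_2"])) for row in csv.DictReader(fh)]


def read_targets(fh):
    """Read node id -> binary target; column names are assumed."""
    return {int(row["id"]): int(row["ml_target"]) for row in csv.DictReader(fh)}


def read_features(fh):
    """Read node id -> list of feature indices; JSON layout is assumed."""
    return {int(node_id): indices for node_id, indices in json.load(fh).items()}


# In-memory stand-ins for the real files (toy content, not real data).
edges = read_edges(io.StringIO("id_1,id_2\n0,1\n1,2\n"))
targets = read_targets(
    io.StringIO("id,name,ml_target\n0,alice,0\n1,bob,1\n2,carol,0\n")
)
features = read_features(io.StringIO('{"0": [1, 3], "1": [0], "2": []}'))
```

To use the downloaded archive instead, pass real file handles, for example `open("git_web_ml/musae_git_edges.csv", newline="")`.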

