GitHub MUSAE
Donated on 10/6/2019
A social network of GitHub users with user-level attributes, connectivity data and a binary target variable.
Dataset Characteristics
Multivariate
Subject Area
Social Science
Associated Tasks
Classification
Feature Type
-
# Instances
37700
# Features
4006
Dataset Information
Additional Information
A large social network of GitHub developers, collected from the public API in June 2019. Nodes are developers who have starred at least 10 repositories, and edges are mutual follower relationships between them. The vertex features are extracted from each user's location, starred repositories, employer and e-mail address. The associated task is binary node classification: predict whether a GitHub user is a web or a machine learning developer. This target was derived from each user's job title.
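Because edges are mutual follower relationships, the edge list can be treated as an undirected graph. The following is a minimal sketch that assumes `musae_git_edges.csv` has a header row followed by two integer node-id columns. The `id_1,id_2` header in the sample is an assumption. The sketch builds an adjacency map and applies a naive neighbor-majority vote as a toy baseline for the node classification task. `load_edges` and `neighbor_majority` are hypothetical helpers. They are not part of the dataset or of any library, and this is not the dataset authors' method.

```python
import csv
import io
from collections import defaultdict


def load_edges(csv_text: str) -> dict[int, set[int]]:
    """Build an undirected adjacency map from an edge list with a header row.

    Assumes two integer node-id columns. Each row is one mutual-follower
    pair, so the edge is stored in both directions.
    """
    adjacency = defaultdict(set)
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    for row in reader:
        if not row:
            continue  # ignore blank lines
        u, v = int(row[0]), int(row[1])
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


def neighbor_majority(adjacency: dict[int, set[int]], labels: dict[int, int],
                      node: int, default: int = 0) -> int:
    """Predict a node's 0/1 label as the majority label of its labelled neighbors.

    Toy baseline for illustration only: ties and neighborhoods with no
    labelled nodes fall back to `default`.
    """
    votes = [labels[n] for n in adjacency.get(node, ()) if n in labels]
    ones = sum(votes)
    zeros = len(votes) - ones
    if ones > zeros:
        return 1
    if zeros > ones:
        return 0
    return default


# Tiny synthetic example (not real dataset rows).
adj = load_edges("id_1,id_2\n0,1\n0,2\n1,2\n2,3\n")
known = {0: 1, 1: 1, 3: 0}
print(neighbor_majority(adj, known, 2))  # neighbors 0, 1, 3 vote 1, 1, 0 -> 1
```

Storing each edge in both directions keeps neighbor lookups symmetric, which matches the mutual-follower definition of an edge.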
Has Missing Values?
No
Variable Information
Attributes are binary indicators extracted from each user's location, starred repositories, employer and e-mail address.
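Because every attribute is a binary indicator, the feature set can be stored sparsely as a list of "switched-on" feature indices per node. A dense 37,700 × 4,006 matrix would hold about 151 million cells. The following is a minimal sketch that assumes `musae_git_features.json` maps string node ids (`"0"` to `"n-1"`) to lists of active feature indices. That layout is an assumption, and `features_to_csr` is a hypothetical helper. The sketch converts such a mapping into the three arrays of a compressed sparse row (CSR) matrix.

```python
import json


def features_to_csr(json_text: str, n_features: int):
    """Convert a {node_id: [active feature indices]} JSON object into CSR arrays.

    Assumes keys are the string ids "0".."n-1" and that every listed index
    is an indicator whose value is 1 for that node.
    """
    mapping = json.loads(json_text)
    n_nodes = len(mapping)
    indptr = [0]           # row i spans indices[indptr[i]:indptr[i + 1]]
    indices: list[int] = []
    for node in range(n_nodes):
        active = sorted(set(mapping[str(node)]))  # dedupe and sort column ids
        for j in active:
            if not 0 <= j < n_features:
                raise ValueError(f"feature index {j} out of range for node {node}")
        indices.extend(active)
        indptr.append(len(indices))
    data = [1] * len(indices)  # every stored entry is a binary 1
    return data, indices, indptr


# Tiny synthetic example (not real dataset rows).
data, indices, indptr = features_to_csr('{"0": [2, 0], "1": [], "2": [1, 3]}', n_features=4)
print(indptr)   # [0, 2, 2, 4]
print(indices)  # [0, 2, 1, 3]
```

The returned lists follow the argument order of `scipy.sparse.csr_matrix((data, indices, indptr), shape=(n_nodes, n_features))`, so they can be passed to SciPy directly if it is installed.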
Dataset Files
File | Size |
---|---|
git_web_ml/musae_git_features.json | 4.2 MB |
git_web_ml/musae_git_edges.csv | 3.2 MB |
git_web_ml/musae_git_target.csv | 660.7 KB |
git_web_ml/README.txt | 881 Bytes |
git_web_ml/citing.txt | 485 Bytes |
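The target file pairs each node id with its binary label. The following is a minimal sketch for loading it and checking class balance. It assumes `musae_git_target.csv` has a header row with an integer `id` column and a 0/1 label column. The label column name `ml_target` and the helper `load_targets` are assumptions. Which label value denotes web developers and which denotes machine learning developers should be confirmed in `README.txt`.

```python
import csv
import io
from collections import Counter


def load_targets(csv_text: str, label_column: str = "ml_target") -> dict[int, int]:
    """Map node id -> 0/1 label from a CSV with a header row.

    The default `label_column` name is an assumption; pass the real
    column name if the file uses a different one.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["id"]): int(row[label_column]) for row in reader}


# Tiny synthetic example (not real dataset rows).
targets = load_targets("id,name,ml_target\n0,alice,0\n1,bob,1\n2,carol,0\n")
print(Counter(targets.values()))  # Counter({0: 2, 1: 1})
```

Checking the label counts before training shows whether the two classes are imbalanced, which affects the choice of evaluation metric.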
```
pip install ucimlrepo
```

```python
from ucimlrepo import fetch_ucirepo

# fetch dataset
github_musae = fetch_ucirepo(id=588)

# data (as pandas dataframes)
X = github_musae.data.features
y = github_musae.data.targets

# metadata
print(github_musae.metadata)

# variable information
print(github_musae.variables)
```
GitHub MUSAE [Dataset]. (2019). UCI Machine Learning Repository. https://doi.org/10.24432/C5Z02B.
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows the dataset to be shared and adapted for any purpose, provided that appropriate credit is given.