Gender Gap in Spanish WP
Donated on 8/13/2023
Data set used to estimate the number of women editors and their editing practices in the Spanish Wikipedia
Dataset Characteristics
Multivariate
Subject Area
Social Science
Associated Tasks
Classification
Feature Type
Real, Integer
# Instances
4746
# Features
18
Dataset Information
Has Missing Values?
No
Introductory Paper
By J. Minguillón, J. Meneses, E. Aibar, Núria Ferran-Ferrer, Sergi Fàbregues. 2021
Published in PLoS ONE
Variables Table
Variable Name | Role | Type | Demographic | Description | Units | Missing Values |
---|---|---|---|---|---|---|
gender | Target | Categorical | Gender | 0 (unknown), 1 (male), 2 (female) | | no |
C_api | Target | Categorical | Gender | gender extracted from the Wikimedia API, coded as female / male / unknown | | no |
C_man | Target | Integer | Gender | gender extracted from content coding, coded as 1 (male) / 2 (female) / 3 (unknown) | | no |
E_NEds | Feature | Integer | | I index of stratum IJ (0,1,2,3) | | no |
E_Bpag | Feature | Integer | | J index of stratum IJ (0,1,2,3) | | no |
firstDay | Feature | Date | | first edit in the Spanish Wikipedia (YYYYMMDDHHMMSS) | | no |
lastDay | Feature | Date | | last edit in the Spanish Wikipedia (YYYYMMDDHHMMSS) | | no |
NEds | Feature | Integer | | total number of edits | | no |
NDays | Feature | Integer | | number of days (lastDay-firstDay+1) | | no |
NActDays | Feature | Integer | | number of days with edits | | no |
Additional Variable Information
gender: 0 (unknown), 1 (male), 2 (female)
C_api: gender extracted from the Wikimedia API, coded as female / male / unknown
C_man: gender extracted from content coding, coded as 1 (male) / 2 (female) / 3 (unknown)
E_NEds: I index of stratum IJ (0,1,2,3)
E_Bpag: J index of stratum IJ (0,1,2,3)
firstDay: first edit in the Spanish Wikipedia (YYYYMMDDHHMMSS)
lastDay: last edit in the Spanish Wikipedia (YYYYMMDDHHMMSS)
NEds: total number of edits
NDays: number of days (lastDay-firstDay+1)
NActDays: number of days with edits
NPages: number of different pages edited
NPcreated: number of pages created
pagesWomen: number of edits in pages related to women
wikiprojWomen: number of edits in WikiProjects related to women
ns_user: number of edits in namespace user
ns_wikipedia: number of edits in namespace wikipedia
ns_talk: number of edits in namespace talk
ns_userTalk: number of edits in namespace user talk
ns_content: number of edits in content pages
weightIJ: correcting weight for stratum IJ
NIJ: number of elements in stratum IJ
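The firstDay and lastDay values use the YYYYMMDDHHMMSS layout, and NDays is described as lastDay-firstDay+1. The sketch below shows one way to parse such timestamps and reproduce that span using only the Python standard library. It assumes the span counts calendar dates (so a first and last edit on the same date give 1); the dataset description does not state this explicitly. The helper names and the sample values are hypothetical and not taken from the data.

```python
from datetime import datetime

# Layout given in the variable descriptions: YYYYMMDDHHMMSS
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


def parse_timestamp(value):
    """Parse a YYYYMMDDHHMMSS value (int or str) into a datetime."""
    return datetime.strptime(str(value), TIMESTAMP_FORMAT)


def span_in_days(first_day, last_day):
    """Calendar-day span, following NDays = lastDay - firstDay + 1.

    Assumption: only the date part matters, not the time of day.
    """
    first = parse_timestamp(first_day).date()
    last = parse_timestamp(last_day).date()
    return (last - first).days + 1


# Hypothetical values, not rows from the dataset
print(span_in_days(20150301120000, 20150310083000))  # prints 10
```

Comparing the result with the NDays column on a few rows would show whether the calendar-date assumption matches how the dataset was built.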
Dataset Files
File | Size |
---|---|
data.csv | 474.3 KB |
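For readers who download data.csv directly, the following sketch tallies the gender column using the coding listed above (0 unknown, 1 male, 2 female). It assumes the file is comma-separated with a header row whose names match the variable names; neither detail is confirmed by this page. The function name and the in-memory sample are hypothetical.

```python
import csv
import io
from collections import Counter

# Coding taken from the variable description of `gender`
GENDER_LABELS = {"0": "unknown", "1": "male", "2": "female"}


def count_gender_labels(csv_file):
    """Count rows per gender label in an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(
        GENDER_LABELS.get(row["gender"].strip(), "unrecognized")
        for row in reader
    )


# Hypothetical in-memory sample standing in for data.csv
sample = io.StringIO("gender,NEds\n1,10\n2,5\n0,1\n1,7\n")
print(count_gender_labels(sample))

# With the real file (assuming the layout described above):
# with open("data.csv", newline="", encoding="utf-8") as handle:
#     print(count_gender_labels(handle))
```

Note that C_man uses a different coding (3 for unknown), so the same mapping should not be reused for that column.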

    pip install ucimlrepo

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    gender_gap_in_spanish_wp = fetch_ucirepo(id=852)

    # data (as pandas dataframes)
    X = gender_gap_in_spanish_wp.data.features
    y = gender_gap_in_spanish_wp.data.targets

    # metadata
    print(gender_gap_in_spanish_wp.metadata)

    # variable information
    print(gender_gap_in_spanish_wp.variables)

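The weightIJ variable is described as a correcting weight for stratum IJ. As a rough illustration of how such weights might enter an estimate, the sketch below computes a weighted share of editors coded as female among those with a known gender. This is an assumed, simplified estimator and not necessarily the one used in the introductory paper. Rows are plain dictionaries (for example, what pandas `DataFrame.to_dict("records")` would return); the function name and sample rows are hypothetical.

```python
def weighted_female_share(rows):
    """Weighted share of gender == 2 among rows with gender in {1, 2}.

    Assumption: each row's weightIJ is used directly as its weight and
    rows with unknown gender (0) are excluded from the denominator.
    """
    female_weight = 0.0
    known_weight = 0.0
    for row in rows:
        if row["gender"] in (1, 2):
            known_weight += row["weightIJ"]
            if row["gender"] == 2:
                female_weight += row["weightIJ"]
    if known_weight == 0:
        raise ValueError("no rows with a known gender")
    return female_weight / known_weight


# Hypothetical rows, not taken from the dataset
sample_rows = [
    {"gender": 1, "weightIJ": 2.0},
    {"gender": 2, "weightIJ": 1.0},
    {"gender": 0, "weightIJ": 5.0},  # unknown gender: ignored
    {"gender": 1, "weightIJ": 1.0},
]
print(weighted_female_share(sample_rows))  # prints 0.25
```

Excluding the unknown category is a design choice made here for simplicity; treating it differently would change the result.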
Minguillón, J., Meneses, J., Aibar, E., Ferran-Ferrer, N., & Fàbregues, S. (2021). Gender Gap in Spanish WP [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5XP52.
Creators
J. Minguillón
jminguillona@uoc.edu
Universitat Oberta de Catalunya
J. Meneses
E. Aibar
Núria Ferran-Ferrer
Sergi Fàbregues
DOI
10.24432/C5XP52
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.