Educational Process Mining (EPM): A Learning Analytics Data Set
Donated on 9/23/2015
The Educational Process Mining data set is built from recordings, made by a logging application, of the activities of 115 subjects while they learned with an educational simulator.
Dataset Characteristics
Multivariate, Sequential, Time-Series
Subject Area
Computer Science
Associated Tasks
Classification, Regression, Clustering
Feature Type
Integer
# Instances
230318
# Features
-
Dataset Information
Additional Information
The experiments were carried out with a group of 115 first-year undergraduate Engineering students at the University of Genoa. The study used a simulation environment named Deeds (Digital Electronics Education and Design Suite), which is used for e-learning in digital electronics. The environment provides learning materials to the students through specialized browsers and asks them to solve various problems with different levels of difficulty. For more information about the Deeds simulator used in this course, see http://www.esng.dibe.unige.it/deeds/; for the content of the exercises in each session, see 'exercises_info.txt'.

Our data set contains the students' time series of activities during six laboratory sessions of the digital electronics course. There are 6 folders containing the students' data, one per session. Each 'Session' folder contains up to 99 CSV files, each holding one student's log for that session. The number of files varies from folder to folder with the number of students present in each session. Each file contains 13 features; see 'features_info.txt' for more details. For details of the activities performed by the students during the course, see 'activities_info.txt'.

The data set includes the following files:
=========================================
- 'README.txt'
- 'features_info.txt': contains information about the variables used in the feature vector.
- 'features.txt': list of all features.
- 'activities_info.txt': contains information about the variable 'activity'.
- 'activities.txt': list of all activities.
- 'exercises_info.txt': contains information about the variable 'exercise'.
- 'grades_info.txt': contains information about the grade data.

Data:
======
- 'Processes': contains the data files from Session 1 to 6.
- 'logs.txt': shows, for each student Id, whether the student has a log in each session (0: no log, 1: has a log).
- 'final_grades.xlsx': contains the results of the final exam in two sheets.
- 'intermediate_grades.xlsx': contains the grades for the students' assignments per session.
- 'final_exam.pdf': shows the content of the final exam (original in Italian).
- 'final_exam_ENG.pdf': shows the content of the final exam translated into English.

Notes:
======
For more information about this data set, see: www.la.smartlab.ws
la '@' smartlab.ws
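The per-session, per-student folder layout described above can be read into a single pandas DataFrame for analysis. The following is a minimal sketch, not the authors' loader, and it rests on assumptions that should be checked against 'README.txt' and 'features.txt': the session folders are named `Session 1` to `Session 6`, each CSV file is named after the student Id, and, when `column_names` is supplied, the files have no header row. The function name `load_epm_sessions` and the added `source_session` / `source_file` columns are hypothetical.

```python
from pathlib import Path

import pandas as pd


def load_epm_sessions(root, column_names=None):
    """Concatenate every per-student CSV under root/Session */ into one DataFrame.

    Assumptions (hypothetical, verify against README.txt and features.txt):
    - session folders are named 'Session 1' ... 'Session 6';
    - each CSV file is named after the student Id;
    - if column_names is given, the files have no header row.
    """
    frames = []
    for session_dir in sorted(Path(root).glob("Session *")):
        # 'Session 3' -> 3
        session = int(session_dir.name.split()[-1])
        for csv_path in sorted(session_dir.glob("*.csv")):
            if column_names is None:
                df = pd.read_csv(csv_path)
            else:
                df = pd.read_csv(csv_path, header=None, names=column_names)
            # Record where each row came from; the 'source_' prefix avoids
            # clashing with any session/student columns already in the file.
            df["source_session"] = session
            df["source_file"] = csv_path.stem
            frames.append(df)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

For example, `load_epm_sessions("Processes")` would be called after unpacking the archive; the relative path is an assumption about where the 'Processes' folder ends up.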
Has Missing Values?
No
Variables Table
The repository's variables table lists 13 variables, none with missing values; see 'features_info.txt' for their names and descriptions.
Additional Variable Information
The features in this data set come from pre-processing of data collected by a logging program. For ethical reasons and to ensure the anonymity of our users, we cannot share the original log files; instead, we share the data transformed and cleaned into an appropriate format. The original logs record the client system's activity approximately once per second, while the features are computed so that each record is allocated to a particular activity. The features are selected and presented in a format suitable for Process Mining: the data is organized per session, per student, and per exercise. Each CSV file belongs to a specific session and a specific student (the file is named by the student Id). Each file contains several exercises of that session, identified by the 'exercise' feature. Each exercise contains activities, and each activity is assigned a start time, an end time, and other features. For further information about each feature, see 'features_info.txt'.
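Because each activity carries a start time and an end time within a session, student, and exercise, two common first steps in process mining are computing activity durations and counting which activity directly follows which within a case. The sketch below treats one (session, student, exercise) triple as a case, which is our modelling choice rather than something the data set prescribes. The column names `session`, `student_id`, `start_time`, and `end_time` are hypothetical placeholders (check 'features_info.txt' for the real ones); 'exercise' and 'activity' follow the feature names used in the text. The timestamp format is also unknown here, so `time_format` defaults to letting pandas infer it.

```python
from collections import Counter

import pandas as pd

# Hypothetical column names; replace with the names listed in features.txt.
CASE_COLS = ["session", "student_id", "exercise"]
ACTIVITY_COL = "activity"
START_COL, END_COL = "start_time", "end_time"


def add_durations(log, time_format=None):
    """Return a copy of log with parsed start/end times and a duration in seconds."""
    out = log.copy()
    out[START_COL] = pd.to_datetime(out[START_COL], format=time_format)
    out[END_COL] = pd.to_datetime(out[END_COL], format=time_format)
    out["duration_s"] = (out[END_COL] - out[START_COL]).dt.total_seconds()
    return out


def directly_follows(log):
    """Count (a, b) pairs where activity b directly follows activity a in one case.

    Expects parsed timestamps (e.g. the output of add_durations) so that
    sorting by start time orders each case chronologically.
    """
    counts = Counter()
    ordered = log.sort_values(CASE_COLS + [START_COL])
    for _, case in ordered.groupby(CASE_COLS, sort=False):
        acts = case[ACTIVITY_COL].tolist()
        # Consecutive pairs within this case only; cases are never joined.
        counts.update(zip(acts, acts[1:]))
    return counts
```

The resulting `Counter` maps activity pairs to frequencies, which is the edge list of a directly-follows graph; cases containing a single activity contribute no pairs.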
```
pip install ucimlrepo
```

```python
from ucimlrepo import fetch_ucirepo

# fetch dataset
epm = fetch_ucirepo(id=346)

# data (as pandas dataframes)
X = epm.data.features
y = epm.data.targets

# metadata
print(epm.metadata)

# variable information
print(epm.variables)
```
Vahdat, M., Oneto, L., Anguita, D., Funk, M., & Rauterberg, M. (2015). Educational Process Mining (EPM): A Learning Analytics Data Set [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5NP5K.
Creators
Mehrnoosh Vahdat
Luca Oneto
Davide Anguita
Mathias Funk
Matthias Rauterberg
DOI
10.24432/C5NP5K
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows the dataset to be shared and adapted for any purpose, provided that appropriate credit is given.