Educational Process Mining (EPM): A Learning Analytics Data Set

Donated on 9/23/2015

The Educational Process Mining data set is built from recordings of 115 subjects' activities, captured by a logging application while they learned with an educational simulator.

Dataset Characteristics

Multivariate, Sequential, Time-Series

Subject Area

Computer Science

Associated Tasks

Classification, Regression, Clustering

Feature Type


# Instances


# Features


Dataset Information

Additional Information

The experiments were carried out with a group of 115 first-year undergraduate Engineering students at the University of Genoa. The study used a simulation environment named Deeds (Digital Electronics Education and Design Suite), which is used for e-learning in digital electronics. The environment provides learning materials to the students through specialized browsers and asks them to solve various problems with different levels of difficulty. For more information about the Deeds simulator used for this course look at: and to know more about the exercise contents of each session see 'exercises_info.txt'.

Our data set contains the students' time series of activities during six laboratory sessions of the digital electronics course. There are 6 folders containing the students’ data per session. Each 'Session' folder contains up to 99 CSV files, each dedicated to a specific student's log during that session. The number of files in each folder varies with the number of students present in each session. Each file contains 13 features. See 'features_info.txt' for more details. For the details of the activities performed by the students during the course, see 'activities_info.txt'.

The data set includes the following files:
=========================================
- 'README.txt'
- 'features_info.txt': contains information about the variables used in the feature vector.
- 'features.txt': list of all features.
- 'activities_info.txt': contains information about the variable 'activity'.
- 'activities.txt': list of all activities.
- 'exercises_info.txt': contains information about the variable 'exercise'.
- 'grades_info.txt': contains information about the grade data.

Data:
======
- 'Processes': contains the data files from Session 1 to 6.
- 'logs.txt': shows information about the log data per student Id. It shows whether a student has a log in each session (0: has no log, 1: has log).
- 'final_grades.xlsx': contains the results of the final exam in two sheets.
- 'intermediate_grades.xlsx': contains the grades for the students' assignments per session.
- 'final_exam.pdf': shows the content of the final exam (original in Italian).
- 'final_exam_ENG.pdf': shows the content of the final exam translated into English.

Notes:
======
For more information about this data set please look at:
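Because the number of files differs from one session folder to the next, a quick count per folder is a useful first check after download. The following is a minimal sketch, not part of the data set: the function name `count_logs_per_session` is hypothetical, and it assumes only that the 'Processes' directory holds one subfolder per session with one file per student.

```python
from pathlib import Path


def count_logs_per_session(processes_dir):
    """Return {session folder name: number of student log files}.

    Assumes (hypothetically) one subfolder per session inside
    `processes_dir`, each holding one log file per student.
    """
    counts = {}
    for session_dir in sorted(Path(processes_dir).iterdir()):
        if session_dir.is_dir():
            # Count regular files only; nested folders are ignored.
            counts[session_dir.name] = sum(
                1 for f in session_dir.iterdir() if f.is_file()
            )
    return counts


# Example call, assuming the archive was extracted into the current directory:
# print(count_logs_per_session("Processes"))
```

The counts can then be compared against the 0/1 flags in 'logs.txt' to confirm that no student log is missing from the extracted archive.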

Has Missing Values?


Variables Table

Variable Name | Role | Type | Demographic | Description | Units | Missing Values


Additional Variable Information

The features selected for this data set come from pre-processing of data collected through a logging program. For ethical reasons and to ensure the anonymity of our users, we cannot share the original log files; instead, we share the data transformed and cleaned into an appropriate format. The original logs record the client system approximately once per second, while the features are calculated so that they can be allocated to a particular activity. The features are selected and presented in a format suitable for Process Mining. Accordingly, the data is presented per session, per student, and per exercise. Each CSV file belongs to a specific session and a specific student (and is named by the student Id). Each file contains several exercises of that session, identified by the 'exercise' feature. Each exercise contains activities, and each activity is assigned a start time, an end time, and other features. For further information about each feature, see 'features_info.txt'.
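The session, student, exercise, activity hierarchy described above lends itself to simple per-exercise aggregation. The sketch below sums activity durations for each exercise in one student log. It rests on several assumptions that are not confirmed by this page and should be checked against 'features.txt' and 'features_info.txt': the column order in `COLUMNS`, the timestamp layout in `TIME_FORMAT`, and the absence of a header row. The sample rows are made up for illustration and are not taken from the data set.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Assumed order of the leading columns (hypothetical; see 'features.txt').
COLUMNS = ["session", "student_id", "exercise", "activity", "start_time", "end_time"]
# Assumed timestamp layout (hypothetical; adjust to match the files).
TIME_FORMAT = "%d.%m.%Y %H:%M:%S"


def seconds_per_exercise(lines, time_format=TIME_FORMAT):
    """Sum activity durations (in seconds) for each exercise in one student log."""
    totals = defaultdict(float)
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) < len(COLUMNS):
            continue  # skip blank or truncated lines
        # Extra trailing columns (the remaining features) are ignored by zip.
        record = dict(zip(COLUMNS, row))
        start = datetime.strptime(record["start_time"].strip(), time_format)
        end = datetime.strptime(record["end_time"].strip(), time_format)
        totals[record["exercise"].strip()] += (end - start).total_seconds()
    return dict(totals)


# Made-up rows in the assumed layout, for illustration only.
sample = io.StringIO(
    "1, 7, Es_1_1, Study_Es_1_1, 2.10.2014 11:25:33, 2.10.2014 11:25:35\n"
    "1, 7, Es_1_1, Deeds_Es_1_1, 2.10.2014 11:25:36, 2.10.2014 11:25:46\n"
    "1, 7, Es_1_2, Study_Es_1_2, 2.10.2014 11:26:00, 2.10.2014 11:26:30\n"
)
print(seconds_per_exercise(sample))
```

For a real file, an open file handle can be passed in place of `sample`, since `csv.reader` accepts any iterable of lines.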



Mehrnoosh Vahdat

Luca Oneto

Davide Anguita

Mathias Funk

Matthias Rauterberg

