Center for Machine Learning and Intelligent Systems

Heterogeneity Activity Recognition Data Set

Abstract: The Heterogeneity Human Activity Recognition (HHAR) dataset from Smartphones and Smartwatches is a dataset devised to benchmark human activity recognition algorithms (classification, automatic data segmentation, sensor fusion, feature extraction, etc.) in real-world contexts; specifically, the dataset is gathered with a variety of different device models and use-scenarios, in order to reflect sensing heterogeneities to be expected in real deployments.

Data Set Characteristics: Multivariate, Time-Series
Associated Tasks: Classification, Clustering

Source:

Allan Stisen, allans '@', Aarhus University, Denmark
Henrik Blunck, blunck '@', Aarhus University, Denmark
Sourav Bhattacharya, sourav.bhattacharya '@', Bell Laboratories, Dublin, Ireland
Thor Siiger Prentow, prentow '@', Aarhus University, Denmark
Mikkel Baun Kjærgaard, mikkelbk '@', Aarhus University, Denmark
Anind Dey, anind '@', Carnegie Mellon University, USA
Tobias Sonne, tsonne '@', Aarhus University, Denmark
Mads Møller Jensen, mmjensen '@', Aarhus University, Denmark

Data Set Information:

The Heterogeneity Dataset for Human Activity Recognition from Smartphone and Smartwatch sensors consists of two datasets devised to investigate sensor heterogeneities' impacts on human activity recognition algorithms (classification, automatic data segmentation, sensor fusion, feature extraction, etc.). The datasets were used for the results and analyses produced in [1].

Activity recognition data set

The dataset contains the readings of two motion sensors commonly found in smartphones. Readings were recorded while users, carrying smartwatches and smartphones, executed scripted activities in no specific order.
Activities: ‘Biking’, ‘Sitting’, ‘Standing’, ‘Walking’, ‘Stair Up’ and ‘Stair Down’.
Sensors: Two embedded sensors, i.e., Accelerometer and Gyroscope, sampled at the highest frequency the respective device allows.
Devices: 4 smartwatches (2 LG watches, 2 Samsung Galaxy Gears) and 8 smartphones (2 Samsung Galaxy S3 mini, 2 Samsung Galaxy S3, 2 LG Nexus 4, 2 Samsung Galaxy S+)
Recordings: 9 users
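
Because each device samples at its own highest rate, the effective sampling frequency differs between devices. The following Python sketch shows one way to estimate it from a list of timestamps. The function name `estimate_rate_hz`, the use of the median gap, and the assumption that timestamps are given in seconds are illustrative choices, not part of the dataset documentation; the two timestamp series are synthetic.

```python
from statistics import median

def estimate_rate_hz(timestamps_s):
    """Estimate the sampling rate (Hz) from the median gap between
    consecutive timestamps, which are assumed to be in seconds."""
    # Keep only strictly positive gaps so duplicated or out-of-order
    # timestamps do not distort the estimate.
    gaps = [b - a for a, b in zip(timestamps_s, timestamps_s[1:]) if b > a]
    if not gaps:
        raise ValueError("need at least two increasing timestamps")
    return 1.0 / median(gaps)

# Synthetic example: one device sampling every 5 ms, another every 20 ms.
fast_device = [i * 0.005 for i in range(100)]
slow_device = [i * 0.02 for i in range(100)]

print(round(estimate_rate_hz(fast_device)), round(estimate_rate_hz(slow_device)))
```

The median is used here rather than the mean so that occasional long gaps between samples have little effect on the estimate.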

Recording scenario

The activity recognition environment and scenario were designed to generate many activity primitives, yet in a realistic manner. Users took 2 different routes for biking and walking, and 2 different sets of stairs were used for the stairs up and stairs down activities.

Still experiment data set

Accelerometer recordings as above but with devices lying still, in 6 different orientations. Devices used comprise 31 smartphones, 4 smartwatches and 1 tablet, representing 13 different models from 4 manufacturers, running variants of Android and iOS.

Attribute Information:

Activity recognition data set

------------- Accelerometer Samples ------------
The file Phones_accelerometer.csv contains all smartphone accelerometer samples from all devices and users.
The CSV file consists of the following columns:
'Index', 'Arrival_Time', 'Creation_Time', 'x', 'y', 'z', 'User', 'Model', 'Device', 'gt'

Each sample from every experiment is one row in the file, with a value for each column.
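
As a minimal sketch of reading a file with this layout, the following Python snippet parses rows with the standard `csv` module. The column names are the ones listed above. The sample rows, the user, model and device values, and the assumption that the file starts with a header row are illustrative only and are not taken from the real data.

```python
import csv
import io

# Column names as listed in the data set description.
COLUMNS = ["Index", "Arrival_Time", "Creation_Time", "x", "y", "z",
           "User", "Model", "Device", "gt"]

# Made-up rows for illustration only; NOT taken from Phones_accelerometer.csv.
SAMPLE_CSV = """Index,Arrival_Time,Creation_Time,x,y,z,User,Model,Device,gt
0,1000,2000,-5.95,0.67,8.13,a,model_x,model_x_1,stand
1,1005,2005,-5.41,0.62,8.93,a,model_x,model_x_1,stand
2,1010,2010,-5.32,0.64,8.73,a,model_x,model_x_1,null
"""

def read_samples(handle):
    """Yield one dict per sample row, with x, y, z converted to float."""
    reader = csv.DictReader(handle)  # assumes the first line is a header row
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        for axis in ("x", "y", "z"):
            row[axis] = float(row[axis])
        yield row

samples = list(read_samples(io.StringIO(SAMPLE_CSV)))
print(len(samples), samples[0]["gt"], samples[0]["x"])
```

For the real file, the same function could be called with `open("Phones_accelerometer.csv", newline="")` in place of the in-memory sample, provided the header assumption holds.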

------------- Groundtruths --------------------

Samples that belong to no activity (the null class) have the value null in the gt (ground truth) column; all other samples carry the label of their activity class in that column.
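
A short Python sketch of how the null class might be excluded when counting samples per activity. The helper name `count_activities` and the example label strings `walk` and `sit` are hypothetical; only the `gt` column name and the `null` value come from the description above.

```python
from collections import Counter

NULL_LABEL = "null"  # value of the gt column for the null class

def count_activities(rows):
    """Count samples per ground-truth class, skipping the null class."""
    return Counter(row["gt"] for row in rows if row["gt"] != NULL_LABEL)

# Hypothetical rows; only the 'gt' key matters for this example.
rows = [{"gt": "walk"}, {"gt": "null"}, {"gt": "walk"}, {"gt": "sit"}]
counts = count_activities(rows)
print(counts)
```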

------------- Devices --------------------------
The devices from the still experiment that were used for activity recognition are the following:
'it-116', 'it-133', 'it-108', 'it-103', 'it-123', '3Renault-AH', 'no-name/LG-Nexus4', 'G-Watch'

The device numbering used in the data set is:
LG-Nexus 4
Samsung Galaxy S3
Samsung Galaxy S3 mini:
Samsung Galaxy S+:

Still experiment data set
This part of the Heterogeneity Dataset for Human Activity Recognition contains all the samples from the static (still) experiment, in which the devices were placed in 6 different orientations.
The data set is structured in the following way:

------------- Static Accelerometer Samples ------------
The file for each specific device is located as follows: Orientation/[Web Link]
where the orientation is one of the following 6:

For example, the samples from the device named 3Renault-AH of the model Samsung-Galaxy-S3 Mini, lying still on its back, are found at:
Phoneonback/3Renault-AH/Samsung-Galaxy-S3 Mini.csv
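
The example path suggests a layout of orientation, then device name, then a CSV file named after the model. The Python sketch below builds such a path. The general pattern is inferred from this single example and is an assumption, as is the helper name `still_csv_path`.

```python
from pathlib import PurePosixPath

def still_csv_path(orientation, device, model):
    """Build the relative path <orientation>/<device>/<model>.csv.

    The layout is inferred from the one example in the description.
    """
    return PurePosixPath(orientation) / device / f"{model}.csv"

path = still_csv_path("Phoneonback", "3Renault-AH", "Samsung-Galaxy-S3 Mini")
print(path)
```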

Each CSV file consists of 6 columns: creation time, sensor time, arrival time, x, y, z.
The three axes of the accelerometer are the x, y, z columns.
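
For a device lying still, the magnitude of the acceleration vector should stay close to gravity, which gives a quick sanity check on a file. The Python sketch below computes the mean magnitude over all rows. It assumes the files have no header row, that the columns appear in the order listed above, and that values are in m/s². None of these is confirmed by the description. The two sample rows are invented.

```python
import csv
import io
import math

# Column order as listed in the description; the names are our own.
STILL_COLUMNS = ["creation_time", "sensor_time", "arrival_time", "x", "y", "z"]

def mean_magnitude(handle):
    """Return the mean Euclidean norm of (x, y, z) over all rows."""
    reader = csv.DictReader(handle, fieldnames=STILL_COLUMNS)  # assumes no header row
    magnitudes = [
        math.sqrt(float(r["x"]) ** 2 + float(r["y"]) ** 2 + float(r["z"]) ** 2)
        for r in reader
    ]
    if not magnitudes:
        raise ValueError("no samples found")
    return sum(magnitudes) / len(magnitudes)

# Invented rows for illustration only.
SAMPLE = "1,1,1,0.0,0.0,9.81\n2,2,2,0.6,0.0,9.8\n"
result = mean_magnitude(io.StringIO(SAMPLE))
print(round(result, 2))
```

Comparing this value across devices and orientations is one simple way to see per-device sensor biases.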

Relevant Papers:

[1] Allan Stisen, Henrik Blunck, Sourav Bhattacharya, Thor Siiger Prentow, Mikkel Baun Kjærgaard, Anind Dey, Tobias Sonne, and Mads Møller Jensen. "Smart Devices are Different: Assessing and Mitigating Mobile Sensing Heterogeneities for Activity Recognition." In Proc. 13th ACM Conference on Embedded Networked Sensor Systems (SenSys 2015), Seoul, Korea, 2015.

Citation Request:

Use of this dataset in publications should be acknowledged by referencing publication [1].
We recommend referring to this dataset as the "Heterogeneity Human Activity Recognition Dataset", or HHAR for short, in publications.
We would also appreciate it if you sent us an email (allans '@' or blunck '@') to inform us of any publication using this dataset, or if you have further questions about the dataset and how to make use of it.
Reference [1] details the dataset, recording scenarios, multimodality and sensor aspects of the setup as well as quality metrics for evaluating heterogeneities and their impact on HAR.
