Query Analytics Workloads Dataset Data Set
Abstract: The dataset contains three (3) sets of range/radius query workloads drawn from Gaussian distributions over a real dataset. Each query is associated with aggregate scalar values (count, sum, average).


Data Set Characteristics: Multivariate
Number of Instances: 260000
Area: Computer
Attribute Characteristics: Real
Number of Attributes: 8
Date Donated: 2019-06-22
Associated Tasks: Regression, Clustering
Missing Values? N/A
Number of Web Hits: 6568
Source:
Dr Christos Anagnostopoulos; School of Computing Science, University of Glasgow; email: christos.anagnostopoulos '@' glasgow.ac.uk; G12 8QQ Scotland, UK. (Essence: Pervasive & Distributed Intelligence: http://www.dcs.gla.ac.uk/essence/)
Data Set Information:
The dataset contains three (3) sets of synthetic range and radius query workloads derived from Gaussian distributions over the real dataset in [URL1]. Each processed query is associated with aggregate scalar values (count, sum, average) over the dataset in [URL1].
[URL1]: [Web Link]
Note: the current dataset is processed data obtained by running synthetic query analytics workloads over the real dataset in [URL1]; it does not include any data from [URL1].
Attribute Information:
[*] The dataset 'Radius Queries' contains records of the format: {'Xcoordinate', 'Ycoordinate', 'RRadius'}. Each query defines a disc over a 2D space with center (X,Y) and radius R, used to investigate the number of crime incidents, the total arrests and the average beat within the disc region (spatial area).
[*] The dataset 'Radius Queries Count' contains records of the format: {'Xcoordinate', 'Ycoordinate', 'RRadius', 'Count'}. Each query defines a disc over a 2D space with center (X,Y) and radius R; Count is the number of crime incidents within the disc region (spatial area).
[*] The dataset 'Range Queries Aggregates' contains records of the format: {'Xcoordinate', 'Ycoordinate', 'Xrange', 'Yrange', 'Count', 'SUM', 'AVG'}. Each query defines a rectangle over a 2D space bounded by X +/- Xrange and Y +/- Yrange. Count, SUM and AVG are the number of incidents, the total arrests and the average beat within the rectangle region (spatial area).
[*] All datasets are .csv
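The query semantics above can be illustrated with a minimal sketch. It assumes closed (inclusive) boundaries and a hypothetical list of points with made-up field names 'x', 'y', 'arrest' and 'beat'; the underlying real dataset is not part of this release, so the points below are toy values only.

```python
import math

def in_disc(px, py, x, y, r):
    """True if (px, py) lies in the disc centered at (x, y) with radius r."""
    return math.hypot(px - x, py - y) <= r  # inclusive boundary is an assumption

def in_rectangle(px, py, x, y, x_range, y_range):
    """True if (px, py) lies in [x - x_range, x + x_range] x [y - y_range, y + y_range]."""
    return abs(px - x) <= x_range and abs(py - y) <= y_range

def radius_count(points, x, y, r):
    """Count: number of points inside the disc of a radius query."""
    return sum(1 for p in points if in_disc(p["x"], p["y"], x, y, r))

def range_aggregates(points, x, y, x_range, y_range):
    """(Count, SUM of arrests, AVG beat) over the rectangle of a range query."""
    hits = [p for p in points if in_rectangle(p["x"], p["y"], x, y, x_range, y_range)]
    count = len(hits)
    total_arrests = sum(p["arrest"] for p in hits)
    avg_beat = sum(p["beat"] for p in hits) / count if count else None
    return count, total_arrests, avg_beat

# Hypothetical toy points; field names and values are illustrative only.
points = [
    {"x": 0.0, "y": 0.0, "arrest": 1, "beat": 100},
    {"x": 1.0, "y": 1.0, "arrest": 0, "beat": 200},
    {"x": 5.0, "y": 5.0, "arrest": 1, "beat": 300},
]
```

With these toy points, radius_count(points, 0.0, 0.0, 1.5) counts the two points nearest the origin, and range_aggregates returns None for the average when the rectangle is empty.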
Example of a Range Query with Count, SUM, and AVG:
[1159191.2534425869,1894755.9479944962,5225.375665408865,2981.728430851036,96046.0,34927.0,1111.618901359765]
where:
'Xcoordinate' = 1159191.2534425869,
'Ycoordinate' = 1894755.9479944962,
'Xrange' = 5225.375665408865,
'Yrange' = 2981.728430851036,
'Count' = 96046.0,
'SUM' = 34927,
'AVG' = 1111.618901359765.
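As a sketch of how the example record maps to a spatial region, the snippet below unpacks the row positionally and derives the rectangle corners, assuming the rectangle spans X +/- Xrange and Y +/- Yrange as described above. The helper name rectangle_bounds is hypothetical.

```python
# The example range query record, in the column order given above.
row = [1159191.2534425869, 1894755.9479944962, 5225.375665408865,
       2981.728430851036, 96046.0, 34927.0, 1111.618901359765]

x, y, x_range, y_range, count, total_arrests, avg_beat = row

def rectangle_bounds(x, y, x_range, y_range):
    """Return (x_min, x_max, y_min, y_max), assuming the query center is (x, y)."""
    return x - x_range, x + x_range, y - y_range, y + y_range

x_min, x_max, y_min, y_max = rectangle_bounds(x, y, x_range, y_range)
```

Under this reading the rectangle has width 2 * Xrange and height 2 * Yrange. Note that AVG is the average beat of the incidents in the region, not SUM divided by Count.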
Attributes:
'ID' = serial number of query (optional)
'Xcoordinate' = spatial x-coordinate (float)
'Ycoordinate' = spatial y-coordinate (float)
'RRadius' = spatial radius of the disc centered at (X,Y) for a radius query (float)
'Xrange' = spatial x-range for a range query (float)
'Yrange' = spatial y-range for a range query (float)
'Count' = number of crime incidents in the 2D disc (radius queries) or rectangle (range queries)
'SUM' = summation of Arrests in the 2D disc (radius queries) or rectangle (range queries)
'AVG' = average Beat in the 2D disc (radius queries) or rectangle (range queries)
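A minimal sketch of reading the 'Range Queries Aggregates' file into typed records follows. It assumes comma-separated rows with no header line and the column order listed above, with an optional leading 'ID' column; neither the header layout nor the presence of 'ID' in a given file is confirmed by this description, and the function name read_range_queries is hypothetical.

```python
import csv
import io

RANGE_COLUMNS = ("Xcoordinate", "Ycoordinate", "Xrange", "Yrange",
                 "Count", "SUM", "AVG")

def read_range_queries(text_stream, has_id=False):
    """Yield one dict per row; 'ID' (if present) is an int, all else floats."""
    columns = (("ID",) if has_id else ()) + RANGE_COLUMNS
    for row in csv.reader(text_stream):
        if not row:
            continue  # skip blank lines
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(row)}")
        yield {name: int(value) if name == "ID" else float(value)
               for name, value in zip(columns, row)}

# Usage on an in-memory sample holding the example record shown earlier.
sample = io.StringIO(
    "1159191.2534425869,1894755.9479944962,5225.375665408865,"
    "2981.728430851036,96046.0,34927.0,1111.618901359765\n"
)
records = list(read_range_queries(sample))
```

For a file on disk, the same function can be given an open text file object in place of the in-memory sample.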
Relevant Papers:
[1] Savva, F., Anagnostopoulos, C. and Triantafillou, P. (2018) Explaining Aggregates for Exploratory Analytics. In: IEEE Big Data 2018, Seattle, WA, USA, 10-13 Dec 2018.
[2] Anagnostopoulos, C., Savva, F. and Triantafillou, P. (2018) Scalable aggregation predictive analytics: a query-driven machine learning approach. Applied Intelligence, 48(9), pp. 2546-2567.
Citation Request:
[1] Savva, F., Anagnostopoulos, C. and Triantafillou, P. (2018) Explaining Aggregates for Exploratory Analytics. In: IEEE Big Data 2018, Seattle, WA, USA, 10-13 Dec 2018.
[2] Anagnostopoulos, C., Savva, F. and Triantafillou, P. (2018) Scalable aggregation predictive analytics: a query-driven machine learning approach. Applied Intelligence, 48(9), pp. 2546-2567.
