SGEMM GPU kernel performance
Donated on 2/26/2018
Running times for multiplying two 2048 x 2048 matrices with a parameterizable OpenCL SGEMM GPU kernel, explored over varying parameter combinations using the 'CLTune' library.
Dataset Characteristics
Multivariate
Subject Area
Computer Science
Associated Tasks
Regression
Feature Type
Integer
# Instances
241600
# Features
14
Dataset Information
Additional Information
This data set measures the running time of a matrix-matrix product A*B = C, where all matrices have size 2048 x 2048, using a parameterizable SGEMM GPU kernel with 241600 possible parameter combinations. For each tested combination, 4 runs were performed and their results are reported in the last 4 columns. All times are measured in milliseconds*. There are 14 parameters: the first 10 are ordinal and can each take only up to 4 different power-of-two values, and the last 4 are binary. Out of 1327104 total parameter combinations, only 241600 are feasible (due to various kernel constraints); this data set contains the results for all feasible combinations.

The experiment was run on a desktop workstation running Ubuntu 16.04 Linux with an Intel Core i5 (3.5GHz), 16GB RAM, and an NVIDIA GeForce GTX 680 4GB GPU. We use the 'gemm_fast' kernel from the automatic OpenCL kernel tuning library 'CLTune' (https://github.com/CNugteren/CLTune).

* Note: for this kind of data set it is usually better to work with the logarithm of the running times (see e.g. Falch and Elster, 'Machine learning-based auto-tuning for enhanced performance portability of OpenCL applications', 2015).
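As a concrete illustration of the note above, here is a minimal sketch of loading the CSV and taking the logarithm of the averaged running times. It assumes only that the 4 run times occupy the last 4 columns of sgemm_product.csv, as described; pandas and NumPy are used for convenience and are not required by the dataset itself.

import numpy as np
import pandas as pd

# Load one row per feasible parameter combination.
df = pd.read_csv("sgemm_product.csv")

# The last 4 columns are the 4 repeated run times (milliseconds).
runs = df.iloc[:, -4:]

# Average the repeats, then take the logarithm as suggested above.
log_time = np.log(runs.mean(axis=1))

# The remaining 14 columns are the kernel parameters.
params = df.iloc[:, :14]

print(params.shape, log_time.shape)  # expected: (241600, 14) (241600,)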
Has Missing Values?
No
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
MWG | Feature | Integer | Per-matrix 2D tiling at workgroup level: {16, 32, 64, 128} | | no |
NWG | Feature | Integer | Per-matrix 2D tiling at workgroup level: {16, 32, 64, 128} | | no |
KWG | Feature | Integer | Inner dimension of 2D tiling at workgroup level: {16, 32} | | no |
MDIMC | Feature | Integer | Local workgroup size: {8, 16, 32} | | no |
NDIMC | Feature | Integer | Local workgroup size: {8, 16, 32} | | no |
MDIMA | Feature | Integer | Local memory shape: {8, 16, 32} | | no |
NDIMB | Feature | Integer | Local memory shape: {8, 16, 32} | | no |
KWI | Feature | Integer | Kernel loop unrolling factor: {2, 8} | | no |
VWM | Feature | Integer | Per-matrix vector width for loading and storing: {1, 2, 4, 8} | | no |
VWN | Feature | Integer | Per-matrix vector width for loading and storing: {1, 2, 4, 8} | | no |
STRM | Feature | Categorical | Enable stride for accessing off-chip memory within a single thread: {0, 1} | | no |
STRN | Feature | Categorical | Enable stride for accessing off-chip memory within a single thread: {0, 1} | | no |
SA | Feature | Categorical | Per-matrix manual caching of the 2D workgroup tile: {0, 1} | | no |
SB | Feature | Categorical | Per-matrix manual caching of the 2D workgroup tile: {0, 1} | | no |
Run1 | Target | Continuous | Running time of run 1 | ms | no |
Run2 | Target | Continuous | Running time of run 2 | ms | no |
Run3 | Target | Continuous | Running time of run 3 | ms | no |
Run4 | Target | Continuous | Running time of run 4 | ms | no |
Additional Variable Information
- Independent variables:
  1-2. MWG, NWG: per-matrix 2D tiling at workgroup level: {16, 32, 64, 128} (integer)
  3. KWG: inner dimension of 2D tiling at workgroup level: {16, 32} (integer)
  4-5. MDIMC, NDIMC: local workgroup size: {8, 16, 32} (integer)
  6-7. MDIMA, NDIMB: local memory shape: {8, 16, 32} (integer)
  8. KWI: kernel loop unrolling factor: {2, 8} (integer)
  9-10. VWM, VWN: per-matrix vector widths for loading and storing: {1, 2, 4, 8} (integer)
  11-12. STRM, STRN: enable stride for accessing off-chip memory within a single thread: {0, 1} (categorical)
  13-14. SA, SB: per-matrix manual caching of the 2D workgroup tile: {0, 1} (categorical)
- Output variables:
  15-18. Run1, Run2, Run3, Run4: performance times in milliseconds for 4 independent runs using the same parameters; they range between 13.25 and 3397.08.
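Since the first 10 parameters take only power-of-two values and the last 4 are 0/1 flags, one possible (purely illustrative) preprocessing choice is to encode the ordinal parameters on a log2 scale before regressing on the log running time. The sketch below assumes the column order listed above and uses scikit-learn's RandomForestRegressor; neither the encoding nor the model is prescribed by the dataset documentation.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("sgemm_product.csv")

# Features: columns 1-14 (MWG ... SB); target: log of the mean of Run1-Run4.
X = df.iloc[:, :14].copy()
y = np.log(df.iloc[:, -4:].mean(axis=1))

# Illustrative encoding: log2 of the 10 ordinal power-of-two parameters;
# the 4 binary flags (STRM, STRN, SA, SB) stay as 0/1.
for col in X.columns[:10]:
    X[col] = np.log2(X[col].astype(float))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
print("Held-out R^2:", model.score(X_test, y_test))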
Dataset Files
File | Size |
---|---|
sgemm_product.csv | 13.7 MB |
Readme.txt | 3.9 KB |
__MACOSX/._sgemm_product.csv | 389 Bytes |
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
sgemm_gpu_kernel_performance = fetch_ucirepo(id=440)

# data (as pandas dataframes)
X = sgemm_gpu_kernel_performance.data.features
y = sgemm_gpu_kernel_performance.data.targets

# metadata
print(sgemm_gpu_kernel_performance.metadata)

# variable information
print(sgemm_gpu_kernel_performance.variables)
Paredes, E. & Ballester-Ripoll, R. (2017). SGEMM GPU kernel performance [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5MK70.
Creators
Enrique Paredes
Rafael Ballester-Ripoll
DOI
10.24432/C5MK70
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This license allows sharing and adaptation of the dataset for any purpose, provided that appropriate credit is given.