Sundanese Twitter Dataset
Donated on 11/26/2021
This dataset contains tweets in Sundanese, the second-largest local language in Indonesia, and is used for emotion classification.
Dataset Characteristics
Tabular
Subject Area
Computer Science
Associated Tasks
Classification
Feature Type
-
# Instances
2510
# Features
1
Dataset Information
For what purpose was the dataset created?
This dataset was created as a contribution to NLP research, particularly in Indonesia.
Who funded the creation of the dataset?
The creation of this dataset was self-funded.
What do the instances in this dataset represent?
Each instance is a tweet.
Are there recommended data splits?
No
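Since no split is recommended, one common option for a classification task is a stratified split, which keeps label proportions similar in both parts. The sketch below is only an illustration: `stratified_split`, the 20% test fraction, and the toy labels are assumptions, not part of the dataset. The file list does include a `test.csv`, which may already serve as a held-out set.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each label represented proportionally."""
    # Group row indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Take the same fraction of every label for the test part.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels for illustration only; the real label values are not listed on this page.
toy_labels = ["a"] * 10 + ["b"] * 5
train_idx, test_idx = stratified_split(toy_labels, test_fraction=0.2, seed=42)
print(len(train_idx), len(test_idx))
```

Fixing the seed makes the split reproducible, which helps when comparing models on the same partition.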
Was there any data preprocessing performed?
Tokenization, stopword removal, and stemming.
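The three preprocessing steps named above can be sketched roughly as follows. This is not the authors' pipeline: the tokenizer regex and the function names are assumptions, `EXAMPLE_STOPWORDS` is a placeholder set that is not taken from `stopwordv1.txt`, and the stemmer defaults to a pass-through because no particular Sundanese stemmer is specified on this page.

```python
import re

# Hypothetical placeholder stopwords; the dataset ships its own list in stopwordv1.txt.
EXAMPLE_STOPWORDS = {"nu", "jeung", "di"}


def tokenize(text):
    """Lowercase the text and return runs of letters as tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def preprocess(text, stopwords=EXAMPLE_STOPWORDS, stem=lambda token: token):
    """Tokenize, drop stopwords, then apply the supplied stemmer to each token."""
    tokens = tokenize(text)
    kept = [t for t in tokens if t not in stopwords]
    # The default stemmer returns each token unchanged; pass a real one via `stem`.
    return [stem(t) for t in kept]


print(preprocess("Abdi jeung manehna di Bandung!"))
```

Passing the stemmer as a callable keeps the sketch independent of any specific stemming library.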
Has Missing Values?
No
Introductory Paper
By Oddy Virgantara Putra; Fathin Muhammad Wasmanson; Triana Harmini; Shoffin Nahwa Utama. 2020
Published in Conference
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
label | Target | Categorical | | | no |
data | Feature | Categorical | | | no |
Dataset Files
File | Size |
---|---|
data.csv | 196.9 KB |
test.csv | 35.1 KB |
stopwordv1.txt | 3.9 KB |
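As a rough sketch of working with the CSV files directly, the snippet below reads label and text pairs and counts the labels. It assumes a header row with columns named `label` and `data`, matching the variables table; the actual file layout is not documented here, and the in-memory sample is made up so the example runs without the download.

```python
import csv
import io
from collections import Counter


def read_labeled_rows(handle, label_column="label", text_column="data"):
    """Read (label, text) pairs from an open CSV handle with a header row."""
    reader = csv.DictReader(handle)
    return [(row[label_column], row[text_column]) for row in reader]


# Made-up stand-in for data.csv; labels and text are placeholders, not dataset content.
sample_csv = io.StringIO(
    "label,data\n"
    "class_a,first example text\n"
    "class_b,second example text\n"
    "class_a,third example text\n"
)

rows = read_labeled_rows(sample_csv)
label_counts = Counter(label for label, _ in rows)
print(label_counts.most_common())
```

For the real file, the same function could be called on `open("data.csv", newline="", encoding="utf-8")`, assuming UTF-8 encoding and the header described above.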
Install the package:

    pip install ucimlrepo

Import the dataset in Python:

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    sundanese_twitter_dataset = fetch_ucirepo(id=695)

    # data (as pandas dataframes)
    X = sundanese_twitter_dataset.data.features
    y = sundanese_twitter_dataset.data.targets

    # metadata
    print(sundanese_twitter_dataset.metadata)

    # variable information
    print(sundanese_twitter_dataset.variables)

Putra, O. (2020). Sundanese Twitter Dataset [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5MK8C.
Creators
Oddy Virgantara Putra
oddy@unida.gontor.ac.id
DOI
10.24432/C5MK8C
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the dataset for any purpose, provided that appropriate credit is given.