Sundanese Twitter Dataset

Donated on 11/26/2021

This dataset contains tweets in Sundanese, the second-largest local language in Indonesia, and is used for emotion classification.

Dataset Characteristics

Tabular

Subject Area

Computer Science

Associated Tasks

Classification

Feature Type

-

# Instances

2510

# Features

1

Dataset Information

For what purpose was the dataset created?

This dataset was created as a contribution to NLP research, particularly in Indonesia.

Who funded the creation of the dataset?

The creation of this dataset was self-funded.

What do the instances in this dataset represent?

Each instance is a single tweet.

Are there recommended data splits?

No
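Since no split is recommended, one common option is a stratified hold-out that keeps the label proportions the same in both parts. The sketch below is only an illustration under assumptions: the function name `stratified_split`, the 20% test fraction, the seed, and the sample rows and label names are all hypothetical and do not come from the dataset authors.

```python
import random
from collections import defaultdict


def stratified_split(rows, test_fraction=0.2, seed=0):
    """Split (label, text) rows so each label keeps roughly the same share in both parts."""
    by_label = defaultdict(list)
    for label, text in rows:
        by_label[label].append((label, text))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Made-up rows standing in for the real tweets; label names are placeholders.
rows = [("joy", f"tweet {i}") for i in range(10)]
rows += [("anger", f"tweet {i}") for i in range(10, 20)]
train, test = stratified_split(rows)
```

With the real data, `rows` would be the (label, text) pairs read from the CSV file, and the test fraction is a free choice.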

Was there any data preprocessing performed?

Tokenization, stopword removal, and stemming.
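The three steps named above can be sketched as a small pipeline. This is a minimal illustration, not the authors' implementation: the page does not say which tokenizer or stemmer was used, so the regular-expression tokenizer is an assumption and the stemmer is left as a pluggable identity placeholder. The stopword set and the example sentence are made up; in practice the stopwords would presumably be read from `stopwordv1.txt`.

```python
import re

# Hypothetical stopword sample; the real list would come from stopwordv1.txt.
SAMPLE_STOPWORDS = {"anu", "jeung", "di"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def remove_stopwords(tokens, stopwords):
    """Drop every token that appears in the stopword set."""
    return [token for token in tokens if token not in stopwords]


def preprocess(text, stopwords, stem=lambda token: token):
    """Tokenize, remove stopwords, then stem.

    `stem` defaults to the identity function because the source does not
    name a Sundanese stemmer; pass a real one if available.
    """
    tokens = remove_stopwords(tokenize(text), stopwords)
    return [stem(token) for token in tokens]


# Made-up example sentence, not taken from the dataset.
tokens = preprocess("Abdi bungah pisan di dieu!", SAMPLE_STOPWORDS)
```

Keeping the stemmer as a parameter makes it easy to compare results with and without stemming.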

Has Missing Values?

No

Introductory Paper

Sundanese Twitter Dataset for Emotion Classification

By Oddy Virgantara Putra; Fathin Muhammad Wasmanson; Triana Harmini; Shoffin Nahwa Utama. 2020

Published in Conference

Variables Table

Variable Name | Role    | Type        | Description | Units | Missing Values
label         | Target  | Categorical | -           | -     | no
data          | Feature | Categorical | -           | -     | no

Dataset Files

File           | Size
data.csv       | 196.9 KB
test.csv       | 35.1 KB
stopwordv1.txt | 3.9 KB
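A CSV file with a `label` column and a `data` column can be read with the standard library alone. The sketch below assumes a comma-delimited file with a header row using those two names, as listed in the Variables Table; the actual file layout may differ and should be checked first. The helper name `read_labeled_tweets` and the in-memory sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def read_labeled_tweets(handle):
    """Read (label, text) pairs from a CSV handle with 'label' and 'data' columns."""
    reader = csv.DictReader(handle)
    return [(row["label"], row["data"]) for row in reader]


# Made-up sample standing in for the real file; label names are placeholders.
sample = io.StringIO(
    "label,data\n"
    "joy,conto hiji\n"
    "anger,conto dua\n"
    "joy,conto tilu\n"
)
rows = read_labeled_tweets(sample)

# Count how many tweets carry each label to check class balance.
label_counts = Counter(label for label, _ in rows)
```

For the downloaded file, the same function would be called on `open("data.csv", newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded.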

1 citation
2882 views

Creators

Oddy Virgantara Putra

oddy@unida.gontor.ac.id