Roman Urdu Data Set

Donated on 8/28/2018

Roman Urdu (a romanized writing style for the Urdu language) is one of the limited-resource languages. A data corpus comprising more than 20,000 records was collected.

Dataset Characteristics

Text

Subject Area

Computer Science

Associated Tasks

Classification

Feature Type

# Instances

20000

# Features

Dataset Information

Additional Information

Tagged for Sentiment (Positive, Negative, Neutral)

Has Missing Values?

Variables Table

Variable Name	Role	Type	Description	Units	Missing Values
					no
					no

Additional Variable Information

Each record comprises two string values: one for the comment or review and one for its sentiment.
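
The two-value record layout described above can be read with a few lines of Python. This is only a minimal sketch under stated assumptions: it assumes the text comes first and the sentiment label second in each row, that the file has no header row, and that rows with fewer than two values should be skipped. The function names `load_records` and `label_counts` and the sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io
from collections import Counter


def load_records(handle):
    """Read (text, sentiment) pairs from an open CSV file handle.

    Assumption: column 0 is the comment/review, column 1 is the sentiment.
    """
    records = []
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        text, sentiment = row[0].strip(), row[1].strip()
        records.append((text, sentiment))
    return records


def label_counts(records):
    """Count how many records carry each sentiment label."""
    return Counter(sentiment for _, sentiment in records)


# Made-up placeholder rows, NOT actual dataset content.
sample = io.StringIO(
    "acha hai,Positive\n"
    "bilkul bekar,Negative\n"
    "theek hai,Neutral\n"
)
records = load_records(sample)
counts = label_counts(records)
```

To try this on the downloaded file, the same `load_records` function could be given a handle from `open("Roman Urdu DataSet.csv", newline="", encoding="utf-8")`. The encoding is an assumption and may need adjusting.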

Dataset Files

File	Size
Roman Urdu DataSet.csv	1.6 MB

Creators

Zareen Sharf

DOI

10.24432/C58325

License

This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

This allows the dataset to be shared and adapted for any purpose, provided that appropriate credit is given.