Roman Urdu Data Set
Donated on 8/28/2018
Roman Urdu (a script style for writing the Urdu language) is a limited-resource language. A data corpus comprising more than 20,000 records was collected.
Dataset Characteristics
Text
Subject Area
Computer Science
Associated Tasks
Classification
Feature Type
-
# Instances
20000
# Features
-
Dataset Information
Additional Information
Tagged for Sentiment (Positive, Negative, Neutral)
Has Missing Values?
No
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
- | - | - | - | - | no |
- | - | - | - | - | no |
Additional Variable Information
Each record comprises two string values: the first is the comment or review, and the second is its sentiment.
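The two-field record layout described above can be read with Python's standard `csv` module. The following is a minimal sketch, not the dataset author's method. It assumes the comment comes first and the sentiment second, that there is no header row, and that the labels are spelled exactly as `Positive`, `Negative`, and `Neutral`. The helper name `read_records` is hypothetical, and the sample rows are made up for illustration rather than taken from the dataset.

```python
import csv
import io
from collections import Counter

# The three labels named in the dataset description (spelling is an assumption).
LABELS = {"Positive", "Negative", "Neutral"}


def read_records(handle):
    """Yield (comment, sentiment) pairs from a two-column CSV stream."""
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        # Assumed column order: comment first, sentiment second.
        yield row[0].strip(), row[1].strip()


# Made-up sample rows for illustration only; not taken from the dataset.
sample = io.StringIO(
    "yeh phone bohat acha hai,Positive\n"
    "service bilkul bekar thi,Negative\n"
    "kal milte hain,Neutral\n"
)

records = list(read_records(sample))

# Count how many records carry each sentiment label.
counts = Counter(sentiment for _, sentiment in records)

# Any label outside the three documented ones would show up here.
unexpected = set(counts) - LABELS

print(counts)
print(unexpected)
```

To try this on the real file, the `io.StringIO` sample could be replaced with something like `open("Roman Urdu DataSet.csv", newline="", encoding="utf-8")`. The encoding is an assumption, and checking `unexpected` first is a cheap way to spot label variants or stray columns before any modelling.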
Dataset Files
File | Size |
---|---|
Roman Urdu DataSet.csv | 1.6 MB |
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
roman_urdu_data_set = fetch_ucirepo(id=458)

# data (as pandas dataframes)
X = roman_urdu_data_set.data.features
y = roman_urdu_data_set.data.targets

# metadata
print(roman_urdu_data_set.metadata)

# variable information
print(roman_urdu_data_set.variables)
Sharf, Z. (2017). Roman Urdu Data Set [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C58325.
Creators
Zareen Sharf
DOI
10.24432/C58325
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.