Center for Machine Learning and Intelligent Systems

Roman Urdu Data Set

Abstract: Roman Urdu (the scripting style for the Urdu language) is one of the limited-resource languages. A data corpus comprising more than 20000 records was collected.

Data Set Characteristics: Text
Number of Instances: 20000
Area: Computer
Attribute Characteristics: N/A
Number of Attributes: 2
Date Donated: 2018-08-29
Associated Tasks: Classification
Missing Values? N/A
Number of Web Hits: 25813


Source:

Zareen Sharf, zareensharf76 '@' gmail.com, Shaheed Zulfiqar Ali Bhutto Institute of Science and Technology (SZABIST).


Data Set Information:

Each record is tagged for sentiment (Positive, Negative, or Neutral).


Attribute Information:

Each record comprises two string values: one for the comment/review and one for the sentiment.
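
The two-value record layout can be illustrated with a minimal Python sketch. It assumes the records are stored as comma-separated text with the comment first and the sentiment second; the page does not state the file format, so treat that as an assumption. The sample rows and the names `SAMPLE`, `VALID_LABELS`, and `read_records` are hypothetical and are not taken from the data set.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows (NOT from the data set): comment, sentiment.
# The comma-separated layout is an assumption about the file format.
SAMPLE = (
    "acha kaam kiya,Positive\n"
    "\"bilkul pasand nahi aaya, bekaar\",Negative\n"
    "kal milte hain,Neutral\n"
)

# The three sentiment tags named on this page.
VALID_LABELS = {"Positive", "Negative", "Neutral"}


def read_records(handle):
    """Yield (comment, sentiment) pairs, skipping malformed rows."""
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # not a two-value record
        comment = row[0].strip()
        sentiment = row[1].strip().capitalize()  # normalise label casing
        if comment and sentiment in VALID_LABELS:
            yield comment, sentiment


records = list(read_records(io.StringIO(SAMPLE)))
label_counts = Counter(label for _, label in records)

for label in sorted(label_counts):
    print(label, label_counts[label])
```

Counting labels this way is a quick check of class balance before training a classifier. To use it on the actual download, replace the `io.StringIO(SAMPLE)` handle with an open file handle.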


Relevant Papers:

Sharf, Zareen, and Saif Ur Rahman. 'Lexical Normalization of Roman Urdu Text.' IJCSNS 17.12 (2017): 213.
Sharf, Zareen, and Saif Ur Rahman. 'Performing Natural Language Processing on Roman Urdu Datasets.' IJCSNS (January 2018 volume).



Citation Request:

To be cited whenever accessed or downloaded.

