Center for Machine Learning and Intelligent Systems

Roman Urdu Data Set

Abstract: Roman Urdu (a scripting style for the Urdu language) is a limited-resource language. A corpus of more than 20,000 records was collected.

Data Set Characteristics: Text
Number of Instances: 20000
Area: Computer
Attribute Characteristics: N/A
Number of Attributes: 2
Date Donated: 2018-08-29
Associated Tasks: Classification
Missing Values? N/A
Number of Web Hits: 5779


Source:

Zareen Sharf, zareensharf76 '@' gmail.com, Shaheed Zulfiqar Ali Bhutto Institute of Science and Technology (SZABIST).


Data Set Information:

Each record is tagged for sentiment (Positive, Negative, or Neutral).


Attribute Information:

Each record comprises two string values: the first holds the comment/review text and the second holds its sentiment.
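
The two-value record layout described above can be read with a short script. This is a minimal sketch, not the authors' tooling: it assumes the records are stored as comma-separated text with the comment first and the sentiment second, and that labels are spelled Positive, Negative, and Neutral. The sample rows are made up for illustration and are not taken from the data set; the actual file name, delimiter, and encoding should be checked against the downloaded data.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows, invented for illustration only.
# They are NOT taken from the data set itself.
SAMPLE = """\
yeh phone bohat acha hai,Positive
service bilkul bekar thi,Negative
kal milte hain,Neutral
"""

# Assumed label spellings, compared case-insensitively.
VALID_LABELS = {"positive", "negative", "neutral"}


def read_records(handle):
    """Yield (text, label) pairs from a two-column CSV stream."""
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        text = row[0].strip()
        label = row[1].strip().lower()
        if label in VALID_LABELS:
            yield text, label


records = list(read_records(io.StringIO(SAMPLE)))

# Count how many records carry each sentiment label.
label_counts = Counter(label for _, label in records)
print(label_counts)
```

To run this against the real data, replace `io.StringIO(SAMPLE)` with an open file handle, for example `open(path, encoding="utf-8", newline="")`, where `path` is wherever the download was saved. Rows whose second value is not one of the three assumed labels are dropped silently, so comparing `len(records)` with the expected instance count is a quick sanity check.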


Relevant Papers:

Sharf, Zareen, and Saif Ur Rahman. "Lexical normalization of Roman Urdu text." IJCSNS 17.12 (2017): 213.
Sharf, Zareen, and Saif Ur Rahman. "Performing Natural Language Processing on Roman Urdu Datasets." IJCSNS (January 2018 volume).



Citation Request:

This data set is to be cited whenever it is accessed or downloaded.

