Center for Machine Learning and Intelligent Systems

Roman Urdu Sentiment Analysis Dataset (RUSAD) Data Set

Abstract: The dataset was gathered to carry out research on the task of sentiment analysis for Roman Urdu.

Khawar Mehmood (k.mehmood '@'), Daryl Essam (d.essam '@'), Muhammad Kamran Malik (kamran.malik '@')

Data Set Information:

The dataset has two columns. The first column holds the binary sentiment label (positive or negative), and the second column holds the text of the review.

Attribute Information:

The dataset has two attributes. The first is the binary sentiment label (positive or negative); the second is the text of the review.
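
The two-attribute layout described above can be read with a few lines of Python. This is a minimal sketch, not an official loader: it assumes the data is stored as comma-separated text with the label first and the review second, and that labels are spelled "positive" and "negative". The helper name `read_reviews` is hypothetical, and the sample rows are made-up stand-ins, not records from the dataset.

```python
import csv
import io
from collections import Counter


def read_reviews(handle):
    """Yield (label, review) pairs from a two-column CSV stream.

    Assumes column 1 is the sentiment label and column 2 is the review text.
    """
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        label = row[0].strip().lower()
        review = row[1].strip()
        yield label, review


# Tiny in-memory stand-in for the real file (illustrative rows only).
sample = io.StringIO(
    "positive,bohat acha phone hai\n"
    "negative,service bilkul bekar thi\n"
    'positive,"khana mazedar tha, dobara aayenge"\n'
)

pairs = list(read_reviews(sample))
counts = Counter(label for label, _ in pairs)

for label, n in counts.items():
    print(label, n)
```

To read the downloaded file instead, pass an open file handle to `read_reviews`, for example `open(path, newline="", encoding="utf-8")`, where `path` and the encoding are assumptions to adjust for the actual file.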

Citation Request:

To view, download, and use this dataset, please cite the following related papers in your research.

(1) Mehmood, Khawar, Daryl Essam, and Kamran Shafi. 'Sentiment analysis system for roman Urdu.' In Science and Information Conference, pp. 29-42. Springer, Cham, 2018.
(2) Mehmood, Khawar, Daryl Essam, Kamran Shafi, and Muhammad Kamran Malik. 'Sentiment Analysis for a Resource Poor Language—Roman Urdu.' ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 19, no. 1 (2019): 10.
(3) Mehmood, Khawar, Daryl Essam, Kamran Shafi, and Muhammad Kamran Malik. 'Discriminative Feature Spamming Technique for Roman Urdu Sentiment Analysis.' IEEE Access 7 (2019): 47991-48002.
(4) Mehmood, Khawar, Daryl Essam, Kamran Shafi, and Muhammad Kamran Malik. 'An unsupervised lexical normalization for Roman Hindi and Urdu sentiment analysis.' Information Processing & Management (2020): 102368.
