Paper Reviews
Donated on 10/22/2017
This sentiment analysis data set contains scientific paper reviews from an international conference on computing and informatics. The task is to predict the orientation or the evaluation of a review.
Dataset Characteristics
Text
Subject Area
Computer Science
Associated Tasks
Classification, Regression
Feature Type
Integer
# Instances
405
# Features
10
Dataset Information
Additional Information
The data set consists of paper reviews sent to an international conference mostly in Spanish (some are in English). It has a total of N = 405 instances evaluated with a 5-point scale ('-2': very negative, '-1': negative, '0': neutral, '1': positive, '2': very positive), expressing the reviewer's opinion about the paper and the orientation perceived by a reader who does not know the reviewer's evaluation (more details in the attributes' section). The distribution of the original scores is more uniform in comparison to the revised scores. This difference is assumed to come from a discrepancy between the way the paper is evaluated and the way the review is written by the original reviewer. The data set is stored in JSON format, the structure is as follows: Paper: { papers have an associated timespan and a paper ID, each paper contains some reviews. The reviews have their own ID, the review text, the remarks (which can be empty), the language of the review, its orientation and evaluation. Some relevant statistics (excluding reviews in English and empty reviews): - Number of words: Min: 3 Max: 530 Avg: 88.64 Stdev: 69.76 - Number of sentences: Min: 1 Max: 47 Avg: 8.91 Stdev: 7.54
Has Missing Values?
Yes
Introductory Paper
By Brian Keith, C. Meneses. 2017
Published in ICRITO
Variable Information
1. Timespan (datetime): A date associated with the year of conference, which in turn corresponds with the time the review was written. The data set includes four years of reviews worth of conferences. 2. Paper ID (integer): This number identifies each individual paper from a given conference. The data set has 172 different papers. 3. Preliminary decision (label): The preliminary decision of acceptance or rejection of a paper taken by the conference committee. 4. Review ID (integer: A serial number identifier for each review as a correlative with respect to each individual paper. (e.g. the second review of some paper would correspond to the number $2$). The data set has a total of 405 reviews. Most papers have 2 reviews each. 5. Text (text): Comments and detailed review of the paper. This is read by the authors and the editing commission of the conference. The editors determine if the paper should be published or not depending on the reviews. There are $6$ instances of empty reviews. 6. Remarks (text): Additional comments that can be read only by the editing commission of the conference. This is used in conjunction with the previous attribute to determine if the paper should be published. This is an optional attribute. Whenever it is possible it is concatenated at the end of the main body of the review. Some reviews do not have remarks, this is indicated with an empty string ''. 7. Language (text): Language corresponding to the review (it may be English or Spanish). In this case the majority of the reviews are in Spanish, with only $17$ instances of English reviews. 8. Orientation (integer from -2 to 2): Review classification defined by the authors of this study, according to the 5-point scale previously described, obtained through the authors' systematic judgement of each review. This attribute represents the subjective perception of each review (i.e. how negative or positive the review is perceived when someone reads it). 9. Evaluation (integer from -2 to 2): Review classification as defined by the reviewer, according to the 5-point scale previously described. This attribute represents the real evaluation given to the paper, as determined by the reviewers. 10. Confidence (integer from 1 to 5): Value describing the confidence of the reviewer, a higher value denotes more confidence, while a lower value indicates less confidence.
Dataset Files
File | Size |
---|---|
reviews.json | 579.7 KB |
Reviews
There are no reviews for this dataset yet.
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo # fetch dataset paper_reviews = fetch_ucirepo(id=410) # data (as pandas dataframes) X = paper_reviews.data.features y = paper_reviews.data.targets # metadata print(paper_reviews.metadata) # variable information print(paper_reviews.variables)
Keith, B. (2017). Paper Reviews [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C50G60.
Creators
Brian Keith
DOI
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.