NYSK
Donated on 10/10/2013
NYSK (New York v. Strauss-Kahn) is a collection of English news articles about the case relating to allegations of sexual assault against the former IMF director Dominique Strauss-Kahn (May 2011).
Dataset Characteristics
Multivariate, Sequential, Text
Subject Area
Social Science
Associated Tasks
Clustering
Feature Type
-
# Instances
10421
# Features
7
Dataset Information
Additional Information
Documents were first obtained via a Web search using AMIEI, an integrated platform for delivering enterprise intelligence developed by AMI Software (http://www.amisw.com/en), with the following query: "dsk" OR "strauss-kahn" OR "strauss-khan". The NYSK dataset was used to study the correlation between topics and sentiment and how it evolves over time, but it may also be used for other text-mining tasks such as topic extraction and sentiment analysis.
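The search query is a Boolean OR over three terms, one of which ("strauss-khan") is a common misspelling of the name. The sketch below applies an equivalent filter locally to article text, for example to check which term matched each document. It assumes case-insensitive, whole-word matching; how the AMIEI platform actually tokenizes and matches queries is not documented here, and the names `QUERY_TERMS`, `matches_query` and `matched_terms` are hypothetical.

```python
import re

# Hypothetical local reconstruction of the query "dsk" OR "strauss-kahn" OR "strauss-khan".
# Assumption: case-insensitive, whole-word matching (the platform's real semantics are unknown).
QUERY_TERMS = ["dsk", "strauss-kahn", "strauss-khan"]

# One alternation pattern; \b stops "dsk" from matching inside longer words.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in QUERY_TERMS) + r")\b",
    re.IGNORECASE,
)


def matches_query(text: str) -> bool:
    """Return True if any query term occurs in the text (Boolean OR)."""
    return _PATTERN.search(text) is not None


def matched_terms(text: str) -> set:
    """Return the distinct query terms found in the text, lower-cased."""
    return {m.group(0).lower() for m in _PATTERN.finditer(text)}


if __name__ == "__main__":
    sample = "Prosecutors say Strauss-Kahn, known as DSK, will appear in court."
    print(matches_query(sample))          # True
    print(sorted(matched_terms(sample)))  # ['dsk', 'strauss-kahn']
```

Reporting the matched terms, not just a yes/no, makes it easy to see how many articles were retrieved only through the misspelled variant.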
Has Missing Values?
No
Variable Information
Documents were then filtered and stored in XML format. All XML fields are self-explanatory.
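Because the fields are described only as self-explanatory, the sketch below reads the XML generically: it streams the file with `xml.etree.ElementTree.iterparse`, treats every element with a given tag as one article, and returns that element's children as a dict mapping tag to text. The record tag name `document` and the function name `iter_records` are assumptions to verify against the actual `nysk.xml`. Streaming and clearing each record after use limits memory for a file of this size.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Union

# Assumption: each article is one element named RECORD_TAG whose direct
# children hold its fields. Check the real tag names in nysk.xml first.
RECORD_TAG = "document"


def iter_records(source: Union[str, BinaryIO], record_tag: str = RECORD_TAG) -> Iterator[dict]:
    """Stream records from an XML file (path or binary file object) as {child_tag: text} dicts."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            # Collect direct children; missing text becomes an empty string.
            yield {child.tag: (child.text or "").strip() for child in elem}
            # Drop the record's children and text once consumed to limit memory use.
            elem.clear()


if __name__ == "__main__":
    # Tiny inline sample with hypothetical field names, standing in for nysk.xml.
    sample = b"""<root>
      <document><docid>1</docid><title>First</title></document>
      <document><docid>2</docid><title>Second</title></document>
    </root>"""
    for record in iter_records(io.BytesIO(sample)):
        print(record)
    # {'docid': '1', 'title': 'First'}
    # {'docid': '2', 'title': 'Second'}
```

To read the real file, pass its path instead, e.g. `iter_records("nysk.xml")`, after confirming the record tag.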
Dataset Files
File | Size |
---|---|
nysk.xml | 52.3 MB |
```shell
pip install ucimlrepo
```

```python
from ucimlrepo import fetch_ucirepo

# fetch dataset
nysk = fetch_ucirepo(id=260)

# data (as pandas dataframes)
X = nysk.data.features
y = nysk.data.targets

# metadata
print(nysk.metadata)

# variable information
print(nysk.variables)
```
Lauf, A., Khouas, L., & Dermouche, M. (2014). NYSK [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C56C8K.
Creators
Aurélien Lauf
Leila Khouas
Mohamed Dermouche
DOI
10.24432/C56C8K
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows the dataset to be shared and adapted for any purpose, provided that appropriate credit is given.