Dishonest Internet users Dataset
Donated on 3/19/2018
The dataset was used to test an architecture based on a trust model capable of evaluating the trustworthiness of users interacting in pervasive environments.
Dataset Characteristics
Multivariate
Subject Area
Computer Science
Associated Tasks
Classification, Clustering
Feature Type
-
# Instances
322
# Features
-
Dataset Information
Additional Information
In pervasive computing, interacting users cannot obtain information about each other's trustworthiness, so unfair users can act maliciously towards others. The proposed solution evaluates the trustworthiness of each user by monitoring the users' behavior while they interact on the network. Each behavior is represented by a tuple of significant parameters, and the architecture combines several artificial intelligence-based technologies to build a decision-making system on top of these tuples.

The tuples have the form e_ij = <EIDj, CT, CU, LT, TC, TS>, where:
- e_ij: the i-th entity interacting with the j-th entity.
- EIDj: identification of the j-th entity.
- CT (Counting Trust): counts how many trustworthy transactions (belonging to a specific context) have occurred since the last untrustworthy transaction.
- CU (Counting Un-trust): counts how many untrustworthy transactions (belonging to a specific context) have occurred since the last trustworthy transaction.
- LT (Last Time): the date of the last experience in a specific context.
- TC (Transaction Context): the type of transaction, such as game, e-commerce, social network and others.
- TS (Trust Score): the score that an entity gives to another entity at the end of each direct interaction.

The dataset was obtained with a Java simulator implementing the proposed architecture. It includes data for the three most popular types of attack:
- Counting-based attack: the user tries to gain a good reputation by alternating honest and dishonest behavior.
- Time-based attack: the user again alternates honest and dishonest behavior to gain a good reputation, but acts at different times.
- Context-based attack: the user tries to gain a good reputation by acting honestly for one type of transaction and dishonestly for another.
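The counter semantics of CT, CU and LT described above can be illustrated with a minimal Python sketch. It assumes that a trustworthy transaction resets CU to zero and an untrustworthy one resets CT to zero, which is one reading of "since the last (un)trustworthy transaction"; the `Experience` class and `record` function are hypothetical names, not part of the simulator or the dataset.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Experience:
    """Hypothetical in-memory form of one tuple e_ij, without EIDj."""
    tc: str                    # TC: transaction context, e.g. "game"
    ct: int = 0                # CT: trustworthy transactions since the last untrustworthy one
    cu: int = 0                # CU: untrustworthy transactions since the last trustworthy one
    lt: Optional[date] = None  # LT: date of the last experience in this context
    ts: Optional[str] = None   # TS: trust score given at the end of the interaction


def record(exp: Experience, trustworthy: bool, when: date) -> Experience:
    """Update CT, CU and LT after one transaction (assumed reset semantics)."""
    if trustworthy:
        exp.ct += 1
        exp.cu = 0  # a trustworthy transaction ends the current untrustworthy run
    else:
        exp.cu += 1
        exp.ct = 0  # an untrustworthy transaction ends the current trustworthy run
    exp.lt = when
    return exp


exp = Experience(tc="game")
for day, ok in enumerate([True, True, False, True], start=1):
    record(exp, ok, date(2018, 3, day))
print(exp.ct, exp.cu, exp.lt)  # 1 0 2018-03-04
```

Under this reading, a user who mounts a counting-based attack by strictly alternating honest and dishonest transactions never accumulates a CT value above 1.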
Because the EIDj parameter is not relevant to the decision-making process, only the following parameters are reported in the dataset: CT, CU, LT, TC and TS.

A user may have no historical data (tuples) about another user it wants to interact with. In that case it can obtain tuples from third parties that have previously interacted with the inquired user. The trustworthiness of these third-party entities (recommenders) must also be evaluated, because they may themselves mount attacks such as Ballot Stuffing (BS), Bad Mouthing (BM) and Random Opinion (RO). Changing the TS parameter in a number of rows of the dataset, according to a specific attack, yields different datasets for evaluating recommender trustworthiness. The following datasets are therefore also provided:
- BM_x%.txt: x is the percentage of unfair recommendations obtained by a BM attack. It ranges from 10 to 50.
- BS_x%.txt: x is the percentage of unfair recommendations obtained by a BS attack. It ranges from 10 to 50.
- RO_x%.txt: x is the percentage of unfair recommendations obtained by an RO attack. It ranges from 10 to 50.
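The following sketch shows how an unfair-recommendation variant might be derived from a TS column. The exact procedure used to build the BM, BS and RO files is not described here, so the semantics below are assumptions: BM flips trustworthy scores to untrustworthy, BS flips untrustworthy scores to trustworthy, and RO assigns random scores to randomly chosen rows. `inject_unfair` is a hypothetical helper.

```python
import random
from typing import List, Sequence

TRUSTED, UNTRUSTED = "trustworthy", "untrustworthy"


def inject_unfair(ts: Sequence[str], attack: str, fraction: float, seed: int = 0) -> List[str]:
    """Return a copy of a TS column in which about `fraction` of the rows are unfair.

    Assumed semantics:
      BM (bad mouthing):    trustworthy scores are reported as untrustworthy
      BS (ballot stuffing): untrustworthy scores are reported as trustworthy
      RO (random opinion):  randomly chosen rows get a random score
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    out = list(ts)
    if attack == "BM":
        pool = [i for i, v in enumerate(out) if v == TRUSTED]
    elif attack == "BS":
        pool = [i for i, v in enumerate(out) if v == UNTRUSTED]
    elif attack == "RO":
        pool = list(range(len(out)))
    else:
        raise ValueError(f"unknown attack: {attack!r}")
    rng = random.Random(seed)  # seeded so the variant is reproducible
    k = min(round(fraction * len(out)), len(pool))
    for i in rng.sample(pool, k):
        if attack == "BM":
            out[i] = UNTRUSTED
        elif attack == "BS":
            out[i] = TRUSTED
        else:
            # A random opinion can coincide with the original score.
            out[i] = rng.choice((TRUSTED, UNTRUSTED))
    return out


# Example: a 30% bad-mouthing variant of a small TS column.
column = [TRUSTED] * 6 + [UNTRUSTED] * 4
print(inject_unfair(column, "BM", 0.3).count(UNTRUSTED))  # 7
```

The input column is copied rather than modified, so several variants (for example 10% to 50%) can be generated from the same original.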
Has Missing Values?
No
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
 | | | | | no |
 | | | | | no |
 | | | | | no |
 | | | | | no |
 | | | | | no |
Additional Variable Information
1) CT {CT_range_1, CT_range_2, CT_range_3, CT_range_4}
2) CU {CU_range_1, CU_range_2, CU_range_3, CU_range_4}
3) LT {LT_range_1, LT_range_2, LT_range_3, LT_range_4}
4) TC {sport, game, ECommerce, holiday}
5) TS {trustworthy, untrustworthy}
The numerical attributes (CT, CU, LT) were discretized. Several of the papers associated with this dataset describe in detail how these attributes were discretized.
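Because every attribute is categorical, a simple classifier for TS can work directly on the discretized values. The sketch below uses categorical naive Bayes with Laplace smoothing as an illustrative baseline; it is not the decision-making architecture described above. It also assumes that each row of the .txt files is comma-separated in the order CT, CU, LT, TC, TS, which this page does not confirm. All names and the example rows are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Value domains from "Additional Variable Information".
DOMAINS: Dict[str, List[str]] = {
    "CT": [f"CT_range_{k}" for k in range(1, 5)],
    "CU": [f"CU_range_{k}" for k in range(1, 5)],
    "LT": [f"LT_range_{k}" for k in range(1, 5)],
    "TC": ["sport", "game", "ECommerce", "holiday"],
}
FEATURES = ["CT", "CU", "LT", "TC"]
CLASSES = ["trustworthy", "untrustworthy"]


def parse_row(line: str) -> Tuple[Dict[str, str], str]:
    """Parse one row, assuming comma-separated values in the order CT,CU,LT,TC,TS."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FEATURES) + 1:
        raise ValueError(f"expected 5 fields, got {len(values)}")
    row = dict(zip(FEATURES, values))  # zip stops after the four features
    for name, value in row.items():
        if value not in DOMAINS[name]:
            raise ValueError(f"{name}: unexpected value {value!r}")
    label = values[-1]
    if label not in CLASSES:
        raise ValueError(f"TS: unexpected value {label!r}")
    return row, label


class CategoricalNB:
    """Naive Bayes over categorical features with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.class_counts: Counter = Counter()
        self.value_counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)

    def fit(self, rows: List[Dict[str, str]], labels: List[str]) -> "CategoricalNB":
        for row, label in zip(rows, labels):
            self.class_counts[label] += 1
            for f in FEATURES:
                self.value_counts[(label, f)][row[f]] += 1
        return self

    def predict(self, row: Dict[str, str]) -> str:
        total = sum(self.class_counts.values())
        best, best_score = CLASSES[0], -math.inf
        for c in CLASSES:
            n_c = self.class_counts[c]
            # log P(c) plus the sum of log P(value | c), all smoothed
            score = math.log((n_c + self.alpha) / (total + self.alpha * len(CLASSES)))
            for f in FEATURES:
                count = self.value_counts[(c, f)][row[f]]
                score += math.log((count + self.alpha) / (n_c + self.alpha * len(DOMAINS[f])))
            if score > best_score:
                best, best_score = c, score
        return best


lines = [  # hypothetical rows for illustration only
    "CT_range_4,CU_range_1,LT_range_2,game,trustworthy",
    "CT_range_3,CU_range_1,LT_range_1,sport,trustworthy",
    "CT_range_1,CU_range_4,LT_range_3,ECommerce,untrustworthy",
    "CT_range_1,CU_range_3,LT_range_4,holiday,untrustworthy",
]
rows, labels = zip(*(parse_row(line) for line in lines))
model = CategoricalNB().fit(list(rows), list(labels))
print(model.predict({"CT": "CT_range_4", "CU": "CU_range_1", "LT": "LT_range_1", "TC": "game"}))  # trustworthy
```

Smoothing matters here because a value never seen with a class (for example a high CT range among untrustworthy rows) would otherwise zero out that class entirely.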
Dataset Files
File | Size |
---|---|
RO_50%.txt | 19.2 KB |
RO_40%.txt | 19.1 KB |
RO_30%.txt | 19 KB |
BM_50%.txt | 18.9 KB |
RO_20%.txt | 18.9 KB |
Showing 5 of 17 files.
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
dishonest_internet_users_dataset = fetch_ucirepo(id=453)

# data (as pandas dataframes)
X = dishonest_internet_users_dataset.data.features
y = dishonest_internet_users_dataset.data.targets

# metadata
print(dishonest_internet_users_dataset.metadata)

# variable information
print(dishonest_internet_users_dataset.variables)
D'Angelo, G. (2015). Dishonest Internet users Dataset [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5FP4B.
Creators
Gianni D'Angelo
DOI
10.24432/C5FP4B
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.