Dishonest Internet users Dataset

Donated on 3/19/2018

The dataset was used to test an architecture based on a trust model capable of evaluating the trustworthiness of users interacting in pervasive environments.

Dataset Characteristics

Multivariate

Subject Area

Computer Science

Associated Tasks

Classification, Clustering

Feature Type

# Instances

322

# Features

Dataset Information

Additional Information

In pervasive computing, interacting users cannot obtain information about each other's trustworthiness, so unfair users can act maliciously towards others. The proposed solution evaluates the trustworthiness of each user by monitoring users' behavior during their interactions on the network. These behaviors are represented by tuples of significant parameters. Based on these tuples, the architecture combines several artificial intelligence-based technologies to implement a decision-making system.

The tuples are as follows:

eij = <EIDj, CT, CU, LT, TC, TS>

where:

- eij - the i-th entity interacting with the j-th entity.
- EIDj - Identification of the j-th entity.
- CT - Counting Trust. Counts how many trustworthy transactions (belonging to a specific context) have occurred since the last untrustworthy transaction.
- CU - Counting Un-trust. Counts how many untrustworthy transactions (belonging to a specific context) have occurred since the last trustworthy transaction.
- LT - Last Time. Records the date on which the last experience in a specific context took place.
- TC - Transactions Context. Identifies the type of transaction, such as game, e-commerce, social network and others.
- TS - Trust Score. The score that an entity gives to another entity at the end of each direct interaction.

The data set was obtained with a Java simulator that implemented the proposed architecture. It includes data for the three most popular types of attack:

- Counting-based attack. The user tries to gain a good reputation by alternating honest and dishonest behavior.
- Time-based attack. The user again tries to gain a good reputation by alternating honest and dishonest behavior, but acts at different times.
- Context-based attack. The user tries to gain a good reputation by acting honestly for one type of transaction and dishonestly for another.
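The CT and CU definitions imply a simple update rule: each counter grows on its own kind of transaction and resets on the opposite kind. The following Python sketch illustrates that reading. It is not the Java simulator's code; the names InteractionTuple and record_transaction, and the use of an integer timestamp for LT, are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class InteractionTuple:
    """Hypothetical in-memory form of eij = <EIDj, CT, CU, LT, TC, TS>."""
    eid: str        # EIDj: identifier of the other entity
    ct: int = 0     # CT: trustworthy transactions since the last untrustworthy one
    cu: int = 0     # CU: untrustworthy transactions since the last trustworthy one
    lt: int = 0     # LT: time of the last experience (integer timestamp, an assumption)
    tc: str = ""    # TC: transaction context, e.g. "game"
    ts: str = ""    # TS: score given after the latest interaction


def record_transaction(t: InteractionTuple, trustworthy: bool, timestamp: int) -> InteractionTuple:
    """Update the counters after one direct interaction in context t.tc."""
    if trustworthy:
        t.ct += 1   # one more trustworthy transaction in the current run
        t.cu = 0    # a trustworthy transaction ends the untrustworthy run
    else:
        t.cu += 1   # one more untrustworthy transaction in the current run
        t.ct = 0    # an untrustworthy transaction ends the trustworthy run
    t.lt = timestamp
    t.ts = "trustworthy" if trustworthy else "untrustworthy"
    return t


if __name__ == "__main__":
    # A counting-based attacker alternates honest and dishonest behavior.
    t = InteractionTuple(eid="user_j", tc="game")
    for step, honest in enumerate([True, True, False, True, True, True]):
        record_transaction(t, honest, timestamp=step)
    print(t)
```

Under this reading, a single dishonest transaction resets CT to zero, which is why an attacker who alternates behavior cannot accumulate a long trustworthy run.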
Because the EIDj parameter is not relevant to the decision-making process, only the following parameters are reported in the dataset:

- CT
- CU
- LT
- TC
- TS

A user may have no historical data (tuples) about interactions with another user. In that case, it can obtain tuples from third parties who have previously interacted with the user in question. The trustworthiness of such third-party entities (recommenders) must also be evaluated, however, because they may mount attacks such as Ballot Stuffing (BS), Bad Mouthing (BM), and Random Opinion (RO). Changing the TS parameter for a number of rows in the dataset, according to a specific attack, yields different datasets useful for evaluating the trustworthiness of recommenders. Accordingly, the following datasets are also provided:

- BM_x%.txt - x is the percentage of unfair recommendations produced by a BM attack. It ranges from 10 to 50.
- BS_x%.txt - x is the percentage of unfair recommendations produced by a BS attack. It ranges from 10 to 50.
- RO_x%.txt - x is the percentage of unfair recommendations produced by an RO attack. It ranges from 10 to 50.
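As a rough illustration of how changing TS for a share of rows can produce such files, the Python sketch below alters a fraction of trust scores according to the usual meaning of each attack: bad mouthing reports unfairly negative scores, ballot stuffing reports unfairly positive ones, and random opinion reports a random score. The function name inject_attack, the random row selection, and the seed parameter are assumptions; the source does not state how the rows in the provided files were chosen.

```python
import random

LABELS = ("trustworthy", "untrustworthy")


def inject_attack(scores, attack, fraction, seed=0):
    """Return a copy of `scores` with a fraction of TS values altered.

    attack: "BM" (bad mouthing), "BS" (ballot stuffing) or "RO" (random opinion).
    fraction: share of rows to alter, e.g. 0.1 for a hypothetical 10% file.
    """
    if attack not in ("BM", "BS", "RO"):
        raise ValueError(f"unknown attack: {attack!r}")
    rng = random.Random(seed)               # seeded for reproducibility
    n_altered = round(len(scores) * fraction)
    altered_rows = rng.sample(range(len(scores)), n_altered)
    out = list(scores)                      # leave the input untouched
    for i in altered_rows:
        if attack == "BM":
            out[i] = "untrustworthy"        # unfairly negative recommendation
        elif attack == "BS":
            out[i] = "trustworthy"          # unfairly positive recommendation
        else:
            out[i] = rng.choice(LABELS)     # opinion unrelated to behavior
    return out


if __name__ == "__main__":
    clean = ["trustworthy"] * 10
    print(inject_attack(clean, "BM", 0.3))
```

Note that this sketch picks rows regardless of their current score, so a BM alteration of a row that is already untrustworthy changes nothing; a stricter variant would sample only from rows whose score the attack would actually flip.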

Has Missing Values?

No

Variables Table

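Variable Name	Role	Type	Description	Units	Missing Values
CT		Categorical	Counting Trust (discretized)		no
CU		Categorical	Counting Un-trust (discretized)		no
LT		Categorical	Last Time (discretized)		no
TC		Categorical	Transactions Context		no
TS		Categorical	Trust Score		no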

Additional Variable Information

1) CT {CT_range_1, CT_range_2, CT_range_3, CT_range_4}
2) CU {CU_range_1, CU_range_2, CU_range_3, CU_range_4}
3) LT {LT_range_1, LT_range_2, LT_range_3, LT_range_4}
4) TC {sport, game, ECommerce, holiday}
5) TS {trustworthy, untrustworthy}

The numerical attributes (CT, CU, LT) were discretized. Several of the associated papers describe in detail how these attributes were discretized.
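Because every attribute takes values from a small fixed set, a record can be validated and turned into integer codes before it is passed to a classifier or clustering method. The Python sketch below shows one way to do this. The names DOMAINS and encode_row are hypothetical, the record is assumed to be already split into five strings in the order CT, CU, LT, TC, TS (the on-disk layout of the .txt files is not described here), and treating range_1 to range_4 as ordered is also an assumption.

```python
# Value sets copied from the variable list; order within each list defines the code.
DOMAINS = {
    "CT": ["CT_range_1", "CT_range_2", "CT_range_3", "CT_range_4"],
    "CU": ["CU_range_1", "CU_range_2", "CU_range_3", "CU_range_4"],
    "LT": ["LT_range_1", "LT_range_2", "LT_range_3", "LT_range_4"],
    "TC": ["sport", "game", "ECommerce", "holiday"],
    "TS": ["trustworthy", "untrustworthy"],
}


def encode_row(values):
    """Validate one record and map each value to its index in DOMAINS."""
    if len(values) != len(DOMAINS):
        raise ValueError(f"expected {len(DOMAINS)} values, got {len(values)}")
    codes = {}
    for (name, domain), value in zip(DOMAINS.items(), values):
        if value not in domain:
            raise ValueError(f"{value!r} is not a valid {name} value")
        codes[name] = domain.index(value)
    return codes


if __name__ == "__main__":
    row = ["CT_range_2", "CU_range_1", "LT_range_4", "game", "untrustworthy"]
    print(encode_row(row))
    # prints {'CT': 1, 'CU': 0, 'LT': 3, 'TC': 1, 'TS': 1}
```

The integer codes for TC are labels only, since the transaction contexts have no natural order; a one-hot encoding may suit distance-based methods better.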

Dataset Files

File	Size
RO_50%.txt	19.2 KB
RO_40%.txt	19.1 KB
RO_30%.txt	19 KB
BM_50%.txt	18.9 KB
RO_20%.txt	18.9 KB

5 of 17 files shown.

Creators

Gianni D'Angelo

DOI

10.24432/C5FP4B

License

This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.