Nomao Data Set
Abstract: Nomao collects data about places (name, phone, location, etc.) from many sources.
Deduplication consists of detecting which data refer to the same place.
Each instance in the dataset compares two spots.
Data Set Characteristics: Univariate
Number of Instances: 34465
Area: Computer
Attribute Characteristics: Real
Number of Attributes: 120
Date Donated: 2012-07-04
Associated Tasks: Classification
Missing Values? Yes
Number of Web Hits: 61301
Source:
(a) Original owner of database (name / phone / snail address / email address)
Nomao / 00 33 5 62 48 33 90 / 1 avenue Jean Rieux, 31500 Toulouse / challenge '@' nomao.com
(b) Donor of database (name / phone / snail address / email address)
Laurent Candillier / - / 1 avenue Jean Rieux, 31500 Toulouse / laurent '@' nomao.com
Data Set Information:
The dataset was enriched during the Nomao Challenge [Web Link], organized along with the ALRA workshop (Active Learning in Real-world Applications) [Web Link], held at the ECML-PKDD 2012 conference.
Attribute Information:
120 attributes: 89 continuous, 31 nominal (including the attributes 'label' and 'id').
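As a rough illustration of how the continuous/nominal split and the missing values could be checked after download, the sketch below classifies the columns of a comma-separated file. Everything about the file layout is an assumption, not something stated on this page: that fields are comma-separated, that missing values are written as "?", and that 'id' is the first column and 'label' the last. The sample rows are a hypothetical toy example, not real Nomao records, and the names classify_columns and force_nominal are invented for this sketch.

```python
import csv
import io

MISSING = "?"  # assumption: missing values are encoded as "?"


def is_float(text):
    """Return True when the text parses as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def classify_columns(rows, force_nominal=(), missing=MISSING):
    """Return 'continuous' or 'nominal' for each column.

    A column counts as continuous when every observed (non-missing)
    value parses as a number, unless its index is in force_nominal.
    """
    kinds = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] != missing]
        numeric = bool(observed) and all(is_float(v) for v in observed)
        if numeric and j not in force_nominal:
            kinds.append("continuous")
        else:
            kinds.append("nominal")
    return kinds


# Hypothetical toy rows: id, two numeric scores, one category, label.
SAMPLE = """0#1,0.75,n,?,+1
0#2,?,s,0.10,-1
1#2,0.20,m,0.90,+1
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
last = len(rows[0]) - 1

# The label looks numeric ("+1"/"-1") but the description counts it as
# nominal, so it is forced to nominal here along with the id column.
kinds = classify_columns(rows, force_nominal={0, last})

# Count the missing entries in each column.
missing_per_column = [
    sum(1 for row in rows if row[j] == MISSING) for j in range(len(kinds))
]

print(kinds)
print(missing_per_column)
```

On the real file, the counts returned by such a check could be compared against the 89 continuous and 31 nominal attributes listed above. A mismatch would most likely point to a wrong assumption about the layout rather than to a problem in the data.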
Relevant Papers:
@inproceedings{nomaochallenge-ecml,
author={Laurent Candillier and Vincent Lemaire},
title={Design and Analysis of the Nomao Challenge - Active Learning in the Real-World},
booktitle={Proceedings of the ALRA : Active Learning in Real-world Applications, Workshop ECML-PKDD 2012, Friday, September 28, 2012, Bristol, UK},
year = 2012,
pages={to appear}
}
Citation Request:
Thanks to Nomao Labs for opening its data:
[Web Link]