Nomao Data Set
Abstract: Nomao collects data about places (name, phone, location...) from many sources. Deduplication consists of detecting which records refer to the same place. Each instance in the dataset compares two places.

(a) Original owner of database (name / phone / postal address / email address)
Nomao / 00 33 5 62 48 33 90 / 1 avenue Jean Rieux, 31500 Toulouse / challenge '@'
(b) Donor of database (name / phone / postal address / email address)
Laurent Candillier / - / 1 avenue Jean Rieux, 31500 Toulouse / laurent '@'

Data Set Information:

The dataset was enriched during the Nomao Challenge [Web Link], organized along with the ALRA workshop (Active Learning in Real-world Applications) [Web Link], held at the ECML-PKDD 2012 conference.

Attribute Information:

120 attributes: 89 continuous, 31 nominal (including the attributes 'label' and 'id').
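
To illustrate what one instance comparing two places might look like, here is a minimal Python sketch that derives one continuous and one nominal comparison feature from a pair of place records. The record fields, the feature names (name_similarity, phone_match), the nominal values (same / different / missing) and the sample records are all hypothetical and chosen for illustration only; they are not the actual Nomao attributes, which this description does not list.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Continuous feature (hypothetical): string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def phone_match(a, b):
    """Nominal feature (hypothetical): 'missing', 'same' or 'different'."""
    if not a or not b:
        return "missing"
    # Compare digits only, so formatting differences are ignored.
    digits_a = "".join(ch for ch in a if ch.isdigit())
    digits_b = "".join(ch for ch in b if ch.isdigit())
    return "same" if digits_a == digits_b else "different"


def compare_places(p, q):
    """Build one comparison instance from two place records."""
    return {
        "name_similarity": name_similarity(p["name"], q["name"]),
        "phone_match": phone_match(p.get("phone"), q.get("phone")),
    }


# Made-up records, for illustration only.
place_a = {"name": "Cafe de la Gare", "phone": "05 61 00 00 00"}
place_b = {"name": "cafe de la gare toulouse", "phone": "0561000000"}

instance = compare_places(place_a, place_b)
print(instance)
```

In the real dataset each instance carries many such comparison attributes plus 'label' and 'id'; the sketch only shows how continuous and nominal attributes can coexist in one instance.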

Relevant Papers:

author={Laurent Candillier and Vincent Lemaire},
title={Design and Analysis of the Nomao Challenge - Active Learning in the Real-World},
booktitle={Proceedings of the ALRA: Active Learning in Real-world Applications, Workshop ECML-PKDD 2012, Friday, September 28, 2012, Bristol, UK},
year={2012},
pages={to appear}

Citation Request:

Thanks to Nomao Labs for opening its data:
[Web Link]

