Center for Machine Learning and Intelligent Systems

Nomao Data Set

Abstract: Nomao collects data about places (name, phone, location...) from many sources. Deduplication consists of detecting which records refer to the same place. Each instance in the dataset compares two spots.
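
To make the idea of an instance that compares two spots concrete, here is a minimal sketch of how one comparison record could be built from two place records. The field names (`name`, `phone`), the function `compare_places`, and the two similarity features are hypothetical illustrations; they are not the attributes actually used in the Nomao data.

```python
from difflib import SequenceMatcher


def _digits(text):
    """Keep only the digits of a phone number so formatting differences are ignored."""
    return "".join(ch for ch in text if ch.isdigit())


def compare_places(a, b):
    """Build one comparison instance from two place records (plain dicts).

    Hypothetical features, for illustration only:
      name_similarity: ratio in [0, 1], or None if a name is missing
      phone_match:     True/False, or None if a phone number is missing
    """
    name_a, name_b = a.get("name"), b.get("name")
    phone_a, phone_b = a.get("phone"), b.get("phone")

    name_similarity = None
    if name_a and name_b:
        name_similarity = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()

    phone_match = None
    if phone_a and phone_b:
        phone_match = _digits(phone_a) == _digits(phone_b)

    return {"name_similarity": name_similarity, "phone_match": phone_match}


# Two made-up records describing what may be the same place.
spot_1 = {"name": "Cafe Bleu", "phone": "05 62 00 00 00"}
spot_2 = {"name": "cafe bleu", "phone": "0562000000"}
instance = compare_places(spot_1, spot_2)
```

A classifier for deduplication would then take many such comparison instances, each with a label saying whether the two spots are the same place. A feature is left as `None` when one of the sources lacks the field, which is one way missing values can arise in pairwise data.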

Data Set Characteristics: Univariate
Number of Instances: 34465
Area: Computer
Attribute Characteristics: Real
Number of Attributes: 120
Date Donated: 2012-07-04
Associated Tasks: Classification
Missing Values? Yes
Number of Web Hits: 13677


Source:

(a) Original owner of database (name / phone / snail address / email address)
Nomao / 00 33 5 62 48 33 90 / 1 avenue Jean Rieux, 31500 Toulouse / challenge '@' nomao.com
(b) Donor of database (name / phone / snail address / email address)
Laurent Candillier / - / 1 avenue Jean Rieux, 31500 Toulouse / laurent '@' nomao.com


Data Set Information:

The dataset was enriched during the Nomao Challenge [Web Link], organized alongside the ALRA workshop (Active Learning in Real-world Applications) [Web Link], held at the ECML-PKDD 2012 conference.


Attribute Information:

120 attributes: 89 continuous, 31 nominal (including the attributes 'label' and 'id').
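
Because the attributes mix continuous and nominal types and some values are missing, a loader needs to treat each column according to its type. The sketch below shows one way to do that with the standard library. It assumes comma-separated rows and a `?` marker for missing values; both are assumptions about the file layout, and the five-column schema and sample rows are toy data, not the real 120-attribute schema.

```python
import csv
import io

MISSING = "?"  # assumption: marker used for missing values


def parse_row(fields, schema):
    """Convert one row of strings according to a per-column schema.

    Continuous values become floats, nominal values stay strings,
    and missing values become None regardless of type.
    """
    parsed = []
    for text, kind in zip(fields, schema):
        if text == MISSING:
            parsed.append(None)
        elif kind == "continuous":
            parsed.append(float(text))
        else:
            parsed.append(text)
    return parsed


# Toy schema: an id, two continuous attributes, one nominal attribute, a label.
TOY_SCHEMA = ["nominal", "continuous", "continuous", "nominal", "nominal"]
TOY_DATA = "a1,0.75,?,n,+1\na2,?,0.10,s,-1\n"

rows = [parse_row(fields, TOY_SCHEMA) for fields in csv.reader(io.StringIO(TOY_DATA))]

# Count missing values per column.
missing_per_column = [
    sum(row[j] is None for row in rows) for j in range(len(TOY_SCHEMA))
]
```

The schema is passed explicitly rather than inferred, since a nominal column such as 'label' may hold values that look numeric and would otherwise be misread as continuous.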


Relevant Papers:

@inproceedings{nomaochallenge-ecml,
author={Laurent Candillier and Vincent Lemaire},
title={Design and Analysis of the Nomao Challenge - Active Learning in the Real-World},
booktitle={Proceedings of the ALRA : Active Learning in Real-world Applications, Workshop ECML-PKDD 2012, Friday, September 28, 2012, Bristol, UK},
year = 2012,
pages={to appear}
}



Citation Request:

Thanks to Nomao Labs for opening its data:
[Web Link]

