CNAE-9

Donated on 8/2/2012

This data set contains 1080 documents of free-text business descriptions of Brazilian companies, categorized into a subset of 9 categories.

Dataset Characteristics

Multivariate, Text

Subject Area

Business

Associated Tasks

Classification

Feature Type

Integer

# Instances

1080

# Features

856 word-frequency features, plus 1 class attribute (857 attributes in total)

Dataset Information

Additional Information

This data set contains 1080 documents of free-text business descriptions of Brazilian companies, categorized into a subset of 9 categories cataloged in a table called the National Classification of Economic Activities (Classificação Nacional de Atividades Econômicas - CNAE). The original texts were pre-processed to obtain the current data set: first, only letters were kept, and then prepositions were removed from the texts. Next, the words were transformed to their canonical form. Finally, each document was represented as a vector in which the weight of each word is its frequency in the document. The data set is highly sparse: 99.22% of the matrix entries are zeros.
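The pre-processing steps described above can be sketched roughly as follows. This is a minimal illustration, not the creators' pipeline: the preposition list is a small hypothetical sample, the canonical-form (lemmatization) step is omitted because the source does not say how it was done, and the two sample documents are invented for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately small preposition list; the list actually used
# to build the data set is not given in the description.
PREPOSITIONS = {"de", "em", "para", "por", "com", "sem", "sob", "sobre"}


def tokenize(text):
    """Keep only runs of letters, lowercase them, and drop prepositions."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if w not in PREPOSITIONS]


def term_frequency_matrix(documents):
    """Return (vocabulary, rows); rows[i][j] counts vocabulary[j] in document i."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({w for doc in tokenized for w in doc})
    index = {w: j for j, w in enumerate(vocabulary)}
    rows = []
    for doc in tokenized:
        row = [0] * len(vocabulary)
        for word, count in Counter(doc).items():
            row[index[word]] = count
        rows.append(row)
    return vocabulary, rows


def sparsity(rows):
    """Fraction of matrix entries that are zero."""
    total = sum(len(r) for r in rows)
    zeros = sum(1 for r in rows for v in r if v == 0)
    return zeros / total if total else 0.0


# Invented sample documents, for illustration only.
docs = [
    "Comércio de roupas e calçados",
    "Fabricação de móveis; comércio de móveis",
]
vocab, matrix = term_frequency_matrix(docs)
print(vocab)
print(matrix)
print(f"sparsity: {sparsity(matrix):.2%}")
```

Applying `sparsity` to the full 1080-row matrix is the kind of computation that would yield the 99.22% figure quoted above.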

Has Missing Values?

No

Variables Table

Variable Name | Role | Type | Description | Units | Missing Values

Showing the first 10 of 857 variables. Only the Missing Values column is populated, and it reads "no" for every row shown.

Additional Variable Information

The data set has 857 attributes: 1 attribute holding the class of the instance and 856 holding word frequencies.

1. category: range 1 - 9 (integer)
2 - 857. word frequency (integer)
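Given the attribute layout above, a row can be split into a class label and 856 word counts. The sketch below assumes that `CNAE-9.data` is a comma-separated text file with one instance per line and the class in the first column; the delimiter is an assumption, and `parse_rows` is a hypothetical helper name. The demonstration uses two synthetic rows rather than real data.

```python
import csv
import io

N_WORD_FEATURES = 856  # attributes 2 - 857 in the description above


def parse_rows(lines, delimiter=","):
    """Split each row into (class label, word-frequency vector).

    `lines` is any iterable of text lines, e.g. an open file object.
    The comma delimiter is an assumption about the file format.
    """
    labels, features = [], []
    for row in csv.reader(lines, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        values = [int(v) for v in row]
        if len(values) != N_WORD_FEATURES + 1:
            raise ValueError(f"expected 857 values, got {len(values)}")
        if not 1 <= values[0] <= 9:
            raise ValueError(f"class label out of range: {values[0]}")
        labels.append(values[0])
        features.append(values[1:])
    return labels, features


# Two synthetic rows standing in for real data.
sample = io.StringIO(
    ",".join(["3"] + ["0"] * 855 + ["2"]) + "\n"
    + ",".join(["9"] + ["1"] + ["0"] * 855) + "\n"
)
labels, features = parse_rows(sample)
print(labels, len(features[0]))
```

For the real file, the same function could be called as `parse_rows(open("CNAE-9.data", newline=""))`, assuming the file sits in the working directory.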

Dataset Files

File            Size
CNAE-9.data     1.8 MB
CNAE-9.names    2.2 KB

Creators

Patrick Ciarelli

Elias Oliveira
