Spambase Data Set
Below are papers that cite this data set, with context shown.
Papers were automatically harvested and associated with this data set, in collaboration with Rexa.info.
Don R. Hush and Clint Scovel and Ingo Steinwart. Stability of Unstable Learning Algorithms. Modeling, Algorithms and Informatics Group, CCS-3, Los Alamos National Laboratory. 2003.
are synthetically generated according to Fukunaga's so-called I-4I and I-Λ distributions (Fukunaga, 1990), and the third is the Spambase data set from the UCI repository (Blake & Merz, 1998). For the synthetic data sets we set d = 8 and generate samples from $\mathbb{R}^d \times \{-1, 1\}$ according to the I-4I and I-Λ distributions. For both distributions the
Yongmei Wang and Ian H. Witten. Modeling for Optimal Probability Prediction. ICML. 2002.
runs of tenfold cross-validation. According to both the negative log-likelihood and the classification rate, the estimator New provides either the best or nearly the best results for six of the datasets. For the other two (Spambase and WDBC), its results are intermediate and comparable with other estimators. Along with this, it also reduces the model dimensionality, which the MLE can never do.
C. Titus Brown and Harry W. Bullen and Sean P. Kelly and Robert K. Xiao and Steven G. Satterfield and John G. Hagedorn and Judith E. Devaney. Visualization and Data Mining in an 3D Immersive Environment: Summer Project 2003.
was to determine what attributes were most meaningful in determining what the block contained. The most important attributes turned out to be the size and shape of the block. Figure 4.9: Page block data set in museum environment. 4.10 Spambase. The Spambase data set was analysed by Sean Kelly. This dataset contains roughly 4000 instances of 58 attributes each, representing email messages. One
Christos Dimitrakakis and Samy Bengio. Online Policy Adaptation for Ensemble Classifiers. IDIAP.
2.72% 3.10% 2.80% 2.69% 8.33% 6.48% 7.75% 7.41% 56.1% 61.9% 68.1% 48.3% Table 1: Classification error on the UCI breast, forest, heart, ionosphere, letter, optdigits, pendigits, spambase and vowel datasets using 32 experts. 7 times out of 9 respectively. For each dataset we have also calculated the cumulative margin distribution resulting from equation (1). For the RL mixture there was a constant
