Abalone Data Set
Below are papers that cite this data set, with context shown.
Papers were automatically harvested and associated with this data set, in collaboration with Rexa.info.
Ilhan Uysal and H. Altay Guvenir. Instance-Based Regression by Partitioning Feature Projections. Applied. 2004.
are shown in Table 2. In order to save space, they are coded with two letters (e.g., AB for Abalone).
Table 2. Datasets.
Dataset        Code  Instances  Features (C + N)  Missing values  Target feature
Abalone [6]    AB    4177       8 (7 + 1)         None            Rings
Airport [21]   AI    135        4 (4 + 0)         None            Tons of mail
Auto [6]       AU    398        7 (6 + 1)         6               Gas
Jianbin Tan and David L. Dowe. MML Inference of Decision Graphs with Multi-way Joins and Dynamic Attributes. Australian Conference on Artificial Intelligence. 2003.
5.2 Discussions of above test results. Tables 2 and 3 clearly show the decision graph with dynamic attributes to always be either outright first or (sometimes) equal first. When testing on the data sets with disjunctions (like abalone, scale, tic-tac-toe and XD6), decision graph with dynamic attributes has a much lower error rate. On other data sets, it returns results not worse than those from
Edward Snelson and Carl Edward Rasmussen and Zoubin Ghahramani. Warped Gaussian Processes. NIPS. 2003.
Dataset    D     t_min    t_max     N_train  N_test
creep      30    18 MPa   530 MPa   800      1266
abalone    8     1 yr     29 yrs    1000     3177
ailerons   4010

Dataset    Model       Absolute error  Squared error
creep      GP          16.4            654            4.46
creep      GP + log    15.6            587            4.24
creep      warped GP   15.0            554            4.19
abalone    GP          1.53            4.79           2.19
abalone    GP + log    1.48            4.62           2.01
abalone    warped GP   1.47            4.63           1.96
ailerons   GP          1.23 ×
Anton Schwaighofer and Volker Tresp. Transductive and Inductive Methods for Approximate Gaussian Process Regression. NIPS. 2002.
costs for predictions show the cost per test point. 4 Experimental Comparison. In this section we will present a comparison of the different approximation methods discussed in Sec. 3. In the ABALONE data set [1] with 4177 examples, the goal is to predict the age of Abalones based on 8 inputs. The KIN8NM data set 2 represents the forward dynamics of an 8 link all-revolute robot arm, based on 8192
Alexander G. Gray and Bernd Fischer and Johann Schumann and Wray L. Buntine. Automatic Derivation of Statistical Algorithms: The EM Family and Beyond. NIPS. 2002.
extension of our running example, integrating several features, yields a Gaussian Bayes classifier model # # . # # has been successfully tested on various standard benchmarks [1], e.g., the Abalone dataset. Currently, the number of expected classes has to be given in advance. Mixture models and EM. A wide range of # Gaussian mixture models can be handled by AUTOBAYES, ranging from the simple 1D ( # #
Christopher K I Williams and Carl Edward Rasmussen and Anton Schwaighofer and Volker Tresp. Observations on the Nyström Method for Gaussian Process Prediction. Division of Informatics Gatsby Computational Neuroscience Unit University of Edinburgh University College London. 2002.
the drop at index 100 for the Nyström case, due to the rank-100 approximation. The horizontal line in the dashed plot is at log_e σ². The Nyström method was originally tested on the UCI abalone data set using σ² = 0.05. Analysis shows that for the kernel parameters used, 112 eigenvalues in K were larger than σ². This is in good agreement with the experimental results that values of m of 250
Marc Sebban and Richard Nock and Stéphane Lallich. Stopping Criterion for Boosting-Based Data Reduction Techniques: from Binary to Multiclass Problem. Journal of Machine Learning Research, 3. 2002.
In order to assess the relevance of our multiclass statistical test, we
DATASET       # CLASSES  LS     # FEATURES
WAVES         3          500    21
ABALONE       3          1000   8
GLASS         6          214    9
BALANCE       3          625    4
IRIS          3          150    4
LED           10         500    7
LED+17        10         500    24
DERMATOLOGY   6          366    34
Table 4: Multiclass classification problems.
Shai Fine and Katya Scheinberg. Incremental Learning and Selective Sampling via Parametric Optimization Framework for SVM. NIPS. 2001.
bias is most likely far away from the global solution, and as such, the results presented here should be regarded as a lower bound. We examined performances on a moderate size problem, the Abalone data set from the UCI Repository [2]. We fed the training algorithm with increasing subsets up to the whole set (of size 4177). The gender encoding (male/female/infant) was mapped into
Matthew Mullin and Rahul Sukthankar. Complete Cross-Validation for Nearest Neighbor Classifiers. ICML. 2000.
from the UCI repository (Blake & Merz, 1998). Synthetic consists of 4 Gaussians with equal variance and significant overlap, and Bayes Error ≈ 0.504 [6]. Abalone is a 29 class problem, however many of the classes have only very few instances. Abalone2 and Abalone3 are two- and three-class
[6] The Bayes error for the Synthetic data set was estimated using a Monte Carlo simulation with 60000 samples.
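The Monte Carlo estimate of a Bayes error mentioned in this excerpt can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not taken from the paper: the Gaussians are one-dimensional, the classes have equal priors, and the means and standard deviation are hypothetical. Under equal priors and equal variance, the Bayes-optimal rule is to pick the class with the nearest mean, so the error can be estimated by sampling and counting mistakes.

```python
import random


def bayes_error_mc(means, sigma, n_samples, seed=0):
    """Monte Carlo estimate of the Bayes error for equal-prior,
    equal-variance 1-D Gaussian classes (an illustrative assumption)."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_samples):
        # Draw a class uniformly, then a point from that class's Gaussian.
        true_class = rng.randrange(len(means))
        x = rng.gauss(means[true_class], sigma)
        # With equal priors and equal variance, the Bayes-optimal
        # decision is the class whose mean is closest to x.
        predicted = min(range(len(means)), key=lambda k: abs(x - means[k]))
        if predicted != true_class:
            errors += 1
    return errors / n_samples


# Hypothetical setup: four heavily overlapping classes.
overlapping = bayes_error_mc([0.0, 1.0, 2.0, 3.0], sigma=1.0, n_samples=20000)
# Hypothetical setup: four classes that essentially never overlap.
separated = bayes_error_mc([0.0, 100.0, 200.0, 300.0], sigma=1.0, n_samples=2000)
```

The estimate tightens as `n_samples` grows; the 60000 samples quoted in the footnote would give a standard error of roughly 0.002 for an error rate near 0.5.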
Nir Friedman and Iftach Nachman. Gaussian Process Networks. UAI. 2000.
contains 506 samples with 14 attributes. 300 samples were used as a test set.
- Abalone data set: a data set of physical measurements of abalones. The data set contains 4177 samples with 9 attributes. 300 samples were used as a test set.
- Glass identification data set: a data describing the
Bernhard Pfahringer and Hilan Bensusan and Christophe G. Giraud-Carrier. Meta-Learning by Landmarking Various Learning Algorithms. ICML. 2000.
had from 5 to 12 attributes and were classified by simple parity, DNF and CNF rules as well as at random. The 18 UCI data sets were: mushrooms, abalone, crx, sat, acetylation, titanic, waveform, yeast, car, chess (king-rook-vs-king), led7, led24, tic-tac-toe, monk1, monk2, monk3, satimage, quisclas. The performance of every
Iztok Savnik and Peter A. Flach. Discovery of multivalued dependencies from relations. Intell. Data Anal, 4. 2000.
which were used in the experiments are available at UCI Machine learning repository [10]. In the case of the datasets Car, Bupa and Abalone we use randomly selected subsets of the original datasets. For each experiment we specify the name of the relation (dataset) r(R), the number of tuples in relation |r|, the
Tapio Elomaa and Juho Rousu. General and Efficient Multisplitting of Numerical Attributes. Machine Learning, 36. 1999.
values that they produce. In order to assess how different multisplitting strategies
Table 1. Characteristic figures of the thirty test domains.
Data set       Attributes (Nomin. / Int. / Real)  V        B        Examples  Classes
Abalone        1 7                                863.7    826.4    4,177     29
Adult          8 6                                3,673.7  1,668.2  32,561    2
Annealing      10 4 6                             27.5     17.7     798       5
Australian     9 6                                188.2    129.7    690       2
Auto insuran.  10 8 7
Christopher J. Merz. Using Correspondence Analysis to Combine Classifiers. Machine Learning, 36. 1999.
number of examples; number of attributes; number of numeric attributes; number of classes; and whether missing values exist.
Data Set     Exs.   Atts.  Num.  Class  Missing
abalone      4177   8      7     3      no
balance      625    4      4     3      no
breast       286    9      4     2      yes
credit       690    15     6     2      yes
dementia     118    26     26    3      no
glass        214    10     10    7      no
heart        303    13     6     2      yes
ionosphere   351
Kai Ming Ting and Ian H. Witten. Issues in Stacked Generalization. J. Artif. Intell. Res. (JAIR), 10. 1999.
can easily be interpreted. Examples of the combination weights it derives (for the probability-based model ~ M 0 ) appear in Table 5 for the Horse, Credit, Splice, Abalone, Waveform, Led24 and Vowel datasets. The weights indicate the relative importance of the level-0 generalizers for each prediction class. For example, in the Splice dataset (in Table 5(b)), NB is the dominant generalizer for
Marko Robnik-Sikonja and Igor Kononenko. Pruning Regression Trees with MDL. ECAI. 1998.
each consisting of 10 attributes (2, 3 or 4 important, the rest are random), and containing 1000 examples. UCI datasets used were: Abalone: predicting the age of the abalone, 1 nominal and 7 continuous attributes, 4177 instances. Auto-mpg: city-cycle fuel consumption, 1 nominal, 6 continuous attributes, 398 instances.
Khaled A. Alsabti and Sanjay Ranka and Vineet Singh. CLOUDS: A Decision Tree Classifier for Large Datasets. KDD. 1998.
are taken from the STATLOG project, which has been a widely used benchmark in classification. 3 The "Abalone," "Waveform," and "Isolet" datasets can be found in [13]. The "Synth1" and "Synth2" datasets have been used in [15, 17] for evaluating SLIQ and SPRINT; they have been referred to as the "Function2" dataset. The main parameter of our
Christopher J. Merz. Combining Classifiers Using Correspondence Analysis. NIPS. 1997.
the direct comparison of SCANN with the SBP and SBayes, SCANN posts 5 and 4 significant wins, respectively, and no losses. The most dramatic improvement of the combiners over PV came in the abalone data set. A closer look at the result revealed that 7 of the 8 learned models were very poor classifiers with error rates around 80 percent. This empirically demonstrates PV's known sensitivity to learned
C. Titus Brown and Harry W. Bullen and Sean P. Kelly and Robert K. Xiao and Steven G. Satterfield and John G. Hagedorn and Judith E. Devaney. Visualization and Data Mining in a 3D Immersive Environment: Summer Project 2003.
an overview of the entire museum layout.
4.13 abalone
The abalone data set was analysed by Sean Kelly. This dataset contains roughly 2000 instances of measurements of abalone shellfish. The biggest issue in visualizing it was the lack of distinct target attribute for
Rong-En Fan and P.-H. Chen and C.-J. Lin. Working Set Selection Using the Second Order Information for Training SVM. Department of Computer Science and Information Engineering National Taiwan University.
incorporates the shrinking technique.
[Figure: ratio per data set for time (40M cache), time (100K cache) and total #iter. Data sets: image, splice, tree, a1a, australian, breastcancer, diabetes, fourclass, german.numer, w1a, abalone, cadata, cpusmall, space_ga, mg.]
Johannes Fürnkranz. Round Robin Rule Learning. Austrian Research Institute for Artificial Intelligence.
[9] Classification time is only included in the runs that had a separate test set. In general, it can be expected to be more expensive for R3. See Section 7 for a brief discussion of this issue.
dataset C5 b vs. C5 R3 vs. unord vs. order
abalone     23.34   10.81   193.0    4.51   5.73
covertype
letter      73.37   6.64    1250.0   0.51   1.14
sat         27.86   9.10    143.0    0.85   1.51
shuttle     35.98   5.67    277.0
Christian Borgelt and Rudolf Kruse. Speeding Up Fuzzy Clustering with Neural Network Techniques. Research Group Neural Networks and Fuzzy Systems, Dept. of Knowledge Processing and Language Engineering, School of Computer Science, Otto-von-Guericke-University of Magdeburg.
             FCM            a.p. GK        GK
dataset      exp.   mom.    exp.   mom.    exp.   mom.
abalone 3    1.50   0.30    1.40   0.20    1.90   0.65
abalone 6    1.60   0.50    1.40   0.25    1.90   0.70
breast 2     1.20   0.05    1.30   0.10    1.80   0.55
iris 3       1.40   0.15    1.20   0.05    1.80   0.50
wine 3       1.40
Miguel Moreira and Alain Hertz and Eddy Mayoraz. Data binarization by discriminant elimination. Proceedings of the ICML99 Workshop: From Machine Learning to.
The bottom row shows the number of wins, including ties.
            Initial    Final set size
            set size   n.conflicts       entropy           random choice
data set               local   global    local   global    avg.           min.
abalone     5779       192     177       724     220       282.6 ±21.6    240
allhyper    440        21      33        36      34        30.4 ±4.3      22
allhypo     548        20      32        43      48        35.2 ±6.2      17
anneal      134        31      31        30      30        30.9 ±2.1      28
Johannes Fürnkranz. Pairwise Classification as an Ensemble Technique. Austrian Research Institute for Artificial Intelligence.
with c5.0 as a base learner. For these, we give both the absolute error rate and the performance ratio relative to the base learner c5.0. The last line shows the geometric average of these ratios.
           c5.0     round robin       boosting          both
dataset    error    error    ratio    error    ratio    error    ratio
abalone    78.48    75.08    0.957    77.88    0.992    74.67    0.951
car        7.58     5.84     0.771    3.82     0.504    1.85     0.244
glass      35.05    24.77    0.707    27.57    0.787    22.90    0.653
image      3.20     2.90     0.905    1.60
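The two quantities this excerpt describes, the performance ratio relative to the base learner and the geometric average of those ratios, can be sketched as follows. The error rates are copied from the three complete rows of the excerpt (abalone, car, glass); the function names are hypothetical, and because the excerpt is truncated, the geometric mean computed here covers only these three rows and is not the figure on the paper's last line.

```python
import math


def performance_ratio(error, base_error):
    """Error rate of a method divided by the base learner's error rate."""
    return error / base_error


def geometric_mean(values):
    """Geometric mean, computed via logs for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Error rates (percent) taken from the complete rows of the excerpt.
c50_error = {"abalone": 78.48, "car": 7.58, "glass": 35.05}
round_robin_error = {"abalone": 75.08, "car": 5.84, "glass": 24.77}

ratios = {
    name: performance_ratio(round_robin_error[name], c50_error[name])
    for name in c50_error
}
# Geometric average over these three data sets only.
round_robin_geo_mean = geometric_mean(list(ratios.values()))
```

A geometric rather than arithmetic mean is the natural choice for ratios, since a ratio of 0.5 on one data set and 2.0 on another then average to 1.0 (no net change).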
Edward Snelson and Carl Edward Rasmussen and Zoubin Ghahramani. Draft version; accepted for NIPS*03 Warped Gaussian Processes. Gatsby Computational Neuroscience Unit University College London.
predict the age of abalone from various physical inputs [9]. ailerons is a simulated control problem, with the aim to predict the control action on the ailerons of an F-16 aircraft [10, 11]. For datasets creep and abalone, which consist of positive observations only, standard practice may be to model the log of the data with a GP. So for these datasets we have compared three models: a GP directly
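The idea of transforming strictly positive targets before fitting a regression model (the "GP + log" baseline reported for creep and abalone) can be illustrated with a small sketch. To keep it dependency-free, the Gaussian process is swapped for ordinary least squares on a single input, which is a stand-in and not the method of the paper; the data values and function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_log_target(xs, targets):
    """Fit the regression on log(targets); targets must be positive."""
    if any(t <= 0 for t in targets):
        raise ValueError("log transform requires strictly positive targets")
    return fit_line(xs, [math.log(t) for t in targets])


def predict(model, x):
    """Predict in log space, then map back to the original scale."""
    slope, intercept = model
    return math.exp(intercept + slope * x)


# Synthetic, noise-free data generated as exp(0.5 + 0.3 * x).
xs = [0.0, 1.0, 2.0, 3.0]
targets = [math.exp(0.5 + 0.3 * x) for x in xs]
model = fit_log_target(xs, targets)
```

Because predictions pass through `exp`, they are always positive, which matches the constraint on targets such as abalone age. A warped GP, as the paper's title suggests, learns the transformation instead of fixing it to the logarithm.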
Sally Jo Cunningham. Dataset cataloging metadata for machine learning applications and research. Department of Computer Science University of Waikato.
At UCI, for example, the documentation records range from the highly specific (for example, the abalone dataset record in Figure 1a) to the nearly nonexistent (the "undocumented databases" directory at UCI, as typified by the economic sanctions data description in Figure 1b). This variability is not
Bernhard Pfahringer and Hilan Bensusan. Tell me who can learn you and I can tell you who you are: Landmarking Various Learning Algorithms. Austrian Research Institute for Artificial Intelligence.
had from 5 to 12 attributes and were classified by simple parity, dnf and cnf rules as well as at random. The 18 UCI datasets were: mushrooms, abalone, crx, sat, acetylation, titanic, waveform, yeast, car, chess (king-rook-vs-king), led7, led24, tic-tac-toe, monk1, monk2, monk3, satimage, quisclas. The performance of every
Efficiently Updating and Tracking the Dominant Kernel Eigenspace. Katholieke Universiteit Leuven, Department of Electrical Engineering, ESAT-SCD-SISTA.
true eigenvectors are approximated whilst tracking (i.e., up-/downdating and downsizing) a kernel matrix. We consider a representative example of a kernel matrix, based upon the known Abalone benchmark dataset [14] with n = 3000 training instances having dimension p = 7. We consider the common choice of the radial basis kernel function (with h the kernel width parameter, fixed at 18.57): k(x_i, x_j) =
Luc Hoegaerts and J. A. K. Suykens and J. Vandewalle and Bart De Moor. Subset Based Least Squares Subspace Regression in RKHS. Katholieke Universiteit Leuven, Department of Electrical Engineering, ESAT-SCD-SISTA.
the methods relative to the standard deviations. Apart from this non-difference, we must remark that the mean of the KPCA based model is approximately equal to the mean of the KPLS1 model. The Abalone data set is another benchmark from the same UCI repository [33], consisting of 4177 cases, having p = 7 input variables. The aim is to predict the age of abalone fish from physical measurements. We picked at
