Auto MPG Data Set
Below are papers that cite this data set, with context shown.
Papers were automatically harvested and associated with this data set, in collaboration with Rexa.info.
Dan Pelleg. Scalable and Practical Probability Density Estimators for Scientific Anomaly Detection. School of Computer Science Carnegie Mellon University. 2004.
would be to first try and estimate # (say, using a model with spherical Gaussians) and use the estimate to set the rectangle tails. Experiments on real-life data were done on the "mpg" and "census" datasets from the UCI repository (Blake & Merz, 1998). The "mpg" data has about 400 records with 7 continuous attributes. Running on this data with the number of components set to three, we get the
Qingping Tao. Making Efficient Learning Algorithms with Exponentially Many Features. Ph.D. dissertation, The Graduate College, University of Nebraska. 2004.
(T_0 = n^2 and T_s = 10n^2). M - Metropolis, G - Gibbs, MG - Metropolized Gibbs, PT - Parallel Tempering, BF - Brute Force.
Data Sets   iris        car         breast cancer   voting      auto         annealing
n           4           6           9               16          25           38
M           5.3 ± 2.1   1.7 ± 0.8   31.5 ± 5.0      5.0 ± 2.1   12.8 ± 7.5   1.0 ± 0.7
G           6.7 ± 3.8   1.9 ± 0.8   30.9 ± 5.5      5.0 ± 2.4   15.6 ± 7.8   0.6 ± 0.5
MG          6.0 ± 1.7
Christopher R. Palmer and Christos Faloutsos. Electricity Based External Similarity of Categorical Attributes. PAKDD. 2003.
house servant as an outlier, combined Clerical with Other service and combined Sales and Technical support. The final pair of clusterings in parts (g) and (h) show the makes of cars in the Auto data set. The comparison here is more subtle, but the REP clustering has a more natural looking structure and three very distinct clusters for the luxury cars, the family cars and the imports. D fr;P on the
Jinyan Li and Kotagiri Ramamohanarao and Guozhu Dong. Combining the Strength of Pattern Frequency and Distance for Classification. PAKDD. 2001.
The accuracy gaps can reach up to 14.93% (in sonar); half of them are around 6.5%. -- Our method is not always better than C5.0. We lose on four data sets, particularly on auto. -- On average over the 30 data sets, our accuracy is 2.18% higher than C5.0, and 7.27% higher than 3-NN. Table 2. Accuracy comparison among our algorithm, C5.0, and k-NN.
Thomas Melluish and Craig Saunders and Ilia Nouretdinov and Volodya Vovk. The typicalness framework: a comparison with the Bayesian approach. Department of Computer Science. 2001.
[Figure 1: Bayesian RR and RRCM on data generated with w ~ N(0, 1). Axis labels: % confidence, mean tolerance region width; curves: a=1, a=1000, a=10000.] We also experimented on two benchmark datasets, the auto mpg dataset and the Boston housing dataset. For each experiment, we show the percentage confidence against the percentage of labels outside the tolerance region predicted for that
Wai Lam and Kin Keung and Charles X. Ling. PR 1527. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong. 2001.
and their codes. Data set (Code): Automobile (Ab), Auto Mpg (Am), Audiology (Au), Balance-scale (Ba), Breast-cancer-w (Bc), Car (Ca), Credit screening (Cs), Ecoli (Ec), Glass1 (Gl), Hepati (He), Ionosphere (Io), Iris (Ir), Letter (Le), Liver (Li), Monk-1 (M1), Monk-2 (M2)
Dan Pelleg and Andrew W. Moore. Mixtures of Rectangles: Interpretable Soft Clustering. ICML. 2001.
form of a rectangle (in this case a line-segment) with tails. An M-dimensional tailed rectangle is simply a product of these. Experiments on real-life data were done on the "mpg" and "census" datasets from the UCI repository (Blake & Merz, 1998). The "mpg" data has about 400 records with 7 continuous attributes. Running on this data with the number of components set to three, we get the
Zhi-Hua Zhou and Shifu Chen and Zhaoqian Chen. A Statistics Based Approach for Extracting Priority Rules from Trained Neural Networks. IJCNN (3). 2000.
                         80     2    19   19   0
UCI-Iris Plant           150    3    4    0    4
UCI-Lung Cancer          27     3    56   56   0
IS Fault Diagnosis       352    10   22   8    14
Table 3. Comparison of STARE and Craven & Shavlik's approach
                         Rule number      Test set fidelity
Data set                 CS94    STARE    CS94     STARE
UCI-1985 Auto Imports    27      36       93.1%    100%
UCI-Credit Screening     32      38       96.6%    98.3%
UCI-Hepatitis            11      12       92.5%    100%
UCI-Iris Plant           10      13       92.0%    97.3%
UCI-Lung Cancer          7       8        92.6%    100%
IS
Mauro Birattari and Gianluca Bontempi and Hugues Bersini. Lazy Learning Meets the Recursive Least Squares Algorithm. NIPS. 1998.
classical mean square error criterion: $\hat{y}_q = x_q' \hat{\beta}(\hat{k})$, with $\hat{k} = \arg\min_k \mathrm{MSE}(k) = \arg\min_k \frac{\sum_{i=1}^{k} \omega_i \left(e_i^{cv}(k)\right)^2}{\sum_{i=1}^{k} \omega_i}$, (9)
Table 1: A summary of the characteristics of the datasets considered.
Dataset                Housing   Cpu   Prices   Mpg   Servo   Ozone
Number of examples     506       209   159      392   167     330
Number of regressors   13        6     16       7     8       8
where $\omega_i$ are weights that can be conveniently used to discount
D. Greig and Hava T. Siegelmann and Michael Zibulevsky. A New Class of Sigmoid Activation Functions That Don't Saturate. 1997.
a sequence of 100 networks was trained using different values of ø for each hidden node. For the auto mpg, servo and Tecator data sets (3 hidden nodes) the ø values (0.5, 1.5, 2.5) were used, for the glass data set (6 hidden nodes), the values (0.5, 1.0, 1.5, 2.0, 2.5, 3.0) were used, and for the bodyfat data set (7 hidden nodes)
Johannes Furnkranz. Pairwise Classification as an Ensemble Technique. Austrian Research Institute for Artificial Intelligence.
Four of the datasets (Pole Telecom, MV Artificial, Auto MPG and Triazines) seem to be completely unamenable to pairwise classification, i.e., j48 performs better in all three classification settings. This, however,
C. Titus Brown and Harry W. Bullen and Sean P. Kelly and Robert K. Xiao and Steven G. Satterfield and John G. Hagedorn and Judith E. Devaney. Visualization and Data Mining in an 3D Immersive Environment: Summer Project 2003.
was analysed by Christian Brown. Overview: The Miles Per Gallon (MPG) data set consisted of data regarding the engines of numerous cars. Each car had 8 attributes, intended to be used to predict miles per gallon for each car. The attributes were a mix of both discrete and