*Dan Pelleg. Scalable and Practical Probability Density Estimators for Scientific Anomaly Detection. School of Computer Science, Carnegie Mellon University. 2004.*
of points and linearly with the number of clusters. This allows for clustering with tens of thousands of centroids and millions of points using commodity **hardware**
1.1 Introduction
Consider a dataset with R records, each having M attributes. Given a constant k, the clustering problem is to partition the data into k subsets such that each subset behaves "well" under some measure. For example, we
*Yongge Wang. A New Approach to Fitting Linear Models in High Dimensional Spaces. Alastair Scott (Department of Statistics, University of Auckland).*
The datasets Autos (Automobile), Cpu (**Computer Hardware**), and Cleveland (Heart Disease---Processed Cleveland)

| | Autos | Bankbill | Bodyfat | Cholesterol | Cleveland | Cpu |
|---|---|---|---|---|---|---|
| n / k | 159 / 16 | 71 / 16 | 252 / 15 | 297 / 14 | 297 / | |
