A density-based clustering algorithm that finds cluster centers by iteratively shifting points toward regions of higher data density. The number of clusters does not need to be specified in advance.
Mean Shift finds clusters by locating the peaks (modes) of the data's estimated density. Each point repeatedly moves to the mean of the data points inside a window whose radius is the bandwidth, and it stops when that mean no longer changes, which happens at a density peak. Points that converge to the same peak belong to the same cluster.
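The procedure above can be sketched in a few lines of NumPy. This is a minimal illustration, not scikit-learn's implementation: it assumes a flat (uniform) kernel, and the function name `mean_shift`, the tolerance `tol`, and the rule for grouping converged points (within half a bandwidth of an existing mode) are hypothetical choices made for this example.

```python
import numpy as np


def mean_shift(X, bandwidth, max_iter=300, tol=1e-5):
    """Hypothetical flat-kernel Mean Shift sketch.

    Every data point starts as its own seed and is repeatedly moved to the
    mean of the ORIGINAL data points lying within `bandwidth` of it.
    """
    X = np.asarray(X, dtype=float)
    shifted = X.copy()
    for _ in range(max_iter):
        max_move = 0.0
        for i, p in enumerate(shifted):
            # Flat kernel: each point inside the window gets equal weight.
            # Checking all n points for every seed is the O(n^2) per-iteration cost.
            in_window = np.linalg.norm(X - p, axis=1) <= bandwidth
            new_p = X[in_window].mean(axis=0)
            max_move = max(max_move, np.linalg.norm(new_p - p))
            shifted[i] = new_p
        if max_move < tol:  # no seed moved noticeably: all have reached a mode
            break

    # Group converged seeds: join the first mode within half a bandwidth (assumed rule).
    modes, labels = [], np.empty(len(X), dtype=int)
    for i, p in enumerate(shifted):
        for k, m in enumerate(modes):
            if np.linalg.norm(p - m) < bandwidth / 2:
                labels[i] = k
                break
        else:
            modes.append(p)
            labels[i] = len(modes) - 1
    return np.array(modes), labels


# Two tight groups of three points, far apart.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
modes, labels = mean_shift(X, bandwidth=1.0)
print(len(modes), labels)  # 2 [0 0 0 1 1 1]
```

Each seed here climbs to the mean of its own group, so the two groups end up at two distinct modes.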
Mean Shift automatically determines the number of clusters. You only need to set the bandwidth (window size), which controls the granularity of the clustering.
The bandwidth defines the radius of the window used to compute the local mean. Small bandwidth = many small clusters. Large bandwidth = fewer, larger clusters.
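The effect of the bandwidth shows up clearly on a tiny hand-made 1-D dataset (chosen only for illustration): three groups of three points spaced 0.2 apart, with the groups several units apart. A sketch using scikit-learn's `MeanShift`:

```python
import numpy as np
from sklearn.cluster import MeanShift

# Toy 1-D data: three tight groups (neighbors 0.2 apart), groups far apart.
X = np.array([0.0, 0.2, 0.4, 3.0, 3.2, 3.4, 10.0, 10.2, 10.4]).reshape(-1, 1)

counts = {}
for bandwidth in (0.1, 1.0, 20.0):
    ms = MeanShift(bandwidth=bandwidth).fit(X)
    counts[bandwidth] = len(ms.cluster_centers_)

# 0.1 is smaller than the gap between neighbors, so every point is its own mode;
# 1.0 spans each group but not the gaps between groups; 20.0 covers everything.
print(counts)  # {0.1: 9, 1.0: 3, 20.0: 1}
```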
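Scikit-learn provides an automatic bandwidth estimator, estimate_bandwidth() in sklearn.cluster, which derives a bandwidth from nearest-neighbor distances in the data.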
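A minimal usage sketch, assuming synthetic data from `make_blobs` (three well-separated blobs) and the default `quantile=0.3`. `estimate_bandwidth` returns the mean distance from each point to its k-th nearest neighbor (counting the point itself), with `k = int(quantile * n_samples)`, so a larger `quantile` gives a larger bandwidth and typically fewer clusters.

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Synthetic data for illustration: three well-separated blobs.
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [8, 8], [-8, 8]],
                  cluster_std=0.5, random_state=42)

# Data-driven starting value for the bandwidth (quantile=0.3 is the default).
bandwidth = estimate_bandwidth(X, quantile=0.3)

ms = MeanShift(bandwidth=bandwidth).fit(X)
print(f"estimated bandwidth: {bandwidth:.3f}")
print("clusters found:", len(ms.cluster_centers_))
```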
| Feature | K-Means | Mean Shift |
|---|---|---|
| Number of clusters | Must specify K | Determined automatically |
| Cluster shape | Roughly spherical (convex) | Arbitrary shape |
| Parameters | K (number of clusters) | Bandwidth (window size) |
| Speed | Fast, O(nKt) (n points, t iterations) | Slow, O(n^2) per iteration (naive implementation) |
| Outlier handling | Assigns to nearest cluster | Forms tiny clusters for outliers |
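The outlier row can be checked on hand-made toy data (an assumption for illustration): two tight groups plus one far-away point. With scikit-learn's default seeding, every point starts its own hill-climb, so an isolated point has only itself in its window, stays put, and becomes a one-member cluster.

```python
import numpy as np
from sklearn.cluster import MeanShift

X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # group A
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],   # group B
    [20.0, 20.0],                          # isolated outlier
])

ms = MeanShift(bandwidth=1.0).fit(X)
labels = ms.labels_

# Size of the cluster that contains the outlier (the last row of X).
outlier_cluster_size = int(np.sum(labels == labels[-1]))
print("clusters:", len(ms.cluster_centers_))
print("outlier cluster size:", outlier_cluster_size)
```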
| Good For | Not Ideal For |
|---|---|
| Unknown number of clusters | Large datasets (slow, O(n^2)) |
| Non-spherical cluster shapes | High-dimensional data |
| Image segmentation, object tracking | When speed is critical |
| Small to medium datasets | Very different cluster densities |
Mean Shift is computationally expensive: a naive implementation costs O(n^2) per iteration. For large datasets, consider K-Means or DBSCAN instead. Results depend heavily on the bandwidth, so use estimate_bandwidth() as a starting point and tune from there.
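When Mean Shift is still the right tool for a larger dataset, scikit-learn offers two documented ways to cut the cost: estimating the bandwidth on a random subsample (`n_samples`) and seeding from a coarse grid (`bin_seeding=True`) instead of from every point. `MeanShift` also accepts `n_jobs` for parallel seeds. The sketch below assumes synthetic `make_blobs` data; the sizes are illustrative only.

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Larger synthetic dataset (illustrative): 5,000 points in three separated blobs.
X, _ = make_blobs(n_samples=5000, centers=[[0, 0], [8, 8], [-8, 8]],
                  cluster_std=0.5, random_state=0)

# Estimate the bandwidth from a random subsample of 500 points instead of all of X.
bandwidth = estimate_bandwidth(X, quantile=0.3, n_samples=500, random_state=0)

# bin_seeding=True starts hill-climbing from one seed per occupied grid cell
# (cell size = bandwidth) rather than from each of the 5,000 points.
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(X)
print("clusters found:", len(ms.cluster_centers_))
```

Binning trades a little precision in the seed positions for far fewer hill-climbs; the final labels are still assigned by nearest cluster center.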