DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds clusters of arbitrary shape and automatically flags outliers.
DBSCAN groups points that are tightly packed (dense regions) into clusters and labels points in low-density areas as noise (outliers). Unlike K-Means, it does not need the number of clusters specified upfront and can find clusters of any shape.
This gives it two key advantages: the number of clusters follows from the data and the chosen parameters rather than being fixed in advance, and outliers are reported explicitly as noise points (label = -1).
eps: The radius of the neighborhood around each point. Points within this radius are considered neighbors.
min_samples (MinPts): The minimum number of points required within the eps radius for a point to count as a core point, the seed of a dense region.
| Type | Definition |
|---|---|
| Core point | Has at least MinPts points within its eps radius (implementations such as scikit-learn count the point itself) |
| Border point | Within eps of a core point, but has fewer than MinPts points in its own eps radius |
| Noise point | Neither a core point nor within eps of any core point. Labeled as -1 |
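The three point types in the table above can be illustrated with a small brute-force sketch. This is not a production implementation: the function name `dbscan`, the toy `points`, and the choice to count a point as its own neighbor are assumptions made for illustration, and the quadratic neighbor search only suits small inputs.

```python
from math import dist

NOISE = -1  # label used for noise points


def dbscan(points, eps, min_samples):
    """Return (labels, kinds) for a list of coordinate tuples.

    Assumption: a point counts as its own neighbor, so min_samples
    includes the point itself.
    """
    n = len(points)
    # Brute-force neighborhoods: O(n^2), fine for a small illustration.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_samples for nb in neighbors]

    labels = [NOISE] * n
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != NOISE:
            continue
        # Grow a new cluster outward from this unlabeled core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster
                    if is_core[q]:
                        stack.append(q)  # only core points keep expanding
        cluster += 1

    kinds = [
        "core" if is_core[i] else ("border" if labels[i] != NOISE else "noise")
        for i in range(n)
    ]
    return labels, kinds


# Toy data: a dense square with one attached point, a small second group,
# and one isolated point far from everything else.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1),
          (5, 5), (5, 6), (6, 5), (10, 10)]
labels, kinds = dbscan(points, eps=1.0, min_samples=3)
for point, label, kind in zip(points, labels, kinds):
    print(point, label, kind)
```

With these toy values the isolated point at (10, 10) ends up with label -1, and the number of clusters is never passed in. A border point that lies within eps of core points from two clusters is assigned to whichever cluster reaches it first, so its label can depend on processing order.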
| Good For | Not Ideal For |
|---|---|
| Clusters of arbitrary shape (non-spherical) | Clusters with very different densities |
| Automatic outlier detection | High-dimensional data (distances between points become less informative) |
| Unknown number of clusters | When you need every point assigned to a cluster |
| Geospatial data, anomaly detection | Very large datasets without spatial indexing |
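Choosing eps and min_samples is critical. A common approach is a k-distance plot: compute each point's distance to its k-th nearest neighbor (with k tied to min_samples), sort those distances, and look for an "elbow"; the distance at the elbow is a candidate eps. Poor parameter choices lead to either one giant cluster (eps too large) or almost all noise (eps too small or min_samples too large).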
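The k-distance heuristic can be sketched in a few lines. The helper name `k_distances` and the toy `points` are hypothetical, and the sketch excludes each point from its own neighbor list, so k = min_samples - 1 would match a convention in which min_samples counts the point itself.

```python
from math import dist


def k_distances(points, k):
    """Distance from each point to its k-th nearest other point, sorted ascending.

    Assumes len(points) > k so that every point has a k-th neighbor.
    """
    result = []
    for i, p in enumerate(points):
        # Distances to every other point (the point itself is excluded).
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(d[k - 1])
    return sorted(result)


# Toy data: two tight groups plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1),
          (5, 5), (5, 6), (6, 5), (10, 10)]
curve = k_distances(points, k=2)
for rank, value in enumerate(curve):
    print(rank, round(value, 3))
```

Plotting `curve` against rank gives the k-distance plot. On this toy data the values stay small for the grouped points and jump sharply for the isolated one, so a candidate eps sits just below that jump. On real data the elbow is usually less clear-cut and still needs judgment.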
Tags: Unsupervised, Clustering, Density-based, Outlier Detection