What does the acronym DBSCAN stand for?
Density-based spatial clustering of applications with noise.
Who proposed the DBSCAN algorithm and in what year?
Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu in 1996.
What is the fundamental principle of the DBSCAN algorithm?
It groups together points that are closely packed and marks points that lie alone in low-density regions as outliers.
In DBSCAN, what does the parameter $\epsilon$ (epsilon) define?
It defines the radius of the neighborhood around each point to be considered for density.
What does the minPts parameter in DBSCAN specify?
It specifies the minimum number of points required within a point’s $\epsilon$-radius to form a dense region.
How is a ‘core point’ defined in the DBSCAN algorithm?
A point is a core point if at least minPts points (including itself) are within its $\epsilon$-radius.
In DBSCAN, a point ‘q’ is ‘directly reachable’ from point ‘p’ if ‘q’ is within distance $\epsilon$ from ‘p’, and ‘p’ is a _____ point.
core
What does it mean for a point ‘q’ to be ‘reachable’ from a point ‘p’ in DBSCAN?
There is a path of points from ‘p’ to ‘q’ where each subsequent point is directly reachable from the previous one.
In DBSCAN, what are points that are not reachable from any other point called?
Outliers or noise points.
How is a ‘border point’ characterized in DBSCAN?
It is a non-core point that is reachable from a core point, effectively forming the ‘edge’ of a cluster.
According to DBSCAN’s logic, a core point forms a cluster with all other points that are _____ from it.
reachable
Is the ‘reachability’ relationship in DBSCAN symmetric? Why or why not?
No, because by definition only core points can reach non-core points, but the reverse is not true.
When are two points ‘p’ and ‘q’ considered ‘density-connected’ in DBSCAN?
They are density-connected if there is a core point ‘o’ from which both ‘p’ and ‘q’ are reachable.
What is a major advantage of DBSCAN over an algorithm like k-means regarding the number of clusters?
DBSCAN does not require the user to specify the number of clusters a priori.
What kind of cluster shapes can DBSCAN identify that many other algorithms cannot?
DBSCAN can find arbitrarily-shaped clusters, including a cluster completely surrounded by another.
How does the MinPts parameter help reduce the ‘single-link effect’ in DBSCAN?
It prevents different clusters from being connected by just a thin line of points, requiring a certain density to bridge them.
What is a primary limitation of DBSCAN when dealing with data containing clusters of varying densities?
A single combination of minPts and $\epsilon$ cannot be chosen appropriately for all clusters simultaneously.
In what specific situation is the DBSCAN algorithm not entirely deterministic?
Border points that are reachable from more than one cluster can be assigned to either, depending on the data processing order.
The effectiveness of DBSCAN in high-dimensional data can be hindered by the so-called ‘_____ of dimensionality’.
Curse
What graphical method is commonly used to help choose an appropriate value for the $\epsilon$ parameter?
A k-distance graph, plotting the distance to the k-th nearest neighbor (where k = minPts - 1).
On a k-distance graph for DBSCAN, a good value for $\epsilon$ is typically found where the plot shows an ‘_____’.
elbow
What is a common rule of thumb for setting the minPts parameter based on the number of dimensions, D?
Set minPts to be greater than or equal to D + 1, or often minPts = 2 * D.
Why is setting minPts = 1 not sensible for DBSCAN?
Because every point would be defined as a core point, making every point a tiny cluster.
If minPts is set to 2 or less, DBSCAN’s result becomes equivalent to what other clustering method?
Hierarchical clustering with the single link metric.