What are the use cases for similarity?
How can nearest neighbours be used for predictive modelling in the case of classification, probability estimation and regression?
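Classification: take the majority class among the k nearest neighbours, i.e. the k training points with the smallest distance to the new point
Probability estimation: use the proportion of those neighbours that belong to each class as a score
Regression: use the average or the median of the neighbours' target values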
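The three uses above can be sketched in a few lines of Python. This is only a minimal illustration, assuming Euclidean distance and unweighted neighbours; the function names (k_nearest, knn_classify, knn_probability, knn_regress) and the toy data are made up for this example and do not come from any particular library.

```python
import math
from collections import Counter
from statistics import mean, median

def k_nearest(train, query, k):
    """Return the k (distance, target) pairs closest to query (Euclidean distance)."""
    scored = [(math.dist(features, query), target) for features, target in train]
    # Sort on distance only, so targets never need to be comparable.
    return sorted(scored, key=lambda pair: pair[0])[:k]

def knn_classify(train, query, k):
    """Classification: majority class among the k nearest neighbours."""
    votes = Counter(target for _, target in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]

def knn_probability(train, query, k, positive):
    """Probability estimation: fraction of neighbours in the given class."""
    neighbours = k_nearest(train, query, k)
    return sum(1 for _, target in neighbours if target == positive) / len(neighbours)

def knn_regress(train, query, k, combine=mean):
    """Regression: average (or median) of the neighbours' target values."""
    return combine([target for _, target in k_nearest(train, query, k)])

# Hypothetical toy data: (features, target) pairs.
labelled = [((1.0, 1.0), "yes"), ((1.5, 2.0), "yes"), ((2.0, 1.0), "no"),
            ((5.0, 5.0), "no"), ((6.0, 5.5), "no")]
numeric = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 60.0), ((10.0,), 100.0)]

predicted_class = knn_classify(labelled, (1.2, 1.3), k=3)            # "yes"
score_yes = knn_probability(labelled, (1.2, 1.3), k=3, positive="yes")  # 2 of 3 neighbours
avg_value = knn_regress(numeric, (2.0,), k=3)                        # mean of 20, 10, 60
med_value = knn_regress(numeric, (2.0,), k=3, combine=median)        # median of 20, 10, 60
```

Passing combine=median instead of the default mean makes the regression estimate less sensitive to a single extreme neighbour, which is why the two estimates differ on the toy data.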
How many neighbours are needed and how can you solve the issue of points further away having the same influence as those close by?
How does kNN overfitting vary with k?
What can you use to choose the best value for k?
What are the issues with nearest neighbour methods?
Describe hierarchical clustering.
What are the steps to perform k-means clustering?
How do k-means and hierarchical clustering compare in terms of efficiency?
How do you determine a good value for k in k-means clustering?
How do we understand the results of clustering?
How can we interpret the results and their implications?
What are Characteristic and Differential descriptions?