What is the goal of unsupervised learning?
Give some examples of objectives that can be achieved with unsupervised learning…
Unsupervised learning is used for clustering tasks, explain how this is done…
Unsupervised learning is used for community detection, explain what this is and how it’s done…
Unsupervised learning is used for topic modelling, explain what this is and how it’s done…
Give some examples of clustering algorithms…
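K-means -> Partitions the data into K clusters by assigning each point to its nearest centroid, where K is a hyperparameter given by the user.
DBSCAN -> Density-Based Spatial Clustering of Applications with Noise. Finds high-density regions and grows clusters by expanding outwards from them. Points left in low-density regions are treated as noise.
Hierarchical Clustering -> Builds a hierarchy of clusters, either by repeatedly dividing clusters into sub-clusters (divisive) or by repeatedly merging the closest clusters (agglomerative).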
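To make the K-means description concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not a reference implementation: the names `kmeans`, `squared_distance`, `max_iterations`, `seed` and `initial_centroids` are hypothetical, squared Euclidean distance is assumed, and the starting centroids are simply sampled from the data.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iterations=100, seed=0, initial_centroids=None):
    """Sketch of K-means: returns (centroids, labels)."""
    if initial_centroids is None:
        # Assumption: start from k distinct data points chosen at random.
        centroids = random.Random(seed).sample(points, k)
    else:
        centroids = list(initial_centroids)

    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # Assignments stopped changing, so we have converged.
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # An empty cluster keeps its previous centroid.
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return centroids, labels


points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
```

The result depends on the starting centroids, which is why the sketch exposes a seed. In practice a library implementation would normally be used instead.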
What are the 2 types of clustering algorithms? Define each…
Hard Clustering -> Each data point belongs to exactly one cluster. Used when we want to make a definite decision about the data, i.e. a data point can’t belong to multiple clusters. E.g. a data point is in A or B or C, never in more than one.
Soft Clustering -> A data point can be assigned to multiple clusters, typically with a degree of membership (such as a probability) for each cluster.
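The difference between the two can be sketched in a few lines of Python. This is only an illustration: the function names and the `temperature` parameter are hypothetical, and turning distances into memberships with a softmax is one possible choice, assumed here for simplicity.

```python
import math


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def hard_assign(point, centroids):
    """Hard clustering: return the index of the single nearest centroid."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def soft_assign(point, centroids, temperature=1.0):
    """Soft clustering: return one membership weight per centroid.

    Assumption: weights are a softmax over negative squared distances,
    so closer centroids get larger weights and the weights sum to 1.
    """
    scores = [-squared_distance(point, c) / temperature for c in centroids]
    top = max(scores)  # Subtract the max for numerical stability.
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


centroids = [(0.0, 0.0), (4.0, 0.0)]
point = (1.5, 0.0)

hard_label = hard_assign(point, centroids)   # one cluster index
memberships = soft_assign(point, centroids, temperature=4.0)  # one weight per cluster
```

Here the hard assignment commits the point to the nearer centroid, while the soft assignment keeps a smaller, non-zero weight for the farther one.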
What is a common similarity / distance metric used for clustering?
When do we use Jaccard Similarity? How is it calculated?
We use Jaccard Similarity when we want to measure the similarity between 2 sets. It’s calculated as the number of elements in the intersection of the sets divided by the number of elements in the union of the sets.
The Jaccard Distance = 1 - Jaccard Similarity.
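Both quantities can be computed directly from Python sets. The sketch below uses hypothetical function names, and returning 1.0 when both sets are empty is a convention assumed here, since the ratio is otherwise undefined.

```python
def jaccard_similarity(a, b):
    """|A intersect B| / |A union B| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def jaccard_distance(a, b):
    """Jaccard Distance = 1 - Jaccard Similarity."""
    return 1.0 - jaccard_similarity(a, b)


A = {"apple", "banana", "cherry"}
B = {"banana", "cherry", "date", "elderberry"}

similarity = jaccard_similarity(A, B)  # 2 shared elements / 5 in the union = 0.4
distance = jaccard_distance(A, B)      # 1 - 0.4 = 0.6
```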
How do we calculate Jaccard Distance?