Unsupervised learning Flashcards

Question 1

Q

What is unsupervised learning?

Answer

A

learning from unlabelled data, e.g. collecting similar data into clusters (the clusters get names such as "cluster A", but these names have no semantic meaning)

Question 2

Q

What is the process of clustering?

Answer

A

grouping similar objects into clusters

Question 3

Q

What are example applications of clustering?

Answer

A

social network analysis or marketing
Image segmentation

Question 4

Q

What are some unsupervised learning algorithms?

Answer

A

clustering concepts
Partition-based clustering algorithms (k-means)
Hierarchical clustering (agglomerative clustering)

Question 5

Q

What is the aim of clustering algorithms?

Answer

A

to see whether the data fall into distinct groups, with members within each
group being similar to other members in that group but different from members of
other groups

Question 6

Q

What are the steps of the k-means algorithm?

Answer

A

1. define the number of clusters (k)
2. choose k data objects randomly to serve as the initial centroids for
   the k clusters
3. assign each data object to the cluster represented by its nearest
   centroid
4. find a new centroid for each cluster by calculating the mean vector
   of its members
5. undo the memberships of all data objects; go back to step 3 and
   repeat the process until cluster membership no longer changes or a
   maximum number of iterations is reached
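
The steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function names, the `seed` parameter and the use of squared Euclidean distance are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    # Squared Euclidean distance between two equal-length tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    # Component-wise mean of a non-empty list of tuples.
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(data, k, max_iterations=100, seed=0):
    # Steps 1-2: k is given; pick k data objects at random as initial centroids.
    rng = random.Random(seed)  # seed is an assumption, used for repeatability
    centroids = rng.sample(data, k)
    assignments = None
    for _ in range(max_iterations):
        # Step 3: assign each object to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in data
        ]
        # Step 5: stop once cluster membership no longer changes.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 4: recompute each centroid as the mean vector of its members.
        for c in range(k):
            members = [p for p, a in zip(data, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = mean_vector(members)
    return assignments, centroids


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignments, centroids = k_means(data, k=2)
```

On this toy data the two well-separated groups end up in different clusters; which group is called cluster 0 or cluster 1 depends on the random initial centroids.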

Question 7

Q

What are k-means variations?

Answer

A

selection of the initial k means
dissimilarity calculations
strategies to calculate cluster means
using different distance measures

Question 8

Q

What are the strengths of k-means?

Answer

A

simple and easy to implement
quite efficient (each iteration only compares every data object with the k centroids)

Question 9

Q

What are the weaknesses of k-means?

Answer

A

need to specify the value of k, but we may not know what the value should be beforehand
(you may want to experiment with different values of k, e.g. using the elbow method)
sensitive to the initialisation
sensitive to noise
struggles with clusters of different sizes
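
Experimenting with k can be sketched as follows. This is a hedged illustration of the elbow idea on 1-D toy data: the helper `one_d_k_means`, the evenly spaced starting centroids and the data values are all assumptions made for the example, not part of the cards.

```python
def one_d_k_means(values, k, max_iterations=100):
    """Tiny 1-D k-means; returns the within-cluster sum of squares (WCSS)."""
    ordered = sorted(values)
    # Assumption: evenly spaced starting centroids keep the demo deterministic.
    centroids = [ordered[i * len(ordered) // k] for i in range(k)]
    labels = None
    for _ in range(max_iterations):
        # Assign every value to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: (v - centroids[c]) ** 2) for v in values
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return sum((v - centroids[label]) ** 2 for v, label in zip(values, labels))


values = [1, 2, 3, 11, 12, 13, 21, 22, 23]  # three obvious groups
wcss_by_k = {k: one_d_k_means(values, k) for k in range(1, 6)}
for k, wcss in wcss_by_k.items():
    print(k, wcss)
```

For this data the WCSS falls sharply up to k = 3 and only slightly afterwards; that bend (the "elbow") suggests k = 3 as a reasonable choice.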

Question 10

Q

Why is k-means sensitive to noise?

Answer

A

because each centroid is the mean of its cluster's members, and a mean is pulled strongly towards outliers
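
A small numeric sketch of this effect, using made-up values. The median appears only for contrast and is not part of k-means itself.

```python
from statistics import mean, median

cluster = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = cluster + [100.0]  # one noisy point added

# The mean (what k-means uses for a centroid) shifts a lot...
mean_shift = mean(with_outlier) - mean(cluster)
# ...while the median barely moves.
median_shift = median(with_outlier) - median(cluster)

print(mean_shift, median_shift)
```

A single noisy point drags the centroid far away from the bulk of the cluster, which in turn can change which objects are assigned to it.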

Question 11

Q

Explain the agglomerative clustering algorithm

Answer

A

1. take all n data objects as individual clusters and build an n x n
   dissimilarity matrix storing distances between any pair of data objects
2. while the number of clusters > 1 do:
    1. find a pair of data objects/clusters with the minimum distance
    2. merge the two data objects/clusters into a bigger cluster
    3. replace the entries in the matrix for the original clusters or
       objects by the cluster tag of the newly formed cluster
    4. re-calculate relevant distances and update the matrix
* the whole process produces a dendrogram
* relies on the definition of a distance metric between clusters
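
The loop above can be sketched in plain Python. Two assumptions are made for the example: the distance between clusters is single linkage (the minimum distance between any two members), which is just one possible choice, and distances are recomputed on every pass instead of being kept in an updated n x n matrix, to keep the sketch short.

```python
def euclidean(a, b):
    # Euclidean distance between two equal-length tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def single_linkage(cluster_a, cluster_b, points):
    # Assumed cluster distance: closest pair of members across the two clusters.
    return min(euclidean(points[i], points[j]) for i in cluster_a for j in cluster_b)


def agglomerative(points):
    # Start with every data object as its own cluster (tuples of point indices).
    clusters = [(i,) for i in range(len(points))]
    merges = []  # (cluster, cluster, distance): enough to draw a dendrogram
    while len(clusters) > 1:
        # Find the pair of clusters with the minimum distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = single_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Merge the pair and replace the two originals by the new cluster.
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
merges = agglomerative(points)
for left, right, distance in merges:
    print(left, right, round(distance, 2))
```

Each recorded merge corresponds to one join in the dendrogram, with the merge distance giving the height of the join.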

Question 12

Q

What are the strengths of agglomerative clustering?

Answer

A

deterministic results
multiple possible versions of the clustering (the dendrogram can be cut at different levels)
no need to specify the value of k beforehand
can create clusters of arbitrary shapes

Question 13

Q

What are the weaknesses of agglomerative clustering?

Answer

A

does not scale up to large data sets (the n x n dissimilarity matrix alone grows quadratically with the number of data objects)

(13 cards)