Unsupervised learning Flashcards

(13 cards)

1
Q

What is unsupervised learning?

A

learning from unlabelled data: the algorithm collects similar data into clusters (the clusters get names such as cluster A, but these names have no semantic meaning)

2
Q

What is the process of clustering?

A

grouping similar objects into clusters

3
Q

What are example applications of clustering?

A

social network analysis
marketing
image segmentation

4
Q

What are some unsupervised learning algorithms?

A

clustering concepts
Partition-based clustering algorithms (k-means)
Hierarchical clustering (agglomerative clustering)

5
Q

What is the aim of clustering algorithms?

A

to see whether the data fall into distinct groups, with members within each
group being similar to other members in that group but different from members of
other groups

6
Q

What are the steps of the k-means algorithm?

A
  1. define the number of clusters (k)
  2. choose k data objects randomly to serve as the initial centroids for
    the k clusters
  3. assign each data object to the cluster represented by its nearest
    centroid
  4. find a new centroid for each cluster by calculating the mean vector
    of its members
  5. undo the memberships of all data objects; go back to step 3 and
    repeat the process until cluster membership no longer changes or a
    maximum number of iterations is reached
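
The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names (`k_means`, `squared_distance`, `mean_vector`), the `seed` parameter, the choice of squared Euclidean distance, and the handling of empty clusters are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iterations=100, seed=0):
    """Return (centroids, assignment) for a list of tuples."""
    rng = random.Random(seed)
    # Choose k data objects at random as the initial centroids.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iterations):
        # Assign each object to the cluster of its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Stop once cluster membership no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each centroid as the mean vector of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = mean_vector(members)
    return centroids, assignment


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(points, k=2)
print(sorted(centroids))
print(labels)
```

The label numbers (0 or 1) depend on the random initial centroids, which matches the point that cluster names carry no semantic meaning.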
7
Q

What are k-means variations?

A
  • selection of the initial k means
  • dissimilarity calculations
  • strategies to calculate cluster means
  • using different distance measures
8
Q

What are the k-means strengths?

A

simple and easy to implement
quite efficient

9
Q

What are the weaknesses of k-means?

A
  • need to specify the value of k, but we may not know what the value should be beforehand
  • you may need to experiment with different values of k (e.g. using the elbow method)
  • sensitive to the initialisation
  • sensitive to noise
  • struggles when clusters have different sizes
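
One way to experiment with k is to run k-means for several values and compare the within-cluster sum of squares (WCSS); the "elbow" is where adding more clusters stops reducing it by much. The sketch below is only an illustration on made-up 1-D data: `k_means_1d`, `within_cluster_ss`, the `seed` parameter, and the sample values are all assumptions for the example.

```python
import random


def k_means_1d(values, k, max_iterations=100, seed=0):
    """Tiny 1-D k-means; returns (centroids, assignment)."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)  # random initial centroids
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            min(range(k), key=lambda c: (v - centroids[c]) ** 2)
            for v in values
        ]
        if new_assignment == assignment:  # membership stopped changing
            break
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(values, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = sum(members) / len(members)
    return centroids, assignment


def within_cluster_ss(values, centroids, assignment):
    """Sum of squared distances from each value to its own centroid."""
    return sum((v - centroids[a]) ** 2 for v, a in zip(values, assignment))


# Made-up data with three visually obvious groups.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.1, 8.8]

wcss_by_k = {}
for k in range(1, 6):
    centroids, assignment = k_means_1d(values, k)
    wcss_by_k[k] = within_cluster_ss(values, centroids, assignment)

for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 3))
```

Because the result depends on the random initial centroids, a careful experiment would repeat each k with several seeds and keep the lowest WCSS.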
10
Q

Why is k-means sensitive to noise?

A

because each centroid is the mean of its members, and a mean is pulled strongly towards outliers
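
A small numeric illustration of this effect, using made-up values: a single noisy point drags the mean far from the rest of the cluster. The median is shown only for contrast and is not part of k-means.

```python
from statistics import mean, median

cluster = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = cluster + [100.0]  # one noisy data object

# Without noise the mean sits in the middle of the cluster.
print(mean(cluster), median(cluster))

# With one outlier the mean jumps to about 19.17,
# while the median only moves to 3.5.
print(mean(with_outlier), median(with_outlier))
```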

11
Q

Explain the agglomerative clustering algorithm

A
  1. take all n data objects as individual clusters and build an n x n
    dissimilarity matrix storing distances between any pair of data objects
  2. while the number of clusters > 1 do:
      1. find a pair of data objects/clusters with the minimum distance
      2. merge the two data objects/clusters into a bigger cluster
      3. replace the entries in the matrix for the original clusters or
        objects by the cluster tag of the newly formed cluster
      4. re-calculate relevant distances and update the matrix
    * whole process produces a dendrogram
    * relies on the definition of a distance metric between clusters
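
The procedure can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the distance between two clusters is taken to be single linkage (the minimum distance between any two of their members), which is only one possible choice, and the names `agglomerative` and `euclidean` are made up for the example. The returned merge history is the information a dendrogram would draw.

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def agglomerative(points):
    """Return the merge history as a list of (member indices, distance)."""
    # Every data object starts as its own cluster, tagged by its index.
    clusters = {i: (i,) for i in range(len(points))}
    # Dissimilarity "matrix" stored as {(smaller tag, larger tag): distance}.
    dist = {
        (i, j): euclidean(points[i], points[j])
        for i in clusters for j in clusters if i < j
    }
    merges = []
    next_tag = len(points)
    while len(clusters) > 1:
        # Find the pair of clusters with the minimum distance.
        (a, b), d = min(dist.items(), key=lambda item: item[1])
        # Merge the two clusters into a bigger one.
        merged = clusters.pop(a) + clusters.pop(b)
        # Re-calculate distances from the new cluster to every other cluster
        # (assumption: single linkage, i.e. the smaller of the two old ones).
        new_dist = {}
        for other in clusters:
            d_a = dist[(min(a, other), max(a, other))]
            d_b = dist[(min(b, other), max(b, other))]
            new_dist[(other, next_tag)] = min(d_a, d_b)
        # Drop entries for the two original clusters, add the new tag.
        dist = {
            pair: v for pair, v in dist.items()
            if a not in pair and b not in pair
        }
        dist.update(new_dist)
        clusters[next_tag] = merged
        merges.append((sorted(merged), d))
        next_tag += 1
    return merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
for members, distance in agglomerative(points):
    print(members, round(distance, 3))
```

Given the same points and the same tie-breaking order, the merge history is always the same, which reflects the deterministic nature of the method.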
12
Q

What are the strengths of agglomerative clustering?

A
  • deterministic results
  • multiple possible versions of clustering
  • no need to specify the value of k beforehand
  • can create clusters of arbitrary shapes
13
Q

What are the weaknesses of agglomerative clustering?

A

does not scale up to large data sets (the dissimilarity matrix alone has n x n entries)
