Basic idea of unsupervised learning.
Find structure within a set of instances defined by descriptive features alone.
What is a clustering algorithm?
Given a finite set of data points, it finds homogeneous subgroups of points with similar characteristics. The end result is the generation of a new feature that describes which cluster each point belongs to.
Give one use case of clustering.
Customer segmentation.
Clustering algorithm fundamentals?
Feature-space and distance measure.
It is a form of representation learning, focused on creating a new representation of the instances in the expectation that this representation will be useful later.
How is data pre-processed before k-means clustering?
Convert categorical data to numerical data, and apply feature reduction techniques.
T or F. K-Means is highly sensitive to outliers.
True.
Name the four steps of k-means.
Step 1: Initialize K & Centroids
Step 2: Assigning Clusters to Datapoints
Step 3: Updating Centroids
Step 4: Stopping Criterion
Explain step 1: Initialize K & Centroids.
Tell the model how many clusters there will be (k), and pick k datapoints as the initial centroids.
What are the initial centroids often called?
Seeds
Explain Step 2: Assigning clusters to datapoints.
Calculate the distance between each datapoint and every cluster centroid, and assign each datapoint to the cluster of its closest centroid.
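The assignment step can be sketched in NumPy (the toy points and centroids here are made up for illustration):

```python
import numpy as np

# Hypothetical toy data: 4 points in 2-D and 2 centroids.
points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
centroids = np.array([[0.0, 0.5], [5.5, 5.0]])

# Euclidean distance from every point to every centroid (4 x 2 matrix),
# then assign each point to the index of its nearest centroid.
distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
assignments = distances.argmin(axis=1)
print(assignments)  # → [0 0 1 1]
```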
Explain step 3: Updating cluster centroids.
We then split the data on features (e.g. x, y, … co-ordinates) and take the average of each feature over the datapoints in each cluster to get a new cluster centroid. The old centroid is not included as a datapoint unless it happens to coincide with one.
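The centroid-update step, continuing the same kind of toy example (assumed data):

```python
import numpy as np

# Hypothetical toy cluster memberships after the assignment step.
points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
assignments = np.array([0, 0, 1, 1])

# New centroid = mean of each feature over the points in the cluster;
# the old centroid itself is not averaged in.
new_centroids = np.array([points[assignments == k].mean(axis=0) for k in range(2)])
print(new_centroids)  # → [[0.  0.5] [5.5 5. ]]
```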
Explain step 4: Stopping criterion.
Steps 2 and 3 are performed iteratively until a stopping criterion is met, e.g. the distance of datapoints from their centroids falls below some threshold, or no cluster membership changes on a given iteration.
Outline the k means clustering algorithm.
Select k cluster centroids.
Loop until stopping criterion met:
- Calculate distance of each datapoint from each cluster centroid.
- Assign each datapoint to its closest cluster centroid.
- Update each cluster centroid by taking the average of the datapoints in its cluster.
Return: The clusters and the datapoints in each of them at the end, along with the final k centroids.
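The outline above can be sketched as a minimal NumPy implementation, assuming Euclidean distance and "no membership change" as the stopping criterion (the function name and toy data are illustrative):

```python
import numpy as np

def k_means(points, k, max_iters=100, seed=0):
    """Plain k-means sketch. Stopping criterion: no membership change
    (or max_iters reached). Assumes no cluster ever becomes empty."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct datapoints as the initial centroids ("seeds").
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    assignments = np.full(len(points), -1)
    for _ in range(max_iters):
        # Step 2: assign each point to its closest centroid.
        dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        new_assignments = dists.argmin(axis=1)
        if np.array_equal(new_assignments, assignments):
            break  # Step 4: no membership change -> stop.
        assignments = new_assignments
        # Step 3: move each centroid to the mean of its members.
        centroids = np.array([points[assignments == c].mean(axis=0) for c in range(k)])
    return assignments, centroids

pts = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
labels, cents = k_means(pts, k=2)
print(labels, cents)
```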
What is the output of k-means clustering algorithm? How would new value be assigned?
The datapoints, the clusters they belong to, and the final centroids.
A new value is assigned to whichever centroid it is closest to.
What makes a good cluster? (informally)
Member datapoints are close together and far from other clusters.
How do we measure cluster quality?
The Inertia Score: inertia tells how far the data points within a cluster are from their centroid. It ranges upward from 0, with lower values being desirable.
Silhouette Width: considers both intra-cluster and inter-cluster distances to determine whether a given point is well placed. It ranges from -1 to 1, with 1 being desirable.
How do we calculate inertia?
The sum of squared distances between each datapoint and its assigned centroid, over all clusters.
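A quick inertia computation on toy data, using the sum-of-squared-distances convention (the one scikit-learn's `inertia_` attribute reports; the points and centroids are invented):

```python
import numpy as np

points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
assignments = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 0.5], [5.5, 5.0]])

# Squared distance from each point to its own centroid, summed over all points.
inertia = np.sum(np.linalg.norm(points - centroids[assignments], axis=1) ** 2)
print(inertia)  # → 1.0
```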
How do you calculate silhouette width?
s(i) = (b(i) - a(i)) / max(a(i), b(i))
for point i, where a(i) is the average distance from i to the other points in its own cluster, and b(i) is the average distance from i to the points in the next closest cluster.
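The formula applied to one point of a toy dataset (all values are illustrative):

```python
import numpy as np

points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
labels = np.array([0, 0, 1, 1])
i = 0  # compute s(i) for the first point

# a(i): mean distance from i to the other points in its own cluster.
a = np.mean([np.linalg.norm(points[i] - points[j])
             for j in range(len(points)) if labels[j] == labels[i] and j != i])
# b(i): mean distance from i to the points of the next closest cluster.
b = np.mean([np.linalg.norm(points[i] - points[j])
             for j in range(len(points)) if labels[j] != labels[i]])
s = (b - a) / max(a, b)
print(round(s, 3))  # → 0.866
```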
How can we determine the number of clusters to have in k-means?
The Elbow Method or the Silhouette Method.
Explain The Elbow Method.
We want a low value of inertia and a small number of clusters (k). Inertia decreases as k increases. The “elbow point” in the inertia-k graph is a good choice because after that the change in the value of inertia is not significant.
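A sketch of picking the elbow programmatically; the inertia values are invented for illustration, and "largest second difference" is just one simple heuristic for locating where the curve flattens:

```python
# Hypothetical inertia values measured for k = 1..6 (numbers invented for illustration).
inertias = [1200.0, 700.0, 250.0, 150.0, 130.0, 120.0]

# Drop in inertia when moving from k clusters to k + 1.
drops = [inertias[i] - inertias[i + 1] for i in range(len(inertias) - 1)]
# The elbow is where the drop slows down most: the largest second difference.
elbow_k = max(range(len(drops) - 1), key=lambda i: drops[i] - drops[i + 1]) + 2
print(elbow_k)  # → 3
```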
Explain The Silhouette Method (for determining appropriate k).
Compute the silhouette widths for different values of k; the k whose silhouette values lie mostly towards 1, with fewer outliers towards -1, is the better choice.
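A NumPy-only sketch comparing mean silhouette width across candidate clusterings (the labelings stand in for what k-means might return at each k; the data is made up):

```python
import numpy as np

def mean_silhouette(points, labels):
    """Mean silhouette width over all points (assumes every cluster has >= 2 points)."""
    n = len(points)
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        a = dist[i, own].mean()  # mean distance to the rest of i's cluster
        b = min(dist[i, [j for j in range(n) if labels[j] == c]].mean()
                for c in set(labels) if c != labels[i])  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
# Labelings standing in for what k-means might return at k = 2 and k = 3.
s_k2 = mean_silhouette(points, [0, 0, 0, 1, 1, 1])
s_k3 = mean_silhouette(points, [0, 0, 1, 1, 2, 2])
print(s_k2 > s_k3)  # → True
```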
What is association rule learning?
A rule-based machine learning method that finds associations between attributes, creating rules that can predict any attribute or combination of attributes.
How is association rule learning different to k-means?
There are no clusters / classes
What is the name for identified rules and format? What is the itemset?
Association Rules
{Antecedent} => {Consequent}
I.e. {data we find} => {data that often occurs at the same time}, e.g. {bread, butter} => {milk}.
The itemset is the set of all antecedent and consequent items for a given rule.