Unsupervised Learning and Probabilistic Inference Flashcards by Khush Damani

In ML, what do we start with?

Observations (dataset) of a system or phenomenon

What is the aim when given a dataset in ML?

To discover patterns in the data that lead to general, usable knowledge about the phenomenon

What is a key limitation of learning from data?

A model can only learn what is present in the data

What else limits a machine learning model besides the data?

The choice of model

What matters more than the quantity of data?

The quality of data

What questions should we ask about data quality?

Is the data representative of the underlying phenomenon?

Are the features sensible?

Have we included all the features we need?

Do we have observations from all regions of the input space?

What do rows in the data matrix represent?

Observations

What do columns in the data matrix represent?

Features
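
As a small illustration of this layout, the sketch below builds a made-up data matrix in NumPy. The values and the feature names are hypothetical and only serve to show which axis is which.

```python
import numpy as np

# Hypothetical data: 4 observations (rows) described by 3 features (columns).
# The feature names are illustrative only.
feature_names = ["height_cm", "weight_kg", "age_years"]
X = np.array([
    [170.0, 65.0, 30.0],
    [182.0, 80.0, 45.0],
    [158.0, 52.0, 23.0],
    [175.0, 72.0, 38.0],
])

n_observations, n_features = X.shape

first_observation = X[0, :]  # one row: every feature of a single observation
first_feature = X[:, 0]      # one column: a single feature across all observations
```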

What is the aim of unsupervised learning?

Not to make predictions, but to understand and/or gain insight into the phenomenon itself

What does unsupervised learning try to find?

Internal relationships and patterns

How are observations analysed in unsupervised learning?

By comparing observations to each other

What key question does unsupervised learning ask about data?

Do we have different high-level types of observations (clusters)?

What is another key goal of unsupervised learning besides clustering?

Dimensionality reduction

What is dimensionality reduction?

Representing the data matrix in a reduced form (fewer dimensions) while preserving the important information
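
The cards do not name a specific method, so the following is only a sketch of one common choice, principal component analysis (PCA) computed via the SVD. The function name `reduce_dimensions` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def reduce_dimensions(X, n_components):
    """Project X (observations x features) onto its top principal directions.

    Illustrative PCA sketch; returns the reduced matrix and the fraction
    of total variance that the kept components retain.
    """
    X_centered = X - X.mean(axis=0)
    # SVD of the centred data: the rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    retained = (S[:n_components] ** 2).sum() / (S ** 2).sum()
    return X_centered @ components.T, retained

# Hypothetical data: 3 features, where the third is almost a copy of the first,
# so two dimensions should capture nearly all of the variation.
rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = rng.normal(size=100)
X = np.column_stack([a, b, a + 0.01 * rng.normal(size=100)])

Z, retained = reduce_dimensions(X, n_components=2)
```

Here `Z` has the same number of rows (observations) as `X` but only two columns, and `retained` indicates how much of the variance survived the reduction.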

Why do we do dimensionality reduction?

To make the dataset more efficient and easier to analyse while losing as little important information as possible

What is K-Means clustering used for?

To group observations into K clusters based on similarity

What are the steps of K-Means clustering?

1. Choose the number of clusters (K)
2. Select K initial centroids (starting points)
3. Assign each point to the nearest centroid
4. Update centroid positions based on the assigned points
5. Repeat steps 3 and 4 until convergence
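
The steps above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function name `k_means`, the random choice of initial centroids, the Euclidean distance, and the `max_iters` and `tol` parameters are all assumptions.

```python
import numpy as np

def k_means(X, k, max_iters=100, tol=1e-6, seed=0):
    """Minimal K-Means sketch: returns (centroids, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Select K initial centroids: here, K distinct observations chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(max_iters):
        # Assign each point to the nearest centroid (Euclidean distance).
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update each centroid to the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[j] = members.mean(axis=0)
        # Stop once the centroids have (almost) stopped moving.
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, labels

# Hypothetical data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = k_means(X, k=2)
```

On this toy data the two groups end up in separate clusters. Which group receives label 0 depends on the random initialisation.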

When does K-Means clustering stop?

When convergence is reached (centroids stop changing significantly)

How do you perform one full iteration of K-Means clustering?

Assign each data point to the nearest centroid

Recalculate each centroid as the mean of its assigned points
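
A single iteration can be traced by hand on a tiny example. The points and starting centroids below are hypothetical, chosen so the arithmetic is easy to check, and Euclidean distance is assumed.

```python
import numpy as np

# Hypothetical points and starting centroids.
points = np.array([[1.0, 1.0], [2.0, 1.0], [8.0, 8.0], [9.0, 8.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])

# Assignment: index of the nearest centroid for each point.
distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
labels = distances.argmin(axis=1)  # -> [0, 0, 1, 1]

# Update: each centroid becomes the mean of the points assigned to it.
new_centroids = np.array(
    [points[labels == j].mean(axis=0) for j in range(len(centroids))]
)
# -> [[1.5, 1.0], [8.5, 8.0]]
```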

What calculation is used when updating a centroid?

The mean of all points assigned to that cluster

How do you assign a point to a centroid in K-Means?

Choose the centroid with the smallest distance to the point
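
Written as formulas, the assignment and update rules might look like this. The notation is an assumption, not taken from the cards: $x_i$ is an observation, $\mu_k$ the centroid of cluster $k$, $c_i$ the cluster assigned to $x_i$, and $C_k$ the set of observations assigned to cluster $k$. Euclidean distance is assumed.

```latex
% Assignment: pick the centroid with the smallest distance to the point
c_i = \arg\min_{k \in \{1, \dots, K\}} \lVert x_i - \mu_k \rVert

% Update: the centroid is the mean of the points assigned to its cluster
\mu_k = \frac{1}{\lvert C_k \rvert} \sum_{x_i \in C_k} x_i
```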

What is supervised learning?

Learning from observations paired with targets: observations are compared in order to distinguish them in terms of a target

What is unsupervised learning, in contrast?

Learning without labels, aiming to gain insight into the dataset

What is the aim of supervised learning?

To distinguish between inputs in a way that helps distinguish between their targets

What is the key difference between supervised and unsupervised learning?

Supervised learning uses targets; unsupervised learning does not.

What is probability used for in ML?

To deal with uncertainty.

How is probability interpreted?

As a degree of belief.

What defines a probabilistic model?

The joint distribution.

What probability rules are important?

Sum rule, product rule, and conditional probability.
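
For two discrete variables $x$ and $y$ (names assumed here for illustration), the standard forms of these rules are:

```latex
% Sum rule: recover the distribution of x by summing the joint over y
p(x) = \sum_{y} p(x, y)

% Product rule: factor the joint into a conditional and a marginal
p(x, y) = p(y \mid x)\, p(x)

% Conditional probability: the product rule rearranged, valid when p(x) > 0
p(y \mid x) = \frac{p(x, y)}{p(x)}
```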

What is marginalisation?

Propagating belief or uncertainty by summing over variables.
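
A small numeric sketch of marginalisation, using a made-up joint distribution over two binary variables. The variable names (`weather`, `traffic`) and the probabilities are hypothetical.

```python
import numpy as np

# Hypothetical joint distribution p(weather, traffic).
# Rows: weather (sunny, rainy). Columns: traffic (light, heavy).
joint = np.array([
    [0.30, 0.10],  # sunny
    [0.20, 0.40],  # rainy
])

# Marginalise out traffic: sum over the traffic axis to get p(weather).
p_weather = joint.sum(axis=1)  # -> [0.4, 0.6]

# Marginalise out weather: sum over the weather axis to get p(traffic).
p_traffic = joint.sum(axis=0)  # -> [0.5, 0.5]

# Conditional p(traffic | weather = rainy): joint row divided by its marginal.
p_traffic_given_rainy = joint[1] / p_weather[1]  # -> [1/3, 2/3]
```

Because the joint distribution defines the model, both marginals and the conditional are obtained from the same table.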

Unsupervised Learning and Probabilistic Inference Flashcards

(30 cards)