ML Fundamentals Principles, Supervised and Unsupervised Learning Flashcards

(17 cards)

1
Q

What does the cost function measure?

A

Mismatch between model and data

2
Q

What are the two types of supervised learning?

A

Regression
Classification

3
Q

What is the core assumption behind k-NN (the smoothness assumption)?

A

If observations are close in the input space, they are also close in the output space

4
Q

What are the steps in k-NN?

A

Compute distances from new point to all training points.

Pick the k closest.

Classification → majority vote

Regression → average their outputs
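
The steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: it assumes Euclidean distance (the card does not name a metric), and the function names and toy data are made up for the example.

```python
import math
from collections import Counter


def k_nearest_outputs(train_X, train_y, query, k):
    """Return the outputs of the k training points closest to `query`."""
    # Step 1: distance from the new point to every training point.
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Step 2: keep the k closest.
    distances.sort(key=lambda pair: pair[0])
    return [y for _, y in distances[:k]]


def knn_classify(train_X, train_y, query, k=3):
    """Classification: majority vote among the k nearest labels."""
    neighbours = k_nearest_outputs(train_X, train_y, query, k)
    return Counter(neighbours).most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3):
    """Regression: average of the k nearest outputs."""
    neighbours = k_nearest_outputs(train_X, train_y, query, k)
    return sum(neighbours) / len(neighbours)


# Hypothetical toy data: two well-separated groups of 2-D points.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y_class = ["a", "a", "a", "b", "b", "b"]
train_y_value = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

label = knn_classify(train_X, train_y_class, (0.5, 0.5), k=3)
value = knn_regress(train_X, train_y_value, (5.5, 5.5), k=3)
```

An odd k with two classes avoids tied votes; with more classes, ties are possible and this sketch simply returns the label that `Counter` encountered first.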

5
Q

What is overfitting?

A

Tuning the model parameters too closely to the noise in the measurements, which prevents the model from generalizing well

Low training error, high test error
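
One way to see this pattern is with a model that memorises its training data. The sketch below uses a 1-nearest-neighbour predictor as a stand-in for an overly flexible model; that choice, the helper names, and the hand-made noisy data are all assumptions for illustration.

```python
def one_nn_predict(train_x, train_y, x):
    """Copy the output of the single closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def mse(xs, ys, predict):
    """Mean squared error of `predict` on the given points."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up measurements of the relation y = x, with noise added by hand.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [0.5, 0.4, 2.6, 2.5, 4.6, 4.4]

# Held-out points that follow the underlying relation y = x.
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [0.4, 1.4, 2.4, 3.4, 4.4]


def predict(x):
    return one_nn_predict(train_x, train_y, x)


# Every training point is its own nearest neighbour, so the noise is
# reproduced exactly and the training error is zero.
train_error = mse(train_x, train_y, predict)
# On unseen points the memorised noise shows up as error.
test_error = mse(test_x, test_y, predict)
```

Here the training error is exactly zero while the test error is clearly larger, which is the signature described on the card.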

6
Q

What is underfitting?

A

Using a model that is too simple (e.g. a straight line for non-linear data) to make good predictions

High training error, high test error
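
The straight-line example from the card can be reproduced in a few lines. This is a sketch under stated assumptions: the data is generated from y = x² purely for illustration, and `fit_line` and `mse` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(xs, ys, predict):
    """Mean squared error of `predict` on the given points."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Illustrative non-linear data: y = x squared, with no noise at all.
train_x = [-2.0, -1.0, 0.0, 1.0, 2.0]
train_y = [x ** 2 for x in train_x]
test_x = [-1.5, -0.5, 0.5, 1.5]
test_y = [x ** 2 for x in test_x]

slope, intercept = fit_line(train_x, train_y)


def line(x):
    return slope * x + intercept


# The best straight line is flat (slope 0), so it misses the curve
# on the training data and on the held-out data alike.
train_error = mse(train_x, train_y, line)
test_error = mse(test_x, test_y, line)
```

Even though the data contains no noise, both errors stay well above zero, because no straight line can follow a parabola; a quadratic model would fit these points exactly.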

7
Q

How do we test how well a model generalises?

A

The model is trained on the training set, and its performance is evaluated by how well it predicts outputs for the test set.

Low test error = good generalisation
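
A hold-out evaluation along these lines might look like the sketch below. The split fraction, the seed, the helper name `train_test_split`, and the one-parameter model y = w·x are all assumptions chosen to keep the example short.

```python
import random


def train_test_split(xs, ys, test_fraction=0.25, seed=0):
    """Shuffle the paired data and hold out a fraction for testing."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(xs) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [xs[i] for i in train_idx],
        [ys[i] for i in train_idx],
        [xs[i] for i in test_idx],
        [ys[i] for i in test_idx],
    )


# Hypothetical data that follows y = 3x exactly.
xs = [float(i) for i in range(8)]
ys = [3.0 * x for x in xs]

x_train, y_train, x_test, y_test = train_test_split(xs, ys)

# Train: least-squares estimate of w in y = w * x, using training data only.
w = sum(x * y for x, y in zip(x_train, y_train)) / sum(x * x for x in x_train)

# Evaluate: mean squared error on the held-out test set.
test_mse = sum((w * x - y) ** 2 for x, y in zip(x_test, y_test)) / len(x_test)
```

Because the model family matches the data here, the test error comes out at (essentially) zero; the key point is that the test points play no part in estimating `w`.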

8
Q

What is Ockham’s razor?

A

A principle suggesting that, among explanations (or models) that fit the data equally well, the simplest one is usually the best

9
Q

What is the goal of unsupervised learning?

A

The goal is not prediction but insight into the phenomenon itself, gained by finding internal relationships and patterns in the data

10
Q

What is clustering?

A

Finding groups of similar observations

11
Q

What is dimensionality reduction?

A

Representing the data with fewer features (dimensions) while preserving as much of its structure as possible

12
Q

What are the steps of the K-Means clustering process?

A

Choose the number of clusters (K).
Select K initial “centroids”.
Assign points to the nearest centroid.
Update each centroid to the mean of the points assigned to it.
Repeat the assign and update steps until the centroids stop moving (convergence).
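
The loop above can be sketched in plain Python. This is an illustrative version only: it assumes Euclidean distance, takes the initial centroids as an argument (here hand-picked so the run is reproducible, whereas in practice they are often chosen at random), and the name `k_means` and the toy points are made up.

```python
import math


def k_means(points, initial_centroids, max_iters=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        # Convergence: nothing moved, so assignments cannot change either.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D data with two obvious groups; K = 2.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(points, initial_centroids=[points[0], points[3]])
```

If `max_iters` is reached before convergence, the returned clusters reflect the assignment made just before the final centroid update.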

13
Q

What does supervised learning mean?

A

The dataset contains inputs paired with their correct outputs (labels)

14
Q

How do you compute the test error?

A

$\sum_{i} \left(y_{\text{test},i} - y_{\text{pred},i}\right)^2$
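
The sum of squared differences on this card can be computed directly. The function name and the three sample values below are made up for illustration; dividing the sum by the number of test points gives the mean squared error, a common variant that the card itself does not mention.

```python
def sum_squared_error(y_test, y_pred):
    """Sum over the test set of (true output - predicted output) squared."""
    if len(y_test) != len(y_pred):
        raise ValueError("y_test and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_test, y_pred))


# Hypothetical true outputs and model predictions for three test points.
y_test = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.0, 8.0]

sse = sum_squared_error(y_test, y_pred)  # 0.25 + 0.0 + 1.0 = 1.25
mse = sse / len(y_test)                  # average error per test point
```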

15
Q

What are the sources of bias in machine learning?

A

Biased datasets
Wrong labels
Missing data
Unbalanced data
Poor feature choice

16
Q

What matters more, data quality or quantity?

17
Q

Supervised vs Unsupervised Learning

A

Supervised: the data has labels, and the goal is to predict a target

Unsupervised: the data has no labels, and the goal is to understand the structure of the data