In ML, what do we start with
Observations (dataset) of a system or phenomenon
What is the aim when given a dataset in ML
To discover patterns that lead to general knowledge we can use
What is a key limitation of learning from data
A model can only learn what is present in the data
What else limits a machine learning model besides the data
The choice of model
What matters more than the quantity of data
The quality of data
What questions should we ask about data quality
Is the data representative of the underlying phenomenon
Are the features sensible
Have we included all the features we need to
Do we have observations from all regions of the input space
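One rough way to check the last question is to count how many observations fall into each interval of a feature and look for empty intervals. The sketch below assumes a single numeric feature; the `ages` values, the range 20 to 70, and the helper name `count_per_bin` are all made up for illustration.

```python
def count_per_bin(values, low, high, n_bins):
    """Count how many values fall into each of n_bins equal-width intervals."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if low <= v <= high:
            # The top edge (v == high) is placed in the last bin.
            index = min(int((v - low) / width), n_bins - 1)
            counts[index] += 1
    return counts

# Hypothetical feature values (assumed data, not from the notes).
ages = [21, 23, 25, 28, 31, 34, 62, 65, 70]

counts = count_per_bin(ages, low=20, high=70, n_bins=5)
empty_bins = [i for i, c in enumerate(counts) if c == 0]

print(counts)      # -> [4, 2, 0, 0, 3]
print(empty_bins)  # -> [2, 3]  (no observations between 40 and 60)
```

A model trained on this data has seen nothing between 40 and 60, so anything it says about that region is extrapolation.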
What do rows in the data matrix represent
Observations
What do columns in the data matrix represent
Features
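The row and column convention can be sketched with a small list of lists. The feature names and numbers below are assumptions chosen only to show the layout.

```python
# Hypothetical toy dataset: each inner list is one observation (row),
# and each position within it is one feature (column).
feature_names = ["height_cm", "weight_kg", "age_years"]  # assumed names
data_matrix = [
    [170.0, 65.0, 34.0],
    [182.0, 80.0, 45.0],
    [158.0, 52.0, 29.0],
    [175.0, 72.0, 51.0],
]

n_observations = len(data_matrix)   # number of rows
n_features = len(data_matrix[0])    # number of columns

# Row access: everything measured for one observation.
second_observation = data_matrix[1]

# Column access: one feature across all observations.
weight_column = [row[1] for row in data_matrix]

print(n_observations, n_features)  # -> 4 3
print(second_observation)          # -> [182.0, 80.0, 45.0]
print(weight_column)               # -> [65.0, 80.0, 52.0, 72.0]
```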
What is the aim of unsupervised learning
Not to make predictions, but to understand and/or gain insight into the phenomenon itself
What does unsupervised learning try to find
Internal relationships and patterns
How are observations analysed in unsupervised learning
By comparing observations to each other
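Comparing observations to each other usually comes down to a distance or similarity measure. The sketch below assumes Euclidean distance, which the notes do not specify, and uses made-up observations.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two observations of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical observations with two features each (assumed data).
observations = [[1.0, 1.0], [1.0, 2.0], [8.0, 8.0]]

# distances[i][j] is the distance between observation i and observation j.
distances = [[euclidean(a, b) for b in observations] for a in observations]

for row in distances:
    print([round(d, 2) for d in row])
```

The first two observations are 1.0 apart while the third is about 9.9 away from the first, which is the kind of contrast a clustering method picks up on.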
What key question does unsupervised learning ask about data
Do we have different high-level types of observations (clusters)
What is another key goal of unsupervised learning besides clustering
Dimensionality reduction
What is dimensionality reduction
Representing the data matrix in a reduced form while preserving important info
Why do we do dimensionality reduction
To make the dataset more efficient and easier to analyse without losing important information
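As one possible illustration, the sketch below reduces two features to one by projecting each observation onto the direction of greatest variance (the idea behind principal component analysis, which is an assumed choice of technique here, not one named in the notes). The data and the function name `project_to_1d` are hypothetical.

```python
import math

def project_to_1d(points):
    """Project 2-D points onto their direction of greatest variance."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centred = [(p[0] - mean_x, p[1] - mean_y) for p in points]

    # Entries of the 2x2 covariance matrix [[var_x, cov_xy], [cov_xy, var_y]].
    var_x = sum(x * x for x, _ in centred) / n
    var_y = sum(y * y for _, y in centred) / n
    cov_xy = sum(x * y for x, y in centred) / n

    # Angle of the axis along which the variance is largest.
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    direction = (math.cos(theta), math.sin(theta))

    # One coordinate per observation instead of two.
    scores = [x * direction[0] + y * direction[1] for x, y in centred]

    # Fraction of the total variance that the single coordinate keeps.
    kept = sum(s * s for s in scores) / n
    return scores, kept / (var_x + var_y)

# Hypothetical data where the two features are strongly correlated.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
scores, fraction_kept = project_to_1d(points)

print([round(s, 2) for s in scores])
print(round(fraction_kept, 4))
```

Because the two features move together, one number per observation keeps almost all of the variance, so the reduced form is half the size with very little lost.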
What is K-Means clustering used for
To group observations into K clusters based on similarity
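What are the steps of K-Means clustering
Choose K initial centroids, assign each point to its nearest centroid, recalculate each centroid as the mean of its assigned points, and repeat until convergence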
When does K-Means clustering stop
When convergence is reached (centroids stop changing significantly)
How do you perform one full iteration of K-Means clustering
Assign each data point to the nearest centroid
Recalculate each centroid as the mean of its assigned points
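The two steps of one iteration can be sketched as a single function. This is a minimal version assuming Euclidean distance; the name `kmeans_iteration`, the sample points, and the choice to leave an empty cluster's centroid where it was are all assumptions.

```python
import math

def kmeans_iteration(points, centroids):
    """Run one assignment step and one update step of K-Means."""
    # Step 1: assign each point to the index of its nearest centroid.
    labels = []
    for p in points:
        dists = [math.sqrt(sum((a - b) ** 2 for a, b in zip(p, c)))
                 for c in centroids]
        labels.append(dists.index(min(dists)))

    # Step 2: recalculate each centroid as the mean of its assigned points.
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # Assumption: a centroid with no points stays where it was.
            new_centroids.append(list(old))
            continue
        new_centroids.append(
            [sum(m[d] for m in members) / len(members) for d in range(len(old))]
        )
    return labels, new_centroids

# Hypothetical data: two obvious groups and two rough starting centroids.
points = [[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0]]
centroids = [[0.0, 0.0], [10.0, 10.0]]

labels, new_centroids = kmeans_iteration(points, centroids)
print(labels)         # -> [0, 0, 1, 1]
print(new_centroids)  # -> [[1.0, 1.5], [8.5, 8.0]]
```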
What calculation is used when updating a centroid
The mean of all points assigned to that cluster
How do you assign a point to a centroid in K-Means
Choose the centroid with the smallest distance to the point
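Putting the nearest-centroid rule, the mean update, and the stopping condition together, a complete run might look like the sketch below. Several details are assumptions rather than part of the notes: Euclidean distance, using the first K observations as initial centroids, the `tolerance` and `max_iterations` parameters, and the sample data.

```python
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, max_iterations=100, tolerance=1e-6):
    # Assumption: start from the first k observations (simple and deterministic).
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assign each point to the centroid with the smallest distance.
        labels = [min(range(k), key=lambda j: distance(p, centroids[j]))
                  for p in points]
        # Update each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        # Convergence: stop once no centroid moved by more than the tolerance.
        shift = max(distance(old, new)
                    for old, new in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tolerance:
            break
    return labels, centroids

# Hypothetical data with two well-separated groups.
points = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
          [8.0, 8.0], [9.0, 8.0], [8.0, 9.0]]
labels, centroids = kmeans(points, k=2)

print(labels)  # -> [0, 0, 0, 1, 1, 1]
print([[round(c, 2) for c in centroid] for centroid in centroids])
# -> [[1.33, 1.33], [8.33, 8.33]]
```

Both starting centroids sit in the same group here, yet the run still separates the two groups within three iterations. On less clean data the result can depend on the starting centroids.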
What is supervised learning
Learning from labelled observations by comparing them in order to distinguish them in terms of a target
What is unsupervised learning in contrast
Learning without labels, aiming to get an insight into the dataset
What is the aim of supervised learning
To distinguish between inputs in a way that helps distinguish between their targets
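One simple way to see inputs being compared in order to separate targets is a nearest-neighbour prediction: a new input takes the target of the most similar labelled observation. This is only one assumed example of a supervised method; the function name, the Euclidean distance, and the data are hypothetical.

```python
import math

def predict_nearest_neighbour(train_inputs, train_targets, query):
    """Return the target of the training input closest to the query."""
    distances = [math.sqrt(sum((a - b) ** 2 for a, b in zip(x, query)))
                 for x in train_inputs]
    nearest = distances.index(min(distances))
    return train_targets[nearest]

# Hypothetical labelled dataset: inputs (features) paired with targets.
train_inputs = [[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 7.5]]
train_targets = ["small", "small", "large", "large"]

prediction_a = predict_nearest_neighbour(train_inputs, train_targets, [2.0, 1.5])
prediction_b = predict_nearest_neighbour(train_inputs, train_targets, [7.0, 9.0])

print(prediction_a)  # -> small
print(prediction_b)  # -> large
```

Unlike the K-Means case, the targets are given up front, so the comparison between observations is used to predict a label rather than to discover groups.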