What is machine learning?
What are some example uses of machine learning?
What is unsupervised learning?
Techniques where the machine is NOT given labels, or corresponding outputs.
- The machine will detect patterns from the data with no example to rely on.
The dataset containing the data to learn from is called an unlabelled dataset.
What is supervised learning?
Techniques where the machine is given inputs and corresponding outputs to learn from.
- The machine will try to adjust parameters to make the best prediction of the output when given an input.
The dataset containing inputs and corresponding outputs is called a labelled dataset.
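A minimal sketch of "adjusting parameters to make the best prediction", assuming a made-up labelled dataset and a straight-line model fitted by gradient descent on the mean squared error. The data, the learning rate and the names `w`, `b` and `predict` are illustrative assumptions, not part of these notes.

```python
import numpy as np

# Hypothetical labelled dataset: inputs x and corresponding outputs y.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])  # generated as y = 2x + 1

# Parameters the machine adjusts: slope w and intercept b.
w, b = 0.0, 0.0
learning_rate = 0.05

for _ in range(2000):
    prediction = w * x + b
    error = prediction - y
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Move each parameter a small step in the direction that reduces the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

def predict(new_x):
    """Predict the output for a new input using the learned parameters."""
    return w * new_x + b
```

After the loop, `w` and `b` should be close to the values used to generate the data, so `predict` can be applied to inputs that were not in the dataset.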
What is reinforcement learning?
The machine learns through trial and error.
- The method includes a feedback loop with rewards. While attempting trials, the machine tries to maximise the rewards.
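The feedback loop can be illustrated with a very small sketch, assuming a made-up environment with two possible actions and an epsilon-greedy strategy (one simple choice among many). The reward probabilities and the value of `epsilon` are assumptions for illustration only.

```python
import random

random.seed(0)

# Hypothetical environment: two actions with different chances of a reward of 1.
reward_probability = [0.2, 0.8]

estimates = [0.0, 0.0]  # running estimate of the average reward of each action
counts = [0, 0]         # number of times each action has been tried
epsilon = 0.1           # fraction of trials spent exploring at random

for _ in range(5000):
    if random.random() < epsilon:
        action = random.randrange(2)  # explore: try a random action
    else:
        # Exploit: pick the action with the highest estimated reward so far.
        action = 0 if estimates[0] >= estimates[1] else 1
    # Feedback from the environment: a reward of 1 or 0.
    reward = 1.0 if random.random() < reward_probability[action] else 0.0
    counts[action] += 1
    # Update the running average reward for the chosen action.
    estimates[action] += (reward - estimates[action]) / counts[action]
```

Over many trials the machine ends up choosing the better-rewarded action far more often, which is how it maximises its rewards.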
What is a good algorithm?
An algorithm capable of making the correct prediction.
What is the goal of machine learning and how is this tested?
What kind of model fit do you not want?
You do not want a model that is fitted to the random variations (noise) of your data.
What is underfitting and overfitting of a model?
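Under: Not enough parameters to correctly predict Y (e.g. the model is linear when the relationship is not).
Over: Too many parameters; the model follows the random variations of the data (touches every data point), giving small residuals on the training data but poor predictions on new data.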
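A short sketch of both cases, assuming made-up data with a quadratic trend plus noise and polynomial models of different degrees. The data and the function name `training_rss` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a quadratic trend plus random noise.
x = np.linspace(-1, 1, 8)
y = x**2 + rng.normal(scale=0.05, size=x.size)

def training_rss(degree):
    """Residual sum of squares of a polynomial fit, measured on the training points."""
    coefficients = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coefficients, x)
    return float(np.sum(residuals**2))

rss_under = training_rss(1)  # straight line: too few parameters for a curve
rss_good = training_rss(2)   # same shape as the underlying trend
rss_over = training_rss(7)   # 8 parameters for 8 points: touches every data point
```

The degree-7 fit has residuals of essentially zero, but only because it has also fitted the noise; a small training residual on its own does not show that a model predicts new data well.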
What are the three components the machine learns with?
What are some challenges of machine learning?
What is an example of a model within unsupervised learning?
K-means clustering
What is the intuition behind K-means clustering?
If we know we have K groups (clusters) in our dataset, we can try to group the observations so that the distance among observations:
- within a group is the smallest possible
- between groups is the largest possible.
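The intuition can be checked numerically. The sketch below assumes squared Euclidean distance to the group mean as the measure of distance, and compares a sensible grouping with a poor one on made-up points. The function name `within_and_between` and the data are illustrative assumptions.

```python
import numpy as np

# Hypothetical 2-D observations forming two visibly separate groups (K = 2).
points = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
                   [8.0, 8.0], [8.0, 9.0], [9.0, 8.0]])

def within_and_between(points, labels):
    """Return (within-group, between-group) sums of squared distances."""
    overall_mean = points.mean(axis=0)
    within = 0.0
    between = 0.0
    for k in np.unique(labels):
        members = points[labels == k]
        centre = members.mean(axis=0)
        # Spread of the observations around their own group centre.
        within += np.sum((members - centre) ** 2)
        # Distance of the group centre from the overall centre, weighted by group size.
        between += len(members) * np.sum((centre - overall_mean) ** 2)
    return float(within), float(between)

good_labels = np.array([0, 0, 0, 1, 1, 1])  # groups follow the two blobs
bad_labels = np.array([0, 1, 0, 1, 0, 1])   # groups mix the two blobs

good_within, good_between = within_and_between(points, good_labels)
bad_within, bad_between = within_and_between(points, bad_labels)
```

The good grouping has the smaller within-group value and the larger between-group value. The two values always add up to the same total for a given dataset, so reducing one increases the other.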
How do we estimate in K-means clustering?
How do we assess accuracy, sensitivity and specificity?
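By training the model on a training dataset and checking its predictions on a separate test dataset. This is typically done by splitting the original dataset into two random groups: 70% to train the model, 30% to test it.
Create a confusion matrix from the test predictions to calculate them.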
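A minimal sketch of the split and of the confusion-matrix counts, assuming a made-up binary dataset and a simple threshold as a stand-in for a real model. All names and data here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical labelled dataset: 100 observations, one feature, binary label.
features = rng.normal(size=100)
labels = (features > 0).astype(int)

# Randomly assign 70% of the observations to training and 30% to testing.
shuffled = rng.permutation(len(features))
n_train = round(0.7 * len(features))
train_idx, test_idx = shuffled[:n_train], shuffled[n_train:]

def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)
    tp = int(np.sum((actual == 1) & (predicted == 1)))
    tn = int(np.sum((actual == 0) & (predicted == 0)))
    fp = int(np.sum((actual == 0) & (predicted == 1)))
    fn = int(np.sum((actual == 1) & (predicted == 0)))
    return tp, tn, fp, fn

# Stand-in "model": a threshold chosen from the training data only.
threshold = features[train_idx].mean()
test_predictions = (features[test_idx] > threshold).astype(int)

# The confusion matrix is built from the test observations only.
tp, tn, fp, fn = confusion_counts(labels[test_idx], test_predictions)
```

The four counts always add up to the number of test observations (30 here).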
What is accuracy?
Proportion of observations correctly labelled by the algorithm.
= (TP+TN) / (TP+TN+FP+FN)
What is sensitivity?
Proportion of observations that truly belong to a category and are correctly predicted to belong to it (true positive rate).
= TP / (TP+FN)
What is specificity?
Proportion of observations that truly do NOT belong to a category and are correctly predicted not to belong to it (true negative rate).
= TN / (TN+FP)
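The three formulas can be written as small functions. The counts used below are made up for illustration and do not come from any real model.

```python
def accuracy(tp, tn, fp, fn):
    """(TP+TN) / (TP+TN+FP+FN)"""
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    """TP / (TP+FN)"""
    return tp / (tp + fn)

def specificity(tn, fp):
    """TN / (TN+FP)"""
    return tn / (tn + fp)

# Hypothetical confusion-matrix counts for a test set of 30 observations.
tp, tn, fp, fn = 12, 10, 5, 3

acc = accuracy(tp, tn, fp, fn)  # 22 / 30
sens = sensitivity(tp, fn)      # 12 / 15
spec = specificity(tn, fp)      # 10 / 15
```

With these counts the model finds most of the observations that belong to the category (sensitivity 0.8) but is weaker at ruling out those that do not (specificity about 0.67).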
What two methods can we use if we don’t know the number of clusters in our data (for K-means clustering)?