What is machine learning?
using data to teach algorithms to predict outcomes they have never seen before. Steps
What are the three steps in machine learning?
Give 5 differences between statistics and machine learning?
How does the goal of statistics differ from machine learning goal?
understanding relationship VS making accurate predictions
How does the central question in statistics differ from that in machine learning?
How does X relate to Y? VS Given X, what is Y?
How does the evaluation of statistics differ from that of machine learning?
Coefficients, p-values VS Test-set accuracy, error rate
How does the style of statistics differ from that of machine learning?
Transparent but rigid VS Flexible but often opaque
How does the approach of statistics differ from that of machine learning?
Model assumptions VS Algorithms that learn patterns
What are 4 important concepts in machine learning?
What are the two types of machine learning?
Give 4 ways in which you move from inference to prediction?
What is a regression tree?
flowchart predicting a continuous outcome by splitting data into groups by asking a series of yes/no questions and splits data step by step. Each endpoint than gives a prediction, which is the average outcome for the observations that end up there.
What are three characteristics of regression treess?
What are the two elements of a regression tree?
What are the steps in which a regression tree works?
Give three key advantags of a regression tree?
How do you measure prediction quality of algorithms?
RMSE = SQRT(1/n * Sum of (Actual value – predicted value)^2
Lower RMSE means better predictions.
What is meant with model complexity?
basically refers to the fact that more leaves in a regression tree leads to catching more intricate patterns.
What s the difference between a training set and a test set?
Training set: set of data that is used to estimate (train) the model
Test set: hiding during estimation, used only to evaluate performance
What is meant with overfitting?
model learns training data too well including its noise and peculiarities.
What is the sweet spot in ML algorithm development?
number of leaves of the regression tree where the RMSE of test data is lowest
How does classification using logistic regression work?
Classification is done using Logistic regression. Changes in classification opposed to testing regression trees’ RMSE:
1. Prediction: each leaf predicts a class instead of a number
2. Splitting criterion: we want each split to make groups as pure as possible instead of minimizing prediction error. Purity is measured by entropy
What is entropy?
Measure of how mixed a group is
What are the different gradations in entropy and what do they mean?