What does it mean to train / fit a model?
Letting an algorithm look at your data repeatedly and adjust its internal numbers for more accurate predictions
You provide examples, and the model learns the rule by itself.
In the context of training a model, what is RMSE?
Root Mean Squared Error
RMSE summarizes errors into one number; smaller RMSE indicates a better fit.
What are the parameters in the model equation y = mx + c?
Parameters are values learned directly from the data during training.
Fill in the blank: A hyperparameter is something the algorithm cannot learn from the data by itself and must be chosen by the _______.
data scientist
Examples include the number of clusters or the depth of a decision tree.
True or false: The algorithm learns hyperparameters directly from the data.
FALSE
Hyperparameters are set by the human and cannot be learned by the algorithm.
What is the process of how a machine learns during training?
This repeated adjusting is called training or fitting the model.
What is the goal of the algorithm during training?
Find m and c that give the smallest RMSE
The smaller the RMSE, the better the fit.
What is an example of a simple relationship used in training a model?
Temperature → ice-cream revenue
This relationship can be modeled using a straight line.
What does the error represent in the context of model training?
The distance between the real value and the predicted value
These vertical distances are used to calculate RMSE.
What is the first step in the training process of a model?
Choose a model (for example, a straight line)
This sets the foundation for adjusting parameters during training.
What is the training set used for in model development?
Learning pile
This is where the model tries rules, makes mistakes, and fixes itself.
What is the purpose of the validation set?
Choosing and tuning pile
It is used to answer questions about model performance and settings.
True or false: Once data is used for validation, it can still be considered unseen data.
FALSE
Validation data is no longer ‘unseen’ after it is used.
What is the test set meant to simulate?
Real-world check
It is used to evaluate the model after training and validation.
Why do we keep some data hidden during model training?
To prevent overfitting
Models can perform well on training data but poorly on new data.
In supervised learning, what is a common split for the data?
Exact numbers may vary, but the concept remains the same.
What is the difference between validation and test sets?
Testing on validation data can lead to biased results.
In unsupervised learning, what is the typical data split?
Validation is often skipped in unsupervised learning.
Ultra-short summary: What does each data set represent?
This summarizes the purpose of each data set in model development.
What is a performance metric?
A number that tells you how wrong your model is or how often it is right
Performance metrics are essential for evaluating model effectiveness.
Name the two cases for performance metrics.
Regression predicts a number, while classification predicts a class or label.
In regression, what does MAE stand for?
Mean Absolute Error
MAE answers the question: ‘On average, how wrong am I?’
What is the purpose of MSE in regression?
Mean Squared Error
MSE emphasizes larger errors more than smaller ones.
What does RMSE represent?
Root Mean Squared Error
RMSE is preferred in practice as it returns to normal units.