What is training data used for?
to train the algorithm
3 steps: train, evaluate, test
During training what 2 data sets are available?
How is a data set split into test and training sets?
split more or less randomly, making sure to capture important classes up front
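A minimal sketch of one way to do such a split, assuming "capturing important classes" means each class should appear in both sets (a stratified split). The function name `stratified_split`, the 25% test fraction and the toy data are illustrative assumptions, not from these notes.

```python
import random
from collections import defaultdict

def stratified_split(rows, labels, test_fraction=0.25, seed=0):
    """Randomly split (row, label) pairs so every class lands in both sets.

    test_fraction=0.25 is an arbitrary illustrative choice.
    """
    rng = random.Random(seed)

    # Group the examples by class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append((row, label))

    train, test = [], []
    for items in by_class.values():
        rng.shuffle(items)  # random order within each class
        # Keep at least one example of each class in the test set.
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])

    # Shuffle again so the classes are not grouped together.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test

# Toy data: 16 examples of a common class, 4 of a rare one.
rows = list(range(20))
labels = ["common"] * 16 + ["rare"] * 4
train, test = stratified_split(rows, labels)
```

A purely random split could leave the rare class out of the test set by chance; splitting class by class avoids that.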
What percentage would training and testing sets be split into?
N-fold cross-validation
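A rough sketch of how N-fold cross-validation divides the data: the samples are cut into N folds, and each fold takes one turn as the test set while the rest are used for training. The helper name `k_fold_indices` and the 10-sample, 5-fold example are assumptions for illustration.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(n_samples=10, n_folds=5))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

In practice the data is usually shuffled before the folds are cut, and the N test scores are averaged into a single estimate.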
What bias does cross-validation hold?
Cross-validation is almost unbiased
What is a confusion matrix used for?
It is used to describe the performance of a classification model on test data for which the true values are known
True positives (TP) means
predicted yes = actual yes
True negatives (TN) means
predicted no = actual no
False positives (FP) means…
predicted yes, actual no
False negatives (FN) means…
predicted no, actual yes
How is accuracy measured in a confusion matrix?
(TP + TN) / total number of predictions
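A small worked sketch of the four confusion-matrix counts and the accuracy formula. The function name `confusion_counts` and the yes/no toy labels are made up for illustration.

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Count TP, TN, FP, FN for a two-class problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted yes, actual yes
        elif p == positive:
            fp += 1  # predicted yes, actual no
        elif a == positive:
            fn += 1  # predicted no, actual yes
        else:
            tn += 1  # predicted no, actual no
    return tp, tn, fp, fn

actual    = ["yes", "yes", "no", "no",  "yes", "no", "no", "yes"]
predicted = ["yes", "no",  "no", "yes", "yes", "no", "no", "yes"]

tp, tn, fp, fn = confusion_counts(actual, predicted)
total = tp + tn + fp + fn
accuracy = (tp + tn) / total
print(tp, tn, fp, fn, accuracy)  # 3 3 1 1 0.75
```

Here 6 of the 8 predictions match the actual values, so the accuracy is 0.75.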
Name 3 regression evaluation metrics
Mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE)
Mean absolute error describes…
the mean of the absolute value of the errors
Mean squared error describes…
the mean of the squared errors
Root mean squared error (RMSE) describes…
the square root of the mean of the squared errors
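The three regression metrics can be sketched directly from their definitions, taking each error as actual minus predicted. The function names and the toy numbers are illustrative assumptions.

```python
import math

def mean_absolute_error(actual, predicted):
    """Mean of the absolute values of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_squared_error(actual, predicted):
    """Mean of the squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the mean of the squared errors."""
    return math.sqrt(mean_squared_error(actual, predicted))

actual    = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]
# Errors are 1, 0, -2, -1.

mae = mean_absolute_error(actual, predicted)       # (1 + 0 + 2 + 1) / 4 = 1.0
mse = mean_squared_error(actual, predicted)        # (1 + 0 + 4 + 1) / 4 = 1.5
rmse = root_mean_squared_error(actual, predicted)  # sqrt(1.5)
print(mae, mse, rmse)
```

Squaring weights large errors more heavily than MAE does, and taking the root puts RMSE back in the same units as the target.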