Machine learning
Algorithms that can learn from observational data and make predictions from it
Unsupervised learning
An algorithm makes sense of a data set without prior learning experience or answers to learn from
Latent variable
A previously unknown part of the data, which unsupervised learning can do
Supervised learning
An algorithm learns from a data set plus the correct “answers”
Training/testing sets
A model is trained using a training set of data, then the model is tested on a similar but disjoint set of data to test its accuracy.
What are practical considerations for training/testing sets?
Why is train/test useful?
It can guard against overfitting.
K-fold cross variation
K-means clustering
What is a large caveat with K-means clustering?
The algorithm does not assign names or titles to clusters.
Entropy (data science)
Disorder of data
Zero if all data points are the same.