What is the definition of ML that Dr. Isbell prefers?
ML is about using math, computation, engineering (among other things) to build computational artifacts that learn over time.
What is inductive reasoning?
Reasoning that goes from specifics to generalities
What is deductive reasoning?
Applying general rules to draw specific (logical) conclusions
At a high-level, what is supervised learning considered an example of?
Function approximation. It’s the process of inductively learning a general rule from observations.
What is unsupervised learning all about?
It’s about DESCRIBING data. In UML, we’re only given “inputs”, so the objective is to learn whether there is some latent structure in the data that we can use to describe the data in a more efficient (i.e. more concise) way. Clustering is a common type of unsupervised ML.
What are two types of supervised machine learning?
In a Decision Tree (DT) model, nodes are _______ and edges are _____?
Attributes, Values
What is the decision tree algorithm?
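What is the complexity of XOR (in terms of decision tree size)?
O(2^n) nodes [i.e. exponential in the number of attributes n]
What is the complexity of OR (in terms of decision tree size)?
O(n) nodes [i.e. linear in the number of attributes n]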
What are the two types of biases we worry about when searching through a hypothesis space?
What are some of the inductive biases of the ID3 algorithm for decision trees?
According to Mitchell, what three features must be present in order to have a well-posed learning problem?
Historically, how did we end up with the term “regression”?
The idea of regression to the mean, e.g. people who are taller or shorter than average tend to have children whose heights ‘regress’ back toward the mean.
What are some examples of where error comes from?
What is one of the fundamental assumptions that we make in most supervised learning algorithms?
That the data from the training set are REPRESENTATIVE of what we expect in the future. If this isn’t the case, then our model won’t GENERALIZE, which is what we really care about as ML practitioners. More formally, the general assumption is that data are I.I.D. (Independent, Identically Distributed); that is, that the process that generated the training data is the same process that is generating the test data (in fact, any future data!).
In the context of regression, the best constant in terms of the squared error is the _____?
mean
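A quick numerical check of the card above, as a minimal sketch: the targets in `ys` and the grid of candidate constants are made up for illustration, and `sum_squared_error` is a hypothetical helper name.

```python
def sum_squared_error(ys, c):
    """Sum of squared errors when every target in ys is predicted by the constant c."""
    return sum((y - c) ** 2 for y in ys)

ys = [2.0, 3.0, 7.0, 8.0]      # made-up targets (assumption)
mean = sum(ys) / len(ys)

# Compare the mean against a grid of other constants from 0.0 to 10.0.
candidates = [i / 10 for i in range(0, 101)]
best = min(candidates, key=lambda c: sum_squared_error(ys, c))

print("mean:", mean, "best constant on the grid:", best)
```

The grid search only illustrates the claim; the general result follows from setting the derivative of the squared error with respect to the constant to zero.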
Describe cross-validation and why we use it?
We split the training data into k folds. We train on k-1 of the folds and use the remaining fold as a validation set (i.e. a stand-in for the test set). We repeat this k times so that each fold serves as the validation set exactly once, and then average the validation error across the k runs. The best model is then the one with the lowest average error.
We use cross-validation to estimate how well a model will generalize and so guard against overfitting. Because the test set must stay untouched while we develop our model, the held-out folds act as a stand-in for it.
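The procedure can be sketched in plain Python. This is an illustrative sketch, not a reference implementation: the helper names (`k_fold_splits`, `cross_val_error`, `fit_mean`, `fit_line`), the made-up data, and the choice of contiguous, unshuffled folds are all assumptions. In practice the data are usually shuffled before splitting.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, val_indices); each fold is the validation set exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

def cross_val_error(fit, xs, ys, k):
    """Average validation mean-squared error of the learner `fit` over k folds."""
    fold_errors = []
    for train, val in k_fold_splits(len(xs), k):
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        mse = sum((predict(xs[i]) - ys[i]) ** 2 for i in val) / len(val)
        fold_errors.append(mse)
    return sum(fold_errors) / len(fold_errors)

def fit_mean(xs, ys):
    """Candidate model 1: always predict the mean of the training targets."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate model 2: least-squares line through the training points."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

# Made-up, roughly linear data (assumption).
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.1, 2.9, 5.2, 7.1, 8.8, 11.2, 12.9, 15.1, 17.0, 18.9]

scores = {name: cross_val_error(fit, xs, ys, k=5)
          for name, fit in [("mean", fit_mean), ("line", fit_line)]}
best_model = min(scores, key=scores.get)   # lowest average validation error wins
print(scores, best_model)
```

The test set never appears here: model selection relies only on the held-out folds of the training data.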
Logical AND is expressible as a perceptron unit? (True/False)
True
Logical OR is not expressible as a perceptron unit? (True/False)
False
Logical NOT is expressible as a perceptron unit? (True/False)
True
Logical XOR is not expressible as a perceptron unit? (True/False)
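True. XOR is not linearly separable, so no single perceptron unit can express it; it takes a network of units.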
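The four perceptron cards above can be checked with a small sketch. The weights and thresholds below are one possible choice (an assumption, not the only solution), and the grid search over XOR is an illustration rather than a proof: it finds no single unit that reproduces XOR, which is consistent with XOR not being linearly separable.

```python
from itertools import product

def perceptron(weights, threshold, inputs):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation >= threshold else 0

# One possible setting of weights and threshold per gate (assumption).
def AND(a, b):
    return perceptron([0.5, 0.5], 0.75, [a, b])

def OR(a, b):
    return perceptron([0.5, 0.5], 0.25, [a, b])

def NOT(a):
    return perceptron([-1.0], 0.0, [a])

# Search a coarse grid of (w1, w2, threshold) for a single unit that computes XOR.
xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
grid = [i / 2 for i in range(-4, 5)]   # -2.0, -1.5, ..., 2.0
xor_units = [
    (w1, w2, t)
    for w1, w2, t in product(grid, repeat=3)
    if all(perceptron([w1, w2], t, x) == y for x, y in xor_table.items())
]

print([AND(a, b) for a, b in xor_table])   # inputs in order (0,0), (0,1), (1,0), (1,1)
print([OR(a, b) for a, b in xor_table])
print([NOT(a) for a in (0, 1)])
print(xor_units)                           # empty: no unit on the grid computes XOR
```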
For perceptron network training, what is the difference between the “perceptron rule” and the “gradient descent” rule?
The perceptron rule uses thresholded output values while gradient descent uses the UNthresholded values.
If the data are linearly separable, will the perceptron rule find a hyperplane that separates them in a finite number of iterations?
Yes
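The last two cards can be illustrated with a minimal sketch. Assumptions: the bias is folded in as a constant first input of 1, the threshold is fixed at 0, the learning rates and the epoch cap are arbitrary, and all function names are hypothetical. The only difference between the two update steps is whether the error uses the thresholded output or the raw activation.

```python
def activation(w, x):
    """Unthresholded weighted sum of the inputs."""
    return sum(wi * xi for wi, xi in zip(w, x))

def perceptron_rule_step(w, x, y, lr):
    """Perceptron rule: the error uses the THRESHOLDED output."""
    y_hat = 1 if activation(w, x) >= 0 else 0
    return [wi + lr * (y - y_hat) * xi for wi, xi in zip(w, x)]

def delta_rule_step(w, x, y, lr):
    """Gradient descent on squared error: the error uses the UNthresholded activation."""
    a = activation(w, x)
    return [wi + lr * (y - a) * xi for wi, xi in zip(w, x)]

def train_perceptron(data, lr=1.0, max_epochs=100):
    """Apply the perceptron rule until a full pass over the data makes no mistakes."""
    w = [0.0] * len(data[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            new_w = perceptron_rule_step(w, x, y, lr)
            if new_w != w:
                mistakes += 1
            w = new_w
        if mistakes == 0:
            return w, True
    return w, False

# Logical AND is linearly separable; inputs are (bias, a, b).
and_data = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
weights, converged = train_perceptron(and_data)
predictions = [1 if activation(weights, x) >= 0 else 0 for x, _ in and_data]
print(converged, predictions)
```

On a correctly classified example the perceptron rule leaves the weights unchanged, while the gradient descent step still moves them whenever the raw activation differs from the target.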