What is parameter learning / parametric modelling?
A model whose form is specified by the data miner; its parameters are then tuned ("fit") through data mining so that the model fits the data as well as possible.
Most common examples: linear models, such as linear regression
What are some of the assumptions made in this chapter?
What are linear discriminant functions?
What is a parameterised model?
I feel like these are the fundamentals that data mining tasks use.
What are Objective functions?
What are SVMs? How do you find the best linear discriminant?

What are Loss Functions and which ones should you know?
Fitting linear functions to regression
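A minimal sketch of what fitting a linear function to a regression problem looks like, using the closed-form least-squares solution for a single feature (the data points are made up):

```python
# Fit y ≈ w*x + b by ordinary least squares (one feature, made-up data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope and intercept that minimise the sum of squared errors
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - w * mean_x
print(round(w, 2), round(b, 2))  # → 1.99 0.09
```

Minimising the sum of squared errors is one choice of objective function; other loss functions would give a different fitted line.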
What’s the difference between odds and probability?
Odds are the ratio between something happening (i.e. winning) and something not happening (i.e. losing).
Probability is the ratio between something happening and everything that could happen (i.e. winning and losing).
Why is log often applied to odds?
To make the scale symmetric: odds greater than 1 can grow to infinity, while odds between 0 and 1 are squashed towards zero. Since the log looks at the exponent, reciprocal odds (e.g. 4:1 and 1:4) become values of equal magnitude and opposite sign, which makes them easy to compare.
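A quick illustration of this symmetry (the 4:1 odds are made up for the example):

```python
import math

# Odds of 4:1 and 1:4 are reciprocals, but their magnitudes are very
# different on the raw odds scale (4.0 vs 0.25).
odds_win = 4 / 1
odds_lose = 1 / 4

# Taking the log makes them symmetric around 0.
log_odds_win = math.log(odds_win)    # ≈ +1.386
log_odds_lose = math.log(odds_lose)  # ≈ -1.386
print(log_odds_win, log_odds_lose)
```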
What is the Odds ratio?
The odds ratio is the ratio of the odds of an event occurring in one group to the odds of it occurring in another group.
For example, say you’re comparing the association between a mutated gene and cancer, and you have a confusion matrix. A large odds ratio indicates that the gene is a good predictor of cancer. Conversely, a small value indicates that it isn’t a good predictor.
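A small sketch of that computation; the counts in the 2x2 table are made up for illustration:

```python
# Hypothetical 2x2 contingency table (counts are invented):
#                 cancer   no cancer
# gene mutated      30         10
# gene normal        5         55
a, b = 30, 10   # mutated: with cancer / without cancer
c, d = 5, 55    # normal:  with cancer / without cancer

odds_mutated = a / b                     # 3.0
odds_normal = c / d                      # ≈ 0.09
odds_ratio = odds_mutated / odds_normal  # ≈ 33: strong association
print(odds_ratio)
```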
What is logistic regression?
Logistic regression is a model/method that has an objective function designed to give accurate estimates of class probability.
Logistic regression uses the same linear model as linear discriminants for classification and linear regression for estimating numeric target values.
The output of logistic regression is interpreted as the log-odds of class membership, which can be converted directly into the probability of class membership.
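That conversion is just the logistic (sigmoid) function applied to the log-odds; a minimal sketch:

```python
import math

def log_odds_to_probability(z):
    """Convert log-odds z = f(x) into a class-membership probability."""
    return 1 / (1 + math.exp(-z))

print(log_odds_to_probability(0))    # 0.5: exactly on the decision boundary
print(log_odds_to_probability(2))    # ≈ 0.88: fairly confident positive
print(log_odds_to_probability(-2))   # ≈ 0.12: fairly confident negative
```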
What is the logistic function?
p+(x) = 1 / (1 + e^(-f(x))), where f(x) is the linear function.
The model is fitted by determining the slope of the almost-linear part, and thereby how quickly we become certain of the class as we move away from the boundary.
p+ is the class that we are trying to model
p+ should be as close as possible to 1 for positive instances
p+ should be as close as possible to 0 for negative instances

What is the objective function generally used by Logistic regression?
Logistic regression is generally fit by maximum likelihood: the parameters are chosen to maximise the (conditional) likelihood — equivalently the sum of log-probabilities — that the model assigns to the observed class labels.
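A minimal sketch of likelihood-based fitting: one-feature logistic regression trained by stochastic gradient ascent on the log-likelihood, using made-up, linearly separable data:

```python
import math

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]            # observed class labels (made up)

w, b = 0.0, 0.0                    # parameters of f(x) = w*x + b
lr = 0.1                           # learning rate
for _ in range(2000):
    for x, y in zip(xs, ys):
        p = 1 / (1 + math.exp(-(w * x + b)))   # p+(x)
        # gradient of the per-example log-likelihood
        w += lr * (y - p) * x
        b += lr * (y - p)

# p+ should end up near 1 for positive instances, near 0 for negatives
p_neg = 1 / (1 + math.exp(-(w * -2.0 + b)))
p_pos = 1 / (1 + math.exp(-(w * 2.0 + b)))
print(round(p_neg, 3), round(p_pos, 3))
```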
For Linear Discriminant, Linear regression and Logistic regression, what do each estimate/predict and what is the target variable type?
A linear discriminant predicts the class itself; the target variable is categorical.
Linear regression estimates the numeric value of the target; the target variable is numeric.
Logistic regression estimates the probability of class membership (via the log-odds); the target variable is categorical.

What does the linear part of the Logistic regressions estimate tell us?
The slope of the almost-linear part tells us how quickly we become certain of the class as we move away from the decision boundary.
How would you choose between Decision Trees and Logistic regression?
Which model is more appropriate depends on the background of the stakeholders: with statistical training, logistic regression is easier to understand; without it, the if-then structure of a decision tree is often more intuitive.
What is the similarity between Decision Trees and Logistic regression and what are the key differences?
Similarity: classification trees and linear classifiers both use linear decision boundaries.
Key differences: a classification tree uses many decision boundaries, each perpendicular to an instance-space axis (it tests one attribute at a time) and carves the space into regions, whereas a linear classifier uses a single decision boundary of any direction or orientation, based on a weighted combination of all the attributes at once.

What are the most common techniques that are based on fitting the parameters of complex, non-linear functions?
What are the disadvantages?
Nonlinear support vector machines and Neural networks
Support vector machines have a so-called “kernel function” that maps the original features to some other feature space
Neural networks implement complex nonlinear functions as a “stack” of models, where the results of the previous model are used as input to the next model
Target labels for training are generally only provided for the final layer (the actual target variable)
Disadvantage: these complex models are much less intelligible than trees or linear equations, and their extra flexibility makes them more prone to overfitting the training data.
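The "stack of models" idea can be sketched as a tiny forward pass, where every unit in a layer is itself a small linear model of the previous layer's outputs, passed through a nonlinearity (all weights here are made up):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def layer(inputs, weights, biases):
    # each unit is a little linear model of the previous layer's output
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [0.5, -1.0]                                  # input features
hidden = layer(x, weights=[[1.0, -1.0], [0.5, 0.5]], biases=[0.0, 0.1])
output = layer(hidden, weights=[[2.0, -2.0]], biases=[0.0])
print(output)   # a single class-probability-like value in (0, 1)
```

During training only the final layer's output is compared against the target labels; the hidden layers' weights are adjusted indirectly (by backpropagation).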
How you can apply logistic regression for the case that your explanatory (or input) data has categorical variables?
To perform logistic regression, the variables need to be numeric. Categorical input variables are therefore converted to numeric ones using dummy variables: a binary category can be coded directly as 0 and 1, while a category with more levels is coded with several 0/1 dummy variables, one combination per level.
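A minimal sketch of dummy (one-hot) coding, with a made-up categorical feature:

```python
# One dummy variable per level of the categorical feature (made-up data).
colours = ["red", "green", "blue", "green", "red"]

levels = sorted(set(colours))       # ['blue', 'green', 'red']
encoded = [[1 if c == level else 0 for level in levels] for c in colours]
print(encoded[0])  # 'red'  -> [0, 0, 1]
print(encoded[2])  # 'blue' -> [1, 0, 0]
```

In practice one level is often dropped (reference coding), since the full set of dummies is redundant with an intercept term.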
What effect does a moving point have on the maximal margin classifier? What would happen with logistic regression?
It shouldn’t change anything about the maximal margin classifier as long as the point stays outside the margin-maximising boundary. If the point moves inside the margin, the margin can change; it could shrink, for example, depending on the trade-off between error and margin size. If the point moves to the other side of the decision boundary, the SVM either fails to find a separating boundary or counts the point as an error; you can tune how many errors it will accept.
Logistic regression, in contrast, would adapt, since it is sensitive to all points.
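This difference in sensitivity shows up in the loss functions: for a correctly classified point, the hinge loss used by SVMs drops to exactly zero outside the margin, while the log loss used by logistic regression never quite reaches zero, so every point keeps some influence. A small sketch:

```python
import math

# Loss for a correctly classified point at increasing (signed) distance
# from the decision boundary.
for margin in [0.5, 1.5, 5.0]:
    hinge = max(0, 1 - margin)                  # zero once outside the margin
    log_loss = math.log(1 + math.exp(-margin))  # positive but shrinking
    print(margin, hinge, round(log_loss, 4))
```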
What is the purpose of the kernel approach in a support vector machine? How does changing the kernel change the location of the decision boundary?
The kernel approach allows you to apply an SVM in cases where we cannot find a separating straight line, i.e. it lets us apply the SVM non-linearly. The kernel function maps the original features to some other feature space. In other words, when the dataset is inseparable in its current dimensions, add another one; the rule of thumb is usually to go one dimension up.
Changing the kernel will vary how the decision boundary is drawn. As the dimensions increase the decision boundaries can become increasingly complex.
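A minimal sketch of the idea using an explicit feature mapping rather than a real kernel function: 1-D points that no single threshold can separate become linearly separable after mapping x → (x, x²) (the data is made up):

```python
# Positives sit at the extremes, so no single 1-D threshold separates them.
xs = [-3, -1, 0, 1, 3]
ys = [1, 0, 0, 0, 1]

# Map each point one dimension up: x -> (x, x**2).
mapped = [(x, x * x) for x in xs]

# In the new space the horizontal line x**2 = 4 separates the classes,
# which corresponds to the non-linear rule |x| > 2 in the original space.
predictions = [int(x2 > 4) for _, x2 in mapped]
print(predictions)  # matches ys
```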