Machine Learning Flashcards

(32 cards)

1
Q

What is a target variable

A

A target variable is the dependent variable (the y variable)
Target variables can be continuous, categorical or ordinal

2
Q

What is the definition of features

A

Features are the independent variables (the x variables) used as inputs to the model

3
Q

What is the definition of a training set

A

The training set is the sample used to FIT the model

4
Q

What is the definition of a hyperparameter

A

This is a model input which is specified by the researcher rather than learned from the data

5
Q

What is the definition of supervised machine learning

A

This is when you use labelled training data, where the y variable is clearly defined and provided to the algorithm.

The goal is to guide the machine towards higher accuracy by providing the correct answers during training

6
Q

What are the key tasks of supervised learning, and when is each used

A

The key tasks are regression and classification

Regression is used when the target is continuous
Classification is used when the target is ordinal or categorical.
This can be a binary classification, such as something either being a dog or not.

Multiple regression is a good example of supervised learning

7
Q

What is unsupervised machine learning

A

Unsupervised machine learning is when the algorithm is not given any labelled training data.
Therefore there is no answer given to the model

Instead of using predefined classifiers, like sector similarities, you could use unsupervised learning to group stocks into clusters which have behaved the most similarly.

8
Q

What is the definition of deep learning.

A

Deep learning is used for highly complex tasks which involve non-linearities
It is based on neural networks which have many hidden layers:
at least two, but usually more than 20
Reinforcement learning, by contrast, is when models use trial and error to maximise an outcome, based on the results of their previous attempts.

9
Q

What are the key tasks that unsupervised learning is likely to perform

A

Unsupervised learning is likely to perform clustering, where a model groups similar observations, and dimension reduction.
It also underpins techniques used in image recognition, deep learning, and natural language processing.

10
Q

What is supervised learning
When do you use it

A

Supervised learning uses labelled training data, where the y variable is clearly defined and provided to the algorithm.
Classification is used when the target variable is categorical or ordinal, and can be a binary classification or a multi-category classification.

11
Q

What is the meaning of reinforcement learning.

A

Reinforcement learning is where an agent learns through trial and error how to maximise a reward subject to a set of constraints.

12
Q

What is the meaning of overfitting when it comes to machine learning
When does overfitting happen

A

Overfitting is when a model is too complex because it has too many features, much like an OLS regression having too many variables.
Overfitting happens when the model mistakes random noise for a signal or pattern

13
Q

What happens to the r squared and the adjusted r squared when you have overfitting

A

When you have overfitting, the r squared is likely to be high, whereas the adjusted r squared is likely to be low.

14
Q

What does it mean when a model is said to generalise well

A

A model generalises well when it retains its explanatory power when applied to new, out of sample data.

15
Q

What are the three prediction errors that data scientists use to understand and address overfitting

A

1 bias error. This is the in sample error that results from a model having a poor fit (underfitting)
2 variance error. This is the out of sample error that results from overfitted models.
3 base error. This is the random error that is impossible to eliminate from a model, due to random noise.

16
Q

What happens to variance error and bias error through increased model complexity

A

Variance error increases with model complexity
Bias error decreases with complexity

17
Q

What are the 5 ways that you can address overfitting, and explain each of them

A

1 complexity reduction
This is when you impose a penalty to exclude features that do not contribute to out of sample prediction accuracy. You attempt to create a parsimonious model, which is a model that achieves the highest level of explanation using the smallest number of variables possible.

2 penalised regression (LASSO regression)
A method that minimises the sum of squared errors plus a penalty based on the absolute values of the slope coefficients.
It automatically eliminates the least predictive features.

3 cross validation
Cross validation estimates the out of sample error by dividing the data into k parts.
The model is trained on k-1 parts and then validated on the remaining part, repeating the process k times.

4 regularisation and pruning
In classification and regression trees, overfitting is addressed by removing sections of the decision tree that offer little explanatory power.

5 ensemble learning
Random forests mitigate overfitting by training the trees on different subsets of the data. Because each tree uses different features, errors across the trees tend to cancel out.

18
Q

What does LASSO stand for and what does it measure
What is the metric used to determine the balance between overfitting and parsimony

A

LASSO stands for least absolute shrinkage and selection operator.
Lambda (the penalty weight) determines the balance between overfitting and parsimony.

19
Q

What is k-fold cross validation and how does it work

A

You first divide the data into k different parts of equal size.
The model is then trained k times; in each training iteration, k-1 parts are used as the training sample.
The remaining part is used for validation.
The process is repeated until every one of the folds has served as the validation set exactly once.
The error rate is measured for each iteration, and the mean of these errors is used as the estimate of the out of sample error.
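As an illustration, the fold-splitting step described above can be sketched in plain Python (a minimal sketch; the function name `k_fold_splits` and the toy indices are hypothetical, and real libraries offer ready-made versions of this):

```python
import random

def k_fold_splits(indices, k, seed=42):
    """Split observation indices into k roughly equal folds.

    Each fold serves as the validation set exactly once; the other
    k-1 folds together form the training set for that iteration.
    """
    shuffled = indices[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((training, validation))
    return splits

# 10 observations, 5 folds: every observation is validated exactly once.
splits = k_fold_splits(list(range(10)), k=5)
```

Averaging the per-fold error rates over the k iterations then gives the out of sample error estimate.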

20
Q

What are penalised regressions and what are they used for

A

Penalised regression models are used to address overfitting.
They impose a penalty on the model's complexity, which increases with the number of features.

The concept is that the model minimises the SSE plus a penalty term. This means the model achieves a high level of explanation with as few predictors as possible.

An example is LASSO, where the penalty is the sum of the absolute values of the slope coefficients.

21
Q

What is the support vector machine
What is a support vector
What is a soft margin classification
What is a kernel trick

A

A support vector machine is an algorithm mainly used for classification
It tries to find the optimal decision boundary that separates the data into two groups with the biggest margin between them.
A support vector is an observation lying close to the boundary that is used to define the boundary's position.
Soft margin classification is an adaptation that allows some misclassified observations, trading off a wider margin against errors
The kernel trick is a method that reshapes the data into higher dimensions, to find a clear split when the groups are not separable.
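The idea behind the kernel trick can be illustrated with an explicit mapping (a hypothetical sketch: real kernels avoid computing the mapping directly, but the geometric effect is the same):

```python
def feature_map(x):
    """Map a 1-D input into 2-D space: x -> (x, x**2)."""
    return (x, x * x)

# 1-D data that no single threshold on x can separate:
inner = [-1.0, 1.0]   # class A, near the origin
outer = [-3.0, 3.0]   # class B, far from the origin

# After the mapping, the second coordinate (x**2) is 1 for class A
# and 9 for class B, so a straight line (x2 = 5) now separates them.
inner_mapped = [feature_map(x) for x in inner]
outer_mapped = [feature_map(x) for x in outer]
```

In the higher-dimensional space the two groups become linearly separable, which is exactly what the SVM's boundary search needs.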

22
Q

What is the K nearest neighbour
What is the meaning of k
What happens if k is too small or k is too big
What is the investment application of this method

A

K nearest neighbour classifies an observation based on its nearness to observations in the training sample.
K is the number of neighbours that the algorithm considers
When k is too small the model might overfit (high error rate)
When k is too big it dilutes the result by averaging across different outcomes.
This method is used when you need to assign bonds to different ratings, or when you want to create different indices.
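The classification step can be sketched in a few lines (a minimal sketch; `knn_classify` and the bond-rating toy data are hypothetical):

```python
from collections import Counter

def knn_classify(query, training_points, training_labels, k):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (abs(x - query), label)
        for x, label in zip(training_points, training_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical bond-rating example: the single feature is a risk score.
scores = [1.0, 1.2, 1.1, 5.0, 5.2, 4.9]
ratings = ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"]
knn_classify(1.05, scores, ratings, k=3)  # "AAA"
```

With k too small the vote is dominated by one possibly noisy neighbour; with k too large, distant observations from other groups dilute the vote.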

23
Q

What is a classification and regression tree
What is the difference between the two

A

A classification and regression tree (CART) is essentially a tree of questions

It organises data into different nodes
The root node at the top of the tree is the most important
The decision nodes are points further down the tree where the data is split further
Terminal nodes are the final points, where the algorithm stops splitting and provides the outcome of the model

A classification tree is used when the target variable is binary
Regression trees are used when the target variable is continuous.
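Each split in a classification tree is chosen to make the child nodes as pure as possible; a common purity measure is Gini impurity. A minimal sketch of one split search (the function names and toy data are hypothetical; a real CART repeats this search recursively at every decision node):

```python
def gini(labels):
    """Gini impurity of a set of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Find the threshold on one feature that minimises the
    weighted Gini impurity of the two child nodes."""
    best = (None, float("inf"))
    n = len(labels)
    for threshold in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

values = [1, 2, 3, 10, 11, 12]
labels = ["no", "no", "no", "yes", "yes", "yes"]
best_split(values, labels)  # (3, 0.0): splitting at 3 gives two pure children
```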

24
Q

How do you address overfitting in a CART model

A

You can set limits on the model's complexity
You can prune sections of the tree that have minimal explanatory power

25
Q

What is ensemble learning

A

Ensemble learning combines the predictions of more than one model to reduce the average error and noise.

A heterogeneous ensemble combines different types of algorithm and uses a voting classifier to determine the final prediction
A homogeneous ensemble uses the same algorithm multiple times, but on different training data samples generated via bagging, which draws random samples with replacement
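The voting-classifier step of a heterogeneous ensemble can be sketched very simply (a minimal sketch; `majority_vote` and the toy predictions are hypothetical):

```python
from collections import Counter

def majority_vote(predictions):
    """Voting classifier: each model in the ensemble casts one vote
    and the most common prediction becomes the final answer."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical predictions from three different algorithms:
majority_vote(["buy", "buy", "sell"])  # "buy"
```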
26
Q

What is a random forest

A

It's an application of homogeneous ensemble learning.
You train a large number of independent decision trees, using bagged data and a subset of features for each tree.
Then, by averaging the results of the trees, it increases the signal-to-noise ratio, because the errors of the individual trees tend to cancel each other out.
It also mitigates the overfitting problem, but it loses the visual transparency of CART.
27
Q

What is principal component analysis

A

You take all of the raw data points and apply a linear transformation so that the features which are most similar to one another are combined into a single composite variable.
These composites are called eigenvectors (because they are transformed), and each eigenvector has an eigenvalue, which measures the variance in the data explained by that eigenvector.
You then rank the eigenvalues from largest to smallest, and go down the ranking, including eigenvectors until the sum of their eigenvalues explains about 85%-95% of the total variation.
Then you discard the rest.
The problem with this approach is that you can't really see what's going on inside each composite; it's a bit of a black-box approach.
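The ranking-and-selection step can be sketched directly (a minimal sketch; `components_needed` and the example eigenvalues are hypothetical, and computing the eigenvectors themselves is left to a linear algebra library):

```python
def components_needed(eigenvalues, threshold=0.90):
    """Rank eigenvalues from largest to smallest and count how many
    principal components are needed to explain `threshold` of the
    total variation; the remaining components are discarded."""
    ranked = sorted(eigenvalues, reverse=True)
    total = sum(ranked)
    cumulative = 0.0
    for count, value in enumerate(ranked, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(ranked)

# Hypothetical eigenvalues from decomposing 5 raw features:
components_needed([4.0, 3.0, 1.5, 1.0, 0.5], threshold=0.90)  # 4
```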
28
Q

What is k-means clustering

A

K-means clustering splits observations into non-overlapping clusters.
You select the number of clusters you want and the starting midpoint of each cluster.
Then each data point is added to the cluster whose midpoint it is closest to.
After a data point is added, the midpoint is recalculated, which means data points can move between clusters.
You keep doing this until every data point has been allocated to a cluster, and then you stop.
The issue is that you have to pick the number of clusters, and the midpoint of each cluster, before you start. This means you have to have a decent idea about the data set in advance.
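The assign-then-recalculate loop can be sketched for one feature (a minimal sketch; the function names and toy points are hypothetical):

```python
def assign_to_clusters(points, centroids):
    """Assign each point to the cluster whose midpoint (centroid) is nearest."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    return clusters

def update_centroids(clusters, old_centroids):
    """Recalculate each midpoint as the mean of its cluster's points."""
    return [
        sum(c) / len(c) if c else old
        for c, old in zip(clusters, old_centroids)
    ]

points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centroids = [0.0, 10.0]          # researcher-chosen starting midpoints
for _ in range(5):               # repeat the assign / update steps
    clusters = assign_to_clusters(points, centroids)
    centroids = update_centroids(clusters, centroids)
centroids  # settles at [1.5, 8.5], the means of the two groups
```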
29
Q

What is hierarchical clustering and what are the different approaches

A

Agglomerative (bottom-up) clustering: each value starts as its own cluster, then the two closest clusters are combined with one another. This continues until the clusters are large enough.
Divisive (top-down) clustering: one giant cluster gets split up into smaller clusters progressively.
Hierarchical clustering is best when the optimal number of clusters is unknown.
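One agglomerative merge step can be sketched as follows (a minimal sketch; `agglomerative_merge` and the toy values are hypothetical, and the closeness measure here is simply the distance between cluster means):

```python
def agglomerative_merge(clusters):
    """One bottom-up step: merge the two clusters whose means are closest."""
    means = [sum(c) / len(c) for c in clusters]
    best_pair, best_dist = None, float("inf")
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            d = abs(means[i] - means[j])
            if d < best_dist:
                best_pair, best_dist = (i, j), d
    i, j = best_pair
    merged = clusters[i] + clusters[j]
    return [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

# Each value starts as its own cluster, then the closest pair merges:
clusters = [[1.0], [1.2], [8.0], [8.3]]
clusters = agglomerative_merge(clusters)
clusters = agglomerative_merge(clusters)  # two clusters remain
```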
30
Q

What are neural networks
What are the three layers
How does it get better over time

A

Neural networks are modelled after the brain and consist of layers of nodes that are linked together.
The three layers are:
The input layer - contains the independent variables
The hidden layers - where the calculations happen; each node has a summation operator, which calculates a weighted average of the inputs, and an activation function, which processes that sum
The output layer - contains a single node which generates the model's prediction
The model improves because it is able to modify the weights given to each node after it has worked out the output, which means that next time it will be better at calculating the output.
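A single hidden-layer node can be sketched directly (a minimal sketch; `neuron` and the example weights are hypothetical, and the sigmoid is one common choice of activation function):

```python
import math

def neuron(inputs, weights, bias):
    """One hidden-layer node: a summation operator (weighted sum of
    the inputs) followed by a sigmoid activation function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))   # squashes the output into (0, 1)

# With these weights the weighted sum is 0, so the output is sigmoid(0) = 0.5
neuron([1.0, 2.0], [0.5, -0.25], bias=0.0)
```

Training adjusts `weights` and `bias` after comparing the network's output with the correct answer, which is what makes the predictions improve over time.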
31
Q

What are deep learning networks

A

Deep learning networks are complex neural networks which are very deep, meaning they have many hidden layers.
Deep learning networks are good for very complex tasks which involve big data sets.
32
Q

What is the meaning of reinforcement learning

A

Reinforcement learning is a model (agent) that learns to maximise a reward given a constraint.
The reinforcement learning agent does not rely on labelled training data or direct feedback on each answer.
It learns from the outcomes of many repeated attempts.
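The trial-and-error idea can be sketched with a toy action-selection loop (a minimal sketch; `learn_best_action` and the toy rewards are hypothetical, and a real agent would observe noisy rewards from an environment rather than a fixed table):

```python
def learn_best_action(rewards, trials=30):
    """Trial-and-error sketch: try each action repeatedly, track the
    average reward observed, and pick the action with the best results."""
    totals = {action: 0.0 for action in rewards}
    counts = {action: 0 for action in rewards}
    actions = list(rewards)
    for t in range(trials):
        action = actions[t % len(actions)]   # explore every action in turn
        totals[action] += rewards[action]    # observe the resulting reward
        counts[action] += 1
    averages = {a: totals[a] / counts[a] for a in actions}
    return max(averages, key=averages.get)

# Deterministic toy rewards for two hypothetical portfolio actions:
learn_best_action({"hold": 0.1, "rebalance": 0.5})  # "rebalance"
```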