What do Non-Linear algorithms assume?
Non-Linear Algorithms assume a non-linear relationship between the input features (x) and the target (y).
What are the 4 common Non-Linear models?
How does K-Nearest Neighbour work?
How do you calculate the Euclidean distance between two points?
For points (x1, y1) and (x2, y2): distance = square root of ((x1 - x2)^2 + (y1 - y2)^2)
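A minimal sketch of the formula above, generalised to any number of coordinates. The function name `euclidean_distance` is just an illustrative choice.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points given as equal-length sequences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((1, 2), (4, 6)))  # 5.0, since sqrt(3^2 + 4^2) = 5
```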
What are the advantages of using the K-nearest Neighbour algorithm?
Doesn’t require any prior training, just the storage of data
Can be used for both classification and regression
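Both advantages can be seen in a small sketch: "training" is nothing more than keeping the rows in a list, and the same neighbour lookup serves classification (majority vote) and regression (mean). The toy data and function names are made up for illustration, and this is not an optimised implementation.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the k (features, target) pairs closest to query by Euclidean distance."""
    return sorted(train, key=lambda row: math.dist(row[0], query))[:k]

def knn_classify(train, query, k=3):
    """Majority vote over the labels of the k nearest neighbours."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Mean of the target values of the k nearest neighbours."""
    values = [value for _, value in k_nearest(train, query, k)]
    return sum(values) / len(values)

# Toy data (illustrative only): storing these rows is all the "training" needed.
labelled = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
            ((6, 6), "b"), ((7, 6), "b"), ((6, 7), "b")]
numeric = [((1,), 10.0), ((2,), 20.0), ((3,), 30.0), ((10,), 100.0)]

print(knn_classify(labelled, (2, 2)))   # a
print(knn_regress(numeric, (2,), k=3))  # 20.0
```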
What are some disadvantages of K Nearest Neighbour?
What are the steps involved in using a Hard Margin SVM?
What are the advantages of Hard Margin SVMs?
Theoretical Guarantee - Finds the hyperplane with the maximum margin, leading to good generalisation
Deterministic - There is always a unique solution for linearly separable data.
What are some disadvantages of Hard Margin SVMs?
Assumes the data is perfectly linearly separable, otherwise the algorithm fails.
Sensitive to Outliers - A single outlier can drastically affect the hyperplane.
What is the main difference between Soft Margin SVMs and Hard Margin SVMs?
Soft Margin SVMs are designed to handle cases where the data is not perfectly linearly separable by allowing some margin violations, whereas Hard Margin SVMs have no valid solution in those cases.
What are the effects of increasing the value of C within Soft Margin SVMs?
A higher C value imposes a larger penalty for margin violations, leading to a narrower margin and fewer misclassifications on the training data, at the cost of a higher risk of overfitting.
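To make the role of C concrete, here is a sketch of the standard soft-margin objective, 0.5 * ||w||^2 + C * (sum of hinge losses). This formulation is assumed from the usual textbook definition rather than taken from these notes, and the weights and data are made-up values chosen so the arithmetic is easy to follow.

```python
def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum of hinge losses max(0, 1 - y_i * (w . x_i + b))."""
    margin_term = 0.5 * sum(wj * wj for wj in w)
    hinge_total = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        # Points on the correct side of the margin contribute zero.
        hinge_total += max(0.0, 1.0 - yi * score)
    return margin_term + C * hinge_total

# Illustrative 1-D data: the last point is labelled -1 but sits on the +1 side.
X = [(2.0,), (3.0,), (-2.0,), (-3.0,), (0.5,)]
y = [1, 1, -1, -1, -1]
w, b = (1.0,), 0.0

print(soft_margin_objective(w, b, X, y, C=1.0))   # 2.0  (0.5 + 1 * 1.5)
print(soft_margin_objective(w, b, X, y, C=10.0))  # 15.5 (0.5 + 10 * 1.5)
```

The single violating point costs ten times more when C goes from 1 to 10, so an optimiser is pushed harder to fit that point, even if that means a narrower margin.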
How does a Kernel SVM differ from a Hard or Soft Margin SVM?
Kernel SVMs are able to solve problems where the data is not linearly separable in the original feature space, by mapping the data to a higher-dimensional space.
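The mapping idea can be illustrated with a toy case. The 1-D points below cannot be split by a single threshold, but after a hypothetical map x -> (x, x^2) a straight line separates them. The map, weights and data are assumptions chosen for illustration. In practice kernel SVMs usually avoid computing the mapping explicitly and work with a kernel function instead.

```python
def feature_map(x):
    """Hypothetical map from a 1-D input to 2-D: x -> (x, x^2)."""
    return (x, x * x)

def linear_classifier(z, w=(0.0, 1.0), b=-2.5):
    """Sign of w . z + b, which is a straight line in the mapped 2-D space."""
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1 if score > 0 else -1

# Class +1 lies on both sides of class -1, so no single threshold on x works.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [1, 1, -1, -1, -1, 1, 1]

predictions = [linear_classifier(feature_map(x)) for x in xs]
print(predictions == ys)  # True
```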
What are the advantages of using a Soft Margin SVM over a Hard Margin SVM?
Soft Margin SVMs are more robust to outliers, and the C parameter lets them be tuned to suit the problem they are solving.
What is the step-by-step operation of a Kernel SVM?
What are the advantages of a Kernel SVM?
Able to handle non-linear data
Very powerful tool for high-dimensional and non-linear datasets
Multiple kernel options allow adaptation to different types of data.
What are the disadvantages of a Kernel SVM algorithm?
Kernel computation can be slow for large datasets
It requires careful selection of the kernel function and its hyperparameters
Scales poorly to very large datasets, since training time and memory typically grow at least quadratically with the number of samples
When would you use a Soft Margin SVM compared to standard Linear Regression?
Soft Margin SVM - Designed for classification problems, including cases where the classes are not perfectly linearly separable
Linear Regression - Designed for regression problems, where the goal is to predict continuous target values
How does a Decision Tree work?
What kind of Splitting Criterion is used to select the best features in a Classification Problem?
Gini Impurity
Entropy (Information Gain)
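A short sketch of both criteria, using the usual definitions: Gini impurity is 1 minus the sum of squared class proportions, and entropy is the negative sum of p * log2(p). The helper names and the toy labels are illustrative.

```python
import math
from collections import Counter

def class_proportions(labels):
    """Fraction of samples belonging to each class present in labels."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]

def gini_impurity(labels):
    return 1.0 - sum(p * p for p in class_proportions(labels))

def entropy(labels):
    return -sum(p * math.log2(p) for p in class_proportions(labels))

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]

print(gini_impurity(parent))                    # 0.5
print(entropy(parent))                          # 1.0
print(information_gain(parent, perfect_split))  # 1.0
```

A split that leaves each child node with a single class removes all of the parent's entropy, which is why it scores the maximum gain here.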
What kind of Splitting Criterion is used to select the best features in a Regression problem?
Mean Squared Error (MSE)
Variance Reduction
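The two regression criteria are closely related: if a node predicts the mean of its targets, the node's MSE equals the variance of those targets. The sketch below scores a split by how much size-weighted variance it removes. Function names and numbers are illustrative.

```python
def variance(values):
    """Population variance, equal to the MSE of predicting the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_reduction(parent, left, right):
    """Parent variance minus the size-weighted variance of the two children."""
    n = len(parent)
    weighted = len(left) / n * variance(left) + len(right) / n * variance(right)
    return variance(parent) - weighted

parent = [1, 1, 1, 9, 9, 9]

# A split that groups similar targets together removes all of the variance.
print(variance_reduction(parent, [1, 1, 1], [9, 9, 9]))  # 16.0
```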
What are the advantages of using Decision Trees?
What are the disadvantages of using Decision Trees?
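Greedy search at each node - evaluating every candidate split can be computationally expensive, and the resulting tree is not guaranteed to be globally optimal
Prone to overfitting as the tree grows deeper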
What is a Random Forest?
A Random Forest is an ensemble learning method that combines multiple decision trees to improve performance.
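The combining step can be sketched as a majority vote. The three "trees" below are hypothetical one-rule stand-ins, not trained models, and the feature names are invented. In a real random forest each tree is trained on a bootstrap sample of the data, typically with a random subset of features considered at each split, and regression forests average the predictions instead of voting.

```python
from collections import Counter

def forest_predict(trees, sample):
    """Collect one prediction per tree and return the most common one."""
    votes = [tree(sample) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-ins for trained decision trees (illustration only).
trees = [
    lambda s: "spam" if s["links"] > 3 else "ham",
    lambda s: "spam" if s["caps_ratio"] > 0.5 else "ham",
    lambda s: "spam" if s["length"] < 20 else "ham",
]

sample = {"links": 5, "caps_ratio": 0.2, "length": 10}
print(forest_predict(trees, sample))  # spam (two votes to one)
```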
What are the advantages of using Random Forest?
Reduces overfitting
Handles non-linear data well
Can be used to evaluate importance of features
Out-of-bag data (the rows left out of each tree's bootstrap sample) can be used to estimate the error without needing a separate validation set
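A sketch of where the left-out data comes from, assuming the usual bootstrap procedure of sampling row indices with replacement. Rows never drawn for a given tree can be used to test that tree. The function name and sample size are illustrative.

```python
import random

def bootstrap_split(n_samples, rng):
    """Draw n_samples row indices with replacement; undrawn rows are left out."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag

rng = random.Random(0)  # fixed seed so the run is repeatable
in_bag, out_of_bag = bootstrap_split(1000, rng)

# On average roughly 1/e (about 37%) of rows are left out of each sample.
print(len(out_of_bag) / 1000)
```

Because each tree has its own left-out rows, every row ends up being scored only by trees that never saw it during training.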