What are the differences between supervised and unsupervised learning?
Supervised Learning:
- Uses known and labeled data as input
- Supervised learning has a feedback mechanism
- The most commonly used supervised learning algorithms are decision trees, logistic regression, and support vector machines
Unsupervised Learning:
- Uses unlabeled data as input
- Unsupervised learning has no feedback mechanism
- The most commonly used unsupervised learning algorithms are k-means clustering, hierarchical clustering, and the Apriori algorithm
How is logistic regression done?
Logistic regression measures the relationship between the dependent variable (our label of what we want to predict) and one or more independent variables (our features) by estimating probability using its underlying logistic function (sigmoid).
p(x) = 1 / (1 + exp(-\theta^T x)) [with Bernoulli likelihood]
Sigmoid:
S(x) = 1 / (1 + exp(-x))
S’(x) = S(x) * (1 - S(x))
Sigmoid generalisation: Softmax function
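The sigmoid and its derivative above can be sketched directly (a minimal illustration; the numerical check is my own addition):

```python
import math

def sigmoid(x):
    """S(x) = 1 / (1 + exp(-x)): maps any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """S'(x) = S(x) * (1 - S(x)), per the identity above."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The sigmoid is 0.5 at the origin, where its slope peaks at 0.25.
print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25
```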
How is logistic regression done? [ICL notes]
ICL notes:
- Binary classification problems
- Linear model with non-Gaussian likelihood
- Implicit modeling assumptions
- Parameter estimation (MLE, MAP) no longer in closed form
- Bayesian logistic regression with Laplace approximation of the posterior
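Because the MLE is no longer in closed form, the parameters are found iteratively. A minimal gradient-ascent sketch (the toy data, learning rate, and step count are illustrative assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Maximise the Bernoulli log-likelihood by gradient ascent.

    The gradient w.r.t. theta is sum_i (y_i - p_i) * x_i, which cannot
    be set to zero analytically, unlike ordinary least squares.
    """
    theta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(t * x for t, x in zip(theta, xi)))
            for j, xj in enumerate(xi):
                grad[j] += (yi - p) * xj
        theta = [t + lr * g / len(X) for t, g in zip(theta, grad)]
    return theta

# Toy 1-D problem with an intercept column: label is 1 when x > 0.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
theta = fit_logistic(X, y)
```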
Explain the steps in making a decision tree
Bias vs Variance
What are the three commonly used methods for finding the sweet spot between a simple and complicated model?
How does Random Forest work?
How does AdaBoost work?
How does GradientBoost work?
How does LightGBM work?
How do you build a random forest model?
Steps to build a random forest model:
(1) Randomly select ‘k’ features from a total of ‘m’ features, where k << m
(2) Among the ‘k’ features, calculate the node D using the best split point
(3) Split the node into daughter nodes using the best split
(4) Repeat steps two and three until leaf nodes are finalised
(5) Build forest by repeating steps one to four for ‘n’ times to create ‘n’ number of trees
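The five steps above can be sketched in plain Python; for brevity each tree here is a one-level stump rather than a fully grown tree, and the Gini criterion is assumed as the split-quality measure:

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_stump(X, y, feat_idx):
    """Steps (2)-(3): among the sampled features, find the best split point."""
    best = None
    for j in feat_idx:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (gini(left) * len(left) + gini(right) * len(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best

def build_forest(X, y, n_trees=51, k=2, seed=0):
    """Step (5): repeat steps (1)-(4) n_trees times."""
    rng = random.Random(seed)
    m = len(X[0])
    forest = []
    for _ in range(n_trees):
        # bootstrap sample of the rows, then step (1): 'k' of the 'm' features
        rows = [rng.randrange(len(X)) for _ in range(len(X))]
        Xb, yb = [X[i] for i in rows], [y[i] for i in rows]
        stump = best_stump(Xb, yb, rng.sample(range(m), k))
        if stump is not None:
            forest.append(stump)
    return forest

def predict(forest, x):
    """Classify by majority vote over all trees in the forest."""
    votes = [left if x[j] <= t else right for _, j, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]
```

Libraries such as scikit-learn implement the full version of this procedure (deep trees, per-node feature sampling) as `RandomForestClassifier`.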
How can you avoid overfitting your model?
Overfitting refers to a model that fits its training data too closely, capturing noise rather than the bigger picture, and therefore generalises poorly. The main methods to avoid overfitting are:
(1) Keep the model simple—take fewer variables into account, thereby removing some of the noise in the training data
(2) Use cross-validation techniques, such as k-fold cross-validation
(3) Use regularisation techniques, such as LASSO, that penalise certain model parameters if they’re likely to cause overfitting
(4) Use ensemble techniques, such as bagging and boosting
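Method (2) can be illustrated with a hand-rolled k-fold split (a minimal sketch; libraries such as scikit-learn provide this as `KFold`):

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Shuffle the n sample indices, then deal them into k folds.

    Each fold is held out once as a validation set while the model
    is trained on the remaining k-1 folds; the k scores are averaged.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = k_fold_indices(10, k=5)
for held_out in folds:
    train = [i for fold in folds if fold is not held_out for i in fold]
    # fit on `train`, evaluate on `held_out` here
    assert sorted(train + held_out) == list(range(10))  # folds partition the data
```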
Differentiate between univariate, bivariate, and multivariate analysis
Univariate data contains only one variable. The purpose of univariate analysis is to describe the data and find patterns that exist within it.
Example height of students: The patterns can be studied by drawing conclusions using mean, median, mode, dispersion or range, minimum, maximum, etc.
Bivariate data involves two different variables. The analysis of this type of data deals with causes and relationships and the analysis is done to determine the relationship between the two variables.
Example temperature and ice cream sales in the summer season: the data shows that temperature and sales are directly proportional to each other. The hotter the temperature, the better the sales.
Data involving three or more variables is categorised as multivariate. It is similar to bivariate analysis but contains more than one dependent variable.
Example data for house price prediction: patterns can be studied by drawing conclusions using mean, median, and mode, dispersion or range, minimum, maximum, etc. You can start describing the data and using it to guess what the price of the house will be.
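The univariate and bivariate examples above can be reproduced with the standard library (the numbers are made up for illustration):

```python
import statistics

# Univariate: describe one variable (student heights in cm)
heights = [150, 152, 155, 155, 160, 162, 165, 170]
summary = {
    "mean": statistics.mean(heights),
    "median": statistics.median(heights),
    "mode": statistics.mode(heights),
    "range": max(heights) - min(heights),
}

# Bivariate: Pearson correlation between temperature and ice cream sales
temps = [20, 25, 30, 35]
sales = [10, 18, 26, 34]
mt, ms = statistics.mean(temps), statistics.mean(sales)
cov = sum((t - mt) * (s - ms) for t, s in zip(temps, sales))
r = cov / (sum((t - mt) ** 2 for t in temps)
           * sum((s - ms) ** 2 for s in sales)) ** 0.5
# r close to 1 => directly proportional, as in the example above
```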
What are the feature selection methods used to select the right variables?
There are two main methods for feature selection: filter methods and wrapper methods.
Filter Methods:
- Linear discriminant analysis (LDA)
- PCA [X^TX feature vs feature, added myself]
- ANOVA [analysis of variance, SST = SSB + SSW]
- Chi-Square [test for mutually independent features, e.g. reject if significance level is below 5%]
Wrapper Methods:
- Forward Selection: We test one feature at a time and keep adding them until we get a good fit
- Backward Selection: We test all the features and start removing them to see what works better
- Recursive Feature Elimination: Recursively looks through all the different features and how they pair together
Note: Wrapper methods are very compute-intensive; because they repeatedly retrain the model, substantial computing power is needed when a lot of data is analysed this way.
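Forward selection can be sketched generically; here `score_fn` is a stand-in assumption for whatever model-quality metric is used (cross-validated accuracy, for instance):

```python
def forward_selection(features, score_fn, min_gain=1e-6):
    """Greedily add the single feature that most improves score_fn(subset),
    stopping when no candidate improves the score by at least min_gain."""
    selected, best_score = [], float("-inf")
    remaining = list(features)
    while remaining:
        # evaluate every remaining feature added to the current subset
        score, feat = max((score_fn(selected + [f]), f) for f in remaining)
        if score < best_score + min_gain:
            break
        selected.append(feat)
        remaining.remove(feat)
        best_score = score
    return selected
```

Backward selection is the mirror image: start from the full feature set and greedily drop the feature whose removal hurts the score least.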
Describe ANOVA
ANOVA = analysis of variance
SST = SSB + SSW,
where
SST = sum of squares total
SSB = … between
SSW = … within
X \in R^(m x n)
F-statistic = (SSB/(m-1)) / (SSW/(m*(n-1))) >> 1 => highly different features
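The decomposition and F-statistic can be checked numerically (the two toy groups are my own example; m groups of n samples each, matching the X in R^(m x n) layout above):

```python
def one_way_anova(groups):
    """Return (SST, SSB, SSW, F) for m equal-sized groups of n samples."""
    m, n = len(groups), len(groups[0])
    grand = sum(x for g in groups for x in g) / (m * n)
    means = [sum(g) / n for g in groups]
    ssb = n * sum((mu - grand) ** 2 for mu in means)                     # between
    ssw = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)  # within
    sst = sum((x - grand) ** 2 for g in groups for x in g)               # total
    f = (ssb / (m - 1)) / (ssw / (m * (n - 1)))
    return sst, ssb, ssw, f

# Two well-separated groups => SSB dominates SSW and F >> 1
sst, ssb, ssw, f = one_way_anova([[1, 2, 3], [11, 12, 13]])
```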
Describe Linear Discriminant Analysis
d^2 / (s_1^2 + s_2^2) = ideally large / small,
where d is the distance between the category means and s_i is the sample standard deviation within category i
(1) Maximise the distance between means
(2) Minimise the variation (scatter) within each category
LDA vs PCA:
- both try to reduce dimensions
- PCA looks at the features with the most variation, hence X^TX
- LDA tries to maximise the separation of categories and minimises variation
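The LDA criterion above, for a single feature and two categories, can be computed directly (the toy numbers are my own):

```python
import statistics

def fisher_criterion(class_a, class_b):
    """(distance between means)^2 / (s_a^2 + s_b^2): large when the
    categories are far apart (goal 1) and tight within themselves (goal 2)."""
    d = statistics.mean(class_a) - statistics.mean(class_b)
    return d ** 2 / (statistics.variance(class_a) + statistics.variance(class_b))

# Well-separated categories score much higher than overlapping ones
good = fisher_criterion([1, 2, 3], [11, 12, 13])
bad = fisher_criterion([1, 5, 9], [2, 6, 10])
```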