What is variance and how does it affect model performance?
Key point: Variance is central to the bias-variance tradeoff:
> High variance → overfitting; High bias → underfitting. Random Forest reduces variance via averaging multiple trees.
What is standard deviation, how is it related to variance, and how does it help in ML?
Key point: SD and variance are fundamental for scaling, normalization, and understanding model input distribution.
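A minimal stdlib sketch (hypothetical data) showing that SD is simply the square root of variance:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical feature values; mean = 5

var = statistics.pvariance(data)  # average squared deviation from the mean: 4.0
sd = statistics.pstdev(data)      # square root of the variance: 2.0
print(var, sd)
```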
What is covariance, and how is it different from correlation?
Covariance: Measures how two variables vary together.
Cov(X, Y) = (1/n) * Σ[(X_i - mean(X)) * (Y_i - mean(Y))]
Correlation: Normalized covariance, ranges [-1, 1]
r = Cov(X, Y) / (SD(X) * SD(Y))
Key point: Covariance shows joint variability; correlation shows strength and direction of linear relationship.
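The two formulas above can be checked by hand (hypothetical data; stdlib only):

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]  # y = 2x: a perfect linear relationship

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Covariance: average product of deviations from the means
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n

# Correlation: covariance normalized by the two standard deviations
sdx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / n)
sdy = math.sqrt(sum((yi - my) ** 2 for yi in y) / n)
r = cov / (sdx * sdy)

print(cov, r)  # cov depends on units; r is 1.0 up to float rounding
```

Note that doubling the units of y would double the covariance but leave r unchanged, which is why correlation is the unit-free measure.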
What is a probability distribution, and why is it important in ML?
Key point: Understanding the data distribution is essential for choosing models, preprocessing, and evaluating uncertainty.
What is the difference between a population and a sample?
Why it matters in ML:
Key point: Understanding the difference helps in estimating errors, confidence intervals, and generalization.
Define bias and variance in machine learning.
Bias-Variance Tradeoff:
Key point: Random Forest reduces variance via averaging; more flexible models (e.g., deeper networks) reduce bias but can increase variance.
What is skewness and kurtosis in data?
Importance in ML:
Key point: Skewness and kurtosis help understand distribution shape, outliers, and preprocessing needs.
What is a z-score, and why is it useful?
z = (x - mean) / SD
Key point: Z-scores normalize data for better model performance and comparability.
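A quick sketch of z-scoring a hypothetical list of scores; the transformed values have mean 0 and SD 1:

```python
import statistics

scores = [50, 60, 70, 80, 90]    # hypothetical exam scores
mean = statistics.mean(scores)   # 70
sd = statistics.pstdev(scores)   # population SD

# z-score: how many SDs each value lies from the mean
z = [(x - mean) / sd for x in scores]
print(z)  # mean of z is 0, SD of z is 1
```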
What is the Central Limit Theorem (CLT) and why is it important in ML?
Key point: CLT explains why averages of random samples tend to be predictable and normally distributed, which is very useful for ML statistics. It enables ML practitioners to apply probabilistic reasoning and statistical tests even on non-normal data, as long as the sample size is large.
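A small simulation (hypothetical, using an exponential population with mean 1) illustrating the CLT: sample means drawn from a skewed distribution still cluster around the population mean in a roughly normal pattern:

```python
import random
import statistics

random.seed(0)

# Draw the mean of n values from a highly skewed (exponential) population
def sample_mean(n):
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

# Collect many sample means; each sample has n = 100
means = [sample_mean(100) for _ in range(2000)]

# Mean of means is close to 1.0; spread is close to population SD / sqrt(n) = 0.1
print(statistics.mean(means), statistics.pstdev(means))
```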
What is an outlier, and why is it important in ML?
Key point: Identifying and handling outliers improves model accuracy and robustness.
Why does applying log or other monotonic transformations not break the relationship between features and target?
Key point: Transformations improve model fit and reduce skewness without reversing or breaking relationships between variables.
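One way to see this: a monotonic transform such as log preserves the ordering of values, so rank-based relationships with the target are unchanged (hypothetical income data):

```python
import math

# Hypothetical right-skewed income values
incomes = [20_000, 35_000, 50_000, 120_000, 1_000_000]
logged = [math.log(x) for x in incomes]

def ranks(xs):
    # position of each value in the sorted order
    order = sorted(xs)
    return [order.index(v) for v in xs]

# log is strictly increasing, so the ordering (and any monotonic
# relationship) is preserved even though the scale is compressed
print(ranks(incomes) == ranks(logged))  # True
```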
What are residuals in regression, and why are they important?
residual = actual_value - predicted_value
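A minimal sketch computing residuals against a hypothetical fitted line y_hat = 2x + 1:

```python
# Hypothetical observations and a hypothetical fitted line y_hat = 2x + 1
xs = [1, 2, 3]
actual = [3.2, 4.8, 7.1]
predicted = [2 * x + 1 for x in xs]  # [3, 5, 7]

# residual = actual_value - predicted_value
residuals = [a - p for a, p in zip(actual, predicted)]
print(residuals)  # approximately [0.2, -0.2, 0.1]: should look like random noise
```

If residuals show a pattern (a curve, a funnel shape), the model is missing structure in the data.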
What are the assumptions of linear regression?
Key point: Violating these assumptions can lead to biased, inefficient, or invalid estimates, so diagnostic checks and preprocessing are essential.
Why is multicollinearity a problem in linear regression? Give an example.
Example: house_size (m²) and number_of_rooms are usually highly correlated, so a linear model cannot cleanly separate their individual effects.
Key point: Multicollinearity does not reduce predictive power much for tree-based models, but for linear regression it destabilizes coefficients and interpretation.
You notice two features in your dataset are highly correlated. Why is this a problem for linear regression, and how can you fix it?
Key point: Handling multicollinearity improves model stability, interpretability, and prediction accuracy.
What is feature scaling, and why is it important in ML?
Min-max scaling: X_scaled = (X - X_min) / (X_max - X_min)
Standardization: X_scaled = (X - mean(X)) / SD(X)
Key point: Scaling ensures features contribute equally to model learning and improves training efficiency.
Why are some algorithms (gradient descent, k-NN, SVM, neural networks) sensitive to feature scale?
Key point: Feature scaling prevents domination by large-scale features, ensures balanced learning, and improves model convergence.
What’s the difference between normalization and standardization, and when should we use each?
Normalization (min-max): x' = (x - x_min) / (x_max - x_min)
Standardization (z-score): x' = (x - mean) / standard_deviation
Example:
If height ranges from 150–200 cm and weight from 40–120 kg, distance-based models will be dominated by the larger weight values; normalization maps both features to [0, 1], while standardization gives each mean 0 and SD 1.
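A small sketch applying min-max scaling to hypothetical height and weight values in those ranges:

```python
def min_max(xs):
    # Map values linearly onto [0, 1]
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

heights = [150, 160, 175, 200]   # cm
weights = [40, 60, 90, 120]      # kg

# After scaling, both features lie in [0, 1] and contribute comparably
print(min_max(heights))  # [0.0, 0.2, 0.5, 1.0]
print(min_max(weights))  # [0.0, 0.25, 0.625, 1.0]
```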
Which of the following best describes the variance of a dataset?
A) The average of all data points
B) The square root of the mean
C) The average of the squared differences from the mean
D) The difference between the maximum and minimum values
C
Variance measures how much the data points spread out from the mean. It is calculated as the average of the squared differences between each data point and the mean.
Question (True/False):
In machine learning, a high correlation between two features always implies that one feature causes the other.
False
True correlation does not imply causation.
Two features can be highly correlated due to a third hidden factor, coincidence, or data bias.
In ML, high correlation between features may indicate redundancy, which could affect model performance (e.g., multicollinearity in linear models).
The central limit theorem states that, given a sufficiently large sample size, the sampling distribution of the sample mean will be approximately ________, regardless of the population’s distribution.
The sampling distribution of the sample mean will be approximately normally distributed, regardless of the population’s distribution.
Explanation: No matter how skewed or irregular the population is, the means of sufficiently large samples pile up in an approximately normal (bell-shaped) pattern around the population mean.
Which of the following is an example of a probability distribution commonly used in machine learning?
A) Confusion matrix
B) Normal distribution
C) Decision tree
D) Gradient descent
B
Normal distribution is a continuous probability distribution widely used in ML for modeling data, assumptions in algorithms (like linear regression), and initializing weights in neural networks.
Other common distributions: Bernoulli, Binomial, Poisson, Exponential.
What is a p-value, and why does a low p-value allow us to reject the null hypothesis (H0)?
Key point: The p-value is the probability of observing results at least as extreme as the actual data, assuming H0 is true. A very low p-value means the observed data would be highly unlikely if H0 were true, so we reject H0.
Analogy: If you assume a coin is fair (H0) and it lands heads 99 times out of 100 flips, that outcome is so improbable under the fairness assumption that you reject the assumption.