What type of machine learning algorithm is logistic regression?
It is a supervised machine learning algorithm used for classification.
What is the primary output of a logistic regression model?
It produces a probability for each class (a single probability in the binary case), which is then used to assign data points to classes.
For what common task is logistic regression often used as a baseline model?
It is often used to create a baseline for classification tasks and to interpret the effect of input variables on the output.
What kind of input data does logistic regression require?
A dataset whose features may be continuous or categorical, provided they are represented numerically (for example, by one-hot encoding categories).
Identify two common use cases for logistic regression.
Any two of: credit scoring, medical diagnosis, and spam detection.
What are the three core concepts a junior data scientist should master for logistic regression?
Loss function, gradient descent, and basic evaluation methods like accuracy, precision, and recall.
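Example: a minimal sketch of the three evaluation metrics named above, for binary labels. The function name `classification_metrics` and the toy labels are illustrative assumptions, not part of the original cards.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical toy labels: 2 TP, 2 TN, 1 FP, 1 FN.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
```

Here all three metrics come out to 2/3: four of six predictions are correct, two of three predicted positives are real, and two of three real positives are found.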
What is the first step in the process of training a logistic regression model?
Initialize one weight per input feature and a bias term, typically to small random values (zeros also work, because the loss is convex).
Which function is minimized during the training of a logistic regression model?
The cross-entropy loss function is minimized on the training set.
The process of minimizing the loss function in logistic regression involves multiple iterations of _____.
gradient descent
What are the gradients of the cross-entropy loss with respect to the weights and bias used for?
They are used to perform the gradient descent update step.
Name one common stopping criterion for the training process of a logistic regression model.
A fixed number of epochs, convergence of the loss function, or maximum allowed training time.
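Example: the training steps described so far (initialize, compute the loss, compute gradients, update, stop) might be sketched as follows in plain Python. The function name, hyperparameter values, and toy dataset are assumptions chosen for illustration. Only two of the stopping criteria are shown: an epoch budget and loss convergence.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, max_epochs=5000, tol=1e-9, seed=0):
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    # Step 1: one small random weight per feature, plus a bias.
    w = [rng.uniform(-0.01, 0.01) for _ in range(n_features)]
    b = rng.uniform(-0.01, 0.01)
    losses = []
    eps = 1e-12  # keeps log() away from 0

    for _ in range(max_epochs):  # stopping criterion: fixed epoch budget
        # Forward pass: predicted probability for each sample.
        probs = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]
        # Mean binary cross-entropy over the training set.
        loss = -sum(
            yi * math.log(max(p, eps)) + (1 - yi) * math.log(max(1 - p, eps))
            for yi, p in zip(y, probs)
        ) / n_samples
        losses.append(loss)
        # Stopping criterion: the loss has (numerically) stopped changing.
        if len(losses) > 1 and abs(losses[-2] - loss) < tol:
            break
        # Gradients of the mean loss with respect to weights and bias.
        errors = [p - yi for p, yi in zip(probs, y)]
        grad_w = [
            sum(e * xi[j] for e, xi in zip(errors, X)) / n_samples
            for j in range(n_features)
        ]
        grad_b = sum(errors) / n_samples
        # Gradient descent update step.
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b, losses


# Hypothetical 1-D dataset with overlapping classes (not linearly separable).
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
y = [0, 0, 1, 0, 1, 1]
w, b, losses = train_logistic_regression(X, y)
```

The classes overlap on purpose: on linearly separable data the weights can grow without bound, so the convergence check would rarely trigger before the epoch budget runs out.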
What activation function is typically used in logistic regression for binary classification?
The sigmoid function.
What activation function is typically used in logistic regression for multiclass classification?
The softmax function.
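Example: a small sketch of softmax in plain Python. Subtracting the maximum logit before exponentiating is a common numerical-stability trick and is an addition here, not something the cards state.

```python
import math


def softmax(logits):
    """Map a list of real-valued logits to probabilities that sum to 1."""
    m = max(logits)  # shifting by the max avoids overflow in exp()
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a 3-class problem.
probs = softmax([2.0, 1.0, 0.1])
```

The largest logit receives the largest probability, and the shift by `m` leaves the result unchanged because softmax is invariant to adding a constant to every logit.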
How do linear regression and logistic regression differ in their loss functions?
Linear regression minimizes Mean Squared Error (MSE), while logistic regression minimizes Cross-Entropy Loss (log loss).
What is the purpose of the vectorized equation $z = w^T x + b$ in logistic regression?
It computes the linear combination of input features and weights, plus a bias term, before applying the activation function.
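Example: for a single sample, $z = w^T x + b$ is just a dot product plus the bias. The helper name `linear_logit` and the numbers below are illustrative assumptions.

```python
def linear_logit(w, x, b):
    """Compute z = w^T x + b for one sample."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Hypothetical weights, features, and bias.
w = [0.5, -1.0, 2.0]
x = [2.0, 1.0, 0.5]
b = 0.25
z = linear_logit(w, x, b)  # (1.0 - 1.0 + 1.0) + 0.25 = 1.25
```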
What does the cross-entropy loss function measure?
It measures the difference between predicted probabilities and actual labels across all training examples.
In the binary cross-entropy loss graph, what happens to the loss as the predicted probability approaches 0 when the true label is 1?
The loss approaches infinity.
In the binary cross-entropy loss graph, what happens to the loss as the predicted probability approaches 0 when the true label is 0?
The loss decreases, approaching 0.
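Example: the two cases above can be checked numerically with the per-example binary cross-entropy, $-[y \log p + (1 - y) \log(1 - p)]$. The probe values for `p` are arbitrary choices for illustration.

```python
import math


def binary_cross_entropy(y_true, p):
    """Per-example binary cross-entropy for a predicted probability 0 < p < 1."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


probabilities = (0.5, 0.1, 0.001)  # predicted probability moving toward 0

# True label 1: the loss grows without bound as p -> 0.
loss_y1 = [binary_cross_entropy(1, p) for p in probabilities]
# True label 0: the loss shrinks toward 0 as p -> 0.
loss_y0 = [binary_cross_entropy(0, p) for p in probabilities]
```

With these values, `loss_y1` is roughly 0.69, 2.30, 6.91, while `loss_y0` is roughly 0.69, 0.105, 0.001.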
Why is the convexity of the binary cross-entropy loss function advantageous for logistic regression?
Any local minimum of a convex function is also a global minimum, so gradient descent (with a suitable learning rate) cannot get stuck in a suboptimal local minimum.
Minimizing the binary cross-entropy loss is theoretically equivalent to maximizing what?
It is equivalent to maximizing the likelihood of the observed data.
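Sketch of why: assume $m$ independent training examples with labels $y_i \in \{0, 1\}$ and predicted probabilities $\hat{y}_i$ (this notation is introduced here for illustration). Under a Bernoulli model, the likelihood and its logarithm are

$$\mathcal{L}(w, b) = \prod_{i=1}^{m} \hat{y}_i^{\,y_i} (1 - \hat{y}_i)^{1 - y_i}$$

$$\log \mathcal{L}(w, b) = \sum_{i=1}^{m} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]$$

The mean binary cross-entropy is this log-likelihood negated and divided by $m$:

$$L = -\frac{1}{m} \log \mathcal{L}(w, b)$$

Because $-\frac{1}{m}$ is a negative constant and $\log$ is monotonic, the parameters that minimize $L$ are the same ones that maximize the likelihood.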
What do the equations for $dL/dW$ and $dL/db$ calculate in logistic regression?
They calculate the gradients of the loss with respect to the weights and bias; computing them is the backward pass, and the results feed the gradient descent update.
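Example: the cards do not reproduce the gradient equations, so the sketch below uses the standard closed forms for mean binary cross-entropy with a sigmoid output, $dL/dW = \frac{1}{m} X^T (\hat{y} - y)$ and $dL/db = \frac{1}{m} \sum_i (\hat{y}_i - y_i)$, and checks them against central finite differences. The data, parameter values, and function names are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, xi):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)


def loss(w, b, X, y):
    """Mean binary cross-entropy over the dataset."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict(w, b, xi)
        total += -(yi * math.log(p) + (1 - yi) * math.log(1 - p))
    return total / len(X)


def gradients(w, b, X, y):
    """Analytic gradients: mean of (prediction - label) times each input."""
    errors = [predict(w, b, xi) - yi for xi, yi in zip(X, y)]
    grad_w = [
        sum(e * xi[j] for e, xi in zip(errors, X)) / len(X)
        for j in range(len(w))
    ]
    grad_b = sum(errors) / len(X)
    return grad_w, grad_b


# Hypothetical data and parameters.
X = [[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0], [2.0, 0.5]]
y = [0, 1, 0, 1]
w, b = [0.1, -0.2], 0.05
grad_w, grad_b = gradients(w, b, X, y)

# Central finite differences as an independent check.
h = 1e-6
numeric_grad_w = []
for j in range(len(w)):
    w_plus, w_minus = list(w), list(w)
    w_plus[j] += h
    w_minus[j] -= h
    numeric_grad_w.append((loss(w_plus, b, X, y) - loss(w_minus, b, X, y)) / (2 * h))
numeric_grad_b = (loss(w, b + h, X, y) - loss(w, b - h, X, y)) / (2 * h)
```

If the analytic and numeric values agree to several decimal places, the gradient code is very likely correct. This kind of check is useful before trusting a hand-written backward pass.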
What is the formula for the gradient descent update step for the weights in logistic regression?
$W_{\text{new}} = W - \alpha \cdot \frac{dL}{dW}$, where $\alpha$ is the learning rate.
What is the purpose of the sigmoid function in binary logistic regression?
It transforms the unbounded linear output (logits) into a probability between 0 and 1.
What is the output of the sigmoid function for an input of 0?
The output is 0.5, representing maximum uncertainty.
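Example: a minimal sigmoid sketch in plain Python. The two-branch form is an added numerical-stability precaution (it avoids overflow in `exp` for large negative inputs), not something the cards prescribe.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # safe: z < 0, so exp(z) is below 1
    return ez / (1.0 + ez)


# Probe a few logits, including extreme ones.
values = {z: sigmoid(z) for z in (-1000.0, -2.0, 0.0, 2.0, 1000.0)}
```

`values[0.0]` is exactly 0.5, and every output stays within the range 0 to 1 even for the extreme inputs, where floating-point rounding saturates the result at 0.0 or 1.0.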