Logistic Regression Flashcards

(50 cards)

1
Q

What type of machine learning algorithm is logistic regression?

A

It is a supervised machine learning algorithm used for classification.

2
Q

What is the primary output of a logistic regression model?

A

It produces class probabilities (a single probability in the binary case, or one per class in the multiclass case), which are used to categorize data points.

3
Q

For what common task is logistic regression often used as a baseline model?

A

It is often used to create a baseline for classification tasks and to interpret the effect of input variables on the output.

4
Q

What kind of input data does logistic regression require?

A

A dataset where features are continuous or categorical, but must be represented numerically.

5
Q

Identify some common use cases for logistic regression.

A

Credit scoring, medical diagnosis, and spam detection are common use cases.

6
Q

What are the three core concepts a junior data scientist should master for logistic regression?

A

Loss function, gradient descent, and basic evaluation methods like accuracy, precision, and recall.

7
Q

What is the first step in the process of training a logistic regression model?

A

Initialize the weights (according to the number of input features) and a bias term to random numerical values.

8
Q

Which function is minimized during the training of a logistic regression model?

A

The cross-entropy loss function is minimized on the training set.

9
Q

The process of minimizing the loss function in logistic regression involves multiple iterations of _____.

A

gradient descent

10
Q

What are the gradients of the cross-entropy loss with respect to the weights and bias used for?

A

They are used to perform the gradient descent update step.

11
Q

Name one common stopping criterion for the training process of a logistic regression model.

A

A fixed number of epochs, convergence of the loss function, or maximum allowed training time.

12
Q

What activation function is typically used in logistic regression for binary classification?

A

The sigmoid function.

13
Q

What activation function is typically used in logistic regression for multiclass classification?

A

The softmax function.

14
Q

How do linear regression and logistic regression differ in their loss functions?

A

Linear regression minimizes Mean Squared Error (MSE), while logistic regression minimizes Cross-Entropy Loss (log loss).

15
Q

What is the purpose of the vectorized equation $z = w^T x + b$ in logistic regression?

A

It computes the linear combination of input features and weights, plus a bias term, before applying the activation function.

16
Q

What does the cross-entropy loss function measure?

A

It measures the difference between predicted probabilities and actual labels across all training examples.

17
Q

In the binary cross-entropy loss graph, what happens to the loss as the predicted probability approaches 0 when the true label is 1?

A

The loss approaches infinity.

18
Q

In the binary cross-entropy loss graph, what happens to the loss as the predicted probability approaches 0 when the true label is 0?

A

The loss decreases, approaching 0.

19
Q

Why is the convexity of the binary cross-entropy loss function advantageous for logistic regression?

A

It ensures that gradient descent can find a single global minimum.

20
Q

Minimizing the binary cross-entropy loss is theoretically equivalent to maximizing what?

A

It is equivalent to maximizing the likelihood of the observed data.

21
Q

What do the equations for $dL/dW$ and $dL/db$ calculate in logistic regression?

A

They calculate the gradients of the loss with respect to the weights and bias, used for the backward pass.

22
Q

What is the formula for the gradient descent update step for the weights in logistic regression?

A

$W_{\text{new}} = W - \alpha \cdot \frac{dL}{dW}$
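
The update rule can be sketched in a few lines of NumPy; the function name, learning rate, and values below are illustrative, not from the deck.

```python
import numpy as np

def gradient_descent_step(W, b, dW, db, alpha=0.1):
    """Apply one gradient descent update to the weights and bias.

    alpha is the learning rate; dW and db are the gradients of the
    loss with respect to W and b from the backward pass.
    """
    W_new = W - alpha * dW  # step against the gradient
    b_new = b - alpha * db
    return W_new, b_new
```

The bias is updated the same way, using its own gradient $\frac{dL}{db}$.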

23
Q

What is the purpose of the sigmoid function in binary logistic regression?

A

It transforms the unbounded linear output (logits) into a probability between 0 and 1.

24
Q

What is the output of the sigmoid function for an input of 0?

A

The output is 0.5, representing maximum uncertainty.

25
Q

What is the formula for the sigmoid activation function?

A

$\sigma(z) = \frac{1}{1 + e^{-z}}$
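
As a quick sketch, the formula translates directly into NumPy:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: maps any real logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))
```

Note that `sigmoid(0)` returns 0.5, matching card 24's point about maximum uncertainty.
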
26
Q

The softmax function is used for _____ classification, while the sigmoid function is used for _____ classification.

A

multiclass, binary

27
Q

What is a key property of the probabilities output by the softmax function?

A

The sum of all the output probabilities is equal to 1.
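
A minimal NumPy sketch of softmax; subtracting the maximum logit is a standard numerical-stability trick, not part of the mathematical definition.

```python
import numpy as np

def softmax(z):
    """Softmax activation: converts a logit vector to class probabilities."""
    e = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return e / e.sum()         # normalizing guarantees the outputs sum to 1
```
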
28
Q

What is a major limitation of logistic regression that can be mitigated by L1/L2 regularization?

A

Overfitting.

29
Q

How does regularization help mitigate overfitting in logistic regression?

A

It adds a weight penalty to the loss term, encouraging smaller weights and thus reducing variance.

30
Q

What key assumption does logistic regression make about the relationship between inputs and the log odds of the prediction?

A

It assumes a linear relationship.

31
Q

Name an alternative algorithm to use when the assumptions of logistic regression, like linearity, are not met.

A

Decision trees or ensemble algorithms like Random Forest.

32
Q

How can sensitivity to imbalanced training data be addressed in logistic regression?

A

By undersampling the majority class, oversampling the minority class, or applying class weights during training.
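
Of those three remedies, class weights are the easiest to try in `scikit-learn`; the toy dataset below is purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Illustrative imbalanced dataset: 180 negatives, 20 positives
X = np.vstack([rng.normal(-1.0, 1.0, (180, 2)), rng.normal(1.0, 1.0, (20, 2))])
y = np.array([0] * 180 + [1] * 20)

# class_weight='balanced' reweights each class inversely to its frequency,
# so the minority class is not drowned out during training
clf = LogisticRegression(class_weight="balanced").fit(X, y)
```
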
33
Q

In the `scikit-learn` library, what class is used to implement a logistic regression model?

A

The `LogisticRegression` class from `sklearn.linear_model`.

34
Q

In `scikit-learn`, which method is called on a `LogisticRegression` object to train it with data?

A

The `.fit(X_train, y_train)` method.
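
Together, the two cards above amount to only a few lines; the synthetic dataset here is for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative binary classification dataset
X_train, y_train = make_classification(n_samples=200, n_features=4, random_state=0)

clf = LogisticRegression()
clf.fit(X_train, y_train)                 # train the model
probs = clf.predict_proba(X_train[:5])    # per-class probabilities
preds = clf.predict(X_train[:5])          # hard 0/1 labels
```
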
35
Q

How are the coefficients in logistic regression interpreted?

A

They indicate the change in the log-odds of the outcome for a one-unit increase in the predictor variable.

36
Q

A _____ coefficient in logistic regression increases the likelihood of the outcome.

A

positive
37
Q

Why is accuracy a poor evaluation metric for a spam email classifier where most emails are not spam?

A

A model can achieve high accuracy by simply classifying all emails as non-spam, giving a false sense of good performance.

38
Q

What three metrics are better suited than accuracy for evaluating a spam classifier with imbalanced data?

A

Precision, recall, and the F1 score.
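
The accuracy trap from card 37 is easy to demonstrate; the tiny label arrays below are made up for illustration.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Illustrative imbalanced ground truth: 8 non-spam (0), 2 spam (1)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0] * 10  # a trivial classifier that never flags spam

acc = accuracy_score(y_true, y_pred)                  # 0.8: looks good
prec = precision_score(y_true, y_pred, zero_division=0)
rec = recall_score(y_true, y_pred, zero_division=0)   # 0.0: catches no spam
f1 = f1_score(y_true, y_pred, zero_division=0)        # 0.0
```
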
39
Q

What is the relationship between minimizing the cross-entropy loss function and the maximum likelihood principle?

A

Minimizing the cross-entropy loss is equivalent to maximizing the log-likelihood of the model's probability distribution given the data.
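
The equivalence can be sketched in one line: assuming independent Bernoulli-distributed labels, the likelihood of the data is the product below, and its averaged negative log is exactly the cross-entropy loss of card 43.

$\mathcal{L} = \prod_{i=1}^{m} \hat{y}_i^{\,y_i}(1-\hat{y}_i)^{1-y_i} \;\Longrightarrow\; -\frac{1}{m}\log \mathcal{L} = -\frac{1}{m}\sum_{i=1}^{m} \left( y_i \log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i) \right) = L$
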
40
Q

What is an advantage of using minibatch stochastic gradient descent (SGD) regarding memory usage?

A

Smaller batches of a large training set can more easily fit into CPU/GPU memory.

41
Q

What is the effect of using a smaller batch size in minibatch SGD?

A

Using smaller batch sizes has a regularization effect and can help the model generalize better to unseen data.

42
Q

What is the effect of using a larger batch size in minibatch SGD?

A

Larger batch sizes provide more accurate gradient estimates and may result in faster convergence.

43
Q

What is the name of the function represented by the formula $L = -\frac{1}{m} \sum_{i=1}^{m} (y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i))$?

A

The Cross-Entropy Loss function (or log loss).
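
Cards 47-48 refer to a Python code example that is not reproduced in this deck; the sketch below is one plausible implementation of the `binary_cross_entropy(y_pred, y)` function they name, following the formula above.

```python
import numpy as np

def binary_cross_entropy(y_pred, y):
    """Average cross-entropy loss over m examples (the formula above)."""
    eps = 1e-12                             # clip to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred))
```

The clipping step is a practical safeguard, not part of the formula: without it, a confident wrong prediction ($\hat{y} = 0$ with $y = 1$) would produce an infinite loss, as card 17 notes.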
44
Q

What role does the learning rate, denoted by $\alpha$, play in the gradient descent step equation?

A

It controls the size of the step taken to update the weights and bias in the direction that minimizes the loss function.

45
Q

In the context of logistic regression, what is the 'forward pass'?

A

It is the process of making predictions by computing the linear combination of inputs and passing it through an activation function.

46
Q

In the context of logistic regression, what is the 'backward pass'?

A

It is the process of calculating the gradients of the loss function with respect to the model's parameters (weights and bias).

47
Q

What is the purpose of the `predict(X, W, b)` function in the provided Python code example?

A

It makes predictions by taking input features, weights, and a bias, computing their dot product, and applying the sigmoid function.

48
Q

What does the `binary_cross_entropy(y_pred, y)` function compute in the Python code?

A

It computes the binary cross-entropy loss between the predicted probabilities and the ground truth labels.

49
Q

The training loop in the provided Python code iterates over the data in batches. This implementation demonstrates _____.

A

minibatch stochastic gradient descent
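
The training loop the cards refer to is not included in the deck; the sketch below shows what a minibatch-SGD loop for binary logistic regression typically looks like. All names and hyperparameters are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, alpha=0.1, epochs=100, batch_size=32, seed=0):
    """Train binary logistic regression with minibatch SGD (a sketch)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.normal(scale=0.01, size=n)       # small random initial weights
    b = 0.0                                  # bias term
    for _ in range(epochs):
        order = rng.permutation(m)           # reshuffle every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            y_pred = sigmoid(X[idx] @ W + b)         # forward pass
            err = y_pred - y[idx]
            dW = X[idx].T @ err / len(idx)           # backward pass
            db = err.mean()
            W -= alpha * dW                          # gradient descent step
            b -= alpha * db
    return W, b
```
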
50
Q

What does logistic regression assume about multicollinearity among input features?

A

It assumes the absence of high multicollinearity between the independent variables.