ML: Classification Flashcards

(39 cards)

1
Q

What is the confusion matrix and why is it important?

A

A table summarizing TP, TN, FP, FN. It underpins all classification metrics (precision, recall, F1, specificity) and reveals model behavior beyond accuracy.
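A minimal sketch of how the standard metrics fall out of the four cells (the counts are toy, hypothetical values for illustration only):

```python
# Deriving the standard classification metrics from confusion-matrix cells.
# Counts are toy/hypothetical.
tp, tn, fp, fn = 80, 90, 10, 20

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)               # reliability of positive calls
recall = tp / (tp + fn)                  # a.k.a. sensitivity / TPR
specificity = tn / (tn + fp)             # a.k.a. TNR
f1 = 2 * precision * recall / (precision + recall)
```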

2
Q

When is accuracy a misleading metric?

A

In imbalanced datasets (e.g., fraud, disease detection) where accuracy hides minority class failures.
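A quick illustration with toy (hypothetical) data: a degenerate "always predict negative" model on 1% positives scores 99% accuracy while catching none of the positives.

```python
# 1 positive among 100 examples; the model never predicts the positive class.
y_true = [1] * 1 + [0] * 99
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1) / sum(y_true)
# accuracy is 0.99 yet recall on the minority class is 0.0
```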

3
Q

What is the difference between precision and recall?

A

Precision: of predicted positives, how many are correct? TP/(TP + FP). High precision means the model's positive predictions are highly reliable; it avoids false alarms (e.g., spam filtering).

Recall: of actual positives, how many did we catch? TP/(TP + FN). High recall means the model catches most positive instances; it avoids missed detections (e.g., cancer/disease screening).

4
Q

How do you choose between precision and recall?

A

Depends on business risk:

Precision priority → expensive FP (e.g., sending human reviewers)

Recall priority → expensive FN (e.g., medical diagnosis)

5
Q

Why is F1-score the harmonic mean of precision and recall?

A

F1 = 2PR/(P + R), the harmonic mean. It penalizes extreme imbalance by requiring both precision and recall to be high; a single low value drags the score down.
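A quick check on toy (hypothetical) values: one low component collapses F1, whereas the arithmetic mean would mask the failure.

```python
# F1 as 2PR / (P + R): low on lopsided precision/recall pairs.
def f1(p, r):
    return 2 * p * r / (p + r)

balanced = f1(0.8, 0.8)      # 0.8
lopsided = f1(0.99, 0.01)    # ~0.02, though the arithmetic mean is 0.5
```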

6
Q

What does the ROC curve represent?

A

True Positive Rate vs. False Positive Rate trade-off at various decision thresholds.

7
Q

What does AUC measure?

A

The probability that the classifier ranks a random positive higher than a random negative. Equivalent to ranking quality.
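AUC computed directly from this probabilistic definition: the fraction of (positive, negative) score pairs ranked correctly, ties counted as half. The scores are toy, hypothetical values.

```python
# AUC as pairwise ranking probability over positive/negative score pairs.
def auc(pos_scores, neg_scores):
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

score = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])   # 8 of 9 pairs correctly ordered
```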

8
Q

When is PR curve preferred over ROC?

A

Highly imbalanced datasets → PR curve is more sensitive to performance on the positive class.

9
Q

How are ROC/AUC used in LLM agent evaluation?

A

Tool selection classification

Retrieval relevance classifier

Routing tasks (which LLM or tool to pick)

Safety classifiers (detect harmful intent)

AUC helps evaluate ranking quality of these decision layers.

10
Q

Why can’t logistic regression be solved with a closed-form solution?

A

The log-likelihood is concave but not quadratic; setting its gradient (which involves the sigmoid) to zero yields transcendental equations with no algebraic solution → requires iterative optimization (e.g., gradient descent or Newton's method).
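A minimal sketch of that iterative fit: gradient ascent on the log-likelihood for 1-D logistic regression, on toy (hypothetical) data.

```python
import math

# No closed form exists, so w and b are found by iterating on the gradient.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0, 0, 0, 1, 1]                 # toy labels

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    # gradient of the log-likelihood w.r.t. w and b
    gw = sum((y - sigmoid(w * x + b)) * x for x, y in zip(xs, ys))
    gb = sum(y - sigmoid(w * x + b) for x, y in zip(xs, ys))
    w += lr * gw
    b += lr * gb
```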

11
Q

How do regularization terms affect logistic regression?

A

L1 (lasso) → sparse weights, acting as feature selection

L2 (ridge) → stable, shrunken coefficients; mitigates multicollinearity
Regularization improves generalization.

12
Q

How to interpret logistic regression coefficients?

A

Exponentiating a coefficient yields its odds ratio; positive weights increase the log-odds (and hence the probability) of class 1.
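A concrete reading, using a hypothetical fitted coefficient of 0.7: each one-unit increase in the feature multiplies the odds of class 1 by exp(0.7) ≈ 2.01.

```python
import math

# Odds ratio from a (hypothetical) logistic regression coefficient.
coef = 0.7
odds_ratio = math.exp(coef)      # ~2.01: one unit roughly doubles the odds

# A negative coefficient shrinks the odds instead: exp(-0.7) ≈ 0.50.
neg_odds_ratio = math.exp(-coef)
```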

13
Q

Why is logistic regression still important in LLM pipelines?

A

Lightweight safety and routing classifiers

Calibration layers for confidence

Reward model pre-steps

Linear probing on embeddings

Its interpretability makes it especially valuable in AI agent decision layers.

14
Q

Why is Naive Bayes “naive”?

A

It assumes conditional independence between features given the class.

15
Q

Why does Naive Bayes often perform surprisingly well?

A

Even with violated independence assumptions, the ranking of class probabilities often remains correct.

16
Q

When is Naive Bayes especially effective?

A

Text classification

High-dimensional sparse features

Real-time or low-latency applications

17
Q

What is Laplace smoothing and why is it used?

A

Adds pseudo-counts to avoid zero probabilities for unseen words/features.
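A minimal sketch with a toy (hypothetical) corpus and vocabulary: add-one smoothing gives an unseen word a small non-zero likelihood instead of zeroing out the whole class-probability product.

```python
from collections import Counter

# Word likelihoods for a "spam" class with Laplace (add-one) smoothing.
spam_tokens = ["win", "cash", "now", "win"]      # toy training tokens
counts = Counter(spam_tokens)
vocab = {"win", "cash", "now", "hello"}          # "hello" never seen in spam

def p_word_given_spam(word, alpha=1):
    return (counts[word] + alpha) / (len(spam_tokens) + alpha * len(vocab))

p_unseen = p_word_given_spam("hello")   # 1/8, not 0
p_seen = p_word_given_spam("win")       # 3/8
```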

18
Q

What is the intuition behind the SVM margin?

A

SVM maximizes the smallest distance between the decision boundary and any training point → robustness to noise.

19
Q

Why are kernel methods powerful?

A

They implicitly map data to high-dimensional spaces without explicitly computing those features (via kernel trick).

20
Q

When are SVMs not a good choice?

A

Very large datasets → slow training

Very large feature spaces without kernel approximation

When calibrated probabilistic outputs are needed (unless Platt scaling is applied)

21
Q

What is the role of C in SVM?

A

Controls trade-off between maximizing margin and minimizing misclassification.

Large C → heavily penalizes misclassification (narrower margin, risk of overfitting)

Small C → prioritizes a larger margin (more tolerance for training errors)

22
Q

What is entropy in a classification tree?

A

A measure of uncertainty; lower entropy → purer nodes.

H(p) = −∑ p_i log p_i

23
Q

What is Gini impurity?

A

A similar impurity measure that avoids logarithms, so it is faster to compute; in practice it usually produces splits very close to entropy's.

G = ∑ p_i (1 − p_i)
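Both impurity measures computed from a node's class-probability vector: a pure node scores 0 on both, and a 50/50 binary node is maximally impure.

```python
import math

# Entropy and Gini impurity over class probabilities (which sum to 1).
def entropy(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

def gini(ps):
    return sum(p * (1 - p) for p in ps)
```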

24
Q

Why do decision trees overfit easily?

A

They split until leaves become pure, capturing noise. Pruning or limiting depth is needed.

25
Q

How do you prevent overfitting in trees?

A

Limit depth

Minimum samples per split/leaf

Pruning

Ensembles (RF, boosting)
26
Q

Why does randomization help random forests?

A

Two components reduce variance:

Bootstrap sampling

Random feature selection per split
27
Q

Why do random forests have low bias and low variance?

A

Each tree has low bias but high variance → averaging many decorrelated trees reduces the variance while retaining the low bias.
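A simulated (hypothetical) illustration of the statistical core of bagging: averaging independent, unbiased but noisy estimators cuts variance roughly by the ensemble size while leaving the bias unchanged.

```python
import random

# Compare the spread of single noisy estimates vs. averages of 25 of them.
random.seed(0)

def noisy_estimate():
    return 1.0 + random.gauss(0.0, 1.0)    # unbiased, variance ~1

def variance(vs):
    m = sum(vs) / len(vs)
    return sum((v - m) ** 2 for v in vs) / len(vs)

singles = [noisy_estimate() for _ in range(2000)]
averaged = [sum(noisy_estimate() for _ in range(25)) / 25 for _ in range(2000)]
# variance(averaged) is ~25x smaller; both means stay near 1.0
```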
28
Q

What hyperparameters matter most?

A

max_depth

n_estimators

max_features

min_samples_split / min_samples_leaf
29
Q

How do RFs handle imbalanced data?

A

Class weights

Balanced bootstrapping

Threshold moving

SMOTE + RF
30
Q

How does boosting differ from bagging?

A

Bagging: trains independent models in parallel

Boosting: trains models sequentially, each correcting its predecessor's errors
31
Q

Why can boosting overfit less than expected?

A

Boosting focuses on fitting hard examples, but shrinkage (a small learning rate), depth limits, and regularization prevent runaway fitting.
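A minimal sketch of shrinkage in gradient boosting, using 1-D regression stumps on toy (hypothetical) data: each stage fits the current residuals, and its contribution is scaled down by the learning rate so no single stage can dominate.

```python
# Gradient boosting with decision stumps on 1-D data.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.0, 1.0, 1.0]            # toy targets

def fit_stump(xs, residuals):
    # pick the threshold split minimizing squared error on the residuals
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left) if left else 0.0
        rm = sum(right) / len(right) if right else 0.0
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

learning_rate = 0.3                  # shrinkage
pred = [0.0] * len(xs)
for _ in range(50):
    residuals = [y - p for y, p in zip(ys, pred)]
    stump = fit_stump(xs, residuals)
    pred = [p + learning_rate * stump(x) for p, x in zip(pred, xs)]
```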
32
Q

Why is XGBoost so effective?

A

Second-order optimization

Regularization

Built-in handling of missing values

Histogram-based splits

Parallelization

Tree pruning with loss-guided growth
33
Q

How do you fix overfitting in boosting?

A

Increase regularization (λ, α)

Reduce tree depth

Reduce the learning rate

Increase min child weight

Decrease the number of estimators
34
Q

Why do even sophisticated LLM agents rely heavily on classical classifiers?

A

Agents need deterministic, debuggable, low-latency decision nodes:

Safety classifiers

Intent detection

Routing (choosing a model/tool)

Hallucination detection

Flagging harmful content

Traditional ML models excel here.
35
Q

How is boosting used in LLM / agent systems?

A

Ranking models (LightGBM/XGBoost) for retrieval re-ranking

Safety classification

Tool routing based on embedding features

Reward-modeling features

Boosting is widely used because it is interpretable, fast, and structured-data-friendly.
36
Q

How is classification used in RAG systems?

A

Relevance classification

Chunk ranking with boosting models

Hallucination detection

Query-type classification

Document quality scoring
37
Q

Why is AUC valuable in retrieval scoring?

A

Because ranking matters more than raw classification: AUC reflects how correctly relevant chunks are ordered above irrelevant ones.
38
Q

How are SVMs / logistic regression used in embedding space?

A

Linear classifiers trained on embeddings for:

Topic classification

Content moderation

Multi-label tagging

Safety filters

Weak supervision of agent behavior
39
Q

The agent frequently chooses the wrong tool. What ML approach helps?

A

Use a structured classifier (logistic regression or boosted trees) on features like:

last agent step

intent class

embedding of the user query

historical success rate

This improves deterministic routing.