What is the confusion matrix and why is it important?
A table summarizing TP, TN, FP, FN. It underpins all classification metrics (precision, recall, F1, specificity) and reveals model behavior beyond accuracy.
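A minimal sketch in plain Python (the function name `confusion_counts` is illustrative, not from any particular library) showing how the four cells are tallied for a binary problem:

```python
def confusion_counts(y_true, y_pred):
    # Tally the four cells of a binary confusion matrix (labels are 0/1).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn
```

Every metric below (precision, recall, F1, specificity) is a ratio of these four counts.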
When is accuracy a misleading metric?
In imbalanced datasets (e.g., fraud, disease detection) where accuracy hides minority class failures.
What is the difference between precision and recall?
Precision: Of predicted positives, how many are correct? TP/(TP + FP). High precision means the model's positive predictions are highly reliable; it helps avoid false alarms (e.g., a spam filter).
Recall: Of actual positives, how many did we catch? TP/(TP + FN). High recall means the model catches most positive instances; it helps avoid missed detections (e.g., cancer/disease screening).
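The two formulas above, written directly from the confusion-matrix counts (a sketch; the function names are illustrative):

```python
def precision(tp, fp):
    # Of predicted positives, the fraction that are correct.
    return tp / (tp + fp)

def recall(tp, fn):
    # Of actual positives, the fraction we caught.
    return tp / (tp + fn)
```

Note the shared numerator TP: the two metrics differ only in which error type sits in the denominator.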
How do you choose between precision and recall?
Depends on business risk:
Precision priority → expensive FP (e.g., sending human reviewers)
Recall priority → expensive FN (e.g., medical diagnosis)
Why is F1-score the harmonic mean of precision and recall?
The harmonic mean is dominated by the smaller of the two values, so F1 is high only when both precision and recall are high; a model cannot hide poor recall behind perfect precision (or vice versa).
What does the ROC curve represent?
True Positive Rate vs. False Positive Rate trade-off at various decision thresholds.
What does AUC measure?
The probability that the classifier ranks a random positive higher than a random negative. Equivalent to ranking quality.
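The ranking interpretation can be computed directly by comparing every (positive, negative) score pair; a sketch in plain Python (the function name is illustrative):

```python
def auc_rank(pos_scores, neg_scores):
    # AUC = fraction of (positive, negative) pairs the classifier ranks
    # correctly; ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

This pairwise form is O(n²) but makes the "probability of correct ranking" definition concrete; production libraries compute the same quantity from the ROC curve.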
When is PR curve preferred over ROC?
Highly imbalanced datasets → PR curve is more sensitive to performance on the positive class.
How are ROC/AUC used in LLM agent evaluation?
Tool selection classification
Retrieval relevance classifier
Routing tasks (which LLM or tool to pick)
Safety classifiers (detect harmful intent)
AUC helps evaluate ranking quality of these decision layers.
Why can’t logistic regression be solved with a closed-form solution?
The log-likelihood is concave but not quadratic → setting its gradient to zero gives transcendental equations with no algebraic solution → requires iterative optimization (e.g., gradient descent or Newton's method).
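A minimal sketch of that iterative approach for a single-feature model (gradient ascent on the log-likelihood; the function names and learning-rate choice are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_1d(xs, ys, lr=0.1, steps=2000):
    # No closed form exists, so we iterate: each step moves (w, b) along the
    # gradient of the log-likelihood, sum over i of (y_i - sigmoid(w*x_i + b)).
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw = sum((y - sigmoid(w * x + b)) * x for x, y in zip(xs, ys))
        gb = sum(y - sigmoid(w * x + b) for x, y in zip(xs, ys))
        w += lr * gw
        b += lr * gb
    return w, b
```

Contrast with ordinary least squares, where the normal equations give the solution in one linear-algebra step.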
How do regularization terms affect logistic regression?
L1 → sparse features
L2 → stable coefficients, mitigates multicollinearity
Regularization improves generalization.
How to interpret logistic regression coefficients?
Exponentiating coefficients yields odds ratios; positive weights → increase log-odds of being class 1.
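A one-liner illustrating the odds-ratio interpretation (the function name is illustrative):

```python
import math

def odds_ratio(coef):
    # exp(beta): the multiplicative change in the odds of class 1
    # per one-unit increase in the feature.
    return math.exp(coef)
```

For example, a coefficient of 0.7 means each unit increase in that feature roughly doubles the odds of class 1, since e^0.7 ≈ 2.01.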
Why is logistic regression still important in LLM pipelines?
Lightweight safety and routing classifiers
Calibration layers for confidence
Reward model pre-steps
Linear probing on embeddings
Its interpretability makes it especially valuable in AI agent decision layers.
Why is Naive Bayes “naive”?
It assumes conditional independence between features given the class.
Why does Naive Bayes often perform surprisingly well?
Even with violated independence assumptions, the ranking of class probabilities often remains correct.
When is Naive Bayes especially effective?
Text classification
High-dimensional sparse features
Real-time or low-latency applications
What is Laplace smoothing and why is it used?
Adds pseudo-counts to avoid zero probabilities for unseen words/features.
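A sketch of add-alpha smoothing (alpha = 1 gives classic Laplace smoothing; the function name is illustrative):

```python
def smoothed_prob(count, total, vocab_size, alpha=1.0):
    # Add alpha pseudo-counts to every vocabulary item so that an unseen
    # word gets a small nonzero probability instead of zeroing out the
    # whole product of likelihoods.
    return (count + alpha) / (total + alpha * vocab_size)
```

Without smoothing, a single unseen word in a test document would drive the entire Naive Bayes class likelihood to zero.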
What is the intuition behind the SVM margin?
SVM maximizes the smallest distance between the decision boundary and any training point → robustness to noise.
Why are kernel methods powerful?
They implicitly map data to high-dimensional spaces without explicitly computing those features (via kernel trick).
When are SVMs not a good choice?
Very large datasets → slow training
Very large feature spaces without kernel approximation
When probabilistic outputs are needed (unless calibrated, e.g., via Platt scaling)
What is the role of C in SVM?
Controls trade-off between maximizing margin and minimizing misclassification.
Large C → focus on correctness
Small C → focus on larger margin
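The trade-off is explicit in the soft-margin primal objective, sketched here in plain Python (the function name is illustrative): a margin term plus C-weighted hinge losses.

```python
def soft_margin_objective(w, b, data, C):
    # SVM primal objective: 0.5*||w||^2 + C * sum of hinge losses.
    # data is a list of (x, y) pairs with y in {-1, +1}.
    margin_term = 0.5 * sum(wi * wi for wi in w)
    hinge = sum(max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))
                for x, y in data)
    return margin_term + C * hinge
```

As C grows, hinge violations dominate the objective, so the optimizer prioritizes classifying training points correctly; as C shrinks, the margin term dominates and the boundary widens.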
What is entropy in a classification tree?
A measure of uncertainty; lower entropy → purer nodes.
H(p) = −∑ᵢ pᵢ log pᵢ
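The formula in plain Python, using log base 2 so entropy is measured in bits (a common convention; the function name is illustrative):

```python
import math

def entropy(probs):
    # H(p) = -sum(p_i * log2(p_i)); terms with p_i = 0 contribute nothing,
    # since the limit of p*log(p) as p -> 0 is 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

A 50/50 node has maximal uncertainty (1 bit); a pure node has entropy 0.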
What is Gini impurity?
An alternative impurity measure that behaves very similarly to entropy in practice but is computationally cheaper (no logarithm).
G = ∑ᵢ pᵢ(1 − pᵢ)
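The same formula in plain Python (function name illustrative), for direct comparison with entropy above:

```python
def gini(probs):
    # G = sum(p_i * (1 - p_i)): the probability of misclassifying a sample
    # drawn and labeled at random from the node's class distribution.
    return sum(p * (1 - p) for p in probs)
```

Like entropy, Gini is 0 for a pure node and maximal at a 50/50 split (0.5 for two classes), which is why the two criteria usually pick the same splits.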
Why do decision trees overfit easily?
They split until leaves become pure, capturing noise. Pruning or limiting depth is needed.