Dddm Flashcards

(64 cards)

1
Q

What is statistical learning?

A

A set of methods for estimating an unknown relationship between predictors and a response, used for prediction or inference.

2
Q

What is the difference between prediction and inference?

A

Prediction focuses on accurately predicting future outcomes, while inference focuses on understanding relationships between variables.

3
Q

What is the bias–variance trade-off?

A

More flexible models reduce bias but increase variance; optimal performance balances both to minimize test error.

4
Q

Why is test error more important than training error?

A

Training error is optimistically biased, while test error reflects performance on unseen data.

5
Q

What is the difference between supervised and unsupervised learning?

A

Supervised learning uses labeled outcomes, while unsupervised learning identifies structure without known responses.

6
Q

Why is linear regression unsuitable for classification?

A

It can produce predictions outside the [0,1] interval and does not model class probabilities correctly.

7
Q

What does logistic regression model?

A

The log-odds of the probability that an observation belongs to a class as a linear function of predictors.

8
Q

How are coefficients interpreted in logistic regression?

A

A one-unit increase in a predictor multiplies the odds by e^β, holding other variables constant.
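A small numerical check of the odds interpretation, as a sketch only. The intercept `b0`, the coefficient `b1`, and the two predictor values are made-up numbers, not taken from the course material.

```python
import math

def prob(x, b0, b1):
    """Logistic model: P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    """Odds corresponding to probability p."""
    return p / (1.0 - p)

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -1.0, 0.7

# Ratio of the odds at x = 3 to the odds at x = 2 (a one-unit increase).
odds_ratio = odds(prob(3.0, b0, b1)) / odds(prob(2.0, b0, b1))

# The two printed values should agree up to floating-point error.
print(odds_ratio, math.exp(b1))
```

The ratio does not depend on the starting value of x, which is why the coefficient can be read as a constant multiplicative effect on the odds.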

9
Q

What is a decision threshold in logistic regression?

A

A cutoff probability used to convert predicted probabilities into class labels.

10
Q

Why is the choice of threshold important?

A

Different thresholds change the balance between false positives and false negatives and affect economic outcomes.
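To make the trade-off concrete, the sketch below counts false positives and false negatives at a few thresholds. The predicted probabilities and labels are invented for illustration, and `confusion_counts` is a hypothetical helper, not a library function.

```python
def confusion_counts(probs, labels, threshold):
    """Return (tp, fp, fn, tn) when class 1 is predicted if prob >= threshold."""
    tp = fp = fn = tn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Made-up predicted probabilities and true labels, for illustration only.
probs = [0.05, 0.20, 0.35, 0.45, 0.55, 0.70, 0.80, 0.95]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

for t in (0.3, 0.5, 0.75):
    tp, fp, fn, tn = confusion_counts(probs, labels, t)
    print(f"threshold={t}: FP={fp}, FN={fn}")
```

On this toy data, raising the threshold from 0.3 to 0.75 removes both false positives but introduces two false negatives.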

11
Q

Why do we use cross-validation?

A

To estimate test error and compare models when a separate test set is unavailable.

12
Q

What is the difference between a validation set and k-fold cross-validation?

A

A validation set is simpler but noisier, while k-fold cross-validation is more stable and data-efficient.
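A minimal sketch of how k-fold splitting assigns observations to folds. `k_fold_indices` is a hypothetical helper written for this card; it uses contiguous folds and assumes the data has already been shuffled if the row order carries information.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size.

    Returns a list of (train_indices, validation_indices) pairs.
    """
    # The first n % k folds receive one extra observation.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, val))
        start += size
    return splits

for train, val in k_fold_indices(n=7, k=3):
    print("train:", train, "validation:", val)
```

Every observation is used for validation exactly once and for training k - 1 times, which is where the stability and data efficiency come from. Setting k equal to n gives leave-one-out cross-validation.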

13
Q

What is leave-one-out cross-validation?

A

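A form of k-fold cross-validation with k equal to the number of observations: each observation is held out once as a single-point validation set while the model is fit on all the others.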

14
Q

What is the purpose of the bootstrap?

A

To estimate the variability and uncertainty of model estimates.
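As an illustration, the sketch below bootstraps the standard error of a sample mean and compares it with the textbook formula s / sqrt(n). The sample values, the number of resamples, and the helper name `bootstrap_se` are assumptions made for this example.

```python
import random
import statistics

def bootstrap_se(data, statistic, n_resamples=2000, seed=0):
    """Estimate the standard error of `statistic` by resampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)

# Made-up sample, for illustration only.
sample = [12.1, 9.8, 11.4, 10.3, 13.0, 8.7, 10.9, 11.8, 9.5, 12.4]

se_boot = bootstrap_se(sample, statistics.mean)
se_formula = statistics.stdev(sample) / len(sample) ** 0.5
print(se_boot, se_formula)
```

The two numbers should be close for the mean. The appeal of the bootstrap is that the same function works for statistics with no simple formula, such as a median or a fitted model coefficient.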

15
Q

How do decision trees work?

A

They recursively split the predictor space into regions that minimize prediction error within each region.

16
Q

What is a main advantage of decision trees?

A

They are easy to interpret and can model nonlinearities and interactions.

17
Q

What is a main weakness of single decision trees?

A

High variance, meaning small changes in data can lead to very different trees.

18
Q

What is bagging?

A

An ensemble method that averages predictions from trees trained on bootstrap samples to reduce variance.

19
Q

How do random forests differ from bagging?

A

Random forests add random feature selection at each split to reduce correlation between trees.

20
Q

What is boosting?

A

A sequential ensemble method in which each new model focuses on the observations that the previous models predicted poorly.

21
Q

What is predictive analytics?

A

Techniques used to predict future outcomes based on historical data.

22
Q

What is descriptive analytics?

A

Techniques used to summarize and describe patterns in data.

23
Q

Why is data preprocessing important?

A

Poor data quality directly reduces model performance and decision quality.

24
Q

Why is model evaluation critical in business analytics?

A

High predictive accuracy does not necessarily imply high economic value.

25
What is profit-driven predictive analytics?
An approach where models are developed and evaluated based on economic value rather than accuracy.
26
What is a cost matrix?
A table assigning costs or profits to each possible classification outcome.
27
Why can accuracy be misleading in business problems?
Because it treats all errors equally despite different economic consequences.
28
What is cost-sensitive classification?
Classification methods that explicitly account for unequal misclassification costs.
29
What is average misclassification cost?
The expected cost per decision based on predicted outcomes and their associated costs.
30
Why tune the classification cutoff?
To maximize profit when misclassification costs are asymmetric.
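A sketch of cutoff tuning under asymmetric costs. The scores, labels, candidate cutoffs, and the two cost values are invented for illustration, correct decisions are assumed to cost nothing, and `average_cost` is a hypothetical helper.

```python
def average_cost(probs, labels, cutoff, cost_fp, cost_fn):
    """Average misclassification cost per case at a given cutoff.

    Correct decisions are assumed to cost 0.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        pred = 1 if p >= cutoff else 0
        if pred == 1 and y == 0:
            total += cost_fp   # false positive
        elif pred == 0 and y == 1:
            total += cost_fn   # false negative
    return total / len(labels)

# Made-up data: a missed positive is assumed to be 5x as costly as a false alarm.
probs = [0.05, 0.20, 0.35, 0.45, 0.55, 0.70, 0.80, 0.95]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
cost_fp, cost_fn = 1.0, 5.0

cutoffs = [0.1, 0.3, 0.5, 0.7, 0.9]
costs = {c: average_cost(probs, labels, c, cost_fp, cost_fn) for c in cutoffs}
best_cutoff = min(costs, key=costs.get)
print(costs, best_cutoff)
```

With these numbers the cheapest cutoff is 0.3 rather than the default 0.5, because false negatives are the expensive error. In practice the cutoff would be tuned on validation data, not on the data used to fit the model.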
31
How do ROC curves relate to profit-driven evaluation?
They show trade-offs between true and false positive rates that correspond to different profit levels.
32
What is the key idea of profit-driven model evaluation?
The best model is the one that maximizes business value, not predictive performance alone.
33
What is uncertainty in decision-making models?
Uncertainty arises when some parameters are unknown at the time decisions must be made and may vary randomly.
34
What is a here-and-now decision?
A decision made before the uncertain parameters are realized, typically based on forecasts or expected values.
35
What is a wait-and-see decision?
A decision made after the uncertainty is fully revealed, assuming perfect information.
36
What is the Value of Perfect Information (VoPI)?
The maximum additional value achievable if uncertain parameters were known before making the decision.
37
How is VoPI conceptually computed?
As the difference between the expected objective value of the wait-and-see solutions and the expected objective value of the optimal here-and-now solution.
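A toy newsvendor calculation can make this difference concrete. Everything below is an assumption for illustration: the three demand scenarios, their probabilities, the price, and the unit cost. The here-and-now problem is solved by brute-force search over an integer grid, which is adequate only for a problem this small.

```python
# Hypothetical newsvendor data: demand scenarios with probabilities, unit price and cost.
scenarios = [(50, 0.3), (100, 0.5), (150, 0.2)]  # (demand, probability)
price, cost = 10.0, 6.0

def profit(order, demand):
    """Profit when `order` units are bought and at most `demand` can be sold."""
    return price * min(order, demand) - cost * order

def expected_profit(order):
    """Expected profit of a fixed order quantity across all scenarios."""
    return sum(prob * profit(order, demand) for demand, prob in scenarios)

# Here-and-now: a single order quantity is fixed before demand is known.
here_and_now_order = max(range(0, 201), key=expected_profit)
here_and_now_value = expected_profit(here_and_now_order)

# Wait-and-see: demand is known first, so the best order equals demand in each scenario.
wait_and_see_value = sum(prob * profit(demand, demand) for demand, prob in scenarios)

vopi = wait_and_see_value - here_and_now_value
print(here_and_now_order, here_and_now_value, wait_and_see_value, vopi)
```

In this example the best here-and-now order is 100 units with an expected profit of 250, the wait-and-see value is 380, and the VoPI is therefore 130. That figure is an upper bound on what any forecast improvement could be worth here.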
38
What does VoPI measure in practice?
Whether improving forecasts or collecting more information is economically worthwhile.
39
Why does better forecasting not always improve decisions?
Because decisions often depend more on the structure of uncertainty than on precise point forecasts.
40
What is meant by uncertainty in constraints?
Situations where feasibility depends on random outcomes, not just the decision variables.
41
Why can an expected-value solution be problematic?
It may violate constraints for some realizations of uncertainty even if it is optimal in expectation.
42
What is a deterministic optimization model?
A model where all parameters are treated as known and fixed.
43
Why does deterministic optimization still matter in a stochastic world?
It provides clarity, tractability, interpretability, and a baseline for understanding trade-offs.
44
How do deterministic models support stochastic thinking?
They form the foundation upon which stochastic and robust models are built.
45
When are deterministic models often sufficient in practice?
When uncertainty is moderate, data is limited, or decisions are updated frequently.
46
Why are deterministic models useful for communication?
They are easier for non-experts to understand, discuss, and trust.
47
What is two-stage stochastic optimization?
A framework where decisions are made in two steps: before and after uncertainty is realized.
48
What is a first-stage decision?
A here-and-now decision made before uncertainty is revealed.
49
What is a second-stage or recourse decision?
A decision made after uncertainty is realized to adjust the initial plan.
50
What is meant by recourse?
The ability to adapt decisions after observing uncertain outcomes.
51
Why is two-stage modeling realistic?
Because many real-world decisions are sequential and cannot be fully decided in advance.
52
Give an example of a two-stage decision problem.
Facility location decided first, followed by distribution adjustments once demand is realized.
53
What is stochastic optimization with recourse?
Optimization that explicitly models the ability to correct decisions after uncertainty unfolds.
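A minimal sketch of a two-stage problem with recourse, under made-up data. Capacity is reserved in advance at a low unit cost (first stage), and any shortfall is covered at a higher emergency cost once demand is known (recourse). The scenarios, costs, and function names are all hypothetical, and the first-stage decision is found by searching an integer grid.

```python
# Hypothetical data: demand scenarios, first-stage and recourse unit costs.
scenarios = [(80, 0.25), (100, 0.5), (140, 0.25)]  # (demand, probability)
reserve_cost = 4.0     # cost per unit reserved in advance (first stage)
emergency_cost = 10.0  # cost per unit bought after demand is known (recourse)

def recourse_cost(reserved, demand):
    """Second stage: cover any shortfall at the emergency price."""
    shortfall = max(demand - reserved, 0)
    return emergency_cost * shortfall

def expected_total_cost(reserved):
    """First-stage cost plus the expected cost of the recourse action."""
    expected_recourse = sum(p * recourse_cost(reserved, d) for d, p in scenarios)
    return reserve_cost * reserved + expected_recourse

# Search an integer grid for the first-stage decision.
best_reserved = min(range(0, 201), key=expected_total_cost)
print(best_reserved, expected_total_cost(best_reserved))

# For comparison: reserving the mean demand (105 units) costs more in expectation.
print(expected_total_cost(105))
```

Here the best first-stage decision is to reserve 100 units at an expected total cost of 500, whereas planning for the mean demand of 105 gives 507.5. The first-stage choice is evaluated by what it costs after the recourse action, not in isolation.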
54
What is scenario generation?
The process of creating multiple plausible future outcomes to represent uncertainty.
55
Why are point forecasts often insufficient?
They give a single expected or most likely value, not the range of possible outcomes.
56
What is a scenario in forecasting?
A possible realization of future values consistent with the model and its uncertainty.
57
How are forecasting errors used in scenario generation?
They are sampled to simulate realistic future paths around point forecasts.
58
What role does Monte Carlo simulation play?
It generates many scenarios by repeatedly sampling from error distributions.
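One simple way to do this is to resample historical forecast errors and add them to the point forecasts. The sketch below assumes errors are independent across periods and uses invented forecasts and errors; `generate_scenarios` is a hypothetical helper.

```python
import random

def generate_scenarios(point_forecasts, past_errors, n_scenarios, seed=0):
    """Build scenarios by adding resampled historical errors to each point forecast.

    Assumes errors are independent across periods, which is a simplification.
    """
    rng = random.Random(seed)
    return [
        [f + rng.choice(past_errors) for f in point_forecasts]
        for _ in range(n_scenarios)
    ]

# Made-up forecasts and historical forecast errors (actual minus forecast).
point_forecasts = [100.0, 105.0, 110.0]
past_errors = [-8.0, -3.0, -1.0, 0.0, 2.0, 4.0, 6.0]

scenarios = generate_scenarios(point_forecasts, past_errors, n_scenarios=1000)

# Empirical spread of the first period across all scenarios.
first_period = sorted(path[0] for path in scenarios)
print("approx. 5th and 95th percentile, period 1:", first_period[50], first_period[950])
```

Each inner list is one possible future path. A parametric alternative would draw errors from a fitted distribution, for example a normal distribution with the historical error standard deviation.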
59
Why is scenario generation useful for optimization?
It allows decisions to be evaluated across many possible futures.
60
What is RMSE used for in forecasting?
To measure the typical magnitude of forecast errors, with large errors penalized more heavily because errors are squared before averaging.
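A short sketch of the calculation, with mean absolute error (MAE) alongside for contrast. The actuals and the two forecast series are made-up values chosen so that both forecasts have the same MAE.

```python
import math

def rmse(actuals, forecasts):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - f) ** 2 for a, f in zip(actuals, forecasts)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def mae(actuals, forecasts):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)

# Made-up data: forecast A is off by 1 every period, forecast B is off by 4 once.
actuals = [10.0, 10.0, 10.0, 10.0]
forecast_a = [11.0, 9.0, 11.0, 9.0]
forecast_b = [10.0, 10.0, 10.0, 14.0]

print(mae(actuals, forecast_a), rmse(actuals, forecast_a))  # 1.0 1.0
print(mae(actuals, forecast_b), rmse(actuals, forecast_b))  # 1.0 2.0
```

Both forecasts have the same average absolute error, but RMSE is twice as large for the one with a single big miss.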
61
What is exponential smoothing?
A forecasting method that weights recent observations more heavily than older ones.
62
What does the smoothing parameter control?
How quickly the model reacts to new information.
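The effect of the smoothing parameter can be seen in a sketch of simple exponential smoothing. The series is invented, the parameter is called `alpha` here and assumed to lie between 0 and 1, and the level is initialised at the first observation, which is one common convention among several.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing; returns the smoothed level after each observation.

    The level is initialised at the first observation (one common convention).
    """
    level = series[0]
    levels = [level]
    for y in series[1:]:
        # New level = weighted average of the new observation and the old level.
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels

# Made-up series with a jump from 10 to 20.
series = [10.0, 10.0, 10.0, 20.0, 20.0, 20.0]

slow = exponential_smoothing(series, alpha=0.2)
fast = exponential_smoothing(series, alpha=0.8)
print([round(v, 2) for v in slow])
print([round(v, 2) for v in fast])
```

With alpha = 0.8 the level is almost at 20 three periods after the jump, while with alpha = 0.2 it has only reached about 14.9. A higher alpha reacts faster but also passes more noise into the forecast.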
63
Why are scenarios important for decision quality?
Because good decisions perform well across many possible futures, not just the expected one.
64
What is the main takeaway across deterministic, stochastic, and scenario-based models?
Modeling should be incremental: start simple, understand trade-offs, and add uncertainty only when it improves decisions.