F.4. Analytical Models and Simulation Flashcards

Learn tools such as regression, time series analysis, sensitivity testing, and Monte Carlo simulation. (45 cards)

1
Q

What are the four types of data analytics?

A
  • Descriptive analytics
  • Diagnostic analytics
  • Predictive analytics
  • Prescriptive analytics
2
Q

What question does descriptive analytics answer?

A

What happened?

3
Q

What is the purpose of diagnostic analytics?

A

To answer the question, ‘Why did it happen?’ by analyzing historical data.

4
Q

What is the focus of predictive analytics?

A

It focuses on the future, using correlative analysis to answer the question, ‘What is likely to happen?’

5
Q

What question does prescriptive analytics answer?

A

What needs to happen?

6
Q

What is a model in data analytics?

A

A representation of a relationship between variables in data.

7
Q

What is statistical modeling?

A

The process of using statistical analysis on a dataset, usually a sample drawn from a population, to identify patterns and infer something about the population.

8
Q

What is linear regression analysis used for?

A

To develop a mathematical equation modeling the extent to which one variable (called the dependent or response variable) has historically been affected by one or more other variables (called the independent or predictor variables).

9
Q

What is a time series?

A

A sequence of measurements of the same variable taken at equally spaced, ordered points in time.

10
Q

What are the four patterns in time series analysis?

A
  • Trend pattern
  • Cyclical pattern
  • Seasonal pattern
  • Irregular pattern
11
Q

What is a trend pattern in time series analysis?

A

A gradual shift in the historical data to a higher or lower level over time. An identified trend is useful for making predictions.

12
Q

What is a cyclical pattern in time series analysis?

A

A cyclical pattern exhibits fluctuations that last longer than one year, often due to the cyclical nature of the economy.

13
Q

What is a seasonal pattern in time series analysis?

A

Variability in a time series within a period such as a year due to seasonal influences, identified by regularly spaced peaks and troughs.

14
Q

What is an irregular pattern in a time series?

A

Random variations not repeating in any regular pattern, caused by short-term, non-recurring factors.

15
Q

What is the equation of a simple linear regression line?

A

ŷ = a + bx

16
Q

What does the “a” represent in the linear regression equation
ŷ = a + bx?

A

The constant coefficient, or the y-intercept.

17
Q

What does the “b” represent in the linear regression equation
ŷ = a + bx?

A

The variable coefficient, or the slope of the line.

The slope of the line is the amount of change in the predicted value of the dependent variable for each unit of increase in the independent variable.

18
Q

What is the purpose of the regression line in time series analysis?

A

To make forecasts using historical data.

19
Q

What is the purpose of linear regression analysis?

A

To calculate the location of the regression line mathematically and predict the value of y for any given value of x if the independent variable serves well as the predictor variable.

Linear regression minimizes the sum or mean of the squares of the residuals using the least squares method.
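
The least squares calculation can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `fit_line` and the data values are made up for the example. It uses the standard closed-form least squares formulas for the slope b and the intercept a in ŷ = a + bx.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b


# Hypothetical observations of an independent variable x and a dependent variable y.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
forecast = a + b * 6  # predicted y for x = 6
print(f"y_hat = {a:.2f} + {b:.2f}x, forecast at x=6: {forecast:.2f}")
```

For a time series, x would be the period number, so the same function produces a trend line that can be extended to future periods.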

20
Q

What must be determined before using linear regression analysis to develop a prediction?

A

Whether the dependent variable, y, has a linear relationship with the independent variable, x.

A scatterplot can be used to determine if a linear relationship exists between the variables.

21
Q

What is correlation analysis used for in simple linear regression analysis?

A

To understand the relationship or absence of a relationship between the independent variable and the dependent variable and to determine the strength of the linear relationship between the two variables.

22
Q

What does a positive correlation coefficient indicate about a regression?

A

Increasing measurements of the independent x-variable tend to be associated with increasing measurements of the dependent y-variable; and decreasing measurements of x tend to be associated with decreasing measurements of y.

23
Q

What does a negative correlation coefficient indicate about a regression?

A

Increasing measurements of the independent x-variable tend to be associated with decreasing measurements of the dependent y-variable and vice versa.

24
Q

What is the correlation coefficient (r) in simple linear regression and what does it measure?

A

A number between −1 and +1 that measures the strength and direction of the relationship between the independent and dependent variables.

A number close to either +1 or −1 means the two variables are strongly correlated, and if there is a cause-and-effect relationship, a simple linear regression could be useful for forecasting.

A correlation coefficient of +1 indicates a perfectly positive linear relationship, while −1 indicates a perfectly negative linear relationship.
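
As a rough sketch, the correlation coefficient can be computed directly from the deviations of each variable from its mean. The function name `correlation` and the data are illustrative assumptions, not part of the card.

```python
import math


def correlation(xs, ys):
    """Return the correlation coefficient r for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y tends to rise with x, so r should be positive.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_positive = correlation(xs, ys)

# Reversing y makes it tend to fall as x rises, so r changes sign.
r_negative = correlation(xs, list(reversed(ys)))

# Points lying exactly on a rising line give r = +1.
r_perfect = correlation(xs, [3, 5, 7, 9, 11])
```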

25
What does the coefficient of determination (r²) of a regression analysis represent?
The percentage of the total variation in the dependent variable (y) that can be explained by variations in the independent variable (x).
26
What does a high coefficient of determination (r²) indicate?
The coefficient of determination is a number between 0 and 1. The closer the coefficient is to 1, the greater the percentage of variation in the dependent variable that is explained by variations in the independent variable. When the coefficient of determination is high (near 1), the data points lie close to the regression line, indicating better predictive ability of the regression.
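
One way to see what "explained variation" means is to compare the squared residuals around the regression line with the total squared variation of y around its mean. The sketch below assumes a simple linear regression fitted by least squares; the function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b for y_hat = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b * mean_x, b


def r_squared(xs, ys):
    """Share of the total variation in y explained by the fitted line."""
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    # Unexplained variation: squared distances from the regression line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared distances from the mean of y.
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]  # hypothetical observations
r2 = r_squared(xs, ys)
print(f"r squared = {r2:.2f}")
```

With these made-up values, 60% of the variation in y is explained by x.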
27
What is the standard error of the estimate (SEE) of a regression analysis?
A measure of the average distance of the actual observed data points from the regression line, indicating how well the regression model explains the relationship between the independent variable and the dependent variable and thus its predictive ability.
28
What does the standard error of the estimate (SEE) measure for a linear regression of a time series, when the actual observed values of y constitute the whole population?
A measure of the average distance of the actual observed data points from the regression line. It gives an indication of how well the regression model explains the relationship between the independent variable and the dependent variable and thus its predictive ability.
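
A hedged sketch of the SEE calculation follows. For sample data, the common convention divides the sum of squared residuals by n − 2 before taking the square root. Dividing by n when the observations are the whole population is an assumption made here for illustration; the cards do not state the divisor. The function names, the `lost_degrees` parameter, and the data are all hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least squares intercept a and slope b for y_hat = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b * mean_x, b


def standard_error_of_estimate(xs, ys, lost_degrees=2):
    """Square root of the sum of squared residuals over (n - lost_degrees)."""
    a, b = fit_line(xs, ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - lost_degrees))


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]  # hypothetical observations

see_sample = standard_error_of_estimate(xs, ys)         # divides by n - 2
see_population = standard_error_of_estimate(xs, ys, 0)  # assumed: divides by n
```

A smaller SEE means the observed points sit closer to the line.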
29
What is the standard form of the simple linear regression equation when the correlation is positive?
ŷ = a + bx
Footnote: The constant coefficient "a" represents the y-intercept, and "b" represents the slope of the regression line.
30
What is multiple regression analysis used for?
To forecast a dependent variable by using more than one independent variable, usually for causal forecasting.
Footnote: To use causal forecasting, there must be a reasonable basis to assume a cause-and-effect relationship between the independent variables and the dependent variable.
31
What is the purpose of the coefficient of determination, R², in multiple regression analysis?
To evaluate the reliability of the whole regression, including all the independent variables used.
Footnote: The higher the R², the better the model explains the relationships between the variables.
32
What does a confidence interval describe in regression analysis?
A confidence interval describes the amount of uncertainty in an estimate derived from sample data. It consists of a range of values, calculated from the sample, within which the true population parameter is expected to fall with a specified probability (the confidence level).
Footnote: A 95% confidence level means that if the same sampling procedure were repeated many times, approximately 95% of the confidence intervals constructed from those samples would contain the true population parameter.
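
To make the idea concrete, here is a small sketch that builds a confidence interval for a sample mean. It is a simplification: it estimates a mean rather than a regression parameter, and it uses a normal approximation (a t-distribution would usually be preferred for a sample this small). The function name and data are made up.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high) for the population mean, normal approximation."""
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # z-value that leaves (1 - confidence) / 2 in each tail.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return mean - margin, mean + margin


sample = [10, 12, 9, 11, 13, 10, 12, 11]  # hypothetical measurements
low_95, high_95 = mean_confidence_interval(sample, 0.95)
low_99, high_99 = mean_confidence_interval(sample, 0.99)
```

Raising the confidence level from 95% to 99% widens the interval, because a wider range is needed to capture the true parameter more often.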
33
What is logistic regression used for?
To make a prediction when the outcome is binary, that is, a yes-or-no result.
Footnote: Logistic regression is used to predict the probability of a binary (yes or no) event occurring based on various identified factors. That probability can then be used to classify outcomes (for example, by deciding that probabilities above 50% are classified as “yes”).
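
The prediction step of a logistic regression can be sketched as follows. The intercept and slope below are invented for illustration; in practice they would be estimated from historical data, which this sketch does not attempt. The function names are also hypothetical.

```python
import math


def predict_probability(x, intercept, slope):
    """Logistic function: maps a linear score to a probability between 0 and 1."""
    z = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-z))


def classify(probability, threshold=0.5):
    """Classify as 'yes' only when the probability is above the threshold."""
    return "yes" if probability > threshold else "no"


# Assumed (not fitted) coefficients for a single predictor.
INTERCEPT = -4.0
SLOPE = 0.1

p_low = predict_probability(20, INTERCEPT, SLOPE)   # score z = -2
p_high = predict_probability(60, INTERCEPT, SLOPE)  # score z = +2
labels = [classify(p_low), classify(p_high)]
```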
34
What is the term "goodness of fit" used to describe?
It indicates how well a model fits the sample data and explains the relationship between the variables.
Footnote: While a model with good fit to the sample data may be useful for prediction, goodness of fit alone does not guarantee the model will generalize well to predict future values.
35
What is underfitting in the context of fitting an analytic model to data?
Underfitting occurs when the model is too simple to capture the true patterns in the data. The model does not represent the actual relationships, and it does not work well on either training data or test data.
36
What is overfitting in the context of fitting an analytic model to data?
Overfitting means the model shows high goodness of fit on training data but low goodness of fit on test data. The model has learned not only the features of the historical data used for training the model, but it has also learned the random fluctuations, or “noise,” within the data.
Footnote: Overfitting can also occur if a model is too complex, such as a multiple regression that contains too many predictor variables.
37
What are the benefits of regression analysis?
  • Objective
  • Provides specific results from a given data set
  • A tool for drawing insights, making recommendations, and decision-making
38
What are the limitations of regression analysis?
  • Requires historical data
  • Historical data may not be valid for making predictions if conditions have changed
  • Results depend on the choice of predictor variables
  • Statistical relationships may be valid only for the range of data in the sample
39
What is sensitivity analysis used for?
To determine how much the prediction of a model will change if one input to the model is changed.
Footnote: Sensitivity analysis is also known as 'what-if' analysis and helps identify the most important input parameters for achieving accurate predictions.
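
A what-if analysis can be sketched by changing one input at a time and recording how the output moves. The profit model, its input values, and the 10% change below are all hypothetical choices made for the example.

```python
def profit(units, price, variable_cost, fixed_cost):
    """A deliberately simple profit model used only for illustration."""
    return units * (price - variable_cost) - fixed_cost


def sensitivity(model, base_inputs, change=0.10):
    """Change each input by `change` in turn; return the effect on the output."""
    base_output = model(**base_inputs)
    effects = {}
    for name, value in base_inputs.items():
        changed = dict(base_inputs)  # copy so only one input differs
        changed[name] = value * (1 + change)
        effects[name] = model(**changed) - base_output
    return effects


base = {"units": 1000, "price": 50.0, "variable_cost": 30.0, "fixed_cost": 12000.0}
effects = sensitivity(profit, base)
most_sensitive = max(effects, key=lambda name: abs(effects[name]))
```

In this made-up model, a 10% price increase moves profit more than the same percentage change in any other input, so price is the input that most deserves an accurate estimate.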
40
What is Monte Carlo simulation analysis?
A method used to find solutions to mathematical problems involving changes to multiple variables at the same time using repeated random sampling. It can develop probabilities of various scenarios coming to pass that can be used to compute a predicted result.
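
The repeated random sampling can be sketched as below. Everything specific here is an assumption made for illustration: the profit model, the uniform ranges for each input, the number of trials, and the fixed seed (used only so the run is repeatable).

```python
import random

FIXED_COST = 12000.0  # hypothetical


def simulate_profit(rng):
    """One trial: draw every uncertain input at random, then compute profit."""
    units = rng.uniform(800, 1200)
    price = rng.uniform(45, 55)
    variable_cost = rng.uniform(28, 32)
    return units * (price - variable_cost) - FIXED_COST


def monte_carlo(trials=10000, seed=42):
    """Return (mean profit, estimated probability of a loss) over many trials."""
    rng = random.Random(seed)
    outcomes = [simulate_profit(rng) for _ in range(trials)]
    mean_profit = sum(outcomes) / trials
    prob_loss = sum(1 for outcome in outcomes if outcome < 0) / trials
    return mean_profit, prob_loss


mean_profit, prob_loss = monte_carlo()
print(f"mean profit: {mean_profit:.0f}, probability of a loss: {prob_loss:.3f}")
```

Unlike sensitivity analysis, all inputs vary at the same time in each trial, and the spread of the outcomes gives the probability of each scenario.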
41
How can data analytics improve data quality?
The process of cleaning the data preparatory to processing it can detect errors, duplicate information, and missing values. If the errors and duplicate information can be corrected and the missing values supplied, the data quality can be improved.
42
What is a benefit of data analytics in terms of fraud prevention?
Data analytics can help reduce fraud losses by recognizing potentially fraudulent transactions and flagging them for investigation.
43
True or False: When two variables are correlated, one variable must have caused the other.
False
Footnote: Correlation does not prove causation, and both variables could be caused by a third, unidentified factor.
44
What are two risks associated with using Big Data?
  • Data breaches
  • Customer privacy issues
45
What is a potential drawback of using easy-to-use data analytics tools?
Use by those without a background in data science can lead to data inconsistency and poor decisions.