Module 10 - Probability and Statistics Flashcards

(115 cards)

1
Q

What is the mathematical study of the future that measures the chance of an event called?

A

Probability

Probability calculates the likelihood that an event may occur and assesses which outcomes are possible.

2
Q

What are the two main areas of statistics?

A
  • Descriptive statistics
  • Inferential statistics

Descriptive statistics summarize outcomes, while inferential statistics test probabilistic models and draw conclusions about populations.

3
Q

The mean in probability is calculated by summing all possible values of X multiplied by their probabilities. What is the formula?

A

E(X) = ∑ x_i p(x_i)

This formula is used to calculate the expected value in discrete cases.
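
As a minimal sketch of this formula, the snippet below computes E(X) for one fair six-sided die. The die example and the helper name `expected_value` are illustrative assumptions, not part of the module.

```python
from fractions import Fraction

def expected_value(pmf):
    """Return E(X) = sum of x * p(x) for a dict mapping value -> probability."""
    return sum(x * p for x, p in pmf.items())

# Hypothetical example: one fair die, each face with probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
die_mean = expected_value(die_pmf)
print(die_mean)  # 7/2
```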

4
Q

True or false: Statistics is used to describe historical outcomes and to determine underlying probability models.

A

TRUE

Statistics helps in understanding past data to make predictions about future events.

5
Q

What are the two types of random variables?

A
  • Discrete random variables
  • Continuous random variables

Discrete variables take on specific values, while continuous variables can take any value within a range.

6
Q

What does a Probability Density Function (PDF) define?

A

A distribution for continuous random variables

The PDF represents probabilities as areas under the curve.

7
Q

What is the relationship between Cumulative Distribution Function (CDF) and Probability Density Function (PDF)?

A

CDF is the integral of PDF; PDF is the derivative of CDF

The CDF plots the probability that a random variable will take on a value less than or equal to a specified value.
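
The relationship can be checked numerically. The sketch below uses the standard normal distribution as an assumed example: integrating the PDF up to a point should reproduce the CDF there, and the slope of the CDF should reproduce the PDF. The function names are hypothetical.

```python
import math

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def integrate_pdf(upper, lower=-8.0, steps=20000):
    """Trapezoidal approximation of the area under the PDF from lower to upper."""
    h = (upper - lower) / steps
    total = 0.5 * (normal_pdf(lower) + normal_pdf(upper))
    total += sum(normal_pdf(lower + i * h) for i in range(1, steps))
    return total * h

# CDF as the integral of the PDF (the tail below -8 is negligible).
area = integrate_pdf(1.0)

# PDF as the derivative of the CDF (central difference).
h = 1e-5
slope = (normal_cdf(1.0 + h) - normal_cdf(1.0 - h)) / (2 * h)
```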

8
Q

What is the bell curve commonly associated with?

A

Normal distribution

The bell curve depicts the frequently-occurring normal distribution in statistics.

9
Q

What are the two sides of the same coin in probability and statistics?

A
  • Probability: modeling future outcomes
  • Statistics: making inferences about past outcomes

Both concepts are essential for understanding data analysis.

10
Q

What is a population in statistical terms?

A

The set of all possible members of a specifically defined group

For example, all Navy aircraft designed after WWII can define a population for cost estimation.

11
Q

What is a sample in statistics?

A

A subset of the population

Sample data is used to conduct analysis when the entire population cannot be gathered.

12
Q

What is the difference between parametric and non-parametric statistics?

A
  • Parametric: makes assumptions about underlying distributions
  • Non-parametric: makes few or no assumptions

This module focuses mainly on parametric statistics.

13
Q

What is the significance level in hypothesis testing?

A

The tolerance for error

It determines how confident one can be in the results of the statistical test.

14
Q

What is the central tendency in statistics?

A

The middle or expected locations of distributions

It helps understand the center of the data.

15
Q

What are the key ideas of this module?

A
  • Probability and statistics are two sides of the same coin
  • Distributions are described by their mean and standard deviation
  • The bell curve depicts the normal distribution

These ideas form the foundation for understanding cost analysis.

16
Q

What is the role of probability and statistics in cost estimating?

A
  • Predict future outcomes
  • Analyze past data
  • Model uncertainty

They are crucial for making informed cost estimates.

17
Q

What is a random variable?

A

A variable that cannot be fully controlled or accurately predicted

It represents outcomes in the sample space.

18
Q

What is the importance of sample statistics?

A

They estimate population parameters

Analysts strive to get larger and better samples to approximate the true population.

19
Q

What is the formula for calculating the mean of a discrete random variable?

A

E(X)=∑ x_{i}p(x_{i})

The mean is calculated by summing all possible values of X, each multiplied by the probability of that value occurring.

20
Q

When rolling two dice, what is the probability of obtaining a sum of 2?

A

1/36

This is because both dice must show a 1, which has a probability of (1/6)*(1/6).
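
A short enumeration confirms the result. This is only a sketch: it lists all 36 equally likely outcomes of two fair dice, builds the distribution of the sum, and also applies the expected-value formula to it. Variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 (die1, die2) outcomes produce each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
sum_pmf = {total: Fraction(n, 36) for total, n in counts.items()}

p_sum_2 = sum_pmf[2]
mean_sum = sum(total * p for total, p in sum_pmf.items())
print(p_sum_2)   # 1/36
print(mean_sum)  # 7
```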

21
Q

In the continuous case, how is the mean calculated?

A

E(X)=∫ x p(x) d x = μ

The integral is used to find the mean over an infinite number of possibilities.

22
Q

What is the definition of the median in a data set?

A

The middle data point where half the data points are lower and half are higher

The median is robust to outliers: an extreme value shifts it far less than it shifts the mean.

23
Q

How is the median calculated when there is an even number of data points?

A

Average the two middle values

This ensures that the median accurately represents the center of the data.
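
A quick illustration with Python's standard `statistics` module, using made-up data: the median of an even-sized sample is the average of the two middle values, and replacing the largest value with an outlier leaves the median unchanged while the mean moves a lot.

```python
import statistics

even_sample = [1, 3, 5, 7]
median_even = statistics.median(even_sample)  # averages the middle values 3 and 5

# Same sample with the largest value replaced by an outlier.
with_outlier = [1, 3, 5, 700]
median_outlier = statistics.median(with_outlier)
mean_outlier = statistics.fmean(with_outlier)
```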

24
Q

True or false: The median of a normal distribution is equal to its mean.

A

TRUE

This holds true for symmetric distributions.

25
What does it indicate if the **mean** is greater than the **median**?
The distribution is skewed right ## Footnote This means the tail of the distribution stretches to the right.
26
What is the **mode** of a distribution?
The most frequently occurring value ## Footnote The mode is the least used measure of central tendency.
27
What is the formula for the **variance** of a discrete distribution?
Var(X)=E((X−μ)²)=∑(x_{i}−μ)²p(x_{i}) ## Footnote Variance measures how spread out the data is around the mean.
28
What does the **Coefficient of Variation (CV)** measure?
CV=σ/μ ## Footnote It expresses the standard deviation as a percent of the mean.
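The variance and CV formulas can be sketched together for a discrete distribution. The fair-die example and the helper name `mean_var_cv` are assumptions chosen for illustration.

```python
import math

def mean_var_cv(pmf):
    """Return (mean, variance, CV) for a dict mapping value -> probability."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    cv = math.sqrt(var) / mu  # undefined when the mean is zero
    return mu, var, cv

# Hypothetical example: one fair six-sided die.
die_pmf = {face: 1 / 6 for face in range(1, 7)}
mu, var, cv = mean_var_cv(die_pmf)
```

For the die, the variance works out to 35/12 and the CV to a little under 0.49, meaning the standard deviation is roughly half the mean.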
29
What is the defining characteristic of a **normal distribution**?
Symmetric about the mean and bell-shaped ## Footnote The normal distribution occurs in many natural phenomena.
30
What are the two defining parameters of a normal distribution?
* Mean (μ) * Standard deviation (σ) ## Footnote These parameters determine the shape and spread of the distribution.
31
What is the notation for a random variable that is normally distributed?
X∼N(μ,σ²) ## Footnote This notation indicates that X follows a normal distribution with mean μ and variance σ².
32
What is a **standard normal distribution**?
X∼N(0,1) ## Footnote It has a mean of 0 and a standard deviation of 1.
33
What is the mean and standard deviation of a **standard normal distribution**?
Mean: 0 Standard Deviation: 1 ## Footnote A standard normal distribution is denoted as X∼N(0,1).
34
How can any random variable that is normally distributed be transformed into a **standard normal**?
Subtract the mean and divide by the standard deviation ## Footnote This transformation helps analysts understand how many standard deviations a value is from the mean.
35
What percentage of observations from a normal distribution are within **one standard deviation** of the mean?
68.3% ## Footnote This is part of the empirical rule for normal distributions.
36
What percentage of observations from a normal distribution are within **two standard deviations** of the mean?
95.45% ## Footnote This indicates that about 4.55% of the data falls outside of two standard deviations.
37
What percentage of observations from a normal distribution are within **three standard deviations** of the mean?
99.7% ## Footnote This means only 0.3% of the data falls outside of three standard deviations.
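The standardization step and the one, two, and three standard deviation percentages can be reproduced with the error function, since P(|Z| < k) = erf(k/√2) for a standard normal Z. The sketch below uses hypothetical numbers (mean 100, standard deviation 15) and illustrative function names.

```python
import math

def standardize(x, mu, sigma):
    """Return z = (x - mu) / sigma, the number of standard deviations from the mean."""
    return (x - mu) / sigma

def prob_within(k):
    """P(|Z| < k) for a standard normal Z, via the error function."""
    return math.erf(k / math.sqrt(2))

# Hypothetical value: x = 130 from a normal distribution with mean 100, sd 15.
z = standardize(130, 100, 15)

# Coverage within 1, 2, and 3 standard deviations of the mean.
coverage = {k: prob_within(k) for k in (1, 2, 3)}
```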
38
What is the **Central Limit Theorem (CLT)**?
The sum (or mean) of a large number of independent, identically distributed random variables is approximately normally distributed ## Footnote A common rule of thumb treats n = 30 as large enough.
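A small simulation can make the theorem concrete. This sketch sums 30 Uniform(0, 1) draws many times (the uniform distribution, the trial count, and the seed are all arbitrary choices) and checks that the sums behave like a normal variable: centered at the theoretical mean, with roughly 68% of sums within one standard deviation.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

N_TERMS = 30      # number of uniform variables in each sum
N_TRIALS = 20000  # number of simulated sums

sums = [sum(random.random() for _ in range(N_TERMS)) for _ in range(N_TRIALS)]

# Theory for Uniform(0, 1): mean 1/2 and variance 1/12 per term.
theory_mean = N_TERMS / 2
theory_sd = (N_TERMS / 12) ** 0.5

sample_mean = statistics.fmean(sums)
within_one_sd = sum(abs(s - theory_mean) < theory_sd for s in sums) / N_TRIALS
```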
39
What happens to the **t-distribution** as the degrees of freedom increase?
It approaches a normal distribution ## Footnote The t-distribution is used for small sample sizes.
40
What is the mean and variance of the **t-distribution**?
Mean: 0 Variance: n/(n-2), where n is the degrees of freedom and n > 2 ## Footnote The t-distribution is used in many statistical tests.
41
Who first identified the **t-distribution** and under what pseudonym did he publish?
William Sealy Gosset, pseudonym: 'Student' ## Footnote He worked for the Guinness brewery.
42
What defines a **lognormal distribution**?
Formed by raising e to the power of a normal random variable ## Footnote The mean is calculated as e^(μ + σ²/2).
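The definition and the mean formula can be checked by simulation. In this sketch the parameters μ = 0 and σ = 0.5, the sample size, and the seed are assumed values picked for illustration.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable
MU, SIGMA = 0.0, 0.5  # hypothetical parameters of the underlying normal

# Lognormal draws: e raised to the power of a normal random variable.
draws = [math.exp(random.gauss(MU, SIGMA)) for _ in range(100000)]

sample_mean = statistics.fmean(draws)
theory_mean = math.exp(MU + SIGMA ** 2 / 2)
```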
43
What is the domain of the **lognormal distribution**?
x ∈ (0, ∞) ## Footnote This means it only takes positive real numbers.
44
What is the **F-distribution** a ratio of?
Two independent chi-squared random variables, each divided by its degrees of freedom ## Footnote Its parameters are the degrees of freedom m (numerator) and n (denominator).
45
What is the mean of the **F-distribution**?
n/(n-2) where n > 2 ## Footnote Here n is the denominator degrees of freedom; the mean does not depend on m.
46
What is the variance of the **F-distribution**?
2n²(m+n-2)/(m(n-2)²(n-4)) where n > 4 ## Footnote This is used in various statistical tests.
47
What is the formula for the **F-distribution**?
F = (U/m) / (V/n), where U and V are independent chi-squared random variables with m and n degrees of freedom ## Footnote The F-distribution is useful in hypothesis testing and regression analysis.
48
As **m** and **n** tend to infinity, the **F** tends to what distribution?
Normal distribution ## Footnote For large m and n, the F-distribution concentrates near 1 and is approximately normal.
49
The **F-distribution** is useful in which two statistical analyses?
* Hypothesis testing * Regression analysis ## Footnote The F-test denotes whether the regression model is statistically significant.
50
What is the **mean** of a **triangular distribution**?
\frac{a+b+c}{3} ## Footnote The triangular distribution has three parameters: minimum (a), maximum (b), and mode (c).
51
The **variance** of a triangular distribution is calculated as?
\frac{a^{2} + b^{2} + c^{2} - ab - ac - bc}{18} ## Footnote Triangular distributions are often used in risk analysis.
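The two triangular formulas translate directly into code. This is a sketch using the cards' parameter convention (a = minimum, b = maximum, c = mode); the cost figures are hypothetical.

```python
def triangular_mean(a, b, c):
    """Mean of a triangular distribution with min a, max b, mode c."""
    return (a + b + c) / 3

def triangular_variance(a, b, c):
    """Variance of a triangular distribution with min a, max b, mode c."""
    return (a**2 + b**2 + c**2 - a * b - a * c - b * c) / 18

# Hypothetical cost estimate: low 80, high 150, most likely 100.
mean = triangular_mean(80, 150, 100)
variance = triangular_variance(80, 150, 100)
```

Because the high value sits farther from the mode than the low value does, the mean (110) lands above the most likely value (100).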
52
The **Bernoulli distribution** has how many outcomes?
Two outcomes ## Footnote The outcomes are typically represented as one (success) and zero (failure).
53
What is the **mean** of a **Bernoulli distribution**?
p ## Footnote The variance of a Bernoulli distribution is pq, where q = 1 - p.
54
The sum of **n** Bernoulli distributions results in which distribution?
Binomial distribution ## Footnote The Binomial distribution is denoted as Binomial(n, p).
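As a rough check of the link between the two distributions, the sketch below builds the Binomial(n, p) probabilities and confirms that the mean and variance are n times the Bernoulli values (np and npq). The choices n = 10 and p = 0.3 are arbitrary.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent Bernoulli(p) trials)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

N, P = 10, 0.3  # hypothetical number of trials and success probability
pmf = [binomial_pmf(k, N, P) for k in range(N + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
```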
55
What is the **covariance** formula for random variables **X** and **Y**?
\text{Cov}(X,Y) = E[(X-\mu_{x})(Y-\mu_{y})] = E[XY] - \mu_{x} \mu_{y} ## Footnote Covariance indicates how two variates fluctuate together.
56
Correlation is a measurement that describes the strength of the **linear relationship** between two variates. True or False?
TRUE ## Footnote Correlation is calculated by dividing the covariance of X and Y by the product of their standard deviations.
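Covariance and correlation can be sketched for paired data as follows. The code treats each observed pair as equally likely (population formulas, an assumption made here for simplicity), and the data are invented so that Y is exactly 2X.

```python
import statistics

def covariance(xs, ys):
    """Population covariance: E[XY] - mu_x * mu_y, each pair equally likely."""
    mu_x, mu_y = statistics.fmean(xs), statistics.fmean(ys)
    e_xy = statistics.fmean([x * y for x, y in zip(xs, ys)])
    return e_xy - mu_x * mu_y

def correlation(xs, ys):
    """Covariance divided by the product of the (population) standard deviations."""
    return covariance(xs, ys) / (statistics.pstdev(xs) * statistics.pstdev(ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]  # exactly 2 * x, so the linear relationship is perfect
cov_xy = covariance(xs, ys)
corr_xy = correlation(xs, ys)
```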
57
What is the null hypothesis denoted as in hypothesis testing?
H_{0} ## Footnote The null hypothesis is assumed to be true unless the data provide sufficient evidence against it.
58
What test is used to see if two populations have different means?
t-test ## Footnote The null hypothesis for the t-test states that the means are the same.
59
What test is used to see if two populations have different standard deviations?
F-test ## Footnote The null hypothesis for the F-test states that the standard deviations are the same.
60
The **standard normal distribution** has a mean of ___ and a standard deviation of ___?
0, 1 ## Footnote The standard normal distribution is a specific case of the normal distribution.
61
What is the **null hypothesis** for testing if two populations are identically distributed?
H₀: f(x) = g(x) ## Footnote This hypothesis states that the distributions of the two populations are the same.
62
What is the **alternative hypothesis** for testing if two populations are identically distributed?
H₁: f(x) ≠ g(x) ## Footnote This hypothesis states that the distributions of the two populations are different.
63
True or false: A **one-tailed test** assumes a direction of difference in the alternative hypothesis.
TRUE ## Footnote For example, testing if one population mean is greater than another.
64
What is the **significance level** denoted by in hypothesis testing?
α ## Footnote It represents the probability of rejecting the null hypothesis when it is true.
65
What is a **Type I error** in hypothesis testing?
Rejecting H₀ when it is true ## Footnote This error occurs when a test incorrectly indicates a significant effect.
66
What is a **Type II error** in hypothesis testing?
Failing to reject H₀ when it is false ## Footnote This error occurs when a test fails to detect a true effect.
67
What does the **test statistic** represent in hypothesis testing?
A function of the sample data ## Footnote It is calculated under the assumption that the null hypothesis is true.
68
What is the purpose of the **critical value** in hypothesis testing?
To determine whether to reject H₀ ## Footnote It is the threshold that the test statistic must exceed to reject the null hypothesis.
69
In a **two-tailed test**, what are the hypotheses?
* H₀: μ₁ = μ₂ * H₁: μ₁ ≠ μ₂ ## Footnote This test checks for any difference without specifying a direction.
70
What is the typical level of significance used in hypothesis testing?
0.05 ## Footnote This level indicates a 5% chance of committing a Type I error.
71
What does a lower significance level (α) indicate about a test?
Stronger evidence needed to reject H₀ ## Footnote It reduces the probability of committing a Type I error.
72
What is the **power of a statistical test** related to?
1 - β ## Footnote It is the probability of correctly rejecting a false null hypothesis, where β is the probability of a Type II error.
73
What is the formula for the pooled variance (Sₚ²)?
Sₚ² = ((n-1)S₁² + (m-1)S₂²) / (n + m - 2) ## Footnote This formula combines the variances of two samples.
74
What is the **average DoD Cost Growth Factor (CGF)** compared to NAVAIR CGF?
14 percentage points lower ## Footnote The data shows an average overrun of 19% for DoD programs and 33% for NAVAIR programs.
75
What should be done before conducting a hypothesis test?
Choose the significance level α ## Footnote This choice should be made prior to testing and reported in results.
76
What is the **test statistic** associated with testing equality of means?
The t-statistic ## Footnote This statistic is used to determine if there is a significant difference between the means of two groups.
77
What is the formula for the **T-test** statistic?
T = (X̄ - Ȳ - (μX - μY)) / (S_P * √(1/n + 1/m)) ## Footnote This formula is used to determine whether to reject or fail to reject the null hypothesis in hypothesis testing.
78
What does the **T-test** compare?
It compares the means of two groups ## Footnote The T-test is particularly useful for assessing whether the means of two populations are statistically different from each other.
79
The degrees of freedom for the T-test is calculated as _______.
n + m - 2 ## Footnote This calculation is essential for determining the critical values in hypothesis testing.
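The pooled variance, the T statistic, and the degrees of freedom fit together as in the sketch below. The function name `pooled_t` and the sample data are hypothetical; the critical value for the resulting degrees of freedom would still come from a t-table or a statistics package.

```python
import math
import statistics

def pooled_t(xs, ys, hypothesized_diff=0.0):
    """Return (T, degrees of freedom) for the equal-variance two-sample t-test."""
    n, m = len(xs), len(ys)
    s1_sq, s2_sq = statistics.variance(xs), statistics.variance(ys)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp_sq = ((n - 1) * s1_sq + (m - 1) * s2_sq) / (n + m - 2)
    diff = statistics.fmean(xs) - statistics.fmean(ys)
    t = (diff - hypothesized_diff) / (math.sqrt(sp_sq) * math.sqrt(1 / n + 1 / m))
    return t, n + m - 2

# Hypothetical samples; H0 is that the two population means are equal.
t_stat, dof = pooled_t([1, 2, 3], [2, 3, 4])
```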
80
True or false: The T-test assumes that the variances of the underlying distributions are equal.
TRUE ## Footnote While testing for unequal means, the T-test assumes equal variances unless otherwise specified.
81
What is the critical value range for the example problem with a T-test statistic of -1.32?
±2.02 ## Footnote Since the test statistic is between -2.02 and 2.02, the test fails to reject the null hypothesis.
82
What is the **p-value** in hypothesis testing?
The smallest α for which one could reject the null hypothesis ## Footnote Equivalently, the p-value is the probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one actually observed.
83
What was the p-value for the two-tailed test in the example problem?
0.194 ## Footnote This is the smallest α at which the null hypothesis could be rejected; because it exceeds the typical 0.05 level, the test fails to reject H₀.
84
A **confidence interval (CI)** indicates _______.
(1 - α) * 100% confidence that the true parameter value is contained within the calculated range ## Footnote CIs provide a range of values that likely contain the true population parameter.
85
For a 95% confidence interval, what is the probability of the true parameter being above or below the critical values?
α/2 chance ## Footnote For a 95% CI, there is a 2.5% chance of being too low or too high.
86
What is the formula for calculating a confidence interval for the mean?
(X̄ - t(α/2, n-1) * (s/√n), μ, X̄ + t(α/2, n-1) * (s/√n)) ## Footnote This formula uses the sample mean, standard deviation, and sample size to estimate the CI.
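A minimal sketch of the interval calculation follows. Python's standard library has no t-distribution, so the critical value t(α/2, n-1) is passed in as an argument; here 2.776 is the tabulated two-sided 95% value for 4 degrees of freedom. The sample data and the function name are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(sample, t_crit):
    """Return (lower, upper) bounds: x_bar -/+ t_crit * s / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data, n = 5
low, high = mean_confidence_interval(sample, t_crit=2.776)
```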
87
What was the calculated 95% CI for the mean DoD CGF in the example?
(1.03, μ, 1.35) ## Footnote This interval suggests, with 95% confidence, that the true mean DoD CGF is between 1.03 and 1.35.
88
What does it mean if the 95% CI for the DoD average CGF contains the value 1.33?
Less than 95% certain that the two means are different ## Footnote This indicates that the difference in means is not statistically significant.
89
What is the purpose of a **one-sample t-test**?
Checks if the mean of a population group is equal to a particular constant ## Footnote Null hypothesis: mean equals the constant; alternative hypothesis: mean does not equal the constant.
90
What are the hypotheses for a **one-sample t-test**?
* H0: μ = μ0 * H1: μ ≠ μ0 (or μ > μ0 or μ < μ0) ## Footnote H0 is the null hypothesis and H1 is the alternative hypothesis.
91
What is the distribution of the test statistic in a **one-sample t-test**?
t-distribution with n - 1 degrees of freedom ## Footnote n is the sample size.
92
What is the formula for the test statistic in a **one-sample t-test**?
T = (X̄ - μ0) / (s / √n) ## Footnote X̄ is the sample mean, μ0 is the hypothesized population mean, s is the sample standard deviation, and n is the sample size.
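The one-sample statistic can be sketched in a few lines. The sample values and the hypothesized mean of 2 are made up for illustration, and `one_sample_t` is a hypothetical helper name.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (T, degrees of freedom) for H0: population mean equals mu0."""
    n = len(sample)
    t = (statistics.fmean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))
    return t, n - 1

t_stat, dof = one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0], mu0=2.0)
```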
93
What does a **two-sample t-test** check for?
Checks if the means of two population groups are equal ## Footnote Null hypothesis: means are equal; alternative hypothesis: means are not equal.
94
What are the hypotheses for a **two-sample t-test**?
* H0: μx = μy * H1: μx ≠ μy (or μx > μy or μx < μy) ## Footnote H0 is the null hypothesis and H1 is the alternative hypothesis.
95
What is the distribution of the test statistic in a **two-sample t-test**?
t-distribution with (n + m - 2) degrees of freedom ## Footnote n and m are the sample sizes of the two groups.
96
What is the formula for the test statistic in a **two-sample t-test**?
T = (X̄ - Ȳ - (μX - μY)) / (Sp√(1/n + 1/m)) ## Footnote Sp is the pooled standard deviation.
97
What is the purpose of **t-tests in regression analysis**?
To test if the coefficients in the regression equation are statistically significantly different from zero ## Footnote Null hypothesis: bi = 0; alternative hypothesis: bi ≠ 0.
98
What is the formula for the test statistic in regression analysis?
T = Estimated Coefficient / Standard Error ## Footnote T follows a t-distribution with n - k degrees of freedom.
99
What does the **chi-squared test** investigate?
Goodness of fit ## Footnote It can also be used to test for variance.
100
What are the hypotheses for a **chi-squared test for variance**?
* H0: σ² = σ0² * H1: σ² ≠ σ0² (or σ² < σ0² or σ² > σ0²) ## Footnote H0 is the null hypothesis and H1 is the alternative hypothesis.
101
What is the test statistic for a **chi-squared test for variance**?
T = (n - 1)s² / σ0² ## Footnote s² is the sample variance and σ0² is the hypothesized variance; under H0, T follows a chi-squared distribution with n - 1 degrees of freedom.
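As a small sketch, the statistic for the variance test can be computed as below. The data and the hypothesized variance of 2 are invented; the result would then be compared against chi-squared critical values for n - 1 degrees of freedom from a table.

```python
import statistics

def chi_squared_variance_stat(sample, sigma0_sq):
    """Return (T, degrees of freedom) with T = (n - 1) * s^2 / sigma0^2."""
    n = len(sample)
    return (n - 1) * statistics.variance(sample) / sigma0_sq, n - 1

t_stat, dof = chi_squared_variance_stat([1.0, 2.0, 3.0, 4.0, 5.0], sigma0_sq=2.0)
```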
102
What is the purpose of an **F-test**?
To check if the variance of one group is equal to the variance of another group ## Footnote Assumes both groups are normally distributed.
103
What are the hypotheses for an **F-test**?
* H0: σx² = σy² * H1: σx² ≠ σy² (or σx² > σy² or σx² < σy²) ## Footnote H0 is the null hypothesis and H1 is the alternative hypothesis.
104
What is the test statistic for an **F-test**?
T = s1² / s2² ## Footnote s1² and s2² are the sample variances of the two groups.
105
What is the **null hypothesis** for the **F-test** in regression analysis?
H₀: b₁ = b₂ = b₃ ... = bₖ = 0 ## Footnote This hypothesis states that all coefficients in the regression equation are equal to zero.
106
What is the **alternative hypothesis** for the **F-test** in regression analysis?
H₁: at least one bᵢ ≠ 0 ## Footnote This indicates that at least one of the coefficients in the regression model is not equal to zero.
107
The test statistic for the F-test is calculated as the ratio of the **Mean Square of the Regression (MSR)** to the **Mean Square Error (MSE)**. What is the formula?
MSR / MSE ## Footnote This value is derived using ANOVA.
108
What is the purpose of the **Kolmogorov-Smirnov test (K-S test)**?
To determine whether sample data is representative of a distribution ## Footnote The K-S test applies only to continuous distributions.
109
What are the **null and alternative hypotheses** for the K-S test?
* H₀: the data follow a specified distribution * H₁: the data do not follow a specified distribution ## Footnote These hypotheses help assess the fit of the sample data to the theoretical distribution.
110
What is the formula for the **test statistic** in the K-S test?
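D = max over 1 ≤ i ≤ n of { F(Yᵢ) - (i-1)/n, i/n - F(Yᵢ) } ## Footnote Here the Yᵢ are the sample values sorted in ascending order and F is the theoretical CDF. The statistic measures the maximum difference between the empirical and theoretical cumulative distributions.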
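The K-S statistic can be sketched as a short function. It assumes the sample is sorted before indexing and takes the theoretical CDF as a callable; the three data points and the Uniform(0, 1) reference distribution are hypothetical choices.

```python
def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of the sample and the theoretical cdf."""
    ys = sorted(sample)
    n = len(ys)
    return max(
        max(cdf(y) - (i - 1) / n, i / n - cdf(y))
        for i, y in enumerate(ys, start=1)
    )

def uniform_cdf(y):
    """CDF of Uniform(0, 1)."""
    return min(max(y, 0.0), 1.0)

# Hypothetical sample tested against Uniform(0, 1).
d = ks_statistic([0.7, 0.1, 0.4], uniform_cdf)
```

For this sample the largest gap occurs at the largest observation, where the empirical CDF reaches 1 while the uniform CDF is only 0.7.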
111
What is the formula for a **confidence interval (CI)** for the mean of a normal distribution?
(x̄ - t(α/2, n-1) * (s/√n), μ, x̄ + t(α/2, n-1) * (s/√n)) ## Footnote This formula provides a range within which the true mean is expected to lie.
112
What is the formula for a **confidence interval (CI)** for the difference of two means?
(x̄ - ȳ - t(α/2, n+m-2) * Sₚ√(1/n + 1/m), μx - μy, x̄ - ȳ + t(α/2, n+m-2) * Sₚ√(1/n + 1/m)) ## Footnote This CI estimates the range for the difference between two population means.
113
What are the two main types of **statistics**?
* Descriptive statistics * Inferential statistics ## Footnote Descriptive statistics summarize data, while inferential statistics draw conclusions about populations from samples.
114
What measures are used to assess **central tendency**?
* Mean * Median * Mode ## Footnote These measures provide insights into the average or typical values in a dataset.
115
What measures are used to assess **dispersion**?
* Variance * Standard deviation * Coefficient of variation ## Footnote These measures indicate the spread or variability of data points in a dataset.