Statistics_4E_Freedman Flashcards

Freedman, Pisani, Purves - 2007 (107 cards)

1
Q

Controlled Trial / Experiment

A

A study where investigators apply a treatment to a group of subjects and compare the outcome to a control group that receives no treatment or a placebo.

2
Q

Method of Comparison

A

The fundamental principle of establishing causation in statistical studies. It involves comparing the outcomes of two (or more) groups of subjects: the treatment group (which receives the intervention being tested) and the control group (which serves as the baseline, often receiving a placebo or standard care).

3
Q

Treatment Group

A

The group of subjects in a study that receives the intervention being tested (e.g., a new drug, a specific diet, an educational program, or the Salk vaccine). The results for this group are compared to the control group to measure the effect of the intervention.

4
Q

Control Group

A

The group of subjects in a study that is used as a baseline for comparison. They are treated identically to the treatment group in every way, except they receive a placebo (or standard treatment) instead of the actual intervention being tested.

5
Q

Random Assignment

A

The use of an impersonal chance procedure (like flipping a coin or drawing names) to divide subjects into treatment and control groups. Its purpose is to ensure, on average, that the groups are balanced and equivalent with respect to all confounding factors, both known and unknown.

6
Q

Double-Blind Study

A

A controlled experiment in which neither the subjects receiving the treatment nor the researchers/diagnosticians evaluating the outcome know who is in the treatment group and who is in the control group. This design is used to prevent the placebo effect in subjects and diagnostic bias in observers.

Without this blinding of the evaluators, diagnostic bias can creep into the results.

7
Q

Confounding Variable

A

A variable that is associated with both the treatment being studied and the response/outcome. Because the treatment and the confounder are mixed up, it is impossible to determine whether the observed effect is due to the treatment or the confounder. This is the main reason why observational studies can only establish association, not causation.

8
Q

Randomized Controlled Trial/Experiment (RCT)

A

Gold standard of study design, where investigators use an impartial chance procedure to assign subjects to treatment or control groups.

9
Q

Placebo

A

Neutral, inactive treatment given to the control group in an experiment, designed to resemble the actual treatment. It’s used to blind subjects and control for the placebo effect, which is the psychological tendency for subjects to show an effect simply because they believe they are receiving a treatment.

10
Q

Eligible/Target Population

A

Entire group of individuals that a study intends to describe or draw conclusions about

11
Q

Sample Population

A

Smaller group drawn from the larger eligible/target population, on which the study is actually performed.

12
Q

Historical Controls

A

Historical controls refer to a type of study design where the control group is not selected and run concurrently with the treatment group, but rather consists of subjects from a previous study or patients whose outcomes are known from the past.

This design is often considered weaker than a Randomized Controlled Experiment because there’s a huge risk of confounding—the differences between the treatment group and the historical control group (like changes in medical care, diagnostic standards, or population characteristics over time) can easily bias the results.

13
Q

Contemporaneous Controls

A

Contemporaneous controls refer to subjects in an experiment who are treated exactly the same as the treatment group, except for the intervention being studied, and are followed over the same period of time. They are essential for a good Randomized Controlled Experiment (RCE) because they eliminate the bias inherent in Historical Controls (like changes in the population, environment, or medical care).

14
Q

Response

A

Response is the formal term for the outcome that is measured in a study or experiment.

In statistics, the response variable (or dependent variable) is the characteristic that the investigator is interested in measuring or comparing to see if it changes when a factor (the treatment or explanatory variable) is applied.

For example, if a study tests a new fertilizer, the treatment is the fertilizer, and the response might be the plant’s height or the size of the yield.

15
Q

Q: Difference between placebo and control group?

A

The key distinction is that the control group is a set of participants, while the placebo is a type of treatment they may receive.

In short, the Control Group is the set of subjects who receive the Placebo (or sometimes standard care, or no treatment at all). The control group ensures that any observed effect is due to the actual treatment and not to other factors.

16
Q

Q: What problem(s) arise when the treatment and control groups are NOT created using random assignment?

A

The primary problem that arises when treatment and control groups are NOT created using random assignment is confounding (or selection bias).

Confounding and Selection Bias

  • Confounding: Without random assignment, there is a risk that the treatment and control groups differ systematically in ways (called confounding factors or lurking variables) that are related to the outcome.
  • The Effect: This makes it impossible to confidently conclude that any observed difference in the outcome is due to the treatment itself. The difference might be explained by the pre-existing differences between the groups.

Example: If a researcher non-randomly assigns healthier, younger participants to the “Treatment” group and older, less healthy participants to the “Control” group, and the treatment group has better outcomes, the researcher cannot distinguish whether the improvement was due to the treatment or the participants’ naturally better health/age.

17
Q

Q: What are the defining characteristics of a well-designed randomized controlled trial (RCT)?

A
  1. Random Assignment (to ensure comparability and minimize confounding).
  2. Control Group (to provide a baseline for comparison).
  3. Placebo/Blinding (to account for the placebo effect and experimenter bias).
  4. Intentional Manipulation (the investigator assigns the treatment to establish causation). This is the crucial distinction from an observational study.
18
Q

Q: Why is it important for the treatment and control groups to be comparable, and how does randomization achieve this?

A

Importance of Comparability: Comparability is vital because it ensures that the only systematic difference between the groups is the treatment itself. If the groups are not comparable, any difference in outcome could be due to a confounding variable (e.g., age, health, lifestyle) instead of the treatment, making it impossible to establish causation.

Role of Randomization: Randomization (random assignment) achieves comparability by acting like a fair chance mechanism. It ensures that, on average, all known and unknown confounding variables are distributed roughly equally between the treatment and control groups, thus eliminating selection bias and making the groups statistically equivalent.

19
Q

Observational Study

A

Definition: A study where the researcher observes and measures subjects and variables without intervention or manipulation of treatment.

Key Distinction: The investigator does not assign treatments; subjects self-select into groups (e.g., people who choose to exercise vs. those who don’t).

Main Limitation: It cannot establish causation (cause-and-effect) due to the high risk of confounding variables (lurking factors that differ between the groups). It can only show association or correlation.

When Used: When a Randomized Controlled Trial (RCT) is unethical (e.g., studying harmful exposures) or impractical (e.g., studying a rare trait or long-term phenomenon).

20
Q

Q: Difference between a Controlled Experiment and an Observational Study?

A

In a Controlled Experiment, the investigator actively intervenes by randomly assigning subjects to treatment and control groups. This control allows for establishing causation (cause-and-effect).

In an Observational Study, the investigator is passive, simply observing subjects who have self-selected into groups. This can only show association (correlation), not causation, due to the risk of confounding variables.

21
Q

Correlation / Association

A

Association (or correlation) describes a relationship between two or more variables, meaning that certain values of one variable tend to occur with certain values of another.

Definition: Variables are said to be associated if knowing the value of one variable gives you information about the likely value of the other.

The Crucial Limit: Finding an association does not prove causation. Just because two things are related doesn’t mean one causes the other. The relationship might be due to a confounding variable.

Example: There is an association between carrying a lighter and getting lung cancer, but carrying a lighter doesn’t cause cancer; smoking is the confounding variable that causes both.

22
Q

Q: Difference between association and causation?

A

The distinction between association and causation is based on whether one variable is proven to cause the other.

Association (Correlation): This means two variables tend to occur together or change together. An association can be shown by both observational studies and controlled experiments. However, association does not imply causation; the relationship might be due to a third, lurking variable (a confounder).

Causation (Cause-and-Effect): This means a change in one variable is directly responsible for a change in the other. Causation can only be strongly established by a well-designed Randomized Controlled Trial (RCT), where randomization balances out other potential factors.

23
Q

Stratification / Cross-Tabulation

A

Definition: A statistical technique used primarily in observational studies (and sometimes in experiments) to divide the sample into smaller, homogeneous sub-groups called strata based on a potential confounding variable.

Purpose: To see if an association observed in the overall data holds true within each stratum. This helps to control for (or adjust for) the confounding variable.

How it Works: The data is broken down into a table (a cross-tabulation) where the effect of the treatment is examined for each level of the confounding variable (e.g., comparing groups by treatment status separately for young subjects and old subjects).

Outcome: If the association disappears after stratification, it suggests the original association was spurious (fake) and entirely due to the confounding variable. If the association persists within the strata, it strengthens the argument for a real link.

24
Q

Simpson’s Paradox

A

Definition: A phenomenon where an association or trend that appears in several different groups of data (strata) reverses or disappears when the groups are combined.

Cause: It is caused by a powerful, unaccounted-for confounding variable that is unequally distributed among the sub-groups.

Significance: It demonstrates the danger of combining data from incomparable groups in observational studies. When the data is stratified (broken down), the true relationship is often revealed.

Key Idea: The association you see in the overall (combined) table is the wrong conclusion; the association seen in the smaller, stratified tables is usually the correct one.
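A minimal numeric sketch of the paradox (the counts below are hypothetical, patterned after the well-known kidney-stone illustration): the treatment wins inside each severity stratum, yet loses in the combined table because severe cases are unevenly distributed.

```python
# Hypothetical (successes, total) per severity stratum.
data = {
    "treatment": {"mild": (81, 87),   "severe": (192, 263)},
    "control":   {"mild": (234, 270), "severe": (55, 80)},
}

for group, strata in data.items():
    wins = sum(s for s, n in strata.values())
    total = sum(n for s, n in strata.values())
    per_stratum = {k: f"{s/n:.0%}" for k, (s, n) in strata.items()}
    print(group, per_stratum, f"overall: {wins/total:.0%}")

# treatment {'mild': '93%', 'severe': '73%'} overall: 78%  <- better in each stratum
# control   {'mild': '87%', 'severe': '69%'} overall: 83%  <- better only combined
```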

25
Descriptive Statistics
Definition: Methods used to summarize and describe the main features of a dataset.
Purpose: To make data more manageable and easier to interpret.
Techniques: Includes calculating averages (mean, median), measuring spread (standard deviation, range), and creating visual summaries (histograms, scatter diagrams).
Contrast: It only describes the sample data; it does not draw conclusions or make inferences about a larger population.
26
Histogram
Definition: A graphical tool used to display the distribution of a quantitative variable (e.g., height, income, test scores).
Construction: The horizontal axis shows the possible values of the variable, divided into classes (intervals or bins). The vertical axis shows the frequency (or percentage) of observations that fall into each class.
Key Feature: The area of each rectangular bar is proportional to the number of cases in that class.
Purpose: To quickly visualize the shape of the distribution (e.g., symmetric, skewed), its center, and its spread.
27
Class Intervals
Definition: The ranges of values used to group quantitative data when constructing a histogram or frequency table.
Characteristics: They must cover all the data, be non-overlapping, and are typically of equal width (though not always).
Purpose: To simplify a wide variety of data values into a small, manageable number of categories so the distribution can be clearly visualized and summarized.
Freedman's Principle: The choice of interval width significantly affects the shape of the histogram; too wide, and detail is lost; too narrow, and the distribution looks jagged.
28
Q: When examining a histogram, what does the area of the block represent?
Percentage of total
29
Distribution/Frequency Table
Definition: A table that summarizes the distribution of a variable by listing the class intervals (or values) and the frequencies (or percentages) of the observations that fall into each interval.
Purpose: To clearly organize and present raw data in a way that shows how the data is spread out and where the majority of observations lie.
Relationship to Histogram: A distribution table is essentially the numerical source data used to create a histogram; the intervals are the bases of the bars, and the frequencies are the heights of the bars.
Also Known As: A frequency table (when using counts) or a relative frequency table (when using percentages).
30
Endpoint Convention
Definition: The rule used when defining class intervals for a histogram or distribution table to determine exactly where observations that fall on a boundary (or endpoint) should be counted.
The Standard Rule: In Freedman's text, the standard convention is that an observation falling exactly on a class boundary is counted in the interval to its right (the next higher interval).
Example: If the intervals are 10-20 and 20-30, an observation of 20 is counted in the 20-30 interval, not the 10-20 interval.
Purpose: To ensure that every observation is counted in one and only one class interval, making the frequency table and histogram accurate and unambiguous.
31
Density Scale
Definition: A vertical scale used in a histogram where the height of a bar is made equal to the percentage of cases in the interval divided by the width of the interval.
Formula: height = (percentage of cases in the interval) / (width of the interval)
Purpose: To make the histogram's visual interpretation accurate. Using the density scale ensures that the area of the bar, not just the height, represents the percentage of cases in that class interval.
When Used: It is essential when the class intervals have unequal widths. If the widths are equal, the area is proportional to the height, and a simple percentage or frequency scale is sufficient.
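A quick sketch of the density-scale arithmetic (the interval endpoints and percentages below are made up for illustration):

```python
# Density scale: bar height = percent of cases / interval width,
# so bar AREA (height x width) recovers the percent of cases.
classes = [(0, 10, 15.0), (10, 25, 30.0), (25, 50, 40.0), (50, 100, 15.0)]

for lo, hi, pct in classes:
    width = hi - lo
    height = pct / width  # percent per horizontal unit
    print(f"{lo}-{hi}: height = {height:.2f} percent per unit, area = {height * width:.0f}%")
```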
32
Q: In a histogram, what does the height of the block represent?
Percentage per horizontal unit (density)
33
Variable
Definition: A characteristic or attribute that can take on different values from one individual or unit of observation to another (e.g., height, income, gender, blood pressure).
Purpose: Variables are the basic building blocks of any statistical study, as researchers measure and analyze their distributions and relationships.
Types (as discussed in the text):
- Quantitative: Measured in numbers (e.g., age, weight).
- Qualitative (or Categorical): Sorted into distinct categories (e.g., occupation, blood type).
Key Idea: If an attribute were the same for every subject in a study (a constant), it would be useless for statistical analysis.
34
Q: What are the two types of variables
1. Qualitative
2. Quantitative
35
Q: What are the two types of quantitative variables?
1. Discrete
2. Continuous
36
Q: How do we summarize the center of a data set?
Average
37
Q: How do we measure the spread around the center of a data set?
Standard Deviation
38
Average
Definition: A single, representative value that describes the center or typical size of a set of numbers.
Intuitive Explanation: If you were to smooth out the data (taking from the high values and giving to the low values), the average is the height or amount everyone would end up with. It's the leveling point of the distribution.
Most Common Type (Arithmetic Mean): Calculated by summing all the values in a list and then dividing by the count of those values.
Purpose: It allows you to summarize a large dataset with a single, easily comparable number.
39
Standard Deviation
Definition: A measure of the spread or variability of a list of numbers, showing how much the values typically differ from the average (mean).
Intuitive Explanation: It's the typical distance between an individual data point and the center of the data. A small SD means the data points are tightly clustered around the average; a large SD means the points are widely scattered.
Key Property: It is the root-mean-square (RMS) of the deviations (differences) from the mean, which gives the typical size of those deviations.
Relationship to Variance: The variance is the SD squared. The SD is more commonly used because it is in the same units as the data (e.g., dollars, inches), unlike the variance.
40
Mean
Definition: The arithmetic average of a list of numbers, calculated by summing all the values and dividing by the total count of values.
Intuitive Explanation: The mean is the balance point of the data distribution. If you put all the values on a seesaw, the mean is the exact spot where the seesaw would balance. It's the fair share, or the amount each item would receive if the total were distributed equally.
Key Property: The sum of the deviations (differences) of all data points from the mean is always zero.
41
Interquartile Range
Definition: A measure of spread that quantifies the difference between the first quartile (Q1) and the third quartile (Q3).
Formula: IQR = Q3 - Q1
Intuitive Explanation: The IQR gives the range of the middle 50% of the data. It tells you how spread out the most central half of your values are, ignoring the extreme 25% on the low end and the 25% on the high end.
Key Advantage: Unlike the standard deviation, the IQR is robust (resistant) to outliers (extreme values), making it a reliable measure of spread for skewed distributions.
42
Median
Definition: The middle value in a list of numbers that has been arranged in numerical order. If there's an even count of numbers, it's the average of the two middle values.
Intuitive Explanation: The median is the value that splits the data exactly in half; 50% of the observations are below it, and 50% are above it. It's the point where you'd cut a distribution to have an equal number of cases on either side.
Key Advantage: It's robust (resistant) to outliers (extreme values) because its calculation only depends on the position of the data points, not their magnitude.
Use: The median is often preferred over the mean for data that is heavily skewed (like income or house prices).
43
Mode
Definition: The value or category that appears most frequently in a dataset or distribution.
Intuitive Explanation: It's the most popular choice or the most common result. If you looked at a histogram, the mode is the value under the tallest bar. A distribution can have one mode (unimodal), two modes (bimodal), or more.
Use: It is the only measure of center that can be used for qualitative (categorical) data (e.g., favorite color, brand preference), as you cannot calculate a mean or median for categories.
Limitation: It is easily affected by small changes in data and can sometimes be a poor representation of the center if the rest of the data is far away.
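A short sketch contrasting the three measures of center, using Python's standard statistics module on a made-up list; note how one outlier drags the mean but barely moves the median:

```python
import statistics

data = [2, 3, 3, 4, 5, 6, 7]
print(statistics.mean(data))    # ~4.29
print(statistics.median(data))  # 4
print(statistics.mode(data))    # 3 (the most frequent value)

# One extreme outlier: the mean jumps, the median barely moves.
with_outlier = data + [100]
print(statistics.mean(with_outlier))    # 16.25
print(statistics.median(with_outlier))  # 4.5
```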
44
Cross-Sectional Survey
Definition: A survey that measures a snapshot of a population at a single point in time. Data is collected on both the variables of interest and the outcome simultaneously.
Intuitive Explanation: It's like taking a photo of a group: you capture information right now, but you don't track how things change over time.
Study Type: It is a type of observational study.
Limitation: It is very difficult to establish a clear temporal order (which came first: the exposure or the outcome), making it weak evidence for causation. You can only confirm that an association exists at that moment in time.
45
Longitudinal Survey
Definition: A type of observational study where the same subjects are followed and measured repeatedly over a long period of time.
Intuitive Explanation: It's like taking a movie of a group: you track them over months or years, observing how variables and outcomes change as time progresses.
Key Advantage: Because the data is collected over time, it helps researchers establish temporal precedence (which came first, the exposure or the outcome), which strengthens the evidence for a possible causal link more than a single cross-sectional survey does.
Examples: Cohort studies (following a group forward in time) and panel studies.
Limitation: They are expensive, time-consuming, and suffer from attrition (subjects dropping out over the years), which can introduce bias.
46
Q: Why do we need to understand if the survey data is cross-sectional or longitudinal?
We need to understand if survey data is cross-sectional or longitudinal primarily to determine the strength of the evidence for a time-based relationship and to assess the study's limitations.
Establishing Cause and Effect (Causation): Longitudinal studies are better because they track subjects over time, helping to establish temporal precedence, i.e., that the exposure or cause actually happened before the outcome or effect. This is a critical requirement for suggesting causation. Cross-sectional studies measure everything at once, making it difficult or impossible to tell which variable came first.
Assessing Changes and Trends: Longitudinal data allows researchers to measure changes within the same individual over time, track trends, and determine the duration of an effect. Cross-sectional data can only show differences between individuals at one moment; it cannot show how any individual variable changes.
Evaluating Bias: Cross-sectional surveys are highly susceptible to confounding. Longitudinal surveys face challenges with attrition (subjects dropping out) and can be very expensive and time-consuming.
In short, knowing the survey type tells you what conclusions you can reasonably draw from the data, especially regarding the critical difference between mere association and a potential causal link.
47
Root-Mean-Square (RMS)
Definition: A general statistical calculation used to find the typical size or magnitude of a set of numbers, especially when those numbers include both positive and negative values (like deviations).
Intuitive Explanation: It's the average size of a list of numbers, ignoring their sign. It's the standard way to measure the typical size of deviations, giving larger weight to larger numbers.
Steps to Calculate:
1. Square (S): Square all the numbers in the list.
2. Mean (M): Find the average (mean) of those squares.
3. Root (R): Take the square root of the mean.
Key Application: The Standard Deviation (SD) is calculated as the RMS of the deviations from the mean.
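A minimal sketch of the Square-Mean-Root steps, showing that the SD is just the RMS of the deviations from the average (the data list is made up):

```python
import math

def rms(values):
    """Square each value, take the mean of the squares, then the square root."""
    return math.sqrt(sum(v * v for v in values) / len(values))

data = [1, 3, 4, 5, 7]
avg = sum(data) / len(data)           # 4.0
deviations = [x - avg for x in data]  # [-3.0, -1.0, 0.0, 1.0, 3.0]
print(rms(deviations))                # 2.0 -- this is the SD of the list
```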
48
Normal Distribution
Definition: A specific, symmetrical, bell-shaped distribution that frequently arises in natural phenomena and statistical theory. It is characterized entirely by its mean (μ) and its standard deviation (σ).
Intuitive Explanation: It's the ideal shape for many large, naturally occurring datasets (like heights, weights, or measurement errors). It shows that most observations cluster around the average (the center), and values become increasingly rare the further they deviate from the average.
Key Property (The 68-95-99.7 Rule): For any Normal Distribution:
- Roughly 68% of the data falls within 1 SD of the mean.
- Roughly 95% of the data falls within 2 SDs of the mean.
- Roughly 99.7% of the data falls within 3 SDs of the mean.
Significance: It serves as a crucial benchmark distribution in statistics, especially for understanding variation and performing inference.
49
Empirical Rule
Definition: A quick, practical rule that approximates the spread of data for any Normal Distribution using the standard deviation.
Intuitive Explanation: It tells you where the vast majority of your data will fall, as long as the distribution looks like a bell-shaped curve. It's a handy way to estimate percentages without complex calculations.
The Three Key Percentages:
- Approximately 68% of the data falls within 1 Standard Deviation (SD) of the mean.
- Approximately 95% of the data falls within 2 SDs of the mean.
- Approximately 99.7% of the data falls within 3 SDs of the mean.
Condition: The rule only applies to data that follows the Normal Curve.
50
Standard Units (or Z-score)
Definition: A standardized measure that shows how many standard deviations (SDs) a value is above or below the average (mean).
Intuitive Explanation: Standard units are a way to level the playing field and compare apples to oranges. By converting different types of measurements (like height and weight, or scores on different tests) to a common unit (the SD), you can easily see which value is relatively more extreme.
Formula: standard units = (value - average) / SD
Interpretation: A positive standard unit means the value is above the average; a negative standard unit means the value is below the average. A standard unit of 0 means the value is exactly the average.
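A small sketch of the formula; the test averages and SDs below are hypothetical:

```python
def standard_units(value, average, sd):
    return (value - average) / sd

# Hypothetical tests: math (average 500, SD 100) vs. verbal (average 22, SD 4).
print(standard_units(650, 500, 100))  # 1.5 -> 1.5 SDs above average
print(standard_units(30, 22, 4))      # 2.0 -> the relatively more extreme score
```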
51
Normal Approximation
Definition: The practice of using the Normal Distribution (Normal Curve) to estimate or approximate the percentages, probabilities, or counts within a given range for a different, non-normal dataset or distribution.
Intuition: Many real-world distributions, especially those built from chance processes like sums or averages, are very close to the Normal Curve even if the underlying data isn't perfectly normal. The Normal Approximation allows us to use the well-known 68-95-99.7 Rule and the Standard Unit calculations to quickly and effectively estimate percentages for these complex distributions.
Key Step: To use the Normal Approximation, you must first convert the data value(s) into Standard Units (Z-scores) using the mean and standard deviation of the dataset. You then use a Normal Curve table to find the desired percentage.
Applicability: It works best when the original dataset has a large number of observations and the histogram's shape is close to the smooth, symmetrical bell curve.
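A sketch of the normal approximation using only the standard library (math.erf gives the area under the standard normal curve); the height figures are hypothetical:

```python
import math

def normal_area_below(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical: heights averaging 69 inches with an SD of 3 inches.
avg, sd = 69.0, 3.0
z_lo = (66.0 - avg) / sd   # -1.0 in standard units
z_hi = (72.0 - avg) / sd   # +1.0 in standard units
print(normal_area_below(z_hi) - normal_area_below(z_lo))  # ~0.68, i.e. ~68%
```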
52
Percentile
Definition: A value in a distribution such that a specific percentage of the data falls at or below that value.
Intuition: It tells you what percentage of people, scores, or measurements you outperformed or are equal to. For example, if you score in the 90th percentile, 90% of the scores are below yours, and 10% are above.
Special Cases: The median is the 50th percentile. The interquartile range (IQR) is defined by the 25th percentile (Q1) and the 75th percentile (Q3).
Use: Percentiles are particularly useful for understanding relative standing in skewed data where the mean and standard deviation may not be good measures of center and spread.
53
Percentile Rank
Definition: The percentage of scores or values in a distribution that are equal to or less than a particular value.
Intuition: The percentile rank tells you the relative standing of a specific score compared to the entire group. If your test score has a percentile rank of 80, it means you scored as well as or better than 80% of the people who took the test.
Contrast with Percentile: A percentile is a value (a score); a percentile rank is a percentage (the position of the score).
Use: It is primarily used to interpret individual scores by placing them in the context of a larger distribution.
54
Q: Difference between "percentile" and "percentile rank"?
The difference between "percentile" and "percentile rank" is what they measure: one is a value from the dataset, and the other is a percentage of the data.
Percentile:
- Definition: A value (a score, height, etc.) in a distribution such that a specified percentage of the data falls at or below that value.
- Intuition: It's the boundary line. For example, the 80th percentile is the specific income amount that separates the bottom 80% of earners from the top 20%.
- Output: A value in the original units of the data (e.g., 75 inches, a 550 test score, or $80,000).
Percentile Rank:
- Definition: The percentage of scores or values in a distribution that are equal to or less than a particular value.
- Intuition: It's the position. For example, if a test score of 550 has a percentile rank of 80, it means that 80% of all test-takers scored 550 or lower.
- Output: A percentage (a number between 0 and 100).
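A rough sketch of both ideas (these simple helpers use the nearest-rank convention; real libraries often interpolate differently):

```python
import math

def percentile(data, p):
    """Return the VALUE at or below which p percent of the data falls."""
    s = sorted(data)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[k]

def percentile_rank(data, value):
    """Return the PERCENTAGE of the data at or below the given value."""
    return 100 * sum(1 for x in data if x <= value) / len(data)

scores = [40, 55, 60, 65, 70, 75, 80, 85, 90, 95]
print(percentile(scores, 50))       # 70   -- a value, in score units
print(percentile_rank(scores, 80))  # 70.0 -- a percentage of test-takers
```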
55
Change of Scale
Definition: The process of adjusting a list of numbers by either adding a constant to every value (shifting the data) or multiplying by a constant (rescaling the data), or both.
Intuition: This describes what happens to the summary statistics (like the average and standard deviation) when you change the units of measurement. For instance, converting temperatures from Celsius to Fahrenheit involves both a multiplication and an addition.
Effect on Mean/Average:
- If you add/subtract a constant to every number, the mean changes by that exact amount.
- If you multiply/divide by a constant, the mean changes by that exact factor.
Effect on Standard Deviation (SD):
- Adding/subtracting a constant does not change the SD (because the spread between the numbers remains the same).
- Multiplying/dividing by a constant changes the SD by that factor (because the distances between numbers are also stretched or compressed).
Key Idea: The standard deviation measures spread, which is independent of the location of the mean.
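A quick sketch confirming the shift/scale rules with a made-up list (statistics.pstdev computes the SD the way the text does, dividing by n):

```python
import statistics

data = [10, 20, 30, 40, 50]
print(statistics.mean(data), statistics.pstdev(data))  # 30, ~14.14

shifted = [x + 100 for x in data]  # add a constant: mean shifts, SD unchanged
scaled = [x * 3 for x in data]     # multiply: both mean and SD scale by 3
print(statistics.mean(shifted), statistics.pstdev(shifted))  # 130, ~14.14
print(statistics.mean(scaled), statistics.pstdev(scaled))    # 90,  ~42.43
```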
56
Q: What happens to your MEAN/AVERAGE when you ADD a constant to every value in data set?
Mean increases/decreases by the constant amount
57
Q: What happens to your MEAN/AVERAGE when you MULTIPLY by a constant for every value in data set?
Mean is multiplied by the constant
58
Q: What happens to your STANDARD DEVIATION when you ADD a constant to every value in data set?
No change
59
Q: What happens to your STANDARD DEVIATION when you MULTIPLY by a constant for every value in data set?
SD is multiplied by the absolute value of the constant (the SD can never be negative)
60
Q: What happens to your CORRELATION COEFFICIENT, r, when you ADD a constant to every value in data set?
No effect
61
Q: What happens to your CORRELATION COEFFICIENT, r, when you MULTIPLY by a constant for every value in data set? Why?
No effect from multiplying by a positive constant: r is computed from standard units, and a positive change of scale leaves standard units unchanged. Multiplying by a negative constant flips the sign of r but leaves its magnitude unchanged.
62
Measurement Error
Definition: The inevitable chance variation that occurs when a quantity is measured, meaning the recorded value differs from the true value.
Formula: measured value = true value + bias + chance error
Intuition: No measurement tool is perfectly precise, and no human can use a tool perfectly. The error represents the combination of the instrument's limitation and human imperfection. It is assumed to be a random chance error.
Key Idea: Measurement errors are generally assumed to be independent and follow the Normal Distribution, clustering around zero. This means small errors are common, and large errors are rare, with no systematic tendency to be too high or too low.
63
Chance Error
Definition: The component of measurement error that is random and unpredictable. It causes a measurement to be sometimes too high and sometimes too low, with no systematic pattern.
Intuition: It represents the small, inevitable fluctuations that happen every time a measurement is taken, such as slight variations in reading a scale, environmental noise, or the inherent imprecision of the instrument. It is due to chance.
Key Property: Chance errors are assumed to average out to zero over many repeated measurements. They are also generally assumed to follow the Normal Distribution.
Contrast: It differs from bias (or systematic error), which is a consistent error that pushes the measurement in one direction (always too high or always too low).
Measuring It: We can measure the size of the chance error by taking the standard deviation of repeated measurements made under identical conditions.
64
Calibration
Definition: The process of checking and adjusting a measurement instrument to ensure its readings are accurate relative to a known standard.
Intuition: It's the act of zeroing or correcting a tool so that the measurements it provides are reliable and trustworthy. For example, before using a scale, you check that it reads zero when nothing is on it.
Purpose in Statistics: Proper calibration helps to eliminate or minimize systematic error (bias). If a scale is consistently reading 5 pounds too high, calibration is the step that fixes this bias, ensuring that any remaining error is only due to random chance.
Contrast: Calibration deals with systematic error (bias), while statistical analysis deals primarily with chance error.
65
Outlier
Definition: An observation (a value) that is far away from the rest of the data points in a distribution.
Intuition: It's the odd one out: a score, measurement, or data point that seems extreme or highly unusual compared to the bulk of the data.
Causes: Outliers can be caused by a simple error in measurement or recording, or they can be genuine, rare occurrences that correctly reflect an extreme case in the population.
Significance: Outliers can have a disproportionate effect on some summary statistics:
- They heavily pull the Mean towards them.
- They dramatically increase the Standard Deviation (SD).
- They have little to no effect on the Median or the Interquartile Range (IQR).
66
Bias / Systematic Error
Definition: A consistent, systematic error that causes a measurement or estimate to be consistently too high or consistently too low.
Intuition: It's a one-sided error that doesn't cancel out, even if you repeat the process many times. Think of a scale that is poorly calibrated and always reads five pounds heavy: that's bias.
Source in Measurement: Caused by a faulty instrument, a consistent environmental factor, or a flaw in the procedure (e.g., neglecting to calibrate the tool).
Source in Studies: Caused by flaws in the study design, such as selection bias (non-comparable groups) or response bias (subjects misreporting information).
Key Contrast: Unlike chance error, which is random and averages out to zero, bias must be addressed by calibration or a better study design.
67
Cartesian Coordinates
Definition: A system that uses a set of numbers (coordinates) to uniquely locate a point in space (typically a plane). It consists of two perpendicular number lines: the horizontal x-axis (abscissa) and the vertical y-axis (ordinate), which intersect at the origin (0,0).
Intuition: It's like giving someone driving directions on a grid. To find any point, you first say how far to go horizontally (the x-value) and then how far to go vertically (the y-value), written as (x, y).
Use in Statistics: Cartesian coordinates are the foundation for nearly all statistical graphs, including scatter diagrams (to show the relationship between two variables) and histograms (where frequency/density is the y-axis and the variable value is the x-axis).
Named After: The French philosopher and mathematician René Descartes.
68
Slope
Definition: A measure of the steepness and direction of a line. In a scatter diagram or regression line, it quantifies how much the dependent variable (y) changes for every one-unit change in the independent variable (x).
Intuition: The slope is the rate of change, the "rise over run." It tells you exactly how fast and in what direction the line is going. A steep slope means a small change in x leads to a large change in y.
Formula: slope = (change in y) / (change in x) = (y2 - y1) / (x2 - x1) = rise/run
Interpretation:
- Positive Slope (+): The line goes up from left to right, indicating a positive association (as x increases, y increases).
- Negative Slope (-): The line goes down from left to right, indicating a negative association (as x increases, y decreases).
- Slope of Zero (0): A horizontal line, indicating no association between x and y.
69
Intercept
Definition: The point where a line (such as a regression line) crosses the vertical axis (y-axis). It is the value of y when the independent variable, x, is equal to zero.
Intuition: The intercept gives you the starting value or the baseline amount of the dependent variable (y) before any influence from the independent variable (x) has occurred.
Formula (in the context of a line): It's the term 'a' in the equation of a line: y = a + bx.
Interpretation: The intercept is only meaningful in context if it is plausible for x to be zero and if the data supports extending the line back to that point. In many statistical contexts (like predicting adult height from infant weight), an x value of zero may be outside the range of the observed data, making the intercept value practically meaningless.
70
Q: What is the equation of a line?
y = mx + b, where:
- m = slope
- b = intercept
71
Scatter Diagram (or Scatter Plot)
Definition: A graphical tool used to display the relationship between two quantitative variables for a set of individuals.
Intuition: It lets you visually check for an association between two variables. Each subject in the study is represented by a single point on the graph, located using Cartesian coordinates (one variable on the x-axis, the other on the y-axis).
Purpose: To reveal the direction, form, and strength of the relationship:
- Positive Association: The points cluster around an upward-sloping line (as x increases, y increases).
- Negative Association: The points cluster around a downward-sloping line (as x increases, y decreases).
- No Association: The points look like a cloud with no clear direction.
Key Idea: The points show the raw data, allowing you to easily spot outliers or non-linear patterns.
72
Correlation Coefficient
r
How can we summarize the correlation between two variables (an independent and a dependent variable)? Convert each variable to standard units and then take the average of the products. It is reversible: r(x, y) = r(y, x).
Definition: A single number that measures the strength and direction of the linear association between two quantitative variables displayed in a scatter diagram. It measures clustering around a line, relative to the SDs.
Intuition: It tells you how tightly the points cluster around a straight line. An r value close to +1 or -1 means the points are tightly clustered, indicating a strong linear relationship. An r value close to 0 means the points are widely scattered, indicating a weak or non-existent linear relationship.
Range: The correlation coefficient always falls between -1 and +1 (i.e., -1 ≤ r ≤ 1).
Sign: The sign (+ or -) indicates the direction of the association (positive or negative slope).
Limitation: It only measures linear association. A strong non-linear relationship (like a U-shape) might have a correlation coefficient close to zero.
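A minimal sketch of the definition just given: convert each variable to standard units and average the products (toy data; SDs computed with n in the denominator, as in the text):

```python
import math

def correlation(xs, ys):
    """r = average of the products of x and y in standard units."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sdx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sdy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum(((x - mx) / sdx) * ((y - my) / sdy) for x, y in zip(xs, ys)) / n

xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
print(correlation(xs, ys))  # ~0.77
print(correlation(ys, xs))  # same value -- r is reversible
```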
73
Point of Averages
Definition: The single point on a scatter diagram whose coordinates are the average (mean) of the independent variable and the average (mean) of the dependent variable.
Coordinates: (average of x, average of y).
Intuition: It represents the typical or central subject in the entire dataset. It is the center of mass for all the points on the scatter diagram.
Key Property: The regression line (the line of best fit used for prediction) always passes through the point of averages. This ensures that the line is centered on the data and makes predictions relative to the mean of both variables.
74
Q: What are the 5 summary metrics for two variables?
1. average of x-values
2. SD of x-values
3. average of y-values
4. SD of y-values
5. correlation coefficient, r
75
SD Line
Definition: A line drawn on a scatter diagram that passes through the point of averages and has slope:
slope = (SD of y) / (SD of x)
Intuition: The SD Line helps visualize the spread of the data in Standard Units. It shows the slope the line would have if the correlation coefficient (r) were +1 or -1. In other words, it represents the tightest possible linear clustering for the given SDs.
Key Contrast with Regression Line: Unlike the regression line, the SD line is not used for prediction. The regression line is always closer to the horizontal axis than the SD line (except when r = ±1), illustrating the principle of regression to the mean.
Purpose: It serves as a visual reference to assess the strength of the correlation (r); the closer the scatter of points is to the SD line, the stronger the linear association.
76
Regression Method
Definition: A statistical technique used to predict the value of one variable (the dependent variable, y) based on the value of another variable (the independent variable, x).
Intuition: It finds the best straight line (the regression line or "line of best fit") through the scatter diagram. This line minimizes the overall prediction errors (specifically, the sum of the squared vertical distances from the points to the line). It acts as the most informed estimate for y given x.
Slope: b = r * (SD of y) / (SD of x)
Key Rule: For every one-SD increase in x, there is an increase of only r SDs in y, on average. Plotting these estimates gives the regression line for y on x.
Significance: Because the regression slope is a fraction of the SD line slope (unless r = ±1), it mathematically embodies the principle of regression to the mean: predictions for extreme x values are always less extreme in y (closer to the average of y) than they would be if the prediction followed the SD line.
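A sketch of the regression method from the five summary statistics alone (the averages, SDs, and r below are hypothetical, loosely in the spirit of the fathers-and-sons example):

```python
import math

# Hypothetical five-statistic summary.
avg_x, sd_x = 70.0, 3.0   # e.g., fathers' heights (inches)
avg_y, sd_y = 69.0, 3.0   # e.g., sons' heights (inches)
r = 0.5

slope = r * sd_y / sd_x            # 0.5
intercept = avg_y - slope * avg_x  # line passes through the point of averages

def predict(x):
    return slope * x + intercept

print(predict(76))  # 72.0 -- a father 2 SDs up predicts a son only 1 SD up

# Typical size of the prediction errors (RMS error for regression):
print(math.sqrt(1 - r ** 2) * sd_y)  # ~2.6 inches
```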
77
Graph of Averages
Definition: A graphical tool used to summarize a scatter diagram by plotting the average y-value for each distinct x-value (or for each class interval of x).
Intuition: Instead of showing hundreds of individual points, the graph of averages shows the trend more clearly. It collapses the vertical scatter for each x-value down to a single representative point: the mean of the y's for that x. It smooths out the noise to reveal the underlying relationship.
Key Property: The graph of averages is used to check for linearity. If the original scatter diagram shows a linear association, the graph of averages will also follow a straight line. If the original association is non-linear (curved), the graph of averages will show that same non-linear curve.
Relationship to Regression: The regression line is simply the straight line that best fits the points on the graph of averages.
78
The Regression Fallacy
Definition: The common mistake of attributing a real cause-and-effect relationship to observed changes that are actually due entirely to the statistical principle of regression to the mean.
Intuition: It's the error of thinking that a variable must have changed because of some action when, in reality, it was just an unusually extreme measurement moving back toward the average.
Example: A basketball player has an amazing, extreme scoring night. If the coach criticizes them afterward, and the player scores closer to their average the next game, the coach might mistakenly believe the criticism "worked," when the score was simply regressing to the mean.
Key Property: This fallacy occurs whenever one makes an observation after an extreme measurement. Because of natural random variation (chance error), the next measurement is very likely to be less extreme (closer to the long-run average), regardless of any external factor.
Significance: It highlights the need for a control group in experiments. Without a control group to measure natural regression, one might falsely conclude that a treatment or intervention caused a change.
79
Regression Effect
Definition: The statistical tendency for subjects or measurements that are extreme on a first measurement to be closer to the average on a second, subsequent measurement.
Intuition: This is a natural consequence of chance error. When you observe an extreme value (either very high or very low), part of that extremeness is likely due to good or bad luck (random chance). The next time you measure, that luck component is likely to be closer to zero, causing the overall value to "regress" or move back toward the long-run average.
Mechanism: The effect is only observed when the two variables being measured are not perfectly correlated (r is not ±1). The weaker the correlation (the closer r is to 0), the stronger the regression effect.
Significance: The Regression Effect is the statistical truth that underlies the Regression Fallacy (the error of misinterpreting this natural movement toward the average as a caused effect).
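A small simulation of the mechanism described above (all numbers invented): observed score = true skill + chance error, measured twice. The group that was extreme on test 1 averages noticeably closer to 100 on test 2, with no intervention at all.

```python
import random

random.seed(0)
true_skill = [random.gauss(100, 10) for _ in range(10_000)]
test1 = [t + random.gauss(0, 10) for t in true_skill]  # skill + luck
test2 = [t + random.gauss(0, 10) for t in true_skill]  # same skill, fresh luck

# People who looked extreme (> 120) on the first test...
top = [(a, b) for a, b in zip(test1, test2) if a > 120]
print(sum(a for a, _ in top) / len(top))  # ~126 on test 1
print(sum(b for _, b in top) / len(top))  # ~113 on test 2: regressed toward 100
```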
80
Q: The regression method can be used to predict y from x. However, actual values differ from predictions (residuals) - but by how much? How do we measure the overall size of the differences?
RMS Error for Regression. It measures the typical size of the residuals (prediction errors):
RMS error = sqrt(1 - r^2) * (SD of y)
81
Error
Definition: In statistics, error refers to the difference between an observed value and a predicted, theoretical, or true value. It represents the inherent variability or lack of perfect precision in data.
Intuition: Error is the amount by which you miss the mark. It's the inevitable difference between what you measure or predict and the actual fact.
82
Residual
Definition: The vertical distance between an observed data point and the corresponding point on the regression line (the predicted value). It represents the error in prediction for a specific observation.
83
Selection Bias
Definition: A systematic flaw in a study's design where the individuals or groups being compared are fundamentally different in a way that relates to the outcome being studied, before the treatment or exposure is even applied.
Intuition: It means the groups weren't comparable to start with. The study suffers from a "non-comparable groups" problem. Any observed difference in the outcome could be due to these pre-existing differences rather than the treatment itself.
Cause: It typically occurs in Observational Studies when subjects self-select their group (e.g., smokers vs. non-smokers), or when the researcher chooses non-random, non-representative samples.
Key Solution: The best way to eliminate selection bias is through Random Assignment in a Randomized Controlled Trial (RCT). Randomization ensures that, on average, the treatment and control groups are comparable in all aspects, both known and unknown.
Example: Comparing the health of people who choose to run marathons to those who don't. The marathon runners were likely healthier to begin with, leading to selection bias.
84
Residual Plot
Definition: A scatter diagram that plots the residuals (the prediction errors) on the vertical (y) axis against the independent variable (x) or the predicted values on the horizontal (x) axis.
Intuition: It's a way to magnify the errors of the regression line. If the regression line is a good fit, the residual plot should show no pattern: just a random, symmetric cloud of points centered around the horizontal line at y = 0.
Purpose (Diagnostic Tool): To check the assumptions of the linear regression model:
- Linearity: If the plot shows a curved pattern (e.g., a "U" or inverted "U"), the true relationship is non-linear, and the linear model is inappropriate.
- Equal Scatter (Homoscedasticity): If the plot shows a funnel shape (scatter widens or narrows across x), the assumption of constant RMS error is violated.
Conclusion: A well-fitting model has a residual plot that looks like a horizontal band of random noise centered at zero.
85
Homoscedastic
Definition: A condition in a statistical model (like linear regression) where the variability (or spread) of the errors (residuals) is constant across all levels of the independent variable (x).
Intuition: It means the vertical scatter of the data points around the regression line is the same width everywhere along the line. The data points form a consistent, uniform band.
Key Importance: This is an important assumption for many statistical methods. When data is homoscedastic, the RMS Error for Regression is an accurate measure of prediction error across the entire range of the data.
Contrast: The opposite is Heteroscedasticity (Heteroskedasticity), where the scatter changes (often forming a "funnel" shape). If data is heteroscedastic, predictions in areas of low scatter are actually more accurate than the RMS error suggests, and predictions in areas of high scatter are less accurate than the RMS error suggests.
86
Heteroscedastic
Definition: A condition in a statistical model (like linear regression) where the variability (or spread) of the errors (residuals) is not constant across all levels of the independent variable (x).
Intuition: It means the vertical scatter of the data points around the regression line is wider in some places and narrower in others. The data points form a shape like a funnel or a wedge.
Key Importance: When data is heteroscedastic, the RMS Error for Regression is misleading because it represents a single "average" spread. In areas of high scatter, the prediction error is actually larger than the RMS suggests, and in areas of low scatter, the error is smaller. This invalidates certain statistical inferences.
Contrast: The opposite is Homoscedasticity, where the vertical scatter is uniform across the entire range of x.
87
Q: The equation of the regression line for y on x is?
y = slope * x + intercept = mx+b
88
Q: the equation for the simple linear regression slope is?
m = r * (SD of y) / (SD of x)
This gives the average change in y associated with a one-unit change in x (an association, not proof of causation).
89
Method of least squares
Definition: The mathematical procedure used to find the equation of the best-fitting line (the regression line) for a set of data points in a scatter diagram.
Intuition: The method determines the line that minimizes the total amount of error. It achieves this by finding the line that makes the sum of the squares of the vertical distances (the residuals) from all the data points to the line as small as possible.
Why "Square" the Errors? Squaring the residuals accomplishes two things:
1. It ensures that all errors (both positive and negative) contribute equally, preventing positive and negative residuals from cancelling each other out.
2. It penalizes large errors much more heavily than small errors, ensuring the line is a good fit across all the data, not just the center.
Outcome: The result is the unique line that provides the most reliable linear prediction of the dependent variable (y) from the independent variable (x).
90
Frequency Theory (of Probability)
Works best for processes which can be repeated over and over again, independently and under the same conditions.
Definition: The view that the probability of an event is defined by the long-run relative frequency with which the event occurs in a very large number of independent and identical trials.
Intuition: Probability isn't just a guess; it's what actually happens over the long haul. If you flip a fair coin many, many times, the proportion of heads you get will stabilize and get closer and closer to 0.5. This stable proportion is the probability.
Key Idea: It defines probability based on observability and empirical evidence. For an event to have a probability, it must be repeatable. The probability is the limit of the relative frequency as the number of trials approaches infinity.
Contrast: It differs from the Classical (or Equally Likely Outcomes) Theory (where probability is defined by counting possibilities) and the Subjective Theory (where probability reflects a personal degree of belief).
91
Probability / Chance
Definition: A number that quantifies the likelihood of a specific event occurring. It is expressed as a number between 0 and 1 (or as a percentage between 0% and 100%).
Intuition: It's a measure of how often something is expected to happen.
- A probability of 0 (or 0%) means the event is impossible.
- A probability of 1 (or 100%) means the event is certain.
- A probability of 0.5 (or 50%) means the event is as likely to happen as not.
Theoretical Calculation (Classical): If all possible outcomes are equally likely (e.g., rolling a die), the probability is calculated as:
probability = (number of favorable outcomes) / (number of total possible outcomes)
Empirical Calculation (Frequency Theory): The long-run relative frequency of the event occurring in many repeated trials.
Key Idea: Chance introduces randomness into the world, and probability provides the mathematical framework for understanding and predicting the long-term patterns within that randomness.
92
Conditional Probability
Definition: The probability of an event occurring given that another event has already occurred.
Notation: It is written as P(B|A), which is read as "the probability of event B given event A."
Intuition: It narrows the focus from the entire sample space to a subset of outcomes defined by the condition. It asks, "Out of all the times event A happened, how often did event B happen too?"
Formula: P(B|A) = P(A AND B) / P(A) (the probability of both events happening divided by the probability of the conditioning event A happening).
Key Idea (Dependence): If P(B|A) is different from P(B), then the two events are dependent; the occurrence of A changes the likelihood of B. If P(B|A) is equal to P(B), the events are independent.
93
Laws of Probability: Multiplication Rule(s)
Definition: A fundamental probability rule used to calculate the probability that two or more events will all occur (the probability of their intersection). The AND rule.
Intuition: It's how you figure out the chances of getting "this" and "that" to happen. The core idea is that the chance of the second event happening depends on whether the first event actually occurred.
1. Dependent events:
P(A AND B) = P(A) * P(B|A)
Intuition: To get both outcomes, you first need to succeed at A, and then you need to succeed at B out of the restricted group where A has already happened.
2. Independent events:
P(B|A) = P(B), so P(A AND B) = P(A) * P(B)
Intuition: Since the events don't influence each other, you can simply multiply their individual probabilities together to find the chance of both happening.
3. Mutually exclusive events:
P(A AND B) = 0
Intuition: A and B cannot happen together. One prevents the other.
94
Laws of Probability: Addition Rule(s)
Definition: A fundamental probability rule used to calculate the probability that at least one of two or more events occurs. The OR rule.
Intuition: It's how you figure out the chances of getting either "this" or "that" to happen. If you simply add the individual probabilities, you might double-count the overlap, so the rule includes a way to correct for that.
General Rule:
P(A OR B) = P(A) + P(B) - P(A AND B)
Intuition: You add the probabilities of A and B, and then subtract the probability of their overlap (A and B both happening) because you counted that region twice (once in P(A) and once in P(B)).
Mutually Exclusive: P(A AND B) = 0, so
P(A OR B) = P(A) + P(B)
Intuition: Since there is no overlap to double-count, you simply add the individual probabilities.
95
Laws of Probability: Complement of an Event
Definition: A probability rule that states the probability of an event not occurring is equal to one (or 100%) minus the probability that the event does occur.

P(A') = 1 - P(A), and equivalently P(A) = 1 - P(A')

Intuition: Since an event must either happen or not happen, the probabilities of an event and its complement must sum to 1. This rule is especially useful for calculating the probability of complex events by finding the probability of the simpler opposite event and subtracting it from 1.

Example: The chance of drawing at least one ace from a deck is often easier to find by calculating the chance of drawing no aces, and then subtracting that result from 1.
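The card's own example, sketched in code: the chance of at least one ace in a five-card hand (the hand size is my assumption), found via the complement:

```python
from math import comb
from fractions import Fraction

# P(no aces in a 5-card hand) = C(48,5) / C(52,5): all 5 cards come from the 48 non-aces
p_no_aces = Fraction(comb(48, 5), comb(52, 5))

# Complement rule: P(at least one ace) = 1 - P(no aces)
p_at_least_one_ace = 1 - p_no_aces
print(float(p_at_least_one_ace))  # about 0.34
```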
96
Laws of Probability: Mutually Exclusive
No overlap in the Venn diagram.

Definition: Two or more events are mutually exclusive (or disjoint) if the occurrence of one event precludes (makes impossible) the occurrence of the other. They cannot both happen at the same time.

Intuition: The events have no overlap. If you're counting possibilities, the outcomes belonging to one event are completely separate from the outcomes belonging to the other.

Example: When rolling a single six-sided die, the event "rolling a 2" and the event "rolling an odd number" are mutually exclusive. You cannot do both on the same roll.

Key Rule (Addition Rule for Mutually Exclusive Events): The probability that at least one of two mutually exclusive events (A or B) occurs is simply the sum of their individual probabilities: P(A OR B) = P(A) + P(B)
97
Laws of Probability: Conditional Probability
P(A|B) = P(A AND B) / P(B)

Graphically (Venn diagram): conditioning on B shrinks the sample space down to just the B circle. P(A|B) is the fraction of B's area covered by the overlap region A AND B; dividing by P(B) rescales that overlap so the probabilities inside the new, smaller sample space add up to 1.
98
Laws of Probability: Bayes' Theorem
Derivation: start from the definition of conditional probability, P(A|B) = P(A AND B) / P(B), and substitute the multiplication rule, P(A AND B) = P(A) × P(B|A), to get:

P(A|B) = P(B|A) × P(A) / P(B)

Posterior Probability = P(A|B)
Prior Probability = P(A)
Likelihood = P(B|A)
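A minimal sketch with made-up numbers (the 1% prevalence, 90% sensitivity, and 5% false-positive rate are all hypothetical assumptions, not from the book), showing the posterior computed from the prior and the likelihood:

```python
# Hypothetical diagnostic test: A = "has condition", B = "tests positive"
p_A = 0.01             # prior P(A): prevalence
p_B_given_A = 0.90     # likelihood P(B|A): sensitivity
p_B_given_notA = 0.05  # false-positive rate P(B|not A)

# Total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' theorem: posterior P(A|B) = P(B|A) * P(A) / P(B)
posterior = p_B_given_A * p_A / p_B
print(round(posterior, 3))  # about 0.154: most positives are false positives
```

The low posterior despite a fairly accurate test is the standard Bayesian point: when the prior is small, the likelihood alone can be misleading.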
99
Laws of Probability: Independence
Definition: Two events, A and B, are independent if the occurrence or non-occurrence of one event does not change the probability of the other event occurring.

Intuition: The events have no influence on each other. If knowing whether A happened gives you no new information about the likelihood of B happening, they are independent.

Mathematical Condition: The conditional probability of B given A is the same as the unconditional probability of B: P(B|A) = P(B). Equivalently, P(A AND B) = P(A) × P(B).

Contrast: The opposite is dependence, where knowing the outcome of one event does change the probability of the other.
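A sketch checking the condition P(B|A) = P(B) by enumeration for two dice rolls (the events are my own illustrative choices):

```python
from itertools import product
from fractions import Fraction

rolls = list(product(range(1, 7), repeat=2))  # all 36 equally likely outcomes

A = [r for r in rolls if r[0] % 2 == 0]       # A: first die is even
B = [r for r in rolls if r[1] > 4]            # B: second die shows 5 or 6
A_and_B = [r for r in rolls if r in A and r in B]

p_B = Fraction(len(B), len(rolls))            # unconditional P(B) = 1/3
p_B_given_A = Fraction(len(A_and_B), len(A))  # conditional P(B|A)
print(p_B, p_B_given_A, p_B == p_B_given_A)   # 1/3 1/3 True
```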
100
Binomial Probability Mass Function
Definition: A formula used to calculate the exact probability of getting a specific number of "successes" (k) in a fixed number of independent trials (n), where there are only two possible outcomes for each trial (success or failure).

Intuition: It answers the question: "If I repeat something n times, what's the chance that exactly k of those times turn out to be a success?"

Formula: P(X = k) = C(n, k) × p^k × (1 - p)^(n - k), where p is the probability of success on a single trial and C(n, k) is the binomial coefficient ("n choose k").

Conditions for Using the Binomial Formula (BINS):
1. Binary: Each trial has only two possible outcomes (success or failure)
2. Independent: The outcome of one trial does not affect the outcome of any other trial
3. Number (Fixed): The number of trials, n, is fixed in advance
4. Same Probability: The probability of success, p, is the same for every trial

It is an application of the multiplication rule (the p^k × (1 - p)^(n - k) term is the chance of one particular sequence with k successes) combined with the addition rule (the C(n, k) term adds up all the mutually exclusive sequences containing exactly k successes).
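A minimal sketch of the formula (standard library only), checked against the chance of exactly 2 heads in 4 fair-coin flips (my illustrative case):

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exactly 2 heads in 4 fair flips: C(4,2) * 0.5^2 * 0.5^2 = 6/16
print(binomial_pmf(2, 4, 0.5))                          # 0.375
print(sum(binomial_pmf(k, 4, 0.5) for k in range(5)))   # the pmf sums to 1.0
```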
101
Combination
Definition: A selection of items from a larger set where the order of selection does not matter.

Intuition: It's about forming a group or a committee. For example, choosing a group of three people (A, B, and C) for a committee is the same combination whether you select them in the order A-B-C or C-B-A.

Formula: The number of combinations of selecting k items from a set of n items (read as "n choose k") is C(n, k) = n! / (k!(n - k)!).

Contrast with Permutation: In a permutation, the order does matter. A combination is never larger than the corresponding permutation because it eliminates all the repeated arrangements.
102
Permutation
Definition: An arrangement of items in a specific order. It is a selection of a certain number of items from a set where the sequence of selection matters.

Intuition: It's about forming a lineup or a ranking. For example, selecting three people (A, B, C) and assigning them roles (President, Vice-President, Secretary) is a permutation, because the order A-B-C is a different outcome than the order C-B-A.

Formula: The number of permutations of k items selected from a set of n items is P(n, k) = n! / (n - k)!.

Contrast with Combination: In a combination, the order does not matter. Because order matters for a permutation, the number of permutations will always be greater than or equal to the number of combinations for the same n and k.
103
Q: What is the difference between a combination and a permutation?
The essential difference between a combination and a permutation lies in whether the order of selection matters.

A combination is a selection of items where order does not matter (it's a group or a committee). Key Concept: Forming a group. Intuition: Choosing a hand of cards. The hand (King, Queen, Jack) is the same regardless of the order the cards were drawn.

A permutation is an arrangement of items where the order of selection does matter (it's an ordered list or a sequence). Key Concept: Forming an ordered arrangement. Intuition: Choosing a lineup or a ranking. Placing the King at position 1, Queen at position 2, and Jack at position 3 is different from placing the Queen at position 1, King at position 2, and Jack at position 3.
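A sketch relating the two counts with the standard library's math.comb and math.perm (choosing 3 items out of 5 is my illustrative case):

```python
from math import comb, perm, factorial

n, k = 5, 3
print(perm(n, k))  # 60 ordered arrangements: n! / (n-k)!
print(comb(n, k))  # 10 unordered groups:    n! / (k!(n-k)!)

# Each group of k items can be ordered in k! ways, so perm = comb * k!
assert perm(n, k) == comb(n, k) * factorial(k)
```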
104
Binomial Coefficient
This is the combination formula, written C(n, k) and read "n choose k."

Definition: The number that represents the number of ways to choose (or combine) k items from a set of n distinct items when the order of selection does not matter.

Intuition: It calculates how many unique groups (or combinations) of a certain size can be pulled out of a larger total group. It's the "choose" part of the process.
105
The Law of Averages
Definition: The Law of Averages is a popular but incorrect (fallacious) interpretation of the Law of Large Numbers. It suggests that if an event has occurred less frequently than its expected probability over a short period, it is "due" to occur more frequently in the near future to "balance things out."

Intuition (The Fallacy): People mistakenly believe that chance processes have a memory or a self-correcting mechanism. For example, believing that after five coin flips result in heads, the next flip is more likely to be tails. This is false.

The Statistical Truth (The Law of Large Numbers): The true law states that as the number of trials increases, the proportion of times an event occurs will get closer to its theoretical probability.

Crucial Insight: The law only governs the long run. Individual trials are independent. Past results do not influence future independent trials. A fair coin has a 50% chance of heads on every single flip, regardless of prior results.

Key Consequence: The absolute difference between the expected number of occurrences and the actual number of occurrences usually increases as the number of trials grows, even as the proportional difference shrinks.
106
Box Model
The Box Model is a conceptual tool used in probability and statistics to represent the structure of a chance process, such as drawing tickets, sampling, or repeated independent trials (like coin flips or dice rolls).

Purpose: The box model simplifies a complex real-world problem into a manageable statistical framework, allowing you to calculate the expected value and the standard error of the sum, average, or count of the draws.

To set one up, answer three questions:
1. What numbers go into the box?
2. How many of each kind?
3. How many draws total?
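A sketch of a box model for rolling a fair die, using the book's formulas for draws made with replacement: expected value of the sum = (number of draws) × (average of the box), and SE of the sum = √(number of draws) × (SD of the box). The 100-draw count is my illustrative choice:

```python
from math import sqrt

box = [1, 2, 3, 4, 5, 6]  # one ticket per face of a fair die
n_draws = 100             # draws made at random with replacement

box_avg = sum(box) / len(box)                                    # 3.5
box_sd = sqrt(sum((t - box_avg) ** 2 for t in box) / len(box))   # about 1.71

ev_sum = n_draws * box_avg       # expected value of the sum: 350
se_sum = sqrt(n_draws) * box_sd  # standard error of the sum: about 17.1
print(ev_sum, round(se_sum, 1))
```

A typical sum of 100 rolls should come out around 350, give or take 17 or so; that "give or take" number is the standard error.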
107
Law of Large Numbers (LLN)
Definition: A fundamental theorem in probability that states that as the number of independent trials or observations increases, the average (or proportion) of the observed outcomes will tend to get closer and closer to the expected value (or theoretical probability).

Intuition: The law formalizes the idea of the long run. It means that while the outcome of any single random event is unpredictable, the results of many repeated trials follow a predictable, stable pattern dictated by probability. The randomness of individual events tends to cancel out over time.

Key Insight: The LLN only concerns the proportion or average of events. It does not mean that the absolute number of heads and tails in a coin flip experiment will get closer together. In fact, the absolute difference usually grows, but it becomes insignificant when viewed as a proportion of the total trials.

Contrast with Law of Averages: The LLN is the correct statistical principle, while the Law of Averages is the fallacy that suggests a streak must be corrected in the short term. The LLN proves that for independent trials, past results do not influence future results.
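A closing sketch of the Key Insight (the seed and checkpoint sizes are my choices): in a typical simulated run, the count of heads drifts further from n/2 in absolute terms even as the proportion of heads closes in on 0.5:

```python
import random

random.seed(4)  # fixed seed so the run is reproducible

heads = 0
for n in range(1, 1_000_001):
    heads += random.random() < 0.5  # one fair-coin flip
    if n in (100, 10_000, 1_000_000):
        abs_diff = abs(heads - n / 2)     # chance error in absolute terms
        prop_diff = abs(heads / n - 0.5)  # the same error as a proportion
        print(f"n={n:>9}  |heads - n/2| = {abs_diff:>8.0f}  |prop - 0.5| = {prop_diff:.5f}")
```

The absolute column tends to grow with n while the proportion column shrinks toward 0, which is exactly the distinction between the Law of Large Numbers and the Law of Averages fallacy.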