Probability and Statistics Flashcards

Question

What type of analysis is performed when describing **one variable**?

Answer 1

Univariate analysis ## Footnote This focuses on a single variable's characteristics.

Answer 2

Bivariate analysis ## Footnote This examines the relationship between two variables.

Answer 3

Multivariate analysis ## Footnote This involves analyzing multiple variables simultaneously.

Answer 4

Variability ## Footnote These studies help summarize the dataset's characteristics.

Answer 5

All existing elements from a particular variable ## Footnote Collecting data from a population can be challenging due to its size.

Answer 6

A part of a population ## Footnote A sample ideally preserves the statistical characteristics of the population, which requires it to be large enough and representative.

Answer 7

The error decreases ## Footnote A larger sample size leads to more accurate inferences about the population.

Answer 8

Data points that differ from most of the data ## Footnote Outliers can indicate errors or new behaviors in the data.

Answer 9

* New natural behavior of the data * Error in the data collection process * Human error or bias * Equipment not being calibrated ## Footnote Understanding the context of data collection is crucial for interpreting outliers.

Answer 10

TRUE ## Footnote Recognizing the presence of outliers is important for accurate data interpretation.

Answer 11

To provide figures that summarize the data ## Footnote Central tendency includes measures such as mean, median, and mode.

Answer 12

* Variance * Standard Deviation * Skewness * Kurtosis * Percentiles and Quartiles * Range ## Footnote These measures help quantify how data is spread around the mean.

Answer 13

The spread of the data around the mean ## Footnote It indicates how far/close the data points are from the mean.

Answer 14

σ ## Footnote Standard deviation measures the dispersion of the data from the mean in either direction.

Answer 15

* Using the square root of variance * Using the method .std() ## Footnote np.sqrt() can be used for the square root calculation.
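A minimal sketch of the two routes named on this card, assuming the data lives in a pandas Series (the values are hypothetical). One detail worth knowing: pandas uses the sample formula (`ddof=1`) by default, while NumPy's `np.std` defaults to the population formula (`ddof=0`).

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Route 1: square root of the variance (pandas .var() uses ddof=1).
std_from_variance = np.sqrt(values.var())

# Route 2: the .std() method (also ddof=1 by default in pandas).
std_from_method = values.std()

# NumPy defaults to the population formula (ddof=0), which is smaller.
population_std = np.std(values.to_numpy())

print(std_from_variance, std_from_method, population_std)
```

Both routes agree with each other; the NumPy default differs only because of the divisor.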

Answer 16

The asymmetry of the data ## Footnote A distribution is symmetric when it looks the same to the left and right of the center point.

Answer 17

The tail on the right side is longer ## Footnote This indicates that there are more extreme values on the higher end.

Answer 18

The tail of the left side of the distribution is longer ## Footnote This suggests more extreme values on the lower end.

Answer 19

Use .skew() ## Footnote This method provides the skewness value for the data.
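As a rough illustration of positive and negative skew with `.skew()`, assuming pandas and made-up data: a long right tail should give a positive value, and mirroring the data should flip the sign.

```python
import pandas as pd

# Hypothetical data: most values are small, a few are large (long right tail).
right_skewed = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 20, 35])

# Mirror image of the same data, which has a long left tail instead.
left_skewed = -right_skewed

right_value = right_skewed.skew()  # positive for a longer right tail
left_value = left_skewed.skew()    # negative for a longer left tail

print(right_value, left_value)
```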

Answer 20

Studying the tails of the distribution ## Footnote It measures the presence of outliers in the distribution.

Answer 21

Significant tails or many outliers ## Footnote This suggests a distribution with extreme values.

Answer 22

The value below which a given percentage of the observations fall ## Footnote For example, the 50th percentile is the value below which 50% of the observations are found.

Answer 23

* First quartile (Q1 or 25th percentile) * Second quartile (Q2 or 50th percentile or median) * Third quartile (Q3 or 75th percentile) ## Footnote Quartiles divide the ordered data into four equal parts.

Answer 24

The difference between the third and first quartiles (Q3 - Q1) ## Footnote It describes the range that contains the middle 50% of the data.

Answer 25

Use .quantile() ## Footnote This method provides the quartile values for the data.
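A short sketch of quartiles and the interquartile range with `.quantile()`, assuming pandas and a hypothetical Series. pandas interpolates linearly between data points by default, so other tools may report slightly different quartiles on small samples.

```python
import pandas as pd

# Hypothetical ordered values, chosen so the quartiles are easy to check by hand.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Passing a list of probabilities returns one value per requested quantile.
q1, q2, q3 = values.quantile([0.25, 0.50, 0.75])

# Interquartile range: the spread of the middle 50% of the data.
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 3.0 5.0 7.0 4.0
```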

Answer 26

The difference between the maximum and minimum values ## Footnote It provides a measure of the spread of the data.

Answer 27

Use .min() ## Footnote This method retrieves the smallest value in the dataset.
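The range can be sketched by pairing `.min()` with its counterpart `.max()`, again assuming a pandas Series with made-up values.

```python
import pandas as pd

# Hypothetical values for illustration.
values = pd.Series([12, 7, 3, 18, 9])

smallest = values.min()
largest = values.max()

# Range: the difference between the maximum and minimum values.
data_range = largest - smallest

print(smallest, largest, data_range)  # 3 18 15
```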

Answer 28

To measure a test statistic that explains the relationship between variables ## Footnote Statistical tests are essential for analyzing data and drawing conclusions.

Answer 29

* Normality tests * Correlation tests * Parametric tests * Non-parametric tests ## Footnote These tests are grouped based on their objectives.

Answer 30

To evaluate if data is normally distributed ## Footnote It is a type of normality test.

Answer 31

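* Pearson's * Spearman's * Chi-Squared Test ## Footnote These tests evaluate how variables relate. Pearson's and Spearman's measure correlation between numeric or ranked variables; the Chi-Squared test assesses association between categorical variables.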

Answer 32

To evaluate if data from distinct groups are similar or different ## Footnote Examples include Student's t-test and ANOVA.

Answer 33

* Mann-Whitney U Test * Wilcoxon Test ## Footnote Non-parametric tests do not assume a specific data distribution.

Answer 34

If data from distinct groups are similar or different ## Footnote It is a non-parametric alternative to ANOVA.

Answer 35

* Determine differences or similarities between groups * Evaluate if a predictor variable has statistical importance to a target variable ## Footnote Statistical tests provide insights into data relationships.

Answer 36

relationship ## Footnote Understanding the relationship is crucial for interpreting statistical tests.

Answer 37

Significant difference between expected and observed frequencies in categorical variables ## Footnote It assesses how well observed data fits with expected data.

Answer 38

No difference in frequency or proportion of occurrences in each category ## Footnote It serves as a default position that indicates no effect or no difference.

Answer 39

There is a difference in frequency or proportion of occurrences in each category ## Footnote It typically represents the research question being tested.

Answer 40

Forming opinions or conclusions from collected data ## Footnote It involves comparing observed data to expected data.

Answer 41

Probability of rejecting the null hypothesis when it is true ## Footnote Commonly set at 5%, meaning a 5 in 100 chance of a false positive (Type I error).

Answer 42

A number describing how strongly the observed relationship between variables differs from what the null hypothesis predicts ## Footnote It varies in calculation depending on the type of statistical test.

Answer 43

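Probability of observing results at least as extreme as those in the data, assuming the null hypothesis is true ## Footnote A smaller p-value indicates stronger evidence against the null hypothesis. It is not the probability that the null hypothesis itself is true.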

Answer 44

Enough evidence to reject the null hypothesis ## Footnote This indicates that the observed effect is statistically significant.
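The Chi-Squared cards above (hypotheses, significance level, p-value, decision) can be tied together in one small sketch. This assumes SciPy's `scipy.stats.chisquare` and hypothetical category counts; with no expected frequencies supplied, the function tests against equal counts in every category.

```python
from scipy.stats import chisquare

# Hypothetical observed counts for three categories (100 observations in total).
observed = [50, 30, 20]

# Null hypothesis: every category is equally frequent.
result = chisquare(observed)

alpha = 0.05  # conventional level of significance
reject_null = result.pvalue < alpha

print(result.statistic, result.pvalue, reject_null)
```

Here the p-value falls below `alpha`, so the null hypothesis of equal frequencies would be rejected for these made-up counts.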

Answer 45

Whether a given data set is normally distributed ## Footnote The null hypothesis states that the population is normally distributed.

Answer 46

Reject the null hypothesis ## Footnote This suggests that the data is not normally distributed.
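One way to run a normality test in practice, as a sketch only: the cards do not say which test or library is intended, so the use of the Shapiro-Wilk test via `scipy.stats.shapiro` is an assumption, and the data is simulated.

```python
import numpy as np
from scipy.stats import shapiro

# Simulated, clearly non-normal data (exponential), with a fixed seed.
rng = np.random.default_rng(0)
skewed_sample = rng.exponential(scale=1.0, size=200)

# Null hypothesis: the sample comes from a normally distributed population.
statistic, p_value = shapiro(skewed_sample)

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: the data does not look normal.")
else:
    print("Fail to reject the null hypothesis.")
```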

Answer 47

* Parametric tests * Nonparametric tests ## Footnote The choice between these tests depends on the normality of the data.

Answer 48

When the data is **normally distributed** ## Footnote Parametric tests assume that the underlying data follows a normal distribution.

Answer 49

When the data is **not normally distributed** ## Footnote Nonparametric tests do not assume a specific distribution of the data.

Answer 50

To test the difference between **two sample means** ## Footnote It tests if the difference in the means is 0.

Answer 51

William Gosset of **Guinness's Brewery** ## Footnote The t-test is also known as Student's t-test.

Answer 52

There is **no significant difference** between the sample means ## Footnote The alternative hypothesis states that there is a significant difference.

Answer 53

To test the difference between **two sample parameter values** ## Footnote The samples should be dependent or paired.

Answer 54

The samples should be **dependent (paired)** ## Footnote An example is testing the same group before and after an intervention.
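A hedged sketch of both t-test variants, assuming SciPy (`ttest_ind` for independent samples, `ttest_rel` for paired samples) and simulated data. The group means and sizes are invented for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_rel

rng = np.random.default_rng(1)

# Independent samples: two unrelated groups with different simulated means.
group_a = rng.normal(loc=50.0, scale=5.0, size=40)
group_b = rng.normal(loc=58.0, scale=5.0, size=40)
independent = ttest_ind(group_a, group_b)

# Paired samples: the same simulated subjects measured before and after.
before = rng.normal(loc=70.0, scale=8.0, size=30)
after = before + rng.normal(loc=3.0, scale=2.0, size=30)
paired = ttest_rel(before, after)

print(independent.pvalue, paired.pvalue)
```

In both cases the null hypothesis is that the mean difference is 0; a p-value below the chosen significance level is evidence against it.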

Answer 55

Analysis of **Variance** ## Footnote ANOVA compares the means of three or more groups by analyzing their variance.

Answer 56

The data should be **normally distributed** ## Footnote ANOVA tests assume normality in the data distribution.

Answer 57

Pain threshold levels across different people with varying **hair colors** ## Footnote This dataset can show mean variations among groups.
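A one-way ANOVA along the lines of the hair-color example might look like the sketch below. It assumes SciPy's `f_oneway`; the group names and scores are hypothetical and are not taken from any real dataset.

```python
from scipy.stats import f_oneway

# Hypothetical pain-threshold scores for three hair-color groups.
blond = [62, 60, 71, 55, 58]
brunette = [42, 50, 41, 37, 45]
red = [52, 49, 55, 47, 51]

# Null hypothesis: all group means are equal.
result = f_oneway(blond, brunette, red)

alpha = 0.05
print(result.statistic, result.pvalue, result.pvalue < alpha)
```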

Answer 58

To determine differences between two groups where at least one group is not normally distributed ## Footnote It is a nonparametric test and requires independent (unpaired) samples.

Answer 59

FALSE ## Footnote The Mann-Whitney U Test is a nonparametric test.

Answer 60

* Independent samples * Unpaired samples ## Footnote This test is used when at least one group is not normally distributed.

Answer 61

To analyze paired samples when at least one sample is not normally distributed ## Footnote It is a non-parametric test and requires dependent (paired) samples.

Answer 62

* Dependent samples * Paired samples ## Footnote This test is suitable for matched pairs, such as before and after measurements.

Answer 63

When examining the difference between scores on a test before and after a training intervention ## Footnote This involves the same group being tested twice.

Answer 64

To determine differences between three or more groups when at least one distribution is not normally distributed ## Footnote It is a nonparametric alternative to a one-way ANOVA.

Answer 65

FALSE ## Footnote The Kruskal-Wallis test is a nonparametric test.

Answer 66

It is used for three or more groups ## Footnote It assesses whether there are differences among groups when at least one distribution is not normally distributed.
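The three nonparametric tests described above can be compared side by side. This sketch assumes SciPy (`mannwhitneyu`, `wilcoxon`, `kruskal`) and uses small made-up samples that include a few extreme values.

```python
from scipy.stats import kruskal, mannwhitneyu, wilcoxon

# Hypothetical independent groups, each with one extreme value.
group_a = [1.1, 1.3, 1.8, 2.0, 2.4, 2.9, 3.1, 9.5]
group_b = [4.2, 4.8, 5.1, 5.9, 6.3, 7.7, 8.4, 21.0]
group_c = [10.2, 11.5, 12.1, 13.3, 14.8, 15.0, 16.6, 30.0]

# Mann-Whitney U: two independent (unpaired) samples.
u_result = mannwhitneyu(group_a, group_b, alternative="two-sided")

# Wilcoxon signed-rank: two dependent (paired) samples, e.g. before and after.
before = [60, 55, 70, 65, 58, 62, 68, 59]
after = [66, 58, 79, 69, 65, 70, 69, 64]
w_result = wilcoxon(before, after)

# Kruskal-Wallis: three or more independent groups.
h_result = kruskal(group_a, group_b, group_c)

print(u_result.pvalue, w_result.pvalue, h_result.pvalue)
```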

Answer 67

A method of comparing two versions of a product to determine which one performs better ## Footnote Widely used in web development, marketing, and product design.

Answer 68

* Version A * Version B ## Footnote These versions are shown to different segments of users simultaneously.

Answer 69

To make data-driven decisions that can lead to improvements in user experience and business outcomes ## Footnote It helps identify which version of a product performs better based on specific metrics.

Answer 70

Testing a webpage with: * Version A: White background * Version B: Light blue background ## Footnote The objective is to determine which background color leads to a higher conversion rate.

Answer 71

Identify the objective clearly ## Footnote The objective should be specific, measurable, attainable, relevant, and time-bound (SMART).

Answer 72

Increase the conversion rate on product pages ## Footnote For example, testing whether adding customer reviews increases sales.

Answer 73

Improve the open rate of a promotional email ## Footnote For example, testing two different subject lines to see which leads to a higher open rate.

Answer 74

Enhance user engagement ## Footnote For example, testing whether adding a daily motivational quote feature increases average time spent on the app.

Answer 75

Increase sales of a product in the store ## Footnote For example, testing whether showing an ad at the store's entrance increases sales.

Answer 76

Visitors are randomly assigned to different versions to ensure unbiased results ## Footnote This helps in accurately measuring the performance of each version.

Answer 77

To ensure a large enough sample size to detect a meaningful difference ## Footnote For example, having 10,000 visitors in each version can provide reliable data.

Answer 78

FALSE ## Footnote A/B testing can be applied to various products, including apps and marketing campaigns.

Answer 79

41 different shades of blue for its link colors ## Footnote The objective was to identify the most effective color for increasing click-through rates.

Answer 80

To increase user interaction with ads ## Footnote Users were randomly assigned to see one of two ad formats.

Answer 81

Different layouts of the product detail page ## Footnote The objective was to see which layout led to higher conversion rates.

Answer 82

Create hypotheses ## Footnote A hypothesis is a statement that can be tested and will either be supported or rejected by the test results.

Answer 83

A statement that can be tested ## Footnote Each hypothesis should be specific and measurable.

Answer 84

* 2-sample cases * 1-sample cases ## Footnote A two-sample hypothesis test compares the performance metric of two different versions.

Answer 85

No significant difference between conversion rates ## Footnote Any observed difference is due to random chance.

Answer 86

There is a significant difference between conversion rates ## Footnote The observed difference is not due to random chance.

Answer 87

No significant difference between the sample statistic and the known industry standard ## Footnote Any observed difference is due to random chance.

Answer 88

The smallest improvement you want to detect ## Footnote For example, a 5 percent increase in the conversion rate.

Answer 89

0.05 ## Footnote This means you are willing to accept a 5 percent chance of a false positive.

Answer 90

The probability of detecting a true effect ## Footnote Typically set at 0.8, meaning an 80% chance.
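The minimum detectable effect, significance level, and power together determine how many users each version needs. The sketch below uses a common normal-approximation formula for comparing two proportions; the function name and the choice to express the effect as an absolute lift are assumptions made for illustration, and dedicated power-analysis tools may give slightly different numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline_rate, minimum_detectable_effect,
                          alpha=0.05, power=0.8):
    """Approximate users per group for a two-sided test of two proportions."""
    p1 = baseline_rate
    p2 = baseline_rate + minimum_detectable_effect  # absolute lift (assumption)

    # Critical values of the standard normal distribution.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)

    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance_sum / (p2 - p1) ** 2)


# Hypothetical scenario: 10% baseline conversion, detect a 2-point lift.
n_large_effect = sample_size_per_group(0.10, 0.02)
# A smaller effect needs many more users per group.
n_small_effect = sample_size_per_group(0.10, 0.005)

print(n_large_effect, n_small_effect)
```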

Answer 91

To randomly assign users to control or treatment groups ## Footnote This helps eliminate biases and confounding variables.

Answer 92

Equal group sizes ## Footnote This ensures that the test results are more reliable.
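Random assignment with equal group sizes can be sketched with the standard library alone. The helper name `assign_groups` and the integer user IDs are hypothetical; production systems often assign users by hashing an ID instead, which this sketch does not cover.

```python
import random


def assign_groups(user_ids, seed=None):
    """Shuffle users and split them into two groups of (near) equal size."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    # First half is the control group, second half is the treatment group.
    return shuffled[:midpoint], shuffled[midpoint:]


# Hypothetical pool of 1,000 user IDs; the seed only makes the split repeatable.
control, treatment = assign_groups(range(1000), seed=42)

print(len(control), len(treatment))  # 500 500
```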

Answer 93

Setting up and launching the A/B test correctly ## Footnote Any errors can compromise the integrity of the test results.

Answer 94

* Conversion rates * Click-through rates * Engagement metrics * Retention rates ## Footnote These metrics measure the success of the test.

Answer 95

Whether differences between groups are statistically significant ## Footnote It evaluates if observed differences are due to random chance.

Answer 96

p-value less than 0.05 ## Footnote Indicates that the results are statistically significant.

Answer 97

A range of plausible values for the true effect of the treatment ## Footnote A typical choice is a 95% confidence interval.
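For conversion rates, the significance check and the confidence interval can be sketched with a two-proportion z-test. This is one common approach, not necessarily the one the deck has in mind; the function name and the visitor and conversion counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_test(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Return the two-sided p-value and a confidence interval for rate_b - rate_a."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    diff = rate_b - rate_a

    # Pooled standard error is used for the test under the null hypothesis.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error is used for the interval around the difference.
    se_unpooled = math.sqrt(rate_a * (1 - rate_a) / n_a
                            + rate_b * (1 - rate_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * se_unpooled
    return p_value, (diff - margin, diff + margin)


# Hypothetical results: 10,000 visitors per version.
p_value, interval = two_proportion_test(500, 10_000, 570, 10_000)

print(p_value, interval)
```

With these made-up counts the p-value is below 0.05 and the 95% interval for the lift excludes 0, so the two signals agree.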

Answer 98

* Magnitude of the effect * Business impact * Cost and feasibility * Long-term implications ## Footnote This ensures informed decisions that drive business outcomes.

Answer 99

FALSE ## Footnote Always assess practical implications and real-world impact.

Answer 100

Ignoring external factors ## Footnote These can influence test results, such as seasonality or market trends.

Answer 101

Unreliable results ## Footnote Ensure the sample size is large enough to detect a meaningful effect.

Answer 102

Selectively reporting favourable results ## Footnote Report all results transparently and conduct thorough analysis.

Answer 103

Long-term impact of changes ## Footnote Conduct follow-up tests if necessary.

Answer 104

increase completed forms ## Footnote This can be evaluated by A/B testing the two forms.

Answer 105

Regression model ## Footnote This model predicts the best time for maintenance based on data collected.

(129 cards)