Analysing Data Flashcards

(101 cards)

1
Q

Alpha Value

A

the threshold probability for rejecting the null hypothesis, equal to the accepted risk of a Type 1 error; typically set at 5% (0.05)

2
Q

Alternative Hypothesis

A

states there is a relationship between two variables being studied

3
Q

ANOVA

A

Analysis of Variance

tests whether there are statistically significant differences between the means of 3 or more samples

4
Q

One-Way ANOVA

A

tests the effect of a single independent variable (factor) across independent groups

5
Q

Factorial ANOVA

A

used when there are two or more independent variables (factors)

6
Q

ANCOVA

A

an ANOVA that includes a control variable

7
Q

MANOVA

A

an ANOVA with multiple dependent variables

8
Q

A Priori Test

A

a planned comparison carried out before the ANOVA; theory-driven rather than data-driven, in contrast to a post-hoc test

9
Q

Beta

A

how much one variable changes as another changes, demonstrated by the slope of the line of best fit

10
Q

Bimodality

A

distribution with two peaks, generally indicates two groups within a dataset

11
Q

Bin Size

A

the width of the categories (bins) into which continuous data are grouped in a histogram

12
Q

Bivariate Data

A

data involving two variables, eg how tall and how heavy someone is

13
Q

Bonferroni Correction

A

form of a priori test: the alpha level for rejecting the null hypothesis is divided by the number of comparisons being made, and any p-values from the t-tests are then evaluated against the new alpha cut-off; alternatively, you can adjust the p-values themselves

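The adjustment described above can be sketched in a few lines of Python; the alpha level, number of comparisons and p-values below are all hypothetical:

```python
# Bonferroni correction: divide the alpha level by the number of comparisons.
alpha = 0.05
n_comparisons = 3
adjusted_alpha = alpha / n_comparisons  # 0.05 / 3, roughly 0.0167

# Hypothetical p-values from three pairwise t-tests.
p_values = [0.004, 0.030, 0.200]

# Only p-values below the adjusted cut-off count as significant.
significant = [p < adjusted_alpha for p in p_values]
```

Note that 0.030 would pass the unadjusted 0.05 cut-off but fails the corrected one.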
14
Q

Categorical/Nominal Data

A

where data items exist as one of a number of unrelated options, eg participants select their answer from options red, yellow and green; these cannot have a mean or median, but can have a mode

15
Q

Central Limit Theorem

A

states that under appropriate conditions, the distribution of a normalised version of the sample mean converges to a standard normal distribution; this holds even if the original variables themselves aren’t normally distributed

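A quick simulation illustrates the theorem; the uniform (decidedly non-normal) population here is made up, and the seed just makes the run reproducible:

```python
import random
import statistics

random.seed(0)  # reproducible illustration

# A decidedly non-normal (uniform) population of hypothetical values.
population = [random.uniform(0, 10) for _ in range(10_000)]

# Means of 1,000 samples of size 30 each.
sample_means = [statistics.mean(random.sample(population, 30))
                for _ in range(1_000)]

# The sample means cluster near the population mean and are far less
# dispersed than the raw population values.
population_sd = statistics.pstdev(population)
means_sd = statistics.pstdev(sample_means)
```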
16
Q

Central Tendency

A

ways of measuring the centre, or typical value, of a dataset

17
Q

Mean

A

add up all data items and divide by the number of data items

18
Q

Median

A

middle number in the data set

19
Q

Mode

A

the value that occurs most often

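The three measures of central tendency above, computed with Python's standard library on a small hypothetical dataset:

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # hypothetical data

mean = statistics.mean(scores)      # (2 + 3 + 3 + 5 + 7 + 10) / 6
median = statistics.median(scores)  # average of the two middle values, 3 and 5
mode = statistics.mode(scores)      # 3 appears most often
```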
20
Q

Confidence Interval

A

a range of values likely to include the population value with a given degree of confidence; expresses how accurate an estimate of a population parameter is, normally stated as a percentage confidence that the population mean lies between an upper and lower bound

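A minimal sketch of a 95% interval for a mean, using the normal approximation (z = 1.96) on hypothetical measurements; with a sample this small a t-critical value would give a slightly wider interval:

```python
import math
import statistics

data = [4.1, 5.0, 5.5, 4.8, 5.2, 4.6, 5.1, 4.9]  # hypothetical measurements

n = len(data)
mean = statistics.mean(data)
standard_error = statistics.stdev(data) / math.sqrt(n)

# 95% of the sampling distribution lies within 1.96 standard errors.
lower = mean - 1.96 * standard_error
upper = mean + 1.96 * standard_error
```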
21
Q

Continuous/Interval Data

A

where potential data items go in a specific order with a fixed gap in between; mean, median and mode are all possible

22
Q

Correlations

A

quantify relationships by showing how much of a relationship there is between two variables

23
Q

Correlation Coefficient

A

a number between −1 and +1 quantifying the strength and direction of a relationship, giving an indication of effect size

24
Q

Curvilinear Relationship

A

as one variable changes, so does the other but only up to a certain point, after which there’s either no relationship or the direction of the relationship changes

25
Data Wrangling/Data Tidying
involves getting data into a useful format so you can visualise and model the data, for example, removing empty data cells
26
Degrees of Freedom
the number of independent pieces of information used to calculate a statistic
27
Discrete Data
observations that only ever exist at limited values, often counts
28
Effect Size
strength of relationship between two continuous variables, eg the predictor and outcome variables in linear regression
29
Exponential Relationship
as one variable increases, the rate at which the other variable changes also increases
30
External Reliability
relates to the consistency of the test over time
31
Linear Relationship
as one variable increases, so does the other and the rate of change remains constant
32
F-Statistic
produced by ANOVAs, a ratio of 2 variances, which can be combined with the degrees of freedom to calculate the p-value to determine if you can reject the null hypothesis. With a low F-statistic, the group means cluster tightly relative to the variability within groups, whereas with a high F-statistic, the group means spread out more than the within-group variability would predict
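The ratio can be worked out by hand; a sketch for a one-way ANOVA with three small hypothetical groups:

```python
import statistics

# Three hypothetical independent groups.
groups = [[4, 5, 6, 5], [7, 8, 9, 8], [4, 6, 5, 5]]

k = len(groups)                  # number of groups
n = sum(len(g) for g in groups)  # total observations
grand_mean = statistics.mean([x for g in groups for x in g])

# Between-group variability: weighted spread of the group means.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                 for g in groups)
# Within-group variability: spread of values inside each group.
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

# Mean squares divide each sum of squares by its degrees of freedom.
f_statistic = (ss_between / (k - 1)) / (ss_within / (n - k))
```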
33
Factors
independent variable in an ANOVA
34
Levels
categories in each factor
35
Ordinal Data
where data items are ranked; these cannot have a mean, but can have a median and a mode
36
Population
all the hypothetical individuals we want to understand something about; an abstract concept covering all the infinite individuals who do, have or could exist that we want to understand something about
37
Histogram
useful for visualising continuous numerical data, which also supports performing maths on the data, eg averages
38
Hypothesis Testing
the process of making inferences based on data from samples in a systematic way to test ideas about the population
39
Identical Distribution Data
observations that come from the same distribution, or family of distributions
40
Independent Data
where one observation is unrelated to the next, eg when assessing the spread of Covid, you should only sample one person per household
41
Independent Samples T-Test
statistical test used to show whether two means collected from two independent groups differ significantly; the independent variable must have 2 independent groups and the dependent variable must be measured using continuous and normally distributed data
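The test statistic can be computed by hand with a pooled variance; the two equal-sized groups below are hypothetical:

```python
import math
import statistics

# Two hypothetical independent groups of equal size.
group_a = [5, 6, 7, 6, 5, 7]
group_b = [8, 9, 8, 10, 9, 8]

n_a, n_b = len(group_a), len(group_b)
mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)

# Pooled variance combines the two groups' sample variances.
pooled_var = ((n_a - 1) * statistics.variance(group_a)
              + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)

# t-value: the difference in means divided by its standard error.
t_value = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
degrees_of_freedom = n_a + n_b - 2
```

The t-value would then be compared against a t-distribution with these degrees of freedom to get a p-value.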
42
Intercept
the value of one variable if the other was zero, demonstrated by where the line of best fit crosses the y axis
43
Internal Reliability
whether the test is consistent within itself
44
Cronbach's Alpha
estimates whether different questions measure the same idea, eg if people who rate 1 item highly also rate another item highly, Cronbach's alpha is high
45
Split-Half Method
estimates internal consistency by splitting the test into two halves and correlating participants' scores on the two halves
46
Kurtosis
an indicator of the number of extreme values in the data
47
Leptokurtic Distribution
characterised by a narrower centre and longer tails, indicating more outliers
48
Mesokurtic Distribution
where distributions have a kurtosis of exactly three and conform to the classic bell curve shape of normal distribution
49
Platykurtic Distribution
platykurtic distributions have a kurtosis of less than three and so have a flatter profile with short tails and few outliers
50
Linear Regression
used to test how well one variable predicts another; it tells us the strength of the relationship between two continuous variables, how much one variable changes as another changes, the value of one variable if the other was zero, and the statistical significance, and it can predict a person's score on a variable; commonly used when making predictions or looking to understand how much something changes as a function of something else
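The least-squares slope (beta) and intercept can be computed directly; the predictor and outcome values below are hypothetical and chosen to be perfectly linear so the fit is easy to check:

```python
import statistics

# Hypothetical predictor (x) and outcome (y); here y = 2x + 1 exactly.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

mean_x, mean_y = statistics.mean(x), statistics.mean(y)

# Least-squares slope (beta): co-variation of x and y over variation of x.
beta = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x))
intercept = mean_y - beta * mean_x  # predicted y when x is zero

predictions = [intercept + beta * a for a in x]
```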
51
Line of Best Fit
models the relationship between two continuous variables
52
Mean Square
variances that account for degrees of freedom
53
Negative Relationship
as one variable increases, the other variable decreases
54
Non-Parametric Statistical Model
statistical models that make no assumptions about the underlying distribution of the data; examples include the Chi-squared test, Mann-Whitney test and Spearman's correlation
55
Positive Relationship
as one variable increases, the other variable increases
56
Paired Samples
where data is collected from the same participant twice in order to look for a change
57
Paired Samples/Repeated Measures T-Test
used to establish whether the mean difference between two sets of observations is zero; data is collected from the same participant for both sets of observations, resulting in paired observations
58
Parametric Statistical Model
make assumptions about the input they receive, so the reliability of the output depends on how well the input corresponds to those assumptions; a family of probability distributions with a finite number of parameters; parametric models include t-tests, linear regression and ANOVA
59
Pearson's Correlation
allows us to estimate how much of the variance in one variable can be explained by another; measures the linear correlation between two variables
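Pearson's r can be computed from scratch; the paired observations here (eg hours studied against test score) are hypothetical:

```python
import math
import statistics

# Hypothetical paired observations, eg hours studied (x) and test score (y).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

mean_x, mean_y = statistics.mean(x), statistics.mean(y)

# Pearson's r: co-variation of x and y scaled by their separate variation.
covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
r = covariation / math.sqrt(sum((a - mean_x) ** 2 for a in x)
                            * sum((b - mean_y) ** 2 for b in y))

# r squared: proportion of variance in y explained by x.
r_squared = r ** 2
```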
60
Post-Hoc Test
carried out after a significant ANOVA to tell you where the difference the ANOVA found lies; data-driven
61
Psychological Construct
abstract things we can't directly observe in people eg happiness, may be your dependent variable
62
Psychological Test
instrument developed to measure a psychological construct
63
PsycTESTS
database of psychological measures, scales, surveys and other research instruments
64
Radloff (1977)
came up with the CES-D, a self-report depression scale with 20 items rated on a 4-point scale
65
Reliability
relates to consistency of the test
66
Robust Statistics
statistics unaffected by outliers
67
Rube-Goldberg Machine
a machine intentionally designed to perform a simple task in an overly complicated way
68
Sample
several randomly chosen individuals from the population who we can test or study and assume represent the population
69
Sampling Distribution
the distribution of a statistic (such as the mean) across repeated samples; its spread is smaller than the spread within the population, so the means of the samples will be less dispersed than individual values are in the population
70
Sampling Distribution of a Statistic
a probability distribution based on a large number of samples from a given population and this will have a mean and standard deviation of its own; the more samples used, the less variable the means of the sample groups will be and, as a result, the standard error decreases
71
Sampling Variation
the variability from one sample to another
72
Skewness
asymmetry in a data distribution, often caused by outliers in one tail
73
Significance
the likelihood of finding the observed relationship in the sample, if there was no relationship in the population
74
Significance Level
the threshold probability for rejecting the null hypothesis, equal to the accepted risk of a Type 1 error; typically 5%
75
Significance Values
shows the likelihood of observing an effect of this size if the null hypothesis were true
76
Negative Skew
where the longer tail slopes to the left because the mode is higher than the median
77
Outcome Variable
continuous variable we are looking to predict, put on the Y axis
78
Positive Skew
where the longer tail slopes to the right because the mode is lower than the median
79
Predictor Variable
continuous variable we think will predict the variance in the outcome variable, put on the X axis
80
Statistical Inference
the process of creating a statistical model, inferring something about a population based on a sample taken
81
Statistical Model
helps infer information about the population by taking information about the sample and generalising it to the population
82
Standard Error
the standard deviation of the sampling distribution of a statistic, measuring the accuracy with which a sample represents a population
83
Strength
how much of a relationship there is
84
Test Statistic
a value that is calculated when conducting a statistical test of a hypothesis, showing how closely your observed sample data matches the distribution expected under the null hypothesis of that statistical test
85
Tukey Boxplot
useful for visualising continuous data and descriptive statistics, eg the minimum and maximum bounds, median, Q1 and Q3; also useful for visualising outliers
86
Tukey's Honest Significant Difference (HSD) Test
form of post-hoc test; makes adjustments based on the number of comparisons being conducted by taking the absolute value of the difference between pairs of means and dividing it by the standard error of the mean
87
T-Value
result of a t-test; the greater the t-value, the larger the effect
88
Type 1 Error
when you reject the null hypothesis but it's true (false positive)
89
Type 2 Error
when you accept the null hypothesis but it's false (false negative)
90
Variability
tells us how spread out, or far from the mean, the data tends to be
91
Interquartile Range
difference between the 25% mark and the 75% mark of the data set, is not impacted by outliers
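A sketch contrasting the IQR with the range on a hypothetical dataset containing one outlier; `statistics.quantiles` with `n=4` returns the three quartile cut points:

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13, 15, 100]  # 100 is an outlier

# quantiles(n=4) returns the cut points Q1, median, Q3.
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# The range is dragged out by the outlier; the IQR is not.
data_range = max(data) - min(data)
```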
92
Range
the difference between the largest and smallest data items, alerts you to errors in the data
93
Standard Deviation
a measure of the average distance of each point from the mean in the original units, telling us how concentrated around the mean data points are
94
Univariate Data
data involving one variable, eg how tall someone is
95
Validity
relates to whether the test measures what it is supposed to measure
96
Concurrent Validity
do scores correlate with other measures taken at the same time
97
Construct Validity
does the test measure the construct it was designed to measure
98
Face Validity
do the items on the test appear to be measuring what they're measuring
99
Predictive Validity
can test scores be used to predict events
100
Test-Retest Reliability
assesses the correlation between scores taken at 2 points in time from the same sample
101
Variance
how far the numbers are spread out from the mean
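Variance and standard deviation side by side on a small hypothetical dataset, using the population versions (dividing by n):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5

# Population variance: mean of the squared distances from the mean.
variance = statistics.pvariance(data)
# Standard deviation: square root of the variance, back in the original units.
sd = statistics.pstdev(data)
```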