Analysing Data Flashcards

(101 cards)

1
Q

Alpha Value

A

the threshold probability for rejecting the null hypothesis, equal to the accepted risk of a Type 1 error; typically set at 5% (0.05)

2
Q

Alternative Hypothesis

A

states there is a relationship between two variables being studied

3
Q

ANOVA

A

Analysis of Variance

tests whether there are statistically significant differences between the means of 3 or more samples

4
Q

One-Way ANOVA

A

tests the effect of a single independent variable (factor) across independent groups

5
Q

Factorial ANOVA

A

used when there are two or more independent variables (factors)

6
Q

ANCOVA

A

an ANOVA that includes a control variable

7
Q

MANOVA

A

an ANOVA with multiple dependent variables

8
Q

A Priori Test

A

a planned comparison carried out before the ANOVA; theory-driven rather than data-driven, in contrast to a post-hoc test

9
Q

Beta

A

how much one variable changes as another changes, demonstrated by the slope of the line of best fit

10
Q

Bimodality

A

distribution with two peaks, generally indicates two groups within a dataset

11
Q

Bin Size

A

the width of the categories (bins) into which continuous data are grouped in a histogram

12
Q

Bivariate Data

A

data involving two variables, eg how tall and how heavy someone is

13
Q

Bonferroni Correction

A

form of a priori test: the alpha level for rejecting the null hypothesis is divided by the number of comparisons being made, and any p-values from the t-tests are then evaluated against the new alpha cut-off; alternatively, you can adjust the p-values themselves

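The adjustment described above can be sketched in a few lines of Python; the alpha level, number of comparisons and p-values below are all hypothetical:

```python
# Bonferroni correction: divide the alpha level by the number of comparisons.
alpha = 0.05
n_comparisons = 3
adjusted_alpha = alpha / n_comparisons  # 0.05 / 3, roughly 0.0167

# Hypothetical p-values from three pairwise t-tests.
p_values = [0.004, 0.030, 0.200]

# Only p-values below the adjusted cut-off count as significant.
significant = [p < adjusted_alpha for p in p_values]
```

Note that 0.030 would pass the unadjusted 0.05 cut-off but fails the corrected one.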
14
Q

Categorical/Nominal Data

A

where data items exist as one of a number of unrelated options, eg participants select their answer from options red, yellow and green; these cannot have a mean or median, but can have a mode

15
Q

Central Limit Theorem

A

states that under appropriate conditions, the distribution of a normalised version of the sample mean converges to a standard normal distribution; this holds even if the original variables themselves aren’t normally distributed

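A quick simulation illustrates the theorem; the uniform (decidedly non-normal) population here is made up, and the seed just makes the run reproducible:

```python
import random
import statistics

random.seed(0)  # reproducible illustration

# A decidedly non-normal (uniform) population of hypothetical values.
population = [random.uniform(0, 10) for _ in range(10_000)]

# Means of 1,000 samples of size 30 each.
sample_means = [statistics.mean(random.sample(population, 30))
                for _ in range(1_000)]

# The sample means cluster near the population mean and are far less
# dispersed than the raw population values.
population_sd = statistics.pstdev(population)
means_sd = statistics.pstdev(sample_means)
```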
16
Q

Central Tendency

A

ways of measuring the centre, or typical value, of a dataset

17
Q

Mean

A

add up all data items and divide by the number of data items

18
Q

Median

A

middle number in the data set

19
Q

Mode

A

the value that occurs most often

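The three measures of central tendency above, computed with Python's standard library on a small hypothetical dataset:

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # hypothetical data

mean = statistics.mean(scores)      # (2 + 3 + 3 + 5 + 7 + 10) / 6
median = statistics.median(scores)  # average of the two middle values, 3 and 5
mode = statistics.mode(scores)      # 3 appears most often
```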
20
Q

Confidence Interval

A

a range of values likely to include the population value with a given degree of confidence; expresses how accurate an estimate of a population parameter is, normally stated as a percentage confidence that the population mean lies between an upper and lower bound

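A minimal sketch of a 95% interval for a mean, using the normal approximation (z = 1.96) on hypothetical measurements; with a sample this small a t-critical value would give a slightly wider interval:

```python
import math
import statistics

data = [4.1, 5.0, 5.5, 4.8, 5.2, 4.6, 5.1, 4.9]  # hypothetical measurements

n = len(data)
mean = statistics.mean(data)
standard_error = statistics.stdev(data) / math.sqrt(n)

# 95% of the sampling distribution lies within 1.96 standard errors.
lower = mean - 1.96 * standard_error
upper = mean + 1.96 * standard_error
```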
21
Q

Continuous/Interval Data

A

where potential data items go in a specific order with a fixed gap in between; mean, median and mode are all possible

22
Q

Correlations

A

quantify relationships by showing how much of a relationship there is between two variables

23
Q

Correlation Coefficient

A

a number between −1 and +1 quantifying the strength and direction of a relationship, giving an indication of effect size

24
Q

Curvilinear Relationship

A

as one variable changes, so does the other but only up to a certain point, after which there’s either no relationship or the direction of the relationship changes

25
Data Wrangling/Data Tidying
involves getting data into a useful format so you can visualise and model the data, for example, removing empty data cells
26
Degrees of Freedom
the number of independent pieces of information used to calculate a statistic
27
Discrete Data
observations that only ever exist at limited values, often counts
28
Effect Size
strength of relationship between two continuous variables, eg the predictor and outcome variables in linear regression
29
Exponential Relationship
as one variable increases, the rate at which the other variable changes also increases
30
External Reliability
relates to the consistency of the test over time
31
Linear Relationship
as one variable increases, so does the other and the rate of change remains constant
32
F-Statistic
produced by ANOVAs, a ratio of 2 variances, which can be combined with the degrees of freedom to calculate the p-value to determine if you can reject the null hypothesis. With a low F-statistic, the group means cluster tightly relative to the variability within groups, whereas with a high F-statistic, the group means spread out more than the within-group variability would predict
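The ratio can be worked out by hand; a sketch for a one-way ANOVA with three small hypothetical groups:

```python
import statistics

# Three hypothetical independent groups.
groups = [[4, 5, 6, 5], [7, 8, 9, 8], [4, 6, 5, 5]]

k = len(groups)                  # number of groups
n = sum(len(g) for g in groups)  # total observations
grand_mean = statistics.mean([x for g in groups for x in g])

# Between-group variability: weighted spread of the group means.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                 for g in groups)
# Within-group variability: spread of values inside each group.
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

# Mean squares divide each sum of squares by its degrees of freedom.
f_statistic = (ss_between / (k - 1)) / (ss_within / (n - k))
```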
33
Factors
independent variable in an ANOVA
34
Levels
categories in each factor
35
Ordinal Data
where data items are ranked; these cannot have a mean, but can have a median and a mode
36
Population
all the hypothetical individuals we want to understand something about; an abstract concept covering all the infinite individuals who do, have or could exist that we want to understand something about
37
Histogram
useful for visualising continuous numerical data, which also supports performing maths on the data, eg averages
38
Hypothesis Testing
the process of making inferences based on data from samples in a systematic way to test ideas about the population
39
Identical Distribution Data
observations that come from the same distribution, or family of distributions
40
Independent Data
where one observation is unrelated to the next, eg when assessing the spread of Covid, you should only sample one person per household
41
Independent Samples T-Test
statistical test used to show whether two means collected from two independent groups differ significantly; the independent variable must have 2 independent groups and the dependent variable must be measured using continuous and normally distributed data
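The test statistic can be computed by hand with a pooled variance; the two equal-sized groups below are hypothetical:

```python
import math
import statistics

# Two hypothetical independent groups of equal size.
group_a = [5, 6, 7, 6, 5, 7]
group_b = [8, 9, 8, 10, 9, 8]

n_a, n_b = len(group_a), len(group_b)
mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)

# Pooled variance combines the two groups' sample variances.
pooled_var = ((n_a - 1) * statistics.variance(group_a)
              + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)

# t-value: the difference in means divided by its standard error.
t_value = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
degrees_of_freedom = n_a + n_b - 2
```

The t-value would then be compared against a t-distribution with these degrees of freedom to get a p-value.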
42
Intercept
the value of one variable if the other was zero, demonstrated by where the line of best fit crosses the y axis
43
Internal Reliability
whether the test is consistent within itself
44
Cronbach's Alpha
estimates whether different questions measure the same idea, eg if people who rate 1 item highly also rate another item highly, Cronbach's alpha is high
45
Split-Half Method
estimates internal consistency by splitting the test into two halves and correlating participants' scores on the two halves
46
Kurtosis
an indicator of the number of extreme values in the data
47
Leptokurtic Distribution
characterised by a narrower centre and longer tails, indicating more outliers
48
Mesokurtic Distribution
where distributions have a kurtosis of exactly three and conform to the classic bell curve shape of normal distribution
49
Platykurtic Distribution
platykurtic distributions have a kurtosis of less than three and so have a flatter profile with short tails and few outliers
50
Linear Regression
used to test how well one variable predicts another; it tells us the strength of the relationship between two continuous variables, how much one variable changes as another changes, the value of one variable if the other was zero, and the statistical significance, and it can predict a person's score on a variable; commonly used when making predictions or looking to understand how much something changes as a function of something else
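The least-squares slope (beta) and intercept can be computed directly; the predictor and outcome values below are hypothetical and chosen to be perfectly linear so the fit is easy to check:

```python
import statistics

# Hypothetical predictor (x) and outcome (y); here y = 2x + 1 exactly.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

mean_x, mean_y = statistics.mean(x), statistics.mean(y)

# Least-squares slope (beta): co-variation of x and y over variation of x.
beta = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x))
intercept = mean_y - beta * mean_x  # predicted y when x is zero

predictions = [intercept + beta * a for a in x]
```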
51
Line of Best Fit
models the relationship between two continuous variables
52
Mean Square
variances that account for degrees of freedom
53
Negative Relationship
as one variable increases, the other variable decreases
54
Non-Parametric Statistical Model
statistical models that make no assumptions about the underlying distribution of the data; examples include the Chi-squared test, Mann-Whitney test and Spearman's correlation
55
Positive Relationship
as one variable increases, the other variable increases
56
Paired Samples
where data is collected from the same participant twice in order to look for a change
57
Paired Samples/Repeated Measures T-Test
used to establish whether the mean difference between two sets of observations is zero; data is collected from the same participant for both sets of observations, resulting in paired observations
58
Parametric Statistical Model
make assumptions about the input they receive, so the reliability of the output depends on how well the input corresponds to those assumptions; a family of probability distributions with a finite number of parameters; parametric models include t-tests, linear regression and ANOVA
59
Pearson's Correlation
allows us to estimate how much of the variance in one variable can be explained by another; measures the linear correlation between two variables
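Pearson's r can be computed from scratch; the paired observations here (eg hours studied against test score) are hypothetical:

```python
import math
import statistics

# Hypothetical paired observations, eg hours studied (x) and test score (y).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

mean_x, mean_y = statistics.mean(x), statistics.mean(y)

# Pearson's r: co-variation of x and y scaled by their separate variation.
covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
r = covariation / math.sqrt(sum((a - mean_x) ** 2 for a in x)
                            * sum((b - mean_y) ** 2 for b in y))

# r squared: proportion of variance in y explained by x.
r_squared = r ** 2
```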
60
Post-Hoc Test
carried out after a significant ANOVA to tell you where the difference the ANOVA found lies; data-driven
61
Psychological Construct
abstract things we can't directly observe in people eg happiness, may be your dependent variable
62
Psychological Test
instrument developed to measure a psychological construct
63
PsycTESTS
database of psychological measures, scales, surveys and other research instruments
64
Radloff (1977)
came up with the CES-D, a self-report depression scale with 20 items rated on a 4-point scale
65
Reliability
relates to consistency of the test
66
Robust Statistics
statistics unaffected by outliers
67
Rube-Goldberg Machine
a machine intentionally designed to perform a simple task in an overly complicated way
68
Sample
several randomly chosen individuals from the population who we can test or study and assume represent the population
69
Sampling Distribution
the distribution of a statistic (such as the mean) across repeated samples; its spread is smaller than the spread within the population, so the means of the samples will be less dispersed than individual values are in the population
70
Sampling Distribution of a Statistic
a probability distribution based on a large number of samples from a given population and this will have a mean and standard deviation of its own; the more samples used, the less variable the means of the sample groups will be and, as a result, the standard error decreases
71
Sampling Variation
the variability from one sample to another
72
Skewness
asymmetry in a data distribution, often caused by outliers in one tail
73
Significance
the likelihood of finding the observed relationship in the sample, if there was no relationship in the population
74
Significance Level
the threshold probability for rejecting the null hypothesis, equal to the accepted risk of a Type 1 error; typically 5%
75
Significance Values
shows the likelihood of observing an effect of this size if the null hypothesis were true
76
Negative Skew
where the longer tail slopes to the left because the mode is higher than the median
77
Outcome Variable
continuous variable we are looking to predict, put on the Y axis
78
Positive Skew
where the longer tail slopes to the right because the mode is lower than the median
79
Predictor Variable
continuous variable we think will predict the variance in the outcome variable, put on the X axis
80
Statistical Inference
the process of creating a statistical model, inferring something about a population based on a sample taken
81
Statistical Model
helps infer information about the population by taking information about the sample and generalising it to the population
82
Standard Error
the standard deviation of the sampling distribution of a statistic, measuring the accuracy with which a sample represents a population
83
Strength
how much of a relationship there is
84
Test Statistic
a value that is calculated when conducting a statistical test of a hypothesis, showing how closely your observed sample data matches the distribution expected under the null hypothesis of that statistical test
85
Tukey Boxplot
useful for visualising continuous data and descriptive statistics, eg the minimum and maximum bounds, median, Q1 and Q3; also useful for visualising outliers
86
Tukey's Honest Significant Difference (HSD) Test
form of post-hoc test; makes adjustments based on the number of comparisons being conducted by taking the absolute value of the difference between pairs of means and dividing it by the standard error of the mean
87
T-Value
result of a t-test; the greater the t-value, the larger the effect
88
Type 1 Error
when you reject the null hypothesis but it's true (false positive)
89
Type 2 Error
when you accept the null hypothesis but it's false (false negative)
90
Variability
tells us how spread out, or far from the mean, the data tends to be
91
Interquartile Range
difference between the 25% mark and the 75% mark of the data set, is not impacted by outliers
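A sketch contrasting the IQR with the range on a hypothetical dataset containing one outlier; `statistics.quantiles` with `n=4` returns the three quartile cut points:

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13, 15, 100]  # 100 is an outlier

# quantiles(n=4) returns the cut points Q1, median, Q3.
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# The range is dragged out by the outlier; the IQR is not.
data_range = max(data) - min(data)
```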
92
Range
the difference between the largest and smallest data items, alerts you to errors in the data
93
Standard Deviation
a measure of the average distance of each point from the mean in the original units, telling us how concentrated around the mean data points are
94
Univariate Data
data involving one variable, eg how tall someone is
95
Validity
relates to whether the test measures what it is supposed to measure
96
Concurrent Validity
do scores correlate with other measures taken at the same time
97
Construct Validity
does the test measure the construct it was designed to measure
98
Face Validity
do the items on the test appear to be measuring what they're measuring
99
Predictive Validity
can test scores be used to predict events
100
Test-Retest Reliability
assesses the correlation between scores taken at 2 points in time from the same sample
101
Variance
how far the numbers are spread out from the mean
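Variance and standard deviation side by side on a small hypothetical dataset, using the population versions (dividing by n):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5

# Population variance: mean of the squared distances from the mean.
variance = statistics.pvariance(data)
# Standard deviation: square root of the variance, back in the original units.
sd = statistics.pstdev(data)
```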