03.5 - statistics and data science Flashcards

(39 cards)

1
Q

data definition

A

facts, quantities, or items of information about a person or object

2
Q

data can be _ or _

A

quantitative (numerical) or qualitative (categorical)

3
Q

how do we collect data in the sciences?

A

by conducting experiments

4
Q

data helps us understand our

A

study system

5
Q

ways to get/use data

A

measure, observe, generate

6
Q

how to test a hypothesis / the expected outcomes of an experiment

A

use statistics with collected data

7
Q

when do scientists use stats

A

To design studies and select sample sizes
To support or negate hypotheses
To understand error, uncertainty, and outliers in data
To interpret and summarize data
To make money??

8
Q

4 top skills of data science

A

data processing
statistics
data visualization
presentation

9
Q

summary statistics

A

presentation of data in easy-to-understand, easy-to-digest snippets
better than long lists of numbers, which are not informative or easily interpretable

10
Q

mean

A

average of data

11
Q

how to calculate mean

A

add all the values together and divide by the sample size (the number of values)
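
A minimal Python sketch of this calculation. The `heights` list is made-up data for illustration, not taken from the cards.

```python
from statistics import mean

# Hypothetical sample (made-up numbers): plant heights in cm
heights = [12.0, 15.5, 11.0, 14.5, 17.0]

# By hand: add the values together, then divide by the sample size
manual_mean = sum(heights) / len(heights)

# The standard library's mean() should agree with the hand calculation
library_mean = mean(heights)

print(manual_mean)   # 14.0
print(library_mean)  # 14.0
```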

12
Q

median

A

middle number of data set

13
Q

how to find median

A

place the numbers in order and find the middle value (with an even number of values, average the two middle values)
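
A short Python sketch of these steps. The helper name `manual_median` and the sample lists are hypothetical. Averaging the two middle values for an even-sized sample is the usual convention, added here as an assumption because the card does not cover that case.

```python
from statistics import median

def manual_median(values):
    """Sort the values, then pick the middle one (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one middle value
        return ordered[mid]
    # Even count (assumed convention): average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_sample = [7, 3, 9, 1, 5]   # sorted: 1 3 5 7 9
even_sample = [7, 3, 9, 1]     # sorted: 1 3 7 9

print(manual_median(odd_sample))   # 5
print(manual_median(even_sample))  # 5.0
print(median(odd_sample))          # 5, standard library agrees
```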

14
Q

range

A

how spread out the data is

15
Q

how to find range

A

largest - smallest value
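
The same subtraction as a small Python sketch, using made-up values for illustration.

```python
# Hypothetical sample (made-up numbers)
values = [12.0, 15.5, 11.0, 14.5, 17.0]

# Range = largest value minus smallest value
data_range = max(values) - min(values)

print(data_range)  # 6.0
```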

16
Q

best way to visualize data

A

use a box plot

17
Q

box plots show

A

minimum
first quartile
median
mean (sometimes marked as an extra point; not drawn on every box plot)
third quartile
maximum
outliers
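
The values listed above can be computed before any plot is drawn. The sketch below is one possible way to do it, not the deck's own method. The function name `box_plot_summary` and the data are hypothetical, and the 1.5 x IQR rule for flagging outliers is a common convention assumed here, since the cards do not say how outliers are chosen.

```python
from statistics import mean, quantiles

def box_plot_summary(values):
    """Collect the numbers a box plot displays (hypothetical helper)."""
    ordered = sorted(values)
    # Quartiles; the "inclusive" method treats the data as the whole population
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in ordered if v < low_fence or v > high_fence]
    return {
        "minimum": ordered[0],
        "q1": q1,
        "median": q2,
        "mean": mean(ordered),
        "q3": q3,
        "maximum": ordered[-1],
        "outliers": outliers,
    }

# Made-up data with one unusually large value
summary = box_plot_summary([2, 4, 4, 5, 6, 7, 8, 9, 30])
for name, value in summary.items():
    print(name, value)
```

Different tools compute quartiles slightly differently, so the quartile values from a plotting library may not match this sketch exactly.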

18
Q

first step after asking a research question

A

generate null and alternative hypotheses

19
Q

how to decide which hypothesis to reject

A

use more complex statistical analysis (like p-values)

20
Q

p value definition

A

the probability of seeing differences between datasets at least as large as those observed if only natural variation were at work (that is, if the null hypothesis were true)
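
One way to make "natural variation" concrete is a permutation test. This technique is not named in the cards and is swapped in here purely as an illustration: shuffle the group labels many times and count how often chance alone produces a difference in means at least as large as the observed one. The function name, the parameters, and the data are all hypothetical.

```python
import random

def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Estimate a p value by shuffling group labels (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_a = len(group_a)

    def mean_gap(values):
        # Absolute difference between the means of the two label groups
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    pooled = list(group_a) + list(group_b)
    observed = mean_gap(pooled)
    at_least_as_big = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the labels are arbitrary
        # Small tolerance guards against floating-point rounding
        if mean_gap(pooled) >= observed - 1e-12:
            at_least_as_big += 1
    return at_least_as_big / n_shuffles

# Made-up measurements for two treatments
control = [4.1, 5.0, 4.7, 5.3, 4.4]
treated = [6.2, 6.8, 5.9, 7.1, 6.5]
print(permutation_p_value(control, treated))
```

A small result means shuffled labels rarely reproduce a gap this large, which is evidence against the null hypothesis.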

21
Q

p represents

A

probability

22
Q

p value will always be between

A

0 and 1 (0% and 100%)

23
Q

higher p value means

A

the differences in the data are more consistent with natural variation alone (weaker evidence against the null hypothesis)

24
Q

which hypothesis is more likely with a higher p value

A

null

25
Q

which hypothesis is more likely with a lower p value

A

alternative

26
Q

lower p value means

A

the differences in the data are less consistent with natural variation alone (stronger evidence against the null hypothesis)

27
Q

p = 0.68 means

A

if only natural variation were at work, differences at least this large would show up about 68% of the time, so the difference is not significant

28
Q

how small must the chance be that differences are due to random variation before you call them significant?

A

less than a 5% chance (p < 0.05)

29
Q

when p > 0.05

A

there is more than a 5% chance of seeing differences this large between datasets from random chance alone
the difference is not significant
fail to reject the null hypothesis

30
Q

when p < 0.05

A

there is less than a 5% chance of seeing differences this large between datasets from random chance alone
the difference is significant
reject the null hypothesis in favor of the alternative hypothesis

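
The 0.05 cutoff from the two cards above can be written as a small decision function. This is a sketch under stated assumptions: the name `interpret_p_value` is hypothetical, and treating a p value of exactly 0.05 as not significant is a choice made here, since the cards do not cover that case.

```python
ALPHA = 0.05  # the 5% threshold used in the cards

def interpret_p_value(p, alpha=ALPHA):
    """Turn a p value into a decision about the null hypothesis."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p value must lie between 0 and 1")
    if p < alpha:
        return "significant: reject the null hypothesis"
    # Assumption: p exactly equal to alpha counts as not significant
    return "not significant: fail to reject the null hypothesis"

print(interpret_p_value(0.68))  # not significant: fail to reject the null hypothesis
print(interpret_p_value(0.01))  # significant: reject the null hypothesis
```
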
31
Q

F test statistic

A

the test statistic from an F test; like the X^2 statistic, it tells the amount of difference between groups

32
Q

higher value of F test statistic means

A

more variation between the groups relative to the variation within each group

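
To see what drives the F statistic up or down, it can be computed by hand for a one-way comparison of groups. This is a sketch assuming the standard one-way ANOVA definition (between-group mean square divided by within-group mean square). The function name `f_statistic` and the data are hypothetical.

```python
def f_statistic(groups):
    """One-way ANOVA F: between-group variation over within-group variation."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of samples

    # How far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # How far each value sits from its own group mean
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Made-up data: same spread inside each group, different gaps between groups
near = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
far = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]

print(round(f_statistic(near), 3))  # 3.0
print(round(f_statistic(far), 3))   # 13.0
```

Moving one group further from the others raises F even though the spread inside each group is unchanged.
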
33
Q

degrees of freedom tells

A

how many independent pieces of information (individual points or groups) are within the data

34
Q

higher degrees of freedom =

A

more data behind the test, so the same amount of difference is generally more likely to come out significant

35
Q

degrees of freedom between groups =

A

number of groups - 1

36
Q

degrees of freedom within groups =

A

total number of samples - number of groups

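
Both degrees-of-freedom formulas from the two cards above, as a small Python sketch. The function name `degrees_of_freedom` and the example groups are hypothetical.

```python
def degrees_of_freedom(groups):
    """Return (between-groups df, within-groups df) for a list of groups."""
    k = len(groups)                    # number of groups
    n = sum(len(g) for g in groups)    # total number of samples
    return k - 1, n - k

# Made-up example: three treatments with 5, 5 and 4 samples
groups = [
    [4.1, 5.0, 4.7, 5.3, 4.4],
    [6.2, 6.8, 5.9, 7.1, 6.5],
    [5.5, 5.8, 6.0, 5.2],
]
df_between, df_within = degrees_of_freedom(groups)
print(df_between, df_within)  # 2 11
```
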
37
Q

post-hoc tests are used when

A

the overall test finds a difference (p < 0.05), to see which groups the differences are between

38
Q

Tukey's HSD compares

A

every pair of treatments to tell us where the differences are

39
Q

interpretation of results in Tukey's HSD

A

If p < 0.05 then there is a significant difference between the two groups
If p > 0.05 then there is no significant difference between the two groups
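
Reading a table of pairwise results can be sketched in a few lines of Python. The p values below are made up for illustration and do not come from a real analysis. The group names and the helper `differing_pairs` are hypothetical. In practice the p values would come from a statistics package rather than being typed in by hand.

```python
# Hypothetical pairwise p values (invented for illustration only)
pairwise_p = {
    ("control", "low dose"): 0.62,
    ("control", "high dose"): 0.003,
    ("low dose", "high dose"): 0.04,
}

def differing_pairs(p_values, alpha=0.05):
    """Return the pairs whose p value falls below the cutoff."""
    return [pair for pair, p in p_values.items() if p < alpha]

for pair in differing_pairs(pairwise_p):
    print(pair[0], "vs", pair[1], "-> significant difference")
# control vs high dose -> significant difference
# low dose vs high dose -> significant difference
```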