discrete vs continuous data
discrete:
- can only take certain specific values, e.g. shoe size
continuous:
- can take any value within a given range, e.g. height
definition
population
the whole set of items that are of interest, from which a sample could be selected
definition
sampling unit
a single member of the population
definition
sample
a selection of sampling units, observed in order to draw conclusions about the population as a whole
definition
sampling frame
a list of all members of the population
advantages and disadvantages:
sample
advantages
- less time-consuming and less expensive than a census
- fewer people have to respond
- less data to process than a census
disadvantages:
* data may not be as accurate as a census
* may not be large enough to give information about small subgroups of the population
advantages and disadvantages:
census
pros
* should give accurate results
cons
* time-consuming and expensive
* can’t be used when the testing process destroys the item
* hard to process a large quantity of data
Systematic sampling definition
A sample is formed by choosing members of a population at regular intervals using a list
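A minimal sketch of the definition above, assuming the sampling frame is an ordered Python list and the first member is picked at random from the first k entries. The function name `systematic_sample` and the member labels are hypothetical, made up for illustration.

```python
import random

def systematic_sample(sampling_frame, sample_size, seed=None):
    """Take every k-th member of an ordered list, from a random start."""
    if sample_size < 1 or sample_size > len(sampling_frame):
        raise ValueError("sample_size must be between 1 and the frame size")
    rng = random.Random(seed)
    k = len(sampling_frame) // sample_size  # regular interval between picks
    start = rng.randrange(k)                # random start among the first k members
    return [sampling_frame[start + i * k] for i in range(sample_size)]

# Hypothetical sampling frame of 100 numbered members
frame = [f"member_{i:03d}" for i in range(100)]
sample = systematic_sample(frame, 10, seed=1)
print(sample)
```

With 100 members and a sample of 10, the interval k is 10, so the picks are 10 places apart in the list.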
stratified sampling
pros and cons of stratified sampling
PROS
* useful when the population contains very different groups
* sample is representative of the population structure
* members are selected randomly within each group
CONS
* can’t be used if the population cannot be split into distinct groups
* same cons as simple random sampling
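A rough sketch of how a stratified sample could be drawn, assuming each group (stratum) contributes in proportion to its size and members are chosen at random within the group. The function name, group names and sizes are made up for illustration.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """strata: dict mapping group name -> list of members."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Number taken from a group is proportional to the group's size.
        # Rounding means the counts may not always add up to sample_size exactly.
        n = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, n)  # random selection within the group
    return sample

# Hypothetical population: 60 members in one group, 40 in another
strata = {
    "year_12": [f"y12_{i}" for i in range(60)],
    "year_13": [f"y13_{i}" for i in range(40)],
}
sample = stratified_sample(strata, 10, seed=1)
print({name: len(chosen) for name, chosen in sample.items()})
```

Here a sample of 10 takes 6 from the group of 60 and 4 from the group of 40, matching the 60:40 structure of the population.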
opportunity sampling
a sample is formed from available members of the population who fit the criteria
Pros and cons of opportunity sampling
PROS
* Quick and easy
* useful when a list of the population is not available
CONS
* unlikely to be representative of population structure
* likely to produce biased results
pros and cons of quota sampling
PROS
* useful when a sampling frame is not available
* sample will be representative of population structure
CONS
* may introduce bias as some members of the population may choose not to be sampled
in a data set
outliers are
any data points more than 2 standard deviations above or below the mean
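A small sketch of this rule, assuming the standard deviation is the population version (dividing by n), which is what `statistics.pstdev` computes. The function name and data values are made up for illustration.

```python
from statistics import mean, pstdev

def outliers_by_sd(data, k=2):
    """Return values more than k standard deviations from the mean."""
    m = mean(data)
    sd = pstdev(data)  # population standard deviation (divides by n), an assumption
    return [x for x in data if abs(x - m) > k * sd]

data = [10, 12, 11, 13, 12, 11, 30]
print(outliers_by_sd(data))  # [30]
```

For this data the mean is about 14.1 and the standard deviation about 6.5, so only 30 lies more than 2 standard deviations away.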
in a box plot
outliers are
any data point more than 1.5 × IQR above the upper quartile or below the lower quartile
how to work out estimated mean in a frequency table
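One common approach, sketched here as an assumption rather than as the answer this card originally had: use each class midpoint as a stand-in for the values in that class, then divide the sum of (frequency × midpoint) by the total frequency. The table values and function name are made up.

```python
def estimated_mean(classes):
    """classes: list of (lower_bound, upper_bound, frequency) tuples."""
    total_f = sum(f for _, _, f in classes)
    # The midpoint (lo + hi) / 2 stands in for every value in the class
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total_fx / total_f

# Hypothetical grouped frequency table
table = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]
print(estimated_mean(table))  # 16.0
```

The midpoints are 5, 15 and 25, so the sum of f × midpoint is 20 + 150 + 150 = 320, and 320 / 20 = 16.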
coding
measure of location is affected by: all operations (adding, subtracting, multiplying and dividing)
measure of spread is affected by: only multiplication or division
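A quick numerical check of these two rules, assuming a coding of the form y = (x - a) / b with made-up constants a = 1000 and b = 10. The mean has to be decoded with both operations (multiply by b, add a), while the standard deviation only needs the multiplication.

```python
from statistics import mean, pstdev

x = [1010, 1020, 1030, 1040, 1050]
a, b = 1000, 10                     # hypothetical coding constants
y = [(xi - a) / b for xi in x]      # coded data: 1.0, 2.0, 3.0, 4.0, 5.0

# Location: undo both the division and the subtraction
mean_x_from_y = b * mean(y) + a
# Spread: only the division matters, the subtraction has no effect
sd_x_from_y = b * pstdev(y)

print(mean_x_from_y, mean(x))
print(sd_x_from_y, pstdev(x))
```

Both printed pairs should agree (up to floating-point rounding), which is the point of coding: do the arithmetic on the smaller numbers, then decode.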
linear interpolation
what do you do to the value when finding quartiles / percentiles for discrete data?
How to work out outliers?
a value is an outlier if it lies outside the interval:
[Q1 - 1.5(IQR), Q3 + 1.5(IQR)]
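A short sketch of this check, assuming the quartiles have already been worked out and are passed in (different methods of finding quartiles can give slightly different values). The function name and data are made up for illustration.

```python
def iqr_outliers(data, q1, q3):
    """Return values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

data = [2, 15, 17, 18, 20, 21, 23, 40]
q1, q3 = 16, 22   # quartiles taken as given for this example
print(iqr_outliers(data, q1, q3))  # [2, 40]
```

Here IQR = 6, so the boundaries are 16 - 9 = 7 and 22 + 9 = 31, and only 2 and 40 fall outside them.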
2 events with non-zero probabilities CANNOT be both:
independent and mutually exclusive
because
- when mutually exclusive: P(A ∩ B) = 0
- when independent: P(A ∩ B) = P(A) × P(B), which is not 0 unless P(A) = 0 or P(B) = 0
to work out P(A | B’):
P(A | B’) = P(A ∩ B’) / P(B’)
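A worked sketch of this formula, assuming you are given P(A), P(B) and P(A and B), and using the facts that P(B') = 1 - P(B) and P(A and B') = P(A) - P(A and B). The probabilities are made up, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

def p_a_given_not_b(p_a, p_b, p_a_and_b):
    """P(A | B') = P(A and B') / P(B')."""
    p_not_b = 1 - p_b                  # P(B')
    p_a_and_not_b = p_a - p_a_and_b    # the part of A that lies outside B
    return p_a_and_not_b / p_not_b

# Hypothetical probabilities
result = p_a_given_not_b(Fraction(1, 2), Fraction(2, 5), Fraction(1, 10))
print(result)  # 2/3
```

With these values P(A and B') = 1/2 - 1/10 = 2/5 and P(B') = 3/5, so the answer is (2/5) / (3/5) = 2/3.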
probability
condition for independence:
P(A ∩ B) = P(A) × P(B)
condition for mutually exclusive events:
P(A ∩ B) = 0
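Both conditions can be tested directly from the three probabilities. This is a minimal sketch with made-up values and hypothetical function names, using `Fraction` so the equality checks are exact rather than affected by floating-point rounding.

```python
from fractions import Fraction

def is_independent(p_a, p_b, p_a_and_b):
    """Independent when P(A and B) equals P(A) x P(B)."""
    return p_a_and_b == p_a * p_b

def is_mutually_exclusive(p_a_and_b):
    """Mutually exclusive when P(A and B) is 0."""
    return p_a_and_b == 0

p_a, p_b = Fraction(1, 2), Fraction(2, 5)

# Case 1: P(A and B) = 1/5 = 1/2 x 2/5, so independent but not mutually exclusive
print(is_independent(p_a, p_b, Fraction(1, 5)), is_mutually_exclusive(Fraction(1, 5)))

# Case 2: P(A and B) = 0, so mutually exclusive but not independent
print(is_independent(p_a, p_b, Fraction(0)), is_mutually_exclusive(Fraction(0)))
```

The two cases show that, when P(A) and P(B) are both non-zero, only one of the two conditions can hold at a time.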
What is a histogram?