what's a census?
-when you collect data from the entire population
what problems come with taking a census?
-too expensive
-undercoverage (it may not actually include everyone)
-too time consuming
how to solve the issues that come with using a census?
-collect data from a sample of the population instead
sample statistics
summaries computed from the data in a sample (for example, a sample mean or a sample proportion)
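A minimal sketch of what "summaries computed from a sample" can look like. The population here is made-up exam scores and the passing mark of 60 is an assumption for illustration only.

```python
import random
import statistics

# Hypothetical population: 1,000 made-up exam scores (not real data).
rng = random.Random(0)
population = [rng.gauss(70, 10) for _ in range(1000)]

# Draw a sample of 50 individuals, then summarize the sample.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)    # a sample statistic
sample_sd = statistics.stdev(sample)     # another sample statistic
# Proportion of the sample at or above an assumed passing mark of 60.
sample_prop_passing = sum(x >= 60 for x in sample) / len(sample)

print(round(sample_mean, 1), round(sample_sd, 1), sample_prop_passing)
```

Each printed number describes only the 50 sampled scores; using them to talk about all 1,000 scores is the inference step.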
what are the two inferences (conclusions) that can be made using sample statistics?
-population inference: when sample statistics are used to draw conclusions about the entire population (can be made from both observational studies and experiments)
-causal (cause-and-effect) inference: the process of determining the actual, independent effect of a treatment on the response (this requires random allocation, so it can't be made from observational studies)
population inference
-we can only make population inferences when we have random sampling (it is the best way to get a sample that represents the entire population)
why is randomizing much more beneficial than non-randomizing?
-randomizing allows the sample to be a better representation of the entire population
-it guards against selection bias (typically no part of the population is systematically over- or under-represented)
if we use non-random sampling, who does the data represent?
the data represents only the sample, not the entire population
What are the random sampling methods?
-simple random sampling (SRS)
-stratified random sampling
-cluster random sampling
-systematic random sampling
What's simple random sampling?
(and the role of sampling variability)
when every possible sample of size n from the population has the same chance of being selected
-there is sampling variability, in the sense that each draw selects different people and so gives different values of the statistic
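A small sketch of an SRS, assuming the population is available as a list. The helper name `simple_random_sample` and the student labels are hypothetical; `random.sample` draws without replacement so that every subset of size n is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n distinct individuals; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population of 100 labelled students.
students = [f"student_{i}" for i in range(1, 101)]

# Two separate draws will generally pick different people.
draw_1 = simple_random_sample(students, 10, seed=1)
draw_2 = simple_random_sample(students, 10, seed=2)
print(draw_1)
print(draw_2)
```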
what's sampling variability?
the extent to which the value of a statistic differs across repeated samples
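Sampling variability can be seen by drawing many samples and looking at how much the sample mean moves around. This is only a simulation sketch on made-up values; comparing two sample sizes is an extra illustration (larger samples vary less), not something these notes state.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical population of 5,000 made-up values between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(5000)]

def sample_means(n, repeats=500):
    """Mean of each of `repeats` simple random samples of size n."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]

# The standard deviation of the sample means measures how much the
# statistic differs from one sample to the next.
spread_small = statistics.stdev(sample_means(10))
spread_large = statistics.stdev(sample_means(200))
print(round(spread_small, 2), round(spread_large, 2))
```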
what's stratified random sampling?
(and the role of sampling variability)
when the population is first divided into strata and then we take an SRS within each stratum
-this reduces the sampling variability of our results
-it can reduce bias because every stratum is guaranteed to be represented
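A sketch of the two steps (divide into strata, then take an SRS inside each one). The helper `stratified_sample`, the `year` field, and taking the same number from every stratum are all assumptions made for this example.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Group individuals into strata, then take an SRS within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    sample = []
    for name in sorted(strata):  # fixed order so results are reproducible
        sample.extend(rng.sample(strata[name], n_per_stratum))
    return sample

# Hypothetical population: 300 students, 100 in each year of study.
years = ("first", "second", "third")
population = [{"id": i, "year": years[i % 3]} for i in range(300)]

sample = stratified_sample(population, lambda p: p["year"], 5, seed=0)
print([p["year"] for p in sample])
```

Every year is guaranteed to appear in the sample, which a single SRS of 15 students would not guarantee.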
what is a stratum?
a homogeneous group (individuals who are similar on some characteristic)
what's systematic random sampling?
-start from a randomly chosen individual in a list (or similar ordering) and then pick every kth individual after that
-this can give a representative sample if the list is in no particular order
-this can be less expensive than simple random sampling because only the starting point has to be chosen at random
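A sketch of the "random start, then every kth" idea. Choosing the start among the first k positions is a common convention and an assumption here; the helper name and the roster are hypothetical.

```python
import random

def systematic_sample(items, k, seed=None):
    """Pick a random start among the first k positions, then every kth item."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # assumed convention: start index in 0..k-1
    return items[start::k]

# Hypothetical roster of 100 numbered individuals.
roster = list(range(1, 101))
picked = systematic_sample(roster, 10, seed=3)
print(picked)
```

If the roster were sorted in a way that repeats every 10 places, every pick would land on the same kind of individual, which is why the list should be in no particular order.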
cluster random sampling
splitting the population into groups called clusters that are similar to one another, then selecting a few clusters at random (SRS) and performing a census (collecting data from everyone) within the chosen clusters
-because the clusters are chosen at random, it gives us an unbiased sample
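A sketch of cluster sampling, assuming the clusters are already known (here, hypothetical classrooms stored in a dict). The helper name `cluster_sample` is made up for the example.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Choose whole clusters by SRS, then include everyone in the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [person for name in chosen for person in clusters[name]]
    return chosen, sample

# Hypothetical population: 8 classrooms of 20 students each.
clusters = {
    f"classroom_{c}": [f"c{c}_s{s}" for s in range(20)]
    for c in range(8)
}

chosen, sample = cluster_sample(clusters, 2, seed=5)
print(chosen, len(sample))
```

Unlike stratified sampling, only some groups are visited, but everyone inside a visited group is included.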
what's bias?
the tendency of a sample to differ from the corresponding population in some systematic way
what are sources of bias
-selection bias: when a portion of the population is over- or under-represented in a sample (usually the under-represented ones differ from the rest of the population)
-response bias: refers to anything in the survey design that influences the responses; respondents may lie because of this
-voluntary response bias: occurs when individuals can choose on their own whether they want to participate or not
-nonresponse bias: occurs when a large proportion of those sampled fail to respond
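A simulation sketch of selection bias (undercoverage). The sleep-hours population, the 30/70 split, and the "daytime survey" that can never reach night-shift workers are all invented for illustration.

```python
import random
import statistics

rng = random.Random(7)

# Made-up population: nightly hours of sleep for 10,000 workers.
# Night-shift workers (30%) sleep less on average than day workers (70%).
population = [("night", rng.gauss(5.5, 1.0)) for _ in range(3000)]
population += [("day", rng.gauss(7.5, 1.0)) for _ in range(7000)]

everyone = [hours for _, hours in population]
reachable = [hours for shift, hours in population if shift == "day"]
true_mean = statistics.mean(everyone)

def average_estimate(frame, n=100, repeats=200):
    """Average of the sample means from repeated random samples of a frame."""
    means = [statistics.mean(rng.sample(frame, n)) for _ in range(repeats)]
    return statistics.mean(means)

srs_estimate = average_estimate(everyone)      # SRS of the whole population
biased_estimate = average_estimate(reachable)  # night shift can never be picked

print(round(true_mean, 2), round(srs_estimate, 2), round(biased_estimate, 2))
```

The estimate from the restricted frame stays too high no matter how many samples are taken, which is what "differs in a systematic way" means.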
causal (cause-and-effect) inference
can only be made when we have random allocation
-when random allocation isn't present, the difference in responses may be caused by lurking variables
what's random allocation?
assigning individuals to the control group or the treatment group at random
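A sketch of random allocation, assuming two equal-sized groups. The helper name `randomly_allocate` and the subject labels are hypothetical.

```python
import random

def randomly_allocate(subjects, seed=None):
    """Shuffle the subjects, then split them into treatment and control halves."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical list of 20 study subjects.
subjects = [f"subject_{i}" for i in range(20)]
treatment, control = randomly_allocate(subjects, seed=11)
print(treatment)
print(control)
```

Because chance alone decides the groups, neither the researcher nor the subjects can steer particular kinds of people into one group.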
what are lurking variables?
variables that are related to both group membership and the response; these are other variables that could possibly explain the results
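A simulation sketch of a lurking variable. Everything here is invented: a study app that has no real effect on scores, and "motivation" that both raises scores and makes people more likely to use the app.

```python
import random
import statistics

rng = random.Random(21)

# Made-up scenario: the app has NO effect on scores.
# Motivation is the lurking variable: it is related to group membership
# (who uses the app) and to the response (the score).
people = []
for _ in range(5000):
    motivation = rng.random()                        # 0 = low, 1 = high
    uses_app = rng.random() < motivation             # membership depends on motivation
    score = 60 + 20 * motivation + rng.gauss(0, 5)   # response depends on motivation only
    people.append((uses_app, score))

app_mean = statistics.mean([s for used, s in people if used])
no_app_mean = statistics.mean([s for used, s in people if not used])
observed_gap = app_mean - no_app_mean
print(round(observed_gap, 2))
```

App users score higher even though the app does nothing, so the gap cannot be read as cause and effect without random allocation.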
what are the two types of study designs
-observational studies
-randomized experiment
what are observational studies?
(the two types)
-the investigator observes individuals and measures variables of interest; no treatment is applied to the sample
-the two types are:
*retrospective study: the data already exist when the study begins (the information is collected from the past). This is useful when an outcome is rare; however, it can contain many types of observation errors
*prospective study: the data is collected as the events unfold
what's a randomized comparative experiment?
-an experiment that allows us to establish a cause-and-effect relationship
-assigns the subjects in the sample to treatments
-manipulates factor levels to create the treatments
-uses random allocation to decide which subjects get which treatment
-compares the responses of the subjects across treatment levels
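The steps above can be sketched as a tiny simulated experiment. The response model (noise around 50 plus an assumed +5 effect for the treatment level) is purely hypothetical and stands in for real measurements.

```python
import random
import statistics

rng = random.Random(8)
subjects = list(range(200))

# Random allocation of subjects to the two factor levels (treatments).
rng.shuffle(subjects)
groups = {"control": subjects[:100], "treatment": subjects[100:]}

def response(level):
    """Assumed response model, for illustration only: noise plus a +5 treatment effect."""
    return rng.gauss(50, 4) + (5 if level == "treatment" else 0)

# Measure one response per subject, then compare across treatment levels.
responses = {
    level: [response(level) for _ in members]
    for level, members in groups.items()
}
difference = (statistics.mean(responses["treatment"])
              - statistics.mean(responses["control"]))
print(round(difference, 2))
```

Since chance decided who got the treatment, a clear difference between the group means points to the treatment rather than to a lurking variable.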