In statistics, the same inputs and process should have only one output.
What term describes this?
deterministic
What is the opposite something deterministic – in other words, what term refers to a process where the same inputs and factors produce multiple outputs?
stochastic
Multiple outputs may arise from what factor, in which the same results are obtained but the technology used to document the observation is imprecise?
measurement error
What stochastic factor describes the variation which exists between subjects of study that gives rise to different results?
natural heterogeneity
What stochastic factor includes variables like the disappearance of funding or poor weather?
uncontrollable factors
What part of data analysis refers to visually displaying observations, removing the outliers, and subsetting the data?
preprocess
What are the two goals of exploratory data analysis (EDA)?
What is the long-term frequency of an event taking place known as?
probability
What word refers to probabilities associated with integers and categories (i.e., number of oranges on a tree)?
discrete
What do statisticians employ to analyze discrete outcomes?
probability mass functions (PMFs)
What are the four common kinds of distribution associated with probability mass functions?
What word refers to probabilities associated with non-integer numbers (i.e., likelihood that someone was born exactly 250 years after Horatio Nelson)?
continuous
What do statisticians employ to analyze continuous outcomes?
probability density functions (PDFs)
What are the three common kinds of distribution associated with probability density functions?
What kinds of distributions do statisticians employ for continuous outcomes without either positive or negative constraints?
normal distribution
Normal distributions form the cornerstone of what three kinds of data analyses?
The t-test, ANOVA, simple regression analysis and multiple regression analysis are all considered what?
linear models
What theorem says that even if data is not technically normally distributed, the samples which are very large will move towards normality?
central limit theorem
What is the formula used to express a normal distribution?
y ~ N(mu, sigma^2)
In the formula for the null distribution, what do the following variables refer to:
1. “y”
2. “N”
3. “mu”
4. “sigma^2”
The t-distribution is widely used in many statistical models and looks like the normal distribution, but becomes more divergent with smaller sample sizes due to the influence of what parameter?
degrees of freedom (df)
The normal distribution is useful with what thing, which assumes the mean/median (mu) varies linearly?
regression models
What does an analysis of variance show about the treatment?
whether the treatment effects the results relative to the control
What is a necessary characteristic of a valid hypothesis?
falsifiability