What is the objective of Generalising Linear Models?
To allow us to do regression in problems where our Yi is not normally distributed
What is the stochastic/random part of a model?
The part of the model which characterises the distribution of Yi (eg. Yi ~ N(mu(i), sigma²))
What is the structural part of the model?
A function of mu(i) which describes its relationship with the covariates (eg. mu(i) = B0 + B1X1 + B2X2 + … + BPXP)
What are the two types of model which we go over in this course?
- Binomial Model (for binary or binomial outcomes)
- Poisson Model (for count outcomes)
What is the difference between a binomial outcome and a binary outcome?
A binary (or Bernoulli) outcome is the result of a single trial, whereas a binomial outcome is the number of successes over a number of trials
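A minimal sketch of the difference, using simulated draws. The seed, the success probability of 0.3 and the 10 trials are arbitrary values chosen for illustration.

```r
set.seed(1)  # arbitrary seed, only so the draws are reproducible

# Binary (Bernoulli) outcomes: each value is the result of ONE trial (0 or 1)
binary_outcomes = rbinom(5, size = 1, prob = 0.3)

# Binomial outcomes: each value is the number of successes in 10 trials (0 to 10)
binomial_outcomes = rbinom(5, size = 10, prob = 0.3)

print(binary_outcomes)
print(binomial_outcomes)
```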
What is a link function?
A function applied to the parameter of a distribution (eg. its mean) which is set equal to a linear combination of the covariates
What is the link function for the Poisson Model?
log(lambda) = linear combination of the covariates
*natural logarithm
What is the link function for the Binomial Model?
log(odds of success) = linear combination of the covariates
*natural logarithm
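Written out in full, the two links and their inverses look as follows. The symbol eta_i for the linear combination of covariates is introduced here for brevity, and beta_0 … beta_p stand for B0 … BP.

```latex
\begin{align*}
\eta_i &= \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} \\[4pt]
\text{Poisson:}\quad \log(\lambda_i) &= \eta_i
  \quad\Longleftrightarrow\quad \lambda_i = e^{\eta_i} \\[4pt]
\text{Binomial:}\quad \log\!\left(\frac{p_i}{1 - p_i}\right) &= \eta_i
  \quad\Longleftrightarrow\quad p_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}}
\end{align*}
```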
Define the term “odds”.
The ratio of the probability of an event occurring to the probability of the event not occurring.
= [p(A)] / [p(not A)]
*in Bernoulli events, “A” is success and “not A” is failure
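A small worked example of the definition, assuming a success probability of 0.8 (an arbitrary value chosen for illustration):

```r
p_success = 0.8

# Odds = p(A) / p(not A)
odds_success = p_success / (1 - p_success)   # 0.8 / 0.2, i.e. odds of 4 to 1

# Log odds (natural logarithm), the quantity modelled in the Binomial Model
log_odds = log(odds_success)

# Converting odds back to a probability
p_back = odds_success / (1 + odds_success)

print(odds_success)
print(log_odds)
print(p_back)
```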
How do you read data from a CSV file into R?
data = read.csv("filename.csv")
What is the R function for viewing the first few rows of a data object?
head(data)
What is the R function for viewing the names of the variables in a data object?
names(data)
What is the R code for viewing the values under a specific variable name in a data object?
data$variableName
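The reading and inspection steps can be tried end to end with a sketch like the one below. The data frame contents and the column names `id` and `smoker` are made up for illustration, and a temporary file stands in for "filename.csv".

```r
# Hypothetical example data, written to a temporary CSV file
csv_path = tempfile(fileext = ".csv")
example = data.frame(id = 1:4, smoker = c("yes", "no", "no", "yes"))
write.csv(example, csv_path, row.names = FALSE)

# Read the CSV back in and inspect it
data = read.csv(csv_path)
head(data)      # first few rows
names(data)     # variable names
data$smoker     # values of one variable
```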
What is the R code for viewing the count of each distinct value under a specific variable name in a data object?
table(data$variableName)
What is the R code for viewing the proportion of each distinct value under a specific variable name in a data object?
prop.table(table(data$variableName))
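A short illustration of both calls on a made-up variable (the `smoker` column and its values are hypothetical):

```r
# Hypothetical data: 3 "no" and 2 "yes"
data = data.frame(smoker = c("yes", "no", "no", "yes", "no"))

counts = table(data$smoker)          # count of each distinct value
proportions = prop.table(counts)     # counts divided by their total

print(counts)
print(proportions)
```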
What is the R code for adding a new variable to a data object based on some condition of each row?
data$newVariable = ifelse(data$conditionVariable == "something", 1 , 0)
Sets newVariable to 1 in rows where the condition is true, otherwise sets it to 0
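For instance, a yes/no text variable can be recoded as a 0/1 indicator. The `smoker` column and the new name `smokerFlag` are hypothetical, chosen for this sketch only.

```r
# Hypothetical data with a text variable
data = data.frame(smoker = c("yes", "no", "no", "yes"))

# 1 where smoker is "yes", 0 otherwise (evaluated row by row)
data$smokerFlag = ifelse(data$smoker == "yes", 1, 0)

print(data)
```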
What is the R code for fitting a GLM to a binomial dependent variable and viewing a summary of the model?
model1 = glm(dependent ~ explanatory, family = "binomial", data = dataObject)
summary(model1)
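To see the fit in action without a real data set, one option is to simulate binary data from a known model. Everything about the simulation below (seed, sample size, the "true" intercept of -0.5 and slope of 1.2) is an assumption made for illustration.

```r
set.seed(42)  # arbitrary seed for reproducibility
n = 200

# Simulated covariate and binary response.
# Assumed true model: logit(p) = -0.5 + 1.2 * explanatory
explanatory = rnorm(n)
p = plogis(-0.5 + 1.2 * explanatory)        # plogis() is the inverse logit
dependent = rbinom(n, size = 1, prob = p)
dataObject = data.frame(dependent, explanatory)

# Fit the binomial GLM (logit link is the default for this family)
model1 = glm(dependent ~ explanatory, family = "binomial", data = dataObject)
summary(model1)

# Estimated coefficients are on the log-odds scale
coef(model1)
```

The estimated coefficients should land reasonably close to the assumed values, but will not match them exactly because the data are random.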
What does logit(p) equate to?
log(p / (1 - p)), ie. the log of the odds of success
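A quick check of the definition in R. The probabilities are arbitrary; `qlogis()` and `plogis()` are base R's built-in logit and inverse logit.

```r
p = c(0.1, 0.5, 0.9)

# logit computed by hand: log of the odds
logit_p = log(p / (1 - p))

print(logit_p)
print(qlogis(p))          # built-in logit, should agree with logit_p
print(plogis(logit_p))    # inverse logit, should recover p
```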
How do we know how well the model fits the data?
Using the Deviance:
D = -2(l(c) - l(f))
D ~ chi-squared (approximately) with n-k-1 degrees of freedom
* where n is the number of observations
* where k+1 is the number of parameters estimated
* where l is the log-likelihood
* where c is the current model
* where f is a full/ideal model which fits all of the data
%Deviance Explained = 100 x [Dnull - Dcurrent]/[Dnull]
* where Dnull is the Deviance of the model with just the intercept
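These quantities can be pulled out of a fitted `glm` object, as in the sketch below. The simulated data (seed, sample size of 100, assumed coefficients) are illustrative only. The `null.deviance`, `deviance` and `df.residual` components are standard parts of a `glm` fit.

```r
set.seed(7)  # arbitrary seed for reproducibility

# Simulated binary data (assumed true model: logit(p) = 0.3 + 1.0 * explanatory)
explanatory = rnorm(100)
dependent = rbinom(100, size = 1, prob = plogis(0.3 + 1.0 * explanatory))
model = glm(dependent ~ explanatory, family = "binomial")

Dnull = model$null.deviance     # deviance of the intercept-only model
Dcurrent = model$deviance       # deviance of the current model

# Deviance explained, expressed as a percentage
percentDevianceExplained = 100 * (Dnull - Dcurrent) / Dnull
print(percentDevianceExplained)

# Residual degrees of freedom: n - k - 1 = 100 - 1 - 1
print(model$df.residual)

# For 0/1 responses the full model has log-likelihood 0,
# so D reduces to -2 * l(c)
print(-2 * as.numeric(logLik(model)))
```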
What is the R code for fitting a GLM to a “count” dependent variable and viewing a summary of the model?
model2 = glm(dependent ~ explanatory, family = "poisson", data = dataObject)
summary(model2)
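As with the binomial case, simulated counts give a quick way to try this out. The seed, sample size and the assumed "true" model log(lambda) = 0.5 + 0.8 * explanatory are all illustrative choices.

```r
set.seed(123)  # arbitrary seed for reproducibility
n = 200

# Simulated covariate and count response.
# Assumed true model: log(lambda) = 0.5 + 0.8 * explanatory
explanatory = runif(n, 0, 2)
lambda = exp(0.5 + 0.8 * explanatory)
dependent = rpois(n, lambda)
dataObject = data.frame(dependent, explanatory)

# Fit the Poisson GLM (log link is the default for this family)
model2 = glm(dependent ~ explanatory, family = "poisson", data = dataObject)
summary(model2)

# Coefficients are on the log scale; exponentiating gives the
# multiplicative change in the expected count per unit of explanatory
exp(coef(model2))
```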
What is the criterion for a distribution to be part of the Exponential Family?
The probability (density or mass) function of the distribution must be able to be written in the form:
exp{a(y).b(theta) + c(theta) + d(y)}
What is the Interaction Component/Term of a distribution written in Exponential form?
a(y).b(theta)
What is the Additive Component/Term of a distribution written in Exponential form?
c(theta) + d(y)
What is the Natural Parameter of a distribution written in Exponential form?
b(theta)
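As a worked illustration, the two distributions used in these notes can be rearranged into this form. For the binomial case, the number of trials n is assumed to be known.

```latex
\begin{align*}
\text{Poisson:}\quad
f(y;\lambda) &= \frac{\lambda^{y} e^{-\lambda}}{y!}
  = \exp\{\, y \log\lambda \;-\; \lambda \;-\; \log y! \,\} \\
& a(y) = y,\quad b(\lambda) = \log\lambda,\quad c(\lambda) = -\lambda,\quad d(y) = -\log y! \\[6pt]
\text{Binomial:}\quad
f(y;p) &= \binom{n}{y} p^{y} (1-p)^{n-y}
  = \exp\Big\{\, y \log\frac{p}{1-p} \;+\; n\log(1-p) \;+\; \log\binom{n}{y} \Big\} \\
& a(y) = y,\quad b(p) = \log\frac{p}{1-p},\quad c(p) = n\log(1-p),\quad d(y) = \log\binom{n}{y}
\end{align*}
```

In both cases the natural parameter b(theta) is the same function used as the link: log(lambda) for the Poisson Model and log(odds of success) for the Binomial Model.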