Describe the 2 components of a GLM
Our goal in modelling with a GLM is to shift as much of the variability as possible away from the random component and into the systematic component.
Identify the variance function for the following distributions:
1. Normal
2. Poisson
3. Gamma
4. Inv Gaussian
5. NB
6. Binomial
7. Tweedie
Define the 2 most common link functions
List 3 advantages of log link function
Log link function transforms the linear predictor into a multiplicative structure: u = exp(b0 + b1x1 +…) = exp(b0) * exp(b1x1) * …
Which has 3 advantages:
1. Simple and practical to implement
2. Avoids negative premiums that could arise with an additive structure
3. Impact of risk characteristics is more intuitive
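A minimal sketch of the multiplicative structure, using hypothetical coefficients (the base premium of 500 and the two relativities are illustrative values, not from these notes). It shows that the log-link prediction equals a base rate times one relativity per risk characteristic, and that the result is always positive:

```python
import math

# Hypothetical fitted coefficients on the log scale (illustrative values only).
b0 = math.log(500.0)  # intercept: base premium of 500
b1 = math.log(1.20)   # indicator x1: relativity of 1.20
b2 = math.log(0.90)   # indicator x2: relativity of 0.90


def predict(x1, x2):
    """Mean response under a log link: u = exp(b0 + b1*x1 + b2*x2)."""
    return math.exp(b0 + b1 * x1 + b2 * x2)


def predict_multiplicative(x1, x2):
    """The same prediction written as base rate times relativities."""
    return math.exp(b0) * math.exp(b1) ** x1 * math.exp(b2) ** x2


premium = predict(1, 1)  # roughly 500 * 1.20 * 0.90
```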
List 2 uses of offset terms
Allows you to incorporate pre-determined values for certain variables into model so GLM takes them as given.
2 uses:
1. Deductible factors are often developed outside of model
ln(u) = b0 + b1x1 +…+ ln(Rel), where Rel = 1 - LER is the deductible relativity
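A small sketch of the deductible offset, assuming the deductible relativity is 1 - LER and that the offset enters the linear predictor as its natural log with a coefficient fixed at 1. The LER table and the function names are hypothetical:

```python
import math

# Hypothetical loss elimination ratios by deductible (illustrative values only).
ler_by_deductible = {0: 0.00, 500: 0.10, 1000: 0.18}


def deductible_offset(deductible):
    """Offset on the log scale: ln(1 - LER)."""
    relativity = 1.0 - ler_by_deductible[deductible]
    return math.log(relativity)


def predict(b0, b1, x1, deductible):
    """Log-link prediction; the offset is added as given, not estimated."""
    return math.exp(b0 + b1 * x1 + deductible_offset(deductible))
```

With b0 = ln(1000) and b1 = 0, a 500 deductible scales the prediction by 0.90, so the GLM only has to explain what the deductible factor does not.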
Describe the steps to calculate offset
Describe 3 methods to assess variable significance
Describe 2 distributions appropriate for severity modelling and their 5 desired characteristics
Gamma and Inverse Gaussian
1. Right-skewed
2. Lower bound at 0
3. Sharply peaked (inv gauss > gamma)
4. Wide tail (inv gauss > gamma)
5. Larger claims have more variance (variance function u^2 for gamma, u^3 for inv gauss)
Describe 2 distributions appropriate for frequency modeling
Describe 2 characteristics that a frequency error distribution should have
Describe which distribution is appropriate for pure premium / LR modeling and give 3 reasons/desired characteristics
Tweedie:
1. Mass point at zero (lots of insured have no claims)
2. Right-skewed
3. Power parameter allows some other distributions to be special cases (p=0 for Normal, p=1 for Poisson, p=2 for Gamma, p=3 for Inv Gauss)
What happens when the power parameter of the Tweedie is between 1 and 2
Compound poisson freq & gamma sev
Smoother curve with no apparent spike
Implicit assumption that freq & sev move in same direction (often not realistic but robust enough)
Calculate mean of Tweedie
lambda * alpha * theta
Calculate power parameter of Tweedie
p = (alpha+2) / (alpha+1)
Calculate dispersion parameter of Tweedie
lambda^(1-p) * (alpha*theta)^(2-p) / (2-p)
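The three Tweedie formulas can be checked numerically. This is a sketch assuming lambda is the Poisson claim frequency and alpha, theta are the gamma shape and scale of severity; the function name and input values are illustrative. As a sanity check, the Tweedie variance phi * mean^p should match the compound Poisson variance lambda * E[X^2] = lambda * alpha * (alpha+1) * theta^2.

```python
def tweedie_parameters(lam, alpha, theta):
    """Return (mean, power p, dispersion phi) for a compound Poisson-gamma."""
    mean = lam * alpha * theta
    p = (alpha + 2) / (alpha + 1)
    phi = lam ** (1 - p) * (alpha * theta) ** (2 - p) / (2 - p)
    return mean, p, phi


# Illustrative inputs: frequency 0.1, gamma shape 1, gamma scale 1000.
mean, p, phi = tweedie_parameters(0.1, 1.0, 1000.0)

# Variance two ways: Tweedie form and compound Poisson form.
tweedie_variance = phi * mean ** p
compound_variance = 0.1 * 1.0 * (1.0 + 1.0) * 1000.0 ** 2
```

For these inputs the mean is about 100, p is 1.5 and phi is about 200, and the two variance calculations agree.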
Identify 3 ways to determine p parameter of Tweedie
Describe which distribution is appropriate for probability modeling
Binomial
Use mean as modelled prob of event occurring
Use logit link, whose inverse is the logistic function (x = linear predictor):
u = 1/(1+exp(-x))
Odds = u/(1-u)
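A short sketch of the two formulas above, assuming x stands for the linear predictor (function names are illustrative). It also shows why the coefficients are read as effects on the odds: the odds of the modelled probability equal exp(x), so each term in x multiplies the odds.

```python
import math


def logistic(x):
    """Modelled probability: u = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def odds(u):
    """Odds of the event: u / (1 - u)."""
    return u / (1.0 - u)


u = logistic(0.0)            # linear predictor of 0 gives probability 0.5
odds_at_one = odds(logistic(1.0))  # equals exp(1.0) up to rounding
```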
It is good practice to log continuous variables before using them in the model (when a log link is used).
Explain why and give 2 exceptions.
Forces alignment of the predictors' scale with that of the entity they are predicting. Allows flexibility in fitting different curve shapes.
2 exceptions:
1. Using a year variable to pick up trend effects
2. If variable contains values of 0 (since ln(0) is undefined)
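The curve-shape flexibility can be seen with a small sketch, assuming a log link and a single logged predictor (coefficients are hypothetical). The prediction is exp(b0 + b1*ln(x)) = exp(b0) * x^b1, so the coefficient acts as a power on x: b1 = 1 gives a proportional relationship, b1 < 1 a curve that flattens, b1 > 1 one that steepens.

```python
import math


def predict_logged(b0, b1, x):
    """Log-link prediction with a logged predictor: exp(b0 + b1*ln(x))."""
    return math.exp(b0 + b1 * math.log(x))


# With b0 = 0, the prediction is simply x raised to the power b1.
proportional = predict_logged(0.0, 1.0, 4.0)  # about 4
flattening = predict_logged(0.0, 0.5, 4.0)    # about 2
steepening = predict_logged(0.0, 2.0, 4.0)    # about 16
```

Without the log, the same model would force an exponential curve exp(b1*x) on the predictor.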
Why do we prefer choosing level with most observations as base level
Otherwise, there will be wide CIs around the coefficient estimates (although the predicted relativities are the same)
Discuss how high correlation between 2 predictor variables can impact GLM
Main benefit of a GLM over univariate analysis is being able to handle exposure correlation.
However, GLMs run into problems when predictor variables are very highly correlated. This can result in an unstable model, erratic coefficients and high standard errors.
Describe 2 options to deal with very high correlation in GLM
Describe multicollinearity, its potential impacts and how to detect
Occurs when there is a near-perfect linear dependency among 3 or more predictor variables.
When it exists, the model may become unstable with erratic coefficients and may not converge to a solution
One way to detect it is to use the variance inflation factor (VIF), which measures the impact on the squared standard error of a predictor due to the presence of collinearity with other predictors.
A VIF of 10 or more is considered high and would indicate a need to look into the collinearity structure to determine how best to adjust the model.
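A sketch of a VIF calculation in plain Python, using the standard definition VIF = 1 / (1 - R^2), where R^2 comes from an ordinary least-squares regression of one predictor on the others. The helper name and the data are hypothetical, and the code handles only the case of one target predictor and two others; statistical packages provide a general version.

```python
def vif(target, other_a, other_b):
    """VIF of `target` given two other predictors, via OLS on centered data."""
    n = len(target)

    def centered(values):
        m = sum(values) / n
        return [v - m for v in values]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    t, a, b = centered(target), centered(other_a), centered(other_b)
    s_aa, s_bb, s_ab = dot(a, a), dot(b, b), dot(a, b)
    s_ta, s_tb, s_tt = dot(t, a), dot(t, b), dot(t, t)

    # Solve the 2x2 normal equations for the slopes on other_a and other_b.
    det = s_aa * s_bb - s_ab * s_ab
    coef_a = (s_ta * s_bb - s_tb * s_ab) / det
    coef_b = (s_tb * s_aa - s_ta * s_ab) / det

    r_squared = (coef_a * s_ta + coef_b * s_tb) / s_tt
    return 1.0 / (1.0 - r_squared)


# Illustrative data: x1 is almost exactly x2 + x3, so its VIF is very high.
x2 = [1, 2, 3, 4, 5, 6]
x3 = [2, 1, 4, 3, 6, 5]
x1 = [3.2, 2.9, 7.0, 7.1, 10.8, 11.0]
high_vif = vif(x1, x2, x3)

# Mutually uncorrelated predictors give a VIF of 1 (no inflation).
no_inflation = vif([1, -1, -1, 1], [1, 1, -1, -1], [1, -1, 1, -1])
```

Note that x1 here is not strongly tied to x2 or x3 alone in an obvious way; the dependency only shows up when both are considered together, which is why pairwise correlations can miss multicollinearity.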
Describe aliasing
Aliasing occurs when there is a perfect linear dependency among predictor variables (e.g. when missing data are excluded)
The GLM will not converge (no unique solution) or, if it does, the coefficients will make no sense.
Most GLM software will detect this and automatically remove one of the variables.
Identify 2 important limitations of GLM