What is the purpose of the GLM model?
To model the incremental losses q(w,d)
How to handle negative incremental values in the triangle?
How to calculate Standardized Residual
Residual divided by its standard deviation (aka standard error)
Model Result Reasonability check for Standard Error
Standard Error
- S.E. should increase from older to more recent years (bc s.e. follows the magnitude of the results)
- Total s.e. should be larger than any individual year s.e.
- Total s.e. should be less than the sum of s.e. across all AYs (since the model assumes independence btw AYs)
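The two total-level checks above can be sketched as a small helper. This is only an illustration: the function name and the standard errors are hypothetical, and the total is built under the independence assumption the notes mention (root of the summed variances).

```python
import math

def check_total_standard_error(se_by_ay, total_se):
    """Return the two total-level reasonability checks as booleans."""
    return {
        "total_exceeds_each_year": total_se > max(se_by_ay),
        "total_below_sum": total_se < sum(se_by_ay),
    }

# Hypothetical standard errors for four accident years, oldest first.
se_by_ay = [10.0, 25.0, 60.0, 120.0]

# Assumption: fully independent AYs, so variances (not s.e.) add up.
total_se = math.sqrt(sum(se ** 2 for se in se_by_ay))

checks = check_total_standard_error(se_by_ay, total_se)
```

With independent years the total always lands between the largest single-year s.e. and the plain sum, which is why both checks are expected to pass.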
Model Result Reasonability check for CoV
CoV
- CoV should GENERALLY decrease when moving from oldest to most recent years
- CoV may rise in the most recent years due to:
1. Increasing parameters will bring in higher parameter uncertainty to more recent years
2. The model may be overestimating the variability in the most recent years
- Total CoV should be smaller than any individual year’s CoV
- Total CoV should be less than the sum of CoV across all AYs
Why is homoscedasticity needed for bootstrapping?
Bootstrapping assumes the residuals are independent and identically distributed. Heteroscedasticity violates this assumption because the residuals then have different variances.
What are the solutions for heteroscedasticity?
How to handle heteroscedasticity with stratified sampling
Group development periods with homogeneous variances
Sample with replacement from the groups only
Disadvantage - some groups may have limited residuals thus reduced credibility
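A minimal sketch of stratified sampling, assuming the residuals have already been split into hetero groups. The group names, residual values, and function name are hypothetical.

```python
import random

def stratified_resample(residuals_by_group, rng):
    """Resample with replacement, each group drawing only from its own residuals."""
    return {
        group: [rng.choice(values) for _ in values]
        for group, values in residuals_by_group.items()
    }

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical standardized residuals, grouped by development periods
# that are judged to have similar variance.
residuals_by_group = {
    "early": [-2.1, 1.8, 0.3, -0.9, 2.4],
    "late": [0.2, -0.1, 0.15],
}

sample = stratified_resample(residuals_by_group, rng)
```

The "late" group only has three residuals to draw from, which illustrates the credibility concern noted above.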
How to handle heteroscedasticity by calculating variance parameters
Group development periods with homogeneous variances
Calculate the standard deviation of the standardized residuals in each of the “hetero” groups
Calculate the hetero-adjustment factor h_i = (std dev of all standardized residuals combined) / (std dev of standardized residuals in group i)
Multiply all residuals in group i by h_i
Sample with replacement from the ENTIRE triangle, then divide each resampled residual by the h_i of the group its destination cell belongs to
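The steps above can be sketched as follows, assuming the hetero groups are already chosen and using the sample standard deviation. Function names and residual values are hypothetical.

```python
import random
import statistics

def hetero_factors(residuals_by_group):
    """h_i = stdev(all residuals) / stdev(residuals in group i)."""
    all_residuals = [r for values in residuals_by_group.values() for r in values]
    overall_sd = statistics.stdev(all_residuals)
    return {g: overall_sd / statistics.stdev(v) for g, v in residuals_by_group.items()}

def resample_with_hetero_adjustment(residuals_by_group, rng):
    h = hetero_factors(residuals_by_group)
    # Multiply by h_i so every residual sits on a common variance scale.
    pool = [r * h[g] for g, values in residuals_by_group.items() for r in values]
    # Sample from the whole pool, then divide by the h_i of the destination group.
    return {
        g: [rng.choice(pool) / h[g] for _ in values]
        for g, values in residuals_by_group.items()
    }

# Hypothetical standardized residuals in two hetero groups.
residuals_by_group = {
    "early": [-2.1, 1.8, 0.3, -0.9, 2.4],
    "late": [0.2, -0.1, 0.15],
}
h = hetero_factors(residuals_by_group)
sample = resample_with_hetero_adjustment(residuals_by_group, random.Random(0))
```

Unlike stratified sampling, every cell can draw from the full pool of residuals; the final division restores each group's own variance level.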
How to handle heteroscedasticity with scale parameters
Hetero-adj factors based on different scale params
Use the ratio of SQRT(overall scale param) to SQRT(scale param for group i).
N = total cells in the triangle
p = alpha (AYs) + beta (development periods, usually alpha - 1) + # hetero adj factors
# hetero adj factors = # of hetero groups - 1
n_i = # cells in group i
r(w,d) here is the Pearson (unscaled) residual, not the standardized residual
Then hetero-adj factors are used the same way as in hetero-adj based on s.e. of residuals
see below for scale param for hetero group i
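The formula itself is not reproduced in these notes. The sketch below assumes the form commonly given for the ODP bootstrap: scale param for group i = [N / (N - p)] x (sum of r(w,d)^2 over group i) / n_i, with the overall scale param = (sum of all r(w,d)^2) / (N - p). The residual values, group labels, and function name are hypothetical.

```python
import math

def hetero_scale_parameters(residuals_by_group, p):
    """Return (overall scale, scale by group, hetero-adj factor by group)."""
    n_total = sum(len(v) for v in residuals_by_group.values())  # N
    dof = n_total - p  # degrees of freedom, N - p
    sum_sq = {g: sum(r * r for r in v) for g, v in residuals_by_group.items()}
    overall = sum(sum_sq.values()) / dof
    by_group = {
        g: (n_total / dof) * sum_sq[g] / len(v)  # len(v) is n_i
        for g, v in residuals_by_group.items()
    }
    factors = {g: math.sqrt(overall) / math.sqrt(phi_i) for g, phi_i in by_group.items()}
    return overall, by_group, factors

# Hypothetical unscaled Pearson residuals for a 4x4 triangle (N = 10 cells).
residuals_by_group = {
    "dev 1-2": [3.0, -2.5, 1.5, -4.0, 2.0, 0.5, -1.0],
    "dev 3-4": [0.4, -0.3, 0.2],
}
# p = 4 AY params + 3 development params + (2 hetero groups - 1)
p = 4 + 3 + (len(residuals_by_group) - 1)

phi, phi_by_group, h = hetero_scale_parameters(residuals_by_group, p)
```

Under this assumed form, the n_i-weighted average of the group scale params equals the overall scale param, which is a quick sanity check on the grouping.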
How to handle Exposure Change in triangle
Divide all loss data by exposure for each AY to get Pure Premium
Run model based on pure premium
Multiply the results by exposure to convert back to losses
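A small sketch of the exposure adjustment, assuming one exposure figure per AY. The data layout (dicts keyed by AY), the numbers, and the function names are hypothetical.

```python
def to_pure_premium(incremental_by_ay, exposure_by_ay):
    """Divide each AY's incremental losses by that AY's exposure."""
    return {
        ay: [q / exposure_by_ay[ay] for q in row]
        for ay, row in incremental_by_ay.items()
    }

def apply_exposure(pure_premium_by_ay, exposure_by_ay):
    """Multiply modeled pure premiums by exposure to get back to losses."""
    return {
        ay: [x * exposure_by_ay[ay] for x in row]
        for ay, row in pure_premium_by_ay.items()
    }

# Hypothetical incremental losses and exposures for three AYs.
incremental_by_ay = {2021: [500.0, 250.0, 100.0], 2022: [660.0, 330.0], 2023: [900.0]}
exposure_by_ay = {2021: 100.0, 2022: 120.0, 2023: 150.0}

pure_premium = to_pure_premium(incremental_by_ay, exposure_by_ay)
# The model would be run on pure_premium here; this sketch skips that step.
losses_again = apply_exposure(pure_premium, exposure_by_ay)
```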
How to handle Heteroecthesious data in triangle (partial first development period data)
Partial first development period data -
reduce future incremental losses for the latest AY to correspond to the earned exposure
-> then simulate process variance
How to handle Heteroecthesious data in triangle (partial last calendar period data)
Partial last calendar period data -
1. Annualize the last diagonal so that it is in line with the rest of the triangle
2. Calculate the fitted triangle and residuals
3. During ODP bootstrap simulation, calculate and interpolate LDFs from the fully annualized sample triangles
4. De-annualize the last diagonal
5. Project future values by multiplying the interpolated LDFs with the new cumulative values
6. Reduce future incremental values for the latest AY to remove future exposure
Formula for unscaled Pearson residual
General ODP Bootstrap
GLM model setup
Formula for unscaled Pearson residuals
Note that z = 1 for Poisson (most of the time)
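The formula is not written out in these notes. The sketch below assumes the usual ODP bootstrap form, r(w,d) = (q(w,d) - m(w,d)) / sqrt(m(w,d)^z), where q is the actual incremental value and m is the fitted one. The function name and inputs are hypothetical.

```python
import math

def unscaled_pearson_residual(q, m, z=1):
    """(actual - fitted) / sqrt(fitted ** z); assumes the fitted value m > 0."""
    return (q - m) / math.sqrt(m ** z)

# Hypothetical cell: actual incremental 120, fitted incremental 100.
r_poisson = unscaled_pearson_residual(120.0, 100.0)      # z = 1
r_gamma = unscaled_pearson_residual(120.0, 100.0, z=2)   # z = 2
```

A larger z puts more weight on the fitted value in the denominator, so the same 20-point miss produces a smaller residual.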
Formula for standardized residuals
The power z in the estimated variance depends on the distribution:
Poisson z = 1
Gamma z = 2
Inverse Gaussian z = 3
Ways to handle Outliers
Three options to remove outliers when calculating ATA factors
What do we do if there is a significant number of outliers?
May indicate poor fit of model
For GLM bootstrap, choose new params/change the distribution
For ODP bootstrap, L-yr wtd avg can be used to provide a better model fit, but if skewness is real, then the bootstrap will keep it
how to handle missing values?
GLM Bootstrap Model -
Missing data simply reduces the number of observations in the data
ODP Bootstrap Model -
1. Estimate the missing value from surrounding values
2. Modify the LDFs to exclude the missing value; there is no residual for the missing value, so don't resample from it
3. If the missing value is on the latest diagonal, estimate the value or use the value in the 2nd to last diagonal
How to handle negative values during simulation of process variance (i.e., m(w,d) is negative)?
Option 1:
- simulate using abs(m(w,d)), then change the sign of the simulated value
Option 2:
- shift the entire distribution to have a mean of m(w,d)
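Both options can be sketched as below. The sketch assumes process variance is simulated from a gamma distribution with mean abs(m) and variance phi x abs(m); that choice, the function name, and the parameter values are assumptions rather than something stated in these notes. It also assumes m is nonzero.

```python
import random

def simulate_incremental(m, phi, rng, option=1):
    """Simulate one incremental value with mean m; assumes m != 0 and phi > 0."""
    # Gamma with mean |m| and variance phi * |m|: shape = |m| / phi, scale = phi.
    draw = rng.gammavariate(abs(m) / phi, phi)
    if m > 0:
        return draw
    if option == 1:
        return -draw        # Option 1: flip the sign of the simulated value
    return draw + 2 * m     # Option 2: shift so the mean is m (since |m| + 2m = m)

rng = random.Random(1)  # fixed seed so the sketch is repeatable
m, phi = -50.0, 2.0     # hypothetical negative fitted mean and scale param

option_1 = [simulate_incremental(m, phi, rng, option=1) for _ in range(20000)]
option_2 = [simulate_incremental(m, phi, rng, option=2) for _ in range(20000)]
```

Option 1 can only produce negative values and mirrors the skew of the gamma. Option 2 keeps the original skew and can produce positive values, while still averaging to m.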