Original link: http://tecdat.cn/?p=13963

Original source: Tecdat (拓端数据部落) official account

 

In actuarial science and insurance ratemaking, taking exposure into account can be a nightmare. Somehow, the simplest results become more complicated to compute, simply because we have to take into account the fact that exposure is a heterogeneous variable.

 

Exposure in premium rating can be seen as a problem with the observed data (in my data set, the exposure is always less than 1, because the observation is the contract, not the policyholder), while the variable of interest is unobserved, since we have to price an insurance contract for one (full) year of coverage. Therefore, we must model the annual frequency of insurance claims.

 

In our data set, the natural estimate is the ratio of total claims to total exposure. Indeed, if we assume that the number of claims N_i on a contract with exposure E_i follows a Poisson distribution with mean lambda*E_i, the probability is

P(N_i = k) = exp(-lambda*E_i) * (lambda*E_i)^k / k!

namely, maximizing the log-likelihood

log L(lambda) = -lambda * sum(E_i) + log(lambda) * sum(N_i) + constant

yields the maximum-likelihood estimator

lambda_hat = sum(N_i) / sum(E_i)

So we have a natural estimate of the expected (annual) value.

Now, we need to estimate the variance, or more precisely, the conditional variance. A natural estimator is

Var_hat = sum((N_i - lambda_hat * E_i)^2) / sum(E_i)

This can be used to test whether the Poisson assumption is valid for frequency modeling: under a Poisson model, the variance should equal the mean. Consider the following data set,

>  nombre=rbind(nombre1,nombre2)
>  baseFREQ = merge(contrat,nombre)

Here, we have two variables of interest: the exposure per contract,

>  E <- baseFREQ$exposition

and the (observed) number of claims over that period,

>  Y <- baseFREQ$nbre
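For readers without the original tables, here is a toy version of the same pipeline; the data frames `contrat`, `nombre1`, `nombre2` and their contents below are made-up stand-ins for the author's data set, only the column names follow the post:

```r
# Toy stand-ins for the author's policy and claims tables
contrat <- data.frame(nocontrat  = 1:5,
                      exposition = c(0.5, 1, 0.25, 0.75, 1))  # exposure, in years
nombre1 <- data.frame(nocontrat = 1:3, nbre = c(0, 1, 0))     # claim counts, part 1
nombre2 <- data.frame(nocontrat = 4:5, nbre = c(2, 0))        # claim counts, part 2

nombre   <- rbind(nombre1, nombre2)
baseFREQ <- merge(contrat, nombre)  # join on the common key, nocontrat

E <- baseFREQ$exposition  # exposure per contract
Y <- baseFREQ$nbre        # observed number of claims
```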

Without covariates, the average (annual) number of claims per contract and the associated variance can be computed,

> (mean=weighted.mean(Y/E,E))
[1] 0.07279295
> (variance=sum((Y-mean*E)^2)/sum(E))
[1] 0.08778567
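As a sanity check, the exposure-weighted estimator recovers the true annual rate on simulated Poisson data; the sketch below uses made-up exposures and a made-up frequency, not the author's portfolio:

```r
set.seed(1)
n      <- 1e5
lambda <- 0.07                       # true annual claim frequency (made up)
E_sim  <- runif(n, 0.1, 1)           # simulated exposures, all below one year
Y_sim  <- rpois(n, lambda * E_sim)   # claim counts over each exposure period

(m <- weighted.mean(Y_sim / E_sim, E_sim))      # estimates lambda: sum(Y)/sum(E)
(v <- sum((Y_sim - m * E_sim)^2) / sum(E_sim))  # weighted variance estimate
# For true Poisson data, m and v should both be close to lambda = 0.07
```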

It looks like the variance is (slightly) greater than the mean (we will see how to test this more formally in a few weeks). We can add covariates, such as the area where the policyholder lives, e.g. the population density zone,


Density, zone 11 average = 0.07962411  variance = 0.08711477 
Density, zone 21 average = 0.05294927  variance = 0.07378567 
Density, zone 22 average = 0.09330982  variance = 0.09582698 
Density, zone 23 average = 0.06918033  variance = 0.07641805 
Density, zone 24 average = 0.06004009  variance = 0.06293811 
Density, zone 25 average = 0.06577788  variance = 0.06726093 
Density, zone 26 average = 0.0688496   variance = 0.07126078 
Density, zone 31 average = 0.07725273  variance = 0.09067 
Density, zone 41 average = 0.03649222  variance = 0.03914317 
Density, zone 42 average = 0.08333333  variance = 0.1004027 
Density, zone 43 average = 0.07304602  variance = 0.07209618 
Density, zone 52 average = 0.06893741  variance = 0.07178091 
Density, zone 53 average = 0.07725661  variance = 0.07811935 
Density, zone 54 average = 0.07816105  variance = 0.08947993 
Density, zone 72 average = 0.08579731  variance = 0.09693305 
Density, zone 73 average = 0.04943033  variance = 0.04835521 
Density, zone 74 average = 0.1188611   variance = 0.1221675 
Density, zone 82 average = 0.09345635  variance = 0.09917425 
Density, zone 83 average = 0.04299708  variance = 0.05259835 
Density, zone 91 average = 0.07468126  variance = 0.3045718 
Density, zone 93 average = 0.08197912  variance = 0.09350102 
Density, zone 94 average = 0.03140971  variance = 0.04672329
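The per-zone figures above can be produced with a short helper; the sketch below builds a toy `baseFREQ` (the column names `zone`, `nbre` and `exposition` follow the post, the values are made up):

```r
# Empirical mean and variance of the annual frequency, weighted by exposure
empirical_mv <- function(Y, E) {
  m <- weighted.mean(Y / E, E)
  v <- sum((Y - m * E)^2) / sum(E)
  c(average = m, variance = v)
}

# Toy data standing in for the author's baseFREQ
baseFREQ <- data.frame(zone       = rep(c("11", "21"), each = 4),
                       nbre       = c(0, 1, 0, 0, 1, 0, 2, 0),
                       exposition = c(0.5, 1, 0.25, 1, 0.75, 1, 1, 0.5))

# One (average, variance) pair per zone
res <- sapply(split(baseFREQ, baseFREQ$zone),
              function(d) empirical_mv(d$nbre, d$exposition))
res
```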

This information can be visualized

> plot(meani,variancei,cex=sqrt(Ei),col="grey",pch=19,
+ xlab="Empirical average",ylab="Empirical variance")
> points(meani,variancei,cex=sqrt(Ei))

 

The size of each circle is related to the size of the group (the area is proportional to the total exposure within the group). The first diagonal corresponds to the Poisson model, for which the variance should equal the mean. Other covariates can also be considered,

 

Or car brands,

 

The age of the driver can also be regarded as a categorical variable.

Let’s take a closer look at different age groups,

 

On the right, we find the young (inexperienced) drivers. That was to be expected. But some of those classes lie below the first diagonal: the expected frequency is high, but the variance is small. In other words, we can be fairly sure that young drivers have more accidents; this is not an anomaly: young drivers can be regarded as a relatively homogeneous class, simply with a high claim frequency.

Using the original data set (here, I am only using a subset with 50,000 customers), we get the following graph:

 

Since the circles drop from age 18 to 25, there is a noticeable experience effect.

Finally, note that it is also possible to treat exposure as a standard explanatory variable (rather than an offset), and to test whether its coefficient is actually equal to 1. Without any covariates,

Call:
glm(formula = Y ~ log(E), family = poisson("log"))

Deviance Residuals: 
    Min      1Q  Median      3Q     Max  
-0.3988 -0.3388 -0.2786 -0.1981 12.9036  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -2.83045    0.02822 -100.31   <2e-16 ***
log(E)       0.53950    0.02905   18.57   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 12931  on 49999  degrees of freedom
Residual deviance: 12475  on 49998  degrees of freedom
AIC: 16150

Number of Fisher Scoring iterations: 6

That is, the parameter is clearly strictly less than 1, and significantly so,

Linear hypothesis test

Hypothesis:
log(E) = 1

Model 1: restricted model
Model 2: Y ~ log(E)

  Res.Df Df  Chisq Pr(>Chisq)    
1  49999                         
2  49998  1 251.19  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
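The same comparison can be sketched on simulated data, without the `car` package, via a Wald test of the coefficient against 1. Everything below (sample size, exposures, frequency) is made up, not the author's portfolio:

```r
set.seed(42)
n     <- 5e4
E_sim <- runif(n, 0.1, 1)
Y_sim <- rpois(n, 0.07 * E_sim)  # here the data truly follow the offset model

# Fit log-exposure as a free covariate instead of an offset
fit  <- glm(Y_sim ~ log(E_sim), family = poisson("log"))
beta <- coef(fit)["log(E_sim)"]
se   <- sqrt(vcov(fit)["log(E_sim)", "log(E_sim)"])

# Wald test of H0: beta = 1, i.e. the offset specification
z <- (beta - 1) / se
2 * pnorm(-abs(z))  # p-value: large here, unlike on the author's portfolio
```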

Taking covariates into account,

Deviance Residuals: 
    Min      1Q  Median      3Q     Max  
-0.7114 -0.3200 -0.2637 -0.1896 12.7104  

Coefficients:
                             Estimate Std. Error z value Pr(>|z|)    
(Intercept)                  14.07321  181.04892   0.078 0.938042    
log(exposition)               0.56781    0.03029  18.744  < 2e-16 ***
carburantE                   -0.17979    0.04630  -3.883 0.000103 ***
as.factor(ageconducteur)19   12.18354  181.04915   0.067             
as.factor(ageconducteur)20   12.48752  181.04902   0.069 0.945011

Therefore, it may be too strong an assumption to assume that exposure is the exogenous variable here.

Next, let us discuss overdispersion in claim-frequency modeling. Earlier, I discussed computing the empirical variance at different exposure levels, but I used only one factor to build the classes. Of course, more factors can be used, for example the Cartesian product of several factors,

Class D A (17,24]  average = 0.06274415  variance = 0.06174966
Class D A (24,40]  average = 0.07271905  variance = 0.07675049
Class D A (40,65]  average = 0.05432262  variance = 0.06556844
Class D A (65,101] average = 0.03026999  variance = 0.02960885
Class D B (17,24]  average = 0.2383109   variance = 0.2442396
Class D B (24,40]  average = 0.06662015  variance = 0.07121064
Class D B (40,65]  average = 0.05551854  variance = 0.05543831
Class D B (65,101] average = 0.0556386   variance = 0.0540786
Class D C (17,24]  average = 0.1524552   variance = 0.1592623
Class D C (24,40]  average = 0.0795852   variance = 0.09091435
Class D C (40,65]  average = 0.07554481  variance = 0.08263404
Class D C (65,101] average = 0.06936605  variance = 0.06684982
Class D D (17,24]  average = 0.1584052   variance = 0.1552583
Class D D (24,40]  average = 0.1079038   variance = 0.121747
Class D D (40,65]  average = 0.06989518  variance = 0.07780811
Class D D (65,101] average = 0.0470501   variance = 0.04575461
Class D E (17,24]  average = 0.2007164   variance = 0.2647663
Class D E (24,40]  average = 0.1121569   variance = 0.1172205
Class D E (40,65]  average = 0.106563    variance = 0.1068348
Class D E (65,101] average = 0.1572701   variance = 0.2126338
Class D F (17,24]  average = 0.2314815   variance = 0.1616788
Class D F (24,40]  average = 0.1690485   variance = 0.1443094
Class D F (40,65]  average = 0.08496827  variance = 0.07914423
Class D F (65,101] average = 0.1547769   variance = 0.1442915
Class E A (17,24]  average = 0.1275345   variance = 0.1171678
Class E A (24,40]  average = 0.04523504  variance = 0.04741449
Class E A (40,65]  average = 0.05402834  variance = 0.05427582
Class E A (65,101] average = 0.04176129  variance = 0.04539265
Class E B (17,24]  average = 0.1114712   variance = 0.1059153
Class E B (24,40]  average = 0.04211314  variance = 0.04068724
Class E B (40,65]  average = 0.04987117  variance = 0.05096601
Class E B (65,101] average = 0.03123003  variance = 0.03041192
Class E C (17,24]  average = 0.1256302   variance = 0.1310862
Class E C (24,40]  average = 0.05118006  variance = 0.05122782
Class E C (40,65]  average = 0.05394576  variance = 0.05594004
Class E C (65,101] average = 0.04570239  variance = 0.04422991
Class E D (17,24]  average = 0.1777142   variance = 0.1917696
Class E D (24,40]  average = 0.06293331  variance = 0.06738658
Class E D (40,65]  average = 0.08532688  variance = 0.2378571
Class E D (65,101] average = 0.05442916  variance = 0.05724951
Class E E (17,24]  average = 0.1826558   variance = 0.2085505
Class E E (24,40]  average = 0.07804062  variance = 0.09637156
Class E E (40,65]  average = 0.08191469  variance = 0.08791804
Class E E (65,101] average = 0.1017367   variance = 0.1141004
Class E F (17,24]  average = 0           variance = 0
Class E F (24,40]  average = 0.07731177  variance = 0.07415932
Class E F (40,65]  average = 0.1081142   variance = 0.1074324
Class E F (65,101] average = 0.09071118  variance = 0.1170159
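The Cartesian product of two factors can be formed with `interaction()`. In the sketch below, the column names `groupe` and `age`, and all the simulated values, are hypothetical stand-ins; only the age cut points follow the output above:

```r
set.seed(7)
n <- 2000
# Hypothetical stand-ins for the two rating factors in baseFREQ
baseFREQ <- data.frame(groupe     = sample(LETTERS[1:6], n, replace = TRUE),
                       age        = sample(18:80, n, replace = TRUE),
                       exposition = runif(n, 0.1, 1))
baseFREQ$nbre     <- rpois(n, 0.07 * baseFREQ$exposition)
baseFREQ$ageclass <- cut(baseFREQ$age, breaks = c(17, 24, 40, 65, 101))

# Cartesian product of the two factors: one class per (groupe, age band) pair
cl <- interaction(baseFREQ$groupe, baseFREQ$ageclass)

# Empirical annual frequency per class: total claims / total exposure
freq <- tapply(baseFREQ$nbre, cl, sum) / tapply(baseFREQ$exposition, cl, sum)
head(freq)
```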

Again, you can plot the variance against the mean,

> plot(vm,vv,cex=sqrt(ve),col="grey",pch=19,
+ xlab="Empirical average",ylab="Empirical variance")
> points(vm,vv,cex=sqrt(ve))
> abline(a=0,b=1,lty=2)

 

An alternative is to use trees. The tree is built from the other covariates, and it should be fairly close to our ideal partition. In this case, I used the entire database (over 600,000 rows).

The tree is as follows

> plot(T)
> text(T)

 

Each terminal leaf of the tree now defines a class, which can be assumed to be homogeneous.

Class 6   average = 0.04010406  variance = 0.04424163
Class 8   average = 0.05191127  variance = 0.05948133
Class 9   average = 0.07442635  variance = 0.08694552
Class 10  average = 0.4143646   variance = 0.4494002
Class 11  average = 0.1917445   variance = 0.1744355
Class 15  average = 0.04754595  variance = 0.05389675
Class 20  average = 0.08129577  variance = 0.0906322
Class 22  average = 0.05813419  variance = 0.07089811
Class 23  average = 0.06123807  variance = 0.07010473
Class 24  average = 0.06707301  variance = 0.07270995
Class 25  average = 0.3164557   variance = 0.2026906
Class 26  average = 0.08705041  variance = 0.108456
Class 27  average = 0.06705214  variance = 0.07174673
Class 30  average = 0.05292652  variance = 0.06127301
Class 31  average = 0.07195285  variance = 0.08620593
Class 32  average = 0.08133722  variance = 0.08960552
Class 34  average = 0.1831559   variance = 0.2010849
Class 39  average = 0.06173885  variance = 0.06573939
Class 41  average = 0.07089419  variance = 0.07102932
Class 44  average = 0.09426152  variance = 0.1032255
Class 47  average = 0.03641669  variance = 0.03869702
Class 49  average = 0.0506601   variance = 0.05089276
Class 50  average = 0.06373107  variance = 0.06536792
Class 51  average = 0.06762947  variance = 0.06926191
Class 56  average = 0.06771764  variance = 0.07122379
Class 57  average = 0.04949142  variance = 0.05086885
Class 58  average = 0.2459016   variance = 0.2451116
Class 59  average = 0.05996851  variance = 0.0615773
Class 61  average = 0.07458053  variance = 0.0818608
Class 63  average = 0.06203737  variance = 0.06249892
Class 64  average = 0.07321618  variance = 0.07603106
Class 66  average = 0.07332127  variance = 0.07262425
Class 68  average = 0.07478147  variance = 0.07884597
Class 70  average = 0.06566728  variance = 0.06749411
Class 71  average = 0.09159605  variance = 0.09434413
Class 75  average = 0.03228927  variance = 0.03403198
Class 76  average = 0.04630848  variance = 0.04861813
Class 78  average = 0.05342351  variance = 0.05626653
Class 79  average = 0.05778622  variance = 0.05987139
Class 80  average = 0.0374993   variance = 0.0385351
Class 83  average = 0.06721729  variance = 0.07295168
Class 86  average = 0.09888492  variance = 0.1131409
Class 87  average = 0.1019186   variance = 0.2051122
Class 88  average = 0.05281703  variance = 0.0635244
Class 91  average = 0.08332136  variance = 0.09067632
Class 96  average = 0.07682093  variance = 0.08144446
Class 97  average = 0.0792268   variance = 0.08092019
Class 99  average = 0.1019089   variance = 0.1072126
Class 100 average = 0.1018262   variance = 0.1081117
Class 101 average = 0.1106647   variance = 0.1151819
Class 103 average = 0.08147644  variance = 0.08411685
Class 104 average = 0.06456508  variance = 0.06801061
Class 107 average = 0.1197225   variance = 0.1250056
Class 108 average = 0.0924619   variance = 0.09845582
Class 109 average = 0.1198932   variance = 0.1209162
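A tree-based classing can be sketched with `rpart` on simulated data. The response specification `cbind(exposition, nbre)` with `method = "poisson"` follows rpart's documented interface (first column is observation time, second is event count); the variable names and the simulated frequencies are made up:

```r
library(rpart)  # recommended package shipped with R
set.seed(123)
n <- 2e4
base <- data.frame(age        = sample(18:80, n, replace = TRUE),
                   densite    = runif(n, 10, 5000),
                   exposition = runif(n, 0.1, 1))
# Made-up frequency: higher for young drivers and for dense areas
lambda    <- 0.03 + 0.10 * (base$age < 25) + 1e-5 * base$densite
base$nbre <- rpois(n, lambda * base$exposition)

# Poisson tree: response is (observation time, number of events)
T <- rpart(cbind(exposition, nbre) ~ age + densite, data = base,
           method = "poisson", cp = 0.001)

# Each terminal leaf defines a class; compute per-leaf mean and variance
leaf  <- T$where
stats <- t(sapply(split(base, leaf), function(d) {
  m <- weighted.mean(d$nbre / d$exposition, d$exposition)
  c(average  = m,
    variance = sum((d$nbre - m * d$exposition)^2) / sum(d$exposition))
}))
head(stats)
```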

Here, when plotting the empirical variance based on the empirical mean of the claim, we get

 

Here, we can identify classes with residual heterogeneity.