The original link: http://tecdat.cn/?p=14017.

Original source: Tuo Duan Data Tribe official account

 

Usually, one of the first things we say about regression models is: "please look at the data."

In the previous article, we did not look at the data. If we look at the distribution of the individual losses in our dataset, we see the following:

> # empirical cumulative distribution of the individual claim costs
> n=nrow(couts)
> plot(sort(couts$cout),(1:n)/(n+1),xlim=c(0,10000),type="s",lwd=2,col="green")

 

It looks like we have fixed-cost claims in our database. What do we do in the standard case? We can use a mixture distribution here,

$$f(y)=p_1\,f_{\text{small}}(y)+p_2\,\delta_{\kappa}(y)+p_3\,f_{\text{large}}(y),$$

with

  • a distribution of small claims $f_{\text{small}}$, for example an exponential distribution,
  • a Dirac mass $\delta_{\kappa}$ at the fixed cost $\kappa$,
  • a distribution of large claims $f_{\text{large}}$, such as a Gamma distribution or a lognormal distribution.
>  # split the claims into three groups: small, fixed cost (around 1172), large
>  I1=which(couts$cout<1120)
>  I2=which((couts$cout>=1120)&(couts$cout<1220))
>  I3=which(couts$cout>=1220)
>  (p1=length(I1)/nrow(couts))
[1] 0.3284823
>  (p2=length(I2)/nrow(couts))
[1] 0.4152807
>  (p3=length(I3)/nrow(couts))
[1] 0.256237
>  X=couts$cout
>  (kappa=mean(X[I2]))
[1] 1171.998
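As a quick sanity check, the overall average claim cost decomposes exactly over the three groups (a small sketch using the objects just defined):

# law of total expectation over the partition: should match mean(X)
p1*mean(X[I1]) + p2*kappa + p3*mean(X[I3])
mean(X)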

 

In the previous article, we discussed the idea that all of these parameters may depend on some covariates $x$, i.e. $p_1(x)$, $p_2(x)$ and $p_3(x)$, which produces the following model,

$$f(y\mid x)=p_1(x)\,f_{\text{small}}(y\mid x)+p_2(x)\,\delta_{\kappa}(y)+p_3(x)\,f_{\text{large}}(y\mid x).$$

For the probabilities, we should use a multinomial model. Recall that in the logistic regression model, if $Y\in\{0,1\}$,

$$\mathbb{P}(Y=1\mid X=x)=\frac{e^{x^{\top}\beta}}{1+e^{x^{\top}\beta}}=1-\mathbb{P}(Y=0\mid X=x).$$

The multivariate extension, for $Y\in\{1,\dots,K\}$, is

$$\mathbb{P}(Y=j\mid X=x)=\frac{e^{x^{\top}\beta_j}}{\sum_{k=1}^{K}e^{x^{\top}\beta_k}},$$

where one of the $\beta_j$ is set to zero (the reference level) so that the model is identifiable. Again, we can use maximum likelihood, because the likelihood is

$$\mathcal{L}(\beta)=\prod_{i=1}^{n}\prod_{j=1}^{K}\mathbb{P}(Y_i=j\mid X_i=x_i)^{\mathbf{1}(y_i=j)}.$$

In this case, the response variable is coded as three indicators (just like any categorical explanatory variable in a standard regression model). As for logistic regression, the Newton-Raphson algorithm is used to maximize the likelihood numerically. In R, we first have to define the levels, for example

> seuils=c(0,1120,1220,max(couts$cout))   # cut points used above (the upper bound is an assumption)
> couts$tranches=cut(couts$cout,breaks=seuils,
+ labels=c("small","fixed","large"))
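To check the coding, the proportions of the three levels should essentially match p1, p2 and p3 computed above (a small added check):

# proportions in each class
table(couts$tranches)/nrow(couts)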

Then, we can fit a multinomial logistic regression on these categories, using some selected covariates (here with the multinom function of the nnet package):

> library(nnet)
> reg=multinom(tranches~ageconducteur+agevehicule+zone+carburant,data=couts)
# weights:  30 (18 variable)
initial  value 2113.730043 
iter  10 value 2063.326526
iter  20 value 2059.206691
final  value 2059.134802 
converged

The estimated coefficients and standard errors are the following:

Coefficients:
      (Intercept) ageconducteur agevehicule      zoneB      zoneC      zoneD      zoneE      zoneF   carburantE
fixed  -0.2779176   0.012071029  0.01768260 0.05567183 -0.2126045 -0.1548064 -0.2000597 -0.8441011 -0.009224715
large   0.7029836   0.008581459 -0.01426202 0.07608382  0.1007513  0.3434686  0.1803350 -0.1969320  0.039414682

Std. Errors:
      (Intercept) ageconducteur agevehicule     zoneB     zoneC     zoneD     zoneE     zoneF carburantE
fixed   0.2371936   0.003738456  0.01013892 0.2259144 0.1776762 0.1838344 0.1830139 0.3377169  0.1106009
large   0.2753840   0.004203217  0.01189342 0.2746457 0.2122819 0.2151504 0.2160268 0.3624900  0.1243560
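These coefficients feed the multinomial-logit formula above, with "small" as the reference level. A minimal sketch of reconstructing one predicted probability by hand (the driver profile is hypothetical, and the column order is assumed to match the table above):

# softmax over the two linear predictors; "small" is the reference category
B = coef(reg)                       # 2 x 9 matrix, rows "fixed" and "large"
x = c(1, 45, 5, 0, 0, 0, 0, 0, 1)   # hypothetical profile: age 45, car age 5, reference zone, carburant E
eta = as.vector(B %*% x)            # linear predictors for "fixed" and "large"
p = c(1, exp(eta)) / (1 + sum(exp(eta)))
names(p) = c("small", "fixed", "large")
round(p, 4)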

To visualize the effect of covariates, you can also use spline functions

> library(splines)

> reg=multinom(tranches~bs(agevehicule),data=couts)
# weights:  15 (8 variable)
initial  value 2113.730043 
iter  10 value 2070.496939
iter  20 value 2069.787720
iter  30 value 2069.659958
final  value 2069.479535 
converged

For example, if the covariate is the age of the car, then we have the following probabilities:

> predict(reg,newdata=data.frame(agevehicule=5),type="probs")
    small     fixed     large 
0.3388947 0.3869228 0.2741825 

For all car ages from 0 to 20, we obtain the curves below.
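A minimal sketch of how these curves can be produced (the grid, colours and layout are assumptions):

# predicted probabilities of small / fixed / large claims for car ages 0 to 20
ages = 0:20
probs = predict(reg, newdata = data.frame(agevehicule = ages), type = "probs")
matplot(ages, probs, type = "l", lty = 1, lwd = 2, col = c("blue", "purple", "red"),
        xlab = "age of the car", ylab = "probability")
legend("topleft", colnames(probs), col = c("blue", "purple", "red"), lty = 1, lwd = 2)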

 

For new cars, for example, the probability of a fixed-cost claim (in purple here) is small, and it increases with the age of the car. If the covariate is the population density in the driver's area, then we obtain the following probabilities:

# weights:  15 (8 variable)
initial  value 2113.730043 
iter  10 value 2068.469825
final  value 2068.466349 
converged
> predict
    small     fixed     large 
0.3484422 0.3473315 0.3042263

 

Based on these probabilities, we can derive the expected cost of a claim given some covariates, such as the density. But first, let us define the subsets of the data set:

> sbaseA=couts[couts$tranches=="small",]
> sbaseB=couts[couts$tranches=="fixed",]
> sbaseC=couts[couts$tranches=="large",]

The fixed cost is the average cost over the middle subset,

> (k=mean(sbaseB$cout))
[1] 1171.998

 

Then, let’s run four models,

> reg 
> regA 
> regB 
> regC 
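The exact calls are not shown here; as a minimal sketch, assuming the driver's age as covariate and Gamma GLMs for the severities, the four fits could look like this:

# one multinomial model for the class probabilities,
# and one severity model per subset (formulas and families are assumptions)
reg  = multinom(tranches ~ bs(ageconducteur), data = couts)
regA = glm(cout ~ bs(ageconducteur), data = sbaseA, family = Gamma(link = "log"))  # small claims
regB = glm(cout ~ 1,                 data = sbaseB, family = Gamma(link = "log"))  # fixed cost
regC = glm(cout ~ bs(ageconducteur), data = sbaseC, family = Gamma(link = "log"))  # large claims

Since the middle class is a fixed cost, regB is essentially constant and could simply be replaced by the value k computed above.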

Now, we can calculate predictions based on these models,

> pred=cbind(predA,predB,predC)
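A sketch of how proba and predA, predB, predC could be obtained, continuing the assumptions above (predictions on a grid of ages 1 to 100, so that rows 10 to 90 can be displayed below):

# predictions on a grid of driver ages (an assumption)
newd  = data.frame(ageconducteur = 1:100)
proba = predict(reg,  newdata = newd, type = "probs")     # class probabilities
predA = predict(regA, newdata = newd, type = "response")  # expected cost of a small claim
predB = predict(regB, newdata = newd, type = "response")  # fixed cost (constant)
predC = predict(regC, newdata = newd, type = "response")  # expected cost of a large claim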

To visualize the effect of each component on the premium, we can put side by side the class probabilities and the expected costs within each subset,

> cbind(proba,pred)[seq(10,90,by=10),]
       small     fixed     large    predA    predB    predC
10 0.3344014 0.4241790 0.2414196 423.3746 1171.998 7135.904
20 0.3181240 0.4471869 0.2346892 428.2537 1171.998 6451.890
30 0.3076710 0.4626572 0.2296718 438.5509 1171.998 5499.030
40 0.3032872 0.4683247 0.2283881 451.4457 1171.998 4615.051
50 0.3052378 0.4620219 0.2327404 463.8545 1171.998 3961.994
60 0.3136136 0.4417057 0.2446807 472.3596 1171.998 3586.833
70 0.3279413 0.4056971 0.2663616 473.3719 1171.998 3513.601
80 0.3464842 0.3534126 0.3001032 463.5483 1171.998 3840.078
90 0.3652932 0.2868006 0.3479061 440.4925 1171.998 4912.379

Now, you can plot these numbers on a graph,

 

(The horizontal dashed line is the average cost of a claim in our data set.)
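A minimal sketch of how such a graph can be drawn, under the same assumptions:

# expected claim cost given the covariate, mixing the three components
esperance = proba[, "small"] * predA + proba[, "fixed"] * predB + proba[, "large"] * predC
plot(1:100, esperance, type = "l", lwd = 2,
     xlab = "age of the driver (assumed covariate)", ylab = "expected claim cost")
abline(h = mean(couts$cout), lty = 2)  # dashed line: average claim cost in the data set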

 

 

 


 

 
