Original link: tecdat.cn/?p=6129

 

Introduction

Finite mixture models are useful when the observations come from several different populations and the population each observation belongs to is unknown.
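As a rough illustration of the idea, the density of a two-component normal mixture is a weighted sum of the component densities, with weights that sum to 1. The sketch below uses base R only; the helper name `mix_density` and the parameter values are illustrative assumptions, not something defined in the original post.

```r
# Density of a two-component normal mixture: a weighted sum of
# normal densities. `mix_density` is a hypothetical helper name.
mix_density <- function(x, w, mu, sigma) {
  stopifnot(length(w) == 2, abs(sum(w) - 1) < 1e-8)
  w[1] * dnorm(x, mu[1], sigma[1]) + w[2] * dnorm(x, mu[2], sigma[2])
}

# Illustrative parameters: means 0 and 50, common sd 5, weights 100:10.
w <- c(100, 10) / 110

# A simple Riemann sum over a wide grid checks that the mixture
# is still a proper density (total area close to 1).
step <- 0.01
x_grid <- seq(-50, 100, by = step)
area <- sum(mix_density(x_grid, w, mu = c(0, 50), sigma = c(5, 5))) * step
print(area)
```

Fitting a mixture model amounts to recovering the weights, means, and standard deviations from the pooled observations alone.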

Simulated data

First, we’ll simulate some data from two normal distributions, one with a mean of 0 and the other with a mean of 50, both with a standard deviation of 5. The first sample has 100 observations and the second has 10.

m1 <- 0
m2 <- 50
sd1 <- sd2 <- 5
N1 <- 100
N2 <- 10

a <- rnorm(n=N1, mean=m1, sd=sd1)
b <- rnorm(n=N2, mean=m2, sd=sd2)

Now let’s “mix” the data together.
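The code for this step is not shown in the text. A minimal sketch of what the mixing could look like follows, assuming a data frame named `data` with columns `x` and `class`, which is what the later `table(clusters(flexfit), data$class)` call suggests. The seed, the 1/2 class labels, and the row shuffle are assumptions. By analogy with the iris call later in the post, the fit was presumably of the form `flexmix(x ~ 1, data = data, k = 2, ...)`, but that call is not reproduced here.

```r
set.seed(42)  # assumed seed, only so this sketch is reproducible

# Same simulation settings as in the post
m1 <- 0; m2 <- 50
sd1 <- sd2 <- 5
N1 <- 100; N2 <- 10
a <- rnorm(n = N1, mean = m1, sd = sd1)
b <- rnorm(n = N2, mean = m2, sd = sd2)

# Pool both samples into one column and keep the true group label
# (1 or 2) so fitted clusters can be checked against it later.
data <- data.frame(x = c(a, b), class = c(rep(1, N1), rep(2, N2)))

# Shuffle the rows so the ordering gives no hint about membership.
data <- data[sample(nrow(data)), ]
rownames(data) <- NULL

print(table(data$class))
```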

  

 

 

 

print(table(clusters(flexfit), data$class))
##    
##       1   2
##   1 100   0
##   2   0  10

What about the parameters?

cat('pred:', c1[1], '\n')
cat('true:', m1, '\n\n')
cat('pred:', c1[2], '\n')
cat('true:', sd1, '\n\n')
cat('pred:', c2[1], '\n')
cat('true:', m2, '\n\n')
cat('pred:', c2[2], '\n')
cat('true:', sd2, '\n\n')
## pred: 0.5613484 
## true: 0 
## 
## pred: 4.799484 
## true: 5 
## 
## pred: 52.86911 
## true: 50 
## 
## pred: 6.89413 
## true: 5
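flexmix estimates mixtures with the EM algorithm. To make that step less of a black box, here is a from-scratch EM sketch for a two-component normal mixture in base R. It is not flexmix's implementation: the function name `em_two_normals`, the starting values, the fixed iteration count, and the seed are all assumptions made for illustration.

```r
# EM for a two-component univariate normal mixture (illustrative only).
em_two_normals <- function(x, n_iter = 200) {
  # Crude starting values: sample extremes as means, pooled sd, equal weights.
  mu <- c(min(x), max(x))
  sigma <- rep(sd(x), 2)
  w <- c(0.5, 0.5)
  for (i in seq_len(n_iter)) {
    # E-step: posterior probability that each point belongs to each component.
    d1 <- w[1] * dnorm(x, mu[1], sigma[1])
    d2 <- w[2] * dnorm(x, mu[2], sigma[2])
    r1 <- d1 / (d1 + d2)
    r2 <- 1 - r1
    # M-step: responsibility-weighted updates of weights, means and sds.
    w <- c(mean(r1), mean(r2))
    mu <- c(sum(r1 * x) / sum(r1), sum(r2 * x) / sum(r2))
    sigma <- c(sqrt(sum(r1 * (x - mu[1])^2) / sum(r1)),
               sqrt(sum(r2 * (x - mu[2])^2) / sum(r2)))
  }
  list(weights = w, means = mu, sds = sigma)
}

set.seed(1)  # assumed seed, not from the post
x <- c(rnorm(100, mean = 0, sd = 5), rnorm(10, mean = 50, sd = 5))
fit <- em_two_normals(x)
print(fit)
```

Because the two groups are far apart relative to their spread, even these crude starting values should be enough here; on overlapping components the result can depend heavily on initialization.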

Let’s visualize the real data and the mixture model we fit.

ggplot(data) +
  geom_histogram(aes(x, ..density..), binwidth = 1, colour = "black", fill = "white") +
  stat_function(geom = "line", fun = plot_mix_comps,
                args = list(c1[1], c1[2], lam[1]/sum(lam))) +
  stat_function(geom = "line", fun = plot_mix_comps,
                args = list(c2[1], c2[2], lam[2]/sum(lam)),
                colour = "blue", lwd = 1.5) +
  ylab("Density")
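The helper `plot_mix_comps` used in the plotting code is not defined in the text. Given how it is called, with a mean, a standard deviation, and a mixing proportion, one plausible definition scales a normal density by the component's weight. This is an assumed reconstruction, and the parameter names `mu`, `sigma`, and `lam` are guesses.

```r
# Assumed definition: weighted normal density for one mixture component.
# stat_function() supplies the x values as the first argument and the
# entries of `args` as the remaining ones, in order.
plot_mix_comps <- function(x, mu, sigma, lam) {
  lam * dnorm(x, mean = mu, sd = sigma)
}

# Example: curve height at the peak of a component with
# mean 0, sd 5 and mixing weight 100/110.
peak <- plot_mix_comps(0, mu = 0, sigma = 5, lam = 100 / 110)
print(peak)
```

Scaling each curve by its weight is what lets the component curves sit on the same density scale as the histogram of the pooled data.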

Looks like we’re doing great!

 

 

Example

Now, let’s consider a real-world example: petal widths in the iris data set.

p <- ggplot(iris, aes(x = Petal.Width)) +
  geom_histogram(aes(x = Petal.Width, ..density..), binwidth = 0.1, colour = "black", fill = "white")
p

 

 

 

flexfit <- flexmix(Petal.Width ~ 1, data = iris, k = 3, model = list(mo1, mo2, mo3))
print(table(clusters(flexfit), iris$Species))
##    
##     setosa versicolor virginica
##   1      0          2        46
##   2      0         48         4
##   3     50          0         0

ggplot(iris) +
  geom_histogram(aes(x = Petal.Width, ..density..), binwidth = 0.1, colour = "black", fill = "white") +
  stat_function(geom = "line", fun = plot_mix_comps,
                args = list(c1[1], c1[2], lam[1]/sum(lam)),
                colour = "red") +
  stat_function(geom = "line", fun = plot_mix_comps,
                args = list(c2[1], c2[2], lam[2]/sum(lam))) +
  stat_function(geom = "line", fun = plot_mix_comps,
                args = list(c3[1], c3[2], lam[3]/sum(lam)),
                colour = "green", lwd = 0.5) +
  ylab("Density")

 

Even without knowing the underlying species assignments, we can still make certain statements about the underlying distribution of petal widths.
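One way to quantify how well the three fitted components line up with the species is to take the cross-tabulation printed above and count, for each component, the species it mostly captures. The sketch below re-enters those counts by hand in base R; the variable names are illustrative.

```r
# Counts copied from the table of fitted clusters (rows) vs. species (columns).
tab <- matrix(c( 0,  2, 46,
                 0, 48,  4,
                50,  0,  0),
              nrow = 3, byrow = TRUE,
              dimnames = list(cluster = c("1", "2", "3"),
                              species = c("setosa", "versicolor", "virginica")))

# Label each cluster by its most frequent species.
majority <- colnames(tab)[apply(tab, 1, which.max)]

# Share of flowers that fall in their cluster's majority species.
agreement <- sum(apply(tab, 1, max)) / sum(tab)

print(majority)
print(agreement)
```

With these counts, 144 of the 150 flowers land in the majority species of their component, and all six mismatches are between versicolor and virginica.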