Original link: tecdat.cn/?p=22838

Original source: Tuoduan Data Tribe (tecdat) WeChat official account

 

Problem: Use the iris data set in R.

Part (a): K-means clustering. Use k-means clustering to group the data into 2 clusters and draw a plot to show the clustering. Then use k-means clustering to group the data into 3 clusters and draw a plot to show the clustering.

Part (b): Hierarchical clustering. Cluster the observations using complete linkage. Cluster the observations using average and single linkage. Draw the dendrograms for the above clustering methods.

Question 01: Use the iris data set built into R.

(a): K-means clustering

Discuss whether the data should be standardized.

data.frame("mean" = apply(iris[, 1:4], 2, mean), "sd" = apply(iris[, 1:4], 2, sd))

In this case, we will standardize the data because the petal width is much smaller than all other measurements.
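The standardization step can be sketched with base R's `scale()`, which centers each column to mean 0 and rescales it to standard deviation 1. The object name `input` is an assumption, chosen to match the name used in the k-means call later in this post.

```r
# Minimal sketch: standardize the four numeric iris columns.
num <- iris[, 1:4]

# Column means and standard deviations before scaling
before <- data.frame(mean = sapply(num, mean), sd = sapply(num, sd))
print(round(before, 2))

# scale() centers each column to mean 0 and rescales it to sd 1
input <- scale(num)
after <- data.frame(mean = colMeans(input), sd = apply(input, 2, sd))
print(round(after, 2))
```

After scaling, every measurement contributes on the same scale to the Euclidean distances that k-means and hierarchical clustering rely on.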

Use k-means clustering to group the data into 2 clusters

With a large enough nstart, the algorithm is more likely to return the model with the minimum within-cluster sum of squares (RSS).

kmeans(scale(iris[, 1:4]), centers = 2, nstart = 100)
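To illustrate the effect of `nstart`, the sketch below fits the 2-cluster model once with a single random start and once with 100 starts, then compares the total within-cluster sum of squares. The seed and the object names (`km_1`, `km_100`, `pred`) are assumptions for illustration only.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
input <- scale(iris[, 1:4])

# One random start versus 100 random starts; kmeans() keeps the best fit
km_1   <- kmeans(input, centers = 2, nstart = 1)
km_100 <- kmeans(input, centers = 2, nstart = 100)

# Total within-cluster sum of squares (the quantity the text calls RSS)
print(c(one_start = km_1$tot.withinss, hundred_starts = km_100$tot.withinss))

pred <- km_100$cluster  # cluster label (1 or 2) for every flower
print(table(pred))
```

On a data set this small the two fits will often agree; the extra starts matter more when the number of clusters or the dimension grows.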

Draw a plot to show the clustering

plot(x = iris$Sepal.Width, y = iris$Sepal.Length)

To take the petal length and width into account as well, it is more appropriate to reduce the dimensions with PCA first.

plot(x = PC$PC2, y = PC$PC1, col = pred)
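A self-contained sketch of this PCA plot is given here, assuming the principal components are computed with `prcomp()` on the standardized data. The names `PCA`, `PC` and `pred` follow the fragments in this post, but how the author actually built them is not shown, so treat the construction as an assumption.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
input <- scale(iris[, 1:4])
pred  <- kmeans(input, centers = 2, nstart = 100)$cluster

# prcomp() on the standardized data; $x holds the principal component scores
PCA <- prcomp(input)
PC  <- data.frame(PCA$x[, 1:2], pred = factor(pred))

plot(x = PC$PC2, y = PC$PC1, col = PC$pred, pch = 19,
     xlab = "PC2", ylab = "PC1",
     main = "k-means clusters on the first two principal components")
legend("topright", legend = levels(PC$pred),
       col = seq_along(levels(PC$pred)), pch = 19, title = "Cluster")
```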

To better interpret the PCA plot, consider the variance explained by each principal component.

## Look at the variance explained by the principal components
for (i in 1:nrow(pca)) {
  pca[["PC"]][i] <- paste("PC", i)
}

 

library(ggplot2)
ggplot(data = pca, aes(x = PC, y = variance_ratio, group = 1)) + geom_line()

 

More than 80% of the variance in the data is explained by the first two principal components, so the two-dimensional plot is a reasonably faithful visualization of the data.
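The share of variance per component can be checked directly from a `prcomp()` fit, since each component's variance is its squared standard deviation. The sketch below assumes PCA on the standardized data; the column names `variance_ratio` and `cumulative` are illustrative.

```r
PCA <- prcomp(scale(iris[, 1:4]))

# Variance of each component is the squared standard deviation
pca <- data.frame(PC = paste("PC", seq_along(PCA$sdev)),
                  variance_ratio = PCA$sdev^2 / sum(PCA$sdev^2))
pca$cumulative <- cumsum(pca$variance_ratio)
print(pca)
```

The `cumulative` column for PC 2 is the figure that the statement about the first two components refers to.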

Use k-means clustering to group the data into 3 clusters

In the previous principal component plot the clusters look distinct. Because we actually know that there should be three groups (one per species), we can fit a model with three clusters.

kmeans(input, centers = 3, nstart = 100)
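Since the true species are known, one way to judge the 3-cluster model is to cross-tabulate the cluster labels against `iris$Species`. This is a sketch under the assumption that `input` is the standardized data; the names `km3` and `confusion` are illustrative.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
input <- scale(iris[, 1:4])
km3 <- kmeans(input, centers = 3, nstart = 100)

# Rows: true species; columns: k-means cluster labels (label order is arbitrary)
confusion <- table(species = iris$Species, cluster = km3$cluster)
print(confusion)
```

A species that falls almost entirely into one column is well recovered; species that split across columns are the ones k-means confuses.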

Draw a plot to show the clustering

plot(x = iris$Sepal.Width, y = iris$Sepal.Length, col = pred)

 

PCA plot

To take the petal length and width into account as well, it is appropriate to reduce the dimensions with PCA first.

ggplot(PC, aes(y = PC1, x = PC2, col = pred)) +
  geom_point() +
  stat_ellipse(type = "norm", level = 0.9) +
  labs(col = "Predicted\ncluster",
       caption = "The first two principal components of the iris data; the ellipses represent 90% normal confidence regions, and the clusters are predicted by the k-means algorithm")

PCA biplot

The plot of sepal length against sepal width shows a reasonable degree of separation. To choose which variables to put on the x and y axes, we can use a biplot.

biplot(PCA)

 

The biplot shows that petal length and sepal width explain most of the variation in the data, so a more appropriate plot is:

plot(x = iris$Petal.Length, y = iris$Sepal.Width, col = pred)
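The reading of the biplot can be cross-checked numerically from the loadings stored in the `rotation` element of a `prcomp()` fit. The sketch below assumes PCA on the standardized data and simply reports which variable has the largest absolute loading on each of the first two components; the name `top` is illustrative.

```r
PCA <- prcomp(scale(iris[, 1:4]))

# Loadings of the four measurements on the first two components
loadings <- PCA$rotation[, 1:2]
print(round(loadings, 3))

# Variable with the largest absolute loading on each component
top <- c(PC1 = rownames(loadings)[which.max(abs(loadings[, "PC1"]))],
         PC2 = rownames(loadings)[which.max(abs(loadings[, "PC2"]))])
print(top)
```

The signs of the loadings are arbitrary, which is why the comparison uses absolute values.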

Evaluate all possible combinations of variables.

iris %>% pivot_longer() %>% ggplot(aes(col = pred)) + facet_grid(name ~ ., scales = 'free_y', space = 'free_y') +
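As a simpler alternative to the faceted plot (a swapped-in technique, not the code used in this post), base R's `pairs()` draws every pairwise combination of the four measurements in one scatterplot matrix. The 3-cluster labels in `pred` and the name `combos` are assumptions for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
input <- scale(iris[, 1:4])
pred  <- kmeans(input, centers = 3, nstart = 100)$cluster

# Every pair of the four measurements, colored by k-means cluster
pairs(iris[, 1:4], col = pred, pch = 19, cex = 0.6)

# The variable pairs shown, one pair per column
combos <- combn(names(iris)[1:4], 2)
print(t(combos))
```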

(b): Hierarchical clustering

Cluster the observations using complete linkage

The observations can be clustered using complete linkage (note that the data are standardized first).

hclust(dst, method = 'complete')

Cluster the observations using average and single linkage

hclust(dst, method = 'average')
hclust(dst, method = 'single')
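The calls above need a distance object `dst`. A minimal sketch, assuming Euclidean distances on the standardized measurements (the post does not show how `dst` was built), fits all three linkage methods at once; the names `linkages` and `models` are illustrative.

```r
input <- scale(iris[, 1:4])
dst   <- dist(input)  # Euclidean distance is the dist() default

# Fit one tree per linkage method and keep them in a named list
linkages <- c("complete", "average", "single")
models   <- lapply(linkages, function(m) hclust(dst, method = m))
names(models) <- linkages

# Height of the final merge under each linkage
print(sapply(models, function(hc) max(hc$height)))
```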

Plot the predicted clusters

Now that the models are fitted, the observations are partitioned by cutting each tree at the required number of groups.

plot(x = iris$Petal.Length, y = iris$Sepal.Width, col = pred)  # pred: cluster labels from cutree()
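Cutting a tree into a fixed number of groups is done with `cutree()`. The sketch below uses the complete-linkage tree and `k = 3`; the value of `k` is an assumption (one group per species), as is the choice of plotting petal length against sepal width.

```r
dst <- dist(scale(iris[, 1:4]))
hc  <- hclust(dst, method = "complete")

# Cut the tree into k groups; k = 3 is an assumption (one group per species)
pred <- cutree(hc, k = 3)
print(table(species = iris$Species, cluster = pred))

plot(x = iris$Petal.Length, y = iris$Sepal.Width, col = pred, pch = 19,
     xlab = "Petal length", ylab = "Sepal width")
```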

Draw the dendrograms for the above clustering methods

Color the tree.

type <- c("average", "complete", "single")
for (hc in models) plot(hc, cex = 0.3)
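A self-contained version of the dendrogram loop might look like the sketch below. Base R does not color branches directly, so this uses `rect.hclust()` to draw colored boxes around the clusters instead (a swapped-in technique); `k = 3` and the side-by-side layout are assumptions.

```r
dst    <- dist(scale(iris[, 1:4]))
type   <- c("average", "complete", "single")
models <- lapply(type, function(m) hclust(dst, method = m))

op <- par(mfrow = c(1, 3))  # three dendrograms side by side
for (i in seq_along(models)) {
  plot(models[[i]], cex = 0.3, labels = FALSE,
       main = paste(type[i], "linkage"), xlab = "", sub = "")
  # Colored boxes around the k = 3 clusters (k is an assumption)
  rect.hclust(models[[i]], k = 3, border = 2:4)
}
par(op)  # restore the previous graphics settings
```

Comparing the three panels makes the typical behaviour visible: single linkage tends to chain observations together, while complete and average linkage produce more balanced trees.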

 

 

 

