1) pdist calculates the Euclidean distance between pairs of objects

Y=pdist(X) computes the Euclidean distance between every pair of objects in the m × n matrix X (regarded as m n-dimensional row vectors). For a data set consisting of m objects, there are (m-1)⋅m/2 pairwise combinations.

The output Y is a vector of length (m-1)⋅m/2 containing the distance information. This vector can be converted into a square matrix by the squareform function, so that element (i,j) of the matrix is the distance between objects i and j in the original data set.
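As a minimal sketch of the two output forms described above, the following uses a small made-up data set (the values in X are hypothetical and chosen only so the distances are easy to check by hand):

```matlab
% Five 2-D points, one per row (hypothetical data).
X = [0 0; 3 4; 6 8; 0 5; 1 1];
m = size(X, 1);

Y = pdist(X);        % condensed row vector of pairwise Euclidean distances
D = squareform(Y);   % m-by-m symmetric matrix with zeros on the diagonal

numel(Y)             % number of pairs, (m-1)*m/2
D(1, 2)              % distance between objects 1 and 2: sqrt(3^2 + 4^2) = 5
```

The condensed vector is cheaper to store, while the square matrix is easier to index when a specific pair (i,j) is needed.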

Y=pdist(X,'metric') calculates the distance between objects in matrix X using the method specified by 'metric'. 'metric' takes one of the string values in Table 1.

Table 1 'metric' values and meanings

String         Meaning
'Euclid'       Euclidean distance (default)
'SEuclid'      Standardized Euclidean distance
'Mahal…'       Mahalanobis distance
'CityBlock'    City block (absolute value) distance
'Minkowski'    Minkowski distance

Y=pdist(X,'Minkowski',p) calculates the distance between objects in matrix X using the Minkowski distance. p is the exponent of the Minkowski distance. The default value is 2.
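To make the difference between the metrics concrete, here is a small sketch on two hypothetical points. It assumes the full metric names used by current MATLAB releases ('euclidean', 'cityblock', 'minkowski') rather than the abbreviated strings listed in Table 1; which spellings are accepted may depend on the release.

```matlab
% Two hypothetical points, so each call returns a single distance.
X = [0 0; 3 4];

dEuc   = pdist(X, 'euclidean');      % sqrt(3^2 + 4^2) = 5
dCity  = pdist(X, 'cityblock');      % |3| + |4| = 7
dMink3 = pdist(X, 'minkowski', 3);   % (3^3 + 4^3)^(1/3)
dMink2 = pdist(X, 'minkowski', 2);   % exponent 2 reproduces the Euclidean distance
```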

2) linkage generates the hierarchical clustering tree using the shortest distance algorithm

Z=linkage(Y) generates a hierarchical cluster tree using the shortest distance algorithm. The input Y is the (m-1)⋅m/2-dimensional distance row vector output by the pdist function.

Z=linkage(Y,'method') calculates the clustering tree using the algorithm specified by 'method'. 'method' takes one of the string values in Table 2.

Table 2 'method' values and meanings

String        Meaning
'single'      Shortest distance (default)
'complete'    Largest distance
'average'     Average distance
'centroid'    Centroid distance
'ward'        Sum of squared deviations (Ward method)

The output Z is an (m-1) × 3 matrix containing the clustering tree information. Leaf nodes in the clustering tree are the objects in the original data set, numbered from 1 to m. They are the single-element classes from which all higher-level classes are built. The class created by row j of Z is assigned the index m+j, where m is the number of initial leaf nodes.

Columns 1 and 2, Z(i,1:2), contain the indexes of the two objects or classes that are joined to create a new class. There are m-1 higher-level classes, which correspond to the internal nodes of the cluster tree. Column 3, Z(i,3), contains the linkage distance between the two objects or classes joined in that row.
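The layout of Z is easier to follow on a concrete tree. The sketch below uses five hypothetical points and prints, for each row j, the two indexes that were joined, the linkage distance, and the index m+j given to the new class:

```matlab
% Five hypothetical 2-D points forming two tight pairs and one outlier.
X = [0 0; 0 1; 5 5; 5 6; 10 0];
m = size(X, 1);

Y = pdist(X);
Z = linkage(Y, 'single');   % (m-1)-by-3 tree matrix

for j = 1:m-1
    % Row j joins Z(j,1) and Z(j,2) at distance Z(j,3); the result is class m+j.
    fprintf('row %d: join %d and %d at distance %.3f -> class %d\n', ...
            j, Z(j,1), Z(j,2), Z(j,3), m + j);
end
```

Any index greater than m in columns 1 and 2 refers to a class created by an earlier row, not to an original object.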

3) cluster creates clusters from the linkage output

T=cluster(Z,cutoff) creates clusters from the linkage output Z. Table 3 shows how the value of cutoff determines the way the cluster function cuts the tree.

Table 3 cutoff values and meanings

Cutoff value      Meaning
0 < cutoff < 2    cutoff is the threshold of the inconsistency coefficient. The inconsistency coefficient quantifies the differences between objects in the cluster tree. If the inconsistency coefficient of a join is greater than the threshold, the cluster function uses that join as a cluster boundary.
2 <= cutoff       cutoff is the maximum number of classes kept in the clustering tree.

T=cluster(Z,cutoff,depth,flag) creates clusters from the linkage output Z. The depth parameter specifies the number of levels of the cluster tree used to calculate the inconsistency coefficient. The inconsistency coefficient compares a join of two objects in the cluster tree with the adjacent joins. For details, see the function inconsistent. When depth is specified, cutoff is usually used as the inconsistency coefficient threshold.

The flag parameter overrides the default meaning of cutoff. If flag is 'inconsistent', cutoff is the threshold of the inconsistency coefficient. If flag is 'cluster', cutoff is the maximum number of classes.

The output T is a vector of length m that gives, for each object, the number of the class it belongs to. To find the objects of the original data set contained in class i, use find(T==i).
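The following sketch shows both ways of cutting a tree on hypothetical data. It is written against the name-value syntax of current MATLAB releases ('MaxClust', 'Cutoff', 'Depth'), which is an assumption on my part; the positional form described above comes from older releases and may not be accepted everywhere. The cutoff value 1.2 is arbitrary.

```matlab
% Five hypothetical 2-D points: two tight pairs and one outlier.
X = [0 0; 0 1; 5 5; 5 6; 10 0];
m = size(X, 1);
Z = linkage(pdist(X), 'single');

% Cut by number of classes: keep at most three.
T = cluster(Z, 'MaxClust', 3);
members = find(T == T(1));    % objects in the same class as object 1

% Cut by inconsistency coefficient, computed over 2 levels of the tree.
I  = inconsistent(Z, 2);      % (m-1)-by-4 matrix; column 4 is the coefficient
T2 = cluster(Z, 'Cutoff', 1.2, 'Depth', 2);
```

Class numbers in T are labels only; their order carries no meaning, so compare memberships rather than the numbers themselves.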

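4) zscore(X) standardizes the data matrix

The data matrix is standardized column by column: each element has the mean of its column subtracted and is then divided by the standard deviation of that column, z_ij = (x_ij - mean(x_j)) / std(x_j).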
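The standardization can be checked by hand. The sketch below compares zscore with the column-wise computation on a hypothetical matrix; the manual line relies on implicit expansion, which assumes MATLAB R2016b or later.

```matlab
% Hypothetical data: two columns on very different scales.
X = [1 10; 2 20; 3 60; 4 30];

Zs = zscore(X);                       % column-wise standardization

% Same computation written out: subtract column means, divide by column stds.
manual = (X - mean(X)) ./ std(X);     % implicit expansion (R2016b+)

maxDiff = max(abs(Zs(:) - manual(:)));  % should be at rounding-error level
```

Standardizing before calling pdist keeps a column with large values from dominating the distances.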

5) H=dendrogram(Z,P) draws the clustering tree graph

Draws the clustering tree from the matrix Z generated by linkage. P is the number of nodes displayed. The default value is 30.
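A minimal plotting sketch on hypothetical data follows. It passes P = 0, which as far as I know asks dendrogram to draw the complete tree instead of collapsing it to 30 nodes; the axis labels are my own additions.

```matlab
% Five hypothetical 2-D points.
X = [0 0; 0 1; 5 5; 5 6; 10 0];
Z = linkage(pdist(X), 'single');

figure;
H = dendrogram(Z, 0);          % H holds the handles of the drawn lines
xlabel('Object index');
ylabel('Linkage distance');
```

The height of each horizontal bar in the plot is the value in column 3 of the corresponding row of Z.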

6) T=clusterdata(X,cutoff) classifies the data in matrix X

Classifies the data in matrix X. X is an m × n matrix, regarded as m n-dimensional row vectors. It is equivalent to the following commands:

    Y = pdist(X, 'Euclid');
    Z = linkage(Y, 'single');
    T = cluster(Z, cutoff);
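The equivalence can be checked on a small hypothetical data set. This sketch assumes a current MATLAB release, so the step-by-step route uses 'euclidean' and the 'MaxClust' name-value pair; with an integer cutoff of 2 or more, clusterdata treats the value as the maximum number of classes.

```matlab
% Five hypothetical 2-D points.
X = [0 0; 0 1; 5 5; 5 6; 10 0];
cutoff = 3;                          % integer >= 2: maximum number of classes

% One-call route.
T1 = clusterdata(X, cutoff);

% Step-by-step route.
Y  = pdist(X, 'euclidean');
Z  = linkage(Y, 'single');
T2 = cluster(Z, 'MaxClust', cutoff);

sameLabels = isequal(T1, T2);        % true when both routes give the same labels
```

The step-by-step route is worth the extra lines when a different metric or linkage method is needed, or when Z is reused for a dendrogram.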

7) squareform converts the output of pdist to a square matrix

squareform(Y) converts the distance vector Y returned by pdist into a symmetric m × m matrix with zeros on the diagonal.

8) cophenet calculates the cophenetic correlation coefficient

c=cophenet(Z,Y) computes the cophenetic correlation coefficient by comparing the distance information in Z (generated by the linkage function) with the distance information in Y (generated by the pdist function). Z is the (m-1) × 3 matrix, with the distance information contained in its third column. Y is the (m-1)⋅m/2-dimensional row vector.
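As a short sketch on hypothetical data, the coefficient can be used to compare how faithfully two linkage methods preserve the original distances. The choice of 'single' and 'average' here is only for illustration.

```matlab
% Five hypothetical 2-D points.
X = [0 0; 0 1; 5 5; 5 6; 10 0];
Y = pdist(X);

Zsingle  = linkage(Y, 'single');
Zaverage = linkage(Y, 'average');

cSingle  = cophenet(Zsingle, Y);    % correlation between tree and original distances
cAverage = cophenet(Zaverage, Y);
```

A value closer to 1 means the tree distorts the pairwise distances less.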
