【 Data analysis 】 FastNewman based data clustering MATLAB source code

A list,

I. Definition of community

Newman first proposed the definition of modular degree in the paper “Fast Algorithm for Community Structure in Networks” published in 2004, and used a quantitative formula to determine community division for the first time.

First, let’s look at how Newman defines community: the vertices in networks are often found to cluster into tightly knit groups with a high density of within-group edges And a lower density of between-group edges.

In plain English: as many edges as possible within the community, but as few edges as possible between communities

(Some definitions) : I and J refer to community I and community J;

N is the number of nodes in the network;

M is the number of edges in the network. One edge connects two nodes, and, obviously, 2m is the sum of the degrees of all nodes in the network

Two, how to quantify to mold speed?

We use first eij said community between I and j community connections than the number of the entire network while the number of eii said community within the number of the entire network while I number, since this way as long as we make ∑ ieii as large as possible, but the problem again, the biggest affirmation is 1, all the nodes as a community, Well, that obviously doesn’t make sense.

Therefore, he proposed that the proportion of the network connecting two edges of the same type (i.e. the proportion of edges within the community eiI) should be subtracted from the expectation of the proportion of any edge connecting these two nodes under the same structure, and then the modularity came into being

Q = ∑ I (eii – ai2)

Where, ai=∑jeij represents the proportion of edges connected to nodes in community I to all edges. If the ratio of edges within the community is not greater than the expected random connection of edges within the community, then Q=0 and 1 at maximum. Generally speaking, the community structure corresponding to the maximum Q value is the community structure in the network

Three, how to become algorithmic operability?

Okay, so we just optimize Q, but how do we divide n nodes into how many communities? How many nodes per community? The author points out that there is a 2n-1 possibility, in which case Q cannot be generalized to networks with more than 20 nodes. In order to reduce time complexity, the author proposes a greedy strategy

FN :(1) firstly, each node in the network is customized as a community

(2) Calculate the value of Q for the combination of two communities, and find the combination method that increases Q the most or reduces Q the least to carry out community combination

(3) Stop until all communities merge into one large community, and find out the largest Q in the process of community division

At this point, Newman noticed that when two communities merge, the increment in modularity detaQ=(EJI + EIJ-2AIAJ)=2(EIJ-2AIAJ)

Second, the code

clear; CLC load('50. Mat ') % adjacent_matrix = link; [Z, Q_all] = FastNewman(adjacent_matrix); class_count = 4; % Maximum number of modules class_cell = class_cal(Z, class_count); % classification % clustering Figure (1) H = dendrogram(Z,0); figure(2) plot(node(:,1), node(:,2), 'r*') hold on for i = 1:size(link, 1) plot(node(link(i,:), 1), node(link(i,:), 2), 'r--') end for i = 1:length(class_cell) c = [rand rand rand]; for j = 1:size(link, 1) if all(ismember(link(j,:), class_cell{i})) plot(node(link(j,:), 1), node(link(j,:), 2), '.-', 'linewidth', 2, 'color', c) end end end for i = 1:size(node,1) text(node(i,1), node(i,2), num2str(i), 'fontsize', 15, 'color', 'k') endCopy the code

Complete code downloadwww.cnblogs.com/ttmatlab/p/…

【 Data analysis 】 FastNewman based data clustering MATLAB source code

A list,

Second, the code

Related Posts

Understand DevOps from a decision-maker’s perspective

How to optimize MySQL 10 million large table, I wrote 6000 words of interpretation

Persistence of Redis