Original link: tecdat.cn/?p=5658

Original source: Tuoduan Data Tribe official account

 

 

Creating a Topic Network

I study research publications in the social sciences, computing, and informatics by analyzing their texts and co-author social networks.

One question I encountered was: how do you measure the relationship (relatedness) between topics? I wanted to create a network visualization that connects similar topics and helps users navigate a large number of topics more easily.

 

Data preparation

Our first step is to load the topic matrices output by LDA. LDA has two outputs: the word-topic matrix and the document-topic matrix.

As an alternative to loading files, you can use the output of the LDA function in the topicmodels package to create the word-topic and document-topic matrices.

# (author.topic and name are assumed to have been loaded in the same way)
topic <- read.csv("topics.csv", stringsAsFactors = F)        # load the word-topic matrix
colnames(author.topic) <- c("author_name", name$topic_name)  # rename columns of the author-topic matrix
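As a reference for that alternative, here is a minimal sketch assuming a fitted topicmodels model named lda_model (the object name and the choice of five top words are assumptions): posterior() returns both matrices, and terms() supplies top words that can serve as topic labels.

library(topicmodels)

# lda_model <- LDA(dtm, k = 20)                  # assumed: a model fitted elsewhere
word.topic     <- posterior(lda_model)$terms      # topic-by-word probability matrix
document.topic <- posterior(lda_model)$topics     # document-by-topic probability matrix
topic.labels   <- apply(terms(lda_model, 5), 2, paste, collapse = " ")  # top-5-word labels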

Unlike standard LDA, I ran an "author-centric" LDA, in which the abstracts of all of an author's publications are combined and treated as one document per author. This is because my ultimate goal is to use topic modeling as an information-retrieval process to determine the expertise of researchers.

Creating a Static Network

In the next step, I create a network using correlations between word probabilities for each topic.

First, I decided to keep only relationships (edges) with reasonably strong correlations (0.2 or higher). I used 0.2 because, with about 100 observations, a correlation of 0.2 corresponds to statistical significance at the 0.05 level.

cor_threshold <- .2

Next, we use the correlation matrix to create the igraph data structure, removing all edges whose correlation falls below the 0.2 threshold, and load igraph to draw a simple network of the topics (see the sketch below for the intermediate steps).

library(igraph)
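The intermediate steps are only partially visible in the extract above; the following is a minimal sketch under stated assumptions (the layout of the word-topic matrix, the plotting parameters, and the title text are assumptions). It turns the thresholded correlation matrix into the igraph object graph that the rest of the post works with.

# Assumes the word-topic matrix loaded above ("topic") has one numeric column per
# topic and one row per word, so cor() yields topic-to-topic correlations.
cor_mat <- cor(topic)
cor_mat[cor_mat < cor_threshold] <- 0   # drop weak relationships (below 0.2)
diag(cor_mat) <- 0                      # no self-loops

graph <- graph_from_adjacency_matrix(cor_mat, mode = "undirected", weighted = TRUE)

plot(graph, vertex.size = 5, vertex.label.cex = .7)
title("Topic correlation network", cex.main = .8)   # title text is an assumption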

 

Each node represents a topic and is labeled with its topic number.

We use community detection, specifically the label propagation algorithm in igraph, to determine the clusters in the network.

clp <- cluster_label_prop(graph)   # run label propagation community detection
class(clp)                         # a "communities" object
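A quick way to inspect the result, using igraph's standard accessors for communities objects:

length(clp)        # number of detected communities
sizes(clp)         # number of topics in each community
membership(clp)    # community assignment for every topic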

 

The community detection found 13 communities, several of which consist only of isolated topics (that is, topics without any connections).

Similar to my initial observations, the algorithm finds the three main clusters we identified in the first graph, but also adds other smaller clusters that don’t seem to fit into any of the three main clusters.

 

V(graph)$community <- clp$membership            # store community membership as a vertex attribute
V(graph)$degree <- degree(graph, v = V(graph))  # store each topic's degree (number of connections)
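The node-sizing step later on uses each topic's betweenness centrality; a small assumed step here stores it as a vertex attribute as well, so that it carries over into the visNetwork nodes data frame below.

V(graph)$betweenness <- betweenness(graph, v = V(graph))  # betweenness centrality per topic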

Dynamic visualization

In this section, we will use the visNetwork package to build an interactive network diagram.

First, let's load the library and use visIgraph to render an interactive network directly from the igraph structure (graph).
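A minimal illustration of that quick rendering:

library(visNetwork)
visIgraph(graph)   # interactive plot built directly from the igraph object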

To gain more control over the visualization, we create a visNetwork data structure and then split the resulting list into two data frames: nodes and edges.

data <- toVisNetworkData(graph)
nodes <- data[[1]]
edges <- data[[2]]

Delete nodes (topics) that are not connected (degree = 0).

nodes <- nodes[nodes$degree != 0,]

Add colors and other visual parameters to improve the appearance of the network.

library(RColorBrewer)
col <- brewer.pal(12, "Set3")[as.factor(nodes$community)]   # one color per community

nodes$shape <- "dot"
nodes$size <- (nodes$betweenness / max(nodes$betweenness) + .2) * 20  # node size scaled by normalized betweenness (rescaling factor assumed)
nodes$color.background <- col
nodes$color.highlight.background <- "orange"

Finally, create our network as an interactive diagram. You can zoom using the mouse wheel.

visNetwork(nodes, edges) %>%
  visOptions(highlightNearest = TRUE, selectedBy = "community", nodesIdSelection = TRUE)

First, there are two drop-down menus. The first drop-down lets you look up any topic by name (the topic's five highest-probability words).

The second drop-down highlights the communities detected by our algorithm.

The three largest seem to be:

  • Computation (grey, cluster 4)
  • Social (green-blue, cluster 1)
  • Health (yellow, cluster 2)

What is unique about the smaller communities detected? Can you explain that?

 

