tecdat | Social network analysis using Python and SAS Viya

Original link: http://tecdat.cn/?p=7303

This example uses Python and SAS to analyze the results of a prevention study on high-risk drug use. The social network has 194 nodes and 273 edges, which represent the connections between drug users.

Background

The latest version of SAS Viya offers a full suite of innovative algorithms and proven analytical methods for exploring experimental problems, and it is built on an open architecture. This means that you can integrate SAS Viya into your application infrastructure and use any programming language to drive the analytical models.

Although you can go ahead and simply make a series of REST API calls to access the data, it’s common to use a programming language to organize your work and make it repeatable and efficient. I decided to use Python because of its popularity among young data scientists.

For demonstration purposes, I use an interface called Jupyter, which is an open, Web-based interactive platform that can run Python code and embed markup text.

Access SAS Cloud Analytic Services (CAS)

At the heart of SAS Viya is an analytic run-time environment called SAS Cloud Analytic Services (CAS). To perform operations or access data, you first need a connected session. You can use a binary connection, which is recommended for transferring large amounts of data, or you can use the REST API over HTTP or HTTPS.

from swat import CAS  # SAS Python client used below to connect to CAS
import matplotlib.colors as colors  # utilities for color ranges
import matplotlib.cm as cmx
import networkx as nx  # to render network graphs

Now that the libraries are loaded, we can open a connection to CAS and create a session for a given user.

 s = CAS('http://sasviya.mycompany.com:8777', 8777, 'myuser', 'mypass')

For this network analysis, I’ll use an action set named _hyperGroup_.

 s.loadactionset('hyperGroup')

Load the data

To perform any analytical modeling, we need data. The following call uploads a local CSV file to the server and stores the data in a table named _DRUG_NETWORK_. The table has only two numeric columns, _FROM_ and _TO_.

 inputDataset = s.upload("data/drug_network.csv", casout=dict(name='DRUG_NETWORK', promote = True))

During analytical modeling, it is often necessary to restructure the data and to filter or merge data sources. Here, the _put_ function converts both numeric columns into the new character columns _SOURCE_ and _TARGET_.

sasCode = 'SOURCE = put(FROM,best.); TARGET = put(TO,best.); \n'
dataset = inputDataset.datastep(sasCode, casout=dict(name='DRUG_NETWORK2', replace = True))

Data exploration

A common task when building an analysis model is to first understand the data. The following example returns the first five rows of the dataset.

dataset.fetch(to=5, sastypes=False, format=True)  # list the first 5 rows

Simple summary statistics show more detail, including the total count of 273 edges in our dataset.

 dataset.summary()
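As a local cross-check of these figures, the edge and node counts can also be derived directly from an edge list in plain Python. The sketch below is only an illustration: the edge list is made up (it is not the study data) and the `summarize` helper is a hypothetical name, not part of SAS Viya.

```python
# Hypothetical edge list with the same two-column structure as DRUG_NETWORK.
# The values are illustrative only and are not taken from the study data.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]


def summarize(edge_list):
    """Return basic counts for an undirected edge list."""
    nodes = {n for edge in edge_list for n in edge}
    return {"edges": len(edge_list), "nodes": len(nodes)}


summary = summarize(edges)
print(summary)
```

Applied to the full dataset, the same counts should match the 273 edges and 194 nodes quoted above.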

Graph layout

First, we visualize the network to get a basic idea of its structure and size. We will calculate the vertex positions using a force-directed algorithm. HyperGroup can also be used to find clusters, calculate graph layouts, and determine network metrics such as community and centrality.

s.hyperGroup.hyperGroup(
    createOut = "NEVER",               # suppress the table that is normally created
    allGraphs = True,                  # process all graphs
    inputs    = ["SOURCE", "TARGET"],  # columns that define the edges
    table     = dataset,               # input table
    edges     = dict(name='edges', replace=True),  # output table for the edges
    vertex    = dict(name='nodes', replace=True)   # output table for the vertices
)
renderNetworkGraph()  # helper (not shown in this article) that draws the graph with NetworkX

The rendered network provides a first view of the graph. We can see the two main branches and identify the high-density and low-density areas.
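To make the idea of a force-directed layout more concrete, here is a minimal sketch of a basic Fruchterman-Reingold scheme in plain Python. It is not the algorithm that hyperGroup runs internally (the article does not say which variant is used). The function name, its parameters and the toy edge list are all assumptions made for illustration.

```python
import math
import random


def force_directed_layout(edges, iterations=200, seed=42):
    """Return {node: (x, y)} using a basic Fruchterman-Reingold scheme."""
    rng = random.Random(seed)
    nodes = sorted({n for edge in edges for n in edge})
    if not nodes:
        return {}
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = math.sqrt(1.0 / len(nodes))  # ideal edge length for a unit square
    temperature = 0.1                # caps how far a node may move per step

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes each other apart.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force
        # Attraction: connected nodes pull each other together.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force
        # Move each node along its net force, limited by the temperature.
        for n in nodes:
            length = max(math.hypot(*disp[n]), 1e-9)
            step = min(length, temperature)
            pos[n][0] += disp[n][0] / length * step
            pos[n][1] += disp[n][1] / length * step
        temperature *= 0.98  # cool down so the layout settles

    return {n: tuple(p) for n, p in pos.items()}


# Hypothetical toy network: two triangles joined by one edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
layout = force_directed_layout(toy_edges)
```

The returned dictionary has the same shape as the `pos` argument accepted by NetworkX drawing functions, so it could be passed to them for rendering.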

Community detection

To understand the relationships among users in the social network, we will analyze the communities to which individuals belong. Community detection, or clustering, divides the network into communities so that the nodes within a community subgraph are more densely connected to each other than to nodes in other communities. People in the same community often share common attributes, which indicates that they are closely connected.
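The definition above can be checked on any given partition by counting how many edges stay inside a community and how many cross between communities. The following sketch does this for a made-up toy network. The `edge_split` helper and the data are hypothetical, and it only evaluates a partition, it does not detect one.

```python
def edge_split(edges, community):
    """Count edges that stay inside a community versus edges that cross."""
    inside = sum(1 for u, v in edges if community[u] == community[v])
    return inside, len(edges) - inside


# Hypothetical example: two triangles joined by a single bridge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

inside, crossing = edge_split(edges, community)
print(inside, crossing)  # 6 edges inside communities, 1 crossing
```

A good partition keeps most edges inside communities, as in this toy case.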

The updated nodes table now contains an additional column, _Community_, that holds the community value of each node in our network. Given this dataset, we can compute basic statistics, such as the distinct counts for each column:

The results table shows that 24 communities were identified in our network.

Let’s take a look at the five largest communities and analyze the node distribution.

We redirect the fetched rows into a Python variable. We’ll use it to generate a bar chart showing the five largest communities:
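Once the community assignments are available in Python, the distinct count and the largest communities can be obtained with `collections.Counter`. This is a rough sketch with made-up assignments standing in for the _Community_ column. It prints a simple text bar chart rather than reproducing the plot from the original analysis.

```python
from collections import Counter

# Hypothetical node-to-community assignments standing in for the
# _Community_ column of the nodes table (the values are made up).
node_community = {"n1": 13, "n2": 13, "n3": 13, "n4": 4, "n5": 4, "n6": 7}

sizes = Counter(node_community.values())
n_communities = len(sizes)   # number of distinct communities
top5 = sizes.most_common(5)  # [(community_id, vertex_count), ...]

for community_id, count in top5:
    print(f"community {community_id:>3}: {'#' * count} ({count})")
```

The `top5` list has the shape a plotting library expects for a bar chart: one label and one height per community.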

This shows that the largest community, community 13, has 35 vertices. The following example shows the nodes in community 4:

Finally, let’s render the network again, this time coloring the nodes by community:

Often, you need to adjust the number of communities based on the size of your network and the desired results. You can control how small communities are merged into larger ones. Communities can be merged:

Randomly into a neighboring community
Into the neighboring community with the smallest number of vertices
Into the neighboring community with the largest number of vertices
Into a community that already has _nCommunities_ vertices

The following reduces the total number of communities to 5 by specifying the _nCommunities_ parameter.
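One of the merge strategies listed above, merging into the neighboring community with the most vertices, can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not hyperGroup's implementation: the function name, the tie-breaking by community id and the toy data are all hypothetical.

```python
from collections import Counter


def merge_small_communities(edges, community, n_communities):
    """Merge the smallest community into its largest neighbor until only
    n_communities remain (or no further merge is possible)."""
    community = dict(community)  # work on a copy
    while True:
        sizes = Counter(community.values())
        if len(sizes) <= n_communities:
            break
        merged = False
        # Try communities from smallest to largest (ties broken by id).
        for small in sorted(sizes, key=lambda c: (sizes[c], c)):
            neighbors = set()
            for u, v in edges:
                cu, cv = community[u], community[v]
                if cu == small and cv != small:
                    neighbors.add(cv)
                elif cv == small and cu != small:
                    neighbors.add(cu)
            if neighbors:
                target = max(neighbors, key=lambda c: (sizes[c], c))
                for node, c in community.items():
                    if c == small:
                        community[node] = target
                merged = True
                break
        if not merged:
            break  # remaining communities have no neighbors to merge into
    return community


# Hypothetical toy network: a path 1-2-3-4-5-6 split into three communities.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
community = {1: "A", 2: "A", 3: "A", 4: "B", 5: "C", 6: "C"}
result = merge_small_communities(edges, community, n_communities=2)
print(result)
```

In this toy case the single-node community "B" is absorbed by "A", its larger neighbor, which leaves two communities.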

Centrality analysis

Analyzing centrality helps determine who is important in a network. Important people are well connected and therefore have a high degree of influence over other individuals in the network. For our social network, this indicates the potential for viral transmission and the associated risk behavior of individuals.

Each metric is represented as an output column in the nodes dataset.
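To illustrate what such metrics measure, the sketch below computes two common ones, degree and betweenness, in plain Python, using Brandes' algorithm for unweighted graphs for the latter. It is an illustration only. The helper names and the toy network are assumptions, and the values hyperGroup reports may be scaled or defined differently.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Number of neighbors of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical toy network: a hub with three neighbors and one tail node.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("c", "d")]
adj = build_adjacency(edges)
deg = degree(adj)
btw = betweenness(adj)
print(deg["hub"], btw["hub"], btw["c"])  # 3 5.0 3.0
```

Nodes that sit on many shortest paths, such as "hub" here, get high betweenness even when their degree is modest.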

Let’s render the network again using one of the centrality measures as the node size.

Subsetting a network branch

From our network, it seems that the users in Community 2 play an important role. This is indicated by the community's overall centrality and by the high betweenness values of most of its members. The following code filters and renders Community 2’s network to give us a better visualization of the subnetwork.

The example above uses the standard 2D force-directed graph layout. In more complex cases, you may also need to consider using additional dimensions when analyzing the network structure.
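The filtering step can be sketched in plain Python as shown below: keep the nodes assigned to one community and only the edges whose two endpoints are both in it. The helper name and the toy data are hypothetical, and rendering is left out.

```python
def community_subgraph(edges, community, community_id):
    """Return the members of one community and the edges among them."""
    members = {n for n, c in community.items() if c == community_id}
    sub_edges = [(u, v) for u, v in edges if u in members and v in members]
    return members, sub_edges


# Hypothetical toy network: two triangles joined by a single bridge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
community = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}

members, sub_edges = community_subgraph(edges, community, community_id=2)
print(sorted(members), sub_edges)
```

The bridge edge c-d is dropped because only one of its endpoints belongs to community 2.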
