Original link: tecdat.cn/?p=6715

Original source: Tuoduan Data Tribe WeChat official account

 

Visualization has become a key application of data science in the telecommunications industry. In particular, telecommunications analysis is highly dependent on the use of geospatial data.

This is because telecommunications networks themselves are geographically dispersed, and an analysis of this dispersion can yield valuable insights about network structure, consumer needs, and availability.

Data

To illustrate this point, we use the k-means clustering algorithm to analyze geographic data on free public WiFi in New York City.

Specifically, k-means is used to group WiFi access points into clusters based on the latitude and longitude data associated with a particular provider.

From the data set itself, use R to extract latitude and longitude data:

# 1. Prepare the data (newyork is the data frame holding the WiFi data set)
newyorkdf <- data.frame(newyork$LAT, newyork$LON)
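As a small illustration of what this step produces, the sketch below builds a hypothetical three-row stand-in for `newyork`. The coordinates are the first three rows printed later in this article, and the `NAME` column is invented for illustration. It shows that `data.frame()` derives the column names `newyork.LAT` and `newyork.LON` from the expressions passed to it, which is why those names appear in the plotting code.

```r
# Hypothetical stand-in for the full data set: the coordinates are the
# first three rows printed later in the article, NAME is an invented column.
newyork <- data.frame(
  NAME = c("a", "b", "c"),
  LAT  = c(40.75573, 40.75533, 40.75575),
  LON  = c(-73.94458, -73.94413, -73.94517)
)

# Keep only the coordinates. data.frame() builds the column names
# from the expressions newyork$LAT and newyork$LON.
newyorkdf <- data.frame(newyork$LAT, newyork$LON)

names(newyorkdf)
str(newyorkdf)
```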

Here’s a snippet of data:

Determine the number of clusters

Next, determine the number of clusters using a scree plot.

# 2. Determine the number of clusters

As the scree plot shows, the curve flattens out at about 11 clusters. This is therefore the number of clusters used in the k-means model.
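A scree plot of this kind is usually built by fitting k-means for a range of cluster counts and plotting the total within-cluster sum of squares against the number of clusters. The sketch below is one way to do that in base R. It is not the article's original code, and it runs on synthetic coordinates (three invented centres plus noise) rather than the real WiFi data, so the elbow it shows will differ from the one described above.

```r
# Synthetic stand-in for the latitude/longitude data frame (NOT the real data).
set.seed(42)
centres <- data.frame(lat = c(40.70, 40.76, 40.85),
                      lon = c(-74.00, -73.94, -73.82))
idx <- sample(nrow(centres), 300, replace = TRUE)
newyorkdf <- data.frame(
  newyork.LAT = centres$lat[idx] + rnorm(300, sd = 0.01),
  newyork.LON = centres$lon[idx] + rnorm(300, sd = 0.01)
)

# Total within-cluster sum of squares for k = 1, ..., max_k.
max_k <- 10
wss <- sapply(1:max_k, function(k) {
  kmeans(newyorkdf, centers = k, nstart = 10)$tot.withinss
})

# Scree (elbow) plot: look for the k after which the curve flattens.
plot(1:max_k, wss, type = "b",
     xlab = "Number of clusters",
     ylab = "Within-cluster sum of squares")
```

Using `nstart` greater than 1 reruns k-means from several random starts for each k, which makes the curve less sensitive to an unlucky initialisation.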

K-means analysis

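The k-means model is fitted with 11 clusters, the cluster labels are stored in the fit.cluster column of newyorkdf, and the result is plotted with ggplot2:

library(ggplot2)
ggplot(newyorkdf, aes(x = newyork.LON, y = newyork.LAT, color = fit.cluster)) + geom_point()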
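The fitting and labelling steps are not shown in the snippet above. A minimal sketch of how they might look, assuming base R's `kmeans()` and a synthetic stand-in for the coordinates (uniform random points, not the real data set), is:

```r
# Synthetic stand-in for the latitude/longitude data frame (NOT the real data).
set.seed(20)
n <- 300
newyorkdf <- data.frame(
  newyork.LAT = runif(n, 40.55, 40.90),
  newyork.LON = runif(n, -74.10, -73.75)
)

# Fit k-means with 11 clusters. nstart reruns the algorithm from several
# random starts and keeps the best solution.
fit <- kmeans(newyorkdf, centers = 11, nstart = 25)

# Append the labels. data.frame() names the new column fit.cluster,
# derived from the expression fit$cluster.
newyorkdf <- data.frame(newyorkdf, fit$cluster)

# Treat the labels as categories so that plots use discrete colours.
newyorkdf$fit.cluster <- as.factor(newyorkdf$fit.cluster)

head(newyorkdf)
```

Converting the labels to a factor is a design choice: with numeric labels, ggplot2 would map the clusters onto a continuous colour gradient instead of 11 distinct colours.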

The data frame newyorkdf now holds the latitude and longitude data together with the cluster labels:

> newyorkdf
   newyork.LAT newyork.LON fit.cluster
1     40.75573   -73.94458           1
2     40.75533   -73.94413           1
3     40.75575   -73.94517           1
4     40.75575   -73.94517           1
5     40.75575   -73.94517           1
6     40.75575   -73.94517           1
.....
80    40.84832   -73.82075          11
81    40.84923   -73.82105          11
82    40.84920   -73.82106          11
83    40.85021   -73.82175          11
84    40.85023   -73.82178          11
85    40.86444   -73.89455          11

This plot is useful, but ideally the clusters would be overlaid on a map of New York City itself.

Map visualization

To do this, the clustered points are added to a base map of New York City, held here in the plot object gg:

gg +
  geom_point(data = newyorkdf, aes(x = newyork.LON, y = newyork.LAT, color = factor(fit.cluster)), alpha = .5) +
  ggtitle("New York Public WiFi")

Running the above produces a map of New York City with the clusters overlaid:

This type of clustering can provide insight into the structure of WiFi networks in cities. For example, there are 650 individual points in cluster 1 and 100 points in cluster 6.
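Per-cluster counts like these can be read directly from a fitted model. The sketch below assumes base R's `kmeans()` and synthetic coordinates (not the real data set), so its counts will not match the 650 and 100 quoted above. The names `coords` and `cluster_sizes` are illustrative.

```r
# Synthetic stand-in for the coordinates (NOT the real data).
set.seed(20)
n <- 300
coords <- data.frame(
  lat = runif(n, 40.55, 40.90),
  lon = runif(n, -74.10, -73.75)
)

fit <- kmeans(coords, centers = 11, nstart = 25)

# Number of points assigned to each cluster, in cluster order 1..11.
cluster_sizes <- table(fit$cluster)
cluster_sizes

# The same counts, ordered from the largest to the smallest cluster.
sort(cluster_sizes, decreasing = TRUE)
```

The fitted object also exposes the same counts as `fit$size`.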

This suggests that the geographic area covered by cluster 1 sees a lot of WiFi traffic. By contrast, the smaller number of points in cluster 6 suggests low WiFi traffic.

K-means clustering by itself does not tell us why traffic is high or low for a particular cluster. However, this clustering algorithm provides a good starting point for further analysis and makes it easier to gather additional information to determine why one geographic cluster might have a higher traffic density than another.
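One possible starting point for that further analysis is a per-cluster summary table holding each cluster's centre and size, which could later be joined to other information by cluster. The sketch below uses synthetic coordinates and invented names (`coords`, `lat`, `lon`, `summary_df`). It is not part of the original analysis.

```r
# Synthetic stand-in for the coordinates (NOT the real data).
set.seed(20)
n <- 300
coords <- data.frame(
  lat = runif(n, 40.55, 40.90),
  lon = runif(n, -74.10, -73.75)
)

fit <- kmeans(coords, centers = 11, nstart = 25)

# One row per cluster: centroid coordinates and number of points.
summary_df <- data.frame(
  cluster    = seq_len(nrow(fit$centers)),
  center_lat = fit$centers[, "lat"],
  center_lon = fit$centers[, "lon"],
  n_points   = fit$size
)

# Densest clusters first.
summary_df[order(-summary_df$n_points), ]
```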

Conclusion

This example demonstrates how k-means clustering can be used with geographic data to visualize WiFi access points across an entire city. In addition, we saw how k-means clustering highlights areas of high and low WiFi access density, which can in turn prompt further questions about population, WiFi speed, and other factors.

 

Thank you very much for reading this article. Please leave a comment below if you have any questions!