This is day 26 of my participation in the November Writing Challenge. Event details: 2021 Final Writing Challenge.

Preface

This post uses Python to visualize NetEase Cloud Music playlist data. Without further ado, let's get started.

Development tools

Python version: 3.6.4

Related modules:

requests module;

pandas module;

matplotlib module;

bs4 (BeautifulSoup) module;

and some modules that ship with Python.

Environment setup

Install Python and add it to your PATH environment variable, then use pip to install the required modules.

This time we collect data on NetEase Cloud Music's Chinese-language playlists and run a visual analysis on that data.

The charts are drawn with Matplotlib, working directly with this low-level plotting library.

Web page analysis

Playlist index page

Select the hot playlists page of the Chinese-language category.

From it, get each playlist's play count, name, and author, plus the link to its details page.

A total of 1,302 Chinese-language playlists were collected.
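The index page is paged with an offset query parameter, which can be sketched with the standard library alone. This is only an illustration: `build_index_url` is a hypothetical helper, the page size of 35 and the upper bound of 1330 are taken from the crawler loop in this article, and the category value 华语 (Chinese-language) is an assumption about which category is being browsed.

```python
from urllib.parse import urlencode

BASE_URL = 'https://music.163.com/discover/playlist/'
PAGE_SIZE = 35  # playlists per index page, as used by the crawler loop


def build_index_url(offset, category='华语'):
    """Return the index page URL for one page (hypothetical helper).

    The default category is an assumption; urlencode percent-encodes it.
    """
    query = urlencode({
        'cat': category,
        'order': 'hot',
        'limit': PAGE_SIZE,
        'offset': offset,
    })
    return BASE_URL + '?' + query


# Offsets 0, 35, ..., 1295: one URL per index page
index_urls = [build_index_url(offset) for offset in range(0, 1330, PAGE_SIZE)]
print(len(index_urls))  # 38
print(index_urls[1])
```

Building the query with `urlencode` avoids stray spaces in the URL and handles the non-ASCII category name.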

Playlist details page

The playlist details page provides more information.

It includes the playlist name, number of favorites, number of comments, tags, introduction, total number of songs, play count, and the titles of the included songs.

The song duration, artist, and album information sits inside an iframe on the page.

If you need that information, you can use Selenium.

Getting the data

Playlist index page

from bs4 import BeautifulSoup
import requests
import time

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}

# Each index page shows 35 playlists; step the offset through all pages
for i in range(0, 1330, 35):
    print(i)
    # Pause between requests to avoid hammering the site
    time.sleep(2)
    # cat=华语 selects the Chinese-language category analyzed in this article
    # (the listing as published showed 欧美, the Europe and America category)
    url = 'https://music.163.com/discover/playlist/?cat=华语&order=hot&limit=35&offset=' + str(i)
    response = requests.get(url=url, headers=headers)
    html = response.text
    soup = BeautifulSoup(html, 'html.parser')
    # Get the tags that contain the playlist details page URL
    ids = soup.select('.dec a')
    # Get the tags that contain the playlist index page information
    lis = soup.select('#m-pl-container li')
    print(len(lis))
    for j in range(len(lis)):
        # Get the playlist details page address
        url = ids[j]['href']
        # Get the playlist title
        title = ids[j]['title']
        # Get the playlist play count
        play = lis[j].select('.nb')[0].get_text()
        # Get the playlist contributor name
        user = lis[j].select('p')[1].select('a')[0].get_text()
        # Output the playlist index page information
        print(url, title, play, user)
        # Append the information to a CSV file
        with open('playlist.csv', 'a+', encoding='utf-8-sig') as f:
            f.write(url + ',' + title + ',' + play + ',' + user + '\n')

The code above collects the playlist index page information.
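One thing to watch when writing rows by joining strings with commas: a title that itself contains a comma shifts the columns. A possible alternative, not what the original code does, is to let Python's csv module handle quoting. The helper names `append_playlists` and `read_playlists` and the demo rows below are made up for illustration.

```python
import csv
import os
import tempfile


def append_playlists(path, rows):
    """Append (url, title, play, user) rows, quoting fields when needed."""
    # newline='' lets the csv module control line endings itself
    with open(path, 'a+', newline='', encoding='utf-8-sig') as f:
        csv.writer(f).writerows(rows)


def read_playlists(path):
    """Read the rows back as lists of strings."""
    with open(path, newline='', encoding='utf-8-sig') as f:
        return list(csv.reader(f))


# Demo with made-up rows, written to a temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), 'playlist.csv')
append_playlists(demo_path, [('/playlist?id=1', 'Rock, pop and more', '12万', 'someone')])
append_playlists(demo_path, [('/playlist?id=2', 'Quiet evening', '3456', 'another user')])
rows = read_playlists(demo_path)
print(rows[0][1])  # Rock, pop and more
```

With quoting in place, the title survives the round trip as a single field even though it contains a comma.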

Playlist details page

Partial code:

from bs4 import BeautifulSoup
import pandas as pd
import requests
import time

# Load the index page data; malformed lines are skipped.
# On pandas 1.3 and later, use on_bad_lines='skip' instead of error_bad_lines=False.
df = pd.read_csv('playlist.csv', header=None, error_bad_lines=False, names=['url', 'title', 'play', 'user'])

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}

for i in df['url']:
    # Pause between requests to avoid hammering the site
    time.sleep(2)
    url = 'https://music.163.com' + i
    response = requests.get(url=url, headers=headers)
    html = response.text
    soup = BeautifulSoup(html, 'html.parser')
    # Get the playlist title, swapping ASCII commas for full-width ones
    # so they cannot break the CSV columns
    title = soup.select('h2')[0].get_text().replace(',', '，')
    # Get the tags
    tags = []
    tags_message = soup.select('.u-tag i')
    for p in tags_message:
        tags.append(p.get_text())
    # Format the tags as a single '-'-separated string
    if len(tags) > 1:
        tag = '-'.join(tags)
    else:
        tag = tags[0]
 

This yields the details of 1,302 Chinese-language playlists.

Data visualization

Top 10 songs

Of the ten songs on the list, Little F has listened to all of them many times except “Mercury”.

Top 10 playlist contributors

Top 10 playlists by play count

The top 10 playlists each have more than 70 million plays.
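Ranking by play count requires the scraped text to be numeric first. The sketch below assumes the play count is scraped as a string that may end in 万 (ten thousand), such as '7000万'. The helper `parse_play_count` and the sample rows are made up for illustration and are not the article's data.

```python
import heapq


def parse_play_count(text):
    """Convert a play count such as '7000万' or '35210' to an int."""
    text = text.strip()
    if text.endswith('万'):  # 万 means ten thousand
        return int(round(float(text[:-1]) * 10_000))
    return int(text)


# Made-up (title, play count) rows for illustration
playlists = [
    ('Playlist A', '7000万'),
    ('Playlist B', '35210'),
    ('Playlist C', '1.2万'),
]

# Pick the n most played playlists without sorting the whole list
top = heapq.nlargest(2, playlists, key=lambda item: parse_play_count(item[1]))
print([title for title, _ in top])  # ['Playlist A', 'Playlist B']
```

For the real ranking, the same key function could be used with `n=10` over all collected rows.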

Top 10 playlists by favorites

These are worth adding to your favorites.

Top 10 playlists by comments

The playlist “Goodbye Warrior: The death of Martial arts novel master Jin Yong” received the most comments.

Distribution of playlist favorites

Favorites mainly fall between 0 and 150,000 (ln(150000) ≈ 12).
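The logarithm quoted here can be checked quickly, and the same transform can be applied to a list of counts before plotting. This is a small sketch; the sample values are made up, and zero counts are filtered out because the natural log of 0 is undefined.

```python
import math

# Made-up favorites counts for illustration
favorites = [0, 120, 3500, 48000, 150000]

# Natural log of each positive count; zeros are skipped
log_favorites = [math.log(n) for n in favorites if n > 0]

print(round(math.log(150000), 2))  # 11.92
```

So 150,000 favorites corresponds to roughly 12 on a natural-log scale, which matches the figure quoted in the text.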

Distribution of playlist play counts

Play counts mainly fall between 0 and 10 million.

Playlist tag chart

Since Chinese-language playlists were selected, the tag 华语 (“Chinese”) naturally appears throughout.
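Tag frequencies for a chart like this can be counted once the tag strings are split again. The sketch below assumes each playlist's tags were stored as a single '-'-joined string, as in the details page code. The helper `count_tags` and the sample strings are made up for illustration.

```python
from collections import Counter


def count_tags(tag_strings):
    """Count how often each tag occurs across '-'-joined tag strings."""
    counts = Counter()
    for tag_string in tag_strings:
        # Skip empty pieces, e.g. from a playlist without tags
        counts.update(tag for tag in tag_string.split('-') if tag)
    return counts


# Made-up tag strings for illustration
tag_strings = ['华语-流行-怀旧', '华语-民谣', '华语']
tag_counts = count_tags(tag_strings)
print(tag_counts.most_common(1))  # [('华语', 3)]
```

The resulting counts can then be passed to whatever plotting code draws the tag chart.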

Word cloud of playlist introductions

This word cloud is built from the playlist introductions. I hope you can find a song you like.