This tool computes the overlap rate between your favorite songs and hers on NetEase Cloud Music, and lists the songs you both like.

Motivation

Have you ever seen a comment under a song wishing that NetEase Cloud Music offered such a feature? On reflection, once both song lists are available, computing the overlap rate should be quite simple. I wanted to try scraping a couple of pages, and I also wanted to put my own server to use. After a few days I had a preliminary version working, but… the problems I ran into are detailed later.

How to use

You can follow me directly at python.com or scan the QR code below

At the bottom of the official account's interface there is a “small tools” menu with a “NetEase Cloud Music playlist overlap rate” submenu.

Implementation

The implementation is divided into three steps:

  1. Get the list of songs in a playlist from the playlist ID, that is, which songs.

  2. From a user name, find that user's favorite-songs playlist, then use step 1 to get its song titles, that is, which user.

  3. Deploy to the web: the user enters the user names and the result is returned automatically.

Getting the list of song titles

Scrape a playlist, for example this one: http://music.163.com/playlist?id=17281445. Note that the # shown in the browser's address bar is left out of this link.
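The # matters because the part of a URL after # is a fragment that the browser keeps to itself and never sends to the server, so a plain HTTP client needs the link without it. A minimal sketch of that normalization, written in Python 3 (the post's own code is Python 2); the helper name strip_hash is hypothetical:

```python
def strip_hash(url):
    """Turn a browser-style link (containing '/#/') into one that can be fetched directly."""
    # Only the first '/#/' is removed; links without it are returned unchanged.
    return url.replace("/#/", "/", 1)


print(strip_hash("http://music.163.com/#/playlist?id=17281445"))
```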


Use BeautifulSoup to find the ul whose class is 'f-hide', then take the text of every a tag under that ul. These song titles are stored in a list; part of the code is as follows:

# link1 is the playlist link; headers has been constructed beforehand
session = requests.session()
soup = BeautifulSoup(session.get(link1, headers=headers).content, 'lxml')
# The song links live in the hidden <ul class="f-hide">
main = soup.find('ul', {'class': 'f-hide'})
for music in main.find_all('a'):
    lists1.append(music.text)
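To make the expected page structure concrete, here is a self-contained Python 3 sketch that extracts the same titles from a made-up HTML sample. It deliberately swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages; the names TitleParser, extract_titles and SAMPLE are hypothetical, and the sample markup is only an assumption about the shape of the real page.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of every <a> inside <ul class="f-hide">."""

    def __init__(self):
        super().__init__()
        self.in_list = False  # True while inside the target <ul>
        self.in_link = False  # True while inside an <a> within that <ul>
        self.titles = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ul" and "f-hide" in classes:
            self.in_list = True
        elif tag == "a" and self.in_list:
            self.in_link = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_list = False
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.titles[-1] += data


def extract_titles(html):
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


# Made-up markup with the same shape as the hidden song list.
SAMPLE = ('<ul class="f-hide">'
          '<li><a href="/song?id=1">Song A</a></li>'
          '<li><a href="/song?id=2">Song B</a></li>'
          '</ul><a href="/other">not a song</a>')

print(extract_titles(SAMPLE))
```

Links outside the f-hide list are ignored, which is the same filtering the find('ul', ...) call performs.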


Get the second list of song titles in the same way, then process both lists and compute the overlap rate. The relevant code is as follows:

# The regex replaces the u prefix in front of each unicode string with <br>,
# so that the text can be decoded for display and each song title starts on a new line.
# decode('unicode-escape') is also for display: it decodes the unicode escapes.
myset1 = set(lists1)
myset2 = set(lists2)
pattern = re.compile(r"\Wu'")
intersectionset = re.sub(pattern, "<br>'", str(myset1 & myset2))
length = len(myset1 | myset2)
print intersectionset
# 100.0 forces float division; with 100 Python 2 would truncate the rate to an integer
return (u"Your playlist overlap rate is: %f%%<br><br>%d songs in common, as follows: %s"
        % (len(myset1 & myset2) * 100.0 / length, len(myset1 & myset2), intersectionset.decode('unicode-escape')))
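The rate computed here is the size of the intersection divided by the size of the union of the two title sets. A minimal Python 3 sketch of the same calculation, under the assumption that titles are plain strings; the function names overlap and render are hypothetical, and joining the titles with <br> stands in for the regex on the set's string form:

```python
def overlap(titles_a, titles_b):
    """Return (rate in percent, sorted list of shared titles)."""
    a, b = set(titles_a), set(titles_b)
    common = a & b
    union = a | b
    # Guard against two empty playlists, which would otherwise divide by zero.
    rate = 100.0 * len(common) / len(union) if union else 0.0
    return rate, sorted(common)


def render(rate, common):
    """Format the result as the small HTML snippet the page returns."""
    return ("Your playlist overlap rate is: %.2f%%<br><br>%d songs in common:<br>%s"
            % (rate, len(common), "<br>".join(common)))


rate, common = overlap(["A", "B", "C"], ["B", "C", "D"])
print(render(rate, common))
```

With the sample lists, two of the four distinct titles are shared, so the rate is 50%.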


Get the playlist link based on the user name

A reminder: every playlist has an ID, and every user has a userID. In http://music.163.com/playlist?id=17281445 the number at the end is the playlist's unique id and everything before it is fixed, so how do we obtain that id? The user ID and the user's home page can be obtained from the NetEase Cloud Music search page, and the playlist link can be found on the home page. The content turned out to be loaded by JavaScript and I found no better way to fetch it, so I used PhantomJS and Selenium. Note the search URL constructed below: s is the search term, and type=1002 restricts the search to users.
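The two URL conventions just described can be sketched in Python 3 with the standard library alone. The helper names search_url and playlist_id are hypothetical; the URL pattern is the one used in the post.

```python
from urllib.parse import parse_qs, quote, urlparse

SEARCH_TEMPLATE = "http://music.163.com/#/search/m/?s={}&type=1002"


def search_url(username):
    """Build the user-search URL; quote() percent-encodes the name as UTF-8."""
    return SEARCH_TEMPLATE.format(quote(username))


def playlist_id(link):
    """Extract the id parameter from a playlist link, with or without the '#'."""
    parts = urlparse(link)
    # With the '#' form, the query string ends up inside the fragment.
    query = parts.query or urlparse(parts.fragment).query
    return parse_qs(query)["id"][0]


print(search_url("a b"))
print(playlist_id("http://music.163.com/playlist?id=17281445"))
```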

def get_playlist_by_name(username):
    # Search for the user name, find the element with class "ttc",
    # then take the a tag under it to get the user's home page.
    # quote() URL-encodes the Chinese user name.
    driver = None
    try:
        driver = webdriver.PhantomJS(executable_path="/usr/local/phantomjs/bin/phantomjs")
        driver.get('http://music.163.com/#/search/m/?s={}&type=1002'.format(quote(username.encode('utf8'))))
        # WebDriverWait(driver, 5, 0.3).until(EC.presence_of_element_located(locatorttc))
        driver.switch_to.frame("contentFrame")
        sleep(1)
        tr = driver.find_element_by_class_name('ttc')
        user = tr.find_element_by_tag_name('a')
        # Open the user's home page, get the link to the favorite-songs playlist and return it.
        driver.get(user.get_attribute('href'))
        # WebDriverWait(driver, 5, 0.3).until(EC.presence_of_element_located(locatordec))
        driver.switch_to.frame("contentFrame")
        sleep(1)
        dec = driver.find_element_by_class_name('dec')
        # print(dec.page_source)
        playlist = dec.find_element_by_tag_name('a')
        return playlist.get_attribute('href')
    except Exception as e:
        print e
        return ""
    finally:
        # driver stays None if PhantomJS failed to start.
        # quit() ends the PhantomJS process; close() only closes the current window.
        if driver is not None:
            driver.quit()
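The two commented-out WebDriverWait lines hint at an alternative to the fixed sleep(1): poll until the element is present, and give up after a timeout. The Python 3 sketch below shows that polling idea in a generic form. It is not Selenium's implementation; the helper name wait_until is hypothetical, and the defaults of 5 seconds and 0.3 seconds simply mirror the numbers in the commented-out lines.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.3):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


# Stand-in for a page that becomes ready on the third check.
attempts = []

def page_ready():
    attempts.append(1)
    return "ready" if len(attempts) >= 3 else None

print(wait_until(page_ready, timeout=1.0, interval=0.01))
```

In the scraper, the condition would be a lookup of the 'ttc' or 'dec' element that returns nothing while the frame is still loading.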

Search interface
User home page

Deploying to the Network

Port 80 is already in use on my server and I don’t want to add a port number to the link, so I configured nginx to proxy requests that arrive on port 80 for the subdomain to port 8081 on the server.

server {
    listen 80;
    server_name api.brainzou.com;

    location / {
        proxy_pass http://xxx.xxx.xxx.xxx:8081/;
    }

    location /buy {
        proxy_pass http://xxx.xxx.xxx.xxx:8081/;
    }

    error_page 500 502 503 504 /50x.html;

    location = /50x.html {
        root html;
    }
}


Next comes the actual web deployment. It uses web.py and is adapted from the official simple form example: read the values from the form, pass them to the get_playlist_by_name method, and finally return the data.

# -*- coding: utf-8 -*-
# filename: main.py
import web
from web import form
import Music163RepetitiveRate

render = web.template.render('templates/')
urls = ('/music163', 'index')
app = web.application(urls, globals())

myform = form.Form(
    form.Textbox("fname", form.notnull, description=u"User name 1"),
    form.Textbox("sname", form.notnull, description=u"User name 2"))

class index:
    def GET(self):
        web.header('Content-Type', 'text/html; charset=UTF-8')
        form = myform()
        print(form.render())
        return render.formtest(form)

    def POST(self):
        web.header('Content-Type', 'text/html; charset=UTF-8')
        form = myform()
        if not form.validates():
            print(form.render())
            return render.formtest(form)
        else:
            print "begin"
            # Look up each user's favorite-songs playlist, then compare the two
            playlist1 = Music163RepetitiveRate.get_playlist_by_name(form.d.fname)
            print playlist1
            playlist2 = Music163RepetitiveRate.get_playlist_by_name(form.d.sname)
            print playlist2
            content = Music163RepetitiveRate.repetitive_rate_by_playlistlink(playlist1, playlist2)
            return content

if __name__ == "__main__":
    web.config.debug = False
    web.internalerror = web.debugerror
    app.run()


The formtest.xml template is then placed under templates/.

$def with (form)

<div class="center">
<form name="main" method="post">
$if not form.valid: <p class="error">Please try again!</p>
$:form.render()
<input class="input" type="submit" />
</form>
<a>After submission, it takes about 20 seconds to collect and analyze the playlist data. Please wait patiently!</a>
</div>
<style>
.center {
    width: 500px;
    height: 500px;
    position: absolute;
    left: 50%;
    top: 50%;
    margin-left: -100px;
    margin-top: -100px;
}
.input {
    width: 100px;
    margin-left: 100px;
}
</style>


Finally, explicitly tell Python to use UTF-8 encoding, run the program in the background with port 8081 specified, and remember to open port 8081 on the server.
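UTF-8 has to be specified because text is encoded with the output stream's encoding at the moment it is written, and an ASCII stream cannot represent Chinese song titles. The Python 3 sketch below reproduces that failure on an in-memory stream; the helper name write_text is hypothetical. For a real process, one way to choose the stream encoding is the PYTHONIOENCODING environment variable, though the post does not say which mechanism was actually used.

```python
import io


def write_text(text, encoding):
    """Write text to an in-memory stream that uses `encoding`; return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


title = "\u6b4c\u5355"  # the Chinese word for "playlist"

print(write_text(title, "utf-8"))  # works: UTF-8 can represent the characters
try:
    write_text(title, "ascii")
except UnicodeEncodeError as exc:
    # Same class of error that an ASCII stdout produces for Chinese text
    print("ascii stream failed:", exc.reason)
```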

Problems encountered

  1. PhantomJS was not shut down after use, which caused a lot of hard-to-describe problems later on.

  2. Encoding problems. These deserve a closer look, because they show up in many places: get, post, set, return. Even the final launch of main.py needs UTF-8 to be specified; a plain python main.py 8081 is not enough. (Python 2’s default encoding is ASCII. Under normal circumstances, the encoding Python 2 uses when printing unicode is not the default encoding reported by sys.getdefaultencoding(), but the one set in sys.stdout.encoding.)

  3. The server (mine is hosted at Tencent) needs port 8081 opened; it is closed by default. Then turn off the firewall.

  4. At first I wanted to hook directly into the message interface of the WeChat official account. Only after the integration was finished did I discover the difficulty: a reply has to be returned to the official account within 5 s, otherwise the customer service interface must be used to return the data asynchronously. My personal official account cannot use the customer service interface, so I gave up and switched to a web form.

  5. If the playlists contain many songs, say 1000-2000, the overlap rate is usually fairly low. I would like to improve the algorithm, but I have not come up with a good one yet.
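On the last point, one reason the rate drops for large playlists is that the denominator is the union of both lists, which grows with every song that only one person has. One possible alternative, offered as a suggestion and not as the author's method, is the overlap coefficient, which divides by the size of the smaller list instead. A minimal Python 3 sketch with made-up data; the function names are hypothetical:

```python
def jaccard(titles_a, titles_b):
    """Intersection over union: the measure used in the post."""
    a, b = set(titles_a), set(titles_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(titles_a, titles_b):
    """Intersection over the size of the smaller set."""
    a, b = set(titles_a), set(titles_b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Made-up playlists: 100 songs versus 1000 songs, with 50 songs shared.
small = ["song%d" % i for i in range(100)]
large = ["song%d" % i for i in range(50, 1050)]

print("jaccard: %.3f" % jaccard(small, large))
print("overlap coefficient: %.3f" % overlap_coefficient(small, large))
```

Here half of the smaller playlist is shared, yet the union-based rate stays under 5% because the larger playlist dominates the denominator.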

Usage

Enter the user names
The returned result

After running many comparisons myself, I found that about 10% is already relatively high, and that the more songs the playlists contain, the lower the overlap rate generally is.

You are welcome to follow the WeChat official account BrainZou and learn together. Reply “information” to get nearly 1 TB of carefully collected cloud-drive resources on Python, Java, Android, mini programs, back end, algorithms and more, shared for free.