Simple music recommendation system

This article presents the ideas behind, and implementations of, two simple traditional music recommendation systems for next-song recommendation. The mathematical principles and machine learning background are omitted.

Only the ideas and key code are shown below; for the complete implementation, see github.com/cdfmlr/mure…

1. Based on audio features

Analyze audio features and perform content-based filtering (CBF).

1.1 Design Roadmap

Someone who likes Bach might also like Chopin, so a natural idea is to have a machine learn from audio and try to tell apart different genres and styles of music. Given a song, we feed it into the trained model and recommend the other songs closest to it in style.

song-classification.ipynb implements the training of this model.

The genres dataset from the University of Victoria in Canada (opihi.cs.uvic.ca/sound/genre…) provides a variety of well-annotated music clips.

$ ls genres
blues     country   hiphop    metal     reggae
classical disco     jazz      pop       rock

We convert these clips into Mel spectrograms using the librosa library.

(Figure: average spectrum of the hip-hop clips in the dataset)

The spectrograms are fed into a neural network made of stacked one-dimensional convolution and pooling layers followed by a fully connected classification head. The trained model is a music genre detector.

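# Imports assume TensorFlow's bundled Keras
from tensorflow.keras.layers import (BatchNormalization, Conv1D, Dense, Dropout,
                                     GlobalMaxPooling1D, Input, MaxPooling1D)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD


def cnn_model(input_shape):
    inputs = Input(input_shape)
    x = inputs

    # Stacked 1D convolution + pooling blocks (64, 128, 256 filters)
    levels = 64
    for level in range(3):
        x = Conv1D(levels, 3, activation='relu')(x)
        x = BatchNormalization()(x)
        x = MaxPooling1D(pool_size=2, strides=2)(x)
        levels *= 2

    # x -> shape (256,): one value per channel of the last conv block
    x = GlobalMaxPooling1D()(x)

    # Fully connected layers that compute the genre labels
    for fc in range(2):
        x = Dense(256, activation='relu')(x)
        x = Dropout(0.5)(x)

    labels = Dense(10, activation='softmax')(x)

    model = Model(inputs=[inputs], outputs=[labels])

    # Optimizer and model compilation.
    # Note: `decay` is accepted by older Keras optimizers; newer releases
    # may reject it and expect a learning-rate schedule instead.
    sgd = SGD(learning_rate=0.0003, momentum=0.9, decay=1e-5, nesterov=True)
    model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])

    return model


model = cnn_model((128, 128))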
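To see how a (128, 128) spectrogram segment ends up as a fixed-length vector, the layer shapes can be traced by hand. The sketch below is plain Python, not Keras. It assumes the Keras defaults for `Conv1D` and `MaxPooling1D` ('valid' padding, convolution stride 1), and the helper names are made up for illustration.

```python
def conv1d_valid_len(n, kernel_size=3):
    """Output length of a 1D convolution with 'valid' padding and stride 1."""
    return n - kernel_size + 1


def pool1d_len(n, pool_size=2, strides=2):
    """Output length of 1D max pooling with 'valid' padding."""
    return (n - pool_size) // strides + 1


def trace_shapes(time_steps=128, n_mels=128, first_filters=64, n_levels=3):
    """Return the (time_steps, channels) shape after each conv + pool level."""
    shapes = [(time_steps, n_mels)]
    filters = first_filters
    for _ in range(n_levels):
        time_steps = pool1d_len(conv1d_valid_len(time_steps))
        shapes.append((time_steps, filters))
        filters *= 2  # mirrors `levels *= 2` in cnn_model
    return shapes


shapes = trace_shapes()
# Global max pooling keeps one value per channel, so the feature size
# equals the channel count of the last level.
feature_dim = shapes[-1][1]
print(shapes, feature_dim)
```

Under these assumptions the time axis shrinks from 128 to 14 steps while the channel count grows to 256, so global max pooling yields a 256-dimensional vector per segment.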

The trained model, song_classify.h5, classifies genres with distinctive characteristics (such as classical music) well, but performs poorly on genres with fuzzier boundaries (such as rock).

(Confusion matrix of classification results)

Using this model, similar-music recommendation is implemented in index-local-mp3s.ipynb.

Specifically, a small dataset is assembled by hand: a selection of music I often listen to, converted into MP3 files of the same quality.

(Type of music selected)

These files are traversed and processed to extract their Mel spectrograms.

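import librosa
import numpy as np
from tqdm import tqdm


def process_mp3(path):
    # Load 30 seconds of audio, starting 30 seconds into the file
    signal, sr = librosa.load(path,
                              res_type="kaiser_fast",
                              offset=30,
                              duration=30)
    # Transpose to (frames, mel bands) and keep the first 1280 frames
    melspec = librosa.feature.melspectrogram(y=signal, sr=sr).T[:1280, ]
    if len(melspec) != 1280:
        return None  # clip too short, skip it
    # Split into 10 segments of 128 frames each
    return {'path': path,
            'melspecs': np.asarray(np.split(melspec, 10))}

# Index the spectrogram segments of every MP3
songs = [process_mp3(path) for path in tqdm(mp3s)]
songs = [song for song in songs if song]

# Concatenate all segments so they can be processed in one batch
inputs = []
for song in songs:
    inputs.extend(song['melspecs'])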
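The numbers 1280 and 10 in `process_mp3` follow from the frame rate of the spectrogram. The arithmetic below is a rough sketch that assumes librosa's defaults (a 22050 Hz sample rate when `sr` is not passed to `librosa.load`, and a hop length of 512 samples with a centered STFT). The constant and function names are made up for illustration.

```python
SAMPLE_RATE = 22050   # librosa.load default when sr is not given (assumed)
HOP_LENGTH = 512      # librosa.feature.melspectrogram default (assumed)
FRAMES_KEPT = 1280    # from process_mp3
SEGMENTS = 10         # from process_mp3


def n_frames(duration_s, sr=SAMPLE_RATE, hop=HOP_LENGTH):
    """Number of spectrogram frames for a centered STFT of the given duration."""
    return 1 + int(duration_s * sr) // hop


frames_30s = n_frames(30)                      # frames in a full 30 s excerpt
segment_frames = FRAMES_KEPT // SEGMENTS       # frames per segment
segment_seconds = segment_frames * HOP_LENGTH / SAMPLE_RATE
print(frames_30s, segment_frames, round(segment_seconds, 2))
```

With these defaults a 30-second excerpt gives slightly more than 1280 frames, so the slice keeps almost all of it, and each of the 10 segments covers roughly 3 seconds of audio. A file that cannot supply a full excerpt after the 30-second offset produces fewer than 1280 frames and is skipped.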

The pre-processed data set is then fed into the trained model.

Since we only need audio features rather than a classification, the model is truncated before the final layers of its classification head (at layers[-4]), keeping the convolutional feature extractor in front. Given a spectrogram segment, the truncated model outputs a 256-dimensional vector that serves as the “feature vector” of the music.

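import numpy as np
from tensorflow.keras.models import Model, load_model

cnn_model = load_model('song_classify.h5')
# Cut the network off before the last layers of the classification head
vectorize_model = Model(inputs=cnn_model.input,
                        outputs=cnn_model.layers[-4].output)
# Stack the list of segments into one array of shape (n_segments, 128, 128)
vectors = vectorize_model.predict(np.asarray(inputs))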

Build an unsupervised nearest neighbor model and calculate the similarity of these feature vectors, i.e. the similarity of the MP3 songs they represent.

from collections import Counter

from sklearn.neighbors import NearestNeighbors

nbrs = NearestNeighbors(
    n_neighbors=10, algorithm='ball_tree'
).fit(vectors)

def most_similar_songs(song_idx):
    # Query the neighbors of all 10 segments of the song
    distances, indices = nbrs.kneighbors(
        vectors[song_idx * 10: song_idx * 10 + 10])
    c = Counter()
    for row in indices:
        for idx in row[1:]:       # row[0] is the segment itself
            c[idx // 10] += 1     # segment index -> song index
    return c.most_common()

def print_similar_songs(song_idx, start=1, end=6):
    print("Designated song:", song_name(song_idx))
    for idx, score in most_similar_songs(song_idx)[start:end]:
        print(f"[similarity {score}] {song_name(idx)}")
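The voting scheme in `most_similar_songs` can be tried without scikit-learn or a trained model. The following sketch swaps in a brute-force Euclidean search and made-up 2-D vectors, with 2 segments per song instead of 10. All names and data here are illustrative assumptions.

```python
import math
from collections import Counter

SEGMENTS_PER_SONG = 2  # the article uses 10 segments per song


def k_nearest(vectors, query, k):
    """Brute-force k nearest neighbours by Euclidean distance."""
    order = sorted(range(len(vectors)),
                   key=lambda i: math.dist(vectors[i], query))
    return order[:k]


def most_similar_songs(vectors, song_idx, k=3):
    """Let every segment of a song vote for the songs its neighbours belong to."""
    votes = Counter()
    start = song_idx * SEGMENTS_PER_SONG
    for seg in range(start, start + SEGMENTS_PER_SONG):
        # The nearest hit is the segment itself (distance 0), so skip it
        for idx in k_nearest(vectors, vectors[seg], k)[1:]:
            votes[idx // SEGMENTS_PER_SONG] += 1
    return votes.most_common()


# Toy "feature vectors": songs 0 and 1 are close, song 2 is far away
vectors = [
    (0.0, 0.0), (0.1, 0.0),   # song 0
    (0.2, 0.1), (0.3, 0.1),   # song 1
    (5.0, 5.0), (5.1, 5.0),   # song 2
]
ranking = most_similar_songs(vectors, 0)
print(ranking)
```

The queried song collects votes from its own other segments, so it shows up in its own ranking. That is presumably why `print_similar_songs` starts at index 1 by default.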

Finally, given a song, you can find the closest songs from the nearest neighbor model.

(Example of recommended results)

The model works reasonably well. As with classification, though, it does poorly on rock music.

1.2 Advantages and disadvantages of the model

This is the approach I prefer: it is based on the characteristics of the music itself rather than on past user data, and it is not restricted to a fixed track list. With the help of the trained classifier network, recommendations can be made for any previously unseen audio.

However, the full audio has to be processed, and spectral analysis is computationally expensive. Being able to recommend only locally stored tracks can also be seen as a limitation.

This model is suitable for music recommendation on offline devices.

1.3 Room for Improvement

  1. The genres dataset used to train the classifier is of very high quality, but not very large. Training on more data might yield a better model;
  2. The structure of the classifier network is rather rough and could be studied and tuned further. For example, consider transfer learning from a pre-trained NLP model, which may be more sensitive;
  3. Consider building multi-input models (or using multiple models) that take in additional data not easily derived from the spectrum, such as song metadata (song title, artist, album, frequent, etc.) and lyrics.

2. Based on existing playlist data

Collaborative filtering (CF) based on past data from other users.

This line of thinking is actually more common. Collect playlists built by people, define some measure of distance between tracks, and, given a song, recommend the closest ones.

2.1 Obtaining Data

In spotify-playlist.ipynb, Spotify’s API is used to randomly fetch playlists and tracks (metadata only, no audio download).

However, this method requires a large amount of data (hundreds of thousands of songs), and with a constrained network and database environment the Python implementation was not stable enough to get the job done. The implementation was therefore rewritten in Go, in the spotify/ subdirectory, to provide a more robust data retrieval service. The retrieved data is stored in an SQLite database.

(Acquired playlist and track data)

So far it has collected several GiB of data: nearly 5 million tracks by about 800,000 artists across more than 170,000 playlists.

sqlite> select count(*) from playlists;
177889
sqlite> select count(*) from artists;
801357
sqlite> select count(*) from tracks;
4995249

There are two ways to take advantage of this data:

2.2 Word2vec

In train-a-music-recommender.ipynb, each song is treated as a word and each playlist containing it as a sentence:

sentences = [
    ["track_1_id", "track_2_id", ...],  # playlist_1
    [...],                              # playlist_2
    ...
]

A Word2Vec model is trained on this corpus.

import gensim

# Tracks that occur fewer than 4 times are ignored (min_count=4)
model = gensim.models.Word2Vec(
    sentences=PlaylistTracksIter(DB), min_count=4)
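The article does not show `PlaylistTracksIter`. One possible shape for it is sketched below: a re-iterable object that streams one playlist at a time from SQLite, so that the whole corpus never has to sit in memory and gensim can make several passes over it. The table name `playlist_tracks` and its columns are hypothetical and are not the schema of the actual database.

```python
import sqlite3


class PlaylistTracksIter:
    """Re-iterable stream of playlists, each yielded as a list of track IDs."""

    def __init__(self, conn):
        self.conn = conn

    def __iter__(self):
        # Hypothetical schema: playlist_tracks(playlist_id, track_id, position)
        rows = self.conn.execute(
            "SELECT playlist_id, track_id FROM playlist_tracks "
            "ORDER BY playlist_id, position")
        current_id, sentence = None, []
        for playlist_id, track_id in rows:
            # A new playlist ID closes the sentence collected so far
            if playlist_id != current_id and sentence:
                yield sentence
                sentence = []
            current_id = playlist_id
            sentence.append(track_id)
        if sentence:
            yield sentence


# Tiny in-memory database standing in for the real one
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE playlist_tracks (playlist_id TEXT, track_id TEXT, position INTEGER)")
conn.executemany(
    "INSERT INTO playlist_tracks VALUES (?, ?, ?)",
    [("p1", "t1", 0), ("p1", "t2", 1),
     ("p2", "t2", 0), ("p2", "t3", 1), ("p2", "t1", 2)])

corpus = PlaylistTracksIter(conn)
sentences = list(corpus)
print(sentences)
```

Because `__iter__` starts a fresh query on every call, the same object can be iterated repeatedly, which a plain generator would not allow.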

After training, the closest recommendation can be obtained for a given track.

from operator import itemgetter


def suggest_songs(song_id):
    similar = dict(model.wv.most_similar([song_id]))
    placeholders = ', '.join('?' for _ in similar)

    c = conn.cursor()
    # Bind the IDs as parameters instead of formatting them into the SQL string
    c.execute("SELECT * FROM tracks WHERE id IN (%s)" % placeholders,
              list(similar))

    # rec[4] is used as the track ID. Each result tuple ends with
    # (similarity, artists), so sort on the second-to-last field.
    res = sorted((rec + (similar[rec[4]], find_artists(rec[4]))
                  for rec in c.fetchall()),
                 key=itemgetter(-2),
                 reverse=True)
    return suggest_songs_result([*res])

def suggest_from(song_name: str):
    s = find_song(song_name, limit=1)
    return s + suggest_songs(s[0]["id"])
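gensim's `most_similar` ranks candidates by the cosine similarity between embedding vectors. The sketch below reproduces that ranking in plain Python on made-up 2-D embeddings (real Word2Vec vectors have 100 dimensions by default). The function names and data are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(embeddings, track_id, topn=2):
    """Return the topn other tracks ranked by cosine similarity."""
    query = embeddings[track_id]
    scores = [(other, cosine_similarity(query, vec))
              for other, vec in embeddings.items() if other != track_id]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Made-up embeddings: track_b points almost the same way as track_a
embeddings = {
    "track_a": (1.0, 0.0),
    "track_b": (0.9, 0.1),
    "track_c": (0.0, 1.0),
    "track_d": (-1.0, 0.0),
}
similar = most_similar(embeddings, "track_a")
print(similar)
```

Tracks that tend to appear in the same playlists end up with vectors pointing in similar directions, which is what makes the nearest vectors usable as recommendations.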

This model works, but the results are not particularly good.

(Recommended example of Word2Vec model)

2.3 Surprise KNNBaseline

In surprise.ipynb, each song is treated as an item and each playlist as a user. If a playlist contains a song, that user is considered to have rated that item (rating=1).

The processed dataset is then handed to Surprise for basic collaborative filtering.

from surprise import KNNBaseline
from surprise import Reader, Dataset

# Custom dataset: ratings lie in the range 0 to 1
reader = Reader(rating_scale=(0, 1))
train_data = Dataset.load_from_df(
    pt_train[['userID', 'itemID', 'rating']],
    reader)
trainset = train_data.build_full_trainset()

# Compute similarities between items rather than between users
sim_options = {
    'user_based': False
}

# Algorithm and training
algo = KNNBaseline(sim_options=sim_options)
algo.fit(trainset)
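To make the item-based idea concrete without Surprise, here is a small sketch that ranks tracks by how much their sets of playlists overlap. It uses Jaccard similarity, which is a swapped-in illustrative measure and not what `KNNBaseline` computes internally. The data and function names are made up.

```python
from collections import defaultdict


def item_playlists(ratings):
    """Map each track to the set of playlists ("users") that contain it."""
    index = defaultdict(set)
    for playlist_id, track_id in ratings:
        index[track_id].add(playlist_id)
    return index


def jaccard_neighbors(index, track_id, k=5):
    """Rank other tracks by Jaccard similarity of their playlist sets."""
    target = index[track_id]
    scored = []
    for other, playlists in index.items():
        if other == track_id:
            continue
        overlap = len(target & playlists)
        if overlap:  # ignore tracks that never share a playlist
            scored.append((other, overlap / len(target | playlists)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy implicit feedback: each (playlist, track) pair means rating = 1
ratings = [
    ("p1", "t1"), ("p1", "t2"), ("p1", "t3"),
    ("p2", "t1"), ("p2", "t2"),
    ("p3", "t3"), ("p3", "t4"),
]
neighbors = jaccard_neighbors(item_playlists(ratings), "t1")
print(neighbors)
```

Here t2 shares every playlist with t1 and ranks first, t3 shares one of three and ranks second, and t4 never co-occurs with t1 and is left out.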

This also yields a KNN model, from which the nearest recommendations for a given song can be retrieved.

def find_sim(track_id, k=5):
    # Neighbors are returned as inner IDs of the training set
    sim = algo.get_neighbors(
        iid=algo.trainset.to_inner_iid(track_id), k=k)

    # Convert back to raw track IDs, with the input track first
    track_ids = [track_id] + list(
        map(algo.trainset.to_raw_iid, sim))

    tracks = []
    c = conn.cursor()
    for tid in track_ids:
        # Bind the ID as a parameter instead of formatting it into the SQL
        c.execute("SELECT * FROM tracks WHERE id = ?", (tid,))
        tk = c.fetchall()[0]
        tracks.append(tk + (find_artists(tid),))
    c.close()

    return sim_result(tracks)

This works out pretty well.

(Shout Baby is the input song; the 5 songs below it are the recommendations.)

2.4 Advantages and disadvantages of the model

This approach is the traditional analysis of past user data. It is the more conventional way to build a recommendation system, and the technique is relatively mature. Given massive amounts of data, it can produce better recommendations.

However, processing large datasets can be slow, and the memory overhead is more than end-user devices can afford. In addition, nearest-neighbor recommendations based on such data can easily trap users in an information cocoon, which is unhealthy.

This approach is suitable for music recommendation in the cloud.

2.5 Room for Improvement

  1. Algorithm: the current implementation uses the most basic baseline algorithm; other algorithms are worth trying.
  2. Data: more data will almost certainly give this model better results.
  3. Consider collecting NetEase Cloud Music data, which may work better: it offers localized music, comments, popularity, and playlist tags and categories, so recommendations could be made with a more comprehensive model.

References

[1] Douwe Osinga. Deep Learning Cookbook[M]. O’Reilly, 2018: 210-227.

[2] Nicolas Hug. Surprise: A Python library for recommender systems[J]. Journal of Open Source Software, 2020, 5(52): 2174.