
Machine Learning Assignment 8 — Anomaly Detection and Recommender Systems (Python implementation)

Introduction

In the first experiment, an anomaly detection algorithm is implemented and applied to detect faulty servers on a network. In the second experiment, collaborative filtering is used to build a movie recommendation system.

1 Anomaly Detection

In this experiment, an anomaly detection algorithm is implemented to detect anomalous behavior in server computers. The features measure the throughput (MB/s) and response latency (ms) of each server. While the servers were operating, $m=307$ samples of their behavior were collected, giving an unlabeled dataset $\lbrace x^{(1)}, x^{(2)}, \ldots, x^{(m)}\rbrace$. The vast majority of these samples are normal, but a small fraction are anomalous.

A Gaussian model is used to detect anomalous samples in the dataset. First, a 2D dataset is used to visualize how the algorithm works: a Gaussian distribution is fitted to the data, then a low-probability value is found and taken as the threshold; any sample whose probability falls below it is considered an anomaly. To generalize, the anomaly detection algorithm is then applied to a dataset with many dimensions.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.io import loadmat
data=loadmat("./data/ex8data1.mat")
data.keys()
dict_keys(['__header__', '__version__', '__globals__', 'X', 'Xval', 'yval'])
X=data['X']
Xval,yval=data['Xval'],data['yval']
X.shape,Xval.shape,yval.shape
Copy the code
((307, 2), (307, 2), (307, 1))
def plot_data(X):
    plt.figure(figsize=(8,6))
    plt.plot(X[:,0],X[:,1],'x',color="blue")
    plt.xlabel("Latency (ms)")
    plt.ylabel("Throughput (mb/s)")
    plt.title("The first dataset")
plot_data(X)



1.1 Gaussian Distribution

To perform anomaly detection, a model must first be fitted to the distribution of the data. Given a training set $\lbrace x^{(1)},\ldots,x^{(m)}\rbrace$ with $x^{(i)}\in \mathbb{R}^n$, estimate a Gaussian distribution for each feature $x_i$ ($i \in [1,n]$), obtaining parameters $\mu_i, \sigma_i^2$ that fit the data. The Gaussian density is

$p(x;\mu,\sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$

where $\mu$ is the mean and $\sigma^2$ controls the variance.

1.2 Estimating Parameters for a Gaussian

Estimate the parameters $(\mu_i,\sigma_i^2)$ for each feature:

$\mu_i=\frac{1}{m}\sum_{j=1}^{m}x_i^{(j)}, \qquad \sigma_i^2=\frac{1}{m}\sum_{j=1}^{m}\left(x_i^{(j)}-\mu_i\right)^2$

Write the function estimateGaussian to take the data matrix X as input and output an $n$-dimensional vector $\mu$ (the mean of each of the $n$ features) and another $n$-dimensional vector $\sigma^2$ (the variance of each of the $n$ features).

Once this function is written, the contours of the fitted Gaussian distribution can be visualized. Most samples lie in the region with the highest probability, while the anomalous samples lie in regions of low probability.

def estimateGaussian(X):
    mu=X.mean(axis=0)
    sigma=X.var(axis=0)
    return mu,sigma
mu,sigma=estimateGaussian(X)
mu,sigma
(array([14.11222578, 14.99771051]), array([1.83263141, 1.70974533]))
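One detail worth noting (my observation, not part of the original exercise): NumPy's var uses ddof=0 by default, i.e. it divides by $m$, which is exactly the maximum-likelihood estimator the exercise expects; pandas' var, by contrast, defaults to ddof=1 (dividing by $m-1$). A quick sanity check:

# np.var(axis=0) matches the 1/m maximum-likelihood variance
assert np.allclose(X.var(axis=0), ((X - X.mean(axis=0)) ** 2).mean(axis=0))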

Calculate the probability of each sample under the Gaussian model with parameters $\mu_i,\sigma_i^2$:

def gaussian(X,mu,sigma):
    # X: (m,n); returns the multivariate Gaussian density of each row of X
    m,n=X.shape
    if(np.ndim(sigma)==1):
        sigma=np.diag(sigma)    # treat a variance vector as a diagonal covariance matrix
    left=1.0/(np.power(2*np.pi,n/2)*np.sqrt(np.linalg.det(sigma)))
    right=np.zeros((m,1))
    for row in range(m):
        x=X[row]
        right[row]=np.exp(-0.5*(x-mu).T@np.linalg.inv(sigma)@(x-mu))
    return left*right
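The row-by-row loop above is easy to read but slow for large $m$. For reference, a vectorized sketch of the same density computation (an alternative I'm adding, not the version used below; the name gaussian_vectorized is hypothetical):

def gaussian_vectorized(X,mu,sigma):
    # same multivariate Gaussian density as gaussian(), without the Python loop
    m,n=X.shape
    if np.ndim(sigma)==1:
        sigma=np.diag(sigma)
    diff=X-mu                                                  # (m,n) deviations from the mean
    expo=-0.5*np.sum(diff@np.linalg.inv(sigma)*diff,axis=1)    # row-wise quadratic form
    coef=1.0/(np.power(2*np.pi,n/2)*np.sqrt(np.linalg.det(sigma)))
    return (coef*np.exp(expo)).reshape(-1,1)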

Visualize data:

def plotContours(fig=None,ax=None):    # draw the contour lines of the fitted Gaussian
    xplot=np.linspace(0,25,100)
    yplot=np.linspace(0,25,100)
    Xplot,Yplot=np.meshgrid(xplot,yplot)    # 100*100 grid
    points=np.c_[Xplot.flatten(),Yplot.flatten()]
    Z=gaussian(points,mu,sigma).reshape(Xplot.shape)
    contour=ax.contour(Xplot,Yplot,Z,[10**-11,10**-7,10**-5,10**-3,0.1],colors='k')    # draw contour lines
    ax.plot(X[:,0],X[:,1],'x',color="blue")
    ax.set_xlabel("Latency (ms)")
    ax.set_ylabel("Throughput (mb/s)")
    ax.set_title("The Gaussian distribution contours of the distribution fit to the dataset")
fig,ax=plt.subplots(figsize=(8,6))
plotContours(fig,ax)
plt.show()



From the figure above, it is clear which samples have high probability and which have low probability; the low-probability samples are most likely the outliers.

1.3 Selecting the Threshold $\epsilon$

Now that the Gaussian parameters have been estimated, we can examine which samples have very high probability under the resulting distribution and which have very low probability; the low-probability samples are the likely outliers in the dataset. To determine which samples are outliers, a threshold is selected using a cross-validation set. In this section, an algorithm is implemented to select the threshold $\epsilon$ using the $F_1$ score on the cross-validation set.

Write selectThreshold: use the cross-validation set $\lbrace (x_{cv}^{(1)},y_{cv}^{(1)}),\ldots,(x_{cv}^{(m_{cv})},y_{cv}^{(m_{cv})})\rbrace$, where $y=1$ marks an anomalous sample and $y=0$ a normal one. For each cross-validation sample, compute $p(x_{cv}^{(i)})$. The vector of probabilities $p(x_{cv}^{(1)}),\ldots,p(x_{cv}^{(m_{cv})})$ is passed to the function selectThreshold as the argument pval, and the labels $y_{cv}^{(1)},\ldots,y_{cv}^{(m_{cv})}$ as the argument yval. The selectThreshold function returns two values: the threshold $\epsilon$ (a sample $x$ is flagged as anomalous when $p(x)<\epsilon$) and the corresponding $F_1$ score.

The $F_1$ score is computed from the precision ($prec$) and recall ($rec$): $F_1=\frac{2\times prec \times rec}{prec+rec}$

where $prec=\frac{tp}{tp+fp}$ and $rec=\frac{tp}{tp+fn}$:

  • tp: a true anomaly that the model also predicts as an anomaly (a true positive)
  • fp: a normal sample that the model predicts as an anomaly (a false positive)
  • fn: a true anomaly that the model predicts as normal (a false negative)
  • precision: of the samples predicted to be anomalies, the fraction that really are anomalies
  • recall: of the true anomalies, the fraction successfully predicted as anomalies
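A quick worked example of these definitions (numbers invented for illustration): if a threshold yields $tp=5$, $fp=2$, $fn=3$, then $prec=\frac{5}{7}\approx 0.714$, $rec=\frac{5}{8}=0.625$, and $F_1=\frac{2\times 0.714\times 0.625}{0.714+0.625}\approx 0.667$.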

SciPy's built-in stats module can be used to compute the probability density of a data point under a normal distribution:

from scipy import stats
dist=stats.norm(mu[0],sigma[0])
dist.pdf(15)
0.1935875044615038

Passing an array to the probability density function gives the density of each point in the dataset:

dist.pdf(X[:,0])[0:10]

array([0.183842  , 0.20221694, 0.21746136, 0.19778763, 0.20858956, 0.21652359, 0.16991291, 0.15123542, 0.1163989 , 0.1594734 ])

Compute and store the probability density of every value in the dataset under the Gaussian model parameters above:

p=np.zeros((X.shape))
p[:,0]=stats.norm(mu[0],sigma[0]).pdf(X[:,0])
p[:,1]=stats.norm(mu[1],sigma[1]).pdf(X[:,1])
p.shape
(307, 2)

Do the same for the validation set:

pval=np.zeros((Xval.shape))
pval[:,0]=stats.norm(mu[0],sigma[0]).pdf(Xval[:,0])
pval[:,1]=stats.norm(mu[1],sigma[1]).pdf(Xval[:,1])
pval.shape
(307, 2)

Write the function selectThreshold: try many candidate values of $\epsilon$ and compute the resulting $F_1$ score for each ($F_1$ is a function of the numbers of true positives, false positives, and false negatives), keeping the threshold with the best score for the given probability densities and true labels. Running it on this dataset, the best $\epsilon$ obtained is about 8.99e-05.

def selectThreshold(pval,yval):
    best_e,best_f1,f1=0,0,0
    step=(pval.max()-pval.min())/1000
    for e in np.arange(pval.min(),pval.max(),step):
        preds=pval<e
        tp=np.sum(np.logical_and(preds==1,yval==1))
        fp=np.sum(np.logical_and(preds==1,yval==0))
        fn=np.sum(np.logical_and(preds==0,yval==1))
        precision=tp/(tp+fp)
        recall=tp/(tp+fn)
        f1=(2*precision*recall)/(precision+recall)
        if(f1>best_f1):
            best_f1=f1
            best_e=e
    return best_e,best_f1
epsilon, f1 = selectThreshold(pval, yval)
epsilon, f1
<ipython-input-92-c5a1372ada03>:9: RuntimeWarning: invalid value encountered in long_scalars
  precision=tp/(tp+fp)

(0.009566706005956842, 0.7142857142857143)
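The RuntimeWarning appears because, for the smallest candidate thresholds, no sample satisfies pval < e, so tp+fp (and tp+fn) are zero and the divisions produce nan; the f1 > best_f1 comparison then silently ignores those thresholds, so the final answer is still usable. A minimal defensive variant (my sketch; the name selectThresholdSafe is hypothetical):

def selectThresholdSafe(pval,yval):
    # same search as selectThreshold, but skip degenerate thresholds
    best_e,best_f1=0,0
    step=(pval.max()-pval.min())/1000
    for e in np.arange(pval.min(),pval.max(),step):
        preds=pval<e
        tp=np.sum(np.logical_and(preds==1,yval==1))
        fp=np.sum(np.logical_and(preds==1,yval==0))
        fn=np.sum(np.logical_and(preds==0,yval==1))
        if tp+fp==0 or tp+fn==0:    # no predicted anomalies, or no true anomalies hit
            continue
        precision=tp/(tp+fp)
        recall=tp/(tp+fn)
        f1=(2*precision*recall)/(precision+recall)
        if f1>best_f1:
            best_f1,best_e=f1,e
    return best_e,best_f1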

Apply the resulting optimal threshold to the data set and visualize the results:

First, inspect the predicted anomalous entries: the first array holds the sample (row) indices, and the second the feature (column) index, 0 for the x-feature (latency) and 1 for the y-feature (throughput).

out_index=np.where(p<epsilon)
out_index
(array([300, 301, 301, 303, 303, 304, 306, 306], dtype=int64),
 array([1, 0, 1, 0, 1, 0, 0, 1], dtype=int64))

In the figure below, the red circles mark the points classified as anomalies, but some anomalous samples are missed (left unmarked); the point in the upper right corner, for example, may well be an anomaly too. This best $\epsilon$ and the resulting figure are not ideal. Next, the sample probabilities are computed differently: with the gaussian function written earlier, rather than by calling the scipy library functions directly.
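A likely cause of the discrepancy (my observation, not from the original write-up): scipy.stats.norm takes the standard deviation as its scale parameter, whereas sigma here holds the variances returned by estimateGaussian. Passing the square root should make the SciPy densities agree with the gaussian function:

# hypothetical corrected call: scale expects a standard deviation, not a variance
dist = stats.norm(mu[0], np.sqrt(sigma[0]))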

fig,ax=plt.subplots(figsize=(8,6))
plotContours(fig,ax)
plt.xlabel("Latency (ms)")
plt.ylabel("Throughput (mb/s)")
plt.title("The classified anomalies")
plt.scatter(X[out_index[0],0],X[out_index[0],1],s=100,facecolors="none",edgecolors="r")
plt.show()



The best $\epsilon$ found below matches the expected value, and the resulting figure is much more reasonable.

p = gaussian(X, mu, sigma)
pval = gaussian(Xval, mu, sigma)
epsilon, f1 = selectThreshold(pval, yval)
epsilon, f1
<ipython-input-92-c5a1372ada03>:9: RuntimeWarning: invalid value encountered in long_scalars
  precision=tp/(tp+fp)

(8.990852779269496e-05, 0.8750000000000001)
out_index=np.where(p<epsilon)
out_index
(array([300, 301, 303, 304, 305, 306], dtype=int64),
 array([0, 0, 0, 0, 0, 0], dtype=int64))
fig,ax=plt.subplots(figsize=(8,6))
plotContours(fig,ax)
plt.xlabel("Latency (ms)")
plt.ylabel("Throughput (mb/s)")
plt.title("The classified anomalies")
plt.scatter(X[out_index[0],0],X[out_index[0],1],s=100,facecolors="none",edgecolors="r")
plt.show()



1.4 High-dimensional dataset

In this section's dataset, each sample has 11 features, capturing many more properties of the compute servers. Use the code above to estimate the Gaussian parameters $\mu_i,\sigma_i^2$, evaluate the probabilities of the samples in both the dataset X and the cross-validation set Xval, and finally apply the function selectThreshold to obtain the best threshold $\epsilon$. The best $\epsilon$ is about 1.38e-18, and 117 anomalous samples are found.

data2=loadmat("./data/ex8data2.mat")
X2=data2['X']
Xval2,yval2=data2['Xval'],data2['yval']
X2.shape,Xval2.shape,yval2.shape
((1000, 11), (100, 11), (100, 1))
mu,sigma=estimateGaussian(X2)
p = gaussian(X2, mu, sigma)
pval = gaussian(Xval2, mu, sigma)
epsilon, f1 = selectThreshold(pval, yval2)
epsilon, f1
<ipython-input-92-c5a1372ada03>:9: RuntimeWarning: invalid value encountered in long_scalars
  precision=tp/(tp+fp)

(1.3772288907613575e-18, 0.6153846153846154)

Check the number of anomalous samples: 117, which matches the expected answer.

out_index=np.where(p<epsilon)
len(out_index[0])
117

2 Recommender Systems

In this part of the experiment, the collaborative filtering learning algorithm is implemented and applied to a dataset of movie ratings. The dataset consists of ratings on a scale of 1 to 5, from $n_u=943$ users on $n_m=1682$ movies. The next sections implement the function cofiCostFunc, which computes the collaborative filtering objective function and gradient; once the cost function and gradient are in place, a minimizer (fmincg in the original exercise, scipy.optimize.minimize here) is used to learn the collaborative filtering parameters.

2.1 Movie ratings dataset

The dataset ex8_movies.mat contains the variables Y and R. The matrix Y ($n_m \times n_u$, with $Y^{(i,j)} \in [1,5]$) holds the ratings the different users gave the different movies: the number of rows is the number of movies, and the number of columns is the number of users. R is a binary indicator matrix: $R(i,j)=1$ means user $j$ has rated movie $i$, and $R(i,j)=0$ means they have not. The goal of collaborative filtering is to predict the ratings users would give the movies they have not yet rated (those with $R(i,j)=0$), so that each user can be recommended the movies with the highest predicted ratings.

In this experiment, the matrices X and Theta will be used:

The $i$-th row of X is the feature vector $x^{(i)} \in \mathbb{R}^{100}$ of the $i$-th movie, and the $j$-th row of Theta is the parameter vector $\theta^{(j)} \in \mathbb{R}^{100}$ of user $j$. Thus X is an $n_m \times 100$ matrix and Theta is an $n_u \times 100$ matrix.

data=loadmat("./data/ex8_movies.mat")
data.keys()
dict_keys(['__header__', '__version__', '__globals__', 'Y', 'R'])
Y,R=data['Y'],data['R']
nm,nu=Y.shape    # Y(i,j)∈[0,5]; 0 means not rated
Y.shape,R.shape
((1682, 943), (1682, 943))

Take the average of the first movie's ratings (not everyone rated this movie):

Y[0].sum()/R[0].sum()
3.8783185840707963
plt.figure(figsize=(8,8*(1682/943)))
plt.imshow(Y,cmap="rainbow")
plt.colorbar()# display values for different colors
plt.xlabel("Users ({})".format(nu))
plt.ylabel("Movies ({})".format(nm))
plt.show()



2.2 Collaborative Filtering Learning Algorithm

We will now start implementing the collaborative filtering learning algorithm, beginning with the cost function (without regularization). The collaborative filtering algorithm for movie recommendation considers a set of $n$-dimensional parameter vectors $x^{(1)},\ldots,x^{(n_m)}$ and $\theta^{(1)},\ldots,\theta^{(n_u)}$, where the model predicts user $j$'s rating of movie $i$ as $y^{(i,j)}=(\theta^{(j)})^Tx^{(i)}$. Given a dataset of user ratings, the goal is to learn the vectors $x^{(1)},\ldots,x^{(n_m)}$ and $\theta^{(1)},\ldots,\theta^{(n_u)}$ that best fit the data, i.e. that minimize the cost function.
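In matrix form, all of these predictions can be computed at once; the vectorized cost and gradient below are built on this identity. A minimal sketch (assuming X, Theta, Y, and R already hold arrays of compatible shapes):

Y_pred = X @ Theta.T           # (nm, nu); Y_pred[i, j] = (theta_j)^T x_i
residual = (Y_pred - Y) * R    # errors counted only where a rating exists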

Write the function cofiCostFunc to compute the cost function and gradient of the collaborative filtering algorithm, taking the matrices X and Theta as parameters. To use an off-the-shelf minimizer such as fmincg (here, scipy.optimize.minimize), the parameters X and Theta must be unrolled into a single one-dimensional vector argument.

data=loadmat("./data/ex8_movieParams.mat")
data.keys()
dict_keys(['__header__', '__version__', '__globals__', 'X', 'Theta', 'num_users', 'num_movies', 'num_features'])
X,Theta,nu,nm,nf=data['X'],data['Theta'],data['num_users'],data['num_movies'],data['num_features']
nu,nm,nf=map(int,(nu,nm,nf))
nu,nm,nf
(943, 1682, 10)

Reduce the size of the dataset to speed up the algorithm:

nu,nm,nf=4,5,3
X=X[:nm,:nf]
Theta=Theta[:nu,:nf]
Y=Y[:nm,:nu]
R=R[:nm,:nu]
X.shape,Theta.shape
((5, 3), (4, 3))

2.2.1 Collaborative filtering cost function

Cost function of the collaborative filtering algorithm (unregularized):

$J=\frac{1}{2}\sum_{(i,j):R(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)^2$

Adding regularization (the parameters of the collaborative filtering algorithm carry no bias term, so all of them can be regularized):

$J=\frac{1}{2}\sum_{(i,j):R(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)^2+\frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}(\theta_k^{(j)})^2+\frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}(x_k^{(i)})^2$

Next, write the function cofiCostFunc to return the cost J, accumulating the cost only over pairs of user $j$ and movie $i$ with $R(i,j)=1$. Expected answers when running this function: about 22.22 without regularization, and about 31.34 with regularization ($\lambda=1.5$).

def serialize(X,Theta):    # unroll X and Theta into one 1-D vector
    return np.r_[X.flatten(),Theta.flatten()]
def deserialize(seq,nm,nu,nf):
    return seq[:nm*nf].reshape(nm,nf),seq[nm*nf:].reshape(nu,nf)
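A quick consistency check for this pair of helpers is a round trip, which should reproduce the original matrices exactly:

# serialize followed by deserialize must be the identity
X2, Theta2 = deserialize(serialize(X, Theta), nm, nu, nf)
assert (X2 == X).all() and (Theta2 == Theta).all()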
def cofiCostFunc(seq,Y,R,nm,nu,nf,l=0):
    # Y: rating matrix (nm,nu)
    # R: 0-1 matrix, whether user j has rated movie i
    # X: (nm,nf), each row holds the nf features of movie i
    # Theta: (nu,nf), each row holds the nf parameters of user j
    X,Theta=deserialize(seq, nm, nu, nf)
    error=0.5*np.sum(np.multiply(np.power(X@Theta.T-Y,2),R))
    reg1=0.5*l*np.sum(np.power(Theta,2))    # regularization terms
    reg2=0.5*l*np.sum(np.power(X,2))
    return error+reg1+reg2
seq=serialize(X, Theta)
cofiCostFunc(seq, Y, R, nm, nu, nf),cofiCostFunc(seq, Y, R, nm, nu, nf,1.5)
(22.224603725685675, 31.34405624427422)

2.2.2 Collaborative filtering gradient

Now implement the gradient function (unregularized first). Write the function cofiGradient, which returns the variables X_grad (the same size as X) and Theta_grad (the same size as Theta), unrolled together into a single vector. Then write the function checkCostFunction to check the gradient: if the implementation is correct, the analytical gradient and the numerical gradient should match very closely.

Gradients, with regularization added:

$\frac{\partial J}{\partial x_k^{(i)}}=\sum_{j:R(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)\theta_k^{(j)}+\lambda x_k^{(i)}$

$\frac{\partial J}{\partial \theta_k^{(j)}}=\sum_{i:R(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)x_k^{(i)}+\lambda\theta_k^{(j)}$

def cofiGradient(seq,Y,R,nm,nu,nf,l=0):
    X,Theta=deserialize(seq, nm, nu, nf)
    X_grad=np.multiply((X@Theta.T-Y),R)@Theta+l*X
    Theta_grad=np.multiply((X@Theta.T-Y),R).T@X+l*Theta
    return serialize(X_grad, Theta_grad)
def checkCostFunction(seq,Y,R,nm,nu,nf,l=0):
    grad=cofiGradient(seq,Y,R,nm,nu,nf,l)

    e=0.0001    # size of the perturbation
    len_seq=len(seq)
    e_vec=np.zeros(len_seq)    # perturbation vector, nonzero in one coordinate at a time
    for i in range(10):
        idx=np.random.randint(0,len_seq)
        e_vec[idx]=e
        loss1=cofiCostFunc(seq-e_vec,Y,R,nm,nu,nf,l)
        loss2=cofiCostFunc(seq+e_vec,Y,R,nm,nu,nf,l)
        num_grad=(loss2-loss1)/(2*e)    # central difference approximation
        e_vec[idx]=0
        diff=np.linalg.norm(num_grad-grad[idx])/np.linalg.norm(num_grad+grad[idx])
        print('num_grad:%.15f \t grad[idx]:%.15f \t difference %.15f' %(num_grad, grad[idx], diff))
print("Checking gradient with lambda = 0:")
checkCostFunction(serialize(X,Theta), Y, R, nm, nu, nf)
print("\nChecking gradient with lambda = 1.5:")
checkCostFunction(serialize(X,Theta), Y, R, nm, nu, nf, l=1.5)

Checking gradient with lambda = 0:
num_grad:4.742718424690651 	 grad[idx]:4.742718424695921 	 difference 0.000000000000556
num_grad:-10.568020204448914 	 grad[idx]:-10.568020204450614 	 difference 0.000000000000080
num_grad:-0.803780061460202 	 grad[idx]:-0.803780061452057 	 difference 0.000000000005067
num_grad:-0.568195965513496 	 grad[idx]:-0.568195965515757 	 difference 0.000000000001990
num_grad:-7.160044429745938 	 grad[idx]:-7.160044429740946 	 difference 0.000000000000349
num_grad:7.575703079698570 	 grad[idx]:7.575703079709334 	 difference 0.000000000000710
num_grad:-0.383582784628800 	 grad[idx]:-0.383582784622124 	 difference 0.000000000008702
num_grad:1.164413669449971 	 grad[idx]:1.164413669446225 	 difference 0.000000000001609
num_grad:7.575703079698570 	 grad[idx]:7.575703079709334 	 difference 0.000000000000710
num_grad:-0.766778776704058 	 grad[idx]:-0.766778776703673 	 difference 0.000000000000251

Checking gradient with lambda = 1.5:
num_grad:0.129856157187191 	 grad[idx]:0.129856157163688 	 difference 0.000000000090497
num_grad:1.092897577699148 	 grad[idx]:1.092897577688307 	 difference 0.000000000004960
num_grad:1.092897577699148 	 grad[idx]:1.092897577688307 	 difference 0.000000000004960
num_grad:-0.892473343601097 	 grad[idx]:-0.892473343597432 	 difference 0.000000000002053
num_grad:-0.647874841526175 	 grad[idx]:-0.647874841514519 	 difference 0.000000000008996
num_grad:1.092897577699148 	 grad[idx]:1.092897577688307 	 difference 0.000000000004960
num_grad:-0.647874841526175 	 grad[idx]:-0.647874841514519 	 difference 0.000000000008996
num_grad:2.101362561361952 	 grad[idx]:2.101362561388682 	 difference 0.000000000006360

2.3 Learning Movie recommendations

After completing the cost function and gradient of the collaborative filtering algorithm, the movie recommendation model can be trained. First load the names and indices of all the movies, then enter a user's movie preferences, and finally generate that user's movie recommendations.

movies=[]    # list of movie titles
with open("./data/movie_ids.txt","r",encoding='gbk') as f:
    for line in f:
        movie=line.strip().split(' ')    # strip surrounding whitespace and split off the leading id
        movies.append(' '.join(movie[1:]))
movies=np.array(movies)
ratings = np.zeros((1682,1))
# set the preferences, using the ratings from the reference blog
ratings[0] = 4
ratings[6] = 3
ratings[11] = 5
ratings[53] = 4
ratings[63] = 5
ratings[65] = 3
ratings[68] = 5
ratings[97] = 2
ratings[182] = 4
ratings[225] = 5
ratings[354] = 5

ratings.shape
(1682, 1)

Add this user's rating vector to the existing dataset so the model can use it; Y and R keep the same meaning as before (Y holds the ratings, R marks which entries are rated).

data=loadmat("./data/ex8_movies.mat")
Y=data['Y']
R=data['R']
Y.shape,R.shape
((1682, 943), (1682, 943))
Y=np.append(ratings,Y,axis=1)
R=np.append(ratings!=0,R,axis=1)
Y.shape,R.shape
((1682, 944), (1682, 944))

Initialize the parameter matrices X and Theta with random values (random rather than zero initialization breaks the symmetry between features, so they can learn different things):

nm,nu=Y.shape
nf=10
l=10
X=np.random.random(size=(nm,nf))
Theta=np.random.random(size=(nu,nf))
seq=serialize(X, Theta)
X.shape,Theta.shape,seq.shape
((1682, 10), (944, 10), (26260,))

Normalize the data so that after training, when the mean is added back, even a movie with no ratings at all ends up with a non-zero prediction. The mean is computed only over the entries that actually have ratings; unrated entries are excluded using the R matrix.

def normalizeRatings(Y,R):
    Y_mean=((Y.sum(axis=1))/(R.sum(axis=1))).reshape(-1,1)
    Y_norm=np.multiply(Y-Y_mean,R)
    return Y_norm,Y_mean
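normalizeRatings divides by R.sum(axis=1), so a movie with no ratings at all would trigger a 0/0. Every movie in this dataset has at least one rating, but a defensive variant is easy to write (my sketch; the name normalizeRatingsSafe is hypothetical):

def normalizeRatingsSafe(Y,R):
    # same per-movie mean-centering, but avoid 0/0 for unrated movies
    counts=R.sum(axis=1)
    Y_mean=(Y.sum(axis=1)/np.where(counts==0,1,counts)).reshape(-1,1)
    Y_norm=np.multiply(Y-Y_mean,R)
    return Y_norm,Y_mean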
Y_norm,Y_mean=normalizeRatings(Y,R)
Y_norm.shape,Y_mean.shape
((1682, 944), (1682, 1))

Train the model:

import scipy.optimize as opt
res=opt.minimize(fun=cofiCostFunc,x0=seq,args=(Y_norm,R,nm,nu,nf,l),method='TNC',jac=cofiGradient)    # takes about 1-2 minutes to run
res
     fun: 70124.80723141175
     jac: array([ 0.12481091,  3.80257643,  4.25142191, ..., -0.2913018 ,  0.386691  ,  0.7792457 ])
 message: 'Max. number of function evaluations reached'
    nfev: 100
     nit: 9
  status: 3
 success: False
       x: array([0.73096179, 0.16482085, 0.39889075, ..., 0.30653932, 0.17673335, 0.97425473])
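Note that success: False here just means TNC hit its function-evaluation cap (nfev: 100) before converging. If desired, the budget can be raised through the options dict (a hedged suggestion; 'maxfun' is the TNC evaluation limit documented in current SciPy, and behavior may vary by version):

# hypothetical re-run with a larger evaluation budget
res = opt.minimize(fun=cofiCostFunc, x0=seq, args=(Y_norm, R, nm, nu, nf, l),
                   method='TNC', jac=cofiGradient, options={'maxfun': 1000})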

The trained parameters X and Theta are then used to make movie recommendations for the user.

X_train,Theta_train=deserialize(res.x,nm,nu,nf)
X_train.shape,Theta_train.shape
((1682, 10), (944, 10))

Finally, the trained parameter model is used to recommend movies:

predition=X_train@Theta_train.T
user_predition=predition[:,0]+Y.mean()    # note: normalizeRatings subtracted the per-movie mean, so adding Y_mean[:,0] back would be its exact inverse; the global mean is used here
idx=np.argsort(-user_predition)    # sort in descending order and take the indices
idx.shape
(1682,)
user_predition[idx][:10]
array([4.03635302, 3.96571345, 3.86437942, 3.8627442 , 3.86094835, 3.85971196, 3.83496685, 3.83174567, 3.71952258, 3.70899912])
print("Top recommendations for you:")
for i in range(10):
    print('Predicting rating %.1f for movie %s'%(user_predition[idx[i]],movies[idx[i]]))

print("\nOriginal ratings provided:")
for i in range(len(ratings)):
    if ratings[i] > 0:
        print('Rated %d for movie %s'% (ratings[i],movies[i]))
Top recommendations for you:
Predicting rating 4.0 for movie Star Wars (1977)
Predicting rating 3.9 for movie Empire Strikes Back, The (1980)
Predicting rating 3.9 for movie Titanic (1997)
Predicting rating 3.9 for movie Good Will Hunting (1997)
Predicting rating 3.9 for movie Shawshank Redemption, The (1994)
Predicting rating 3.8 for movie Braveheart (1995)
Predicting rating 3.8 for movie Return of the Jedi (1983)
Predicting rating 3.7 for movie Usual Suspects, The (1995)
Predicting rating 3.7 for movie Schindler's List (1993)

Original ratings provided:
Rated 4 for movie Toy Story (1995)
Rated 3 for movie Twelve Monkeys (1995)
Rated 5 for movie Usual Suspects, The (1995)
Rated 4 for movie Outbreak (1995)
Rated 5 for movie Shawshank Redemption, The (1994)
Rated 3 for movie While You Were Sleeping (1995)
Rated 5 for movie Forrest Gump (1994)
Rated 2 for movie Silence of the Lambs, The (1991)
Rated 4 for movie Alien (1979)
Rated 5 for movie Die Hard 2 (1990)
Rated 5 for movie Sphere (1998)

Retrain, this time normalizing Y by simply subtracting its global mean:

Y_norm = Y - Y.mean()
Y_norm.mean()
4.6862111343939375e-17
import scipy.optimize as opt
res=opt.minimize(fun=cofiCostFunc,x0=seq,args=(Y_norm,R,nm,nu,nf,l),method='TNC',jac=cofiGradient)    # took about 2min38s to run
res
     fun: 69380.70267946126
     jac: array([ 3.22931558e-06,  4.74670755e-06,  1.53795013e-06, ..., -1.04391592e-06,  1.12803573e-06,  5.47338254e-07])
 message: 'Converged (|f_n-f_(n-1)| ~= 0)'
    nfev: 1430
     nit: 56
  status: 1
 success: True
       x: array([0.75326985, 0.44698399, 0.4576434 , ..., 0.36815454, 0.46085362, 0.66916822])
X_train,Theta_train=deserialize(res.x,nm,nu,nf)
predition=X_train@Theta_train.T
user_predition=predition[:,0]+Y.mean()
idx=np.argsort(-user_predition)    # sort in descending order and take the indices
user_predition[idx][:10]
array([4.28262534, 4.11498853, 3.98044977, 3.9058028 , 3.88817712, 3.87460875, 3.87178834, 3.86527196, 3.76237659, 3.75394194])
print("Top recommendations for you:")
for i in range(10):
    print('Predicting rating %.1f for movie %s'%(user_predition[idx[i]],movies[idx[i]]))

print("\nOriginal ratings provided:")
for i in range(len(ratings)):
    if ratings[i] > 0:
        print('Rated %d for movie %s'% (ratings[i],movies[i]))
Top recommendations for you:
Predicting rating 4.3 for movie Titanic (1997)
Predicting rating 4.1 for movie Star Wars (1977)
Predicting rating 4.0 for movie Raiders of the Lost Ark (1981)
Predicting rating 3.9 for movie Good Will Hunting (1997)
Predicting rating 3.9 for movie Shawshank Redemption, The (1994)
Predicting rating 3.9 for movie Braveheart (1995)
Predicting rating 3.9 for movie Return of the Jedi (1983)
Predicting rating 3.9 for movie Empire Strikes Back, The (1980)
Predicting rating 3.8 for movie Terminator 2: Judgment Day (1991)
Predicting rating 3.8 for movie As Good As It Gets (1997)

Original ratings provided:
Rated 4 for movie Toy Story (1995)
Rated 3 for movie Twelve Monkeys (1995)
Rated 5 for movie Usual Suspects, The (1995)
Rated 4 for movie Outbreak (1995)
Rated 5 for movie Shawshank Redemption, The (1994)
Rated 3 for movie While You Were Sleeping (1995)
Rated 5 for movie Forrest Gump (1994)
Rated 2 for movie Silence of the Lambs, The (1991)
Rated 4 for movie Alien (1979)
Rated 5 for movie Die Hard 2 (1990)
Rated 5 for movie Sphere (1998)