In the last article, I explained the principle of the Siamese network and the key to this architecture, the contrastive loss (Contrastive Loss). Now let's work through a simple example in PyTorch. My personal takeaways from this case are as follows:

  • Siamese Net is suitable for small data sets;
  • Siamese Net is currently used for classification tasks (if you know how to use it for segmentation or other tasks, please contact me, WX: Cyx645016617);
  • Siamese Net is interpretable.

1 Preparing Data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import torch 
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset,DataLoader
from sklearn.model_selection import train_test_split
device = 'cuda' if torch.cuda.is_available() else 'cpu'
data_train = pd.read_csv('../input/fashion-mnist_train.csv')
data_train.head()

The data file is in CSV format: the first column is the category label, and the 784 columns that follow are the flattened 28×28 pixel values.
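As a quick check of that layout (a minimal sketch; the exact shape assumes the standard Kaggle Fashion-MNIST training CSV):

print(data_train.shape)        # 785 columns: 1 label column + 784 pixel columns, one row per image
print(data_train.columns[:5])  # 'label', 'pixel1', 'pixel2', ...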

Split the data into a training set and a validation set, then reshape it into 28×28 images:

X_full = data_train.iloc[:,1:]
y_full = data_train.iloc[:,:1]
x_train, x_test, y_train, y_test = train_test_split(X_full, y_full, test_size = 0.05)
x_train = x_train.values.reshape(-1, 28, 28, 1).astype('float32') / 255.
x_test = x_test.values.reshape(-1, 28, 28, 1).astype('float32') / 255.
y_train.label.unique()
>>> array([8, 9, 7, 6, 4, 2, 3, 1, 5, 0])

As you can see, the Fashion-MNIST dataset is similar to MNIST and is divided into 10 different categories (a small label-to-name lookup is sketched after the list):

  • 0 T-shirt/top
  • 1 Trouser
  • 2 Pullover
  • 3 Dress
  • 4 Coat
  • 5 Sandal
  • 6 Shirt
  • 7 Sneaker
  • 8 Bag
  • 9 Ankle boot
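The lookup mentioned above is just a convenience sketch, with the names taken from the list, and can make later plots easier to read:

label_names = {0: 'T-shirt/top', 1: 'Trouser', 2: 'Pullover', 3: 'Dress', 4: 'Coat',
               5: 'Sandal', 6: 'Shirt', 7: 'Sneaker', 8: 'Bag', 9: 'Ankle boot'}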
np.bincount(y_train.label.values), np.bincount(y_test.label.values)
>>> (array([4230, 4195, 4135, 4218, 4174, 4172, 4193, 4250, 4238, 4195]),
 array([1770, 1805, 1865, 1782, 1826, 1828, 1807, 1750, 1762, 1805]))

As you can see, the data for each category is very even.

2 Constructing the Dataset and Visualization

class mydataset(Dataset):
    def __init__(self, x_data, y_data):
        self.x_data = x_data
        self.y_data = y_data.label.values
    def __len__(self):
        return len(self.x_data)
    def __getitem__(self, idx):
        img1 = self.x_data[idx]
        y1 = self.y_data[idx]
        if np.random.rand() < 0.5:
            # 50% chance: pick the second image from the same category
            idx2 = np.random.choice(np.arange(len(self.y_data))[self.y_data == y1], 1)
        else:
            # otherwise: pick the second image from a different category
            idx2 = np.random.choice(np.arange(len(self.y_data))[self.y_data != y1], 1)
        img2 = self.x_data[idx2[0]]
        y2 = self.y_data[idx2[0]]
        label = 0 if y1 == y2 else 1   # 0 = same category, 1 = different category
        return img1, img2, label

The construction of a torch.utils.data.Dataset will not be repeated here, since it was covered earlier in the PyTorch series. The logic is: given an idx, we first decide whether to pair its image with another image from the same category or from a different category. With 50% probability the two images share a category. The __getitem__ method then returns the two images together with a label: 0 means the two images belong to the same category, 1 means they belong to different categories. With these pairs, model training and loss computation can proceed.
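As a quick sanity check of the pairing logic (a sketch; the 200-pair probe below is purely illustrative), roughly half of the sampled pairs should get label 0:

check_ds = mydataset(x_train, y_train)
pair_labels = [check_ds[i][2] for i in range(200)]
print('fraction of "different" pairs:', np.mean(pair_labels))   # should be around 0.5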

train_dataset = mydataset(x_train,y_train)
train_dataloader = DataLoader(dataset = train_dataset,batch_size=8)
val_dataset = mydataset(x_test,y_test)
val_dataloader = DataLoader(dataset = val_dataset,batch_size=8)
for idx,(img1,img2,target) in enumerate(train_dataloader):
    fig, axs = plt.subplots(2, img1.shape[0], figsize=(12, 6))
    for idx,(ax1,ax2) in enumerate(axs.T):
        ax1.imshow(img1[idx,:,:,0].numpy(),cmap='gray')
        ax1.set_title('image A')
        ax2.imshow(img2[idx,:,:,0].numpy(),cmap='gray')
        ax2.set_title('{}'.format('same' if target[idx]==0 else 'different'))
    break

This code visualizes one batch of data:

There should be no problems so far. If you run into any issues, feel free to contact me for discussion and exchange (WX: CYX645016617). Personally, I think communication solves problems quickly and helps everyone make progress.

3 Model Building

class siamese(nn.Module):
    def __init__(self, z_dimensions=2):
        super(siamese, self).__init__()
        self.feature_net = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=3, padding=1, stride=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(4),
            nn.Conv2d(4, 4, kernel_size=3, padding=1, stride=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(4),
            nn.MaxPool2d(2),
            nn.Conv2d(4, 8, kernel_size=3, padding=1, stride=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(8),
            nn.Conv2d(8, 8, kernel_size=3, padding=1, stride=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(8),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 1, kernel_size=3, padding=1, stride=1),
            nn.ReLU(inplace=True)
        )
        # two max-poolings reduce 28x28 to 7x7, so the flattened feature has 49 values
        self.linear = nn.Linear(49, z_dimensions)
    def forward(self, x):
        # the dataset yields channel-last (N, 28, 28, 1) images; Conv2d expects (N, 1, 28, 28)
        x = x.permute(0, 3, 1, 2)
        x = self.feature_net(x)
        x = x.view(x.shape[0], -1)
        x = self.linear(x)
        return x

This is a very simple convolutional network; the dimension of the output vector is z_dimensions.
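A quick shape check (a sketch, assuming the channel-last (N, 28, 28, 1) layout produced by the dataset):

dummy = torch.randn(2, 28, 28, 1)
print(siamese(z_dimensions=2)(dummy).shape)   # expected: torch.Size([2, 2])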

def contrastive_loss(pred1, pred2, target):
    MARGIN = 2
    euclidean_dis = F.pairwise_distance(pred1, pred2)
    target = target.view(-1)
    # same pair (target=0): penalise large distances; different pair (target=1): penalise distances below MARGIN
    loss = (1 - target) * torch.pow(euclidean_dis, 2) + target * torch.pow(torch.clamp(MARGIN - euclidean_dis, min=0), 2)
    return loss.mean()   # return a scalar so loss.backward() works on a whole batch

Then the contrastive loss is implemented as the loss function.
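A tiny numeric check (a sketch with made-up 2-D embeddings): a "same" pair (target = 0) with identical embeddings should give roughly zero loss, a "different" pair (target = 1) whose distance exceeds MARGIN should also give zero, and a "same" pair that is far apart gets penalised.

a = torch.tensor([[0., 0.]])
c = torch.tensor([[5., 0.]])
print(contrastive_loss(a, a, torch.tensor([0.])))   # ~0: same pair, zero distance
print(contrastive_loss(a, c, torch.tensor([1.])))   # 0: different pair, distance 5 > MARGIN
print(contrastive_loss(a, c, torch.tensor([0.])))   # ~25: same pair but far apart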

4 Training

model = siamese(z_dimensions=8).to(device)
# model.load_state_dict(torch.load('../working/saimese.pth'))
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
for e in range(10):
    history = []
    for idx,(img1,img2,target) in enumerate(train_dataloader):
        img1 = img1.to(device)
        img2 = img2.to(device)
        target = target.to(device)
        
        pred1 = model(img1)
        pred2 = model(img2)
        loss = contrastive_loss(pred1,pred2,target)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        loss = loss.detach().cpu().numpy()
        history.append(loss)
        train_loss = np.mean(history)
    history = []
    with torch.no_grad():
        for idx,(img1,img2,target) in enumerate(val_dataloader):
            img1 = img1.to(device)
            img2 = img2.to(device)
            target = target.to(device)

            pred1 = model(img1)
            pred2 = model(img2)
            loss = contrastive_loss(pred1,pred2,target)

            loss = loss.detach().cpu().numpy()
            history.append(loss)
            val_loss = np.mean(history)
    print(f'train_loss:{train_loss},val_loss:{val_loss}')

Here, to speed up training, I increased the batch size to 128, with everything else unchanged:
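(A sketch of that change, reusing the mydataset objects defined earlier:)

train_dataloader = DataLoader(dataset=train_dataset, batch_size=128)
val_dataloader = DataLoader(dataset=val_dataset, batch_size=128)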

This is the result after running 10 epochs. Don't forget to save the model:

torch.save(model.state_dict(),'saimese.pth')

It looks something like this. Now let's visualize the validation set, using t-SNE to map the high-dimensional features down to two dimensions (initialized with PCA here):
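The snippet below uses X for the model's embeddings of the validation images and y for their labels; neither is defined above, so here is one way they could be produced (a sketch, variable names purely illustrative):

model.eval()
with torch.no_grad():
    X = model(torch.from_numpy(x_test).to(device)).cpu().numpy()
y = y_test.label.values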

from sklearn import manifold
"X" is characteristic, excluding target; X_tsne is the feature that has been reduced in dimension.
tsne = manifold.TSNE(n_components=2, init='pca', random_state=501)
X_tsne = tsne.fit_transform(X)
print("Org data dimension is {}. \ Embedded data dimension is {}".format(X.shape[-1], X_tsne.shape[-1]))
      
x_min, x_max = X_tsne.min(0), X_tsne.max(0)
X_norm = (X_tsne - x_min) / (x_max - x_min)  # normalization
plt.figure(figsize=(8.8))
for i in range(10):
    plt.scatter(X_norm[y==i][:,0],X_norm[y==i][:,1],alpha=0.3,label=f'{i}')
plt.legend()

The output figure is:

You can see that the different categories separate quite well: the distances between categories are relatively large and the clusters are clearly visible (there is even room left in the figure for the name of my public account). The latent dimension used here is 8.

One question remains. I have my own answer in mind, but I am curious how everyone else thinks about it: if I set the dimension of the latent variable z directly to 2, there would be no need for t-SNE or PCA to reduce the dimension before visualizing. Yet that visualization turns out worse than reducing from 8 dimensions down to 2. Why is that?

Tips: On the one hand, a dimension that is too small loses information. But that explanation alone does not hold, because PCA is essentially equivalent to a degenerate linear layer, so PCA would cause the same kind of loss. In my opinion, the key lies in the Euclidean distance used in the loss function: in higher dimensions the Euclidean distances tend to be larger, so the value of MARGIN needs to be adjusted accordingly.
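A quick illustration of that last point (a sketch with random vectors; the exact numbers will vary from run to run):

for d in (2, 8, 32):
    a, b = torch.randn(1000, d), torch.randn(1000, d)
    print(d, F.pairwise_distance(a, b).mean().item())
# the average distance grows with the dimension, so a fixed MARGIN of 2 behaves
# differently depending on the size of the embedding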