The best way to learn a tool is to use it. On the way to learning deep learning, you need to choose a framework for building neural networks. Common frameworks include TensorFlow, Caffe, and PyTorch, and of these I recommend PyTorch most. It is especially friendly to beginners: it is quick to pick up, easy to use, and its code is very Pythonic. Whether you are building your own demo or a production-level application, PyTorch is an essential tool.

## Environment setup

First, we need to set up the hardware and software environment. A GPU is best. If you don't have one, that's fine: you can still run demos. For large datasets, however, you will need GPU support, because training on a GPU is often more than 10 times faster than on a CPU. Linux is the recommended operating system, but I switched from Linux to Windows for work, so I will mainly cover Windows. The general setup steps are below. If you run into problems, please leave a comment.

  • Install Python. Python 3 is recommended. Download the installer (.exe) from the official website and run it. During installation, check the option to add Python to the PATH environment variable, so that typing `python` on the command line opens the Python prompt.
  • If you have a GPU, install the GPU driver and CUDA. For the driver, find the version for your graphics card on NVIDIA's official website, then download and install it. For CUDA, search for "CUDA", select your operating system and version on the download page, and download it. After installation, run nvidia-smi.exe in "C:\Program Files\NVIDIA Corporation\NVSMI" to confirm that it succeeded. NVIDIA driver download page: www.nvidia.com/Download/in… CUDA download page: developer.nvidia.com/cuda-downlo…
  • Install PyTorch and torchvision. On the PyTorch website you can choose the version you want and the installation method. Installing with pip is recommended. PyTorch website: Pytorch.org/get-started…

    After installation, verify it from the command line. If it succeeded, the output should look like mine:
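One quick way to run this check is to print the installed versions and whether PyTorch can see a CUDA GPU. The script below is a minimal sketch that relies only on the standard `__version__` attributes, `torch.cuda.is_available()` and `torch.version.cuda`. If a package is not installed, it reports `None` instead of failing. The helper names `installed_version` and `pytorch_report` are hypothetical.

```python
import importlib.util


def installed_version(package):
    """Return the package's __version__, or None if it is not installed."""
    if importlib.util.find_spec(package) is None:
        return None
    module = importlib.import_module(package)
    return getattr(module, "__version__", "unknown")


def pytorch_report():
    """Collect PyTorch version and GPU information for the current environment."""
    report = {
        "torch": installed_version("torch"),
        "torchvision": installed_version("torchvision"),
        "cuda_available": False,
        "cuda_version": None,
    }
    if report["torch"] is not None:
        import torch
        report["cuda_available"] = torch.cuda.is_available()
        # torch.version.cuda is None for CPU-only builds
        report["cuda_version"] = torch.version.cuda
    return report


if __name__ == "__main__":
    for key, value in pytorch_report().items():
        print(f"{key}: {value}")
```

On a CPU-only install, `cuda_available` prints `False`. That is expected: training still works, only more slowly.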

## Model training

The most important part of training a model is preparing the training set. A model is like a child who knows nothing at first, and training is the process of "teaching" it. If it is taught wrong from the start, you cannot expect it to get the test right. Preparing a training set usually takes a lot of manpower and resources, which is why the field is moving toward semi-supervised and unsupervised learning, but that is a story for later. I dwell on this because poor training data has cost me before, so I want to stress how important the training set is.

Here I use a public dataset, CIFAR-10, as an example:

```python
import torch
import torchvision
import torchvision.transforms as transforms

# Convert PIL images to tensors and normalize each RGB channel to [-1, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True, num_workers=1)

valset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
valloader = torch.utils.data.DataLoader(valset, batch_size=64, shuffle=False, num_workers=1)

datasets = {"train": trainset, "val": valset}
dataloaders = {"train": trainloader, "val": valloader}
```
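To make `batch_size` concrete: CIFAR-10 has 50,000 training images and 10,000 test images. With `batch_size=64`, each training epoch therefore takes 782 iterations, and the last batch holds only 16 images. The small helper below reproduces that arithmetic, mirroring how a `DataLoader` without `drop_last` sizes its batches. The helper itself is hypothetical and not part of PyTorch.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Return (number of batches, size of the last batch) for one pass over the data.

    Without drop_last, the final partial batch is kept, so the count rounds up.
    With drop_last=True, the partial batch is discarded and every batch is full.
    """
    if drop_last:
        return num_samples // batch_size, batch_size
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch


if __name__ == "__main__":
    print(batches_per_epoch(50_000, 64))  # CIFAR-10 training split -> (782, 16)
    print(batches_per_epoch(10_000, 64))  # CIFAR-10 test split used for validation -> (157, 16)
```

A smaller `batch_size`, chosen to fit limited GPU memory, simply means more iterations per epoch.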

torchvision bundles several public datasets that can be downloaded directly.

`batch_size` is the number of images in each batch. If your GPU memory is small, set it lower, for example 8, 16, or 32. `shuffle` controls whether the dataset is shuffled. `num_workers` is the number of worker processes used to load data.

On Windows, these workers are started by spawning new processes. With `num_workers` above 0, the script's top-level code must therefore sit under an `if __name__ == "__main__":` guard, otherwise an error is raised. Setting it to 0 loads data in the main process and avoids the problem. On Linux you can set it higher to speed up training.

You can also define your own dataset by subclassing `torch.utils.data.Dataset` and implementing `__getitem__()` and `__len__()`. Here is a simple example that you can adapt to your needs:

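```python
from PIL import Image
from torch.utils.data import Dataset


class MyDataset(Dataset):
    """Minimal dataset that serves a single image file."""

    def __init__(self, image_path, transform=None):
        self.image_path = image_path
        self.transform = transform

    def __len__(self):
        # this toy dataset contains exactly one sample
        return 1

    def __getitem__(self, index):
        # load the image and force 3-channel RGB
        pic = Image.open(self.image_path).convert("RGB")

        if self.transform:
            pic = self.transform(pic)

        return (index, pic)
```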
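The example above always serves the same single image. For training, you usually need one sample per file plus its label. Below is a minimal sketch of that pattern, assuming your data comes as a list of `(image_path, label)` pairs.

A few parts of it are illustrative rather than required:

- The class name `LabeledImageDataset`, the `loader` argument and `pil_rgb_loader` are hypothetical.
- The `try`/`except` fallback to `object` only lets the sketch be imported without PyTorch; a real project just imports `Dataset`.

```python
try:
    from torch.utils.data import Dataset
except ImportError:  # fallback so the sketch runs without PyTorch installed
    Dataset = object


def pil_rgb_loader(path):
    """Open an image file as RGB with PIL (imported lazily)."""
    from PIL import Image
    with Image.open(path) as img:
        return img.convert("RGB")


class LabeledImageDataset(Dataset):
    """Map-style dataset over a list of (image_path, label) pairs."""

    def __init__(self, samples, transform=None, loader=pil_rgb_loader):
        self.samples = list(samples)
        self.transform = transform
        self.loader = loader

    def __len__(self):
        # one item per (path, label) pair
        return len(self.samples)

    def __getitem__(self, index):
        path, label = self.samples[index]
        image = self.loader(path)
        if self.transform is not None:
            image = self.transform(image)
        # (image, label) matches the `inputs, labels = data` unpacking used in training
        return image, label
```

With PyTorch installed, pass it to a `DataLoader` as before, for example `DataLoader(LabeledImageDataset(samples, transform=transform), batch_size=64, shuffle=True)`.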

Once the training set is ready, you can start writing the training code:

```python
import torch.nn as nn
import torchvision

model = torchvision.models.resnet18(pretrained=True)  # ImageNet-pretrained weights
model.fc = nn.Linear(model.fc.in_features, 10)  # CIFAR-10 has 10 classes
```

We use the ResNet-18 model that ships with torchvision, pretrained on ImageNet. To learn more about ResNet, see my other article, which walks through the classic ResNet classification paper with PyTorch sample code. Because this dataset has 10 classes, the output of the final fully connected layer is changed to 10.
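The replacement layer is small. ResNet-18's final layer takes a 512-dimensional feature vector (`model.fc.in_features` is 512), so the new `nn.Linear(512, 10)` has a 10×512 weight matrix plus 10 biases. These are the only freshly initialized parameters; every other layer keeps its pretrained weights. The arithmetic, as a small sketch with a hypothetical helper name:

```python
def linear_params(in_features, out_features, bias=True):
    """Number of trainable parameters in nn.Linear(in_features, out_features)."""
    # weight has shape (out_features, in_features); bias has shape (out_features,)
    return in_features * out_features + (out_features if bias else 0)


if __name__ == "__main__":
    resnet18_features = 512  # value of model.fc.in_features for ResNet-18
    print("ImageNet head:", linear_params(resnet18_features, 1000))  # 513000
    print("CIFAR-10 head:", linear_params(resnet18_features, 10))    # 5130
```

With PyTorch available, `sum(p.numel() for p in model.fc.parameters())` gives the same count for the real layer.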

```python
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```

This defines the loss function and the optimizer. Cross-entropy is the optimization objective. The optimizer is SGD with an initial learning rate of 0.01 and momentum of 0.9.
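Both pieces are simple enough to write out by hand, which helps when reading the training log. The plain-Python sketch below covers each one:

- **Cross-entropy for one sample.** It computes the negative log of the softmax probability of the true class. This is what `nn.CrossEntropyLoss` computes per sample before averaging over the batch.
- **One SGD-with-momentum step.** It follows the update rule documented for `torch.optim.SGD` with default dampening and no weight decay: `v = momentum * v + grad`, then `p = p - lr * v`.

It is an illustration of the math, not PyTorch's implementation.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target]).

    Uses the log-sum-exp trick (subtracting the max logit) for numerical stability.
    """
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]


def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """One SGD step with momentum (no dampening, no weight decay).

    velocity is updated in place; the new parameter list is returned.
    """
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        velocity[i] = momentum * velocity[i] + g
        new_params.append(p - lr * velocity[i])
    return new_params


if __name__ == "__main__":
    # three-class example: the true class (index 0) has the largest logit
    print(cross_entropy([2.0, 1.0, 0.1], target=0))  # about 0.417

    params, velocity = [1.0], [0.0]
    for step in range(3):
        # a constant gradient of 1.0 shows how momentum accumulates
        params = sgd_momentum_step(params, [1.0], velocity)
        print(step, params, velocity)
```

With a constant gradient, the velocity grows as 1, 1.9, 2.71, and so on. This is why momentum speeds up progress along directions where the gradient is consistent.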

```python
import copy
import time

import torch

# move the model to the GPU if one is available
cuda = torch.cuda.is_available()
if cuda:
    model.cuda()

best_accuracy = 0.0
best_model_wts = copy.deepcopy(model.state_dict())
start_time = time.time()
epochs = 5

for epoch in range(epochs):
    print('Epoch {}/{}'.format(epoch, epochs - 1))
    print('-' * 40)
    since_epoch = time.time()

    # each epoch runs one training pass and one validation pass
    for phase in ["train", "val"]:
        if phase == "train":
            model.train()  # enable dropout and batch-norm statistic updates
        else:
            model.eval()   # use the stored batch-norm statistics

        running_loss = 0.0
        running_corrects = 0

        for inputs, labels in dataloaders[phase]:
            # put data on GPU
            if cuda:
                inputs = inputs.cuda()
                labels = labels.cuda()

            # clear gradients left over from the previous step
            optimizer.zero_grad()

            # forward; track gradients only during training
            with torch.set_grad_enabled(phase == "train"):
                outputs = model(inputs)
                _, preds = torch.max(outputs, 1)
                loss = criterion(outputs, labels)

                if phase == "train":
                    loss.backward()   # backward
                    optimizer.step()  # update params

            # loss.item() is the batch mean, so weight it by the batch size
            running_loss += loss.item() * inputs.size(0)
            # number of correct predictions in this batch
            running_corrects += torch.sum(preds == labels).item()

        epoch_loss = running_loss / len(datasets[phase])
        epoch_acc = running_corrects / len(datasets[phase])

        time_elapsed_epoch = time.time() - since_epoch
        print('{} Loss: {:.4f} Acc: {:.4f} in {:.0f}m {:.0f}s'.format(
            phase, epoch_loss, epoch_acc, time_elapsed_epoch // 60, time_elapsed_epoch % 60))

        # keep a copy of the weights with the best validation accuracy
        if phase == "val" and epoch_acc >= best_accuracy:
            best_accuracy = epoch_acc
            best_model_wts = copy.deepcopy(model.state_dict())

time_elapsed = time.time() - start_time
print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
print('Best val Acc: {:.4f}'.format(best_accuracy))

# restore the best weights
model.load_state_dict(best_model_wts)
```
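A note on the bookkeeping in the training loop: `loss.item()` is the mean loss over one batch. The loop therefore multiplies it by `inputs.size(0)` before summing and divides by the dataset size at the end. That keeps the epoch loss a true per-sample average even when the last batch is smaller (16 images instead of 64 for CIFAR-10). Below is a plain-Python sketch of the same aggregation. The helper name and the batch numbers are made up purely for illustration.

```python
def epoch_metrics(batches):
    """Aggregate per-batch results into epoch loss and accuracy.

    Each batch is (batch_size, mean_loss, num_correct), mirroring
    inputs.size(0), loss.item() and the per-batch correct count in the loop.
    """
    total_samples = 0
    running_loss = 0.0
    running_corrects = 0
    for batch_size, mean_loss, num_correct in batches:
        running_loss += mean_loss * batch_size  # undo the per-batch averaging
        running_corrects += num_correct
        total_samples += batch_size
    return running_loss / total_samples, running_corrects / total_samples


if __name__ == "__main__":
    # illustrative numbers: two full batches and a short final batch
    batches = [(64, 1.0, 32), (64, 0.5, 48), (16, 2.0, 4)]
    loss, acc = epoch_metrics(batches)
    print(f"Loss: {loss:.4f} Acc: {acc:.4f}")
```

Averaging the three batch losses directly would give about 1.167 instead of about 0.889, overweighting the short last batch.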

The training loop above is a basic skeleton. The print statements let you follow the training process, and I have added comments. If you have any questions, feel free to leave a message. Here are the training results:

As training progresses, accuracy on both the training set and the validation (test) set rises steadily. Once the loss stabilizes, you can try adjustments such as lowering the learning rate and continue training.
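One common way to lower the learning rate on a schedule is step decay: multiply it by a factor `gamma` every `step_size` epochs. PyTorch provides this as `torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma)`, with `scheduler.step()` called once per epoch. The plain-Python sketch below only reproduces the resulting learning rates, so you can choose values. The `step_size=2` and `gamma=0.1` shown are illustrative assumptions, not settings from my run. `torch.optim.lr_scheduler.ReduceLROnPlateau` is an alternative that lowers the rate when a monitored metric stops improving, which is closer to "once the loss stabilizes".

```python
def step_decay_lr(base_lr, epoch, step_size, gamma=0.1):
    """Learning rate at a given epoch under step decay (as StepLR computes it).

    The rate is multiplied by gamma once every step_size epochs.
    """
    return base_lr * gamma ** (epoch // step_size)


if __name__ == "__main__":
    # base_lr matches the optimizer above; step_size and gamma are illustrative
    for epoch in range(5):
        print(epoch, step_decay_lr(0.01, epoch, step_size=2, gamma=0.1))
```

In the training loop, the equivalent is to create `scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)` once and call `scheduler.step()` at the end of each epoch.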

Finally, besides the pretrained models in torchvision, I highly recommend another library, pretrainedmodels. It includes many newer pretrained models and is very convenient to use. Install it with:

```shell
pip install pretrainedmodels
```

Using a model such as SE-ResNeXt-101 is just as simple:

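```python
import pretrainedmodels
import torch.nn as nn

image_size = 448       # placeholder: your input resolution (a multiple of 32)
your_num_classes = 10  # placeholder: number of classes in your data

model_name = "se_resnext101_32x4d"
model = pretrainedmodels.__dict__[model_name](num_classes=1000, pretrained='imagenet')
model.avg_pool = nn.AvgPool2d(int(image_size / 32), stride=1)
model.last_linear = nn.Linear(model.last_linear.in_features, your_num_classes)
```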

First choose the model you want to use and initialize it by its name. Note that besides the final fully connected layer, the pooling layer before it is also replaced so that the dimensions match. Its kernel size depends on the input image size, and the model's default setting did not match the (448, 448) images I used. pretrainedmodels GitHub address: github.com/Cadene/pret…
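The `image_size / 32` comes from the network's total downsampling. Like ResNet, the SE-ResNeXt backbone reduces the spatial size by a factor of 32. An `image_size × image_size` input therefore yields a feature map of `(image_size/32) × (image_size/32)`, and an average pool with that kernel size and stride 1 reduces it to 1×1 before `last_linear`. Below is a small sketch of that arithmetic. The helper names are hypothetical, and it assumes a square input whose side is a multiple of 32.

```python
def final_pool_kernel(image_size, total_stride=32):
    """Kernel size that average-pools the last feature map down to 1x1.

    Assumes a square input whose side is divisible by the backbone's total
    downsampling factor (32 for ResNet-style networks).
    """
    if image_size % total_stride != 0:
        raise ValueError(f"image_size {image_size} is not a multiple of {total_stride}")
    return image_size // total_stride


def pooled_size(feature_size, kernel_size, stride=1):
    """Output side length of an average pool without padding."""
    return (feature_size - kernel_size) // stride + 1


if __name__ == "__main__":
    for size in (224, 448):
        k = final_pool_kernel(size)
        print(size, "-> feature map", k, "x", k, "-> pooled", pooled_size(k, k), "x", pooled_size(k, k))
```

This also shows why the pool must change. A 7×7 kernel matches 224×224 inputs, but with 448×448 inputs it would leave an 8×8 map instead of a single feature vector.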

WeChat official account: MachineLearning learning path. Learn Python, machine learning, and computer vision with me!