
Built entirely from standard ConvNet modules, ConvNeXt competes with Transformers in accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K segmentation, while maintaining the simplicity and efficiency of standard ConvNets.

Paper link: arxiv.org/pdf/2201.03…

Code link: github.com/facebookres…

If the GitHub download fails, you can use the following link instead:

gitcode.net/hhhhhhhhhhw…

Characteristics of ConvNeXt:

  • A 7×7 convolution kernel is used. Classic CNN models such as VGG and ResNet favored small kernels, but ConvNeXt demonstrates the effectiveness of large ones. The authors tried several kernel sizes, including 3, 5, 7, 9, and 11: accuracy improved from 79.9% (3×3) to 80.6% (7×7) while the network's FLOPs remained roughly the same, and the benefit of larger kernels saturated at 7×7.

  • The GELU (Gaussian Error Linear Unit) activation function is used. GELU combines ideas from dropout, zoneout, and ReLU: it multiplies the input by a 0/1 mask that is generated stochastically, with a probability that depends on the input itself. Its experimental results were better than those of ReLU and ELU (a numerical sketch follows this list).

  • LayerNorm is used instead of BatchNorm.

  • Inverted bottleneck. Figures 3(a) through (b) in the paper illustrate this change. Although the FLOPs of the depthwise convolution layer increase, the FLOPs of the whole network drop to 4.6G thanks to the shortcut 1×1 convolution layer in the downsampling residual blocks, and accuracy ticks up from 80.5% to 80.6%. In the ResNet-200/Swin-B regime this step brings even larger gains (81.9% to 82.6%) with fewer FLOPs.
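
As a quick numerical check of the GELU bullet above, here is a minimal sketch comparing PyTorch's nn.GELU with the exact definition GELU(x) = x·Φ(x), where Φ is the standard normal CDF:

import torch
import torch.nn as nn

x = torch.linspace(-3, 3, 7)
exact = x * 0.5 * (1 + torch.erf(x / 2 ** 0.5))  # x * Phi(x), the exact definition
print(torch.allclose(nn.GELU()(x), exact, atol=1e-6))  # True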

ConvNeXt residual block

The residual block is the core of the whole model; its structure is shown in the paper's block diagram.

Code implementation:

class Block(nn.Module):
    r""" ConvNeXt Block. There are two equivalent implementations:
    (1) DwConv -> LayerNorm (channels_first) -> 1x1 Conv -> GELU -> 1x1 Conv; all in (N, C, H, W)
    (2) DwConv -> Permute to (N, H, W, C); LayerNorm (channels_last) -> Linear -> GELU -> Linear; Permute back
    We use (2) as we find it slightly faster in PyTorch.
    Args:
        dim (int): Number of input channels.
        drop_path (float): Stochastic depth rate. Default: 0.0
        layer_scale_init_value (float): Init value for Layer Scale. Default: 1e-6.
    """
    def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise conv
        self.norm = LayerNorm(dim, eps=1e-6)
        self.pwconv1 = nn.Linear(dim, 4 * dim)  # pointwise/1x1 convs, implemented with linear layers
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)
        # Layer Scale: learnable per-channel scaling of the residual branch
        self.gamma = nn.Parameter(layer_scale_init_value * torch.ones((dim)),
                                  requires_grad=True) if layer_scale_init_value > 0 else None
        # Stochastic depth: randomly drops the residual branch during training
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()

    def forward(self, x):
        input = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
        x = self.norm(x)
        x = self.pwconv1(x)
        x = self.act(x)
        x = self.pwconv2(x)
        if self.gamma is not None:
            x = self.gamma * x
        x = x.permute(0, 3, 1, 2)  # (N, H, W, C) -> (N, C, H, W)
        x = input + self.drop_path(x)
        return x
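A minimal smoke test of the block, run after the definition above (a sketch, assuming DropPath comes from timm and that plain nn.LayerNorm stands in for the official repo's channels_last LayerNorm, which behaves identically on (N, H, W, C) tensors):

import torch
import torch.nn as nn
from timm.models.layers import DropPath  # assumption: DropPath is taken from timm
LayerNorm = nn.LayerNorm  # stand-in for the official channels_last LayerNorm

block = Block(dim=96)            # 96 channels, as in convnext_tiny's first stage
x = torch.randn(2, 96, 56, 56)   # (N, C, H, W)
print(block(x).shape)            # torch.Size([2, 96, 56, 56]) -- shape is preserved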

Data augmentation: Cutout and Mixup

ConvNeXt's training recipe uses Cutout and Mixup, and I included both augmentations in my code to improve the score. The official code uses timm; I chose torchtoolbox instead. Installation command:

pip install torchtoolbox

The Cutout implementation goes in the transforms:

from torchtoolbox.transform import Cutout

# Data preprocessing

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    Cutout(),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])

Mixup is implemented in the train method. First import the helpers:

from torchtoolbox.tools import mixup_data, mixup_criterion

    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device, non_blocking=True), target.to(device, non_blocking=True)
        data, labels_a, labels_b, lam = mixup_data(data, target, alpha)
        optimizer.zero_grad()
        output = model(data)
        loss = mixup_criterion(criterion, output, labels_a, labels_b, lam)
        loss.backward()
        optimizer.step()
        print_loss = loss.data.item()
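For reference, torchtoolbox's two helpers implement the standard Mixup recipe; a hedged sketch of the equivalent logic (not the library's actual source):

import numpy as np
import torch

def mixup_data_sketch(x, y, alpha=0.2):
    # Sample the mixing coefficient and blend each image with a shuffled partner
    lam = np.random.beta(alpha, alpha)
    index = torch.randperm(x.size(0), device=x.device)
    mixed_x = lam * x + (1 - lam) * x[index]
    return mixed_x, y, y[index], lam

def mixup_criterion_sketch(criterion, pred, y_a, y_b, lam):
    # The loss is the same convex combination applied to the two label sets
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)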

The project structure

Use the tree command to print the project structure

ConvNext_demo
├─ data
│  ├─ test
│  └─ train
│     ├─ Black-grass
│     ├─ Charlock
│     ├─ Cleavers
│     ├─ Common Chickweed
│     ├─ Common wheat
│     ├─ Fat Hen
│     ├─ Loose Silky-bent
│     ├─ Maize
│     ├─ Scentless Mayweed
│     ├─ Shepherds Purse
│     ├─ Small-flowered Cranesbill
│     └─ Sugar beet
├─ dataset
│  ├─ __init__.py
│  └─ dataset.py
├─ Model
│  └─ convnext.py
├─ test1.py
├─ test2.py
└─ train_connext.py

The data set

The dataset is a plant-seedling classification set with 12 categories in total. The download link is: pan.baidu.com/s/1TOLSNj9J… Extraction code: SYNg

Create a new data folder in the root directory of the project. After downloading the dataset, extract train and test and put them under the data folder, as shown in the project structure above.

Import the model file

Take the convnext.py file from the official repository and place it in the Model folder.

Install the libraries and import the required ones

The model uses the timm library. If it is not installed, run the following command:

pip install timm

Create a new train_connext.py file and import the required packages:

import torch.optim as optim
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
from dataset.dataset import SeedlingData
from torch.autograd import Variable
from Model.convnext import convnext_tiny
from torchtoolbox.tools import mixup_data, mixup_criterion
from torchtoolbox.transform import Cutout

Setting global parameters

Set the GPU device, the learning rate, the BatchSize, and the number of epochs.

# Set global parameters
modellr = 1e-4
BATCH_SIZE = 8
EPOCHS = 300
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Data preprocessing

The data processing is kept simple here, with no elaborate tricks; interested readers can add more processing.

# Data preprocessing

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    Cutout(),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])
transform_test = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])

Next, add __init__.py and dataset.py to the dataset folder, and write the dataset class below into dataset.py.

Let me walk through the core logic of the code.

The first step is to create a dictionary that maps each category name to a numeric id, replacing the category with a number.

The second step is to write a method in __init__ that collects the image paths. The test set is a single flat directory, so it can be read directly. The training set has one category folder per class under the train folder, so we first get the categories and then the concrete image paths. The data is then split into training and validation sets at a 7:3 ratio using sklearn's train_test_split.

The third step is to define how a single image and its label are read, in the __getitem__ method. Some images have 32-bit (RGBA) depth, so I convert them to RGB when reading.

The code is as follows:

# coding:utf8
import os
from PIL import Image
from torch.utils import data
from torchvision import transforms as T
from sklearn.model_selection import train_test_split

Labels = {'Black-grass': 0, 'Charlock': 1, 'Cleavers': 2, 'Common Chickweed': 3, 'Common wheat': 4, 'Fat Hen': 5, 'Loose Silky-bent': 6, 'Maize': 7, 'Scentless Mayweed': 8, 'Shepherds Purse': 9, 'Small-flowered Cranesbill': 10, 'Sugar beet': 11}


class SeedlingData(data.Dataset):

    def __init__(self, root, transforms=None, train=True, test=False):
        """Main objective: obtain the paths of all images and split the data into training, validation and test sets."""
        self.test = test
        self.transforms = transforms

        if self.test:
            imgs = [os.path.join(root, img) for img in os.listdir(root)]
            self.imgs = imgs
        else:
            imgs_labels = [os.path.join(root, img) for img in os.listdir(root)]
            imgs = []
            for img_label in imgs_labels:
                for imgname in os.listdir(img_label):
                    imgpath = os.path.join(img_label, imgname)
                    imgs.append(imgpath)
            trainval_files, val_files = train_test_split(imgs, test_size=0.3, random_state=42)
            if train:
                self.imgs = trainval_files
            else:
                self.imgs = val_files

    def __getitem__(self, index):
        """Return one image and its label at a time."""
        img_path = self.imgs[index]
        img_path = img_path.replace("\\", '/')
        if self.test:
            label = -1
        else:
            labelname = img_path.split('/')[-2]
            label = Labels[labelname]
        data = Image.open(img_path).convert('RGB')
        data = self.transforms(data)
        return data, label

    def __len__(self):
        return len(self.imgs)

Then we call SeedlingData in train_connext.py to read the data. Remember to import the dataset class we just wrote.

# fetch data
dataset_train = SeedlingData('data/train', transforms=transform, train=True)
dataset_test = SeedlingData("data/train", transforms=transform_test, train=False)
# import data
train_loader = torch.utils.data.DataLoader(dataset_train, batch_size=BATCH_SIZE, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset_test, batch_size=BATCH_SIZE, shuffle=False)
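A quick sanity check on the loaders (a minimal sketch; the shapes follow from BATCH_SIZE = 8 and the 224×224 transforms above):

images, labels = next(iter(train_loader))
print(images.shape, labels.shape)  # torch.Size([8, 3, 224, 224]) torch.Size([8])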

Set up the model

Set the loss function to nn.CrossEntropyLoss().

  • Set the model to convnext_tiny and modify the last fully connected layer to output 12 (the number of categories in the dataset).

  • The optimizer is set to Adam.

  • The learning rate schedule is cosine annealing (see the formula below).
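
For reference, with the settings used below (T_max=20, eta_min=1e-9), PyTorch's CosineAnnealingLR anneals the learning rate following the SGDR cosine schedule:

$$\eta_t = \eta_{\min} + \frac{1}{2}\left(\eta_{\max} - \eta_{\min}\right)\left(1 + \cos\left(\frac{T_{cur}}{T_{\max}}\pi\right)\right)$$

where $\eta_{\max}$ is the initial learning rate (modellr) and $T_{cur}$ is the number of scheduler steps taken so far.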

# Instantiate the model and move it to the GPU
criterion = nn.CrossEntropyLoss()
# criterion = SoftTargetCrossEntropy()
model_ft = convnext_tiny(pretrained=True)
num_ftrs = model_ft.head.in_features
model_ft.head = nn.Linear(num_ftrs, 12)
model_ft.to(DEVICE)
print(model_ft)
# Use the simple, brute-force Adam optimizer with a lowered learning rate
optimizer = optim.Adam(model_ft.parameters(), lr=modellr)
cosine_schedule = optim.lr_scheduler.CosineAnnealingLR(optimizer=optimizer, T_max=20, eta_min=1e-9)

Define training and validation functions

alpha=0.2 is the parameter required by Mixup.

# Define the training process
alpha=0.2
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    sum_loss = 0
    total_num = len(train_loader.dataset)
    print(total_num, len(train_loader))
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device, non_blocking=True), target.to(device, non_blocking=True)
        data, labels_a, labels_b, lam = mixup_data(data, target, alpha)
        optimizer.zero_grad()
        output = model(data)
        loss = mixup_criterion(criterion, output, labels_a, labels_b, lam)
        loss.backward()
        optimizer.step()
        print_loss = loss.data.item()
        sum_loss += print_loss
        if (batch_idx + 1) % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, (batch_idx + 1) * len(data), len(train_loader.dataset),
                       100. * (batch_idx + 1) / len(train_loader), loss.item()))
    ave_loss = sum_loss / len(train_loader)
    print('epoch:{},loss:{}'.format(epoch, ave_loss))

ACC=0
# Validation process
def val(model, device, test_loader):
    global ACC
    model.eval()
    test_loss = 0
    correct = 0
    total_num = len(test_loader.dataset)
    print(total_num, len(test_loader))
    with torch.no_grad():
        for data, target in test_loader:
            data, target = Variable(data).to(device), Variable(target).to(device)
            output = model(data)
            loss = criterion(output, target)
            _, pred = torch.max(output.data, 1)
            correct += torch.sum(pred == target)
            print_loss = loss.data.item()
            test_loss += print_loss
        correct = correct.data.item()
        acc = correct / total_num
        avgloss = test_loss / len(test_loader)
        print('\nVal set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
            avgloss, correct, len(test_loader.dataset), 100 * acc))
        if acc > ACC:
            torch.save(model_ft, 'model_' + str(epoch) + '_' + str(round(acc, 3)) + '.pth')
            ACC = acc


# training

for epoch in range(1, EPOCHS + 1):
    train(model_ft, DEVICE, train_loader, optimizer, epoch)
    cosine_schedule.step()
    val(model_ft, DEVICE, test_loader)


Then you can start training.

Training for 10 epochs already gives good results.

Test

The first way to write it

The test set is stored in the data/test directory.

The first step is to define the classes. The order of this tuple must correspond to the class order used during training; do not change the order!

classes = ('Black-grass', 'Charlock', 'Cleavers', 'Common Chickweed', 'Common wheat', 'Fat Hen', 'Loose Silky-bent', 'Maize', 'Scentless Mayweed', 'Shepherds Purse', 'Small-flowered Cranesbill', 'Sugar beet')

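To make an ordering mistake impossible, the tuple can also be derived from the Labels dictionary defined in dataset.py; a small sketch, assuming Labels is importable from dataset.dataset:

from dataset.dataset import Labels

# Sort the class names by their training ids so the order always matches training
classes = tuple(sorted(Labels, key=Labels.get))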

The second step is to define the transforms, which are the same as the validation set's transforms, with no data augmentation.

transform_test = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])

The third step is to load the model and put it on DEVICE.

DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = torch.load("model_8_0.971.pth")
model.eval()
model.to(DEVICE)

The fourth step is to read the image and predict its category. Note that the image is read with PIL's Image, not cv2: torchvision transforms expect PIL images, not OpenCV arrays.

path = 'data/test/'
testList = os.listdir(path)
for file in testList:
    img = Image.open(path + file)
    img = transform_test(img)
    img.unsqueeze_(0)
    img = Variable(img).to(DEVICE)
    out = model(img)
    # Predict
    _, pred = torch.max(out.data, 1)
    print('Image Name:{},predict:{}'.format(file, classes[pred.data.item()]))
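If your images do arrive as OpenCV arrays, they can be converted to the PIL images that the transforms expect; a hedged sketch (the file name here is hypothetical):

import cv2
from PIL import Image

bgr = cv2.imread('data/test/some_image.png')  # hypothetical file; cv2 loads BGR
rgb = Image.fromarray(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))
img = transform_test(rgb)  # the transforms defined above now apply normally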

The complete test code:

import torch.utils.data.distributed
import torchvision.transforms as transforms
from PIL import Image
from torch.autograd import Variable
import os

classes = ('Black-grass', 'Charlock', 'Cleavers', 'Common Chickweed', 'Common wheat',
           'Fat Hen', 'Loose Silky-bent', 'Maize', 'Scentless Mayweed', 'Shepherds Purse',
           'Small-flowered Cranesbill', 'Sugar beet')
transform_test = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = torch.load("model_8_0.971.pth")
model.eval()
model.to(DEVICE)
path = 'data/test/'
testList = os.listdir(path)
for file in testList:
    img = Image.open(path + file)
    img = transform_test(img)
    img.unsqueeze_(0)
    img = Variable(img).to(DEVICE)
    out = model(img)
    # Predict
    _, pred = torch.max(out.data, 1)
    print('Image Name:{},predict:{}'.format(file, classes[pred.data.item()]))

Running results:

The second way to write it

The second way is to use the customized SeedlingData dataset to read the images. The first three steps are the same as above; the difference is in the fourth step, where the data is read through SeedlingData.

dataset_test = SeedlingData('data/test/', transform_test, test=True)
print(len(dataset_test))
for index in range(len(dataset_test)):
    item = dataset_test[index]
    img, label = item
    img.unsqueeze_(0)
    data = Variable(img).to(DEVICE)
    output = model(data)
    _, pred = torch.max(output.data, 1)
    print('Image Name:{},predict:{}'.format(dataset_test.imgs[index], classes[pred.data.item()]))

Running results:

The complete code: download.csdn.net/download/hh…