Introduction

This article describes a flower classifier.

The flower classifier is written in Python, uses the PyTorch deep learning framework, and works by training a convolutional neural network.

The basics of PyTorch usage are covered in the blog post PyTorch Notes.

Gitee repository: Flower identification

GitHub repository: Flower identification

For Git usage, see the blog post Git Usage Notes.

More detailed articles on some of these topics will follow.

Dataset

At present, images of 20 flower species have been selected for classification.

The data folder in the repository stores the datasets of the 20 flower species I use; it will continue to expand.

The data comes from three main sources:

  • A five-class flower dataset, with each class containing between 600 and 900 images
  • A subset of the Oxford 102 Flowers dataset, which contains 102 categories of UK flowers with between 40 and 258 images per category
  • The remainder comes from Baidu Images, collected in batches with a Python scraping program

Some of the folder names I wrote myself, using the flower's scientific name, which is usually Latin.

The 20 flower species I selected are as follows:

| No. | Folder name | Common name | Images |
| --- | --- | --- | --- |
| 1 | daisy | Daisy | 633 |
| 2 | dandelion | Dandelion | 898 |
| 3 | roses | Rose | 641 |
| 4 | sunflowers | Sunflower | 699 |
| 5 | tulips | Tulip | 799 |
| 6 | Nymphaea | Water lily | 226 |
| 7 | Tropaeolum_majus | Nasturtium | 196 |
| 8 | Digitalis_purpurea | Foxglove | 190 |
| 9 | peach_blossom | Peach blossom | 55 |
| 10 | Jasminum | Jasmine | 60 |
| 11 | Matthiola | Violet (stock) | 54 |
| 12 | Rosa | Chinese rose | 54 |
| 13 | Rhododendron | Azalea | 57 |
| 14 | Dianthus | Carnation | 48 |
| 15 | Cerasus | Cherry blossom | 50 |
| 16 | Narcissus | Daffodil | 52 |
| 17 | Pharbitis | Morning glory | 46 |
| 18 | Gazania | Gazania | 108 |
| 19 | Eschscholtzia | California poppy | 82 |
| 20 | Tithonia | Mexican sunflower | 47 |

Sample flower images:

Data augmentation

The number of images collected per species is not large; cherry blossom and daffodil, for example, have only about 50 images each. With so little data, training a model directly would give low accuracy and lead to severe overfitting.

Three augmentation methods are currently in use: mirror (horizontal) flip, vertical flip, and salt-and-pepper noise.

Mirror flip: flip an image left-to-right to generate a new image

Vertical flip: flip an image top-to-bottom to generate a new image

Salt-and-pepper noise: add random noise to an image to generate a new image
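These three augmentations can be sketched with Pillow and NumPy. This is a minimal illustration; the function names and the noise amount are my own choices, not necessarily the repository's implementation:

```python
import numpy as np
from PIL import Image

def mirror_flip(img: Image.Image) -> Image.Image:
    # Flip the image left-to-right
    return img.transpose(Image.FLIP_LEFT_RIGHT)

def vertical_flip(img: Image.Image) -> Image.Image:
    # Flip the image top-to-bottom
    return img.transpose(Image.FLIP_TOP_BOTTOM)

def salt_pepper_noise(img: Image.Image, amount: float = 0.02) -> Image.Image:
    # Set a random fraction of pixels to black (pepper) or white (salt)
    arr = np.array(img)
    mask = np.random.rand(arr.shape[0], arr.shape[1])
    arr[mask < amount / 2] = 0          # pepper
    arr[mask > 1 - amount / 2] = 255    # salt
    return Image.fromarray(arr)
```

Applying all three to every original image yields up to four images per source image, which is roughly consistent with the augmented counts below.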

The image counts after augmentation are as follows:

| No. | Folder name | Common name | Original | After augmentation |
| --- | --- | --- | --- | --- |
| 1 | daisy | Daisy | 633 | 2496 |
| 2 | dandelion | Dandelion | 898 | 3588 |
| 3 | roses | Rose | 641 | 2400 |
| 4 | sunflowers | Sunflower | 699 | 2796 |
| 5 | tulips | Tulip | 799 | 3196 |
| 6 | Nymphaea | Water lily | 226 | 1808 |
| 7 | Tropaeolum_majus | Nasturtium | 196 | 1568 |
| 8 | Digitalis_purpurea | Foxglove | 190 | 1360 |
| 9 | peach_blossom | Peach blossom | 55 | 440 |
| 10 | Jasminum | Jasmine | 60 | 480 |
| 11 | Matthiola | Violet (stock) | 54 | 432 |
| 12 | Rosa | Chinese rose | 54 | 432 |
| 13 | Rhododendron | Azalea | 57 | 456 |
| 14 | Dianthus | Carnation | 48 | 384 |
| 15 | Cerasus | Cherry blossom | 50 | 400 |
| 16 | Narcissus | Daffodil | 52 | 416 |
| 17 | Pharbitis | Morning glory | 46 | 368 |
| 18 | Gazania | Gazania | 108 | 464 |
| 19 | Eschscholtzia | California poppy | 82 | 656 |
| 20 | Tithonia | Mexican sunflower | 47 | 376 |

Data splitting

The dataset now needs to be split into a training set, a validation set, and a test set.

PyTorch's torchvision package includes a class for reading computer-vision datasets, invoked as torchvision.datasets.ImageFolder. Its main function is to read image data, and it requires the images to be stored in one subfolder per class.

Then call the class like this:

```python
train_dataset = ImageFolder(root='./data/train/', transform=data_transform)
```

root indicates the dataset's root directory, and transform specifies the preprocessing applied to each image.

In this way, all images in the cat and dog folders under the train directory are used as training data, and the folder names cat and dog become the training labels.

So the dataset needs to be split into the layout that ImageFolder expects.

I split it 3:1:1. If you don't want a validation set, you can comment out the validation-set portion of the code and use only the training set and test set.

```python
# proportion
scale = [0.6, 0.2, 0.2]
```
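A minimal sketch of such a split script, assuming a source tree with one subfolder per class; the function name and paths are hypothetical, not the repository's actual code:

```python
import os
import random
import shutil

def split_dataset(src_dir, dst_dir, scale=(0.6, 0.2, 0.2), seed=0):
    """Split src_dir/<class>/* into dst_dir/{train,val,test}/<class>/*."""
    random.seed(seed)
    for cls in os.listdir(src_dir):
        files = os.listdir(os.path.join(src_dir, cls))
        random.shuffle(files)
        n_train = int(len(files) * scale[0])
        n_val = int(len(files) * scale[1])
        parts = {
            'train': files[:n_train],
            'val': files[n_train:n_train + n_val],
            'test': files[n_train + n_val:],   # remainder goes to test
        }
        for split, names in parts.items():
            out = os.path.join(dst_dir, split, cls)
            os.makedirs(out, exist_ok=True)
            for name in names:
                shutil.copy(os.path.join(src_dir, cls, name), out)
```

With scale = (0.6, 0.2, 0.2), a class of 100 images ends up as 60/20/20, i.e. the 3:1:1 split mentioned above.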

At this point, the data part is ready.

Model training

AlexNet and VGG16 are currently used. The two networks are in fact similar in design, but VGG16 is "deeper" than AlexNet.

The AlexNet network structure is as follows:

The VGG16 network structure is as follows:

Comparing the two, VGG16 achieves higher accuracy, which suggests that a deeper network helps improve accuracy.

The accuracy curve during AlexNet training is as follows:

The accuracy of VGG16 after 200 epochs of training is as follows:

AlexNet reached an accuracy of 83% after 500 epochs of training

VGG16 reached an accuracy of 90% after 200 epochs

I have uploaded the model parameters from both training runs to the repository

Model validation

In addition to evaluating on the test set, individual images can be used to check how well the model has trained.

The VGG16 network, which gave the best validation results, was selected, loading the parameters from 200 epochs of training.

As the results show, the model identifies flower species very accurately.

Supplement

If you happen to have a cloud server and want to build a web service, you can try the Flask framework (Flask can also be run locally, but that is less useful).

After running the program in the Flask folder on the server, open a web page and enter the server's IP:port followed by ? and the image address to perform identification.

sjcup.cn is my domain name; it can be replaced with the server's public IP address

Another caveat: the image file name cannot contain Chinese characters, otherwise the image will not be recognized

If the public IP address cannot be accessed, the problem can be fixed by following the blog post
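A minimal Flask sketch of this "IP:port?image-address" idea; the query-parameter name, port, and the predict helper below are my own assumptions, not the repository's actual code:

```python
from flask import Flask, request

app = Flask(__name__)

def predict(image_url: str) -> str:
    # Placeholder: in the real service this would download the image
    # and run it through the trained model.
    return 'daisy'

@app.route('/')
def classify():
    # Read the image address from the query string, e.g. /?url=http://...
    url = request.args.get('url', '')
    if not url:
        return 'Please pass an image address, e.g. /?url=http://.../flower.jpg'
    return f'Predicted species: {predict(url)}'

# To serve on the public interface: app.run(host='0.0.0.0', port=5000)
```

Binding to host 0.0.0.0 (rather than the default 127.0.0.1) is what makes the service reachable through the server's public IP.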

Next steps

  • Expand the dataset to recognize more flower species
  • Train with newer network architectures, such as Inception v3