[net]                   # For each [xxx] section, the bracketed line names the layer and the lines below it hold that layer's parameters. The [net] section configures the whole network.
# Testing
batch=1                 # For testing, batch and subdivisions are generally both set to 1.
subdivisions=1
# Training               # For training, you need to choose batch and subdivisions yourself.
# batch=64              # batch is the number of images sent through the network per iteration.
                        # Increasing batch lets the network complete an epoch in fewer iterations. (One epoch = one pass over every sample in the training set.)
                        # With a fixed maximum number of iterations, increasing batch lengthens training time but gives a better estimate of the gradient-descent direction.
                        # If video memory allows, batch can be increased to improve memory utilization and training quality. Generally, larger is better,
                        # but the value must be tuned: too small and training will not converge well; too large and training may fall into a local optimum.
# subdivisions=16       # Each batch is split into this many groups; the groups are processed one after another and their results combined to complete one iteration.
                        # (One iteration = one training step over batch samples.) batch should be a multiple of subdivisions.
                        # The larger subdivisions is, the smaller each group and the lower the CPU/GPU memory pressure.
----------------------------------------------------------------------------------------------------------------
width=416               # Input image width.
height=416              # Input image height. If video memory allows, larger width and height improve recognition of small objects.
                        # width and height set the network's input resolution and thus affect precision. They can only be multiples of 32,
                        # because the total downsampling factor is 32; valid sizes are multiples of 32 (320, 352, ..., 608), minimum 320, maximum 608.
                        # width and height may differ from each other.
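The batch/subdivisions arithmetic above can be sanity-checked with a small sketch (the function names here are mine, not part of Darknet):

```python
def minibatch_size(batch, subdivisions):
    """Images sent to the GPU at once: Darknet splits each batch into
    `subdivisions` groups and accumulates gradients across them."""
    assert batch % subdivisions == 0, "batch must be a multiple of subdivisions"
    return batch // subdivisions

def iterations_per_epoch(num_train_images, batch):
    """One epoch = every training image seen once; one iteration = one batch."""
    return -(-num_train_images // batch)  # ceiling division

# With the training defaults batch=64, subdivisions=16, only 4 images
# are resident on the GPU at a time:
print(minibatch_size(64, 16))
```

Raising subdivisions therefore trades speed for memory without changing the effective batch size used for the gradient update.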

channels=3              # Number of input image channels: 3 for RGB color images, 1 for grayscale, 4 for RGBA (the A channel is transparency).
----------------------------------------------------------------------------------------------------------------
momentum=0.9            # One drawback of plain SGD is that its update direction depends entirely on the gradient of the current batch, which makes it unstable.
                        # The momentum algorithm borrows the concept of momentum from physics, simulating the inertia of a moving object:
                        # each update keeps part of the previous update direction and uses the current batch's gradient to fine-tune the final direction.
                        # This adds stability, speeds up learning, and gives some ability to escape local optima.
decay=0.0005            # Weight-decay regularization term to prevent overfitting. Large weights cause the network to overfit and hurt its generalization,
                        # so a penalty term is added to the error function: usually the sum of the squared weights multiplied by a decay constant,
                        # which penalizes large weights. The penalty pushes the weights toward smaller absolute values.
                        # The larger the decay parameter, the stronger the suppression of overfitting.
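The momentum and decay settings described above correspond to one SGD update of this shape (a minimal sketch of the update rule, not Darknet's actual code):

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.001, momentum=0.9, decay=0.0005):
    """One SGD step with momentum and L2 weight decay, mirroring the
    momentum=0.9 / decay=0.0005 settings in the [net] section."""
    grad = grad + decay * w                       # weight decay penalizes large weights
    velocity = momentum * velocity - lr * grad    # keep part of the previous update direction
    return w + velocity, velocity

# One step on a toy weight vector:
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
w, v = sgd_momentum_step(w, np.array([0.5, 0.5]), v)
```

Because `velocity` carries over between steps, consecutive gradients pointing the same way accelerate the update, while oscillating gradients partially cancel.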
----------------------------------------------------------------------------------------------------------------
angle=0                 # Data-augmentation parameter: generate more training samples by rotating randomly within [-angle, angle] degrees.
saturation=1.5          # Data-augmentation parameter: generate more training samples by adjusting saturation.
exposure=1.5            # Data-augmentation parameter: generate more training samples by adjusting exposure.
hue=.1                  # In each iteration new training samples are generated from angle, saturation, exposure, and hue to prevent overfitting.
----------------------------------------------------------------------------------------------------------------
learning_rate=0.001     # The learning rate determines how fast the weights are updated: too large and the result overshoots the optimum; too small and descent is too slow.
                        # It determines how fast the parameters move toward the optimal value. If the learning rate is too high, the function may jump past
                        # the optimum and fail to converge, or even diverge. Conversely, if it is too low, optimization is inefficient,
                        # the algorithm may fail to converge for a long time, and it can easily get stuck in a local optimum
                        # (for non-convex functions there is no guarantee of reaching the global optimum).
                        # A good learning rate converges as fast as possible while still guaranteeing convergence. Finding one takes repeated attempts:
                        # start larger so the weights change quickly, then after a certain number of epochs reduce it manually.
                        # Usually a dynamic learning rate is scheduled by training round. In YOLO training the network trains for 160 epochs:
                        # the initial learning rate is 0.001, divided by 10 at epochs 60 and 90.
                        # Near the end of training, the learning rate should have decayed by a factor of 100 or more.
                        # In practice, adjust the learning rate dynamically according to loss and other indicators: press Ctrl+C to stop training,
                        # modify the learning rate, reload the just-saved model, and continue. Base the adjustment on the training log:
                        # if loss diverges (e.g. becomes NaN), the learning rate is too large; reduce it to 1/5 or 1/10 of its value.
                        # If loss stays almost constant, the network may have converged or fallen into a local minimum; then the learning rate
                        # can be increased a little. After each adjustment, train for a good while before judging the effect.
                        # The actual learning rate depends on the number of GPUs: if you set 0.001 and have 4 GPUs, the true learning rate is 0.001/4.
burn_in=1000            # For iterations smaller than burn_in, the learning rate follows a warm-up update rule;
                        # for iterations larger than burn_in, it follows the policy set below.
max_batches=50200       # Maximum number of training iterations (batches); training stops once this count is reached.
policy=steps            # Learning-rate adjustment policy: constant, steps, exp, poly, step, sig, random, etc.
steps=40000,45000
scales=.1,.1            # steps and scales together schedule learning-rate changes. For example, when the iteration count reaches 40000,
                        # the learning rate decays by a factor of 10; at iteration 45000 it decays by another factor of 10
                        # on top of the previous learning rate.
----------------------------------------------------------------------------------------------------------------
[convolutional]         # Configuration of a convolutional layer; everything up to the next [xxx] belongs to this layer.
batch_normalize=1       # Whether to apply batch normalization. On BN, see https://www.cnblogs.com/eilearn/p/9780696.html
filters=32              # Number of convolution kernels, which is also the number of output channels / output feature maps.
size=3                  # Convolution kernel size, here 3x3.
stride=1                # Convolution stride.
pad=1                   # If pad is 0, padding is specified by the padding parameter.
                        # If pad is 1, the padding is size/2 rounded down, e.g. 3/2 = 1.
activation=leaky        # Activation function of the layer: logistic, loggy, relu, elu, relie, plse,
                        # hardtan, lhtan, linear, ramp, leaky, tanh, stair.
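The pad and stride rules above determine each layer's output resolution. A minimal sketch (the function name is mine):

```python
def conv_output_size(in_size, size, stride, pad_flag):
    """Spatial output size of a Darknet convolutional layer.
    With pad=1 the padding is size // 2 (rounded down), e.g. 3 // 2 = 1."""
    padding = size // 2 if pad_flag else 0
    return (in_size + 2 * padding - size) // stride + 1

# A 416x416 input through size=3, stride=1, pad=1 keeps its resolution:
print(conv_output_size(416, 3, 1, 1))   # 416
# size=3, stride=2, pad=1 halves it, which is how the network downsamples:
print(conv_output_size(416, 3, 2, 1))   # 208
```

Five such stride=2 layers give the total downsampling factor of 32 mentioned in the [net] section.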
       
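The learning_rate / burn_in / steps / scales settings described above combine into a schedule like the following (a sketch of how the steps policy with burn-in behaves; the power=4 warm-up exponent is Darknet's default, but treat the details as approximate):

```python
def learning_rate_at(iteration, base_lr=0.001, burn_in=1000,
                     steps=(40000, 45000), scales=(0.1, 0.1), power=4):
    """Learning rate at a given iteration under policy=steps with burn-in."""
    if iteration < burn_in:
        # warm-up: ramp from ~0 up to base_lr during the first burn_in iterations
        return base_lr * (iteration / burn_in) ** power
    lr = base_lr
    for step, scale in zip(steps, scales):
        if iteration >= step:
            lr *= scale       # each reached step multiplies the rate by its scale
    return lr

# After both steps have been passed, two 10x decays have been applied:
print(learning_rate_at(50000))
```

This is why the rate near the end of training ends up 100x smaller than the initial 0.001.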
[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
----------------------------------------------------------------------------------------------------------------
[shortcut]              # Shortcut layer configuration description.
                        # A shortcut is a cross-layer connection like the one used in ResNet. The input and output of this layer
                        # generally have the same shape; no other operation is performed, the two are simply summed element-wise.
from=-3                 # from=-3 means the output of the 3rd layer counting back from this shortcut layer is used as an input of this layer.
activation=linear       # See https://cloud.tencent.com/developer/article/1148375
----------------------------------------------------------------------------------------------------------------
                        # ... several more layers with the same parameters follow here and are not repeated ...
----------------------------------------------------------------------------------------------------------------
[convolutional]         # The convolutional layer directly in front of a YOLO layer.
size=1
stride=1
pad=1
filters=75              # filters in this layer must be set according to the formula:
                        #   filters = (classes + 5) * anchors_num
                        # classes must equal the number of classes in the [yolo] layer below.
                        # 5 stands for the 5 predicted values tx, ty, tw, th, to (box offsets plus objectness).
                        # anchors_num is the number of boxes predicted per cell by the YOLO layer; in YOLOv3 it is 3.
                        # Here: filters = (20 + 5) * 3 = 75.
activation=linear
----------------------------------------------------------------------------------------------------------------
[yolo]                  # In YOLOv2 the yolo layer was called the region layer.
mask=6,7,8              # Indices of the anchors to use, counting from 0; 6,7,8 means the last three anchors defined below are used.
anchors=10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
                        # Initial widths and heights of the predicted boxes; in each pair the first value is w, the second h.
classes=20              # Number of classes.
num=9                   # Total number of boxes each grid cell predicts, equal to the number of anchors.
                        # num needs to be increased when you want to use more anchors.
jitter=.3               # Data jitter to generate more training data; YOLOv2 used crop, flip, and the angle in the [net] layer.
ignore_thresh=.5        # ignore_thresh is the IOU threshold deciding which detection boxes participate in loss calculation.
                        # When a box's IOU is greater than ignore_thresh, the detection box does not participate in the loss calculation;
                        # otherwise it does. On IOU, see https://www.cnblogs.com/darkknightzh/p/9043395.html
                        # Purpose and intuition: the parameter controls how many detection boxes take part in the loss. If ignore_thresh is too large
                        # (close to 1), few boxes contribute to the regression loss, which easily causes overfitting;
                        # if it is set too small, very many boxes contribute to the regression loss, which easily causes underfitting.
                        # Setting: generally chosen between 0.5 and 0.7.
truth_thresh=1
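The filters formula and the IOU threshold above can both be expressed as small sketches (function names and the box format are my own conventions, not Darknet's):

```python
def yolo_conv_filters(classes, anchors_per_scale=3):
    """filters for the conv layer before each [yolo] layer:
    each anchor predicts tx, ty, tw, th, to (5 values) plus one score per class."""
    return (classes + 5) * anchors_per_scale

def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes,
    the quantity compared against ignore_thresh."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# VOC's 20 classes give the filters=75 used in this cfg:
print(yolo_conv_filters(20))
```

For COCO's 80 classes the same formula gives 255, which is why the filters value must change whenever classes does.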
random=1                # 1 turns on random multi-scale training; 0 turns it off.
                        # Tip: with random multi-scale training on, the width and height set in [net] above are not actually used;
                        # every 10 batches the network randomly picks a new input size from 320 to 608 (with width = height).
                        # The range of random scales can be modified to your own needs. Turning this on increases memory pressure.
----------------------------------------------------------------------------------------------------------------
[route]                 # Routing layer.
layers=-4               # Take the output of the 4th layer counting back from this one as the output of this layer.
                        # If layers=-1,61, the output of the layer just above and the output of layer 61 of the whole network
                        # are concatenated as this layer's output: if the previous layer outputs 52 x 52 x 128 and layer 61
                        # outputs 52 x 52 x 256, this layer outputs 52 x 52 x (128+256). The widths and heights of the routed
                        # layers must therefore be equal; if not, the output of this layer is 0 x 0 x 0 and Darknet reports
                        # "Layer before convolutional must output image." and stops.
----------------------------------------------------------------------------------------------------------------
[upsample]              # Upsampling layer.
stride=2                # Upsampling stride.
                        # https://www.cnblogs.com/hls91/p/10911997.html
# https://blog.csdn.net/qq_35872456/article/details/84216129
# https://blog.csdn.net/phinoo/article/details/83022101
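The [route] concatenation rule described above can be sketched as follows (the function name and (H, W, C) shape convention are mine):

```python
def route_output_shape(layer_shapes, layers):
    """Output shape (H, W, C) of a [route] layer: the listed layers'
    feature maps are concatenated along the channel axis.
    Widths and heights must match, otherwise the output is 0 x 0 x 0."""
    picked = [layer_shapes[i] for i in layers]
    h, w = picked[0][0], picked[0][1]
    if any(p[0] != h or p[1] != w for p in picked):
        return (0, 0, 0)
    return (h, w, sum(p[2] for p in picked))

# layers=-1,61: the previous layer (52x52x128) plus layer 61 (52x52x256)
shapes = {-1: (52, 52, 128), 61: (52, 52, 256)}
print(route_output_shape(shapes, [-1, 61]))   # (52, 52, 384)
```

This is why [upsample] with stride=2 appears right before such routes: it brings the deeper, coarser feature map up to the same width and height as the earlier one so the two can be concatenated.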