Let me share some pits I stepped into recently

These days I have been implementing a semantic segmentation loss.

The loss takes edges, consistency of results, and other factors into account, as shown in the figure.

The most complex part of the loss

Because the formula is so complex, I decided to use PyTorch, whose API style is close to NumPy's.

And since PyTorch builds a dynamic graph and Python for loops are slow, I planned to do everything with tensor operations.

I figured it would be about as hard as NumPy, medium difficulty at most, and that a day of getting familiar with the API would be enough.
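To make "do it all with tensor ops" concrete, here is a minimal sketch (written against a current PyTorch for convenience, with made-up sizes) of replacing a per-pixel Python loop by a single gather:

import torch

# Made-up sizes purely for illustration.
N, C, H, W = 1, 2, 4, 5
prob = torch.rand(N, C, H, W).clamp(min=1e-6)
prob = prob / prob.sum(1, keepdim=True)        # pretend softmax output
gt = torch.randint(0, C, (N, H, W))            # pretend integer labels

# Loop version: clear but slow in Python.
loop_loss = torch.zeros(N, H, W)
for n in range(N):
    for i in range(H):
        for j in range(W):
            loop_loss[n, i, j] = -torch.log(prob[n, gt[n, i, j], i, j])

# Tensor-op version: gather the probability of the true class at every pixel.
vec_loss = -torch.log(prob.gather(1, gt.unsqueeze(1)).squeeze(1))

print(torch.allclose(loop_loss, vec_loss))     # True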

Before starting, I prepared a set of test cases: an image, its ground truth, and prob, a matrix that pretends to be the probability output of the segmentation network.

Image, ground truth, and prob pretending to be the segmentation network's output
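For reference, a stand-in for such a test case might look like the sketch below; the names and sizes here are my own illustration, not the author's actual data:

import numpy as np

# A hypothetical stand-in test case; names and sizes are made up for illustration.
H, W, C = 300, 400, 2
img = np.random.randint(0, 256, (H, W, 3), dtype=np.uint8)   # fake image
gt = np.random.rand(H, W) > 0.5                              # fake boolean ground truth

# Fake "network output": random scores normalized into per-pixel probabilities.
scores = np.random.rand(C, H, W).astype(np.float32)
prob = scores / scores.sum(0, keepdims=True)                 # pretends to be softmax output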

I immediately found that the processed image was not right at all. Visualizing it: this is what the ground truth looks like after being converted to a FloatTensor.


The reason: passing a bool numpy array straight to `torch.FloatTensor`, i.e. `torch.FloatTensor(bool_ndarray)`, produces exactly this kind of mess.

The fix is to cast first: `torch.FloatTensor(np.float32(labels))`
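A minimal sketch of the pitfall and the fix, assuming the PyTorch 0.3-era constructor behavior described above (the offending call is left commented out, since newer versions handle it differently):

import numpy as np
import torch

labels = np.array([[True, False], [False, True]])   # a bool ground-truth mask

# Pitfall (0.3-era behavior): feeding the bool ndarray straight into FloatTensor
# reinterprets the underlying bytes and yields a jumbled result.
# bad = torch.FloatTensor(labels)

# Fix: cast to float32 on the NumPy side first.
good = torch.FloatTensor(np.float32(labels))
print(good)   # 0/1 valued float tensor, as expected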

Writing on, I found that Torch supports neither `[::-1]` slicing nor a flip operation, so I had to write my own flip.
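The author's own flip isn't shown; a minimal sketch of one way to roll a flip by hand with index_select might look like this:

import torch

def flip(x, dim):
    # Reverse x along dimension dim by gathering reversed indices,
    # since negative-step slicing like x[::-1] is not available.
    idx = torch.arange(x.size(dim) - 1, -1, -1).long()
    return x.index_select(dim, idx)

t = torch.FloatTensor([[1, 2, 3], [4, 5, 6]])
print(flip(t, 1))   # [[3, 2, 1], [6, 5, 4]]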

From here on it is tensor operations all the way, and for a complex formula that means heavily multi-dimensional select, index, dimension reshaping, and dimension matching. When there is a bug, it is really hard to analyze.

Such as:

otherSideEdgeLossMap = -th.log(((probnb*gtind).sum(-3)*gtdf).sum(-3)/gtdf.sum(-3))
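To unpack what such a one-liner involves, here is a step-by-step version with dummy tensors; the shapes are my guesses at the layout (batch, neighbor, class, H, W), not necessarily the author's exact one:

import torch as th

probnb = th.rand(1, 8, 2, 300, 400)   # per-neighbor class probabilities (guessed layout)
gtind  = th.rand(1, 2, 300, 400)      # per-pixel class indicator, broadcast over neighbors
gtdf   = th.rand(1, 8, 300, 400) + 1  # per-neighbor weights, kept > 0 here to avoid /0

perNeighbor = (probnb * gtind).sum(-3)     # sum over the class dim -> [1, 8, 300, 400]
numerator = (perNeighbor * gtdf).sum(-3)   # weighted sum over neighbors -> [1, 300, 400]
otherSideEdgeLossMap = -th.log(numerator / gtdf.sum(-3))
print(otherSideEdgeLossMap.shape)          # torch.Size([1, 300, 400])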

Writing like this is coding in the dark, and debugging it is complicated and slow.

To do a good job, one must first sharpen one's tools.

So I extended my tool library yllab, which previously only supported NumPy visualization, to also support Torch.

Much nicer than printing xx.shape and xx.mean() by hand.
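The real yllab helpers aren't shown here; a toy sketch of the idea, accepting NumPy arrays, Tensors, or Variables alike, might look like:

import numpy as np
import torch
from torch.autograd import Variable

def summary(x, name='x'):
    # Toy stand-in for the kind of helper described above (not the real yllab API):
    # accept NumPy arrays, Tensors, or Variables and print a one-line summary.
    if isinstance(x, Variable):
        x = x.data                    # unwrap Variable -> Tensor
    if torch.is_tensor(x):
        x = x.cpu().numpy()           # move to NumPy for uniform handling
    x = np.asarray(x)
    print('%s: shape=%s dtype=%s mean=%.4f' % (name, x.shape, x.dtype, float(x.mean())))

summary(np.random.rand(3, 4), 'ndarray')
summary(torch.rand(2, 5), 'tensor')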

Finally the loss was written, and the visualized loss map matched the expected effect, and it looked quite beautiful too!

Left: crossEntropyMap  Right: edgeLossMap

The last step: turn the probability matrix prob into a Variable and test backpropagation. I was sure the work would be finished soon.

Changed it to Variable(prob), called loss.backward()... Error? A Variable can't operate with a Tensor! Other than carrying grad, what exactly is the difference between a Variable and a Tensor? Speechless. Fine, make everything a Variable then.

Worse, parts of the Tensor and Variable APIs differ (e.g. `.type()`). For compatibility I had to make every function accept Variables and go through `variable.data` internally.

Tensor/Variable (2) × eight data types × CUDA/CPU (2) = 2*8*2 = 32 classes. The eight data types don't convert automatically, so all 32 types are mutually incompatible.

Operations between different types require explicit conversion, and the scary thing is that the conversions themselves have plenty of pits, like the ones mentioned above!
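A small sketch of the kind of explicit conversions this forces (the lines that used to raise type errors in 0.3 are left commented out):

import torch
from torch.autograd import Variable

x = torch.rand(3)                 # a FloatTensor
mask = x > 0.5                    # a ByteTensor in 0.3: a different type
# x * mask                        # mixing types like this raised a type error
y = x * mask.float()              # explicit conversion was required

v = Variable(x)                   # a Variable wrapping a Tensor
# v * x                           # Variable op Tensor was also an error in 0.3
z = v * Variable(x)               # either wrap both sides ...
w = v.data * x                    # ... or drop back to plain Tensors via .data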

Changed the code, ran backpropagation, and quickly visualized prob.grad.

What the...?! Analysis showed 99.97% of the grad values were NaN. Everyone else's loss backpropagates to a gradient, how did mine become NaN?! The math simply doesn't work out that way.

Visualization of log(abs(grad))

Then began the long road of debugging, taking the loss apart piece by piece.

Backpropagating through each piece separately, I finally pinned down the exact term in the loss that produced the NaN.
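The actual loss terms aren't reproduced here, but the approach can be sketched with hypothetical terms: backprop each term on its own and check whose gradient blows up:

import torch

# Hypothetical two-term loss: backprop each term separately
# and check which one poisons the gradient.
x = torch.tensor([0.0, 0.5, 1.0], requires_grad=True)
terms = {
    'square_term': (x ** 2).sum(),
    'log_term': torch.log(x).sum(),   # log(0) = -inf; its gradient at 0 blows up
}
for name, term in terms.items():
    if x.grad is not None:
        x.grad.zero_()
    term.backward()
    print(name, 'bad values in grad:', bool((~torch.isfinite(x.grad)).any()))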

I reported this PyTorch bug on PyTorch's GitHub:

X.grad should be 0 but get NaN after x/0 · Issue #…

The BUG is as follows

Code to reproduce the bug:

import torch
from torch.autograd import Variable

x = Variable(torch.FloatTensor([1., 1]), requires_grad=True)
div = Variable(torch.FloatTensor([0., 1]))
y = x/div            # => y is [inf, 1]
mask = (div != 0)    # => mask is [0, 1]
loss = y[mask]
loss.backward()
print(x.grad)        # grad is [nan, 1], but expected [0, 1]

Because `mask` filters it out, `x[0]` takes no part in the computation at all, so the gradient of `x[0]` should be zero, yet it comes back as `nan`.

I also suggested a workaround in the issue:

Your code shouldn't generate any inf in the forward pass, which is often produced by torch.log(0) and x/[0, ]


That means zeros should be filtered out before calling torch.log(x) or computing x/div.
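Applied to the repro snippet above, the workaround is simply to mask before dividing, so no inf ever appears in the forward pass (a sketch, using the same toy values):

import torch
from torch.autograd import Variable

x = Variable(torch.FloatTensor([1., 1.]), requires_grad=True)
div = Variable(torch.FloatTensor([0., 1.]))

mask = div != 0                      # keep only positions with a valid divisor
loss = (x[mask] / div[mask]).sum()   # the forward pass never divides by zero
loss.backward()
print(x.grad)                        # [0, 1]: the masked-out entry gets no gradient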

To avoid this bug in the real loss, the code gets more complicated:

variables

├── prob:   torch.Size([1, 2, 300, 400])
├── gtind:  torch.Size([1, 2, 300, 400])
├── edge:   torch.Size([1, 300, 400])
├── probnb: torch.Size([1, 8, 2, 300, 400]) torch.cuda.FloatTensor
└── gtdf:   torch.Size([1, 8, 300, 400])    torch.cuda.FloatTensor

th = torch
tots = lambda x: x.data

code(before)

otherSideEdgeLossMap = -th.log(((probnb*gtind).sum(-3)*gtdf).sum(-3)/gtdf.sum(-3))
otherSideEdgeLossMap[~tots(edge)] = 0

code(after)

numerator = ((probnb*gtind).sum(-3)*gtdf).sum(-3)
numerator[tots(edge)] /= gtdf.sum(-3)[tots(edge)]
numerator[tots(edge)] = -th.log(numerator[tots(edge)])
otherSideEdgeLossMap = (numerator)
otherSideEdgeLossMap[~tots(edge)] = 0

Finally, I adapted the code to handle multiple batches, and the segmentation network ran smoothly. The most valuable thing turned out to be PyTorch's dynamic graph, which is flexible and easy to debug and visualize. I'm afraid debugging this with a static graph would have been bloody; I already had plenty of trouble just extracting a feature from MXNet.

That said, with a static graph you might not have to do so many matrix operations for efficiency's sake, so defining the network structure with plain for loops could be simpler and more direct.

prob's grad after backpropagating the whole loss

Written in 2017.12


Torch version: 0.3