Abstract: This series of articles shares the TensorFlow -> ONNX -> Caffe -> WK model conversion process, mainly for deploying algorithms on HiSilicon chips such as the Hi3516CV500 and Hi3519AV100, which support the NNIE inference framework.

This article was shared from the Huawei Cloud Community, “Converting Models to NNIE Framework Supported WK Models — Taking TensorFlow Framework as an Example (1)”, originally by: WWWYX_ ^▽^.

Anyone who has used the NNIE framework knows that it only supports inference with WK models.

In practice, a Caffe 1.0 model is converted to WK with RuyiStudio, the conversion tool provided by HiSilicon. Normally, if you purchase the chip, HiSilicon sends the relevant SDK package directly to the customer. If not, you can get it from this link: RuyiStudio

To use RuyiStudio, models from other frameworks must first be converted to Caffe. The current mainstream frameworks include PyTorch, TensorFlow, and MXNet. PyTorch to Caffe has already been covered in an earlier article, PyTorch -> Caffe; take a look if you need it. This article focuses on the problems that may arise when converting from TensorFlow to Caffe and how to solve them. MXNet has a direct interface for exporting to ONNX, so you can also follow this article to get from there to Caffe.

So let’s get down to business.


This step is full of pitfalls (cry). I used ONNX as the intermediate model, so the final conversion path is PB -> ONNX -> Caffe -> WK. Here is how it works.

Step 1: tensorflow->onnx

This is the easiest step; so far none of my models have hit a pitfall here. Use the open-source project on GitHub, TensorFlow -> ONNX (tf2onnx), which can be used directly after installing it with pip.

The main parameters are the input layout, NCHW or NHWC (Caffe is NCHW, so NCHW is used throughout here), and the opset (the default of 9 is used here). I have not used many of the other parameters; if you have questions about them, go straight to the project's issues.

Here is a conversion command for your reference:

python -m tf2onnx.convert --input ./model.pb --inputs input_image:0[1,112,112,3] --inputs-as-nchw input_image:0 --outputs output_0:0,output_1:0,output_2:0,output_3:0,output_4:0 --output ./convert.onnx

Once you have the ONNX model, you can optionally run ONNX Simplifier to fuse some scattered operators or remove redundant ones.

python -m onnxsim input_onnx_model output_onnx_model

After converting to ONNX, verify that the output matches the PB output before proceeding to the next step!!
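The consistency check can be as simple as comparing two arrays. The sketch below is only an illustration: `compare_outputs`, the tolerance `atol`, and the stand-in arrays are assumptions made here, not part of tf2onnx or the original workflow. In practice the two arrays would be the PB and ONNX outputs for the same input.

```python
import numpy as np


def compare_outputs(reference, candidate, atol=1e-4):
    """Compare two output arrays and report simple agreement metrics."""
    ref = np.asarray(reference, dtype=np.float64).ravel()
    cand = np.asarray(candidate, dtype=np.float64).ravel()
    if ref.shape != cand.shape:
        raise ValueError(f"element count differs: {ref.size} vs {cand.size}")
    max_abs_diff = float(np.max(np.abs(ref - cand)))
    denom = float(np.linalg.norm(ref) * np.linalg.norm(cand))
    # Cosine similarity is undefined when a vector is all zeros; in that
    # case report 1.0 only if the two arrays are identical.
    cosine = float(ref @ cand) / denom if denom > 0 else float(np.array_equal(ref, cand))
    return {"max_abs_diff": max_abs_diff, "cosine": cosine, "ok": max_abs_diff <= atol}


# Stand-in arrays (hypothetical values) in place of real model outputs.
pb_out = np.array([[0.10, 0.20, 0.70]])
onnx_out = pb_out + 1e-6
report = compare_outputs(pb_out, onnx_out)
print(report)
```

If the ONNX input was switched to NCHW while the PB model takes NHWC, feed each model the same image in its own layout, and transpose any spatial outputs to a common layout before comparing.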

Step 2: onnx->caffe

We now have the ONNX model, but we are still 99% away from success!!

Baseline: onnx2caffe

Environment: caffe 1.0 + onnx 1.8.0


+-- onnx2caffe
|   +-- _operators.py
|   +-- _weightloader.py
+-- convertCaffe.py
+-- MyCaffe.py

Run the command:

python convertCaffe.py ./model/MobileNetV2.onnx ./model/MobileNetV2.prototxt ./model/MobileNetV2.caffemodel

If you run into problems during conversion, you can adapt in the following ways:

(1) If you encounter an operator that Caffe and NNIE do not support, modify the node in the ONNX model so that it fits Caffe (the PyTorch -> Caffe blog describes some operator substitutions).
(2) If an operator is supported by NNIE or ONNX but not by Caffe 1.0, add a new layer to Caffe, recompile it, and then run the conversion.
(3) If the conversion tool itself does not support the operator, add the corresponding conversion to the tool's code.
(4) If the operators convert successfully but a shape problem appears, manually add some parameter-free operations to the generated prototxt.

A solution for each of these cases is given below.

Modify nodes in the ONNX model to fit Caffe

To rewrite the ONNX model, we first need to know which operators ONNX supports.

Operators supported by ONNX: ONNX OP

When you change an operation in the model, look at the input and output conventions of the node and rewrite the model accordingly. Rewriting an ONNX model covers a variety of situations; the following are some commonly used methods.

1. Rewriting a node sometimes requires knowing its input and output sizes, so first prepare an ONNX model that records the input and output size of every node.

import onnx.helper as helper
from onnx import shape_inference, TensorProto
import onnxruntime
import onnx


def add_input_output_from_onnx(onnx_path, save_path):
    ONNX_DTYPE = {
        0: TensorProto.FLOAT,
        1: TensorProto.FLOAT,
        2: TensorProto.UINT8,
        3: TensorProto.INT8,
        4: TensorProto.UINT16,
        5: TensorProto.INT16,
        6: TensorProto.INT32,
        7: TensorProto.INT64,
        8: TensorProto.STRING,
        9: TensorProto.BOOL
    }

    # load model
    onnx_model = onnx.load(onnx_path)
    graph = onnx_model.graph

    # rewrite the input tensor of graph
    input_tensor = graph.input[0]
    input_shape = input_tensor.type.tensor_type.shape.dim
    input_tensor_new = onnx.helper.make_tensor_value_info(
        name=input_tensor.name, elem_type=1,
        shape=[1, input_shape[1].dim_value, input_shape[2].dim_value, input_shape[3].dim_value])
    graph.input.remove(input_tensor)
    graph.input.insert(0, input_tensor_new)

    # append all tensor infos to graph input
    weight_infos = []
    tensors = graph.initializer
    for i, tensor in enumerate(tensors):
        value_info = helper.make_tensor_value_info(tensor.name, ONNX_DTYPE[tensor.data_type], tensor.dims)
        weight_infos.append(value_info)
        graph.input.insert(i + 1, value_info)  # because 0 is for placeholder, so start index is 1

    # run node shape inference
    node = graph.node
    value_info = graph.value_info
    inferred_onnx_model = shape_inference.infer_shapes(onnx_model)
    onnx.checker.check_model(onnx_model)
    inferred_graph = inferred_onnx_model.graph
    inferred_value_info = inferred_graph.value_info
    onnx.save(inferred_onnx_model, save_path)
    return

Open the ONNX model in Netron to see the result after the sizes have been added:

2. If an operator is not supported by Caffe and NNIE, delete its node from the ONNX model and perform the operation in the external preprocessing stage instead. This case only involves deleting existing nodes and changing existing edge connections in the ONNX model, not creating new edges.

import onnx


# Delete graph nodes 0, 1, 2 and modify the input edge of node 3:
#   input_image --> mul_1 --> sub --> mul --> conv1
# becomes
#   input_image --> conv1
def delete_node(onnx_path, save_path):
    onnx_model = onnx.load(onnx_path)
    graph = onnx_model.graph
    Mul_1 = graph.node[0]
    sub = graph.node[1]
    mul = graph.node[2]

    conv1 = graph.node[3]
    conv1.input[0] = Mul_1.input[0]

    graph.node.remove(Mul_1)
    graph.node.remove(sub)
    graph.node.remove(mul)

    onnx.checker.check_model(onnx_model)
    onnx.save(onnx_model, save_path)

3. Replace an operator that Caffe and NNIE do not support by modifying the node in the ONNX model. Take the Squeeze operator as an example: it causes an error in the ONNX -> Caffe step, so the Squeeze node in the ONNX model can be replaced with a Reshape node. Reshape needs two inputs while Squeeze has only one, so a new constant tensor input has to be created in the graph. This case involves replacing an existing node and adding a new constant tensor, but not creating new edges.

# Replace the Squeeze node in the ONNX graph with a Reshape node.
# The values of the shape tensor can be read from the output size of the
# Squeeze node in the ONNX model generated in step 1, namely
#   input --> squeeze --> output
# becomes
#   input --> reshape(shape) --> output
def remove_headpose_squeeze_node(onnx_path, save_path):
    onnx_model = onnx.load(onnx_path)
    graph = onnx_model.graph
    # constant tensor holding the target shape, used as Reshape's second input
    shape = onnx.helper.make_tensor('shape', onnx.TensorProto.INT64, [2], [1, 3])
    graph.initializer.append(shape)
    for i in range(len(graph.node)):
        if graph.node[i].op_type == "Squeeze":
            reshape_node_def = helper.make_node(
                'Reshape',  # op type
                inputs=[graph.node[i].input[0], 'shape'],  # inputs
                outputs=[graph.node[i].output[0]],  # outputs
                name=graph.node[i].name
            )
            graph.node.remove(graph.node[i])
            graph.node.insert(i, reshape_node_def)
    onnx.checker.check_model(onnx_model)
    onnx.save(onnx_model, save_path)

4. Caffe does not support the Div operator, so convert Div into Pow + Mul. This involves replacing one node with two, adding a new constant tensor, and creating a new edge.

Div operation: z = x / y

Replace it with Pow + Mul, where Pow is the power operation and Mul is multiplication:

temp = pow(y, -1)

z = temp * x

# That is:
#   input_x    input_y
#       \        /
#          div
# is changed to:
#   input_x    input_y
#      |          |
#      |         pow   (constant tensor as exponent input)
#       \        /
#          mul         (new edge)
def change_headpose_div_node(onnx_path, save_path):
    onnx_model = onnx.load(onnx_path)
    graph = onnx_model.graph
    # exponent of -1 for every element, so that pow(y, -1) == 1 / y
    pow_scale = onnx.helper.make_tensor('pow_scale', onnx.TensorProto.FLOAT, [3], [-1.0, -1.0, -1.0])
    mul12_output = helper.make_tensor_value_info('pred_pose/mul_12_pow_output:0', onnx.TensorProto.FLOAT, [1, 3])
    graph.initializer.append(pow_scale)

    # 'pred_pose/mul_12:0' corresponds to input_y
    # 'pow_scale' is the exponent tensor created above
    # 'pred_pose/mul_12_pow_output:0' is the newly created output tensor
    # give the Pow node a name that does not duplicate an existing one
    mul12_pow_node_def = helper.make_node(
        'Pow',  # op type
        inputs=['pred_pose/mul_12:0', 'pow_scale'],  # inputs
        outputs=['pred_pose/mul_12_pow_output:0'],  # outputs
        name='pred_pose/mul_12_pow'
    )
    graph.node.insert(len(graph.node), mul12_pow_node_def)

    for i in range(len(graph.node)):
        if graph.node[i].name == "pred_pose/truediv_3":
            input1 = graph.node[i].input[0]
            input2 = graph.node[i].input[1]
            output = graph.node[i].output[0]
            name = graph.node[i].name
            pow_node_def = helper.make_node(
                'Mul',  # op type
                inputs=[input1, mul12_pow_node_def.output[0]],  # inputs
                outputs=[output],  # outputs
                name=name
            )
            print(graph.node[i].name, i)
            graph.node.remove(graph.node[i])
            graph.node.insert(i, pow_node_def)
            break

    graph = helper.make_graph(graph.node, graph.name, graph.input, graph.output, graph.initializer)
    info_model = helper.make_model(graph)
    model = onnx.shape_inference.infer_shapes(info_model)
    onnx.save(model, save_path)

After this modification, open the model in Netron and check that the edges between nodes are correct.
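The Div -> Pow + Mul substitution rests on the identity x / y = x * pow(y, -1). A minimal numeric check, using plain Python lists in place of tensors (the function names and sample values are made up for illustration), might look like this:

```python
import math


def div_reference(x, y):
    """Element-wise division, as the original Div node computes it."""
    return [a / b for a, b in zip(x, y)]


def div_as_pow_mul(x, y):
    """The same result expressed as Pow (exponent -1) followed by Mul."""
    temp = [math.pow(b, -1.0) for b in y]    # Pow node
    return [a * t for a, t in zip(x, temp)]  # Mul node


x = [1.5, -2.0, 8.0]
y = [3.0, 4.0, -0.5]
ref = div_reference(x, y)
alt = div_as_pow_mul(x, y)

# The two paths agree up to floating-point rounding.
assert all(math.isclose(r, a, rel_tol=1e-9) for r, a in zip(ref, alt))
print(ref, alt)
```

The results can differ in the last bits, so compare with a tolerance rather than exact equality, and note that the identity assumes y contains no zeros.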

5. To print the output of an intermediate node in the ONNX model, add an output tensor to the graph.

def add_outputNode_info(onnx_path, add_name, output_size, save_path):
    onnx_model = onnx.load(onnx_path)
    graph = onnx_model.graph
    prob_info = helper.make_tensor_value_info(add_name, onnx.TensorProto.FLOAT, output_size)
    graph.output.insert(0, prob_info)
    onnx.save(onnx_model, save_path)
    return


if __name__ == '__main__':
    onnx_model = './model.onnx'
    add_node_path = "./addPreprocessOutput.onnx"
    # "mul:0": output name
    # [1,24,14,14]: output size
    add_outputNode_info(onnx_model, "mul:0", [1, 24, 14, 14], add_node_path)

The examples above cover most kinds of node modification, and the code can be adapted to modify your own ONNX model.

A tip from experience: Reshape is a go-to node that can stand in for many dimension-changing operations, and Transpose is another popular choice.

Add the corresponding operator implementation to the conversion code

There is not much to say about adding a new layer to Caffe; just follow the link given above. This section describes how to modify the conversion code so that a model converts. After modifying the ONNX model as described above, every node in it has been replaced with an operator that Caffe and NNIE support. However, onnx2caffe may still fail at this point. We adapt the onnx2caffe code case by case to complete the model conversion step by step.

1. Both Caffe and NNIE support an operation, but the onnx2caffe conversion fails.

Take TanH as an example: Caffe implements it (see /caffe/src/caffe/layers/) and NNIE supports it, but the conversion reports an error. Looking at the onnx2caffe source code, we find that there is no conversion implementation for TanH, so we need to add one. We mainly modify two files, _operators.py and _weightloader.py.

The _operators.py file converts ONNX operations into Caffe operations. To adapt TanH, first register TanH in the operator registry at the end of the file, then add the conversion code.

def _convert_tanH(node, graph, err):
    input_name = str(node.inputs[0])
    output_name = str(node.outputs[0])
    name = str(node.name)
    layer = myf("TanH", name, [input_name], [output_name])
    graph.channel_dims[output_name] = graph.channel_dims[input_name]
    return layer

Register the operator:

_ONNX_NODE_REGISTRY = {
    ...
    "Tanh": _convert_tanH,
}

The _weightloader.py file passes node parameters from ONNX to Caffe. Step 1: register the operator at the end of the file, in the same way as in _operators.py. Step 2: check in caffe.proto whether the TanH operation has weights:

message TanHParameter {
  enum Engine {
    DEFAULT = 0;
    CAFFE = 1;
    CUDNN = 2;
  }
  optional Engine engine = 1 [default = DEFAULT];
}

Because the TanH operation has no weights, nothing is passed from ONNX to Caffe:

def _convert_tanH(net, node, graph, err):
    pass

At this point the TanH operation has been added to onnx2caffe. Concretely this means modifying the two files above: registering the operator, implementing the operation conversion, and passing the weights.

2. Both Caffe and NNIE support an operation and onnx2caffe supports it too, but one of the operation's inputs is stored as a weight in the model, which does not match the existing implementation.

Take the Mul operator as an example. An ordinary Mul has two inputs. In a model, there may be a Mul with only one external input, with the other input stored as a weight parameter, as shown below:

In this case, since Mul is already registered, we only need to add a new branch to the Mul conversion; it still only involves changes to the two files.

_operators.py: add the branch code

def _convert_input1_is_weight_mul(node, graph, max_dim, err):
    node_name = node.name
    # Check in the Netron view which input is the external input and use it
    # as input_name. Do not write the name of the weight input here!
    input_name = str(node.inputs[0])
    output_name = str(node.outputs[0])
    scale_layer = myf("Scale", node_name, [input_name], [output_name], in_place=False, bias_term=False)
    graph.channel_dims[output_name] = max_dim
    return scale_layer


def _convert_Mul(node, graph, err):
    input_name_list = [str(i) for i in node.inputs]
    output_name = str(node.outputs[0])
    node_name = node.name

    # new branch: this Mul has one of its inputs stored as a weight
    if node_name == "mul_1":
        max_dim = 16
        return _convert_input1_is_weight_mul(node, graph, max_dim, err)
    # ... the existing two-input Mul handling continues here

_weightloader.py does not need a new registration either; just add the branch code

def _convert_input1_is_weight_mul(net, node, graph, err):
    node_name = node.name
    # Note!!
    # Here the external input size is (1,3) and the weight size is (1), so
    # the weight is aligned with the input channels using numpy:
    #     scale = np.ones(3) * 3.0
    # In another case, for example external input size = (1,128,8,8) and
    # weight size = (1,128,1,1), it can be done like this:
    #     scale = node.input_tensors[node.inputs[1]]
    #     scale = np.reshape(scale, scale.shape[1])
    scale = np.ones(3) * 3.0
    np.copyto(net.params[node_name][0].data, scale, casting='same_kind')


def _convert_Mul(net, node, graph, err):
    node_name = node.name
    if node_name == "mul_1":
        _convert_input1_is_weight_mul(net, node, graph, err)
    else:
        # an ordinary Mul has no weight, so there is nothing to pass
        pass

In actual conversions, the same situation also occurs with the Add operator, where one input is stored as an operator parameter. This can be handled by analogy with the Scale operation in _convert_BatchNorm: treat the Scale weight as 1 and use the Add operator's internal input parameter as the bias. You can adapt the code by referring to BatchNorm; I will not write it out in detail here.
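To see why a per-channel Scale layer can stand in for a Mul or Add with a constant input, here is a small NumPy sketch. It models Scale as y = x * weight[c] + bias[c] over the channel axis of an NCHW tensor, which is an assumption about the default Scale configuration. The function name `scale_layer` and the random test tensors are illustrative only and not part of onnx2caffe.

```python
import numpy as np


def scale_layer(x, weight, bias=None):
    """Per-channel scale on an NCHW tensor: y[n,c,h,w] = x[n,c,h,w] * weight[c] + bias[c]."""
    c = x.shape[1]
    w = np.asarray(weight, dtype=x.dtype).reshape(1, c, 1, 1)
    y = x * w
    if bias is not None:
        y = y + np.asarray(bias, dtype=x.dtype).reshape(1, c, 1, 1)
    return y


rng = np.random.default_rng(0)
x = rng.standard_normal((1, 128, 8, 8)).astype(np.float32)

# Mul whose second input is a constant of shape (1, 128, 1, 1):
# flatten the constant to (128,) and use it as the Scale weight.
mul_const = rng.standard_normal((1, 128, 1, 1)).astype(np.float32)
mul_as_scale = scale_layer(x, np.reshape(mul_const, mul_const.shape[1]))
assert np.allclose(mul_as_scale, x * mul_const)

# Add whose second input is a constant: Scale weight of 1, constant as bias.
add_const = rng.standard_normal((1, 128, 1, 1)).astype(np.float32)
add_as_scale = scale_layer(x, np.ones(128), add_const.reshape(128))
assert np.allclose(add_as_scale, x + add_const)
```

The same reshape to a flat per-channel vector is what the weight loader has to do before copying values into the layer's parameter blobs.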

The operators convert successfully but there is a shape problem, so modify the prototxt manually

Suppose the output shape of a feature map in the converted model does not match the original. While converting, onnx2caffe prints the output shape of each layer; compare this with the Netron view to locate the first problem node.

Know yourself and know your enemy, and you can win every battle. To find out why the shapes are inconsistent, we first need to understand the padding strategies of the different frameworks and how each computes the output size.

  • Caffe: output_size = floor((w + 2*pad - (d*(k-1) + 1)) / s) + 1

    template <typename Dtype>
    void ConvolutionLayer<Dtype>::compute_output_shape() {
      const int* kernel_shape_data = this->kernel_shape_.cpu_data();
      const int* stride_data = this->stride_.cpu_data();
      const int* pad_data = this->pad_.cpu_data();
      const int* dilation_data = this->dilation_.cpu_data();
      this->output_shape_.clear();
      for (int i = 0; i < this->num_spatial_axes_; ++i) {
        // i + 1 to skip channel axis
        const int input_dim = this->input_shape(i + 1);
        const int kernel_extent = dilation_data[i] * (kernel_shape_data[i] - 1) + 1;
        const int output_dim = (input_dim + 2 * pad_data[i] - kernel_extent)
            / stride_data[i] + 1;
        this->output_shape_.push_back(output_dim);
      }
    }

  • Caffe's conv padding policy behaves like TensorFlow pad=VALID: pixels that cannot fill a complete window are automatically dropped from the calculation.

OK, now that we understand the padding strategies of the different frameworks and how the output size is computed, let's analyze our model. The model conversion is as follows:

Analyzing the parameters of the model conversion above:

  • TensorFlow uses pad=SAME. So that all input pixels take part in the calculation, TensorFlow implicitly pads one row of zeros at the bottom and one column at the right of the input during inference, so the final output size is: output_size = (112 - (1 * (3 - 1) + 1) + 1) / 2 + 1 = 56
  • For ONNX, after looking it up and experimenting, I found that the pads parameter [0,0,1,1] means no padding at the top or left of the feature map, one row of zeros at the bottom, and one column of zeros at the right. This is consistent with TF, so the output is fine.
  • After conversion to Caffe, the conv pad parameters of the Caffe model are 0, i.e. no padding. By Caffe's output shape formula the result is (1,3,55,55): the last row and last column of the input are simply dropped and take no part in the calculation.
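The sizes in these three bullets can be reproduced with a few lines of Python. This is a sketch, not code from any of the frameworks: the function names are made up here, and `tf_same_padding` assumes the usual TensorFlow rule for SAME padding, where the total padding is split with any extra pixel going to the end (bottom or right).

```python
import math


def caffe_conv_output_size(w, k, s, pad=0, d=1):
    """Caffe: floor((w + 2*pad - (d*(k-1) + 1)) / s) + 1."""
    kernel_extent = d * (k - 1) + 1
    return (w + 2 * pad - kernel_extent) // s + 1


def tf_same_output_size(w, s):
    """TensorFlow pad=SAME: ceil(w / s)."""
    return math.ceil(w / s)


def tf_same_padding(w, k, s, d=1):
    """(pad_begin, pad_end) that TensorFlow pad=SAME applies along one axis."""
    kernel_extent = d * (k - 1) + 1
    out = tf_same_output_size(w, s)
    total = max((out - 1) * s + kernel_extent - w, 0)
    begin = total // 2
    return begin, total - begin


# 112-pixel input, 3x3 kernel, stride 2, as in the model analyzed above
print(caffe_conv_output_size(112, k=3, s=2, pad=0))  # 55
print(tf_same_output_size(112, s=2))                 # 56
print(tf_same_padding(112, k=3, s=2))                # (0, 1)
```

The (0, 1) result is the per-axis form of the ONNX pads value [0,0,1,1]: nothing at the top or left, one row or column of zeros at the bottom or right.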

To make the output shape and the computed values match, I used the following solution.

Set pad_h: 2 and pad_w: 2 in Caffe. Caffe pads zeros symmetrically, so two rows of zeros are added at the top and at the bottom of the input, and two columns at the left and at the right. Combined with the output shape formula, the final output shape is:

output_shape = floor((112 + 2 * 2 - (1 * (3 - 1) + 1)) / 2) + 1 = 57

Thinking through how the convolution works, the feature map Caffe produces now has exactly one extra row at the top and one extra column at the left compared with TF. Briefly: Caffe sets pad=2, but by Caffe's conv implementation the extra padded row and column at the bottom right, beyond what TF pads, are automatically dropped from the calculation. The output feature map is now (1,3,57,57). To get the correct result, add two Slice operations after the conv operator in the prototxt file to remove the top row and the leftmost column.
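The alignment argument can be checked in one dimension. The sketch below is plain Python, not Caffe or TensorFlow code, and the signal and kernel values are arbitrary. It pads the input the way each framework would and confirms that the symmetric pad=2 result, minus its first element, equals the SAME result.

```python
def conv1d(signal, kernel, stride, pad_begin=0, pad_end=0):
    """Plain 1D cross-correlation with explicit zero padding on each side."""
    padded = [0.0] * pad_begin + list(signal) + [0.0] * pad_end
    k = len(kernel)
    n_out = (len(padded) - k) // stride + 1
    return [
        sum(padded[i * stride + j] * kernel[j] for j in range(k))
        for i in range(n_out)
    ]


signal = [float(v % 7 + 1) for v in range(112)]  # arbitrary 112-element input
kernel = [1.0, -2.0, 0.5]                        # arbitrary 3-tap kernel

# TensorFlow pad=SAME for this size: no padding in front, one zero at the end
tf_same = conv1d(signal, kernel, stride=2, pad_begin=0, pad_end=1)
# Caffe with pad=2: two zeros on both sides
caffe_pad2 = conv1d(signal, kernel, stride=2, pad_begin=2, pad_end=2)

print(len(tf_same), len(caffe_pad2))  # 56 57
# Dropping the first Caffe output reproduces the SAME result exactly.
assert caffe_pad2[1:] == tf_same
```

The two Slice layers that follow apply the same drop-the-first-element step along the height and width axes of the Caffe graph.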

layer {
  name: "add_slice1"
  type: "Slice"
  bottom: "depthwise:0"
  top: "add_slice1/split:0"
  top: "add_slice1/split:1"
  slice_param {
    axis: 2
    slice_point: 1
  }
}

layer {
  name: "add_slice2"
  type: "Slice"
  bottom: "add_slice1/split:1"
  top: "add_slice2/split:0"
  top: "add_slice2/split:1"
  slice_param {
    axis: 3
    slice_point: 1
  }
}

That covers adapting the Caffe model. Many things here are complicated and sometimes need a novel idea to solve; modifying operator params in the prototxt file is also sometimes involved. Each problem needs its own analysis, so I will not expand on it here.

Step 3: Verify

Compare the output of the Caffe model with the output of the PB model. In general they should be the same. If not, focus on input preprocessing, output post-processing, and whether the output of the node just before a modified node is correct (mainly to locate whether the modified node itself is the problem). Do not be impatient; be methodical. Run inference after every modification so that problems are easier to locate.
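One way to be methodical is to dump intermediate outputs from both models and look for the first layer where they diverge. The helper below is a hypothetical sketch: it assumes each model's intermediate outputs have already been collected into a dict of flattened value lists, keyed by a shared layer name and ordered from input to output. The layer names and values are made up.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference; infinite if the lengths differ."""
    if len(a) != len(b):
        return float("inf")
    return max((abs(p - q) for p, q in zip(a, b)), default=0.0)


def first_divergent_layer(reference, candidate, atol=1e-3):
    """Return (name, diff) for the first layer whose outputs differ, or None."""
    for name, ref_values in reference.items():
        if name not in candidate:
            continue  # layer has no counterpart in the other model
        diff = max_abs_diff(ref_values, candidate[name])
        if diff > atol:
            return name, diff
    return None


# Made-up layer dumps, in execution order
pb_layers = {"conv1": [0.1, 0.2], "depthwise": [1.0, 2.0], "fc": [0.3, 0.7]}
caffe_layers = {"conv1": [0.1, 0.2], "depthwise": [1.0, 2.5], "fc": [0.1, 0.9]}

print(first_divergent_layer(pb_layers, caffe_layers))  # ('depthwise', 0.5)
```

Everything downstream of the first divergent layer will usually differ as well, so only the first hit is worth investigating.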


Converting from TF to Caffe really is troublesome. The problems listed above may be only a tiny fraction of what you can run into, but I hope they help. If you have good ideas on this topic, I would be glad to discuss them ~

Modifying the ONNX model by hand is probably unnecessary. It would have been better to write the related conversions directly into the onnx2caffe tool, but I thought changing the ONNX model would be easier. I hope to find time later to make the conversion tool more general.

Algorithm engineers are strongly urged to check the operators supported by the NNIE framework before training a model!! For details, refer to the supported operator types in Section 5.3.2 of the HiSVP Development Guide and the section on the specifications supported by each operator, to avoid rework when the model cannot be converted!!
