Hello everyone, I am Jin Xiaogang from the Zhejiang University-Tencent Game Intelligent Graphics Innovation Technology Joint Laboratory. The topic I am sharing with you today is “An Automatic Skinning System Based on Heterogeneous Graph Neural Networks”. This work was completed jointly by Zhejiang University and Tencent Photon Studio Group.

In computer animation, we need a variety of characters to tell the story. Take Luca, the protagonist of “Luca”, this year’s latest animated feature from Disney and Pixar, as an example. Digital modelers first model and design these characters. Character design generally begins with artists’ sketches and clay sculptures, which are refined until each character is just right. The digital modeler then creates a virtual 3D model of the character; sometimes the physical models are digitally scanned for reference. Digital modelers must understand anatomy, because the position of bones and muscles affects the surface shape.

In computer animation, digital characters usually have to move. How do you make them move? That involves a technology called rigging. What is a rig? A digital rig is the virtual skeleton, joints, and muscles that allow a digital model to move. It is a bit like the strings on a marionette. A good rig has just the right amount of flexibility: without the right controls, animators cannot create the poses they need, but too much flexibility makes posing the model too time-consuming. Rigging is most common for characters in games and animated films. The technique simplifies the animation process and improves production efficiency: once any 3D object has been bound to a skeleton, we can control it and change its shape as needed. In the digital entertainment industry, skeletal rigging is almost the standard way of animating characters, and it is an extremely important step. Achieving smooth and complex animation depends entirely on the quality of the rigging stage in the animation pipeline.

A rigger, or character rigging artist, starts with a virtual 3D model of a character. Riggers work out how a character needs to move based on the story, like Randall in Monsters, Inc., who moves like a chameleon but also walks on two legs. They break those movements down into individual elements and create hundreds of control points that animators can use to create poses. In this illustration, the rigger starts with the wireframe model on the left and adds the virtual skeleton on the right so that Sulley, the main character of Monsters, Inc., can be posed.

Rigging is widely used in films and games. The pictures above show some typical examples of character rigging. On the left is rigging in film: in Kung Fu Panda, skeletal rigging is used for all the characters. On the right are examples from games: both Fortnite and Moonlight Blade use character rigging.

Rigging lets the skeleton drive the animation of the model. Its advantages are a wide range of applications, a wide range of motion, fast computation, and good results. For example, the hand skeleton on the left can be adjusted to put the hand into a variety of poses, while the skeletons on the right belong to two different models, a horse and a human, which shows that the technology can be used not only for humanoid characters but also for non-humanoid ones.

During the rigging process, the animator first creates a skeleton for the model in T-pose, shown as the second step, Rig, in the figure. After that, the skeleton is associated with the vertices of the model through a set of skin weights; this is the third step, Skin. From then on, the movement of the character’s skeleton can drive the movement of the model.
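To make the role of skin weights concrete, here is a minimal sketch of how weights can blend bone motion into vertex motion. It assumes linear blend skinning, which the talk does not name explicitly, and works in 2D for brevity; the bones, the elbow position, and the weights are all hypothetical.

```python
import math

def rotate_about(point, pivot, angle):
    """Rotate a 2D point around a pivot by `angle` radians."""
    px, py = point[0] - pivot[0], point[1] - pivot[1]
    c, s = math.cos(angle), math.sin(angle)
    return (pivot[0] + c * px - s * py, pivot[1] + s * px + c * py)

def skin_vertex(rest_pos, weights, bone_transforms):
    """Blend the positions each bone would give the vertex, using its skin weights."""
    x = y = 0.0
    for w, transform in zip(weights, bone_transforms):
        tx, ty = transform(rest_pos)
        x += w * tx
        y += w * ty
    return (x, y)

# Two hypothetical bones: the upper arm stays fixed, the forearm bends
# 90 degrees around an elbow located at (1, 0).
upper_arm = lambda p: p
forearm = lambda p: rotate_about(p, (1.0, 0.0), math.pi / 2)
bones = [upper_arm, forearm]

# A vertex near the elbow is shared by both bones; one on the forearm follows it fully.
near_elbow = skin_vertex((1.0, 0.1), [0.5, 0.5], bones)
on_forearm = skin_vertex((2.0, 0.0), [0.0, 1.0], bones)
```

The weights of each vertex sum to one, so a vertex influenced by a single bone moves rigidly with it, while a vertex near a joint lands between the positions the two bones would give it.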

Skeletal rigging has a wide range of applications, not only in animation and games but also in social media, virtual reality, virtual hosts, intelligent manufacturing, and any other field that requires avatars. Tencent NExT Studios’ Matt AI project, for example, also uses skeletal rigging.

Back to the core content of the paper. Let me first talk about our motivation. Determining the skin weights is the most time-consuming part of skeletal skinning. To achieve fine control of the bones over the model, animators usually fine-tune the skin weights by hand, and even for a professional animator, getting a satisfactory skinning result takes a lot of time. Our first motivation was therefore to make use of the animators’ experience. The second motivation is that, in practice, we found that different bones influence model regions of different sizes. For example, the bones in a character’s spine affect torso vertices that are far away from them, while the bones in the fingers only affect vertices within a very close range. We therefore need an automatic skinning algorithm that takes into account the differences between bones as well as between vertices. The third motivation is that, in some complex models, some bones may be placed outside the model, such as the bones shown in blue in the sample model, and some models may even consist of multiple disconnected components, such as the arms and legs of the model on the right. As a result, we need a robust algorithm that can deal with these general cases.

In the rest of the talk, I will first introduce related work, then our method and experiments, and finally some discussion.

Researchers have explored many methods for automatic skinning. These methods can be roughly divided into geometry-based methods and data-driven methods. Geometry-based methods, shown in the upper part of the slide, such as Pinocchio and voxel geodesic distance, use specific mathematical equations to calculate skin weights. The disadvantage of these methods is that they make artificial assumptions about the distribution of skin weights, when in fact the distribution does not necessarily conform to these equations. Data-driven methods, shown in the lower half of the slide, learn from a set of models skinned by animators, and so obtain anatomical information about the models that overcomes the shortcoming above. RigNet, for example, uses a graph neural network as the skin-weight function. All of the above methods regard skin weights as a function of certain features of the model vertices, and they have the following limitations. Although RigNet can process any model, it does not take the differences between bones into account. NeuroSkinning learns different skin-weight functions for different bones, but it can only be used with a fixed human skeleton structure, not with non-human skeletons.

Another related field is graph neural networks, whose application to computer graphics has become a popular direction in recent years. Graph neural networks fall into two categories. Spatial networks, such as MeshCNN, can process graphs with different topologies. Spectral networks, such as AutoEncoder for Mesh, can only process graphs with a single topology. Like MeshCNN, our network can be applied to mesh models with different structures. On the other hand, heterogeneity, that is, graphs with different types of nodes and edges, is a hot direction in graph neural networks. However, researchers have made few attempts in this direction so far.

Our automatic skinning method makes two key contributions. First, we build a heterogeneous graph neural network that considers not only the features of the different model vertices but also the features of the different bones. In this network, we use a new graph network operation to relate nodes of different types. Second, we propose a distance function, HollowDist, to measure the relationship between the model vertices and the skeleton. This distance function robustly handles bones outside the body as well as meshes composed of disconnected parts.

Let’s briefly outline our approach. Given a character, we have its mesh model and its skeleton, both of which can be viewed as graphs.

Then, for each pair of a mesh vertex and a bone in the skeleton, we calculate the HollowDist between them. This distance function is used to connect the mesh graph and the skeleton graph into a heterogeneous graph.

Our network performs a series of graph convolution operations on this heterogeneous graph, including convolutions within each of the two homogeneous graphs (the mesh graph and the skeleton graph) and convolutions between them, and finally infers the skin weight for each pair of vertex and bone.

Now let me describe the method in detail, starting with how HollowDist is computed. This function calculates the distance from a vertex on the model mesh to a bone in the character skeleton. We want to find a path from the bone to the mesh vertex that does not pass through the mesh, and then use the length of that path as the distance from the bone to the vertex. So that the shape of the mesh is taken into account as well, we take inspiration from the computation of voxel geodesic distance.

First, in the upper part of the slide, we voxelize the surface of the model mesh. This divides the space into three kinds of voxels: mesh voxels, which are marked by bold borders in the figure; bone voxels, which are white in the figure; and the remaining empty voxels, which are shown in gray. Note that we voxelize the surface of the model rather than its interior, so that the positional relationship between the skeleton and the mesh can be ignored. In other words, bones outside the model can be processed robustly.

Then, in the lower part of the slide, we use breadth-first search (BFS) to find paths from the bone voxels to the mesh voxels. This process is shown in the figure on the right, where the shade of each voxel represents its distance from the bone at the lower left corner.

In the previous step, the breadth-first search may fail to reach some mesh voxels, because the model mesh consists of disconnected parts. In that case, we restart the breadth-first search from the boundary of the mesh voxels that have already been traversed, and repeat until all mesh voxels have been traversed. Finally, in the lower part of the slide, based on the voxel distances calculated in the previous step, we can calculate the distance between the bones and the vertices. The formula is shown on the slide.
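The search with restarts can be sketched on a small 2D grid. This is a simplified illustration under stated assumptions, not the authors’ implementation: the grid encoding (`#` for mesh surface cells, `B` for bone cells, `.` for empty cells), the function name `hollow_dist_grid`, and the rule that a restart continues from the distances already assigned to reached surface cells are all assumptions. A priority queue is used so that the restart, which begins from cells at different distances, still yields shortest step counts; with unit steps it visits cells in the same order as a breadth-first search.

```python
import heapq

def hollow_dist_grid(grid):
    """Return {(row, col): steps from the nearest bone cell} for every mesh cell."""
    rows, cols = len(grid), len(grid[0])
    mesh = {(r, c) for r in range(rows) for c in range(cols) if grid[r][c] == "#"}
    dist = {(r, c): 0 for r in range(rows) for c in range(cols) if grid[r][c] == "B"}
    heap = [(0, cell) for cell in dist]
    heapq.heapify(heap)
    frontier = []  # surface cells reached in this pass; paths may end on them, not cross them
    while True:
        while heap:
            d, (r, c) = heapq.heappop(heap)
            for cell in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if not (0 <= cell[0] < rows and 0 <= cell[1] < cols) or cell in dist:
                    continue
                dist[cell] = d + 1
                if cell in mesh:
                    frontier.append(cell)
                else:
                    heapq.heappush(heap, (d + 1, cell))
        if mesh.issubset(dist.keys()) or not frontier:
            break
        # Some surface cells are still unreached: restart from the reached surface cells.
        heap = [(dist[cell], cell) for cell in frontier]
        heapq.heapify(heap)
        frontier = []
    return {cell: dist[cell] for cell in mesh if cell in dist}

# A bone enclosed by a closed surface, plus a disconnected surface cell to the right.
grid = ["#####....",
        "#.B.#..#.",
        "#####...."]
distances = hollow_dist_grid(grid)
```

In this example the first pass only reaches the inside of the closed shell; the corner cells and the separate cell on the right are reached after a restart.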

Next, we build the graph (upper part of the slide). Given a model mesh and a skeleton, we build a graph for each of them. In the mesh graph Gm, each node corresponds to a mesh vertex and each edge to an edge of the mesh. The skeleton graph Gs is built slightly differently from the skeleton itself: its nodes represent the bones of the skeleton, and its edges represent the joints that connect the bones. After constructing the mesh graph and the skeleton graph, we use the computed HollowDist to connect the two graphs into a heterogeneous graph. Based on HollowDist, we connect each vertex node to its K nearest bone nodes. In other words, we assume that a vertex is only affected by its K nearest bones. We then attach a set of attributes to the graph as the input of the neural network (lower part of the slide). For a node in the mesh graph, the attributes are the position of the vertex it represents and the reciprocals of the HollowDist to its K nearest bones. For a node in the skeleton graph, the attributes are the positions of the joints at the two ends of the bone.
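The cross-graph connection and the node attributes described above can be sketched as follows. The function name `build_heterogeneous_graph`, the data layout, and the small `eps` guard against division by zero are assumptions made for illustration; the distance matrix is taken as given.

```python
def build_heterogeneous_graph(vertex_pos, bone_joints, hollow_dist, k, eps=1e-6):
    """Link each vertex to its k nearest bones and assemble node attributes.

    vertex_pos:  list of (x, y, z) vertex positions
    bone_joints: list of (head, tail) joint positions, one pair per bone
    hollow_dist: hollow_dist[v][b] is the distance from vertex v to bone b
    """
    cross_edges, vertex_attrs = [], []
    for v, (pos, row) in enumerate(zip(vertex_pos, hollow_dist)):
        # Indices of the k bones with the smallest distance to this vertex.
        nearest = sorted(range(len(row)), key=lambda b: row[b])[:k]
        cross_edges.extend((v, b) for b in nearest)
        # Vertex attributes: position plus reciprocal distances to those bones.
        vertex_attrs.append(list(pos) + [1.0 / max(row[b], eps) for b in nearest])
    # Bone attributes: positions of the joints at both ends of the bone.
    bone_attrs = [list(head) + list(tail) for head, tail in bone_joints]
    return cross_edges, vertex_attrs, bone_attrs

vertex_pos = [(0, 0, 0), (1, 0, 0)]
bone_joints = [((0, 0, 0), (0, 1, 0)), ((0, 1, 0), (0, 2, 0)), ((0, 2, 0), (1, 2, 0))]
hollow_dist = [[2.0, 4.0, 1.0], [5.0, 0.5, 8.0]]
edges, v_attrs, b_attrs = build_heterogeneous_graph(vertex_pos, bone_joints, hollow_dist, k=2)
```

Because every vertex keeps exactly K bones, each vertex attribute vector has the same length regardless of how many bones the skeleton contains.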

Our heterogeneous graph neural network operates on the heterogeneous graph just constructed, and its structure is shown in the figure above. Intra-graph Conv denotes intra-graph convolution and Inter-graph Conv denotes inter-graph convolution. Intra-graph convolution is carried out within the mesh graph and within the skeleton graph separately, while inter-graph convolution operates between the mesh graph and the skeleton graph. I will describe both operations in detail shortly. These two kinds of convolution extract the local features of the vertex nodes and the bone nodes. We also use pooling to extract global features. The global and local features are concatenated and then processed by a multilayer perceptron to obtain the final skin weights.

Graph convolution aggregates the information of a node and of the nodes of the same type connected to it. We build our graph convolution module on EdgeConv. The convolution operations in the mesh graph and the skeleton graph are slightly different. For the mesh graph (lower part of the slide), the receptive field of a node is affected by how finely the model is tessellated. To solve this problem, we use the convolution operation introduced in RigNet, which performs EdgeConv on a node’s geodesic neighbors and on its connected neighbors at the same time. The results are concatenated and processed by a multilayer perceptron to obtain the final result. The geodesic neighbors of a node are the nodes within a certain geodesic distance of it.

In the inter-graph convolution module, vertex nodes and bone nodes aggregate and exchange information with each other. The mesh graph and the skeleton graph are highly imbalanced: a typical mesh has thousands of vertices, while the skeleton has only dozens of bones. We therefore design different convolution operations for the two directions. From bone nodes to vertex nodes (upper part of the slide), each vertex node is connected to exactly its K nearest bones, so we simply concatenate their features, and the resulting feature has a fixed length. From vertex nodes to bone nodes (lower part of the slide), the number of vertices affected by each bone varies, so a similar operation cannot produce features of a fixed size. Instead, we extract the maximum, mean, and variance of the features of the vertex nodes affected by the bone. In both operations, the feature concatenation is followed by a multilayer perceptron to provide nonlinearity.
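The two aggregation directions can be sketched without any learned layers. This is only an illustration of the pooling step under stated assumptions: the function names and data layout are hypothetical, the multilayer perceptron that follows each operation is omitted, and a bone with no linked vertices is given a zero vector here, which the talk does not specify.

```python
def bone_to_vertex(vertex_bones, bone_feats):
    """Concatenate the features of each vertex's K bones (fixed length K * F)."""
    return [[x for b in bones for x in bone_feats[b]] for bones in vertex_bones]

def vertex_to_bone(vertex_bones, vertex_feats, num_bones):
    """Pool the features of the vertices linked to each bone: max, mean, variance."""
    dim = len(vertex_feats[0])
    members = [[] for _ in range(num_bones)]
    for v, bones in enumerate(vertex_bones):
        for b in bones:
            members[b].append(vertex_feats[v])
    pooled = []
    for rows in members:
        if not rows:
            pooled.append([0.0] * (3 * dim))  # assumption: zeros for a bone with no vertices
            continue
        cols = list(zip(*rows))  # one tuple per feature channel
        mx = [max(col) for col in cols]
        mean = [sum(col) / len(col) for col in cols]
        var = [sum((x - m) ** 2 for x in col) / len(col) for col, m in zip(cols, mean)]
        pooled.append(mx + mean + var)
    return pooled

vertex_bones = [[0, 1], [0, 1], [1, 2]]          # K = 2 nearest bones per vertex
vertex_feats = [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]
bone_feats = [[10.0], [20.0], [30.0]]
to_vertices = bone_to_vertex(vertex_bones, bone_feats)
to_bones = vertex_to_bone(vertex_bones, vertex_feats, num_bones=3)
```

Whatever the number of vertices attached to a bone, the pooled vector always has three entries per feature channel, which is what makes the vertex-to-bone direction fixed-size.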

To conclude the method, let me introduce the design of the loss function, which consists of two terms. The first is the data fitting term, which makes the predicted result close to the ground truth. For each vertex, the skin weights over all bones are positive and sum to one, so they can be treated as a probability distribution. Instead of a simple L2 distance, we therefore use KL divergence to compute this term. In addition, we want the skin weights to be smooth over the mesh surface, so that the skeletal animation of the model looks smooth. So we add a smoothing term, which we compute using a Laplacian matrix.
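A two-term loss of this kind can be sketched in plain Python. The talk does not give the exact formulas, so several choices here are assumptions: the direction of the KL divergence (ground truth relative to prediction), the uniform graph Laplacian built from vertex neighbors, the squared penalty, and the `smooth_weight` balance factor.

```python
import math

def kl_divergence(target, predicted, eps=1e-8):
    """KL(target || predicted) for one vertex's weight distribution over bones."""
    return sum(t * math.log((t + eps) / (p + eps))
               for t, p in zip(target, predicted) if t > 0)

def laplacian_smoothness(weights, neighbors):
    """Squared difference between each vertex's weights and its neighbors' average."""
    total = 0.0
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            continue
        for b in range(len(weights[i])):
            avg = sum(weights[j][b] for j in nbrs) / len(nbrs)
            total += (weights[i][b] - avg) ** 2
    return total

def skinning_loss(target, predicted, neighbors, smooth_weight=0.1):
    """Data fitting term (mean KL over vertices) plus a weighted smoothing term."""
    data_term = sum(kl_divergence(t, p) for t, p in zip(target, predicted)) / len(target)
    return data_term + smooth_weight * laplacian_smoothness(predicted, neighbors)

# Three vertices on a chain, two bones.
weights = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
neighbors = [[1], [0, 2], [1]]
smooth = laplacian_smoothness(weights, neighbors)
perfect = skinning_loss(weights, weights, neighbors, smooth_weight=0.0)
```

The data term vanishes when the prediction matches the ground truth, and the smoothing term vanishes when every vertex’s weights equal the average of its neighbors’ weights.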

The dataset we used is ModelsResource-RigNetv1, which contains a range of different types of models, such as humanoids, birds, and fish, with completely different topologies and skeletons. Some models even have bones outside the body and disconnected parts, but we can handle all of these models.

Next, we show the results of our method. We use hand-made skeletal animations to drive the skinned models, and you can see that our method produces results that are very close to the ground truth.

Here is a comparison with other methods. First, the qualitative results (upper part of the slide). In the upper row, our method handles end parts such as the hem of the skirt and the tail of the horse better. This is because our method extracts features specific to each bone, so the bones of the body and the bones of the horsetail and skirt get different skin-weight functions. In the lower row, our method has a smaller error in the joint regions, because it learns the relationships between bones, which previous methods cannot do. In terms of quantitative results (lower part of the slide), our method is superior to the other methods on every metric. Our precision is 83.04%, recall is 81.11%, L1-norm is 0.3269, and distance error is 0.005682, all of which are better than the current state-of-the-art method.

Our method is robust to bones outside the body and to models composed of disconnected parts. The bones outside the body are highlighted in red in the image above. HollowDist can handle bones both inside and outside the body, and it is also robust to disconnected parts.

At the end of the experiments, we performed some ablation studies to verify that every part of the network is effective. The quantitative results are presented at the top, and the results with and without the smoothing term in the loss function are shown at the bottom. You can see that the smoothing term smooths the distribution of skin weights over the surface of the model mesh.

We integrated this approach into Maya as a plug-in: after importing a mesh model and a skeleton, the plug-in skins them automatically. The plug-in also lets the user select which bones to skin, making it easy to add auxiliary bones. For a model with 5000 vertices and 50 bones, our method takes only about a minute to complete the processing.

Of course, our current method has some limitations. First, because rigging data for complex models is scarce, the models processed in the paper are not very complex. In the future we will test our method on more complex models, since the models used in the animation industry can be more complex than those in our experimental dataset. Second, we hope to speed up the computation of HollowDist, which is the time bottleneck of the whole pipeline. Finally, we hope to explore more features relating model meshes and model skeletons.

OK, let’s finish with a brief summary of what we’ve done. First, we propose a heterogeneous graph neural network method to automatically estimate the skin weights for character rigging. Our inter-graph convolution operation allows feature aggregation between heterogeneous nodes, so our network can extract both vertex and bone features. Second, based on a new distance computation called HollowDist, our method can deal with models that contain multiple disjoint parts or bones outside the body. Our work was presented at this year’s ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games and received very positive reviews. We believe that our work has taken a major step forward in addressing an open challenge. Well, that’s all for my talk. Thank you for listening.

Q & A

Q1: As far as I can see, the models handled by this method are not very complicated. Are there any feasible solutions for complex models?

A (Jin Xiaogang): If the model has a very large number of vertices, we can simplify the complex model first and process the simplified model with our method; where the original model is denser, we obtain the values at the remaining vertices by interpolation. The problem we have at the moment is that we need rigging data for very complex models, and such a dataset is currently lacking worldwide. In the future, if the whole industry can contribute some very professional rigging results and build a very large library of them, we will be able to handle very complex models and also get better results.

Q2: Is this currently being used?

A (Jin Xiaogang): As I just mentioned, our method has been developed into a Maya plug-in, which is now being used at Tencent Photon Studio. The method helps animators reduce the time spent on rigging and improves their production efficiency.

Q3: After using this method, do the artists fully accept the automated results?

A (Jin Xiaogang): The results generated by our method are in fact very close to those produced by artists. In practice, artists may still feel that some parts need adjusting, but the corresponding workload is small. Our results can therefore serve as a good initial value, so artists spend less time on adjustment and can reach the rigging result they want very quickly.