Hello everyone, I am Sun Danlu, a technical artist from Epic Games China. Today I will share with you an analysis of the principles and pipeline behind MetaHuman.

Digital humans are a very hot topic right now; games, film and television, livestreaming, industrial applications, advertising, and other fields all have strong demand for them. However, producing a traditional digital human is very expensive in terms of equipment, manpower, and time, and every step of the pipeline, such as character scanning, rigging, motion capture, and facial capture, requires a certain technical threshold. That is why, in most cases on the market, such as the AAA games you play, only a limited number of protagonists get the full, high-fidelity digital human treatment due to cost, while the remaining characters are produced at much lower fidelity.

Based on this, we provide a solution called MetaHuman Creator, a tool for quickly and easily creating unique, high-fidelity digital humans. You can directly manipulate facial features, adjust skin tone, and choose from a range of preset body types, hairstyles, clothing, and so on; you can even edit a character's teeth. Once created, the character is fully rigged and can be animated directly in Unreal Engine or Maya.

To achieve zero-threshold, what-you-see-is-what-you-get digital human creation, we provide three tools in the Epic ecosystem that work together: create a digital human quickly online with MetaHuman Creator (MHC), export it with Quixel Bridge, and apply it to games or films in Unreal Engine.

Let me show you how to create a character in MHC, which runs over the network and interacts with the web page through Pixel Streaming. In this video demonstration, I create a new digital character in about four minutes. As you can see, the operation is very intuitive and easy to understand, and the learning threshold is very low. For facial editing, face blending offers three modes.

The first is Blend mode, the coarsest-grained level of facial editing. Using the blend circle in the upper left, you add three to six preset characters as blend sources; for each control point you manipulate, the corresponding feature points of those presets are sampled and interpolated.
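
As an illustration of the interpolation just described, here is a minimal sketch of blending one feature point from several presets with normalized weights. The data layout and names are my own assumptions for the example, not MHC internals.

```cpp
// Conceptual sketch of Blend mode: one facial feature point is interpolated
// from the same feature point on several preset characters.
#include <cstddef>
#include <vector>

struct FeaturePoint { float x, y, z; };

// Blend one feature point from N presets using normalized weights
// (e.g. derived from the position inside the blend circle).
FeaturePoint BlendFeaturePoint(const std::vector<FeaturePoint>& presets,
                               const std::vector<float>& weights)
{
    FeaturePoint result{0.f, 0.f, 0.f};
    for (std::size_t i = 0; i < presets.size() && i < weights.size(); ++i)
    {
        result.x += presets[i].x * weights[i];
        result.y += presets[i].y * weights[i];
        result.z += presets[i].z * weights[i];
    }
    return result;
}

int main()
{
    // Two presets blended 70/30 for one control point, e.g. the nose tip.
    std::vector<FeaturePoint> presets = {{0.f, 1.f, 2.f}, {1.f, 1.f, 1.f}};
    std::vector<float> weights = {0.7f, 0.3f};
    FeaturePoint blended = BlendFeaturePoint(presets, weights);
    (void)blended;
}
```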

The second is Move mode, which divides the feature points by facial feature, so the eyes, nose, mouth, and other features can be modified quickly.

The third is the Sculpt tool, the most fine-grained tool Creator provides; details can be adjusted down to the smallest feature points, such as the tip of the nose. Note that sculpting in Creator differs from sculpting in ZBrush: rather than freely deforming geometry, it indexes a set of facial features in the gene pool and blends them. Another common misconception is that the Move and Sculpt tools depend on the number of preset characters in the upper-left corner; in fact they do not. Move and Sculpt index across the entire gene pool and do not rely on the presets in the blend circle.

In addition, the Move and Sculpt tools correspond to each other. For example, the eyebrow-arch control in Move mode corresponds to a group of feature points in the Sculpt tool, such as the brow head, brow peak, and brow tail.

Skin color and texture detail can also be adjusted in Creator. The skin texture option not only adds surface detail to the face, it also affects the face shape; wrinkles, for example, actually deform the facial geometry.

Creator also provides fairly flexible makeup settings, such as foundation, eye shadow and eyeliner for different regions, blush, and lipstick, which further reduces the need and cost for users to modify the digital character afterwards.

For the body, MHC provides 18 preset body types to choose from. The head currently does not change automatically with the chosen height or body type, so sometimes the result can look a little odd after you pick a body; for that we provide a head scaling feature.

Head scaling is not a simple uniform scale of the head; it follows human proportions more closely. You can see that when you scale the head, the neck does not keep getting thicker, so the scaling handles the connection between head, neck, and body better.

Now let's look at the technology behind MHC. First, we 4D-scan a large number of real human faces and process the data with a combination of manual work and machine learning, extracting the feature-point information for each character and storing it in a data format called DNA. DNA describes a character's appearance, skeleton, and rig. All of this information is stored in the GenePool, which is our database. Every time you sculpt a face, you are actually indexing into the GenePool, blending data from it, and ultimately generating a unique DNA file for your digital human.

For body shapes, we use bodies made with the traditional pipeline. The main reason is that we do not yet have a sufficiently complete body database, so unlike the head, the body cannot be created by blending different feature points. We also do not currently offer arbitrary body-shape adjustment via scaling, mainly because the overall result does not meet our quality bar: after locally adjusting proportions, the seam between the body and the neck is hard to handle. Clothing is another big problem; it is difficult to make the fabric fit a body that has been locally rescaled.

For textures, the skin textures are generated by texture synthesis from the scan data to achieve a more realistic result. The scanned textures are separated into low-, medium-, and high-frequency detail textures, which are used for the custom blending in Creator; textures for other areas are produced with a more traditional pipeline.

In addition, MHC is a cloud-based application; we run MHC in the cloud. The main reason is that the data mentioned above, especially the facial GenePool and the texture data, is extremely large and will keep growing, and the data volume and computing load would be hard to handle on a single PC. Running in the cloud also has the benefit of fast iteration: many issues and improvements can be fixed directly on the backend without requiring every user to update frequently. To deliver MHC from the cloud to the browser, we use Pixel Streaming, a streaming technology based on Unreal Engine: the Unreal Engine application runs on a cloud server, and the rendered frames and audio are streamed to browsers and mobile devices through WebRTC. You may not have realized that using Pixel Streaming means Unreal Engine is running on the backend, so the digital human you build in the cloud is made in Unreal Engine and then exported locally to be used in Unreal Engine; all the parameters, the hair models, the fabric and skin textures, and the LOD settings match one to one. This is truly what you see is what you get.

After finishing in MHC, you can download the digital character you made through Bridge. Open Bridge and you will see a MetaHumans section, which updates in real time with the MetaHumans you have created. In Bridge, you configure the texture resolution you want to download. When you click Download, an automatic step generates the textures and LODs based on the data set during creation, such as the skin, and the parameters chosen for the download. Because Unreal Engine runs on the backend, the LODs for the meshes and skeletons are generated according to rules. Generating a MetaHuman's assets can take some time, depending on which assets are selected, their texture resolution, and your current network speed, as well as how many assets are queued in Bridge. On average, downloading a MetaHuman with 1K textures takes about 20 minutes to generate and download.

The texture resolution selected at download time is not an absolute resolution but a guide resolution. If you choose to download at 8K, only the normal and cavity textures that contribute the most are actually 8K, while the skin albedo stays at 2K. The scattering of SSS skin acts roughly like a low-frequency filter, similar to a blur on the texture, so the high-frequency details of the albedo are smoothed out by the SSS shading. We evaluated albedo's contribution to quality versus cost and found that 8K or 4K albedo is almost indistinguishable from 2K to the naked eye, so 2K is used here. Roughness is derived from the normal map through a built-in process in Unreal. Bridge can export to both Maya and Unreal Engine. Next, let's analyze MetaHuman inside Unreal Engine.
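
The talk only says that Unreal derives roughness from the normal map with a built-in process. As a hedged illustration of the general idea behind such conversions, here is a minimal Toksvig-style sketch that turns normal-map variance into extra roughness; it is not necessarily the exact conversion Unreal performs.

```cpp
// Conceptual sketch: derive additional roughness from normal-map variance
// (Toksvig-style). Illustration only; Unreal's built-in conversion may differ.
#include <cmath>
#include <cstdio>

// avgNormalLength is |average of unit normals| over a mip footprint (1.0 = flat).
// Shorter average normals mean more normal variation, i.e. a rougher appearance.
float RoughnessFromNormalVariance(float baseRoughness, float avgNormalLength)
{
    const float len      = std::fmax(avgNormalLength, 1e-4f);
    const float variance = (1.0f - len) / len;            // Toksvig variance estimate
    const float alpha2   = baseRoughness * baseRoughness + variance;
    return std::sqrt(std::fmin(alpha2, 1.0f));            // back to a roughness value
}

int main()
{
    std::printf("flat area  -> %.3f\n", RoughnessFromNormalVariance(0.4f, 1.00f));
    std::printf("bumpy area -> %.3f\n", RoughnessFromNormalVariance(0.4f, 0.95f));
}
```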

When you import it into the engine, there is a dedicated MetaHumans folder. Assets that are common to all generated MetaHuman characters, male or female, are stored in the MetaHumans/Common folder, while each character's unique assets are stored in a folder with that character's name.

Inside the MetaHumans folder, all the parts of a MetaHuman's body are assembled in one Blueprint, including the torso, the face, and the various kinds of facial hair. To balance performance and quality and to adapt to different platforms, both the torso and the hair have their own LOD information.

For example, we set eight LOD levels for the head, with specific rules for each level. Each LOD level differs in vertex count, blendshapes, joint count, animated textures, skin influences, and so on. Blendshapes are dropped from LOD1 onward because of their high performance overhead, and the animated textures are dropped from LOD2 onward because of the number of textures that must be read. Of course, if you look at these heads full screen, the result looks a bit rough from LOD3 or LOD4 onward, but LOD choices must be judged against screen coverage.
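
To make the idea of per-LOD rules and screen-coverage-driven switching concrete, here is a minimal sketch. The thresholds, flags, and LOD indices are illustrative placeholders I made up for the example, not the shipped MetaHuman settings.

```cpp
// Conceptual sketch: per-LOD feature rules plus selection by screen coverage.
#include <array>
#include <cstdio>

struct FaceLODRule {
    int   lodIndex;
    bool  useBlendshapes;      // only the highest-detail LOD keeps morph targets
    bool  useAnimatedTextures; // expression/wrinkle textures dropped at lower LODs
    float minScreenCoverage;   // use this LOD while coverage >= threshold
};

constexpr std::array<FaceLODRule, 4> kRules{{
    {0, true,  true,  0.40f},
    {1, false, true,  0.25f},
    {3, false, false, 0.10f},
    {7, false, false, 0.00f},
}};

int SelectFaceLOD(float screenCoverage)
{
    for (const auto& rule : kRules)
        if (screenCoverage >= rule.minScreenCoverage)
            return rule.lodIndex;
    return kRules.back().lodIndex;
}

int main()
{
    std::printf("close-up  -> LOD%d\n", SelectFaceLOD(0.50f)); // LOD0
    std::printf("mid shot  -> LOD%d\n", SelectFaceLOD(0.20f)); // LOD3
    std::printf("far away  -> LOD%d\n", SelectFaceLOD(0.05f)); // LOD7
}
```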

For example, I took these three screenshots at the screen coverage where the corresponding LODs kick in: LOD0 at the closest distance, LOD3, which you switch to when the whole body is visible at mid distance, and LOD7 at long distance.

At the same distances I forced LOD0 rendering as a comparison, drawing the LOD0 head and body next to the corresponding LOD3 and LOD7 models. You can see the difference between the two is not particularly big; in other words, at those distances and screen proportions, LOD switching does not hurt the image much. And LOD is not only a performance win, it can also improve the image in some cases, for example the common moiré artifacts, which the introduction of LODs helps reduce.

This is the LOD data for the body model. The body has only four LODs. Take the pants in the picture as an example: LOD0 has about 15,000 faces, while the lowest LOD has about 1,500.

For the skeleton, LODs on the skeletal mesh greatly reduce the skinning cost, but we also need LODs on the skeleton itself, which greatly reduce the cost of sampling, interpolation, and blending during animation evaluation. So the body skeleton has 150 bones at LOD0 and 55 bones at the lowest LOD.

You may have noticed that different parts can have different LOD counts: the head has eight LOD levels while the body has only four. On top of that, the body and the head do not occupy the same screen proportion, so the LODs they compute independently will be inconsistent. That can lead to mismatches, such as visible seams where the body and head meet, or, taking hair as an example, the eyebrow groom, whose bounding box covers only a tiny part of the screen, might never reach LOD0, while a groom intended only for close-ups, whose bounding box spans the whole face, would still render at LOD0 at mid range, which is a waste of performance. To solve this, the engine adds the LODSync component. By configuring the LODSync component with each part's contribution and calculation method, the different parts can share a single LOD decision.
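
Below is a minimal conceptual sketch of that syncing idea: some parts drive a shared LOD decision and the others passively follow it through a per-part remap. The engine's actual LODSync component has its own API and options, so treat this purely as an illustration of the mechanism.

```cpp
// Conceptual sketch of LOD syncing: "drive" parts vote for a shared LOD, and
// "passive" parts follow it through a remap (e.g. head has 8 LODs, body only 4).
#include <algorithm>
#include <string>
#include <vector>

enum class SyncRole { Drive, Passive };

struct SyncedPart {
    std::string      name;
    SyncRole         role;
    std::vector<int> remap;       // shared LOD index -> this part's LOD index
    int              desiredLOD;  // LOD this part would pick on its own
};

int ResolveSharedLOD(const std::vector<SyncedPart>& parts)
{
    int shared = 0;
    for (const auto& p : parts)
        if (p.role == SyncRole::Drive)
            shared = std::max(shared, p.desiredLOD); // coarsest driver wins
    return shared;
}

int main()
{
    std::vector<SyncedPart> parts = {
        {"Face",     SyncRole::Drive,   {0, 1, 2, 3, 4, 5, 6, 7}, 2},
        {"Body",     SyncRole::Passive, {0, 0, 1, 1, 2, 2, 3, 3}, 0},
        {"Eyebrows", SyncRole::Passive, {0, 1, 2, 3, 4, 5, 6, 7}, 5},
    };
    const int shared = ResolveSharedLOD(parts); // -> 2
    for (const auto& p : parts)
        (void)p.remap[shared];                  // each part uses its remapped LOD
}
```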

Due to time constraints, I cannot analyze every detail of MetaHuman in the engine, so I will distinguish between the relatively static and the dynamic assets and introduce the following major parts, with scalability in mind.

The first is hair, which includes the hair itself, eyebrows, facial fuzz, eyelashes, and beards.

The hair uses the engine's Groom hair system. I won't go into Groom itself, but I will show how Groom is used in MetaHuman and what is worth noting.

All the LODs of the groom hair, whether cards or meshes, use the same parent material.

The hair material parameters correspond one-to-one with the MHC parameters; you can think of the material as the expanded version of MHC. For example, in the material you can see red pigment and melanin, and if you look at how MHC adjusts hair color, those are exactly the parameters and values exposed in MHC.

Additionally, the groom hair's per-strand parameters are passed into the hair material through Hair Attributes.

Depth and other attributes are added to the card hair through the Groom asset: the Cards tab specifies the corresponding textures. For example, the depth texture is used for the pixel depth offset of the card hair, and the attribute texture supplies Root UV and Seed. Seed is the unique ID of each strand and is generally used in the material to enhance the strand-by-strand look.

As you may know, we plan to bring more shading models to mobile, but they are not supported there yet, so for now hair can only fall back to Default Lit on mobile.

Hair shading is somewhat special: its look mainly comes from two things, the anisotropy of the strands and the transmission and scattering between the many layers of hair. To improve the result on mobile, the hair material adds a separate anisotropic highlight for the mobile path, which is the section shown in this screenshot. As for the transmission and scattering, on mobile only the first LOD level uses card hair and the rest are mesh hair, and it is difficult to show the subtle transmission and scattering between strands on mesh hair, so after weighing the options we did not handle that part on mobile.

The shading setup does not hard-code a single shading model for the whole material; instead the shading model is an input parameter, and different material attributes are fed in at different feature levels. In this way one material implementation can serve different platforms and render with different shading models.

The multiple hair LOD levels, strands, cards, and meshes, are combined through the LOD system inside the Groom asset, and these LODs can be managed uniformly on its LOD page.

This is the LOD data for MetaHuman hair. LOD 0-1 uses strand-based hair, generally for high-end platforms such as next-generation consoles and high-end PCs; LOD 2-4 uses card-based hair; and LOD 5-7 uses simplified mesh hair.

This is the LOD display of one character's hair in MetaHuman. You can see that some of the lower levels actually look sparse because of the reduced face count.

One solution is to use a mask texture on the exposed scalp, and the engine provides a tool for that: right-click the Groom asset and you can generate a hair follicle texture. Also, for very short hair, it is not really worth creating cards or meshes for the lower LODs; you can likewise use the Groom asset to generate a hair texture, which we will come back to when we talk about skin.

The skin and hair setups are somewhat similar: for example, there is only one material asset, and its shading attributes map one-to-one to MHC. On mid and high-end devices the skin uses the Burley SSS shading model; mobile automatically falls back to Default Lit because that shading model is not available there.

The special part of the skin textures is a set of facial animation textures, one base pose plus three facial poses. Blending these textures in during facial expressions improves the overall detail.

The facial animation textures mainly describe how the skin looks when the expression is extreme and the wrinkles are deep. For example, the left picture is the base pose, and the right picture is the result when the eyebrows and forehead are raised hard.

As you can see, the expression textures require a large number of texture reads, which puts some pressure on bandwidth and rendering, so they are not used on mobile.

For the skin, besides the skin itself and the makeup information, sometimes we also need to paint hair follicles or hair tufts. For example, the character in the demo has very short hair; instead of making card or mesh hair, the short hair tufts are painted onto the scalp area of the skin texture.

This scalp texture, which paints in the short hair and the follicles, is the hair tuft texture generated from the Groom asset mentioned earlier.

The groom hair and skin described above are all existing mechanisms in the engine; if you have used Unreal Engine you will already be familiar with them. Let's focus on facial expressions.

To achieve believable expressions for digital characters, MetaHuman uses a hybrid scheme of skeletal deformation, blendshapes, and expression textures applied in the material. The expression itself mainly comes from skeletal deformation, blendshapes supplement the subtle muscle details, and the expression textures mainly enhance skin wrinkles. As you may remember, the MetaHuman face has more than 600 joints at LOD0. At the same time, keeping the different LODs looking basically the same while controlling these three sets of parameters is very complicated, so we use the Rig Logic system from 3Lateral. Rig Logic is a very complex rule set that can drive thousands of blendshapes, joints, and facial textures from simple inputs. Running Rig Logic requires some additional information about the model, such as its feature data and rigging, which is provided by the per-character DNA file mentioned at the beginning. The DNA file describes a character's appearance and skeletal rig; based on it, the bones, skinning, and faces of different MetaHumans all differ, yet a single set of Rig Logic driving data can drive any MetaHuman's expressions without re-authoring each character.

If you open the head mesh in the engine, you can see a DNA Asset specified in the Details panel. This file is generated from your sculpting data when you create the character; the DNA file has a one-to-one relationship with the MetaHuman you made. In addition, if you export the MetaHuman to Maya, you will also be prompted to install Rig Logic, including specifying this DNA.

Rig Logic drives expressions through Rig Logic expression curves, and each expression curve corresponds to a group of joints, blendshapes, and expression textures.
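
Here is a heavily simplified sketch of that structure, just to show one input curve fanning out to joints, blendshapes, and texture masks. All names and mappings are hypothetical; the real Rig Logic rule set is far more complex and is driven by the data in the DNA file.

```cpp
// Heavily simplified sketch: one expression curve fans out to joint offsets,
// blendshape weights, and wrinkle-mask weights. Names are hypothetical.
#include <map>
#include <string>
#include <utility>
#include <vector>

struct ExpressionCurveTargets {
    std::vector<std::pair<std::string, float>> jointScales;       // joint -> per-unit offset
    std::vector<std::pair<std::string, float>> blendshapeScales;  // shape -> per-unit weight
    std::vector<std::pair<std::string, float>> wrinkleMaskScales; // texture mask -> per-unit weight
};

struct FacePose {
    std::map<std::string, float> jointOffsets;
    std::map<std::string, float> blendshapeWeights;
    std::map<std::string, float> wrinkleMaskWeights;
};

// Evaluate one curve (e.g. a brow-raise curve at 0.7) into the three output channels.
void EvaluateCurve(const ExpressionCurveTargets& targets, float curveValue, FacePose& out)
{
    for (const auto& [joint, scale] : targets.jointScales)
        out.jointOffsets[joint] += scale * curveValue;
    for (const auto& [shape, scale] : targets.blendshapeScales)
        out.blendshapeWeights[shape] += scale * curveValue;
    for (const auto& [mask, scale] : targets.wrinkleMaskScales)
        out.wrinkleMaskWeights[mask] += scale * curveValue;
}

int main()
{
    ExpressionCurveTargets browRaise;
    browRaise.jointScales       = {{"brow_raise_l_jnt", 1.0f}};
    browRaise.blendshapeScales  = {{"brow_raise_inner_l", 1.0f}};
    browRaise.wrinkleMaskScales = {{"forehead_wrinkle_mask", 1.0f}};

    FacePose pose;
    EvaluateCurve(browRaise, 0.7f, pose); // keying the curve drives all three channels
}
```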

For example, I key a breathing idle through a Sequence; from the keyed data, Rig Logic expression curves are generated, which in turn drive the joints, the blendshapes, and the expression textures. The curves can also be adjusted afterwards in the animation: you can see that when I change a curve, the model changes with it, and the Rig Logic expression curves can even be adjusted at runtime.

As mentioned above, facial expressions are composed of joints, blendshapes, and facial textures. For a basic expression, the subtle muscle changes are contributed by the blendshapes on top of the joints. Here is a simple comparison of expression quality: from left to right they are LOD0, 1, 2, and 4. Beyond the influence of the reduced face count and texture resolution, in terms of the expression itself, the leftmost image contains all the information, joints, blendshapes, and textures; the second removes the blendshape contribution; and the third also removes the texture contribution. By comparison, the difference between the second and the first is not that big, only small changes around the eyes, whereas the difference between the third and the second is quite large: her crow's feet disappear. In other words, for final quality, the expression textures usually contribute more than the blendshapes. The fourth image further reduces the face count to only 1,400 faces and also has no expression textures, so the expression reads even weaker. To create expressions in the engine, whether captured or hand-keyed, you use a Control Rig that manipulates the Rig Logic expression curves. Let's take a look at the Control Rig in MetaHuman.

In MetaHuman, the Control Rig setup can be thought of as three rigging layers. The first is the puppet rig, the layer animators use to create animation and expressions through Control Rig.

Both the body and the face Control Rigs are located in the Common folder. When opened, you can see the complete Control Rig implementation: the body Control Rig is fairly conventional and drives the skeleton directly, while the face Control Rig drives the Rig Logic expression curves.

Of course, animators don't need to pay much attention to how the Control Rig is implemented; what they really need is a good set of keyframing tools. To key animation in the engine, simply drag the character Blueprint or the character into a Sequencer; because the meshes carry the Control Rig information, the rigs come with them. For the body, you can use the Editor Utility Widget shipped with MetaHuman to quickly select controls and toggle between IK and FK.

The Face Control Rig is divided into three parts: a backwards solve used for facial capture, a forwards solve used for keyframing in Sequencer, and a smaller module in the Setup Event that handles eye focus. Because facial expressions are very complex, the face Control Rig drives the Rig Logic expression curves, and Rig Logic maps those curves to blendshapes, facial joints, and animated textures based on the model's DNA asset.

For the face, we provide a preset Control Rig panel. Here is a quick demonstration: each controller corresponds to a group of muscles, and you can see the muscles around the eyes respond correctly as well.

In addition, we also provide a simplified facial Control Rig whose controls sit directly on the face, so expressions can be posed more intuitively.

If you want to use the animation in a game, then after authoring it with Control Rig in Sequencer it is better, for performance reasons, to bake the keyed data into an animation asset; the engine provides this tool by default. The body animation stores the bones' transform information, while the face animation stores the Rig Logic expression curves.

Next comes the second rigging layer of the Control Rig, the deformation rig, which is used to drive all the rotation and structural joints; a Post Process Animation Blueprint specified on the mesh runs this set of Control Rigs at runtime.

Taking the head as an example, Rig Logic runs at runtime here, where the various animation interfaces are allowed to drive the Rig Logic expression curves to produce the final facial animation.

The last rigging layer is the clothes, which are mainly handled with Control Rigs, although not everything on the clothes uses Control Rig. For example, Rigid Bodies are used for the physical simulation of the hoodie drawstrings and the hood folds at the back of the neck.

As for the body skeleton, to make it look more believable, MetaHuman adds some extra joints relative to the familiar UE mannequin. Most of the added joints are leaf joints at the extremities or auxiliary joints, which basically only affect how the deformation looks. What needs extra attention is the additional spine joints, which change the skeleton hierarchy; so if you want to reuse animation assets made for the mannequin, you will need to retarget them.

For the extremities, we refined the finger bones and added new toe joints.

As just mentioned, if you want MetaHuman to reuse mannequin-based animations from the Marketplace, you need to retarget them. When retargeting, you need to redefine the mapping of the spine bones; for the other newly added bones, since the mannequin has no corresponding bones, that part of the computation is simply skipped and is not affected.

Last but not least, the only DCC we can use is Maya, not MotionBuilder or 3ds Max. This is because 3Lateral has always used Maya internally and only has a Rig Logic plug-in for Maya, so if you need to author expressions in a DCC, which relies on Rig Logic, you can only use Maya with the Rig Logic plug-in. Second, we recommend Maya as an animation tool rather than a modification tool. I know many users want to make secondary modifications to a MetaHuman, but we actually do not recommend this, because, as mentioned before, modifying the model may make it no longer match its DNA file exactly. The expressions driven by Rig Logic rely on the DNA, and driving a modified model this way produces mismatches that look fake under some expressions, so our recommended workflow is to use Maya only as the animation tool. Of course, I also know that many people in the community have tried to drive their own models with Rig Logic, and some of the partial results on both realistic and stylized characters look very good. Here are two examples from Chinese developers to share.

This is an attempt from the Digital Performance and Simulation Lab at Beijing Institute of Technology, which has its own library of scanned characters.

This one was shared by David Jiang, using his own orc character. But I still want to emphasize that we do not recommend this workflow. What I can share is that we are aware of this need internally and are actively working on it, and in the not-too-distant future we will provide a more flexible face creation and secondary modification mechanism.

Thank you. That’s all for today.

Q & A

Q1: Why can't MetaHuman Creator create Asian women with celebrity-like faces?

Sun Danlu: The face creation in MetaHuman Creator depends on the scanned face data in the database. Currently, Asians account for about a quarter of the database, but due to the pandemic the overall database is not very large yet, so some face shapes and facial features are simply not in it, and those are therefore difficult to reproduce. Because of this, in practical applications, especially in the game industry, we suggest using MetaHuman Creator for NPCs with ordinary faces. Our first priority at the moment is to expand the database; we will collect data sources covering various industries and ethnicities and strive to add more types of facial features to it.

Q2: Why doesn't MetaHuman Creator ship with a local library?

Sun Danlu: I actually touched on this briefly in the talk. First, the data in the database is extremely large, and the amount of data processed while you edit a face is also very large, so the pressure on your hard drive, memory, CPU, and GPU would be considerable and the hardware requirements very high. Our goal is to lower the threshold for every user to work with digital humans, and that low threshold includes hardware. Second, MetaHuman is still quite young and iterates quickly; running in the cloud avoids forcing every user to update frequently. Mainly for these two reasons, we ended up putting MetaHuman Creator in the cloud.