Google Brain engineer Eric Jang recently experimented with Snapchat's gender-swap filter and wrote up his findings in a post, which AI Institute has compiled below.

Snapchat's gender-swapping filter is a source of endless fun at parties, and the results are remarkably convincing. For someone who works with machine learning algorithms every day, it is also genuinely impressive.

I was intrigued by the feature, so out of curiosity I signed up for Snapchat this morning and played with it for a while, trying to figure out how it works and how I might break it.

Note: this isn't a reverse-engineering of Snapchat's app internals or a study of how other apps might build similar features; it's just some basic hypothesis testing of when the filter works and when it doesn't, plus, of course, a bit of narcissistic bathroom-selfie fun.

Preliminary observations

The photo in the middle is the original bathroom selfie. On the left is the result of the "male" filter, and on the right is the result of the "female" filter.

The first thing most users will probably notice is that the filter runs in real time, works from several different angles, and doesn't need an Internet connection. Even with a wool hat on, the rendered hair looks very natural.

Below is a GIF I recorded while turning my head. The app appears to detect whether the face is pointing roughly toward the camera, and only triggers the filter when that check passes.
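To make that gating behavior concrete, here is a minimal sketch of how such a check might look. The head-pose angles coming from a tracker and the yaw/pitch thresholds are my assumptions for illustration, not anything Snapchat has confirmed.

    # Hypothetical gate: only apply the swap when the head roughly faces the camera.
    # yaw_deg / pitch_deg would come from whatever head tracker the app runs;
    # the thresholds below are made up.
    def should_apply_filter(yaw_deg: float, pitch_deg: float,
                            max_yaw: float = 40.0, max_pitch: float = 30.0) -> bool:
        return abs(yaw_deg) < max_yaw and abs(pitch_deg) < max_pitch

    for yaw in (0.0, 25.0, 60.0):
        print(yaw, should_apply_filter(yaw, 0.0))  # the filter drops out past the yaw limit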

The gender-swap filter works under a variety of lighting conditions, but the hair doesn't seem to cast shadows.

Don't you think I make a cute drag queen?

Here's an example I find particularly cool: the synthetic hair picks up the key light in the scene.

Occlusion tests

From the front, it works very well. So how can we make it fail? The filter already detects whether the face is in the right position, but what if something blocks the face? Will a partially occluded face still get "swapped"?

The answer is yes. Here's a test where I slide an object across my face. The filter keeps working when only half the face is covered, but once too much of the face is hidden, its internal "should I swap this face?" flag flips to False.

Looking at vertical occlusion, the filter's behavior seems to depend on the fraction of the face's area that is covered rather than on which semantically important features (eyes, lips) are hidden. You can see the white bottle in my hand blur just before the filter flips "should I swap this face?" to False. Also, when I hold the bottle right between my eyes, my hair turns blonde.
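As a rough illustration of that hypothesis (gate on visible-area fraction, not on specific features), here is a sketch; the toy masks and the 0.4 threshold are entirely made up.

    import numpy as np

    # Hypothetical gate based on how much of the face region is still visible,
    # regardless of which features (eyes, lips) the occluder covers.
    def should_swap(face_mask: np.ndarray, occluder_mask: np.ndarray,
                    min_visible: float = 0.4) -> bool:
        face_px = face_mask.sum()
        visible_px = (face_mask & ~occluder_mask).sum()
        return face_px > 0 and visible_px / face_px >= min_visible

    face = np.zeros((100, 100), dtype=bool)
    face[20:80, 30:70] = True                # toy "face" region
    bottle = np.zeros((100, 100), dtype=bool)
    bottle[20:80, 30:50] = True              # occluder covering half the face
    print(should_swap(face, bottle))         # True: half covered is still above the threshold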

This effect is interesting. To me it has to be machine learning at work, pulling priors out of the training data during rendering. Which raises the question: do blondes post more makeup tutorials?

I covered part of my face with a black charcoal mask and the resulting render stayed stable. The female filter does fade the mask slightly. It's clear from the GIF below that the swap is limited to a tracked rectangular region around the head (note the sharp cutoff where the hair reaches my shoulder).

Once I covered the rest of my face with the mask, the filter stopped working. Interestingly, the exposed patch of my face still seemed to be detected as a face, and the filter kept applying the style conversion to just that area. You can see the rendering of the head and face flicker like something out of a Junji Ito horror story.

The rendering is surprisingly stable when the mask is removed.

The hair layer

I was most impressed by the realism of the hair, so I wanted to find out whether there is a hair mesh model with dynamic lighting underneath, or whether it is all machine learning.

The hair appears to be rendered as a top layer (much like a Photoshop layer), but unlike the familiar puppy-ear/tongue stickers, this hair layer has a partially transparent alpha channel. Look closely and you'll see the hair also has a clean parting mask that lets the face show through. Snapchat may be running head tracking to estimate head pose and then computing a 2D alpha mask for the hair.
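Here is a minimal sketch of what compositing a semi-transparent hair layer over the swapped face looks like; the arrays and shapes are placeholders for whatever Snapchat actually renders.

    import numpy as np

    # Standard "over" compositing of an RGBA hair layer onto an RGB face image,
    # the same operation a Photoshop layer with an alpha channel performs.
    def composite_hair(face_rgb: np.ndarray, hair_rgba: np.ndarray) -> np.ndarray:
        """face_rgb: (H, W, 3) floats in [0, 1]; hair_rgba: (H, W, 4) floats in [0, 1]."""
        hair_rgb = hair_rgba[..., :3]
        alpha = hair_rgba[..., 3:4]          # partially transparent where the hair is wispy
        return alpha * hair_rgb + (1.0 - alpha) * face_rgb

    face = np.random.rand(128, 128, 3)       # placeholder swapped-face render
    hair = np.random.rand(128, 128, 4)       # placeholder hair layer with alpha channel
    blended = composite_hair(face, hair)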

How does it work? Here are my guesses.

At first glance I imagined some kind of CycleGAN architecture mapping the distribution of male faces onto female faces and vice versa. The training set presumably includes the billions of selfies users have uploaded to Snapchat (and that Snapchat has not deleted) over the past eight years.
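To ground that guess, here is a minimal PyTorch sketch of the CycleGAN generator objective (an adversarial term plus a cycle-consistency term). The tiny networks, the least-squares GAN formulation, and the lambda value are stand-ins of my own, not anything from Snapchat.

    import torch
    import torch.nn as nn

    # Tiny stand-in networks; real generators and discriminators would be far larger.
    def tiny_net(in_ch: int, out_ch: int) -> nn.Module:
        return nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, out_ch, 3, padding=1),
        )

    G_m2f, G_f2m = tiny_net(3, 3), tiny_net(3, 3)                  # male->female, female->male
    D_f = nn.Sequential(tiny_net(3, 1), nn.AdaptiveAvgPool2d(1))   # "is this a real female face?"
    D_m = nn.Sequential(tiny_net(3, 1), nn.AdaptiveAvgPool2d(1))   # "is this a real male face?"

    def generator_loss(real_m, real_f, lambda_cyc=10.0):
        fake_f, fake_m = G_m2f(real_m), G_f2m(real_f)
        # Adversarial terms (least-squares GAN): try to fool each discriminator.
        adv = ((D_f(fake_f) - 1) ** 2).mean() + ((D_m(fake_m) - 1) ** 2).mean()
        # Cycle consistency: translating there and back should recover the input.
        cyc = (G_f2m(fake_f) - real_m).abs().mean() + (G_m2f(fake_m) - real_f).abs().mean()
        return adv + lambda_cyc * cyc

    # Unpaired batches of "male" and "female" selfies (random tensors here).
    loss = generator_loss(torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64))
    loss.backward()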

But it does raise a lot of questions:

  • Is the image-translation model they trained really unpaired? If so, that would be rather shocking, given how finicky CycleGAN can be; it might not be able to pull this off at all. My bet is that they combine an unpaired objective with a limited paired dataset, for example images of male/female sibling pairs, and perhaps even synthetically generated gender-swapped images used as data augmentation (effects like rounding out the jawline can be produced without any machine learning).

  • The hair and face transformations appear to be composed independently, since they occupy different layers (or they may be generated together and split into layers before rendering). This is also the first time I've seen a GAN used to render an alpha channel, and I'm a little skeptical the hair is actually produced by a GAN. On the one hand, there is clearly some learned component: the highlights and hair color shift depending on where the occluding object is, which suggests the colors are at least partly learned from data. On the other hand, the hair is so stable that I find it hard to believe it was synthesized entirely by a GAN generator. I've seen other East Asian men get swapped into similar hairstyles, which hints at a large library of hairdo templates (perhaps adjusted by some machine learning model).

  • How do the ML engineers at Snap even know whether CycleGAN has converged when training on a dataset that large?

  • How do they run a neural network of this size with such limited compute? What resolution are the images they generate on the fly?

  • If it really is a CycleGAN, then applying the male filter to my female-filtered image should recover the original image, right?

As you can see in the GIF above, the overall scale of the photo stays roughly the same, but zooming in very close, the twice-swapped face does look more like mine. My guess is that there is a pre-processing step that crops and resizes the face to a standard size before feeding it into the neural network.
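Here's a sketch of the kind of pre-processing I have in mind, using OpenCV's stock face detector as a stand-in for whatever tracker Snapchat actually uses; the crop margin, the 256-pixel canonical size, and the placeholder swap network are all assumptions.

    import cv2

    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def swap_network(face_crop):          # placeholder for the real M2F/F2M model
        return face_crop

    def apply_filter(frame_bgr, size=256, margin=0.3):
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, 1.1, 5)
        if len(faces) == 0:
            return frame_bgr              # "should I swap this face?" -> False
        x, y, w, h = faces[0]
        m = int(margin * max(w, h))       # pad the box so hair and jaw fit in the crop
        x0, y0 = max(x - m, 0), max(y - m, 0)
        x1 = min(x + w + m, frame_bgr.shape[1])
        y1 = min(y + h + m, frame_bgr.shape[0])
        face = cv2.resize(frame_bgr[y0:y1, x0:x1], (size, size))   # canonical network input
        out = swap_network(face)
        frame_bgr[y0:y1, x0:x1] = cv2.resize(out, (x1 - x0, y1 - y0))  # paste back
        return frame_bgr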

There may also be other subroutines in the filter, such as jaw resizing, that don't use CycleGAN; adding them would make the M2F and F2M filters not exact inverses of each other.

Daydreaming about the technology

I have a friend who does drag, and getting into drag takes a huge amount of preparation. I'm really excited about technology like this, because it will let makeup artists, cosplayers, and drag performers experiment with new looks and identities far more cheaply and quickly.

Technologies like face and voice changing widen the gap between a public Internet persona and the real person behind it. That isn't necessarily a bad thing: if you're a guy who loves being a cute anime girl online, which identity should we judge you by? As everyday social media normalizes this kind of gender-bending, will gender fluidity and cross-dressing culture become more accepted in society?

The future is very exciting.

via https://blog.evjang.com/2019/05/fun-with-snapchats-gender-swapping.html


Compiled by AI Institute; reproduction without permission is prohibited.
