Demo: YouTube

arXiv: 1903.04411

GitHub: hzwer/LearningToPaint

We're trying to give machines the ability to create fascinating paintings with just a few strokes. By combining a neural stroke renderer with model-based deep reinforcement learning, our AI can depict richly textured natural images stroke by stroke. With hundreds of strokes, the AI can create a visually pleasing painting, deciding the position and color of each stroke directly. Training the AI requires no human drawing experience and no stroke trajectory data.

For faces (CelebA), it takes about 10 hours on one GPU to train the stroke renderer and about 40 hours to train the AI, during which the AI paints millions of images.

Here's how our AI paints different types of images. It eventually learned the strategy of sketching the outline first and filling in the details later.

Generalization also works well; see the picture below.

The result looks better if you enlarge the picture and look closely at the details.

Introduction

When I learned sketching as a child, I thought my teachers were very good because they could draw still lifes and people with close likeness. Later I saw the works of some artists online, such as Dürer (the German painter) and the early Picasso, and I began to understand a little of what a master sketch is. Their drawings are very clean, the lines very expressive; with very few strokes and very simple tones they build a vivid image. It is hard for ordinary painting enthusiasts to imitate this even after spending a lot of time: it requires a deep understanding of the structure of things, command of the brush, and a strong grasp of the relationships between strokes.

In a computer, a picture is composed of N × N pixels, each colored by three RGB values. In this sense, the easiest way for a computer to copy a painting is to fill it in pixel by pixel. People, however, build a picture with strokes. How do you make a computer draw like a human? This is a question I became very interested in shortly after getting into deep learning.

From a reinforcement learning perspective, we design an AI and give it a canvas and a target image. At each step the AI draws one stroke on the canvas; when the stroke makes the canvas more similar to the target, we reward the AI, driving it to learn. We can set an upper limit on the number of strokes and let the AI stop after that many steps.
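To make this loop concrete, here is a minimal sketch in Python. The blank white canvas, the actor/renderer call signatures and the L2-based reward are simplifying assumptions for illustration, not the exact interface of our code.

```python
import numpy as np

def l2(a, b):
    return float(np.mean((a - b) ** 2))

def paint_episode(actor, renderer, target, max_strokes=40):
    """Roll out one painting episode and collect the per-stroke rewards."""
    canvas = np.ones_like(target)          # start from a blank white canvas
    rewards = []
    for t in range(max_strokes):
        action = actor(canvas, target, t)  # stroke parameters, each in [0, 1]
        new_canvas = renderer(canvas, action)
        # Reward the stroke by how much it reduces the distance to the target.
        rewards.append(l2(canvas, target) - l2(new_canvas, target))
        canvas = new_canvas
    return canvas, rewards
```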

AI painting process

Difficulties of the task

  1. Each stroke has a large action space. The AI has to decide the position, shape, color and transparency of a stroke, and each parameter has many possible values. If we force the action to be discrete, we lose precision of control and also face a combinatorial explosion, since the numbers of choices for each parameter multiply (see the quick count after this list). General reinforcement learning methods require the AI to model the environment through a great deal of trial and error, which is difficult and time-consuming. Readers who are interested can check out DeepMind's SPIRAL, which uses a lot of computing power to attack this problem.
  2. If we plugged the AI into existing painting software, rendering each stroke would be a time-consuming operation and collecting data would be expensive.
  3. Painting richly textured natural images requires a large number of strokes, which demands fairly strong planning ability from the AI: it needs to consider how strokes combine, how they cover one another, and so on.
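To make the combinatorial explosion in point 1 concrete, here is a back-of-the-envelope count. The 13 parameters per stroke and 10 bins per parameter are illustrative assumptions, not numbers taken from the paper.

```python
# Size of a naively discretized per-stroke action space (illustrative numbers).
num_params, bins = 13, 10
print(f"{bins ** num_params:,} discrete actions per stroke")  # 10,000,000,000,000
```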

Algorithm

Our baseline algorithm is Deep Deterministic Policy Gradient (DDPG). In brief, DDPG uses an actor-critic framework; it is a hybrid of policy gradient methods and value function methods, in which the policy network is called the Actor and the value network is called the Critic. In the painting task, the Actor draws a stroke at each step and the Critic evaluates that stroke; the Actor's goal is to receive a better evaluation, while the Critic's goal is to evaluate more accurately. The advantage of DDPG is that it can make decisions in a continuous action space: we can design a k-dimensional vector, with each dimension between 0 and 1, to control the different properties of a stroke, for example two dimensions for the stroke's starting coordinates, another two for its end coordinates, and so on. DDPG can also be trained off-policy: the AI's exploration of the environment is stored in a replay buffer from which training batches are sampled, so each piece of experience can be reused.
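As a minimal illustration of the continuous k-dimensional action, here is a simplified PyTorch-style Actor/Critic pair. The 13-dimensional stroke vector, the flattened observation and the layer sizes are assumptions for illustration; the actual networks in the paper are scaled-down ResNet-18s operating on image inputs.

```python
import torch
import torch.nn as nn

STROKE_DIM = 13  # e.g. control points, thickness, transparency, RGB (illustrative)

class Actor(nn.Module):
    """Maps an observation (canvas, target, step) to stroke parameters in [0, 1]."""
    def __init__(self, obs_dim, stroke_dim=STROKE_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, stroke_dim), nn.Sigmoid(),  # every dimension lands in [0, 1]
        )

    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Scores an (observation, action) pair with an estimated return."""
    def __init__(self, obs_dim, stroke_dim=STROKE_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + stroke_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, obs, action):
        return self.net(torch.cat([obs, action], dim=-1))
```

In standard DDPG the Actor is then trained to maximize the Critic's score of its own actions, while the Critic is regressed toward a bootstrapped return target.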

To address the difficulty of modeling the environment through exploration, we pre-trained a neural stroke renderer, which can quickly render strokes onto the canvas from stroke parameters and supports parallel execution on the GPU.
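A sketch of what such a renderer can look like: a small fully connected stem followed by sub-pixel (PixelShuffle) upsampling, trained with an MSE loss against images produced by a conventional (slow) stroke rasterizer. The layer sizes, output resolution and 13-dimensional input are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class NeuralRenderer(nn.Module):
    """Maps a stroke parameter vector to a 128x128 grayscale stroke image."""
    def __init__(self, stroke_dim=13):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(stroke_dim, 512), nn.ReLU(),
            nn.Linear(512, 16 * 16 * 32), nn.ReLU(),
        )
        # Sub-pixel upsampling (PixelShuffle) instead of deconvolution
        # to avoid checkerboard artifacts: 16x16 -> 32x32 -> 64x64 -> 128x128.
        self.up = nn.Sequential(
            nn.Conv2d(32, 32 * 4, 3, padding=1), nn.PixelShuffle(2), nn.ReLU(),
            nn.Conv2d(32, 16 * 4, 3, padding=1), nn.PixelShuffle(2), nn.ReLU(),
            nn.Conv2d(16, 1 * 4, 3, padding=1), nn.PixelShuffle(2), nn.Sigmoid(),
        )

    def forward(self, params):
        x = self.fc(params).view(-1, 32, 16, 16)
        return self.up(x)

# Supervised pre-training against a conventional rasterizer `slow_render`:
#   loss = F.mse_loss(renderer(params), slow_render(params))
```

Because this renderer is a neural network, it is differentiable and runs in large batches on the GPU, which is what the model-based variant below exploits.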

Different structures of DDPG and Model-based DDPG

The neural stroke renderer can also be plugged into the reinforcement learning framework to assist training. We adapted DDPG into a model-based approach, which significantly improves the AI's training speed and final performance (see the figure below).

Comparison between DDPG and Model-based DDPG
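Because the renderer is differentiable, the reward signal can flow from the canvas back through the renderer into the Actor, which is the key difference from plain DDPG. Below is a simplified sketch of that actor update; the full method also uses the Critic's value estimate of the resulting canvas, and all names here are illustrative assumptions rather than the repo's exact API.

```python
def model_based_actor_loss(actor, renderer, score, obs, canvas, target):
    """Model-based actor update: the gradient flows from the similarity score
    back through the differentiable neural renderer into the actor.

    `score(canvas, target)` is any differentiable similarity measure
    (negative L2, or a WGAN critic score); higher means more similar.
    """
    action = actor(obs)                     # stroke parameters in [0, 1]
    new_canvas = renderer(canvas, action)   # differentiable neural rendering
    improvement = score(new_canvas, target) - score(canvas, target)
    return -improvement.mean()              # gradient ascent on the improvement
```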

Training tricks

  1. Action bundles. The Actor decides multiple strokes at a time. On the one hand, experiments show that the Actor is capable of this, and it lets the Actor explicitly learn combinations of strokes; on the other hand, reducing the number of network inferences accelerates the convergence of the Critic. Experiments show that having the Actor output 5 strokes at a time works well; outputting too many strokes at once places too high a demand on the Actor's capacity. A minimal sketch of the bundle decoding appears after this list. The figure below shows training curves for painting faces with 200 strokes, drawing 1, 2, 5 and 8 strokes at a time respectively; the ordinate is the L2 distance between the canvas and the target image when the AI finishes painting.

Training curves for different action bundle settings

  2. Wasserstein GAN loss (WGAN loss). We need to measure the similarity between the canvas and the target image to define the reward. We find that the WGAN loss is a better measure than Euclidean distance and makes the final painting richer in detail.

  3. Network structure design. The Actor and Critic are scaled-down versions of ResNet-18. Batch Normalization (BN) accelerates Actor training but has little effect on the Critic. The Critic uses Weight Normalization (WN) with TReLU activation functions. The renderer uses sub-pixel upsampling instead of deconvolution, which largely eliminates the checkerboard effect. The GAN discriminator uses a structure similar to PatchGAN, also with WN and TReLU. Our method is not very sensitive to hyperparameters and mostly uses hyperparameters similar to those in other papers; see the paper for details.
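Here is a minimal sketch of how an action bundle (trick 1 above) can be decoded and applied; the 13 parameters per stroke and the helper names are illustrative assumptions.

```python
STROKE_DIM = 13   # parameters per stroke (illustrative)
BUNDLE = 5        # strokes decided by the Actor in one step

def render_bundle(renderer, canvas, bundle_action):
    """Split one Actor output of shape (batch, BUNDLE * STROKE_DIM) into
    individual strokes and draw them onto the canvas in order."""
    strokes = bundle_action.view(-1, BUNDLE, STROKE_DIM)
    for i in range(BUNDLE):
        canvas = renderer(canvas, strokes[:, i])
    return canvas
```

The Critic then evaluates the canvas after the whole bundle has been drawn, so one decision step covers five strokes.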

Results

We experimented on several datasets, including handwritten digits (MNIST), street-view house numbers (SVHN), celebrity faces (CelebA) and natural images (ImageNet), limiting the number of strokes to 5, 40, 200 and 400 respectively.

We compared the results of painting faces with different numbers of strokes, from 100 to 1,000; the more strokes, the better the details are recovered.

Training curves for painting faces with different numbers of strokes

We can also design strokes of different shapes, such as restricting the AI to draw only circles or triangles, which produces very interesting results.

Faces drawn with different stroke shapes

Training curves for different data sets

Related work

There is a similar line of work called stroke-based rendering. Most solutions pick each stroke greedily or rely on extensive search, and the results are also impressive; see the articles by Aaron Hertzmann.

There is also SPIRAL, which we compare against in the paper. Doodle-SDQ (BMVC 2018) uses DQN to draw stick figures, and StrokeNet (ICLR 2019) proposes a similar stroke renderer. There is also the earlier Sketch-RNN line of work. Using a neural network to model the painting environment is somewhat reminiscent of World Models (NIPS 2018), and David Ha tweeted about our AI.

Conclusion

We used deep reinforcement learning to build an AI that paints well; by simply changing the maximum number of strokes, it can be adapted to different datasets. Perhaps we will take it busking on the street someday; we hope you like it. We have tried to keep the code as clean as possible, but it still has many shortcomings, and suggestions are welcome.

See our arXiv paper for more details of the algorithm, as well as additional comparison experiments and the formal definitions and derivations.