Taobao and Tmall have many business products and run many promotional campaigns every year. Different product positioning leads to diverse UI designs, and different business natures lead to different business logic. A large amount of UI is hard to reuse and requires a lot of front-end manpower to develop, putting heavy pressure on front-end engineers. To address this, we built imgcook, an automated code-generation platform, to improve front-end development efficiency.

In imgcook, we gradually introduced CV, NLP, and other AI technologies to help identify design-draft information and intelligently generate readable, maintainable code. During the Double 11 promotion, 79.34% of the front-end code was generated automatically, improving front-end R&D efficiency by 68%. The intelligent code generation platform imgcook.com opened in 2019 and now serves more than 25,000 users.

This talk introduces the core ideas behind imgcook, the design-to-code platform, and the application scenarios and practice of AI technology in imgcook.

The content focuses on which code-generation problems can be solved with machine learning and how to solve them. I will walk through the whole practice of using machine learning to solve code-generation problems from a front-end perspective.

Introduction to imgcook, the design-to-code platform

imgcook is an intelligent design-to-code platform that generates maintainable front-end code, such as React and Vue, from Sketch files, PSD files, Figma files, images, and so on.

imgcook has gone through three phases so far. In January 2019, we officially launched imgcook 1.0. After more than a year of development, in phase 2.0 we gradually introduced computer vision and machine learning to solve some of the problems encountered when generating code from design drafts, and the product workflow became basically complete. By 3.0, machine learning had been applied in multiple scenarios in imgcook.

imgcook 3.0 was used to generate code for 90.4% of the new modules on last year's Double 11 event pages, with a 79% code availability rate and a 68% improvement in front-end coding efficiency.

So how does imgcook generate code?

We can extract a JSON description of the design from a Sketch file. If you have a Sketch file on hand, you can unzip it and see the JSON inside.

Once we have this JSON, we can use design protocols, rule-based algorithms, and model-based algorithms to analyze it and convert it into a D2C JSON Schema with a reasonable nested layout structure and code semantics.
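As a minimal illustration (not imgcook's actual implementation): a .sketch file is simply a ZIP archive whose entries (meta.json, document.json, pages/*.json) are JSON documents. The sketch below, which assumes the third-party adm-zip package, dumps those entries so you can inspect the layer tree yourself.

```typescript
// inspect-sketch.ts -- minimal sketch, not imgcook's implementation.
// A .sketch file is a ZIP archive of JSON documents; this script lists
// the JSON entries and the top-level keys of each one.
import AdmZip from 'adm-zip';

const zip = new AdmZip('design.sketch');

for (const entry of zip.getEntries()) {
  if (!entry.entryName.endsWith('.json')) continue; // skip preview images etc.
  const json = JSON.parse(zip.readAsText(entry));
  // Each page JSON contains a layer tree with coordinates, text, styles...
  console.log(entry.entryName, Object.keys(json));
}
```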

If you have used imgcook, you can see this D2C Schema in the imgcook editor. We can then convert the D2C Schema into different types of front-end code, such as React or Vue, through various DSL conversion functions.

(Principles of imgcook 3.0)

This is the general process by which imgcook generates code. A rule-based (program) algorithm means a finite set of rules used to automatically analyze the design draft. For example, we can use the coordinates, width, and height of each layer to determine whether a group of layers forms a loop structure, and we can use font size and character count to determine whether a text node is a title or a description.

But such hand-written rules are limited and cannot cover the huge variety of design drafts. Machine learning models can handle part of what rule-based algorithms cannot cover, and for problems that neither rules nor models can solve, we also let users manually tag layers in the design draft.

Application scenarios of machine learning in the D2C field

So what problems can be solved by using machine learning models to recognize designs? Before we look at which code-generation problems machine learning can solve, let's look at what machine learning can do and what types of problems it can solve.

The types of problems machine learning solves

The topic today is the practice of applying AI in imgcook. AI, short for artificial intelligence, gives machines adaptability and reasoning ability, letting them think and work in place of people, which helps reduce labor costs.

Machine learning is one technology for realizing artificial intelligence. We feed a large number of samples to a machine learning algorithm; after learning the characteristics of these samples, it can predict the characteristics of similar, unseen samples.

The process of machine learning is very similar to human learning: knowledge and experience are summarized through learning, and decisions or actions are made based on that experience when similar tasks come up.

One difference is that the human brain needs very little information to form widely applicable knowledge or experience. For example, after seeing just a few cats or dogs, we can correctly tell cats from dogs, but a machine needs a large amount of learning material.

Deep learning is a branch of machine learning. The main difference between deep learning and traditional machine learning algorithms lies in how features are processed; I won't go into details here. imgcook mostly uses deep learning algorithms, along with some traditional machine learning algorithms; below I will refer to both simply as machine learning algorithms.

With these concepts in mind, let’s look at what types of problems machine learning can solve. Here are four basic tasks for image processing in computer vision and four basic tasks for text processing in NLP.

For image recognition: given an image, if we just want to identify what object it contains, that is an image classification task; if we also want to know where the objects are located, that is an object detection task; finer-grained recognition of an image includes semantic segmentation and instance segmentation.

Text recognition is similar. Take a Zhihu question as an example: deciding whether a piece of text is a question title or a question description is a text classification task; identifying which words in a block of text are person names or place names is a sequence labeling task; judging the relationship between a question and an answer, as in intelligent customer service, is a sentence-relation task; and there are also text generation tasks such as machine translation.

The UI information imgcook needs to identify on a page is generally framed as image classification, object detection, and text classification tasks.

imgcook UI semantic recognition

There are many dimensions of UI information in a design draft. For example, this design draft contains icons; we can train an icon recognition model to identify which icons appear on the page, such as the back icon here, or the "..." icon that means "more".

There are also images whose semantics can be captured, as well as text recognition, component recognition, module recognition, and so on, all of which can be handled by training machine learning models.

So what can we do with these recognition results? Think about it: what do we actually write when we code by hand?

Application scenarios of model recognition results

These recognition results can be used to give a node an easy-to-understand class name, to automatically generate field-binding code, to generate accessibility attribute code when a button component is recognized, to automatically import external components when generating code, to generate modular code when a module is recognized, and so on.

Machine learning practice in the D2C field from a front-end perspective

So how do we use a machine learning model to recognize this UI information and apply it in the code-generation pipeline?

General steps for solving problems with machine learning

The general process for solving problems with machine learning goes like this.

First, determine the business problem to solve, and then confirm whether it is suitable for machine learning. Not every problem is: for example, if a rule-based algorithm already solves more than 90% of the cases with high accuracy, there is no need to use machine learning.

If machine learning is the right choice, we need to determine what type of machine learning task the problem belongs to, for example an image classification task or an object detection task. After the task type is confirmed, we start preparing samples, because different task types have different requirements for the training data set.

Once the training data is ready and the algorithm is chosen, we feed the training data to the algorithm so it can learn the sample features, that is, train the model. We then deploy the trained model as a service, and finally call the model service in the engineering pipeline to recognize UI information.

That is what I'll focus on today: the practice of using machine learning to solve business problems.

I'm not going to talk about how to write machine learning algorithms, nor about their underlying principles or how to tune model training. I am a front-end engineer, and what I care about is how to use machine learning to solve my problems. What I want to share is the practice of using machine learning to solve code-generation problems, along with some lessons learned along the way.

Below, I take the problem of importing external components during code generation as an example to walk through the machine learning practice in detail.

Problem definition for component recognition

For example, to implement a page with an input field and a button using Ant Design, you need to import the Ant Design component library and use the Input and Button components from antd.

Now, if we want to generate this automatically, all we can extract from the design draft are rectangles, text, images, and other layer information; we do not know which parts are components. So we need to recognize the components on the page and replace them with imported antd components.
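For reference, the hand-written target code for such a page might look like the following simplified sketch. It uses antd's real Input and Button components; the surrounding form markup is illustrative.

```tsx
// A simplified example of the code we want D2C to produce for this page.
import React from 'react';
import { Input, Button } from 'antd';

export default function SearchForm() {
  return (
    <div className="search-form">
      {/* These should be recognized as antd components, not plain divs/images */}
      <Input placeholder="Enter a keyword" />
      <Button type="primary">Search</Button>
    </div>
  );
}
```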

So there are two things to do: component recognition and code expression.

We need to identify which components are on this page. If the input to the model is the whole page, the model has to identify which components the page contains, where they are located, and what type they are; that is an object detection task in machine learning.

However, we can already get the D2C JSON description of this page, in which the position of each node is known. For example, the root node of this button has a rect attribute in the D2C JSON description that records its position and size. We only need to know whether a node is a component and what type of component it is.

So this becomes an image classification task in machine learning: we feed the model screenshots of these nodes, and the model tells us what type of component each one is.
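For illustration, a node in the D2C JSON description might look roughly like the sketch below. Only rect and smart are discussed in the text; the other field names and values are assumptions made for the example.

```typescript
// Rough shape of a D2C Schema node -- a simplified illustration.
interface D2CNode {
  componentType: string;                               // e.g. 'div', 'text', 'image'
  rect: { x: number; y: number; width: number; height: number };
  smart?: Record<string, unknown>;                     // model recognition results land here
  children?: D2CNode[];
}

// The button's root node: we crop the page screenshot to this rect and ask
// the classification model "what component is this?"
const buttonNode: D2CNode = {
  componentType: 'div',
  rect: { x: 24, y: 640, width: 702, height: 88 },
  children: [
    { componentType: 'text', rect: { x: 310, y: 664, width: 130, height: 40 } },
  ],
};
```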

Sample preparation

After determining that this is an image classification task, we can prepare the training samples. Preparing sample data is the most time-consuming and laborious part of the whole process: a large number of samples need to be collected and labeled with their categories.

When we let a machine learning algorithm learn sample features, besides the sample image itself we also need to tell the algorithm what the image is. Only then can the algorithm learn that samples of this type have these features, so that the next time it sees a sample with similar features, the model can recognize which type it belongs to.

Before collecting samples, we need to decide which component categories the model should recognize. If the samples given to the algorithm only include Button, Searchbar, Stepper, Input, Switch, and Tabbar, the trained model can only recognize these types. If you give it a screenshot of a progress bar component and ask it to classify it, the model will not tell you it is a progress bar; it will only give the probabilities that the progress bar belongs to each of the known categories.

Here we define the six categories the model needs to recognize, and then we can start collecting samples, each consisting of a component image and the category of that image.

Collecting component images from pages

The most direct way to collect images of these basic components is to take screenshots of components from online pages or design drafts. For example, we collect these pages and then use tools or scripts to save screenshots of the components they contain.

These component screenshots also need to be labeled and stored in a specific file layout: component images of the same type go into the same folder, and the folder name is the component type. The folders are then compressed into a ZIP file, and that is the data set we need for component image classification.

I extracted pages from more than 20,000 internal Alibaba design drafts, then manually extracted components from those pages by drawing selection boxes. At the time, it took 11 hours to extract screenshots of two component categories from more than 14,000 pages, and that was only two categories. On top of that, there was the time needed to extract page images and to filter and deduplicate the 20,000-plus Sketch files. The whole sample collection process was very painful.

In addition, pages contain many buttons but few components of other types such as Switch and Stepper, which leads to imbalanced samples. We need a similar number of samples for each category we feed to the algorithm.

Sample labeling time: 14,000 pages × 2 categories ≈ 11 hours
Sample imbalance: 25,647 pages ⇒ 7,757 Buttons + 1,177 Switches + …

Using Puppeteer to automatically generate component samples

We can solve this problem with a skill familiar to front-end engineers: using the headless browser Puppeteer to generate samples automatically.

We can write a page that renders components of different types and styles every time it is refreshed. The components are randomly generated: padding, margin, text content, font size, and so on are all randomized. The only fixed part is that each component gets a type marker. For example, for a Stepper component we add a class such as elemental-stepper to its root node.

Then we use the headless browser Puppeteer to visit this page automatically. On each visit, we find all component nodes by the class names we set, record their types, and save screenshots. In the end, we have the data set we need.
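A minimal version of this collection script could look like the sketch below. It assumes the sample page marks each component root with a class like elemental-button or elemental-stepper; the local URL, output paths, and round count are illustrative.

```typescript
// collect-samples.ts -- minimal sketch of the Puppeteer-based sample generator.
import puppeteer from 'puppeteer';
import fs from 'fs';

const CATEGORIES = ['button', 'searchbar', 'stepper', 'input', 'switch', 'tabbar'];

async function collect(rounds: number) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  for (let i = 0; i < rounds; i++) {
    // Each visit renders a fresh batch of randomly styled components.
    await page.goto('http://localhost:8080/random-components', { waitUntil: 'networkidle0' });

    for (const category of CATEGORIES) {
      fs.mkdirSync(`dataset/${category}`, { recursive: true });
      const nodes = await page.$$(`.elemental-${category}`);
      for (const [j, node] of nodes.entries()) {
        // Screenshot only this element; the folder name doubles as the label.
        await node.screenshot({ path: `dataset/${category}/${i}-${j}.png` });
      }
    }
  }
  await browser.close();
}

collect(100).catch(console.error);
```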

Some of you might say: great, if I can generate samples automatically, I can generate as many as I want.

Not quite. This approach saves a lot of effort, but it is still based on a limited set of rules; the sample features are not diverse enough and differ somewhat from the visual features of real samples.

If most of the samples fed to the algorithm are synthetic images that differ from real samples, then the features the model learns are those of the synthetic samples, and its accuracy on real samples will be poor.

So in general, automatically generated samples are only used as a supplement to real samples.

Algorithm selection

Now that the data set is ready, the next step is to choose a machine learning algorithm to learn the features of these samples.

Since this is an image classification task, we can look at the development history of image classification algorithms to see which classic algorithms exist.

(Classic image classification algorithms)

Then we select several widely used algorithms and compare them in terms of model accuracy and prediction time. For component recognition, component images are relatively complex and the required accuracy is relatively high, so ResNet50 was the final choice for component recognition.

For the icon recognition model, on the other hand, icons are relatively simple and the recognition results are mostly used to name CSS class names, so MobileNetV2 was chosen for icon classification.

(Comparison of several image classification algorithms)

Model training

Now that we have identified the algorithm to use, the next step is to train the model.

A typical model training workflow looks like this: download the data set; convert the data format if needed; process the data set, for example splitting it into a training set and a test set according to a certain ratio; feed it into the model defined by the algorithm; train the model on the training set; and, once training is complete, evaluate the model's accuracy on the test set. That completes the training process, and finally the trained model is deployed to a remote machine.

If you are a machine learning engineer familiar with machine learning and Python, you can write this workflow as a script and train the model yourself.

For front-end engineers, we can use the front-end algorithm engineering framework PipCook to train the model. You can think of PipCook this way: Node.js lets front-end engineers do server-side development, and PipCook lets front-end engineers do machine learning tasks.

PipCook encapsulates and integrates these stages into a pipeline described by a JSON file. Each stage of the training workflow is defined by a plug-in, and each stage's parameters can be configured. We simply replace the url parameter of the data collection plug-in with a link to our own data set.

Then we run pipcook run pipeline.json to start training the model.
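For a rough idea of what that file looks like, here is a hedged sketch, shown as a TypeScript object for readability. The stage names mirror the workflow described above, but the plug-in package names are placeholders; consult the PipCook documentation for the actual image-classification plug-ins.

```typescript
// Sketch of a PipCook pipeline definition. The plug-in package names below
// are placeholders, not real package names -- look them up in the PipCook docs.
const pipeline = {
  plugins: {
    dataCollect: {
      package: '<image-classification-data-collect-plugin>',
      params: { url: 'https://example.com/component-dataset.zip' }, // your own data set
    },
    dataAccess: { package: '<data-access-plugin>' },
    modelDefine: { package: '<resnet-image-classification-model-define-plugin>' },
    modelTrain: {
      package: '<image-classification-model-train-plugin>',
      params: { epochs: 20 },
    },
    modelEvaluate: { package: '<image-classification-model-evaluate-plugin>' },
  },
};
// Saved as pipeline.json, it is executed with: pipcook run pipeline.json
```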

When training finishes, a model file is generated, and PipCook packages it into an NPM package. We can write a JS script locally to test it. The prediction result is the probability that the test sample belongs to each category; we can set a threshold and treat the recognition result as reliable only when the probability exceeds it.

Model service deployment

To use the model recognition service in an engineering project, we also need to deploy the model to a remote machine.

There are many ways to deploy. For example, we can deploy the model on Alibaba Cloud EAS: in the Alibaba Cloud console, select the online model service and upload the model file to deploy it.

Model application

Once the model is deployed, we get an accessible RESTful API like this. We can then call this API in the D2C pipeline to recognize components, which is the final model application phase.

The application flow of the model differs between scenarios. For component recognition in D2C, we first generate a D2C Schema with a reasonable nested layout structure through the layout algorithm, then crop screenshots of the page according to the positions of the div container nodes, call the component recognition model service to classify them, and write the recognition result into the smart field of the corresponding node in the D2C Schema.
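In code, that step amounts to something like the sketch below. The EAS endpoint URL and the request/response shape are assumptions for illustration, not imgcook's actual service contract.

```typescript
// Illustrative sketch: crop each container node, ask the deployed model what
// component it is, and attach the answer to the node's smart field.
interface ComponentResult { label: string; probability: number }

async function recognizeComponent(screenshotBase64: string): Promise<ComponentResult> {
  const res = await fetch('https://eas.example.com/api/predict/component-classifier', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ image: screenshotBase64 }),
  });
  return res.json() as Promise<ComponentResult>;
}

async function annotate(
  node: { rect: unknown; smart?: Record<string, unknown>; children?: any[] },
  crop: (rect: unknown) => Promise<string>,       // returns a base64 screenshot of the rect
): Promise<void> {
  const result = await recognizeComponent(await crop(node.rect));
  if (result.probability > 0.8) {                 // only trust confident predictions
    node.smart = { ...node.smart, component: result };
  }
  for (const child of node.children ?? []) await annotate(child, crop);
}
```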

Is that the end? Not yet. This is only the component recognition part; as mentioned earlier, there are two things to do: recognize the components and express the result in code. We still need to express the recognition results as code.

imgcook supports user-defined DSL conversion functions that convert the D2C Schema into different types of front-end code. The general logic is to recursively traverse the nodes in the D2C Schema, determine each node's type, and convert it into the corresponding tag.

If the component recognition result should also be converted into an imported external component during this transformation, the components registered by the user also need to be passed to the DSL function; then, based on the node's component type and the model recognition result, the generated code imports the user-registered external components.
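A much-simplified DSL conversion function along these lines might look like the sketch below. This is not imgcook's real DSL API, just an illustration of the recursive traversal and the import collection.

```typescript
// Simplified sketch of a DSL conversion step: walk the D2C Schema, map
// recognized components to antd tags, and collect the needed imports.
type SchemaNode = {
  componentType: string;
  text?: string;
  smart?: { component?: { label: string } };
  children?: SchemaNode[];
};

const COMPONENT_MAP: Record<string, string> = { button: 'Button', input: 'Input' };

function render(node: SchemaNode, imports: Set<string>): string {
  const label = node.smart?.component?.label ?? '';
  const tag = COMPONENT_MAP[label] ?? 'div';
  if (COMPONENT_MAP[label]) imports.add(tag);
  const children = (node.children ?? []).map((c) => render(c, imports)).join('');
  return `<${tag}>${node.text ?? ''}${children}</${tag}>`;
}

// Usage: render the tree first, then prepend the collected import statement.
const root: SchemaNode = {
  componentType: 'div',
  children: [
    { componentType: 'div', smart: { component: { label: 'input' } } },
    { componentType: 'div', smart: { component: { label: 'button' } }, text: 'Search' },
  ],
};
const imports = new Set<string>();
const jsx = render(root, imports);
const code = `import { ${[...imports].join(', ')} } from 'antd';\n\nexport default () => (${jsx});`;
console.log(code);
```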

Review of the practice process

This concludes the practical process of using machine learning to solve the problem of importing external components.

To recap: in order to automatically generate code for scenarios that import external components, we chose to train an image classification model to recognize the components on a page.

First, we prepare samples. We can collect real pages or design drafts, extract components from them and label them to build a data set, or use Puppeteer to generate a data set automatically. After the data set is ready, the ResNet algorithm is chosen by balancing recognition accuracy against prediction speed.

Then we use the front-end machine learning framework PipCook to train the model. PipCook encapsulates the whole pipeline for us: loading the data set, processing the data, defining the model, training the model, and evaluating the model. We only need to execute pipcook run to start training.

After training, we get a model file, which can be deployed to Alibaba Cloud EAS to obtain an accessible RESTful model recognition API. We then call this API in the code-generation pipeline to recognize components in the UI, and the recognition results are finally used to generate code.

Here are several step-by-step practice cases. The first uses PipCook for training and runs in Colab, so you don't need to worry about environment issues. The second trains with PipCook on your own computer, which requires installing the environment locally. The third is a machine learning task implemented in Python, so you can also see how data set loading, model definition, and model training are done in Python.

  • Practice 1: Classify front-end component images with PipCook in Colab
  • Practice 2: How can the front end quickly train a form recognition model?
  • Practice 3: Start now: train and deploy a front-end component image classification model on your Mac

Each of these practice cases provides code and a data set. If you are interested, you can use them to experience the machine learning practice introduced today.

D2C Double 11 zero-R&D engineering practice

Taking component recognition as an example, we have just walked through the whole process of machine learning practice and application. Besides component recognition, text recognition, icon recognition, and so on follow basically the same steps.

These model recognition results can be applied not only to generating UI code but also to generating business logic code. On the Double 11 pages, some modules have relatively simple business logic; for them, the availability rate of the generated code can exceed 90%, essentially achieving zero R&D investment.

Take a module like this one on the Double 11 activity page: if we wrote the code by hand, what business logic would be involved?

The items here repeat in a loop, so we only need to implement the UI for one item and render it with a map loop. We need to bind dynamic data such as the item title, price, and item image, handle the jump when an item is clicked, and send tracking logs when the user clicks. Beyond that, there seems to be no other business logic.
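Written by hand, such a module would look roughly like the sketch below. The item field names, jump URL, and sendLog helper are made up for the example; this is not code produced by imgcook.

```tsx
// Illustrative hand-written item module: a map loop, field bindings, a click
// jump, and a tracking log. Field names and the sendLog helper are assumptions.
import React from 'react';

declare function sendLog(event: string, data: Record<string, string>): void; // assumed tracker

interface Item { itemTitle: string; itemPrice: string; itemPic: string; itemUrl: string }

export default function ItemList({ items }: { items: Item[] }) {
  const onItemClick = (item: Item) => {
    sendLog('item_click', { title: item.itemTitle }); // send tracking log on click
    window.location.href = item.itemUrl;              // jump to the item detail page
  };

  return (
    <div className="item-list">
      {items.map((item) => (
        <div className="item" key={item.itemUrl} onClick={() => onItemClick(item)}>
          <img src={item.itemPic} alt={item.itemTitle} />
          <div className="title">{item.itemTitle}</div>
          <div className="price">{item.itemPrice}</div>
        </div>
      ))}
    </div>
  );
}
```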

This business logic can be generated automatically from the model recognition results. Here the model can recognize a one-row-two-column loop structure, that this text is an item title, and that this image is an item image. These recognition results can then be used to generate field bindings, page jumps, and other business logic code.

As mentioned above, the recognition result is attached to the smart field of each node. For example, a text node may be recognized as an item title (itemTitle). If the recognition result is used for field binding, we can replace the content of the text node with a field-binding expression.

So where does this replacement happen? imgcook lets users register their own business logic, and each piece of business logic has a recognizer and an expresser.

Each node in the D2C Schema passes through these recognizers to check whether it carries a business logic point; if it does, the logic is expressed according to the corresponding expresser.

For example, for the itemTitle field-binding logic, the recognizer checks whether the fieldBind value on the text node's smart field is itemTitle; if it is, the expresser's logic runs and automatically binds this node to the itemTitle field.
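As a sketch (the registration shape below is an assumption, not imgcook's actual business logic plug-in interface), the itemTitle rule could look like this:

```typescript
// Illustrative recognizer/expresser pair for the itemTitle field binding.
interface D2CNode { text?: string; bindings?: Record<string, string>; smart?: { fieldBind?: string } }

interface BusinessLogic {
  recognizer: (node: D2CNode) => boolean;   // does this node carry the logic point?
  expresser: (node: D2CNode) => void;       // if so, rewrite the node accordingly
}

const itemTitleBinding: BusinessLogic = {
  recognizer: (node) => node.smart?.fieldBind === 'itemTitle',
  expresser: (node) => {
    // Replace the static text with a field-binding expression.
    node.text = '${item.itemTitle}';
    node.bindings = { ...node.bindings, text: 'item.itemTitle' };
  },
};

// Each node in the D2C Schema is run through the registered rules:
function applyLogic(node: D2CNode, rules: BusinessLogic[]) {
  for (const rule of rules) if (rule.recognizer(node)) rule.expresser(node);
}
```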

In this way, we update the D2C Schema according to the user-defined business logic, and finally convert the D2C Schema that carries the business logic into code with the DSL conversion functions.

That is how model recognition results are applied to business logic code generation; other kinds of business logic generation likewise take the model recognition results from the business logic library and apply them in the code-generation process.

Future

In the future, we will continue to use intelligent technology to improve the quality and quantity of the generated code. We will also explore new code-generation problems such as UI with multiple states and UI with micro-interaction effects. For example, the "publish trial report" button here has several states and shows different UI depending on the data returned by the server, and the enter button on this page plays a subtle animation when clicked.

In essence, generating code from a design draft (D2C) is about intelligently understanding the product artifact, that is, the high-fidelity design draft. A high-fidelity design draft goes through a product design cycle: the product manager determines business goals and page structure based on certain methodologies, and the visual designer produces the output by combining an understanding of the product's characteristics and functions with visual design specifications.

We hope to build an underlying D2C cognitive theory system that brings product design and visual design methodologies into D2C practice. Intelligently understanding product design based on product design methods and visual design principles can help us understand the design draft more comprehensively and generate higher-quality code.

(D2C practice theory system)

Conclusion

imgcook is an open platform for generating code from design drafts. If you have any questions, you are welcome to join the imgcook community.

Beyond what was shared today, you can also think about which front-end problems could be solved with machine learning, such as intelligent design, code recommendation, smart UI, automated UI testing, and so on. If you are interested, you can follow the practice process introduced today and write a demo project to try it out.

In the future, I also hope that more and more engineers will participate in building front-end intelligence and use intelligent technology to solve more front-end problems.