
As one of the four technical directions of the Alibaba Economy Frontend Committee, the frontend intelligence project went through the test of the 2019 Double 11 and delivered good results: 79.34% of the online code for newly developed modules in the Tmall and Taobao Double 11 venues was automatically generated by the frontend intelligence project. Along the way the R&D team ran into many difficulties and did a lot of thinking. This series, “How the Front-end Code Is Intelligently Generated”, shares the bits and pieces of technology and thinking behind the frontend intelligence project.

Overview

In the front-end code of wireless marketing pages there are a large number of business modules or business components (hereinafter referred to as business modules), which are code units with specific business functions. Once the business modules on a page have been identified, that information can be used for code reuse, business field binding and other downstream functions. Identifying business modules from visual drafts has therefore become a general-purpose capability in the field of frontend intelligence.

Unlike basic component recognition and form recognition for back-office pages, business module recognition mainly targets wireless pages, and its input is mainly the visual draft. Although the UI structure is more complex, the visual draft already provides relatively discernible information about business modules and their content (such as text and image sizes). So instead of a deep learning scheme that works directly on images, we extract predefined feature values from the DSL exported from the visual draft and use traditional machine learning classification methods to recognize modules. The recognition function finally returns the category of the business module, its location in the visual draft and other information.

The overall functionality includes the following parts:

1. Sample construction: augment the UI layers of the visual draft according to user configuration and user-defined data augmentation rules, so as to obtain visually diversified samples. Then, on the basis of the defined business fields, extract and store the feature values.

2. Algorithm selection: currently we provide traditional machine learning multi-classification algorithms.

3. Model implementation: build the model and the related algorithm engineering on the group's machine learning platform, achieving automatic training and deployment.

4. Interface provision: the model externally exposes a prediction (recognition) service and a result feedback service.

The overall function is shown in the following figure:

Position in the layered architecture

As shown in the figure below, our business module recognition service sits in the material recognition layer. It provides business-specific recognition capability on top of the DSL exported from the visual draft, and feeds into field binding, business logic and other functions in the subsequent code generation process. The D2C functions are layered as shown in the following figure:

Sample construction

Machine learning is a training process that relies on large amounts of real data, and a good sample bank makes model training much more effective for less effort. Our samples come from Sketch files, but for any given module there are only a few visual drafts, so the number of available samples is far too small. The quantity problem has to be solved first.

Data augmentation

To solve the sample size problem, we adopt data augmentation. Augmentation has a default rule set and is configurable: users can adjust attributes according to how each element of the visual draft may vary in real scenarios, along dimensions such as “whether it can be hidden” and “the variable range of text length”, producing customized configuration items. This way the sample maker knows exactly where the differences among the samples are concentrated.

We then vary and combine attributes according to these configuration items to generate a large number of different visual DSLs. These DSLs differ from one another randomly but within the configured rules, which gives us a large number of samples.
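As an illustration, here is a minimal augmentation sketch in Python, assuming a simplified JSON node format and hypothetical rule names (“hidable”, “text_len_range”); the real configuration schema and rule set are richer than this.

```python
import copy
import random

def augment_node(node, rules):
    """Apply per-node augmentation rules, then recurse into children."""
    rule = rules.get(node.get("id"), {})
    # "Whether it can be hidden": randomly drop the node.
    if rule.get("hidable") and random.random() < 0.3:
        return None
    # "Variable range of text length": pad or trim the text content.
    if node.get("type") == "text" and "text_len_range" in rule:
        lo, hi = rule["text_len_range"]
        target = random.randint(lo, hi)
        node["text"] = (node.get("text", "x") * hi)[:target]
    node["children"] = [
        child for child in (augment_node(c, rules) for c in node.get("children", []))
        if child is not None
    ]
    return node

def generate_variants(dsl, rules, n=1000):
    """Produce n randomly varied copies of one visual DSL."""
    return [augment_node(copy.deepcopy(dsl), rules) for _ in range(n)]
```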

The interface of the augmentation configuration is shown in the following figure. The DSL tree and the rendering area are on the left and in the middle, and the augmentation configuration area is on the right. The configuration items consist of the following two parts:

1. Augmented properties: size, position, visibility, foreground/background color, content

2. Augmentation ranges: continuous ranges or specified enumeration values

The interface of sample generation is shown below:

Feature extraction

Once we have a large number of augmented visual DSLs, how do we turn them into samples? First, the sample format we need is tabular data, to match the input format of traditional machine learning methods: one sample is one feature vector. So we need to extract features from each DSL.

Based on previous model-training experience, we found that certain visual information is particularly important for judging the module category. UI information is therefore abstracted and extracted into custom feature dimensions such as width, height, layout direction, number of images contained and number of texts contained. Through this abstraction of the visual information we obtained more than 40 visual feature dimensions.

In addition to the visual feature dimensions, we also added custom business features. That is, according to certain “business rules”, certain element blocks are defined as elements with business meaning, such as “price” and “popularity”, and 10 dimensions of business features are abstracted from them. In this process, user-defined business rules can be implemented through regular-expression matching.

The abstracted visual features plus the business features constitute a feature vector, and the feature vector plus a classification label constitutes a sample.
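A minimal sketch of this feature extraction step, assuming a simplified DSL node format; the actual 40-plus visual dimensions and 10 business dimensions are richer, and the price regex is only one example of a business rule:

```python
import re

PRICE_RE = re.compile(r"[¥$]\s*\d+(\.\d+)?")  # example business rule via regex

def walk(node):
    yield node
    for child in node.get("children", []):
        yield from walk(child)

def extract_features(dsl):
    """Flatten one visual DSL into a fixed set of feature values."""
    nodes = list(walk(dsl))
    texts = [n for n in nodes if n.get("type") == "text"]
    images = [n for n in nodes if n.get("type") == "image"]
    return {
        # visual features: size, layout direction, element counts, ...
        "width": dsl.get("width", 0),
        "height": dsl.get("height", 0),
        "is_horizontal": int(dsl.get("width", 0) >= dsl.get("height", 0)),
        "image_count": len(images),
        "text_count": len(texts),
        # business features: element blocks matched by business rules
        "has_price": int(any(PRICE_RE.search(t.get("text", "")) for t in texts)),
    }

def to_sample(dsl, label):
    """A feature vector plus a classification label is one sample."""
    return {**extract_features(dsl), "label": label}
```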

Algorithms and Models

Our input is the standardized DSL extracted from the Sketch design draft, and the goal is to identify which business module this DSL represents, which can be reduced to a multi-classification problem. Along this line, we extracted feature values from a large number of augmented DSLs and generated data sets for training. The multi-classification models we use are built from the components provided by the algorithm platform.

Random forests

Model construction

At first we chose random forest as the multi-classification model, because random forest runs fast, its automated pipeline is smooth, and it needs almost no extra operations to meet the requirements of our algorithm engineering. In addition, it has low requirements on feature-value preprocessing and handles continuous and discrete variables by itself, as shown below.

The rules for automatic resolution of random forest variable types are shown in the following figure:

Therefore, a very simple model can be built quickly, as shown in the figure below.

The random forest model used online is shown in the following figure:
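For readers without access to the internal platform, a rough scikit-learn equivalent of this multi-class random forest is sketched below; the file name and hyperparameters are illustrative, and scikit-learn does not expose the ID3/CART/C4.5 mix that the platform component offers.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical tabular sample file produced by the feature extraction step.
df = pd.read_csv("module_samples.csv")
X, y = df.drop(columns=["label"]), df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.2)

# Trees split on raw values, so continuous and integer-encoded discrete
# features can be fed in with almost no preprocessing.
clf = RandomForestClassifier(n_estimators=200)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))

proba = clf.predict_proba(X_test)  # per-class confidences, used for thresholding later
```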

Parameter tuning

We found that the random forest was occasionally unsure about data from the sample bank; that is, the confidence of true positives was low and was cut off by the confidence threshold. This happens especially with visually very similar samples: the two similar modules shown in the figure below introduce errors into our classification results.

Similar modules are shown below:

To mitigate this “low confidence” problem we tuned the random forest parameters, including the number of samples drawn per tree, the maximum tree depth and the ratio of ID3/CART/C4.5 trees, and also inserted a feature selection component up front, but the results were not ideal. Finally, by evaluating feature importance and manually feeding the result back into feature selection and retraining, we achieved good results, as shown in the figure below. But this process could not be integrated into the automated training pipeline and was eventually abandoned.

The random forest model used in the parameter tuning process is shown in the following figure:

The discrete feature problem

Although random forest handles discrete variables automatically, the algorithm cannot cope when a discrete value that never appeared in the training set shows up in the test set. Solving this requires ensuring that every value of every discrete feature appears in the training set, and because there are many discrete features this cannot be done with simple stratified sampling. This is one of the pain points of applying the random forest model.

To sum up, this is the work we did on the random forest model. Random forest is easy to use and produces results quickly; in most business scenarios it meets the recognition requirements, and it became version 1.0 of the module recognition algorithm. However, because of these shortcomings we later introduced another model, XGBoost.

XGBoost multi-classification

Model construction

XGBoost improves “accuracy” by boosting over trees, and it performs better than random forest on our data set. However, the XGBoost model on the algorithm platform involves many non-standard steps, so to keep the pipeline automated we built the model as shown in the figure.

The XGBoost model is shown below:
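A condensed sketch of the equivalent setup with the open-source xgboost package; the file name and hyperparameters are illustrative, and the labels are assumed to be already encoded to 0..K-1 as described in the preprocessing list below.

```python
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Hypothetical sample table whose "label" column is already encoded to 0..K-1.
df = pd.read_csv("module_samples_encoded.csv")
X, y = df.drop(columns=["label"]), df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.2)

params = {
    "objective": "multi:softprob",   # per-class probabilities, needed for thresholding
    "num_class": int(y.nunique()),
    "max_depth": 6,
    "eta": 0.1,
}
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
booster = xgb.train(params, dtrain, num_boost_round=200,
                    evals=[(dtest, "eval")], early_stopping_rounds=20)
proba = booster.predict(dtest)       # shape: (n_samples, num_class)
```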

Preprocessing

The XGBoost model requires more preprocessing steps, including:

1. Label encoding: a preprocessing step. XGBoost only supports label values from 0 to (number of classes - 1). For ease of mapping, however, the label values we store correspond to the platform's classification IDs, which are not 0..N and are not even consecutive integers. A Label Encoding component is therefore needed to encode them into values that meet XGBoost's requirements (see the sketch after this list).

2. Storing the label mapping table: data hand-off. The prediction interface uses this mapping table to translate model output back into the platform's classification IDs, so it needs to be saved separately.

3. Data completion: in preprocessing, to prevent the random split from producing a training set in which some labels are missing entirely, data for any label absent from the training set is pulled back in. This interferes slightly with the model, but it only matters in extreme cases with very little data.
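A minimal sketch of steps 1 and 2, assuming the platform classification IDs are arbitrary integers; the IDs and file name here are illustrative:

```python
import json
from sklearn.preprocessing import LabelEncoder

# Hypothetical platform classification IDs: neither starting at 0 nor consecutive.
platform_ids = [10023, 10087, 20011, 10023, 20011]

encoder = LabelEncoder()
encoded = encoder.fit_transform(platform_ids)   # -> values in 0..K-1, as XGBoost requires

# Persist the mapping so the prediction service can translate model output
# back into the platform's classification IDs.
mapping = {int(code): int(pid) for code, pid in enumerate(encoder.classes_)}
with open("label_mapping.json", "w") as f:
    json.dump(mapping, f)
```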

XGBoost's confident performance on the test data made threshold selection easier, its predictions met our business need of “identifying the right component”, and it also supported the automated pipeline, so it became the traditional training model that we promoted afterwards.

A difficult problem: Out of Distribution

It is worth mentioning that we cannot exhaustively collect visual samples of everything outside the current module library; that would be like collecting facial photos of 7 billion people just to build an internal Alibaba face recognition system. The absence of data from outside the sample bank means we are effectively missing a hidden category, the negative-sample class. This leads to the out-of-distribution (OOD) problem: inaccurate predictions on data outside the sample bank, which shows up as too many false positives in the classification results.

In our scenario this is hard to solve, because collecting all negative samples is infeasible. How do we deal with it today?

Threshold setting

We take the confidence (prob) output by the classification model as the basis for judging the result: only if it is above a certain threshold do we consider the input matched to a classification. The threshold is chosen empirically, and in practice it effectively shields most OOD errors.
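In code the idea is simply the following; the 0.7 threshold is illustrative, not the production value:

```python
import numpy as np

def classify_with_threshold(proba_row, class_ids, threshold=0.7):
    """Return the matched class ID, or None when no class clears the threshold
    (the input is then treated as out-of-distribution and rejected)."""
    best = int(np.argmax(proba_row))
    if proba_row[best] < threshold:
        return None
    return class_ids[best]
```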

Logic control

Some of the model's OOD misjudgments can be caught through logical relations. For example, we assume that the same component cannot appear more than once on a single path of the DSL tree (otherwise it would be nested inside itself); if it does, we keep only the recognition result with the highest confidence. This kind of logic helps us sift out most of the misjudgments.
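A minimal sketch of this self-nesting check, assuming each recognition result carries the matched DSL node ID, the predicted component ID and a confidence, and that the tree is available as a child-to-parent map; the field names are illustrative:

```python
def prune_self_nesting(results, parent_of):
    """When the same component ID appears more than once on a root-to-node
    path of the DSL tree, keep only the highest-confidence hit."""
    keep = set(range(len(results)))
    for i, a in enumerate(results):
        for j, b in enumerate(results):
            if i == j or a["component_id"] != b["component_id"]:
                continue
            node = parent_of.get(a["node_id"])  # is b's node an ancestor of a's?
            while node is not None:
                if node == b["node_id"]:
                    # same component nested in itself: drop the weaker hit
                    keep.discard(i if a["confidence"] < b["confidence"] else j)
                    break
                node = parent_of.get(node)
    return [results[k] for k in sorted(keep)]
```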

Negative sample entry

We provide a feedback service through which users can upload misrecognized DSLs, which are then augmented into a certain number of negative samples and stored. Retraining on this basis alleviates the OOD problem.

At present OOD problems are still avoided through logic and feedback rather than solved at the algorithm level; solving them there is what we plan to do in the next stage.

Model deployment

The algorithm platform supports deploying models as online interfaces, i.e. prediction services, which can be invoked with one click from the imgcook platform. To automate the training and deployment process we also did a series of algorithm engineering work, which will not be detailed here.

Prediction and feedback

For the prediction service, the input is the DSL (JSON) extracted from the design draft, and the output is the business module information, including its ID, its location on the design draft, and so on.

Before calling the algorithm platform's prediction interface, we added logical filters (a small sketch follows the list), including:

1. Size filtering: if a node's size deviates too much from the module's, it does not enter the prediction logic and is treated directly as a mismatch.

2. Hierarchy filtering: leaf nodes (i.e. pure text or pure image nodes) are not considered to carry business meaning on their own, so they do not enter the prediction logic.
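A small sketch of these two filters, assuming the target module's reference size is known; the tolerance and field names are illustrative:

```python
def should_predict(node, ref_width, ref_height, tol=0.3):
    """Decide whether a DSL node is sent to the prediction service."""
    # Size filter: skip nodes whose size deviates too far from the module's.
    w, h = node.get("width", 0), node.get("height", 0)
    if abs(w - ref_width) > ref_width * tol or abs(h - ref_height) > ref_height * tol:
        return False
    # Hierarchy filter: leaf nodes (pure text / pure image) carry no business meaning.
    if not node.get("children"):
        return False
    return True
```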

The result feedback link includes automatic result checking and manual user feedback. At present it only provides a sample upload function for mispredicted results.

Our business module recognition feature first went online during the 99 Promotion. The model, pre-filtering logic, OOD avoidance and other measures described above ultimately brought the recognition accuracy in business scenarios to 100% (the accuracy of the pure model alone was not measured).

Future work

Algorithm optimization

Solving the difficult problem

As mentioned above, OOD is a difficult problem that has not yet been solved well. We have some candidate solutions and plan to try them in follow-up work.

Loss function optimization based on a DNN: the DNN would still be built on the manually extracted UI feature values. By optimizing the loss function, the distance between different categories is enlarged and the distance within the same category is compressed, and a distance threshold is then set on the trained model to identify OOD data.

Automatic negative-sample generation: based on the XGBoost algorithm, add a preliminary binary classifier to distinguish in-set from out-of-set data, and optimize the random range of negative-sample generation accordingly. The specific plan still needs investigation.

Deep learning

Although manual feature extraction is fast and effective, its generalization ability cannot match that of deep learning methods such as CNNs. We will therefore try image-based algorithms: use a CNN model to extract UI feature vectors, and then compare the similarity between the input and each UI component through vector distance calculation or a binary classification model.
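Only the planned comparison step is easy to illustrate today: once a CNN produces UI feature vectors, similarity can be scored by vector distance. The embeddings below are placeholders and the 0.9 threshold is illustrative:

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_component(query_vec, component_vecs, threshold=0.9):
    """Return the index of the best-matching component, or None if nothing is close enough."""
    scores = [cosine_similarity(query_vec, v) for v in component_vecs]
    best = int(np.argmax(scores))
    return best if scores[best] >= threshold else None
```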

More can be done in the field of deep learning, not limited to the above algorithm.

Sample platform

At present our sample generation function has problems such as low configuration efficiency and few supported algorithm types. In follow-up work we therefore plan a richer product design for sample generation.

The product functions of the sample platform are roughly shown below:

**Sample sources:** currently our sample generation pipeline goes from Sketch to ODPS table data; in future business scenarios we hope to also support sample generation from HTML and front-end code. Regardless of the source, there is much common ground at the data augmentation layer, where we will abstract out common augmentation algorithm services and open them up for invocation.

**Algorithm extension:** the generated samples can be feature-value tables for multi-classification, or images with annotation data in PASCAL, COCO and other formats for object detection models.

**Augmentation intelligence:** at present users find the sample generation configuration complicated and hard to use, and samples are often unusable because of misoperation. We therefore hope to minimize user operations and generate effective samples quickly by making the data augmentation more “intelligent”.

To sum up, algorithm optimization and the evolution toward a sample platform are the core work of our next phase.