Hello, everyone. Today I'd like to talk about a routine part of an algorithm engineer's daily work: what to do when a model flops.

We all know that algorithm engineers focus on model training, and many spend their days building features, tuning parameters, and training models. So the most common problem an algorithm engineer runs into is a model flopping: the model you just trained performs terribly. Many beginners are at a loss when this happens and have no idea where the problem lies.

So today I'd like to share some simple, personal lessons on how to deal with this situation.

Check the samples

The investigation of poor training results can follow an order from large to small and from shallow to deep. In other words, start with the overall, macro-level checks, and only then dig into the details.

Many beginners are a bit rash: they jump straight into inspecting individual features and skip the overall checks. As a result, they burn a lot of time only to discover that the sample ratio or the sample count was wrong, something that would have been easy to spot up front. Not only is this a waste of time, it also looks bad to your boss and colleagues.

So start with the big picture: check the ratio of positive to negative samples and check the number of training samples, and compare them against your usual experiments to see whether anything has changed. These checks are usually simple and may give you an answer within a few minutes. If you find a problem, great. If you don't, that's fine too: at least you have ruled out some possible causes.
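As a minimal sketch of this first check, assume the training data sits in a pandas DataFrame with a binary `label` column (the column name and file names here are hypothetical):

```python
import pandas as pd

def sanity_check_samples(df: pd.DataFrame, label_col: str = "label") -> None:
    """Print the overall sample count and the positive/negative ratio."""
    total = len(df)
    pos = int((df[label_col] == 1).sum())
    neg = total - pos
    print(f"total samples: {total}")
    print(f"positive: {pos} ({pos / total:.2%}), negative: {neg} ({neg / total:.2%})")

# Compare today's run against a previous, known-good run, e.g.:
# sanity_check_samples(pd.read_parquet("train_new.parquet"))
# sanity_check_samples(pd.read_parquet("train_baseline.parquet"))
```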

After checking the sample ratio and count, we can move on to the distribution of the features and see whether anything is wrong with the new ones. A lot can go wrong here. For example, if most values of a feature are empty, there are two possibilities. One is a bug in the code that generates the feature. The other is that the feature itself is sparse, with only a small fraction of samples having a value. In my experience, a feature that is too sparse performs poorly and may even hurt the model, so it is better left out.
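A quick way to separate those two cases is to measure coverage, i.e. the fraction of samples where each feature is non-null. A small sketch, again assuming a pandas DataFrame (the feature names are placeholders):

```python
import pandas as pd

def feature_coverage(df: pd.DataFrame, feature_cols: list[str]) -> pd.Series:
    """Fraction of samples where each feature is non-null, sorted ascending."""
    return df[feature_cols].notna().mean().sort_values()

# Features with very low coverage are either buggy upstream or too sparse to help:
# print(feature_coverage(train_df, ["new_feat_a", "new_feat_b"]))
```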

Another possible problem is a badly skewed value range. For example, 80% of a feature's values are below 10 while the remaining 20% can reach up to 1,000,000. Such an extremely unbalanced distribution will also drag down the model's performance; a better approach is to segment the range and turn it into a bucketized feature. In general, feature problems are easy to spot by looking at the distributions.
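One common way to do that segmentation is equal-frequency bucketing on quantiles; here is a sketch using pandas (the `price` feature is just an illustration):

```python
import pandas as pd

def quantile_bucketize(s: pd.Series, n_buckets: int = 10) -> pd.Series:
    """Map a heavily skewed numeric feature to equal-frequency bucket ids."""
    # pd.qcut splits on quantiles, so each bucket holds roughly the same
    # number of rows; duplicates="drop" handles repeated boundary values.
    return pd.qcut(s, q=n_buckets, labels=False, duplicates="drop")

# Example: 80% of values below 10, a long tail reaching 1,000,000.
# train_df["price_bucket"] = quantile_bucketize(train_df["price"])
```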

View the training curve

Many novices judge a model only by its final metrics, such as AUC or accuracy, and ignore how the model evolves over the course of training. This is not a good habit: it throws away a lot of information and lets many situations go unnoticed.

A good habit is to use TensorBoard to watch the training process; it is supported by essentially all mainstream deep learning frameworks. Through it we can see how key metrics change during training, and its main value is helping us spot overfitting or underfitting.
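Wiring it up takes only a few lines. A minimal sketch assuming PyTorch; `train_one_step` here is a dummy stand-in for your real training step, and the run name is made up:

```python
import random
from torch.utils.tensorboard import SummaryWriter

def train_one_step() -> tuple[float, float]:
    # Stand-in for one real optimizer step; returns (loss, auc).
    return random.random(), random.random()

writer = SummaryWriter(log_dir="runs/exp_new_features")  # hypothetical run name
for step in range(1000):
    loss, auc = train_one_step()
    writer.add_scalar("train/loss", loss, step)  # curve, not just the final value
    writer.add_scalar("train/auc", auc, step)
writer.close()

# Then inspect the curves with: tensorboard --logdir runs
```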


One situation we often see: on the original feature set the model is fine, but as soon as we add some new features the performance drops. We check the logs, see that AUC or other metrics were still rising shortly before training finished, and mistakenly conclude that nothing is wrong. In fact, the model may well have started overfitting somewhere in the middle of the run, which went unnoticed because training takes so long.

Many people, especially beginners, get burned this way: a lot of time wasted and the problem never found. Had we simply opened TensorBoard, we would have seen the model overfitting or underfitting midway through and could have taken targeted measures to fix it.

Check the parameters

Besides the two checks above, there is a third checkpoint: the parameters.

The parameters here are not limited to training parameters such as the learning rate, the number of iterations, or batch_size. They also include parameters of the model itself, such as the variance used to initialize the embeddings, the embedding size, and so on.

For a simple example, many implementations initialize embeddings with a default variance of 1. For many scenarios this is actually a bit too large, especially for deep neural networks, which then become prone to exploding gradients. In many cases, shrinking it to 0.001 improves the results.
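A small sketch of that adjustment, assuming PyTorch (whose `nn.Embedding` does default to weights drawn from N(0, 1)); the vocabulary size and dimension below are placeholders:

```python
import torch.nn as nn

# Hypothetical sizes; substitute your own vocabulary size and dimension.
emb = nn.Embedding(num_embeddings=100_000, embedding_dim=32)

# The default init draws each weight from N(0, 1). For deep networks a much
# tighter spread is often safer; here we shrink the spread to 0.001, in the
# spirit of the adjustment described above.
nn.init.normal_(emb.weight, mean=0.0, std=0.001)
```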

Although the model structure is the main factor and the parameters are only auxiliary, that does not mean the parameters have no effect. On the contrary, the impact is sometimes too large to ignore. Of course, to use them well we need not only to know what each parameter means, but also to understand the model's structure and how it operates, so that we can anticipate what changing a parameter will do. Rote memorization is clearly not enough.

Scenario thinking

The three points above are still fairly obvious, so let's now talk about a subtler one, the one that most tests an algorithm engineer's skill.

Often a model that works well in one scenario does not work well in another, or features that used to help suddenly stop helping. This may not be a hidden bug at all; the model may simply not fit the current scenario.

Take recommendation as an example. On the home page feed we have no additional input signal, so we can only recommend items based on the user's historical behavioral preferences. In that setting we pay extra attention to the intersection and overlap between the user's history and the candidate product's information, and focus on turning that overlap into features. But if the same features migrate to the "guess you like" section at the bottom of a product detail page, they may no longer be appropriate.

The reason is simple. The products recalled for the "guess you like" section at the bottom of a detail page are mostly of the same category, or even the same type, as the product being viewed, which means most of their information is identical or highly similar. With so little differentiation, it is hard for the model to learn anything useful from these features, and hard to reproduce the earlier gains. Nothing is wrong with the feature or the model; the scenario simply does not fit.
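To make the point concrete, here is a hypothetical overlap feature of the kind described above; the function name and inputs are illustrative, not from any particular system:

```python
def category_overlap(user_history_categories: set[str],
                     item_categories: set[str]) -> float:
    """Jaccard overlap between the user's historical categories and the item's."""
    if not user_history_categories or not item_categories:
        return 0.0
    inter = user_history_categories & item_categories
    union = user_history_categories | item_categories
    return len(inter) / len(union)

# On a homepage feed this feature varies a lot across candidates, so it carries
# signal. Under "guess you like" on a detail page, nearly every candidate shares
# the same categories, so the feature is almost constant and teaches the model
# nothing.
```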

This understanding of the interplay between scenarios, features, and models is the part that most tests an algorithm engineer's ability and experience. Newcomers often neglect it and stay confined to the features and models themselves. When progress stalls, we need to stop and reflect: what am I missing?

That's all for today's article. I sincerely wish you all a fruitful day. If you liked today's content, please like, share, and follow.
