Abstract: At the 7th Global Software Conference, Huawei software development engineer YuJiBo discussed with developers the intelligent operation practice of the Huawei Cloud official website. The talk focused on the content operation process (content production, content analysis, content quality inspection, content distribution, content consumption, and user feedback) and on the business pain points encountered along the way.

This article is shared from the Huawei Cloud community post “Five Key Measures of Intelligent Practice on the Huawei Cloud Official Website [Global Software Conference Technology Sharing]”; original author: technology torch bearer.

The Internet generates huge amounts of content every moment. According to a report from Ruiya, in 60 seconds the Chinese Internet generates 4.2 million voice messages, 8.3 million shared videos, 4.16 million search queries, and 1.65 million Weibo visits.

With so much content, how do we run website content operations well?

At the 7th Global Software Conference, Huawei software development engineer YuJiBo talked with developers about the intelligent operation practice of the Huawei Cloud official website: the stages of content operation, from content production through user feedback, and the business pain points met along the way.

The talk also covered how Huawei Cloud uses AI algorithms and models to provide automation, reduce labor costs, and improve content quality and content distribution efficiency.

How do we judge content quality, and what are the keys to distributing content effectively?

In the digital age, traffic is the key to website content operation, and the good experience created by high-quality content and efficient content distribution is the basis for traffic growth. A negative example is an Indian media outlet that mistakenly used Putin’s photo in its coverage of a sexual assault case; positive examples are news, e-commerce, and video sites that distribute content through recommendation and search.

So how does the Huawei Cloud official website, as a content website, do it?

First, the content life cycle and content operation process of Huawei Cloud. Content operation on the Huawei Cloud official website is divided into six stages: content production, content analysis, content quality inspection, content distribution, content consumption, and user feedback. The pages, documents, audio, video, and pictures on the official website are first analyzed and understood. After the content passes review, operations staff publish it to the live network. After consuming content on the official website, end users give feedback through internal and external platforms.

In the process of content operation, our pain points are the following:

1. A large amount of multimedia content (audio, video, pictures, etc.) requires in-depth semantic analysis before its quality can be judged and it can be distributed effectively, which is time-consuming and labor-intensive;

2. A large volume of content is released and updated frequently, so quality inspection consumes a lot of manpower and is inefficient;

3. Traditional manual operation configuration cannot meet the personalized needs of complex customer groups, which easily reduces user interest and leads to user loss;

4. The access experience of end users cannot be effectively collected, analyzed, and followed through to closure, which hinders rapid improvement of product experience.

To address these problems, we use intelligent solutions to resolve the business pain points at each stage:

1. In content analysis, OCR, ASR, NLP, and other technologies automatically extract structured information from content to reduce labor costs;

2. In content audit, NLP technology and the Huawei Cloud Moderation service perform machine review;

3. In content distribution, the structured information of the content (TDK, tags, categories, etc.), together with intelligent recommendation, intelligent search, and related technologies, improves the efficiency and accuracy of distribution and the user experience;

4. In user feedback, NLP techniques perform sentiment analysis and classify user voices, so that issues are handled promptly, the loop is closed, and product improvement suggestions are produced continuously.

The following describes Huawei Cloud’s intelligent operation practices in detail.

The key measures of the official website intelligent operation practice

First, the overall architecture of intelligent operation for the Huawei Cloud official website. The architecture is relatively simple and consists of several key layers.

At the bottom is the basic service layer. All our businesses are built on Huawei Cloud services, including the AI-related OCR, ASRC, NLP, RES, and ModelArts; the big-data-related DLI and MRS; and basic SQL and NoSQL storage services. Above the basic service layer is the core data layer, which holds user profiles, behavior data, item information, and other data. The middle layer is feature engineering and algorithm models, where the models mainly cover NLP, intelligent recommendation, and intelligent search. Above that, we built service components to support different business scenarios, including profile and tag components, policy management and ranking components, and A/B testing and log collection components. At the top are the application scenarios, mainly five: personalization (“a thousand faces for a thousand users”), recommendation, search, public opinion, and intelligent Q&A.

Below I highlight several key measures of the intelligent practice.

Key move 1: Content analysis

In the content analysis stage, we use Huawei Cloud OCR and ASR to extract text from pictures, audio, and video, which enables the automatic content audit that follows. We also use NLP techniques to extract structured information such as keywords, abstracts, tags, categories, and topics from the text, for search engine optimization and for model training in the content distribution stage.

Key move 2: Content quality inspection

After text extraction and semantic understanding, we inspect content by automated means, including text error correction, content audit, and normative checks. Text error correction provides pinyin-based correction, correction based on N-gram substrings, and correction based on a language model. The business needs to update the keywords and corpus regularly, and the model is retrained regularly as well.
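To make the N-gram idea concrete, here is a minimal sketch, not the production implementation. It assumes a tiny whitespace-tokenized corpus and a hand-written confusion set (both hypothetical; in practice the candidates would come from pinyin similarity or a business lexicon) and keeps a substitution only when it raises a simple bigram-count score.

```python
from collections import Counter


def bigrams(tokens):
    """Return adjacent token pairs, padded with sentence boundary markers."""
    padded = ["<s>"] + tokens + ["</s>"]
    return list(zip(padded, padded[1:]))


def train_bigram_counts(corpus):
    """Count bigrams over a list of whitespace-tokenized sentences."""
    counts = Counter()
    for sentence in corpus:
        counts.update(bigrams(sentence.split()))
    return counts


def fluency(tokens, counts):
    """Score a token sequence by summing the corpus counts of its bigrams."""
    return sum(counts[pair] for pair in bigrams(tokens))


def correct(sentence, counts, confusion):
    """Replace a token with a confusion-set candidate if the score improves."""
    tokens = sentence.split()
    for i, token in enumerate(tokens):
        best, best_score = token, fluency(tokens, counts)
        for candidate in confusion.get(token, []):
            trial = tokens[:i] + [candidate] + tokens[i + 1:]
            trial_score = fluency(trial, counts)
            if trial_score > best_score:
                best, best_score = candidate, trial_score
        tokens[i] = best
    return " ".join(tokens)


# Hypothetical corpus and confusion set, for illustration only.
CORPUS = [
    "elastic cloud server",
    "elastic cloud server pricing",
    "object storage service",
]
CONFUSION = {"sever": ["server"]}
COUNTS = train_bigram_counts(CORPUS)

fixed = correct("elastic cloud sever", COUNTS, CONFUSION)  # "elastic cloud server"
```

Updating the keywords and corpus, as described above, corresponds here to extending CORPUS and CONFUSION and recomputing COUNTS.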

Content audit connects to the Huawei Cloud Moderation service, which can moderate text, images, and video; the business only needs to update the sensitive-word lexicon regularly. There are also normative checks, covering 404 dead links, TDK information, monetary units, and so on, which are implemented mainly with a crawler service and a rule engine.
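A rule engine for normative checks can be as simple as a list of functions applied to each crawled page. The sketch below is an illustration under assumptions: the page dictionary layout, the field names, and the two rules are hypothetical, and the crawler is assumed to have already recorded the HTTP status of each link.

```python
def check_tdk(page):
    """Flag any empty title, description, or keywords (TDK) field."""
    return [
        f"missing {field}"
        for field in ("title", "description", "keywords")
        if not page.get(field, "").strip()
    ]


def check_dead_links(page):
    """Flag links for which the crawler recorded HTTP status 404."""
    statuses = page.get("link_status", {})
    return [f"dead link: {url}" for url, status in sorted(statuses.items()) if status == 404]


# The rule set is a plain list, so new checks can be added without touching the runner.
RULES = [check_tdk, check_dead_links]


def run_checks(page, rules=RULES):
    """Apply every rule to the page and collect the issues in rule order."""
    issues = []
    for rule in rules:
        issues.extend(rule(page))
    return issues


# Hypothetical crawled page, for illustration only.
PAGE = {
    "title": "Elastic Cloud Server",
    "description": "",
    "keywords": "ecs, cloud server",
    "link_status": {"/product/ecs": 200, "/product/old-page": 404},
}
ISSUES = run_checks(PAGE)
```

A monetary-unit check would be one more function appended to RULES; it is left out here because the source does not say what that rule looks like.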

Key move 3: Content distribution – intelligent recommendation

In the content distribution stage, we mainly introduced intelligent recommendation and intelligent search. Intelligent recommendation predicts users’ interests from user profiles, item profiles, and user behavior, so that content finds the right users, recommendations are accurate, and the conversion rate improves.

The system architecture of Huawei Cloud intelligent recommendation is as follows. Offline, DLI processes the data stored in OBS to extract user profiles, item profiles, and user behavior information, and is also used for feature engineering and for training the recall and ranking models. The trained models are published to the ModelArts platform, which provides online inference.

We also support real-time recommendation. The business uploads user and item information through a DIS channel, and user and item profiles are updated in real time. Real-time behavior is then ingested through the DIS channel to update user interest tags and recall a real-time candidate set. Finally, when a user visits an official website page, the page calls the ModelArts interface, which returns the ranked recommendations.

Key move 3: Content distribution – recommendation algorithm

Recommendation algorithms in the industry are relatively mature. We adopt common recall and ranking algorithms, including collaborative filtering and interest matching; ranking mainly uses LR and DeepFM. LR is a simple, efficient model with little computation, but on its own it cannot model interactions between features. DeepFM combines low-order and high-order feature interactions, and it tends to become more accurate as more features are available.
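The difference between LR and the factorization-machine part of DeepFM can be shown in a few lines. This is a sketch with made-up weights and embeddings, not a trained model: LR scores each feature independently, while the FM term adds a learned weight for every pair of active features. DeepFM additionally feeds the same embeddings to a deep network for higher-order interactions, which is omitted here.

```python
import math


def sigmoid(z):
    """Map a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def lr_logit(x, w, bias):
    """Logistic regression logit: each feature contributes independently."""
    return bias + sum(wi * xi for wi, xi in zip(w, x))


def fm_interaction(x, v):
    """Second-order FM term: sum over i < j of <v_i, v_j> * x_i * x_j."""
    total = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            total += dot * x[i] * x[j]
    return total


# Hypothetical one-hot-style features, weights, and 2-d embeddings.
x = [1.0, 1.0, 0.0]
w = [0.2, 0.1, -0.3]
v = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

lr_score = sigmoid(lr_logit(x, w, bias=0.0))
fm_score = sigmoid(lr_logit(x, w, bias=0.0) + fm_interaction(x, v))
```

With these values only features 0 and 1 are active, so the FM term reduces to the single dot product of their embeddings, which is exactly the pairwise signal LR has no parameter for.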

In the end, intelligent recommendation brought many improvements to the business: content distribution time dropped from hours to minutes, and content push coverage rose to 90%+.

In addition, click-through rates for product and event recommendations on the website, conversion rates for sign-ups and purchases, and click-through rates for blog recommendations on the community home page all increased.

From intelligent recommendation for content distribution, we also drew some lessons:

  • For business scenarios with little data, prefer simple, highly interpretable algorithms for the first launch, then optimize and verify the effect quickly through A/B testing.

  • Make full use of users’ near-line behavior and search behavior. Near-line behavior reflects users’ real-time interests, and search generally reflects what content users need, so both help improve business metrics.

  • In recommendation scenarios, no algorithm is omnipotent. Select an appropriate algorithm based on the scenario, user and service characteristics, and data analysis results.

Key move 4: Content distribution – Intelligent search

Another key measure of intelligent distribution is intelligent search. According to usage statistics and the analysis of the heat map on the right, users pay the most attention to the structured card and the top-ranked articles in the search results, and attention drops off further down. Our search optimization therefore focuses on three aspects: 1. Intelligent card recall; 2. Search recall optimization; 3. Search ranking optimization.

Intelligent card recall

For intelligent card recall, we mainly use a FastText model to predict the card category for a user’s search term, which is a text classification task. The input layer is the vectors of the words that make up the query, and the output layer is a softmax layer that outputs the predicted cards and their probabilities.

We also optimized the structure of the hidden layer. The original structure averages the stacked word vectors, which is fast to compute but loses information, so we changed the hidden layer to concatenate the embeddings and pass them through a fully connected layer.
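The information loss from averaging is easy to see in a toy example. The sketch below uses made-up 2-d embeddings and shows only the pooling step (the fully connected layer and softmax that would follow are omitted): averaging gives the same vector for any word order, while concatenation with padding keeps each word in its own slot. The `max_len` padding length is an assumption of this sketch.

```python
# Hypothetical 2-d word embeddings, for illustration only.
EMBEDDINGS = {
    "cloud": [1.0, 0.0],
    "server": [0.0, 1.0],
    "price": [0.5, 0.5],
}
DIM = 2


def average_pool(words):
    """Original FastText-style hidden layer: element-wise mean of word vectors."""
    vectors = [EMBEDDINGS[w] for w in words]
    return [sum(vec[d] for vec in vectors) / len(vectors) for d in range(DIM)]


def concat_pool(words, max_len=3):
    """Modified hidden-layer input: word vectors concatenated, zero-padded to max_len."""
    vectors = [EMBEDDINGS[w] for w in words[:max_len]]
    vectors += [[0.0] * DIM for _ in range(max_len - len(vectors))]
    return [value for vec in vectors for value in vec]


avg_a = average_pool(["cloud", "server"])
avg_b = average_pool(["server", "cloud"])
cat_a = concat_pool(["cloud", "server"])
cat_b = concat_pool(["server", "cloud"])
```

Here avg_a and avg_b are identical, while cat_a and cat_b differ, so a layer on top of the concatenated input can still tell the two queries apart, at the cost of a larger input and more computation.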

Recall optimization based on RNN-Attention-DSSM

We use the RNN-Attention-DSSM model to optimize search recall. Traditional ES queries recall by keyword matching, so documents whose keywords do not match but whose meaning does cannot be recalled. DSSM takes massive click logs of query and doc exposure, uses a DNN to represent each query and doc as a low-dimensional semantic vector, measures the distance between the two vectors with cosine similarity, and in this way trains a semantic similarity model. RNN-Attention-DSSM further optimizes DSSM by using an RNN and an attention mechanism to take the context of a sentence into account.

The RNN-Attention-DSSM model is structured as follows. The top layer is a typical DSSM layer, which computes semantic similarity from the vector distance between the query and the positive and negative documents and then applies softmax. The training goal is to maximize the probability of the positive document given the query. The lower left part is a typical GRU network, and the right part is a typical self-attention model.
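The top DSSM layer described above can be sketched as follows, assuming the query and document vectors have already been produced by the encoders (the GRU and self-attention parts are not shown). The smoothing factor `gamma` and the example vectors are assumptions of this sketch, and vectors are assumed to be non-zero.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def click_probabilities(query_vec, doc_vecs, gamma=10.0):
    """Softmax over scaled cosine similarities between the query and each doc."""
    logits = [gamma * cosine(query_vec, doc) for doc in doc_vecs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dssm_loss(query_vec, positive_vec, negative_vecs, gamma=10.0):
    """Negative log-probability of the positive doc among positive + negatives."""
    probs = click_probabilities(query_vec, [positive_vec] + negative_vecs, gamma)
    return -math.log(probs[0])


# Hypothetical semantic vectors, for illustration only.
query = [1.0, 0.0]
clicked = [0.9, 0.1]
unclicked = [[0.0, 1.0], [-1.0, 0.0]]

probs = click_probabilities(query, [clicked] + unclicked)
loss = dssm_loss(query, clicked, unclicked)
```

Minimizing this loss over the click log is the same as maximizing the probability of the positive document given the query.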

Our training data is as follows: positive samples are docs clicked for a query, negative samples are randomly selected from docs not clicked for that query, and the ratio of positive to negative samples is 1:4. The query input is the user’s query text, and the doc input is the document title + book name.
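The 1:4 sampling can be sketched like this. The click-log format (a list of query and doc-id pairs) and the candidate pool are assumptions; the source does not say whether negatives are drawn from all docs or only from exposed ones, so this sketch simply draws from every doc the query never clicked.

```python
import random


def build_training_samples(click_log, all_docs, negatives_per_positive=4, seed=0):
    """Pair each clicked doc with randomly chosen docs the query never clicked."""
    rng = random.Random(seed)
    clicked = {}
    for query, doc in click_log:
        clicked.setdefault(query, set()).add(doc)

    samples = []
    for query, doc in click_log:
        unclicked = sorted(set(all_docs) - clicked[query])
        # rng.sample raises ValueError if fewer unclicked docs exist than requested.
        negatives = rng.sample(unclicked, negatives_per_positive)
        samples.append({"query": query, "positive": doc, "negatives": negatives})
    return samples


# Hypothetical click log and doc ids, for illustration only.
ALL_DOCS = ["d1", "d2", "d3", "d4", "d5", "d6", "d7"]
CLICK_LOG = [("ecs price", "d1"), ("ecs price", "d2")]
SAMPLES = build_training_samples(CLICK_LOG, ALL_DOCS)
```

Each resulting sample carries one positive and four negatives, which matches the group the softmax in the DSSM layer is computed over.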

Ranking optimization based on the learning-to-rank algorithm RankNet

We also use the RankNet model to rank the search recall results, placing highly relevant docs first to improve the accuracy of search results and the user experience. RankNet is a pairwise method: it does not care about the specific relevance value between a doc and the query, but turns the problem of ranking all docs into the problem of ordering any two docs. For a pair there are three cases: doc_i is more relevant than doc_j, doc_j is more relevant than doc_i, or the two are equally relevant. These are labeled 1, -1, and 0 respectively.

As shown in the figure above, the RankNet process is as follows: features are extracted from the user’s query and the recalled articles on the left side, a DNN computes a score for each document, and the score differences are then computed in pairs. A sigmoid function then constrains the value to (0, 1).

At present, labels come from comparing the click counts of documents in pairs: the document with fewer clicks gets -1, equal clicks get 0, and more clicks get 1. The comparison value is then linearly rescaled to 0, 0.5, or 1. The goal of training is to bring the model’s output as close as possible to the label, using a cross-entropy loss function.
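Put together, the pairwise label, the rescaling, the sigmoid, and the cross-entropy loss look roughly like this. It is a minimal sketch: the document scores are passed in as plain numbers where the real system would use the DNN output, and extreme score differences (where the sigmoid saturates to exactly 0 or 1) are not guarded against.

```python
import math


def pair_label(clicks_i, clicks_j):
    """1 if doc i has more clicks than doc j, -1 if fewer, 0 if equal."""
    if clicks_i > clicks_j:
        return 1
    if clicks_i < clicks_j:
        return -1
    return 0


def target_probability(label):
    """Linearly rescale a label in {-1, 0, 1} to {0, 0.5, 1}."""
    return (label + 1) / 2


def predicted_probability(score_i, score_j):
    """Sigmoid of the score difference: model's belief that doc i ranks above doc j."""
    return 1.0 / (1.0 + math.exp(-(score_i - score_j)))


def ranknet_loss(score_i, score_j, clicks_i, clicks_j):
    """Cross entropy between the rescaled click label and the predicted probability."""
    target = target_probability(pair_label(clicks_i, clicks_j))
    p = predicted_probability(score_i, score_j)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


# Doc i has 10 clicks and doc j has 3; compare a correct and a reversed ordering.
loss_correct_order = ranknet_loss(3.0, 1.0, clicks_i=10, clicks_j=3)
loss_wrong_order = ranknet_loss(1.0, 3.0, clicks_i=10, clicks_j=3)
```

The loss is small when the model scores the more-clicked document higher and large when it does not, which is what pushes highly relevant docs toward the top of the results.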

Our intelligent search has also brought good results. With both intelligent card recall and ranking optimization, the search click-through rates for the Top1000 and Top5000 have increased.

Our next plans are: first, to further improve the offline metrics of the ranking model by selecting a richer feature set, based on business understanding and feature selection, and finding more relevance-related features; second, to distinguish short queries from long ones and build a separately trained model for short queries to improve their ranking accuracy; finally, to mine user search intent further with NLU, to address unclear search intent.

Key move 5: Experience closed loop – Sentiment analysis and user-voice classification

Analyzing and fixing user experience problems is an important way to keep improving product experience. We mainly use NLP to analyze user sentiment and to classify and route experience problems. The logical view is as follows:

After internal and external user voices are ingested, the data is deduplicated, cleaned, and stored in the database. NLP and other capabilities then perform sentiment analysis and voice classification: negative voices trigger timely public opinion alarms, while product experience problems and requirements are tracked to closure through bug tickets and requirement tickets respectively. We also have an operations management platform for public opinion configuration, tracking of key public opinion, sentiment feedback, and dashboard data presentation. The model used here is also relatively simple: the bottom layer is a pre-trained BERT model, with a classification model connected downstream.

Finally, our results are as follows:

1. The accuracy of negative sentiment analysis reached 95%+;

2. The workload of sentiment analysis dropped sharply, and the manpower required was reduced;

3. Handling of negative sentiment sped up from hours to minutes;

4. Based on the classification of experience problems, cloud services were pushed to close the loop on 50+ effective improvement suggestions.

Our lessons are: 1. Define categories as clearly as possible so they are easy to distinguish and ambiguity is reduced; 2. Provide the annotated corpus in small batches at high frequency and spot-check its quality by sampling; if accuracy is below 95%, the batch is re-labeled.
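The sampling quality check in the second lesson could be automated along these lines. This is a sketch under assumptions: a batch is a non-empty dictionary from item id to annotated label, and a reviewer’s labels for the same ids are available for comparison; both structures and the sample size are hypothetical.

```python
import random


def needs_relabel(batch, reviewer_labels, sample_size, threshold=0.95, seed=0):
    """Spot-check a labeled batch; return (relabel?, sampled accuracy)."""
    rng = random.Random(seed)
    sampled_ids = rng.sample(sorted(batch), min(sample_size, len(batch)))
    correct = sum(batch[i] == reviewer_labels[i] for i in sampled_ids)
    accuracy = correct / len(sampled_ids)
    return accuracy < threshold, accuracy


# Hypothetical batch of 20 annotated items, 2 of which the reviewer disagrees with.
REVIEWER = {i: "negative" if i % 2 else "neutral" for i in range(20)}
BATCH = dict(REVIEWER)
BATCH[3] = "neutral"
BATCH[8] = "negative"

relabel, accuracy = needs_relabel(BATCH, REVIEWER, sample_size=20)
```

Checking every item here gives an accuracy of 0.9, below the 95% bar, so this batch would be sent back for re-labeling.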

Summary of engineering practice

Our engineering practice is relatively simple: on the Huawei Cloud ModelArts one-stop development platform we built data processing, model training, model management, and deployment capabilities, and on DGC scheduled jobs we built continuous model training and release.

Other things we’re doing to make our content operations smarter include:

  • Improving the accuracy of text classification and information extraction with the pre-training capability of the Huawei Cloud Pangu NLP model;

  • Using AI algorithms to generate article content intelligently from the keywords and new features of Huawei Cloud products;

  • Establishing association relationships among Huawei Cloud content based on deep semantic mining and structured information, building unified content life cycle management, and building a knowledge graph from those relationships for intelligent recommendation and search;

  • Scoring article quality with a multi-task model based on page visuals, information content, and semantic depth, to improve content quality.

Giveaway

Now that you have seen the key measures of intelligent practice on the Huawei Cloud official website, do you have any takeaways or questions to share? You are welcome to leave them in the comments section of the original article. We will draw 3 commenters, invite experts to talk with them one-on-one, and send each a developer gift package.

At this event, two other Huawei experts also shared a website high-availability assurance scheme and front-end low-code practice, and answered questions developers care about, such as the best approach to website high availability and how to choose a low-code platform. You are welcome to scan the code and watch the video.

Finally, attached are the technology-sharing slides by Huawei front-end R&D engineer Guo Xiao from this Global Software Conference. Click “Five Key Measures of Intelligent Practice on Huawei Cloud official website” to download and view them at the end of this article.
