01 Preface

As Internet user growth reaches an inflection point, competition has shifted toward retaining existing users, and the ultimate user experience has become key. User feedback is the most direct expression of how users experience a product, which makes it an important input for improving and measuring that experience. Because feedback is massive and diverse, business teams can mine useful information from different dimensions to optimize and iterate on the experience:

  1. Product operations can extract typical user needs, quickly gauge users' reactions to new features or content, and support decision adjustments;

  2. R&D and testing can use feedback to quickly surface and fix online quality problems, stop losses in time, and feed findings back into offline testing plans.

In the early stage of handling user feedback, business lines faced the following problems:

  1. Multiple channels produce large volumes of data, making problem extraction costly.

— Feedback channels include phone complaints, online customer service, public opinion, in-app feedback, internal feedback, etc. For a product with 100 million MAU, all channels together produce more than 10,000 pieces of feedback per day. With limited manpower, only a limited set of problems gets exposed, and only head problems can be recalled;

— Each channel handles feedback independently, so the same problem is often followed up by several channels at once, duplicating manpower.

  2. Uneven feedback quality increases the cost of analysis.

— Channels collect information in different ways and with different levels of detail;

— The categories users select are often inaccurate, expression habits vary, and descriptions are frequently incomplete.

  3. The feedback analysis chain is long, and the closed-loop rate and timeliness are unsatisfactory.

— Analyzing and resolving a problem involves multiple roles and hand-offs across business lines to varying degrees, leading to long processing times or no conclusion at all;

— Feedback logs contain limited information, so faults often cannot be located, and the connection rate of follow-up calls to users is low.

  4. There is no effective way to evaluate results, and specific problems are not recalled in time.

There are two ways to implement user feedback analysis:

(1) Rely on the general capabilities of an external professional user-feedback service team. This approach tracks market information such as feedback volume trends, hot topics, and public opinion, but offers weak analysis and mining at the business level;

(2) Build an in-house user feedback analysis system that establishes a closed-loop feedback analysis chain focused on problem mining, with improved user satisfaction as the goal. It relies mainly on automatic classification and clustering to surface head problems through alarms, while mining waist and tail problems according to business characteristics.

Given the company's business characteristics, the iQIYI test team adopted the second approach. It set up a standardized, end-to-end processing mechanism covering feedback mining, analysis and positioning, repair and closed loop, and problem tracking, and built a platform to support it. Mining algorithms help people quickly extract useful information from feedback, and automatic analysis helps locate problems quickly, with the aim of raising the closed-loop rate of problem handling and shortening the processing cycle. This article introduces the overall framework of full-link user feedback processing and the key capabilities of each link.

02 Scheme Design

The user feedback analysis scheme proposed by the iQIYI test team aims to build general-purpose service capabilities suited to the company's business, improve the recall efficiency of head problems, focus on efficient problem identification, and help business teams solve problems to improve the experience:

(1) Ensure accurate recall of head problems through feedback classification plus per-category monitoring and alarms; use feedback clustering to extract hot-spot feedback behind alarms, focus on specific problem phenomena, and reduce the cost of feedback analysis;

(2) Build high-quality feedback mining to quickly identify waist and tail problems, with particular emphasis on recalling single-point problems;

(3) Automatically analyze and locate problems, route them automatically, and measure the closed-loop rate and timeliness of processing;

(4) Provide platform capabilities so that statistics and analysis previously done by hand are computed automatically, and processes and standards are unified.

The overall architecture is as follows. Feedback mining, built on algorithmic capabilities, is the foundation. The platform chains together problem identification, analysis and positioning, repair and closed loop, and effect tracking, and process and result indicators are defined to measure the effect of each stage.

03 Feedback Access

This is the first link in the feedback analysis chain, and it mainly performs data preprocessing. It filters the data arriving from multiple channels to reduce the volume to be processed, aligns the information fed back through different channels, normalizes multi-entry and multi-version data into a fixed set of fields, and maps category information onto a standard taxonomy. The output is standardized, relatively high-quality data for feedback mining and for analysis and positioning.
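To make the preprocessing step concrete, here is a minimal Python sketch, assuming each channel delivers records as flat dictionaries. The channel names, raw field names (`content`, `os`, `app_ver`, ...), and the `FIELD_MAP` and `CATEGORY_MAP` tables are hypothetical; the article does not describe iQIYI's actual schema. The sketch drops empty records, unknown channels, and verbatim duplicates, and maps channel-specific categories onto one standard taxonomy.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Feedback:
    """Standardized feedback record (field names are illustrative assumptions)."""
    channel: str
    text: str
    platform: str
    version: str
    category: str


# Hypothetical per-channel raw field names mapped onto the standard fields.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "in_app": {"text": "content", "platform": "os", "version": "app_ver", "category": "type"},
    "customer_service": {"text": "summary", "platform": "device_os",
                         "version": "client_version", "category": "issue"},
}

# Hypothetical mapping from channel-specific category labels to one standard taxonomy.
CATEGORY_MAP: Dict[Tuple[str, str], str] = {
    ("in_app", "video stuck"): "playback",
    ("customer_service", "playback issue"): "playback",
}


def normalize(channel: str, raw: Dict[str, str]) -> Optional[Feedback]:
    """Map one raw record to the standard format; return None to filter it out."""
    fields = FIELD_MAP.get(channel)
    if fields is None:
        return None  # channel not onboarded yet
    text = raw.get(fields["text"], "").strip()
    if not text:
        return None  # empty feedback carries nothing to analyze
    return Feedback(
        channel=channel,
        text=text,
        platform=raw.get(fields["platform"], "unknown").lower(),
        version=raw.get(fields["version"], "unknown"),
        category=CATEGORY_MAP.get((channel, raw.get(fields["category"], "")), "other"),
    )


def preprocess(records: Iterable[Tuple[str, Dict[str, str]]]) -> List[Feedback]:
    """Normalize all records and drop verbatim duplicates from the same channel."""
    seen = set()
    out: List[Feedback] = []
    for channel, raw in records:
        fb = normalize(channel, raw)
        if fb is None:
            continue
        key = (fb.channel, fb.text)
        if key in seen:
            continue
        seen.add(key)
        out.append(fb)
    return out
```

Keeping the mappings as data rather than code means onboarding a new channel only requires new table entries.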

04 Feedback Mining

As the key part of the whole process, feedback mining has one core task: building effective classification, clustering, and high-quality data identification capabilities. These reduce the cost of extracting problems from huge amounts of data, achieve full recall of head, waist, and tail problems, and provide an important basis for the subsequent links. The following sections focus on these three general capabilities.

1. Multi-stage combined automatic classification

Rule-based classification forms the first level. It targets high precision and uses classification rules to quickly classify feedback with strong keyword features.

Initially, Word2vec similarity was chosen as the second-level classifier, with sample sentences as the comparison objects: if the similarity between a feedback text and a sample sentence exceeded a threshold, the classification was considered successful. However, evaluation showed that some feedback texts had few central words and long content, so overall precision and recall were unsatisfactory. Investigation and experiments showed that fastText's N-gram features reduce the impact of Word2vec's loss of word order on classification. Once a classification model has been trained on samples, calling its prediction method returns the most probable category together with its probability. fastText classification was therefore moved forward to become the second level. Because the fastText classification process is opaque and sample quality could not be evaluated, the threshold at this level was set high, and Word2vec similarity classification was kept as a third-level supplementary recall.
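The three-level cascade can be sketched as follows. This is an illustrative Python version, not iQIYI's implementation: the keyword rules, the thresholds (0.9 and 0.8), and the toy embedding dictionary are assumptions. The second level accepts any `predict(text) -> (label, probability)` callable; a trained fastText model could be wrapped to fit it, since its Python `predict` returns a tuple of `__label__`-prefixed labels and an array of probabilities.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def mean_vector(tokens: Sequence[str], emb: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify(
    text: str,
    tokens: Sequence[str],
    rules: List[Tuple[str, List[str]]],
    predict: Callable[[str], Tuple[str, float]],
    samples: List[Tuple[str, List[str]]],
    emb: Dict[str, Vector],
    model_threshold: float = 0.9,
    sim_threshold: float = 0.8,
) -> Tuple[Optional[str], str]:
    """Return (category, level that decided it), or (None, "unclassified")."""
    # Level 1: keyword rules, tuned for precision on strong keyword features.
    for label, keywords in rules:
        if any(k in text for k in keywords):
            return label, "rule"
    # Level 2: supervised classifier (e.g. fastText); a high threshold offsets unverifiable sample quality.
    model_label, prob = predict(text)
    if prob >= model_threshold:
        return model_label, "model"
    # Level 3: similarity to labeled sample sentences as supplementary recall.
    vec = mean_vector(tokens, emb)
    best_label, best_sim = None, -1.0
    if vec is not None:
        for sample_label, sample_tokens in samples:
            svec = mean_vector(sample_tokens, emb)
            if svec is None:
                continue
            sim = cosine(vec, svec)
            if sim > best_sim:
                best_label, best_sim = sample_label, sim
    if best_label is not None and best_sim >= sim_threshold:
        return best_label, "similarity"
    return None, "unclassified"
```

Returning the deciding level alongside the label makes it easy to measure each level's precision separately, which is what tuning the level-2 threshold requires.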

The feedback classification algorithm is applied to business-line monitoring alarms with hourly, daily, and weekly monitoring windows. The alarm threshold for each business-line category is set dynamically from the mean and standard deviation of the feedback volume (or its rate of change) over the most recent N monitoring cycles, which avoids false alarms caused by normal fluctuations in feedback volume as the business changes. Multi-level monitoring alarms can recall some waist problems while ensuring recall of head problems. After several iterations of the algorithm, classification accuracy improved by 40% and alarm accuracy by 30%.
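A minimal sketch of the dynamic alarm threshold, assuming it is the mean plus k standard deviations of the last N cycles. The article states that the mean and standard deviation are used but not how they are combined, so `k=3.0` and `n=7` are illustrative defaults. The same `should_alarm` check can be fed either raw volumes or the period-over-period rates from `change_rates`.

```python
import statistics
from typing import List, Sequence


def dynamic_threshold(history: Sequence[float], k: float = 3.0) -> float:
    """Alarm threshold = mean + k * population std of the recent monitoring cycles."""
    return statistics.fmean(history) + k * statistics.pstdev(history)


def should_alarm(history: Sequence[float], current: float, n: int = 7, k: float = 3.0) -> bool:
    """Compare the current cycle against a threshold derived from the last n cycles."""
    recent = list(history)[-n:]
    if len(recent) < 2:
        return False  # too little history to estimate the spread
    return current > dynamic_threshold(recent, k)


def change_rates(volumes: Sequence[float]) -> List[float]:
    """Period-over-period change rates, skipping periods whose base volume is zero."""
    return [(b - a) / a for a, b in zip(volumes, volumes[1:]) if a]
```

With a perfectly flat history the standard deviation is zero, so any increase would trigger an alarm; a production version would likely add a floor on the spread or a minimum absolute increase.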

With accurate alarms in place, the team wanted to quickly identify the hot issues within alarm feedback and narrow the scope of analysis, which is achieved with a clustering algorithm, introduced below.

2. Incremental clustering based on time windows

User feedback is a data stream, and stream clustering has three key requirements: single-pass scanning, incremental processing, and temporal locality. Single-pass clustering is an incremental clustering algorithm that processes each document only once. It is especially well suited to streaming data, can meet the real-time requirements of text clustering, and is widely applied to topic detection and tracking, online event monitoring, and other social-media big-data scenarios.
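The single-pass scheme can be sketched as below, assuming feedback arrives as `(timestamp, vector)` pairs, each cluster is represented by its centroid, and a cluster stops absorbing feedback once it has been inactive for longer than `window`. The threshold, the window, and the running-mean centroid update are assumptions for illustration; any vectorization (such as the TF-IDF and Word2vec combination described next) can produce the input vectors.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Cluster:
    centroid: Vector
    members: List[int] = field(default_factory=list)
    last_seen: float = 0.0


def single_pass(items: List[Tuple[float, Vector]], threshold: float, window: float) -> List[Cluster]:
    """Cluster (timestamp, vector) pairs in arrival order, visiting each item exactly once."""
    clusters: List[Cluster] = []
    for idx, (ts, vec) in enumerate(items):
        best: Optional[Cluster] = None
        best_sim = -1.0
        for c in clusters:
            if ts - c.last_seen > window:
                continue  # outside the time window: the cluster no longer absorbs feedback
            sim = cosine(vec, c.centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            n = len(best.members)
            # Incremental centroid update: running mean of the member vectors.
            best.centroid = [(m * n + x) / (n + 1) for m, x in zip(best.centroid, vec)]
            best.members.append(idx)
            best.last_seen = ts
        else:
            clusters.append(Cluster(centroid=list(vec), members=[idx], last_seen=ts))
    return clusters
```

Because each item is compared only with active clusters and then discarded, memory and per-item cost stay bounded by the number of clusters in the window rather than the total stream length.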

TF-IDF is the classic similarity representation used with single-pass clustering. Each piece of feedback is treated as a short text, and vector-space representations are computed for all feedback within a local time window. Cosine similarity then measures the distance between vectors; if it exceeds a threshold, the items are placed in the same cluster. This approach has a major flaw: each vector consists only of per-word feature weights and ignores similarity between words, so clustering precision and recall are unsatisfactory. To compensate, the TF-IDF cosine result is used as a gate, and the similarity between the feedback and the Word2vec word vectors of known clusters is then computed; if that similarity exceeds its threshold, the feedback is assigned to the same cluster. Word2vec and TF-IDF are combined for vectorization by weighting the word vectors with TF-IDF weights, which covers cases where feedback shares the same central words but uses different function words.
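The sketch below shows one possible reading of the combined scheme: a plain TF-IDF cosine check first, then a fallback to TF-IDF-weighted average word vectors. It is an assumption-laden illustration, not the team's code. The smoothed IDF formula, both thresholds, and the toy embeddings are made up; real Word2vec vectors would come from a trained model, and Chinese feedback would need word segmentation before tokenization.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Vector = List[float]


def idf_table(docs: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Smoothed IDF computed over the feedback in the current time window."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf_vector(tokens: Sequence[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse TF-IDF vector; tokens unseen in the window get weight 0."""
    tf = Counter(tokens)
    return {tok: (c / len(tokens)) * idf.get(tok, 0.0) for tok, c in tf.items()}


def weighted_vector(tokens: Sequence[str], emb: Dict[str, Vector], idf: Dict[str, float]) -> Vector:
    """TF-IDF-weighted average of word vectors, so central words outweigh function words."""
    acc = [0.0] * len(next(iter(emb.values())))
    total = 0.0
    for tok, weight in tfidf_vector(tokens, idf).items():
        if tok in emb and weight > 0:
            acc = [a + weight * x for a, x in zip(acc, emb[tok])]
            total += weight
    return [a / total for a in acc] if total else acc


def sparse_cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def dense_cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def same_cluster(a: Sequence[str], b: Sequence[str], emb: Dict[str, Vector],
                 idf: Dict[str, float], tfidf_thr: float = 0.6, emb_thr: float = 0.9) -> bool:
    """Plain TF-IDF cosine first; fall back to weighted word vectors when it misses."""
    if sparse_cosine(tfidf_vector(a, idf), tfidf_vector(b, idf)) >= tfidf_thr:
        return True
    return dense_cosine(weighted_vector(a, emb, idf), weighted_vector(b, emb, idf)) >= emb_thr
```

In this toy setup "crash" and "freeze" share no tokens, so the TF-IDF check fails, but their similar word vectors let the fallback group them; the word present in every document gets the lowest IDF and therefore pulls the weighted vector least.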

The clustering algorithm is applied in three directions:

1) Extracting hot-spot feedback from business categories that trigger monitoring alarms, to determine the problem phenomenon;

2) Automatically identifying whether incremental feedback is related to known problems, so that the same problem is not followed up repeatedly;

3) Correlating feedback in real time to mine small-batch problems.

Because the degree of feature aggregation differs across data sets, a different similarity threshold is set for each of these three scenarios to meet its recall requirement. As shown in the figure below, linking monitoring alarms with feedback clustering makes it possible to find abnormal business categories by monitoring the rate of change of each category's feedback volume, to aggregate hot-spot feedback that helps reproduce problem scenarios, and to associate feedback with known online problems to reduce repeated follow-up.
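Scenario 2 (associating incremental feedback with known problems) under per-scenario thresholds might look like the sketch below. The scenario names and threshold values are made-up placeholders, and representing each known problem by a centroid vector is an assumption; the article says only that thresholds differ per scenario.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]

# Hypothetical thresholds: placeholder values that would be tuned per data set in practice.
SCENARIO_THRESHOLDS: Dict[str, float] = {
    "alarm_hotspot": 0.75,   # alarm feedback is already concentrated in one category
    "known_problem": 0.85,   # stricter, to avoid attaching new issues to old tickets
    "realtime_batch": 0.80,  # small-batch mining over the incoming stream
}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_known_problem(vec: Vector, known: Dict[str, Vector],
                        scenario: str = "known_problem") -> Optional[str]:
    """Return the id of the most similar known problem, or None if none clears the threshold."""
    threshold = SCENARIO_THRESHOLDS[scenario]
    best_id, best_sim = None, -1.0
    for problem_id, centroid in known.items():
        sim = cosine(vec, centroid)
        if sim > best_sim:
            best_id, best_sim = problem_id, sim
    return best_id if best_sim >= threshold else None
```

A stricter threshold for known-problem association trades some missed links for fewer wrong ones, since a wrong link would hide a genuinely new problem behind an existing ticket.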

When feedback clustering is applied to hot-spot feedback from monitoring alarms, up to 15% of alarms can be associated with known online problems, and the reduction in repeated manual effort is already evident.

Feedback clustering gives us a batch-problem mining approach that is very effective for extracting the problem phenomena behind monitoring alarms, but not for recalling small-batch or single-point problems. Feedback clustering is data mining based on content features. Similarly, the quality of a single piece of feedback can be judged from other features, which in principle makes it possible to mine single-point problems.

3. High-quality feedback identification

With feedback quality as the object of evaluation, a multi-dimensional quality evaluation model is built and the factors affecting feedback quality are separated layer by layer, turning the qualitative assessment of feedback quality into a quantitative calculation.

The available features differ depending on whether the user is logged in, so two evaluation schemes are used in practice:

(1) Users who are not logged in

— Scene rationality: for feedback in specific categories, the general positioning and analysis process is abstracted so that a conclusion can be reached through automatic positioning; if the positioning result is abnormal, the feedback is considered high quality. Positioning services that do not depend on the user ID can be applied without restriction, while those that depend on the user ID apply only to logged-in users.

— Content consistency: feedback from different channels includes features such as screenshots, feedback descriptions, and system logs. The more consistent these features are with one another, the higher the feedback quality. Image-content consistency is judged by extracting the text in the image and measuring its relevance to the feedback description. Log-content consistency means extracting key information, such as the album or episode, from the logs; if the feedback description or image contains that key information, the content is considered consistent.
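A minimal consistency check might look like this, assuming the image text has already been extracted (OCR is out of scope here) and the log's key information has already been parsed into strings. The Jaccard overlap of whitespace tokens is a simple stand-in for the relevance measure, which the article does not specify, and the 0.2 threshold is illustrative; Chinese text would need word segmentation first.

```python
from typing import Iterable


def contains_any(text: str, keys: Iterable[str]) -> bool:
    """Case-insensitive check that any non-empty key appears in the text."""
    low = text.lower()
    return any(k.lower() in low for k in keys if k)


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of whitespace tokens, a stand-in for image-text relevance."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def consistency_score(description: str, ocr_text: str, log_keys: Iterable[str],
                      overlap_threshold: float = 0.2) -> int:
    """0-2 score: +1 for image-text agreement, +1 for log key info appearing in the content."""
    keys = list(log_keys)
    score = 0
    if ocr_text and token_overlap(description, ocr_text) >= overlap_threshold:
        score += 1  # screenshot text agrees with the description
    if contains_any(description, keys) or contains_any(ocr_text, keys):
        score += 1  # e.g. the album or episode from the log is mentioned
    return score
```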

(2) Logged-in users

In addition to the scheme above, historical feedback can be analyzed for logged-in users. The evaluation dimensions are content quality, historical adoption rate, and feedback frequency. Content quality is evaluated in terms of text quality and image proportion, and feedback frequency in terms of feedback count, concentration of problem categories, and feedback timing. The Analytic Hierarchy Process (AHP) is used to compute each layer quantitatively and merge the results, so that single-point high-quality feedback can be mined.
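The AHP step can be sketched as below. Weights are derived from a pairwise comparison matrix using the row geometric-mean approximation of the principal eigenvector, with Saaty's consistency ratio as a sanity check. The pairwise judgments and the per-dimension scores in the example are invented for illustration; the article does not publish its matrices.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]

# Saaty's random consistency index, indexed by matrix order n (valid for n <= 9).
RANDOM_INDEX = [0.0, 0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45]


def ahp_weights(matrix: Matrix) -> List[float]:
    """Priority vector approximated by normalized row geometric means."""
    n = len(matrix)
    gm = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(gm)
    return [g / total for g in gm]


def consistency_ratio(matrix: Matrix, weights: Sequence[float]) -> float:
    """CR = CI / RI; judgments are usually considered acceptable when CR < 0.1."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    return ((lambda_max - n) / (n - 1)) / RANDOM_INDEX[n]


def weighted_score(weights: Sequence[float], scores: Sequence[float]) -> float:
    return sum(w * s for w, s in zip(weights, scores))


# Lower layer: content quality = text quality vs. image proportion (illustrative judgments).
content_w = ahp_weights([[1, 2], [0.5, 1]])
content_quality = weighted_score(content_w, [0.8, 0.5])

# Upper layer: content quality vs. historical adoption rate vs. feedback frequency.
top = [[1, 2, 4], [0.5, 1, 2], [0.25, 0.5, 1]]
top_w = ahp_weights(top)
feedback_quality = weighted_score(top_w, [content_quality, 0.6, 0.3])
```

Each lower layer collapses to one score that feeds the layer above, which is how the per-layer results are merged into a single quality value.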

The features currently used to select high-quality feedback include historical feedback analysis, image-text consistency, log consistency, whether automatic positioning reports an anomaly, and whether the feedback forms a cluster of a certain size. These can be extended flexibly as platform capabilities grow, with the aim of improving recall of tail and single-point problems. Identifying high-quality feedback and marking key feedback reduced the volume of feedback requiring follow-up by 80%.

05 Analysis and Positioning

Analysis and positioning builds on feedback classification and focuses on locating common problems in the same business category across multiple channels. Analyzing basic information and server-side information weakens the negative impact of low-quality feedback on problem analysis and addresses the low closed-loop rate caused by incomplete feedback logs. Problems found during feedback mining are analyzed and located differently depending on how they were discovered:

(1) Mid- and waist-level problems recalled by monitoring alarms

Anomalies behind monitoring alarms are analyzed along six dimensions. Time analysis locates when the fault occurred, platform and version determine the scope of affected clients, and region and carrier identify regional network faults; these five dimensions largely complete range localization. The sixth, source clustering, collects source information from the logs within the feedback period; if the number of occurrences of a source reaches a threshold, that source is judged to be faulty.
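The dimension breakdown can be sketched with simple counting, as below. The record field names (`platform`, `version`, `region`, `carrier`), the 60% concentration share, the hourly bucket, and the source threshold are all hypothetical choices; the article does not state its exact rules.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

Record = Dict[str, str]


def concentrated_values(records: List[Record], dim: str,
                        min_share: float = 0.6) -> List[Tuple[str, float]]:
    """Values of one dimension covering at least `min_share` of the alarm-window feedback."""
    if not records:
        return []
    counts = Counter(r.get(dim, "unknown") for r in records)
    total = len(records)
    return [(v, c / total) for v, c in counts.most_common() if c / total >= min_share]


def locate_scope(records: List[Record],
                 dims: Sequence[str] = ("platform", "version", "region", "carrier"),
                 min_share: float = 0.6) -> Dict[str, List[Tuple[str, float]]]:
    """Narrow the affected range dimension by dimension."""
    return {d: concentrated_values(records, d, min_share) for d in dims}


def fault_start(timestamps: Iterable[int], bucket: int, min_count: int) -> Optional[int]:
    """Start of the earliest time bucket whose feedback count reaches `min_count`."""
    buckets = Counter(t // bucket * bucket for t in timestamps)
    hot = sorted(b for b, c in buckets.items() if c >= min_count)
    return hot[0] if hot else None


def faulty_sources(sources: Iterable[str], threshold: int) -> List[str]:
    """Sources (e.g. CDN nodes) appearing in logs at least `threshold` times."""
    return [s for s, c in Counter(sources).items() if c >= threshold]
```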

(2) Tail problems recalled by high-quality feedback mining

The server side provides positioning interfaces and key analysis paths, the front end extracts data features or business flows from logs, and the platform abstracts these into a general positioning process framework.
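One way to picture a general positioning framework is a registry of pluggable checks per business category, as sketched below. The registry, the step functions, and the feedback fields (`log_error_code`, `region`) are hypothetical stand-ins for the server-side interfaces and log features the article mentions; a non-empty result is what would mark a piece of feedback as "automatic positioning abnormal".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Feedback = Dict[str, str]


@dataclass
class Finding:
    step: str
    detail: str


# A locating step inspects one feedback record and returns a Finding when it detects an anomaly.
LocateStep = Callable[[Feedback], Optional[Finding]]

# Hypothetical registry: business category -> ordered analysis path.
REGISTRY: Dict[str, List[LocateStep]] = {}


def register(category: str) -> Callable[[LocateStep], LocateStep]:
    def deco(fn: LocateStep) -> LocateStep:
        REGISTRY.setdefault(category, []).append(fn)
        return fn
    return deco


def locate(feedback: Feedback) -> List[Finding]:
    """Run every registered step for the feedback's category and collect the anomalies."""
    findings: List[Finding] = []
    for step in REGISTRY.get(feedback.get("category", ""), []):
        result = step(feedback)
        if result is not None:
            findings.append(result)
    return findings


@register("playback")
def check_error_code(fb: Feedback) -> Optional[Finding]:
    # Stand-in for a server-side locating interface: read an error code extracted from the client log.
    code = fb.get("log_error_code")
    return Finding("error_code", f"player error {code}") if code else None


@register("playback")
def check_region_outage(fb: Feedback) -> Optional[Finding]:
    # Stand-in for a lookup against known regional incidents (hypothetical data).
    outages = {"north"}
    region = fb.get("region")
    return Finding("region", f"known outage in {region}") if region in outages else None
```

New business lines would add steps by registering functions rather than changing `locate`, which matches the idea of a shared framework with business-specific analysis paths.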

06 Repair and Closed Loop

Multi-role hand-off specifications, closed-loop monitoring, FAQs, and other processes were established to address the unsatisfactory closed-loop rate and long closed-loop cycle caused by the long and frequently interrupted feedback analysis chain:

1) Provide one-click reporting that automatically assigns handlers by problem type and platform, monitor bug closed-loop cycles, and drive bug fixes or conversion into requirements;

2) Turn automatic positioning results and solutions to common problems into intelligent customer service content, helping users solve problems themselves and reducing inquiry feedback;

3) Once a problem is closed, notify users through in-app messages to complete the full loop.
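Automatic handler assignment and closed-loop cycle monitoring from item 1) might be sketched as a routing table plus an overdue check. The table entries, the wildcard convention, the default triage group, and the SLA parameter are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical routing table: (problem type, platform) -> handler group; "*" matches any platform.
ROUTES: Dict[Tuple[str, str], str] = {
    ("playback", "android"): "player-android",
    ("playback", "ios"): "player-ios",
    ("payment", "*"): "payment-team",
}
DEFAULT_HANDLER = "feedback-triage"


def assign_handler(problem_type: str, platform: str) -> str:
    """Exact (type, platform) match first, then a type-level wildcard, then manual triage."""
    return (ROUTES.get((problem_type, platform))
            or ROUTES.get((problem_type, "*"))
            or DEFAULT_HANDLER)


def overdue(created_at: float, now: float, sla_hours: float) -> bool:
    """True when a reported bug has stayed open longer than its SLA (timestamps in seconds)."""
    return now - created_at > sla_hours * 3600
```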

07 Custom Problem Tracking

For known problems or newly launched features and campaigns, tracking tasks can be created from combinations of multi-dimensional features (e.g. category, content, keywords, platform and version, device, region, carrier). This makes it easy to track and compare data trends, observe whether a problem has been solved, and evaluate the effect of new features or campaigns. Custom monitoring alarms are also supported. As shown in the figure below, with a custom tracking task, feedback volume dropped to nearly zero after the initial fault was resolved, and long-term monitoring detected small problem spikes that were then handled quickly.
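A tracking task can be modeled as a set of dimension filters plus optional keywords, as in the sketch below. The AND semantics across dimensions, the any-keyword match, and the field names (`date`, `platform`, `text`) are assumptions for illustration; the resulting daily counts could feed the dynamic-threshold alarm check from the classification section.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Set

Feedback = Dict[str, str]


@dataclass
class TrackingTask:
    """Custom tracking task: every dimension filter must match; any keyword may match."""
    name: str
    filters: Dict[str, Set[str]] = field(default_factory=dict)  # e.g. {"platform": {"ios"}}
    keywords: List[str] = field(default_factory=list)

    def matches(self, fb: Feedback) -> bool:
        if any(fb.get(dim) not in allowed for dim, allowed in self.filters.items()):
            return False
        if self.keywords and not any(k in fb.get("text", "") for k in self.keywords):
            return False
        return True


def daily_counts(task: TrackingTask, feedback: List[Feedback]) -> Dict[str, int]:
    """Matched feedback per day, i.e. the trend line a tracking task would plot."""
    return dict(Counter(fb["date"] for fb in feedback if task.matches(fb)))
```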

08 Process and Result Measurement

Feedback analysis effect metrics (such as closed-loop rate and closed-loop cycle) are established to support process analysis, evaluate the processing capacity of each link, and help business lines make targeted improvements.
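The two named metrics can be computed as below, assuming closed-loop rate means the share of issues that have been closed and closed-loop cycle means the mean time from creation to closure over closed issues. The article does not define them precisely, so these definitions are assumptions.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import List, Optional


@dataclass
class Issue:
    created_at: float                  # e.g. hours since some epoch
    closed_at: Optional[float] = None  # None while the issue is still open


def closed_loop_rate(issues: List[Issue]) -> float:
    """Share of issues that have reached closure."""
    if not issues:
        return 0.0
    return sum(1 for i in issues if i.closed_at is not None) / len(issues)


def closed_loop_cycle(issues: List[Issue]) -> Optional[float]:
    """Mean time from creation to closure, over closed issues only."""
    durations = [i.closed_at - i.created_at for i in issues if i.closed_at is not None]
    return fmean(durations) if durations else None
```

Computing both per link (for example, from report to assignment and from assignment to fix) rather than only end to end shows which stage contributes most to a long cycle.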

09 Overall Framework

Building on the key capabilities above, the user feedback analysis process has been turned into a platform. The interaction layer provides visual operation pages and links them through guided interactions to form processing flows. The service layer provides general service capabilities to the interaction layer, and abstract, configuration-driven modules allow functions to be extended quickly at the business level. The data layer drives the scheduling of the service layer and what the interaction layer displays; managing create, delete, and update operations on the scheduling data keeps task scheduling and front-end display updated in real time.

Multiple business lines now use the platform, which integrates a variety of automatic positioning schemes for common business problems. These schemes can be applied to multiple business scenarios at zero cost, which has markedly improved the overall closed-loop rate and shortened the closed-loop cycle.

10 Epilogue

As a form of data intelligence, user feedback carries rich latent information that deserves further exploration. The sentiment, frequency, and type of user feedback are related to user retention to some extent and merit further study. A fast and efficient user feedback analysis system provides a general solution from problem discovery to repair, helps business teams continuously improve the experience and respond quickly to users' voices, and is of great value for maintaining the loyalty of iQIYI's users. Future work will focus on expanding feedback channels, automatically correlating duplicate issues, and reaching users and putting solutions into practice.