CQ: ByteDance's Creation Quality Middle Platform in Practice

Introduction

Creative resources are the props, templates, mini programs, and other assets made by designers and used across multiple ByteDance apps such as Douyin, Huoshan, Jianying, Qingyan, and Xingtu. As the UGC model has grown, more and more users have become creators, contributing a large number of novel ideas and ways to play. The growth in creators has also sharply increased the number of resources, which makes safety and quality problems more prominent, while manual testing suffers from low efficiency and missed issues. To improve efficiency and test coverage, and to give creators a better creation environment and users a better experience, we built a specialized vertical testing service platform, the Creation Quality Platform (CQ for short), to guarantee the quality of the resources creators produce.

Platform overview

Background

The traditional review process for creative resources can be roughly divided into four steps: user submission, content detection, quality detection, and then release or rejection. Most quality detection is done manually, so low efficiency, missed detections, and false detections are all likely. These problems can cause online incidents or dampen creators' enthusiasm for submitting, costing us a group of high-quality creators. The CQ platform instead relies on automation and algorithms for quality inspection, including automatic template creation, automatic performance detection, tiered task dispatch by policy, algorithmic classification and tagging, and algorithmic content detection. It also provides recheck and feedback functions, which are used to tune the automation strategies and algorithm models and to raise confidence in the platform. The platform's system architecture is shown in the following diagram:

Task flow

There are many kinds of detection tasks. A Job is an abstraction that describes one detection request and can represent any type of task. A Task is a subtask of a Job and is the smallest unit of automated execution; the rule for creating Tasks is decided by each business's requirements. A Business APP represents one detection type, such as the template detection and special effects performance detection described later. The task management module handles the task management logic common to all of them. The task flow is as follows:
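The Job and Task abstraction described above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual data model: the class names, the `SPLITTERS` registry, the business app names, and the status rules are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Task:
    """Smallest unit of automated execution; belongs to exactly one Job."""
    payload: dict
    status: Status = Status.PENDING


@dataclass
class Job:
    """One detection request of any type, split into Tasks."""
    business_app: str
    tasks: List[Task] = field(default_factory=list)

    @property
    def status(self) -> Status:
        # Assumed rule: any failed Task fails the Job; all done means done.
        statuses = {t.status for t in self.tasks}
        if Status.FAILED in statuses:
            return Status.FAILED
        if statuses == {Status.DONE}:
            return Status.DONE
        return Status.PENDING


# Hypothetical registry: each business app supplies its own Task creation rule.
SPLITTERS: Dict[str, Callable[[dict], List[dict]]] = {
    "template_detection": lambda req: [
        {"template_id": t} for t in req["template_ids"]
    ],
    "effect_performance": lambda req: [
        {"effect_id": req["effect_id"], "device": d} for d in req["devices"]
    ],
}


def create_job(business_app: str, request: dict) -> Job:
    """Common task management logic: split a request into Tasks."""
    payloads = SPLITTERS[business_app](request)
    return Job(business_app, [Task(p) for p in payloads])
```

Keeping the splitting rule in a per-app registry is one way to let the shared task management module stay unaware of what each detection type actually does.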

Quality inspection types

Template detection

Template detection mainly covers same-style editing templates, same-style retouching templates, and same-style shooting templates. These are high in volume and have complex detection points: same-style editing templates alone number more than 35,000 per day.

The process is shown in the figure above; it forms a closed loop with the process platform. A same-style editing template is uploaded to the process platform, which places it in a high-priority or low-priority queue and registers a task in CQ. CQ dispatches tasks according to template priority and creates automated tasks, which apply the materials, synthesize the video, export it, and so on. Once the same-style video is produced, the corresponding algorithms check whether the template rules and video quality meet expectations, and tag the video at the same time. Finally, the output video and tagging results are sent back to the process platform, where the responsible staff confirm the detection results.

Task balancing

The local device rack cannot fully support same-style template detection because the task volume is too large. A task balancing policy is therefore used: tasks are prioritized, and each priority level is served by a different detection capability. The capabilities are the local resource pool, the GTF rack service, and cloud-based same-style editing.

Template priority	Detection capability	Advantages	Disadvantages	Scale
High-quality templates & low-quality grayscale templates	Local resource pool	The mobile client SDK version matches the online version, so confidence in the result is highest	Local rack, UI automation, low efficiency	15 iOS and 20 Android devices
High-quality templates & low-quality grayscale templates	GTF rack service	The mobile client SDK version matches the online version, so confidence in the result is highest	Local rack, UI automation, low efficiency	20 Android devices
Low-quality templates	Cloud same-style editing	Linux-based SDK, highest efficiency	Lags behind the online SDK version; confidence in the result cannot reach 100%; can only handle low-quality tasks	Distributed cluster deployment
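The routing implied by the table can be sketched as below. This is only an illustration under stated assumptions: the backend names, the rule of trying the local pool before the GTF rack, and the `"queued"` fallback are hypothetical, and the article does not describe how the two device-based capabilities are ordered.

```python
def choose_backend(priority: str, is_grayscale: bool, free: dict) -> str:
    """Pick a detection capability for one template.

    priority: "high" or "low" (template quality tier).
    is_grayscale: True for low-quality templates in grayscale release.
    free: mutable map of backend name -> free device slots (hypothetical).
    """
    # Per the table: high-quality templates and low-quality grayscale
    # templates need a real device whose SDK matches the online version.
    needs_device = priority == "high" or is_grayscale
    if not needs_device:
        # Remaining low-quality templates go to the cloud capability.
        return "cloud"
    # Assumed order: local resource pool first, then the GTF rack service.
    for backend in ("local_pool", "gtf_rack"):
        if free.get(backend, 0) > 0:
            free[backend] -= 1
            return backend
    # No device is free: wait for a slot (assumed behaviour).
    return "queued"


slots = {"local_pool": 1, "gtf_rack": 1}
print(choose_backend("high", False, slots))  # local_pool
print(choose_backend("low", True, slots))    # gtf_rack
print(choose_backend("low", False, slots))   # cloud
```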

Algorithmic classification and tagging

Besides template detection, tagging and classifying templates is another labor-intensive step. According to operations statistics, at peak about 560,000 templates had accumulated in an untagged state, which badly hurt both the overall exposure rate of templates and the recommendation quality of the same-style feed. Efficient and accurate algorithmic tagging therefore matters a great deal, both for the productivity of the people who tag templates and for online business results. We extracted features from the template title, template video, and template audio, and fused them with a Gated Multimodal Unit (GMU) structure to obtain a multimodal feature model. On vertical (subject) classification, accuracy has reached 91%, which is very close to manual classification.

Results in production

The platform currently serves templates for Jianying, CapCut, Douyin, Qingyan, Xingtu, and other business lines. In the early days, checking same-style editing templates was essentially manual: operations staff had to produce a same-style video from each template on their phones and inspect it by hand, and each person could test only about 60 templates a day on average. Since template detection was introduced, the platform has checked more than 38,000 templates a day on average, filtered out more than 2,000 problematic templates a day, and checked nearly 3.5 million templates in total, with stability and accuracy both above 99%. The algorithmic tagging module has shortened detection time from 3 days to 4 hours, freeing up a great deal of manual effort.

Mini program detection

Mini program detection targets ByteDance mini programs. Its characteristics are:

Many samples: each mini program has a large number of pages;
Many detection points: more than 20 points must be checked for each mini program, such as its name, icon, introduction, and content.

The process is shown in the figure above; it forms a closed loop with the process platform. After a creator builds a mini program and uploads it to the process platform, the screenshot service traverses the mini program to capture screenshots of its content and registers one task in CQ. CQ classifies the screenshot samples, runs CV and NLP algorithms against the corresponding detection points, visualizes the results, and sends them back to the process platform. A feedback function is also provided: operations staff can give feedback on the results in either the process platform or CQ, which helps the algorithm team tune the algorithms and improve accuracy and confidence.

Algorithm enablement

Based on CV and NLP algorithms, the platform currently provides more than 20 detection points for mini programs, mainly covering the name, icon, introduction, content safety, page anomalies, text anomalies, inducement information, and service category. For each detection point, the problem area is marked directly on the sample so that operations staff can troubleshoot easily.

Sampling mechanism

Because the number of samples is large, asking for feedback on all of them would put heavy pressure on operations staff. But if feedback is collected only for results that failed, misjudged cases can stay hidden among the passed samples indefinitely, producing a long-tail problem for the model. The CQ platform therefore scores every detection point along three dimensions:

Accuracy of historical results: the lower the accuracy, the higher the score
Time since last sampled: the longer a point has gone unsampled, the higher the score
Random factor: a random score that keeps the samples diverse

Among the passed samples, a fixed number of detailed samples for the detection points with the top N scores are drawn and sent back to the process platform for manual labeling.
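A minimal sketch of this scoring and sampling step is shown below. The article names the three dimensions but not how they are combined, so the weighted sum, the weights, the 30-day normalization window, and the field names are all assumptions made for illustration.

```python
import random

SECONDS_PER_DAY = 86400


def score(point, now, rng, weights=(0.5, 0.3, 0.2), max_idle_days=30.0):
    """Score one detection point; higher means more worth sampling."""
    # Lower historical accuracy -> higher score.
    accuracy_term = 1.0 - point["accuracy"]
    # Longer since last sampled -> higher score, capped at 1.0.
    idle_days = (now - point["last_sampled"]) / SECONDS_PER_DAY
    idle_term = min(idle_days / max_idle_days, 1.0)
    # Random factor keeps the selection diverse.
    random_term = rng.random()
    w_acc, w_idle, w_rand = weights
    return w_acc * accuracy_term + w_idle * idle_term + w_rand * random_term


def pick_detection_points(points, now, n, rng):
    """Return the names of the top-N detection points by score."""
    ranked = sorted(points, key=lambda p: score(p, now, rng), reverse=True)
    return [p["name"] for p in ranked[:n]]


def draw_samples(passed_by_point, chosen_points, per_point, rng):
    """Draw a fixed number of passed samples for each chosen point."""
    drawn = {}
    for name in chosen_points:
        pool = passed_by_point.get(name, [])
        drawn[name] = rng.sample(pool, min(per_point, len(pool)))
    return drawn


rng = random.Random(0)
now = 100 * SECONDS_PER_DAY
points = [
    {"name": "icon", "accuracy": 0.50, "last_sampled": now - 30 * SECONDS_PER_DAY},
    {"name": "name", "accuracy": 0.99, "last_sampled": now},
]
chosen = pick_detection_points(points, now, n=1, rng=rng)
samples = draw_samples({"icon": ["s1", "s2", "s3"]}, chosen, per_point=2, rng=rng)
```

With these illustrative weights, the random term can reorder detection points whose accuracy and idle time are similar, but it cannot outweigh a large gap in either.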

Results in production

The mini program detection process we built now handles the daily review of ByteDance mini programs, averaging more than 500 submissions a day. Detection latency for a mini program has dropped from 24 hours to under 5 hours, and current accuracy is above 95%. Based on CQ's results, third-party mini programs now go through the whole flow of submission for review, detection, and release without manual intervention.

Special effects performance detection

Special effects performance detection targets the special effects resources of each app. There are many types of special effects, including props, styles, beauty makeup, stickers, filters, transitions, and animations. Because the types differ, the processes differ as well.

The process is shown in the figure above; it forms a closed loop with the special effects platform. Creators build special effects and upload them to the special effects platform, which registers a task in CQ during the testing stage. Depending on the business and the resource type, CQ schedules the corresponding device resource pool and creates an automated task. After the automated run finishes, the platform analyzes the data, presents the results in a visual report with intelligent alerts, and sends them back to the special effects platform. When QA engineers reject an effect, they can also select a rejection reason, which CQ uses to measure the accuracy of its results. Special effects detection currently includes performance detection (memory, CPU, FPS, and so on), basic function detection (black screen, white screen, crash), and customized detection (logic checks on the resource pack). The automated tasks cover the main flow of using special effects resources, and for some business lines they also cover the scenarios that cause the most online problems.

Trigger prop detection

A trigger prop is a prop whose effect appears only after a specified action, such as blinking or making a heart gesture. Early on, special effects detection covered only static face effects, but as special effects gameplay diversified, the number of trigger effects grew quickly. When an effect is never triggered, the performance results that detection produces may be skewed or may miss problems. To produce more accurate performance data and improve sticker testing efficiency, CQ needed to support trigger effects in automation.

The first consideration was that there are many types of trigger actions. If we followed the previous iteration's approach and replaced the portrait image placed in front of the test device's camera with a video showing the corresponding action, QA's cost of maintaining that material would rise, a large hardware investment would be needed, and maintaining the hardware afterwards would also take considerable manpower.

After investigation and discussion, CQ chose to simulate the algorithm data in order to test trigger props. CQ first parses the effect's resource pack to find the trigger actions it requires, which mainly involve expressions, gestures, and screen touches. It then constructs a pb file in the agreed protocol format and passes it to the SDK. After receiving the message, the client uses the SDK to replace the real algorithm data with the simulated data, which triggers the corresponding effect.
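The flow above (parse the pack, collect its trigger actions, build a message for the SDK) can be sketched roughly as follows. The real protocol is a pb file whose schema the article does not give, so this sketch uses JSON as a stand-in, and every field name, the `hold_frames` parameter, and the layout of the pack config are hypothetical.

```python
import json

# Trigger categories named in the article.
SUPPORTED_KINDS = {"expression", "gesture", "touch"}


def extract_triggers(pack_config: dict) -> list:
    """Collect the trigger actions a resource pack declares.

    Assumes (hypothetically) that the parsed pack exposes a "triggers"
    list of {"kind": ..., "action": ...} entries.
    """
    triggers = []
    for item in pack_config.get("triggers", []):
        if item["kind"] not in SUPPORTED_KINDS:
            raise ValueError(f"unsupported trigger kind: {item['kind']}")
        triggers.append({"kind": item["kind"], "action": item["action"]})
    return triggers


def build_simulation_message(effect_id: str, triggers: list,
                             hold_frames: int = 30) -> bytes:
    """Serialize simulated algorithm results for the client SDK.

    JSON stands in for the agreed pb format; hold_frames (how long the
    simulated result is held) is an illustrative parameter.
    """
    message = {
        "effect_id": effect_id,
        "simulated_results": [
            {"kind": t["kind"], "action": t["action"], "hold_frames": hold_frames}
            for t in triggers
        ],
    }
    return json.dumps(message, sort_keys=True).encode("utf-8")


pack = {"triggers": [{"kind": "expression", "action": "blink"},
                     {"kind": "gesture", "action": "heart"}]}
payload = build_simulation_message("effect_001", extract_triggers(pack))
```

Because the simulated result is injected in place of the real algorithm output, no camera-facing video or extra hardware is needed for each new action type.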

Hierarchical testing process

The frame rate of a special effects resource differs noticeably across device models. To give users a smoother experience, some effects are dropped on weaker devices to preserve performance. Special effects detection therefore tests resource packs tier by tier. The current strategy is to pick several representative models, group them into low-end, mid-range, and high-end tiers, and start testing from the low end. Only when the frame rate check fails on one tier does the test move up to the next tier, and so on, until the range of models that can run the resource is determined.
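The tiered strategy can be sketched as a simple escalation loop. The tier names, the 25 FPS threshold, and the `measure_fps` callback are assumptions for illustration; the sketch also assumes that a resource passing on one tier passes on every higher tier.

```python
from typing import Callable, List, Optional

# Ordered from weakest to strongest representative devices.
TIERS = ["low", "mid", "high"]


def lowest_passing_tier(measure_fps: Callable[[str], float],
                        min_fps: float = 25.0) -> Optional[str]:
    """Test from the low end upward; stop at the first tier that passes."""
    for tier in TIERS:
        if measure_fps(tier) >= min_fps:
            return tier
        # Frame rate check failed on this tier: escalate to the next one.
    return None


def supported_tiers(measure_fps: Callable[[str], float],
                    min_fps: float = 25.0) -> List[str]:
    """Model range for the resource: the passing tier and all above it."""
    start = lowest_passing_tier(measure_fps, min_fps)
    if start is None:
        return []
    return TIERS[TIERS.index(start):]


# Hypothetical measurements for one effect.
fps_by_tier = {"low": 14.0, "mid": 27.5, "high": 30.0}
print(supported_tiers(lambda tier: fps_by_tier[tier]))  # ['mid', 'high']
```

Starting from the low end means a lightweight effect is cleared after a single run, and only heavier effects consume time on mid-range and high-end devices.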

Results in production

Special effects performance detection is now connected to Douyin, Huoshan, Jianying, Qingyan, FaceU, Xingtu, and other businesses. On the special effects platform, some businesses already trust the performance results completely and run the special effects testing process with no manual intervention. Stability is currently above 99%, and the problem detection rate is about 22%.

Conclusion

The Creation Quality Platform is committed to building an efficient, accurate, and fully automated resource detection process that improves the user experience and creators' enthusiasm for submitting while guaranteeing quality. It also brings in algorithms to handle, intelligently, the detection content that automation alone cannot cover. In the future, we will keep improving the algorithmic detection capability so that it covers more layers and matches the usage scenarios of ByteDance products more closely, safeguarding company-level products from the engineering-efficiency side.

Join us

We are ByteDance's Interactive Entertainment R&D team, responsible for the research and development of star products such as Douyin, Douyin Huoshan, Jianying, FaceU, Qingyan, live streaming, and music. Douyin has now exceeded 600 million daily active users and continues to grow rapidly.

The R&D efficiency team focuses on services and intelligence, aiming to empower every Interactive Entertainment business scenario through tool platforms and algorithms. The team's responsibilities include, but are not limited to, improving R&D efficiency, ensuring content quality, optimizing user experience, and ensuring business security.

The team covers a range of functions, including algorithms, engineering (front end, back end, and client), data (big data, DA), product, and testing. It can design and develop intelligent platform products end to end on its own, and it has shipped several intelligent platform products with company-wide impact.

The team is growing quickly, is young and energetic, and cares about building a strong technical culture. It actively takes part in top industry conferences at home and abroad and produces high-quality technical patents and papers. Positions are based in Shenzhen, Hangzhou, and Beijing.

Join us and put every line of your code in front of millions of users around the world.

Send your resume to [email protected] with the email subject: Name – Technology Stack – R&D Efficiency.
