The ability to prepare data determines the boundaries of enterprise AI development

Among the three elements of AI (data, computing power, and algorithms), data is the fuel for AI algorithms. Whether an enterprise is developing an AI business application or university teachers and students are working on an AI research project, sufficient training data is needed to obtain a high-precision model.

In simple terms, data annotation amounts to preparing the “food” that is fed to the AI. Supervised and semi-supervised learning both rely on manually annotated data, and their training, validation, and test sets all consist of annotated data.

For example, if you want to teach the AI to recognize an apple, you can use 1,000 images labeled “apple” and many more images that do not contain an apple as a training set. The machine learns a model from these and recognizes an apple when it encounters a similar image later.
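As a rough illustration of how annotated data ends up in training, validation, and test sets, here is a minimal Python sketch. The file names, the "apple" / "not_apple" labels, and the 80/10/10 split are assumptions made for this example, not something the BML platform prescribes.

```python
import random

def split_labeled_data(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle labeled samples and split them into training, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains becomes the test set
    return train, val, test

# Hypothetical annotated data: (file name, label) pairs.
samples = [(f"img_{i:04d}.jpg", "apple") for i in range(1000)]
samples += [(f"img_{i:04d}.jpg", "not_apple") for i in range(1000, 2000)]

train, val, test = split_labeled_data(samples)
print(len(train), len(val), len(test))
```

Every sample in all three sets carries a label, which is why the annotation effort grows with the size of the whole dataset and not just the training portion.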

But in real business and research scenarios, annotating thousands of pieces of data, or more, can be a daunting problem. Do not underestimate annotation just because it consists of drawing boxes and clicking. Relying on a single person for annotation can be inefficient enough to bring an entire AI project to a halt.

In response to this demand, Baidu’s BML full-featured AI development platform has launched a “multi-person annotation” feature, which splits the dataset into parts that are annotated in parallel and can complete data annotation at twice the speed. Improving AI development efficiency is no longer just a dream!

In multi-person annotation, there are three roles: administrator, annotator, and auditor. The administrator creates multi-person annotation tasks and assigns them to the designated annotation team. After the annotators finish, the audit team reviews their work, which further improves annotation accuracy and ensures reliable labels for subsequent model training. Once the auditors have completed the audit, the administrator checks and accepts the overall annotation results, and the annotation work is officially finished after acceptance.
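The hand-off between the three roles can be pictured as a small state machine. The Python sketch below is only an illustration: the `Stage` names and the `next_stage` function are hypothetical and are not part of any BML API. The `review_enabled` flag reflects the administrator’s option, described in the task-creation step, to choose whether annotations are reviewed.

```python
from enum import Enum

class Stage(Enum):
    ANNOTATION = "annotation"   # annotators label their share of the data
    AUDIT = "audit"             # auditors review the annotators' work
    ACCEPTANCE = "acceptance"   # administrator checks the overall result
    FINISHED = "finished"       # data is saved to the target dataset

def next_stage(stage, review_enabled=True):
    """Return the stage that follows `stage` in the annotation workflow."""
    if stage is Stage.ANNOTATION:
        # Skip the audit when the administrator has turned review off.
        return Stage.AUDIT if review_enabled else Stage.ACCEPTANCE
    if stage is Stage.AUDIT:
        return Stage.ACCEPTANCE
    if stage is Stage.ACCEPTANCE:
        return Stage.FINISHED
    raise ValueError("task is already finished")

stage = Stage.ANNOTATION
path = [stage]
while stage is not Stage.FINISHED:
    stage = next_stage(stage)
    path.append(stage)
print([s.value for s in path])
```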

To start a multi-person annotation task, just create a dataset and use the feature directly. Image and text tasks on the BML platform are now fully supported. Everyone who has used it says it is great!

Four simple steps: teamwork to solve massive data annotation (benefits at the end of the article)

The administrator creates teams and tasks

First, the administrator creates the annotation team and the audit team, adds the relevant members to each team, and completes their information.

Once the teams have been created, the administrator can initiate a multi-person annotation task for the unlabeled images in an existing dataset, as shown in the diagram below:

At present, the unlabeled data in the dataset is divided evenly among the tasks. During task creation, the administrator can flexibly choose whether annotations are reviewed, and can set the task deadline, member permissions, and data storage mode.

After the task has been created and submitted, the backend automatically allocates the work evenly and emails a link to the annotation task to each member of the annotation team.
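The article does not describe how the backend performs this even allocation. As one plausible sketch, a round-robin assignment guarantees that the number of items per annotator differs by at most one. The function name, file names, and annotator names below are hypothetical.

```python
def allocate_evenly(item_ids, annotators):
    """Distribute items round-robin so per-annotator counts differ by at most one."""
    if not annotators:
        raise ValueError("at least one annotator is required")
    assignments = {name: [] for name in annotators}
    for index, item_id in enumerate(item_ids):
        # Item 0 goes to the first annotator, item 1 to the second, and so on.
        assignments[annotators[index % len(annotators)]].append(item_id)
    return assignments

# Hypothetical unlabeled images and annotation team.
unlabeled = [f"img_{i:03d}.jpg" for i in range(10)]
team = ["annotator_a", "annotator_b", "annotator_c"]

tasks = allocate_evenly(unlabeled, team)
for name, items in tasks.items():
    print(name, len(items))
```

With 10 images and 3 annotators, one annotator receives 4 images and the other two receive 3 each.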

The annotator performs the annotation

The annotator clicks the task link received by email to start annotating and finishes before the deadline set by the administrator. When annotation is complete, the annotator submits the task, and whether it goes on to review depends on the administrator’s settings.

The figure above shows the annotator’s annotation page. The administrator can set a color for each label, and the color of the annotation box changes accordingly, which makes annotating and checking easier for the annotator. EasyData also supports pinning labels to the top and locking them, so that annotators can quickly select commonly used labels, improving the efficiency and accuracy of labeling.

The auditor performs the audit

The auditor clicks the audit task link received by email, reviews the annotators’ work before the deadline set by the administrator, and submits the task once the audit is complete. After all audit tasks have been submitted, the administrator begins the acceptance stage. Throughout the process, the auditor’s participation relieves the administrator’s review workload and also raises the standard expected of annotators, which improves the quality of data annotation and makes the acceptance stage more efficient.

The administrator performs the acceptance

The administrator can see the progress of the annotation and audit tasks, as well as all annotation details. When review is enabled, if the approval rate of a particular annotator’s results is low, the work can be sent back for re-annotation, after which the auditor and administrator need to audit and accept it again. Once acceptance is complete, the annotated data is saved to the target dataset, marking the completion of the multi-person annotation task.
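To make the idea of a per-annotator approval rate concrete, here is a minimal Python sketch. The data layout, the function names, and the 0.9 threshold are assumptions for illustration only. The article does not say what counts as a “low” approval rate or how BML computes it.

```python
from collections import defaultdict

def approval_rates(audit_results):
    """Compute each annotator's approval rate from (annotator, approved) pairs."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for annotator, ok in audit_results:
        total[annotator] += 1
        if ok:
            approved[annotator] += 1
    return {name: approved[name] / total[name] for name in total}

def annotators_to_send_back(audit_results, threshold=0.9):
    """List annotators whose approval rate is below the (assumed) threshold."""
    rates = approval_rates(audit_results)
    return sorted(name for name, rate in rates.items() if rate < threshold)

# Hypothetical audit outcomes: True means the auditor approved the annotation.
results = [
    ("annotator_a", True),
    ("annotator_a", True),
    ("annotator_b", True),
    ("annotator_b", False),
]
print(annotators_to_send_back(results))
```

In this example only annotator_b, with half of the annotations rejected, would have work sent back for re-annotation.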

With BML’s “multi-person annotation” feature, onerous data annotation work can be shared across a team. In addition, the administrator and auditor roles further divide up the annotation work, maximizing the efficiency of team collaboration while ensuring data quality.

Baidu BML full-featured AI development platform

BML (Baidu Machine Learning), based on Intel® Xeon® integrated AI acceleration, is a one-stop AI development service for enterprises and individual developers working with machine learning and deep learning. It provides a low-code, efficient, and convenient AI development experience, including:

Data processing: camera data collection and data backflow, online labeling, multi-person labeling, intelligent labeling, data cleaning, and data quality inspection.
Model training: low-code preset model tuning mode, Notebook native programming mode, and multi-file upload custom job mode.
Service deployment: comprehensive public cloud deployment and end-to-end offline deployment, automatic service start and stop, traffic segmentation, customized quotas, and performance evaluation.

You can already feel how thoughtful the features are!

BML is currently running the “2021 Gravity Plan”. Search Baidu for “Baidu BML” (ai.baidu.com/bml/) to take part. By trying out the platform’s features, you have a chance to receive AI benefits worth 10,000 yuan, which can be exchanged for a “luxury gift package”!

For example:

6000+ hours of customized model training;
590+ hours of preset model tuning;
400+ hours of public cloud model deployment service quota;
or 50 device-side SDKs.

Add the completely free “multi-person annotation” feature, and that is more than enough to develop and debug an AI model demo. Don’t miss out!
