Abstract: Pornographic video content seriously jeopardizes Internet security. Such content often exists in two forms: image and audio. This paper introduces a method for recognizing pornographic video content based on both image and audio.

First, the background

With the development of the mobile Internet, short videos have become a main form of daily entertainment. A large number of users upload and download short videos over the Internet every day. However, some short videos contain pornographic content, which not only seriously harms the mental health of teenagers but is also a contributing factor to social insecurity.

Pornographic video content recognition is a multimodal problem that includes pornographic image recognition and pornographic voice recognition. Although pornographic image recognition has been studied, it remains a challenging task. For example, the pornographic area in an image is often relatively small, which makes it hard to recall, and vulgar and pornographic images look similar, which makes them hard to distinguish. In addition, some pornographic videos have normal pictures and can only be identified by their audio, yet there is currently no theoretical research on pornographic voice recognition.

Second, the technical problems of pornographic video content recognition

In the field of pornographic image recognition, common algorithms include traditional machine learning methods based on hand-crafted features and methods based on deep learning. Traditional machine learning methods often use color histograms, texture information, and other features to detect skin-colored areas in the image. **Their disadvantage is that they cannot distinguish between vulgar images and pornographic images.** With the development of deep learning, image classification and object detection have also been applied to pornographic image recognition. Their disadvantages are a simple model structure and limited problem-solving ability.
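To make the limitation of skin-color methods concrete, here is a minimal sketch of the kind of heuristic they rely on. The RGB thresholds are a commonly cited rule of thumb for skin pixels and are an assumption here, as are the names `is_skin` and `skin_ratio`; the source does not specify which features or thresholds were used.

```python
def is_skin(r, g, b):
    """Rule-of-thumb RGB skin test (assumed thresholds, not from the source)."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_ratio(pixels):
    """Fraction of pixels classified as skin; `pixels` is a list of (r, g, b)."""
    if not pixels:
        return 0.0
    return sum(is_skin(*p) for p in pixels) / len(pixels)


# Toy "image": three skin-toned pixels and one blue pixel.
image = [(220, 170, 140), (200, 150, 120), (210, 160, 130), (30, 60, 200)]
ratio = skin_ratio(image)
```

A swimsuit photo and a pornographic image can produce nearly the same skin ratio, so a score of this kind cannot separate vulgar content from pornographic content on its own.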

There is no established theory on the classification of pornographic speech, so we draw on the theory of general speech classification. Sound recognition models based on raw audio and one-dimensional convolution perform poorly and have gradually been abandoned in favor of models based on audio spectrum features and two-dimensional convolution.
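As a sketch of what an audio spectrum feature looks like as a two-dimensional input, the following computes a small log Mel-spectrogram in plain Python. Everything specific here is an assumption for illustration: the function names, the toy sizes (`n_fft=64`, `hop=32`, `n_mels=8`), and the naive DFT, which stands in for the FFT that a real pipeline would take from an audio library. The source does not give its spectrogram settings.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum of one Hann-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
        for i, x in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        spectrum.append(abs(s) ** 2)
    return spectrum


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    hz_points = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = hz_points[m - 1], hz_points[m], hz_points[m + 1]
        bank.append([
            max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
            for f in bin_hz
        ])
    return bank


def log_mel_spectrogram(samples, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return a list of frames, each a list of `n_mels` log energies."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(samples) - n_fft + 1, hop):
        power = power_spectrum(samples[start:start + n_fft])
        mel = [sum(w * p for w, p in zip(row, power)) for row in bank]
        frames.append([math.log(e + 1e-6) for e in mel])
    return frames


# Toy input: 256 samples of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = log_mel_spectrogram(tone, sample_rate)
strongest_band = max(range(len(spec[0])), key=spec[0].__getitem__)
```

The result is a time-by-frequency grid, which is why it can be passed to a two-dimensional convolutional network in the same way as an image.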

Third, the pornographic video content recognition framework in detail

To handle multimodal video, two modalities, image and audio, are used to determine whether a video is pornographic.

The overall structure consists of three parts:

1. Pornographic image recognition model

2. Pornographic audio recognition model

3. Fusion of image and audio model results

The overall solution is shown in the figure below:

1. Pornographic image recognition model

To capture both the local and global features of the image, we propose DCNet. The overall structure consists of a classification branch and a detection branch, which capture the global and local information of the image, respectively. Compared with a traditional detection network, the detection branch has two optimizations:

(1) BiFPN is used for feature fusion. Different feature maps are given different weights, and fusion runs in both directions, which effectively improves detection.

(2) The task branch adopts an anchor-free design. It follows the idea of FCN: finer-grained, multi-scale detection strengthens the detection of small areas. A center-point branch is also added to reduce false detections.
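The two optimizations above can be illustrated with a short sketch. It assumes that the weighting follows the "fast normalized fusion" described for BiFPN in EfficientDet and that the center-point branch behaves like the center-ness score in FCOS; the source confirms neither, and the function names are hypothetical.

```python
import math


def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-length feature vectors with non-negative, normalized weights."""
    w = [max(0.0, x) for x in weights]  # ReLU keeps every weight >= 0
    total = sum(w) + eps                # eps avoids division by zero
    return [
        sum(wi * f[j] for wi, f in zip(w, features)) / total
        for j in range(len(features[0]))
    ]


def centerness(left, top, right, bottom):
    """FCOS-style center-ness from a point's distances to the four box sides."""
    return math.sqrt(
        (min(left, right) / max(left, right))
        * (min(top, bottom) / max(top, bottom))
    )


# Two toy feature vectors; the second one is weighted three times as much.
fused = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])

# A point at the box center scores 1; a point near the left edge scores less.
center_score = centerness(5, 5, 5, 5)
edge_score = centerness(1, 5, 9, 5)
```

In a detector of this style, the classification score of each location is typically multiplied by its center-ness, so predictions made far from an object's center are down-weighted before non-maximum suppression.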

The structure of the pornographic image recognition model is as follows:

2. Pornographic audio recognition model

At present, there is no theory specific to pornographic speech classification. In general speech classification, the common approach is to first convert the WAV audio into a two-dimensional spectrum and then feed the spectrum into a two-dimensional convolutional network. On this basis, we use log Mel-spectrograms as the audio spectrum features and propose RANet. Its features include:

(1) Convert the audio into log Mel-spectrogram features: each second of audio corresponds to one log Mel-spectrogram feature, which is a two-dimensional image.

(2) Capture the temporal information in the audio with a TSN architecture: segment the audio along the time dimension at equal intervals and extract a feature map for each segment.

(3) A frequency attention module is adopted to capture the key information in the sound. The attention block is composed of two convolution layers and is inserted at both ends of the layers in the ResNet network.
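The equal-interval segmentation in step (2) can be sketched as follows. The function names are hypothetical, and two choices are assumptions rather than details from the source: each segment contributes a one-second snippet taken from its middle (TSN samples a random snippet during training), and the per-segment scores are averaged into one clip-level score.

```python
def split_into_segments(num_samples, num_segments):
    """Return (start, end) sample indices for equal-length segments."""
    seg_len = num_samples // num_segments
    return [(k * seg_len, (k + 1) * seg_len) for k in range(num_segments)]


def center_snippets(num_samples, num_segments, snippet_len):
    """One snippet of `snippet_len` samples from the middle of each segment."""
    snippets = []
    for start, end in split_into_segments(num_samples, num_segments):
        mid = (start + end) // 2
        s = max(start, mid - snippet_len // 2)
        snippets.append((s, min(end, s + snippet_len)))
    return snippets


def average_consensus(segment_scores):
    """Combine per-segment scores into one clip-level score by averaging."""
    return sum(segment_scores) / len(segment_scores)


# A 12-second clip at 16 kHz, split into 3 segments, one second per snippet.
sample_rate = 16000
snippets = center_snippets(12 * sample_rate, 3, sample_rate)

# Hypothetical per-segment model outputs.
clip_score = average_consensus([0.5, 0.25, 0.75])
```

Each snippet would then be converted into a log Mel-spectrogram and passed through the network, so the model sees the whole clip at a fixed cost regardless of its length.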

The structure diagram of pornographic audio recognition model is as follows:

3. Fusion of image and audio model results
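The source does not describe the fusion rule, so the following is only one hypothetical late-fusion scheme: take the highest score from each modality and flag the video if either one crosses a threshold. The function name `fuse_scores` and the threshold of 0.5 are assumptions made for illustration.

```python
def fuse_scores(image_scores, audio_scores, threshold=0.5):
    """Hypothetical late fusion: flag the video if either modality is confident."""
    image_score = max(image_scores, default=0.0)  # most suspicious frame
    audio_score = max(audio_scores, default=0.0)  # most suspicious audio segment
    video_score = max(image_score, audio_score)
    return video_score, video_score >= threshold


# Frames look normal, but one audio segment is flagged.
score, is_pornographic = fuse_scores([0.1, 0.05], [0.2, 0.92])

# Both modalities look normal.
clean_score, clean_flag = fuse_scores([0.1], [0.2])
```

A rule of this shape covers videos whose pictures are normal and whose audio is not, the case that an image-only model would miss.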

Fourth, experimental results

On our 3K test set, the model's accuracy reached 93.4%.

This paper was published in the journal Applied Sciences in 2021 by the author. Part of the content has been translated for your reference.

Original address: www.mdpi.com/2076-3417/1…
