1. Audio and video capture process

This article walks through the whole audio and video recording process: capturing audio and video, encoding them, and packaging the result into an MP4 file. The real-time audio and video data comes from the camera and the microphone.

  • Playback: fetch the stream -> decode -> play.
  • Record and play: record audio and video -> process -> encode -> upload to a server -> others play it.
  • Live broadcast: record audio and video -> encode -> stream to the server -> server streams to other clients -> decode -> play.

Video sampling data: generally in YUV or RGB format.

Audio sampling data: generally in PCM format.

2. Audio capture

1. How does Android capture audio

The Android SDK provides two APIs for audio capture: MediaRecorder and AudioRecord.

AudioRecord

AudioRecord outputs PCM voice data, that is, the raw PCM audio frames. If the data is saved directly as a file, a normal player cannot play it, so you must first write code to encode and compress it. Live broadcast applications generally capture audio with AudioRecord. PCM stands for Pulse Code Modulation. PCM audio data is the raw stream of uncompressed audio samples, produced by sampling, quantizing and encoding an analog signal into standard digital audio data.
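One common way to make raw PCM playable without compressing it is to put a WAV header in front of it. The article does not cover this, so treat the following as a minimal sketch. It assumes linear PCM and the canonical 44-byte RIFF/WAVE header, and the class and method names are illustrative only.

```java
/** Sketch: build the 44-byte RIFF/WAVE header that precedes raw PCM in a .wav file. */
final class WavHeader {
    private WavHeader() {}

    static byte[] build(int sampleRate, int channels, int bitsPerSample, int pcmLength) {
        int blockAlign = channels * bitsPerSample / 8;   // bytes per sample frame
        int byteRate = sampleRate * blockAlign;          // bytes per second of audio
        byte[] h = new byte[44];
        putAscii(h, 0, "RIFF");
        putIntLE(h, 4, 36 + pcmLength);                  // bytes that follow this field
        putAscii(h, 8, "WAVE");
        putAscii(h, 12, "fmt ");
        putIntLE(h, 16, 16);                             // size of the PCM fmt chunk
        putShortLE(h, 20, 1);                            // format tag 1 = linear PCM
        putShortLE(h, 22, channels);
        putIntLE(h, 24, sampleRate);
        putIntLE(h, 28, byteRate);
        putShortLE(h, 32, blockAlign);
        putShortLE(h, 34, bitsPerSample);
        putAscii(h, 36, "data");
        putIntLE(h, 40, pcmLength);                      // size of the PCM payload
        return h;
    }

    /** Reads a little-endian 32-bit value, handy for checking a header. */
    static int readIntLE(byte[] b, int off) {
        return (b[off] & 0xFF) | (b[off + 1] & 0xFF) << 8
                | (b[off + 2] & 0xFF) << 16 | (b[off + 3] & 0xFF) << 24;
    }

    private static void putAscii(byte[] b, int off, String s) {
        for (int i = 0; i < s.length(); i++) b[off + i] = (byte) s.charAt(i);
    }

    private static void putIntLE(byte[] b, int off, int v) {
        for (int i = 0; i < 4; i++) b[off + i] = (byte) (v >>> (8 * i));
    }

    private static void putShortLE(byte[] b, int off, int v) {
        b[off] = (byte) v;
        b[off + 1] = (byte) (v >>> 8);
    }
}
```

Writing this header followed by the bytes read from AudioRecord gives a file that ordinary players can usually open. The data is still uncompressed, so the file stays large.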

MediaRecorder

MediaRecorder is an API that directly encodes and compresses the audio recorded by the phone's microphone (for example to AMR or AAC) and saves it to a file. It does not output MP3.

The difference between MediaRecorder and AudioRecord
  • Both MediaRecorder and AudioRecord can record audio. MediaRecorder produces a compressed audio file, so an encoder has to be set, and the recorded file can be played by the system's own music player. AudioRecord produces raw PCM, which has to be played with AudioTrack. AudioTrack is closer to the low level. After encoding and compression, PCM can become AMR, MP3 or AAC.
  • To simply capture audio into a file, use MediaRecorder. To run further algorithmic processing on the audio, use AudioRecord.
Advantages and disadvantages of MediaRecorder and AudioRecord:

AudioRecord: mainly used for recording while playing back and for real-time audio processing, which makes it better suited to voice applications.
  • Advantages: real-time voice processing; the audio can be packaged in any way you like in code.
  • Disadvantages: the output is raw PCM. Saved directly as a file it cannot be played by a normal player, so you must first write code to encode and compress it.

MediaRecorder: integrates recording, encoding and compression, and supports a small set of output formats such as AAC, AMR and 3GP.
  • Advantages: highly integrated; you just call the relevant interfaces, with little code.
  • Disadvantages: cannot process audio in real time; few output formats (no MP3 files, for example).

MediaRecorder and AudioRecord applications:

If you just want a simple recorder that saves audio files, use MediaRecorder. If you need further algorithmic processing of the audio, compression with a third-party library, network transmission, or a live broadcast application, AudioRecord is recommended.

3. Video capture

As with audio, video capture has a high-level and a low-level path. The high-level path is Camera plus MediaRecorder, which quickly gets you an encoded file. The low-level path uses Camera directly: the captured frames are pre-processed (filters, noise reduction and so on), then hardware-encoded with MediaCodec, and finally MediaMuxer writes the video file. Android has two camera APIs, Camera and Camera2. Camera is the older API and has been deprecated since Android 5.0 (API 21). Camera2 is now the main API for video capture.

In live broadcast development you often need the raw video frames so that you can pre-process them (beauty filters, watermarks, special effects and so on), then encode them and push them out over a protocol such as RTMP or RTSP for real-time transmission. So how do you get the raw frame data? The Android Camera API lets you register a preview callback (for example with Camera.setPreviewCallback) that delivers raw frames in formats such as NV21, NV12 or YV12.
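The formats named above differ only in how the chroma samples are laid out. As an illustration, here is a minimal sketch that repacks an NV21 preview frame (a Y plane followed by interleaved V/U pairs) into planar I420 (Y, then U, then V), which many encoders expect. The class name is hypothetical, and the sketch assumes even width and height with no row padding.

```java
/** Sketch: repack an NV21 frame (Y plane + interleaved VU) as planar I420 (Y, U, V). */
final class Nv21 {
    private Nv21() {}

    static byte[] toI420(byte[] nv21, int width, int height) {
        int ySize = width * height;
        int chromaSize = ySize / 4;                 // 4:2:0 subsampling: one U and one V per 2x2 block
        if (nv21.length < ySize + 2 * chromaSize) {
            throw new IllegalArgumentException("buffer too small for " + width + "x" + height);
        }
        byte[] i420 = new byte[ySize + 2 * chromaSize];
        System.arraycopy(nv21, 0, i420, 0, ySize);  // the luma plane is identical in both layouts
        for (int i = 0; i < chromaSize; i++) {
            i420[ySize + i] = nv21[ySize + 2 * i + 1];           // U follows V in each NV21 pair
            i420[ySize + chromaSize + i] = nv21[ySize + 2 * i];  // V comes first in NV21
        }
        return i420;
    }
}
```

In a preview callback, the byte array received for each frame would be passed to `Nv21.toI420` before being handed to an encoder that wants planar input.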

MediaRecorder video capture advantages and disadvantages

Advantages: easy to use; the output is an already encoded and packaged audio/video file that can be used directly.

Disadvantages: You can’t get the raw data, so you can’t add your own processing to the raw data.

Steps for capturing video with Camera

  1. Create an input Surface with MediaCodec.
  2. Create an OpenGL environment that shares the camera preview's EGL context, and create an EGLSurface from the Surface obtained above.
  3. Render the texture bound to the camera preview, using its texture ID.
  4. Swap buffers so that the rendered data enters the new Surface.
  5. Capture sound with AudioRecord.
  6. Encode to H.264 and AAC with MediaCodec.
  7. Package the data as MP4 with MediaMuxer.

4. Audio and video processing

Capturing video or audio yields the raw data. To improve the live effect or add extra effects, the data is usually processed before encoding and compression, for example with video beauty filters or voice changing.

1. Audio processing

The raw audio stream can be processed, for example with noise reduction, echo processing and various filter effects.
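Noise reduction and echo processing are involved algorithms, but every such stage has the same shape: PCM samples in, PCM samples out. As a minimal sketch of that shape (not one of the effects named above), here is a volume gain on 16-bit PCM with clipping. The class name is illustrative.

```java
/** Sketch: the simplest PCM processing stage, a gain with saturation to the 16-bit range. */
final class PcmGain {
    private PcmGain() {}

    static short[] apply(short[] samples, double gain) {
        short[] out = new short[samples.length];
        for (int i = 0; i < samples.length; i++) {
            long scaled = Math.round(samples[i] * gain);
            // Clamp instead of letting the value wrap around, which would sound like a loud click.
            out[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, scaled));
        }
        return out;
    }
}
```

A stage like this would sit between reading samples from the recorder and feeding them to the encoder.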

2. Video processing

Apps such as Douyin and Meitu offer many filters for shooting and video processing, as well as stickers, scenes, face recognition, special effects and watermarks.

Beautifying and adding special effects to video is mostly done with OpenGL. Android provides GLSurfaceView, which is similar to SurfaceView but is drawn through a Renderer. You generate a texture with OpenGL, create a SurfaceTexture from the texture ID, and hand that SurfaceTexture to the Camera. The texture then connects the Camera preview to OpenGL, so the preview frames can be processed with OpenGL in any way you like.

The whole process comes down to using OpenGL's FBO technique to generate a new texture from the Camera preview, and then drawing the new texture in the Renderer's onDrawFrame(). Adding a watermark means first converting an image to a texture and then drawing it with OpenGL. Dynamic sticker effects are more complex: an algorithm first analyzes the current preview frame and locates the relevant parts of the face, and then the corresponding image is drawn on each part. Implementing the whole process is fairly difficult. Available face recognition technologies include OpenCV, Dlib and MTCNN.
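The watermark step boils down to alpha blending one image over another. In OpenGL the GPU does this per fragment; the following sketch does the same arithmetic on the CPU over ARGB pixel arrays, purely to show what the blend computes. It assumes non-premultiplied ARGB pixels, and the class and parameter names are hypothetical.

```java
/** Sketch: CPU version of "source over" alpha blending, as used when drawing a watermark. */
final class Watermark {
    private Watermark() {}

    /** Blends mark (ARGB, markWidth wide) onto frame (ARGB, frameWidth wide) at (left, top). */
    static void blend(int[] frame, int frameWidth, int[] mark, int markWidth, int left, int top) {
        int frameHeight = frame.length / frameWidth;
        int markHeight = mark.length / markWidth;
        for (int y = 0; y < markHeight; y++) {
            for (int x = 0; x < markWidth; x++) {
                int fx = left + x;
                int fy = top + y;
                if (fx < 0 || fy < 0 || fx >= frameWidth || fy >= frameHeight) continue; // clipped
                int src = mark[y * markWidth + x];
                int dst = frame[fy * frameWidth + fx];
                int a = src >>> 24;                       // watermark opacity, 0..255
                int r = mix((src >> 16) & 0xFF, (dst >> 16) & 0xFF, a);
                int g = mix((src >> 8) & 0xFF, (dst >> 8) & 0xFF, a);
                int b = mix(src & 0xFF, dst & 0xFF, a);
                frame[fy * frameWidth + fx] = 0xFF000000 | (r << 16) | (g << 8) | b;
            }
        }
    }

    /** out = src * a + dst * (1 - a), with a scaled to 0..255 and rounding. */
    private static int mix(int src, int dst, int a) {
        return (src * a + dst * (255 - a) + 127) / 255;
    }
}
```

On a real preview stream this per-pixel loop would be too slow, which is one reason the work is normally done in a shader.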

Beauty filters and a variety of simple video effects can also be implemented with the GPUImage framework.

3. Textures

A texture is one or more two-dimensional images that describe the surface of an object. Mapped onto a surface in a specific way, textures make objects look more realistic. In today's graphics systems, texture rendering is an essential technique. Texture mapping can be understood as applying pixel colors to the surface of an object. In the real world, texture covers an object's color, pattern and feel. In graphics, a texture represents only the colored pattern on the surface; it does not change the object's geometry.
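To make "applying pixel colors to a surface" concrete, here is a minimal sketch of a texture lookup: given normalized coordinates (u, v) in the range 0 to 1, return the color of the nearest texel. GPUs also offer filtered lookups, which are left out here. The class is illustrative and is not an OpenGL API.

```java
/** Sketch: nearest-neighbour texture lookup with normalized (u, v) coordinates. */
final class Texture2D {
    private final int[] texels;   // row-major colors
    private final int width;
    private final int height;

    Texture2D(int[] texels, int width, int height) {
        if (texels.length != width * height) {
            throw new IllegalArgumentException("texel count does not match the size");
        }
        this.texels = texels;
        this.width = width;
        this.height = height;
    }

    /** u runs left to right, v runs top to bottom; values outside 0..1 clamp to the edge. */
    int sample(double u, double v) {
        int x = clamp((int) Math.floor(u * width), width - 1);
        int y = clamp((int) Math.floor(v * height), height - 1);
        return texels[y * width + x];
    }

    private static int clamp(int value, int max) {
        return Math.max(0, Math.min(max, value));
    }
}
```

Rendering a textured surface repeats this lookup for every pixel the surface covers, which changes the colors drawn but not the shape.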

4. OpenGL ES

OpenGL ES is a 3D (and also 2D) graphics API for embedded devices such as mobile phones, PDAs and game consoles. Its powerful rendering ability makes it a good choice for graphics processing on embedded devices. Common scenarios:

  • Image processing, such as tone conversion and beauty filters.
  • Camera preview effects, such as beauty cameras and novelty cameras.
  • Video processing. If camera preview effects are possible, video processing naturally is too.
  • 3D games, such as Temple Run and city racing games.

5. Audio and video coding and packaging

1. Why encode audio and video?

Raw audio and video data is huge and hard to store and transmit. To make storage and transmission practical (or for purposes such as encryption), the data has to be compressed, and the technology for compressing audio and video data is audio and video coding. The goal of encoding is to get the most compression for the least loss of image or audio information. Decoding is the reverse: its goal is to restore the original image or audio as faithfully as possible. The point of a codec is to make data easier to transmit and store.
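A quick calculation shows how large "huge" is. The sketch below assumes YUV 4:2:0 video (12 bits per pixel) and linear PCM audio. The class name is illustrative.

```java
/** Sketch: size of uncompressed video (YUV 4:2:0) and audio (PCM). */
final class RawMediaSize {
    private RawMediaSize() {}

    /** One full-resolution luma plane plus two quarter-resolution chroma planes. */
    static long yuv420FrameBytes(int width, int height) {
        return (long) width * height * 3 / 2;
    }

    static long videoBytesPerSecond(int width, int height, int framesPerSecond) {
        return yuv420FrameBytes(width, height) * framesPerSecond;
    }

    static long pcmBytesPerSecond(int sampleRate, int channels, int bitsPerSample) {
        return (long) sampleRate * channels * bitsPerSample / 8;
    }
}
```

For 1080p at 30 frames per second this gives 93,312,000 bytes per second of video (roughly 5.6 GB per minute), and 44.1 kHz 16-bit stereo adds 176,400 bytes per second of audio.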

2. Types of codec (hardware coding, software coding)

  • Soft encoding: encodes on the CPU. It is direct and simple, parameters are easy to adjust and it is easy to upgrade, but it loads the CPU and performs worse than hard encoding. At low bit rates its quality is usually better than hard encoding.

  • Hard encoding: encodes on non-CPU hardware such as a GPU, a dedicated DSP, an FPGA or an ASIC. Performance is high, but at low bit rates the quality is usually lower than a software encoder's. Some products, however, port good software encoding algorithms (such as x264) to GPU hardware, and their quality is basically the same as soft encoding.

  • Soft decoding: decodes using the CPU's computing power. If the CPU is not very powerful, decoding is relatively slow and the phone may heat up. The advantage is good compatibility, because the same algorithm runs everywhere.

  • Hard decoding: uses a dedicated decoding chip on the phone to speed up decoding. It is usually much faster, but because each manufacturer implements it differently, quality is uneven and compatibility problems are common.

Before Android 4.1 there was no hardware codec API, so open source libraries such as the well-known FFmpeg were used for soft encoding and decoding. Under normal circumstances, hard encoding is faster than soft encoding on the same platform and hardware. Soft encoding runs on the CPU, so it takes computing time away from the app. On Android 4.1 and above, MediaCodec gives access to the underlying media codecs and supports hard encoding and hard decoding.

The Android API has a MediaCodec class for hardware encoding, which is efficient and easy to use, and a MediaMuxer class for combining audio and video. However, MediaCodec requires at least API level 16 and MediaMuxer at least API level 18. With soft encoding, the API level restrictions are much looser.
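These two API levels translate directly into a capability check. The following is a minimal sketch with hypothetical enum and method names. On a device, the argument would come from Build.VERSION.SDK_INT.

```java
/** Sketch: decide which encoding path is available from the device's API level. */
final class CodecChoice {
    private CodecChoice() {}

    enum Path {
        SOFTWARE_ONLY,                 // no MediaCodec: use a software library such as FFmpeg
        HARDWARE_CODEC_OTHER_MUXER,    // MediaCodec available, MediaMuxer not yet
        HARDWARE_CODEC_AND_MUXER       // both MediaCodec and MediaMuxer available
    }

    static Path forApiLevel(int apiLevel) {
        if (apiLevel >= 18) return Path.HARDWARE_CODEC_AND_MUXER;   // MediaMuxer needs API 18
        if (apiLevel >= 16) return Path.HARDWARE_CODEC_OTHER_MUXER; // MediaCodec needs API 16
        return Path.SOFTWARE_ONLY;
    }
}
```

The check only tells you that the APIs exist. Whether a particular device's hardware encoder behaves well is a separate question.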

The recommendation is to use hard encoding on Android 4.1 and above and soft encoding below it, and to use hard encoding throughout on iOS. For playback decoding, use a soft decoding scheme on both Android and iOS. This inevitably costs power, but it performs better in some details and gives more control, better compatibility and fewer errors.

3. Feasible scheme for video coding and encapsulation

  • The first is the well-known FFmpeg. FFmpeg is ported to the Android platform, compiled into .so files and called through JNI. It can separate, cut and splice audio and video, and add subtitles, filters and more.

  • The second is the MediaCodec framework. At the bottom, MediaCodec calls the StageFright library, which ships with the Android system by default.

  • The third applies if you only need to mux audio and video: use mp4parser, or use the MediaCodec class to encode video to H.264 and audio to AAC and then mux them into an MP4 with MediaMuxer.

4. Advantages and disadvantages of the feasible scheme

Feature coverage:

FFmpeg is without doubt first. It combines video codecs, video filters, stream pushing, audio effects and more; basically any function you can think of is in there. Second is Android's own MediaCodec. Together with its companion classes, MediaCodec covers the whole flow of audio/video demuxing, audio decoding, video decoding, audio encoding, video encoding and audio/video muxing. Compared with FFmpeg, MediaCodec is closer to the underlying hardware. With this approach, video filters, subtitles and splicing require OpenGL ES as well. Splicing audio and video also means dealing with resampling between different audio sample rates, which requires understanding discrete signal transforms such as the Fourier transform. Audio effects such as voice changing and equalizers need the same knowledge. For these reasons few companies adopt it. mp4parser can mux and edit already encoded audio and video.

Learning threshold:

For video transcoding, text overlays, image effects and the like, FFmpeg and MediaCodec are on par, and mp4parser has the lowest barrier (though there is little documentation on mp4parser, so this may not hold). For splicing video, changing audio and equalizer effects, MediaCodec is the hardest, because you have to build everything from the bottom up.

Operating efficiency:

MediaCodec is the fastest, FFmpeg comes next (note that FFmpeg has integrated MediaCodec since May 2017 and is no longer slow), and mp4parser is the slowest.

Stability:

MediaCodec and FFmpeg are roughly on par, while mp4parser can stutter on low-end devices.

Package size:

Even the best FFmpeg-based solutions in China ship .so files of 10+ MB. A MediaCodec-based solution is pure Java code, so it can easily be kept to a few hundred KB or even a few dozen KB. mp4parser is also pure Java, and its package is also very small.

5. FFmpeg

FFmpeg is a well-known video codec solution written in C. It is very powerful and includes video capture, video format conversion, video frame grabbing, video watermarking and more. Many Chinese companies have ported FFmpeg to iOS and Android for video processing, such as Meipai and Miaopai. Most current video SDKs wrap FFmpeg to transcode, compress and cut video. The advantage is that FFmpeg is relatively mature and supports many video formats. The disadvantages are:

  • It is slow. Processing video data on the CPU is soft decoding, which is not efficient.
  • It increases the package size. A good SDK (such as the Alibaba Cloud short video SDK) is typically about 20 MB, so integrating it noticeably increases the size of the application.

In essence, FFmpeg can be seen as a collection of media processing tools, such as container format parsers and codecs. Each of these tools is actually a library, and the FFmpeg command-line program is a wrapper that calls these libraries. Some of the libraries are optional at compile time, and FFmpeg also supports external libraries such as x264 and MediaCodec. FFmpeg provides a large number of codecs and a rich set of media operations, so it supports many media types. Because so many processing functions are already provided and only need to be called, many editing features are relatively simple to develop. Application scenario: multiple platforms (such as phones with chips from different manufacturers) and short recording sessions.

6. MediaCodec

MediaCodec offers a relatively narrow set of functions, mostly related to encoding and decoding. Take the whole video transcoding process as an example. It needs roughly these steps: demux -> decode -> filter processing -> encode -> mux. MediaCodec provides only the codec steps; the others need other components such as MediaExtractor and MediaMuxer. But MediaCodec provides hardware encoding and decoding, and the benefit is significant: high efficiency and much lower CPU usage. Without hardware codecs, many transcoding jobs take unbearably long and are simply unusable in an app. If a very short video takes several minutes to transcode and the phone gets hot, the experience is certainly poor. The disadvantage of MediaCodec is that it depends on the device to some extent. Its hardware codecs are provided by the manufacturer, and Android hardware varies widely, so the implementations naturally differ. As a result, the same program may run normally on some devices and fail on others, and you have to handle compatibility yourself.

Application scenario: a fixed hardware platform with no need for porting (such as smart home products) and long recording sessions.

7. FFmpeg versus MediaCodec

A simple analogy: FFmpeg is like a toolbox, while MediaCodec is like a powerful but relatively limited and inflexible tool.

  1. FFmpeg also supports MediaCodec. After compiling a suitable library, MediaCodec can be called through FFmpeg's API, but only for decoding.
  2. MediaCodec does not only mean hardware codecs; it is better regarded as a service. Manufacturers register their codec implementations with the service in advance, and users call the matching codec through the service when needed. MediaCodec supports both hardware and software codecs, and you can choose which codec to use.
  3. FFmpeg uses MediaCodec in much the same way Java code does: it calls MediaCodec's methods through JNI CallXXXMethod functions. FFmpeg wraps these operations so that MediaCodec can be driven through FFmpeg's normal codec flow.
  4. MediaCodec is not a codec in itself. It gets its codec capability by calling the underlying codec components.

Open source solutions

There are many free FFmpeg-based software solutions on github.com, such as EpMedia; no hardware-encoding solution has been seen there. Paid commercial SDKs also exist (Qupai, for example), but their annual fees are rather expensive.

There are also free open source solutions based on MediaCodec, such as M4M and VideoTranscoder. These look powerful, but you will hit many pitfalls in real use, so they are only suitable for studying how MediaCodec works.

6. Audio and video decoding

Protocol parsing turns the data of a streaming media protocol into data in a standard container format. When video and audio are transmitted over a network, streaming protocols such as HTTP, RTMP or MMS are usually used. Besides the video and audio data, these protocols also carry signaling data, such as playback control (play, pause, stop) or descriptions of the network state. Protocol parsing removes the signaling data and keeps only the audio and video data. For example, data transmitted over RTMP comes out of protocol parsing as FLV data.

1. Decapsulation

Decapsulation (demuxing) separates the input container data into a compressed audio stream and a compressed video stream. There are many container formats, such as MP4, MKV, RMVB, TS, FLV and AVI. A container's job is to pack already compressed video and audio data together in a defined layout. For example, demuxing FLV data outputs an H.264 video stream and an AAC audio stream.
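Taking FLV as the example, demuxing means walking the file tag by tag and routing each tag by its type. An FLV stream starts with a 9-byte header ("FLV", version, flags, header size), and every tag has an 11-byte header whose first byte holds the type: 8 for audio, 9 for video, 18 for script data. The sketch below only counts tags and does not inspect the payload codecs. The class name is illustrative.

```java
/** Sketch: walk the tags of an FLV byte stream and separate audio from video. */
final class FlvDemux {
    private FlvDemux() {}

    static boolean hasAudio(byte[] flv) { checkSignature(flv); return (flv[4] & 0x04) != 0; }

    static boolean hasVideo(byte[] flv) { checkSignature(flv); return (flv[4] & 0x01) != 0; }

    /** Returns {audioTagCount, videoTagCount}. A real demuxer would hand out the payloads. */
    static int[] countTags(byte[] flv) {
        checkSignature(flv);
        int pos = readU32(flv, 5) + 4;          // skip the header and the first PreviousTagSize
        int audio = 0;
        int video = 0;
        while (pos + 11 <= flv.length) {
            int type = flv[pos] & 0x1F;         // low 5 bits: 8 = audio, 9 = video, 18 = script
            int dataSize = readU24(flv, pos + 1);
            if (type == 8) audio++;
            else if (type == 9) video++;
            pos += 11 + dataSize + 4;           // tag header, payload, trailing PreviousTagSize
        }
        return new int[] {audio, video};
    }

    private static void checkSignature(byte[] flv) {
        if (flv.length < 9 || flv[0] != 'F' || flv[1] != 'L' || flv[2] != 'V') {
            throw new IllegalArgumentException("not an FLV stream");
        }
    }

    private static int readU24(byte[] b, int off) {
        return (b[off] & 0xFF) << 16 | (b[off + 1] & 0xFF) << 8 | (b[off + 2] & 0xFF);
    }

    private static int readU32(byte[] b, int off) {
        return (b[off] & 0xFF) << 24 | readU24(b, off + 1);
    }
}
```

In a full demuxer, each audio payload would go to the AAC decoder's input queue and each video payload to the H.264 decoder's, together with the tag's timestamp.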

2. Decoding

Decoding turns compressed video/audio data into uncompressed raw video/audio data. Audio compression standards include AAC, MP3 and AC-3; video compression standards include H.264, MPEG-2 and VC-1. Decoding is the most important and most complex part of the whole system. Through decoding, compressed video data is output as uncompressed color data such as YUV420P or RGB, and compressed audio data is output as uncompressed audio samples such as PCM.
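YUV and RGB are two descriptions of the same color, and displays ultimately need RGB. As a minimal sketch, here is the per-pixel conversion, assuming full-range BT.601 coefficients (the JPEG/JFIF convention). Video decoders often use limited-range values or BT.709 instead, so the constants are an assumption for illustration.

```java
/** Sketch: convert one full-range BT.601 YUV sample to a packed ARGB pixel. */
final class YuvToRgb {
    private YuvToRgb() {}

    static int toArgb(int y, int u, int v) {
        double cb = u - 128;   // chroma is stored with an offset of 128
        double cr = v - 128;
        int r = clamp(Math.round(y + 1.402 * cr));
        int g = clamp(Math.round(y - 0.344136 * cb - 0.714136 * cr));
        int b = clamp(Math.round(y + 1.772 * cb));
        return 0xFF000000 | (r << 16) | (g << 8) | b;
    }

    private static int clamp(long value) {
        return (int) Math.max(0, Math.min(255, value));
    }
}
```

With neutral chroma (U = V = 128) the result is a gray whose level equals Y, which makes for a quick sanity check of any conversion routine.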

3. Audio and video synchronization

Audio and video synchronization uses the parameter information obtained during demuxing to keep the decoded video and audio in step, and then sends them to the system's graphics card and sound card for playback.
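The article does not say which synchronization strategy is used. A common one, assumed here, is to treat the audio clock as the master and compare each video frame's presentation timestamp against it. The sketch below shows that decision; the names and the tolerance parameter are hypothetical.

```java
/** Sketch: audio-master sync, deciding what to do with the next decoded video frame. */
final class AvSync {
    private AvSync() {}

    enum Action { WAIT, RENDER, DROP }

    /**
     * @param videoPtsUs   presentation time of the frame, in microseconds
     * @param audioClockUs position of the audio currently being played, in microseconds
     * @param toleranceUs  how far apart the two may drift before correcting
     */
    static Action decide(long videoPtsUs, long audioClockUs, long toleranceUs) {
        long diff = videoPtsUs - audioClockUs;
        if (diff > toleranceUs) return Action.WAIT;    // frame is early: hold it back
        if (diff < -toleranceUs) return Action.DROP;   // frame is late: skip it to catch up
        return Action.RENDER;                          // close enough: show it now
    }
}
```

Audio is usually chosen as the master because small gaps or repeats in sound are more noticeable than a dropped video frame.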

4. Hard and soft decoding

Hard decoding

Literally, decoding with hardware: HD video is decoded through the graphics card's video acceleration. You can think of it as a dedicated circuit for video decoding that relies on the GPU. The GPU's dedicated module is called to decode, which reduces CPU computation. The GPU has its own way of computing and decodes very efficiently, which not only relieves the CPU but also means low power consumption and less heat.

However, hard decoding started late, so software and driver support for it is weak. Basically, a hardware decoder can only decode the kinds of video its built-in modules were made for, so compatibility with the many video encodings found on the Internet is poor. In addition, filters, subtitles and picture quality are not ideal with hard decoding.

On Android devices the most common chips come from Qualcomm, HiSilicon and MediaTek. Most of these chips integrate many functions, including CPU, GPU, DSP and ISP, as well as video and audio decoding. To use the hardware decoders on Android you can use MediaCodec directly. MediaPlayer also decodes in hardware, but it is too heavily encapsulated and supports very few protocols. MediaCodec is very extensible: hardware decoding can be customized for the streaming protocol and the device hardware. The representative player is Google's ExoPlayer.

Soft decoding

Literally, decoding with software. It still runs on hardware, of course, and that hardware is the CPU. Soft decoding has to compute a large amount of video information, so it places high demands on CPU performance. For high-resolution, high-bit-rate video in particular, the huge amount of computation leads to low efficiency and a lot of heat. The most common open source library for soft decoding is FFmpeg, and an open source player built on it is Bilibili's ijkplayer. On the other hand, soft decoding needs little hardware support and is highly compatible. Even when a new video encoding format appears, it can be played smoothly as long as the corresponding decoder is installed. Soft decoding also offers rich filters, subtitles and picture optimizations, so with a strong enough CPU it can deliver a better picture.

5. Summary of hard and soft decoding

When the Android device supports hardware decoding for the content, prefer hardware decoding to reduce CPU usage and save power. When it does not, fall back to soft decoding, so the video can at least be played. This gives better adaptability at the cost of more CPU use and more power. Combining hard and soft decoding is the best approach.
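The "hardware first, software as fallback" rule can be sketched as a loop over decoder candidates in preference order. The `Decoder` interface and the `fake` factory below are hypothetical stand-ins; in a real player the candidates would wrap MediaCodec and a software decoder such as FFmpeg.

```java
/** Sketch: try decoders in preference order and fall back when one rejects the format. */
final class DecoderSelector {
    private DecoderSelector() {}

    interface Decoder {
        String name();

        /** Throws a RuntimeException when this decoder cannot handle the format. */
        void configure(String mimeType);
    }

    static Decoder open(String mimeType, Decoder... inPreferenceOrder) {
        for (Decoder candidate : inPreferenceOrder) {
            try {
                candidate.configure(mimeType);
                return candidate;              // first decoder that accepts the format wins
            } catch (RuntimeException e) {
                // This candidate cannot handle the format; try the next one.
            }
        }
        throw new IllegalStateException("no decoder accepts " + mimeType);
    }

    /** Hypothetical stand-in decoder, used only to demonstrate the fallback. */
    static Decoder fake(final String name, final String... supportedMimeTypes) {
        return new Decoder() {
            @Override public String name() { return name; }

            @Override public void configure(String mimeType) {
                for (String supported : supportedMimeTypes) {
                    if (supported.equals(mimeType)) return;
                }
                throw new IllegalArgumentException(name + " cannot decode " + mimeType);
            }
        };
    }
}
```

Calling `DecoderSelector.open("video/hevc", hardware, software)` returns the software decoder whenever the hardware one rejects the format, so playback still starts.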

7. Android platform audio and video decoder player selection

Playing video on Android is easy: use MediaPlayer with a SurfaceView, or a VideoView, and set the path of the video file. However, if you want to process the audio and video further, for example to add a watermark during playback or to transcode the video, you need to work with the codecs yourself.

1. MediaPlayer

Android has a native MediaPlayer for video playback, as well as VideoView, which wraps MediaPlayer and SurfaceView together. Both use hardware decoding only and basically support only local and HTTP video, with poor extensibility. They are suitable only for the simplest playback needs.

As the native Android player it supports few formats (MP4, 3GP and, in some cases, MKV). It is easy to use but hard to extend. No third-party library has to be integrated, so it adds nothing to the APK size.

2. ijkplayer

An open source player from Bilibili, built on FFmpeg, with fairly powerful features. If you only use it for playback, integration is relatively simple and usage is much like MediaPlayer, but customizing it has a certain barrier to entry. It supports hard and soft decoding and variable-speed playback, the integrated features can be tailored to what you need, and its footprint is small.

3. ExoPlayer (largely translated from the official documentation)

Made by Google and the recommended player. Like RecyclerView, it is highly customizable. It also supports the DASH and HLS streaming protocols, but only hardware decoding. If the project only needs to play H.264 video and the streaming protocol is a common one (such as HTTP or HLS), customizing ExoPlayer is a good choice.

Advantages:

  • Supports Dynamic Adaptive Streaming over HTTP (DASH) and SmoothStreaming, neither of which MediaPlayer supports. Many other formats are supported as well, including HTTP Live Streaming (HLS), MP4, MP3, WebM, M4A, MPEG-TS and AAC.
  • Supports advanced HLS features, such as correct handling of #EXT-X-DISCONTINUITY tags.
  • Can seamlessly merge, concatenate and loop media.
  • Supports custom use cases. ExoPlayer is designed for this and lets many components be replaced with custom implementations. It is built on low-level media APIs such as MediaCodec, AudioTrack and MediaDrm, which can be used to build custom playback solutions.
  • Is integrated as a third-party dependency, so it can be upgraded along with the application.
  • Has fewer adaptation problems: fewer device-specific issues and less variation in behavior across devices and Android versions.
  • Can be extended with FFmpeg.

Disadvantages:

  • Consumes more power than MediaPlayer, although from Android Q onward power consumption can be reduced by enabling audio offload.
  • The minimum API level is 16. Earlier versions do not automatically detect the format of the media being played; later versions do.

4. Feature comparison

Feature support

| Feature | MediaPlayer | IjkPlayer | ExoPlayer |
|---|---|---|---|
| Adjust display ratio | Supported | Supported | Supported |
| Swipe to adjust playback position, volume and brightness | Supported | Supported | Supported |
| Double-tap to play and pause | Supported | Supported | Supported |
| Enter/exit full screen automatically (gravity sensor) and manually | Supported | Supported | Supported |
| Variable-speed playback | Not supported | Supported | Supported |
| Video screenshots (not supported with SurfaceView; TextureView is used by default) | Supported | Supported | Supported |
| Small floating window for lists, global floating playback | Supported | Supported | Supported |
| Continuous playback of a video list | Supported | Supported | Supported |
| Ad playback | Supported | Supported | Supported |
| Cache while playing, implemented with AndroidVideoCache | Supported | Supported | Supported |
| Danmaku (bullet comments), implemented with DanmakuFlameMaster | Supported | Supported | Supported |
| Multiple players playing simultaneously | Supported | Supported | Supported |
| Bare playback without any control UI | Supported | Supported | Supported |
| Android 8.0 picture-in-picture | Supported | Supported | Supported |
| Seamless playback | Supported | Supported | Supported |
| Douyin-style playback | Supported | Supported | Supported |

Protocol/format support (only common formats and protocols are listed)

| Protocol/Format | MediaPlayer | IjkPlayer | ExoPlayer |
|---|---|---|---|
| https | Supported | Supported | Supported |
| rtsp | Not supported | Supported | Not supported |
| rtmp | Not supported | Supported | Supported |
| ffconcat | Not supported | Supported | Not supported |
| file (local video) | Supported | Supported | Supported |
| android.resource (raw) | Supported | Supported | Supported |
| Videos in assets | Supported | Supported | Supported |
| mp4 | Supported | Supported | Supported |
| m3u8 | Supported | Supported | Supported |
| flv | Supported | Supported | Plays, but cannot seek |
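The protocol rows of this table can be turned into a simple lookup for picking a player at run time. The sketch below encodes the table exactly as listed, for URL schemes only. Support changes between library versions, so treat the values as a snapshot. The class and enum names are hypothetical.

```java
/** Sketch: the protocol rows of the comparison table as a lookup by URL scheme. */
final class PlayerSupport {
    private PlayerSupport() {}

    enum Player { MEDIA_PLAYER, IJK_PLAYER, EXO_PLAYER }

    static boolean supports(Player player, String url) {
        if (hasScheme(url, "rtsp")) return player == Player.IJK_PLAYER;
        if (hasScheme(url, "rtmp")) return player != Player.MEDIA_PLAYER;
        if (hasScheme(url, "https") || hasScheme(url, "file") || hasScheme(url, "android.resource")) {
            return true;                       // listed as supported by all three players
        }
        return false;                          // scheme is not covered by the table
    }

    /** Case-insensitive check for a "scheme://" prefix. */
    private static boolean hasScheme(String url, String scheme) {
        String prefix = scheme + "://";
        return url.regionMatches(true, 0, prefix, 0, prefix.length());
    }
}
```

An app that must play RTSP streams would, by this table, have to pick IjkPlayer, while plain HTTPS playback leaves all three open.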

5. Advice

  1. For simple playback scenarios, such as short videos that are all MP4 (H.264/AAC), ExoPlayer is recommended. There is nothing better for this.
  2. For scenarios involving many forms of video, such as live broadcast and long videos, it is still recommended to bring in ijkplayer or another soft-decoding player.
  3. On Android, if package size is not a concern, VLC is recommended: it is updated frequently and the official maintenance is strong. If package size matters, ijkplayer is recommended. Its current drawback is that it is not maintained very often.
  4. In the long run, many players in China will gradually evolve out of ijkplayer: teams remove the code that does not suit their product, add the modules they need, and end up with their own player.

6. Open source audio and video player UI solutions

GSYVideoPlayer, DKVideoPlayer
