AVFoundation

  • Capture session: AVCaptureSession

  • Capture device: AVCaptureDevice

  • Capture device input: AVCaptureDeviceInput

  • Capture device output: the AVCaptureOutput abstract class and its concrete subclasses:

    • AVCaptureStillImageOutput
    • AVCaptureMovieFileOutput
    • AVCaptureAudioDataOutput
    • AVCaptureVideoDataOutput
  • Capture connection: AVCaptureConnection

  • Capture preview: AVCaptureVideoPreviewLayer

Set up the Session

  • Initialize the AVCaptureSession
  • Set the resolution (session preset)
  • Configure the input devices, including audio input and video input (note that an AVCaptureDevice must be wrapped in an AVCaptureDeviceInput before it can be added)
  • Configure the outputs (still image output, movie file output, data outputs)
  • When adding an input or output to the session, always check first whether it can be added (canAddInput: / canAddOutput:). The reason: the camera does not belong to any single app, it is a shared device, so the request may fail. A minimal setup sketch follows this list.
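As a rough illustration of these steps, here is a minimal setup sketch. It assumes a view controller with captureSession and captureQueue properties and the setupAudio / setupVideo methods shown in full in the Audio input and Video input sections below; the queue label, preset, and preview handling are illustrative.

@property (nonatomic, strong) AVCaptureSession *captureSession;
@property (nonatomic, strong) dispatch_queue_t captureQueue;
@property (nonatomic, strong) AVCaptureVideoPreviewLayer *previewLayer;

- (void)setupSession {
    // 1. Initialize the session and the queue used by the data-output delegates
    self.captureSession = [[AVCaptureSession alloc] init];
    self.captureQueue = dispatch_queue_create("com.example.capture", DISPATCH_QUEUE_SERIAL);

    // 2. Set the resolution
    if ([self.captureSession canSetSessionPreset:AVCaptureSessionPreset1280x720]) {
        self.captureSession.sessionPreset = AVCaptureSessionPreset1280x720;
    }

    // 3-5. Configure inputs and outputs; every addInput:/addOutput: call inside
    // these methods is guarded by canAddInput:/canAddOutput:
    [self setupAudio];
    [self setupVideo];

    // 6. Attach a preview layer so the camera image can be displayed
    self.previewLayer = [AVCaptureVideoPreviewLayer layerWithSession:self.captureSession];
    self.previewLayer.frame = self.view.bounds;
    [self.view.layer addSublayer:self.previewLayer];

    [self.captureSession startRunning];
}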

In iOS, sessions come up constantly: whenever we use a hardware device, we go through the corresponding session. The microphone uses an audio session (AVAudioSession), the camera uses an AVCaptureSession, hardware encoding uses a VTCompressionSession, and decoding uses a VTDecompressionSessionRef.

Capture device input

Collect video and audio

  • Use the iOS native framework AVFoundation.framework

Video filter processing

  • Use the iOS native framework CoreImage.framework
  • Use the third-party GPUImage framework

Comparison between the CoreImage and GPUImage frameworks:

In real-world projects, developers tend to prefer the GPUImage framework.

First, its performance is on par with the native framework provided by iOS. Second, it is more convenient to use than the native framework. Finally, and most importantly, GPUImage is open source. If you want to study GPUImage, it is worth learning OpenGL ES first, since GPUImage's abstractions and design are built on top of OpenGL ES.
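For reference, here is a minimal CoreImage sketch (the filter choice and method name are illustrative) that applies a sepia filter to an image; GPUImage exposes a similar filter-chain API implemented on top of OpenGL ES.

#import <UIKit/UIKit.h>
#import <CoreImage/CoreImage.h>

// Apply a sepia-tone filter to a UIImage using CoreImage
- (UIImage *)applySepiaToImage:(UIImage *)inputImage {
    CIImage *ciImage = [CIImage imageWithCGImage:inputImage.CGImage];

    CIFilter *filter = [CIFilter filterWithName:@"CISepiaTone"];
    [filter setValue:ciImage forKey:kCIInputImageKey];
    [filter setValue:@0.8 forKey:kCIInputIntensityKey];

    CIImage *outputImage = filter.outputImage;
    CIContext *context = [CIContext contextWithOptions:nil];
    CGImageRef cgImage = [context createCGImage:outputImage fromRect:outputImage.extent];

    UIImage *result = [UIImage imageWithCGImage:cgImage];
    CGImageRelease(cgImage);
    return result;
}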

Audio input

/********** Audio-related **********/
// Audio input device
@property (nonatomic, strong) AVCaptureDeviceInput *audioInputDevice;
@property (nonatomic, strong) AVCaptureAudioDataOutput *audioDataOutput;
// Audio capture connection
@property (nonatomic, strong) AVCaptureConnection *audioConnection;

#pragma mark - Init audio
- (void)setupAudio {
    // Default audio device
    AVCaptureDevice *audioDevice = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeAudio];
    // Wrap the audioDevice in an AVCaptureDeviceInput object
    self.audioInputDevice = [AVCaptureDeviceInput deviceInputWithDevice:audioDevice error:nil];
    self.audioDataOutput = [[AVCaptureAudioDataOutput alloc] init];
    [self.audioDataOutput setSampleBufferDelegate:self queue:self.captureQueue];

    [self.captureSession beginConfiguration];
    if ([self.captureSession canAddInput:self.audioInputDevice]) {
        [self.captureSession addInput:self.audioInputDevice];
    }
    if ([self.captureSession canAddOutput:self.audioDataOutput]) {
        [self.captureSession addOutput:self.audioDataOutput];
    }
    [self.captureSession commitConfiguration];

    self.audioConnection = [self.audioDataOutput connectionWithMediaType:AVMediaTypeAudio];
}

Video input

/********** Video-related **********/
// Video input device currently in use
@property (nonatomic, weak) AVCaptureDeviceInput *videoInputDevice;
@property (nonatomic, strong) AVCaptureDeviceInput *frontCamera;
@property (nonatomic, strong) AVCaptureDeviceInput *backCamera;
@property (nonatomic, strong) AVCaptureVideoDataOutput *videoDataOutput;
// Video capture connection
@property (nonatomic, strong) AVCaptureConnection *videoConnection;

- (void)setupVideo {
    // All video devices
    NSArray *videoDevices = [AVCaptureDevice devicesWithMediaType:AVMediaTypeVideo];
    // Front camera
    self.frontCamera = [AVCaptureDeviceInput deviceInputWithDevice:videoDevices.lastObject error:nil];
    // Back camera
    self.backCamera = [AVCaptureDeviceInput deviceInputWithDevice:videoDevices.firstObject error:nil];
    // Use the back camera as the current device
    self.videoInputDevice = self.backCamera;

    self.videoDataOutput = [[AVCaptureVideoDataOutput alloc] init];
    [self.videoDataOutput setSampleBufferDelegate:self queue:self.captureQueue];
    [self.videoDataOutput setAlwaysDiscardsLateVideoFrames:YES];
    // kCVPixelBufferPixelFormatTypeKey specifies the pixel output format;
    // this setting directly affects the generated image.
    // kCVPixelFormatType_420YpCbCr8BiPlanarFullRange is a YUV420 format.
    [self.videoDataOutput setVideoSettings:@{
        (__bridge NSString *)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)
    }];

    [self.captureSession beginConfiguration];
    if ([self.captureSession canAddInput:self.videoInputDevice]) {
        [self.captureSession addInput:self.videoInputDevice];
    }
    if ([self.captureSession canAddOutput:self.videoDataOutput]) {
        [self.captureSession addOutput:self.videoDataOutput];
    }
    // Resolution
    [self setVideoPreset];
    [self.captureSession commitConfiguration];

    // The code below only takes effect after commitConfiguration
    self.videoConnection = [self.videoDataOutput connectionWithMediaType:AVMediaTypeVideo];
    // Set the video output orientation
    self.videoConnection.videoOrientation = AVCaptureVideoOrientationPortrait;

    /* FPS is the number of frames transmitted per second; it measures the amount of
       information used to store and display motion video. The more frames per second,
       the smoother the motion. In general, 30 fps is the minimum to avoid choppy
       movement; some computer video formats provide only 15 fps. */
    [self updateFps:25];

    // Set up the preview
    [self setupPreviewLayer];
}

Capture device output

To capture video content to disk, add an AVCaptureMovieFileOutput to the capture session. This class defines methods for capturing QuickTime movies to disk; most of its core functionality is inherited from its superclass, AVCaptureFileOutput. The superclass provides a number of useful options, such as recording up to a maximum duration or a maximum file size, and it can also be configured to keep a minimum amount of free disk space. This is important when recording video on mobile devices with limited storage.

Normally, when a QuickTime movie is ready for distribution, the metadata in the movie header sits at the beginning of the file. This lets a video player quickly read the header and determine the content, structure, and location of the samples the file contains.

When recording a QuickTime movie, however, the header cannot be created until all the samples have been captured. When recording ends, the header data is created and appended to the end of the file.

Writing the header only after all the movie samples have been captured is a problem, especially on mobile devices. If the app crashes or is interrupted, for example by a phone call, the header is never written correctly and an unreadable movie file is left on disk. One of the core features of AVCaptureMovieFileOutput is therefore its ability to record QuickTime movies in fragments.

When recording begins, a minimal header is written at the front of the file, and as recording progresses, fragments are written periodically to build up the complete header. By default a fragment is written every 10 seconds, although this interval can be changed through the output's movieFragmentInterval property. Writing fragments this way assembles a complete QuickTime movie header step by step, which ensures that, if the application crashes or is interrupted, the movie still ends at the last fully written fragment. This demo uses the default interval, but you can change the value in your app if needed.
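A minimal configuration sketch (the captureSession property, the limit values, and the output URL are illustrative) showing the limits and fragment interval discussed above:

@property (nonatomic, strong) AVCaptureMovieFileOutput *movieOutput;

- (void)setupMovieOutput {
    self.movieOutput = [[AVCaptureMovieFileOutput alloc] init];

    // Stop automatically after 10 minutes of recording
    self.movieOutput.maxRecordedDuration = CMTimeMake(600, 1);
    // Stop when roughly 50 MB of free disk space remains
    self.movieOutput.minFreeDiskSpaceLimit = 50 * 1024 * 1024;
    // Write a movie fragment every 5 seconds instead of the 10-second default
    self.movieOutput.movieFragmentInterval = CMTimeMake(5, 1);

    if ([self.captureSession canAddOutput:self.movieOutput]) {
        [self.captureSession addOutput:self.movieOutput];
    }
}

- (void)startRecordingToURL:(NSURL *)fileURL {
    // The recording delegate is notified when recording finishes or fails
    [self.movieOutput startRecordingToOutputFileURL:fileURL recordingDelegate:self];
}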

Codecs

What is encoding?

Encoding is the process of recording sampled and quantized data in a particular format.

What is the difference between soft encoding and hard encoding?

  • Hard encoding: encoding performed by dedicated hardware rather than the CPU, such as a GPU or encoder chip.

It offers high performance, but at low bit rates the quality is usually lower than a software encoder. Some products have ported excellent software encoding algorithms (such as x264) to GPU hardware and achieve quality essentially on par with software encoding.

  • Soft encoding: encoding computed on the CPU.

It is direct and simple, with parameters that are easy to adjust and easy to upgrade, but it loads the CPU, and performance is lower than hard encoding; at low bit rates the quality is usually better than hard encoding.

Hard encoding

  • Video: the VideoToolbox framework
  • Audio: the AudioToolbox framework

Soft encoding

  • Video: use FFmpeg with the x264 encoder to encode YUV/RGB video data into H264
  • Audio: use fdk-aac to convert PCM audio data to AAC

Video

VideoToolbox hard encoding

What is video encoding and decoding?

Why can video data be compressed? Because video data is redundant. For example, in a video the frame from one second ago is usually very similar to the current frame, so instead of saving every frame in full we can keep one complete frame and record only the changes in the frames that follow. When the video is played back, the complete frame plus the recorded changes are used to reconstruct the other frames. Recording and saving the differences between frames is data encoding; restoring the frames from those differences is data decoding.

H264 is a video coding standard. The H264 protocol defines three types of frames:

  • I frame: a fully coded frame, also called a key frame
  • P frame: a frame that references a previous frame and contains only the encoding of the differences
  • B frame: a frame encoded with reference to both the preceding and the following frames

What exactly is a video?

Content elements:
  • Image (Image)
  • Audio (Audio)
  • Meta information (Metadata)
Encoding format:
  • Video: H264
  • Audio: AAC
Container (video encapsulation format)
  • Encapsulation format: the encoded video data and audio data are packed into a single file according to a certain format; this file is called a container. It is essentially just a shell.
  • A container usually stores not only the audio and video data but also metadata needed for synchronization, such as subtitles. These kinds of data are handled by different programs, but they are bundled together for transfer and storage.

Common video container formats:

  • AVI: introduced to compete with the QuickTime format (MOV); supports only CBR (constant bit rate) audio
  • MOV: QuickTime encapsulation
  • WMV: launched by Microsoft to compete in the market
  • MKV: a universal container with good cross-platform compatibility, error correction, and support for external subtitles
  • FLV: this encapsulation can hide the original address and makes the video harder to download; some video-sharing websites use it
  • MP4: mainly used for MPEG-4 encapsulation, widely used on mobile devices

VideoToolbox workflow

VideoToolbox is a C-language API built on Core Foundation.

  1. Create the session -> set the encoding parameters -> start encoding -> feed source data in a loop (YUV data, straight from the camera) -> receive the encoded H264 data -> end encoding
  2. The output is encoded H264 data, which can be written to an H264 file (a sketch of this workflow follows the list)
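A minimal VideoToolbox encoding sketch of these steps (the _compressionSession instance variable, frame size, and frame rate below are illustrative assumptions):

#import <VideoToolbox/VideoToolbox.h>

// Assumed instance variable, e.g. declared in the class extension:
// VTCompressionSessionRef _compressionSession;

// Output callback: called once per encoded frame
static void compressionOutputCallback(void *outputCallbackRefCon,
                                      void *sourceFrameRefCon,
                                      OSStatus status,
                                      VTEncodeInfoFlags infoFlags,
                                      CMSampleBufferRef sampleBuffer) {
    if (status != noErr || sampleBuffer == NULL) return;
    // The encoded H264 data lives in the sample buffer's CMBlockBuffer;
    // SPS/PPS can be read from its format description (see the next section).
}

- (void)setupEncoderWithWidth:(int32_t)width height:(int32_t)height {
    // 1. Create the session
    OSStatus status = VTCompressionSessionCreate(kCFAllocatorDefault,
                                                 width, height,
                                                 kCMVideoCodecType_H264,
                                                 NULL, NULL, NULL,
                                                 compressionOutputCallback,
                                                 (__bridge void *)self,
                                                 &_compressionSession);
    if (status != noErr) return;

    // 2. Set encoding parameters
    VTSessionSetProperty(_compressionSession, kVTCompressionPropertyKey_RealTime, kCFBooleanTrue);
    VTSessionSetProperty(_compressionSession, kVTCompressionPropertyKey_ProfileLevel,
                         kVTProfileLevel_H264_Baseline_AutoLevel);
    int fps = 25;
    CFNumberRef fpsRef = CFNumberCreate(NULL, kCFNumberIntType, &fps);
    VTSessionSetProperty(_compressionSession, kVTCompressionPropertyKey_ExpectedFrameRate, fpsRef);
    CFRelease(fpsRef);

    // 3. Start encoding
    VTCompressionSessionPrepareToEncodeFrames(_compressionSession);
}

// 4. Feed each captured CVPixelBuffer (YUV straight from the camera) to the encoder;
//    the encoded data arrives asynchronously in compressionOutputCallback
- (void)encodePixelBuffer:(CVPixelBufferRef)pixelBuffer presentationTime:(CMTime)pts {
    VTCompressionSessionEncodeFrame(_compressionSession, pixelBuffer, pts,
                                    kCMTimeInvalid, NULL, NULL, NULL);
}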

Encoder input and output

The data that AVFoundation hands us in its capture callback is actually a CVPixelBuffer (CVImageBufferRef is just another name for the same type: the CVImageBuffer returned by the camera is a CVPixelBuffer). The CMSampleBuffer that VideoToolbox outputs after encoding stores its data in a CMBlockBuffer.
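A minimal sketch (continuing the encoder callback above; variable names are illustrative) of pulling the encoded bytes out of that CMSampleBuffer:

// Inside the VideoToolbox output callback, sampleBuffer is the encoded frame
CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
size_t totalLength = 0;
char *dataPointer = NULL;
OSStatus status = CMBlockBufferGetDataPointer(blockBuffer, 0, NULL, &totalLength, &dataPointer);
if (status == kCMBlockBufferNoErr) {
    // dataPointer holds the encoded NAL units in AVCC layout:
    // each NALU is prefixed with a 4-byte big-endian length instead of a start code
    size_t offset = 0;
    static const size_t headerLength = 4;
    while (offset + headerLength < totalLength) {
        uint32_t nalLength = 0;
        memcpy(&nalLength, dataPointer + offset, headerLength);
        nalLength = CFSwapInt32BigToHost(nalLength);
        // dataPointer + offset + headerLength points at one NALU of nalLength bytes;
        // replace the length prefix with 00 00 00 01 to produce an Annex B stream
        offset += headerLength + nalLength;
    }
}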

Decoding

Decoding approach

NALUs are decoded one at a time in real time, so the data has to be parsed first. In a NALU the first four bytes are the start code that marks the beginning of the NALU, and the fifth byte carries the NALU type. Convert the fifth byte to decimal and look it up in the type table: 7 is SPS, 8 is PPS, and 5 is IDR (I-frame) data. Determine the type before sending the NALU into the decoder; SPS/PPS only carry parameters and do not need to be decoded.
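To make the byte layout concrete, here is a minimal parsing sketch (the packetBuffer name is illustrative; the stream is assumed to use the 4-byte start code described above):

uint8_t *nalu = packetBuffer;        // points at the start code 00 00 00 01
int nalType = nalu[4] & 0x1F;        // the low 5 bits of the fifth byte give the NALU type
switch (nalType) {
    case 0x07:  // SPS: keep it to build the CMVideoFormatDescription, no decoding needed
        break;
    case 0x08:  // PPS: same as SPS
        break;
    case 0x05:  // IDR (I frame): wrap it in a CMSampleBuffer and send it to the decoder
        break;
    default:    // other slices (P/B frames): send them to the decoder as well
        break;
}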

  1. Parse the NALU units (I/P/B ...)
    • NALUs are decoded one at a time in real time, so the data must be parsed first: the first four bytes are the start code marking the beginning of a NALU, and the NALU type is in the fifth byte.
    • Convert the fifth byte to decimal and determine the data type from the table.
    • Determine the type before sending the NALU into the decoder; SPS/PPS only carry parameters and do not need to be decoded.
    • CVPixelBufferRef holds decoded (or not-yet-encoded) image data.
  2. Initialize the decoder.
  3. Feed the parsed H264 NALU units into the decoder.
  4. The decoder finishes and its callback outputs the decoded data.
  5. Display the decoded data (with OpenGL ES).

Decoding has three core functions:

  1. Create the session: VTDecompressionSessionCreate
  2. Decode a frame: VTDecompressionSessionDecodeFrame
  3. Destroy the decoding session: VTDecompressionSessionInvalidate (a sketch using these three calls follows)
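A minimal sketch of these three calls (the _decompressionSession instance variable and the way the CMVideoFormatDescription is built from the parsed SPS/PPS are assumptions):

#import <VideoToolbox/VideoToolbox.h>

// Assumed instance variable, e.g. declared in the class extension:
// VTDecompressionSessionRef _decompressionSession;

// Output callback: receives the decoded CVImageBufferRef (a CVPixelBuffer)
static void decompressionOutputCallback(void *decompressionOutputRefCon,
                                        void *sourceFrameRefCon,
                                        OSStatus status,
                                        VTDecodeInfoFlags infoFlags,
                                        CVImageBufferRef imageBuffer,
                                        CMTime presentationTimeStamp,
                                        CMTime presentationDuration) {
    if (status != noErr || imageBuffer == NULL) return;
    // Hand the decoded pixel buffer to the rendering layer (e.g. OpenGL ES)
}

// 1. Create the decompression session from the SPS/PPS-derived format description
- (void)setupDecoderWithFormatDescription:(CMVideoFormatDescriptionRef)formatDescription {
    VTDecompressionOutputCallbackRecord callbackRecord;
    callbackRecord.decompressionOutputCallback = decompressionOutputCallback;
    callbackRecord.decompressionOutputRefCon = (__bridge void *)self;

    NSDictionary *destinationAttributes = @{
        (__bridge NSString *)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)
    };
    VTDecompressionSessionCreate(kCFAllocatorDefault,
                                 formatDescription,
                                 NULL,
                                 (__bridge CFDictionaryRef)destinationAttributes,
                                 &callbackRecord,
                                 &_decompressionSession);
}

// 2. Decode one frame; the NALU has already been wrapped in a CMSampleBuffer
- (void)decodeSampleBuffer:(CMSampleBufferRef)sampleBuffer {
    VTDecompressionSessionDecodeFrame(_decompressionSession, sampleBuffer, 0, NULL, NULL);
}

// 3. Destroy the session when decoding is finished
- (void)teardownDecoder {
    VTDecompressionSessionInvalidate(_decompressionSession);
    CFRelease(_decompressionSession);
    _decompressionSession = NULL;
}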

Principle analysis:

The H264 source stream is made up of NALUs.

  • I frame: stores a complete video frame; the key to decoding.
  • P frame: references a previous frame and stores only the differences; decoding depends on the I frame.
  • B frame: a bidirectionally predicted frame; both the I and P frames are required for decoding.

If an I frame in the H264 stream is corrupted or lost, the error propagates: the P/B frames alone cannot complete decoding, which shows up as a garbled (mosaic) picture.

Audio

PCM (pulse code modulation)

An analog signal is converted into digital PCM data in three steps:

  1. Sampling
  2. Quantization
  3. Encoding

Principles of Audio Compression

A digital audio signal contains components that have a negligible impact on what people actually perceive; these components are redundant:

  1. Temporal redundancy
  2. Frequency-domain redundancy
  3. Auditory redundancy (humans can hear frequencies of roughly 20 Hz-20 kHz; anything outside this range can be compressed away, and this is called auditory redundancy)
  • Eliminating redundant data (lossy coding)

    The capture process records sound across all frequencies. We can discard the parts the human ear cannot hear, which greatly reduces the amount of data to store; this data can be removed directly from the source.

  • Huffman lossless coding:

    Apart from the inaudible portion that is compressed away, the rest of the audio data is preserved as is, and the compressed data can be fully restored (frequently occurring symbols get short codes, rare symbols get long codes).

Audio redundancy information

  • The main method of compression is removing redundant audio information: data outside the human hearing range and masked audio signals.
  • Masking effect: the perception of a weak sound is suppressed by another, stronger sound.
  • Masking comes in two forms: frequency-domain masking and time-domain masking.

What can AudioToolbox do?

For AAC encoding, the source format is the captured PCM data and the destination format is AAC.

  1. Set up the encoder (codec) and start recording
  2. Capture PCM data and pass it to the encoder
  3. When encoding completes, the callback is invoked and the AAC data is written to a file (a setup sketch follows this list)
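A minimal sketch of the encoder setup using AudioToolbox's AudioConverter API (the _audioConverter instance variable and the chosen PCM layout are assumptions; the actual conversion also requires an input-data callback that feeds the captured PCM to AudioConverterFillComplexBuffer):

#import <AudioToolbox/AudioToolbox.h>

// Assumed instance variable, e.g. declared in the class extension:
// AudioConverterRef _audioConverter;

- (void)setupAACEncoderWithSampleRate:(Float64)sampleRate channels:(UInt32)channels {
    // Source format: interleaved 16-bit signed PCM as delivered by the capture session
    AudioStreamBasicDescription inputFormat = {0};
    inputFormat.mSampleRate       = sampleRate;
    inputFormat.mFormatID         = kAudioFormatLinearPCM;
    inputFormat.mFormatFlags      = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
    inputFormat.mChannelsPerFrame = channels;
    inputFormat.mFramesPerPacket  = 1;
    inputFormat.mBitsPerChannel   = 16;
    inputFormat.mBytesPerFrame    = inputFormat.mBitsPerChannel / 8 * inputFormat.mChannelsPerFrame;
    inputFormat.mBytesPerPacket   = inputFormat.mBytesPerFrame * inputFormat.mFramesPerPacket;

    // Destination format: AAC
    AudioStreamBasicDescription outputFormat = {0};
    outputFormat.mSampleRate       = sampleRate;
    outputFormat.mFormatID         = kAudioFormatMPEG4AAC;
    outputFormat.mChannelsPerFrame = channels;
    outputFormat.mFramesPerPacket  = 1024;   // AAC packs 1024 PCM frames into one packet

    // Create the converter (encoder); AudioConverterNewSpecific could be used instead
    // to request a hardware or software codec explicitly
    OSStatus status = AudioConverterNew(&inputFormat, &outputFormat, &_audioConverter);
    if (status != noErr) {
        NSLog(@"Failed to create AAC encoder: %d", (int)status);
    }
}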