The simplest iOS live-streaming push code: video capture, soft encoding (FAAC, x264), hard encoding (AAC, H264), beauty filters, FLV muxing, and the RTMP protocol, with continually updated code walkthroughs. If you want to learn this material and understand live-streaming technology, read on!

Source: https://github.com/hardman/AWLive

We have described how to get audio and video data from hardware (PCM, NV12).

But the formats we need to push are AAC (audio) and H264 (video).

Now let’s see how to encode PCM to AAC and NV12 data to H264.

Encoding is divided into soft coding and hard coding.

Hard coding is provided by the system: audio and video encoding is handled by dedicated hardware built into the device, and the main computation runs on that hardware. Hard coding is fast and uses little CPU, but it is less flexible; you can only use the features it exposes.

Soft coding encodes the data in code, with the main computation running on the CPU. Soft coding is flexible, with rich and extensible features, but it consumes more CPU.

In the code, the encoder is obtained through the AWEncoderManager.

AWEncoderManager is a factory that selects the encoder type through audioEncoderType and videoEncoderType.

Encoders fall into two categories, AWAudioEncoder and AWVideoEncoder.

Audio and video encoders are divided into hard coding (in the HW directory) and soft coding (in the SW directory).

Therefore, the coding part mainly consists of four files: hard coding H264 (AWHWH264Encoder), hard coding AAC (AWHWAACEncoder), soft coding AAC (AWSWFaacEncoder), soft coding H264 (AWSWX264Encoder).
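The following is a minimal usage sketch of that layout. The type constants, the open method and the audioEncoder / videoEncoder properties are illustrative assumptions rather than the confirmed AWEncoderManager API (check the repo for the exact names); the two encode... methods are the ones walked through in the steps below.

// Hypothetical usage sketch; member names here are assumptions, see the note above.
AWEncoderManager *manager = [[AWEncoderManager alloc] init];
manager.audioEncoderType = AWAudioEncoderTypeHWAACLC;   // hard-coded AAC (or a soft FAAC type)
manager.videoEncoderType = AWVideoEncoderTypeHWH264;    // hard-coded H264 (or a soft x264 type)
[manager openWithAudioConfig:audioConfig videoConfig:videoConfig];

// The caller only talks to the AWAudioEncoder / AWVideoEncoder base classes;
// in the hard-coding case the concrete instances are AWHWAACEncoder / AWHWH264Encoder.
aw_flv_video_tag *videoTag = [manager.videoEncoder encodeYUVDataToFlvTag:yuvData];
aw_flv_audio_tag *audioTag = [manager.audioEncoder encodePCMDataToFlvTag:pcmData];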

Hard-coded H264

First step, start the hard encoder

-(void)open{
    // Create the video encode session.
    // Pass in the video width and height, and the encoding type: kCMVideoCodecType_H264.
    // vtCompressionSessionCallback: the callback that receives the encoding result; once a frame is encoded successfully, the data arrives in this callback.
    // (__bridge void * _Nullable)(self): this parameter is passed to vtCompressionSessionCallback unchanged; it is the callback's only channel for communicating with the outside world.
    // &_vEnSession: C can assign values through pointer arguments. Inside the function, memory is allocated and _vEnSession is initialized.
    OSStatus status = VTCompressionSessionCreate(NULL, (int32_t)(self.videoConfig.pushStreamWidth), (int32_t)self.videoConfig.pushStreamHeight, kCMVideoCodecType_H264, NULL, NULL, NULL, vtCompressionSessionCallback, (__bridge void * _Nullable)(self), &_vEnSession);
    if (status == noErr) {
        // Set the parameters.
        // ProfileLevel: the H264 profile/level. Different sharpness targets use different ProfileLevels.
        VTSessionSetProperty(_vEnSession, kVTCompressionPropertyKey_ProfileLevel, kVTProfileLevel_H264_Main_AutoLevel);
        // Set the bitrate.
        VTSessionSetProperty(_vEnSession, kVTCompressionPropertyKey_AverageBitRate, (__bridge CFTypeRef)@(self.videoConfig.bitrate));
        // Set real-time encoding.
        VTSessionSetProperty(_vEnSession, kVTCompressionPropertyKey_RealTime, kCFBooleanTrue);
        // Turn off frame reordering. With B frames, the encoding order may differ from the display order; this parameter disables B frames.
        VTSessionSetProperty(_vEnSession, kVTCompressionPropertyKey_AllowFrameReordering, kCFBooleanFalse);
        // The maximum interval between keyframes (I frames). This value means the maximum keyframe interval is 2s.
        VTSessionSetProperty(_vEnSession, kVTCompressionPropertyKey_MaxKeyFrameInterval, (__bridge CFTypeRef)@(self.videoConfig.fps * 2));
        // For B frames, P frames and I frames, refer to: http://blog.csdn.net/abcjennifer/article/details/6577934
        // The parameters are set; prepare to start. Once this initialization completes, data can be fed in and encoded at any time.
        status = VTCompressionSessionPrepareToEncodeFrames(_vEnSession);
        if (status != noErr) {
            [self onErrorWithCode:AWEncoderErrorCodeVTSessionPrepareFailed des:@"Failed to hardcode vtSession prepare"];
        }
    }else{
        [self onErrorWithCode:AWEncoderErrorCodeVTSessionCreateFailed des:@"Failed to hardcode vtSession creation"];
    }
}

Step 2: Throw data to the encoder:

// yuvData is NV12 data obtained from the camera.
-(aw_flv_video_tag *)encodeYUVDataToFlvTag:(NSData *)yuvData{
    if (!_vEnSession) {
        return NULL;
    }
    // Turn the yuv data into a CVPixelBufferRef.
    OSStatus status = noErr;
    // Video width
    size_t pixelWidth = self.videoConfig.pushStreamWidth;
    // Video height
    size_t pixelHeight = self.videoConfig.pushStreamHeight;
    // Put the NV12 data into a CVPixelBufferRef, because hard encoding is driven mainly by the VTCompressionSessionEncodeFrame call, which does not accept raw yuv data, only a CVPixelBufferRef.
    CVPixelBufferRef pixelBuf = NULL;
    // Initialize pixelBuf. The pixel format kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange has the same layout as NV12.
    CVPixelBufferCreate(NULL, pixelWidth, pixelHeight, kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange, NULL, &pixelBuf);
    // Lock the base address. While the data is locked it should be protected against reentrant access from other threads.
    if (CVPixelBufferLockBaseAddress(pixelBuf, 0) != kCVReturnSuccess) {
        [self onErrorWithCode:AWEncoderErrorCodeLockSampleBaseAddressFailed des:@"encode video lock base address failed"];
        return NULL;
    }
    // Copy the NV12 planes into pixelBuf.
    size_t y_size = pixelWidth * pixelHeight;
    size_t uv_size = y_size / 4;
    uint8_t *yuv_frame = (uint8_t *)yuvData.bytes;
    // Handle the Y plane
    uint8_t *y_frame = CVPixelBufferGetBaseAddressOfPlane(pixelBuf, 0);
    memcpy(y_frame, yuv_frame, y_size);
    // Handle the UV plane
    uint8_t *uv_frame = CVPixelBufferGetBaseAddressOfPlane(pixelBuf, 1);
    memcpy(uv_frame, yuv_frame + y_size, uv_size * 2);
    // Presentation timestamp in milliseconds
    uint32_t ptsMs = self.manager.timestamp + 1; //self.vFrameCount++ * 1000.f / self.videoConfig.fps;
    CMTime pts = CMTimeMake(ptsMs, 1000);
    // Hard encoding comes down to this one call: pixelBuf containing the NV12 data is sent to the hardware encoder.
    status = VTCompressionSessionEncodeFrame(_vEnSession, pixelBuf, pts, kCMTimeInvalid, NULL, pixelBuf, NULL);
    ... ...
}

The third step is to retrieve the H264 data through the hard encoder's callback

static void vtCompressionSessionCallback(void * CM_NULLABLE outputCallbackRefCon,
                                         void * CM_NULLABLE sourceFrameRefCon,
                                         OSStatus status,
                                         VTEncodeInfoFlags infoFlags,
                                         CM_NULLABLE CMSampleBufferRef sampleBuffer){
    // Use outputCallbackRefCon to get the AWHWH264Encoder object pointer, so the encoded H264 data can be sent out.
    AWHWH264Encoder *encoder = (__bridge AWHWH264Encoder *)(outputCallbackRefCon);
    // Check whether encoding succeeded
    if (status != noErr) {
        dispatch_semaphore_signal(encoder.vSemaphore);
        [encoder onErrorWithCode:AWEncoderErrorCodeEncodeVideoFrameFailed des:@"encode video frame error 1"];
        return;
    }
    // Is the data complete?
    if (!CMSampleBufferDataIsReady(sampleBuffer)) {
        dispatch_semaphore_signal(encoder.vSemaphore);
        [encoder onErrorWithCode:AWEncoderErrorCodeEncodeVideoFrameFailed des:@"encode video frame error 2"];
        return;
    }
    // Is this a keyframe? Keyframes and non-keyframes must be clearly distinguished; the pusher also needs to know.
    BOOL isKeyFrame = !CFDictionaryContainsKey((CFDictionaryRef)(CFArrayGetValueAtIndex(CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, true), 0)), kCMSampleAttachmentKey_NotSync);
    // SPS and PPS are also part of H264. They can be regarded as special H264 video frames that hold information necessary to decode the video.
    // Without this data the H264 stream can hardly be parsed.
    // For data processing, the SPS/PPS data can be placed, like a normal H264 frame, at the very front of the H264 video stream.
    BOOL needSpsPps = NO;
    if (!encoder.spsPpsData) {
        if (isKeyFrame) {
            // Get avcC: this is the SPS and PPS data we want.
            // To save the data to a file, prepend the 4 bytes [0 0 0 1] and write it at the very top of the H264 file.
            // To push the stream, put this data into the FLV data area.
            CMFormatDescriptionRef sampleBufFormat = CMSampleBufferGetFormatDescription(sampleBuffer);
            NSDictionary *dict = (__bridge NSDictionary *)CMFormatDescriptionGetExtensions(sampleBufFormat);
            encoder.spsPpsData = dict[@"SampleDescriptionExtensionAtoms"][@"avcC"];
        }
        needSpsPps = YES;
    }
    // Get the actual video frame data
    CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    size_t blockDataLen;
    uint8_t *blockData;
    status = CMBlockBufferGetDataPointer(blockBuffer, 0, NULL, &blockDataLen, (char **)&blockData);
    if (status == noErr) {
        size_t currReadPos = 0;
        // At the start of encoding there may be 2 frames here; take the last one.
        while (currReadPos < blockDataLen - 4) {
            uint32_t naluLen = 0;
            memcpy(&naluLen, blockData + currReadPos, 4);
            naluLen = CFSwapInt32BigToHost(naluLen);
            // naluData is one frame of H264 data.
            // To save the data to a file, prepend the 4 bytes [0 0 0 1] and write the frames to the H264 file in order.
            // To push the stream, prepend 4 bytes holding the data length, in big-endian byte order.
            // For big-endian and little-endian, refer to: http://blog.csdn.net/hackbuteer1/article/details/7722667
            encoder.naluData = [NSData dataWithBytes:blockData + currReadPos + 4 length:naluLen];
            currReadPos += 4 + naluLen;
            encoder.isKeyFrame = isKeyFrame;
        }
    }else{
        [encoder onErrorWithCode:AWEncoderErrorCodeEncodeGetH264DataFailed des:@"got h264 data failed"];
    }
    ... ...
}

The fourth step: at this point the hard encoding itself is already finished, so this step has nothing to do with encoding. It takes the H264 data obtained above and hands it to the stream pusher.

-(aw_flv_video_tag *)encodeYUVDataToFlvTag:(NSData *)yuvData{
    ... ...
    if (status == noErr) {
        dispatch_semaphore_wait(self.vSemaphore, DISPATCH_TIME_FOREVER);
        if (_naluData) {
            // Hard encoding succeeded; the data in _naluData is the H264 video frame.
            // Prepend a 4-byte length field, then it can be sent.
            uint32_t naluLen = (uint32_t)_naluData.length;
            // Little-endian to big-endian. Computers are generally little-endian; networks and files are generally big-endian.
            // Converting big-endian to little-endian uses the same algorithm: just reverse the byte order.
            uint8_t naluLenArr[4] = {naluLen >> 24 & 0xff, naluLen >> 16 & 0xff, naluLen >> 8 & 0xff, naluLen & 0xff};
            NSMutableData *mutableData = [NSMutableData dataWithBytes:naluLenArr length:4];
            [mutableData appendData:_naluData];
            // Wrap the H264 data into an FLV tag and send it straight to the server.
            aw_flv_video_tag *video_tag = aw_encoder_create_video_tag((int8_t *)mutableData.bytes, mutableData.length, ptsMs, ptsMs, 0, self.isKeyFrame);
            // At this point the encoding of this frame is complete; clear the state.
            _naluData = nil;
            _isKeyFrame = NO;
            CVPixelBufferUnlockBaseAddress(pixelBuf, 0);
            CFRelease(pixelBuf);
            return video_tag;
        }
    }else{
        [self onErrorWithCode:AWEncoderErrorCodeEncodeVideoFrameFailed des:@"encode video frame error"];
    }
    CVPixelBufferUnlockBaseAddress(pixelBuf, 0);
    
    CFRelease(pixelBuf);
    
    return NULL;
}

Step 5, close the encoder

// Never forget to close the encoder and free resources.
-(void)close{
    dispatch_semaphore_signal(self.vSemaphore);
    VTCompressionSessionInvalidate(_vEnSession);
    _vEnSession = nil;
    self.naluData = nil;
    self.isKeyFrame = NO;
    self.spsPpsData = nil;
}

Hard-coded AAC

Hardcoded AAC logic is similar to H264.

First step, open the encoder

-(void)open{
    // Create the audio encode converter.
    // Initialize the input parameters of the AAC encoder.
    AudioStreamBasicDescription inputAudioDes = {
        .mFormatID = kAudioFormatLinearPCM,
        .mSampleRate = self.audioConfig.sampleRate,
        .mBitsPerChannel = (uint32_t)self.audioConfig.sampleSize,
        .mBytesPerFrame = 2, // 2 bytes per frame
        .mBytesPerPacket = 2, // 1 frame per packet, so also 2 bytes
        .mChannelsPerFrame = (uint32_t)self.audioConfig.channelCount, // channel count; push streaming generally uses mono
        // For the flags below, refer to this article: http://www.mamicode.com/info-detail-986202.html
        .mFormatFlags = kLinearPCMFormatFlagIsPacked | kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsNonInterleaved,
        .mReserved = 0
    };
    // Set the output format and channel count.
    AudioStreamBasicDescription outputAudioDes = {
        .mChannelsPerFrame = (uint32_t)self.audioConfig.channelCount,
        .mFormatID = kAudioFormatMPEG4AAC,
        0
    };
    // Initialize _aConverter.
    uint32_t outDesSize = sizeof(outputAudioDes);
    AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &outDesSize, &outputAudioDes);
    OSStatus status = AudioConverterNew(&inputAudioDes, &outputAudioDes, &_aConverter);
    if (status != noErr) {
        [self onErrorWithCode:AWEncoderErrorCodeCreateAudioConverterFailed des:@"Hard coded AAC creation failed"];
    }
    // Set the bitrate.
    uint32_t aBitrate = (uint32_t)self.audioConfig.bitrate;
    uint32_t aBitrateSize = sizeof(aBitrate);
    status = AudioConverterSetProperty(_aConverter, kAudioConverterEncodeBitRate, aBitrateSize, &aBitrate);
    // Query the maximum output packet size.
    uint32_t aMaxOutput = 0;
    uint32_t aMaxOutputSize = sizeof(aMaxOutput);
    AudioConverterGetProperty(_aConverter, kAudioConverterPropertyMaximumOutputPacketSize, &aMaxOutputSize, &aMaxOutput);
    self.aMaxOutputFrameSize = aMaxOutput;
    if (aMaxOutput == 0) {
        [self onErrorWithCode:AWEncoderErrorCodeAudioConverterGetMaxFrameSizeFailed des:@"AAC failed to get the maximum frame size"];
    }
}

The second step is to get the Audio Specific Config. This is a special FLV tag that stores key information about the AAC stream and serves as the basis for decoding the audio frames. In RTMP, this tag must be sent before all audio frames. (A worked example of the two configuration bytes follows the code below.)

-(aw_flv_audio_tag *)createAudioSpecificConfigFlvTag{
    // profile: the AAC object type (profile) used
    uint8_t profile = kMPEG4Object_AAC_LC;
    // Sample rate index (4 corresponds to 44100 Hz)
    uint8_t sampleRate = 4;
    // Channel information
    uint8_t chanCfg = 1;
    // Pack the 3 values above into 2 bytes
    uint8_t config1 = (profile << 3) | ((sampleRate & 0xe) >> 1);
    uint8_t config2 = ((sampleRate & 0x1) << 7) | (chanCfg << 3);
    // Convert the data to aw_data
    aw_data *config_data = NULL;
    data_writer.write_uint8(&config_data, config1);
    data_writer.write_uint8(&config_data, config2);
    // Create the audio specific config FLV tag
    aw_flv_audio_tag *audio_specific_config_tag = aw_encoder_create_audio_specific_config_tag(config_data, &_faacConfig);
    free_aw_data(&config_data);
    // Return it to the caller, ready to send
    return audio_specific_config_tag;
}
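As a quick sanity check of the bit packing above: for AAC-LC (kMPEG4Object_AAC_LC = 2), sample-rate index 4 (44100 Hz) and chanCfg = 1 (mono), the two bytes work out as plain arithmetic on the expressions in the code:

    config1 = (2 << 3) | ((4 & 0xE) >> 1) = 0x10 | 0x02 = 0x12
    config2 = ((4 & 0x1) << 7) | (1 << 3) = 0x00 | 0x08 = 0x08

So the AudioSpecificConfig body sent ahead of the audio frames is the two bytes 0x12 0x08.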

Step 3: When the audio data is obtained from the microphone, the data is delivered to the AAC encoder for encoding.

-(aw_flv_audio_tag *)encodePCMDataToFlvTag:(NSData *)pcmData{
    self.curFramePcmData = pcmData;
    // Output buffer that will receive the encoded AAC data
    AudioBufferList outAudioBufferList = {0};
    outAudioBufferList.mNumberBuffers = 1;
    outAudioBufferList.mBuffers[0].mNumberChannels = (uint32_t)self.audioConfig.channelCount;
    outAudioBufferList.mBuffers[0].mDataByteSize = self.aMaxOutputFrameSize;
    outAudioBufferList.mBuffers[0].mData = malloc(self.aMaxOutputFrameSize);
    uint32_t outputDataPacketSize = 1;
    // aacEncodeInputDataProc is a callback function; the PCM data is supplied synchronously inside that callback.
    OSStatus status = AudioConverterFillComplexBuffer(_aConverter, aacEncodeInputDataProc, (__bridge void * _Nullable)(self), &outputDataPacketSize, &outAudioBufferList, NULL);
    if (status == noErr) {
        NSData *rawAAC = [NSData dataWithBytes:outAudioBufferList.mBuffers[0].mData length:outAudioBufferList.mBuffers[0].mDataByteSize];
        // Timestamp increment (ms) = 1000 * samples per AAC frame (1024) / sample rate
        self.manager.timestamp += 1024 * 1000 / self.audioConfig.sampleRate;
        // Take the AAC data, wrap it into an FLV audio tag, and send it to the server.
        return aw_encoder_create_audio_tag((int8_t *)rawAAC.bytes, rawAAC.length, (uint32_t)self.manager.timestamp, &_faacConfig);
    }else{
        // Encoding error
        [self onErrorWithCode:AWEncoderErrorCodeAudioEncoderFailed des:@"Aac coding error"];
    }
    
    return NULL;
}

// A callback function in the system-specified format
static OSStatus aacEncodeInputDataProc(AudioConverterRef inAudioConverter, UInt32 *ioNumberDataPackets, AudioBufferList *ioData, AudioStreamPacketDescription **outDataPacketDescription, void *inUserData){
    AWHWAACEncoder *hwAacEncoder = (__bridge AWHWAACEncoder *)inUserData;
    // Pass the PCM data to the encoder
    if (hwAacEncoder.curFramePcmData) {
        ioData->mBuffers[0].mData = (void *)hwAacEncoder.curFramePcmData.bytes;
        ioData->mBuffers[0].mDataByteSize = (uint32_t)hwAacEncoder.curFramePcmData.length;
        ioData->mNumberBuffers = 1;
        ioData->mBuffers[0].mNumberChannels = (uint32_t)hwAacEncoder.audioConfig.channelCount;
        
        return noErr;
    }
    
    return -1;
}

Step 4: Close the encoder to release resources

-(void)close{
    AudioConverterDispose(_aConverter);
    _aConverter = nil;
    self.curFramePcmData = nil;
    self.aMaxOutputFrameSize = 0;
}

Article series

  1. Learn in 1 hour: the simplest iOS live push stream (1) Project introduction
  2. Learn in 1 hour: the simplest iOS live push stream (2) Code architecture overview
  3. Learn in 1 hour: the simplest iOS live push stream (3) Capturing audio and video with system interfaces
  4. Learn in 1 hour: the simplest iOS live push stream (4) How to use GPUImage and how to do beauty filters
  5. Learn in 1 hour: the simplest iOS live push stream (5) Introduction to and acquisition of YUV and PCM data
  6. Learn in 1 hour: the simplest iOS live push stream (6) Introduction to H264, AAC, and FLV
  7. Learn in 1 hour: the simplest iOS live push stream (7) H264 / AAC hard coding
  8. Learn in 1 hour: the simplest iOS live push stream (8) H264 / AAC soft coding
  9. Learn in 1 hour: the simplest iOS live push stream (9) FLV encoding and audio/video timestamp synchronization
  10. Learn in 1 hour: the simplest iOS live push stream (10) Librtmp usage introduction
  11. Introduction to SPS&PPS and AudioSpecificConfig (End)