Search for “gjZKeyFrame” on WeChat and follow “Keyframe” to get the latest audio and video technical articles as they are published.

This public account shares content along the following audio and video technology roadmap: audio and video basics (finished) → audio and video tools (finished) → audio and video engineering examples (in progress) → audio and video industry practice (in preparation).

For iOS/Android client developers who want to start learning audio and video development, the smoothest path is to first build a basic understanding of audio and video concepts, then use the platform's native audio and video capabilities to practice the capture → encode → mux → demux → decode → render pipeline hands-on, and use audio and video tools to analyze and understand the corresponding data.

In the audio and video engineering examples section, we will show how to get started on the iOS/Android platforms by breaking the capture, encoding, muxing, demuxing, decoding, and rendering steps apart and implementing a Demo for each.

Here is the first article: the iOS audio capture Demo. The Demo covers the following:

  • 1) Implement an audio capture module;
  • 2) Implement the audio capture logic and save the captured audio as PCM data;
  • 3) Provide detailed code comments to help you understand the code logic and principles.

You can follow the WeChat public account and send the message “AVDemo” in the account to get the full source code of all the Demos.

1. Audio capture module

First, we implement a KFAudioConfig class to define the audio capture configuration parameters: the sampling rate, the quantization bit depth, and the number of channels. The meaning of these parameters was introduced in the earlier article on the basics of sound, Representation of Sound (3): Digitization of Sound.
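As a quick sanity check on the defaults below: 44100 samples/s × 2 channels × 2 bytes per sample ≈ 176 KB of raw PCM per second, or roughly 10 MB per minute, which is handy later for judging whether the size of the captured test.pcm file looks right.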

KFAudioConfig.h

#import <Foundation/Foundation.h>

NS_ASSUME_NONNULL_BEGIN

@interface KFAudioConfig : NSObject
+ (instancetype)defaultConfig;

@property (nonatomic, assign) NSUInteger channels; // Number of channels, default: 2.
@property (nonatomic, assign) NSUInteger sampleRate; // Sampling rate, default: 44100.
@property (nonatomic, assign) NSUInteger bitDepth; // Quantization bit depth, default: 16.
@end

NS_ASSUME_NONNULL_END

KFAudioConfig.m

#import "KFAudioConfig.h"

@implementation KFAudioConfig

+ (instancetype)defaultConfig {
    KFAudioConfig *config = [[self alloc] init];
    config.channels = 2;
    config.sampleRate = 44100;
    config.bitDepth = 16;
    
    return config;
}

@end

Next, we implement a KFAudioCapture class for audio capture.

KFAudioCapture.h

#import <Foundation/Foundation.h>
#import <CoreMedia/CoreMedia.h>
#import "KFAudioConfig.h"

NS_ASSUME_NONNULL_BEGIN

@interface KFAudioCapture : NSObject
+ (instancetype)new NS_UNAVAILABLE;
- (instancetype)init NS_UNAVAILABLE;
- (instancetype)initWithConfig:(KFAudioConfig *)config;

@property (nonatomic, strong, readonly) KFAudioConfig *config;
@property (nonatomic, copy) void (^sampleBufferOutputCallBack)(CMSampleBufferRef sample); // Audio capture data callback.
@property (nonatomic, copy) void (^errorCallBack)(NSError *error); // Audio capture error callback.

- (void)startRunning; // Start collecting audio data.
- (void)stopRunning; // Stop collecting audio data.
@end

NS_ASSUME_NONNULL_END

Above is the interface design of KFAudioCapture. Besides the initialization method, it mainly exposes the audio configuration, the capture data callback and error callback, and the interfaces for starting and stopping capture.

In the capture data callback above, the data is returned as a CMSampleBufferRef[1], which is worth a closer look. The official documentation describes CMSampleBufferRef as follows:

A reference to a CMSampleBuffer. A CMSampleBuffer is a Core Foundation object containing zero or more compressed (or uncompressed) samples of a particular media type (audio, video, muxed, and so on).

That is, CMSampleBufferRef is a reference to a CMSampleBuffer[2], so the core data structure here is CMSampleBuffer. There are a few things worth noting about it:

  • CMSampleBuffer is a Core Foundation object, which means its interface is implemented in C, its memory is not managed by ARC and must be managed manually, and bridge casts are needed to convert it to and from Foundation objects.
  • CMSampleBuffer is the core data structure the system uses to carry media sample data through the audio and video processing pipeline. You can think of it as the currency of the iOS audio and video pipeline: video data captured by the camera, audio data captured by the microphone, encoded and decoded data, reading and writing media files, video rendering and so on all take it as a parameter.
  • CMSampleBuffer contains zero or more samples of a media type (audio, video, muxed, etc.), along with a format description of the media stream, the width, height and timing information of each sample, buffer-level attachments, and sample-level attachments. Buffer-level attachments carry information about the buffer as a whole, such as playback speed and operations on subsequent buffered data; sample-level attachments carry information about a single sample, such as a video frame's timestamp and whether it is a keyframe. The sample data itself is wrapped in one of the following:
    • A CMBlockBuffer[3] of one or more media samples, which can hold captured, encoded, or decoded audio data (e.g. PCM or AAC data) and encoded video data (e.g. H.264 data).
    • A CVImageBuffer[4] (also known as CVPixelBuffer[5]), which can hold unencoded video data after capture or decoding (e.g. YCbCr or RGBA data).

So, with that in mind, you can see why the audio capture data callback above returns a CMSampleBufferRef: it is the universal container, and we can extract the PCM data we want from it.
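To make this concrete before diving into the implementation, here is a minimal sketch of pulling the raw PCM bytes out of a CMSampleBufferRef. This is an illustration only, assuming the buffer wraps interleaved 16-bit PCM as configured above; the helper name is a placeholder, and the Demo itself does the same thing in the ViewController in section 2.

#import <Foundation/Foundation.h>
#import <CoreMedia/CoreMedia.h>

// Illustrative helper (not part of the Demo): read the PCM bytes wrapped by a CMSampleBufferRef.
static void ReadPCMFromSampleBuffer(CMSampleBufferRef sampleBuffer) {
    // The PCM data lives in the CMBlockBuffer attached to the sample buffer.
    CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    if (!blockBuffer) {
        return;
    }
    
    size_t lengthAtOffset = 0;
    size_t totalLength = 0;
    char *dataPointer = NULL;
    OSStatus status = CMBlockBufferGetDataPointer(blockBuffer, 0, &lengthAtOffset, &totalLength, &dataPointer);
    if (status == kCMBlockBufferNoErr && dataPointer) {
        // totalLength bytes of interleaved 16-bit PCM are now available at dataPointer,
        // ready to be wrapped in an NSData or appended to a file.
        NSData *pcmData = [NSData dataWithBytes:dataPointer length:totalLength];
        NSLog(@"Got %lu bytes of PCM", (unsigned long) pcmData.length);
    }
}

Next is the implementation of KFAudioCapture.

KFAudioCapture.m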

#import "KFAudioCapture.h"
#import <AVFoundation/AVFoundation.h>
#import <mach/mach_time.h>

@interface KFAudioCapture ()
@property (nonatomic, assign) AudioComponentInstance audioCaptureInstance; // Audio capture instance.
@property (nonatomic, assign) AudioStreamBasicDescription audioFormat; // Audio capture parameters.
@property (nonatomic, strong, readwrite) KFAudioConfig *config;
@property (nonatomic, strong) dispatch_queue_t captureQueue;
@property (nonatomic, assign) BOOL isError;
@end

@implementation KFAudioCapture

#pragma mark - Lifecycle
- (instancetype)initWithConfig:(KFAudioConfig *)config {
    self = [super init];
    if (self) {
        _config = config;
        _captureQueue = dispatch_queue_create("com.KeyFrameKit.audioCapture", DISPATCH_QUEUE_SERIAL);
    }
    
    return self;
}
- (void)dealloc {
    // Clean up the audio capture instance.
    if (_audioCaptureInstance) {
        AudioOutputUnitStop(_audioCaptureInstance);
        AudioComponentInstanceDispose(_audioCaptureInstance);
        _audioCaptureInstance = nil;
    }
}

#pragma mark - Action
- (void)startRunning {
    if (self.isError) {
        return;
    }
    
    __weak typeof(self) weakSelf = self;
    dispatch_async(_captureQueue, ^{
        if (!weakSelf.audioCaptureInstance) {
            NSError *error = nil;
            // Create the audio capture instance the first time startRunning is called.
            [weakSelf setupAudioCaptureInstance:&error];
            if (error) {
                // Catch and call back any error from creating the audio capture instance.
                [weakSelf callBackError:error];
                return;
            }
        }
        
        // Start capturing.
        OSStatus startStatus = AudioOutputUnitStart(weakSelf.audioCaptureInstance);
        if (startStatus != noErr) {
            // Catch and call back any error from starting capture.
            [weakSelf callBackError:[NSError errorWithDomain:NSStringFromClass([KFAudioCapture class]) code:startStatus userInfo:nil]];
        }
    });
}

- (void)stopRunning {
    if (self.isError) {
        return;
    }
    
    __weak typeof(self) weakSelf = self;
    dispatch_async(_captureQueue, ^{
        if (weakSelf.audioCaptureInstance) {
            // Stop capturing.
            OSStatus stopStatus = AudioOutputUnitStop(weakSelf.audioCaptureInstance);
            if (stopStatus != noErr) {
                // Catch and call back any error from stopping capture.
                [weakSelf callBackError:[NSError errorWithDomain:NSStringFromClass([KFAudioCapture class]) code:stopStatus userInfo:nil]];
            }
        }
    });
}

#pragma mark - Utility
- (void)setupAudioCaptureInstance:(NSError **)error {
    // 1. Set up the audio component description.
    AudioComponentDescription acd = {
        .componentType = kAudioUnitType_Output,
        //.componentSubType = kAudioUnitSubType_VoiceProcessingIO, // Echo cancellation mode.
        .componentSubType = kAudioUnitSubType_RemoteIO,
        .componentManufacturer = kAudioUnitManufacturer_Apple,
        .componentFlags = 0,
        .componentFlagsMask = 0,
    };
    
    // 2. Find the audio component that matches the description.
    AudioComponent component = AudioComponentFindNext(NULL, &acd);
    
    // 3. Create the audio component instance.
    OSStatus status = AudioComponentInstanceNew(component, &_audioCaptureInstance);
    if (status != noErr) {
        *error = [NSError errorWithDomain:NSStringFromClass(self.class) code:status userInfo:nil];
        return;
    }
        
    // 4. Set instance property: enable IO on the input element. 0 means disabled, 1 means enabled.
    UInt32 flagOne = 1;
    AudioUnitSetProperty(_audioCaptureInstance, kAudioOutputUnitProperty_EnableIO, kAudioUnitScope_Input, 1, &flagOne, sizeof(flagOne));
    
    // 5. Set instance properties: audio parameters such as data format, number of channels, sample bit depth, sample rate, etc.
    AudioStreamBasicDescription asbd = {0};
    asbd.mFormatID = kAudioFormatLinearPCM; // The raw data is PCM, in channel-interleaved format.
    asbd.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked;
    asbd.mChannelsPerFrame = (UInt32) self.config.channels; // Number of channels per frame.
    asbd.mFramesPerPacket = 1; // Number of frames per packet.
    asbd.mBitsPerChannel = (UInt32) self.config.bitDepth; // Sample bit depth.
    asbd.mBytesPerFrame = asbd.mChannelsPerFrame * asbd.mBitsPerChannel / 8; // Bytes per frame (byte = bit / 8).
    asbd.mBytesPerPacket = asbd.mFramesPerPacket * asbd.mBytesPerFrame; // Bytes per packet.
    asbd.mSampleRate = self.config.sampleRate; // Sample rate.
    self.audioFormat = asbd;
    status = AudioUnitSetProperty(_audioCaptureInstance, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Output, 1, &asbd, sizeof(asbd));
    if (status != noErr) {
        *error = [NSError errorWithDomain:NSStringFromClass(self.class) code:status userInfo:nil];
        return;
    }
    
    // 6. Set instance property: the data callback function.
    AURenderCallbackStruct cb;
    cb.inputProcRefCon = (__bridge void *) self;
    cb.inputProc = audioBufferCallBack;
    status = AudioUnitSetProperty(_audioCaptureInstance, kAudioOutputUnitProperty_SetInputCallback, kAudioUnitScope_Global, 1, &cb, sizeof(cb));
    if (status != noErr) {
        *error = [NSError errorWithDomain:NSStringFromClass(self.class) code:status userInfo:nil];
        return;
    }
    
    // 7. Initialize the instance.
    status = AudioUnitInitialize(_audioCaptureInstance);
    if (status != noErr) {
        *error = [NSError errorWithDomain:NSStringFromClass(self.class) code:status userInfo:nil];
        return;
    }
}

- (void)callBackError:(NSError *)error {
    self.isError = YES;
    if (error && self.errorCallBack) {
        dispatch_async(dispatch_get_main_queue(), ^{
            self.errorCallBack(error);
        });
    }
}

+ (CMSampleBufferRef)sampleBufferFromAudioBufferList:(AudioBufferList)buffers inTimeStamp:(const AudioTimeStamp *)inTimeStamp inNumberFrames:(UInt32)inNumberFrames description:(AudioStreamBasicDescription)description {
    CMSampleBufferRef sampleBuffer = NULL; // A reference to the CMSampleBuffer instance to be generated.
    
    // 1. Create a format description for the audio stream.
    CMFormatDescriptionRef format = NULL;
    OSStatus status = CMAudioFormatDescriptionCreate(kCFAllocatorDefault, &description, 0, NULL, 0, NULL, NULL, &format);
    if (status != noErr) {
        CFRelease(format);
        return NULL;
    }
    
    // 2. Process the timestamp information of the audio frames.
    mach_timebase_info_data_t info = {0, 0};
    mach_timebase_info(&info);
    uint64_t time = inTimeStamp->mHostTime;
    // Convert to nanoseconds.
    time *= info.numer;
    time /= info.denom;
    // PTS.
    CMTime presentationTime = CMTimeMake(time, 1000000000);
    // For audio, PTS and DTS are the same.
    CMSampleTimingInfo timing = {CMTimeMake(1, description.mSampleRate), presentationTime, presentationTime};
    
    // 3. Create the CMSampleBuffer instance.
    status = CMSampleBufferCreate(kCFAllocatorDefault, NULL, false, NULL, NULL, format, (CMItemCount) inNumberFrames, 1, &timing, 0, NULL, &sampleBuffer);
    if (status != noErr) {
        CFRelease(format);
        return NULL;
    }
    
    // 4. Create the CMBlockBuffer instance. The data is copied from the AudioBufferList, and the CMBlockBuffer instance is attached to the CMSampleBuffer instance.
    status = CMSampleBufferSetDataBufferFromAudioBufferList(sampleBuffer, kCFAllocatorDefault, kCFAllocatorDefault, 0, &buffers);
    if (status != noErr) {
        CFRelease(format);
        return NULL;
    }
    
    CFRelease(format);
    return sampleBuffer;
}

#pragma mark - Capture CallBack
static OSStatus audioBufferCallBack(void *inRefCon,
                                    AudioUnitRenderActionFlags *ioActionFlags,
                                    const AudioTimeStamp *inTimeStamp,
                                    UInt32 inBusNumber,
                                    UInt32 inNumberFrames,
                                    AudioBufferList *ioData) {
    @autoreleasepool {
        KFAudioCapture *capture = (__bridge KFAudioCapture *) inRefCon;
        if (!capture) {
            return -1;
        }
        
        // 1. Create an AudioBufferList to receive the captured data.
        AudioBuffer buffer;
        buffer.mData = NULL;
        buffer.mDataByteSize = 0;
        // The data format is kAudioFormatLinearPCM, so mNumberChannels is set to 1 even if the channels are dual.
        // For two-channel data, two channel data will be assembled one by one in each group according to the sampling bit depth of 16 bits.
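        // For example, with the 16-bit stereo configuration above, the interleaved byte layout is L0 R0 L1 R1 ... with 2 bytes per channel sample, i.e. 4 bytes per frame.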
        buffer.mNumberChannels = 1;
        AudioBufferList buffers;
        buffers.mNumberBuffers = 1;
        buffers.mBuffers[0] = buffer;
        
        // 2. Get the audio PCM data and store it in the AudioBufferList.
        // Here are a few things to be clear about:
        // 1) How much data will come in each callback?
        // According to the above Settings of audio collection parameters: PCM is interlaced with sound channels, the number of sound channels in each frame is 2, and the sampling bit depth is 16 bit. So the number of bytes per frame is 4 bytes (2 bytes for each left and right channel).
        // The number of frames returned is inNumberFrames. The number of bytes of data returned by such a callback is mBytesPerFrame(4) * inNumberFrames.
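        // For example, if a callback delivers inNumberFrames = 1024, that is 1024 * 4 = 4096 bytes of PCM data (1024 is only an illustrative value; the actual frame count per callback is decided by the system).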
        // 2) Is the frequency of the data callback related to the audio sampling rate?
        // The frequency of this data callback is independent of the audio sampling rate (mSampleRate 44100, set above). The number of channels, sampling bit depth and sampling rate together determine the size of sampling data per unit time of the equipment. These data will be buffered and then returned to us piece by piece through this data callback. The frequency of this callback is the speed at which the underlying layer gives us data piece by piece, independent of the sampling rate.
        // 3) What is the frequency of this data callback?
        // The interval between data callbacks is [AVAudioSession sharedInstance].preferredIOBufferDuration, and the callback frequency is the reciprocal of that value. We can use [[AVAudioSession sharedInstance] setPreferredIOBufferDuration:1 error:nil] to set this value and thereby control the callback frequency.
        OSStatus status = AudioUnitRender(capture.audioCaptureInstance,
                                          ioActionFlags,
                                          inTimeStamp,
                                          inBusNumber,
                                          inNumberFrames,
                                          &buffers);
        
        // 3. Data encapsulation and callback
        if (status == noErr) {
            // Encapsulate the data as CMSampleBuffer using utility methods.
            CMSampleBufferRef sampleBuffer = [KFAudioCapture sampleBufferFromAudioBufferList:buffers inTimeStamp:inTimeStamp inNumberFrames:inNumberFrames description:capture.audioFormat];
            // Callback data.
            if (capture.sampleBufferOutputCallBack) {
                capture.sampleBufferOutputCallBack(sampleBuffer);
            }
            if (sampleBuffer) {
                CFRelease(sampleBuffer);
            }
        }
        
        return status;
    }
}

@end


Above is the implementation of KFAudioCapture. From the code, the main parts are:

  • 1) Create the audio capture instance. The instance is created lazily the first time -startRunning is called.
    • Implemented in the -setupAudioCaptureInstance: method.
  • 2) Handle the data callback of the audio capture instance, wrap the callback data into a CMSampleBufferRef, and pass it to KFAudioCapture's external data callback.
    • The callback handling logic is implemented in audioBufferCallBack(...).
    • Wrapping the data into a CMSampleBufferRef is done with the +sampleBufferFromAudioBufferList:inTimeStamp:inNumberFrames:description: method.
  • 3) Implement the start and stop capture logic.
    • Implemented in -startRunning and -stopRunning respectively. Note that the start and stop operations are dispatched asynchronously onto a serial queue with dispatch_async, mainly to avoid blocking the main thread.
  • 4) Catch errors in the start and stop operations and pass them to KFAudioCapture's external error callback.
    • Errors caught in -startRunning and -stopRunning are called back outward via the -callBackError: method.
  • 5) Clean up the audio capture instance.
    • Implemented in the -dealloc method.

See the code and its comments for more details.
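Before moving on, here is a minimal usage sketch of the module (an illustration only; section 2 below shows the full wiring inside a ViewController):

// Illustrative usage of KFAudioCapture (the full version is in KFAudioCaptureViewController below).
KFAudioCapture *capture = [[KFAudioCapture alloc] initWithConfig:[KFAudioConfig defaultConfig]];
capture.errorCallBack = ^(NSError *error) {
    NSLog(@"capture error: %@", error);
};
capture.sampleBufferOutputCallBack = ^(CMSampleBufferRef sampleBuffer) {
    // Each callback delivers one CMSampleBuffer of interleaved 16-bit PCM.
};
[capture startRunning];
// ... later, when capture should end ...
[capture stopRunning];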

2. Capture audio and store it as a PCM file

We implement the audio capture logic in a ViewController and store the captured audio as PCM data.

KFAudioCaptureViewController.m

#import "KFAudioCaptureViewController.h"
#import <AVFoundation/AVFoundation.h>
#import "KFAudioCapture.h"

@interface KFAudioCaptureViewController ()
@property (nonatomic, strong) KFAudioConfig *audioConfig;
@property (nonatomic, strong) KFAudioCapture *audioCapture;
@property (nonatomic, strong) NSFileHandle *fileHandle;
@end

@implementation KFAudioCaptureViewController
#pragma mark - Property
- (KFAudioConfig *)audioConfig {
    if (!_audioConfig) {
        _audioConfig = [KFAudioConfig defaultConfig];
    }
    
    return _audioConfig;
}

- (KFAudioCapture *)audioCapture {
    if (!_audioCapture) {
        __weak typeof(self) weakSelf = self;
        _audioCapture = [[KFAudioCapture alloc] initWithConfig:self.audioConfig];
        _audioCapture.errorCallBack = ^(NSError* error) {
            NSLog(@"KFAudioCapture error: %zi %@", error.code, error.localizedDescription);
        };
        // Audio collection data callback. Here the PCM data is written to a file.
        _audioCapture.sampleBufferOutputCallBack = ^(CMSampleBufferRef sampleBuffer) {
            if (sampleBuffer) {
                // get the CMBlockBuffer, which encapsulates PCM data.
                CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
                size_t lengthAtOffsetOutput, totalLengthOutput;
                char *dataPointer;
                
                // get PCM data from CMBlockBuffer and store it in file.
                CMBlockBufferGetDataPointer(blockBuffer, 0, &lengthAtOffsetOutput, &totalLengthOutput, &dataPointer);
                [weakSelf.fileHandle writeData:[NSData dataWithBytes:dataPointer length:totalLengthOutput]];
            }
        };
    }
    
    return _audioCapture;
}

- (NSFileHandle *)fileHandle {
    if (!_fileHandle) {
        NSString *audioPath = [[NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES) lastObject] stringByAppendingPathComponent:@"test.pcm"];
        NSLog(@"PCM file path: %@", audioPath);
        [[NSFileManager defaultManager] removeItemAtPath:audioPath error:nil];
        [[NSFileManager defaultManager] createFileAtPath:audioPath contents:nil attributes:nil];
        _fileHandle = [NSFileHandle fileHandleForWritingAtPath:audioPath];
    }

    return _fileHandle;
}

#pragma mark - Lifecycle
- (void)viewDidLoad {
    [super viewDidLoad];
    
    [self setupAudioSession];
    [self setupUI];
    
    // After audio capture finishes, you can copy the test.pcm file from the App's Documents folder to your computer and play it with ffplay:
    // ffplay -ar 44100 -channels 2 -f s16le -i test.pcm
}

- (void)dealloc {
    if (_fileHandle) {
        [_fileHandle closeFile];
    }
}

#pragma mark - Setup
- (void)setupUI {
    self.edgesForExtendedLayout = UIRectEdgeAll;
    self.extendedLayoutIncludesOpaqueBars = YES;
    self.title = @"Audio Capture";
    self.view.backgroundColor = [UIColor whiteColor];
    
    
    // Navigation item.
    UIBarButtonItem *startBarButton = [[UIBarButtonItem alloc] initWithTitle:@"Start" style:UIBarButtonItemStylePlain target:self action:@selector(start)];
    UIBarButtonItem *stopBarButton = [[UIBarButtonItem alloc] initWithTitle:@"Stop" style:UIBarButtonItemStylePlain target:self action:@selector(stop)];
    self.navigationItem.rightBarButtonItems = @[startBarButton, stopBarButton];

}

- (void)setupAudioSession {
    NSError *error = nil;
    
    // 1. Get the audio session instance.
    AVAudioSession *session = [AVAudioSession sharedInstance];

    // 2. Set categories and options.
    [session setCategory:AVAudioSessionCategoryPlayAndRecord withOptions:AVAudioSessionCategoryOptionMixWithOthers | AVAudioSessionCategoryOptionDefaultToSpeaker error:&error];
    if (error) {
        NSLog(@"AVAudioSession setCategory error.");
        error = nil;
        return;
    }
    
    // 3. Set mode.
    [session setMode:AVAudioSessionModeVideoRecording error:&error];
    if (error) {
        NSLog(@"AVAudioSession setMode error.");
        error = nil;
        return;
    }

    // 4. Activate the session.
    [session setActive:YES error:&error];
    if (error) {
        NSLog(@"AVAudioSession setActive error.");
        error = nil;
        return;
    }
}

#pragma mark - Action
- (void)start {
    [self.audioCapture startRunning];
}

- (void)stop {
    [self.audioCapture stopRunning];
}

@end


Above is the implementation of KFAudioCaptureViewController. Note that before capturing audio you need to set up the AVAudioSession[6] to the correct category and mode for recording.

3. Play the PCM file with a tool

After finishing the audio capture, you can copy the test.pcm file from the App's Documents folder to your computer and use ffplay to verify whether the captured audio meets expectations:

$ ffplay -ar 44100 -channels 2 -f s16le -i test.pcm

Note that these parameters must match the sampling rate, number of channels, and sampling bit depth set in the project code.

For more on tools for playing PCM files, see section 2, the ffplay command line tool, in FFmpeg Tools, and section 1.1, Adobe Audition, in Visual Audio and Video Analysis Tools.

References

[1] CMSampleBufferRef: developer.apple.com/documentati…

[2] CMSampleBuffer: developer.apple.com/documentati…

[3] CMBlockBuffer: developer.apple.com/documentati…

[4] CVImageBuffer: developer.apple.com/documentati…

[5] CVPixelBuffer: developer.apple.com/documentati…

[6] AVAudioSession: developer.apple.com/documentati…


Recommended reading

“FFmpeg Tools: Do Audio and Video Development with It”

“Visual Audio and Video Analysis Tools: A Great Collection of Tools”

“Packet Capture Tools: See What Protocol Optimizations Competing Products Have Made”

“iOS Reverse Engineering Tools: Reverse Well and Leave Work Early”