If you want to see the final result before deciding to read the article -> bilibili



Sample code download

iOS Audio Spectrum Animation (2)

Due to space considerations, this tutorial is divided into two parts. This part focuses on audio playback and spectrum data acquisition, and the next part covers data processing and animation.

Preface

A long time ago, when listening to music on my computer, I liked to bring up a small tool in the player: a bar chart that danced along with the rhythm of the music. It made me feel quite professional, even though I only learned later that what it showed was the audio signal's representation in the frequency domain.

Warm-up knowledge

Before we start writing code, let's go over some basic concepts.

Audio digitization

  • Sampling: As we all know, sound is a pressure wave and is continuous, but a computer cannot represent continuous data, so the signal has to be discretized by sampling it at intervals; the frequency at which samples are taken is called the sampling rate. According to the Nyquist sampling theorem, when the sampling rate is greater than twice the highest frequency in the signal, the signal can be represented without distortion. The range of frequencies humans can hear is roughly 20 Hz to 20 kHz, so CDs and similar formats use a 44.1 kHz sampling rate, which meets most needs.

  • Quantization: The signal strength of each sample also loses some precision. If it is stored as a 16-bit Int (the bit depth), its range is [-32768, 32767]; the higher the bit depth, the larger the range that can be expressed and the better the audio quality.

  • Channel count: For a better effect, sound is usually captured as a two-channel (left/right) signal. How are the samples arranged? One way is interleaved: LRLRLR; the other is non-interleaved: LLL RRR.

The above method of digitizing an analog signal is called pulse-code modulation (PCM), and this is the kind of data we need to process in this article.
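To make these numbers concrete, here is a small back-of-the-envelope calculation (illustrative only, not part of the tutorial project) of how much raw data one second of CD-quality PCM audio occupies, assuming 44.1 kHz, 16-bit samples and two channels:

// Illustrative only: data rate of uncompressed CD-quality PCM audio.
let sampleRate = 44_100          // samples per second, per channel
let bitDepth = 16                // bits per sample
let channelCount = 2             // stereo

let bytesPerSecond = sampleRate * (bitDepth / 8) * channelCount
print(bytesPerSecond)            // 176400 bytes, roughly 172 KB per second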

Fourier transform

Our audio data is currently in the time domain: the horizontal axis is time and the vertical axis is signal strength, while the animation needs frequency on the horizontal axis. Converting a signal from the time domain to the frequency domain is done with the Fourier transform. After the Fourier transform, the signal is decomposed into sine waves of different frequencies, and the frequencies and amplitudes of those waves are exactly the data we need for the animation.

Figure 1 (from nti-audio) The Fourier transform converts a signal from the time domain to the frequency domain

In practice, computers deal with the discrete Fourier transform (DFT), and the fast Fourier transform (FFT) is an efficient way to compute the DFT or its inverse, reducing the complexity from O(n²) to O(n log n). If you've just clicked the link and looked at the FFT algorithm, you might be wondering how to write FFT code yourself. Don't worry: Apple provides the Accelerate framework for this, whose vDSP component implements digital signal processing routines, including the FFT. With vDSP we can implement the FFT in just a few steps, easily and efficiently.
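To get a feel for what the FFT computes, here is a deliberately naive O(n²) DFT that produces the magnitude of each frequency bin (illustrative only; the project does not use this, since vDSP's FFT gives the same spectrum far more efficiently):

import Foundation

// Illustrative only: a brute-force DFT with O(n^2) complexity.
// The FFT produces the same spectrum in O(n log n).
func naiveDFTMagnitudes(_ samples: [Float]) -> [Float] {
    let n = samples.count
    return (0..<n / 2).map { k -> Float in
        var real = 0.0
        var imag = 0.0
        for t in 0..<n {
            let angle = -2.0 * Double.pi * Double(k) * Double(t) / Double(n)
            real += Double(samples[t]) * cos(angle)
            imag += Double(samples[t]) * sin(angle)
        }
        return Float(sqrt(real * real + imag * imag) / Double(n))
    }
}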

Audio framework in iOS

Now let's take a look at the audio frameworks in iOS. AudioToolbox is powerful, but the APIs it provides are all C-based. Most of its functionality is also available through AVFoundation, which wraps the underlying interfaces in Objective-C/Swift. Our needs this time are fairly simple: we only have to play audio files and process the audio in real time, so AVAudioEngine in AVFoundation is enough for this article's playback and processing.

Figure 2 (from WWDC16) iOS/macOS audio technology stack

AVAudioEngine

AVAudioEngine was added to the AVFoundation framework in iOS 8. It provides features that previously required digging into the lower-level AudioToolbox, such as real-time audio processing. It abstracts each stage of audio processing as an AVAudioNode and manages them through AVAudioEngine, connecting them into a complete node graph. Here is how this tutorial's AVAudioEngine and its nodes are connected.

Figure 3 Connection diagram of AVAudioEngine and AVAudioNode

mainMixerNode and outputNode are both created and connected by default by the AVAudioEngine object the first time they are accessed, which means we only need to create the engine and player nodes ourselves and connect them. Finally, a tap is installed on the output bus of mainMixerNode to obtain the AVAudioPCMBuffer output data, which we then feed to the FFT.
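To make the node-graph idea more tangible, here is a small sketch (illustrative only, not part of the tutorial project) that inserts an extra effect node between a player and the mixer; every node is attached first and then connected in order, while mainMixerNode and outputNode are created and wired up automatically:

import AVFoundation

// Illustrative only: a slightly richer node graph than the one used in this tutorial.
let engine = AVAudioEngine()
let player = AVAudioPlayerNode()
let eq = AVAudioUnitEQ(numberOfBands: 1)

engine.attach(player)
engine.attach(eq)

// player -> eq -> mainMixerNode (-> outputNode, connected automatically)
engine.connect(player, to: eq, format: nil)
engine.connect(eq, to: engine.mainMixerNode, format: nil)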

Code implementation

With that background in place, we're ready to start writing code. Open the starter project AudioSpectrum01-Starter; the first thing to implement is audio playback.

If you just want to browse the implementation, open the project AudioSpectrum01-Final instead; it contains all the code for this article.

Audio playback

Create AVAudioEngine and AVAudioPlayerNode instance variables in AudioSpectrumPlayer class:

private let engine = AVAudioEngine()
private let player = AVAudioPlayerNode()

Next add the following code to the init() method:

// 1
engine.attach(player)
engine.connect(player, to: engine.mainMixerNode, format: nil)
// 2
engine.prepare()
try! engine.start()

//1: Attach player to engine and connect player to engine's mainMixerNode, completing the AVAudioEngine node graph (see Figure 3).
//2: Before starting the engine by calling start(), resources need to be allocated using the prepare() method.

Continue refining the play(withFileName fileName: String) and stop() methods:

// 1
func play(withFileName fileName: String) {
    guard let audioFileURL = Bundle.main.url(forResource: fileName, withExtension: nil),
        let audioFile = try? AVAudioFile(forReading: audioFileURL) else { return }
    player.stop()
    player.scheduleFile(audioFile, at: nil, completionHandler: nil)
    player.play()
}
// 2
func stop() {
    player.stop()
}

//1: First make sure the audio file named fileName loads correctly, stop any previous playback with stop(), then call scheduleFile(_:at:completionHandler:) to schedule the new file and play() to start playback.
//2: Stop playback by calling the player's stop() method.

The audio playback feature is now complete. Run the project and try clicking the play button to the right of a track to play the audio.

Audio data acquisition

Now open the AudioSpectrumPlayer file and define fftSize, the number of frames in each buffer we receive.

private var fftSize: Int = 2048

Place the cursor below the engine.connect() statement in the init() method and call mainMixerNode's installTap method to install the tap:

engine.mainMixerNode.installTap(onBus: 0, bufferSize: AVAudioFrameCount(fftSize), format: nil, block: { [weak self] (buffer, when) in
    guard let strongSelf = self else { return }
    if !strongSelf.player.isPlaying { return }
    // Setting frameLength ensures each callback delivers exactly fftSize frames,
    // see: https://stackoverflow.com/a/27343266/6192288
    buffer.frameLength = AVAudioFrameCount(strongSelf.fftSize)
    let amplitudes = strongSelf.fft(buffer)
    if strongSelf.delegate != nil {
        strongSelf.delegate?.player(strongSelf, didGenerateSpectrum: amplitudes)
    }
})

In the tap callback block, each buffer of 2048 frames is handed to the fft function for calculation, and the result is then delivered through the delegate.
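The delegate itself is already declared in the starter project; if you are building the class from scratch, a minimal sketch could look like the following (the protocol name and property are my assumptions, only the method signature is dictated by the call above):

// A minimal sketch; the starter project defines its own version of this.
protocol AudioSpectrumPlayerDelegate: AnyObject {
    func player(_ player: AudioSpectrumPlayer, didGenerateSpectrum spectrum: [[Float]])
}

// Inside AudioSpectrumPlayer (assumed):
// weak var delegate: AudioSpectrumPlayerDelegate?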

With a sampling rate of 44100 Hz and 1 frame = 1 packet (refer to the concepts and relationships between channel, sample, frame, and packet here), the block will be called 44100 / 2048 ≈ 21.5 times per second. Another thing to note is that the block may not be called on the main thread.
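The same two numbers also determine the frequency resolution of the spectrum: each of the fftSize/2 output bins covers sampleRate / fftSize Hz. A tiny helper like this (illustrative only, not part of the starter project) maps a bin index back to its frequency:

// Illustrative only: with a 44100 Hz sample rate and fftSize = 2048,
// each bin spans 44100 / 2048 ≈ 21.5 Hz.
func frequency(forBin bin: Int, sampleRate: Float = 44_100, fftSize: Int = 2048) -> Float {
    return Float(bin) * sampleRate / Float(fftSize)
}

// Example: frequency(forBin: 100) ≈ 2153 Hz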

Implementing the FFT

Finally, it is time to implement the FFT. According to the vDSP documentation, we first need to create an FFT weights array (fftSetup), which can be reused across multiple FFTs to improve performance:

private lazy var fftSetup = vDSP_create_fftsetup(vDSP_Length(Int(round(log2(Double(fftSize))))), FFTRadix(kFFTRadix2))

Destroy it in the deinitializer when it is no longer needed:

deinit {
    vDSP_destroy_fftsetup(fftSetup)
}

Finally, add the fft function; the implementation is as follows:

private func fft(_ buffer: AVAudioPCMBuffer) -> [[Float]] {
    var amplitudes = [[Float]]()
    guard let floatChannelData = buffer.floatChannelData else { return amplitudes }

    //1: Extract the sample data from the buffer
    var channels: UnsafePointer<UnsafeMutablePointer<Float>> = floatChannelData
    let channelCount = Int(buffer.format.channelCount)
    let isInterleaved = buffer.format.isInterleaved

    if isInterleaved {
        // deinterleave
        let interleavedData = UnsafeBufferPointer(start: floatChannelData[0], count: self.fftSize * channelCount)
        var channelsTemp: [UnsafeMutablePointer<Float>] = []
        for i in 0..<channelCount {
            var channelData = stride(from: i, to: interleavedData.count, by: channelCount).map { interleavedData[$0] }
            channelsTemp.append(UnsafeMutablePointer(&channelData))
        }
        channels = UnsafePointer(channelsTemp)
    }

    for i in 0..<channelCount {
        let channel = channels[i]
        //2: Apply a Hann window
        var window = [Float](repeating: 0, count: Int(fftSize))
        vDSP_hann_window(&window, vDSP_Length(fftSize), Int32(vDSP_HANN_NORM))
        vDSP_vmul(channel, 1, window, 1, channel, 1, vDSP_Length(fftSize))

        //3: Pack the real samples into the split complex fftInOut required by the FFT; it serves as both input and output
        var realp = [Float](repeating: 0.0, count: Int(fftSize / 2))
        var imagp = [Float](repeating: 0.0, count: Int(fftSize / 2))
        var fftInOut = DSPSplitComplex(realp: &realp, imagp: &imagp)
        channel.withMemoryRebound(to: DSPComplex.self, capacity: fftSize) { (typeConvertedTransferBuffer) -> Void in
            vDSP_ctoz(typeConvertedTransferBuffer, 2, &fftInOut, 1, vDSP_Length(fftSize / 2))
        }

        //4: Perform the FFT
        vDSP_fft_zrip(fftSetup!, &fftInOut, 1, vDSP_Length(round(log2(Double(fftSize)))), FFTDirection(FFT_FORWARD))

        //5: Adjust the FFT result and compute the amplitudes
        fftInOut.imagp[0] = 0
        let fftNormFactor = Float(1.0 / Float(fftSize))
        vDSP_vsmul(fftInOut.realp, 1, [fftNormFactor], fftInOut.realp, 1, vDSP_Length(fftSize / 2))
        vDSP_vsmul(fftInOut.imagp, 1, [fftNormFactor], fftInOut.imagp, 1, vDSP_Length(fftSize / 2))
        var channelAmplitudes = [Float](repeating: 0.0, count: Int(fftSize / 2))
        vDSP_zvabs(&fftInOut, 1, &channelAmplitudes, 1, vDSP_Length(fftSize / 2))
        channelAmplitudes[0] = channelAmplitudes[0] / 2 // The amplitude of the DC component needs to be divided by 2
        amplitudes.append(channelAmplitudes)
    }
    return amplitudes
}

From the comments in the code, you should be able to see how the audio sample data is extracted from the buffer and how the FFT calculation is performed, but the following two points took me some time to understand while working through this part:

  1. The sample data is obtained through the buffer object's floatChannelData property. If the data is multichannel and interleaved, we need to deinterleave it; the picture below shows the data clearly before and after deinterleaving. However, after experimenting with many audio files, they all turned out to be non-interleaved, so no conversion was actually needed. ┑( ̄Д ̄)┍

  2. For FFTs on real numbers, vDSP uses a special packed data format to save memory. The data is converted twice, before and after the FFT computation, so that the FFT's input and output both use the DSPSplitComplex structure. The first conversion uses the vDSP_ctoz function to turn the real array of sample data into a DSPSplitComplex. The second conversion packs the FFT result into a DSPSplitComplex; this happens automatically inside the FFT computation function vDSP_fft_zrip.

    The second conversion works as follows: the FFT of n real samples (viewed as n/2 complex numbers) produces n/2 + 1 complex values: {[DC, 0], C[2], …, C[n/2], [NY, 0]} (where DC is the DC component, NY is the Nyquist-frequency value, and C is the complex array). Since the imaginary parts of [DC, 0] and [NY, 0] are both 0, NY can be stored in the imaginary part of DC, and the result becomes {[DC, NY], C[2], C[3], …, C[n/2]}, matching the size of the input; the sketch below shows this packing in practice.
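If the packed layout still feels abstract, the following standalone snippet (illustrative only, not part of the project) runs vDSP_fft_zrip on a tiny 8-sample buffer so you can inspect where the DC and Nyquist values end up:

import Accelerate

// Illustrative only: for n = 8 real samples, vDSP_fft_zrip produces n/2 = 4
// packed complex values; realp[0] holds the DC term and imagp[0] holds the
// Nyquist term (both scaled by vDSP's conventions), as described above.
let n = 8
let log2n = vDSP_Length(3)                         // log2(8) = 3
let samples: [Float] = [1, 2, 3, 4, 5, 6, 7, 8]    // simple test signal

var realp = [Float](repeating: 0, count: n / 2)
var imagp = [Float](repeating: 0, count: n / 2)
var splitComplex = DSPSplitComplex(realp: &realp, imagp: &imagp)

samples.withUnsafeBufferPointer { buffer in
    buffer.baseAddress!.withMemoryRebound(to: DSPComplex.self, capacity: n / 2) {
        vDSP_ctoz($0, 2, &splitComplex, 1, vDSP_Length(n / 2))
    }
}

let setup = vDSP_create_fftsetup(log2n, FFTRadix(kFFTRadix2))!
vDSP_fft_zrip(setup, &splitComplex, 1, log2n, FFTDirection(FFT_FORWARD))
vDSP_destroy_fftsetup(setup)

print(realp)  // realp[0]: packed DC component
print(imagp)  // imagp[0]: packed Nyquist component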

Run the project again: in addition to hearing the music, you can now see the spectrum data printed in the console.

Well, that’s the end of this article, and the next one will process and animate the calculated spectral data.

References
[1] Wikipedia, Pulse-code modulation, zh.wikipedia.org/wiki/%E8%84…
[2] Mike Ash, Audio data acquisition and analysis, www.mikeash.com/pyblog/frid…
[3] Han Hao, Fourier analysis tutorial, blog.jobbole.com/70549/
[4] raywenderlich, AVAudioEngine programming introduction, www.raywenderlich.com/5154-avaudi…
[5] Apple, vDSP Programming Guide, developer.apple.com/library/arc…
[6] Apple, aurioTouch sample code, developer.apple.com/library/arc…