Preface

Because of a company requirement, I was assigned to study real-time voice calls with walkie-talkies: the intercom presses a button to initiate a call, the Android side answers, and the two then hold a voice conversation. After spending a few days on the third-party walkie-talkie demo, I found that it only does simple audio playback, has no Android client code at all, and even the Java version requires reading the underlying implementation. There was nothing for it but to build the whole thing myself. I just want to say ***!!

Preparation

I originally intended to use the Web as the client, but limited by my skills I switched to an Android (Kotlin) client midway; the backend is Spring Boot. The front end and back end exchange real-time data over WebSocket. With the technical approach settled, the next step was to understand the relevant audio formats.

  • PCM (Pulse Code Modulation). A PCM file is the binary sequence produced by A/D conversion of an analog audio signal, with no file header and no end-of-file marker. The sound data in PCM is uncompressed. For a mono file, the samples are stored in chronological order. However, such a bare binary sequence cannot be played on its own: no player knows how many channels it has, or at what sample rate and bit depth it should be played, because the sequence is not self-describing.

  • WAV. WAVE (Waveform Audio File Format), commonly called WAV after its extension, is also a lossless audio encoding. A WAV file can act as a wrapper around a PCM file: if you compare the hex dump of a PCM file with the corresponding WAV file, you will see that the WAV file merely has an extra 44 bytes at the front describing the channel count, sample rate, and bit depth. Because they are self-describing, WAV files can be played by almost any audio player. So naturally, if we need to play a pure PCM stream on the Web, can we simply prepend those 44 bytes to turn it into the corresponding WAV file and then play it? (See the sketch after this list.)

  • GSM. GSM 06.10 is a lossy speech compression format from the Global System for Mobile Communications (GSM) standard. It greatly reduces the size of the audio data, but it introduces noticeable noise when the same signal is encoded and decoded repeatedly. The format is used by some voicemail applications, and encoding/decoding it is CPU-intensive.
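
To make the PCM-to-WAV idea concrete, here is a minimal sketch of my own (not part of the original demo) that wraps a raw PCM capture into a playable WAV file. It assumes the pcm2wav() helper implemented later in this article (where it lives in a FileUtils class); the file paths are placeholders.

import java.io.File

// Wrap raw 8 kHz / 16-bit / mono PCM samples with a 44-byte RIFF header so an ordinary player can open them.
fun wrapPcmFile(pcmPath: String, wavPath: String) {
    val pcmBytes = File(pcmPath).readBytes()   // bare samples, not self-describing
    val wavBytes = pcm2wav(pcmBytes)           // 44-byte header + the same samples (see pcm2wav below)
    File(wavPath).writeBytes(wavBytes)
}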

The basic flow

  • The intercom initiates a call (or the Android side initiates the conversation directly).
  • The Android side answers and sends a network request.
  • The backend receives the request and calls the intercom's answer method.
  • The backend acts as a relay, forwarding the Android audio data and the intercom's callback audio data.
  • The Android side and the backend exchange real-time data (bytes) over WebSocket.

Notes

  • The audio data the walkie-talkie calls back is GSM.
  • The audio data recorded on Android is PCM.

Enough talk, on to the code

The Android side

Add the dependencies in the app module's build.gradle:

// WebSocket
api 'org.java-websocket:Java-WebSocket:1.3.6'

// Runtime permissions
api 'com.github.tbruyelle:rxpermissions:0.10.2'

// Retrofit
String retrofit_version = '2.4.0'
api "com.squareup.retrofit2:retrofit:$retrofit_version"
api "com.squareup.retrofit2:converter-gson:${retrofit_version}"
api "com.squareup.retrofit2:adapter-rxjava2:${retrofit_version}"

// OkHttp
String okhttp_version = '3.4.1'
api "com.squareup.okhttp3:okhttp:${okhttp_version}"
api "com.squareup.okhttp3:logging-interceptor:${okhttp_version}"

// RxKotlin and RxAndroid 2.x
api 'io.reactivex.rxjava2:rxkotlin:2.3.0'
api 'io.reactivex.rxjava2:rxandroid:2.1.0'

Create a JWebSocketClient that extends WebSocketClient:

class JWebSocketClient(serverUri: URI, private val callback: (data: ByteBuffer?) -> Unit) : WebSocketClient(serverUri) {

    override fun onOpen(handshakedata: ServerHandshake?) {
        Log.d("LLLLLLLLLLLL", "onOpen")
    }

    override fun onClose(code: Int, reason: String?, remote: Boolean) {
        Log.d("LLLLLLLLLLLL", "code = $code, onClose = $reason")
    }

    override fun onMessage(message: String?) {
        //Log.d("LLLLLLLLLLLL", "onMessage = $message")
    }

    override fun onMessage(bytes: ByteBuffer?) {
        super.onMessage(bytes)

        //Log.d("LLLLLLLLLLLL", "onMessage2 = $bytes")

        callback.invoke(bytes)
    }

    override fun onError(ex: Exception?) {
        Log.d("LLLLLLLLLLLL", "onError = $ex")
    }
}

The onMessage method receives data from the backend and hands it to the Activity through the callback for processing. The related MainActivity code:

class MainActivity : AppCompatActivity() {

    private lateinit var client: WebSocketClient

    private var isGranted = false
    private var isRecording = true

    private var disposable: Disposable? = null

    private val service by lazy {
        RetrofitFactory.newInstance.create(ApiService::class.java)
    }

    private val sampleRate = 8000
    private val channelIn = AudioFormat.CHANNEL_IN_MONO
    private val channelOut = AudioFormat.CHANNEL_OUT_MONO
    private val audioFormat = AudioFormat.ENCODING_PCM_16BIT

    private val trackBufferSize by lazy { AudioTrack.getMinBufferSize(sampleRate, channelOut, audioFormat) }

    private val recordBufferSize by lazy { AudioRecord.getMinBufferSize(sampleRate, channelIn, audioFormat) }

    private val audioTrack by lazy {
        AudioTrack(AudioManager.STREAM_MUSIC,
                sampleRate,
                channelOut,
                audioFormat,
                trackBufferSize,
                AudioTrack.MODE_STREAM)
    }

    /** MediaRecorder.AudioSource.MIC refers to the microphone */
    private val audioRecord by lazy {
        AudioRecord(MediaRecorder.AudioSource.MIC,
                sampleRate,
                channelIn,
                audioFormat,
                recordBufferSize)
    }

    private val pcm2WavUtil by lazy {
        FileUtils(sampleRate, channelIn, audioFormat)
    }

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        // Permission request
        requestPermission()

        initWebSocket()

        btnReceive.setOnClickListener {
            if (client.readyState == WebSocket.READYSTATE.NOT_YET_CONNECTED) {
                client.connect()
            }

            audioTrack.play()
            // Pass in the device ID
            service.talkIntercom(IdModel(10))
                    .observeOn(AndroidSchedulers.mainThread())
                    .subscribeOn(Schedulers.io())
                    .subscribe({
                        if (!isGranted) {
                            toast("Permission request denied. Recording function unavailable.")
                            return@subscribe
                        }

                        // Check whether the AudioRecord initialization succeeded
                        if (audioRecord.state != AudioRecord.STATE_INITIALIZED) {
                            toast("Recording initialization failed")
                            return@subscribe
                        }

                        audioRecord.startRecording()
                        isRecording = true

                        thread {
                            val data = ByteArray(recordBufferSize)
                            while (isRecording) {
                                val readSize = audioRecord.read(data, 0, recordBufferSize)

                                if (readSize >= AudioRecord.SUCCESS) {
                                    // Convert PCM to WAV by adding a 44-byte header, then send it
                                    client.send(pcm2WavUtil.pcm2wav(data))
                                } else {
                                    "Read failed".showLog()
                                }
                            }
                        }
                    }, {
                        "error = $it".showLog()
                    })
        }

        btnHangup.setOnClickListener {
            isRecording = false
            // Stop recording
            audioRecord.stop()
            // Stop playback
            audioTrack.stop()

            service.hangupIntercom(IdModel(10))
                    .observeOn(AndroidSchedulers.mainThread())
                    .subscribeOn(Schedulers.io())
                    .subscribe {
                        toast("Hang up successful")
                    }
        }
    }

    private fun initWebSocket() {
        val uri = URI.create("ws://192.168.1.140:3014/websocket/16502")
        client = JWebSocketClient(uri) {
            val buffer = ByteArray(trackBufferSize)
            it?.let { byteBuffer ->
                //byteBuffer.array().size.toString().showLog()

                val inputStream = ByteArrayInputStream(byteBuffer.array())
                while (inputStream.available() > 0) {
                    val readCount = inputStream.read(buffer)
                    if (readCount == -1) {
                        "No more data to read.".showLog()
                        break
                    }
                    audioTrack.write(buffer, 0, readCount)
                }
            }
        }
    }

    private fun requestPermission() {
        disposable = RxPermissions(this)
                .request(android.Manifest.permission.RECORD_AUDIO,
                        android.Manifest.permission.WRITE_EXTERNAL_STORAGE)
                .subscribe { granted ->
                    if (!granted) {
                        toast("Permission request denied. Recording function unavailable.")
                        return@subscribe
                    }

                    isGranted = true
                }
    }

    override fun onDestroy() {
        super.onDestroy()
        client.close()
        disposable?.dispose()
        audioRecord.stop()
        audioRecord.release()
    }
}
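
The ApiService and IdModel referenced above are not shown in the article. Here is a minimal sketch of what they might look like, inferred from the calls in MainActivity; the endpoint paths, field name, and response type are my assumptions.

import io.reactivex.Observable
import okhttp3.ResponseBody
import retrofit2.http.Body
import retrofit2.http.POST

// Hypothetical Retrofit interface; the real paths and response types are not part of the original article.
interface ApiService {
    @POST("intercom/talk")
    fun talkIntercom(@Body model: IdModel): Observable<ResponseBody>

    @POST("intercom/hangup")
    fun hangupIntercom(@Body model: IdModel): Observable<ResponseBody>
}

data class IdModel(val id: Int)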
  • For more on AudioRecord, you can refer to this article.
  • For more on AudioTrack, you can refer to this article.

Because recording is needed, the recording permission has to be requested, and the WebSocket is initialized in onCreate(). When the answer button is clicked, the request is sent; on success, the logic inside subscribe runs: the recorded PCM data is read, converted to WAV format, and transmitted to the backend. Since each recorded buffer is wrapped with its own 44-byte header, every WebSocket message the backend receives is a small, self-describing WAV stream. The transcoding code is as follows:

fun pcm2wav(data: ByteArray): ByteArray {

        val sampleRate = 8000
        val channels = 1
        val byteRate = (16 * sampleRate * channels / 8).toLong()

        val totalAudioLen = data.size
        val totalDataLen = totalAudioLen + 36


        val header = ByteArray(44 + data.size)
        // RIFF/WAVE header
        header[0] = 'R'.toByte()
        header[1] = 'I'.toByte()
        header[2] = 'F'.toByte()
        header[3] = 'F'.toByte()
        header[4] = (totalDataLen and 0xff).toByte()
        header[5] = (totalDataLen shr 8 and 0xff).toByte()
        header[6] = (totalDataLen shr 16 and 0xff).toByte()
        header[7] = (totalDataLen shr 24 and 0xff).toByte()
        //WAVE
        header[8] = 'W'.toByte()
        header[9] = 'A'.toByte()
        header[10] = 'V'.toByte()
        header[11] = 'E'.toByte()
        // 'fmt ' chunk
        header[12] = 'f'.toByte()
        header[13] = 'm'.toByte()
        header[14] = 't'.toByte()
        header[15] = ' '.toByte()
        // 4 bytes: size of 'fmt ' chunk
        header[16] = 16
        header[17] = 0
        header[18] = 0
        header[19] = 0
        // format = 1
        header[20] = 1
        header[21] = 0
        header[22] = channels.toByte()
        header[23] = 0
        header[24] = (sampleRate and 0xff).toByte()
        header[25] = (sampleRate shr 8 and 0xff).toByte()
        header[26] = (sampleRate shr 16 and 0xff).toByte()
        header[27] = (sampleRate shr 24 and 0xff).toByte()
        header[28] = (byteRate and 0xff).toByte()
        header[29] = (byteRate shr 8 and 0xff).toByte()
        header[30] = (byteRate shr 16 and 0xff).toByte()
        header[31] = (byteRate shr 24 and 0xff).toByte()
        // block align = channels * bitsPerSample / 8 (2 bytes for 16-bit mono)
        header[32] = (channels * 16 / 8).toByte()
        header[33] = 0
        // bits per sample
        header[34] = 16
        header[35] = 0
        //data
        header[36] = 'd'.toByte()
        header[37] = 'a'.toByte()
        header[38] = 't'.toByte()
        header[39] = 'a'.toByte()
        header[40] = (totalAudioLen and 0xff).toByte()
        header[41] = (totalAudioLen shr 8 and 0xff).toByte()
        header[42] = (totalAudioLen shr 16 and 0xff).toByte()
        header[43] = (totalAudioLen shr 24 and 0xff).toByte()

        // Add raw data
        data.forEachIndexed { index, byte ->
            header[44 + index] = byte
        }

        return header
    }
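
A quick self-check of the helper (my own addition, not from the article): the output should be exactly 44 bytes longer than the input and carry the RIFF/WAVE magic at the expected offsets.

// Sanity check for pcm2wav(); offsets follow the standard 44-byte WAV header layout.
val pcm = ByteArray(1280)                                      // e.g. one (silent) recording buffer
val wav = pcm2wav(pcm)
check(wav.size == pcm.size + 44)                               // exactly one 44-byte header was prepended
check(wav.copyOfRange(0, 4).toString(Charsets.US_ASCII) == "RIFF")
check(wav.copyOfRange(8, 12).toString(Charsets.US_ASCII) == "WAVE")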

The backend

Add the WebSocket dependency to the Spring Boot project:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-websocket</artifactId>
</dependency>

Then create a new WebSocket endpoint class with the following code:

import com.kapark.cloud.context.AudioSender;
import com.xiaoleilu.hutool.log.Log;
import com.xiaoleilu.hutool.log.LogFactory;
import org.springframework.stereotype.Component;

import javax.websocket.*;
import javax.websocket.server.PathParam;
import javax.websocket.server.ServerEndpoint;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentHashMap;

/**
 * @author: hyzhan
 * @date: 2019/6/14
 * @desc: TODO
 */
@Component
@ServerEndpoint("/websocket/{devId}")
public class AudioSocket {

    private static Log log = LogFactory.get(AudioSocket.class);

    // Static variable used to record the current number of online connections; access to it should be thread-safe.
    private static int onlineCount = 0;

    // A thread-safe concurrent map holding the AudioSocket instance of each client, keyed by device ID.
    private static ConcurrentHashMap<Integer, AudioSocket> webSocketMap = new ConcurrentHashMap<>();

    // The session of the connection with a client, used to send data back to that client
    private Session session;

    // The device ID received from the path
    private int devId;

    /**
     * Called when the connection is established successfully
     */
    @OnOpen
    public void onOpen(Session session, @PathParam("devId") int devId) {
        this.session = session;
        this.devId = devId;

        webSocketMap.put(devId, this);
        addOnlineCount();           // Online count + 1
        log.info("New window starts listening: " + devId + ", the number of current online users is " + getOnlineCount());
    }

    /**
     * Called when the connection is closed
     */
    @OnClose
    public void onClose() {
        webSocketMap.remove(devId, this);  // Remove from the map
        subOnlineCount();                  // Online count - 1
        log.info("A connection closed! The number of current online users is " + getOnlineCount());
    }

    @OnMessage
    public void onMessage(String message, Session session) {
        log.info("Received String: " + message);
    }

    /**
     * Called when a binary message is received from the client
     *
     * @param message the message sent by the client
     */
    @OnMessage
    public void onMessage(byte[] message, Session session) {
        log.info("Received byte length: " + message.length);
        AudioSender.send2Intercom(devId, message);
    }

    @OnError
    public void onError(Session session, Throwable error) {
        log.error("Error occurred");
        error.printStackTrace();
    }

    /**
     * Send a binary message to the client with the given device ID
     */
    public static void send2Client(int devId, byte[] data, int len) {
        AudioSocket audioSocket = webSocketMap.get(devId);
        if (audioSocket != null) {
            try {
                synchronized (audioSocket.session) {
                    audioSocket.session.getBasicRemote().sendBinary(ByteBuffer.wrap(data, 0, len));
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    private static synchronized int getOnlineCount() {
        return onlineCount;
    }

    private static synchronized void addOnlineCount() {
        AudioSocket.onlineCount++;
    }

    private static synchronized void subOnlineCount() {
        AudioSocket.onlineCount--;
    }
}

Pay attention to the onMessage() and send2Client() methods. onMessage() receives the binary message from the client and calls send2Intercom() to forward the audio to the intercom matching the device ID (the devId comes from the WebSocket path, so the client URI ws://192.168.1.140:3014/websocket/16502 connects as device 16502). The send2Intercom method is as follows:

public static void send2Intercom(int devId, byte[] data) {
    try {
        // Wrap the incoming audio data (a small WAV stream: 44-byte header + PCM) in an InputStream
        InputStream inputStream = new ByteArrayInputStream(data);

        // Parse it into an AudioInputStream; this works because the client sends self-describing WAV data
        AudioInputStream pcmInputStream = AudioSystem.getAudioInputStream(inputStream);

        // Convert the PCM stream into a GSM stream
        AudioInputStream gsmInputStream = AudioSystem.getAudioInputStream(gsmFormat, pcmInputStream);

        // This buffer size can be adjusted as needed
        byte[] tempBytes = new byte[50];
        int len;
        while ((len = gsmInputStream.read(tempBytes)) != -1) {
            // Call the intercom SDK method (requestSendAudioData) to send the data to the intercom
            DongSDKProxy.requestSendAudioData(devId, tempBytes, len);
        }
    } catch (UnsupportedAudioFileException | IOException e) {
        e.printStackTrace();
    }
}

The gsmFormat used above is constructed as follows. Note that, as far as I know, GSM 06.10 is not supported by the stock Java Sound API; here it is provided by the Tritonus GSM plugin (hence the org.tritonus.share.sampled.Encodings reference), which must be on the classpath.

AudioFormat gsmFormat = new AudioFormat(org.tritonus.share.sampled.Encodings.getEncoding("GSM0610"),
        8000.0F,  // sampleRate
        -1,       // sampleSizeInBits
        1,        // channels
        33,       // frameSize
        50.0F,    // frameRate
        false);
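
Where the 33 and 50 come from (my own explanation, not something stated in the original): GSM 06.10 encodes 160 PCM samples, i.e. 20 ms of audio at 8 kHz, into one 33-byte frame, which gives 50 frames per second. A small sketch of the arithmetic:

// Worked arithmetic behind frameSize = 33 and frameRate = 50 for GSM 06.10 at 8 kHz.
fun main() {
    val sampleRate = 8000        // samples per second
    val samplesPerFrame = 160    // one GSM 06.10 frame covers 20 ms of audio
    val frameSizeBytes = 33      // size of one encoded frame

    val frameRate = sampleRate / samplesPerFrame        // 8000 / 160 = 50 frames per second
    val gsmBytesPerSecond = frameRate * frameSizeBytes  // 50 * 33 = 1650 B/s
    val pcmBytesPerSecond = sampleRate * 16 / 8         // 16 000 B/s for 16-bit mono PCM

    println("frameRate = $frameRate, GSM = $gsmBytesPerSecond B/s vs PCM = $pcmBytesPerSecond B/s")
}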

Note

Because voice data is transmitted in real time and the container's default WebSocket message buffer size is limited, the backend needs to raise the buffer size so that larger messages can be transmitted. Reference code:

@Configuration
public class WebSocketConfig {

    @Bean
    public ServerEndpointExporter serverEndpointExporter() {
        return new ServerEndpointExporter();
    }

    @Bean
    public ServletServerContainerFactoryBean createWebSocketContainer() {
        ServletServerContainerFactoryBean container = new ServletServerContainerFactoryBean();
        container.setMaxTextMessageBufferSize(500000);
        container.setMaxBinaryMessageBufferSize(500000);
        return container;
    }
}
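
As a rough sanity check of the 500 000-byte limit (my own arithmetic, not from the original): at 8 kHz, 16-bit, mono, PCM audio amounts to 16 000 bytes per second, so a single message can carry roughly half a minute of audio, which leaves plenty of headroom for the WAV-wrapped recording buffers.

// Headroom check for setMaxBinaryMessageBufferSize(500000); numbers assume the 8 kHz / 16-bit / mono PCM used in this project.
fun main() {
    val pcmBytesPerSecond = 8000 * 16 / 8 * 1   // = 16 000 B/s
    val maxMessageBytes = 500_000
    println("One message can carry about ${maxMessageBytes / pcmBytesPerSecond} seconds of PCM audio")  // about 31 s
}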

On the backend, the intercom's onAudioData callback is received; it calls the AudioSender.send2Client() method, which decodes the GSM data to PCM and sends it to the Android side.

/**
 * Audio data
 */
@Override
public int onAudioData(int dwDeviceID, InfoMediaData audioData) {
    String clazzName = Thread.currentThread().getStackTrace()[1].getMethodName();

    audioSender.send2Client(dwDeviceID, audioData.pRawData, audioData.nRawLen);
    return 0;
}
public void send2Client(int devId, byte[] data, long total) {

    /**
     * 1. The data returned by the intercom is in GSM format.
     * 2. Wrap the transmitted audio data (GSM) in an InputStream.
     * 3. Parse the InputStream into the corresponding AudioInputStream.
     * 4. Convert the GSM stream into a PCM stream.
     * 5. Read the PCM stream and send the audio data to the Android side.
     */
    try (InputStream inputStream = new ByteArrayInputStream(data);
         AudioInputStream gsmInputStream = AudioSystem.getAudioInputStream(inputStream);
         AudioInputStream pcmInputStream = AudioSystem.getAudioInputStream(pcmFormat, gsmInputStream)) {

        // This buffer size can be adjusted to suit your needs
        byte[] tempBytes = new byte[50];
        int len;
        while ((len = pcmInputStream.read(tempBytes)) != -1) {
            // Call the WebSocket endpoint and send the data to the client
            AudioSocket.send2Client(devId, tempBytes, len);
        }
    } catch (UnsupportedAudioFileException | IOException e) {
        e.printStackTrace();
    }
}

The pcmFormat is constructed as follows. It must match the AudioTrack configuration on the Android side (8 kHz, 16-bit, mono, little-endian), since the decoded bytes are written straight to AudioTrack:

// PCM_SIGNED 8000.0 Hz, 16 bit, mono, 2 bytes/frame, little-endian
pcmFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
        8000f, 16, 1, 2, 8000f, false);

Let’s go back to receiving data on Android:

private fun initWebSocket() {
    val uri = URI.create("ws://192.168.1.140:3014/websocket/16502")
    client = JWebSocketClient(uri) {
        val buffer = ByteArray(trackBufferSize)
        it?.let { byteBuffer ->
            val inputStream = ByteArrayInputStream(byteBuffer.array())
            while (inputStream.available() > 0) {
                val readCount = inputStream.read(buffer)
                if (readCount == -1) {
                    "No more data to read.".showLog()
                    break
                }
                audioTrack.write(buffer, 0, readCount)
            }
        }
    }
}

The lambda passed to JWebSocketClient {} here is the callback invoked from the WebSocket onMessage method; it simply writes the received data into audioTrack. AudioTrack only plays PCM data, and the backend has already transcoded the audio into PCM, so it can be played directly.

Finally

The above is a reasonably complete real-time voice flow. This is my first time working on Android audio-related development, so some points may not be understood in depth. If anything is wrong, please point it out.

Attached source code

Android source code portal

Back-end source portal (because company code is involved, only the core Java classes are uploaded)

Google online transcoding (to facilitate testing)

Thanks for reading. See you next time.