1 Introduction

1.1 Overview

This article documents the whole process and the key technical points of developing a basic camera live-streaming APP. The project uses x264 for video encoding, FAAC for audio encoding, and the RTMP protocol for pushing the stream. The overall pipeline is roughly as follows.

The whole APP realizes the following functions:

  • Video preview before going live
  • Start live streaming: encode and push the stream
  • Switch cameras
  • Stop live streaming
  • Quit the application

This article documents each of these technical points, organized by feature.

1.2 CameraX

1.2.1 Overview

From developer.android.com:

CameraX is a Jetpack support library designed to help you simplify camera application development. It provides a consistent and easy-to-use API interface that works on most Android devices and is backward compatible up to Android 5.0 (API level 21).

While it leverages the capabilities of camera2, it uses a simpler, use-case-based approach with lifecycle awareness. It also addresses device compatibility issues, so you don’t need to include device-specific code in your code base. These features reduce the amount of code you need to write to add camera functionality to your application.

Finally, with CameraX, developers can take advantage of the same camera experience and functionality as a pre-installed camera app with just two lines of code. The CameraX Extensions is an optional plug-in that allows you to add portrait, HDR, Night mode, and beauty effects to your application on supported devices.

1.2.2 Basic use of CameraX

Please refer to Google’s official CameraX Demo, which is written using Kotlin.

git clone github.com/android/cam…

1.2.3 Obtaining the original picture frame data

How to get raw picture frame data?

CameraX provides an image analysis interface: the analyze() callback receives an ImageProxy. The ImageProxy is, as the name suggests, a proxy for an Image and can call all of the Image's methods.

//Java
package androidx.camera.core;
//import ...
public final class ImageAnalysis extends UseCase {
    
    //...

    public interface Analyzer {
        /**
         * Analyzes an image to produce a result.
         *
         * <p>This method is called once for each image from the camera, and called at the
         * frame rate of the camera. Each analyze call is executed sequentially.
         *
         * <p>The caller is responsible for ensuring this analysis method can be executed quickly
         * enough to prevent stalls in the image acquisition pipeline. Otherwise, newly available
         * images will not be acquired and analyzed.
         *
         * <p>The image passed to this method becomes invalid after this method returns. The caller
         * should not store external references to this image, as these references will become
         * invalid.
         *
         * <p>Processing should complete within a single frame time of latency, or the image data
         * should be copied out for longer processing. Applications can skip analyzing a frame
         * by having the analyzer return immediately.
         *
         * @param image           The image to analyze
         * @param rotationDegrees The rotation which if applied to the image would make it match
         *                        the current target rotation of {@link ImageAnalysis}, expressed in
         *                        degrees in the range {@code [0..360)}.
         */
        void analyze(ImageProxy image, int rotationDegrees);
    }

    //...

}
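
For orientation, here is a minimal sketch of how an Analyzer is typically registered with the alpha-era CameraX API this project uses (the project's VideoChannel.getAnalysis() builds something similar; the builder and method names below are assumptions and vary between CameraX versions):

//Java
// Hedged sketch: registering an Analyzer with an ImageAnalysis use case (alpha-era CameraX API).
ImageAnalysisConfig config = new ImageAnalysisConfig.Builder()
        // analyze only the most recent frame, dropping stale ones
        .setImageReaderMode(ImageAnalysis.ImageReaderMode.ACQUIRE_LATEST_IMAGE)
        .build();
ImageAnalysis imageAnalysis = new ImageAnalysis(config);
imageAnalysis.setAnalyzer((image, rotationDegrees) -> {
    // called once per camera frame; must return quickly,
    // and the ImageProxy becomes invalid once this method returns
});
// bind the use case to the lifecycle, together with the preview use case
CameraX.bindToLifecycle(lifecycleOwner, imageAnalysis);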

So what are the methods of ImageProxy?

//Java
// Get a clipping rectangle
Rect cropRect = image.getCropRect();
// Get the image format
int format = image.getFormat();
// Get the image height
int height = image.getHeight();
// Get the image width
int width = image.getWidth();
// Get the image
Image image1 = image.getImage();
// Get the image information
ImageInfo imageInfo = image.getImageInfo();
// Get the plane proxy of the image
ImageProxy.PlaneProxy[] planes = image.getPlanes();
// Get the timestamp of the image
long timestamp = image.getTimestamp();

★ Important!

CameraX produces images in YUV_420_888 format, so when using an ImageProxy we should throw an error if the image format does not match:

//Java
if (format != ImageFormat.YUV_420_888) {
    // Throw an exception
}

Also, ImageProxy.PlaneProxy[] contains the Y data, U data and V data of the YUV image, i.e. planes.length is 3.

At this point we have obtained the raw frame data. Next we need to extract the Y, U and V data and arrange them in I420 format.

Why I420?

Because the captured image data then has to be encoded, and encoders generally expect their input in I420 format.

1.3 YUV_420_888

The image data produced by CameraX is in YUV_420_888 format. This section introduces NV21 and I420 formats in YUV_420_888.

1.3.1 YUV_420_888 format

YUV represents the color space by Y, U and V components, where Y represents brightness and U and V represent chroma. (If the UV data is all zero, then we get a black and white image.)

Each pixel in RGB has independent R, G and B color components. YUV can be divided into YUV444, YUV422 and YUV420 according to how often the U and V components are sampled. YUV420 means that every pixel has its own luminance value (the Y component), while the chroma values (the U and V components) are shared by every four pixels. For example, a 4×4 image in YUV420 has 16 Y values, 4 U values and 4 V values.

  • In YUV420, each pixel has its own Y, every four pixels share one U, and every four pixels share one V;
  • In YUV420, the number of valid bytes of Y data = Height × Width;
  • In YUV420, the number of valid bytes of U data = (Height/2) × (Width/2);
  • In YUV420, the number of valid bytes of V data = (Height/2) × (Width/2);
  • In YUV420, the YUV data of each pixel is shown in the following diagram.

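A quick worked example: for a 1280×720 frame under YUV420, the Y data occupies 1280×720 = 921,600 bytes and the U and V data each occupy 640×360 = 230,400 bytes, so one frame takes Width×Height×3/2 = 1,382,400 bytes in total.
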
★ In YUV formats the picture is formed by superimposing one luminance plane and two chroma planes (Y plane + U plane + V plane);

★ Officially all three YUV planes are called color planes; in practice the Y plane is usually called the luminance plane;

★ planes[0] is the Y plane, planes[1] is the U plane, and planes[2] is the V plane.

★ ImageProxy.PlaneProxy objects have three methods:

getBuffer() returns the ByteBuffer containing the plane's data bytes;

getPixelStride() indicates how the UV data is stored (see 1.3.2 for details);

getRowStride() returns the row stride (see 1.3.3).

1.3.2 UV sequence under YUV420

YUV420 covers a number of different formats that differ only in the storage order of the chroma data; the information actually stored is exactly the same.

For example, for a 4×4 image, under YUV420, there are 16 Y values, 4 U values and 4 V values in any format, just the order of Y, U, and V varies by format.

  • The storage diagram of I420 is as follows (U and V are stored separately):

  • The storage diagram of NV21 is as follows (planes[1] contains UVUV…, planes[2] contains VUVU…):

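Since the original diagrams are not reproduced here, the layouts they illustrate are roughly as follows for a 4×4 image:

  • I420 (pixelStride = 1): planes[0] = 16 Y bytes, planes[1] = UUUU, planes[2] = VVVV;
  • NV21-style (pixelStride = 2): planes[0] = 16 Y bytes, planes[1] = UVUVUVU (U at the even indexes), planes[2] = VUVUVUV (V at the even indexes).
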
★ YUV420 is a class of formats that includes I420. The 888 in YUV_420_888 indicates that each of the three YUV components is expressed in 8 bits (1 byte). YUV420 does not fully determine the storage order of the chroma (UV) data, so the data in planes[1] and planes[2] has to be handled in two different ways. The Y data in planes[0] is the same regardless of storage order, so it needs no special handling.

★ The UV data in the NV21-style storage is redundant: all the U data can be read from the even-indexed bytes of planes[1], and all the V data from the even-indexed bytes of planes[2]; the redundancy can be ignored.

★ How do we determine whether the current image is laid out as I420 or NV21? Each plane in the ImageProxy.PlaneProxy[] array has a getPixelStride() method: if it returns 1, the layout is I420; if it returns 2, the layout is NV21.

★ For planes[0] (the Y plane), getPixelStride() is always 1.
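
As a concrete illustration of the rules above, a minimal check (a sketch using only the ImageProxy methods already listed) looks like this:

//Java
// Sketch: decide how the chroma data is laid out by inspecting the pixel stride.
ImageProxy.PlaneProxy[] planes = image.getPlanes();
int pixelStride = planes[1].getPixelStride();
if (pixelStride == 1) {
    // I420-style: U and V live in separate, non-interleaved planes
} else if (pixelStride == 2) {
    // NV21-style: U and V are interleaved; keep only the even-indexed bytes
}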

1.3.3 RowStride

Google notes the getRowStride() method as follows:

        /**
         * <p>The row stride for this color plane, in bytes.</p>
         *
         * <p>This is the distance between the start of two consecutive rows of
         * pixels in the image. Note that row stride is undefined for some formats
         * such as
         * {@link android.graphics.ImageFormat#RAW_PRIVATE RAW_PRIVATE},
         * and calling getRowStride on images of these formats will
         * cause an UnsupportedOperationException being thrown.
         * For formats where row stride is well defined, the row stride
         * is always greater than 0.</p>
         */
        public abstract int getRowStride();

That is: the row stride of this color plane, in bytes, is the distance between the start of two consecutive rows of pixels in the image. Note that the row stride is undefined for some formats; trying to obtain it for those formats throws an UnsupportedOperationException. For formats where the row stride is well defined, it is always greater than 0.

★ In other words, each row in planes[0/1/2].getBuffer() may contain padding bytes in addition to the valid data, depending on the relationship between rowStride and the image width. What is certain is that rowStride is always greater than or equal to the length of the valid data in a row (a minimal sketch follows the case list below).

Continuing with the example of a 4×4 image:

I420/NV21 Y plane

  • rowStride=width

  • rowStride>width

I420 U/V plane

  • rowStride=width/2

  • rowStride>width/2

NV21 U/V plane

  • rowStride=width-1

  • rowStride>width-1
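
To make the rowStride handling concrete, here is a minimal sketch of copying only the valid bytes of the Y plane, skipping any per-row padding (width, height and the i420 ByteBuffer are assumed to be defined; the full handling of all six cases appears later in 3.2.3):

//Java
// Sketch: copy the Y plane row by row, dropping rowStride - width padding bytes per row.
ImageProxy.PlaneProxy yPlane = image.getPlanes()[0];
ByteBuffer yBuffer = yPlane.getBuffer();
int rowStride = yPlane.getRowStride();
byte[] row = new byte[width];                  // valid bytes of one row
byte[] padding = new byte[rowStride - width];  // empty when rowStride == width
for (int i = 0; i < height; i++) {
    yBuffer.get(row);
    i420.put(row);
    // the last row may not be followed by padding bytes in the buffer
    if (i < height - 1) {
        yBuffer.get(padding);
    }
}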

1.4 Project Structure

1.5 Third-party Libraries

This project uses four third-party libraries in the native layer:

  1. x264 for video encoding;
  2. FAAC for audio encoding;
  3. libyuv for image processing (rotation, scaling);
  4. RTMPDump for sending packets over the RTMP protocol.

Before being used here, the x264 and FAAC static libraries were cross-compiled on a Linux system with the NDK; the compilation process itself is not covered in this article.

The libyuv and RTMPDump libraries use source code directly.

Configure project CMakeLists

The project's CMakeLists.txt files are written as follows:

/cpp/CMakeLists.txt

#CMake
#/cpp/CMakeLists.txt

cmake_minimum_required(VERSION 3.4.1)

#native-lib.cpp
add_library(native-lib SHARED native-lib.cpp JavaCallHelper.cpp VideoChannel.cpp)

#rtmp
include_directories(${CMAKE_SOURCE_DIR}/librtmp)
add_subdirectory(librtmp)

#x264
include_directories(${CMAKE_SOURCE_DIR}/x264/armeabi-v7a/include)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -L${CMAKE_SOURCE_DIR}/x264/armeabi-v7a/lib")

#faac
include_directories(${CMAKE_SOURCE_DIR}/faac/armeabi-v7a/include)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -L${CMAKE_SOURCE_DIR}/faac/armeabi-v7a/lib")

#log-lib
find_library(log-lib log)

# Link native lib and the libraries it uses
target_link_libraries(native-lib ${log-lib} x264 faac rtmp)

#libyuv
include_directories(${CMAKE_SOURCE_DIR}/libyuv/include)
add_subdirectory(libyuv)
add_library(ImageUtils SHARED ImageUtils.cpp)

# link ImageUtils and the library it uses
target_link_libraries(ImageUtils yuv)

/cpp/librtmp/CMakeLists.txt

#CMake
#/cpp/librtmp/CMakeLists.txt

# RTMPS is not supported
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DNO_CRYPTO")

# Put all source files into the rtmp_source variable
file(GLOB rtmp_source *.c)

# Build static libraries
add_library(rtmp STATIC ${rtmp_source})

/cpp/libyuv/CMakeLists.txt

#CMake
#/cpp/libyuv/CMakeLists.txt

aux_source_directory(source LIBYUV)
add_library(yuv STATIC ${LIBYUV})

2 Video preview before live broadcast

2.1 Sequence Diagram

When the user opens the APP and enters the home page, the camera is opened and the preview screen is rendered to TextureView. When MainActivity enters the onCreate life cycle, the sequence diagram looks like this:

2.2 Process Description

When the user opens the APP, MainActivity enters the onCreate life cycle and performs three actions:

  • Step 1: create an RtmpClient object, which is held by MainActivity. Its constructor calls the native nativeInit() method, which initializes the necessary native-layer components and creates a JavaCallHelper object held by native-lib.cpp.
  • Step 2: call rtmpClient.initVideo(), which creates the Java-layer VideoChannel object, held by rtmpClient. The VideoChannel constructor creates and starts an "analysis-thread" thread, on which CameraX periodically invokes the analyze() callback. In that callback we first check the rtmpClient.isConnected flag to see whether we are live: if so, we go on to encode and push the stream; otherwise we keep the preview-only behaviour. After constructing VideoChannel, initVideo() calls the native method initVideoEnc(), which initializes the native-layer video encoder so it is ready to go live at any time.
  • Step 3: call rtmpClient.initAudio(), which creates the Java-layer AudioChannel object, held by rtmpClient. The AudioChannel constructor creates and starts an "audio-recode" thread, on which the audio will be encoded while live. After constructing AudioChannel, initAudio() calls the native method initAudioEnc(), which initializes the native-layer audio encoder so it is ready to go live at any time.

2.3 Key Codes

2.3.1 Preparing a Video encoder

The following code corresponds to the videoChannel.openCodec:(int,int,int,int)->void method in the sequence diagram. It sets the x264 video parameters; see the code comments.

//C++
//VideoChannel.cpp
void VideoChannel::openCodec(int width, int height, int fps, int bitrate) {
    // Encoder parameters
    x264_param_t param;
    // ultrafast: fastest encoding preset
    // zerolatency: no-delay encoding, for real-time communication
    x264_param_default_preset(&param, "ultrafast", "zerolatency");
    // profiles: baseline / main / high
    // baseline level 3.2, no B-frames (lowest decoding complexity)
    param.i_level_idc = 32;
    // Input data format
    param.i_csp = X264_CSP_I420;
    param.i_width = width;
    param.i_height = height;
    // No B-frames
    param.i_bframe = 0;
    // i_rc_method selects rate control: CQP (constant QP), CRF (constant rate factor), ABR (average bitrate)
    param.rc.i_rc_method = X264_RC_ABR;
    // Bitrate (in kbps)
    param.rc.i_bitrate = bitrate / 1000;
    // Instantaneous maximum bitrate
    param.rc.i_vbv_max_bitrate = bitrate / 1000 * 1.2;
    // Frame rate
    param.i_fps_num = fps;
    param.i_fps_den = 1;
    param.pf_log = x264_log_default2;
    // Keyframe interval: one keyframe every 2 seconds
    param.i_keyint_max = fps * 2;
    // Copy SPS and PPS before each keyframe, so every keyframe (I-frame) carries SPS/PPS
    param.b_repeat_headers = 1;
    // No parallel encoding; with param.rc.i_lookahead set to 0 in the zerolatency scenario
    // the encoder outputs one frame per input frame, with no parallelism and no delay
    param.i_threads = 1;
    param.rc.i_lookahead = 0;
    x264_param_apply_profile(&param, "baseline");
    codec = x264_encoder_open(&param);
    ySize = width * height;
    uSize = (width >> 1) * (height >> 1);
    this->width = width;
    this->height = height;
}

2.3.2 Preparing an audio encoder

The following code corresponds to the audioChannel.openCodec:(int,int)->void method in the sequence diagram; it mainly sets the FAAC audio parameters. See the code comments.

//C++
//AudioChannel.cpp
void AudioChannel::openCodec(int sampleRate, int channels) {
    // Input sample: the number of samples to be sent to the encoder
    unsigned long inputSamples;
    codec = faacEncOpen(sampleRate, channels, &inputSamples, &maxOutputBytes);
    // The sample is 16 bits, so a sample is 2 bytes
    inputByteNum = inputSamples * 2;
    outputBuffer = static_cast<unsigned char *>(malloc(maxOutputBytes));
    // Get the parameters of the current encoder
    faacEncConfigurationPtr configurationPtr = faacEncGetCurrentConfiguration(codec);
    configurationPtr->mpegVersion = MPEG4;
    configurationPtr->aacObjectType = LOW;
    //1. Each encoded audio frame carries an ADTS header (containing sample rate, channel count, etc.)
    //0. Output raw AAC data without ADTS
    configurationPtr->outputFormat = 0;
    configurationPtr->inputFormat = FAAC_INPUT_16BIT;
    faacEncSetConfiguration(codec, configurationPtr);
}

3 Start Livestreaming

3.1 Connecting to the Streaming Media Server

3.1.1 Sequence Diagram

When the user clicks the start live button on the page, the first thing the client needs to do is connect to the streaming server. The sequence diagram is as follows:

3.1.2 Process Description

Connecting to the streaming media server has to be done in the native layer with the help of the RTMPDump library. Because this involves network requests, the connection must be made on a new thread.

  • When the user clicks the Start Live button, the APP eventually creates a new pthread in the native layer to connect to the server asynchronously;
  • The asynchronously executed void *connect(void *args) function uses the RTMPDump library to try to connect to the streaming media server;
  • After the connection succeeds, the native layer uses the JavaCallHelper object (helper) and its onPrepare() method to call the onPrepare() method of the Java-layer rtmpClient object. Inside that method, rtmpClient first sets its isConnected flag to true, so video encoding starts; it then calls audioChannel.start(), which posts the audio encoding task to the "audio-recode" thread and starts encoding audio.

3.1.3 Key code

Starting the thread that connects to the streaming media server

The following code corresponds to the JNI_connect:(JNIEnv *,jobject,jstring)->void function in the sequence diagram, in which the program starts a new thread.

//C++
//native-lib.cpp
extern "C"
JNIEXPORT void JNICALL
Java_com_tongbo_mycameralive_RtmpClient_connect( JNIEnv *env, jobject thiz, jstring url_ ) {
    const char *url = env->GetStringUTFChars(url_, 0);
    path = new char[strlen(url) + 1];
    strcpy(path, url);
    pthread_create(&pid, NULL, connect, 0);
    env->ReleaseStringUTFChars(url_, url);
}

Attempt to connect to the streaming media server

Attention! RtmpClient's member method connect(String url) is a native method. Its JNI implementation starts a new thread that asynchronously executes the void *connect(void *args) function, in which we use the API provided by the RTMPDump library to connect to the streaming media server. The void *connect(void *args) function is implemented as follows:

//C++
//native-lib.cpp

//...

VideoChannel *videoChannel = 0;
AudioChannel *audioChannel = 0;
JavaVM *javaVM = 0;
JavaCallHelper *helper = 0;
pthread_t pid;
char *path = 0;
RTMP *rtmp = 0;
uint64_t startTime;

//...

void *connect(void *args) {
    int ret;
    rtmp = RTMP_Alloc();
    RTMP_Init(rtmp);
    do {
        // Parse the url (may fail if the address is invalid)
        ret = RTMP_SetupURL(rtmp, path);
        if (!ret) {
            //TODO: notify Java that there is a problem with the address (not implemented)
            break;
        }
        // Enable output mode; not needed when only pulling a stream for playback
        RTMP_EnableWrite(rtmp);
        ret = RTMP_Connect(rtmp, 0);
        if (!ret) {
            //TODO: notify Java that connecting to the server failed (not implemented)
            break;
        }
        ret = RTMP_ConnectStream(rtmp, 0);
        if (!ret) {
            //TODO: notify Java that connecting to the stream failed (not implemented)
            break;
        }
        // Send the audio specific config (tells the player how to decode the audio stream we push)
        RTMPPacket *packet = audioChannel->getAudioConfig();
        callback(packet);
    } while (false);
    // Clean up on failure to prevent memory leaks
    if (!ret) {
        RTMP_Close(rtmp);
        RTMP_Free(rtmp);
        rtmp = 0;
    }
    delete[] path;
    path = 0;
    //TODO: notify the Java layer that it is ready to start pushing the stream
    helper->onParpare(ret);
    startTime = RTMP_GetTime();
    return 0;
}

Use JavaCallHelper to inform the Java layer that the connection was successful

After the native layer connects successfully, it calls the Java-layer rtmpClient.onPrepare() method. The following code corresponds to helper.onParpare:(jboolean,int)->void in the sequence diagram.

//C++
//JavaCallHelper.cpp
void JavaCallHelper::onParpare(jboolean isConnect, int thread) {
    if (thread == THREAD_CHILD) {
        JNIEnv *jniEnv;
        if (javaVM->AttachCurrentThread(&jniEnv, 0) != JNI_OK) {
            return;
        }
        jniEnv->CallVoidMethod(jobj, jmid_prepare, isConnect);
        javaVM->DetachCurrentThread();
    } else {
        env->CallVoidMethod(jobj, jmid_prepare, isConnect);
    }
}

3.2 Video Coding

3.2.1 Sequence Diagram

When rtmpClient.isConnected is set to true, CameraX performs the analyze() call, and the video encoding operation is performed as follows:

3.2.2 Process Description

The analyze() callback starts executing as soon as the user opens the APP, CameraX is initialized and the preview starts; at that point the rtmpClient.isConnected flag is still false. Once the flag is set to true, the video encoding process consists of two steps (a minimal sketch of the callback follows this list):

  • Step 1: get the image bytes.
  • Step 2: send the image data.
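
Here is a minimal sketch of the analyze() callback that ties the two steps together (the sendVideo() method name and the videoWidth/videoHeight fields are assumptions based on the sequence diagram, not the project's literal code):

//Java
// Sketch of VideoChannel's analyzer callback (names partly assumed).
@Override
public void analyze(ImageProxy image, int rotationDegrees) {
    if (!rtmpClient.isConnected) {
        // not live yet: keep the preview-only behaviour
        return;
    }
    // Step 1: extract the image bytes as I420, rotated/scaled to the target size
    byte[] bytes = ImageUtils.getBytes(image, rotationDegrees, videoWidth, videoHeight);
    // Step 2: hand the bytes to the native encoder, which packages and pushes them
    rtmpClient.sendVideo(bytes);
}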

3.2.3 Key codes

Get image bytes

To handle the six cases listed in 1.3.3, we process each row of each plane one byte at a time. The following code corresponds to the [rtmpClient.isConnected==true] ImageUtils.getBytes:(ImageProxy,int,int,int)->byte[] part of the sequence diagram:

//Java
//ImageUtils.java
package com.tongbo.mycameralive;

import android.graphics.ImageFormat;
import androidx.camera.core.ImageProxy;
import java.nio.ByteBuffer;

public class ImageUtils {

    static {
        System.loadLibrary("ImageUtils");
    }
    
    static ByteBuffer i420;
    static byte[] scaleBytes;

    public static byte[] getBytes(ImageProxy image, int rotationDegrees, int width, int height) {
        int format = image.getFormat();
        if (format != ImageFormat.YUV_420_888) {
            // Throw an exception
        }
        // Create a ByteBuffer of height*width*3/2 bytes to hold the final i420 image data
        int size = height * width * 3 / 2;
        //TODO: reuse the buffer to prevent memory churn
        if (i420 == null || i420.capacity() < size) {
            i420 = ByteBuffer.allocate(size);
        }
        i420.position(0);
        // The YUV planes array
        ImageProxy.PlaneProxy[] planes = image.getPlanes();
        //TODO: take the Y data and put it into i420
        int pixelStride = planes[0].getPixelStride();
        ByteBuffer yBuffer = planes[0].getBuffer();
        int rowStride = planes[0].getRowStride();
        //1. If rowStride equals width, skipRow is an empty array
        //2. If rowStride is larger than width, skipRow holds the padding bytes of each row
        byte[] skipRow = new byte[rowStride - width];
        byte[] row = new byte[width];
        for (int i = 0; i < height; i++) {
            yBuffer.get(row);
            i420.put(row);
            //1. For every row except the last, consume the padding bytes into skipRow
            //2. The last row may have no padding bytes in the buffer; reading them would fail
            if (i < height - 1) {
                yBuffer.get(skipRow);
            }
        }
        //TODO: take the U/V data and put it into i420
        for (int i = 1; i < 3; i++) {
            ImageProxy.PlaneProxy plane = planes[i];
            pixelStride = plane.getPixelStride();
            rowStride = plane.getRowStride();
            ByteBuffer buffer = plane.getBuffer();

            int uvWidth = width / 2;
            int uvHeight = height / 2;

            // Process one row at a time
            for (int j = 0; j < uvHeight; j++) {
                // Process one byte at a time
                for (int k = 0; k < rowStride; k++) {
                    //1. The last row
                    if (j == uvHeight - 1) {
                        // I420: U and V are separate, rowStride >= width/2; in the last row stop before the padding
                        if (pixelStride == 1 && k >= uvWidth) {
                            break;
                        }
                        // NV21: U and V are interleaved, rowStride >= width-1; in the last row stop before the padding
                        if (pixelStride == 2 && k >= width - 1) {
                            break;
                        }
                    }
                    //2. Not the last row
                    byte b = buffer.get();
                    //1. I420: U and V are not interleaved; keep the valid bytes, ignore the padding
                    if (pixelStride == 1 && k < uvWidth) {
                        i420.put(b);
                        continue;
                    }
                    //2. NV21: U and V are interleaved; keep only the valid even-indexed bytes, ignore the rest
                    if (pixelStride == 2 && k < width - 1 && k % 2 == 0) {
                        i420.put(b);
                        continue;
                    }
                }
            }
        }
        //TODO: turn the i420 buffer into a byte array, rotate it if needed, and return it
        int srcWidth = image.getWidth();
        int srcHeight = image.getHeight();
        byte[] result = i420.array();
        if (rotationDegrees == 90 || rotationDegrees == 270) {
            result = rotate(result, width, height, rotationDegrees);
            srcWidth = image.getHeight();
            srcHeight = image.getWidth();
        }
        if (srcWidth != width || srcHeight != height) {
            //TODO: JNI writes into scaleBytes to avoid memory churn
            int scaleSize = width * height * 3 / 2;
            if (scaleBytes == null || scaleBytes.length < scaleSize) {
                scaleBytes = new byte[scaleSize];
            }
            scale(result, scaleBytes, srcWidth, srcHeight, width, height);
            return scaleBytes;
        }
        return result;
    }

    private static native byte[] rotate(byte[] data, int width, int height, int degress);

    private native static void scale(byte[] src, byte[] dst, int srcWidth, int srcHeight, int dstWidth, int dstHeight);

}

★ The native function rotate() is implemented as follows; the core of the rotation is libyuv::I420Rotate(). The rotation is needed because the camera delivers frames rotated by 90 or 270 degrees by default, and they have to be rotated back manually.

//C++
//ImageUtils.cpp
#include <jni.h>
#include <libyuv.h>

extern "C"
JNIEXPORT jbyteArray JNICALL
Java_com_tongbo_mycameralive_ImageUtils_rotate(JNIEnv *env, jclass thiz, jbyteArray data_, jint width, jint height, jint degress) {
	//TODO:Prepare the incoming arguments for the libyuv::I420Rotate() function
    jbyte *data = env->GetByteArrayElements(data_, 0);
    uint8_t *src = reinterpret_cast<uint8_t *>(data);
    int ySize = width * height;
    int uSize = (width >> 1) * (height >> 1);
    int size = (ySize * 3) >> 1;
    uint8_t dst[size];

    uint8_t *src_y = src;
    uint8_t *src_u = src + ySize;
    uint8_t *src_v = src + ySize + uSize;

    uint8_t *dst_y = dst;
    uint8_t *dst_u = dst + ySize;
    uint8_t *dst_v = dst + ySize + uSize;
	
    //TODO:Call the libyuv::I420Rotate() function
    libyuv::I420Rotate(src_y, width, src_u, width >> 1, src_v, width >> 1,
                       dst_y, height, dst_u, height >> 1, dst_v, height >> 1,
                       width, height, static_cast<libyuv::RotationMode>(degress));

    //TODO:Prepare return value
    jbyteArray result = env->NewByteArray(size);
    env->SetByteArrayRegion(result, 0, size, reinterpret_cast<const jbyte *>(dst));

    env->ReleaseByteArrayElements(data_, data, 0);
    return result;
}

★ The implementation of native function scale() is:

//C++
//ImageUtils.cpp
extern "C"
JNIEXPORT void JNICALL
Java_com_tongbo_mycameralive_ImageUtils_scale(JNIEnv *env, jclass clazz, jbyteArray src_, jbyteArray dst_, jint srcWidth, jint srcHeight, jint dstWidth, jint dstHeight) {
    jbyte *data = env->GetByteArrayElements(src_, 0);
    uint8_t *src = reinterpret_cast<uint8_t *>(data);

    int64_t size = (dstWidth * dstHeight * 3) >> 1;
    uint8_t dst[size];
    uint8_t *src_y;
    uint8_t *src_u;
    uint8_t *src_v;
    int src_stride_y;
    int src_stride_u;
    int src_stride_v;
    uint8_t *dst_y;
    uint8_t *dst_u;
    uint8_t *dst_v;
    int dst_stride_y;
    int dst_stride_u;
    int dst_stride_v;

    src_stride_y = srcWidth;
    src_stride_u = srcWidth >> 1;
    src_stride_v = src_stride_u;

    dst_stride_y = dstWidth;
    dst_stride_u = dstWidth >> 1;
    dst_stride_v = dst_stride_u;

    int src_y_size = srcWidth * srcHeight;
    int src_u_size = src_stride_u * (srcHeight >> 1);
    src_y = src;
    src_u = src + src_y_size;
    src_v = src + src_y_size + src_u_size;

    int dst_y_size = dstWidth * dstHeight;
    int dst_u_size = dst_stride_u * (dstHeight >> 1);
    dst_y = dst;
    dst_u = dst + dst_y_size;
    dst_v = dst + dst_y_size + dst_u_size;

    libyuv::I420Scale(src_y, src_stride_y,
                      src_u, src_stride_u,
                      src_v, src_stride_v,
                      srcWidth, srcHeight,
                      dst_y, dst_stride_y,
                      dst_u, dst_stride_u,
                      dst_v, dst_stride_v,
                      dstWidth, dstHeight,
                      libyuv::FilterMode::kFilterNone);
    env->ReleaseByteArrayElements(src_, data, 0);

    env->SetByteArrayRegion(dst_, 0, size, reinterpret_cast<const jbyte *>(dst));
}

X264 video encoding

Encoding uses the x264 API. The following code corresponds to the videoChannel.encode:(uint8_t *)->void part of the sequence diagram:

//C++
//VideoChannel.cpp
void VideoChannel::encode(uint8_t *data) {
    // Output data to be encoded
    x264_picture_t pic_in;
    x264_picture_alloc(&pic_in, X264_CSP_I420, width, height);

    pic_in.img.plane[0] = data;
    pic_in.img.plane[1] = data + ySize;
    pic_in.img.plane[2] = data + ySize + uSize;
    //TODO:Encoded i_pts, each time needed to grow
    pic_in.i_pts = i_pts++;

    x264_picture_t pic_out;
    x264_nal_t *pp_nal;
    int pi_nal;
    //pi_nal: indicates the number of nal outputs
    int error = x264_encoder_encode(codec, &pp_nal, &pi_nal, &pic_in, &pic_out);
    if (error <= 0) {
        return;
    }
    int spslen, ppslen;
    uint8_t *sps;
    uint8_t *pps;
    for (int i = 0; i < pi_nal; ++i) {
        int type = pp_nal[i].i_type;
        // Data
        uint8_t *p_payload = pp_nal[i].p_payload;
        // Data length
        int i_payload = pp_nal[i].i_payload;
        if (type == NAL_SPS) {
            // SPS and PPS are sent first so the decoder can configure itself
            spslen = i_payload - 4; // strip the 00 00 00 01 start code
            sps = (uint8_t *) alloca(spslen); // allocated on the stack, no need to free
            memcpy(sps, p_payload + 4, spslen);
        } else if (type == NAL_PPS) {
            ppslen = i_payload - 4; // strip the 00 00 00 01 start code
            pps = (uint8_t *) alloca(ppslen);
            memcpy(pps, p_payload + 4, ppslen);

            // SPS and PPS must be sent before sending an I-frame
            sendVideoConfig(sps, pps, spslen, ppslen);
        } else {
            sendFrame(type, p_payload, i_payload);
        }
    }
}

Packaging video configuration information with x264

According to x264 and the video format standard, the header of a video configuration frame differs from that of a video image data frame, so two separate functions package the data. The following code corresponds to the [data[i] is config] sendVideoConfig:(uint8_t *,uint8_t *,int,int)->void part of the sequence diagram:

//C++
//VideoChannel.cpp
void VideoChannel::sendVideoConfig(uint8_t *sps, uint8_t *pps, int spslen, int ppslen) {
    int bodySize = 13 + spslen + 3 + ppslen;
    RTMPPacket *packet = new RTMPPacket;
    RTMPPacket_Alloc(packet, bodySize);

    int i = 0;
    // Fixed header
    packet->m_body[i++] = 0x17;
    // Type
    packet->m_body[i++] = 0x00;
    //composition time 0x000000
    packet->m_body[i++] = 0x00;
    packet->m_body[i++] = 0x00;
    packet->m_body[i++] = 0x00;

    // Version
    packet->m_body[i++] = 0x01;
    // Code specification
    packet->m_body[i++] = sps[1];
    packet->m_body[i++] = sps[2];
    packet->m_body[i++] = sps[3];
    packet->m_body[i++] = 0xFF;

    // The whole SPS
    packet->m_body[i++] = 0xE1;
    // SPS length
    packet->m_body[i++] = (spslen >> 8) & 0xff;
    packet->m_body[i++] = spslen & 0xff;
    memcpy(&packet->m_body[i], sps, spslen);
    i += spslen;

    //pps
    packet->m_body[i++] = 0x01;
    packet->m_body[i++] = (ppslen >> 8) & 0xff;
    packet->m_body[i++] = (ppslen) & 0xff;
    memcpy(&packet->m_body[i], pps, ppslen);

    packet->m_packetType = RTMP_PACKET_TYPE_VIDEO;
    packet->m_nBodySize = bodySize;
    packet->m_headerType = RTMP_PACKET_SIZE_MEDIUM;
    // SPS and PPS (unlike image frames) carry no timestamp
    packet->m_nTimeStamp = 0;
    // Use relative time
    packet->m_hasAbsTimestamp = 0;
    // Use any channel, just avoid the ones used internally by rtmp.c
    packet->m_nChannel = 0x10;
    callback(packet);
}

Packaging video frame data with x264

The following code corresponds to the [data[i] is frame] sendFrame:(int,uint8_t *,int)->void part of the sequence diagram:

//C++
//VideoChannel.cpp
void VideoChannel::sendFrame(int type, uint8_t *p_payload, int i_payload) {
    // Remove 00 00 00 01/00 00 01
    if (p_payload[2] == 0x00) {
        i_payload -= 4;
        p_payload += 4;
    } else if (p_payload[2] == 0x01) {
        i_payload -= 3;
        p_payload += 3;
    }
    RTMPPacket *packet = new RTMPPacket;
    int bodysize = 9 + i_payload;
    RTMPPacket_Alloc(packet, bodysize);
    RTMPPacket_Reset(packet);
    //int type = payload[0] & 0x1f;
    packet->m_body[0] = 0x27;
    // Keyframe
    if (type == NAL_SLICE_IDR) {
        packet->m_body[0] = 0x17;
    }
    // Type
    packet->m_body[1] = 0x01;
    // Timestamp
    packet->m_body[2] = 0x00;
    packet->m_body[3] = 0x00;
    packet->m_body[4] = 0x00;
    // The length of an int is 4 bytes, which is equivalent to converting an int into a 4 byte array
    packet->m_body[5] = (i_payload >> 24) & 0xff;
    packet->m_body[6] = (i_payload >> 16) & 0xff;
    packet->m_body[7] = (i_payload >> 8) & 0xff;
    packet->m_body[8] = (i_payload) & 0xff;

    // Image data
    memcpy(&packet->m_body[9], p_payload, i_payload);

    packet->m_hasAbsTimestamp = 0;
    packet->m_nBodySize = bodysize;
    packet->m_packetType = RTMP_PACKET_TYPE_VIDEO;
    packet->m_nChannel = 0x10;
    packet->m_headerType = RTMP_PACKET_SIZE_LARGE;
    callback(packet);
}

3.3 Audio Coding

3.3.1 Sequence Diagram

3.3.2 Process Description

  • The sequence diagram above starts where JavaCallHelper calls rtmpClient.onPrepare(). In audioChannel.start(), the audio encoding task is submitted via post(new Runnable(){...}) to the handler bound to the "audio-recode" thread's looper.
  • In the audio encoding task we first create an AudioRecord object (audioRecord), held by audioChannel, then start recording with it and read samples in a loop until recording stops.

3.3.3 Key codes

Submit the audio encoding task

rtmpClient.onPrepare() is called, and inside it the isConnected flag is set to true; audioChannel.start() is then called to submit the audio encoding task to the "audio-recode" thread. The following code corresponds to the audioChannel.start:()->void and handler.post:(Runnable)->boolean parts of the sequence diagram:

//Java
//AudioChannel.java
    public void start() {
        handler.post(new Runnable() {
            @Override
            public void run() {
                audioRecord = new AudioRecord(
                        MediaRecorder.AudioSource.MIC,
                        sampleRate,
                        channelConfig,
                        AudioFormat.ENCODING_PCM_16BIT,
                        minBufferSize
                );
                audioRecord.startRecording();
                while (audioRecord.getRecordingState() == AudioRecord.RECORDSTATE_RECORDING) {
                    int len = audioRecord.read(buffer, 0, buffer.length);
                    if (len > 0) {
                        // Number of samples = number of bytes / 2 (16-bit samples)
                        rtmpClient.sendAudio(buffer, len >> 1);
                    }
                }
            }
        });
    }

Audio coding

The following code corresponds to the audioChannel.encode:(int32_t *,int)->void part of the sequence diagram:

//C++
//AudioChannel.cpp
void AudioChannel::encode(int32_t *data, int len) {
    //len: number of input samples
    //outputBuffer: output, encoded results
    //maxOutputBytes: The number of bytes that the encoding result cache can receive
    int bytelen = faacEncEncode(codec, data, len, outputBuffer, maxOutputBytes);
    if (bytelen > 0) {

        RTMPPacket *packet = new RTMPPacket;
        RTMPPacket_Alloc(packet, bytelen + 2);
        packet->m_body[0] = 0xAF;
        packet->m_body[1] = 0x01;

        memcpy(&packet->m_body[2], outputBuffer, bytelen);

        packet->m_hasAbsTimestamp = 0;
        packet->m_nBodySize = bytelen + 2;
        packet->m_packetType = RTMP_PACKET_TYPE_AUDIO;
        packet->m_nChannel = 0x11;
        packet->m_headerType = RTMP_PACKET_SIZE_LARGE;
        callback(packet);
    }
}

3.4 Audio and video streaming

It should be noted that every time an audio or video frame has been packaged according to the protocol, callback(RTMPPacket *packet) is called to send the data out via the RTMPDump library. Audio and video share the same callback(RTMPPacket *packet); only the data passed in differs.

  • The callback() type is defined as follows:
//C++
//Callback.h

#ifndef PUSHER_CALLBACK_H
#define PUSHER_CALLBACK_H


#include <rtmp.h>

typedef void (*Callback)(RTMPPacket *);

#endif //PUSHER_CALLBACK_H
  • Its concrete implementation is:
//C++
//native-lib.cpp
void callback(RTMPPacket *packet) {
    if (rtmp) {
        packet->m_nInfoField2 = rtmp->m_stream_id;
        // Use relative time
        packet->m_nTimeStamp = RTMP_GetTime() - startTime;
        // Put it in the queue
        RTMP_SendPacket(rtmp, packet, 1);
    }
    RTMPPacket_Free(packet);
    delete (packet);
}

4 Camera Switching

4.1 Sequence Diagram

4.2 Process Description

When the user taps the switch-camera button, the method bound in MainActivity is called first. MainActivity interacts with its RtmpClient object, rtmpClient, which handles audio and video separately; finally, rtmpClient.toggleCamera() calls videoChannel.toggleCamera(), which actually switches the camera.

4.3 Key Codes

The videoChannel.toggleCamera() method that actually switches the camera is implemented as follows:

//Java
//VideoChannel.java
    public void toggleCamera() {
        CameraX.unbindAll();
        if (currentFacing == CameraX.LensFacing.BACK) {
            currentFacing = CameraX.LensFacing.FRONT;
        } else {
            currentFacing = CameraX.LensFacing.BACK;
        }
        CameraX.bindToLifecycle(lifecycleOwner, getPreView(), getAnalysis());
    }

5 Stop Live streaming

5.1 Sequence Diagram

5.2 Process Description

The process above starts when the user taps the stop-live button. First, the stopLive(View view) method bound to the button in MainActivity is called, and it calls rtmpClient.stop() to stop the audio and video sides of the live stream respectively (a minimal sketch of stop() follows the list below).

  • Video: note that stopping the live stream does not mean quitting the APP, so the video preview must be kept. The video-side work in the sequence diagram is therefore mainly resetting the video encoder (i_pts = 0) and setting the rtmpClient.isConnected flag, which is checked in CameraX's analyze() callback, back to false.
  • Audio: the key step is calling audioRecord.stop() to stop recording.
  • RTMP: in the JNI_disConnect:(JNIEnv *,jobject)->void function, release what the rtmp pointer points to.
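
A minimal sketch of the Java-side stop path (the audioChannel.stop() name is an assumption; disConnect() is the native method shown below):

//Java
// Sketch of RtmpClient.stop() (method names partly assumed).
public void stop() {
    // analyze() sees the flag and falls back to preview-only mode
    isConnected = false;
    // stops recording via audioRecord.stop(), ending the encode loop
    audioChannel.stop();
    // native: close and free the RTMP connection, reset the encoder's i_pts
    disConnect();
}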

5.3 Key Codes

Disconnecting in the JNI layer

JNI_disConnect:(JNIEnv *,jobject)->void

//C++
//native-lib.cpp
extern "C"
JNIEXPORT void JNICALL
Java_com_tongbo_mycameralive_RtmpClient_disConnect(JNIEnv *env, jobject thiz) {
    pthread_mutex_lock(&mutex);
    if (rtmp) {
        RTMP_Close(rtmp);
        RTMP_Free(rtmp);
        rtmp = 0;
    }
    if (videoChannel) {
        videoChannel->resetPts();
    }
    pthread_mutex_unlock(&mutex);
}

6 Exiting an Application

6.1 Sequence Diagram

6.2 Process Description

When Android runs the Activity lifecycle and calls onDestroy(), our overridden onDestroy() calls rtmpClient.release(), in which we stop everything and free the memory resources. The core release steps are marked in red on the sequence diagram. With that, all functions of this basic live-streaming APP have been implemented.
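
The Activity-side hook is just the standard lifecycle override (a minimal sketch; the body of release() itself is not shown in this article):

//Java
// Sketch: release everything when the Activity is destroyed.
@Override
protected void onDestroy() {
    super.onDestroy();
    rtmpClient.release();
}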

You are welcome to point out any mistakes.