AVPacket stores compressed (encoded) data. Packets are typically produced by a demuxer and passed to a decoder as input, or produced by an encoder as output and passed to a muxer.

For video, a packet should typically contain one compressed frame. For audio, it may contain several compressed frames. Encoders are allowed to output empty packets that carry no compressed data and only side data (for example, to update some stream parameters at the end of encoding).

The semantics of data ownership depend on the buf field. If it is set, the packet data is dynamically allocated and remains valid until a call to av_packet_unref() reduces the reference count to 0.

If the buf field is not set, av_packet_ref() makes a copy of the data instead of increasing the reference count.

Side data is always allocated by av_malloc(), copied by av_packet_ref(), and released by av_packet_unref().

Treating sizeof(AVPacket) as part of the public ABI is deprecated. Once the av_init_packet() function is removed, new packets can only be allocated with av_packet_alloc(), and new fields may be added to the end of the structure.

This article looks at AVPacket first, then at AVBufferRef and AVPacketSideData, which AVPacket refers to, and at AVBuffer, which AVBufferRef refers to. AVPacketSideData in turn leads to the AVPacketSideDataType enumeration.

First, AVPacket

libavcodec/packet.h

typedef struct AVPacket {
    /**
     * A reference to the reference-counted buffer where the packet data is
     * stored.
     * May be NULL, then the packet data is not reference-counted.
     */
    AVBufferRef *buf;
    /**
     * Presentation timestamp in AVStream->time_base units; the time at which
     * the decompressed packet will be presented to the user.
     * Can be AV_NOPTS_VALUE if it is not stored in the file.
     * pts MUST be larger or equal to dts as presentation cannot happen before
     * decompression, unless one wants to view hex dumps. Some formats misuse
     * the terms dts and pts/cts to mean something different. Such timestamps
     * must be converted to true pts/dts before they are stored in AVPacket.
     */
    int64_t pts;
    /**
     * Decompression timestamp in AVStream->time_base units; the time at which
     * the packet is decompressed.
     * Can be AV_NOPTS_VALUE if it is not stored in the file.
     */
    int64_t dts;
    uint8_t *data;
    int   size;
    int   stream_index;
    /**
     * A combination of AV_PKT_FLAG values
     */
    int   flags;
    /**
     * Additional packet data that can be provided by the container.
     * Packet can contain several types of side information.
     */
    AVPacketSideData *side_data;
    int side_data_elems;

    /**
     * Duration of this packet in AVStream->time_base units, 0 if unknown.
     * Equals next_pts - this_pts in presentation order.
     */
    int64_t duration;

    int64_t pos;                            ///< byte position in stream, -1 if unknown

#if FF_API_CONVERGENCE_DURATION
    /**
     * @deprecated Same as the duration field, but as int64_t. This was required
     * for Matroska subtitles, whose duration values could overflow when the
     * duration field was still an int.
     */
    attribute_deprecated
    int64_t convergence_duration;
#endif
} AVPacket;

Here’s what each field means.

| Field | Meaning |
|---|---|
| AVBufferRef *buf | A reference to the reference-counted buffer where the packet data is stored. May be NULL, in which case the packet data is not reference-counted. |
| int64_t pts | Presentation timestamp in AVStream->time_base units; the time at which the decompressed packet will be presented to the user. |
| int64_t dts | Decompression timestamp in AVStream->time_base units; the time at which the packet is decompressed. |
| uint8_t *data | Pointer to the packet's actual data. |
| int size | Size of the packet data in bytes. |
| int stream_index | Index of the stream this packet belongs to. |
| int flags | A combination of AV_PKT_FLAG values. |
| AVPacketSideData *side_data | Additional packet data that can be provided by the container. |
| int side_data_elems | Number of elements in side_data. |
| int64_t duration | Duration of this packet in AVStream->time_base units, or 0 if unknown. |
| int64_t pos | Byte position in the stream, or -1 if unknown. |
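
To make the "AVStream->time_base units" wording concrete, here is a minimal sketch of converting a pts or dts value to seconds. Rational, NOPTS_VALUE and ts_to_seconds are hypothetical stand-ins introduced for illustration (they mirror AVRational and AV_NOPTS_VALUE) so that the snippet compiles without the FFmpeg headers; it is not FFmpeg's own code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for FFmpeg's AVRational (numerator/denominator). */
typedef struct Rational {
    int num;
    int den;
} Rational;

/* Assumed to mirror AV_NOPTS_VALUE, the "timestamp unknown" sentinel. */
#define NOPTS_VALUE INT64_MIN

/*
 * Convert a timestamp expressed in time_base units to seconds.
 * Returns -1.0 when the timestamp is unknown.
 */
double ts_to_seconds(int64_t ts, Rational time_base)
{
    assert(time_base.den != 0);   /* a valid time base has a non-zero denominator */
    if (ts == NOPTS_VALUE)
        return -1.0;
    return (double)ts * time_base.num / time_base.den;
}
```

With a time base of 1/90000, which is common for MPEG-TS, ts_to_seconds(90000, tb) yields 1.0. In code that links against FFmpeg, the same arithmetic would normally be written as pkt->pts * av_q2d(stream->time_base).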

The flags field is a bitwise combination of the following AV_PKT_FLAG values.

libavcodec/packet.h

#define AV_PKT_FLAG_KEY        0x0001 // The packet contains a keyframe.
#define AV_PKT_FLAG_CORRUPT    0x0002 // The packet content is corrupted.
#define AV_PKT_FLAG_DISCARD    0x0004 // Flags packets that are required to maintain valid decoder state but are not required for output; they should be dropped after decoding.
#define AV_PKT_FLAG_TRUSTED    0x0008 // The packet comes from a trusted source.
#define AV_PKT_FLAG_DISPOSABLE 0x0010 // The packet contains frames that can be discarded by the decoder, i.e. non-reference frames.
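
Because flags is a bit mask, individual flags are tested, set, and cleared with bitwise operators. The sketch below redefines the constants with the values from the listing above so that it compiles on its own; the pkt_* helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the listing above, so no FFmpeg header is needed here. */
#define AV_PKT_FLAG_KEY        0x0001
#define AV_PKT_FLAG_CORRUPT    0x0002
#define AV_PKT_FLAG_DISCARD    0x0004
#define AV_PKT_FLAG_TRUSTED    0x0008
#define AV_PKT_FLAG_DISPOSABLE 0x0010

/* True if the packet is marked as containing a keyframe. */
bool pkt_is_keyframe(int flags)
{
    return (flags & AV_PKT_FLAG_KEY) != 0;
}

/* True if the packet is marked as corrupted. */
bool pkt_is_corrupt(int flags)
{
    return (flags & AV_PKT_FLAG_CORRUPT) != 0;
}

/* Return flags with the given flag bit(s) added. */
int pkt_set_flag(int flags, int flag)
{
    assert(flag != 0);   /* setting "no flag" is almost certainly a caller bug */
    return flags | flag;
}

/* Return flags with the given flag bit(s) removed. */
int pkt_clear_flag(int flags, int flag)
{
    return flags & ~flag;
}
```

With a real AVPacket, the same checks read (pkt->flags & AV_PKT_FLAG_KEY) and so on; a player might, for example, resume decoding after a seek only at a packet for which the keyframe test is true.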

Second, AVBufferRef

A reference to a data buffer. The size of this structure is not part of the public ABI, and it is not meant to be allocated directly.

libavutil/buffer.h

typedef struct AVBufferRef {
    AVBuffer *buffer;

    /**
     * The data buffer. It is considered writable if and only if
     * this is the only reference to the buffer, in which case
     * av_buffer_is_writable() returns 1.
     */
    uint8_t *data;
    /**
     * Size of data in bytes.
     */
#if FF_API_BUFFER_SIZE_T
    int      size;
#else
    size_t   size;
#endif
} AVBufferRef;

| Field | Meaning |
|---|---|
| AVBuffer *buffer | The reference-counted buffer. The type is opaque and is meant to be used through references (AVBufferRef). |
| uint8_t *data | The data buffer. It is considered writable if and only if this is the only reference to the buffer, in which case av_buffer_is_writable() returns 1. |
| size_t / int size | Size of data in bytes. |

Third, AVBuffer

A reference-counted buffer type. It is defined in libavutil/buffer_internal.h. It is opaque and is meant to be used through references (AVBufferRef).

libavutil/buffer_internal.h

struct AVBuffer {
    uint8_t *data; /**< data described by this buffer */
    buffer_size_t size; /**< size of data in bytes */

    /**
     * number of existing AVBufferRef instances referring to this buffer
     */
    atomic_uint refcount;

    /**
     * a callback for freeing the data
     */
    void (*free)(void *opaque, uint8_t *data);

    /**
     * an opaque pointer, to be used by the freeing callback
     */
    void *opaque;

    /**
     * A combination of AV_BUFFER_FLAG_*
     */
    int flags;

    /**
     * A combination of BUFFER_FLAG_*
     */
    int flags_internal;
};

| Field | Meaning |
|---|---|
| uint8_t *data | The data described by this buffer. |
| buffer_size_t size | Size of data in bytes. |
| atomic_uint refcount | Number of existing AVBufferRef instances referring to this buffer. |
| void (*free)(void *opaque, uint8_t *data) | A callback for freeing the data. |
| void *opaque | An opaque pointer, to be used by the freeing callback. |
| int flags | A combination of AV_BUFFER_FLAG_*. |
| int flags_internal | A combination of BUFFER_FLAG_*. |

AVBuffer is an API for reference-counted data buffers.

There are two core objects in this API: AVBuffer and AVBufferRef. AVBuffer represents the data buffer itself; it is opaque and is not meant to be accessed by the caller directly, only through AVBufferRef. However, the caller may compare two AVBuffer pointers to check whether two different references describe the same data buffer. AVBufferRef represents a single reference to an AVBuffer, and it is the object that the caller manipulates directly.

There are two functions that create a new AVBuffer with a single reference to it: av_buffer_alloc() allocates a new buffer, and av_buffer_create() wraps an existing array in an AVBuffer. From an existing reference, additional references can be created with av_buffer_ref(). Use av_buffer_unref() to free a reference (once all references are freed, the data is automatically freed).

The convention throughout this API and the rest of FFmpeg is that a buffer is considered writable if only one reference to it exists (and it has not been marked as read-only). The av_buffer_is_writable() function checks whether this is the case, and av_buffer_make_writable() automatically creates a new writable buffer when necessary.

Of course, there is nothing to prevent the calling code from violating this convention, but this is only safe if all existing references are under its control.

Referencing and unreferencing buffers is thread-safe, so it can be done from multiple threads simultaneously without any additional locking.

Two different references to the same buffer can point to different parts of the buffer (that is, their AVBufferRef.data pointers will not be equal).
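
The relationship between the two objects, and the "writable only while there is a single reference" convention, can be illustrated with a toy model. Everything below (ToyBuffer, ToyBufferRef and the toy_* functions) is hypothetical and written for this article; it uses a plain counter instead of the atomic_uint that the real AVBuffer uses, so unlike FFmpeg's implementation it is not thread-safe.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy counterpart of AVBuffer: owns the bytes and counts the references. */
typedef struct ToyBuffer {
    uint8_t *data;
    size_t   size;
    unsigned refcount;   /* plain counter: single-threaded sketch only */
} ToyBuffer;

/* Toy counterpart of AVBufferRef: one reference with its own data/size view. */
typedef struct ToyBufferRef {
    ToyBuffer *buffer;
    uint8_t   *data;
    size_t     size;
} ToyBufferRef;

/* Allocate a zeroed buffer and return the first reference to it. */
ToyBufferRef *toy_alloc(size_t size)
{
    ToyBuffer    *buf  = malloc(sizeof(*buf));
    ToyBufferRef *ref  = malloc(sizeof(*ref));
    uint8_t      *data = calloc(size ? size : 1, 1);

    if (!buf || !ref || !data) {
        free(buf);
        free(ref);
        free(data);
        return NULL;
    }
    buf->data = data;
    buf->size = size;
    buf->refcount = 1;
    ref->buffer = buf;
    ref->data = data;
    ref->size = size;
    return ref;
}

/* Create another reference to the same underlying buffer. */
ToyBufferRef *toy_ref(const ToyBufferRef *src)
{
    ToyBufferRef *ref = malloc(sizeof(*ref));
    if (!ref)
        return NULL;
    *ref = *src;
    ref->buffer->refcount++;
    return ref;
}

/* Drop one reference; the data is freed when the last reference goes away. */
void toy_unref(ToyBufferRef **pref)
{
    if (!pref || !*pref)
        return;
    ToyBuffer *buf = (*pref)->buffer;
    assert(buf->refcount > 0);
    if (--buf->refcount == 0) {
        free(buf->data);
        free(buf);
    }
    free(*pref);
    *pref = NULL;
}

/* A buffer is treated as writable only while exactly one reference exists. */
int toy_is_writable(const ToyBufferRef *ref)
{
    return ref->buffer->refcount == 1;
}
```

After toy_alloc() the single reference is writable; as soon as toy_ref() creates a second one, neither is, and both references report the same buffer pointer. Dropping the second reference with toy_unref() makes the first writable again.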

Fourth, AVPacketSideData

Additional packet data that can be provided by the container. A packet can contain several types of side information.

libavcodec/packet.h

typedef struct AVPacketSideData {
    uint8_t *data;
#if FF_API_BUFFER_SIZE_T
    int      size;
#else
    size_t   size;
#endif
    enum AVPacketSideDataType type;
} AVPacketSideData;

| Field | Meaning |
|---|---|
| uint8_t *data | The side data buffer. |
| int / size_t size | Size of the side data buffer in bytes. |
| enum AVPacketSideDataType type | The type of the side data. |
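
A packet carries its side data as an array of side_data_elems entries, each tagged with a type, so finding one kind of side data amounts to a linear search by type. The sketch below uses hypothetical toy types (ToySideData, ToySideDataType, toy_find_side_data) so that it compiles without FFmpeg; in code that links against libavcodec, av_packet_get_side_data() performs this lookup on a real AVPacket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy subset of the side data types, for illustration only. */
enum ToySideDataType {
    TOY_DATA_PALETTE,
    TOY_DATA_NEW_EXTRADATA,
    TOY_DATA_PARAM_CHANGE
};

/* Toy counterpart of AVPacketSideData: a typed byte buffer. */
typedef struct ToySideData {
    uint8_t *data;
    size_t   size;
    enum ToySideDataType type;
} ToySideData;

/*
 * Return the first entry of the requested type, or NULL if the array
 * contains no such entry.
 */
const ToySideData *toy_find_side_data(const ToySideData *side_data,
                                      int nb_elems,
                                      enum ToySideDataType type)
{
    assert(nb_elems >= 0);
    for (int i = 0; i < nb_elems; i++) {
        if (side_data[i].type == type)
            return &side_data[i];
    }
    return NULL;
}
```

The caller then interprets the returned bytes according to the type, which is what the enumeration described next documents.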

The AVPacketSideDataType enumeration defines the available side data types.

libavcodec/packet.h

/**
 * @defgroup lavc_packet AVPacket
 *
 * Types and functions for working with AVPacket.
 * @{
 */
enum AVPacketSideDataType {
    /** An AV_PKT_DATA_PALETTE side data packet contains exactly AVPALETTE_SIZE bytes worth of palette. This side data signals that a new palette is present. */
    AV_PKT_DATA_PALETTE,

    /** The AV_PKT_DATA_NEW_EXTRADATA is used to notify the codec or the format that the extradata buffer was changed and the receiving side should act upon it appropriately. The new extradata is embedded in the side data buffer and should be immediately used for processing the current frame or packet. */
    AV_PKT_DATA_NEW_EXTRADATA,

    /**
     * An AV_PKT_DATA_PARAM_CHANGE side data packet is laid out as follows:
     * @code
     * u32le param_flags
     * if (param_flags & AV_SIDE_DATA_PARAM_CHANGE_CHANNEL_COUNT)
     *     s32le channel_count
     * if (param_flags & AV_SIDE_DATA_PARAM_CHANGE_CHANNEL_LAYOUT)
     *     u64le channel_layout
     * if (param_flags & AV_SIDE_DATA_PARAM_CHANGE_SAMPLE_RATE)
     *     s32le sample_rate
     * if (param_flags & AV_SIDE_DATA_PARAM_CHANGE_DIMENSIONS)
     *     s32le width
     *     s32le height
     * @endcode
     */
    AV_PKT_DATA_PARAM_CHANGE,

    /**
     * An AV_PKT_DATA_H263_MB_INFO side data packet contains a number of
     * structures with info about macroblocks relevant to splitting the
     * packet into smaller packets on macroblock edges (e.g. as for RFC 2190).
     * That is, it does not necessarily contain info about all macroblocks,
     * as long as the distance between macroblocks in the info is smaller
     * than the target payload size.
     * Each MB info structure is 12 bytes, and is laid out as follows:
     * @code
     * u32le bit offset from the start of the packet
     * u8    current quantizer at the start of the macroblock
     * u8    GOB number
     * u16le macroblock address within the GOB
     * u8    horizontal MV predictor
     * u8    vertical MV predictor
     * u8    horizontal MV predictor for block number 3
     * u8    vertical MV predictor for block number 3
     * @endcode
     */
    AV_PKT_DATA_H263_MB_INFO,

    /** This side data should be associated with an audio stream and contains ReplayGain information in form of the AVReplayGain struct. */
    AV_PKT_DATA_REPLAYGAIN,

    /** This side data contains a 3x3 transformation matrix describing an affine transformation that needs to be applied to the decoded video frames for correct presentation. See libavutil/display.h for a detailed description of the data. */
    AV_PKT_DATA_DISPLAYMATRIX,

    /** This side data should be associated with a video stream and contains Stereoscopic 3D information in form of the AVStereo3D struct. */
    AV_PKT_DATA_STEREO3D,

    /** This side data should be associated with an audio stream and corresponds to enum AVAudioServiceType. */
    AV_PKT_DATA_AUDIO_SERVICE_TYPE,

    /**
     * This side data contains quality related information from the encoder.
     * @code
     * u32le quality factor of the compressed frame. Allowed range is between 1 (good) and FF_LAMBDA_MAX (bad).
     * u8    picture type
     * u8    error count
     * u16   reserved
     * u64le[error count] sum of squared differences between encoder in and output
     * @endcode
     */
    AV_PKT_DATA_QUALITY_STATS,

    /** This side data contains an integer value representing the stream index of a "fallback" track. A fallback track indicates an alternate track to use when the current track can not be decoded for some reason. e.g. no decoder available for codec. */
    AV_PKT_DATA_FALLBACK_TRACK,

    /** This side data corresponds to the AVCPBProperties struct. */
    AV_PKT_DATA_CPB_PROPERTIES,

    /**
     * Recommends skipping the specified number of samples
     * @code
     * u32le number of samples to skip from start of this packet
     * u32le number of samples to skip from end of this packet
     * u8    reason for start skip
     * u8    reason for end   skip (0=padding silence, 1=convergence)
     * @endcode
     */
    AV_PKT_DATA_SKIP_SAMPLES,

    /**
     * An AV_PKT_DATA_JP_DUALMONO side data packet indicates that
     * the packet may contain "dual mono" audio specific to Japanese DTV
     * and if it is true, recommends only the selected channel to be used.
     * @code
     * u8    selected channels (0=main/left, 1=sub/right, 2=both)
     * @endcode
     */
    AV_PKT_DATA_JP_DUALMONO,

    /** A list of zero terminated key/value strings. There is no end marker for the list, so it is required to rely on the side data size to stop. */
    AV_PKT_DATA_STRINGS_METADATA,

    /**
     * Subtitle event position
     * @code
     * u32le x1
     * u32le y1
     * u32le x2
     * u32le y2
     * @endcode
     */
    AV_PKT_DATA_SUBTITLE_POSITION,

    /** Data found in BlockAdditional element of matroska container. There is no end marker for the data, so it is required to rely on the side data size to recognize the end. 8 byte id (as found in BlockAddId) followed by data. */
    AV_PKT_DATA_MATROSKA_BLOCKADDITIONAL,

    /** The optional first identifier line of a WebVTT cue. */
    AV_PKT_DATA_WEBVTT_IDENTIFIER,

    /** The optional settings (rendering instructions) that immediately follow the timestamp specifier of a WebVTT cue. */
    AV_PKT_DATA_WEBVTT_SETTINGS,

    /** A list of zero terminated key/value strings. There is no end marker for the list, so it is required to rely on the side data size to stop. This side data includes updated metadata which appeared in the stream. */
    AV_PKT_DATA_METADATA_UPDATE,

    /** MPEGTS stream ID as uint8_t, this is required to pass the stream ID information from the demuxer to the corresponding muxer. */
    AV_PKT_DATA_MPEGTS_STREAM_ID,

    /** Mastering display metadata (based on SMPTE-2086:2014). This metadata should be associated with a video stream and contains data in the form of the AVMasteringDisplayMetadata struct. */
    AV_PKT_DATA_MASTERING_DISPLAY_METADATA,

    /** This side data should be associated with a video stream and corresponds to the AVSphericalMapping structure. */
    AV_PKT_DATA_SPHERICAL,

    /** Content light level (based on CTA-861.3). This metadata should be associated with a video stream and contains data in the form of the AVContentLightMetadata struct. */
    AV_PKT_DATA_CONTENT_LIGHT_LEVEL,

    /** ATSC A53 Part 4 Closed Captions. This metadata should be associated with a video stream. A53 CC bitstream is stored as uint8_t in AVPacketSideData.data. The number of bytes of CC data is AVPacketSideData.size. */
    AV_PKT_DATA_A53_CC,

    /** This side data is encryption initialization data. The format is not part of ABI, use av_encryption_init_info_* methods to access. */
    AV_PKT_DATA_ENCRYPTION_INIT_INFO,

    /** This side data contains encryption info for how to decrypt the packet. The format is not part of ABI, use av_encryption_info_* methods to access. */
    AV_PKT_DATA_ENCRYPTION_INFO,

    /** Active Format Description data consisting of a single byte as specified in ETSI TS 101 154 using AVActiveFormatDescription enum. */
    AV_PKT_DATA_AFD,

    /** Producer Reference Time data corresponding to the AVProducerReferenceTime struct, usually exported by some encoders (on demand through the prft flag set in the AVCodecContext export_side_data field). */
    AV_PKT_DATA_PRFT,

    /** ICC profile data consisting of an opaque octet buffer following the format described by ISO 15076-1. */
    AV_PKT_DATA_ICC_PROFILE,

    /**
     * DOVI configuration
     * ref:
     * dolby-vision-bitstreams-within-the-iso-base-media-file-format-v2.1.2, section 2.2
     * dolby-vision-bitstreams-in-mpeg-2-transport-stream-multiplex-v1.2, section 3.3
     * Tags are stored in struct AVDOVIDecoderConfigurationRecord.
     */
    AV_PKT_DATA_DOVI_CONF,

    /** Timecode which conforms to SMPTE ST 12-1:2014. The data is an array of 4 uint32_t where the first uint32_t describes how many (1-3) of the other timecodes are used. The timecode format is described in the documentation of av_timecode_get_smpte_from_framenum() function in libavutil/timecode.h. */
    AV_PKT_DATA_S12M_TIMECODE,

    /**
     * The number of side data types.
     * This is not part of the public API/ABI in the sense that it may
     * change when new side data types are added.
     * This must stay the last enum value.
     * If its value becomes huge, some code using it
     * needs to be updated as it assumes it to be smaller than other limits.
     */
    AV_PKT_DATA_NB
};

| Type | Meaning |
|---|---|
| AV_PKT_DATA_PALETTE | A palette; the data is exactly AVPALETTE_SIZE bytes. Signals that a new palette is present. |
| AV_PKT_DATA_NEW_EXTRADATA | Notifies the codec or the format that the extradata buffer has changed and that the receiving side should act on it. The new extradata is embedded in the side data buffer and should be used immediately for processing the current frame or packet. |
| AV_PKT_DATA_PARAM_CHANGE | A parameter change. The layout depends on which AVSideDataParamChangeFlags are set. |
| AV_PKT_DATA_H263_MB_INFO | A number of structures with macroblock information relevant to splitting the packet into smaller packets on macroblock edges. |
| AV_PKT_DATA_REPLAYGAIN | Associated with an audio stream; contains ReplayGain information in the form of an AVReplayGain struct. |
| AV_PKT_DATA_DISPLAYMATRIX | A 3x3 transformation matrix describing an affine transformation that must be applied to the decoded video frames for correct presentation. |
| AV_PKT_DATA_STEREO3D | Associated with a video stream; contains stereoscopic 3D information in the form of an AVStereo3D struct. |
| AV_PKT_DATA_AUDIO_SERVICE_TYPE | Associated with an audio stream; corresponds to enum AVAudioServiceType. |
| AV_PKT_DATA_QUALITY_STATS | Quality-related information from the encoder. |
| AV_PKT_DATA_FALLBACK_TRACK | An integer value representing the stream index of a "fallback" track. |
| AV_PKT_DATA_CPB_PROPERTIES | Corresponds to the AVCPBProperties struct. |
| AV_PKT_DATA_SKIP_SAMPLES | Recommends skipping the specified number of samples. |
| AV_PKT_DATA_JP_DUALMONO | Indicates that the packet may contain "dual mono" audio specific to Japanese DTV and, if so, recommends that only the selected channel be used. |
| AV_PKT_DATA_STRINGS_METADATA | A list of zero-terminated key/value strings. |
| AV_PKT_DATA_SUBTITLE_POSITION | The position of a subtitle event. |
| AV_PKT_DATA_MATROSKA_BLOCKADDITIONAL | Data found in the BlockAdditional element of a Matroska container. |
| AV_PKT_DATA_WEBVTT_IDENTIFIER | The optional first identifier line of a WebVTT cue. |
| AV_PKT_DATA_WEBVTT_SETTINGS | The optional settings (rendering instructions) that immediately follow the timestamp specifier of a WebVTT cue. |
| AV_PKT_DATA_METADATA_UPDATE | A list of zero-terminated key/value strings containing updated metadata that appeared in the stream. |
| AV_PKT_DATA_MPEGTS_STREAM_ID | The MPEGTS stream ID as a uint8_t; needed to pass the stream ID information from the demuxer to the corresponding muxer. |
| AV_PKT_DATA_MASTERING_DISPLAY_METADATA | Mastering display metadata (based on SMPTE-2086:2014). Should be associated with a video stream; the data is stored as an AVMasteringDisplayMetadata struct. |
| AV_PKT_DATA_SPHERICAL | Associated with a video stream; corresponds to the AVSphericalMapping structure. |
| AV_PKT_DATA_CONTENT_LIGHT_LEVEL | Content light level (based on CTA-861.3). Should be associated with a video stream; the data is stored as an AVContentLightMetadata struct. |
| AV_PKT_DATA_A53_CC | ATSC A53 Part 4 Closed Captions. |
| AV_PKT_DATA_ENCRYPTION_INIT_INFO | Encryption initialization data. |
| AV_PKT_DATA_ENCRYPTION_INFO | Encryption info describing how to decrypt the packet. |
| AV_PKT_DATA_AFD | Active Format Description data: a single byte as specified in ETSI TS 101 154, using the AVActiveFormatDescription enum. |
| AV_PKT_DATA_PRFT | Producer Reference Time data corresponding to the AVProducerReferenceTime struct, usually exported by some encoders (on demand, through the prft flag set in the AVCodecContext export_side_data field). |
| AV_PKT_DATA_ICC_PROFILE | ICC profile data: an opaque octet buffer following the format described by ISO 15076-1. |
| AV_PKT_DATA_DOVI_CONF | DOVI configuration. |
| AV_PKT_DATA_S12M_TIMECODE | Timecode conforming to SMPTE ST 12-1:2014. |
| AV_PKT_DATA_NB | The number of side data types. |
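
Several of these types define a small binary layout rather than a struct, so the receiver has to decode the bytes itself. As an illustration, here is a minimal sketch of parsing the AV_PKT_DATA_PARAM_CHANGE layout documented in the enum comment. The ParamChange struct, the parse_param_change function, and the PARAM_CHANGE_* constants are hypothetical; the flag values 0x0001 to 0x0008 are an assumption about the AV_SIDE_DATA_PARAM_CHANGE_* definitions and should be checked against the packet.h of the FFmpeg version in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values of the AV_SIDE_DATA_PARAM_CHANGE_* flags (verify in packet.h). */
#define PARAM_CHANGE_CHANNEL_COUNT  0x0001
#define PARAM_CHANGE_CHANNEL_LAYOUT 0x0002
#define PARAM_CHANGE_SAMPLE_RATE    0x0004
#define PARAM_CHANGE_DIMENSIONS     0x0008

/* Hypothetical container for the decoded fields. */
typedef struct ParamChange {
    uint32_t flags;
    int32_t  channel_count;
    uint64_t channel_layout;
    int32_t  sample_rate;
    int32_t  width;
    int32_t  height;
} ParamChange;

/* Read a little-endian 32-bit value regardless of host byte order. */
static uint32_t rd_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Read a little-endian 64-bit value regardless of host byte order. */
static uint64_t rd_u64le(const uint8_t *p)
{
    return (uint64_t)rd_u32le(p) | (uint64_t)rd_u32le(p + 4) << 32;
}

/* Returns 0 on success, -1 if the payload is shorter than its flags imply. */
int parse_param_change(const uint8_t *data, size_t size, ParamChange *out)
{
    size_t pos = 0;

    assert(data != NULL && out != NULL);
    *out = (ParamChange){0};

    if (size < 4)
        return -1;
    out->flags = rd_u32le(data);
    pos = 4;

    if (out->flags & PARAM_CHANGE_CHANNEL_COUNT) {
        if (size - pos < 4)
            return -1;
        out->channel_count = (int32_t)rd_u32le(data + pos);
        pos += 4;
    }
    if (out->flags & PARAM_CHANGE_CHANNEL_LAYOUT) {
        if (size - pos < 8)
            return -1;
        out->channel_layout = rd_u64le(data + pos);
        pos += 8;
    }
    if (out->flags & PARAM_CHANGE_SAMPLE_RATE) {
        if (size - pos < 4)
            return -1;
        out->sample_rate = (int32_t)rd_u32le(data + pos);
        pos += 4;
    }
    if (out->flags & PARAM_CHANGE_DIMENSIONS) {
        if (size - pos < 8)
            return -1;
        out->width  = (int32_t)rd_u32le(data + pos);
        out->height = (int32_t)rd_u32le(data + pos + 4);
    }
    return 0;
}
```

The bounds check before every read matters because the fields are optional: the payload length depends on which flag bits are set, and only the side data size says where the payload ends.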
