Authors: Wang Yekui, Zhang Li

1. Introduction

Video is everywhere in people's daily work and life. It is not only used for entertainment, leisure, shopping, and more, but is also gradually replacing text as the most important way for people to obtain knowledge and information. Video coding/compression technology is one of the core, foundational technologies supporting video applications. At present, the widely used video compression technologies are mainly defined by video coding standards. H.266/VVC [1][2] (VVC for short) is the latest generation of video coding standards. It was finalized in July 2020; the first version of the ITU-T text was officially published in November of that year, and the first version of the ISO/IEC text in February 2021. The first version of the accompanying VSEI standard [3][4] was finalized and published at the same time as VVC.

VVC is short for Versatile Video Coding, meaning that VVC can support a very wide range of applications. Compared with previous video coding standards such as H.265/HEVC and H.264/AVC, VVC has better support for new video types such as 8K ultra-high-definition, screen content, high dynamic range, and 360-degree panoramic video, as well as applications such as streaming and real-time communication with adaptive bandwidth and resolution. VSEI stands for Versatile Supplemental Enhancement Information. It mainly specifies the format of video usability information (VUI) and of supplemental enhancement information (SEI) messages for VVC video bitstreams. In the HEVC and AVC standards, the VUI and SEI formats were specified in the main text of the coding standard itself, whereas for VVC they are placed in a separate standard.

So how was VVC developed? From a technical point of view, what new technologies and enhancements does VVC bring relative to previous video coding standards, especially HEVC and AVC? Will there be a new version of VVC? Each of these questions is addressed in the following sections of this article.

2. VVC development process

VVC, like HEVC and AVC, was jointly developed by the Moving Picture Experts Group (MPEG) under ISO/IEC and the Video Coding Experts Group (VCEG) under ITU-T, the telecommunication standardization sector of the ITU. In its early, exploratory stage, the joint MPEG/VCEG working group that developed VVC was called the Joint Video Exploration Team; when the VVC project was formally launched, it became the Joint Video Experts Team (JVET) [5]. Two joint working groups developed HEVC: the Joint Collaborative Team on Video Coding (JCT-VC) [6] and the Joint Collaborative Team on 3D Video Coding Extension Development (JCT-3V) [7]. The joint working group that developed AVC had the simplest name: the Joint Video Team (JVT) [8].

Figure 1. Timeline of the development of the VVC standard

Figure 1 depicts the key milestones of H.266/VVC. The period from January 2015 to October 2015 was the KTA (Key Technology Area) stage, in which technologies could be explored relatively freely [9]. In October 2015, with the submission of a technical proposal exceeding HEVC's coding performance by more than 10% [10], JVET (at that time the Joint Video Exploration Team) was formally established. Its software platform was named the Joint Exploration Model (JEM); new technologies were validated against JEM, and a new version of JEM was released after each standard meeting. By July 2017, after seven JVET meetings and seven JEM releases, JEM had achieved roughly a 30% compression improvement over HEVC. This sent the industry a strong signal that the next-generation standard could still reach its stated goal (a 50% bit-rate reduction at the same subjective quality). Standardization then entered its second stage, the Call for Evidence and Call for Proposals (CfP) phase, which spanned three standard meetings. In April 2018, the test results of the 23 CfP responses were released: the best-performing responses reduced bit rate by about 40% relative to HEVC, showing that the next generation of video compression technology was mature, and the formal VVC standardization work was launched. The first version of the VVC standard (VVC v1) was officially completed in July 2020, after ten standard meetings since April 2018, thousands of reviewed technical proposals, and the round-the-clock joint work of hundreds of experts around the world. Figure 2 lists some of the companies involved in the VVC standardization effort; encouragingly, the participation of Chinese companies was very high, and Chinese companies are playing an increasingly important role on the international video standardization stage.

Figure 2. Main companies participating in VVC standardization

3. VVC coding tools

Compared with the previous-generation standard HEVC, VVC achieves an average coding performance improvement of 49% according to recent official JVET subjective test results [11]. The compression performance of VVC rests on a combination of existing coding tools, new coding tools, and technical improvements to existing tools. VVC's new coding tools and improvements fall mainly into the following categories: 1) block partitioning, 2) intra-frame prediction, 3) inter-frame prediction, 4) transformation and quantization, 5) entropy coding, 6) loop filtering, 7) special modes, 8) screen content coding, and 9) 360-degree video coding. The following is a brief introduction to the specific techniques in each category; reference [12] contains a more detailed description of these topics.

Block partitioning

In VVC, a new partitioning method is added on top of quadtree (QT) partitioning, called the multi-type tree (MTT); in other words, VVC adopts quadtree plus multi-type tree (QT+MTT) partitioning. In QT+MTT partitioning, a block can be evenly divided into two rectangular blocks, side by side or one above the other, known as a binary-tree (BT) split; alternatively, it can be divided into three rectangular blocks in a 1:2:1 ratio, from left to right or from top to bottom, known as a ternary-tree (TT) split, as shown in Figure 3. Sub-blocks produced by a BT or TT split may be further split with BT or TT, but may no longer use QT splits.

Figure 3. Quadtree plus multitype tree (QT+MTT) block partitioning example
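The split geometry and the "no QT below MTT" rule can be summarized in a few lines of code. The following is a minimal sketch (function names and the example block sizes are mine, not from the standard; the real VVC rules add many further constraints, such as minimum block sizes and maximum MTT depth):

```python
# A minimal sketch of QT+MTT block partitioning (illustrative only).

def split(x, y, w, h, mode):
    """Return child rectangles (x, y, w, h) for one split of a block."""
    if mode == "QT":        # quadtree: four equal quadrants
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "BT_H":      # binary tree, horizontal: top/bottom halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "BT_V":      # binary tree, vertical: left/right halves
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "TT_H":      # ternary tree, horizontal: 1:2:1 top-to-bottom
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    if mode == "TT_V":      # ternary tree, vertical: 1:2:1 left-to-right
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    raise ValueError(mode)

def allowed_splits(in_mtt):
    """Once a block has been split by BT/TT, QT is no longer allowed below it."""
    base = ["BT_H", "BT_V", "TT_H", "TT_V"]
    return base if in_mtt else ["QT"] + base

# Example: split a 128x128 CTU by QT, then one quadrant by TT_V.
ctu = (0, 0, 128, 128)
quads = split(*ctu, "QT")
print(split(*quads[0], "TT_V"))       # [(0, 0, 16, 64), (16, 0, 32, 64), (48, 0, 16, 64)]
print(allowed_splits(in_mtt=True))    # no "QT" once inside the multi-type tree
```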

VVC also allows the chroma components to use a block tree structure different from luma's (chroma separate tree, CST). CST can be used in two ways: 1) At the intra slice level, also called dual-tree coding: within each coding tree unit (CTU), luma and chroma adopt different block tree structures. 2) For slices using a single tree (luma and chroma share the same block tree at the CTU level), when a luma block meets certain size conditions, luma and chroma are split below it with different block trees. This is called local dual-tree coding, and it mainly serves to prevent the partitioning from producing many small chroma blocks. In addition, whereas the maximum CTU size supported by HEVC is 64×64, the maximum CTU size in VVC is increased to 128×128, and the minimum supported CTU size is 32×32.

Intra-frame prediction

For intra-frame prediction, VVC supports 67 intra prediction modes (HEVC has 35) and adjusts the angular prediction directions for non-square blocks. Prediction sample interpolation uses two types of four-tap interpolation filters (HEVC uses lower-precision linear interpolation). Position-dependent intra prediction combination (PDPC) blends the prediction signals before and after filtering to further improve intra prediction accuracy. Multiple-reference-line intra prediction can use not only the nearest adjacent reconstructed samples but also more distant reconstructed samples. Matrix-based intra prediction derives the prediction through matrix-vector multiplication. The cross-component linear model (CCLM) technique uses the luma sample values to predict the chroma sample values of the same picture. In intra sub-partition mode, the different sub-partitions of a luma coding unit share the same intra mode information.
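Among these, CCLM is easy to illustrate: a linear model chroma ≈ a·luma + b is fitted from neighboring reconstructed samples and applied to the co-located luma block. The sketch below uses floating point and a simple min/max fit for clarity (VVC derives a and b with integer arithmetic from four down-sampled boundary sample pairs; function and variable names are mine):

```python
# A minimal sketch of cross-component linear model (CCLM) intra prediction,
# assuming a simplified float model chroma = a * luma + b.

import numpy as np

def cclm_predict(recon_luma_block, nbr_luma, nbr_chroma):
    """Predict a chroma block from co-located reconstructed luma samples.

    nbr_luma / nbr_chroma: reconstructed neighboring sample pairs used to
    fit the linear model (in VVC, down-sampled luma neighbors are used).
    """
    # Fit the model from the min/max luma neighbors, as VVC does in spirit.
    lo, hi = int(np.argmin(nbr_luma)), int(np.argmax(nbr_luma))
    denom = float(nbr_luma[hi] - nbr_luma[lo])
    a = (nbr_chroma[hi] - nbr_chroma[lo]) / denom if denom else 0.0
    b = nbr_chroma[lo] - a * nbr_luma[lo]
    return a * recon_luma_block + b

# Example: chroma roughly follows luma / 2 + 10 in the neighborhood.
nbr_luma = np.array([40, 80, 120, 200], dtype=float)
nbr_chroma = nbr_luma / 2 + 10
luma_block = np.array([[100, 150], [50, 180]], dtype=float)
print(cclm_predict(luma_block, nbr_luma, nbr_chroma))  # [[60., 85.], [35., 100.]]
```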

Inter-frame prediction

For inter-frame prediction, VVC inherits HEVC's motion vector difference (MVD) coding and whole-coding-unit motion information inheritance, i.e., advanced motion vector prediction (AMVP) and skip/merge modes, and extends both. For AMVP mode, VVC introduces block-level adaptive motion vector resolution as well as symmetric MVD signalling, which uses bidirectional prediction but codes the MVD for only one of the two reference pictures. For skip/merge mode, VVC introduces history-based motion vector prediction (HMVP) and the pairwise-average merge candidate. Beyond motion vector coding/inheritance at the whole-coding-unit level, VVC also introduces subblock-based temporal motion vector prediction (SbTMVP): the current coding unit is divided into sub-blocks of equal size (8×8 luma sub-blocks), and a motion vector is derived separately for each sub-block. VVC also introduces an affine motion model to represent higher-order motion such as zooming and rotation more accurately, improving the coding efficiency of motion information, and motion vector precision is increased from 1/4 luma sample in HEVC to 1/16 luma sample. In addition, VVC introduces several new inter prediction tools: merge mode with MVD (MMVD), which combines aspects of AMVP and merge by adding an extra motion vector difference on top of merge mode; geometric partitioning mode, whose partitions better fit the motion boundaries of solid objects in video content; and combined inter/intra prediction, which reduces temporal and spatial redundancy together for higher compression performance. Another important improvement in VVC is the introduction of decoder-side motion vector refinement and bi-directional optical flow, which further improve motion compensation efficiency without increasing bit-rate overhead.
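As an illustration of the affine model, the sketch below derives per-sub-block motion vectors from the two control-point MVs of the 4-parameter affine model (top-left v0 and top-right v1). The float arithmetic and example values are mine; VVC works in integer arithmetic at 1/16-sample precision and also has a 6-parameter, three-control-point variant:

```python
# A minimal sketch of the 4-parameter affine motion model: per-sub-block MVs
# are derived from two control-point MVs of a W-wide block.

def affine_subblock_mv(v0, v1, width, cx, cy):
    """MV at sub-block center (cx, cy), relative to the block's top-left."""
    ax = (v1[0] - v0[0]) / width   # horizontal gradient (zoom component)
    ay = (v1[1] - v0[1]) / width   # vertical gradient (rotation component)
    return (ax * cx - ay * cy + v0[0],
            ay * cx + ax * cy + v0[1])

# Example: a 16x16 block whose control-point MVs imply a slight rotation.
v0, v1 = (4.0, 0.0), (4.0, 2.0)
for cy in (2, 10):
    for cx in (2, 10):
        print((cx, cy), affine_subblock_mv(v0, v1, 16, cx, cy))
```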

Transformation and quantization

For transforms, VVC introduces non-square transforms, multiple transform selection for the primary transform, the low-frequency non-separable transform, and the sub-block transform. In addition, the maximum transform size in VVC is increased to 64×64 (32×32 in HEVC). Non-square transforms are applied to non-square blocks and use transform kernels of different lengths in the horizontal and vertical directions. With multiple transform selection, the encoder can choose sine-type, cosine-type, or skip transforms from a predefined set of integer transforms and signal the chosen transform in the bitstream. To better exploit the directionality of the coding block's content, the low-frequency non-separable transform applies a secondary transform to the low-frequency part of the primary transform result of an intra prediction residual. The sub-block transform is used when only part of an inter prediction residual block is coded and the rest of the block is set to zero.
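The sketch below shows what a separable transform with selectable kernels means in practice: the encoder pairs a horizontal and a vertical kernel. The float DCT-II and DST-VII matrices below follow the standard textbook definitions, not VVC's scaled integer approximations, and the kernel names and example data are mine:

```python
# A minimal sketch of separable multiple transform selection (MTS):
# different kernels may be paired horizontally and vertically.

import numpy as np

def dct2_matrix(n):
    k, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)                    # DC row normalization
    return m

def dst7_matrix(n):
    k, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.sqrt(4.0 / (2 * n + 1)) * np.sin(np.pi * (2 * k + 1) * (j + 1) / (2 * n + 1))

KERNELS = {"DCT2": dct2_matrix, "DST7": dst7_matrix}

def forward_transform(residual, hor="DCT2", ver="DCT2"):
    """Separable 2-D transform: vertical kernel on columns, horizontal on rows."""
    h, w = residual.shape
    v, u = KERNELS[ver](h), KERNELS[hor](w)
    return v @ residual @ u.T

residual = np.outer(np.arange(4), np.ones(4))   # a simple vertical ramp
coeffs = forward_transform(residual, hor="DCT2", ver="DST7")
print(np.round(coeffs, 2))                      # energy concentrates in the first column
```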

In quantization, VVC introduces three new coding tools: adaptive chroma quantization parameter (QP) offset, dependent quantization, and joint coding of chroma residuals. With the adaptive chroma QP offset tool, the chroma QP for a given quantization group is not coded directly but is derived from the luma QP and a predefined or transmitted lookup table. In dependent quantization, the set of admissible reconstruction values for a transform coefficient depends on the reconstructed values of the coefficients preceding it in scan order, which reduces the average distortion between the input vector and the closest reconstruction vector. Joint coding of chroma residuals codes the residuals of the two chroma components together rather than separately, which improves coding efficiency when the two chroma residuals are similar.
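Dependent quantization is the least intuitive of the three. The decoder-side sketch below shows how a 4-state machine, driven by the parity of previously decoded levels, switches between two scalar quantizers with interleaved reconstruction points; it follows the published design in spirit, and the variable names are mine:

```python
# A minimal sketch of dependent quantization on the decoder side: two scalar
# quantizers Q0/Q1 are selected by a 4-state machine driven by the parity of
# the previously reconstructed levels.

STATE_TRANS = [[0, 2], [2, 0], [1, 3], [3, 1]]  # next state = f(state, level & 1)

def dequantize(levels, step):
    """Reconstruct coefficients from coded levels in scan order."""
    state, recon = 0, []
    for lvl in levels:
        use_q1 = state >= 2
        # Q0 levels: 0, ±2Δ, ±4Δ...   Q1 levels: 0, ±Δ, ±3Δ, ±5Δ...
        sign = (lvl > 0) - (lvl < 0)
        recon.append((2 * lvl - (sign if use_q1 else 0)) * step)
        state = STATE_TRANS[state][lvl & 1]
    return recon

print(dequantize([3, -2, 0, 1], step=1.0))   # [6.0, -3.0, 0.0, 1.0]
```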

Entropy coding

Like HEVC, VVC uses context-adaptive binary arithmetic coding (CABAC) for entropy coding, but with improvements to the CABAC engine and to transform coefficient coding. The CABAC engine improvement is a multi-hypothesis probability update model adaptively bound to the context model (i.e., the probability update speed depends on the context model): each context model maintains two probability estimates, P0 and P1, which are updated independently at their own adaptation rates. The probability estimate P used for interval subdivision in the binary arithmetic coder is set to the mean of P0 and P1. For transform coefficient coding, in addition to 4×4 coefficient groups, VVC also allows coefficient groups of 1×16, 16×1, 2×8, 8×2, 2×4, and 4×2. In addition, a flag is added for the state transition of dependent quantization, and an improved probability model selection mechanism is used for coding the syntax elements related to the absolute values of transform coefficients.
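The two-hypothesis update is easy to sketch: both estimates move toward each observed bin, one quickly and one slowly, and their mean drives the arithmetic coder. The fixed-point precision and shift values below are illustrative; in VVC the adaptation rates are assigned per context model:

```python
# A minimal sketch of VVC's two-hypothesis CABAC probability update: each
# context keeps two estimates updated at different rates; the engine uses
# their mean for interval subdivision.

def update(p0, p1, bin_val, shift0=4, shift1=7, bits=15):
    """Update both probability estimates toward the observed bin."""
    target = bin_val << bits            # 0 or 2^15, fixed-point probability of '1'
    p0 += (target - p0) >> shift0       # fast-adapting estimate
    p1 += (target - p1) >> shift1       # slow-adapting estimate
    return p0, p1

p0 = p1 = 1 << 14                       # start at probability 0.5
for b in [1, 1, 1, 0, 1, 1]:            # a mostly-'1' bin sequence
    p0, p1 = update(p0, p1, b)
p = (p0 + p1) >> 1                      # estimate used by the arithmetic coder
print(p / (1 << 15))                    # drifts above 0.5 toward P(1)
```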

Loop filtering

In addition to the deblocking filter (DBF) and sample adaptive offset (SAO) also used in HEVC, VVC supports luma mapping with chroma scaling (LMCS) and the adaptive loop filter (ALF). The DBF adds longer filters and a luma-adaptive filtering mode designed for high dynamic range video; SAO is the same as in HEVC. With LMCS, the encoder can apply a piecewise-linear remapping of the amplitude range of the input video signal before encoding to improve coding efficiency, with the inverse mapping applied at the decoder. ALF in VVC includes two modes: 1) block-based ALF for luma and chroma samples; 2) cross-component ALF (CC-ALF) for chroma samples. Block-based ALF uses 7×7 and 5×5 diamond filters for luma and chroma, respectively. Each 4×4 block is classified into one of 25 classes and one of 4 transpose states according to its directionality and gradient activity, and selects one of the transmitted filter sets accordingly. CC-ALF further refines each chroma sample using a diamond-shaped linear high-pass filter applied to the ALF-filtered luma samples.
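CC-ALF can be sketched compactly: a zero-sum (high-pass) linear filter over the co-located ALF-filtered luma samples produces a correction that is added to each chroma sample. The 3×3 kernel and weights below are illustrative stand-ins of mine, not the normative diamond shape or coefficients:

```python
# A minimal sketch of cross-component ALF (CC-ALF): a linear high-pass filter
# on co-located luma samples yields a correction added to the chroma sample.

import numpy as np

# A tiny zero-sum kernel: flat luma yields no correction.
CCALF_KERNEL = np.array([[0, -1, 0],
                         [-1, 4, -1],
                         [0, -1, 0]]) / 8.0

def ccalf_refine(chroma, luma, x, y):
    """Refine one chroma sample from the 3x3 luma neighborhood around it."""
    patch = luma[y - 1:y + 2, x - 1:x + 2]
    return chroma[y, x] + float(np.sum(CCALF_KERNEL * patch))

luma = np.tile(np.arange(8.0), (8, 1)) ** 2   # a curved luma ramp
chroma = np.full((8, 8), 100.0)
print(ccalf_refine(chroma, luma, 3, 3))       # 99.75: correction follows luma curvature
```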

Screen content coding

VVC retains the block-based differential pulse code modulation from HEVC, but restricts it to intra-predicted coding units, and improves the associated residual coding in three ways: 1) the position of the first non-zero value is no longer coded, and the scan direction is reversed; 2) context models improve the coding efficiency of the sign flags; 3) absolute value coding is improved. Intra block copy (IBC) and palette mode, both existing tools in HEVC, are retained and improved. In HEVC, IBC is defined as an inter prediction mode in which the reference frame is the current frame itself and the motion vector must point to an already-decoded region of the current frame before loop filtering. In VVC, IBC is decoupled from inter prediction, and reference buffer management is simplified relative to HEVC: reference samples are stored in a small local buffer. How the palette is coded in VVC depends on whether a single coding tree is used for luma and chroma: with a single coding tree, the palettes of the three color components are coded together; otherwise the luma and chroma palettes are coded separately. For palette-coded units, individual pixels can also be coded directly with their quantized values, bypassing the palette. Finally, the adaptive color transform in VVC is the same screen content coding tool as in HEVC.
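Palette mode can be illustrated by its essence: each sample is coded either as an index into the palette or as an "escape" sample carrying its quantized value directly. The sketch below shows this mapping on the encoder side; palette derivation, index run coding, and the threshold used here are simplifications of mine:

```python
# A minimal sketch of palette-mode coding: each sample is mapped to the index
# of its closest palette entry, or signalled as an escape sample with its value.

def palette_encode(block, palette, escape_threshold=4):
    """Return (index, None) for palette hits, (ESCAPE, value) otherwise."""
    ESCAPE = len(palette)                 # one index beyond the palette entries
    coded = []
    for v in block:
        idx = min(range(len(palette)), key=lambda i: abs(palette[i] - v))
        if abs(palette[idx] - v) <= escape_threshold:
            coded.append((idx, None))
        else:
            coded.append((ESCAPE, v))     # escape: send the value itself
    return coded

palette = [20, 128, 250]                  # e.g. text, background, highlight colors
print(palette_encode([19, 21, 128, 77, 250], palette))
# [(0, None), (0, None), (1, None), (3, 77), (2, None)]
```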

360-degree video coding

360-degree video became popular around 2014 and 2015, while the first version of HEVC was finalized at the beginning of 2013, so VVC naturally became the first international video coding standard to contain 360-degree video coding tools. Since 360-degree video is mostly coded with conventional video coding technology, VVC includes only two 360-degree video "compression" tools; most of its support for 360-degree video lies in the design of the system and transport interface (see the next section of this article). The first tool is horizontal motion vector wrap-around: when a motion vector points beyond the right (or left) boundary of the picture, the reference samples actually used for motion compensation are samples within the left (or right) boundary of the picture (or sub-samples obtained by interpolation filtering). This works because, in the equirectangular projection (ERP) commonly used for 360-degree video, the left and right picture boundaries correspond to spherically continuous positions in the physical world, just as the left and right edges of a world map are the same meridian connecting the north and south poles. Motion vector wrap-around therefore improves the coding efficiency of 360-degree video using ERP. The second tool is loop-filtering virtual boundaries: when enabled, loop filtering is not applied across certain horizontal or vertical lines in the picture (the virtual boundaries). This tool suits another projection commonly used in 360-degree video, the cube map projection (CMP). Reference [13] contains a detailed introduction to 360-degree video and to ERP and CMP.
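At its core, motion vector wrap-around reduces to a modulo mapping of the horizontal reference position. The sketch below assumes wrapping over the full picture width (VVC actually signals a wrap-around offset so that ERP pictures with padded edges also work); the picture width is an example value:

```python
# A minimal sketch of horizontal motion vector wrap-around for ERP content:
# reference positions beyond the left/right picture edge wrap to the other
# side, since those edges are spherically continuous.

def wrap_ref_x(x, pic_width):
    """Map a horizontal reference position into the picture with wrap-around."""
    return x % pic_width          # e.g. -3 -> pic_width - 3

pic_width = 3840                  # an example ERP picture width
for x in (-3, 0, 3839, 3842):
    print(x, "->", wrap_ref_x(x, pic_width))
# -3 -> 3837, 0 -> 0, 3839 -> 3839, 3842 -> 2
```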

4. VVC system and transport interface

The system and transport interface of a video coding standard is often called its high-level syntax (HLS); it is the link between the compression tools inside the codec and the video application and transport system. HLS covers many topics in a video coding standard, including the basic structure of the bitstream and of the coded data, sequence- and picture-level coding parameters, random access, decoded picture management (including reference picture management), profiles and levels, the bitstream buffering model, high-level picture partitioning (e.g., slices and tiles), temporal scalability, extensibility, backward compatibility, error resilience, supplemental enhancement information, and more.

VVC inherits many aspects of the HLS designs of AVC and HEVC, including the network abstraction layer (NAL) unit based syntax structure, the hierarchical syntax and data unit structure, the VUI and SEI mechanisms, and the hypothetical reference decoder (HRD) video buffering model.

Compared with AVC and HEVC, the new or significantly improved HLS designs in VVC mainly include the following: slices and subpictures, adaptive picture resolution change, the adaptation parameter set (APS), the picture header, gradual decoding refresh (GDR), direct coding of reference picture lists (RPL), and a greatly simplified multilayer scalable coding design. Each is briefly introduced below; references [12] and [14] contain more detailed descriptions of these topics.

Slices and subpictures

One major change in VVC's slice support relative to AVC and HEVC is that slicing based on basic coding units (macroblocks in AVC, CTUs in HEVC) is replaced by slicing based on tiles or on CTU rows within a tile. The reason for this change is that advances in network and video transmission technology have largely consigned the once-common video error concealment techniques to history: the video people watch today essentially no longer contains frames recovered by error concealment.

Slices in VVC come in two modes: rectangular slices and raster-scan slices. As the name implies, a rectangular slice is always rectangular in shape. Each rectangular slice contains one or more complete tiles (as shown in Figure 4) or one or more CTU rows within one tile (as shown in the upper right corner of Figure 5). A raster-scan slice also contains one or more complete tiles, but these tiles must be consecutive in raster-scan order, so the slice shape is generally not rectangular, as shown in Figure 6.

Figure 4. A picture containing 18×12 CTUs is divided into 24 tiles and 9 rectangular slices

Figure 5. A picture is divided into 4 tiles and 4 rectangular slices (note: the two tiles on the left are combined into one slice, while the tile on the upper right is divided into 2 rectangular slices)

Figure 6. A picture containing 18×12 CTUs is divided into 12 tiles and 3 raster-scan slices

VVC is the first video coding standard to introduce the subpicture design. Conceptually, a subpicture is similar to the motion-constrained tile set (MCTS) in HEVC, but the design is improved for better compression efficiency and greater friendliness to application systems. Each subpicture must also be rectangular in shape and contains one or more rectangular slices, as shown in Figure 7.

Figure 7. A picture is divided into 18 tiles, 24 slices, and 24 subpictures (in this example, each subpicture contains exactly one rectangular slice)

A subpicture can be coded independently so that it can be extracted and decoded separately, which makes it useful for region-of-interest (ROI) coding and for optimizing 360-degree video transmission, as shown in Figure 8. A key difference between 360-degree video and traditional video applications is that the user sees only a small portion of the entire sphere at any moment, and this transmission scheme exploits exactly that: the part the user sees is sent in high quality, and the part the user cannot see in low quality. The unseen part cannot simply be dropped, because if the user suddenly turns their head and sees a black screen, the result is far from an immersive experience.

Figure 8. Subpicture-based 360-degree video transmission scheme

Compared with MCTS, the VVC subpicture design mainly improves on the following five points: 1) A motion vector in an extracted subpicture may point outside the subpicture boundary; when that happens, sample padding is used for motion compensation, just as for motion vectors pointing outside the picture boundary, which improves coding efficiency. 2) The motion vector selection and derivation in merge mode and in decoder-side motion vector refinement are adapted for subpictures. 3) Slice headers do not need to be rewritten when extracting a subpicture. 4) Subpictures containing different slice types (e.g., some supporting random access and some not) can simply be merged into one picture without rewriting slice headers. 5) HRD and levels are defined for subpicture sequences, so the encoder can ensure the conformance of every extractable sub-bitstream.

Adaptive picture resolution change

In AVC and HEVC, the picture resolution can change only at the start of a new coded video sequence (CVS), where a new sequence parameter set takes effect. In VVC, the picture resolution can change at any picture within a CVS, and inter prediction can continue across the change. This requires allowing inter prediction between two pictures of different resolutions, and hence the ability to perform reference picture resampling (RPR). The resampling may be upsampling or downsampling, depending on whether the reference picture or the current picture has the higher resolution.
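Conceptually, RPR means motion compensation fetches reference samples through a scaled coordinate mapping. The sketch below uses bilinear interpolation for brevity (VVC uses longer interpolation filter sets selected according to the scaling ratio); the array sizes and scale factor are example values of mine:

```python
# A minimal sketch of reference picture resampling (RPR): when the current
# and reference pictures differ in resolution, reference samples are fetched
# through a scaled coordinate mapping.

import numpy as np

def sample_scaled(ref, x, y, scale_x, scale_y):
    """Fetch a ref sample at current-picture position (x, y), resampling."""
    rx, ry = x * scale_x, y * scale_y          # map into reference coordinates
    x0, y0 = int(rx), int(ry)
    fx, fy = rx - x0, ry - y0
    x1, y1 = min(x0 + 1, ref.shape[1] - 1), min(y0 + 1, ref.shape[0] - 1)
    top = (1 - fx) * ref[y0, x0] + fx * ref[y0, x1]
    bot = (1 - fx) * ref[y1, x0] + fx * ref[y1, x1]
    return (1 - fy) * top + fy * bot

ref = np.arange(16.0).reshape(4, 4)            # a 4x4 "reference picture"
scale = 4 / 8                                  # ref width / current width: upsampling
print(sample_scaled(ref, 3, 3, scale, scale))  # 7.5, interpolated between ref samples
```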

Adaptation parameter set (APS)

VVC adds a new parameter set, the APS, used to carry picture- or slice-level information that satisfies three conditions: 1) it can be shared by multiple slices of one picture and/or slices of different pictures; 2) it may change frequently from picture to picture; and 3) the number of possible variants is relatively large, so carrying it in the picture parameter set (PPS) would require in-band PPS updates within a bitstream, making out-of-band PPS transmission impossible. The APS carries three kinds of parameters in VVC: ALF parameters, LMCS parameters, and scaling list parameters.

Picture header

Picture headers are not a new concept; they existed in standards before AVC, such as MPEG-2, but not in AVC and HEVC, which use a NAL-unit-based bitstream structure. The main purpose of reintroducing the picture header in VVC is to reduce the repetition of picture-level information across the slices of one picture, so it carries the information that must be, or is likely to be, shared by all slices of the same picture.

Gradual decoding refresh (GDR)

GDR refers to random access starting from an inter-coded picture. A correctly decoded picture is not available immediately; instead, as more pictures are decoded, the correctly decoded region grows gradually until, at some picture, the whole picture is decoded correctly. Because intra-coded blocks can be spread relatively evenly over many consecutive pictures, the encoder can smooth the bit rate and reduce end-to-end delay. GDR is also not new: AVC and HEVC support it through the recovery point SEI message, which indicates GDR and gives the location of the recovery point. In VVC, GDR is indicated by a new NAL unit type, and the recovery point location is signalled in the picture header. The first picture of a bitstream or a CVS can be an inter-coded GDR picture; a fully conforming bitstream may even contain no instantaneous decoding refresh (IDR) or clean random access (CRA) picture at all, and no fully intra-coded picture.
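One common way to realize GDR is to sweep a forced-intra region across the picture over a refresh period. The sketch below shows such a vertical column sweep; the sweep pattern and period are encoder choices, not mandated by the standard, and the function name is mine:

```python
# A minimal sketch of gradual decoding refresh (GDR): a forced-intra column
# band sweeps across the picture over N frames, so after N frames every
# region has been refreshed and decoding is fully correct.

def gdr_intra_columns(frame_idx, pic_cols, refresh_period):
    """CTU columns forced to intra in this frame of the refresh period."""
    cols_per_frame = -(-pic_cols // refresh_period)      # ceiling division
    start = (frame_idx % refresh_period) * cols_per_frame
    return list(range(start, min(start + cols_per_frame, pic_cols)))

pic_cols, period = 10, 4
for f in range(period):
    print(f"frame {f}: intra columns {gdr_intra_columns(f, pic_cols, period)}")
# frame 0: [0, 1, 2]; frame 1: [3, 4, 5]; frame 2: [6, 7, 8]; frame 3: [9]
```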

Direct coding of reference picture lists (RPL)

Reference picture management is responsible for moving decoded pictures into the decoded picture buffer (DPB), removing them from the DPB, and placing reference pictures into the RPLs in the proper order; it is one of the core functions of a video coding standard. In HEVC, reference pictures are managed through a mechanism called the reference picture set (RPS), from which the RPLs are constructed. VVC instead codes the RPL information directly, rather than indirectly through an RPS.

Multilayer scalable coding

Thanks to the RPR capability described above, multilayer scalable coding is easy to support in VVC: compared with single-layer coding, no additional "low-level" coding tools are needed, only HLS support. This is the main reason multilayer scalable coding could be included in the first VVC version (AVC and HEVC both added it only after their first versions). Unlike the multilayer scalable coding added in later versions of AVC and HEVC, the VVC design focused from the outset on friendliness to single-layer decoder implementations. First, the decoding capability for a multilayer bitstream is specified in the same way as for a single-layer bitstream, so a single-layer decoder can decode multilayer bitstreams with only minor changes; for example, the minimum DPB capacity is specified in the level independently of the number of layers in the bitstream. In addition, the multilayer HLS design is greatly simplified at the cost of some flexibility; for example, a picture of each layer must be present at every random access point.

While relatively simple, the multilayer scalable coding design in VVC supports not only traditional spatial, quality, and multiview scalability, but also combinations of scalability with subpictures. For example, the subpicture-based 360-degree video transmission scheme of Figure 8 can be further improved by allowing inter-layer prediction, as shown in Figure 9.

Figure 9. 360-degree video transmission scheme based on subpictures with inter-layer prediction enabled

5. Summary

As the title suggests, this article has given a true "overview" of the H.266/VVC video standard's development process, coding tools, and system and transport interface. Since the first version of VVC was finalized in July 2020, JVET's work has included maintaining the text and reference software of the first version, as well as VVC subjective quality testing and bitstream conformance. Work has also recently started on VVC version 2 and VSEI version 2. Current additions in the VVC version 2 draft include a new level and two SEI messages carried over from HEVC; planned additions include support for high bit depths such as 12 bits, which are currently under core experiments. Current additions in the VSEI version 2 draft include six SEI messages (three of which were imported from HEVC).

6. References

  1. Rec. ITU-T H.266 (VVC): www.itu.int/rec/T-REC-H…
  2. ISO/IEC 23090-3 (VVC): www.iso.org/standard/73…
  3. Rec. ITU-T H.274 (VSEI): www.itu.int/rec/T-REC-H…
  4. ISO/IEC 23002-7 (VSEI): www.iso.org/standard/79…
  5. JVET documentation: www.jvet-experts.org/
  6. JCT-VC documentation: phenix.int-evry.fr/jct/index.p…
  7. JCT-3V documentation: phenix.int-evry.fr/jct3v/index…
  8. JVT documentation: www.itu.int/wftp3/av-ar…
  9. J. Chen, Y. Chen, M. Karczewicz, X. Li, H. Liu, and L. Zhang, "Coding Tools Investigation for Next Generation Video Coding", ITU-T SG16 doc. COM16-C806, Feb. 2015.
  10. M. Karczewicz, J. Chen, W.-J. Chien, X. Li, A. Said, L. Zhang, and X. Zhao, "Study of coding efficiency improvements beyond HEVC", ISO/IEC MPEG doc. m37102, Oct. 2015.
  11. V. Baroncini and M. Wien, "VVC verification test report for UHD SDR video content", JVET-T2020, Oct. 2020.
  12. B. Bross, J. Chen, J.-R. Ohm, G. J. Sullivan, and Y.-K. Wang, "Developments in International Video Coding Standardization After AVC, With an Overview of Versatile Video Coding (VVC)", Proc. of the IEEE, to appear. Pre-published with open access: ieeexplore.ieee.org/stamp/stamp…
  13. M. M. Hannuksela and Y.-K. Wang, "An Overview of Omnidirectional MediA Format (OMAF)", Proc. of the IEEE, to appear.
  14. Y.-K. Wang, R. Skupin, M. M. Hannuksela, S. Deshpande, Hendry, V. Drugeon, R. Sjoberg, B. Choi, V. Seregin, Y. Sanchez, J. M. Boyce, W. Wan, and G. J. Sullivan, "The High-Level Syntax of the Versatile Video Coding (VVC) Standard", IEEE Trans. Circuits Syst. Video Technol., to appear.