AV1 Bitstream & Decoding Process Specification

Last modified: 2018-11-28 11:23 PT

Authors
Peter de Rivaz, Argon Design Ltd
Jack Haughton, Argon Design Ltd

Codec Working Group Chair
Adrian Grange, Google Inc

Design
Lou Quillio, Google Inc

Notice

This is an internal AOMedia working document and not an approved version of the AV1 specification. The approved AV1 bitstream specification can be found here:

https://github.com/AOMediaCodec/av1-spec/releases/download/v1.0.0/av1-spec-v1.0.0.pdf.

Abstract

This document defines the bitstream formats and decoding process for the Alliance for Open Media AV1 video codec.

Scope

This document specifies the Alliance for Open Media AV1 bitstream formats and decoding process.

Terms and definitions

For the purposes of this document, the following terms and definitions apply:

AC coefficient

Any transform coefficient whose frequency indices are non-zero in at least one dimension.

Altref

(Alternative reference frame) A frame that can be used in inter coding.

Base layer

The layer with spatial_id and temporal_id values equal to 0.

Bitstream

The sequence of bits generated by encoding a sequence of frames.

Bit string

An ordered string with a finite number of bits. The leftmost bit is the most significant bit (MSB); the rightmost bit is the least significant bit (LSB).

Block

A square or rectangular region of samples.

Block scan

A specified serial ordering of quantized coefficients.

Byte

An 8-bit bit string.

Byte alignment

A bit is byte aligned if its position is an integer multiple of eight bits from the position of the first bit in the bitstream.

CDEF

Constrained Directional Enhancement Filter, designed to adaptively filter blocks based on identified edge directions.

CDF

Cumulative distribution function representing the probability times 32768 that a symbol has value less than or equal to a given level.

Chroma

A sample value matrix or a single sample value of one of the two color difference signals.

Note: The symbols representing chroma are U and V.

Coded frame

The representation of one frame before the decoding process.

Component

One of the three sample value matrices (one luma matrix and two chroma matrices) or its single sample value.

Compound prediction

A type of inter prediction where sample values are computed by blending together predictions from two reference frames (the frames blended can be the same or different).

DC coefficient

A transform coefficient whose frequency indices are zero in both dimensions.

Decoded frame

The frame reconstructed out of the bitstream by the decoder.

Decoder

One embodiment of the decoding process.

Decoding process

The process that derives decoded frames from syntax elements, including any processing steps used prior to and for the film grain synthesis process.

Dequantization

The process in which transform coefficients are obtained by scaling the quantized coefficients.

Encoder

One embodiment of the encoding process.

Encoding process

A process not specified in this Specification that generates the bitstream that conforms to the description provided in this document.

Enhancement layer

A layer with either spatial_id greater than 0 or temporal_id greater than 0.

Flag

A binary variable. Some variables and syntax elements (e.g. obu_extension_flag) are described using the word flag to highlight that the syntax element can only be equal to 0 or 1.

Frame

The representation of video signals in the spatial domain, composed of one luma sample matrix (Y) and two chroma sample matrices (U and V).

Frame context

A set of probabilities used in the decoding process.

Golden frame

A frame that can be used in inter coding. Typically the golden frame is encoded with higher quality and is used as a reference for multiple inter frames.

Inter coding

Coding one block or frame using inter prediction.

Inter frame

A frame compressed by referencing previously decoded frames and which may use intra prediction or inter prediction.

Inter prediction

The process of deriving the prediction value for the current frame using previously decoded frames.

Intra coding

Coding one block or frame using intra prediction.

Intra frame

A frame compressed using only intra prediction which can be independently decoded.

Intra prediction

The process of deriving the prediction value for the current sample using previously decoded sample values in the same decoded frame.

Inverse transform

The process in which a transform coefficient matrix is transformed into a spatial sample value matrix.

Key frame

An Intra frame which resets the decoding process when it is shown.

Layer

A set of tile group OBUs with identical spatial_id and identical temporal_id values.

Level

A defined set of constraints on the values for the syntax elements and variables.

Loop filter

A filtering process applied to the reconstruction intended to reduce the visibility of block edges.

Luma

A sample value matrix or a single sample value representing the monochrome signal related to the primary colors.

Note: The symbol representing luma is Y.

Mode info

Syntax elements sent for a block containing an indication of how a block is to be predicted during the decoding process.

Mode info block

A luma sample value block of size 4x4 or larger and its two corresponding chroma sample value blocks (if present).

Motion vector

A two-dimensional vector used for inter prediction whose value provides the coordinate offsets from a location in the current frame to a location in the reference frame.

OBU

All structures are packetized in “Open Bitstream Units” or OBUs. Each OBU has a header, which provides identifying information for the contained data (payload).

Parse

The procedure of getting the syntax element from the bitstream.

Prediction

The implementation of the prediction process consisting of either inter or intra prediction.

Prediction process

The process of estimating the decoded sample value or data element using a predictor.

Prediction value

The value, which is the combination of the previously decoded sample values or data elements, used in the decoding process of the next sample value or data element.

Profile

A subset of syntax, semantics and algorithms defined in a part.

Quantization parameter

A variable used for scaling the quantized coefficients in the decoding process.

Quantized coefficient

A transform coefficient before dequantization.

Raster scan

Maps a two dimensional rectangular raster into a one dimensional raster, in which the entry of the one dimensional raster starts from the first row of the two dimensional raster, and the scanning then goes through the second row and the third row, and so on. Each raster row is scanned in left to right order.

Reconstruction

The process of adding the decoded residual to the corresponding prediction values.

Reference

One of a set of tags, each of which is mapped to a reference frame.

Reference frame

A storage area for a previously decoded frame and associated information.

Reserved

A special syntax element value which may be used to extend this part in the future.

Residual

The differences between the reconstructed samples and the corresponding prediction values.

Sample

The basic elements that compose the frame.

Sample value

The value of a sample. This is an integer from 0 to 255 (inclusive) for 8-bit frames, from 0 to 1023 (inclusive) for 10-bit frames, and from 0 to 4095 (inclusive) for 12-bit frames.

Segmentation map

A 3-bit number containing the segment affiliation for each 4x4 block in the image. A segmentation map is stored for each reference frame to allow new frames to use a previously coded map.

Sequence

The highest level syntax structure of the coded bitstream, comprising one or more consecutive coded frames.

Superblock

The top level of the block quadtree within a tile. All superblocks within a frame are the same size and are square. The superblocks may be 128x128 luma samples or 64x64 luma samples. A superblock may contain 1 or 2 or 4 mode info blocks, or may be bisected in each direction to create 4 sub-blocks, which may themselves be further subpartitioned, forming the block quadtree.

Switch Frame

An inter frame that can be used as a point to switch between sequences. Switch frames overwrite all the reference frames without forcing the use of intra coding. The intention is to allow a streaming use case where videos can be encoded in small chunks (say of 1 second duration), each starting with a switch frame. If the available bandwidth drops, the server can start sending chunks from a lower bitrate encoding instead. When this happens the inter prediction uses the existing higher quality reference frames to decode the switch frame. This approach allows a bitrate switch without the cost of a full key frame.

Syntax element

An element of data represented in the bitstream.

Temporal delimiter OBU

An indication that the following OBUs will have a different presentation/decoding time stamp from the one of the last frame prior to the temporal delimiter.

Temporal unit

All the OBUs that are associated with a specific, distinct time instant: a temporal delimiter OBU, plus all the OBUs that follow, up to but not including the next temporal delimiter.

Temporal group

A set of frames whose temporal prediction structure is used periodically in a video sequence.

Tile

A rectangular region of the frame that can be decoded and encoded independently, although loop-filtering across tile edges is still applied.

Transform block

A rectangular transform coefficient matrix, used as input to the inverse transform process.

Transform coefficient

A scalar value, considered to be in a frequency domain, contained in a transform block.

Uncompressed header

High level description of the frame to be decoded that is encoded without the use of arithmetic encoding.

Symbols and abbreviated terms

DCT

Discrete Cosine Transform

ADST

Asymmetric Discrete Sine Transform

LSB

Least Significant Bit

MSB

Most Significant Bit

WHT

Walsh Hadamard Transform

The specification makes use of a number of constant integers. Constants that relate to the semantics of a particular syntax element are defined in section 6.

Additional constants are defined below:

Symbol name Value Description
REFS_PER_FRAME 7 Number of reference frames that can be used for inter prediction
TOTAL_REFS_PER_FRAME 8 Number of reference frame types (including intra type)
BLOCK_SIZE_GROUPS 4 Number of contexts when decoding y_mode
BLOCK_SIZES 22 Number of different block sizes used
BLOCK_INVALID 22 Sentinel value to mark partition choices that are not allowed
MAX_SB_SIZE 128 Maximum size of a superblock in luma samples
MI_SIZE 4 Smallest size of a mode info block in luma samples
MI_SIZE_LOG2 2 Base 2 logarithm of smallest size of a mode info block
MAX_TILE_WIDTH 4096 Maximum width of a tile in units of luma samples
MAX_TILE_AREA 4096 * 2304 Maximum area of a tile in units of luma samples
MAX_TILE_ROWS 64 Maximum number of tile rows
MAX_TILE_COLS 64 Maximum number of tile columns
INTRABC_DELAY_PIXELS 256 Number of horizontal luma samples before intra block copy can be used
INTRABC_DELAY_SB64 4 Number of 64 by 64 blocks before intra block copy can be used
NUM_REF_FRAMES 8 Number of frames that can be stored for future reference
IS_INTER_CONTEXTS 4 Number of contexts for is_inter
REF_CONTEXTS 3 Number of contexts for single_ref, comp_ref, comp_bwdref, uni_comp_ref, uni_comp_ref_p1 and uni_comp_ref_p2
MAX_SEGMENTS 8 Number of segments allowed in segmentation map
SEGMENT_ID_CONTEXTS 3 Number of contexts for segment_id
SEG_LVL_ALT_Q 0 Index for quantizer segment feature
SEG_LVL_ALT_LF_Y_V 1 Index for vertical luma loop filter segment feature
SEG_LVL_ALT_LF_Y_H 2 Index for horizontal luma loop filter segment feature
SEG_LVL_ALT_LF_U 3 Index for chroma U loop filter segment feature
SEG_LVL_ALT_LF_V 4 Index for chroma V loop filter segment feature
SEG_LVL_REF_FRAME 5 Index for reference frame segment feature
SEG_LVL_SKIP 6 Index for skip segment feature
SEG_LVL_GLOBALMV 7 Index for global mv feature
SEG_LVL_MAX 8 Number of segment features
PLANE_TYPES 2 Number of different plane types (luma or chroma)
TX_SIZE_CONTEXTS 3 Number of contexts for transform size
INTERP_FILTERS 3 Number of values for interp_filter
INTERP_FILTER_CONTEXTS 16 Number of contexts for interp_filter
SKIP_MODE_CONTEXTS 3 Number of contexts for decoding skip_mode
SKIP_CONTEXTS 3 Number of contexts for decoding skip
PARTITION_CONTEXTS 4 Number of contexts when decoding partition
TX_SIZES 5 Number of square transform sizes
TX_SIZES_ALL 19 Number of transform sizes (including non-square sizes)
TX_MODES 3 Number of values for tx_mode
DCT_DCT 0 Inverse transform rows with DCT and columns with DCT
ADST_DCT 1 Inverse transform rows with DCT and columns with ADST
DCT_ADST 2 Inverse transform rows with ADST and columns with DCT
ADST_ADST 3 Inverse transform rows with ADST and columns with ADST
FLIPADST_DCT 4 Inverse transform rows with DCT and columns with FLIPADST
DCT_FLIPADST 5 Inverse transform rows with FLIPADST and columns with DCT
FLIPADST_FLIPADST 6 Inverse transform rows with FLIPADST and columns with FLIPADST
ADST_FLIPADST 7 Inverse transform rows with FLIPADST and columns with ADST
FLIPADST_ADST 8 Inverse transform rows with ADST and columns with FLIPADST
IDTX 9 Inverse transform rows with identity and columns with identity
V_DCT 10 Inverse transform rows with identity and columns with DCT
H_DCT 11 Inverse transform rows with DCT and columns with identity
V_ADST 12 Inverse transform rows with identity and columns with ADST
H_ADST 13 Inverse transform rows with ADST and columns with identity
V_FLIPADST 14 Inverse transform rows with identity and columns with FLIPADST
H_FLIPADST 15 Inverse transform rows with FLIPADST and columns with identity
TX_TYPES 16 Number of inverse transform types
MB_MODE_COUNT 17 Number of values for YMode
INTRA_MODES 13 Number of values for y_mode
UV_INTRA_MODES_CFL_NOT_ALLOWED 13 Number of values for uv_mode when chroma from luma is not allowed
UV_INTRA_MODES_CFL_ALLOWED 14 Number of values for uv_mode when chroma from luma is allowed
COMPOUND_MODES 8 Number of values for compound_mode
COMPOUND_MODE_CONTEXTS 8 Number of contexts for compound_mode
COMP_NEWMV_CTXS 5 Number of new mv values used when constructing context for compound_mode
NEW_MV_CONTEXTS 6 Number of contexts for new_mv
ZERO_MV_CONTEXTS 2 Number of contexts for zero_mv
REF_MV_CONTEXTS 6 Number of contexts for ref_mv
DRL_MODE_CONTEXTS 3 Number of contexts for drl_mode
MV_CONTEXTS 2 Number of contexts for decoding motion vectors including one for intra block copy
MV_INTRABC_CONTEXT 1 Motion vector context used for intra block copy
MV_JOINTS 4 Number of values for mv_joint
MV_CLASSES 11 Number of values for mv_class
CLASS0_SIZE 2 Number of values for mv_class0_bit
MV_OFFSET_BITS 10 Maximum number of bits for decoding motion vectors
MAX_LOOP_FILTER 63 Maximum value used for loop filtering
REF_SCALE_SHIFT 14 Number of bits of precision when scaling reference frames
SUBPEL_BITS 4 Number of bits of precision when choosing an inter prediction filter kernel
SUBPEL_MASK 15 ( 1 << SUBPEL_BITS ) - 1
SCALE_SUBPEL_BITS 10 Number of bits of precision when computing inter prediction locations
MV_BORDER 128 Value used when clipping motion vectors
PALETTE_COLOR_CONTEXTS 5 Number of values for color contexts
PALETTE_MAX_COLOR_CONTEXT_HASH 8 Number of mappings between color context hash and color context
PALETTE_BLOCK_SIZE_CONTEXTS 7 Number of values for palette block size
PALETTE_Y_MODE_CONTEXTS 3 Number of values for palette Y plane mode contexts
PALETTE_UV_MODE_CONTEXTS 2 Number of values for palette U and V plane mode contexts
PALETTE_SIZES 7 Number of values for palette_size
PALETTE_COLORS 8 Number of values for palette_color
PALETTE_NUM_NEIGHBORS 3 Number of neighbors considered within palette computation
DELTA_Q_SMALL 3 Value indicating alternative encoding of quantizer index delta values
DELTA_LF_SMALL 3 Value indicating alternative encoding of loop filter delta values
QM_TOTAL_SIZE 3344 Number of values in the quantizer matrix
MAX_ANGLE_DELTA 3 Maximum magnitude of AngleDeltaY and AngleDeltaUV
DIRECTIONAL_MODES 8 Number of directional intra modes
ANGLE_STEP 3 Number of degrees of step per unit increase in AngleDeltaY or AngleDeltaUV.
TX_SET_TYPES_INTRA 3 Number of intra transform set types
TX_SET_TYPES_INTER 4 Number of inter transform set types
WARPEDMODEL_PREC_BITS 16 Internal precision of warped motion models
IDENTITY 0 Warp model is just an identity transform
TRANSLATION 1 Warp model is a pure translation
ROTZOOM 2 Warp model is a rotation + symmetric zoom + translation
AFFINE 3 Warp model is a general affine transform
GM_ABS_TRANS_BITS 12 Number of bits encoded for translational components of global motion models, if part of a ROTZOOM or AFFINE model
GM_ABS_TRANS_ONLY_BITS 9 Number of bits encoded for translational components of global motion models, if part of a TRANSLATION model
GM_ABS_ALPHA_BITS 12 Number of bits encoded for non-translational components of global motion models
DIV_LUT_PREC_BITS 14 Number of fractional bits of entries in divisor lookup table
DIV_LUT_BITS 8 Number of fractional bits for lookup in divisor lookup table
DIV_LUT_NUM 257 Number of entries in divisor lookup table
MOTION_MODES 3 Number of values for motion modes
SIMPLE 0 Use translation or global motion compensation
OBMC 1 Use overlapped block motion compensation
LOCALWARP 2 Use local warp motion compensation
LEAST_SQUARES_SAMPLES_MAX 8 Largest number of samples used when computing a local warp
LS_MV_MAX 256 Largest motion vector difference to include in local warp computation
WARPEDMODEL_TRANS_CLAMP 1<<23 Clamping value used for translation components of warp
WARPEDMODEL_NONDIAGAFFINE_CLAMP 1<<13 Clamping value used for matrix components of warp
WARPEDPIXEL_PREC_SHIFTS 1<<6 Number of phases used in warped filtering
WARPEDDIFF_PREC_BITS 10 Number of extra bits of precision in warped filtering
GM_ALPHA_PREC_BITS 15 Number of fractional bits for sending non-translational warp model coefficients
GM_TRANS_PREC_BITS 6 Number of fractional bits for sending translational warp model coefficients
GM_TRANS_ONLY_PREC_BITS 3 Number of fractional bits used for pure translational warps
INTERINTRA_MODES 4 Number of inter intra modes
MASK_MASTER_SIZE 64 Size of MasterMask array
SEGMENT_ID_PREDICTED_CONTEXTS 3 Number of contexts for segment_id_predicted
FWD_REFS 4 Number of syntax elements for forward reference frames
BWD_REFS 3 Number of syntax elements for backward reference frames
SINGLE_REFS 7 Number of syntax elements for single reference frames
UNIDIR_COMP_REFS 4 Number of syntax elements for unidirectional compound reference frames
COMPOUND_TYPES 2 Number of values for compound_type
CFL_JOINT_SIGNS 8 Number of values for cfl_alpha_signs
CFL_ALPHABET_SIZE 16 Number of values for cfl_alpha_u and cfl_alpha_v
COMP_INTER_CONTEXTS 5 Number of contexts for comp_mode
COMP_REF_TYPE_CONTEXTS 5 Number of contexts for comp_ref_type
CFL_ALPHA_CONTEXTS 6 Number of contexts for cfl_alpha_u and cfl_alpha_v
INTRA_MODE_CONTEXTS 5 Number of each of left and above contexts for intra_frame_y_mode
COMP_GROUP_IDX_CONTEXTS 6 Number of contexts for comp_group_idx
COMPOUND_IDX_CONTEXTS 6 Number of contexts for compound_idx
INTRA_EDGE_KERNELS 3 Number of filter kernels for the intra edge filter
INTRA_EDGE_TAPS 5 Number of kernel taps for the intra edge filter
FRAME_LF_COUNT 4 Number of loop filter strength values
MAX_VARTX_DEPTH 2 Maximum depth for variable transform trees
TXFM_PARTITION_CONTEXTS 21 Number of contexts for txfm_split
REF_CAT_LEVEL 640 Bonus weight for close motion vectors
MAX_REF_MV_STACK_SIZE 8 Maximum number of motion vectors in the stack
MFMV_STACK_SIZE 3 Stack size for motion field motion vectors
MAX_TX_DEPTH 2 Maximum times the transform can be split
WEDGE_TYPES 16 Number of directions for the wedge mask process
FILTER_BITS 7 Number of bits used in Wiener filter coefficients
WIENER_COEFFS 3 Number of Wiener filter coefficients to read
SGRPROJ_PARAMS_BITS 4 Number of bits needed to specify self guided filter set
SGRPROJ_PRJ_SUBEXP_K 4 Controls how self guided deltas are read
SGRPROJ_PRJ_BITS 7 Precision bits during self guided restoration
SGRPROJ_RST_BITS 4 Restoration precision bits generated higher than source before projection
SGRPROJ_MTABLE_BITS 20 Precision of mtable division table
SGRPROJ_RECIP_BITS 12 Precision of division by n table
SGRPROJ_SGR_BITS 8 Internal precision bits for core selfguided_restoration
EC_PROB_SHIFT 6 Number of bits to reduce CDF precision during arithmetic coding
EC_MIN_PROB 4 Minimum probability assigned to each symbol during arithmetic coding
SELECT_SCREEN_CONTENT_TOOLS 2 Value that indicates the allow_screen_content_tools syntax element is coded
SELECT_INTEGER_MV 2 Value that indicates the force_integer_mv syntax element is coded
RESTORATION_TILESIZE_MAX 256 Maximum size of a loop restoration tile
MAX_FRAME_DISTANCE 31 Maximum distance when computing weighted prediction
MAX_OFFSET_WIDTH 8 Maximum horizontal offset of a projected motion vector
MAX_OFFSET_HEIGHT 0 Maximum vertical offset of a projected motion vector
WARP_PARAM_REDUCE_BITS 6 Rounding bitwidth for the parameters to the shear process
NUM_BASE_LEVELS 2 Number of quantizer base levels
COEFF_BASE_RANGE 12 The quantizer range above NUM_BASE_LEVELS above which the Exp-Golomb coding process is activated
BR_CDF_SIZE 4 Number of values for coeff_br
SIG_COEF_CONTEXTS_EOB 4 Number of contexts for coeff_base_eob
SIG_COEF_CONTEXTS_2D 26 Context offset for coeff_base for horizontal-only or vertical-only transforms.
SIG_COEF_CONTEXTS 42 Number of contexts for coeff_base
SIG_REF_DIFF_OFFSET_NUM 5 Maximum number of context samples to be used in determining the context index for coeff_base and coeff_base_eob.
SUPERRES_NUM 8 Numerator for upscaling ratio
SUPERRES_DENOM_MIN 9 Smallest denominator for upscaling ratio
SUPERRES_DENOM_BITS 3 Number of bits sent to specify denominator of upscaling ratio
SUPERRES_FILTER_BITS 6 Number of bits of fractional precision for upscaling filter selection
SUPERRES_FILTER_SHIFTS 1 << SUPERRES_FILTER_BITS Number of phases of upscaling filters
SUPERRES_FILTER_TAPS 8 Number of taps of upscaling filters
SUPERRES_FILTER_OFFSET 3 Sample offset for upscaling filters
SUPERRES_SCALE_BITS 14 Number of fractional bits for computing position in upscaling
SUPERRES_SCALE_MASK (1 << 14) - 1 Mask for computing position in upscaling
SUPERRES_EXTRA_BITS 8 Difference in precision between SUPERRES_SCALE_BITS and SUPERRES_FILTER_BITS
TXB_SKIP_CONTEXTS 13 Number of contexts for all_zero
EOB_COEF_CONTEXTS 9 Number of contexts for eob_extra
DC_SIGN_CONTEXTS 3 Number of contexts for dc_sign
LEVEL_CONTEXTS 21 Number of contexts for coeff_br
TX_CLASS_2D 0 Transform class for transform types performing non-identity transforms in both directions
TX_CLASS_HORIZ 1 Transform class for transforms performing only a horizontal non-identity transform
TX_CLASS_VERT 2 Transform class for transforms performing only a vertical non-identity transform
REFMVS_LIMIT ( 1 << 12 ) - 1 Largest reference MV component that can be saved
INTRA_FILTER_SCALE_BITS 4 Scaling shift for intra filtering process
INTRA_FILTER_MODES 5 Number of types of intra filtering
COEFF_CDF_Q_CTXS 4 Number of selectable context types for the coeff( ) syntax structure
PRIMARY_REF_NONE 7 Value of primary_ref_frame indicating that there is no primary reference frame
BUFFER_POOL_MAX_SIZE 10 Number of frames in buffer pool

Conventions

General

The mathematical operators and their precedence rules used to describe this Specification are similar to those used in the C programming language. However, the operation of integer division with truncation is specifically defined.

In addition, a length 2 array used to hold a motion vector (indicated by the variable name ending with the letters Mv or Mvs) can be accessed using either array notation (e.g. Mv[ 0 ] and Mv[ 1 ]), or by just the name (e.g., Mv). The only operations defined when using the name are assignment and equality/inequality testing. Assignment of an array is represented using the notation A = B and is specified to mean the same as doing both the individual assignments A[ 0 ] = B[ 0 ] and A[ 1 ] = B[ 1 ]. Equality testing of 2 motion vectors is represented using the notation A == B and is specified to mean the same as (A[ 0 ] == B[ 0 ] && A[ 1 ] == B[ 1 ]). Inequality testing is defined as A != B and is specified to mean the same as (A[ 0 ] != B[ 0 ] || A[ 1 ] != B[ 1 ]).
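As a non-normative illustration, these array conventions can be modeled in Python with length-2 lists (the variable names below are invented for the example):

```python
# Illustrative only: a spec motion vector modeled as a length-2 list.
CandidateMv = [3, -1]

# "A = B" means copying both components:
Mv = list(CandidateMv)      # Mv[0] = CandidateMv[0]; Mv[1] = CandidateMv[1]
print(Mv == CandidateMv)    # True: equality is component-wise

Mv[1] = 5
print(Mv != CandidateMv)    # True: the vectors differ in at least one component
```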

When a variable is said to be representable by a signed integer with x bits, it means that the variable is greater than or equal to -(1 << (x-1)), and that the variable is less than or equal to (1 << (x-1))-1.

The key words “must”, “must not”, “required”, “shall”, “shall not”, “should”, “should not”, “recommended”, “may”, and “optional” in this document are to be interpreted as described in RFC 2119.

Arithmetic operators

+ Addition
- Subtraction (as a binary operator) or negation (as a unary prefix operator)
* Multiplication
/ Integer division with truncation of the result toward zero. For example, 7/4 and -7/-4 are truncated to 1 and -7/4 and 7/-4 are truncated to -1.
a % b Remainder from division of a by b. Both a and b are positive integers.
÷ Floating point (arithmetical) division.
ceil(x) The smallest integer that is greater than or equal to x.
floor(x) The largest integer that is less than or equal to x.
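Because many languages floor their integer division, the truncating behavior of the / operator above is worth pinning down. A small non-normative Python sketch (Python's // operator floors, so the sign must be handled separately):

```python
def spec_div(a, b):
    """Integer division truncating toward zero, as the spec's "/" operator.

    Python's // floors instead (-7 // 4 == -2), so divide the magnitudes
    and reapply the sign to match the spec's -7 / 4 == -1.
    """
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q

print(spec_div(7, 4), spec_div(-7, -4))   # 1 1
print(spec_div(-7, 4), spec_div(7, -4))   # -1 -1
```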

Logical operators

a && b Logical AND operation between a and b
a || b Logical OR operation between a and b
! Logical NOT operation.

Relational operators

> Greater than
>= Greater than or equal to
< Less than
<= Less than or equal to
== Equal to
!= Not equal to

Bitwise operators

& AND operation
| OR operation
^ XOR operation
~ Bitwise NOT (complement) operation
a >> b Shift a in 2’s complement binary integer representation format to the right by b bit positions. This operator is only used with b being a non-negative integer. Bits shifted into the MSBs as a result of the right shift have a value equal to the MSB of a prior to the shift operation.
a << b Shift a in 2’s complement binary integer representation format to the left by b bit positions. This operator is only used with b being a non-negative integer. Bits shifted into the LSBs as a result of the left shift have a value equal to 0.
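The sign-extension rule for a >> b can be spot-checked directly. Note that in C, right-shifting a negative signed integer is implementation-defined, so portable implementations often emulate this behavior explicitly; Python's >> on integers is already arithmetic and matches the convention above:

```python
# -7 in 2's complement is ...11111001; shifting right by 1 replicates the MSB.
print(-7 >> 1)   # -4  (...11111100)
print(7 >> 1)    # 3
print(3 << 2)    # 12  (zeros shifted into the LSBs)
```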

Assignment

= Assignment operator
++ Increment, x++ is equivalent to x = x + 1. When this operator is used for an array index, the variable value is obtained before the auto increment operation
-- Decrement, x-- is equivalent to x = x - 1. When this operator is used for an array index, the variable value is obtained before the auto decrement operation
+= Addition assignment operator, for example x += 3 corresponds to x = x + 3
-= Subtraction assignment operator, for example x -= 3 corresponds to x = x - 3

Mathematical functions

The following mathematical functions (Abs, Clip3, Clip1, Min, Max, Round2 and Round2Signed) are defined as follows:

Abs( x ) is equal to x when x >= 0, and to -x otherwise.

Clip3( x, y, z ) is equal to x when z < x, to y when z > y, and to z otherwise (i.e. z clamped to the range [ x, y ]).

Clip1( x ) is equal to Clip3( 0, ( 1 << BitDepth ) - 1, x ).

Min( x, y ) is equal to the smaller of x and y; Max( x, y ) is equal to the larger of x and y.

Round2( x, n ) is equal to floor( ( x + 2^( n - 1 ) ) ÷ 2^n ).

Round2Signed( x, n ) is equal to Round2( x, n ) when x >= 0, and to -Round2( -x, n ) otherwise.

The definition of Round2 uses standard mathematical power and division operations, not integer operations. An equivalent definition using integer operations is:

Round2( x, n ) {
  if ( n == 0 )
    return x
  return ( x + ( 1 << (n - 1) ) ) >> n
}
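The integer-only form translates directly to Python, whose >> is an arithmetic shift and so also matches for negative x. The sketch below is non-normative; it also includes Round2Signed (named in this section) under its usual sign-restored definition, which is labeled as an assumption in the comments:

```python
def Round2(x, n):
    # Rounded right shift: add half of 2**n, then shift down by n bits.
    if n == 0:
        return x
    return (x + (1 << (n - 1))) >> n

def Round2Signed(x, n):
    # Assumed definition: round the magnitude with Round2, restore the sign.
    return Round2(x, n) if x >= 0 else -Round2(-x, n)

print(Round2(9, 2))          # (9 + 2) >> 2 == 2
print(Round2Signed(-9, 2))   # -2
```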

The FloorLog2(x) function is defined to be the floor of the base 2 logarithm of the input x.

The input x will always be an integer, and will always be greater than or equal to 1.

This function extracts the location of the most significant bit in x.

An equivalent definition (using the pseudo-code notation introduced in the following section) is:

FloorLog2( x ) {
  s = 0
  while ( x != 0 ) {
    x = x >> 1
    s++
  }
  return s - 1
}

The CeilLog2(x) function is defined to be the ceiling of the base 2 logarithm of the input x (when x is 0, it is defined to return 0).

The input x will always be an integer, and will always be greater than or equal to 0.

This function extracts the number of bits needed to code a value in the range 0 to x-1.

An equivalent definition (using the pseudo-code notation introduced in the following section) is:

CeilLog2( x ) {
  if ( x < 2 )
    return 0
  i = 1
  p = 2
  while ( p < x ) {
    i++
    p = p << 1
  }
  return i
}
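Both helpers are easy to cross-check; a direct, non-normative Python transcription of the two pseudo-code definitions:

```python
def FloorLog2(x):
    # Position of the most significant set bit; input x >= 1.
    s = 0
    while x != 0:
        x >>= 1
        s += 1
    return s - 1

def CeilLog2(x):
    # Number of bits needed to code a value in the range 0 to x - 1.
    if x < 2:
        return 0
    i, p = 1, 2
    while p < x:
        i += 1
        p <<= 1
    return i

print([FloorLog2(v) for v in (1, 2, 3, 8, 9)])    # [0, 1, 1, 3, 3]
print([CeilLog2(v) for v in (0, 1, 2, 3, 8, 9)])  # [0, 0, 1, 2, 3, 4]
```

For x >= 1, FloorLog2(x) coincides with Python's x.bit_length() - 1.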

Method of describing bitstream syntax

The description style of the syntax is similar to the C programming language. Syntax elements in the bitstream are represented in bold type. Each syntax element is described by its name (using only lower case letters with underscore characters) and a descriptor for its method of coded representation. The decoding process behaves according to the value of the syntax element and to the values of previously decoded syntax elements. When a value of a syntax element is used in the syntax tables or the text, it appears in regular (i.e. not bold) type. If the value of a syntax element is being computed (e.g. being written with a default value instead of being coded in the bitstream), it also appears in regular type (e.g. tile_size_minus_1).

In some cases the syntax tables may use the values of other variables derived from syntax element values. Such variables appear in the syntax tables, or text, named by a mixture of lower case and upper case letters and without any underscore characters. Variables starting with an upper case letter are derived for the decoding of the current syntax structure and all dependent syntax structures. These variables may be used in the decoding process for later syntax structures. Variables starting with a lower case letter are only used within the process from which they are derived. (Single character variables are allowed.)

Constant values appear in all upper case letters with underscore characters (e.g. MI_SIZE).

Constant lookup tables appear as words (with the first letter of each word in upper case, and remaining letters in lower case) separated with underscore characters (e.g. Block_Width[…]).

Hexadecimal notation, indicated by prefixing the hexadecimal number by 0x, may be used when the number of bits is an integer multiple of 4. For example, 0x1a represents a bit string 0001 1010.

Binary notation is indicated by prefixing the binary number by 0b. For example, 0b00011010 represents a bit string 0001 1010. Binary numbers may include underscore characters to enhance readability. If present, the underscore characters appear every 4 binary digits starting from the LSB. For example, 0b11010 may also be written as 0b1_1010.
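Incidentally, these notations coincide with Python's numeric literals (including the optional underscore grouping), so the examples above can be checked directly:

```python
# 0x1a, 0b0001_1010 and 0b1_1010 all denote the bit string 0001 1010.
print(0x1a == 0b0001_1010 == 0b1_1010 == 26)   # True
```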

A value equal to 0 represents a FALSE condition in a test statement. The value TRUE is represented by any value not equal to 0.

The following table lists examples of the syntax specification format. When syntax_element appears (with bold face font), it specifies that this syntax element is parsed from the bitstream.

                                                           Type
/* A statement can be a syntax element with associated  
descriptor or can be an expression used to specify its  
existence, type, and value, as in the following  
examples */  
                                                            
syntax_element f(1)
   
/* A group of statements enclosed in brackets is a  
compound statement and is treated functionally as a single  
statement. */  
   
{  
    statement  
    …  
}  
   
/* A “while” structure specifies that the statement is  
to be evaluated repeatedly while the condition remains  
true. */  
   
while ( condition )  
    statement  
   
/* A “do .. while” structure executes the statement once,  
and then tests the condition. It repeatedly evaluates the  
statement while the condition remains true. */  
   
do  
    statement  
while ( condition )  
   
/* An “if .. else” structure tests the condition first. If  
it is true, the primary statement is evaluated. Otherwise,  
the alternative statement is evaluated. If the alternative  
statement does not need to be evaluated, the “else” and the  
corresponding alternative statement can be omitted. */  
   
if ( condition )  
    primary statement  
else  
    alternative statement  
   
/* A “for” structure evaluates the initial statement at the  
beginning then tests the condition. If it is true, the primary  
and subsequent statements are evaluated until the condition  
becomes false. */  
   
for ( initial statement; condition; subsequent statement )  
    primary statement  
   
/* The return statement in a syntax structure specifies  
that the parsing of the syntax structure will be terminated  
without processing any additional information after this stage.  
When a value immediately follows a return statement, this value  
shall also be returned as the output of this syntax structure. */  
   
return x  

Functions

Bitstream functions used for syntax description are specified in this section.

Other functions are included in the syntax tables. The convention is that a section is called syntax if it causes syntax elements to be read from the bitstream, either directly or indirectly through subprocesses. The remaining sections are called functions.

The specification of these functions makes use of a bitstream position indicator. This bitstream position indicator locates the position of the bit that is going to be read next.

get_position( ): Return the value of the bitstream position indicator.

init_symbol( sz ): Initialize the arithmetic decode process for the Symbol decoder with a size of sz bytes as specified in section 8.2.2.

exit_symbol( ): Exit the arithmetic decode process as described in section 8.2.4 (this includes reading trailing bits).

Descriptors

General

The following descriptors specify the parsing of syntax elements. Lower case descriptors specify syntax elements that are represented by an integer number of bits in the bitstream; upper case descriptors specify syntax elements that are represented by arithmetic coding.

f(n)

Unsigned n-bit number appearing directly in the bitstream. The bits are read from high to low order. The parsing process specified in section 8.1 is invoked and the syntax element is set equal to the return value.
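Note: As an informative illustration (not part of this specification), the f(n) descriptor can be sketched in Python. BitReader is a hypothetical helper that maintains the bitstream position indicator; the actual parsing process is specified in section 8.1.

```python
class BitReader:
    """Hypothetical helper tracking the bitstream position indicator (in bits)."""
    def __init__(self, data):
        self.data = data  # bytes object holding the bitstream
        self.pos = 0      # position of the bit that will be read next

    def f(self, n):
        """Read an unsigned n-bit number, high-order bit first."""
        x = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            x = (x << 1) | bit
            self.pos += 1
        return x

r = BitReader(bytes([0b10110100]))
assert r.f(3) == 0b101    # first three bits, read high to low
assert r.f(5) == 0b10100  # remaining five bits
```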

uvlc()

Variable length unsigned n-bit number appearing directly in the bitstream. The parsing process for this descriptor is specified below:

uvlc() { Type
    leadingZeros = 0  
    while ( 1 ) {  
        done f(1)
        if ( done )  
            break  
        leadingZeros++  
    }  
    if ( leadingZeros >= 32 ) {  
        return ( 1 << 32 ) - 1  
    }  
    value f(leadingZeros)
    return value + ( 1 << leadingZeros ) - 1  
}  
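Note: As an informative illustration, the uvlc() parsing process above can be sketched in Python. The read_bit callback stands in for the f(1) descriptor and is an assumption of this sketch, not part of the specification.

```python
def uvlc(read_bit):
    """Decode a uvlc() value; read_bit() returns the next bit (0 or 1)."""
    leading_zeros = 0
    while read_bit() == 0:          # count zeros until the terminating 1
        leading_zeros += 1
    if leading_zeros >= 32:
        return (1 << 32) - 1
    value = 0
    for _ in range(leading_zeros):  # value = f(leadingZeros)
        value = (value << 1) | read_bit()
    return value + (1 << leading_zeros) - 1
```

For example, the bit string 0b00101 has two leading zeros followed by the value bits 0b01, giving 1 + (1 << 2) - 1 = 4.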

le(n)

Unsigned little-endian n-byte number appearing directly in the bitstream. The parsing process for this descriptor is specified below:

le(n) { Type
    t = 0  
    for ( i = 0; i < n; i++) {  
        byte f(8)
        t += ( byte << ( i * 8 ) )  
    }  
    return t  
}  

Note: This syntax element will only be present when the bitstream position is byte aligned.
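Note: As an informative illustration, the le(n) accumulation above is the usual little-endian byte order and can be sketched in Python:

```python
def le(data, n):
    """Decode an unsigned little-endian n-byte number from 'data' (bytes)."""
    t = 0
    for i in range(n):
        t += data[i] << (i * 8)  # byte i contributes bit positions i*8 .. i*8+7
    return t

# Equivalent to Python's built-in int.from_bytes(data[:n], "little")
```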

leb128()

Unsigned integer represented by a variable number of little-endian bytes.

Note: This syntax element will only be present when the bitstream position is byte aligned.

In this encoding, the most significant bit of each byte is equal to 1 to signal that more bytes should be read, or equal to 0 to signal the end of the encoding.

A variable Leb128Bytes is set equal to the number of bytes read during this process.

The parsing process for this descriptor is specified below:

leb128() { Type
    value = 0  
    Leb128Bytes = 0  
    for ( i = 0; i < 8; i++ ) {  
        leb128_byte f(8)
        value |= ( (leb128_byte & 0x7f) << (i*7) )  
        Leb128Bytes += 1  
        if ( !(leb128_byte & 0x80) ) {  
            break  
        }  
    }  
    return value  
}  

It is a requirement of bitstream conformance that the value returned from the leb128 parsing process is less than or equal to (1 << 32) - 1.

leb128_byte contains 8 bits read from the bitstream. The bottom 7 bits are used to compute the variable value. The most significant bit is used to indicate that there are more bytes to be read.

It is a requirement of bitstream conformance that the most significant bit of leb128_byte is equal to 0 if i is equal to 7. (This ensures that this syntax descriptor never uses more than 8 bytes.)

Note: There are multiple ways of encoding the same value depending on how many leading zero bits are encoded. There is no requirement that this syntax descriptor uses the most compressed representation. This can be useful for encoder implementations by allowing a fixed amount of space to be filled in later when the value becomes known.
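Note: As an informative illustration, the leb128() parsing process can be sketched in Python. Returning the byte count alongside the value is a convenience standing in for the Leb128Bytes variable; the ValueError models the conformance requirement that at most 8 bytes are used.

```python
def leb128(data):
    """Decode a leb128() value from 'data' (bytes); returns (value, Leb128Bytes)."""
    value = 0
    for i in range(8):                  # never more than 8 bytes
        b = data[i]
        value |= (b & 0x7F) << (i * 7)  # bottom 7 bits carry the payload
        if not (b & 0x80):              # MSB clear signals the last byte
            return value, i + 1
    raise ValueError("non-conformant: leb128 value uses more than 8 bytes")
```

For example, the bytes 0xE5 0x8E 0x26 decode to 624485, and the non-minimal encoding 0x81 0x00 decodes to 1 in two bytes, illustrating the note above.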

su(n)

Signed integer converted from an n-bit unsigned integer in the bitstream. (The unsigned integer corresponds to the bottom n bits of the signed integer.) The parsing process for this descriptor is specified below:

su(n) { Type
    value f(n)
    signMask = 1 << (n - 1)  
    if ( value & signMask )  
        value = value - 2 * signMask  
    return value  
}  
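Note: As an informative illustration, the sign extension performed by su(n) can be sketched in Python, applied to an unsigned value already read with f(n):

```python
def su(value, n):
    """Interpret the bottom n bits of 'value' as a two's-complement number."""
    sign_mask = 1 << (n - 1)
    if value & sign_mask:          # sign bit set: subtract 2^n
        value -= 2 * sign_mask
    return value
```

For example, with n equal to 3 the bit patterns 0b011, 0b100, and 0b111 decode to 3, -4, and -1 respectively.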

ns(n)

Unsigned encoded integer with maximum number of values n (i.e. output in range 0..n-1).

This descriptor is similar to f(CeilLog2(n)), but reduces wastage incurred when encoding non-power of two value ranges by encoding one fewer bit for the lower part of the value range. For example, when n is equal to 5, the encodings are as follows (full binary encodings are also presented for comparison):

Value Full binary encoding ns(n) encoding
    0 000 00
    1 001 01
    2 010 10
    3 011 110
    4 100 111

The parsing process for this descriptor is specified as:

ns( n ) { Type
    w = FloorLog2(n) + 1  
    m = (1 << w) - n  
    v f(w - 1)
    if ( v < m )  
        return v  
    extra_bit f(1)
    return (v << 1) - m + extra_bit  
}  

The abbreviation ns stands for non-symmetric. This encoding is non-symmetric because the values are not all coded with the same number of bits.
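Note: As an informative illustration, the ns(n) parsing process can be sketched in Python. The read_bit callback stands in for the f() descriptor; Python's int.bit_length() equals FloorLog2(n) + 1 for n >= 1.

```python
def ns(read_bit, n):
    """Decode an ns(n) value in the range 0..n-1; read_bit() yields one bit."""
    w = n.bit_length()          # FloorLog2(n) + 1 for n >= 1
    m = (1 << w) - n            # number of short (w-1 bit) codewords
    v = 0
    for _ in range(w - 1):      # v = f(w - 1), read high to low
        v = (v << 1) | read_bit()
    if v < m:
        return v                # short codeword
    extra_bit = read_bit()      # long codeword: one extra bit
    return (v << 1) - m + extra_bit
```

For n equal to 5 this reproduces the table above: "00" decodes to 0, "110" to 3, and "111" to 4.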

L(n)

Unsigned arithmetic encoded n-bit number encoded as n flags (a “literal”). The flags are read from high to low order. The syntax element is set equal to the return value of read_literal( n ) (see section 8.2.5 for a specification of this process).

S()

An arithmetic encoded symbol coded from a small alphabet of at most 16 entries.

The symbol is decoded based on a context sensitive CDF (see section 8.3 for the specification of this process).

NS(n)

Unsigned arithmetic encoded integer with maximum number of values n (i.e. output in range 0..n-1).

This descriptor is the same as ns(n) except the underlying bits are coded arithmetically.

The parsing process for this descriptor is specified as:

NS( n ) { Type
    w = FloorLog2(n) + 1  
    m = (1 << w) - n  
    v L(w - 1)
    if ( v < m )  
        return v  
    extra_bit L(1)
    return (v << 1) - m + extra_bit  
}  

Syntax structures

General

This section presents the syntax structures in a tabular form. The meaning of each of the syntax elements is presented in Section 6.

Low overhead bitstream format

This specification defines a low-overhead bitstream format as a sequence of the OBU syntactical elements defined in this section. When using this format, obu_has_size_field must be equal to 1. For applications requiring a format where it is easier to skip through frames or temporal units, a length-delimited bitstream format is defined in Annex B.

Derived specifications, such as container formats enabling storage of AV1 videos together with audio or subtitles, should indicate which of these formats they rely on. Other methods of packing OBUs into a bitstream format are also allowed.

OBU syntax

General OBU syntax

open_bitstream_unit( sz ) { Type
    obu_header()  
    if ( obu_has_size_field ) {  
        obu_size leb128()
    } else {  
        obu_size = sz - 1 - obu_extension_flag  
    }  
    startPosition = get_position( )  
    if ( obu_type != OBU_SEQUENCE_HEADER &&  
         obu_type != OBU_TEMPORAL_DELIMITER &&  
         OperatingPointIdc != 0 &&  
         obu_extension_flag == 1 )  
    {  
        inTemporalLayer = (OperatingPointIdc >> temporal_id ) & 1  
        inSpatialLayer = (OperatingPointIdc >> ( spatial_id + 8 ) ) & 1  
        if ( !inTemporalLayer || ! inSpatialLayer ) {  
            drop_obu( )  
            return  
        }  
    }  
    if ( obu_type == OBU_SEQUENCE_HEADER )  
        sequence_header_obu( )  
    else if ( obu_type == OBU_TEMPORAL_DELIMITER )  
        temporal_delimiter_obu( )  
    else if ( obu_type == OBU_FRAME_HEADER )  
        frame_header_obu( )  
    else if ( obu_type == OBU_REDUNDANT_FRAME_HEADER )  
        frame_header_obu( )  
    else if ( obu_type == OBU_TILE_GROUP )  
        tile_group_obu( obu_size )  
    else if ( obu_type == OBU_METADATA )  
        metadata_obu( )  
    else if ( obu_type == OBU_FRAME )  
        frame_obu( obu_size )  
    else if ( obu_type == OBU_TILE_LIST )  
        tile_list_obu( )  
    else if ( obu_type == OBU_PADDING )  
        padding_obu( )  
    else  
        reserved_obu( )  
    currentPosition = get_position( )  
    payloadBits = currentPosition - startPosition  
    if ( obu_size > 0 && obu_type != OBU_TILE_GROUP &&  
         obu_type != OBU_TILE_LIST &&  
         obu_type != OBU_FRAME ) {  
        trailing_bits( obu_size * 8 - payloadBits )  
    }  
}  

OBU header syntax

obu_header() { Type
    obu_forbidden_bit f(1)
    obu_type f(4)
    obu_extension_flag f(1)
    obu_has_size_field f(1)
    obu_reserved_1bit f(1)
    if ( obu_extension_flag == 1 )  
        obu_extension_header()  
}  

OBU extension header syntax

obu_extension_header() { Type
    temporal_id f(3)
    spatial_id f(2)
    extension_header_reserved_3bits f(3)
}  
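Note: As an informative illustration, the fixed-layout obu_header() and obu_extension_header() syntax above can be parsed with plain shifts and masks. The sketch below is an assumption-laden convenience (dictionary output, no error checking), not a normative parser; the numeric obu_type values are defined elsewhere in this specification.

```python
def parse_obu_header(data):
    """Parse the one- or two-byte OBU header from 'data' (bytes)."""
    b = data[0]
    hdr = {
        "obu_forbidden_bit":  (b >> 7) & 1,   # f(1)
        "obu_type":           (b >> 3) & 0xF, # f(4)
        "obu_extension_flag": (b >> 2) & 1,   # f(1)
        "obu_has_size_field": (b >> 1) & 1,   # f(1)
        "obu_reserved_1bit":  b & 1,          # f(1)
    }
    if hdr["obu_extension_flag"]:
        e = data[1]
        hdr["temporal_id"] = (e >> 5) & 7     # f(3)
        hdr["spatial_id"]  = (e >> 3) & 3     # f(2)
    return hdr
```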

Trailing bits syntax

trailing_bits( nbBits ) { Type
    trailing_one_bit f(1)
    nbBits--  
    while ( nbBits > 0 ) {  
        trailing_zero_bit f(1)
        nbBits--  
    }  
}  

Byte alignment syntax

byte_alignment( ) { Type
    while ( get_position( ) & 7 )  
        zero_bit f(1)
}  

Reserved OBU syntax

reserved_obu( ) { Type
}  

Note: Reserved OBUs do not have a defined syntax. The obu_type reserved values are reserved for future use. Decoders should ignore the entire OBU if they do not understand the obu_type. Ignoring the OBU can be done based on obu_size. The last byte of the valid content of the payload data for this OBU type is considered to be the last byte that is not equal to zero. This rule is to prevent the dropping of valid bytes by systems that interpret trailing zero bytes as a continuation of the trailing bits in an OBU. This implies that when any payload data is present for this OBU type, at least one byte of the payload data (including the trailing bit) shall not be equal to 0.

Sequence header OBU syntax

General sequence header OBU syntax

sequence_header_obu( ) { Type
    seq_profile f(3)
    still_picture f(1)
    reduced_still_picture_header f(1)
    if ( reduced_still_picture_header ) {  
        timing_info_present_flag = 0  
        decoder_model_info_present_flag = 0  
        initial_display_delay_present_flag = 0  
        operating_points_cnt_minus_1 = 0  
        operating_point_idc[ 0 ] = 0  
        seq_level_idx[ 0 ] f(5)
        seq_tier[ 0 ] = 0  
        decoder_model_present_for_this_op[ 0 ] = 0  
        initial_display_delay_present_for_this_op[ 0 ] = 0  
    } else {  
        timing_info_present_flag f(1)
        if ( timing_info_present_flag ) {  
            timing_info( )  
            decoder_model_info_present_flag f(1)
            if ( decoder_model_info_present_flag ) {  
                decoder_model_info( )  
            }  
        } else {  
            decoder_model_info_present_flag = 0  
        }  
        initial_display_delay_present_flag f(1)
        operating_points_cnt_minus_1 f(5)
        for ( i = 0; i <= operating_points_cnt_minus_1; i++ ) {  
            operating_point_idc[ i ] f(12)
            seq_level_idx[ i ] f(5)
            if ( seq_level_idx[ i ] > 7 ) {  
                seq_tier[ i ] f(1)
            } else {  
                seq_tier[ i ] = 0  
            }  
            if ( decoder_model_info_present_flag ) {  
                decoder_model_present_for_this_op[ i ] f(1)
                if ( decoder_model_present_for_this_op[ i ] ) {  
                    operating_parameters_info( i )  
                }  
            } else {  
                decoder_model_present_for_this_op[ i ] = 0  
            }  
            if ( initial_display_delay_present_flag ) {  
                initial_display_delay_present_for_this_op[ i ] f(1)
                if ( initial_display_delay_present_for_this_op[ i ] ) {  
                    initial_display_delay_minus_1[ i ] f(4)
                }  
            }  
        }  
    }  
    operatingPoint = choose_operating_point( )  
    OperatingPointIdc = operating_point_idc[ operatingPoint ]  
    frame_width_bits_minus_1 f(4)
    frame_height_bits_minus_1 f(4)
    n = frame_width_bits_minus_1 + 1  
    max_frame_width_minus_1 f(n)
    n = frame_height_bits_minus_1 + 1  
    max_frame_height_minus_1 f(n)
    if ( reduced_still_picture_header )  
        frame_id_numbers_present_flag = 0  
    else  
        frame_id_numbers_present_flag f(1)
    if ( frame_id_numbers_present_flag ) {  
        delta_frame_id_length_minus_2 f(4)
        additional_frame_id_length_minus_1 f(3)
    }  
    use_128x128_superblock f(1)
    enable_filter_intra f(1)
    enable_intra_edge_filter f(1)
    if ( reduced_still_picture_header ) {  
        enable_interintra_compound = 0  
        enable_masked_compound = 0  
        enable_warped_motion = 0  
        enable_dual_filter = 0  
        enable_order_hint = 0  
        enable_jnt_comp = 0  
        enable_ref_frame_mvs = 0  
        seq_force_screen_content_tools = SELECT_SCREEN_CONTENT_TOOLS  
        seq_force_integer_mv = SELECT_INTEGER_MV  
        OrderHintBits = 0  
    } else {  
        enable_interintra_compound f(1)
        enable_masked_compound f(1)
        enable_warped_motion f(1)
        enable_dual_filter f(1)
        enable_order_hint f(1)
        if ( enable_order_hint ) {  
            enable_jnt_comp f(1)
            enable_ref_frame_mvs f(1)
        } else {  
            enable_jnt_comp = 0  
            enable_ref_frame_mvs = 0  
        }  
        seq_choose_screen_content_tools f(1)
        if ( seq_choose_screen_content_tools ) {  
            seq_force_screen_content_tools = SELECT_SCREEN_CONTENT_TOOLS  
        } else {  
            seq_force_screen_content_tools f(1)
        }  
                                                                
        if ( seq_force_screen_content_tools > 0 ) {  
            seq_choose_integer_mv f(1)
            if ( seq_choose_integer_mv ) {  
                seq_force_integer_mv = SELECT_INTEGER_MV  
            } else {  
                seq_force_integer_mv f(1)
            }  
        } else {  
            seq_force_integer_mv = SELECT_INTEGER_MV  
        }  
        if ( enable_order_hint ) {  
            order_hint_bits_minus_1 f(3)
            OrderHintBits = order_hint_bits_minus_1 + 1  
        } else {  
            OrderHintBits = 0  
        }  
    }  
    enable_superres f(1)
    enable_cdef f(1)
    enable_restoration f(1)
    color_config( )  
    film_grain_params_present f(1)
}  

Color config syntax

color_config( ) { Type
    high_bitdepth f(1)
    if ( seq_profile == 2 && high_bitdepth ) {  
        twelve_bit f(1)
        BitDepth = twelve_bit ? 12 : 10  
    } else if ( seq_profile <= 2 ) {  
        BitDepth = high_bitdepth ? 10 : 8  
    }  
    if ( seq_profile == 1 ) {  
        mono_chrome = 0  
    } else {  
        mono_chrome f(1)
    }  
    NumPlanes = mono_chrome ? 1 : 3  
    color_description_present_flag f(1)
    if ( color_description_present_flag ) {  
        color_primaries f(8)
        transfer_characteristics f(8)
        matrix_coefficients f(8)
    } else {  
        color_primaries = CP_UNSPECIFIED  
        transfer_characteristics = TC_UNSPECIFIED  
        matrix_coefficients = MC_UNSPECIFIED  
    }  
    if ( mono_chrome ) {  
        color_range f(1)
        subsampling_x = 1  
        subsampling_y = 1  
        chroma_sample_position = CSP_UNKNOWN  
        separate_uv_delta_q = 0  
        return  
    } else if ( color_primaries == CP_BT_709 &&  
                transfer_characteristics == TC_SRGB &&  
                matrix_coefficients == MC_IDENTITY ) {  
        color_range = 1  
        subsampling_x = 0  
        subsampling_y = 0  
    } else {  
        color_range f(1)
        if ( seq_profile == 0 ) {  
            subsampling_x = 1  
            subsampling_y = 1  
        } else if ( seq_profile == 1 ) {  
            subsampling_x = 0  
            subsampling_y = 0  
        } else {  
            if ( BitDepth == 12 ) {  
                subsampling_x f(1)
                if ( subsampling_x )  
                    subsampling_y f(1)
                else  
                    subsampling_y = 0  
            } else {  
                subsampling_x = 1  
                subsampling_y = 0  
            }  
        }  
        if ( subsampling_x && subsampling_y ) {  
            chroma_sample_position f(2)
        }  
    }  
    separate_uv_delta_q f(1)
}  

Timing info syntax

timing_info( ) { Type
    num_units_in_display_tick f(32)
    time_scale f(32)
    equal_picture_interval f(1)
    if ( equal_picture_interval )  
       num_ticks_per_picture_minus_1 uvlc()
}  

Decoder model info syntax

decoder_model_info( ) { Type
    buffer_delay_length_minus_1 f(5)
    num_units_in_decoding_tick f(32)
    buffer_removal_time_length_minus_1 f(5)
    frame_presentation_time_length_minus_1 f(5)
}  

Operating parameters info syntax

operating_parameters_info( op ) { Type
    n = buffer_delay_length_minus_1 + 1  
    decoder_buffer_delay[ op ] f(n)
    encoder_buffer_delay[ op ] f(n)
    low_delay_mode_flag[ op ] f(1)
}  

Temporal delimiter obu syntax

temporal_delimiter_obu( ) { Type
    SeenFrameHeader = 0  
}  

Note: The temporal delimiter has an empty payload.

Padding OBU syntax

padding_obu( ) { Type
    for ( i = 0; i < obu_padding_length; i++ )  
        obu_padding_byte f(8)
}  

Note: obu_padding_length is not coded in the bitstream but can be computed based on obu_size minus the number of trailing bytes. In practice, though, since this is padding data meant to be skipped, decoders do not need to determine either that length or the number of trailing bytes. They can ignore the entire OBU. Ignoring the OBU can be done based on obu_size. The last byte of the valid content of the payload data for this OBU type is considered to be the last byte that is not equal to zero. This rule is to prevent the dropping of valid bytes by systems that interpret trailing zero bytes as a continuation of the trailing bits in an OBU. This implies that when any payload data is present for this OBU type, at least one byte of the payload data (including the trailing bit) shall not be equal to 0.

Metadata OBU syntax

General metadata OBU syntax

metadata_obu( ) { Type
    metadata_type leb128()
    if ( metadata_type == METADATA_TYPE_ITUT_T35 )  
        metadata_itut_t35( )  
    else if ( metadata_type == METADATA_TYPE_HDR_CLL )  
        metadata_hdr_cll( )  
    else if ( metadata_type == METADATA_TYPE_HDR_MDCV )  
        metadata_hdr_mdcv( )  
    else if ( metadata_type == METADATA_TYPE_SCALABILITY )  
        metadata_scalability( )  
    else if ( metadata_type == METADATA_TYPE_TIMECODE )  
        metadata_timecode( )  
}  

Note: The exact syntax of metadata_obu is not defined in this specification when metadata_type is equal to a value reserved for future use or a user private value. Decoders should ignore the entire OBU if they do not understand the metadata_type. The last byte of the valid content of the data is considered to be the last byte that is not equal to zero. This rule is to prevent the dropping of valid bytes by systems that interpret trailing zero bytes as a continuation of the trailing bits in an OBU. This implies that when any payload data is present for this OBU type, at least one byte of the payload data (including the trailing bit) shall not be equal to 0.

Metadata ITUT T35 syntax

metadata_itut_t35( ) { Type
    itu_t_t35_country_code f(8)
    if ( itu_t_t35_country_code == 0xFF ) {  
        itu_t_t35_country_code_extension_byte f(8)
    }  
    itu_t_t35_payload_bytes  
}  

Note: The exact syntax of itu_t_t35_payload_bytes is not defined in this specification. External specifications can define the syntax. Decoders should ignore the entire OBU if they do not understand it. The last byte of the valid content of the data is considered to be the last byte that is not equal to zero. This rule is to prevent the dropping of valid bytes by systems that interpret trailing zero bytes as a continuation of the trailing bits in an OBU. This implies that when any payload data is present for this OBU type, at least one byte of the payload data (including the trailing bit) shall not be equal to 0.

Metadata high dynamic range content light level syntax

metadata_hdr_cll( ) { Type
    max_cll f(16)
    max_fall f(16)
}  

Metadata high dynamic range mastering display color volume syntax

metadata_hdr_mdcv( ) { Type
    for ( i = 0; i < 3; i++ ) {  
        primary_chromaticity_x[ i ] f(16)
        primary_chromaticity_y[ i ] f(16)
    }  
    white_point_chromaticity_x f(16)
    white_point_chromaticity_y f(16)
    luminance_max f(32)
    luminance_min f(32)
}  

Metadata scalability syntax

metadata_scalability( ) { Type
    scalability_mode_idc f(8)
    if ( scalability_mode_idc == SCALABILITY_SS )  
        scalability_structure( )  
}  

Scalability structure syntax

scalability_structure( ) { Type
    spatial_layers_cnt_minus_1 f(2)
    spatial_layer_dimensions_present_flag f(1)
    spatial_layer_description_present_flag f(1)
    temporal_group_description_present_flag f(1)
    scalability_structure_reserved_3bits f(3)
    if ( spatial_layer_dimensions_present_flag ) {  
        for ( i = 0; i <= spatial_layers_cnt_minus_1 ; i++ ) {  
            spatial_layer_max_width[ i ] f(16)
            spatial_layer_max_height[ i ] f(16)
        }  
    }  
    if ( spatial_layer_description_present_flag ) {  
        for ( i = 0; i <= spatial_layers_cnt_minus_1; i++ )  
            spatial_layer_ref_id[ i ] f(8)
    }  
    if ( temporal_group_description_present_flag ) {  
        temporal_group_size f(8)
        for ( i = 0; i < temporal_group_size; i++ ) {  
            temporal_group_temporal_id[ i ] f(3)
            temporal_group_temporal_switching_up_point_flag[ i ] f(1)
            temporal_group_spatial_switching_up_point_flag[ i ] f(1)
            temporal_group_ref_cnt[ i ] f(3)
            for ( j = 0; j < temporal_group_ref_cnt[ i ]; j++ ) {  
                temporal_group_ref_pic_diff[ i ][ j ] f(8)
            }  
        }  
    }  
}  

Metadata timecode syntax

metadata_timecode( ) { Type
    counting_type f(5)
    full_timestamp_flag f(1)
    discontinuity_flag f(1)
    cnt_dropped_flag f(1)
    n_frames f(9)
    if ( full_timestamp_flag ) {  
        seconds_value f(6)
        minutes_value f(6)
        hours_value f(5)
    } else {  
        seconds_flag f(1)
        if ( seconds_flag ) {  
            seconds_value f(6)
            minutes_flag f(1)
            if ( minutes_flag ) {  
                minutes_value f(6)
                hours_flag f(1)
                if ( hours_flag ) {  
                    hours_value f(5)
                }  
            }  
        }  
    }  
    time_offset_length f(5)
    if ( time_offset_length > 0 ) {  
        time_offset_value f(time_offset_length)
    }  
}  

Frame header OBU syntax

General frame header OBU syntax

frame_header_obu( ) { Type
    if ( SeenFrameHeader == 1 ) {  
        frame_header_copy()  
    } else {  
        SeenFrameHeader = 1  
        uncompressed_header( )  
        if ( show_existing_frame ) {  
            decode_frame_wrapup( )  
            SeenFrameHeader = 0  
        } else {  
            TileNum = 0  
            SeenFrameHeader = 1  
        }  
    }  
}  

Uncompressed header syntax

uncompressed_header( ) { Type
    if ( frame_id_numbers_present_flag ) {  
        idLen = ( additional_frame_id_length_minus_1 +  
                  delta_frame_id_length_minus_2 + 3 )  
    }  
    allFrames = (1 << NUM_REF_FRAMES) - 1  
    if ( reduced_still_picture_header ) {  
        show_existing_frame = 0  
        frame_type = KEY_FRAME  
        FrameIsIntra = 1  
        show_frame = 1  
        showable_frame = 0  
    } else {  
        show_existing_frame f(1)
        if ( show_existing_frame == 1 ) {  
            frame_to_show_map_idx f(3)
            if ( decoder_model_info_present_flag && !equal_picture_interval ) {  
                temporal_point_info( )  
            }  
            refresh_frame_flags = 0  
            if ( frame_id_numbers_present_flag ) {  
                display_frame_id f(idLen)
            }  
            frame_type = RefFrameType[ frame_to_show_map_idx ]  
            if ( frame_type == KEY_FRAME ) {  
                refresh_frame_flags = allFrames  
            }  
            if ( film_grain_params_present ) {  
                load_grain_params( frame_to_show_map_idx )  
            }  
            return  
        }  
        frame_type f(2)
        FrameIsIntra = (frame_type == INTRA_ONLY_FRAME ||  
                        frame_type == KEY_FRAME)  
        show_frame f(1)
        if ( show_frame && decoder_model_info_present_flag && !equal_picture_interval ) {  
            temporal_point_info( )  
        }  
        if ( show_frame ) {  
            showable_frame = frame_type != KEY_FRAME  
        } else {  
            showable_frame f(1)
        }  
        if ( frame_type == SWITCH_FRAME ||  
             ( frame_type == KEY_FRAME && show_frame ) )  
            error_resilient_mode = 1  
        else  
            error_resilient_mode f(1)
    }  
    if ( frame_type == KEY_FRAME && show_frame ) {  
        for ( i = 0; i < NUM_REF_FRAMES; i++ ) {  
            RefValid[ i ] = 0  
            RefOrderHint[ i ] = 0  
        }  
        for ( i = 0; i < REFS_PER_FRAME; i++ ) {  
            OrderHints[ LAST_FRAME + i ] = 0  
        }  
    }  
    disable_cdf_update f(1)
    if ( seq_force_screen_content_tools == SELECT_SCREEN_CONTENT_TOOLS ) {  
        allow_screen_content_tools f(1)
    } else {  
        allow_screen_content_tools = seq_force_screen_content_tools  
    }  
    if ( allow_screen_content_tools ) {  
        if ( seq_force_integer_mv == SELECT_INTEGER_MV ) {  
            force_integer_mv f(1)
        } else {  
            force_integer_mv = seq_force_integer_mv  
        }  
    } else {  
        force_integer_mv = 0  
    }  
    if ( FrameIsIntra ) {  
        force_integer_mv = 1  
    }  
    if ( frame_id_numbers_present_flag ) {  
        PrevFrameID = current_frame_id  
        current_frame_id f(idLen)
        mark_ref_frames( idLen )  
    } else {  
        current_frame_id = 0  
    }  
    if ( frame_type == SWITCH_FRAME )  
        frame_size_override_flag = 1  
    else if ( reduced_still_picture_header )  
        frame_size_override_flag = 0  
    else  
        frame_size_override_flag f(1)
    order_hint f(OrderHintBits)
    OrderHint = order_hint  
    if ( FrameIsIntra || error_resilient_mode ) {  
        primary_ref_frame = PRIMARY_REF_NONE  
    } else {  
        primary_ref_frame f(3)
    }  
    if ( decoder_model_info_present_flag ) {  
        buffer_removal_time_present_flag f(1)
        if ( buffer_removal_time_present_flag ) {  
            for ( opNum = 0; opNum <= operating_points_cnt_minus_1; opNum++ ) {  
                if ( decoder_model_present_for_this_op[ opNum ] ) {  
                    opPtIdc = operating_point_idc[ opNum ]  
                    inTemporalLayer = ( opPtIdc >> temporal_id ) & 1  
                    inSpatialLayer = ( opPtIdc >> ( spatial_id + 8 ) ) & 1  
                    if ( opPtIdc == 0 || ( inTemporalLayer && inSpatialLayer ) ) {  
                        n = buffer_removal_time_length_minus_1 + 1  
                        buffer_removal_time[ opNum ] f(n)
                    }  
                }  
            }  
        }  
    }  
    allow_high_precision_mv = 0  
    use_ref_frame_mvs = 0  
    allow_intrabc = 0  
    if ( frame_type == SWITCH_FRAME ||  
         ( frame_type == KEY_FRAME && show_frame ) ) {  
        refresh_frame_flags = allFrames  
    } else {  
        refresh_frame_flags f(8)
    }  
    if ( !FrameIsIntra || refresh_frame_flags != allFrames ) {  
        if ( error_resilient_mode && enable_order_hint ) {  
            for ( i = 0; i < NUM_REF_FRAMES; i++) {  
                ref_order_hint[ i ] f(OrderHintBits)
                if ( ref_order_hint[ i ] != RefOrderHint[ i ] ) {  
                    RefValid[ i ] = 0  
                }  
            }  
        }  
    }  
    if ( frame_type == KEY_FRAME ) {  
        frame_size( )  
        render_size( )  
        if ( allow_screen_content_tools && UpscaledWidth == FrameWidth ) {  
            allow_intrabc f(1)
        }  
    } else {  
        if ( frame_type == INTRA_ONLY_FRAME ) {  
            frame_size( )  
            render_size( )  
            if ( allow_screen_content_tools && UpscaledWidth == FrameWidth ) {  
                allow_intrabc f(1)
            }  
        } else {  
            if ( !enable_order_hint ) {  
              frame_refs_short_signaling = 0  
            } else {  
              frame_refs_short_signaling f(1)
              if ( frame_refs_short_signaling ) {  
                last_frame_idx f(3)
                gold_frame_idx f(3)
                set_frame_refs()  
              }  
            }  
            for ( i = 0; i < REFS_PER_FRAME; i++ ) {  
                if ( !frame_refs_short_signaling )  
                  ref_frame_idx[ i ] f(3)
                if ( frame_id_numbers_present_flag ) {  
                    n = delta_frame_id_length_minus_2 + 2  
                    delta_frame_id_minus_1 f(n)
                    DeltaFrameId = delta_frame_id_minus_1 + 1  
                    expectedFrameId[ i ] = ((current_frame_id + (1 << idLen) -  
                                            DeltaFrameId ) % (1 << idLen))  
                }  
            }  
            if ( frame_size_override_flag && !error_resilient_mode ) {  
                frame_size_with_refs( )  
            } else {  
                frame_size( )  
                render_size( )  
            }  
            if ( force_integer_mv ) {  
                allow_high_precision_mv = 0  
            } else {  
                allow_high_precision_mv f(1)
            }  
            read_interpolation_filter( )  
            is_motion_mode_switchable f(1)
            if ( error_resilient_mode ||  
                 !enable_ref_frame_mvs ) {  
                use_ref_frame_mvs = 0  
            } else {  
                use_ref_frame_mvs f(1)
            }  
        }  
    }  
    if ( !FrameIsIntra ) {  
        for ( i = 0; i < REFS_PER_FRAME; i++ ) {  
            refFrame = LAST_FRAME + i  
            hint = RefOrderHint[ ref_frame_idx[ i ] ]  
            OrderHints[ refFrame ] = hint  
            if ( !enable_order_hint ) {  
                RefFrameSignBias[ refFrame ] = 0  
            } else {  
                RefFrameSignBias[ refFrame ] = get_relative_dist( hint, OrderHint) > 0  
            }  
        }  
    }  
    if ( reduced_still_picture_header || disable_cdf_update )  
        disable_frame_end_update_cdf = 1  
    else  
        disable_frame_end_update_cdf f(1)
    if ( primary_ref_frame == PRIMARY_REF_NONE ) {  
        init_non_coeff_cdfs( )  
        setup_past_independence( )  
    } else {  
        load_cdfs( ref_frame_idx[ primary_ref_frame ] )  
        load_previous( )  
    }  
    if ( use_ref_frame_mvs == 1 )  
        motion_field_estimation( )  
    tile_info( )  
    quantization_params( )  
    segmentation_params( )  
    delta_q_params( )  
    delta_lf_params( )  
    if ( primary_ref_frame == PRIMARY_REF_NONE ) {  
        init_coeff_cdfs( )  
    } else {  
        load_previous_segment_ids( )  
    }  
    CodedLossless = 1  
    for ( segmentId = 0; segmentId < MAX_SEGMENTS; segmentId++ ) {  
        qindex = get_qindex( 1, segmentId )  
        LosslessArray[ segmentId ] = qindex == 0 && DeltaQYDc == 0 &&  
                                     DeltaQUAc == 0 && DeltaQUDc == 0 &&  
                                     DeltaQVAc == 0 && DeltaQVDc == 0  
        if ( !LosslessArray[ segmentId ] )  
            CodedLossless = 0  
        if ( using_qmatrix ) {  
            if ( LosslessArray[ segmentId ] ) {  
                SegQMLevel[ 0 ][ segmentId ] = 15  
                SegQMLevel[ 1 ][ segmentId ] = 15  
                SegQMLevel[ 2 ][ segmentId ] = 15  
            } else {  
                SegQMLevel[ 0 ][ segmentId ] = qm_y  
                SegQMLevel[ 1 ][ segmentId ] = qm_u  
                SegQMLevel[ 2 ][ segmentId ] = qm_v  
            }  
        }  
    }  
    AllLossless = CodedLossless && ( FrameWidth == UpscaledWidth )  
    loop_filter_params( )  
    cdef_params( )  
    lr_params( )  
    read_tx_mode( )  
    frame_reference_mode( )  
    skip_mode_params( )  
    if ( FrameIsIntra ||  
         error_resilient_mode ||  
         !enable_warped_motion )  
        allow_warped_motion = 0  
    else  
        allow_warped_motion f(1)
    reduced_tx_set f(1)
    global_motion_params( )  
    film_grain_params( )  
}  

Get relative distance function

This function computes the distance between two order hints by sign-extending the result of subtracting the values.

get_relative_dist( a, b ) { Type
    if ( !enable_order_hint )  
        return 0  
    diff = a - b  
    m = 1 << (OrderHintBits - 1)  
    diff = (diff & (m - 1)) - (diff & m)  
    return diff  
}  
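The sign extension above makes the comparison robust to order-hint wraparound. A minimal Python sketch (illustrative, not normative; `order_hint_bits` stands in for the `OrderHintBits` variable):

```python
def get_relative_dist(a, b, order_hint_bits, enable_order_hint=True):
    # Subtract the hints, then sign-extend the difference to
    # order_hint_bits bits so values that wrapped around the
    # order-hint counter still compare correctly.
    if not enable_order_hint:
        return 0
    diff = a - b
    m = 1 << (order_hint_bits - 1)
    return (diff & (m - 1)) - (diff & m)
```

With 7-bit order hints, hint 1 is two frames after hint 127, and the sign extension recovers that: `get_relative_dist(1, 127, 7)` gives 2 while `get_relative_dist(127, 1, 7)` gives -2.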

Reference frame marking function

mark_ref_frames( idLen ) { Type
    diffLen = delta_frame_id_length_minus_2 + 2  
    for ( i = 0; i < NUM_REF_FRAMES; i++ ) {  
        if ( current_frame_id > ( 1 << diffLen ) ) {  
            if ( RefFrameId[ i ] > current_frame_id ||  
                 RefFrameId[ i ] < ( current_frame_id - ( 1 << diffLen ) ) )  
                RefValid[ i ] = 0  
        } else {  
            if ( RefFrameId[ i ] > current_frame_id &&  
                 RefFrameId[ i ] < ( ( 1 << idLen ) +  
                                     current_frame_id -  
                                     ( 1 << diffLen ) ) )  
                RefValid[ i ] = 0  
        }  
    }  
}  
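The two branches of mark_ref_frames implement one validity window modulo the frame-id space: a stored reference is invalidated when it falls more than 1 << diffLen behind current_frame_id. A hedged Python sketch of the per-reference test (illustrative; `diff_len` corresponds to delta_frame_id_length_minus_2 + 2 and `id_len` to idLen):

```python
def ref_frame_valid(ref_id, current_id, id_len, diff_len):
    # Mirrors the two branches of mark_ref_frames: the second branch
    # handles the case where the window wraps below zero in the
    # (1 << id_len)-sized frame-id space.
    if current_id > (1 << diff_len):
        if ref_id > current_id or ref_id < current_id - (1 << diff_len):
            return False
    else:
        if ref_id > current_id and \
           ref_id < (1 << id_len) + current_id - (1 << diff_len):
            return False
    return True
```

For example, with 8-bit frame ids and a 64-wide window, a stored id of 250 is still valid for current_frame_id 10 (it is only 16 behind, modulo 256), while 200 is not.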

Frame size syntax

frame_size( ) { Type
    if ( frame_size_override_flag ) {  
        n = frame_width_bits_minus_1 + 1  
        frame_width_minus_1 f(n)
        n = frame_height_bits_minus_1 + 1  
        frame_height_minus_1 f(n)
        FrameWidth = frame_width_minus_1 + 1  
        FrameHeight = frame_height_minus_1 + 1  
    } else {  
        FrameWidth = max_frame_width_minus_1 + 1  
        FrameHeight = max_frame_height_minus_1 + 1  
    }  
    superres_params( )  
    compute_image_size( )  
}  

Render size syntax

render_size( ) { Type
    render_and_frame_size_different f(1)
    if ( render_and_frame_size_different == 1 ) {  
        render_width_minus_1 f(16)
        render_height_minus_1 f(16)
        RenderWidth = render_width_minus_1 + 1  
        RenderHeight = render_height_minus_1 + 1  
    } else {  
        RenderWidth = UpscaledWidth  
        RenderHeight = FrameHeight  
    }  
}  

Frame size with refs syntax

frame_size_with_refs( ) { Type
    for ( i = 0; i < REFS_PER_FRAME; i++ ) {  
        found_ref f(1)
        if ( found_ref == 1 ) {  
            UpscaledWidth = RefUpscaledWidth[ ref_frame_idx[ i ] ]  
            FrameWidth = UpscaledWidth  
            FrameHeight = RefFrameHeight[ ref_frame_idx[ i ] ]  
            RenderWidth = RefRenderWidth[ ref_frame_idx[ i ] ]  
            RenderHeight = RefRenderHeight[ ref_frame_idx[ i ] ]  
            break  
        }  
    }  
    if ( found_ref == 0 ) {  
        frame_size( )  
        render_size( )  
    } else {  
        superres_params( )  
        compute_image_size( )  
    }  
}  

Superres params syntax

superres_params() { Type
  if ( enable_superres )  
    use_superres f(1)
  else  
    use_superres = 0  
  if ( use_superres ) {  
    coded_denom f(SUPERRES_DENOM_BITS)
    SuperresDenom = coded_denom + SUPERRES_DENOM_MIN  
  } else {  
    SuperresDenom = SUPERRES_NUM  
  }  
  UpscaledWidth = FrameWidth  
  FrameWidth = (UpscaledWidth * SUPERRES_NUM +  
                (SuperresDenom / 2)) / SuperresDenom  
}  
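The final assignment scales the coded width down by SUPERRES_NUM / SuperresDenom with rounding. A minimal Python sketch of that division, assuming the spec constant SUPERRES_NUM = 8 (illustrative, not normative):

```python
SUPERRES_NUM = 8  # spec constant: numerator of the superres scale factor

def downscaled_width(upscaled_width, superres_denom):
    # Rounded integer division, as in superres_params():
    # FrameWidth = (UpscaledWidth * SUPERRES_NUM + SuperresDenom/2) / SuperresDenom
    return (upscaled_width * SUPERRES_NUM + superres_denom // 2) // superres_denom
```

When superres is disabled, SuperresDenom equals SUPERRES_NUM and the width is unchanged; with a denominator of 16, a 1920-wide frame is coded at 960 samples and upscaled back after decoding.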

Compute image size function

compute_image_size( ) { Type
    MiCols = 2 * ( ( FrameWidth + 7 ) >> 3 )  
    MiRows = 2 * ( ( FrameHeight + 7 ) >> 3 )  
}  
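Each mode info (MI) unit covers 4x4 samples, and the frame is rounded up to whole 8x8 blocks before counting. A small Python sketch of the same arithmetic (illustrative):

```python
def compute_image_size(frame_width, frame_height):
    # Round the frame up to whole 8x8 blocks, then count 4x4 MI units
    # (two per 8x8 block in each dimension).
    mi_cols = 2 * ((frame_width + 7) >> 3)
    mi_rows = 2 * ((frame_height + 7) >> 3)
    return mi_cols, mi_rows
```

A 1920x1080 frame therefore has MiCols = 480 and MiRows = 270.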

Interpolation filter syntax

read_interpolation_filter( ) { Type
    is_filter_switchable f(1)
    if ( is_filter_switchable == 1 ) {  
        interpolation_filter = SWITCHABLE  
    } else {  
        interpolation_filter f(2)
    }  
}  

Loop filter params syntax

loop_filter_params( ) { Type
    if ( CodedLossless || allow_intrabc ) {  
        loop_filter_level[ 0 ] = 0  
        loop_filter_level[ 1 ] = 0  
        loop_filter_ref_deltas[ INTRA_FRAME ] = 1  
        loop_filter_ref_deltas[ LAST_FRAME ] = 0  
        loop_filter_ref_deltas[ LAST2_FRAME ] = 0  
        loop_filter_ref_deltas[ LAST3_FRAME ] = 0  
        loop_filter_ref_deltas[ BWDREF_FRAME ] = 0  
        loop_filter_ref_deltas[ GOLDEN_FRAME ] = -1  
        loop_filter_ref_deltas[ ALTREF_FRAME ] = -1  
        loop_filter_ref_deltas[ ALTREF2_FRAME ] = -1  
        for ( i = 0; i < 2; i++ ) {  
            loop_filter_mode_deltas[ i ] = 0  
        }  
        return  
    }  
    loop_filter_level[ 0 ] f(6)
    loop_filter_level[ 1 ] f(6)
    if ( NumPlanes > 1 ) {  
        if ( loop_filter_level[ 0 ] || loop_filter_level[ 1 ] ) {  
            loop_filter_level[ 2 ] f(6)
            loop_filter_level[ 3 ] f(6)
        }  
    }  
    loop_filter_sharpness f(3)
    loop_filter_delta_enabled f(1)
    if ( loop_filter_delta_enabled == 1 ) {  
        loop_filter_delta_update f(1)
        if ( loop_filter_delta_update == 1 ) {  
            for ( i = 0; i < TOTAL_REFS_PER_FRAME; i++ ) {  
                update_ref_delta f(1)
                if ( update_ref_delta == 1 )  
                    loop_filter_ref_deltas[ i ] su(1+6)
            }  
            for ( i = 0; i < 2; i++ ) {  
                update_mode_delta f(1)
                if ( update_mode_delta == 1 )  
                    loop_filter_mode_deltas[ i ] su(1+6)
            }  
        }  
    }  
}  

Quantization params syntax

quantization_params( ) { Type
    base_q_idx f(8)
    DeltaQYDc = read_delta_q( )  
    if ( NumPlanes > 1 ) {  
        if ( separate_uv_delta_q )  
          diff_uv_delta f(1)
        else  
          diff_uv_delta = 0  
        DeltaQUDc = read_delta_q( )  
        DeltaQUAc = read_delta_q( )  
        if ( diff_uv_delta ) {  
          DeltaQVDc = read_delta_q( )  
          DeltaQVAc = read_delta_q( )  
        } else {  
          DeltaQVDc = DeltaQUDc  
          DeltaQVAc = DeltaQUAc  
        }  
    } else {  
        DeltaQUDc = 0  
        DeltaQUAc = 0  
        DeltaQVDc = 0  
        DeltaQVAc = 0  
    }  
    using_qmatrix f(1)
    if ( using_qmatrix ) {  
        qm_y f(4)
        qm_u f(4)
        if ( !separate_uv_delta_q )  
            qm_v = qm_u  
        else  
            qm_v f(4)
    }  
}  

Delta quantizer syntax

read_delta_q( ) { Type
    delta_coded f(1)
    if ( delta_coded ) {  
        delta_q su(1+6)
    } else {  
        delta_q = 0  
    }  
    return delta_q  
}  
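The delta_q field uses the su(1+6) descriptor, which (per the descriptor definitions elsewhere in this specification) reinterprets a 7-bit unsigned field as a two's-complement signed value. A hedged Python sketch of that reinterpretation (illustrative):

```python
def su(raw, n):
    # Reinterpret the n-bit unsigned field `raw` as two's-complement,
    # giving a range of -(1 << (n-1)) .. (1 << (n-1)) - 1.
    sign_mask = 1 << (n - 1)
    return raw - (sign_mask << 1) if raw & sign_mask else raw
```

So su(1+6) yields delta quantizer values in -64 .. 63.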

Segmentation params syntax

segmentation_params( ) { Type
    segmentation_enabled f(1)
    if ( segmentation_enabled == 1 ) {  
        if ( primary_ref_frame == PRIMARY_REF_NONE ) {  
            segmentation_update_map = 1  
            segmentation_temporal_update = 0  
            segmentation_update_data = 1  
        } else {  
            segmentation_update_map f(1)
            if ( segmentation_update_map == 1 )  
                segmentation_temporal_update f(1)
            segmentation_update_data f(1)
        }  
        if ( segmentation_update_data == 1 ) {  
            for ( i = 0; i < MAX_SEGMENTS; i++ ) {  
                for ( j = 0; j < SEG_LVL_MAX; j++ ) {  
                    feature_value = 0  
                    feature_enabled f(1)
                    FeatureEnabled[ i ][ j ] = feature_enabled  
                    clippedValue = 0  
                    if ( feature_enabled == 1 ) {  
                        bitsToRead = Segmentation_Feature_Bits[ j ]  
                        limit = Segmentation_Feature_Max[ j ]  
                        if ( Segmentation_Feature_Signed[ j ] == 1 ) {  
                            feature_value su(1+bitsToRead)
                            clippedValue = Clip3( -limit, limit, feature_value)  
                        } else {  
                            feature_value f(bitsToRead)
                            clippedValue = Clip3( 0, limit, feature_value)  
                        }  
                    }  
                    FeatureData[ i ][ j ] = clippedValue  
                }  
            }  
        }  
    } else {  
        for ( i = 0; i < MAX_SEGMENTS; i++ ) {  
            for ( j = 0; j < SEG_LVL_MAX; j++ ) {  
                FeatureEnabled[ i ][ j ] = 0  
                FeatureData[ i ][ j ] = 0  
            }  
        }  
    }  
    SegIdPreSkip = 0  
    LastActiveSegId = 0  
    for ( i = 0; i < MAX_SEGMENTS; i++ ) {  
        for ( j = 0; j < SEG_LVL_MAX; j++ ) {  
            if ( FeatureEnabled[ i ][ j ] ) {  
                LastActiveSegId = i  
                if ( j >= SEG_LVL_REF_FRAME ) {  
                    SegIdPreSkip = 1  
                }  
            }  
        }  
    }  
}  

The constant lookup tables used in this syntax are defined as:

Segmentation_Feature_Bits[ SEG_LVL_MAX ]   = { 8, 6, 6, 6, 6, 3, 0, 0 }
Segmentation_Feature_Signed[ SEG_LVL_MAX ] = { 1, 1, 1, 1, 1, 0, 0, 0 }
Segmentation_Feature_Max[ SEG_LVL_MAX ] = {
  255, MAX_LOOP_FILTER, MAX_LOOP_FILTER,
  MAX_LOOP_FILTER, MAX_LOOP_FILTER, 7,
  0, 0 }

Tile info syntax

tile_info ( ) { Type
    sbCols = use_128x128_superblock ? ( ( MiCols + 31 ) >> 5 ) : ( ( MiCols + 15 ) >> 4 )  
    sbRows = use_128x128_superblock ? ( ( MiRows + 31 ) >> 5 ) : ( ( MiRows + 15 ) >> 4 )  
    sbShift = use_128x128_superblock ? 5 : 4  
    sbSize = sbShift + 2  
    maxTileWidthSb = MAX_TILE_WIDTH >> sbSize  
    maxTileAreaSb = MAX_TILE_AREA >> ( 2 * sbSize )  
    minLog2TileCols = tile_log2(maxTileWidthSb, sbCols)  
    maxLog2TileCols = tile_log2(1, Min(sbCols, MAX_TILE_COLS))  
    maxLog2TileRows = tile_log2(1, Min(sbRows, MAX_TILE_ROWS))  
    minLog2Tiles = Max(minLog2TileCols,  
                       tile_log2(maxTileAreaSb, sbRows * sbCols))  
   
    uniform_tile_spacing_flag f(1)
    if ( uniform_tile_spacing_flag ) {  
        TileColsLog2 = minLog2TileCols  
        while ( TileColsLog2 < maxLog2TileCols ) {  
            increment_tile_cols_log2 f(1)
            if ( increment_tile_cols_log2 == 1 )  
                TileColsLog2++  
            else  
                break  
        }  
        tileWidthSb = (sbCols + (1 << TileColsLog2) - 1) >> TileColsLog2  
        i = 0  
        for ( startSb = 0; startSb < sbCols; startSb += tileWidthSb ) {  
          MiColStarts[ i ] = startSb << sbShift  
          i += 1  
        }  
        MiColStarts[i] = MiCols  
        TileCols = i  
   
        minLog2TileRows = Max( minLog2Tiles - TileColsLog2, 0)  
        TileRowsLog2 = minLog2TileRows  
        while ( TileRowsLog2 < maxLog2TileRows ) {  
            increment_tile_rows_log2 f(1)
            if ( increment_tile_rows_log2 == 1 )  
                TileRowsLog2++  
            else  
                break  
        }  
        tileHeightSb = (sbRows + (1 << TileRowsLog2) - 1) >> TileRowsLog2  
        i = 0  
        for ( startSb = 0; startSb < sbRows; startSb += tileHeightSb ) {  
          MiRowStarts[ i ] = startSb << sbShift  
          i += 1  
        }  
        MiRowStarts[i] = MiRows  
        TileRows = i  
    } else {  
        widestTileSb = 0  
        startSb = 0  
        for ( i = 0; startSb < sbCols; i++ ) {  
            MiColStarts[ i ] = startSb << sbShift  
            maxWidth = Min(sbCols - startSb, maxTileWidthSb)  
            width_in_sbs_minus_1 ns(maxWidth)
            sizeSb = width_in_sbs_minus_1 + 1  
            widestTileSb = Max( sizeSb, widestTileSb )  
            startSb += sizeSb  
        }  
        MiColStarts[i] = MiCols  
        TileCols = i  
        TileColsLog2 = tile_log2(1, TileCols)  
   
        if ( minLog2Tiles > 0 )  
            maxTileAreaSb = (sbRows * sbCols) >> (minLog2Tiles + 1)  
        else  
            maxTileAreaSb = sbRows * sbCols  
        maxTileHeightSb = Max( maxTileAreaSb / widestTileSb, 1 )  
   
        startSb = 0  
        for ( i = 0; startSb < sbRows; i++ ) {  
            MiRowStarts[ i ] = startSb << sbShift  
            maxHeight = Min(sbRows - startSb, maxTileHeightSb)  
            height_in_sbs_minus_1 ns(maxHeight)
            sizeSb = height_in_sbs_minus_1 + 1  
            startSb += sizeSb  
        }  
        MiRowStarts[ i ] = MiRows  
        TileRows = i  
        TileRowsLog2 = tile_log2(1, TileRows)  
    }  
    if ( TileColsLog2 > 0 || TileRowsLog2 > 0 ) {  
        context_update_tile_id f(TileRowsLog2 + TileColsLog2)
        tile_size_bytes_minus_1 f(2)
        TileSizeBytes = tile_size_bytes_minus_1 + 1  
    } else {  
        context_update_tile_id = 0  
    }  
}  
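In the uniform-spacing branch, tile column widths come from a ceiling division of the superblock columns by the tile count. A Python sketch of how the MiColStarts entries (excluding the trailing MiCols sentinel) are generated (illustrative; parameter names mirror the local variables above):

```python
def uniform_tile_starts(sb_count, tile_cols_log2, sb_shift):
    # Ceiling-divide the superblock columns among 2^tile_cols_log2 tiles,
    # then record each tile's first MI column (startSb << sbShift).
    tile_width_sb = (sb_count + (1 << tile_cols_log2) - 1) >> tile_cols_log2
    return [s << sb_shift for s in range(0, sb_count, tile_width_sb)]
```

For example, 10 superblock columns split into 4 tiles (TileColsLog2 = 2) with 64x64 superblocks (sbShift = 4) gives tiles 3 superblocks wide starting at MI columns 0, 48, 96, and 144.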

Tile size calculation function

tile_log2 returns the smallest value of k such that blkSize << k is greater than or equal to target.

tile_log2( blkSize, target ) { Type
  for ( k = 0; (blkSize << k) < target; k++ ) {  
  }  
  return k  
}  
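A direct Python transliteration (illustrative):

```python
def tile_log2(blk_size, target):
    # Smallest k with (blk_size << k) >= target.
    k = 0
    while (blk_size << k) < target:
        k += 1
    return k
```

For instance, tile_log2(1, 8) is 3 and tile_log2(1, 9) is 4, since 8 << 0 = 8 >= 8 but 1 << 3 = 8 < 9.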

Quantizer index delta parameters syntax

delta_q_params( ) { Type
    delta_q_res = 0  
    delta_q_present = 0  
    if ( base_q_idx > 0 ) {  
        delta_q_present f(1)
    }  
    if ( delta_q_present ) {  
        delta_q_res f(2)
    }  
}  

Loop filter delta parameters syntax

delta_lf_params( ) { Type
    delta_lf_present = 0  
    delta_lf_res = 0  
    delta_lf_multi = 0  
    if ( delta_q_present ) {  
        if ( !allow_intrabc )  
            delta_lf_present f(1)
        if ( delta_lf_present ) {  
            delta_lf_res f(2)
            delta_lf_multi f(1)
        }  
    }  
}  

CDEF params syntax

cdef_params( ) { Type
    if ( CodedLossless || allow_intrabc ||  
         !enable_cdef) {  
        cdef_bits = 0  
        cdef_y_pri_strength[0] = 0  
        cdef_y_sec_strength[0] = 0  
        cdef_uv_pri_strength[0] = 0  
        cdef_uv_sec_strength[0] = 0  
        CdefDamping = 3  
        return  
    }  
    cdef_damping_minus_3 f(2)
    CdefDamping = cdef_damping_minus_3 + 3  
    cdef_bits f(2)
    for ( i = 0; i < (1 << cdef_bits); i++ ) {  
        cdef_y_pri_strength[i] f(4)
        cdef_y_sec_strength[i] f(2)
        if ( cdef_y_sec_strength[i] == 3 )  
            cdef_y_sec_strength[i] += 1  
        if ( NumPlanes > 1 ) {  
            cdef_uv_pri_strength[i] f(4)
            cdef_uv_sec_strength[i] f(2)
            if ( cdef_uv_sec_strength[i] == 3 )  
                cdef_uv_sec_strength[i] += 1  
        }  
    }  
}  

Loop restoration params syntax

lr_params( ) { Type
    if ( AllLossless || allow_intrabc ||  
         !enable_restoration ) {  
        FrameRestorationType[0] = RESTORE_NONE  
        FrameRestorationType[1] = RESTORE_NONE  
        FrameRestorationType[2] = RESTORE_NONE  
        UsesLr = 0  
        return  
    }  
    UsesLr = 0  
    usesChromaLr = 0  
    for ( i = 0; i < NumPlanes; i++ ) {  
        lr_type f(2)
        FrameRestorationType[i] = Remap_Lr_Type[lr_type]  
        if ( FrameRestorationType[i] != RESTORE_NONE ) {  
            UsesLr = 1  
            if ( i > 0 ) {  
                usesChromaLr = 1  
            }  
        }  
    }  
    if ( UsesLr ) {  
        if ( use_128x128_superblock ) {  
            lr_unit_shift f(1)
            lr_unit_shift++  
        } else {  
            lr_unit_shift f(1)
            if ( lr_unit_shift ) {  
                lr_unit_extra_shift f(1)
                lr_unit_shift += lr_unit_extra_shift  
            }  
        }  
        LoopRestorationSize[ 0 ] = RESTORATION_TILESIZE_MAX >> (2 - lr_unit_shift)  
        if ( subsampling_x && subsampling_y && usesChromaLr ) {  
            lr_uv_shift f(1)
        } else {  
            lr_uv_shift = 0  
        }  
        LoopRestorationSize[ 1 ] = LoopRestorationSize[ 0 ] >> lr_uv_shift  
        LoopRestorationSize[ 2 ] = LoopRestorationSize[ 0 ] >> lr_uv_shift  
    }  
}  

where Remap_Lr_Type is a constant lookup table specified as:

Remap_Lr_Type[4] = {
  RESTORE_NONE, RESTORE_SWITCHABLE, RESTORE_WIENER, RESTORE_SGRPROJ
}

TX mode syntax

read_tx_mode( ) { Type
    if ( CodedLossless == 1 ) {  
        TxMode = ONLY_4X4  
    } else {  
        tx_mode_select f(1)
        if ( tx_mode_select ) {  
            TxMode = TX_MODE_SELECT  
        } else {  
            TxMode = TX_MODE_LARGEST  
        }  
    }  
}  

Skip mode params syntax

skip_mode_params( ) { Type
    if ( FrameIsIntra || !reference_select || !enable_order_hint ) {  
        skipModeAllowed = 0  
    } else {  
        forwardIdx = -1  
        backwardIdx = -1  
        for ( i = 0; i < REFS_PER_FRAME; i++ ) {  
            refHint = RefOrderHint[ ref_frame_idx[ i ] ]  
            if ( get_relative_dist( refHint, OrderHint ) < 0 ) {  
                if ( forwardIdx < 0 ||  
                     get_relative_dist( refHint, forwardHint) > 0 ) {  
                    forwardIdx = i  
                    forwardHint = refHint  
                }  
            } else if ( get_relative_dist( refHint, OrderHint) > 0 ) {  
                if ( backwardIdx < 0 ||  
                     get_relative_dist( refHint, backwardHint) < 0 ) {  
                    backwardIdx = i  
                    backwardHint = refHint  
                }  
            }  
        }  
        if ( forwardIdx < 0 ) {  
            skipModeAllowed = 0  
        } else if ( backwardIdx >= 0 ) {  
            skipModeAllowed = 1  
            SkipModeFrame[ 0 ] = LAST_FRAME + Min(forwardIdx, backwardIdx)  
            SkipModeFrame[ 1 ] = LAST_FRAME + Max(forwardIdx, backwardIdx)  
        } else {  
            secondForwardIdx = -1  
            for ( i = 0; i < REFS_PER_FRAME; i++ ) {  
                refHint = RefOrderHint[ ref_frame_idx[ i ] ]  
                if ( get_relative_dist( refHint, forwardHint ) < 0 ) {  
                    if ( secondForwardIdx < 0 ||  
                         get_relative_dist( refHint, secondForwardHint ) > 0 ) {  
                        secondForwardIdx = i  
                        secondForwardHint = refHint  
                    }  
                }  
            }  
            if ( secondForwardIdx < 0 ) {  
                skipModeAllowed = 0  
            } else {  
                skipModeAllowed = 1  
                SkipModeFrame[ 0 ] = LAST_FRAME + Min(forwardIdx, secondForwardIdx)  
                SkipModeFrame[ 1 ] = LAST_FRAME + Max(forwardIdx, secondForwardIdx)  
            }  
        }  
    }  
    if ( skipModeAllowed ) {  
        skip_mode_present f(1)
    } else {  
        skip_mode_present = 0  
    }  
}  
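The first search above picks the nearest forward (temporally past) and nearest backward (temporally future) references relative to the current frame's order hint. A Python sketch of that search under wraparound-aware comparisons (illustrative; `ref_hints` stands in for the RefOrderHint values indexed by ref_frame_idx):

```python
def select_skip_refs(ref_hints, order_hint, order_hint_bits):
    # Mirrors the forward/backward search in skip_mode_params():
    # forward = closest reference before the current frame,
    # backward = closest reference after it.
    def rel(a, b):  # get_relative_dist with sign extension
        diff = a - b
        m = 1 << (order_hint_bits - 1)
        return (diff & (m - 1)) - (diff & m)

    forward_idx = backward_idx = -1
    forward_hint = backward_hint = 0
    for i, h in enumerate(ref_hints):
        if rel(h, order_hint) < 0:
            if forward_idx < 0 or rel(h, forward_hint) > 0:
                forward_idx, forward_hint = i, h
        elif rel(h, order_hint) > 0:
            if backward_idx < 0 or rel(h, backward_hint) < 0:
                backward_idx, backward_hint = i, h
    return forward_idx, backward_idx
```

With reference hints [3, 5, 9] and a current order hint of 6, the nearest past reference is index 1 (hint 5) and the nearest future reference is index 2 (hint 9); skip mode is then allowed with that pair.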

Frame reference mode syntax

frame_reference_mode( ) { Type
    if ( FrameIsIntra ) {  
        reference_select = 0  
    } else {  
        reference_select f(1)
    }  
}  

Global motion params syntax

global_motion_params( ) { Type
    for ( ref = LAST_FRAME; ref <= ALTREF_FRAME; ref++ ) {  
        GmType[ ref ] = IDENTITY  
        for ( i = 0; i < 6; i++ ) {  
            gm_params[ ref ][ i ] = ( ( i % 3 == 2 ) ?  
                      1 << WARPEDMODEL_PREC_BITS : 0 )  
        }  
    }  
    if ( FrameIsIntra )  
        return  
    for ( ref = LAST_FRAME; ref <= ALTREF_FRAME; ref++ ) {  
        is_global f(1)
        if ( is_global ) {  
            is_rot_zoom f(1)
            if ( is_rot_zoom ) {  
                type = ROTZOOM  
            } else {  
                is_translation f(1)
                type = is_translation ? TRANSLATION : AFFINE  
            }  
        } else {  
            type = IDENTITY  
        }  
        GmType[ref] = type  
   
        if ( type >= ROTZOOM ) {  
            read_global_param(type,ref,2)  
            read_global_param(type,ref,3)  
            if ( type == AFFINE ) {  
                read_global_param(type,ref,4)  
                read_global_param(type,ref,5)  
            } else {  
                gm_params[ref][4] = -gm_params[ref][3]  
                gm_params[ref][5] = gm_params[ref][2]  
            }  
        }  
        if ( type >= TRANSLATION ) {  
            read_global_param(type,ref,0)  
            read_global_param(type,ref,1)  
        }  
    }  
}  

Global param syntax

read_global_param( type, ref, idx ) { Type
    absBits = GM_ABS_ALPHA_BITS  
    precBits = GM_ALPHA_PREC_BITS  
    if ( idx < 2 ) {  
        if ( type == TRANSLATION ) {  
            absBits = GM_ABS_TRANS_ONLY_BITS - !allow_high_precision_mv  
            precBits = GM_TRANS_ONLY_PREC_BITS - !allow_high_precision_mv  
        } else {  
            absBits = GM_ABS_TRANS_BITS  
            precBits = GM_TRANS_PREC_BITS  
        }  
    }  
    precDiff = WARPEDMODEL_PREC_BITS - precBits  
    round = (idx % 3) == 2 ? (1 << WARPEDMODEL_PREC_BITS) : 0  
    sub = (idx % 3) == 2 ? (1 << precBits) : 0  
    mx = (1 << absBits)  
    r = (PrevGmParams[ref][idx] >> precDiff) - sub  
    gm_params[ref][idx] =  
        (decode_signed_subexp_with_ref( -mx, mx + 1, r )<< precDiff) + round  
}  

Note: When force_integer_mv is equal to 1, some fractional bits are still read for the translation components. However, these fractional bits will be discarded during the Setup Global MV process.

Decode signed subexp with ref syntax

decode_signed_subexp_with_ref( low, high, r ) { Type
    x = decode_unsigned_subexp_with_ref(high - low, r - low)  
    return x + low  
}  

Note: decode_signed_subexp_with_ref will return a value in the range low to high - 1 (inclusive).

Decode unsigned subexp with ref syntax

decode_unsigned_subexp_with_ref( mx, r ) { Type
    v = decode_subexp( mx )  
    if ( (r << 1) <= mx ) {  
        return inverse_recenter(r, v)  
    } else {  
        return mx - 1 - inverse_recenter(mx - 1 - r, v)  
    }  
}  

Note: decode_unsigned_subexp_with_ref will return a value in the range 0 to mx - 1 (inclusive).

Decode subexp syntax

decode_subexp( numSyms ) { Type
    i = 0  
    mk = 0  
    k = 3  
    while ( 1 ) {  
        b2 = i ? k + i - 1 : k  
        a = 1 << b2  
        if ( numSyms <= mk + 3 * a ) {  
            subexp_final_bits ns(numSyms - mk)
            return subexp_final_bits + mk  
        } else {  
            subexp_more_bits f(1)
            if ( subexp_more_bits ) {  
               i++  
               mk += a  
            } else {  
               subexp_bits f(b2)
               return subexp_bits + mk  
            }  
        }  
    }  
}  

Inverse recenter function

inverse_recenter( r, v ) { Type
  if ( v > 2 * r )  
    return v  
  else if ( v & 1 )  
    return r - ((v + 1) >> 1)  
  else  
    return r + (v >> 1)  
}  
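Taken together, inverse_recenter and decode_unsigned_subexp_with_ref map small decoded subexp values onto results close to the reference r, alternating below and above it, while remaining a bijection onto 0 .. mx - 1. A Python sketch of that mapping, with `v` standing in for the value returned by decode_subexp (illustrative):

```python
def inverse_recenter(r, v):
    # Small v values land near r: v = 0 -> r, 1 -> r-1, 2 -> r+1, ...
    if v > 2 * r:
        return v
    elif v & 1:
        return r - ((v + 1) >> 1)
    else:
        return r + (v >> 1)

def unsigned_subexp_with_ref(mx, r, v):
    # v stands in for decode_subexp(mx); the result is in 0 .. mx - 1.
    # When r is in the upper half of the range, the mapping is mirrored.
    if (r << 1) <= mx:
        return inverse_recenter(r, v)
    else:
        return mx - 1 - inverse_recenter(mx - 1 - r, v)
```

For any reference r the mapping is a permutation of 0 .. mx - 1, which is what guarantees the ranges stated in the notes above.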

Film grain params syntax

film_grain_params( ) { Type
    if ( !film_grain_params_present ||  
         (!show_frame && !showable_frame) ) {  
        reset_grain_params()  
        return  
    }  
    apply_grain f(1)
    if ( !apply_grain ) {  
        reset_grain_params()  
        return  
    }  
    grain_seed f(16)
    if ( frame_type == INTER_FRAME )  
        update_grain f(1)
    else  
        update_grain = 1  
    if ( !update_grain ) {  
        film_grain_params_ref_idx f(3)
        tempGrainSeed = grain_seed  
        load_grain_params( film_grain_params_ref_idx )  
        grain_seed = tempGrainSeed  
        return  
    }  
    num_y_points f(4)
    for ( i = 0; i < num_y_points; i++ ) {  
        point_y_value[ i ] f(8)
        point_y_scaling[ i ] f(8)
    }  
    if ( mono_chrome ) {  
        chroma_scaling_from_luma = 0  
    } else {  
        chroma_scaling_from_luma f(1)
    }  
    if ( mono_chrome || chroma_scaling_from_luma ||  
         ( subsampling_x == 1 && subsampling_y == 1 &&  
           num_y_points == 0 )  
       ) {  
        num_cb_points = 0  
        num_cr_points = 0  
    } else {  
        num_cb_points f(4)
        for ( i = 0; i < num_cb_points; i++ ) {  
            point_cb_value[ i ] f(8)
            point_cb_scaling[ i ] f(8)
        }  
        num_cr_points f(4)
        for ( i = 0; i < num_cr_points; i++ ) {  
            point_cr_value[ i ] f(8)
            point_cr_scaling[ i ] f(8)
        }  
    }  
    grain_scaling_minus_8 f(2)
    ar_coeff_lag f(2)
    numPosLuma = 2 * ar_coeff_lag * ( ar_coeff_lag + 1 )  
    if ( num_y_points ) {  
        numPosChroma = numPosLuma + 1  
        for ( i = 0; i < numPosLuma; i++ )  
            ar_coeffs_y_plus_128[ i ] f(8)
    } else {  
        numPosChroma = numPosLuma  
    }  
    if ( chroma_scaling_from_luma || num_cb_points ) {  
        for ( i = 0; i < numPosChroma; i++ )  
            ar_coeffs_cb_plus_128[ i ] f(8)
    }  
    if ( chroma_scaling_from_luma || num_cr_points ) {  
        for ( i = 0; i < numPosChroma; i++ )  
            ar_coeffs_cr_plus_128[ i ] f(8)
    }  
    ar_coeff_shift_minus_6 f(2)
    grain_scale_shift f(2)
    if ( num_cb_points ) {  
        cb_mult f(8)
        cb_luma_mult f(8)
        cb_offset f(9)
    }  
    if ( num_cr_points ) {  
        cr_mult f(8)
        cr_luma_mult f(8)
        cr_offset f(9)
    }  
    overlap_flag f(1)
    clip_to_restricted_range f(1)
}  
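The number of autoregressive grain coefficients read depends only on ar_coeff_lag and on whether luma scaling points are present. A small Python sketch of those counts (illustrative):

```python
def ar_coeff_counts(ar_coeff_lag, num_y_points):
    # numPosLuma = 2 * lag * (lag + 1); chroma planes read one extra
    # coefficient (for the co-located luma sample) when luma points exist.
    num_pos_luma = 2 * ar_coeff_lag * (ar_coeff_lag + 1)
    num_pos_chroma = num_pos_luma + (1 if num_y_points else 0)
    return num_pos_luma, num_pos_chroma
```

So the maximum lag of 3 gives 24 luma coefficients and 25 per chroma plane.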

Temporal point info syntax

temporal_point_info( ) { Type
    n = frame_presentation_time_length_minus_1 + 1  
    frame_presentation_time f(n)
}  

Frame OBU syntax

frame_obu( sz ) { Type
    startBitPos = get_position( )  
    frame_header_obu( )  
    byte_alignment( )  
    endBitPos = get_position( )  
    headerBytes = (endBitPos - startBitPos) / 8  
    sz -= headerBytes  
    tile_group_obu( sz )  
}  

Tile group OBU syntax

General tile group OBU syntax

tile_group_obu( sz ) { Type
    NumTiles = TileCols * TileRows  
    startBitPos = get_position( )  
    tile_start_and_end_present_flag = 0  
    if ( NumTiles > 1 )  
        tile_start_and_end_present_flag f(1)
    if ( NumTiles == 1 || !tile_start_and_end_present_flag ) {  
        tg_start = 0  
        tg_end = NumTiles - 1  
    } else {  
        tileBits = TileColsLog2 + TileRowsLog2  
        tg_start f(tileBits)
        tg_end f(tileBits)
    }  
    byte_alignment( )  
    endBitPos = get_position( )  
    headerBytes = (endBitPos - startBitPos) / 8  
    sz -= headerBytes  
   
    for ( TileNum = tg_start; TileNum <= tg_end; TileNum++ ) {  
        tileRow = TileNum / TileCols  
        tileCol = TileNum % TileCols  
        lastTile = TileNum == tg_end  
        if ( lastTile ) {  
            tileSize = sz  
        } else {  
            tile_size_minus_1 le(TileSizeBytes)
            tileSize = tile_size_minus_1 + 1  
            sz -= tileSize + TileSizeBytes  
        }  
        MiRowStart = MiRowStarts[ tileRow ]  
        MiRowEnd = MiRowStarts[ tileRow + 1 ]  
        MiColStart = MiColStarts[ tileCol ]  
        MiColEnd = MiColStarts[ tileCol + 1 ]  
        CurrentQIndex = base_q_idx  
        init_symbol( tileSize )  
        decode_tile( )  
        exit_symbol( )  
    }  
    if ( tg_end == NumTiles - 1 ) {  
        if ( !disable_frame_end_update_cdf ) {  
            frame_end_update_cdf( )  
        }  
        decode_frame_wrapup( )  
        SeenFrameHeader = 0  
    }  
}  
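tile_size_minus_1 uses the le(TileSizeBytes) descriptor, which (per the descriptor definitions elsewhere in this specification) reads an unsigned integer from n bytes, least significant byte first. A hedged Python sketch (illustrative):

```python
def parse_le(data, n):
    # le(n): n bytes, least significant byte first, as used for
    # tile_size_minus_1 in the tile group OBU.
    return int.from_bytes(bytes(data[:n]), "little")
```

With TileSizeBytes = 2, the byte pair 0x34 0x12 decodes to tile_size_minus_1 = 0x1234, so the tile payload is 0x1235 bytes long.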

Decode tile syntax

decode_tile( ) { Type
    clear_above_context( )  
    for ( i = 0; i < FRAME_LF_COUNT; i++ )  
        DeltaLF[ i ] = 0  
    for ( plane = 0; plane < NumPlanes; plane++ ) {  
        for ( pass = 0; pass < 2; pass++ ) {  
            RefSgrXqd[ plane ][ pass ] = Sgrproj_Xqd_Mid[ pass ]  
            for ( i = 0; i < WIENER_COEFFS; i++ ) {  
                RefLrWiener[ plane ][ pass ][ i ] = Wiener_Taps_Mid[ i ]  
            }  
        }  
    }  
    sbSize = use_128x128_superblock ? BLOCK_128X128 : BLOCK_64X64  
    sbSize4 = Num_4x4_Blocks_Wide[ sbSize ]  
    for ( r = MiRowStart; r < MiRowEnd; r += sbSize4 ) {  
        clear_left_context( )  
        for ( c = MiColStart; c < MiColEnd; c += sbSize4 ) {  
            ReadDeltas = delta_q_present  
            clear_cdef( r, c )  
            clear_block_decoded_flags( r, c, sbSize4 )  
            read_lr( r, c, sbSize )  
            decode_partition( r, c, sbSize )  
        }  
    }  
}  
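The nested loops visit superblocks in raster order within the tile, stepping by the superblock size expressed in 4x4 MI units (16 for 64x64 superblocks, 32 for 128x128). A Python sketch of the visiting order (illustrative):

```python
def superblock_origins(mi_row_start, mi_row_end,
                       mi_col_start, mi_col_end,
                       use_128x128_superblock):
    # Raster order of superblock origins (in MI units) within one tile,
    # matching the loop structure of decode_tile().
    sb4 = 32 if use_128x128_superblock else 16
    return [(r, c)
            for r in range(mi_row_start, mi_row_end, sb4)
            for c in range(mi_col_start, mi_col_end, sb4)]
```

A tile spanning MI rows 0..32 and MI columns 0..48 with 64x64 superblocks is decoded as a 2x3 grid of superblocks.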

where Sgrproj_Xqd_Mid and Wiener_Taps_Mid are constant lookup tables specified as:

Wiener_Taps_Mid[3] = {  3,  -7,  15 }

Sgrproj_Xqd_Mid[2] = { -32,  31 }

Clear block decoded flags function

clear_block_decoded_flags( r, c, sbSize4 ) { Type
    for ( plane = 0; plane < NumPlanes; plane++ ) {  
        subX = (plane > 0) ? subsampling_x : 0  
        subY = (plane > 0) ? subsampling_y : 0  
        sbWidth4 = ( MiColEnd - c ) >> subX  
        sbHeight4 = ( MiRowEnd - r ) >> subY  
        for ( y = -1; y <= ( sbSize4 >> subY ); y++ )  
            for ( x = -1; x <= ( sbSize4 >> subX ); x++ ) {  
                if ( y < 0 && x < sbWidth4 )  
                    BlockDecoded[ plane ][ y ][ x ] = 1  
                else if ( x < 0 && y < sbHeight4 )  
                    BlockDecoded[ plane ][ y ][ x ] = 1  
                else  
                    BlockDecoded[ plane ][ y ][ x ] = 0  
            }  
        BlockDecoded[ plane ][ sbSize4 >> subY ][ -1 ] = 0  
    }  
}  

Decode partition syntax

decode_partition( r, c, bSize ) { Type
    if ( r >= MiRows || c >= MiCols )  
        return 0  
    AvailU = is_inside( r - 1, c )  
    AvailL = is_inside( r, c - 1 )  
    num4x4 = Num_4x4_Blocks_Wide[ bSize ]  
    halfBlock4x4 = num4x4 >> 1  
    quarterBlock4x4 = halfBlock4x4 >> 1  
    hasRows = ( r + halfBlock4x4 ) < MiRows  
    hasCols = ( c + halfBlock4x4 ) < MiCols  
    if ( bSize < BLOCK_8X8 ) {  
        partition = PARTITION_NONE  
    } else if ( hasRows && hasCols ) {  
        partition S()
    } else if ( hasCols ) {  
        split_or_horz S()
        partition = split_or_horz ? PARTITION_SPLIT : PARTITION_HORZ  
    } else if ( hasRows ) {  
        split_or_vert S()
        partition = split_or_vert ? PARTITION_SPLIT : PARTITION_VERT  
    } else {  
        partition = PARTITION_SPLIT  
    }  
    subSize = Partition_Subsize[ partition ][ bSize ]  
    splitSize = Partition_Subsize[ PARTITION_SPLIT ][ bSize ]  
    if ( partition == PARTITION_NONE ) {  
        decode_block( r, c, subSize )  
    } else if ( partition == PARTITION_HORZ ) {  
        decode_block( r, c, subSize )  
        if ( hasRows )  
            decode_block( r + halfBlock4x4, c, subSize )  
    } else if ( partition == PARTITION_VERT ) {  
        decode_block( r, c, subSize )  
        if ( hasCols )  
            decode_block( r, c + halfBlock4x4, subSize )  
    } else if ( partition == PARTITION_SPLIT ) {  
        decode_partition( r, c, subSize )  
        decode_partition( r, c + halfBlock4x4, subSize )  
        decode_partition( r + halfBlock4x4, c, subSize )  
        decode_partition( r + halfBlock4x4, c + halfBlock4x4, subSize )  
    } else if ( partition == PARTITION_HORZ_A ) {  
        decode_block( r, c, splitSize )  
        decode_block( r, c + halfBlock4x4, splitSize )  
        decode_block( r + halfBlock4x4, c, subSize )  
    } else if ( partition == PARTITION_HORZ_B ) {  
        decode_block( r, c, subSize )  
        decode_block( r + halfBlock4x4, c, splitSize )  
        decode_block( r + halfBlock4x4, c + halfBlock4x4, splitSize )  
    } else if ( partition == PARTITION_VERT_A ) {  
        decode_block( r, c, splitSize )  
        decode_block( r + halfBlock4x4, c, splitSize )  
        decode_block( r, c + halfBlock4x4, subSize )  
    } else if ( partition == PARTITION_VERT_B ) {  
        decode_block( r, c, subSize )  
        decode_block( r, c + halfBlock4x4, splitSize )  
        decode_block( r + halfBlock4x4, c + halfBlock4x4, splitSize )  
    } else if ( partition == PARTITION_HORZ_4 ) {  
        decode_block( r + quarterBlock4x4 * 0, c, subSize )  
        decode_block( r + quarterBlock4x4 * 1, c, subSize )  
        decode_block( r + quarterBlock4x4 * 2, c, subSize )  
        if ( r + quarterBlock4x4 * 3 < MiRows )  
            decode_block( r + quarterBlock4x4 * 3, c, subSize )  
    } else {  
        decode_block( r, c + quarterBlock4x4 * 0, subSize )  
        decode_block( r, c + quarterBlock4x4 * 1, subSize )  
        decode_block( r, c + quarterBlock4x4 * 2, subSize )  
        if ( c + quarterBlock4x4 * 3 < MiCols )  
            decode_block( r, c + quarterBlock4x4 * 3, subSize )  
    }  
}  
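As an illustration of the recursion above (a sketch, not part of the normative process), the following Python fragment applies PARTITION_SPLIT all the way down and collects the resulting leaf blocks; sizes are counted in 4x4-sample units, mirroring num4x4 and halfBlock4x4:

```python
# Illustrative sketch only: model the PARTITION_SPLIT recursion of
# decode_partition. Sizes are in 4x4-sample units (num4x4 / halfBlock4x4).
def split_partition(r, c, num4x4, out):
    """Collect (row, col, size) leaves of an all-SPLIT partition tree."""
    if num4x4 == 1:                       # smallest block: stop recursing
        out.append((r, c, 1))
        return
    half = num4x4 >> 1                    # halfBlock4x4 in the spec
    # Same visiting order as the spec: top-left, top-right,
    # bottom-left, bottom-right.
    for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
        split_partition(r + dr, c + dc, half, out)

leaves = []
split_partition(0, 0, 4, leaves)          # e.g. a 16x16 block, 4 units wide
```

The four recursive calls at each level tile the parent block exactly, so the leaves cover every 4x4 position once.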

Decode block syntax

decode_block( r, c, subSize ) { Type
    MiRow = r  
    MiCol = c  
    MiSize = subSize  
    bw4 = Num_4x4_Blocks_Wide[ subSize ]  
    bh4 = Num_4x4_Blocks_High[ subSize ]  
    if ( bh4 == 1 && subsampling_y && (MiRow & 1) == 0 )  
        HasChroma = 0  
    else if ( bw4 == 1 && subsampling_x && (MiCol & 1) == 0 )  
        HasChroma = 0  
    else  
        HasChroma = NumPlanes > 1  
    AvailU = is_inside( r - 1, c )  
    AvailL = is_inside( r, c - 1 )  
    AvailUChroma = AvailU  
    AvailLChroma = AvailL  
    if ( HasChroma ) {  
        if ( subsampling_y && bh4 == 1 )  
            AvailUChroma = is_inside( r - 2, c )  
        if ( subsampling_x && bw4 == 1 )  
            AvailLChroma = is_inside( r, c - 2 )  
    } else {  
        AvailUChroma = 0  
        AvailLChroma = 0  
    }  
    mode_info( )  
    palette_tokens( )  
    read_block_tx_size( )  
   
    if ( skip )  
        reset_block_context( bw4, bh4 )  
    isCompound = RefFrame[ 1 ] > INTRA_FRAME  
    for ( y = 0; y < bh4; y++ ) {  
        for ( x = 0; x < bw4; x++ ) {  
            YModes [ r + y ][ c + x ] = YMode  
            if ( RefFrame[ 0 ] == INTRA_FRAME && HasChroma )  
                UVModes [ r + y ][ c + x ] = UVMode  
            for ( refList = 0; refList < 2; refList++ )  
                RefFrames[ r + y ][ c + x ][ refList ] = RefFrame[ refList ]  
            if ( is_inter ) {  
                if ( !use_intrabc ) {  
                  CompGroupIdxs[ r + y ][ c + x ] = comp_group_idx  
                  CompoundIdxs[ r + y ][ c + x ] = compound_idx  
                }  
                for ( dir = 0; dir < 2; dir++ ) {  
                    InterpFilters[ r + y ][ c + x ][ dir ] = interp_filter[ dir ]  
                }  
                for ( refList = 0; refList < 1 + isCompound; refList++ ) {  
                    Mvs[ r + y ][ c + x ][ refList ] = Mv[ refList ]  
                }  
            }  
        }  
    }  
    compute_prediction( )  
    residual( )  
    for ( y = 0; y < bh4; y++ ) {  
        for ( x = 0; x < bw4; x++ ) {  
            IsInters[ r + y ][ c + x ] = is_inter  
            SkipModes[ r + y ][ c + x ] = skip_mode  
            Skips[ r + y ][ c + x ] = skip  
            TxSizes[ r + y ][ c + x ] = TxSize  
            MiSizes[ r + y ][ c + x ] = MiSize  
            SegmentIds[ r + y ][ c + x ] = segment_id  
            PaletteSizes[ 0 ][ r + y ][ c + x ] = PaletteSizeY  
            PaletteSizes[ 1 ][ r + y ][ c + x ] = PaletteSizeUV  
            for ( i = 0; i < PaletteSizeY; i++ )  
                PaletteColors[ 0 ][ r + y ][ c + x ][ i ] = palette_colors_y[ i ]  
            for ( i = 0; i < PaletteSizeUV; i++ )  
                PaletteColors[ 1 ][ r + y ][ c + x ][ i ] = palette_colors_u[ i ]  
            for ( i = 0; i < FRAME_LF_COUNT; i++ )  
                DeltaLFs[ r + y ][ c + x ][ i ] = DeltaLF[ i ]  
        }  
    }  
}  

where reset_block_context( ) is specified as:

reset_block_context( bw4, bh4 ) {
    for ( plane = 0; plane < 1 + 2 * HasChroma; plane++ ) {
        subX = (plane > 0) ? subsampling_x : 0
        subY = (plane > 0) ? subsampling_y : 0
        for ( i = MiCol >> subX; i < ( ( MiCol + bw4 ) >> subX ); i++) {
            AboveLevelContext[ plane ][ i ] = 0
            AboveDcContext[ plane ][ i ] = 0
        }
        for ( i = MiRow >> subY; i < ( ( MiRow + bh4 ) >> subY ); i++) {
            LeftLevelContext[ plane ][ i ] = 0
            LeftDcContext[ plane ][ i ] = 0
        }
    }
}

Mode info syntax

mode_info( ) { Type
    if ( FrameIsIntra )  
        intra_frame_mode_info( )  
    else  
        inter_frame_mode_info( )  
}  

Intra frame mode info syntax

intra_frame_mode_info( ) { Type
    skip = 0  
    if ( SegIdPreSkip )  
        intra_segment_id( )  
    skip_mode = 0  
    read_skip( )  
    if ( !SegIdPreSkip )  
        intra_segment_id( )  
    read_cdef( )  
    read_delta_qindex( )  
    read_delta_lf( )  
    ReadDeltas = 0  
    RefFrame[ 0 ] = INTRA_FRAME  
    RefFrame[ 1 ] = NONE  
    if ( allow_intrabc ) {  
        use_intrabc S()
    } else {  
        use_intrabc = 0  
    }  
    if ( use_intrabc ) {  
        is_inter = 1  
        YMode = DC_PRED  
        UVMode = DC_PRED  
        motion_mode = SIMPLE  
        compound_type = COMPOUND_AVERAGE  
        PaletteSizeY = 0  
        PaletteSizeUV = 0  
        interp_filter[ 0 ] = BILINEAR  
        interp_filter[ 1 ] = BILINEAR  
        find_mv_stack( 0 )  
        assign_mv( 0 )  
    } else {  
        is_inter = 0  
        intra_frame_y_mode S()
        YMode = intra_frame_y_mode  
        intra_angle_info_y( )  
        if ( HasChroma ) {  
            uv_mode S()
            UVMode = uv_mode  
            if ( UVMode == UV_CFL_PRED ) {  
                read_cfl_alphas( )  
            }  
            intra_angle_info_uv( )  
        }  
        PaletteSizeY = 0  
        PaletteSizeUV = 0  
        if ( MiSize >= BLOCK_8X8 &&  
             Block_Width[ MiSize ] <= 64 &&  
             Block_Height[ MiSize ] <= 64 &&  
             allow_screen_content_tools ) {  
            palette_mode_info( )  
        }  
        filter_intra_mode_info( )  
    }  
}  

Intra segment ID syntax

intra_segment_id( ) { Type
    if ( segmentation_enabled )  
        read_segment_id( )  
    else  
        segment_id = 0  
    Lossless = LosslessArray[ segment_id ]  
}  

Read segment ID syntax

read_segment_id( ) { Type
    if ( AvailU && AvailL )  
        prevUL = SegmentIds[ MiRow - 1 ][ MiCol - 1 ]  
    else  
        prevUL = -1  
    if ( AvailU )  
        prevU = SegmentIds[ MiRow - 1 ][ MiCol ]  
    else  
        prevU = -1  
    if ( AvailL )  
        prevL = SegmentIds[ MiRow ][ MiCol - 1 ]  
    else  
        prevL = -1  
    if ( prevU == -1 )  
        pred = (prevL == -1) ? 0 : prevL  
    else if ( prevL == -1 )  
        pred = prevU  
    else  
        pred = (prevUL == prevU) ? prevU : prevL  
    if ( skip ) {  
        segment_id = pred  
    } else {  
        segment_id S()
        segment_id = neg_deinterleave( segment_id, pred,  
                                       LastActiveSegId + 1 )  
    }  
}  

where neg_deinterleave is a function defined as:

neg_deinterleave(diff, ref, max) {
  if ( !ref )
    return diff
  if ( ref >= (max - 1) )
    return max - diff - 1
  if ( 2 * ref < max ) {
    if ( diff <= 2 * ref ) {
      if ( diff & 1 )
        return ref + ((diff + 1) >> 1)
      else
        return ref - (diff >> 1)
    }
    return diff
  } else {
    if ( diff <= 2 * (max - ref - 1) ) {
      if ( diff & 1 )
        return ref + ((diff + 1) >> 1)
      else
        return ref - (diff >> 1)
    }
    return max - (diff + 1)
  }
}
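A direct Python transcription of neg_deinterleave (illustrative, not normative) makes its behaviour easy to probe: small even diffs decode below the predictor, small odd diffs above it, and larger diffs fall through to an absolute value:

```python
def neg_deinterleave(diff, ref, max_val):
    """Map a coded difference back to a segment id near the predictor ref."""
    if not ref:
        return diff
    if ref >= max_val - 1:
        return max_val - diff - 1
    if 2 * ref < max_val:
        if diff <= 2 * ref:
            return ref + ((diff + 1) >> 1) if diff & 1 else ref - (diff >> 1)
        return diff
    else:
        if diff <= 2 * (max_val - ref - 1):
            return ref + ((diff + 1) >> 1) if diff & 1 else ref - (diff >> 1)
        return max_val - (diff + 1)
```

For example, with ref equal to 3 and max equal to 8, diffs 0 through 6 decode to 3, 4, 2, 5, 1, 6, 0, alternating around the predictor.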

Skip mode syntax

read_skip_mode() { Type
    if ( seg_feature_active( SEG_LVL_SKIP ) ||  
         seg_feature_active( SEG_LVL_REF_FRAME ) ||  
         seg_feature_active( SEG_LVL_GLOBALMV ) ||  
         !skip_mode_present ||  
         Block_Width[ MiSize ] < 8 ||  
         Block_Height[ MiSize ] < 8 ) {  
        skip_mode = 0  
    } else {  
        skip_mode S()
    }  
}  

Skip syntax

read_skip() { Type
    if ( SegIdPreSkip && seg_feature_active( SEG_LVL_SKIP ) ) {  
        skip = 1  
    } else {  
        skip S()
    }  
}  

Quantizer index delta syntax

read_delta_qindex( ) { Type
    sbSize = use_128x128_superblock ? BLOCK_128X128 : BLOCK_64X64  
    if ( MiSize == sbSize && skip )  
        return  
    if ( ReadDeltas ) {  
        delta_q_abs S()
        if ( delta_q_abs == DELTA_Q_SMALL ) {  
            delta_q_rem_bits L(3)
            delta_q_rem_bits++  
            delta_q_abs_bits L(delta_q_rem_bits)
            delta_q_abs = delta_q_abs_bits + (1 << delta_q_rem_bits) + 1  
        }  
        if ( delta_q_abs ) {  
            delta_q_sign_bit L(1)
            reducedDeltaQIndex = delta_q_sign_bit ? -delta_q_abs : delta_q_abs  
            CurrentQIndex = Clip3(1, 255,  
                CurrentQIndex + (reducedDeltaQIndex << delta_q_res))  
        }  
    }  
}  
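For clarity, the escape coding above can be modelled as follows (an illustrative sketch, not the normative parse): values 0 to 2 are carried directly by delta_q_abs, while the value DELTA_Q_SMALL (equal to 3) signals a 3-bit length delta_q_rem_bits followed by n = delta_q_rem_bits + 1 extra bits:

```python
DELTA_Q_SMALL = 3  # escape symbol value from the spec's constant tables

def decode_delta_q_abs(delta_q_abs, delta_q_rem_bits=0, delta_q_abs_bits=0):
    """Reconstruct the absolute delta-q value from the parsed fields."""
    if delta_q_abs != DELTA_Q_SMALL:
        return delta_q_abs                  # 0..2 are coded directly
    n = delta_q_rem_bits + 1                # extra-bit count, 1..8
    return delta_q_abs_bits + (1 << n) + 1  # smallest escape value is 3
```

Note the escape range starts at 3 (rem_bits 0, abs_bits 0 gives 0 + 2 + 1), exactly where the directly coded range ends.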

Loop filter delta syntax

read_delta_lf( ) { Type
    sbSize = use_128x128_superblock ? BLOCK_128X128 : BLOCK_64X64  
    if ( MiSize == sbSize && skip )  
        return  
    if ( ReadDeltas && delta_lf_present ) {  
        frameLfCount = 1  
        if ( delta_lf_multi ) {  
            frameLfCount = ( NumPlanes > 1 ) ? FRAME_LF_COUNT : ( FRAME_LF_COUNT - 2 )  
        }  
        for ( i = 0; i < frameLfCount; i++ ) {  
            delta_lf_abs S()
            if ( delta_lf_abs == DELTA_LF_SMALL ) {  
                delta_lf_rem_bits L(3)
                n = delta_lf_rem_bits + 1  
                delta_lf_abs_bits L(n)
                deltaLfAbs = delta_lf_abs_bits +  
                               ( 1 << n ) + 1  
            } else {  
                deltaLfAbs = delta_lf_abs  
            }  
            if ( deltaLfAbs ) {  
                delta_lf_sign_bit L(1)
                reducedDeltaLfLevel = delta_lf_sign_bit ?  
                                      -deltaLfAbs :  
                                       deltaLfAbs  
                DeltaLF[ i ] = Clip3( -MAX_LOOP_FILTER, MAX_LOOP_FILTER, DeltaLF[ i ] +  
                                  (reducedDeltaLfLevel << delta_lf_res) )  
            }  
        }  
    }  
}  

Segmentation feature active function

seg_feature_active_idx( idx, feature ) { Type
    return segmentation_enabled && FeatureEnabled[ idx ][ feature ]  
}  
   
seg_feature_active( feature ) {  
    return seg_feature_active_idx( segment_id, feature )  
}  

TX size syntax

read_tx_size( allowSelect ) { Type
    if ( Lossless ) {  
        TxSize = TX_4X4  
        return  
    }  
    maxRectTxSize = Max_Tx_Size_Rect[ MiSize ]  
    maxTxDepth = Max_Tx_Depth[ MiSize ]  
    TxSize = maxRectTxSize  
    if ( MiSize > BLOCK_4X4 && allowSelect && TxMode == TX_MODE_SELECT ) {  
        tx_depth S()
        for ( i = 0; i < tx_depth; i++ )  
            TxSize = Split_Tx_Size[ TxSize ]  
    }  
}  

The Max_Tx_Depth table specifies the maximum transform depth for each block size:

Max_Tx_Depth[ BLOCK_SIZES ] = {
    0, 1, 1, 1,
    2, 2, 2, 3,
    3, 3, 4, 4,
    4, 4, 4, 4,
    2, 2, 3, 3,
    4, 4
}

Note: Max_Tx_Depth contains the number of times the transform must be split to reach a 4x4 transform size. This number can be greater than MAX_TX_DEPTH. However, it is impossible to encode a transform depth greater than MAX_TX_DEPTH because tx_depth can only encode values in the range 0 to 2.

Block TX size syntax

read_block_tx_size( ) { Type
    bw4 = Num_4x4_Blocks_Wide[ MiSize ]  
    bh4 = Num_4x4_Blocks_High[ MiSize ]  
    if ( TxMode == TX_MODE_SELECT &&  
          MiSize > BLOCK_4X4 && is_inter &&  
          !skip && !Lossless ) {  
        maxTxSz = Max_Tx_Size_Rect[ MiSize ]  
        txW4 = Tx_Width[ maxTxSz ] / MI_SIZE  
        txH4 = Tx_Height[ maxTxSz ] / MI_SIZE  
        for ( row = MiRow; row < MiRow + bh4; row += txH4 )  
            for ( col = MiCol; col < MiCol + bw4; col += txW4 )  
                read_var_tx_size( row, col, maxTxSz, 0 )  
    } else {  
        read_tx_size(!skip || !is_inter)  
        for ( row = MiRow; row < MiRow + bh4; row++ )  
            for ( col = MiCol; col < MiCol + bw4; col++ )  
                InterTxSizes[ row ][ col ] = TxSize  
    }  
}  

Var TX size syntax

read_var_tx_size is used to read a transform size tree.

read_var_tx_size( row, col, txSz, depth) { Type
    if ( row >= MiRows || col >= MiCols )  
        return  
    if ( txSz == TX_4X4 || depth == MAX_VARTX_DEPTH ) {  
        txfm_split = 0  
    } else {  
        txfm_split S()
    }  
    w4 = Tx_Width[ txSz ] / MI_SIZE  
    h4 = Tx_Height[ txSz ] / MI_SIZE  
    if ( txfm_split ) {  
        subTxSz = Split_Tx_Size[ txSz ]  
        stepW = Tx_Width[ subTxSz ] / MI_SIZE  
        stepH = Tx_Height[ subTxSz ] / MI_SIZE  
        for ( i = 0; i < h4; i += stepH )  
            for ( j = 0; j < w4; j += stepW )  
                read_var_tx_size( row + i, col + j, subTxSz, depth+1)  
    } else {  
        for ( i = 0; i < h4; i++ )  
            for ( j = 0; j < w4; j++ )  
                InterTxSizes[ row + i ][ col + j ] = txSz  
        TxSize = txSz  
    }  
}  

Inter frame mode info syntax

inter_frame_mode_info( ) { Type
    use_intrabc = 0  
    LeftRefFrame[ 0 ] = AvailL ? RefFrames[ MiRow ][ MiCol-1 ][ 0 ] : INTRA_FRAME  
    AboveRefFrame[ 0 ] = AvailU ? RefFrames[ MiRow-1 ][ MiCol ][ 0 ] : INTRA_FRAME  
    LeftRefFrame[ 1 ] = AvailL ? RefFrames[ MiRow ][ MiCol-1 ][ 1 ] : NONE  
    AboveRefFrame[ 1 ] = AvailU ? RefFrames[ MiRow-1 ][ MiCol ][ 1 ] : NONE  
    LeftIntra = LeftRefFrame[ 0 ] <= INTRA_FRAME  
    AboveIntra = AboveRefFrame[ 0 ] <= INTRA_FRAME  
    LeftSingle = LeftRefFrame[ 1 ] <= INTRA_FRAME  
    AboveSingle = AboveRefFrame[ 1 ] <= INTRA_FRAME  
    skip = 0  
    inter_segment_id( 1 )  
    read_skip_mode( )  
    if ( skip_mode )  
        skip = 1  
    else  
        read_skip( )  
    if ( !SegIdPreSkip )  
        inter_segment_id( 0 )  
    Lossless = LosslessArray[ segment_id ]  
    read_cdef( )  
    read_delta_qindex( )  
    read_delta_lf( )  
    ReadDeltas = 0  
    read_is_inter( )  
    if ( is_inter )  
        inter_block_mode_info( )  
    else  
        intra_block_mode_info( )  
}  

Inter segment ID syntax

This is called before (preSkip equal to 1) and after (preSkip equal to 0) the skip syntax element has been read.

inter_segment_id( preSkip ) { Type
    if ( segmentation_enabled ) {  
        predictedSegmentId = get_segment_id( )  
        if ( segmentation_update_map ) {  
            if ( preSkip && !SegIdPreSkip ) {  
                segment_id = 0  
                return  
            }  
            if ( !preSkip ) {  
                if ( skip ) {  
                    seg_id_predicted = 0  
                    for ( i = 0; i < Num_4x4_Blocks_Wide[ MiSize ]; i++ )  
                        AboveSegPredContext[ MiCol + i ] = seg_id_predicted  
                    for ( i = 0; i < Num_4x4_Blocks_High[ MiSize ]; i++ )  
                        LeftSegPredContext[ MiRow + i ] = seg_id_predicted  
                    read_segment_id( )  
                    return  
                }  
            }  
            if ( segmentation_temporal_update == 1 ) {  
                seg_id_predicted S()
                if ( seg_id_predicted )  
                    segment_id = predictedSegmentId  
                else  
                    read_segment_id( )  
                for ( i = 0; i < Num_4x4_Blocks_Wide[ MiSize ]; i++ )  
                    AboveSegPredContext[ MiCol + i ] = seg_id_predicted  
                for ( i = 0; i < Num_4x4_Blocks_High[ MiSize ]; i++ )  
                    LeftSegPredContext[ MiRow + i ] = seg_id_predicted  
            } else {  
                read_segment_id( )  
            }  
        } else {  
            segment_id = predictedSegmentId  
        }  
    } else {  
        segment_id = 0  
    }  
}  

Is inter syntax

read_is_inter( ) { Type
    if ( skip_mode ) {  
        is_inter = 1  
    } else if ( seg_feature_active ( SEG_LVL_REF_FRAME ) ) {  
        is_inter = FeatureData[ segment_id ][ SEG_LVL_REF_FRAME ] != INTRA_FRAME  
    } else if ( seg_feature_active ( SEG_LVL_GLOBALMV ) ) {  
        is_inter = 1  
    } else {  
        is_inter S()
    }  
}  

Get segment ID function

The predicted segment id is the smallest value found in the on-screen region of the segmentation map covered by the current block.

get_segment_id( ) { Type
    bw4 = Num_4x4_Blocks_Wide[ MiSize ]  
    bh4 = Num_4x4_Blocks_High[ MiSize ]  
    xMis = Min( MiCols - MiCol, bw4 )  
    yMis = Min( MiRows - MiRow, bh4 )  
    seg = 7  
    for ( y = 0; y < yMis; y++ )  
        for ( x = 0; x < xMis; x++ )  
            seg = Min( seg, PrevSegmentIds[ MiRow + y ][ MiCol + x ] )  
    return seg  
}  
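The same minimum scan can be written directly in Python (illustrative only); here PrevSegmentIds is modelled as a plain 2-D list and the clamps mirror xMis and yMis:

```python
def get_segment_id(prev_ids, mi_row, mi_col, bw4, bh4, mi_rows, mi_cols):
    """Smallest previous segment id in the on-screen part of the block."""
    x_mis = min(mi_cols - mi_col, bw4)   # clip the scan to the frame
    y_mis = min(mi_rows - mi_row, bh4)
    seg = 7                              # largest possible segment id
    for y in range(y_mis):
        for x in range(x_mis):
            seg = min(seg, prev_ids[mi_row + y][mi_col + x])
    return seg
```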

Intra block mode info syntax

intra_block_mode_info( ) { Type
    RefFrame[ 0 ] = INTRA_FRAME  
    RefFrame[ 1 ] = NONE  
    y_mode S()
    YMode = y_mode  
    intra_angle_info_y( )  
    if ( HasChroma ) {  
        uv_mode S()
        UVMode = uv_mode  
        if ( UVMode == UV_CFL_PRED ) {  
            read_cfl_alphas( )  
        }  
        intra_angle_info_uv( )  
    }  
    PaletteSizeY = 0  
    PaletteSizeUV = 0  
    if ( MiSize >= BLOCK_8X8 &&  
         Block_Width[ MiSize ] <= 64 &&  
         Block_Height[ MiSize ] <= 64 &&  
         allow_screen_content_tools )  
        palette_mode_info( )  
    filter_intra_mode_info( )  
}  

Inter block mode info syntax

inter_block_mode_info( ) { Type
    PaletteSizeY = 0  
    PaletteSizeUV = 0  
    read_ref_frames( )  
    isCompound = RefFrame[ 1 ] > INTRA_FRAME  
    find_mv_stack( isCompound )  
    if ( skip_mode ) {  
        YMode = NEAREST_NEARESTMV  
    } else if ( seg_feature_active( SEG_LVL_SKIP ) ||  
         seg_feature_active( SEG_LVL_GLOBALMV ) ) {  
        YMode = GLOBALMV  
    } else if ( isCompound ) {  
        compound_mode S()
        YMode = NEAREST_NEARESTMV + compound_mode  
    } else {  
        new_mv S()
        if ( new_mv == 0 ) {  
            YMode = NEWMV  
        } else {  
            zero_mv S()
            if ( zero_mv == 0 ) {  
                YMode = GLOBALMV  
            } else {  
                ref_mv S()
                YMode = (ref_mv == 0) ? NEARESTMV : NEARMV  
            }  
        }  
    }  
    RefMvIdx = 0  
    if ( YMode == NEWMV || YMode == NEW_NEWMV ) {  
        for ( idx = 0; idx < 2; idx++ ) {  
            if ( NumMvFound > idx + 1 ) {  
                drl_mode S()
                if ( drl_mode == 0 ) {  
                  RefMvIdx = idx  
                  break  
                }  
                RefMvIdx = idx + 1  
            }  
        }  
    } else if ( has_nearmv( ) ) {  
        RefMvIdx = 1  
        for ( idx = 1; idx < 3; idx++ ) {  
            if ( NumMvFound > idx + 1 ) {  
                drl_mode S()
                if ( drl_mode == 0 ) {  
                    RefMvIdx = idx  
                    break  
                }  
                RefMvIdx = idx + 1  
            }  
        }  
    }  
    assign_mv( isCompound )  
    read_interintra_mode( isCompound )  
    read_motion_mode( isCompound )  
    read_compound_type( isCompound )  
    if ( interpolation_filter == SWITCHABLE ) {  
        for ( dir = 0; dir < ( enable_dual_filter ? 2 : 1 ); dir++ ) {  
            if ( needs_interp_filter( ) ) {  
                interp_filter[ dir ] S()
            } else {  
                interp_filter[ dir ] = EIGHTTAP  
            }  
        }  
        if ( !enable_dual_filter )  
            interp_filter[ 1 ] = interp_filter[ 0 ]  
    } else {  
        for ( dir = 0; dir < 2; dir++ )  
            interp_filter[ dir ] = interpolation_filter  
    }  
}  

The function has_nearmv is defined as:

has_nearmv( ) {
    return (YMode == NEARMV || YMode == NEAR_NEARMV
            || YMode == NEAR_NEWMV || YMode == NEW_NEARMV)
}

The function needs_interp_filter is defined as:

needs_interp_filter( ) {
    large = (Min(Block_Width[MiSize], Block_Height[MiSize]) >= 8)
    if ( skip_mode || motion_mode == LOCALWARP ) {
        return 0
    } else if ( large && YMode == GLOBALMV ) {
        return GmType[ RefFrame[ 0 ] ] == TRANSLATION
    } else if ( large && YMode == GLOBAL_GLOBALMV ) {
        return GmType[ RefFrame[ 0 ] ] == TRANSLATION || GmType[ RefFrame[ 1 ] ] == TRANSLATION
    } else {
        return 1
    }
}

Filter intra mode info syntax

filter_intra_mode_info( ) { Type
    use_filter_intra = 0  
    if ( enable_filter_intra &&  
         YMode == DC_PRED && PaletteSizeY == 0 &&  
         Max( Block_Width[ MiSize ], Block_Height[ MiSize ] ) <= 32 ) {  
        use_filter_intra S()
        if ( use_filter_intra ) {  
            filter_intra_mode S()
        }  
    }  
}  

Ref frames syntax

read_ref_frames( ) { Type
    if ( skip_mode ) {  
        RefFrame[ 0 ] = SkipModeFrame[ 0 ]  
        RefFrame[ 1 ] = SkipModeFrame[ 1 ]  
    } else if ( seg_feature_active( SEG_LVL_REF_FRAME ) ) {  
        RefFrame[ 0 ] = FeatureData[ segment_id ][ SEG_LVL_REF_FRAME ]  
        RefFrame[ 1 ] = NONE  
    } else if ( seg_feature_active( SEG_LVL_SKIP ) ||  
                seg_feature_active( SEG_LVL_GLOBALMV ) ) {  
        RefFrame[ 0 ] = LAST_FRAME  
        RefFrame[ 1 ] = NONE  
    } else {  
        bw4 = Num_4x4_Blocks_Wide[ MiSize ]  
        bh4 = Num_4x4_Blocks_High[ MiSize ]  
        if ( reference_select && ( Min( bw4, bh4 ) >= 2 ) )  
            comp_mode S()
        else  
            comp_mode = SINGLE_REFERENCE  
        if ( comp_mode == COMPOUND_REFERENCE ) {  
            comp_ref_type S()
            if ( comp_ref_type == UNIDIR_COMP_REFERENCE ) {  
                uni_comp_ref S()
                if ( uni_comp_ref ) {  
                    RefFrame[0] = BWDREF_FRAME  
                    RefFrame[1] = ALTREF_FRAME  
                } else {  
                    uni_comp_ref_p1 S()
                    if ( uni_comp_ref_p1 ) {  
                        uni_comp_ref_p2 S()
                        if ( uni_comp_ref_p2 ) {  
                          RefFrame[0] = LAST_FRAME  
                          RefFrame[1] = GOLDEN_FRAME  
                        } else {  
                          RefFrame[0] = LAST_FRAME  
                          RefFrame[1] = LAST3_FRAME  
                        }  
                    } else {  
                        RefFrame[0] = LAST_FRAME  
                        RefFrame[1] = LAST2_FRAME  
                    }  
                }  
            } else {  
                comp_ref S()
                if ( comp_ref == 0 ) {  
                    comp_ref_p1 S()
                    RefFrame[ 0 ] = comp_ref_p1 ?  
                                    LAST2_FRAME : LAST_FRAME  
                } else {  
                    comp_ref_p2 S()
                    RefFrame[ 0 ] = comp_ref_p2 ?  
                                    GOLDEN_FRAME : LAST3_FRAME  
                }  
                comp_bwdref S()
                if ( comp_bwdref == 0 ) {  
                    comp_bwdref_p1 S()
                    RefFrame[ 1 ] = comp_bwdref_p1 ?  
                                     ALTREF2_FRAME : BWDREF_FRAME  
                } else {  
                    RefFrame[ 1 ] = ALTREF_FRAME  
                }  
            }  
        } else {  
            single_ref_p1 S()
            if ( single_ref_p1 ) {  
                single_ref_p2 S()
                if ( single_ref_p2 == 0 ) {  
                    single_ref_p6 S()
                    RefFrame[ 0 ] = single_ref_p6 ?  
                                     ALTREF2_FRAME : BWDREF_FRAME  
                } else {  
                    RefFrame[ 0 ] = ALTREF_FRAME  
                }  
            } else {  
                single_ref_p3 S()
                if ( single_ref_p3 ) {  
                    single_ref_p5 S()
                    RefFrame[ 0 ] = single_ref_p5 ?  
                                     GOLDEN_FRAME : LAST3_FRAME  
                } else {  
                    single_ref_p4 S()
                    RefFrame[ 0 ] = single_ref_p4 ?  
                                     LAST2_FRAME : LAST_FRAME  
                }  
            }  
            RefFrame[ 1 ] = NONE  
        }  
    }  
}  

Assign MV syntax

assign_mv( isCompound ) { Type
    for ( i = 0; i < 1 + isCompound; i++ ) {  
        if ( use_intrabc ) {  
            compMode = NEWMV  
        } else {  
            compMode = get_mode( i )  
        }  
        if ( use_intrabc ) {  
            PredMv[ 0 ] = RefStackMv[ 0 ][ 0 ]  
            if ( PredMv[ 0 ][ 0 ] == 0 && PredMv[ 0 ][ 1 ] == 0 ) {  
                PredMv[ 0 ] = RefStackMv[ 1 ][ 0 ]  
            }  
            if ( PredMv[ 0 ][ 0 ] == 0 && PredMv[ 0 ][ 1 ] == 0 ) {  
                sbSize = use_128x128_superblock ? BLOCK_128X128 : BLOCK_64X64  
                sbSize4 = Num_4x4_Blocks_High[ sbSize ]  
                if ( MiRow - sbSize4 < MiRowStart ) {  
                    PredMv[ 0 ][ 0 ] = 0  
                    PredMv[ 0 ][ 1 ] = -(sbSize4 * MI_SIZE + INTRABC_DELAY_PIXELS) * 8  
                } else {  
                    PredMv[ 0 ][ 0 ] = -(sbSize4 * MI_SIZE * 8)  
                    PredMv[ 0 ][ 1 ] = 0  
                }  
            }  
        } else if ( compMode == GLOBALMV ) {  
            PredMv[ i ] = GlobalMvs[ i ]  
        } else {  
            pos = ( compMode == NEARESTMV ) ? 0 : RefMvIdx  
            if ( compMode == NEWMV && NumMvFound <= 1 )  
                pos = 0  
            PredMv[ i ] = RefStackMv[ pos ][ i ]  
        }  
        if ( compMode == NEWMV ) {  
            read_mv( i )  
        } else {  
            Mv[ i ] = PredMv[ i ]  
        }  
    }  
}  
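When use_intrabc is set and both stack candidates are zero, the fallback prediction vector above points either far enough left or far enough up (by an extra INTRABC_DELAY_PIXELS, equal to 256) to respect the intra block copy delay. An illustrative sketch of that arithmetic, in 1/8-pel units:

```python
MI_SIZE = 4                 # samples per mode-info unit
INTRABC_DELAY_PIXELS = 256  # intra block copy delay from the spec

def intrabc_fallback_dv(sb_size4, in_first_sb_row):
    """Default (row, col) block vector in 1/8-pel units when the
    reference MV stack is empty, mirroring assign_mv's fallback."""
    if in_first_sb_row:
        # Point left past the delay region.
        return (0, -(sb_size4 * MI_SIZE + INTRABC_DELAY_PIXELS) * 8)
    # Otherwise point up one superblock.
    return (-(sb_size4 * MI_SIZE * 8), 0)
```

For a 128x128 superblock (sb_size4 equal to 32) in the first superblock row this gives a horizontal offset of (128 + 256) * 8 = 3072 eighth-pels to the left.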

Read motion mode syntax

read_motion_mode( isCompound ) { Type
    if ( skip_mode ) {  
        motion_mode = SIMPLE  
        return  
    }  
    if ( !is_motion_mode_switchable ) {  
        motion_mode = SIMPLE  
        return  
    }  
    if ( Min( Block_Width[ MiSize ],  
              Block_Height[ MiSize ] ) < 8 ) {  
        motion_mode = SIMPLE  
        return  
    }  
    if ( !force_integer_mv &&  
         ( YMode == GLOBALMV || YMode == GLOBAL_GLOBALMV ) ) {  
        if ( GmType[ RefFrame[ 0 ] ] > TRANSLATION ) {  
            motion_mode = SIMPLE  
            return  
        }  
    }  
    if ( isCompound || RefFrame[ 1 ] == INTRA_FRAME || !has_overlappable_candidates( ) ) {  
        motion_mode = SIMPLE  
        return  
    }  
    find_warp_samples()  
    if ( force_integer_mv || NumSamples == 0 ||  
         !allow_warped_motion || is_scaled( RefFrame[0] ) ) {  
        use_obmc S()
        motion_mode = use_obmc ? OBMC : SIMPLE  
    } else {  
        motion_mode S()
    }  
}  

where is_scaled is a function that determines whether a reference frame uses scaling and is specified as:

is_scaled( refFrame ) {
  refIdx = ref_frame_idx[ refFrame - LAST_FRAME ]
  xScale = ( ( RefUpscaledWidth[ refIdx ] << REF_SCALE_SHIFT ) + ( FrameWidth / 2 ) ) / FrameWidth
  yScale = ( ( RefFrameHeight[ refIdx ] << REF_SCALE_SHIFT ) + ( FrameHeight / 2 ) ) / FrameHeight
  noScale = 1 << REF_SCALE_SHIFT
  return xScale != noScale || yScale != noScale
}
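The fixed-point comparison can be sketched in Python (illustrative; REF_SCALE_SHIFT is 14 in the spec's constant tables): a reference counts as scaled when either rounded Q14 ratio differs from 1.0:

```python
REF_SCALE_SHIFT = 14

def is_scaled(ref_width, ref_height, frame_width, frame_height):
    """True when reference and current frame sizes differ in Q14 ratio."""
    x_scale = ((ref_width << REF_SCALE_SHIFT) + frame_width // 2) // frame_width
    y_scale = ((ref_height << REF_SCALE_SHIFT) + frame_height // 2) // frame_height
    no_scale = 1 << REF_SCALE_SHIFT       # Q14 representation of 1.0
    return x_scale != no_scale or y_scale != no_scale
```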

Read inter intra syntax

read_interintra_mode( isCompound ) { Type
    if ( !skip_mode && enable_interintra_compound && !isCompound &&  
         MiSize >= BLOCK_8X8 && MiSize <= BLOCK_32X32) {  
        interintra S()
        if ( interintra ) {  
            interintra_mode S()
            RefFrame[1] = INTRA_FRAME  
            AngleDeltaY = 0  
            AngleDeltaUV = 0  
            use_filter_intra = 0  
            wedge_interintra S()
            if ( wedge_interintra ) {  
                wedge_index S()
                wedge_sign = 0  
            }  
        }  
    } else {  
        interintra = 0  
    }  
}  

Read compound type syntax

read_compound_type( isCompound ) { Type
    comp_group_idx = 0  
    compound_idx = 1  
    if ( skip_mode ) {  
        compound_type = COMPOUND_AVERAGE  
        return  
    }  
    if ( isCompound ) {  
        n = Wedge_Bits[ MiSize ]  
        if ( enable_masked_compound ) {  
              comp_group_idx S()
        }  
        if ( comp_group_idx == 0 ) {  
            if ( enable_jnt_comp ) {  
                compound_idx S()
                compound_type = compound_idx ? COMPOUND_AVERAGE :  
                                               COMPOUND_DISTANCE  
            } else {  
                compound_type = COMPOUND_AVERAGE  
            }  
        } else {  
            if ( n == 0 ) {  
                compound_type = COMPOUND_DIFFWTD  
            } else {  
                compound_type S()
            }  
        }  
        if ( compound_type == COMPOUND_WEDGE ) {  
            wedge_index S()
            wedge_sign L(1)
        } else if ( compound_type == COMPOUND_DIFFWTD ) {  
            mask_type L(1)
        }  
    } else {  
        if ( interintra ) {  
            compound_type = wedge_interintra ? COMPOUND_WEDGE : COMPOUND_INTRA  
        } else {  
            compound_type = COMPOUND_AVERAGE  
        }  
    }  
}  

Get mode function

get_mode( refList ) { Type
    if ( refList == 0 ) {  
        if ( YMode < NEAREST_NEARESTMV )  
            compMode = YMode  
        else if ( YMode == NEW_NEWMV || YMode == NEW_NEARESTMV || YMode == NEW_NEARMV )  
            compMode = NEWMV  
        else if ( YMode == NEAREST_NEARESTMV || YMode == NEAREST_NEWMV )  
            compMode = NEARESTMV  
        else if ( YMode == NEAR_NEARMV || YMode == NEAR_NEWMV )  
            compMode = NEARMV  
        else  
            compMode = GLOBALMV  
    } else {  
        if ( YMode == NEW_NEWMV || YMode == NEAREST_NEWMV || YMode == NEAR_NEWMV )  
            compMode = NEWMV  
        else if ( YMode == NEAREST_NEARESTMV || YMode == NEW_NEARESTMV )  
            compMode = NEARESTMV  
        else if ( YMode == NEAR_NEARMV || YMode == NEW_NEARMV )  
            compMode = NEARMV  
        else  
            compMode = GLOBALMV  
    }  
    return compMode  
}  

MV syntax

read_mv( ref ) { Type
    diffMv[ 0 ] = 0  
    diffMv[ 1 ] = 0  
    if ( use_intrabc ) {  
        MvCtx = MV_INTRABC_CONTEXT  
    } else {  
        MvCtx = 0  
    }  
    mv_joint S()
    if ( mv_joint == MV_JOINT_HZVNZ || mv_joint == MV_JOINT_HNZVNZ )  
        diffMv[ 0 ] = read_mv_component( 0 )  
    if ( mv_joint == MV_JOINT_HNZVZ || mv_joint == MV_JOINT_HNZVNZ )  
        diffMv[ 1 ] = read_mv_component( 1 )  
    Mv[ ref ][ 0 ] = PredMv[ ref ][ 0 ] + diffMv[ 0 ]  
    Mv[ ref ][ 1 ] = PredMv[ ref ][ 1 ] + diffMv[ 1 ]  
}  

MV component syntax

read_mv_component( comp ) { Type
    mv_sign S()
    mv_class S()
    if ( mv_class == MV_CLASS_0 ) {  
        mv_class0_bit S()
        if ( force_integer_mv )  
            mv_class0_fr = 3  
        else  
            mv_class0_fr S()
        if ( allow_high_precision_mv )  
            mv_class0_hp S()
        else  
            mv_class0_hp = 1  
        mag = ( ( mv_class0_bit << 3 ) |  
                ( mv_class0_fr << 1 ) |  
                  mv_class0_hp ) + 1  
    } else {  
        d = 0  
        for ( i = 0; i < mv_class; i++ ) {  
            mv_bit S()
            d |= mv_bit << i  
        }  
        mag = CLASS0_SIZE << ( mv_class + 2 )  
        if ( force_integer_mv )  
            mv_fr = 3  
        else  
            mv_fr S()
        if ( allow_high_precision_mv )  
            mv_hp S()
        else  
            mv_hp = 1  
        mag += ( ( d << 3 ) | ( mv_fr << 1 ) | mv_hp ) + 1  
    }  
    return mv_sign ? -mag : mag  
}  
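The magnitude assembly above packs the integer, fractional, and high-precision bits into eighth-pel units. The following is an illustrative sketch (not part of the specification) of just that assembly step, assuming the spec constant CLASS0_SIZE is 2:

```python
CLASS0_SIZE = 2  # spec constant (assumed here)

def mv_component_magnitude(mv_class, d, fr, hp):
    # For MV_CLASS_0, d is the single mv_class0_bit integer bit;
    # otherwise d holds the mv_class integer bits read by the loop.
    if mv_class == 0:
        return ((d << 3) | (fr << 1) | hp) + 1
    # Each higher class starts at CLASS0_SIZE << (mv_class + 2) eighth-pels.
    mag = CLASS0_SIZE << (mv_class + 2)
    return mag + (((d << 3) | (fr << 1) | hp) + 1)
```

For example, a class-0 component with mv_class0_bit = 1, mv_class0_fr = 2, mv_class0_hp = 0 yields a magnitude of 13 eighth-pel units.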

Compute prediction syntax

compute_prediction() { Type
    sbMask = use_128x128_superblock ? 31 : 15  
    subBlockMiRow = MiRow & sbMask  
    subBlockMiCol = MiCol & sbMask  
    for ( plane = 0; plane < 1 + HasChroma * 2; plane++ ) {  
        planeSz = get_plane_residual_size( MiSize, plane )  
        num4x4W = Num_4x4_Blocks_Wide[ planeSz ]  
        num4x4H = Num_4x4_Blocks_High[ planeSz ]  
        log2W = MI_SIZE_LOG2 + Mi_Width_Log2[ planeSz ]  
        log2H = MI_SIZE_LOG2 + Mi_Height_Log2[ planeSz ]  
        subX = (plane > 0) ? subsampling_x : 0  
        subY = (plane > 0) ? subsampling_y : 0  
        baseX = (MiCol >> subX) * MI_SIZE  
        baseY = (MiRow >> subY) * MI_SIZE  
        candRow = (MiRow >> subY) << subY  
        candCol = (MiCol >> subX) << subX  
   
        IsInterIntra = ( is_inter && RefFrame[ 1 ] == INTRA_FRAME )  
        if ( IsInterIntra ) {  
            if ( interintra_mode == II_DC_PRED ) mode = DC_PRED  
            else if ( interintra_mode == II_V_PRED ) mode = V_PRED  
            else if ( interintra_mode == II_H_PRED ) mode = H_PRED  
            else mode = SMOOTH_PRED  
            predict_intra( plane, baseX, baseY,  
                           plane == 0 ? AvailL : AvailLChroma,  
                           plane == 0 ? AvailU : AvailUChroma,  
                           BlockDecoded[ plane ]  
                                       [ ( subBlockMiRow >> subY ) - 1 ]  
                                       [ ( subBlockMiCol >> subX ) + num4x4W ],  
                           BlockDecoded[ plane ]  
                                       [ ( subBlockMiRow >> subY ) + num4x4H ]  
                                       [ ( subBlockMiCol >> subX ) - 1 ],  
                           mode,  
                           log2W, log2H )  
        }  
        if ( is_inter ) {  
            predW = Block_Width[ MiSize ] >> subX  
            predH = Block_Height[ MiSize ] >> subY  
            someUseIntra = 0  
            for ( r = 0; r < (num4x4H << subY); r++ )  
                for ( c = 0; c < (num4x4W << subX); c++ )  
                    if ( RefFrames[ candRow + r ][ candCol + c ][ 0 ] == INTRA_FRAME )  
                        someUseIntra = 1  
            if ( someUseIntra ) {  
                predW = num4x4W * 4  
                predH = num4x4H * 4  
                candRow = MiRow  
                candCol = MiCol  
            }  
            r = 0  
            for ( y = 0; y < num4x4H * 4; y += predH ) {  
                c = 0  
                for ( x = 0; x < num4x4W * 4; x += predW ) {  
                    predict_inter( plane, baseX + x, baseY + y,  
                                   predW, predH,  
                                   candRow + r, candCol + c)  
                    c++  
                }  
                r++  
            }  
        }  
    }  
}  

Residual syntax

residual( ) { Type
    sbMask = use_128x128_superblock ? 31 : 15  
   
    widthChunks = Max( 1, Block_Width[ MiSize ] >> 6 )  
    heightChunks = Max( 1, Block_Height[ MiSize ] >> 6 )  
   
    miSizeChunk = ( widthChunks > 1 || heightChunks > 1 ) ? BLOCK_64X64 : MiSize  
   
    for ( chunkY = 0; chunkY < heightChunks; chunkY++ ) {  
        for ( chunkX = 0; chunkX < widthChunks; chunkX++ ) {  
            miRowChunk = MiRow + ( chunkY << 4 )  
            miColChunk = MiCol + ( chunkX << 4 )  
            subBlockMiRow = miRowChunk & sbMask  
            subBlockMiCol = miColChunk & sbMask  
   
            for ( plane = 0; plane < 1 + HasChroma * 2; plane++ ) {  
                txSz = Lossless ? TX_4X4 : get_tx_size( plane, TxSize )  
                stepX = Tx_Width[ txSz ] >> 2  
                stepY = Tx_Height[ txSz ] >> 2  
                planeSz = get_plane_residual_size( miSizeChunk, plane )  
                num4x4W = Num_4x4_Blocks_Wide[ planeSz ]  
                num4x4H = Num_4x4_Blocks_High[ planeSz ]  
                subX = (plane > 0) ? subsampling_x : 0  
                subY = (plane > 0) ? subsampling_y : 0  
                baseX = (miColChunk >> subX) * MI_SIZE  
                baseY = (miRowChunk >> subY) * MI_SIZE  
                if ( is_inter && !Lossless && !plane ) {  
                    transform_tree( baseX, baseY, num4x4W * 4, num4x4H * 4 )  
                } else {  
                    baseXBlock = (MiCol >> subX) * MI_SIZE  
                    baseYBlock = (MiRow >> subY) * MI_SIZE  
                    for ( y = 0; y < num4x4H; y += stepY )  
                        for ( x = 0; x < num4x4W; x += stepX )  
                            transform_block( plane, baseXBlock, baseYBlock, txSz,  
                                             x + ( ( chunkX << 4 ) >> subX ),  
                                             y + ( ( chunkY << 4 ) >> subY ) )  
                }  
            }  
        }  
    }  
}  

Transform block syntax

transform_block(plane, baseX, baseY, txSz, x, y) { Type
    startX = baseX + 4 * x  
    startY = baseY + 4 * y  
    subX = (plane > 0) ? subsampling_x : 0  
    subY = (plane > 0) ? subsampling_y : 0  
    row = ( startY << subY ) >> MI_SIZE_LOG2  
    col = ( startX << subX ) >> MI_SIZE_LOG2  
    sbMask = use_128x128_superblock ? 31 : 15  
    subBlockMiRow = row & sbMask  
    subBlockMiCol = col & sbMask  
    stepX = Tx_Width[ txSz ] >> MI_SIZE_LOG2  
    stepY = Tx_Height[ txSz ] >> MI_SIZE_LOG2  
    maxX = (MiCols * MI_SIZE) >> subX  
    maxY = (MiRows * MI_SIZE) >> subY  
    if ( startX >= maxX || startY >= maxY ) {  
        return  
    }  
    if ( !is_inter ) {  
        if ( ( ( plane == 0 ) && PaletteSizeY ) ||  
             ( ( plane != 0 ) && PaletteSizeUV ) ) {  
            predict_palette( plane, startX, startY, x, y, txSz )  
        } else {  
            isCfl = (plane > 0 && UVMode == UV_CFL_PRED)  
            if ( plane == 0 ) {  
                mode = YMode  
            } else {  
                mode = ( isCfl ) ? DC_PRED : UVMode  
            }  
            log2W = Tx_Width_Log2[ txSz ]  
            log2H = Tx_Height_Log2[ txSz ]  
            predict_intra( plane, startX, startY,  
                           ( plane == 0 ? AvailL : AvailLChroma ) || x > 0,  
                           ( plane == 0 ? AvailU : AvailUChroma ) || y > 0,  
                           BlockDecoded[ plane ]  
                                       [ ( subBlockMiRow >> subY ) - 1 ]  
                                       [ ( subBlockMiCol >> subX ) + stepX ],  
                           BlockDecoded[ plane ]  
                                       [ ( subBlockMiRow >> subY ) + stepY ]  
                                       [ ( subBlockMiCol >> subX ) - 1 ],  
                           mode,  
                           log2W, log2H )  
            if ( isCfl ) {  
                predict_chroma_from_luma( plane, startX, startY, txSz )  
            }  
        }  
          
        if ( plane == 0 ) {  
            MaxLumaW = startX + stepX * 4  
            MaxLumaH = startY + stepY * 4  
        }  
    }  
    if ( !skip ) {  
        eob = coeffs( plane, startX, startY, txSz )  
        if ( eob > 0 )  
            reconstruct( plane, startX, startY, txSz )  
    }  
    for ( i = 0; i < stepY; i++ ) {  
        for ( j = 0; j < stepX; j++ ) {  
            LoopfilterTxSizes[ plane ]  
                             [ (row >> subY) + i ]  
                             [ (col >> subX) + j ] = txSz  
            BlockDecoded[ plane ]  
                        [ ( subBlockMiRow >> subY ) + i ]  
                        [ ( subBlockMiCol >> subX ) + j ] = 1  
        }  
    }  
}  

Transform tree syntax

transform_tree is used to read a number of transform blocks arranged in a transform tree.

transform_tree( startX, startY, w, h ) { Type
    maxX = MiCols * MI_SIZE  
    maxY = MiRows * MI_SIZE  
    if ( startX >= maxX || startY >= maxY ) {  
        return  
    }  
    row = startY >> MI_SIZE_LOG2  
    col = startX >> MI_SIZE_LOG2  
    lumaTxSz = InterTxSizes[ row ][ col ]  
    lumaW = Tx_Width[ lumaTxSz ]  
    lumaH = Tx_Height[ lumaTxSz ]  
    if ( w <= lumaW && h <= lumaH ) {  
        txSz = find_tx_size( w, h )  
        transform_block( 0, startX, startY, txSz, 0, 0 )  
    } else {  
        if ( w > h ) {  
            transform_tree( startX, startY, w/2, h )  
            transform_tree( startX + w / 2, startY, w/2, h )  
        } else if ( w < h ) {  
            transform_tree( startX, startY, w, h/2 )  
            transform_tree( startX, startY + h/2, w, h/2 )  
        } else {  
            transform_tree( startX, startY, w/2, h/2 )  
            transform_tree( startX + w/2, startY, w/2, h/2 )  
            transform_tree( startX, startY + h/2, w/2, h/2 )  
            transform_tree( startX + w/2, startY + h/2, w/2, h/2 )  
        }  
    }  
}  

where find_tx_size finds the transform size matching the given dimensions and is defined as:

find_tx_size( w, h ) {
    for ( txSz = 0; txSz < TX_SIZES_ALL; txSz++ )
        if ( Tx_Width[ txSz ] == w && Tx_Height[ txSz ] == h )
            break
    return txSz
}
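The recursion in transform_tree can be illustrated with a standalone sketch. Note this is a simplification: the real process re-reads InterTxSizes at each recursion point, whereas this sketch assumes a single uniform luma transform size and simply collects the emitted blocks:

```python
def split_tree(start_x, start_y, w, h, luma_w, luma_h, out):
    # Emit a block once it fits inside the signalled luma transform size;
    # otherwise split the larger dimension in half (both halves when square).
    if w <= luma_w and h <= luma_h:
        out.append((start_x, start_y, w, h))
    elif w > h:
        split_tree(start_x, start_y, w // 2, h, luma_w, luma_h, out)
        split_tree(start_x + w // 2, start_y, w // 2, h, luma_w, luma_h, out)
    elif w < h:
        split_tree(start_x, start_y, w, h // 2, luma_w, luma_h, out)
        split_tree(start_x, start_y + h // 2, w, h // 2, luma_w, luma_h, out)
    else:
        for dy in (0, h // 2):
            for dx in (0, w // 2):
                split_tree(start_x + dx, start_y + dy,
                           w // 2, h // 2, luma_w, luma_h, out)
```

Under this simplification, a 32x16 region with a 16x16 luma transform size splits into two 16x16 blocks side by side.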

Get TX size function

get_tx_size( plane, txSz ) { Type
    if ( plane == 0 )  
        return txSz  
    uvTx = Max_Tx_Size_Rect[ get_plane_residual_size( MiSize, plane ) ]  
    if ( Tx_Width[ uvTx ] == 64 || Tx_Height[ uvTx ] == 64 ){  
        if ( Tx_Width[ uvTx ] == 16 ) {  
            return TX_16X32  
        }  
        if ( Tx_Height[ uvTx ] == 16 ) {  
            return TX_32X16  
        }  
        return TX_32X32  
    }  
    return uvTx  
}  
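Note that chroma transforms never use a 64-sample dimension, which is why 64-wide or 64-tall sizes fold down to 32; the inner width/height == 16 tests handle the rectangular cases such as TX_16X64. An illustrative sketch of this clamp, operating on raw dimensions rather than TX_* enumeration values (an assumption of this sketch):

```python
def clamp_chroma_tx(tx_w, tx_h):
    # Mirrors the 64 -> 32 clamp in get_tx_size for chroma planes.
    if tx_w == 64 or tx_h == 64:
        if tx_w == 16:
            return (16, 32)   # e.g. TX_16X64 -> TX_16X32
        if tx_h == 16:
            return (32, 16)   # e.g. TX_64X16 -> TX_32X16
        return (32, 32)       # all remaining 64-sample cases -> TX_32X32
    return (tx_w, tx_h)
```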

Get plane residual size function

The get_plane_residual_size function returns the size of a residual block for the specified plane. (The residual block will always have width and height at least equal to 4.)

get_plane_residual_size( subsize, plane ) { Type
    subx = plane > 0 ? subsampling_x : 0  
    suby = plane > 0 ? subsampling_y : 0  
    return Subsampled_Size[ subsize ][ subx ][ suby ]  
}  

The Subsampled_Size table is defined as:

Subsampled_Size[ BLOCK_SIZES ][ 2 ][ 2 ] = {
  { { BLOCK_4X4,    BLOCK_4X4},      {BLOCK_4X4,     BLOCK_4X4} },
  { { BLOCK_4X8,    BLOCK_4X4},      {BLOCK_INVALID, BLOCK_4X4} },
  { { BLOCK_8X4,    BLOCK_INVALID},  {BLOCK_4X4,     BLOCK_4X4} },
  { { BLOCK_8X8,    BLOCK_8X4},      {BLOCK_4X8,     BLOCK_4X4} },
  { {BLOCK_8X16,    BLOCK_8X8},      {BLOCK_INVALID, BLOCK_4X8} },
  { {BLOCK_16X8,    BLOCK_INVALID},  {BLOCK_8X8,     BLOCK_8X4} },
  { {BLOCK_16X16,   BLOCK_16X8},     {BLOCK_8X16,    BLOCK_8X8} },
  { {BLOCK_16X32,   BLOCK_16X16},    {BLOCK_INVALID, BLOCK_8X16} },
  { {BLOCK_32X16,   BLOCK_INVALID},  {BLOCK_16X16,   BLOCK_16X8} },
  { {BLOCK_32X32,   BLOCK_32X16},    {BLOCK_16X32,   BLOCK_16X16} },
  { {BLOCK_32X64,   BLOCK_32X32},    {BLOCK_INVALID, BLOCK_16X32} },
  { {BLOCK_64X32,   BLOCK_INVALID},  {BLOCK_32X32,   BLOCK_32X16} },
  { {BLOCK_64X64,   BLOCK_64X32},    {BLOCK_32X64,   BLOCK_32X32} },
  { {BLOCK_64X128,  BLOCK_64X64},    {BLOCK_INVALID, BLOCK_32X64} },
  { {BLOCK_128X64,  BLOCK_INVALID},  {BLOCK_64X64,   BLOCK_64X32} },
  { {BLOCK_128X128, BLOCK_128X64},   {BLOCK_64X128,  BLOCK_64X64} },
  { {BLOCK_4X16,    BLOCK_4X8},      {BLOCK_INVALID, BLOCK_4X8} },
  { {BLOCK_16X4,    BLOCK_INVALID},  {BLOCK_8X4,     BLOCK_8X4} },
  { {BLOCK_8X32,    BLOCK_8X16},     {BLOCK_INVALID, BLOCK_4X16} },
  { {BLOCK_32X8,    BLOCK_INVALID},  {BLOCK_16X8,    BLOCK_16X4} },
  { {BLOCK_16X64,   BLOCK_16X32},    {BLOCK_INVALID, BLOCK_8X32} },
  { {BLOCK_64X16,   BLOCK_INVALID},  {BLOCK_32X16,   BLOCK_32X8} },
}
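To illustrate how the table is indexed, the following sketch reproduces the lookup for a two-row excerpt of Subsampled_Size (the dictionary keys here are illustrative; the table layout is [ subsize ][ subx ][ suby ] as above):

```python
# Excerpt of Subsampled_Size, keyed by (subx, suby).
SUBSAMPLED_SIZE = {
    "BLOCK_8X8":   {(0, 0): "BLOCK_8X8",   (0, 1): "BLOCK_8X4",
                    (1, 0): "BLOCK_4X8",   (1, 1): "BLOCK_4X4"},
    "BLOCK_16X16": {(0, 0): "BLOCK_16X16", (0, 1): "BLOCK_16X8",
                    (1, 0): "BLOCK_8X16",  (1, 1): "BLOCK_8X8"},
}

def plane_residual_size(subsize, plane, subsampling_x, subsampling_y):
    # Subsampling only applies to the chroma planes (plane > 0).
    subx = subsampling_x if plane > 0 else 0
    suby = subsampling_y if plane > 0 else 0
    return SUBSAMPLED_SIZE[subsize][(subx, suby)]
```

For example, with 4:2:0 subsampling a chroma residual block for BLOCK_8X8 is BLOCK_4X4, while the luma plane is never subsampled.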

Coefficients syntax

coeffs( plane, startX, startY, txSz ) { Type
    x4 = startX >> 2  
    y4 = startY >> 2  
    w4 = Tx_Width[ txSz ] >> 2  
    h4 = Tx_Height[ txSz ] >> 2  
   
    txSzCtx = ( Tx_Size_Sqr[txSz] + Tx_Size_Sqr_Up[txSz] + 1 ) >> 1  
    ptype = plane > 0  
    segEob = ( txSz == TX_16X64 || txSz == TX_64X16 ) ? 512 :  
                Min( 1024, Tx_Width[ txSz ] * Tx_Height[ txSz ] )  
   
    for ( c = 0; c < segEob; c++ )  
        Quant[c] = 0  
    for ( i = 0; i < 64; i++ )  
        for ( j = 0; j < 64; j++ )  
            Dequant[ i ][ j ] = 0  
   
    eob = 0  
    culLevel = 0  
    dcCategory = 0  
   
    all_zero S()
    if ( all_zero ) {  
        c = 0  
        if ( plane == 0 ) {  
            for ( i = 0; i < w4; i++ ) {  
                for ( j = 0; j < h4; j++ ) {  
                    TxTypes[ y4 + j ][ x4 + i ] = DCT_DCT  
                }  
            }  
        }  
    } else {  
        if ( plane == 0 )  
            transform_type( x4, y4, txSz )  
        PlaneTxType = compute_tx_type( plane, txSz, x4, y4 )  
        scan = get_scan( txSz )  
   
        eobMultisize = Min( Tx_Width_Log2[ txSz ], 5) + Min( Tx_Height_Log2[ txSz ], 5) - 4  
        if ( eobMultisize == 0 ) {  
            eob_pt_16 S()
            eobPt = eob_pt_16 + 1  
        } else if ( eobMultisize == 1 ) {  
            eob_pt_32 S()
            eobPt = eob_pt_32 + 1  
        } else if ( eobMultisize == 2 ) {  
            eob_pt_64 S()
            eobPt = eob_pt_64 + 1  
        } else if ( eobMultisize == 3 ) {  
            eob_pt_128 S()
            eobPt = eob_pt_128 + 1  
        } else if ( eobMultisize == 4 ) {  
            eob_pt_256 S()
            eobPt = eob_pt_256 + 1  
        } else if ( eobMultisize == 5 ) {  
            eob_pt_512 S()
            eobPt = eob_pt_512 + 1  
        } else {  
            eob_pt_1024 S()
            eobPt = eob_pt_1024 + 1  
        }  
   
        eob = ( eobPt < 2 ) ? eobPt : ( ( 1 << ( eobPt - 2 ) ) + 1 )  
        eobShift = Max( -1, eobPt - 3 )  
        if ( eobShift >= 0 ) {  
            eob_extra S()
            if ( eob_extra ) {  
                eob += ( 1 << eobShift )  
            }  
   
            for ( i = 1; i < Max( 0, eobPt - 2 ); i++ ) {  
                eobShift = Max( 0, eobPt - 2 ) - 1 - i  
                eob_extra_bit L(1)
                if ( eob_extra_bit ) {  
                    eob += ( 1 << eobShift )  
                }  
            }  
        }  
        for ( c = eob - 1; c >= 0; c-- ) {  
            pos = scan[ c ]  
            if ( c == ( eob - 1 ) ) {  
                coeff_base_eob S()
                level = coeff_base_eob + 1  
            } else {  
                coeff_base S()
                level = coeff_base  
            }  
   
            if ( level > NUM_BASE_LEVELS ) {  
                for ( idx = 0;  
                      idx < COEFF_BASE_RANGE / ( BR_CDF_SIZE - 1 );  
                      idx++ ) {  
                    coeff_br S()
                    level += coeff_br  
                    if ( coeff_br < ( BR_CDF_SIZE - 1 ) )  
                        break  
                }  
            }  
            Quant[ pos ] = level  
        }  
   
        for ( c = 0; c < eob; c++ ) {  
            pos = scan[ c ]  
            if ( Quant[ pos ] != 0 ) {  
                if ( c == 0 ) {  
                    dc_sign S()
                    sign = dc_sign  
                } else {  
                    sign_bit L(1)
                    sign = sign_bit  
                }  
            } else {  
                sign = 0  
            }  
            if ( Quant[ pos ] >  
                ( NUM_BASE_LEVELS + COEFF_BASE_RANGE ) ) {  
                length = 0  
                do {  
                    length++  
                    golomb_length_bit L(1)
                } while ( !golomb_length_bit )  
                x = 1  
                for ( i = length - 2; i >= 0; i-- ) {  
                    golomb_data_bit L(1)
                    x = ( x << 1 ) | golomb_data_bit  
                }  
               Quant[ pos ] = x + COEFF_BASE_RANGE + NUM_BASE_LEVELS  
            }  
            if ( pos == 0 && Quant[ pos ] > 0 ) {  
                dcCategory = sign ? 1 : 2  
            }  
            Quant[ pos ] = Quant[ pos ] & 0xFFFFF  
            culLevel += Quant[ pos ]  
            if ( sign )  
                Quant[ pos ] = - Quant[ pos ]  
        }  
        culLevel = Min( 63, culLevel )  
    }  
   
    for ( i = 0; i < w4; i++ ) {  
        AboveLevelContext[ plane ][ x4 + i ] = culLevel  
        AboveDcContext[ plane ][ x4 + i ] = dcCategory  
    }  
    for ( i = 0; i < h4; i++ ) {  
        LeftLevelContext[ plane ][ y4 + i ] = culLevel  
        LeftDcContext[ plane ][ y4 + i ] = dcCategory  
    }  
   
    return eob  
}  
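The golomb_length_bit / golomb_data_bit loop near the end of coeffs( ) reads an Exp-Golomb style suffix for levels that exceed NUM_BASE_LEVELS + COEFF_BASE_RANGE. A standalone sketch of just that bit-reading loop (illustrative; it assumes the bits have already been read from the bitstream into a list):

```python
def read_golomb(bits):
    # Count bits until the first 1 (inclusive), as in the
    # do/while over golomb_length_bit ...
    it = iter(bits)
    length = 0
    while True:
        length += 1
        if next(it) == 1:
            break
    # ... then shift in length - 1 golomb_data_bit values, starting from 1.
    x = 1
    for _ in range(length - 1):
        x = (x << 1) | next(it)
    return x
```

The decoded value x is then offset by COEFF_BASE_RANGE + NUM_BASE_LEVELS to give the final level.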

Compute transform type function

compute_tx_type( plane, txSz, blockX, blockY ) { Type
    txSzSqrUp = Tx_Size_Sqr_Up[ txSz ]  
   
    if ( Lossless || txSzSqrUp > TX_32X32 )  
        return DCT_DCT  
   
    txSet = get_tx_set( txSz )  
   
    if ( plane == 0 ) {  
        return TxTypes[ blockY ][ blockX ]  
    }  
   
    if ( is_inter ) {  
        x4 = Max( MiCol, blockX << subsampling_x )  
        y4 = Max( MiRow, blockY << subsampling_y )  
        txType = TxTypes[ y4 ][ x4 ]  
        if ( !is_tx_type_in_set( txSet, txType ) )  
            return DCT_DCT  
        return txType  
    }  
   
    txType = Mode_To_Txfm[ UVMode ]  
    if ( !is_tx_type_in_set( txSet, txType ) )  
        return DCT_DCT  
    return txType  
}  
   
is_tx_type_in_set( txSet, txType ) {  
    return is_inter ? Tx_Type_In_Set_Inter[ txSet ][ txType ] :  
                      Tx_Type_In_Set_Intra[ txSet ][ txType ]  
}  

where the tables Tx_Type_In_Set_Inter and Tx_Type_In_Set_Intra are specified as follows:

Tx_Type_In_Set_Intra[ TX_SET_TYPES_INTRA ][ TX_TYPES ] = {
  {
    1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  },
  {
    1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
  },
  {
    1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
  }
}

Tx_Type_In_Set_Inter[ TX_SET_TYPES_INTER ][ TX_TYPES ] = {
  {
    1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  },
  {
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  },
  {
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0,
  },
  {
    1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
  }
}

Get scan function

get_mrow_scan( txSz ) { Type
    if ( txSz == TX_4X4 )  
        return Mrow_Scan_4x4  
    else if ( txSz == TX_4X8 )  
        return Mrow_Scan_4x8  
    else if ( txSz == TX_8X4 )  
        return Mrow_Scan_8x4  
    else if ( txSz == TX_8X8 )  
        return Mrow_Scan_8x8  
    else if ( txSz == TX_8X16 )  
        return Mrow_Scan_8x16  
    else if ( txSz == TX_16X8 )  
        return Mrow_Scan_16x8  
    else if ( txSz == TX_16X16 )  
        return Mrow_Scan_16x16  
    else if ( txSz == TX_4X16 )  
        return Mrow_Scan_4x16  
    return Mrow_Scan_16x4  
}  
   
get_mcol_scan( txSz ) {  
    if ( txSz == TX_4X4 )  
        return Mcol_Scan_4x4  
    else if ( txSz == TX_4X8 )  
        return Mcol_Scan_4x8  
    else if ( txSz == TX_8X4 )  
        return Mcol_Scan_8x4  
    else if ( txSz == TX_8X8 )  
        return Mcol_Scan_8x8  
    else if ( txSz == TX_8X16 )  
        return Mcol_Scan_8x16  
    else if ( txSz == TX_16X8 )  
        return Mcol_Scan_16x8  
    else if ( txSz == TX_16X16 )  
        return Mcol_Scan_16x16  
    else if ( txSz == TX_4X16 )  
        return Mcol_Scan_4x16  
    return Mcol_Scan_16x4  
}  
   
get_default_scan( txSz ) {  
    if ( txSz == TX_4X4 )  
        return Default_Scan_4x4  
    else if ( txSz == TX_4X8 )  
        return Default_Scan_4x8  
    else if ( txSz == TX_8X4 )  
        return Default_Scan_8x4  
    else if ( txSz == TX_8X8 )  
        return Default_Scan_8x8  
    else if ( txSz == TX_8X16 )  
        return Default_Scan_8x16  
    else if ( txSz == TX_16X8 )  
        return Default_Scan_16x8  
    else if ( txSz == TX_16X16 )  
        return Default_Scan_16x16  
    else if ( txSz == TX_16X32 )  
        return Default_Scan_16x32  
    else if ( txSz == TX_32X16 )  
        return Default_Scan_32x16  
    else if ( txSz == TX_4X16 )  
        return Default_Scan_4x16  
    else if ( txSz == TX_16X4 )  
        return Default_Scan_16x4  
    else if ( txSz == TX_8X32 )  
        return Default_Scan_8x32  
    else if ( txSz == TX_32X8 )  
        return Default_Scan_32x8  
    return Default_Scan_32x32  
}  
   
get_scan( txSz ) {  
    if ( txSz == TX_16X64 ) {  
        return Default_Scan_16x32  
    }  
    if ( txSz == TX_64X16 ) {  
        return Default_Scan_32x16  
    }  
    if ( Tx_Size_Sqr_Up[ txSz ] == TX_64X64 ) {  
        return Default_Scan_32x32  
    }  
   
    if ( PlaneTxType == IDTX ) {  
        return get_default_scan( txSz )  
    }  
   
    preferRow = ( PlaneTxType == V_DCT ||  
                  PlaneTxType == V_ADST ||  
                  PlaneTxType == V_FLIPADST )  
   
    preferCol = ( PlaneTxType == H_DCT ||  
                  PlaneTxType == H_ADST ||  
                  PlaneTxType == H_FLIPADST )  
   
    if ( preferRow ) {  
        return get_mrow_scan( txSz )  
    } else if ( preferCol ) {  
        return get_mcol_scan( txSz )  
    }  
    return get_default_scan( txSz )  
}  

Intra angle info luma syntax

intra_angle_info_y( ) { Type
    AngleDeltaY = 0  
    if ( MiSize >= BLOCK_8X8 ) {  
        if ( is_directional_mode( YMode ) ) {  
            angle_delta_y S()
            AngleDeltaY = angle_delta_y - MAX_ANGLE_DELTA  
        }  
    }  
}  

Intra angle info chroma syntax

intra_angle_info_uv( ) { Type
    AngleDeltaUV = 0  
    if ( MiSize >= BLOCK_8X8 ) {  
        if ( is_directional_mode( UVMode ) ) {  
            angle_delta_uv S()
            AngleDeltaUV = angle_delta_uv - MAX_ANGLE_DELTA  
        }  
    }  
}  

Is directional mode function

is_directional_mode( mode ) { Type
    if ( ( mode >= V_PRED ) && ( mode <= D67_PRED ) ) {  
        return 1  
    }  
    return 0  
}  

Read CFL alphas syntax

read_cfl_alphas() { Type
    cfl_alpha_signs S()
    signU = (cfl_alpha_signs + 1 ) / 3  
    signV = (cfl_alpha_signs + 1 ) % 3  
    if ( signU != CFL_SIGN_ZERO ) {  
        cfl_alpha_u S()
        CflAlphaU = 1 + cfl_alpha_u  
        if ( signU == CFL_SIGN_NEG )  
            CflAlphaU = -CflAlphaU  
    } else {  
      CflAlphaU = 0  
    }  
    if ( signV != CFL_SIGN_ZERO ) {  
        cfl_alpha_v S()
        CflAlphaV = 1 + cfl_alpha_v  
        if ( signV == CFL_SIGN_NEG )  
            CflAlphaV = -CflAlphaV  
    } else {  
      CflAlphaV = 0  
    }  
}  
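The joint sign code cfl_alpha_signs covers the eight sign combinations for (U, V); the +1 in the unpacking accounts for the excluded combination where both signs are zero. An illustrative sketch of the unpacking:

```python
CFL_SIGN_ZERO, CFL_SIGN_NEG, CFL_SIGN_POS = 0, 1, 2

def cfl_signs(cfl_alpha_signs):
    # cfl_alpha_signs is in [0, 7]; adding 1 skips the (zero, zero) pair.
    sign_u = (cfl_alpha_signs + 1) // 3
    sign_v = (cfl_alpha_signs + 1) % 3
    return sign_u, sign_v
```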

Palette mode info syntax

palette_mode_info( ) { Type
    bsizeCtx = Mi_Width_Log2[ MiSize ] + Mi_Height_Log2[ MiSize ] - 2  
    if ( YMode == DC_PRED ) {  
        has_palette_y S()
        if ( has_palette_y ) {  
            palette_size_y_minus_2 S()
            PaletteSizeY = palette_size_y_minus_2 + 2  
            cacheN = get_palette_cache( 0 )  
            idx = 0  
            for ( i = 0; i < cacheN && idx < PaletteSizeY; i++ ) {  
                use_palette_color_cache_y L(1)
                if ( use_palette_color_cache_y ) {  
                    palette_colors_y[ idx ] = PaletteCache[ i ]  
                    idx++  
                }  
            }  
            if ( idx < PaletteSizeY ) {  
                palette_colors_y[ idx ] L(BitDepth)
                idx++  
            }  
            if ( idx < PaletteSizeY ) {  
                minBits = BitDepth - 3  
                palette_num_extra_bits_y L(2)
                paletteBits = minBits + palette_num_extra_bits_y  
            }  
            while ( idx < PaletteSizeY ) {  
                palette_delta_y L(paletteBits)
                palette_delta_y++  
                palette_colors_y[ idx ] =  
                          Clip1( palette_colors_y[ idx - 1 ] +  
                                 palette_delta_y )  
                range = ( 1 << BitDepth ) - palette_colors_y[ idx ] - 1  
                paletteBits = Min( paletteBits, CeilLog2( range ) )  
                idx++  
            }  
            sort( palette_colors_y, 0, PaletteSizeY - 1 )  
        }  
    }  
    if ( HasChroma && UVMode == DC_PRED ) {  
        has_palette_uv S()
        if ( has_palette_uv ) {  
            palette_size_uv_minus_2 S()
            PaletteSizeUV = palette_size_uv_minus_2 + 2  
            cacheN = get_palette_cache( 1 )  
            idx = 0  
            for ( i = 0; i < cacheN && idx < PaletteSizeUV; i++ ) {  
                use_palette_color_cache_u L(1)
                if ( use_palette_color_cache_u ) {  
                    palette_colors_u[ idx ] = PaletteCache[ i ]  
                    idx++  
                }  
            }  
            if ( idx < PaletteSizeUV ) {  
                palette_colors_u[ idx ] L(BitDepth)
                idx++  
            }  
            if ( idx < PaletteSizeUV ) {  
                minBits = BitDepth - 3  
                palette_num_extra_bits_u L(2)
                paletteBits = minBits + palette_num_extra_bits_u  
            }  
            while ( idx < PaletteSizeUV ) {  
                palette_delta_u L(paletteBits)
                palette_colors_u[ idx ] =  
                          Clip1( palette_colors_u[ idx - 1 ] +  
                                 palette_delta_u )  
                range = ( 1 << BitDepth ) - palette_colors_u[ idx ]  
                paletteBits = Min( paletteBits, CeilLog2( range ) )  
                idx++  
            }  
            sort( palette_colors_u, 0, PaletteSizeUV - 1 )  
   
            delta_encode_palette_colors_v L(1)
            if ( delta_encode_palette_colors_v ) {  
                minBits = BitDepth - 4  
                maxVal = 1 << BitDepth  
                palette_num_extra_bits_v L(2)
                paletteBits = minBits + palette_num_extra_bits_v  
                palette_colors_v[ 0 ] L(BitDepth)
                for ( idx = 1; idx < PaletteSizeUV; idx++ ) {  
                    palette_delta_v L(paletteBits)
                    if ( palette_delta_v ) {  
                        palette_delta_sign_bit_v L(1)
                        if ( palette_delta_sign_bit_v ) {  
                            palette_delta_v = -palette_delta_v  
                        }  
                    }  
                    val = palette_colors_v[ idx - 1 ] + palette_delta_v  
                    if ( val < 0 ) val += maxVal  
                    if ( val >= maxVal ) val -= maxVal  
                    palette_colors_v[ idx ] = Clip1( val )  
                }  
            } else {  
                for ( idx = 0; idx < PaletteSizeUV; idx++ ) {  
                    palette_colors_v[ idx ] L(BitDepth)
                }  
            }  
        }  
    }  
}  

The function sort( arr, i1, i2 ) sorts a subarray of the array arr in-place into ascending order. The subarray to be sorted is between indices i1 and i2 inclusive.

Note: The palette colors are generated in ascending order. The palette cache is also in ascending order. This means that the sort function can be replaced in implementations by a merge of two sorted lists.

where the function get_palette_cache, which merges the above and left palettes to form a cache, is specified as follows:

get_palette_cache( plane ) { Type
    aboveN = 0  
    if ( ( MiRow * MI_SIZE ) % 64 ) {  
        aboveN = PaletteSizes[ plane ][ MiRow - 1 ][ MiCol ]  
    }  
    leftN = 0  
    if ( AvailL ) {  
        leftN = PaletteSizes[ plane ][ MiRow ][ MiCol - 1 ]  
    }  
    aboveIdx = 0  
    leftIdx = 0  
    n = 0  
    while ( aboveIdx < aboveN && leftIdx < leftN ) {  
        aboveC = PaletteColors[ plane ][ MiRow - 1 ][ MiCol ][ aboveIdx ]  
        leftC = PaletteColors[ plane ][ MiRow ][ MiCol - 1 ][ leftIdx ]  
        if ( leftC < aboveC ) {  
            if ( n == 0 || leftC != PaletteCache[ n - 1 ] ) {  
                PaletteCache[ n ] = leftC  
                n++  
            }  
            leftIdx++  
        } else {  
            if ( n == 0 || aboveC != PaletteCache[ n - 1 ] ) {  
                PaletteCache[ n ] = aboveC  
                n++  
            }  
            aboveIdx++  
            if ( leftC == aboveC ) {  
                leftIdx++  
            }  
        }  
    }  
    while ( aboveIdx < aboveN ) {  
        val = PaletteColors[ plane ][ MiRow - 1 ][ MiCol ][ aboveIdx ]  
        aboveIdx++  
        if ( n == 0 || val != PaletteCache[ n - 1 ] ) {  
            PaletteCache[ n ] = val  
            n++  
        }  
    }  
    while ( leftIdx < leftN ) {  
        val = PaletteColors[ plane ][ MiRow ][ MiCol - 1 ][ leftIdx ]  
        leftIdx++  
        if ( n == 0 || val != PaletteCache[ n - 1 ] ) {  
            PaletteCache[ n ] = val  
            n++  
        }  
    }  
    return n  
}  

Note: get_palette_cache is equivalent to sorting the available palette colors from above and left together and removing any duplicates.
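The equivalence stated in the note can be expressed directly. An illustrative sketch (relying on the fact that each source palette is itself sorted and duplicate-free):

```python
def palette_cache(above_colors, left_colors):
    # Concatenate the two candidate palettes, sort, and drop duplicates;
    # this yields the same cache as the merge loop in get_palette_cache.
    return sorted(set(above_colors) | set(left_colors))
```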

Transform type syntax

transform_type( x4, y4, txSz ) { Type
    set = get_tx_set( txSz )  
   
    if ( set > 0 &&  
         ( segmentation_enabled ? get_qindex( 1, segment_id ) : base_q_idx ) > 0 ) {  
        if ( is_inter ) {  
            inter_tx_type S()
            if ( set == TX_SET_INTER_1 )  
                TxType = Tx_Type_Inter_Inv_Set1[ inter_tx_type ]  
            else if ( set == TX_SET_INTER_2 )  
                TxType = Tx_Type_Inter_Inv_Set2[ inter_tx_type ]  
            else  
                TxType = Tx_Type_Inter_Inv_Set3[ inter_tx_type ]  
        } else {  
            intra_tx_type S()
            if ( set == TX_SET_INTRA_1 )  
                TxType = Tx_Type_Intra_Inv_Set1[ intra_tx_type ]  
            else  
                TxType = Tx_Type_Intra_Inv_Set2[ intra_tx_type ]  
        }  
    } else {  
        TxType = DCT_DCT  
    }  
    for ( i = 0; i < ( Tx_Width[ txSz ] >> 2 ); i++ ) {  
        for ( j = 0; j < ( Tx_Height[ txSz ] >> 2 ); j++ ) {  
            TxTypes[ y4 + j ][ x4 + i ] = TxType  
        }  
    }  
}  

where the inversion tables used in the function are specified as follows:

Tx_Type_Intra_Inv_Set1[ 7 ]  = { IDTX, DCT_DCT, V_DCT, H_DCT, ADST_ADST, ADST_DCT, DCT_ADST }
Tx_Type_Intra_Inv_Set2[ 5 ]  = { IDTX, DCT_DCT, ADST_ADST, ADST_DCT, DCT_ADST }
Tx_Type_Inter_Inv_Set1[ 16 ] = { IDTX, V_DCT, H_DCT, V_ADST, H_ADST, V_FLIPADST, H_FLIPADST,
                                DCT_DCT, ADST_DCT, DCT_ADST, FLIPADST_DCT, DCT_FLIPADST, ADST_ADST,
                                FLIPADST_FLIPADST, ADST_FLIPADST, FLIPADST_ADST }
Tx_Type_Inter_Inv_Set2[ 12 ] = { IDTX, V_DCT, H_DCT, DCT_DCT, ADST_DCT, DCT_ADST, FLIPADST_DCT,
                                 DCT_FLIPADST, ADST_ADST, FLIPADST_FLIPADST, ADST_FLIPADST,
                                 FLIPADST_ADST }
Tx_Type_Inter_Inv_Set3[ 2 ]  = { IDTX, DCT_DCT }

Get transform set function

get_tx_set( txSz ) { Type
    txSzSqr = Tx_Size_Sqr[ txSz ]  
    txSzSqrUp = Tx_Size_Sqr_Up[ txSz ]  
    if ( txSzSqrUp > TX_32X32 )  
        return TX_SET_DCTONLY  
    if ( is_inter ) {  
        if ( reduced_tx_set || txSzSqrUp == TX_32X32 ) return TX_SET_INTER_3  
        else if ( txSzSqr == TX_16X16 ) return TX_SET_INTER_2  
        return TX_SET_INTER_1  
    } else {  
        if ( txSzSqrUp == TX_32X32 ) return TX_SET_DCTONLY  
        else if ( reduced_tx_set ) return TX_SET_INTRA_2  
        else if ( txSzSqr == TX_16X16 ) return TX_SET_INTRA_2  
        return TX_SET_INTRA_1  
    }  
}  

Palette tokens syntax

palette_tokens( ) { Type
    blockHeight = Block_Height[ MiSize ]  
    blockWidth = Block_Width[ MiSize ]  
    onscreenHeight = Min( blockHeight, (MiRows - MiRow) * MI_SIZE )  
    onscreenWidth = Min( blockWidth, (MiCols - MiCol) * MI_SIZE )  
   
    if ( PaletteSizeY ) {  
        color_index_map_y NS(PaletteSizeY)
        ColorMapY[0][0] = color_index_map_y  
        for ( i = 1; i < onscreenHeight + onscreenWidth - 1; i++ ) {  
            for ( j = Min( i, onscreenWidth - 1 );  
                      j >= Max( 0, i - onscreenHeight + 1 ); j-- ) {  
                get_palette_color_context(  
                    ColorMapY, ( i - j ), j, PaletteSizeY )  
                palette_color_idx_y S()
                ColorMapY[ i - j ][ j ] = ColorOrder[ palette_color_idx_y ]  
            }  
        }  
        for ( i = 0; i < onscreenHeight; i++ ) {  
            for ( j = onscreenWidth; j < blockWidth; j++ ) {  
                ColorMapY[ i ][ j ] = ColorMapY[ i ][ onscreenWidth - 1 ]  
            }  
        }  
        for ( i = onscreenHeight; i < blockHeight; i++ ) {  
            for ( j = 0; j < blockWidth; j++ ) {  
                ColorMapY[ i ][ j ] = ColorMapY[ onscreenHeight - 1 ][ j ]  
            }  
        }  
    }  
   
    if ( PaletteSizeUV ) {  
        color_index_map_uv NS(PaletteSizeUV)
        ColorMapUV[0][0] = color_index_map_uv  
        blockHeight = blockHeight >> subsampling_y  
        blockWidth = blockWidth >> subsampling_x  
        onscreenHeight = onscreenHeight >> subsampling_y  
        onscreenWidth = onscreenWidth >> subsampling_x  
        if ( blockWidth < 4 ) {  
            blockWidth += 2  
            onscreenWidth += 2  
        }  
        if ( blockHeight < 4 ) {  
            blockHeight += 2  
            onscreenHeight += 2  
        }  
   
        for ( i = 1; i < onscreenHeight + onscreenWidth - 1; i++ ) {  
            for ( j = Min( i, onscreenWidth - 1 );  
                      j >= Max( 0, i - onscreenHeight + 1 ); j-- ) {  
                get_palette_color_context(  
                    ColorMapUV, ( i - j ), j, PaletteSizeUV )  
                palette_color_idx_uv S()
                ColorMapUV[ i - j ][ j ] = ColorOrder[ palette_color_idx_uv ]  
            }  
        }  
        for ( i = 0; i < onscreenHeight; i++ ) {  
            for ( j = onscreenWidth; j < blockWidth; j++ ) {  
                ColorMapUV[ i ][ j ] = ColorMapUV[ i ][ onscreenWidth - 1 ]  
            }  
        }  
        for ( i = onscreenHeight; i < blockHeight; i++ ) {  
            for ( j = 0; j < blockWidth; j++ ) {  
                ColorMapUV[ i ][ j ] = ColorMapUV[ onscreenHeight - 1 ][ j ]  
            }  
        }  
    }  
}  
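Note: The two nested loops above visit the block in anti-diagonal order, which guarantees that the left, above, and above-left neighbors of every position are already decoded when get_palette_color_context is called. The following informative Python sketch reproduces just that traversal order:

```python
def wavefront_order(height, width):
    # Anti-diagonal scan used by palette_tokens; position (0, 0) is coded
    # separately, and every later position is yielded only after its left,
    # above, and above-left neighbors.
    for i in range(1, height + width - 1):
        for j in range(min(i, width - 1), max(0, i - height + 1) - 1, -1):
            yield (i - j, j)
```

For a 2x3 block the scan order is (0, 1), (1, 0), (0, 2), (1, 1), (1, 2).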

Palette color context function

get_palette_color_context( colorMap, r, c, n ) { Type
    for ( i = 0; i < PALETTE_COLORS; i++ ) {  
        scores[ i ] = 0  
        ColorOrder[i] = i  
    }  
    if ( c > 0 ) {  
        neighbor = colorMap[ r ][ c - 1 ]  
        scores[ neighbor ] += 2  
    }  
    if ( ( r > 0 ) && ( c > 0 ) ) {  
        neighbor = colorMap[ r - 1 ][ c - 1 ]  
        scores[ neighbor ] += 1  
    }  
    if ( r > 0 ) {  
        neighbor = colorMap[ r - 1 ][ c ]  
        scores[ neighbor ] += 2  
    }  
    for ( i = 0; i < PALETTE_NUM_NEIGHBORS; i++ ) {  
        maxScore = scores[ i ]  
        maxIdx = i  
        for ( j = i + 1; j < n; j++ ) {  
            if ( scores[ j ] > maxScore ) {  
                maxScore = scores[ j ]  
                maxIdx = j  
            }  
        }  
        if ( maxIdx != i ) {  
            maxScore = scores[ maxIdx ]  
            maxColorOrder = ColorOrder[ maxIdx ]  
            for ( k = maxIdx; k > i; k-- ) {  
                scores[ k ] = scores[ k - 1 ]  
                ColorOrder[ k ] = ColorOrder[ k - 1 ]  
            }  
            scores[ i ] = maxScore  
            ColorOrder[ i ] = maxColorOrder  
        }  
    }  
    ColorContextHash = 0  
    for ( i = 0; i < PALETTE_NUM_NEIGHBORS; i++ ) {  
        ColorContextHash += scores[ i ] * Palette_Color_Hash_Multipliers[ i ]  
    }  
}  
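Note: The function above scores the up to three previously decoded neighbors, partially sorts the color order by descending score (shifting rather than swapping, so that ties keep palette-index order), and hashes the top scores into a context. An informative Python transcription, assuming the spec's constant values PALETTE_COLORS = 8, PALETTE_NUM_NEIGHBORS = 3, and Palette_Color_Hash_Multipliers = { 1, 2, 2 }:

```python
PALETTE_COLORS = 8
PALETTE_NUM_NEIGHBORS = 3
HASH_MULTIPLIERS = [1, 2, 2]  # assumed Palette_Color_Hash_Multipliers

def palette_color_context(color_map, r, c, n):
    scores = [0] * PALETTE_COLORS
    color_order = list(range(PALETTE_COLORS))
    if c > 0:
        scores[color_map[r][c - 1]] += 2      # left neighbor
    if r > 0 and c > 0:
        scores[color_map[r - 1][c - 1]] += 1  # above-left neighbor
    if r > 0:
        scores[color_map[r - 1][c]] += 2      # above neighbor
    # Partial selection sort, exactly as in the pseudocode: shift entries
    # down instead of swapping, preserving order among equal scores.
    for i in range(PALETTE_NUM_NEIGHBORS):
        max_score, max_idx = scores[i], i
        for j in range(i + 1, n):
            if scores[j] > max_score:
                max_score, max_idx = scores[j], j
        if max_idx != i:
            max_color = color_order[max_idx]
            for k in range(max_idx, i, -1):
                scores[k] = scores[k - 1]
                color_order[k] = color_order[k - 1]
            scores[i] = max_score
            color_order[i] = max_color
    ctx_hash = sum(scores[i] * HASH_MULTIPLIERS[i]
                   for i in range(PALETTE_NUM_NEIGHBORS))
    return color_order, ctx_hash
```

For example, at position (1, 1) of the map [[0, 1], [2, 0]] with n = 3, the neighbor scores are 1, 2, 2 for colors 0, 1, 2, giving color order 1, 2, 0 and context hash 2*1 + 2*2 + 1*2 = 8.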

Is inside function

is_inside determines whether a candidate position is inside the current tile.

is_inside( candidateR, candidateC ) { Type
    return ( candidateC >= MiColStart &&  
             candidateC < MiColEnd &&  
             candidateR >= MiRowStart &&  
             candidateR < MiRowEnd )  
}  

Is inside filter region function

is_inside_filter_region determines whether a candidate position is inside the region that is being used for CDEF filtering.

is_inside_filter_region( candidateR, candidateC ) { Type
    colStart = 0  
    colEnd = MiCols  
    rowStart = 0  
    rowEnd = MiRows  
    return (candidateC >= colStart &&  
            candidateC < colEnd &&  
            candidateR >= rowStart &&  
            candidateR < rowEnd)  
}  

Clamp MV row function

clamp_mv_row( mvec, border ) { Type
    bh4 = Num_4x4_Blocks_High[ MiSize ]  
    mbToTopEdge = -((MiRow * MI_SIZE) * 8)  
    mbToBottomEdge = ((MiRows - bh4 - MiRow) * MI_SIZE) * 8  
    return Clip3( mbToTopEdge - border, mbToBottomEdge + border, mvec )  
}  

Clamp MV col function

clamp_mv_col( mvec, border ) { Type
    bw4 = Num_4x4_Blocks_Wide[ MiSize ]  
    mbToLeftEdge = -((MiCol * MI_SIZE) * 8)  
    mbToRightEdge = ((MiCols - bw4 - MiCol) * MI_SIZE) * 8  
    return Clip3( mbToLeftEdge - border, mbToRightEdge + border, mvec )  
}  
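Note: These clamping functions operate in units of 1/8 luma sample; MI_SIZE (4 luma samples per mode-info unit) multiplied by 8 converts mode-info distances to the frame edges into that scale. An informative Python sketch of the row case, with the frame-level variables passed as parameters:

```python
MI_SIZE = 4  # luma samples per mode-info unit

def clamp_mv_row(mvec, border, mi_row, mi_rows, bh4):
    # Motion vectors are in units of 1/8 luma sample, hence the * 8 when
    # converting distances (in mode-info units) to the frame edges.
    mb_to_top_edge = -(mi_row * MI_SIZE * 8)
    mb_to_bottom_edge = (mi_rows - bh4 - mi_row) * MI_SIZE * 8
    low = mb_to_top_edge - border
    high = mb_to_bottom_edge + border
    return max(low, min(high, mvec))  # Clip3(low, high, mvec)
```

For a block at the top of a 10-unit-high frame (bh4 = 2, border = 64), vectors are clamped to the range [-64, 320].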

Clear CDEF function

clear_cdef( r, c ) { Type
    cdef_idx[ r ][ c ] = -1  
    if ( use_128x128_superblock ) {  
        cdefSize4 = Num_4x4_Blocks_Wide[ BLOCK_64X64 ]  
        cdef_idx[ r ][ c + cdefSize4 ] = -1  
        cdef_idx[ r + cdefSize4][ c ] = -1  
        cdef_idx[ r + cdefSize4][ c + cdefSize4 ] = -1  
    }  
}  

Read CDEF syntax

read_cdef( ) { Type
    if ( skip || CodedLossless || !enable_cdef || allow_intrabc) {  
        return  
    }  
    cdefSize4 = Num_4x4_Blocks_Wide[ BLOCK_64X64 ]  
    cdefMask4 = ~(cdefSize4 - 1)  
    r = MiRow & cdefMask4  
    c = MiCol & cdefMask4  
    if ( cdef_idx[ r ][ c ] == -1 ) {  
        cdef_idx[ r ][ c ] L(cdef_bits)
        w4 = Num_4x4_Blocks_Wide[ MiSize ]  
        h4 = Num_4x4_Blocks_High[ MiSize ]  
        for ( i = r; i < r + h4 ; i += cdefSize4 ) {  
            for ( j = c; j < c + w4 ; j += cdefSize4 ) {  
                cdef_idx[ i ][ j ] = cdef_idx[ r ][ c ]  
            }  
        }  
    }  
}  
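Note: Because cdefSize4 is a power of two (16, the number of 4x4 units across a 64x64 block), the masking in read_cdef snaps MiRow and MiCol down to the origin of the enclosing 64x64 CDEF unit, so that cdef_bits is read at most once per unit. A minimal Python sketch of the alignment:

```python
CDEF_SIZE4 = 16  # Num_4x4_Blocks_Wide[BLOCK_64X64]

def cdef_unit_origin(mi_row, mi_col):
    # Mask off the low four bits to align down to a multiple of 16
    # mode-info units, i.e. the top-left of the 64x64 CDEF unit.
    mask = ~(CDEF_SIZE4 - 1)
    return mi_row & mask, mi_col & mask
```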

Read loop restoration syntax

read_lr( r, c, bSize ) { Type
    if ( allow_intrabc ) {  
        return  
    }  
    w = Num_4x4_Blocks_Wide[ bSize ]  
    h = Num_4x4_Blocks_High[ bSize ]  
    for ( plane = 0; plane < NumPlanes; plane++ ) {  
        if ( FrameRestorationType[ plane ] != RESTORE_NONE ) {  
            subX = (plane == 0) ? 0 : subsampling_x  
            subY = (plane == 0) ? 0 : subsampling_y  
            unitSize = LoopRestorationSize[ plane ]  
            unitRows = count_units_in_frame( unitSize, Round2( FrameHeight, subY) )  
            unitCols = count_units_in_frame( unitSize, Round2( UpscaledWidth, subX) )  
            unitRowStart = ( r * ( MI_SIZE >> subY) +  
                                      unitSize - 1 ) / unitSize  
            unitRowEnd = Min( unitRows, ( (r + h) * ( MI_SIZE >> subY) +  
                                      unitSize - 1 ) / unitSize)  
            if ( use_superres ) {  
                numerator = (MI_SIZE >> subX) * SuperresDenom  
                denominator = unitSize * SUPERRES_NUM  
            } else {  
                numerator = MI_SIZE >> subX  
                denominator = unitSize  
            }  
            unitColStart = ( c * numerator + denominator - 1 ) / denominator  
            unitColEnd = Min( unitCols, ( (c + w) * numerator +  
                              denominator - 1 ) / denominator)  
            for ( unitRow = unitRowStart; unitRow < unitRowEnd; unitRow++ ) {  
                for ( unitCol = unitColStart; unitCol < unitColEnd; unitCol++ ) {  
                    read_lr_unit(plane, unitRow, unitCol)  
                }  
            }  
        }  
    }  
}  

where count_units_in_frame is a function specified as:

count_units_in_frame(unitSize, tileSize) {
    return Max((tileSize + (unitSize >> 1)) / unitSize, 1)
}
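Note: count_units_in_frame divides with rounding to the nearest whole unit (the half-unit offset), never returning fewer than one unit. In Python:

```python
def count_units_in_frame(unit_size, size):
    # Round to the nearest whole number of restoration units,
    # with a minimum of one unit.
    return max((size + (unit_size >> 1)) // unit_size, 1)
```

For example, a 1920-sample-wide luma plane with 256-sample restoration units yields (1920 + 128) / 256 = 8 unit columns.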

Read loop restoration unit syntax

read_lr_unit(plane, unitRow, unitCol) { Type
    if ( FrameRestorationType[ plane ] == RESTORE_WIENER ) {  
        use_wiener S()
        restoration_type = use_wiener ? RESTORE_WIENER : RESTORE_NONE  
    } else if ( FrameRestorationType[ plane ] == RESTORE_SGRPROJ ) {  
        use_sgrproj S()
        restoration_type = use_sgrproj ? RESTORE_SGRPROJ : RESTORE_NONE  
    } else {  
        restoration_type S()
    }  
    LrType[ plane ][ unitRow ][ unitCol ] = restoration_type  
    if ( restoration_type == RESTORE_WIENER ) {  
        for ( pass = 0; pass < 2; pass++ ) {  
            if ( plane ) {  
                firstCoeff = 1  
                LrWiener[ plane ]  
                        [ unitRow ][ unitCol ][ pass ][0] = 0  
            } else {  
                firstCoeff = 0  
            }  
            for ( j = firstCoeff; j < 3; j++ ) {  
                min = Wiener_Taps_Min[ j ]  
                max = Wiener_Taps_Max[ j ]  
                k = Wiener_Taps_K[ j ]  
                v = decode_signed_subexp_with_ref_bool(  
                        min, max + 1, k, RefLrWiener[ plane ][ pass ][ j ] )  
                LrWiener[ plane ]  
                        [ unitRow ][ unitCol ][ pass ][ j ] = v  
                RefLrWiener[ plane ][ pass ][ j ] = v  
            }  
        }  
    } else if ( restoration_type == RESTORE_SGRPROJ ) {  
        lr_sgr_set L(SGRPROJ_PARAMS_BITS)
        LrSgrSet[ plane ][ unitRow ][ unitCol ] = lr_sgr_set  
        for ( i = 0; i < 2; i++ ) {  
            radius = Sgr_Params[ lr_sgr_set ][ i * 2 ]  
            min = Sgrproj_Xqd_Min[i]  
            max = Sgrproj_Xqd_Max[i]  
            if ( radius ) {  
              v = decode_signed_subexp_with_ref_bool(  
                     min, max + 1, SGRPROJ_PRJ_SUBEXP_K,  
                     RefSgrXqd[ plane ][ i ])  
            } else {  
              v = 0  
              if ( i == 1 ) {  
                v = Clip3( min, max, (1 << SGRPROJ_PRJ_BITS) -  
                           RefSgrXqd[ plane ][ 0 ] )  
              }  
            }  
            LrSgrXqd[ plane ][ unitRow ][ unitCol ][ i ] = v  
            RefSgrXqd[ plane ][ i ] = v  
        }  
    }  
}  

where Wiener_Taps_Min, Wiener_Taps_Max, Wiener_Taps_K, Sgrproj_Xqd_Min, and Sgrproj_Xqd_Max are constant lookup tables:

Wiener_Taps_Min[3] = { -5, -23, -17 }
Wiener_Taps_Max[3] = { 10,   8,  46 }
Wiener_Taps_K[3] = { 1, 2, 3 }

Sgrproj_Xqd_Min[2] = { -96, -32 }
Sgrproj_Xqd_Max[2] = {  31,  95 }

Sgr_Params is a constant lookup table defined in section 7.17.3 and decode_signed_subexp_with_ref_bool is a function specified as follows:

decode_signed_subexp_with_ref_bool( low, high, k, r ) { Type
    x = decode_unsigned_subexp_with_ref_bool(high - low, k, r - low)  
    return x + low  
}  
   
decode_unsigned_subexp_with_ref_bool( mx, k, r ) {  
    v = decode_subexp_bool( mx, k )  
    if ( (r << 1) <= mx ) {  
        return inverse_recenter(r, v)  
    } else {  
        return mx - 1 - inverse_recenter(mx - 1 - r, v)  
    }  
}  
   
decode_subexp_bool( numSyms, k ) {  
    i = 0  
    mk = 0  
    while ( 1 ) {  
        b2 = i ? k + i - 1 : k  
        a = 1 << b2  
        if ( numSyms <= mk + 3 * a ) {  
            subexp_unif_bools NS(numSyms - mk)
            return subexp_unif_bools + mk  
        } else {  
            subexp_more_bools L(1)
            if ( subexp_more_bools ) {  
               i++  
               mk += a  
            } else {  
               subexp_bools L(b2)
               return subexp_bools + mk  
            }  
        }  
    }  
}  
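Note: The recentering step in decode_unsigned_subexp_with_ref_bool maps small decoded values to values close to the reference r, mirroring around the top of the range when r lies in the upper half. The following informative Python sketch applies that step to an already-decoded subexp value v, using the spec's inverse_recenter function:

```python
def inverse_recenter(r, v):
    # Spec inverse_recenter: small v decodes to values alternating
    # just above and just below the reference r.
    if v > 2 * r:
        return v
    elif v & 1:
        return r + ((v + 1) >> 1)
    else:
        return r - (v >> 1)

def unsigned_subexp_with_ref(v, mx, r):
    # Recentering from decode_unsigned_subexp_with_ref_bool, applied to
    # a value v already produced by the subexp decoding loop.
    if (r << 1) <= mx:
        return inverse_recenter(r, v)
    else:
        return mx - 1 - inverse_recenter(mx - 1 - r, v)
```

With mx = 10, the cheapest code (v = 0) decodes to r itself whether r = 2 or r = 8; larger v values fan out around r.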

Note: The decode_signed_subexp_with_ref_bool function is the same as the decode_signed_subexp_with_ref function except that the bits used to represent the symbol are arithmetic coded instead of being read directly from the bitstream.

Tile list OBU syntax

General tile list OBU syntax

tile_list_obu( ) { Type
    output_frame_width_in_tiles_minus_1 f(8)
    output_frame_height_in_tiles_minus_1 f(8)
    tile_count_minus_1 f(16)
    for ( tile = 0; tile <= tile_count_minus_1; tile++ )  
        tile_list_entry( )  
}  

Tile list entry syntax

tile_list_entry( ) { Type
    anchor_frame_idx f(8)
    anchor_tile_row f(8)
    anchor_tile_col f(8)
    tile_data_size_minus_1 f(16)
    N = 8 * (tile_data_size_minus_1 + 1)  
    coded_tile_data f(N)
}  

Syntax structures semantics

General

This section specifies the meaning of the syntax elements read in the syntax structures.

Important variables and function calls are also described.

OBU semantics

General OBU semantics

An ordered series of OBUs is presented to the decoding process. Each OBU is given to the decoding process as a string of bytes along with a variable sz that identifies the total number of bytes in the OBU.

If the syntax element obu_has_size_field (in the OBU header) is equal to 1, then the variable sz will be unused and does not have to be provided.

obu_size contains the size in bytes of the OBU not including the bytes within obu_header or the obu_size syntax element.

Methods of framing the OBUs (i.e. of identifying the series of OBUs and their size and payload data) in a delivery or container format may be established in a manner outside the scope of this Specification. One simple method is described in Annex B.

OBU data starts on the first (most significant) bit and ends on the last bit of the given bytes. The payload of an OBU lies between the first bit of the given bytes and the last bit before the first trailing bit. Trailing bits are always present, unless the OBU consists of only the header. Trailing bits achieve byte alignment when the payload of an OBU is not byte aligned. The trailing bits may also be used for additional byte padding, and if used are taken into account in the sz value. In all cases, the pattern used for the trailing bits guarantees that all OBUs (except header-only OBUs) end with the same pattern: one bit set to one, optionally followed by zeros.

Note: As a validity check for malformed encoded data and for operation in environments in which losses and errors can occur, decoders may detect an error if the end of the parsed data is not directly followed by the correct trailing bits pattern or if the parsing of the OBU header and payload leads to the consumption of bits within the trailing bits (except for tile group data which is allowed to read a small distance into the trailing bits as described in section 8.2.4).
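The trailing-bits check described above can be sketched as follows (informative), operating on an OBU payload given as a list of bits:

```python
def strip_trailing_bits(bits):
    # Verify and remove the trailing-bits pattern at the end of an OBU
    # payload (a list of 0/1 values): a single 1 bit followed only by 0s.
    i = len(bits) - 1
    while i >= 0 and bits[i] == 0:
        i -= 1
    if i < 0:
        raise ValueError("malformed OBU: no trailing_one_bit found")
    return bits[:i]
```

For example, a 3-bit payload 101 padded to one byte arrives as 10110000; stripping the trailing bits recovers the original three bits.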

drop_obu( ) is a function call that indicates when the decoding process should ignore an OBU because it is not contained in the selected operating point. When an OBU is not in the selected operating point the contents have no effect on the decoding process.

When this function is called, the bitstream position indicator should be advanced by obu_size * 8 bits.

OBU header semantics

OBUs are structured with a header and a payload. The header identifies the type of the payload using the obu_type header parameter.

obu_forbidden_bit must be set to 0.

Note: This ensures that MPEG2 transport is possible by preventing emulation of MPEG2 transport stream ids.

obu_type specifies the type of data structure contained in the OBU payload:

obu_type Name of obu_type
0 Reserved
1 OBU_SEQUENCE_HEADER
2 OBU_TEMPORAL_DELIMITER
3 OBU_FRAME_HEADER
4 OBU_TILE_GROUP
5 OBU_METADATA
6 OBU_FRAME
7 OBU_REDUNDANT_FRAME_HEADER
8 OBU_TILE_LIST
9-14 Reserved
15 OBU_PADDING

Reserved units are for future use and shall be ignored by AV1 decoders.

obu_extension_flag indicates if the optional obu_extension_header is present.

obu_has_size_field equal to 1 indicates that the obu_size syntax element will be present. obu_has_size_field equal to 0 indicates that the obu_size syntax element will not be present.

obu_reserved_1bit must be set to 0. The value is ignored by a decoder.

OBU extension header semantics

temporal_id specifies the temporal level of the data contained in the OBU.

spatial_id specifies the spatial level of the data contained in the OBU.

Note: The term “spatial” refers to the fact that the enhancement here occurs in the spatial dimension: either as an increase in spatial resolution, or an increase in spatial fidelity (increased SNR).

Tile group OBU data associated with spatial_id and temporal_id equal to 0 are referred to as the base layer, whereas tile group OBU data that are associated with spatial_id greater than 0 or temporal_id greater than 0 are referred to as enhancement layer(s).

Coded video data of a temporal level with temporal_id T and spatial level with spatial_id S are only allowed to reference previously coded video data of temporal_id T’ and spatial_id S’, where T’ <= T and S’ <= S.

extension_header_reserved_3bits must be set to 0. The value is ignored by a decoder.

Trailing bits semantics

Note: Tile group OBUs, tile list OBUs, and frame OBUs do end with trailing bits, but for these cases, the trailing bits are consumed by the exit_symbol process.

trailing_one_bit shall be equal to 1.

When the syntax element trailing_one_bit is read, it is a requirement that nbBits is greater than zero.

trailing_zero_bit shall be equal to 0 and is inserted into the bitstream to align the bit position to a multiple of 8 bits and add optional zero padding bytes to the OBU.

Byte alignment semantics

zero_bit shall be equal to 0 and is inserted into the bitstream to align the bit position to a multiple of 8 bits.

Reserved OBU semantics

The reserved OBU allows the extension of this specification with additional OBU types in a way that allows older decoders to ignore them.

Sequence header OBU semantics

General sequence header OBU semantics

seq_profile specifies the features that can be used in the coded video sequence.

seq_profile Bit depth Monochrome support Chroma subsampling
0 8 or 10 Yes YUV 4:2:0
1 8 or 10 No YUV 4:4:4
2 8 or 10 Yes YUV 4:2:2
2 12 Yes YUV 4:2:0, YUV 4:2:2, YUV 4:4:4

It is a requirement of bitstream conformance that seq_profile is not greater than 2 (values 3 to 7 are reserved).

Monochrome can only be signaled when seq_profile is equal to 0 or 2.

AV1 profiles are defined in Annex A.

still_picture equal to 1 specifies that the coded video sequence contains only one coded frame. still_picture equal to 0 specifies that the coded video sequence contains one or more coded frames.

reduced_still_picture_header specifies that the syntax elements not needed by a still picture are omitted.

If reduced_still_picture_header is equal to 1, it is a requirement of bitstream conformance that still_picture is equal to 1.

Note: It is allowed to have still_picture equal to 1 and reduced_still_picture_header equal to 0. This allows a video frame to be converted to a still picture by changing a single bit.

timing_info_present_flag specifies whether timing info is present in the coded video sequence.

decoder_model_info_present_flag specifies whether decoder model information is present in the coded video sequence.

initial_display_delay_present_flag specifies whether initial display delay information is present in the coded video sequence.

operating_points_cnt_minus_1 indicates the number of operating points minus 1 present in the coded video sequence.

An operating point specifies which spatial and temporal layers should be decoded.

operating_point_idc[ i ] contains a bitmask that indicates which spatial and temporal layers should be decoded for operating point i. Bit k is equal to 1 if temporal layer k should be decoded (for k between 0 and 7). Bit j+8 is equal to 1 if spatial layer j should be decoded (for j between 0 and 3).

However, if operating_point_idc[ i ] is equal to 0 then the coded video sequence has no scalability information in OBU extension headers and the operating point applies to the entire coded video sequence. This means that all OBUs must be decoded.
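The layer-selection test implied by this bitmask (and applied when deciding whether drop_obu should be invoked) can be sketched as follows (informative):

```python
def in_operating_point(operating_point_idc, temporal_id, spatial_id):
    # operating_point_idc == 0 selects the entire coded video sequence;
    # otherwise bit temporal_id (0..7) and bit spatial_id + 8 (8..11)
    # must both be set for the OBU to be decoded.
    if operating_point_idc == 0:
        return True
    in_temporal = (operating_point_idc >> temporal_id) & 1
    in_spatial = (operating_point_idc >> (spatial_id + 8)) & 1
    return bool(in_temporal and in_spatial)
```

For example, an operating point covering temporal layers 0 and 1 of spatial layer 0 would use operating_point_idc equal to 0x103.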

It is a requirement of bitstream conformance that operating_point_idc[ i ] is not equal to operating_point_idc[ j ] for j = 0..(i - 1).

Note: This constraint means it is not allowed for two operating points to have the same value of operating_point_idc.

If operating_point_idc[ op ] is not equal to 0 for any value of op from 0 to operating_points_cnt_minus_1, it is a requirement of bitstream conformance that obu_extension_flag is equal to 1.

seq_level_idx[ i ] specifies the level that the coded video sequence conforms to when operating point i is selected.

Note: Encoders should select the lowest level that is satisfied by the operating point to maximize the number of decoders that can decode the stream, but this is not a requirement of bitstream conformance.

seq_tier[ i ] specifies the tier that the coded video sequence conforms to when operating point i is selected.

decoder_model_present_for_this_op[ i ] equal to one indicates that there is a decoder model associated with operating point i. decoder_model_present_for_this_op[ i ] equal to zero indicates that there is not a decoder model associated with operating point i.

initial_display_delay_present_for_this_op[ i ] equal to 1 indicates that initial_display_delay_minus_1 is specified for operating point i. initial_display_delay_present_for_this_op[ i ] equal to 0 indicates that initial_display_delay_minus_1 is not specified for operating point i.

initial_display_delay_minus_1[ i ] plus 1 specifies, for operating point i, the number of decoded frames that should be present in the buffer pool before the first presentable frame is displayed. This will ensure that all presentable frames in the sequence can be decoded at or before the time that they are scheduled for display. If not signaled then initial_display_delay_minus_1[ i ] = BUFFER_POOL_MAX_SIZE - 1.

choose_operating_point( ) is a function call that indicates that the operating point should be selected.

The implementation of this function depends on the capabilities of the chosen implementation. The order of operating points indicates the preferred order for producing an output: a decoder should select the earliest operating point in the list that meets its decoding capabilities as expressed by the level associated with each operating point.

A decoder must return a value from choose_operating_point between 0 and operating_points_cnt_minus_1, or abandon the decoding process if no level within the decoder’s capabilities can be found.

Note: To help with conformance testing, decoders may allow the operating point to be explicitly signaled by external means.

OperatingPointIdc specifies the value of operating_point_idc for the selected operating point.

It is a requirement of bitstream conformance that if OperatingPointIdc is equal to 0, then obu_extension_flag is equal to 0 for all OBUs that follow this sequence header until the next sequence header.

frame_width_bits_minus_1 specifies the number of bits minus 1 used for transmitting the frame width syntax elements.

frame_height_bits_minus_1 specifies the number of bits minus 1 used for transmitting the frame height syntax elements.

max_frame_width_minus_1 specifies the maximum frame width minus 1 for the frames represented by this sequence header.

max_frame_height_minus_1 specifies the maximum frame height minus 1 for the frames represented by this sequence header.

frame_id_numbers_present_flag specifies whether frame id numbers are present in the coded video sequence.

Note: The frame id numbers (represented in display_frame_id, current_frame_id, and RefFrameId[ i ]) are not needed by the decoding process, but allow decoders to spot when frames have been missed and take an appropriate action.

additional_frame_id_length_minus_1 is used to calculate the number of bits used to encode the frame_id syntax element.

delta_frame_id_length_minus_2 specifies the number of bits minus 2 used to encode delta_frame_id syntax elements.

use_128x128_superblock, when equal to 1, indicates that superblocks contain 128x128 luma samples. When equal to 0, it indicates that superblocks contain 64x64 luma samples. (The number of contained chroma samples depends on subsampling_x and subsampling_y.)

enable_filter_intra equal to 1 specifies that the use_filter_intra syntax element may be present. enable_filter_intra equal to 0 specifies that the use_filter_intra syntax element will not be present.

enable_intra_edge_filter specifies whether the intra edge filtering process should be enabled.

enable_interintra_compound equal to 1 specifies that the mode info for inter blocks may contain the syntax element interintra. enable_interintra_compound equal to 0 specifies that the syntax element interintra will not be present.

enable_masked_compound equal to 1 specifies that the mode info for inter blocks may contain the syntax element compound_type. enable_masked_compound equal to 0 specifies that the syntax element compound_type will not be present.

enable_warped_motion equal to 1 indicates that the allow_warped_motion syntax element may be present. enable_warped_motion equal to 0 indicates that the allow_warped_motion syntax element will not be present.

enable_order_hint equal to 1 indicates that tools based on the values of order hints may be used. enable_order_hint equal to 0 indicates that tools based on order hints are disabled.

enable_dual_filter equal to 1 indicates that the inter prediction filter type may be specified independently in the horizontal and vertical directions. If the flag is equal to 0, only one filter type may be specified, which is then used in both directions.

enable_jnt_comp equal to 1 indicates that the distance weights process may be used for inter prediction.

enable_ref_frame_mvs equal to 1 indicates that the use_ref_frame_mvs syntax element may be present. enable_ref_frame_mvs equal to 0 indicates that the use_ref_frame_mvs syntax element will not be present.

seq_choose_screen_content_tools equal to 0 indicates that the seq_force_screen_content_tools syntax element will be present. seq_choose_screen_content_tools equal to 1 indicates that seq_force_screen_content_tools should be set equal to SELECT_SCREEN_CONTENT_TOOLS.

seq_force_screen_content_tools equal to SELECT_SCREEN_CONTENT_TOOLS indicates that the allow_screen_content_tools syntax element will be present in the frame header. Otherwise, seq_force_screen_content_tools contains the value for allow_screen_content_tools.

seq_choose_integer_mv equal to 0 indicates that the seq_force_integer_mv syntax element will be present. seq_choose_integer_mv equal to 1 indicates that seq_force_integer_mv should be set equal to SELECT_INTEGER_MV.

seq_force_integer_mv equal to SELECT_INTEGER_MV indicates that the force_integer_mv syntax element will be present in the frame header (providing allow_screen_content_tools is equal to 1). Otherwise, seq_force_integer_mv contains the value for force_integer_mv.

order_hint_bits_minus_1 is used to compute OrderHintBits.

OrderHintBits specifies the number of bits used for the order_hint syntax element.

enable_superres equal to 1 specifies that the use_superres syntax element will be present in the uncompressed header. enable_superres equal to 0 specifies that the use_superres syntax element will not be present (instead use_superres will be set to 0 in the uncompressed header without being read).

Note: It is allowed to set enable_superres equal to 1 even when use_superres is not equal to 1 for any frame in the coded video sequence.

enable_cdef equal to 1 specifies that cdef filtering may be enabled. enable_cdef equal to 0 specifies that cdef filtering is disabled.

Note: It is allowed to set enable_cdef equal to 1 even when cdef filtering is not used on any frame in the coded video sequence.

enable_restoration equal to 1 specifies that loop restoration filtering may be enabled. enable_restoration equal to 0 specifies that loop restoration filtering is disabled.

Note: It is allowed to set enable_restoration equal to 1 even when loop restoration is not used on any frame in the coded video sequence.

film_grain_params_present specifies whether film grain parameters are present in the coded video sequence.

Color config semantics

high_bitdepth and twelve_bit are syntax elements which, together with seq_profile, determine the bit depth.
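The resulting BitDepth follows from the color_config syntax: twelve_bit is only read (and only consulted) when seq_profile is equal to 2 and high_bitdepth is equal to 1. An informative sketch:

```python
def bit_depth(seq_profile, high_bitdepth, twelve_bit=0):
    # twelve_bit is only present for profile 2 streams that also
    # signal high_bitdepth; profiles 0 and 1 are limited to 8 or 10 bits.
    if seq_profile == 2 and high_bitdepth:
        return 12 if twelve_bit else 10
    return 10 if high_bitdepth else 8
```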

mono_chrome equal to 1 indicates that the video does not contain U and V color planes. mono_chrome equal to 0 indicates that the video contains Y, U, and V color planes.

color_description_present_flag equal to 1 specifies that color_primaries, transfer_characteristics, and matrix_coefficients are present. color_description_present_flag equal to 0 specifies that color_primaries, transfer_characteristics and matrix_coefficients are not present.

color_primaries is an integer that is defined by the “Color primaries” section of ISO/IEC 23091-4/ITU-T H.273.

color_primaries Name of color primaries Description
1 CP_BT_709 BT.709
2 CP_UNSPECIFIED Unspecified
4 CP_BT_470_M BT.470 System M (historical)
5 CP_BT_470_B_G BT.470 System B, G (historical)
6 CP_BT_601 BT.601
7 CP_SMPTE_240 SMPTE 240
8 CP_GENERIC_FILM Generic film (color filters using illuminant C)
9 CP_BT_2020 BT.2020, BT.2100
10 CP_XYZ SMPTE 428 (CIE 1931 XYZ)
11 CP_SMPTE_431 SMPTE RP 431-2
12 CP_SMPTE_432 SMPTE EG 432-1
22 CP_EBU_3213 EBU Tech. 3213-E

transfer_characteristics is an integer that is defined by the “Transfer characteristics” section of ISO/IEC 23091-4/ITU-T H.273.

transfer_characteristics Name of transfer characteristics Description
0 TC_RESERVED_0 For future use
1 TC_BT_709 BT.709
2 TC_UNSPECIFIED Unspecified
3 TC_RESERVED_3 For future use
4 TC_BT_470_M BT.470 System M (historical)
5 TC_BT_470_B_G BT.470 System B, G (historical)
6 TC_BT_601 BT.601
7 TC_SMPTE_240 SMPTE 240 M
8 TC_LINEAR Linear
9 TC_LOG_100 Logarithmic (100 : 1 range)
10 TC_LOG_100_SQRT10 Logarithmic (100 * Sqrt(10) : 1 range)
11 TC_IEC_61966 IEC 61966-2-4
12 TC_BT_1361 BT.1361
13 TC_SRGB sRGB or sYCC
14 TC_BT_2020_10_BIT BT.2020 10-bit systems
15 TC_BT_2020_12_BIT BT.2020 12-bit systems
16 TC_SMPTE_2084 SMPTE ST 2084, ITU BT.2100 PQ
17 TC_SMPTE_428 SMPTE ST 428
18 TC_HLG BT.2100 HLG, ARIB STD-B67

matrix_coefficients is an integer that is defined by the “Matrix coefficients” section of ISO/IEC 23091-4/ITU-T H.273.

matrix_coefficients Name of matrix coefficients Description
0 MC_IDENTITY Identity matrix
1 MC_BT_709 BT.709
2 MC_UNSPECIFIED Unspecified
3 MC_RESERVED_3 For future use
4 MC_FCC US FCC 73.628
5 MC_BT_470_B_G BT.470 System B, G (historical)
6 MC_BT_601 BT.601
7 MC_SMPTE_240 SMPTE 240 M
8 MC_SMPTE_YCGCO YCgCo
9 MC_BT_2020_NCL BT.2020 non-constant luminance, BT.2100 YCbCr
10 MC_BT_2020_CL BT.2020 constant luminance
11 MC_SMPTE_2085 SMPTE ST 2085 YDzDx
12 MC_CHROMAT_NCL Chromaticity-derived non-constant luminance
13 MC_CHROMAT_CL Chromaticity-derived constant luminance
14 MC_ICTCP BT.2100 ICtCp

color_range is a binary value that is associated with the VideoFullRangeFlag variable specified in ISO/IEC 23091-4/ITU-T H.273. color_range equal to 0 shall be referred to as the studio swing representation and color_range equal to 1 shall be referred to as the full swing representation for all intents relating to this specification.

Note: This specification does not enforce the range when signaled as studio swing. Therefore the application should perform additional clamping and color conversion operations according to the specified range.

subsampling_x, subsampling_y specify the chroma subsampling format:

subsampling_x subsampling_y mono_chrome Description
0 0 0 YUV 4:4:4
1 0 0 YUV 4:2:2
1 1 0 YUV 4:2:0
1 1 1 Monochrome 4:0:0
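For illustration, the chroma plane dimensions implied by these flags can be computed as below; the round-up for odd luma dimensions follows the (size + subsampling) >> subsampling pattern used elsewhere in this specification:

```python
def chroma_plane_size(luma_w, luma_h, subsampling_x, subsampling_y,
                      mono_chrome=0):
    # Monochrome streams carry no chroma planes at all; otherwise each
    # subsampled dimension is halved, rounding up for odd luma sizes.
    if mono_chrome:
        return (0, 0)
    return ((luma_w + subsampling_x) >> subsampling_x,
            (luma_h + subsampling_y) >> subsampling_y)
```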

If matrix_coefficients is equal to MC_IDENTITY, it is a requirement of bitstream conformance that subsampling_x is equal to 0 and subsampling_y is equal to 0.

chroma_sample_position specifies the sample position for subsampled streams:

chroma_sample_position Name of chroma sample position Description
0 CSP_UNKNOWN Unknown (in this case the source video transfer function must be signaled outside the AV1 bitstream)
1 CSP_VERTICAL Horizontally co-located with (0, 0) luma sample, vertical position in the middle between two luma samples
2 CSP_COLOCATED co-located with (0, 0) luma sample
3 CSP_RESERVED  

separate_uv_delta_q equal to 1 indicates that the U and V planes may have separate delta quantizer values. separate_uv_delta_q equal to 0 indicates that the U and V planes will share the same delta quantizer value.

Timing info semantics

num_units_in_display_tick is the number of time units of a clock operating at the frequency time_scale Hz that corresponds to one increment of a clock tick counter. A display clock tick, in seconds, is equal to num_units_in_display_tick divided by time_scale:

DispCT = num_units_in_display_tick ÷ time_scale

Note: The ÷ operator represents standard mathematical division (in contrast to the / operator which represents integer division).

It is a requirement of bitstream conformance that num_units_in_display_tick is greater than 0.

time_scale is the number of time units that pass in one second.

It is a requirement of bitstream conformance that time_scale is greater than 0.

equal_picture_interval equal to 1 indicates that pictures should be displayed according to their output order with the number of ticks between two consecutive pictures (without dropping frames) specified by num_ticks_per_picture_minus_1 + 1. equal_picture_interval equal to 0 indicates that the interval between two consecutive pictures is not specified.

num_ticks_per_picture_minus_1 plus 1 specifies the number of clock ticks corresponding to output time between two consecutive pictures in the output order.

It is a requirement of bitstream conformance that the value of num_ticks_per_picture_minus_1 shall be in the range of 0 to (1 << 32) − 2, inclusive.
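The timing relationships above can be illustrated with exact rational arithmetic (informative sketch; the function names are not defined by this specification):

```python
from fractions import Fraction

def display_clock_tick(num_units_in_display_tick, time_scale):
    # DispCT = num_units_in_display_tick / time_scale, in seconds.
    return Fraction(num_units_in_display_tick, time_scale)

def picture_interval(num_units_in_display_tick, time_scale,
                     num_ticks_per_picture_minus_1):
    # When equal_picture_interval is 1, consecutive pictures are
    # (num_ticks_per_picture_minus_1 + 1) display clock ticks apart.
    ticks = num_ticks_per_picture_minus_1 + 1
    return display_clock_tick(num_units_in_display_tick, time_scale) * ticks
```

For example, num_units_in_display_tick = 1001 with time_scale = 30000 and num_ticks_per_picture_minus_1 = 0 yields a picture interval of 1001/30000 seconds, i.e. approximately 29.97 frames per second.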

Note: The frame rate, when specified explicitly, applies to the top temporal layer of the bitstream. If the bitstream is expected to be manipulated, e.g. by intermediate network elements, then the resulting frame rate may not match the specified one. In this case, an encoder is advised to use explicit time codes or some mechanism that conveys picture timing information outside the bitstream.

Decoder model info semantics

buffer_delay_length_minus_1 plus 1 specifies the length of the decoder_buffer_delay and the encoder_buffer_delay syntax elements, in bits.

num_units_in_decoding_tick is the number of time units of a decoding clock operating at the frequency time_scale Hz that corresponds to one increment of a clock tick counter:

DecCT = num_units_in_decoding_tick ÷ time_scale

Note: The ÷ operator represents standard mathematical division (in contrast to the / operator which represents integer division).

num_units_in_decoding_tick shall be greater than 0. DecCT represents the expected time to decode a single frame or a common divisor of the expected times to decode frames of different sizes and dimensions present in the coded video sequence.

buffer_removal_time_length_minus_1 plus 1 specifies the length of the buffer_removal_time syntax element, in bits.

frame_presentation_time_length_minus_1 plus 1 specifies the length of the frame_presentation_time syntax element, in bits.

Operating parameters info semantics

decoder_buffer_delay[ op ] specifies the time interval between the arrival of the first bit in the smoothing buffer and the subsequent removal of the data that belongs to the first coded frame for operating point op, measured in units of 1/90000 seconds. The length of decoder_buffer_delay is specified by buffer_delay_length_minus_1 + 1, in bits.

encoder_buffer_delay[ op ] specifies, in combination with decoder_buffer_delay[ op ] syntax element, the first bit arrival time of frames to be decoded to the smoothing buffer. encoder_buffer_delay is measured in units of 1/90000 seconds.

For a video sequence that includes one or more random access points the sum of decoder_buffer_delay and encoder_buffer_delay shall be kept constant.
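Since both delays are expressed in units of 1/90000 seconds (a 90 kHz clock), converting them to seconds is a single division. The following informative sketch (names illustrative) also checks the constant-sum requirement stated above:

```python
def buffer_delay_seconds(delay_ticks):
    # decoder_buffer_delay / encoder_buffer_delay are measured in
    # units of 1/90000 seconds.
    return delay_ticks / 90000.0

def delays_sum_constant(delay_pairs):
    # delay_pairs: list of (decoder_buffer_delay, encoder_buffer_delay)
    # values across random access points; their sum shall be constant.
    sums = {dec + enc for dec, enc in delay_pairs}
    return len(sums) <= 1
```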

low_delay_mode_flag[ op ] equal to 1 indicates that the smoothing buffer operates in low-delay mode for operating point op. In low-delay mode late decode times and buffer underflow are both permitted. low_delay_mode_flag[ op ] equal to 0 indicates that the smoothing buffer operates in strict mode, where buffer underflow is not allowed.

Temporal delimiter OBU semantics

SeenFrameHeader is a variable used to mark whether the frame header for the current frame has been received. It is initialized to zero.

Padding OBU semantics

Multiple padding units can be present, each containing an arbitrary number of padding bytes.

obu_padding_byte is a padding byte. Padding bytes may have arbitrary values and have no effect on the decoding process.

Metadata OBU semantics

General metadata OBU semantics

metadata_type indicates the type of metadata:

metadata_type Name of metadata_type
0 Reserved for AOM use
1 METADATA_TYPE_HDR_CLL
2 METADATA_TYPE_HDR_MDCV
3 METADATA_TYPE_SCALABILITY
4 METADATA_TYPE_ITUT_T35
5 METADATA_TYPE_TIMECODE
6-31 Unregistered user private
32 and greater Reserved for AOM use

Metadata ITUT T35 semantics

itu_t_t35_country_code shall be a byte having a value specified as a country code by Annex A of Recommendation ITU-T T.35.

itu_t_t35_country_code_extension_byte shall be a byte having a value specified as a country code by Annex B of Recommendation ITU-T T.35.

itu_t_t35_payload_bytes shall be bytes containing data registered as specified in Recommendation ITU-T T.35.

The ITU-T T.35 terminal provider code and terminal provider oriented code shall be contained in the first one or more bytes of the itu_t_t35_payload_bytes, in the format specified by the Administration that issued the terminal provider code. Any remaining bytes in itu_t_t35_payload_bytes data shall be data having syntax and semantics as specified by the entity identified by the ITU-T T.35 country code and terminal provider code.
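As an informative sketch of how a reader of this metadata might split the fields, the following assumes the ITU-T T.35 convention that a country code byte of 0xFF indicates that an Annex B extension byte follows; the function name and return shape are illustrative, not part of this specification:

```python
def parse_itut_t35(payload):
    # First byte: itu_t_t35_country_code. If it equals 0xFF, the next
    # byte is itu_t_t35_country_code_extension_byte (ITU-T T.35 Annex B).
    # The remainder is the provider-specific itu_t_t35_payload_bytes.
    country_code = payload[0]
    pos = 1
    country_code_extension = None
    if country_code == 0xFF:
        country_code_extension = payload[pos]
        pos += 1
    return country_code, country_code_extension, payload[pos:]
```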

Metadata high dynamic range content light level semantics

max_cll specifies the maximum content light level as specified in CEA-861.3, Appendix A.

max_fall specifies the maximum frame-average light level as specified in CEA-861.3, Appendix A.

Metadata high dynamic range mastering display color volume semantics

primary_chromaticity_x[ i ] specifies a 0.16 fixed-point X chromaticity coordinate as defined by CIE 1931, where i = 0,1,2 specifies Red, Green, Blue respectively.

primary_chromaticity_y[ i ] specifies a 0.16 fixed-point Y chromaticity coordinate as defined by CIE 1931, where i = 0,1,2 specifies Red, Green, Blue respectively.

white_point_chromaticity_x specifies a 0.16 fixed-point white X chromaticity coordinate as defined by CIE 1931.

white_point_chromaticity_y specifies a 0.16 fixed-point white Y chromaticity coordinate as defined by CIE 1931.

luminance_max is a 24.8 fixed-point maximum luminance, represented in candelas per square meter.

luminance_min is a 18.14 fixed-point minimum luminance, represented in candelas per square meter.
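The fixed-point formats above (0.16 for chromaticities, 24.8 for luminance_max, 18.14 for luminance_min) all convert to real values by dividing by 2 raised to the number of fractional bits. An informative sketch (function name illustrative):

```python
def fixed_to_float(value, frac_bits):
    # Interpret an unsigned fixed-point value with frac_bits fractional
    # bits: chromaticities use frac_bits = 16, luminance_max uses 8,
    # and luminance_min uses 14.
    return value / (1 << frac_bits)
```

For example, a primary_chromaticity_x value of 0x8000 represents 0.5, and a luminance_max value of 1000 << 8 represents 1000 cd/m².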

Metadata scalability semantics

scalability_mode_idc indicates the picture prediction structure of the coded video sequence.

scalability_mode_idc Name of scalability_mode_idc
0 SCALABILITY_L1T2
1 SCALABILITY_L1T3
2 SCALABILITY_L2T1
3 SCALABILITY_L2T2
4 SCALABILITY_L2T3
5 SCALABILITY_S2T1
6 SCALABILITY_S2T2
7 SCALABILITY_S2T3
8 SCALABILITY_L2T1h
9 SCALABILITY_L2T2h
10 SCALABILITY_L2T3h
11 SCALABILITY_S2T1h
12 SCALABILITY_S2T2h
13 SCALABILITY_S2T3h
14 SCALABILITY_SS
15 SCALABILITY_L3T2_KEY
16 SCALABILITY_L3T3_KEY
17 SCALABILITY_L4T5_KEY
18 SCALABILITY_L4T7_KEY
19 SCALABILITY_L3T2_KEY_SHIFT
20 SCALABILITY_L3T3_KEY_SHIFT
21 SCALABILITY_L4T5_KEY_SHIFT
22 SCALABILITY_L4T7_KEY_SHIFT
23-255 reserved

The scalability metadata provides two mechanisms for describing the underlying picture prediction structure of the bitstream:

  1. Selection among a set of preconfigured structures, or modes, covering a number of cases that have found wide use in applications.
  2. A facility for specifying picture prediction structures to accommodate a variety of special cases.

The preconfigured modes are described below. The mechanism for describing alternative structures is described in scalability_structure() below.

All predefined modes follow a dyadic, hierarchical picture prediction structure. They support up to three temporal layers, in combination with one or two spatial layers. The second spatial layer may have twice or one and a half times the resolution of the base layer in each dimension, depending on the mode. There is also support for a spatial layer that uses no inter-layer prediction (i.e., the second spatial layer does not use its corresponding base layer as a reference) and for a spatial layer that uses inter-layer prediction only at key frames. The following table lists the predefined scalability structures.

Name of scalability_mode_idc Spatial Layers Resolution Ratio Temporal Layers Inter-layer dependency
SCALABILITY_L1T2 1   2  
SCALABILITY_L1T3 1   3  
SCALABILITY_L2T1 2 2:1 1 Yes
SCALABILITY_L2T2 2 2:1 2 Yes
SCALABILITY_L2T3 2 2:1 3 Yes
SCALABILITY_S2T1 2 2:1 1 No
SCALABILITY_S2T2 2 2:1 2 No
SCALABILITY_S2T3 2 2:1 3 No
SCALABILITY_L2T1h 2 1.5:1 1 Yes
SCALABILITY_L2T2h 2 1.5:1 2 Yes
SCALABILITY_L2T3h 2 1.5:1 3 Yes
SCALABILITY_S2T1h 2 1.5:1 1 No
SCALABILITY_S2T2h 2 1.5:1 2 No
SCALABILITY_S2T3h 2 1.5:1 3 No
SCALABILITY_L3T2_KEY 3 2:1 2 Yes
SCALABILITY_L3T3_KEY 3 2:1 3 Yes
SCALABILITY_L4T5_KEY 4 2:1 5 Yes
SCALABILITY_L4T7_KEY 4 2:1 7 Yes
SCALABILITY_L3T2_KEY_SHIFT 3 2:1 2 Yes
SCALABILITY_L3T3_KEY_SHIFT 3 2:1 3 Yes
SCALABILITY_L4T5_KEY_SHIFT 4 2:1 5 Yes
SCALABILITY_L4T7_KEY_SHIFT 4 2:1 7 Yes

The following figures show the picture prediction structures for certain modes:

L1T2
L1T3
L2T1
L2T2
L2T3
S2T1
S2T2
S2T3
L3T2_KEY
L3T3_KEY
L4T5_KEY
L4T7_KEY
L3T2_KEY_SHIFT
L3T3_KEY_SHIFT
L4T5_KEY_SHIFT
L4T7_KEY_SHIFT

Scalability structure semantics

General

Note: The scalability_structure is intended for use by intermediate processing entities that may perform selective layer elimination. Its presence allows these entities to know the structure of the video bitstream without having to decode individual frames. Scalability structures should be placed immediately after the sequence header so that these entities are informed of the scalability structure of the video sequence as early as possible.

spatial_layers_cnt_minus_1 indicates the number of spatial layers present in the video sequence minus one.

spatial_layer_description_present_flag indicates when set to 1 that the spatial_layer_ref_id is present for each of the (spatial_layers_cnt_minus_1 + 1) layers, or that it is not present when set to 0.

spatial_layer_dimensions_present_flag indicates when set to 1 that the spatial_layer_max_width and spatial_layer_max_height parameters are present for each of the (spatial_layers_cnt_minus_1 + 1) layers, or that they are not present when set to 0.

temporal_group_description_present_flag indicates when set to 1 that the temporal dependency information is present, or that it is not when set to 0. When any temporal unit in a coded video sequence contains OBU extension headers that have temporal_id values that are not equal to each other, temporal_group_description_present_flag must be equal to 0.

scalability_structure_reserved_3bits must be set to zero and be ignored by decoders.

spatial_layer_max_width[ i ] specifies the maximum frame width for the frames with spatial_id equal to i. This number must not be larger than max_frame_width_minus_1 + 1.

spatial_layer_max_height[ i ] specifies the maximum frame height for the frames with spatial_id equal to i. This number must not be larger than max_frame_height_minus_1 + 1.

spatial_layer_ref_id[ i ] specifies the spatial_id value of the frame within the current temporal unit that the frame of layer i uses for reference. If no frame within the current temporal unit is used for reference the value must be equal to 255.

temporal_group_size indicates the number of pictures in a temporal picture group. If the temporal_group_size is greater than 0, then the scalability structure data allows the inter-picture temporal dependency structure of the video sequence to be specified. If the temporal_group_size is greater than 0, then for temporal_group_size pictures in the temporal group, each picture’s temporal layer id (temporal_id), switch up points (temporal_group_temporal_switching_up_point_flag and temporal_group_spatial_switching_up_point_flag), and the reference picture indices (temporal_group_ref_pic_diff) are specified.

The first picture specified in a temporal group must have temporal_id equal to 0.

If the parameter temporal_group_size is not present or set to 0, then either there is only one temporal layer or there is no fixed inter-picture temporal dependency present going forward in the video sequence.

Note that for a given picture, all frames follow the same inter-picture temporal dependency structure.
However, the frame rates of the layers can differ from one another. The dependency structure specified in the scalability structure data must be that of the highest frame rate layer.

temporal_group_temporal_id[ i ] specifies the temporal_id value for the i-th picture in the temporal group.

temporal_group_temporal_switching_up_point_flag[ i ] is set to 1 if subsequent (in decoding order) pictures with a temporal_id higher than temporal_group_temporal_id[ i ] do not depend on any picture preceding the current picture (in coding order) with temporal_id higher than temporal_group_temporal_id[ i ].

Note: This condition ensures that switching up to a higher frame rate is possible at the current picture.

temporal_group_spatial_switching_up_point_flag[ i ] is set to 1 if spatial layers of the current picture in the temporal group (i.e., pictures with a spatial_id higher than zero) do not depend on any picture preceding the current picture in the temporal group.

temporal_group_ref_cnt[ i ] indicates the number of reference pictures used by the i-th picture in the temporal group.

temporal_group_ref_pic_diff[ i ][ j ] indicates, for the i-th picture in the temporal group, the temporal distance between the i-th picture and the j-th reference picture used by the i-th picture. The temporal distance is measured in frames, counting only frames of identical spatial_id values.

Note: The scalability structure description does not allow different temporal prediction structures across non-temporal layers (i.e., layers with different spatial_id values). It also only allows for a single reference picture for inter-layer prediction.

The following sections contain the value of these syntax elements for the predefined modes.

L1T2 (Informative)
Layer Spatial Layers Description Value
  spatial_layers_cnt_minus_1 0
     
Picture Temporal Group Description Value
  temporal_group_size 2
0 temporal_group_temporal_id[0] 0
  temporal_group_temporal_switching_up_point_flag[0] 1
  temporal_group_spatial_switching_up_point_flag[0] 0
  temporal_group_ref_cnt[0] 1
  temporal_group_ref_pic_diff[0][0] 2
1 temporal_group_temporal_id[1] 1
  temporal_group_temporal_switching_up_point_flag[1] 1
  temporal_group_spatial_switching_up_point_flag[1] 0
  temporal_group_ref_cnt[1] 1
  temporal_group_ref_pic_diff[1][0] 1
L1T3 (Informative)
Layer Spatial Layers Description Value
  spatial_layers_cnt_minus_1 0
     
Picture Temporal Group Description Value
  temporal_group_size 4
0 temporal_group_temporal_id[0] 0
  temporal_group_temporal_switching_up_point_flag[0] 1
  temporal_group_spatial_switching_up_point_flag[0] 0
  temporal_group_ref_cnt[0] 1
  temporal_group_ref_pic_diff[0][0] 4
1 temporal_group_temporal_id[1] 2
  temporal_group_temporal_switching_up_point_flag[1] 1
  temporal_group_spatial_switching_up_point_flag[1] 0
  temporal_group_ref_cnt[1] 1
  temporal_group_ref_pic_diff[1][0] 1
2 temporal_group_temporal_id[2] 1
  temporal_group_temporal_switching_up_point_flag[2] 0
  temporal_group_ref_cnt[2] 1
  temporal_group_ref_pic_diff[2][0] 2
3 temporal_group_temporal_id[3] 2
  temporal_group_temporal_switching_up_point_flag[3] 1
  temporal_group_spatial_switching_up_point_flag[3] 0
  temporal_group_ref_cnt[3] 1
  temporal_group_ref_pic_diff[3][0] 1
L2T1 / L2T1h (Informative)
Layer Spatial Layers Description Value
  spatial_layers_cnt_minus_1 1
0 spatial_layer_ref_id[0] 255
1 spatial_layer_ref_id[1] 0
     
Picture Temporal Group Description Value
  temporal_group_size 1
0 temporal_group_temporal_id[0] 0
  temporal_group_temporal_switching_up_point_flag[0] 1
  temporal_group_spatial_switching_up_point_flag[0] 0
  temporal_group_ref_cnt[0] 1
  temporal_group_ref_pic_diff[0][0] 1
L2T2 / L2T2h (Informative)
Layer Spatial Layers Description Value
  spatial_layers_cnt_minus_1 1
0 spatial_layer_ref_id[0] 255
1 spatial_layer_ref_id[1] 0
     
Picture Temporal Group Description Value
  temporal_group_size 2
0 temporal_group_temporal_id[0] 0
  temporal_group_temporal_switching_up_point_flag[0] 1
  temporal_group_spatial_switching_up_point_flag[0] 0
  temporal_group_ref_cnt[0] 1
  temporal_group_ref_pic_diff[0][0] 2
1 temporal_group_temporal_id[1] 1
  temporal_group_temporal_switching_up_point_flag[1] 1
  temporal_group_spatial_switching_up_point_flag[1] 0
  temporal_group_ref_cnt[1] 1
  temporal_group_ref_pic_diff[1][0] 1
L2T3 / L2T3h (Informative)
Layer Spatial Layers Description Value
  spatial_layers_cnt_minus_1 1
0 spatial_layer_ref_id[0] 255
1 spatial_layer_ref_id[1] 0
     
Picture Temporal Group Description Value
  temporal_group_size 4
0 temporal_group_temporal_id[0] 0
  temporal_group_temporal_switching_up_point_flag[0] 1
  temporal_group_spatial_switching_up_point_flag[0] 0
  temporal_group_ref_cnt[0] 1
  temporal_group_ref_pic_diff[0][0] 4
1 temporal_group_temporal_id[1] 2
  temporal_group_temporal_switching_up_point_flag[1] 1
  temporal_group_spatial_switching_up_point_flag[1] 0
  temporal_group_ref_cnt[1] 1
  temporal_group_ref_pic_diff[1][0] 1
2 temporal_group_temporal_id[2] 1
  temporal_group_temporal_switching_up_point_flag[2] 0
  temporal_group_ref_cnt[2] 1
  temporal_group_ref_pic_diff[2][0] 2
3 temporal_group_temporal_id[3] 2
  temporal_group_temporal_switching_up_point_flag[3] 1
  temporal_group_spatial_switching_up_point_flag[3] 0
  temporal_group_ref_cnt[3] 1
  temporal_group_ref_pic_diff[3][0] 1
S2T1 / S2T1h (Informative)
Layer Spatial Layers Description Value
  spatial_layers_cnt_minus_1 1
0 spatial_layer_ref_id[0] 255
1 spatial_layer_ref_id[1] 255
     
Picture Temporal Group Description Value
  temporal_group_size 1
0 temporal_group_temporal_id[0] 0
  temporal_group_temporal_switching_up_point_flag[0] 1
  temporal_group_spatial_switching_up_point_flag[0] 0
  temporal_group_ref_cnt[0] 1
  temporal_group_ref_pic_diff[0][0] 1
S2T2 / S2T2h (Informative)
Layer Spatial Layers Description Value
  spatial_layers_cnt_minus_1 1
0 spatial_layer_ref_id[0] 255
1 spatial_layer_ref_id[1] 255
     
Picture Temporal Group Description Value
  temporal_group_size 2
0 temporal_group_temporal_id[0] 0
  temporal_group_temporal_switching_up_point_flag[0] 1
  temporal_group_spatial_switching_up_point_flag[0] 0
  temporal_group_ref_cnt[0] 1
  temporal_group_ref_pic_diff[0][0] 2
1 temporal_group_temporal_id[1] 1
  temporal_group_temporal_switching_up_point_flag[1] 1
  temporal_group_spatial_switching_up_point_flag[1] 0
  temporal_group_ref_cnt[1] 1
  temporal_group_ref_pic_diff[1][0] 1
S2T3 / S2T3h (Informative)
Layer Spatial Layers Description Value
  spatial_layers_cnt_minus_1 1
0 spatial_layer_ref_id[0] 255
1 spatial_layer_ref_id[1] 255
     
Picture Temporal Group Description Value
  temporal_group_size 4
0 temporal_group_temporal_id[0] 0
  temporal_group_temporal_switching_up_point_flag[0] 1
  temporal_group_spatial_switching_up_point_flag[0] 0
  temporal_group_ref_cnt[0] 1
  temporal_group_ref_pic_diff[0][0] 4
1 temporal_group_temporal_id[1] 2
  temporal_group_temporal_switching_up_point_flag[1] 1
  temporal_group_spatial_switching_up_point_flag[1] 0
  temporal_group_ref_cnt[1] 1
  temporal_group_ref_pic_diff[1][0] 1
2 temporal_group_temporal_id[2] 1
  temporal_group_temporal_switching_up_point_flag[2] 0
  temporal_group_ref_cnt[2] 1
  temporal_group_ref_pic_diff[2][0] 2
3 temporal_group_temporal_id[3] 2
  temporal_group_temporal_switching_up_point_flag[3] 1
  temporal_group_spatial_switching_up_point_flag[3] 0
  temporal_group_ref_cnt[3] 1
  temporal_group_ref_pic_diff[3][0] 1

Metadata timecode semantics

counting_type specifies the method of dropping values of the n_frames syntax element as specified in the table below. counting_type should be the same for all pictures in the coded video sequence.

counting_type Meaning
0 no dropping of n_frames count values and no use of time_offset_value
1 no dropping of n_frames count values
2 dropping of individual zero values of n_frames count
3 dropping of individual values of n_frames count equal to maxFps − 1
4 dropping of the two lowest (value 0 and 1) n_frames counts when seconds_value is equal to 0 and minutes_value is not an integer multiple of 10
5 dropping of unspecified individual n_frames count values
6 dropping of unspecified numbers of unspecified n_frames count values
7..31 reserved

full_timestamp_flag equal to 1 indicates that the seconds_value, minutes_value, hours_value syntax elements will be present. full_timestamp_flag equal to 0 indicates that there are flags to control the presence of these syntax elements.

When timing_info_present_flag is equal to 1, the contents of the clock timestamp indicate a time of origin, capture, or ideal display. This indicated time is computed as follows:

if ( equal_picture_interval ) {
  ticksPerPicture = num_ticks_per_picture_minus_1 + 1
} else {
  ticksPerPicture = 1
}
ss = ( ( hours_value * 60 + minutes_value) * 60 + seconds_value )
clockTimestamp = ss * time_scale + n_frames * ticksPerPicture + time_offset_value 

clockTimestamp is in units of clock ticks of a clock with clock frequency equal to time_scale Hz, relative to some unspecified point in time for which clockTimestamp would be equal to 0.
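The pseudocode above can be transcribed directly; the following is an informative sketch (function name illustrative) that returns clockTimestamp in clock ticks of a time_scale Hz clock:

```python
def clock_timestamp(hours_value, minutes_value, seconds_value,
                    n_frames, time_scale, equal_picture_interval,
                    num_ticks_per_picture_minus_1=0,
                    time_offset_value=0):
    # Direct transcription of the clockTimestamp derivation above.
    if equal_picture_interval:
        ticks_per_picture = num_ticks_per_picture_minus_1 + 1
    else:
        ticks_per_picture = 1
    ss = (hours_value * 60 + minutes_value) * 60 + seconds_value
    return ss * time_scale + n_frames * ticks_per_picture + time_offset_value
```

For example, a timestamp of 00:00:01 with n_frames = 0 and time_scale = 30000 gives clockTimestamp = 30000 ticks.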

discontinuity_flag equal to 0 indicates that the difference between the current value of clockTimestamp and the value of clockTimestamp computed from the previous set of timestamp syntax elements in output order can be interpreted as the time difference between the times of origin or capture of the associated frames or fields. discontinuity_flag equal to 1 indicates that the difference between the current value of clockTimestamp and the value of clockTimestamp computed from the previous set of clock timestamp syntax elements in output order should not be interpreted as the time difference between the times of origin or capture of the associated frames or fields.

When timing_info_present_flag is equal to 1 and discontinuity_flag is equal to 0, the value of clockTimestamp shall be greater than or equal to the value of clockTimestamp for the previous set of clock timestamp syntax elements in output order.

cnt_dropped_flag specifies the skipping of one or more values of n_frames using the counting method specified by counting_type.

n_frames is used to compute clockTimestamp. When timing_info_present_flag is equal to 1, n_frames shall be less than maxFps, where maxFps is specified by maxFps = ceil( time_scale / ( 2 * num_units_in_display_tick ) ).

seconds_flag equal to 1 specifies that seconds_value and minutes_flag are present when full_timestamp_flag is equal to 0. seconds_flag equal to 0 specifies that seconds_value and minutes_flag are not present.

seconds_value is used to compute clockTimestamp and shall be in the range of 0 to 59. When seconds_value is not present, its value is inferred to be equal to the value of seconds_value for the previous set of clock timestamp syntax elements in decoding order, and it is required that such a previous seconds_value shall have been present.

minutes_flag equal to 1 specifies that minutes_value and hours_flag are present when full_timestamp_flag is equal to 0 and seconds_flag is equal to 1. minutes_flag equal to 0 specifies that minutes_value and hours_flag are not present.

minutes_value specifies the value of mm used to compute clockTimestamp and shall be in the range of 0 to 59, inclusive. When minutes_value is not present, its value is inferred to be equal to the value of minutes_value for the previous set of clock timestamp syntax elements in decoding order, and it is required that such a previous minutes_value shall have been present.

hours_flag equal to 1 specifies that hours_value is present when full_timestamp_flag is equal to 0 and seconds_flag is equal to 1 and minutes_flag is equal to 1.

hours_value is used to compute clockTimestamp and shall be in the range of 0 to 23, inclusive. When hours_value is not present, its value is inferred to be equal to the value of hours_value for the previous set of clock timestamp syntax elements in decoding order, and it is required that such a previous hours_value shall have been present.

time_offset_length greater than 0 specifies the length in bits of the time_offset_value syntax element. time_offset_length equal to 0 specifies that the time_offset_value syntax element is not present. time_offset_length should be the same for all pictures in the coded video sequence.

time_offset_value is used to compute clockTimestamp. The number of bits used to represent time_offset_value is equal to time_offset_length. When time_offset_value is not present, its value is inferred to be equal to 0.

Frame header OBU semantics

General frame header OBU semantics

It is a requirement of bitstream conformance that a sequence header OBU has been received before a frame header OBU.

frame_header_copy is a function call that indicates that a copy of the previous frame_header_obu should be inserted at this point.

Note: Bitstreams may contain several copies of the frame_header_obu interspersed with tile_group_obu to allow for greater error resilience. However, the copies must contain identical contents to the original frame_header_obu.

If obu_type is equal to OBU_FRAME_HEADER, it is a requirement of bitstream conformance that SeenFrameHeader is equal to 0.

If obu_type is equal to OBU_REDUNDANT_FRAME_HEADER, it is a requirement of bitstream conformance that SeenFrameHeader is equal to 1.

Note: These requirements ensure that the first frame header for a frame has obu_type equal to OBU_FRAME_HEADER, while later copies of this frame header (if present) have obu_type equal to OBU_REDUNDANT_FRAME_HEADER.

TileNum is a variable giving the index (zero-based) of the current tile.

decode_frame_wrapup is a function call that indicates that the decode frame wrapup process specified in section 7.4 should be invoked.

Uncompressed header semantics

show_existing_frame equal to 1 indicates that the frame indexed by frame_to_show_map_idx is to be output; show_existing_frame equal to 0 indicates that further processing is required.

If obu_type is equal to OBU_FRAME, it is a requirement of bitstream conformance that show_existing_frame is equal to 0.

frame_to_show_map_idx specifies the frame to be output. It is only available if show_existing_frame is 1.

display_frame_id provides the frame id number for the frame to output. It is a requirement of bitstream conformance that whenever display_frame_id is read, the value matches RefFrameId[ frame_to_show_map_idx ] (the value of current_frame_id at the time that the frame indexed by frame_to_show_map_idx was stored), and that RefValid[ frame_to_show_map_idx ] is equal to 1.

It is a requirement of bitstream conformance that the number of bits needed to read display_frame_id does not exceed 16. This is equivalent to the constraint that idLen <= 16.

frame_type specifies the type of the frame:

frame_type Name of frame_type
0 KEY_FRAME
1 INTER_FRAME
2 INTRA_ONLY_FRAME
3 SWITCH_FRAME

show_frame equal to 1 specifies that this frame should be immediately output once decoded. show_frame equal to 0 specifies that this frame should not be immediately output. (It may be output later if a later uncompressed header uses show_existing_frame equal to 1).

showable_frame equal to 1 specifies that the frame may be output using the show_existing_frame mechanism. showable_frame equal to 0 specifies that this frame will not be output using the show_existing_frame mechanism.

It is a requirement of bitstream conformance that when show_existing_frame is used to show a previous frame, that the value of showable_frame for the previous frame was equal to 1.

It is a requirement of bitstream conformance that when show_existing_frame is used to show a previous frame with RefFrameType[ frame_to_show_map_idx ] equal to KEY_FRAME, that the frame is output via the show_existing_frame mechanism at most once.

Note: This requirement also forbids storing a frame with frame_type equal to KEY_FRAME into multiple reference frames and then using show_existing_frame for each reference frame.

error_resilient_mode equal to 1 indicates that error resilient mode is enabled; error_resilient_mode equal to 0 indicates that error resilient mode is disabled.

Note: Error resilient mode allows the syntax of a frame to be parsed independently of previously decoded frames.

disable_cdf_update specifies whether the CDF update in the symbol decoding process should be disabled.

current_frame_id specifies the frame id number for the current frame. Frame id numbers are additional information that do not affect the decoding process, but provide decoders with a way of detecting missing reference frames so that appropriate action can be taken.

If frame_type is not equal to KEY_FRAME or show_frame is equal to 0, it is a requirement of bitstream conformance that all of the following conditions are true:

  • current_frame_id is not equal to PrevFrameID,

  • DiffFrameID is less than 1 << ( idLen - 1 )

where DiffFrameID is specified as follows:

  • If current_frame_id is greater than PrevFrameID, DiffFrameID is equal to current_frame_id - PrevFrameID.

  • Otherwise, DiffFrameID is equal to ( 1 << idLen ) + current_frame_id - PrevFrameID.
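The DiffFrameID derivation and the associated conformance check can be sketched as follows (informative; the function names are not defined by this specification):

```python
def diff_frame_id(current_frame_id, prev_frame_id, id_len):
    # Modular distance forward from PrevFrameID to current_frame_id,
    # as specified above.
    if current_frame_id > prev_frame_id:
        return current_frame_id - prev_frame_id
    return (1 << id_len) + current_frame_id - prev_frame_id

def frame_id_conforms(current_frame_id, prev_frame_id, id_len):
    # Conformance requires current_frame_id != PrevFrameID and
    # DiffFrameID < 1 << (id_len - 1).
    diff = diff_frame_id(current_frame_id, prev_frame_id, id_len)
    return current_frame_id != prev_frame_id and diff < (1 << (id_len - 1))
```

With id_len = 8, for instance, a wrap from PrevFrameID = 250 to current_frame_id = 5 gives DiffFrameID = 11 and conforms.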

frame_size_override_flag equal to 0 specifies that the frame size is equal to the size in the sequence header. frame_size_override_flag equal to 1 specifies that the frame size will either be specified as the size of one of the reference frames, or computed from the frame_width_minus_1 and frame_height_minus_1 syntax elements.

order_hint is used to compute OrderHint.

OrderHint specifies OrderHintBits least significant bits of the expected output order for this frame.

Note: There is no requirement that OrderHint should reflect the true output order. As a guideline, the motion vector prediction is expected to be more accurate if the true output order is used for frames that will be shown later. If a frame is never to be shown (e.g. it has been constructed as an average of several frames for reference purposes), the encoder is free to choose whichever value of OrderHint will give the best compression.

primary_ref_frame specifies which reference frame contains the CDF values and other state that should be loaded at the start of the frame.

Note: It is allowed for primary_ref_frame to be coded as PRIMARY_REF_NONE, this will cause default values to be used for the CDF values and other state.

buffer_removal_time_present_flag equal to 1 specifies that buffer_removal_time is present. buffer_removal_time_present_flag equal to 0 specifies that buffer_removal_time is not present.

buffer_removal_time[ opNum ] specifies the frame removal time in units of DecCT clock ticks counted from the removal time of the last random access point for operating point opNum. buffer_removal_time is signaled as a fixed length unsigned integer with a length in bits given by buffer_removal_time_length_minus_1 + 1.

buffer_removal_time is the remainder of a modulo 1 << ( buffer_removal_time_length_minus_1 + 1 ) counter.

allow_screen_content_tools equal to 1 indicates that intra blocks may use palette encoding; allow_screen_content_tools equal to 0 indicates that palette encoding is never used.

allow_intrabc equal to 1 indicates that intra block copy may be used in this frame. allow_intrabc equal to 0 indicates that intra block copy is not allowed in this frame.

Note: intra block copy is only allowed in intra frames, and disables all loop filtering. force_integer_mv will be equal to 1 for intra frames, so only integer offsets are allowed in block copy mode.

force_integer_mv equal to 1 specifies that motion vectors will always be integers. force_integer_mv equal to 0 specifies that motion vectors can contain fractional bits.

ref_order_hint[ i ] specifies the expected output order hint for each reference frame.

Note: The values in the ref_order_hint array are provided to allow implementations to gracefully handle cases when some frames have been lost.

Note: When scalability is used, the values in RefOrderHint during the decode process may depend on the selected operating point.

refresh_frame_flags contains a bitmask that specifies which reference frame slots will be updated with the current frame after it is decoded.

If frame_type is equal to INTRA_ONLY_FRAME, it is a requirement of bitstream conformance that refresh_frame_flags is not equal to 0xff.

Note: This restriction encourages encoders to correctly label random access points (by forcing frame_type to be equal to KEY_FRAME when an intra frame is used to reset the decoding process).

See section 7.20 for details of the frame update process.
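The bitmask interpretation of refresh_frame_flags can be sketched as below (an illustrative, non-normative view of the frame update process; the helper name and slot representation are hypothetical):

```python
NUM_REF_FRAMES = 8

def refresh_slots(ref_slots, refresh_frame_flags, current_frame):
    """Return a new slot list where every slot whose bit is set in the
    refresh_frame_flags bitmask is replaced by the just-decoded frame."""
    return [current_frame if (refresh_frame_flags >> i) & 1 else ref_slots[i]
            for i in range(NUM_REF_FRAMES)]

slots = ["old"] * NUM_REF_FRAMES
# 0x21 = 0b00100001: update slots 0 and 5 only.
slots = refresh_slots(slots, 0x21, "new")
```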

frame_refs_short_signaling equal to 1 indicates that only two reference frames are explicitly signaled. frame_refs_short_signaling equal to 0 indicates that all reference frames are explicitly signaled.

last_frame_idx specifies the reference frame to use for LAST_FRAME.

gold_frame_idx specifies the reference frame to use for GOLDEN_FRAME.

set_frame_refs is a function call that indicates the conceptual point where the ref_frame_idx values are computed (in the case when frame_refs_short_signaling is equal to 1, these syntax elements are computed instead of being explicitly signaled). When this function is called, the set frame refs process specified in section 7.8 is invoked.

ref_frame_idx[ i ] specifies which reference frames are used by inter frames. It is a requirement of bitstream conformance that RefValid[ ref_frame_idx[ i ] ] is equal to 1, and that the selected reference frames match the current frame in bit depth, profile, chroma subsampling, and color space.

Note: Syntax elements indicate a reference (such as LAST_FRAME, ALTREF_FRAME). These references are looked up in the ref_frame_idx array to find which reference frame should be used during inter prediction. There is no requirement that the values in ref_frame_idx should be distinct.

RefFrameSignBias specifies the intended direction of the motion vector in time for each reference frame. A sign bias equal to 0 indicates that the reference frame is a forwards reference (i.e. the reference frame is expected to be output before the current frame); a sign bias equal to 1 indicates that the reference frame is a backwards reference.

Note: The sign bias is just an indication that can improve the accuracy of motion vector prediction and is not constrained to reflect the actual output order of pictures.

delta_frame_id_minus_1 is used to calculate DeltaFrameId.

DeltaFrameId specifies the distance to the frame id for the reference frame.

RefFrameId[ i ] specifies the frame id for each reference frame.

expectedFrameId[ i ] specifies the frame id for each frame used for reference. It is a requirement of bitstream conformance that whenever expectedFrameId[ i ] is calculated, the value matches RefFrameId[ ref_frame_idx[ i ] ] (this contains the value of current_frame_id at the time that the frame indexed by ref_frame_idx was stored).

allow_high_precision_mv equal to 0 specifies that motion vectors are specified to quarter pel precision; allow_high_precision_mv equal to 1 specifies that motion vectors are specified to eighth pel precision.

is_motion_mode_switchable equal to 0 specifies that only the SIMPLE motion mode will be used.

use_ref_frame_mvs equal to 1 specifies that motion vector information from a previous frame can be used when decoding the current frame. use_ref_frame_mvs equal to 0 specifies that this information will not be used.

disable_frame_end_update_cdf equal to 1 indicates that the end of frame CDF update is disabled; disable_frame_end_update_cdf equal to 0 indicates that the end of frame CDF update is enabled.

Note: It can be useful to disable the CDF update because it means the next frame can start to be decoded as soon as the frame headers of the current frame have been processed.

motion_field_estimation is a function call which indicates that the motion field estimation process in section 7.9 should be invoked.

OrderHints specifies the expected output order for each reference frame.

CodedLossless is a variable that is equal to 1 when all segments use lossless encoding. This indicates that the frame is fully lossless at the coded resolution of FrameWidth by FrameHeight. In this case, the loop filter and CDEF filter are disabled.

It is a requirement of bitstream conformance that delta_q_present is equal to 0 when CodedLossless is equal to 1.

AllLossless is a variable that is equal to 1 when CodedLossless is equal to 1 and FrameWidth is equal to UpscaledWidth. This indicates that the frame is fully lossless at the upscaled resolution. In this case, the loop filter, CDEF filter, and loop restoration are disabled.

allow_warped_motion equal to 1 indicates that the syntax element motion_mode may be present. allow_warped_motion equal to 0 indicates that the syntax element motion_mode will not be present (this means that LOCALWARP cannot be signaled if allow_warped_motion is equal to 0).

reduced_tx_set equal to 1 specifies that the frame is restricted to a reduced subset of the full set of transform types.

setup_past_independence is a function call that indicates that this frame can be decoded without dependence on previous coded frames. When this function is invoked the following takes place:

  • FeatureData[ i ][ j ] and FeatureEnabled[ i ][ j ] are set equal to 0 for i = 0..MAX_SEGMENTS-1 and j = 0..SEG_LVL_MAX-1.

  • PrevSegmentIds[ row ][ col ] is set equal to 0 for row = 0..MiRows-1 and col = 0..MiCols-1.

  • GmType[ ref ] is set equal to IDENTITY for ref = LAST_FRAME..ALTREF_FRAME.

  • PrevGmParams[ ref ][ i ] is set equal to ( ( i % 3 == 2 ) ? 1 << WARPEDMODEL_PREC_BITS : 0 ) for ref = LAST_FRAME..ALTREF_FRAME, for i = 0..5.

  • loop_filter_delta_enabled is set equal to 1.

  • loop_filter_ref_deltas[ INTRA_FRAME ] is set equal to 1.

  • loop_filter_ref_deltas[ LAST_FRAME ] is set equal to 0.

  • loop_filter_ref_deltas[ LAST2_FRAME ] is set equal to 0.

  • loop_filter_ref_deltas[ LAST3_FRAME ] is set equal to 0.

  • loop_filter_ref_deltas[ BWDREF_FRAME ] is set equal to 0.

  • loop_filter_ref_deltas[ GOLDEN_FRAME ] is set equal to -1.

  • loop_filter_ref_deltas[ ALTREF_FRAME ] is set equal to -1.

  • loop_filter_ref_deltas[ ALTREF2_FRAME ] is set equal to -1.

  • loop_filter_mode_deltas[ i ] is set equal to 0 for i = 0..1.

init_non_coeff_cdfs is a function call that indicates that the CDF tables which are not used in the coeff( ) syntax structure should be initialised. When this function is invoked, the following steps apply:

  • YModeCdf is set to a copy of Default_Y_Mode_Cdf

  • UVModeCflNotAllowedCdf is set to a copy of Default_Uv_Mode_Cfl_Not_Allowed_Cdf

  • UVModeCflAllowedCdf is set to a copy of Default_Uv_Mode_Cfl_Allowed_Cdf

  • AngleDeltaCdf is set to a copy of Default_Angle_Delta_Cdf

  • IntrabcCdf is set to a copy of Default_Intrabc_Cdf

  • PartitionW8Cdf is set to a copy of Default_Partition_W8_Cdf

  • PartitionW16Cdf is set to a copy of Default_Partition_W16_Cdf

  • PartitionW32Cdf is set to a copy of Default_Partition_W32_Cdf

  • PartitionW64Cdf is set to a copy of Default_Partition_W64_Cdf

  • PartitionW128Cdf is set to a copy of Default_Partition_W128_Cdf

  • SegmentIdCdf is set to a copy of Default_Segment_Id_Cdf

  • SegmentIdPredictedCdf is set to a copy of Default_Segment_Id_Predicted_Cdf

  • Tx8x8Cdf is set to a copy of Default_Tx_8x8_Cdf

  • Tx16x16Cdf is set to a copy of Default_Tx_16x16_Cdf

  • Tx32x32Cdf is set to a copy of Default_Tx_32x32_Cdf

  • Tx64x64Cdf is set to a copy of Default_Tx_64x64_Cdf

  • TxfmSplitCdf is set to a copy of Default_Txfm_Split_Cdf

  • FilterIntraModeCdf is set to a copy of Default_Filter_Intra_Mode_Cdf

  • FilterIntraCdf is set to a copy of Default_Filter_Intra_Cdf

  • InterpFilterCdf is set to a copy of Default_Interp_Filter_Cdf

  • MotionModeCdf is set to a copy of Default_Motion_Mode_Cdf

  • NewMvCdf is set to a copy of Default_New_Mv_Cdf

  • ZeroMvCdf is set to a copy of Default_Zero_Mv_Cdf

  • RefMvCdf is set to a copy of Default_Ref_Mv_Cdf

  • CompoundModeCdf is set to a copy of Default_Compound_Mode_Cdf

  • DrlModeCdf is set to a copy of Default_Drl_Mode_Cdf

  • IsInterCdf is set to a copy of Default_Is_Inter_Cdf

  • CompModeCdf is set to a copy of Default_Comp_Mode_Cdf

  • SkipModeCdf is set to a copy of Default_Skip_Mode_Cdf

  • SkipCdf is set to a copy of Default_Skip_Cdf

  • CompRefCdf is set to a copy of Default_Comp_Ref_Cdf

  • CompBwdRefCdf is set to a copy of Default_Comp_Bwd_Ref_Cdf

  • SingleRefCdf is set to a copy of Default_Single_Ref_Cdf

  • MvJointCdf[ i ] is set to a copy of Default_Mv_Joint_Cdf for i = 0..MV_CONTEXTS-1

  • MvClassCdf[ i ] is set to a copy of Default_Mv_Class_Cdf for i = 0..MV_CONTEXTS-1

  • MvClass0BitCdf[ i ][ comp ] is set to a copy of Default_Mv_Class0_Bit_Cdf for i = 0..MV_CONTEXTS-1 and comp = 0..1

  • MvFrCdf[ i ] is set to a copy of Default_Mv_Fr_Cdf for i = 0..MV_CONTEXTS-1

  • MvClass0FrCdf[ i ] is set to a copy of Default_Mv_Class0_Fr_Cdf for i = 0..MV_CONTEXTS-1

  • MvClass0HpCdf[ i ][ comp ] is set to a copy of Default_Mv_Class0_Hp_Cdf for i = 0..MV_CONTEXTS-1 and comp = 0..1

  • MvSignCdf[ i ][ comp ] is set to a copy of Default_Mv_Sign_Cdf for i = 0..MV_CONTEXTS-1 and comp = 0..1

  • MvBitCdf[ i ][ comp ] is set to a copy of Default_Mv_Bit_Cdf for i = 0..MV_CONTEXTS-1 and comp = 0..1

  • MvHpCdf[ i ][ comp ] is set to a copy of Default_Mv_Hp_Cdf for i = 0..MV_CONTEXTS-1 and comp = 0..1

  • PaletteYModeCdf is set to a copy of Default_Palette_Y_Mode_Cdf

  • PaletteUVModeCdf is set to a copy of Default_Palette_Uv_Mode_Cdf

  • PaletteYSizeCdf is set to a copy of Default_Palette_Y_Size_Cdf

  • PaletteUVSizeCdf is set to a copy of Default_Palette_Uv_Size_Cdf

  • PaletteSize2YColorCdf is set to a copy of Default_Palette_Size_2_Y_Color_Cdf

  • PaletteSize2UVColorCdf is set to a copy of Default_Palette_Size_2_Uv_Color_Cdf

  • PaletteSize3YColorCdf is set to a copy of Default_Palette_Size_3_Y_Color_Cdf

  • PaletteSize3UVColorCdf is set to a copy of Default_Palette_Size_3_Uv_Color_Cdf

  • PaletteSize4YColorCdf is set to a copy of Default_Palette_Size_4_Y_Color_Cdf

  • PaletteSize4UVColorCdf is set to a copy of Default_Palette_Size_4_Uv_Color_Cdf

  • PaletteSize5YColorCdf is set to a copy of Default_Palette_Size_5_Y_Color_Cdf

  • PaletteSize5UVColorCdf is set to a copy of Default_Palette_Size_5_Uv_Color_Cdf

  • PaletteSize6YColorCdf is set to a copy of Default_Palette_Size_6_Y_Color_Cdf

  • PaletteSize6UVColorCdf is set to a copy of Default_Palette_Size_6_Uv_Color_Cdf

  • PaletteSize7YColorCdf is set to a copy of Default_Palette_Size_7_Y_Color_Cdf

  • PaletteSize7UVColorCdf is set to a copy of Default_Palette_Size_7_Uv_Color_Cdf

  • PaletteSize8YColorCdf is set to a copy of Default_Palette_Size_8_Y_Color_Cdf

  • PaletteSize8UVColorCdf is set to a copy of Default_Palette_Size_8_Uv_Color_Cdf

  • DeltaQCdf is set to a copy of Default_Delta_Q_Cdf

  • DeltaLFCdf is set to a copy of Default_Delta_Lf_Cdf

  • DeltaLFMultiCdf[ i ] is set to a copy of Default_Delta_Lf_Cdf for i = 0..FRAME_LF_COUNT-1

  • IntraTxTypeSet1Cdf is set to a copy of Default_Intra_Tx_Type_Set1_Cdf

  • IntraTxTypeSet2Cdf is set to a copy of Default_Intra_Tx_Type_Set2_Cdf

  • InterTxTypeSet1Cdf is set to a copy of Default_Inter_Tx_Type_Set1_Cdf

  • InterTxTypeSet2Cdf is set to a copy of Default_Inter_Tx_Type_Set2_Cdf

  • InterTxTypeSet3Cdf is set to a copy of Default_Inter_Tx_Type_Set3_Cdf

  • UseObmcCdf is set to a copy of Default_Use_Obmc_Cdf

  • InterIntraCdf is set to a copy of Default_Inter_Intra_Cdf

  • CompRefTypeCdf is set to a copy of Default_Comp_Ref_Type_Cdf

  • CflSignCdf is set to a copy of Default_Cfl_Sign_Cdf

  • UniCompRefCdf is set to a copy of Default_Uni_Comp_Ref_Cdf

  • WedgeInterIntraCdf is set to a copy of Default_Wedge_Inter_Intra_Cdf

  • CompGroupIdxCdf is set to a copy of Default_Comp_Group_Idx_Cdf

  • CompoundIdxCdf is set to a copy of Default_Compound_Idx_Cdf

  • CompoundTypeCdf is set to a copy of Default_Compound_Type_Cdf

  • InterIntraModeCdf is set to a copy of Default_Inter_Intra_Mode_Cdf

  • WedgeIndexCdf is set to a copy of Default_Wedge_Index_Cdf

  • CflAlphaCdf is set to a copy of Default_Cfl_Alpha_Cdf

  • UseWienerCdf is set to a copy of Default_Use_Wiener_Cdf

  • UseSgrprojCdf is set to a copy of Default_Use_Sgrproj_Cdf

  • RestorationTypeCdf is set to a copy of Default_Restoration_Type_Cdf

init_coeff_cdfs( ) is a function call that indicates that the CDF tables used in the coeff( ) syntax structure should be initialised. When this function is invoked, the following steps apply:

  • The variable idx is derived as follows:

    • If base_q_idx is less than or equal to 20, idx is set equal to 0.

    • Otherwise, if base_q_idx is less than or equal to 60, idx is set equal to 1.

    • Otherwise, if base_q_idx is less than or equal to 120, idx is set equal to 2.

    • Otherwise, idx is set equal to 3.

  • The cumulative distribution function arrays are reset to default values as follows:

    • TxbSkipCdf is set to a copy of Default_Txb_Skip_Cdf[ idx ].

    • EobPt16Cdf is set to a copy of Default_Eob_Pt_16_Cdf[ idx ].

    • EobPt32Cdf is set to a copy of Default_Eob_Pt_32_Cdf[ idx ].

    • EobPt64Cdf is set to a copy of Default_Eob_Pt_64_Cdf[ idx ].

    • EobPt128Cdf is set to a copy of Default_Eob_Pt_128_Cdf[ idx ].

    • EobPt256Cdf is set to a copy of Default_Eob_Pt_256_Cdf[ idx ].

    • EobPt512Cdf is set to a copy of Default_Eob_Pt_512_Cdf[ idx ].

    • EobPt1024Cdf is set to a copy of Default_Eob_Pt_1024_Cdf[ idx ].

    • EobExtraCdf is set to a copy of Default_Eob_Extra_Cdf[ idx ].

    • DcSignCdf is set to a copy of Default_Dc_Sign_Cdf[ idx ].

    • CoeffBaseEobCdf is set to a copy of Default_Coeff_Base_Eob_Cdf[ idx ].

    • CoeffBaseCdf is set to a copy of Default_Coeff_Base_Cdf[ idx ].

    • CoeffBrCdf is set to a copy of Default_Coeff_Br_Cdf[ idx ].
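The quantizer-dependent selection of idx at the start of init_coeff_cdfs can be sketched as a small function (the thresholds 20, 60, and 120 come directly from the steps above; the function name is hypothetical):

```python
def coeff_cdf_q_ctx(base_q_idx):
    """Map base_q_idx to the index used to select the default
    coefficient CDF tables."""
    if base_q_idx <= 20:
        return 0
    elif base_q_idx <= 60:
        return 1
    elif base_q_idx <= 120:
        return 2
    else:
        return 3
```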

load_cdfs( ctx ) is a function call that indicates that the CDF tables are loaded from frame context number ctx in the range 0 to (NUM_REF_FRAMES - 1). When this function is invoked, a copy of each CDF array mentioned in the semantics for init_coeff_cdfs and init_non_coeff_cdfs is loaded from an area of memory indexed by ctx. (The memory contents of these frame contexts have been initialized by previous calls to save_cdfs). Once the CDF arrays have been loaded, the last entry in each array, representing the symbol count for that context, is set to 0.

load_previous( ) is a function call that indicates that information from a previous frame may be loaded for use in decoding the current frame. When this function is invoked the following ordered steps apply:

  1. The variable prevFrame is set equal to ref_frame_idx[ primary_ref_frame ].

  2. PrevGmParams is set equal to SavedGmParams[ prevFrame ].

  3. The function load_loop_filter_params( prevFrame ) specified in section 7.21 is invoked.

  4. The function load_segmentation_params( prevFrame ) specified in section 7.21 is invoked.

load_previous_segment_ids( ) is a function call that indicates that a segment map from a previous frame may be loaded for use in decoding the current frame. When this function is invoked the segment map contained in PrevSegmentIds is set as follows:

  1. The variable prevFrame is set equal to ref_frame_idx[ primary_ref_frame ].

  2. If segmentation_enabled is equal to 1, RefMiCols[ prevFrame ] is equal to MiCols, and RefMiRows[ prevFrame ] is equal to MiRows, PrevSegmentIds[ row ][ col ] is set equal to SavedSegmentIds[ prevFrame ][ row ][ col ] for row = 0..MiRows-1, for col = 0..MiCols-1.

    Otherwise, PrevSegmentIds[ row ][ col ] is set equal to 0 for row = 0..MiRows-1, for col = 0..MiCols-1.

Reference frame marking semantics

RefValid is an array which is indexed by a reference picture slot number. A value of 1 in the array signifies that the corresponding reference picture slot is valid for use as a reference picture, while a value of 0 signifies that the corresponding reference picture slot is not valid for use as a reference picture.

Note: RefValid is only used to define valid bitstreams when frame_id_numbers_present_flag is equal to 1. Frames are marked as invalid when they are too far in the past to be referenced by the frame id mechanism.

Frame size semantics

frame_width_minus_1 plus one is the width of the frame in luma samples.

frame_height_minus_1 plus one is the height of the frame in luma samples.

It is a requirement of bitstream conformance that frame_width_minus_1 is less than or equal to max_frame_width_minus_1.

It is a requirement of bitstream conformance that frame_height_minus_1 is less than or equal to max_frame_height_minus_1.

If FrameIsIntra is equal to 0 (indicating that this frame may use inter prediction), the requirements described in the frame size with refs semantics of section 6.8.6 must also be satisfied.

Render size semantics

The render size is provided as a hint to the application about the desired display size. It has no effect on the decoding process.

render_and_frame_size_different equal to 0 means that the render width and height are inferred from the frame width and height. render_and_frame_size_different equal to 1 means that the render width and height are explicitly coded.

Note: It is allowed for the bitstream to explicitly code the render dimensions in the bitstream even if they are an exact match for the frame dimensions.

render_width_minus_1 plus one is the render width of the frame in luma samples.

render_height_minus_1 plus one is the render height of the frame in luma samples.

Frame size with refs semantics

For inter frames, the frame size is either set equal to the size of a reference frame or is sent explicitly.

found_ref equal to 1 indicates that the frame dimensions can be inferred from reference frame i where i is the loop counter in the syntax parsing process for frame_size_with_refs. found_ref equal to 0 indicates that the frame dimensions are not inferred from reference frame i.

Once the FrameWidth and FrameHeight have been computed for an inter frame, it is a requirement of bitstream conformance that for all values of i in the range 0..(REFS_PER_FRAME - 1), all the following conditions are true:

  • 2 * FrameWidth >= RefUpscaledWidth[ ref_frame_idx[ i ] ]
  • 2 * FrameHeight >= RefFrameHeight[ ref_frame_idx[ i ] ]
  • FrameWidth <= 16 * RefUpscaledWidth[ ref_frame_idx[ i ] ]
  • FrameHeight <= 16 * RefFrameHeight[ ref_frame_idx[ i ] ]

Note: This is a requirement even if all the blocks in an inter frame are coded using intra prediction.
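The four conditions above can be checked together as sketched below (illustrative only; the function and array parameter names are hypothetical stand-ins for RefUpscaledWidth, RefFrameHeight, and ref_frame_idx):

```python
def ref_sizes_conformant(frame_width, frame_height,
                         ref_upscaled_widths, ref_frame_heights,
                         ref_frame_idx):
    """Check that every reference is at most 2x larger and at most
    16x smaller than the current frame in each dimension."""
    for i in range(len(ref_frame_idx)):
        rw = ref_upscaled_widths[ref_frame_idx[i]]
        rh = ref_frame_heights[ref_frame_idx[i]]
        if not (2 * frame_width >= rw and
                2 * frame_height >= rh and
                frame_width <= 16 * rw and
                frame_height <= 16 * rh):
            return False
    return True
```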

Superres params semantics

use_superres equal to 0 indicates that no upscaling is needed. use_superres equal to 1 indicates that upscaling is needed.

coded_denom is used to compute the amount of upscaling.

SuperresDenom is the denominator of a fraction that specifies the ratio between the superblock width before and after upscaling. The numerator of this fraction is equal to the constant SUPERRES_NUM.
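As a sketch of how this ratio is applied (assuming the constants SUPERRES_NUM = 8 and SUPERRES_DENOM_MIN = 9 defined elsewhere in this specification; the normative derivation is in the superres params syntax):

```python
SUPERRES_NUM = 8        # numerator of the upscaling ratio (spec constant)
SUPERRES_DENOM_MIN = 9  # smallest denominator that can be coded

def downscaled_width(upscaled_width, coded_denom):
    """Compute the coded (pre-upscaling) frame width from the upscaled
    width, with rounding to nearest."""
    superres_denom = coded_denom + SUPERRES_DENOM_MIN
    return ((upscaled_width * SUPERRES_NUM + superres_denom // 2)
            // superres_denom)

# coded_denom == 7 gives SuperresDenom == 16, i.e. 2x horizontal downscaling:
# downscaled_width(1920, 7) -> 960
```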

Compute image size semantics

MiCols is the number of 4x4 block columns in the frame.

MiRows is the number of 4x4 block rows in the frame.
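The derivation used by the compute image size process rounds each dimension up to a multiple of 8 luma samples before counting 4x4 units; a sketch (non-normative, hypothetical function name):

```python
def mi_dimensions(frame_width, frame_height):
    """Compute MiCols and MiRows: each dimension is rounded up to a
    multiple of 8 luma samples, then expressed in 4x4 units."""
    mi_cols = 2 * ((frame_width + 7) >> 3)
    mi_rows = 2 * ((frame_height + 7) >> 3)
    return mi_cols, mi_rows

# A 1920x1080 frame is 480 columns by 270 rows of 4x4 units.
```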

Interpolation filter semantics

is_filter_switchable equal to 1 indicates that the filter selection is signaled at the block level; is_filter_switchable equal to 0 indicates that the filter selection is signaled at the frame level.

interpolation_filter specifies the filter selection used for performing inter prediction:

interpolation_filter   Name of interpolation_filter
0                      EIGHTTAP
1                      EIGHTTAP_SMOOTH
2                      EIGHTTAP_SHARP
3                      BILINEAR
4                      SWITCHABLE

Loop filter semantics

loop_filter_level is an array containing loop filter strength values. Different loop filter strength values from the array are used depending on the image plane being filtered, and the edge direction (vertical or horizontal) being filtered.

loop_filter_sharpness indicates the sharpness level. The loop_filter_level and loop_filter_sharpness together determine when a block edge is filtered, and by how much the filtering can change the sample values.

The loop filter process is described in section 7.14.

loop_filter_delta_enabled equal to 1 means that the filter level depends on the mode and reference frame used to predict a block. loop_filter_delta_enabled equal to 0 means that the filter level does not depend on the mode and reference frame.

loop_filter_delta_update equal to 1 means that additional syntax elements are present that specify which mode and reference frame deltas are to be updated. loop_filter_delta_update equal to 0 means that these syntax elements are not present.

update_ref_delta equal to 1 means that the syntax element loop_filter_ref_delta is present; update_ref_delta equal to 0 means that this syntax element is not present.

loop_filter_ref_deltas contains the adjustment needed for the filter level based on the chosen reference frame. If this syntax element is not present, it maintains its previous value.

update_mode_delta equal to 1 means that the syntax element loop_filter_mode_deltas is present; update_mode_delta equal to 0 means that this syntax element is not present.

loop_filter_mode_deltas contains the adjustment needed for the filter level based on the chosen mode. If this syntax element is not present, it maintains its previous value.

Note: The previous values for loop_filter_mode_deltas and loop_filter_ref_deltas are initially set by the setup_past_independence function and can be subsequently modified by these syntax elements being coded in a previous frame.

Quantization params semantics

The residual is specified via decoded coefficients which are adjusted by one of four quantization parameters before the inverse transform is applied. The choice depends on the plane (Y or UV) and coefficient position (DC/AC coefficient). The dequantization process is specified in section 7.12.

base_q_idx indicates the base frame qindex. This is used for Y AC coefficients and as the base value for the other quantizers.

DeltaQYDc indicates the Y DC quantizer relative to base_q_idx.

diff_uv_delta equal to 1 indicates that the U and V delta quantizer values are coded separately. diff_uv_delta equal to 0 indicates that the U and V delta quantizer values share a common value.

DeltaQUDc indicates the U DC quantizer relative to base_q_idx.

DeltaQUAc indicates the U AC quantizer relative to base_q_idx.

DeltaQVDc indicates the V DC quantizer relative to base_q_idx.

DeltaQVAc indicates the V AC quantizer relative to base_q_idx.
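The per-plane, per-position selection of a quantizer index described above can be sketched as follows. This is an illustrative helper, not the normative derivation (which applies its clipping inside the dequantization process); the function name and the deltas dictionary are hypothetical:

```python
def plane_qindex(plane, is_dc, base_q_idx, deltas):
    """Pick the quantizer index for a coefficient. plane is 0 (Y), 1 (U)
    or 2 (V); deltas holds DeltaQYDc, DeltaQUDc, DeltaQUAc, DeltaQVDc and
    DeltaQVAc. Y AC coefficients use base_q_idx directly."""
    key = {
        (0, True): "DeltaQYDc", (0, False): None,
        (1, True): "DeltaQUDc", (1, False): "DeltaQUAc",
        (2, True): "DeltaQVDc", (2, False): "DeltaQVAc",
    }[(plane, is_dc)]
    delta = deltas[key] if key else 0
    return max(0, min(255, base_q_idx + delta))  # clip to the valid range
```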

using_qmatrix specifies that the quantizer matrix will be used to compute quantizers.

qm_y specifies the level in the quantizer matrix that should be used for luma plane decoding.

qm_u specifies the level in the quantizer matrix that should be used for chroma U plane decoding.

qm_v specifies the level in the quantizer matrix that should be used for chroma V plane decoding.

Delta quantizer semantics

delta_coded specifies that the delta_q syntax element is present.

delta_q specifies an offset (relative to base_q_idx) for a particular quantization parameter.

Segmentation params semantics

AV1 provides a means of segmenting the image and then applying various adjustments at the segment level.

Up to 8 segments may be specified for any given frame. For each of these segments it is possible to specify:

  1. A quantizer (absolute value or delta).
  2. A loop filter strength (absolute value or delta).
  3. A prediction reference frame.
  4. A block skip mode that implies both the use of a (0,0) motion vector and that no residual will be coded.

Each of these data values for each segment may be individually updated at the frame level. Where a value is not updated in a given frame, the value from the previous frame persists. The exceptions to this are key frames, intra only frames or other frames where independence from past frame values is required (for example to enable error resilience). In such cases all values are reset as described in the semantics for setup_past_independence.

The segment affiliation (the segmentation map) is stored at the resolution of 4x4 blocks. If no explicit update is coded for a block’s segment affiliation, then it persists from frame to frame (until reset by a call to setup_past_independence).

SegIdPreSkip equal to 1 indicates that the segment id will be read before the skip syntax element. SegIdPreSkip equal to 0 indicates that the skip syntax element will be read first.

LastActiveSegId indicates the highest numbered segment id that has some enabled feature. This is used when decoding the segment id to only decode choices corresponding to used segments.

segmentation_enabled equal to 1 indicates that this frame makes use of the segmentation tool; segmentation_enabled equal to 0 indicates that the frame does not use segmentation.

segmentation_update_map equal to 1 indicates that the segmentation map is updated during the decoding of this frame. segmentation_update_map equal to 0 means that the segmentation map from the previous frame is used.

segmentation_temporal_update equal to 1 indicates that the updates to the segmentation map are coded relative to the existing segmentation map. segmentation_temporal_update equal to 0 indicates that the new segmentation map is coded without reference to the existing segmentation map.

segmentation_update_data equal to 1 indicates that new parameters are about to be specified for each segment. segmentation_update_data equal to 0 indicates that the segmentation parameters should keep their existing values.

feature_enabled equal to 0 indicates that the corresponding feature is unused and has value equal to 0. feature_enabled equal to 1 indicates that the feature value is coded.

feature_value specifies the feature data for a segment feature.

Tile info semantics

uniform_tile_spacing_flag equal to 1 means that the tiles are uniformly spaced across the frame. (In other words, all tiles are the same size except for the ones at the right and bottom edge which can be smaller.) uniform_tile_spacing_flag equal to 0 means that the tile sizes are coded.

increment_tile_cols_log2 is used to compute TileColsLog2.

TileColsLog2 specifies the base 2 logarithm of the desired number of tiles across the frame.

TileCols specifies the number of tiles across the frame. It is a requirement of bitstream conformance that TileCols is less than or equal to MAX_TILE_COLS.

increment_tile_rows_log2 is used to compute TileRowsLog2.

TileRowsLog2 specifies the base 2 logarithm of the desired number of tiles down the frame.

Note: For small frame sizes the actual number of tiles in the frame may be smaller than the desired number because the tile size is rounded up to a multiple of the maximum superblock size.

TileRows specifies the number of tiles down the frame. It is a requirement of bitstream conformance that TileRows is less than or equal to MAX_TILE_ROWS.

tileWidthSb is used to specify the width of each tile in units of superblocks. It is a requirement of bitstream conformance that tileWidthSb is less than maxTileWidthSb.

tileHeightSb is used to specify the height of each tile in units of superblocks. It is a requirement of bitstream conformance that tileWidthSb * tileHeightSb is less than maxTileAreaSb.

If uniform_tile_spacing_flag is equal to 0, it is a requirement of bitstream conformance that startSb is equal to sbCols when the loop writing MiColStarts exits.

If uniform_tile_spacing_flag is equal to 0, it is a requirement of bitstream conformance that startSb is equal to sbRows when the loop writing MiRowStarts exits.

Note: The requirements on startSb ensure that the sizes of each tile add up to the full size of the frame when measured in superblocks.
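For the uniform spacing case, the tile column start positions can be sketched as below. This is a hedged reconstruction of the tile info derivation (the normative version lives in the tile_info syntax); sbShift converts superblock units to 4x4 units (5 for 128x128 superblocks, 4 for 64x64):

```python
def uniform_tile_col_starts(sb_cols, tile_cols_log2, sb_shift):
    """Compute MiColStarts-style entries for uniformly spaced tiles:
    the tile width in superblocks is sbCols divided by the tile count,
    rounded up, and starts step by that width until the frame edge."""
    tile_width_sb = (sb_cols + (1 << tile_cols_log2) - 1) >> tile_cols_log2
    starts = []
    start_sb = 0
    while start_sb < sb_cols:
        starts.append(start_sb << sb_shift)  # start column in 4x4 units
        start_sb += tile_width_sb
    return starts

# 30 superblock columns split into 4 tiles (tile_cols_log2 == 2) gives
# tiles of 8, 8, 8 and 6 superblocks: the last tile is smaller.
```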

MiColStarts is an array specifying the start column (in units of 4x4 luma samples) for each tile across the image.

MiRowStarts is an array specifying the start row (in units of 4x4 luma samples) for each tile down the image.

width_in_sbs_minus_1 specifies the width of a tile minus 1 in units of superblocks.

height_in_sbs_minus_1 specifies the height of a tile minus 1 in units of superblocks.

maxTileHeightSb specifies the maximum height (in units of superblocks) that can be used for a tile (to avoid making tiles with too much area).

context_update_tile_id specifies which tile to use for the CDF update. It is a requirement of bitstream conformance that context_update_tile_id is less than TileCols * TileRows.

tile_size_bytes_minus_1 is used to compute TileSizeBytes.

TileSizeBytes specifies the number of bytes needed to code each tile size.

Quantizer index delta parameters semantics

delta_q_present specifies whether quantizer index delta values are present.

delta_q_res specifies the left shift which should be applied to decoded quantizer index delta values.

Loop filter delta parameters semantics

delta_lf_present specifies whether loop filter delta values are present.

delta_lf_res specifies the left shift which should be applied to decoded loop filter delta values.

delta_lf_multi equal to 1 specifies that separate loop filter deltas are sent for horizontal luma edges, vertical luma edges, the U edges, and the V edges. delta_lf_multi equal to 0 specifies that the same loop filter delta is used for all edges.

Global motion params semantics

is_global specifies whether global motion parameters are present for a particular reference frame.

is_rot_zoom specifies whether a particular reference frame uses rotation and zoom global motion.

is_translation specifies whether a particular reference frame uses translation global motion.

Global param semantics

absBits is used to compute the range of values that can be used for gm_params[ref][idx]. The values allowed are in the range -(1 << absBits) to (1 << absBits).

precBits specifies the number of fractional bits used for representing gm_params[ref][idx]. All global motion parameters are stored in the model with WARPEDMODEL_PREC_BITS fractional bits, but the parameters are encoded with less precision.

Decode subexp semantics

subexp_final_bits provide the final bits that are read once the appropriate range has been determined.

subexp_more_bits equal to 0 specifies that the parameter is in the range mk to mk+a-1. subexp_more_bits equal to 1 specifies that the parameter is greater than mk+a-1.

subexp_bits specifies the value of the parameter minus mk.
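The interplay of these three syntax elements can be sketched as a subexponential decode loop. This is a hedged reconstruction of the decode_subexp syntax structure; read_bit, read_literal, and read_ns stand in for the bitstream reading functions f(1), L(n), and ns(n):

```python
def decode_subexp(num_syms, read_bit, read_literal, read_ns):
    """Decode a value in [0, num_syms) using a subexponential code:
    each subexp_more_bits == 1 doubles the range searched (after the
    first step); subexp_bits or subexp_final_bits supply the offset."""
    i = 0
    mk = 0   # lower bound of the current range
    k = 3
    while True:
        b2 = k + i - 1 if i else k
        a = 1 << b2
        if num_syms <= mk + 3 * a:
            # Remaining range is small: read the final value directly.
            return read_ns(num_syms - mk) + mk
        if read_bit():   # subexp_more_bits: value is beyond mk + a - 1
            i += 1
            mk += a
        else:            # subexp_bits: value lies in [mk, mk + a - 1]
            return read_literal(b2) + mk
```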

Film grain params semantics

apply_grain equal to 1 specifies that film grain should be added to this frame. apply_grain equal to 0 specifies that film grain should not be added.

reset_grain_params() is a function call that indicates that all the syntax elements read in film_grain_params should be set equal to 0.

grain_seed specifies the starting value for the pseudo-random numbers used during film grain synthesis.

update_grain equal to 1 means that a new set of parameters should be sent. update_grain equal to 0 means that the previous set of parameters should be used.

film_grain_params_ref_idx indicates which reference frame contains the film grain parameters to be used for this frame.

It is a requirement of bitstream conformance that film_grain_params_ref_idx is equal to ref_frame_idx[ j ] for some value of j in the range 0 to REFS_PER_FRAME - 1.

Note: This requirement means that film grain can only be predicted from the frames that the current frame is using as reference frames.

load_grain_params(idx) is a function call that indicates that all the syntax elements read in film_grain_params should be set equal to the values stored in an area of memory indexed by idx.

tempGrainSeed is a temporary variable that is used to avoid losing the value of grain_seed when load_grain_params is called. When update_grain is equal to 0, a previous set of parameters should be used for everything except grain_seed.

num_y_points specifies the number of points for the piece-wise linear scaling function of the luma component.

It is a requirement of bitstream conformance that num_y_points is less than or equal to 14.

point_y_value[ i ] represents the x (luma value) coordinate for the i-th point of the piecewise linear scaling function for luma component. The values are signaled on the scale of 0..255. (In case of 10 bit video, these values correspond to luma values divided by 4. In case of 12 bit video, these values correspond to luma values divided by 16.)

If i is greater than 0, it is a requirement of bitstream conformance that point_y_value[ i ] is greater than point_y_value[ i - 1 ] (this ensures the x coordinates are specified in increasing order).

point_y_scaling[ i ] represents the scaling (output) value for the i-th point of the piecewise linear scaling function for luma component.
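
Conceptually, the points define a piecewise-linear map from sample value to grain scaling. The normative lookup table is built in the film grain synthesis process using fixed-point arithmetic; this simplified, non-normative sketch just shows the shape (linear interpolation between points, constant outside the endpoints).

```python
def scaling_function(point_values, point_scalings, x):
    """Evaluate a piecewise-linear scaling function at sample value x (0..255).

    Illustrative only: integer linear interpolation between consecutive
    points, constant extrapolation outside the first/last point.
    """
    n = len(point_values)
    if n == 0:
        return 0
    if x <= point_values[0]:
        return point_scalings[0]
    if x >= point_values[-1]:
        return point_scalings[-1]
    for i in range(n - 1):
        x0, x1 = point_values[i], point_values[i + 1]
        if x0 <= x < x1:
            y0, y1 = point_scalings[i], point_scalings[i + 1]
            return y0 + (y1 - y0) * (x - x0) // (x1 - x0)
```

The conformance requirement that the x coordinates are strictly increasing is exactly what makes each x fall into a unique segment.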

chroma_scaling_from_luma specifies that the chroma scaling is inferred from the luma scaling.

num_cb_points specifies the number of points for the piece-wise linear scaling function of the cb component.

It is a requirement of bitstream conformance that num_cb_points is less than or equal to 10.

Note: When chroma_scaling_from_luma is equal to 1, it is still allowed for num_y_points to take values up to 14. This means that the chroma scaling also needs to support up to 14 points.

point_cb_value[ i ] represents the x coordinate for the i-th point of the piece-wise linear scaling function for cb component. The values are signaled on the scale of 0..255.

If i is greater than 0, it is a requirement of bitstream conformance that point_cb_value[ i ] is greater than point_cb_value[ i - 1 ].

point_cb_scaling[ i ] represents the scaling (output) value for the i-th point of the piecewise linear scaling function for cb component.

num_cr_points specifies the number of points for the piecewise linear scaling function of the cr component.

It is a requirement of bitstream conformance that num_cr_points is less than or equal to 10.

If subsampling_x is equal to 1 and subsampling_y is equal to 1 and num_cb_points is equal to 0, it is a requirement of bitstream conformance that num_cr_points is equal to 0.

If subsampling_x is equal to 1 and subsampling_y is equal to 1 and num_cb_points is not equal to 0, it is a requirement of bitstream conformance that num_cr_points is not equal to 0.

Note: These requirements ensure that for 4:2:0 chroma subsampling, film grain noise will be applied to both chroma components, or to neither. There is no restriction for 4:2:2 or 4:4:4 chroma subsampling.

point_cr_value[ i ] represents the x coordinate for the i-th point of the piece-wise linear scaling function for cr component. The values are signaled on the scale of 0..255.

If i is greater than 0, it is a requirement of bitstream conformance that point_cr_value[ i ] is greater than point_cr_value[ i - 1 ].

point_cr_scaling[ i ] represents the scaling (output) value for the i-th point of the piecewise linear scaling function for cr component.

grain_scaling_minus_8 represents the shift, minus 8, applied to the values of the chroma component. grain_scaling_minus_8 can take values of 0..3 and determines the range and quantization step of the standard deviation of film grain.

ar_coeff_lag specifies the number of auto-regressive coefficients for luma and chroma.

ar_coeffs_y_plus_128[ i ] specifies auto-regressive coefficients used for the Y plane.

ar_coeffs_cb_plus_128[ i ] specifies auto-regressive coefficients used for the U plane.

ar_coeffs_cr_plus_128[ i ] specifies auto-regressive coefficients used for the V plane.

ar_coeff_shift_minus_6 specifies the range of the auto-regressive coefficients. Values of 0, 1, 2, and 3 correspond to the ranges for auto-regressive coefficients of [-2, 2), [-1, 1), [-0.5, 0.5) and [-0.25, 0.25) respectively.
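
The quoted ranges follow from the 8-bit coding of the coefficients around an offset of 128. A non-normative sketch (the decoder keeps the coefficients in integer form and applies the shift during filtering; the floating-point division here is purely illustrative):

```python
def ar_coeff_value(coeff_plus_128, ar_coeff_shift_minus_6):
    """Map a coded auto-regressive coefficient to its effective value.

    The coded field is an 8-bit offset around 128; the shift selects how
    many fractional bits it carries. With a shift of 6 the coded range
    -128..127 maps to [-2, 2), with a shift of 9 to [-0.25, 0.25).
    """
    shift = ar_coeff_shift_minus_6 + 6
    return (coeff_plus_128 - 128) / (1 << shift)
```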

grain_scale_shift specifies how much the Gaussian random numbers should be scaled down during the grain synthesis process.

cb_mult represents a multiplier for the cb component used in derivation of the input index to the cb component scaling function.

cb_luma_mult represents a multiplier for the average luma component used in derivation of the input index to the cb component scaling function.

cb_offset represents an offset used in derivation of the input index to the cb component scaling function.

cr_mult represents a multiplier for the cr component used in derivation of the input index to the cr component scaling function.

cr_luma_mult represents a multiplier for the average luma component used in derivation of the input index to the cr component scaling function.

cr_offset represents an offset used in derivation of the input index to the cr component scaling function.

overlap_flag equal to 1 indicates that the overlap between film grain blocks shall be applied. overlap_flag equal to 0 indicates that the overlap between film grain blocks shall not be applied.

clip_to_restricted_range equal to 1 indicates that clipping to the restricted (studio) range shall be applied to the sample values after adding the film grain (see the semantics for color_range for an explanation of studio swing). clip_to_restricted_range equal to 0 indicates that clipping to the full range shall be applied to the sample values after adding the film grain.

TX mode semantics

tx_mode_select is used to compute TxMode.

TxMode specifies how the transform size is determined:

TxMode Name of TxMode
0 ONLY_4X4
1 TX_MODE_LARGEST
2 TX_MODE_SELECT

For TxMode equal to ONLY_4X4, the inverse transform will use only 4x4 transforms.

For TxMode equal to TX_MODE_LARGEST, the inverse transform will use the largest transform size that fits inside the block.

For TxMode equal to TX_MODE_SELECT, the choice of transform size is specified explicitly for each block.

Skip mode params semantics

SkipModeFrame[ list ] specifies the frames to use for compound prediction when skip_mode is equal to 1.

skip_mode_present equal to 1 specifies that the syntax element skip_mode will be present. skip_mode_present equal to 0 specifies that skip_mode will not be used for this frame.

Note: Skip mode tries to use the closest forward and backward references (as measured by values in the RefOrderHint array). If no backward reference is found, then the second closest forward reference is used. If no forward reference is found, then skip mode is disabled. (Forward prediction uses a reference frame that is considered to be output before the current frame; backward prediction uses a reference frame that has not yet been output.)
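
The selection described in the note can be sketched as follows. This non-normative model ignores the order hint wraparound handled by get_relative_dist in the decoding process (hint values are assumed already monotonic), and the ordering of the returned pair is illustrative.

```python
def choose_skip_mode_frames(cur_hint, ref_hints):
    """Sketch of skip mode reference selection.

    Returns a pair of indices into ref_hints for the two references to
    use for compound prediction, or None if skip mode is disabled.
    """
    forward = [(h, i) for i, h in enumerate(ref_hints) if h < cur_hint]
    backward = [(h, i) for i, h in enumerate(ref_hints) if h > cur_hint]
    if not forward:
        return None                    # no forward reference: disabled
    nearest_fwd = max(forward)         # closest forward reference
    if backward:
        nearest_bwd = min(backward)    # closest backward reference
        return (nearest_fwd[1], nearest_bwd[1])
    rest = [f for f in forward if f != nearest_fwd]
    if rest:
        second_fwd = max(rest)         # second closest forward reference
        return (second_fwd[1], nearest_fwd[1])
    return None                        # fewer than two candidates: disabled
```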

Frame reference mode semantics

reference_select equal to 1 specifies that the mode info for inter blocks contains the syntax element comp_mode that indicates whether to use single or compound reference prediction. reference_select equal to 0 specifies that all inter blocks will use single prediction.

Temporal point info semantics

frame_presentation_time specifies the presentation time of the frame in clock ticks DispCT counted from the removal time of the last random access point for the operating point that is being decoded. The syntax element is signaled as a fixed length unsigned integer with a length in bits given by frame_presentation_time_length_minus_1 + 1. The frame_presentation_time is the remainder of a modulo 1 << (frame_presentation_time_length_minus_1 + 1) counter.

Frame OBU semantics

A frame OBU consists of a frame header OBU and a tile group OBU packed into a single OBU.

Note: The intention is to provide a more compact way of coding the common use case where the frame header is immediately followed by tile group data.

Tile group OBU semantics

General tile group OBU semantics

NumTiles specifies the total number of tiles in the frame.

tile_start_and_end_present_flag specifies whether tg_start and tg_end are present. If tg_start and tg_end are not present, this tile group covers the entire frame.

If obu_type is equal to OBU_FRAME, it is a requirement of bitstream conformance that the value of tile_start_and_end_present_flag is equal to 0.

tg_start specifies the zero-based index of the first tile in the current tile group.

It is a requirement of bitstream conformance that the value of tg_start is equal to the value of TileNum at the point that tile_group_obu is invoked.

tg_end specifies the zero-based index of the last tile in the current tile group.

It is a requirement of bitstream conformance that the value of tg_end is greater than or equal to tg_start.

It is a requirement of bitstream conformance that the value of tg_end for the last tile group in each frame is equal to NumTiles - 1.

Note: These requirements ensure that conceptually all tile groups are present and received in order for the purposes of specifying the decode process.

frame_end_update_cdf is a function call that indicates that the frame CDF arrays are set equal to the saved CDFs. This process is described in section 7.7.

tile_size_minus_1 is used to compute tileSize.

tileSize specifies the size in bytes of the next coded tile.

Note: This size includes any padding bytes if added by the exit process for the Symbol decoder. The size does not include the bytes used for tile_size_minus_1 or syntax elements sent before tile_size_minus_1. For the last tile in the tile group, tileSize is computed instead of being read and includes the OBU trailing bits.
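
A non-normative sketch of how a tile group payload is carved into tiles, assuming tile_size_bytes stands in for the TileSizeBytes value derived from the frame header:

```python
def split_tiles(payload, tg_start, tg_end, tile_size_bytes):
    """Sketch of extracting tile payloads from a tile group.

    Each tile except the last is preceded by a little-endian
    tile_size_minus_1 field of tile_size_bytes bytes; the size of the
    last tile is computed from the remaining bytes instead of being read.
    """
    tiles = []
    pos = 0
    for tile_num in range(tg_start, tg_end + 1):
        if tile_num == tg_end:
            tile_size = len(payload) - pos           # computed, not read
        else:
            raw = int.from_bytes(payload[pos:pos + tile_size_bytes], "little")
            pos += tile_size_bytes
            tile_size = raw + 1                      # tile_size_minus_1
        tiles.append(payload[pos:pos + tile_size])
        pos += tile_size
    return tiles
```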

decode_frame_wrapup is a function call that indicates that the decode frame wrapup process specified in section 7.4 should be invoked.

Decode tile semantics

clear_left_context is a function call that indicates that some arrays used to determine the probabilities are zeroed. When this function is invoked the arrays LeftLevelContext, LeftDcContext, and LeftSegPredContext are set equal to 0.

Note: LeftLevelContext[ plane ][ i ], LeftDcContext[ plane ][ i ], and LeftSegPredContext[ i ] need to be set to 0 for i = 0..MiRows-1, for plane = 0..2.

clear_above_context is a function call that indicates that some arrays used to determine the probabilities are zeroed. When this function is invoked the arrays AboveLevelContext, AboveDcContext, and AboveSegPredContext are set equal to 0.

Note: AboveLevelContext[ plane ][ i ], AboveDcContext[ plane ][ i ], and AboveSegPredContext[ i ] need to be set to 0 for i = 0..MiCols-1, for plane = 0..2.

ReadDeltas specifies whether the current block may read delta values for the quantizer index and loop filter. If the entire superblock is skipped the delta values are not read, otherwise delta values for the quantizer index and loop filter are read on the first block of a superblock. If delta_q_present is equal to 0, no delta values are read for the quantizer index. If delta_lf_present is equal to 0, no delta values are read for the loop filter.

Clear block decoded flags semantics

BlockDecoded is an array which stores one boolean value per 4x4 sample block per plane in the current superblock, plus a border of one 4x4 sample block on all sides of the superblock. Except for the borders, a value of 1 in BlockDecoded indicates that the corresponding 4x4 sample block has been decoded. The borders are used when computing above-right and below-left availability along the top and left edges of the superblock.

Decode partition semantics

partition specifies how a block is partitioned:

partition Name of partition
0 PARTITION_NONE
1 PARTITION_HORZ
2 PARTITION_VERT
3 PARTITION_SPLIT
4 PARTITION_HORZ_A
5 PARTITION_HORZ_B
6 PARTITION_VERT_A
7 PARTITION_VERT_B
8 PARTITION_HORZ_4
9 PARTITION_VERT_4

The variable subSize is computed from partition and indicates the size of the component blocks within this block:

subSize Name of subSize
0 BLOCK_4X4
1 BLOCK_4X8
2 BLOCK_8X4
3 BLOCK_8X8
4 BLOCK_8X16
5 BLOCK_16X8
6 BLOCK_16X16
7 BLOCK_16X32
8 BLOCK_32X16
9 BLOCK_32X32
10 BLOCK_32X64
11 BLOCK_64X32
12 BLOCK_64X64
13 BLOCK_64X128
14 BLOCK_128X64
15 BLOCK_128X128
16 BLOCK_4X16
17 BLOCK_16X4
18 BLOCK_8X32
19 BLOCK_32X8
20 BLOCK_16X64
21 BLOCK_64X16

The dimensions of these blocks are given in width, height order (e.g. BLOCK_8X16 corresponds to a block that is 8 samples wide, and 16 samples high).

It is a requirement of bitstream conformance that get_plane_residual_size( subSize, 1 ) is not equal to BLOCK_INVALID every time subSize is computed.

Note: This requirement prevents the UV blocks from being too tall or too wide (i.e. having aspect ratios outside the range 1:4 to 4:1). For example, when 4:2:2 chroma subsampling is used a luma partition of size 8x32 is invalid, as it implies a chroma partition of size 4x32, which results in an aspect ratio of 1:8.
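
The restriction can be sketched as a simple check on the implied chroma dimensions. This is non-normative and simplified: the clamp to a minimum of 4 samples mirrors the behaviour of the subsampled-size derivation, while the normative check goes through get_plane_residual_size.

```python
def valid_uv_partition(luma_w, luma_h, subsampling_x, subsampling_y):
    """Sketch of the chroma aspect-ratio restriction described in the note.

    The chroma block implied by a luma partition must keep its aspect
    ratio within 1:4 .. 4:1; chroma dimensions are assumed to be clamped
    to a minimum of 4 samples per side.
    """
    uv_w = max(4, luma_w >> subsampling_x)
    uv_h = max(4, luma_h >> subsampling_y)
    return uv_w <= 4 * uv_h and uv_h <= 4 * uv_w
```

For the example in the note: an 8x32 luma partition with 4:2:2 subsampling implies a 4x32 chroma block (1:8), which fails the check.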

split_or_vert is used to compute partition for blocks when only split or vert partitions are allowed because of overlap with the right hand edge of the frame.

split_or_horz is used to compute partition for blocks when only split or horz partitions are allowed because of overlap with the bottom edge of the frame.

Decode block semantics

MiRow is a variable holding the vertical location of the block in units of 4x4 luma samples.

MiCol is a variable holding the horizontal location of the block in units of 4x4 luma samples.

MiSize is a variable holding the size of the block with values having the same interpretation as for the variable subSize.

HasChroma is a variable that specifies whether chroma information is coded for this block.

Variable AvailU is equal to 0 if the information from the block above cannot be used on the luma plane; AvailU is equal to 1 if the information from the block above can be used on the luma plane.

Variable AvailL is equal to 0 if the information from the block to the left cannot be used on the luma plane; AvailL is equal to 1 if the information from the block to the left can be used on the luma plane.

Note: Information from a block in a different tile can be used in some circumstances if the block is above, but not if the block is to the left.

Variables AvailUChroma and AvailLChroma have the same significance as AvailU and AvailL, but on the chroma planes.

Intra frame mode info semantics

This syntax is used when coding an intra block within an intra frame.

use_intrabc equal to 1 specifies that intra block copy should be used for this block. use_intrabc equal to 0 specifies that intra block copy should not be used.

intra_frame_y_mode specifies the direction of intra prediction filtering:

intra_frame_y_mode Name of intra_frame_y_mode
0 DC_PRED
1 V_PRED
2 H_PRED
3 D45_PRED
4 D135_PRED
5 D113_PRED
6 D157_PRED
7 D203_PRED
8 D67_PRED
9 SMOOTH_PRED
10 SMOOTH_V_PRED
11 SMOOTH_H_PRED
12 PAETH_PRED

uv_mode specifies the chrominance intra prediction mode using values with the same interpretation as in the semantics for intra_frame_y_mode, with an additional mode UV_CFL_PRED.

uv_mode Name of uv_mode
0 DC_PRED
1 V_PRED
2 H_PRED
3 D45_PRED
4 D135_PRED
5 D113_PRED
6 D157_PRED
7 D203_PRED
8 D67_PRED
9 SMOOTH_PRED
10 SMOOTH_V_PRED
11 SMOOTH_H_PRED
12 PAETH_PRED
13 UV_CFL_PRED

Note: Due to the way the uv_mode syntax element is read, uv_mode can only be read as UV_CFL_PRED when Max( Block_Width[ MiSize ], Block_Height[ MiSize ] ) <= 32.

Intra segment ID semantics

Lossless is a variable which, if equal to 1, indicates that the block is coded using a special 4x4 transform designed for encoding frames that are bit-identical with the original frames.

Read segment ID semantics

segment_id specifies which segment is associated with the current intra block being decoded. It is first read from the stream, and then postprocessed based on the predicted segment id.

It is a requirement of bitstream conformance that the postprocessed value of segment_id (i.e. the value returned by neg_deinterleave) is in the range 0 to LastActiveSegId (inclusive of endpoints).
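
The postprocessing maps the coded value, a recentered distance from the predicted segment id, back to an absolute id. A non-normative sketch following the neg_deinterleave function of the decoding process (diff is the coded value, ref the predicted id, max_val the number of segment ids):

```python
def neg_deinterleave(diff, ref, max_val):
    """Sketch of the inverse interleaving used for predicted segment ids.

    Small coded values correspond to ids close to the prediction ref;
    the mapping is a bijection on 0..max_val-1.
    """
    if not ref:
        return diff
    if ref >= max_val - 1:
        return max_val - diff - 1
    if 2 * ref < max_val:
        if diff <= 2 * ref:
            if diff & 1:
                return ref + ((diff + 1) >> 1)
            return ref - (diff >> 1)
        return diff
    else:
        if diff <= 2 * (max_val - ref - 1):
            if diff & 1:
                return ref + ((diff + 1) >> 1)
            return ref - (diff >> 1)
        return max_val - (diff + 1)
```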

Inter segment ID semantics

seg_id_predicted equal to 1 specifies that the segment_id is taken from the segmentation map. seg_id_predicted equal to 0 specifies that the syntax element segment_id is parsed.

Note: It is allowed for seg_id_predicted to be equal to 0 even if the value coded for the segment_id is equal to predictedSegmentId.

Skip mode semantics

skip_mode equal to 1 indicates that this block will use some default settings (that correspond to compound prediction) and so most of the mode info is skipped. skip_mode equal to 0 indicates that the mode info is not skipped.

Skip semantics

skip equal to 0 indicates that there may be some transform coefficients to read for this block; skip equal to 1 indicates that there are no transform coefficients.

Quantizer index delta semantics

delta_q_abs specifies the absolute value of the quantizer index delta value being decoded. If delta_q_abs is equal to DELTA_Q_SMALL, the value is encoded using delta_q_rem_bits and delta_q_abs_bits.

delta_q_rem_bits and delta_q_abs_bits encode the absolute value of the quantizer index delta value being decoded, where the absolute value of the quantizer index delta value is of the form:

(1 << delta_q_rem_bits) + delta_q_abs_bits + 1

delta_q_sign_bit equal to 0 indicates that the quantizer index delta value is positive; delta_q_sign_bit equal to 1 indicates that the quantizer index delta value is negative.

Loop filter delta semantics

delta_lf_abs specifies the absolute value of the loop filter delta value being decoded. If delta_lf_abs is equal to DELTA_LF_SMALL, the value is encoded using delta_lf_rem_bits and delta_lf_abs_bits.

delta_lf_rem_bits and delta_lf_abs_bits encode the absolute value of the loop filter delta value being decoded, where the absolute value of the loop filter delta value is of the form:

( 1 << ( delta_lf_rem_bits + 1 ) ) + delta_lf_abs_bits + 1

delta_lf_sign_bit equal to 0 indicates that the loop filter delta value is positive; delta_lf_sign_bit equal to 1 indicates that the loop filter delta value is negative.
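
The quantizer index and loop filter escape codings share the same pattern: the absolute value has the form (1 << n) + abs_bits + 1, where n is the number of abs_bits read. A non-normative sketch:

```python
def delta_abs_from_escape(n, abs_bits):
    """Absolute delta value when the coded symbol hit the escape
    (DELTA_Q_SMALL / DELTA_LF_SMALL): n is the number of abs_bits read.
    """
    return (1 << n) + abs_bits + 1

def apply_delta_sign(abs_value, sign_bit):
    """delta_q_sign_bit / delta_lf_sign_bit: 0 is positive, 1 negative."""
    return -abs_value if sign_bit else abs_value
```

Note how successive values of n tile the escape range without gaps: n = 1 covers 3..4, n = 2 covers 5..8, n = 3 covers 9..16, and so on.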

CDEF params semantics

cdef_damping_minus_3 controls the amount of damping in the deringing filter.

cdef_bits specifies the number of bits needed to specify which CDEF filter to apply.

cdef_y_pri_strength and cdef_uv_pri_strength specify the strength of the primary filter.

cdef_y_sec_strength and cdef_uv_sec_strength specify the strength of the secondary filter.

Loop restoration params semantics

lr_type is used to compute FrameRestorationType.

FrameRestorationType specifies the type of restoration used for each plane as follows:

lr_type FrameRestorationType Name of FrameRestorationType
0 0 RESTORE_NONE
1 3 RESTORE_SWITCHABLE
2 1 RESTORE_WIENER
3 2 RESTORE_SGRPROJ

UsesLr indicates if any plane uses loop restoration.

lr_unit_shift specifies if the luma restoration size should be halved.

lr_unit_extra_shift specifies if the luma restoration size should be halved again.

lr_uv_shift is only present for 4:2:0 formats and specifies if the chroma size should be half the luma size.

LoopRestorationSize[plane] specifies the size of loop restoration units in units of samples in the current plane.
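
A non-normative sketch of how the shift flags map to unit sizes, taking RESTORATION_TILESIZE_MAX (256) as the largest unit. In the syntax, lr_unit_extra_shift is only read when the first halving was signalled, and lr_uv_shift only for 4:2:0; here all three are passed explicitly for illustration.

```python
RESTORATION_TILESIZE_MAX = 256  # largest loop restoration unit, in samples

def loop_restoration_sizes(lr_unit_shift, lr_unit_extra_shift, lr_uv_shift):
    """Sketch: each luma shift flag halves the maximum unit size, and
    lr_uv_shift halves the chroma unit relative to the luma unit.
    """
    luma = RESTORATION_TILESIZE_MAX >> (lr_unit_shift + lr_unit_extra_shift)
    chroma = luma >> lr_uv_shift
    return luma, chroma
```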

TX size semantics

tx_depth is used to compute TxSize. tx_depth is inverted with respect to TxSize, i.e. it specifies how much smaller the transform size should be made than the largest possible transform size for the block.

TxSize specifies the transform size to be used for this block:

TxSize Name of TxSize
0 TX_4X4
1 TX_8X8
2 TX_16X16
3 TX_32X32
4 TX_64X64
5 TX_4X8
6 TX_8X4
7 TX_8X16
8 TX_16X8
9 TX_16X32
10 TX_32X16
11 TX_32X64
12 TX_64X32
13 TX_4X16
14 TX_16X4
15 TX_8X32
16 TX_32X8
17 TX_16X64
18 TX_64X16

Note: TxSize is determined for skipped intra blocks because TxSize controls the granularity of the intra prediction.

Block TX size semantics

InterTxSizes is an array that holds the transform sizes within inter frames.

Note: TxSizes and InterTxSizes contain different values. All the values in TxSizes across a residual block will share the same value, while InterTxSizes can represent several different transform sizes within a residual block.

Var TX size semantics

txfm_split equal to 1 specifies that the block should be split into smaller transform sizes. txfm_split equal to 0 specifies that the block should not be split any more.

Transform type semantics

set specifies the transform set.

is_inter set Name of transform set
Don’t care 0 TX_SET_DCTONLY
0 1 TX_SET_INTRA_1
0 2 TX_SET_INTRA_2
1 1 TX_SET_INTER_1
1 2 TX_SET_INTER_2
1 3 TX_SET_INTER_3

The transform sets determine what subset of transform types can be used, according to the following table.

Transform type       TX_SET_  TX_SET_  TX_SET_  TX_SET_  TX_SET_  TX_SET_
                     DCTONLY  INTRA_1  INTRA_2  INTER_1  INTER_2  INTER_3
DCT_DCT                 X        X        X        X        X        X
ADST_DCT                         X        X        X        X
DCT_ADST                         X        X        X        X
ADST_ADST                        X        X        X        X
FLIPADST_DCT                                       X        X
DCT_FLIPADST                                       X        X
FLIPADST_FLIPADST                                  X        X
ADST_FLIPADST                                      X        X
FLIPADST_ADST                                      X        X
IDTX                             X        X        X        X        X
V_DCT                            X                 X        X
H_DCT                            X                 X        X
V_ADST                                             X
H_ADST                                             X
V_FLIPADST                                         X
H_FLIPADST                                         X

inter_tx_type specifies the transform type for inter blocks.

intra_tx_type specifies the transform type for intra blocks.

Is inter semantics

is_inter equal to 0 specifies that the block is an intra block; is_inter equal to 1 specifies that the block is an inter block.

Intra block mode info semantics

This syntax is used when coding an intra block within an inter frame.

y_mode specifies the direction of luminance intra prediction using values with the same interpretation as for intra_frame_y_mode.

uv_mode specifies the chrominance intra prediction mode using values with the same interpretation as in the semantics for intra_frame_y_mode, with an additional mode UV_CFL_PRED.

Note: Due to the way the uv_mode syntax element is read, uv_mode can only be read as UV_CFL_PRED when Max( Block_Width[ MiSize ], Block_Height[ MiSize ] ) <= 32.

Inter block mode info semantics

This syntax is used when coding an inter block.

compound_mode specifies how the motion vector used by inter prediction is obtained when using compound prediction. An offset is added to compound_mode to compute YMode as follows:

YMode Name of YMode
14 NEARESTMV
15 NEARMV
16 GLOBALMV
17 NEWMV
18 NEAREST_NEARESTMV
19 NEAR_NEARMV
20 NEAREST_NEWMV
21 NEW_NEARESTMV
22 NEAR_NEWMV
23 NEW_NEARMV
24 GLOBAL_GLOBALMV
25 NEW_NEWMV

Note: The intra modes take values 0..13 so these YMode values start at 14.

new_mv equal to 0 means that a motion vector difference should be read.

zero_mv equal to 0 means that the motion vector should be set equal to the default motion for the frame.

ref_mv equal to 0 means that the most likely motion vector should be used (called NEAREST), ref_mv equal to 1 means that the second most likely motion vector should be used (called NEAR).

interp_filter specifies the type of filter used in inter prediction. Values 0..3 are allowed with the same interpretation as for interpolation_filter. One filter type is specified for the vertical filter direction and one for the horizontal filter direction.

Note: The syntax element interpolation_filter from the uncompressed header can specify the type of filter to be used for the whole frame. If it is set to SWITCHABLE then the interp_filter syntax element is read from the bitstream for every inter block.

RefMvIdx specifies which candidate in the RefStackMv should be used.

drl_mode is a bit sent for candidates in the motion vector stack to indicate if they should be used. drl_mode equal to 0 means to use the current value of idx. drl_mode equal to 1 says to continue searching. DRL stands for “Dynamic Reference List”.

Filter intra mode info semantics

use_filter_intra is a bit specifying whether or not intra filtering can be used.

filter_intra_mode specifies the type of intra filtering, and can take on any of the following values:

filter_intra_mode Name of filter_intra_mode
0 FILTER_DC_PRED
1 FILTER_V_PRED
2 FILTER_H_PRED
3 FILTER_D157_PRED
4 FILTER_PAETH_PRED

Ref frames semantics

comp_mode specifies whether single or compound prediction is used:

comp_mode Name of comp_mode
0 SINGLE_REFERENCE
1 COMPOUND_REFERENCE

SINGLE_REFERENCE indicates that the inter block uses only a single reference frame to generate motion compensated prediction.

COMPOUND_REFERENCE indicates that the inter block uses compound mode.

There are two reference frame groups:

  • Group 1: LAST_FRAME, LAST2_FRAME, LAST3_FRAME, and GOLDEN_FRAME.

  • Group 2: BWDREF_FRAME, ALTREF2_FRAME, and ALTREF_FRAME.

Note: Encoders are free to assign these references to any of the reference frames (via the ref_frame_idx array). For example, there is no requirement of bitstream conformance that LAST_FRAME should indicate a frame that appears before the current frame in output order. Similarly, encoders can assign multiple references to the same reference frame.

comp_ref_type is used for compound prediction to specify whether both reference frames come from the same group or not:

comp_ref_type Name of comp_ref_type Description
0 UNIDIR_COMP_REFERENCE Both reference frames from the same group
1 BIDIR_COMP_REFERENCE One from Group 1 and one from Group 2

uni_comp_ref, uni_comp_ref_p1, and uni_comp_ref_p2 specify which reference frames are in use when both come from the same group.

comp_ref, comp_ref_p1, and comp_ref_p2 specify the first reference frame when the two reference frames come from different groups.

comp_bwdref and comp_bwdref_p1 specify the second reference frame when the two reference frames come from different groups.

single_ref_p1, single_ref_p2, single_ref_p3, single_ref_p4, single_ref_p5, and single_ref_p6 specify the reference frame when only a single reference frame is in use.

RefFrame[ 0 ] specifies which frame is used to compute the predicted samples for this block:

RefFrame[ 0 ] Name of ref_frame
0 INTRA_FRAME
1 LAST_FRAME
2 LAST2_FRAME
3 LAST3_FRAME
4 GOLDEN_FRAME
5 BWDREF_FRAME
6 ALTREF2_FRAME
7 ALTREF_FRAME

RefFrame[ 1 ] specifies which additional frame is used in compound prediction:

RefFrame[ 1 ] Name of ref_frame
-1 NONE (this block uses single prediction)
0 INTRA_FRAME (this block uses interintra prediction)
1 LAST_FRAME
2 LAST2_FRAME
3 LAST3_FRAME
4 GOLDEN_FRAME
5 BWDREF_FRAME
6 ALTREF2_FRAME
7 ALTREF_FRAME

Note: Not all combinations of RefFrame[0] and RefFrame[1] can be coded.

Assign mv semantics

It is a requirement of bitstream conformance that whenever assign_mv returns, the function is_mv_valid(isCompound) would return 1, where is_mv_valid is defined as:

is_mv_valid( isCompound ) {
    for ( i = 0; i < 1 + isCompound; i++ ) {
        for ( comp = 0; comp < 2; comp++ ) {
            if ( Abs( Mv[ i ][ comp ] ) >= ( 1 << 14 ) )
                return 0
        }
    }
    if ( !use_intrabc ) {
        return 1
    }
    bw = Block_Width[ MiSize ]
    bh = Block_Height[ MiSize ]
    if ( (Mv[ 0 ][ 0 ] & 7) || (Mv[ 0 ][ 1 ] & 7) ) {
        return 0
    }
    deltaRow = Mv[ 0 ][ 0 ] >> 3
    deltaCol = Mv[ 0 ][ 1 ] >> 3
    srcTopEdge = MiRow * MI_SIZE + deltaRow
    srcLeftEdge = MiCol * MI_SIZE + deltaCol
    srcBottomEdge = srcTopEdge + bh
    srcRightEdge = srcLeftEdge + bw
    if ( HasChroma ) {
        if ( bw < 8 && subsampling_x )
            srcLeftEdge -= 4
        if ( bh < 8 && subsampling_y )
            srcTopEdge -= 4
    }
    if ( srcTopEdge < MiRowStart * MI_SIZE ||
         srcLeftEdge < MiColStart * MI_SIZE ||
         srcBottomEdge > MiRowEnd * MI_SIZE ||
         srcRightEdge > MiColEnd * MI_SIZE ) {
        return 0
    }
    sbSize = use_128x128_superblock ? BLOCK_128X128 : BLOCK_64X64
    sbH = Block_Height[ sbSize ]
    activeSbRow = (MiRow * MI_SIZE) / sbH
    activeSb64Col = (MiCol * MI_SIZE) >> 6
    srcSbRow = (srcBottomEdge - 1) / sbH
    srcSb64Col = (srcRightEdge - 1) >> 6
    totalSb64PerRow = ((MiColEnd - MiColStart - 1) >> 4) + 1
    activeSb64 = activeSbRow * totalSb64PerRow + activeSb64Col
    srcSb64 = srcSbRow * totalSb64PerRow + srcSb64Col
    if ( srcSb64 >= activeSb64 - INTRABC_DELAY_SB64) {
        return 0
    }
    gradient = 1 + INTRABC_DELAY_SB64 + use_128x128_superblock
    wfOffset = gradient * (activeSbRow - srcSbRow)
    if ( srcSbRow > activeSbRow ||
         srcSb64Col >= activeSb64Col - INTRABC_DELAY_SB64 + wfOffset ) {
        return 0
    }
    return 1
}

Note: The purpose of this function is to limit the maximum size of motion vectors and also, if use_intrabc is equal to 1, to additionally constrain the motion vector in order that the data is fetched from parts of the tile that have already been decoded, and that are not too close to the current block (in order to make a pipelined decoder implementation feasible).

Read motion mode semantics

use_obmc equal to 1 means that OBMC should be used. use_obmc equal to 0 means that simple translation should be used.

motion_mode specifies the type of motion compensation to perform:

motion_mode Name of motion_mode
0 SIMPLE
1 OBMC
2 LOCALWARP

Note: motion_mode equal to SIMPLE is used for blocks requiring global motion.

Read inter intra semantics

interintra equal to 1 specifies that an inter prediction should be blended with an intra prediction.

interintra_mode specifies the type of intra prediction to be used:

interintra_mode Name of interintra_mode
0 II_DC_PRED
1 II_V_PRED
2 II_H_PRED
3 II_SMOOTH_PRED

wedge_interintra equal to 1 specifies that wedge blending should be used. wedge_interintra equal to 0 specifies that intra blending should be used.

wedge_index is used to derive the direction and offset of the wedge mask used during blending.

Read compound type semantics

comp_group_idx equal to 0 indicates that the compound_idx syntax element should be read. comp_group_idx equal to 1 indicates that the compound_idx syntax element is not present.

compound_idx equal to 0 indicates that a distance based weighted scheme should be used for blending. compound_idx equal to 1 indicates that the averaging scheme should be used for blending.

compound_type specifies how the two predictions should be blended together:

compound_type Name of compound_type
0 COMPOUND_WEDGE
1 COMPOUND_DIFFWTD
2 COMPOUND_AVERAGE
3 COMPOUND_INTRA
4 COMPOUND_DISTANCE

Note: COMPOUND_AVERAGE, COMPOUND_INTRA, and COMPOUND_DISTANCE cannot be directly signaled with the compound_type syntax element but are inferred from other syntax elements.

wedge_index is used to derive the direction and offset of the wedge mask used during blending.

wedge_sign specifies the sign of the wedge blend.

mask_type specifies the type of mask to be used during blending:

mask_type Name of mask_type
0 UNIFORM_45
1 UNIFORM_45_INV

MV semantics

MvCtx is used to determine which CDFs to use for the motion vector syntax elements.

mv_joint specifies which components of the motion vector difference are non-zero:

mv_joint Name of mv_joint Changes row Changes col
0 MV_JOINT_ZERO No No
1 MV_JOINT_HNZVZ No Yes
2 MV_JOINT_HZVNZ Yes No
3 MV_JOINT_HNZVNZ Yes Yes

The motion vector difference is added to the PredMv to compute the final motion vector in Mv.

MV component semantics

mv_sign equal to 0 means that the motion vector difference is positive; mv_sign equal to 1 means that the motion vector difference is negative.

mv_class specifies the class of the motion vector difference. A higher class means that the motion vector difference represents a larger update:

mv_class Name of mv_class
0 MV_CLASS_0
1 MV_CLASS_1
2 MV_CLASS_2
3 MV_CLASS_3
4 MV_CLASS_4
5 MV_CLASS_5
6 MV_CLASS_6
7 MV_CLASS_7
8 MV_CLASS_8
9 MV_CLASS_9
10 MV_CLASS_10

mv_class0_bit specifies the integer part of the motion vector difference. This is only present for class 0 motion vector differences.

mv_class0_fr specifies the first 2 fractional bits of the motion vector difference. This is only present for class 0 motion vector differences.

mv_class0_hp specifies the third fractional bit of the motion vector difference. This is only present for class 0 motion vector differences.

mv_bit specifies bit i of the integer part of the motion vector difference.

mv_fr specifies the first 2 fractional bits of the motion vector difference.

mv_hp specifies the third fractional bit of the motion vector difference.
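
The pieces above assemble into an absolute magnitude in eighth-sample units. The sketch below follows the structure of the MV component decode under the simplifying assumption that the fractional (fr) and high-precision (hp) bits are always present; mv_magnitude and mv_component are illustrative names, not spec functions:

```python
CLASS0_SIZE = 2  # spec constant

def mv_magnitude(mv_class, int_bits, fr, hp):
    """Combine the integer, fractional, and high-precision bits.

    int_bits is mv_class0_bit for class 0, or the mv_bit values packed
    LSB-first for the higher classes.
    """
    if mv_class == 0:  # MV_CLASS_0
        return ((int_bits << 3) | (fr << 1) | hp) + 1
    # Higher classes start beyond the range covered by the lower classes.
    base = CLASS0_SIZE << (mv_class + 2)
    return base + ((int_bits << 3) | (fr << 1) | hp) + 1

def mv_component(mv_sign, mag):
    """Apply mv_sign: equal to 1 means the difference is negative."""
    return -mag if mv_sign else mag
```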

Compute prediction semantics

The prediction for inter and interintra blocks is triggered within compute_prediction. However, intra prediction is done at the transform block granularity so predict_intra is also called from transform_block.

predW and predH are variables containing the smallest size that can be used for inter prediction. (This size may be increased for chroma blocks if not all blocks use inter prediction.)

predict_inter is a function call that indicates the conceptual point where inter prediction happens. When this function is called, the inter prediction process specified in section 7.11.3 is invoked.

predict_intra is a function call that indicates the conceptual point where intra prediction happens. When this function is called, the intra prediction process specified in section 7.11.2 is invoked.

Note: The predict_inter and predict_intra functions do not affect the syntax decode process.

someUseIntra is a variable that indicates if some of the blocks corresponding to this residual require intra prediction.

Note: The chroma residual block size is always at least 4 in width and height. This means that no transform width or height smaller than 4 is required. As such, a chroma residual may actually cover several luma blocks. If any of these blocks are intra, a single prediction is performed for the entire chroma residual block based on the mode info of the bottom right luma block. However, if all the constituent blocks are inter blocks, a special case is triggered and inter prediction is done using the smaller chroma block size that corresponds to each of the luma blocks.

Residual semantics

The residual consists of a number of transform blocks.

If the block is wider or higher than 64 luma samples, then the residual is split into 64 by 64 chunks.

Within each chunk, the transform blocks are either sent in raster order (if use_inter is equal to 0 or Lossless is equal to 1), or within a recursive transform tree.
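
The 64 by 64 chunking can be pictured with a small sketch (illustrative only; sizes are in luma samples and residual_chunks is not a spec function):

```python
def residual_chunks(block_w, block_h):
    """Yield (x, y, w, h) for each 64x64 chunk of a residual, raster order.

    Blocks no wider or taller than 64 yield a single chunk covering the
    whole block.
    """
    for y in range(0, block_h, 64):
        for x in range(0, block_w, 64):
            yield x, y, min(64, block_w - x), min(64, block_h - y)
```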

Transform block semantics

reconstruct is a function call that indicates the conceptual point where inverse transform and reconstruction happens. When this function is called, the reconstruction process specified in section 7.12.3 is invoked.

predict_palette is a function call that indicates the conceptual point where palette prediction happens. When this function is called, the palette prediction process specified in section 7.11.4 is invoked.

predict_chroma_from_luma is a function call that indicates the conceptual point where predicting chroma from luma happens. When this function is called, the predict chroma from luma process specified in section 7.11.5 is invoked.

MaxLumaW and MaxLumaH are needed for chroma from luma prediction and store the extent of luma samples that can be used for prediction.

LoopfilterTxSizes is an array that stores the transform size for each plane and position for use in loop filtering. LoopfilterTxSizes[ plane ][ row ][ col ] stores the transform size where row and col are in units of 4x4 samples.

Note: The transform size is always equal for planes 1 and 2.

Coefficients semantics

TxTypes is an array which stores at a 4x4 luma sample granularity the transform type to be used.

Note: The transform type is only read for luma transform blocks; chroma blocks use the transform type from a corresponding luma block. Chroma blocks will only use transform types that have been written for the current residual block.

Quant is an array storing the quantized coefficients for the current transform block.

all_zero equal to 1 specifies that all coefficients are zero.

Note: The transform type is only present when this is a luminance block and all_zero is equal to 0. If all_zero is equal to 1 for a luminance block, the transform type is set to DCT_DCT.

eob_extra and eob_extra_bit specify the position of the last non-zero coefficient by being used to compute the variable eob.

eob_pt_16, eob_pt_32, eob_pt_64, eob_pt_128, eob_pt_256, eob_pt_512, eob_pt_1024: syntax elements used to compute eob.

eob is a variable that indicates the index of the end of block. This index is equal to one plus the index of the last non-zero coefficient.

coeff_base_eob is a syntax element used to compute the base level of the last non-zero coefficient.

Note: The base level is set to coeff_base_eob plus 1. Since this coefficient is known to be non-zero, only base levels of 1, 2, or 3 can be coded via coeff_base_eob.

coeff_base specifies the base level of a coefficient (this syntax element is used for all coefficients except the last non-zero coefficient).

Note: The base level can take values of 0, 1, 2, or 3. If the base level is less than 3, then it contains the actual level of the coefficient. Otherwise, the syntax element coeff_br is used to optionally increase the level.

dc_sign specifies the sign of the DC coefficient.

sign_bit specifies the sign of a non-zero AC coefficient.

coeff_br specifies an increment to the coefficient.

Note: Each quantized coefficient can use coeff_br to provide up to 4 increments. If an increment less than 3 is coded, it signifies that this was the final increment.

golomb_length_bit is used to compute the number of extra bits required to code the coefficient.

If length is equal to 20, it is a requirement of bitstream conformance that golomb_length_bit is equal to 1.

golomb_data_bit specifies the value of one of the extra bits.
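
A coefficient level is thus built up in stages: coeff_base (or coeff_base_eob) gives a level of up to 3, coeff_br can add up to four increments of up to 3 each, and any remainder is coded with the Golomb bits described above. The Golomb read can be sketched as follows (read_bit stands in for the bitstream reader; illustrative only):

```python
def read_golomb(read_bit):
    """Read an Exp-Golomb value built from golomb_length_bit and
    golomb_data_bit, mirroring the structure of the coefficient decode.

    read_bit is any zero-argument callable returning the next bit.
    """
    length = 0
    bit = 0
    while not bit:
        bit = read_bit()         # golomb_length_bit
        length += 1
        assert length <= 20      # conformance: the 20th length bit must be 1
    x = 1
    for _ in range(length - 1):
        x = (x << 1) | read_bit()  # golomb_data_bit
    return x - 1
```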

AboveLevelContext and LeftLevelContext are arrays that store at a 4 sample granularity the cumulative sum of coefficient levels.

AboveDcContext and LeftDcContext are arrays that store at a 4 sample granularity 2 bits signaling the sign of the DC coefficient (zero being counted as a separate sign).

Intra angle info semantics

angle_delta_y specifies the offset to be applied to the intra prediction angle specified by the prediction mode in the luma plane, biased by MAX_ANGLE_DELTA so as to encode a positive value.

angle_delta_uv specifies the offset to be applied to the intra prediction angle specified by the prediction mode in the chroma plane biased by MAX_ANGLE_DELTA so as to encode a positive value.

AngleDeltaY is computed from angle_delta_y by removing the MAX_ANGLE_DELTA offset to produce the final luma angle offset value, which may be positive or negative.

AngleDeltaUV is computed from angle_delta_uv by removing the MAX_ANGLE_DELTA offset to produce the final chroma angle offset value, which may be positive or negative.
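
As a concrete illustration (MAX_ANGLE_DELTA is equal to 3, so coded values 0..6 map to signed offsets -3..3):

```python
MAX_ANGLE_DELTA = 3  # spec constant

def decode_angle_delta(coded):
    """Remove the MAX_ANGLE_DELTA bias to recover the signed offset."""
    return coded - MAX_ANGLE_DELTA
```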

Read CFL alphas semantics

cfl_alpha_signs contains the sign of the alpha values for U and V packed together into a single syntax element with 8 possible values. (The combination of two zero signs is prohibited as it is redundant with DC Intra prediction.)

cfl_alpha_signs Name of signU Name of signV
0 CFL_SIGN_ZERO CFL_SIGN_NEG
1 CFL_SIGN_ZERO CFL_SIGN_POS
2 CFL_SIGN_NEG CFL_SIGN_ZERO
3 CFL_SIGN_NEG CFL_SIGN_NEG
4 CFL_SIGN_NEG CFL_SIGN_POS
5 CFL_SIGN_POS CFL_SIGN_ZERO
6 CFL_SIGN_POS CFL_SIGN_NEG
7 CFL_SIGN_POS CFL_SIGN_POS

signU contains the sign of the alpha value for the U component:

signU Name of signU
0 CFL_SIGN_ZERO
1 CFL_SIGN_NEG
2 CFL_SIGN_POS

signV contains the sign of the alpha value for the V component with the same interpretation as for signU.

cfl_alpha_u contains the absolute value of alpha minus one for the U component.

cfl_alpha_v contains the absolute value of alpha minus one for the V component.

CflAlphaU contains the signed value of the alpha component for the U component.

CflAlphaV contains the signed value of the alpha component for the V component.
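
The table and derivations above can be sketched as follows (the helper names are illustrative; the (cfl_alpha_signs + 1) arithmetic reproduces the table, with the + 1 skipping the prohibited ZERO/ZERO combination):

```python
CFL_SIGN_ZERO, CFL_SIGN_NEG, CFL_SIGN_POS = 0, 1, 2

def cfl_signs(cfl_alpha_signs):
    """Unpack (signU, signV) from the joint syntax element."""
    return (cfl_alpha_signs + 1) // 3, (cfl_alpha_signs + 1) % 3

def cfl_alpha(sign, alpha_minus_one):
    """Recover the signed alpha value from its sign and coded magnitude.

    alpha_minus_one is cfl_alpha_u or cfl_alpha_v; it is only coded when
    the corresponding sign is non-zero.
    """
    if sign == CFL_SIGN_ZERO:
        return 0
    mag = alpha_minus_one + 1
    return -mag if sign == CFL_SIGN_NEG else mag
```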

Palette mode info semantics

has_palette_y is a boolean value specifying whether a palette is encoded for the Y plane.

has_palette_uv is a boolean value specifying whether a palette is encoded for the UV plane.

palette_size_y_minus_2 is used to compute PaletteSizeY.

PaletteSizeY is a variable holding the Y plane palette size.

palette_size_uv_minus_2 is used to compute PaletteSizeUV.

PaletteSizeUV is a variable holding the UV plane palette size.

use_palette_color_cache_y, if equal to 1, indicates that for a particular palette entry in the luma palette, the cached entry should be used.

use_palette_color_cache_u, if equal to 1, indicates that for a particular palette entry in the U chroma palette, the cached entry should be used.

palette_colors_y is an array holding the Y plane palette colors.

palette_colors_u is an array holding the U plane palette colors.

palette_colors_v is an array holding the V plane palette colors.

delta_encode_palette_colors_v, if equal to 1, indicates that the V chroma palette is encoded using delta encoding.

palette_num_extra_bits_y is used to calculate the number of bits used to store each palette delta value for the luma palette.

palette_num_extra_bits_u is used to calculate the number of bits used to store each palette delta value for the U chroma palette.

palette_num_extra_bits_v is used to calculate the number of bits used to store each palette delta value for the V chroma palette.

palette_delta_y is a delta value for the luma palette.

palette_delta_u is a delta value for the U chroma palette.

palette_delta_v is a delta value for the V chroma palette.

Note: Luma and U delta values give a positive offset relative to the previous palette entry in the same plane. V delta values give a signed offset relative to the U palette entries.

palette_delta_sign_bit_v, if equal to 1, indicates that the decoded V chroma palette delta value should be negated.
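
A simplified sketch of how the deltas accumulate (illustrative only: it ignores the color cache, the coded bit widths, and the implicit minimum step between entries):

```python
def apply_palette_deltas(first, deltas):
    """Luma/U style: each positive delta offsets the previous entry."""
    colors = [first]
    for d in deltas:
        colors.append(colors[-1] + d)
    return colors

def v_delta(delta, sign_bit):
    """palette_delta_sign_bit_v equal to 1 negates the decoded V delta."""
    return -delta if sign_bit else delta
```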

Palette tokens semantics

color_index_map_y holds the index in palette_colors_y for the block’s Y plane top left sample.

color_index_map_uv holds the index in palette_colors_u and palette_colors_v for the block’s UV plane top left sample.

palette_color_idx_y holds the index in ColorOrder for a sample in the block’s Y plane.

palette_color_idx_uv holds the index in ColorOrder for a sample in the block’s UV plane.

Palette color context semantics

ColorOrder is an array holding the mapping from an encoded index to the palette. ColorOrder is ranked in order of frequency of occurrence of each color in the neighborhood of the current block, weighted by closeness to the current block.

ColorContextHash is a variable derived from the distribution of colors in the neighborhood of the current block, which is used to determine the probability context used to decode palette_color_idx_y and palette_color_idx_uv.

Read CDEF semantics

cdef_idx specifies which CDEF filtering parameters should be used for a particular 64 by 64 block. A value of -1 means that CDEF is disabled for that block.

Read loop restoration unit semantics

use_wiener specifies if the Wiener filter should be used.

use_sgrproj specifies if the self guided filter should be used.

restoration_type specifies the restoration filter that should be used with the same interpretation as FrameRestorationType.

lr_sgr_set specifies which set of parameters to use for the self guided filter.

subexp_more_bools equal to 0 specifies that the parameter is in the range mk to mk+a-1. subexp_more_bools equal to 1 specifies that the parameter is greater than mk+a-1.

subexp_unif_bools specifies the value of the parameter minus mk.

subexp_bools specifies the value of the parameter minus mk.

Tile list OBU semantics

General tile list OBU semantics

output_frame_width_in_tiles_minus_1 plus one is the width of the output frame, in tile units.

output_frame_height_in_tiles_minus_1 plus one is the height of the output frame, in tile units.

tile_count_minus_1 plus one is the number of tile_list_entry in the list.

It is a requirement of bitstream conformance that tile_count_minus_1 is less than or equal to 511.

Tile list entry semantics

anchor_frame_idx is the index into an array AnchorFrames of the frames that the tile uses for prediction. The AnchorFrames array is provided by external means and may change for each tile list OBU. The process for creating the AnchorFrames array is outside of the scope of this specification.

It is a requirement of bitstream conformance that anchor_frame_idx is less than or equal to 127.

anchor_tile_row is the row coordinate, in tile units, of the tile in the frame to which it belongs.

It is a requirement of bitstream conformance that anchor_tile_row is less than TileRows.

anchor_tile_col is the column coordinate, in tile units, of the tile in the frame to which it belongs.

It is a requirement of bitstream conformance that anchor_tile_col is less than TileCols.

tile_data_size_minus_1 plus one is the size of the coded tile data, coded_tile_data, in bytes.

coded_tile_data are the tile_data_size_minus_1 + 1 bytes of the coded tile.

Decoding process

Overview

AV1 contains two operating modes:

  1. General decoding (input is a sequence of OBUs, output is decoded frames)

  2. Large scale tile decoding (input is a tile list OBU plus additional side information, output is a decoded frame)

The general decoding process is specified in section 7.2.

The large scale tile decoding process is specified in section 7.3.

General decoding process

When film_grain_params_present is equal to 0, decoders shall produce output frames that are identical in all respects and have the same output order as those produced by the decoding process specified herein.

When film_grain_params_present is equal to 1, a decoder shall implement a film grain synthesis process that modifies the output arrays OutY, OutU, OutV. The reference film grain synthesis process is described in section 7.18.3.

When film_grain_params_present is equal to 1, a conformant decoder shall satisfy at least one of the following two options:

  1. A conformant decoder shall produce output frames that are identical in all respects and have the same output order as those produced by the decoding process specified herein including applying the exact film grain synthesis process as specified in section 7.18.3.

  2. A conformant decoder shall produce intermediate frames that are identical in all respects and have the same order as the frames produced by the process specified in section 7.18.2. In addition to that, a conformant decoder shall produce output frames that are in the same order and do not have perceptually significant differences with the frames produced by the reference film grain synthesis process specified in section 7.18.3 when applied to the input frames of the film grain synthesis process with the film grain parameters signaled for these frames. The decoder may also include optional processing steps which are applied to the intermediate frames produced by the process specified in section 7.18.2 and before the film grain synthesis process, resulting in the input frames of the film grain synthesis process. Such optional processing steps are beyond the scope of this specification. Otherwise, the intermediate frames are the input frames of the film grain synthesis process. The definition of “perceptually significant differences” is beyond the scope of this specification and may be specified, for example, by a service provider as part of their accreditation program. The film grain synthesis process applied by a conformant decoder should be feature complete with regards to the reference film grain synthesis process of section 7.18.3 including scaling strength of the film grain as a function of intensity according to the signaled parameters, same maximum AR lag, and similar modeling of correlation between luma and chroma and smoothing of transitions between blocks of grain when applicable.

Note: To ensure conformance, decoder manufacturers are advised to implement the film grain synthesis process as specified in section 7.18.3. One reason to choose the second conformance option is implementation of optional processing steps between the output of section 7.18.2 and the film grain synthesis process, in which case there could be minor differences in the output with the reference film grain synthesis process of section 7.18.3. Examples of these optional processing steps are algorithms improving output picture quality, such as de-banding filtering and coding artefacts removal.

Note: Some applications, such as transcoding from AV1 to AV1, may use intermediate output frames of section 7.18.2 for transcoding. In such cases, the original film grain synthesis information may be adapted and inserted in the transcoded bitstream.

The input to this process is a sequence of open bitstream units (OBUs).

The output from this process is a sequence of decoded frames.

For each OBU in turn the syntax elements are extracted as specified in section 5.3.

The syntax tables include function calls indicating when the remaining decode processes are triggered.

Large scale tile decoding process

General

The large scale tile decoding process is used to decode a random subset of tiles taken from a number of coded frames. The list of tiles is specified by a tile list OBU. One possible use case for this process is described in Annex D.

Note: A decoder is recommended to support decoding of tile list OBUs, but this is not a requirement for decoder conformance.

The inputs to this process are:

  • contents of all syntax elements and variables produced when parsing a sequence header OBU,

  • contents of all syntax elements and variables produced when parsing a frame header OBU (including CDF tables optionally loaded from a reference frame),

  • an array AnchorFrames containing up to 128 frames,

  • a tile list OBU.

The output from this process is:

  • an output frame containing decoded tiles in raster order.

Note: The syntax elements from the sequence header and frame header may be produced by decoding a sequence header OBU and a frame header OBU, but this is not a requirement of decoder conformance. The AnchorFrames may be produced by decoding an AV1 bitstream, but this is not a requirement of bitstream conformance.

The following figure shows the arrangement of data required to decode a single tile list OBU. Those data shown on a green background are normatively defined in this specification. Data items shown on a yellow background are defined by a process or processes beyond the scope of this specification.

Application level overview

For each tile list entry in the tile list OBU, the following ordered steps are applied:

  1. Parse the syntax elements within the tile_list_entry

  2. Set the bitstream position indicator to point to the start of the coded_tile_data syntax element

  3. Set the variable last equal to ref_frame_idx[ 0 ]

  4. Set FrameStore[ last ] equal to AnchorFrames[ anchor_frame_idx ]

  5. RefValid[ last ] is set equal to 1.

  6. RefUpscaledWidth[ last ] is set equal to UpscaledWidth.

  7. RefFrameWidth[ last ] is set equal to FrameWidth.

  8. RefFrameHeight[ last ] is set equal to FrameHeight.

  9. RefMiCols[ last ] is set equal to MiCols.

  10. RefMiRows[ last ] is set equal to MiRows.

  11. RefSubsamplingX[ last ] is set equal to subsampling_x.

  12. RefSubsamplingY[ last ] is set equal to subsampling_y.

  13. RefBitDepth[ last ] is set equal to BitDepth.

  14. Invoke the decode camera tile process specified in section 7.3.2 and write the decoded tiles into an output frame in raster order, in the order that they occur in the tile list OBU.

The output from this process is the output frame that is built up in the final step above.

The variable outputW is defined as ( 1 + output_frame_width_in_tiles_minus_1 ) * TileWidth.

The variable outputH is defined as ( 1 + output_frame_height_in_tiles_minus_1 ) * TileHeight.

The operation of writing a decoded tile (with zero-based index given by the variable tile) into the output frame in raster order is defined as follows:

destX = TileWidth * ( tile % (output_frame_width_in_tiles_minus_1 + 1) )
destY = TileHeight * ( tile / (output_frame_width_in_tiles_minus_1 + 1) )
w = TileWidth
h = TileHeight
for ( y = 0; y < h; y++ ) {
  for ( x = 0; x < w; x++ ) {
    OutputFrameY[ y + destY ][ x + destX ] = OutY[ y ][ x ]
  }
}
w = w >> subsampling_x
h = h >> subsampling_y
destX = destX >> subsampling_x
destY = destY >> subsampling_y
for ( y = 0; y < h; y++ ) {
  for ( x = 0; x < w; x++ ) {
    OutputFrameU[ y + destY ][ x + destX ] = OutU[ y ][ x ]
    OutputFrameV[ y + destY ][ x + destX ] = OutV[ y ][ x ]
  }
}

OutputFrameY (representing the luma plane of the output frame) is outputW samples across by outputH samples down.

OutputFrameU (representing the U plane of the output frame) is ( outputW >> subsampling_x ) samples across by ( outputH >> subsampling_y ) samples down.

OutputFrameV (representing the V plane of the output frame) is ( outputW >> subsampling_x ) samples across by ( outputH >> subsampling_y ) samples down.

The bitdepth of each output sample is given by BitDepth.

The output frame may not be fully covered with decoded tiles. The decoder should not modify samples in the output frame outside of the boundaries of the decoded tiles.

Decoders that support large scale tile decoding shall produce output frames that are identical in all respects to those produced by this decoding process.

It is a requirement of bitstream conformance that the following conditions are met:

  • enable_superres is equal to 0

  • enable_order_hint is equal to 0

  • still_picture is equal to 0

  • film_grain_params_present is equal to 0

  • timing_info_present_flag is equal to 0

  • decoder_model_info_present_flag is equal to 0

  • initial_display_delay_present_flag is equal to 0

  • enable_restoration is equal to 0

  • enable_cdef is equal to 0

  • mono_chrome is equal to 0

  • TileHeight is equal to (use_128x128_superblock ? 128 : 64) for all tiles (i.e. the tile is exactly one superblock high)

  • TileWidth is identical for all tiles and is an integer multiple of TileHeight (i.e. the tile is an integer number of superblocks wide)

  • FrameWidth is equal to MiCols * MI_SIZE

  • FrameHeight is equal to MiRows * MI_SIZE

  • show_existing_frame is equal to 0

  • frame_type is equal to INTER_FRAME

  • show_frame is equal to 1

  • error_resilient_mode is equal to 0

  • disable_cdf_update is equal to 1

  • disable_frame_end_update_cdf is equal to 1

  • delta_lf_present is equal to 0

  • delta_q_present is equal to 0

  • frame_size_override_flag is equal to 0

  • refresh_frame_flags is equal to 0

  • use_ref_frame_mvs is equal to 0

  • segmentation_temporal_update is equal to 0

  • reference_select is equal to 0

  • loop_filter_level[ 0 ] and loop_filter_level[ 1 ] are equal to 0

  • tile_count_minus_1 + 1 is less than or equal to (output_frame_width_in_tiles_minus_1 + 1) * (output_frame_height_in_tiles_minus_1 + 1).
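
A few of the numeric conditions above can be spot-checked mechanically. The sketch below is illustrative only: the dict keys and the function name are assumptions, and only the tile-geometry and tile-count conditions are covered:

```python
def tile_list_constraints_ok(seq, tiles):
    """Check the tile-geometry and tile-count conditions listed above."""
    sb = 128 if seq["use_128x128_superblock"] else 64
    if any(h != sb for h in tiles["tile_heights"]):
        return False  # every tile must be exactly one superblock high
    widths = set(tiles["tile_widths"])
    if len(widths) != 1 or next(iter(widths)) % sb != 0:
        return False  # equal widths, an integer number of superblocks wide
    max_tiles = (tiles["out_w_minus_1"] + 1) * (tiles["out_h_minus_1"] + 1)
    return tiles["tile_count_minus_1"] + 1 <= max_tiles
```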

Decode camera tile process

This process decodes a single tile within a frame.

The outputs of this process are arrays OutY, OutU, OutV containing the decoded samples for the tile.

Note: The decoding process defined here does not invoke the post-processing steps of deblock, cdef, superres, loop restoration and reference frame update. Implementations may choose to implement this process by using the general decode process with these tools disabled.

The process is specified as:

CurrentQIndex = base_q_idx
init_symbol( tile_data_size_minus_1 + 1 )
clear_above_context( )
sbSize = use_128x128_superblock ? BLOCK_128X128 : BLOCK_64X64
sbSize4 = Num_4x4_Blocks_Wide[ sbSize ]
MiRowStart = MiRowStarts[ anchor_tile_row ]
MiRowEnd = MiRowStarts[ anchor_tile_row + 1 ]
MiColStart = MiColStarts[ anchor_tile_col ]
MiColEnd = MiColStarts[ anchor_tile_col + 1 ]
for ( r = MiRowStart; r < MiRowEnd; r += sbSize4 ) {
    clear_left_context( )
    for ( c = MiColStart; c < MiColEnd; c += sbSize4 ) {
        ReadDeltas = delta_q_present
        clear_block_decoded_flags( c < ( MiColEnd - 1 ) )
        decode_partition( r, c, sbSize )
    }
}
exit_symbol( )
w = (MiColEnd - MiColStart) * MI_SIZE
h = (MiRowEnd - MiRowStart) * MI_SIZE
x0 = MiColStart * MI_SIZE
y0 = MiRowStart * MI_SIZE
subX = subsampling_x
subY = subsampling_y
xC0 = ( MiColStart * MI_SIZE ) >> subX
yC0 = ( MiRowStart * MI_SIZE ) >> subY

Note: The intention is that the same decoding process for tile data can be used as for the general decoding process.

It is a requirement of bitstream conformance that the following conditions are met whenever the parsing process returns from the read_ref_frames syntax:

  • RefFrame[ 0 ] is equal to LAST_FRAME

  • RefFrame[ 1 ] is equal to NONE

Note: Intra blocks are still allowed; they are not forbidden by this constraint because intra blocks do not invoke the read_ref_frames syntax.

Arrays OutY, OutU, OutV (representing the decoded samples for the tile) are specified as:

  • The array OutY is w samples across by h samples down and the sample at location x samples across and y samples down is given by OutY[ y ][ x ] = CurrFrame[ 0 ][ y0 + y ][ x0 + x ] with x = 0..w - 1 and y = 0..h - 1.

  • The array OutU is (w + subX) >> subX samples across by (h + subY) >> subY samples down and the sample at location x samples across and y samples down is given by OutU[ y ][ x ] = CurrFrame[ 1 ][ yC0 + y ][ xC0 + x ] with x = 0..(w >> subX) - 1 and y = 0..(h >> subY) - 1.

  • The array OutV is (w + subX) >> subX samples across by (h + subY) >> subY samples down and the sample at location x samples across and y samples down is given by OutV[ y ][ x ] = CurrFrame[ 2 ][ yC0 + y ][ xC0 + x ] with x = 0..(w >> subX) - 1 and y = 0..(h >> subY) - 1.

The outputs of this process are the arrays OutY, OutU, and OutV representing the Y, U, and V samples.

Decode frame wrapup process

This process is triggered by a call to decode_frame_wrapup from within the syntax tables.

At this stage, all the tile level decode has been done, and this process performs any frame level decode that is required.

If show_existing_frame is equal to 0, the process first performs any post processing filtering by the following ordered steps:

  1. If loop_filter_level[ 0 ] is not equal to 0 or loop_filter_level[ 1 ] is not equal to 0, the loop filter process specified in section 7.14 is invoked (this process modifies the contents of CurrFrame).

  2. The CDEF process specified in section 7.15 is invoked (this process takes CurrFrame and produces CdefFrame).

  3. The upscaling process specified in section 7.16 is invoked with CdefFrame as input and the output is assigned to UpscaledCdefFrame.

  4. The upscaling process specified in section 7.16 is invoked with CurrFrame as input and the output is assigned to UpscaledCurrFrame.

  5. The loop restoration process specified in section 7.17 is invoked (this process takes UpscaledCurrFrame and UpscaledCdefFrame and produces LrFrame).

  6. The motion field motion vector storage process specified in section 7.19 is invoked.

  7. If segmentation_enabled is equal to 1 and segmentation_update_map is equal to 0, SegmentIds[ row ][ col ] is set equal to PrevSegmentIds[ row ][ col ] for row = 0..MiRows-1, for col = 0..MiCols-1.

Otherwise (show_existing_frame is equal to 1), if frame_type is equal to KEY_FRAME, the reference frame loading process as specified in section 7.21 is invoked (this process loads frame state from the reference frames into the current frame state variables).

The following ordered steps now apply:

  1. The reference frame update process as specified in section 7.20 is invoked (this process saves the current frame state into the reference frames).

  2. If show_frame is equal to 1 or show_existing_frame is equal to 1, the output process as specified in section 7.18 is invoked (this will output the current frame or a saved frame).

Note: Although it is specified that all samples in CurrFrame are upscaled, at most 2 lines above and below each stripe (defined by StripeStartY and StripeEndY) will end up being read. Implementations may wish to avoid upscaling the unused lines.
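
The ordered filtering steps can be summarized as a pipeline. In this sketch the stage callables stand in for the processes of sections 7.14 to 7.17; the motion field and segment map bookkeeping (steps 6 and 7) are omitted, and all names are illustrative:

```python
def post_process(curr_frame, loop_filter_level, stages):
    """Apply the wrapup filtering steps for show_existing_frame == 0."""
    if loop_filter_level[0] != 0 or loop_filter_level[1] != 0:
        curr_frame = stages["loop_filter"](curr_frame)       # section 7.14
    cdef_frame = stages["cdef"](curr_frame)                  # section 7.15
    upscaled_cdef = stages["upscale"](cdef_frame)            # section 7.16
    upscaled_curr = stages["upscale"](curr_frame)            # section 7.16
    return stages["loop_restoration"](upscaled_curr, upscaled_cdef)  # 7.17
```

Note that loop restoration consumes both the upscaled CDEF output and the upscaled pre-CDEF frame, which is why both upscaling invocations are needed.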

Ordering of OBUs

A bitstream conforming to this specification consists of one or more coded video sequences.

A coded video sequence consists of one or more temporal units. A temporal unit consists of a series of OBUs starting from a temporal delimiter, optional sequence headers, optional metadata OBUs, a sequence of one or more frame headers, each followed by zero or more tile group OBUs as well as optional padding OBUs.

If scalability is not being used (OperatingPointIdc equal to 0), then all frames are part of the operating point. The following constraints must hold:

  • The first frame header must have frame_type equal to KEY_FRAME and show_frame equal to 1.

  • Each temporal unit must have exactly one shown frame.

If scalability is being used (OperatingPointIdc not equal to 0), then only a subset of frames are part of the operating point. For each operating point, the following constraints must hold:

  • The first frame header that will be decoded must have frame_type equal to KEY_FRAME and show_frame equal to 1.

  • Every layer that has a coded frame in a temporal unit must have exactly one shown frame that is the last frame of that layer in the temporal unit.

Note: A shown frame is either a frame with show_frame equal to 1, or with show_existing_frame equal to 1.

A frame header and its associated tile group OBUs within a temporal unit must use the same value of obu_extension_flag (i.e., either both include or both not include the optional OBU extension header).

All OBU extension headers that are contained in the same temporal unit and have the same spatial_id value must have the same temporal_id value.

If a coded video sequence contains at least one enhancement layer (OBUs with spatial_id greater than 0 or temporal_id greater than 0) then all frame headers and tile group OBUs associated with base (spatial_id equals 0 and temporal_id equals 0) and enhancement layer (spatial_id greater than 0 or temporal_id greater than 0) data must include the OBU extension header.

OBUs with spatial level IDs (spatial_id) greater than 0 must appear within a temporal unit in increasing order of the spatial level ID values.

The first temporal unit of a coded video sequence must contain one or more sequence header OBUs before the first frame header OBU.

Note: There is not a requirement that every temporal unit with a key frame also contains a sequence header, just that the sequence header has been sent before the first key frame. However, note that temporal units without sequence header OBUs are not considered to be random access points.

Sequence header OBUs may appear in any order within a coded video sequence. Within a particular coded video sequence, the contents of sequence_header_obu must be bit-identical each time the sequence header appears except for the contents of operating_parameters_info. A new coded video sequence is required if the sequence header parameters change. Any sequence header in a bitstream which changes the parameters must be contained in a temporal unit with temporal_id equal to zero.

If a temporal unit contains one or more sequence header OBUs, the first appearance of the sequence header OBU must be before the first frame header OBU.

One or more metadata and padding OBUs may appear in any order within an OBU sequence (unless constrained by semantics provided elsewhere in this specification). Specific metadata types may be required or recommended to be placed in specific locations, as identified in their corresponding definitions.

OBU types that are not defined in this specification can be ignored by a decoder.

Note: Some applications may choose to use bitstreams that are not fully conformant to the requirements described in this section. For example, a bitstream received in a streaming use case may never contain key frames, but instead rely on gradual intra refresh.

Random access decoding

General

In general, random access points are places in a bitstream where decoding can be started.

This section defines the types of random access point that must be supported by all conformant decoders.

The purpose of this section is to define a minimum level of functionality that must be supported, not a maximum. In other words, decoders may choose to support more types of random access point.

The random access points are defined in section 7.6.2.

The conformance requirements are specified in section 7.6.3.

The consequences for encoders are specified in section 7.6.4.

The consequences for decoders are specified in section 7.6.5.

Definitions

This section defines the following terms:

  • key frame random access point,

  • delayed random access point,

  • key frame dependent recovery point.

A key frame random access point is defined as being a frame:

  • with frame_type equal to KEY_FRAME

  • with show_frame equal to 1

  • that is contained in a temporal unit that also contains a sequence header OBU

A delayed random access point is defined as being a frame:

  • with frame_type equal to KEY_FRAME

  • with show_frame equal to 0

  • that is contained in a temporal unit that also contains a sequence header OBU

A key frame dependent recovery point is defined as being a frame:

  • with show_existing_frame equal to 1

  • with frame_to_show_map_idx specifying a frame to output that was a delayed random access point

Conformance requirements

Informally, the requirement for decoder conformance is that decoding can start at any key frame random access point or delayed random access point. The rest of this section makes this requirement more precise.

Starting at a key frame random access point is trivial, because if the earlier temporal units are dropped, the remaining temporal units still constitute a valid bitstream.

Starting at a delayed random access point is harder to define because:

  • if all temporal units before the key frame dependent recovery point are dropped, decoding is impossible (because the relevant delayed random access point has been dropped)

  • if all temporal units before the delayed random access point are dropped, it is unclear what should happen for frames between the delayed random access point and the key frame dependent recovery point (some applications may wish these to be dropped, while others may wish them to be displayed)

  • in either case, the remaining temporal units do not constitute a valid standalone bitstream (because the sequence does not start with a shown key frame)

To support the different modes of operation, a conformant decoder is required to be able to decode bitstreams consisting of:

  • a temporal unit containing a delayed random access point

  • immediately followed by a temporal unit containing the associated key frame dependent recovery point

  • followed by optional additional temporal units

This moves the responsibility for dropping the intermediate temporal units (between the delayed random access point and the key frame dependent recovery point) out of the normatively defined decoding process into application specific behavior. This allows applications to choose which behavior to use depending on the use case and capabilities of the specific decoder implementation.

Note: In practice, decoder implementations are expected to be able to start decoding bitstreams from a delayed random access point when the intermediate temporal units are still present. The decoder should correctly produce all output frames from the next key frame or key frame dependent recovery point onwards, while the preceding frames are implementation defined. For example:

  • a streaming decoder may choose to decode and display all frames even when the reference frames are not available (tolerating some errors in the output),

  • a low latency decoder may choose to decode and display all frames that are guaranteed to be correct (e.g. an inter frame that only uses inter prediction from the delayed random access point),

  • a media player decoder may choose to decode and display only frames starting from a key frame or key frame dependent recovery point (guaranteeing smooth playback once display starts).

Encoder consequences

Random access points introduce no additional conformance requirements on encoders.

Encoders are free to insert any number of random access points.

Decoder consequences

The conformance requirement means that conformant decoders must be able to start decoding at a delayed random access point partway through a valid bitstream.

This is almost the same as decoding a bitstream from the start; the only differences are that:

  • The first frame has show_frame equal to 0.

  • If frame_id_numbers_present_flag is equal to 1, for the first frame current_frame_id should not be compared to PrevFrameID (because PrevFrameID is uninitialized).

Frame end update CDF process

This process is triggered when the function frame_end_update_cdf is called from the tile group syntax table.

The frame CDF arrays are set equal to the saved CDF arrays as follows.

A copy is made of the saved CDF values for each of the CDF arrays mentioned in the semantics for init_coeff_cdfs and init_non_coeff_cdfs. The name of the destination for the copy is the name of the CDF array with no prefix. The name of the source for the copy is the name of the CDF array prefixed with “Saved”. For example, the array YModeCdf will be updated with values equal to the contents of SavedYModeCdf.
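
A minimal sketch of this copy-back convention, using a dictionary of named CDF arrays (the helper name and storage layout are illustrative, not part of this specification):

```python
import copy

def frame_end_update_cdf(cdfs):
    # For every array stored under a "Saved"-prefixed name, overwrite the
    # array whose name has the prefix removed, e.g. SavedYModeCdf -> YModeCdf.
    # Deep copies keep later in-place updates from disturbing the saved state.
    for name in list(cdfs):
        if name.startswith("Saved"):
            cdfs[name[len("Saved"):]] = copy.deepcopy(cdfs[name])
    return cdfs
```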

Set frame refs process

This process is triggered if the function set_frame_refs is called while reading the uncompressed header.

The syntax elements in the ref_frame_idx array are computed based on:

  • the syntax elements last_frame_idx and gold_frame_idx,
  • the values stored within the RefOrderHint array (these values represent the least significant bits of the expected output order of the frames).

The reference frames used for the LAST_FRAME and GOLDEN_FRAME references are sent explicitly and used to set the corresponding entries of ref_frame_idx as follows (the other entries are initialized to -1 and will be overwritten later in this process):

for ( i = 0; i < REFS_PER_FRAME; i++ )
  ref_frame_idx[ i ] = -1
ref_frame_idx[ LAST_FRAME - LAST_FRAME ] = last_frame_idx
ref_frame_idx[ GOLDEN_FRAME - LAST_FRAME ] = gold_frame_idx

An array usedFrame marking which reference frames have been used is prepared as follows:

for ( i = 0; i < NUM_REF_FRAMES; i++ )
  usedFrame[ i ] = 0
usedFrame[ last_frame_idx ] = 1
usedFrame[ gold_frame_idx ] = 1

A variable curFrameHint is set equal to 1 << (OrderHintBits - 1).

An array shiftedOrderHints (containing the expected output order shifted such that the current frame has hint equal to curFrameHint) is prepared as follows:

for ( i = 0; i < NUM_REF_FRAMES; i++ )
  shiftedOrderHints[ i ] = curFrameHint + get_relative_dist( RefOrderHint[ i ], OrderHint )
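
As a sketch, the wrap-around distance and the shift can be expressed in Python (get_relative_dist is specified elsewhere in this document; OrderHintBits is assumed to be 7 here):

```python
def get_relative_dist(a, b, order_hint_bits=7):
    # Signed difference a - b interpreted modulo 2^order_hint_bits
    # (sketch of get_relative_dist with enable_order_hint equal to 1).
    diff = a - b
    m = 1 << (order_hint_bits - 1)
    return (diff & (m - 1)) - (diff & m)

def shifted_order_hints(ref_order_hint, order_hint, order_hint_bits=7):
    # Shift each reference hint so the current frame sits at curFrameHint;
    # past frames land below it and future frames above it.
    cur_frame_hint = 1 << (order_hint_bits - 1)
    return [cur_frame_hint + get_relative_dist(h, order_hint, order_hint_bits)
            for h in ref_order_hint]
```

For example, with OrderHint equal to 10, a reference with hint 8 maps to 62 and one with hint 12 maps to 66, placing them on either side of curFrameHint equal to 64.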

The variable lastOrderHint (representing the expected output order for LAST_FRAME) is set equal to shiftedOrderHints[ last_frame_idx ].

It is a requirement of bitstream conformance that lastOrderHint is strictly less than curFrameHint.

The variable goldOrderHint (representing the expected output order for GOLDEN_FRAME) is set equal to shiftedOrderHints[ gold_frame_idx ].

It is a requirement of bitstream conformance that goldOrderHint is strictly less than curFrameHint.

The ALTREF_FRAME reference is set to be a backward reference to the frame with highest output order as follows:

ref = find_latest_backward()
if ( ref >= 0 ) {
  ref_frame_idx[ ALTREF_FRAME - LAST_FRAME ] = ref
  usedFrame[ ref ] = 1
}

where find_latest_backward is defined as:

find_latest_backward() {
  ref = -1
  for ( i = 0; i < NUM_REF_FRAMES; i++ ) {
    hint = shiftedOrderHints[ i ]
    if ( !usedFrame[ i ] && 
         hint >= curFrameHint &&
         ( ref < 0 || hint >= latestOrderHint ) ) {
      ref = i
      latestOrderHint = hint
    }
  }
  return ref
}

The BWDREF_FRAME reference is set to be a backward reference to the closest frame as follows:

ref = find_earliest_backward()
if ( ref >= 0 ) {
  ref_frame_idx[ BWDREF_FRAME - LAST_FRAME ] = ref
  usedFrame[ ref ] = 1
}

where find_earliest_backward is defined as:

find_earliest_backward() {
  ref = -1
  for ( i = 0; i < NUM_REF_FRAMES; i++ ) {
    hint = shiftedOrderHints[ i ]
    if ( !usedFrame[ i ] && 
         hint >= curFrameHint &&
         ( ref < 0 || hint < earliestOrderHint ) ) {
      ref = i
      earliestOrderHint = hint
    }
  }
  return ref
}

The ALTREF2_FRAME reference is set to the next closest backward reference as follows:

ref = find_earliest_backward()
if ( ref >= 0 ) {
  ref_frame_idx[ ALTREF2_FRAME - LAST_FRAME ] = ref
  usedFrame[ ref ] = 1
}

The remaining references are set to be forward references in anti-chronological order as follows:

for ( i = 0; i < REFS_PER_FRAME - 2; i++ ) {
  refFrame = Ref_Frame_List[ i ]
  if ( ref_frame_idx[ refFrame - LAST_FRAME ] < 0 ) {
    ref = find_latest_forward()
    if ( ref >= 0 ) {
      ref_frame_idx[ refFrame - LAST_FRAME ] = ref
      usedFrame[ ref ] = 1
    }
  }
}

where Ref_Frame_List is specified as:

Ref_Frame_List[ REFS_PER_FRAME - 2 ] = {
  LAST2_FRAME, LAST3_FRAME, BWDREF_FRAME, ALTREF2_FRAME, ALTREF_FRAME
}

and find_latest_forward is defined as:

find_latest_forward() {
  ref = -1
  for ( i = 0; i < NUM_REF_FRAMES; i++ ) {
    hint = shiftedOrderHints[ i ]
    if ( !usedFrame[ i ] && 
         hint < curFrameHint &&
         ( ref < 0 || hint >= latestOrderHint ) ) {
      ref = i
      latestOrderHint = hint
    }
  }
  return ref
}
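
The three search functions above differ only in which side of curFrameHint they scan and how ties are broken; a parameterized sketch (the helper name and signature are illustrative) makes the shared structure explicit:

```python
def find_ref(shifted_hints, used, cur_frame_hint, backward, latest):
    # backward selects frames with hint >= cur_frame_hint (future frames);
    # otherwise frames with hint < cur_frame_hint (past frames).
    # latest keeps the largest hint, with ties going to the highest index
    # (mirroring the spec's use of >=); otherwise the smallest hint wins
    # and ties go to the lowest index (mirroring the strict <).
    ref, best = -1, None
    for i, hint in enumerate(shifted_hints):
        if used[i]:
            continue
        if backward != (hint >= cur_frame_hint):
            continue
        if ref < 0 or (hint >= best if latest else hint < best):
            ref, best = i, hint
    return ref
```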

Finally, any remaining references are set to the reference frame with smallest output order as follows:

ref = -1
for ( i = 0; i < NUM_REF_FRAMES; i++ ) {
  hint = shiftedOrderHints[ i ]
  if ( ref < 0 || hint < earliestOrderHint ) {
    ref = i
    earliestOrderHint = hint
  }
}
for ( i = 0; i < REFS_PER_FRAME; i++ ) {
  if ( ref_frame_idx[ i ] < 0 ) {
    ref_frame_idx[ i ] = ref
  }
}

Note: Multiple reference frames can share the same value for OrderHint and care needs to be taken to handle this case consistently. The reference implementation uses an equivalent approach that sorts the reference frames by their expected output order, with ties broken by reference frame index.

Motion field estimation process

General

This process is triggered by a call to motion_field_estimation while reading the uncompressed header.

A linear projection model is employed to create a motion field estimation that is able to capture high velocity temporal motion trajectories.

The motion field is estimated based on the saved motion vectors from the reference frames and the relative frame distances.

As the frame distances depend on the frame being referenced, a separate motion field is estimated for each reference frame used by the current frame.

A motion vector (for each reference frame type) is prepared at each location on an 8x8 luma sample grid.

The variable w8 (representing the width of the motion field in units of 8x8 luma samples) is set equal to MiCols >> 1.

The variable h8 (representing the height of the motion field in units of 8x8 luma samples) is set equal to MiRows >> 1.

As the linear projection can create a field with holes, the motion fields are initialized to an invalid motion vector of -32768, -32768 as follows:

for ( ref = LAST_FRAME; ref <= ALTREF_FRAME; ref++ )
    for ( y = 0; y < h8 ; y++ )
        for ( x = 0; x < w8; x++ )
            for ( j = 0; j < 2; j++ )
                MotionFieldMvs[ ref ][ y ][ x ][ j ] = -1 << 15

The variable lastIdx (representing which reference frame is used for LAST_FRAME) is set equal to ref_frame_idx[ 0 ].

The variable curGoldOrderHint (representing the expected output order for GOLDEN_FRAME of the current frame) is set equal to OrderHints[ GOLDEN_FRAME ].

The variable lastAltOrderHint (representing the expected output order for ALTREF_FRAME of LAST_FRAME) is set equal to SavedOrderHints[ lastIdx ][ ALTREF_FRAME ].

The variable useLast (representing whether to project the motion vectors from LAST_FRAME) is set equal to ( lastAltOrderHint != curGoldOrderHint ).

If useLast is equal to 1, the projection process in section 7.9.2 is invoked with src equal to LAST_FRAME, and dstSign equal to -1. (The output of this process is discarded.)

The variable refStamp (that limits how many reference frames have to be projected) is set equal to MFMV_STACK_SIZE - 2.

The variable useBwd is set equal to get_relative_dist( OrderHints[ BWDREF_FRAME ], OrderHint ) > 0.

If useBwd is equal to 1, the following steps apply:

  • The projection process in section 7.9.2 is invoked with src equal to BWDREF_FRAME, and dstSign equal to 1, and the output assigned to projOutput.

  • If projOutput is equal to 1, refStamp is set equal to refStamp - 1.

The variable useAlt2 is set equal to get_relative_dist( OrderHints[ ALTREF2_FRAME ], OrderHint ) > 0.

If useAlt2 is equal to 1, the following steps apply:

  • The projection process in section 7.9.2 is invoked with src equal to ALTREF2_FRAME, and dstSign equal to 1, and the output assigned to projOutput.

  • If projOutput is equal to 1, refStamp is set equal to refStamp - 1.

The variable useAlt is set equal to get_relative_dist( OrderHints[ ALTREF_FRAME ], OrderHint ) > 0.

If useAlt is equal to 1 and (refStamp >= 0), the following steps apply:

  • The projection process in section 7.9.2 is invoked with src equal to ALTREF_FRAME, and dstSign equal to 1, and the output assigned to projOutput.

  • If projOutput is equal to 1, refStamp is set equal to refStamp - 1.

If ( refStamp >= 0 ), the projection process in section 7.9.2 is invoked with src equal to LAST2_FRAME, and dstSign equal to -1. (The output of this process is discarded.)

Projection process

The inputs to this process are:

  • a variable src specifying which reference frame’s motion vectors should be projected,

  • a variable dstSign specifying a negation multiplier for the motion vector direction.

The process projects the motion vectors from a whole reference frame and stores the results in MotionFieldMvs.

The process outputs a single boolean value representing whether the source frame was valid for this operation. If the output is zero, no modification is made to MotionFieldMvs.

The variable srcIdx (representing which reference frame is used) is set equal to ref_frame_idx[ src - LAST_FRAME ].

The variable w8 (representing the width of the motion field in units of 8x8 luma samples) is set equal to MiCols >> 1.

The variable h8 (representing the height of the motion field in units of 8x8 luma samples) is set equal to MiRows >> 1.

If RefMiRows[ srcIdx ] is not equal to MiRows, RefMiCols[ srcIdx ] is not equal to MiCols, or RefFrameType[ srcIdx ] is equal to INTRA_ONLY_FRAME or KEY_FRAME, the process exits at this point, with the output set equal to 0.

The process is specified as follows:

for ( y8 = 0; y8 < h8; y8++ ) {
    for ( x8 = 0; x8 < w8; x8++ ) {
        row = 2 * y8 + 1
        col = 2 * x8 + 1
        srcRef = SavedRefFrames[ srcIdx ][ row ][ col ]
        if ( srcRef > INTRA_FRAME ) {
            refToCur = get_relative_dist( OrderHints[ src ], OrderHint )
            refOffset = get_relative_dist( OrderHints[ src ], SavedOrderHints[ srcIdx ][ srcRef ] )
            posValid = Abs( refToCur ) <= MAX_FRAME_DISTANCE &&
                       Abs( refOffset ) <= MAX_FRAME_DISTANCE &&
                       refOffset > 0
            if ( posValid ) {
                mv = SavedMvs[ srcIdx ][ row ][ col ]
                projMv = get_mv_projection( mv, refToCur * dstSign, refOffset )
                posValid = get_block_position( x8, y8, dstSign, projMv )
                if ( posValid ) {
                    for ( dst = LAST_FRAME; dst <= ALTREF_FRAME; dst++ ) {
                        refToDst = get_relative_dist( OrderHint, OrderHints[ dst ] )
                        projMv = get_mv_projection( mv, refToDst, refOffset )
                        MotionFieldMvs[ dst ][ PosY8 ][ PosX8 ] = projMv
                    }
                }
            }
        }
    }
}

When the function get_mv_projection is called, the get mv projection process specified in section 7.9.3 is invoked and the output assigned to projMv.

When the function get_block_position is called, the get block position process specified in section 7.9.4 is invoked and the output assigned to posValid. This process also sets up the variables PosY8 and PosX8 representing the projected location in the motion field.

The process now exits with the output set equal to 1.

Get MV projection process

The inputs to this process are:

  • a length 2 array mv specifying a motion vector,

  • a variable numerator specifying the number of frames to be covered by the projected motion vector,

  • a variable denominator specifying the number of frames covered by the original motion vector.

The outputs of this process are:

  • a length 2 array projMv containing the projected motion vector

This process starts with a motion vector mv from a previous frame. This motion vector gives the displacement expected when moving a certain number of frames (given by the variable denominator). In order to use the motion vector for predictions using a different reference frame, the length of the motion vector must be scaled.

The variable clippedDenominator is set equal to Min( MAX_FRAME_DISTANCE, denominator ).

The variable clippedNumerator is set equal to Clip3( -MAX_FRAME_DISTANCE, MAX_FRAME_DISTANCE, numerator ).

The projected motion vector is specified as follows:

for ( i = 0; i < 2; i++ ) {
    scaled = Round2Signed( mv[ i ] * clippedNumerator * Div_Mult[ clippedDenominator ], 14 )
    projMv[ i ] = Clip3( -(1 << 14) + 1, (1 << 14) - 1, scaled )
}

where Div_Mult is a constant lookup table specified as:

Div_Mult[32] = {
  0,    16384, 8192, 5461, 4096, 3276, 2730, 2340, 2048, 1820, 1638,
  1489, 1365,  1260, 1170, 1092, 1024, 963,  910,  862,  819,  780,
  744,  712,   682,  655,  630,  606,  585,  564,  546,  528
}
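
Each non-zero entry Div_Mult[ d ] equals 16384 / d rounded down, so the multiply-and-shift approximates division by the clipped denominator. A runnable sketch of the projection (names mirror the spec; Round2Signed is specified elsewhere in this document):

```python
MAX_FRAME_DISTANCE = 31
DIV_MULT = [0] + [16384 // d for d in range(1, 32)]  # matches the table above

def round2_signed(x, n):
    # Round2Signed: round the magnitude to n fewer bits, preserving sign.
    r = (abs(x) + (1 << (n - 1))) >> n
    return r if x >= 0 else -r

def get_mv_projection(mv, numerator, denominator):
    den = min(MAX_FRAME_DISTANCE, denominator)
    num = max(-MAX_FRAME_DISTANCE, min(MAX_FRAME_DISTANCE, numerator))
    return [max(-(1 << 14) + 1,
                min((1 << 14) - 1, round2_signed(c * num * DIV_MULT[den], 14)))
            for c in mv]
```

For example, projecting mv equal to [ 64, -32 ] from a distance of 4 frames onto a distance of 2 frames halves each component.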

Get block position process

The inputs to this process are:

  • variables x8 and y8 specifying a location in units of 8x8 luma samples,

  • a variable dstSign specifying a negation multiplier for the motion vector direction,

  • a length 2 array projMv specifying a projected motion vector.

The process generates global variables PosX8 and PosY8 representing the projected location in units of 8x8 luma samples.

The process returns a flag posValid that indicates if the position should be used.

Note: posValid is specified such that only blocks within a certain distance of the current location need to be projected.

The variable posValid is set equal to 1.

The variable PosY8 is set equal to project(y8, projMv[ 0 ], dstSign, MiRows >> 1, MAX_OFFSET_HEIGHT).

The variable PosX8 is set equal to project(x8, projMv[ 1 ], dstSign, MiCols >> 1, MAX_OFFSET_WIDTH).

where the function project is specified as follows:

project( v8, delta, dstSign, max8, maxOff8 ) {
    base8 = (v8 >> 3) << 3
    if ( delta >= 0 ) {
        offset8 = delta >> ( 3 + 1 + MI_SIZE_LOG2 )
    } else {
        offset8 = -( ( -delta ) >> ( 3 + 1 + MI_SIZE_LOG2 ) )
    }
    v8 += dstSign * offset8
    if ( v8 < 0 || 
         v8 >= max8 || 
         v8 < base8 - maxOff8 ||
         v8 >= base8 + 8 + maxOff8 ) {
        posValid = 0
    }
    return v8
}

The project function clears posValid if the resulting position is offset too far.
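
A direct Python transcription of project may help in testing the clipping behavior (it returns posValid alongside the position instead of using a global variable):

```python
MI_SIZE_LOG2 = 2  # MI_SIZE is 4 luma samples

def project(v8, delta, dst_sign, max8, max_off8):
    # delta is a motion vector component in 1/8 luma sample units; the
    # shift by 3 + 1 + MI_SIZE_LOG2 = 6 converts it to 8x8 sample units.
    base8 = (v8 >> 3) << 3
    if delta >= 0:
        offset8 = delta >> (3 + 1 + MI_SIZE_LOG2)
    else:
        offset8 = -((-delta) >> (3 + 1 + MI_SIZE_LOG2))
    v8 += dst_sign * offset8
    pos_valid = not (v8 < 0 or v8 >= max8 or
                     v8 < base8 - max_off8 or v8 >= base8 + 8 + max_off8)
    return v8, pos_valid
```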

Motion vector prediction processes

General

The following sections define the processes used for predicting the motion vectors.

The entry point to these processes is triggered by the function call to find_mv_stack in the inter block mode info syntax described in section 5.11.23. This function call invokes the Find MV Stack Process specified in section 7.10.2.

Find MV stack process

This process is triggered by a function call to find_mv_stack.

The input to this process is a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction.

This process constructs an array RefStackMv containing motion vector candidates.

The process also prepares the value of the contexts used when decoding inter prediction syntax elements.

The array RefStackMv will be constructed during this process. RefStackMv[ idx ][ list ][ comp ] represents component comp (0 for y or 1 for x) of a motion vector for a particular list (0 or 1) at position idx (0 to MAX_REF_MV_STACK_SIZE - 1) in the stack. No initialization is needed because each entry is always written before it can be read.

The variable bw4 specifying the width of the block in 4x4 luma samples is set equal to Num_4x4_Blocks_Wide[ MiSize ].

The variable bh4 specifying the height of the block in 4x4 luma samples is set equal to Num_4x4_Blocks_High[ MiSize ].

The following ordered steps apply:

  1. The variable NumMvFound (representing the number of motion vector candidates in RefStackMv) is set equal to 0.

  2. The variable NewMvCount (representing the number of candidates found that used NEWMV encoding) is set equal to 0.

  3. The setup global mv process specified in section 7.10.2.1 is invoked with the input 0 and the output is assigned to GlobalMvs[ 0 ].

  4. If isCompound is equal to 1, the setup global mv process specified in section 7.10.2.1 is invoked with the input 1 and the output is assigned to GlobalMvs[ 1 ].

  5. The variable FoundMatch is set equal to 0.

  6. The scan row process in section 7.10.2.2 is invoked with deltaRow equal to -1 and isCompound as inputs.

  7. The variable foundAboveMatch is set equal to FoundMatch, and FoundMatch is set equal to 0.

  8. The scan col process in section 7.10.2.3 is invoked with deltaCol equal to -1 and isCompound as inputs.

  9. The variable foundLeftMatch is set equal to FoundMatch, and FoundMatch is set equal to 0.

  10. If Max( bw4, bh4 ) is less than or equal to 16, the scan point process in section 7.10.2.4 is invoked with deltaRow equal to -1, deltaCol equal to bw4, and isCompound as inputs.

  11. If FoundMatch is equal to 1, the variable foundAboveMatch is set equal to 1.

  12. The variable CloseMatches (representing candidates found in the immediate neighborhood) is set equal to foundAboveMatch + foundLeftMatch.

  13. The variable numNearest (representing the number of motion vectors found in the immediate neighborhood) is set equal to NumMvFound.

  14. The variable numNew (representing the number of times a NEWMV candidate was found in the immediate neighborhood) is set equal to NewMvCount.

  15. If numNearest is greater than 0, WeightStack[ idx ] is incremented by REF_CAT_LEVEL for idx = 0..(numNearest-1).

  16. The variable ZeroMvContext is set equal to 0.

  17. If use_ref_frame_mvs is equal to 1, the temporal scan process in section 7.10.2.5 is invoked with isCompound as input (the temporal scan process affects ZeroMvContext).

  18. The scan point process in section 7.10.2.4 is invoked with deltaRow equal to -1, deltaCol equal to -1, and isCompound as inputs.

  19. If FoundMatch is equal to 1, the variable foundAboveMatch is set equal to 1.

  20. The variable FoundMatch is set equal to 0.

  21. The scan row process in section 7.10.2.2 is invoked with deltaRow equal to -3 and isCompound as inputs.

  22. If FoundMatch is equal to 1, the variable foundAboveMatch is set equal to 1.

  23. The variable FoundMatch is set equal to 0.

  24. The scan col process in section 7.10.2.3 is invoked with deltaCol equal to -3 and isCompound as inputs.

  25. If FoundMatch is equal to 1, the variable foundLeftMatch is set equal to 1.

  26. The variable FoundMatch is set equal to 0.

  27. If bh4 is greater than 1, the scan row process in section 7.10.2.2 is invoked with deltaRow equal to -5 and isCompound as inputs.

  28. If FoundMatch is equal to 1, the variable foundAboveMatch is set equal to 1.

  29. The variable FoundMatch is set equal to 0.

  30. If bw4 is greater than 1, the scan col process in section 7.10.2.3 is invoked with deltaCol equal to -5 and isCompound as inputs.

  31. If FoundMatch is equal to 1, the variable foundLeftMatch is set equal to 1.

  32. The variable TotalMatches (representing all found candidates) is set equal to foundAboveMatch + foundLeftMatch.

  33. The sorting process in section 7.10.2.11 is invoked with start equal to 0, end equal to numNearest, and isCompound as input.

  34. The sorting process in section 7.10.2.11 is invoked with start equal to numNearest, end equal to NumMvFound, and isCompound as input.

  35. If NumMvFound is less than 2, the extra search process in section 7.10.2.12 is invoked with isCompound as input.

  36. The context and clamping process in section 7.10.2.14 is invoked with isCompound and numNew as input.

Setup global MV process

The input to this process is a variable refList specifying which set of motion vectors to predict.

The output is a motion vector mv representing global motion for this block.

The variable ref (specifying the reference frame) is set equal to RefFrame[ refList ].

If ref is not equal to INTRA_FRAME, the variable typ (specifying the type of global motion) is set equal to GmType[ ref ].

The variable bw (representing the width of the block in units of luma samples) is set equal to Block_Width[ MiSize ].

The variable bh (representing the height of the block in units of luma samples) is set equal to Block_Height[ MiSize ].

The output motion vector mv is specified by projecting the central luma sample of the block as follows:

if ( ref == INTRA_FRAME || typ == IDENTITY ) {
    mv[0] = 0
    mv[1] = 0
} else if ( typ == TRANSLATION ) {
    mv[0] = gm_params[ref][0] >> (WARPEDMODEL_PREC_BITS - 3)
    mv[1] = gm_params[ref][1] >> (WARPEDMODEL_PREC_BITS - 3)
} else {
    x = MiCol * MI_SIZE + bw / 2 - 1
    y = MiRow * MI_SIZE + bh / 2 - 1
    xc = (gm_params[ref][2] - (1 << WARPEDMODEL_PREC_BITS)) * x +
          gm_params[ref][3] * y +
          gm_params[ref][0]
    yc =  gm_params[ref][4] * x +
         (gm_params[ref][5] - (1 << WARPEDMODEL_PREC_BITS)) * y +
          gm_params[ref][1]
    if ( allow_high_precision_mv ) {
      mv[0] = Round2Signed(yc, WARPEDMODEL_PREC_BITS - 3)
      mv[1] = Round2Signed(xc, WARPEDMODEL_PREC_BITS - 3)
    } else {
      mv[0] = Round2Signed(yc, WARPEDMODEL_PREC_BITS - 2) * 2
      mv[1] = Round2Signed(xc, WARPEDMODEL_PREC_BITS - 2) * 2
    }
}
lower_mv_precision( mv )

where the call to lower_mv_precision invokes the lower precision process specified in section 7.10.2.10.
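
A sketch of the projection arithmetic (WARPEDMODEL_PREC_BITS is 16; the string-valued typ, the folding of the INTRA_FRAME case into IDENTITY, and the omission of lower_mv_precision are simplifications for illustration):

```python
WARPEDMODEL_PREC_BITS = 16

def round2_signed(x, n):
    r = (abs(x) + (1 << (n - 1))) >> n
    return r if x >= 0 else -r

def global_mv(gm_params, typ, mi_row, mi_col, bw, bh, allow_high_precision_mv):
    # Project the central luma sample through the global model; the result
    # is in (row, col) order in units of 1/8 luma sample.
    if typ == "IDENTITY":
        return [0, 0]
    if typ == "TRANSLATION":
        return [gm_params[0] >> (WARPEDMODEL_PREC_BITS - 3),
                gm_params[1] >> (WARPEDMODEL_PREC_BITS - 3)]
    x = mi_col * 4 + bw // 2 - 1  # MI_SIZE is 4
    y = mi_row * 4 + bh // 2 - 1
    xc = ((gm_params[2] - (1 << WARPEDMODEL_PREC_BITS)) * x +
          gm_params[3] * y + gm_params[0])
    yc = (gm_params[4] * x +
          (gm_params[5] - (1 << WARPEDMODEL_PREC_BITS)) * y + gm_params[1])
    if allow_high_precision_mv:
        return [round2_signed(yc, WARPEDMODEL_PREC_BITS - 3),
                round2_signed(xc, WARPEDMODEL_PREC_BITS - 3)]
    return [round2_signed(yc, WARPEDMODEL_PREC_BITS - 2) * 2,
            round2_signed(xc, WARPEDMODEL_PREC_BITS - 2) * 2]
```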

Scan row process

The inputs to this process are:

  • a variable deltaRow specifying (in units of 4x4 luma samples) how far above to look for motion vectors,

  • a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction.

The variable bw4 specifying the width of the block in 4x4 luma samples is set equal to Num_4x4_Blocks_Wide[ MiSize ].

The variable end4 specifying the last block to be scanned in horizontal 4x4 luma samples is set equal to Min( Min( bw4, MiCols - MiCol ), 16 ).

Note: end4 limits the number of locations to be searched for large blocks. There is a similar optimization in that the scan point process for the top-right location is not invoked for large blocks. For example, for a 64 by 64 block, candidates from the row above will be examined at x offsets of -1, 0, 16, 32, 48, 64. (The 0, 16, 32, 48 locations are scanned in this process, while the -1 and 64 are scanned by the scan point process.) However, for a 128 by 64 or 64 by 128 block, candidates from the row above will only be examined at x offsets of -1, 0, 16, 32, 48 because the scan point process for the top-right location is not invoked.

The variable deltaCol is set equal to 0.

The variable useStep16 is set equal to (bw4 >= 16).

Note: useStep16 is equal to 1 when the block is 64 luma samples wide or wider. This means only 4 locations will be searched in this case. However, a 32 luma samples wide block may still search 8 locations.

If Abs(deltaRow) is greater than 1, the offset is adjusted as follows:

deltaRow += MiRow & 1
deltaCol = 1 - (MiCol & 1)

Note: These adjustments reduce the number of motion vectors that need to be kept in memory.

A series of motion vector locations is scanned as follows:

i = 0
while ( i < end4 ) {
    mvRow = MiRow + deltaRow
    mvCol = MiCol + deltaCol + i
    if ( !is_inside(mvRow,mvCol) )
        break
    len = Min(bw4, Num_4x4_Blocks_Wide[ MiSizes[ mvRow ][ mvCol ] ])
    if ( Abs(deltaRow) > 1 )
        len = Max(2, len)
    if ( useStep16 )
        len = Max(4, len)
    weight = len * 2
    add_ref_mv_candidate( mvRow, mvCol, isCompound, weight)
    i += len
}

where the call to add_ref_mv_candidate invokes the process in section 7.10.2.7.
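
The stepping above can be sketched as follows (neighbor_w4 is a hypothetical helper returning the width, in 4x4 units, of the neighbor block covering a given column; the deltaRow/deltaCol adjustments and the Abs(deltaRow) > 1 clamp are omitted for brevity):

```python
def scan_row_steps(bw4, mi_cols, mi_col, neighbor_w4):
    # Walk the row above: each iteration consumes one neighbor block and
    # weights its candidate by twice the clamped length, so a wide
    # neighbor contributes a single, heavier candidate.
    end4 = min(min(bw4, mi_cols - mi_col), 16)
    use_step16 = bw4 >= 16
    steps = []
    i = 0
    while i < end4:
        length = min(bw4, neighbor_w4(mi_col + i))
        if use_step16:
            length = max(4, length)
        steps.append((i, length * 2))  # (column offset, weight)
        i += length
    return steps
```

For example, with uniform 8-sample-wide neighbors (2 units each), an 8-unit-wide block scans four positions, while a 16-unit-wide block scans four positions 4 units apart because of useStep16.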

Scan col process

The inputs to this process are:

  • a variable deltaCol specifying (in units of 4x4 luma samples) how far left to look for motion vectors,

  • a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction.

The variable bh4 specifying the height of the block in 4x4 luma samples is set equal to Num_4x4_Blocks_High[ MiSize ].

The variable end4 specifying the last block to be scanned in vertical 4x4 luma samples is set equal to Min( Min( bh4, MiRows - MiRow ), 16 ).

The variable deltaRow is set equal to 0.

The variable useStep16 is set equal to (bh4 >= 16).

If Abs(deltaCol) is greater than 1, the offset is adjusted as follows:

deltaRow = 1 - (MiRow & 1)
deltaCol += MiCol & 1

A series of motion vector locations is scanned as follows:

i = 0
while ( i < end4 ) {
    mvRow = MiRow + deltaRow + i
    mvCol = MiCol + deltaCol
    if ( !is_inside(mvRow,mvCol) )
        break
    len = Min(bh4, Num_4x4_Blocks_High[ MiSizes[ mvRow ][ mvCol ] ])
    if ( Abs(deltaCol) > 1 )
        len = Max(2, len)
    if ( useStep16 )
        len = Max(4, len)
    weight = len * 2
    add_ref_mv_candidate( mvRow, mvCol, isCompound, weight )
    i += len
}

where the call to add_ref_mv_candidate invokes the process in section 7.10.2.7.

Scan point process

The inputs to this process are:

  • a variable deltaRow specifying (in units of 4x4 luma samples) how far above to look for a motion vector,

  • a variable deltaCol specifying (in units of 4x4 luma samples) how far left to look for a motion vector,

  • a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction.

The variable mvRow is set equal to MiRow + deltaRow.

The variable mvCol is set equal to MiCol + deltaCol.

The variable weight is set equal to 4.

If is_inside( mvRow, mvCol ) is equal to 1 and RefFrames[ mvRow ][ mvCol ][ 0 ] has been written for this frame (this checks that the candidate location has been decoded), the add reference motion vector process in section 7.10.2.7 is invoked with mvRow, mvCol, isCompound, weight as inputs.

Temporal scan process

The input to this process is a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction.

This process scans the motion vectors in a previous frame looking for candidates which use the same reference frame.

The variable bw4 specifying the width of the block in 4x4 luma samples is set equal to Num_4x4_Blocks_Wide[ MiSize ].

The variable bh4 specifying the height of the block in 4x4 luma samples is set equal to Num_4x4_Blocks_High[ MiSize ].

The variable stepW4 is set equal to ( bw4 >= 16 ) ? 4 : 2.

The variable stepH4 is set equal to ( bh4 >= 16 ) ? 4 : 2.

The process scans the locations within the block as follows:

for ( deltaRow = 0; deltaRow < Min( bh4, 16 ) ; deltaRow += stepH4 ) {
    for ( deltaCol = 0; deltaCol < Min( bw4, 16 ) ; deltaCol += stepW4 ) {
        add_tpl_ref_mv( deltaRow, deltaCol, isCompound)
    }
}

where the call to add_tpl_ref_mv invokes the temporal sample process in section 7.10.2.6.
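
The inner grid of scanned offsets can be sketched as follows (the (deltaRow, deltaCol) offsets only; the extension positions scanned around the block are not included):

```python
def temporal_scan_positions(bw4, bh4):
    # Offsets (deltaRow, deltaCol) scanned inside the block: step 2 for
    # small blocks, step 4 once a dimension reaches 16 units (64 luma
    # samples), never ranging beyond 16 units in either direction.
    step_w4 = 4 if bw4 >= 16 else 2
    step_h4 = 4 if bh4 >= 16 else 2
    return [(dr, dc)
            for dr in range(0, min(bh4, 16), step_h4)
            for dc in range(0, min(bw4, 16), step_w4)]
```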

The process then scans positions around the block (but still within the same superblock) as follows:

allowExtension = ((bh4 >= Num_4x4_Blocks_High[BLOCK_8X8]) && 
                  (bh4 < Num_4x4_Blocks_High[BLOCK_64X64]) &&
                  (bw4 >= Num_4x4_Blocks_Wide[BLOCK_8X8]) &&
                  (bw4 < Num_4x4_Blocks_Wide[BLOCK_64X64]))
if ( allowExtension ) {
    for ( i = 0; i < 3; i++ ) {
        deltaRow = tplSamplePos[ i ][ 0 ]
        deltaCol = tplSamplePos[ i ][ 1 ]
        if ( check_sb_border( deltaRow, deltaCol ) ) { 
            add_tpl_ref_mv( deltaRow, deltaCol, isCompound)
        }
    }
}

where tplSamplePos contains the offsets to search (in units of 4x4 luma samples) and is specified as:

tplSamplePos[3][2] = {
    { bh4, -2 }, { bh4, bw4 }, { bh4 - 2, bw4 }
}

and check_sb_border checks that the position is within the same 64x64 block as follows:

check_sb_border( deltaRow, deltaCol ) {
    row = (MiRow & 15) + deltaRow 
    col = (MiCol & 15) + deltaCol
    
    return ( row >= 0 && row < 16 && col >= 0 && col < 16 )
}
Temporal sample process

The inputs to this process are:

  • variables deltaRow and deltaCol specifying (in units of 4x4 luma samples) the offset to the candidate location,

  • a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction.

This process looks up a motion vector from the motion field and adds it into the stack.

The variable mvRow is set equal to (MiRow + deltaRow) | 1.

The variable mvCol is set equal to (MiCol + deltaCol) | 1.

If is_inside( mvRow, mvCol ) is equal to 0, this process terminates immediately.

The variable x8 is set equal to mvCol >> 1.

The variable y8 is set equal to mvRow >> 1.

(x8 and y8 represent the position of the candidate in units of 8x8 luma samples.)

The process is specified as follows:

if ( deltaRow == 0 && deltaCol == 0 ) {
    ZeroMvContext = 1
}
if ( !isCompound ) {
    candMv = MotionFieldMvs[ RefFrame[ 0 ] ][ y8 ][ x8 ]
    if ( candMv[ 0 ] == -1 << 15 )
        return
    lower_mv_precision( candMv )
    if ( deltaRow == 0 && deltaCol == 0 ) {
        if ( Abs( candMv[ 0 ] - GlobalMvs[ 0 ][ 0 ] ) >= 16 || 
             Abs( candMv[ 1 ] - GlobalMvs[ 0 ][ 1 ] ) >= 16 )
            ZeroMvContext = 1
        else
            ZeroMvContext = 0
    }
    for ( idx = 0; idx < NumMvFound; idx++ ) {
        if ( candMv[ 0 ] == RefStackMv[ idx ][ 0 ][ 0 ] &&
             candMv[ 1 ] == RefStackMv[ idx ][ 0 ][ 1 ] )
             break
    }
    if ( idx < NumMvFound ) {
        WeightStack[ idx ] += 2
    } else if ( NumMvFound < MAX_REF_MV_STACK_SIZE ) {
        RefStackMv[ NumMvFound ][ 0 ] = candMv
        WeightStack[ NumMvFound ] = 2
        NumMvFound += 1
    }
} else {
    candMv0 = MotionFieldMvs[ RefFrame[ 0 ] ][ y8 ][ x8 ]
    if ( candMv0[ 0 ] == -1 << 15 )
        return
    candMv1 = MotionFieldMvs[ RefFrame[ 1 ] ][ y8 ][ x8 ]
    if ( candMv1[ 0 ] == -1 << 15 )
        return
    lower_mv_precision( candMv0 )
    lower_mv_precision( candMv1 )
    if ( deltaRow == 0 && deltaCol == 0 ) {
        if ( Abs( candMv0[ 0 ] - GlobalMvs[ 0 ][ 0 ] ) >= 16 || 
             Abs( candMv0[ 1 ] - GlobalMvs[ 0 ][ 1 ] ) >= 16 ||
             Abs( candMv1[ 0 ] - GlobalMvs[ 1 ][ 0 ] ) >= 16 || 
             Abs( candMv1[ 1 ] - GlobalMvs[ 1 ][ 1 ] ) >= 16 )
            ZeroMvContext = 1
        else
            ZeroMvContext = 0
    }
    for ( idx = 0; idx < NumMvFound; idx++ ) {
        if ( candMv0[ 0 ] == RefStackMv[ idx ][ 0 ][ 0 ]  &&
             candMv0[ 1 ] == RefStackMv[ idx ][ 0 ][ 1 ]  &&
             candMv1[ 0 ] == RefStackMv[ idx ][ 1 ][ 0 ]  &&
             candMv1[ 1 ] == RefStackMv[ idx ][ 1 ][ 1 ] )
             break
    }
    if ( idx < NumMvFound ) {
        WeightStack[ idx ] += 2
    } else if ( NumMvFound < MAX_REF_MV_STACK_SIZE ) {
        RefStackMv[ NumMvFound ][ 0 ] = candMv0
        RefStackMv[ NumMvFound ][ 1 ] = candMv1
        WeightStack[ NumMvFound ] = 2
        NumMvFound += 1
    }
}

where the call to lower_mv_precision invokes the lower precision process specified in section 7.10.2.10.

Add reference motion vector process

The inputs to this process are:

  • variables mvRow and mvCol specifying (in units of 4x4 luma samples) the candidate location,

  • a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction,

  • a variable weight specifying the weight attached to this motion vector.

This process examines the candidate to find matching reference frames.

If IsInters[ mvRow ][ mvCol ] is equal to 0, this process terminates immediately.

If isCompound is equal to 0, the following applies for candList = 0..1:

  1. If RefFrames[ mvRow ][ mvCol ][ candList ] is equal to RefFrame[ 0 ], the search stack process in section 7.10.2.8 is invoked with mvRow, mvCol, weight, and candList as inputs.

Otherwise (isCompound is equal to 1), the following applies:

  1. If RefFrames[ mvRow ][ mvCol ][ 0 ] is equal to RefFrame[ 0 ] and RefFrames[ mvRow ][ mvCol ][ 1 ] is equal to RefFrame[ 1 ], the compound search stack process in section 7.10.2.9 is invoked with mvRow, mvCol, and weight as inputs.
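The dispatch described above can be sketched in Python as follows. This is an illustrative sketch only: is_inter, ref_frames, search_stack, and comp_search_stack are placeholder callables for the decoder state and the processes of sections 7.10.2.8 and 7.10.2.9.

```python
# Hypothetical sketch of the add reference motion vector process.
def add_ref_mv_candidate(mv_row, mv_col, is_compound, weight,
                         is_inter, ref_frames, ref_frame,
                         search_stack, comp_search_stack):
    if not is_inter(mv_row, mv_col):
        return
    if not is_compound:
        # Either list of the candidate may match our single reference.
        for cand_list in range(2):
            if ref_frames(mv_row, mv_col)[cand_list] == ref_frame[0]:
                search_stack(mv_row, mv_col, weight, cand_list)
    else:
        # Both references must match, in order.
        cands = ref_frames(mv_row, mv_col)
        if cands[0] == ref_frame[0] and cands[1] == ref_frame[1]:
            comp_search_stack(mv_row, mv_col, weight)
```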
Search stack process

The inputs to this process are:

  • variables mvRow and mvCol specifying (in units of 4x4 luma samples) the candidate location,

  • a variable candList specifying which list in the candidate matches our reference frame,

  • a variable weight proportional to the corresponding block width or height for the candidate motion vector.

This process searches the stack for an exact match with a candidate motion vector. If present, the weight of the candidate motion vector is added to the weight of its counterpart in the stack, otherwise the process adds a motion vector to the stack.

The variable candMode is set equal to YModes[ mvRow ][ mvCol ].

The variable candSize is set equal to MiSizes[ mvRow ][ mvCol ].

The variable large is set equal to ( Min( Block_Width[ candSize ], Block_Height[ candSize ] ) >= 8 ).

The candidate motion vector candMv is set as follows:

  • If ( candMode == GLOBALMV || candMode == GLOBAL_GLOBALMV) and ( GmType[ RefFrame[ 0 ] ] > TRANSLATION ) and ( large == 1 ), candMv is set equal to GlobalMvs[ 0 ].

  • Otherwise, candMv is set equal to Mvs[ mvRow ][ mvCol ][ candList ].

The lower precision process specified in section 7.10.2.10 is invoked with candMv.

If has_newmv( candMode ) is equal to 1, NewMvCount is set equal to NewMvCount + 1.

The variable FoundMatch is set equal to 1.

The process depends on whether the candidate motion vector is already in the stack as follows:

  • If candMv is already equal to RefStackMv[ idx ][ 0 ] for some idx less than NumMvFound, then WeightStack[ idx ] is increased by weight

  • Otherwise, if NumMvFound is less than MAX_REF_MV_STACK_SIZE, the following ordered steps apply:

    a. RefStackMv[ NumMvFound ][ 0 ] is set equal to candMv

    b. WeightStack[ NumMvFound ] is set equal to weight

    c. NumMvFound is set equal to NumMvFound + 1.

  • Otherwise (NumMvFound is greater than or equal to MAX_REF_MV_STACK_SIZE), the process has no effect.
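The merge-or-append step above can be sketched in Python as follows. This is an illustrative sketch under simplifying assumptions: the stack is modeled as parallel Python lists, and candidate motion vectors as tuples.

```python
MAX_REF_MV_STACK_SIZE = 8  # value from this specification

# Sketch of the stack-insertion step of the search stack process:
# merge an exact match's weight, or append if there is room.
def insert_single_mv(cand_mv, weight, ref_stack_mv, weight_stack):
    for idx, entry in enumerate(ref_stack_mv):
        if entry[0] == cand_mv:
            weight_stack[idx] += weight  # exact match: merge weights
            return
    if len(ref_stack_mv) < MAX_REF_MV_STACK_SIZE:
        ref_stack_mv.append([cand_mv])  # new candidate
        weight_stack.append(weight)
```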

Compound search stack process

The inputs to this process are:

  • variables mvRow and mvCol specifying (in units of 4x4 luma samples) the candidate location,

  • a variable weight proportional to the corresponding block width or height for the candidate pair of motion vectors.

This process searches the stack for an exact match with a candidate pair of motion vectors. If present, the weight of the candidate pair of motion vectors is added to the weight of its counterpart in the stack, otherwise the process adds the motion vectors to the stack.

The array candMvs (containing two motion vectors) is set equal to Mvs[ mvRow ][ mvCol ].

The variable candMode is set equal to YModes[ mvRow ][ mvCol ].

The variable candSize is set equal to MiSizes[ mvRow ][ mvCol ].

If candMode is equal to GLOBAL_GLOBALMV, for refList = 0..1 the following applies:

  • If GmType[ RefFrame[ refList ] ] > TRANSLATION, candMvs[ refList ] is set equal to GlobalMvs[ refList ].

For i = 0..1, the lower precision process specified in section 7.10.2.10 is invoked with candMvs[ i ].

The variable FoundMatch is set equal to 1.

The process depends on whether the candidate motion vector pair is already in the stack as follows:

  • If candMvs[ 0 ] is equal to RefStackMv[ idx ][ 0 ] and candMvs[ 1 ] is equal to RefStackMv[ idx ][ 1 ] for some idx less than NumMvFound, then WeightStack[ idx ] is increased by weight

  • Otherwise, if NumMvFound is less than MAX_REF_MV_STACK_SIZE, the following ordered steps apply:

    a. RefStackMv[ NumMvFound ][ i ] is set equal to candMvs[ i ] for i = 0..1

    b. WeightStack[ NumMvFound ] is set equal to weight

    c. NumMvFound is set equal to NumMvFound + 1.

  • Otherwise (NumMvFound is greater than or equal to MAX_REF_MV_STACK_SIZE), the process has no effect.

If has_newmv( candMode ) is equal to 1, NewMvCount is set equal to NewMvCount + 1.

The function has_newmv is defined as:

has_newmv( mode ) {
    return (mode == NEWMV || 
            mode == NEW_NEWMV || 
            mode == NEAR_NEWMV || 
            mode == NEW_NEARMV ||
            mode == NEAREST_NEWMV || 
            mode == NEW_NEARESTMV)
}

Note: It is impossible for mode to equal NEWMV in this function because it is only called for compound modes.

Lower precision process

The input to this process is a reference candMv to a motion vector array.

This process modifies the contents of the input motion vector to remove the least significant bit when high precision is not allowed, and all three fractional bits when force_integer_mv is equal to 1.

If allow_high_precision_mv is equal to 1, this process terminates immediately.

For i = 0..1, the following applies:

if ( force_integer_mv ) {
    a = Abs( candMv[ i ] )
    aInt = (a + 3) >> 3
    if ( candMv[ i ] > 0 )
        candMv[ i ] = aInt << 3
    else
        candMv[ i ] = -( aInt << 3 )
} else {
    if ( candMv[ i ] & 1 ) {
        if ( candMv[ i ] > 0 )
            candMv[ i ]--
        else
            candMv[ i ]++
    }
}
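The pseudocode above can be transliterated directly into Python for illustration, operating in place on a 2-element motion vector [row, col] in units of 1/8 luma samples:

```python
# A direct transliteration of the lower precision process.
def lower_mv_precision(cand_mv, allow_high_precision_mv, force_integer_mv):
    if allow_high_precision_mv:
        return
    for i in range(2):
        if force_integer_mv:
            # Round the magnitude to a whole-pixel multiple of 8
            # (exact halves round toward zero).
            a = abs(cand_mv[i])
            a_int = (a + 3) >> 3
            cand_mv[i] = (a_int << 3) if cand_mv[i] > 0 else -(a_int << 3)
        elif cand_mv[i] & 1:
            # Drop the high-precision bit, rounding toward zero.
            cand_mv[i] += -1 if cand_mv[i] > 0 else 1
```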
Sorting process

The inputs to this process are:

  • a variable start representing the first position to be sorted,

  • a variable end representing the length of the array,

  • a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction.

This process performs a stable sort of part of the stack of motion vectors according to the corresponding weight.

Entries in RefStackMv from start (inclusive) to end (exclusive) are sorted.

The sorting process is specified as:

while ( end > start ) {
    newEnd = start
    for ( idx = start + 1; idx < end; idx++ ) {
        if ( WeightStack[ idx - 1 ] < WeightStack[ idx ] ) {
            swap_stack(idx - 1, idx)
            newEnd = idx
        }
    }
    end = newEnd
}

When the function swap_stack is invoked, the entries at locations idx and idx - 1 are swapped in WeightStack and RefStackMv as follows:

swap_stack( i, j ) {
  temp = WeightStack[ i ]
  WeightStack[ i ] = WeightStack[ j ]
  WeightStack[ j ] = temp
  for ( list = 0; list < 1 + isCompound; list++ ) {
    for ( comp = 0; comp < 2; comp++ ) {
      temp = RefStackMv[ i ][ list ][ comp ]
      RefStackMv[ i ][ list ][ comp ] = RefStackMv[ j ][ list ][ comp ]
      RefStackMv[ j ][ list ][ comp ] = temp
    }
  }
}
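The sort above can be transliterated into Python, modeling the stack as parallel lists. Because entries only move when a strictly smaller weight precedes a larger one, equal weights keep their original relative order (the sort is stable):

```python
# A transliteration of the sorting process: stable bubble sort into
# descending weight order over [start, end).
def sort_stack(start, end, weight_stack, ref_stack_mv):
    while end > start:
        new_end = start
        for idx in range(start + 1, end):
            if weight_stack[idx - 1] < weight_stack[idx]:
                weight_stack[idx - 1], weight_stack[idx] = \
                    weight_stack[idx], weight_stack[idx - 1]
                ref_stack_mv[idx - 1], ref_stack_mv[idx] = \
                    ref_stack_mv[idx], ref_stack_mv[idx - 1]
                new_end = idx
        end = new_end
```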
Extra search process

The input to this process is a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction.

This process adds additional motion vectors to RefStackMv until it contains at least 2 motion vector candidates, first by searching the left and above neighbors for partially matching candidates, and then by adding global motion candidates.

When doing single prediction, the motion vectors are added directly to the stack.

When doing compound prediction, the motion vectors are added to arrays called RefIdMvs (counting matches from the same frame) and RefDiffMvs (counting matches from different frames).

The numbers of entries in these arrays are initialized to zero as follows:

for ( list = 0; list < 2; list++ ) {
  RefIdCount[ list ] = 0
  RefDiffCount[ list ] = 0
}

A two pass search for the partial matching candidates is specified as:

w4 = Min( 16, Num_4x4_Blocks_Wide[ MiSize ] )
h4 = Min( 16, Num_4x4_Blocks_High[ MiSize ] )
w4 = Min( w4, MiCols - MiCol )
h4 = Min( h4, MiRows - MiRow )
num4x4 = Min( w4, h4 )
for ( pass = 0; pass < 2; pass++ ) {
  idx = 0
  while ( idx < num4x4 && NumMvFound < 2 ) {
    if ( pass == 0 ) {
      mvRow = MiRow - 1
      mvCol = MiCol + idx
    } else {
      mvRow = MiRow + idx
      mvCol = MiCol - 1
    }
    if ( !is_inside( mvRow, mvCol ) )
      break
    add_extra_mv_candidate( mvRow, mvCol, isCompound )
    if ( pass == 0 ) {
      idx += Num_4x4_Blocks_Wide[ MiSizes[ mvRow ][ mvCol ] ]
    } else {
      idx += Num_4x4_Blocks_High[ MiSizes[ mvRow ][ mvCol ] ]
    }
  }
} 

The first pass searches the row above, the second searches the column to the left.

The function call to add_extra_mv_candidate invokes the add extra mv candidate process specified in section 7.10.2.13 with mvRow, mvCol, isCompound as inputs.

If isCompound is equal to 1, the candidates in the RefIdMvs and RefDiffMvs arrays are added to the stack as follows (using the temporary array combinedMvs):

for ( list = 0; list < 2; list++ ) {
  compCount = 0
  for ( idx = 0; idx < RefIdCount[ list ]; idx++ ) {
    combinedMvs[ compCount ][ list ] = RefIdMvs[ list ][ idx ]
    compCount++
  }
  for ( idx = 0; idx < RefDiffCount[ list ] && compCount < 2; idx++ ) {
    combinedMvs[ compCount ][ list ] = RefDiffMvs[ list ][ idx ]
    compCount++
  }
  while ( compCount < 2 ) {
    combinedMvs[ compCount ][ list ] = GlobalMvs[ list ]
    compCount++
  }
}
if ( NumMvFound == 1 ) {
  if ( combinedMvs[ 0 ][ 0 ] == RefStackMv[ 0 ][ 0 ] &&
       combinedMvs[ 0 ][ 1 ] == RefStackMv[ 0 ][ 1 ] ) {
    RefStackMv[ NumMvFound ][ 0 ] = combinedMvs[ 1 ][ 0 ]
    RefStackMv[ NumMvFound ][ 1 ] = combinedMvs[ 1 ][ 1 ]
  } else {
    RefStackMv[ NumMvFound ][ 0 ] = combinedMvs[ 0 ][ 0 ]
    RefStackMv[ NumMvFound ][ 1 ] = combinedMvs[ 0 ][ 1 ]
  }
  WeightStack[ NumMvFound ] = 2
  NumMvFound++
} else {
  for ( idx = 0; idx < 2; idx++ ) {
    RefStackMv[ NumMvFound ][ 0 ] = combinedMvs[ idx ][ 0 ]
    RefStackMv[ NumMvFound ][ 1 ] = combinedMvs[ idx ][ 1 ]
    WeightStack[ NumMvFound ] = 2
    NumMvFound++
  }
}

If isCompound is equal to 0, the candidates have already been added to RefStackMv, and this process simply extends with global motion candidates as follows:

for ( idx = NumMvFound; idx < 2; idx++ ) {
  RefStackMv[ idx ][ 0 ] = GlobalMvs[ 0 ]
}

Note: For single prediction, NumMvFound is not incremented by the addition of global motion candidates, whereas for compound prediction NumMvFound will always be greater than or equal to 2 by this point.

Add extra MV candidate process

The inputs to this process are:

  • variables mvRow and mvCol specifying (in units of 4x4 luma samples) the candidate location,

  • a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction.

This process may modify the contents of the global variables RefIdMvs, RefIdCount, RefDiffMvs, RefDiffCount, RefStackMv, WeightStack, and NumMvFound.

This process examines the candidate location to find possible motion vectors as follows:

if ( isCompound ) {
  for ( candList = 0; candList < 2; candList++ ) {
    candRef = RefFrames[ mvRow ][ mvCol ][ candList ]
    if ( candRef > INTRA_FRAME ) {
      for ( list = 0; list < 2; list++ ) {
        candMv = Mvs[ mvRow ][ mvCol ][ candList ]
        if ( candRef == RefFrame[ list ] && RefIdCount[ list ] < 2 ) {
          RefIdMvs[ list ][ RefIdCount[ list ] ] = candMv
          RefIdCount[ list ]++
        } else if ( RefDiffCount[ list ] < 2 ) {
          if ( RefFrameSignBias[ candRef ] != RefFrameSignBias[ RefFrame[list] ] ) {
            candMv[ 0 ] *= -1
            candMv[ 1 ] *= -1
          }
          RefDiffMvs[ list ][ RefDiffCount[ list ] ] = candMv
          RefDiffCount[ list ]++
        }
      }
    }
  }
} else {
  for ( candList = 0; candList < 2; candList++ ) {
    candRef = RefFrames[ mvRow ][ mvCol ][ candList ]
    if ( candRef > INTRA_FRAME ) {
      candMv = Mvs[ mvRow ][ mvCol ][ candList ]
      if ( RefFrameSignBias[ candRef ] != RefFrameSignBias[ RefFrame[ 0 ] ] ) {
        candMv[ 0 ] *= -1
        candMv[ 1 ] *= -1
      }
      for ( idx = 0; idx < NumMvFound; idx++ ) {
        if ( candMv == RefStackMv[ idx ][ 0 ] )
          break
      }
      if ( idx == NumMvFound ) {
        RefStackMv[ idx ][ 0 ] = candMv
        WeightStack[ idx ] = 2
        NumMvFound++
      }
    }
  }
} 
Context and clamping process

The inputs to this process are:

  • a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction,

  • a variable numNew specifying the number of NEWMV candidates found in the immediate neighborhood.

This process computes contexts to be used when decoding syntax elements, and clamps the candidates in RefStackMv.

The variable bw (representing the width of the block in units of luma samples) is set equal to Block_Width[ MiSize ].

The variable bh (representing the height of the block in units of luma samples) is set equal to Block_Height[ MiSize ].

Note: It only matters whether numNew is zero or non-zero because the value is clipped at 1 when it is used. Implementations may therefore choose to implement numNew and NewMvCount as a boolean instead of a counter.

The variable numLists specifying the number of reference frames used for this block is set equal to ( isCompound ? 2 : 1 ).

The array DrlCtxStack is set as follows:

for ( idx = 0; idx < NumMvFound ; idx++ ) {
    z = 0
    if ( idx + 1 < NumMvFound ) {
        w0 = WeightStack[ idx ]
        w1 = WeightStack[ idx + 1 ]
        if ( w0 >= REF_CAT_LEVEL ) {
            if ( w1 < REF_CAT_LEVEL ) {
                z = 1
            }
        } else {
            z = 2
        }
    }      
    DrlCtxStack[ idx ] = z
}
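The derivation of DrlCtxStack above can be sketched in Python. The context for each stack position depends only on whether the weight at that position and the next one straddle REF_CAT_LEVEL (640 in this specification):

```python
REF_CAT_LEVEL = 640  # value from this specification

# Sketch of the DrlCtxStack derivation from the weight stack.
def drl_ctx_stack(weight_stack):
    ctx = []
    for idx in range(len(weight_stack)):
        z = 0
        if idx + 1 < len(weight_stack):
            w0, w1 = weight_stack[idx], weight_stack[idx + 1]
            if w0 >= REF_CAT_LEVEL:
                if w1 < REF_CAT_LEVEL:
                    z = 1  # strong candidate followed by a weak one
            else:
                z = 2      # both candidates are weak
        ctx.append(z)
    return ctx
```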

The motion vectors are clamped as follows:

for ( list = 0; list < numLists; list++ ) {
    for ( idx = 0; idx < NumMvFound ; idx++ ) {
        refMv = RefStackMv[ idx ][ list ]
        refMv[ 0 ] = clamp_mv_row( refMv[ 0 ], MV_BORDER + bh * 8)
        refMv[ 1 ] = clamp_mv_col( refMv[ 1 ], MV_BORDER + bw * 8)
        RefStackMv[ idx ][ list ] = refMv
    }
}

The variables RefMvContext and NewMvContext are set as follows:

if ( CloseMatches == 0 ) {
    NewMvContext = Min( TotalMatches, 1 )       // 0,1
    RefMvContext = TotalMatches
} else if ( CloseMatches == 1 ) {
    NewMvContext = 3 - Min( numNew, 1 )       // 2,3
    RefMvContext = 2 + TotalMatches
} else {
    NewMvContext = 5 - Min( numNew, 1 )       // 4,5
    RefMvContext = 5
}
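The context derivation above can be written in Python for illustration. CloseMatches, TotalMatches, and numNew are produced by the surrounding candidate scan; this sketch only reproduces the final mapping:

```python
# Sketch of the RefMvContext / NewMvContext derivation.
def mv_contexts(close_matches, total_matches, num_new):
    if close_matches == 0:
        new_mv_ctx = min(total_matches, 1)   # 0 or 1
        ref_mv_ctx = total_matches
    elif close_matches == 1:
        new_mv_ctx = 3 - min(num_new, 1)     # 2 or 3
        ref_mv_ctx = 2 + total_matches
    else:
        new_mv_ctx = 5 - min(num_new, 1)     # 4 or 5
        ref_mv_ctx = 5
    return new_mv_ctx, ref_mv_ctx
```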

Has overlappable candidates process

This process is triggered by a call to has_overlappable_candidates.

It returns 1 to indicate that the block has neighbors suitable for use by overlapped motion compensation, or 0 otherwise.

The process looks to see if there are any inter blocks to the left or above.

The check is only made at 8x8 granularity.

The process is specified as:

has_overlappable_candidates( ) {
  if ( AvailU ) {
    w4 = Num_4x4_Blocks_Wide[ MiSize ]
    for ( x4 = MiCol; x4 < Min( MiCols, MiCol + w4 ); x4 += 2 ) {
        if ( RefFrames[ MiRow - 1 ][ x4 | 1 ][ 0 ] > INTRA_FRAME )
            return 1
    }
  }
  if ( AvailL ) {
    h4 = Num_4x4_Blocks_High[ MiSize ]
    for ( y4 = MiRow; y4 < Min( MiRows, MiRow + h4 ); y4 += 2 ) {
        if ( RefFrames[ y4 | 1 ][ MiCol - 1 ][ 0 ] > INTRA_FRAME )
            return 1
    }
  }
  return 0
}

Find warp samples process

General

This process is triggered when the find_warp_samples function is invoked.

The process examines the neighboring inter predicted blocks and estimates a local warp transformation based on the motion vectors.

The process produces a variable NumSamples containing the number of valid candidates found, and an array CandList containing sorted candidates.

The variables NumSamples and NumSamplesScanned are both set equal to 0.

Note: NumSamplesScanned counts the number of distinct candidates found by the add sample process - even if the motion vectors are too large. NumSamples counts the number of distinct valid candidates found by the add sample process (i.e. only counting cases where the motion vector is small enough to be considered valid). As a special case, if no small motion vectors are found, then the process returns the first large motion vector found (by setting NumSamples to 1).

The variable w4 specifying the width of the block in 4x4 luma samples is set equal to Num_4x4_Blocks_Wide[ MiSize ].

The variable h4 specifying the height of the block in 4x4 luma samples is set equal to Num_4x4_Blocks_High[ MiSize ].

The process is specified as:

doTopLeft = 1
doTopRight = 1
if ( AvailU ) {
  srcSize = MiSizes[ MiRow - 1 ][ MiCol ]
  srcW = Num_4x4_Blocks_Wide[ srcSize ]
  if ( w4 <= srcW ) {
    colOffset = -(MiCol & (srcW - 1))
    if ( colOffset < 0 )
      doTopLeft = 0
    if ( colOffset + srcW > w4 )
      doTopRight = 0
    add_sample( -1, 0 )
  } else {
    for ( i = 0; i < Min( w4, MiCols - MiCol ); i += miStep ) {
      srcSize = MiSizes[ MiRow - 1 ][ MiCol + i ]
      srcW = Num_4x4_Blocks_Wide[ srcSize ]
      miStep = Min(w4, srcW)
      add_sample( -1, i )
    }
  }
}
if ( AvailL ) {
  srcSize = MiSizes[ MiRow ][ MiCol - 1 ]
  srcH = Num_4x4_Blocks_High[ srcSize ]
  if ( h4 <= srcH ) {
    rowOffset = -(MiRow & (srcH - 1))
    if ( rowOffset < 0 )
      doTopLeft = 0
    add_sample( 0, -1 )
  } else {
    for ( i = 0; i < Min( h4, MiRows - MiRow); i += miStep ) {
      srcSize = MiSizes[ MiRow + i ][ MiCol - 1 ]
      srcH = Num_4x4_Blocks_High[ srcSize ]
      miStep = Min(h4, srcH)
      add_sample( i, -1 )
    }
  }
}
if ( doTopLeft ) {
  add_sample( -1, -1 )
}
if ( doTopRight ) {
  if ( Max( w4, h4 ) <= 16 ) {
    add_sample( -1, w4 )
  }
}
if ( NumSamples == 0 && NumSamplesScanned > 0 )
   NumSamples = 1

where the call to add_sample specifies that the add sample process in section 7.10.4.2 should be invoked.

Add sample process

The inputs to this process are:

  • a variable deltaRow specifying (in units of 4x4 luma samples) how far above to look for a motion vector,

  • a variable deltaCol specifying (in units of 4x4 luma samples) how far left to look for a motion vector.

This process adds a new sample to the list of candidates if the candidate is valid and has not been seen before.

If NumSamplesScanned is greater than or equal to LEAST_SQUARES_SAMPLES_MAX, this process immediately exits.

The variable mvRow is set equal to MiRow + deltaRow.

The variable mvCol is set equal to MiCol + deltaCol.

If is_inside( mvRow, mvCol ) is equal to 0, then this process immediately returns.

If RefFrames[ mvRow ][ mvCol ][ 0 ] has not been written for this frame, then this process immediately returns.

If RefFrames[ mvRow ][ mvCol ][ 0 ] is not equal to RefFrame[ 0 ], then this process immediately returns.

If RefFrames[ mvRow ][ mvCol ][ 1 ] is not equal to NONE, then this process immediately returns.

The variable candSz is set equal to MiSizes[ mvRow ][ mvCol ].

The variable candW4 is set equal to Num_4x4_Blocks_Wide[ candSz ].

The variable candH4 is set equal to Num_4x4_Blocks_High[ candSz ].

The variable candRow is set equal to mvRow & ~(candH4 - 1).

The variable candCol is set equal to mvCol & ~(candW4 - 1).

The variable midY is set equal to candRow * 4 + candH4 * 2 - 1.

The variable midX is set equal to candCol * 4 + candW4 * 2 - 1.

The variable threshold is set equal to Clip3( 16, 112, Max( Block_Width[ MiSize ], Block_Height[ MiSize ] ) ).

The variable mvDiffRow is set equal to Abs( Mvs[ candRow ][ candCol ][ 0 ][ 0 ] - Mv[ 0 ][ 0 ] ).

The variable mvDiffCol is set equal to Abs( Mvs[ candRow ][ candCol ][ 0 ][ 1 ] - Mv[ 0 ][ 1 ] ).

The variable valid is set equal to ( ( mvDiffRow + mvDiffCol ) <= threshold ).

Note: candRow and candCol give the top-left position of the candidate block in units of 4x4 blocks. midX and midY give the central position of the candidate block in units of luma samples.

A candidate array (representing source and destination locations in units of 1/8 luma samples) is specified as:

cand[ 0 ] = midY * 8
cand[ 1 ] = midX * 8
cand[ 2 ] = midY * 8 + Mvs[ candRow ][ candCol ][ 0 ][ 0 ]
cand[ 3 ] = midX * 8 + Mvs[ candRow ][ candCol ][ 0 ][ 1 ]

The following ordered steps apply:

  1. NumSamplesScanned is increased by 1.

  2. If valid is equal to 0 and NumSamplesScanned is greater than 1, the process exits.

  3. CandList[ NumSamples ][ j ] is set equal to cand[ j ] for j=0..3.

  4. If valid is equal to 1, NumSamples is increased by 1.
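The candidate-geometry and validity computation above can be sketched in Python. This is an illustrative sketch only: Clip3( 16, 112, x ) is expanded to min/max, motion vectors are tuples in 1/8 luma samples, and cand_w4/cand_h4 are the candidate block dimensions in 4x4 units.

```python
# Sketch of the geometry and validity test from the add sample process.
def warp_candidate(mv_row, mv_col, cand_w4, cand_h4,
                   cand_mv, cur_mv, block_w, block_h):
    # Snap to the top-left of the candidate block (4x4 units).
    cand_row = mv_row & ~(cand_h4 - 1)
    cand_col = mv_col & ~(cand_w4 - 1)
    # Centre of the candidate block in luma samples.
    mid_y = cand_row * 4 + cand_h4 * 2 - 1
    mid_x = cand_col * 4 + cand_w4 * 2 - 1
    # Clip3( 16, 112, Max( block_w, block_h ) )
    threshold = min(max(max(block_w, block_h), 16), 112)
    mv_diff = abs(cand_mv[0] - cur_mv[0]) + abs(cand_mv[1] - cur_mv[1])
    valid = mv_diff <= threshold
    # Source and destination positions in 1/8 luma samples.
    cand = [mid_y * 8, mid_x * 8,
            mid_y * 8 + cand_mv[0], mid_x * 8 + cand_mv[1]]
    return cand, valid
```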

Prediction processes

General

The following sections define the processes used for predicting the sample values.

These processes are triggered at points defined by function calls to predict_intra, predict_inter, predict_chroma_from_luma, and predict_palette in the residual syntax table described in section 5.11.34.

Intra prediction process

General

The intra prediction process is invoked for intra coded blocks to predict a part of the block corresponding to a transform block. When the transform size is smaller than the block size, this process can be invoked multiple times within a single block for the same plane, and the invocations are in raster order within the block.

This process is triggered by a call to predict_intra.

The inputs to this process are:

  • a variable plane specifying which plane is being predicted,

  • variables x and y specifying the location of the top left sample in the CurrFrame[ plane ] array of the current transform block,

  • a variable haveLeft that is equal to 1 if there are valid samples to the left of this transform block,

  • a variable haveAbove that is equal to 1 if there are valid samples above this transform block,

  • a variable haveAboveRight that is equal to 1 if there are valid samples above the transform block to the right of this transform block,

  • a variable haveBelowLeft that is equal to 1 if there are valid samples to the left of the transform block below this transform block,

  • a variable mode specifying the type of intra prediction to apply,

  • a variable log2W specifying the base 2 logarithm of the width of the region to be predicted,

  • a variable log2H specifying the base 2 logarithm of the height of the region to be predicted.

The process makes use of the already reconstructed samples in the current frame CurrFrame to form a prediction for the current block.

The outputs of this process are intra predicted samples in the current frame CurrFrame.

The variable w is set equal to 1 << log2W.

The variable h is set equal to 1 << log2H.

The variable maxX is set equal to ( MiCols * MI_SIZE ) - 1.

The variable maxY is set equal to ( MiRows * MI_SIZE ) - 1.

If plane is greater than 0, then:

  • maxX is set equal to ( ( MiCols * MI_SIZE ) >> subsampling_x ) - 1.

  • maxY is set equal to ( ( MiRows * MI_SIZE ) >> subsampling_y ) - 1.

The array AboveRow[ i ] for i = 0..w + h - 1 is derived as follows:

  • If haveAbove is equal to 0 and haveLeft is equal to 1, AboveRow[ i ] is set equal to CurrFrame[ plane ][ y ][ x - 1 ].

  • Otherwise, if haveAbove is equal to 0 and haveLeft is equal to 0, AboveRow[ i ] is set equal to ( 1 << ( BitDepth - 1 ) ) - 1.

  • Otherwise, the following applies:

    • The variable aboveLimit is set equal to Min( maxX, x + ( haveAboveRight ? 2 * w : w ) - 1 ).

    • AboveRow[ i ] is set equal to CurrFrame[ plane ][ y - 1 ][ Min( aboveLimit, x + i ) ].

The array LeftCol[ i ] for i = 0..w + h - 1 is derived as follows:

  • If haveLeft is equal to 0 and haveAbove is equal to 1, LeftCol[ i ] is set equal to CurrFrame[ plane ][ y - 1 ][ x ].

  • Otherwise, if haveLeft is equal to 0 and haveAbove is equal to 0, LeftCol[ i ] is set equal to ( 1 << ( BitDepth - 1 ) ) + 1.

  • Otherwise, the following applies:

    • The variable leftLimit is set equal to Min( maxY, y + ( haveBelowLeft ? 2 * h : h ) - 1 ).

    • LeftCol[ i ] is set equal to CurrFrame[ plane ][ Min( leftLimit, y + i ) ][ x - 1 ].

The array AboveRow[ i ] for i = -1 is specified by:

  • If haveAbove is equal to 1 and haveLeft is equal to 1, AboveRow[ -1 ] is set equal to CurrFrame[ plane ][ y-1 ][ x-1 ].

  • Otherwise if haveAbove is equal to 1, AboveRow[ -1 ] is set equal to CurrFrame [ plane ][ y - 1 ][ x ].

  • Otherwise if haveLeft is equal to 1, AboveRow[ -1 ] is set equal to CurrFrame [ plane ][ y ][ x - 1 ].

  • Otherwise, AboveRow[ -1 ] is set equal to 1 << ( BitDepth - 1 ).

The array LeftCol[ i ] for i = -1 is set equal to AboveRow[ -1 ].

A 2D array named pred containing the intra predicted samples is constructed as follows:

  • If plane is equal to 0 and use_filter_intra is true, the recursive intra prediction process specified in section 7.11.2.3 is invoked with w and h as inputs, and the output is assigned to pred.

  • Otherwise, if is_directional_mode( mode ) is true, the directional intra prediction process specified in section 7.11.2.4 is invoked with plane, x, y, haveLeft, haveAbove, mode, w, h, maxX, maxY as inputs and the output is assigned to pred.

  • Otherwise if mode is equal to SMOOTH_PRED or SMOOTH_V_PRED or SMOOTH_H_PRED, the smooth intra prediction process specified in section 7.11.2.6 is invoked with mode, log2W, log2H, w, and h as inputs, and the output is assigned to pred.

  • Otherwise if mode is equal to DC_PRED, the DC intra prediction process specified in section 7.11.2.5 is invoked with haveLeft, haveAbove, log2W, log2H, w, and h as inputs and the output is assigned to pred.

  • Otherwise (mode is equal to PAETH_PRED), the basic intra prediction process specified in section 7.11.2.2 is invoked with mode, w, and h as inputs, and the output is assigned to pred.

The current frame is updated as follows:

  • CurrFrame[ plane ][ y + i ][ x + j ] is set equal to pred[ i ][ j ] for i = 0..h-1 and j = 0..w-1.
Basic intra prediction process

The inputs to this process are:

  • a variable w specifying the width of the region to be predicted,

  • a variable h specifying the height of the region to be predicted.

The output of this process is a 2D array named pred containing the intra predicted samples.

The process generates filtered samples from the samples in LeftCol and AboveRow as follows:

  • The following ordered steps apply for i = 0..h-1, for j = 0..w-1:

    1. The variable base is set equal to AboveRow[ j ] + LeftCol[ i ] - AboveRow[ -1 ].

    2. The variable pLeft is set equal to Abs( base - LeftCol[ i ]).

    3. The variable pTop is set equal to Abs( base - AboveRow[ j ]).

    4. The variable pTopLeft is set equal to Abs( base - AboveRow[ -1 ] ).

    5. If pLeft <= pTop and pLeft <= pTopLeft, pred[ i ][ j ] is set equal to LeftCol[ i ].

    6. Otherwise, if pTop <= pTopLeft, pred[ i ][ j ] is set equal to AboveRow[ j ].

    7. Otherwise, pred[ i ][ j ] is set equal to AboveRow[ -1 ].

The output of the process is the array pred.
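The ordered steps above are the Paeth predictor: each output sample picks whichever neighbor (left, top, or top-left) is closest to left + top - top_left. A Python sketch, taking the edge samples as plain lists:

```python
# Sketch of the basic (Paeth) intra prediction process.
def paeth_predict(above_row, left_col, top_left):
    h, w = len(left_col), len(above_row)
    pred = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            base = above_row[j] + left_col[i] - top_left
            p_left = abs(base - left_col[i])
            p_top = abs(base - above_row[j])
            p_top_left = abs(base - top_left)
            # Choose the neighbor closest to base, with ties
            # resolved in the order left, top, top-left.
            if p_left <= p_top and p_left <= p_top_left:
                pred[i][j] = left_col[i]
            elif p_top <= p_top_left:
                pred[i][j] = above_row[j]
            else:
                pred[i][j] = top_left
    return pred
```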

Recursive intra prediction process

The inputs to this process are:

  • a variable w specifying the width of the region to be predicted,

  • a variable h specifying the height of the region to be predicted.

The output of this process is a 2D array named pred containing the intra predicted samples.

For each block of 4x2 samples, this process first prepares an array p of 7 neighboring samples, and then produces the output block by filtering this array.

The variable w4 is set equal to w >> 2.

The variable h2 is set equal to h >> 1.

The following steps apply for i2 = 0..h2-1, for j4 = 0..w4-1:

  • The array p is derived as follows for i = 0..6:

    • If i is less than 5, p[ i ] is derived as follows:

      • If i2 is equal to 0, p[ i ] is set equal to AboveRow[ ( j4 << 2 ) + i - 1 ].

      • Otherwise, if j4 is equal to 0 and i is equal to 0, p[ i ] is set equal to LeftCol[ ( i2 << 1 ) - 1 ].

      • Otherwise, p[ i ] is set equal to pred[ ( i2 << 1 ) - 1 ][ ( j4 << 2 ) + i - 1 ].

    • Otherwise (i is greater than or equal to 5), p[ i ] is derived as follows:

      • If j4 is equal to 0, p[ i ] is set equal to LeftCol[ ( i2 << 1 ) + i - 5 ].

      • Otherwise (j4 is not equal to 0), p[ i ] is set equal to pred[ ( i2 << 1 ) + i - 5 ][ ( j4 << 2 ) - 1 ].

  • The following steps apply for i1 = 0..1, j1 = 0..3:

    • The variable pr is set equal to 0.

    • The variable pr is incremented by Intra_Filter_Taps[ filter_intra_mode ][ ( i1 << 2 ) + j1 ][ i ] * p[ i ] for i = 0..6.

    • pred[ ( i2 << 1 ) + i1 ][ ( j4 << 2 ) + j1 ] is set equal to Clip1( Round2Signed( pr, INTRA_FILTER_SCALE_BITS ) ).

The output of the process is the array pred.

Directional intra prediction process

The inputs to this process are:

  • a variable plane specifying which plane is being predicted,

  • variables x and y specifying the location of the top left sample in the CurrFrame[ plane ] array of the current transform block,

  • a variable haveLeft that is equal to 1 if there are valid samples to the left of this transform block,

  • a variable haveAbove that is equal to 1 if there are valid samples above this transform block,

  • a variable mode specifying the type of intra prediction to apply,

  • a variable w specifying the width of the region to be predicted,

  • a variable h specifying the height of the region to be predicted,

  • a variable maxX specifying the largest valid x coordinate for the current plane,

  • a variable maxY specifying the largest valid y coordinate for the current plane.

The output of this process is a 2D array named pred containing the intra predicted samples.

The process uses a directional filter to generate filtered samples from the samples in LeftCol and AboveRow.

The following ordered steps apply:

  1. The variable angleDelta is derived as follows:

    • If plane is equal to 0, angleDelta is set equal to AngleDeltaY.

    • Otherwise (plane is not equal to 0), angleDelta is set equal to AngleDeltaUV.

  2. The variable pAngle is set equal to ( Mode_To_Angle[ mode ] + angleDelta * ANGLE_STEP ).

  3. The variables upsampleAbove and upsampleLeft are set equal to 0.

  4. If enable_intra_edge_filter is equal to 1, the following applies:

    • If pAngle is not equal to 90 and pAngle is not equal to 180, the following applies:

      • If pAngle > 90 and pAngle < 180 and ( w + h ) >= 24, the filter corner process specified in section 7.11.2.7 is invoked and the output assigned to both LeftCol[ -1 ] and AboveRow[ -1 ].

      • The intra filter type process specified in section 7.11.2.8 is invoked with the input variable plane and the output assigned to filterType.

      • If haveAbove is equal to 1, the following steps apply:

        • The intra edge filter strength selection process specified in section 7.11.2.9 is invoked with w, h, filterType, and pAngle - 90 as inputs, and the output assigned to the variable strength.

        • The variable numPx is set equal to Min( w, ( maxX - x + 1 ) ) + ( pAngle < 90 ? h : 0 ) + 1.

        • The intra edge filter process specified in section 7.11.2.12 is invoked with the parameters numPx, strength, and 0 as inputs.

      • If haveLeft is equal to 1, the following steps apply:

        • The intra edge filter strength selection process specified in section 7.11.2.9 is invoked with w, h, filterType, and pAngle - 180 as inputs, and the output assigned to the variable strength.

        • The variable numPx is set equal to Min( h, ( maxY - y + 1 ) ) + ( pAngle > 180 ? w : 0 ) + 1.

        • The intra edge filter process specified in section 7.11.2.12 is invoked with the parameters numPx, strength, and 1 as inputs.

    • The intra edge upsample selection process specified in section 7.11.2.10 is invoked with w, h, filterType, and pAngle - 90 as inputs, and the output assigned to the variable upsampleAbove.

    • The variable numPx is set equal to ( w + (pAngle < 90 ? h : 0) ).

    • If upsampleAbove is equal to 1, the intra edge upsample process specified in section 7.11.2.11 is invoked with the parameters numPx and 0 as inputs.

    • The intra edge upsample selection process specified in section 7.11.2.10 is invoked with w, h, filterType, and pAngle - 180 as inputs, and the output assigned to the variable upsampleLeft.

    • The variable numPx is set equal to ( h + (pAngle > 180 ? w : 0) ).

    • If upsampleLeft is equal to 1, the intra edge upsample process specified in section 7.11.2.11 is invoked with the parameters numPx and 1 as inputs.

  5. The variable dx is derived as follows:

    • If pAngle is less than 90, dx is set equal to Dr_Intra_Derivative[ pAngle ].

    • Otherwise, if pAngle is greater than 90 and less than 180, dx is set equal to Dr_Intra_Derivative[ 180 - pAngle ].

    • Otherwise, dx is undefined.

  6. The variable dy is derived as follows:

    • If pAngle is greater than 90 and less than 180, dy is set equal to Dr_Intra_Derivative[ pAngle - 90 ].

    • Otherwise, if pAngle is greater than 180, dy is set equal to Dr_Intra_Derivative[ 270 - pAngle ].

    • Otherwise, dy is undefined.

  7. If pAngle is less than 90, the following steps apply for i = 0..h-1, for j = 0..w-1:

    • The variable idx is set equal to ( i + 1 ) * dx.

    • The variable base is set equal to (idx >> ( 6 - upsampleAbove ) ) + (j << upsampleAbove).

    • The variable shift is set equal to ( (idx << upsampleAbove) >> 1 ) & 0x1F.

    • The variable maxBaseX is set equal to (w + h - 1) << upsampleAbove.

    • If base is less than maxBaseX, pred[ i ][ j ] is set equal to Round2( AboveRow[ base ] * ( 32 - shift ) + AboveRow[ base + 1 ] * shift, 5 ).

    • Otherwise (base is greater than or equal to maxBaseX), pred[ i ][ j ] is set equal to AboveRow[ maxBaseX ].

  8. Otherwise, if pAngle is greater than 90 and pAngle is less than 180, the following steps apply for i = 0..h-1, for j = 0..w-1:

    • The variable idx is set equal to ( j << 6 ) - ( i + 1 ) * dx.

    • The variable base is set equal to idx >> ( 6 - upsampleAbove ).

    • If base is greater than or equal to -(1 << upsampleAbove), the following steps apply:

      • The variable shift is set equal to ( ( idx << upsampleAbove ) >> 1 ) & 0x1F.

      • pred[ i ][ j ] is set equal to Round2( AboveRow[ base ] * ( 32 - shift ) + AboveRow[ base + 1 ] * shift, 5 ).

    • Otherwise (base is less than -(1 << upsampleAbove)), the following steps apply:

      • The variable idx is set equal to ( i << 6 ) - ( j + 1 ) * dy.

      • The variable base is set equal to idx >> ( 6 - upsampleLeft ).

      • The variable shift is set equal to ( ( idx << upsampleLeft ) >> 1 ) & 0x1F.

      • pred[ i ][ j ] is set equal to Round2( LeftCol[ base ] * ( 32 - shift ) + LeftCol[ base + 1 ] * shift, 5 ).

  9. Otherwise, if pAngle is greater than 180, the following steps apply for i = 0..h-1, for j = 0..w-1:

    • The variable idx is set equal to ( j + 1 ) * dy.

    • The variable base is set equal to ( idx >> ( 6 - upsampleLeft ) ) + ( i << upsampleLeft ).

    • The variable shift is set equal to ( ( idx << upsampleLeft ) >> 1 ) & 0x1F.

    • pred[ i ][ j ] is set equal to Round2( LeftCol[ base ] * ( 32 - shift ) + LeftCol[ base + 1 ] * shift, 5 ).

  10. Otherwise, if pAngle is equal to 90, pred[ i ][ j ] is set equal to AboveRow[ j ] with j = 0..w-1 and i = 0..h-1 (each row of the block is filled with a copy of AboveRow).

  11. Otherwise, if pAngle is equal to 180, pred[ i ][ j ] is set equal to LeftCol[ i ] with j = 0..w-1 and i = 0..h-1 (each column of the block is filled with a copy of LeftCol).

The output of the process is the array pred.
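Note: the following non-normative Python sketch illustrates step 7 only (pAngle less than 90, edge upsampling disabled), with dx supplied directly instead of being looked up in Dr_Intra_Derivative. With dx equal to 64 the prediction direction is 45 degrees and each output sample is a copy of AboveRow[ i + j + 1 ].

```python
def round2(x, n):
    return (x + (1 << (n - 1))) >> n

def directional_pred_below_90(above_row, w, h, dx):
    # Non-normative sketch: pAngle < 90 case with upsampleAbove = 0,
    # so base = (idx >> 6) + j and shift = (idx >> 1) & 0x1F.
    max_base_x = w + h - 1
    pred = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            idx = (i + 1) * dx
            base = (idx >> 6) + j
            shift = (idx >> 1) & 0x1F
            if base < max_base_x:
                # Linear interpolation between two above-row samples.
                pred[i][j] = round2(above_row[base] * (32 - shift)
                                    + above_row[base + 1] * shift, 5)
            else:
                pred[i][j] = above_row[max_base_x]
    return pred
```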

DC intra prediction process

The inputs to this process are:

  • a variable haveLeft that is equal to 1 if there are valid samples to the left of this transform block,

  • a variable haveAbove that is equal to 1 if there are valid samples above this transform block,

  • a variable log2W specifying the base 2 logarithm of the width of the region to be predicted,

  • a variable log2H specifying the base 2 logarithm of the height of the region to be predicted,

  • a variable w specifying the width of the region to be predicted,

  • a variable h specifying the height of the region to be predicted.

The output of this process is a 2D array named pred containing the intra predicted samples.

The process averages the available edge samples in LeftCol and AboveRow to generate the prediction as follows:

  • If haveLeft is equal to 1 and haveAbove is equal to 1, pred[ i ][ j ] is set equal to avg with i = 0..h-1 and j = 0..w-1. The variable avg (the average of the samples in the union of AboveRow and LeftCol) is specified as follows:

    sum = 0
    for ( k = 0; k < h; k++ )
        sum += LeftCol[ k ]
    for ( k = 0; k < w; k++ )
        sum += AboveRow[ k ]
        
    sum += ( w + h ) >> 1
    avg = sum / ( w + h )
    

Note: The reference code shows how the division by (w+h) can be implemented with multiplication and shift operations.

  • Otherwise if haveLeft is equal to 1 and haveAbove is equal to 0, pred[ i ][ j ] is set equal to leftAvg with i = 0..h-1 and j = 0..w-1. The variable leftAvg is specified as follows:

    sum = 0
    for ( k = 0; k < h; k++ ) {
        sum += LeftCol[ k ]
    }
    leftAvg = Clip1( ( sum + ( h >> 1 ) ) >> log2H )
    
  • Otherwise if haveLeft is equal to 0 and haveAbove is equal to 1, pred[ i ][ j ] is set equal to aboveAvg with i = 0..h-1 and j = 0..w-1. The variable aboveAvg is specified as follows:

    sum = 0
    for ( k = 0; k < w; k++ ) {
        sum += AboveRow[ k ]
    }
    aboveAvg = Clip1( ( sum + ( w >> 1 ) ) >> log2W )
    
  • Otherwise (haveLeft is equal to 0 and haveAbove is equal to 0), pred[ i ][ j ] is set equal to 1 << ( BitDepth - 1 ) with i = 0..h-1 and j = 0..w-1.

The output of the process is the array pred.
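Note: the four cases above can be combined into a single non-normative Python sketch (w and h are assumed to be powers of two, so log2W and log2H are recovered from the bit lengths):

```python
def clip1(x, bit_depth=8):
    # Clamp to the legal sample range for the given bit depth.
    return max(0, min(x, (1 << bit_depth) - 1))

def dc_pred(above_row, left_col, w, h, have_left, have_above, bit_depth=8):
    # Non-normative sketch of the DC prediction cases; the flags select
    # which edges contribute to the average.
    log2w = w.bit_length() - 1  # w and h are powers of two
    log2h = h.bit_length() - 1
    if have_left and have_above:
        s = sum(left_col[:h]) + sum(above_row[:w]) + ((w + h) >> 1)
        avg = s // (w + h)
    elif have_left:
        avg = clip1((sum(left_col[:h]) + (h >> 1)) >> log2h, bit_depth)
    elif have_above:
        avg = clip1((sum(above_row[:w]) + (w >> 1)) >> log2w, bit_depth)
    else:
        avg = 1 << (bit_depth - 1)
    return [[avg] * w for _ in range(h)]
```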

Smooth intra prediction process

The inputs to this process are:

  • a variable mode specifying the type of intra prediction to apply,

  • a variable log2W specifying the base 2 logarithm of the width of the region to be predicted,

  • a variable log2H specifying the base 2 logarithm of the height of the region to be predicted,

  • a variable w specifying the width of the region to be predicted,

  • a variable h specifying the height of the region to be predicted.

The output of this process is a 2D array named pred containing the intra predicted samples.

The process uses linear interpolation to generate filtered samples from the samples in LeftCol and AboveRow as follows:

  • If mode is equal to SMOOTH_PRED, the following ordered steps apply for i = 0..h-1, for j = 0..w-1:

    1. The array smWeightsX is set dependent on the value of log2W according to the following table:

      log2W smWeightsX
      2 Sm_Weights_Tx_4x4
      3 Sm_Weights_Tx_8x8
      4 Sm_Weights_Tx_16x16
      5 Sm_Weights_Tx_32x32
      6 Sm_Weights_Tx_64x64
    2. The array smWeightsY is set dependent on the value of log2H according to the following table:

      log2H smWeightsY
      2 Sm_Weights_Tx_4x4
      3 Sm_Weights_Tx_8x8
      4 Sm_Weights_Tx_16x16
      5 Sm_Weights_Tx_32x32
      6 Sm_Weights_Tx_64x64
    3. The variable smoothPred is set as follows:

      smoothPred =   smWeightsY[ i ] * AboveRow[ j ] +
                  ( 256 - smWeightsY[ i ] ) * LeftCol[ h - 1 ] +
                    smWeightsX[ j ] * LeftCol[ i ] +
                  ( 256 - smWeightsX[ j ] ) * AboveRow[ w - 1 ]
      
    4. pred[ i ][ j ] is set equal to Round2( smoothPred, 9 ).

  • Otherwise if mode is equal to SMOOTH_V_PRED, the following ordered steps apply for i = 0..h-1, for j = 0..w-1:

    1. The array smWeights is set dependent on the value of log2H according to the following table:

      log2H smWeights
      2 Sm_Weights_Tx_4x4
      3 Sm_Weights_Tx_8x8
      4 Sm_Weights_Tx_16x16
      5 Sm_Weights_Tx_32x32
      6 Sm_Weights_Tx_64x64
    2. The variable smoothPred is set as follows:

      smoothPred =   smWeights[ i ] * AboveRow[ j ] +
                  ( 256 - smWeights[ i ] ) * LeftCol[ h - 1 ]
      
    3. pred[ i ][ j ] is set equal to Round2( smoothPred, 8 ).

  • Otherwise (mode is equal to SMOOTH_H_PRED), the following ordered steps apply for i = 0..h-1, for j = 0..w-1:

    1. The array smWeights is set dependent on the value of log2W according to the following table:

      log2W smWeights
      2 Sm_Weights_Tx_4x4
      3 Sm_Weights_Tx_8x8
      4 Sm_Weights_Tx_16x16
      5 Sm_Weights_Tx_32x32
      6 Sm_Weights_Tx_64x64
    2. The variable smoothPred is set as follows:

      smoothPred =   smWeights[ j ] * LeftCol[ i ] +
                  ( 256 - smWeights[ j ] ) * AboveRow[ w - 1 ]
      
    3. pred[ i ][ j ] is set equal to Round2( smoothPred, 8 ).

The output of the process is the array pred.
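Note: the following non-normative Python sketch illustrates the SMOOTH_V_PRED case, with the weight table passed as a parameter rather than taken from the Sm_Weights_Tx_* tables. Because each weight and its complement sum to 256 and the result is rounded with Round2( smoothPred, 8 ), a flat neighbourhood is reproduced exactly.

```python
def round2(x, n):
    return (x + (1 << (n - 1))) >> n

def smooth_v_pred(above_row, left_col, w, h, sm_weights):
    # Non-normative sketch of SMOOTH_V_PRED: each sample blends
    # AboveRow[ j ] with the bottom-left sample LeftCol[ h - 1 ],
    # with the blend weight selected per row from sm_weights.
    pred = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            p = (sm_weights[i] * above_row[j]
                 + (256 - sm_weights[i]) * left_col[h - 1])
            pred[i][j] = round2(p, 8)
    return pred
```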

Filter corner process

This process uses a three tap filter to compute the value to be used for the top-left corner.

The variable s is set equal to LeftCol[ 0 ] * 5 + AboveRow[ -1 ] * 6 + AboveRow[ 0 ] * 5.

The output of this process is Round2(s, 4).
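Note: the filter taps sum to 16, so Round2( s, 4 ) leaves a flat corner unchanged. A non-normative Python sketch:

```python
def round2(x, n):
    return (x + (1 << (n - 1))) >> n

def filter_corner(left0, top_left, above0):
    # Non-normative 5/6/5 smoothing of the top-left corner sample.
    return round2(left0 * 5 + top_left * 6 + above0 * 5, 4)
```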

Intra filter type process

The input to this process is a variable plane specifying the color plane being processed.

The output of this process is a variable filterType that is set to 1 if either the block above or to the left uses a smooth prediction mode.

The process is specified as follows:

get_filter_type( plane ) {
  aboveSmooth = 0
  leftSmooth = 0
  if ( ( plane == 0 ) ? AvailU : AvailUChroma ) {
    r = MiRow - 1
    c = MiCol
    if ( plane > 0 ) {
        if ( subsampling_x && !( MiCol & 1 ) )
            c++
        if ( subsampling_y && ( MiRow & 1 ) )
            r--
    }
    aboveSmooth = is_smooth( r, c, plane )
  }
  if ( ( plane == 0 ) ? AvailL : AvailLChroma ) {
    r = MiRow
    c = MiCol - 1
    if ( plane > 0 ) {
        if ( subsampling_x && ( MiCol & 1 ) )
            c--
        if ( subsampling_y && !( MiRow & 1 ) )
            r++
    }
    leftSmooth = is_smooth( r, c, plane )
  }
  return aboveSmooth || leftSmooth
}

where the function is_smooth indicates if a prediction mode is one of the smooth intra modes and is specified as:

is_smooth( row, col, plane ) {
  if ( plane == 0 ) {
    mode = YModes[ row ][ col ]
  } else {
    if ( RefFrames[ row ][ col ][ 0 ] > INTRA_FRAME )
      return 0
    mode = UVModes[ row ][ col ]
  }
  return (mode == SMOOTH_PRED || mode == SMOOTH_V_PRED || mode == SMOOTH_H_PRED)
}
Intra edge filter strength selection process

The inputs to this process are:

  • a variable w containing the width of the transform in samples,

  • a variable h containing the height of the transform in samples,

  • a variable filterType equal to 0 or 1 that controls the strength of filtering,

  • a variable delta containing an angle difference in degrees.

The output is an intra edge filter strength from 0 to 3 inclusive.

The variable d is set equal to Abs( delta ).

The variable blkWh (containing the sum of the dimensions) is set equal to w + h.

The output variable strength is specified as follows:

strength = 0
if ( filterType == 0 ) {
    if ( blkWh <= 8 ) {
        if ( d >= 56 ) strength = 1
    } else if ( blkWh <= 12 ) {
        if ( d >= 40 ) strength = 1
    } else if ( blkWh <= 16 ) {
        if ( d >= 40 ) strength = 1
    } else if ( blkWh <= 24 ) {
        if ( d >= 8 ) strength = 1
        if ( d >= 16 ) strength = 2
        if ( d >= 32 ) strength = 3
    } else if ( blkWh <= 32 ) {
        strength = 1
        if ( d >= 4 ) strength = 2
        if ( d >= 32 ) strength = 3
    } else {
        strength = 3
    }
} else {
    if ( blkWh <= 8 ) {
        if ( d >= 40 ) strength = 1
        if ( d >= 64 ) strength = 2
    } else if ( blkWh <= 16 ) {
        if ( d >= 20 ) strength = 1
        if ( d >= 48 ) strength = 2
    } else if ( blkWh <= 24 ) {
        if ( d >= 4 ) strength = 3
    } else {
        strength = 3
    }
}
Intra edge upsample selection process

The inputs to this process are:

  • a variable w containing the width of the transform in samples,

  • a variable h containing the height of the transform in samples,

  • a variable filterType equal to 0 or 1 that controls the strength of filtering,

  • a variable delta containing an angle difference in degrees.

The output is a flag useUpsample that is true if upsampling should be applied to the edge.

The variable d is set equal to Abs( delta ).

The variable blkWh (containing the sum of the dimensions) is set equal to w + h.

The output variable useUpsample is specified as follows:

if ( d <= 0 || d >= 40 ) {
    useUpsample = 0
} else if ( filterType == 0 ) {
    useUpsample = (blkWh <= 16)
} else {
    useUpsample = (blkWh <= 8)
}
Intra edge upsample process

The inputs to this process are:

  • a variable numPx specifying the number of samples to filter,

  • a variable dir containing 0 when filtering the above samples, and 1 when filtering the left samples.

The outputs of this process are upsampled samples in the AboveRow and LeftCol arrays.

The variable buf is set depending on dir:

  • If dir is equal to 0, buf is set equal to a reference to AboveRow.

  • Otherwise (dir is equal to 1), buf is set equal to a reference to LeftCol.

Note: buf is a reference to either AboveRow or LeftCol. “reference” indicates that modifying values in buf modifies values in the original array.

When the process starts, entries -1 to numPx-1 are valid in buf and contain the original values. When the process completes, entries -2 to 2*numPx-2 are valid in buf and contain the upsampled values.

An array dup of length numPx+3 is generated by extending buf by one sample at the start and end as follows:

dup[ 0 ] = buf[ -1 ]
for ( i = -1; i < numPx; i++ ) {
    dup[ i + 2 ] = buf[ i ]
}
dup[ numPx + 2 ] = buf[ numPx - 1 ]

The upsampling process (modifying values in buf) is specified as follows:

buf[-2] = dup[0]
for ( i = 0; i < numPx; i++ ) {
    s = -dup[i] + (9 * dup[i + 1]) + (9 * dup[i + 2]) - dup[i + 3]
    s = Clip1( Round2(s, 4) )
    buf[ 2 * i - 1 ] = s
    buf[ 2 * i ] = dup[i + 2]
}
Intra edge filter process

The inputs to this process are:

  • a size sz (sz will always be less than or equal to 129),

  • a filter strength strength between 0 and 3 inclusive,

  • an edge direction left (when equal to 1, it specifies a vertical edge; when equal to 0, it specifies a horizontal edge).

The process filters the LeftCol (if left is equal to 1) or AboveRow (if left is equal to 0) arrays.

If strength is equal to 0, the process returns without modifying any arrays.

Otherwise (strength is not equal to 0), the array edge is derived by setting edge[ i ] equal to ( left ? LeftCol[ i - 1 ] : AboveRow[ i - 1 ] ) for i = 0..sz-1, and the following ordered steps apply for i = 1..sz-1:

  1. The variable s is set equal to 0.

  2. The following steps apply for j = 0..INTRA_EDGE_TAPS-1:

    a. The variable k is set equal to Clip3( 0, sz - 1, i - 2 + j ).

    b. The variable s is incremented by Intra_Edge_Kernel[ strength - 1 ][ j ] * edge[ k ].

  3. If left is equal to 1, LeftCol[ i - 1 ] is set equal to ( s + 8 ) >> 4.

  4. If left is equal to 0, AboveRow[ i - 1 ] is set equal to ( s + 8 ) >> 4.

The array Intra_Edge_Kernel is specified as follows:

Intra_Edge_Kernel[INTRA_EDGE_KERNELS][INTRA_EDGE_TAPS] = {
  { 0, 4, 8, 4, 0 },
  { 0, 5, 6, 5, 0 },
  { 2, 4, 4, 4, 2 }
}
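Note: the following non-normative Python sketch illustrates the process. The array buf stands in for LeftCol or AboveRow, with buf[ 0 ] playing the role of index -1; as in the steps above, every filter tap reads from the unfiltered snapshot edge, not from the partially filtered output.

```python
INTRA_EDGE_TAPS = 5
INTRA_EDGE_KERNEL = [
    [0, 4, 8, 4, 0],
    [0, 5, 6, 5, 0],
    [2, 4, 4, 4, 2],
]

def intra_edge_filter(buf, sz, strength):
    # Non-normative sketch of the intra edge filter; buf is modified
    # in place (and returned for convenience).
    if strength == 0:
        return buf
    edge = buf[:sz]  # unfiltered snapshot: edge[i] = buf[i]
    for i in range(1, sz):
        s = 0
        for j in range(INTRA_EDGE_TAPS):
            # Clamp the tap position to the valid edge range.
            k = min(max(i - 2 + j, 0), sz - 1)
            s += INTRA_EDGE_KERNEL[strength - 1][j] * edge[k]
        buf[i] = (s + 8) >> 4  # taps sum to 16
    return buf
```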

Inter prediction process

General

The inter prediction process is invoked for inter coded blocks and interintra blocks. The inputs to this process are:

  • a variable plane specifying which plane is being predicted,

  • variables x and y specifying the location of the top left sample in the CurrFrame[ plane ] array of the region to be predicted,

  • variables w and h specifying the width and height of the region to be predicted,

  • variables candRow and candCol specifying the location (in units of 4x4 blocks) of the motion vector information to be used.

The outputs of this process are predicted samples in the current frame CurrFrame.

This process is triggered by a function call to predict_inter.

The variable isCompound is set equal to RefFrames[ candRow ][ candCol ][ 1 ] > INTRA_FRAME.

The prediction arrays are formed by the following ordered steps:

  1. The rounding variables derivation process specified in section 7.11.3.2 is invoked with the variable isCompound as input.

  2. If plane is equal to 0 and motion_mode is equal to LOCALWARP, the warp estimation process in section 7.11.3.8 is invoked.

  3. If plane is equal to 0 and motion_mode is equal to LOCALWARP and LocalValid is equal to 1, the setup shear process specified in section 7.11.3.6 is invoked with LocalWarpParams as input, and the output warpValid is assigned to LocalValid (the other outputs are discarded).

  4. The variable refList is set equal to 0.

  5. The variable refFrame is set equal to RefFrames[ candRow ][ candCol ][ refList ].

  6. If (YMode == GLOBALMV || YMode == GLOBAL_GLOBALMV) and GmType[ refFrame ] > TRANSLATION, the setup shear process specified in section 7.11.3.6 is invoked with gm_params[ refFrame ] as input, and the output warpValid is assigned to globalValid (the other outputs are discarded).

  7. The variable useWarp (a value of 1 indicates local warping, 2 indicates global warping) is derived as follows:

    • If w < 8 or h < 8, useWarp is set equal to 0.

    • Otherwise, if force_integer_mv is equal to 1, useWarp is set equal to 0.

    • Otherwise, if motion_mode is equal to LOCALWARP and LocalValid is equal to 1, useWarp is set equal to 1.

    • Otherwise, if all of the following are true, useWarp is set equal to 2.

      • (YMode == GLOBALMV || YMode == GLOBAL_GLOBALMV).

      • GmType[ refFrame ] > TRANSLATION.

      • is_scaled( refFrame ) is equal to 0.

      • globalValid is equal to 1.

    • Otherwise, useWarp is set equal to 0.

  8. The motion vector array mv is set equal to Mvs[ candRow ][ candCol ][ refList ].

  9. The variable refIdx specifying which reference frame is being used is set as follows:

    • If use_intrabc is equal to 0, refIdx is set equal to ref_frame_idx[ refFrame - LAST_FRAME ].

    • Otherwise (use_intrabc is equal to 1), refIdx is set equal to -1 and RefFrameWidth[ -1 ] is set equal to FrameWidth, RefFrameHeight[ -1 ] is set equal to FrameHeight, and RefUpscaledWidth[ -1 ] is set equal to UpscaledWidth. (These values ensure that the motion vector scaling has no effect.)

  10. The motion vector scaling process in section 7.11.3.3 is invoked with plane, refIdx, x, y, mv as inputs and the output being the initial location startX, startY, and the step sizes stepX, stepY.

  11. If use_intrabc is equal to 1, RefFrameWidth[ -1 ] is set equal to MiCols * MI_SIZE, RefFrameHeight[ -1 ] is set equal to MiRows * MI_SIZE, and RefUpscaledWidth[ -1 ] is set equal to MiCols * MI_SIZE. (These values are needed to avoid intrabc prediction being cropped to the frame boundaries.)

  12. If useWarp is not equal to 0, the block warp process in section 7.11.3.5 is invoked with useWarp, plane, refList, x, y, i8, j8, w, h as inputs and the output is merged into the 2D array preds[ refList ] for i8 = 0..((h-1) >> 3) and for j8 = 0..((w-1) >> 3). (Each invocation fills in a block of output of size w by h at x offset j8 * 8 and y offset i8 * 8.)

  13. If useWarp is equal to 0, the block inter prediction process in section 7.11.3.4 is invoked with plane, refIdx, startX, startY, stepX, stepY, w, h, candRow, candCol as inputs and the output is assigned to the 2D array preds[ refList ].

  14. If isCompound is equal to 1, then the variable refList is set equal to 1 and steps 5 to 13 are repeated to form the prediction for the second reference.

An array named Mask is prepared as follows:

  • If compound_type is equal to COMPOUND_WEDGE and plane is equal to 0, the wedge mask process in section 7.11.3.11 is invoked with w, h as inputs.

  • Otherwise if compound_type is equal to COMPOUND_INTRA, the intra mode variant mask process in section 7.11.3.13 is invoked with w, h as inputs.

  • Otherwise if compound_type is equal to COMPOUND_DIFFWTD and plane is equal to 0, the difference weight mask process in section 7.11.3.12 is invoked with preds, w, h as inputs.

  • Otherwise, no mask array is needed.

If compound_type is equal to COMPOUND_DISTANCE, the distance weights process in section 7.11.3.15 is invoked with candRow and candCol as inputs.

The inter predicted samples are then derived as follows:

  • If isCompound is equal to 0 and IsInterIntra is equal to 0, CurrFrame[ plane ][ y + i ][ x + j ] is set equal to Clip1( preds[ 0 ][ i ][ j ] ) for i = 0..h-1 and j = 0..w-1.

  • Otherwise if compound_type is equal to COMPOUND_AVERAGE, CurrFrame[ plane ][ y + i ][ x + j ] is set equal to Clip1( Round2( preds[ 0 ][ i ][ j ] + preds[ 1 ][ i ][ j ], 1 + InterPostRound ) ) for i = 0..h-1 and j = 0..w-1.

  • Otherwise if compound_type is equal to COMPOUND_DISTANCE, CurrFrame[ plane ][ y + i ][ x + j ] is set equal to Clip1( Round2( FwdWeight * preds[ 0 ][ i ][ j ] + BckWeight * preds[ 1 ][ i ][ j ], 4 + InterPostRound ) ) for i = 0..h-1 and j = 0..w-1.

  • Otherwise, the mask blend process in section 7.11.3.14 is invoked with preds, plane, x, y, w, h as inputs.

If motion_mode is equal to OBMC, the overlapped motion compensation in section 7.11.3.9 is invoked with plane, w, h as inputs.

Rounding variables derivation process

The input to this process is a variable isCompound.

The rounding variables InterRound0, InterRound1, and InterPostRound are derived as follows:

  • InterRound0 (representing the amount to round by after horizontal filtering) is set equal to 3.

  • InterRound1 (representing the amount to round by after vertical filtering) is set equal to ( isCompound ? 7 : 11).

  • If BitDepth is equal to 12, InterRound0 is set equal to InterRound0 + 2.

  • If BitDepth is equal to 12 and isCompound is equal to 0, InterRound1 is set equal to InterRound1 - 2.

  • InterPostRound (representing the amount to round by at the end of the prediction process) is set equal to 2 * FILTER_BITS - ( InterRound0 + InterRound1 ).

Note: The rounding is chosen to ensure that the output of the horizontal filter always fits within 16 bits.
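Note: the derivation above can be summarized in a short non-normative Python sketch. FILTER_BITS is equal to 7, so the two filter stages together scale the result by 2 * FILTER_BITS = 14 bits; InterPostRound removes whatever precision the two rounding stages have not already removed.

```python
FILTER_BITS = 7  # precision of each subpel filter stage

def derive_rounding(is_compound, bit_depth):
    # Non-normative sketch of the rounding variables derivation process.
    inter_round0 = 3
    inter_round1 = 7 if is_compound else 11
    if bit_depth == 12:
        inter_round0 += 2
        if not is_compound:
            inter_round1 -= 2
    inter_post_round = 2 * FILTER_BITS - (inter_round0 + inter_round1)
    return inter_round0, inter_round1, inter_post_round
```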

Motion vector scaling process

The inputs to this process are:

  • a variable plane specifying which plane is being predicted,

  • a variable refIdx specifying which reference frame is being used,

  • variables x and y specifying the location of the top left sample in the CurrFrame[ plane ] array of the region to be predicted,

  • a variable mv specifying the clamped motion vector (in units of 1/8 th of a luma sample, i.e. with 3 fractional bits).

The outputs of this process are the variables startX and startY giving the reference block location in units of 1/1024 th of a sample, and variables stepX and stepY giving the step size in units of 1/1024 th of a sample.

This process is responsible for computing the sampling locations in the reference frame based on the motion vector. The sampling locations are also adjusted to compensate for any difference in the size of the reference frame compared to the current frame.

Note: When intra block copy is being used, refIdx will be equal to -1 to signal prediction from the frame currently being decoded. The arrays RefFrameWidth, RefFrameHeight, and RefUpscaledWidth include values at index -1 giving the dimensions of the current frame.

It is a requirement of bitstream conformance that all the following conditions are satisfied:

  • 2 * FrameWidth >= RefUpscaledWidth[ refIdx ]

  • 2 * FrameHeight >= RefFrameHeight[ refIdx ]

  • FrameWidth <= 16 * RefUpscaledWidth[ refIdx ]

  • FrameHeight <= 16 * RefFrameHeight[ refIdx ]

A variable xScale is set equal to ( ( RefUpscaledWidth[ refIdx ] << REF_SCALE_SHIFT ) + ( FrameWidth / 2 ) ) / FrameWidth.

A variable yScale is set equal to ( ( RefFrameHeight[ refIdx ] << REF_SCALE_SHIFT ) + ( FrameHeight / 2 ) ) / FrameHeight.

(xScale and yScale specify the size of the reference frame relative to the current frame in units where (1 << 14) is equivalent to both frames having the same size.)

The variables subX and subY are set equal to the subsampling for the current plane as follows:

  • If plane is equal to 0, subX is set equal to 0 and subY is set equal to 0.

  • Otherwise, subX is set equal to subsampling_x and subY is set equal to subsampling_y.

The variable halfSample (representing half the size of a sample in units of 1/16 th of a sample) is set equal to ( 1 << ( SUBPEL_BITS - 1 ) ).

The variable origX is set equal to ( (x << SUBPEL_BITS) + ( ( 2 * mv[1] ) >> subX ) + halfSample ).

The variable origY is set equal to ( (y << SUBPEL_BITS) + ( ( 2 * mv[0] ) >> subY ) + halfSample ).

(origX and origY specify the location of the centre of the sample at the top-left corner of the reference block in the current frame’s coordinate system in units of 1/16 th of a sample, i.e. with SUBPEL_BITS=4 fractional bits.)

The variable baseX is set equal to (origX * xScale - ( halfSample << REF_SCALE_SHIFT ) ).

The variable baseY is set equal to (origY * yScale - ( halfSample << REF_SCALE_SHIFT ) ).

(baseX and baseY specify the location of the top-left corner of the block in the reference frame in the reference frame’s coordinate system with 18 fractional bits.)

The variable off (containing a rounding offset for the filter tap selection) is set equal to ( ( 1 << (SCALE_SUBPEL_BITS - SUBPEL_BITS) ) / 2 ).

The output variable startX is set equal to (Round2Signed( baseX, REF_SCALE_SHIFT + SUBPEL_BITS - SCALE_SUBPEL_BITS) + off).

The output variable startY is set equal to (Round2Signed( baseY, REF_SCALE_SHIFT + SUBPEL_BITS - SCALE_SUBPEL_BITS) + off).

(startX and startY specify the location of the top-left corner of the block in the reference frame in the reference frame’s coordinate system with SCALE_SUBPEL_BITS=10 fractional bits.)

The output variable stepX is set equal to Round2Signed( xScale, REF_SCALE_SHIFT - SCALE_SUBPEL_BITS).

The output variable stepY is set equal to Round2Signed( yScale, REF_SCALE_SHIFT - SCALE_SUBPEL_BITS).

(stepX and stepY are the size of one current frame sample in the reference frame’s coordinate system with 10 fractional bits.)
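Note: the following non-normative Python sketch traces the scaling arithmetic for the luma plane (subX and subY equal to 0). When the reference and current frames have the same size, xScale is equal to 1 << REF_SCALE_SHIFT and stepX is equal to 1 << SCALE_SUBPEL_BITS, i.e. one full-sample step per output sample.

```python
REF_SCALE_SHIFT = 14
SUBPEL_BITS = 4
SCALE_SUBPEL_BITS = 10

def round2_signed(x, n):
    r = (abs(x) + (1 << (n - 1))) >> n
    return r if x >= 0 else -r

def mv_scale(x, y, mv_row, mv_col, frame_w, frame_h, ref_w, ref_h):
    # Non-normative sketch of the motion vector scaling process for the
    # luma plane; mv_row/mv_col are in 1/8-sample units (3 fractional bits).
    x_scale = ((ref_w << REF_SCALE_SHIFT) + frame_w // 2) // frame_w
    y_scale = ((ref_h << REF_SCALE_SHIFT) + frame_h // 2) // frame_h
    half = 1 << (SUBPEL_BITS - 1)
    # Centre of the top-left sample in 1/16-sample units.
    orig_x = (x << SUBPEL_BITS) + 2 * mv_col + half
    orig_y = (y << SUBPEL_BITS) + 2 * mv_row + half
    # Top-left corner in the reference frame, 18 fractional bits.
    base_x = orig_x * x_scale - (half << REF_SCALE_SHIFT)
    base_y = orig_y * y_scale - (half << REF_SCALE_SHIFT)
    off = (1 << (SCALE_SUBPEL_BITS - SUBPEL_BITS)) // 2
    shift = REF_SCALE_SHIFT + SUBPEL_BITS - SCALE_SUBPEL_BITS  # = 8
    start_x = round2_signed(base_x, shift) + off
    start_y = round2_signed(base_y, shift) + off
    step_x = round2_signed(x_scale, REF_SCALE_SHIFT - SCALE_SUBPEL_BITS)
    step_y = round2_signed(y_scale, REF_SCALE_SHIFT - SCALE_SUBPEL_BITS)
    return start_x, start_y, step_x, step_y
```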

Block inter prediction process

The inputs to this process are:

  • a variable plane,

  • a variable refIdx specifying which reference frame is being used (or -1 for intra block copy),

  • variables x and y giving the block location in units of 1/1024 th of a sample,

  • variables xStep and yStep giving the step size in units of 1/1024 th of a sample,

  • variables w and h giving the width and height of the block in units of samples,

  • variables candRow and candCol specifying the location (in units of 4x4 blocks) of the motion vector information to be used.

The output from this process is the 2D array named pred containing inter predicted samples.

A variable ref specifying the reference frame contents is set as follows:

  • If refIdx is equal to -1, ref is set equal to CurrFrame.

  • Otherwise (refIdx is greater than or equal to 0), ref is set equal to FrameStore[ refIdx ].

The variables subX and subY are set equal to the subsampling for the current plane as follows:

  • If plane is equal to 0, subX is set equal to 0 and subY is set equal to 0.

  • Otherwise, subX is set equal to subsampling_x and subY is set equal to subsampling_y.

The variable lastX is set equal to ( (RefUpscaledWidth[ refIdx ] + subX) >> subX) - 1.

The variable lastY is set equal to ( (RefFrameHeight[ refIdx ] + subY) >> subY) - 1.

(lastX and lastY specify the coordinates of the bottom right sample of the reference plane.)

The variable intermediateHeight specifying the height required for the intermediate array is set equal to (((h - 1) * yStep + (1 << SCALE_SUBPEL_BITS) - 1) >> SCALE_SUBPEL_BITS) + 8.

The sub-sample interpolation is effected via two one-dimensional convolutions. First a horizontal filter is used to build up a temporary array, and then this array is vertically filtered to obtain the final prediction. The fractional parts of the motion vectors determine the filtering process. If the fractional part is zero, then the filtering is equivalent to a straight sample copy.

The filtering is applied as follows:

  • The array intermediate is specified as follows:

    interpFilter = InterpFilters[ candRow ][ candCol ][ 1 ]
    if ( w <= 4 ) {
        if ( interpFilter == EIGHTTAP || interpFilter == EIGHTTAP_SHARP ) {
            interpFilter = 4
        } else if ( interpFilter == EIGHTTAP_SMOOTH ) {
            interpFilter = 5
        }
    }
    for ( r = 0; r < intermediateHeight; r++ ) {
        for ( c = 0; c < w; c++ ) {
            s = 0
            p = x + xStep * c
            for ( t = 0; t < 8; t++ )
                s += Subpel_Filters[ interpFilter ][ (p >> 6) & SUBPEL_MASK ][ t ] *
                  ref[ plane ] [ Clip3( 0, lastY, (y >> 10) + r - 3 ) ]
                               [ Clip3( 0, lastX, (p >> 10) + t - 3 ) ]
            intermediate[ r ][ c ] = Round2(s, InterRound0)
        }
    }
    
  • The array pred is specified as follows:

    interpFilter = InterpFilters[ candRow ][ candCol ][ 0 ]
    if ( h <= 4 ) {
        if ( interpFilter == EIGHTTAP || interpFilter == EIGHTTAP_SHARP ) {
            interpFilter = 4
        } else if ( interpFilter == EIGHTTAP_SMOOTH ) {
            interpFilter = 5
        }
    }
    for ( r = 0; r < h; r++ ) {
        for ( c = 0; c < w; c++ ) {
            s = 0
            p = (y & 1023) + yStep * r
            for ( t = 0; t < 8; t++ )
                s += Subpel_Filters[ interpFilter ][ (p >> 6) & SUBPEL_MASK ][ t ] *
                  intermediate[ (p >> 10) + t ][ c ]
            pred[ r ][ c ] = Round2(s, InterRound1)
        }
    }
    

    where the constant array Subpel_Filters is specified as:

    Subpel_Filters[ 6 ][ 16 ][ 8 ] = {
      {
        { 0, 0, 0, 128, 0, 0, 0, 0 },
        { 0, 2, -6, 126, 8, -2, 0, 0 },
        { 0, 2, -10, 122, 18, -4, 0, 0 },
        { 0, 2, -12, 116, 28, -8, 2, 0 },
        { 0, 2, -14, 110, 38, -10, 2, 0 },
        { 0, 2, -14, 102, 48, -12, 2, 0 },
        { 0, 2, -16, 94, 58, -12, 2, 0 },
        { 0, 2, -14, 84, 66, -12, 2, 0 },
        { 0, 2, -14, 76, 76, -14, 2, 0 },
        { 0, 2, -12, 66, 84, -14, 2, 0 },
        { 0, 2, -12, 58, 94, -16, 2, 0 },
        { 0, 2, -12, 48, 102, -14, 2, 0 },
        { 0, 2, -10, 38, 110, -14, 2, 0 },
        { 0, 2, -8, 28, 116, -12, 2, 0 },
        { 0, 0, -4, 18, 122, -10, 2, 0 },
        { 0, 0, -2, 8, 126, -6, 2, 0 }
      },
      {
        { 0, 0, 0, 128, 0, 0, 0, 0 },
        { 0, 2, 28, 62, 34, 2, 0, 0 },
        { 0, 0, 26, 62, 36, 4, 0, 0 },
        { 0, 0, 22, 62, 40, 4, 0, 0 },
        { 0, 0, 20, 60, 42, 6, 0, 0 },
        { 0, 0, 18, 58, 44, 8, 0, 0 },
        { 0, 0, 16, 56, 46, 10, 0, 0 },
        { 0, -2, 16, 54, 48, 12, 0, 0 },
        { 0, -2, 14, 52, 52, 14, -2, 0 },
        { 0, 0, 12, 48, 54, 16, -2, 0 },
        { 0, 0, 10, 46, 56, 16, 0, 0 },
        { 0, 0, 8, 44, 58, 18, 0, 0 },
        { 0, 0, 6, 42, 60, 20, 0, 0 },
        { 0, 0, 4, 40, 62, 22, 0, 0 },
        { 0, 0, 4, 36, 62, 26, 0, 0 },
        { 0, 0, 2, 34, 62, 28, 2, 0 }
      },
      {
        { 0, 0, 0, 128, 0, 0, 0, 0 },
        { -2, 2, -6, 126, 8, -2,