av1-mpeg2-ts

AV1 specification for carriage inside MPEG-2 TS

NOTA BENE: this is a work-in-progress specification

Copyright 2021, The Alliance for Open Media

Licensing information is available at http://aomedia.org/license/

The MATERIALS ARE PROVIDED “AS IS.” The Alliance for Open Media, its members, and its contributors expressly disclaim any warranties (express, implied, or otherwise), including implied warranties of merchantability, non-infringement, fitness for a particular purpose, or title, related to the materials. The entire risk as to implementing or otherwise using the materials is assumed by the implementer and user. IN NO EVENT WILL THE ALLIANCE FOR OPEN MEDIA, ITS MEMBERS, OR CONTRIBUTORS BE LIABLE TO ANY OTHER PARTY FOR LOST PROFITS OR ANY FORM OF INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER FROM ANY CAUSES OF ACTION OF ANY KIND WITH RESPECT TO THIS DELIVERABLE OR ITS GOVERNING AGREEMENT, WHETHER BASED ON BREACH OF CONTRACT, TORT (INCLUDING NEGLIGENCE), OR OTHERWISE, AND WHETHER OR NOT THE OTHER MEMBER HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

1. Introduction

This document specifies how to carry AV1 video elementary streams in the MPEG-2 Transport Stream format. It defines the carriage of AV1 in a single PID, assuming buffer model information from the first operating point. It may not be optimal for layered streams or streams with multiple operating points. Future versions may incorporate this capability.

In the present document “shall”, “shall not”, “should”, “should not”, “may”, “need not”, “will”, “will not”, “can” and “cannot” are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of provisions).

2. References

2.1 Normative references

Referenced normative documents:

2.2 Informative references

So far, none.

2.3 Definitions

3. Descriptor

3.1 Registration Descriptor

The presence of a Registration Descriptor, as defined in MPEG-2 TS, is mandatory with the format_identifier field set to ‘AV01’ (A-V-0-1). The Registration Descriptor shall be the first in the PMT loop and included before an AV1 video descriptor.
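As an illustration, a Registration Descriptor carrying this format_identifier could be serialized as follows. This is a minimal sketch, assuming the descriptor_tag value 0x05 that MPEG-2 Systems assigns to the registration descriptor and no additional_identification_info bytes; the function name is hypothetical and not part of this specification.

```python
def build_registration_descriptor(format_identifier: bytes = b"AV01") -> bytes:
    """Serialize an MPEG-2 registration descriptor (illustrative helper)."""
    if len(format_identifier) != 4:
        raise ValueError("format_identifier must be exactly 4 bytes")
    descriptor_tag = 0x05     # registration descriptor tag in MPEG-2 Systems
    descriptor_length = 4     # only the 32-bit format_identifier follows
    return bytes([descriptor_tag, descriptor_length]) + format_identifier


print(build_registration_descriptor().hex())  # 050441563031
```

The six resulting bytes would be placed first in the elementary stream descriptor loop of the PMT, ahead of the AV1 video descriptor.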

3.2 AV1 video descriptor

A DVB private data specifier descriptor shall be used beforehand, with a private data specifier registered at https://www.dvbservices.com/identifiers/private_data_spec_id.

The AV1 video descriptor is based on a “Private data specifier descriptor” in ETSI EN 300 468.

Note: the group is discussing whether to use a private descriptor or a DVB extension descriptor

For an AV1 video stream, the AV1 video descriptor provides basic information for identifying coding parameters, such as profile and level parameters of that AV1 video stream. The same data structure as AV1CodecConfigurationRecord in ISOBMFF is used to aid conversion between the two formats, EXCEPT that two of the reserved bits are used for HDR/WCG identification.

Syntax                                          No. of bits   Mnemonic
AV1_descriptor() {
    descriptor_tag                              8             uimsbf
    descriptor_length                           8             uimsbf
    private_data_specifier                      32            uimsbf
    marker                                      1             bslbf
    version                                     7             uimsbf
    seq_profile                                 3             uimsbf
    seq_level_idx_0                             5             uimsbf
    seq_tier_0                                  1             bslbf
    high_bitdepth                               1             bslbf
    twelve_bit                                  1             bslbf
    monochrome                                  1             bslbf
    chroma_subsampling_x                        1             bslbf
    chroma_subsampling_y                        1             bslbf
    chroma_sample_position                      2             uimsbf
    hdr_wcg_idc                                 2             uimsbf
    reserved_zeros                              1             bslbf
    initial_presentation_delay_present          1             bslbf
    if (initial_presentation_delay_present) {
        initial_presentation_delay_minus_one    4             uimsbf
    } else {
        reserved_zeros                          4             uimsbf
    }
}
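The bit layout in the syntax table can be read back with a few shifts and masks. The following is a minimal sketch, assuming the descriptor arrives as a byte string that starts at descriptor_tag; the function name and the returned dictionary are illustrative choices, and the all-zero private_data_specifier in the example is only a placeholder because the real value is not yet assigned.

```python
def parse_av1_descriptor(data: bytes) -> dict:
    """Parse AV1_descriptor() fields in syntax-table order (illustrative)."""
    if len(data) < 10:
        raise ValueError("AV1 video descriptor needs at least 10 bytes")
    b0, b1, b2, b3 = data[6:10]  # the four bytes after private_data_specifier
    fields = {
        "descriptor_tag": data[0],
        "descriptor_length": data[1],
        "private_data_specifier": int.from_bytes(data[2:6], "big"),
        "marker": b0 >> 7,
        "version": b0 & 0x7F,
        "seq_profile": b1 >> 5,
        "seq_level_idx_0": b1 & 0x1F,
        "seq_tier_0": (b2 >> 7) & 1,
        "high_bitdepth": (b2 >> 6) & 1,
        "twelve_bit": (b2 >> 5) & 1,
        "monochrome": (b2 >> 4) & 1,
        "chroma_subsampling_x": (b2 >> 3) & 1,
        "chroma_subsampling_y": (b2 >> 2) & 1,
        "chroma_sample_position": b2 & 0x3,
        "hdr_wcg_idc": b3 >> 6,
        # bit 5 of b3 is reserved_zeros
        "initial_presentation_delay_present": (b3 >> 4) & 1,
    }
    if fields["initial_presentation_delay_present"]:
        fields["initial_presentation_delay_minus_one"] = b3 & 0x0F
    return fields


# Placeholder private_data_specifier of 0x00000000 (hypothetical value).
example = bytes.fromhex("5f0800000000" "81080c00")
parsed = parse_av1_descriptor(example)
print(parsed["seq_profile"], parsed["seq_level_idx_0"], parsed["hdr_wcg_idc"])
```

The four bytes after private_data_specifier follow the same field order as AV1CodecConfigurationRecord, which is what makes conversion to and from ISOBMFF straightforward.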

3.3 Semantic definition of fields in AV1 video descriptor

descriptor_tag - This value shall be set to 0x5F.

private_data_specifier - This value shall be set to FIXME.

marker - This value shall be set to 1.

version - This field indicates the version of the AV1_Descriptor. This value shall be set to 1.

seq_profile, seq_level_idx_0 and high_bitdepth - These fields shall be coded according to the semantics defined in AV1 Bitstream and Decoding Process Specification. If these fields are not coded in the Sequence Header OBU in the AV1 video stream, the inferred values are coded in the descriptor.

seq_tier_0, twelve_bit, monochrome, chroma_subsampling_x, chroma_subsampling_y, chroma_sample_position - These fields shall be coded according to the semantics defined in AV1 Bitstream and Decoding Process Specification when they are present in the Sequence Header OBU. If they are not present, they will be coded using the value inferred by those semantics.

hdr_wcg_idc - The value of this syntax element indicates the presence or absence of high dynamic range (HDR) and/or wide color gamut (WCG) video components in the associated PID according to the table below. HDR is defined to be video that has high dynamic range if the video stream EOTF is higher than the Rec. ITU-R BT.1886 reference EOTF. WCG is defined to be video that is coded using colour primaries with a colour gamut not contained within Rec. ITU-R BT.709.

hdr_wcg_idc Description
0 SDR, i.e., video is based on the Rec. ITU-R BT.1886 reference EOTF with a color gamut that is contained within Rec. ITU-R BT.709 with a Rec. ITU-R BT.709 container
1 WCG only, i.e., video color gamut in a Rec. ITU-R BT.2020 container that exceeds Rec. ITU-R BT.709
2 Both HDR and WCG are to be indicated in the stream
3 No indication made regarding HDR/WCG or SDR characteristics of the stream

reserved_zeros - Will be set to zeroes.

initial_presentation_delay_present - Indicates whether the initial_presentation_delay_minus_one field is present.

initial_presentation_delay_minus_one - Ignored for MPEG-2 TS use, included only to aid conversion to/from ISOBMFF.
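The semantics above can be applied when serializing the descriptor. The sketch below is one possible way to do it, not a reference implementation: the function name and argument defaults are assumptions, and private_data_specifier must be supplied by the caller because its value is not yet defined in this document.

```python
from typing import Optional


def build_av1_descriptor(private_data_specifier: int, seq_profile: int,
                         seq_level_idx_0: int, hdr_wcg_idc: int,
                         seq_tier_0: int = 0, high_bitdepth: int = 0,
                         twelve_bit: int = 0, monochrome: int = 0,
                         chroma_subsampling_x: int = 1,
                         chroma_subsampling_y: int = 1,
                         chroma_sample_position: int = 0,
                         initial_presentation_delay_minus_one: Optional[int] = None
                         ) -> bytes:
    """Serialize AV1_descriptor() with the fixed values from clause 3.3."""
    if not (0 <= seq_profile < 8 and 0 <= seq_level_idx_0 < 32
            and 0 <= hdr_wcg_idc < 4 and 0 <= chroma_sample_position < 4):
        raise ValueError("field value out of range")
    marker, version = 1, 1                  # both fixed by the semantics
    delay_present = initial_presentation_delay_minus_one is not None
    delay_bits = initial_presentation_delay_minus_one if delay_present else 0
    if not 0 <= delay_bits < 16:
        raise ValueError("initial_presentation_delay_minus_one is 4 bits")
    body = private_data_specifier.to_bytes(4, "big") + bytes([
        (marker << 7) | version,
        (seq_profile << 5) | seq_level_idx_0,
        (seq_tier_0 << 7) | (high_bitdepth << 6) | (twelve_bit << 5)
        | (monochrome << 4) | (chroma_subsampling_x << 3)
        | (chroma_subsampling_y << 2) | chroma_sample_position,
        # reserved_zeros (bit 5) stays 0
        (hdr_wcg_idc << 6) | (int(delay_present) << 4) | delay_bits,
    ])
    return bytes([0x5F, len(body)]) + body  # descriptor_tag, descriptor_length


# 0x00000000 is a placeholder private_data_specifier (hypothetical value).
print(build_av1_descriptor(0x00000000, seq_profile=0, seq_level_idx_0=8,
                           hdr_wcg_idc=0).hex())  # 5f080000000081080c00
```

The defaults describe an 8-bit 4:2:0 stream; a multiplexer would normally copy every value from the Sequence Header OBU rather than rely on them.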

4 Carriage of AV1

4.1 Constraints for the transport of AV1

For AV1 video streams, the following constraints additionally apply:

4.2 Carriage in PES packets

AV1 video, as defined in AV1 Bitstream & Decoding Process Specification, is carried in PES packets as PES_packet_data_bytes, using the stream_id 0xBD (private_stream_1).

The highest level that may occur in an AV1 video stream, as well as a profile and tier that the entire stream conforms to, shall be signalled using the AV1 video descriptor.

If an AV1 video descriptor is associated with an AV1 video stream, then this descriptor shall be conveyed in the descriptor loop for the respective elementary stream entry in the program map table. This specification does not specify the presentation of AV1 Bitstream & Decoding Process Specification streams in the context of a program stream.

For PES packetization, no specific data alignment constraints apply, except when random_access_indicator is set to 1. random_access_indicator shall be set to 1 when the PES packet contains an elementary stream access point. In AV1, an elementary stream access point is the first byte of a Key Frame or a Delayed Key Frame. When random_access_indicator is set, a PES packet shall start, and data_alignment_indicator shall be set to 1 in its header.

When error resilience is a consideration, it is recommended to place one, and only one, AV1 access unit per PES packet, and to set data_alignment_indicator to 1 in all PES packets.

Usage of data_stream_alignment_descriptor is not specified and the only allowed alignment_type is 1 (Access unit level). Future versions of this specification may define other values.

elementary_stream_priority_indicator may be set to ‘1’ if the payload contains one or more bytes from an AV1 intra frame. A value of ‘0’ indicates that the payload has the same priority as all other packets which do not have elementary_stream_priority_indicator set to ‘1’.
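The packetization rules can be sketched as a small decision function. This example assumes the recommended error-resilient layout of exactly one AV1 access unit per PES packet; the class and function names are hypothetical, and setting elementary_stream_priority_indicator for intra frames is an optional choice made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class AccessUnit:
    starts_with_key_frame: bool   # first byte is a Key Frame or Delayed Key Frame
    contains_intra_frame: bool    # payload holds bytes of an AV1 intra frame


@dataclass
class PacketFlags:
    random_access_indicator: int               # TS adaptation field
    elementary_stream_priority_indicator: int  # TS adaptation field
    data_alignment_indicator: int              # PES packet header


def flags_for_access_unit(au: AccessUnit) -> PacketFlags:
    """Derive flags for a PES packet that carries exactly one access unit."""
    return PacketFlags(
        random_access_indicator=1 if au.starts_with_key_frame else 0,
        # Optional ("may") in the text; set here whenever intra data is present.
        elementary_stream_priority_indicator=1 if au.contains_intra_frame else 0,
        # One access unit per PES packet, so every packet is aligned.
        data_alignment_indicator=1,
    )


print(flags_for_access_unit(AccessUnit(True, True)))
print(flags_for_access_unit(AccessUnit(False, False)))
```

With this layout data_alignment_indicator is always 1, so the additional constraint that applies when random_access_indicator is set is met automatically.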

For synchronization and STD management, PTSs and, when appropriate, DTSs are encoded in the header of the PES packet that carries the AV1 video stream data. For PTS and DTS encoding, the constraints and semantics apply as defined in the PES Header and associated constraints on timestamp intervals.
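For reference, the 33-bit PTS and DTS values are stored in the PES header as five bytes each, with a 4-bit prefix and three marker bits. The sketch below follows that MPEG-2 Systems layout; the function names are hypothetical, and omitting the DTS when it equals the PTS is a common multiplexer choice rather than a requirement of this document.

```python
from typing import Optional, Tuple


def encode_timestamp(prefix: int, ts: int) -> bytes:
    """Encode a 33-bit 90 kHz timestamp into the 5-byte PES header form."""
    ts &= (1 << 33) - 1
    return bytes([
        (prefix << 4) | (((ts >> 30) & 0x07) << 1) | 1,  # bits 32..30 + marker
        (ts >> 22) & 0xFF,                               # bits 29..22
        (((ts >> 15) & 0x7F) << 1) | 1,                  # bits 21..15 + marker
        (ts >> 7) & 0xFF,                                # bits 14..7
        ((ts & 0x7F) << 1) | 1,                          # bits 6..0 + marker
    ])


def encode_pts_dts(pts: int, dts: Optional[int] = None) -> Tuple[int, bytes]:
    """Return (PTS_DTS_flags, timestamp bytes) for a PES packet header."""
    if dts is None or dts == pts:
        return 0b10, encode_timestamp(0b0010, pts)
    return 0b11, encode_timestamp(0b0011, pts) + encode_timestamp(0b0001, dts)


flags, raw = encode_pts_dts(90000)  # one second at 90 kHz
print(flags, raw.hex())             # 2 210005bf21
```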

4.3 Buffer Pool management

Carriage of an AV1 video stream over MPEG-2 TS does not impact the size of the Buffer Pool.

For decoding of an AV1 video stream in the STD, the size of the Buffer Pool is as defined in AV1 Bitstream & Decoding Process Specification. The Buffer Pool shall be managed as specified in Annex E of AV1 Bitstream & Decoding Process Specification.

A decoded AV1 access unit enters the Buffer Pool instantaneously upon decoding the AV1 access unit, hence at the Scheduled Removal Timing of the AV1 access unit. A decoded AV1 access unit is presented at the Presentation Time.

If the AV1 video stream provides insufficient information to determine the Scheduled Removal Timing and the Presentation Time of AV1 access units, then these time instants shall be determined in the STD model from PTS and DTS timestamps as follows:

  1. The Scheduled Removal Timing of AV1 access unit n is the instant in time indicated by DTS(n) where DTS(n) is the DTS value of AV1 access unit n.
  2. The Presentation Time of AV1 access unit n is the instant in time indicated by PTS(n) where PTS(n) is the PTS value of AV1 access unit n.
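The two rules above amount to a unit conversion from 90 kHz timestamp ticks to seconds. The following is a minimal sketch with a hypothetical function name; it assumes that an access unit without a DTS is decoded at its PTS, as is usual for PES packets that carry only a PTS, and it ignores 33-bit wrap-around.

```python
from typing import Optional, Tuple

TIMESTAMP_CLOCK_HZ = 90_000  # PTS and DTS are expressed in 90 kHz units


def std_times(pts_ticks: int, dts_ticks: Optional[int] = None) -> Tuple[float, float]:
    """Return (Scheduled Removal Timing, Presentation Time) in seconds."""
    if dts_ticks is None:
        dts_ticks = pts_ticks  # assumption: no DTS means decode at the PTS
    return dts_ticks / TIMESTAMP_CLOCK_HZ, pts_ticks / TIMESTAMP_CLOCK_HZ


removal, presentation = std_times(pts_ticks=183_600, dts_ticks=180_000)
print(removal, presentation)  # 2.0 2.04
```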

4.4 T-STD Extensions for AV1

When there is an AV1 video stream in an MPEG-2 TS program, the T-STD model as described in the section “Transport stream system target decoder” is extended as specified below.

T-STD Extensions for AV1

TBn, MBn, EBn buffer management

The following additional notations are used to describe the T-STD extensions and are illustrated in Figure X-YY.

Notation Definition
t(i) indicates the time in seconds at which the i-th byte of the transport stream enters the system target decoder
TBn is the transport buffer for elementary stream n
TBSn is the size of the transport buffer TBn, measured in bytes
MBn is the multiplexing buffer for elementary stream n
MBSn is the size of the multiplexing buffer MBn, measured in bytes
EBn is the elementary stream buffer for the AV1 video stream
EBSn is the size of the elementary stream buffer EBn, measured in bytes
j is an index to the AV1 access unit of the AV1 video stream
An(j) is the j-th access unit of the AV1 video bitstream
tdn(j) is the decoding time of An(j), measured in seconds, in the system target decoder
Rxn is the transfer rate from the transport buffer TBn to the multiplexing buffer MBn as specified below
Rbxn is the transfer rate from the multiplexing buffer MBn to the elementary stream buffer EBn as specified below

The following apply:

If there is PES packet payload data in MBn, and buffer EBn is not full, the PES packet payload is transferred from MBn to EBn at a rate equal to Rbxn. If EBn is full, data are not removed from MBn. When a byte of data is transferred from MBn to EBn, all PES packet header bytes that are in MBn and precede that byte are instantaneously removed and discarded. When there is no PES packet payload data present in MBn, no data is removed from MBn. All data that enters MBn leaves it. All PES packet payload data bytes enter EBn instantaneously upon leaving MBn.
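The transfer rule can be illustrated with a small discrete-time simulation. This is only a sketch under stated assumptions: MBn is modelled as a queue of PES packets with separate header and payload byte counts, and the byte budget per step stands in for Rbxn multiplied by the step duration. All names are hypothetical.

```python
from collections import deque


def transfer_step(mb: deque, eb_fill: int, ebs: int, budget: int) -> int:
    """Move payload bytes from MBn to EBn for one time step.

    mb      queue of {"header": n, "payload": n} PES packets waiting in MBn
    eb_fill current fullness of EBn in bytes
    ebs     size of EBn in bytes (EBSn)
    budget  bytes that Rbxn allows to be transferred during this step
    Returns the new EBn fullness.
    """
    while mb and budget > 0 and eb_fill < ebs:
        packet = mb[0]
        moved = min(packet["payload"], budget, ebs - eb_fill)
        if moved > 0:
            packet["header"] = 0        # preceding header bytes are discarded
            packet["payload"] -= moved
            eb_fill += moved            # payload enters EBn instantaneously
            budget -= moved
        if packet["payload"] == 0:
            mb.popleft()                # packet has fully left MBn
    return eb_fill


mb = deque([{"header": 14, "payload": 1000}, {"header": 14, "payload": 500}])
fill = transfer_step(mb, eb_fill=0, ebs=1200, budget=1000)
print(fill, len(mb))      # 1000 1
fill = transfer_step(mb, eb_fill=fill, ebs=1200, budget=1000)
print(fill, mb[0])        # 1200 {'header': 0, 'payload': 300}
```

In the second step EBn fills up after 200 bytes, so the remaining 300 payload bytes stay in MBn until decoding removes data from EBn.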

4.5 STD delay

The STD delay of any AV1 video through the system target decoder buffers TBn, MBn, and EBn shall be constrained by tdn(j) – t(i) ≤ 10 seconds for all j, and all bytes i in access unit An(j).
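A multiplexer could verify this bound per access unit with a check like the one below. It is a minimal sketch with a hypothetical function name, taking the decoding time tdn(j) and the arrival times t(i) of the bytes of An(j), all in seconds.

```python
from typing import Iterable


def std_delay_ok(decode_time_s: float, byte_arrival_times_s: Iterable[float],
                 limit_s: float = 10.0) -> bool:
    """Check tdn(j) - t(i) <= limit for every byte i of one access unit."""
    return all(decode_time_s - t <= limit_s for t in byte_arrival_times_s)


print(std_delay_ok(11.0, [1.5, 2.0]))  # True: the longest delay is 9.5 s
print(std_delay_ok(12.0, [1.5, 2.0]))  # False: the first byte waits 10.5 s
```

Because the first byte of an access unit arrives earliest, checking that byte alone is sufficient in practice.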

4.6 Buffer management conditions

Transport streams shall be constructed so that the following conditions for buffer management are satisfied:

5 Definition of DTS and PTS

An AV1 video stream multiplexed into MPEG-2 TS may contain decoder_model_info syntax elements but this is not mandatory.

5.1 PTS

If a PTS is present in the PES packet header, it shall refer to the first AV1 access unit that commences in this PES packet. If a PTS is not present in the PES packet header, it may be possible to compute its value based on the presence of timing information in the bitstream or by other means (e.g. by using equal_picture_interval).

The PTS for a Decodable Frame Group (DFG) containing a frame with show_frame = 1 is the PTS of that frame. The PTS for a DFG with only show_frame = 0 is:

To achieve consistency between the STD model and the buffer model defined in Annex E of the AV1 Bitstream & Decoding Process Specification, for each AV1 access unit the PTS value in the STD shall, within the accuracy of their respective clocks, indicate the same instant in time as the PresentationTime in the decoder buffer model, as defined in Annex E of AV1 Bitstream & Decoding Process Specification.

5.2 DTS

If a DTS is present in the PES packet header, it shall refer to the first AV1 access unit that commences in this PES packet.

To achieve consistency between the STD model and the buffer model defined in Annex E of the AV1 Bitstream & Decoding Process Specification, for each AV1 access unit the DTS value in the STD shall, within the accuracy of their respective clocks, indicate the same instant in time as the ScheduledRemovalTiming in the decoder buffer model, as defined in Annex E of AV1 Bitstream & Decoding Process Specification.

6. Acknowledgements

This Technical Specification has been produced by VideoLAN, with inputs from the authors mentioned below who are from the following companies: ATEME, OpenHeadend, Open Broadcast Systems, Videolabs under the direction of VideoLAN.

Authors