AV1 Image File Format (AVIF)

Working Group Approved,

This version:
https://AOMediaCodec.github.io/av1-avif
Issue Tracking:
GitHub
Editors:
(Netflix)
(Microsoft)
Former Editor:
(Netflix)
Last approved version:
v1.0.0
Warning

This specification is a draft of a potential new version of this specification and should not be referenced other than as a working draft.

Copyright 2021, The Alliance for Open Media

Licensing information is available at http://aomedia.org/license/

The MATERIALS ARE PROVIDED “AS IS.” The Alliance for Open Media, its members, and its contributors expressly disclaim any warranties (express, implied, or otherwise), including implied warranties of merchantability, non-infringement, fitness for a particular purpose, or title, related to the materials. The entire risk as to implementing or otherwise using the materials is assumed by the implementer and user. IN NO EVENT WILL THE ALLIANCE FOR OPEN MEDIA, ITS MEMBERS, OR CONTRIBUTORS BE LIABLE TO ANY OTHER PARTY FOR LOST PROFITS OR ANY FORM OF INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER FROM ANY CAUSES OF ACTION OF ANY KIND WITH RESPECT TO THIS DELIVERABLE OR ITS GOVERNING AGREEMENT, WHETHER BASED ON BREACH OF CONTRACT, TORT (INCLUDING NEGLIGENCE), OR OTHERWISE, AND WHETHER OR NOT THE OTHER MEMBER HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


Abstract

This document specifies syntax and semantics for the storage of [AV1] images in the generic image file format [HEIF], which is based on [ISOBMFF]. While [HEIF] defines general requirements, this document also specifies additional constraints to ensure higher interoperability between writers and readers when [HEIF] is used with [AV1] images. These constraints are based on constraints defined in the Multi-Image Application Format [MIAF] and are grouped into profiles inspired by the profiles defined in [MIAF].

1. Scope

[AV1] defines the syntax and semantics of an AV1 bitstream. The AV1 Image File Format (AVIF) defined in this document supports the storage of a subset of the syntax and semantics of an AV1 bitstream in a [HEIF] file. The AV1 Image File Format defines multiple profiles, which restrict the allowed syntax and semantics of the AV1 bitstream with the goal of improving interoperability, especially for hardware implementations. The profiles defined in this specification follow the conventions of the [MIAF] specification. Images encoded with AV1 that do not meet the restrictions of the defined profiles may still be compliant with this AV1 Image File Format if they adhere to the general AVIF requirements.

AV1 Image File Format supports High Dynamic Range (HDR) and Wide Color Gamut (WCG) images as well as Standard Dynamic Range (SDR). It supports monochrome images as well as multi-channel images with all the bit depths and color spaces specified in [AV1].

AV1 Image File Format also supports storing multi-layer images, as specified in [AV1], in both image items and image sequences.

An AVIF file is designed to be a conformant [HEIF] file for both image items and image sequences. Specifically, this specification follows the recommendations given in "Annex I: Guidelines On Defining New Image Formats and Brands" of [HEIF].

This specification reuses syntax and semantics used in [AV1-ISOBMFF].

2. Image Items and properties

2.1. AV1 Image Item

When an item is of type av01, it is called an AV1 Image Item, and shall obey the following constraints:

2.2. Image Item Properties

2.2.1. AV1 Item Configuration Property

Box Type:                 av1C
Property type:            Descriptive item property
Container:                ItemPropertyContainerBox
Mandatory (per item):     Yes, for an image item of type 'av01'
Quantity:                 One for an image item of type 'av01'

The syntax and semantics of the AV1 Item Configuration Property are identical to those of the AV1CodecConfigurationBox defined in [AV1-ISOBMFF], with the following constraints:

This property should be marked as essential.

2.2.2. Image Spatial Extents Property

The semantics of the ispe property as defined in [HEIF] apply. More specifically, for AV1 images, it corresponds to the FrameWidth and FrameHeight values as defined in [AV1] for a frame that depends on the presence of the lsel and OperatingPointSelectorProperty properties as follows:

NOTE: The dimensions indicated in the ispe property might not match the values max_frame_width_minus1+1 and max_frame_height_minus1+1 indicated in the AV1 bitstream.

NOTE: The values of render_width_minus1 and render_height_minus1 possibly present in the AV1 bitstream are not exposed at the AVIF container level.

2.2.3. Other Item Properties

In addition to the Image Properties defined in [HEIF], such as colr, pixi or pasp, AV1 image items MAY also be associated with clli, cclv and mdcv introduced in [MIAF].

In general, it is recommended to use properties instead of Metadata OBUs in the AV1 Item Configuration Property.

NOTE: Although the clean aperture property (clap) defined in [HEIF] is applicable to AVIF, implementers of authoring tools should be aware of the possibility of unintended consequences since users may not realize image data outside the clap region is still in the file. A future revision of this specification may place normative restrictions on how clap can be used.

2.3. AV1 Layered Image Items

2.3.1. Overview

[AV1] supports encoding a frame using multiple spatial layers. A spatial layer may improve the resolution or quality of the image decoded based on one or more of the previous layers. A layer may also provide an image that does not depend on the previous layers. Additionally, not all layers are expected to produce an image meant to be rendered. Some decoded images may be used only as intermediate decodes. Finally, layers are grouped into one or more Operating Points. The Sequence Header OBU defines the list of Operating Points, provides required decoding capabilities, and indicates which layers form each Operating Point.

[AV1] delegates the selection of which Operating Point to process to the application, by means of a function called choose_operating_point(). AVIF defines the OperatingPointSelectorProperty to control this selection. In the absence of an OperatingPointSelectorProperty associated with an AV1 Image Item, the AVIF renderer is free to process any Operating Point present in the AV1 Image Item Data. In particular, when the AV1 Image Item is composed of a unique Operating Point, the OperatingPointSelectorProperty should not be present. If an OperatingPointSelectorProperty is associated with an AV1 Image Item, the op_index field indicates which Operating Point is expected to be processed for this item.

NOTE: When an author wants to offer the ability to render multiple Operating Points from the same AV1 image (e.g. in the case of multi-view images), multiple AV1 Image Items can be created that share the same AV1 Image Item Data but are associated with different OperatingPointSelectorProperty properties.

[AV1] expects the renderer to display only one frame within the selected Operating Point, which should be the highest spatial layer that is both within the Operating Point and present within the temporal unit, but [AV1] leaves the option for other applications to set their own policy about which frames are output, as defined in the general output process. AVIF sets a different policy, and defines how the lsel property (defined in [HEIF]) is used to control which layer is rendered. In the absence of an lsel property associated with an AV1 Image Item, the renderer is free to render either only the output image of the highest spatial layer or all output images of all the intermediate layers, the latter resulting in a form of progressive decoding. If an lsel property is associated with an AV1 Image Item, the renderer is expected to render only the output image for that layer.

NOTE: When such a progressive decoding of the layers within an Operating Point is not desired or when an author wants to expose each layer as a specific item, multiple AV1 Image Items sharing the same AV1 Image Item Data can be created and associated with different lsel properties, each with a different value of layer_id.

2.3.2. Properties

2.3.2.1. Operating Point Selector Property
2.3.2.1.1. Definition
Box Type:       a1op
Property type:  Descriptive item property
Container:      ItemPropertyContainerBox
Mandatory:      No
Quantity:       Zero or one
2.3.2.1.2. Description

An OperatingPointSelectorProperty may be associated with an AV1 Image Item to provide the index of the operating point to be processed for this item. If associated, it shall be marked as essential.

2.3.2.1.3. Syntax
class OperatingPointSelectorProperty extends ItemProperty('a1op') {
    u8 op_index;
}
2.3.2.1.4. Semantics

op_index indicates the index of the operating point to be processed for this item. Its value shall be between 0 and operating_points_cnt_minus_1.
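
The following is a minimal sketch of how a reader might extract and validate op_index. It assumes the caller has already stripped the box header and passes only the property payload, and that operating_points_cnt_minus_1 has been read from the Sequence Header OBU. The function name parse_a1op is hypothetical and not defined by this specification.

```python
def parse_a1op(payload: bytes, operating_points_cnt_minus_1: int) -> int:
    """Return op_index from the payload of an 'a1op' property.

    payload: bytes following the box header (an assumption of this sketch).
    operating_points_cnt_minus_1: value taken from the Sequence Header OBU.
    """
    # The syntax defines a single u8 field, so this sketch expects one byte.
    if len(payload) != 1:
        raise ValueError("a1op payload is expected to be exactly one byte")
    op_index = payload[0]
    # op_index shall be between 0 and operating_points_cnt_minus_1.
    if op_index > operating_points_cnt_minus_1:
        raise ValueError("op_index exceeds operating_points_cnt_minus_1")
    return op_index
```

A reader could pass the returned value to its implementation of choose_operating_point() when the property is present, and fall back to its own policy when it is absent.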

2.3.2.2. Layer Selector Property
The lsel property defined in [HEIF] may be associated with an AV1 Image Item. The layer_id indicates the value of the spatial_id to render. The value shall be between 0 and 3. The corresponding spatial layer shall be present in the bitstream and shall produce an output frame. Other layers may be needed to decode the indicated layer.
2.3.2.3. Layered Image Indexing Property
2.3.2.3.1. Definition
Box Type:       a1lx
Property type:  Descriptive item property
Container:      ItemPropertyContainerBox
Mandatory:      No
Quantity:       Zero or one
2.3.2.3.2. Description

The AV1LayeredImageIndexingProperty property may be associated with an AV1 Image Item. It should not be associated with AV1 Image Items consisting of only one layer.

The AV1LayeredImageIndexingProperty documents the size in bytes of each layer (except the last one) in the AV1 Image Item Data, and enables determining the byte ranges required to process one or more layers of an Operating Point. If associated, it shall not be marked as essential.

2.3.2.3.3. Syntax
class AV1LayeredImageIndexingProperty extends ItemProperty('a1lx') {
    unsigned int(7) reserved = 0;
    unsigned int(1) large_size;
    FieldLength = ((large_size & 1) + 1) * 16;
    unsigned int(FieldLength)[3] layer_size;
}
2.3.2.3.4. Semantics

layer_size indicates the number of bytes corresponding to each layer in the item payload, except for the last layer. Values are provided in increasing order of spatial_id. A value of zero means that all the layers except the last one have been documented and following values shall be 0. The number of non-zero values shall match the number of layers in the image minus one.

NOTE: The size of the last layer can be determined by subtracting the sum of the sizes of all layers indicated in this property from the entire item size.

A property indicating [X,0,0] is used for an image item composed of 2 layers. The size of the first layer is X and the size of the second layer is ItemSize - X. Note that the spatial_id for the first layer does not necessarily match the index in the array that provides the size. In other words, in this case the index giving value X is 0, but the corresponding spatial_id could be 0, 1 or 2. Similarly, a property indicating [X,Y,0] is used for an image item composed of 3 layers.
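
As an illustration of the semantics above, the sketch below parses an a1lx payload and derives the byte range of each layer within the item payload. It assumes the payload excludes the box header and that fields are big-endian, as is usual for ISOBMFF boxes. The names parse_a1lx and layer_byte_ranges are hypothetical.

```python
import struct
from typing import List, Tuple


def parse_a1lx(payload: bytes) -> List[int]:
    """Return the three layer_size values from an 'a1lx' property payload."""
    if not payload:
        raise ValueError("empty a1lx payload")
    large_size = payload[0] & 1           # low bit of the first byte
    field_bytes = (large_size + 1) * 2    # FieldLength of 16 or 32 bits
    if len(payload) < 1 + 3 * field_bytes:
        raise ValueError("a1lx payload is truncated")
    fmt = ">3I" if large_size else ">3H"  # three big-endian unsigned fields
    return list(struct.unpack_from(fmt, payload, 1))


def layer_byte_ranges(layer_size: List[int], item_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) for each layer, in payload order."""
    ranges = []
    offset = 0
    seen_zero = False
    for size in layer_size:
        if size == 0:
            seen_zero = True
            continue
        if seen_zero:
            # Once a zero appears, all following values shall be 0.
            raise ValueError("non-zero layer_size after a zero entry")
        ranges.append((offset, size))
        offset += size
    if offset >= item_size:
        raise ValueError("documented layers leave no bytes for the last layer")
    # The last layer takes whatever remains of the item.
    ranges.append((offset, item_size - offset))
    return ranges
```

For a property indicating [1000,0,0] on a 4096-byte item, this yields the ranges (0, 1000) and (1000, 3096), matching the ItemSize - X rule for the last layer.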

3. Image Sequences

An AV1 Image Sequence is defined as a set of AV1 Temporal Units stored in an AV1 track as defined in [AV1-ISOBMFF] with the following constraints:

4. Auxiliary Image Items and Sequences

An AV1 Auxiliary Image Item (respectively an AV1 Auxiliary Image Sequence) is an AV1 Image Item (respectively AV1 Image Sequence) with the following additional constraints:

An AV1 Alpha Image Item (respectively an AV1 Alpha Image Sequence) is an AV1 Auxiliary Image Item (respectively an AV1 Auxiliary Image Sequence), as defined in [MIAF], with the aux_type field of the AuxiliaryTypeProperty (respectively AuxiliaryTypeInfoBox) set to urn:mpeg:mpegB:cicp:systems:auxiliary:alpha. An AV1 Alpha Image Item (respectively an AV1 Alpha Image Sequence) shall be encoded with the same bit depth as the associated master AV1 Image Item (respectively AV1 Image Sequence).

For AV1 Alpha Image Item and AV1 Alpha Image Sequence, the ColourInformationBox should be omitted. If present, readers shall ignore it.

An AV1 Depth Image Item (respectively an AV1 Depth Image Sequence) is an AV1 Auxiliary Image Item (respectively an AV1 Auxiliary Image Sequence), as defined in [MIAF], with the aux_type field of the AuxiliaryTypeProperty (respectively AuxiliaryTypeInfoBox) set to urn:mpeg:mpegB:cicp:systems:auxiliary:depth.

NOTE: [AV1] supports encoding either 3-component images (whose semantics are given by the matrix_coefficients element), or 1-component images (monochrome). When an image requires a different number of components, multiple auxiliary images may be used, each providing additional component(s), according to the semantics of their aux_type field. In such a case, the maximum number of components is restricted by the number of possible items in a file, coded on 16 or 32 bits.

5. Brands, Internet media types and file extensions

5.1. Brands overview

As defined by [ISOBMFF], the presence of a brand in the compatible_brands list in the FileTypeBox can be interpreted as the permission for those AV1 Image File Format readers/parsers and AV1 Image File Format renderers that only implement the features required by the brand, to process the corresponding file and only the parts (e.g. items or sequences) that comply with the brand.

An AV1 Image File Format file may conform to multiple brands. Similarly, an AV1 Image File Format reader/parser or AV1 Image File Format renderer may be capable of processing the features associated with one or more brands.

If any of the brands defined in this document is specified in the major_brand field of the FileTypeBox, the file extension and Internet Media Type should respectively be ".avif" and "image/avif" as defined in § 8 AVIF Media Type Registration.

5.2. AVIF image and image collection brand

The brand to identify AV1 image items is avif.

Files that indicate this brand in the compatible_brands field of the FileTypeBox shall comply with the following:

Files that conform with these constraints should include the brand avif in the compatible_brands field of the FileTypeBox.

Additionally, the brand avio is defined. If the file indicates the brand avio in the compatible_brands field of the FileTypeBox, then the primary item or all the items referenced by the primary item shall be AV1 image items made only of Intra Frames. Conversely, if the previous constraint applies, the brand avio should be used in the compatible_brands field of the FileTypeBox.

5.3. AVIF image sequence brands

The brand to identify AVIF image sequences is avis.

Files that indicate this brand in the compatible_brands field of the FileTypeBox shall comply with the following:

Files that conform with these constraints should include the brand avis in the compatible_brands field of the FileTypeBox.

Additionally, if a file contains AV1 image sequences and the brand avio is used in the compatible_brands field of the FileTypeBox, the item constraints for this brand shall be met and at least one of the AV1 image sequences shall be made only of AV1 Samples marked as sync. Conversely, if such a track exists and the constraints of the brand avio on AV1 image items are met, the brand should be used.

NOTE: As defined in [MIAF], a file that is primarily an image sequence still has at least an image item. Hence, it can also declare brands for signaling the image item.
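
To illustrate how a reader might inspect the brands discussed in this section, the sketch below parses a FileTypeBox laid out as in [ISOBMFF] (32-bit size, 'ftyp' type, major_brand, minor_version, then a list of 4-byte compatible brands). It assumes the box uses the compact 32-bit size form and sits at the start of the supplied bytes. The names parse_ftyp and suggested_extension_and_type are hypothetical.

```python
import struct
from typing import List, Optional, Tuple

# Brands defined in this document (general brands and profile brands).
AVIF_BRANDS = {"avif", "avio", "avis", "MA1B", "MA1A"}


def parse_ftyp(data: bytes) -> Tuple[str, int, List[str]]:
    """Return (major_brand, minor_version, compatible_brands)."""
    if len(data) < 16:
        raise ValueError("not enough bytes for a FileTypeBox")
    size, box_type = struct.unpack_from(">I4s", data, 0)
    if box_type != b"ftyp":
        raise ValueError("first box is not 'ftyp'")
    # This sketch does not handle size == 0 or size == 1 (64-bit largesize).
    if size < 16 or size > len(data) or (size - 16) % 4 != 0:
        raise ValueError("unsupported or malformed FileTypeBox size")
    major_brand = data[8:12].decode("latin-1")
    minor_version = struct.unpack_from(">I", data, 12)[0]
    compatible = [data[i:i + 4].decode("latin-1") for i in range(16, size, 4)]
    return major_brand, minor_version, compatible


def suggested_extension_and_type(major_brand: str) -> Optional[Tuple[str, str]]:
    """Return ('.avif', 'image/avif') when major_brand is defined in this document."""
    if major_brand in AVIF_BRANDS:
        return (".avif", "image/avif")
    return None
```

A reader would typically test membership in the returned compatible_brands list (for example "avif" or "avis") before deciding which parts of the file it is permitted to process.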

6. General constraints

The following constraints are common to files compliant with this specification:

NOTE: These constraints further restrict files compared to [MIAF].

7. Profiles

7.1. Overview

The profiles defined in this section are for enabling interoperability between AV1 Image File Format files and AV1 Image File Format readers/parsers. A profile imposes a set of specific restrictions and is signaled by brands defined in this specification.

The FileTypeBox should declare at least one profile that enables decoding of the primary image item. It is not an error for the encoder to include an auxiliary image that is not allowed by the specified profile(s).

If 'avis' is declared in the FileTypeBox and a profile is declared in the FileTypeBox, the profile shall also enable decoding of at least one image sequence track. The profile should allow decoding of any associated auxiliary image sequence tracks, unless it is acceptable to decode the image sequence without its auxiliary image sequence tracks.

It is possible for a file compliant with this AV1 Image File Format to not be able to declare an AVIF profile, if the corresponding AV1 encoding characteristics do not match any of the defined profiles.

NOTE: [AV1] supports 3 bit depths: 8, 10 and 12 bits, and the maximum dimensions of a coded image are 65536x65536, when seq_level_idx is set to 31 (maximum parameters level).

If an image is encoded with dimensions (respectively a bit depth) that exceed the maximum dimensions (respectively bit depth) required by the AV1 profile and level of the AVIF profiles defined in this specification, the file will only signal general AVIF brands.

7.2. AVIF Baseline Profile

This section defines the MIAF AV1 Baseline profile of [HEIF], specifically for [AV1] bitstreams, based on the constraints specified in [MIAF] and identified by the brand MA1B.

If the brand MA1B is in the list of compatible_brands of the FileTypeBox, the common constraints in the section § 5 Brands, Internet media types and file extensions shall apply.

The following additional constraints apply to all AV1 Image Items and all AV1 Image Sequences:

NOTE: AV1 tiers are not constrained because timing is optional in image sequences and is not relevant in image items or collections.

NOTE: Level 5.1 is chosen for the Baseline profile to ensure that no single coded image exceeds 4k resolution, as some decoders may not be able to handle larger images. More precisely, following [AV1] level definitions, coded image items compliant with the AVIF Baseline profile may not have a number of pixels greater than 8912896, a width greater than 8192 or a height greater than 4352. It is still possible to use the Baseline profile to create larger images using grid derivation.
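
The limits in the note above can be checked with a few comparisons. The sketch below is illustrative only: it covers the three dimension limits quoted in the note and none of the other Baseline constraints. The function name fits_baseline_dimensions is hypothetical.

```python
# Limits quoted in the note above for a single coded image item.
BASELINE_MAX_PIXELS = 8912896
BASELINE_MAX_WIDTH = 8192
BASELINE_MAX_HEIGHT = 4352


def fits_baseline_dimensions(width: int, height: int) -> bool:
    """Return True if a coded image of width x height is within the quoted limits."""
    return (
        width <= BASELINE_MAX_WIDTH
        and height <= BASELINE_MAX_HEIGHT
        and width * height <= BASELINE_MAX_PIXELS
    )
```

Note that the width and height limits cannot both be reached at once: an 8192x4352 image has 35651584 pixels and fails the pixel-count check, whereas 4096x2176 has exactly 8912896 pixels and passes.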

A file containing items compliant with this profile is expected to list the following brands, in any order, in the compatible_brands of the FileTypeBox:

avif, mif1, miaf, MA1B

A file containing a pict track compliant with this profile is expected to list the following brands, in any order, in the compatible_brands of the FileTypeBox:

avis, msf1, miaf, MA1B

A file containing a pict track compliant with this profile and made only of samples marked sync is expected to list the following brands, in any order, in the compatible_brands of the FileTypeBox:

avis, avio, msf1, miaf, MA1B
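
As a sketch of how an authoring or validation tool might apply the three brand lists above, the following reports which expected brands are missing from a file's compatible_brands. The content-kind keys ("items", "sequence", "sync_sequence") and the function name missing_baseline_brands are hypothetical labels introduced for this example.

```python
from typing import Iterable, List

# Expected brand sets for the Baseline profile, copied from the lists above.
BASELINE_EXPECTED = {
    "items": {"avif", "mif1", "miaf", "MA1B"},
    "sequence": {"avis", "msf1", "miaf", "MA1B"},
    "sync_sequence": {"avis", "avio", "msf1", "miaf", "MA1B"},
}


def missing_baseline_brands(compatible_brands: Iterable[str], content_kind: str) -> List[str]:
    """Return the expected brands absent from compatible_brands, sorted by name.

    Order within compatible_brands is irrelevant, so sets are compared.
    """
    expected = BASELINE_EXPECTED[content_kind]
    return sorted(expected - set(compatible_brands))
```

An empty result means every expected brand is listed; additional brands in the file are not flagged, since a file may conform to multiple brands.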

7.3. AVIF Advanced Profile

This section defines the MIAF AV1 Advanced profile of [HEIF], specifically for [AV1] bitstreams, based on the constraints specified in [MIAF] and identified by the brand MA1A.

If the brand MA1A is in the list of compatible_brands of the FileTypeBox, the common constraints in the section § 5 Brands, Internet media types and file extensions shall apply.

The following additional constraints apply to all AV1 Image Items:

NOTE: Following [AV1] level definitions, coded image items compliant with the AVIF Advanced profile may not have a number of pixels greater than 35651584, a width greater than 16384 or a height greater than 8704. It is still possible to use the Advanced profile to create larger images using grid derivation.
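
To give a rough sense of how grid derivation relates to these limits, the sketch below searches for the smallest number of equally sized tiles such that each tile stays within the limits quoted in the note. It is an estimate under stated assumptions: it ignores any further grid or tile constraints from [HEIF] and [MIAF], and the search bound of 256 rows and columns is an assumption of this example. The function name smallest_grid is hypothetical.

```python
from typing import Optional, Tuple

# Limits quoted in the note above for a single coded image item.
ADVANCED_MAX_PIXELS = 35651584
ADVANCED_MAX_WIDTH = 16384
ADVANCED_MAX_HEIGHT = 8704


def smallest_grid(width: int, height: int, max_dim: int = 256) -> Optional[Tuple[int, int]]:
    """Return (columns, rows) with the fewest tiles that fit the limits, or None."""
    best = None
    for cols in range(1, max_dim + 1):
        tile_w = -(-width // cols)  # ceiling division
        if tile_w > ADVANCED_MAX_WIDTH:
            continue
        for rows in range(1, max_dim + 1):
            tile_h = -(-height // rows)
            if tile_h <= ADVANCED_MAX_HEIGHT and tile_w * tile_h <= ADVANCED_MAX_PIXELS:
                if best is None or cols * rows < best[0] * best[1]:
                    best = (cols, rows)
                break  # more rows would only add tiles for this column count
    return best
```

For example, a 20000x10000 image does not fit in a single coded image under these limits, and this search returns a 2-column by 3-row layout with tiles of at most 10000x3334 pixels.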

The following additional constraints apply only to AV1 Image Sequences:

A file containing items compliant with this profile is expected to list the following brands, in any order, in the compatible_brands of the FileTypeBox:

avif, mif1, miaf, MA1A

A file containing a pict track compliant with this profile is expected to list the following brands, in any order, in the compatible_brands of the FileTypeBox:

avis, msf1, miaf, MA1A

8. AVIF Media Type Registration

The media type "image/avif" is officially registered with IANA and available at: https://www.iana.org/assignments/media-types/image/avif.

9. Changes since v1.0.0 release

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119 [RFC2119]. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes.

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[AV1]
AV1 Bitstream & Decoding Process Specification. LS. URL: https://aomediacodec.github.io/av1-spec/av1-spec.pdf
[AV1-ISOBMFF]
AV1 Codec ISO Media File Format Binding. LS. URL: https://aomediacodec.github.io/av1-isobmff/
[HEIF]
Information technology — High efficiency coding and media delivery in heterogeneous environments — Part 12: Image File Format. International Standard. URL: https://www.iso.org/standard/66067.html
[ISOBMFF]
Information technology — Coding of audio-visual objects — Part 12: ISO base media file format. International Standard. URL: https://www.iso.org/standard/68960.html
[MIAF]
Information technology -- Multimedia application format (MPEG-A) -- Part 22: Multi-Image Application Format (MiAF). Enquiry. URL: https://www.iso.org/standard/74417.html
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119