1. Scope
[AV1] defines the syntax and semantics of an AV1 bitstream. The AV1 Image File Format (AVIF) defined in this document supports the storage of a subset of the syntax and semantics of an AV1 bitstream in a [HEIF] file. The AV1 Image File Format defines multiple profiles, which restrict the allowed syntax and semantics of the AV1 bitstream with the goal of improving interoperability, especially for hardware implementations. The profiles defined in this specification follow the conventions of the [MIAF] specification. Images encoded with AV1 and not meeting the restrictions of the defined profiles may still be compliant with this AV1 Image File Format if they adhere to the general AVIF requirements.
AV1 Image File Format supports High Dynamic Range (HDR) and Wide Color Gamut (WCG) images as well as Standard Dynamic Range (SDR). It supports monochrome images as well as multi-channel images with all the bit depths and color spaces specified in [AV1].
AV1 Image File Format also supports multi-layer images as specified in [AV1] to be stored both in image items and image sequences.
An AVIF file is designed to be a conformant [HEIF] file for both image items and image sequences. Specifically, this specification follows the recommendations given in "Annex I: Guidelines On Defining New Image Formats and Brands" of [HEIF].
This specification reuses syntax and semantics used in [AV1-ISOBMFF].
2. Image Items and properties
2.1. AV1 Image Item
When an item is of type av01, it is called an AV1 Image Item, and shall obey the following constraints:
- The AV1 Image Item shall be a conformant MIAF image item.
- The AV1 Image Item shall be associated with an AV1ItemConfigurationProperty.
- The content of an AV1 Image Item is called the AV1 Image Item Data and shall obey the following constraints:
  - The AV1 Image Item Data shall be identical to the content of an AV1 Sample marked as sync, as defined in [AV1-ISOBMFF].
  - The AV1 Image Item Data shall have exactly one Sequence Header OBU.
  - If the AV1 Image Item Data consists of a single frame (i.e. when using a single layer),
    - It should have its still_picture flag set to 1.
    - It should have its reduced_still_picture_header flag set to 1.
2.2. Image Item Properties
2.2.1. AV1 Item Configuration Property
Box Type: av1C
Property type: Descriptive item property
Container: ItemPropertyContainerBox
Mandatory (per item): Yes, for an image item of type 'av01', no otherwise
Quantity (per item): One for an image item of type 'av01', zero otherwise
The syntax and semantics of the AV1ItemConfigurationProperty are identical to those of the AV1CodecConfigurationBox defined in [AV1-ISOBMFF], with the following constraints:
- Sequence Header OBUs should not be present in the AV1CodecConfigurationBox.
- If a Sequence Header OBU is present in the AV1CodecConfigurationBox, it shall match the Sequence Header OBU in the AV1 Image Item Data.
- The values of the fields in the AV1CodecConfigurationBox shall match those of the Sequence Header OBU in the AV1 Image Item Data.
- The bit depth and number of channels in the AV1CodecConfigurationBox shall match the PixelInformationProperty if present.
- Metadata OBUs, if present, shall match the values given in other item properties, such as the MasteringDisplayColourVolumeBox or ContentLightLevelBox.
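The consistency checks above can be illustrated with a minimal sketch. It assumes the fixed four-byte field layout of the AV1CodecConfigurationBox as given in [AV1-ISOBMFF] (that layout is not reproduced in this document) and that the box header has already been stripped. The names Av1cFields, parse_av1c and matches_pixi are hypothetical helpers, and the bit depth derivation follows the usual high_bitdepth / twelve_bit rule of [AV1].

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Av1cFields:
    seq_profile: int
    seq_level_idx_0: int
    seq_tier_0: int
    bit_depth: int
    num_channels: int
    config_obus: bytes


def parse_av1c(payload: bytes) -> Av1cFields:
    """Parse the fixed part of an av1C payload (box header already stripped)."""
    if len(payload) < 4:
        raise ValueError("av1C payload shorter than 4 bytes")
    b0, b1, b2 = payload[0], payload[1], payload[2]
    marker, version = b0 >> 7, b0 & 0x7F
    if marker != 1 or version != 1:
        raise ValueError("unexpected av1C marker or version")
    seq_profile = b1 >> 5
    high_bitdepth = (b2 >> 6) & 1
    twelve_bit = (b2 >> 5) & 1
    monochrome = (b2 >> 4) & 1
    # twelve_bit is only meaningful for seq_profile 2 with high_bitdepth set.
    if high_bitdepth:
        bit_depth = 12 if (seq_profile == 2 and twelve_bit) else 10
    else:
        bit_depth = 8
    return Av1cFields(
        seq_profile=seq_profile,
        seq_level_idx_0=b1 & 0x1F,
        seq_tier_0=b2 >> 7,
        bit_depth=bit_depth,
        num_channels=1 if monochrome else 3,
        config_obus=payload[4:],  # optional OBUs carried after the fixed fields
    )


def matches_pixi(av1c: Av1cFields, pixi_bits_per_channel: List[int]) -> bool:
    """Check the bit depth and channel count against a pixi property."""
    return (len(pixi_bits_per_channel) == av1c.num_channels
            and all(b == av1c.bit_depth for b in pixi_bits_per_channel))
```

A writer could run such a check before emitting a file. Comparing the fields against the Sequence Header OBU itself would additionally require an OBU parser, which is outside the scope of this sketch.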
2.2.2. Image Spatial Extents Property
The semantics of the ispe property as defined in [HEIF] apply. More specifically, for AV1 images, the values of image_width and image_height shall respectively equal the values of UpscaledWidth and FrameHeight as defined in [AV1] but for a specific frame in the item payload. The exact frame depends on the presence and content of the lsel and OperatingPointSelectorProperty properties as follows:
- In the absence of a lsel property associated with the item, or if it is present and its layer_id value is set to 0xFFFF:
  - If no OperatingPointSelectorProperty is associated with the item, the ispe shall document the dimensions of the last frame decoded when processing the operating point whose index is 0.
  - If an OperatingPointSelectorProperty is associated with the item, the ispe property shall document the dimensions of the last frame decoded when processing the corresponding operating point.
  NOTE: The dimensions of possible intermediate output images might not match the ones given in the ispe property. If they display these intermediate images, renderers are expected to scale the output image to match the ispe property.
- If a lsel property is associated with an item and its layer_id is different from 0xFFFF, the ispe property documents the dimensions of the output frame produced by decoding the corresponding layer.
NOTE: The dimensions indicated in the ispe property might not match the values max_frame_width_minus1+1 and max_frame_height_minus1+1 indicated in the AV1 bitstream.
NOTE: The values of render_width_minus1 and render_height_minus1 possibly present in the AV1 bitstream are not exposed at the AVIF container level.
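The selection rules of this section can be summarized as a small decision function. This is only an illustrative sketch: the function name, the returned labels and the use of None to mean "property not associated" are assumptions made for the example.

```python
from typing import Optional, Tuple

LSEL_PROGRESSIVE = 0xFFFF  # special layer_id value described in this section


def ispe_reference_frame(lsel_layer_id: Optional[int],
                         op_index: Optional[int]) -> Tuple[str, int]:
    """Tell which decoded frame the ispe dimensions describe.

    lsel_layer_id is None when no lsel property is associated with the item;
    op_index is None when no OperatingPointSelectorProperty is associated.
    """
    if lsel_layer_id is None or lsel_layer_id == LSEL_PROGRESSIVE:
        # Last frame decoded for the selected operating point (index 0 by default).
        return ("last_frame_of_operating_point", 0 if op_index is None else op_index)
    # Output frame produced by decoding the explicitly selected layer.
    return ("output_frame_of_layer", lsel_layer_id)
```

For example, an item with no lsel and no a1op maps to the last frame of operating point 0, while an item with lsel layer_id 1 maps to the output frame of spatial layer 1.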
2.2.3. Other Item Properties
In addition to the Image Properties defined in this document, AV1 image items MAY also be associated with item properties defined in other specifications such as [HEIF] and [MIAF]. Examples of commonly used item properties can be found in § 8.1.2 Requirements on additional image item related boxes.
In general, it is recommended to use properties instead of Metadata OBUs in the AV1ItemConfigurationProperty.
NOTE: Although the clean aperture property (clap) defined in [HEIF] is applicable to AVIF, implementers of authoring tools should be aware of the possibility of unintended consequences since users may not realize image data outside the clap region is still in the file. A future revision of this specification may place normative restrictions on how clap can be used.
2.3. AV1 Layered Image Items
2.3.1. Overview
[AV1] supports encoding a frame using multiple spatial layers. A spatial layer may improve the resolution or quality of the image decoded based on one or more of the previous layers. A layer may also provide an image that does not depend on the previous layers. Additionally, not all layers are expected to produce an image meant to be rendered. Some decoded images may be used only as intermediate decodes. Finally, layers are grouped into one or more Operating Points. The Sequence Header OBU defines the list of Operating Points, provides required decoding capabilities, and indicates which layers form each Operating Point.
[AV1] delegates the selection of which Operating Point to process to the application, by means of a function called choose_operating_point(). AVIF defines the OperatingPointSelectorProperty to control this selection. In the absence of an OperatingPointSelectorProperty associated with an AV1 Image Item, the AVIF renderer is free to process any Operating Point present in the AV1 Image Item Data. In particular, when the AV1 Image Item is composed of a unique Operating Point, the OperatingPointSelectorProperty should not be present. If an OperatingPointSelectorProperty is associated with an AV1 Image Item, the op_index field indicates which Operating Point is expected to be processed for this item.
NOTE: When an author wants to offer the ability to render multiple Operating Points from the same AV1 image (e.g. in the case of multi-view images), multiple AV1 Image Items can be created that share the same AV1 Image Item Data but have different OperatingPointSelectorPropertys.
[AV1] expects the renderer to display only one frame within the selected Operating Point, which should be the highest spatial layer that is both within the Operating Point and present within the temporal unit, but [AV1] leaves the option for other applications to set their own policy about which frames are output, as defined in the general output process. AVIF sets a different policy, and defines how the lsel property (mandated by [HEIF] for layered images) is used to control which layer is rendered. According to [HEIF], the interpretation of the layer_id field in the lsel property is codec specific. In this specification, the value 0xFFFF is reserved for a special meaning. If a lsel property is associated with an AV1 Image Item but its layer_id value is set to 0xFFFF, the renderer is free to render either only the output image of the highest spatial layer, or to render all output images of all the intermediate layers and the highest spatial layer, resulting in a form of progressive decoding. If a lsel property is associated with an AV1 Image Item and the value of layer_id is not 0xFFFF, the renderer is expected to render only the output image for that layer.
NOTE: When such a progressive decoding of the layers within an Operating Point is not desired or when an author wants to expose each layer as a specific item, multiple AV1 Image Items sharing the same AV1 Image Item Data can be created and associated with different lsel properties, each with a different value of layer_id.
2.3.2. Properties
2.3.2.1. Operating Point Selector Property
2.3.2.1.1. Definition
Box Type: a1op
Property type: Descriptive item property
Container: ItemPropertyContainerBox
Mandatory (per item): No
Quantity (per item): Zero or one
2.3.2.1.2. Description
An OperatingPointSelectorProperty may be associated with an AV1 Image Item to provide the index of the operating point to be processed for this item. If associated, it shall be marked as essential.
2.3.2.1.3. Syntax
class OperatingPointSelectorProperty extends ItemProperty('a1op') {
unsigned int(8) op_index;
}
2.3.2.1.4. Semantics
op_index indicates the index of the operating point to be processed for this item. Its value shall be between 0 and operating_points_cnt_minus_1.
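As an illustration of the syntax and semantics above, the following sketch reads the single-byte payload of an a1op property and applies the range check. It assumes the box header has already been removed and that operating_points_cnt_minus_1 was obtained separately from the Sequence Header OBU; the function name parse_a1op is hypothetical.

```python
def parse_a1op(payload: bytes, operating_points_cnt_minus_1: int) -> int:
    """Return op_index from an a1op payload (box header already stripped)."""
    if len(payload) != 1:
        raise ValueError("a1op payload must be exactly one byte")
    op_index = payload[0]  # unsigned int(8)
    if op_index > operating_points_cnt_minus_1:
        raise ValueError("op_index exceeds operating_points_cnt_minus_1")
    return op_index
```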
2.3.2.2. Layer Selector Property
The lsel property defined in [HEIF] may be associated with an AV1 Image Item. The layer_id indicates the value of the spatial_id to render. The value shall be between 0 and 3, or the special value 0xFFFF. When a value between 0 and 3 is used, the corresponding spatial layer shall be present in the bitstream and shall produce an output frame. Other layers may be needed to decode the indicated layer. When the special value 0xFFFF is used, progressive decoding is allowed as described in § 2.3.1 Overview.
2.3.2.3. Layered Image Indexing Property
2.3.2.3.1. Definition
Box Type: a1lx
Property type: Descriptive item property
Container: ItemPropertyContainerBox
Mandatory (per item): No
Quantity (per item): Zero or one
2.3.2.3.2. Description
The AV1LayeredImageIndexingProperty property may be associated with an AV1 Image Item. It should not be associated with AV1 Image Items consisting of only one layer.
The AV1LayeredImageIndexingProperty documents the size in bytes of each layer (except the last one) in the AV1 Image Item Data, and enables determining the byte ranges required to process one or more layers of an Operating Point. If associated, it shall not be marked as essential.
2.3.2.3.3. Syntax
class AV1LayeredImageIndexingProperty extends ItemProperty('a1lx') {
unsigned int(7) reserved = 0;
unsigned int(1) large_size;
FieldLength = (large_size + 1) * 16;
unsigned int(FieldLength) layer_size[3];
}
2.3.2.3.4. Semantics
layer_size indicates the number of bytes corresponding to each layer in the item payload, except for the last layer. Values are provided in increasing order of spatial_id. A value of zero means that all the layers except the last one have been documented, and all following values shall be 0. The number of non-zero values shall match the number of layers in the image minus one.
NOTE: The size of the last layer can be determined by subtracting the sum of the sizes of all layers indicated in this property from the entire item size.
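The following sketch shows one way a reader might turn an a1lx payload into byte ranges. It assumes the box header has been stripped and that the layers are stored contiguously from the start of the item payload in increasing spatial_id order, which is how the semantics above are read here. The function names are hypothetical, and ranges are half-open (start, end) offsets relative to the item payload.

```python
import struct
from typing import List, Tuple


def parse_a1lx(payload: bytes) -> List[int]:
    """Return the three layer_size values from an a1lx payload."""
    if not payload:
        raise ValueError("empty a1lx payload")
    large_size = payload[0] & 1  # the 7 upper bits are reserved
    fmt = ">3I" if large_size else ">3H"  # 32-bit or 16-bit big-endian fields
    if len(payload) != 1 + struct.calcsize(fmt):
        raise ValueError("unexpected a1lx payload length")
    return list(struct.unpack_from(fmt, payload, 1))


def layer_byte_ranges(layer_size: List[int], item_size: int) -> List[Tuple[int, int]]:
    """Compute (start, end) byte ranges for every layer, including the last."""
    ranges = []
    offset = 0
    seen_zero = False
    for size in layer_size:
        if size == 0:
            seen_zero = True
            continue
        if seen_zero:
            raise ValueError("non-zero layer_size after a zero entry")
        ranges.append((offset, offset + size))
        offset += size
    if offset >= item_size:
        raise ValueError("documented layers leave no bytes for the last layer")
    # The last layer takes whatever remains of the item payload.
    ranges.append((offset, item_size))
    return ranges
```

With layer_size values of 100, 250 and 0 and an item of 1000 bytes, the three layers occupy bytes 0 to 100, 100 to 350 and 350 to 1000, so a reader interested only in the first two layers needs to fetch the first 350 bytes.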
3. Image Sequences
An AV1 Image Sequence is defined as a set of AV1 Temporal Units stored in an AV1 track as defined in [AV1-ISOBMFF] with the following constraints:
- The track shall be a valid MIAF image sequence.
- The track handler for an AV1 Image Sequence shall be pict.
- The track shall have only one sample description entry.
- If multiple Sequence Header OBUs are present in the track payload, they shall be identical.
4. Auxiliary Image Items and Sequences
An AV1 Auxiliary Image Item (respectively an AV1 Auxiliary Image Sequence) is an AV1 Image Item (respectively AV1 Image Sequence) with the following additional constraints:
- It shall be a compliant MIAF Auxiliary Image Item (respectively MIAF Auxiliary Image Sequence).
- The mono_chrome field in the Sequence Header OBU shall be set to 1.
- The color_range field in the Sequence Header OBU shall be set to 1.
An AV1 Alpha Image Item (respectively an AV1 Alpha Image Sequence) is an AV1 Auxiliary Image Item (respectively an AV1 Auxiliary Image Sequence), and as defined in [MIAF], with the aux_type field of the AuxiliaryTypeProperty (respectively AuxiliaryTypeInfoBox) set to urn:mpeg:mpegB:cicp:systems:auxiliary:alpha. An AV1 Alpha Image Item (respectively an AV1 Alpha Image Sequence) shall be encoded with the same bit depth as the associated master AV1 Image Item (respectively AV1 Image Sequence).
For AV1 Alpha Image Item and AV1 Alpha Image Sequence, the ColourInformationBox should be omitted. If present, readers shall ignore it.
An AV1 Depth Image Item (respectively an AV1 Depth Image Sequence) is an AV1 Auxiliary Image Item (respectively an AV1 Auxiliary Image Sequence), and as defined in [MIAF], with the aux_type field of the AuxiliaryTypeProperty (respectively AuxiliaryTypeInfoBox) set to urn:mpeg:mpegB:cicp:systems:auxiliary:depth.
NOTE: [AV1] supports encoding either 3-component images (whose semantics are given by the matrix_coefficients element), or 1-component images (monochrome). When an image requires a different number of components, multiple auxiliary images may be used, each providing additional component(s), according to the semantics of their aux_type field. In such a case, the maximum number of components is restricted by the number of possible items in a file, coded on 16 or 32 bits.
5. Brands, Internet media types and file extensions
5.1. Brands overview
As defined by [ISOBMFF], the presence of a brand in the compatible_brands list in the FileTypeBox can be interpreted as the permission for those AV1 Image File Format readers/parsers and AV1 Image File Format renderers that only implement the features required by the brand, to process the corresponding file and only the parts (e.g. items or sequences) that comply with the brand.
An AV1 Image File Format file may conform to multiple brands. Similarly, an AV1 Image File Format reader/parser or AV1 Image File Format renderer may be capable of processing the features associated with one or more brands.
If any of the brands defined in this document is specified in the major_brand field of the FileTypeBox, the file extension and Internet Media Type should respectively be ".avif" and "image/avif" as defined in § 9 AVIF Media Type Registration.
5.2. AVIF image and image collection brand
The brand to identify AV1 image items is avif. Files that indicate this brand in the compatible_brands field of the FileTypeBox shall comply with the following:
- The primary item shall be an AV1 Image Item or be a derived image that references directly or indirectly one or more items that all are AV1 Image Items.
- AV1 auxiliary image items may be present in the file.
Files that conform to these constraints should include the brand avif in the compatible_brands field of the FileTypeBox.
Additionally, the brand avio is defined. If the file indicates the brand avio in the compatible_brands field of the FileTypeBox, then the primary item or all the items referenced by the primary item shall be AV1 image items made only of Intra Frames. Conversely, if the previous constraint applies, the brand avio should be used in the compatible_brands field of the FileTypeBox.
5.3. AVIF image sequence brands
The brand to identify AVIF image sequences is avis. Files that indicate this brand in the compatible_brands field of the FileTypeBox shall comply with the following:
- they shall contain one or more AV1 image sequences.
- they may contain AV1 auxiliary image sequences.
Files that conform to these constraints should include the brand avis in the compatible_brands field of the FileTypeBox.
Additionally, if a file contains AV1 image sequences and the brand avio is used in the compatible_brands field of the FileTypeBox, the item constraints for this brand shall be met and at least one of the AV1 image sequences shall be made only of AV1 Samples marked as sync. Conversely, if such a track exists and the constraints of the brand avio on AV1 image items are met, the brand should be used.
NOTE: As defined in [MIAF], a file that is primarily an image sequence still has at least an image item. Hence, it can also declare brands for signaling the image item.
6. General constraints
The following constraints are common to files compliant with this specification:
- The file shall be compliant with the [MIAF] specification and list miaf in the compatible_brands field of the FileTypeBox.
- The file shall list 'avif' or 'avis' in the compatible_brands field of the FileTypeBox.
- Transformative properties shall not be associated with items in a derivation chain (as defined in [MIAF]) that serves as an input to a grid item. For example, if a file contains a grid item and its referenced coded image items, cropping, mirroring or rotation transformations are only permitted on the grid item itself.
NOTE: This constraint further restricts files compared to [MIAF].
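The two brand-related constraints above lend themselves to a simple check. The sketch below assumes the FileTypeBox payload layout from [ISOBMFF] (a 4-byte major_brand, a 4-byte minor_version, then a list of 4-byte compatible brands) with the box header already stripped. Both function names are hypothetical.

```python
from typing import List, Tuple


def parse_ftyp_payload(payload: bytes) -> Tuple[str, List[str]]:
    """Return (major_brand, compatible_brands) from a FileTypeBox payload."""
    if len(payload) < 8 or len(payload) % 4 != 0:
        raise ValueError("malformed ftyp payload")
    major_brand = payload[0:4].decode("ascii")
    # Bytes 4..8 hold minor_version; compatible brands follow.
    brands = [payload[i:i + 4].decode("ascii") for i in range(8, len(payload), 4)]
    return major_brand, brands


def meets_general_brand_constraints(compatible_brands: List[str]) -> bool:
    """Check that miaf and at least one of avif/avis are listed."""
    listed = set(compatible_brands)
    return "miaf" in listed and ("avif" in listed or "avis" in listed)
```

This only covers the brand listing. Conformance to [MIAF] itself and the restriction on transformative properties require inspecting the item structure.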
7. Profiles
7.1. Overview
The profiles defined in this section are for enabling interoperability between AV1 Image File Format files and AV1 Image File Format readers/parsers. A profile imposes a set of specific restrictions and is signaled by brands defined in this specification.
The FileTypeBox should declare at least one profile that enables decoding of the primary image item. It is not an error for the encoder to include an auxiliary image that is not allowed by the specified profile(s). If 'avis' is declared in the FileTypeBox and a profile is declared in the FileTypeBox, the profile shall also enable decoding of at least one image sequence track. The profile should allow decoding of any associated auxiliary image sequence tracks, unless it is acceptable to decode the image sequence without its auxiliary image sequence tracks.
It is possible for a file compliant to this AV1 Image File Format to not be able to declare an AVIF profile, if the corresponding AV1 encoding characteristics do not match any of the defined profiles.
NOTE: [AV1] supports 3 bit depths: 8, 10 and 12 bits, and the maximum dimensions of a coded image are 65536x65536, when seq_level_idx is set to 31 (maximum parameters level).
7.2. AVIF Baseline Profile
This section defines the MIAF AV1 Baseline profile of [HEIF], specifically for [AV1] bitstreams, based on the constraints specified in [MIAF] and identified by the brand MA1B.
If the brand MA1B is in the list of compatible_brands of the FileTypeBox, the common constraints in the section § 5 Brands, Internet media types and file extensions shall apply.
The following additional constraints apply to all AV1 Image Items and all AV1 Image Sequences:
- The AV1 profile shall be the Main Profile and the level shall be 5.1 or lower.
NOTE: AV1 tiers are not constrained because timing is optional in image sequences and is not relevant in image items or collections.
NOTE: Level 5.1 is chosen for the Baseline profile to ensure that no single coded image exceeds 4k resolution, as some decoders may not be able to handle larger images. More precisely, following [AV1] level definitions, coded image items compliant with the AVIF Baseline profile may not have a number of pixels greater than 8912896, a width greater than 8192 or a height greater than 4352. It is still possible to use the Baseline profile to create larger images using grid derivation.
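The size limits quoted in the note above can be checked with a few comparisons. This is a minimal sketch using only the three numbers stated in the note; the constant and function names are hypothetical, and it does not verify the AV1 profile or any other level parameter.

```python
# Limits for a single coded image, taken from the note above (AV1 level 5.1).
BASELINE_MAX_PIXELS = 8912896
BASELINE_MAX_WIDTH = 8192
BASELINE_MAX_HEIGHT = 4352


def fits_baseline_limits(width: int, height: int) -> bool:
    """Return True if a coded image of width x height respects the limits."""
    return (0 < width <= BASELINE_MAX_WIDTH
            and 0 < height <= BASELINE_MAX_HEIGHT
            and width * height <= BASELINE_MAX_PIXELS)
```

Note that the three limits are independent: an 8192x4352 image respects the width and height limits but exceeds the pixel count, so an authoring tool would need to split it into a grid of smaller coded images.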
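A file containing items compliant with this profile is expected to list the following brands, in any order, in the compatible_brands of the FileTypeBox:
avif, mif1, miaf, MA1B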
A file containing a pict track compliant with this profile is expected to list the following brands, in any order, in the compatible_brands of the FileTypeBox:
avis, msf1, miaf, MA1B
A file containing a pict track compliant with this profile and made only of samples marked sync is expected to list the following brands, in any order, in the compatible_brands of the FileTypeBox:
avis, avio, msf1, miaf, MA1B
7.3. AVIF Advanced Profile
This section defines the MIAF AV1 Advanced profile of [HEIF], specifically for [AV1] bitstreams, based on the constraints specified in [MIAF] and identified by the brand MA1A.
If the brand MA1A is in the list of compatible_brands of the FileTypeBox, the common constraints in the section § 5 Brands, Internet media types and file extensions shall apply.
The following additional constraints apply to all AV1 Image Items:
- The AV1 profile shall be the High Profile and the level shall be 6.0 or lower.
NOTE: Following [AV1] level definitions, coded image items compliant to the AVIF Advanced profile may not have a number of pixels greater than 35651584, a width greater than 16384 or a height greater than 8704. It is still possible to use the Advanced profile to create larger images using grid derivation.
The following additional constraints apply only to AV1 Image Sequences:
- The AV1 profile shall be either Main Profile or High Profile.
- The AV1 level for Main Profile shall be 5.1 or lower.
- The AV1 level for High Profile shall be 5.1 or lower.
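A file containing items compliant with this profile is expected to list the following brands, in any order, in the compatible_brands of the FileTypeBox:
avif, mif1, miaf, MA1A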
A file containing a pict track compliant with this profile is expected to list the following brands, in any order, in the compatible_brands of the FileTypeBox:
avis, msf1, miaf, MA1A
8. Box requirements
8.1. Image item boxes
This section discusses the box requirements for an AVIF file containing only image items.
8.1.1. Minimum set of boxes
As indicated in § 6 General constraints, an AVIF file is a compliant [MIAF] file. As a consequence, some [ISOBMFF] or [HEIF] boxes are required, as indicated in the following table. The order of the boxes is indicative in the table. The specifications listed in the "Specification" column may require a specific order for the box or for its children and shall be respected. For example, per [ISOBMFF], the FileTypeBox is required to appear first in an AVIF file.
The "Version(s)" column in the following table lists the version(s) of the boxes allowed by this brand. Other versions of the boxes shall not be used. "-" means that the box does not have a version.
| Top-Level | Level 1 | Level 2 | Level 3 | Version(s) | Specification | Note |
|---|---|---|---|---|---|---|
| ftyp | | | | - | ISOBMFF | |
| meta | | | | 0 | ISOBMFF | |
| | hdlr | | | 0 | ISOBMFF | |
| | pitm | | | 0, 1 | ISOBMFF | |
| | iloc | | | 0, 1, 2 | ISOBMFF | |
| | iinf | | | 0, 1 | ISOBMFF | |
| | | infe | | 2, 3 | ISOBMFF | |
| | iprp | | | - | ISOBMFF | |
| | | ipco | | - | ISOBMFF | |
| | | | av1C | - | AVIF | |
| | | | ispe | 0 | HEIF | |
| | | | pixi | 0 | HEIF | |
| | | ipma | | 0, 1 | ISOBMFF | |
| mdat | | | | - | ISOBMFF | The coded payload may be placed in idat rather than mdat, in which case mdat is not required. |
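As an illustration of how a reader might verify part of the table above, the sketch below walks the generic [ISOBMFF] box structure (a 32-bit size followed by a four-character type, with size 1 signaling a 64-bit size and size 0 meaning "to the end of the data"). It only checks that ftyp comes first and that the Level 1 children of meta are present; it does not descend further or check versions. The function names and the "meta/xxxx" labels are hypothetical.

```python
import struct
from typing import Iterator, List, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (type, payload) for each box found back to back in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # 64-bit largesize follows the type
            if offset + 16 > len(data):
                raise ValueError("truncated largesize field")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # box extends to the end of the data
            size = len(data) - offset
        if size < header or offset + size > len(data):
            raise ValueError("malformed box size")
        yield box_type.decode("latin-1"), data[offset + header:offset + size]
        offset += size


def missing_required_boxes(file_bytes: bytes) -> List[str]:
    """List required boxes (top level and meta children) that were not found."""
    top = list(iter_boxes(file_bytes))
    missing = []
    if not top or top[0][0] != "ftyp":
        missing.append("ftyp")
    meta = next((payload for box_type, payload in top if box_type == "meta"), None)
    if meta is None:
        return missing + ["meta"]
    # meta is a full box: skip its 4 bytes of version and flags.
    children = {box_type for box_type, _ in iter_boxes(meta[4:])}
    for name in ("hdlr", "pitm", "iloc", "iinf", "iprp"):
        if name not in children:
            missing.append("meta/" + name)
    return missing
```

The mdat box is deliberately not checked, since the table notes that the coded payload may be placed in idat instead.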
8.1.2. Requirements on additional image item related boxes
The boxes indicated in the following table may be present in an AVIF file to provide additional signaling for image items. The boxes may be present inside the box indicated in the "Containing box" column. If present, they shall use the version indicated in the table and AVIF readers are expected to understand them. The order of the boxes is indicative in the table. Specifications may require specific order and shall be respected. Additionally, the free and skip boxes may be present at any level in the hierarchy. AVIF readers are expected to ignore them. Additional boxes in the meta hierarchy not listed in the following table may also be present and may be ignored by AVIF readers.
| Level 1 | Level 2 | Version(s) | Specification | Containing Box | Description |
|---|---|---|---|---|---|
| dinf | | - | ISOBMFF | meta | Used to indicate the location of the media information in a track |
| | dref | 0 | ISOBMFF | | |
| iref | | 0, 1 | ISOBMFF | meta | Used to indicate directional relationships between images or metadata |
| | auxl | - | HEIF | | Used when an image is auxiliary to another image |
| | thmb | - | HEIF | | Used when an image is a thumbnail of another image |
| | dimg | - | HEIF | | Used when an image is derived from another image |
| | prem | - | HEIF | | Used when an alpha image contains premultiplied color values from another image |
| | cdsc | - | HEIF | | Used to link metadata with an image |
| idat | | - | ISOBMFF | meta | Used to store derived image definitions |
| grpl | | - | ISOBMFF | meta | Used to indicate that multiple images are semantically grouped |
| | altr | 0 | ISOBMFF | | Used when images in a group are alternative to each other |
| pasp | | - | ISOBMFF | ipco | Used to signal pixel aspect ratio. If present, shall indicate a pixel aspect ratio of 1:1 |
| colr | | - | ISOBMFF | ipco | Used to signal information such as color primaries. |
| auxC | | 0 | HEIF | ipco | Used to signal the type of an auxiliary image (e.g. alpha, depth). |
| clap | | - | ISOBMFF | ipco | Used to signal cropping applied to an image |
| irot | | - | HEIF | ipco | Used to signal a rotation applied to an image |
| imir | | - | HEIF | ipco | Used to signal a mirroring applied to an image |
| clli | | - | ISOBMFF | ipco | Used to signal HDR light level information for an image |
| cclv | | - | ISOBMFF | ipco | Used to signal HDR color volume for an image |
| mdcv | | - | ISOBMFF | ipco | Used to signal HDR mastering information for an image |
| a1op | | - | AVIF | ipco | Used to configure rendering of a multiple operating points image |
| lsel | | - | HEIF | ipco | Used to configure rendering of a multilayered image |
| a1lx | | - | AVIF | ipco | Used to assist reader in parsing a multilayered image |
9. AVIF Media Type Registration
The media type "image/avif" is officially registered with IANA and available at: https://www.iana.org/assignments/media-types/image/avif.
10. Changes since v1.1.0 release
- EDITORIAL: Stop using `dfn value` for definitions.
- EDITORIAL: Add assert-ids in the spec for conformance file testing and ComplianceWarden
- EDITORIAL: Add "per item" to item property definitions
- EDITORIAL: Fix broken link for latest-draft.html
- Relax constraint on transformative properties in derivation chains to only apply to grid items
- Clarify relationship between av1C, metadata OBUs and item properties
- EDITORIAL: Update list of other item properties