AV1 Still Image File Format (AVIF)

Living Standard,

This version:
https://AOMediaCodec.github.io/av1-avif
Issue Tracking:
GitHub
Editor:
(Netflix)
Warning

This specification is still at draft stage and should not be referenced other than as a working draft.


Abstract

This document specifies how to store [AV1] still images in a file using [ISOBMFF] structures.

1. Overview

AVIF is a file format wrapping compressed images based on the Alliance for Open Media AV1 intra-frame encoding toolkit. AVIF supports High Dynamic Range (HDR) and wide color gamut (WCG) images as well as standard dynamic range (SDR). Only the intra-frame encoding toolkit is used in AVIF version 1.0. Using the intra-frame encoding mechanism from an existing video codec standard has precedent in WebP (VP8) and HEIF (HEVC). This document describes, at a high level, a proposal for the structure of AVIF version 1.0.

The initial version of AVIF seeks to be simple, with just enough structure to allow the distribution of images based on the AV1 intra-frame coding toolset. At its core, AVIF 1.0 will allow for one or more images plus all supporting data needed to correctly reconstruct and display the images to be conveyed in a file. The ability to embed a thumbnail image will also be provided. An image sequence with suggested playback timing may be defined.

1.1. Target Features

2. Terms and Definitions

2.1. Alpha Image

A specific type of Auxiliary Image that may be used to convey information representing the opacity of associated Master Images.

2.2. Auxiliary Image

An image that is not intended to be displayed but provides supplemental information for associated Master Images. AVIF allows only one type of Auxiliary Image: an Alpha Image.

2.3. Cover Image

A Master Image that may be used to represent the file contents. An example of this is a single image used to represent an animation before the animation sequence is activated.

2.4. Image Collection

One or more Master Images stored as items in a single file with no defined order or timing information. Within a collection, groups of image samples may share properties and metadata.

2.5. Image Properties

This is a class of non-media data. The property items may be descriptive image attributes or decoder configuration data. The properties are primarily for consumption by the decoding agent. This information may include:

2.6. Image Sequence

A sequence of Master Images stored as a track for which information is provided that defines a sequential ordering and temporal information indicating suggested playback timing. An agent decoding and presenting an AVIF file may choose to render an Image Sequence as an animation.

2.7. Master Image

An image that is not a thumbnail or auxiliary image. For the purpose of this specification, such an image is encoded using AV1 intra-frame tools. This type of image is the primary displayable payload of an AVIF file. A Master Image may be used as a member of both an Image Collection and an Image Sequence.

2.8. Metadata

Metadata conveys image attributes that are not used to decode or reconstruct an image. This data is considered to be non-essential and non-normative. Examples of this include EXIF, XMP, and MPEG-7. An AVIF reader will not be required to extract metadata from Informational Metadata boxes. Essential information shall be carried in the image media directly or be conveyed as Image Properties.

2.9. Thumbnail Image

This is a non-master image that may be used to represent one or more Master Images found in an AVIF file. It is typically of a smaller scale than the Master Images. Its compression format may differ from the one used by the Master Images.

3. Object Model and Structure

An AVIF file should be a simplified and conformant version of an [HEIF] file. This is to allow for the deployment of general libraries that may be used to create and parse HEIF-based image files wrapping different coding methods for the actual image content. This should be similar to ISO-BMFF usage in the video domain.

The AVIF file format will be built on the box-structured media interchange format introduced by the ISO Base Media File Format ([ISOBMFF]). The format specified by AVIF defines the use of a subset of box structures introduced in ISOBMFF. Where the necessary structures do not exist in ISOBMFF, structures defined as part of the High Efficiency Image File Format ([HEIF]: ISO/IEC 23008-12) that are codec neutral and can be applied in a generic manner are used. An AVIF version 1.0 file shall comply with the requirements of Clause 4 of the [ISOBMFF] specification, and where applicable, the recommendations in Annex I: Guidelines On Defining New Image Formats and Brands in the MPEG HEIF specification shall be followed for AVIF 1.0.

3.1. Image Storage

All of the constituent elements, including image samples, shall be contained in a single file. All media data locations, regardless of construction method, shall resolve to an offset within an AVIF file.

3.2. Image Collection Elements

An Image Collection is the most basic format of an AVIF file. This form should be used for the case of a single image, or for a group of images that has no logical or temporal sequencing.

Image Collections are structured as items defined in a single file-level MetaBox. Each master, thumbnail or auxiliary image that is a component of the Image Collection shall have an item definition in this MetaBox. Items have no timing information. Any association between a Master Image and an Auxiliary or Thumbnail image must be defined explicitly.

An AVIF file containing an Image Collection shall list the "mif1" brand as one of the entries in the compatible_brands array and conform to section 10.1 of [HEIF]. All boxes used to structure a collection are located at the file level within a MetaBox: there are no track level boxes. The image data for a collection may be stored within an ItemDataBox or a MediaDataBox.

The images stored in the file are listed in an ItemLocationBox. A unique ItemLocationBox entry shall be used to reference the media data for each image in the collection: Master, Thumbnail and Alpha images. Each image shall have an entry in the ItemLocationBox.

The type of the item element is identified in an entry in the ItemInfoBox. Every ItemLocationBox entry shall have a matching entry in the ItemInfoBox cross referenced by the item_ID.
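The cross-reference rule between the two tables can be sketched as a small consistency check. This is only an illustration: it assumes the item_ID values have already been parsed out of the ItemLocationBox and ItemInfoBox, and the function name `cross_check` is hypothetical rather than part of any defined API.

```python
from typing import Iterable, List


def cross_check(iloc_ids: Iterable[int], iinf_ids: Iterable[int]) -> List[str]:
    """Report ItemLocationBox entries that are duplicated or lack an ItemInfoBox entry."""
    info = set(iinf_ids)
    seen = set()
    problems = []
    for item_id in iloc_ids:
        if item_id in seen:
            # Each image is expected to have its own unique location entry.
            problems.append(f"item {item_id}: duplicate ItemLocationBox entry")
            continue
        seen.add(item_id)
        if item_id not in info:
            problems.append(f"item {item_id}: no matching ItemInfoBox entry")
    return problems
```

An empty result means every location entry is unique and is matched by item_ID in the ItemInfoBox.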

The images making up an Image Collection do not need to have the same Image Properties. A specific property may be assigned to one or more of the images within the Image Collection using an ItemPropertyBox entry.

3.3. Image Sequence Elements

Image Sequences are structured as tracks. An individual track shall contain one type of image: master, auxiliary or thumbnail. Associations between a thumbnail or auxiliary image and a master image are determined by finding components of each track that are time-parallel as defined by procedures detailed in [ISOBMFF].

An AVIF file containing an Image Sequence shall list the "msf1" brand as one of the entries in the compatible_brands array.

The timing information provided for a Master Image track is advisory. However, the timing relationship between the master track and any associated non-master track shall be treated strictly when used to determine which images are time-parallel.

All Master Images that are part of a sequence shall share the same Image Properties. Any Image Properties assigned to the sequence shall be linked to the track and shall not be linked to an individual image sample. Image Properties are linked to tracks by setting the item_ID field of an ItemPropertyAssociation to a track_id.

The determination as to which master track images a non-master image is bound to is made after the decoding sample time is reconstructed for all tracks of the Image Sequence using the TimeToSampleBox (stts) information associated with each track. A non-master image is linked to each master image whose decode time is equal to the decode time of that non-master image, or falls within the period before the decode time of the next non-master image in the track sequence.

Figure: Determining time-parallel images between tracks using decode time.
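The binding rule described above can be sketched as follows. This is a minimal illustration, not a normative procedure: it assumes each track's TimeToSampleBox has already been parsed into (sample_count, sample_delta) pairs and that both tracks use the same timescale. The function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def decode_times(stts_entries: List[Tuple[int, int]]) -> List[int]:
    """Expand (sample_count, sample_delta) pairs into per-sample decode times."""
    times = []
    t = 0
    for sample_count, sample_delta in stts_entries:
        for _ in range(sample_count):
            times.append(t)
            t += sample_delta
    return times


def bind_non_master(master_times: List[int],
                    other_times: List[int]) -> List[Optional[int]]:
    """For each master sample, return the index of the non-master sample bound to it.

    A master sample is bound to the latest non-master sample whose decode time
    is less than or equal to its own, i.e. it falls in the period before the
    next non-master sample. None means the non-master track has no such sample.
    """
    bound = []
    for t in master_times:
        i = bisect_right(other_times, t) - 1
        bound.append(i if i >= 0 else None)
    return bound


# Example: four master images 10 ticks apart, two alpha images 20 ticks apart.
master = decode_times([(4, 10)])   # [0, 10, 20, 30]
alpha = decode_times([(2, 20)])    # [0, 20]
binding = bind_non_master(master, alpha)
```

In this example the first alpha image covers the first two master images and the second alpha image covers the last two.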

3.4. Thumbnails

For Image Collections, a Thumbnail Image and one or more Master Images shall be linked using an ItemReferenceBox entry with a referenceType of "thmb" in the file-level MetaBox. A single thumbnail may be linked to multiple non-sequential Master Images in the collection.

For Image Sequences, a Thumbnail Image track may be associated with the Master Image track. The thumbnail track may contain one or more images. The number of Thumbnail Images shall not exceed the number of Master Images. Presentation timing for the thumbnail track is derived from the track-level TimeToSampleBox and may be treated as advisory for playback. A thumbnail track shall be associated with the primary track using a TrackReferenceBox with a referenceType of "thmb".

3.5. Cover Image

A Cover Image differs from a Thumbnail Image in that it is also a Master Image. For Image Collections, a PrimaryItemBox found in the file-level MetaBox may be used to indicate the image item that is to be considered the Cover Image. This image item shall be a master image. If not explicitly indicated in the above manner, the Cover Image shall be assumed to be the first master image entry in the ItemLocationBox.
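For an Image Collection, the selection rule above might be sketched like this. It assumes the item IDs have already been read from the ItemLocationBox in file order and that the set of Master Image item IDs is known. `cover_item_id` is a hypothetical helper name.

```python
from typing import Iterable, Optional, Set


def cover_item_id(iloc_item_ids: Iterable[int],
                  master_item_ids: Set[int],
                  primary_item_id: Optional[int] = None) -> Optional[int]:
    """Pick the Cover Image item for an Image Collection.

    Uses the PrimaryItemBox value when present; otherwise falls back to the
    first Master Image entry in ItemLocationBox order.
    """
    if primary_item_id is not None:
        if primary_item_id not in master_item_ids:
            raise ValueError("PrimaryItemBox must reference a Master Image")
        return primary_item_id
    for item_id in iloc_item_ids:
        if item_id in master_item_ids:
            return item_id
    return None  # no Master Image in the collection
```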

For an Image Sequence, a PrimaryItemBox in a MetaBox found in the TrackBox of the master image track may be used to indicate the Cover Image. The MetaBox containing this PrimaryItemBox shall have a HandlerBox with a handler_type set to "pict" and an ItemLocationBox that has an entry with the same item_ID that is contained in the PrimaryItemBox. The item identified by the item_ID shall be a Master Image. If not explicitly indicated in the above manner, the Cover Image shall be assumed to be the first entry in the master track.

3.6. Alpha Channel

An Alpha Image is a specific type of auxiliary image that is used to carry per pixel opacity information for one or more Master Images. This is the only type of Auxiliary Image supported by AVIF.

A URN will be defined to identify AVIF alpha auxiliary images or tracks. For the purposes of this draft the placeholder urn:aom:avif:alpha will be used.

A brand that represents the encoding format of the alpha image shall be placed in the compatible_brands array of the FileTypeBox.

The Alpha Image shall have the same dimensional attributes as the largest channel plane in the Master Image: width, height, and pixel aspect ratio. Furthermore, the pixels of the Alpha Image shall overlay the pixels of the largest component plane of any linked Master Image exactly. For example, for YUV 4:2:x, this would be the Y component plane. The decoded value of an alpha pixel shall be a normalized unsigned integer of at least 8 bits representing a value between 0.0 and 1.0, or a floating point value between 0.0 and 1.0.
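As an illustration of the normalized integer case, the sketch below maps a decoded alpha code value to the range 0.0 to 1.0. The text does not state the exact mapping, so dividing by the maximum code value (2^n - 1) is an assumption here, and `alpha_to_unit` is a hypothetical name.

```python
def alpha_to_unit(value: int, bit_depth: int = 8) -> float:
    """Map an unsigned alpha code value to an opacity between 0.0 and 1.0.

    Assumes a full-range mapping where the maximum code value means fully opaque.
    """
    if bit_depth < 8:
        raise ValueError("alpha samples need at least 8 bits")
    max_code = (1 << bit_depth) - 1
    if not 0 <= value <= max_code:
        raise ValueError("alpha value out of range for the given bit depth")
    return value / max_code
```

For example, under this assumption `alpha_to_unit(255)` and `alpha_to_unit(1023, bit_depth=10)` both denote a fully opaque pixel.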

For an Image Collection, Alpha Images and Master Images shall be linked using an ItemReferenceBox entry with a reference type of "auxl". A single Alpha Image may be linked to one or more Master Images. The total number of Alpha Images shall not exceed the number of Master Images. When an Alpha Image applies to a subset of the Master Image items this shall constrain all Master Images in that subset to have the same dimensional attributes.
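The collection-level constraints on alpha linking could be checked along these lines. This is a sketch under stated assumptions: the "auxl" references are given as a mapping from Alpha Image item ID to the Master Image item IDs it is linked to, dimensions are reduced to (width, height) pairs with pixel aspect ratio ignored, and all names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def alpha_link_problems(auxl: Dict[int, List[int]],
                        dims: Dict[int, Tuple[int, int]],
                        master_ids: Sequence[int]) -> List[str]:
    """Check alpha-to-master links in an Image Collection.

    auxl maps each Alpha Image item ID to its linked Master Image item IDs;
    dims maps every item ID to its (width, height).
    """
    problems = []
    if len(auxl) > len(master_ids):
        problems.append("more Alpha Images than Master Images")
    for alpha_id, masters in auxl.items():
        sizes = {dims[m] for m in masters}
        if len(sizes) > 1:
            problems.append(f"alpha {alpha_id}: linked Master Images differ in size")
        elif sizes and dims[alpha_id] not in sizes:
            problems.append(f"alpha {alpha_id}: size differs from its Master Images")
    return problems
```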

When the AVIF file is structured as an Image Sequence, Alpha images may be associated with an Image Sequence using an associated auxiliary track. This track shall have a TrackReferenceBox with a referenceType of "auxl" with an auxiliary type of urn:aom:avif:alpha. For AVIF, only one alpha track may be included in the file. Linking of samples in the Alpha Image track and Master Image track is defined by derived parallel time alignment using each stream’s TimeToSampleBox. The timing relationship between alpha track and primary track is strict and shall not be treated as advisory. At a minimum, the alpha track shall contain a single image.

3.7. Metadata

Metadata is associated through a "cdsc" (content describes) item referenceType element in an ItemReferenceBox for an Image Collection, or a TrackReferenceBox for an Image Sequence. A conforming AVIF reader may ignore all metadata.

4. File Format

4.1. Common Elements

4.1.1. FileTypeBox

Each AVIF file will begin with a FileTypeBox. This shall be the first box in the file and may only be preceded by non-ISOBMFF data when necessary. The box should identify this file as conforming to HEIF with specific branding registered for AVIF.

If the major_brand field is set to "avif" then the minor_version shall be set to 0.

If the major_brand field is not set to "avif", then the brand "avif" shall appear in the compatible_brands array.

The compatible_brands array shall contain "mif1" if the file contains an Image Collection.

The compatible_brands array shall contain "msf1" if the file contains an Image Sequence.

The compatible_brands array shall contain an appropriate brand for the encoding used for thumbnail images if it is not AV1.

The compatible_brands array shall contain an appropriate brand for the encoding used for alpha images if it is not AV1.
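The brand rules above can be sketched as a parser plus a check. This is an illustrative sketch only: it reads a FileTypeBox with a plain 32-bit size (the 64-bit largesize form is not handled), it does not check the thumbnail or alpha encoding brands because this draft does not name them, and the function names are hypothetical.

```python
import struct
from typing import List, Tuple


def parse_ftyp(data: bytes) -> Tuple[str, int, List[str]]:
    """Return (major_brand, minor_version, compatible_brands) from a leading FileTypeBox."""
    if len(data) < 16:
        raise ValueError("too short to hold a FileTypeBox")
    size, box_type = struct.unpack_from(">I4s", data, 0)
    if box_type != b"ftyp" or size < 16 or size > len(data):
        raise ValueError("file does not start with a well-formed FileTypeBox")
    major, minor = struct.unpack_from(">4sI", data, 8)
    # Compatible brands are consecutive four-character codes up to the box end.
    brands = [data[i:i + 4].decode("ascii") for i in range(16, size - 3, 4)]
    return major.decode("ascii"), minor, brands


def check_avif_brands(major: str, minor: int, brands: List[str],
                      has_collection: bool, has_sequence: bool) -> List[str]:
    """Return a list of violated brand rules (empty when all rules hold)."""
    problems = []
    if major == "avif":
        if minor != 0:
            problems.append("minor_version must be 0 when major_brand is 'avif'")
    elif "avif" not in brands:
        problems.append("'avif' must appear in compatible_brands")
    if has_collection and "mif1" not in brands:
        problems.append("'mif1' is required for an Image Collection")
    if has_sequence and "msf1" not in brands:
        problems.append("'msf1' is required for an Image Sequence")
    return problems


# Example: a 24-byte ftyp with major brand "avif" and two compatible brands.
sample = struct.pack(">I4s4sI", 24, b"ftyp", b"avif", 0) + b"avif" + b"mif1"
major, minor, brands = parse_ftyp(sample)
```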

4.2. Collection Elements

An AVIF file reader must be able to recognize the following boxes. Any box field or feature not explicitly limited by this specification should be handled as defined in ISO 14496-12 and ISO 23008-12.

box hierarchy   version   box description
ftyp            -         file type
meta            0         metadata container box
  hdlr          0         handler type definition
  pitm          0,1       primary item reference
  iloc          0,1,2     item location table
  iinf          0,1       item information table
    infe        2,3       item information table entry
  iprp          -         item properties container box
    ipco        0         item property definitions
    ipma        0,1       item property associations

4.2.1. MetaBox

A file-level MetaBox shall follow immediately after the FileTypeBox if the file contains an Image Collection.
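The ordering requirement can be checked by walking the top-level boxes of the file. The sketch below follows the generic ISOBMFF box header layout (32-bit size, four-character type, optional 64-bit largesize when size is 1, and size 0 meaning "to end of file"). The helper names are hypothetical.

```python
import struct
from typing import Iterator, Tuple


def top_level_boxes(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each top-level box in the file."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # 64-bit largesize follows the type field.
            if offset + 16 > len(data):
                raise ValueError("truncated largesize field")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # Box extends to the end of the file.
            size = len(data) - offset
        if size < header or offset + size > len(data):
            raise ValueError(f"bad box size at offset {offset}")
        yield box_type.decode("latin-1"), offset, size
        offset += size


def meta_follows_ftyp(data: bytes) -> bool:
    """True when the first two top-level boxes are ftyp then meta."""
    types = [box_type for box_type, _, _ in top_level_boxes(data)]
    return types[:2] == ["ftyp", "meta"]
```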

4.2.2. HandlerBox

The handler type for the MetaBox shall be "pict".

4.2.3. PrimaryItemBox

One PrimaryItemBox may be used in this MetaBox to provide a reference to a Cover Image. The image referenced by the PrimaryItemBox shall be a Master Image. The PrimaryItemBox shall come before the ItemLocationBox.

4.2.4. ItemLocationBox

The data_reference_index for any AVIF conformant element in the ItemLocationBox table entry shall be set to zero.

The construction_method value of 2 shall not be used.
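These two constraints, together with the Image Storage requirement that media data locations resolve to an offset within the file, might be checked as sketched below. The `IlocEntry` structure is a hypothetical, already-parsed view of one ItemLocationBox entry. The file bounds check is applied only to construction_method 0 (file offsets), since method 1 offsets are relative to the ItemDataBox.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IlocEntry:
    item_id: int
    construction_method: int        # 0 = file offset, 1 = idat offset, 2 = item offset
    data_reference_index: int
    base_offset: int
    extents: List[Tuple[int, int]]  # (extent_offset, extent_length) pairs


def iloc_problems(entry: IlocEntry, file_size: int) -> List[str]:
    """Return the constraints violated by one ItemLocationBox entry."""
    problems = []
    if entry.data_reference_index != 0:
        problems.append(f"item {entry.item_id}: data_reference_index must be 0")
    if entry.construction_method == 2:
        problems.append(f"item {entry.item_id}: construction_method 2 is not allowed")
    if entry.construction_method == 0:
        for extent_offset, extent_length in entry.extents:
            start = entry.base_offset + extent_offset
            if start + extent_length > file_size:
                problems.append(f"item {entry.item_id}: extent runs past end of file")
    return problems
```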

4.2.5. ItemPropertiesBox

Item Properties are associated with an item using an ItemPropertiesBox. The type of each property is uniquely identified with a fourCC. Two categories of image properties may be found in an AVIF file.

The first are decoder specific configuration and initialization properties. The AVIF codec configuration property will be identified with a property type of "av1C". Recommendations for non-master image types may also define codec configuration property formats and usage rules.

The second are image descriptive properties conveying the attributes of the encoded image and, by implication, the reconstructed image post decode. The descriptive image properties that may be used in AVIF compliant files are:

4.2.6. ItemInfoBox

All image samples that are members of the Image Collection shall have an entry in the ItemInfoBox item_infos table. This includes primary, thumbnail, and alpha images.

Image samples of the collection shall use version 2 or 3 of the ItemInfoEntry box.

Each ItemInfoEntry referencing a master image shall have an item_type of "av1i".

Each ItemInfoEntry referencing an alpha plane image shall have an item_type field of "uri" and an item_uri_type set to "urn:aom:avif:alpha".

Each ItemInfoEntry referencing a thumbnail image shall have an item_type field set to either "av1i" or "jpeg".

The item_name is optional. A single byte null string shall be used to indicate an empty item_name.
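The ItemInfoEntry rules in this section could be validated roughly as follows. This is a sketch, not a reference implementation: the `role` label is a hypothetical value that a reader would derive from item references, and the `ItemInfo` structure stands in for an already-parsed entry. The comparison strips whitespace so that a space-padded four-character code for "uri" is also accepted, which is an assumption about how the code is stored.

```python
from dataclasses import dataclass
from typing import List, Optional

ALPHA_URN = "urn:aom:avif:alpha"  # placeholder URN used by this draft


@dataclass
class ItemInfo:
    item_id: int
    role: str                 # hypothetical label: "master", "alpha" or "thumbnail"
    version: int              # ItemInfoEntry version
    item_type: str
    item_uri_type: Optional[str] = None


def item_info_problems(info: ItemInfo) -> List[str]:
    """Return the ItemInfoEntry rules violated by one entry."""
    problems = []
    item_type = info.item_type.strip()  # tolerate a space-padded code
    if info.version not in (2, 3):
        problems.append("ItemInfoEntry version must be 2 or 3")
    if info.role == "master":
        if item_type != "av1i":
            problems.append("master image item_type must be 'av1i'")
    elif info.role == "alpha":
        if item_type != "uri" or info.item_uri_type != ALPHA_URN:
            problems.append("alpha image must use item_type 'uri' with the alpha URN")
    elif info.role == "thumbnail":
        if item_type not in ("av1i", "jpeg"):
            problems.append("thumbnail item_type must be 'av1i' or 'jpeg'")
    return problems
```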

4.2.7. ItemReferenceBox

An AVIF file reader shall support item references of the following types:

An AVIF file reader may optionally support item references of the following type:

4.2.8. ItemDataBox

An ItemDataBox shall be used if any ItemLocationBox entry has an "idat" construction method.

4.3. Sequence Elements

An AVIF file reader must be able to recognize the following boxes. Any box field or feature not explicitly limited by this specification should be handled as defined in ISO 14496-12 and ISO 23008-12.

box hierarchy       version   box description
ftyp                -         file type
moov                -         movie container box
  trak              -         track container box
    tkhd            0,1       track header
    tref            -         track references
    mdia            -         media information container
      mdhd          0,1       media information header
      hdlr          0         media handler type
      minf          -         media information box
        vmhd                  video media header
        dinf        -         data information container
          dref      0         data references for track sources
        stbl        -         sample table mapping container
          stts      0         sample to decode time table
          stsd                sample description (visual sample entry box subclass)
          stsz      0         sample size table
          stsc      0         sample to chunk table
          stco      0         chunk offset table
mdat                -

4.3.1. MovieBox

4.3.2. TrackHeaderBox

For a track of sample type "av1i" the flag field shall be set to track_enabled and track_in_movie.

For "thmb" the track_in_preview flag shall be set and track_in_movie cleared.

For "auxv" (auxiliary used to convey alpha images) tracks the flag field shall be set to zero.
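The flag rules above can be expressed with the TrackHeaderBox flag bits defined in ISOBMFF (track_enabled = 0x000001, track_in_movie = 0x000002, track_in_preview = 0x000004). The sketch reads the "av1i" rule as requiring both bits to be set without forbidding others, which is an interpretation rather than something the text spells out. `tkhd_flags_ok` is a hypothetical name.

```python
TRACK_ENABLED = 0x000001
TRACK_IN_MOVIE = 0x000002
TRACK_IN_PREVIEW = 0x000004


def tkhd_flags_ok(track_kind: str, flags: int) -> bool:
    """Check TrackHeaderBox flags for a track of kind 'av1i', 'thmb' or 'auxv'."""
    if track_kind == "av1i":
        required = TRACK_ENABLED | TRACK_IN_MOVIE
        return flags & required == required
    if track_kind == "thmb":
        return bool(flags & TRACK_IN_PREVIEW) and not (flags & TRACK_IN_MOVIE)
    if track_kind == "auxv":
        return flags == 0
    raise ValueError(f"unknown track kind: {track_kind}")
```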

4.3.3. TrackReferenceBox

The reference_type values supported by an AVIF file shall be:

4.3.4. MediaBox

Each MediaBox associated with AVIF elements "av1i" and "thmb" shall contain a HandlerBox with the handler_type set to "pict".

For alpha auxiliary tracks, the handler_type of the HandlerBox shall be "auxv".

4.3.5. SampleTableBox

The "av1i" codec configuration (see ItemPropertyBox above) block shall be used as the AV1 specific version of the SampleDescriptionBox for all AV1 encoded image tracks: primary and AV1 encoded thumbnail images.

A JPEG encoded thumbnail shall conform to section H of the HEIF specification and use the "jpgC" JpegConfigurationBox as its sample descriptor.

For an alpha track, the AuxiliaryTypeInfoBox shall be used as the SampleDescriptionBox entry in the SampleTableBox. The aux_track_type shall be assigned the string urn:aom:avif:alpha. This is the only type of auxiliary track defined by this specification.

4.3.6. DataReferenceBox

Only one type of DataReferenceBox table entry shall be used to define the location of an AVIF media data element: the DataEntryUrlBox. Furthermore, the entry_flag must be set to 0x001 to signal that the data element resides in the same file as the containing DataReferenceBox.

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[AV1]
AV1 Bitstream & Decoding Process Specification. LS. URL: http://av1-spec.argondesign.com/av1-spec/av1-spec.html
[HEIF]
Information technology — High efficiency coding and media delivery in heterogeneous environments — Part 12: Image File Format. December 2017. International Standard. URL: https://www.iso.org/standard/66067.html
[ISOBMFF]
Information technology — Coding of audio-visual objects — Part 12: ISO Base Media File Format. December 2015. International Standard. URL: http://standards.iso.org/ittf/PubliclyAvailableStandards/c068960_ISO_IEC_14496-12_2015.zip
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119