Internet-Draft | Matroska Codec | December 2024 |
Lhomme, et al. | Expires 18 June 2025 | [Page] |
This document defines the Matroska codec mappings, including the codec ID, layout of data
in a Block
element and in an optional CodecPrivate
element.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 18 June 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Matroska is a multimedia container format. It stores interleaved and timestamped audiovisual data using various codecs. To interpret the codec data, a mapping between the way the data is stored in Matroska and how it is understood by such a codec is necessary.¶
This document intends to define this mapping for many commonly used codecs in Matroska.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
A Codec Mapping
is a set of attributes to identify, name, and contextualize the format
and characteristics of encoded data that can be contained within Matroska Clusters.¶
Each TrackEntry
used within Matroska MUST reference a defined Codec Mapping
using the
CodecID
to identify and describe the format of the encoded data in its associated Clusters.
This CodecID
is a unique registered identifier that represents the encoding stored within
the Track
. Certain encodings MAY also require some form of codec initialization
to provide its decoder with context and technical metadata.¶
The intention behind this list is not to list all existing audio and video codecs,
but rather to list those codecs that are currently supported in Matroska and therefore
need a well-defined CodecID
so that all developers supporting Matroska will use the
same CodecID
. If you feel we missed support for a very important codec, please tell
us on our development mailing list (cellar at ietf.org).¶
Support for a codec is defined in Matroska with the following values.¶
Each codec supported for storage in Matroska MUST have a unique CodecID
.
Each CodecID
MUST be prefixed with the string from the following table according to
the associated type of the codec. All characters of a Codec ID Prefix
MUST be
capital letters (A-Z) except for the last character of a Codec ID Prefix
which MUST be
an underscore ("_").¶
Codec Type | Codec ID Prefix |
---|---|
Video | "V_" |
Audio | "A_" |
Subtitle | "S_" |
Button | "B_" |
Each CodecID
MUST include a Major Codec ID
immediately following the Codec ID Prefix
.
A Major Codec ID
MAY be followed by an OPTIONAL Codec ID Suffix
to communicate a refinement
of the Major Codec ID
. If a Codec ID Suffix
is used, then the CodecID
MUST include a
forward slash ("/") as a separator between the Major Codec ID
and the Codec ID Suffix
.
The Major Codec ID
MUST be composed of only capital letters (A-Z) and numbers (0-9).
The Codec ID Suffix
MUST be composed of only capital letters (A-Z), numbers (0-9),
underscore ("_"), and forward slash ("/").¶
The following table provides examples of valid Codec IDs
and their components:¶
Codec ID Prefix | Major Codec ID | Separator | Codec ID Suffix | Codec ID |
---|---|---|---|---|
A_ | AAC | / | MPEG2/LC/SBR | A_AAC/MPEG2/LC/SBR |
V_ | MPEG4 | / | ISO/ASP | V_MPEG4/ISO/ASP |
V_ | MPEG1 | | | V_MPEG1 |
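The syntax rules above can be checked mechanically. The following Python sketch is illustrative only: the function name `parse_codec_id` and the returned tuple are assumptions of this example, and the prefix check is limited to the four prefixes listed in the table above.

```python
import re

# Prefix: one of the four prefixes from the table above.
# Major Codec ID: capital letters and digits only.
# Optional suffix after "/": capital letters, digits, "_" and "/".
_CODEC_ID_RE = re.compile(r"([VASB]_)([A-Z0-9]+)(?:/([A-Z0-9_/]+))?")


def parse_codec_id(codec_id):
    """Split a CodecID into (prefix, major, suffix).

    The suffix is None when the CodecID has no Codec ID Suffix.
    Raises ValueError when the string violates the syntax rules.
    """
    match = _CODEC_ID_RE.fullmatch(codec_id)
    if match is None:
        raise ValueError(f"invalid CodecID: {codec_id!r}")
    return match.groups()
```

For example, `parse_codec_id("A_AAC/MPEG2/LC/SBR")` yields the prefix, Major Codec ID, and suffix shown in the first row of the table.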
Each encoding supported for storage in Matroska MUST have a Codec Name
.
The Codec Name
provides a readable label for the encoding.¶
An optional description for the encoding. This value is only intended for human consumption.¶
Each encoding supported for storage in Matroska MUST have a defined Initialization.
The Initialization MUST describe the storage of data necessary to initialize the decoder,
which MUST be stored within the CodecPrivate
element. When the Initialization is updated
within a track, then that updated Initialization data MUST be written into the CodecState
element
of the first Cluster
to require it. If the encoding does not require any form of Initialization,
then none
MUST be used to define the Initialization and the CodecPrivate
element
SHOULD NOT be written and MUST be ignored.¶
Additional data that contextualizes or supplements a Block
can be stored within
the BlockAdditional
element of a BlockMore
element (Section 5.1.3.5.2.1 of [RFC9559]).
Each BlockAdditional
is coupled with a BlockAddID
that identifies the kind of data it contains.¶
A BlockAddID
of 1 means the data in the BlockAdditional
element are tied to the codec.
This BlockAdditional
data with a BlockAddID
of 1 MAY be passed to the associated decoder alongside the Block
content.¶
A codec definition MUST contain a "Codec BlockAdditions" section if the codec can use BlockAdditional
data with a BlockAddID
of 1.¶
The BlockAddID
values are defined in Section 3.7.¶
Documentation of the associated normative and informative references for the codec is RECOMMENDED.¶
When a Superseded By
is set, the specified CodecID
value MUST be used instead of the CodecID
it's defined for.¶
Files MAY exist with the superseded CodecID
and MAY be supported by Matroska Players.¶
Creators of new Codec Mappings
to be used in the context of Matroska:¶
SHOULD assume that all Codec Mappings
they create might become standardized, public,
commonly deployed, or usable across multiple implementations.¶
SHOULD employ meaningful values for CodecID
and Codec Name
that they have reason
to believe are currently unused.¶
SHOULD NOT prefix their CodecID
with "X_" or similar constructs.¶
All codecs described in this section MUST have a TrackType
(Section 5.1.4.1.3 of [RFC9559]) value of "1" for video.
The track using these codecs MUST contain a Video
element -- EBML Path \Segment\Tracks\TrackEntry\Video
.¶
Most video codecs contain meta information about the data they contain, like encoded width and height, chroma subsampling, etc.
Whenever possible, this information SHOULD be extracted from the codec and repeated at the Matroska level with
the appropriate element(s) inside the \Segment\Tracks\TrackEntry\Video
and \Segment\Tracks\TrackEntry
elements.
These values MUST be valid for the whole Segment.¶
Codec ID: V_AV1
¶
Codec Name: Alliance for Open Media AV1 Video codec¶
Description: Only one Sequence Header OBU
, as defined in section 6.4 of [AV1], is supported per Matroska Segment.
Each Block
contains one Temporal Unit
containing one or more OBUs. Each OBU stored in the Block MUST contain its header and its payload.
The OBUs in the Block
follow the Low Overhead Bitstream Format syntax
.
They MUST have the obu_has_size_field
set to 1 except for the last OBU in the frame, for which obu_has_size_field
MAY be set to 0, in which case it is assumed to fill the remainder of the frame.
A SimpleBlock
MUST NOT be marked as a Keyframe if it doesn't contain a Frame OBU
.
A SimpleBlock
MUST NOT be marked as a Keyframe if the first Frame OBU
doesn't have a frame_type
of KEY_FRAME
.
A SimpleBlock
MUST NOT be marked as a Keyframe if it doesn't contain a Sequence Header OBU
.
A Block
inside a BlockGroup
MUST use ReferenceBlock
elements if the first Frame OBU
in the Block
has a frame_type
other than KEY_FRAME
.
A Block
inside a BlockGroup
MUST use ReferenceBlock
elements if the Block
doesn't contain a Sequence Header OBU
.
A Block
with frame_header_obu
where the frame_type
is INTRA_ONLY_FRAME
MUST use a ReferenceBlock
with a value of 0 to reference itself.¶
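As a rough illustration of the Block layout described above, the following Python sketch splits a Block into its OBUs using the OBU header bit layout of [AV1] (obu_type in bits 6..3, obu_extension_flag in bit 2, obu_has_size_field in bit 1). The function names are assumptions of this example. `may_be_keyframe` only checks two of the necessary conditions listed above (a Sequence Header OBU and a Frame OBU are present); it does not parse frame_type, so it is not a sufficient test.

```python
OBU_SEQUENCE_HEADER = 1  # obu_type values from the AV1 specification
OBU_FRAME = 6


def read_leb128(data, pos):
    """Decode an unsigned leb128 value; return (value, next_position)."""
    value = 0
    for i in range(8):  # AV1 limits leb128 values to 8 bytes
        byte = data[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, pos + i + 1
    raise ValueError("leb128 value longer than 8 bytes")


def split_obus(block):
    """Return a list of (obu_type, payload) for the OBUs in one Block."""
    obus = []
    pos = 0
    while pos < len(block):
        header = block[pos]
        obu_type = (header >> 3) & 0x0F
        has_extension = bool(header & 0x04)
        has_size = bool(header & 0x02)
        pos += 2 if has_extension else 1  # skip header and extension byte
        if has_size:
            size, pos = read_leb128(block, pos)
        else:
            # Only allowed for the last OBU: it fills the rest of the Block.
            size = len(block) - pos
        if pos + size > len(block):
            raise ValueError("OBU payload exceeds Block size")
        obus.append((obu_type, block[pos:pos + size]))
        pos += size
    return obus


def may_be_keyframe(block):
    """Necessary (not sufficient) conditions for the Keyframe flag."""
    types = [obu_type for obu_type, _ in split_obus(block)]
    return OBU_SEQUENCE_HEADER in types and OBU_FRAME in types
```

A muxer would still need to inspect the frame_type of the first Frame OBU before marking a SimpleBlock as a Keyframe.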
Initialization: The CodecPrivate
consists of the AV1CodecConfigurationRecord
described in section 2.3 of [AV1-ISOBMFF].¶
PixelWidth: MUST be max_frame_width_minus_1
+1 of the Sequence Header OBU
.¶
PixelHeight: MUST be max_frame_height_minus_1
+1 of the Sequence Header OBU
.¶
Codec ID: V_AVS2¶
Codec Name: AVS2-P2/IEEE.1857.4¶
Description: Individual pictures of AVS2-P2 stored as described in the second part of [IEEE.1857-4].¶
Initialization: none.¶
Codec ID: V_AVS3¶
Codec Name: AVS3-P2/IEEE.1857.10¶
Description: Individual pictures of AVS3-P2 stored as described in the second part of [IEEE.1857-10].¶
Initialization: none.¶
Codec ID: V_CAVS¶
Codec Name: AVS1-P2, JiZhun profile¶
Description: Individual pictures of AVS1-P2 stored as described in [IEEE.1857-3].¶
Codec ID: V_DIRAC¶
Codec Name: BBC Dirac¶
Description: A video codec developed by the BBC [Dirac]. The Intra-only version of Dirac, also known as Dirac Pro, resulted in SMPTE VC-2 [SMPTE.ST2042-1]. Each Matroska frame corresponds to a Sequence as defined in [Dirac].¶
Codec ID: V_FFV1¶
Codec Name: FF Video Codec 1¶
Description: FFV1 is a lossless intra-frame video encoding format designed to efficiently compress video data in a variety of pixel formats. Compared to uncompressed video, FFV1 offers storage compression, frame fixity, and self-description, which makes FFV1 useful as a preservation or intermediate video format. [RFC9043]¶
Initialization: For FFV1 versions 0 or 1, CodecPrivate
SHOULD NOT be written.
For FFV1 version 3 or greater, the CodecPrivate
MUST contain the FFV1 Configuration Record structure, as defined in Section 4.3 of [RFC9043], and no other data.¶
Codec ID: V_MJPEG¶
Codec Name: Motion JPEG¶
Description: Motion JPEG is a video compression format in which each video frame or interlaced field is compressed separately as a [JPEG] image.¶
Codec ID: V_MPEGH/ISO/HEVC¶
Codec Name: HEVC/H.265¶
Description: Individual pictures (which could be a frame, a field, or 2 fields having the same timestamp) of HEVC/H.265 stored as described in [ISO.14496-15].¶
Initialization: The CodecPrivate
contains a HEVCDecoderConfigurationRecord
structure, as defined in [ISO.14496-15].¶
Codec ID: V_MPEGI/ISO/VVC¶
Codec Name: VVC/H.266¶
Description: Individual pictures (which could be a frame, a field, or 2 fields having the same timestamp) of VVC/H.266 stored as described in [ISO.14496-15].¶
Initialization: The CodecPrivate
contains a VVCDecoderConfigurationRecord
structure, as defined in [ISO.14496-15].¶
Codec ID: V_MPEG1¶
Codec Name: MPEG 1¶
Description: Frames correspond to a Video Sequence as defined in [ISO.11172-2].¶
Initialization: none¶
Codec ID: V_MPEG2¶
Codec Name: MPEG 2¶
Description: Frames correspond to a Video Sequence as defined in [ISO.13818-2].¶
Initialization: none¶
Codec ID: V_MPEG4/ISO/AVC¶
Codec Name: AVC/H.264¶
Description: Individual pictures (which could be a frame, a field, or 2 fields having the same timestamp) of AVC/H.264 stored as described in [ISO.14496-15].¶
Initialization: The CodecPrivate
contains a AVCDecoderConfigurationRecord
structure, as defined in [ISO.14496-15].
For legacy reasons, the AVCDecoderConfigurationRecord
structure MAY be followed by an extension block, although Block Additional Mappings
(Section 3.7) are preferred.
The extension block begins with a 4-byte size field in big-endian byte order, whose value is the size of the extension block
minus 4 (that is, excluding the size field itself). The size field is followed by a 4-byte field corresponding
to a BlockAddIDType
of "mvcC" and then by content corresponding to the content of BlockAddIDExtraData
for mvcC
; see Section 3.7.9.¶
Codec ID: V_MPEG4/ISO/AP¶
Codec Name: MPEG4 ISO Advanced Profile¶
Description: Frames correspond to frames defined in [ISO.14496-2].
Stream was created via improved codec API (UCI) or transmuxed from MP4, not simply transmuxed from AVI.
Note that B-frames are handled differently in these streams than in a VfW-created stream:
no dummy frames are inserted, as in MP4 streams.¶
Initialization: none¶
Codec ID: V_MPEG4/ISO/ASP¶
Codec Name: MPEG4 ISO Advanced Simple Profile (DivX5, XviD)¶
Description: Frames correspond to frames defined in [ISO.14496-2].
Stream was created via improved codec API (UCI) or transmuxed from MP4, not simply transmuxed from AVI.
Note that B-frames are handled differently in these streams than in a VfW-created stream:
no dummy frames are inserted, as in MP4 streams.¶
Initialization: none¶
Codec ID: V_MPEG4/ISO/SP¶
Codec Name: MPEG4 ISO Simple Profile (DivX4)¶
Description: Frames correspond to frames defined in [ISO.14496-2]. Stream was created via improved codec API (UCI) or even transmuxed from AVI (no b-frames in Simple Profile).¶
Initialization: none¶
Codec ID: V_MPEG4/MS/V3¶
Codec Name: Microsoft MPEG4 V3¶
Description: Microsoft MPEG4 V3 and derivatives such as DivX3, Angelpotion, SMR, etc. The stream was created using a VfW codec or transmuxed from AVI. Note that V1/V2 are covered in VfW compatibility mode.¶
Initialization: none¶
Codec ID: V_MS/VFW/FOURCC
¶
Codec Name: Microsoft Video Codec Manager (VCM)¶
Description: The CodecPrivate
contains the VCM structure BITMAPINFOHEADER including
the extra private bytes, as defined in [BITMAPINFOHEADER].
The data are stored in little-endian format (like on IA32 machines). Where is the Huffman table stored
in HuffYUV, not AVISTREAMINFO ??? And the FourCC, not in AVISTREAMINFO.fccHandler ???¶
Initialization: CodecPrivate
contains the VCM structure BITMAPINFOHEADER including the extra private bytes,
as defined by Microsoft in [BITMAPINFOHEADER].¶
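The following Python sketch shows one way a reader might unpack such a CodecPrivate. It assumes the classic 40-byte BITMAPINFOHEADER field layout, stored little-endian without padding, and treats everything after those 40 bytes as the extra private bytes. The function name and the returned dictionary keys are hypothetical.

```python
import struct

# biSize, biWidth, biHeight, biPlanes, biBitCount, biCompression,
# biSizeImage, biXPelsPerMeter, biYPelsPerMeter, biClrUsed, biClrImportant
_BITMAPINFOHEADER = struct.Struct("<IiiHHIIiiII")  # 40 bytes, no padding


def parse_vfw_codec_private(codec_private):
    """Unpack a V_MS/VFW/FOURCC CodecPrivate into a small dictionary."""
    if len(codec_private) < _BITMAPINFOHEADER.size:
        raise ValueError("CodecPrivate shorter than BITMAPINFOHEADER")
    fields = _BITMAPINFOHEADER.unpack_from(codec_private)
    _size, width, height, _planes, bit_count, compression = fields[:6]
    return {
        "width": width,
        "height": height,  # may be negative for top-down bitmaps
        "bit_count": bit_count,
        # biCompression holds the FourCC; re-pack it to get the 4 characters.
        "fourcc": struct.pack("<I", compression),
        # Assumption: the extra private bytes start right after 40 bytes.
        "extra": codec_private[_BITMAPINFOHEADER.size:],
    }
```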
Codec ID: V_QUICKTIME¶
Codec Name: Video taken from QuickTime files¶
Description: Several codecs as stored in QuickTime (e.g., Sorenson or Cinepak).¶
Initialization: The CodecPrivate
contains all additional data that is stored in the 'stsd' (sample description) atom
in the QuickTime file after the mandatory video descriptor structure
(starting with the size and FourCC fields). For an explanation of the QuickTime file format read [QTFF].¶
Codec ID: V_PRORES¶
Codec Name: Apple ProRes¶
Initialization: The CodecPrivate
contains the FourCC as found in MP4 movies:¶
ap4x: ProRes 4444 XQ¶
ap4h: ProRes 4444¶
apch: ProRes 422 High Quality¶
apcn: ProRes 422 Standard Definition¶
apcs: ProRes 422 LT¶
apco: ProRes 422 Proxy¶
aprh: ProRes RAW High Quality¶
aprn: ProRes RAW Standard Definition¶
ProRes is defined as [SMPTE.RDD36].¶
Codec ID: V_REAL/RV10¶
Codec Name: RealVideo 1.0 aka RealVideo 5¶
Description: Individual slices from the Real container are combined into a single frame.¶
Initialization: The CodecPrivate
contains a real_video_props_t
structure in big-endian byte order as found in [librmff].¶
Codec ID: V_REAL/RV20¶
Codec Name: RealVideo G2 and RealVideo G2+SVT¶
Description: Individual slices from the Real container are combined into a single frame.¶
Initialization: The CodecPrivate
contains a real_video_props_t
structure in big-endian byte order as found in [librmff].¶
Codec ID: V_REAL/RV30¶
Codec Name: RealVideo 8¶
Description: Individual slices from the Real container are combined into a single frame.¶
Initialization: The CodecPrivate
contains a real_video_props_t
structure in big-endian byte order as found in [librmff].¶
Codec ID: V_REAL/RV40¶
Codec Name: rv40 : RealVideo 9¶
Description: Individual slices from the Real container are combined into a single frame.¶
Initialization: The CodecPrivate
contains a real_video_props_t
structure in big-endian byte order as found in [librmff].¶
Codec ID: V_THEORA¶
Codec Name: Theora¶
Description: Frames correspond to a Theora Frame as defined in [Theora].¶
Initialization: The CodecPrivate
contains the first three Theora packets in order. The lengths of the packets precede them. The actual layout is:¶
Byte 1: number of distinct packets #p
minus one inside the CodecPrivate
block. This MUST be "2" for current (as of 2016-07-08) Theora headers.¶
Bytes 2..n: lengths of the first #p
packets, coded in Xiph-style lacing. The length of the last packet is the length of the CodecPrivate
block minus the lengths coded in these bytes minus one.¶
Bytes n+1..: The Theora identification header, followed by the comment header, followed by the codec setup header. These are described in [Theora].¶
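The layout above can be read back with a short routine. This Python sketch is only an illustration: the function name is an assumption, and it relies on Xiph-style lacing coding each length as a run of 255-valued bytes followed by one byte below 255, with the values summed.

```python
def split_xiph_codec_private(data):
    """Split a Xiph-laced CodecPrivate into its header packets."""
    if not data:
        raise ValueError("empty CodecPrivate")
    packet_count = data[0] + 1  # byte 1 stores the packet count minus one
    pos = 1
    sizes = []
    # The lengths of all packets except the last one are stored explicitly.
    for _ in range(packet_count - 1):
        size = 0
        while True:
            if pos >= len(data):
                raise ValueError("truncated lacing sizes")
            byte = data[pos]
            pos += 1
            size += byte
            if byte != 255:
                break
        sizes.append(size)
    # The last packet takes whatever remains after the coded lengths.
    remaining = len(data) - pos - sum(sizes)
    if remaining < 0:
        raise ValueError("lacing sizes exceed CodecPrivate length")
    sizes.append(remaining)
    packets = []
    for size in sizes:
        packets.append(data[pos:pos + size])
        pos += size
    return packets
```

For Theora, the three returned packets would be the identification, comment, and setup headers, in that order.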
Codec ID: V_UNCOMPRESSED¶
Codec Name: Video, raw uncompressed video frames¶
Description: All details about the color specification and bit depth are written to and read from the TrackEntry\Video\UncompressedFourCC
elements.¶
Initialization: none¶
Codec ID: V_VP8¶
Codec Name: VP8 Codec format¶
Description: VP8 is an open and royalty free video compression format developed by Google and created by On2 Technologies as a successor to VP7. [RFC6386]¶
Codec BlockAdditions: A single-channel encoding of an alpha channel MAY be stored in BlockAdditions
. The BlockAddID
of the BlockMore
containing these data MUST be 1.¶
Initialization: none¶
Codec ID: V_VP9¶
Codec Name: VP9 Codec format¶
Description: VP9 is an open and royalty free video compression format developed by Google as a successor to VP8. [VP9]¶
Codec BlockAdditions: A single-channel encoding of an alpha channel MAY be stored in BlockAdditions
. The BlockAddID
of the BlockMore
containing these data MUST be 1.¶
Initialization: The CodecPrivate
SHOULD contain a list of specific VP9 codec features as described in the "VP9 Codec Feature Metadata" section of [WebMContainer].
This piece of data helps to select a decoder on playback, but as many muxers don't provide the CodecPrivate
for "V_VP9" it's not a hard requirement.
It is possible for the decoder to reconstruct the "VP9 Codec Feature Metadata" from the first frame in case the CodecPrivate
is not present.¶
Note that the format differs from the VPCodecConfigurationRecord
structure, as defined in [VP-ISOBMFF].¶
All codecs described in this section MUST have a TrackType
(Section 5.1.4.1.3 of [RFC9559]) value of "2" for audio.
The track using these codecs MUST contain an Audio
element -- EBML Path \Segment\Tracks\TrackEntry\Audio
.¶
Most audio codecs contain meta information about the data they contain, like encoded sampling frequency, channel count, etc.
Whenever possible, this information SHOULD be extracted from the codec and repeated at the Matroska level with
the appropriate element(s) inside the \Segment\Tracks\TrackEntry\Audio
and \Segment\Tracks\TrackEntry
elements.
These values MUST be valid for the whole Segment.¶
Codec ID: A_AAC/MPEG2/LC¶
Codec Name: Low Complexity¶
Description: The channel count and sample rate have to be read from the corresponding audio element. The audio stream is stripped of ADTS headers, and the normal Matroska frame-based muxing scheme is applied.¶
Initialization: none¶
Codec ID: A_AAC/MPEG2/LC/SBR¶
Codec Name: Low Complexity with Spectral Band Replication¶
Description: The channel count and sample rate have to be read from the corresponding audio element. The audio stream is stripped of ADTS headers, and the normal Matroska frame-based muxing scheme is applied.¶
Initialization: none¶
Codec ID: A_AAC/MPEG2/MAIN¶
Codec Name: MPEG2 Main Profile¶
Description: The channel count and sample rate have to be read from the corresponding audio element. The audio stream is stripped of ADTS headers, and the normal Matroska frame-based muxing scheme is applied.¶
Initialization: none¶
Codec ID: A_AAC/MPEG2/SSR¶
Codec Name: Scalable Sampling Rate¶
Description: The channel count and sample rate have to be read from the corresponding audio element. The audio stream is stripped of ADTS headers, and the normal Matroska frame-based muxing scheme is applied.¶
Initialization: none¶
Codec ID: A_AAC/MPEG4/LC¶
Codec Name: Low Complexity¶
Description: The channel count and sample rate have to be read from the corresponding audio element. The audio stream is stripped of ADTS headers, and the normal Matroska frame-based muxing scheme is applied.¶
Initialization: none¶
Codec ID: A_AAC/MPEG4/LC/SBR¶
Codec Name: Low Complexity with Spectral Band Replication¶
Description: The channel count and sample rate have to be read from the corresponding audio element. The audio stream is stripped of ADTS headers, and the normal Matroska frame-based muxing scheme is applied.¶
Initialization: none¶
Codec ID: A_AAC/MPEG4/LTP¶
Codec Name: Long Term Prediction¶
Description: The channel count and sample rate have to be read from the corresponding audio element. The audio stream is stripped of ADTS headers, and the normal Matroska frame-based muxing scheme is applied.¶
Initialization: none¶
Codec ID: A_AAC/MPEG4/MAIN¶
Codec Name: MPEG4 Main Profile¶
Description: The channel count and sample rate have to be read from the corresponding audio element. The audio stream is stripped of ADTS headers, and the normal Matroska frame-based muxing scheme is applied.¶
Initialization: none¶
Codec ID: A_AAC/MPEG4/SSR¶
Codec Name: Scalable Sampling Rate¶
Description: The channel count and sample rate have to be read from the corresponding audio element. The audio stream is stripped of ADTS headers, and the normal Matroska frame-based muxing scheme is applied.¶
Initialization: none¶
Codec ID: A_AC3¶
Codec Name: Dolby Digital / AC-3¶
Description: Individual frames of AC-3 syncframe()
stored as described in [ATSC.A52] or [ETSI.TS102-366] when the value of the bsid
field defined in Section 5.4.2.1 of [ATSC.A52] or Section 4.4.2.1 of [ETSI.TS102-366] is 10 or below.
The channel count has to be read from the corresponding audio element.¶
Codec ID: A_AC3/BSID9¶
Codec Name: Dolby Digital / AC-3¶
Description: Individual frames of AC-3 syncframe()
stored as described in [ATSC.A52] or [ETSI.TS102-366] when the value of the bsid
field defined in Section 5.4.2.1 of [ATSC.A52] or Section 4.4.2.1 of [ETSI.TS102-366] is 9.
Note that the value 9 in the bsid
field is not standard, but it is used de facto to divide the sampling rate defined in Section 5.4.1.3 of [ATSC.A52] or Section 4.4.2.1 of [ETSI.TS102-366] by 2.¶
Using this Codec ID is NOT RECOMMENDED as many Matroska Players don't support it. The generic A_AC3
Codec ID should be used instead as it supports a bsid
of 9 as well.¶
Initialization: none¶
Codec ID: A_AC3/BSID10¶
Codec Name: Dolby Digital / AC-3¶
Description: Individual frames of AC-3 syncframe()
stored as described in [ATSC.A52] or [ETSI.TS102-366] when the value of the bsid
field defined in Section 5.4.2.1 of [ATSC.A52] or Section 4.4.2.1 of [ETSI.TS102-366] is 10.
Note that the value 10 in the bsid
field is not standard, but it is used de facto to divide the sampling rate defined in Section 5.4.1.3 of [ATSC.A52] or Section 4.4.2.1 of [ETSI.TS102-366] by 4.¶
Using this Codec ID is NOT RECOMMENDED as many Matroska Players don't support it. The generic A_AC3
Codec ID should be used instead as it supports a bsid
of 10 as well.¶
Initialization: none¶
Codec ID: A_ALAC¶
Codec Name: ALAC (Apple Lossless Audio Codec)¶
Initialization: The CodecPrivate
contains ALAC's magic cookie (both the codec specific configuration as well as the optional channel layout information).
Its format is described in the "Magic Cookie" defined in [ALAC].¶
Codec ID: A_ATRAC/AT1¶
Codec Name: Sony ATRAC1 Codec¶
Description: The original ATRAC codec by Sony, mainly used in MiniDisc platforms. The core technical details on ATRAC1 can be found in [AtracAES]. An example encoder/decoder can be found at [atracdenc].¶
Initialization: none¶
Codec ID: A_DTS¶
Codec Name: Digital Theatre System¶
Description: Supports DTS, DTS-ES, DTS-96/24, DTS-HD High Resolution Audio and DTS-HD Master Audio. It corresponds to the base codec defined in [ETSI.TS102-114].¶
Initialization: none¶
Codec ID: A_DTS/EXPRESS¶
Codec Name: Digital Theatre System Express¶
Description: DTS Express (a.k.a. LBR) audio streams. It corresponds to the LBR extension of the DTS codec defined in section 9 of [ETSI.TS102-114].¶
Initialization: none¶
Codec ID: A_DTS/LOSSLESS¶
Codec Name: Digital Theatre System Lossless¶
Description: DTS Lossless audio that does not have a core substream. It corresponds to the Lossless extension (XLL) of the DTS codec defined in section 8 of [ETSI.TS102-114].¶
Initialization: none¶
Codec ID: A_EAC3¶
Codec Name: Dolby Digital Plus / E-AC-3¶
Description: Individual frames of E-AC-3 syncframe()
stored as described in [ATSC.A52] or [ETSI.TS102-366] when the value of the bsid
field defined in Annex E Section 2.1 of [ATSC.A52] or Section E.1.3.1.6 of [ETSI.TS102-366] is 11, 12, 13, 14, 15 or 16.¶
Codec ID: A_FLAC¶
Codec Name: FLAC (Free Lossless Audio Codec)¶
Initialization: The CodecPrivate
contains all the header/metadata packets before the first data packet as defined in [I-D.ietf-cellar-flac].
These include the first header packet containing only the word fLaC
as well as all metadata packets.¶
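As an illustration, the following Python sketch walks such a CodecPrivate. It assumes the packets are stored back to back, and that each metadata block starts with the usual FLAC 4-byte header (a 1-bit last-block flag, a 7-bit block type, and a 24-bit big-endian length). The function name is hypothetical.

```python
def list_flac_metadata_blocks(codec_private):
    """Return a list of (block_type, payload) found after the fLaC marker."""
    if codec_private[:4] != b"fLaC":
        raise ValueError("CodecPrivate does not start with the fLaC marker")
    pos = 4
    blocks = []
    while pos < len(codec_private):
        if pos + 4 > len(codec_private):
            raise ValueError("truncated metadata block header")
        header = codec_private[pos]
        is_last = bool(header & 0x80)  # last-metadata-block flag
        block_type = header & 0x7F     # 0 is STREAMINFO
        length = int.from_bytes(codec_private[pos + 1:pos + 4], "big")
        pos += 4
        if pos + length > len(codec_private):
            raise ValueError("truncated metadata block payload")
        blocks.append((block_type, codec_private[pos:pos + length]))
        pos += length
        if is_last:
            break
    return blocks
```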
Codec ID: A_MLP¶
Codec Name: Meridian Lossless Packing / MLP¶
Description: A lossless audio codec used in DVD-Audio discs. The format is similar to Dolby TrueHD (Section 3.4.39) but with fewer channels.¶
Codec ID: A_MPC¶
Codec Name: MPC (musepack) SV8¶
Description: The main developer for musepack has requested that we wait until the SV8 framing has been fully defined for musepack before defining how to store it in Matroska.¶
Codec ID: A_MPEG/L1¶
Codec Name: MPEG Audio 1, 2 Layer I¶
Description: Frames correspond to Audio Frames of a Layer I bitstream as defined in [ISO.11172-3].¶
Initialization: none¶
Codec ID: A_MPEG/L2¶
Codec Name: MPEG Audio 1, 2 Layer II¶
Description: Frames correspond to Audio Frames of a Layer II bitstream as defined in [ISO.11172-3].¶
Initialization: none¶
Codec ID: A_MPEG/L3¶
Codec Name: MPEG Audio 1, 2, 2.5 Layer III¶
Description: Frames correspond to Audio Frames of a Layer III bitstream as defined in [ISO.11172-3].¶
Initialization: none¶
Codec ID: A_MS/ACM¶
Codec Name: Microsoft Audio Codec Manager (ACM)¶
Description: The data are stored in little-endian format (like on IA32 machines).¶
Initialization: The CodecPrivate
contains the [WAVEFORMATEX] structure including the extra format information bytes.
The structure is stored without packing or padding bytes.
A WORD
corresponds to an unsigned 2-octet integer, and a DWORD
corresponds to an unsigned 4-octet integer.
The extra format information is appended after the WAVEFORMATEX octets.¶
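A reader might unpack this CodecPrivate as sketched below in Python. The sketch assumes the 18-byte WAVEFORMATEX field order (wFormatTag, nChannels, nSamplesPerSec, nAvgBytesPerSec, nBlockAlign, wBitsPerSample, cbSize) stored little-endian without padding, followed by cbSize extra bytes. The function name and dictionary keys are hypothetical.

```python
import struct

# wFormatTag, nChannels, nSamplesPerSec, nAvgBytesPerSec,
# nBlockAlign, wBitsPerSample, cbSize
_WAVEFORMATEX = struct.Struct("<HHIIHHH")  # 18 bytes, no padding


def parse_acm_codec_private(codec_private):
    """Unpack an A_MS/ACM CodecPrivate into a small dictionary."""
    if len(codec_private) < _WAVEFORMATEX.size:
        raise ValueError("CodecPrivate shorter than WAVEFORMATEX")
    (format_tag, channels, sample_rate, avg_bytes_per_sec,
     block_align, bits_per_sample, cb_size) = _WAVEFORMATEX.unpack_from(
        codec_private)
    start = _WAVEFORMATEX.size
    extra = codec_private[start:start + cb_size]
    if len(extra) != cb_size:
        raise ValueError("extra format information is truncated")
    return {
        "format_tag": format_tag,
        "channels": channels,
        "sample_rate": sample_rate,
        "avg_bytes_per_sec": avg_bytes_per_sec,
        "block_align": block_align,
        "bits_per_sample": bits_per_sample,
        "extra": extra,
    }
```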
Codec ID: A_REAL/14_4¶
Codec Name: Real Audio 1¶
Initialization: The CodecPrivate
contains either the "real_audio_v4_props_t" or the "real_audio_v5_props_t" structure
(differentiated by their "version" field; big-endian byte order) as found in [librmff].¶
Codec ID: A_REAL/28_8¶
Codec Name: Real Audio 2¶
Initialization: The CodecPrivate
contains either the "real_audio_v4_props_t" or the "real_audio_v5_props_t" structure
(differentiated by their "version" field; big-endian byte order) as found in [librmff].¶
Codec ID: A_REAL/ATRC¶
Codec Name: Sony Atrac3 Codec¶
Initialization: The CodecPrivate
contains either the "real_audio_v4_props_t" or the "real_audio_v5_props_t" structure
(differentiated by their "version" field; big-endian byte order) as found in [librmff].¶
Codec ID: A_REAL/COOK¶
Codec Name: Real Audio Cook Codec (codename: Gecko)¶
Initialization: The CodecPrivate
contains either the "real_audio_v4_props_t" or the "real_audio_v5_props_t" structure
(differentiated by their "version" field; big-endian byte order) as found in [librmff].¶
Codec ID: A_REAL/RALF¶
Codec Name: Real Audio Lossless Format¶
Initialization: The CodecPrivate
contains either the "real_audio_v4_props_t" or the "real_audio_v5_props_t" structure
(differentiated by their "version" field; big-endian byte order) as found in [librmff].¶
Codec ID: A_REAL/SIPR¶
Codec Name: Sipro Voice Codec¶
Initialization: The CodecPrivate
contains either the "real_audio_v4_props_t" or the "real_audio_v5_props_t" structure
(differentiated by their "version" field; big-endian byte order) as found in [librmff].¶
Codec ID: A_OPUS¶
Codec Name: Opus interactive speech and audio codec¶
Description: The Opus audio codec defined by [RFC6716], using an encapsulation similar to the Ogg Encapsulation [RFC7845].¶
Initialization: The track CodecPrivate
MUST be present and contain the Identification Header
defined in Section 5.1 of [RFC7845].¶
Channels: The track Channels
element value MUST be the "Output Channel Count" value of the Identification Header
.¶
SamplingFrequency: The track SamplingFrequency
element value MUST be the "Input Sample Rate" value of the Identification Header
.¶
CodecDelay: The track CodecDelay
element MUST be present and set to the "Pre-skip" value of the Identification Header
translated to Matroska Ticks.
The "Pre-skip" value is in samples at 48,000 Hz. The formula to get the CodecDelay
is:¶
CodecDelay = pre-skip * 1,000,000,000 / 48,000.¶
SeekPreRoll: The track SeekPreRoll
element SHOULD be present and set to 80,000,000 -- 80 ms in Matroska Ticks --
in order to ensure that the output audio is correct by the time it reaches the seek target.¶
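The track values above can be derived from the Identification Header as sketched below in Python. The sketch assumes the "OpusHead" field layout of Section 5.1 of [RFC7845] (little-endian fields after the 8-byte magic signature). The function name and dictionary keys are hypothetical, and rounding CodecDelay to the nearest nanosecond is an assumption of this example, since the formula does not state a rounding rule.

```python
import struct


def opus_track_values(codec_private):
    """Derive Matroska track values from an Opus Identification Header."""
    if len(codec_private) < 19 or codec_private[:8] != b"OpusHead":
        raise ValueError("not an Opus Identification Header")
    # version, channel count, pre-skip, input sample rate, output gain,
    # channel mapping family
    (_version, channels, pre_skip, input_rate,
     _gain, _family) = struct.unpack_from("<BBHIhB", codec_private, 8)
    # Pre-skip is counted in samples at 48,000 Hz; convert to nanoseconds.
    # Assumption: round to the nearest nanosecond.
    codec_delay = (pre_skip * 1_000_000_000 + 24_000) // 48_000
    return {
        "Channels": channels,
        "SamplingFrequency": input_rate,
        "CodecDelay": codec_delay,
        "SeekPreRoll": 80_000_000,  # 80 ms, as recommended above
    }
```

For instance, a pre-skip of 312 samples gives 312 * 1,000,000,000 / 48,000 = 6,500,000 Matroska Ticks.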
Codec ID: A_PCM/FLOAT/IEEE¶
Codec Name: Floating-Point, IEEE compatible¶
Description: The audio bit depth MUST be read and set from the BitDepth
element (32 bits in most cases).
The floats are stored as defined in [IEEE.754] and in little-endian order.¶
Initialization: none¶
Codec ID: A_PCM/INT/BIG¶
Codec Name: PCM Integer Big Endian¶
Description: The audio bit depth MUST be read and set from the BitDepth
element. Audio samples MUST be considered as signed values,
except if the audio bit depth is 8 which MUST be interpreted as unsigned values.¶
Initialization: none¶
Codec ID: A_PCM/INT/LIT¶
Codec Name: PCM Integer Little Endian¶
Description: The audio bit depth MUST be read and set from the BitDepth
element. Audio samples MUST be considered as signed values,
except if the audio bit depth is 8 which MUST be interpreted as unsigned values.¶
Initialization: none¶
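The signedness rule shared by A_PCM/INT/BIG and A_PCM/INT/LIT can be sketched in Python as follows. The function name and its parameters are hypothetical, and the sketch assumes the bit depth is a whole number of bytes.

```python
def decode_pcm_int(data, bit_depth, big_endian=False):
    """Decode integer PCM samples into a list of Python ints.

    8-bit samples are unsigned; every other bit depth is signed.
    """
    if bit_depth <= 0 or bit_depth % 8 != 0:
        # Assumption of this sketch: only whole-byte sample sizes.
        raise ValueError("bit depth must be a positive multiple of 8")
    width = bit_depth // 8
    if len(data) % width != 0:
        raise ValueError("data is not a whole number of samples")
    byteorder = "big" if big_endian else "little"
    signed = bit_depth != 8
    return [
        int.from_bytes(data[i:i + width], byteorder, signed=signed)
        for i in range(0, len(data), width)
    ]
```

Channel de-interleaving is left out; the samples are returned in stored order.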
Codec ID: A_QUICKTIME¶
Codec Name: Audio taken from QuickTime files¶
Description: Several codecs as stored in QuickTime (e.g., QDesign Music v1 or v2).¶
Initialization: The CodecPrivate
contains all additional data that is stored in the 'stsd' (sample description) atom
in the QuickTime file after the mandatory sound descriptor structure (starting with the size and FourCC fields).
For an explanation of the QuickTime file format read [QTFF].¶
Codec ID: A_QUICKTIME/QDMC¶
Codec Name: QDesign Music¶
Description:¶
Initialization: The CodecPrivate
contains all additional data that is stored in the 'stsd' (sample description) atom
in the QuickTime file after the mandatory sound descriptor structure (starting with the size and FourCC fields).
For an explanation of the QuickTime file format read [QTFF].¶
Superseded By: A_QUICKTIME
(Section 3.4.36)¶
Codec ID: A_QUICKTIME/QDM2¶
Codec Name: QDesign Music v2¶
Description:¶
Initialization: The CodecPrivate
contains all additional data that is stored in the 'stsd' (sample description) atom
in the QuickTime file after the mandatory sound descriptor structure (starting with the size and FourCC fields).
For an explanation of the QuickTime file format read [QTFF].¶
Superseded By: A_QUICKTIME
(Section 3.4.36)¶
Codec ID: A_TRUEHD¶
Codec Name: Dolby TrueHD¶
Description: Lossless audio codec from Dolby. Each Matroska frame corresponds to a single Access Unit as defined in [TRUEHD].¶
Codec ID: A_TTA1¶
Codec Name: The True Audio lossless audio compressor¶
Description: The format is described in [TTA].
Each frame is kept intact, including the CRC32. The header and seektable are dropped. SamplingFrequency
, Channels
and BitDepth
are used in the TrackEntry
.¶
Initialization: The CodecPrivate
contains the TTA Header Structure, as defined in [TTA].¶
Codec ID: A_VORBIS¶
Codec Name: Vorbis¶
Initialization: The CodecPrivate
contains the first three Vorbis packets in order. The lengths of the packets precede them. The actual layout is:¶
Byte 1: number of distinct packets #p
minus one inside the CodecPrivate
block.
This MUST be "2" for current (as of 2016-07-08) Vorbis headers.¶
Bytes 2..n: lengths of the first #p
packets, coded in Xiph-style lacing.
The length of the last packet is the length of the CodecPrivate
block minus the lengths coded in these bytes minus one.¶
Bytes n+1..: The "Identification Header" as defined in Section 4.2.2 of [VORBIS], followed by the "Comment Header" as defined in Section 5 of [VORBIS], followed by the "Setup Header" as defined in Section 4.2.4 of [VORBIS].¶
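The layout above can be read back with a small parser. The following is a minimal sketch rather than a reference implementation; the function name is hypothetical, and it assumes the usual Xiph lacing rule in which each length is a run of 255-valued bytes terminated by a byte below 255.¶

```python
def split_vorbis_codec_private(codec_private: bytes) -> list:
    """Split an A_VORBIS CodecPrivate into its three header packets."""
    if not codec_private:
        raise ValueError("empty CodecPrivate")
    if codec_private[0] != 2:
        # Byte 1 stores the packet count minus one; Vorbis uses three headers.
        raise ValueError("expected three Vorbis header packets")
    packet_count = codec_private[0] + 1
    pos = 1
    sizes = []
    for _ in range(packet_count - 1):
        size = 0
        while True:
            if pos >= len(codec_private):
                raise ValueError("truncated lacing data")
            value = codec_private[pos]
            pos += 1
            size += value
            if value != 255:  # a byte below 255 ends one laced length
                break
        sizes.append(size)
    remaining = len(codec_private) - pos - sum(sizes)
    if remaining < 0:
        raise ValueError("laced sizes exceed the CodecPrivate length")
    sizes.append(remaining)  # the last packet takes whatever is left
    packets = []
    for size in sizes:
        packets.append(codec_private[pos:pos + size])
        pos += size
    return packets
```

The returned list holds the Identification, Comment, and Setup headers in that order.¶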
Codec ID: A_WAVPACK4¶
Codec Name: WavPack lossless audio compressor¶
Description: The WavPack packets consist of a block defined in [WAVPACK] with a WavpackHeader
header.
For multichannel audio (more than 2 channels), a frame consists of multiple packets. For more details, see the WavPack muxing description in Section 4.1.¶
Codec BlockAdditions: For hybrid A_WAVPACK4
encodings (that include a lossy encoding with a supplemental correction
to produce a lossless encoding), the correction part is stored in BlockAdditional
.
The BlockAddID
of the BlockMore
containing these data MUST be 1.¶
Initialization: The CodecPrivate
contains the version
16-bit integer from the WavpackHeader
of [WAVPACK] stored in little-endian.¶
All codecs described in this section MUST have a TrackType
(Section 5.1.4.1.3 of [RFC9559]) value of "17" for subtitles.¶
Subtitle codecs often contain meta information about the data they contain, like expected output dimension, language, etc.
Whenever possible, this information inside the codec SHOULD be extracted and repeated at the Matroska level with
the appropriate element(s) inside the \Segment\Tracks\TrackEntry\Video
and \Segment\Tracks\TrackEntry
elements.
These values MUST be valid for the whole Segment.¶
Codec ID: S_ARIBSUB¶
Codec Name: ARIB STD-B24 subtitles¶
Description: This is the textual subtitle format used in the ISDB/ARIB broadcasting standard. For more information see Section 5.8 on ARIB (ISDB) subtitles.¶
Codec ID: S_DVBSUB¶
Codec Name: Digital Video Broadcasting (DVB) subtitles¶
Description: This is the graphical subtitle format used in the Digital Video Broadcasting standard. For more information see Section 5.7 on Digital Video Broadcasting (DVB).¶
Codec ID: S_HDMV/PGS¶
Codec Name: HDMV presentation graphics subtitles (PGS)¶
Description: This is the graphical subtitle format used on Blu-rays. For more information, see Section 5.5 on HDMV presentation graphics subtitles.¶
Codec ID: S_HDMV/TEXTST¶
Codec Name: HDMV text subtitles¶
Description: This is the textual subtitle format used on Blu-rays. For more information, see Section 5.6 on HDMV text subtitles.¶
Codec ID: S_KATE¶
Codec Name: Karaoke And Text Encapsulation¶
Description: A subtitle format developed for Ogg. The mapping for Matroska is described in the "Matroska mapping" section of [OggKate]. Kate headers are stored in the CodecPrivate as Xiph-laced packets. The length of the last packet isn't encoded; it is deduced from the sizes of the other packets and the total size of the CodecPrivate.¶
Codec ID: S_IMAGE/BMP¶
Codec Name: Bitmap¶
Description: Basic image-based subtitle format. The subtitles are stored as images, like in the DVD [DVD-Video].
The timestamp in the block header of Matroska indicates the start display time,
the duration is set with the BlockDuration
element. The full data for the subtitle bitmap
is stored in the Block's data section.¶
Codec ID: S_TEXT/ASS¶
Codec Name: Advanced SubStation Alpha Format¶
Description: Each event is stored in its own Block
.
For more information see Section 5.3 on SSA/ASS.¶
This codec ID MUST be used when "ScriptType: v4.00+" or "[V4+ Styles]" sections are found in the original SSA script.¶
The codec MAY also be found with the Codec ID S_ASS
, but using that value is NOT RECOMMENDED.¶
Initialization: The "[Script Info]" and "[V4+ Styles]" sections are stored in the CodecPrivate.¶
Codec ID: S_TEXT/ASCII¶
Codec Name: ASCII Plain Text¶
Description: Basic text subtitles with only ASCII characters allowed.¶
Codec ID: S_TEXT/SSA¶
Codec Name: SubStation Alpha Format¶
Description: Each event is stored in its own Block
.
For more information see Section 5.3 on SSA/ASS.¶
This codec ID MUST NOT be used when "ScriptType: v4.00+" or "[V4+ Styles]" sections are found in the original SSA script.¶
The codec MAY also be found with the Codec ID S_SSA
, but using that value is NOT RECOMMENDED.¶
Initialization: The "[Script Info]" and "[V4 Styles]" sections are stored in the CodecPrivate.¶
Codec ID: S_TEXT/USF¶
Codec Name: Universal Subtitle Format¶
Description: An XML-based subtitle format.
Each BlockGroup
contains XML data from a "subtitle" XML element as defined in section 3.4 of [USF],
without the "subtitle" element itself and with the start, stop duration mapped to the BlockGroup
timestamp and BlockDuration
element.
The "image" XML elements are turned into Matroska attachments and replaced in the stream with their attachment filename.¶
Initialization: The CodecPrivate
element MAY be present.
If present, it MAY contain "metadata", "styles", and "effects" XML elements usable in the whole stream inside a parent "USFSubtitles" XML element,
similar to the "USFSubtitles" element of a standalone USF file but without the "subtitles" XML element.¶
Codec ID: S_TEXT/UTF8¶
Codec Name: UTF-8 Plain Text¶
Description: Basic text subtitles. For more information see Section 5 on Subtitles.¶
Codec ID: S_TEXT/WEBVTT¶
Codec Name: Web Video Text Tracks Format (WebVTT)¶
Description: Advanced text subtitles. For more information see Section 5.4 on WebVTT.¶
Codec ID: S_VOBSUB¶
Codec Name: VobSub subtitles¶
Description: The same subtitle format used on DVDs [DVD-Video]. Only format version 7 and newer are supported.
VobSubs consist of two files, the .idx containing information, and the .sub, containing the actual data.
The .idx file is stripped of all empty lines, of all comments and of lines beginning with alt:
or langidx:
.
The line beginning with id:
SHOULD be transformed into the appropriate Matroska track language element
and is discarded. All remaining lines but the ones containing timestamps and file positions
are put into the CodecPrivate
element.¶
For each line containing the timestamp and file position, data is read from the appropriate position in the .sub file. This data consists of an MPEG program stream, which in turn contains SPU packets. The MPEG program stream data is discarded, and each SPU packet is put into one Matroska frame.¶
Registered BlockAddIDType
are:¶
Block type identifier: 0¶
Block type name: Use BlockAddIDValue¶
Description: This value indicates that the actual type is stored in BlockAddIDValue
instead.
This value is expected to be used when it is important to have a strong compatibility
with players or derived formats not supporting BlockAdditionMapping
but using BlockAdditions
with an unknown BlockAddIDValue
, and SHOULD NOT be used if it is possible to use another value.¶
Block type identifier: 1¶
Block type name: Opaque data¶
Description: the BlockAdditional
data is interpreted as opaque additional data passed to the codec
with the Block data.
The usage of these BlockAdditional
data is defined in the "Codec BlockAdditions" section of the codec; see Section 3.1.5.¶
Block type identifier: 4¶
Block type name: ITU T.35 metadata¶
Description: the BlockAdditional
data is interpreted as ITU T.35 metadata, as defined by [ITU-T.35]
terminal codes. BlockAddIDValue
MUST be 4.¶
Block type identifier: 0x61766345¶
Block type name: Dolby Vision enhancement-layer AVC configuration¶
Description: the BlockAddIDExtraData
data is interpreted as the Dolby Vision enhancement-layer AVC
configuration box as described in [DolbyVision-ISOBMFF]. This extension MUST NOT
be used if CodecID
is not V_MPEG4/ISO/AVC
.¶
Block type identifier: 0x68766345¶
Block type name: Dolby Vision enhancement-layer HEVC configuration¶
Description: the BlockAddIDExtraData
data is interpreted as the Dolby Vision enhancement-layer HEVC configuration as described in [DolbyVision-ISOBMFF].
This extension MUST NOT be used if CodecID
is not V_MPEGH/ISO/HEVC
.¶
Block type identifier: 0x64766343¶
Block type name: Dolby Vision configuration dvcC¶
Description: the BlockAddIDExtraData
data is interpreted as DOVIDecoderConfigurationRecord
structure, as defined in [DolbyVision-ISOBMFF],
for Dolby Vision profiles 0 to 7 included.¶
Block type identifier: 0x64767643¶
Block type name: Dolby Vision configuration dvvC¶
Description: the BlockAddIDExtraData
data is interpreted as DOVIDecoderConfigurationRecord
structure, as defined in [DolbyVision-ISOBMFF],
for Dolby Vision profiles 8 to 10 included and 20.¶
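The four large identifiers above read as ASCII four-character codes when their bytes are taken most significant first; for example, 0x64766343 spells "dvcC", matching the block type name. This is an observation about the listed values, not a rule stated in the registry, and the helper below is a hypothetical sketch for display purposes only.¶

```python
def block_add_id_type_label(value: int) -> str:
    """Return a printable label for a BlockAddIDType value."""
    if 0xFFFFFF < value <= 0xFFFFFFFF:
        raw = value.to_bytes(4, "big")
        # Only treat the value as a four-character code if it is printable ASCII.
        if all(0x20 <= byte < 0x7F for byte in raw):
            return raw.decode("ascii")
    # Small registry values such as 0, 1 and 4 are shown as plain numbers.
    return str(value)
```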
WavPack is an audio codec primarily designed for lossless audio, but it can also be used as a lossy codec.¶
WavPack [WAVPACK] stores data in variable-length frames, meaning each frame can have a different number of samples.¶
Each WavPack block starts with a WavpackHeader
header as defined in [WAVPACK], stored in little-endian.¶
To save space and avoid redundant information, some data from the WavpackHeader header are removed when saved in Matroska. All the data from the WavpackHeader are kept in little-endian.¶
The CodecPrivate
contains the version
16-bit integer from the WavpackHeader
of [WAVPACK] stored in little-endian.¶
Depending on the number of audio channels and whether the hybrid mode is kept or not, the storage of WavPack blocks in Matroska differs.¶
For multichannel files (more than 2 channels, like for 5.1), a frame consists of multiple WavPack blocks. The first one has the INITIAL_BLOCK (bit 11) flag set and the last one has the FINAL_BLOCK (bit 12) flag set. For a mono or stereo file, both flags are set in each WavPack block.¶
A Block or SimpleBlock frame contains the following header, with some fields taken from the WavpackHeader of a single WavPack block, followed by the data of that WavPack block.¶
{
  uint32_t block_samples; // # samples in this block
  uint32_t flags;         // various flags for id and decoding
  uint32_t crc;           // crc for actual decoded data
}
[ block data ]¶
For multichannel files, a WavPack file uses multiple WavPack blocks to store all channels of a frame. The WavPack blocks for each channel of a frame are stored consecutively in a Matroska Block or SimpleBlock.¶
Each WavPack block is preceded by a header. The header for the first WavPack block is similar to the mono/stereo one (Section 4.1.1.1) with the addition of a "blocksize" field, which is the size of the first WavPack block minus the WavpackHeader size. The headers for the following WavPack blocks use the "flags" and "crc" of the WavpackHeader of each respective WavPack block, followed by the size of each respective WavPack block minus the WavpackHeader size.¶
{
  uint32_t block_samples; // # samples in this block
  uint32_t flags;         // various flags for id and decoding
  uint32_t crc;           // crc for actual decoded data
  uint32_t blocksize;     // size of the data to follow
}
[ block data # 1 ]
{
  uint32_t flags;         // various flags for id and decoding
  uint32_t crc;           // crc for actual decoded data
  uint32_t blocksize;     // size of the data to follow
}
[ block data # 2 ]
{
  uint32_t flags;         // various flags for id and decoding
  uint32_t crc;           // crc for actual decoded data
  uint32_t blocksize;     // size of the data to follow
}
[ block data # 3 ]
...¶
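A demuxer could walk the multichannel layout above as follows. This is a sketch under stated assumptions: the function name is hypothetical, and the "blocksize" field is assumed to be little-endian like the fields taken from the WavpackHeader.¶

```python
import struct

INITIAL_BLOCK = 1 << 11  # flag set on the first WavPack block of a frame
FINAL_BLOCK = 1 << 12    # flag set on the last WavPack block of a frame


def parse_wavpack_multichannel_frame(frame: bytes) -> tuple:
    """Return (block_samples, [(flags, crc, data), ...]) for one Matroska frame."""
    if len(frame) < 16:
        raise ValueError("frame too short for the first header")
    (block_samples,) = struct.unpack_from("<I", frame, 0)
    pos = 4
    blocks = []
    # After block_samples, every block uses the same flags/crc/blocksize header.
    while pos < len(frame):
        if pos + 12 > len(frame):
            raise ValueError("truncated block header")
        flags, crc, blocksize = struct.unpack_from("<III", frame, pos)
        pos += 12
        if pos + blocksize > len(frame):
            raise ValueError("truncated block data")
        blocks.append((flags, crc, frame[pos:pos + blocksize]))
        pos += blocksize
    if not blocks[0][0] & INITIAL_BLOCK or not blocks[-1][0] & FINAL_BLOCK:
        raise ValueError("frame does not start and end with the expected flags")
    return block_samples, blocks
```

Rebuilding a full WavpackHeader for each block before handing it to a decoder is not shown here, since that depends on the header definition in [WAVPACK].¶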
WavPack has a hybrid mode that splits the audio frames between lossy and correction packets. Adding both gives a lossless version of the original audio. It is possible to only store the lossy part in Matroska or both together. Storing only the lossy part is equivalent to the format described in Section 4.1.1. This section explains how to store all hybrid data in Matroska.¶
Hybrid WavPack is encoded in 2 files. The first one has a lossy part and the second file has the correction part to reconstruct the original audio losslessly.¶
Each WavPack block in the correction file corresponds to a WavPack block in the lossy file with the same number of samples. This is also true for a multichannel file: if a frame is made of 4 WavPack blocks, the correction file will have 4 WavPack blocks in the corresponding frame. The header of the correction WavPack block is exactly the same as in the lossy WavPack block, except for the CRC.¶
In Matroska, the correction part is stored as additional data available to the Block
(see Section 6).
This way a file could be remuxed without keeping the BlockAdditional data and still be usable as a lossy WavPack file.
The Block
data of the lossy file are stored exactly the same as for lossy storage defined in Section 4.1.1.¶
A BlockAdditionMapping MUST be used for a hybrid WavPack TrackEntry.¶
The BlockAddIDType
of that BlockAdditionMapping
MUST be set to 1 for hybrid WavPack, corresponding to Opaque data; see Section 3.7.2.¶
Each WavPack frame is stored in a BlockGroup
that MUST have at least a BlockMore
to hold the correction data.¶
The BlockAddID
of that BlockMore
MUST be 1, i.e., the default value.¶
For mono or stereo files, the BlockAdditional element of the correction data BlockMore contains the following header, with the "crc" field from the WavpackHeader of the WavPack block of the correction file matching the WavPack block of the lossy frame used to fill the Block data, followed by the data of that correction file WavPack block.¶
{
  uint32_t crc;           // crc for actual decoded data
}
[ correction block data ]¶
For multichannel files, the BlockAdditional element of the correction data BlockMore contains, for each WavPack block of the correction file, a header with the "crc" field from its WavpackHeader and a "blocksize" field, followed by the data of that correction file WavPack block. Each correction block matches the corresponding WavPack block in the lossy file used to fill the Block data.¶
{
  uint32_t crc;           // crc for actual decoded data
  uint32_t blocksize;     // size of the data to follow
}
[ correction block data # 1 ]
{
  uint32_t crc;           // crc for actual decoded data
  uint32_t blocksize;     // size of the data to follow
}
[ correction block data # 2 ]
{
  uint32_t crc;           // crc for actual decoded data
  uint32_t blocksize;     // size of the data to follow
}
[ correction block data # 3 ]
...¶
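Both correction layouts, the one with only a "crc" field and the one with "crc" plus "blocksize", can be read with a few lines of code. The sketch below uses hypothetical function names and assumes that "blocksize" is stored in little-endian like the fields taken from the WavpackHeader.¶

```python
import struct


def parse_correction_single(additional: bytes) -> tuple:
    """Return (crc, data) from a BlockAdditional holding one correction block."""
    if len(additional) < 4:
        raise ValueError("BlockAdditional too short for the crc field")
    (crc,) = struct.unpack_from("<I", additional, 0)
    # Everything after the crc is the correction block data.
    return crc, additional[4:]


def parse_correction_multi(additional: bytes) -> list:
    """Return [(crc, data), ...] from a BlockAdditional holding several blocks."""
    pos = 0
    blocks = []
    while pos < len(additional):
        if pos + 8 > len(additional):
            raise ValueError("truncated correction header")
        crc, blocksize = struct.unpack_from("<II", additional, pos)
        pos += 8
        if pos + blocksize > len(additional):
            raise ValueError("truncated correction data")
        blocks.append((crc, additional[pos:pos + blocksize]))
        pos += blocksize
    return blocks
```

The number of entries returned by the second function is expected to equal the number of WavPack blocks in the lossy frame stored in the Block data.¶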
Here is a list of pointers for storing subtitles in Matroska:¶
As a general rule of thumb for all codecs, information that is global to an entire stream
SHOULD be stored in the CodecPrivate
element.¶
As subtitles usually come with start and stop timestamps or a start timestamp and a duration, SimpleBlock is usually not used as it doesn't allow storing the BlockDuration.¶
Start and stop timestamps that are used in a subtitle's original storage format SHOULD be removed when being placed in Matroska as they could interfere if the file is edited afterwards. Instead, the Block's timestamp and BlockDuration SHOULD be used to say when the subtitle is displayed.¶
Because a "subtitle" stream is actually just an overlay stream, anything with a transparency layer could be used, including video.¶
The first image format targeted for import into Matroska is the VobSub subtitle format. This subtitle type is generated by exporting the subtitles from a DVD [DVD-Video].¶
The requirement for muxing VobSub into Matroska is v7 subtitles (see first line of the .IDX file). If the version is smaller, you must remux them using the SubResync utility from VobSub 2.23 (or MPC) into v7 format. Generally any newly created subs will be in v7 format.¶
The .IFO file will not be used at all.¶
If there is more than one subtitle stream in the VobSub set, each stream will need to be separated into separate tracks for storage in Matroska. For example, if the VobSub file contains streams for both English and German subtitles, the resulting Matroska file SHOULD contain two tracks. That way the language information can be dropped and mapped to Matroska's language tags.¶
The .IDX file is reformatted (see below) and placed in the CodecPrivate
.¶
Each .BMP will be stored in its own Block. The Timestamp will be stored in the Block
timestamp
and the duration will be stored in the Default Duration.¶
Here is an example .IDX file:¶
# VobSub index file, v7 (do not modify this line!)
#
# To repair desynchronization, you can insert gaps this way:
# (it usually happens after vob id changes)
#
# delay: [sign]hh:mm:ss:ms
#
# Where:
# [sign]: +, - (optional)
# hh: hours (0 <= hh)
# mm/ss: minutes/seconds (0 <= mm/ss <= 59)
# ms: milliseconds (0 <= ms <= 999)
#
# Note: You can't position a sub before the previous with a negative
# value.
#
# You can also modify timestamps or delete a few subs you don't
# like. Just make sure they stay in increasing order.

# Settings

# Original frame size
size: 720x480

# Origin, relative to the upper-left corner, can be overloaded by
# alignment
org: 0, 0

# Image scaling (hor,ver), origin is at the upper-left corner or at
# the alignment coord (x, y)
scale: 100%, 100%

# Alpha blending
alpha: 100%

# Smoothing for very blocky images (use OLD for no filtering)
smooth: OFF

# In millisecs
fadein/out: 50, 50

# Force subtitle placement relative to (org.x, org.y)
align: OFF at LEFT TOP

# For correcting non-progressive desync. (in millisecs or
# hh:mm:ss:ms)
# Note: Not effective in DirectVobSub, use "delay: ... " instead.
time offset: 0

# ON: displays only forced subtitles, OFF: shows everything
forced subs: OFF

# The original palette of the DVD
palette: 000000, 7e7e7e, fbff8b, cb86f1, 7f74b8, e23f06, 0a48ea, \
b3d65a, 6b92f1, 87f087, c02081, f8d0f4, e3c411, 382201, e8840b, \
fdfdfd

# Custom colors (transp idxs and the four colors)
custom colors: OFF, tridx: 0000, colors: 000000, 000000, 000000, \
000000

# Language index in use
langidx: 0

# English
id: en, index: 0
# Uncomment next line to activate alternative name in DirectVobSub /
# Windows Media Player 6.x
# alt: English
# Vob/Cell ID: 1, 1 (PTS: 0)
timestamp: 00:00:01:101, filepos: 000000000
timestamp: 00:00:08:708, filepos: 000001000¶
First, lines beginning with "#" are removed. These are comments to make text file editing easier, and as this is not a text file, they aren't needed.¶
Next remove the "langidx" and "id" lines. These are used to differentiate the subtitle streams and define the language. As the streams will be stored separately anyway, there is no need to differentiate them here. Also, the language setting will be stored in the Matroska tags, so there is no need to store it here.¶
Finally, the "timestamp" will be used to set the Block
's timestamp. Once it is set there,
there is no need for it to be stored here. Also, as it may interfere if the file is edited,
it SHOULD NOT be stored here.¶
Once all of these items are removed, the data to store in the CodecPrivate
SHOULD look like this:¶
size: 720x480
org: 0, 0
scale: 100%, 100%
alpha: 100%
smooth: OFF
fadein/out: 50, 50
align: OFF at LEFT TOP
time offset: 0
forced subs: OFF
palette: 000000, 7e7e7e, fbff8b, cb86f1, 7f74b8, e23f06, 0a48ea, \
b3d65a, 6b92f1, 87f087, c02081, f8d0f4, e3c411, 382201, e8840b, \
fdfdfd
custom colors: OFF, tridx: 0000, colors: 000000, 000000, 000000, \
000000¶
There SHOULD also be two Blocks containing one image each with the timestamps "00:00:01:101" and "00:00:08:708".¶
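The .IDX reformatting described above can be sketched as a single pass over the file. The function name and return shape are hypothetical, file positions are kept as text because their numeric base is not stated here, and the sketch assumes one setting per physical line (the backslash folding in the example is treated as a presentation artifact).¶

```python
def split_vobsub_idx(idx_text: str) -> tuple:
    """Return (codec_private_text, language, [(timestamp, filepos), ...])."""
    kept = []
    language = None
    entries = []
    for raw_line in idx_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # empty lines and comments are dropped
        if line.startswith(("alt:", "langidx:")):
            continue  # not needed once each stream has its own track
        if line.startswith("id:"):
            # "id: en, index: 0" carries the language of the stream.
            language = line[len("id:"):].split(",")[0].strip()
            continue
        if line.startswith("timestamp:"):
            fields = line[len("timestamp:"):].split(",", 1)
            filepos = fields[1].split(":", 1)[1].strip()
            entries.append((fields[0].strip(), filepos))
            continue
        kept.append(line)  # everything else goes to the CodecPrivate
    return "\n".join(kept), language, entries
```

Each returned entry then drives one read from the .sub file and produces one Block whose timestamp comes from the first element of the pair.¶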
SRT is perhaps the most basic of all subtitle formats.¶
It consists of four parts, all in text:¶
A number indicating which subtitle it is in the sequence.¶
The time that the subtitle appears on the screen, and then disappears.¶
The subtitle itself.¶
A blank line indicating the start of a new subtitle.¶
When placing SRT in Matroska, part 3 is converted to UTF-8 (S_TEXT/UTF8) and placed
in the data portion of the Block. Part 2 is used to set the timestamp of the Block,
and BlockDuration
element. Nothing else is used.¶
Here is an example SRT file:¶
1
00:02:17,440 --> 00:02:20,375
Senator, we're making our final approach into Coruscant.

2
00:02:20,476 --> 00:02:22,501
Very good, Lieutenant.¶
In this example, the text "Senator, we're making our final approach into Coruscant."
would be converted into UTF-8 and placed in the Block. The timestamp of the block would
be set to "00:02:17,440". And the BlockDuration
element would be set to "00:00:02,935".¶
The same is repeated for the next subtitle.¶
Because there are no general settings for SRT, the CodecPrivate
is left blank.¶
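The SRT mapping above amounts to a little timestamp arithmetic. The sketch below uses hypothetical function names, works in milliseconds (scaling to the Segment's timestamp units is left out), and assumes entries use "\n" line endings.¶

```python
import re

_SRT_TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def srt_time_to_ms(text: str) -> int:
    """Convert an SRT "hh:mm:ss,mmm" time to milliseconds."""
    match = _SRT_TIME.fullmatch(text.strip())
    if match is None:
        raise ValueError("not an SRT timestamp: " + repr(text))
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def srt_entry_to_block(entry: str) -> tuple:
    """Return (timestamp_ms, duration_ms, payload) for one SRT entry."""
    lines = entry.strip("\n").split("\n")
    # Part 1 (the sequence number) is ignored; part 2 gives the timing.
    start_text, _, end_text = lines[1].partition("-->")
    start = srt_time_to_ms(start_text)
    end = srt_time_to_ms(end_text)
    # Part 3 becomes the UTF-8 Block data.
    payload = "\n".join(lines[2:]).encode("utf-8")
    return start, end - start, payload
```

For the first entry of the example, this yields a duration of 2935 ms, matching the BlockDuration of "00:00:02,935" given above.¶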
SSA stands for Sub Station Alpha. It's the file format used by the popular subtitle editor SubStation Alpha. It allows advanced display features such as positioning, karaoke, and style management.¶
For detailed information on SSA/ASS, see the SSA specs [SSA]. It includes an SSA specs description and the advanced features added by ASS format (standing for Advanced SSA). Because SSA and ASS are so similar, they are treated the same here.¶
Like SRT, this format is text based with a particular syntax.¶
A file consists of 4 or 5 parts, declared in the manner of an INI file (but it is not an INI file).¶
The first, "[Script Info]", contains some information about the subtitle file, such as its title, who created it, the type of script, and a very important one: "PlayResY". Be careful with this value, as everything in your script (font size, positioning) is scaled by it. Sub Station Alpha uses your desktop's Y resolution to write this value, so if a friend with a large monitor and a high screen resolution gives you an edited script, you can mess everything up by saving the script in SSA with your low-cost monitor.¶
The second, "[V4 Styles]" or "[V4+ Styles]", is a list of style definitions. A style describes how a text will look on the screen. It defines font, font size, primary/.../outline colour, position, alignment, etc.¶
For example, this:¶
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, \
TertiaryColour, BackColour, Bold, Italic, BorderStyle, Outline, \
Shadow, Alignment, MarginL, MarginR, MarginV, AlphaLevel, Encoding
Style: Wolf main,Wolf_Rain,56,15724527,15724527,15724527,4144959,0,\
0,1,1,2,2,5,5,30,0,0¶
The third, "[Events]", is the list of text you want to display at the right timing. You can specify some attributes here, such as the style to use for this event (MUST be defined in the list), the position of the text (Left, Right, Vertical Margin), and an effect. Name is mostly used by translators to know who said this sentence. Timing is in h:mm:ss.cc (centiseconds).¶
Format: Marked, Start, End, Style, Name, MarginL, MarginR, MarginV, \
Effect, Text
Dialogue: Marked=0,0:02:40.65,0:02:41.79,Wolf main,Cher,0000,0000,\
0000,,Et les enregistrements de ses ondes delta ?
Dialogue: Marked=0,0:02:42.42,0:02:44.15,Wolf main,autre,0000,0000,\
0000,,Toujours rien.¶
"[Pictures]" or "[Fonts]" parts can be found in some SSA files. They contain UUE-encoded pictures or fonts, but those features are only used by Sub Station Alpha, i.e., no filter (VobSub or Avery Lee's Subtitler filter) uses them.¶
Now, how are they stored in Matroska?¶
All text is converted to UTF-8¶
All the headers, "[Script Info]" and the "[V4 Styles]"/"[V4+ Styles]" list, are stored in CodecPrivate
.¶
The Start and End fields are used to set the Block's timestamp and the BlockDuration element. The data stored is:¶
Events are stored in the Block in this order: ReadOrder, Layer, Style, Name, MarginL, MarginR, MarginV, Effect, Text (Layer comes from ASS specs ... it's empty for SSA.) "ReadOrder field is needed for the decoder to be able to reorder the streamed samples as they were placed originally in the file."¶
Here is an example of an SSA file.¶
[Script Info]
; This is a Sub Station Alpha v4 script.
Title: Wolf's rain 2
Original Script: Anime-spirit Ishin-francais
Original Translation: Coolman
Original Editing: Spikewolfwood
Original Timing: Lord_alucard
Original Script Checking: Spikewolfwood
ScriptType: v4.00
Collisions: Normal
PlayResY: 1024
PlayDepth: 0
Wav: 0, 128697,D:\Alex\Anime\- Fansub -\- TAFF -\WR_-_02_Wav.wav
Wav: 0, 120692,H:\team truc\WR_-_02.wav
Wav: 0, 116504,E:\sub\wolf's_rain\WOLF'S RAIN 02.wav
LastWav: 3
Timer: 100,0000

[V4 Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, \
TertiaryColour, BackColour, Bold, Italic, BorderStyle, Outline, \
Shadow, Alignment, MarginL, MarginR, MarginV, AlphaLevel, Encoding
Style: Default,Arial,20,65535,65535,65535,-2147483640,-1,0,1,3,0,2,\
30,30,30,0,0
Style: Titre_episode,Akbar,140,15724527,65535,65535,986895,-1,0,1,1,\
0,3,30,30,30,0,0
Style: Wolf main,Wolf_Rain,56,15724527,15724527,15724527,4144959,0,\
0,1,1,2,2,5,5,30,0,0

[Events]
Format: Marked, Start, End, Style, Name, MarginL, MarginR, MarginV, \
Effect, Text
Dialogue: Marked=0,0:02:40.65,0:02:41.79,Wolf main,Cher,0000,0000,\
0000,,Et les enregistrements de ses ondes delta ?
Dialogue: Marked=0,0:02:42.42,0:02:44.15,Wolf main,autre,0000,0000,\
0000,,Toujours rien.¶
Here is what would be placed into the CodecPrivate
element.¶
[Script Info]
; This is a Sub Station Alpha v4 script.
Title: Wolf's rain 2
Original Script: Anime-spirit Ishin-francais
Original Translation: Coolman
Original Editing: Spikewolfwood
Original Timing: Lord_alucard
Original Script Checking: Spikewolfwood
ScriptType: v4.00
Collisions: Normal
PlayResY: 1024
PlayDepth: 0
Wav: 0, 128697,D:\Alex\Anime\- Fansub -\- TAFF -\WR_-_02_Wav.wav
Wav: 0, 120692,H:\team truc\WR_-_02.wav
Wav: 0, 116504,E:\sub\wolf's_rain\WOLF'S RAIN 02.wav
LastWav: 3
Timer: 100,0000

[V4 Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, \
TertiaryColour, BackColour, Bold, Italic, BorderStyle, Outline, \
Shadow, Alignment, MarginL, MarginR, MarginV, AlphaLevel, Encoding
Style: Default,Arial,20,65535,65535,65535,-2147483640,-1,0,1,3,0,2,\
30,30,30,0,0
Style: Titre_episode,Akbar,140,15724527,65535,65535,986895,-1,0,1,1,\
0,3,30,30,30,0,0
Style: Wolf main,Wolf_Rain,56,15724527,15724527,15724527,4144959,0,\
0,1,1,2,2,5,5,30,0,0¶
And here are the two blocks that would be generated.¶
Block
's timestamp: 00:02:40.650
BlockDuration
: 00:00:01.140¶
1,,Wolf main,Cher,0000,0000,0000,,Et les enregistrements de ses \
ondes delta ?¶
Block
's timestamp: 00:02:42.420
BlockDuration
: 00:00:01.730¶
2,,Wolf main,autre,0000,0000,0000,,Toujours rien.¶
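The conversion from a "Dialogue:" line to a Block can be sketched as below. The function names are hypothetical, the field order is assumed to be the SSA v4 "Format:" line shown in the example rather than parsed from the script, and times are assumed to carry exactly two centisecond digits.¶

```python
def ssa_time_to_ms(text: str) -> int:
    """Convert an SSA "h:mm:ss.cc" time to milliseconds."""
    hours, minutes, seconds = text.strip().split(":")
    whole, _, centis = seconds.partition(".")
    total_cs = ((int(hours) * 60 + int(minutes)) * 60 + int(whole)) * 100
    total_cs += int(centis or 0)
    return total_cs * 10


def ssa_dialogue_to_block(line: str, read_order: int) -> tuple:
    """Return (timestamp_ms, duration_ms, payload) for an SSA v4 Dialogue line."""
    prefix = "Dialogue:"
    if not line.startswith(prefix):
        raise ValueError("not a Dialogue line")
    # Text is the last field and may contain commas, so split at most 9 times.
    fields = line[len(prefix):].strip().split(",", 9)
    if len(fields) != 10:
        raise ValueError("unexpected number of fields")
    (_marked, start, end, style, name,
     margin_l, margin_r, margin_v, effect, text) = fields
    start_ms = ssa_time_to_ms(start)
    end_ms = ssa_time_to_ms(end)
    layer = ""  # empty for SSA; an ASS script would supply its Layer field
    payload = ",".join([str(read_order), layer, style, name,
                        margin_l, margin_r, margin_v, effect, text])
    return start_ms, end_ms - start_ms, payload.encode("utf-8")
```

Applied to the first event of the example with a ReadOrder of 1, the sketch reproduces the timestamp, the duration of 1140 ms, and the Block data shown above.¶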
The "Web Video Text Tracks Format" (short: WebVTT) is developed by the World Wide Web Consortium (W3C). Its specifications are freely available at [WebVTT].¶
The guiding principles for the storage of WebVTT in Matroska are:¶
Consistency: store data in a similar way to other subtitle codecs¶
Simplicity: making decoding and remuxing as easy as possible for existing infrastructures¶
Completeness: keeping as much data as possible from the original WebVTT file¶
The CodecID
to use is S_TEXT/WEBVTT
.¶
The CodecPrivate contains all global blocks before the first subtitle entry. It starts at the "WEBVTT" file identification marker but excludes the optional byte order mark.¶
Non-global WebVTT blocks (e.g., "NOTE") before a WebVTT Cue Text are stored in Matroska's BlockAddition element together with the Matroska Block containing the WebVTT Cue Text these blocks precede (see below for the actual format).¶
Each WebVTT Cue Text is stored directly in the Matroska Block.¶
A muxer MUST change all WebVTT Cue Timestamps present within the Cue Text to be relative
to the Matroska Block
's timestamp.¶
The Cue's start timestamp is used as the Matroska Block
's timestamp.¶
The difference between the Cue's end timestamp and its start timestamp is used as
the Matroska BlockDuration
.¶
Each Matroska Block may be accompanied by one BlockAdditions
element. Its format is as follows:¶
The first line contains the WebVTT Cue Text's optional Cue Settings List followed by one line feed character (U+000A). The Cue Settings List may be empty, in which case the line consists of the line feed character only.¶
The second line contains the WebVTT Cue Text's optional Cue Identifier followed by one line feed character (U+000A). The line may be empty, indicating that there was no Cue Identifier in the source file, in which case the line consists of the line feed character only.¶
The third and all following lines contain all WebVTT Comment Blocks that precede the current WebVTT Cue Block. These may be absent.¶
If there is no Matroska BlockAddition element stored together with the Matroska Block, then all three components (Cue Settings List, Cue Identifier, Cue Comments) MUST be assumed to be absent.¶
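The three-part layout above is simple to build and to read back. The following sketch uses hypothetical function names and represents a missing BlockAddition as None.¶

```python
def build_webvtt_block_additional(settings="", identifier="", comments=""):
    """Return the BlockAdditional payload, or None when nothing needs storing."""
    if not (settings or identifier or comments):
        return None
    # Line 1: Cue Settings List, line 2: Cue Identifier, rest: Comment Blocks.
    text = settings + "\n" + identifier + "\n" + comments
    return text.encode("utf-8")


def parse_webvtt_block_additional(payload):
    """Return (settings, identifier, comments); all empty when payload is None."""
    if payload is None:
        return "", "", ""
    settings, _, rest = payload.decode("utf-8").partition("\n")
    identifier, _, comments = rest.partition("\n")
    return settings, identifier, comments
```

A cue that only has an identifier therefore produces a payload that begins with a bare line feed, and a cue that only has settings produces a payload whose second line is empty.¶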
Here's an example of how a WebVTT file is transformed.¶
Let's take the following example file:¶
WEBVTT with text after the signature

STYLE
::cue {
  background-image: linear-gradient(to bottom, dimgray, lightgray);
  color: papayawhip;
}
/* Style blocks cannot use blank lines nor "dash dash greater \
than" */

NOTE comment blocks can be used between style blocks.

STYLE
::cue(b) {
  color: peachpuff;
}

REGION
id:bill
width:40%
lines:3
regionanchor:0%,100%
viewportanchor:10%,90%
scroll:up

NOTE
Notes always span a whole block and can cover multiple lines. Like this one.
An empty line ends the block.

hello
00:00:00.000 --> 00:00:10.000
Example entry 1: Hello <b>world</b>.

NOTE style blocks cannot appear after the first cue.

00:00:25.000 --> 00:00:35.000
Example entry 2: Another entry.
This one has multiple lines.

00:01:03.000 --> 00:01:06.500 position:90% align:right size:35%
Example entry 3: That stuff to the right of the timestamps are cue \
settings.

00:03:10.000 --> 00:03:20.000
Example entry 4: Entries can even include timestamps.
For example:<00:03:15.000>This becomes visible five seconds after the first part.¶
The resulting CodecPrivate
element will look like this:¶
WEBVTT with text after the signature

STYLE
::cue {
  background-image: linear-gradient(to bottom, dimgray, lightgray);
  color: papayawhip;
}
/* Style blocks cannot use blank lines nor "dash dash greater \
than" */

NOTE comment blocks can be used between style blocks.

STYLE
::cue(b) {
  color: peachpuff;
}

REGION
id:bill
width:40%
lines:3
regionanchor:0%,100%
viewportanchor:10%,90%
scroll:up

NOTE
Notes always span a whole block and can cover multiple lines. Like this one.
An empty line ends the block.¶
Example Cue 1: timestamp 00:00:00.000, duration 00:00:10.000, Block's content:¶
Example entry 1: Hello <b>world</b>.¶
BlockAddition's content starts with one empty line as there's no Cue Settings List:¶
hello¶
Example Cue 2: timestamp 00:00:25.000, duration 00:00:10.000, Block's content:¶
Example entry 2: Another entry. This one has multiple lines.¶
BlockAddition's content starts with two empty lines as there's neither a Cue Settings List nor a Cue Identifier:¶
NOTE style blocks cannot appear after the first cue.¶
Example Cue 3: timestamp 00:01:03.000, duration 00:00:03.500, Block's content:¶
Example entry 3: That stuff to the right of the timestamps are cue \
settings.¶
BlockAddition's content ends with an empty line as there's no Cue Identifier and there were no WebVTT Comment blocks:¶
position:90% align:right size:35%¶
Example Cue 4: timestamp 00:03:10.000, duration 00:00:10.000, Block's content:¶
Example entry 4: Entries can even include timestamps. For example:<00:00:05.000>This becomes visible five seconds after the first part.¶
This Block does not need a BlockAddition as the Cue did not contain an Identifier, nor a Settings List, and it wasn't preceded by Comment blocks.¶
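The rewriting of in-cue timestamps seen in Example Cue 4, where "<00:03:15.000>" in the source becomes a timestamp relative to the Block, can be sketched with a regular expression. The function names are hypothetical, times are handled in milliseconds, and the sketch assumes no in-cue timestamp precedes the Cue's start.¶

```python
import re

# A WebVTT cue timestamp tag: optional hours, then mm:ss.ttt, in angle brackets.
_CUE_TIMESTAMP = re.compile(r"<(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})>")


def _format_ms(total_ms: int) -> str:
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def make_cue_timestamps_relative(cue_text: str, block_timestamp_ms: int) -> str:
    """Rewrite absolute in-cue timestamps relative to the Block's timestamp."""
    def replace(match):
        hours, minutes, seconds, millis = match.groups()
        absolute = ((int(hours or 0) * 60 + int(minutes)) * 60
                    + int(seconds)) * 1000 + int(millis)
        return "<" + _format_ms(absolute - block_timestamp_ms) + ">"
    return _CUE_TIMESTAMP.sub(replace, cue_text)
```

Other tags such as "<b>" do not match the pattern and pass through unchanged. A demuxer converting back to a standalone WebVTT file would apply the inverse operation by adding the Block's timestamp.¶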
Note: the storage of WebVTT in Matroska is not the same as the design document for storage of WebVTT in WebM [WebM-WebVTT]. There are several reasons for this, including but not limited to: the WebM document is old (from February 2012), was based on an earlier draft of WebVTT, and ignores several parts that were added to WebVTT later; WebM still does not support subtitles at all [WebMContainer]; and the proposal suggests splitting the information across multiple tracks, making life very difficult for demuxers and remuxers.¶
WebM uses the "D_WEBVTT/SUBTITLES", "D_WEBVTT/CAPTIONS", "D_WEBVTT/DESCRIPTIONS", and "D_WEBVTT/METADATA" CodecID values with different tracks depending on the data type and without a CodecPrivate.¶
The specifications for the HDMV Presentation Graphics Subtitle format (short: HDMV PGS) can be found in section 9.14 "HDMV graphics streams" of the Blu-ray specifications [Blu-ray.Part3].¶
The CodecID to use is S_HDMV/PGS. A CodecPrivate element is not used.¶
Each HDMV PGS Segment (short: Segment) will be stored in a Matroska Block. A Segment is the data structure described in section 9.14.2.1 "Segment coding structure and parameters" of the Blu-ray specifications [Blu-ray.Part3].¶
Each Segment contains a presentation timestamp. This timestamp will be used as the timestamp for the Matroska Block.¶
A Segment is normally shown until a subsequent Segment is encountered. Therefore, the Matroska Block MAY have no Duration. In that case, a player MUST display a Segment within a Matroska Block until the next Segment is encountered.¶
A muxer MAY use a Duration, e.g., by calculating the distance between two subsequent Segments. If a Matroska Block has a Duration, a player MUST display that Segment only for the duration of the BlockDuration.¶
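The two display rules above can be sketched as a single decision. This is a minimal Python illustration, not a reference implementation; the function name is hypothetical, and all three arguments are assumed to be expressed in the same time unit.

```python
from typing import Optional


def pgs_display_end(block_timestamp: int,
                    block_duration: Optional[int],
                    next_block_timestamp: Optional[int]) -> Optional[int]:
    """Return when a player stops showing a Segment, or None if still open-ended."""
    if block_duration is not None:
        # A Duration is present: show the Segment only for that long.
        return block_timestamp + block_duration
    # No Duration: the Segment stays up until the next Segment is encountered.
    return next_block_timestamp
```

A muxer that chooses to write a Duration could use the same inputs in reverse, taking `next_block_timestamp - block_timestamp` as the value.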
The specifications for the HDMV Text Subtitle format (short: HDMV TextST) can be found in section 9.15 "HDMV text subtitle streams" of the Blu-ray specifications [Blu-ray.Part3].¶
The CodecID to use is S_HDMV/TEXTST.¶
A CodecPrivate element is required. It MUST contain the stream's Dialog Style Segment as described in section 9.15.4.2 "Dialog Style Segment" of the Blu-ray specifications [Blu-ray.Part3].¶
Each HDMV Dialog Presentation Segment (short: Segment) will be stored in a Matroska Block. A Segment is the data structure described in section 9.15.4.3 "Dialog presentation segment" of the Blu-ray specifications [Blu-ray.Part3].¶
Each Segment contains a start and an end presentation timestamp (short: start PTS & end PTS). The start PTS will be used as the timestamp for the Matroska Block. The Matroska Block MUST have a Duration, and that Duration is the difference between the end PTS and the start PTS.¶
A player MUST use the Matroska Block's timestamp and BlockDuration instead of the Segment's start and end PTS for determining when and how long to show the Segment.¶
When TextST subtitles are stored inside Matroska, the only allowed character set is UTF-8.¶
Each HDMV text subtitle stream in a Blu-ray can use one of a handful of character sets. This information is not stored in the MPEG2 Transport Stream itself but in the accompanying Clip Information file.¶
Therefore, a muxer MUST parse the accompanying Clip Information file. If the information indicates a character set other than UTF-8, it MUST re-encode all text Dialog Presentation Segments from the indicated character set to UTF-8 prior to storing them in Matroska.¶
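The re-encoding step can be sketched in Python as below. This is only an illustration: a real Dialog Presentation Segment is a binary structure in which only the text strings are transcoded, and locating those strings is out of scope here. The function name is hypothetical, and `source_charset` is assumed to be a Python codec name that the muxer has already derived from the Clip Information file.

```python
import codecs


def textst_text_to_utf8(text_bytes: bytes, source_charset: str) -> bytes:
    """Re-encode one text string of a Dialog Presentation Segment to UTF-8."""
    # codecs.lookup() normalizes aliases such as "UTF8" or "utf_8".
    if codecs.lookup(source_charset).name == "utf-8":
        return text_bytes  # already in the only allowed character set
    return text_bytes.decode(source_charset).encode("utf-8")
```

For example, `textst_text_to_utf8(raw, "shift_jis")` would convert a Shift-JIS string, raising `UnicodeDecodeError` if the bytes do not match the indicated character set.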
The specifications for the Digital Video Broadcasting subtitle bitstream format (short: DVB subtitles) can be found in the [ETSI.EN300-743] document. The storage of DVB subtitles in MPEG transport streams is specified in the [ETSI.EN300-468] document.¶
The CodecID to use is S_DVBSUB.¶
The CodecPrivate element is five bytes long and has the following structure:¶
2 bytes: composition page ID (bit string, left bit first)¶
2 bytes: ancillary page ID (bit string, left bit first)¶
1 byte: subtitling type (bit string, left bit first)¶
The semantics of these bytes are the same as the ones described in section 6.2.41 "Subtitling descriptor" of [ETSI.EN300-468].¶
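As an illustration of the five-byte layout above, the following Python sketch splits a CodecPrivate into its three fields. It assumes that "left bit first" means each multi-byte field is read as a big-endian unsigned integer; the class and function names are hypothetical.

```python
import struct
from typing import NamedTuple


class DvbSubCodecPrivate(NamedTuple):
    composition_page_id: int
    ancillary_page_id: int
    subtitling_type: int


def parse_dvbsub_codec_private(data: bytes) -> DvbSubCodecPrivate:
    """Split an S_DVBSUB CodecPrivate into its fields (illustrative only)."""
    if len(data) != 5:
        raise ValueError("S_DVBSUB CodecPrivate must be exactly five bytes")
    # ">HHB": two big-endian 16-bit fields followed by one 8-bit field.
    return DvbSubCodecPrivate(*struct.unpack(">HHB", data))
```

Rejecting any other length up front keeps a truncated or oversized CodecPrivate from being silently misread.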
Each Matroska Block consists of one or more DVB Subtitle Segments as described in section 7.2 "Syntax and semantics of the subtitling segment" of [ETSI.EN300-743].¶
Each Matroska Block SHOULD have a Duration indicating how long the DVB Subtitle Segments in that Block SHOULD be displayed.¶
The specifications for the ARIB B-24 subtitle bitstream format (short: ARIB subtitles) and its storage in MPEG transport streams can be found in the documents [ARIB.STD-B24], [ARIB.STD-B10], and [ARIB.TR-B14].¶
The CodecID to use is S_ARIBSUB.¶
The CodecPrivate element is three bytes long and has the following structure:¶
1 byte: component tag (bit string, left bit first)¶
2 bytes: data component ID (bit string, left bit first)¶
The semantics of the component tag are the same as those described in [ARIB.STD-B10], part 2, Annex J. The semantics of the data component ID are the same as those described in [ARIB.TR-B14], fascicle 2, Vol. 3, Section 2, 4.2.8.1.¶
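From the muxer side, the three-byte layout could be produced as in the Python sketch below. It assumes the data component ID is written as a big-endian 16-bit value; the function name is hypothetical and the argument values in the usage note are placeholders, not values taken from the ARIB documents.

```python
import struct


def build_aribsub_codec_private(component_tag: int, data_component_id: int) -> bytes:
    """Serialize an S_ARIBSUB CodecPrivate (illustrative only)."""
    # ">BH": one 8-bit component tag, then one big-endian 16-bit data component ID.
    # struct.pack raises struct.error if a value does not fit its field.
    return struct.pack(">BH", component_tag, data_component_id)
```

Calling `build_aribsub_codec_private(0x30, 0x0008)` returns the three bytes `30 00 08`.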
Each Matroska Block consists of a single synchronized PES data structure as described in chapter 5 "Independent PES transmission protocol" of [ARIB.STD-B24], volume 3, with a Synchronized_PES_data_byte block containing one or more ISDB Caption Data Groups as described in chapter 9 "Transmission of caption and superimpose" of [ARIB.STD-B24], volume 1, part 3. All of the Caption Statement Data Groups in a given Matroska Track MUST use the same language index.¶
A Data Group is normally shown until a subsequent Group provides instructions to clear it. Therefore, the Matroska Block SHOULD NOT have a Duration. A player SHOULD display a Data Group within a Matroska Block until its internal duration elapses, or until a subsequent Data Group removes it.¶
Extra data or metadata can be added to each Block using BlockAdditional data. Each BlockAdditional contains a BlockAddID that identifies the kind of data it contains. When the BlockAddID is set to "1" the contents of the BlockAdditional element are defined by the "Codec BlockAdditions" section of the codec; see Section 3.1.5.¶
The following XML depicts the nested elements of a BlockGroup element with an example of BlockAdditions with a BlockAddID of "1":¶
<BlockGroup>
  <Block>{Binary data of a VP9 video frame in YUV}</Block>
  <BlockAdditions>
    <BlockMore>
      <BlockAddID>1</BlockAddID>
      <BlockAdditional>
        {alpha channel encoding to supplement the VP9 frame}
      </BlockAdditional>
    </BlockMore>
  </BlockAdditions>
</BlockGroup>¶
When the BlockAddID is set to a value greater than "1", the contents of the BlockAdditional element are defined by the BlockAdditionalMapping element, within the associated TrackEntry element, where the BlockAddID element associated with the BlockAdditional element equals the BlockAddIDValue of the associated TrackEntry's BlockAdditionalMapping element. That BlockAdditionalMapping element identifies a particular Block Additional Mapping by the BlockAddIDType.¶
The values of BlockAddID that are 2 or greater have no semantic meaning, but simply associate the BlockMore element with a BlockAdditionMapping of the associated Track. See Section 6 on Block Additional Mappings for more information.¶
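The lookup described above can be sketched in Python as follows. This is a simplified illustration: each mapping of the track is modeled as a plain dictionary keyed by element name, the function name and the `CODEC_DEFINED` marker are hypothetical, and returning None for an unmatched BlockAddID is a choice of this sketch rather than a rule of this document.

```python
from typing import Optional, Union

CODEC_DEFINED = "codec-defined"  # hypothetical marker for BlockAddID 1


def resolve_block_add_id(block_add_id: int,
                         track_mappings: list) -> Optional[Union[str, int]]:
    """Return how a BlockAdditional should be interpreted (illustrative only)."""
    if block_add_id == 1:
        # Defined by the "Codec BlockAdditions" section of the track's codec.
        return CODEC_DEFINED
    for mapping in track_mappings:
        # Values of 2 or greater only link the BlockMore to a mapping of the track.
        if mapping["BlockAddIDValue"] == block_add_id:
            return mapping["BlockAddIDType"]
    return None  # no mapping declares this BlockAddID
```

For a track declaring `{"BlockAddIDValue": 2, "BlockAddIDType": 121}`, a BlockMore with a BlockAddID of 2 resolves to type 121.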
The following XML depicts a use of a Block Additional Mapping to associate a timecode value with a Block:¶
<Segment>
  <!--Mandatory elements omitted for readability-->
  <Tracks>
    <TrackEntry>
      <TrackNumber>1</TrackNumber>
      <TrackUID>568001708</TrackUID>
      <TrackType>1</TrackType>
      <BlockAdditionalMapping>
        <BlockAddIDValue>2</BlockAddIDValue><!--arbitrary value used in BlockAddID-->
        <BlockAddIDName>timecode</BlockAddIDName>
        <BlockAddIDType>121</BlockAddIDType>
      </BlockAdditionalMapping>
      <CodecID>V_FFV1</CodecID>
      <Video>
        <PixelWidth>1920</PixelWidth>
        <PixelHeight>1080</PixelHeight>
      </Video>
    </TrackEntry>
  </Tracks>
  <Cluster>
    <Timestamp>3000</Timestamp>
    <BlockGroup>
      <Block>{binary video frame}</Block>
      <BlockAdditions>
        <BlockMore>
          <BlockAddID>2</BlockAddID><!--arbitrary value from BlockAdditionalMapping-->
          <BlockAdditional>01:00:00:00</BlockAdditional>
        </BlockMore>
      </BlockAdditions>
    </BlockGroup>
  </Cluster>
</Segment>¶
Block Additional Mappings detail how additional data MAY be stored in the BlockMore element with a BlockAdditionMapping element, within the Track element, which identifies the BlockAdditional content. Block Additional Mappings define the BlockAddIDType value reserved to identify that type of data and provide an optional label stored within the BlockAddIDName element. When the Block Additional Mapping is dependent on additional contextual information, then the Mapping SHOULD describe how such additional contextual information is stored within the BlockAddIDExtraData element.¶
SMPTE ST 12-1 timecode values can be stored in the BlockMore element to associate the content of a Matroska Block with a particular timecode value. If the Block uses Lacing, the timecode value is associated with the first frame of the Lace.¶
The Block Additional Mapping contains a full binary representation of a 64-bit SMPTE timecode value stored in big-endian format and expressed exactly as defined in Sections 8 and 9 of SMPTE 12M [SMPTE.ST12-1]. For convenience, here are the bit assignments for a SMPTE ST 12-1 binary representation as described in Section 6.2 of [RFC5484]:¶
Bit Positions | Label |
---|---|
0--3 | Units of frames |
4--7 | First binary group |
8--9 | Tens of frames |
10 | Drop frame flag |
11 | Color frame flag |
12--15 | Second binary group |
16--19 | Units of seconds |
20--23 | Third binary group |
24--26 | Tens of seconds |
27 | Polarity correction |
28--31 | Fourth binary group |
32--35 | Units of minutes |
36--39 | Fifth binary group |
40--42 | Tens of minutes |
43 | Binary group flag BGF0 |
44--47 | Sixth binary group |
48--51 | Units of hours |
52--55 | Seventh binary group |
56--57 | Tens of hours |
58 | Binary group flag BGF1 |
59 | Binary group flag BGF2 |
60--63 | Eighth binary group |
For example, a timecode value of "07:32:54;18" can be expressed as a 64-bit SMPTE 12M value as:¶
10000000 01100000 01100000 01010000 00100000 00110000 01110000 00000000¶
The BlockAddIDType value reserved for timecode is "121".¶
The BlockAddIDName value reserved for timecode is "SMPTE ST 12-1 timecode".¶
BlockAddIDExtraData is unused within this block additional mapping.¶
This document inherits security considerations from the EBML [RFC8794] and Matroska [RFC9559] documents.¶
Codec handling may be one of the more error-prone aspects of using Matroska. The parsing and interpretation of binary data can lead to many types of security issues. Although these issues don't come from Matroska itself, it's worth noting some issues that need to be considered.¶
The mandatory CodecPrivate may be missing from the TrackEntry description. The TrackEntry MAY be discarded in that case.¶
Existing CodecPrivate data may be bogus, incomplete, or too big. The TrackEntry MAY be discarded in that case.¶
Many codecs have internal fields that hold values already found in the TrackEntry, such as the video dimensions or the audio sampling frequency. If these values differ, playback issues and even crashes can result.¶
This document defines registries for Codec IDs stored in the CodecID element. A CodecID is a case-sensitive ASCII string with a V_, A_, S_, or B_ prefix for video, audio, subtitle, and button tracks, respectively. The details of the string format are found in Section 3.1.1.¶
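As a small illustration of the prefix rule, the Python sketch below maps a CodecID to the Track Type value that the "Matroska Codec IDs" registry lists for entries with that prefix (1, 2, 17, and 18). The function and table names are hypothetical, and the check deliberately looks only at the prefix, not at whether the full string is registered.

```python
from typing import Optional

# Track Type values as they appear in the "Matroska Codec IDs" registry.
TRACK_TYPE_BY_PREFIX = {"V_": 1, "A_": 2, "S_": 17, "B_": 18}


def track_type_for_codec_id(codec_id: str) -> Optional[int]:
    """Return the Track Type implied by a CodecID prefix, or None if unknown."""
    # The comparison is case-sensitive, so "v_av1" does not match "V_".
    return TRACK_TYPE_BY_PREFIX.get(codec_id[:2])
```

A reader could use this to flag a TrackEntry whose TrackType disagrees with its CodecID prefix.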
To register a new Codec ID in this registry, one needs a Codec ID string, a TrackType value, a description, a Change Controller, and an optional Reference to a document describing the Codec ID.¶
Some Codec ID values are deprecated and SHOULD NOT be used. Such Codec IDs are marked as "Reclaimed" in the "Matroska Codec IDs" registry.¶
Table 4 shows the initial contents of the "Matroska Codec IDs" registry. The Change Controller for the initial entries is the IETF.¶
Codec ID | Track Type | Description | Reference |
---|---|---|---|
V_AV1 | 1 | Alliance for Open Media AV1 | This document, Section 3.3.1 |
V_AVS2 | 1 | AVS2-P2/IEEE.1857.4 | This document, Section 3.3.2 |
V_AVS3 | 1 | AVS3-P2/IEEE.1857.10 | This document, Section 3.3.3 |
V_CAVS | 1 | AVS1-P2/IEEE.1857.3 | This document, Section 3.3.4 |
V_DIRAC | 1 | Dirac / VC-2 | This document, Section 3.3.5 |
V_FFV1 | 1 | FFV1 | This document, Section 3.3.6 |
V_MJPEG | 1 | Motion JPEG | This document, Section 3.3.7 |
V_MPEGH/ISO/HEVC | 1 | HEVC/H.265 | This document, Section 3.3.8 |
V_MPEGI/ISO/VVC | 1 | VVC/H.266 | This document, Section 3.3.9 |
V_MPEG1 | 1 | MPEG 1 | This document, Section 3.3.10 |
V_MPEG2 | 1 | MPEG 2 | This document, Section 3.3.11 |
V_MPEG4/ISO/AVC | 1 | AVC/H.264 | This document, Section 3.3.12 |
V_MPEG4/ISO/AP | 1 | MPEG4 ISO advanced profile | This document, Section 3.3.13 |
V_MPEG4/ISO/ASP | 1 | MPEG4 ISO advanced simple profile | This document, Section 3.3.14 |
V_MPEG4/ISO/SP | 1 | MPEG4 ISO simple profile | This document, Section 3.3.15 |
V_MPEG4/MS/V3 | 1 | Microsoft MPEG4 V3 | This document, Section 3.3.16 |
V_MS/VFW/FOURCC | 1 | Microsoft Video Codec Manager | This document, Section 3.3.17 |
V_QUICKTIME | 1 | Video taken from QuickTime files | This document, Section 3.3.18 |
V_PRORES | 1 | Apple ProRes | This document, Section 3.3.19 |
V_REAL/RV10 | 1 | RealVideo 1.0 aka RealVideo 5 | This document, Section 3.3.20 |
V_REAL/RV20 | 1 | RealVideo G2 and RealVideo G2+SVT | This document, Section 3.3.21 |
V_REAL/RV30 | 1 | RealVideo 8 | This document, Section 3.3.22 |
V_REAL/RV40 | 1 | rv40 : RealVideo 9 | This document, Section 3.3.23 |
V_THEORA | 1 | Theora | This document, Section 3.3.24 |
V_UNCOMPRESSED | 1 | Raw uncompressed video frames | This document, Section 3.3.25 |
V_VP8 | 1 | VP8 Codec format | This document, Section 3.3.26 |
V_VP9 | 1 | VP9 Codec format | This document, Section 3.3.27 |
A_AAC/MPEG2/LC | 2 | Low Complexity | This document, Section 3.4.1 |
A_AAC/MPEG2/LC/SBR | 2 | Low Complexity with Spectral Band Replication | This document, Section 3.4.2 |
A_AAC/MPEG2/MAIN | 2 | MPEG2 Main Profile | This document, Section 3.4.3 |
A_AAC/MPEG2/SSR | 2 | Scalable Sampling Rate | This document, Section 3.4.4 |
A_AAC/MPEG4/LC | 2 | Low Complexity | This document, Section 3.4.5 |
A_AAC/MPEG4/LC/SBR | 2 | Low Complexity with Spectral Band Replication | This document, Section 3.4.6 |
A_AAC/MPEG4/LTP | 2 | Long Term Prediction | This document, Section 3.4.7 |
A_AAC/MPEG4/MAIN | 2 | MPEG4 Main Profile | This document, Section 3.4.8 |
A_AAC/MPEG4/SSR | 2 | Scalable Sampling Rate | This document, Section 3.4.9 |
A_AC3 | 2 | Dolby Digital / AC-3 | This document, Section 3.4.10 |
A_AC3/BSID9 | 2 | Dolby Digital / AC-3 | This document, Section 3.4.11 |
A_AC3/BSID10 | 2 | Dolby Digital / AC-3 | This document, Section 3.4.12 |
A_ALAC | 2 | ALAC (Apple Lossless Audio Codec) | This document, Section 3.4.13 |
A_ATRAC/AT1 | 2 | Sony ATRAC1 Codec | This document, Section 3.4.14 |
A_DTS | 2 | Digital Theatre System | This document, Section 3.4.15 |
A_DTS/EXPRESS | 2 | Digital Theatre System Express | This document, Section 3.4.16 |
A_DTS/LOSSLESS | 2 | Digital Theatre System Lossless | This document, Section 3.4.17 |
A_EAC3 | 2 | Dolby Digital Plus / E-AC-3 | This document, Section 3.4.18 |
A_FLAC | 2 | FLAC | This document, Section 3.4.19 |
A_MLP | 2 | Meridian Lossless Packing / MLP | This document, Section 3.4.20 |
A_MPC | 2 | MPC (musepack) SV8 | This document, Section 3.4.21 |
A_MPEG/L1 | 2 | MPEG Audio 1, 2 Layer I | This document, Section 3.4.22 |
A_MPEG/L2 | 2 | MPEG Audio 1, 2 Layer II | This document, Section 3.4.23 |
A_MPEG/L3 | 2 | MPEG Audio 1, 2, 2.5 Layer III | This document, Section 3.4.24 |
A_MS/ACM | 2 | Microsoft Audio Codec Manager (ACM) | This document, Section 3.4.25 |
A_REAL/14_4 | 2 | Real Audio 1 | This document, Section 3.4.26 |
A_REAL/28_8 | 2 | Real Audio 2 | This document, Section 3.4.27 |
A_REAL/ATRC | 2 | Sony Atrac3 Codec | This document, Section 3.4.28 |
A_REAL/COOK | 2 | Real Audio Cook Codec | This document, Section 3.4.29 |
A_REAL/RALF | 2 | Real Audio Lossless Format | This document, Section 3.4.30 |
A_REAL/SIPR | 2 | Sipro Voice Codec | This document, Section 3.4.31 |
A_OPUS | 2 | Opus interactive speech and audio codec | This document, Section 3.4.32 |
A_PCM/FLOAT/IEEE | 2 | Floating-Point, IEEE compatible | This document, Section 3.4.33 |
A_PCM/INT/BIG | 2 | PCM Integer Big Endian | This document, Section 3.4.34 |
A_PCM/INT/LIT | 2 | PCM Integer Little Endian | This document, Section 3.4.35 |
A_QUICKTIME | 2 | Audio taken from QuickTime files | This document, Section 3.4.36 |
A_QUICKTIME/QDMC | 2 | QDesign Music | This document, Section 3.4.37 |
A_QUICKTIME/QDM2 | 2 | QDesign Music v2 | This document, Section 3.4.38 |
A_TRUEHD | 2 | Dolby TrueHD | This document, Section 3.4.39 |
A_TTA1 | 2 | The True Audio | This document, Section 3.4.40 |
A_VORBIS | 2 | Vorbis | This document, Section 3.4.41 |
A_WAVPACK4 | 2 | WavPack | This document, Section 3.4.42 |
S_ARIBSUB | 17 | ARIB STD-B24 subtitles | This document, Section 3.5.1 |
S_DVBSUB | 17 | Digital Video Broadcasting subtitles | This document, Section 3.5.2 |
S_HDMV/PGS | 17 | HDMV presentation graphics subtitles | This document, Section 3.5.3 |
S_HDMV/TEXTST | 17 | HDMV text subtitles | This document, Section 3.5.4 |
S_KATE | 17 | Karaoke And Text Encapsulation | This document, Section 3.5.5 |
S_IMAGE/BMP | 17 | Bitmap | This document, Section 3.5.6 |
S_ASS | 17 | Advanced SubStation Alpha Format | Reclaimed, Section 3.5.7 |
S_TEXT/ASS | 17 | Advanced SubStation Alpha Format | This document, Section 3.5.7 |
S_TEXT/ASCII | 17 | ASCII Plain Text | This document, Section 3.5.8 |
S_TEXT/SSA | 17 | SubStation Alpha Format | This document, Section 3.5.9 |
S_TEXT/USF | 17 | Universal Subtitle Format | This document, Section 3.5.10 |
S_TEXT/UTF8 | 17 | UTF-8 Plain Text | This document, Section 3.5.11 |
S_TEXT/WEBVTT | 17 | Web Video Text Tracks (WebVTT) | This document, Section 3.5.12 |
S_SSA | 17 | SubStation Alpha Format | Reclaimed, Section 3.5.7 |
S_VOBSUB | 17 | VobSub subtitles | This document, Section 3.5.13 |
B_VOBBTN | 18 | VobBtn Buttons | This document, Section 3.6.1 |
This document defines registries for BlockAdditional Type IDs stored in the BlockAddIDType element. The values correspond to the unsigned integer BlockAddIDType value described in Section 5.1.4.1.17.3 of [RFC9559].¶
To register a new BlockAdditional Type ID in this registry, one needs a BlockAddIDType unsigned integer, a BlockAddIDName string value, a Change Controller, and an optional Reference to a document describing the BlockAdditional Type ID.¶
Table 5 shows the initial contents of the "Matroska BlockAdditional Type IDs" registry. The Change Controller for the initial entries is the IETF.¶
BlockAddIDType | BlockAddIDName | Reference |
---|---|---|
0 | Use BlockAddIDValue | This document, Section 3.7.1 |
1 | Opaque data | This document, Section 3.7.2 |
4 | ITU T.35 metadata | This document, Section 3.7.3 |
121 | SMPTE ST 12-1 timecode | This document, Section 6.1 |
0x64766343 | Dolby Vision configuration dvcC | This document, Section 3.7.6 |
0x61766345 | Dolby Vision enhancement-layer AVC configuration | This document, Section 3.7.4 |
0x64767643 | Dolby Vision configuration dvvC | This document, Section 3.7.7 |
0x64767743 | Dolby Vision configuration dvwC | This document, Section 3.7.8 |
0x68766345 | Dolby Vision enhancement-layer HEVC configuration | This document, Section 3.7.5 |
0x6D766343 | MVC configuration | This document, Section 3.7.9 |
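The hexadecimal values in the table read as four ASCII characters when taken as big-endian bytes; for instance, 0x64766343 spells "dvcC", which matches its registered name. The Python sketch below renders a BlockAddIDType either as that four-character code or as a plain decimal number. The function name is hypothetical, and treating every value made of four printable ASCII bytes as a character code is an assumption of this sketch.

```python
def block_add_id_type_label(value: int) -> str:
    """Render a BlockAddIDType as a four-character code when it looks like one."""
    if 0x21212121 <= value <= 0x7E7E7E7E:
        raw = value.to_bytes(4, "big")
        # Accept only values whose four bytes are all printable, non-space ASCII.
        if all(0x21 <= byte <= 0x7E for byte in raw):
            return raw.decode("ascii")
    return str(value)
```

Small registry values such as 4 or 121 fall outside the range check and are returned as decimal strings.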