Minutes of the Audio/Video Transport Working Group

Reported by Colin Perkins

The Audio/Video Transport working group met twice at the 42nd IETF in Chicago.
The major items of discussion were the revision of RTP and the Audio/Video profile
for advancement to draft standard, multiplexing of RTP streams and transport of
MPEG4 streams using RTP.

The meeting opened with a review of the status of the working group documents. The
RTP header compression scheme and the payload format documents for JPEG, BT656
and H.263+ video are now either awaiting IESG approval or are in last call. The payload
format for bundled MPEG streams has been published as RFC2343 (experimental). The
Options for Repair of Streaming Media document has been published as RFC2354
(informational).

The draft giving guidelines for writers of RTP payload format documents (draft-ietf-
avt-rtp-format-guidelines-00.txt) is still awaiting revision. Mark Handley (ISI)
committed to update this document in time for the next meeting, for publication as a
BCP or Informational RFC. Input is solicited from people who know about the quirks
of particular codecs, to effectively capture the knowledge of the group as an aid to the
designers of future payload formats.

The RTP payload format for PureVoice audio (draft-mckay-qcelp-01.txt) was discussed
briefly with a view to conducting last call for proposed standard soon. Some concerns
were raised regarding the means for signalling encryption of the media data only (not
the RTP headers) by means of a bit in the payload sub-header, since out-of-band
signalling, for example using an SDP attribute, may be more appropriate. Clarification
of this issue is needed before the draft can go forward, but the document is otherwise
complete.

The generic RTP payload format was not discussed this time. The authors of the
proposals presented at the last meeting were to produce a combined proposal, but this is
not yet complete. The authors committed to completing this merger by the next meeting
and it is expected that further discussion of the subject will occur then.

The RTP MIB (draft-ietf-avt-rtp-mib-02.txt) was presented by Mark Baugher (Intel).
The changes since the previous draft are the elimination of separate tables for hosts
and monitors, with an rtpSessionMonitor variable being added to distinguish the two
modes. The rtpFracLost variable was removed since it was found not to be useful. The
definition of a session was clarified.

The RTP MIB will be referenced by the H-series MIBs which are being defined, but
the RTP MIB itself will remain an AVT work item (since RTP experience is in this
group). The H-series MIB developers will review the RTP MIB to ensure that it meets
their needs.

The main area of concern with the current version of the RTP MIB is the complexity
of the compliance definitions resulting from the merger of the tables. It is expected
that the MIB will be ready for last call after a further round of revisions, probably by
the time of the next meeting. A reference implementation of the MIB is now available.

The major topic of discussion on the first day was multiplexing multiple streams into
a single RTP stream. This discussion started with an introduction by Stephen Casner
(Cisco) who raised a number of issues for consideration in the following discussion:

- Why are we multiplexing: because common handling is needed or to reduce
overhead?
- When should we multiplex: what should be separate and what should be bundled?
- Where should we multiplex? At which protocol level? Can we keep all multiplexing
at one protocol level?
- How should we multiplex: an application specific solution or a general purpose one?

A pointer was also given to Jonathan Rosenberg's internet-draft on this subject from
December 1996, which discusses these issues. Copies may be obtained from
http://www.cs.columbia.edu/~jdrosen/aggregate/rtpdoc.htm and a revised and updated
version will be re-posted.

Following the introduction, a number of proposals were presented: Jonathan Rosenberg
(Lucent) and Barani Subbiah (Nokia) presented somewhat similar proposals applicable
for the efficient connection of PSTN gateways using RTP. Tohru Hoshi (Hitachi) and
Mark Handley (ISI) presented more general proposals.

The first presentation was of draft-ietf-avt-aggregation-00.txt by Jonathan Rosenberg.
This proposal deals with the interconnection of telephony gateways only - it is not a
general purpose multiplexing protocol. Some of the main features of this proposal are:

- Users are identified by a single 7 bit identifier. The mapping from SSRC to this
identifier is done by non-RTP means (eg: SIP/H.323 signalling). If more than 127
streams are being transported between two gateways, multiple multiplexed RTP
sessions must be used.
- All multiplexed streams must share a common clock and generate packets at integer
multiples of a common frame duration. They do not need to use a common codec.
- Payload type information is transported for each multiplexed user. In many cases
the payload type specifies the length of the packet so there is no need for a separate
length indication field (although one can optionally be provided if necessary).
- User payloads are not word aligned. Aligning them would reduce the bandwidth
efficiency significantly.

It was noted that statistical multiplexing of multiple streams using silence suppression
can cause problems: there is a limit to how many streams can be packed into a single
multiplexed packet before exceeding the network MTU. If the limit is exceeded,
multiple packets must be generated: this will cause the receiver to see non-contiguous
sequence numbers per user giving the appearance of loss. The solution is to limit the
statistical multiplexing of streams so that the MTU is not exceeded even when no
streams are silent, but this reduces the efficiency somewhat.
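
The MTU constraint above comes down to simple arithmetic. The following is a minimal sketch only: the 1500 byte MTU, the 1 byte per-user header, the 20 byte voice frame and the activity factor are assumed values chosen for illustration, and the function names are hypothetical. None of them are taken from the drafts.

```python
# Illustrative arithmetic only; all sizes are assumptions, not values from the drafts.
IP_UDP_RTP_HEADER = 20 + 8 + 12   # IPv4 + UDP + fixed RTP header, in bytes


def max_streams_per_packet(mtu, per_user_header, frame_size):
    """Largest number of user frames that fit in one multiplexed packet."""
    available = mtu - IP_UDP_RTP_HEADER
    return available // (per_user_header + frame_size)


def admitted_streams(mtu, per_user_header, frame_size, activity_factor=1.0):
    """Streams a gateway could admit if only a fraction talk at once.

    activity_factor=1.0 is the conservative rule from the discussion:
    the MTU is never exceeded even when no stream is silent.
    """
    return int(max_streams_per_packet(mtu, per_user_header, frame_size)
               / activity_factor)


worst_case = admitted_streams(1500, 1, 20)        # no reliance on silence
optimistic = admitted_streams(1500, 1, 20, 0.5)   # assumes half are silent
```

With these assumed sizes the conservative limit is about half the number of streams that could be admitted if half of them were always silent, which is the efficiency cost noted above.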

It was also noted that the loss of a single packet from the multiplexed stream will affect
all users multiplexed into the stream. It is not expected that this will be a problem (in
fact the use of multiplexing may reduce loss rates, since it reduces both the data and
packet rates compared to non-multiplexed streams).

Barani Subbiah (Nokia) presented draft-ietf-avt-mux-rtp-00.txt. This proposal is
designed to solve a similar problem to that of Rosenberg and is unsurprisingly
somewhat similar to that proposal. The main difference seems to be that this protocol
uses an explicit length indication for each multiplexed packet (6 bits rather than 16 bits
in Rosenberg's proposal) and that the payload type for each user is signalled out-of-
band rather than carried in the payload (disallowing changing payload types on the fly).

A more generic proposal (draft-tanigawa-rtp-multiplex-00.txt) was presented by Tohru
Hoshi (Hitachi). In this proposal RTP streams are multiplexed, rather than voice
streams, by concatenating multiple RTP packets into one. This allows for the
multiplexing of any sort of data, rather than just voice data, at the expense of additional
overheads. Once again, out-of-band signalling is required to indicate that this is a
multiplexed stream. It was noted that it may be possible to generalise this as a generic
UDP multiplexing protocol, rather than an RTP multiplexer.
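
The idea of concatenating whole packets can be sketched as follows. This is not the framing defined in draft-tanigawa-rtp-multiplex-00.txt: the 16 bit length prefix and the function names mux and demux are assumptions made purely for illustration.

```python
import struct


def mux(packets):
    """Concatenate whole packets, each preceded by an assumed 16 bit length."""
    return b"".join(struct.pack("!H", len(p)) + p for p in packets)


def demux(blob):
    """Split a multiplexed payload back into the original packets."""
    packets, offset = [], 0
    while offset < len(blob):
        (length,) = struct.unpack_from("!H", blob, offset)
        offset += 2
        packets.append(blob[offset:offset + length])
        offset += length
    return packets
```

Because each inner packet is carried whole, nothing in the sketch depends on the payload being voice, or even RTP, which is the generality (and the overhead) noted above.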

These proposals discuss out-of-band signalling which is required for correct operation
of these protocols. It is noted that whilst signalling is required in these cases, and
should be simple to implement using either SIP or H.323, it is outside the scope of a
payload format document (the payload format should be independent of the signalling
protocol).

It was noted that the presence of multiple multiplexing solutions is not necessarily
desirable, since this hinders interoperability. It would be desirable to combine these
proposals into one if at all possible. However, it was further noted that the drafts from
Rosenberg and Subbiah are essentially solving the same problem, whilst the proposal
presented by Hoshi is doing something different. The tradeoffs are different and we
may need two protocols: the issues are sufficiently different for the two scenarios.

None of the three proposals presented so far have solved the generic multiplexing
problem: the first two are clearly very application specific, and the third requires
out-of-band signalling to operate.

The second session started with a brief presentation by Mark Handley (ISI) describing
an idea which resulted from the earlier discussion of multiplexing. This proposal,
MuRGE, uses the techniques of RTP header compression within a single packet as a
generic multiplexing method (all state is reinitialised within each packet). That is, each
packet contains a standard RTP header followed by a number of payloads each with
their own payload header. The payload headers are coded as differences from the
previous header. Clearly the bandwidth efficiency of this proposal depends on the
similarity between the headers of the multiplexed payloads. If used between
cooperating gateways where SSRC values can be allocated consecutively and the
codecs, timestamps and sequence numbers are synchronised, this proposal can produce
a single byte header for each multiplexed packet. If there is no cooperation between
multiplexing points the full RTP header has to be sent for each multiplexed stream. If
signalling is employed between multiplexing points (eg: for SSRC mapping) then some
gain can be made even in the most generic case. This proposal is at a very early stage
of development, but introduces some interesting ideas. Further work is clearly needed.
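
The effect of coding each payload header as a difference from the previous one can be shown with a toy encoder. This is a sketch under stated assumptions, not the MuRGE wire format: the flag byte, the field layout and the prediction rule (next SSRC is the previous one plus one, all other fields unchanged) are invented for illustration.

```python
import struct

FULL = struct.Struct("!IHIB")   # ssrc, sequence number, timestamp, payload type


def encode(headers):
    """headers: list of (ssrc, seq, timestamp, payload_type) tuples."""
    out = bytearray()
    prev = None                      # state is reset for every packet
    for h in headers:
        predicted = None if prev is None else (prev[0] + 1, prev[1], prev[2], prev[3])
        if h == predicted:
            out += b"\x00"           # one byte: "as predicted from previous header"
        else:
            out += b"\x01" + FULL.pack(*h)
        prev = h
    return bytes(out)


def decode(data, count):
    """Rebuild `count` headers from the encoded form."""
    headers, offset, prev = [], 0, None
    for _ in range(count):
        flag = data[offset]
        offset += 1
        if flag == 0:
            h = (prev[0] + 1, prev[1], prev[2], prev[3])
        else:
            h = FULL.unpack_from(data, offset)
            offset += FULL.size
        headers.append(h)
        prev = h
    return headers
```

When the headers are synchronised and the SSRC values consecutive, every header after the first costs one byte; when they are unrelated, each costs the full header plus the flag, matching the two extremes described above.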

The discussion of multiplexing concluded with agreement that the development of a
multiplexing proposal is of interest and should become a work item of the group.

The next item for discussion was the transport of MPEG4 streams using RTP
(draft-ietf-avt-rtp-mpeg4-00.txt) and the role of DMIF signalling
(draft-ietf-avt-mpeg4-dmif-01.txt), presented by Vahe Balabanian (Nortel). After
outlining the proposals, discussion focused on a number of open issues:

- Should MPEG4 elementary streams be transported directly over RTP or should
they be encapsulated using FlexMux first? There is some concern that the use of
FlexMux does not fit cleanly into the RTP model; in particular, the interaction with
RTP mixers is unclear.
- The mapping of MPEG4 scene and object descriptor streams to RTP is unclear. It
may be that these need special transport and protocols other than RTP may be
better suited to their needs: in particular the initial session description should not
be carried in RTP. The transport of dynamic updates to this is an area which needs
further study: an RTP stream may be appropriate or alternatively a separate
signalling stream (eg: RTSP using ANNOUNCE) may work better.
- The mapping of MPEG4 decoder timestamps to RTP is unclear, since RTP includes
only a send timestamp and applications are expected to derive their own decode
time based on the observed network timing jitter.

The next IETF meeting coincides with the MPEG4 decision meeting, so it is urgent
that these issues are resolved. The last chance to make changes to MPEG4 version 1
will be at the MPEG meeting on 12-16 October. Since it is clear that only limited
progress can be made in the short time available, it was decided to continue with the
development of the payload format for MPEG4 elementary streams and to resolve
the issues discussed. Once this is done and further experience has been gained with
actual implementations, the group will revisit these issues.

The major discussion item for the second day was the advancement of RTP and the
Audio/Video profile from proposed- to draft-standard status. A summary of the changes
in draft-ietf-avt-rtp-new-01.txt was made by Stephen Casner (Cisco). These include:

- Added fudge factor in timer reconsideration
- Added fix for underestimate when using SSRC sampling if the group size
decreases rapidly
- RTCP sender and receiver bandwidth may be specified as a parameter (rather than
the default 5%)
- RTCP minimum interval may scale smaller for high bandwidth sessions and zero
initial delay for unicast sessions
- Specified padding for RTCP only on the last packet
- Specified relative NTP uses the "best" platform clock
- Formal reference to IPsec for security (this concerns some people since RTP may
be used in scenarios where the presence of IPsec cannot be guaranteed...)
- Partial conversion to SHOULD, MUST, MAY, etc
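
To show how the RTCP bandwidth fraction and minimum interval interact, here is a minimal sketch of the interval calculation. It is a simplification under stated assumptions: the sender/receiver bandwidth split is ignored, the function names are hypothetical, and the compensation constant e - 1.5 is the value used in the later published RTP specification; the value of the fudge factor is not recorded in these minutes.

```python
import math

COMPENSATION = math.e - 1.5   # about 1.21828; assumed value of the fudge factor


def rtcp_interval(members, avg_rtcp_size, session_bw, rtcp_fraction=0.05,
                  min_interval=5.0):
    """Deterministic RTCP interval in seconds, before randomisation.

    avg_rtcp_size is in bytes and session_bw in bytes per second.
    """
    rtcp_bw = session_bw * rtcp_fraction
    interval = members * avg_rtcp_size / rtcp_bw
    return max(interval, min_interval)


def randomised(interval, u):
    """Spread the interval over [0.5, 1.5) times its value, then compensate.

    u is a uniform random number in [0, 1), supplied by the caller.
    """
    return interval * (0.5 + u) / COMPENSATION
```

For example, with 100 members, 100 byte RTCP packets and an 8000 byte/s session, the deterministic interval is 25 seconds; with only 2 members the 5 second minimum applies instead.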

In addition it has been decided not to make a number of the changes which have
previously been suggested:

- Ignore group size dropping to zero with reverse reconsideration.
- No scaling of the RTCP interval larger since this could cause time-outs
- No changes to the jitter algorithm for multi-packet video frames
- No additional SDES items were defined (these can be registered with the IANA
separately)
- No change to the definition of the RTCP RR loss fraction
- Nothing was added about translators adding random timestamp offsets

The issue of conditional vs unconditional reconsideration was discussed and it was
noted that there is little to choose between these two algorithms in practice, and that
unconditional reconsideration is simplest to implement. The next revision will therefore
only include unconditional reconsideration.

A number of problems which have been discovered in the SSRC sampling algorithm
were presented by Jonathan Rosenberg (Lucent). The problems with reconsideration
and over-weighting of senders have been corrected in the current RTP draft. A problem
remains when the group size decreases rapidly which results in members using SSRC
sampling producing very inaccurate estimates of the group size. A solution using a
"binning" algorithm is proposed in draft-ietf-avt-rtpsample-00.txt but this algorithm
may be patented by Lucent (who are willing to license on "fair, reasonable and non-
discriminatory terms").
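
For readers unfamiliar with SSRC sampling, the basic idea can be sketched as below. This is a generic mask-based sampler written for illustration, with a hypothetical class name and interface; it is not the "binning" algorithm of draft-ietf-avt-rtpsample-00.txt and does not attempt to handle a rapidly shrinking group.

```python
class SampledMembers:
    """Keep only SSRCs whose low `bits` bits match a locally chosen key."""

    def __init__(self, key, bits=0):
        self.key = key
        self.bits = bits
        self.table = set()

    def _mask(self):
        return (1 << self.bits) - 1

    def observe(self, ssrc):
        """Record an SSRC only if it falls inside the current sample."""
        if (ssrc & self._mask()) == (self.key & self._mask()):
            self.table.add(ssrc)

    def tighten(self):
        """Sample more sparsely: one more mask bit, then drop non-matching entries."""
        self.bits += 1
        self.table = {s for s in self.table
                      if (s & self._mask()) == (self.key & self._mask())}

    def estimate(self):
        """Scale the sampled table size back up to a group size estimate."""
        return len(self.table) * (1 << self.bits)
```

A member using this scheme stores roughly one SSRC in every 2^bits, trading memory for accuracy of the group size estimate.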

Much discussion occurred on this topic since it is considered undesirable to have
patented technology as part of the main specification. The cleanest solution appears to
be to move the SSRC sampling algorithms out of the main specification into a separate
document. The main RTP specification would note that SSRC sampling may be
desirable in certain cases and point to this new document for implementation advice
and sample algorithms. This separation will occur with the next version of the RTP
specification.

The changes to the Audio/Video profile (draft-ietf-avt-profile-new-03.txt) are less
extensive: the PureVoice codec has been added and assigned payload type 12. As
discussed at the previous meeting this is the last static assignment which will be made -
this policy is now stated explicitly in the draft. The static assignment of payload type
77 to redundant audio has been removed since all known implementations use a dynamic
payload type. References to MPEG1 system streams and MPEG2 program streams
(RFC2250) using dynamic payload types have been added and a number of other
RFC references have been updated.

The new policy regarding static payloads needs to be better described in the next
revision of the draft. The SDP modifiers to explicitly denote the RTCP fraction (if the
default of 5% is not being used) have yet to be written, but it is felt that these should
not be added to the A/V profile (since that should not be tied to SDP); rather, a new
document should define them.

The MIME registration of payload formats has yet to be done. There are many open
issues here regarding how this should occur, what information is bound to the names,
how MIME is extended for types which cannot be represented in email, etc.

An update on the RTP payload for redundant audio was presented by Colin Perkins
(UCL). This document (draft-ietf-avt-rtp-redundancy-revised-00.txt) updates RFC2198
in the light of additional usage experience. The change is to specify that all packets in a
redundant stream should be sent using the redundancy format, rather than sending the
first packet(s) in a talkspurt using the payload format of the primary codec. This allows
for explicit advertisement of the buffering requirements of a stream which simplifies
implementations and removes the need for an out-of-band parameter to convey this
information.
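
As a reminder of what the redundancy format looks like on the wire, the sketch below packs the block headers defined in RFC2198: a 4 byte header per redundant block (F bit set, 7 bit payload type, 14 bit timestamp offset, 10 bit block length) followed by a 1 byte final header for the primary encoding. The function name and argument layout are hypothetical, and the RTP header itself is omitted.

```python
import struct


def redundancy_payload(primary, redundant):
    """Build an RFC 2198 style payload.

    primary:   (payload_type, data)
    redundant: list of (payload_type, timestamp_offset, data), oldest first
    """
    headers, bodies = b"", b""
    for pt, ts_offset, data in redundant:
        if not (0 <= pt < 128 and 0 <= ts_offset < (1 << 14)
                and len(data) < (1 << 10)):
            raise ValueError("field out of range")
        # F=1 | block PT (7 bits) | timestamp offset (14 bits) | length (10 bits)
        word = (1 << 31) | (pt << 24) | (ts_offset << 10) | len(data)
        headers += struct.pack("!I", word)
        bodies += data
    pt, data = primary
    headers += struct.pack("!B", pt & 0x7F)   # F=0 marks the final header
    return headers + bodies + data
```

The timestamp offset in each redundant block header is what lets a receiver see how far back the redundancy reaches, and hence how much buffering the stream needs.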

An update on the RTP payload for generic forward error correction (draft-ietf-avt-fec-
03.txt) was presented by Jonathan Rosenberg (Lucent). Changes since the previous
draft include the addition of code fragments illustrating the decoding stage, support for
FEC using Reed-Solomon codes, extension of the timestamp recovery to 56 bits and
removal of the reference to the expired draft by Budge. A number of issues with the
current draft were discussed including the required mask size: 24 bits is believed
sufficient so the optional extension to 56 bits will be removed from the draft.
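
The principle behind parity FEC can be illustrated with a simple XOR over packet payloads. This is a minimal sketch, not the payload format of the draft: it protects payload bytes only, takes the length of the lost payload as an argument rather than recovering it, and uses hypothetical function names.

```python
def xor_parity(payloads):
    """XOR equal-position bytes of all payloads, zero-padding to the longest."""
    size = max(len(p) for p in payloads)
    parity = bytearray(size)
    for p in payloads:
        for i, byte in enumerate(p):
            parity[i] ^= byte
    return bytes(parity)


def recover(received, parity, lost_length):
    """Rebuild the single missing payload from the survivors and the parity."""
    return xor_parity(list(received) + [parity])[:lost_length]
```

One parity packet computed over a group of media packets allows any single loss within that group to be repaired, at the cost of the extra packet.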

It was also decided that parity FEC and Reed-Solomon codes are sufficiently different
that this draft should be split into two. The resulting parity FEC payload format
document is expected to be ready for last call after one further revision; the Reed-
Solomon payload format document will need further work over the coming months.

The meeting concluded with a reminder that a revised working group charter has been
posted. Comments and discussion of the proposed new charter and milestones should
be directed to the mailing list.