Minutes of the Audio/Video Transport Working Group

Reported by Colin Perkins

The Audio/Video Transport working group met twice at the 42nd IETF in Chicago. 
The major items of discussion were the revision of RTP and the Audio/Video profile 
for advancement to draft standard, multiplexing of RTP streams and transport of 
MPEG4 streams using RTP.

The meeting opened with a review of the status of the working group documents. The 
RTP header compression scheme and the payload format documents for JPEG, BT656 
and H.263+ video are now either awaiting IESG approval or are in last call. The payload 
format for bundled MPEG streams has been published as RFC2343 (experimental). The 
Options for Repair of Streaming Media document has been published as RFC2354 
(informational). 

The draft giving guidelines for writers of RTP payload format documents (draft-ietf-
avt-rtp-format-guidelines-00.txt) is still awaiting revision.  Mark Handley (ISI) 
committed to update this document in time for the next meeting, for publication as a 
BCP or Informational RFC. Input is solicited from people who know about the quirks 
of particular codecs, to effectively capture the knowledge of the group as an aid to the 
designers of future payload formats. 

The RTP payload format for PureVoice audio (draft-mckay-qcelp-01.txt) was discussed 
briefly with a view to conducting last call for proposed standard soon. Some concerns 
were raised regarding the means for signalling encryption of the media data only (not 
the RTP headers) by means of a bit in the payload sub-header, since out-of-band 
signalling, for example using an SDP attribute, may be more appropriate. Clarification 
of this issue is needed before the draft can go forward, but the document is otherwise 
complete.

The generic RTP payload format was not discussed this time. The authors of the 
proposals presented at the last meeting were to produce a combined proposal, but this is 
not yet complete. The authors committed to completing this merger by the next meeting 
and it is expected that further discussion of the subject will occur then.

The RTP MIB (draft-ietf-avt-rtp-mib-02.txt) was presented by Mark Baugher (Intel). 
The changes since the previous draft are the elimination of separate tables for hosts 
and monitors, with an rtpSessionMonitor variable being added to distinguish the two 
modes. The rtpFracLost variable was removed since it was found not to be useful. The 
definition of a session was clarified.

The RTP MIB will be referenced by the H-series MIBs which are being defined, but 
the RTP MIB itself will remain an AVT work item (since RTP experience is in this 
group). The H-series MIB developers will review the RTP MIB to ensure that it meets 
their needs.

The main area of concern with this current version of the RTP MIB is the complexity 
of the compliances definitions resulting from the merger of the tables. It is expected 
that the MIB will be ready for last call after a further round of revisions, probably by 
the time of the next meeting.  A reference implementation of the MIB is now available.

The major topic of discussion on the first day was multiplexing multiple streams into 
a single RTP stream. This discussion started with an introduction by Stephen Casner 
(Cisco) who raised a number of issues for consideration in the following discussion:

- Why are we multiplexing: because common handling is needed or to reduce 
overhead?
- When should we multiplex: what should be separate and what should be bundled?
- Where should we multiplex? At which protocol level? Can we keep all multiplexing 
at one protocol level?
- How should we multiplex: an application specific solution or a general purpose one?

A pointer was also given to Jonathan Rosenberg's internet-draft on this subject from 
December 1996, which discusses these issues. Copies may be obtained from 
http://www.cs.columbia.edu/~jdrosen/aggregate/rtpdoc.htm and a revised and updated 
version will be re-posted.

Following the introduction, a number of proposals were presented: Jonathan Rosenberg 
(Lucent) and Barani Subbiah (Nokia) presented somewhat similar proposals applicable 
to the efficient connection of PSTN gateways using RTP. Tohru Hoshi (Hitachi) and 
Mark Handley (ISI) presented more general proposals.

The first presentation was of draft-ietf-avt-aggregation-00.txt by Jonathan Rosenberg. 
This proposal deals with the interconnection of telephony gateways only - it is not a 
general purpose multiplexing protocol. Some of the main features of this proposal are:

- Users are identified by a single 7 bit identifier. The mapping from SSRC to this 
identifier is done by non-RTP means (eg: SIP/H.323 signalling). If more than 127 
streams are being transported between two gateways multiple multiplexed RTP 
sessions must be used.
- All multiplexed streams must share a common clock and generate packets at integer 
multiples of a common frame duration. They do not need to use a common codec.
- Payload type information is transported for each multiplexed user. In many cases 
the payload type specifies the length of the packet so there is no need for a separate 
length indication field (although one can optionally be provided if necessary).
- User payloads are not word aligned. Aligning them would reduce the bandwidth 
efficiency significantly.

It was noted that statistical multiplexing of multiple streams using silence suppression 
can cause problems: there is a limit to how many streams can be packed into a single 
multiplexed packet before exceeding the network MTU. If the limit is exceeded, 
multiple packets must be generated: this will cause the receiver to see non-contiguous 
sequence numbers per user giving the appearance of loss. The solution is to limit the 
statistical multiplexing of streams so that the MTU is not exceeded even when no 
streams are silent, but this reduces the efficiency somewhat.
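
The MTU constraint above can be illustrated with a short calculation. This is only a 
sketch: the 1500 byte MTU, the 2 byte per-user header and the 20 byte voice frame used 
in the usage note are illustrative assumptions, not figures from any of the drafts.

```python
# Bytes of IPv4 + UDP + fixed RTP header carried once per multiplexed packet.
IP_UDP_RTP_HEADER = 20 + 8 + 12


def max_streams_per_packet(mtu, per_user_header, payload_bytes):
    """Largest number of user frames that fit in one packet without exceeding the MTU."""
    available = mtu - IP_UDP_RTP_HEADER
    return available // (per_user_header + payload_bytes)


def packets_needed(active_streams, limit):
    """Packets generated per frame interval when `active_streams` users are talking."""
    return -(-active_streams // limit)  # ceiling division
```

With these assumed sizes, max_streams_per_packet(1500, 2, 20) gives the number of users 
that can share a session safely. Admitting more users than this relies on silence 
suppression, and packets_needed() then exceeds one whenever too many of them talk at once.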

It was also noted that the loss of a single packet from the multiplexed stream will affect 
all users multiplexed into the stream. It is not expected that this will be a problem (in 
fact the use of multiplexing may reduce loss rates, since it reduces both the data and 
packet rates compared to non-multiplexed streams).
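
The reduction in data and packet rates can be estimated as follows. The header sizes, 
the 2 byte per-user header and the example traffic (24 streams of 20 byte frames at 50 
packets per second) are assumptions chosen for illustration only.

```python
def stream_rates(n_streams, payload_bytes, packets_per_sec,
                 header_bytes=40, per_user_header=2):
    """Compare packet and byte rates with and without multiplexing.

    header_bytes is the assumed IPv4 + UDP + RTP header (20 + 8 + 12).
    per_user_header is a hypothetical per-user header inside the multiplexed packet.
    """
    separate = {
        "packets_per_sec": n_streams * packets_per_sec,
        "bytes_per_sec": n_streams * packets_per_sec * (header_bytes + payload_bytes),
    }
    multiplexed = {
        "packets_per_sec": packets_per_sec,
        "bytes_per_sec": packets_per_sec
        * (header_bytes + n_streams * (per_user_header + payload_bytes)),
    }
    return separate, multiplexed


separate, multiplexed = stream_rates(24, 20, 50)
```

Under these assumptions the multiplexed stream sends one packet per frame interval 
instead of 24, and carries the 40 byte header once rather than once per user.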

Barani Subbiah (Nokia) presented draft-ietf-avt-mux-rtp-00.txt. This proposal is 
designed to solve a similar problem to that of Rosenberg and is unsurprisingly 
somewhat similar to that proposal. The main differences seem to be that this protocol 
uses an explicit length indication for each multiplexed packet (6 bits rather than 16 bits 
in Rosenberg's proposal) and that the payload type for each user is signalled 
out-of-band rather than carried in the payload (which prevents changing payload types 
on the fly).

A more generic proposal (draft-tanigawa-rtp-multiplex-00.txt) was presented by Tohru 
Hoshi (Hitachi). In this proposal RTP streams are multiplexed, rather than voice 
streams, by concatenating multiple RTP packets into one.  This allows for the 
multiplexing of any sort of data, rather than just voice data, at the expense of additional 
overheads. Once again, out of band signalling is required to indicate that this is a 
multiplexed stream. It was noted that it may be possible to generalise this as a generic 
UDP multiplexing protocol, rather than an RTP multiplexer.
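
Concatenation of complete packets can be sketched as below. The minutes do not record 
how the draft delimits the concatenated packets, so the 2 byte length prefix used here 
is purely an assumption for illustration, as are the function names.

```python
import struct


def mux(packets):
    """Concatenate packets, each preceded by a hypothetical 16 bit length prefix."""
    out = bytearray()
    for packet in packets:
        if len(packet) > 0xFFFF:
            raise ValueError("packet too long for a 16 bit length prefix")
        out += struct.pack("!H", len(packet)) + packet
    return bytes(out)


def demux(data):
    """Split a multiplexed buffer back into the original packets."""
    packets, offset = [], 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
        if offset + length > len(data):
            raise ValueError("truncated packet")
        packets.append(data[offset:offset + length])
        offset += length
    return packets
```

Nothing in this framing depends on the contents being RTP, which is consistent with the 
observation that the approach might generalise to a UDP multiplexer.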

These proposals discuss out-of-band signalling which is required for correct operation 
of these protocols. It is noted that whilst signalling is required in these cases, and 
should be simple to implement using either SIP or H.323, it is outside the scope of a 
payload format document (the payload format should be independent of the signalling 
protocol).

It was noted that the presence of multiple multiplexing solutions is not necessarily 
desirable, since this hinders interoperability. It would be desirable to combine these 
proposals into one if at all possible. However, it was further noted that the drafts from 
Rosenberg and Subbiah are essentially solving the same problem, whilst the proposal 
presented by Hoshi is doing something different. The tradeoffs are different and we 
may need two protocols: the issues are sufficiently different for the two scenarios.

None of the three proposals presented so far have solved the generic multiplexing 
problem: the first two are clearly very application specific, the third requires out of 
band signalling to operate.

The second session started with a brief presentation by Mark Handley (ISI) describing 
an idea which resulted from the earlier discussion of multiplexing. This proposal, 
MuRGE, uses the techniques of RTP header compression within a single packet as a 
generic multiplexing method (all state is reinitialised within each packet). That is, each 
packet contains a standard RTP header followed by a number of payloads each with 
their own payload header. The payload headers are coded as differences from the 
previous header. Clearly the bandwidth efficiency of this proposal depends on the 
similarity between the headers of the multiplexed payloads. If used between 
cooperating gateways where SSRC values can be allocated consecutively and the 
codecs, timestamps and sequence numbers are synchronised, this proposal can produce 
a single byte header for each multiplexed packet. If there is no cooperation between 
multiplexing points the full RTP header has to be sent for each multiplexed stream. If 
signalling is employed between multiplexing points (eg: for SSRC mapping) then some 
gain can be made even in the most generic case. This proposal is at a very early stage 
of development, but introduces some interesting ideas. Further work is clearly needed.
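
The idea of coding each payload header as a difference from the previous one can be 
sketched as follows. The encoding shown (a flag byte followed by any fields that differ 
from a prediction) is a hypothetical layout invented for illustration; it is not the 
MuRGE format, which was not specified in detail at the meeting.

```python
import struct
from typing import NamedTuple


class Header(NamedTuple):
    ssrc: int
    seq: int
    ts: int
    pt: int


# (field name, flag bit, struct format) for each field that may be sent explicitly.
FIELDS = (("ssrc", 0x01, "!I"), ("seq", 0x02, "!H"), ("ts", 0x04, "!I"), ("pt", 0x08, "!B"))


def predict(prev):
    """Assumed prediction: next SSRC is consecutive, all other fields are unchanged."""
    return Header((prev.ssrc + 1) & 0xFFFFFFFF, prev.seq, prev.ts, prev.pt)


def encode(prev, cur):
    """Encode cur relative to prev: one flag byte plus only the mispredicted fields."""
    expected = predict(prev)
    flags, body = 0, b""
    for name, bit, fmt in FIELDS:
        value = getattr(cur, name)
        if value != getattr(expected, name):
            flags |= bit
            body += struct.pack(fmt, value)
    return bytes([flags]) + body


def decode(prev, data, offset=0):
    """Inverse of encode; returns the header and the offset of the next byte."""
    values = predict(prev)._asdict()
    flags = data[offset]
    offset += 1
    for name, bit, fmt in FIELDS:
        if flags & bit:
            (values[name],) = struct.unpack_from(fmt, data, offset)
            offset += struct.calcsize(fmt)
    return Header(**values), offset
```

When the gateways cooperate so that every field matches the prediction, encode() emits a 
single byte. When nothing matches, every field is sent in full, which mirrors the two 
extremes described above.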

The discussion of multiplexing concluded with agreement that the development of a 
multiplexing proposal is of interest and should become a work item of the group.

The next item for discussion was transport of MPEG4 streams using RTP 
(draft-ietf-avt-rtp-mpeg4-00.txt) and the role of DMIF signalling 
(draft-ietf-avt-mpeg4-dmif-01.txt), presented by Vahe Balabanian (Nortel). After the 
proposals were outlined, discussion focused on a number of open issues:

- Should MPEG4 elementary streams be transported directly over RTP or should 
they be encapsulated using FlexMux first? There is some concern that the use of 
FlexMux does not cleanly fit into the RTP model; in particular, the interaction with 
RTP mixers is unclear.
- The mapping of MPEG4 scene and object descriptor streams to RTP is unclear. It 
may be that these need special transport and protocols other than RTP may be 
better suited to their needs: in particular the initial session description should not 
be carried in RTP. The transport of dynamic updates to this is an area which needs 
further study: an RTP stream may be appropriate or alternatively a separate 
signalling stream (eg: RTSP using ANNOUNCE) may work better.
- The mapping of MPEG4 decoder timestamps to RTP is unclear, since RTP includes 
only a send timestamp and applications are expected to derive their own decode 
time based on the observed network timing jitter.

The next IETF meeting coincides with the MPEG4 decision meeting, so it is urgent that 
these issues are resolved. The last chance to make changes to MPEG4 version 1 will be 
at the MPEG meeting on 12-16 October. Since it is clear that only limited progress can 
be made in the short time available, it was decided to continue with the development 
of the payload format for MPEG4 elementary streams and to resolve the issues 
discussed. Once this is done and further experience has been gained with actual 
implementations, the group will revisit these issues.

The major discussion item for the second day was the advancement of RTP and the 
Audio/Video profile from proposed- to draft-standard status. A summary of the changes 
in draft-ietf-avt-rtp-new-01.txt was made by Stephen Casner (Cisco). These include:

- Added fudge factor in timer reconsideration
- Added fix for underestimate when using SSRC sampling if the group size 
decreases rapidly
- RTCP sender and receiver bandwidth may be specified as a parameter (rather than 
the default 5%)
- RTCP minimum interval may scale smaller for high bandwidth sessions, and the 
initial delay may be zero for unicast sessions
- Specified padding for RTCP only on the last packet
- Specified relative NTP uses the "best" platform clock
- Formal reference to IPsec for security (this concerns some people since RTP may 
be used in scenarios where the presence of IPsec cannot be guaranteed...)
- Partial conversion to SHOULD, MUST, MAY, etc
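
The effect of making the RTCP bandwidth fraction and minimum interval parameters can be 
sketched as below. This is a simplified, deterministic calculation under stated 
assumptions: it ignores the sender/receiver split and the randomisation of the interval, 
and the function name and argument names are illustrative.

```python
def rtcp_interval(members, avg_rtcp_size, session_bw,
                  rtcp_fraction=0.05, min_interval=5.0):
    """Approximate RTCP reporting interval in seconds.

    members        current estimate of the group size
    avg_rtcp_size  average RTCP packet size in bytes
    session_bw     session bandwidth in bytes per second
    rtcp_fraction  share of the session bandwidth given to RTCP (default 5%)
    min_interval   lower bound on the interval in seconds
    """
    rtcp_bw = session_bw * rtcp_fraction
    interval = members * avg_rtcp_size / rtcp_bw
    return max(interval, min_interval)
```

For example, rtcp_interval(1000, 100, 16000) is governed by the bandwidth share, whereas 
for a two-party session the result is simply min_interval. That is why a smaller minimum 
matters for small or high bandwidth sessions.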

In addition it has been decided not to make a number of the changes which have 
previously been suggested: 

- Ignore group size dropping to zero with reverse reconsideration. 
- No scaling of the RTCP interval larger since this could cause time-outs
- No changes to the jitter algorithm for multi-packet video frames
- No additional SDES items were defined (these can be registered with the IANA 
separately)
- No change to the definition of the RTCP RR loss fraction
- Nothing was added about translators adding random timestamp offsets

The issue of conditional vs unconditional reconsideration was discussed and it was 
noted that there is little to choose between these two algorithms in practice, and that 
unconditional reconsideration is the simpler to implement. The next revision will 
therefore only include unconditional reconsideration.
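
Unconditional reconsideration can be sketched roughly as follows: every time the RTCP 
timer fires, the interval is recomputed from the current group size estimate and the 
packet is sent only if that recomputed interval has really elapsed. The function and 
parameter names are assumptions, and the randomisation over [0.5, 1.5) times the 
computed interval is taken as given rather than drawn from the draft text.

```python
import random


def on_timer_expiry(tc, tp, members, interval_for, rand=random.random):
    """Handle expiry of the RTCP transmission timer.

    tc            current time
    tp            time the last RTCP packet was sent
    members       current group size estimate
    interval_for  function mapping group size to the deterministic interval
    rand          source of uniform random numbers in [0, 1)

    Returns (send_now, new_tp, next_timeout).
    """
    t = interval_for(members) * (0.5 + rand())
    if tp + t <= tc:
        # The recomputed interval has elapsed: send, then schedule the next report.
        t_next = interval_for(members) * (0.5 + rand())
        return True, tc, tc + t_next
    # The group has grown since the timer was set: back off without sending.
    return False, tp, tp + t
```

The conditional variant would recompute only when the group size estimate had changed, 
which needs extra state. That is the sense in which the unconditional form is simpler.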

A number of problems which have been discovered in the SSRC sampling algorithm 
were presented by Jonathan Rosenberg (Lucent). The problems with reconsideration 
and over-weighting of senders have been corrected in the current RTP draft. A problem 
remains when the group size decreases rapidly which results in members using SSRC 
sampling producing very inaccurate estimates of the group size. A solution using a 
"binning" algorithm is proposed in draft-ietf-avt-rtpsample-00.txt but this algorithm 
may be patented by Lucent (who are willing to license on "fair, reasonable and non-
discriminatory terms"). 

Much discussion occurred on this topic since it is considered undesirable to have 
patented technology as part of the main specification. The cleanest solution appears to 
be to move the SSRC sampling algorithms out of the main specification into a separate 
document. The main RTP specification would note that SSRC sampling may be 
desirable in certain cases and point to this new document for implementation advice 
and sample algorithms. This separation will occur with the next version of the RTP 
specification.
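
For readers unfamiliar with SSRC sampling, the basic idea can be sketched as below. The 
minutes do not describe the scheme itself, so the mask comparison shown here is an 
assumption about its general form. It is not the "binning" algorithm, and the names are 
illustrative.

```python
def is_sampled(ssrc, key, bits):
    """Keep state for an SSRC only if its low `bits` bits match those of `key`."""
    mask = (1 << bits) - 1
    return (ssrc & mask) == (key & mask)


def estimate_group_size(seen_ssrcs, key, bits):
    """Scale the number of sampled members up by the sampling ratio 2**bits."""
    kept = sum(1 for ssrc in set(seen_ssrcs) if is_sampled(ssrc, key, bits))
    return kept << bits
```

A member storing only one SSRC in every 2**bits still obtains a usable estimate while 
the group is stable. The difficulty discussed above arises when the group shrinks 
quickly and the sampling ratio no longer matches the real membership.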

The changes to the Audio/Video profile (draft-ietf-avt-profile-new-03.txt) are less 
extensive: the PureVoice codec has been added and assigned payload type 12. As 
discussed at the previous meeting this is the last static assignment which will be made - 
this policy is now stated explicitly in the draft. The static assignment of payload type 
77 to redundant audio has been removed since all known implementations use a dynamic 
payload type. References to MPEG1 system streams and MPEG2 program streams 
(RFC2250) using dynamic payload types have been added and a number of other
RFC references have been updated.

The new policy regarding static payloads needs to be better described in the next 
revision of the draft. The SDP modifiers to explicitly denote the RTCP fraction (if the 
default of 5% is not being used) have yet to be written, but it is felt that these should 
not be added to the A/V profile (since that should not be tied to SDP); rather, a new 
document should define them.

The MIME registration of payload formats has yet to be done. There are many open 
issues here regarding how this should occur, what information is bound to the names, 
how MIME is extended for types which cannot be represented in email, etc.

An update on the RTP payload for redundant audio was presented by Colin Perkins 
(UCL). This document (draft-ietf-avt-rtp-redundancy-revised-00.txt) updates RFC2198 
in the light of additional usage experience. The change is to specify that all packets in a 
redundant stream should be sent using the redundancy format, rather than sending the 
first packet(s) in a talkspurt using the payload format of the primary codec. This allows 
for explicit advertisement of the buffering requirements of a stream which simplifies 
implementations and removes the need for an out-of-band parameter to convey this 
information.
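
The redundancy format can be sketched as below, using the header layout of RFC2198: each 
redundant block has a 4 byte header (F bit set, 7 bit payload type, 14 bit timestamp 
offset, 10 bit block length) and the primary block has a 1 byte header (F bit clear, 
7 bit payload type). The function name and argument shapes are assumptions for 
illustration.

```python
import struct


def pack_redundant_payload(redundant, primary):
    """Build an RTP payload in the redundant audio format.

    redundant  list of (payload_type, timestamp_offset, data) for redundant blocks
    primary    (payload_type, data) for the primary encoding
    """
    headers, body = bytearray(), bytearray()
    for pt, ts_offset, data in redundant:
        if not (0 <= pt < 1 << 7 and 0 <= ts_offset < 1 << 14 and len(data) < 1 << 10):
            raise ValueError("field out of range for the redundancy header")
        word = (1 << 31) | (pt << 24) | (ts_offset << 10) | len(data)
        headers += struct.pack("!I", word)
        body += data
    primary_pt, primary_data = primary
    headers.append(primary_pt & 0x7F)  # F bit clear marks the final header
    return bytes(headers + body + primary_data)
```

With an empty redundant list the result is just the 1 byte primary header followed by 
the primary data, which is how the first packet of a talkspurt would look if it is sent 
in the redundancy format as the revision requires.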

An update on the RTP payload for generic forward error correction (draft-ietf-avt-fec-
03.txt) was presented by Jonathan Rosenberg (Lucent).  Changes since the previous 
draft include the addition of code fragments illustrating the decoding stage, support for 
FEC using Reed-Solomon codes, extension of the timestamp recovery to 56 bits and 
removal of the reference to the expired draft by Budge. A number of issues with the 
current draft were discussed including the required mask size: 24 bits is believed 
sufficient so the optional extension to 56 bits will be removed from the draft. 

It was also decided that parity FEC and Reed-Solomon codes are sufficiently different 
that this draft should be split into two. The resulting parity FEC payload format 
document is expected to be ready for last call after one further revision; the Reed-
Solomon payload format document will need further work over the coming months.
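
The principle behind parity FEC can be sketched as follows. This covers payload bytes 
only and assumes the length of the lost payload is known to the receiver. The payload 
format itself must also protect header fields, which this illustration omits, and the 
function names are assumptions.

```python
def xor_parity(payloads):
    """XOR a group of payloads together, padding shorter ones with zero bytes."""
    size = max(len(p) for p in payloads)
    parity = bytearray(size)
    for payload in payloads:
        for i, byte in enumerate(payload):
            parity[i] ^= byte
    return bytes(parity)


def recover(received, parity, lost_length):
    """Rebuild the single missing payload from the survivors and the parity packet."""
    return xor_parity(list(received) + [parity])[:lost_length]
```

Because XOR is its own inverse, combining the parity packet with every surviving payload 
leaves exactly the missing one. Only one loss per protected group can be repaired in 
this way.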

The meeting concluded with a reminder that a revised working group charter has been 
posted.  Comments and discussion of the proposed new charter and milestones should 
be directed to the mailing list.