Minutes of the Audio/Video Transport Working Group

Reported by Colin Perkins.

The audio/video transport working group met twice at the 43rd IETF in
Orlando. The main agenda items were discussion of RTP multiplexing, and the
update of RTP and the audio/video profile for advancement to draft standard
status. In addition, a number of RTP payload formats and the RTP MIB were
discussed.

The meeting opened with a review of the working group status, and documents
completed since the last meeting. These include the payload formats for
H.263+ video (RFC2429), BT.656 video (RFC2431) and JPEG video (RFC2435),
together with AT&T's Error Resilient Video Transmission Technique (RFC2448).

The RTP MIB (draft-ietf-avt-rtp-mib-03.txt) was presented by Mark Baugher.
Changes since the previous draft have been motivated by comments and
review by Bert Wijnen and the ITU SG16. The main differences are as
follows: renamed MIB definitions to RTP-MIB and module to RTPMIB; changed
OID structure to current IETF conventions; several clarifications made to
DESCRIPTION clauses; use "noSuchInstance" rather than "noSuchObject" in
rtpRcvrRTT; changed media counters to be 64 bits; separated host and
monitor compliance sections; and use of InterfaceIndexOrZero instead of
redefining InterfaceIndex.

A number of open issues remain to be resolved: the use of multiple MIB
agents in switches, the optional nature of the interface index, and easier
selection of session entries by session address. It is expected that these
will be resolved within the next few weeks, at which time a new draft will
be issued ready for working group last call. A reference implementation of
the -02 draft is available, and work is progressing on the latest version.
The RTP MIB will be referenced by H.341, the ITU's H-series management
specification.

The RTP payload format for DTMF and other telephone tones
(draft-ietf-avt-dtmf-01.txt, draft-ietf-avt-tones-00.txt) was presented by
Scott Petrack. The justification for using a new payload format is to
reproduce the tones better than low-rate codecs can do; to apply
redundancy differently for tones and speech; to separate detection of tones
from their interpretation; and to preserve the sounds associated with
particular signals. Two payload formats are defined: a named signal event
payload and a tone frequency format payload. 

The named signal event payload includes a named event (dialtone, busy,
call-waiting, off-hook, etc), together with the volume and duration of that
event. The tone frequency format sends a representation of the actual set
of tones to be played, rather than their purpose (name), enabling a wider
range of tones and the playout of foreign tones. 

Both formats may be conveyed in a single packet, using RFC2198 redundancy,
for robustness. Mark Handley expressed some concern with the means by which
this is done: in particular it is unclear how to play out these tones when a
fraction of the tone is lost. 

The two drafts are to be merged within the next couple of months, and it is
the desire of the authors to move the result to the standards track soon
after. Some concern was expressed that this draft is not receiving
sufficient exposure in bodies such as the ITU which may be able to provide
relevant feedback - it must be ensured that these bodies have a chance to
comment before this document can progress in the IETF.

The discussion of RTP multiplexing started with a presentation by Jonathan
Rosenberg on issues in RTP multiplexing (draft-ietf-avt-muxissues-00.txt).
As noted in this draft, there are two scenarios which are of interest:
end-to-end and mid-network multiplexing. The scenario chosen will affect
the choices made in designing the multiplexing scheme. These choices affect
the delineation, identification, synchronisation and dynamism of the
multiplexed data.

There are three means by which data in a multiplexed packet may be
delineated: by explicit length indicators (which have maximum flexibility
but the largest overhead); implicitly based on payload type (many codecs
are fixed length blocks); or implicitly by out of band signalling (which
has the least bandwidth overhead, but requires signalling and makes
changing encodings or packetisation difficult).   If all multiplexed
packets are of the same duration this problem becomes much simpler.
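
As an illustration of the first option (explicit length indicators), the
following is a minimal sketch and is not taken from any of the drafts. The
one-byte length prefix and the function names mux_frames and demux_frames
are assumptions made for the example.

```python
def mux_frames(frames):
    """Concatenate frames, each prefixed by a one-byte length.

    The one-byte prefix is a hypothetical layout chosen for this sketch.
    """
    out = bytearray()
    for frame in frames:
        if len(frame) > 255:
            raise ValueError("frame too long for a one-byte length field")
        out.append(len(frame))
        out += frame
    return bytes(out)


def demux_frames(packet):
    """Split a packet built by mux_frames back into its frames."""
    frames = []
    pos = 0
    while pos < len(packet):
        length = packet[pos]
        pos += 1
        if pos + length > len(packet):
            raise ValueError("truncated frame")
        frames.append(packet[pos:pos + length])
        pos += length
    return frames
```

The cost of this flexibility is one extra byte per multiplexed frame, which
is the overhead the implicit schemes avoid.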

Multiplexed data may be identified by an explicit ID, where the number of
bits used depends on the number of simultaneous calls and the desired ID
reuse latency. This ID may be the RTP SSRC in some cases. Alternatively the
data may be channelised, where each user's data gets a slot in the packet.
This latter approach requires out-of-band signalling and, if silence
suppression is used, a bitmask to indicate which channels are active. The
overheads of the two schemes vary with the number of channels being
multiplexed, whether channels are active continually, and with the rate of
change of the set of multiplexed streams.
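
The channelised approach with a bitmask might look like the sketch below.
It assumes at most eight channels, a fixed frame size agreed out-of-band,
and a single leading mask byte; none of these details come from the drafts,
and the function names are hypothetical.

```python
def pack_channels(slots, frame_size):
    """Pack per-channel frames behind a one-byte activity bitmask.

    slots holds one entry per channel: bytes of length frame_size for an
    active channel, or None for a silent one. Bit i of the mask is set
    when channel i is present. (Hypothetical layout for illustration.)
    """
    if len(slots) > 8:
        raise ValueError("a one-byte mask covers at most eight channels")
    mask = 0
    body = bytearray()
    for i, frame in enumerate(slots):
        if frame is None:
            continue
        if len(frame) != frame_size:
            raise ValueError("all frames must have the agreed size")
        mask |= 1 << i
        body += frame
    return bytes([mask]) + bytes(body)


def unpack_channels(packet, n_channels, frame_size):
    """Recover the per-channel slots from a packet built by pack_channels."""
    mask = packet[0]
    pos = 1
    slots = []
    for i in range(n_channels):
        if mask & (1 << i):
            slots.append(packet[pos:pos + frame_size])
            pos += frame_size
        else:
            slots.append(None)
    return slots
```

With all channels active the only overhead is the mask byte; an explicit
per-frame ID would cost more here, but less when few channels are active.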

If each frame of multiplexed data within a packet has the same timestamp,
the individual timestamps may be elided, and replaced by the single
timestamp in the multiplexed packet's RTP header. If the timestamps are
close, offsets relative to the outer timestamp may be used rather than
complete timestamps (saving some number of bits). Alternatively if users
have uncorrelated frame start times it is necessary to send the complete
timestamp per user.
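
The offset option can be sketched as follows. The 8 bit offset width and
the function names are assumptions for illustration only; arithmetic is
modulo 2^32 because RTP timestamps are 32 bits wide and wrap.

```python
TS_MOD = 1 << 32  # RTP timestamps are 32 bits and wrap around


def encode_offsets(outer_ts, timestamps, offset_bits=8):
    """Express each timestamp as a small offset from the outer timestamp.

    Returns None when some offset does not fit in offset_bits, in which
    case complete per-user timestamps would have to be sent instead.
    """
    limit = 1 << offset_bits
    offsets = []
    for ts in timestamps:
        off = (ts - outer_ts) % TS_MOD
        if off >= limit:
            return None
        offsets.append(off)
    return offsets


def decode_offsets(outer_ts, offsets):
    """Rebuild the complete timestamps from the outer timestamp."""
    return [(outer_ts + off) % TS_MOD for off in offsets]
```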

The dynamism of codecs also affects the multiplexing format chosen. If
codecs never change there is no need to include a payload type indication
within the multiplexed stream, and out of band signalling may be used.
Similarly, if codecs change rarely, out of band signalling may still be
appropriate, depending on how often codecs may change (synchronisation
between the signalling of a payload type change and the media stream may be
complex). If codecs change frequently, some form of in-band payload type
indication is most appropriate. This need not necessarily be a complete
RTP PT value: if the set of allowable codecs is small, a mapping table may
be used instead.
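
A mapping table of this kind could be built as in the sketch below, which
is illustrative only. The function name build_pt_table is hypothetical, and
the example payload types (0, 4, 8, 18) are static assignments from the
audio/video profile chosen arbitrarily.

```python
def build_pt_table(allowed_pts):
    """Map a small set of RTP payload types onto short in-band indices.

    Returns (table, to_index, bits): table[i] is the full 7 bit PT for
    index i, to_index is the reverse mapping, and bits is the width of
    the index field needed to carry any entry.
    """
    table = sorted(set(allowed_pts))
    bits = max(1, (len(table) - 1).bit_length())
    to_index = {pt: i for i, pt in enumerate(table)}
    return table, to_index, bits
```

For four allowable codecs a 2 bit index is enough, compared with the 7 bits
of a complete RTP payload type.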

The marker bit may need to be transmitted depending on the use of the
multiplexed stream.

It was noted that this is a very large space, and a number of solutions to
the multiplexing problem are possible. This group has a number of solutions
presented to it, yet the precise problem definition for each of these has
not been enunciated. It may help to focus the discussion if the question
"why are you multiplexing?" is clearly answered, and if we derive a number
of scenarios which require common solutions.

Essentially, we need to focus on requirements. Trying to do generic
optimisation using a multiplexer is futile.

The first multiplexing proposal (an update to draft-ietf-avt-mux-rtp-00.txt) 
was presented by Barani Subbiah. The stated goals of this proposal are to
achieve the best possible fit with cellular/PSTN applications and to derive
a payload format suitable for use in a switched IP telephone network. They
are aiming for a simple format with a fixed header suitable for hardware
implementation, providing a compromise between bandwidth saving (in
addition to the outer RTP header, this proposal averages two bytes overhead
per multiplexed stream) and complexity.

The sequence number (2 bits) is new since the last draft; as a result, the
length field is reduced to 5 bits. Concern was expressed that a 5 bit
length field is insufficient for some audio codecs which may be desirable.
The use of the 2 bit sequence number was also questioned, since it wraps
after four packets and the loss of four consecutive packets is possible --
a longer field should probably be used.
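
The arithmetic behind both concerns can be worked through with a short
sketch. The helper names are hypothetical, and the units counted by the
length field are not stated in these minutes.

```python
SEQ_BITS = 2  # sequence number width in the proposal
LEN_BITS = 5  # length field width in the proposal


def max_field_value(bits):
    """Largest value an unsigned field of the given width can carry."""
    return (1 << bits) - 1


def seq_after(seq, sent, seq_bits=SEQ_BITS):
    """Sequence number carried by the packet sent `sent` packets later."""
    return (seq + sent) % (1 << seq_bits)


def inferred_losses(prev_seq, next_seq, seq_bits=SEQ_BITS):
    """Losses a receiver would infer between two consecutive arrivals."""
    return (next_seq - prev_seq - 1) % (1 << seq_bits)
```

A 5 bit field can express lengths up to max_field_value(LEN_BITS), that is
31 units. If four packets in a row are lost, the next arrival is five
packets after the previous one, and inferred_losses(0, seq_after(0, 5))
evaluates to 0: the receiver sees no gap at all.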

A transition bit is included to signal a change in the end-to-end flow
parameters, allowing one state change per RTT. It was noted that if all
packets within an RTT are lost, the state change will not be noted at the
receiver. It was also noted that payloads such as DTMF interspersed with
voice can cause change in payload type more often than once per RTT.

Tohru Hoshi presented draft-tanigawa-rtp-multiplex-01.txt. This is a simple
proposal where multiple RTP packets are concatenated for transmission in a
single UDP packet. The default packetisation interval specified for a codec
in the audio/video profile is used, so that no length indication is
necessary (a non-default interval, if desired, can be signalled
out-of-band). The draft now includes a section describing the efficiency
gains from using this proposal according to various metrics. Call setup
signalling
is also defined.

The next multiplexing proposal discussed was GeRM (draft-ietf-avt-germ-00.txt),  
presented by Mark Handley. The goal of this proposal is to transparently
multiplex a number of RTP streams. It operates using difference coding
between the headers of packets to be multiplexed together. Clearly this
will work better if the packet headers are similar (this can be achieved
between cooperating gateways, although the traffic pattern will affect
performance) but it will still work if the end points do not cooperate,
and will perform no worse than simply concatenating packets.
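
The general idea of difference coding between headers can be sketched as
below. This is not the GeRM wire format: headers are modelled as
dictionaries with four hypothetical fields, and only the fields that differ
from the previous header in the packet are carried.

```python
FIELDS = ("pt", "seq", "ts", "ssrc")  # simplified header model (assumption)


def diff_encode(headers):
    """Replace each header by the fields that differ from the previous one.

    The first header is carried in full because there is nothing to
    difference it against.
    """
    prev = {}
    deltas = []
    for header in headers:
        deltas.append({k: header[k] for k in FIELDS if prev.get(k) != header[k]})
        prev = header
    return deltas


def diff_decode(deltas):
    """Rebuild the full headers from the output of diff_encode."""
    prev = {}
    headers = []
    for delta in deltas:
        current = {**prev, **delta}
        headers.append(current)
        prev = current
    return headers
```

When successive headers share most fields the deltas are small; when they
share nothing every delta is a full header, which matches the observation
that the worst case is no worse than concatenation.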

The GeRM protocol is well suited for scenarios where a mix of RTP packets
are to be multiplexed, such as may be encountered in the transport of
MPEG-4 streams, or for use between a pair of cooperating gateways
multiplexing a large number of similar streams. It achieves considerable
flexibility, at the expense of complex parsing and greater bandwidth
overhead than other, less general, protocols. 

The final multiplexing presentation was produced by Dean Willis during the
meeting, so no draft exists.  The assumptions here are that large numbers
of streams are being carried between end-point pairs, fast interfaces with
minimal serialisation delay are used, and mixed codecs with silence
suppression exist. The goal is to increase overall "network efficiency" by
re-packing packets to increase the total MTU and reduce the number of
packets sent. Two alternatives were considered:
- RTP level: complex, no benefit to non-RTP traffic, issues with RTCP
- UDP level: brute force level simplicity, aggregate UDP flows between
  end-points, allows IPsec at multiplexing level, transparent to the
  application, allows mid-network multiplexing, less efficient
The UDP level approach is favoured.

Following the presentations a considerable amount of discussion ensued.
Concern was expressed that multiplexing is being used as an "RTP switching"
solution, with application level routing: it was noted that IP has a number
of perfectly reasonable routing algorithms already, and it is unnecessary
to re-invent these within RTP. Many people expressed concern that the
problem to be solved by multiplexing has not been clearly stated: is it to
reduce the number of packets sent? the number of bytes sent? to perform
application level RTP routing? etc. It is unclear that RTP multiplexing is
the correct solution here: a generic UDP multiplexing protocol (as in the
final proposal) may be more suitable in some cases.  Carsten Bormann
succinctly stated that if we are to define an RTP multiplexing scheme,
it should be an absolute requirement to preserve the integrity of the
RTP information.  If it does not, then it is not RTP multiplexing, it
is a new protocol.

Concern was expressed that multiplexing streams with different transport
level addressing into one is not clearly handled by these proposals. In
some cases, the SSRC is assumed to provide a unique stream ID, which is 
not necessarily the case across multiple streams. The handling of RTCP data
by a number of these proposals is also unclear.  The proposals need to
be extended to address these issues.

Some of the proposals specify a particular form of signalling in addition
to the payload format. These payload formats should be independent of the
signalling to be used. The proposals may want to express signalling
requirements, but should not tie the payload format to a specific scheme.

Since it is unclear whether a single protocol can satisfy multiple aims,
and which of the five proposals currently submitted to the group will
go forward, the authors and other working group participants are
requested to submit application scenarios in which multiplexing is to
be applied.  Within those scenarios, assumptions about traffic can be
made explicit.  We'll choose three or four scenarios and ask the
authors to use simulation or analysis to quantify the performance of their
proposals under those scenarios, in order to facilitate a fair comparison.

The next subject was the transport of MPEG-4 streams within RTP. A number
of AVT members participated in the MPEG meeting in Atlantic City in
October, leading to the formation of an ad-hoc group within MPEG to discuss
MPEG transport using IP. The work conducted in that group to date was
presented by Reha Civanlar. 

A number of alternatives for the transport of MPEG-4 streams in an IP
network were considered: 
        - directly on UDP
        - RTP followed by a full MPEG-4 SyncLayer packet
        - MPEG-4 SyncLayer packets mapped onto RTP packets
        - MPEG-4 elementary streams over RTP with natural payload formats
The preferred approach would be the last of these, but since the
ES interface is not a normative part of MPEG-4 this may not be feasible.

The approach chosen is, therefore, to map the MPEG-4 SyncLayer packets onto
RTP packets, such that the common pieces of the header reside in the RTP
header, with a small payload header providing the MPEG-4 specific features.
A single payload format is used for MPEG-4 streams transported within RTP,
and the MPEG-4 model is maintained (although not the precise packet
format). In this approach, an RTP multiplexing scheme is needed to fulfill
the role of FlexMux in MPEG-4. The GeRM proposal seems to be a good fit for
this.

An internet draft detailing this work is in preparation. Those who wish to 
participate in this work are encouraged to join the ad-hoc group's mailing
list: send email to 4onIP-sys-request@fzi.de in order to subscribe.

Christine Guillemot presented an RTP generic payload with scalable and
flexible error recovery (draft-guillemot-genrtp-00.txt). This draft takes
a somewhat different view of the problem of transporting MPEG-4 content and
is based on carrying elementary streams in a generic manner. The motivation
for this is to transport many types of stream whilst avoiding having to
define a payload format for each and allowing finer control of error
correction with a set of different FEC mechanisms and the possibility of
grouping AUs in a single packet.

The aims of this work are to factorize the common features instead
of developing specific formats for each codec/type of elementary stream and
to be able to identify repeated data so that the network adaptive layer can
identify and remove this if desired. The adaptation layer can add FEC to
entire packets or to portions of a stream within packets (adding redundancy
in a similar manner to RFC2198).

Concern was expressed that adding FEC to portions of a packet adds a lot of
extra complexity, and unless this FEC is much smaller than that which would
otherwise be present this complexity may not be justified.

Some concern was expressed that this document includes a number of payload
formats (redundancy, FEC, fragmentation and grouping) which may be better
separated. This clearly depends on the details of the stream which is being
packetised.

It is unclear that this format is suitable as a generic RTP payload
independent of MPEG-4, however it may work well as a general purpose
transport for MPEG-4 elementary streams.

Steve Casner described the changes made to the main RTP specification since
the last meeting. This is now stable, and unless major problems are found
is believed ready for last-call for draft standard. The changes since the
last meeting are described in draft-ietf-avt-rtp-new-02.txt and include:
        - SSRC sampling moved to separate draft (ietf-avt-rtpsample-01.txt)
        - Keep only unconditional reconsideration 
        - IANA considerations section added; no longer suggest experimental
          registration of values
        - Y2036 (in)consequences explained
        - convert to MUST, SHOULD, MAY
A plea was made for help checking:
        - Section 0: resolution of open issues
        - Section 6.2: RTCP transmission interval
        - Section 6.3: RTCP send and receive rules
        - Appendix A: does the code work?
        - Appendix B: changes from rfc1889
It was noted that the group must document "at least two independent and
inter-operable implementations from different code bases" of "all of the
options and features of the specification" in order to advance to draft
standard status (RFC2026). Colin Perkins volunteered to produce a draft
detailing those options and features as a checklist for vendors to check
compliance, and Jonathan Rosenberg volunteered to produce a draft detailing
tests for the timer reconsideration algorithms.  Since the meeting,
Jonathan has done a careful check of the code in Appendix A and found
several problems to be fixed.

The changes to the RTP profile (draft-ietf-avt-profile-new-04.txt) since
the last meeting are less advanced: a clearer statement of the new policy
of no more static assignments, and the addition of change bars. It is
still necessary to complete the update with MUST, SHOULD, MAY, etc and to
add text to allow the default of 5% of the session bandwidth for RTCP to
be overridden.

The registration of RTP payload format names as MIME types is still not
complete: Philipp Hoschka volunteered to work on this, and to work out the
details of the process. It is hoped that this may just be a statement we
can put in the profile to specify how the registration is done, without
changing the MIME registration process, but this is not yet clear.  

The working group has agreed in previous IETF meetings that any
additional RTCP SDES items should be defined in separate RFCs rather than
adding them to the base RTP spec.  This is in part to minimize changes
at the transition from Proposed to Draft Standard but also because we
did not want implementors to infer that all applications should
include all the SDES items.

Accordingly, Peter Parnes has written draft-parnes-rtp-sdes-00.txt to
propose the addition of new SDES items Nickname, Homepage,
Personal_image and Active_media.  Steve Casner presented this proposal
since the author could not attend.  This proposal is similar to the
set of potential new SDES items discussed at the 41st IETF in Los
Angeles, though Organisation was included previously and Active_media
is new here.  Comments are requested from the group as to whether this
is the right set of new SDES items to define at this point, or whether
some others should be added or some of these deleted.  The status of
this RFC would be Experimental rather than Proposed Standard.

Steve Casner repeated a concern expressed previously about the
inclusion of URLs in an SDES packet which may be sent to a large
multicast group, since simultaneous retrieval of these by many
receivers can cause implosion problems. This draft specifies that
retrieval should either be done only in response to direct user action
or if automated should be delayed by a random interval (after receipt
of the RTCP packet).  Is this specification sufficient?
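
One way an automated retrieval could be delayed is sketched below. The
choice of a uniform distribution, the scaling of the delay window with the
group size, and the parameter names are all assumptions made for the
example; the draft may specify something different.

```python
import random


def retrieval_delay(group_size, max_fetches_per_s=10.0, rng=random):
    """Pick a random delay (seconds) before fetching a URL from an SDES item.

    The window grows with the group size so that, if every receiver
    fetches, the aggregate rate seen by the server stays near
    max_fetches_per_s. Both parameters are hypothetical.
    """
    window = group_size / max_fetches_per_s
    return rng.uniform(0.0, window)
```

A receiver would need an estimate of the group size, which RTCP already
maintains for its own interval calculation.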

Mark Handley asserted that we shouldn't try to turn RTCP into a
general data transfer mechanism, but did favor adding Organization
since current practice is to include that information in the Name
item.  The consensus of the group was that it was reasonable to make
extensions to RTCP, but the additional information must be optional,
and should not be required for operation of the application.
Extension RFCs should include an applicability statement for each
item.  Further comments on this draft will be requested on the mailing
list to establish consensus for proceeding.

The SSRC sampling algorithms (draft-ietf-avt-rtpsample-01.txt) were
presented by Jonathan Rosenberg. These have been moved out of the main 
RTP specification because of the IPR issues (Lucent patent on the binning
algorithm). The changes to this document are:
        - Uniformity of SSRC values: recommend hashing the SSRC value,
          because some broken implementations do not choose uniformly
          distributed SSRC values
        - New section on performance of sampling in terms of coefficient
          of variation added
        - An explicit statement of the IPR issues and licensing terms needs
          to be added
Comments are requested. 
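
The reason for hashing can be shown with a small sketch. The use of
SHA-256 and of a simple threshold test are assumptions made for
illustration; the minutes do not record which hash or which sampling rule
the draft uses, and this is not the binning algorithm.

```python
import hashlib


def hashed_ssrc(ssrc):
    """Spread a 32 bit SSRC uniformly over the 32 bit space.

    SHA-256 is an arbitrary choice for this sketch.
    """
    digest = hashlib.sha256(ssrc.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big")


def in_sample(ssrc, fraction):
    """Keep roughly `fraction` of all sources, whatever their raw SSRCs."""
    return hashed_ssrc(ssrc) < int(fraction * (1 << 32))
```

If a broken implementation hands out SSRCs 0, 1, 2, ..., a threshold test
on the raw values would select every source, whereas the hashed values are
selected at close to the intended rate.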

The document giving guidelines for writers of RTP payload formats
(draft-ietf-avt-rtp-format-guidelines-01.txt) was noted as being
essentially complete. One more revision will be produced to include
some comments recently received, so others wishing to make comments
should do so as soon as possible.  That revision will be last called
for BCP status.

The payload format for PureVoice(TM) audio (draft-mckay-qcelp-01.txt) was
noted as being in working group last call still, pending resolution of a
question regarding the proposed scheme for encryption of the payload
data in a non-standard manner.  A compromise solution has been worked
out in offline discussions with the authors, and a revision of the
draft is expected soon so that last call can be completed if there are no
further objections.

The Generic FEC payload (draft-ietf-avt-fec-04.txt) was presented by
Jonathan Rosenberg. Changes include the removal of Reed-Solomon coding 
(to a separate draft) and mask extension. The examples and code have
been tested and bug-fixed, and the issues with encryption during key
changes have been resolved. This document is essentially ready for last
call as proposed standard; one more revision will be made to sort out the
MUST, SHOULD, etc. language. Comments are sought.
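
The minutes do not describe the coding itself. As background, the sketch
below assumes a simple XOR parity over a group of packets, with shorter
packets treated as zero padded; the function names are hypothetical and no
header or mask handling from the draft is modelled.

```python
def xor_parity(packets):
    """XOR a group of packets together, zero padding the shorter ones."""
    size = max(len(p) for p in packets)
    parity = bytearray(size)
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover(received, parity, lost_len):
    """Rebuild the single missing packet of a protected group.

    received holds the packets that did arrive and lost_len is the
    length of the missing one, which must be conveyed separately.
    """
    return xor_parity(list(received) + [parity])[:lost_len]
```

One parity packet protects the group against the loss of any single
packet; two losses in the same group cannot be repaired by this scheme.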

The Reed-Solomon draft (draft-ietf-avt-reedsolomon-00.txt) was also
presented by Jonathan Rosenberg. Help is required with this: if anyone
having expertise on the Reed-Solomon algorithm is interested in seeing this
work progress they should contact Jonathan, else the draft will not be
updated.

A proposal to use the RFC2198 redundancy format as a transport for
interleaved audio (draft-ietf-avt-interleaving-00.txt) was presented by
Colin Perkins. This may eventually be merged into that document, although
this is at present undecided. Comments are sought.
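
The effect of interleaving can be illustrated with a simple block
interleaver. This is a generic sketch, not the scheme in the draft: the
function names and the notion of a single "depth" parameter are assumptions.

```python
def interleave(units, depth):
    """Reorder audio units so that neighbours are sent `depth` slots apart."""
    return [u for start in range(depth) for u in units[start::depth]]


def deinterleave(units, depth):
    """Restore the original order of a sequence produced by interleave."""
    n = len(units)
    out = [None] * n
    pos = 0
    for start in range(depth):
        for idx in range(start, n, depth):
            out[idx] = units[pos]
            pos += 1
    return out
```

With six units and a depth of 3 the transmission order is 0, 3, 1, 4, 2, 5,
so the loss of two packets in a row removes units that are not adjacent in
the original stream, at the price of extra playout delay.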

The meeting concluded with a brief presentation on the proposed charter
revision. It was noted that this needs clarification regarding MPEG-4
payload formats, but is otherwise satisfactory. The revised charter
will be sent for IESG approval in the near future.