Editor's note:  These minutes have not been edited.

Minutes of the Audio/Video Transport Working Group 

Reported by Steve Casner

1. Working group status

The primary output of the AVT working group is the Real-time 
Transport Protocol, which was published in January 1996 as a Proposed 
Standard RFC1889 along with the companion RTP profile for 
audio/video conferencing RFC1890. Progression to Draft Standard is 
discussed below. In addition, there are four Internet-Drafts awaiting 
publication which define the RTP payload formats for H.261, JPEG, 
MPEG and CellB video encodings. These drafts have just been passed by 
the IESG and will be sent to the RFC Editor for publication right after 
the IETF meeting. Work continues on the definition of additional 
proposed payload formats, one of which was presented at this meeting.

AVT met for two sessions at this IETF. The first session was dedicated 
to the major topic, header compression for RTP applications running 
over low-speed lines. A portion of the second session was given over to 
presentation of a signaling protocol that is more relevant to the 
MMUSIC area but did not fit in that group's schedule. The 
miscellaneous RTP topics comprising the remainder of the session are 
detailed in later sections of this report. 

2. Compression of RTP headers

In a presentation to the AVT working group at the March 1996 IETF 
meeting, Scott Petrack explained the need for compression of RTP 
headers in order to allow low data rate applications such as Internet 
telephony over 28.8 kb/s modems to use RTP. He outlined some 
techniques that could be used between cooperating endpoints to reduce 
the size of the RTP header. However, at that meeting and in subsequent 
discussions, some have argued that compression should instead be 
applied at the endpoints of slow links so that the IP and UDP headers 
may also be compressed.

2.1. Hop-by-hop compression

At this meeting, Steve Casner presented a proposal for hop-by-hop 
compression of IP/UDP/RTP headers developed with Van Jacobson and 
derived from RFC1144 TCP/IP compression. The basic idea comes from 
the observation that although there are several fields that vary from 
packet to packet in RTP, the differences are often constant from one 
packet to the next. For example, audio packets are often of constant 
duration, so the timestamp changes by a constant amount. For this case, 
all that must be communicated is an indication that the second-order 
difference is zero along with a small sequence number to detect packet 
loss between the compressor and decompressor. Additional bits are used 
to allow indication that individual fields have changed by an 
unexpected amount, in which case only the differences for those fields 
are appended in a compact encoding, rather than requiring the full 
uncompressed header be transmitted. This scheme compresses the 40 
bytes of IP, UDP and RTP headers down to 2-4 bytes for most packets.
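
As a rough illustration of the second-order-difference idea, the
following Python sketch tracks the RTP sequence number and timestamp
of a single stream. It is not the encoding defined in the draft: the
class names, the tuple-based message representation, and the 4-bit
link counter handling are assumptions made for this example, and
wraparound of the 16-bit and 32-bit RTP fields is ignored.

```python
class Compressor:
    """Toy delta coder for two RTP fields (hypothetical format)."""

    def __init__(self):
        self.last = None      # (seq, timestamp) of the previous packet
        self.delta = (0, 0)   # first-order differences last communicated
        self.link_seq = 0     # small counter for loss detection on the link

    def compress(self, seq, ts):
        n = self.link_seq
        self.link_seq = (n + 1) % 16
        if self.last is None:
            # No shared state yet: send the full field values.
            self.last, self.delta = (seq, ts), (0, 0)
            return ("FULL", n, seq, ts)
        d = (seq - self.last[0], ts - self.last[1])
        self.last = (seq, ts)
        if d == self.delta:
            # Second-order difference is zero: nothing else to send.
            return ("SAME", n)
        self.delta = d
        # Signed deltas, so a timestamp that moves backwards still works.
        return ("DELTA", n, d[0], d[1])


class Decompressor:
    def __init__(self):
        self.last = None
        self.delta = (0, 0)
        self.expected = 0

    def decompress(self, msg):
        kind, n = msg[0], msg[1]
        if kind != "FULL" and (self.last is None or n != self.expected):
            raise ValueError("loss on link; a full header is needed")
        self.expected = (n + 1) % 16
        if kind == "FULL":
            self.last, self.delta = (msg[2], msg[3]), (0, 0)
        else:
            if kind == "DELTA":
                self.delta = (msg[2], msg[3])
            self.last = (self.last[0] + self.delta[0],
                         self.last[1] + self.delta[1])
        return self.last
```

For a stream of constant-duration audio packets, every message after
the first two is of the "SAME" kind, which is the case the proposal
compresses most heavily.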

The proposal was well accepted by the group. One question was 
whether this scheme is too dependent upon characteristics of audio and 
video, but the delta coding of sequence and timestamp fields seems 
generally applicable. Timestamps such as those in MPEG which are not 
monotonic can be handled because the delta is signed. 

Further details were given in draft-casner-jacobson-crtp-00.txt, 
which was sent to the working group mailing list (rem-conf@es.net) but 
was somehow lost and failed to be officially posted. This draft is to be 
updated to include changes decided since it was sent to the list as well 
as completion of the protocol details left for finalization during initial 
implementation. The group agreed that this proposal should be taken 
as an AVT work item, so the updated draft will be titled 
draft-ietf-avt-crtp-00.txt. 

2.2. Need for an interim solution

The working group agreed that the hop-by-hop compression scheme 
should be completed and implemented as soon as possible. However, 
since publication, implementation and deployment of this scheme into 
the Internet infrastructure will take 12-18 months, vendors of Internet 
telephones and other applications had asked that AVT define an 
interim compression scheme that could be implemented right away in 
the endpoint applications alone. The tradeoff is that the compression 
gain is marginal (12 bytes compressed to 2 or 3) compared to compressing IP 
and UDP headers as well. Furthermore, the ability to measure packet 
loss and accurately reconstruct media timing would be reduced compared 
to the full RTP.
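
The tradeoff can be put in rough numbers using the header sizes quoted
in these minutes. The rate of 50 packets per second (20 ms of audio
per packet) is an assumption chosen for illustration and does not come
from the discussion; the 20-byte IP header is the usual size without
options.

```python
IP_BYTES, UDP_BYTES, RTP_BYTES = 20, 8, 12   # 40 bytes in total
PACKETS_PER_SECOND = 50                      # assumed: 20 ms per packet


def header_bits_per_second(header_bytes, pps=PACKETS_PER_SECOND):
    """Bandwidth consumed by headers alone, in bits per second."""
    return header_bytes * 8 * pps


# No compression at all.
uncompressed = header_bits_per_second(IP_BYTES + UDP_BYTES + RTP_BYTES)
# Interim idea: only the RTP header shrinks, from 12 bytes to 3.
rtp_only = header_bits_per_second(IP_BYTES + UDP_BYTES + 3)
# Hop-by-hop scheme: all 40 bytes shrink to 4 (upper end of 2-4).
hop_by_hop = header_bits_per_second(4)

print(uncompressed, rtp_only, hop_by_hop)
```

Under that assumption the headers cost 16000 b/s uncompressed,
12400 b/s with RTP-only compression, and 1600 b/s with hop-by-hop
compression, against a 28.8 kb/s modem line.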

As a strawman idea, Steve Casner presented a straightforward 
modification of the hop-by-hop scheme for use in compressing RTP 
alone end-to-end, but pointed out that the performance was likely to be 
unacceptable due to the higher loss rate and longer round-trip delay. 
Instead, Van Jacobson proposed that RTP be sent over TCP to take 
advantage of the installed base of RFC1144 TCP/IP compression. The 
RTP header could be compressed to an average of 1 byte if carried over 
TCP. The problem is that until congestion control algorithms such as 
Random Early Drop (RED) are deployed in routers, UDP traffic will 
displace TCP traffic, so vendors may be reluctant to use this TCP 
solution. Deployment of RED is expected within a few months. 

Scott Petrack presented some issues in defining an end-to-end protocol 
and making the transition from that interim solution to the complete 
solution. Since the end-to-end delay and loss rate are much higher than 
on a single link, the 4-bit sequence number of the hop-by-hop scheme 
would not be sufficient, but adding 8 more bits might be, assuming that the 
application is willing to proceed even when some packets are lost. An 
alternative would be to send RTP directly over IP to save the 8 bytes of 
UDP. However, this does not provide any means for multiplexing RTP 
and RTCP unless two IP protocol types were allocated (none have been).
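
The concern about sequence number width can be illustrated with a
short calculation. A k-bit counter cannot distinguish a gap of n
packets from a gap of n + 2^k, so at most 2^k - 1 consecutive losses
can be counted unambiguously (15 for 4 bits, 4095 for 12 bits). The
function name below is hypothetical.

```python
def apparent_gap(prev_seq, new_seq, bits):
    """Packets that appear lost between two received packets, given a
    sequence counter that wraps modulo 2**bits."""
    modulus = 1 << bits
    return (new_seq - prev_seq - 1) % modulus


# Suppose 20 packets are really lost after the packet numbered 3, so
# the next packet to arrive carries true sequence number 24.
short = apparent_gap(3, 24 % 16, bits=4)     # counter wrapped: undercounts
wide = apparent_gap(3, 24 % 4096, bits=12)   # gap measured correctly
print(short, wide)
```

With the 4-bit counter the receiver sees a gap of 4 rather than 20;
with 12 bits the gap is reported correctly.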

Scott noted that implementation of the IP/UDP/RTP compression 
scheme is elective for each applicable link and argued that 
applications would not be willing to transmit uncompressed RTP 
packets unless they could get a guarantee that compression was 
available on all slow links along the path.

Carsten Bormann noted that an RSVP bandwidth guarantee provides 
sufficient information given traffic control that considers header 
compression in determining the available bandwidth. This is part of 
his ISSLOW proposal in the ISSLL working group. If a more relaxed 
guarantee of compression availability separate from bandwidth 
availability is required, that should be defined as an additional type 
of service to be provided via RSVP rather than having AVT define a 
new mechanism specific to this problem. 

Greg Minshall suggested that applications could measure the 
performance of the network to decide if sufficient bandwidth was 
available. Applications might start off using a lower-bandwidth 
encoding with full RTP for interoperability, but switch to a higher 
quality encoding if hop-by-hop compression were available or when 
communicating with another copy of the same program such that a 
proprietary protocol could be used in place of RTP. 

There was substantial discussion centering around practicality and 
timing of an interim solution. Carsten Bormann claimed that Internet 
Telephony would not really be effective without the latency reduction 
mechanisms underway in ISSLOW and V.80 modems expected in 1997. 
Francois Menard noted that even with agreement on the use of RTP 
(with or without compression), interoperation would not be possible 
without agreement on voice coding and call control protocols as well, 
which will take time.

If the purpose of the interim solution is to get vendors to switch from 
proprietary protocols to RTP, then that goal will not have been 
achieved by defining a new, reduced version of RTP. Bob Webber felt 
that this would tend to cause confusion by presenting multiple solutions 
to vendors. Scott Petrack pointed out that suggesting a compressed form 
of RTP over compressed TCP could cause the same confusion, and that 
carrying full RTP on compressed TCP might therefore be preferable as 
an interim solution. 

Considering that the gain of compressing RTP alone would be relatively 
small and that it could not be standardized in the necessary timeframe, 
the prevalent position was that AVT should not define an interim 
solution. The consensus, supported by a straw poll of the meeting 
participants, was to move as quickly as possible with the complete 
solution of IP/UDP/RTP compression and to try to give the industry 
confidence that this solution will be put in place and will solve the 
problem.

3. Proposed new RTP payload formats

In the second session, Walid Dabbous and Mark Handley presented the 
redundant audio encoding technique and payload format developed by 
UCL and INRIA. Walid graphed the results of packet loss studies 
showing that most loss bursts are shorter than three packets in length, 
with single-packet losses predominating. Therefore, forward error 
correction via redundant audio appended to later packets can be 
effective, as demonstrated by intelligibility tests. The penalty is 
increased end-to-end delay since the receiver must allow time for the 
later packets carrying the redundant audio to arrive. Van Jacobson 
observed that the results of this study might be biased by the location 
of the test sites. Many paths on the MBone have shown a predominant 
loss pattern of 500 ms outages occurring at 30, 60 or 90 second intervals 
coincident with the routing updates in some routers. This would require 
the spacing between the original and redundant audio data to be 
increased beyond 3 or 4 packets. 

Mark Handley described the payload format used for redundant audio 
as defined in draft-perkins-rtp-redundancy-00.txt. This payload 
format is to be indicated by a single payload type of its own in the RTP 
header. Then, in the payload section of the packet, a separate block 
header is included for each encoding (original data and redundant 
encodings of earlier data). The block header includes the payload type 
of the individual block, the length of the block, and the offset of the 
timestamp for that block relative to the timestamp in the main RTP 
header. The original data occurs last and its block header includes only 
the payload type. The length is implied and may be greater than 
would fit in the 8-bit length field of the block header. No timestamp 
offset is needed since the RTP timestamp is used directly.
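
The block structure described above can be sketched as a small parser.
The bit layout here is an assumption made for illustration and is not
taken from the draft: the high bit of the first octet is assumed to
flag that another block follows, the timestamp offset is assumed to be
16 bits and to be subtracted from the RTP timestamp, and no validation
of truncated packets is attempted.

```python
import struct


def parse_redundant_payload(payload, rtp_timestamp):
    """Split a payload into (payload_type, timestamp, data) blocks.

    Assumed layout (hypothetical): each redundant block is
    [1 flag bit + 7-bit payload type][8-bit length][16-bit offset][data],
    and the final, original block is [0 + 7-bit payload type][data...].
    """
    blocks = []
    pos = 0
    while True:
        first = payload[pos]
        more, ptype = first >> 7, first & 0x7F
        pos += 1
        if not more:
            # Original data: length is implied by the end of the packet
            # and the RTP timestamp is used directly.
            blocks.append((ptype, rtp_timestamp, payload[pos:]))
            return blocks
        length, offset = struct.unpack_from("!BH", payload, pos)
        pos += 3
        data = payload[pos:pos + length]
        blocks.append((ptype, rtp_timestamp - offset, data))
        pos += length
```

A packet carrying one redundant block and the original data would
parse into two tuples, the redundant block first with its earlier
timestamp.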

Per the draft, the data for each redundant encoding follows 
immediately after its block header. Van Jacobson suggested appending 
the redundant encodings after the original data so that the first part of 
the packet would be in the same form as a packet without redundant 
encodings. However, that would still require parsing from the end to 
determine the length of the original data unless the redundant 
information was all included within a padding field as suggested by 
Henning Schulzrinne on the mailing list sometime earlier. Philip Lantz 
suggested that all the block headers be collected at the beginning of the 
payload section to simplify parsing; Mark Handley plans to make this 
modification. 

It was also suggested that the payload format could be generalized to 
allow multiple data types (such as audio and video) in a single packet, 
but there are two problems with that suggestion: the small length field 
in the block header depends upon the fact that compact encodings are 
used for redundant audio, and using a timestamp offset would not work 
for timestamps that are unrelated (as is the case for most RTP audio 
and video encodings).

There was no presentation on the draft H.263 video payload format 
since there have been no significant changes. Presentation of a payload 
format for G.723 audio was anticipated, but was not ready yet.

Another new submission is a proposal by Neil Harris to develop a 
profile for using RTP in professional audio and video production. Steve 
Casner put up an overview slide, but there was no presentation at this 
meeting since the author could not attend. However, working group 
participants are encouraged to read the draft which is available as 
draft-harris-rtp-pro-av-00.txt. 

4. Progressing RTP to Draft Standard

Sufficient time has elapsed so that the RTP spec may now be submitted 
for elevation from Proposed to Draft Standard status. Steve Casner 
brought up a few outstanding minor issues that must be addressed as 
part of this process. A wording change will be made to allow separate 
destination port numbers to be used for unicast RTP sessions, along with 
additional "rule changes" proposed by Michael Speer and Steve 
McCanne to support layered encoding schemes on multiple parallel RTP 
sessions. An update to the description of the SSRC loop/collision 
detection algorithm is needed to remove the restriction that the same 
source port be shared between the RTP and RTCP packets in a session. 
This is a simple change. However, the algorithm has not had sufficient 
operational experience in either its current form or with the proposed 
change. Assistance from implementers is solicited in testing the 
loop/collision detection algorithm in particular, but also in documenting 
overall interoperation of multiple independent RTP implementations 
as is required for progression to the Draft Standard stage. 

One collision detection issue posed by Karl Auerbach is the "hidden 
terminal" problem: two colliding sources A and B may not be able to 
hear each other due to multicast scope limits, but a third host C in 
between might be able to hear both. The use of network source addresses 
in the algorithm should allow C to distinguish the two sources and 
listen to only one. C could also intentionally cause a collision in both 
directions to induce A and B to change SSRCs. 
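
The way source addresses let host C tell two colliding sources apart
can be sketched as follows. This is a simplified illustration rather
than the algorithm in the RTP specification: the class name, the table
layout, and the policy of keeping whichever source was heard first are
assumptions for this example.

```python
class SsrcTable:
    """Remembers which network source address first used each SSRC."""

    def __init__(self):
        self.sources = {}   # ssrc -> source address first seen using it

    def accept(self, ssrc, source_addr):
        """Return True if the packet comes from the address already
        bound to this SSRC (or is the first use of the SSRC); return
        False if another address is using the same SSRC."""
        known = self.sources.setdefault(ssrc, source_addr)
        return known == source_addr


table = SsrcTable()
a = ("10.0.0.1", 5004)   # hypothetical address of source A
b = ("10.0.0.2", 5004)   # hypothetical address of source B
print(table.accept(0x1234, a), table.accept(0x1234, b))
```

Here C keeps listening to A and ignores packets from B that carry the
same SSRC.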

Since there will be edits to the RTP spec in progressing to Draft 
Standard, it will be necessary to issue the spec again as an updated 
Internet-Draft to allow comment on the changes. That draft will then 
be submitted for elevation.

5. Additions to RTCP

A portion of the session was given over to an MMUSIC topic that did 
not fit into that group's schedule. Scott Petrack presented his proposal 
for a new Simple Internet Signaling Protocol to set up and control RTP 
sessions. SISP is based on extensions to RTCP to take advantage of the 
communication path that RTCP provides and save bandwidth by 
utilizing the source description (SDES) information already 
transmitted rather than repeating it via another channel. The 
extensions to RTCP include a new RDES (receiver description) packet 
type to identify the intended callee, a new RCAP packet type to 
negotiate capabilities, and a new CP (call progress) item in the SDES 
packet to indicate "Ringing", "Busy", etc. The RDES packet would be 
sent to a well-known port, separate from the RTP streams, to initiate 
the setup.

There was substantial discussion of the overlap of this proposal with 
other protocols under development in MMUSIC (SIP, SCIP and SCCP). 
Another suggestion was that the subset of Q.931 used in H.323 
conferencing would serve the same purpose. Others expressed concern 
that only providing separate control for each medium misses a user 
requirement to be able to control some aspects of the multimedia session 
all at once. Greg Minshall expressed concern that the RCAP function 
would not be adaptable to new signaling needs that were likely to 
arise.

In short, there was not much support for SISP in its current form. 
However, the SISP proposal does point out the need for control functions 
during the course of a session which SIP, for example, does not address. 
A similar need arises for VCR-like controls when using RTP for video-
on-demand. Steve Casner presented a slide on the use of RTCP for VCR 
controls based on a suggestion from Larry Rowe. In a later MMUSIC 
session and an after-hours BOF led by Jeff Smith, a new line of 
discussion was started to consider control functions during a session for 
purposes such as call control as contained in SISP as well as recording, 
playback, and other functions. This discussion will continue on the 
MMUSIC mailing list (confctrl@isi.edu). 

6. Miscellaneous issues / logistics

The AVT meeting ended with only a couple of minutes available to 
introduce a few miscellaneous issues but not discuss them: should the 
working group define an API and a MIB for RTP? The MIB may be a 
requirement for RTP as a standards-track protocol, but there has not 
been a strong need for it because RTP monitoring is provided via RTCP.

Since there is continuing work to be done, and because the working group 
has reached the end of the existing charter, the charter must be 
revised. The chairman will take on this task. New work includes: 

- Finishing work on new payload formats
- Possibly adding variable reliability to RTP
- Managing the standards track transitions 

The group agreed to address these issues on the mailing list and to meet 
again at the December IETF in San Jose.