Reported by Steve Casner/USC-ISI

Minutes of the Audio/Video Transport Working Group (AVT)


At the previous AVT meeting in Toronto, version 2 of the Real-time
Transport Protocol (RTP) was presented as documented in the July version
(-05) of the specification.  It was agreed to proceed with this version
of RTP as soon as the missing example algorithms and the ``to be
determined'' points in the specification were completed.  In the
interim, the authors have prepared a new draft (-06) filling in these
items and introducing a few small changes to address items missed
before.  At this meeting, Steve Casner presented a report on the new
draft and there were no objections to these changes.  However, there was
a surprising amount of debate about the jitter parameter in the
Reception Report which had not changed except that the algorithm had
been defined.  As a result, a second session was scheduled so that the
planned presentations could be completed.  During the second session, a
compromise solution was devised:  the jitter field remains unchanged,
but packet loss will be reported as both cumulative and short term, as
described below.  With this issue settled, the draft editing will be
completed and the draft will be submitted for Area Directorate review
and IESG Last Call as a Proposed Standard.

An interesting aspect of this meeting was that for the first time we
heard reports on implementations of RTP version 2, as outlined below.
There was also a presentation on the new draft Packet Format for
Encapsulation of MPEG in RTP. The slides for the presentations are
available from, as is the
file transcript.94dec, a more complete report on the meeting including a
rough transcript.

Changes in RTP Since July Draft

The two primary items left ``to be determined'' in the July RTP draft
(-05) were the algorithm for calculating the RTCP reception report
interval based on the observed session size and the algorithm for the
``jitter'' measure in the reception report.  Algorithms for both of
these have been included in the -06 draft.  However, in the rush to meet
the cutoff for Internet-Draft submissions before IETF, a couple of
details were left uncorrected in the report interval calculation
algorithm.  Also, two jitter algorithms were under discussion; what is
in -06 is the algorithm that Henning Schulzrinne has been using in
Nevot.  Instead, we have decided to use the algorithm that Steve McCanne
and Van Jacobson have implemented in vic because it is simpler and a
more straightforward first-order estimator.  See the discussion in the
next section.

There were several additional small changes, either from agreements at
the Toronto meeting, or to fix problems discovered in the interim:

   o As agreed in Toronto, the draft now specifies that the data and
     control ports are to be an even/odd pair.

   o The group decided not to partition the RTCP packet type space for
     profile- and payload-specific definitions because there is a
     problem with synchronization between the control and data streams.
     Instead, experiments with new packet types can use the APP type,
     and registration of successful types with IANA is encouraged.

   o It was unclear in -05 whether the numeric type values assigned to
     RTCP packet types began with 0 or 1.  In a note preceding this
     meeting, it was proposed that the RTCP packets be assigned type
     values 201-205 to enhance the probability of successful header
     validation or invalidation when comparing RTP versus RTCP which
     might be sent on the wrong port, or for comparison against some
     random or incorrectly decrypted packet.  No strong objections to
     renumbering were given.  There was some discussion of what values
     should be chosen, with 224-228 being another suggestion, but that
     is closer to all-1s which should be avoided the same as 0.  After
     more thought subsequent to the meeting, Casner suggests 200-204
     rather than 201-205 so that the SR/RR pair differ in only one bit
     to make the check simpler.  Also, following a suggestion from
     Wieland Holfelder, we will reserve the two payload type values in
     the RTP data packet header that would correspond to the low 7 bits
     of the RTCP packet types SR and RR. Since any stack of RTCP packets
     is to begin with SR or RR, only these need to be reserved.

   o In the -05 draft, most length fields had been changed to be
     zero-based, but the SDES item length was missed and is now changed.

   o Three new SDES items were added:  PHONE, TOOL, PRIV.

   o A one-octet length field was prefixed to the optional BYE reason
     string since its length was not defined before.

Note that the changes in RTCP type values, SDES item length, and BYE
reason length introduce incompatibilities with previous drafts.
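
The bit-level reasoning behind the 200-204 suggestion can be checked with
a short Python sketch.  It assumes SR is assigned 200 and RR 201, which
the discussion above implies but does not state outright; the helper
names are hypothetical and are not taken from the draft.

```python
# Assumed assignment: SR = 200, RR = 201 (not confirmed by the minutes).
RTCP_SR = 200
RTCP_RR = 201


def differing_bits(a, b):
    """Count the bit positions in which two type values differ."""
    return bin(a ^ b).count("1")


def reserved_payload_types():
    """Low 7 bits of SR and RR, i.e. the RTP payload type values that
    would have to be reserved so a data packet cannot look like them."""
    return sorted({RTCP_SR & 0x7F, RTCP_RR & 0x7F})


def first_rtcp_type_is_valid(packet_type):
    """A stack of RTCP packets begins with SR or RR.  Because the two
    values differ only in the lowest bit, one masked compare suffices."""
    return (packet_type & 0xFE) == RTCP_SR
```

With 201-205 the first two values (201 and 202) differ in two bits, so
the same single masked comparison would not work.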

The group also agreed on three additional points where the RTP
specification will be made more specific:

   o It will be specified that SR rather than RR will be sent if data
     was transmitted during the last interval or the previous one.  This
     provides some redundancy for loss of the last SR.

   o When no data has been heard from any source during a reporting
     interval, the receiver should still send an RR packet containing
     zero reception reports rather than omit the RR. This is so that the
     RTCP packet stack always begins with SR or RR.

   o In order to calculate the RTP timestamp to go in the SR packet, and
     in order to calculate jitter, it is necessary that the clock from
     which RTP timestamps are derived be monotonic and linear in time.
     Note that this refers to the clock, not the sequence of timestamps
     generated.  In particular, it does not preclude the sending of
     timestamps out of order in the MPEG encoding.
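
The first two points above amount to a small decision rule, sketched here
in Python.  The function and argument names are hypothetical; this is an
illustration of the rule as described, not text from the draft.

```python
def choose_report(sent_this_interval, sent_previous_interval,
                  reception_reports):
    """Pick the packet that starts the RTCP stack for this interval.

    SR is used if data was transmitted in the last interval or the one
    before it; otherwise RR is used.  An RR is still produced when the
    list of reception reports is empty.
    """
    if sent_this_interval or sent_previous_interval:
        kind = "SR"
    else:
        kind = "RR"
    return kind, list(reception_reports)
```

For example, a participant that stopped sending one interval ago still
gets ("SR", ...), and a silent receiver that heard nothing gets
("RR", []).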

Discussion of the Jitter Algorithm

The reception report in the RTCP SR/RR packet reports packet loss and
jitter.  The algorithm that will be specified in the RTP draft for
calculating the jitter parameter is based on the difference in packet
spacing at the receiver compared to the sender, or equivalently, the
difference in relative transit times, for a pair of packets:

    D(i,j) = (Rj - Ri) - (Sj - Si) = (Rj - Sj) - (Ri - Si)

Here S is the RTP timestamp from the packet, and R is the time of
arrival in RTP timestamp units.  Jitter is defined to be the mean
deviation (smoothed absolute value) of this difference:

    J = J + 1/16 (|D(i-1,i)| - J)
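
As an illustration, the two formulas above can be sketched in Python as
follows.  The function names are hypothetical, floating point is used
for readability, and timestamp wraparound is ignored; none of these
choices come from the draft.

```python
def update_jitter(jitter, prev_transit, send_ts, arrival_ts):
    """Apply one step of J = J + 1/16 (|D(i-1,i)| - J).

    send_ts is S, the RTP timestamp from the packet; arrival_ts is R,
    the arrival time in the same RTP timestamp units.  Returns the new
    jitter and this packet's relative transit time (R - S).
    """
    transit = arrival_ts - send_ts
    if prev_transit is None:
        # First packet: there is no pair yet, so J is unchanged.
        return jitter, transit
    d = abs(transit - prev_transit)   # |(Rj - Sj) - (Ri - Si)|
    jitter += (d - jitter) / 16.0
    return jitter, transit


def jitter_over(packets):
    """Run the estimator over (S, R) pairs in order of arrival."""
    jitter, prev_transit = 0.0, None
    for send_ts, arrival_ts in packets:
        jitter, prev_transit = update_jitter(
            jitter, prev_transit, send_ts, arrival_ts)
    return jitter
```

For packets with (S, R) of (0, 10), (160, 170) and (320, 346), the
transit times are 10, 10 and 26, so D is 0 and then 16, and J moves from
0 to 1.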

Two issues regarding the jitter parameter were debated in this meeting:

  1. Whether this algorithm, and in particular the gain parameter of
     1/16, was the correct choice, or whether the algorithm should be
     left to be profile specific.  Some went even further to suggest
     that the jitter parameter should be relegated to a profile-
     specific section of the reception report or left out entirely since
     its usefulness has not been demonstrated yet.

  2. Whether a short-term packet loss measure, useful for feedback
     control, should be reported instead of or in addition to the jitter
     measure.  In large sessions, the requirement to keep state on all
     the receivers to take differences between reports could become a
     problem.  Furthermore, the long interval between reports could mean
     that only one report is received from some receivers.

On the first issue, Van Jacobson emphasized that the jitter measure is
for network diagnostic purposes as well as for algorithms that adapt to
the behavior of the network.  Since this requirement is common across
all applications, we want to allow profile-independent monitors to be
able to interpret the jitter numbers, and therefore the jitter parameter
cannot be in a profile-dependent section of the report.  The usefulness
of the jitter parameter has not been proven, but the same is true for
all the other parameters in the reception report.  On the other hand,
experience with the MBone has shown a pressing need for mechanisms to
monitor distribution, and getting reports from the participants seems
like the only practical means.  Observations of the local statistics in
the vat program for packet loss and playout time variation, which is
derived from the jitter calculation, have shown a strong correlation
with the signal quality and establish a reasonable basis for inclusion
of these statistics in the reception report.  Packet loss tracks
persistent congestion while jitter tracks transient congestion.  If our
best guess turns out to be wrong with more experience, we can use the
report extension mechanism to test additional information, and once we
have got a much better guess then we can field RTP version 3 with a
revised report format.

Furthermore, to allow profile-independent monitors to make valid
interpretations of reports coming from different implementations, we
must also specify the algorithm and its parameters as part of the main
RTP protocol.  This algorithm is the optimal first-order estimator and
the parameter 1/16 is the optimal noise power reduction ratio for
situations where there is no model of the system.

To address the second issue, Ron Frederick suggested a compromise that
was accepted by the group as a whole.  The cumulative number of packets
received will be replaced by the cumulative number of packets lost
(calculated by the receiver as ``packets expected'' minus ``packets
received'').  Since this number is typically around two orders of
magnitude less than the number of packets expected, a comparable range
will be maintained if the packets lost field is reduced from 32 to 24
bits.  The top 8 bits will then be used to carry a relative measure of
packet loss that provides short-term information from a single report
packet.  This will be expressed as an 8-bit fraction of packets lost
during the last reporting interval.
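
A minimal Python sketch of the combined word follows.  The minutes say
only that the top 8 bits carry a fraction and the low 24 bits the
cumulative count; the scaling of the fraction (losses times 256 divided
by packets expected, rounded down) and the clamping of both fields are
assumptions made for this example, as are the function names.

```python
def pack_loss_word(expected_interval, received_interval, cumulative_lost):
    """Build the 32-bit word: 8-bit fraction lost, 24-bit total lost."""
    lost_interval = max(expected_interval - received_interval, 0)
    if expected_interval <= 0:
        fraction = 0
    else:
        # Assumed scaling: fraction of 256, clamped to fit 8 bits.
        fraction = min((lost_interval * 256) // expected_interval, 255)
    # Assumed behaviour: clamp the cumulative count to 24 bits.
    cumulative = min(max(cumulative_lost, 0), 0xFFFFFF)
    return (fraction << 24) | cumulative


def unpack_loss_word(word):
    """Return (fraction_lost, cumulative_lost) from the 32-bit word."""
    return (word >> 24) & 0xFF, word & 0xFFFFFF
```

With 100 packets expected and 75 received in the interval, the fraction
field is 64 (that is, 64/256 = 25%), independent of the running total.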

A companion change was made to allow correlation between the single
reception reports from multiple recipients:  the ``cumulative number of
packets expected'' is replaced by ``extended last sequence number
received.''  The difference between these two values is only that the
initial sequence number received is subtracted from the latter to
calculate the former.  Not subtracting the initial sequence number means
that the ratio of the two words above will no longer produce an accurate
overall loss rate.  However, an accurate calculation of the loss rate
for nearly the full session is possible by taking the difference in
these fields between the first and last reception reports from a
particular receiver, and then calculating the ratio.
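
The calculation in the last sentence can be sketched in Python as shown
here.  The record type and field names are hypothetical stand-ins for
the two report fields discussed above, and sequence number wraparound
beyond the extended value is not handled.

```python
from typing import NamedTuple, Optional


class ReceptionReport(NamedTuple):
    ext_last_seq: int      # extended last sequence number received
    cumulative_lost: int   # cumulative number of packets lost


def loss_rate_between(first: ReceptionReport,
                      last: ReceptionReport) -> Optional[float]:
    """Loss rate over the span between two reports from one receiver."""
    expected = last.ext_last_seq - first.ext_last_seq
    if expected <= 0:
        return None        # no packets expected between the reports
    lost = last.cumulative_lost - first.cumulative_lost
    return lost / expected
```

For example, reports of (1000, 5) and then (3000, 55) give 50 packets
lost out of 2000 expected, a loss rate of 2.5%.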

Reports on RTP Version 2 Implementations

Two presentations were given on implementations of RTP version 2 in
video tools.  Steve McCanne from LBL reported that the implementation of
RTP in vic was mostly straightforward; vic is producing reception
reports per the -06 specification, but not yet analyzing them.  Sources
for vic were released before the IETF meeting.  Both nv and ``Robust
H.261'' encodings are implemented, but Steve identified some problems
with the current specification for H.261 fragmentation.  He proposed to
make macroblocks the unit of processing by putting enough state
information into the header so each packet can be processed

Frank Kastenholz from FTP Software presented Loki, a new payload format
for RTP to carry the video formats of ``Video for Windows'' targeted at
the PC/Windows environment.  Processing load is shifted to the
transmitters where possible because there are fewer of them and they can
run on more powerful machines, whereas receivers may be slow machines
like 286s.
The protocol includes some additional application-specific RTCP control
packets, including some that are sent via unicast to a source.  This
will not work through RTP translators since RTPv2 no longer has the
``reverse control'' mechanism, so this is an issue to study.  The Loki
specification will be available as an Internet-Draft, and the
implementation will be available for anonymous FTP.

New Internet-Drafts on Video Payload Formats

Three new Internet-Drafts specifying how to encapsulate Cell-B, JPEG and
MPEG video in RTP were posted before the IETF meeting.

Gerard Fernando from Sun gave a presentation on the MPEG draft.  Since
Don Hoffman has made presentations on MPEG at previous AVT meetings,
Gerard concentrated on the RTP aspects.  Sun has implemented the MPEG
Elementary Streams encapsulation, which is the second of two defined in
the specification and the one targeted for use over the Internet.  For
this encapsulation, the header is now always 32 bits rather than a
minimum of 16 with an optional second 16, following the recommendation
made at the July AVT meeting in Toronto.

Gerard brought up one scenario of concern:  there will be cases where
MPEG signals are received from satellite transmissions in MPEG2
Transport Systems (MTS) format, which does not provide slice/macroblock
fragmentation information, and then retransmitted over the Internet.
How can this be made more robust to packet loss?  Van Jacobson suggested
that it is not very expensive to parse the stream to find the macroblock
boundaries.  This would allow translation to the MPEG ES encapsulation.

Future Activities

The RTP specification will be edited to produce a -07 draft
incorporating the changes outlined above along with additional
explanatory text for sections that readers have found unclear, for
example, on how to use extension mechanisms.  The draft will then be
submitted for Area Directorate review and IESG Last Call as a Proposed
Standard.  Future working group tasks and meetings will be considered as
needs arise.