CURRENT MEETING REPORT

Reported by Steve Casner/USC-ISI

Minutes of the Audio/Video Transport Working Group (AVT)


1.  Overview

In the first AVT session, rough consensus was reached to submit the
revised Real-time Transport Protocol specification for area directorate
review and IESG Last Call as a Proposed Standard.  This revision,
denoted RTP version 2, incorporates changes requested by the first area
directorate review in November 1993.  It is the refinement by Steve
Casner, Ron Frederick, Van Jacobson and Henning Schulzrinne of the rough
protocol changes presented and discussed at the March 1994 IETF meeting
in Seattle.

An overview of the revised RTP was presented and discussed in the first
AVT session and part of the second.  The group concurred with the
choices made on all of the previously open issues.  It was agreed that
the hooks provided for protocol extensions were adequate for planned
experiments with mechanisms not included in the current protocol.  More
details on the issues discussed are included in the sections below.

There are a few explanatory paragraphs and example algorithm appendices
that need to be completed in the draft RTP specification; then it will
be submitted.  This should be done well before the next IETF meeting.

Later in the second AVT session, presentations were given on
specifications for the encapsulation in RTP of three different video
encodings:  Christian Huitema presented the H.261 encoding, Bill Fenner
presented the JPEG encoding, and Don Hoffman presented the MPEG
encoding.  These specifications will also be completed as
Internet-Drafts and then submitted as Proposed Standards.

On-line versions of the slides are available via FTP from ftp.isi.edu in
directory mbone/avt/toronto-july94/.


2.  Changes in RTP Version 2

In a message posted to the working group mailing list (rem-conf) in
June, a list of the protocol changes from version 1 to version 2 was
given.  The changes may be summarized as follows:


   o Carry the control and data traffic on separate ports

   o New control packet format, including mandatory reception reports

   o No options in data packet; CSRC count in fixed header

   o Locally unique 16-bit SSRC ID is now a 32-bit random global ID,
     always present (header now 12 octets)

   o Media-specific timestamps (versus fixed 65536 Hz rate)

   o Remove the application-level multiplexing of the Channel ID and
     move it to an encapsulation for the cases where it's needed

   o Application-specific sync marker bit

   o Encryption covers the whole packet; authentication omitted

   o With the change to global and potentially encrypted SSRC IDs,
     translators cannot do unicast reverse path control packets

   o Beginning of sync unit (BOS) option was eliminated; encodings that
     need the info should include it in their own headers

   o Length fields changed to count 32-bit words not including the
     length word to eliminate some validity checks at receiver


Comments were sought both on the changes that were proposed in a
finished form as well as on some open issues that were undecided.  Some
comments were received, but no major objections.  Although not all the
open issues were discussed, the authors completed the specification and
posted it as Internet-Draft draft-ietf-avt-rtp-05.txt before this
meeting.  During this meeting, each of the open issues was discussed as
outlined in the sections below, and the e-mail comments were addressed.



2.1  Provision for Testing Bolot/Turletti/Wakeman Scheme

RTPv2 includes a receiver-initiated congestion reporting scheme based on
multicast reception reports.  An alternative scheme is the one based on
sender-initiated polling described in a paper by Bolot, Turletti and
Wakeman at SIGCOMM '94.  This was implemented in RTPv1 using options
carried in the data packets.  Christian Huitema and Ian Wakeman argued
that it was important that RTPv2 not preclude further experimentation
with this scheme.  RTPv2 does not include control options, but does
provide a header extension mechanism intended for application-specific
extensions.

It was agreed that provision should be made for further experiments to
compare the two schemes, and that the header extension mechanism would
be suitable.  Details of the extension format for this use must be
defined in the Audio/Video profile that accompanies the RTP
specification.


2.2  Provision for Testing Unicast Feedback Mechanism

In the Bolot/Turletti/Wakeman scheme, reports from the polled receivers
are returned to the sender using the ``unicast reverse path control
packet'' mechanism of RTPv1.  This mechanism was eliminated in RTPv2
because the change of SSRC ID scheme and complete encryption of the
packets preclude translators from passing the reverse packets.  Van
Jacobson suggested that it would be better to multicast the reports even
under the sender-initiated polling scheme, and Ian Wakeman agreed this
might be the best method.  However, Christian Huitema did not want to
rule out entirely the possibility of sending unicast reports as it might
have lower cost, especially for symmetric sessions with many very small
sources.  Unicast control packets are also utilized in the H.261 video
packetization scheme.

It was again agreed that provision should be made for experimentation
with both unicast and multicast methods.  Since it is feasible to do
these experiments in scenarios that do not include translators,
receivers can use the IP/UDP source information to return unicast
replies directly.  The INRIA folks prefer that the source port of the
data packets be used as the destination port for the unicast replies.
INRIA will take responsibility for designing the details of unicast
packet use under RTPv2 in this scenario, and provide a report back to
the group on the results of the experiments.
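A minimal Python sketch of the reply path described above, assuming no translator is present: the receiver learns the data packet's IP address and UDP source port from `recvfrom()` and sends its unicast report straight back to that address and port. The helper `send_unicast_reply` and the payload strings are hypothetical, and loopback unicast stands in for the multicast data path.

```python
import socket

def send_unicast_reply(sock: socket.socket, data_source: tuple, report: bytes) -> None:
    """Hypothetical helper: return a report to the IP address and UDP *source*
    port of a received data packet.  Valid only when no translator is in the path."""
    sock.sendto(report, data_source)

# Loopback stand-ins for a media sender and one receiver.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.bind(("127.0.0.1", 0))       # the OS picks the sender's source port
receiver.bind(("127.0.0.1", 0))
sender.settimeout(2.0)
receiver.settimeout(2.0)
sender_addr = sender.getsockname()
receiver_addr = receiver.getsockname()

# The sender transmits a data packet (multicast in practice, unicast here).
sender.sendto(b"rtp-data", receiver_addr)

# recvfrom() reports where the data packet came from ...
data, source = receiver.recvfrom(2048)
# ... and the receiver replies to exactly that address and source port.
send_unicast_reply(receiver, source, b"loss-report")

reply, reply_from = sender.recvfrom(2048)
sender.close()
receiver.close()
```

Because the reply goes to the data packet's source port, the sender needs no separately advertised port for these reports.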


2.3  Need to Define Encapsulations for Protocols Other Than UDP

In RTPv2, control and data are sent on separate ports when using UDP.
For other protocols, either two associations must be used or some
encapsulation must be defined to provide multiplexing of control and
data on one association.  The RTPv2 specification does not specify any
such encapsulations; instead, that task is left to separate
specifications in the same manner as the IP encapsulation on Ethernet is
defined separately from the IP specification.  Don Hoffman suggested
that it was the responsibility of ancillary groups, such as ATM Forum
for AAL5, to decide whether to provide multiplexing or use two
associations, and to define the details of the encapsulations.  It was
agreed that the RTP specification would simply state the requirement for
the underlying layers to provide the multiplexing for separate control
and data.


2.4  Removal of FMT Control Packet

Van Jacobson, Ron Frederick and Christian Huitema all lobbied for the
removal of the FMT control packet.  Henning Schulzrinne was the primary
proponent but was not present.  Henning's argument is that due to the
combinatorics of encoding parameters, one cannot define ahead of time
all the payload types that you may need to use in a session.  The
creator of the session cannot know all the ones that others in the
session may want to use.  Van countered that only a small number of the
combinations will ever be used.  The group was asked about other uses of
dynamically defined payload types that might affect this decision, but
none were identified.  It was agreed that FMT should not be defined in
the main RTP specification, but that it could be defined in profiles as
needed.  For the initial Audio/Video profile, it was further agreed not
to include FMT at this time.  If a clear need is demonstrated later, we
can define it then, as a profile extension.



2.5  Authentication Omitted

The RTPv2 specification does not specify any authentication methods.
Encryption is defined because the primary security concern is for
privacy in conversations, which seems to be a stronger concern for audio
than for typed words.  Furthermore, Ron Frederick asserted that without
a key management system to use for the authentication, it's a moot
point.  There were no objections to omitting authentication from the
specification.



2.6  Rules for Sending Receiver Reports

There are a few items that remain to be fully specified in the RTP
draft.  One is to clarify when reception reports are required and when
they may be omitted.  The current statement is simply that they are
required when IP multicast is being used.  It may not make sense for the
specification to describe in detail under what circumstances reports
might not be used; we know about the IP multicast case, but we have not
really learned about the others yet (e.g., unidirectional systems).

Van Jacobson has promised to supply an algorithm for calculating the
interval between reception reports such that the overall rate of control
traffic from all sources is kept to a small fraction (1%) of the data rate.
This will go in an appendix of the RTP specification.

Van also brought up a new aspect to be considered for this algorithm
that was suggested by Henning Schulzrinne.  If more of the control
bandwidth is allocated to senders than to receivers so that they can
send CNAMEs more often, this will allow receivers to more quickly
establish the cross-media binding for functions such as audio/video
synchronization.  For example, giving 50% of the control bandwidth to
senders and the rest to receivers seems reasonable.  If all participants
get the same amount of control bandwidth, in a 1000-person conference it
might be 5 minutes before a new participant received a sender's CNAME.
The details need to be designed for clamping the sender control rate to
a reasonable maximum and ensuring that randomization of the sending
interval will avoid exceeding the overall control bandwidth on a
transient basis over the
scale of session sizes.  This new feature should be tested before it
goes into the specification.
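The interval calculation is still to be supplied, so the following is only a sketch of the idea under stated assumptions: control bandwidth is a fixed fraction of the session bandwidth, senders get their own share (50% here, as in the example above) while they are a minority, every participant's interval is clamped to a minimum, and the actual delay is randomized around the mean. The function names, the 5-second minimum, the [0.5, 1.5] randomization range and the example numbers are all hypothetical.

```python
import random

def mean_report_interval(members: int, senders: int, is_sender: bool,
                         avg_report_octets: float, session_bw: float,
                         control_fraction: float, sender_share: float = 0.5,
                         min_interval: float = 5.0) -> float:
    """Mean seconds between one participant's control reports (sketch).

    session_bw is in octets/second.  control_fraction, min_interval and the
    formula itself are illustrative assumptions; sender_share defaults to the
    50% figure discussed at the meeting.
    """
    control_bw = control_fraction * session_bw
    if 0 < senders < sender_share * members:
        # Senders are a minority: split control bandwidth between the two groups.
        if is_sender:
            group, bw = senders, sender_share * control_bw
        else:
            group, bw = members - senders, (1 - sender_share) * control_bw
    else:
        # Otherwise (no senders, or mostly senders) everyone shares equally.
        group, bw = members, control_bw
    interval = group * avg_report_octets / bw
    # Clamp so that no one, in particular a sender, reports too often.
    return max(interval, min_interval)

def randomized_interval(mean: float, rng: random.Random) -> float:
    """Draw the actual delay uniformly from [0.5, 1.5] x mean to avoid
    synchronized bursts of reports."""
    return mean * rng.uniform(0.5, 1.5)

sender_mean = mean_report_interval(1000, 2, True, 200, 16000, 0.05)
receiver_mean = mean_report_interval(1000, 2, False, 200, 16000, 0.05)
delay = randomized_interval(receiver_mean, random.Random(7))
```

With 1000 members, 2 senders, 200-octet reports and 5% of a 16000-octet/s session for control, each sender's unclamped interval would be 1 s and is clamped to 5 s, while each receiver reports about every 499 s, so a newcomer hears the senders' CNAMEs quickly.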



2.7  Bit Allocations, Lengths in 32-bit Units, Control Packet Types

The RTP specification defines a particular allocation of bits to
functions in the data header.  In particular, only 4 bits are allocated
to the count of CSRC identifiers following the header so that 7 bits may
be allocated to the payload type field.  There were no objections to
this allocation.
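To make the bit allocation concrete, here is a sketch of a parser for the 12-octet fixed header, assuming the layout later published for RTPv2 (2-bit version, padding and extension bits, 4-bit CSRC count, one marker bit, 7-bit payload type, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC, then the CSRC list). The draft discussed here may differ in detail, and profiles may trade marker bits against payload type bits.

```python
import struct
from typing import List, NamedTuple

class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    csrcs: List[int]

def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed header plus CSRC list (assumed RTPv2 layout)."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-octet fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                      # 4-bit CSRC count ...
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack("!%dI" % cc, packet[12:end]))
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),         # one marker bit ...
        payload_type=b1 & 0x7F,         # ... leaves 7 bits for the payload type
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
    )

# Version 2, one CSRC, marker set, payload type 0.
pkt = bytes([0x81, 0x80]) + struct.pack("!HII", 1, 160, 0xDEADBEEF) + struct.pack("!I", 0x1234)
hdr = parse_rtp_header(pkt)
```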

A recent change in the specification was that all length fields covering
areas required to be a multiple of 32-bits should be counted in units of
32-bits rather than octets, and should not count the first 32-bit word
that contains the length field.  This avoids a validity check that the
bottom two bits of length are zero and a second validity check that the
value is not zero.  No objections were voiced.
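A sketch of the length convention, assuming a 16-bit length field in octets 2-3 of each control packet's first word (as in the later published control packet format). Because the count excludes the word holding the length, every possible field value decodes to a legal, nonzero multiple of four octets, so a receiver can step through a buffer of concatenated control packets without either validity check.

```python
import struct

def length_field(total_octets: int) -> int:
    """Encode: packet size in 32-bit words, not counting the word with the length."""
    if total_octets < 4 or total_octets % 4:
        raise ValueError("a control packet is a whole number (>= 1) of 32-bit words")
    return total_octets // 4 - 1

def packet_octets(length: int) -> int:
    """Decode: every 16-bit value is a legal, nonzero multiple of 4 octets."""
    return (length + 1) * 4

def split_compound(buf: bytes) -> list:
    """Step through concatenated control packets using only the length fields.
    Trailing bytes shorter than one word are ignored."""
    packets, pos = [], 0
    while pos + 4 <= len(buf):
        (length,) = struct.unpack_from("!H", buf, pos + 2)   # assumed position
        end = pos + packet_octets(length)                    # always advances >= 4
        if end > len(buf):
            raise ValueError("length field runs past the end of the buffer")
        packets.append(buf[pos:end])
        pos = end
    return packets

compound = bytes([0x80, 200, 0, 1]) + bytes(4) + bytes([0x80, 202, 0, 0])
parts = split_compound(compound)
```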

Steve Casner proposed that the control packet type space be partitioned
among the main specification, profiles, and applications within a
profile, as was done with option codes in RTPv1.  This allows profiles
and applications to define types without conflicting with each other or
future definitions in the main specification.  This topic was not yet
addressed in the specification, but was agreed and will be added.


2.8  RTP Timestamps and Relationship to Real Time

Although it was not listed as an open issue, some questions were raised
about how the RTP timestamp should be related to real time for purposes
of synchronization.  Christian Huitema pointed out that the RTPv1
timestamp provided the relationship to real time directly in the signal
stream where it fits naturally.  In RTPv2 the relationship is carried in
the control packets to optimize data packet processing, and this may be
less convenient for some implementations.

Julio Escobar noted that for some applications such as data fusion, the
limitations on control traffic bandwidth might make the delay before
synchronization too long.  For such applications, the profile may
specify that the RTP timestamp will carry part of a real-time timestamp
and/or that additional real-time timestamp information may be carried in
a header that's part of the encoding or in an RTP header extension.
However, the RTP timestamp is supposed to have a random initial offset
for stronger encryption, so for the RTP timestamp to carry part of an
NTP timestamp this offset must be communicated to the receivers out of
band so it can be subtracted.
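A sketch of the offset arithmetic, assuming a profile that puts the low 32 bits of real time, counted in media clock units, into the RTP timestamp. The sender adds a random 32-bit offset; a receiver that has learned the offset out of band subtracts it modulo 2^32. The function names and the 8000 Hz example rate are illustrative.

```python
import secrets

MOD32 = 2 ** 32

def rtp_from_real_time(real_seconds: float, clock_rate: int, offset: int) -> int:
    """Sender: real time in media-clock units, plus the random offset, mod 2**32."""
    return (int(real_seconds * clock_rate) + offset) % MOD32

def real_time_part(rtp_ts: int, offset: int) -> int:
    """Receiver: subtract the offset learned out of band (result is mod 2**32)."""
    return (rtp_ts - offset) % MOD32

offset = secrets.randbits(32)              # chosen once per session by the sender
now = 3000000000.5                         # example real time, in seconds
ts = rtp_from_real_time(now, 8000, offset)
recovered = real_time_part(ts, offset)
```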

Christian said the IVS implementors had also observed a problem that the
audio input on some workstations skips samples under heavy load, thereby
causing the media clock to drift with respect to real time.  It should
be possible for the normal playout buffer adaptation to accommodate
this.  For synchronized playback, the relationship to real time may be
adjusted at the next start of talkspurt following each Sender Report
control packet that is received.  These must be sent often enough that
the drift out of sync does not become too large in between, which
relates back to the control packet bandwidth limit.
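A sketch of how a receiver might map RTP timestamps to the sender's real time using the (RTP timestamp, real time) pair from the most recent Sender Report, assuming the sender's media clock rate is known. Interpreting the 32-bit difference as signed keeps the mapping correct across timestamp wraparound; replacing the pair at each new Sender Report (applied at the next talkspurt start) bounds how far drift from skipped samples can accumulate.

```python
def rtp_to_real_time(rtp_ts: int, sr_rtp_ts: int, sr_real_time: float,
                     clock_rate: int) -> float:
    """Map an RTP timestamp to sender real time (seconds) via the latest
    Sender Report's (RTP timestamp, real time) pair."""
    diff = (rtp_ts - sr_rtp_ts) % 2 ** 32
    if diff >= 2 ** 31:          # treat the 32-bit difference as signed
        diff -= 2 ** 32
    return sr_real_time + diff / clock_rate

# Example: 8000 Hz audio, Sender Report taken just before the timestamp wraps.
after = rtp_to_real_time(704, 4294967000, 1000.0, 8000)          # 1000 ticks later
before = rtp_to_real_time(4294966000, 4294967000, 1000.0, 8000)  # 1000 ticks earlier
```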


3.  Open Questions about Audio/Video Profile

In addition to the open issues regarding the RTP specification itself,
there were a few open issues to be settled for the specification of the
initial Audio/Video profile.  These are described below.


3.1  Number and Meaning of Marker Bits

The RTP specification allows a profile to trade off the number of marker
bits and payload type bits in the second octet of the data header.  The
proposal for the audio/video profile is to have one marker bit, and that
it would mark the start of a talkspurt for audio and the end of a frame
for video.  There was some discussion of the value of marking the end of
a talkspurt, but Van Jacobson argued that the functions to be performed
were independent of the bit.  The choice of marker bit was accepted by
the group.
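The proposed marker semantics can be sketched as follows, assuming silence suppression for audio (silent periods are not sent) and a simple byte-count split for video frames. Both helpers are hypothetical illustrations, not part of the profile.

```python
def video_packets(frame: bytes, max_payload: int) -> list:
    """Split one video frame into payloads; the marker is set on the last one."""
    chunks = [frame[i:i + max_payload] for i in range(0, len(frame), max_payload)] or [b""]
    return [(chunk, i == len(chunks) - 1) for i, chunk in enumerate(chunks)]

def audio_markers(voice_active: list) -> list:
    """Per-interval voice activity in; marker bits for the packets actually sent out.
    Only active intervals are sent; the first one after silence starts a talkspurt."""
    marks, previous = [], False
    for active in voice_active:
        if active:
            marks.append(not previous)
        previous = active
    return marks
```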


3.2  Default Encryption Method

Encryption at the RTP level is defined to cover the entire packet, and
header validity checks are used to verify decryption with the correct
key.  The specification also identifies an alternative to not use
encryption at the RTP level, but instead to allow both unencrypted and
encrypted payload types to be defined; for example, there could be two
payload types, one for unencrypted PCM and one for encrypted PCM.  This
allows feeding an encrypted, compressed stream to hardware that expects
such a stream.
It was proposed that the audio/video profile should specify RTP-level
encryption as the default, based on the general principle to encrypt all
information that does not have to be left in the clear.  This was
accepted by the group.
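A sketch of key verification by header validity checks after whole-packet decryption. The checks (version 2, an expected payload type, a CSRC list that fits) are assumptions modeled on the fixed header, and a repeating-key XOR stands in for the real cipher purely so the example runs; it is not secure. A wrong key yields a plausible header only by chance, so the checks reject it with high probability rather than with certainty.

```python
import itertools

def toy_xor(data: bytes, key: bytes) -> bytes:
    """Stand-in cipher for illustration only (repeating-key XOR, NOT secure).
    XOR is its own inverse, so this both encrypts and decrypts."""
    return bytes(b ^ k for b, k in zip(data, itertools.cycle(key)))

EXPECTED_PAYLOAD_TYPES = {0, 8}     # hypothetical: types this receiver expects

def header_looks_valid(packet: bytes) -> bool:
    """Plausibility checks on the decrypted fixed header (assumed layout)."""
    if len(packet) < 12:
        return False
    version = packet[0] >> 6
    csrc_count = packet[0] & 0x0F
    payload_type = packet[1] & 0x7F
    return (version == 2
            and payload_type in EXPECTED_PAYLOAD_TYPES
            and len(packet) >= 12 + 4 * csrc_count)

def decrypt_with_candidates(packet: bytes, keys):
    """Try each candidate key; accept the first that yields a valid header."""
    for key in keys:
        clear = toy_xor(packet, key)
        if header_looks_valid(clear):
            return key, clear
    return None, None

plain = bytes([0x80, 0x00]) + bytes(10) + b"audio"
wire = toy_xor(plain, b"k3y!")
key, clear = decrypt_with_candidates(wire, [b"bad!", b"k3y!"])
```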


3.3  Relationship Between Control and Data Port Numbers

The RTP specification currently defines the default relationship between
the control and data port numbers to be ``control = data + 1,'' but
allows profiles to define a different relationship.  Van Jacobson
proposed to change this default to be more strict:  that the data port
must be even (making the control port odd), and that we use this choice
in the audio/video profile.  This change would allow a network provider
to notice traffic on either port and find the control channel to monitor
without having any external information about the conference.

This proposed change was agreed, and in addition it was agreed that
both the address allocator and the media applications should force the
data port number to be even.  This policy could be implemented only in
the address allocator, such as the sd tool.  However, since the current
implementation of sd does not force the data port to be even, it was
agreed that enforcing the policy in both places would ensure that it was
upheld and avoid compatibility problems.
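A sketch of the agreed port rule, applied in both the address allocator and the applications: force the data port to be even and derive the control port as data + 1. The same arithmetic lets a monitor that sees traffic on either port find the other. Rounding an odd request down (rather than up) is an arbitrary choice here.

```python
def allocate_ports(requested: int) -> tuple:
    """Force the data port even; the control port is data + 1 (odd)."""
    data = requested & ~1        # round an odd request down to the even port below
    return data, data + 1

def ports_from_observed(port: int) -> tuple:
    """A monitor seeing traffic on either port can recover both."""
    data = port & ~1
    return data, data | 1
```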



4.  Profiles for Packetization of Video Encodings


During the second session, presentations were given on the
specifications of how the H.261, JPEG and MPEG video encodings should be
packetized for carriage over RTP. These specifications will be
companions to the RTP specification.



4.1  Packetization of H.261 Video Encoding
 

Christian Huitema gave a presentation on the revision of the H.261 video
packetization specification.  This encoding works without the H.221
bit-level multiplexing that is used with H.261 over circuits, carrying
GOBs (groups of blocks) in packets instead.  INRIA implemented
compression in software; UCL has interfaced a hardware codec and
stripped off the H.221 framing.  The packet format allows for arbitrary
bit alignment of the data to accommodate the hardware codecs.

After the RTP header, there is a 16-bit header that describes the format
of the encoding that follows.  Included are the bit positions of the
starting and ending bits within their bytes.  These are now constrained
to be zero (byte aligned) except at the beginning and end of a GOB. Also
in the header are several flags and the image size.

Van Jacobson requested a change to allow reassembly of the packets of a
GOB into a contiguous buffer even when packets arrive out of order.  The
contiguous buffer permits a simpler and faster decoding loop.  This can
be achieved by establishing the rule that all packets of a GOB other
than the last are the same size; or, alternatively, by adding a fragment
offset field to the H.261 packetization header.  Christian preferred the
first option because it did not introduce an incompatibility in the
packet format and did not add more overhead.  It was agreed to make this
change in the specification.
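A sketch of why equal fragment sizes allow in-place reassembly: if every fragment of a GOB except the last has the same size, a fragment's offset is its position in the GOB times that size, so each packet can be copied into the contiguous buffer as soon as it arrives. The class is hypothetical and assumes the receiver knows the sequence number of the GOB's first packet and can tell which fragment is last.

```python
class GobBuffer:
    """Reassemble one H.261 GOB in place from out-of-order fragments (sketch).

    Assumes every fragment but the last is exactly frag_size octets, and that the
    receiver knows the GOB's first sequence number and can recognize the last
    fragment (both hypothetical here).
    """

    def __init__(self, first_seq: int, frag_size: int, max_fragments: int = 64):
        self.first_seq = first_seq
        self.frag_size = frag_size
        self.max_fragments = max_fragments
        self.buf = bytearray(frag_size * max_fragments)
        self.received = set()
        self.count = None          # number of fragments, known once the last arrives
        self.end = None            # total GOB length in octets

    def add(self, seq: int, payload: bytes, is_last: bool) -> None:
        index = (seq - self.first_seq) % 65536        # handles sequence wraparound
        if index >= self.max_fragments or len(payload) > self.frag_size:
            raise ValueError("fragment does not fit the buffer")
        if not is_last and len(payload) != self.frag_size:
            raise ValueError("only the last fragment may be shorter")
        offset = index * self.frag_size               # position follows from the index alone
        self.buf[offset:offset + len(payload)] = payload
        self.received.add(index)
        if is_last:
            self.count = index + 1
            self.end = offset + len(payload)

    def complete(self) -> bool:
        return self.count is not None and len(self.received) == self.count

    def data(self) -> bytes:
        return bytes(self.buf[:self.end])

gob = GobBuffer(first_seq=65535, frag_size=4)
gob.add(1, b"CC", is_last=True)        # arrives first, written at offset 8
gob.add(65535, b"AAAA", is_last=False)
gob.add(0, b"BBBB", is_last=False)
```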

Steve Casner pointed out that the recent draft still requires some
changes in packet formats to reflect the use of RTCP control packets on
the control port rather than options in the data packets.  As was noted
above, some detail on the use of unicast reverse packets must also be
specified.  When these steps are completed, it was agreed that this
draft should be submitted in conjunction with the RTP specification as a
Proposed Standard.



4.2  Packetization of JPEG Video Encoding

Bill Fenner gave a description of the JPEG over RTP specification which
has resulted from discussions with Ron Frederick, Steve McCanne and
Lance Berc.  (See the slides for details.)  An Internet-Draft on the
JPEG packetization specification will be produced by the next IETF
meeting in December.

The encoding has a 64-bit header including a fragment offset since it is
not possible to guarantee same-size packets in JPEG. JPEG markers are
defined to be 0xFF bytes in the data stream; if you have a hardware
codec that does not support this, you have to remove them in software.
(Since there are also hardware codecs that require the 0xFF stuffing,
you cannot always win, and including them allows additional functions.)
The only JPEG markers supported are restarts which allow recovery in
case some data is lost.

A type field has replaced the collection of individual parameters in the
previous version of this specification.  Types 0-127 will be statically
defined, with type 0 being YUV 4:2:2 and type 1 being YUV 4:2:0.  Since
not all hardware supports restarts, type 0 is defined to exclude them to
maximize interoperability.  Restart codes will be supported in future
types once all the details are determined.  Types
128-255 are dynamically defined by the session protocol or by a control
packet, basically by sending all of the JFIF header describing that
type.


4.3  Packetization of MPEG Video Encoding

Don Hoffman gave an update on the changes in the MPEG packetization
draft for RTPv2.  (The Cell-B draft will just be re-issued and discussed
via e-mail.)  MPEG-2 is in development as an ISO/IEC standard.  In the
MPEG profile for RTP, two formats are proposed.  The first translates
and encapsulates the information in the MPEG-2 Systems environment for
interoperation with other transport mechanisms.  The second is a much
simpler format for ``native'' Internet uses (eliminating a lot of the
application-level functionality that does not apply).  It is expected
that MPEG hardware will provide an interface at the Packetized
Elementary Stream (PES) level to make this possible.  For both the MPEG
and Cell-B specifications, the goal is to have Internet-Drafts completed
by the next meeting.

This specification uses the 90 kHz MPEG presentation timestamp clock for
RTP timestamps.  There is a transport header at the start of the RTP
payload that carries a translation of the MPEG transport information.
The transport header includes some optional fields whose presence is
indicated by a bit field in the first word.

There is one issue with regard to MPEG timing.  There are I, P and B
frames that are produced and interpolated at the receiver.  The output
from the encoder is not in temporal order; it is in frame dependency
order.  Therefore, the presentation timestamps in the RTP header will be
transmitted out of order with respect to the sequence numbers.  The
group did not see this as a problem.
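A sketch of the ordering effect, assuming one RTP packet per frame, a 30 frames/s source, and a simple encoder that sends each I or P frame ahead of the B frames that precede it in display order. Sequence numbers follow transmission order while the 90 kHz timestamps follow display order, so the timestamps are not monotonic.

```python
MPEG_CLOCK = 90_000      # 90 kHz presentation-timestamp clock

def transmit_schedule(display_types: str, fps: int = 30,
                      first_seq: int = 0, first_ts: int = 0) -> list:
    """Return (seq, frame_type, rtp_timestamp) in the order frames are sent."""
    ticks = MPEG_CLOCK // fps     # 3000 ticks per frame at 30 frames/s
    order, pending_b = [], []
    for index, kind in enumerate(display_types):
        if kind == "B":
            pending_b.append(index)         # a B frame needs the *next* reference first
        else:
            order.append(index)             # send the I/P reference frame ...
            order.extend(pending_b)         # ... then the B frames it completes
            pending_b = []
    order.extend(pending_b)
    return [((first_seq + n) % 65536, display_types[i], (first_ts + i * ticks) % 2 ** 32)
            for n, i in enumerate(order)]

schedule = transmit_schedule("IBBPBBP")
```

Here the sequence numbers run 0 through 6 while the timestamps jump forward for each P frame and back for the B frames that follow it.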

For the PES encapsulation, payload types will need to be assigned for
MPEG-1 video, MPEG-2 video, and MPEG-2 audio.  A 16-bit header was
proposed at the start of the payload to carry some flag bits and a slice
counter.  One of the flag bits indicates that another 16 bits follow to
carry the
macro-block absolute position field.  However, Ron Frederick suggested
that 32-bit alignment was valuable, so the second 16 bits should always
be included.  Ron agreed this was a worthy consideration.


5.  Conclusion

During this meeting, the group agreed on essentially all of the open
issues for the RTP specification.  At the end of the meeting, the group
was asked for a show of hands from those who thought the specification
choices that had been made were fine, and that we should proceed with
filling in the example algorithms and completing the areas in the
specification where it now says ``to be determined'' and then submit
this protocol specification to the Area Directorate again for
publication as a Proposed Standard RFC. The chair interpreted the
response as consensus that we should proceed.  There is now nothing in
the way except completing the remaining details; the people who are
responsible know who they are.  The specification should be completed
and submitted well in advance of the next meeting.