Reported by Steve Casner/Precept Software

Minutes of the Audio/Video Transport Working Group (AVT)

Thanks to Joerg Ott and Carsten Bormann for their notes on the
discussion which served as input for these minutes.


Overview

A meeting of the AVT Working Group was a late addition to the schedule
at IETF in Stockholm to allow a face-to-face discussion following the
recent e-mail exchanges about coordination with ITU-T Study Group 15 on
the use of RTP. The second half of the second MMUSIC session was used
for this purpose.  In addition to this primary topic, a few other
questions, listed below, were discussed.


Coordination With ITU

Joerg Ott gave a brief summary of the discussion at the SG-15 meeting
held in Stockholm in May to discuss the H.323 and H.22Z recommendations
for interworking LAN-based audio/video terminals with H.320 ISDN
terminals.  The scenario involves a gateway to provide address and
protocol translations at several levels, with audio/video data transfer
and multiplexing being only one.

At that meeting several viewpoints were expressed with regard to RTP,
ranging from defining a new protocol (H.22Z) that was only ``inspired''
by RTP, to using RTP as-is and defining a new setup protocol to go with
it.  At that time, SG-15 decided not to use RTP because of several
problems they perceived.  However, at a subsequent meeting organized by
Rich Baker at PictureTel and in e-mail discussion, this decision was
reversed.  It appears that the current position is to pursue use of RTP
and the RTP A/V Profile as defined unless it turns out that this scheme
will not work.

There remain many questions about how connection setup will be done,
but the specific problems regarding the use of RTP seem readily
answerable:


   o RTP's presentation timestamp is not sufficient; a transport
     timestamp should be available for QoS measurement.

     The RTP timestamp was intended to be useful for QoS measurement
     (via the jitter field in the RTCP reception report).  We believe it
     will work; if it does not work for ITU purposes it will not work
     for ours either.  The mechanism needs to be demonstrated in
     practice during the Proposed Standard stage.  Further details on
     use of the jitter measure with video formats are given in the next
     section.

   o H.323 needs to work over protocols other than IP (e.g., IPX).

     This is not a problem.  RTP has no specific dependencies on IP; it
     requires only framing and multiplexing of RTP/RTCP from the layers
     below.

   o Provision of lip-sync if audio and video streams do not originate
     from the same source.

     RTCP includes timestamps that allow playing in synchrony any
     sources that can reference a common clock.  It is suggested that
     absolute (wall clock) time be used as that reference when possible,
     and that the Network Time Protocol may be used to provide
     synchronization of the system clock to absolute time.  If some
     system has no notion of absolute time, it can use elapsed time
     instead if all the sources to be synchronized can count the same
     elapsed time.  If no reference clock is available, it seems
     unlikely that any alternative transport protocol could provide
     synchronization either.

   o Lack of stability in the RTP, profile and payload data format
     documents which are only in draft form.

     While there have been a number of changes during the time RTP has
     been designed, these documents have now reached a stable state.
     The main RTP specification and RTP profile have been submitted for
     IESG Last Call already, and the H.261 payload format specification
     will be submitted for Last Call immediately after IETF. These
     should be published as Proposed Standard RFCs by September.

   o Distinguishing multiple streams from the same source.

     Each ``RTP session'' is intended to carry only one medium.
     Multiple media should not be multiplexed in one RTP session based
     on the payload format code.  Multiple streams from the same source
     may be sent in separate RTP sessions (destination transport
     addresses), in which case the SSRC may or may not be the same for
     each session (a common SSRC is not required because the linkage is
     provided through the RTCP CNAME).  It is also possible for one host to send
     multiple SSRCs in one RTP session, for example to transmit video
     from two different cameras.

   o RTCP is insufficient for H.323 call setup.

     True.  The RTP specification says that the use of additional
     control protocols may be required.

   o Lack of ITU control over payload format codes in the RTP
     Audio/Video Profile.

     The current plan is to proceed with the RTP profile as specified,
     which includes the additional code points that were requested for
     ITU-T standard encodings.  There should be no problem adding new
     ITU-T standard encodings in the future since we will also want to
     use them.  Interoperability will be maximized if this profile is
     found to be sufficient for H.323 purposes as well, but if not,
     another profile could be defined to provide a payload format code
     space dedicated to the ITU.
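
The lip-sync answer above can be illustrated with a minimal sketch.  It
assumes each source periodically reports a pair of values taken at the
same instant, a wall clock time and the corresponding RTP timestamp,
and that the receiver maps later RTP timestamps onto the wall clock by
linear extrapolation.  The function name, parameter names and sample
values are hypothetical and chosen only for illustration.

```python
def rtp_to_wallclock(rtp_ts, report_rtp_ts, report_wallclock, clock_rate):
    """Map a 32-bit RTP timestamp to wall clock seconds.

    report_rtp_ts and report_wallclock are the (assumed) pair of values
    that the source reported for one common instant.
    """
    # Difference modulo 2**32, reinterpreted as a signed value so that
    # timestamp wraparound and slightly older packets are both handled.
    delta = (rtp_ts - report_rtp_ts) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return report_wallclock + delta / clock_rate


# Hypothetical audio source (8000 Hz clock) and video source (90000 Hz
# clock) that both reference the same wall clock.
audio_time = rtp_to_wallclock(168000, 160000, 1000.0, 8000)
video_time = rtp_to_wallclock(4545000, 4500000, 1000.5, 90000)

# Media units that map to the same wall clock time are played together.
print(audio_time, video_time)
```

If the two sources do not share a reference clock, the two mappings are
unrelated and this calculation gives no useful alignment, which is the
limitation noted in the answer.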


It seems most important to get the RTP specifications published first to
establish them as a stable base.  During the Proposed Standard stage of
the IETF standardization process, if the current specifications are
found to be inadequate either for general use on the Internet as planned
or more specifically for the interoperation planned by H.323, then the
necessary changes may be introduced before going to the Draft Standard
stage.  However, it is not expected that any substantial changes will be
required.



Jitter Measurements For Video Formats


It is valid to ask about measurements on video formats where the same
timestamp is used for all packets in a frame.  In some sense, it is the
network that imposes the variation in delay implied when transmission of
the video packets is spread over the frame interval rather than
occurring all at once, so it is reasonable to include it in the jitter
calculation.

On the other hand, it is expected that the jitter measure will be
primarily used to compare the behavior observed by different receivers.
The jitter measure can also be calculated by the sender for the traffic
as transmitted and then compared to the jitter reported by a receiver.
This allows cancelling out the jitter introduced by using the same
timestamp for all packets of a video frame.
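
The comparison can be sketched as follows.  This is only an
illustration: it assumes the running jitter estimate J += (|D| - J)/16
from the RTP specification, where D is the change in (arrival time
minus RTP timestamp) between successive packets, and it uses a made-up
stream of 30 frames per second with three packets per frame spread
evenly over the frame interval.  All function names and numbers are
hypothetical.

```python
def interarrival_jitter(packets, clock_rate=90000):
    """Running jitter estimate in timestamp units.

    packets is a list of (time_in_seconds, rtp_timestamp) pairs in the
    order the packets were sent or received.
    """
    jitter = 0.0
    prev_transit = None
    for time_s, rtp_ts in packets:
        # Relative transit time expressed in timestamp units.
        transit = time_s * clock_rate - rtp_ts
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter


def spread_video_stream(frames=100, fps=30, packets_per_frame=3,
                        clock_rate=90000):
    """Packets of each frame share one timestamp but are sent spread
    evenly across the frame interval."""
    stream = []
    for f in range(frames):
        rtp_ts = f * clock_rate // fps
        for p in range(packets_per_frame):
            send_time = (f + p / packets_per_frame) / fps
            stream.append((send_time, rtp_ts))
    return stream


sent = spread_video_stream()
# A receiver behind a path with a constant 50 ms delay.
received = [(t + 0.050, ts) for t, ts in sent]

sender_jitter = interarrival_jitter(sent)
receiver_jitter = interarrival_jitter(received)
# The difference estimates the part of the jitter added by the network.
print(sender_jitter, receiver_jitter, receiver_jitter - sender_jitter)
```

In this sketch the sender-side figure is nonzero purely because of the
shared timestamp, and a constant network delay leaves the receiver's
figure essentially unchanged, so the difference between the two is
close to zero.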

If the first packet of a frame were marked in some payload
format-independent way, then it would be possible to calculate the
jitter using only those packets, which are sent with minimum delay after
the frame is sampled.  However, since the packets of a frame may
represent a burst, later packets in the frame may experience more delay,
so measuring only the first might not be accurate.

For the MPEG video format, the transmission order of I, P and B frames
is not the same as the presentation order.  This introduces significant
additional noise into the jitter calculation.  It is possible to correct
for this by observing the I, P and B bits in the MPEG header at the
receiver and adjusting the timestamps accordingly before doing the
jitter calculation.  Don Hoffman at Sun reports that they have
prototyped this scheme.  This works fine for receivers, but a profile-
and payload format-independent monitor would not have this information.


Other RTP Questions

In addition to the ITU coordination questions, there were a few
questions brought up recently on the working group mailing list that
were discussed in this meeting.


   o The latest RTP profile draft specifies a 90000 Hz clock for the RTP
     timestamp in all video payload formats to replace the 65536 Hz
     clock rate used previously.  This matches the choice made by the
     designers of the MPEG encoding to be a multiple of all of the video
     frame rates in common use, and is the choice recently made by the
     authors and implementors of the RTP payload format for H.261 video.
     It is requested that the authors of the other video payload format
     specifications update those specifications to reflect the new clock
     rate unless there is some reason that the old clock rate must be
     used.  No objections were voiced and no other comments on the RTP
     profile were offered.

   o The RTP payload format for MPEG video specifies two formats, one of
     which encapsulates MPEG Transport Systems format.  In that format,
     the position of video frame boundaries is not known to the process
     doing the RTP encapsulation.  Instead, the RTP marker bit is used
     to indicate the start of a ``payload unit''.  Marking the start
     rather than the end is at odds with the convention for other video
     formats, but it is more convenient here, and marking the end would
     not bring the advantage it has with the other video formats.  No
     objections were raised.

   o Vineet Kumar asked how multiple audio streams fed through a mixer
     could be synchronized with a video stream that was not sent through
     the mixer.  The answer is that the audio streams can all be
     synchronized and the mixed output emitted with the same timing if
     the sources all have synchronized clocks.  If not, then RTP does
     not solve this problem.

   o Feedback was invited on points where the RTP specification may not
     be as clear or explicit as is needed.  These should be sent to the
     authors or to the working group mailing list (rem-conf@es.net).
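
The clock rate argument in the first item above can be checked with a
small calculation.  The sketch below assumes an illustrative set of
frame rates (24, 25, 30000/1001 and 30 frames per second); that set is
not taken from the profile draft, and the names used are hypothetical.

```python
from fractions import Fraction

# Illustrative frame rates in frames per second (an assumption, not a
# list taken from the RTP profile).
FRAME_RATES = {
    "24": Fraction(24),
    "25": Fraction(25),
    "29.97": Fraction(30000, 1001),
    "30": Fraction(30),
}


def ticks_per_frame(clock_rate, frame_rate):
    """Exact number of timestamp ticks in one frame interval."""
    return Fraction(clock_rate) / frame_rate


for name, rate in FRAME_RATES.items():
    new = ticks_per_frame(90000, rate)
    old = ticks_per_frame(65536, rate)
    # A denominator of 1 means the frame interval is a whole number of
    # ticks, so frame timestamps never accumulate rounding error.
    print(name, "fps:", "90000 Hz ->", new, "| 65536 Hz ->", float(old))
```

For each of these rates the 90000 Hz clock gives a whole number of
ticks per frame, while the 65536 Hz clock does not.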


Future Activities

As mentioned above, the main RTP specification and RTP profile should be
published as Proposed Standard RFCs in September.  All video payload
formats should be posted for Last Call as soon as possible and then
published as RFCs as well.  This will complete the working group's
charter.