CURRENT MEETING REPORT

Reported by Steve Casner/USC-ISI

Minutes of the Audio/Video Transport Working Group (AVT)

The AVT Working Group met for only one session at this meeting since the
draft specification for the Real-time Transport Protocol (RTP) is nearly
completed for submission as an RFC. The emphasis of this session was on
implementation experience with the focus shifting to companion
specifications for profiles and encodings.


Status of Draft RTP Specification

This group did not meet in Amsterdam, but there has been substantial
progress on the RTP specification via e-mail and a teleconference, and a
new draft (draft-ietf-avt-rtp-04.txt and .ps) has been installed.  The
specification has been submitted to the Area Director with a request for
``IESG Last Call,'' and is in review by the Directorate.

Steve Casner gave a brief description of the most recent change to the
specification, which was the addition of the APP option.  This option
allows experimental application-specific options to be defined without
official registration while avoiding conflicts with other option
definitions.  See the draft RTP specification for details.  A brief
description was also given on a proposal from Andrew Cherenson to add an
option, not in the main RTP specification but in the audio/video
profile, to indicate the mode or state of a participant.  The proposed
set of states was:  active, video frozen (still image), private
(listening but not sending), and hold (not listening and not sending).

A good fraction of the attendees at this meeting had read the RTP
specification.  Comments were solicited both on the specification and on
the two options just described, but no comments were offered.  However,
behind the scenes, some objections have been raised to the
classification of RTP as a Proposed Standard and to certain details of
the specification.  These issues will be discussed further on the
mailing list.


Implementation Experience

Ron Frederick from Xerox PARC gave a presentation on his experience with
implementing RTP in the network video (nv) program.  He reported that
overall, the implementation went very cleanly, and that the combination
of the sequence number, timestamp and sync bit worked well together.  He
found the option format easy to generate and parse, but cautioned that
the parser must watch out for an illegal option length zero or length
greater than the packet length.  (The example option parsing code in the
appendix to the specification includes these checks.)
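Ron's cautions about option lengths can be illustrated with a short sketch.  The layout assumed here (one type byte, then a length byte counted in 32-bit words) is illustrative; see the appendix of the specification for the real parsing code:

```python
def parse_options(buf):
    """Walk an RTP option list, guarding against the two illegal cases
    Ron noted: a zero option length and a length running past the end
    of the packet.  Assumes a hypothetical layout: one type byte, then
    a length byte counted in 32-bit words (including the header)."""
    options = []
    off = 0
    while off < len(buf):
        if len(buf) - off < 2:
            raise ValueError("truncated option header")
        otype = buf[off]
        olen = buf[off + 1] * 4          # length field is in 32-bit words
        if olen == 0:
            raise ValueError("illegal zero-length option")
        if off + olen > len(buf):
            raise ValueError("option length exceeds packet length")
        options.append((otype, bytes(buf[off + 2:off + olen])))
        off += olen
    return options
```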

The one nuisance Ron found was that the program needs to know if an SSRC
option is present to fully identify the sender before the parsing can
act upon the other options.  This requires parsing the options twice, or
storing the information while parsing and then acting upon it at the
end.  To reduce this nuisance, it was proposed that the specification be
modified to require that if an SSRC option is present, it must follow
immediately after the fixed header.  Since this is the logical place for
translators to insert the SSRC option, and since there can be only one,
this restriction should cause no difficulties.
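The single-pass benefit of the SSRC-first rule can be sketched as follows (OPT_SSRC is a placeholder code, not the registered value, and the option tuples follow the hypothetical layout above):

```python
OPT_SSRC = 35   # placeholder option code, for illustration only

def identify_then_process(options, network_source):
    """With the proposed rule that an SSRC option, if present, comes
    immediately after the fixed header, the true sender is known before
    any later option is acted on -- no second pass or deferred state."""
    source = network_source              # default: identify by transport address
    rest = options
    if options and options[0][0] == OPT_SSRC:
        source = int.from_bytes(options[0][1], "big")
        rest = options[1:]
    # Each remaining option can now be attributed to 'source' immediately.
    return source, [(source, opt) for opt in rest]
```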

David Kristol from AT&T described his work (just beginning) on a quality
of service monitor for RTP. It would create a map of the MBONE, and
display a measure of the reception quality for each receiver on the map
using data obtained from reception reports multicast by the receivers.
This would allow a visual determination of bottleneck points.  One
observation was that the measure of video delay is affected by the use
of the same timestamp on all packets of a video frame even though the
packets are not transmitted at the same time.  A solution is to measure
delay only on the first packet of a frame.  This illustrates that
reception quality measurement may be dependent upon the medium.
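The first-packet-only delay measurement might look like this in a monitor (clock units and synchronization are glossed over; the timestamp and arrival time are assumed to share a timebase):

```python
class FrameDelayMonitor:
    """Collects one delay sample per video frame.  Because every packet
    of a frame carries the same RTP timestamp, only the first packet's
    arrival gives a meaningful delay; later packets of the frame were
    sent later but stamped identically."""
    def __init__(self):
        self.last_ts = None
        self.delays = []

    def packet(self, rtp_timestamp, arrival_time):
        if rtp_timestamp != self.last_ts:    # first packet of a new frame
            self.last_ts = rtp_timestamp
            self.delays.append(arrival_time - rtp_timestamp)
```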

Dave also implemented a vat/RTP translator to allow participation in vat
audio sessions inside the AT&T firewall.  This turned out to be very
simple, the only problem being translation of vat's
beginning-of-talkspurt flag into RTP's end-of-talkspurt flag.  For now,
he is just copying the bit and ignoring the distinction.


Encoding Specifications

Frank Kastenholz from FTP Software asked for the addition in the
audio/video profile of an 8-bit linear encoding (``L8'') and a format
code for L8 encoding at 11.025 kHz.  This matches the capability of
common audio hardware on PC and Mac platforms.  It is possible to
convert in software to 8-bit mu-law at 8 kHz, but this increases the
minimum processing power required to participate.  This request was
generally agreed upon, and Frank was requested to provide the details to
go into the profile.  Henning Schulzrinne cautioned that adding a new
``standard'' encoding places a burden on all implementations to include
at least a decoder for it.
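The companding half of the software fallback Frank mentioned is the classic G.711-style linear-to-mu-law step (resampling from 11.025 kHz to 8 kHz is a separate step, omitted here); a standard encoder looks like:

```python
BIAS, CLIP = 0x84, 32635

def linear_to_ulaw(sample):
    """Compand one 16-bit linear PCM sample to 8-bit mu-law (G.711).
    An 8-bit linear (L8) sample would first be scaled up, e.g. by << 8."""
    sign = 0x80 if sample < 0 else 0
    if sample < 0:
        sample = -sample
    if sample > CLIP:                    # clip to avoid overflow after biasing
        sample = CLIP
    sample += BIAS
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (sample & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (sample >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF
```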

Bill Fenner from NRL and Ron Frederick gave presentations on carrying
JPEG video over RTP, and on the issues to be addressed in an encoding
specification.  Although the JPEG specification includes a variety of
formats, Ron recommended that we stick with 4:2:2 video format, square
pixels (as produced by most of the chips even though CCIR 601 specifies
rectangular pixels), a 16x8 block as the minimum coded unit, and
progressive scan.  Ron also recommended that we use the Q factors
defined for cjpeg and djpeg by the Independent JPEG Group and use the
standard Huffman coding table, though these could be overridden by
custom table definitions.

Bill has designed an encoding for JPEG over RTP, and implemented it
using the Parallax JPEG hardware.  He points out that JPEG frames are
large, so they are likely to require segmentation and reassembly.
Losing one packet out of a frame will result in frame loss because the
Huffman reset mechanism that is part of the standard does not provide
enough sequence space for packet-size losses.  He also observed that the
Q factor does not provide much usable quality range (the picture gets
much uglier without the frame rate increasing as much as one would
expect).

The encoding Bill defined uses the same RTP timestamp on all packets of
a frame, and the RTP sync bit indicates the last packet of the frame, as
usual.  In addition, he has defined a small header to go at the
beginning of the data in the first packet of a frame.  The presence of
this header is indicated by the first two bytes being one of the
application-specific codes (0xFF 0xE1) provided in the JPEG
specification and guaranteed not to appear in the data.  This code is
followed by two bytes to encode the Q factor, Huffman table index, and
some size information.  Special values of these indices can be used to
indicate that custom quantization and/or Huffman tables will follow.
The mechanisms for requesting and/or periodically retransmitting custom
tables are still to be decided and tested.  There were no major
objections to this design other than the suggestion that explicit image
width and height factors be included.  Bill agreed to produce a first
draft specification for JPEG over RTP with assistance from Ron and
Fengmin Gong from MCNC.
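Bill's in-band header might be parsed along these lines (the split of the two bytes after the marker into a Q-factor byte and a table-index byte is a guess here, since the exact bit allocation was still to be written up):

```python
def parse_frame_header(data):
    """Check the first packet of a frame for the in-band header.
    Returns (header_or_None, remaining_entropy_data).  The marker
    0xFF 0xE1 is an application-specific JPEG code guaranteed not to
    occur in the entropy-coded data."""
    if len(data) >= 4 and data[0] == 0xFF and data[1] == 0xE1:
        header = {"q_factor": data[2], "table_index": data[3]}
        return header, data[4:]
    return None, data     # no header: keep using previous parameters
```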


Video Decoder API

In Columbus we had a good discussion on the feasibility of creating a
common interface for software video decoders so that each packet video
program can incorporate decoders for many or all of the other programs'
native formats to enable interoperation.  At this meeting, Ron Frederick
gave an update on the decoder API in the nv program in which decoding
and rendering of the image data are decoupled:  nv does all the network
I/O, RTP processing, and X-window system interaction; the image decode
routines just convert each packet of compressed bits into uncompressed
YUV pixels for a portion of the image.  A callback routine is provided
to render a rectangular portion of the image after decoding.
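The decode/render split can be sketched as a tiny interface (the names and the raw-luma "decoder" stand-in are hypothetical; a real decode routine would decompress its encoding into YUV pixels):

```python
def make_decoder(width, height, render):
    """Returns (decode_packet, framebuffer).  'render(x, y, w, h)' is
    the application-supplied callback that repaints one rectangle after
    the decode routine has written pixels for it."""
    luma = bytearray(width * height)      # Y plane only, for brevity

    def decode_packet(x, y, w, h, pixels):
        # Stand-in "decode": copy raw luma bytes into the frame buffer.
        for row in range(h):
            off = (y + row) * width + x
            luma[off:off + w] = pixels[row * w:(row + 1) * w]
        render(x, y, w, h)                # hand the rectangle back to display

    return decode_packet, luma
```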

Ron identified several open issues that have arisen:


   o Is YUV a good choice for color decoding?  It allows easy rendering
     into monochrome or color images, but requires extra processing for
     encodings that would more naturally use RGB or dithered data.  The
difficulty is that the number of variations in the rendering code is
     already large to handle variations in pixel depth and ordering.  It
     may not be worthwhile to double or triple this to render from
     additional input formats.

   o It is desirable to enable the use of hardware encoders and/or
     decoders for increased performance, but what additional hooks are
     required to fit this into the model?  Some answers may come from
     exploring the options for the SunVideo board Cell-B encoder and for
     JPEG video using the Parallax board as Bill Fenner has done.

   o Should the common code handle resequencing of packets?  Previously,
     nv ignored packet sequencing because packets of the nv encoding can
     be processed out of order.  Now, nv is processing the sequence
     numbers to accumulate packet loss information, and could do the
     resequencing.  However, Ron feels that this function should be left
     to the decode routines because the requirements may not be the same
     for all encodings, unless we can define as part of the profile an
     extra level of framing for all the encodings to use.


Other APIs may also be needed.  Henning suggested that video encoding
routines should also be sharable to reduce the effort of writing them.
Since nv already separates the frame grab from the encoding, an
interface could be explored there.  Abel Weinrib also pointed out that
we need APIs at a higher layer, that of whole media agents to be
controlled by different session managers.


Report from IMA Network Focus Group

At the end of the session, we got a report from Thomas Maslen of Sun on
the recent first meeting of the IMA Network Focus Group, and on the
potential interaction with the IETF AVT and MMUSIC Working Groups'
activities.  The Interactive Multimedia Association (IMA) is an industry
group chartered to develop standards to support multimedia applications.
In particular, the Multimedia System Services (MSS) proposal defines an
object-oriented architecture for the infrastructure to support
multimedia applications.

In a way, the MSS work fits between the AVT and MMUSIC areas.  The MSS
proposal does not specify media transport mechanisms or protocols.  The
Network Focus Group is to address the requirements for network transport in the
MSS, and to define network transport interfaces, target environments and
protocol profiles to support those requirements.  The group will work
with other standards groups, including the IETF, to incorporate existing
protocols and cooperate on the definition of new ones where needed.  At
first look, it appears that RTP may be suitable as one of the protocols
to be used for transport of real-time media.

Similarly, MSS provides infrastructure for multimedia applications such
as teleconferencing, but does not include the applications themselves.
Abel pointed out that it does not include higher-level objects like
people in its model, nor does it include policies.  Therefore, MMUSIC
sits above MSS, and the session management mechanisms to be developed in
that working group might be used for communication among a set of
applications implemented using MSS.


Future Working Group Activity

The session closed with a discussion of future working group activity.
As work on the RTP specification is completed, the group's emphasis will
shift to profile and encoding specifications.  From the point of view of
our Area Director, Allison Mankin, it is appropriate for the group to
continue work as needed, or to go on hiatus but keep the mailing list
(rem-conf) active.  Meetings at future IETFs may then be called to
address new questions such as the interface between network real-time
services and RTP, or when appropriate to advance any of the
specifications through the standards process.


Attendees

Andy Adams               ala@merit.edu
Stephen Batsell          batsell@itd.nrl.navy.mil
Tom Benkart              teb@acc.com
Richard Binder           rbinder@cnri.reston.va.us
Ronald Broersma          ron@nosc.mil
Stephen Casner           casner@isi.edu
Ping Chen                ping@ping2.aux.apple.com
Chuck de Sostoa          chuckd@cup.hp.com
Stephen Deering          deering@parc.xerox.com
David Dubois             dad@pacersoft.com
Ed Ellesson              ellesson@vnet.ibm.com
Julio Escobar            jescobar@bbn.com
Roger Fajman             raf@cu.nih.gov
William Fenner           fenner@cmf.nrl.navy.mil
James Fielding           jamesf@arl.army.mil
Robert Fink              rlfink@lbl.gov
Ron Frederick            frederick@parc.xerox.com
Mark Garrett             mwg@faline.bellcore.com
Atanu Ghosh              atanu@cs.ucl.ac.uk
Shawn Gillam             shawn@timonware.com
Robert Gilligan          Bob.Gilligan@Eng.Sun.Com
Fengmin Gong             gong@concert.net
Darren Griffiths         dag@ossi.com
Regina Hain              rrosales@bbn.com
Shai Herzog              herzog@catarina.usc.edu
Phil Irey                pirey@relay.nswc.navy.mil
Rick Jones               raj@cup.hp.com
Frank Kastenholz         kasten@ftp.com
David Kaufman            dek@magna.telco.com
Byonghak Kim             bhkim@cosmos.kaist.ac.kr
Charley Kline            cvk@uiuc.edu
Michael Kornegay         mlk@bir.com
David Kristol            dmk@allegra.att.com
Allison Mankin           mankin@cmf.nrl.navy.mil
David Marlow             dmarlow@relay.nswc.navy.mil
Jim Martin               jim@noc.rutgers.edu
Thomas Maslen            maslen@eng.sun.com
Marjo Mercado            marjo@cup.hp.com
Greg Minshall            minshall@wc.novell.com
Dan Nordell
Marsha Perrott           perrott@prep.net
J. Mark Pullen           mpullen@cs.gmu.edu
Jim Rees                 Jim.Rees@umich.edu
Eve Schooler             schooler@isi.edu
Henning Schulzrinne      hgs@research.att.com
Michael Speer            michael.speer@sun.com
John Stewart             jstewart@cnri.reston.va.us
Matsuaki Terada          tera@sdl.hitachi.co.jp
Chuck Warlick            chuck.warlick@pscni.nasa.gov
Abel Weinrib             abel@bellcore.com
Jean Yao                 yao@cup.hp.com
Shinichi Yoshida         yoshida@sumitomo.com