Minutes from the webpriv BOF held at the 40th IETF in Washington DC.

Minutes taken by Craig R.P. Heath (craig@sco.com) with slight
amendments by April Marine (amarine@nic.nasa.gov) in square brackets to
reflect comments from the list.  Questions should go to April Marine.

BOF Chaired by: Larry Masinter (masinter@parc.xerox.com).

User Services Area director:  Joyce Reynolds (jkrey@isi.edu).

Mailing list: web-priv@nasa.gov (to join: web-priv-request@nasa.gov;
a Majordomo list).

The scope of the BOF was to discuss current privacy issues on the WWW,
and whether there is something the IETF can do to help, by way of user
education or guidelines for spec/protocol writers.

Privacy Issues

Hit Metering - information on a user's browsing habits can be gathered
by monitoring page hits; if a cache is used (either in a proxy server
or in the user's client) the ability to monitor is moved, but not necessarily
avoided.  Currently privacy issues are not considered in
implementations; avoiding hit metering may be counter to what content
providers want.  [Privacy issues are considered in RFC 2227, but the
fear was expressed that some content providers may desire more
information than simple hit-metering provides and would therefore not
use the facilities in RFC 2227.]
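
The point that a cache moves the ability to monitor, rather than
removing it, can be sketched as follows.  This is an illustrative
model only: OriginServer and CachingProxy are hypothetical names, and
nothing here implements the RFC 2227 mechanism.

```python
from collections import Counter


class OriginServer:
    """Hypothetical origin server; counts only the requests that reach it."""

    def __init__(self):
        self.hits = Counter()

    def fetch(self, url):
        self.hits[url] += 1
        return f"content of {url}"


class CachingProxy:
    """Hypothetical shared cache; sees every request made through it."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}
        self.hits = Counter()  # keyed by (user, url): monitoring has moved here

    def get(self, user, url):
        self.hits[(user, url)] += 1
        if url not in self.store:  # only a cache miss reaches the origin
            self.store[url] = self.origin.fetch(url)
        return self.store[url]


origin = OriginServer()
proxy = CachingProxy(origin)
for user in ("alice", "bob", "alice"):
    proxy.get(user, "/index.html")

origin_view = origin.hits["/index.html"]  # requests seen by the origin
proxy_view = sum(proxy.hits.values())     # requests seen by the proxy
```

In this sketch the origin server undercounts, while the proxy operator
holds a per-user record of the same requests.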

State Management (cookies) - the cookie mechanism is [an extension to]
the HTTP protocol.  Along with returning the requested page, the
server can set a cookie on the client.  The client should return the
cookie to the server the next time the page/server is accessed.  There
is an issue here with regard to European privacy laws, which are
stricter than those in the USA.  Users need to be aware of what is
going on.
There was a discussion of "unverifiable transactions".  Content
providers may link to images from third party sites - the images may
be accompanied by cookies.  [One way] the third party can discover
where the image was referenced from [is] by looking at the "Referer"
(referrer) field in the HTTP header.  If another content provider
links to the same third party, the cookie set while visiting the first
content provider will be returned to that third party, allowing a
history of the sites visited to be deduced.  This situation
is typically encountered with sites using a third party advertising
agent, e.g. AltaVista's use of DoubleClick.  [A] purpose of
the cookies is, for example, to limit the number of times a particular
ad is displayed.  It was pointed out that the referrer field can be
disabled in Netscape by editing a config file.  It was questioned
whether changing this mechanism would help, or whether advertisers
would just find some other way of achieving the same goal (the content
providers are already implicitly cooperating with the advertisers).
It was pointed out that SET has persistent info that can be retrieved
in certain circumstances.  Cookies are a mixed blessing - combined
with personal information they can be used to tailor the view an
individual is presented with.  Users should be able to turn cookies on
and off.  There was a suggestion that "certified cookies" could be
provided where there was some assurance of what use the cookie will be
put to.  This would need to apply to plug-ins, etc.  as well - it is
very important for the user to understand how information is to be
used.  The expectation in Asia is that 90% of the cost of an Internet
connection will be met by advertising - there is a danger of privacy
issues hurting advertisers.  It was pointed out that the aim of
advertisers is moving from simply displaying the advert to closing the
transaction.  A straw poll indicated that some of the audience never
register with web sites, and some register using pseudonyms.
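
The "unverifiable transactions" scenario discussed above can be
sketched as a small simulation.  It is a simplified model, not real
HTTP: AdServer, Browser, the example URLs and the cookie values are
all hypothetical, and the Referer field is passed as a plain argument.

```python
import itertools


class AdServer:
    """Hypothetical third party whose images are embedded by many sites."""

    def __init__(self, name):
        self.name = name
        self._next_id = itertools.count(1)
        self.seen = {}  # cookie value -> pages reported in the Referer field

    def serve_image(self, cookie, referer):
        """Handle one image request; return the cookie the client should keep."""
        if cookie is None:
            cookie = f"visitor-{next(self._next_id)}"
        pages = self.seen.setdefault(cookie, [])
        if referer is not None:
            pages.append(referer)
        return cookie


class Browser:
    """Hypothetical client with a cookie store and a switchable Referer."""

    def __init__(self, send_referer=True):
        self.cookies = {}  # server name -> cookie value
        self.send_referer = send_referer

    def visit(self, page_url, ad_server):
        """Load a page that embeds an image from ad_server."""
        referer = page_url if self.send_referer else None
        stored = self.cookies.get(ad_server.name)
        self.cookies[ad_server.name] = ad_server.serve_image(stored, referer)


ads = AdServer("ads.example")
browser = Browser()
browser.visit("http://news.example/today", ads)
browser.visit("http://shop.example/cart", ads)
trail = ads.seen[browser.cookies["ads.example"]]
# trail now lists both pages, in the order they were visited
```

With send_referer=False the third party in this model still recognises
the returning cookie but learns no page names through this channel,
which is consistent with the doubt raised above that changing one
mechanism would be enough.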

[It should be noted that the effect of privacy laws and the fact that
they differ in different parts of the world is a general issue for the
discussion and is not limited to the topic of cookies.]

W3C Platform for Privacy Preferences Project (P3P) - http://www.w3.org/P3P/ -
privacy information can be encoded in the W3C metadata format; everyone can
define their own privacy policy, similar to the PICS model for content
labelling.  Servers announce their intentions for processing of personal
information in metadata, allowing the user to accept or reject them.  Several
drafts are available from the web site, and there are two working groups
on implementation.  User preferences can be configured to cover domains from
a single site, through a group of sites, up to all sites.  Configurations
can be exchanged via a URL.  The privacy assertions would include cookie
processing, making the cookie issue academic.  All other processing of
personal info would be included, e.g. use of user name/serial number in URLs.
A decision must be made to trust the accuracy of the privacy assertions -
this is essentially the standard Trojan horse problem.  Privacy assertions
can be signed by a third party as assurance, but they don't have to be.  The
user policy can specify whether assertions must be signed to be trusted.
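
The accept/reject decision described above might look roughly like the
sketch below.  It does not use the P3P vocabulary or syntax: the
Assertion and Rule types, the purpose strings and the "first matching
rule wins" ordering are assumptions made for illustration, and the
signature is only checked for presence, not verified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assertion:
    """What a server says it will do with personal information (hypothetical)."""
    site: str
    purposes: frozenset
    signed_by: Optional[str] = None  # third party vouching for the assertion


@dataclass(frozen=True)
class Rule:
    """One user preference (hypothetical)."""
    scope: str  # a single site, a group such as ".example", or "*" for all
    allowed: frozenset
    require_signed: bool = False

    def matches(self, site):
        if self.scope == "*" or site == self.scope:
            return True
        return self.scope.startswith(".") and site.endswith(self.scope)


def decide(rules, assertion):
    """Return True to accept.  The first matching rule wins, so list the
    most specific rules first; no matching rule means reject."""
    for rule in rules:
        if rule.matches(assertion.site):
            if rule.require_signed and assertion.signed_by is None:
                return False
            return assertion.purposes <= rule.allowed
    return False


rules = [
    Rule("shop.example", frozenset({"cookies", "order-history"})),
    Rule(".example", frozenset({"cookies"}), require_signed=True),
    Rule("*", frozenset()),
]
```

Here an unsigned cookie assertion from shop.example is accepted, the
same assertion from news.example is accepted only if it is signed, and
any site outside the group is rejected as soon as it declares any
processing at all.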

Other Privacy Issues - Misuse of indexed mailto: URLs for spam was raised
as an issue.  It was suggested that privacy problems are social, not
technological, and the basic issue is trust.  If a working group is formed,
it would need strong links with W3C, and also with any working group emerging
from the spam BOF.  Users need "informed consent" - both informational and
technical.  The basis of the P3P model is in real-world trust in institutions;
institutions can vouch for others.

Operational Issues

NASA received a Freedom of Information Act request for full access logs for
all web sites (~30M/day).  The same person has requested this from all
federal sites.  Agencies are not allowed to ask why the information is wanted.
There is a concern that the requester will be able to use the information to
build a "click trail" for users accessing the sites.  "Anonymising" the logs
was not permissible.  As a result, it is now policy to keep raw logs for only
30 days (including backups); summary reports have their own archival policies.
Businesses also have issues with "legal discovery" and may need to take a
similar approach.  It was pointed out that logs are necessary for
characterisation of usage - if logs must be discarded, it will be necessary
to do the analysis "as you go".
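
Doing the analysis "as you go" under a 30-day retention policy could
be sketched as below.  The LogSummariser class and the record layout
(timestamp, client, path) are assumptions for illustration, not a
description of any site's actual tooling; records are assumed to
arrive in time order.

```python
from collections import Counter, deque
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)


class LogSummariser:
    """Hypothetical collector: summaries are kept, raw records expire."""

    def __init__(self):
        self.raw = deque()              # (timestamp, client, path), oldest first
        self.hits_per_path = Counter()  # summaries hold no client identifiers
        self.hits_per_day = Counter()

    def record(self, timestamp, client, path):
        # Update the summaries immediately, while the raw record exists.
        self.hits_per_path[path] += 1
        self.hits_per_day[timestamp.date()] += 1
        self.raw.append((timestamp, client, path))
        self.expire(timestamp)

    def expire(self, now):
        # Drop raw records older than the retention period.
        while self.raw and now - self.raw[0][0] > RETENTION:
            self.raw.popleft()


log = LogSummariser()
log.record(datetime(1997, 1, 1, 9, 0), "host-a", "/index.html")
log.record(datetime(1997, 1, 15, 9, 0), "host-b", "/index.html")
log.record(datetime(1997, 2, 10, 9, 0), "host-a", "/about.html")
```

After the third record the entry from 1 January has been discarded,
but it is still reflected in the per-path and per-day totals.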

With the greater availability of information, it is getting harder to
effectively anonymise logs - "de-anonymising" has been demonstrated with
medical records, for example.
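
One way such de-anonymising can work is by linking the fields left in
an "anonymised" log against other available information.  The sketch
below uses entirely made-up data; the field names and the reidentify
helper are hypothetical, and this is not a description of the medical
records demonstration mentioned above.

```python
# "Anonymised" log: names replaced by pseudonyms, other fields retained.
anonymised_log = [
    {"user": "u1", "network": "192.0.2.0/24", "hour": 9, "path": "/jobs"},
    {"user": "u2", "network": "198.51.100.0/24", "hour": 9, "path": "/news"},
]

# Auxiliary information obtainable elsewhere (made up for this example).
directory = [
    {"name": "A. Example", "network": "192.0.2.0/24", "hour": 9},
    {"name": "B. Example", "network": "198.51.100.0/24", "hour": 9},
    {"name": "C. Example", "network": "198.51.100.0/24", "hour": 9},
]


def reidentify(log, aux, keys):
    """Map a pseudonym to a name when exactly one auxiliary record
    agrees with the log entry on every field in keys."""
    found = {}
    for entry in log:
        matches = [p["name"] for p in aux
                   if all(p[k] == entry[k] for k in keys)]
        if len(matches) == 1:
            found[entry["user"]] = matches[0]
    return found


linked = reidentify(anonymised_log, directory, ("network", "hour"))
```

In this example u1 is linked to a single name, while u2 stays
ambiguous because two people share the same retained fields; the more
auxiliary information is available, the fewer such ambiguities remain.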

Plans for Working Group

The User Services (USV) area is not just concerned with end-users, but
all levels of users.  Its charter includes the production of guidelines
and books.  There seems to be a reasonable level of interest - a possible
overlap with the RUN (Responsible Use of the Network) working group was
noted.  The working group could be a combined effort with other areas,
in particular the Security area.  Although the end product would not be
a new protocol, expertise from the technical community is needed.  The
choice of area (USV, SEC, etc.) is less important than the composition
of the working group itself.

Potential Inputs

There is a paper analysing the effects of the European Union privacy
directive on US trade - inaction on this may result in difficulties.

Volunteers

April Marine (amarine@nic.nasa.gov) has volunteered to chair the group,
at least until the next meeting.

Erik Bataller (emb@nttlabs.com) has volunteered to gather information on
user experiences (similar to the NASA experience above).

Ted Hardie (hardie@nasa.gov) has set up this list (web-priv@nasa.gov).

			- Craig Heath, SCO.