CURRENT MEETING REPORT


Minutes of the Common Indexing Protocol Working Group (FIND)

Reported by Patrik Faltstrom, Bunyip


The FIND Working Group met for the first time at the 34th IETF.  
Patrik Faltstrom chaired the meeting.  He first reviewed the charter.

It was pointed out that the e-mail address of the Working Group is 
"find@bunyip.com", and nothing else.

The Charter gives the Working Group goal as defining one, and only 
one, common indexing protocol which all directory services can use 
when passing indexing information.  Patrik admitted that so far this 
work has been aimed toward WHOIS++, but that he is depending on 
the group for help in making it work across directory protocols.  
Currently there are two drafts which came out of the WNILS Working 
Group: one on the Common Indexing Protocol by Chris Weider and the 
other on the WHOIS++ mesh by Patrik.  Patrik intends the second 
version to include LDAP and PH.

In the CIP, each leaf node supplies its indexing information to the 
indexing server (centroid).  When the indexing server gets a query it 
can prune the branches that hold no matching information.  (Note that 
the examples are in WHOIS++.)  The leaf server sends the Data-Changed 
command:

# DATA-CHANGED
o  Version-number:
o  Time-of-latest-centroid-change:
o  Time-of-message-generation:
o  Server-handle:
o  Host-Name:
o  Host-Port:
o  Best-time-to-poll:
o  Authentication-type:
o  Authentication-data:
# END
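
As an illustration only, the Data-Changed message could be assembled as 
in the following Python sketch.  The function name, the requirement that 
every header be present, and the exact line layout ("Header: value" with 
a leading space) are assumptions made for the example and are not taken 
from the drafts.

```python
# Header names as listed in the minutes; the order follows the listing above.
DATA_CHANGED_FIELDS = [
    "Version-number",
    "Time-of-latest-centroid-change",
    "Time-of-message-generation",
    "Server-handle",
    "Host-Name",
    "Host-Port",
    "Best-time-to-poll",
    "Authentication-type",
    "Authentication-data",
]


def build_data_changed(values):
    """Render a DATA-CHANGED message from a dict of header values.

    Hypothetical helper: this sketch treats every header as required,
    which is an assumption, not something the minutes state.
    """
    missing = [name for name in DATA_CHANGED_FIELDS if name not in values]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    lines = ["# DATA-CHANGED"]
    for name in DATA_CHANGED_FIELDS:
        lines.append(f" {name}: {values[name]}")
    lines.append("# END")
    return "\n".join(lines)
```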

The centroid uses the Best-time-to-poll value to send a poll command:

# POLL
o  Version-number:
o  Type-of-poll:
o  Poll-scope:
o  Start-time:
o  End-time:
o  Template:
o  Field:
o  Server-handle:
o  Host-Name:
o  Host-Port:
o  Hierarchy:
o  Description:
o  Authentication-type:
o  Authentication-data:
# END

The polled machine sends back the Centroid-changes response:

# CENTROID-CHANGES
o  Version-number:
o  Start-time:
o  End-time:
o  Server-handle:
o  Case-sensitive:
o  Authentication-type:
o  Authentication-data:
o  Compression-type:
o  Size-of-compressed-data:
o  Operation:

# BEGIN TEMPLATE
o  Template:
o  Any-field:

# BEGIN FIELD
o  Field:
o  Data:

# END FIELD
# END TEMPLATE
# END CENTROID-CHANGES

Both the template and field are repeatable.
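
The nesting of repeatable templates and fields can be sketched in 
Python as below.  This is a rough illustration, not the format defined 
in the drafts: the function name is hypothetical, the Any-field header 
and the message headers are left out, and emitting one "Data:" line per 
token is an assumption.

```python
def render_centroid_body(templates):
    """Render the template/field part of a CENTROID-CHANGES message.

    `templates` maps a template name to a dict that maps each field
    name to an iterable of tokens.  Both levels may repeat.
    """
    lines = []
    for template_name, fields in templates.items():
        lines.append("# BEGIN TEMPLATE")
        lines.append(f" Template: {template_name}")
        for field_name, tokens in fields.items():
            lines.append("# BEGIN FIELD")
            lines.append(f" Field: {field_name}")
            # Sorted so that the output is stable from run to run.
            for token in sorted(tokens):
                lines.append(f" Data: {token}")
            lines.append("# END FIELD")
        lines.append("# END TEMPLATE")
    return lines
```

Wrapping the returned lines between "# CENTROID-CHANGES" plus its 
headers and "# END CENTROID-CHANGES" would give the complete response.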

Today the only transfer supported is of the whole centroid.  The 
centroid is case insensitive, uses the ISO 8859-1 character set, and 
the tokenization algorithm splits on white space and @.
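
A minimal sketch of this tokenization in Python follows.  Folding to 
lower case is one way to get case insensitivity and is an assumption 
here, as are the function names and the record layout (a dict with 
"template" and "fields" keys).

```python
import re


def tokenize(value):
    """Split a value on white space and '@', folding to lower case."""
    return [tok for tok in re.split(r"[\s@]+", value.lower()) if tok]


def build_centroid(records):
    """Collect the unique tokens per template and field.

    Each record is assumed to look like
    {"template": "USER", "fields": {"name": "Patrik Faltstrom"}}.
    """
    centroid = {}
    for record in records:
        fields = centroid.setdefault(record["template"], {})
        for field_name, value in record["fields"].items():
            fields.setdefault(field_name, set()).update(tokenize(value))
    return centroid
```

With this rule an e-mail address such as "find@bunyip.com" contributes 
the two tokens "find" and "bunyip.com".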


More information about the CIP is available at: 

http://www.bunyip.com/products/digger


The question was asked why this should be used when X.500 already has 
replication.  The answer is that it is a base for the future.  X.500 
doesn't offer indexing, nor does it provide a common indexing for all 
protocols.  This model is also 
used for URN to URC resolution at Georgia Tech, and the model may 
allow for Web indexing.  Chris Weider pointed out that it will allow 
for the 1,000 flowers blooming, a term which refers to the multiplicity 
of directory protocols becoming available.

Patrik was asked about protocols other than WHOIS++, and he replied 
that he does not believe there will be any problem handling their 
indexing information.

Patrik was also asked if the Working Group should do a survey of 
indexing schemes, and he replied that he was looking for volunteers to 
do so.

There was a small semantic discussion on whether it was an indexing 
protocol or whether it was exchanging data to create an index.  Patrik 
would like to have a common format for the index if possible.  Each 
directory service would pass its index to the centroid, which would 
index that index.  And in fact, the index would be indexed at each level 
of the tree.  Some of the issues to study will be the trade-offs of the 
number of levels and the size of the indexes.  A lot of factors are 
involved: data, reduction of indexes, geography.
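
The idea of an index being indexed again at each level of the tree, and 
of queries being pruned on the way down, can be sketched in Python as 
follows.  The class names are hypothetical, the index is reduced to a 
flat set of tokens, and a parent recomputes its index from its children 
on demand instead of storing polled centroids; all of these are 
simplifying assumptions.

```python
class LeafServer:
    """A directory server that holds actual data (here just tokens)."""

    def __init__(self, name, tokens):
        self.name = name
        self._tokens = set(tokens)

    def tokens(self):
        return set(self._tokens)

    def search(self, token):
        return [self.name] if token in self._tokens else []


class IndexServer:
    """An index server whose index is the union of its children's indexes."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def add_child(self, child):
        self.children.append(child)

    def tokens(self):
        merged = set()
        for child in self.children:
            merged |= child.tokens()
        return merged

    def search(self, token):
        hits = []
        for child in self.children:
            # Prune: only descend into branches whose index has the token.
            if token in child.tokens():
                hits.extend(child.search(token))
        return hits
```

Each extra level adds another merged index to store and transfer, which 
is the trade-off between the number of levels and the size of the 
indexes mentioned above.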

Tim Howes of U Michigan gave a presentation on a program he's 
written called centipede.  The centipede connects to a directory over 
LDAP which tells it what information to produce for the centroid 
index.  It is produced, and centipede then connects to the target with the 
references and uses LDAP to install the index in the entry.  It generates 
distinct values (whole names rather than tokens) which it passes up 
the tree.  The large index allows more precise searches and pruning. Tim 
gave the following URL for more information:

http://www.umich.edu/~rsug/ldap
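
Why an index of whole values can prune more precisely than an index of 
tokens is shown by the small Python comparison below.  It is an 
illustration written for these minutes, not code from centipede, and 
all function names are hypothetical.

```python
def token_index(values):
    """Index of individual lower-cased words."""
    return {tok for value in values for tok in value.lower().split()}


def value_index(values):
    """Index of distinct whole values, lower-cased."""
    return {value.lower() for value in values}


def may_match_tokens(index, query):
    """True if every word of the query appears somewhere in the index."""
    return all(tok in index for tok in query.lower().split())


def may_match_values(index, query):
    """True only if the whole query value appears in the index."""
    return query.lower() in index
```

For a server holding "Tim Howes" and "Chris Weider", the query "Tim 
Weider" passes the token test, because both words occur, but fails the 
whole-value test, so that branch can be pruned.  The price is a larger 
index.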

With CIP you know who is indexing you, whereas with centipede you do 
not.

Chris W. reminded the group that all of this was experimental and 
wanted the group to think about what sorts of indexing information 
would be useful.


The group identified the following issues:

o  Character sets
o  Tokenization algorithm
o  Legal issues
o  How to specify for partial centroids
o  New server-to-ask records
o  Schema translations
o  Query result
o  Protocol issues
o  Security
o  Replication
o  Dealing with replicated data
o  Polling cycle detection

The group then focused on which issues might be simple to address.  
These might be:

o  Common format and schema translation
o  Overall model
o  Given a name of a company, return a domain name
o  Index data stored in WHOIS, RWHOIS, WHOIS++, and X.500.

The group agreed that the plan of attack should be:

1) Overall model
2) Schema translation

Both Joann Ordille and Roland Hedberg have some experience with 
schema translations that will be useful to the group.

It was suggested that we would need a registry of schema with 
descriptions and capabilities.

The group also asked what happens when the search result lives on 
WHOIS++ but the client only speaks LDAP.  Proxies were suggested as 
one solution.  Another solution would be to return URLs that 
contain queries which could be handed to a server.

The group then elected to defer engineering discussions to the list, and 
Patrik adjourned the meeting.