Dan Larner, larner@parc.xerox.com
Copyright 1996 Xerox Corporation -- All Rights Reserved
Although current Web technologies have been doing
a remarkable job in providing information exchange, they are often
a collection of ad hoc mechanisms being pushed to limits for which
they were not originally intended. Object oriented design approaches
have some distinct advantages when dealing with complexity. They
provide for explicit separation of the concepts of interface, communication and
implementation, resulting in more reliable, manageable and extensible
systems. When extended to distributed object systems, these concepts
are bolstered by transparency of location and language. Distributed
object systems seem to be a fine foundation for creating complex
distributed systems, Web services of the future in particular.
However, one cannot ignore the past, and any new infrastructure
must support the existing mechanisms for a lengthy migratory period.
Inter-Language Unification (ILU), a distributed object system from
Xerox PARC, is sufficiently flexible to provide such migratory functionality. ILU
allows existing services and distributed object services
to each appear native to the other; in fact, interoperation between
ILU and existing HTTP clients and servers is already in place.
The Web today provides a great deal of power.
Using this medium, the gathering and exchange of information
has exploded over the past few years.
The Web not only offers a tremendous tool for the present;
indications are that its use will only expand in the future.
However, Web users are experiencing the effects of
the limitations of the technologies underlying the Web's services.
Access can be problematic due to clogged servers and bandwidth
limitations. Integration of the Web with other user tools is
spotty at best, and non-existent at worst. A peek under the
covers reveals that all this power is being provided by myriad
collections of ad hoc mechanisms, custom implementations, and
protocols (often quickly modified for the moment) that were never
really designed with this kind of growth in mind. An excellent
critique of the current Web can be found in [1].
At the most fundamental level, all of the activity on the
Web is aimed at performing operations on distributed collections of resources
[2]. Many of today's Web technologies provide
little, if any, separation between a resource's 'interface'
(i.e., the specification of operations that it supports), the
communication 'protocols' used to convey these operations from
point to point, and the 'implementation' (i.e. the actual program
that carries out the computation associated with the operations).
The separation of these aspects is well recognized for its central
importance to the construction of reliable, maintainable, composable
and extensible systems, especially in the face of tackling problems
of increasing complexity. These same concepts brought
object-oriented programming to mainstream design, caused the layered
design of communication protocols, and produced the notion of
an API. Without explicit manifestations of these concepts, systems
become fragile, and difficult to understand and modify, which, in
turn, limits
their life and usefulness, and reduces return on investment.
In object-oriented programming systems, these separate
concepts of interface, communication, and implementation are central
themes. A distributed object system promotes the advantages of
object-oriented programming languages to a network wide context,
making it a natural foundation for well designed distributed systems
such as the Web. It does this primarily by providing transparency
of location, and transparency of implementation language. It provides
the freedom to "Use the most appropriate tools for the
task".
Location transparency allows a programmer to perform
operations on objects without specific regard to whether the object
is local to the program, in a different process, or on a distant
machine. No special syntax distinguishes local from remote calls,
and the semantics of a call are consistent no matter what the
case. The distributed object system deals with all the low-level
details of establishing network connections, marshaling arguments,
unmarshaling return values, etc. This frees the programmer
to concentrate on the real problem at hand rather than
all the low-level plumbing. Location transparency
allows systems to be built using different systems for different
parts. Each part can run where it makes the most sense (e.g.
a driver can run close to a particular peripheral, a math intensive routine
can run on a high speed processor, etc.). Location transparency also enables the
creation of systems which are more reliable, by facilitating the design of
redundant functionality. For example, in the event that an existing server
goes off-line, unbeknownst to the client the system can arrange to contact
a different server.
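The failover behavior just described can be sketched in a few lines of Python. This is only an illustration of the idea, not ILU's mechanism: the `FailoverProxy` class, the replica objects, and the use of `ConnectionError` to signal an off-line server are all assumptions made for the example.

```python
class FailoverProxy:
    """Stand-in object that forwards each call to the first replica that answers.

    Hypothetical illustration only; it is not how ILU surrogates are built.
    """

    def __init__(self, replicas):
        self._replicas = list(replicas)

    def __getattr__(self, name):
        # Reached only for names not found on the proxy itself, i.e. the
        # remote operations. Return a function that tries each replica in turn.
        def call(*args, **kwargs):
            last_error = None
            for replica in self._replicas:
                try:
                    return getattr(replica, name)(*args, **kwargs)
                except ConnectionError as err:  # assumed "server off-line" signal
                    last_error = err
            raise ConnectionError("no replica reachable") from last_error

        return call


class DownServer:
    def current_time(self):
        raise ConnectionError("server off-line")


class UpServer:
    def current_time(self):
        return "12:00"


clock = FailoverProxy([DownServer(), UpServer()])
print(clock.current_time())  # the caller never learns which replica answered
```

The calling code uses `clock` exactly as it would use a local object; the choice of server stays hidden behind the proxy.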
Language transparency allows a system to be constructed
whose parts may be implemented in different programming languages.
That is, a program written in one language, say C++ or Python,
can perform operations on objects written in another language,
say Java or Lisp. The program does this without even knowing
that the other object's implementation is non-native. This is
accomplished by using precise descriptions of object interfaces
written in a language neutral interface specification language.
Language-specific compilers input these descriptions
and output language specific 'stub' code that allows programs
written in a particular language to operate with objects fitting
those descriptions. This concept of an interface specification
language makes it especially easy to provide object oriented interfaces
for legacy code - the object's implementation simply
converts method invocations to calls to the existing code.
This allows legacy systems to continue to provide value in new
situations. Language transparency pushes the potential for code
reuse to even greater levels, readily admitting a choice in the
most appropriate language for various parts of the overall task.
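As a rough illustration of wrapping legacy code behind an object interface, consider the Python sketch below. The abstract class plays the role that generated stub code would play; `Thermometer`, `legacy_read_temp`, and the units involved are all invented for the example and do not come from ILU.

```python
from abc import ABC, abstractmethod


class Thermometer(ABC):
    """Plays the role of a generated stub: it fixes the interface only."""

    @abstractmethod
    def read_celsius(self) -> float:
        ...


def legacy_read_temp(unit_flag):
    """Pretend pre-existing routine; returns tenths of a degree Fahrenheit."""
    return 725 if unit_flag == "F" else None


class LegacyThermometer(Thermometer):
    """Implementation that converts method invocations into legacy calls."""

    def read_celsius(self) -> float:
        tenths_f = legacy_read_temp("F")
        return round((tenths_f / 10 - 32) * 5 / 9, 1)


sensor: Thermometer = LegacyThermometer()
print(sensor.read_celsius())
```

Callers depend only on the `Thermometer` interface, so the legacy routine can later be replaced without touching them.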
Any technology that enables the creation of new and better systems comes at a cost, and distributed object systems are no exception. Location can't be totally transparent in today's systems because of its timing implications. Calls to remote systems take roughly an order of magnitude longer than calls between processes on a single system, which in turn take roughly an order of magnitude longer than calls within a process. Thus, at the system architecture level, it's still important to partition functions with a high degree of cohesion into units that are colocated.
Because of the parallelism often induced in distributed
systems, more attention must be paid to resource allocation and
sharing. In addition, if not accounted for in the design, a failure
on one node can bring the entire system to a halt. If replication
is introduced to make the design more fail-safe, then issues of
data replication and degree of consistency come into play. Separation
of computation into multiple processes and machines and languages
also makes debugging more difficult.
Whether the benefits of distributed object systems
outweigh the difficulties they bring with them depends, of course,
on the situation. No technology is a substitute for careful
system design and engineering, but distributed object systems should
enable the creation of systems which were not easily possible before.
Distributed object system technology can provide a well-defined,
strong, extensible foundation for future Web technologies.
In support of these tenets are the efforts of the
Object Management Group (OMG) [3]. The OMG is a non-profit consortium
of over 600 developers and users, including many major forces
in the computing industry. It promotes the use of object technologies
for the development of distributed computing systems by developing
and standardizing common architectural frameworks for object-oriented
applications.
One of the OMG's specified frameworks is the Common
Object Request Broker Architecture (CORBA). It specifies a
foundational
infrastructure that lets objects operate with one another. Mappings
of these concepts to major programming languages are defined,
as are common communication protocols to be used by Object Request
Brokers (ORBs). This allows interactions to be independent of
the actual platforms and languages used to implement the objects.
Compliance with the CORBA standard makes objects portable and
interoperable across tools supplied by different
vendors. Additional OMG specifications, at various levels of
standardization, include Common Object Services (e.g., Naming, Events),
Common Facilities, and more.
Inter-Language Unification (ILU) from Xerox PARC [4]
builds on the CORBA distributed
object standard from the OMG, and offers a number of advantages
over other distributed object systems that make it particularly
attractive as a foundation for new Web technologies. Further
discussion of the use of ILU for the Web can be found in [5].
In spite of the fact that distributed object systems can provide
a foundation for Web technologies with a life far into the future,
one would be naïve to think they can replace the existing
services in one fell swoop. A means must be provided for a period
of migration to any new approach. During this period, the
distributed
object implementations of Web services must be accessible to existing
clients, and clients constructed with the new approach must be
able to interact with existing services.
For example, a Web browser
must be able to contact an object specified with a URL, and perform
operations such as HTTP's GET, HEAD, and POST operations [6].
Similarly, a client based on distributed objects must be able to accept a
URL and see it as an object that supports these operations.
Leveraging the capabilities of both existing Web services and distributed
object systems together, in a freely interactive manner, immensely
bolsters their usefulness.
Approaches to the integration of the Web with distributed
object systems have focused on bridging techniques such as [7],
and/or CGI replacement/enhancement, such as [8]. The approach described
in this paper is different in that it incorporates HTTP support
directly into the ORB as a general communication protocol, and
additionally defines a particular object type designed to allow
interaction with existing Web services.
In the remainder of this paper, the focus is on the
support provided for HTTP/1.0 in ILU. Other protocols, such as
FTP, can be addressed in a similar vein, as, presumably, can other
distributed object systems.
ILU support for HTTP/1.0 became available in ILU 2.0 alpha8.
The key to providing the desired integration lies
in the similarity between the URLs used in HTTP and the String
Binding Handles (SBHs) ILU uses to identify objects. An SBH specifies
everything that is needed to identify and contact a particular
object, just as a URL does for Web resources. Let's consider an
SBH as an example. A full SBH for an object in one of ILU's sample
programs is:
ilu:TimingTest.figtree.parc.xerox.com/0;ilu%3Ac0FGHuC8UdTAO+ETWz1Nl5RjWa3;sunrpc_2_0x61a78_1139713249@sunrpcrm=tcp_13.1.100.126_1588
The breakdown of this SBH is (where the initial
/, ; and @ are simply token dividers):
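As a rough aid to reading the example, the Python sketch below splits the SBH at the token dividers just mentioned. The labels it attaches to the pieces (`server_id`, `object_handle`, `type_id`, `protocol_info`, `transport_info`) are assumed names chosen for illustration, inferred from the shape of the example string rather than taken from the text.

```python
from urllib.parse import unquote

EXAMPLE_SBH = (
    "ilu:TimingTest.figtree.parc.xerox.com/0;"
    "ilu%3Ac0FGHuC8UdTAO+ETWz1Nl5RjWa3;"
    "sunrpc_2_0x61a78_1139713249@sunrpcrm=tcp_13.1.100.126_1588"
)


def split_sbh(sbh):
    """Split an SBH at its first '/', its ';' dividers, and its '@'.

    The dictionary keys are assumed labels, not names taken from the paper.
    """
    scheme, rest = sbh.split(":", 1)
    server_id, rest = rest.split("/", 1)
    object_handle, type_id, contact = rest.split(";", 2)
    protocol_info, transport_info = contact.split("@", 1)
    return {
        "scheme": scheme,
        "server_id": server_id,
        "object_handle": object_handle,
        "type_id": unquote(type_id),  # %3A decodes to ':'
        "protocol_info": protocol_info,
        "transport_info": transport_info,
    }


for label, value in split_sbh(EXAMPLE_SBH).items():
    print(f"{label:15} {value}")
```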
Now let's take a look at a URL used with HTTP. For
example, consider http://www.parc.xerox.com:80/index.html. Here we
have:
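The parts of this URL can be pulled apart with Python's standard `urllib.parse` module, as in the short sketch below. The `describe_url` helper and its dictionary keys are illustrative names only; they are not part of ILU.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Return the pieces of an HTTP URL that identify and locate a resource."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,       # e.g. "http"
        "host": parts.hostname,         # where the server lives
        "port": parts.port or 80,       # TCP port; 80 assumed when omitted
        "resource": parts.path or "/",  # identifies the resource on that server
    }


print(describe_url("http://www.parc.xerox.com:80/index.html"))
```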
A URL specifies the host, protocol, transport/port,
and resource identifier. These map to direct analogs in an SBH.
Consider what happens if, in ILU, we create an ilu_Server entity
on host www.parc.xerox.com at TCP port 80, that 'understands'
HTTP. Then any Web browser that tries to access the URL
http://www.parc.xerox.com:80/index.html will end up sending its
HTTP request to that ilu_Server. The ilu_Server can treat the
/index.html as an object identifier, and invoke the appropriate method
(GET, HEAD, or POST) on the object that has that identifier.
This method's implementation is determined by the programmer, and
different derived types of HTTP resources can have different method
implementations.
Methods might perform a simple task, such as packing a
file's contents into an entity body, emulating typical document
retrieval. Methods may instead perform complex computations to dynamically
determine response content, based on the composition of the request
and other outside information (e.g., previous requests, real-world values,
database content, interaction with other services, etc.).
The latter is what is typically done with current CGI approaches;
when using ILU, however, the method is efficiently invoked within the same process.
(Note that one possible method implementation could be calling out to existing CGI programs
when necessary. This is an example of the legacy code encapsulation mentioned previously.)
The return value from the call on the object can
then be packaged up in an HTTP response, and sent back to the Web
Browser. A natural question to ask at this point is "Doesn't
that mean that all the objects that could possibly be referenced
have to be loaded up and ready to go all the time?" The
answer is no. ILU allows functionality to be associated with
a server that will create/load objects dynamically as needed.
(A simple Web Server program, webserver, included in the
httest test suite mentioned below illustrates on-the-fly object creation.)
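The dispatch and on-demand creation described above might look roughly like the following Python sketch. It is not ILU code: `ObjectServer`, `TextResource`, `make_resource`, and the tuple returned by each method are hypothetical stand-ins for the ilu_Server, the resource objects, and the response record.

```python
class TextResource:
    """Toy resource whose GET returns fixed text as the entity body."""

    def __init__(self, text):
        self._body = text.encode("ascii")

    def GET(self, request):
        return 200, [("Content-Type", "text/plain")], self._body

    def HEAD(self, request):
        status, headers, body = self.GET(request)
        return status, headers + [("Content-Length", str(len(body)))], None


class ObjectServer:
    """Maps object identifiers (URL paths) to objects, creating them lazily."""

    def __init__(self, factory):
        self._factory = factory  # called the first time an identifier is seen
        self._objects = {}

    def dispatch(self, method, object_id, request=None):
        if method not in ("GET", "HEAD", "POST"):
            return 501, [], None
        obj = self._objects.get(object_id)
        if obj is None:
            obj = self._factory(object_id)  # returns None for unknown ids
            if obj is None:
                return 404, [], None
            self._objects[object_id] = obj
        handler = getattr(obj, method, None)
        if handler is None:
            return 501, [], None  # this object does not implement the method
        return handler(request)


created = []


def make_resource(object_id):
    """Hypothetical factory: build a resource the first time it is requested."""
    if not object_id.startswith("/"):
        return None
    created.append(object_id)
    return TextResource("hello from " + object_id)


server = ObjectServer(make_resource)
print(server.dispatch("GET", "/index.html"))
```

Nothing is loaded until a request names it, and a second request for the same identifier reuses the object created by the first.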
On the other hand, an ILU application
needs to be able to treat a resource being served up by an existing
Web server as if it were an object, with GET, HEAD, and POST methods.
In ILU, as in other distributed object systems, non-local objects
are represented by 'surrogate' objects (other systems may call
them 'proxies'). The job of a surrogate is to act as a stand-in
for the actual object - forwarding any calls it receives to
the actual remote object, and returning any results accordingly.
(Again, this is grossly oversimplified, but conceptually correct.)
So if the surrogate knows how to use HTTP as its communication protocol,
it can format up an HTTP request embodying the arguments passed to the call on
the object, send it over to the server, get back the response,
and package that up in the form that the caller expects. In ILU,
a function that creates a surrogate object from an ILU SBH has
been enhanced to also accept a URL and return the appropriate
type of surrogate. Now, an ILU application can think of the world
as objects, whether they be real objects in the distributed object
sense, or objects whose implementations are actually supplied
by existing HTTP servers.
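A minimal Python sketch of this idea follows: one function accepts either kind of reference and returns a matching surrogate. The names `surrogate_from_string`, `IluSurrogate`, and `HttpSurrogate` are invented for the example and are not ILU's API, and the standard `http.client` module used here issues HTTP/1.1 requests rather than HTTP/1.0.

```python
import http.client
from urllib.parse import urlsplit


class IluSurrogate:
    """Placeholder for a surrogate that would speak a native ILU protocol."""

    def __init__(self, sbh):
        self.sbh = sbh


class HttpSurrogate:
    """Stand-in for a resource served by an ordinary Web server."""

    def __init__(self, host, port, path):
        self.host, self.port, self.path = host, port, path

    def GET(self, headers=None, body=None):
        # Forward the call as an HTTP request and repackage the reply.
        conn = http.client.HTTPConnection(self.host, self.port, timeout=10)
        try:
            conn.request("GET", self.path, body=body, headers=headers or {})
            reply = conn.getresponse()
            return reply.status, reply.getheaders(), reply.read()
        finally:
            conn.close()


def surrogate_from_string(ref):
    """Return a surrogate for either an SBH ('ilu:...') or a URL ('http:...')."""
    scheme = ref.split(":", 1)[0].lower()
    if scheme == "ilu":
        return IluSurrogate(ref)
    if scheme == "http":
        parts = urlsplit(ref)
        return HttpSurrogate(parts.hostname, parts.port or 80, parts.path or "/")
    raise ValueError(f"unsupported object reference: {ref!r}")


doc = surrogate_from_string("http://www.parc.xerox.com:80/index.html")
print(type(doc).__name__, doc.host, doc.port, doc.path)
```

The caller holds an object in both cases; only the surrogate knows which protocol carries the call.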
A test suite called httest (one of the examples
that now comes with ILU) illustrates Web Browser-to-ILU, ILU-to-Web
Server, and general ILU-to-ILU over HTTP. The latter is of
interest not because of general object method invocation
with HTTP (there are more efficient protocols for this), but
rather because it allows general object interaction
to be carried on between systems separated by firewalls that
let HTTP pass through. A brief description of the httest
example and sample output are in the Appendix.
An ILU application that wishes to interact with an
existing Web resource must be able not only to get an object
(a surrogate, actually) representing the resource; it must also
have some means for specifying the HTTP headers and entity
body that should be sent with the request. Similarly, an ILU server
functioning as an HTTP-accessible Web resource must be able to
set status, header and entity body content.
Arbitrary programmer interpretations of these HTTP
components cannot be mapped into HTTP in general. A specific signature
is needed for the GET, HEAD, and POST methods, so that the ILU implementation
of the HTTP protocol knows how to map arguments into actual
HTTP format. In addition, we need a way to distinguish these
methods, which are meant to be used with existing Web services, from other
methods that may happen to have the same name but different
signatures.
We address this need by defining a specific base
object type with declarations for structuring
the arguments to, and return values from, the GET, HEAD, and POST
operations. Any GET, HEAD, or POST operation invoked on an object
that is an instance of this base type (or an instance of a type
derived directly or indirectly from this base type) has a particular
signature that ILU knows how to map
to HTTP. This base type is only a slight modification to the
type defined in the ILU-Requester work [9]. (The modification
is basically to omit the "connection" argument - this
sort of information can quite easily be contained in a normal
header name/value field.) An abbreviated definition of this type
is shown below using ILU's Interface Specification Language.
Thus, a method named GET, HEAD, or POST, invoked on
an object that is a direct or indirect instance of this type,
automatically has its Request and Response mapped to/from HTTP
in a manner compatible with existing Web services. The fairly
straightforward mapping is coarsely described in Table 1 below:
The implementation will automatically insert a Content-Length
header if possible and when necessary, and takes care of the colon
separators between header names and values. It will also deal
with older servers that sometimes omit the CR from the required
CRLF line termination.
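The two conveniences just mentioned can be sketched in Python as follows. This is an illustration under assumed names (`format_headers`, `parse_header_block`), not the ILU implementation: the first function adds the colon separators and a Content-Length header when the caller supplied a body but no length, and the second accepts header lines terminated by either CRLF or a bare LF.

```python
def format_headers(headers, body=None):
    """Serialize (name, value) pairs, adding Content-Length when needed."""
    names = {name.lower() for name, _ in headers}
    out = list(headers)
    if body is not None and "content-length" not in names:
        out.append(("Content-Length", str(len(body))))
    return "".join(f"{name}: {value}\r\n" for name, value in out)


def parse_header_block(raw):
    """Parse header lines, accepting either CRLF or bare LF terminators."""
    headers = []
    for line in raw.split("\n"):
        line = line.rstrip("\r")  # tolerate a missing or present CR
        if not line:
            break  # a blank line ends the header block
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return headers


print(format_headers([("User-Agent", "ILU-HTTP-Object-Client/1.0")], b"abc"))
print(parse_header_block("Server: HTTPS/0.96\nContent-type: text/plain\r\n\r\n"))
```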
For other situations, i.e., general ILU-to-ILU communication that
just happens to be occurring over HTTP, the mapping is still
consistent
with the HTTP protocol, but a more general format is used. ILU specific
information such as the ilu_Server ID is placed in a header, and
the marshaling of arguments is done entirely within the entity
body. In keeping with some idea of human readability, marshaled
arguments, with the exception of potentially huge byte-vectors,
are encoded as readable ASCII strings - e.g. 3.1416 encodes as
"3.1416". Readers concerned about efficiency should
note that for general ILU-ILU communication, another protocol
such as ONC RPC is a much better choice than the current HTTP
implementation. The HTTP protocol implementation could, however,
be easily changed to use a more efficient encoding, similar to
what's used in ONC RPC for example.
First, the implementation of HTTP in ILU currently supports only HTTP/1.0.
At the time of this writing, HTTP/1.1 was nearing the horizon,
and when actual implementations appear, the ILU support should
be extended as necessary to work with, and take advantage of,
the new version as appropriate.
Second, the C and C++ language mappings in the current ILU
implementation require
that the entire response be in memory before it can begin to be
passed across the wire. While this is not a problem in situations
involving relatively small
content, if many large messages are to
be processed, a significant copying overhead results. We therefore
need to explore approaches that accommodate such transfers by
allowing method arguments and return values to be indirectly
referenced (e.g., via pipes).
Finally, HTTP is only one of many protocols
in use on the Web. FTP, Gopher, News, etc.
also fit the fundamental pattern of performing operations
on distributed collections of resources. Just as
ILU was extended so HTTP-based resources could be viewed
as distributed objects and vice versa, ILU could be extended to
embrace these other protocols as well.
[1] W3Objects: Bringing Object-Oriented Technology
to the Web, David Ingham, Mark Little, Steve Caughey, Santosh
Shrivastava, The World Wide Web Journal, Issue 1, Dec 95, O'Reilly,
http://www.w3.org/pub/WWW/Journal/1/ingham.141/paper/141.html
[2] WWW and OOP, Dan Connolly, http://www.w3.org/pub/WWW/OOP/Activity.html
[3] Object Management Group Home Page,
http://www.omg.org/
[4] Inter-Language Unification, Xerox PARC,
ftp://parcftp.parc.xerox.com/pub/ilu/ilu.html
[5] Why ILU? -- /OOP and the Web, Dan Connolly,
http://www.w3.org/pub/WWW/OOP/WhyILU.html
[6] Hypertext Transfer Protocol -- HTTP/1.0, T.
Berners-Lee,
R. Fielding, H. Frystyk, RFC 1945, http://ds.internic.net/rfc/rfc1945.txt
[7] A Web of Distributed Objects, Owen Rees, Nigel
Edwards, Mark Madsen, Mike Beasley, Ashley McClenaghan, The World
Wide Web Journal, Issue 1, Dec 95, O'Reilly, http://www.w3.org/pub/WWW/Journal/1/rtor.085/paper/085.html
[8] CorbaWeb: A Generic Object Navigator, Philippe
Merle, Christophe Gransart, Jean-Marc Geib, Fifth International
World Wide Web Conference, May 6-10, 1996, Paris, France, http://www5conf.inria.fr/fich_html/papers/P33/Overview.html
[9] The ILU Requester: Object Services in HTTP Servers,
Paul Everitt, W3C Informational Draft 07-Mar-96, http://www.w3.org/pub/WWW/TR/WD-ilu-requestor
The httest example contains
2 programs, htserver and htclient, used to test and demonstrate the use
of the HTTP protocol within ILU. These programs show
The htserver program creates 2 objects;
Each object is serviced by its own ilu server
(that's just how it was written) and each object is also 'Published'
using ILU's simple publish and lookup functions.
Usage: htserver [port_number [HOSTNAME [verbose]]]
The htclient program accepts a URL as its first
argument. This is treated as an identifier for an http_Resource
object (which may reside inside an existing Web server). GET,
HEAD, and POST calls are made on this object and the results displayed.
(If the URL is literally NIL, then this test is skipped.)
If the second argument is provided, then htclient
assumes that it should call the GET, HEAD, and POST methods on the
httpderived_obj0 implemented by htserver, as well as call its
flipcase method using the argument as the argument to the method.
Results of these calls are displayed.
Usage: htclient HttpURL [[string_to_flipcase] [HOSTNAME]]
If your site requires use of proxy servers for access
outside your site, and if you wish to try running operations across
this 'firewall', then set the environment variable ILU_HTTP_PROXY_INFO
to be the hostname of your proxy server, followed by a colon (:)
and by the port number of the proxy (e.g.,
wwwproxy.my.site.com:8000).
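For readers who want to see the expected shape of this setting, the Python sketch below parses a value of the form hostname:port. The function `proxy_from_env` is a hypothetical helper written for this illustration; only the variable name ILU_HTTP_PROXY_INFO and the hostname:port format come from the description above.

```python
import os


def proxy_from_env(environ=os.environ):
    """Return (host, port) from ILU_HTTP_PROXY_INFO, or None when it is unset."""
    info = environ.get("ILU_HTTP_PROXY_INFO")
    if not info:
        return None
    host, sep, port = info.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected hostname:port, got {info!r}")
    return host, int(port)


print(proxy_from_env({"ILU_HTTP_PROXY_INFO": "wwwproxy.my.site.com:8000"}))
```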
Note: Only abbreviated output for GET operations
between ILU and existing services is shown. ILU automatically
adds Content-Length headers where required - these are not shown
in the program's output.
1. To illustrate ILU operating with / obtaining
an existing Web server document
2. To illustrate an existing Web Browser accessing
the htserver supplied object
3. To illustrate ILU interacting with ILU using
HTTP as the means of General Object Method Invocation
(*-------------------- Header related Types ----------------------- *)
TYPE field-name = ilu.CString; (* a header field-name *)
TYPE field-value = ilu.CString; (* a header field-value *)
TYPE optional-field-value = OPTIONAL field-value; (* value optional *)
TYPE Header = RECORD (* message header *)
name : field-name,
value : optional-field-value
END;
TYPE HTTPHeader = Header;
TYPE HTTPHeaders = SEQUENCE of HTTPHeader; (* all the headers *)
(* -------------------- Entity Body related Types ----------------- *)
TYPE EntityBody = SEQUENCE of BYTE; (* the entity body *)
TYPE OptionalEntityBody = OPTIONAL EntityBody; (* bodies optional *)
(* -------------------- Request URI related Types ----------------- *)
TYPE RequestURI = ilu.CString;
(* -------------------- Full Request Types ------------------------ *)
TYPE Request = RECORD (* 'mostly' a http full request *)
URI : RequestURI,
(* This can be the absoluteURI or abs_path uri - including params,
queries, etc. (if it's the full absoluteURI or abs_path, then the
scheme, netpath, and path portion of this should be http:, the
netpath should agree with the server id, and the path the same as
the object ID although this isn't checked), OR more commonly it can
be just the params, queries, e.g. ;foo;bar?zap *)
headers : HTTPHeaders,
(* the general, request and entity headers NOTE: if the user didn't
supply a Content-Length header, ILU's http will automatically put
in a Content-Length header if an Entity body is supplied. Note that
when responding to a HEAD method then (since there is no body) the
user should supply a Content-Length header. *)
body : OptionalEntityBody (* may or may not be a body *)
END;
(* -------------------- Response related Types -------------------- *)
TYPE StatusCode = ENUMERATION (* some possible status return codes *)
OK = 200,
Created = 201,
etc.
END;
TYPE Response = RECORD (* a http full response *)
status : StatusCode, (* status of servicing the request *)
headers : HTTPHeaders, (* general, response and entity headers *)
body : OptionalEntityBody (* may or may not be a body *)
END;
(* -------------------- Resource Object --------------------------- *)
TYPE Resource = OBJECT (* object that knows standard http methods *)
TYPEID Ilu_Http_1_0_resource_object
(* std. http 1.0 methods, each takes request, & returns response *)
METHODS
GET (request: Request) : Response,
HEAD (request: Request) : Response,
POST (request: Request) : Response
END;
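For readers more familiar with a general-purpose language, the record and object types above could be mirrored in Python roughly as follows. This is an analogy only, not output from an ILU stubber: the class layout, the lower-case field names, and the `HelloResource` example are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


@dataclass
class Header:
    name: str
    value: Optional[str] = None  # the value is optional, as in the ISL


class StatusCode(IntEnum):
    OK = 200
    CREATED = 201
    # further codes elided, as in the ISL listing


@dataclass
class Request:
    uri: str
    headers: List[Header] = field(default_factory=list)
    body: Optional[bytes] = None  # there may or may not be a body


@dataclass
class Response:
    status: StatusCode
    headers: List[Header] = field(default_factory=list)
    body: Optional[bytes] = None


class Resource:
    """Base type: each standard method takes a Request, returns a Response."""

    def GET(self, request: Request) -> Response:
        raise NotImplementedError

    def HEAD(self, request: Request) -> Response:
        raise NotImplementedError

    def POST(self, request: Request) -> Response:
        raise NotImplementedError


class HelloResource(Resource):
    """Hypothetical derived type that overrides only GET."""

    def GET(self, request: Request) -> Response:
        body = b"hello"
        return Response(StatusCode.OK, [Header("Content-Type", "text/plain")], body)


print(HelloResource().GET(Request(uri="/http_obj0")))
```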
Table 1: ILU HTTP Interface to HTTP Mapping

ILU HTTP Interface | HTTP Protocol
--- | ---
ILU Method Name | Method name in Request's Request-Line
(If using a Proxy server, scheme + location of object +) ILU Object ID + any params/queries present in the Request.URI field | Request-URI in Request's Request-Line
Request.headers | Request-Headers
Request.body | Entity-Body in Request
Response.status | Status-Code and Reason-Phrase in Response's Status-Line
Response.headers | Response-Headers
Response.body | Entity-Body in Response
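The mapping in Table 1 can be read as a pair of small conversion routines. The Python sketch below is one possible rendering under assumed names (`build_request`, `parse_response`, and their parameters are hypothetical); it omits the automatic Content-Length handling and the lenient line-ending handling that the ILU implementation provides.

```python
def build_request(method, object_id, uri_suffix="", headers=(), body=None,
                  proxy_prefix=""):
    """Turn a method name, object ID, and request fields into HTTP/1.0 bytes.

    proxy_prefix stands for the "scheme + location of object" part that
    Table 1 says is prepended when a proxy server is used.
    """
    request_uri = proxy_prefix + object_id + uri_suffix
    lines = [f"{method} {request_uri} HTTP/1.0"]          # Request-Line
    lines += [f"{name}: {value}" for name, value in headers]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + (body or b"")           # Entity-Body last


def parse_response(raw):
    """Split raw HTTP response bytes into (status, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    status = int(status_line.split(" ", 2)[1])            # Status-Code
    headers = [tuple(part.strip() for part in line.split(":", 1))
               for line in header_lines]
    return status, headers, body or None


print(build_request("GET", "/http_obj0",
                    headers=[("User-Agent", "ILU-HTTP-Object-Client/1.0")]))
print(parse_response(b"HTTP/1.0 200 OK\r\nContent-type: text/plain\r\n\r\nhi"))
```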
Directions
References
Appendix - httest Example - Description and Output
htserver overview
htclient overview
Use of Proxy Servers
Examples of Running
>htclient http://pundit.parc.xerox.com/simple.txt
[The Request sent to the Web Server]
---------------- Resource Test --------------------
Request: (Note all values are shown between >< s)
URI = >http://pundit.parc.xerox.com/simple.txt<
Number of headers = >1<
Header 0
field-name = >User-Agent<
optional-field-value = >ILU-HTTP-Object-Client/1.0<
Body is:
>Sample Request Body Bytes<
[The response from the Web server]
---------------------------------------------------
Calling GET on http_obj ---------------------------
Response: (Note all values are shown between >< s)
Status = >200<
Number of headers = >7<
Header 0
field-name = >Server<
optional-field-value = >HTTPS/0.96<
Header 1
field-name = >Allow<
optional-field-value = >GET HEAD POST<
Header 2
field-name = >MIME-version<
optional-field-value = >1.0<
Header 3
field-name = >Content-type<
optional-field-value = >text/plain<
Header 4
field-name = >Date<
optional-field-value = >Thursday, 18-Apr-96 4:12:27 GMT<
Header 5
field-name = >Last-modified<
optional-field-value = >Thursday, 18-Apr-96 4:20:9 GMT<
Header 6
field-name = >Content-length<
optional-field-value = >75<
Body is:
>This is the first line of simple.txt
This is the last line of simple.txt
<
[Begin the server program]
>htserver 80 pundit verbose
[Now a browser is asked to retrieve the URL
http://pundit.parc.xerox.com/http_obj0]
------------------------------------------
_server_http_Resource_GET called
Request: (Note all values are shown between >< s)
URI = >/http_obj0<
Number of headers = >4<
Header 0
field-name = >Connection<
optional-field-value = >Keep-Alive<
Header 1
field-name = >User-Agent<
optional-field-value = >Mozilla/2.0 (WinNT; I)<
Header 2
field-name = >Host<
optional-field-value = >pundit.parc.xerox.com<
Header 3
field-name = >Accept<
optional-field-value = >image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*<
Body is:
>NIL<
[The Browser's display now contains]
server_http_Resource_GET
htserver 2718 pundit t
htclient http://pundit.parc.xerox.com:2718/http_obj0 FlipMyCase
or to show raising an exception
htclient http://pundit.parc.xerox.com:2718/http_obj0 raiseerror