In HTML: XML Documents Are Objects! (or "Killing OO Softly With XML")
It looks like my last post was messed up a bit. Here's a text intro with the entire document attached in HTML. Sorry for the double post.

=======================================================================

"Wouldn't it be nice if one could simply tell an object to serialize to XML, and then deserialize back into an object?"

As programmers, do you long for the old days when data was data and code was code? Do you buy into the idea that the behavior associated with data should be embedded within the application so as to restrict reuse of that data? Ah, the good old days of relational databases! In its current usage XML is enabling you to revisit those days again... but don't be persuaded by the dark force! Put on your OO glasses and see the light!

Sure, XML provides incredible potential, and I am all for it. But in their current form, XML documents are nothing more than mobile semi-structured non-object databases (this is pretty cool, but not OO). Why is it that programmers have suddenly forgotten all about objects just so they could write XML? Is a return to relational databases that enticing? (Bleech!)

The only practical reasoning behind such an approach is that programmers want to keep their data private. They don't want other applications to have the ability to reuse that data. They accomplish this feat by embedding all of the code associated with that data (formerly called "behaviors" in the OO era) in their own applications. [Who's running this show anyway? Is XML some kind of conspiracy to kill OO?]

Here's a simple example. You write an application that converts unformatted poems into composite poem objects rich with behavior. You want to store these poems, and share them with other applications that want to do things with poems (whatever it is you do with poems). You define an XML structure and start generating XML documents as a means to store and share the poems.
Every application (including yours) that reads in your poems using an XML parser will see the poem as something similar to:

[This XML document was taken from an example accessible at the Microstar website (distributors of the AElfred XML parser). The file name is donne.xml. Below is the parse tree for this document.]

    root
     |-> Element
          |-> Element
          |    |-> Element
          |    |-> Element
          |    |-> Element
          |         |-> Element
          |-> Element
               |-> Element
               |-> Element
               |-> Element
               |-> Element

Pretty impressive, right? It sure doesn't look like a poem object, does it? Once this structure has been generated every single application will need to supply its own code to understand how to navigate and interpret this structure, and provide behavior for it. This is typical if you are a C programmer, but be clear, this isn't OO. And, while DOM takes us a bit farther, you still won't get the parser to produce a poem object and its poem-specific behaviors from the XML document (but we still want DOM!).

The process of generating XML strips the behavior out of the objects; or, saying it differently, XML and related standards do not describe a mechanism by which one can attach behavior to XML documents. The parser, in turn, cannot therefore work miracles when it reads the data (which are no longer objects) back into the application.

Or can it? Why can't we view XML as a serialized object representation? If we agree that this is not too far-fetched, then why can't parsers deserialize or objectify the objects contained in the XML documents, rather than simply handing us data and making the applications do all of the work? What if the parsers generated real classes (with behavior!) instead of generic Element classes? The poem above would instead look like this: (perhaps if we talked about XML documents as orders (or anything else) instead of poems it might be more motivating?)
    root
     |-> poem
          |-> front
          |    |-> title
          |    |-> author
          |    |-> revision-history
          |         |-> item
          |-> body
               |-> stanza
               |-> stanza
               |-> stanza
               |-> stanza

Oh, but could it be that simple? (The answer is "yes.") Would having a parser output objects with type-specific behavior be useful? (Hmm...) Would programmers really want to share their objects if they could? (The answer should be "yes.") Even if they didn't want to share their objects, or if nobody wanted their objects, why violate the principles of OO and make the programmers' lives more difficult?

Wouldn't it be nice if one could simply tell an object to serialize to XML, and then deserialize back into an object? With some VERY simple extensions to current parsers this can occur, and already has -- we've created an extended version of the Lark XML parser (http://www.textuality.com/Lark) which provides this capability. Our input to this extended parser is the XML document and the type-specific classes (like poem) extended with the basic ability to deserialize themselves. The details are described in the attached document. The enhanced version of Lark is freely available on request.

Paul.

--
********************************************************************
Paul Pazandak pazandak@o...
Object Services and Consulting, Inc. http://www.objs.com
Minneapolis, Minnesota 55420-5409
612-881-6498
********************************************************************

XML Documents Are Objects!
Killing OO Softly With XML
Introduction

XML documents are indeed objects, or at least they could be. If we simply associate behavior with the data structures defined within the XML documents we could have normal, living, breathing objects... like we're used to in the programming world. Instead of having the parser breathe life back into our objects, as part of deserializing or re-objectifying the object, we are forced to do this within our applications. Simply put, parsers aren't doing enough for us.

XML parsers currently support non-portable object specifications. While the XML documents themselves are portable by virtue of being written in XML, the objects represented by those documents cannot be objectified without an accompanying document-specific application which interacts with the parser.

Current XML parsers provide the ability to parse an XML document, and perhaps generate a generic object structure (parse tree) corresponding to the document. However, XML documents could potentially represent more than simple structured documents; they could describe complex objects with behavior. Common (simple) examples of XML documents include address lists. But making use of this information requires each application which desires to consume address lists to write parser-related code, as well as code to implement the behaviors of the address lists and their entries.

We propose a simple extension to parsers which would all but eliminate application-parser interaction and the need for document handlers (which do not migrate with the XML document), and would facilitate objectifying XML documents into type-specific objects (like we're used to having in the programming world) having all related behaviors intact.
Background

Current XML parsers generate generic parse trees (most do anyway). These trees represent the structure of the data that was parsed. But what is missing is the behavior associated with this data. While there are methods associated with the generic parse tree elements, these are not data-specific but rather generic methods (see the sample code). This approach places the burden on the application to deserialize the document back into objects using the generic calls and a lot of validating code. This is true of all current XML parsers (which support parse tree generation).

Once the XML document is parsed the information needs to be retrieved by the application, so it must access it from the parse tree (if one was generated -- see the note on problems with event-based parsing). In general, the consuming application may proceed in one of two ways to accomplish this:

1. The application will march down the structure, extracting and consuming the data as it goes. This requires making calls using the generic parse tree methods (parser-specific -- SAX doesn't support a parse tree API).

2. The application copies the data out of the generic parse tree into type-specific structures (e.g. Java objects) which contain type-specific definitions. The data is then accessed by the application using the type-specific API of these new structures.

An Example

Here's an example to illustrate this. This XML document was taken from an example accessible at the Microstar website (distributors of the AElfred XML parser). The file name is http://www.microstar.com/XML/donne.xml. When an XML parser generates a parse tree for this document, the resulting (informative) tree will look like the following in Lark (and similar in the other parsers as well):

    root
     |-> Element
          |-> Element
          |    |-> Element
          |    |-> Element
          |    |-> Element
          |         |-> Element
          |-> Element
               |-> Element
               |-> Element
               |-> Element
               |-> Element
The Element entries are the objects created by the XML parser corresponding to the Element Declarations in the XML document. To determine what each element is, the application must navigate the structure and inspect each Element object using a generic API. This requires that the knowledge of how to navigate the structure is embedded within the application. The interface of this object must be embedded within the application as well, which really violates the object-oriented paradigm -- yes, the data is stored in objects, but the associated type-specific behavior is stored someplace else.

While this may appear similar to how objects are serialized today (without code), the distinction is that any other application that wants to access this object will not have access to the code since it is buried in the application which created it. All other applications will have to provide their own code (this, again, is how applications for relational databases are written).

There are several other problems with this approach, not the least of which is that the application should not be responsible for doing this. Furthermore, the parser-related code required to walk a complex structure is complex itself (not quite as complex as code used for event-based parsing of complex structures, however), and is more difficult to maintain. Finally, the application is forced to do what the parser has already done, that is, understand and navigate the structure of the document. The parser has already gone through the entire document and generated a structured instantiation of objects.

The crux of the problem is that the parser generates generic objects, which forces all of this additional work on the application. Worse yet, there is no reason this has to occur -- nor does the (tree-generating) parser have to be significantly modified.
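
To make the navigation burden concrete, here is a minimal sketch of what an application has to write just to fetch a poem's title from a generic tree. It uses the JDK's standard DOM API as a stand-in for a parser-specific tree API such as Lark's, and the sample document, the class name GenericTreeWalk, and the helper names are assumptions made for illustration (the XML is not the actual content of donne.xml).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class GenericTreeWalk {
    // Illustrative stand-in for a poem document; NOT the contents of donne.xml.
    static final String POEM_XML =
        "<poem><front><title>Sample Title</title><author>Sample Author</author></front>"
      + "<body><stanza>First stanza</stanza><stanza>Second stanza</stanza></body></poem>";

    // Generic lookup: first child element with the given tag name, or null.
    static Element firstChild(Element parent, String name) {
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        return null;
    }

    // The application itself must know that the title lives at poem/front/title.
    static String titleOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element poem = doc.getDocumentElement();
            if (!poem.getNodeName().equals("poem")) {
                throw new IllegalArgumentException("not a poem document");
            }
            Element front = firstChild(poem, "front");
            Element title = (front == null) ? null : firstChild(front, "title");
            return (title == null) ? null : title.getTextContent();
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            // Wrap parser/IO failures so callers need no checked-exception handling.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titleOf(POEM_XML));
    }
}
```

Every consumer of poem documents would need its own copy of titleOf (and of every other poem behavior), which is the duplication the text objects to.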
Event-based parsing

An alternative to tree generation is simply to consume the structure on-the-fly as it is parsed. This requires writing an XML structure-specific handler (a document handler in SAX terms) which describes what should happen for each XML declaration that is encountered; no structure is automatically generated, so if objectification of the XML document is desired the handler is responsible for this. Using event-based parsing the application could adopt either of the above two approaches, the first being simple consumption and the latter which would cause the construction of some structure corresponding to the XML document. In both cases, at least for complex XML structures, there would be a lot of conditional segmented code which is more difficult to write and modify when changes in the XML structure occur. Using the extension proposed the majority of the work is done by the tree-generating parser, empowering the application to see XML documents as objects and alleviating their burden of using event-based parsing.

Granted, when an application will only encounter one kind of XML structure, event-based parsing might be a reasonable approach from the standpoint that only one handler would need to be written. But it still suffers from some of the same problems as generic parse tree generation (see the summary section).
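
The "conditional segmented code" mentioned above can be sketched with the JDK's SAX API. This is only an illustration under assumptions: the class name PoemHandler, its fields, and the choice of extracting the title and counting stanzas are hypothetical, not taken from the original handler code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical document-specific handler: the logic for one document type
// is spread across three callbacks, each branching on the element name.
class PoemHandler extends DefaultHandler {
    String title;       // set when </title> is reached
    int stanzaCount;    // incremented for every <stanza>
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("title")) {
            inTitle = true;
            text.setLength(0);
        } else if (qName.equals("stanza")) {
            stanzaCount++;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Character data may arrive in several chunks, so accumulate it.
        if (inTitle) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("title")) {
            title = text.toString();
            inTitle = false;
        }
    }

    // Convenience wrapper: parse a document held in a string.
    static PoemHandler parse(String xml) {
        try {
            PoemHandler handler = new PoemHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Each new element type of interest adds another branch to each callback, so a change in the XML structure touches several methods at once.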
XML Parsers Extended

What if the output of the parser was a type-specific structure which coincided with the definition of the structure in the XML document? And, what if the resulting objects contained the type-specific behavior for the specific element type parsed? What if the resulting parse tree for the example above instead looked like:

    root
     |-> poem
          |-> front
          |    |-> title
          |    |-> author
          |    |-> revision-history
          |         |-> item
          |-> body
               |-> stanza
               |-> stanza
               |-> stanza
               |-> stanza

where poem, front, body, title, author, revision-history, and stanza were all classes with type-specific behavior? Instead of writing something like the following to retrieve the title of the poem:

    Element front = null;

one could simply write:

    poem.getTitle();

More importantly, all of the behaviors that should be associated with each of these object types would be defined as part of the object interfaces themselves rather than embedded within the application.

Granted, an application can generate this same structure using the transformation / mapping technique above. However, this is partially a duplication of effort since it requires the application to navigate the structure generated by the parse tree, and then generate a new structure which mirrors the parse tree. The extension to Lark eliminates the need to do this because it instantiates the correct type-specific parse tree the first time. Note that this is an extension to Lark, and therefore applicable to any XML document.
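
As a rough sketch of what a type-specific tree class could look like, the example below defines a Poem class that extends a generic element class and exposes getTitle(). GenericElement is a hypothetical stand-in for a parser's generic element class (it is not Lark's actual Element API), and getStanzaCount() is an assumed extra behavior added for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parser's generic element class (not Lark's API).
class GenericElement {
    final String type;
    final List<GenericElement> children = new ArrayList<>();
    String text = "";

    GenericElement(String type) {
        this.type = type;
    }

    // Appends a child and returns this element so calls can be chained.
    GenericElement add(GenericElement child) {
        children.add(child);
        return this;
    }

    // Generic lookup: first child of the given element type, or null.
    GenericElement child(String childType) {
        for (GenericElement c : children) {
            if (c.type.equals(childType)) {
                return c;
            }
        }
        return null;
    }
}

// Type-specific element: knowledge of the poem layout lives in the class,
// so applications call getTitle() instead of walking the tree themselves.
class Poem extends GenericElement {
    Poem() {
        super("poem");
    }

    String getTitle() {
        GenericElement front = child("front");
        GenericElement title = (front == null) ? null : front.child("title");
        return (title == null) ? null : title.text;
    }

    int getStanzaCount() {
        GenericElement body = child("body");
        if (body == null) {
            return 0;
        }
        int count = 0;
        for (GenericElement c : body.children) {
            if (c.type.equals("stanza")) {
                count++;
            }
        }
        return count;
    }
}
```

Because Poem still extends the generic class, code written against the generic API keeps working, while poem-aware code gets the shorter, type-specific calls.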
Details

What occurs in the underlying implementation of an XML parser is rather straightforward. When it sees an XML element declaration, it instantiates a generic Element object (with Element only related methods). The extension to Lark simply extends the behavior of the parser so that instead of instantiating generic Element objects, it instantiates type-specific ones.

So when the parser encounters a new element declaration, it looks for a class declaration which identifies which class to instantiate in lieu of a generic Element class object (where it looks is described below). For example, when the parser identifies the "poem" element declaration, it looks for a class declaration for poem. If it finds one, it instantiates an object of that class rather than a generic Element object. The poem class extends the interface of the Lark Element class, but in addition, adds type-specific methods relevant to a poem object.

Within a type-specific parse tree class, like poem, is code which understands how to extract the parsed information. In effect, the object understands how to investigate itself. This code is provided by the object type creator. It will travel with the object as a means to facilitate re-objectifying the XML back into an object. This enables reuse of the object by any application. Of course, as stated above, the poem class will also provide a poem-specific interface.

A method I have added to the Element class is process(). It can be called once an element has been parsed. In each implementation, for example within the poem class, the process() code handles extracting the data from the inherited generic structures of the Element class. Alternatively, poem methods could simply be written that do this directly. But, it is important to note that the object itself is doing this, and further, that no other parse trees or duplicate structures are being constructed.

The location of the class declaration is not hard-coded. It could be within the XML file itself, in a DTD, in a stylesheet, or in a remote repository, for example. In addition, local class declarations may be used to override default class declarations. In the implementation of the Lark extension, I have simply embedded them in the DTD file along with the declaration of the structure of the XML file. In its current form the class declaration would look like the following for the poem example above, although there would be many ways to accomplish this:

    <!ENTITY Poem-Class "http://www.objs.com/xml/poem/com.objs.ia.specification.xml.poem">

The ClassSuffix is used to avoid possible naming collisions (which may be solved otherwise using the XML namespaces proposal). So, when a new element declaration is identified by Lark it inspects this list looking for an entry matching the pattern <element type><ClassSuffix>, or in the case of the poem element declaration, "Poem-Class".
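
The lookup-and-instantiate step described above might be sketched as follows. Everything here is an assumption made for illustration: ParsedElement, PoemElement, and ElementFactory are hypothetical names, the suffix value "-Class" is inferred from the "Poem-Class" entity name, and class declarations are modeled as in-memory constructor references rather than classes loaded from the URL in the entity declaration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical generic element with the process() hook the text describes.
class ParsedElement {
    String type;
    final List<ParsedElement> children = new ArrayList<>();

    // Called once the element has been fully parsed; generic elements do nothing.
    void process() { }
}

// Type-specific element: process() extracts data from the inherited generic structure.
class PoemElement extends ParsedElement {
    private int stanzas;

    @Override
    void process() {
        stanzas = 0;
        for (ParsedElement c : children) {
            if ("stanza".equals(c.type)) {
                stanzas++;
            }
        }
    }

    int getStanzaCount() {
        return stanzas;
    }
}

// Stand-in for the parser's instantiation step.
class ElementFactory {
    static final String CLASS_SUFFIX = "-Class"; // assumed suffix value

    private final Map<String, Supplier<? extends ParsedElement>> declarations = new HashMap<>();

    // Registers a class declaration under <element type><ClassSuffix>.
    void declare(String elementType, Supplier<? extends ParsedElement> constructor) {
        declarations.put(elementType + CLASS_SUFFIX, constructor);
    }

    // Instantiates the declared class if one exists, else a generic element.
    ParsedElement create(String elementType) {
        Supplier<? extends ParsedElement> constructor = declarations.get(elementType + CLASS_SUFFIX);
        ParsedElement element = (constructor == null) ? new ParsedElement() : constructor.get();
        element.type = elementType;
        return element;
    }
}
```

A caller would register declarations once, for example `factory.declare("poem", PoemElement::new)`, and elements without a declaration fall back to the generic class, which keeps the extension applicable to documents that declare only some of their types.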
Caveat: Language?

Is this a language-specific extension? Not really. The class declarations could be (for example) written in ActiveX I suppose, or even wrapped in CORBA, thereby enabling any language to take advantage of the idea of XML documents as objects. It would be up to the parser to find the correct class declaration and objectify accordingly.

Implementation Experience

My experience with XML parsers began last year. As part of a DARPA-funded project I am implementing an architecture to demonstrate scalable object service architectures. I started using event-based parsing as a means to import object service specifications. These XML specifications represent real (Java and CORBA) services that are invoked by the architecture.

I noticed that by adopting an event-based approach to parsing I would have to write a lot of code which would be difficult to maintain should I have changes in the future. In addition, this code would be hard for someone else to understand since each parser callback method would include conditional statements for several types of elements, and the code would be spread across several methods. I prefer a clean separation of code whenever possible, and this didn't seem very clean.

I decided that tree parsing was a more practical route. The parser would automatically generate a structure for me. But then I realized that I had to write all of the code to navigate this generic object structure, pull out the information I wanted, and then copy it into service specification objects having the behavior I wanted.

Since the parser was already generating classes, why not just tell it to generate the real classes to begin with? The classes themselves would handle deserialization. Sounds like OO to me! With modest changes to Lark, when it sees an XML service specification document it will generate service specification objects right away. This extension will work for any XML document which defines specializations of the Element class and makes them available to the parser. Besides asking Lark to parse the document, my application has no other parser-related code. Furthermore, any other application can use my XML service specification documents, and load them in as service specification objects with only a few lines of code.
Summary

In summary, an extension has been presented which extends the capabilities of Lark, but which could be applied to all tree-generating XML parsers. It enables type-specific composite object construction to occur within the parser which is a significant improvement over generic parse tree construction because:
If this proposed extension were adopted it would benefit significantly
from a standardization of the Element interface (something that
will happen with DOM). In this way, the associated class files would not
be parser-specific, and therefore any XML document could be objectified
by any tree-generating parser.
Status

I anticipate that the extensions I have made to Lark will be incorporated into a next version of Lark (I assume this from previous dialogues I have had with Tim Bray). If not, and in the meantime, the enhanced version of Lark is freely available on request.

References & Acknowledgements

Related work in this area is described in http://www.objs.com/OSA/wom.htm by Frank Manola, Object Services and Consulting, Inc. Thanks to Frank Manola (OBJS, Inc.) and Tim Bray (Textuality, Inc.) for their useful feedback.
This research is sponsored by the Defense Advanced Research Projects Agency and managed by the U.S. Army Research Laboratory under contract DAAL01-95-C-0112. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied of the Defense Advanced Research Projects Agency, U.S. Army Research Laboratory, or the United States Government. © Copyright 1998 Object Services and Consulting, Inc. Permission is granted to copy this document provided this copyright statement is retained in all copies. Disclaimer: OBJS does not warrant the accuracy or completeness of the information in this document.