Addressing the Enterprise: Why the Web needs Groves
I'm posting the first few sections of http://www.prescod.net/groves/shorttut/ I've rewritten a lot of it to demonstrate the importance to the Web. I am hoping to generate some discussion on XML-DEV and xlxp. -------------------------------------------------------------------------------- 1. Introduction This paper is a high level introduction to the grove paradigm. Just as SGML was a hidden jewel buried in among the ISO standards for screwdriver heads, groves are another well-kept secret. The time has come to make "groves for the Web". This document should be relevant to the people that would do the specifying and coding to make groves available on the Web, but also to technically-oriented managers that are not interested in the fine details. This document is intended to explain the importance of the grove paradigm to the It is intended to clarify in people's minds what the result of parsing an SGML or XML document should look like. Some variation on the grove model could be imagined, but the basics of the model seem fundamental and unavoidable to me: for instance, W3C's DOM reflects the same basic concepts. Groves were invented to solve the problems that had become revealed at a particular point in the development of the SGML family of standards. XML has reached the same point so the time is right to popularize the grove idea. Please send me your comments on this document. It will eventually become an ISOGEN technical paper, but it is still rough. -------------------------------------------------------------------------------- 2. Background 2.1 The Problem In the early and mid-1990s, the ISO groups that were responsible for the SGML family of standards realized that they had a large problem. The people working on the DSSSL and HyTime standards found that they had slightly different ideas of the abstract structure of an SGML document. Understanding an SGML document's structure is easy for simple things, but there are many issues that are quite complex. For instance, it is not clear whether comments should be available for a DSSSL spec. to work on, or whether they should be addressable by hyperlinks. It isn't clear whether it should be possible to address every character, or only non-contiguous spans of characters. Should it be possible to address and process tokens in an attribute value or only character spans? Should it be possible to address markup declarations? XLink and XSL must solve all of the same issues. Although this paper will discuss many problem domains, the reader should keep in mind that addressing is the central one. If you cannot address information (e.g. through a URL) then you cannot do anything else you need to it: such as retrieve it, bind methods to it, attach metadata to it, apply access control lists to it, render it, work with it in a programming language and so forth. Addressing is the key. Value follows naturally and immediately. The reason that addressing into XML (and other data formats) is ill-defined is because the XML specification speaks of the syntax of the XML language, not the abstract, addressible objects encoded in the document. Linking and processing are done in terms of some data model, not in terms of syntax. When you make a link between two elements, you are not linking in terms of the character positions of the start- and end-tags in an SGML or XML entity. You are linking in terms of abstract notions such as "element", "attributes" and "parse tree". The role of an XML parser is to throw away the syntax and rebuild the logical ("abstract") view. The role of a linking engine (such as a web browser) is to make links in terms of that logical view. The role of a stylesheet engine is to apply formatting in terms of that logical view. Unless stylesheet languages, text databases, formatting engines and editors share a view, processing will be unreliable and complicated. It is not very common for XML and SGML applications and toolkits to provide all of the information necessary for building many classes of sophisticated applications, such as editors. There is not even a standardized way for an toolkit to express what information from the SGML/XML document it will preserve. Even if two toolkits preserve exactly the same information, it is quite possible that they use different terminology to describe the information. In some cases, APIs might be identical except that they use different structures to organize the information! But those one or two features could make navigating the APIs very different. In the software engineering world we have a technique for avoiding this sort of problem: modelling. Using languages like the Unified Modelling Language (UML) we can build sophisticated, intricate models of the world that can be independently implemented and yet interoperable. I can hand a model of a human resources application to a developer on the other side of the planet and we can build logically compatible applications. Of course UML is at a very high level. The precise expression of an object in a particular programming language or system is not fixed by UML. The UML is a mathematical expression of the entities and relationships in a problem domain. It doesn't usually translate directly into code or APIs. That is why we also have to use more concrete object description languages such as IDL, ODL and STEP Express. 2.2 Why Not the DOM The W3C has partially addressed this situation with a specification called the Document Object Model (DOM). Unfortunately the DOM is not really an object model in the abstract sense. It is rather just a collection of IDL interfaces and some descriptions of how they relate. This is different from an abstract object model because it is too flexible in some places and not flexible enough in others. The DOM is too flexible in that it is not rigorous enough to be a basis for addressing. For instance the DOM says that a string of four characters could be broken up into multiple text nodes or treated as a single one. If we describe addresses in terms of DOM text nodes, those addresses will be interpreted differently by various DOM implementations. This is one reason that XPointer and XSL are not defined in terms of the DOM. This weakness of the DOM is fatal for using it for addressing but it is also annoying for programmers. In some cases they must write special code to work with documents that have different text breaking algorithms because the DOM has given implementors too much flexibility here. It puts their ease of implementation above the ease of coding for DOM users. In other ways te DOM is not flexible enough. One important weakness is that it is defined in IDL which does not permit much variation in language mappings and bindings. We have found this very limiting in the Python and Perl worlds. With these high level languages there are more convenient ways of mapping the high level XML concepts into APIs than the ways dictated by CORBA. If we use these ways instead of the DOM ways, however, our APIs are conformant to the DOM only in spirit, not in terms of the formal detail of the specification. 2.3 Defining views The DOM has a more important inflexibility. It would be useful for the programmer using the DOM to be able to define whether all adjacent text nodes are merged or not. There is a "normalize" method that attempts to provide this feature that method actually modifies the tree. All viewers must see the same view. Another useful view is one in which every character is a separate node. That view allows us to address individual characters very easily. Another view might provide DTD information for a document. Yet another view would provide linking information. Still another view would attach RDF properties to the DOM. We can also make views that are simpler than the default DOM view. We could have a view that got rid of CDATA nodes and treated them just as text. Another view might remove processing instructions based on the principle that many applications do not use them. It would also be very nice to be able to remove "insignificant" whitespace from a view. The W3C is working on a subset of XML to make XML easier to process for parsers but there is no such spec to make simpler DOMs for application writers. Let's take this back to the addressing realm for a second. Given all of these views of a document, we could do things like query for the node with the "author" RDF property with value "Paul" or for the node that is reference by a particular hyperlink or for the third character of an element and so forth. There is an important truth here. Every time we create a new specification built on XML, we implicitly define new properties that should be attached to the nodes: almost a whole new data model! Consider a document type based not only on XML but also on XLink, namespaces and RDF. That document has many different views. Here are some obvious ones: * First there is the low-level XML view. It would have elements, attributes, characters and so forth. * Then there is the namespace view built on top of that. A "namespace engine" would add some namespace information to the tree. It would probably hide namespace attributes that were visible in the lower view. * Then there is yet another view that adds hyperlinking information. The engine that provides this view can let us know whether a node is an anchor or a link. On top of that there could be a view built specifically for that document type. It would understand the constructs in the document type and make them available to a programmer as objects with properties. Let's step back for a minute again. If we can make all of these views available to the programmer in some simple, consistent way, then we could surely make them all available to people doing querying and addressing also. That means that we could make a query language that could do queries based on constructs from all four levels! We could also easily define query languages that were specialized and optimized for a particular level. The way we currently handle this is through different "levels" of the DOM. The second one is being worked on right now. These levels tend to lag behind the specifications that they are supposed to work with by months or years. There is a DOM for XML, HTML and CSS, but nothing for namespaces, RDF, XLink, XSL queries or XHTML. There is a single group within the W3C that will be responsible for building all of these "levels" of the DOM. This group of intelligent, well-meaning people is the most fundamental bottleneck in the standards world today. Even if all of the DOM people gave up their day jobs and became full-time DOM builders they could never keep up with the amount of innovation occurring within the W3C. Consider then, that the problem is in no way limited to the W3C. People are building little XML-based languages with their own data models all over the Web. A central API bottleneck is not inconvenient: it is impossible. The DOM cannot be a universal API for all XML-based languages. The XML Information Set project is similar to the DOM except that it works in terms of abstractions instead of APIs. That is an important first step. But the Information Set is designed only for a single view of XML with certain optional features. It does not seem at this time that it will be possible for "end-users" (programmers and query-writers) to tweak the views. It also does not seem that the model is designed to be extensible to completely new views. In other words it provides the very bottom layer but does not define the infrastructure to build the upper ones. In the ISO world we solve this problem by farming out the definition of data models. A "property set" is a formal model for a view of an XML document. A property set is half way between an abstract, unimplementable UML data model and a narrowly defined IDL definition. It speaks in terms of the higher level concepts that are the basis of hypermedia and so can be implemented conveniently in high level programming languages. The important thing about property sets is that they embed and embody the requirements necessary for a data model to be useful in a hypermedia context. That means that every node in the grove is addressable. It is easy to construct an address for any given node (for instance the character under a mouse click) or node list (e.g. a selected list of characters). 2.4 Hypermedia to the Max Now we get to the really exciting thing about property sets. You can build property sets for views of XML documents. Those are extremely useful and powerful. Even more powerful, though, are property sets for things that are not even XML. SQL databases and OLE objects can have property sets. LaTeX files can have property sets. People have defined experimental property sets for CSS, CGM and for something as abstract as legal documents. After all, a property set is just a simple data model. You could define UML models for all of those types. Defining a property set is no harder. But property sets have a huge benefit over UML: once you define a property set for a data object type, that data object becomes addressible. This means that every subcomponent of every data object in an enterprise is potentially addressable. The important point is that you do not have to convert all of your data resources into XML or HTML to make them addressable. You may need to turn them into XML or HTML to render or transfer them between machines, but there are many other things that we need to be able to do with addressable resources. We can attach access control lists to to them, make hyperlinks to them, attach metadata to them and so forth. Some "XML people" have this idea already but they express it in terms of "building a DOM" over some non-XML resource. The idea is right but the expression of the idea is wrong. The logic goes this way: We want an addressable data model for the resource. XML has a data model. Therefore let's use the XML data model for the resource. This model seems logical but it is inefficient both in terms of computer time and programmer time. If the underlying object is a relational database then it makes no sense to take numbers (for example) and encode them as strings so that the client application can unencode them back to numbers. Similarly, it makes no sense to turn a database record into an XML element so that the final application can think of it as a record again. If all you need to do is address a database record then what you want to do is the minimum required to turn a database record into something addressable. The grove model is designed so that defining a property set for the database is the minimum you have to do. In this case you can forget about XML altogether! In buzzword-speak this is "addressing the enterprise." Every data object in an organization from the smallest non-profit to the largest multinational can be addressed through a single data model and query language. You might also think of these as meta-models and meta-query languages in that the grove model and its associated query language give you a framework for defining the details of more precise models and richer query languages. Let me say again that rendering the document is another matter altogether. Given an address, the easiest way to render the object might be via XML. For a database record this might be the case. For a slide within a PowerPoint document, however, the easiest way to render it might be through OLE. Addressing is separate from rendering. Groves allow you to say that you want to see the slide. OLE or XML/XSL might provide the technology that you need to actually see the slide. Without groves, hypermedia addressing is very poorly defined. For instance, how do you, today, make a hyperlink to a particular frame in an MPEG movie, or a particular note in a midi sequence? How would you extract that information in a stylesheet (for instance for sequencing a multimedia hyperdocument). It makes no sense to address in terms of bytes, because often a single logical entity, like a frame, is actually spread across several bytes and they may not be contiguous. Addressing in terms of characters would make even less sense because MPEG movies and midi sequences are not character based. The web solves this problem by inventing a new "query language" (in the form of extensions to URLs called "fragment identifiers") for each data type. This more or less works, but it leads to a proliferation of similar, but incompatible query languages doing the same basic thing. These languages have different syntax and underlying models. This brings us to the next point: implementation. Under today's W3C way of doing things you would implement a hypermedia browser (e.g. SMIL) by hard-coding support for each different type of query for each type of playable object. If resources hyperlinked to each other through these fragment identifers, the implementation engine would have to implement separate query languages for each fragment identifier type. That is an annoying waste of time. Consider, the issue of metadata attached to parts of media objects through links. For instance I might add a title to an MPEG frame so that I could locate it later. Or I could add a pop-up-video style annotation. Usually this would be implemented as some sort of on-disk or in-memory database. In one column of a record you would have the properties that you want to attach (expressed somehow). On the other side you need to have things to attach them to. We need a generic term for "things that you can address within media objects." Generically, we could call these "nodes" and "node lists." As soon as you make that leap to describing the targets of references generically, regardless of media type, you have essentially reinvented groves. It follows that standards like RDF implicitly depend upon a (currently underdefined) concept similar to groves. Instead of reinventing them, however, you have the option of using an international standard that specifies them! I hope that one day there will also be a W3C standard that does something similar. More at: http://www.prescod.net/groves/shorttut/ -- Paul Prescod - ISOGEN Consulting Engineer speaking for only himself http://itrc.uwaterloo.ca/~papresco "Silence," wrote Melville, "is the only Voice of God." The assertion, like its subject, cuts both ways, negating and affirming, implying both absence and presence, offering us a choice; it's a line that the Society of American Atheists could put on its letterhead and the Society of Friends could silently endorse while waiting to be moved by the spirit to speak. - Listening for Silence by Mark Slouka, Apr. 1999, Harper's xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i... Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1 To (un)subscribe, mailto:majordomo@i... the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@i... the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@i...)
PURCHASE STYLUS STUDIO ONLINE TODAY!
Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!
Download The World's Best XML IDE!
Accelerate XML development with our award-winning XML IDE - Download a free trial today!
Subscribe in XML format