Subtyping in XML
Mike Champion wrote:
>
> 9/2/2002 8:02:30 PM, Paul Prescod <paul@p...> wrote:
>
> >> ...
> >> "Inheritance is a complex type's only advantage, but you really
> >> don't want to use it."
> >
> >Yep! After a year of off-and-on research I concluded that trying to
> >import OO-style inheritance to schemas was a bad idea.
>
> Paul, did you write up the results of that research anywhere?
> I looked for a link on www.prescod.net and didn't find any
> rants about schemas and type inheritance.

No; what is and is not feasible never became that clear in my own mind. Let me be more precise in my conclusions. Basically, after working with various approaches, it became clear to me that unifying the OO model (specifically inheritance) with the tree grammar (i.e. DTD) model is quite difficult and very easy to get wrong by accident. I've seen bits of the argument expressed by various people in messages over the years but never a single rant.

I know you didn't ask for justification, but having made the claim I feel I should back it up if anyone is interested (which it seems you are). Still, I'll be scattered, not organized. Let's see what I can find.

I first started thinking about this stuff around 1997, I guess:

* http://www.geocrawler.com/archives/3/318/1997/10/100/1765820/

The section on "Subclassing" suggests the gist of the problem. The core sentences are:

"This is a little bit of an inversion from OOP, because in OOP a subclass must accept any 'input' that a parent class can. We think of content models as "accepting input".

"Generally speaking, attributes seem more intrinsically amenable to concepts of subclassing than content, because they are "random access" in some sense, as are methods in OOP. Perhaps in adding OO features to SGML we will also choose to make attributes more powerful (for instance by allowing them to have content models and explicit substructure like elements)."

But we didn't make attributes more powerful. Instead XML Schema just added inheritance and ignored the potential problems. Whatever middling enthusiasm I have for the semantic web technologies derives from the fact that they provide a much cleaner basis for inheritance, extensibility and property-based data access (which is strongly associated with OO).

Let me try to summarize the problem with XML Schema inheritance this way: The defining characteristic of subtyping in OO languages is that if the subtype is properly designed it *will not break code* written for the supertype, whether the supertype predicted your extensions or not. This can be achieved with XML Schema inheritance only if the people writing code for the supertype practiced a high level of discipline.

In other words, subtyping "just works" for clients in OO languages. It takes (IMO) unacceptable levels of discipline in XML Schema. XML Schema is better than DCD was when it comes to inheritance, but it is still somewhat susceptible to the issue I discussed here:

* http://lists.xml.org/archives/xml-dev/199901/msg00517.html

No, you can't break client applications by extending a union in XML Schema, but let me give a trivial example of where you could break a naively created client application:

  section = title, para+

Standard, fairly naive, XSLT says:

  title -> chapter_title
  para -> paragraph

Output schema says:

  chapter_title, paragraph

Some yahoo (perhaps trying to crash your system) extends the section to:

  section = title, para+, title

The schema says, "Yeah, that's a valid extension" (despite the fact that it violates Liskov). The XSLT faithfully does its thing and returns:

  chapter_title, paragraph, chapter_title

Now you've tricked the app into generating bad data without violating the input schema. It is VERY DIFFICULT to trick an object oriented program in this way because the extension mechanism is based on named properties that *cannot* interfere with each other.

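The section example above can be reproduced in a few lines. This is only a minimal sketch, in Python rather than XSLT: the `RENAME` table stands in for the naive stylesheet, and the output content model is assumed to be `chapter_title, paragraph+` and approximated with a regular expression. The names `transform` and `conforms` are made up for the illustration.

```python
import re
import xml.etree.ElementTree as ET

# Stand-in for the naive stylesheet: every child element is renamed on its
# own, with no check on where it appears inside the section.
RENAME = {"title": "chapter_title", "para": "paragraph"}

# Assumed output content model "chapter_title, paragraph+", written as a
# regex over space-terminated element names.
OUTPUT_MODEL = re.compile(r"chapter_title (paragraph )+")


def transform(section_xml):
    """Return the output element names produced for one input section."""
    section = ET.fromstring(section_xml)
    return [RENAME[child.tag] for child in section]


def conforms(names):
    """Check a sequence of output element names against the output model."""
    return OUTPUT_MODEL.fullmatch("".join(name + " " for name in names)) is not None


# Instance of the original type: section = title, para+
base = "<section><title>T</title><para>p</para></section>"
# Instance of the extended type: section = title, para+, title
extended = "<section><title>T</title><para>p</para><title>X</title></section>"

print(transform(base), conforms(transform(base)))
print(transform(extended), conforms(transform(extended)))
```

The first call yields output that fits the output model; the second yields a trailing `chapter_title` that does not, even though nothing in the transform itself failed.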
More recently, Don Box has been edging towards the same ideas:

* http://www.gotdotnet.com/team/dbox/spoutlet.aspx?key=2002-07-29T13:10-08:00

"One of the features that really hooked my on XML Schema was derivation by extension and xsi:type. This mechanism worked very similar to the object marshaling and serialization world I had cut my teeth on, and for several years, I viewed the XML type system through these glasses. Obviously, as the years have passed, I've become slight more catholic in my views thanks to the influence of people like Allen Brown, Matthew Fuchs, Simon St. Laurent, and Martin Gudgin. Today, the top of my head blew off (yet again) while listening to Martin Gudgin giving a talk on XML Schema to my team. Specifically, while he was explaining some of the more esoteric aspects of derivation by restriction, I saw the light."

Dare says:

* http://lists.xml.org/archives/xml-dev/200206/msg00220.html

"With XQuery and XSLT one can attempt to process elements based on their XSD types but with xsi:type one can both restrict and extend these types in the instance document unbeknownst to the author of the processing code. At first glance it seems like both these mechanisms do not radically alter the content model in such a manner that carefully written type aware processors will be rendered ineffective. However until applications start getting built there probably is no sure way to tell if my fears are unfounded or not."

I think his fears are founded!

Henry Thompson says:

"Just as no-one would allow a mission-critical system involving validation to do so against a client-supplied DTD (despite the fact that, as you point out in your companion message, XML 1.0 _requires_ it to do so to be a conformant validating parser), but would instead use their own, just so anyone writing a mission-critical application involving schema-validity assessment will do so against their own schema and either write it to 'block' the use xsi:type wrt extension, or ignore any other schema hints, so any attempt to use foreign types will fail (both of these strategies _are_ allowed by W3C XML Schema)."

* http://lists.xml.org/archives/xml-dev/200206/msg00265.html

So basically he's saying that it isn't safe to allow the data provider to nominate a schema that uses inheritance to extend your schema. If you can't do that, then XML Schema inheritance is not really a mechanism for improving extensibility in XML. And XML's dirty little secret is that it isn't really that great at extensibility after all. "Extending" a document type can break applications, which is more or less what happens in binary formats too!

I'll repeat that this is the central issue that has got me looking past XML to the semantic web technologies. They aren't trying to patch extensibility into the model later. It's core from the start. I'm pretty confident that an application built around an RDF class can work with any subclass without any fear of violating Liskov. By definition, subtypes inherit constraints. Extensions can only be in dimensions that do not violate constraints. But XML Schema does not make this promise.

To give another example, I would expect in RDF that if a property had a maxOccurs of 10, it could not be expanded by the child. But in XML Schema, you can use derivation by extension to add elements at the end which will be collected by almost all naive processors.
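
Seen from the processor's side, the maxOccurs scenario might look like the sketch below. It is written in Python and models no schema at all: the element names `list`, `item` and `more` are hypothetical, and the derived type is imagined as appending a `more` element followed by further `item` elements after the base type's ten. Whether a given schema processor would accept that exact extension is an assumption, not something the code checks.

```python
import xml.etree.ElementTree as ET

# Limit the client author believes the base type guarantees.
MAX_ITEMS = 10


def build_instance(base_count, extra_count):
    """Build a <list> with base_count items, plus extras from a derived type."""
    root = ET.Element("list")
    for i in range(base_count):
        ET.SubElement(root, "item").text = str(i)
    if extra_count:
        # Hypothetical element introduced by the extension, followed by
        # additional items appended at the end of the content model.
        ET.SubElement(root, "more")
        for i in range(extra_count):
            ET.SubElement(root, "item").text = str(base_count + i)
    return root


def naive_items(root):
    """Collect every item child, trusting the schema to have capped them."""
    return [item.text for item in root.findall("item")]


def guarded_items(root):
    """Re-check the constraint by hand and stop at the base type's limit."""
    return naive_items(root)[:MAX_ITEMS]


extended = build_instance(10, 2)
print(len(naive_items(extended)))    # more items than the base type allowed
print(len(guarded_items(extended)))  # capped by the client itself
```

The guarded version works only because the client repeats a check that the base type was supposed to make unnecessary.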
The only way a smart processor can guard against this is to make sure to only process the first 10 elements. But a central goal of schemas should be to *relieve* applications of this kind of constraint-checking burden.

Basically, XML Schema inheritance is not in general a third-party safe extensibility mechanism for XML, and as I recall it, that's what it was supposed to be. If it isn't that, then its costs outweigh its benefits, in my opinion.

-- Paul Prescod