Re: Interoperability [long]
At 09:16 14/11/2001 -0800, Tim Bray wrote:
>[...]
>What kind of problems do you run across?

Gosh, where to start! I run into problems flowing XML from content creation, through editing, through processing, through content management, through delivery processing. I would identify these as the most irksome:

(Warning: what follows is a free-format attempt at documenting the problems I've been through in the last two weeks just getting some simple XML into a browser, validated, and in/out of some simple filtering programs.)

1) Round-tripping problems

Most of my XML processing is XML to XML processing. A variety of nasty things happen to things like entity refs, encodings, comments, CDATA sections etc. The usual stuff I get in a fluff about on this list.

2) Display problems

It's amazingly hard to get a good result rendering XML with CSS2. It's not that CSS2 isn't up to it; it is that things like attribute defaults, entity expansions etc. that you want to keep external to the instance go unnoticed by XML browsers that don't read the external DTD. This stuff is real important for things like qualified styles. You end up adding things to your instance that you would prefer to leave external just to get the content to display right.

3) Namespace problems

Back in the SGML days with things like Panorama (based on Synex Viewport) it was possible to get tabular display of arbitrary markup. In Opera 5, for example, you get tabular display by using the table model from the "http://www.w3.org/TR/REC-html40" namespace. But Opera don't read no external DTD, so I cannot do this:

  <!ATTLIST table xmlns CDATA #FIXED "http://www.w3.org/TR/REC-html40">

I must add the attribute to *every* instance of the table in my documents. Then my authors complain, saying "what the f&*k is this polluting my table markup".

Now, although I want to get tables for editing/browsing, I don't want to throw away DTD validation. DTDs don't support namespaces. Bummer.

One solution is to fix the prefix in the instance like this:

  xmlns:x="http://www.w3.org/TR/REC-html40"

and in the DTD like this:

  xmlns:x CDATA #FIXED "http://www.w3.org/TR/REC-html40"

Now I can validate but have hard-wired the prefix. Bummer. Could use parameter entities to avoid that, but then I scare my para-techs with a DTD that looks rather complicated with all those percents %allovertheplace; (I told them XML would be easy!)

I could just abandon validation. Don't like that option. Would end up coding too much data validation in business logic.

Could jump for a complete namespace-aware schema language. Don't like the sound of that. People way smarter than me are not even sure that XML Schema is implementable! BTW, have you guys on xml-dev seen the collateral damage in terms of complexity/readability to the SAX/DOM/XSLT programs that have become "namespace aware"?

Hey! I could add the FIXED attributes into the internal subset. Cannot find any documentation on what Opera might do with such an approach... I know for sure that my filter developers writing SAX filters that have handlers for startElement(), endElement(), pis() and characters() will be unhappy if I tell them they need to round-trip the stuff in the internal subset. In fact, I can tell you now for sure that that stuff will just get lost. I can hear the screams from the content manager now...

4) Locating DTDs

I want to put DTDs somewhere central. I don't want to lug them around each directory I have XML files in so that:

  <!DOCTYPE foo SYSTEM "foo.dtd">

works. I could use a full URI, but then I need HTTP running locally or live with the hit of pulling this stuff across an unreliable network. Not good. Could use SOCAT, but there is patchy support on the ground for this. So much for freely interchangeable tools.

Could separate the prolog completely from the instance so that entity A says <!DOCTYPE foo [ ... ]> and entity B says <foo>...</foo>. This worked great in the SGML days.
You could pass multiple system identifiers to NSGMLS, for example, and it would concatenate them prior to parsing. Not workable with XML because, although nothing in the XML 1.0 spec prohibits it, parsing tools don't like it. If I have to customize a parser to do this, management will look at me funny given all they have read about freely available parser tools out there!

5) Creating simple hypertext effects

The ball has been dropped on linking for years. This is not XML's fault, but it sure doesn't help in creating simple viewers for XML, which then reflects badly on XML. I have made it work in Opera 5, but it required proprietary CSS2 extensions and an incantation in the instance from a long-lost XLink WD:

  <xref xml:link="simple" show="replace" href="#preamble">

Again, because the browser ignores external subsets, I have to carry these attributes around on *every* xref in my instances even though they never change value. Bummer. I can't put them in the internal subset because the filter-writers will lose them in XML -> XML processing. Bummer.

6) Character encodings

I want to ensure that my documents do not use characters outside the ISO-8859-1 range. But I don't want to use an iso-8859-1 encoding declaration because parsers are not required to support it. But I also don't want to say UTF-8 (or leave it out and have it default to UTF-8) because then my authors can slip in some Kanji that will blow my downstream processes to pieces. What I want to say is "this document uses Unicode but just uses Unicode characters in such-and-such a range". How do I do that?

Oh, BTW, Opera and lots of other tools out there that call themselves XML compliant don't do Unicode. Worse, they silently don't do Unicode. You find these things out the hard way.

7) Browser/Editor styling

After two weeks I now have a CSS2 system for rendering some XML. Opera looks okay, but I have had to add kludgy stuff to my instances to make the links work and proprietary stuff to my stylesheet.
Even taking the proprietary stuff out of the stylesheet, I get varying degrees of crud in every CSS-compatible editing system. Again, not XML's fault per se, but being able to view the stuff easily is kinda important!

=====

That lot is just for starters. Gotta get back to work. Any suggestions on these issues greatly appreciated. Call me a fuddy-duddy, but simple stuff like this was simpler with the *complex* SGML standard than it is with the *simple* XML standard.

To return to the original spark of this, I believe that a significant part of the problem is that XML's definition is just syntax, and compliance with the syntax doesn't tell you a lot when it comes to tying components together into complete systems. This is because it does not give you any guidance as to what XML you will get *out* of a system - it just tells you that you can get your stuff *in* to the system. Great for vendors and consultants. Not so great for John Q. Webhacker. It doesn't give you any feel for what the XML-compliant system will do at layers above the syntax, which is where the rubber hits the road for interoperability.

Sean
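
On the character-range question in point 6 ("How do I do that?"): XML 1.0 itself has no way to declare "Unicode, but only this range", so one workaround is to check the range in a separate step before documents go downstream. The following is only a minimal sketch of that idea, not a recommendation from the post. It assumes Python's standard expat binding; the function name find_out_of_range and the 0xFF limit are illustrative choices. It checks character data and attribute values only (not element names, comments or PIs), and because it looks at what the parser reports, numeric character references such as &#x8A9E; are caught as well.

```python
import xml.parsers.expat

MAX_CODE_POINT = 0xFF  # top of the ISO-8859-1 range (assumed limit)


def find_out_of_range(xml_bytes, limit=MAX_CODE_POINT):
    """Return (line, char) pairs for every character above `limit`
    seen in character data or attribute values."""
    offenders = []
    parser = xml.parsers.expat.ParserCreate()

    def check(text):
        # Record each offending character with the parser's current line.
        for ch in text:
            if ord(ch) > limit:
                offenders.append((parser.CurrentLineNumber, ch))

    def start_element(name, attrs):
        # attrs is a dict of attribute name -> value.
        for value in attrs.values():
            check(value)

    parser.StartElementHandler = start_element
    parser.CharacterDataHandler = check
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return offenders


if __name__ == "__main__":
    sample = '<doc note="caf\u00e9">\u65e5\u672c</doc>'.encode("utf-8")
    for line, ch in find_out_of_range(sample):
        print("line %d: U+%04X is outside the allowed range" % (line, ord(ch)))
```

A check like this could run as a gate at check-in or at the head of a filter chain, so the instance can stay in UTF-8 while the range restriction is enforced outside the parser.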