Re: Interoperability [long]
At 10:31 AM 15/11/01 +0000, Sean McGrath wrote:

I apologize in advance for excerpting lots of Sean's excellent prose for the sake of context.

>(Warning: free format attempt at documenting the
>problems I've been through in the last two weeks
>just getting some simple XML into a browser, validated
>and in/out of some simple filtering programs, follows:)

BTW, "just getting some simple XML into a browser" is a dream that I've been talking up since 1997, but we ain't there yet. I'm impressed that you're getting this to work at all, and also that you're not getting shot down because people insist on using IE, which has effectively zero XML+CSS support - really irritating since it has pretty damn good HTML+CSS support.

Anyhow, it seems that a lot of the problems you're having boil down to: "Opera doesn't support XML 1.0 very well."

>1) Round-tripping problems
>Most of my XML processing is XML to XML processing.
>A variety of nasty things happen to things like entity refs,
>encodings, comments, cdata secs etc. The usual stuff
>I get in a fluff about on this list.

This is really interesting stuff. A bit more detail I think would be helpful to all of us here. I've never built an application that depends on preserving CDATA sections or comments downstream. I can see the problem - if you want to ship stuff from author to author, I imagine a certain amount of XML software will produce logically equivalent output while losing stuff that's important to the authoring process. E.g. Perl will, unless you go to some work to preserve it. How about recent Python? On encodings, see below.

>2) Display problems
>It's amazingly hard to get a good result rendering
>XML with CSS2.
>It's not that CSS2 isn't up to it, it is that
>things like attribute defaults, entity expansions etc. that you
>want to keep external to the instance go unnoticed by
>XML browsers that don't read the external DTD.
>This stuff is real important for things like qualified
>styles.
>You end up adding things to your instance that you
>would prefer to leave external just to get the content
>to display right.

For general-purpose browser applications, I have a hard time believing that they'll ever be willing to rely on downloading external entities to pull together a page display. That was the whole reason that Netscape & MS demanded (& got) the right for a processor to bypass external entities back in 1996. The explanation at the time was that the multithreaded parsing techniques they have to use to get acceptable page display performance simply did not allow for the possibility of having to go out inline and get arbitrarily-deeply-nested recursive external entity structures. So *for display only* I think we're kind of stuck with that one. For an *authoring* application it's clearly necessary to handle external entities including DTDs & other schema-ware.

>3) Namespace problems
>Back in the SGML days with things like Panorama (based
>on Synex Viewport) it was possible to get tabular display
>of arbitrary markup. In Opera 5 for example, you get
>tabular display by using the table model from the
>"http://www.w3.org/TR/REC-html40" namespace.
>
>But Opera don't read no external DTD, so I cannot do this:
><ATTLIST table
> xmlns CDATA #FIXED "http://www.w3.org/TR/REC-html40"
>
>I must add the attribute to *every* instance of the table in my documents.
>Then my authors complain saying "what the f&*k is this polluting
>my table markup".

Wouldn't it be OK to add this on the way out to the browser, so the authors don't have to see it? And it's worth mentioning that there were big problems with Synex and other packages in fetching DTDs and suchlike over the net. In fact the most successful such product, from EBT, took an XML-like everything-in-the-instance view if I recall correctly.

>Now although I want to get tables for editing/browsing I don't
>want to throw away DTD validation. DTDs don't support
>namespaces. Bummer.
>One solution is to fix the prefix
>in the instance like this:
> xmlns:x="http://www.w3.org/TR/REC-html40"
>
>and in the DTD like this:
> xmlns:x CDATA #FIXED "http://www.w3.org/TR/REC-html40"
>
>Now I can validate but have wired the prefix. Bummer. Could use
>parameter entities to avoid that but then I scare my para-techs with
>a DTD that looks rather complicated with all those percents
>%allovertheplace; (I told them XML would be easy!)
>
>I could just abandon validation. Don't like that option. Would end
>up coding too much data-validation in business logic. Could
>jump for a complete namespace aware schema language.
>Don't like the sound of that. People way smarter than me
>are not even sure that XML Schema is implementable!

This is a real problem. Validation is A Good Thing, although unlike you I never do it at run-time, just at design and authoring time. Namespaces are also A Good Thing [yes, I know some here disagree]. DTDs don't do enough - in particular they don't handle namespaces well - and we need something better. It's not clear we have it yet.

>Hey! I could add the FIXED attributes into the internal subset.
>Cannot find any documentation on what Opera might
>do with such an approach... I know for sure that my filter developers
>writing SAX filters that have handlers for startElement(), endElement(), pis()
>and characters() will be unhappy if I tell them they need
>to round-trip the stuff in the internal subset. In fact I can tell
>you now for sure that that stuff will just get lost. I can hear
>the screams from the content manager now...

If Opera doesn't respect the internal subset it's just broken and in fact doesn't handle XML. And I think it is in a minority of programs. Yes, you clearly have to do extra work to round-trip XML while keeping intact information that's important to authors. The DOM could have been a small fraction of its size if this hadn't been a problem.
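To make the round-tripping loss concrete, here is a minimal sketch, assuming Python's standard xml.sax with its default expat driver. Nothing in it comes from Sean's actual setup: the sample document, the CommentCollector class and the round_trip function are hypothetical names made up for illustration. It re-serializes a document from content events only, which is roughly what a filter built on startElement(), endElement() and characters() does, and separately records comments through the optional lexical-handler property.

```python
import io
import xml.sax
from xml.sax.handler import property_lexical_handler
from xml.sax.saxutils import XMLGenerator

# Hypothetical sample: an internal subset, a comment, an entity
# reference and a CDATA section, i.e. the things authors care about.
SOURCE = b"""<!DOCTYPE doc [
  <!ENTITY co "Example Corp">
]>
<doc><!-- note to editor --><p>&co;</p><p><![CDATA[a < b]]></p></doc>"""


class CommentCollector:
    """Lexical handler that records comments and ignores other lexical events."""

    def __init__(self):
        self.comments = []

    def comment(self, text):
        self.comments.append(text)

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass


def round_trip(source, lexical_handler=None):
    """Parse `source` and re-serialize it from content events only."""
    out = io.StringIO()
    parser = xml.sax.make_parser()
    # XMLGenerator only sees content events (elements, characters, PIs).
    parser.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    if lexical_handler is not None:
        # Extra work: ask the parser to report comments and CDATA boundaries.
        parser.setProperty(property_lexical_handler, lexical_handler)
    parser.parse(io.BytesIO(source))
    return out.getvalue()


collector = CommentCollector()
plain = round_trip(SOURCE, collector)
print(plain)               # no DOCTYPE, no comment, no CDATA markers
print(collector.comments)  # the comment is only visible to the lexical handler
```

The re-serialized text is logically equivalent (the entity is expanded and the CDATA content is escaped) but the authoring-level detail is gone unless the filter does something with the lexical events it collected. Writing those back out in the right places is the extra work in question.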
There's a tension between the large number of people sending around data who are unwilling to pay the extra cost and complexity required for authoring-support capabilities, and those who want to support authors and see this as part of the basic package. I think the answer is, easy things should be easy and complex things should be possible. Shipping around a structured document object with all the supporting material required for further authoring is *not* a simple process in any reasonable sense.

>4) Locating DTDs
>
>I want to put DTDs somewhere central. I don't want to lug them around
>each directory I have XML files in so that:
><!DOCTYPE foo SYSTEM "foo.dtd">
>works.
>
>I could use a full URI but then I need HTTP running locally or live with
>the hit of pulling this stuff across an unreliable network. Not good.

XML as written requires that you either use local URIs or rely on the network. I tend to prefer the latter, but then again, I don't fetch DTDs at runtime.

>Could use SOCAT but patchy support on the ground for this. So
>much for freely interchangeable tools.

There has *always* been patchy support for this. There has *never* been industry consensus at the implementation level for how to handle PUBLIC identifiers. If there had been, it would have been in XML 1.0. As I said, I think this is one of the big outstanding irritants and I'm surprised we've never actually managed to get some momentum behind one of the alternatives.

>5) Creating simple hypertext effects
>
>The ball has been dropped on linking for years. This is not XML's fault
>but it sure doesn't help creating simple viewers for XML, which
>then reflects badly on XML.

Yep. It is unforgivable that XLink, which shouldn't have been hard to specify, took so many years to get out the door. Not an XML problem, a politics/people/process problem.

>6) Character encodings
>
>I want to ensure that my documents do not use characters
>outside the ISO-8859-1 range.
>But I don't want to
>use an iso-8859-1 encoding declaration because parsers
>are not required to support it.

Every parser I've ever seen supports 8859-1. Is there a single counterexample? But <snicker> that doesn't help you though, because I can always put € (Euro) in my 8859-1 text. BTW Sean, how do you do Euros? SGML had SDATA entities, but they had poor interoperability and flaky product support.

Here's one area where SGML (kind of) wins. You could in theory limit the charset to 8859-1 in the SGML declaration. Mind you, I never heard of anyone ever doing this on a production basis... toolset problems? I guess the modern schema datatypes kind of allow you to do this via the regexp tools?

>Oh, BTW, Opera and lots of other tools out there that
>call themselves XML compliant, don't do Unicode. Worse, they
>silently don't do Unicode. You find these things out
>the hard way.

Then they're NOT XML TOOLS and this is NOT XML's FAULT. BTW, the browsers actually do a pretty good job in my experience. Hey Sean, let's name some names and put some pressure on the vendors.

>Call me a fuddy-duddy but simple stuff like this
>was simpler with the *complex* SGML standard
>than it is with the *simple* XML standard.

I'll certainly buy into the premise that SGML tools tend to be heavily authoring-focused. One reason is that in large part, all that ever happened with SGML was you authored it and then you printed it. The great virtue was you could still print it 10 years later... try that with MS Office.

>To return to the original spark of this, I believe that a significant
>part of the problem is that XML's definition is just syntax
>and compliance with the syntax doesn't tell you a lot
>when it comes to tying components together into complete
>systems.

You've pointed useful fingers at some gaps in our tool repertoire, particularly in the authoring-support and content-management spaces.
It's not obvious to me that a focus on structure rather than syntax would really be that important in fixing these problems. And I stand by my claim, based only on my personal experience, that in heterogeneous distributed environments, it's easier to agree on syntax than on data structures. And way more robust. Clearly there are those who have different experiences.

-Tim