Re: The privilege of XML parsing - Data types,binary XML and X
On Sat, 07 Dec 2002 08:02:03 +0000, Sean McGrath <sean.mcgrath@p...> wrote:

> I've given a lot of thought recently to what it is about data typing
> in XML and Binary XML that makes me so nervous. What follows is my
> most concerted attempt at articulating what causes me to be
> so nervous an a suggestion for how we might proceed.

Sheesh, you're insightful. You ought to write a book :-)

> Simply put, there is nothing wrong with Binary XML within the confines
> of an application. It is a very useful optimization which can and
> should be treated as a "compiler". You would never throw away your
> source code having passed it through a compiler. The same should be
> the case with your XML. It is the portable representation of your data
> just like the source files are the portable version of your machine
> code.

I think that's a great analogy. Some alternative syntaxes for "XML infoset serialization" (not to be confused with "XML" of course) would exist in addition to the XML source code, for convenience/efficiency. They wouldn't in any sense replace the source code; "losing the source code" would be a disaster, and failing to provide the source code on demand would be a horrible breach of interop etiquette.

> If they end
> up using strongly typed "compiled" XML to get around this, they will have
> tightly bound their XML to their process which is a bad thing.

I agree with the "typed" bit, and I completely agree that tightly binding application-specific datatypes to shared data puts us back in the Bad Old Days.

> Standardized, marshallings of XML (XML infoset compilers) for Java, .NET
> etc.
> need to be done so that the notion of binary XML is both catered for
> and COMPREHENSIVELY RELEGATED to the realm of "compiled"
> output. Something you just use for optimization reasons but NEVER use
> as primary storage for your data.

Hmm, I have some minor quibbles, or perhaps need clarification.
I do see some *potential* reasons for a standardized "efficient" Infoset interchange format that doesn't include platform-specific binary formats or application-specific datatypes. To extend the source code / object code analogy, it might be thought of as P-code. It would simply serialize an infoset in a way that makes it significantly easier to parse than the UnicodeWithAngleBrackets we know and love. Basically, one might start from the XPath data model (or some successor) and come up with a vendor/platform/language-neutral format that serializes it in a way that is faster to parse, based on actual experience of where XML parsers spend their time.

I don't have any solutions to propose, but some problems that such a P-code might solve could include:

- The inefficiency of resolving namespaces.
- Normalizing Unicode characters.
- Resolving entities, CDATA sections, and other syntax sugar. (I'm not sure what to do about < > & but I suspect that there are creative solutions possible.)
- Buffer management, string rewriting, object creation. I know that these are significant bottlenecks in most parsers. I don't know offhand how a serialization format could help break them, but I know that this is a big reason that people who write high-performance SOAP processors drew a line in the sand and refused to allow DTDs and all the cruft they bring along into SOAP messages. Maybe a P-code that pre-resolves the "cruft" might be parsable significantly faster than XML can be parsed.

Sorry to go on so long with this possibly ill-founded brainstorm, but this is the kind of thing I think many advocates of "binary" XML are talking about, and it is not necessarily tied up with application-specific datatypes or platform-specific numeric formats.

> I suggest we make one core twist to XML. Lets express the various layers
> to XML parsing in terms of a pipeline and see if it can help
> us accommodate the date typing folk, the binary XML folk etc. without
> throwing out the baby with the bathwater.

Yes! I think that the "pipeline processing" metaphor could provide a Gestalt shift that puts more of the stuff that seems to divide the XML community into a common set of tools/operations; the different communities use a different set of these, or combine them in different ways, without getting in each other's way.
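To make the brainstorm above a little more concrete, here is a rough Python sketch of a pipeline whose last stage emits a pre-resolved, P-code-like stream. Everything specific in it is an assumption made up for illustration and not something proposed in this thread: the event tuples, the function names, and the record layout (one opcode byte, one field-count byte, then length-prefixed UTF-8 strings). It leans on the standard library's ElementTree to do the namespace, entity, and CDATA resolution once, up front, and says nothing about whether reading such a stream would actually be faster than parsing XML.

```python
import struct
import unicodedata
import xml.etree.ElementTree as ET


def resolved_events(xml_text):
    """Stage 1: parse the XML source once and yield pre-resolved events.

    ElementTree expands namespace prefixes to '{uri}local' names and folds
    CDATA sections and entity references into plain text.
    """
    def walk(elem):
        yield ("S", elem.tag)                      # start element
        for name, value in elem.attrib.items():
            yield ("A", name, value)               # attribute
        if elem.text:
            yield ("T", elem.text)                 # character data
        for child in elem:
            yield from walk(child)
            if child.tail:
                yield ("T", child.tail)
        yield ("E",)                               # end element

    yield from walk(ET.fromstring(xml_text))


def normalize_unicode(events):
    """Stage 2: put every string field into Unicode NFC form."""
    for op, *fields in events:
        yield (op, *(unicodedata.normalize("NFC", f) for f in fields))


def drop_whitespace(events):
    """Stage 3 (optional): discard whitespace-only text events."""
    for op, *fields in events:
        if op == "T" and not fields[0].strip():
            continue
        yield (op, *fields)


def pipeline(xml_text):
    """Compose the stages; other communities could combine them differently."""
    return drop_whitespace(normalize_unicode(resolved_events(xml_text)))


def encode(events):
    """Serialize events as hypothetical P-code records (layout is invented)."""
    out = bytearray()
    for op, *fields in events:
        out += op.encode("ascii") + bytes([len(fields)])
        for field in fields:
            data = field.encode("utf-8")
            out += struct.pack(">I", len(data)) + data
    return bytes(out)


def decode(blob):
    """Read the records back; no escaping, prefix lookup, or entity handling."""
    pos = 0
    while pos < len(blob):
        op, count = chr(blob[pos]), blob[pos + 1]
        pos += 2
        fields = []
        for _ in range(count):
            (size,) = struct.unpack_from(">I", blob, pos)
            pos += 4
            fields.append(blob[pos:pos + size].decode("utf-8"))
            pos += size
        yield (op, *fields)


if __name__ == "__main__":
    source = ('<a:order xmlns:a="urn:example:shop" id="7">'
              '<a:note><![CDATA[x < y]]> &amp; more</a:note></a:order>')
    blob = encode(pipeline(source))
    for event in decode(blob):
        print(event)
```

Because the strings are length-prefixed, characters such as < > & need no escaping in the encoded form, which is one possible answer to the question raised in the list above. The XML text remains the "source code"; the encoded bytes are only a derived artifact that can be regenerated from it at any time.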