Re: Streaming XML and SAX
Tom Harding writes:

> How? You would doubtless agree that mandating a specific encoding
> for all streams sidesteps one of the major benefits of XML.
> Introducing an encoding declaration mechanism into the transport
> protocol, as HTTP does, would duplicate the function of the XML
> processor.

Here's a short excerpt from the non-normative Appendix F of the XML 1.0 Recommendation:

  The second possible case occurs when the XML entity is accompanied by encoding information, as in some file systems and some network protocols. When multiple sources of information are available, their relative priority and the preferred method of handling conflict should be specified as part of the higher-level protocol used to deliver XML. Rules for the relative priority of the internal label and the MIME-type label in an external header, for example, should be part of the RFC document defining the text/xml and application/xml MIME types. In the interests of interoperability, however, the following rules are recommended.

  - If an XML entity is in a file, the Byte-Order Mark and encoding-declaration PI are used (if present) to determine the character encoding. All other heuristics and sources of information are solely for error recovery.

  - If an XML entity is delivered with a MIME type of text/xml, then the charset parameter on the MIME type determines the character encoding method; all other heuristics and sources of information are solely for error recovery.

  - If an XML entity is delivered with a MIME type of application/xml, then the Byte-Order Mark and encoding-declaration PI are used (if present) to determine the character encoding. All other heuristics and sources of information are solely for error recovery.

  These rules apply only in the absence of protocol-level documentation; in particular, when the MIME types text/xml and application/xml are defined, the recommendations of the relevant RFC will supersede these rules.
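As a rough illustration, the three recommended rules quoted above might be sketched in Java along the following lines. The class and method names (`EncodingRules`, `choose`) are hypothetical, and several choices are my own assumptions rather than anything in Appendix F: a `null` MIME type stands for "entity read from a file", a text/xml entity with no charset parameter falls through to the Byte-Order Mark and declaration as error recovery, and the declaration sniffing only handles ASCII-compatible encodings.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: picks an encoding name from the available evidence.
class EncodingRules {
    // Matches encoding="..." in an XML declaration at the very start of the entity.
    private static final Pattern DECL = Pattern.compile(
        "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /**
     * mimeType == null is this sketch's convention for "read from a file".
     * head holds the first bytes of the entity.
     */
    public static String choose(String mimeType, String charsetParam, byte[] head) {
        // text/xml: the charset parameter on the MIME type decides.
        if ("text/xml".equalsIgnoreCase(mimeType) && charsetParam != null) {
            return charsetParam;
        }
        // Files and application/xml: Byte-Order Mark, then encoding declaration.
        // (Assumption: text/xml without a charset also lands here as error recovery.)
        String fromBom = sniffBom(head);
        if (fromBom != null) {
            return fromBom;
        }
        String fromDecl = sniffDeclaration(head);
        if (fromDecl != null) {
            return fromDecl;
        }
        // XML 1.0 default when there is neither a BOM nor a declaration.
        return "UTF-8";
    }

    private static String sniffBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2) {
            int b0 = b[0] & 0xFF;
            int b1 = b[1] & 0xFF;
            if ((b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE)) {
                return "UTF-16";
            }
        }
        return null;
    }

    private static String sniffDeclaration(byte[] b) {
        // ISO-8859-1 maps each byte to one char, so ASCII text survives intact.
        String ascii = new String(b, StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(ascii);
        return m.find() ? m.group(1) : null;
    }
}
```

With this sketch, an entity that declares ISO-8859-1 internally but arrives as text/xml with charset=UTF-8 would be read as UTF-8, while the same bytes delivered as application/xml would be read as ISO-8859-1.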
If I were defining a streaming protocol for e-commerce, news, financial markets, etc., I probably would mandate a single encoding for all packets (UTF-8 or UTF-16), just to keep things simple. As you can see in the above excerpt, the character-set discovery heuristics in XML are intended for use only in the absence of protocol-specific encoding information.

<snip/>

> It's amazing how two people can see things so differently. I think
> it's supremely elegant that only the XML processor needs to look at
> data coming off the wire. It's also as efficient as it gets.

It is efficient only if you know for certain that you need to use a single object model for all of the XML information that you're receiving; otherwise, you'll end up building a generic object model (like a DOM), then tearing it down to build an optimised domain-specific one (such as a vector graphic or a financial-transaction object), and that process would be painful.

> course the software architecture that handles the documents emitted
> must be modular and extensible, but the task of parsing is done.

Parsing is relatively easy (though it's wasteful to do it twice); building an object model from the parsing is time- and resource-consuming. For example, imagine that I have a Java class like this:

  public class Purchase {
    public int seqno;
    public int customerId;
    public int vendorId;
    public int invoiceId;
    public float total;
  }

In XML, an instance of this information might look like this:

  <purchase xmlns="http://www.ecommerce.net/ns/ec/">
    <seqno>12345678</seqno>
    <customer-id>87654321</customer-id>
    <vendor-id>18273645</vendor-id>
    <invoice-id>81726354</invoice-id>
    <total>92674.12</total>
  </purchase>

Based on my (limited) understanding of the Java VM, the Java version of a Purchase object will require 24 bytes of storage; I'd guess that even a heavily-optimised generic DOM implementation would require at least 5-10 times as much storage (I'll welcome corrections from any DOM implementors on this list).
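To make the "straight from the XML to my own object model" route concrete, here is a minimal sketch of a SAX handler that fills in Purchase objects as the events arrive, without building any intermediate tree. It assumes the namespace-aware SAX2 and JAXP APIs that ship with a current JDK; the handler class `PurchaseHandler` and its `parse` helper are hypothetical names introduced for illustration, and the Purchase class is repeated (non-public) only so the block compiles on its own.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class Purchase {
    public int seqno;
    public int customerId;
    public int vendorId;
    public int invoiceId;
    public float total;
}

// Hypothetical handler: maps SAX events directly onto Purchase fields.
class PurchaseHandler extends DefaultHandler {
    static final String NS = "http://www.ecommerce.net/ns/ec/";

    final List<Purchase> purchases = new ArrayList<>();
    private Purchase current;
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for this element
        if (NS.equals(uri) && "purchase".equals(localName)) {
            current = new Purchase();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null || !NS.equals(uri)) {
            return;
        }
        String value = text.toString().trim();
        switch (localName) {
            case "seqno": current.seqno = Integer.parseInt(value); break;
            case "customer-id": current.customerId = Integer.parseInt(value); break;
            case "vendor-id": current.vendorId = Integer.parseInt(value); break;
            case "invoice-id": current.invoiceId = Integer.parseInt(value); break;
            case "total": current.total = Float.parseFloat(value); break;
            case "purchase": purchases.add(current); current = null; break;
            default: break; // ignore elements this sketch does not know about
        }
    }

    /** Parses a document held in a string and returns the purchases found in it. */
    static List<Purchase> parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            PurchaseHandler handler = new PurchaseHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.purchases;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse purchase document", e);
        }
    }
}
```

The only per-purchase allocations that outlive the parse are the Purchase objects themselves, which is the point of the comparison with a generic DOM.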
In other words, if I go straight from the XML to my own object model, I can store 100,000 purchases in 2,400,000 bytes of storage; if I go from XML to a generic DOM object model, I will require between 12,000,000 and 24,000,000 (or more) bytes to store the same information, and then I will *still* have to build my own object model afterwards.


All the best,


David

-- 
David Megginson                 david@m...
           http://www.megginson.com/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)






