RE: Faster processing without schemas? (was Re: Micr
Michael Champion wrote:

> Generic DBMS and middleware (ahem, the payers of my salary)
> can't in general efficiently know the schema of everything
> flowing in and out, so requiring schema knowledge is a
> showstopper for me.

As I understand your position, you are willing to accept more than one encoding if:

1. there are a small number of widely supported serialization standards
2. XML text is mandated as the fallback in content negotiation
3. a priori schema knowledge is not required.

The ASN.1 defined binary encodings do not conflict with the first two requirements. The "issue" with ASN.1 defined encodings would be around the question of schema knowledge (i.e. item 3). As is well known, ASN.1 based systems typically do require that both sides of a link share knowledge of a common schema. However, this is more an attribute of the way that ASN.1 is used rather than the system itself. In the past, ASN.1 has usually been used in situations where shared knowledge of schemas was not only considered reasonable but often was considered desirable...

However, one can easily produce a single ASN.1 schema that is capable of encoding any XML data in such a way that the original XML can be reconstructed without reference to any other schema. In other words, one can easily use ASN.1 to define an equivalent of the encoding discussed in Dennis M. Sosnoski's presentation to the Binary XML Workshop.

Sosnoski's XBIS appears to be a serialization of a SAX2 event stream coupled with a symbol table that allows compression of strings used more than once. (i.e. strings are replaced by compact "handles" which are indexes into the symbol table.) The same can be described quite easily in ASN.1.
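As a rough illustration of the handle idea, here is a minimal Python sketch. It is not XBIS's actual wire format and not an ASN.1 definition; the event tuples and the names `encode` and `decode` are hypothetical, chosen only for this example. Each string is written out once, at first use, and every later occurrence is replaced by its index in the symbol table.

```python
# Hypothetical sketch: the event format and function names are assumptions
# made for illustration, not part of XBIS or of any ASN.1 encoding.

def encode(events):
    """Emit ("def", text) the first time a string is seen and
    ("ref", handle) for every later occurrence."""
    table = {}  # string -> handle (its index in the symbol table)
    out = []
    for kind, *strings in events:
        encoded = []
        for s in strings:
            if s in table:
                encoded.append(("ref", table[s]))
            else:
                table[s] = len(table)
                encoded.append(("def", s))
        out.append((kind, encoded))
    return out


def decode(stream):
    """Rebuild the original events, growing the symbol table as
    in-line definitions are encountered."""
    table = []  # handle -> string
    events = []
    for kind, encoded in stream:
        strings = []
        for tag, value in encoded:
            if tag == "def":
                table.append(value)
                strings.append(value)
            else:
                strings.append(table[value])
        events.append((kind, *strings))
    return events


# A tiny SAX-like event stream with a repeated element name.
events = [
    ("start", "item"), ("text", "first"), ("end", "item"),
    ("start", "item"), ("text", "second"), ("end", "item"),
]
stream = encode(events)
restored = decode(stream)
```

In this sketch the symbols are defined in-line at first use, so a decoder can rebuild the table in a single pass. Gathering them into a separate table would be an equally valid layout.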
In fact, I believe that an ASN.1 based encoding would have additional benefits in the case where the encoder (but *not* necessarily the decoder) had access to a user generated schema, since the ASN.1 encoder would then be able to replace many text nodes with integer or other binary representations that are more compact than text. Such compression of text by substituting binary equivalents is not supported in Sosnoski's proposal.

The method of providing a symbol table or "directory" within an encoding in order to achieve compression is something that has been done in the past with ASN.1 schemas. For instance, I remember a word processor at Digital that had very large encodings due to the fact that "rulers" and other similar large structural elements needed to be referenced frequently within a file. Rather than restating these large objects whenever they were referred to, the solution was to list the "rulers" in a "table" and just refer to them by their IDs in the actual document. This is conceptually exactly what is done in XBIS and other similar encodings. Nothing new.

Some may object to the fact that there would still be a requirement for one schema to be known by all readers and writers of the "no-schema" encoding. However, I hope you can see that such a schema, whether explicit or implicit, is required by any encoding system. Even "no-schema" text XML has an implicit schema that defines what is an element, what is an attribute, etc...

Hopefully, you'll accept that ASN.1 can be just as useful in the "no-schema" case as it is in the "schema-aware" case. Given that we already have available to us a standardized, mature, widely used method of binary encoding, I personally can't see the justification for pursuing the definition of a new binary encoding. What we should have is:

  No-Schema Encodings:
    Text:   XML
    Binary: ASN.1 ?ER with schema for XML

  Schema-Aware Encodings:
    Text:   XML + custom schema
    Binary: ASN.1 ?ER + custom schema

i.e. four use cases with two encoding solutions.

The interesting discussion should be over what is the best way to define the schema for the "no-schema" case. Should it be a simple serialization of a SAX2 event stream? If so, would the "symbol definitions" be done in-line to minimize the memory requirements during one-pass reading? Or would they be gathered into a table at the top or bottom of the data? Should all data be passed as text? Or, if a schema is available, should the encoder be permitted to substitute primitive types like INTEGER when they are called for in the schema? (Assuming that the decoder would output them as strings.)

We don't need another binary encoding. At most, what we need is agreement on what the ASN.1 schema for a "no-schema" binary encoding would look like.

bob wyman