
Re: Request for Comments: XML binary encoding

  • From: Paul Cody Johnston <pcj@i...>
  • To: "Al B. Snell" <alaric@a...>
  • Date: Wed, 11 Apr 2001 12:49:16 -0700

Big discussion on Binary XML.  I don't know how some people on this
list get work done!  I listen but rarely contribute to this list
because it takes me so long to formulate postings...



I've been designing a "next-gen" XML-like language for a while now,
though all my implementation time has been soaked up by a parser
compiler I've been writing (different project), so I still haven't
released a reference implementation of the language.  It's called
"reticular structure language" (RSL): http://www.inxar.org/rsl

Anyhow, RSL may be expressed using a binary "compiled" representation
under certain circumstances.  I initially thought this was a cool idea
because of the incredible performance gains that would be gleaned from
not having to parse the text.  As has been discussed previously,
punctuated by Tim Bray's comments, the gains in this area are pretty
limited.  It's not really worth it except under very specific
conditions.

However, RSL has the additional feature that validation is considered
the norm -- most RSL documents should be validated.  What I discovered
is that by compiling the "source" text form into a binary
representation, you can organize the information such that structural
patterns in the document can be grouped.  This pattern grouping allows
future validation (of the binary representation) to be significantly
faster, which is important for RSL (which determines validity at
run-time, not compile-time).

For an extreme example, consider an XML representation of a log file.
The log file has 10,000 entries, each of which is an element with no
attributes and a content model defined in a DTD.  Typical processing
would involve parsing the text and 10,000 regexp challenges to confirm
the validity of each entry against the DTD.

A compiled representation allows one to recognize that all 10,000
entries have the same pattern.  Validation of this document would
require only a single regexp challenge to validate all structures in
the document.
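
To make the log-file example concrete, here is a rough sketch in Python.
It is not RSL's actual compiled format (which is unpublished); the
element names, the ENTRY_MODEL regexp, and the function names are all
made up for illustration.  Each entry is reduced to its sequence of
child element names, and the content model is treated as a regexp over
that sequence.

```python
import re
from collections import Counter

# Hypothetical content model for one log entry, e.g. the DTD declaration
# <!ELEMENT entry (timestamp, level, message)>, written as a regexp
# over a space-separated list of child element names.
ENTRY_MODEL = re.compile(r"timestamp level message")


def validate_each(entries):
    """Naive approach: one regexp challenge per entry."""
    challenges = 0
    ok = True
    for children in entries:
        challenges += 1
        if ENTRY_MODEL.fullmatch(" ".join(children)) is None:
            ok = False
    return ok, challenges


def compile_patterns(entries):
    """'Compile' step: group entries by their child-name sequence."""
    return Counter(tuple(children) for children in entries)


def validate_grouped(patterns):
    """One regexp challenge per distinct pattern, not per entry."""
    challenges = 0
    ok = True
    for pattern in patterns:
        challenges += 1
        if ENTRY_MODEL.fullmatch(" ".join(pattern)) is None:
            ok = False
    return ok, challenges


# 10,000 entries that all share the same structure.
entries = [("timestamp", "level", "message")] * 10_000
naive = validate_each(entries)
grouped = validate_grouped(compile_patterns(entries))
```

The grouping pass still has to touch every entry once, but it happens
at compile time; every later validation of the compiled form pays only
for the distinct patterns.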

One potential drawback to the current design of this representation
(unpublished) is that it is not stream-based.  This would prohibit
SAX-like processing of the binary representation.  The point is that
there are trade-offs in whatever you do.  Simplest things are almost
always best.

Paul             

