
Re: Schemas and Other Crucial XML Questions

  • From: Tyler Baker <tyler@i...>
  • Date: Mon, 10 Aug 1998 14:56:16 -0400

David Megginson wrote:

> Sam Gentile writes:
>
>  > > Also, we have been hearing rumors of a "short" XML notation. Is
>  > > there one?  We have a need to reduce the size of our buffers.
>
> No, there is no such thing.  XML's parent, SGML, included extensive
> facilities for markup minimisation and has suffered badly for it,
> since SGML tools are far too difficult to write (there is still not a
> single Java-based SGML parser, beside probably more than a dozen
> Java-based XML parsers).
>
> There are, however, alternatives: for example, you could compile the
> XML to a compact binary format for internal storage then decompile it
> back to a verbose format for export -- there's no requirement to store
> it internally as text.

Some very simple compression algorithms, Huffman encoding for instance, do very
well with XML documents, since the Name production that is used for identifying
tags, among other things, will be converted to some binary symbol that is used as
an index to look up the actual name.  In fact, you could do all of this with
entities by simply taking all of the Names specified in the DTD, putting them
into a list, and then declaring an entity for each one.

You could index all of this using base 10 digits, or use something as high as
base 64 to encode the array references.
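
To make the difference between the two bases concrete, here is a rough Python
sketch of writing a list index as a string of digits. The function name
encode_index and the particular 64-symbol alphabet are assumptions made for
illustration only; nothing above fixes which 64 symbols would be used.

```python
import string

# Hypothetical 64-symbol alphabet (an assumption, not part of the proposal):
# A-Z, a-z, 0-9, then "-" and "_".
ALPHABET_64 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"


def encode_index(index, alphabet=string.digits):
    """Write a non-negative list index using the given digit alphabet.

    The base is simply the number of symbols in the alphabet, so the
    default (the ten decimal digits) gives ordinary base 10.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    base = len(alphabet)
    digits = []
    while True:
        index, remainder = divmod(index, base)
        digits.append(alphabet[remainder])
        if index == 0:
            break
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))


print(encode_index(1000))               # base 10
print(encode_index(1000, ALPHABET_64))  # base 64
```

With this alphabet, two symbols cover 4096 distinct names, where base 10 needs
four digits to get that far.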

<!ENTITY % 0 "Foo">
<!ENTITY % 1 "Bar">

Then, for a document with element types named "Foo" and "Bar", occurrences
of:

<Foo></Foo>
<Bar></Bar>

would be converted to:

<0></0>
<1></1>
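
The substitution can be sketched in a few lines of Python. This is only a
naive illustration under stated assumptions: build_name_table, shorten_tags
and restore_tags are made-up names, the list of Names is assumed to have been
pulled out of the DTD already, and the regular expression rewrites anything
that looks like the start of a tag, so comments and CDATA sections are not
protected. Note also that an XML Name cannot begin with a digit, so output
such as <0></0> is not well-formed XML on its own; the optional prefix
argument (again an assumption) is one way around that.

```python
import re

# Matches the name at the start of a start tag or end tag, e.g. "<Foo" or "</Foo".
# "<?" and "<!" are not matched, so declarations and PIs are left alone.
TAG_NAME = re.compile(r"(</?)([\w:.\-]+)")


def build_name_table(names, prefix=""):
    """Map each declared name to its position in the list, optionally prefixed."""
    return {name: f"{prefix}{i}" for i, name in enumerate(names)}


def _rewrite(document, mapping):
    """Replace every tag name found in mapping; leave unknown names untouched."""
    def swap(match):
        opener, name = match.groups()
        return opener + mapping.get(name, name)
    return TAG_NAME.sub(swap, document)


def shorten_tags(document, table):
    """Replace full element names with their short index form."""
    return _rewrite(document, table)


def restore_tags(document, table):
    """Invert shorten_tags by looking each short form up in the reversed table."""
    reverse = {short: name for name, short in table.items()}
    return _rewrite(document, reverse)


table = build_name_table(["Foo", "Bar"])
short = shorten_tags("<Foo></Foo>\n<Bar></Bar>", table)
print(short)
print(restore_tags(short, table))
```

Calling build_name_table(["Foo", "Bar"], prefix="_") instead yields <_0></_0>
and <_1></_1>, which costs one extra character per tag but keeps the names
legal.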

For small documents, such as CDF files, these sorts of techniques may turn out
to be counterproductive.

Tyler

BTW, on a side note, I am having trouble understanding whether the external
subset or the internal subset should be parsed first.  I would assume that the
external subset should go first, but in that case INCLUDE and IGNORE sections
would be pretty useless.  As far as I can tell, the 1.0 spec does not clarify
this, so if someone could explain how a parser should handle it, I would
greatly appreciate it.

Thanx in advance...



