
RE: Request: Techniques for reducing the size of XML instances

  • From: Al Snell <alaric@a...>
  • To: "HUGHES,MARK (Non-HP-FtCollins,ex1)" <mark_hughes@n...>
  • Date: Wed, 01 Aug 2001 01:11:57 +0100 (BST)

On Tue, 31 Jul 2001, HUGHES,MARK (Non-HP-FtCollins,ex1) wrote:

>   That's an excellent point - passing around a tokenized form of an XML
> document to simplify parsing is a reasonable idea.  Personally, I'd just
> use the Pyxie format <http://www.pyxie.org/>, as it's *VERY* easy to
> produce and to parse again, and has the tremendous advantage of still
> being plain-text, so it's easy to debug and test.

That's certainly in keeping with some of the binary XML approaches - the
distinction between "binary" and "textual" is bogus, really, but a
nomenclature we're stuck with for now.

It's all binary anyway. "Text" just uses a fairly standardish binary
format (although the blueberry thread shows that this "text" format is a
bit shifty anyway).

PS: I just ran a quick test, timing gzip. Gzipping 11449004 bytes on a K6/2
400 took 10.693 seconds of CPU time and compressed them to 2738792 bytes.
If this machine were serving compressed XML, it wouldn't be able to max out
a 10Mbit link, even assuming that whatever processing it was doing to
create the data took zero time. This was a coredump file rather than a
large amount of XML, which will skew the results a bit, but it looks like
three to four times the CPU power of my laptop would be required just to
handle the communications overhead of generating a 10Mbit gzipped XML
stream.

I recently had to help implement a system that read a small amount of data
from disk and performed some computation, sending the data over a 100Mbit
link to the next stage of servers. It had to pretty much fill that 100Mbit
link to meet spec[1], and it was less powerful than my laptop. Gzipping XML
would not have been an option; the system could only just fit the raw data
down a 100Mbit link with the required TCP/IP protocol overhead, let alone
data with XML markup all over it.
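
The gzip timing quoted above can be turned into throughput figures with a
little arithmetic. This is only a sketch using the numbers from the test;
it assumes "10Mbit" means 10,000,000 bits per second, and the variable
names are made up for illustration.

```python
# Figures quoted in the gzip test above.
INPUT_BYTES = 11_449_004
OUTPUT_BYTES = 2_738_792
CPU_SECONDS = 10.693
LINK_MBIT_PER_SECOND = 10.0  # assumption: 10Mbit = 10,000,000 bit/s


def mbit_per_second(num_bytes: int, seconds: float) -> float:
    """Convert a byte count processed in `seconds` to megabits per second."""
    return num_bytes * 8 / seconds / 1_000_000


# Rate at which uncompressed data is consumed by gzip.
input_rate = mbit_per_second(INPUT_BYTES, CPU_SECONDS)
# Rate at which compressed data is produced for the wire.
output_rate = mbit_per_second(OUTPUT_BYTES, CPU_SECONDS)
# How much smaller the coredump became.
compression_ratio = INPUT_BYTES / OUTPUT_BYTES
# CPU multiple needed for the compressed output alone to fill the link.
cpu_factor = LINK_MBIT_PER_SECOND / output_rate

print(f"input rate:        {input_rate:.2f} Mbit/s")
print(f"output rate:       {output_rate:.2f} Mbit/s")
print(f"compression ratio: {compression_ratio:.2f}")
print(f"CPU factor:        {cpu_factor:.2f}")
```

On these numbers gzip consumes roughly 8.6 Mbit/s of input and emits
roughly 2 Mbit/s of output, so either way the machine falls short of a
10Mbit link. The exact CPU multiple depends on whether you count input or
output, and on how differently real XML compresses compared to a coredump.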

Non-gzipped XML would probably have been OK in this situation since,
luckily, the data happens to be a series of strings of about 20k in
length, so the overhead of <?xml version='1.0' ?><message>...</message>
wouldn't be an issue. But if it were highly structured or numerical data,
the overhead of <number>123456789</number> over a single 32-bit word (a
factor of 4) would have meant we'd need four 100Base-T links coming from
this machine to fit the required just-under-100Mbit/sec of raw data - or
gigabit Ethernet.
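
The markup overhead for a single number is easy to measure directly. The
sketch below assumes ASCII encoding and a big-endian unsigned 32-bit word
for the binary form; xml_bytes is a hypothetical helper, not part of any
library.

```python
import struct


def xml_bytes(tag: str, value: int) -> bytes:
    """Serialize one integer as a minimal XML element (hypothetical helper)."""
    return f"<{tag}>{value}</{tag}>".encode("ascii")


value = 123456789

# Textual form: start tag + decimal digits + end tag.
as_xml = xml_bytes("number", value)
# Binary form: one unsigned 32-bit word, big-endian (an assumption).
as_word = struct.pack(">I", value)

overhead = len(as_xml) / len(as_word)
print(len(as_xml), len(as_word), overhead)
```

For this particular nine-digit value the element is 26 bytes against 4, a
ratio of 6.5, so the factor of 4 is if anything on the conservative side.
Shorter tag names or smaller values bring the ratio down.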

Raw data processing took just under 50% of the machine's CPU. If we'd had
to emit XML, we'd have had to gzip it all to fit it down the 100Mbit/sec
Ethernet, and there just wouldn't be enough CPU to do that.

ABS

[1] The spec mandated something along the lines of 1,000 80Kb data packets
a second, IIRC - add TCP/IP overhead to that and you're pushing a
100Mbit/sec Ethernet, which was what the machine had connected to it.
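
As a rough check of that footnote, here is a sketch of the link budget.
It assumes "80Kb" means 80 kilobits, full-size Ethernet frames with a
1500-byte MTU, 40 bytes of IP and TCP headers with no options, and 38
bytes of per-frame Ethernet framing (header, FCS, preamble and interframe
gap). It ignores ACK traffic and retransmissions, so the real figure would
be somewhat higher.

```python
PACKETS_PER_SECOND = 1_000
PACKET_BITS = 80_000   # assumption: "80Kb" = 80 kilobits per packet
MTU = 1500             # bytes of IP packet per Ethernet frame
TCP_IP_HEADERS = 40    # 20 bytes IP + 20 bytes TCP, no options
ETHERNET_FRAMING = 38  # 14 header + 4 FCS + 8 preamble + 12 interframe gap

# Application payload carried by one full-size frame.
payload_per_frame = MTU - TCP_IP_HEADERS
# Bytes of link time one full-size frame occupies.
wire_per_frame = MTU + ETHERNET_FRAMING

# Raw data rate demanded by the spec, in bits per second.
raw_rate = PACKETS_PER_SECOND * PACKET_BITS
# The same data once header and framing overhead are added.
wire_rate = raw_rate * wire_per_frame / payload_per_frame

print(f"raw data: {raw_rate / 1e6:.1f} Mbit/s")
print(f"on wire:  {wire_rate / 1e6:.1f} Mbit/s")
```

Under these assumptions the raw 80 Mbit/s grows to a little over 84 Mbit/s
on the wire before any reverse traffic is counted.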

-- 
                               Alaric B. Snell
 http://www.alaric-snell.com/  http://RFC.net/  http://www.warhead.org.uk/
   Any sufficiently advanced technology can be emulated in software

