
Re: Parsing efficiency? - why not 'compile'????


On Wednesday 26 February 2003 14:19, David Megginson wrote:
> Alaric B. Snell writes:
>  > If your system is sitting idling waiting for data over the network, then
>  > a more compact representation would be a winner!
>
> We'll look forward to your test results.

If your system spends lots of its time waiting for networking, do you 
disagree that reducing the bandwidth utilisation would reduce the service 
round trip time and increase the maximum throughput?

Note that, of course, smaller packets won't reduce the latency of the link 
due to speed of light limitations, but they will reduce the latency caused by 
bandwidth limitations. And the maximum number of transfers your network can 
handle in a second is directly related to the message size; if you can halve 
the size of the message, you can fit twice as many through your pipe in unit 
time.

So to go back to empirical test results...

The ASN.1/XML interop people found that, for data-oriented XML, savings of 
80% are common, i.e., messages one fifth the size. Per-packet overheads 
aside, that would imply that you can fit about five times as many 
ASN.1/PER-encoded messages down a given network connection in a second as you 
can XML messages.
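The bandwidth argument above can be put as two small formulas: delivery time is a fixed propagation delay plus size divided by bandwidth, and the ceiling on messages per second is bandwidth divided by message size. The sketch below is an illustration only; the function names, the 1 Mbit/s link, the 1000/200-byte message sizes and the 40-byte per-message overhead (roughly minimal IPv4 plus TCP headers) are all assumptions, not measurements.

```python
def transfer_time(size_bytes: int, bandwidth_bps: float, propagation_s: float) -> float:
    """One-way delivery time: fixed propagation delay plus serialisation delay."""
    # Propagation is set by distance (speed of light) and does not shrink with
    # smaller messages; serialisation delay is size divided by bandwidth.
    return propagation_s + size_bytes * 8 / bandwidth_bps


def max_messages_per_second(size_bytes: int, bandwidth_bps: float,
                            overhead_bytes: int = 0) -> float:
    """Upper bound on messages per second that fit through the link."""
    # overhead_bytes models a fixed per-message cost (headers etc.) that does
    # not shrink when the payload encoding becomes more compact.
    return bandwidth_bps / ((size_bytes + overhead_bytes) * 8)


if __name__ == "__main__":
    link = 1_000_000               # assumed 1 Mbit/s link
    xml_size, per_size = 1000, 200  # hypothetical sizes with an 80% saving
    print(max_messages_per_second(xml_size, link))  # 125.0
    print(max_messages_per_second(per_size, link))  # 625.0
    # With an assumed 40-byte per-message overhead the gain drops below 5x.
    ratio = (max_messages_per_second(per_size, link, 40)
             / max_messages_per_second(xml_size, link, 40))
    print(round(ratio, 2))  # 4.33
```

The overhead parameter is there to show why "per-packet overheads aside" matters: a fixed cost per message dilutes the ratio, and the smaller the payloads, the more it dilutes it.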

Let's take an example of a stereotypical poster-child Web service... some 
kind of online store.

It has a message you can send it to request a stock search, given keywords 
and a price range, returning a list of stock descriptions.

And it has a message you can send it to place an order, containing delivery 
addresses and one or more invoice lines, returning a basic success / failure 
code.

The latter operation will happen less often than the former, and will 
probably involve more time-consuming operations such as checking availability 
of all the items, checking the credit account, filing the order in the 
database, and making a printer in a warehouse start printing out a packing 
slip / manifest for dispatching to commence, so let's focus on the former.

The request message only needs to contain a keyword string and two prices; in 
XML that might be:

<search>
 <keywords>pink floyd</keywords>
 <prices min="5.99" max="20.00" />
</search>

Total size = 76 bytes plus the highly variable length of the keyword string.
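The 76-byte figure can be checked by building the request exactly as laid out above and counting UTF-8 bytes. This is a minimal sketch; the function names are hypothetical, and it assumes the same one-space indentation, no XML declaration and no trailing newline.

```python
from xml.sax.saxutils import escape


def search_request_xml(keywords: str, min_price: str, max_price: str) -> str:
    """Build the search request in the same layout as the example above."""
    # One-space indent, newline-separated, no XML declaration and no trailing
    # newline, so the byte count lines up with the 76-byte figure.
    return (
        "<search>\n"
        f" <keywords>{escape(keywords)}</keywords>\n"
        f' <prices min="{min_price}" max="{max_price}" />\n'
        "</search>"
    )


def utf8_size(text: str) -> int:
    """Size on the wire, assuming UTF-8 encoding."""
    return len(text.encode("utf-8"))


print(utf8_size(search_request_xml("", "5.99", "20.00")))            # 76
print(utf8_size(search_request_xml("pink floyd", "5.99", "20.00")))  # 86
```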

In PER, that would probably be a byte or two for the length of the keywords 
(from memory, one byte for lengths below 128 and two bytes for 128 characters 
or more, since PER stores lengths as variable-length integers), then the 
currency values would actually be stored as numbers of pence in the same 
format, probably two or three bytes each.

Total size = 6 bytes plus the highly variable length of the keyword string.
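To make the PER estimate concrete, here is a hand-rolled sketch in the style of aligned PER (ITU-T X.691): a length determinant of one octet below 128 and two octets below 16384, and unconstrained integers as a length octet plus minimal two's-complement octets. It is not a conformant encoder and there is no real ASN.1 schema behind it; the function names and the assumption that the keywords and prices are unconstrained are hypothetical. A schema with range constraints on the prices could shave a byte or two.

```python
def length_determinant(n: int) -> bytes:
    """Aligned-PER-style length: one octet below 128, two octets below 16384."""
    if n < 128:
        return bytes([n])
    if n < 16384:
        # The top two bits of the first octet are 10, leaving a 14-bit length.
        return bytes([0x80 | (n >> 8), n & 0xFF])
    raise ValueError("fragmented lengths (16K and above) are not handled here")


def unconstrained_int(value: int) -> bytes:
    """Length-prefixed minimal two's-complement octets."""
    magnitude = value if value >= 0 else ~value
    n_octets = (magnitude.bit_length() + 8) // 8  # leave room for the sign bit
    return length_determinant(n_octets) + value.to_bytes(n_octets, "big", signed=True)


def encode_search(keywords: str, min_pence: int, max_pence: int) -> bytes:
    """Hypothetical search request: keyword string, then two prices in pence."""
    text = keywords.encode("utf-8")
    return (length_determinant(len(text)) + text
            + unconstrained_int(min_pence) + unconstrained_int(max_pence))


print(len(encode_search("pink floyd", 599, 2000)))  # 17
```

With no keywords this comes to 7 bytes of overhead (1 for the length, 3 per price), in the same ballpark as the 6-byte estimate, against 86 bytes for the equivalent XML with "pink floyd".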

But the response would look like this in XML:

<search-response>
<result sku="GH234" price="6.50">Dark Side of the Moon</result>
<result sku="KK234" price="7.50">Wish You Were Here</result>
</search-response>

Size: 37 bytes + 43 bytes per result + description text length
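The response size formula can be checked the same way. This sketch (the function and variable names are hypothetical) reproduces the layout above, with a trailing newline after the closing tag, and assumes 5-character SKUs and 4-character prices, which is what makes each result cost 43 bytes of markup.

```python
from xml.sax.saxutils import escape, quoteattr


def search_response_xml(results):
    """results: list of (sku, price, description) string tuples."""
    parts = ["<search-response>\n"]
    for sku, price, description in results:
        # quoteattr() adds the surrounding quotes and escapes the value.
        parts.append(
            f"<result sku={quoteattr(sku)} price={quoteattr(price)}>"
            f"{escape(description)}</result>\n"
        )
    parts.append("</search-response>\n")
    return "".join(parts)


example = [
    ("GH234", "6.50", "Dark Side of the Moon"),
    ("KK234", "7.50", "Wish You Were Here"),
]
doc = search_response_xml(example)
size = len(doc.encode("utf-8"))
# 37 fixed bytes + 43 per result + the description text itself
assert size == 37 + 43 * len(example) + sum(len(d) for _, _, d in example)
print(size)  # 162
```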

I think in PER that would be another variable-length integer for the number 
of results returned (call it one byte if we want fewer than a hundred 
results), then, for each result, five bytes for the fixed-length SKU plus two 
bytes of price, one or two bytes of description length, then the description.

Size: 1 byte + 9 bytes per result + description text length.
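A sketch of that response layout, again hand-rolled in the spirit of PER rather than produced by a real ASN.1 toolkit. The assumptions are loud ones: SKUs are exactly 5 ASCII characters with no length prefix, prices are integers in pence constrained to 0..65535 so they fit in a fixed 2 octets, and the function names are hypothetical.

```python
def length_determinant(n: int) -> bytes:
    """Aligned-PER-style length: one octet below 128, two octets below 16384."""
    if n < 128:
        return bytes([n])
    if n < 16384:
        return bytes([0x80 | (n >> 8), n & 0xFF])
    raise ValueError("fragmented lengths are not handled in this sketch")


def encode_search_response(results) -> bytes:
    """results: list of (sku, price_pence, description) tuples."""
    out = bytearray(length_determinant(len(results)))  # number of results
    for sku, price_pence, description in results:
        sku_bytes = sku.encode("ascii")
        if len(sku_bytes) != 5:
            raise ValueError("this sketch assumes fixed 5-character SKUs")
        out += sku_bytes                        # fixed length: no length prefix
        out += price_pence.to_bytes(2, "big")  # assumed range 0..65535 pence
        text = description.encode("utf-8")
        out += length_determinant(len(text)) + text
    return bytes(out)


example = [
    ("GH234", 650, "Dark Side of the Moon"),
    ("KK234", 750, "Wish You Were Here"),
]
print(len(encode_search_response(example)))  # 56
```

For short descriptions each result costs 8 bytes plus its text; the 9-byte figure above is the upper bound once a description needs a two-byte length.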

In the PER case, the encoding will be almost entirely the description texts, 
while in my XML example the description text was smaller than the XML 
surrounding it. If we say the descriptions are likely to be 20 bytes long, 
then we save 36 bytes of fixed overhead (probably negligible in the long run) 
but also cut the per-result size from 63 bytes to 29 bytes (roughly 60 to 30), 
a halving. So we could be servicing twice as many customers at once from a 
given Internet link, until the database can't handle all the keyword searches 
any more.
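Plugging the two size formulas from above into a few lines of code shows both the halving at 20-byte descriptions and how the advantage shrinks as descriptions get longer, since both encodings then carry mostly the same text. The function names, the 10-result page size and the sample description lengths are illustrative assumptions.

```python
def xml_response_size(n_results: int, desc_len: int) -> int:
    """37 bytes of wrapper + 43 bytes of markup per result + the description."""
    return 37 + n_results * (43 + desc_len)


def per_response_size(n_results: int, desc_len: int) -> int:
    """1-byte count + up to 9 bytes per result (SKU, price, length) + the description."""
    return 1 + n_results * (9 + desc_len)


for desc_len in (20, 50, 200):
    xml_size = xml_response_size(10, desc_len)
    per_size = per_response_size(10, desc_len)
    print(desc_len, xml_size, per_size, round(xml_size / per_size, 2))
```

With 10 results and 20-byte descriptions this gives 667 bytes of XML against 291 bytes of PER; at 200-byte descriptions the ratio falls to about 1.18.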

Looking at it another way, consider Google's XML interface. If they got a 
similar 50% reduction in size from using PER (considering that most of the 
search results consist of URLs and descriptions as opposed to the structure 
of the listing, ranking scores, etc) then, if their XML interface became 
predominantly used, they could halve their bandwidth costs. I'm sure their 
search algorithm is more resource-intensive than parsing and producing XML, 
but their bandwidth usage must be *astronomical*!
 
> David

ABS

-- 
A city is like a large, complex, rabbit
 - ARP
