
Re: Fast text output from SAX?


Bob Wyman wrote:

> ...
>
>	Just like you, I groaned when I saw the suggestion that you
>could take "wire-protocol" and then just stuff it into memory. This
>  
>
Not wire-protocol, wire-format = 'the same format as on the wire, in a 
file, etc.', the data payload in other words.
I am asserting that it is possible to construct a data format that is 
efficient for the desired operations and also compact in memory, and that 
can therefore be input and output as-is without transformation.  The 
hard part is allowing in-place modifications to be efficient to do and 
not result in much or any space overhead.  Everything else is done or 
could easily be done with other formats.  If any data format is 
self-describing in the XML sense, I can write a library that allows me 
to traverse its structure and retrieve data in an XPath style.  ASN.1 
and similar IDL systems usually compile into data-specific code, but 
even for these formats I could devise metadata that a general purpose 
library could use to traverse the resulting structures in an XPath style 
to retrieve and convert values.
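
To make the traversal claim concrete, here is a minimal sketch in Python.  The record layout (a kind byte, a length-prefixed name, a length-prefixed payload) and the names encode, children and find are hypothetical, invented for this illustration; this is not the esXML format.  It only shows that a self-describing buffer can be walked with a path-style lookup, with no separate parse step and with bounds checked as it goes.

```python
import struct

ELEMENT, TEXT = 0x01, 0x02
# Hypothetical header: kind (1 byte), name length (2), payload length (4), big-endian.
HEADER = struct.Struct(">BHI")

def encode(kind, name, payload):
    """Build one record: header, UTF-8 name, then the payload bytes."""
    name_bytes = name.encode("utf-8")
    return HEADER.pack(kind, len(name_bytes), len(payload)) + name_bytes + payload

def children(buf, start, end):
    """Yield (kind, name, payload_start, payload_end) for each record in buf[start:end]."""
    pos = start
    while pos < end:
        if pos + HEADER.size > end:
            raise ValueError("truncated record header")
        kind, name_len, payload_len = HEADER.unpack_from(buf, pos)
        name_start = pos + HEADER.size
        payload_start = name_start + name_len
        payload_end = payload_start + payload_len
        if payload_end > end:
            raise ValueError("record overruns its container")
        name = bytes(buf[name_start:payload_start]).decode("utf-8")
        yield kind, name, payload_start, payload_end
        pos = payload_end

def find(buf, path):
    """Return the payload of the first record matching a path such as 'order/item/qty'."""
    start, end, current_kind = 0, len(buf), ELEMENT
    for step in path.split("/"):
        if current_kind != ELEMENT:
            return None  # cannot descend into a text record
        for kind, name, p_start, p_end in children(buf, start, end):
            if name == step:
                start, end, current_kind = p_start, p_end, kind
                break
        else:
            return None  # no child with this name
    return bytes(buf[start:end])

# Usage: build a small nested document, then read a value straight from the buffer.
item = encode(ELEMENT, "item", encode(TEXT, "sku", b"A-17") + encode(TEXT, "qty", b"3"))
doc = encode(ELEMENT, "order", item)
qty = find(doc, "order/item/qty")
```

The same bytes that find() walks are the bytes that would go to a file or a socket, which is the sense of "wire-format" used above.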

>might work with text, but it sure as heck doesn't work with binary
>formats or anything that contains an address or offset. The
>  
>
I can think of several ways to represent an offset that is independent 
of any particular architecture and I'm sure you can too.
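
As an illustration only, two common ways are a fixed-width integer with a declared byte order and a variable-length integer (unsigned LEB128 style).  The function names below are made up for this sketch; neither encoding is claimed to be the one any particular format uses.

```python
import struct

def encode_fixed(offset):
    """Offset as a 4-byte unsigned integer, always big-endian regardless of host."""
    return struct.pack(">I", offset)

def decode_fixed(buf, pos=0):
    return struct.unpack_from(">I", buf, pos)[0]

def encode_varint(offset):
    """Offset as 7-bit groups, least significant first; high bit set means 'more follows'."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    out = bytearray()
    while True:
        low7 = offset & 0x7F
        offset >>= 7
        if offset:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)

def decode_varint(buf, pos=0):
    """Return (value, position just past the varint)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

fixed = encode_fixed(300)
variable = encode_varint(300)
```

Both forms decode to the same value on any machine, because the byte order is a property of the format and not of the CPU that reads it.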

>distinctions between wire-protocol, in-memory-format, and
>on-disk-format, are fundamental. Every proposal that I've ever seen
>  
>
Why would the wire format (not wire protocol) and on-disk-format be 
different?  I'm not talking about the wire-protocol; to the application 
the transport just takes a stream of bytes, possibly in chunks, and 
returns the same.

>for a "common" format for use in two or more of these contexts has
>ended up failing for one reason or another. As far as stuffing
>wire-protocol into memory goes: Let me just say that *NOBODY* is ever
>going to write to *MY* address space without a great deal of checking
>  
>
What are you thinking here?  Who would be writing into your address 
space?  DMA from the network directly to application memory?  (This does 
have some use in high end computing situations, but that's not what I'm 
talking about.)

My proposal consists of loading a block (or string of blocks) of data 
into a buffer, traversing and reading or modifying that data with a 
library, and later possibly writing the resulting buffer out.  What 
strikes you as dangerous about that?
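
A minimal sketch of that load, modify, write cycle, in Python.  The 16-byte record layout, the "REC1" magic value and the function names are assumptions made up for this example.  The point is that the library validates the buffer before touching it, and that a fixed-width field can be updated in place without re-serializing anything.

```python
import io
import struct

# Hypothetical record: 4-byte magic, 4-byte big-endian counter, 8-byte padded name.
RECORD = struct.Struct(">4sI8s")
COUNTER_OFFSET = 4

def load(stream):
    """Read the whole record into a mutable buffer and validate it before use."""
    buf = bytearray(stream.read())
    if len(buf) != RECORD.size:
        raise ValueError("unexpected record size")
    magic, _counter, _name = RECORD.unpack_from(buf)
    if magic != b"REC1":
        raise ValueError("bad magic value")
    return buf

def bump_counter(buf):
    """Increment the counter field in place; the buffer length does not change."""
    _magic, counter, _name = RECORD.unpack_from(buf)
    struct.pack_into(">I", buf, COUNTER_OFFSET, counter + 1)

# Usage: the in-memory streams stand in for a file or a socket.
source = io.BytesIO(RECORD.pack(b"REC1", 41, b"widget  "))
buf = load(source)
bump_counter(buf)
sink = io.BytesIO()
sink.write(buf)
```

Nothing outside the library writes into the application's memory; the application reads bytes into its own buffer and the library checks them, much as a decompression library does.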

When you load a buffer of data and feed it to a gzip library to 
decompress it, isn't that the same situation on a bulk scale?

>going on... Also, if this problem was as simple as just replacing
>direct addresses with relative addresses, don't people realize that we
>  
>
That's not what I am doing; my recent example was a proof of concept and 
proof of existence of a solution that met the specific requirements 
being discussed: avoidance of parsing and serialization as a separate 
step.  That doesn't mean that a solution with a relative reference would 
be bad, but my main methods are not relative addresses.
Please read about my approach at: http://esxml.org and do point out my 
errors.

>probably would have figured this out a few decades ago? As an
>industry, we're not so stupid that we would have missed something so
>obvious... Sometimes, the obvious solution is *SO* obvious that it
>must be flawed.
>  
>
Better famous last words have seldom been spoken.  :-)
I see advances every day that cause me to ponder the same question.  
I've been programming a fairly long time and, besides horsepower, there 
are a lot of things the royal we should have thought of 20 years ago.  I 
think I was even independently first on several very popular ideas, but 
I didn't act publicly on those.

I can't guarantee that the best future example of my approach will be 
super efficient and an obvious choice, but I have aggregated enough 
solutions in my current design that I have convinced myself that it is 
possible.  I would rather release code than talk about it once I have 
some design decisions, this last week notwithstanding.  ;-)  Later.

> .....
>
>		bob wyman
>  
>

sdw

-- 
swilliams@h... http://www.hpti.com Per: sdw@l... http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw


