
binary xml for large collections of structured data?

  • To: <xml-dev@l...>
  • Subject: binary xml for large collections of structured data?
  • From: "cr88192" <cr88192@h...>
  • Date: Tue, 23 Nov 2004 13:56:33 +1000

(rambling some, oh well).

how about xml for xml-like data, but a lot of it?...
eg: GBs of xml, representing things like arbitrary types of serialized 
objects, collections, ...

thoughts come to mind, eg:
programming language object stores;
fairly detailed 3d worlds and resource sets;
possibly compound video streams (I am imagining lots of small video streams 
tied together into a bigger one using xml-like data);

so, there could be 2 varieties:
a flat serialized version, which could be simpler and used for read or write 
only access;
a dynamic random-access version, which could be more complex, but would 
allow read/write access, and possibly transactional stuff (the log likely 
being kept as a separate file). something like a b-tree could make sense.

similar could be a cell-based heap-like system (dunno a better term). this 
could gain simplicity at the cost of size and flexibility. eg:
the data files are divided into a number of cells, eg, each, say, 16 bytes. 
the cells are managed with a bitmap, and allocated as objects consisting of 
one or more cells. these are divided into chunks, each with a certain number 
of cells and its own bitmap (eg: some power of 2 size in MB).
this approach is fairly similar to how my current memory manager/garbage 
collector works, so it might make sense. my mm/gc used 1MB chunks, but this 
might be small. I guess I could allow bigger chunks as well (like my mm).
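a rough sketch of the cell/bitmap idea above, in python for brevity. this is not the actual mm/gc code: the names (`Chunk`, `alloc`, `free`), the first-fit search, and passing the object size to `free` are all assumptions made for illustration. a real version would more likely record object extents itself (eg: a second bitmap marking object starts).

```python
CELL_SIZE = 16                      # bytes per cell, as suggested above
CHUNK_SIZE = 1 << 20                # 1MB chunk (assumed default)
CELLS_PER_CHUNK = CHUNK_SIZE // CELL_SIZE


def cells_needed(nbytes):
    """Round a byte count up to a whole number of cells (at least one)."""
    return max(1, -(-nbytes // CELL_SIZE))


class Chunk:
    """One chunk: a bitmap with one bit per cell, plus the cell data."""

    def __init__(self, cells=CELLS_PER_CHUNK):
        self.cells = cells
        self.bitmap = bytearray((cells + 7) // 8)   # bit 0 = free, 1 = used
        self.data = bytearray(cells * CELL_SIZE)

    def _used(self, i):
        return (self.bitmap[i >> 3] >> (i & 7)) & 1

    def _mark(self, i, used):
        if used:
            self.bitmap[i >> 3] |= 1 << (i & 7)
        else:
            self.bitmap[i >> 3] &= ~(1 << (i & 7)) & 0xFF

    def alloc(self, nbytes):
        """First-fit: return the first cell index of a free run, or None."""
        need = cells_needed(nbytes)
        run = 0
        for i in range(self.cells):
            run = 0 if self._used(i) else run + 1
            if run == need:
                start = i - need + 1
                for j in range(start, i + 1):
                    self._mark(j, True)
                return start
        return None

    def free(self, start, nbytes):
        """Release the cells of an object; the caller supplies its size."""
        for j in range(start, start + cells_needed(nbytes)):
            self._mark(j, False)
```

note that a 40 byte object takes 3 cells (48 bytes), which is where some of the inflation comes from. the bitmap itself only costs 1 bit per 16 bytes.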

(ok, I will have to estimate the amount of inflation, but I think it might 
be significant...). presumably, as much data would need to be packed into 
single runs as is reasonable.

there could possibly be other file types (traditionally, file types like 
this are distinguished with file magics, which would make sense here).
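a minimal sketch of telling the varieties apart by magic. the 4 byte magic values, the version field, and the 8 byte header layout are made up for this example, nothing here is specced.

```python
import io
import struct

# hypothetical magics, one per file variety
MAGICS = {
    b"BXS1": "serial",
    b"BXR1": "random-access",
}


def write_header(f, magic, version=1):
    """Write a 4 byte magic followed by a little-endian 32 bit version."""
    if magic not in MAGICS:
        raise ValueError("unknown magic")
    f.write(magic + struct.pack("<I", version))


def read_header(f):
    """Return (variety, version), or raise if the magic is not recognized."""
    head = f.read(8)
    if len(head) < 8 or head[:4] not in MAGICS:
        raise ValueError("not a recognized file")
    return MAGICS[head[:4]], struct.unpack("<I", head[4:])[0]
```

a reader can then pick the serial or random-access code path before touching the rest of the file.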

I once did store images with a similar format (actually, just a swizzled 
dump of the heap). I remember I had reserved the first chunk for stuff like 
a file header, and crap like names for symbols and built-in functions.
now, I am more conventionally serializing data, but then I have to write 
code to serialize/deserialize all the types...

now, the serial variety could likely be further divided: fine and coarse.
fine would emphasize packing density, at the cost of performance (imo, this 
is about what I came up with before).
coarse would emphasize access performance, and would likely waste extra 
space (eg: fixed-sized numbers and string-table only tags).
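to make the fine/coarse tradeoff concrete, here is a small sketch for integers only. the "fine" side uses a generic 7-bits-per-byte variable-length encoding as a stand-in (the actual serial-fine format is not described here), and the "coarse" side uses fixed 32 bit little-endian numbers. function names are made up for the example.

```python
import struct


def encode_fine(n):
    """Variable-length: 7 bits per byte, high bit set on all but the last."""
    if n < 0:
        raise ValueError("unsigned values only")
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)    # more bytes follow
        else:
            out.append(b)           # last byte
            return bytes(out)


def decode_fine(buf, pos=0):
    """Return (value, next position); must scan byte by byte."""
    n = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        n |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return n, pos


def encode_coarse(n):
    """Fixed-size: always 4 bytes, little-endian."""
    return struct.pack("<I", n)


def decode_coarse(buf, pos=0):
    """Return (value, next position); the size is known up front."""
    return struct.unpack_from("<I", buf, pos)[0], pos + 4
```

small values take 1 byte in the fine form and 4 in the coarse form, but with the coarse form the position of the Nth number is known without decoding the ones before it.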

this should roughly cover this domain (though it could be argued that this 
is too far outside xml's domain anyways...). but then again: what domain am 
I in really?...

I would think the riff domain, as riff is typically used here, but riff 
falls short and is often a bad fit for my use patterns.

I may spec some.
I guess I have already come up with the serial-fine variety.
