binary xml for large collections of structured data?
(rambling some, oh well).

how about xml for xml-like data, but a lot of it? eg: GBs of xml, representing things like arbitrary types of serialized objects, collections, ... some uses come to mind, eg: programming language object stores; databases; fairly detailed 3d worlds and resource sets; possibly compound video streams (I am imagining lots of small video streams tied together into a bigger one using xml-like data); ...

so, there could be 2 varieties: a flat serialized version, which could be simpler and used for read-only or write-only access; and a dynamic random-access version, which could be more complex, but would allow read/write access, and possibly transactional stuff (the log likely being kept as a separate file).

something like a b-tree could make sense. similar could be a cell-based heap-like system (dunno a better term). this could gain simplicity at the cost of size and flexibility. eg: the data files are divided into a number of cells, each, say, 16 bytes. the cells are managed with a bitmap, and allocated as objects consisting of one or more cells. the cells are grouped into chunks, each with a certain number of cells and its own bitmap (eg: some power-of-2 size in MB).

this approach is fairly similar to how my current memory manager/garbage collector works, so it might make sense. my mm/gc used 1MB chunks, but this might be small. I guess I could allow bigger chunks as well (like my mm). (ok, I will have to estimate the amount of inflation, but I think it might be significant...). presumably, as much data would need to be packed into single runs as is reasonable.

there could possibly be other file types (traditionally, file types like this are distinguished with file magics, which would make sense here). I once did store images with a similar format (actually, just a swizzled dump of the heap). I remember I had reserved the first chunk for stuff like a file header, and crap like names for symbols and built-in functions.
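The cell/bitmap scheme described above could be sketched roughly as follows. This is only a minimal illustration, not the actual mm/gc or file layout: the names `Chunk`, `alloc` and `free` are made up here, the first-fit search is an assumption, and the caller is assumed to remember each object's size (a real format would need to record it somewhere, eg in a header cell or a second bitmap).

```python
CELL_SIZE = 16  # bytes per cell, the example size from the post


class Chunk:
    """One chunk: a fixed number of cells plus a one-bit-per-cell bitmap.

    Hypothetical sketch; names and the first-fit policy are assumptions.
    """

    def __init__(self, n_cells):
        self.n_cells = n_cells
        self.bitmap = bytearray((n_cells + 7) // 8)  # bit = 1 means cell in use
        self.data = bytearray(n_cells * CELL_SIZE)

    def _used(self, i):
        return (self.bitmap[i >> 3] >> (i & 7)) & 1

    def _mark(self, i, used):
        if used:
            self.bitmap[i >> 3] |= 1 << (i & 7)
        else:
            self.bitmap[i >> 3] &= ~(1 << (i & 7)) & 0xFF

    @staticmethod
    def _cells_for(n_bytes):
        # Round the object size up to a whole number of cells (at least one).
        return max(1, (n_bytes + CELL_SIZE - 1) // CELL_SIZE)

    def alloc(self, n_bytes):
        """First-fit: return the index of the first cell of a free run, or None."""
        need = self._cells_for(n_bytes)
        run = 0
        for i in range(self.n_cells):
            run = 0 if self._used(i) else run + 1
            if run == need:
                start = i - need + 1
                for j in range(start, i + 1):
                    self._mark(j, True)
                return start
        return None  # no free run of that length in this chunk

    def free(self, start, n_bytes):
        """Release the cells of an object previously returned by alloc."""
        for j in range(start, start + self._cells_for(n_bytes)):
            self._mark(j, False)


chunk = Chunk(64)
a = chunk.alloc(40)   # 40 bytes rounds up to 3 cells
b = chunk.alloc(16)   # exactly 1 cell
chunk.free(a, 40)
c = chunk.alloc(20)   # 2 cells, reuses the freed run
```

On the inflation question: with 16-byte cells the bitmap costs 1 bit per 128 bits of data, so a 1MB chunk has 65536 cells and an 8KB bitmap (under 1%). The rounding of every object up to whole cells is likely the bigger cost, especially for lots of small objects.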
now, I am more conventionally serializing data, but then I have to write code to serialize/deserialize all the types...

now, the serial variety could likely be further divided: fine and coarse. fine would emphasize packing density, at the cost of performance (imo, this is about what I came up with before). coarse would emphasize access performance, and would likely waste extra space (eg: fixed-size numbers and string-table-only tags). this should roughly cover this domain (though it could be argued that this is too far outside xml's domain anyways...).

but then again: what domain am I in really?... I would think the riff domain, as riff is typically used here, but riff falls short and is often a bad fit for my use patterns. I may spec some. I guess I have already come up with the serial-fine variety.
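To make the fine/coarse tradeoff concrete, here is a small sketch of how a number might be encoded in each style. The encodings are assumptions chosen for illustration (a 7-bits-per-byte variable-length integer for "fine", a fixed 4-byte little-endian integer for "coarse"); the post does not say what the actual serial-fine format uses, and the function names are hypothetical.

```python
import struct


def encode_fine(n):
    """Variable-length unsigned int: 7 bits per byte, high bit set if more follow."""
    if n < 0:
        raise ValueError("unsigned values only")
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # more bytes follow
        else:
            out.append(low)         # last byte
            return bytes(out)


def decode_fine(buf, pos=0):
    """Return (value, next_pos); has to walk the bytes one at a time."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_coarse(n):
    """Fixed-size unsigned 32-bit little-endian int: always 4 bytes."""
    return struct.pack("<I", n)


def decode_coarse(buf, pos=0):
    """Return (value, next_pos); the size is known up front, so skipping is trivial."""
    return struct.unpack_from("<I", buf, pos)[0], pos + 4


small = encode_fine(5)       # 1 byte
medium = encode_fine(300)    # 2 bytes
fixed = encode_coarse(5)     # always 4 bytes
```

Small values take 1 byte in the fine form against 4 in the coarse form, but a reader of the coarse form can jump straight to the Nth number without decoding the ones before it.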