[XML-DEV] simple idea: 'SBXE'
continuing on a kind of permathread I guess, oh well.

late last night, an idea popped up for a fairly simplistic binary xml encoding, which aims to:
be smaller than textual xml;
not be significantly more complicated than textual xml.

before going to sleep, I beat together the basic idea. this morning, I messed with the spec a little.

I am not sure if I will do anything with this. I may implement it, at least for my own uses (basic data storage type stuff). amusingly enough, in my projects xml is more often used as an internal representation of data than an external one...

I just thought people here might be interested. any thoughts or comments?

spec dump:
---
Simplistic Binary XML Encoding

Goals:
Does not require a complicated encoder or decoder;
Does not involve a separate compression/decompression pass.

Does Not Attempt:
Large data sets or random access;
Decent compression;
Complete representation of XML features.

The encoding is viewed as a stream of bytes.
Strings are encoded as ASCII or UTF-8, with '\0' as a terminator.
Files will begin with the string "SBXE". Later versions may alter the string to reflect the version, or include extra data after this string.

0x00: general purpose ending marker
0x01..0x1F: Special, Reserved
0x20..0x3E: Namespace Prefix MRU
0x3F: Namespace String
0x40..0x7E: Opening Tag/Attr MRU
0x7F: Opening Tag/Attr String
0x80..0xFE: Text MRU
0xFF: Text String

Node [<NS>] <TAG> <ATTR*> 0 <BODY*> 0
Attr [<NS>] <TAG> <TEXT*>
Body <NODE>|<TEXT>

Text may be represented as globs of raw strings and MRU references. A single text string should be limited to 255 bytes or less.

MRU Scheme
Whenever a given string is being encoded, it can be checked whether it was encoded recently; if so, a reference to the correct spot in the MRU list is encoded and that value is moved to the front. Otherwise, the new string is encoded directly and added to the front of the list.
Higher numbers mean more recent matches, so things shift in the direction of lower numbers. Upon shifting off the end, a string is essentially forgotten.

Tags and Attributes share the same byte range in the encoding, but refer to different MRU lists.
--

the mru lists would be based on the linear contents of the file.

ok, I don't have any good examples.

<foo><bar>baz</bar><bar baz="baz"/></foo>
41 bytes

'SBXE\0'
0x7F 'foo\0' 0x00
0x7F 'bar\0' 0x00 0xFF 'baz\0' 0x00
0x7E 0x7F 'baz\0' 0xFE 0x00 0x00
0x00
33 bytes (28 absent the prefix).

longer examples would probably do a little better.
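To make the MRU scheme and byte layout concrete, here is a minimal Python sketch of an encoder. It is not part of the spec; the `MRU` and `Encoder` classes and the `encode` function are hypothetical names. It assumes index 0 of each MRU list is the most recent entry (written as the highest byte of its range, 0x7E or 0xFE), that attribute values go through the same text MRU list as element content (which is what the 0xFE in the example implies), and it ignores namespace prefixes entirely. It also rejects text over 255 bytes instead of splitting it into several text items.

```python
import xml.etree.ElementTree as ET


class MRU:
    """Most-recently-used string list; index 0 is the most recent entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def lookup(self, s):
        """Return s's current index and move it to the front, or None if absent."""
        if s not in self.items:
            return None
        i = self.items.index(s)
        self.items.insert(0, self.items.pop(i))
        return i

    def add(self, s):
        """Put a new string at the front; anything past capacity is forgotten."""
        self.items.insert(0, s)
        del self.items[self.capacity:]


class Encoder:
    def __init__(self):
        self.out = bytearray(b"SBXE\0")    # magic string, NUL-terminated
        self.tags = MRU(0x7E - 0x40 + 1)   # 63 slots, bytes 0x40..0x7E
        self.attrs = MRU(0x7E - 0x40 + 1)  # same byte range, separate list
        self.texts = MRU(0xFE - 0x80 + 1)  # 127 slots, bytes 0x80..0xFE

    def _string(self, mru, s, newest, literal):
        # An MRU hit becomes one byte: `newest` for index 0, counting down
        # for older entries. A miss is written as `literal` + UTF-8 + NUL.
        i = mru.lookup(s)
        if i is not None:
            self.out.append(newest - i)
        else:
            self.out.append(literal)
            self.out += s.encode("utf-8") + b"\0"
            mru.add(s)

    def text(self, s):
        if len(s.encode("utf-8")) > 255:
            raise ValueError("sketch does not split text longer than 255 bytes")
        self._string(self.texts, s, 0xFE, 0xFF)

    def node(self, elem):
        # Node: <TAG> <ATTR*> 0 <BODY*> 0  (namespace prefixes not handled)
        self._string(self.tags, elem.tag, 0x7E, 0x7F)
        for name, value in elem.attrib.items():
            # Attr: <TAG> <TEXT*>, with no terminator of its own
            self._string(self.attrs, name, 0x7E, 0x7F)
            if value:
                self.text(value)
        self.out.append(0x00)  # end of attributes
        if elem.text:
            self.text(elem.text)
        for child in elem:
            self.node(child)
            if child.tail:
                self.text(child.tail)
        self.out.append(0x00)  # end of body


def encode(xml_text):
    enc = Encoder()
    enc.node(ET.fromstring(xml_text))
    return bytes(enc.out)
```

Calling `encode('<foo><bar>baz</bar><bar baz="baz"/></foo>')` produces the same 33-byte sequence as the hand-worked example: the second `bar` hits index 0 of the tag list (0x7E), and the attribute value `baz` hits index 0 of the text list (0xFE).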
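A decoder can be about as small. This is another hedged Python sketch with hypothetical names (`MRU`, `decode`); it walks the grammar `Node [<NS>] <TAG> <ATTR*> 0 <BODY*> 0` recursively and rebuilds an `xml.etree.ElementTree` element. An Attr needs no terminator because its text items always fall in 0x80..0xFF, so the next attribute name (0x40..0x7F) or the 0x00 ending marker is unambiguous. The decoder must update its MRU lists exactly as the encoder does (move a referenced entry to the front, push new literals), otherwise later references point at the wrong strings. Namespace bytes (0x20..0x3F) are rejected rather than handled.

```python
import xml.etree.ElementTree as ET


class MRU:
    """Most-recently-used string list; index 0 is the most recent entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def add(self, s):
        """Put a new string at the front; anything past capacity is forgotten."""
        self.items.insert(0, s)
        del self.items[self.capacity:]

    def get(self, i):
        """Return entry i and move it to the front, mirroring the encoder."""
        s = self.items.pop(i)
        self.items.insert(0, s)
        return s


def decode(data):
    if not data.startswith(b"SBXE\0"):
        raise ValueError("missing SBXE header")
    pos = 5
    tags, attrs, texts = MRU(63), MRU(63), MRU(127)

    def cstring():
        # Read a NUL-terminated UTF-8 string starting at pos.
        nonlocal pos
        end = data.index(b"\0", pos)
        s = data[pos:end].decode("utf-8")
        pos = end + 1
        return s

    def name(mru):
        # Tag or attribute name: 0x7F literal, or 0x40..0x7E MRU reference.
        nonlocal pos
        op = data[pos]
        pos += 1
        if op == 0x7F:
            s = cstring()
            mru.add(s)
            return s
        if 0x40 <= op <= 0x7E:
            return mru.get(0x7E - op)
        raise ValueError(f"unsupported byte 0x{op:02X} (namespaces not handled)")

    def text_run():
        # Join consecutive text items: 0xFF literal or 0x80..0xFE reference.
        nonlocal pos
        parts = []
        while pos < len(data) and data[pos] >= 0x80:
            op = data[pos]
            pos += 1
            if op == 0xFF:
                s = cstring()
                texts.add(s)
            else:
                s = texts.get(0xFE - op)
            parts.append(s)
        return "".join(parts)

    def node():
        nonlocal pos
        elem = ET.Element(name(tags))
        while data[pos] != 0x00:  # attributes until the ending marker
            key = name(attrs)
            elem.set(key, text_run())
        pos += 1
        last = None
        while data[pos] != 0x00:  # body items until the ending marker
            if data[pos] >= 0x80:
                s = text_run()
                if last is None:
                    elem.text = (elem.text or "") + s
                else:
                    last.tail = (last.tail or "") + s
            else:
                last = node()
                elem.append(last)
        pos += 1
        return elem

    return node()
```

Feeding it the byte sequence from the example gives back a `foo` element with two `bar` children, the first containing the text `baz` and the second carrying `baz="baz"`.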