RE: hello, new to list, thoughts
Hi all,

Have you looked into BinX regarding binary XML formats? As far as I can tell
(and from only a brief knowledge of BinX), your idea is fairly similar in
purpose (maybe in implementation too, but I don't know enough to say). Just
thought I'd mention it in case it helps prevent needless duplication of work.

Cheers
Rich

--------------------------------
Richard Bruin
PhD Student
Department of Earth Sciences
University of Cambridge

On Wed, 2004-11-17 at 16:44, David Lieberman AWDSF wrote:
> Neat!
>
> Good ideas, all. (if you ask me)
>
> It'll be interesting to see when you're finished.
>
> David Lieberman
> http://www.awdsf.com
>
> -----Original Message-----
> From: cr88192 [mailto:cr88192@h...]
> Sent: Wednesday, November 17, 2004 6:55 AM
> To: xml-dev@l...
> Subject: hello, new to list, thoughts
>
> ok, I was led here because, as far as I can tell, the group
> gmane.text.xml.devel is a mirror of this list (given that a reference to
> this list is appended to the messages there).
> I apologize if this is not the case.
> pardon me if I am being a troll, I am new here.
>
> I am mostly requesting general comments, eg, on things that could be
> improved in my idea. I am not really expecting anyone to take me seriously.
>
> ok. so I have recently been doing something roughly along the lines of a
> binary xml (not exactly, but I have designed it so that a basic subset of
> xml maps fairly well and doesn't give up any real info in the conversion).
> namespace definitions, however, are similar but not exactly the same, so
> these would be left to the conversion tool.
>
> I tried to retain what I felt to be the general spirit and semantics of xml,
> as imo they seem quite decent for data (though difficult to fully recognize
> as such or reason about). imo, they are better than a plain tree.
> I had considered stripping down the semantics at some points, but some
> things seemed interdependent (one needs rules and complexity in some places
> to grant freedoms in others, and one needs to figure out where to be strict
> and where to be lax...).
>
> why?
> personally, I don't feel that a binary variant would be particularly
> helpful within xml's core domains (network communications, messages,
> "documents", ...). however:
> my interests lie more in data storage (for the things xml works well for,
> I use xml);
> formats like riff, ebml, ... leave a lot to be desired wrt semantics imo;
> textual xml is not that great for data storage imo, eg, one can't skip over
> data or jump around in it the way one can with, say, riff, and base64
> coding would not be very good imo for cases where size is a priority, or
> where much of the data is large binary chunks.
>
> in a kind of ideological frenzy, I beat something together, and have spent
> a while refining it.
>
> I also have a basic implementation (not yet online, I have been a bit
> behind on this kind of thing recently).
> not much coding has been done recently (largely I am stuck fiddling with
> the details of the design, among other things not related to this).
>
> things like file size or processor overhead are not high priorities here; I
> just try to save space when possible and avoid wasting too much processor
> time. my results thus far have shown the generated files to be slightly
> smaller than the input xml (presently lacking any kind of content string
> compression, though tag compression is done). this could be viewed as a
> good sign I guess.
>
> here is a recent draft of the spec:
> ----
> XLIFF (0.1.1):
> Partially fueled by an argument about EBML, I came up with this.
> It has little to do with LIFF, but, hell, I am not really using LIFF (it
> ended up too close to RIFF and too generally ugly...).
>
> This format is turning out to be kind of a pain to code up, but this is
> not entirely unexpected.
>
> Cleaning up the spec some from the 10-29 version, minor alterations.
> Stripping out container stuff, as it clutters the spec and doesn't really
> make sense in this context anyway (keeping the old version, as the idea may
> be adaptable to a different format).
>
> Goals:
> A binary format with similar flexibility to XML (X);
> Support for large files and datasets (L);
> Sort of like RIFF and IFF (IFF).
>
> This will be a TLV format with attributes and namespaces similar to XML.
> It will use a tag dictionary to help reduce the total file size.
> It should be acceptable for random access and "big chunks of data" style
> uses (like RIFF, IFF, and EBML).
>
> Numbers:
> The MSB (bit 7) serves to indicate the presence of following (higher
> order) bytes.
>
> 0xxxxxxx, -64..63
> 1xxxxxxx 0xxxxxxx, -8192..8191
> 1xxxxxxx 1xxxxxxx 0xxxxxxx, -1048576..1048575 (about +/-1M)
> ..
>
> Values are stored in low-to-high byte order using two's complement
> encoding. As a result, the sign is implicitly contained in bit 6 of the
> last byte.
>
> The maximum value of a number depends on the implementation, and an
> implementation is allowed to refuse overly large numbers.
> However, I will spec that the limit should be at least 32 bits (a 35 bit
> number with the upper 4 bits either all 0 or all 1).
>
> Strings:
> {
>     Number len; //length if >0, dictionary index if <0, empty string if 0
>     if(len>0)byte str[len];
> }
>
> Strings may be indexed in a dictionary. The exact semantics for
> dictionaries will depend on context.
>
> Node:
> {
>     String ns;
>     String tag;
>     Number alen;
>     if(alen>0)byte attr[alen];
>     Number dlen;
>     if(dlen>0)byte data[dlen]; //the contents depend on the ns and tag
> }
>
> Attr:
> {
>     String ns;
>     String tag;
>     Number dlen;
>     if(dlen>0)byte data[dlen];
> }
>
> In tags and attributes, negative chunk lengths are reserved.
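The Numbers rule above reads as the same scheme usually called signed LEB128: 7 data bits per byte, low-order group first, bit 7 as the "more bytes follow" flag, and bit 6 of the final byte acting as the sign. A minimal C sketch of an encoder and decoder under that reading; the function names are hypothetical (not from the draft or its test app), and this decoder refuses anything longer than 9 bytes (63 bits), which the draft allows since it only requires 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode v as an XLIFF "Number" (signed LEB128-style).
   Writes at most 10 bytes to out and returns the number of bytes written. */
size_t xliff_put_number(int64_t v, uint8_t *out)
{
    size_t n = 0;
    for (;;) {
        uint8_t b = (uint8_t)((uint64_t)v & 0x7F);   /* low 7 bits */
        /* floor division by 128 without relying on the
           implementation-defined right shift of negative values */
        v = (v < 0) ? ~(~v >> 7) : (v >> 7);
        /* stop once the remaining bits are just sign extension of bit 6 */
        int last = (v == 0 && !(b & 0x40)) || (v == -1 && (b & 0x40));
        out[n++] = last ? b : (uint8_t)(b | 0x80);
        if (last)
            return n;
    }
}

/* Hypothetical helper: decode one Number from buf (at most avail bytes).
   Returns the number of bytes consumed, or 0 if the input is truncated or
   longer than 9 bytes, which this sketch refuses. */
size_t xliff_get_number(const uint8_t *buf, size_t avail, int64_t *out)
{
    uint64_t acc = 0;
    unsigned shift = 0;
    size_t i = 0;
    uint8_t b;
    do {
        if (i >= avail || i >= 9)
            return 0;
        b = buf[i++];
        acc |= (uint64_t)(b & 0x7F) << shift;
        shift += 7;
    } while (b & 0x80);
    if (b & 0x40)                       /* sign bit of the last byte */
        acc |= ~(uint64_t)0 << shift;   /* shift <= 63 here */
    /* convert without an implementation-defined unsigned->signed cast */
    *out = (acc >> 63) ? -(int64_t)(~acc) - 1 : (int64_t)acc;
    return i;
}
```

With a decoder like this, reading a String only needs the sign of the decoded length: positive means that many raw bytes follow, zero is the empty string, and negative is a dictionary reference (the draft does not say how a negative value maps onto a dictionary slot, so that is left open here). Skipping a whole Node likewise only needs the two length prefixes, which is what makes the RIFF-style random access possible.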
>
> Implicit XLIFF attributes are allowed in nodes without restriction, as they
> are not generally viewed as part of the content.
> Duplicate attributes are not allowed.
> Attributes may not contain either tags or other attributes (other
> non-nestable structures will generally be allowed). Attributes should be
> kept small in both size and number.
>
> Tag Dictionary:
> There will be a dictionary responsible for namespaces, tags, and attribute
> names:
> This dictionary will behave similarly to a stack;
> Any new strings (encoded directly and not already present) are added to the
> end of the current dictionary level;
> On descent into a node, the dictionary is retained from the parent, creating
> a new level;
> Any new strings are added to the end of the current dictionary level;
> On exit from a node, any strings added in that level are removed (making it
> as if the descent had not occurred).
>
> The use of a dictionary allows denser packing (eg, tags can then take only
> 1 or 2 bytes).
> Fairly dense packing might require building the entire dictionary upfront,
> but an encoder can opt for less dense packing, eg, by just encoding strings
> directly.
> The need for upfront dictionaries for packing to work well is related to
> ideas for allowing faster processing by not having to descend into subnodes
> to build an up-to-date dictionary, and also to allow random access in some
> cases.
>
> Body Dictionaries:
> Each namespace will also have a "body dictionary", which may be used for
> compressing content strings in a content-specific manner. The maintenance
> of these dictionaries is largely left to the format in question (however,
> they will implicitly pop off anything added within a node).
>
> The format is a tree, with the default toplevel tag flagging the format.
>
> Special tags could exist for maintenance purposes (eg: adding a basic set
> of common strings to the dictionary, ...).
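The stack discipline described for the tag dictionary can be kept cheap by storing every visible string in one array and remembering, for each open node, how long the array was on entry; leaving the node then just truncates back to that mark. A minimal C sketch under that assumption: the type and function names are hypothetical, lookups are linear for brevity, and it only returns zero-based positions, since the on-disk mapping from a negative String length to a slot is left open in the draft.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical tag dictionary: one array of visible strings (oldest first)
   plus a stack of "count on entry" marks, one per open node level. */
typedef struct {
    char  **strs;
    size_t  count, cap;
    size_t *marks;
    size_t  depth, mark_cap;
} TagDict;

void dict_init(TagDict *d) { memset(d, 0, sizeof *d); }

/* Zero-based index of s, or -1 if it is not visible at the current level. */
long dict_find(const TagDict *d, const char *s)
{
    for (size_t i = 0; i < d->count; i++)
        if (strcmp(d->strs[i], s) == 0)
            return (long)i;
    return -1;
}

/* Append s to the current level unless already present; returns its index,
   or -1 on allocation failure. */
long dict_add(TagDict *d, const char *s)
{
    long found = dict_find(d, s);
    if (found >= 0)
        return found;
    if (d->count == d->cap) {
        size_t ncap = d->cap ? d->cap * 2 : 16;
        char **p = realloc(d->strs, ncap * sizeof *p);
        if (!p) return -1;
        d->strs = p;
        d->cap = ncap;
    }
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (!copy) return -1;
    memcpy(copy, s, len);
    d->strs[d->count] = copy;
    return (long)d->count++;
}

/* Descend into a node: remember where the new level starts. */
int dict_enter(TagDict *d)
{
    if (d->depth == d->mark_cap) {
        size_t ncap = d->mark_cap ? d->mark_cap * 2 : 8;
        size_t *p = realloc(d->marks, ncap * sizeof *p);
        if (!p) return -1;
        d->marks = p;
        d->mark_cap = ncap;
    }
    d->marks[d->depth++] = d->count;
    return 0;
}

/* Leave a node: drop every string added since the matching dict_enter. */
void dict_leave(TagDict *d)
{
    if (d->depth == 0) return;
    size_t mark = d->marks[--d->depth];
    while (d->count > mark)
        free(d->strs[--d->count]);
}

void dict_free(TagDict *d)
{
    while (d->count > 0)
        free(d->strs[--d->count]);
    free(d->strs);
    free(d->marks);
    dict_init(d);
}
```

Because popping a level is just a truncation, a reader that skips a subnode without decoding it ends up with exactly the dictionary it would have had after descending and returning, which fits the random-access goal.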
> Like XML, some special tags may exist prior to the root to define things
> (the base dictionary, ...).
>
> Namespaces:
> The empty string namespace is the "default" namespace.
>
> Except for builtin XLIFF namespaces (default, XLIFF, ...), namespaces are to
> be declared prior to use.
> Formats are given control over how namespaces are used/defined.
>
> Namespaces refer to several URIs:
> the Type URI, which defines the physical type of the container (eg: XML);
> the Namespace URI, which defines the semantic type of the container (eg: an
> XML Namespace).
>
> Any namespaces beginning with "XLIFF" are reserved for use by XLIFF.
> Further:
> XLIFF.*: basic XLIFF namespaces, failure to understand tags/attributes
> should cause failure;
> XLIFF.S.*: semantic XLIFF namespaces, these are allowed to be ignored, but
> should be preserved;
> XLIFF.O.*: optional XLIFF namespaces, these may be ignored or stripped off
> without affecting content.
>
> There will be a basic "XLIFF" namespace, and being unable to parse tags in
> this namespace will be viewed as an error (this namespace will handle things
> which may change the format of subsequent data, affect dictionaries, ...).
> XLIFF attributes are required to be understood before attempting to parse
> the contents of a node.
>
> "XLIFF.S" will be used for semantic XLIFF tags; failure to understand them
> will not compromise decoding of the format.
>
> "XLIFF.O" will be used for optional XLIFF tags; failure to understand them,
> or their removal, will not compromise decoding of the format.
>
> "XLIFF.NS" could be a namespace for namespace declarations (like in XML).
> eg, "XLIFF.NS:foo" as an attribute could declare a foo namespace.
> the content of these tags could be an array of pairs of strings, eg:
> "TypeURI", "xliff:foo_container", "NSURI", "xliff:bar_ns".
>
> XLIFF:Header
> A tag required at the start of an XLIFF file, serving to mark it as a valid
> XLIFF file and to give general info about the file.
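The reserved-prefix rules above boil down to a small classification step a reader performs before deciding what to do with a tag or attribute it does not recognize. A hedged C sketch, assuming exact names or dot-separated prefixes are meant (so "XLIFF.S" and "XLIFF.S.foo" are semantic, while any other name starting with "XLIFF", including "XLIFF.NS", falls back to the must-understand rule). The enum and function names are made up for illustration, and the handling of non-XLIFF namespaces is an assumption, since the draft leaves that to each format.

```c
#include <string.h>

/* Hypothetical classification of a namespace name under the draft's rules. */
typedef enum {
    NS_FORMAT,          /* not reserved: handling is up to the format    */
    NS_XLIFF_BASIC,     /* "XLIFF", "XLIFF.*": must be understood        */
    NS_XLIFF_SEMANTIC,  /* "XLIFF.S", "XLIFF.S.*": may ignore, must keep */
    NS_XLIFF_OPTIONAL   /* "XLIFF.O", "XLIFF.O.*": may ignore or strip   */
} NsClass;

/* True if ns is exactly base, or base followed by a '.' segment. */
static int ns_under(const char *ns, const char *base)
{
    size_t n = strlen(base);
    return strncmp(ns, base, n) == 0 && (ns[n] == '\0' || ns[n] == '.');
}

NsClass ns_classify(const char *ns)
{
    /* check the more specific reserved families first */
    if (ns_under(ns, "XLIFF.S")) return NS_XLIFF_SEMANTIC;
    if (ns_under(ns, "XLIFF.O")) return NS_XLIFF_OPTIONAL;
    /* every other name beginning with "XLIFF" is reserved; treat as basic */
    if (strncmp(ns, "XLIFF", 5) == 0) return NS_XLIFF_BASIC;
    return NS_FORMAT;
}

/* What a reader might do with a tag or attribute it does not understand:
   -1 = fail, 1 = keep it verbatim, 0 = dropping it is allowed. */
int ns_unknown_action(const char *ns)
{
    switch (ns_classify(ns)) {
    case NS_XLIFF_BASIC:    return -1;
    case NS_XLIFF_SEMANTIC: return 1;
    case NS_XLIFF_OPTIONAL: return 0;
    default:                return 1;  /* assumption: preserve format data */
    }
}
```

Checking "XLIFF.S" and "XLIFF.O" before the bare "XLIFF" prefix matters: otherwise every reserved name would be classified as basic and ignorable tags would cause failures.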
>
> XLIFF:TypeName Header Attribute
> Gives a general "file type name" used for identifying the type (apart from
> examining the contents or namespaces). It is encoded as a raw string.
>
> XLIFF:HeaderFlags Header Attribute
> Contains a number marking various flags for a file. Unknown flags may be
> ignored.
> flags&1: dictionary is static within the file.
>
> Other attributes may be found in the header besides those related to XLIFF.
> An example would be custom or format-specific tags.
>
> XLIFF:DictStrings Tag
> Defines a glob of strings to be added to the tag dictionary.
> This may be used to reduce the number of occurrences of some common tag
> which may only occur in sublevels or such.
>
> XLIFF:NodeFlags Attribute
> Contains a number marking various flags for a node. Unknown flags may be
> ignored.
> flags&1: this node is compound;
> flags&2: dictionary is static within this node.
>
> XLIFF.O:JUNK Tag/Attribute
> Marks a space as being "junk", thus allowing some space to be left for new
> tags, attributes, or padding.
>
> XML in XLIFF
>
> There are 2 ways to do XML in XLIFF:
> A unified document (all the content, or at least the toplevel, is XML);
> A mixed document (the toplevel is not necessarily XML).
>
> In the unified document case the toplevel tag is 'XML' (with at least the
> default namespace declared as being XML), which may contain any xml header
> tags and the xml root.
> Namespace declarations are to be converted to XLIFF style.
>
> In the mixed document case, at least the basic xml namespaces are to be
> declared in the file toplevel (along, probably, with others).
> The nstype for XML is 'xliff:binxml'.
>
> String Globs
> All textual data will be represented by a number of strings stuck end to
> end.
> These will use a "textglob dictionary", which will follow the same rules as
> the tag dictionary.
>
> Attribute data is defined as a glob of strings.
>
> An empty tag value flags a glob of textual data. The body for this is a
> string glob.
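Since a text node's body (and an attribute's data) is a string glob, a reader recovers the text by decoding String records until the body's byte count is used up and joining them. A minimal C sketch, assuming Numbers are signed LEB128-style as the Numbers section suggests, and ignoring the textglob dictionary: a negative length is simply reported as unsupported, since the draft does not pin down how it maps to a dictionary entry. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one Number (assumed signed LEB128-style); returns bytes used,
   or 0 if the input is truncated or longer than 9 bytes. */
static size_t get_number(const uint8_t *p, size_t avail, int64_t *out)
{
    uint64_t acc = 0;
    unsigned shift = 0;
    size_t i = 0;
    uint8_t b;
    do {
        if (i >= avail || i >= 9)
            return 0;
        b = p[i++];
        acc |= (uint64_t)(b & 0x7F) << shift;
        shift += 7;
    } while (b & 0x80);
    if (b & 0x40)
        acc |= ~(uint64_t)0 << shift;
    *out = (acc >> 63) ? -(int64_t)(~acc) - 1 : (int64_t)acc;
    return i;
}

/* Hypothetical reader: join the String records of a glob body into out
   as a NUL-terminated string. Returns the text length, or -1 if a record
   is malformed, is a (negative) dictionary reference, or does not fit. */
long glob_to_text(const uint8_t *body, size_t blen, char *out, size_t outcap)
{
    size_t pos = 0, olen = 0;
    if (outcap == 0)
        return -1;
    while (pos < blen) {
        int64_t len;
        size_t used = get_number(body + pos, blen - pos, &len);
        if (used == 0 || len < 0)    /* len < 0: textglob dictionary index */
            return -1;
        pos += used;
        /* the record must lie inside the body and leave room for the NUL */
        if ((uint64_t)len > blen - pos || (uint64_t)len >= outcap - olen)
            return -1;
        memcpy(out + olen, body + pos, (size_t)len);
        olen += (size_t)len;
        pos += (size_t)len;
    }
    out[olen] = '\0';
    return (long)olen;
}
```

For example, the bytes `06 "Hello," 00 06 " world"` decode to "Hello, world": two 6-byte pieces with an empty string between them. Splitting text into pieces this way is what would let a writer replace frequently repeated pieces with dictionary references once the textglob dictionary is defined.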
>
> --
>
> here is a small fragment from the test app:
> ----
> int EncodeXMLNode(XLIFFW_Context *ctx, NetParse_Node *node)
> {
>     NetParse_Attr *acur;
>     NetParse_Node *ncur;
>
>     if(node->text)
>     {
>         XLIFFW_BeginTag(ctx, "", "");
>         XLIFFW_BeginAttrs(ctx);
>         XLIFFW_EndAttrs(ctx);
>
>         XLIFFW_BeginBody(ctx);
>         XLIFFW_WriteString(ctx, node->text);
>         XLIFFW_EndBody(ctx);
>         XLIFFW_EndTag(ctx);
>
>         return(0);
>     }
>
>     XLIFFW_BeginTag(ctx, node->ns, node->key);
>     XLIFFW_BeginAttrs(ctx);
>
>     acur=node->attr;
>     while(acur)
>     {
>         XLIFFW_BeginAttr(ctx, acur->ns, acur->key);
>         XLIFFW_WriteString(ctx, acur->value);
>         XLIFFW_EndAttr(ctx);
>         acur=acur->next;
>     }
>
>     if(node->first)XLIFFW_NodeFlagsAttr(ctx, XLIFF_NFL_COMPOUND);
>     XLIFFW_EndAttrs(ctx);
>
>     XLIFFW_BeginBody(ctx);
>     ncur=node->first;
>     while(ncur)
>     {
>         EncodeXMLNode(ctx, ncur);
>         ncur=ncur->next;
>     }
>
>     XLIFFW_EndBody(ctx);
>     XLIFFW_EndTag(ctx);
>
>     return(0);
> }
> --
>
> and another:
> ----
>
>     n=NetParse_XML_LoadFile("form0.xml");
>
>     wctx=XLIFFW_OpenWrite("test0.xliff");
>
>     XLIFFW_WriteHeader(wctx, "xliff:test", 3);
>
> //    XLIFFW_BeginDictStrings(wctx);
> //    WriteXMLDictNode(wctx, n);
> //    XLIFFW_EndDictStrings(wctx);
>
>     XLIFFW_BeginTag(wctx, "", "XML");
>     XLIFFW_BeginAttrs(wctx);
>     XLIFFW_BindNSAttr(wctx, "", "xliff:binxml", "");
>     XLIFFW_NodeFlagsAttr(wctx, XLIFF_NFL_COMPOUND);
>     XLIFFW_EndAttrs(wctx);
>
>     XLIFFW_BeginBody(wctx);
>
>     EncodeXMLNode(wctx, n);
>
>     XLIFFW_EndBody(wctx);
>     XLIFFW_EndTag(wctx);
>
>     XLIFFW_WriteEOF(wctx);
>     XLIFFW_DestroyContext(wctx);
>
>     rctx=XLIFFR_OpenRead("test0.xliff");
>     DumpNodes(rctx);
>
>     XLIFFR_DestroyContext(rctx);
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this list use the subscription
> manager:
> <http://www.oasis-open.org/mlmanage/index.php>
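One practical wrinkle in the fragments above: a Node stores `dlen` before the body bytes, but EncodeXMLNode only knows a body's size after all of its children have been written. A common way to handle that in a streaming writer is to build each open body in a temporary buffer and emit the length prefix when the body is closed. The C sketch below is a hypothetical illustration of that idea, not how XLIFFW_BeginBody/XLIFFW_EndBody actually work (their code isn't in the post). It keeps one reusable buffer per nesting level and, assuming the signed LEB128 reading of Numbers, writes each finished body into its parent as Number length followed by the bytes. All names, and the fixed depth limit, are made up.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer. */
typedef struct { uint8_t *data; size_t len, cap; } Buf;

static int buf_put(Buf *b, const void *src, size_t n)
{
    if (n == 0)
        return 0;
    if (b->len + n > b->cap) {
        size_t ncap = b->cap ? b->cap : 64;
        while (ncap < b->len + n)
            ncap *= 2;
        uint8_t *p = realloc(b->data, ncap);
        if (!p)
            return -1;
        b->data = p;
        b->cap = ncap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Append v as a Number (assumed signed LEB128-style). */
static int buf_put_number(Buf *b, int64_t v)
{
    uint8_t tmp[10];
    size_t n = 0;
    for (;;) {
        uint8_t byte = (uint8_t)((uint64_t)v & 0x7F);
        v = (v < 0) ? ~(~v >> 7) : (v >> 7);
        int last = (v == 0 && !(byte & 0x40)) || (v == -1 && (byte & 0x40));
        tmp[n++] = last ? byte : (uint8_t)(byte | 0x80);
        if (last)
            return buf_put(b, tmp, n);
    }
}

/* Hypothetical writer state: stack[0] collects the file itself and
   stack[depth] the innermost body that is still open. */
#define MAX_DEPTH 64
typedef struct { Buf stack[MAX_DEPTH]; int depth; } Writer;

/* Open a nested body; subsequent bytes go to a fresh (reused) buffer. */
static int begin_body(Writer *w)
{
    if (w->depth + 1 >= MAX_DEPTH)
        return -1;
    w->depth++;
    w->stack[w->depth].len = 0;
    return 0;
}

/* Close the innermost body: its length is now known, so write
   "Number dlen" followed by the body bytes into the parent. */
static int end_body(Writer *w)
{
    if (w->depth == 0)
        return -1;
    Buf *child = &w->stack[w->depth--];
    Buf *parent = &w->stack[w->depth];
    if (buf_put_number(parent, (int64_t)child->len) != 0)
        return -1;
    return buf_put(parent, child->data, child->len);
}

static void writer_free(Writer *w)
{
    for (int i = 0; i < MAX_DEPTH; i++)
        free(w->stack[i].data);
    memset(w, 0, sizeof *w);
}
```

The cost is memory proportional to the largest open body, which may matter given the "large files and datasets" goal; the usual alternative is a two-pass writer that measures each subtree before emitting it.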