RE: hello, new to list, thoughts

To: cr88192@h...
Subject: RE: hello, new to list, thoughts
From: Richard Bruin <rbru03@e...>
Date: 17 Nov 2004 17:29:36 +0000
Cc: xml-dev@l...
In-reply-to: <200411171648.iAHGmQx29167@s...>
Organization: Department of Earth Sciences
References: <200411171648.iAHGmQx29167@s...>

Hi all,

Have you looked into BinX regarding binary XML formats? As far as I can
tell (and from only a brief knowledge of BinX) your idea is fairly
similar in purpose (maybe in implementation too, but I don't know enough
to say).

Just thought I'd mention it in case it can help to prevent needless
duplication of work,

Cheers

Rich

--------------------------------
Richard Bruin
PhD Student
Department of Earth Sciences
University of Cambridge

On Wed, 2004-11-17 at 16:44, David Lieberman AWDSF wrote:
> Neat!
> 
> Good ideas, all. (if you ask me) 
> 
> It'll be interesting to see when you're finished.
> 
> David Lieberman
> http://www.awdsf.com
> 
> 
> -----Original Message-----
> From: cr88192 [mailto:cr88192@h...] 
> Sent: Wednesday, November 17, 2004 6:55 AM
> To: xml-dev@l...
> Subject:  hello, new to list, thoughts
> 
> ok, I was led here as, as far as I can tell, the group
> gmane.text.xml.devel is a mirror of this list (given that a reference to
> this list is appended to the messages there).
> I apologize if this is not the case.
> pardon if I am being a troll, I am new here.
> 
> I am mostly requesting general comments, eg, on things that could be 
> improved in my idea. I am not expecting anyone really to take me seriously.
> 
> 
> ok. so I have been recently doing something roughly along the lines of a 
> binary xml (not exactly, but I have designed it such that a basic subset of 
> xml maps fairly well, and doesn't give up any real info in the conversion). 
> namespace definitions, however, are similar but not exactly the same, so 
> this would be left to the conversion tool.
> 
> I tried to retain what I felt to be the general spirit and semantics of
> xml, as imo they seem quite decent for data (though difficult to fully
> recognize as such or reason about). imo, they are better than a plain
> tree. I had considered stripping down the semantics at some points, but
> some things seemed interdependent (one needs rules and complexity in some
> places to grant freedoms in others, and one needs to figure out where to
> be strict and where to be lax...).
> 
> why?
> personally, I don't feel that within xml's core domains (network 
> communications, messages, "documents", ...) a binary variant would be 
> particularly helpful, however:
> my interests lie more in data storage (for what xml works well for, I use 
> that);
> formats like riff, ebml, ... leave a lot to be desired imo wrt semantics;
> textual xml is not that great for data storage imo, eg, one can't skip
> over data or jump around in the same way one can with, say, riff, and imo
> base64 coding would not be very good for cases when size is a priority,
> or if much of the data is large binary chunks.
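
To put a rough number on the base64 point: standard padded base64 (RFC 4648) emits 4 output characters for every 3 input bytes, so binary payloads grow by about a third before any line breaks are added. A small sketch of that arithmetic follows; the function name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Size of n raw bytes after padded base64 encoding (no line breaks):
 * every group of up to 3 input bytes becomes 4 output characters. */
size_t base64_encoded_size(size_t n)
{
    /* keep (n + 2) / 3 * 4 from overflowing size_t */
    assert(n <= (size_t)-1 / 4 * 3 - 2);
    return (n + 2) / 3 * 4;
}
```

For example, 1000000 bytes of binary data come out as 1333336 characters of base64.
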
> 
> in a kind of ideological frenzy, I beat something together, and have spent
> a while refining it.
> 
> I also have a basic implementation (not yet online, I have been a bit
> behind on this kind of thing recently).
> not much coding has been done recently (largely I am stuck fiddling 
> with the details of the design, among other things not related to this).
> 
> things like file size or processor overhead are not high priorities here, I 
> just try to save space when possible, and avoid wasting too much processor 
> time. my results thus far have shown the generated files to be slightly 
> smaller than the input xml (presently lacking any kind of content string 
> compression, though tag compression is done). this could be viewed as a
> good sign I guess.
> 
> 
> 
> here is a recent draft of the spec:
> ----
> XLIFF (0.1.1):
> Partially fueled by an argument about EBML, I came up with this.
> It has little to do with LIFF, but, hell, I am not really using LIFF (it 
> ended up too close to RIFF and too generally ugly...).
> This format shows itself to be kind of a pain to code up, but this is not 
> entirely unexpected.
> 
> Cleaning up spec some from 10-29 version, minor alterations.
> Stripping out container stuff, as it clutters the spec and doesn't really 
> make sense in this context anyways (keeping the old version as the idea may 
> be adaptable to a different format).
> 
> Goals:
> A binary format with similar flexibility to XML (X);
> Support for large files and datasets (L);
> Sort of like RIFF and IFF (IFF).
> 
> This will be a TLV format with attributes and namespaces similar to XML.
> It will use a tag dictionary to help reduce the total file size.
> It should be acceptable for random access and "big chunks of data" style 
> uses (like RIFF, IFF, and EBML).
> 
> 
> Numbers:
> The MSB (bit 7) serves to indicate the presence of following (higher order) 
> bytes.
> 
> 0xxxxxxx, -64..63
> 1xxxxxxx 0xxxxxxx, -8192..8191
> 1xxxxxxx 1xxxxxxx 0xxxxxxx, -1048576..1048575 (about -1M..1M)
> ..
> 
> Values are in Low-High order and with 2s complement encoding.
> As a result, the sign is implicitly contained in bit 6 of the last byte.
> 
> The maximum value of a number depends on the implementation, and an 
> implementation is allowed to refuse overly large numbers.
> However, I will spec that the limit should be at least 32 bits (a 35 bit 
> number with the upper 4 bits either all 0 or 1).
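
A minimal sketch of the Numbers rules above, assuming a 32-bit value range (so at most 5 bytes). The scheme as described looks the same as signed LEB128. The names `xliff_put_number` and `xliff_get_number` are hypothetical and are not taken from the test app.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write v at out (up to 5 bytes), return the byte count. */
size_t xliff_put_number(uint8_t *out, int32_t v)
{
    size_t n = 0;
    assert(out != NULL);
    for (;;) {
        uint8_t b = (uint8_t)(v & 0x7F);        /* low 7 bits go out first */
        v = (v >= 0) ? (v >> 7) : ~(~v >> 7);   /* portable arithmetic shift */
        /* stop once the rest is pure sign extension and bit 6 carries the sign */
        if ((v == 0 && !(b & 0x40)) || (v == -1 && (b & 0x40))) {
            out[n++] = b;                       /* MSB clear: last byte */
            return n;
        }
        out[n++] = (uint8_t)(b | 0x80);         /* MSB set: more bytes follow */
    }
}

/* Hypothetical helper: read a number from in[0..len). Returns the bytes
 * consumed, or 0 if the input is truncated or longer than 5 bytes (35 bits).
 * It does not check that the 4 upper bits of a 35-bit number agree. */
size_t xliff_get_number(const uint8_t *in, size_t len, int32_t *out)
{
    uint32_t acc = 0;
    unsigned shift = 0;
    size_t i = 0;
    assert(in != NULL && out != NULL);
    while (i < len && shift < 35) {
        uint8_t b = in[i++];
        acc |= (uint32_t)(b & 0x7F) << shift;
        shift += 7;
        if (!(b & 0x80)) {
            if ((b & 0x40) && shift < 32)
                acc |= ~(uint32_t)0 << shift;   /* sign-extend from bit 6 */
            /* convert to signed without an implementation-defined cast */
            *out = (acc & 0x80000000u)
                 ? (int32_t)(acc - 0x80000000u) + INT32_MIN
                 : (int32_t)acc;
            return i;
        }
    }
    return 0;
}
```

With this sketch, 63 encodes as the single byte 3F, 64 as C0 00, and -1 as 7F.
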
> 
> 
> Strings:
> {
> Number len; //length if >0, dictionary index if <0, empty string if 0
> if(len>0)byte str[len];
> }
> 
> Strings may be indexed in a dictionary. The exact semantics for
> dictionaries will depend on context.
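
A sketch of how a writer might emit a String under the rules above. The draft does not say how a negative number maps onto a dictionary slot, so the mapping used here (entry i is written as -(i + 1)) is purely an assumption, as are the function names. A compact copy of the Number encoder is included so the block stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compact Number encoder (low 7 bits first, MSB set when more bytes follow). */
static size_t put_number(uint8_t *out, int32_t v)
{
    size_t n = 0;
    for (;;) {
        uint8_t b = (uint8_t)(v & 0x7F);
        v = (v >= 0) ? (v >> 7) : ~(~v >> 7);
        if ((v == 0 && !(b & 0x40)) || (v == -1 && (b & 0x40))) {
            out[n++] = b;
            return n;
        }
        out[n++] = (uint8_t)(b | 0x80);
    }
}

/* Hypothetical helper.
 * dict_index >= 0: emit only a reference to that dictionary entry.
 * dict_index <  0: emit the length followed by the raw bytes of s.
 * ASSUMPTION: entry i is referenced as the number -(i + 1). */
size_t xliff_put_string(uint8_t *out, const char *s, int32_t dict_index)
{
    size_t len, n;
    assert(out != NULL);
    if (dict_index >= 0)
        return put_number(out, -dict_index - 1);
    assert(s != NULL);
    len = strlen(s);
    assert(len <= (size_t)INT32_MAX);
    n = put_number(out, (int32_t)len);   /* a length of 0 is the empty string */
    memcpy(out + n, s, len);
    return n + len;
}
```

Under that assumption a tag that is already in the dictionary costs one byte for the first 64 entries and two bytes for the next few thousand, which fits the "tags being 1 or 2 bytes" remark in the draft.
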
> 
> Node:
> {
> String ns;
> String tag;
> Number alen;
> if(alen>0)byte attr[alen];
> Number dlen;
> if(dlen>0)byte data[dlen]; //the contents depend on the ns and tag
> }
> 
> Attr:
> {
> String ns;
> String tag;
> Number dlen;
> if(dlen>0)byte data[dlen];
> }
> 
> In tags and attributes, negative chunk lengths are reserved.
> 
> Implicit XLIFF attributes are allowed in nodes without restrictions as they 
> are not generally viewed as part of the content.
> Duplicate attributes are not allowed.
> Attributes may not contain either tags or other attributes (other 
> non-nestable structures will generally be allowed). Attributes should be 
> kept small in both size and number.
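
Because every Node carries explicit attribute and data lengths, a reader can step over a node without descending into it. The sketch below assumes the Node layout exactly as drafted above and treats negative chunk lengths as errors, since they are reserved. `NodeView` and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compact Number decoder; returns bytes consumed, 0 on malformed input. */
static size_t get_number(const uint8_t *in, size_t len, int32_t *out)
{
    uint32_t acc = 0;
    unsigned shift = 0;
    size_t i = 0;
    while (i < len && shift < 35) {
        uint8_t b = in[i++];
        acc |= (uint32_t)(b & 0x7F) << shift;
        shift += 7;
        if (!(b & 0x80)) {
            if ((b & 0x40) && shift < 32)
                acc |= ~(uint32_t)0 << shift;
            *out = (acc & 0x80000000u)
                 ? (int32_t)(acc - 0x80000000u) + INT32_MIN
                 : (int32_t)acc;
            return i;
        }
    }
    return 0;
}

/* Skip a String: inline bytes follow only when the length is positive;
 * 0 (empty) and negative (dictionary reference) have no payload. */
static size_t skip_string(const uint8_t *in, size_t len)
{
    int32_t n;
    size_t used = get_number(in, len, &n);
    if (used == 0)
        return 0;
    if (n > 0) {
        if ((size_t)n > len - used)
            return 0;
        used += (size_t)n;
    }
    return used;
}

typedef struct {
    const uint8_t *attr; size_t alen;   /* raw attribute bytes */
    const uint8_t *data; size_t dlen;   /* raw body bytes */
} NodeView;

/* Read one length-prefixed chunk (the attr or data part of a Node). */
static size_t take_chunk(const uint8_t *in, size_t len,
                         const uint8_t **p, size_t *plen)
{
    int32_t n;
    size_t used = get_number(in, len, &n);
    if (used == 0 || n < 0)             /* negative lengths are reserved */
        return 0;
    if ((size_t)n > len - used)
        return 0;
    *p = in + used;
    *plen = (size_t)n;
    return used + (size_t)n;
}

/* Returns the total size of the node starting at in, or 0 on error. */
size_t xliff_node_span(const uint8_t *in, size_t len, NodeView *view)
{
    size_t pos = 0, used;
    int i;
    assert(in != NULL && view != NULL);
    for (i = 0; i < 2; i++) {           /* ns, then tag */
        used = skip_string(in + pos, len - pos);
        if (used == 0)
            return 0;
        pos += used;
    }
    used = take_chunk(in + pos, len - pos, &view->attr, &view->alen);
    if (used == 0)
        return 0;
    pos += used;
    used = take_chunk(in + pos, len - pos, &view->data, &view->dlen);
    if (used == 0)
        return 0;
    return pos + used;
}
```

Adding the returned span to the current offset gives the start of the next sibling, which is the RIFF-style skipping the motivation section asks for.
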
> 
> 
> 
> Tag Dictionary:
> There will be a dictionary responsible for namespaces, tags, and attribute 
> names:
> This dictionary will behave similarly to a stack;
> Any new strings (encoded directly and not already present) are added to
> the end of the current dictionary level;
> On descent into a node, the dictionary is retained from the parent,
> creating a new level;
> On exit from a node, any strings added in that level are removed (making it 
> as if the descent had not occurred).
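
The stack behaviour described above can be sketched with a flat string array plus one saved count per nesting level. This is only an illustration: the fixed limits, the `TagDict` type and the function names are all assumptions, and the strings are borrowed pointers that the caller has to keep alive.

```c
#include <assert.h>
#include <string.h>

#define DICT_MAX_STRINGS 1024   /* arbitrary limits for the sketch */
#define DICT_MAX_DEPTH   64

typedef struct {
    const char *str[DICT_MAX_STRINGS];  /* borrowed, not copied */
    int count;                          /* strings currently visible */
    int mark[DICT_MAX_DEPTH];           /* count saved when each level began */
    int depth;
} TagDict;

void dict_init(TagDict *d)
{
    assert(d != NULL);
    d->count = 0;
    d->depth = 0;
}

/* Index of s in the dictionary, or -1 if it is not present. */
int dict_find(const TagDict *d, const char *s)
{
    int i;
    for (i = 0; i < d->count; i++)
        if (strcmp(d->str[i], s) == 0)
            return i;
    return -1;
}

/* Index of s, appending it to the current level if absent; -1 if full. */
int dict_intern(TagDict *d, const char *s)
{
    int i = dict_find(d, s);
    if (i >= 0)
        return i;
    if (d->count >= DICT_MAX_STRINGS)
        return -1;
    d->str[d->count] = s;
    return d->count++;
}

/* Descent into a node: start a new level on top of the parent's strings. */
int dict_enter(TagDict *d)
{
    if (d->depth >= DICT_MAX_DEPTH)
        return -1;
    d->mark[d->depth++] = d->count;
    return 0;
}

/* Exit from a node: forget everything added since the matching enter. */
void dict_leave(TagDict *d)
{
    assert(d->depth > 0);
    d->count = d->mark[--d->depth];
}
```

A linear search keeps the sketch short; a real encoder would probably want a hash table on the writing side.
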
> 
> The use of a dictionary allows denser packing (due to, eg, tags being 1 or
> 2 bytes).
> Fairly dense packing might require building the entire dictionary upfront, 
> but an encoder can have less dense packing, eg, by just encoding strings 
> directly.
> The need for upfront dictionaries for packing to work well is related to 
> ideas for allowing faster processing by not having to descend into subnodes 
> to build an up-to-date dictionary, and also to allow random access in some 
> cases.
> 
> Body Dictionaries:
> Each namespace will also have a "body dictionary", which may be used for 
> compressing content strings in a content specific manner. The maintenance 
> of these dictionaries is largely left to the format in question (however, 
> they will implicitly pop off anything added within a node).
> 
> The format is a tree, with the default toplevel tag flagging the format.
> 
> Special tags could exist for maintenance purposes (eg: adding a basic set
> of common strings to the dictionary, ...).
> Like XML, some special tags may exist prior to the root to define things 
> (the base dictionary, ...).
> 
> 
> Namespaces:
> The empty string namespace is the "default" namespace.
> 
> Except for builtin XLIFF namespaces (default, XLIFF, ...), namespaces are
> to be declared prior to use.
> Formats are given control over how namespaces are used/defined.
> 
> Each namespace refers to two URIs:
> the Type URI, which defines the physical type of the container (eg: XML);
> the Namespace URI, which defines the semantic type of the container (eg: an 
> XML Namespace).
> 
> Any namespaces beginning with "XLIFF" are reserved for use by XLIFF.
> Further:
> XLIFF.*: basic XLIFF namespaces, failure to understand tags/attributes 
> should cause failure;
> XLIFF.S.*: semantic XLIFF namespaces, these are allowed to be ignored, but 
> should be preserved;
> XLIFF.O.*: optional XLIFF namespaces, these may be ignored or stripped off 
> without affecting content.
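
One way a reader might apply the three reserved-namespace classes listed above. This is a sketch under assumptions: the comparison is case-sensitive, "XLIFF.S" and "XLIFF.O" themselves count as members of their class, and every other name starting with "XLIFF" is treated as must-understand. The enum and function names are invented for the example.

```c
#include <assert.h>
#include <string.h>

typedef enum {
    NS_USER,            /* not reserved by XLIFF */
    NS_XLIFF_REQUIRED,  /* XLIFF.*:   failing to understand is an error */
    NS_XLIFF_SEMANTIC,  /* XLIFF.S.*: may be ignored, must be preserved */
    NS_XLIFF_OPTIONAL   /* XLIFF.O.*: may be ignored or stripped */
} NsClass;

/* True if ns equals prefix or continues it with a '.'. */
static int ns_is_or_under(const char *ns, const char *prefix)
{
    size_t n = strlen(prefix);
    return strncmp(ns, prefix, n) == 0 && (ns[n] == '\0' || ns[n] == '.');
}

NsClass xliff_classify_ns(const char *ns)
{
    assert(ns != NULL);
    if (ns_is_or_under(ns, "XLIFF.S"))
        return NS_XLIFF_SEMANTIC;
    if (ns_is_or_under(ns, "XLIFF.O"))
        return NS_XLIFF_OPTIONAL;
    if (strncmp(ns, "XLIFF", 5) == 0)   /* anything else starting with XLIFF */
        return NS_XLIFF_REQUIRED;
    return NS_USER;
}
```
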
> 
> There will be a basic "XLIFF" namespace, and being unable to parse tags in 
> this namespace will be viewed as an error (this namespace will handle
> things which may change the format of subsequent data, affect
> dictionaries, ...). 
> XLIFF attributes are required to be understood before attempting to parse 
> the contents of a node.
> 
> "XLIFF.S" will be used for semantic XLIFF tags, failure to understand them 
> will not compromise decoding of the format.
> 
> "XLIFF.O" will be used for optional XLIFF tags, failure to understand or 
> removal of them will not compromise decoding of the format.
> 
> "XLIFF.NS" could be a namespace for namespace declarations (like in XML). 
> eg, "XLIFF.NS:foo" as an attribute could declare a foo namespace.
> the content of these tags could be an array of pairs of strings, eg:
> "TypeURI", "xliff:foo_container", "NSURI", "xliff:bar_ns".
> 
> 
> XLIFF:Header
> A tag required at the start of an XLIFF file serving to mark it as a valid 
> XLIFF file, and to give general info about the file.
> 
> XLIFF:TypeName Header Attribute, gives a general "file type name" used for 
> identifying the type (apart from examining the contents or namespaces). It 
> is encoded as a raw string.
> 
> XLIFF:HeaderFlags Header Attribute
> Contains a number marking various flags for a file. Unknown flags may be 
> ignored.
> 1&=dictionary is static within the file.
> 
> Other attributes may be found in the header besides those related to XLIFF. 
> An example would be custom or format specific tags.
> 
> 
> XLIFF:DictStrings Tag
> Defines a glob of strings to be added to the tag dictionary.
> This may be used for reducing the number of occurrences of some common tag 
> which may only occur in sublevels or such.
> 
> XLIFF:NodeFlags Attribute
> Contains a number marking various flags for a node. Unknown flags may be 
> ignored.
> 1&=this node is compound;
> 2&=dictionary is static within this node.
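
Reading "1&=" and "2&=" above as bit masks, a flag test could look like the sketch below. `XLIFF_NFL_COMPOUND` is the name used in the test app fragment in this message. `XLIFF_NFL_STATICDICT` and the two helper functions are hypothetical.

```c
#include <assert.h>   /* static_assert needs C11 */

#define XLIFF_NFL_COMPOUND   1u   /* node body contains child nodes */
#define XLIFF_NFL_STATICDICT 2u   /* hypothetical name: dictionary is static here */

static_assert((XLIFF_NFL_COMPOUND & XLIFF_NFL_STATICDICT) == 0,
              "flag bits must not overlap");

/* Unknown bits are simply not looked at, since they may be ignored. */
int xliff_node_is_compound(unsigned flags)
{
    return (flags & XLIFF_NFL_COMPOUND) != 0;
}

int xliff_node_has_static_dict(unsigned flags)
{
    return (flags & XLIFF_NFL_STATICDICT) != 0;
}
```
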
> 
> XLIFF.O:JUNK Tag/Attribute
> Marks a space as being "junk", thus allowing leaving some space for new 
> tags, attributes, or padding.
> 
> 
> 
> XML in XLIFF
> 
> There are 2 ways to do XML in XLIFF:
>  A unified document (all the content, or at least the toplevel, is XML);
>  A mixed document (the toplevel is not necessarily XML).
> 
> In the unified document case the toplevel tag is 'XML' (with at least the 
> default namespace declared as being XML), which may contain any xml header 
> tags and the xml root.
> Namespace declarations are to be converted to XLIFF style.
> 
> In the mixed document case, at least the basic xml namespaces are to be 
> declared in the file toplevel (probably along with others).
> The nstype for XML is 'xliff:binxml'.
> 
> String Globs
> All textual data will be represented by a number of strings stuck end to 
> end.
> These will use a "textglob dictionary", which will follow the same rules as 
> that for the tag dictionary.
> 
> Attribute data is defined as a glob of strings.
> 
> An empty tag value flags a glob of textual data. The body for this is a 
> string glob.
> 
> --
> 
> 
> here is a small fragment from the test app:
> ----
> /* recursively encode one parsed XML node (and its subtree) as XLIFF */
> int EncodeXMLNode(XLIFFW_Context *ctx, NetParse_Node *node)
> {
>  NetParse_Attr *acur;
>  NetParse_Node *ncur;
> 
>  /* text node: written as a tag with empty ns/name whose body is a string */
>  if(node->text)
>  {
>   XLIFFW_BeginTag(ctx, "", "");
>   XLIFFW_BeginAttrs(ctx);
>   XLIFFW_EndAttrs(ctx);
> 
>   XLIFFW_BeginBody(ctx);
>   XLIFFW_WriteString(ctx, node->text);
>   XLIFFW_EndBody(ctx);
>   XLIFFW_EndTag(ctx);
> 
>   return(0);
>  }
> 
>  XLIFFW_BeginTag(ctx, node->ns, node->key);
>  XLIFFW_BeginAttrs(ctx);
> 
>  acur=node->attr;
>  while(acur)
>  {
>   XLIFFW_BeginAttr(ctx, acur->ns, acur->key);
>   XLIFFW_WriteString(ctx, acur->value);
>   XLIFFW_EndAttr(ctx);
>   acur=acur->next;
>  }
> 
>  /* flag nodes that have children as compound */
>  if(node->first)XLIFFW_NodeFlagsAttr(ctx, XLIFF_NFL_COMPOUND);
>  XLIFFW_EndAttrs(ctx);
> 
>  XLIFFW_BeginBody(ctx);
>  ncur=node->first;
>  while(ncur)
>  {
>   EncodeXMLNode(ctx, ncur);
>   ncur=ncur->next;
>  }
> 
>  XLIFFW_EndBody(ctx);
>  XLIFFW_EndTag(ctx);
> 
>  return(0);
> }
> --
> 
> and another:
> ----
> 
>  n=NetParse_XML_LoadFile("form0.xml");
> 
>  wctx=XLIFFW_OpenWrite("test0.xliff");
> 
>  XLIFFW_WriteHeader(wctx, "xliff:test", 3);
> 
> // XLIFFW_BeginDictStrings(wctx);
> // WriteXMLDictNode(wctx, n);
> // XLIFFW_EndDictStrings(wctx);
> 
>  XLIFFW_BeginTag(wctx, "", "XML");
>  XLIFFW_BeginAttrs(wctx);
>  XLIFFW_BindNSAttr(wctx, "", "xliff:binxml", "");
>  XLIFFW_NodeFlagsAttr(wctx, XLIFF_NFL_COMPOUND);
>  XLIFFW_EndAttrs(wctx);
> 
>  XLIFFW_BeginBody(wctx);
> 
>  EncodeXMLNode(wctx, n);
> 
>  XLIFFW_EndBody(wctx);
>  XLIFFW_EndTag(wctx);
> 
>  XLIFFW_WriteEOF(wctx);
>  XLIFFW_DestroyContext(wctx);
> 
>  rctx=XLIFFR_OpenRead("test0.xliff");
>  DumpNodes(rctx);
> 
>  XLIFFR_DestroyContext(rctx);
> 
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
> 
> The list archives are at http://lists.xml.org/archives/xml-dev/
> 
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://www.oasis-open.org/mlmanage/index.php>

Follow-Ups:
- Re: hello, new to list, thoughts
  - From: "cr88192" <cr88192@h...>

References:
- RE: hello, new to list, thoughts
  - From: "David Lieberman AWDSF" <david@a...>

