
RE: "Introducing MicroXML, Part 1: Explore the basic principles

  • From: David Lee <dlee@calldei.com>
  • To: Uche Ogbuji <uche@ogbuji.net>
  • Date: Sun, 15 Jul 2012 14:42:26 +0000



=====================  Uche says
Sure, without a separator you would simply have a closing document tag switch the serial docs parser to a state of looking for a new start tag, DTDecl, PI or end of stream, but I think an explicit separator would reduce the cases where what we would now think of as malformedness from user error winds up looking like an intentional sequence of two or more documents.
========

See my cross-posted reference to
http://xml.calldei.com/XDMSerialize

I had an "Aha" moment last week when I realized that the UTF-8 BOM could serve as such a separator.
(I haven't updated the above page to reflect this.)

I stumbled on this because I had, of all things, a concatenation of a bunch of JSON documents in UTF-8
(in this case Twitter output). Each document had a UTF-8 BOM at its beginning, but they were all in the same file.
I opened it in my favorite JSON reader app and voila! It opened just fine, but only showed the first JSON document.
Then I realized that a use case I wanted for XDM Serialize is that a sequence of 1 takes the same format as just 1.
Thus a single XML document (or any XDM value) would have the same serialized form as a sequence of 1 document.
This is somewhat tricky in conjunction with some other use cases, such as: the concatenation of 2 documents should produce
a sequence of 2 documents.
Then I realized that if I used the BOM as a separator it might actually work, and plain XML parsers could read the degenerate case of 1 document.
If every document started like
BOM <data>
BOM <data>

Then by themselves they are valid XML documents.
If you concatenate them they become
BOM <data> BOM <data>

which an XDM Serialize capable parser could parse, and in some cases "dumb" parsers might just see this as 1 document and stop.
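
To make the byte layout concrete, here is a minimal Python sketch of the idea. The helper name `serialize` and the sample `<data>` contents are made up for illustration; nothing here is part of XDM Serialize itself. It only shows that a BOM-prefixed document parses on its own with a stock parser, and that a sequence is plain byte concatenation.

```python
import codecs
import xml.etree.ElementTree as ET

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def serialize(doc_text: str) -> bytes:
    """Hypothetical helper: encode one document as UTF-8 with a leading BOM."""
    return BOM + doc_text.encode("utf-8")


first = serialize("<data>1</data>")
second = serialize("<data>2</data>")

# On its own, each piece is an ordinary XML document that starts with a BOM,
# so a plain XML parser can read it.
root = ET.fromstring(first)
print(root.tag, root.text)

# A sequence of two documents is just the two byte strings joined,
# with no extra separator bytes added.
stream = first + second
print(stream)
```

How a parser that is not sequence-aware reacts to `stream` depends on the parser; some may stop after the first document and others may report an error.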

This also means you can concatenate arbitrary sequences of 0 or more documents without inspecting them and without adding extra bytes.
And splitting or counting document sequences requires only knowing how to scan for BOM sequences.
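
A rough sketch of that splitting and counting, again in Python. The function names `split_documents` and `count_documents` are invented for this example. It assumes the BOM bytes never occur inside a document's content, which is not guaranteed, since U+FEFF is a legal character in XML text; a real implementation would need a rule for that case.

```python
import codecs
from typing import List

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def split_documents(stream: bytes) -> List[bytes]:
    """Split a byte stream into documents, treating each BOM as a document start.

    Assumption: the BOM bytes do not appear inside any document's content.
    """
    if not stream:
        return []  # the empty sequence
    head, *rest = stream.split(BOM)
    # Bytes before the first BOM (if any) are kept as a document without a BOM.
    docs = [head] if head else []
    # Put the BOM back so each piece is still a standalone document.
    docs.extend(BOM + part for part in rest)
    return docs


def count_documents(stream: bytes) -> int:
    """Count documents in the stream without parsing any of them."""
    return len(split_documents(stream))


seq_a = BOM + b"<data>1</data>" + BOM + b"<data>2</data>"
seq_b = BOM + b"<data>3</data>"
empty = b""

# Concatenating sequences (including the empty one) needs no inspection.
combined = seq_a + empty + seq_b
print(count_documents(combined))
for doc in split_documents(combined):
    print(doc)
```

With this layout, splitting the concatenation of two sequences gives the same documents as splitting each sequence and joining the results.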


It's still a bit of a misuse, but I am intrigued.



----------------------------------------
David A. Lee
dlee@calldei.com
http://www.xmlsh.org





