
Re: MicroXML

  • From: Amelia A Lewis <amyzing@talsever.com>
  • To: xml-dev@lists.xml.org
  • Date: Tue, 14 Dec 2010 01:00:28 -0500

On Tue, 14 Dec 2010 11:35:31 +0700, James Clark wrote:
>> How do I tell whether it's safe to use my uXML parser instead of my
>> (heavier) XML 1.0 + Namespace in XML + XML:Base + XML:ID + whatever
>> parser?
> Given that MicroXML is designed to be a subset, how could there be a
> reliable in-band mechanism to tell you?  Anything you might put in the

Well, if MicroXML hadn't ruled out the use of most of the available 
indicators, then certainly something like a PI would be possible.

> document, has to be legal XML 1.0, so it can't be a reliable indicator that
> it's MicroXML rather than XML 1.0. Similar problem with telling how to use
> MicroXML rather than HTML.  I don't think this is any different from
> problems we have today. How do you choose between an HTML, XML or an SGML

<html>  Not in a namespace?  It's HTML.  In the XHTML namespace?  
XHTML.  Not <html>?  XML.  There's potential confusion for XML vs SGML 
if there is no XML declaration and there is a doctype declaration 
containing at least a system ID.  Hmmm.  Well, the available BNF 
suggests that the SGML declaration is not optional, either.  
See http://xml.coverpages.org/sgmlsyn/index.htm, and especially sgmlsyn.htm.
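
The root-element test above can be sketched roughly as follows.  This is 
only an illustration in Python, not anyone's actual implementation: the 
function name classify() and its return labels are made up for the 
example, it uses regular expressions rather than a real parser, and it 
does not handle SGML, prefixed root elements, or '>' inside attribute 
values.

```python
import re

XHTML_NS = "http://www.w3.org/1999/xhtml"

def classify(text):
    """Guess 'html', 'xhtml', 'xml', or 'unknown' from the root element.

    Hypothetical helper: a regex sketch, not a conforming parser.
    """
    # Drop comments, PIs (including the XML declaration), and the DOCTYPE.
    text = re.sub(r"<!--.*?-->", "", text, flags=re.S)
    text = re.sub(r"<\?.*?\?>", "", text, flags=re.S)
    text = re.sub(r"<!DOCTYPE[^\[>]*(\[.*?\])?\s*>", "", text,
                  flags=re.S | re.I)

    # The first remaining start tag is taken to be the root element.
    m = re.search(r"<([A-Za-z_][\w:.-]*)([^<>]*)>", text)
    if m is None:
        return "unknown"
    name, attrs = m.group(1), m.group(2)

    # Not <html>?  XML.  (A prefixed root such as h:html lands here too.)
    if name.lower() != "html":
        return "xml"

    # Look for a default namespace declaration on the root element.
    ns = re.search(r"""(?:^|\s)xmlns\s*=\s*(["'])(.*?)\1""", attrs, flags=re.S)
    if ns is None:
        return "html"      # <html>, not in a namespace
    if ns.group(2) == XHTML_NS:
        return "xhtml"     # <html> in the XHTML namespace
    return "xml"           # <html> in some other namespace
```

For example, classify('<!DOCTYPE html><html><body/></html>') gives 
'html', while the same root with xmlns="http://www.w3.org/1999/xhtml" 
gives 'xhtml'.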

> parser? There's no reliable in-band mechanism.  In the end, you have to rely
> on out-of-band information.

Perhaps.  I've been involved, somewhat peripherally, in SGML-related 
code: a proprietary parser/validator capable of handling XML, SGML, and 
some other things.  For performance, we used standard XML processors; 
even in 2000/2001, when it was a live product, instances of XML 
outnumbered instances of SGML encountered by a significant factor.  (In 
particular environments the reverse was true, but they didn't mind 
adding XML parsing to SGML, whereas adding SGML support to XML lost the 
value of XML, most thought.)

I don't think MicroXML reaches that standard.

I'm not concerned about distinguishing (Micro)XML from HTML--or from 
images, or from other easily recognizable file types.  The question, 
which I think is important, is how to safely use a small, fast MicroXML 
parser--rather than starting to use it, throwing away the results, and 
falling back to an XML parser.

> Nonetheless I can think of some heuristics.  I suspect it is very unusual in
> XML to have a DOCTYPE declaration with neither an internal nor an external
> subset.  Thus such a DOCTYPE declaration (regardless of the DOCTYPE name)
> could be a good indicator.

All right.  This means that MicroXML cannot be embedded, unless the 
doctype declaration is stripped (or the Root Element Type validity 
constraint is to be ignored 
(http://www.w3.org/TR/REC-xml/#vc-roottype)).  The only potential for 
confusion is for HTML5 polyglot markup; almost any other use case for 
XML is going to include either a system ID with URI or a public ID 
(with FPI and URI).
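
A minimal sketch of that DOCTYPE heuristic, combined with the "use the 
more liberal format" fallback, might look like the following.  The names 
doctype_kind() and suggest_parser() and the labels they return are 
hypothetical, and the regex is an approximation: it is case-sensitive 
(as XML is) and does not skip comments or CDATA sections that happen to 
contain the string "<!DOCTYPE".

```python
import re

# Name, then anything before '[' or '>', then an optional '[' that
# would open an internal subset.
_DOCTYPE = re.compile(
    r"<!DOCTYPE\s+(?P<name>[^\s\[>]+)\s*(?P<rest>[^\[>]*)(?P<internal>\[)?"
)

def doctype_kind(text):
    """Return 'absent', 'bare', or 'has-subset' (hypothetical labels)."""
    m = _DOCTYPE.search(text)
    if m is None:
        return "absent"
    # A SYSTEM/PUBLIC identifier or an internal subset means a DTD is in play.
    if m.group("internal") or m.group("rest").strip():
        return "has-subset"
    return "bare"

def suggest_parser(text):
    """Pick the small parser only when the bare-DOCTYPE hint is present.

    Without that hint (and without out-of-band information), fall back
    to the more liberal format, i.e. full XML.
    """
    return "microxml" if doctype_kind(text) == "bare" else "xml"
```

Note that '<!DOCTYPE html><html/>' also comes out as 'bare', which is 
exactly the HTML5 polyglot confusion mentioned above.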

> I think the general policy has to be that if you don't have out of band
> information, then use the more liberal format (ie XML or HTML5 rather than
> MicroXML).

Oops.  Is MicroXML actually attractive enough to see significant takeup 
if the recommendation is that safe parsing in the absence of an 
out-of-band indicator is to use something else?

Ah, well.  It appears that this proposal is targeted 
primarily-nearly-exclusively toward bridging XML with HTML5?  Is that a 
fair characterization?  If so, I'll slide off and stop being annoying 
(I don't have any interest worth mentioning in the behavior of 

I'd like to see a 'next generation'.  I'm starting to wonder if we 
haven't got at least a couple or three different use cases:

a) the confluence-with-the-browser case, where JSON and HTML5 are going 
to be mentioned and targeted, where removing namespaces is accepted as 
a near-given, but extensibility doesn't seem particularly important;

b) the xml-over-the-network case (including exempli gratia SOAP, but 
also less RPC-ish document/resource interactions), where the doctype 
decl has long been forbidden, and namespace improvements would be grand 
but no one can afford to throw out the distributed-authority baby with 
the prefix-mapping bathwater, and 'typing' (fsvo 'typing') is liable to 
be an issue;

c) the document/store case (likely including databases), where the 
entire prolog causes discomfort, and namespace simplification is 
regarded as unattainable utopianism, again with at least a part of the 
crowd concerned with 'typing'; this case may also include those doing 
extensive pipeline processing (or perhaps that's yet another use case).

I'm probably not outlining the groups well.  What all have in common, I 
think: elements good, attributes good; things that aren't elements or 
attributes bad (comments seem to be tolerated better than PIs or 
anything from the prolog).  I don't know that the browser-case 
antipathy to extensibility via a distributed authority can be 
reconciled with the much less drastic desire to address the various 
(and variously interpreted or understood) shortcomings of the 
namespaces specification for establishing a distributed authority for 

Hmmmmm.  *shrug*

Amelia A. Lewis                    amyzing {at} talsever.com
Yankees are compelled by some mysterious force to imitate Southern 
accents and they're so damn dumb they don't know the difference between
a Tennessee drawl and a Charleston clip.
                -- Rita Mae Brown, "Rubyfruit Jungle"
