Re: A plea for Sanity

To: <xml-dev@l...>,"Joe English" <jenglish@f...>
Subject: Re: A plea for Sanity
From: "Jonathan Borden" <jborden@a...>
Date: Sat, 6 Apr 2002 11:25:02 -0500
References: <200204051643.g35Gh4s10393@d...>

Joe,

Great post.

Jonathan

----- Original Message ----- 
From: "Joe English" <jenglish@f...>
To: <xml-dev@l...>
Sent: Friday, April 05, 2002 11:43 AM
Subject:  A plea for Sanity


> 
> [ Also sent to xml-names-editor@w... ]
> 
> "Namespaces in XML 1.1 Requirements" cites the ability to "undeclare"
> a namespace as the principal (only?) new needed feature, because
> of the case where:
> 
> | information items [...] from another document [...] may
> | have fewer in-scope namespaces than their parent.  There is
> | no mechanism for accurately serializing this situation. If
> | the infoset is naively serialized and reparsed, the children
> | will end up with additional namespace information items which
> | serve no useful purpose.
> 
> I believe that this requirement is ill-considered.
> 
> Under SGML and XML 1.0, applications can treat generic
> identifiers as atomic strings; with XML 1.0 + Namespaces,
> element and attribute names become compound objects consisting
> of a URI and a local name.  This complicates applications a bit,
> but by itself is not an onerous burden: toolkits like SAX can
> provide namespace processors that keep track of the namespace
> environment, map GIs to {URI+localname} pairs, and throw away
> the original namespace declarations.
> 
> The real complexity starts to show up in applications which
> themselves need to keep track of the namespace environment
> (e.g., XSLT).  This is usually required for applications that
> need to reserialize an Infoset as XML and wish to retain
> the original namespace prefixes on output.  (It gets hairier
> for markup vocabularies that include QNames in content, but that's
> a different issue.)
> 
> But the new requirement implies that the *exact set of in-scope
> namespaces at each node* is an essential part of the Infoset.
> This is the part that I think is ill-considered.  This property
> should be deemed inessential, just as whitespace in tags and the
> order of attribute value specifications are deemed inessential.
> XML-related specifications should not expect or demand that it be 
> preserved; any set of namespace declarations that produces the same 
> {URI+localname} pairs after namespace processing should be considered 
> equivalent.
> 
> In particular, "additional namespace information items which
> serve no useful purpose" -- and hence do not affect the interpretation
> of QNames in markup or content -- should not matter.  Applications
> should be free to insert or discard them as they see fit without
> changing the meaning of the Infoset.
> 
>  * * *
> 
> Now a plea for sanity.
> 
> (This is for people who design XML vocabularies and applications;
> xml-names-editor, I know you're busy, so you can stop reading here.)
> 
> There are certain practices which, if avoided, can make life
> simpler for application and toolkit developers.  These are
> all legal according to the Namespaces REC, and I don't suggest
> that they be disallowed in XML 1.1, but it may be beneficial
> for individual applications to disallow them.
> 
> Some definitions:
> 
> Let's say that an XML document is _neurotic_ if it maps the same
> namespace prefix to two different namespace URIs at different
> points.  Neurosis makes it necessary for XML processors to
> work with {URI+localname} pairs instead of GIs, and to keep
> track of the namespace environment at each point in the tree
> if there are QNames-in-content.  If it weren't for neurosis,
> applications could use a single namespace map that applied to
> the entire document.
> 
> Conversely, a document is _borderline_ if it maps two different
> namespace prefixes to the same namespace URI.  Borderline documents
> complicate reserialization: the choice of which prefix to
> use for a particular {URI+localname} pair depends on its
> position in the tree.
> 
> A document is _psychotic_ if it maps two different namespace prefixes
> to the same URI _in the same scope_.  Psychosis presents an even
> bigger difficulty for reserialization: now applications must keep
> track of the original prefix as well as the {URI+localname} pair.
> 
> A document is _normal_ (or _in namespace-normal form_) if all
> namespace declarations appear on the root element and it is
> not psychotic.  (A borderline document with all namespace 
> declarations in the same place is automatically psychotic;
> a neurotic document with this property would be illegal according
> to the Namespaces REC.)
> 
> Normal documents are the easiest to process: the application can
> determine the global namespace environment at the beginning of the
> parse, and can use it throughout processing.
> 
> It's not always possible to produce normal documents -- the producer
> might not know all of the relevant namespaces at the time it emits
> the root element start-tag -- so a weaker definition is useful:
> A document is _sane_ if it is neither neurotic nor borderline.
> 
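The definitions above can be checked mechanically. The following is a minimal sketch, not part of the original post, assuming Python's standard `xml.etree.ElementTree`; the function name `classify` is hypothetical. It gathers every prefix/URI declaration in a document and reports whether the document is neurotic, borderline, or sane. It does not distinguish psychotic documents, since that would require tracking scopes.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def classify(xml_text):
    """Report namespace 'sanity' for a document (hypothetical helper)."""
    uris_by_prefix = defaultdict(set)   # prefix -> every URI it is bound to
    prefixes_by_uri = defaultdict(set)  # URI -> every prefix bound to it

    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events fire once per namespace declaration, in document
    # order; the default namespace is reported with the prefix "".
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        uris_by_prefix[prefix].add(uri)
        prefixes_by_uri[uri].add(prefix)

    neurotic = any(len(uris) > 1 for uris in uris_by_prefix.values())
    borderline = any(len(ps) > 1 for ps in prefixes_by_uri.values())
    return {
        "neurotic": neurotic,
        "borderline": borderline,
        "sane": not (neurotic or borderline),
    }
```

For example, a document that binds `p` to `urn:x` on the root and rebinds `p` to `urn:y` on a child would be reported as neurotic.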
> Document producers should be designed to emit sane documents.
> 
> This is not hard to do -- the serializer just needs to maintain
> a monotonic, bijective URI/prefix map and reuse the same prefix
> whenever a namespace URI leaves and comes back into scope.
> ("Bijective": there is precisely one URI for each prefix and
> one prefix for each URI; by "monotonic" I mean that prefix+URI
> pairs may be added to the map but not removed.)
> 
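One way such a map might look is sketched below. This is an illustration only, not code from the original post; the class name `PrefixMap`, the method `prefix_for`, and the numbering scheme for resolving prefix clashes are all assumptions.

```python
class PrefixMap:
    """Monotonic, bijective prefix <-> URI map (hypothetical helper).

    Entries are only ever added, never removed or rebound, so a URI that
    leaves scope and later comes back always gets its original prefix.
    """

    def __init__(self):
        self._uri_by_prefix = {}
        self._prefix_by_uri = {}

    def prefix_for(self, uri, preferred="ns"):
        # Reuse the existing prefix if this URI has been seen before.
        if uri in self._prefix_by_uri:
            return self._prefix_by_uri[uri]
        # Otherwise pick a prefix that is not yet bound to any other URI.
        candidate = preferred
        counter = 0
        while candidate in self._uri_by_prefix:
            counter += 1
            candidate = f"{preferred}{counter}"
        self._uri_by_prefix[candidate] = uri
        self._prefix_by_uri[uri] = candidate
        return candidate
```

A serializer would call `prefix_for` each time it needs to write a name in a given namespace, emitting a declaration only when the returned prefix is not currently in scope.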
> A sane document can be transformed into a normal document simply
> by moving all namespace declarations to the root element and
> filtering out duplicates.  (This can't be done in streaming
> mode, but it might be an appropriate technique for XML databases.)
> 
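The transformation described above can be sketched with `xml.dom.minidom` from the Python standard library, which keeps namespace declarations as ordinary `xmlns` attributes. The function name `normalize` is hypothetical, and the sketch assumes the input is already sane; it only rejects the case where one prefix is bound to two different URIs.

```python
from xml.dom import minidom


def normalize(xml_text):
    """Hoist all namespace declarations to the root (hypothetical helper)."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    declarations = {}  # attribute name ("xmlns" or "xmlns:p") -> URI

    # Visit the root and every descendant element in document order.
    for element in [root] + list(root.getElementsByTagName("*")):
        for name in list(element.attributes.keys()):
            if name != "xmlns" and not name.startswith("xmlns:"):
                continue
            uri = element.getAttribute(name)
            if declarations.setdefault(name, uri) != uri:
                raise ValueError(f"{name} is bound to more than one URI")
            if element is not root:
                element.removeAttribute(name)  # drop the local declaration

    # Re-declare everything once, on the root element.
    for name, uri in declarations.items():
        root.setAttribute(name, uri)
    return root.toxml()
```

Because the whole tree is loaded before anything is written, this matches the remark that the technique does not work in streaming mode.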
> Now, general-purpose XML consumers cannot expect to receive sane
> documents.  However, *special-purpose* consumers, designed to work
> with specific markup vocabularies, can be a lot simpler if the
> markup vocabulary includes namespace sanity as a requirement.
> 
> As an application developer, I'd prefer not to have to worry
> about namespace nodes or {URI+localname} pairs.  I'd rather be
> able to give the parser an internal namespace map describing
> all the namespace URIs I'm interested in, and have the parser
> translate QNames in markup to use my prefixes.  Then the application
> can work with GIs instead of {URI+localname} pairs.  If the source
> document is sane, then it's possible to preserve the original prefixes
> on reserialization simply by remembering the original namespace map;
> it's not necessary to keep track of namespace nodes during processing.
> 
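The idea of translating names to the application's own prefixes can be approximated on top of an existing parser. The sketch below is an illustration only, assuming `xml.etree.ElementTree`, which reports element names as `{URI}localname`; the function `to_gi` and the map `APP_PREFIXES` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Application-side map: namespace URI -> the prefix the application wants.
# The URI and prefix here are made up for illustration.
APP_PREFIXES = {"urn:example:app": "app"}


def to_gi(tag, prefix_by_uri):
    """Turn an ElementTree '{URI}local' name into 'prefix:local'."""
    if not tag.startswith("{"):
        return tag  # name is in no namespace
    uri, local = tag[1:].split("}", 1)
    prefix = prefix_by_uri[uri]  # KeyError for URIs the application ignores
    return f"{prefix}:{local}" if prefix else local


root = ET.fromstring('<x:a xmlns:x="urn:example:app"><x:b/></x:a>')
names = [to_gi(element.tag, APP_PREFIXES) for element in root.iter()]
```

Here the source document uses the prefix `x`, but the application sees `app:a` and `app:b`, whatever prefix the producer chose.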
> QNames in content are a lot easier to process in a sane document.
> Sanity guarantees that a given QName means the same thing wherever
> it appears.  Any future markup vocabulary which uses QNames in content
> should include sanity as an application requirement.
> 
> A requirement for sanity shifts part of the burden onto document
> producers, where it's easy to handle.  The alternative is maddening
> complexity for document consumers.
> 
> 
> --Joe English
> 
>   jenglish@f...
> 
