Re: A plea for Sanity
Joe,

Great post.

Jonathan

----- Original Message -----
From: "Joe English" <jenglish@f...>
To: <xml-dev@l...>
Sent: Friday, April 05, 2002 11:43 AM
Subject: A plea for Sanity

>
> [ Also sent to xml-names-editor@w... ]
>
> "Namespaces in XML 1.1 Requirements" cites the ability to "undeclare"
> a namespace as the principal (only?) new needed feature, because
> of the case where:
>
> | information items [...] from another document [...] may
> | have fewer in-scope namespaces than their parent. There is
> | no mechanism for accurately serializing this situation. If
> | the infoset is naively serialized and reparsed, the children
> | will end up with additional namespace information items which
> | serve no useful purpose.
>
> I believe that this requirement is ill-considered.
>
> Under SGML and XML 1.0, applications can treat generic
> identifiers as atomic strings; with XML 1.0 + Namespaces,
> element and attribute names become compound objects consisting
> of a URI and a local name. This complicates applications a bit,
> but by itself is not an onerous burden: toolkits like SAX can
> provide namespace processors that keep track of the namespace
> environment, map GIs to {URI+localname} pairs, and throw away
> the original namespace declarations.
>
> The real complexity starts to show up in applications which
> themselves need to keep track of the namespace environment
> (e.g., XSLT). This is usually required for applications that
> need to reserialize an Infoset as XML and wish to retain
> the original namespace prefixes on output. (It gets hairier
> for markup vocabularies that include QNames in content, but that's
> a different issue.)
>
> But the new requirement implies that the *exact set of in-scope
> namespaces at each node* is an essential part of the Infoset.
> This is the part that I think is ill-considered. This property
> should be deemed inessential, just as whitespace in tags and the
> order of attribute value specifications are deemed inessential.
> XML-related specifications should not expect or demand that it be
> preserved; any set of namespace declarations that produce the same
> {URI+localname} pairs after namespace processing should be considered
> equivalent.
>
> In particular, "additional namespace information items which
> serve no useful purpose" -- and hence do not affect the interpretation
> of QNames in markup or content -- should not matter. Applications
> should be free to insert or discard them as they see fit without
> changing the meaning of the Infoset.
>
> * * *
>
> Now a plea for sanity.
>
> (This is for people who design XML vocabularies and applications;
> xml-names-editor, I know you're busy, so you can stop reading here.)
>
> There are certain practices which, if avoided, can make life
> simpler for application and toolkit developers. These are
> all legal according to the Namespaces REC, and I don't suggest
> that they be disallowed in XML 1.1, but it may be beneficial
> for individual applications to disallow them.
>
> Some definitions:
>
> Let's say that an XML document is _neurotic_ if it maps the same
> namespace prefix to two different namespace URIs at different
> points. Neurosis makes it necessary for XML processors to
> work with {URI+localname} pairs instead of GIs, and to keep
> track of the namespace environment at each point in the tree
> if there are QNames-in-content. If it weren't for neurosis,
> applications could use a single namespace map that applied to
> the entire document.
>
> Conversely, a document is _borderline_ if it maps two different
> namespace prefixes to the same namespace URI. Borderline documents
> complicate reserialization: the choice of which prefix to
> use for a particular {URI+localname} pair depends on its
> position in the tree.
>
> A document is _psychotic_ if it maps two different namespace prefixes
> to the same URI _in the same scope_.
> Psychosis presents an even
> bigger difficulty for reserialization: now applications must keep
> track of the original prefix as well as the {URI+localname} pair.
>
> A document is _normal_ (or _in namespace-normal form_) if all
> namespace declarations appear on the root element and it is
> not psychotic. (A borderline document with all namespace
> declarations in the same place is automatically psychotic;
> a neurotic document with this property would be illegal according
> to the Namespaces REC.)
>
> Normal documents are the easiest to process: the application can
> determine the global namespace environment at the beginning of the
> parse, and can use it throughout processing.
>
> It's not always possible to produce normal documents -- the producer
> might not know all of the relevant namespaces at the time it emits
> the root element start-tag -- so a weaker definition is useful:
> A document is _sane_ if it is neither neurotic nor borderline.
>
> Document producers should be designed to emit sane documents.
>
> This is not hard to do -- the serializer just needs to maintain
> a monotonic, bijective URI/prefix map and reuse the same prefix
> whenever a namespace URI leaves and comes back into scope.
> ("Bijective": there is precisely one URI for each prefix and
> one prefix for each URI; by "monotonic" I mean that prefix+URI
> pairs may be added to the map but not removed.)
>
> A sane document can be transformed into a normal document simply
> by moving all namespace declarations to the root element and
> filtering out duplicates. (This can't be done in streaming
> mode, but it might be an appropriate technique for XML databases.)
>
> Now general-purpose XML consumers cannot expect to receive sane
> documents. However *special-purpose* consumers, designed to work
> with specific markup vocabularies, can be a lot simpler if the
> markup vocabulary includes namespace sanity as a requirement.
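
The "monotonic, bijective URI/prefix map" described above might be sketched roughly as follows. This is only an illustration under stated assumptions: the class name SanePrefixMap, its methods, and the example.org URIs are hypothetical and do not come from the post or from any particular toolkit.

```python
class SanePrefixMap:
    """Hypothetical serializer-side map: one prefix per URI, one URI per prefix.

    Entries are only ever added, never removed (monotonic), so a URI that
    leaves scope and comes back is given the same prefix again.
    """

    def __init__(self):
        self._prefix_for = {}  # namespace URI -> prefix
        self._uri_for = {}     # prefix -> namespace URI

    def prefix_for(self, uri, preferred="ns"):
        # Reuse the prefix already bound to this URI, if there is one.
        if uri in self._prefix_for:
            return self._prefix_for[uri]
        # Otherwise choose a prefix that is not bound to any other URI.
        candidate, n = preferred, 0
        while candidate in self._uri_for:
            n += 1
            candidate = f"{preferred}{n}"
        self._prefix_for[uri] = candidate
        self._uri_for[candidate] = uri
        return candidate

    def declarations(self):
        """Return a copy of the prefix -> URI bindings issued so far."""
        return dict(self._uri_for)


prefixes = SanePrefixMap()
first = prefixes.prefix_for("http://example.org/a", "x")
second = prefixes.prefix_for("http://example.org/b", "x")  # "x" is taken
again = prefixes.prefix_for("http://example.org/a", "y")   # reuses first binding
```

Because bindings are never dropped, the result of declarations() at the end of serialization is also the set of declarations that could be hoisted to the root element to obtain a normal document in the sense defined above.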
>
> As an application developer, I'd prefer not to have to worry
> about namespace nodes or {URI+localname} pairs. I'd rather be
> able to give the parser an internal namespace map describing
> all the namespace URIs I'm interested in, and have the parser
> translate QNames in markup to use my prefixes. Then the application
> can work with GIs instead of {URI+localname} pairs. If the source
> document is sane, then it's possible to preserve the original prefixes
> on reserialization simply by remembering the original namespace map;
> it's not necessary to keep track of namespace nodes during processing.
>
> QNames in content are a lot easier to process in a sane document.
> Sanity guarantees that a given QName means the same thing wherever
> it appears. Any future markup vocabulary which uses QNames in content
> should include sanity as an application requirement.
>
> A requirement for sanity shifts part of the burden onto document
> producers, where it's easy to handle. The alternative is maddening
> complexity for document consumers.
>
>
> --Joe English
>
>   jenglish@f...
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://lists.xml.org/ob/adm.pl>
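
The consumer-side idea in the post (hand the parser an application namespace map, work with prefixed names, and remember the document's own map for reserialization) could look something like the sketch below. It is not the interface the post has in mind: it uses Python's standard xml.etree.ElementTree, and APP_PREFIXES, to_app_name, app_names, original_prefix_map and the example.org URIs are hypothetical names chosen for illustration. It also assumes the input is sane; for a neurotic document the remembered map would silently keep only the last binding of each prefix.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical application map: namespace URI -> the prefix the application uses.
APP_PREFIXES = {
    "http://example.org/doc": "doc",
    "http://example.org/meta": "meta",
}


def to_app_name(clark_name, app_prefixes=APP_PREFIXES):
    """Turn ElementTree's '{uri}local' form into 'prefix:local' using our map."""
    if not clark_name.startswith("{"):
        return clark_name  # name is not in any namespace
    uri, local = clark_name[1:].split("}", 1)
    return f"{app_prefixes[uri]}:{local}"  # KeyError for a URI we do not know


def app_names(xml_text):
    """List element names in document order, rewritten to application prefixes."""
    root = ET.fromstring(xml_text)
    return [to_app_name(el.tag) for el in root.iter()]


def original_prefix_map(xml_text):
    """Collect the document's own prefix -> URI declarations (assumes sanity)."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    events = ET.iterparse(source, events=("start-ns",))
    return {prefix: uri for _event, (prefix, uri) in events}


SOURCE = (
    '<a:report xmlns:a="http://example.org/doc" xmlns:b="http://example.org/meta">'
    "<b:author/><a:body/>"
    "</a:report>"
)
names = app_names(SOURCE)
remembered = original_prefix_map(SOURCE)
```

Here the application sees only its own prefixes (doc, meta), while the remembered map still records that the source used a and b, which is all a sane document needs for its original prefixes to be restored on output.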