Re: XML 2 so far
Most of the problems with this come from malformed RSS documents. I'd argue that preprocessing such files with a regex filter would make not just this but a number of other issues with XML go away.
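As a rough sketch of what such a regex pre-filter could look like: the problem class is not pinned down here, so this assumes one common RSS fault, unescaped ampersands in text content. The function name is hypothetical, and the filter is deliberately naive (it would also rewrite ampersands inside CDATA sections).

```python
import re

# Assumption: the feed's main fault is bare "&" characters that are not
# part of an entity or character reference (e.g. "Q&A" in a <title>).
_BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)

def escape_bare_ampersands(text: str) -> str:
    """Replace each bare '&' with '&amp;', leaving existing references alone."""
    return _BARE_AMP.sub("&amp;", text)
```

Run over the raw feed text before it reaches the parser, this turns `Q&A` into `Q&amp;A` while leaving `&amp;` and `&#38;` untouched.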
Much of the difficulty here comes from legacy code; there's a lot of XML that was encoded as ISO-8859-1 early on that's still in the system. Agreed, I would like to see UTF-8 become the standard.
Agreed - entirely too much of my career has been spent recoding HTML character entities to their numeric equivalents. The entity tables are well defined and would not take up a significant amount of memory or processing time on today's systems. There is some interesting work that was done in XSLT 2.0 on character encodings and mappings that should also be pushed into the parser.
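Assuming the recoding described above means turning named HTML entities (such as `&nbsp;`) into numeric character references that any XML parser accepts, a minimal sketch follows. It uses the HTML 4 entity table shipped in Python's standard library; the function name is hypothetical.

```python
import re
from html.entities import name2codepoint

# The entities XML predefines itself; these are left as written.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def named_to_numeric(text: str) -> str:
    """Rewrite HTML named entities as decimal character references."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        # Unknown names are left alone rather than guessed at.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return f"&#{name2codepoint[name]};"
    return _ENTITY.sub(replace, text)
```

The whole table is a few hundred entries held in one dictionary, which is consistent with the point that the memory and processing cost is negligible.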
Agreed. Internal subset processing introduces semantics and complexity that would be better handled via a transformation process or some other formal processing tool post facto.
This is one area where I'd be inclined to disagree. I think that there is a technical need, though not necessarily one that shows up in HTML. The primary use case I see here comes in query operations; most queries return multiple nodes of content (thinking XML databases here), with the enclosing node added primarily because XML currently does not support multiple top-level nodes (this is akin to retrieving a JSON array).
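To illustrate the workaround in question, here is a small sketch using Python's standard `xml.etree.ElementTree`. A multi-node query result is not well-formed on its own, so an enclosing element has to be invented before the parser will accept it. The function name and the `results` wrapper name are hypothetical, and the sketch assumes the fragment carries no XML declaration.

```python
import xml.etree.ElementTree as ET

def parse_fragment(fragment: str, wrapper: str = "results") -> list:
    """Parse a sequence of sibling elements by adding a throwaway root.

    The wrapper exists only to satisfy the single-root rule; the caller
    gets back the list of nodes, much like a JSON array.
    """
    root = ET.fromstring(f"<{wrapper}>{fragment}</{wrapper}>")
    return list(root)

nodes = parse_fragment("<item>a</item><item>b</item>")
texts = [node.text for node in nodes]
```

Passing the same two-item fragment straight to `ET.fromstring` raises `ET.ParseError`, which is the restriction the enclosing node works around.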
I've long referred to the lax syntax argument as the "Grandmother Argument" - that my grandmother should be able to write invalid (fill in the blank language) and the system should be able to handle this laxness. It's a weak argument in HTML, if only because I believe that the amount of HTML being written by hand is a small (and, more importantly, shrinking) percentage of the overall production of HTML as more and more of it gets produced by automated mechanisms. It's a terrible argument in XML, in great part because the only way you can derive even marginal semantics is by incorporating an XSD or similar type definition language, and the ability to introduce mechanisms to compensate for such laxness assumes a greater degree of competency in schema design than I've seen evinced in most XSD developers.
What this does imply is that if lax XML is permitted, there needs to be some way of defining in the schemas how such laxness is handled. This would be analogous to saying that a P tag would be lax (would resolve with no terminating tag) if one of a given set of opening tags were encountered (<P>, <DIV>, <Hn>, etc.). I don't necessarily see this as being a bad solution, but it would put more onus on the schema developer and would require rethinking XSDs in particular. Other areas, such as <b><i> inversions (</b></i>), might also require such a set of rules.
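A rough sketch of what schema-declared laxness rules might amount to in practice: a table stating which opening tags implicitly close an open element, plus a stack that re-nests inverted closing tags. Everything here (the `IMPLICIT_CLOSE` table, the `repair` function, the regex tokenizer) is a hypothetical illustration, not part of XSD or any existing parser.

```python
import re

# Hypothetical rule table, as a schema might declare it:
# element name -> opening tags that implicitly terminate it.
IMPLICIT_CLOSE = {
    "p": {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6"},
}

# Naive tokenizer: a tag (group 2 = name) or a run of text.
_TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*>|[^<]+")

def repair(markup: str) -> str:
    out, stack = [], []
    for m in _TOKEN.finditer(markup):
        name = m.group(2)
        if name is None:                     # plain text
            out.append(m.group(0))
            continue
        name = name.lower()
        if m.group(1):                       # closing tag
            if name in stack:
                # Close everything opened after the match, then the match
                # itself; this re-nests <b><i>...</b></i> inversions.
                while True:
                    top = stack.pop()
                    out.append(f"</{top}>")
                    if top == name:
                        break
            continue                         # stray closers are dropped
        # Opening tag: apply the implicit-close rules first.
        while stack and name in IMPLICIT_CLOSE.get(stack[-1], ()):
            out.append(f"</{stack.pop()}>")
        out.append(m.group(0))
        if not m.group(0).endswith("/>"):    # self-closing tags stay off the stack
            stack.append(name)
    while stack:                             # close whatever is still open
        out.append(f"</{stack.pop()}>")
    return "".join(out)
```

With this table, `repair("<div><p>one<p>two</div>")` yields `<div><p>one</p><p>two</p></div>`, and `repair("<b><i>x</b></i>")` yields `<b><i>x</i></b>`. Even this toy version shows where the onus lands: every lax element needs its own carefully chosen rule set.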
The question here is whether this benefits any language other than HTML?