Suggestion for an alternative XML 1.1
I am preparing an alternative proposal for XML 1.1, and I would appreciate any help from sympathetic people on this list. I think it is more productive to have concrete alternative proposals rather than merely raising issues.

The basic idea, following the idea attributed to James Clark, is that we may as well put in some kind of layer to bring out character issues in XML. Actually, I take the reverse idea: we pull out character issues, to make a lightweight version of XML. Where the current draft is very wrong is that it throws out the naming rules entirely, rather than shifting them to where they are appropriate: as part of validation.

I suggest something along these lines:

1) It is called XML 1.1, if needed.

2) It converts NEL on input to #xA.

3) No changes to XML whitespace rules.

4) The definitions for WF and valid XML are altered:

   i) WF XML is simplified: the same as current WF, except that naming rules are not used to parse the data; instead, delimiters and whitespace are used (a sketch of such a scanner follows below). The data NEED NOT be normalized or checked for normalization.* Encoding errors SHOULD cause failure. Name errors NEED NOT be reported, except for the presence of control characters (as in the current Blueberry draft).

   ii) Valid XML is made stricter and future-proofed: the same as current validity, except that normalization must be performed before comparing identifiers (see the second sketch below). The current naming list should be made advisory, and a formula for creating the specific list from the Unicode 3.* identifier properties should be drawn up: this way the XML 1.1 spec formally tracks Unicode. The spec should also mention that, because the libraries on a particular system may still be on a previous version of Unicode, use of characters introduced into Unicode in the previous two or three years (i.e. in Unicode 3.1) as markup is deprecated as unsafe. So after two or three years, when libraries are presumed to have been updated, those novel characters are automatically undeprecated, and the Unicode Consortium can keep on upgrading Unicode 3.*. (I would say that if Unicode wants a 4.0 sometime, that would indicate some major change or consolidation requiring special attention, such as errata.) Encoding errors MUST cause failure. Incoming data MUST be checked for normalization, or (preferably) normalized.*

I think moving this way would:

1) Provide the least disruptive way to satisfy the requirement for NEL.

2) Track Unicode changes in a rational way, allowing use of the characters for people in controlled or regional environments (e.g. Japan), while spelling out the risks clearly and promoting a timetable for Unicode upgrades to be deployed.

3) Simplify XML by not needing character tables or Unicode libraries for WF-checking.

Obviously the current XML WG is keen on simplifying things that don't affect them (which may be taken as an accusation as much as an observation) or they wouldn't have made their current proposal, and the changes I suggest would make a real difference in parsing rates, especially for non-ASCII names.**

One wrinkle that should be addressed is that we could then have documents with no DTD that are "valid" or "invalid" (because of name-checking). If this is anomalous, then a three-layer model could be introduced instead: "well-formed", "strictly well-formed" and "valid". The strictly well-formed layer would correspond pretty much to current WF, and WF would be the lightweight WF I suggest above. Strict WF is also a convenient slot for namespace naming rules in an XML 2.0. XML Schemas etc. should specify that they require an infoset from a "strictly WF" document.
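To make the lightweight WF layer concrete, here is a rough sketch in Python. The language choice and every name in it are mine, purely for illustration; it ignores comments, PIs, CDATA and DTDs, so it shows the scanning technique, not a conforming parser.

    # Sketch of the lightweight WF layer: NEL is folded to #xA on input,
    # and names are found purely by delimiters and whitespace -- no
    # name-character tables, no Unicode library.

    NEL = "\u0085"  # the C1 "next line" control some mainframe transcoders emit

    def fold_newlines(text):
        """Fold CRLF, CR and NEL to LF (#xA) before any other processing."""
        return text.replace("\r\n", "\n").replace("\r", "\n").replace(NEL, "\n")

    def scan_names(text):
        """Yield tag names located by delimiter/whitespace scanning alone.

        A name is simply the run of characters after '<' or '</', up to
        the first whitespace, '/' or '>'.  No table lookup at all.
        """
        i = 0
        while True:
            i = text.find("<", i)
            if i == -1:
                return
            i += 1
            if i < len(text) and text[i] == "/":
                i += 1
            start = i
            while i < len(text) and text[i] not in " \t\n/>":
                i += 1
            if i > start:
                yield text[start:i]

    doc = fold_newlines("<doc>\u0085<item id='1'>text</item></doc>")
    print(list(scan_names(doc)))   # ['doc', 'item', 'item', 'doc']

Note that nothing here depends on which version of Unicode the system knows about: that is the whole point.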
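And a second sketch, for the stricter validation layer. I use Python's unicodedata module as a stand-in for whatever Unicode library a system provides, and deriving name characters from the general-category properties is just my illustration of "a formula using the Unicode identifier properties", not the exact formula the spec would pin down.

    import unicodedata

    # General categories standing in for the Unicode identifier properties;
    # this particular choice is illustrative, not the formula itself.
    NAME_START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
    NAME_REST = NAME_START | {"Mn", "Mc", "Nd", "Pc"}

    def is_valid_name(name):
        """Property-derived name check, applied at validation time only."""
        if not name:
            return False
        if name[0] not in "_:" and unicodedata.category(name[0]) not in NAME_START:
            return False
        return all(c in "_:.-" or unicodedata.category(c) in NAME_REST
                   for c in name[1:])

    def names_match(a, b):
        """Compare identifiers only after normalizing both to NFC."""
        return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

    # One name, two encodings: precomposed e-acute vs. e + combining acute.
    print(names_match("r\u00e9sum\u00e9", "re\u0301sume\u0301"))  # True
    print(is_valid_name("caf\u00e9"))  # True
    print(is_valid_name("1bad"))       # False: a digit cannot start a name

Notice where the costs land: is_valid_name and names_match need a Unicode library whose version matters, while the WF scanner above needs neither. That is exactly the refactoring I am arguing for.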
Cheers
Rick Jelliffe

* The reason it should not be an error to find unnormalized data in the simple cases is that the normalization state of data coming into the parser depends on the transcoder used, if any, and is out of the control of lay programmers to repair. For validation, I cannot see why the purported security risk of allowing normalization of data coming into a generic XML parser (as distinct from a c14n-specific parser) should outweigh the advantages of normalizing incoming data.

** Why am I proposing simplification, when I am often on the ultra-conservative side? Well, it is simplification of parsing techniques that brings out something that was designed into XML 1.0: that whitespace and delimiters are all that is really needed for parsing. And we already have a mode for debugging and QA of XML: validation. Name-checking (and its vitally important side-effect, that transcoding is verified by name-checking) can be made part of validation without sacrificing much. We are not changing the language, just refactoring where the checks occur, in a way that better suits high-volume processing and small devices.