Re: UTF-8+names
From: "Tim Bray" <tbray@t...>

> Miles Sabin wrote:
> > <?xml encoding&=;"UTF-8+names"?>
> >
> > How would that get along with Appendix F style encoding detection?
>
> Since there is no replacement named '&=;', this would be passed to the
> XML processor exactly as you see it there. The XML processor would
> (correctly) throw it on the floor, because it's not well-formed.
> Where's the problem? -Tim

1. Bad example, but the underlying issue is real. Specifically, the spec should ensure, and state, that no character in UTF-8+names can validly appear in an XML declaration, and thus that none can interfere with encoding detection. I mean explicitly, not just that the list doesn't happen to include any of them at this stage of development.

2. You complained about not hearing from the constituencies whose problems this proposal was intended to address. As an XML editor provider, I'm in that group once removed. I hear their requirements in this area, which usually surface as questions about how to use DTD entities along with their favorite non-DTD schema language (and frustration when they find they often can't, because XML does not require non-validating parsers to process entity declarations in an external DTD). If they could do this, I doubt you would ever hear from this constituency again. Conversely, the core requirement of this constituency that UTF-8+names doesn't address is user-defined entities, so if you do UTF-8+names instead, the constituency will still be unsatisfied.

3. The proposal is certainly a pain for an editor provider. It effectively introduces an extra stage of processing between steps (b) and (c) of: (a) detect the encoding, (b) translate the encoding to Unicode, (c) parse the XML. An editor must make this extra stage optional, because many users will prefer not to have the names translated, so that they can see the "entities" as entities and edit them as such.
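The processing order described in point 3, together with the guarantee asked for in point 1, can be sketched in Python. This is a minimal, non-authoritative sketch under stated assumptions: the three-entry `NAMES` table is a hypothetical stand-in for the real UTF-8+names list, `detect_encoding` covers only a small subset of Appendix F (byte order marks plus the encoding pseudo-attribute), and every function name here is invented for illustration. It shows where the extra expansion stage sits between decoding and parsing, and how an explicit check could reject any name whose replacement character could occur in an XML declaration.

```python
import codecs
import re
import string
import xml.etree.ElementTree as ET

# Hypothetical three-entry stand-in for the UTF-8+names list (assumption);
# the real list is far larger.
NAMES = {"eacute": "\u00e9", "copy": "\u00a9", "mdash": "\u2014"}

# Every character the XML 1.0 XMLDecl production can contain.
DECL_CHARS = set(string.ascii_letters + string.digits + "._-\"'=<?> \t\r\n")

NAME_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
XML_DECL = re.compile(r"\A<\?xml[^>]*\?>")


def check_table(table):
    """Point 1: fail loudly if any replacement could occur in an XML declaration."""
    bad = sorted(name for name, ch in table.items() if ch in DECL_CHARS)
    if bad:
        raise ValueError(f"names collide with XML declaration characters: {bad}")


def detect_encoding(data: bytes) -> str:
    """Stage a: a much-simplified subset of XML 1.0 Appendix F."""
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    # UTF-8 BOM or no BOM: assume ASCII-compatible and read the declaration.
    body = data[len(codecs.BOM_UTF8):] if data.startswith(codecs.BOM_UTF8) else data
    # '+' is accepted only so the proposal's label matches; XML 1.0's
    # EncName production does not actually allow it.
    m = re.match(rb"<\?xml[^>]*encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._+-]*)[\"']", body)
    return m.group(1).decode("ascii").lower() if m else "utf-8"


def decode(data: bytes, label: str) -> str:
    """Stage b: UTF-8+names is plain UTF-8 at the byte level."""
    codec = "utf-8-sig" if label in ("utf-8", "utf-8+names") else label
    return data.decode(codec)


def expand_names(text: str, table=NAMES) -> str:
    """Extra stage: replace known &name; references, leave all others alone."""
    return NAME_REF.sub(lambda m: table.get(m.group(1), m.group(0)), text)


def load(data: bytes) -> ET.Element:
    label = detect_encoding(data)            # stage a
    text = decode(data, label)               # stage b
    if label == "utf-8+names":
        text = expand_names(text)            # extra stage between b and c
        # The label is not a legal XML 1.0 EncName, so drop the declaration
        # before handing the text to a standard parser.
        text = XML_DECL.sub("", text, count=1)
    return ET.fromstring(text)               # stage c


check_table(NAMES)
```

An editor that wants users to see the pseudo-entities as entities would skip `expand_names` for display, but would still have to run it before every hand-off to a parser, for example for on-the-fly validation.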
But since these entities are not seen by an XML processor and can appear in XML names, the editor must either disallow them there, and so fail to conform to the UTF-8+names specification, or implement a new flavor of XML parsing that accommodates them in XML names. A number of interesting quality issues arise from this.

How is name comparison defined? The UTF-8+names spec does not prevent users from writing a name with embedded "entities" in one instance and without them in another. If the two instances are to be treated as equal, the editor must keep a separate, internal representation of names for comparison purposes. I hope this scenario sounds familiar to other editor writers; it parallels what one had to go through for editors _not_ based on Unicode, a fairly giant step backward.

Many editors offer on-the-fly validation, but of course documents with entity references in names are not even well-formed, so the document must be translated from the editor's representation to the parser's for every validation.

Since the pseudo-entities can also appear wherever DTD-defined entities can appear, the editor must be able to tell them apart. When a user has explicitly defined a character entity with the same name as one of the UTF-8+names entities but a different definition, and then hovers the cursor over the entity, which definition should the editor display? Both?

The UTF-8+names list of names is rather large (and probably growing, as other constituencies weigh in), arguably much larger than most documents will require. In an editor, when the user types an & or requests code assist, what list of names should they be shown? The union of the DTD entities and the +names? It is well known that very long lists work poorly as popup lists: users have trouble navigating them without the cursor falling off the list, and can't read them to prompt themselves for plausible entries. Should the editor therefore add three different ways for the user to ask for entity name assistance?
More options mean more confusion. I have no doubt that some programmer can code up a bunch of Emacs macros that address these concerns (except the ease-of-use parts). But, news flash: most people don't use Emacs. The list of editors applied to XML (not to mention the non-XML uses of UTF-8+names) is quite large, and some of them have large constituencies of their own who are highly resistant to changing editors. Specifications like this should take into account the amount of grief they imply for editor providers, and the consequently slow arrival of satisfactory tool support.

On the other hand, if you want to make this easier for tools to support, here are three suggestions:

- Use a different character than &.
- In your Use With XML section, specifically disparage the use of UTF-8+names in XML element and attribute names.
- Drop the &&; escape; you don't need it, and editors certainly don't need any more complexity in entity expansion.

Bob Foster
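The name-comparison problem raised in point 3 can be made concrete with a small Python sketch. It is illustrative only: the two-entry `NAMES` table is a hypothetical stand-in for the UTF-8+names list, and `canonical` and `NameTable` are invented names, not part of any proposal or library. It shows the separate internal representation an editor would need so that `caf&eacute;` and `café` count as the same name while the user's own spelling is kept.

```python
import re

# Hypothetical two-entry stand-in for the UTF-8+names table (assumption).
NAMES = {"eacute": "\u00e9", "uuml": "\u00fc"}

NAME_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def canonical(name: str) -> str:
    """Expand known pseudo-entity references so spellings of one name compare equal."""
    return NAME_REF.sub(lambda m: NAMES.get(m.group(1), m.group(0)), name)


class NameTable:
    """Editor-side index: surface spellings are kept for display and saving,
    while lookups and equality go through the canonical form."""

    def __init__(self):
        self._spellings = {}  # canonical name -> set of surface spellings

    def add(self, surface: str) -> None:
        self._spellings.setdefault(canonical(surface), set()).add(surface)

    def same_name(self, a: str, b: str) -> bool:
        return canonical(a) == canonical(b)

    def spellings(self, name: str) -> set:
        return self._spellings.get(canonical(name), set())


table = NameTable()
table.add("caf&eacute;")
table.add("caf\u00e9")
```

The canonical form is used only for lookups and equality; the surface spelling is what the editor displays and saves, which is exactly the extra bookkeeping the post compares to pre-Unicode editors.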