SAX-ext proposal #3: entity encoding, version
Locator infoset extensions

- Two of the infoset properties for documents are not supported by the current SAX2 API (including extensions): the character encoding used, and the XML version used.

- These are actually characteristics of all parsed entities, not just the document entity, just like the [base URI] currently exposed through the Locator interface.

- There may be up to three kinds of encoding name to be concerned with:

  * What's declared inline, using an XML/text declaration, or defaulted (UTF-8, UTF-16).
  * Sometimes an external declaration, through the MIME type, which is authoritative but which may not agree with the inline declaration.
  * For Java, the name of the encoding actually used by a Reader, which will often not match the "winning" declared name. (For example, "UTF8" really means "UTF-8".)

  The actual encoding used affects which Unicode normalizations need to be done. That's what the infoset needs (yes?), and it would be the declared one (external if present, else internal), reported under its standard name rather than a Java-specific one.

PROPOSAL

- Define a new org.xml.sax.ext interface:

    package org.xml.sax.ext;

    import org.xml.sax.Locator;

    public interface Locator2 extends Locator
    {
        public String getXMLVersion ();
        public String getEncoding ();
    }

  Strings returned would be the relevant values, or null if the values are not known. The encoding string would reflect the active declaration.

  This interface would be implemented by the Locator objects provided in setDocumentLocator() callbacks, to expose this information.

- Define a new org.xml.sax.ext class implementing that interface, inheriting from org.xml.sax.helpers.LocatorImpl.

- Define a new standard feature ID:

    http://xml.org/sax/features/use-locator2
    Read-only

  If true, the Locator object passed in setDocumentLocator events will also implement the Locator2 interface, and can be cast to it.

  Note that because of the way Java typing works, testing that feature would be optional: applications could always try the cast (if they were willing to take the performance hit).
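To make the proposal concrete, here is a minimal Java sketch of the interface, an implementation class inheriting from org.xml.sax.helpers.LocatorImpl, and the "try the cast" pattern an application could use in setDocumentLocator(). The names ProposedLocator2, ProposedLocator2Impl, describe() and canonicalEncodingName() are hypothetical, chosen so they don't collide with anything shipped; the setters are an assumption about how a parser would fill in the values. It illustrates the idea under those assumptions and is not the API itself.

```java
import java.nio.charset.Charset;
import org.xml.sax.Locator;
import org.xml.sax.helpers.LocatorImpl;

// Hypothetical sketch of the proposal; all names below are illustrative only.
public class Locator2Demo {

    /** Proposed extension of Locator; either method may return null if unknown. */
    public interface ProposedLocator2 extends Locator {
        String getXMLVersion();
        String getEncoding();
    }

    /** Implementation inheriting from LocatorImpl, as the proposal suggests. */
    public static class ProposedLocator2Impl extends LocatorImpl implements ProposedLocator2 {
        private String version;   // e.g. "1.0", or null if unknown
        private String encoding;  // active declaration (external, else internal), or null

        public ProposedLocator2Impl() {}

        /** Snapshot constructor: copies the Locator fields, plus the new ones if present. */
        public ProposedLocator2Impl(Locator locator) {
            super(locator);
            if (locator instanceof ProposedLocator2) {
                ProposedLocator2 l2 = (ProposedLocator2) locator;
                version = l2.getXMLVersion();
                encoding = l2.getEncoding();
            }
        }

        public String getXMLVersion() { return version; }
        public String getEncoding() { return encoding; }

        // Assumed setters a parser would call as it reads each entity's declarations.
        public void setXMLVersion(String version) { this.version = version; }
        public void setEncoding(String encoding) { this.encoding = encoding; }
    }

    /** What an application might do in setDocumentLocator(): just try the cast. */
    public static String describe(Locator locator) {
        if (locator instanceof ProposedLocator2) {
            ProposedLocator2 l2 = (ProposedLocator2) locator;
            return "version=" + l2.getXMLVersion() + ", encoding=" + l2.getEncoding();
        }
        return "no version/encoding information";
    }

    /**
     * Maps a Java-specific alias such as "UTF8" to its canonical name ("UTF-8").
     * Throws an unchecked exception if the name is unknown to this JVM.
     */
    public static String canonicalEncodingName(String javaName) {
        return Charset.forName(javaName).name();
    }

    public static void main(String[] args) {
        ProposedLocator2Impl loc = new ProposedLocator2Impl();
        loc.setSystemId("http://example.com/doc.xml");
        loc.setXMLVersion("1.0");
        loc.setEncoding("UTF-8");

        System.out.println(describe(loc));               // version=1.0, encoding=UTF-8
        System.out.println(describe(new LocatorImpl())); // no version/encoding information
        System.out.println(canonicalEncodingName("UTF8")); // UTF-8
    }
}
```

The snapshot constructor mirrors LocatorImpl(Locator), which applications use to save a locator's state at a given point; copying the two new values only when the source also implements the extension keeps it usable with plain SAX2 parsers.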
QUESTIONS:

- Is it necessary to expose both types of declared encodings? If so, proposal: a new String getEncodingDecl () returns the internal label; getEncoding () would return the (authoritative) external label. The internal label might be null if it was defaulted. (Tracking this information has a cost, and it's not clear that any applications should actually care, which is why it's omitted.)

- Is there a better convention to use for extending interfaces than the numeric suffix? (Meta-1)

- Is the new implementation class really needed? Alternative: update LocatorImpl. (Meta-2)
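If both declared encodings were exposed, per-entity tracking might look roughly like the following sketch. The class EntityEncodingInfo and its declarationsDisagree() helper are hypothetical; so is the choice to have getEncoding() fall back to the internal label when there is no external one, which follows the "external, else internal" rule used earlier rather than anything settled.

```java
// Hypothetical sketch of the getEncodingDecl() variant; names are illustrative only.
public class EncodingDeclSketch {

    /** Tracks both declared encoding labels for one parsed entity. */
    public static final class EntityEncodingInfo {
        private final String externalLabel; // e.g. a MIME charset parameter; may be null
        private final String internalLabel; // from the XML/text declaration; null if defaulted

        public EntityEncodingInfo(String externalLabel, String internalLabel) {
            this.externalLabel = externalLabel;
            this.internalLabel = internalLabel;
        }

        /** The internal label, or null if the encoding was defaulted. */
        public String getEncodingDecl() { return internalLabel; }

        /** The authoritative label: external if present, else internal (assumed fallback). */
        public String getEncoding() {
            return externalLabel != null ? externalLabel : internalLabel;
        }

        /**
         * True when both labels exist and differ textually (case-insensitive).
         * Aliases such as "UTF8" vs "UTF-8" are not resolved here.
         */
        public boolean declarationsDisagree() {
            return externalLabel != null && internalLabel != null
                && !externalLabel.equalsIgnoreCase(internalLabel);
        }
    }

    public static void main(String[] args) {
        // Served with charset=ISO-8859-1, but the XML declaration says UTF-8.
        EntityEncodingInfo served = new EntityEncodingInfo("ISO-8859-1", "UTF-8");
        System.out.println(served.getEncoding());          // ISO-8859-1
        System.out.println(served.getEncodingDecl());      // UTF-8
        System.out.println(served.declarationsDisagree()); // true

        // No external label and no encoding declaration: the internal label is defaulted.
        EntityEncodingInfo local = new EntityEncodingInfo(null, null);
        System.out.println(local.getEncodingDecl());       // null
    }
}
```

Keeping the internal label around is exactly the tracking cost mentioned in the question; a parser that skips it could simply return null from getEncodingDecl().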