Re: JSR 206 and SAX
> > propose that a feature be added to the XMLReaderFactory, perhaps
> > http://xml.org/sax/features/unicode-normalization-checking. If this
> > feature is enabled, the factory returns readers that perform Unicode
> > normalization checking. The EG felt that this feature was most
> > appropriate at the factory level.

I am not sure why the factory level is the most appropriate. Aren't the getFeature and setFeature functions part of the XMLReader itself? Why add feature maintenance at the factory level when support can simply be enumerated by creating each available XMLReader (if that is even necessary, as it is likely known a priori)?

> Seems reasonable. This feature should be false by default, a true
> value should be optional, and if a problem is encountered the error()
> message in the ErrorHandler should be invoked.

<snip really good points about defining errors well>

Define "problem". If the feature is not supported or not recognized, the appropriate SAXException should be raised during the call to getFeature. Unfortunately, this proposal says nothing per se about what to do when the document is or is not fully normalized. Perhaps Locator2 needs another function, isFullyNormalized(), which may or may not return true for a given entity or document (here things got fuzzy for me). In cases where the feature http://xml.org/sax/features/unicode-normalization-checking is not supported, not recognized, or is false, isFullyNormalized() would always return false.

Based on my quick re-reading, I don't believe that a document which is not fully normalized is an error; it doesn't really seem to be given any status, be it warning or error. Additional wording about the combination of such a feature and the validation feature might be helpful as well.

Additionally, I would question whether a feature should be provided for normalization transcoding, or whether that is beyond the scope of SAX.
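To make the reader-level alternative concrete, here is a minimal sketch of probing the feature on an individual XMLReader. It assumes the feature URI proposed above and the platform's default JAXP parser. The class NormalizationCheck, the Status enum, and the helpers newReader and tryEnable are hypothetical names used only for illustration; the only SAX API involved is setFeature, getFeature, and the two exception types. Whether a particular parser recognizes or supports the feature is left open.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical helper: not part of SAX or of the JSR 206 proposal.
final class NormalizationCheck {
    // Feature URI as proposed in the quoted message.
    static final String FEATURE =
        "http://xml.org/sax/features/unicode-normalization-checking";

    enum Status { ENABLED, NOT_SUPPORTED, NOT_RECOGNIZED }

    // Obtain a plain XMLReader from the platform's default JAXP parser.
    static XMLReader newReader() throws ParserConfigurationException, SAXException {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    // Try to switch a feature on for one reader and report what happened.
    static Status tryEnable(XMLReader reader, String feature) {
        try {
            reader.setFeature(feature, true);
            // Read the value back rather than trusting that the set took effect.
            return reader.getFeature(feature) ? Status.ENABLED : Status.NOT_SUPPORTED;
        } catch (SAXNotRecognizedException e) {
            // The reader does not know this feature URI at all.
            return Status.NOT_RECOGNIZED;
        } catch (SAXNotSupportedException e) {
            // The reader knows the URI but cannot honor the requested value.
            return Status.NOT_SUPPORTED;
        }
    }

    public static void main(String[] args) throws Exception {
        // The result depends on the parser in use, so none is asserted here.
        System.out.println(FEATURE + ": " + tryEnable(newReader(), FEATURE));
    }
}
```

The point of the sketch is only that the existing per-reader calls already distinguish "not recognized" from "not supported", so an application can discover support without any new state on the factory.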
I suspect a normalization transcoding feature would not be imperative, because its results could be semi-reliably deduced if there were an isFullyNormalized function and the document contained entities in legacy encodings. The current text in getEncoding might already speak to this, or it may need to be modified:

"Note that some recent W3C specifications require that text in some encodings be normalized, using Unicode Normalization Form C, before processing. Such normalization must be performed by applications, and would normally be triggered based on the value returned by this method." [1]

I realize a lot of this is half-baked questions; I just wanted to bring up all of the issues I could think of early on...

[1] http://www.saxproject.org/apidoc/org/xml/sax/ext/Locator2.html#getEncoding()

Jeff Rafter