Re: Detection of non-Unicode characters
Mark Feblowitz wrote:

> We've gotten ourselves in a slight muddle. We've copied Word documentation
> into (many) xs:annotation blocks in our UTF-8 .xsd files (there are around
> 300 files). In the process, we have apparently brought along some
> non-Unicode characters. This is not tolerated equally well by all tools.

I love that last sentence.

Your problem is probably subtly different from what you stated, which might even make a difference in the solution.

It could be the case that the file has bytes that are not actually a UTF-8 encoding of any character; for example, the hex sequence 0AC0C0 cannot possibly occur in UTF-8. I'm not aware of any command-line tools for catching this, but you could write your own in C in a couple of hours with a copy of the UTF-8 rules handy; it wouldn't be XML-specific.

The second possible problem is that the UTF-8 is good but it encodes Unicode characters that aren't allowed in XML. Any decent XML parser should catch this and give you helpful error messages; if you have an expat around your system (and a lot of people do these days), "xmlwf [filename here]" will do the trick.

D'oh, now that I think about it, I bet xmlwf (or equivalent) would probably catch the UTF-8 breakage too.

-Tim
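For anyone who wants to script the two checks described above across a few hundred files, here is a minimal sketch. It is in Python rather than the C suggested in the message, purely for brevity, and it is not a substitute for running a real XML parser. The function names (`find_bad_utf8`, `find_xml_illegal_chars`, `check_file`) are made up for illustration. The character class is my reading of the `Char` production in XML 1.0, so treat it as an assumption to verify against the spec.

```python
import re
import sys

# Assumption: this mirrors the XML 1.0 "Char" production, i.e. the only legal
# characters are #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF.
# The negated class therefore matches anything XML 1.0 does not allow.
_XML_ILLEGAL = re.compile(
    "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_bad_utf8(data: bytes):
    """Return (byte_offset, offending_bytes) for the first invalid UTF-8
    sequence in data, or None if the whole buffer decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start:err.end]
    return None


def find_xml_illegal_chars(text: str):
    """Return (character_index, code_point) pairs for every character in
    text that falls outside the XML 1.0 Char ranges listed above."""
    return [(m.start(), ord(m.group())) for m in _XML_ILLEGAL.finditer(text)]


def check_file(path: str) -> bool:
    """Run both checks on one file, print what was found, and return True
    only if the file passed both."""
    with open(path, "rb") as handle:
        data = handle.read()

    bad = find_bad_utf8(data)
    if bad is not None:
        offset, raw = bad
        print(f"{path}: invalid UTF-8 at byte {offset}: {raw.hex()}")
        return False

    problems = find_xml_illegal_chars(data.decode("utf-8"))
    for index, code_point in problems:
        print(f"{path}: character {index} is U+{code_point:04X}, not allowed in XML 1.0")
    return not problems


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_chars.py *.xsd
    for name in sys.argv[1:]:
        check_file(name)
```

The first check only reports the first bad byte sequence in each file, since everything after a decoding failure is suspect anyway. The second check reports every offending character, which is usually what you want when hunting down pasted Word residue.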