Re: Detection of non-Unicode characters
At 04:36 PM 8/23/2002 -0600, Matt Gushee wrote:
>I would bet it's this. Just this past week I have been debugging a
>broken application that is supposed to generate XML from Word documents.
>The main problem I found was that the Word documents are full of
>characters like 0x07, 0x2012-0x2019, and the like. The latter range
>consists of common punctuation symbols like dashes and left and right
>quotes (AKA 'smart quotes'). They appear to be using Code Page 1252
>mapped directly into Unicode.

I just ran into this myself, with a styled apostrophe character -- which was only reported as a problem by XML Spy 4.4 upon opening the 1.2MB XML file (character was: Â (0xC2), ' (0x92)).

All three validators I have (Xerces standalone, XMetal 3.0, and XML Spy 4.4) reported the file valid, but it was failing upon import into a content management system (with the less than helpful error of "no root element present", when there clearly was).

A tool that would quickly locate these kinds of things would be enormously helpful (I'd certainly buy a copy if it were commercial/shareware).

Ann

-----
Ann Navarro, WebGeek, Inc.
http://www.webgeek.com

say what? http://www.snorf.net/blog
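For illustration, a minimal Python sketch of the kind of locator asked for above. It is not an existing tool, and the names `find_suspect_chars` and `cp1252_guess` are hypothetical. It assumes the file has already been decoded to text, and it flags two things: characters outside the XML 1.0 Char production (such as 0x07), and C1 controls U+0080-U+009F. The second group is legal in XML 1.0, which may be why validators accept such a file, but it is the usual trace of Code Page 1252 bytes mapped directly into Unicode. The bytes 0xC2 0x92 are the UTF-8 encoding of U+0092.

```python
import re

# Anything outside the XML 1.0 Char production (for example 0x07).
ILLEGAL_XML = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)
# C1 controls: legal in XML 1.0, but usually CP1252 bytes copied straight
# into Unicode (0x92 becoming U+0092 instead of U+2019).
C1_CONTROL = re.compile(r"[\u0080-\u009F]")


def cp1252_guess(ch):
    """Describe what a C1 character would be if read as a CP1252 byte."""
    try:
        intended = bytes([ord(ch)]).decode("cp1252")
    except UnicodeDecodeError:
        return "C1 control; byte is undefined in CP1252"
    return f"C1 control; CP1252 0x{ord(ch):02X} would be U+{ord(intended):04X}"


def find_suspect_chars(text):
    """Return (line, column, code point, note) for each suspect character."""
    hits = []
    # Split on "\n" only: str.splitlines() would also break on U+0085.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ILLEGAL_XML.match(ch):
                hits.append((lineno, col, f"U+{ord(ch):04X}",
                             "not allowed in XML 1.0"))
            elif C1_CONTROL.match(ch):
                hits.append((lineno, col, f"U+{ord(ch):04X}", cp1252_guess(ch)))
    return hits


# The apostrophe bytes from the message above, decoded as UTF-8.
sample = b"<a>it\xc2\x92s</a>".decode("utf-8")
for hit in find_suspect_chars(sample):
    print(hit)
```

Reporting a line and column, rather than only rejecting the file, is the point of the sketch: it makes the offending character easy to find in a large document. A file whose declared encoding is wrong would have to be decoded with the right codec before scanning.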