
Re: Detection of non-Unicode characters


From: "Ann Navarro" <ann@w...>

> I just ran into this myself, with a styled apostrophe character -- which 
> was only reported as a problem by XML Spy 4.4 upon opening the 1.2MB XML 
> file (character was: Â (0xC2), ' (0x92)).


I expect we will see more of this problem unless the C1 controls (U+0080-U+009F)
are banned from direct use in XML. The trouble is that transcoders do not fail when
they find strange characters. Nothing stops your XML from being polluted, because
once corrupted data is in, it may look like good data. For more on this issue,
see http://www.topologi.com/public/XML_Naming_Rules.html  
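
As an illustration of how this kind of pollution can arise (an assumption about
the quoted case, not something confirmed in this thread): byte 0x92 is a curly
apostrophe in Windows-1252, but if the data is labelled ISO-8859-1 a transcoder
silently maps it to the C1 control U+0092, which UTF-8 then writes as 0xC2 0x92.
A minimal Python sketch, with find_c1_controls as a hypothetical helper name:

```python
C1_FIRST, C1_LAST = 0x80, 0x9F  # the C1 control range, U+0080-U+009F


def find_c1_controls(text):
    """Return (offset, code point) for every C1 control character in text."""
    return [(i, ord(ch)) for i, ch in enumerate(text)
            if C1_FIRST <= ord(ch) <= C1_LAST]


raw = b"it\x92s"                      # Windows-1252 bytes with a curly apostrophe

wrong = raw.decode("iso-8859-1")      # mislabelled: succeeds, no error raised
right = raw.decode("cp1252")          # correct label: U+2019

as_utf8 = wrong.encode("utf-8")       # the C1 control becomes bytes C2 92

for offset, cp in find_c1_controls(wrong):
    print("C1 control U+%04X at offset %d" % (cp, offset))
```

Neither decode step raises an error, which is the point: only an explicit scan
for the C1 range reveals that something went wrong upstream.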

...
> A tool that would quickly locate these kinds of things would be enormously 
> helpful (I'd certainly buy a copy if it were commercial/shareware).

You may care to look at my company's new editor for XML and SGML:
the Topologi Collaborative Markup Editor. See
 http://www.topologi.com/

We'll be posting the real announcement in a day or two; you can download it
for evaluation now.

When you open a file, an "Incoming Text Conditioning" box comes up. In the
"Whitespace" tab you can set it to:
  * detect control characters or characters above a certain code point
  * give a warning or replace the character with a PI containing the code point,
    so you can figure out what is going wrong and where it is.
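
For readers who want to try the same idea in a script, here is a rough sketch
of that kind of conditioning. It is not the editor's implementation: the
function names, the PI target "char", and the "U+XXXX" payload are all made up
for illustration. Tab, LF and CR are treated as ordinary whitespace.

```python
def is_suspect(ch, max_code_point=None):
    """True for control characters, or for anything above max_code_point."""
    cp = ord(ch)
    if ch in "\t\n\r":
        return False
    if cp < 0x20 or 0x7F <= cp <= 0x9F:   # C0 controls, DEL, C1 controls
        return True
    return max_code_point is not None and cp > max_code_point


def mark_suspects(text, max_code_point=None):
    """Replace suspect characters with a PI and collect (line, column, code point)."""
    out, warnings = [], []
    line, col = 1, 0
    for ch in text:
        if ch == "\n":
            line += 1
            col = 0
            out.append(ch)
            continue
        col += 1
        if is_suspect(ch, max_code_point):
            warnings.append((line, col, ord(ch)))
            # Hypothetical PI format, chosen only for this example.
            out.append("<?char U+%04X?>" % ord(ch))
        else:
            out.append(ch)
    return "".join(out), warnings


marked, found = mark_suspects("it\u0092s")
for line, col, cp in found:
    print("line %d, column %d: U+%04X" % (line, col, cp))
```

Note that a PI dropped into an attribute value or a tag would not be
well-formed XML, so output like this is a diagnostic aid for locating the
problem, not something to ship.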

Also, it displays the Unicode code point at the current caret position, so you can
see what is going on even when the font doesn't have a glyph for a character.
It will give warnings for many kinds of encoding errors, and sorts its available
encodings in three ways (by platform, by language, and by IANA name)
for easier selection. It performs Unicode normalization on the way in and the 
way out, and during cut-and-paste. 
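
To show what normalization buys you, here is a small Python sketch using the
standard unicodedata module. The choice of form NFC is my assumption for the
example; the helper name normalize_in is hypothetical.

```python
import unicodedata


def normalize_in(text, form="NFC"):
    """Normalize text to the given Unicode normalization form (NFC assumed here)."""
    return unicodedata.normalize(form, text)


decomposed = "caf" + "e\u0301"        # 'e' followed by COMBINING ACUTE ACCENT
composed = normalize_in(decomposed)   # single precomposed character U+00E9

# The two strings display identically but compare unequal until normalized.
print(decomposed == "caf\u00e9", composed == "caf\u00e9")
```

Without this step, two documents that look the same on screen can fail string
comparison, searching, or name matching.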

Cheers
Rick Jelliffe

