XML Editor
Subject: Convert XML to UTF-8
Author: Scott Remiger
Date: 02 Jan 2009 04:12 PM
Originally Posted: 02 Jan 2009 04:11 PM
I have a client who says that my XML is not UTF-8, and I am not sure how to check whether it is or not.

At the top of the file I have:

<?xml version="1.0" encoding="utf-8"?>
<RECIPES xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="file:///c:/nobackup/Recipe.xsd">
<RECIPE RECIPEID="2" TITLE="Escargots Red Lion" TTIME="60" CONVENIENCE="Gourmet" IMAGE="2.jpg" COURSE="Appetizers">
<INGS>

Does that not declare it as UTF-8? And when I validate, should I not be notified if the file is not actually UTF-8?

Subject: Convert XML to UTF-8
Author: Alberto Massari
Date: 05 Jan 2009 08:42 AM
Hi Scott,
The declaration specifies the UTF-8 encoding, and if Stylus loads the file without complaining, it means it's valid UTF-8. How are you generating it? Could you zip it and post it in the forum so that we can double check it?
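
For an independent check outside the editor, one option is to decode the raw bytes strictly as UTF-8 and report where decoding fails. This is only a minimal Python sketch, not something Stylus provides; the function name `first_invalid_utf8_offset` and the sample bytes are made up for illustration.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start
    return None


# Hypothetical samples: the same attribute saved as UTF-8 and as Latin-1.
good = 'TITLE="Café"'.encode("utf-8")
bad = 'TITLE="Café"'.encode("latin-1")

print(first_invalid_utf8_offset(good))  # None means the bytes are valid UTF-8
print(first_invalid_utf8_offset(bad))   # offset of the lone 0xE9 byte
```

To run it on a real file, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes in, so that no decoding happens before the check.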

Thanks,
Alberto

Subject: Convert XML to UTF-8
Author: Scott Remiger
Date: 05 Jan 2009 09:57 AM
Absolutely, thanks for taking the time.



UnknownRecipesConvert0.zip
Zip file contains an XML document

Subject: Convert XML to UTF-8
Author: Alberto Massari
Date: 05 Jan 2009 12:15 PM
Hi Scott,
Your XML file contains a few non-ASCII characters (like the accented "e", the "degree" and the "registered trademark" symbols) that are not in the range 0x00-0x7F. So, in UTF-8 they are encoded using multiple bytes; in your case, for instance, the "degree" symbol, whose Unicode code point is U+00B0, is stored as 0xC2 0xB0. This is correct UTF-8, so something is wrong in your client's configuration. Do you know which software he is using, and can you get the offset where the wrong code point is reported?
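
The multi-byte encoding described above can be reproduced with a short Python sketch. It assumes the accented "e" in the file is "é" (U+00E9), which the thread does not state; the helper name `utf8_hex` is made up for illustration.

```python
# Characters like those mentioned above, with their Unicode code points.
# Assumption: the accented "e" is U+00E9; other accented forms encode similarly.
samples = {"é": 0xE9, "°": 0xB0, "®": 0xAE}


def utf8_hex(ch: str) -> str:
    """Format the UTF-8 bytes of a character as space-separated hex values."""
    return " ".join(f"0x{b:02X}" for b in ch.encode("utf-8"))


for ch, code_point in samples.items():
    print(f"U+{code_point:04X} {ch} -> {utf8_hex(ch)}")
```

Each of these code points lies above 0x7F, so each takes two bytes in UTF-8. A tool that reads the file as a single-byte encoding would display those two bytes as two separate characters.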

Thanks,
Alberto

Postnext
Scott RemigerSubject: Convert XML to UTF-8
Author: Scott Remiger
Date: 05 Jan 2009 01:26 PM
To clarify, Stylus is encoding correctly, and according to Stylus everything is correct. That is good.


My second question would be: can we turn that option off and make the check more strict?

They are validating using http://www.validome.org/xml/validate


Thanks for the help.

Subject: Convert XML to UTF-8
Author: Alberto Massari
Date: 06 Jan 2009 09:47 AM
Hi Scott,
I tested the validator with a subset of the file (20 MB is too big for their web interface) and it passes the well-formedness test. The validation step clearly fails because it's missing a schema. Could it be that the error your client reports is related to a schema validation error, and not to a wrong UTF-8 code point?
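
To separate the two kinds of failure locally, a well-formedness check alone (no schema involved) can be sketched in Python. This uses Python's standard-library parser rather than the Validome service, and the function name `is_well_formed` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """Parse the bytes as XML; no schema validation is performed."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


decl = b'<?xml version="1.0" encoding="utf-8"?>'
# Hypothetical documents: same content, saved as UTF-8 and as Latin-1.
utf8_doc = decl + '<RECIPE TITLE="Café"/>'.encode("utf-8")
latin1_doc = decl + '<RECIPE TITLE="Café"/>'.encode("latin-1")

print(is_well_formed(utf8_doc))    # bytes match the declared encoding
print(is_well_formed(latin1_doc))  # bytes contradict the declared encoding
```

A file whose bytes do not match its declared encoding fails at this stage, before any schema is consulted, so a document that passes here but fails validation points to a schema problem rather than an encoding one.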

Alberto

Subject: Convert XML to UTF-8
Author: Scott Remiger
Date: 06 Jan 2009 10:12 AM
No, it had to do with the accent over the e and an accent over the a as well. Once I cleared them out, the XML ran fine.


I think you are on the right track with the previous post.
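
As an alternative to clearing the accented characters out, they could be written as numeric character references so the file contains only ASCII bytes. This technique is not discussed elsewhere in the thread, and whether the client's software accepts it is an assumption; the function name `to_ascii_xml` and the sample element are hypothetical.

```python
def to_ascii_xml(text: str) -> bytes:
    """Encode XML text as ASCII, replacing other characters with &#NNN; references."""
    return text.encode("ascii", errors="xmlcharrefreplace")


# Hypothetical element with accented characters in an attribute value.
print(to_ascii_xml('<RECIPE TITLE="Café à la crème"/>'))
```

Character references are only interpreted in text content and attribute values, so this sketch is not suitable if non-ASCII characters appear in element names, comments, or CDATA sections. Since ASCII is a subset of UTF-8, the `encoding="utf-8"` declaration can stay as it is.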

 