XML Editor
Subject: dodgy (non-ASCII) characters causing confusion
Author: Richard Potts
Date: 14 Feb 2007 11:21 AM
Originally Posted: 14 Feb 2007 11:19 AM
SS Version: Enterprise 2006 Release 2 591d

I've written an XSLT stylesheet to convert from one form of XML to another.

Extract from source document:

<did id="0xF18C" name="ecu_sn">
<hex>FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF</hex>
<txt>
<element> <name>value</name> <value>˙˙˙˙˙˙˙˙˙˙˙˙˙˙˙˙</value>
</element>
</txt>
</did>


Extract from the resulting XML document:

<SerialNumber Name="ecu_sn">
<Value>&#65533;&#65533;&#65533;&#65533;&#65533;&#65533;&#65533;&#65533;&#65533;&#65533;&#65533;&#65533;&#65533;&#65533;&#65533;&#65533;</Value>
</SerialNumber>


(Within SS, the &#65533; characters actually appear as a "?" inside a black diamond, the usual rendering of U+FFFD, the Unicode replacement character.)

What I don't understand is that when I view the source document in SS it renders OK, but the resulting XML doesn't. The problem has something to do with the 'dodgy' characters '˙˙˙˙˙˙˙˙˙˙˙˙˙˙'. But why is my resulting XML in error, when all I'm doing is echoing the dodgy characters to the output XML?

For reference, here's an extract from my XSLT:

<xsl:variable name="SerialNumberValue">
  <xsl:value-of select="txt/element/value"/>
</xsl:variable>
<xsl:choose>
  <xsl:when test="$SerialNumberValue != ''">
    <SerialNumber>
      <xsl:attribute name="Name">
        <xsl:value-of select="@name"/>
      </xsl:attribute>
      <Value>
        <xsl:value-of select="$SerialNumberValue"/>
      </Value>
    </SerialNumber>
  </xsl:when>
  <xsl:otherwise>
    <SerialNumber>
      <xsl:attribute name="Name">
        <xsl:value-of select="@name"/>
      </xsl:attribute>
      <Value>
        <xsl:text>?</xsl:text>
      </Value>
    </SerialNumber>
    <Error><xsl:value-of select="concat('Module Serial Number is blank: ', ../../../@name, ' ', @name)"/></Error>
  </xsl:otherwise>
</xsl:choose>

Author: Alberto Massari
Date: 14 Feb 2007 12:12 PM
Hi Richard,
could you attach a screenshot of Stylus? It's not clear to me where the characters are displayed properly and where they aren't.

Thanks,
Alberto

Author: Richard Potts
Date: 14 Feb 2007 12:23 PM
Please find screenshots attached.

Note: I developed the XSLT in SS, but I apply it from my VBA application.
I've now tracked the problem down to something to do with encoding: if I open the resulting file in Notepad and choose "Save As...", it says 'ANSI', and if I save it as 'UTF-8' instead, IE opens it OK.

Whereas if I do the same with my source document, it already says 'UTF-8'.

So where do I tell my application to save the resulting XML file as UTF-8: in the stylesheet, somewhere in VBA, or both?
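A minimal Python sketch of the mismatch described above. It assumes, hypothetically, that each dodgy character is U+00FF (ÿ), which the FF bytes in the <hex> element suggest, and that 'ANSI' means Windows code page 1252; the thread confirms neither. The same text produces different bytes depending on the encoding it is saved in, and a reader expecting UTF-8 turns each stray 0xFF byte into U+FFFD, which serialises as &#65533;.

```python
# Hypothetical value: 16 copies of U+00FF (ÿ). The thread does not confirm the
# exact character; the FF bytes in the <hex> element merely suggest it.
value = "\u00ff" * 16

# Assumption: the "ANSI" code page is Windows-1252, where U+00FF is the single byte 0xFF.
ansi_bytes = value.encode("cp1252")

# A UTF-8 writer uses two bytes, C3 BF, for the same character.
utf8_bytes = value.encode("utf-8")

# XML without an encoding declaration is read as UTF-8. A lone 0xFF byte is
# never valid UTF-8, so a lenient decoder substitutes U+FFFD for each one.
misread = ansi_bytes.decode("utf-8", errors="replace")

# Serialised as a numeric character reference, U+FFFD is &#65533;.
as_refs = "".join(f"&#{ord(ch)};" for ch in misread)

print(ansi_bytes[:2], utf8_bytes[:4])  # b'\xff\xff' b'\xc3\xbf\xc3\xbf'
print(as_refs[:16])                    # &#65533;&#65533;
```

This is why re-saving the file as UTF-8 in Notepad fixes it: the characters themselves were never wrong, only the bytes used to store them.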

Thanks in advance.


Attachment: screenshots.doc

Author: Alberto Massari
Date: 14 Feb 2007 01:02 PM
Richard,
it depends on how your VBA application saves the file. If it relies on the MSXML processor to actually write the file, it would be enough to add an <xsl:output encoding="utf-8"/> instruction right after the opening <xsl:stylesheet> tag.
If your application instead gets the result as a VBA string, you should investigate whether VBA's file functions allow writing the data in Unicode instead of the current locale's code page.

Alberto
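A minimal Python sketch of the second case Alberto describes, where the result arrives as a string and the application writes the file itself. The principle carries over to any language: name the output encoding explicitly instead of relying on the locale default, and make it match the XML declaration. The document content, file names and the `save` helper are hypothetical, and Python's `open()` only stands in for whatever file API the VBA application uses.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a transformation result held as a string.
# The declaration promises UTF-8, so the bytes on disk must really be UTF-8.
result = ('<?xml version="1.0" encoding="utf-8"?>\n'
          '<SerialNumber Name="ecu_sn"><Value>' + "\u00ff" * 16 + '</Value></SerialNumber>\n')

out_dir = tempfile.mkdtemp()


def save(text, filename, encoding):
    """Write text to out_dir/filename using an explicitly named encoding (hypothetical helper)."""
    path = os.path.join(out_dir, filename)
    with open(path, "w", encoding=encoding) as f:
        f.write(text)
    return path


# Good: the bytes match the declaration, so the characters survive a parse.
good = save(result, "extract_utf8.xml", "utf-8")
value = ET.parse(good).getroot().find("Value").text
print(value == "\u00ff" * 16)  # True

# Bad: an "ANSI" (cp1252) writer emits bare 0xFF bytes under a UTF-8 declaration,
# and the parser rejects the file as not well-formed.
bad = save(result, "extract_ansi.xml", "cp1252")
try:
    ET.parse(bad)
    rejected = False
except ET.ParseError:
    rejected = True
print(rejected)  # True
```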

Author: Richard Potts
Date: 15 Feb 2007 06:48 AM
Thanks for the pointer, Alberto.

In my VBA code I was creating the output file with WriteLine, i.e.:
'
'Output to file
'
ExtractFile.WriteLine xslProc.output


Further investigation on the web into similar problems suggested that ADO streams were the answer, so I've now changed it to:

'
' Open stream and set output code to UTF-8
'
Str.Charset = "UTF-8"
Str.Open
xslProc.output = Str

'
' Apply Transform
'
xslProc.transform

'
' Save stream to file
'
Str.SaveToFile sExtractFileName


So I hope that helps someone else.

Not sure whether it's possible, but as a thought: is there a potential enhancement to SS here? For example, a 'properties' view for each file loaded into the XML editor, so you can see which encoding it was created with. Could SS also highlight the encoding issue to the user when the "? in diamonds" are displayed for an XML file? Or maybe an SS feature to 'convert this file from one encoding format to another'?

Author: Tony Lavinio
Date: 15 Feb 2007 09:40 AM
The XML standard says that if the encoding isn't declared in the
first line of the file (and no external information such as a MIME
header supplies it), then it MUST be UTF-8 or UTF-16. The rules are
very specific, and anything else should be considered broken and
rejected by the parser.

See http://www.w3.org/TR/2006/REC-xml-20060816/#sec-guessing
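The detection rules linked above (Appendix F of the XML 1.0 specification) start by looking only at the first few bytes of the file. Below is a simplified Python sketch of that first step. It covers the BOM and "<?xm" patterns for UTF-8 and UTF-16 and deliberately omits the UCS-4 and EBCDIC rows, so treat it as an illustration rather than a conforming implementation; the function name is hypothetical.

```python
def sniff_encoding_family(head: bytes) -> str:
    """Guess the encoding family from the first bytes of an XML entity.

    Simplified subset of XML 1.0 Appendix F: UCS-4 and EBCDIC rows are omitted,
    so, e.g., FF FE 00 00 (UCS-4LE) is misreported here as UTF-16LE.
    """
    # With a byte order mark.
    if head.startswith(b"\xef\xbb\xbf"):
        return "UTF-8 (BOM)"
    if head.startswith(b"\xfe\xff"):
        return "UTF-16BE (BOM)"
    if head.startswith(b"\xff\xfe"):
        return "UTF-16LE (BOM)"
    # Without a BOM: look for "<?" in a 16-bit layout, or "<?xm" in an 8-bit one.
    if head.startswith(b"\x00<\x00?"):
        return "16-bit big-endian (read the declaration)"
    if head.startswith(b"<\x00?\x00"):
        return "16-bit little-endian (read the declaration)"
    if head.startswith(b"<?xm"):
        return "ASCII-compatible (read the declaration)"
    # Anything else: UTF-8 with no declaration, or a mislabeled/corrupt entity.
    return "UTF-8 (no declaration)"


for sample in (
    b"\xef\xbb\xbf<?xml version='1.0'?>",
    '<?xml version="1.0"?>'.encode("utf-16-le"),
    b'<?xml version="1.0" encoding="windows-1252"?>',
    b"<SerialNumber/>",
):
    print(sample[:6], "->", sniff_encoding_family(sample))
```

For an ASCII-compatible file this step only narrows things down: the parser still has to read the encoding declaration, and if there is none, the bytes must be UTF-8.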

The problem is that if the actual encoding of the file differs
from the stated encoding, then it is not always possible to determine
unambiguously what the encoding should have been. It is possible to
have a file that could be read successfully using several different
encodings and yielding several different results.

We do the best we can, but it is not deterministically possible to
tell in all cases, so the safest course is for Stylus Studio to
complain.
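A small Python illustration of the point that the bytes alone do not settle the question. The byte 0xFF is a valid but different character in several Windows code pages. A conforming parser rejects it when no declaration is present, because the default is then UTF-8, but accepts it once the file declares a single-byte encoding. The sample documents are hypothetical; the parser is Python's standard expat-based `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

raw = b"\xff"

# One byte, three different (and equally valid) readings.
readings = {cp: f"U+{ord(raw.decode(cp)):04X}" for cp in ("cp1252", "cp1250", "cp1251")}
print(readings)  # {'cp1252': 'U+00FF', 'cp1250': 'U+02D9', 'cp1251': 'U+044F'}

# No declaration: the parser must assume UTF-8, where a lone 0xFF is illegal.
try:
    ET.fromstring(b"<Value>" + raw + b"</Value>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)  # False

# Declared single-byte encoding: the same byte is now well-formed.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?><Value>' + raw + b"</Value>"
print(f"U+{ord(ET.fromstring(declared).text):04X}")  # U+00FF
```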

Author: Tony Lavinio
Date: 15 Feb 2007 09:42 AM
... and there is a way within Stylus Studio to change the encoding,
but it assumes that the file in the editor is displayed properly.

Really, to fix a broken encoding you'd need a non-Unicode-aware tool,
or, more ideally, you would get whatever produces the broken XML to
stop writing invalid XML.
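The 'convert this file from one encoding to another' idea comes down to a byte-level operation: decode the bytes using the encoding they were really written in, then re-encode them. A minimal Python sketch, assuming the caller knows that true encoding, since the tool itself cannot reliably work it out. The function name and the regex-based declaration rewrite are illustrative simplifications, not a Stylus Studio feature.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the very start
# of the data (simplification: a leading BOM is not handled).
_DECL_ENCODING = re.compile(rb"""^(<\?xml[^>]*?encoding\s*=\s*["'])([A-Za-z0-9._-]+)(["'])""")


def transcode_to_utf8(data: bytes, actual_encoding: str) -> bytes:
    """Re-encode raw XML bytes as UTF-8 (hypothetical helper).

    `actual_encoding` must be the encoding the bytes were really written in;
    the bytes cannot prove it, so this is the caller's judgement.
    """
    text = data.decode(actual_encoding)  # strict: raises UnicodeDecodeError on impossible bytes
    utf8 = text.encode("utf-8")
    # Keep any declaration truthful; with no declaration, UTF-8 is already the default.
    return _DECL_ENCODING.sub(rb"\1UTF-8\3", utf8, count=1)


# Mislabeled input: declared UTF-8, but actually written by a cp1252 ("ANSI") writer.
broken = b'<?xml version="1.0" encoding="utf-8"?><Value>\xff\xff</Value>'
print(transcode_to_utf8(broken, "cp1252"))
# b'<?xml version="1.0" encoding="UTF-8"?><Value>\xc3\xbf\xc3\xbf</Value>'
```

Guessing `actual_encoding` wrongly does not necessarily fail: decoding 0xFF as cp1250 instead of cp1252 "succeeds" and silently yields ˙ rather than ÿ, which is why fixing the producer is the better option.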

 