Re: How can I preserve ASCII Encoding Character Sets?

Subject: Re: How can I preserve ASCII Encoding Character Sets?
From: "G. Ken Holman" <gkholman@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 06 Nov 2012 16:58:37 -0500

At 2012-11-06 16:54 -0500, Philip Vallone wrote:

I want to preserve &#160; or &#xA0; as &#160; or &#xA0;.


The character or the markup?  I think you want to
preserve the markup but your process is trying to
preserve the character and is messing up the character set.

Currently, as an example, &#160; will output
into my resulting file as a space,

Actually, it is a non-breaking space (NBSP) and not a space.

but when the resulting xsl file is used to
transform the xml file to FO it prints out a bad character "E".


Yes, that happens when the stream is in UTF-8 but
you've told your processor the stream is in US-ASCII.
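
The mismatch can be seen at the byte level. The following is only an illustrative sketch in Python (not a language used in this thread); it shows that a non-breaking space takes two bytes in UTF-8, that those bytes are not valid US-ASCII, and that a reader assuming a single-byte encoding such as Latin-1 sees a stray extra character in front of the NBSP.

```python
# A non-breaking space (U+00A0), the character that &#160; refers to.
nbsp = "\u00a0"

# In UTF-8 it is encoded as two bytes: 0xC2 0xA0.
utf8_bytes = nbsp.encode("utf-8")

# A consumer that assumes a single-byte encoding (Latin-1 here, chosen
# purely for illustration) sees two characters instead of one.
misread = utf8_bytes.decode("latin-1")

# A strict US-ASCII consumer cannot decode the bytes at all.
try:
    utf8_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(utf8_bytes, repr(misread), ascii_ok)
```

The exact stray character a given tool displays depends on which encoding it falls back to, so the one you see may differ.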

I have narrowed the issue down to the step where I combine my stylesheets into one.


How are you doing that conversion?  If you use
XSLT then the problem is not in that step but somewhere else.

I hope this explains my issue. I appreciate all the help.


If you use native XML tools to go from one XML
file to another (in this case your piecemeal XSLT
stylesheets to the aggregate XSLT stylesheet), then you won't have a problem.

If you use Java or some other programming
language, which isn't native XML, then it is
likely there that the problems are being introduced with the character set.

Reading the evidence you provide here, it appears that you are
using an XML processor in another language to
read the stylesheet. That processor converts
the numeric character reference into a Unicode
character, and your language then writes out the
Unicode character as UTF-8, losing the
markup of the numeric character reference. The
resulting file still says "US-ASCII" at the
top while its content is encoded in UTF-8.
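
A minimal sketch of that failure mode, written in Python with the standard xml.etree.ElementTree module. The poster's actual language and code are not shown in this thread, so the sample document and the hand-rolled "writer" below are hypothetical stand-ins for whatever the real process does.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: declared as US-ASCII, NBSP written as a numeric
# character reference.
source = b'<?xml version="1.0" encoding="US-ASCII"?><t>a&#160;b</t>'

# Parsing resolves the reference: the tree holds the real character,
# and the original markup "&#160;" is gone.
root = ET.fromstring(source)
text = root.text

# Hypothetical buggy writer: rebuilds the markup by hand, keeps the old
# declaration, and encodes the resulting string as UTF-8.
rebuilt = '<?xml version="1.0" encoding="US-ASCII"?><t>' + text + "</t>"
broken = rebuilt.encode("utf-8")

print(repr(text))
print(b"&#160;" in broken)      # the reference did not survive
print(b"\xc2\xa0" in broken)    # UTF-8 bytes under a US-ASCII declaration
```

The output still claims US-ASCII while carrying the two-byte UTF-8 form of the NBSP, which matches the symptom described above.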

I suggest you use XSLT to aggregate your
stylesheet fragments into a single stylesheet
(which is what I do in the obfuscation post I
made earlier), so that your end result will be in
UTF-8 and the declaration at the top will
indicate or imply UTF-8.  You will lose the numeric
character reference, as it will be replaced by the
Unicode character, but this is fine because the
declaration will then match the actual encoding of the file.
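
The advice above is to do the aggregation in XSLT. The sketch below is not XSLT; it uses Python's xml.etree.ElementTree serializer (Python 3.8 or later, for the xml_declaration argument) only to illustrate the underlying principle: when an XML-aware serializer writes both the declaration and the bytes, the two agree. The sample document is made up for the illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    b'<?xml version="1.0" encoding="US-ASCII"?><t>a&#160;b</t>'
)

# Serialized as UTF-8: the NBSP is written as raw bytes and the
# declaration names UTF-8, so declaration and content agree.
as_utf8 = ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Serialized as US-ASCII: the serializer cannot emit the raw character,
# so it writes a numeric character reference instead.
as_ascii = ET.tostring(root, encoding="us-ascii", xml_declaration=True)

print(as_utf8)
print(as_ascii)

# Both forms parse back to the same text.
same = ET.fromstring(as_utf8).text == ET.fromstring(as_ascii).text
print(same)
```

Either output is self-consistent. The problem only arises when the bytes are produced by one mechanism and the declaration by another.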

Then treat the aggregated stylesheet in your
encrypt/decrypt process as an octet stream, not
as a string of characters, thus avoiding any
interpretation of UTF-8 on the way in or out.  Or
use strings if you can guarantee fidelity between your input and your output.
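
As a rough sketch of the octet-stream approach, again in Python: the file is read and written as bytes and never decoded to a string, so no character-set interpretation can occur. The file name and the toy_transform function are hypothetical placeholders; the XOR step is not real encryption and merely stands in for whatever encrypt/decrypt routine is actually used.

```python
import tempfile
from pathlib import Path

KEY = 0x5A  # arbitrary value for the placeholder transform


def toy_transform(data: bytes) -> bytes:
    """Placeholder for an encrypt/decrypt step; XOR is its own inverse."""
    return bytes(b ^ KEY for b in data)


# Hypothetical aggregated stylesheet content, already encoded as UTF-8.
original = (
    '<?xml version="1.0" encoding="UTF-8"?><t>a\u00a0b</t>'.encode("utf-8")
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "aggregate.xsl"
    path.write_bytes(original)

    # Read bytes, transform bytes, write bytes: no decode() or encode().
    protected = toy_transform(path.read_bytes())
    restored = toy_transform(protected)

print(restored == original)
```

Because nothing in the round trip ever treats the data as text, the restored file is byte-for-byte identical to the input, whatever its encoding.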

This should get around your characters being
UTF-8 and your declaration being ASCII.

I hope this helps.

. . . . . . . . . . . . Ken


--
Contact us for world-wide XML consulting and instructor-led training
Free 5-hour lecture: http://www.CraneSoftwrights.com/links/udemy.htm
Crane Softwrights Ltd.            http://www.CraneSoftwrights.com/s/
G. Ken Holman                   mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Google+ profile: https://plus.google.com/116832879756988317389/about
Legal business disclaimers:    http://www.CraneSoftwrights.com/legal
