Subject: Re: Integrating a Search and Replace template with the CSV to XML converter
From: Marney Cotterill <marney@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 03 Jun 2008 17:56:49 +1000
Thank you so much, Michael, for your detailed response!
I have thrown myself into XSLT and XML without any prior knowledge, and seem
to have missed quite a few of the basics!
I will look further into encoding types for my own benefit, but what you
have suggested below works absolutely perfectly.
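
For anyone finding this thread in the archive, here is a minimal sketch of
that fix. It is not the actual converter stylesheet: the file name
input.csv, the parameter name csv-uri, and the rows/row output elements are
all made up for illustration. It assumes an XSLT 2.0 processor that
recognises the encoding name 'windows-1252' (the registered name for what
is called cp1252 below); some processors may accept 'cp1252' as well.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: URI of the CSV file, resolved relative
       to the stylesheet when it is a relative URI -->
  <xsl:param name="csv-uri" select="'input.csv'"/>

  <!-- Write the result as UTF-8 so the real Unicode characters survive -->
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <xsl:template name="main">
    <!-- The second argument tells the processor how the bytes of the
         file are encoded, so byte 0x92 is read as U+2019 and byte 0x96
         as U+2013 -->
    <xsl:variable name="text"
        select="unparsed-text($csv-uri, 'windows-1252')"/>
    <rows>
      <!-- Split into lines and skip blank ones; field splitting is
           left out of this sketch -->
      <xsl:for-each select="tokenize($text, '\r?\n')[normalize-space()]">
        <row><xsl:value-of select="."/></row>
      </xsl:for-each>
    </rows>
  </xsl:template>

</xsl:stylesheet>
```

With Saxon, this can be started at the named template with the -it:main
option, since the sketch takes no source document.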
I would like to post my XSLT stylesheet on a template exchange at
www.dnndev.com to be used in conjunction with the Dot Net Nuke add-on module
Xmod. There is a real need for this type of transform in this community.
Andrew, do you have a problem with this? I will make sure you have full
credit!
Kindest Regards,
Marney
On 3/6/08 5:43 PM, "Michael Kay" <mike@xxxxxxxxxxxx> wrote:
>> The characters that are affecting things are part of the
>> UNICODE set 'General Punctuation'. This is translating
>> through the stylesheet fine and is being displayed in the
>> resulting XML by &#146; (right hand quote) and &#150; (en
>> dash). Problem is, my dynamic website does not know how to
>> display these characters, and I am getting the little boxes instead.
>
> It's not surprising that it doesn't know how to display them, since neither
> of these codepoints is assigned to any printable Unicode character. The
> Unicode codepoint for en dash is x2013; the code for "right single quotation
> mark" is x2019.
>
> What has happened is that your input uses the Microsoft-proprietary cp1252
> character encoding. There's no harm in that, provided that the software
> reading the file knows it's in this encoding, so that it can translate such
> characters to their proper Unicode values for use in the output XML.
>>
>> I am thinking of integrating a Global Search and Replace
>> template that runs on the final XML to find all instances of
>> &#146; and replace with ' .
>
> No, you should fix the problem at source rather than patching it up later.
> If you're reading the CSV file using unparsed-text(), and if the CSV file is
> in cp1252 encoding, then you can specify this in the encoding parameter to
> unparsed-text().
>
> Michael Kay
> http://www.saxonica.com/
>
>
Marney Cotterill
graphic designer
cracker//brandware
6 Bourke Street
Queens Park
NSW 2022
Telephone 02 9387 2001
Facsimile 02 9387 2006
marney@xxxxxxxxxxxxxxxxxxxx
www.crackerbrandware.com