XSL-LIST Mailing List Archive

character entities

Subject: character entities
From: Joe Barwell <jbar@xxxxxxxx>
Date: Mon, 03 Nov 2008 20:32:17 +1300
Hello people,

xsl 1.0, Firefox 3.0, Zend Search Lucene, php 5.2.6.

I'm having a wee spot of bother with character entities.

What I'm trying to do:

I have data stored in xml files, which I first pass to an xsl template
in order to transform it into a more usable form (technically, I'm
"flattening" it).

This data is then put into fields within a Zend Search Lucene index, via
php (that's why I first "flattened" it).

This index data is then queried (again via php) and the results sent
to/rendered by a browser.

If I put &#241_; (minus the underscore character, which I've added so
this email is not mis-parsed) in my original xml, and use
encoding="iso-8859-1" for both it and my xsl stylesheet, then my xsl
transforms that into a (Spanish) n character with a tilde on top: ñ.

If I tell ZSL to index fields using 'iso-8859-1' encoding, my Spanish n
becomes: ÃÂ±. If I tell ZSL to index fields using 'utf-8' encoding, my
Spanish n becomes: Ã±.
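
Symptoms like these are commonly produced when the UTF-8 bytes of a character are decoded as ISO-8859-1 somewhere along the way, once or twice over. The following Python sketch only illustrates that general mechanism. It is an assumption that something similar happens between the xsl output and the index; nothing in this message confirms where (or whether) it does.

```python
# Illustration only: how one character turns into two or four
# when UTF-8 bytes are mis-read as ISO-8859-1.
original = "\u00f1"  # n with tilde

# Correct UTF-8 encoding: two bytes, 0xC3 0xB1.
utf8_bytes = original.encode("utf-8")

# Mis-decoding those two bytes as ISO-8859-1 yields two characters.
misread_once = utf8_bytes.decode("iso-8859-1")

# Re-encoding that garbage as UTF-8 and mis-decoding it again
# yields four characters (one of them an invisible control code).
misread_twice = misread_once.encode("utf-8").decode("iso-8859-1")

# ascii() keeps the output printable on any terminal.
print(ascii(utf8_bytes))     # b'\xc3\xb1'
print(ascii(misread_once))   # '\xc3\xb1'
print(ascii(misread_twice))  # '\xc3\x83\xc2\xb1'
```

If this is the mechanism at work, the thing to look for is any step that is handed UTF-8 bytes while being told they are ISO-8859-1.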

I've looked at dpawson on encoding, and Mike Brown's tutorial at
skew.org. They're v. good, but don't quite seem to explain where I'm
going wrong (or more likely, I'm just oblivious to what's under my nose).

I believe I need to prevent all parsers bar the browser at the end from
parsing my "special characters", right? But how?

I have tried putting a dtd with an entity declaration inside my original
xml, but although that works--i.e. using:

<!DOCTYPE wine [
<!ENTITY ntilde "&#241;">
]>

I can then put: &ntilde; inside my xml, this still gets parsed into: ñ
by my xsl, & then stored as: Ã± in lucene, and displayed as: Ã± in my
browser.
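
An XML parser resolves a numeric character reference, a declared entity and a literal character to the same thing, so the choice of notation in the source makes no difference to what comes out. A minimal sketch using Python's standard `xml.etree.ElementTree` (not the PHP/xsl stack described here; the sample word "año" is made up for illustration):

```python
import xml.etree.ElementTree as ET

# Three ways of writing the same character in the source document.
with_char_ref = "<wine>a&#241;o</wine>"
with_entity = (
    '<!DOCTYPE wine [<!ENTITY ntilde "&#241;">]>'
    "<wine>a&ntilde;o</wine>"
)
with_literal = "<wine>a\u00f1o</wine>"

# After parsing, all three contain the identical text.
texts = [ET.fromstring(doc).text
         for doc in (with_char_ref, with_entity, with_literal)]
all_same = texts[0] == texts[1] == texts[2] == "a\u00f1o"

# How the character is written out depends on the output encoding only.
root = ET.fromstring(with_entity)
as_ascii = ET.tostring(root)                          # uses &#241;
as_latin1 = ET.tostring(root, encoding="iso-8859-1")  # one byte, 0xF1
as_utf8 = ET.tostring(root, encoding="utf-8")         # two bytes, 0xC3 0xB1
```

On this view the parsers cannot be stopped from "parsing" the character, and do not need to be. What has to agree is the encoding each stage writes and the encoding the next stage assumes when reading.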

I've also tried playing around with php's htmlspecialchars() function,
to no avail.

Latest effort: I tried using encoding="utf-8" for all levels: my
original xml, my xsl output, and the input to ZSL's index, & I also
saved my xml file as utf-8 format, and used the Spanish n inside my xml,
i.e. ñ rather than &#241;. Doing that, the Spanish n was preserved
through the xsl output, but ZSL stores it as: Ã±, & that's also how my
browser displays it.
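
One way to narrow this down is to take the garbled string at each stage and count how many times it can be "un-misread", i.e. encoded back to ISO-8859-1 and decoded as UTF-8. The helper below is a hypothetical diagnostic written in Python, not part of PHP, xsl or ZSL, and it assumes the garbling comes from UTF-8 read as ISO-8859-1.

```python
def undo_misdecoding(text, max_layers=3):
    """Reverse UTF-8-read-as-ISO-8859-1 damage.

    Returns (repaired_text, layers), where layers is the number of
    times the reversal succeeded. Hypothetical helper for diagnosis.
    """
    current = text
    layers = 0
    for _ in range(max_layers):
        try:
            repaired = current.encode("iso-8859-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # not (or no longer) mis-decoded UTF-8
        if repaired == current:
            break  # plain ASCII: nothing to undo
        current = repaired
        layers += 1
    return current, layers


# A correctly stored character needs no repair.
clean = undo_misdecoding("\u00f1")
# Two-character garbage: one layer of damage.
single = undo_misdecoding("\u00c3\u00b1")
# Four-character garbage: two layers of damage.
double = undo_misdecoding("\u00c3\u0083\u00c2\u00b1")
```

A layer count of 0 at one stage and 1 at the next would point at the hand-over between those two stages. If the count is already 0 in the stored data, the bytes may be fine and the page may simply be served or read with the wrong charset.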

I've run out of ideas. Any suggestions? Ta.

Joe
