Re: character entities
Hi,

> I'm having a wee spot of bother with character entities.

It's character encoding rather than character entities.

> This data is then put into fields within a Zend Search Lucene index, via
> php (that's why I first "flattened" it).
>
> This index data is then queried (again via php) and the results sent
> to/rendered by a browser.
>
> If I put &#241_; (minus the underline character, which I've added so
> this email is not mis-parsed) in my original xml, and using
> encoding="iso-8859-1" for it and my xsl stylesheet, then my xsl
> transforms that into a (Spanish) n character with a tilde on top: ñ.
>
> If I tell ZSL to index fields using 'iso-8859-1' encoding, my Spanish n
> becomes: CB1. If I tell ZSL to index fields using 'utf-8' encoding, my
> Spanish n becomes: C1.

These sorts of issues are nearly always a case of writing in one encoding and reading in another, and you just need to track down where the reading and writing is happening - it could be a string to byte conversion in your code, or parsing of the markup in the browser, or even the text viewer you are using to check the output (such as the Eclipse output window).

> I believe I need to prevent all parsers bar the browser at the end from
> parsing my "special characters", right? But how?

Not really, that's just a way of bypassing encoding problems and doesn't address the underlying issue.

> Latest effort: I tried using encoding="utf-8" for all levels: my original
> xml, my xsl output, and the input to ZSL's index, & I also saved my xml file
> as utf-8 format, and used the Spanish n inside my xml, i.e. ñ rather than
> &#241;. Doing that, the Spanish n was preserved through the xsl output, but
> ZSL stores it as: C1, & that's also how my browser displays it.

Ahh ok, well that's the right approach, you just need to examine the code at every step and isolate that point where it's going wrong - you've got to the output of transform ok, next is to carefully step through what happens between that and "ZSL".
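The mismatch described above (bytes written in one encoding, then read in another) can be reproduced in a few lines. The sketch below uses Python purely for illustration, which is an assumption since the thread's pipeline is PHP and XSLT. It does not reproduce the Zend Search Lucene setup; it only shows how one n-tilde turns into two or more characters when UTF-8 bytes are read as ISO-8859-1. The `show` helper is a hypothetical name introduced here.

```python
# The character under discussion: n with tilde, U+00F1 (character reference &#241;).
text = "\u00f1"

# Writing: the same character becomes different bytes depending on the encoding.
utf8_bytes = text.encode("utf-8")          # two bytes: C3 B1
latin1_bytes = text.encode("iso-8859-1")   # one byte:  F1

# Reading with the matching encoding round-trips cleanly.
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("iso-8859-1") == text

# Reading UTF-8 bytes as ISO-8859-1 yields two characters instead of one.
mismatched = utf8_bytes.decode("iso-8859-1")

# Making the same mistake a second time compounds the damage (double encoding).
double = mismatched.encode("utf-8").decode("iso-8859-1")


def show(label, s):
    """Print the code points of s so the result does not depend on the terminal's encoding."""
    print(label, [f"U+{ord(c):04X}" for c in s])


show("original:   ", text)
show("mismatched: ", mismatched)
show("double:     ", double)
```

Printing code points or raw bytes at each hand-off (transform output, index input, index output, HTTP response) is one way to find the step where the one-character form first turns into the multi-character form, without trusting whatever viewer is displaying the text.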
Using the actual n-tilde character or the character reference 241 shouldn't make any difference, by the way...

cheers
--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
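On the last point above, a small check that a numeric character reference and the literal character parse to the same text. This is a sketch using Python's standard `xml.etree.ElementTree` rather than the XSLT processor from the thread, and the sample element name and the value "España" are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Same content three ways: a numeric character reference, a literal
# character stored as ISO-8859-1, and a literal character stored as UTF-8.
# In each case the encoding declaration matches the bytes actually written.
with_reference = b'<?xml version="1.0" encoding="iso-8859-1"?><name>Espa&#241;a</name>'
literal_latin1 = '<?xml version="1.0" encoding="iso-8859-1"?><name>Espa\u00f1a</name>'.encode("iso-8859-1")
literal_utf8 = '<?xml version="1.0" encoding="utf-8"?><name>Espa\u00f1a</name>'.encode("utf-8")

# After parsing, all three documents carry the same string.
parsed = [ET.fromstring(doc).text for doc in (with_reference, literal_latin1, literal_utf8)]

for value in parsed:
    print([f"U+{ord(c):04X}" for c in value])
```

Once the parser has done its job the application only sees the character U+00F1, so the choice between reference and literal in the source file cannot by itself explain corruption further down the pipeline.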