
RE: 8bit ascii encoding

Subject: RE: 8bit ascii encoding
From: "Andrew Welch" <awelch@xxxxxxxxxxxxxxx>
Date: Fri, 23 Aug 2002 13:53:32 +0100
ha! no wonder I get confused...

> If each char (in Unicode 2) is in 2 bytes you are using utf-16 not
> utf-8. (Some Unicode 3 characters require more than 2 bytes even in
> utf-16, the so-called surrogate pairs.) utf-8 requires 1 - 4 bytes,
> depending on the character.

If my chars are two bytes each then I'm using utf-16, but utf-8 can
consist of 1-4 bytes per char... I think I need to read some more.
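
To make the byte counts concrete, here is a small sketch in Python (used purely for illustration; the sample characters are arbitrary choices, not taken from the thread). It reports how many bytes a single character occupies in UTF-8 and in UTF-16.

```python
# Sample characters chosen for illustration only (not from the thread).
samples = {
    "A": "A",                # U+0041, ASCII
    "e-acute": "\u00e9",     # U+00E9, in the Latin-1 range
    "euro": "\u20ac",        # U+20AC, in the Basic Multilingual Plane
    "g-clef": "\U0001d11e",  # U+1D11E, outside the Basic Multilingual Plane
}


def byte_lengths(ch):
    """Return (utf8_len, utf16_len) in bytes for a single character."""
    # "utf-16-be" avoids the 2-byte byte-order mark that plain "utf-16" prepends.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


for name, ch in samples.items():
    utf8_len, utf16_len = byte_lengths(ch)
    print(f"{name}: U+{ord(ch):04X} utf-8={utf8_len} utf-16={utf16_len}")
```

In this sample only the last character needs a surrogate pair in UTF-16 (4 bytes), and UTF-8 as currently specified in RFC 3629 never uses more than 4 bytes for one character.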

At the moment, I'm using the xml output method with ascii encoding, and
telling IE the encoding is utf-8 (in the meta), so any chars not in
ascii should be output as character references and displayed correctly
in IE, since that is set to UTF-8.
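
The effect of serializing as ascii can be imitated outside XSLT. The following is only a sketch in Python, assuming the `xmlcharrefreplace` error handler is a fair stand-in for what an XSLT serializer does with unrepresentable characters; the helper name and the sample text are made up for illustration.

```python
def to_ascii_with_refs(text):
    """Encode text as ASCII bytes, writing other characters as &#N; references."""
    return text.encode("ascii", errors="xmlcharrefreplace")


sample = "caf\u00e9 \u20ac5"  # illustrative text, not from the thread
encoded = to_ascii_with_refs(sample)
print(encoded)

# Pure-ASCII bytes are also valid UTF-8, so labelling them as UTF-8 is harmless.
print(encoded.decode("utf-8"))
```

Because every byte in the result is below 0x80, the same bytes are valid in ascii, iso-8859-1 and utf-8, which is why the label in the meta element should not matter for this kind of output.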

Currently, this results in any chars not in the ascii range being
displayed as a single square box, which is progress from before, when I
was getting between 3 and 7 chars displayed for any 'special' character...

Anyway, this is getting slightly off-topic and I think I'm fighting a
losing battle as anything I do has to go through the ActiveX control,
which I haven't got control of (or any understanding of ;) so I'll call
it a day for now.

Thanks for the continuing education in character encoding - one day I
will get it!

cheers
andrew 



> -----Original Message-----
> From: David Carlisle [mailto:davidc@xxxxxxxxx]
> Sent: 23 August 2002 12:33
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re:  8bit ascii encoding
> 
> 
> 
> > Yeah... anywhere nice?
> 
> I would say that it was suitably far from computers, but it seems that
> even 3000m up a Swiss mountain you still expect to find an internet cafe
> these days (I resisted the urge to log in and answer any xsl-list
> messages though :-)
> 
> > ha.. nice.  After some testing it seems that char references display
> > fine, while characters themselves do not 
> 
> Well, presumably they would if you wrote the characters in the right
> encoding. At a guess, it sounds like you are writing bytes that
> correspond to iso-8859-1 characters into a utf-8 encoded stream. If so
> you'll get the wrong characters (or more often an error) except for
> that part of utf-8 that happens to use one byte per character.
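
The mismatch described here can be reproduced with a short Python sketch. The helper name and the sample strings are assumptions for illustration; nothing in the thread says which characters were involved.

```python
def decode_as_utf8(raw):
    """Try to read raw bytes as UTF-8; return the text, or None if invalid."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Every byte is below 0x80, so iso-8859-1 and utf-8 agree.
ascii_only = "cafe".encode("iso-8859-1")

# e-acute becomes the single byte 0xE9, which is not valid UTF-8 on its own.
accented = "caf\u00e9".encode("iso-8859-1")

# Two iso-8859-1 characters whose bytes (0xC3 0xA9) happen to form valid UTF-8.
lookalike = "\u00c3\u00a9".encode("iso-8859-1")

print(decode_as_utf8(ascii_only))
print(decode_as_utf8(accented))
print(decode_as_utf8(lookalike))
```

The first case round-trips, the second is rejected, and the third decodes without complaint but to a different character than the one written, which is the "wrong characters" case.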
> 
> > I think the reason IE isn't picking up that each char is two
> > bytes (utf-8)
> 
> If each char (in Unicode 2) is in 2 bytes you are using utf-16 not
> utf-8. (Some Unicode 3 characters require more than 2 bytes even in
> utf-16, the so-called surrogate pairs.) utf-8 requires 1 - 4 bytes,
> depending on the character.
> 
> 
> > So I guess I have two options...
> > 
> > 1. persevere trying to get IE to treat the output as two byte chars 
> 
> I think your problem is using the phrase "two byte chars", which leads
> to confusion. Characters have a Unicode number but do not correspond
> directly to any number of bytes. Different encodings map subsets of
> the Unicode character set into particular byte combinations.
> 
> 
> > 2. pass through all char refs to the output un-escaped, and let IE
> > escape them...
> 
> All character references are replaced by the referenced character by
> an XML parser. So there is no way to "pass through" references
> unchanged. The XSLT system cannot tell whether a reference or a
> character was in the original data.
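
This is easy to check with any XML parser. A minimal sketch using Python's standard `xml.etree.ElementTree` (the element name and the text are made up for illustration):

```python
import xml.etree.ElementTree as ET


def parsed_text(xml_source):
    """Parse a small XML document and return the text of its root element."""
    return ET.fromstring(xml_source).text


from_reference = parsed_text("<p>caf&#233;</p>")   # numeric character reference
from_character = parsed_text("<p>caf\u00e9</p>")   # the literal character

# After parsing, both inputs give the same string, so nothing downstream
# can tell which form the source document used.
print(from_reference == from_character)
```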
> 
> 
> > Is this the best option?
> It is still not clear what you are trying to do, but there should be no
> real reason why your C part cannot handle whatever encoding is coming
> out of the XSLT. It isn't clear from your description whether this is
> utf-8 or utf-16. You may find it easier if you specified
> encoding="iso-8859-1" and used latin-1 in the C part.
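
What makes latin-1 convenient for byte-oriented code is that every character it covers is exactly one byte. A hedged Python sketch (the helper name and sample strings are illustrative assumptions, and Python's strict encoder stands in for the serializer here):

```python
def to_latin1(text):
    """Encode text as ISO-8859-1; return None if a character is outside that set."""
    try:
        return text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return None


in_range = to_latin1("caf\u00e9")   # 4 characters, 4 bytes
out_of_range = to_latin1("\u20ac5")  # the euro sign is not in ISO-8859-1

print(in_range, len(in_range))
print(out_of_range)
```

Unlike this strict encoder, an XSLT processor using the xml output method would normally write a character reference for a character the chosen encoding cannot represent, so the C side still only ever sees single-byte characters.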
> 
> David
> 
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list