
RE: nbsp is not that hard, folks

Subject: RE: nbsp is not that hard, folks
From: "Américo Albuquerque" <aalbuquerque@xxxxxxxxxxxxxxxx>
Date: Sat, 9 Nov 2002 11:53:14 -0000
Hi there.
So, what you are saying is that &nbsp; is to XML and HTML as "#define
nbsp" is to C?

-----Original Message-----
From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx
[mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Mike Brown
Sent: Friday, November 08, 2002 7:13 AM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject:  nbsp is not that hard, folks


Brian Grainger wrote:
> If you're trying to escape &nbsp; in a document encoded as UTF-8, you
> have to use Unicode escaping of the UTF-8 representation of the
> entity. In this case, &nbsp; is equal to &#160;, and &#160; encoded as
> UTF-8 is \u00A0.

Good grief. No, you have your terminology badly mixed up, and you're
throwing in an irrelevant notation. "&nbsp;", "&#160;" and "\u00A0" have
nothing, NOTHING to do with UTF-8. There is something about nbsp that
just confuses the heck out of people. I think it must be the fact that
it looks like a space, and that you don't have an nbsp key on your
keyboard.

OK, read this.

1. There is a character -- an abstract unit in a "script" (a writing
system; we are using Latin right now) -- called NO-BREAK SPACE by the
Unicode Standard and ISO/IEC 10646. Unicode and ISO/IEC 10646 assign this
character an integer number, 160, which is A0 in hex. We say Unicode all
the time around here, but we mean ISO/IEC 10646 because that's what the
XML and HTML specs reference. The two standards share the same character
repertoire and numbering, so there's no harm.
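A quick way to see that point 1 is about an abstract character and its number, with no bytes involved yet, is Python 3's standard `unicodedata` module. This is only a sketch, assuming Python 3, where a `str` holds characters rather than encoded bytes:

```python
import unicodedata

# One abstract character, identified purely by its number (no encoding yet).
nbsp = chr(160)

print(unicodedata.name(nbsp))  # NO-BREAK SPACE
print(ord(nbsp))               # 160
print(hex(ord(nbsp)))          # 0xa0
print(f"U+{ord(nbsp):04X}")    # U+00A0
```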

2. UTF-8 is an encoding scheme that provides a way of representing any
of the approximately 1.1 million possible abstract characters in Unicode
as a sequence of 1 to 4 bytes. The UTF-8 representation of Unicode
character 160 (no-break space) is the pair of bytes C2 A0, in that
order. In contrast, iso-8859-1 is a character map that can represent
only the first 256 Unicode characters, each as a single byte. us-ascii
is an even more limited set of just the first 128, each mapped to a
single byte.
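The byte sequences in point 2 can be checked with Python's built-in codecs (Python 3.8 or later, for `bytes.hex` with a separator). The `utf8_two_bytes` helper is hypothetical, written only to show where C2 A0 comes from: UTF-8 spreads the 11 significant bits of any code point from U+0080 to U+07FF over a `110xxxxx 10xxxxxx` byte pair.

```python
def utf8_two_bytes(code_point: int) -> bytes:
    """Hypothetical helper: hand-encode a code point in UTF-8's two-byte range."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only U+0080..U+07FF use the two-byte form")
    lead = 0xC0 | (code_point >> 6)      # 110xxxxx carries the top 5 bits
    trail = 0x80 | (code_point & 0x3F)   # 10xxxxxx carries the low 6 bits
    return bytes([lead, trail])

nbsp = "\u00A0"

print(nbsp.encode("utf-8").hex(" "))       # c2 a0
print(utf8_two_bytes(160).hex(" "))        # c2 a0
print(nbsp.encode("iso-8859-1").hex(" "))  # a0

# us-ascii stops at 127, so it has no byte for character 160 at all.
try:
    nbsp.encode("us-ascii")
except UnicodeEncodeError as err:
    print("us-ascii cannot represent it:", err.reason)
```

The hand-rolled helper and the built-in codec agree, which is the point: the character is 160 either way, and only the byte representation changes with the encoding.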

3. This thing:  \u00A0
  - is a sequence of 6 bytes (ASCII bytes for backslash, u, zero, zero,
     A, zero);
  - has special meaning in a programming language like Java or Python,
     where it is essentially a macro for the no-break space character;
  - is used when representing the character directly as encoded bytes is
     impractical or impossible.
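In Python 3 the gap between the six-character escape and the one character it stands for shows up when comparing a raw string literal, which keeps the backslash, with an ordinary literal, where the compiler replaces the escape. A small sketch:

```python
escape_text = r"\u00A0"   # raw literal: backslash, u, 0, 0, A, 0
character = "\u00A0"      # ordinary literal: the escape becomes one character

print(len(escape_text))               # 6
print(escape_text.encode("ascii"))    # b'\\u00A0'
print(len(character))                 # 1
print(character == chr(160))          # True

# Interpreting the escape at run time yields the same single character.
print(escape_text.encode("ascii").decode("unicode_escape") == character)  # True
```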

4. This thing:  &#160;
or this thing:  &#xA0;
  - is to SGML applications like HTML and XML what \u00A0 is to Java &
     Python;
  - is called a character reference (or "numeric character reference").
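A conforming XML parser resolves both forms of character reference to the same single character before the application ever sees the text. A sketch using Python's standard `xml.etree.ElementTree` (the one-element documents are made up for illustration):

```python
import xml.etree.ElementTree as ET

decimal = ET.fromstring("<p>&#160;</p>").text
hexadecimal = ET.fromstring("<p>&#xA0;</p>").text

print(decimal == hexadecimal == "\u00A0")  # True
print(len(decimal))  # 1: the reference is gone, only the character remains
```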

5. This thing:  &nbsp;
  - is to SGML applications like HTML and XML an "entity reference";
  - refers to an entity (a separate collection of information) named
     nbsp;
  - depending on the circumstances, is intended to be treated by the
     XML parser or HTML user agent as equivalent to the entity's
     "replacement text";
  - is, in HTML, predefined to have the replacement text of just one
     character, the no-break space;
  - is not defined by default in XML.
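Python's standard library shows both halves of point 5: `html.unescape` knows the HTML entity set, while an XML parser rejects `&nbsp;` unless the document declares it, here in an internal DTD subset. The `<!ENTITY nbsp "&#160;">` declaration is a common idiom rather than anything required, and the sample documents are made up for illustration.

```python
import html
import xml.etree.ElementTree as ET

# HTML: nbsp is predefined, so the reference maps straight to U+00A0.
print(html.unescape("&nbsp;") == "\u00A0")  # True

# XML: no nbsp entity exists by default, so the parser reports an error.
try:
    ET.fromstring("<p>&nbsp;</p>")
except ET.ParseError as err:
    print("undeclared:", err)

# XML with a declaration: the &#160; in the entity value is expanded when
# the declaration is read, so the replacement text is the single character.
declared = ET.fromstring(
    '<!DOCTYPE p [<!ENTITY nbsp "&#160;">]>'
    "<p>&nbsp;</p>"
).text
print(declared == "\u00A0")  # True
```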

6. The thing here in between the quotes:   " "
  - is byte 0xA0;
  - is intended to be a no-break space because this email is iso-8859-1
     encoded;
  - has exactly the same meaning in an XML document as &#160;.
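The equivalence in point 6 can be checked by handing a parser the raw byte 0xA0 inside a document that declares iso-8859-1, then comparing the result with a character reference. A sketch with Python's `xml.etree.ElementTree`; note that the input is `bytes`, so the parser does the decoding itself:

```python
import xml.etree.ElementTree as ET

# Raw byte 0xA0 inside an iso-8859-1 encoded document.
raw = b'<?xml version="1.0" encoding="iso-8859-1"?><p>\xa0</p>'
# The same character written as a character reference in plain ASCII.
ref = b"<p>&#160;</p>"

print(ET.fromstring(raw).text == ET.fromstring(ref).text)  # True

# Re-encoded as UTF-8, the same character becomes two bytes instead of one.
print(ET.fromstring(raw).text.encode("utf-8"))  # b'\xc2\xa0'
```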

   - Mike
____________________________________________________________________________
  mike j. brown                   |  xml/xslt: http://skew.org/xml/
  denver/boulder, colorado, usa   |  resume: http://skew.org/~mike/resume/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


