nbsp is not that hard, folks

Subject: nbsp is not that hard, folks
From: Mike Brown <mike@xxxxxxxx>
Date: Fri, 8 Nov 2002 00:12:57 -0700 (MST)
Brian Grainger wrote:
> If you're trying to escape &nbsp; in a document encoded as
> UTF-8, you have to use Unicode escaping of the UTF-8
> representation of the entity. In this case, &nbsp; is equal to
> &#160;, and &#160; encoded as UTF-8 is \u00A0.

Good grief. No, you have your terminology badly mixed up, and you're throwing
in an irrelevant notation. "&nbsp;" "&#160;" and "\u00A0"  have nothing,
NOTHING to do with UTF-8. There is something about nbsp that just confuses the
heck out of people. I think it must be the fact that it looks like a space,
and that you don't have an nbsp key on your keyboard.

OK, read this.

1. There is a character -- an abstract unit in a "script" (a writing system;  
we are using Latin right now) -- called NO-BREAK SPACE by the Unicode Standard
and ISO/IEC 10646. Unicode and ISO/IEC 10646 assign this character an integer
number, 160, which is A0 in hex. We say Unicode all the time around here, but 
we mean ISO/IEC 10646 because that's what the XML and HTML specs reference. 
The two standards share the same character repertoire and numbering so there's 
no harm.
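If you want to see that for yourself, here is a minimal Python sketch (any language with Unicode strings would do). It asks the standard `unicodedata` module for the name of code point 160:

```python
import unicodedata

# The no-break space as an abstract character: one code point, 160 decimal / A0 hex.
nbsp = chr(160)

print(ord(nbsp))               # 160
print(hex(ord(nbsp)))          # 0xa0
print(unicodedata.name(nbsp))  # NO-BREAK SPACE
```

No bytes and no markup are involved yet. It's just a number and a name.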

2. UTF-8 is an encoding scheme that provides a way of representing any of the
approximately 1.1 million possible abstract characters in Unicode as a
sequence of 1 to 4 bytes. The UTF-8 representation of Unicode character
160 (no-break space) is the pair of bytes C2 A0, in that order. In contrast,
iso-8859-1 is a character map that represents each of the first 256 Unicode
characters as a single byte. us-ascii is an even more limited set: just the
first 128, each mapped to a single byte.
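Same character, different byte sequences depending on the encoding. A small Python sketch using only the built-in codecs:

```python
# One abstract character, encoded three different ways.
nbsp = "\u00a0"

utf8 = nbsp.encode("utf-8")         # two bytes
latin1 = nbsp.encode("iso-8859-1")  # one byte
print(utf8.hex())                   # c2a0
print(latin1.hex())                 # a0

# us-ascii only covers code points 0-127, so there is no byte for nbsp at all.
try:
    nbsp.encode("us-ascii")
except UnicodeEncodeError as err:
    print("us-ascii:", err.reason)  # ordinal not in range(128)
```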

3. This thing:  \u00A0
  - is a sequence of 6 bytes (ASCII bytes for backslash, u, zero, zero, A, zero);
  - has special meaning in a programming language like Java or Python,
     where it is essentially a macro for the no-break space character;
  - is used when representing the character directly as encoded bytes is
     impractical or impossible.
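In Python terms, the difference between the six characters of escape text and the one character they stand for looks like this. It's a sketch; the raw string `r"..."` keeps the backslash from being interpreted:

```python
# Six characters of text: backslash, u, 0, 0, A, 0 ...
escape_text = r"\u00A0"
print(len(escape_text))     # 6

# ... which an ordinary string literal turns into one character, the no-break space.
decoded = "\u00A0"
print(len(decoded))         # 1
print(decoded == chr(160))  # True

# The same translation done at run time on the 6-character text.
print(escape_text.encode("ascii").decode("unicode_escape") == decoded)  # True
```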

4. This thing:  &#160;
or this thing:  &#xA0;
  - is to SGML applications like HTML and XML what \u00A0 is to Java & Python;
  - is called a character reference (or "numeric character reference").
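An XML parser resolves a character reference before your code ever sees it. A sketch with Python's standard `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal character references name the same code point.
dec = ET.fromstring("<p>&#160;</p>").text
hexa = ET.fromstring("<p>&#xA0;</p>").text

print(dec == hexa == "\u00a0")  # True
print(len(dec))                 # 1: the reference is already gone, only the character remains
```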

5. This thing:  &nbsp;
  - is to SGML applications like HTML and XML an "entity reference";
  - refers to an entity (a separate collection of information) named nbsp;
  - depending on the circumstances, is intended to be treated by the 
     XML parser or HTML user agent as equivalent to the entity's
     "replacement text";
  - is, in HTML, predefined to have the replacement text of just one 
     character, the no-break space;
  - is not defined by default in XML, which predefines only lt, gt, amp,
     apos, and quot.
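Here is the HTML/XML contrast as a Python sketch. `html.unescape` knows the HTML entity set. A plain XML parser rejects `&nbsp;` unless the document declares it itself, here in an internal DTD subset whose replacement text is the character reference `&#160;`:

```python
import html
import xml.etree.ElementTree as ET

# HTML predefines nbsp, so an HTML-aware unescaper resolves it.
print(html.unescape("&nbsp;") == "\u00a0")  # True

# XML does not predefine nbsp, so an undeclared &nbsp; is a well-formedness error.
try:
    ET.fromstring("<p>&nbsp;</p>")
    nbsp_known_to_xml = True
except ET.ParseError:
    nbsp_known_to_xml = False
print(nbsp_known_to_xml)  # False

# Declare the entity in the document and the parser substitutes its replacement text.
doc = '<!DOCTYPE p [<!ENTITY nbsp "&#160;">]><p>a&nbsp;b</p>'
print(ET.fromstring(doc).text == "a\u00a0b")  # True
```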

6. The thing here in between the quotes:   " "
  - is byte 0xA0;
  - is intended to be a no-break space because this email is iso-8859-1 
     encoded;
  - has exactly the same meaning in an XML document as &#160;, provided
     the document's declared encoding is iso-8859-1.
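The catch is that the parser has to know the bytes are iso-8859-1. With no encoding declaration, XML defaults to UTF-8, and in UTF-8 a lone 0xA0 byte is not a character at all. A Python sketch of both cases:

```python
import xml.etree.ElementTree as ET

# Byte 0xA0 in a document declared as iso-8859-1 ...
raw = b'<?xml version="1.0" encoding="iso-8859-1"?><p>\xa0</p>'
from_byte = ET.fromstring(raw).text

# ... means the same thing as the character reference &#160;.
from_ref = ET.fromstring("<p>&#160;</p>").text
print(from_byte == from_ref)  # True

# Without the declaration the parser assumes UTF-8, where a lone 0xA0 byte is invalid.
try:
    ET.fromstring(b"<p>\xa0</p>")
    lone_byte_ok = True
except ET.ParseError:
    lone_byte_ok = False
print(lone_byte_ok)  # False
```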

   - Mike
____________________________________________________________________________
  mike j. brown                   |  xml/xslt: http://skew.org/xml/
  denver/boulder, colorado, usa   |  resume: http://skew.org/~mike/resume/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

