Subject: Re: xslt replace special characters
From: Mike Brown <mike@xxxxxxxx>
Date: Mon, 11 Nov 2002 13:38:52 -0700 (MST)
Alice Fan wrote:
> Thanks Greg.  Right in the UI, we want the user to enter their URL. Their 
> URL will most likely have name/value pairs.  Is there an easier way?  There 
> is no other way of filtering '&' before it gets processed in the XSL?

It doesn't matter if they're entering a URL/URI or not. Any text that you 
intend to put into an XML document needs to be screened, to preserve 
well-formedness / parseability.
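
To make the URL case concrete, here is a minimal sketch. Python, the sample 
URL, and the helper name parses() are all assumptions for illustration; 
nothing in this thread specifies a language. It only shows that a raw '&' 
makes the document unparseable, while the escaped form survives a round trip.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical user input: a URL with name/value pairs.
user_url = "http://example.com/search?a=1&b=2"


def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw_doc = "<link>" + user_url + "</link>"              # '&' left as-is
escaped_doc = "<link>" + escape(user_url) + "</link>"  # '&' becomes '&amp;'

print(parses(raw_doc))      # False
print(parses(escaped_doc))  # True

# The parser hands back the original URL, unescaped.
print(ET.fromstring(escaped_doc).text == user_url)  # True
```

In practice, building the document through an XML API rather than by string 
concatenation gets you this escaping without having to think about it.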

1. Always note the following:

- non-XML characters need to be removed or replaced
  (U+0000..U+0008, U+000B, U+000C, U+000E..U+001F, U+D800..U+DFFF,
  U+FFFE, U+FFFF)

- a string is not a URI if it violates URI syntax, so if the text is
   destined for a URI-pseudotype attribute value (like href or src in 
   HTML/XHTML), characters above U+007F should be escaped by writing
   their equivalent UTF-8 bytes as '%xx' for each byte, where xx is the
   hex notation for the byte (though this isn't strictly necessary; a 
   conforming HTML user agent will do this automatically)

- additional translation of ASCII-range characters (U+0000..U+007F) in 
   text destined for URI attributes is not required but is wise, to
   ensure conformance to URI syntax; %-escape everything except
   a-z, A-Z, 0-9, and these: - _ . ! ~ * ' ( ) ; / ? : @ & = + $ , [ ]
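
The points in this list could be sketched roughly as follows. This is Python 
again, by assumption; the function names strip_non_xml() and escape_uri() 
are made up for the example, and the choice to delete offending characters 
rather than replace them is just one possible policy.

```python
import re
from urllib.parse import quote

# Characters that are not allowed in XML 1.0: C0 controls other than
# tab/LF/CR, the surrogate range, and U+FFFE / U+FFFF.
NON_XML = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]")

# ASCII characters left alone in URI text, per the list above.
# (Letters, digits, and "-_.~" are never escaped by quote().)
URI_SAFE = "-_.!~*'();/?:@&=+$,[]"


def strip_non_xml(text, replacement=""):
    """Remove (or replace) characters that cannot appear in XML 1.0."""
    return NON_XML.sub(replacement, text)


def escape_uri(text):
    """%-escape text destined for a URI attribute such as href or src.

    Non-ASCII characters become the %xx form of their UTF-8 bytes.
    Assumption: the input is not already %-escaped; '%' is not in the
    safe list, so existing escapes would be escaped a second time.
    """
    # Strip first: a lone surrogate cannot be encoded as UTF-8.
    return quote(strip_non_xml(text), safe=URI_SAFE)


print(repr(strip_non_xml("a\x00b\x0bc\td")))
# 'abc\td'
print(escape_uri("http://example.com/a b?x=\u00e9&y=1"))
# http://example.com/a%20b?x=%C3%A9&y=1
```

Note that the '&' is still there after URI escaping. That is correct as far 
as the URI goes; it still has to be escaped as markup when the document is 
serialized.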

2. If and when the XML document exists in serialized form
   (i.e., as a string, not as a DOM object), note the following:

- if the text is not destined for a CDATA section, markup characters '&'
   and '<' need to be escaped

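- if the text is destined for a CDATA section, it must not contain ']]>';
   nothing can be escaped inside a CDATA section, so the sequence has to
   be split across two sections (or the '>' written as &gt; outside one)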

- if the text is destined for a comment, it must not contain '--'
   (how you handle such an offense is up to you)

- if the text is destined for an attribute value delimited by apostrophes,
   then apostrophes in the value must be escaped (usually use &apos;, or
   &#39; in HTML, since HTML 4 does not define &apos;)

- if the text is destined for an attribute value delimited by quotes,
   then quotes in the value must be escaped (usually use &quot;)

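- if the text is destined for a non-URI attribute value, then tab, LF, 
   and CR need to be escaped as character references (&#9;, &#10;, &#13;)
   to facilitate round-tripping; left literal, they are normalized to 
   spaces when the attribute is parsed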
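
A rough sketch of the serialization cases in this list, again in Python by 
assumption. The function names are hypothetical, and rejecting a bad comment 
with an exception is only one way to handle it. The only library call used 
for escaping is xml.sax.saxutils.escape(), which handles '&', '<', and '>' 
and accepts a dict of extra replacements.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Whitespace that attribute-value normalization would turn into spaces.
ATTR_WHITESPACE = {"\t": "&#9;", "\n": "&#10;", "\r": "&#13;"}


def escape_text(text):
    """Element content outside a CDATA section."""
    return escape(text)


def escape_attr(value, delimiter='"'):
    """Return an attribute value, escaped and wrapped in its delimiter."""
    out = escape(value, ATTR_WHITESPACE)
    if delimiter == '"':
        out = out.replace('"', "&quot;")
    else:
        # &apos; is fine for XML; HTML 4 output would want &#39; instead.
        out = out.replace("'", "&apos;")
    return delimiter + out + delimiter


def cdata_section(text):
    """Wrap text in CDATA, splitting any ']]>' across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def comment(text):
    """Wrap text in a comment; this sketch simply rejects illegal text."""
    if "--" in text or text.endswith("-"):
        raise ValueError("comment text must not contain '--' or end in '-'")
    return "<!--" + text + "-->"


doc = (
    "<r a=" + escape_attr('x\ty"z') + ">"
    + comment(" generated ")
    + escape_text("a<b&c")
    + cdata_section("p]]>q")
    + "</r>"
)
root = ET.fromstring(doc)
print(root.get("a") == 'x\ty"z')  # True: tab and quote round-trip
```

Parsing the result back, as the last lines do, is a cheap check that what 
went in is what comes out.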

I probably missed one or two cases, but as you can see, you can't just slap
any old text into a document and call it XML...

   - Mike
  mike j. brown                   |  xml/xslt: http://skew.org/xml/
  denver/boulder, colorado, usa   |  resume: http://skew.org/~mike/resume/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list