From: Mike Brown <mike@xxxxxxxx>
Reply-To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: xslt replace special characters
Date: Mon, 11 Nov 2002 13:38:52 -0700 (MST)
Alice Fan wrote:
> Thanks Greg. Right in the UI, we want the user to enter their URL. Their
> URL will most likely have name/value pairs. Is there an easier way? There
> is no other way of filtering '&' before it gets processed in the XSL?
It doesn't matter if they're entering a URL/URI or not. Any text that you
intend to put into an XML document needs to be screened, to preserve
well-formedness / parseability.
1. Always note the following:
- non-XML characters need to be removed or replaced
(U+0000..U+0008, U+000B, U+000C, U+000E..U+001F, U+D800..U+DFFF,
U+FFFE..U+FFFF)
- a string is not a URI if it violates URI syntax, so if the text is
destined for a URI-pseudotype attribute value (like href or src in
HTML/XHTML), characters above U+007F should be escaped by writing
their equivalent UTF-8 bytes as '%xx' for each byte, where xx is the
hex notation for the byte (though this isn't strictly necessary; a
conforming HTML user agent will do this automatically)
- additional translation of ASCII-range characters (U+0000..U+007F) in
text destined for URI attributes is not required but is wise, to
ensure conformance to URI syntax; %-escape everything except
a-z, A-Z, 0-9, and these: - _ . ! ~ * ' ( ) ; / ? : @ & = + $ , [ ]
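To illustrate item 1, here is a minimal Python sketch of the two screening
steps. Python and the function names strip_non_xml and percent_escape are
only choices for illustration, not anything prescribed above. It assumes the
input is raw text that has not been %-escaped yet, so a literal '%' (and '#')
is itself escaped; text that is already partly escaped would need different
handling.

```python
import re

# Characters XML 1.0 does not allow, per the ranges listed above.
_NON_XML = re.compile(
    r"[\x00-\x08\x0B\x0C\x0E-\x1F\uD800-\uDFFF\uFFFE\uFFFF]"
)

# Characters left alone in URI attribute text: a-z, A-Z, 0-9, and the
# punctuation listed above. Everything else gets %-escaped.
_URI_SAFE = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-_.!~*'();/?:@&=+$,[]"
)


def strip_non_xml(text, replacement=""):
    """Remove (or replace) characters that cannot appear in XML 1.0."""
    return _NON_XML.sub(replacement, text)


def percent_escape(text):
    """Write each unsafe character as '%xx' for each of its UTF-8 bytes.

    Call strip_non_xml() first: lone surrogates cannot be encoded as UTF-8.
    """
    out = []
    for ch in text:
        if ch in _URI_SAFE:
            out.append(ch)
        else:
            out.extend("%%%02X" % byte for byte in ch.encode("utf-8"))
    return "".join(out)
```

Typical use would be percent_escape(strip_non_xml(user_input)) for text headed
into an href or src attribute. The result still contains '&' where the user
typed one, so the markup escaping in item 2 still applies when the document
is serialized.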
2. If and when the XML document exists in serialized form
(i.e., as a string, not as a DOM object), note the following:
- if the text is not destined for a CDATA section, markup characters '&'
and '<' need to be escaped
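- if the text is destined for a CDATA section, it must not contain ']]>'
(entity references are not recognized there, so the usual fix is to
end the section after ']]' and start a new one before '>')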
- if the text is destined for a comment, it must not contain '--'
(how you handle such an offense is up to you)
- if the text is destined for an attribute value delimited by apostrophes,
then apostrophes in the value must be escaped (usually use &apos; unless
in HTML)
- if the text is destined for an attribute value delimited by quotes,
then quotes in the value must be escaped (usually use &quot;)
- if the text is destined for a non-URI attribute value, then tab, LF,
and CR need to be escaped to facilitate round-tripping
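The cases in item 2 can be sketched in Python roughly as follows. The
function names are hypothetical, and a few choices are assumptions rather
than requirements: ']]>' in CDATA content is handled by splitting the
section, comment text containing '--' is simply rejected, and tab, LF, and CR
in attribute values are written as numeric character references.

```python
def escape_text(s):
    """Escape text for element content (not in a CDATA section)."""
    # '&' goes first so the entities added afterwards are not re-escaped.
    s = s.replace("&", "&amp;").replace("<", "&lt;")
    # ']]>' is not allowed in ordinary character data either.
    return s.replace("]]>", "]]&gt;")


def escape_attr(s, quote='"'):
    """Escape text for a non-URI attribute value delimited by `quote`."""
    s = s.replace("&", "&amp;").replace("<", "&lt;")
    # Character references survive attribute-value normalization,
    # so tab, LF, and CR round-trip.
    s = s.replace("\t", "&#9;").replace("\n", "&#10;").replace("\r", "&#13;")
    if quote == '"':
        return s.replace('"', "&quot;")
    # &apos; is fine in XML; HTML output would want &#39; instead.
    return s.replace("'", "&apos;")


def cdata_section(s):
    """Wrap text in CDATA, splitting the section wherever ']]>' occurs."""
    return "<![CDATA[" + s.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def comment(s):
    """Wrap text in a comment; reject text that XML comments cannot hold."""
    if "--" in s or s.endswith("-"):
        raise ValueError("comment text may not contain '--' or end with '-'")
    return "<!--" + s + "-->"
```

For the URL in question, escape_attr("http://example.org/?a=1&b=2") gives
http://example.org/?a=1&amp;b=2, which is what belongs between the quotes of
an href in the serialized document.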
I probably missed one or two cases, but as you can see, you can't just slap
any old text into a document and call it XML...
- Mike
____________________________________________________________________________
mike j. brown | xml/xslt: http://skew.org/xml/
denver/boulder, colorado, usa | resume: http://skew.org/~mike/resume/