
Re: Fw: Select entire XML doc [FURTHER]

Subject: Re: Fw: Select entire XML doc [FURTHER]
From: "Karl Stubsjoen" <karl@xxxxxxxxxxxxx>
Date: Fri, 28 Feb 2003 15:15:10 -0700
Wow... that was "overwhelmingly" excellent.
Karl

Errr... I think I shall learn how to post XML from the client using
javascript and the XML dom ; )

Karl


----- Original Message -----
From: "Mike Brown" <mike@xxxxxxxx>
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Friday, February 28, 2003 2:56 PM
Subject: Re:  Fw: Select entire XML doc [FURTHER]


> Karl Stubsjoen wrote:
> > Wow... that was most awesome.  Thanks for the help, it really made a lot of
> > sense.  And indeed, I do need to be careful of HTML tags becoming malformed.
> > Once the XML has been properly serialized in a text area element, what is the
> > proper way to deserialize it?
>
> Do you mean you want to turn
>
> <someXmlData>&lt;tag&gt;chardata&lt;/tag&gt;</someXmlData>
>
> into
>
> <someXmlData><tag>chardata</tag></someXmlData>
>
> ?
>
> ...This is a FAQ and is generally beyond the scope of what XML should be used
> for, or what XSLT can do without extension functions. But if you insist, you
> will need to write an extension function that takes the content of the
> someXmlData element (or any string, really), passes it into an XML parser, and
> converts the parser's results to a node-set or result tree fragment. See your
> XSLT processor docs for how to write an extension function (it varies). Your
> processor may already have such a function available (but likely not).
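Outside of XSLT, the heart of such an extension function is just a second parse. Below is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is an illustration only: the message names no language, the function name deserialize_inner is hypothetical, and this is not an XSLT extension function for any particular processor. The first parse turns the &lt; and &gt; escapes back into a plain string; parsing that string again yields real elements.

```python
import xml.etree.ElementTree as ET

# The serialized form from the question: markup escaped as character data.
serialized = "<someXmlData>&lt;tag&gt;chardata&lt;/tag&gt;</someXmlData>"


def deserialize_inner(xml_text: str) -> ET.Element:
    """Re-parse the character data of the root element as XML (hypothetical helper).

    The first parse unescapes &lt; and &gt;, so outer.text is the plain
    string "<tag>chardata</tag>". The second parse turns that string into
    an element; it must be a single well-formed element for fromstring().
    """
    outer = ET.fromstring(xml_text)
    if not outer.text:
        raise ValueError("root element has no character data to parse")
    inner = ET.fromstring(outer.text)
    # Swap the escaped character data for the parsed element.
    outer.text = None
    outer.append(inner)
    return outer


result = deserialize_inner(serialized)
print(ET.tostring(result, encoding="unicode"))
# prints <someXmlData><tag>chardata</tag></someXmlData>
```

A real extension function would additionally have to hand the parsed nodes back to the XSLT processor as a node-set or result tree fragment, and how that is done depends entirely on the processor.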
>
> Or do you mean after the HTML has been rendered in the browser, and the user
> submits the form having the textarea with the possibly-edited XML? That's a
> whole 'nother can of worms, due to encoding issues, which I am all too happy
> to write about, although it is technically off-topic for this list.
>
> First, in general, you should not be passing XML around in HTML form data if
> the intent is to have a general-purpose XML editing system, although as long
> as you stick to pure ASCII, or just treat it as an uneditable binary file,
> then things should be fine.
>
> The problems begin with how form data is handled. A browser transmits the form
> data, which is Unicode, encoded as if it were going into a URL. This means
> that certain characters in the ASCII range (code points 0 to 127) and all
> characters beyond the ASCII range (code points 128 to 1114111) are first
> encoded as bytes, then represented as ASCII bytes for the characters "%xx"
> where xx is the hexadecimal representation for a byte. The ASCII-range
> characters always use the us-ascii encoding as the basis for the %-escaping,
> while the non-ASCII characters typically (it's not enforced by any standard)
> use the encoding *of the HTML document containing the form from which this
> data was submitted*.
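The byte-level effect just described can be reproduced with Python's urllib.parse.quote, whose encoding parameter picks the charset used for the bytes before %-escaping. This is only a stand-in for what a browser does internally, and the helper name percent_encode is hypothetical. Note how the ASCII part of the string escapes identically under both charsets while the single non-ASCII character does not.

```python
from urllib.parse import quote


def percent_encode(text: str, charset: str) -> str:
    """%-escape text using charset to turn characters into bytes (hypothetical helper).

    safe="!" keeps "!" literal; quote() would otherwise escape it as %21.
    Spaces come out as %20 with quote().
    """
    return quote(text, safe="!", encoding=charset)


greeting = "\u00a1Hola amigo!"  # "¡Hola amigo!" (U+00A1, inverted exclamation mark)

for charset in ("utf-8", "iso-8859-1"):
    print(charset, percent_encode(greeting, charset))
# utf-8 %C2%A1Hola%20amigo!
# iso-8859-1 %A1Hola%20amigo!
```

An actual application/x-www-form-urlencoded submission normally encodes spaces as "+" rather than %20; urllib.parse.quote_plus produces that form if you want to mimic it more closely.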
>
> So, for example, if you have in your textarea the character data "¡Hola
> amigo!", and the HTML with the form was utf-8 encoded, and the browser user
> didn't override the interpreted encoding on their end, then the form will be
> submitted using utf-8 as the basis for the %-escaped form data:
>
>   %C2%A1Hola%20amigo!
>
> whereas if the HTML were iso-8859-1 encoded, it would be coming through as
>
>   %A1Hola%20amigo!
>
> On the receiving end, the form data needs to be decoded. Most servers provide
> an API for receiving decoded form data in your application, be it CGI
> environment variables or getParameter() methods on HTTP request objects or
> what have you. But since most browsers do not communicate the details of what
> encoding they used as the basis for the %-escaping, the server makes a guess,
> and usually guesses wrong. So for example, while
>
>    %C2%A1Hola%20amigo!
>
> unambiguously means bytes
>
>    C2 A1 48 6F 6C 61 20 61 6D 69 67 6F 21
>
> ...the API might mistakenly assume that these are iso-8859-1 and will decode
> it for you into the string "Â¡Hola amigo!". In fact, this happens quite often.
> So you'll have to be prepared to transcode: re-encode the string using the
> same encoding that the server assumed, and then decode it using the encoding
> that you know the HTML form used (you might send the latter in a hidden form
> field). Either that, or pull the raw data out of the HTTP request and properly
> decode it yourself.
>
> Once you have the properly decoded string, you can feed it to an XML parser as
> a Unicode string, so that the parser will ignore the encoding declaration in
> the XML's prolog. If you were to feed the raw bytes (the C2 A1 48 etc. above)
> to the parser, you would have to declare the encoding externally, because
> there's a chance that the declaration in the prolog has become inaccurate
> while it was edited and re-encoded.
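The receiving-side steps above can be sketched in Python, with urllib.parse and xml.etree.ElementTree standing in for whatever server API and XML parser you actually use. The server's wrong guess is simulated with unquote(..., encoding="iso-8859-1"), and the form page's real encoding (utf-8) is assumed to be known, for instance from a hidden field, which is not modeled here. The XML part assumes a document whose prolog still says iso-8859-1 even though it now arrives as utf-8 bytes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote, unquote_to_bytes

submitted = "%C2%A1Hola%20amigo!"  # utf-8-based %-escaping from the browser

# 1. What a server API that wrongly assumes iso-8859-1 hands you.
garbled = unquote(submitted, encoding="iso-8859-1")  # "Â¡Hola amigo!"

# 2. Transcode: re-encode with the encoding the server assumed,
#    then decode with the encoding the form page actually used.
repaired = garbled.encode("iso-8859-1").decode("utf-8")  # "¡Hola amigo!"

# 3. Alternative: take the raw bytes and decode them yourself.
raw = unquote_to_bytes(submitted)  # b"\xc2\xa1Hola amigo!"
decoded = raw.decode("utf-8")

# An edited document whose prolog is now inaccurate: it claims iso-8859-1,
# but the bytes we hold are utf-8.
doc_text = '<?xml version="1.0" encoding="iso-8859-1"?><greeting>\u00a1Hola amigo!</greeting>'
doc_bytes = doc_text.encode("utf-8")

# Feeding a Unicode string: the stdlib parser does not use the prolog's encoding.
from_str = ET.fromstring(doc_text).text  # "¡Hola amigo!"

# Feeding raw bytes and trusting the prolog gives mojibake.
trusting_prolog = ET.fromstring(doc_bytes).text  # "Â¡Hola amigo!"

# Feeding raw bytes with the encoding declared externally, which overrides the prolog.
external = ET.fromstring(doc_bytes, parser=ET.XMLParser(encoding="utf-8")).text  # "¡Hola amigo!"
```

Not every XML library behaves like the standard library here; some refuse a Unicode string that still carries an encoding declaration, so check your parser's documentation before relying on the string route.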
>
> You didn't know what you were getting into, did you? Like I said, in general,
> HTML forms and the server-side APIs for processing them are just not equipped
> to be a general-purpose XML editing system, at least not in an idiot-proof
> way. The culprits are really HTTP and MIME; HTML is just working around their
> restrictions. And browser vendors choose the path of least disruption,
> choosing not to implement some of HTML's features that could easily work
> around some of these issues (e.g., they do have a way of transmitting encoding
> info, but they just don't do it, to "keep people's scripts from breaking").
>
> --
>   Mike J. Brown   |  http://skew.org/~mike/resume/
>   Denver, CO, USA |  http://skew.org/xml/
>
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
>
>



