XML-DEV Mailing List Archive

Re: (char)0 handling proposal

  • From: Brendan Macmillan <bren@m...>
  • To: derhoermi@g...
  • Date: Sat, 18 Aug 2001 16:03:21 +1000 (EST)

char 0
> >"Now is the time for all good men to come to the aid of the party@@@@@@@@@"

> >"@" is the null char - when a String is *mostly* text, it would be nice to
> >render the readable text as human readable...

> Create a new simpleType quotedPrintable, then you can have
> 
>   "Now is the time for all good men to come to the aid of the
>    party=00=00=00=00=00=00=00=00=00=00=00=00=00=00=00=00=00=00"
> 
> where the string is converted to UTF-16LE before applying QP. This is as
> human readable as possible. But please note that wouldn't be a very
> interoperable solution and I discourage such multi-level encodings.
I agree about multi-level encodings, but they do seem to be the only way to
cater for both human consumption and binary data.

> If it's binary don't use XML (directly) or use the mentioned types. Who
> cares about human consumption _and_ uses binary data?
It's because Java's "char" datatype is ambiguous. It usually contains
Unicode text, but it can also be treated as a 16-bit unsigned integer.[*]
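A small Java sketch of that ambiguity (the class and method names are illustrative only, not from any library). The same char can sit inside a String as text, including (char)0, which XML 1.0 forbids, and it also behaves as an unsigned 16-bit integer: widening never sign-extends, and incrementing wraps modulo 2^16.

```java
public class CharDuality {
    /** Appends (char)0: perfectly legal in a Java String, but not legal XML 1.0 text. */
    static String withNul(String text) {
        return text + (char) 0;
    }

    /** Widening a char to int never sign-extends, so the result is always 0..65535. */
    static int asUnsigned(char c) {
        return c;
    }

    /** ++ on a char wraps modulo 2^16, exactly like an unsigned 16-bit counter. */
    static char increment(char c) {
        c++;
        return c;
    }

    public static void main(String[] args) {
        System.out.println(withNul("party").length());       // 6: the NUL is an ordinary char
        System.out.println(asUnsigned('A'));                  // 65
        System.out.println(asUnsigned((char) -1));            // 65535, not -1
        System.out.println((int) increment((char) 0xFFFF));   // 0: wrapped around
    }
}
```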

More and more, I think you are right: if a String contains any non-text
values (by the XML definition), then it should be treated entirely as binary.
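The all-or-nothing rule might be sketched in Java as follows. Assumptions, all hypothetical: the <text>/<binary> element names, Base64 as the binary encoding, and serializing each char as two bytes low byte first (UTF-16LE code units, echoing the QP suggestion quoted above). The legality test is the XML 1.0 Char production; the bytes are written by hand so that unpaired surrogates survive instead of being replaced by a charset encoder.

```java
import java.util.Base64;

public class TextOrBinary {
    /** XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** String.codePoints() yields an unpaired surrogate as its own value (D800..DFFF),
     *  which isXmlChar rejects; properly paired surrogates become one supplementary code point. */
    static boolean isXmlText(String s) {
        return s.codePoints().allMatch(TextOrBinary::isXmlChar);
    }

    /** Each 16-bit char becomes two bytes, low byte first (assumed byte order). */
    static byte[] charsToBytesLE(String s) {
        byte[] out = new byte[s.length() * 2];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out[2 * i] = (byte) c;              // low byte
            out[2 * i + 1] = (byte) (c >>> 8);  // high byte
        }
        return out;
    }

    /** Text stays text (minimal escaping only); one non-XML char makes the whole String binary. */
    static String encode(String s) {
        if (isXmlText(s)) {
            return "<text>" + s.replace("&", "&amp;").replace("<", "&lt;") + "</text>";
        }
        return "<binary>" + Base64.getEncoder().encodeToString(charsToBytesLE(s)) + "</binary>";
    }

    public static void main(String[] args) {
        System.out.println(encode("aid of the party"));
        System.out.println(encode("party\0\0\0"));
    }
}
```

The design choice is deliberate: a String is never split into partly-text, partly-binary pieces, so a reader only ever has to decode one way.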

Incidentally, one way to serve both human consumption and binary data is to
render binary in this strangely familiar format:

<Binary>
0000000: 4e6f 7720 6973 2074 6865 2074 696d 6520  Now is the time 
0000010: 666f 7220 616c 6c20 676f 6f64 206d 656e  for all good men
0000020: 2074 6f20 636f 6d65 2074 6f20 7468 6520   to come to the 
0000030: 6169 6420 6f66 2074 6865 2070 6172 7479  aid of the party
0000040: 0000 0000 0000 0000 000a                 @@@@@@@@@.
</Binary>
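A Java sketch that reproduces the dump above byte for byte: 16 bytes per line, a 7-digit hex address, 2-byte hex groups, then the text column. Two conventions are inferred from the example rather than stated: NUL is shown as '@' and any other non-printable byte as '.'. Turning the String into bytes with ISO-8859-1 (one byte per char 0..255) is also an assumption; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class HexDump {
    /** Text-column glyph: '@' for NUL, printable ASCII as-is, '.' otherwise (inferred). */
    static char display(int b) {
        if (b == 0x00) return '@';
        if (b >= 0x20 && b <= 0x7e) return (char) b;
        return '.';
    }

    static String dump(byte[] data) {
        StringBuilder out = new StringBuilder();
        for (int off = 0; off < data.length; off += 16) {
            int end = Math.min(off + 16, data.length);
            StringBuilder hex = new StringBuilder();
            StringBuilder text = new StringBuilder();
            for (int i = off; i < end; i++) {
                int b = data[i] & 0xff;                            // treat byte as unsigned
                if (i > off && (i - off) % 2 == 0) hex.append(' '); // new 2-byte group
                hex.append(String.format("%02x", b));
                text.append(display(b));
            }
            // Pad the hex area to a full line's width (39 chars) so the text column lines up.
            out.append(String.format("%07x: %-39s  %s\n", off, hex.toString(), text.toString()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "Now is the time for all good men to come to the aid of the party"
                + "\0\0\0\0\0\0\0\0\0\n";
        System.out.print("<Binary>\n" + dump(s.getBytes(StandardCharsets.ISO_8859_1)) + "</Binary>\n");
    }
}
```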

This kind of format is *the most* human-readable way to present binary data.
It can be edited effectively via the hex representation, while the text
column is "read-only" (a kind of markup of the real data). The addresses on
the left are non-XML markup, but the same information could be expressed in
XML style, e.g.:

<bin addr="0000000"> 4e6f 7720 6973 2074 6865 2074 696d 6520 </bin>
or
<b a="0000000" t="Now is the time ">4e6f 7720 6973 2074 6865 2074 696d 6520</b>

(Based on an idea by Mark Collette, for using hex to represent binary in XML)
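A Java sketch of the second XML-style variant, including the "read-only" text view: rows are written as <b a="address" t="text">hex</b>, and decoding rebuilds the bytes from the hex content alone, so t is ignored and a hex edit wins even if t goes stale. Assumptions, all hypothetical: the '@'/'.' display convention from the dump above, a <Binary> root, and rows required to be contiguous in address order (a is used only for that check). Parsing uses the JDK's standard DOM API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class HexElements {
    static char display(int b) {  // '@' for NUL, '.' for other non-printables (assumed)
        if (b == 0x00) return '@';
        return (b >= 0x20 && b <= 0x7e) ? (char) b : '.';
    }

    static String escapeAttr(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** One <b> element per 16 bytes: a = address, t = read-only text view, content = hex. */
    static String toXml(byte[] data) {
        StringBuilder out = new StringBuilder("<Binary>\n");
        for (int off = 0; off < data.length; off += 16) {
            int end = Math.min(off + 16, data.length);
            StringBuilder hex = new StringBuilder();
            StringBuilder text = new StringBuilder();
            for (int i = off; i < end; i++) {
                int b = data[i] & 0xff;
                if (i > off && (i - off) % 2 == 0) hex.append(' ');
                hex.append(String.format("%02x", b));
                text.append(display(b));
            }
            out.append(String.format("<b a=\"%07x\" t=\"%s\">%s</b>\n",
                    off, escapeAttr(text.toString()), hex.toString()));
        }
        return out.append("</Binary>\n").toString();
    }

    /** Rebuilds the bytes from the hex content only; the t attribute is never read. */
    static byte[] fromXml(String xml) {
        try {
            NodeList rows = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("b");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (int r = 0; r < rows.getLength(); r++) {
                Element row = (Element) rows.item(r);
                if (Integer.parseInt(row.getAttribute("a"), 16) != out.size()) {
                    throw new IllegalArgumentException("non-contiguous row at a=" + row.getAttribute("a"));
                }
                String hex = row.getTextContent().replaceAll("\\s+", "");
                for (int i = 0; i + 1 < hex.length(); i += 2) {
                    out.write(Integer.parseInt(hex.substring(i, i + 2), 16));
                }
            }
            return out.toByteArray();
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {  // parser configuration, SAX or I/O failure
            throw new IllegalArgumentException("not parseable", e);
        }
    }

    public static void main(String[] args) {
        byte[] data = ("Now is the time for all good men to come to the aid of the party"
                + "\0\0\0\0\0\0\0\0\0\n").getBytes(StandardCharsets.ISO_8859_1);
        String xml = toXml(data);
        System.out.print(xml);
        System.out.println(java.util.Arrays.equals(data, fromXml(xml)));  // lossless round trip?
    }
}
```

Editing "000a" to "0021" in the last row changes the decoded final byte to '!' even though that row's t attribute still shows "@@@@@@@@@.", which is exactly the hex-is-authoritative behaviour described above.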


Cheers!
Brendan
-- 
e:  bren@m...                    v:  +61 (3)  9905 1502
Email is checked daily                              Phone is rarely attended

[*]
As the XML definition of "text" grows more important, it would be nice if
languages had a primitive datatype for "textchar" or "XMLchar". This would
avoid the need to check the range of values a char contains. But I guess there
are many reasons to use primitives that are a multiple of 8 bits in length (the
exception is boolean, but it doesn't require extra validation checks).
