
Re: Gag me with a blunt &#x85;

  • From: Edwin.Fine@C...
  • To: xml-dev@l...
  • Date: Fri, 16 Mar 2001 12:19:37 -0500


All,

Maybe I am way off base here, but my experience with XML on OS/390 has shown that there are actually two EBCDIC "newline" characters: 0x15 (EBCDIC NL, or newline) and 0x25 (EBCDIC LF, or linefeed). The IBM C++ compiler outputs 0x15 when you print "\n", but each record of a mainframe text dataset appears to be terminated with 0x25. I have never seen 0x85. Incidentally, our mainframes use CP500 (international EBCDIC) and CP37 (US EBCDIC).
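To see the two newline bytes side by side, here is a minimal sketch using Python's built-in `cp037` and `cp500` codecs. It assumes those codecs follow the standard IBM/Unicode.org mapping tables, which may differ from the tables configured on a particular OS/390 system. Under those tables, 0x15 decodes to U+0085 (NEL) and 0x25 decodes to U+000A (LF):

```python
# Compare how Python's standard EBCDIC codecs decode the two "newline" bytes.
NEWLINE_BYTES = {"NL (0x15)": b"\x15", "LF (0x25)": b"\x25"}


def decode_newlines(codec: str) -> dict[str, str]:
    """Return the Unicode code point that each EBCDIC newline byte decodes to."""
    return {
        name: f"U+{ord(raw.decode(codec)):04X}"
        for name, raw in NEWLINE_BYTES.items()
    }


for codec in ("cp037", "cp500"):
    print(codec, decode_newlines(codec))
```

In these code pages the byte 0x85 itself is the lowercase letter 'e', so a reference to "0x85" as a newline most plausibly means the Unicode code point U+0085 that byte 0x15 decodes to, rather than an EBCDIC byte.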

It gets worse, because there seems to be no agreement about the EBCDIC codes used for the '[' and ']' characters: some installations use 0xAD and 0xBD respectively, and others use 0xBA and 0xBB. This has caused us no end of trouble when parsing CDATA sections with an expat parser compiled natively (i.e. as an EBCDIC application) on OS/390. We finally settled on using binary FTP transfers, turning off translation in MQSeries, and performing our own translation to well-known values before parsing.
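The "translate to well-known values before parsing" step could look roughly like the sketch below. It assumes the sending installation uses 0xAD/0xBD for the brackets and that the receiver decodes with Python's `cp037` codec, where the brackets are 0xBA/0xBB. `BRACKET_FIXUP` and `parse_ebcdic_xml` are hypothetical names. Python's standard library has no CP1047 codec, so the byte remapping stands in for decoding with the correct code page. ElementTree is used because CPython's ElementTree parser is built on expat.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: the producer encodes '[' and ']' as 0xAD/0xBD, while Python's
# cp037 codec expects 0xBA/0xBB. Every 0xAD/0xBD byte in the input is remapped,
# so this is only safe for data known to use that convention.
BRACKET_FIXUP = bytes.maketrans(b"\xad\xbd", b"\xba\xbb")


def parse_ebcdic_xml(raw: bytes) -> ET.Element:
    """Normalize bracket bytes to well-known cp037 values, decode, then parse."""
    text = raw.translate(BRACKET_FIXUP).decode("cp037")
    return ET.fromstring(text)


# Simulate a document from an installation that uses 0xAD/0xBD for brackets.
doc = "<a><![CDATA[x<y]]></a>"
raw = doc.encode("cp037").replace(b"\xba", b"\xad").replace(b"\xbb", b"\xbd")

print(parse_ebcdic_xml(raw).text)
```

Decoding `raw` directly with `cp037` turns the brackets into unrelated characters and the CDATA section becomes malformed. Because the remap is blind, it would also corrupt any other character the receiving code page assigns to 0xAD or 0xBD, which is why such a translation only makes sense once the sender's code page is known.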

I would be interested to know whether anyone else has had these experiences. The bottom line seems to be that XML was designed for the ASCII/Unicode world and does not fit very well into the EBCDIC mainframe world.

Regards,

Edwin Fine
CommerceQuest, Inc.
(Direct) 813-639-6508
(Fax) 813-639-6900
(Main) 813-639-6300
Edwin.Fine@C...



Rob Lugt <roblugt@e...>

03/16/2001 09:07 AM

       
        To:        Tim Bray <tbray@t...>, James Clark <jjc@j...>, xml-dev@l...
        cc:        
        Subject:        Re: Gag me with a blunt &#x85;



James Clark wrote:
> >I'm not convinced.  The XML spec says that Unicode character #x85 is not
> >a whitespace character.  It appears from the Note that EBCDIC text
> >files on IBM mainframes represent newline by a byte with code 0x85. The
> >solution appears obvious to me: the EBCDIC encoding table used by the
> >XML parser should map byte 0x85 to Unicode character 0xA.
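James Clark's suggestion, an EBCDIC table in which the newline byte decodes to LF, can be sketched by copying Python's standard `cp037` decoding table and patching the NL entry (byte 0x15, which the standard table maps to U+0085). `XML_CP037_TABLE`, `decode_cp037_for_xml` and `is_well_formed` are hypothetical names; a real parser would wire such a table into its own encoding machinery rather than decoding up front.

```python
import codecs
import encodings.cp037
import xml.etree.ElementTree as ET

# Start from Python's standard cp037 decoding table (a 256-character str) and
# build a hypothetical "XML flavour" in which EBCDIC NL (0x15) decodes to
# U+000A instead of U+0085.
_table = list(encodings.cp037.decoding_table)
_table[0x15] = "\n"
XML_CP037_TABLE = "".join(_table)


def decode_cp037_for_xml(raw: bytes) -> str:
    text, _consumed = codecs.charmap_decode(raw, "strict", XML_CP037_TABLE)
    return text


def is_well_formed(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A start tag broken with EBCDIC NL, as a mainframe program might write it.
raw = '<a\x85b="1"/>'.encode("cp037")  # U+0085 encodes to byte 0x15

standard = raw.decode("cp037")        # NL becomes U+0085, not XML 1.0 whitespace
flavoured = decode_cp037_for_xml(raw)  # NL becomes U+000A, ordinary whitespace

print(is_well_formed(standard), is_well_formed(flavoured))
```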

The note from IBM argues that the XML spec is wrong not to designate #x85
as white space, so the wording of the XML spec here doesn't seem like a
good reason to remain unconvinced.  The fix to the EBCDIC coding table may
work, but it appears to me to be something of a hack, because the original
software intends to create Unicode U+0085 characters.  I would prefer XML
parsers to be able to use standard encoding tables, perhaps from generic
libraries, rather than having to create a special XML flavour.

Tim Bray replied:

> This feels much better.  And upon reflection, the thought of
> XML files which have been through a mainframe starting to
> percolate around the system with U+0085 embedded inside
> start tags makes me nervous;
<snip/>
> Also, unlike (almost?) all the other XML errata, changing this
> would actively break pretty well every deployed piece of XML
> software in the world.  -Tim

Arguably every deployed piece of XML software is already broken with
respect to files containing U+0085.  It appears to me that you (the editors)
went to great lengths to adopt the full Unicode specification rather than
creating an XML subset.  If this was an oversight, then I believe it makes
sense to make good the mistake and maintain full support for Unicode.

Regards
Rob Lugt
ElCel Technology


------------------------------------------------------------------
The xml-dev list is sponsored by XML.org, an initiative of OASIS
<http://www.oasis-open.org>

The list archives are at http://lists.xml.org/archives/xml-dev/

To unsubscribe from this elist send a message with the single word
"unsubscribe" in the body to: xml-dev-request@l...


