Re: Common XML (was Re: Document Feature Requirements)

  • From: "Simon St.Laurent" <simonstl@s...>
  • To: Rick JELLIFFE <ricko@g...>, XML-Dev Mailing list <xml-dev@x...>
  • Date: Sat, 08 Apr 2000 17:51:35 -0400

At 03:30 AM 4/9/00 +0800, Rick JELLIFFE wrote:
>I think the name "Common XML" doesn't capture where the Common XML
>conventions are most appropriate. Some name like "Exchange XML" or
>"Round-trippable XML" would be more appropriate.

The name 'Common XML' emerged from discussions regarding how people coming
to XML from outside the SGML community tend to use XML anyway, and reflects
the fact that the core features are in fact the only things that XML
processors are assured to have IN COMMON by the XML 1.0 specification.

>Common XML is based on a model where the user of the data is not in the
>position to choose tools that can handle that data. I think this is
>not so likely for inhouse workflows of data, which is why "Exchange XML"
>might be a better name.

In my experience, 'inhouse workflows' are often less documented and less
controllable than exchanges of information between differing organizations.
"Exchange XML," in my experience, is basically valid XML, exchanged with
the expectation of a lot more testing, validation in particular.

In either case, there are no long-term guarantees for what will happen to
the information.

>'Guaranteed interoperability' and 'simplicity of implementation'
>are sometimes at odds. A notable example is that of character encoding.
>If your system is not native Unicode (or has a transcoding library such
>as IBM's ICU for C++ available) then mandating UTF-8 is the equivalent of
>ruling out Chinese, Japanese and Korean data. In such a case,
>interoperability equates to unusability.  "Common XML" is, for us here,
>"Impossible XML" as it stands, for many applications.

It's important to remember that you can use other encodings within the
framework of Common XML, just not within its core.  All Common XML does is
warn you that those encodings may not be interoperable with all XML
processors.

The difficulty of putting such encodings in the 'core' of Common XML is the
sad result of building Common XML as a strict subset of XML 1.0 and the
requirements set by that specification.  

If the framers of XML 1.0 had chosen to require that XML processors support
encodings beyond UTF-8 and UTF-16, those encodings would be in the core of
Common XML.  Because they didn't, Common XML includes guidelines warning
developers that using these other encodings introduces interoperability
risks.  

Expat, for instance, supports only UTF-8, UTF-16, US-ASCII, and ISO-8859-1.
However unfortunate many people may find this, Expat actually goes beyond
the requirements set by the XML 1.0 specification.
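
As a rough illustration of that warning, here is a minimal Python sketch
that checks whether a document declares one of the two encodings every XML
1.0 processor must accept, and transcodes it to UTF-8 if not.  The function
names are hypothetical and are not part of Common XML or of any parser.
The sketch assumes the XML declaration itself is written in ASCII-compatible
bytes and that a codec for the declared encoding is installed, which is
exactly the assumption Rick notes does not hold on every system.

```python
import re

# The only encodings XML 1.0 requires every processor to read.
REQUIRED_ENCODINGS = {"utf-8", "utf-16"}

# Matches the encoding pseudo-attribute of an XML declaration at the start
# of the document (assumes the declaration bytes are ASCII-compatible).
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(doc: bytes) -> str:
    """Return the declared encoding, lowercased; default to UTF-8 if absent."""
    match = _DECL.match(doc)
    return match.group(1).decode("ascii").lower() if match else "utf-8"


def is_core_encoding(doc: bytes) -> bool:
    """True if the document declares an encoding all processors must accept."""
    return declared_encoding(doc) in REQUIRED_ENCODINGS


def to_utf8(doc: bytes) -> bytes:
    """Transcode a document to UTF-8 and rewrite its encoding declaration."""
    text = doc.decode(declared_encoding(doc))
    text = re.sub(
        r'^(<\?xml[^>]*?encoding\s*=\s*["\'])[^"\']+',
        r"\g<1>UTF-8",
        text,
        count=1,
    )
    return text.encode("utf-8")
```

A sender could call is_core_encoding() before shipping a document to an
unknown recipient and fall back to to_utf8() when it returns False.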

>it is the nature of a comment ...

I don't think we're concerned in Common XML with the 'nature' of things as
they've been defined.  Will comments survive a trip into and out of an XML
document repository?  Maybe, maybe not - it depends on what kind of XML
parser is used to read the document, and how it interacts with the
application.  From Common XML's perspective, the 'maybes' and 'it depends'
are the important parts of the comment's nature.
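
The 'it depends' can be seen with a single toolkit.  The following is a
small sketch using Python's standard xml.etree.ElementTree module (Python
3.8 or later is assumed for the insert_comments option); the sample document
and function names are made up for illustration.  The default parser
silently drops comments, while an explicitly configured one keeps them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document containing one comment.
SOURCE = "<doc><!-- reviewer note --><para>text</para></doc>"


def round_trip_default(xml_text: str) -> str:
    """Parse and re-serialize with the default parser (comments are lost)."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


def round_trip_keeping_comments(xml_text: str) -> str:
    """Parse with a tree builder that inserts comments into the tree."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(round_trip_default(SOURCE))
    print(round_trip_keeping_comments(SOURCE))
```

Whether the comment survives is decided by whoever configures the parser,
not by the author of the document.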

>I would say there is a further idea that needs to be addressed more also:
>an editing application is fundamentally different from a transformation
>application.  If an editing application cannot round-trip, that is
>a bug. If a transformation application cannot round-trip, that is the
>developer's choice (or the application domain's characteristic).

This is a reasonable distinction if you assume that developers actually
have control over where their documents will go, and how they will be
processed.  The prospect of ubiquitous XML includes a lot of possibilities
that suggest 'developer choice' may take place far from and unknown to the
creators of documents.  

This is a radical change from the relatively staid world of document
management, workflow, and other areas where you expect to control and
contain your dataflows.  It opens possibilities for all kinds of new
processing systems, but introduces a dramatically new level of uncertainty
into information representation.

Recommending a conservative approach in a time of great flux seems like a
very practical thing to do.  It seems prudent to remind people that they
may in fact not have control over the long-term processing of their XML
documents and that their documents may have a lifespan well beyond the
original application or even application domain.


Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com

