
Re: Another way to present XML data

  • From: Michael Kay <mike@saxonica.com>
  • To: jf.larvoire@free.fr
  • Date: Tue, 12 Sep 2017 21:45:38 +0100


On 12 Sep 2017, at 21:14, jf.larvoire@free.fr wrote:

As I already replied to Michael Kay, this is HTML.

No, this is XML.

Google's KML is an excellent example, as are XSLT, SOAP, SVG, etc...

Of those three examples, two use mixed content:


<xsl:template match="cite">[<xsl:value-of select="."/>]</xsl:template>


    <text x="200" y="150" fill="blue" >
      You are
        <tspan font-weight="bold" fill="red" >not</tspan>
      a banana.
    </text>

Michael Kay
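<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>New York City</name>
      <description>New York City</description>
      <Point>
        <coordinates>-74.006393,40.714172,0</coordinates>
      </Point>
    </Placemark>
  </Document>
</kml>

?xml version="1.0" encoding="UTF-8"
kml xmlns="http://www.opengis.net/kml/2.2" {
  Document {
    Placemark {
      name "New York City"
      description "New York City"
      Point {
        coordinates -74.006393,40.714172,0
      }
    }
  }
}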

From: "Christophe Marchand" <cmarchand@oxiane.com>
To: xml-dev@l...
Sent: Tuesday, 12 September 2017 20:56:47
Subject: Re: Another way to present XML data

Maybe something like this:

<p>This is a <b>bold</b> text with <u>underlined <i>italic</i> content</u> in.</p>
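To make concrete why this paragraph counts as mixed content, here is a minimal Python sketch (not part of the original message) that parses it with the standard-library `xml.etree.ElementTree` module. The helper name `describe` is hypothetical. ElementTree stores character data that follows an element in that element's `tail`, so the interleaving of text and child elements shows up directly:

```python
import xml.etree.ElementTree as ET

# The paragraph from the message above, unchanged.
p = ET.fromstring(
    "<p>This is a <b>bold</b> text with "
    "<u>underlined <i>italic</i> content</u> in.</p>"
)

def describe(elem, depth=0):
    """Return (depth, tag, text, tail) tuples in document order."""
    rows = [(depth, elem.tag, elem.text, elem.tail)]
    for child in elem:
        rows.extend(describe(child, depth + 1))
    return rows

rows = describe(p)
for depth, tag, text, tail in rows:
    print("  " * depth + f"<{tag}> text={text!r} tail={tail!r}")
```

Every child element here has a non-empty `tail`, which means a converter has to represent text that sits between and after child elements, not just text that is the sole content of an element.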



On 12/09/2017 at 18:32, jf.larvoire@free.fr wrote:
No, white spaces are significant, and are preserved by my conversion script.

I don't really know what you mean by mixed content.
Please show me an example of mixed content that is not converted correctly.


From: "Michael Kay" <mike@saxonica.com>
To: "jf larvoire" <jf.larvoire@free.fr>
Cc: xml-dev@lists.xml.org
Sent: Tuesday, 12 September 2017 17:49:14
Subject: Re: Another way to present XML data

I actually came across this proposal a couple of months ago when trying to do something similar myself - before I found it I had in fact reinvented a lot of this, with minor variations.

The thing I was finding most difficult was mixed content, and I don't think SML has really thought that through either. None of the examples seem to use mixed content, at any rate. And as far as I can see, the whitespace between elements is being treated as insignificant, which doesn't work in a mixed-content world.

Of course this is a really tough challenge, because you want to round-trip XML, but XML has this major design flaw that significant and insignificant whitespace can't be lexically distinguished. So you end up either replicating that design flaw or sacrificing round-tripping.
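As an illustration of the whitespace problem (a sketch added for clarity, not taken from the original message), the following Python snippet uses the standard-library `xml.etree.ElementTree` module. The two documents and the helper names `whitespace_only_nodes` and `text_without_ws_nodes` are made up for the example. In both documents the parser reports whitespace-only character data between elements, and nothing in the syntax says which of it is mere indentation:

```python
import xml.etree.ElementTree as ET

# Data-oriented content: the whitespace between elements is only indentation.
data = ET.fromstring("<format>\n  <author>W3C</author>\n</format>")
# Mixed content: the single space between <b> and <i> separates two words.
mixed = ET.fromstring("<p><b>not</b> <i>a banana</i></p>")

def whitespace_only_nodes(elem):
    """Collect every whitespace-only text or tail string under elem."""
    found = []
    for node in elem.iter():
        for s in (node.text, node.tail):
            if s and s.strip() == "":
                found.append(s)
    return found

def text_without_ws_nodes(elem):
    """Concatenate the text content, dropping whitespace-only pieces."""
    return "".join(s for s in elem.itertext() if s.strip())

print(whitespace_only_nodes(data))    # indentation only
print(whitespace_only_nodes(mixed))   # a meaningful word separator
print(text_without_ws_nodes(mixed))   # the two words run together
```

Dropping the whitespace-only nodes is harmless for the first document but glues two words together in the second, which is why a converter that treats inter-element whitespace as insignificant cannot round-trip mixed content.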

On a point of detail, it doesn't feel right to have both '&' and '\' as escape characters with different roles.

(Mixed content is not an edge case!)

Michael Kay

On 12 Sep 2017, at 15:31, jf.larvoire@free.fr wrote:

Following last month's thread about the relative merits of XML and JSON, I think there's a way to get the advantages of both:

Instead of creating yet another data format incompatible with XML, it's actually possible to transform _reversibly_ XML into something that is as human-friendly as JSON.
This way, we can present the same data as "normal" XML to be consumed by programs, or as "simplified" XML to be reviewed and edited by humans, and convert it back and forth between the two presentations when going from programs to humans and back.
Several years ago I developed a Tcl script that does exactly that.

The script, and my proposed "Simplified XML" format, aka. SML, were presented at the 2013 Tcl conference:
This script is open-sourced and available there:
To use it in Windows, you'll need to install a Tcl interpreter. See instructions on this page if needed:

Note that I've been told that another data format called sml was proposed in 1999.
Mine has no relationship whatsoever with the other.
If this homonymy is a problem, I'm open to any alternative name!

Here's an example of what the transformation does:

`cat sample.xml`

<?xml version="1.0"?>
<formats>
    <format name="XML">
        <author>W3C</author>
        <standardized>2008</standardized>
        <advantage>Can define formal syntaxes, that can be verified</advantage>
        <advantage>Widely adopted, many XML-based standards</advantage>
        <drawback>Hard to read by humans</drawback>
    </format>
    <format name="JSON">
        <author>Douglas Crockford</author>
        <standardized>2013</standardized>
        <advantage>Easy to read by humans</advantage>
        <drawback>Incompatible with XML</drawback>
    </format>
    <format name="SML">
        <author>Jean-Francois larvoire</author>
        <advantage>Same advantages as XML. It is XML presented differently.</advantage>
        <advantage>Easy to read by humans</advantage>
        <drawback>No I/O libraries available yet</drawback>
    </format>
</formats>

Then `cat sample.xml | sml` displays:

?xml version="1.0"
formats {
    format name="XML" {
        author W3C
        standardized 2008
        advantage "Can define formal syntaxes, that can be verified"
        advantage "Widely adopted, many XML-based standards"
        drawback "Hard to read by humans"
    }
    format name="JSON" {
        author "Douglas Crockford"
        standardized 2013
        advantage "Easy to read by humans"
        drawback "Incompatible with XML"
    }
    format name="SML" {
        author "Jean-Francois larvoire"
        advantage "Same advantages as XML. It is XML presented differently."
        advantage "Easy to read by humans"
        drawback "No I/O libraries available yet"
    }
}

And finally `cat sample.xml | sml | sml` outputs an identical copy of the initial XML file.

Try it out with any of your XML files, and you'll be surprised at how much easier it is to edit them!
XSLT files transformed this way even become pleasant for C programmers!

Of course, there's nothing that prevents programs from actually producing or consuming SML directly. There are no libraries available for doing it yet, but in simple cases as in the example above, the parsing is relatively trivial.
The nice thing is that this transition can be done progressively, as the new SML-aware programs will remain compatible with your old programs that only know about standard XML.
Simply pipe the data through the sml.tcl script to make them understand each other!
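For readers who want a feel for the "simple cases", here is a minimal Python sketch of an XML-to-brace-layout converter. It is not the sml.tcl script and does not implement the actual SML rules: the function name `to_sml_like` is hypothetical, the quoting rule (quote values that contain whitespace) is only inferred from the example above, and comments, processing instructions, escaping of quotes, and mixed content are all left out. It also discards inter-element whitespace, so unlike the real script it is not reversible.

```python
import xml.etree.ElementTree as ET

def to_sml_like(elem, indent=0):
    """Render a data-oriented element tree in a brace-based, SML-like layout."""
    pad = "    " * indent
    head = elem.tag + "".join(f' {k}="{v}"' for k, v in elem.attrib.items())
    children = list(elem)
    text = (elem.text or "").strip()
    if children:
        # Assumption: refuse mixed content rather than guess at its syntax.
        if text or any((c.tail or "").strip() for c in children):
            raise ValueError(f"mixed content in <{elem.tag}> is not handled")
        lines = [f"{pad}{head} {{"]
        for child in children:
            lines.extend(to_sml_like(child, indent + 1))
        lines.append(f"{pad}}}")
        return lines
    if not text:
        # Assumption: an empty element is written as its bare name.
        return [f"{pad}{head}"]
    # Assumption inferred from the example: quote values containing whitespace.
    value = f'"{text}"' if any(ch.isspace() for ch in text) else text
    return [f"{pad}{head} {value}"]

sample = """<formats>
    <format name="XML">
        <author>W3C</author>
        <drawback>Hard to read by humans</drawback>
    </format>
</formats>"""

result = to_sml_like(ET.fromstring(sample))
print("\n".join(result))
```

Raising an error on mixed content is a deliberate choice in this sketch: it keeps the simple case simple and makes the hard case visible instead of silently producing output that cannot be converted back.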

Limitations: The sml.tcl program is well tested (I've used it for years), but it has known limitations:
- It will not convert UTF-16 or EBCDIC or MBCS files correctly.
- I've done very little testing with Unicode characters > \u00FF, so I'm not sure it will work fine with these in UTF-8 files.
- I've tested the reversibility with the whole libxml2 test suite. Still, my parsing definitely does not cover all corner cases of the XML specification. There surely are bugs still hiding there, but hopefully only for the least used features of XML.

Any feedback welcome!

Please report any bugs on the GitHub issues list:
