Re: What is XML For?


lionel snell
On Monday 28 October 2002 4:12 pm, Paul Prescod wrote:
> Alaric B. Snell wrote:
> >...
> >
> >
> > Because XML has a fragile data model, designed for publishing stuff to a
> > browser rather than transfer between applications?
>
> XML is based on SGML which was invented long before browsers as we know
> them.

Yep, but that's orthogonal to my point; XML is based on ASCII, which is based 
on binary, which has been around since (potentially) ancient times in China 
(there being some evidence of binary arithmetic in certain ancient Chinese 
symbolism).

XML was designed, sometime in the late 90s; quite incidentally they decided 
to subset SGML because it seemed a good base for solving the problem at hand 
(and scarily enough, I agree with them! XML's great as an SGML-lite :-)

[why XML for data]
> It arises naturally from the observation that structured data (tuple
> structured, hierarchically structured, graph structured, recursive) is
> a subset of the kinds of data you will find in the documents XML was
> designed to handle. A telephone book is tuple-structured. An airplane
> manual is mostly hierarchically structured but with frequent escapes to
> graph structure.

But for complex data interchange the hierarchy of XML can be limiting. XML 
deals with lists OK since a list is a subset of a tree - each node in a tree 
is a list of nodes.

But for tables, it's clunky because you're having lots of nodes with the same 
structure under a parent node. Repeating that structure gets laborious to 
type if you're a human and is laborious to process if you're a computer. For 
a table of tuples, it's much easier for all parties to deal with:

email,name
alaric@a...,Alaric Snell
paul@p...,Paul Prescod
foo@b...,"Comma Containing, Mrs"

...than with the XML, which is at best:

<table>
 <field email="alaric@a..." name="Alaric Snell" />
 <field email="paul@p..." name="Paul Prescod" />
 <field email="foo@b..." name="Ampersand containing, Mr &amp; Mrs" />
</table>

...and of course for graphs you need a system of primary keys and pointers 
like id= - and you have to build your graph over a hierarchy which isn't 
always optimal; at worst you use the hierarchy to build a list of key-value 
pairs, then use the list of key-value pairs with 'reference' nodes in the 
values to build a graph :-/
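To make the table comparison concrete, here is a rough Python sketch that reads the same tuples from both forms using only the standard library. The function names are made up for illustration, and the XML is the version above with well-formed attribute quoting; the addresses are the elided ones from the examples, left as they are.

```python
import csv
import io
import xml.etree.ElementTree as ET

CSV_TEXT = '''email,name
alaric@a...,Alaric Snell
paul@p...,Paul Prescod
foo@b...,"Comma Containing, Mrs"
'''

XML_TEXT = '''<table>
 <field email="alaric@a..." name="Alaric Snell" />
 <field email="paul@p..." name="Paul Prescod" />
 <field email="foo@b..." name="Ampersand containing, Mr &amp; Mrs" />
</table>'''


def rows_from_csv(text):
    """Return (email, name) tuples; the header row names the columns once."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["email"], row["name"]) for row in reader]


def rows_from_xml(text):
    """Return (email, name) tuples; every row repeats the attribute names."""
    root = ET.fromstring(text)
    return [(field.get("email"), field.get("name"))
            for field in root.findall("field")]
```

Both give a list of tuples in the end; the difference is how much of the input is spent restating the column names on every row.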

Or how about a multiple-parent hierarchy, hmm? A family tree?

Not so bad in CSV:

name,mother,father
Alaric Snell,Karin Owens,Lionel Snell
Karin Owens,Jean Byrne,James Owens
Lionel Snell,moment of shame as I forget my grandparents' names...
...

or

id,name,motherId,fatherId
1,Alaric Snell,2,3
2,Karin Owens,..,..
3,Lionel Snell,..,..
...

if name clashes are a concern!
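As a sketch of how little work the id-keyed form needs on the receiving end, the following Python turns it into a graph of linked records in two passes. The `load_family` name is hypothetical, and I've assumed an empty field means "parent not recorded" (the example above just writes `..` for those).

```python
import csv
import io

# Assumption: unknown parents are left as empty fields.
FAMILY_CSV = '''id,name,motherId,fatherId
1,Alaric Snell,2,3
2,Karin Owens,,
3,Lionel Snell,,
'''


def load_family(text):
    """Return {id: person}, with mother/father pointing at person dicts."""
    people = {}
    # First pass: one record per row, keyed by id.
    for row in csv.DictReader(io.StringIO(text)):
        people[row["id"]] = dict(row)
    # Second pass: swap the id columns for direct references (or None).
    for person in people.values():
        person["mother"] = people.get(person.pop("motherId"))
        person["father"] = people.get(person.pop("fatherId"))
    return people
```

After loading, `family["1"]["father"]` is the very same object as `family["3"]`, so a person with several children is shared rather than copied.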

Now, of course, the counter argument is that you put up with XML being clunky 
at tuples or graphs because it's good at hierarchies and lists and bearable 
at tuples and graphs so you overall have something that can kind of manage 
everything... XML got in the door by also being useful for a data structure 
generally ignored by the 'data' crowd, "text with metadata wrapped around 
spans of it".

In a world where most information (by bulk) is table-shaped, the most 
interesting information is graph-shaped, and merely the most fashionable 
information (I'm talking the WWW in general here, not just XML) is 
tree-of-annotated-text shaped, what kind of data transfer system do I 
want? One that gets out the way, and just gets my information from A to B 
with the minimum of effort (even if it is a pair of superimposed labelled 
directed acyclic graphs because I'm comparing the dataflow and control flow 
behaviour of a piece of code!).

Many years ago I wrote a document processing system - it produced FAQ lists 
in ten or so different formats, PostScript / dvi / pdf / plain html / html 
with navigation gadgets in it / html as one big file / info / nicely 
formatted plain text - and a few other little ones.

As an input format, I used s-expressions. s-expressions aren't as simple as 
XML for attaching styles to text; instead of:

<document>
 <title>Hello World</title>
 <p>This is a <b>nice</b> document with <i>many</i> styles</p>
</document>

...you did:

(document
   title: "Hello World"
   (p "This is a " (b "nice") " document with " (i "many") " styles")
)

Yep, those quotes are a bit irritating, aren't they? So I stuck M4 (a macro 
processing engine) in front and wrote a macro called <b> that expanded to " 
(b " and so on, so I could write:

(document
   title: "Hello World"
   (p "This is a <b>nice</b> document with <i>many</i> styles")
)

Inspired by HTML, you see. But I still preferred the s-expressions for the 
non mixed content parts.
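The macro trick is easy to imitate. This is a Python stand-in, not the original M4 macros: it assumes an opening tag expands to `" (tag "` as described above, and that a closing tag expands to `") "`, which is what is needed to reproduce the hand-written form. The function name is made up.

```python
import re


def expand_inline_tags(text):
    """Rewrite <tag>...</tag> inside a string literal as nested s-expressions."""
    # <b>  ->  " (b "   : close the string, open the node, reopen a string
    text = re.sub(r"<([A-Za-z][A-Za-z0-9]*)>", r'" (\1 "', text)
    # </b> ->  ") "     : close the string, close the node, reopen a string
    return re.sub(r"</[A-Za-z][A-Za-z0-9]*>", '") "', text)
```

Feeding it the `(p "This is a <b>nice</b> ...")` line gives back the fully quoted `(p "This is a " (b "nice") ...)` form. It does no nesting checks, so mismatched tags simply produce a broken s-expression.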

And kapow! I had a slight modification of s-expression syntax being my text 
input format. For completeness, here's a plain list in s-expr:

(1 2 3 4)

And here's a dictionary, in the case where you have a fixed set of keys to 
choose from:

(name: "Alaric Snell" email: "alaric@a...")

And for arbitrary keys:

(("name" "Alaric Snell")
 ("email" "alaric@a..."))

And here's a tree:

(html
  (head
    (title "My Document"))
  (body
    (p "This is an HTML document, in an alien form")))

And here's a table:

(table ('name 'email)
       ("Alaric Snell" "alaric@a...")
       ("Paul Prescod" "paul@p..."))
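For a sense of how small a reader for this notation can be, here is a minimal Python sketch. It is not my original parsing library: it only knows parentheses, double-quoted strings and bare atoms, keeps backslash escapes verbatim, and the `Symbol`, `parse_sexpr` and `table_rows` names are invented for the example.

```python
import re


class Symbol(str):
    """A bare atom such as table or 'name, as opposed to a quoted string."""


TOKEN = re.compile(r'''\s*(?:([()])|"((?:[^"\\]|\\.)*)"|([^\s()"]+))''')


def parse_sexpr(text):
    """Parse text into a list of top-level expressions (nested lists)."""
    text = text.rstrip()
    stack = [[]]
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        pos = match.end()
        paren, string, atom = match.groups()
        if paren == "(":
            stack.append([])
        elif paren == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif string is not None:
            stack[-1].append(string)
        else:
            stack[-1].append(Symbol(atom))
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return stack[0]


def table_rows(expr):
    """Turn (table (cols...) (row...) ...) into a list of dicts."""
    head, columns, *rows = expr
    if head != "table":
        raise ValueError("not a table expression")
    names = [str(col).lstrip("'") for col in columns]
    return [dict(zip(names, row)) for row in rows]


TABLE_TEXT = '''(table ('name 'email)
       ("Alaric Snell" "alaric@a...")
       ("Paul Prescod" "paul@p..."))'''
```

`table_rows(parse_sexpr(TABLE_TEXT)[0])` yields one dict per row, with the column names stated once in the header list, much as in the CSV.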

Now s-expressions look pretty hierarchical here, but that's just a shorthand; 
they're really full graphs. I can't remember the graph syntax since I rarely 
used it but basically you can name any node with a bit of attached metadata 
and then say "Insert a reference to node X here" where you need it to create 
links. The link is explicit in the syntax, not implicit like with XML. The 
names you use for nodes aren't part of the data being transferred, they're 
just used for the transfer itself (like the choice of a namespace prefix in 
an XML document). It looked something like:

(family-tree
   (name: "Alaric" mother: @karin father: @lionel)
   
@karin:
   (name: "Karin" ...)

@lionel:
   (name: "Lionel" ...)

...

)

You can create a cyclic structure like so:

@cycle: ("Here is a cycle: " @cycle)

That's a list whose first element is a string and whose second element is 
itself.
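In memory, that is simply a list that contains itself. The Python below is a loose sketch of how a reader might resolve such labels after parsing; `Ref` and `link` are hypothetical names, and it only patches references that sit directly inside a labelled list, not deeper ones.

```python
class Ref:
    """Placeholder for '@label' until every labelled node has been read."""

    def __init__(self, label):
        self.label = label


def link(nodes):
    """Replace each Ref found directly in a labelled list with its target."""
    for node in nodes.values():
        for index, item in enumerate(node):
            if isinstance(item, Ref):
                node[index] = nodes[item.label]
    return nodes


# @cycle: ("Here is a cycle: " @cycle)
nodes = link({"cycle": ["Here is a cycle: ", Ref("cycle")]})
cycle = nodes["cycle"]
```

After linking, `cycle[1] is cycle` holds, and the label itself has vanished from the data: it was only needed during the transfer.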

Those closing brackets can be awkward since you don't know what they match up 
to, like minimized close tags in SGML - I long ago solved that for my 
problems although standard s-expressions don't do this; my s-expression 
parsing library supports a second syntax:

[foo[
  ...
]foo]

is syntactic sugar for:

(foo ...)
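A rough Python sketch of that desugaring, with the one check that makes the long form worth typing: each close tag has to name the node it closes. This is not the original library code; `desugar` is a made-up name, and it naively rewrites tag-like sequences even inside quoted strings.

```python
import re

# Matches either an opening [name[ or a closing ]name].
TAG = re.compile(r"\[([^\s\[\]()]+)\[|\]([^\s\[\]()]+)\]")


def desugar(text):
    """Rewrite [foo[ ... ]foo] as (foo ... ), checking that the names match."""
    open_names = []
    out = []
    pos = 0
    for match in TAG.finditer(text):
        out.append(text[pos:match.start()])
        pos = match.end()
        opened, closed = match.groups()
        if opened is not None:
            open_names.append(opened)
            out.append("(" + opened)
        else:
            if not open_names or open_names[-1] != closed:
                raise ValueError(f"]{closed}] does not match an open tag")
            open_names.pop()
            out.append(")")
    if open_names:
        raise ValueError(f"[{open_names[-1]}[ is never closed")
    out.append(text[pos:])
    return "".join(out)
```

So `desugar("[foo[ 1 2 ]foo]")` gives `(foo 1 2 )`, while a stray `]bar]` is reported as an error instead of silently closing the wrong node.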

Now, if the people who came up with XML as data had wandered across a nest of 
Lisp hackers instead of XML, might we not have seen something like my 
s-expression variant with the symmetric tags being produced, perhaps 
augmented by a syntax to embed nodes in text strings as I hacked together 
with a macro system, then namespaces for symbols defined, then a 
transformation language and a path language and so on defined?

Perhaps they'd been put off by DSSSL :-)

> There is no boundary between data and documents but of
> course there may be a point on the spectrum where XML produces small
> benefit (e.g. if CSV is all you need).

Yep. I just think we could have come up with a system that provides a better 
overall gain for humanity - and has fewer areas where it produces small 
benefit. The s-expression corresponding to a CSV file is little more complex 
than the CSV would be, so it's already less of a loss than using XML in that 
situation.

XML's gained a following, but the hype is waning already. It's lodged in many 
niches but, as I see it, it has failed to change The Web, which is what it was 
originally designed for... although XML with embedded stylesheets being served 
over HTTP and displayed by browsers would have been a better Web, the 
improvement is justifiably marginal compared to the costs to vested interests, 
so I'm not too surprised.

If Web services really come to pass in a big way, it will be more despite XML 
than because of it. If they'd been based on ONC RPC (remember, kids: it's 
extensible, copes quite happily with revisions to the standards in use, 
loosely coupled, easy to use, quite happy with lossy networks and so on, not 
a problem to debug in practice, an established standard, widely implemented, 
and near-optimally efficient in time and space), what would be the problem?

Currently my home machine is mounting Sunsite over the Global Interweb (tm) 
with NFS, which uses ONC RPC underneath. This is cool because I can do:

alaric@hate:/net/sunsite$ ls
0-Most-Packages  geography      lost+found  pub             recreation  usenet
aminet           geology        media       public          rfc         usr
bin              gnu            Mirrors     README          science     var
biology          IAFA-SITEINFO  misc        README.ftp      special     
computing        ic.doc         packages    README.layout   sun
core             incoming       park        README.login    tmp
etc              info           politics    README.uploads  unix

Now I don't think that would be *practical* with SOAP over HTTP! My ls 
request ran over UDP packets with sizes of 100-200 bytes - one packet to 
request, another packet to respond, no three-way handshakes or teardown and 
no TCP overhead wasting time reordering the response packets to make them 
come in the order the server sent (and I can submit requests without needing 
to wait for the response to the last request, which I don't think HTTP 1.1 
allows but I may be wrong).

The ONC RPC data model, XDR, is limiting in some ways, but they could easily 
have spent the time that went into making SOAP on rewriting it in terms of 
the s-expression data interchange format they'd have made up in the time they 
would have spent making XML, particularly if they'd spent some of the spare 
time making an alternative compact binary syntax for the sexprs (before you 
ask, one that's semantically identical to the textual sexprs and can be 
converted to and from same with a single, very simple, tool: output 
(SYNTAX_TEXT, parse (SYNTAX_BINARY, file)) or vice versa).
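To show what "semantically identical" buys you, here is a toy Python sketch of two interchangeable syntaxes for nested lists of strings. Everything in it is an assumption for illustration: the tag-and-length binary layout is invented, not a real format, and `parse` and `output` merely borrow the names used above. Malformed input is not handled gracefully.

```python
import struct

SYNTAX_TEXT, SYNTAX_BINARY = "text", "binary"


def output(syntax, value):
    """Serialise a string or nested list of strings to bytes."""
    if syntax == SYNTAX_TEXT:
        return _text(value).encode("utf-8")
    return _binary(value)


def parse(syntax, data):
    """Inverse of output() for the same syntax."""
    if syntax == SYNTAX_TEXT:
        return _parse_text(data.decode("utf-8"), 0)[0]
    return _parse_binary(data, 0)[0]


def _text(value):
    if isinstance(value, list):
        return "(" + " ".join(_text(item) for item in value) + ")"
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def _binary(value):
    # Hypothetical layout: 1 tag byte, 4-byte big-endian count, then payload.
    if isinstance(value, list):
        body = b"".join(_binary(item) for item in value)
        return b"L" + struct.pack(">I", len(value)) + body
    raw = value.encode("utf-8")
    return b"S" + struct.pack(">I", len(raw)) + raw


def _parse_text(s, i):
    while s[i].isspace():
        i += 1
    if s[i] == "(":
        items, i = [], i + 1
        while True:
            while s[i].isspace():
                i += 1
            if s[i] == ")":
                return items, i + 1
            item, i = _parse_text(s, i)
            items.append(item)
    if s[i] != '"':
        raise ValueError(f"unexpected character at offset {i}")
    chars, i = [], i + 1
    while s[i] != '"':
        if s[i] == "\\":
            i += 1  # take the escaped character literally
        chars.append(s[i])
        i += 1
    return "".join(chars), i + 1


def _parse_binary(data, i):
    tag = data[i:i + 1]
    (count,) = struct.unpack_from(">I", data, i + 1)
    i += 5
    if tag == b"S":
        return data[i:i + count].decode("utf-8"), i + count
    if tag != b"L":
        raise ValueError(f"unknown tag at offset {i - 5}")
    items = []
    for _ in range(count):
        item, i = _parse_binary(data, i)
        items.append(item)
    return items, i
```

Because both syntaxes describe the same tree, conversion is just `output(SYNTAX_TEXT, parse(SYNTAX_BINARY, data))` or the reverse, with no mapping layer in between.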

I guess that's part of what makes my blood boil about XML data and XML for 
interprocess communication and all that; the reinvention of wheels, with less 
care than the first time round :-(

Whew... what a lot of typing! I should be eating!

>   Paul Prescod

ABS

-- 
Oh, pilot of the storm who leaves no trace, Like thoughts inside a dream
Heed the path that led me to that place, Yellow desert screen
