Re: What is XML For?
On Tuesday 29 October 2002 16:20, Paul Prescod wrote:
> Alaric Snell wrote:
> > On Monday 28 October 2002 4:12 pm, Paul Prescod wrote:
> >...
> >
> >>>Because XML has a fragile data model, designed for publishing stuff
> >>>to a browser rather than transfer between applications?
> >>
> >>XML is based on SGML which was invented long before browsers as we
> >>know them.
> >
> > Yep, but that's orthogonal to my point; XML is based on ASCII, which
> > is based on binary, which has been around since (potentially) ancient
> > times in China (there being some evidence of binary arithmetic in
> > certain ancient Chinese symbolism).
>
> Argh. The XML _data model_ is not, as far as I know, much different
> from the implicit SGML "data model" which long predated browsers. If
> you believe that XML has a much "weaker" data model than SGML's (and
> not groves, which post-date SGML), then please present evidence.
> Otherwise let's leave aside the fact that XML was designed (in part)
> for browsers.

This isn't what I'm trying to say, though; the original question was why
XML doesn't have metadata to handle change control, stating how
applications should deal with elements and attributes they don't
recognise. My point is that while the PNG group had seen this problem
arise in formats like TIFF and decided to guard against it, the XML
group were thinking much more along the lines of "We publish a DTD;
everyone uses the DTD; we only publish a new DTD when we feel people
won't mind the upheaval and reprogramming". The PNG format is designed
so that different versions of everything interoperate as best they can,
while I've seen XML and SGML vocabularies that use a different DTD
identifier for each version precisely to keep the versions from ever
mixing!

> > But for tables, it's clunky because you have lots of nodes with the
> > same structure under a parent node. Repeating that structure gets
> > laborious to type if you're a human and is laborious to process if
> > you're a computer.
> > For a table of tuples, it's much easier for all parties to deal
> > with:
> >
> > email,name
> > alaric@a...,Alaric Snell
> > paul@p...,Paul Prescod
> > foo@b...,"Comma Containing, Mrs"
>
> No, actually, I don't see it as being easier for a computer to keep a
> stateful mapping of columns to column titles.

Storing the list of column titles from the first line is easier than
what, the state management of an XML parser?

> And although the compressed form is more convenient for a TYPIST, it
> is actually much easier to read the XML spelling out the attribute
> names. If you are adding only one entry (as opposed to typing
> thousands), I also prefer the XML because you can't forget whether
> the "0" is the UID or the GID (to use an /etc/passwd example).

/etc/passwd doesn't *have* column headings though; it's a bad,
non-extensible, non-flexible format :-)

> > ...and of course for graphs you need a system of primary keys and
> > pointers like id= - and you have to build your graph over a
> > hierarchy, which isn't always optimal; at worst you use the
> > hierarchy to build a list of key-value pairs, then use the list of
> > key-value pairs with 'reference' nodes in the values to build a
> > graph :-/
>
> At worst it isn't much worse than any other serialization of a graph.
> Reading serialized graphs is always a pain.

It's quite nice with the s-expr notation for it, though. See below.

> > Or how about a multiple-parent hierarchy, hmm? A family tree?
> >
> > Not so bad in CSV:
> >
> > name,mother,father
> > Alaric Snell,Karin Owens,Lionel Snell
> > Karin Owens,Jean Byrne,James Owens
> > Lionel Snell,moment of shame as I forget my grandparents' names...
> > ...
>
> <person name="Alaric Snell" mother="Karin Owens" father="Lionel Snell"/>
> <person name="Karin Owens" mother="Jean Byrne" father="James Owens"/>
> ...
>
> It isn't as if CSV handles this issue naturally and the XML is
> massively complex.
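As a concrete aside: the "stateful mapping of columns to column titles"
is a couple of lines with any off-the-shelf CSV reader, and the
multiple-parent graph falls straight out of the rows. A Python sketch
(sample rows adapted from the ones quoted above; the variable names are
mine), quoted commas and all:

```python
import csv, io

# Sample rows in the shape quoted above; the quoted field shows why a
# real CSV reader (not a naive split-on-comma) is needed.
data = '''name,mother,father
Alaric Snell,Karin Owens,Lionel Snell
Karin Owens,Jean Byrne,James Owens
"Comma Containing, Mrs",Jean Byrne,James Owens
'''

# One pass builds the multiple-parent graph: child -> (mother, father).
parents = {row["name"]: (row["mother"], row["father"])
           for row in csv.DictReader(io.StringIO(data))}

print(parents["Alaric Snell"])            # ('Karin Owens', 'Lionel Snell')
print(parents["Comma Containing, Mrs"])   # ('Jean Byrne', 'James Owens')
```

The header row does the job of the repeated attribute names, once per
file rather than once per record.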
You're using a short, fat tree to store another tree; my point was that
XML's hierarchy is only one particularly limited kind, beyond which you
need to fall back on the same tricks.

> > Now, of course, the counter-argument is that you put up with XML
> > being clunky at tuples or graphs because it's good at hierarchies
> > and lists and bearable at tuples and graphs, so overall you have
> > something that can kind of manage everything...
>
> Any serialization of graphs is going to be clunky (XML is no worse
> than CSV or s-expressions) and I personally have no problem with XML
> for tuples.

s-exprs are nicer since they EXPLICITLY express the graph structure;
general tools can deal with the graph. With XML, there is no way to
pull out the references other than parent/child relationships, except
where people have used XLink, which does not currently seem to be as
ubiquitous as one would like.

> >...
> > [s-expression idea]
> >
> > And kapow! I had a slight modification of s-expression syntax as my
> > text input format. For completeness, here's a plain list in s-expr:
>
> Yes, we discussed all of this five years ago when we were defining
> XML (and years before that, and every year since then)! If you don't
> mind having two syntaxes, sure, you can do better than XML for
> certain kinds of data. You could even have three or four or five
> syntaxes for different kinds of data. But XML demonstrably hits the
> 80/20 point such that most developers think it is good enough for
> both data and documents.

How do you mean, multiple syntaxes? I presented a single syntax that
handles many cases neatly. My thing for embedding styling in s-expr
strings was a grotesque hack to make it look like HTML; a more elegant
shorthand could be had, I'm sure.

> > ...
> > Now s-expressions look pretty hierarchical here, but that's just a
> > shorthand; they're really full graphs.
> > I can't remember the graph syntax since I rarely used it, but
> > basically you can name any node with a bit of attached metadata and
> > then say "insert a reference to node X here" where you need it to
> > create links.
>
> You aren't talking about s-expressions anymore. You're talking about
> a syntax you invented.

No I'm not. This is standard s-expr stuff from the Lisp days. Here it
is! It uses # symbols, not @s - my bad!

http://www.cs.rice.edu/CS/PLT/packages/doc/mzscheme/node153.htm

...ugh, they mandate numeric identifiers; I prefer my interpretation
with symbols after all :-)

> >...
> > Now, if the people who came up with XML as data had wandered across
> > a nest of Lisp hackers instead, might we not have seen something
> > like my s-expression variant with the symmetric tags being
> > produced, perhaps augmented by a syntax to embed nodes in text
> > strings as I hacked together with a macro system, then namespaces
> > for symbols defined, then a transformation language and a path
> > language and so on defined?
>
> Sure, if S-expressions had the features you propose (they don't)

They have the graph feature, and the other two are just syntactic sugar
to please people who don't like all the ()s.

> and a schema language.

We can *make* a schema language. I'm comparing XML with s-expressions
here, not "the world of XML" with just s-expressions; if we did that,
we'd include Scheme as our transformation and schema language, no?

> I've been through this argument too many times:
>
> http://www.prescod.net/xml/sexprs.html

Ok, let's take a look.

1) Starting with Syntax

Very much in the eye of the beholder :-) I prefer s-expressions, you
prefer XML. Ok.

2) Redundancy is Good

I showed a trivial fix for this problem with my [foo[ ... ]foo]
concept, which I've implemented with success before, and would submit
to the Scheme standards bodies if I thought it mattered *that* much.

3) Family matters

Ahah!
So you're comparing complex, limited, and inconsistent schemas,
infosets, XPaths, XQueries, and XSLT against Scheme and (whisper, lest
you waken him!) his big brother Lisp? Muhahahah!

3a) Principle of Least Power

That's missing a point - it confuses Power with Complexity, I suspect.
The most powerful languages can be the simplest. One can write a Scheme
interpreter with surprising ease, and the standard library that runs on
top of it is probably about the same size as an HTML DTD... ok, I
haven't actually seen byte counts for either, but I'm about equally
worried about the complexity of each. A Scheme runtime is going to be
similar in complexity to an XSLT engine, actually, and it'll be able to
do more!

Put another way: power is the size of the set of things you can do with
a tool, and is clearly an attribute we want to increase. Complexity is
a measure of the cost of performing tasks with the tool (including the
initial learning), and is to be reduced. If you assume power and
complexity are inextricably linked, then indeed you need to find a
happy medium: the least complex system that has enough power to fulfil
your needs. But Turing completeness can be had from a mathematical
system you can describe on the back of an envelope, and that can be
used to make a schema notation more expressive than XSD (I'm thinking
of a graph rewriting system here, in which valid XML documents are ones
that can be rewritten down to a single token, "VALID", or something
like that).

> > Yep. I just think we could have come up with a system that provides
> > a better overall gain for humanity - and has fewer areas where it
> > produces small benefit. The s-expression corresponding to a CSV
> > file is little more complex than the CSV would be, so it's already
> > less of a loss than using XML in that situation.
>
> You're presuming that there is no cost in having multiple syntaxes:
> different ones for different domains. But clearly there is _some_
> cost.

Not in this argument.
I do happen to be a fan of multiple interoperable syntaxes, but not of
multiple models. I'd like my s-expression to be expressible in memory
as pointers to cons cells, in front of me as a nest of brackets, and in
a database as a compact bit string...

> > XML's gained a following, but the hype is waning already. It's
> > lodged in many niches, but it's failed to change The Web as I see
> > it - what it was originally designed for... and although XML with
> > embedded stylesheets, served over HTTP and displayed by browsers,
> > would have been a better Web, the improvement is justifiably
> > marginal compared to the costs to vested interests, so I'm not too
> > surprised.
>
> The Web as we see it has stagnated since Microsoft wiped out
> Netscape. But all of the most innovative emerging technologies on
> that front are XML-based: RSS, SVG, XUL, XForms. It is way too early
> to say that XML didn't change the Web. Give it five years.

It hasn't changed it yet, though, and the momentum to do so seems to be
waning; once I kept hearing people ask "Um, should we be using XML for
this stuff?", but now they're back to their HTML templates and
(grumble) CGIs with code and HTML entwined.

> > Now I don't think that would be *practical* with SOAP over HTTP! My
> > ls request ran over UDP packets with sizes of 100-200 bytes - one
> > packet to request, another packet to respond, no three-way
> > handshake or teardown, and no TCP overhead wasting time reordering
> > the response packets to make them come in the order the server sent
>
> I guess the thousands of people using WebFolders are all
> hallucinating. ;) I would guess that there are more people using
> those every day than NFS (but not as many as SMB -- yet).

That thing's just dynamic uploading and downloading of files via HTTP,
isn't it? When I run a program over NFS, it's fast because the file is
demand-paged over the network; it doesn't have to wait until the whole
thing has been downloaded before it can get started!
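To make the demand-paging point concrete: the closest HTTP gets is a
byte-range request, where the client asks for just the "page" it needs
instead of the whole file. A Python sketch of the round trip (the toy
server, port, and data are all mine for illustration - I'm not claiming
this is what WebFolders actually does):

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

CONTENT = bytes(range(256)) * 16   # a 4096-byte stand-in for a file

class RangeHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"
    def do_GET(self):
        rng = self.headers.get("Range")
        if rng and rng.startswith("bytes="):
            # Toy parser: assumes the simple "bytes=start-end" form.
            start, end = (int(x) for x in rng[6:].split("-"))
            chunk = CONTENT[start:end + 1]
            self.send_response(206)  # Partial Content
            self.send_header("Content-Range",
                             "bytes %d-%d/%d" % (start, end, len(CONTENT)))
        else:
            chunk = CONTENT
            self.send_response(200)
        self.send_header("Content-Length", str(len(chunk)))
        self.end_headers()
        self.wfile.write(chunk)
    def log_message(self, *args):  # silence per-request logging
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), RangeHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = HTTPConnection(*server.server_address)
conn.request("GET", "/file", headers={"Range": "bytes=100-199"})
resp = conn.getresponse()
body = resp.read()
print(resp.status, len(body))   # 206 100 - only the page we asked for
conn.close()
server.shutdown()
```

That covers reads; as I say below, writes, permissions, and directory
listings are where HTTP-as-filesystem really falls down.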
And any other file that is read non-sequentially or partially will be
unwieldy too... and I suppose these Web Folders use HTTP to fetch
directory information as well (in case you hadn't guessed, I've not had
the privilege myself), so it'll hardly be nippy to navigate, will it?
They might be using range requests to speed up reading, and perhaps you
can PUT to a range in a resource and have the server understand you,
but even then you still have no standard way to represent chmod, chown,
and chgrp, or ACL changes! I don't think HTTP is a usable file-sharing
protocol. Just imagine mounting /var over HTTP as the mail spool for a
failed-over mail cluster :-) And how practical is an ls -lR over these
Web Folders, hmm? I suspect it's no more than a nice user interface for
downloading and uploading files rather than something you'd actually
want to use as a filesystem.

> > (and I can submit requests without needing to wait for the response
> > to the last request, which I don't think HTTP 1.1 allows but I may
> > be wrong).
>
> That's what HTTP 1.1 pipelining is.

I was wrong, then; good job I put that qualifier in :-) But how does it
get around the underlying problem in TCP? Does it open extra TCP
connections to handle parallel requests?

> > I guess that's part of what makes my blood boil about XML data and
> > XML for interprocess communication and all that; the reinvention of
> > wheels, with less care than the first time round :-(
>
> If it had been done right the first time around there would be no
> temptation to do it again. It surprises me that there are still
> people who don't "get" that schemas are an important part of the
> difference. Mixed content is another important part.

Schemas are hardly new to XML. Nothing in XML is new; you can't say
that this feature or that feature is critical.
All you can say is that this *combination* of things is critical, and
that the features we left out which other systems had are either not
useful because of X or would spoil the ensemble because of Y.

> Of course, I'm not defending SOAP, and the idea of something like NFS
> running on top of SOAP running on top of HTTP makes me cringe.

I know, don't worry; I don't doubt your sanity, just your reasoning :-)

>  Paul Prescod

ABS -- A city is like a large, complex, rabbit - ARP