XML-DEV Mailing List Archive

Re: What is XML For?
On Monday 28 October 2002 4:12 pm, Paul Prescod wrote:
> Alaric B. Snell wrote:
> >...
> >
> > Because XML has a fragile data model, designed for publishing stuff to a
> > browser rather than transfer between applications?
>
> XML is based on SGML which was invented long before browsers as we know
> them.

Yep, but that's orthogonal to my point; XML is based on ASCII, which is based on binary, which has been around since (potentially) ancient times in China (there being some evidence of binary arithmetic in certain ancient Chinese symbolism). XML was designed sometime in the late 90s; quite incidentally they decided to subset SGML because it seemed a good base for solving the problem at hand (and scarily enough, I agree with them! XML's great as an SGML-lite :-)

[why XML for data]

> It arises naturally from the observation that structured data (tuple
> structured, hierarchically structured, graph structured, recursive) is
> a subset of the kinds of data you will find in the documents XML was
> designed to handle. A telephone book is tuple-structured. An airplane
> manual is mostly hierarchically structured but with frequent escapes to
> graph structure.

But for complex data interchange the hierarchy of XML can be limiting. XML deals with lists OK since a list is a subset of a tree - each node in a tree is a list of nodes. But for tables, it's clunky because you have lots of nodes with the same structure under a parent node. Repeating that structure is laborious to type if you're a human and laborious to process if you're a computer. For a table of tuples, it's much easier for all parties to deal with:

    email,name
    alaric@a...,Alaric Snell
    paul@p...,Paul Prescod
    foo@b...,"Comma Containing, Mrs"

...than with the XML, which is at best:

    <table>
      <field email="alaric@a..." name="Alaric Snell" />
      <field email="paul@p..." name="Paul Prescod" />
      <field email="foo@b..." name="Ampersand containing, Mr &amp; Mrs" />
    </table>

...and of course for graphs you need a system of primary keys and pointers like id= - and you have to build your graph over a hierarchy, which isn't always optimal; at worst you use the hierarchy to build a list of key-value pairs, then use the list of key-value pairs with 'reference' nodes in the values to build a graph :-/

Or how about a multiple-parent hierarchy, hmm? A family tree? Not so bad in CSV:

    name,mother,father
    Alaric Snell,Karin Owens,Lionel Snell
    Karin Owens,Jean Byrne,James Owens
    Lionel Snell,moment of shame as I forget my grandparents' names...
    ...

or

    id,name,motherId,fatherId
    1,Alaric Snell,2,3
    2,Karin Owens,..,..
    3,Lionel Snell,..,..
    ...

if name clashes are a concern!

Now, of course, the counter argument is that you put up with XML being clunky at tuples or graphs because it's good at hierarchies and lists and bearable at tuples and graphs, so overall you have something that can kind of manage everything... XML got in the door by also being useful for a data structure generally ignored by the 'data' crowd, "text with metadata wrapped around spans of it".

In a world where most information (by bulk) is table-shaped, the most interesting information is graph-shaped, and merely the most fashionable information (I'm talking the WWW in general here, not just XML) is tree-of-annotated-text shaped, what kind of data transfer system do I want? One that gets out of the way, and just gets my information from A to B with the minimum of effort (even if it is a pair of superimposed labelled directed acyclic graphs because I'm comparing the dataflow and control flow behaviour of a piece of code!).

Many years ago I wrote a document processing system - it produced FAQ lists in ten or so different formats: PostScript / DVI / PDF / plain HTML / HTML with navigation gadgets in it / HTML as one big file / info / nicely formatted plain text - and a few other little ones. As an input format, I used s-expressions.
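For what it's worth, the id-keyed family-tree table shown earlier in this message turns into a real multiple-parent graph with very little code. What follows is only a minimal Python sketch, not anything from the original system: the function name `load_family` and the dictionary layout are made up for illustration, and the parent ids that the message elides ("..") are left blank rather than guessed.

```python
import csv
import io

# The id-keyed table from the message. Parent ids elided there ("..")
# are left blank here rather than invented.
FAMILY_CSV = """\
id,name,motherId,fatherId
1,Alaric Snell,2,3
2,Karin Owens,,
3,Lionel Snell,,
"""


def load_family(text):
    """Return {id: person}; 'mother' and 'father' point at other person dicts or None."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # First pass: create one node per row, keyed by its primary key.
    people = {
        row["id"]: {"name": row["name"], "mother": None, "father": None}
        for row in rows
    }
    # Second pass: replace the foreign keys with direct references.
    for row in rows:
        person = people[row["id"]]
        person["mother"] = people.get(row["motherId"])  # blank id -> None
        person["father"] = people.get(row["fatherId"])
    return people


family = load_family(FAMILY_CSV)
alaric = family["1"]
print(alaric["name"], "<-", alaric["mother"]["name"], "+", alaric["father"]["name"])
```

The two passes are the "primary keys and pointers" step in miniature: the table carries the keys, and the links only become real references once every node exists.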
S-expressions aren't as simple as XML for attaching styles to text; instead of:

    <document>
      <title>Hello World</title>
      <p>This is a <b>nice</b> document with <i>many</i> styles</p>
    </document>

...you did:

    (document title: "Hello World"
      (p "This is a " (b "nice") " document with " (i "many") " styles")
    )

Yep, those quotes are a bit irritating, aren't they? So I stuck M4 (a macro processing engine) in front and wrote a macro called <b> that expanded to " (b " and so on, so I could write:

    (document title: "Hello World"
      (p "This is a <b>nice</b> document with <i>many</i> styles")
    )

Inspired by HTML, you see. But I still preferred the s-expressions for the non-mixed-content parts. And kapow! I had a slight modification of s-expression syntax as my text input format.

For completeness, here's a plain list in s-expr:

    (1 2 3 4)

And here's a dictionary, in the case where you have a fixed set of keys to choose from:

    (name: "Alaric Snell" email: "alaric@a...")

And for arbitrary keys:

    (("name" "Alaric Snell") ("email" "alaric@a..."))

And here's a tree:

    (html
      (head (title "My Document"))
      (body (p "This is an HTML document, in an alien form")))

And here's a table:

    (table ('name 'email)
      ("Alaric Snell" "alaric@a...")
      ("Paul Prescod" "paul@p..."))

Now s-expressions look pretty hierarchical here, but that's just a shorthand; they're really full graphs. I can't remember the graph syntax since I rarely used it, but basically you can name any node with a bit of attached metadata and then say "Insert a reference to node X here" where you need it to create links. The link is explicit in the syntax, not implicit like with XML. The names you use for nodes aren't part of the data being transferred, they're just used for the transfer itself (like the choice of a namespace prefix in an XML document). It looked something like:

    (family-tree
      (name: "Alaric" mother: @karin father: @lionel)
      @karin: (name: "Karin" ...)
      @lionel: (name: "Lionel" ...)
      ...
    )

You can create a cyclic structure like so:

    @cycle: ("Here is a cycle: " @cycle)

That's a list whose first element is a string and whose second element is itself.

Those closing brackets can be awkward since you don't know what they match up to, like minimized close tags in SGML. I long ago solved that for my own problems, although standard s-expressions don't do this; my s-expression parsing library supports a second syntax:

    [foo[ ... ]foo]

is syntactic sugar for:

    (foo ...)

Now, if the people who came up with XML as data had wandered across a nest of Lisp hackers instead of XML, might we not have seen something like my s-expression variant with the symmetric tags being produced, perhaps augmented by a syntax to embed nodes in text strings as I hacked together with a macro system, then namespaces for symbols defined, then a transformation language and a path language and so on defined? Perhaps they'd been put off by DSSSL :-)

> There is no boundary between data and documents but of
> course there may be a point on the spectrum where XML produces small
> benefit (e.g. if CSV is all you need).

Yep. I just think we could have come up with a system that provides a better overall gain for humanity - and has fewer areas where it produces small benefit. The s-expression corresponding to a CSV file is little more complex than the CSV would be, so it's already less of a loss than using XML in that situation.

XML's gained a following, but the hype is waning already. It's lodged in many niches but, as I see it, it's failed to change The Web, which is what it was originally designed for... although XML with embedded stylesheets being served over HTTP and displayed by browsers would have been a better Web, the improvement is justifiably marginal compared to the costs to vested interests, so I'm not too surprised.

If Web services really come to pass in a big way, it will be more despite XML than because of it.
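Going back to the symmetric-tag sugar for a moment: the claim that `[foo[ ... ]foo]` is just another spelling of `(foo ...)` is easy to check with a toy reader. The Python below is a rough sketch under loud assumptions, not a reconstruction of the parsing library mentioned in this message. The token rules, the `parse` function, and the choice to return plain nested lists are all invented for illustration, strings and symbols both come back as Python `str`, and `@name` references and `key:` keywords are treated as ordinary atoms rather than resolved.

```python
import re

# Hypothetical token grammar: named brackets, plain parens, strings, atoms.
TOKEN = re.compile(r'''\s*(?:
    (?P<open_named>\[(?P<oname>[^\s\[\]()"]+)\[) |
    (?P<close_named>\](?P<cname>[^\s\[\]()"]+)\]) |
    (?P<open>\() |
    (?P<close>\)) |
    (?P<string>"(?:[^"\\]|\\.)*") |
    (?P<atom>[^\s()\[\]"]+)
)''', re.VERBOSE)


def parse(text):
    """Parse text into a list of top-level forms (nested Python lists of str)."""
    text = text.rstrip()
    stack = [[]]   # lists under construction; stack[0] collects top-level forms
    names = []     # expected closing name per open list (None for plain parens)
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        pos = m.end()
        if m.group("open_named"):
            # [foo[ opens a list whose first element is the symbol foo.
            stack.append([m.group("oname")])
            names.append(m.group("oname"))
        elif m.group("open"):
            stack.append([])
            names.append(None)
        elif m.group("close") or m.group("close_named"):
            if len(stack) == 1:
                raise SyntaxError("close bracket with nothing open")
            expected = names.pop()
            if m.group("cname") != expected:
                raise SyntaxError(f"mismatched close, expected {expected!r}")
            finished = stack.pop()
            stack[-1].append(finished)
        elif m.group("string"):
            stack[-1].append(m.group("string")[1:-1])  # drop the quotes
        else:
            stack[-1].append(m.group("atom"))
    if len(stack) != 1:
        raise SyntaxError("unclosed list at end of input")
    return stack[0]


print(parse("[foo[ 1 2 ]foo]") == parse("(foo 1 2)"))
```

The only extra work the named form needs is remembering which name opened each list, which is exactly what lets a reader report a mismatched close at the point where it happens.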
If Web services had been based on ONC RPC (remember, kids: it's extensible, copes quite happily with revisions to the standards in use, loosely coupled, easy to use, quite happy with lossy networks and so on, not a problem to debug in practice, an established standard, widely implemented, and near-optimally efficient in time and space), what would be the problem?

Currently my home machine is mounting Sunsite over the Global Interweb (tm) with NFS, which uses ONC RPC underneath. This is cool because I can do:

    alaric@hate:/net/sunsite$ ls
    0-Most-Packages  geography      lost+found  pub             recreation  usenet
    aminet           geology        media       public          rfc         usr
    bin              gnu            Mirrors     README          science     var
    biology          IAFA-SITEINFO  misc        README.ftp      special
    computing        ic.doc         packages    README.layout   sun
    core             incoming       park        README.login    tmp
    etc              info           politics    README.uploads  unix

Now I don't think that would be *practical* with SOAP over HTTP! My ls request ran over UDP packets with sizes of 100-200 bytes - one packet to request, another packet to respond, no three-way handshakes or teardown, and no TCP overhead wasting time reordering the response packets to make them come in the order the server sent (and I can submit requests without needing to wait for the response to the last request, which I don't think HTTP 1.1 allows, but I may be wrong).
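To give a feel for why those packets stay small, here is a rough Python sketch of the two XDR building blocks that ONC RPC messages lean on most: big-endian 32-bit integers, and length-prefixed strings padded to a multiple of four bytes. It is an illustration only. The `(id, name)` record and the XML-ish rendering it is compared with are hypothetical, and this is not the actual layout of an NFS directory-listing reply.

```python
import struct


def xdr_uint(value):
    """Encode an unsigned 32-bit integer as 4 big-endian bytes."""
    return struct.pack(">I", value)


def xdr_string(text):
    """Encode a string as a 4-byte length, the bytes, then zero padding to a 4-byte boundary."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * padding


def encode_entry(file_id, name):
    """Hypothetical (id, name) record, invented purely for the size comparison."""
    return xdr_uint(file_id) + xdr_string(name)


names = ["pub", "rfc", "usenet"]  # three names from the ls listing
binary = b"".join(encode_entry(i, n) for i, n in enumerate(names, start=1))
textual = "".join(
    f"<entry><id>{i}</id><name>{n}</name></entry>"
    for i, n in enumerate(names, start=1)
).encode("ascii")

print(len(binary), "bytes as XDR-style binary")
print(len(textual), "bytes as tagged text")
```

The fixed four-byte alignment wastes a little space on short strings, but the receiver can walk the buffer with nothing more than a length read and a pointer bump.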
The ONC RPC data model, XDR, is limiting in some ways, but they could easily have spent the time that went into making SOAP on rewriting it instead in terms of the s-expression data interchange format they'd have made up in the time they would have spent making XML, particularly if they'd spent some of the spare time making an alternative compact binary syntax for the sexprs (before you ask, one that's semantically identical to the textual sexprs and can be converted to and from same with a single, very simple, tool: output (SYNTAX_TEXT, parse (SYNTAX_BINARY, file)) or vice versa).

I guess that's part of what makes my blood boil about XML data and XML for interprocess communication and all that; the reinvention of wheels, with less care than the first time round :-(

Whew... what a lot of typing! I should be eating!

> Paul Prescod

ABS

-- 
Oh, pilot of the storm who leaves no trace, Like thoughts inside a dream
Heed the path that led me to that place, Yellow desert screen