----- Original Message -----
From: "Elliotte Rusty Harold" <elharo@m...>
To: <xml-dev@l...>
Sent: Tuesday, January 15, 2002 9:10 PM
Subject: RE: Xml is _not_ self describing

>I guess it depends on what exactly you mean by "self-describing". I
>think a book about the English language written in English is
>self-describing in and of itself, whether anybody speaks English or
>not.

I agree with you. The RELAX NG schema for RELAX NG is also self-describing. But if you read only the schema, you will spend many, many, many hours before you even begin to find out what the document means, and I am speaking of human intelligence here.

Maybe the term "self-describing" should be made more precise by specifying the intended audience and purpose of the self-description. The -ing form is tricky: "self-describing" seems to mean that the data by itself can reify its meaning.

- An XML document without any associated DTD is not self-describing. It merely transmits data about a labeled tree; no metadata is available. You can check its well-formedness, but to do so you simply apply external well-formedness rules to the document.

- An XML document with an embedded DTD is self-describing for computers that know about XML and DTDs, and for validation purposes. The document itself provides information on how it has to be processed to be declared valid on its own terms.

- However, outside the bounds of very precise algorithms (validation), an XML document with an embedded DTD is not self-describing for computers in a more general processing context. Nothing tells the computer how the data should be processed. The document has no control over its own fate. An invoice document does not describe how it should be processed by an accounting system; that information comes from elsewhere.

The last point means that the hype 'because XML is self-describing, it is the lingua franca of computer science, and your integration costs will drop' is pure bullshit.
We know this for sure on this list, but explaining why requires a precise definition of what 'self-describing' means...

> When a document is marked up, the information of the markup is there,
> whether we recognize it or not. It is a property of the text itself,
> not a property of our perception of the text. With appropriate work,
> experience, intelligence, and luck that markup can be understood. Can
> unmarked up text be understood as well? Yes, certainly; but markup
> adds to the information content of the text. It makes it easier to
> decipher its meaning in a very practically useful way. This is a
> question of degree, and text+markup is easier to understand than text
> alone.

By carefully examining the data in a CSV file without column headers and applying clever heuristics, you can often find out what each column means (especially if you spot zip codes, city names that you know, etc.). And again, the fact that a CSV file is, well, comma-separated makes it easier to parse and use than the equivalent plain-text file. Formatting rules and markup certainly *do* add information if they are used consistently.

However, I don't think it is sensible to say that an XML file with unknown or foreign tag names is more interesting than a CSV file without headers. You get more information, because provided that you notice the pointy brackets and find out that some series of characters surrounded by <> or </> match, you can build a hierarchical model. But more information does not mean more meaning. There is nothing magic in XML that will give you the *meaning* of the hierarchical relation, or of the data embedded inside the tags, contrary to what the public may believe when hearing the term "self-describing".

That was the point of this "Xml is _not_ self-describing" thread: beware of the magic connotations of "self-describing".

Regards,
Nicolas
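The pointy-bracket argument can be illustrated with a minimal Python sketch using only the standard library's xml.etree.ElementTree. The two sample documents and the helper names (is_well_formed, shape, unlabeled_shape) are hypothetical illustrations, not anything taken from this thread. The parser can confirm that both documents are well-formed and recover the same hierarchy from each, but nothing in the parse says that "invoice" means something an accounting system could act on, or that "zq" means nothing at all. Note that ElementTree does not perform DTD validation, so this sketch covers only the well-formedness case from the first bullet.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: same structure, different tag names.
doc_a = "<invoice><total>42.00</total><due>2002-02-15</due></invoice>"
doc_b = "<zq><x1>42.00</x1><x2>2002-02-15</x2></zq>"


def is_well_formed(text: str) -> bool:
    """Apply the external XML well-formedness rules via the stdlib parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def shape(elem: ET.Element) -> tuple:
    """Return the labeled tree as nested (tag, text, children) tuples."""
    text = (elem.text or "").strip()
    return (elem.tag, text, tuple(shape(child) for child in elem))


def unlabeled_shape(elem: ET.Element) -> tuple:
    """Return the same tree with the labels dropped: only the hierarchy remains."""
    return tuple(unlabeled_shape(child) for child in elem)


root_a = ET.fromstring(doc_a)
root_b = ET.fromstring(doc_b)

print(is_well_formed(doc_a), is_well_formed(doc_b))
print(is_well_formed("<a><b></a></b>"))  # mismatched tags, so not well-formed
print(shape(root_a))
print(shape(root_b))
# The hierarchy is identical; the parser attaches no meaning to either tag set.
print(unlabeled_shape(root_a) == unlabeled_shape(root_b))
```

The labeled trees differ only in their tag strings, and it is up to some external convention (a schema, documentation, or application code written elsewhere) to decide what those strings mean.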