> -----Original Message-----
> From: Bullard, Claude L (Len) [mailto:clbullar@i...]
> Sent: 15 January 2002 20:46
> To: 'Elliotte Rusty Harold'; 'xml-dev@l...'
> Subject: RE: Xml is _not_ self describing
>
>
> I can't wait to see the XML.COM condensed
> version of this thread. :-)

And me, because hopefully then I can read it and understand
what the real issue is.

(...pauses...)

D'oh!

--

Seriously though, I gave a talk recently, introducing Markup and
XML to some Medical Informatics students. I outlined the overheads
of writing custom parsers for custom formats; suggested that
providing additional rules to structure data formats could improve
the situation; then explained why CSV is fragile and limited; and
then introduced labelled formats as the best solution.

I also made it clear that introducing grammatical rules such as
labelling doesn't necessarily say anything about the meaning of the
data following those rules (cf: Edward Lear). That's for a higher
layer.

They seemed to accept the benefits of this, and understood where
the limitations were.

So aside from the philosophy (interesting as it is) it seems to me
there's a fairly simple message to get across. Is there any real
evidence that there's been a failure to communicate it, beyond the
existing marketing-technology disconnects?

Personally I'm not sure I've seen it. Most developers I've worked
with just approach XML as syntax, and don't expect a whole lot more.

Cheers,

L.

--
Leigh Dodds, Research Group, Ingenta | "Pluralitas non est ponenda
http://weblogs.userland.com/eclectic | sine necessitate"
http://www.xml.com/pub/xmldeviant    | -- William of Ockham

>
> Is it there? We can split some fine hairs here, but
> often meaning has to be discovered from clues found
> elsewhere and then projected onto the text. Worse,
> the translations into an understanding readily shared
> can vary enormously such that any such original meaning
> is distorted or not provable as original until some
> acceptable number of texts are translated. There are
> linear markings from the Mystery Hill site (American
> Stonehenge) which some claim are Phoenician but are
> hotly contested otherwise. Before accepted, both
> the decipherers and the archaeologists have to
> find mutually reinforcing but quite separate
> evidence (previous examples of the text types and
> artifacts attributable to some past civilization).
>
> It may not be random but be meaningless: see the
> problems of assuming some astronomical signals
> were meaningful because they were regular (rotating
> and emitting). Non-randomness isn't meaningful
> per se. One can assume that a wedge-shaped tablet
> found in a collection of such is meaningful if other
> evidence indicates the site is a library, then start
> building up example sets until the key is discovered or a
> dictionary is created that is self-consistent to a
> tolerable degree. Otherwise, a Rosetta Stone is
> required.
>
> So it isn't that cut and dry. As I said in my
> reply to Mike, you can be looking for math only
> to discover belatedly, possibly by accident, that
> they were just saying Hi: Cheops Slept Here. Once
> you know about star alignments, some aspects of
> pyramid layouts make sense. Unfortunately,
> so does Stonehenge, Mystery Hill and a myriad
> of other sites - but it can't be proved and
> may not be true in each or every case.
>
> "Documents written in natural languages have meaning even if you don't
> speak those languages. They do carry information."
>
> That is so but until you learn them or someone who has tells you,
> you don't know what they mean. We are quite close to the
> "if the tree falls in the forest.." argument. The best I can
> do is say, yes it has meaning to someone and yes, strictly
> speaking, by establishing the non-randomness is purposeful, not
> a side effect of another regular process, we can agree there
> is information there. Shannon built modern communications
> by saying reproducibility, not semantics, is the key to
> designing communication systems.
>
> That said, we of course agree about the value of tagging regardless
> of whether we have the descriptions. XML is self-describing to
> the extent one understands the Rosetta Stone that is the
> XML 1.0 specification, then acquires by some evidence, a
> workable set of descriptions for the tag names. Doctor Goldfarb
> often points to glossing as the original modern form of hypertext
> and markup.
>
> All other things being equal, given some XML instance, I sure
> do prefer a well-documented schema or DTD to reading someone
> else's code to discover what I am supposed to expect and
> what to do about it. Or just Hide The XML and give me
> the stinkin' compiled application to install.
>
> len
>
> -----Original Message-----
> From: Elliotte Rusty Harold [mailto:elharo@m...]
>
> At 12:17 PM -0600 1/15/02, Bullard, Claude L (Len) wrote:
>
> >A label is not a name unless it is meaningful.
> >Natural language is not self-describing unless
> >you were taught it.
>
> I guess it depends on what exactly you mean by "self-describing". I
> think a book about the English language written in English is
> self-describing in and of itself, whether anybody speaks English or
> not. However, leaving that aside there's a deeper assumption I want
> to cut off before it becomes too embedded in the debate.
>
> Documents written in natural languages have meaning even if you don't
> speak those languages. They do carry information. They are not random
> strings of characters. I've been reading a lot about the theory and
> history of cryptography lately, and it's amazing just how much
> information you can pull out of ciphered text, because, in fact it
> isn't random. It's harder to read ciphered text than unciphered text,
> but it's not impossible. And that's a world of difference.
>
> Reading text in a language you don't speak, but which has not been
> deliberately encrypted, is a similar problem; and in fact some of the
> same techniques were applied to languages like Linear B and
> hieroglyphics that are used to break ciphers.
>
> When a document is marked up, the information of the markup is there,
> whether we recognize it or not. It is a property of the text itself,
> not a property of our perception of the text. With appropriate work,
> experience, intelligence, and luck that markup can be understood. Can
> unmarked up text be understood as well? Yes, certainly; but markup
> adds to the information content of the text. It makes it easier to
> decipher its meaning in a very practically useful way. This is a
> question of degree, and text+markup is easier to understand than text
> alone.
>
> Language is certainly important, but it is an orthogonal issue. Given
> the choice of data marked up in Ugaritic vs. the same data marked up
> in English, I pick English. But given the choice of data marked up in
> Ugaritic vs. the same data not marked up at all, I pick the data
> marked up in Ugaritic.
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://lists.xml.org/ob/adm.pl>
>