At 9:39 AM -0500 1/15/02, Gavin Thomas Nicol wrote:
>On Tuesday 15 January 2002 08:26 am, Elliotte Rusty Harold wrote:
>> At 11:15 PM -0500 1/14/02, Gavin Thomas Nicol wrote:
>> >What happens if I don't
>> > a) read english
>>
>> 1. Ask a colleague who reads English
>> 2. Hire somebody to translate it into the language of choice
>> 3. Get a dictionary
>
>Let's change it to Maori then. Do you have 1) or 2)? How much would 2)
>cost?
>

If it were important to me I could get 2, though I'd probably try 3
first.

>The point is that XML can be as opaque as anything else, and that
>tags, in and of themselves, say little about overall semantics, and
>hardly anything about structure beyond encoding an attributed tree.
>

No, that misses the point completely. The point is not whether XML
*can* be as opaque as anything else. It is whether XML *is* as opaque
as anything else. In practice, XML *is* far less opaque than CSV and
similar formats. That's why it's important. And in practice tag names
do say something significant about the semantics of the document. It's
not everything, but not everything does not equal nothing.

>An attributed tree is admittedly a useful data structure, but not
>without some means for interpreting it.... and in that regard, XML is
>no better, and perhaps somewhat worse than CSV... because the signal
>to noise ratio is higher *if* the names are not intuitive to the
>interpreting entity.
>

The names can always be ignored if you desire to do so; but if you
choose to consider them, they are there. There is more information in
an XML document with the names intact than in the same document with
all the names stripped. Your signal-to-noise analogy is fallacious.
One of the defining characteristics of noise is that it cannot be
perfectly separated from the signal. In XML, tags are very
straightforwardly separated from the data.

>There *are* benefits to using XML well, and defining "largely
>interoperable" tag vocabularies (HTML).
>Those benefits spring not from
>XML, but rather, careful use thereof.

Careful use is good. But even careless use is likely to produce
significant benefits compared to untagged formats like CSV. For
example, here's some CSV data for you:

  9964.00, 72.58, 0.73

How meaningful is that? Indeed it has some meaning. With a little
effort, a little foreknowledge in the right domain, and a little luck
you can probably figure out what it is. However, the following has
more information:

  <Index>
    <price>9964.00</price>
    <absolutechange>72.58</absolutechange>
    <relativeChange>0.73 </relativeChange>
  </Index>

No standard schema. Not a lot of thought. I just made that up quickly.
It doesn't even use consistent naming conventions. But if I were faced
with pages full of numbers I'd much rather have them in the second
format than the first, especially when I know from experience that
eventually there will be missing fields, the column names will fail to
line up with the data, and I will have to deal with all the other
problems that arise in CSV data.

Not that these problems can't occur in XML, but XML, unlike CSV, is
fail-fast. I can very easily set up my systems so they notify me
immediately when faced with bad data, rather than sometime later when
I notice my brokerage just blew several million dollars because some
idiot swapped the absolute change and the relative change in the Dow.

-- 
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@m...            | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
|              http://www.ibiblio.org/xml/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |
+----------------------------------+---------------------------------+
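
A rough illustration of the fail-fast point made in the message, as a minimal sketch and not anything the author posted. It assumes Python's standard xml.etree.ElementTree and csv modules and reuses the element names from the made-up <Index> example; the functions read_index and read_csv_row are hypothetical names invented for this sketch. The XML reader looks fields up by name and raises as soon as the record is malformed or incomplete, while the positional CSV reader accepts swapped columns without complaint.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Element names copied from the <Index> example in the message
# (inconsistent capitalization included).
FIELDS = ("price", "absolutechange", "relativeChange")


def read_index(xml_text):
    """Hypothetical helper: parse one <Index> record, failing immediately on bad data."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    if root.tag != "Index":
        raise ValueError(f"unexpected root element: {root.tag!r}")
    record = {}
    for name in FIELDS:
        child = root.find(name)  # lookup by name, so element order is irrelevant
        if child is None or child.text is None:
            raise ValueError(f"missing field: {name}")
        record[name] = float(child.text)  # raises ValueError on non-numeric text
    return record


def read_csv_row(line):
    """Hypothetical helper: parse one CSV row by position only."""
    row = next(csv.reader(io.StringIO(line)))
    # Nothing in the data says which column is which, so swapped
    # columns are silently assigned to the wrong names.
    return dict(zip(FIELDS, (float(cell) for cell in row)))
```

With these definitions, an <Index> record whose absolutechange and relativeChange elements appear in the opposite order still yields the same values under the same names, and a record with an unclosed <price> tag raises ET.ParseError at parse time. The CSV row "9964.00, 0.73, 72.58" parses without error and reports 0.73 as the absolute change.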



