[XML-DEV Mailing List Archive] Re: An alternative formulation of the document-centric/data-ce
At 11:01 AM 6/3/2004 +0100, Sean McGrath wrote:
Document-centric XML:

This reminds me of a classic paper by Darrell Raymond and Frank Tompa called "Hypertext and the Oxford English Dictionary" from the Communications of the ACM in 1988 or so. At Waterloo (Tim Bray was also part of this work at the time) they had a research program on how to handle large text data/hypertexts like the OED, in preparation for creating electronic versions, and they did a lot of very clever analyses of the dictionary, which had just been turned into SGML via conversion from the typesetting tapes.

The paper includes several charts showing the distribution of (a) entry length, (b) number of tags per entry, (c) number of cross references, and so on, and either explicitly or implicitly they show tag-share in the dictionary to have the kind of distribution that Sean has in his analyses. Rick Jelliffe has some software that does the same sort of thing, which I saw demonstrated at the GCA XML conferences over the last year or so.

But I don't buy into this data-centric vs doc-centric view of the world. It is obviously a continuum (called the "Document Type Spectrum" in the Document Engineering book I'm writing with Tim McGrath [just about done, MIT Press early 2005]). On one end are pure narrative things and on the other end are purely transactional ones: Moby Dick to invoices. In the middle are hybrid types like catalogs and reference books that have lots of structured content mixed in with narrative content.

I always use Moby Dick as the endpoint when I talk about this because its opening line is "call me XML" or something like that. :-)

-bob glushko