Is marking up a classification act?
Hi,

As another sequel to the recent discussions about ontology, here is an inference I made about the act of marking up documents. Are there any issues raised by this statement: "Marking up documents is, in fact, classifying information"?

If I send you an XML document, I can attach its schema. That gives you its syntax constraints. What is missing, however, is the mental model behind the classification I used. Said differently, what is missing is an answer to the question "what do you mean by ...?". Thus, marking up a document is also expressing a view of the world, and this view is based on a mental model, logic and theories about the world.

Note about automatic classification: when an HTML or XHTML document is marked up, the markup gives me clues about which parts are headers and which are paragraphs. If I am a classification engine trying to discover other "tacit" views of the world expressed by this very document, I can give more weight to text contained in headers than to text contained in paragraphs. By convention, a header is supposed to summarize the text that follows and give, in a nutshell, its essence. If, in addition, a paragraph contains other tagged text, I can extract further information about that text.

However, some issues may be raised here.

a) My view of the world is not right. Then my marked-up text is not well classified, and this leads to classification errors.

b) I simply made a mistake. Again, same result as above. Just consider the number of errors an average programmer makes when writing a program. Programmers are lucky that compilers help correct them. What about natural language: what kind of compiler can help us prevent errors there?

c) The classification is fuzzy. The tagged item belongs 40% to one particular set (i.e. category), 20% to a different set and 40% to yet another. A human can easily resolve that classification ambiguity (although some can't). Can HAL resolve that? (We all know the result demonstrated in the movie.) Usually the membership is resolved by the overall context.

d) The task is so time consuming and error prone that, outside of a pleasant intellectual game played with the intent to learn something, I wouldn't do it for the other documents I write.

Conclusion: from the engineering point of view, I can design a language based on solid mathematical foundations. In practice, however, when I try to build a document that provides some information about the view of the world behind it, it is not that easy. I guess this is why people don't do it and instead let automatic agents such as search engines classify their documents.

My neighbor is now reassured: the planet of the computers, the Matrix or AI is not for tomorrow. We have not yet found a way to teach machines some common sense :-)

Cheers
Didier PH Martin
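
As a rough illustration of the note on automatic classification above, here is a minimal sketch of how an engine might give more weight to header text than to paragraph text. The weight table, the class name WeightedTermCounter and the function score_terms are all assumptions made up for this example; they are not taken from any real search engine, and the numbers are arbitrary.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Hypothetical weights: header text counts more than paragraph text.
# The values are arbitrary and only illustrate the idea.
TAG_WEIGHTS = {"h1": 3.0, "h2": 2.0, "h3": 1.5, "p": 1.0}


class WeightedTermCounter(HTMLParser):
    """Accumulates a score per term, weighted by the enclosing tag."""

    def __init__(self):
        super().__init__()
        self.stack = []          # currently open tags that carry a weight
        self.scores = Counter()  # term -> accumulated weight

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost weighted tag.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if not self.stack:
            return  # text outside any weighted tag is ignored
        weight = TAG_WEIGHTS[self.stack[-1]]
        for term in re.findall(r"[a-z]+", data.lower()):
            self.scores[term] += weight


def score_terms(html):
    """Return a dict mapping each term to its accumulated weight."""
    parser = WeightedTermCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.scores)


if __name__ == "__main__":
    sample = "<h1>Ontology</h1><p>Markup and ontology</p>"
    print(score_terms(sample))
```

Note that this only exploits the convention that a header summarizes what follows. It does nothing about issues a) to d): if the author's markup is wrong or the membership is fuzzy, the scores are simply wrong too.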