
Is marking up a classification act?

  • To: <xml-dev@l...>
  • Subject: Is marking up a classification act?
  • From: "Didier PH Martin" <martind@n...>
  • Date: Mon, 6 Oct 2003 10:13:25 -0400
  • Importance: Normal

Hi,

Another sequel to the recent discussions about ontology is an inference
I made about the act of marking up documents.

Are there any issues raised by this statement:

"Marking up documents is, in fact, classifying information".

If I send an XML document, I can attach its related schema. This will give
you its syntax constraints. However, what is missing is the mental model
behind the classification I used. Said differently, what is missing is an
answer to the question "what do you mean by...?". Thus, marking up a document
is also expressing a view of the world, and this view is based on a mental
model, logic and theories about the world.

Note about automatic classification:
When an HTML or XHTML document is marked up, the markup gives me clues about
which parts are headers and which are paragraphs. If I am a classification
engine trying to discover other "tacit" views of the world expressed by this
very document, I can allocate more weight to text contained in headers than
to text contained in paragraphs. A header is supposed, by convention, to
synthesize the text that follows and give, in a nutshell, its essence. If, in
addition, the paragraph contains other tagged text, I can extract additional
information about the text. However, some issues may be raised here.
a) My view of the world is not right. Then my marked-up text is not well
classified, and this leads to classification errors.
b) I simply made a mistake. Again, same result as above. Just consider the
number of errors an average programmer makes when writing a program.
Programmers are lucky that compilers help correct them. What about
natural language: what kind of compiler can help us prevent errors there?
c) The classification is fuzzy. The tagged item belongs 40% to a particular
set (i.e. category), 20% to a different set and 40% to yet another. A
human can easily resolve that classification ambiguity (though some can't).
Can Hal resolve that? (We all know the result demonstrated in the movie.)
Usually the ownership is resolved by the overall context.
d) The task is so time consuming and error prone that, outside a pleasant
intellectual game played with the intent to learn something, I wouldn't do
it for the other documents I am writing.
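
To make the header-weighting idea in the note above concrete, here is a minimal sketch in Python. It is only an illustration, not a real classification engine: the TAG_WEIGHTS table, the factor of 3 for headers, and the names WeightedTermCounter and weighted_terms are assumptions made up for this example. It uses only the standard html.parser module.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights (an assumption for this sketch): text inside a
# header counts three times as much as text inside a paragraph.
TAG_WEIGHTS = {"h1": 3.0, "h2": 3.0, "h3": 3.0, "p": 1.0}


class WeightedTermCounter(HTMLParser):
    """Accumulates a weight per term, based on the enclosing tag."""

    def __init__(self):
        super().__init__()
        self.stack = []          # open tags that carry a weight
        self.scores = Counter()  # term -> accumulated weight

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if not self.stack:
            return  # text outside any weighted tag is ignored
        weight = TAG_WEIGHTS[self.stack[-1]]
        for term in re.findall(r"[a-z]+", data.lower()):
            self.scores[term] += weight


def weighted_terms(html):
    """Return a Counter mapping each term to its accumulated weight."""
    parser = WeightedTermCounter()
    parser.feed(html)
    parser.close()
    return parser.scores


doc = ("<h1>Markup as classification</h1>"
       "<p>Markup gives clues about classification.</p>")
scores = weighted_terms(doc)
print(scores.most_common(3))
```

Note that the sketch inherits issues a) and b) directly: if the author put the wrong text in a header, the weights are wrong too, and nothing in the code can detect that.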

Conclusion:
From the engineering point of view, I can design a language that will be
based on solid mathematical foundations. However, in practice, when I am
trying to build a document that will provide some information about the view
of the world behind it, it is not that easy. I guess this is why people
don't do that and instead let automatic agents like search engines classify
their documents. My neighbor is now reassured: the planet of the computers,
the Matrix or AI is not for tomorrow. We have not yet found a way to teach
machines some common sense :-)

Cheers
Didier PH Martin


