
RE: Is HTML structured or unstructured information?


> -----Original Message-----
> From: Jim Melton [mailto:jim.melton@a...] 
> Sent: Tuesday, August 09, 2005 12:10 PM
> To: ian.graham@u...
> Cc: DuCharme, Bob (LNG-CHO); 'Bullard, Claude L (Len)'; xml-dev@l...
> Subject: RE:  Is HTML structured or unstructured information?
> 
> I have a slightly different take on the distinction between
> "structured" and "unstructured" (and the less well understood
> "semi-structured").
> 
> I agree that SQL data is well structured, not because its 
> intended meaning is unambiguous (hah! you should see some of 
> the databases...but that's another rant), but because every 
> piece of information is "there".  SQL, of course, represents 
> data as rectangular structures called tables.  A table is a 
> structure, having a particular number of columns, in which 
> there are rows of data, each having exactly one value 
> corresponding to each column of the table.  SQL doesn't use 
> the word "cell", but it's convenient to use in this 
> discussion.  Every cell in every SQL table has a value.  That 
> value might be SQL's "null value", but the cell is always "there".
> 
> Unstructured data is...well, unstructured.  A decent example 
> is the text of this email message.  You might perceive 
> structure, such as paragraphs and sentences, but those are 
> artifacts of my use of common English/Western conventions, 
> not actual structure.  And, most importantly, there is no 
> single "thing" that you can identify that is required, 
> optional, or prohibited in this message.  There is no 
> structure at all.

That's true for the text - but the e-mail message as a whole may be
considered semi-structured regarding its inclusion of sender, receiver,
subject, etc.
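
A minimal sketch of that point, using Python's standard `email` module. The addresses and body text are made up for illustration. The header fields can be asked for by name, and a field that was never sent (Cc here) is simply absent, while the body comes back as one undifferentiated string.

```python
from email import message_from_string

# Hypothetical message, for illustration only.
raw = (
    "From: sender@example.org\n"
    "To: receiver@example.org\n"
    "Subject: Is HTML structured or unstructured information?\n"
    "\n"
    "Free-form body text with no required or prohibited parts.\n"
)

msg = message_from_string(raw)

# Header fields are named and individually addressable.
# A header that is not present (Cc) yields None rather than an error.
headers = {name: msg[name] for name in ("From", "To", "Subject", "Cc")}

# The body of a non-multipart message is a single string with no
# machine-visible parts.
body = msg.get_payload()

print(headers)
print(repr(body))
```

The headers behave like the semi-structured part: some fields are there, some may not be. The body is the unstructured part.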

Joe

Joseph Chiusano
Booz Allen Hamilton
O: 703-902-6923
C: 202-251-0731
Visit us online@ http://www.boozallen.com
 
> HTML, and (more importantly to many) XML, are semi-structured 
> by nature, although it is certainly possible to force 
> specific scenarios using those markup languages to be fully 
> structured (by requiring validation against a DTD or Schema 
> that makes everything mandatory, for example).  To me, 
> "semi-structured" means that there is structure there, but it 
> is not completely reliable.  Information may be missing
> entirely: not present but marked as "unknown", "missing",
> or "irrelevant" (analogous to some meanings of SQL's null
> value), but completely absent.
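
The contrast between a null cell and an absent element can be sketched with Python's standard `sqlite3` and `xml.etree.ElementTree` modules. The `person` table and the `<people>` document are hypothetical, chosen only to illustrate the distinction drawn above. In the table, the phone cell exists and holds null. In the XML, the phone element is not there at all.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: every row has a cell for every column, even if it is null.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, phone TEXT)")
conn.execute("INSERT INTO person (name) VALUES ('Ann')")
row = conn.execute("SELECT name, phone FROM person").fetchone()
conn.close()
print(row)  # the phone cell is present and holds None (SQL null)

# Semi-structured: without a DTD or schema requiring it, the element
# may simply be missing from the document.
doc = ET.fromstring("<people><person><name>Ann</name></person></people>")
person = doc.find("person")
phone = person.find("phone")  # find() returns None when no such child exists
print([child.tag for child in person], phone)
```

In both cases the Python value ends up as `None`, but for different reasons: the row always has two positions, whereas the `person` element has only the one child it was given.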
> 
> I could not, in good conscience, call HTML "structured" by 
> any stretch of the meaning.  But it is certainly not 
> unstructured, either.  I must fall back on that hybrid 
> concept with the name "semi-structured".
> 
> Hope this helps,
>     Jim
> 
> 
> 
> At 8/9/2005 09:35 AM, ian.graham@u... wrote:
> >Quoting "DuCharme, Bob (LNG-CHO)" <bob.ducharme@l...>:
> >
> >Yes +1
> >
> >OTOH, I've seen stuff so horrible on both counts it arguably should be "No"
> >
> > > >Is HTML structured or unstructured information?
> > >
> > > Yes!
> > >
> > > But seriously... if "Structured information may be characterized as
> > > information whose intended meaning is unambiguous" and "The
> > > canonical example of structured information is a relational database
> > > table" then the article is building from a shaky premise, because
> > > the intended meaning of the data in a relational database table can
> > > easily be ambiguous.
> > >
> > > If it means that a relational table is structured because the
> > > individual pieces of information in it are clearly delineated and
> > > their structural relation is unambiguous, which makes sense to me,
> > > then I would consider HTML structured, especially when compared to
> > > the article's examples of unstructured information.
> > >
> > > Bob
> > > weblog: http://www.oreillynet.com/pub/au/1191
> > > homepage: http://www.snee.com/bob
> >
> >-----------------------------------------------------------------
> >The xml-dev list is sponsored by XML.org <http://www.xml.org>, an 
> >initiative of OASIS <http://www.oasis-open.org>
> >
> >The list archives are at http://lists.xml.org/archives/xml-dev/
> >
> >To subscribe or unsubscribe from this list use the subscription
> >manager: <http://www.oasis-open.org/mlmanage/index.php>
> 
> ========================================================================
> Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
>    Co-Chair, W3C XML Query WG; F&O (etc.) editor    Fax : +1.801.942.3345
> Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
> 1930 Viscounti Drive      Standards email: jim dot melton at acm dot org
> Sandy, UT 84093-1063 USA          Personal email: jim at melton dot name
> ========================================================================
> =  Facts are facts.   But any opinions expressed are the opinions      =
> =  only of myself and may or may not reflect the opinions of anybody   =
> =  else with whom I may or may not have discussed the issues at hand.  =
> ========================================================================
> 
