RE: Is HTML structured or unstructured information?
> -----Original Message-----
> From: Jim Melton [mailto:jim.melton@a...]
> Sent: Tuesday, August 09, 2005 12:10 PM
> To: ian.graham@u...
> Cc: DuCharme, Bob (LNG-CHO); 'Bullard, Claude L (Len)'; xml-dev@l...
> Subject: RE: Is HTML structured or unstructured information?
>
> I have a slightly different take on the distinction between
> "structured" and "unstructured" (and the less-well understood
> "semi-structured").
>
> I agree that SQL data is well structured, not because its intended
> meaning is unambiguous (hah! you should see some of the
> databases...but that's another rant), but because every piece of
> information is "there". SQL, of course, represents data as
> rectangular structures called tables. A table is a structure, having
> a particular number of columns, in which there are rows of data, each
> having exactly one value corresponding to each column of the table.
> SQL doesn't use the word "cell", but it's convenient to use in this
> discussion. Every cell in every SQL table has a value. That value
> might be SQL's "null value", but the cell is always "there".
>
> Unstructured data is...well, unstructured. A decent example is the
> text of this email message. You might perceive structure, such as
> paragraphs and sentences, but those are artifacts of my use of common
> English/Western conventions, not actual structure. And, most
> importantly, there is no single "thing" that you can identify that is
> required, optional, or prohibited in this message. There is no
> structure at all.

That's true for the text - but the e-mail message as a whole may be
considered semi-structured regarding its inclusion of sender, receiver,
subject, etc.

Joe

Joseph Chiusano
Booz Allen Hamilton
O: 703-902-6923
C: 202-251-0731
Visit us online@ http://www.boozallen.com

> HTML, and (more importantly to many) XML, are semi-structured by
> nature, although it is certainly possible to force specific scenarios
> using those markup languages to be fully structured (by requiring
> validation against a DTD or Schema that makes everything mandatory,
> for example). To me, "semi-structured" means that there is structure
> there, but it is not completely reliable. Information may be missing
> entirely...not present but marked as "unknown" or "missing" or
> "irrelevant" (analogous to some meanings for SQL's null value)...but
> completely absent.
>
> I could not, in good conscience, call HTML "structured" by any
> stretch of the meaning. But it is certainly not unstructured, either.
> I must fall back on that hybrid concept with the name
> "semi-structured".
>
> Hope this helps,
> Jim
>
> At 8/9/2005 09:35 AM, ian.graham@u... wrote:
> >Quoting "DuCharme, Bob (LNG-CHO)" <bob.ducharme@l...>:
> >
> >Yes +1
> >
> >OTOH, I've seen stuff so horrible on both counts it arguably should
> >be "No"
> >
> > >Is HTML structured or unstructured information?
> > >
> > > Yes!
> > >
> > > But seriously... if "Structured information may be characterized
> > > as information whose intended meaning is unambiguous" and "The
> > > canonical example of structured information is a relational
> > > database table" then the article is building from a shaky premise,
> > > because the intended meaning of the data in a relational database
> > > table can easily be ambiguous.
> > >
> > > If it means that a relational table is structured because the
> > > individual pieces of information in it are clearly delineated and
> > > their structural relation is unambiguous, which makes sense to me,
> > > then I would consider HTML structured, especially when compared to
> > > the article's examples of unstructured information.
> > >
> > > Bob
> > > weblog: http://www.oreillynet.com/pub/au/1191
> > > homepage: http://www.snee.com/bob
> >
> >-----------------------------------------------------------------
> >The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> >initiative of OASIS <http://www.oasis-open.org>
> >
> >The list archives are at http://lists.xml.org/archives/xml-dev/
> >
> >To subscribe or unsubscribe from this list use the subscription
> >manager: <http://www.oasis-open.org/mlmanage/index.php>
>
> ========================================================================
> Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
> Co-Chair, W3C XML Query WG; F&O (etc.) editor     Fax  : +1.801.942.3345
> Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
> 1930 Viscounti Drive      Standards email: jim dot melton at acm dot org
> Sandy, UT 84093-1063 USA  Personal email: jim at melton dot name
> ========================================================================
> =  Facts are facts.  But any opinions expressed are the opinions      =
> =  only of myself and may or may not reflect the opinions of anybody  =
> =  else with whom I may or may not have discussed the issues at hand. =
> ========================================================================
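
The distinction drawn in the quoted message, between a cell that is present but null and an element that is simply absent, can be shown with a small Python sketch. This is only an illustration under stated assumptions: the `person` table and the `<people>` document are hypothetical, made up for this example, and it uses Python's standard `sqlite3` and `xml.etree.ElementTree` modules rather than anything named in the thread.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: each row has exactly one value per column, even if it is NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, phone TEXT)")  # hypothetical table
conn.execute("INSERT INTO person (name) VALUES ('Ann')")     # phone is left NULL
row = conn.execute("SELECT name, phone FROM person").fetchone()
conn.close()

sql_cell_count = len(row)  # the phone cell is still "there"
sql_phone = row[1]         # Python's None stands in for SQL's null value

# Semi-structured: with no DTD or schema making <phone> mandatory,
# the element can be completely absent from the document.
doc = ET.fromstring("<people><person><name>Ann</name></person></people>")
person = doc.find("person")
xml_child_tags = [child.tag for child in person]
xml_phone = person.find("phone")  # None here means "no such element at all"
```

In the SQL case the row still has two cells and the second one holds a null; in the XML case there is no second child to inspect, so a consumer cannot tell "unknown" from "never recorded" without a schema that requires the element.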