
Re: Handling unknown elements?

  • From: Peter Murray-Rust <peter@u...>
  • To: xml-dev@i...
  • Date: Fri, 17 Apr 1998 12:37:21

At 18:45 08/04/98 -0400, Tyler Baker wrote:
>One dilemma I have been trying to figure out with XML is the problem of
>handling unknown element types and what to do with their children.
[...]
>
>Anyone here got any better ideas on this?

Well I have some ideas ... :-)

The problem I address (in JUMBO2) is:

"What do I do when someone sends me an XML document without any/enough
accompanying material telling me what to do with it?"

If this is similar to your problem, read on :-)

(1) If the DTD is present it can tell you if the document is valid. There
is no agreed mechanism whereby a DTD can carry additional semantics. So
your DTD could tell you if a B element can contain mixed content including
an I element - it can't tell you what they mean.

(2) There is no universal generic mechanism for adding semantics to an XML
document.

(3) If the main purpose of the document is to be rendered for humans, then
stylesheets should be used. If the author creates their own tagset and
doesn't provide a stylesheet, many XML aficionados will give up at this
stage. For example, a document such as:
	This is a <FOO>bold <BAR>italic</BAR> phrase</FOO>
is as valid as one using B and I, but the reader has to do some detective
work to guess what FOO and BAR mean. Most readers would probably give up.

(4) If the main purpose of the document is for a machine to act upon it
(and not everyone realises the enormous potential of XML here), then
another way of communicating semantics has to be provided. The method I use
is to map Java classes onto elements. This allows a high degree of
context-dependence and can be very powerful. For example:

<MOL><ATOMS> <ARRAY BUILTIN="X2">... </ARRAY></ATOMS></MOL>
will produce a chemical line drawing.

<MOL><ATOMS> <ARRAY BUILTIN="X3">... </ARRAY></ATOMS></MOL>
will draw a rotatable 3-D molecule.

The JUMBO-MOL software is (obviously) application-specific and uses
XPointers extensively to decide on context.
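JUMBO itself maps Java classes onto elements; the Python sketch below only illustrates the same idea under stated assumptions. The handler classes, the `HANDLERS` rule table, `find_handler()` and `dispatch()` are hypothetical names, not JUMBO's API, and context is taken from the chain of ancestor element names rather than from XPointers. The class chosen depends on both where the element sits (ARRAY inside ATOMS inside MOL) and its BUILTIN attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the Java classes JUMBO would attach to elements.
class LineDrawing:
    def render(self, elem):
        return "chemical line drawing"

class Rotatable3D:
    def render(self, elem):
        return "rotatable 3-D molecule"

# Each rule: (required ancestor names, nearest ancestor last),
#            element name, BUILTIN attribute value, handler class.
HANDLERS = [
    (("MOL", "ATOMS"), "ARRAY", "X2", LineDrawing),
    (("MOL", "ATOMS"), "ARRAY", "X3", Rotatable3D),
]

def find_handler(path, elem):
    """Return the first class whose rule matches elem in this context, or None."""
    for context, tag, builtin, cls in HANDLERS:
        # The rule's context must be the tail end of the ancestor path.
        in_context = (len(path) >= len(context)
                      and path[len(path) - len(context):] == context)
        if elem.tag == tag and elem.get("BUILTIN") == builtin and in_context:
            return cls
    return None

def dispatch(root):
    """Walk the whole tree and collect the output of every matching handler."""
    results = []

    def walk(elem, path):
        cls = find_handler(path, elem)
        if cls is not None:
            results.append(cls().render(elem))
        # Children are visited even when elem itself has no handler.
        for child in elem:
            walk(child, path + (elem.tag,))

    walk(root, ())
    return results

if __name__ == "__main__":
    doc = ET.fromstring('<MOL><ATOMS><ARRAY BUILTIN="X3">0 0 0</ARRAY></ATOMS></MOL>')
    print(dispatch(doc))
```

In this sketch an element with no matching rule is silently skipped, but its children are still visited, so a recognised MOL fragment nested inside some unknown wrapper element is still handled.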

(5) To help with the first three problems, JUMBO2 now has the following
*generic* facilities for 'unstyled', arbitrary XML documents:
	- search the document for all elements, attributes, attribute values and
PCDATA content, and list each distinct one once ('uniquify' them)
	- display this as a tree showing the unique markup components. This is
linked to the original document (tree). For example, I may find that
<bibref> occurs in rec.xml. What does it mean? I can use JUMBO2 to find all
the occurrences of <bibref> in the document and highlight them all (almost
instantaneously, now :-)
	- find all 'whitespace' nodes (PCDATA consisting only of whitespace) and
delete them. This aids tree navigation in some cases
	- display the content of any node (whether mixed or element) in several
different styles. These include:
		raw XML
		untagged event stream (similar to removing unknown tags)
		prettyprinted XML (indented)
		whitespace specifically highlighted
		'default' styling.
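Most of these generic facilities can be approximated with Python's standard `xml.etree.ElementTree` module. The sketch below is an illustration, not JUMBO2's implementation: the function names are hypothetical, 'whitespace' is taken to mean whitespace-only text, and `pretty()` assumes Python 3.9 or later for `ET.indent`.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def inventory(root):
    """Count every distinct element name, attribute name, (attribute, value)
    pair and non-blank text chunk: a 'uniquified' summary of the markup."""
    elements, attributes, values, texts = Counter(), Counter(), Counter(), Counter()
    for elem in root.iter():
        elements[elem.tag] += 1
        for name, value in elem.attrib.items():
            attributes[name] += 1
            values[(name, value)] += 1
        # elem.text precedes the first child; elem.tail follows elem's end tag.
        for chunk in (elem.text, elem.tail):
            if chunk and chunk.strip():
                texts[chunk.strip()] += 1
    return elements, attributes, values, texts

def occurrences(root, tag):
    """All elements with a given name, e.g. every <bibref>, in document order."""
    return list(root.iter(tag))

def strip_whitespace(root):
    """Delete whitespace-only text in place (typically indentation).

    Caution: in mixed content this also removes spaces that separate words,
    e.g. the space in '<B>bold</B> <I>italic</I>'.
    """
    for elem in root.iter():
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None

def untagged(root):
    """The character data with every tag removed."""
    return "".join(root.itertext())

def pretty(root):
    """Indented XML; ET.indent needs Python 3.9+ and modifies root in place."""
    ET.indent(root)
    return ET.tostring(root, encoding="unicode")
```

Applied to the FOO/BAR example, `untagged()` returns the plain phrase "This is a bold italic phrase", which is roughly what a reader sees once the unknown tags are ignored.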

The default styling applies simple heuristics to display elements. For
example,
<SPEAKER>MACBETH</SPEAKER>
is displayed as:
SPEAKER: MACBETH
where the markup term is in a different font. This is useful for many
generic XML documents.
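A heuristic of this kind might look like the following Python sketch. The exact rules are assumptions for illustration, not JUMBO's: leaf and mixed-content elements become one 'NAME: text' line, and elements containing only elements are shown as an indented outline. Plain text output cannot change fonts, so that part is left out.

```python
import xml.etree.ElementTree as ET

def default_style(elem, depth=0):
    """Fallback rendering for an element nobody has styled (hypothetical rules).

    - a leaf element becomes 'NAME: text' on one line;
    - an element with mixed content becomes 'NAME: ' plus its flattened text;
    - an element containing only elements prints 'NAME:' and then its
      children, each indented one more level.
    Returns a list of output lines.
    """
    indent = "  " * depth
    children = list(elem)
    is_mixed = bool((elem.text or "").strip()) or any(
        (child.tail or "").strip() for child in children)
    if not children or is_mixed:
        # Collapse runs of whitespace so the text fits on one line.
        text = " ".join("".join(elem.itertext()).split())
        return [f"{indent}{elem.tag}: {text}".rstrip()]
    lines = [f"{indent}{elem.tag}:"]
    for child in children:
        lines.extend(default_style(child, depth + 1))
    return lines

if __name__ == "__main__":
    speech = ET.fromstring(
        "<SPEECH><SPEAKER>MACBETH</SPEAKER><LINE>Is this a dagger</LINE></SPEECH>")
    print("\n".join(default_style(speech)))
```

Run on that SPEECH fragment, the sketch prints "SPEECH:" followed by the indented lines "SPEAKER: MACBETH" and "LINE: Is this a dagger".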

	In addition, JUMBO allows you to add your own style to individual
elements. For example, <olist> in rec.xml would appear to be a list, so the
user can interactively add list formatting to it. In your case you could
arrange that <B> was made bold and <I> was made italic. [I am not prepared
to 'guess' the meaning of common tags - e.g. <A> - and the reader has to
take responsibility for this. I would hope that the world might converge
towards common semantics for common terms, and XML-DEV is here for anyone
who wishes to work on that. But if you want to use <PARA> for a chemical
term rather than a paragraph, you're perfectly welcome to - XML doesn't
care :-)].
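A user-supplied style table can be sketched in the same Python setting. Here a style is assumed to be nothing more than a (prefix, suffix) pair of plain-text markers; `render` and `USER_STYLES` are hypothetical names, and JUMBO's interactive styling is richer (list formatting, for instance, is omitted). Unstyled elements get no decoration, so nothing is guessed.

```python
import xml.etree.ElementTree as ET

def render(elem, styles):
    """Render elem as plain text, wrapping each element whose name appears in
    `styles` with that entry's (prefix, suffix) pair.

    Elements the user has not styled get no decoration at all: their text and
    their children's output pass through unchanged.
    """
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child, styles))
        parts.append(child.tail or "")  # text after the child's end tag
    prefix, suffix = styles.get(elem.tag, ("", ""))
    return prefix + "".join(parts) + suffix

# Styles a user might choose interactively (plain-text email conventions).
USER_STYLES = {"B": ("*", "*"), "I": ("_", "_")}

if __name__ == "__main__":
    doc = ET.fromstring("<P>This is a <B>bold <I>italic</I></B> phrase</P>")
    print(render(doc, USER_STYLES))
```

Giving FOO and BAR the same entries as B and I would make the FOO/BAR example above render in the same way; with an empty table it degrades to the untagged text.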

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)

