[XML-DEV Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message]

Re: The one element schema language

  • From: Joe English <jenglish@f...>
  • To: xml-dev@l...
  • Date: Tue, 06 Feb 2001 09:00:53 -0800

element of language

Rick Jelliffe  wrote:

> And just to be (constructively) bolshie about it, I have come up with a new
> schema language "Hook" based on a new paradigm "partial ordering" which only
> uses one element.  See http://www.ascc.net/xml/hook  for the idea.

Very interesting!  It looks like it meets all of the stated goals.

Here's my attempt at a formal definition of hook-validity:

[ Note: this is based on yesterday's draft -- Rick added some
  features to the language while I wasn't looking :-)
  Also, what follows doesn't address the 'friendly', 'short', or
  'top' features; defining these formally seems straightforward. ]

In the "generalized binary tree" representation of an XML document,
every node has at most two outgoing edges, the FIRST-CHILD and
NEXT-SIBLING.

    DEFINITION: The _hook_ of a node N is the (unique) sequence of nodes
    on the path from the root node to N in the generalized binary tree
    representation of the document.

    DEFINITION: A _hook schema_ is an (ordered) sequence of
    (unordered) sets of NCNames, specified by the following grammar:

	hook-schema	::= items ;
	items		::= item | item items ;
	item		::= NCName | '[' ncnames ']' ;
	ncnames		::= NCName | NCName ncnames ;

As a first attempt to define validity, we'll make a

    TENTATIVE ASSUMPTION 1: a particular NCName may appear in at most
    one item in any hook schema.

Given a hook schema H which satisfies this assumption, we can define a
function '# : NCName -> Integer' which maps an NCName to the number of
the item in H in which the NCName appears, or -1 if the NCName does
not appear in H.  Under this assumption:

    TENTATIVE DEFINITION 1: a document D is _hook-valid_ according to H
    if and only if for every hook 'N1, N2, ... NN' in D, the
    sequence of integers
    	#(local-name(N1)), #(local-name(N2)), ... #(local-name(NN))
    is monotonically nondecreasing.

Or equivalently:

    TENTATIVE DEFINITION 2: a document D is _hook-valid_ according to H
    if and only if for every node N in D,
    	#(local-name(N)) <= #(local-name(FIRST-CHILD(N))
    and
    	#(local-name(N)) <= #(local-name(NEXT-SIBLING(N))

    where we extend the definitions of local-name and # as follows:
    	local-name(NIL) = "#EMPTY"
	#("#EMPTY") = +infinity
    to account for leaves and last siblings.

Under TENTATIVE ASSUMPTION 1, a hook-schema actually induces
a "strict weak order" (in the C++ STL sense) on NCNames.  (A
strict weak order is stronger than a partial order but weaker
than a total order; a relation '<<' on S is a strict weak order
if there is a homomorphism from (S, <<) to (Z, <) where (Z, <)
is a totally ordered set).

However, TENTATIVE ASSUMPTION 1 does not hold in general since
it's contrary to Rick's spec, which explicitly allows NCNames to
appear more than once in a hook schema.  Discarding this assumption,
'#' becomes a relation rather than a function.  Writing 'n # i'
to mean that NCName 'n' appears in the i'th set of items in H,
we have:

    DEFINITION: A document D is _hook-valid_ according to H if
    and only if for all nodes X and Y in D,

    	Y = FIRST-CHILD(X) or Y = NEXT-SIBLING(X) implies
	that there exists i, j such that local-name(X) # i,
	local-name(Y) # j, and i <= j.

This suggests an efficient implementation: since we only need
to compare the _smallest_ i against the _largest_ j, define
imin(H, N) to be the number of the first item in hook schema H
in which NCName N appears, and imax(H, N) as the number of
the last such item.  'imin' and 'imax' can be implemented as hash tables,
and computed in a single pass directly from the hook schema.
Validation can be performed SAX-style:

    variable iprev : integer = -1;
    while not end-of-document():
	case next-event() of
	    START-TAG(n):
		    if (imin(H,n)) < iprev then error "Invalid"
		    iprev := imax(H,n)
	    END-TAG(n):
		    iprev := imax(H,n)

(Rick -- does this look like what you had in mind?  I suspect I'm missing
something here, since your example hook schema for XHTML Basic would
actually reject many documents that are DTD-valid with the above
validation algorithm.  It does seem to work for the other examples
you gave (PurchaseOrder, RSS, and Schematron) though.)


Regarding expressive power:

    HYPOTHESIS: Every hook schema is equivalent to a
    tree-local/string-local grammar.

    PROOF: (shouldn't be difficult...)

    HYPOTHESIS: There exist tree-local/string-local languages
    which cannot be expressed as hook schemas.

    PROOF: (the XHTML Basic 'body' element might work as
    a counterexample...)

By way of comparison: TREX and RELAX are equivalent to
tree-regular/string-regular grammars, and XML DTDs are equivalent
to tree-local/string-regular grammars.

I'll have to think about the recently-introduced ';' and '.'
features some more...

--Joe English

  jenglish@f...

PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
Email
First Name
Last Name
Company
Subscribe in XML format
RSS 2.0
Atom 0.3
 

Stylus Studio has published XML-DEV in RSS and ATOM formats, enabling users to easily subcribe to the list from their preferred news reader application.


Stylus Studio Sponsored Links are added links designed to provide related and additional information to the visitors of this website. they were not included by the author in the initial post. To view the content without the Sponsor Links please click here.

Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery ™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2013 All Rights Reserved.