
Re: heritage (was Re: SGML on the Web)

Hi Tom,

> I do not see how you create documents with multiple sets of markup
> and be sure that any one set is valid against a schema (save by
> preprocessing it and then validating, but I am thinking about during
> the authoring process)

I agree that's an interesting problem. The way we're planning on
handling this in LMNL is to explicitly keep separate (in the data
model, that is, not the syntax) my markup and your markup of the same
document, keeping them in different *layers*. This means you can focus
validation on your particular layer while ignoring the other layers,
wherever they might come from.
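To make the layer idea concrete, here is a minimal Python sketch of such a data model. It is only an illustration under stated assumptions: the `Range` and `Document` classes, the `layer` field and the `validate_layer` helper are hypothetical names, not part of any LMNL specification or tool, and the name check is a toy stand-in for real schema validation. Each range records which layer it belongs to, so a validator can look at one layer and simply skip the rest.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Range:
    """A named span of the document text, tagged with the layer it belongs to."""
    layer: str  # hypothetical: whose markup this is, e.g. "mine" or "yours"
    name: str   # e.g. "b", "i", "comment"
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


@dataclass
class Document:
    text: str
    ranges: list[Range] = field(default_factory=list)

    def layer(self, layer_name: str) -> list[Range]:
        """Only the ranges in one layer; all other layers are filtered out."""
        return [r for r in self.ranges if r.layer == layer_name]


def validate_layer(doc: Document, layer_name: str, allowed: set[str]) -> list[str]:
    """Toy check: report ranges in the chosen layer whose names are not allowed."""
    return [
        f"{r.name} at {r.start}-{r.end} is not allowed in layer {layer_name!r}"
        for r in doc.layer(layer_name)
        if r.name not in allowed
    ]


doc = Document(
    "bold, bold italic, italic",
    [
        Range("mine", "b", 0, 18),
        Range("mine", "i", 6, 25),
        Range("yours", "comment", 0, 4),
    ],
)

print(validate_layer(doc, "mine", {"b", "i"}))  # [] : the comment in "yours" is ignored
print(validate_layer(doc, "yours", {"b", "i"}))
# ["comment at 0-4 is not allowed in layer 'yours'"]
```

Validating "mine" passes even though the same document also holds someone else's `comment` range; that range only matters when its own layer is the one being checked.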

But we don't think that solves the problem of having overlapping
ranges. Overlapping markup doesn't just come about because you have
overlapping trees; it also comes about because in some cases the most
natural way of marking up text is with overlapping structure.

For example, in the classic:

  [b}bold, [i}bold italic,{b] italic{i]

To the user, this makes sense conceptually: "bold, bold italic,"
should be in bold, and "bold italic, italic" should be in italic.

I think that these overlaps mostly happen when the inferences licensed
by the markup are distributed, to use the terminology of
Sperberg-McQueen et al. [1]. In other words, the markup above is
assigning properties to individual characters within the string; the
meaning would be exactly the same if the markup were distributed:

  [b}bold, {b][i}[b}bold italic,{b]{i][i} italic{i]

which is why this isn't a problem in tree-based document models.
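The point about distributed inferences can be checked mechanically. In the Python sketch below, ranges are plain `(name, start, end)` tuples over character offsets, and the function names `char_properties` and `distribute` are illustrative, not taken from LMNL. It computes which properties apply to each character, confirms that the overlapping and the distributed markup of the classic example mean the same thing, and cuts the overlapping ranges at every boundary into segments that nest.

```python
def char_properties(text, ranges):
    """Return, for each character offset, the sorted names of the ranges covering it.

    When the markup's inferences are distributed, this per-character view is
    all the markup means.
    """
    props = [set() for _ in text]
    for name, start, end in ranges:
        for i in range(start, end):
            props[i].add(name)
    return [tuple(sorted(p)) for p in props]


def distribute(text_length, ranges):
    """Cut (name, start, end) ranges at every boundary into non-overlapping segments.

    Each segment lists the names that cover all of it, so within a segment the
    ranges nest trivially and the result can be written as a tree.
    """
    cuts = sorted({0, text_length} | {p for _, s, e in ranges for p in (s, e)})
    segments = []
    for a, b in zip(cuts, cuts[1:]):
        names = tuple(sorted(n for n, s, e in ranges if s <= a and b <= e))
        if names:
            segments.append((a, b, names))
    return segments


text = "bold, bold italic, italic"

# [b}bold, [i}bold italic,{b] italic{i]   (overlapping)
overlapping = [("b", 0, 18), ("i", 6, 25)]

# [b}bold, {b][i}[b}bold italic,{b]{i][i} italic{i]   (distributed)
distributed = [("b", 0, 6), ("i", 6, 18), ("b", 6, 18), ("i", 18, 25)]

print(char_properties(text, overlapping) == char_properties(text, distributed))  # True
print(distribute(len(text), overlapping))
# [(0, 6, ('b',)), (6, 18, ('b', 'i')), (18, 25, ('i',))]
```

Because bold and italic are properties of characters, splitting a range loses nothing, which is exactly what lets a tree-based model cope with this case.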
But overlapping also occurs when, for example, people add comments to
some text:

  [comment=jt1 [text}This should read...{]}This document
  [comment=wp3 [text}It does more than that...{]}attempts{=jt1]
  to describe{=wp3]...

  (This example demonstrates overlapping ranges paired by IDs.)

I don't think that these overlaps are solved either by splitting into
multiple hierarchies (you'd end up with roughly as many hierarchies as
you had comments) or by rearranging the markup to give a nice tree
structure, since each comment is about the *whole* stretch of text it
covers, not about the individual characters.
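The difference with comments shows up when you ask whether a set of ranges could ever be written as a single tree. This Python sketch is illustrative only: `overlaps` and `is_tree_compatible` are hypothetical helpers, and the offsets for `jt1` and `wp3` are assumptions chosen to match the running text "This document attempts to describe". Two ranges that share characters without one containing the other cannot both be elements of one tree, and since a comment applies to its whole span, cutting it into pieces (as with bold and italic) would change what it means.

```python
def overlaps(a, b):
    """True if ranges a and b share characters but neither contains the other."""
    a_start, a_end = a
    b_start, b_end = b
    share = a_start < b_end and b_start < a_end
    a_in_b = b_start <= a_start and a_end <= b_end
    b_in_a = a_start <= b_start and b_end <= a_end
    return share and not (a_in_b or b_in_a)


def is_tree_compatible(ranges):
    """Ranges fit in one XML-style tree only if no pair of them overlaps."""
    spans = list(ranges.values())
    return not any(
        overlaps(x, y) for i, x in enumerate(spans) for y in spans[i + 1:]
    )


text = "This document attempts to describe"
# Comment IDs mapped to (start, end) character offsets; end is exclusive.
comments = {
    "jt1": (0, 22),   # "This document attempts"
    "wp3": (14, 34),  # "attempts to describe"
}

print(is_tree_compatible(comments))  # False
```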

So anyway, how do we validate it? Well, this is still work in
progress, but we've been discussing using a RELAX NG-based schema
language that can describe this overlap. For example, a schema for a
layer that contains multiple overlapping comment ranges might look like:

start = overlap { comment* }

comment = range comment [ text ]
                        { annotation text [ text ] { empty },
                          (comment | comment.start | comment.end)* }

comment.start = start range comment
comment.end   = end range comment
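As a rough illustration of what such a schema would check, here is a hand-written Python validator for a comment layer. It is not the RELAX NG-based language under discussion, just a sketch that assumes the schema requires every range in the layer to be a `comment` carrying a single `text` annotation, while allowing comments to start and end inside one another. The `Range` class and `validate_comment_layer` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Range:
    name: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    annotations: dict[str, str] = field(default_factory=dict)


def validate_comment_layer(text: str, layer: list[Range]) -> list[str]:
    """Hand-written check loosely modelled on `start = overlap { comment* }`.

    Assumed rules: every range is a `comment`, lies within the text, and carries
    exactly one annotation named `text`. Comments may overlap, so no nesting check.
    """
    errors = []
    for r in layer:
        where = f"{r.name}@{r.start}-{r.end}"
        if r.name != "comment":
            errors.append(f"{where}: only comment ranges are allowed")
        if not 0 <= r.start <= r.end <= len(text):
            errors.append(f"{where}: range lies outside the text")
        if set(r.annotations) != {"text"}:
            errors.append(f"{where}: expected exactly one 'text' annotation")
    return errors


text = "This document attempts to describe"
layer = [
    Range("comment", 0, 22, {"text": "This should read..."}),        # jt1
    Range("comment", 14, 34, {"text": "It does more than that..."}),  # wp3
]
print(validate_comment_layer(text, layer))  # [] : overlapping comments are fine

bad = [Range("note", 0, 4)]
print(validate_comment_layer(text, bad))  # wrong range name, missing annotation
```

The deliberate omission is any check that ranges nest: in this layer overlap is the expected case, so it is the one thing the validator does not complain about.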

We'd really welcome comments and suggestions about the whole
validation question... on LMNL-Dev (http://www.lmnl.org/list) :)



[1] http://www.idealliance.org/papers/extreme02/html/2002/CMSMcQ01/EML2002CMSMcQ01.html

Jeni Tennison

