Re: heritage (was Re: SGML on the Web)

To: Patrick Durusau <pdurusau@e...>
Subject: Re: heritage (was Re: SGML on the Web)
From: Jeni Tennison <jeni@j...>
Date: Mon, 7 Oct 2002 10:43:00 +0100
Cc: XML DEV <xml-dev@l...>
In-reply-to: <3DA0D5BB.5010103@e...>
Organization: Jeni Tennison Consulting Ltd
References: <200210061641.MAA08123@m...><3DA070F7.2030302@e...> <196879346102.20021006202202@j...><3DA0D5BB.5010103@e...>
Reply-to: Jeni Tennison <jeni@j...>


Hi Patrick,

> Oh, you mean prior to processing! Well, it has no structure prior to
> processing does it? Until something determines what is or is not
> markup, and what rules that markup must follow, the document is not
> anything at all with regards to XML.

Right. The syntax used by the original documents is completely
irrelevant. So why use example documents that look like XML? Why not
use {}s around your tags instead, for example? It would be a lot less
confusing for people who see a .xml extension (on your source
document, prior to processing) and therefore imagine that it's an XML
document.

It's interesting how close you are to Walter Perry's position here --
that nothing matters aside from the output of your process, and that
the input can be anything at all; it's up to the processor to work out
what.

But I, as a user, have to write a document that can be interpreted by
your JITT processors into an XML document -- say, just for example, a
document with multiple overlapping hierarchies. If you don't tell me
what syntax to use for that document, how am I supposed to know what
to write?

You might say that I can write in any syntax at all, but I know that
if I present some LMNL, say, to your JITTFilter.xsl stylesheet, it
won't be able to extract a tree from that because it doesn't recognise
the syntax. I do need to know something about what your processes
expect.

> I belabor this because it is very important: A JITTs parser can use
> standard XML syntax and do things that are simply not possible with
> a standard XML parser. The example I gave earlier today of the
> dictionary entry is only one example. JITTs does not, has not and
> will not require a new syntax to produce benefits that current XML
> processes cannot produce.

I understand that. But users cannot write an XML document and have it
interpreted as multiple hierarchies, I believe? Or if they can, I'd
love to see an example.

>>Of course that doesn't detract from the idea of using configurable
>>parsers to interpret a true XML document in different ways, and I
>>appreciate that you're just using an existing syntax to try out
>>these ideas, but as an XML person I'd feel a lot more comfortable
>>with your examples if you'd use well-formed XML, with milestones to
>>represent the overlapping structures, in your examples, rather than
>>a pseudo-XML.
>
> I suspect the discomfort is due in part to the persistence of the
> idea that an XML document, or any other document for that matter,
> has some inherent structure. There is no structure until something
> in the document is interpreted as "markup" and that "markup" is
> subjected to a set of content models, and with XML, for its
> adherence to the rules for well-formedness.

Right. I understand that, but what I'm saying is that if you interpret
a well-formed XML document in terms of its XML-defined markup, the
very nature of its syntactic rules -- the fact that tags must match,
the fact that attributes have string values -- limits the ways in which
that markup can be interpreted. In XML's case, it is limited to tree
structures and to unstructured attributes.

I don't have *any* problem with the approach of "choose what you want
to see in the document", but you can't pull out a structure that the
markup syntax cannot legally represent.

> As I pointed out in our paper (and here) JITTs is not limited to
> overlapping hierarchies. It addresses a number of issues with
> current markup strategies.
>
> We set out to solve one problem (overlap) and eventually arrived at
> a solution that appears to have a much broader applicability.

I understand that, and I think that the approach is very powerful and
useful. I'm just trying to persuade you that using pseudo-XML
documents as the source documents for your processes is confusing.
Starting with well-formed XML documents, and hiding or showing
particular markup within them, is great. If you start with ill-formed
XML, you're using a new syntax for those documents, and I think you
should be up front about that.
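A minimal sketch of that approach in Python, assuming a TEI-style milestone convention: verse lines are ordinary `<l>` elements, while sentences, which cross line boundaries, are marked only by empty `<s-start/>` and `<s-end/>` elements (hypothetical names chosen for illustration). The document stays well-formed, any XML parser accepts it, and the application chooses which hierarchy to "show".

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: <l> (verse line) elements form the tree hierarchy;
# sentences overlap the lines, so they are marked with empty milestone
# elements <s-start/> and <s-end/> (illustrative names, not a standard).
DOC = """<poem>
<l><s-start/>Sing, goddess, the wrath</l>
<l>of Achilles.<s-end/> <s-start/>It brought</l>
<l>countless woes.<s-end/></l>
</poem>"""

MILESTONES = ("s-start", "s-end")


def iter_events(elem):
    """Yield ('text', str) and ('milestone', tag) events in document order."""
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        if child.tag in MILESTONES:
            yield ("milestone", child.tag)
        else:
            yield from iter_events(child)
        if child.tail:
            yield ("text", child.tail)


def normalize(text):
    """Collapse runs of whitespace (including newlines between <l>s)."""
    return " ".join(text.split())


def lines(root):
    """'Show' the primary hierarchy: it is just the ordinary element tree."""
    return [normalize("".join(line.itertext())) for line in root.iter("l")]


def sentences(root):
    """'Show' the overlapping hierarchy by reading the milestones."""
    result, buf = [], None
    for kind, value in iter_events(root):
        if kind == "milestone" and value == "s-start":
            buf = []
        elif kind == "milestone" and value == "s-end":
            result.append(normalize("".join(buf)))
            buf = None
        elif buf is not None:
            buf.append(value)
    return result


if __name__ == "__main__":
    root = ET.fromstring(DOC)  # parses fine: the document is well-formed
    print(lines(root))
    print(sentences(root))
```

The tree the parser reports is still only the `<l>` hierarchy; the sentence hierarchy exists only because the application chooses to interpret the milestones.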

> A third alternative is to change how one interprets markup for the
> purpose of imposing structures on a text. There is no natural law
> requirement that markup processing recognize all the markup in a
> document. Actually the XML 1.0 spec specifies a syntax for markup
> but it never says that all markup has to be recognized. It does have
> all the other restrictions that have been mentioned but it omits
> that one. So long as the markup presented to the parser meets all
> the stated requirements, it appears to be valid XML.

You're surprisingly right that the XML 1.0 Rec. doesn't say anything
about whether or not elements and attributes are reported to an
application. (I actually think that this was because this assumption
was so fundamental that they didn't think that they needed to spell it
out; it *does* explicitly say that some things *don't* need to be
reported to the application, such as comments, which is what makes me
think the default is "report everything").
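Python's `xml.etree.ElementTree` illustrates that latitude: by default its tree builder discards comments, while elements and character data are always reported; comments reach the application only if requested with `insert_comments=True` (added in Python 3.8). A minimal sketch; the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with a comment inside the root element.
DOC = "<doc><!-- editorial note --><p>text</p></doc>"


def child_kinds(text, keep_comments):
    """List the root's children as the application sees them."""
    # insert_comments requires Python 3.8+; it defaults to False.
    builder = ET.TreeBuilder(insert_comments=keep_comments)
    parser = ET.XMLParser(target=builder)
    parser.feed(text)
    root = parser.close()  # returns the root Element built by the TreeBuilder
    # Comment nodes carry a non-string tag (the ET.Comment factory).
    return [c.tag if isinstance(c.tag, str) else "comment" for c in root]


if __name__ == "__main__":
    print(child_kinds(DOC, keep_comments=False))
    print(child_kinds(DOC, keep_comments=True))
```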

But the XML 1.0 Rec. does say:

  "An XML processor must always pass all characters in a document that
   are not markup through to the application."

which would seem to say that an XML processor must not hide particular
parts of an XML document, and, more importantly:

  "Validating and non-validating processors alike must report
   violations of this specification's well-formedness constraints in
   the content of the document entity and any other parsed entities
   that they read."

which would seem to say that an XML processor should detect errors
such as overlapping markup and report them.
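Python's standard `xml.etree.ElementTree`, which is built on the Expat parser, behaves as that second quotation requires: given markup in which two elements overlap, it raises `ParseError` rather than handing the application any structure at all. A minimal sketch; the two sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical samples: in OVERLAPPING, <s> opens inside the first <l>
# but closes inside the second, so the tags do not nest properly.
OVERLAPPING = "<poem><l><s>Sing, goddess,</l><l>of Achilles.</s></l></poem>"
WELL_FORMED = "<poem><l>Sing, goddess,</l><l>of Achilles.</l></poem>"


def well_formedness_error(text):
    """Return None if text is well-formed XML, else the parser's report."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # Expat stops at the first violation and reports its position.
        return str(err)
    return None


if __name__ == "__main__":
    print(well_formedness_error(WELL_FORMED))
    print(well_formedness_error(OVERLAPPING))
```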

Of course I'm not saying that a JITT processor, or any other
processor, can't treat a document that happens to use XML markup in
some other way; it's just that if it *does*, it's not an XML
processor.
   
Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/

Follow-Ups:
- Re: heritage (was Re: SGML on the Web)
  - From: "W. E. Perry" <wperry@f...>
- Re: heritage (was Re: SGML on the Web)
  - From: John Cowan <jcowan@r...>

References:
- Re: heritage (was Re: SGML on the Web)
  - From: John Cowan <jcowan@r...>
- Re: heritage (was Re: SGML on the Web)
  - From: Patrick Durusau <pdurusau@e...>
- Re: heritage (was Re: SGML on the Web)
  - From: Jeni Tennison <jeni@j...>
- Re: heritage (was Re: SGML on the Web)
  - From: Patrick Durusau <pdurusau@e...>
