Re: Schema-aware validation of XHTML result-document

Subject: Re: Schema-aware validation of XHTML result-document
From: "Jesper Tverskov" <jesper@xxxxxxxxxxx>
Date: Fri, 9 Mar 2007 12:44:11 +0100
Hi list again

I have already corrected the typo in the article. Yes, articles too
will always have "bugs".

I will rewrite the article soon to make it clearer how I tested Saxon.

I have not been using the command line but the excellent XML editor
Oxygen and its default setup for Saxon SA. In Oxygen's configuration
menu for Saxon SA, validation of the input document is on by default.

I must repeat: in Oxygen, with compile-time validation, Saxon's error
messages are great in both error reporting modes, and the location of
the error is highlighted in the stylesheet.

With run-time validation the error message is good only when errors
are not treated as warnings. In "warning" mode it is just "One or more
validation errors were reported". In both modes nothing is highlighted
in the stylesheet.

AltovaXML in XMLSpy highlights the error in the stylesheet for run-time
validation, but AltovaXML lacks the compile-time validation that works
so well in Saxon and that I find more useful.

I understand that XHTML is a special case of XML, but I don't think we
disagree on anything here. XSLT processors should be improved so that
XHTML result-documents really are valid when the processor says so.

The final issue is all that junk, default attributes and fixed
attributes, copied out of the schema and into the result-document. I
hate it like mad when transforming XHTML to XHTML but at least I can
easily get rid of it by adding extra templates to my stylesheet.
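
A minimal sketch of such extra templates, assuming an XHTML-to-XHTML
identity transform in which only the colspan and rowspan defaults need
to go. The attribute names come from the article; the rest is
illustrative and not taken from it. The comparison uses string() so
that it also works when the attributes carry a schema type:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop colspan/rowspan when they only hold the schema default "1".
       An explicitly written colspan="1" is dropped too, which does not
       change how the table renders. -->
  <xsl:template match="@colspan[string(.) = '1'] | @rowspan[string(.) = '1']"/>

</xsl:stylesheet>
```

This only helps with defaults added when the input is validated. If the
result-document is validated as well, the validator puts the defaults
back in after the templates have run.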

This is not the case for validation of XHTML result-documents. Here,
as Michael suggests, it is necessary to modify the schema. Are you
serious? Most of us can easily add templates to our stylesheets, but
in most use cases it is not a proper way forward to open up schemas
and modify them just to do result-document validation.

It might go against the last detail of the spec, but AltovaXML has
taken this approach: why not be better than the spec, when the spec is
so wrong that validation of result-documents becomes almost a joke if
we follow it to the letter?

Why not a new "being better than the spec" mode also for Saxon, some
parameter to use at the command line?

Cheers,
Jesper Tverskov



On 3/9/07, Michael Kay <mike@xxxxxxxxxxxx> wrote:

There are some good points here about what can and can't be achieved with schema-awareness. But there seem to be one or two observations that result from your pressing the wrong buttons - always a hazard when you try out a new piece of technology.

1. You have the rather curious statement:

"In Saxon the input document must also be XHTML or the schema of the input
document must also be imported or the -vlax parameter must be used at the
command line or the -val parameter must not be used in order to turn input
validation off."

And later in section 5 you say "we must ... turn the validation of input
documents off". But validation of input documents is off by default, so I
think this gives a wrong impression. What you are really saying is: if you
ask for validation you must supply a schema. (Note also there are several
other ways you can provide it, for example using schemaLocation in the
source document or via the Java API).

2. You say:

"In Saxon we must use a parameter at the command line to treat errors more
like warnings. Now the error message is useless, "one or more errors found",
and nothing is highlighted in the stylesheet."

Basically I think this must be a case of you pressing the wrong buttons.

(a) for "must use" read "can also use".

(b) Saxon doesn't have a GUI, so it isn't going to highlight anything in the
stylesheet: that's the job of the IDEs that integrate Saxon, such as Stylus
Studio and Oxygen. Saxon does however produce detailed error messages about
where the errors appear. By default these are written to the standard error
stream, and if you didn't see these messages then it's because you either
directed them somewhere else, or you somehow didn't see the contents of the
standard error stream. I've given some examples of how the errors should
appear on the console in a footnote to this message.

3. You say:

"Saxon has also compile-time validation, that is, the errors are reported
right away, and you don't need to start the transformation process. To
trigger it you must use the validation attribute in all the top-elements
generated by templates or the xsl:validation attribute if the top-elements
are generated the literal way."

Yes: this is a limitation of the approach. Clearly Saxon can't issue an
error message if your code is correct according to the language spec. I
think it's an inherent aspect of the very dynamic nature of the template
mechanism that you can't be sure at compile time that a template is
generating invalid output unless it declares the type of output it is
designed to generate. There are some cases where Saxon gets round this by
generating compile time warnings if the code looks implausible, even though
it might be correct according to the spec. I think this might be a way
forward to reduce this problem.
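
For a top-level element built by an instruction rather than written
literally, the validation attribute goes on the instruction itself. A
minimal sketch, assuming a schema-aware XSLT 2.0 processor and an XHTML
schema at the hypothetical location xhtml1-strict.xsd (the file name and
the page content are illustrative, not from the article):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml">

  <!-- Make the XHTML element declarations known to the processor.
       The schema location is an assumption for this sketch. -->
  <xsl:import-schema namespace="http://www.w3.org/1999/xhtml"
                     schema-location="xhtml1-strict.xsd"/>

  <xsl:template match="/">
    <!-- validation="strict" declares that this element and its content
         must be valid against the imported schema. -->
    <xsl:element name="h:html" validation="strict">
      <h:head><h:title>A list of functions</h:title></h:head>
      <h:body><h:p>No functions listed.</h:p></h:body>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

With a literal result element the same request is written as
xsl:validation="strict" on the element, as in the footnote example.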

4. You say:

"If namespace declarations other than for XHTML are copied to the
result-document it becomes not valid XHTML 1.0. This is not nice when both
processors have just reported "no validation errors"."

Agreed - another usability problem. You're presumably aware of the reason:
to be "valid XHTML 1.0" you need to do more than conform to the XHTML
schema, you also need to get your namespace prefixes right, and schema
validity offers no guarantee of that. Although there's no support for this
in the XSLT language spec, I think it would be possible for products like
Saxon to offer users a bit more help here, by treating XHTML output as a
special case.

5. In your example in section 5, you say "Note the space="fixed" in the
style element making the output invalid.". Actually it is space="preserve".
This attribute has been added to the output by the schema validator because
the schema defines a fixed/default value for this attribute. Yes: it should
be xml:space="preserve": a bug indeed. Please feel free to use the regular
reporting channels when you find a bug, I think you will find they work very
effectively.

You say "and note all the colspan="1" and rowspan="1" junk", but don't
really explain what causes this. The schema defines <xs:attribute
name="colspan" default="1" type="Number"/>, so validation is going to insert
the default value (just as DTD validation would). One advantage of schemas
over DTDs here is that it's much easier to produce a version of the schema
that removes the fixed and default values, to avoid this effect happening if
you don't want it.

You complain about this again in section 6 "Saxon insists in copying all
that dirt out of the schema and into the result-document". Sorry, but it's
required for conformance with the specs. A product that validates against a
schema without expanding the fixed and default values defined in the schema
is not conformant. If the validation were happening on the input side, your
stylesheet would be entitled to rely on seeing the default values and would
break if they weren't there. If you don't want this to happen, define a
schema that doesn't include the fixed and default values.
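
One way to produce such a schema is to run the XSD itself through a small
stylesheet. A minimal sketch, assuming the defaults and fixed values sit
directly on xs:attribute declarations in a single schema document:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Identity template: copy the schema unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Remove default and fixed value constraints from attribute
       declarations, so validation no longer adds them to the instance. -->
  <xsl:template match="xs:attribute/@default | xs:attribute/@fixed"/>

</xsl:stylesheet>
```

Two caveats with this sketch: removing "fixed" also removes the check that
an attribute has the fixed value, so the resulting schema is slightly more
permissive, and any schema documents pulled in with xs:include or
xs:import need the same treatment.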

Footnote
========

Here are some examples of error messages:

(i) a validity problem with an input document:

java net.sf.saxon.Transform -im single-doc -val -o c:\temp\out.html
conformance.xml render-page2.xsl

Validation error on line 22 column 89 of
file:/c:/MyJava/doc/saxon8/changes.xml:
 XTTE1510: The content model for element <li> does not allow character
content (See
 http://www.w3.org/TR/xmlschema-1/#cvc-complex-type clause 2.3)
Error on line 346 of file:/c:/MyJava/doc/saxon8/render-page2.xsl:
 FODC0005: ValidationException: The content model for element <li> does not
allow character content

(2 messages, one giving the location in a source document, the other the
location in the stylesheet that caused this source document to be read)

(ii) a validity problem with the output that can be detected at compile
time:

Error on line 20 of file:/c:/demo2/queries/err-sa-xslt004.xsl:
 XTTE1510: Element h:tittle is not permitted in the content model of the
complex type of element head
Failed to compile stylesheet. 1 error detected.

Note how the error message points to the place in the stylesheet where the
error occurs. The offending line is this:

<h:html xsl:validation="strict">
   <h:head><h:tittle>A list of functions</h:tittle></h:head>
   <h:body>

(iii) a run-time validity problem with the output:

Validation error on line 38 of file:/c:/demo2/queries/err-sa-xslt004.xsl:
 XTTE1510: In content of element <body>: The content model does not allow
element <div> to
 appear here. Expected one of: {http://www.w3.org/1999/xhtml}blockquote,
 {http://www.w3.org/1999/xhtml}dfn, {http://www.w3.org/1999/xhtml}br,
 {http://www.w3.org/1999/xhtml}h6, {http://www.w3.org/1999/xhtml}p,
 {http://www.w3.org/1999/xhtml}sup, {http://www.w3.org/1999/xhtml}hr,
 [other possibilities snipped]
 (See http://www.w3.org/TR/xmlschema-1/#cvc-complex-type
 clause 2.4)
Transformation failed: Run-time errors were reported

The error message here points to a line in the stylesheet that does:

<xsl:copy-of select="*"/>

- the error arises because the <div> element being copied is in the wrong
namespace.

(iv) Same as (iii), but with the -vw (validation warnings) option on the
command line:

Same messages on the console, but this time the invalid output HTML is
written to the requested destination, with embedded comments. The relevant
section of the output file looks like this:

     <h:h1>fn:collection() =&gt; node()*</h:h1>
     <!--
VALIDATION ERROR: In content of element <body>: The content model does not
allow element <div> to appear here. Expected one of:
{http://www.w3.org/1999/xhtml}blockquote, {http://www.w3.org/1999/xhtml}dfn,
{http://www.w3.org/1999/xhtml}br, [list snipped]
{http://www.w3.org/1999/xhtml}samp
-->
     <div xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"


Anyway, thanks for the feedback. It's good to see schema-aware processing getting some discussion. There are real benefits, but as you point out there are also limitations and things to learn about what works well and what doesn't. There are also opportunities for products to go beyond the spec - checking for XHTML validity being an obvious example.

Michael Kay
Saxonica Limited




--
Jesper Tverskov

www.xmlkurser.dk
www.xmlplease.com
