
accessing the input XML's doctype

Subject: accessing the input XML's doctype
From: "James Sulak" <jsulak@xxxxxxxxxxxxxxxx>
Date: Wed, 16 Jul 2008 14:40:17 -0500
Hello All,

I'm trying to write a transform that gives the output XML file the same
document type as the input XML file.  (Specifically, it's a transform to
remove Arbortext Editor's change-tracking markup).  I'm not happy with
the method I'm using now, namely reading the input XML as unparsed text
and running a regular expression over it to extract the public and
system identifiers from the doctype declaration.

I have fairly limited knowledge of how an XSLT processor (we're using
Saxon) interacts with the XML parser.  But my understanding is that the
parser reads in the XML, resolves any default attribute values, and then
passes the document tree to the XSLT processor.  The XSLT processor
itself doesn't know or care about the doctype information.  Is this
correct?
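
A minimal sketch of the parser-side behaviour described above, assuming
the JAXP default parser that ships with the JDK (the class and method
names ParserDefaultsDemo and parse are hypothetical).  It parses a small
document whose internal subset declares a default attribute value, so the
resulting DOM tree can be inspected for both the defaulted attribute and
the DocumentType node that the XSLT data model does not carry over:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Saxon or Xerces.
final class ParserDefaultsDemo {

    /** Parses an XML string into a DOM tree with the JAXP default parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException("Could not parse document", e);
        }
    }
}
```

Given the input `<!DOCTYPE doc [<!ATTLIST doc status CDATA "draft">]><doc/>`,
calling `getDocumentElement().getAttribute("status")` on the result should
return the defaulted value, and `getDoctype()` exposes the declaration's
name, public identifier and system identifier at the DOM level.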

If it is, that would seem to imply that what I'm asking is impossible
without writing an extension function.  Is this the case?  Since our
implementation is already dependent on several Saxon extension
functions, that's an acceptable solution.  Has anyone attempted anything
like this, or have any suggestions on how to proceed?  Could I call
Xerces (or another parser) from an extension function and get the public
and system identifiers?
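
One possible shape for the parser side of such an extension, sketched
under assumptions: it uses the standard SAX2 LexicalHandler callback
startDTD, which reports the identifiers exactly as declared.  The class
name DoctypeSniffer and the method readDoctype are hypothetical, and the
glue needed to expose the method to Saxon as an extension function is not
shown.  The entity resolver returns empty content so that the DTD itself
is never fetched; note that this also blanks any other external entities
the document references, which only matters if the parse result is used
for more than the identifiers.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper, not part of Saxon or Xerces.
final class DoctypeSniffer {

    /** Returns {publicId, systemId}; an entry is null if it was not declared. */
    static String[] readDoctype(InputSource source) {
        final String[] ids = new String[2];
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startDTD(String name, String publicId, String systemId) {
                // Called when the parser reaches the DOCTYPE declaration.
                ids[0] = publicId;
                ids[1] = systemId;
            }

            @Override
            public InputSource resolveEntity(String name, String publicId,
                                             String baseURI, String systemId) {
                // Supply empty content instead of fetching external entities.
                return new InputSource(new StringReader(""));
            }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance()
                    .newSAXParser().getXMLReader();
            // Standard SAX2 property for registering a LexicalHandler.
            reader.setProperty(
                    "http://xml.org/sax/properties/lexical-handler", handler);
            reader.setEntityResolver(handler);
            reader.parse(source);
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException("Could not read DOCTYPE", e);
        }
        return ids;
    }
}
```

Calling `DoctypeSniffer.readDoctype(new InputSource(uri))` with the
input document's URI would give the two strings that the stylesheet
currently obtains with the regular expression.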

Here's the relevant part of my current method:

   <xsl:param name="doctype.public" select="f:input-doctype(document-uri(.))[1]"/>
   <xsl:param name="doctype.system" select="f:input-doctype(document-uri(.))[2]"/>

   <!-- Returns (public identifier, system identifier), read from the raw text of the input file -->
   <xsl:function name="f:input-doctype">
      <xsl:param name="document-uri"/>
      <xsl:variable name="unparsed-document" select="unparsed-text($document-uri)"/>
      <!-- The "x" flag used below strips the whitespace from this pattern -->
      <xsl:variable name="regex">
         <xsl:text>DOCTYPE
            [\s]*
            ([a-zA-Z0-9]+)
            [\s]*
            PUBLIC
            [\s]*
            "(.+)"
            [\s]*
            "([0-9a-zA-Z/]+\.dtd)"
         </xsl:text>
      </xsl:variable>
      <xsl:analyze-string select="$unparsed-document" regex="{$regex}" flags="msx">
         <xsl:matching-substring>
            <!-- Group 2 is the public identifier, group 3 the system identifier -->
            <xsl:sequence select="regex-group(2), regex-group(3)"/>
         </xsl:matching-substring>
      </xsl:analyze-string>
   </xsl:function>

   <xsl:output method="xml" version="1.0" encoding="utf-8"/>

   <xsl:template match="/">
      <xsl:result-document doctype-public="{$doctype.public}" doctype-system="{$doctype.system}">
         <xsl:apply-templates/>
      </xsl:result-document>
   </xsl:template>


Thanks,

-James


-----
James Sulak
Electronic Publishing Developer
Jones McClure Publishing
