
Perl XML::Parser and nested external entity references

  • From: "D.J. Miller II" <jamesm@r...>
  • To: xml-dev@l...
  • Date: Thu, 18 Oct 2001 12:24:02 -0500 (CDT)

Hello,

I'm wondering if anyone has experience using Perl's XML::Parser module
to parse an XML document represented as a tree of .xml files connected
by external entity references.  (Apologies if this question is
inappropriate for xml-dev.)  I'm having a problem with the relative URIs
(here, simply relative file system pathnames) used in such a situation.
The problem is illustrated by the following example.

Here's a small tree of files:

<TREE>

   top/
    |
    +- dtd-dir/
    |    |
    |    +- more-ents-dir/
    |    |    |
    |    |    +- adolph.ent
    |    |
    |    +- thangle.dtd
    |
    +- hub.xml
    |
    +- some-shared-content/
         |
         +- eleanor.xml

The top level document is hub.xml:

  <?xml version="1.0" encoding="us-ascii"?>
  <!DOCTYPE thang PUBLIC "-//blub//DTD thangle//EN" "dtd-dir/thangle.dtd"[
  ]>
  <thang>
      &eleanor;
  </thang>

The DTD it references is thangle.dtd:

  <!-- thangle.dtd -->
  <!ELEMENT thang (bang) >

  <!ELEMENT bang (thud+)>

  <!ELEMENT thud (#PCDATA)>

  <!ENTITY % adolph SYSTEM "more-ents-dir/adolph.ent">

  %adolph;

The external entity referenced from thangle.dtd is adolph.ent:

  <!ENTITY eleanor SYSTEM "../../some-shared-content/eleanor.xml">

The external entity declared in adolph.ent and referenced back in the
top level document is eleanor.xml:

  <bang>
    <thud>
      This element and its parent are from top/some-shared-content/eleanor.xml.
    </thud>
  </bang>

</TREE>
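
To make the intended resolution concrete, here is a minimal sketch in plain Perl (core modules only). It resolves each system identifier against the directory of the file that contains its declaration, which is how XML 1.0 defines relative system identifiers. The helper name resolve_sysid is hypothetical and not part of XML::Parser; all paths are relative to top/.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical helper (not part of XML::Parser): resolve $sysid against the
# directory of $declaring_file, collapsing "." and ".." segments textually.
sub resolve_sysid {
    my ($declaring_file, $sysid) = @_;
    return $sysid if $sysid =~ m{^/};    # absolute path: use as-is
    my $joined = dirname($declaring_file) . "/$sysid";
    my $root   = $joined =~ m{^/} ? '/' : '';
    my @parts;
    for my $seg (split m{/}, $joined) {
        next if $seg eq '' || $seg eq '.';
        if ($seg eq '..' && @parts && $parts[-1] ne '..') {
            pop @parts;
        }
        else {
            push @parts, $seg;
        }
    }
    return $root . join('/', @parts);
}

# Walk the chain of declarations from the example tree.
my $dtd     = resolve_sysid('hub.xml', 'dtd-dir/thangle.dtd');
my $adolph  = resolve_sysid($dtd,      'more-ents-dir/adolph.ent');
my $eleanor = resolve_sysid($adolph,   '../../some-shared-content/eleanor.xml');
print "$_\n" for $dtd, $adolph, $eleanor;
```

Under these assumptions the last call yields some-shared-content/eleanor.xml, the file the reference in hub.xml is meant to reach.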

I'm using the XML::Parser ExternEnt hook to handle the external entity
reference events.  This handler is given the following parameters:

  $xp    - reference to the XML::Parser::Expat instance that's running
           the parse
  $base  - base to be used for resolving a relative URI (may be
           undefined)
  $sysid - the URI of the external entity
  $pubid - the PUBID of the external entity (may be undefined)

When XML::Parser gets to the %adolph; reference inside of thangle.dtd
(which was itself opened because of the reference to it in the DOCTYPE
declaration of hub.xml), the $base parameter comes into the handler
empty; one might think at this point that it would have a value
something like "dtd-dir/", which was the path for the previous, 'parent'
external entity reference.  At this point the handler is lost and can't
open the entity, so the parse fails.
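
The failure can be illustrated with a small sketch of a handler that takes the four parameters listed above. To stay self-contained it does not load XML::Parser: the sub name extern_ent_path is hypothetical, it is called by hand rather than by the parser, and it only computes the path a real handler would try to open. The "dtd-dir/" base is the hoped-for value mentioned above, not something the parser was seen to supply.

```perl
use strict;
use warnings;

# Sketch of an ExternEnt-style handler. It only computes the path it would
# try to open; a real handler would open that file and return the handle
# (or undef on failure).
sub extern_ent_path {
    my ($xp, $base, $sysid, $pubid) = @_;
    return $sysid if $sysid =~ m{^/};                      # absolute: nothing to resolve
    return $sysid unless defined $base && length $base;    # no base: sysid as given
    (my $resolved = $base) =~ s{[^/]*$}{$sysid};           # swap last segment of base
    return $resolved;
}

# Observed case: $base arrives empty for the %adolph; reference.
my $observed = extern_ent_path(undef, undef, 'more-ents-dir/adolph.ent', undef);
# Hoped-for case: $base reflects the parent entity's location.
my $hoped = extern_ent_path(undef, 'dtd-dir/', 'more-ents-dir/adolph.ent', undef);
print "observed: $observed\nhoped:    $hoped\n";
```

With an empty base the handler is left with more-ents-dir/adolph.ent, which does not exist relative to top/; with the hoped-for base it would arrive at dtd-dir/more-ents-dir/adolph.ent.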

(As an interesting aside, the Saxon 6.4.3 XSLT processor (don't know
what version of AElfred it's using) gets similarly lost in a tree of xml
fragment files connected with relative URIs, while the Xalan-J 2.0.1
XSLT processor (which uses Xerces 1.23) does not have this problem.
In the example above, Saxon *does* find the adolph.ent entity but not
the eleanor.xml one.)

I could get around this empty value for $base by keeping a pushdown list
of the relative URI 'bases' (dirname parts of URIs of previously seen
external entity references) except for one thing:  in the version of
XML::Parser I'm constrained to use, 2.27, there's no hook for the end of
an external entity reference event, only the beginning.  Without this
hook I don't know when to pop a relative base from the pushdown, and so
again get lost in the tree.  As of version 2.28 of XML::Parser there is
such a hook:  "ExternEntFin".
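
A rough sketch of that pushdown idea, assuming both a start and an end hook are available, might look like the following. The sub names and the @base_stack variable are hypothetical, and the handlers are driven by hand here instead of by XML::Parser so that the fragment runs with core Perl alone.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

my @base_stack = ('.');    # directory of the top-level document (top/)

# ExternEnt-style handler: resolve against the innermost open entity,
# then make the new entity's directory the current base.
sub on_extern_ent {
    my ($xp, $base, $sysid, $pubid) = @_;    # $base is ignored here
    my $path = $sysid =~ m{^/} ? $sysid : "$base_stack[-1]/$sysid";
    push @base_stack, dirname($path);
    return $path;    # a real handler would open $path and return the handle
}

# ExternEntFin-style handler: the entity just finished, so drop its base.
sub on_extern_ent_fin {
    my ($xp) = @_;
    pop @base_stack if @base_stack > 1;
    return;
}

# Simulated event order for the DTD and the parameter entity nested in it.
my $dtd    = on_extern_ent(undef, undef, 'dtd-dir/thangle.dtd', '-//blub//DTD thangle//EN');
my $adolph = on_extern_ent(undef, undef, 'more-ents-dir/adolph.ent', undef);
on_extern_ent_fin(undef);    # end of adolph.ent
on_extern_ent_fin(undef);    # end of thangle.dtd
print "$dtd\n$adolph\n";
```

In a real parse these subs would be registered through the Handlers option as ExternEnt and ExternEntFin. Note that this covers nesting only: the &eleanor; reference occurs in hub.xml after the DTD has been closed, while XML 1.0 resolves its system identifier against adolph.ent, where it is declared, so that case would also need the base recorded at declaration time.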

Is there any way to make XML::Parser 2.27 give meaningful values for the
$base parameter to one's ExternEnt handler, or am I doomed to come up
with a different Perl-accessible XML parser?
____________________________________________________________

               James Miller in Austin, Texas

       Internet:   jamesm@b...       (198.3.118.20)
      alternate:   jamesm@w... (198.3.118.3)

____________________________________________________________
