Perl XML::Parser and nested external entity references
Hello,

I'm wondering if anyone has experience using Perl's XML::Parser module to parse an XML document represented as a tree of .xml files connected by external entity references. (Apologies if this question is inappropriate for xml-dev.) I'm having a problem with the relative URIs (here, simply relative file system pathnames) used in such a situation.

The problem is illustrated by the following example. Here's a small tree of files:

<TREE>

    top/
     |
     +- dtd-dir/
     |   |
     |   +- more-ents-dir/
     |   |   |
     |   |   +- adolph.ent
     |   |
     |   +- thangle.dtd
     |
     +- hub.xml
     |
     +- some-shared-content
         |
         +- eleanor.xml

The top level document is hub.xml:

    <?xml version="1.0" encoding="us-ascii"?>
    <!DOCTYPE thang PUBLIC "-//blub//DTD thangle//EN" "dtd-dir/thangle.dtd"[
    ]>
    <thang>
    &eleanor;
    </thang>

The dtd it references is thangle.dtd:

    <!-- thangle.dtd -->
    <!ELEMENT thang (bang) >
    <!ELEMENT bang (thud+)>
    <!ELEMENT thud (#PCDATA)>
    <!ENTITY % adolph SYSTEM "more-ents-dir/adolph.ent">
    %adolph;

The external entity referenced from thangle.dtd is adolph.ent:

    <!ENTITY eleanor SYSTEM "../../some-shared-content/eleanor.xml">

The external entity declared in adolph.ent and referenced back in the top level document is eleanor.xml:

    <bang>
    <thud>
    This element and its parent are from top/some-shared-content/eleanor.xml.
    </thud>
    </bang>

</TREE>

I'm using the XML::Parser ExternEnt hook to handle the external entity reference events.
This handler is given the following parameters:

    $xp    - reference to the XML::Parser::Expat instance that's running the parse
    $base  - base to be used for resolving a relative URI (may be undefined)
    $sysid - the URI of the external entity
    $pubid - the PUBID of the external entity (may be undefined)

When XML::Parser gets to the %adolph; reference inside of thangle.dtd (which was itself opened because of the reference to it in the DOCTYPE declaration of hub.xml), the $base parameter comes into the handler empty. One might expect it to have a value something like "dtd-dir/", which was the path for the previous, 'parent' external entity reference. At this point the handler is lost and can't open the entity, so the parse fails.

(As an interesting aside, the Saxon 6.4.3 XSLT processor (I don't know which version of AElfred it's using) gets similarly lost in a tree of xml fragment files connected with relative URIs, while the Xalan-J 2.0.1 XSLT processor (which uses Xerces 1.23) does not have this problem. In the example above, Saxon *does* find the adolph.ent entity but not the eleanor.xml one.)

I could get around this empty value for $base by keeping a pushdown list of the relative URI 'bases' (the dirname parts of the URIs of previously seen external entity references) except for one thing: in the version of XML::Parser I'm constrained to use, 2.27, there's no hook for the end of an external entity reference event, only the beginning. Without this hook I don't know when to pop a relative base from the pushdown list, and so I again get lost in the tree. As of version 2.28 of XML::Parser there is such a hook: "ExternEntFin".

Is there any way to make XML::Parser 2.27 give meaningful values for the $base parameter to one's ExternEnt handler, or am I doomed to come up with a different Perl-accessible XML parser?

____________________________________________________________
James Miller in Austin, Texas
Internet:  jamesm@b... (198.3.118.20)
alternate: jamesm@w... (198.3.118.3)
____________________________________________________________
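
The pushdown list described in the message could look roughly like the following Perl sketch. It uses only core modules and plain pathname strings. The names @base_stack, resolve_sysid, enter_entity and leave_entity are hypothetical and are not part of XML::Parser. The assumption is that enter_entity would be called from an ExternEnt handler and leave_entity from an ExternEntFin handler, which per the message exists only from XML::Parser 2.28 onward, so the pop step is exactly the piece missing in 2.27.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical pushdown list of base directories. The top entry is the
# directory of the external entity currently being parsed; '' stands for
# the directory of the top-level document.
my @base_stack = ('');

# Resolve a relative system identifier against the base on top of the stack.
sub resolve_sysid {
    my ($sysid) = @_;
    my $base = $base_stack[-1];
    return $sysid if $sysid =~ m{^/} or $base eq '';
    return "$base/$sysid";
}

# To be called when an external entity is opened (the ExternEnt event):
# resolve its path, then push that path's directory as the new base.
sub enter_entity {
    my ($sysid) = @_;
    my $path = resolve_sysid($sysid);
    my $dir  = dirname($path);
    push @base_stack, ($dir eq '.' ? '' : $dir);
    return $path;
}

# To be called when the entity has been fully parsed (the ExternEntFin
# event): drop its base so the parent's base is current again.
sub leave_entity {
    pop @base_stack if @base_stack > 1;
    return;
}

# Walk the nested references from the example tree.
print enter_entity('dtd-dir/thangle.dtd'), "\n";       # dtd-dir/thangle.dtd
print enter_entity('more-ents-dir/adolph.ent'), "\n";  # dtd-dir/more-ents-dir/adolph.ent
leave_entity();    # adolph.ent finished
leave_entity();    # thangle.dtd finished
```

One caveat: this resolves against whichever base is on top of the stack when the reference is encountered. The XML specification resolves a relative system identifier against the resource that contains the entity's declaration, so an entity declared in one file and referenced from another (as eleanor is here) would additionally need its base recorded at declaration time.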