Re: @xml-base in subtrees included (a) via entity expansion, a
> On Nov 16, 2017, at 1:42 AM, Michael Kay <mike@saxonica.com> wrote:
>> ...
>>
>> (3) Your upstream processor is, as required by the spec, leaving xml:base attributes in the top-level included element items of an inclusion. And for reasons not at all clear to me, it is sometimes using a relative reference in the xml:base attribute, and not an absolute reference.
>>
>> Are those inferences correct?
>
> Yes. (On (3), I think that there are some good usability reasons for putting a relative reference in the xml:base attribute - it makes the document relocatable as part of a complex of interlinked documents)

I think relocatability is a good reason for most or all human-assigned URIs, including instances of xml:base, to be relative. I don’t see how it is useful for the xml:base attributes injected by an XInclude processor, which I expect to have transient utility anyway (they are part of one particular processing of the input). Possibly I’m just dense, or the scenario in which it’s useful is not one I have encountered.

Oh, OK. (D’oh.) If I’m moving a set of interlinked documents using XInclude, I normally want to move them to an isomorphic set of interlinked documents using XInclude at a different location. And in that case, the result of XInclude is transient.

If, however, I want to maintain some material in a set of 150 interlinked documents at location 1, and use XInclude to construct three documents which include material from that set of 150 and which do not themselves use XInclude, and then place them at location 2, then yes, I think I see why one might want the xml:base attributes to have relative URIs.

(Actually, the most compelling thing I see is another good reason not to try to play clever games with base URIs. But that doesn’t really help you. If I did find myself backed into a corner and forced to use xml:base in subtle ways, I would really want the processor to produce correct results.)

>
> Let me try giving an example.
> Consider first a single-entity document (A) with base URI http://example.com/doc.xml:
>
> <out>
>   <in xml:base="dir/in.xml"/>
> </out>
>
> then the base URI of <in> is http://example.com/dir/in.xml.

With you so far.

> If we now take this document (B) at the same location:
>
> <!DOCTYPE out [
>   <!ENTITY e SYSTEM "dir/in.xml">
> ]>
> <out>&e;</out>
>
> where the external entity is
>
> <in xml:base="dir/in.xml"/>
>
> then the document after entity expansion is
>
> <out>
>   <in xml:base="dir/in.xml"/>
> </out>
>
> but the base URI of <in> is now http://example.com/dir/dir/in.xml

Yes.

> If we now take this document (C) at the same location:
>
> <out>
>   <xi:include href="dir/in.xml"/>
> </out>
>
> where the included document is
>
> <in/>
>
> then the expanded document (delivered by Apache Xerces) is
>
> <out>
>   <in xml:base="dir/in.xml"/>
> </out>
>
> and the base URI of <in> is http://example.com/dir/in.xml

And yes. And … ouch. It is here that I rub my eyes in surprise at the decision of the parser implementors to insist on using a relative URI here, and not offer the option of using an absolute one. After all, the parser + XInclude processor has all the relevant information and can make the correct distinctions; its output in this case seems to be erasing the information needed to allow the downstream user to calculate the base URIs correctly.

If an XML parser did that to me (erased substantive differences in the input), I would be looking for a new XML parser. I realize that you may not regard that as a viable option.

In fairness to the implementors, it is pretty clear that XInclude tried hard to make itself invisible in the infoset. I don’t know whether to be upset that the responsible WG missed this particular interaction among general entities, XInclude, xml:base, and the general rules for base URIs, or impressed that it has taken this long to surface.
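
[Editorial illustration, not part of the original message.] The three base URIs discussed above can be checked mechanically with ordinary relative-reference resolution. The following is a minimal sketch in Python using only `urllib.parse.urljoin`; the helper name `base_uri` and the variable names are invented for illustration, and the sketch models only the resolution arithmetic, not what any particular parser reports.

```python
from urllib.parse import urljoin

# Base URI of the document entity, as given in the example.
DOC = "http://example.com/doc.xml"


def base_uri(containing_base, xml_base=None):
    """Resolve an optional xml:base value against the base URI in scope.

    Hypothetical helper: 'containing_base' is the base URI of the parent
    element or of the entity that physically contains the element.
    """
    return urljoin(containing_base, xml_base) if xml_base else containing_base


# Case (A): xml:base is written directly in the document entity.
case_a = base_uri(DOC, "dir/in.xml")

# Case (B): <in> sits in the external entity dir/in.xml, so its xml:base
# is resolved against the URI of that entity, not of the document.
entity_uri = urljoin(DOC, "dir/in.xml")
case_b = base_uri(entity_uri, "dir/in.xml")

# Case (C): the xml:base injected by XInclude is resolved against the
# base URI of the including document.
case_c = base_uri(DOC, "dir/in.xml")

for label, uri in (("A", case_a), ("B", case_b), ("C", case_c)):
    print(label, uri)
```

Cases (A) and (C) come out as http://example.com/dir/in.xml and case (B) as http://example.com/dir/dir/in.xml, although the serialized documents are identical in all three cases.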
>
> My challenge is to distinguish these three cases, where the surface structure (the values of elements and attributes) is in all cases the same, but the base URIs are different. In particular the XPath expression base-uri(//in) must deliver the correct answer in all three cases.
>
> I currently distinguish (A) and (B) by detecting that the location information supplied by the SAX parser for the <in> element has a different systemID from the location of the <out> element. But this heuristic is giving me the wrong answer for case (C).
>
> It does occur to me that there is one way I could detect the difference between (B) and (C): presumably the SAX parser will call LexicalHandler.startEntity() and LexicalHandler.endEntity() for case (B), but not for case (C). Using that information will be messy (it needs an extra bit somewhere in the XDM model representation, and bits are in short supply) but it may be do-able.

Given that you are looking aside at the location information in order to identify the correct base URI in case (B), I suspect that you would be able to calculate the correct base URI in case (C) if you were to set feature

  http://apache.org/xml/features/xinclude/fixup-base-uris

to false. The drawback is that the ‘in’ element of case (C) would then have no xml:base attribute.

This would at least allow you to offer the user a choice: correct base URI and a missing xml:base where XInclude says one should be injected, or a correct set of attributes and incorrect base URI.

>
>>
>> Can you tell your upstream XInclude processor to use absolute URIs in all injected xml:base attributes?
>
> AFAIK, No.
>>
>> Can you ask it to provide the extension property named “include history”?
>
> AFAIK, No.
>

I would be tempted to lobby the Xerces team for some extension property to signal that a given node is a top-level node in an external entity, or a top-level node in an XIncluded node set. Or a base-URI property … But perhaps that ship has sailed.
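
[Editorial illustration, not part of the original message.] The idea quoted above, of using startEntity/endEntity notifications rather than a change in the locator's system identifier to separate case (B) from case (C), can be sketched over a simplified event stream. Everything here is hypothetical: the tuple-based events, the function `base_uris`, and the sample data stand in for real SAX callbacks, and the sketch assumes (as the quoted message only presumes) that entity events are reported for (B) but not for (C).

```python
from urllib.parse import urljoin

DOC = "http://example.com/doc.xml"
IN = "http://example.com/dir/in.xml"  # system ID the locator reports for <in>


def base_uris(events, doc_uri):
    """Return {element name: base URI} for a simplified event stream.

    Hypothetical event shapes:
      ("startEntity", name) / ("endEntity", name)
      ("startElement", name, xml_base_or_None, locator_system_id)
      ("endElement", name)
    """
    result = {}
    bases = [doc_uri]     # base URI of each open element, document at bottom
    entity_depths = []    # element depth recorded at each startEntity
    for event in events:
        kind = event[0]
        if kind == "startEntity":
            entity_depths.append(len(bases))
        elif kind == "endEntity":
            entity_depths.pop()
        elif kind == "startElement":
            _, name, xml_base, locator_id = event
            # Only a top-level element of an external entity takes the
            # entity's own URI as the base for resolving xml:base.
            top_of_entity = bool(entity_depths) and entity_depths[-1] == len(bases)
            containing = locator_id if top_of_entity else bases[-1]
            base = urljoin(containing, xml_base) if xml_base else containing
            bases.append(base)
            result[name] = base
        elif kind == "endElement":
            bases.pop()
    return result


case_b = [("startElement", "out", None, DOC), ("startEntity", "e"),
          ("startElement", "in", "dir/in.xml", IN), ("endElement", "in"),
          ("endEntity", "e"), ("endElement", "out")]
# Case (C): same elements and locator values, but no entity events.
case_c = [e for e in case_b if e[0] not in ("startEntity", "endEntity")]

print(base_uris(case_b, DOC)["in"])
print(base_uris(case_c, DOC)["in"])
```

With these inputs the sketch yields http://example.com/dir/dir/in.xml for (B) and http://example.com/dir/in.xml for (C), which matches the values worked out in the example. Whether a given parser actually delivers the events this way would need to be confirmed against the parser itself.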
Does this problem present itself only in the SAX interface, or also in the DOM interface?

********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
cmsmcq@blackmesatech.com
http://www.blackmesatech.com
********************************************