Re: When to check entity WFness according to 4.3.2
> Continuing along the same lines I have been going for the past week I
> have another question about entity details. Section 4.3.2 of the XML rec
> says:
>
> "An internal general parsed entity is well-formed if its replacement
> text matches the production labeled content."
>
> My question is: when should such a WFness check be performed.
>
> (a) Immediately after the literal is parsed
> (b) Immediately after the DTD has been parsed
> (c) Only when it is referenced

If memory serves me correctly, then Expat is doing (c).

> I don't think (c) is the answer-- especially because of some of the
> points that were made in this thread. So that leaves us with (a) and
> (b). The problem with (a) is that element declarations important to the
> WF check of the literal value might not have occurred. Consider:
>
> <!DOCTYPE doc [
> <!ELEMENT doc (foo)>
> <!ENTITY e "<foo id='This is not an id!'/>">
> <!ELEMENT foo EMPTY>
> <!ATTLIST foo id ID #IMPLIED>
> ]>
> <doc>&e;</doc>
>
> versus:
>
> <!DOCTYPE doc [
> <!ELEMENT doc (foo)>
> <!ELEMENT foo EMPTY>
> <!ATTLIST foo id ID #IMPLIED>
> <!ENTITY e "<foo id='This is not an id!'/>">
> ]>
> <doc>&e;</doc>
>
> This leaves us with option (b)-- perform the WFness check of the literal
> value once the DTD has been parsed. The only hitch with this is test
> case valid-sa-86 from the xml test suite.
>
> <!DOCTYPE doc [
> <!ELEMENT doc (#PCDATA)>
> <!ENTITY e "">
> <!ENTITY e "<foo>">
> ]>
> <doc>&e;</doc>
>
> This test is supposed to be wellformed which would only be the case if
> we accepted option (c), or if overridden entity literals are discarded
> and we go with option (b). This makes sense when considering the
> distinction between the literal entity text and the replacement text.
> Section 4.3.2 refers only to checking the WFness the replacement text
> not the actual literal.
>
> Is this the correct interpretation?

IMO, (b) would be correct, but it is much easier to implement (c).
The only case when (c) does not catch the WF violation is when the
entity is never referenced. That is, Expat will not catch this:

<!DOCTYPE doc [
<!ELEMENT doc (#PCDATA)>
<!ENTITY e "<foo>abc</not_foo>">
]>
<doc></doc>

Karl
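
For anyone who wants to try this, here is a minimal sketch using Python's xml.parsers.expat binding. That binding is only my choice for illustration; the thread does not say which Expat interface is in use. The helper name expat_accepts and the constant names are hypothetical. It feeds Expat three documents: the one with the malformed but unreferenced entity, the same DTD with a reference to e added, and the valid-sa-86 document from the quoted message.

```python
import xml.parsers.expat


def expat_accepts(document: str) -> bool:
    """Return True if Expat parses the whole document without raising an error."""
    # No handlers are installed, so Expat expands internal entity
    # references itself while parsing element content.
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True


# Internal subset whose entity replacement text does not match "content".
DTD_BAD_ENTITY = (
    "<!DOCTYPE doc [\n"
    "<!ELEMENT doc (#PCDATA)>\n"
    '<!ENTITY e "<foo>abc</not_foo>">\n'
    "]>\n"
)

# The entity is declared but never referenced.
UNREFERENCED = DTD_BAD_ENTITY + "<doc></doc>"

# Same declarations, but the entity is referenced in content.
REFERENCED = DTD_BAD_ENTITY + "<doc>&e;</doc>"

# valid-sa-86: the first declaration of e is the binding one.
OVERRIDDEN = (
    "<!DOCTYPE doc [\n"
    "<!ELEMENT doc (#PCDATA)>\n"
    '<!ENTITY e "">\n'
    '<!ENTITY e "<foo>">\n'
    "]>\n"
    "<doc>&e;</doc>"
)

if __name__ == "__main__":
    for name, doc in [
        ("unreferenced", UNREFERENCED),
        ("referenced", REFERENCED),
        ("overridden", OVERRIDDEN),
    ]:
        print(name, "accepted" if expat_accepts(doc) else "rejected")
```

If Expat really does (c), I would expect only the referenced case to be rejected: the unreferenced entity is never looked at, and in the valid-sa-86 case the second declaration of e is ignored, so the reference expands to the empty string.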