
Re: When to check entity WFness according to 4.3.2



> Continuing along the same lines as the past week, I have another 
> question about entity details. Section 4.3.2 of the XML Rec says:
> 
> "An internal general parsed entity is well-formed if its replacement 
> text matches the production labeled content."
> 
> My question is: when should such a WFness check be performed?
> 
> (a) Immediately after the literal is parsed
> (b) Immediately after the DTD has been parsed
> (c) Only when it is referenced

If memory serves me correctly, Expat does (c).
 
> I don't think (c) is the answer, especially because of some of the 
> points that were made in this thread [1]. So that leaves us with (a) and 
> (b). The problem with (a) is that element declarations important to the 
> WF check of the literal value might not have occurred yet. Consider:
> 
> <!DOCTYPE doc [
> <!ELEMENT doc (foo)>
> <!ENTITY e "<foo id='This is not an id!'/>">
> <!ELEMENT foo EMPTY>
> <!ATTLIST foo id ID #IMPLIED>
> ]>
> <doc>&e;</doc>
> 
> versus:
> 
> <!DOCTYPE doc [
> <!ELEMENT doc (foo)>
> <!ELEMENT foo EMPTY>
> <!ATTLIST foo id ID #IMPLIED>
> <!ENTITY e "<foo id='This is not an id!'/>">
> ]>
> <doc>&e;</doc>
> 
> This leaves us with option (b): perform the WFness check of the literal 
> value once the DTD has been parsed. The only hitch with this is test 
> case valid-sa-86 from the XML test suite.
> 
> <!DOCTYPE doc [
> <!ELEMENT doc (#PCDATA)>
> <!ENTITY e "">
> <!ENTITY e "<foo>">
> ]>
> <doc>&e;</doc>
> 
> This test is supposed to be well-formed, which would only be the case if 
> we accepted option (c), or if overridden entity literals are discarded 
> and we go with option (b). This makes sense when considering the 
> distinction between the literal entity text and the replacement text. 
> Section 4.3.2 refers only to checking the WFness of the replacement text, 
> not the actual literal.
> 
> Is this the correct interpretation?

IMO, (b) would be correct, but it is much easier to implement (c).
The only case when (c) does not catch the WF violation is when
the entity is never referenced. That is, Expat will not catch this:

<!DOCTYPE doc [
<!ELEMENT doc (#PCDATA)>
<!ENTITY e "<foo>abc</not_foo>">
]>
<doc></doc>
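
To try this out, here is a minimal sketch using Python's standard
xml.parsers.expat binding to Expat. The Python binding and the helper
name expat_accepts are assumptions made for illustration only; they are
not part of the original discussion. The sketch feeds Expat the document
above, the same document with a reference to the entity added, and the
valid-sa-86 case quoted earlier.

```python
import xml.parsers.expat


def expat_accepts(document: str) -> bool:
    """Return True if Expat parses `document` without raising an error.

    Hypothetical helper for illustration; a fresh parser is created per
    call because an Expat parser object cannot be reused after a final
    Parse() call.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True


# Internal subset declaring an entity whose replacement text does not
# match the "content" production (mismatched start and end tags).
BAD_ENTITY_DTD = (
    "<!DOCTYPE doc [\n"
    "<!ELEMENT doc (#PCDATA)>\n"
    '<!ENTITY e "<foo>abc</not_foo>">\n'
    "]>\n"
)

# The entity is declared but never referenced.
unreferenced = BAD_ENTITY_DTD + "<doc></doc>"

# The same declaration, but the entity is referenced in content.
referenced = BAD_ENTITY_DTD + "<doc>&e;</doc>"

# valid-sa-86 as quoted above: the first declaration of e is binding,
# so the second (ill-formed) literal is never used.
valid_sa_86 = (
    "<!DOCTYPE doc [\n"
    "<!ELEMENT doc (#PCDATA)>\n"
    '<!ENTITY e "">\n'
    '<!ENTITY e "<foo>">\n'
    "]>\n"
    "<doc>&e;</doc>"
)

if __name__ == "__main__":
    for name, doc in [
        ("unreferenced", unreferenced),
        ("referenced", referenced),
        ("valid-sa-86", valid_sa_86),
    ]:
        print(name, "accepted" if expat_accepts(doc) else "rejected")
```

If Expat follows option (c), the unreferenced case should be accepted,
the referenced case rejected, and valid-sa-86 accepted, since only the
first declaration of e is ever expanded.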

Karl


