Confusion about conditional sections
Ok, I'm confused about a conditional section issue that some of my
colleagues were discussing...
Once you enter an ignored conditional section, can you effectively just scan
for the <![ and ]]> delimiters in the text inside it, without actually doing
regular parsing? Is there a reason that this cannot be done?
The logic is basically this (and assumes that we've already entered the
body of an ignore section):
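depth = 1;    // we are already inside the body of one ignore section
while (true)
{
    if (skipped char '<')
    {
        // a nested conditional section starts with "<!["
        if (skipped char '!') and (skipped char '[')
            depth++;
    }
    else if (skipped char ']')
    {
        // a conditional section ends with "]]>"
        if (skipped char ']') and (skipped char '>')
        {
            depth--;
            if (!depth)
                return;
        }
    }
    else if (next char is not valid XML char)
    {
        emit error
    }
    else
    {
        skip next char
    }
}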
Here 'skipped char' means that the next character in the content was
consumed if it matched the target character. I can't help but think that
this logic would fail to deal with a number of issues, but I can't think of
any offhand. What is missing from this picture?
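
For what it's worth, here is a minimal runnable sketch of that logic in Python, so it can be tried against test inputs. The function names (is_xml_char, skip_ignore_section) are made up for illustration, and the sketch assumes the whole DTD text is already in memory as a string. It checks for the three-character delimiters at each position rather than consuming one character at a time, so a run like ]]]> is still recognized as ending in ]]>.

```python
def is_xml_char(ch: str) -> bool:
    """Return True if ch falls in the XML 1.0 Char ranges."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def skip_ignore_section(text: str, pos: int) -> int:
    """Scan from pos (just inside an ignore section body) and return the
    offset just past the matching ']]>'. Raises ValueError on bad input."""
    depth = 1  # we are already inside one ignore section
    while pos < len(text):
        if text.startswith("<![", pos):
            depth += 1  # nested conditional section start
            pos += 3
        elif text.startswith("]]>", pos):
            depth -= 1  # conditional section end
            pos += 3
            if depth == 0:
                return pos
        elif not is_xml_char(text[pos]):
            raise ValueError(f"invalid XML character at offset {pos}")
        else:
            pos += 1
    raise ValueError("unterminated conditional section")


# Usage: skip an ignore section that contains one nested section.
dtd = "<![IGNORE[ a <![INCLUDE[ b ]]> c ]]><!-- after -->"
body_start = dtd.index("[", 3) + 1  # just past the '[' after the keyword
end = skip_ignore_section(dtd, body_start)
print(dtd[end:])  # prints "<!-- after -->"
```

Note that this only counts delimiters and checks characters. It knows nothing about literals, comments or declarations inside the section.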
Also, does the specification of a conditional section basically imply that
you cannot have a ']]>' sequence anywhere in an ignored section, even if
it's in a literal?
So something like this:
<![IGNORE[
<!ENTITY MyEntity "The ]]> text of my entity">
]]>
would fail according to the spec because the ]]> sequence is not allowed
inside an ignored conditional section, even in a place where it is
otherwise legal, such as in a literal value. Is this correct? The above
logic is kind of dependent upon this being true, I would think, since
otherwise it could be fooled. If this is true, it would seem awfully
weird that changing INCLUDE to IGNORE would cause a correct document to
break in this way.
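
To make the concern concrete, here is a small Python sketch that applies a delimiter-only scan to the example above. The names DELIM and ignore_section_end are hypothetical, and the scan is my reading of the logic described earlier, not anything taken from the spec. Under that scan the section ends at the ]]> inside the literal, and the rest of the declaration is left over as stray text after the section.

```python
import re

# Matches either delimiter the scan cares about: "<![" or "]]>".
DELIM = re.compile(r"<!\[|\]\]>")


def ignore_section_end(text: str, body_start: int) -> int:
    """Return the offset just past the ']]>' that closes the ignore
    section whose body begins at body_start."""
    depth = 1  # already inside one ignore section
    for m in DELIM.finditer(text, body_start):
        depth += 1 if m.group() == "<![" else -1
        if depth == 0:
            return m.end()
    raise ValueError("unterminated conditional section")


dtd = '<![IGNORE[\n<!ENTITY MyEntity "The ]]> text of my entity">\n]]>'
body_start = len("<![IGNORE[")
end = ignore_section_end(dtd, body_start)

# Everything after the point where the scan thinks the section ended.
leftover = dtd[end:]
print(repr(leftover))  # prints ' text of my entity">\n]]>'
```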
The spec says that you must parse even the ignored section, but it doesn't
say to what extent. The logic above does 'parse' the text in that it looks
at every character in there, but it's attempting a very low-calorie, fast
parse based on knowledge of what can be in a conditional section.
Since the end of a conditional section carries no identifying name to
ensure that it's correctly matched, doesn't the above logic correctly
maintain all the required state? It would not, though, seem to catch
something like this:
<![IGNORE[
<![SOMETHINGSTUPID[ ]]>
]]>
Since it does not actually look at what follows the <![
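
A sketch of how the same scan could at least record what follows each nested <![, so a caller can decide whether to complain. The names OPEN_OR_CLOSE, KEYWORD and scan_ignore_section are invented for this example, and whether an unknown keyword inside an ignored section ought to be reported at all is exactly the open question here; the code only makes the keywords visible.

```python
import re

OPEN_OR_CLOSE = re.compile(r"<!\[|\]\]>")
# Optional whitespace, a run of keyword characters, optional whitespace, '['.
KEYWORD = re.compile(r"\s*([^\s\[\]<>]*)\s*\[")


def scan_ignore_section(text: str, body_start: int):
    """Return (end_offset, nested_keywords) for the ignore section whose
    body starts at body_start. A keyword is None when no '[' follows."""
    depth = 1  # already inside one ignore section
    keywords = []
    for m in OPEN_OR_CLOSE.finditer(text, body_start):
        if m.group() == "<![":
            depth += 1
            kw = KEYWORD.match(text, m.end())
            keywords.append(kw.group(1) if kw else None)
        else:
            depth -= 1
            if depth == 0:
                return m.end(), keywords
    raise ValueError("unterminated conditional section")


dtd = "<![IGNORE[\n<![SOMETHINGSTUPID[ ]]>\n]]>"
end, nested = scan_ignore_section(dtd, len("<![IGNORE["))
unknown = [k for k in nested if k not in ("INCLUDE", "IGNORE")]
print(end == len(dtd), unknown)  # prints: True ['SOMETHINGSTUPID']
```

The depth counting is unchanged, so the section still closes in the same place; the only difference is that the odd keyword is available afterwards.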