Subject:XML-22004: (Fatal Error) Error while parsing input XML document (Missing entity 'lang'.). Author:Yves Genest Date:14 Jul 2006 11:37 AM
I found that the problem comes from a URL embedded in my HTML that has a CGI parameter called lang. I need to find a way for JTidy to ignore or bypass URL parameters. Any ideas?
Subject:XML-22004: (Fatal Error) Error while parsing input XML document (Missing entity 'lang'.). Author:Yves Genest Date:14 Jul 2006 01:10 PM
Hi Ivan,
This is the fragment before:
<A HREF="/nexres/search/power_search.cgi?&src=10014963&ses=61f08d084812d3568f490653fb9863a926798&src_aid=&path=&unps=&lang=" target=_parent>Hotels</A>
The fragment after:
<a
href="/nexres/search/power_search.cgi?
Subject:XML-22004: (Fatal Error) Error while parsing input XML document (Missing entity 'lang'.). Author:Yves Genest Date:14 Jul 2006 01:15 PM
Hi Ivan,
This is the fragment before:
<A HREF="/nexres/search/power_search.cgi?&src=10014963&ses=61f08d084812d3568f490653fb9863a926798&src_aid=&path=&unps=&lang=" target=_parent>Hotels</A>
The fragment after:
<a
href="/nexres/search/power_search.cgi?
Subject:XML-22004: (Fatal Error) Error while parsing input XML document (Missing entity 'lang'.). Author:Tony Lavinio Date:17 Jul 2006 09:55 AM
The problem is that the HTML is ambiguous.
Ampersands in HTML should be escaped, although browsers generally
will let them through.
But &lang; is a valid HTML entity. That's why it behaves differently
from the others. It corresponds to Unicode 9001 decimal (U+2329), the
left-pointing angle bracket.
Tidy thinks it is a special character, which actually it is. A fully
standards-conforming browser, not operating in 'quirks' mode, would also
see it that way and not as a parameter named lang.
See http://www.w3.org/TR/html401/loose.dtd, strict.dtd and
HTMLsymbol.ent and also symbol hex 2329 (decimal 9001) at http://www.unicode.org/charts/PDF/U2300.pdf
The solution is either to change the source HTML if you can, or write
code that replaces the value 〈 with the string &lang.
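For the first option, when the source HTML cannot be edited by hand, one possibility is to escape the bare ampersands in code before the HTML reaches Tidy. The following is only a minimal sketch under that assumption: the class name AmpersandEscaper and the method escapeBareAmpersands are hypothetical and are not part of JTidy or Stylus Studio. It treats any & that is not followed by a complete reference ending in a semicolon as a literal ampersand, so &lang= becomes &amp;lang= while &amp; and &#9001; are left alone.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of JTidy or Stylus Studio.
class AmpersandEscaper {

    // Matches an '&' that does NOT start a complete reference such as
    // "&amp;", "&#9001;" or "&#x2329;" (a name or number followed by ';').
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
            "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    // Returns the input with every bare '&' rewritten as "&amp;".
    // Ampersands that already begin a semicolon-terminated reference are kept.
    static String escapeBareAmpersands(String html) {
        return BARE_AMPERSAND.matcher(html).replaceAll("&amp;");
    }
}
```

You would call AmpersandEscaper.escapeBareAmpersands(html) on the raw page text and pass the result to Tidy. Note that this is a blunt text-level fix: it also rewrites bare ampersands inside script blocks, so it may need narrowing if the page contains inline JavaScript.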
Subject:XML-22004: (Fatal Error) Error while parsing input XML document (Missing entity 'lang'.). Author:Yves Genest Date:17 Jul 2006 01:16 PM
Thanks Tony,
In fact, I discovered that the internal Stylus Studio HTML-to-XML converter does not have this problem. I also ran into problems with Tidy and JavaScript. Which Java class does Stylus Studio use to convert HTML into XML?
Subject:XML-22004: (Fatal Error) Error while parsing input XML document (Missing entity 'lang'.). Author:Yves Genest Date:14 Jul 2006 12:57 PM
I found that the problem comes from a URL embedded in my HTML that has a CGI parameter called lang. I need to find a way for JTidy to ignore or bypass URL parameters. Any ideas?