Re: include text file
> As Tidy and
> the HTML side of the DOM proves, there's no reason you can't parse hideous
> HTML into a uniform node tree.

I think you have to separate two cases. Omitting end tags (and in some cases begin tags) isn't hideousness; it is a standard SGML feature, and the HTML DTD has sufficient declarations to allow an SGML parser to infer the missing tags.

HTML4.decl says

  FEATURES
    OMITTAG  YES
    ^^^^^^^^^^^^

which tells an SGML parser that these features are being used, and for example the DTD has

  <!ELEMENT BODY O O (%block;|SCRIPT)+ +(INS|DEL) -- document body -->
                 ^^^

which says you can omit both the begin and end tag of the body element and the parser will infer them.

This is how sx (for example) in James Clark's sp suite can parse HTML (or any SGML) files and output the parse tree in XML syntax. You want (I think) to do the same without the overhead of writing to a file and reading it back. So you just want a SAX-enabled SGML parser. I am sure I saw an announcement of such a beast once, but a quick look in Google failed to show anything likely.

> Through some voodoo that I'm sure the IE and Mozilla developers have had
> to develop several times over, it would become this node tree:

What the browsers do is something rather different. They are designed to avoid errors at all costs, so they accept not just "non well-formed" HTML in the sense of HTML with omissible tags omitted, but rather try to accept any random character stream that looks like it might have been intended to perhaps be HTML.

You could perhaps have a SAX interface to such a permissive parser, but unlike the case above, here you'd have to accept that the parse might fail in more interesting ways, and that the result of any parse might be more the result of creative thinking by the parser writer than something specified in the file...
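To make the distinction concrete, here is a minimal Python sketch of the idea of feeding SAX-style events from a parser that infers omitted end tags. It is not an SGML parser and does not read a DTD: the `IMPLIED_END` table is a hand-written, hypothetical stand-in for what the DTD's omitted-tag declarations would tell a real parser, and `EventCollector`, `InferringParser` and `parse` are illustrative names. It sits on top of the standard library's `html.parser.HTMLParser`, which is itself a permissive tokenizer, and it only infers end tags, not omitted start tags such as BODY's.

```python
from html.parser import HTMLParser

# Hypothetical, hand-written stand-in for DTD knowledge: for each element
# whose end tag may be omitted, the start tags that implicitly close it.
IMPLIED_END = {
    "p": {"p", "ul", "ol", "table", "h1", "h2"},
    "li": {"li"},
}


class EventCollector:
    """Toy SAX-style handler that simply records the events it receives."""

    def __init__(self):
        self.events = []

    def start_element(self, name):
        self.events.append(("start", name))

    def end_element(self, name):
        self.events.append(("end", name))

    def characters(self, text):
        self.events.append(("text", text))


class InferringParser(HTMLParser):
    """Emits balanced start/end events, inferring omitted end tags."""

    def __init__(self, handler):
        super().__init__()
        self.handler = handler
        self.open = []  # stack of currently open element names

    def handle_starttag(self, tag, attrs):
        # Close the current element if this start tag implies its end.
        while self.open and tag in IMPLIED_END.get(self.open[-1], ()):
            self.handler.end_element(self.open.pop())
        self.open.append(tag)
        self.handler.start_element(tag)

    def handle_endtag(self, tag):
        if tag not in self.open:
            return  # stray end tag: ignored in this sketch
        # Close everything up to and including the named element.
        while self.open:
            name = self.open.pop()
            self.handler.end_element(name)
            if name == tag:
                break

    def handle_data(self, data):
        # Whitespace-only text is dropped to keep the event stream short.
        if data.strip():
            self.handler.characters(data.strip())

    def close(self):
        # Flush any buffered text, then close whatever is still open.
        super().close()
        while self.open:
            self.handler.end_element(self.open.pop())


def parse(html):
    """Parse a string and return the list of recorded events."""
    handler = EventCollector()
    parser = InferringParser(handler)
    parser.feed(html)
    parser.close()
    return handler.events
```

Calling `parse("<ul><li>one<li>two</ul>")` yields an event stream in which each `li` is closed before the next one opens, even though the input never says `</li>`. A real SGML parser would derive the equivalent of `IMPLIED_END` from the DTD's content models rather than from a table written by hand, which is exactly the difference between the specified behaviour and the parser writer's creative thinking.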
David

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list