Re: Never mind the browser, let's do MicroXML
On 17/12/2010 23:31, Kurt Cagle wrote:
> HTML5 has some problems but ambiguity isn't really one of them, the
> html5 spec specifies in excruciating detail how to construct a parse
> tree from any stream of unicode characters. Unlike XML there are no
> states equivalent to "not well formed", every input has a defined parse.
>
> David,
>
> Hmm .. I guess what I'm saying is this - suppose that you have an input
> sequence that looks like this:
>
> <html>
> <body>
> Text
> <ul>
> <li>Line 1
> <li>Line 2
> <li>Line 3
>
> which you're implying could conceivably be valid input.

Well, actually it's invalid. The smallest changes I could make to make
it valid would result in:

<!DOCTYPE html>
<html>
<title></title>
<body>
Text
<ul>
<li>Line 1
<li>Line 2
<li>Line 3
</ul>

>
> Because we know the underlying semantics, the processor would be able to
> parse that as:

I'm not sure that semantics are required. The html5 spec says how to
parse any input string; it's a purely mechanical process with hardly any
optional or customisable behaviour. (Bit scary describing the html5
parser on a thread in which Henri is likely to pop up :-)

>
> <html>
> <body>Text
> <ul>
> <li>Line 1</li>
> <li>Line 2</li>
> <li>Line 3</li>
> </ul>
> </body>
> </html>
>
> However, without those known semantics, there are ambiguities in the
> input - it could be interpreted as

Well, any input, whether xml or html or fortran, might be incorrect; not
much you can do about that.
>
> <garfle>
> <fleeblock>Text</fleeblock>
> <agbar/>
> <lukvi>Line 1 <lukvi> Line 2 <lukvi> Line 3</lukvi></lukvi></lukvi>
> </garfle>

According to html5 that is non-conforming (undefined element names) but
has a defined parse tree of:

<html><head></head><body><garfle>
<fleeblock>Text</fleeblock>
<agbar>
<lukvi>Line 1 <lukvi> Line 2 <lukvi> Line 3</lukvi></lukvi></lukvi>
</agbar></garfle>
</body></html>

> or
>
> <garfle>
> <fleeblock>Text
> <agbar>
> <lukvi>Line 1</lukvi>
> <lukvi>Line 2</lukvi>
> <lukvi>Line 3</lukvi>
> </agbar>
> </bleeblock>
> </garfle>

which again is non-conforming but has a defined parse tree, equivalent
to parsing:

<html><head></head><body><garfle>
<fleeblock>Text
<agbar>
<lukvi>Line 1</lukvi>
<lukvi>Line 2</lukvi>
<lukvi>Line 3</lukvi>
</agbar>
</fleeblock></garfle>
</body></html>

>
> which may have very different interpretations based upon structure (I've
> deliberately scrambled the words to highlight the issue). If that was a
> known schema instance, it's that which I'm referring to in terms of
> ambiguity. There may be specific parsing rules in HTML5, but I daresay
> that anyone writing the initial instance I gave above probably wouldn't
> be well versed on the specification.

If you write in any language without knowing the rules of that language,
then confusion may result, but I don't think that can be called
ambiguity in the language.

>
> I think the difference in interpretation here is that the HTML5 focus is
> on tolerating ambiguity (which is what supporting multiple rules for
> parsing is)

I'm not sure what you mean by multiple rules. As you may have noticed,
when James Clark and I suggested they could have some variation in the
rules for newer documents, the suggestion got a resounding no.

> and treating precision as a fault, while the XML focus is on
> being willing to deal with the extra precision if it reduces ambiguity.
> That's one of the reasons I get antsy when I hear people make statements
> like the idea that HTML can replace XML. HTML+ARIA might have that
> additional precision, but it comes at the cost of requiring two
> languages plus coding to accomplish what can be done in one with XML.
>

David
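
The contrast drawn in this message (XML has a "not well formed" state, while HTML parsing defines a result for every input) can be illustrated with a rough Python sketch using only the standard library. Note the assumptions: `html.parser` is a lenient tokenizer, not an implementation of the HTML5 tree-construction algorithm, so this shows only that the input is consumed without a fatal error, not the parse tree the HTML5 spec would produce. The names `SAMPLE`, `is_well_formed_xml`, `StartTagCollector` and `html_start_tags` are hypothetical and chosen for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The unclosed-<li> input quoted in the message.
SAMPLE = """<html>
<body>
Text
<ul>
<li>Line 1
<li>Line 2
<li>Line 3
"""


def is_well_formed_xml(text):
    """Return True if an XML parser accepts the text, False on a fatal error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class StartTagCollector(HTMLParser):
    """Records start tags as the lenient HTML tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def html_start_tags(text):
    """Feed text to the HTML tokenizer and return the start tags it saw."""
    collector = StartTagCollector()
    collector.feed(text)
    collector.close()
    return collector.start_tags


if __name__ == "__main__":
    # The XML parser rejects the input; the HTML tokenizer consumes it.
    print(is_well_formed_xml(SAMPLE))
    print(html_start_tags(SAMPLE))
```

To see the tree that the HTML5 algorithm itself assigns to such input, an HTML5-conforming parser would be needed in place of `html.parser`.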