
Re: Never mind the browser, let's do MicroXML

  • From: David Carlisle <davidc@nag.co.uk>
  • To: Kurt Cagle <kurt.cagle@gmail.com>
  • Date: Sat, 18 Dec 2010 00:10:05 +0000

On 17/12/2010 23:31, Kurt Cagle wrote:
>     HTML5 has some problems but ambiguity isn't really one of them, the
>     html5 spec specifies in excruciating detail how to construct a parse
>     tree from any stream of Unicode characters. Unlike XML there are no
>     states equivalent to "not well formed", every input has a defined parse.
>
> David,
>
> Hmm .. I guess what I'm saying is this - suppose that you have an input
> sequence that looks like this:
>
> <html>
> <body>
> Text
> <ul>
> <li>Line 1
> <li>Line 2
> <li>Line 3
>
> which you're implying could conceivably be valid input.

Well, actually it's invalid; the smallest changes I could make to make it
valid would result in

<!DOCTYPE html>
<html>
<title></title>
<body>
Text
<ul>
<li>Line 1
<li>Line 2
<li>Line 3
</ul>



>
> Because we know the underlying semantics, the processor would be able to
> parse that as:

I'm not sure that semantics are required. The HTML5 spec says how to
parse any input string; it's a purely mechanical process with hardly any
optional or customisable behaviour. (It's a bit scary describing the HTML5
parser on a thread in which Henri is likely to pop up :-)
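
To make "purely mechanical" concrete, here is a minimal sketch in Python. It
is a toy, not the HTML5 tree construction algorithm: it applies just two
fixed rules (a new <li> closes an <li> that is the current open element, and
everything still open is closed at end of input), it drops attributes, and it
does not insert the implied <head>. The names ToyBuilder and toy_parse are
made up for this illustration; only html.parser.HTMLParser comes from the
standard library.

```python
from html.parser import HTMLParser


class ToyBuilder(HTMLParser):
    """Toy serializer that applies two fixed closing rules (not real HTML5)."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open tag names
        self.out = []            # pieces of the serialized result

    def _close_top(self):
        # Pop the innermost open element and emit its end tag.
        name = self.open_elements.pop()
        self.out.append(f"</{name}>")
        return name

    def handle_starttag(self, tag, attrs):
        # Rule 1 (simplified): a new <li> ends an <li> that is still open
        # as the current element. Attributes are ignored in this toy.
        if tag == "li" and self.open_elements and self.open_elements[-1] == "li":
            self._close_top()
        self.open_elements.append(tag)
        self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        # An end tag with no matching open element is ignored.
        if tag not in self.open_elements:
            return
        # Otherwise close elements down to and including the match.
        while self._close_top() != tag:
            pass

    def handle_data(self, data):
        self.out.append(data)


def toy_parse(text):
    builder = ToyBuilder()
    builder.feed(text)
    builder.close()
    # Rule 2: at end of input, close whatever is still open.
    while builder.open_elements:
        builder._close_top()
    return "".join(builder.out)


sample = "<html>\n<body>\nText\n<ul>\n<li>Line 1\n<li>Line 2\n<li>Line 3\n"
print(toy_parse(sample))
```

Run on the unclosed list from the start of this thread, the same input always
produces the same three closed <li> elements inside one <ul>. No knowledge of
what a list item means is involved, only a stack and fixed rules.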

>
> <html>
> <body>Text
> <ul>
> <li>Line 1</li>
> <li>Line 2</li>
> <li>Line 3</li>
> </ul>
> </body>
> </html>
>
> However, without those known semantics, there are ambiguities in the
> input - it could be interpreted as

Well, any input, whether XML or HTML or Fortran, might be incorrect; there's
not much you can do about that.
>
> <garfle>
> <fleeblock>Text</fleeblock>
> <agbar/>
> <lukvi>Line 1 <lukvi> Line 2 <lukvi> Line 3</lukvi></lukvi></lukvi>
> </garfle>

According to HTML5 that is non-conforming (undefined element names) but
has a defined parse tree of

<html><head></head><body><garfle>
<fleeblock>Text</fleeblock>
<agbar>
<lukvi>Line 1 <lukvi> Line 2 <lukvi> Line 3</lukvi></lukvi></lukvi>
</agbar></garfle>
</body></html>
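
The reason <agbar> ends up wrapping the <lukvi> elements is that the trailing
slash in <agbar/> is ignored on anything that is not a void element. A rough
Python sketch of just that rule follows. It is an illustration under stated
assumptions, not the spec's algorithm: it records element names only, skips
text, and leaves out the implied html/head/body wrapper. ToyNester and
toy_outline are hypothetical names, and the VOID set is my understanding of
the current list of void elements.

```python
from html.parser import HTMLParser

# Assumed list of void elements (the only ones that never take content).
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ToyNester(HTMLParser):
    """Toy builder of a nested (name, children) outline, elements only."""

    def __init__(self):
        super().__init__()
        self.root = ("#root", [])
        self.stack = [self.root]  # open elements, root first

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)
        if tag not in VOID:
            # Non-void elements stay open until closed or until the end.
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        # The "/>" is ignored: <agbar/> is treated exactly like <agbar>.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        open_names = [name for name, _ in self.stack[1:]]
        if tag not in open_names:
            return  # unmatched end tag: ignored
        # Close elements down to and including the innermost match.
        while self.stack.pop()[0] != tag:
            pass


def toy_outline(text):
    nester = ToyNester()
    nester.feed(text)
    nester.close()
    return nester.root[1]


sample = (
    "<garfle>\n"
    "<fleeblock>Text</fleeblock>\n"
    "<agbar/>\n"
    "<lukvi>Line 1 <lukvi> Line 2 <lukvi> Line 3</lukvi></lukvi></lukvi>\n"
    "</garfle>\n"
)
print(toy_outline(sample))
```

With this input the outline has <fleeblock> and <agbar> as children of
<garfle>, and the three nested <lukvi> elements inside <agbar>, which is the
nesting shown in the parse tree above.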

> or
>
> <garfle>
> <fleeblock>Text
> <agbar>
> <lukvi>Line 1</lukvi>
> <lukvi>Line 2</lukvi>
> <lukvi>Line 3</lukvi>
> </agbar>
> </bleeblock>
> </garfle>

which again is non-conforming but has a defined parse tree equivalent to
parsing

<html><head></head><body><garfle>
<fleeblock>Text
<agbar>
<lukvi>Line 1</lukvi>
<lukvi>Line 2</lukvi>
<lukvi>Line 3</lukvi>
</agbar>

</fleeblock></garfle>
</body></html>
>
> which may have very different interpretations based upon structure (I've
> deliberately scrambled the words to highlight the issue). If that was a
> known schema instance, it's that which I'm referring to in terms of
> ambiguity. There may be specific parsing rules in HTML5, but I daresay
> that anyone writing the initial instance I gave above probably wouldn't
> be well versed on the specification.

If you write in any language without knowing the rules of that language, 
then confusion may result, but I don't think that can be called 
ambiguity in the language.
>
> I think the difference in interpretation here is that the HTML5 focus is
> on tolerating ambiguity (which is what supporting multiple rules for
> parsing is)

I'm not sure what you mean by multiple rules. As you may have noticed, 
when James Clark and I suggested they could have some variation in the 
rules for newer documents the suggestion got a resounding no.

> and treating precision as a fault, while the XML focus is on
> being willing to deal with the extra precision if it reduces ambiguity.
> That's one of the reasons I get antsy when I hear people make statements
> like the idea that HTML can replace XML. HTML+ARIA might have that
> additional precision, but it comes at the cost of requiring two
> languages plus coding to accomplish what can be done in one with XML.
>

David

