
Re: Please stop writing specifications that cannot be parsed/p

  • From: Marcus Reichardt <u123724@gmail.com>
  • To: Michael Kay <mike@saxonica.com>
  • Date: Mon, 5 Jun 2023 23:58:08 +0200

Thanks, Michael K, for bringing up test cases. When I heard about coding an XML parser in two days, or similar stunts, I was in disbelief, considering that the combinatorics amount to at least a low three-figure number of test cases for core XML alone. Also, is it really relevant that you can't use off-the-shelf LALR parser generators for a markup meta-language that itself acts as a parser generator? Markup is for ambitious end users, not CS students (a distinction the modern web also spectacularly fails to observe, considering everyone wants to do React and Tailwind), and SGML can be seen as a valuable demonstration of how powerful and idiosyncratic a mainstream document language can be, considering its inventor is a lawyer by profession.

If we look at actual XML parsers in use today, such as libxml2, those have been in development for well over two decades. Granted, that is with DTD (and XSD and RNG and XSLT and XPath and DOM and SAX and pull parsers), but such is the XML stack after all. The specifics of constructing content model automata are identical for SGML and XML DTDs (and not much harder for XSD). A recent (2022?) change I remember introduces a heuristic for billion laughs attack mitigation, whereas an SGML declaration can control the maximum nesting level of entity expansion from the start, along with other quantities.
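
To make the nesting limit concrete, here is a minimal Python sketch of entity expansion with a configurable maximum depth. It is only an illustration, not how libxml2 or any SGML parser does it: the names `expand`, `EntityLimitError`, and `max_depth` are made up for this example, and entities are modelled as a plain dictionary of replacement text.

```python
import re

# Matches general entity references of the form &name;
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.\-]*);")


class EntityLimitError(Exception):
    """Raised when entity references nest deeper than the allowed level."""


def expand(text, entities, max_depth=8, depth=0):
    """Recursively expand &name; references in text.

    max_depth is a hypothetical limit on how many levels of entities
    may be open at once; depth counts the levels currently open.
    """
    def replace(match):
        name = match.group(1)
        if name not in entities:
            # Leave undeclared references untouched in this sketch.
            return match.group(0)
        if depth >= max_depth:
            raise EntityLimitError(
                f"entity '{name}' exceeds nesting level {max_depth}"
            )
        # Expand the replacement text one level deeper.
        return expand(entities[name], entities, max_depth, depth + 1)

    return ENTITY_REF.sub(replace, text)


# A small "laughs"-style chain: lol2 -> lol1 -> lol (three levels).
laughs = {
    "lol": "lol",
    "lol1": "&lol;&lol;",
    "lol2": "&lol1;&lol1;",
}
expanded = expand("&lol2;", laughs, max_depth=3)
```

With `max_depth=3` the chain above expands fully; with `max_depth=2` it raises `EntityLimitError`, and a self-referencing entity hits the limit as well. Note that a depth limit alone does not bound the size of the output, since each level can multiply the text, so a practical mitigation would presumably cap total expanded size too.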

With my post I wasn't suggesting to change XML; personally I think XML is almost perfect as a delivery or archive format, and indeed changing it at this point, if it were even possible, would do more harm than good. For authoring (using markdown through SHORTREF and other SGML techniques), and for embracing HTML, OTOH, I was hoping for a bit more support here. I mean, XML's alignment with SGML gives it precise and predictable integration of HTML and ubiquitous casual text editing conventions, which is great and a big win for XML. Just as XML is set in stone, so is SGML, and it's unlikely we're going to see entirely new document languages. Hell, the majority of human-written content might already have been written. But rather than enjoying this power and the existence of something so outlandish (by today's standards) and nerdy as an SGML ISO standard, whenever the topic comes up, the reaction here is all defensive and, frankly, sounds like early XML marketing narratives from business types ;)

Cheers,
Marcus Reichardt
sgml.io


> On 05.06.2023 at 19:26, Michael Kay <mike@saxonica.com> wrote:
> 
> 
>> 
>> 
>> I wrote the first complete and AFAIK fully conforming XML parser, Lark, in Nov/Dec 1996 (Yeah, XML wasn’t quite finished yet) and it took several weeks, which annoyed me because I really had thought we’d managed to narrow it down enough to make it a one-week task.
> 
> 
> Is that with or without DTD validation?
> 
> I'd rate it at two days without DTD validation, 2 months with; and that's assuming you start with a decent test suite.
> 
> Michael Kay
> Saxonica
> 
> 

