Re: Please stop writing specifications that cannot be parsed/processed by software

From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
To: Marcus Reichardt <u123724@gmail.com>
Date: Fri, 09 Jun 2023 11:07:48 -0600

Marcus Reichardt <u123724@gmail.com> writes:

> ...

>> The relatively deep intertwining of validation with everything else
>> in ISO 8879 makes it hard to write even simple tools.

> What would those tools be?

Well, thinking back on the kinds of programs I wrote to process SGML
data, or that I know were written by others, I'm thinking of things
like:

  - a macro in emacs or Xedit or Kedit to close the current element.

  - a program to scan a document and report, for each element, its fully
    qualified generic identifier.  (That is, a string like
    "/html/body/div/div/h2/b", listing the element and all of its
    ancestors, in document order, analogous to an absolute path in a
    file system.)

  - a program to search for a particular word or character sequence
    (assumed to be uninterrupted by markup) and report the fully
    qualified generic identifier of its parent.

  - a program to read an SGML document and emit a Waterloo GML document
    suitable for formatting and printing.

  - a program to read an SGML document and emit a TeX document suitable
    for formatting and printing.

  - a program to read an SGML document written using a
    literate-programming vocabulary and tangle the source code.

  - a program to read an SGML DTD and make a list of element types
    referred to but not declared.

  - a program to read an SGML DTD and delete references to specified
    element types.
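
[Editorial illustration, not one of the programs described in this
message.]  The second item in the list, reporting the fully qualified
generic identifier of each element, might be sketched as follows today.
The sketch assumes well-formed XML input rather than SGML and relies on
an off-the-shelf parser (Python's standard xml.sax module); the names
PathReporter and element_paths are hypothetical.

```python
import xml.sax


class PathReporter(xml.sax.ContentHandler):
    """Collect the fully qualified generic identifier of every element."""

    def __init__(self):
        super().__init__()
        self.stack = []  # generic identifiers of the currently open elements
        self.paths = []  # one entry per start-tag, in document order

    def startElement(self, name, attrs):
        # Push the new element and record its path from the root.
        self.stack.append(name)
        self.paths.append("/" + "/".join(self.stack))

    def endElement(self, name):
        # The element is closed; drop it from the stack of ancestors.
        self.stack.pop()


def element_paths(xml_text):
    """Return one path string per element in xml_text (assumed well-formed)."""
    handler = PathReporter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.paths


sample = "<html><body><div><h2>Title <b>bold</b></h2></div></body></html>"
for path in element_paths(sample):
    print(path)
```

The stack of open elements is the only state needed; the parser does
the work of recognizing tags.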

For most of these, I had no SGML parsing library to call, because for
what felt to me like a very long time there were no SGML parsers
available on the mainframe I was working on.  (It's possible that IBM had
a product that did SGML parsing, but from the descriptions I could find, I
could not understand its functionality well enough to know whether it
was worthwhile trying to persuade my management to acquire and install
it.)  Eventually, I was able to port James Clark's sgmls parser to
VM/CMS and was able to use it, with CMS Pipelines, to simplify the
creation of programs like those described above.

Note that for any of the items above which I actually implemented, or
tried to implement, I was interested in a program I could use; I was not
attempting to make a tool others could use (although I would have been
flattered at the idea that others might be interested).

Note also that not all of the items in the list really qualify as
'simple' tools.  Nor are the two DTD processors necessarily easier today
than in the late 1980s and early 1990s, or easier for XML DTDs than for
SGML DTDs. 
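
[Editorial illustration.]  A deliberately ad-hoc sketch of the first DTD
processor in the list (element types referred to but not declared) is
given below.  It is not a DTD parser.  It assumes an XML-style DTD with
one element type per declaration, no parameter entities, no
declarations hidden inside comments, and no '>' inside a declaration;
the function name undeclared_element_types is hypothetical.

```python
import re

# Matches <!ELEMENT name content-model>; assumes no '>' inside the declaration.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([^\s>]+)\s+([^>]*)>")
# A rough approximation of a name; not the full XML Name production.
NAME = re.compile(r"[A-Za-z_:][\w:.\-]*")
KEYWORDS = {"EMPTY", "ANY"}


def undeclared_element_types(dtd_text):
    """Return a sorted list of element types referenced but never declared."""
    declared = set()
    referenced = set()
    for name, model in ELEMENT_DECL.findall(dtd_text):
        declared.add(name)
        if model.strip() in KEYWORDS:
            continue  # EMPTY and ANY refer to no particular element types
        # Drop the #PCDATA token so it is not mistaken for an element type.
        model = model.replace("#PCDATA", "")
        referenced.update(NAME.findall(model))
    return sorted(referenced - declared)


sample_dtd = """
<!ELEMENT doc (head, body)>
<!ELEMENT head (title)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT body (p | list)*>
<!ELEMENT p (#PCDATA | emph)*>
<!ELEMENT br EMPTY>
"""
print(undeclared_element_types(sample_dtd))
```

Parameter entities are what make the general problem hard: a DTD that
uses them needs entity expansion before content models can be read
reliably, and this sketch does none.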

> Aren't XML people usually the first to criticize ad-hoc kindof-XML
> parsers,

Perhaps some XML people -- not anyone I know well.

If you are processing data you know well, and it's more convenient to
process it with an ad-hoc Perl script, I think it's perfectly legitimate
to cut corners.  If the material you are working with never uses
notations, you can save time.  If the input you are trying to process
uses no entity references at all, you don't need to parse them.  The
scenario of someone responsible for a body of material writing ad-hoc
programs to solve problems before a deadline of some kind was frequently
referred to during the development of XML; the figure at the center was
called "the desperate Perl hacker", sometimes abbreviated DPH.
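
[Editorial illustration.]  A corner-cutting script of the kind
described might look like the sketch below, written in Python rather
than Perl.  It pulls the text of every title element out of a document
with a regular expression.  The element name "title" and the function
name titles are hypothetical, and the sketch is only safe under the
stated assumptions: the data contains no entity references, no CDATA
sections, no comments, and no markup or nesting inside title elements.

```python
import re

# Start-tag (with optional attributes), content, end-tag.  This is not a
# parser: it relies entirely on knowing what the data does not contain.
TITLE = re.compile(r"<title(?:\s[^>]*)?>(.*?)</title>", re.DOTALL)


def titles(text):
    """Return the whitespace-normalized content of each title element."""
    return [" ".join(content.split()) for content in TITLE.findall(text)]


sample = (
    "<doc><title>First\n  chapter</title><p>x</p>"
    "<title lang='en'>Second</title></doc>"
)
print(titles(sample))
```

A script like this is adequate for one known body of material and
should not be mistaken for a general-purpose tool.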

Of course, tools written by a DPH to solve particular problems,
exploiting knowledge of a particular body of material, are not to be
confused with general-purpose tools.  And there are probably people who
believe that conforming XML parsers are easy enough to write or acquire
that there seems to be no very good reason to use a non-conforming
parser in a tool intended for general use.  (There are certainly such
people; I am one.  The DPH is processing the DPH's own data, not writing
tools for others.)

It may be noted that the DPH scenario resembles the situation members of
the SGML community had often found themselves in, in the years 1986 to
1996, more than it resembles the situation most XML users find
themselves in nowadays.  Many programming languages have conforming XML
parsers (although some only have parsers without any good claim to
conformance), and it has been a long time since I wrote programs for XML
or SGML input that work the way my Spitbol and Rexx programs -- or even
my CMS Pipelines -- worked.  When I face the kinds of problems we
imagined a desperate Perl hacker having to solve, or the tasks listed
above, I am more inclined to write an ad hoc XSLT stylesheet or put
together a quick and dirty XQuery module to solve the problem.  I have
the impression I am not alone.  Since I program most frequently in XSLT
and XQuery, I don't in fact have to write ad-hoc parsers for XML.  And a
lot of people write XSLT transforms who do not self-identify as
programmers and would probably not ever have become Perl hackers of any
kind, desperate or otherwise.

But I think that aiming at the DPH had the beneficial side effect of
helping the designers of XML keep the syntax simple, which does make it
easier to write conforming XML software than it is to write conforming
SGML software for less restrictive profiles of SGML, let alone
unrestricted SGML.  That did help make the XML ecosystem more populous
than the SGML ecosystem we were then living in.

> and why would you side-step a parser lib you just put a lot
> of effort into creating, to use a task-specific ad-hoc kindof-XML
> parser instead?

Not everyone who faces the task of processing SGML or XML documents is
in a position to write a parser library for SGML or XML.  I think
task-specific ad-hoc parsers are mostly written by people who have not
just finished creating a full parser.  I don't know how many of them are
in fact written now.  As you have observed, anyone in a position to use
an off-the-shelf parser library will -- assuming the API is simple
enough -- often find it more convenient to use an off-the-shelf parser.

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Follow-Ups:
- Re: Please stop writing specifications that cannot be parsed/processed by software
  - From: Thomas Passin <list1@tompassin.net>

References:
- Re: Please stop writing specifications that cannot be parsed/processed by software
  - From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Re: Please stop writing specifications that cannot be parsed/processed by software
  - From: Marcus Reichardt <u123724@gmail.com>
