
Re: Classification: XML Parser Features

  • From: Tim Bray <tbray@t...>
  • To: David Megginson <ak117@f...>
  • Date: Fri, 12 Dec 1997 17:08:09 -0800

At 12:17 PM 12/12/97 -0500, David Megginson wrote:
>Creating a truly well-formed parser is very, very difficult, because
>of the enormous number of constraints imposed both explicitly and
>implicitly by the grammar (I could probably write a full SGML parser
>with about the same level of effort, especially if I limited myself to
>a single, simple SGML declaration).

To start with, "full SGML parser" is directly contradictory to "a single
SGML declaration" - abstract syntax in fact being one of the things
that makes a full parser hard to write.

As to David's main point, that a WF parser is hard to write, I don't
agree; most of the work can be done in the low-level lexer, and the
number of constraints that require ad-hoc code is pretty small.  Two
things are in fact hard, it seems:

1. handling multiple input encodings, and
2. making it run real fast while you're doing #1.
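
As an illustration of item 1, here is a minimal Python sketch (not
Lark's code) of guessing an entity's encoding from its first bytes,
loosely following the autodetection idea in the XML specification. The
names sniff_encoding and decode_entity are hypothetical. The sketch
assumes only UTF-8 and UTF-16 input, and it ignores the encoding
declaration, which a real parser would also have to honour.

```python
import codecs


def sniff_encoding(head: bytes) -> str:
    """Guess a Python codec name from the first bytes of an XML entity.

    Assumption: input is UTF-8 or UTF-16 only (no UTF-32, no legacy sets).
    """
    # A byte order mark is the strongest hint, so check for it first.
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the BOM while decoding
    if head.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"  # this codec reads the BOM to pick the byte order
    # No BOM: an XML declaration starts with "<?", whose UTF-16 byte
    # pattern reveals the byte order.
    if head.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if head.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # Otherwise fall back to UTF-8, the default for XML.
    return "utf-8"


def decode_entity(data: bytes) -> str:
    """Decode a whole entity using the sniffed encoding."""
    return data.decode(sniff_encoding(data[:4]))


if __name__ == "__main__":
    sample = "<doc>日本語</doc>".encode("utf-16")
    print(sniff_encoding(sample[:4]), decode_entity(sample))
```

Decoding the whole entity up front is the simple approach; doing it
incrementally inside the lexer is where item 2 starts to bite.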

These two problems don't really bother me that much, as we are in the
infancy of learning the right way to build truly internationalized
software.  For example, I can parse the UTF-16 Japanese version of the
XML spec in a few seconds; it then takes the best part of a minute
to load the .ttf for the Unicode font so you can look at anything.
So we have a few problems in this area.

Having said that, I am now in the middle of coding up validation for
Lark, and there are a TREMENDOUS NUMBER of irritating little
details involved.  No rocket science at all, but the code is going
to be substantially larger than the rest of Lark, and it's all real
code, whereas more than half of Lark is compressed parser tables.

Mind you, the validator is in a separate package and can be bypassed, so
Lark effectively need be no larger.  But still, I wonder whether
validation is intrinsically hard or whether we could have found a
better 80/20 point. -Tim
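
The "separate package that can be bypassed" arrangement can be sketched
in Python as an optional hook layered on a well-formedness parse. This
is only an illustration under stated assumptions, not how Lark is
built: parse, make_validator and ValidityError are hypothetical names,
the standard library's expat binding stands in for the parser, and the
"content model" is reduced to a table of permitted child elements.

```python
import xml.parsers.expat


class ValidityError(Exception):
    """Raised by the optional validator; never by the bare WF parse."""


def make_validator(allowed):
    """Build a checker from a hypothetical {parent: {children}} table."""
    def check(parent, child):
        # The root element has no parent and is accepted as-is here.
        if parent is not None and child not in allowed.get(parent, set()):
            raise ValidityError(f"<{child}> not allowed inside <{parent}>")
    return check


def parse(text, validator=None):
    """Check well-formedness; run the validator only if one is supplied."""
    parser = xml.parsers.expat.ParserCreate()
    stack = []  # names of the currently open elements

    def start(name, attrs):
        if validator is not None:
            validator(stack[-1] if stack else None, name)
        stack.append(name)

    def end(name):
        stack.pop()

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(text, True)  # raises ExpatError if not well-formed
    return True


if __name__ == "__main__":
    parse("<doc><p/></doc>")  # WF only, validator bypassed
    parse("<doc><p/></doc>", make_validator({"doc": {"p"}}))  # validated
```

With validator=None the parser pays nothing for validation, which is
the sense in which the core "need be no larger"; the irritating
details all live behind the hook.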

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)

