
Re: (more) extensible SAX

  • From: Eric van der Vlist <vdv@d...>
  • To: David Brownell <david-b@p...>
  • Date: Thu, 07 Dec 2000 00:50:05 +0100

David Brownell wrote:
> 
> Summary:  I don't see a problem here.  No federal issue, as it were;
> layering works fine already.

Sure, SAX1 and SAX2 are both working, but, like everything else, can't
they be improved ?

> > In most of the papers I can read, SAX is opposed to DOM as a pull
> > versus push.
> >
> > While this is certainly an important difference, I don't see it as the
> > main difference, but I'd rather say that the main difference is that SAX
> > and DOM are acting at different levels and that SAX is the most
> > "neutral" interface, DOM being more biased by a specific interpretation
> > of what an XML document is.
> 
> I see the functional difference as being that SAX is a callback
> API, while DOM is basically a data structure -- and often one
> that's not particularly task-appropriate.  There are also some
> differences in the data/infoset exposed, and very significant
> ones in portability.  (DOM still has no portable bootstrap API.)

We agree on 80% of my preliminary remarks, then...
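
To make the callback-versus-data-structure distinction concrete, here is a
minimal sketch using Python's standard library rather than the Java SAX and
DOM bindings discussed in this thread. The sample document and the names
`ItemCollector`, `items_via_sax` and `items_via_dom` are illustrative
assumptions, not anything from the original exchange.

```python
import xml.sax
import xml.dom.minidom

# Hypothetical sample document, used only for this illustration.
DOC = b"<order id='7'><item>tea</item><item>milk</item></order>"


class ItemCollector(xml.sax.ContentHandler):
    """Callback style: the parser pushes events and we keep what we need."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self._buf = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_item:
            self._buf.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buf))
            self._in_item = False


def items_via_sax(data):
    handler = ItemCollector()
    xml.sax.parseString(data, handler)
    return handler.items


def items_via_dom(data):
    # Data-structure style: build the whole tree first, then query it.
    doc = xml.dom.minidom.parseString(data)
    return [el.firstChild.data for el in doc.getElementsByTagName("item")]
```

Both functions return the same list of item texts; the SAX version never holds
more than the current item in memory, while the DOM version materializes the
whole document before it is queried.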

> > Now, I'd like to go on by explaining what I think are the two weaknesses
> > of SAX.
> >
> > The first of them is that the information isn't raw enough for some
> > applications and that there is still an information loss in the
> > interpretation that is done ...
> 
> Having looked at that issue in excruciating detail, I think it's
> typically fair to say that "some applications" want an API that
> presents lexical processing data.  SAX is a parser API, that's not
> what it was designed to address -- but a SAX2 extension could let a
> parser expose lexical data, if it wanted to go there.
> 
> > The second (and almost opposite) one is that in some cases, there isn't
> > enough interpretation. The way SAX1 needed to be modified to support
> > namespaces is a good example of this, and the problem is likely to
> > happen again as long as new features are added through modularization to
> > XML 1.0.
> 
> Actually, SAX1 did not _need_ to be modified that way.  There were
> examples of doing such processing in layers above SAX1, even before
> the one that got bundled into SAX2.  That was a design choice, not
> a structural imperative.

Yes, and it has been implemented as a layer above the "core" parser
layer in AElfred 2 (as a separate class).
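
As a rough illustration of namespace processing done as a separate layer above
a namespace-unaware event stream, here is a minimal Python sketch. It is not
the AElfred 2 class; `NamespaceLayer`, `Recorder` and `parse_with_layer` are
hypothetical names, and the sketch assumes the parser is run with namespace
processing switched off so that `xmlns` declarations arrive as ordinary
attributes.

```python
import io
import xml.sax
import xml.sax.handler
from xml.sax.xmlreader import AttributesNSImpl

XML_NS = "http://www.w3.org/XML/1998/namespace"


class NamespaceLayer(xml.sax.handler.ContentHandler):
    """Receives SAX1-style events and forwards namespace-aware ones."""

    def __init__(self, downstream):
        super().__init__()
        self.downstream = downstream
        self.scopes = [{"xml": XML_NS}]  # one prefix map per open element

    def _resolve(self, qname, is_attribute=False):
        prefix, colon, local = qname.partition(":")
        if not colon:
            # Unprefixed attributes are in no namespace; unprefixed
            # elements take the default namespace, if one is declared.
            uri = None if is_attribute else (self.scopes[-1].get("") or None)
            return (uri, qname)
        return (self.scopes[-1][prefix], local)  # KeyError if undeclared

    def startElement(self, name, attrs):
        scope = dict(self.scopes[-1])
        plain = {}
        for qname, value in attrs.items():
            if qname == "xmlns":
                scope[""] = value
            elif qname.startswith("xmlns:"):
                scope[qname[len("xmlns:"):]] = value
            else:
                plain[qname] = value
        self.scopes.append(scope)
        values, qnames = {}, {}
        for qname, value in plain.items():
            key = self._resolve(qname, is_attribute=True)
            values[key] = value
            qnames[key] = qname
        self.downstream.startElementNS(
            self._resolve(name), name, AttributesNSImpl(values, qnames))

    def endElement(self, name):
        self.downstream.endElementNS(self._resolve(name), name)
        self.scopes.pop()

    def characters(self, content):
        self.downstream.characters(content)


class Recorder(xml.sax.handler.ContentHandler):
    """Hypothetical downstream consumer that just records what it sees."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElementNS(self, name, qname, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElementNS(self, name, qname):
        self.events.append(("end", name))


def parse_with_layer(data):
    recorder = Recorder()
    parser = xml.sax.make_parser()
    # Keep the core parser namespace-unaware; the layer does the work.
    parser.setFeature(xml.sax.handler.feature_namespaces, False)
    parser.setContentHandler(NamespaceLayer(recorder))
    parser.parse(io.BytesIO(data))
    return recorder.events
```

The core parser stays unchanged; everything namespace-related lives in the
layer, which is the design choice being discussed here.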

> > I think that both are coming from a quest to find a balance and to
> > define an API that will meet most of the needs (I could call it the "one
> > fits all" utopia) and that this issue should be addressed by adding more
> > modularity and layering rather than by adding more complexity to
> > existing methods.
> 
> I agree about layering and modularity, but can't quite see why there
> would be any problem achieving either of those with the current SAX.
> 
> Perhaps you're really wanting to see new layers get standardized?  :-)

That's one of my points, yes.
 
> > Last point, why do I call it a layered interface ?
> >
> > Because we could define on top of this a layered architecture where a
> > single event would get richer by each layer it comes through.
> >
> > The first layer could be the recognition of the basic XML productions.
> 
> Which productions -- the lexical ones, or the grammatical ones?  I count
> two layers there.  (Evidently from its SGML heritage, XML doesn't have
> the cleanest of distinctions between those layers, but it exists.)  The
> SAX API is basically a grammatical layer.

Isn't the namespace support mixing up things, here ?

And isn't it a reason to try to have a cleanly layered approach ?
 
> > A second layer could be to include entity processing and well-formedness
> > checks.
> 
> Actually some of the XML rules require WF checks at a lexical level,
> while some are purely grammatical or content-based.  Entities are
> basically processed in the boundary between lexical and syntactical
> processing -- "&foo;" or "%bar;" need lexical exposure, but basically
> they're invisible otherwise.  (Yes, I'm partitioning the infoset into
> classic categories there.)
> 
> > Next layers would include namespaces and scoped attributes.
> 
> Hmm, you omitted validation.  Though it's known that validation can
> basically be done as a layer over SAX2 ... and that any such layers
> don't actually need to be "SAX (tm)" branded.

Not necessarily, it can come just after the first very raw layer.

I know I will probably be called a heretic, but exposing this as an
interface would allow parsing "not badly formed HTML", including the
mixture exported by MS Office as HTML files.
 
> > I don't see anything but advantages, one of them being the extensibility:
> > with this architecture, SAX2 would just have been a layer on top of
> > SAX1.
> >
> > Have I missed something ?
> 
> Well, there are already SAX2 wrappers of SAX1 parsers that work
> exactly that way -- except for "optional" features.

Yes, what I propose would be a generalization of this architecture.
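
A minimal sketch of what such a generalized stack could look like, written
with Python's `xml.sax.saxutils.XMLFilterBase` rather than the Java SAX
interfaces. Each layer receives an event, adds one piece of information to it,
and passes it on. The layers `DepthLayer` and `PathLayer`, and the `depth` and
`path` attributes they add, are purely hypothetical examples of enrichment,
not part of any proposal made in this thread.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase
from xml.sax.xmlreader import AttributesImpl


class DepthLayer(XMLFilterBase):
    """Adds a hypothetical 'depth' attribute to every start event."""

    def __init__(self, parent):
        super().__init__(parent)
        self._depth = 0

    def startElement(self, name, attrs):
        enriched = dict(attrs.items())
        enriched["depth"] = str(self._depth)
        self._depth += 1
        super().startElement(name, AttributesImpl(enriched))

    def endElement(self, name):
        self._depth -= 1
        super().endElement(name)


class PathLayer(XMLFilterBase):
    """Adds a hypothetical 'path' attribute on top of whatever it receives."""

    def __init__(self, parent):
        super().__init__(parent)
        self._stack = []

    def startElement(self, name, attrs):
        self._stack.append(name)
        enriched = dict(attrs.items())
        enriched["path"] = "/" + "/".join(self._stack)
        super().startElement(name, AttributesImpl(enriched))

    def endElement(self, name):
        self._stack.pop()
        super().endElement(name)


class Recorder(xml.sax.ContentHandler):
    """Application at the top of the stack; sees the fully enriched events."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(
            (name, attrs.getValue("depth"), attrs.getValue("path")))


def run_pipeline(data):
    recorder = Recorder()
    # Core parser at the bottom, one layer stacked on another above it.
    reader = PathLayer(DepthLayer(xml.sax.make_parser()))
    reader.setContentHandler(recorder)
    reader.parse(io.BytesIO(data))
    return recorder.seen
```

An application can attach at whichever level of the stack carries the amount
of interpretation it needs, and a new layer can be added without touching the
ones beneath it.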

Thanks for your feedback.

Eric 
> - Dave

-- 
See you at XML 2000
      http://gca.org/attend/2000_conferences/XML_2000/building.htm#vlist
------------------------------------------------------------------------
Eric van der Vlist       Dyomedea                    http://dyomedea.com
http://xmlfr.org         http://4xt.org              http://ducotede.com
------------------------------------------------------------------------
