Re: A simple guy with a simple problem

  • From: Eric Bohlman <ebohlman@e...>
  • To: "Steven E. Harris" <sharris@s...>, xml-dev@l...
  • Date: Fri, 16 Mar 2001 02:20:55 -0600

3/15/01 3:53:48 AM, "Steven E. Harris" <sharris@s...> wrote:

>Relying upon preservation of the source input stream's peculiarities
>simply doesn't constitute a "robust" usage - or expectation of -
>XML. The XML you'd get out the other side of a SAX-level filter would
>still produce the same results if re-parsed¹, so why impose this
>syntactic preservation requirement?
>
>
>Footnotes: 
>¹ Unless we're trying to do some "compression" by using entities to
>  avoid repeating long strings inline. But I digress.

In general, there *is* a class of applications that have to preserve, as 
much as possible, the lexical properties of the input stream.  I'll refer 
to them generically as "editors," though this shouldn't be understood to 
mean only humans-type-on-a-screen applications; it also encompasses "stream" 
or "batch" editors (think sed).  They are a minority of applications; most 
applications are naturally "structure-driven," as the SGMLers put it, or 
"infoset-driven" in modern terms, and therefore must *not*, as you say, be 
sensitive to lexical details.  (Remember WML, where the syntactic role of a 
dollar sign depends on whether it was expressed literally or as a numeric 
character reference?  Ick!)
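
To make the "infoset-driven" point concrete, here is a minimal sketch using 
Python's standard xml.sax module (my choice for illustration; the element 
name and the helper text_seen_by_sax are made up for the example).  It 
shows that a SAX content handler receives the same character data whether 
the dollar sign was typed literally or written as a numeric character 
reference, so a SAX-level application cannot tell the two apart:

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collects every chunk of character data the parser reports."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # The parser may split text into several chunks; keep them all.
        self.chunks.append(content)


def text_seen_by_sax(xml_text):
    """Hypothetical helper: return the character data SAX reports."""
    handler = TextCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.chunks)


# Same content, two lexical forms: a literal "$" and the reference "&#36;".
literal = "<p>cost: $5</p>"
referenced = "<p>cost: &#36;5</p>"

# Both forms reach the application as identical character data.
print(text_seen_by_sax(literal) == text_seen_by_sax(referenced))
```

That is exactly the behavior a structure-driven application wants, and 
exactly the information an editor-type application loses.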

But editor-type applications do exist, and they need more information than 
something like SAX can provide.  That's not an argument for burdening SAX with 
requirements to report all sorts of lexical details; as I said, most 
applications are going to be structure-driven, and they need a *simple* API 
for XML.  Here's my favorite example of an editor application: say we have a 
tool that looks up abbreviated bibliographic references in a document and 
replaces them with full references.  If I have a book, physically organized 
into entities corresponding to chapters, I'd like to be able to run it 
through the tool without losing my chapter organization; I do *not* want the 
thing to come out as one giant lump of text.  For applications like that, it 
would be nice to be able to work with the document in a way that isn't 
completely structure-blind (like pure regex processing).  I'm not sure, 
though, what sort of representation would be appropriate.


