Re: A simple guy with a simple problem
3/15/01 3:53:48 AM, "Steven E. Harris" <sharris@s...> wrote:

>Relying upon preservation of the source input stream's peculiarities
>simply doesn't constitute a "robust" usage - or expectation of -
>XML. The XML you'd get out the other side of a SAX-level filter would
>still produce the same results if re-parsed¹, so why impose this
>syntactic preservation requirement?
>
>
>Footnotes:
>¹ Unless we're trying to do some "compression" by using entities to
>  avoid repeating long strings inline. But I digress.

In general, there *is* a class of applications that do have to preserve, as much as possible, the lexical properties of the input stream. I'll generically refer to them as "editors," though this shouldn't be understood only as humans-type-on-a-screen applications; it would also encompass "stream" or "batch" editors (think sed).

They account for a minority of applications; most applications are naturally "structure driven," as the SGMLers put it, or "infoset-driven" in modern terms, and therefore *must* not, as you say, be sensitive to lexical details. (Remember WML, where the syntactic role of a dollar sign depends on whether it was expressed literally or as a numeric character reference? Ick!) But editor-type applications do exist, and they need more information than something like SAX can provide.

That's not an argument for burdening SAX with requirements to report all sorts of lexical details; as I said, most applications are going to be structure driven, and they need a *simple* API for XML. But for editor applications it would be nice to be able to work with the document in a way that isn't completely structure-blind (like pure regex processing).

Here's my favorite example: let's say we have a tool that looks up abbreviated bibliographic references in a document and replaces them with full references. If I have a book, physically organized into entities corresponding to chapters, I'd like to be able to run it through the tool without losing my chapter organization; I do *not* want the thing to come out as one giant lump of text.

I'm not sure, though, what sort of representation would be appropriate.
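To make the point concrete, here is a minimal sketch using Python's standard `xml.sax` module (my choice for illustration; nothing in this thread depends on it). The `EventRecorder` class, the `sax_view` helper, and the four sample documents are all made up for the example. It records only what the basic content-handler callbacks deliver, and it assumes the default expat-based parser, which expands internal entities.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Records what a structure-driven SAX application gets to see."""

    def __init__(self):
        super().__init__()
        self.tags = []   # ("start" | "end", element name) in document order
        self.text = []   # character data, which may arrive in several chunks

    def startElement(self, name, attrs):
        self.tags.append(("start", name))

    def endElement(self, name):
        self.tags.append(("end", name))

    def characters(self, content):
        self.text.append(content)


def sax_view(document: bytes):
    """Return (element events, concatenated text) as reported by SAX."""
    handler = EventRecorder()
    xml.sax.parseString(document, handler)
    return handler.tags, "".join(handler.text)


# Hypothetical sample documents: same structure, different lexical form.
DOLLAR_LITERAL = b"<p>Cost: $5</p>"
DOLLAR_CHAR_REF = b"<p>Cost: &#36;5</p>"

BOOK_INLINE = b"<book><chapter>One</chapter></book>"
BOOK_WITH_ENTITY = (
    b'<!DOCTYPE book [<!ENTITY ch1 "<chapter>One</chapter>">]>'
    b"<book>&ch1;</book>"
)

# Literal character vs. numeric character reference.
print(sax_view(DOLLAR_LITERAL) == sax_view(DOLLAR_CHAR_REF))

# Chapter pulled in through an entity vs. chapter written inline.
print(sax_view(BOOK_INLINE) == sax_view(BOOK_WITH_ENTITY))
```

Both comparisons should come out equal, which is exactly what a structure-driven application wants and exactly what an editor cannot live with: by the time the events arrive, there is no trace of where the `ch1` entity began or ended, so a filter that writes these events back out produces the one giant lump.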