RE: Another XML parsing idea? Was: Re: XML Hangover)

To: "'Bob Foster'" <bob@o...>
Subject: RE: Another XML parsing idea? Was: Re: XML Hangover)
From: "Michael Kay" <mike@s...>
Date: Wed, 13 Jul 2005 23:04:42 +0100
Cc: <xml-dev@l...>
In-reply-to: <42D5850B.2000708@o...>
Thread-index: AcWH8F4QfdtrF1W5SGKqlzVKp5Ya5QABl68g

I think HT said that there's always an overhead if you have to cross a
thread boundary, and that they try to avoid it whenever possible. Crossing a
process or machine boundary would be far worse.
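Here is a minimal Python sketch that makes the thread-boundary point concrete. It delivers the same SAX events in two ways. In the first, a handler runs in the parser's own thread. In the second, a consumer in another thread receives the events through a `queue.Queue`. This is not Markup Technology's design, whose internals are not described here, and the helper names `events_in_process` and `events_across_thread` are hypothetical. Both versions do the same parsing work, but the second adds a synchronized put and get for every single event. That is the kind of per-event overhead being described. A process or machine boundary would add serialization and transport on top of it.

```python
import queue
import threading
import xml.sax

class ListHandler(xml.sax.ContentHandler):
    """Same thread as the parser: each event is a plain method call."""
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

class QueueHandler(xml.sax.ContentHandler):
    """Parser-thread side: each event becomes a message on a thread-safe queue."""
    def __init__(self, q):
        super().__init__()
        self.q = q

    def startElement(self, name, attrs):
        self.q.put(("start", name))

    def endElement(self, name):
        self.q.put(("end", name))

_DONE = object()  # sentinel marking the end of the event stream

def events_in_process(xml_bytes):
    handler = ListHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events

def events_across_thread(xml_bytes):
    q = queue.Queue()

    def produce():
        try:
            xml.sax.parseString(xml_bytes, QueueHandler(q))
        finally:
            q.put(_DONE)  # always unblock the consumer, even on a parse error

    producer = threading.Thread(target=produce)
    producer.start()
    events = []
    while (item := q.get()) is not _DONE:  # one synchronized get per event
        events.append(item)
    producer.join()
    return events
```

Both functions return the same event list. The difference lies only in what each event costs to deliver.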

Michael Kay 

> -----Original Message-----
> From: Bob Foster [mailto:bob@o...] 
> Sent: 13 July 2005 22:18
> To: Michael Kay
> Cc: xml-dev@l...
> Subject: Re: Another XML parsing idea? Was: Re: XML Hangover)
> 
> I don't know the internals (maybe someone can comment) but I believe 
> Markup Technology has a protocol for passing PSVI around. It seems 
> pretty darn fast.
> 
> Bob Foster
> 
> Michael Kay wrote:
>  > A protocol implies sending and receiving messages, typically across a
>  > process boundary or even a machine boundary. This would raise the cost
>  > of XML parsing by a couple of orders of magnitude.
>  >
>  > Michael Kay
>  >
>  >
>  >>-----Original Message-----
>  >>From: Mukul Gandhi [mailto:mukul_gandhi@y...]
>  >>Sent: 13 July 2005 20:01
>  >>To: Michael Kay; 'Pete Cordell'; xml-dev@l...
>  >>Subject: Another XML parsing idea? Was: Re: XML Hangover)
>  >>
>  >>Today, we have a paradigm in XML parsing of using APIs
>  >>like SAX or DOM. I was thinking of another approach to
>  >>parsing XML documents.
>  >>
>  >>Can we have a protocol (instead of an API) that will talk
>  >>between an application and the XML parser? This would
>  >>make the XML parser interoperable with the calling
>  >>application. For example, we could have a Microsoft XML
>  >>parser serving a Java program's XML parsing requests.
>  >>
>  >>Currently we have APIs like SAX and DOM and proprietary
>  >>Microsoft APIs. If we had some protocol similar to
>  >>HTTP that talked between an application and a parser, it
>  >>might help interoperability.
>  >>
>  >>Is this sensible thinking? Is this idea conceptually
>  >>similar to the StAX or .NET XmlReader parsing approach?
>  >>
>  >>Regards,
>  >>Mukul
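A small Python sketch can show the distinction between an API and a protocol. The first function is StAX-like pull parsing, built on the standard library's `xml.etree.ElementTree.XMLPullParser`: the application asks for the next event, and everything happens within one process. The other two functions define a purely hypothetical line-oriented wire format of the kind a parser "protocol" might use. It carries one `START name` or `END name` line per event and ignores text and attributes. With such a format, the parser must encode every event and the application must decode it again, on top of whatever transport carries the bytes. That extra work per event is the source of the cost Michael Kay describes above.

```python
import xml.etree.ElementTree as ET

def pull_events(xml_text):
    """StAX-style pull parsing: the caller pulls (event, tag) pairs in-process."""
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(xml_text)
    for event, elem in parser.read_events():
        yield event, elem.tag
    parser.close()  # signal end of input and flush any remaining events
    for event, elem in parser.read_events():
        yield event, elem.tag

def encode_events(events):
    """Hypothetical wire format: one 'START tag' / 'END tag' line per event."""
    return "".join(f"{event.upper()} {tag}\n" for event, tag in events)

def decode_events(wire):
    """Inverse of encode_events, as the receiving application would run it."""
    result = []
    for line in wire.splitlines():
        event, tag = line.split(" ", 1)
        result.append((event.lower(), tag))
    return result

events = list(pull_events("<order><item/></order>"))
wire = encode_events(events)  # the bytes that would cross the boundary
print(wire, end="")
```

StAX and XmlReader both belong to the first kind: they are in-process pull APIs, not wire protocols.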
>  >>
>  >>--- Michael Kay <mike@s...> wrote:
>  >>
>  >>>The URL got truncated
>  >>>
>  >>>http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.html
>  >>>
>  >>>with ".html" at the end.
>  >>>
>  >>>Michael Kay
>  >>>
>  >>>
>  >>>>-----Original Message-----
>  >>>>From: Mukul Gandhi [mailto:mukul_gandhi@y...]
>  >>>>Sent: 13 July 2005 10:02
>  >>>>To: Michael Kay; 'Pete Cordell'; xml-dev@l...
>  >>>>Subject: RE: XSL for non-XML input (Was: Re: XML Hangover)
>  >>>>
>  >>>>Hi Mike,
>  >>>>I get an error:
>  >>>>HTTP 404 - File not found
>  >>>>
>  >>>>--- Michael Kay <mike@s...> wrote:
>  >>>>
>  >>>>>http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.htm
>  >>>>
>  >>>>Regards,
>  >>>>Mukul
>  >>>>
>  >>>>
>  >>>>>
>  >>>>>Michael Kay
>  >>>>>
>  >>>>>
>  >>>>>Going further, observing the idea of using out-of-band data
>  >>>>>(e.g. schema) to provide extra information to complete 'binary
>  >>>>>XML', could XSL (with suitable front ends) work on, say, an
>  >>>>>ASN.1-encoded X.509 certificate (and ASN.1 message definition)
>  >>>>>and produce, say, a PDF output?
>  >>>>>
>  >>>>>Not that I have a need to do that right now! I'm just interested
>  >>>>>to know whether XSL can be used as a kind of universal data
>  >>>>>translator.
>  >>>>>
>  >>>>>Thanks,
>  >>>>>
>  >>>>>Pete.
>  >>>>>--
>  >>>>>=============================================
>  >>>>>Pete Cordell
>  >>>>>Tech-Know-Ware Ltd
>  >>>>>
>  >>>>>-----------------------------------------------------------------
>  >>>>>  for XML to C++ data binding visit
>  >>>>>  http://www.tech-know-ware.com/lmx
>  >>>>>  (or http://www.xml2cpp.com)
>  >>>>>=============================================
>  >>>>>
>  >>>>>
>  >>>>>----- Original Message -----
>  >>>>>From: Michael Kay <mailto:mike@s...>
>  >>>>>To: 'Joe Schaffner' <mailto:schaffner.joe@g...>; xml-dev@l...
>  >>>>>Sent: Monday, July 11, 2005 9:00 PM
>  >>>>>Subject: RE: XML Hangover
>  >>>>>
>  >>>>>
>  >>>>>
>  >>>>>I've been reading the XML literature. It's great. Just a few
>  >>>>>comments:
>  >>>>>
>  >>>>>Welcome on board. It's refreshing to get thoughtful comments from
>  >>>>>someone who's new to the game.
>  >>>>>
>  >>>>>XSL - XML Stylesheets is divided into two parts, XSL-T and XSL-FO.
>  >>>>>
>  >>>>>The T part deals with templates and translation. Since HTML is
>  >>>>>valid XML, I guess I can parse my HTML using XSL-T to produce XML
>  >>>>>and vice versa. I don't understand why XSL-T refers to "nodes in
>  >>>>>an output tree". This suggests some kind of internal
>  >>>>>representation, but XML is a perfectly good representation
>  >>>>>language. Don't <templates> merely write XML text to stdout?
>  >>>>>
>  >>>>>No, the result tree is completely abstract; there is no suggestion
>  >>>>>of an internal representation. In fact, for many XSLT processors,
>  >>>>>the "result tree" is represented internally as a stream of events,
>  >>>>>not as a linked collection of objects in memory. This concept of
>  >>>>>writing a tree, rather than writing text, is however extremely
>  >>>>>important. Firstly, it defines a separation of the information
>  >>>>>content of an XML document from the accidental aspects of its
>  >>>>>lexical representation - something that is sadly missing from the
>  >>>>>XML spec itself. In turn, this gives you a basis for defining a
>  >>>>>concise set of operators that are in some sense complete,
>  >>>>>composable and exhibit closure. In practical terms, it gives you
>  >>>>>the ability to write a series of transformations - a pipeline - in
>  >>>>>which the expensive steps of serializing and parsing intermediate
>  >>>>>results can be eliminated.
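A small Python sketch can illustrate the pipeline point. Each stage consumes and produces a stream of events rather than text, so no stage serializes or re-parses anything. Only the last stage turns events into markup. Several choices here are assumptions made for brevity: the event vocabulary (`start`, `text`, `end` tuples), the stage names `rename` and `drop_element`, and the use of an already-built ElementTree tree as the source. A streaming parser could feed the same events. The sketch ignores attributes and namespaces and is not how any particular XSLT processor is implemented.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def tree_events(elem):
    """Source: walk a tree and emit ("start"|"text"|"end", value) events."""
    yield ("start", elem.tag)
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from tree_events(child)
        if child.tail:  # text following the child belongs to this element
            yield ("text", child.tail)
    yield ("end", elem.tag)

def rename(events, old, new):
    """Stage 1 (hypothetical): rename elements; events in, events out."""
    for kind, value in events:
        if kind in ("start", "end") and value == old:
            value = new
        yield kind, value

def drop_element(events, name):
    """Stage 2 (hypothetical): remove an element and everything inside it."""
    depth = 0
    for kind, value in events:
        if depth:                      # inside a dropped element
            if kind == "start":
                depth += 1
            elif kind == "end":
                depth -= 1
            continue
        if kind == "start" and value == name:
            depth = 1
            continue
        yield kind, value

def serialize(events):
    """Final stage: the only place where events become markup."""
    out = []
    for kind, value in events:
        if kind == "start":
            out.append(f"<{value}>")
        elif kind == "end":
            out.append(f"</{value}>")
        else:
            out.append(escape(value))
    return "".join(out)

source = ET.fromstring("<doc><p>hi</p><note>x</note><p>there</p></doc>")
print(serialize(drop_element(rename(tree_events(source), "p", "para"), "note")))
# <doc><para>hi</para><para>there</para></doc>
```

Because every stage is a generator, events flow through the whole pipeline one at a time. No intermediate document is ever materialized as text.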
>  >>>>>
>  >>>>>Roughly, the process seems to work like this: the T processor
>  >>>>>does a recursive descent of the source XML. At each node it
>  >>>>>evaluates the set of templates. Those templates which match the
>  >>>>>name of the "current" tag are processed, in some order. The
>  >>>>>template writes text; that's why it's called a "template". The
>  >>>>>recursive descent is continued with an <apply-templates> tag
>  >>>>>inside the template. This allows you to balance output.
>  >>>>>
>  >>>>>It doesn't have to do a recursive descent of the source XML: that's
>  >>>>>up to the application, though a recursive descent is the most
>  >>>>>common design pattern. And it definitely doesn't write text: people
>  >>>>>who create a mental model of writing text eventually get a rude
>  >>>>>awakening, usually when they first try to tackle grouping problems.
>  >>>>>
>  >>>>>If no matches are found, the T processor continues the descent.
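The processing model discussed here can be sketched in Python with Michael's corrections applied. Templates are looked up by element name, and the built-in rule for an unmatched element simply continues the descent. Text is copied, and every template adds nodes to a result tree instead of writing text. Several things are simplifications chosen for illustration: matching on the element name alone, the `templates` dictionary, and the rule names. Real XSLT match patterns are XPath patterns with priorities. Also, each template decides for itself where (and whether) `xsl:apply-templates` recurses.

```python
import xml.etree.ElementTree as ET

def add_text(out, text):
    """Append text to a result-tree element (ElementTree keeps text in .text/.tail)."""
    if not text:
        return
    if len(out):
        out[-1].tail = (out[-1].tail or "") + text
    else:
        out.text = (out.text or "") + text

def apply_templates(node, templates, out):
    """Process node's content in document order. Text is copied (built-in
    text rule); each child element goes to the template matching its name,
    or to the built-in element rule, which just keeps descending."""
    add_text(out, node.text)
    for child in node:
        rule = templates.get(child.tag)
        if rule is not None:
            rule(child, templates, out)
        else:
            apply_templates(child, templates, out)
        add_text(out, child.tail)

def title_rule(node, templates, out):
    # Roughly <xsl:template match="title"><h1><xsl:apply-templates/></h1>
    apply_templates(node, templates, ET.SubElement(out, "h1"))

def para_rule(node, templates, out):
    # Roughly <xsl:template match="para"><p><xsl:apply-templates/></p>
    apply_templates(node, templates, ET.SubElement(out, "p"))

templates = {"title": title_rule, "para": para_rule}
source = ET.fromstring(
    "<doc><section><title>Intro</title><para>Hello</para></section></doc>")
result = ET.Element("body")
apply_templates(source, templates, result)
print(ET.tostring(result, encoding="unicode"))
# <body><h1>Intro</h1><p>Hello</p></body>
```

`<section>` has no template, so the built-in rule descends through it without producing anything of its own. The `h1` and `p` elements are created as whole nodes, so an unbalanced "start tag without end tag" cannot arise.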
>  >>>>>
>  >>>>>There is a <template> tag (I forget what) which will select
>  >>>>>arbitrary paths in the source tree, and there are tags which
>  >>>>>iterate through the result.
>  >>>>>
>  >>>>>Again, it's best to think of the stylesheet as containing nodes
>  >>>>>(representing instructions) rather than tags. Consider
>  >>>>>
>  >>>>><xsl:element name="x"><xsl:value-of select="."/></xsl:element>
>  >>>>>
>  >>>>>There are three tags there, but four nodes, and only two
>  >>>>>instructions. The semantics of the language are described in terms
>  >>>>>of the two instructions, not the three tags.
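The tag/node/instruction distinction can be checked mechanically. The Python sketch below counts the three kinds of thing in Michael's example in three different ways. Tags are counted lexically with a regular expression. Nodes are counted by parsing the fragment and adding up element and attribute nodes. Instructions are counted as elements in the XSLT namespace, which holds here because the fragment stands for the body of a template. The `xmlns:xsl` declaration is an addition needed to make the fragment parse on its own. In the XPath data model it produces namespace nodes rather than attribute nodes, and ElementTree does not report it as an attribute, so it is not counted.

```python
import re
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# The fragment from the message, plus the namespace declaration it needs
# in order to be parsed as a standalone document.
fragment = ('<xsl:element name="x" xmlns:xsl="' + XSL_NS + '">'
            '<xsl:value-of select="."/></xsl:element>')

def count_tags(text):
    """Lexical view: every <...> is one tag (start, end, or empty-element)."""
    return len(re.findall(r"<[^>]+>", text))

def count_nodes(text):
    """Tree view: element nodes plus attribute nodes."""
    elements = list(ET.fromstring(text).iter())
    return len(elements) + sum(len(e.attrib) for e in elements)

def count_instructions(text):
    """Semantic view: inside a template body, XSLT-namespace elements are instructions."""
    return sum(1 for e in ET.fromstring(text).iter()
               if e.tag.startswith("{" + XSL_NS + "}"))

print(count_tags(fragment), count_nodes(fragment), count_instructions(fragment))
# 3 4 2
```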
>  >>>>>
>  >>>>>This will allow me to build up a result "tree" which is not a
>  >>>>>mirror image of the source, something I need to do if I'm
>  >>>>>rearranging sections of the input document. Rather than buffering
>  >>>>>intermediate structures, the T processor does multiple passes based
>  >>>>>on these tags, and creates the output on-the-fly. Cool.
>  >>>>>
>  >>>>> ... .
>  >>>>>
>  >>>>>I assume there is nothing stopping me from using
>  >>>>>XSL-T to transform my HTML
>  >>>>>to PDF, but it seems best to output XSL-FO then
>  >>>>>create a PDF using some kind
>  >>>>>of tool. What is that tool?
>  >>>>>
>  >>>>>It's an XSL-FO processor. Examples are FOP, RenderX and Antenna
>  >>>>>House.
>  >>>>>
>  >>>>>Are there FO plug-ins available for my browsers?
>  >>>>>
>  >>>>>No, people are by and large using (X)HTML/CSS for the browser and
>  >>>>>XSL-FO/PDF for the printed page.
>  >>>>>
>  >>>>>Does this technology work?
>  >>>>>
>  >>>>>Absolutely yes.
>  >>>>>
>  >>>>>Michael Kay
>  >>>>>http://www.saxonica.com/
> 
>

References:
- Re: Another XML parsing idea? Was: Re: XML Hangover)
  - From: Bob Foster <bob@o...>
