I don't know the internals (maybe someone can comment) but I believe
Markup Technology has a protocol for passing PSVI around. It seems
pretty darn fast.

Bob Foster

Michael Kay wrote:
> A protocol implies sending and receiving messages typically across a
> process boundary or even a machine boundary. This would raise the cost
> of XML parsing by a couple of orders of magnitude.
>
> Michael Kay
>
>> -----Original Message-----
>> From: Mukul Gandhi [mailto:mukul_gandhi@y...]
>> Sent: 13 July 2005 20:01
>> To: Michael Kay; 'Pete Cordell'; xml-dev@l...
>> Subject: Another XML parsing idea? (Was: Re: XML Hangover)
>>
>> Today, we have a paradigm in XML parsing of using APIs like SAX or
>> DOM. I was thinking of another approach to parsing XML documents.
>>
>> Can we have a protocol (instead of an API) that will talk between an
>> application and the XML parser? This would make the XML parser
>> interoperable with the calling application. We could then have, say,
>> a Microsoft XML parser serving a Java program's XML parsing request.
>>
>> Just now we have APIs like SAX and DOM and proprietary Microsoft
>> APIs. Had we had some protocol similar to HTTP that talked between
>> an application and a parser, it might help interoperability.
>>
>> Is this sensible thinking? Is this idea conceptually similar to the
>> StAX or .NET XmlReader parsing approach?
>>
>> Regards,
>> Mukul
>>
>> --- Michael Kay <mike@s...> wrote:
>>
>>> The URL got truncated
>>>
>>> http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.html
>>>
>>> with ".html" at the end.
>>>
>>> Michael Kay
>>>
>>>> -----Original Message-----
>>>> From: Mukul Gandhi [mailto:mukul_gandhi@y...]
>>>> Sent: 13 July 2005 10:02
>>>> To: Michael Kay; 'Pete Cordell'; xml-dev@l...
>>>> Subject: RE: XSL for non-XML input (Was: Re: XML Hangover)
>>>>
>>>> Hi Mike,
>>>> I get error
>>>> HTTP 404 - File not found
>>>>
>>>> --- Michael Kay <mike@s...> wrote:
>>>>> http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.htm
>>>>
>>>> Regards,
>>>> Mukul
>>>>
>>>>> Michael Kay
>>>>>
>>>>> Going further, observing the idea of using out of band data (e.g.
>>>>> schema) to provide extra information to complete 'binary XML',
>>>>> could XSL (with suitable front ends) work on say an ASN.1 encoded
>>>>> X.509 certificate (and ASN.1 message definition) and produce, say,
>>>>> a PDF output?
>>>>>
>>>>> Not that I have a need to do that right now! I'm just interested
>>>>> to know whether XSL can be used as a kind of universal data
>>>>> translator.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Pete.
>>>>> --
>>>>> =============================================
>>>>> Pete Cordell
>>>>> Tech-Know-Ware Ltd
>>>>> ---------------------------------------------
>>>>> for XML to C++ data binding visit
>>>>> http://www.tech-know-ware.com/lmx
>>>>> (or http://www.xml2cpp.com)
>>>>> =============================================
>>>>>
>>>>> ----- Original Message -----
>>>>> From: Michael Kay <mailto:mike@s...>
>>>>> To: 'Joe Schaffner' <mailto:schaffner.joe@g...> ; xml-dev@l...
>>>>> Sent: Monday, July 11, 2005 9:00 PM
>>>>> Subject: RE: XML Hangover
>>>>>
>>>>> I've been reading the XML literature. It's great. Just a few
>>>>> comments:
>>>>>
>>>>> Welcome on board. It's refreshing to get thoughtful comments from
>>>>> someone who's new to the game.
>>>>>
>>>>> XSL - XML Stylesheets is divided into two parts, XSL-T and XSL-FO.
>>>>>
>>>>> The T part deals with templates and translation.
>>>>> Since HTML is valid XML, I guess I can parse my HTML using XSL-T
>>>>> to produce XML and vice versa. I don't understand why XSL-T refers
>>>>> to "nodes in an output tree". This suggests some kind of internal
>>>>> representation, but XML is a perfectly good representation
>>>>> language. Don't <templates> merely write XML text to stdout?
>>>>>
>>>>> No, the result tree is completely abstract, there is no suggestion
>>>>> of an internal representation. In fact, for many XSLT processors,
>>>>> the "result tree" is represented internally as a stream of events,
>>>>> not as a linked collection of objects in memory. This concept of
>>>>> writing a tree, rather than writing text, however is extremely
>>>>> important. Firstly, it defines a separation of the information
>>>>> content of an XML document from the accidental aspects of its
>>>>> lexical representation - something that is sadly missing from the
>>>>> XML spec itself. In turn, this gives you a basis for defining a
>>>>> concise set of operators that are in some sense complete,
>>>>> composable and exhibit closure. In practical terms, it gives you
>>>>> the ability to write a series of transformations - a pipeline - in
>>>>> which the expensive steps of serializing and parsing intermediate
>>>>> results can be eliminated.
>>>>>
>>>>> Roughly, the process seems to work like this: the T processor does
>>>>> a recursive descent of the source XML. At each node it evaluates
>>>>> the set of templates. Those templates which match the name of the
>>>>> "current" tag are processed, in some order. The template writes
>>>>> text, that's why it's called a "template". The recursive descent
>>>>> is continued with an <apply-templates> tag inside the template.
>>>>> This allows you to balance output.
>>>>>
>>>>> It doesn't have to do a recursive descent of the source XML:
>>>>> that's up to the application, though a recursive descent is the
>>>>> most common design pattern. And it definitely doesn't write text:
>>>>> people who create a mental model of writing text eventually get a
>>>>> rude awakening, usually when they first try to tackle grouping
>>>>> problems.
>>>>>
>>>>> If no matches are found, the T processor continues the descent.
>>>>>
>>>>> There is a <template> tag (I forget what) which will select
>>>>> arbitrary paths in the source tree, and there are tags which
>>>>> iterate through the result.
>>>>>
>>>>> Again, it's best to think of the stylesheet as containing nodes
>>>>> (representing instructions) rather than tags. Consider
>>>>>
>>>>> <xsl:element name="x"><xsl:value-of select="."/></xsl:element>
>>>>>
>>>>> There are three tags there, but four nodes, and only two
>>>>> instructions. The semantics of the language are described in terms
>>>>> of the two instructions, not the three tags.
>>>>>
>>>>> This will allow me to build up a result "tree" which is not a
>>>>> mirror image of the source, something I need to do if I'm
>>>>> rearranging sections of the input document. Rather than buffering
>>>>> intermediate structures, the T processor does multiple passes
>>>>> based on these tags, and creates the output on-the-fly. Cool.
>>>>>
>>>>> ... .
>>>>>
>>>>> I assume there is nothing stopping me from using XSL-T to
>>>>> transform my HTML to PDF, but it seems best to output XSL-FO then
>>>>> create a PDF using some kind of tool. What is that tool?
>>>>>
>>>>> It's an XSL-FO processor. Examples are FOP, RenderX, Antenna
>>>>> House.
>>>>>
>>>>> Are there FO plug-ins available for my browsers?
>>>>>
>>>>> No, people are by-and-large using (X)HTML/CSS for the browser,
>>>>> XSL-FO/PDF for the printed page.
>>>>>
>>>>> Does this technology work?
>>>>>
>>>>> Absolutely yes.
>>>>>
>>>>> Michael Kay
>>>>> http://www.saxonica.com/
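
To make the point in the quoted thread about pipelines a little more concrete, here is a minimal sketch in Python rather than XSLT. It is not how any particular XSLT processor works; it only illustrates the idea that when each step takes a tree and returns a tree, the intermediate result never has to be serialized and reparsed. The function names (rename_items, sort_entries) and the sample data are hypothetical, and the standard-library ElementTree stands in for a real result tree.

```python
import xml.etree.ElementTree as ET


def rename_items(root):
    """Step 1 (hypothetical): build a new tree where each <item> becomes an <entry>."""
    out = ET.Element("entries")
    for item in root.iter("item"):
        ET.SubElement(out, "entry").text = item.text
    return out


def sort_entries(root):
    """Step 2 (hypothetical): build a new tree with the children sorted by their text."""
    out = ET.Element(root.tag)
    for entry in sorted(root, key=lambda e: e.text or ""):
        ET.SubElement(out, entry.tag).text = entry.text
    return out


# Parse once at the start of the pipeline.
source = ET.fromstring("<list><item>pear</item><item>apple</item></list>")

# Tree in, tree out: the intermediate result is passed along in memory,
# with no serialize-then-parse step between the two transformations.
result = sort_entries(rename_items(source))

# Serialize once at the end of the pipeline.
serialized = ET.tostring(result, encoding="unicode")
print(serialized)
```

A text-based design would instead have the first step print markup and the second step parse it again, which is the cost the tree model avoids.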