RE: Another XML parsing idea? Was: Re: XML Hangover)
I think HT said that there's always an overhead if you have to cross a thread boundary, and that they try to avoid it whenever possible. Crossing a process or machine boundary would be far worse.

Michael Kay

> -----Original Message-----
> From: Bob Foster [mailto:bob@o...]
> Sent: 13 July 2005 22:18
> To: Michael Kay
> Cc: xml-dev@l...
> Subject: Re: Another XML parsing idea? Was: Re: XML Hangover)
>
> I don't know the internals (maybe someone can comment) but I believe Markup Technology has a protocol for passing PSVI around. It seems pretty darn fast.
>
> Bob Foster
>
> Michael Kay wrote:
> > A protocol implies sending and receiving messages typically across a process boundary or even a machine boundary. This would raise the cost of XML parsing by a couple of orders of magnitude.
> >
> > Michael Kay
> >
> >>-----Original Message-----
> >>From: Mukul Gandhi [mailto:mukul_gandhi@y...]
> >>Sent: 13 July 2005 20:01
> >>To: Michael Kay; 'Pete Cordell'; xml-dev@l...
> >>Subject: Another XML parsing idea? Was: Re: XML Hangover)
> >>
> >>Today, we have a paradigm in XML parsing of using APIs like SAX or DOM. I was thinking of another approach to parse XML documents.
> >>
> >>Can we have a protocol (instead of an API) that will talk between an application and the XML parser? This would make using an XML parser interoperable with the calling application. We could achieve this: "we could have a Microsoft XML parser serving a Java program's XML parsing request."
> >>
> >>Just now we have APIs like SAX and DOM and proprietary Microsoft APIs. Had we had some protocol similar to HTTP that talked between an application and a parser, it might help interoperability.
> >>
> >>Is this sensible thinking? Is this idea conceptually similar to the StAX or .NET XmlReader parsing approach?
> >>
> >>Regards,
> >>Mukul
> >>
> >>--- Michael Kay <mike@s...> wrote:
> >>
> >>>The URL got truncated
> >>>
> >>>http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.html
> >>>
> >>>with ".html" at the end.
> >>>
> >>>Michael Kay
> >>>
> >>>>-----Original Message-----
> >>>>From: Mukul Gandhi [mailto:mukul_gandhi@y...]
> >>>>Sent: 13 July 2005 10:02
> >>>>To: Michael Kay; 'Pete Cordell'; xml-dev@l...
> >>>>Subject: RE: XSL for non-XML input (Was: Re: XML Hangover)
> >>>>
> >>>>Hi Mike,
> >>>>I get error
> >>>>HTTP 404 - File not found
> >>>>
> >>>>--- Michael Kay <mike@s...> wrote:
> >>>>
> >>>>>http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.htm
> >>>>
> >>>>Regards,
> >>>>Mukul
> >>>>
> >>>>>
> >>>>>Michael Kay
> >>>>>
> >>>>>Going further, observing the idea of using out of band data (e.g. schema) to provide extra information to complete 'binary XML', could XSL (with suitable front ends) work on say an ASN.1 encoded X.509 certificate (and ASN.1 message definition) and produce, say, a PDF output?
> >>>>>
> >>>>>Not that I have a need to do that right now! I'm just interested to know whether XSL can be used as a kind of universal data translator.
> >>>>>
> >>>>>Thanks,
> >>>>>
> >>>>>Pete.
> >>>>>--
> >>>>>=============================================
> >>>>>Pete Cordell
> >>>>>Tech-Know-Ware Ltd
> >>>>>for XML to C++ data binding visit
> >>>>>http://www.tech-know-ware.com/lmx (or http://www.xml2cpp.com)
> >>>>>=============================================
> >>>>>
> >>>>>----- Original Message -----
> >>>>>From: Michael Kay <mailto:mike@s...>
> >>>>>To: 'Joe Schaffner' <mailto:schaffner.joe@g...> ; xml-dev@l...
> >>>>>Sent: Monday, July 11, 2005 9:00 PM
> >>>>>Subject: RE: XML Hangover
> >>>>>
> >>>>>I've been reading the XML literature. It's great. Just a few comments:
> >>>>>
> >>>>>Welcome on board. It's refreshing to get thoughtful comments from someone who's new to the game.
> >>>>>
> >>>>>XSL - XML Stylesheets is divided into two parts, XSL-T and XSL-FO.
> >>>>>
> >>>>>The T part deals with templates and translation. Since HTML is valid XML, I guess I can parse my HTML using XSL-T to produce XML and vice versa. I don't understand why XSL-T refers to "nodes in an output tree". This suggests some kind of internal representation, but XML is a perfectly good representation language. Don't <templates> merely write XML text to stdout?
> >>>>>
> >>>>>No, the result tree is completely abstract; there is no suggestion of an internal representation. In fact, for many XSLT processors, the "result tree" is represented internally as a stream of events, not as a linked collection of objects in memory. This concept of writing a tree, rather than writing text, however, is extremely important.
> >>>>>Firstly, it defines a separation of the information content of an XML document from the accidental aspects of its lexical representation - something that is sadly missing from the XML spec itself. In turn, this gives you a basis for defining a concise set of operators that are in some sense complete, composable and exhibit closure. In practical terms, it gives you the ability to write a series of transformations - a pipeline - in which the expensive steps of serializing and parsing intermediate results can be eliminated.
> >>>>>
> >>>>>Roughly, the process seems to work like this: the T processor does a recursive descent of the source XML. At each node it evaluates the set of templates. Those templates which match the name of the "current" tag are processed, in some order. The template writes text; that's why it's called a "template". The recursive descent is continued with an <apply-templates> tag inside the template. This allows you to balance output.
> >>>>>
> >>>>>It doesn't have to do a recursive descent of the source XML: that's up to the application, though a recursive descent is the most common design pattern. And it definitely doesn't write text: people who create a mental model of writing text eventually get a rude awakening, usually when they first try to tackle grouping problems.
> >>>>>
> >>>>>If no matches are found, the T processor continues the descent.
> >>>>>
> >>>>>There is a <template> tag (I forget what) which will select arbitrary paths in the source tree, and there are tags which iterate through the result.
> >>>>>
> >>>>>Again, it's best to think of the stylesheet as containing nodes (representing instructions) rather than tags. Consider
> >>>>>
> >>>>><xsl:element name="x"><xsl:value-of select="."/></xsl:element>
> >>>>>
> >>>>>There are three tags there, but four nodes, and only two instructions. The semantics of the language are described in terms of the two instructions, not the three tags.
> >>>>>
> >>>>>This will allow me to build up a result "tree" which is not a mirror image of the source, something I need to do if I'm rearranging sections of the input document. Rather than buffering intermediate structures, the T processor does multiple passes based on these tags, and creates the output on-the-fly. Cool.
> >>>>>
> >>>>>...
> >>>>>
> >>>>>I assume there is nothing stopping me from using XSL-T to transform my HTML to PDF, but it seems best to output XSL-FO then create a PDF using some kind of tool. What is that tool?
> >>>>>
> >>>>>It's an XSL-FO processor. Examples are FOP, RenderX, Antenna House.
> >>>>>
> >>>>>Are there FO plug-ins available for my browsers?
> >>>>>
> >>>>>No, people are by-and-large using (X)HTML/CSS for the browser, XSL-FO/PDF for the printed page.
> >>>>>
> >>>>>Does this technology work?
> >>>>>
> >>>>>Absolutely yes.
> >>>>>
> >>>>>Michael Kay
> >>>>>http://www.saxonica.com/
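Mukul's closing question about StAX can be made concrete. StAX (`javax.xml.stream`, bundled with the JDK since Java 6) is a pull API. The application asks the parser for the next event, rather than receiving SAX callbacks or a finished DOM tree. It is still an in-process API and not a wire protocol, so it does not pay the process-boundary cost Michael Kay describes. Below is a minimal sketch that uses only the JDK. The class name `PullParseDemo` and the method `elementNames` are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PullParseDemo {
    // Hypothetical helper: collect start-element names in document order.
    // The application drives the loop by pulling one event at a time from
    // the reader; no callbacks are registered and no tree is built.
    static List<String> elementNames(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        // prints [doc, title, p]
        System.out.println(elementNames("<doc><title>XML</title><p>Hangover</p></doc>"));
    }
}
```

The .NET `XmlReader` that Mukul mentions follows the same pull style. In both cases the parser and the application share one address space and exchange events through method calls.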
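Michael Kay argues that a tree-based result model lets a pipeline skip serializing and reparsing intermediate results. That idea can be sketched with the JDK's TrAX API (`javax.xml.transform.sax`). In the sketch, stage 1 sends its result tree as SAX events straight into stage 2, which is a `TransformerHandler`, and only the final output is serialized. Stage 1 reuses the `xsl:element`/`xsl:value-of` fragment quoted above. The two stylesheets and the names `PipelineDemo` and `runPipeline` are hypothetical. The code assumes the default `TransformerFactory` supports `SAXTransformerFactory.FEATURE`, which the JDK's built-in processor does.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PipelineDemo {
    // Hypothetical stage 1: wrap the string value of each <p> in an <x> element.
    static final String STAGE1 =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'><out><xsl:apply-templates select='//p'/></out></xsl:template>"
      + "<xsl:template match='p'><xsl:element name='x'><xsl:value-of select='.'/></xsl:element></xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical stage 2: count the <x> elements produced by stage 1.
    static final String STAGE2 =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='count(/out/x)'/></xsl:template>"
      + "</xsl:stylesheet>";

    static String runPipeline(String xml) throws TransformerException {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("TrAX implementation lacks SAX support");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) tf;
        Templates first = stf.newTemplates(new StreamSource(new StringReader(STAGE1)));
        Templates second = stf.newTemplates(new StreamSource(new StringReader(STAGE2)));

        // Stage 2 is exposed as a SAX ContentHandler; only its output becomes text.
        TransformerHandler stage2 = stf.newTransformerHandler(second);
        StringWriter out = new StringWriter();
        stage2.setResult(new StreamResult(out));

        // Stage 1 delivers its result tree as SAX events directly to stage 2,
        // so the intermediate document is never written out and reparsed.
        Transformer stage1 = first.newTransformer();
        stage1.transform(new StreamSource(new StringReader(xml)), new SAXResult(stage2));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(runPipeline("<doc><p>XML</p><p>Hangover</p></doc>"));
    }
}
```

Feeding in a document with two `<p>` elements should print `2`. The intermediate `<out>` document exists only as a sequence of events passed between the two stages.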
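Michael Kay says a template builds result-tree nodes rather than writing text. One way to see this is to run the two-instruction fragment from the thread and capture its output as a DOM tree instead of a character stream. The sketch below assumes the JDK's default XSLT 1.0 processor. `ResultTreeDemo`, `transformToTree`, and the template wrapped around the fragment are hypothetical.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ResultTreeDemo {
    // The fragment quoted in the thread, wrapped in a hypothetical template
    // that matches the source document's root node.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'>"
      + "<xsl:element name='x'><xsl:value-of select='.'/></xsl:element>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Run the transformation into a DOMResult. With no node supplied, the
    // processor creates a new Document to hold the result tree.
    static Document transformToTree(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        DOMResult result = new DOMResult();
        t.transform(new StreamSource(new StringReader(xml)), result);
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws TransformerException {
        Element x = transformToTree("<doc>XML <b>Hangover</b></doc>").getDocumentElement();
        // prints x: XML Hangover
        System.out.println(x.getNodeName() + ": " + x.getTextContent());
    }
}
```

No markup is produced here. The application gets back an element node `x` whose text content is the string value of the source root. If the same transformation were pointed at a `StreamResult`, text would be produced only at that final serialization step.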