At 2005-07-11 21:42 +0300, Joe Schaffner wrote:
>XSL - XML Stylesheets is divided into two parts, XSL-T and XSL-FO.
>
>The T part deals with templates and translation. Since HTML is valid XML,

(it isn't)

>I guess I can parse my HTML using XSL-T to produce XML and vice versa.

No, XSLT will only take in XML, not HTML. Since XHTML is XML, you can use XHTML as an input, but XSLT 1.0 does not accept SGML as input.

>I don't understand why XSL-T refers to "nodes in an output tree".

Because not all needs for transformation require a concrete representation of the result in angle brackets.

>This suggests some kind of internal representation, but XML is a perfectly
>good representation language.

A perfectly good concrete representation, yes, but if the downstream process can act on the abstract representation of a node tree, why bother with a concrete representation of angle brackets?

>Don't <templates> merely write XML text to stdout?

Not at all ... templates express the nodes that are added to the abstract result tree. If you want to serialize the abstract result tree into concrete angle brackets you get XML, or you can request SGML angle brackets assuming the HTML vocabulary (if it is supported by the processor), or you can request simple text output without any markup (again if it is supported by the processor).

If you don't need a concrete representation, the downstream process (say, for example, an XSL-FO formatter) can work with the output abstract node tree as an input abstract node tree (quelle surprise!) and act on the transformed information with nary an angle bracket in sight.

>Roughly, the process seems to work like this: the T processor does a
>recursive descent of the source XML.

Not necessarily ... it is up to the stylesheet to decide how to descend into the tree. If the stylesheet doesn't say anything about a given element, however, the built-in processing is in fact a depth-first breadth-next descent.

>At each node it evaluates the set of templates.
>Those templates which
>match the name of the "current" tag are processed, in some order.

False ... exactly one template matches a given input node based on the XPath data model. The data model models elements, attributes, processing instructions, comments, text and namespaces ... it does not model tags.

If the stylesheet should happen to present two templates that both happen to match a given input node of any kind, that is an error condition. Facilities and techniques are available in the language to specialize two templates as being distinct, but however much attempt has been made, it is still a "template conflict error" should the stylesheet present two templates for one node.

>The template writes text, that's why it's called a "template".

False. The template expresses those nodes that are to be added to the result tree at the given point of result tree construction. It is a template of the result tree fragment. It doesn't "write text".

>The recursive descent is continued with an <apply-templates> tag inside
>the template. This allows you to balance output.

Not sure what you mean by the verb "balance" here ... the stylesheet writer is obliged to express the transform such that the result tree is constructed in result parse order in a single pass.

Note the "single pass" business: once you have constructed part of the result tree there is no "going back" to massage your result tree ... you only have one chance to write out each output node (with the exception of the current set of attributes for an element whose content has not yet begun).

>If no matches are found, the T processor continues the descent.

If there are no stylesheet templates to match a given node, there are built-in templates to match any given node, and for elements the descent is continued.

>There is a <template> tag (I forget what) which will select arbitrary
>paths in the source tree, and there are tags which iterate through the
>result.
>This will allow me to build up a result "tree" which is not a
>mirror image of the source, something I need to do if I'm rearranging
>sections of the input document. Rather than buffering intermediate
>structures, the T processor does multiple passes based on these tags, and
>creates the output on-the-fly. Cool.

Templates don't select arbitrary paths; the stylesheet's requests to push and pull nodes address those nodes using arbitrary address expressions into the XPath data model used for XML documents.

>Then there is the DOM - Document Object Model...

Which is a different document model for XML documents than the XPath data model is for XML documents. For example, the DOM models CDATA sections while XPath does not.

>but my XML already defines the document's object model. It's whatever I
>decide it to be. Why the confusion? Are you referring to parsing an XML
>document using some kind of programming environment to produce a tree in
>some kind of intermediate representation? Are there C libs/structs or Java
>classes or Perl modules which tear apart an XML document?

Maybe in somebody's program, but not in XSLT.

I'm getting confused by your reference to an "object model" ... XML defines a syntax for structured documents, XPath defines a data model for the information expressed in XML syntax, DOM defines a different data model for the information expressed in XML syntax.

>This would allow someone to build an XML parser into any configuration
>program.

Not sure what you mean by that.

>The FO of XSL parts looks a lot like Knuth's TeX, a graphical document
>description language, set in XML, incredibly complex.

If you find it complex, then perhaps you haven't looked at it closely enough. It is a set of pagination semantics for both flow and formatting, expressed in an XML vocabulary of elements and attributes. It is well designed and easy to teach and easy to use (I admit the W3C spec is not that easy to read ...
it was written more for XSL-FO engine implementers than for stylesheet writers).

>I guess there are FO display tools out there (called "formatters") that
>interpret FO.

There are many, both commercial and free, both expensive and affordable.

>[Postscript was a procedural language for graphical page description. PDF
>is a data definition language which takes the procedures out of
>Postscript. FO is a data definition language which describes the appearance
>of an XML document as richly as these other languages.]

It is more abstract ... XSL-FO does not say anything about rendering, only about formatting ... deciding what goes where on a page, not how what goes where is expressed in a page description language. Many tools offer PDF and some tools offer PostScript and other tools offer other output expressions.

>[Does anybody remember what a dvi representation did? I think it stood for
>"device independent" but TeX was already device independent.]
>
>I had a ghostscript viewer which seemed to understand dvi files, and
>postscript too. It was quite clumsy.
>
>I assume there is nothing stopping me from using XSL-T to transform my
>HTML to PDF,

First, as mentioned above, HTML isn't an input to XSLT ... but even if you input XHTML, putting out PDF by XSLT would seem to me to be quite awkward.

>but it seems best to output XSL-FO then create a PDF using some kind of tool.

Absolutely.

> What is that tool?

See http://www.xmlsoftware.com/xslfo.html for a starting point.

>Are there FO plug-ins available for my browsers?

Doubtful ... since browsers output in arbitrary and changeable window sizes and XSL-FO pagination is geared to fixed-sized folios (pages) of information, I haven't yet seen a browser-based implementation of XSL-FO. Note that there is, in fact, a way in the XSL-FO specification to turn off pagination and be supported in a dynamically-sized window ... but I haven't seen that implemented.

>Does this technology work?

Absolutely!
I have customers producing millions of pages of customized printed output based on XML inputs exported from databases of information. It is very professional and it gives my customers a technical edge ahead of their competition.

I author all my training and book materials in XML and use XSLT and XSL-FO to go to camera-ready output. We claim that my "Definitive XSLT and XPath" book published by Prentice-Hall is the first bookstore-shelf-published-book that was produced end-to-end in XML because we wrote XSL-FO stylesheets to lay out the pages and supplied PDF files to PH for publication with the final camera-ready output. We wrote an annex in the book describing our process. My "Definitive XSL-FO" book then just re-used the stylesheets and that production process was much quicker.

I use different stylesheets to publish the same content as PDF books for sale on my web site. I use those also to publish my student handouts for my hands-on courses. All this from the same set of XML inputs.

I hope this helps.

. . . . . . . Ken

--
World-wide on-site corporate, govt. & user group XML/XSL training.
G. Ken Holman                 mailto:gkholman@C...
Crane Softwrights Ltd.        http://www.CraneSoftwrights.com/x/
Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
Male Breast Cancer Awareness  http://www.CraneSoftwrights.com/x/bc
Legal business disclaimers:   http://www.CraneSoftwrights.com/legal
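
Several points in the reply above (templates add nodes to a result tree rather than writing text, the stylesheet decides how the source is visited, unmatched elements fall through to the built-in rules, and serialization is requested separately) can be sketched in a small XSLT 1.0 stylesheet. This is only an illustration under stated assumptions: the source elements doc, summary and section, the result elements report, abstract and chapter, and the attributes title and draft are all hypothetical and do not come from the thread.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Serialization is requested apart from tree construction; a processor
       may also support method="html" or method="text". -->
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical document element "doc": this template adds a "report"
       element node to the result tree. -->
  <xsl:template match="doc">
    <report>
      <!-- The stylesheet chooses the visiting order: the summary is
           processed first, wherever it sits in the source. -->
      <xsl:apply-templates select="summary"/>
      <xsl:apply-templates select="section"/>
    </report>
  </xsl:template>

  <xsl:template match="summary">
    <abstract>
      <!-- Children without a template of their own are handled by the
           built-in rules: elements are descended into, text is copied. -->
      <xsl:apply-templates/>
    </abstract>
  </xsl:template>

  <xsl:template match="section">
    <chapter title="{@title}">
      <xsl:apply-templates/>
    </chapter>
  </xsl:template>

  <!-- This pattern also matches some "section" elements. The explicit
       priority makes the choice between the two rules unambiguous; the
       empty template adds nothing to the result for draft sections. -->
  <xsl:template match="section[@draft='yes']" priority="1"/>

</xsl:stylesheet>
```

Note that the stylesheet never emits start or end tags as strings; each literal result element is a node added at the current point of result tree construction, and the angle brackets appear only if the processor is asked to serialize the tree.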