Re: openxml and XSL-T 2.0
Transforming WordprocessingML to XHTML can mean two different things. One is a file format to file format transformation, where all the information in the docx also ends up in the XHTML file; that is none of my business. The other is setting up restrictions for the author and transforming selected functionality of MS Word into beautiful web pages. For me it is important that the XHTML lives up to all guidelines for usability, accessibility and best web practices.

There should be as few restrictions as possible. Basically I only demand that MS Word's styles for heading levels are used. They are easy to transform to h1-h6. I also just ignore a lot of stuff that is irrelevant to web pages, like headers and footers, page margins, etc. I would accept inserted Excel and convert it to tables. In my opinion it is not that difficult to cover the functionality in MS Word that most people use most of the time. There is also a lot of stuff, like drawings, that could be converted to SVG, but I am not doing that at the moment; I only transform to what will work in a standard browser.

I know of no tutorials except at Microsoft's own websites and http://openxmldeveloper.org. I learned a lot from books about XML in MS Office 2003, but I don't know about books for 2007. Most of the 2003 material about WordprocessingML is still relevant.

In my opinion only the zipped docx 2007 format is what should be transformed, but to get started with the XSLT templates and the learning process one should probably use the "one xml file" format that MS Word 2007 can also save to. XMLSpy 2008 has excellent support for the zipped formats. The other XML editors will probably get it soon. The programming languages you use probably already have classes to handle the zipped formats. In .NET you must install Framework 3.0 and get to know the classes that can open and read or create the zipped XML formats.

You should go for a handful of transformations, not just one.
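To make the heading rule and the zipped package concrete, here is a minimal sketch in Python rather than XSLT, so that it runs with only the standard library. It opens the docx package, reads the main document part, and emits h1-h6 for paragraphs whose style ID is Heading1 to Heading6. The function name is hypothetical. Two things are assumptions: that the main part lives at word/document.xml (the usual location, though a robust reader would look it up in the package relationships), and that the heading style IDs are Heading1..Heading6, which localized or custom templates may not use. Runs are flattened to plain text, so everything else in the document is ignored.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace of WordprocessingML in the 2007 (docx) format.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Assumption: built-in heading styles carry the IDs Heading1..Heading6.
HEADING = re.compile(r"^Heading([1-6])$")


def paragraphs_to_xhtml(docx):
    """Return one XHTML element string per top-level paragraph.

    `docx` is a path or a file-like object holding a .docx package.
    (Hypothetical helper, for illustration only.)
    """
    with zipfile.ZipFile(docx) as package:
        # Assumption: the main part is stored at the usual location.
        root = ET.fromstring(package.read("word/document.xml"))

    elements = []
    for p in root.iterfind("w:body/w:p", NS):
        style = p.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W}}}val", "") if style is not None else ""
        match = HEADING.match(style_id)
        tag = f"h{match.group(1)}" if match else "p"
        # Concatenate the text of all runs; formatting is dropped.
        text = "".join(t.text or "" for t in p.iterfind(".//w:t", NS))
        elements.append(f"<{tag}>{escape(text)}</{tag}>")
    return elements
```

The same mapping in XSLT would be a template matching w:p that inspects w:pPr/w:pStyle/@w:val; the Python version only shows the shape of the rule.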
When saved, WordprocessingML keeps track of a lot of information that is in most cases completely irrelevant to the transformation, like which words have been spell-checked, what language MS Word thinks a word belongs to, etc. Start with an identity transform and use it to get rid of the irrelevant stuff as you discover it along the way; that makes the WordprocessingML files easier to read.

You also need to transform a lot of the details in MS Word to CSS. I transform them to the style attribute of each XHTML element, which means you must make templates that can handle, in one go, everything that should go into the style attribute of a specific element. At the very end I make another transformation that consolidates all the style attributes into CSS classes and generates an external CSS stylesheet.

Also, I first transform to a basic XHTML format, a data store. At the very end I do another transformation from the basic XHTML file to more advanced XHTML for presentation, generating TOC, footnote section, numbering, etc.

I will stop for now, but I am planning to publish tutorials for all the necessary XSLT templates in a docx2xhtml section at www.xmlplease.com.

Cheers,
Jesper Tverskov
www.xmlkurser.dk
www.xmlplease.com
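As an illustration of the consolidation step mentioned above (inline style attributes collapsed into CSS classes plus a stylesheet), here is a minimal sketch in Python rather than XSLT. The function names and the generated class names (s1, s2, ...) are hypothetical. It assumes simple declarations: splitting on ";" would break on values that contain a semicolon, and sorting the declarations assumes no property is repeated within one style attribute.

```python
import xml.etree.ElementTree as ET

# Keep the XHTML namespace as the default one when serializing.
ET.register_namespace("", "http://www.w3.org/1999/xhtml")


def normalize(style):
    """Canonical form of a style attribute, so equal styles compare equal."""
    decls = []
    for decl in style.split(";"):
        if ":" in decl:
            prop, value = decl.split(":", 1)
            decls.append(f"{prop.strip().lower()}: {value.strip()}")
    return "; ".join(sorted(decls))


def consolidate_styles(xhtml, prefix="s"):
    """Replace style attributes with class names.

    Returns (rewritten_xhtml, css_text). Hypothetical helper, for
    illustration only.
    """
    root = ET.fromstring(xhtml)
    classes = {}  # normalized declarations -> generated class name
    for el in root.iter():
        style = el.attrib.pop("style", None)
        key = normalize(style) if style else ""
        if not key:
            continue
        # Reuse the class if these declarations were seen before.
        name = classes.setdefault(key, f"{prefix}{len(classes) + 1}")
        existing = el.get("class")
        el.set("class", f"{existing} {name}" if existing else name)
    css = "\n".join(f".{name} {{ {key}; }}" for key, name in classes.items())
    return ET.tostring(root, encoding="unicode"), css
```

Doing this as a separate last pass keeps the earlier templates simple: they only have to produce a correct style attribute per element, and the grouping into classes happens once, over the finished XHTML.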