
Re: openxml and XSL-T 2.0

Subject: Re: openxml and XSL-T 2.0
From: "Jesper Tverskov" <jesper@xxxxxxxxxxx>
Date: Wed, 31 Oct 2007 09:29:48 +0100
Transforming WordprocessingML to XHTML can mean two different things.
One is a file format to file format transformation, where any
information in the docx also ends up in the XHTML file; that is none
of my business. The other is setting up restrictions for the author
and transforming selected functionality of MS Word to beautiful web
pages.

For me it is important that the XHTML lives up to all guidelines for
usability, accessibility and best web practices.

There should be as few restrictions as possible. Basically I only
demand that MS Word's styles for heading levels be used. They are
easy to transform to h1-h6.
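
To illustrate how simple the heading mapping is, here is a minimal
sketch in Python rather than XSLT. It assumes the built-in style ids
are "Heading1" to "Heading6", as in an English-language MS Word
(localized versions may use other ids), and the function name
heading_tag is made up for this example. The same test fits naturally
in an XSLT template matching w:p.

```python
import re
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by the docx 2007 format.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def heading_tag(paragraph):
    """Return 'h1'..'h6' for a w:p that uses a built-in heading style, else 'p'."""
    style = paragraph.find(f"{{{W}}}pPr/{{{W}}}pStyle")
    if style is not None:
        # Assumption: style ids follow the English "HeadingN" pattern.
        match = re.fullmatch(r"Heading([1-6])", style.get(f"{{{W}}}val", ""))
        if match:
            return "h" + match.group(1)
    return "p"

SAMPLE = f'<w:p xmlns:w="{W}"><w:pPr><w:pStyle w:val="Heading2"/></w:pPr></w:p>'
print(heading_tag(ET.fromstring(SAMPLE)))  # prints h2
```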

Also, I simply ignore a lot of stuff that is irrelevant to web pages,
like headers and footers, page margins, etc. I would accept embedded
Excel content and convert it to tables. In my opinion it is not that
difficult to cover the functionality in MS Word that most people use
most of the time. There is also a lot of stuff, like drawings, that
could be converted to SVG, but I am not doing that at the moment: I
only transform to what will work in a standard browser.

I know of no tutorials except at Microsoft's own websites, and
http://openxmldeveloper.org.

I learned a lot from books about XML in MS Office 2003, but I don't
know of any books for 2007. Most of the 2003 material about
WordprocessingML is still relevant.

In my opinion only the zipped docx 2007 format is what should be
transformed, but to get started with the XSLT templates and the
learning process one should probably use the "one xml file" file
format that MS Word 2007 can also save to.

XMLSpy 2008 has excellent support for the zipped formats. The other
XML editors will probably get it soon. The programming languages you
use probably already have classes to handle the zipped formats. In
.NET you must install Framework 3.0 and get to know the classes that
can open, read and create the zipped XML formats.
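
As an example of how little is needed to get at the XML inside the
zip, here is a minimal sketch using Python's standard zipfile module.
It assumes the main document part sits at word/document.xml, which is
the usual location but is strictly speaking found through the
package's relationship files. The helper name read_main_part is made
up for this example.

```python
import io
import zipfile

def read_main_part(docx):
    """Return the bytes of word/document.xml from a .docx path or file object."""
    with zipfile.ZipFile(docx) as package:
        # Assumption: the main part is stored under this conventional name.
        return package.read("word/document.xml")

# Build a tiny stand-in package in memory so the sketch runs without a real file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("word/document.xml", "<document/>")

print(read_main_part(buffer))
```

With a real file you would call read_main_part("report.docx") and
hand the result to your XSLT processor.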

You should go for a handful of transformations, not just one. When
saved, WordprocessingML keeps track of a lot of information that is in
most cases completely irrelevant to the transformation, like what
words have been spell-checked, what language MS Word thinks a word
belongs to, etc. Start with a modified identity transform that gets
rid of as much of the irrelevant stuff as you discover along the way,
to make the WordprocessingML files easier to read.
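
A minimal sketch of such a clean-up pass, written in Python instead of
XSLT just to show the idea: everything is kept except a few kinds of
noise. The choice of what to drop here (w:proofErr, w:lang and the
w:rsid* attributes) is only an assumption about what counts as
irrelevant; your own list will grow as you go. Unlike a real identity
transform, this version modifies the tree in place.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumption: these elements carry nothing needed for the XHTML output.
NOISE = {f"{{{W}}}proofErr", f"{{{W}}}lang"}

def strip_noise(root):
    """Keep everything except proofing marks, language tags and rsid attributes."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in NOISE:
                parent.remove(child)
        for name in list(parent.attrib):
            if name.startswith(f"{{{W}}}rsid"):
                del parent.attrib[name]
    return root

SAMPLE = (
    f'<w:p xmlns:w="{W}" w:rsidR="00A1">'
    '<w:proofErr w:type="spellStart"/>'
    '<w:r><w:rPr><w:lang w:val="en-US"/></w:rPr><w:t>Hello</w:t></w:r>'
    '<w:proofErr w:type="spellEnd"/>'
    '</w:p>'
)
cleaned = strip_noise(ET.fromstring(SAMPLE))
# List the local names of the elements that survive the pass.
print([element.tag.split("}")[1] for element in cleaned.iter()])
```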

You also need to transform a lot of the details in MS Word to CSS. I
transform them to the style attribute of each XHTML element; that is,
you must make templates that can handle, in one go, anything that
should go into the style attribute for a specific element. At the very
end I make another transformation that consolidates all the style
attributes into CSS classes and generates an external CSS stylesheet.
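
The consolidation step could look roughly like the following Python
sketch (again only a stand-in for an XSLT pass). Each distinct style
attribute value gets one generated class. The class names s1, s2, ...
and the function name consolidate_styles are invented for this
example, and for brevity any existing class attribute is overwritten
and the XHTML namespace is left out.

```python
import xml.etree.ElementTree as ET

def consolidate_styles(root):
    """Swap each style attribute for a generated class; return the CSS rules as text."""
    classes = {}  # style declaration -> class name, in first-seen order
    for element in root.iter():
        style = element.attrib.pop("style", None)
        if style is None:
            continue
        # Reuse the class if this exact declaration was seen before.
        name = classes.setdefault(style, f"s{len(classes) + 1}")
        element.set("class", name)
    return "\n".join(f".{name} {{ {style} }}" for style, name in classes.items())

SAMPLE = (
    '<body>'
    '<p style="color: red">a</p>'
    '<p style="font-weight: bold">b</p>'
    '<p style="color: red">c</p>'
    '</body>'
)
page = ET.fromstring(SAMPLE)
css = consolidate_styles(page)
print(css)                                   # the external stylesheet
print(ET.tostring(page, encoding="unicode"))  # the page, now using classes
```

Matching on the exact attribute text is the simplest possible rule; it
works best when the earlier templates always emit the properties in
the same order.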

Also, I first transform to a basic XHTML format that serves as a data
store. At the very end I do another transformation that turns the
basic XHTML file into more advanced XHTML for presentation, generating
the TOC, footnote section, numbering, etc.
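
As a rough idea of what that last pass does, here is a Python sketch
that builds a flat TOC from the headings of the basic XHTML. The id
scheme sec1, sec2, ..., the class name toc and the function name
add_toc are assumptions made for this example; a real version would
nest the list by heading level and work in the XHTML namespace.

```python
import xml.etree.ElementTree as ET

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

def add_toc(body):
    """Give every heading an id and insert a linked list of headings at the top."""
    toc = ET.Element("ul", {"class": "toc"})
    headings = [element for element in body.iter() if element.tag in HEADINGS]
    for number, heading in enumerate(headings, start=1):
        heading.set("id", f"sec{number}")
        item = ET.SubElement(toc, "li")
        link = ET.SubElement(item, "a", {"href": f"#sec{number}"})
        link.text = "".join(heading.itertext())
    body.insert(0, toc)
    return body

SAMPLE = '<body><h1>Intro</h1><p>Text</p><h2>Setup</h2></body>'
page = add_toc(ET.fromstring(SAMPLE))
print(ET.tostring(page, encoding="unicode"))
```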

I will stop for now but I am planning to publish tutorials for all the
necessary XSLT templates in a docx2xhtml section at www.xmlplease.com.



Cheers,
Jesper Tverskov

www.xmlkurser.dk
www.xmlplease.com
