
Re: where to look for xsl folk..

Subject: Re: where to look for xsl folk..
From: "adam adam@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 3 Jul 2016 16:42:53 -0000

I also ran a series of tests because I was particularly focused on avoiding
character loss. The tests looked good, but if you know of any cases where
character loss happens, I'd be very interested to learn more.


On July 3, 2016 9:13:02 AM PDT, "Terry Badger terry_badger@xxxxxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>The document.xml files I have found and worked with, taken from .docx
>files, always have a prolog with encoding="UTF-8", so I have not worried
>about invalid Unicode characters and can process any text in Word using
>an xsl stylesheet.
>Do you have a sample where a docx file has non-Unicode encodings?
>Word does have some difficult structures but nothing impossible with
>xsl so far.
>On Sunday, July 3, 2016 11:14 AM, "Graydon graydon@xxxxxxxxx"
><xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>On Tue, Jun 21, 2016 at 03:42:05AM -0000, adam adam@xxxxxxxxxxxxxxx
>> Rather I am looking to convert docx to HTML with xsl. No magic
>> Good enough HTML is good enough. I was looking for someone to help me
>> build this as well structured stylesheets that can be extended later.
>The really tough problem here is not "did I get good enough HTML?"; it is
>"did any important bits of the text get lost during conversion?"  That
>one's brutal.
>The sanity-preserving way to do this is to use Libre Office to convert
>the docx to Open Document and to go from Open Document XML. The Libre
>Office "Save as HTML" facility is likely better than anything you can
>write in reasonable time; I'd be looking to take that HTML and tidy it
>to meet specific project requirements with XSLT.  (There are API hooks
>for doing this in both OpenOffice and LibreOffice.  There are hooks for
>applying XSLT as part of that process, too.)
>I can't tell you what you want to do, but I desperately do not want to
>address docx with XSLT directly, because then I, and not someone else,
>will be trying to handle the encoding issues (since XML
>I-think-version-five, the awkward cp1252 characters like 0x97 (em-dash)
>and the smart quotes are legal XML characters, but they're not Unicode
>anything; parsing won't find them for you anymore), the specific
>peculiarities of an undocumented format intended (for sound commercial
>reasons) to be nigh-impossible to convert to other formats, or the
>various "it did what with the end notes? It displays end notes, where
>are they in the file?" problems you can hit with academic writing.
>-- Graydon
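
The two encoding points in the quoted messages can both be checked mechanically. The sketch below is only an illustration, with hypothetical names (declared_encoding, suspicious_c1). It reads the encoding from the XML declaration of a part, then lists any characters in U+0080 to U+009F. Unicode defines that range as C1 control characters, and the XML 1.0 Char production permits them, so a parser accepts them silently. In practice they usually mean a cp1252 byte such as 0x97 was carried over without conversion. The declaration regex only handles ASCII-compatible encodings, and the decode falls back to UTF-8 when no encoding is declared.

```python
import io
import re
import zipfile

# C1 control range: legal in XML 1.0, but almost never intended as text.
C1_CONTROLS = re.compile("[\u0080-\u009f]")

# Encoding pseudo-attribute of an XML declaration (ASCII-compatible input only).
XML_DECL_ENCODING = re.compile(
    rb'''^<\?xml[^>]*\bencoding\s*=\s*["']([A-Za-z0-9._-]+)["']'''
)


def declared_encoding(xml_bytes: bytes):
    """Return the encoding named in the XML declaration, or None."""
    if xml_bytes.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        xml_bytes = xml_bytes[3:]
    m = XML_DECL_ENCODING.match(xml_bytes)
    return m.group(1).decode("ascii") if m else None


def suspicious_c1(docx_bytes: bytes, part: str = "word/document.xml"):
    """List (offset, code point, cp1252 reading) for each C1 control
    character found in the given part of a docx archive."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as z:
        raw = z.read(part)
    text = raw.decode(declared_encoding(raw) or "utf-8")
    hits = []
    for m in C1_CONTROLS.finditer(text):
        code = ord(m.group())
        # What the same byte value would have meant in cp1252;
        # unassigned cp1252 bytes come back as U+FFFD.
        guess = bytes([code]).decode("cp1252", errors="replace")
        hits.append((m.start(), f"U+{code:04X}", guess))
    return hits
```

An empty list from suspicious_c1 is consistent with the experience reported above that document.xml text is clean UTF-8. A non-empty list gives the offsets worth inspecting by hand.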

Sent from my Android device with K-9 Mail. Please excuse my brevity.
