Here is part of the solution you want, I think. I think I only used the
various namespaces later on. This works in the current version of oXygen.
There is an XML file in the Word archive that is a manifest of all the files
in the zip (the .docx), and you could extract that and then use it to get the
names of the other files.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs"
    version="2.0" xmlns:file="java.io.File"
    xmlns:StringUtils="java:org.apache.commons.lang.StringUtils"
    xmlns:class="http://saxon.sf.net/java-type"
    xmlns:ZipFile="java.util.zip.ZipFile"
    xmlns:ZipInputStream="java.util.zip.ZipInputStream">
    <xsl:output indent="yes"/>
    <!-- ========================================= -->
    <xsl:template name="main" match="/">
        <!-- The jar: URI addresses one entry inside the .docx zip archive. -->
        <xsl:variable name="doc-content"
            select="doc('jar:file:///G:/Badger/xslt-with-java/XML_Projects.docx!/word/document.xml')"/>
        <xsl:result-document href="document.xml">
            <xsl:copy-of select="$doc-content"/>
        </xsl:result-document>
    </xsl:template>
</xsl:stylesheet>
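The idea of reading one part to find the names of the others could be
sketched along these lines. This is an untested sketch, not something I have
run: it assumes the same jar: URI trick as above, and it reads the
relationships part word/_rels/document.xml.rels rather than the manifest
itself, since that part lists the files that document.xml refers to. The
docx-uri parameter and the parts/part output elements are names made up for
illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rel="http://schemas.openxmlformats.org/package/2006/relationships"
    exclude-result-prefixes="rel" version="2.0">
    <xsl:output indent="yes"/>
    <!-- Hypothetical parameter: jar: URI of the archive, up to and including the "!". -->
    <xsl:param name="docx-uri"
        select="'jar:file:///G:/Badger/xslt-with-java/XML_Projects.docx!'"/>
    <xsl:template name="main" match="/">
        <!-- Assumption: the archive contains word/_rels/document.xml.rels. -->
        <xsl:variable name="rels"
            select="doc(concat($docx-uri, '/word/_rels/document.xml.rels'))"/>
        <parts>
            <!-- Skip external targets such as hyperlinks; they are not files in the zip. -->
            <xsl:for-each
                select="$rels/rel:Relationships/rel:Relationship[not(@TargetMode = 'External')]">
                <!-- Target is normally relative to the word/ folder. -->
                <part target="{@Target}"/>
            </xsl:for-each>
        </parts>
    </xsl:template>
</xsl:stylesheet>
```

Each target value could then be appended to the jar: URI and passed to doc()
in the same way as word/document.xml above, provided the target is an XML part
and not an image or other binary.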
Terry
On Thursday, April 19, 2018 02:07:56 PM, Graydon graydon@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
So I have a Word document, localtest.docx, which is in the 2016 strict
version of the OOXML standard. As such, it's a zip archive of a bunch
of XML files. I want to apply XSLT to the XML files.
I could use the arch module and the collection function to write the whole
thing to disk and then load it from disk as a collection before doing
whatever to it and writing it to disk as an archive again, but this seems
inefficient. It would be better to read the archive into an in-memory
collection, manipulate it, and then write that back out as an archive.
I'm using XSLT 3.0 via Saxon 9.8.0.8 in oXygen.
<xsl:variable name="wordArchive" as="document-node()+">
  <xsl:variable name="arch" select="file:read-binary($wordArchiveURI)"/>
  <xsl:variable name="entries" select="arch:entries($arch)"/>
  <xsl:variable name="dirs" select="$entries[ends-with(.,'/')]"/>
  <xsl:sequence select="for $x in ($entries except $dirs)
                        return arch:extract-text($arch,$x) => parse-xml()" />
</xsl:variable>
works, in that I get a sequence of document nodes and those documents have
the expected XML content.
I don't get document nodes with associated document-uri() values or any of
the rest of the archive structure. Those URIs are in the values returned by
arch:entries but I'm not seeing how I assign a document-uri value to a
document node. xsl:document doesn't seem to have a facility for assigning a
document-uri value and of course you can't create an attribute whose parent
is a document node even if document-uri was an attribute in the first place.
What I want is a collection where the structure matches the Word archive,
various subdirectories and all, and I can use the doc() function to access
various component documents. I can't shake the feeling that I'm missing
something obvious, but this feeling is no help in discerning what the obvious
thing is!
Thanks!
Graydon