
Re: creating a collection from an archive

Subject: Re: creating a collection from an archive
From: "Terry Badger terry_badger@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 19 Apr 2018 20:52:35 -0000
Here is part of the solution you want, I think. I believe I used the various
namespaces later. This works in the current version of Oxygen. There is an
XML file in the Word archive that is a manifest of all the files in the zip
(the Word document); you could extract that and then use it to get the names
of the other files.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs"
    version="2.0" xmlns:file="java.io.File"
    xmlns:StringUtils="java:org.apache.commons.lang.StringUtils"
    xmlns:class="http://saxon.sf.net/java-type"
    xmlns:ZipFile="java.util.zip.ZipFile"
    xmlns:ZipInputStream="java.util.zip.ZipInputStream">
  <xsl:output indent="yes"/>
  <!-- ========================================= -->
  <xsl:template name="main" match="/">
    <xsl:variable name="doc-content"
        select="doc('jar:file:///G:/Badger/xslt-with-java/XML_Projects.docx!/word/document.xml')"/>
    <xsl:result-document href="document.xml">
      <xsl:copy-of select="$doc-content"/>
    </xsl:result-document>
  </xsl:template>
</xsl:stylesheet>
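
The manifest idea could be sketched roughly as follows. This is untested and
rests on several assumptions: that the manifest meant here is the package's
[Content_Types].xml, that the jar: resolver accepts the brackets in that name
when percent-encoded, and that every part listed in an Override element is
XML. The ct prefix, the $archive variable, and the parts/part output elements
are made up for illustration.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ct="http://schemas.openxmlformats.org/package/2006/content-types"
    exclude-result-prefixes="ct" version="2.0">
  <xsl:output indent="yes"/>
  <!-- Assumption: same archive as above; note the trailing "!" -->
  <xsl:variable name="archive"
      select="'jar:file:///G:/Badger/xslt-with-java/XML_Projects.docx!'"/>
  <xsl:template name="main" match="/">
    <!-- Assumption: the brackets may need to be percent-encoded like this -->
    <xsl:variable name="manifest"
        select="doc(concat($archive, '/%5BContent_Types%5D.xml'))"/>
    <parts>
      <!-- Each Override names one part, e.g. /word/document.xml -->
      <xsl:for-each select="$manifest/ct:Types/ct:Override/@PartName">
        <!-- Load the part by name and report its root element -->
        <part name="{.}" root="{name(doc(concat($archive, .))/*)}"/>
      </xsl:for-each>
    </parts>
  </xsl:template>
</xsl:stylesheet>

Parts covered only by a Default extension mapping (images, for example) do
not appear as Override elements, so this would not list literally every
file in the zip.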

Terry

    On Thursday, April 19, 2018 02:07:56 PM, Graydon graydon@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

So I have a Word document, localtest.docx, which is in the 2016 strict
version of the OOXML standard.  As such, it's a zip archive of a bunch
of XML files.  I want to apply XSLT to the XML files.

I could use the arch module and the collection function to write the whole
thing to disk and then load it from disk as a collection before doing
whatever to it and writing it to disk as an archive again, but this seems
inefficient.  It would be better to read the archive into an in-memory
collection, manipulate it, and then write that back out as an archive.

I'm using XSLT 3.0 via Saxon 9.8.0.8 in oXygen.

<xsl:variable name="wordArchive" as="document-node()+">
  <xsl:variable name="arch" select="file:read-binary($wordArchiveURI)"/>
  <xsl:variable name="entries" select="arch:entries($arch)"/>
  <xsl:variable name="dirs" select="$entries[ends-with(.,'/')]"/>
  <xsl:sequence select="for $x in ($entries except $dirs)
                        return arch:extract-text($arch,$x) => parse-xml()" />
</xsl:variable>

works, in that I get a sequence of document nodes and those documents have
the expected XML content.

I don't get document nodes with associated document-uri() values or any of
the rest of the archive structure.  Those URIs are in the values returned by
arch:entries but I'm not seeing how I assign a document-uri value to a
document node.  xsl:document doesn't seem to have a facility for assigning a
document-uri value and of course you can't create an attribute whose parent
is a document node even if document-uri was an attribute in the first place.

What I want is a collection where the structure matches the Word archive,
various subdirectories and all, and I can use the doc() function to access
various component documents.  I can't shake the feeling that I'm missing
something obvious, but this feeling is no help in discerning what the obvious
thing is!

Thanks!
Graydon
