Subject: Cleaning up Word 2007 XML
Author: Paul Rayner
Date: 26 Jan 2009 10:20 AM
Hi,

I'm currently evaluating Stylus Studio and marklogic.com's XML Server with a view to using one or both within our company.

We need to clean up the Word XML in several thousand documents, and I noticed a script at:

http://developer.marklogic.com/columns/smallchanges/2007-12-18.xqy

which makes a start on this by combining adjacent 'runs' (w:r elements) that have identical formatting.

I am trying to make this script work within Stylus Studio, and have some problems I hope someone can help me with.

At the bottom of this post I have pasted my modified version of the script. It currently fails with error XPTY0004, reported at line 103:24, which is the space after '$this' in the return clause of the local:map function.

I'm really just getting started with this, and hope to end up generating Java code that can be run over a set of documents whenever needed. I'd appreciate any help anyone can give me.

Thanks,

Paul

code:

declare namespace w="http://schemas.openxmlformats.org/wordprocessingml/2006/main";

declare function local:ml-update-document-xml($doc as element(w:document)) as element(w:document)
{
  local:dispatch($doc)
};

declare function local:passthru($x as node()) as node()
{
  for $i in $x/node() return local:dispatch($i)
};

declare function local:dispatch ($x as node()) as node()
{
  typeswitch ($x)
    case element(w:p) return local:mergeruns($x)
    default return (
      element{fn:name($x)} {$x/@*,local:passthru($x)}
    )
};

declare function local:mergeruns($p as element(w:p)) as element(w:p)
{
  let $pPrvals := if(fn:exists($p/w:pPr)) then $p/w:pPr else ()
  return element w:p{ $pPrvals, local:map($p/w:r[1]) }
};

declare function local:descend($r as element(w:r)?, $rToCheck as element(w:rPr)?) as element(w:r)*
{
  if(fn:empty($r)) then ()
  else if(fn:deep-equal($r/w:rPr,$rToCheck)) then
    ($r, local:descend($r/following-sibling::w:r[1], $rToCheck))
  else ()
};

declare function local:map($r as element(w:r)?) as element(w:r)
{
  if (fn:empty ($r)) then ()
  else
    let $rToCheck := $r/w:rPr
    let $matches := local:descend($r/following-sibling::w:r[1], $rToCheck)
    let $count := fn:count($matches)
    let $this := if ($count) then
        (element w:r{ $rToCheck,
          element w:t { fn:string-join(($r/w:t, $matches/w:t),"") } })
      else $r
    return ($this, local:map( if($count) then ($r/following-sibling::w:r[1 + $count]) else $r/following-sibling::w:r[1]))
};

let $document :=
  <w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
    <w:body>
      <w:p>
        <w:pPr>
        </w:pPr>
        <w:r>
          <w:rPr>
            <w:i />
          </w:rPr>
          <w:t>Doctor Paul Pr</w:t>
        </w:r>
        <w:r>
          <w:rPr>
            <w:i />
          </w:rPr>
          <w:t>oteus, the man with the highe</w:t>
        </w:r>
        <w:r>
          <w:rPr>
            <w:i />
          </w:rPr>
          <w:t>st income in Ilium, drove his cheap and old Plymouth across the bridge to Homestead. </w:t>
        </w:r>
      </w:p>
    </w:body>
  </w:document>

return local:ml-update-document-xml($document)

Subject: Cleaning up Word 2007 XML
Author: Minollo I.
Date: 26 Jan 2009 10:55 AM
I didn't go through the whole logic of the XQuery, but to make it pass the static type checks that DataDirect XQuery performs, you need two changes:

declare function local:ml-update-document-xml($doc as element(w:document)) as element(w:document)
{
  local:dispatch($doc) treat as element(w:document)
};

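[Note the "treat as", which tells the static type checker to handle the node() returned by local:dispatch as an element(w:document); nothing is converted, and the type is still checked at run time. Alternatively, you can change the return type to just node().]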
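A minimal, self-contained sketch of the two options described in that note. The function local:identity is hypothetical and stands in for local:dispatch; only its general node() return type matters here. This is standard XQuery 1.0 but has not been run through DataDirect XQuery's static checks. Keep in mind that "treat as" does not convert the value: if the result is not a w:document element at run time, evaluation fails with XPDY0050.

```xquery
declare namespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

(: Hypothetical stand-in for local:dispatch: its declared return type
   is the general node(), which is what triggers the static type error. :)
declare function local:identity($x as node()) as node()
{
  $x
};

(: Option 1: keep the specific return type and assert it with "treat as".
   The static type becomes element(w:document); the value is unchanged. :)
declare function local:as-document($doc as element(w:document)) as element(w:document)
{
  local:identity($doc) treat as element(w:document)
};

(: Option 2: widen the declared return type to node() instead. :)
declare function local:as-node($doc as element(w:document)) as node()
{
  local:identity($doc)
};

(local:as-document(<w:document/>), local:as-node(<w:document/>))
```

Option 1 keeps callers' type information precise; option 2 is simpler, but callers then only know they received some node().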

...and...

declare function local:map($r as element(w:r)?) as element(w:r)?
{
  if (fn:empty ($r)) then ()
  else
    let $rToCheck := $r/w:rPr
    let $matches := local:descend($r/following-sibling::w:r[1], $rToCheck)
    let $count := fn:count($matches)
    let $this := if ($count) then
        (element w:r{ $rToCheck,
          element w:t { fn:string-join(($r/w:t/string(), $matches/w:t/string()),"") } })
      else $r
    return ($this, local:map( if($count) then ($r/following-sibling::w:r[1 + $count]) else $r/following-sibling::w:r[1]))
};

[Note that the function can return an empty sequence, so you need to add a "?" to the return type; note also the explicit use of string() with string-join().]
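One caveat, not raised in the thread: "?" allows at most one item, but local:map returns one w:r per group of identically formatted runs. In the sample paragraph every run is italic, so all three collapse into one run and "?" happens to be enough. A paragraph with mixed formatting produces several runs and would then fail the return-type check at run time. The sketch below declares element(w:r)* and otherwise follows the corrected version above. The recursive call is simplified, since [1 + $count] is already [1] when there are no matches. The three-run paragraph is made up for illustration.

```xquery
declare namespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

(: Unchanged from the original script: collects the following runs whose
   w:rPr is deep-equal to $rToCheck, stopping at the first one that differs. :)
declare function local:descend($r as element(w:r)?, $rToCheck as element(w:rPr)?) as element(w:r)*
{
  if (fn:empty($r)) then ()
  else if (fn:deep-equal($r/w:rPr, $rToCheck)) then
    ($r, local:descend($r/following-sibling::w:r[1], $rToCheck))
  else ()
};

(: As in the reply above, but declared element(w:r)* because it returns
   one run per group of identically formatted runs. :)
declare function local:map($r as element(w:r)?) as element(w:r)*
{
  if (fn:empty($r)) then ()
  else
    let $rToCheck := $r/w:rPr
    let $matches := local:descend($r/following-sibling::w:r[1], $rToCheck)
    let $count := fn:count($matches)
    let $this := if ($count) then
        element w:r { $rToCheck,
          element w:t { fn:string-join(($r/w:t/fn:string(), $matches/w:t/fn:string()), "") } }
      else $r
    (: The first run not merged into $this is the (1 + $count)-th sibling. :)
    return ($this, local:map($r/following-sibling::w:r[1 + $count]))
};

(: Two italic runs followed by a plain one. :)
let $p :=
  <w:p>
    <w:r><w:rPr><w:i/></w:rPr><w:t>Doctor Paul Pr</w:t></w:r>
    <w:r><w:rPr><w:i/></w:rPr><w:t>oteus, </w:t></w:r>
    <w:r><w:t>the man with the highest income</w:t></w:r>
  </w:p>
return element w:p { local:map($p/w:r[1]) }
```

The result should contain two runs: an italic w:r whose text is "Doctor Paul Proteus, ", followed by the plain run unchanged. With "?" as the return type, the same input would raise XPTY0004.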

Subject: Cleaning up Word 2007 XML
Author: Paul Rayner
Date: 27 Jan 2009 03:35 AM
Thank you, that works on a simple document embedded in the query. I'm now going through the code to make it work on a full Word XML document.

Just a thought: has anyone done this before? Are there any existing scripts for cleaning up the rubbish in Word XML files using Stylus Studio?
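On a full document.xml, the pass-through part of the original script is likely to break before the run merging does, for two reasons:

- local:passthru is declared "as node()", but an element with more than one child, such as a w:body with several paragraphs, yields several nodes and fails the return-type check.
- The default branch of local:dispatch applies element{fn:name($x)} to every non-paragraph node. For text or comment nodes fn:name() returns the empty string. For elements whose prefix is not declared in the query's prolog, the lexical name cannot be resolved. Both cases raise XQDY0074.

The following is a sketch of one possible fix. It assumes the input is the w:document element, for example fn:doc($uri)/w:document with $uri pointing at an extracted word/document.xml. Here local:mergeruns is a stand-in that copies the paragraph unchanged, and the x: namespace in the sample input is hypothetical. The sketch has not been checked against DataDirect XQuery's static typing, so a "treat as" may be needed, as in the earlier reply.

```xquery
declare namespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

(: Stand-in for local:mergeruns from the thread, so this sketch runs on
   its own; replace it with the real run-merging function. :)
declare function local:mergeruns($p as element(w:p)) as element(w:p)
{
  $p
};

(: Elements are rebuilt with fn:node-name(), which carries the namespace
   URI, so prefixes the query never declares still work. Text, comment and
   processing-instruction nodes are returned unchanged instead of being
   passed to a computed element constructor. :)
declare function local:dispatch($x as node()) as node()
{
  typeswitch ($x)
    case element(w:p) return local:mergeruns($x)
    case element() return
      element { fn:node-name($x) } { $x/@*, local:passthru($x) }
    default return $x
};

(: Declared node()* because an element usually has several children. :)
declare function local:passthru($x as node()) as node()*
{
  for $i in $x/node() return local:dispatch($i)
};

(: Hypothetical input: an element in a namespace the prolog does not
   declare, and a comment; the original dispatch fails on both. :)
local:dispatch(
  <w:document xmlns:x="urn:example:other">
    <w:body>
      <x:marker/>
      <!-- a comment -->
      <w:p><w:r><w:t>text</w:t></w:r></w:p>
    </w:body>
  </w:document>)
```

Note that local:dispatch is called on the w:document element rather than on the document node. The default branch returns nodes other than elements unchanged, so passing the document node itself would skip all processing.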

   