Subject: Cleaning up Word 2007 xml
Author: Paul Rayner
Date: 26 Jan 2009 10:20 AM
Hi,

I'm currently evaluating Stylus Studio and marklogic.com's XML server with a view to using one or both within our company.

We need to clean up the Word XML in several thousand documents, and I noticed a script at:

http://developer.marklogic.com/columns/smallchanges/2007-12-18.xqy

which makes a start towards this by combining 'runs'.

I am trying to make this script work within Stylus Studio, and have run into some problems I hope someone can help me with.

At the bottom of this post I have pasted my modified version of the script. It currently produces the following error: XPTY0004, at line 103:24, which is the space after '$this' on the return statement of the local:map function.

I'm really just getting started with this, and hope to end up generating Java code which can be run over a set of documents whenever needed. I'd appreciate any help anyone can give me.

Thanks,

Paul

code:

declare namespace w="http://schemas.openxmlformats.org/wordprocessingml/2006/main";



declare function local:ml-update-document-xml($doc as element(w:document)) as element(w:document)

{

local:dispatch($doc)

};



declare function local:passthru($x as node()) as node()

{

for $i in $x/node() return local:dispatch($i)

};



declare function local:dispatch ($x as node()) as node()

{



typeswitch ($x)

case element(w:p) return local:mergeruns($x)

default return (

element{fn:name($x)} {$x/@*,local:passthru($x)}

)

};



declare function local:mergeruns($p as element(w:p)) as element(w:p)

{

let $pPrvals := if(fn:exists($p/w:pPr)) then $p/w:pPr else ()

return element w:p{ $pPrvals, local:map($p/w:r[1]) }



};



declare function local:descend($r as element(w:r)?, $rToCheck as element(w:rPr)?) as element(w:r)*

{

if(fn:empty($r)) then ()

else if(fn:deep-equal($r/w:rPr,$rToCheck)) then

($r, local:descend($r/following-sibling::w:r[1], $rToCheck))

else ()

};



declare function local:map($r as element(w:r)?) as element(w:r)

{

if (fn:empty ($r)) then ()

else

let $rToCheck := $r/w:rPr



let $matches := local:descend($r/following-sibling::w:r[1], $rToCheck)

let $count := fn:count($matches)



let $this := if ($count) then

(element w:r{ $rToCheck,

element w:t { fn:string-join(($r/w:t, $matches/w:t),"") } })

else $r



return ($this, local:map( if($count) then ($r/following-sibling::w:r[1 + $count]) else $r/following-sibling::w:r[1]))

};



let $document :=

<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">

<w:body>

<w:p>

<w:pPr>

</w:pPr>

<w:r>

<w:rPr>

<w:i />

</w:rPr>

<w:t>Doctor Paul Pr</w:t>

</w:r>

<w:r>

<w:rPr>

<w:i />

</w:rPr>

<w:t>oteus, the man with the highe</w:t>

</w:r>

<w:r>

<w:rPr>

<w:i />

</w:rPr>

<w:t>st income in Ilium, drove his cheap and old Plymouth across the bridge to Homestead. </w:t>

</w:r>

</w:p>

</w:body>
</w:document>



return local:ml-update-document-xml($document)

Subject: Cleaning up Word 2007 xml
Author: Minollo I.
Date: 26 Jan 2009 10:55 AM
I didn't go through the whole logic of the XQuery, but to make it pass the static type checks that DataDirect XQuery performs, you need two changes:

declare function local:ml-update-document-xml($doc as element(w:document)) as element(w:document)
{
  local:dispatch($doc) treat as element(w:document)
};

[Note the "treat as", which asserts that the node() returned by local:dispatch is an element(w:document); alternatively, you can change the return type to just node()]
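
As a small illustration of what "treat as" does (a sketch only, not part of the original script; local:identity and the <doc> element are hypothetical names): the expression leaves the value unchanged and only asserts its type. Per the XQuery specification, a dynamic error (err:XPDY0050) is raised if the value does not match at run time.

```xquery
(: Hypothetical helper whose declared return type is only node() :)
declare function local:identity($x as node()) as node()
{
  $x
};

let $doc := <doc><p>text</p></doc>
(: The static type of the call is node(); "treat as" narrows it to
   element(doc) without copying or converting the node. :)
let $typed as element(doc) := local:identity($doc) treat as element(doc)
return fn:name($typed)  (: evaluates to "doc" :)
```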

...and...

declare function local:map($r as element(w:r)?) as element(w:r)?
{
  if (fn:empty ($r)) then ()
  else
    let $rToCheck := $r/w:rPr
    let $matches := local:descend($r/following-sibling::w:r[1], $rToCheck)
    let $count := fn:count($matches)
    let $this := if ($count) then
        (element w:r{ $rToCheck,
          element w:t { fn:string-join(($r/w:t/string(), $matches/w:t/string()),"") } })
      else $r
    return ($this, local:map( if($count) then ($r/following-sibling::w:r[1 + $count]) else $r/following-sibling::w:r[1]))
};

[Note that the function can return an empty sequence, which means you need to add a "?" to the return type; also note the explicit use of string() when you call string-join()]
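
One possible generalization, offered as an untested sketch rather than as part of the reply above: local:map returns the sequence ($this, local:map(...)), so a paragraph whose runs do not all share the same w:rPr yields more than one w:r. Under the XQuery function conversion rules that no longer matches "element(w:r)?", so the "*" occurrence indicator may be the safer return type. The sample paragraph below is hypothetical, and how strictly DataDirect XQuery checks this statically is an assumption that has not been verified here.

```xquery
declare namespace w="http://schemas.openxmlformats.org/wordprocessingml/2006/main";

(: Collects the consecutive runs, starting at $r, whose w:rPr equals $rToCheck. :)
declare function local:descend($r as element(w:r)?, $rToCheck as element(w:rPr)?) as element(w:r)*
{
  if (fn:empty($r)) then ()
  else if (fn:deep-equal($r/w:rPr, $rToCheck)) then
    ($r, local:descend($r/following-sibling::w:r[1], $rToCheck))
  else ()
};

(: Assumption: "*" instead of "?" so that several (merged) runs can be returned. :)
declare function local:map($r as element(w:r)?) as element(w:r)*
{
  if (fn:empty($r)) then ()
  else
    let $rToCheck := $r/w:rPr
    let $matches := local:descend($r/following-sibling::w:r[1], $rToCheck)
    let $count := fn:count($matches)
    let $this :=
      if ($count) then
        element w:r { $rToCheck,
          element w:t { fn:string-join(($r/w:t/string(), $matches/w:t/string()), "") } }
      else $r
    (: 1 + $count is 1 when nothing was merged, so one expression covers both cases. :)
    return ($this, local:map($r/following-sibling::w:r[1 + $count]))
};

(: Hypothetical sample: two italic runs followed by one bold run. :)
let $p :=
  <w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
    <w:r><w:rPr><w:i/></w:rPr><w:t>Doctor </w:t></w:r>
    <w:r><w:rPr><w:i/></w:rPr><w:t>Paul</w:t></w:r>
    <w:r><w:rPr><w:b/></w:rPr><w:t> Proteus</w:t></w:r>
  </w:p>
return local:map($p/w:r[1])
```

With this input the two italic runs are merged into a single w:r and the bold run is passed through unchanged, so the call returns two w:r elements, which the "?" return type would reject.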

Subject: Cleaning up Word 2007 xml
Author: Paul Rayner
Date: 27 Jan 2009 03:35 AM
Thank you - that works on a simple document embedded in the query. I'm now going through the code to make it work on a full Word XML document.

Just a thought: has anyone done this before? Are there any existing scripts for cleaning up the rubbish in Word XML files using Stylus Studio?
