Re: WordML to XML

Subject: Re: WordML to XML
From: Vasu Nanjangud <vasdeep@xxxxxxxxx>
Date: Fri, 11 Feb 2005 18:14:57 -0800 (PST)
Joris, et al...

My requirement is specifically to convert WordML to
XML, i.e. strip out the WordML-specific tags but
retain the formatting instructions.

For example:
For a Word document whose content is "I have bold and
italics and underscore", this is the source WordML
document:
------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"
standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument
xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:w10="urn:schemas-microsoft-com:office:word"
xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core"
xmlns:aml="http://schemas.microsoft.com/aml/2001/core"
xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
w:macrosPresent="no" w:embeddedObjPresent="no"
w:ocxPresent="no"
xml:space="preserve"><o:DocumentProperties><o:Title>I
have bold and italics and
underscore</o:Title><o:Author>vnanjang</o:Author><o:LastAuthor>vnanjang</o:LastAuthor><o:Revision>1</o:Revision><o:TotalTime>0</o:TotalTime><o:Created>2005-02-12T01:52:00Z</o:Created><o:LastSaved>2005-02-12T01:52:00Z</o:LastSaved><o:Pages>1</o:Pages><o:Words>5</o:Words><o:Characters>34</o:Characters><o:Company>Oracle
Corporation</o:Company><o:Lines>1</o:Lines><o:Paragraphs>1</o:Paragraphs><o:CharactersWithSpaces>38</o:CharactersWithSpaces><o:Version>11.5604</o:Version></o:DocumentProperties><w:fonts><w:defaultFonts
w:ascii="Times New Roman" w:fareast="SimSun"
w:h-ansi="Times New Roman" w:cs="Times New
Roman"/><w:font w:name="SimSun"><w:altName
w:val="e.d="/><w:panose-1
w:val="02010600030101010101"/><w:charset
w:val="86"/><w:family w:val="Auto"/><w:pitch
w:val="variable"/><w:sig w:usb-0="00000003"
w:usb-1="080E0000" w:usb-2="00000010"
w:usb-3="00000000" w:csb-0="00040001"
w:csb-1="00000000"/></w:font><w:font
w:name="@SimSun"><w:panose-1
w:val="02010600030101010101"/><w:charset
w:val="86"/><w:family w:val="Auto"/><w:pitch
w:val="variable"/><w:sig w:usb-0="00000003"
w:usb-1="080E0000" w:usb-2="00000010"
w:usb-3="00000000" w:csb-0="00040001"
w:csb-1="00000000"/></w:font></w:fonts><w:styles><w:versionOfBuiltInStylenames
w:val="4"/><w:latentStyles w:defLockedState="off"
w:latentStyleCount="156"/><w:style w:type="paragraph"
w:default="on" w:styleId="Normal"><w:name
w:val="Normal"/><w:rPr><wx:font wx:val="Times New
Roman"/><w:sz w:val="24"/><w:sz-cs w:val="24"/><w:lang
w:val="EN-US" w:fareast="ZH-CN"
w:bidi="AR-SA"/></w:rPr></w:style><w:style
w:type="character" w:default="on"
w:styleId="DefaultParagraphFont"><w:name
w:val="Default Paragraph
Font"/><w:semiHidden/></w:style><w:style
w:type="table" w:default="on"
w:styleId="TableNormal"><w:name w:val="Normal
Table"/><wx:uiName wx:val="Table
Normal"/><w:semiHidden/><w:rPr><wx:font wx:val="Times
New Roman"/></w:rPr><w:tblPr><w:tblInd w:w="0"
w:type="dxa"/><w:tblCellMar><w:top w:w="0"
w:type="dxa"/><w:left w:w="108"
w:type="dxa"/><w:bottom w:w="0" w:type="dxa"/><w:right
w:w="108"
w:type="dxa"/></w:tblCellMar></w:tblPr></w:style><w:style
w:type="list" w:default="on"
w:styleId="NoList"><w:name w:val="No
List"/><w:semiHidden/></w:style></w:styles><w:docPr><w:view
w:val="print"/><w:zoom
w:percent="100"/><w:doNotEmbedSystemFonts/><w:proofState
w:spelling="clean"
w:grammar="clean"/><w:attachedTemplate
w:val=""/><w:defaultTabStop
w:val="720"/><w:characterSpacingControl
w:val="DontCompress"/><w:optimizeForBrowser/><w:validateAgainstSchema/><w:saveInvalidXML
w:val="off"/><w:ignoreMixedContent
w:val="off"/><w:alwaysShowPlaceholderText
w:val="off"/><w:compat><w:dontAllowFieldEndSelect/><w:applyBreakingRules/><w:useWord2002TableStyleRules/><w:useFELayout/></w:compat></w:docPr><w:body><wx:sect><w:p><w:pPr><w:rPr><w:b/><w:b-cs/><w:i/><w:i-cs/><w:u
w:val="single"/></w:rPr></w:pPr><w:r><w:rPr><w:b/><w:b-cs/><w:i/><w:i-cs/><w:color
w:val="000000"/><w:u w:val="single"/></w:rPr><w:t>I
have bold and italics and
underscore</w:t></w:r></w:p><w:sectPr><w:pgSz
w:w="12240" w:h="15840"/><w:pgMar w:top="1440"
w:right="1800" w:bottom="1440" w:left="1800"
w:header="720" w:footer="720" w:gutter="0"/><w:cols
w:space="720"/><w:docGrid
w:line-pitch="360"/></w:sectPr></wx:sect></w:body></w:wordDocument>
------------------------------------------------------



I need to write an XSLT stylesheet that will give me the
following output:
------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<b>
   <i>
     <u>
         I have bold and italics and underscore
     </u>
   </i>
</b>
------------------------------------------------------
Though this looks like HTML, HTML output is not what
I'm interested in. What I have shown here is a
simplification of my requirement. In reality, my
WordML document will contain some of my own custom tags,
and data like the above will appear inside those custom
tags.

For example, the XML output could be:
------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<vasuarticletag>
<b>
   <i>
     <u>
         I have bold and italics and underscore
     </u>
   </i>
</b>
</vasuarticletag>
------------------------------------------------------
So, I need help writing an XSLT stylesheet that will:
1. Traverse every "w:r" block.
2. Look for "w:rPr" tags with "w:i", "w:b", "w:u"
children.
3. If they exist, output <i>, <b>, <u> tags, then
output the contents of the corresponding "w:t" block,
and then close the <i>, <b>, <u> tags.
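
To make the three steps above concrete, here is a rough XSLT 1.0
sketch of one way they might be approached. It is only a starting
point, not a tested solution. The mode names "bold", "italic" and
"underline" are made up for illustration, and the <vasuarticletag>
wrapper is simply taken from the example output. It assumes that the
mere presence of "w:b", "w:i" or "w:u" under "w:rPr" means the
formatting is on, so it does not handle a property that is present
but switched off through its w:val attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    exclude-result-prefixes="w">

  <xsl:output method="xml" encoding="UTF-8" indent="no"/>

  <!-- Wrap the result in the custom element (name taken from the
       example) and visit only the runs inside w:body, so that
       document properties, fonts and styles are skipped. -->
  <xsl:template match="/">
    <vasuarticletag>
      <xsl:apply-templates select="//w:body//w:r"/>
    </vasuarticletag>
  </xsl:template>

  <!-- Step 1: every w:r enters a chain of modes: bold, italic, underline.
       The mode names are hypothetical; any names would do. -->
  <xsl:template match="w:r">
    <xsl:apply-templates select="." mode="bold"/>
  </xsl:template>

  <!-- Steps 2 and 3: emit <b> only when w:rPr has a w:b child -->
  <xsl:template match="w:r" mode="bold">
    <xsl:choose>
      <xsl:when test="w:rPr/w:b">
        <b><xsl:apply-templates select="." mode="italic"/></b>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="." mode="italic"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Emit <i> only when w:rPr has a w:i child -->
  <xsl:template match="w:r" mode="italic">
    <xsl:choose>
      <xsl:when test="w:rPr/w:i">
        <i><xsl:apply-templates select="." mode="underline"/></i>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="." mode="underline"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Emit <u> only when w:rPr has a w:u child, then copy the
       text of the run's w:t element(s) -->
  <xsl:template match="w:r" mode="underline">
    <xsl:choose>
      <xsl:when test="w:rPr/w:u">
        <u><xsl:apply-templates select="w:t/text()"/></u>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="w:t/text()"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Chaining the modes keeps the nesting order fixed (b, then i, then u)
and lets a run with only some of the three properties fall through
the others without any special casing. Custom tags around the runs
would need templates of their own.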

Requesting your help...

Regards,
Vasu


		