
RE: XPath Question (related to Java)

Subject: RE: XPath Question (related to Java)
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Mon, 25 Jun 2007 23:03:13 +0100
I would certainly tend to do this in XSLT unless I needed to (and had time
to) make it ultra-efficient, in which case a Java solution might be faster.

I would never attempt to hand-parse XML, but there are cases where combining
several XML documents into one big document "by hand" is perfectly OK,
including a bit of manipulation like stripping off the XML declaration - so
long as you are confident the files all use the same encoding, don't use
internal DTDs, and so on.

Michael Kay
http://www.saxonica.com/
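
A minimal sketch of the "by hand" combination described above, assuming the documents have already been read into Strings using one shared encoding. The class name XmlConcat, the method names, and the wrapper element name are hypothetical, chosen only for illustration. The sketch strips a leading XML declaration from each document and refuses any input containing a DOCTYPE, since the approach is only safe when no internal DTDs are involved.

```java
import java.util.List;

final class XmlConcat {

    // Removes a leading XML declaration (<?xml ... ?>) if one is present.
    // Processing instructions such as <?xml-stylesheet ...?> are left alone,
    // because the pattern requires whitespace directly after "<?xml".
    static String stripDeclaration(String xml) {
        return xml.replaceFirst("^\\s*<\\?xml\\s[^?]*\\?>\\s*", "");
    }

    // Wraps the documents in a single root element.
    // Assumption: every input was decoded with the same encoding by the caller.
    static String combine(String rootName, List<String> documents) {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(rootName).append('>');
        for (String doc : documents) {
            if (doc.contains("<!DOCTYPE")) {
                // Plain concatenation is not safe when a DTD is declared.
                throw new IllegalArgumentException("document declares a DOCTYPE");
            }
            sb.append(stripDeclaration(doc));
        }
        sb.append("</").append(rootName).append('>');
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Term><Entry>a</Entry></Term>";
        String b = "<Term><Entry>b</Entry></Term>";
        System.out.println(combine("dictionary", List.of(a, b)));
    }
}
```

Nothing here parses the XML, so the result is only well-formed if every input was well-formed to begin with.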

> -----Original Message-----
> From: Grant Slade [mailto:grant.slade@xxxxxxxxx] 
> Sent: 25 June 2007 00:33
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re:  XPath Question (related to Java)
> 
> Hi Michael - thanks for the heads up.  Maybe I can ask you 
> and the group a more general question.  What I was trying to 
> do was go through a file of dictionary terms, read in the 
> terms one at a time and then add them to a 3rd party native 
> xml database application that takes a well-formed xml 
> document (but in String format, thus my trying to get the 
> information from it in String format).  I have been trying to 
> be a good student of XML and learn the APIs, but I am 
> wondering if in some cases it is better to just parse it as a 
> string, such as in this case where it needs to retain the 
> tagging.  Or maybe XSLT would have been a better 
> option to go with from the beginning?
> 
> On 6/24/07, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> > In the XPath data model, you see nodes rather than markup. That's why
> > there's no "<" present. Instead, the Definition element will have a
> > child that is a <sub> element.
> >
> > Evaluating the expression as a string will give you the string value
> > of the node; this is the concatenation of all the contained text,
> > ignoring the markup.
> >
> > You seem to want to serialize the node as XML, to reinstate the markup.
> > There's no direct way of doing that in the XPath API; you probably
> > have to do an identity transformation from a DOMSource containing the
> > node to a StreamResult. (You'll have to change your call to retrieve a
> > NODESET rather than a STRING). Alternatively there may be a method
> > such as toXML() on the DOM Node object - I've forgotten.
> >
> > Michael Kay
> > http://www.saxonica.com/
> >
> > > -----Original Message-----
> > > From: Grant Slade [mailto:grant.slade@xxxxxxxxx]
> > > Sent: 24 June 2007 19:03
> > > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > > Subject:  XPath Question (related to Java)
> > >
> > > Hi, I have the following xml which gets read from a file as part of
> > > a Node:
> > >             <Definition> An organic compound in which the aldehyde
> > > group (HC=O) is connected to a branched or unbranched open chain of
> > > carbon atoms rather than a ring.
> > > Some aldehydes are created during the reactions of oxidants used as
> > > disinfectants, particularly ozone (O<sub>3</sub>), with natural
> > > organic matter. </Definition>
> > >
> > > When I run it through the following method  it ignores the
> > > <sub></sub>:
> > >       public String getDefinitionFromNode(Node node)
> > >                   throws javax.xml.xpath.XPathExpressionException
> > >       {
> > >             XPath xpath = XPathFactory.newInstance().newXPath();
> > >             String definitionExpression = "Definition";
> > >             String definition = (String) xpath.evaluate(
> > >                         definitionExpression, node, XPathConstants.STRING);
> > >             if(definition.contains("<"))
> > >                   System.out.println ("found a <");
> > >             else
> > >             {
> > >                   System.out.println ("did not find a <");
> > >             }
> > >             return definition;
> > >       }
> > >
> > > When the program runs, it outputs the following:
> > >
> > > did not find a <
> > > --------------------------------
> > > <dictionary n=""><TermName>aliphatic 
> > > aldehyde</TermName><Definition>An organic compound in which the 
> > > aldehyde group (HC=O) is connected to a branched or unbranched open
> > > chain of carbon atoms rather than a ring.
> > > Some aldehydes are created during the reactions of oxidants used as
> > > disinfectants, particularly ozone (O3), with natural organic 
> > > matter.</Definition></dictionary>
> > >
> > > How do I get it to output the <sub></sub> elements?
> > >
> > > The complete node is:
> > >         <Term>
> > >             <Entry> aliphatic aldehyde </Entry>
> > >             <Definition> An organic compound in which the aldehyde group (HC=O) is connected to a
> > >                 branched or unbranched open chain of carbon atoms rather than a ring. Some aldehydes
> > >                 are created during the reactions of oxidants used as disinfectants, particularly
> > >                 ozone (O<sub>3</sub>), with natural organic matter. </Definition>
> > >             <SeeAlso>disinfection by-product</SeeAlso>
> > >             <IMAGE fileName="A-17.gif"/>
> > >         </Term>
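
A minimal sketch of the identity transformation suggested in the 24 June reply, assuming the standard JAXP classes that ship with the JDK. The class name NodeSerializer and the method name definitionAsXml are hypothetical. The sketch asks XPath for XPathConstants.NODE rather than NODESET, because only the single Definition child is wanted. It then copies that node through a Transformer created without a stylesheet, which performs an identity transformation into a StreamResult backed by a StringWriter.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class NodeSerializer {

    // Returns the Definition child of the given node serialized as XML,
    // markup included, or null if there is no such child.
    static String definitionAsXml(Node term)
            throws XPathExpressionException, TransformerException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Ask for a node, not a string, so that child elements are preserved.
        Node definition = (Node) xpath.evaluate("Definition", term, XPathConstants.NODE);
        if (definition == null) {
            return null;
        }
        // A Transformer created with no stylesheet copies its input unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(definition), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Term><Entry>aliphatic aldehyde</Entry>"
                + "<Definition>ozone (O<sub>3</sub>)</Definition></Term>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        System.out.println(definitionAsXml(doc.getDocumentElement()));
    }
}
```

Setting OMIT_XML_DECLARATION keeps the output a plain element fragment, which is convenient when the string is handed on to another application. Whether that application expects a declaration is not stated in the thread.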
