
RE: Parsing address data from PAR and BREAK

Subject: RE: Parsing address data from PAR and BREAK
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Mon, 26 Jan 2009 21:36:27 -0000
For this kind of problem an awful lot depends on how regular the input is -
how closely do the other examples you have to process match the example you
have shown us? For example, extracting the "zipcode" is going to be quite
difficult if the data includes addresses from a variety of different
countries with different conventional address formats.

For the data you've shown, it's something like this:

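 <xsl:template match="par">
   <!-- In the sample, the address text nodes are children of the first run, not of par -->
   <xsl:variable name="lines" select="run[1]/text()[normalize-space()]"/>
   <address><xsl:value-of select="normalize-space($lines[1])"/></address>
   <!-- The * sits inside group 2 so the group captures the whole state, not just its last character -->
   <xsl:analyze-string select="normalize-space($lines[2])"
                       regex="^([^,]*),([^0-9]*)([0-9]*)$">
     <xsl:matching-substring>
       <city><xsl:value-of select="normalize-space(regex-group(1))"/></city>
       <state><xsl:value-of select="normalize-space(regex-group(2))"/></state>
       <zipcode><xsl:value-of select="normalize-space(regex-group(3))"/></zipcode>
     </xsl:matching-substring>
   </xsl:analyze-string>
   <country><xsl:value-of select="normalize-space($lines[3])"/></country>
 </xsl:template>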

But as I say, this isn't going to be very robust if your data varies much.
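
If it helps, here is a minimal sketch of a slightly more defensive variant. It assumes US-style "City, ST 12345" lines with a two-letter state and a five-digit zip, and it reads the text nodes from the first run, which is where they sit in the sample input. The unparsed element is a hypothetical name of my own; it just keeps any line the pattern rejects instead of dropping it silently. Treat it as a starting point under those assumptions rather than a finished solution.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="par">
    <!-- Non-blank text nodes of the first run: street, city line, country -->
    <xsl:variable name="lines" select="run[1]/text()[normalize-space()]"/>
    <address><xsl:value-of select="normalize-space($lines[1])"/></address>
    <!-- regex is an attribute value template, so {2} must be written {{2}} -->
    <xsl:analyze-string select="normalize-space($lines[2])"
                        regex="^([^,]+),\s*([A-Z]{{2}})\s+([0-9]{{5}})$">
      <xsl:matching-substring>
        <city><xsl:value-of select="normalize-space(regex-group(1))"/></city>
        <state><xsl:value-of select="regex-group(2)"/></state>
        <zipcode><xsl:value-of select="regex-group(3)"/></zipcode>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Hypothetical element: preserves a line the pattern did not match -->
        <unparsed><xsl:value-of select="."/></unparsed>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
    <country><xsl:value-of select="normalize-space($lines[3])"/></country>
  </xsl:template>
</xsl:stylesheet>
```

Because the pattern is anchored at both ends, a line either matches as a whole or falls through to xsl:non-matching-substring as a whole, so you can search the output for unparsed elements to see which addresses need a different rule.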

Michael Kay
http://www.saxonica.com/
 

> -----Original Message-----
> From: Karl Forsyth [mailto:wd@xxxxxxxxx] 
> Sent: 26 January 2009 21:23
> To: XSL List
> Subject:  Parsing address data from PAR and BREAK
> 
> Greetings,
> 
> I'm relatively new to XSLT. I need to extract legacy data 
> from an XML representation of rich-text, and am having 
> difficulty parsing around the <break> element. 
> Specifically, I'm trying to reliably parse address 
> information from this:
> 
> ...
> <tablecell borderwidth='0px'>
> <par def='23'><run><font size='9pt' name='Arial' 
> truetype='false' familyid='10'/>
> 123 E. Main Street<break/>Anytown, ST 12355<break/>USA</run> 
> <run><font size='9pt' style='bold' name='Arial' 
> truetype='false' familyid='10' color='navy'/> </run> 
> </par> </tablecell> ...
> 
> ...to this:
> 
> <address>123 E. Main Street</address>
> <city>Anytown</city>
> <state>ST</state>
> <zipcode>12355</zipcode>
> <country>USA</country>
> 
> I'm using the Altova XSLT 2.0 engine. I've been poking around 
> trying to find how this might be done, but am coming up 
> short. Any suggestions will be much appreciated.
> 
> Thanks,
> 
> Karl Forsyth
