
Re: character map "range" in XSLT

Subject: Re: character map "range" in XSLT
From: "G. Ken Holman" <gkholman@xxxxxxxxxxxxxxxxxxxx>
Date: Wed, 12 May 2010 14:18:20 -0400
At 2010-05-12 13:49 -0400, David wrote:
I'm writing an XSLT that has to translate XML to plain ASCII text. The XML contains Unicode characters, possibly any of them. I cannot control the authoring, so I must handle whatever is thrown at me.

I have a few dozen specially known character translations for things like the 1/4 and degrees Unicode symbols.
But I need to "catch all" characters that are not mapped explicitly (rather than explicitly map the entire Unicode set) and translate them into something like "<UNKNOWN CHARACTER>".

Any suggestions on how to do this? I could trivially write a post-processor to do this (maybe a dozen lines of C or Java), but if there's a feature directly in XSLT I'd love to try that.

Any ideas welcome!

In XSLT 2.0 you could try a general match on all text nodes and then use Unicode code points to accept only ASCII text between code points 32 and 126 (or 127, depending on your need), replacing everything else. I've included the code point in the replacement as a diagnostic, since that might help the reader:


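  <xsl:template match="text()">
    <!-- Visit each character of the text node as an integer code point.
         Anything below 32, including line breaks, is flagged as well. -->
    <xsl:for-each select="string-to-codepoints(.)">
      <!-- Printable ASCII is 32..126; use 127 to let DEL through too. -->
      <xsl:value-of select="if ( . ge 32 and . le 126 )
                            then codepoints-to-string(.)
                            else concat('&lt;UNKNOWN CHARACTER-',.,'>')"/>
    </xsl:for-each>
  </xsl:template>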

It could be slow, but I think it will be faster than stepping through each string with substring().
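
The question also mentions a few dozen explicit translations that should take precedence over the catch-all. As a minimal sketch of how the two could be combined, assuming an XSLT 2.0 processor: the `my` namespace, the `$my:known` lookup table, the `my:to-ascii` function, and the two sample replacement strings are all hypothetical names and values chosen for illustration, and letting tab, line feed, and carriage return pass through is an assumption about what the plain-text output needs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:my="urn:example:ascii-fallback"
                exclude-result-prefixes="xs my">

  <xsl:output method="text" encoding="US-ASCII"/>

  <!-- Hypothetical lookup table: one entry per explicitly mapped code point. -->
  <xsl:variable name="my:known" as="element(my:char)+">
    <my:char code="188">1/4</my:char>   <!-- U+00BC VULGAR FRACTION ONE QUARTER -->
    <my:char code="176"> deg</my:char>  <!-- U+00B0 DEGREE SIGN -->
  </xsl:variable>

  <!-- Translate one code point: explicit mapping first, then ASCII
       pass-through, then the diagnostic placeholder. -->
  <xsl:function name="my:to-ascii" as="xs:string">
    <xsl:param name="cp" as="xs:integer"/>
    <xsl:variable name="hit" select="$my:known[xs:integer(@code) eq $cp]"/>
    <xsl:sequence select="
      if ($hit) then string($hit[1])
      else if ($cp = (9, 10, 13) or ($cp ge 32 and $cp le 126))
        then codepoints-to-string($cp)
      else concat('&lt;UNKNOWN CHARACTER-', $cp, '&gt;')"/>
  </xsl:function>

  <!-- Rebuild every text node from its translated code points. -->
  <xsl:template match="text()">
    <xsl:value-of select="string-join(
        for $cp in string-to-codepoints(.) return my:to-ascii($cp), '')"/>
  </xsl:template>

</xsl:stylesheet>
```

Keeping the mappings in a variable rather than in the conditional means a new translation is one added element. With a few dozen entries the linear lookup per character is probably acceptable; if it proves slow, an `xsl:key` over the table would be the usual next step.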

Remember that there is an ISO DSDL standard for validating exactly this, the use of Unicode characters in an XML document. It is called CREPDL, for "Character Repertoire Description Language":

 http://www.iso.org/iso/catalogue_detail.htm?csnumber=51085
 http://www.asahi-net.or.jp/~eb2m-mrt/crepdl/ns/structure/1.0/index.xml
 http://www.assembla.com/spaces/CrepdlValidatorInFsharp

I understand you are implementing a transformation and character-level validation doesn't apply, but since you have such a requirement for using only a subset of characters, there may be a role for CREPDL in your information/validation flow in addition to what you are asking for in this post.

I hope this helps.

. . . . . . . . . . . Ken

--
XSLT/XQuery training:   after http://XMLPrague.cz 2011-03-28/04-01
Vote for your XML training:   http://www.CraneSoftwrights.com/s/i/
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/s/
G. Ken Holman                 mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Male Cancer Awareness Nov'07  http://www.CraneSoftwrights.com/s/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal
