Subject:Recognize Japanese Characters Author:Jon Gallegos Date:29 Oct 2008 07:53 AM
Is there a way to recognize Japanese characters?
I am getting an XML file from Japan. Some of the data is in Japanese (Kanji) and some is in English. However, the XML file I am to produce must be in English, so I need a way to check each data field to see whether it contains Japanese (Kanji).
Subject:Recognize Japanese Characters Author:(Deleted User) Date:30 Oct 2008 10:40 AM
Hi Jon,
Do you need to check for them manually? In that case, open the XML document in the XML Editor, press Ctrl-F to show the Find dialog, check the 'regular expression' check box, and enter "[^\x00-\xFF]" (without quotes) as the search expression. This locates the next character outside the Latin-1 range (any code point above 0xFF), which includes every Japanese character.
If you need to look for Japanese characters programmatically, you can write an XQuery or an XSLT 2.0 stylesheet that evaluates fn:matches($myString,".*(\p{IsKatakana}|\p{IsHiragana})+.*") to detect whether the given variable contains any Katakana or Hiragana character. Since your data is in Kanji, you will probably also want to add \p{IsCJKUnifiedIdeographs} to the alternation. See http://www.w3.org/TR/xmlschema-2/#nt-IsBlock for the list of IsXXX block names.
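If the check has to run outside XQuery/XSLT, here is a minimal sketch of the same idea in Python. It is only an illustration under a few assumptions: Python's standard re module does not support the \p{Is...} block escapes, so the Hiragana, Katakana, and CJK Unified Ideographs (Kanji) blocks are written as explicit code-point ranges; the function names and the sample document are hypothetical; and other Japanese-related blocks (for example half-width Katakana) are not covered.

```python
import re
import xml.etree.ElementTree as ET

# Explicit Unicode block ranges, standing in for the \p{Is...} escapes
# (an assumption: Python's standard `re` module does not support them):
#   Hiragana                      U+3040-U+309F
#   Katakana                      U+30A0-U+30FF
#   CJK Unified Ideographs/Kanji  U+4E00-U+9FFF
JAPANESE_RE = re.compile("[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")


def contains_japanese(text):
    """Return True if text contains any Hiragana, Katakana, or Kanji character."""
    # bool(text) guards against None and empty strings (elements without text).
    return bool(text) and JAPANESE_RE.search(text) is not None


def japanese_fields(xml_text):
    """Return the tags of elements whose own text contains Japanese characters."""
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter() if contains_japanese(el.text)]


# Hypothetical sample document; the element names are made up for illustration.
SAMPLE = "<order><name>東京タワー</name><city>Tokyo</city><note>ありがとう</note></order>"

print(japanese_fields(SAMPLE))  # prints ['name', 'note']
```

Note that the Kanji range is shared with Chinese text, so this flags "contains CJK characters" rather than "is definitely Japanese"; for data known to come from Japan that is usually good enough.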