Re: japanese sorting
On Mon, Apr 28, 2003 at 07:08:38PM -0400, Paul Hermans <paul_hermans@xxxxxxxxxx> wrote:
>
> Anyone having an idea how to sort Japanese the Hiragana way versus the
> Katakana way ?
> Regards,
>
> Paul
> Paul Hermans
> Pro Text
> www.protext.be
> phermans@xxxxxxxxxx
>

Hi, Paul,

I don't completely understand your question, but I have taken a first-year Japanese course (twice -- funny how much you forget if you don't use it for 7 years), so I can give you some help, and no one else has posted to the list.

First, Hiragana and Katakana are orthogonal orthographies[1]. A word would be spelled in either one form or the other. Hiragana is used to spell Japanese words phonetically instead of with so-called Chinese characters ("Kanji"). Katakana is used to write out words borrowed from other languages, like san-do-wi-chi.

Both syllabaries follow more or less the same order, following the ah-ee-oo-eh-oh form horizontally, and the a-ka-sa-ta-na ha-ma-ya-ra-wa-n order vertically.

Hiragana characters occupy the Unicode block U+3040 - U+309F (the letters themselves run from U+3041 to U+3096).
Katakana characters occupy the Unicode block U+30A0 - U+30FF.

So in a plain code-point sort every Hiragana character comes before every Katakana character, rather than the two being interleaved. From what I know, within each block the Unicode tables follow dictionary-order sorting.

I constructed this simple input, where each item contains a single syllable. The attribute indicates which Kana it's from ('h' or 'k'), and where I would expect it to sort in ascending order within its group. I've used a UTF-8 encoding, which I don't think will cause too many readers problems these days. You might see something like <item a='k5'>[a with tilde][~]C[accent acute] [a with tilde][~][B][upside down !]</item>. That is the UTF-8 encoding of the two-character sequence U+30F4 U+30A1 ("va", used only in Katakana).
<?xml version="1.0" encoding="utf-8"?>
<items>
  <item a='k5'>ã?´ã?¡</item>
  <item a='h3'>ã??</item>
  <item a='h4'>ã??</item>
  <item a='k2'>ã?</item>
  <item a='h1'>ã??</item>
  <item a='k4'>ã?¢</item>
  <item a='k1'>ã?«</item>
  <item a='h2'>ã??</item>
  <item a='k3'>ã?¹</item>
</items>

Here's the XSLT -- the sort is as simple as it gets:

<?xml version="1.0"?>
<xslt:stylesheet xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
                 version="1.0">
  <xslt:output indent='yes' method='xml' encoding='utf-8'/>

  <!-- Sort the items on their text content, using the processor's default text sort -->
  <xslt:template match='items'>
    <outitems what='Starting sorting'>
      <xslt:apply-templates select='item'>
        <xslt:sort select='.'/>
      </xslt:apply-templates>
    </outitems>
  </xslt:template>

  <!-- Copy each item, carrying its expected-order tag over as @ord -->
  <xslt:template match='item'>
    <outitem>
      <xslt:attribute name='ord'><xslt:value-of select='@a'/></xslt:attribute>
      <xslt:value-of select='.'/>
    </outitem>
  </xslt:template>
</xslt:stylesheet>

And the output (from both Xalan and Saxon):

<?xml version="1.0" encoding="utf-8"?>
<outitems what="Starting sorting">
  <outitem ord="h1">ã??</outitem>
  <outitem ord="h2">ã??</outitem>
  <outitem ord="h3">ã??</outitem>
  <outitem ord="h4">ã??</outitem>
  <outitem ord="k1">ã?«</outitem>
  <outitem ord="k2">ã?</outitem>
  <outitem ord="k3">ã?¹</outitem>
  <outitem ord="k4">ã?¢</outitem>
  <outitem ord="k5">ã?´ã?¡</outitem>
</outitems>

However, the .NET XSLT transform fails to sort this input and ends up echoing it in its original order. If I've been pedantic above, it might be due to this problem.

I think Japanese dictionary-order sorting folds Hiragana and Katakana together (at least the Kodansha Busy People books do), but this isn't what you asked.

Hope this helps. If you're spending a lot of time on East Asian inputs I highly recommend Ken Lunde's CJKV Information Processing (ISBN 1565922247).

- Eric

------------------------------------------------
Eric Promislow
Visual Studio .NET Plugins Development Lead
EricP@xxxxxxxxxxxxxxx
--
[1] -- Couldn't resist. This will be a googlewhack one day, and a lexically ordered one to boot.

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
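
For anyone who wants to see the difference between the two orderings discussed in the message, here is a minimal Python sketch. It is not part of the original post: the helper name `fold_to_hiragana`, the constants, and the sample syllables are all made up for illustration. It relies only on the fact that each Katakana letter from U+30A1 to U+30F6 sits exactly 0x60 above its Hiragana counterpart; a real collation library would handle small and voiced kana with more care.

```python
# Illustrative sketch only; names and sample data are hypothetical.
KATAKANA_LETTERS = range(0x30A1, 0x30F7)  # U+30A1 .. U+30F6
KANA_OFFSET = 0x60                        # Katakana letter = Hiragana letter + 0x60


def fold_to_hiragana(text: str) -> str:
    """Replace each Katakana letter with its Hiragana counterpart.

    Characters outside U+30A1..U+30F6 are returned unchanged.
    """
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if ord(ch) in KATAKANA_LETTERS else ch
        for ch in text
    )


# ka (Katakana), a (Hiragana), a (Katakana), ka (Hiragana), sa (Katakana), sa (Hiragana)
words = ["\u30AB", "\u3042", "\u30A2", "\u304B", "\u30B5", "\u3055"]

# Plain code-point order: all Hiragana first, then all Katakana.
by_code_point = sorted(words)

# Folded order: the two spellings of each syllable end up next to each other.
# The original string is the tie-breaker, so Hiragana precedes Katakana.
folded = sorted(words, key=lambda w: (fold_to_hiragana(w), w))

print(by_code_point)  # ['あ', 'か', 'さ', 'ア', 'カ', 'サ']
print(folded)         # ['あ', 'ア', 'か', 'カ', 'さ', 'サ']
```

The first list corresponds to what the simple sort in the stylesheet produced (all 'h' items, then all 'k' items); the second is the "folded" dictionary-style order mentioned near the end of the message.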