
Re: japanese sorting

Subject: Re: japanese sorting
From: Eric Promislow <ericp@xxxxxxxxxxxxxxx>
Date: Wed, 30 Apr 2003 13:05:47 -0700
On Mon, Apr 28, 2003 at 07:08:38PM -0400, Paul Hermans <paul_hermans@xxxxxxxxxx> wrote:
> 
> Anyone having an idea how to sort Japanese the Hiragana way versus the
> Katakana way ?
> Regards,
> 
> Paul
> Paul Hermans
> Pro Text
> www.protext.be
> phermans@xxxxxxxxxx
> 

Hi Paul. I don't completely understand your question, but after a
first-year Japanese course (which I've taken twice; funny how much
you forget if you don't use it for 7 years) I can at least give you
some help, and no one else has replied on the list yet.

First, Hiragana and Katakana are orthogonal orthographies[1].  A word would
be spelled in either one form or the other.  Hiragana is used to
spell Japanese words phonetically instead of with so-called Chinese
characters ("Kanji").  Katakana is used to write out words borrowed
from other languages, like san-do-wi-chi.  

Both syllabaries follow more or less the same order: the a-i-u-e-o
(ah-ee-oo-eh-oh) vowel order horizontally, and the
a-ka-sa-ta-na-ha-ma-ya-ra-wa-n row order vertically.

Hiragana letters occupy Unicode code points U+3041 - U+3094.

Katakana characters occupy Unicode code points U+30A0 - U+30FF.

So Hiragana and Katakana sort into two separate groups: compared by
code point, every Hiragana character comes before every Katakana
character. From what I know, within each block the Unicode code
points follow dictionary order.

I constructed this simple input, where each item contains a single
syllable. The attribute indicates which Kana it's from, and how I
would expect it to sort in ascending order within its group. I've
used a UTF-8 encoding, which shouldn't cause many readers problems
these days. If your mail reader doesn't decode it, you might see
something like <item a='k5'>[a with tilde][~]C[accent acute]
[a with tilde][~][B][upside down !]</item>. Those bytes are the
UTF-8 encoding of U+30F4 ("vu") followed by U+30A1 (small "a"),
which together spell "va", a combination used only in Katakana.

<?xml version="1.0" encoding="utf-8"?>
<items>
<item a='k5'>ヴァ</item>
<item a='h3'>ã??</item>
<item a='h4'>ã??</item>
<item a='k2'>キ</item>
<item a='h1'>ã??</item>
<item a='k4'>モ</item>
<item a='k1'>カ</item>
<item a='h2'>ã??</item>
<item a='k3'>ス</item>
</items>

Here's the XSLT -- the sort is as simple as it gets:

<?xml version="1.0"?> 
<xslt:stylesheet xmlns:xslt="http://www.w3.org/1999/XSL/Transform" version="1.0" >

<xslt:output indent='yes' method='xml' encoding='utf-8' />
  
<xslt:template match='items'>
    <outitems what='Starting sorting'>
        <xslt:apply-templates select='item'>
            <xslt:sort select='.'/>
        </xslt:apply-templates>
    </outitems>
</xslt:template>

<xslt:template match='item'>
    <outitem><xslt:attribute name='ord'><xslt:value-of select='@a'/></xslt:attribute>
        <xslt:value-of select='.'/>
    </outitem>
</xslt:template>

</xslt:stylesheet>



And the output (from both Xalan and Saxon):

<?xml version="1.0" encoding="utf-8"?>
<outitems what="Starting sorting">
<outitem ord="h1">ã??</outitem>
<outitem ord="h2">ã??</outitem>
<outitem ord="h3">ã??</outitem>
<outitem ord="h4">ã??</outitem>
<outitem ord="k1">カ</outitem>
<outitem ord="k2">キ</outitem>
<outitem ord="k3">ス</outitem>
<outitem ord="k4">モ</outitem>
<outitem ord="k5">ヴァ</outitem>
</outitems>


However (and this problem may be why I've been so pedantic), the
.NET XSLT transform fails to sort the input and ends up echoing it
unsorted.
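One more thing that may be worth trying, untested here against Xalan, Saxon or .NET: in XSLT 1.0, xsl:sort accepts a lang attribute. The spec only says it names the language whose conventions should be used for sorting; which collation that selects, and whether a processor honours it at all, is left to the implementation. So this is just a sketch of the same stylesheet with lang='ja' added to the sort:

```xml
<?xml version="1.0"?>
<xslt:stylesheet xmlns:xslt="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xslt:output indent='yes' method='xml' encoding='utf-8' />

<xslt:template match='items'>
    <outitems what='Sorting with lang ja'>
        <xslt:apply-templates select='item'>
            <!-- lang requests Japanese sorting conventions; which collation
                 that means, if any, is up to the XSLT processor -->
            <xslt:sort select='.' lang='ja' data-type='text' order='ascending'/>
        </xslt:apply-templates>
    </outitems>
</xslt:template>

<!-- copy each item, carrying its expected-order attribute along -->
<xslt:template match='item'>
    <outitem ord='{@a}'><xslt:value-of select='.'/></outitem>
</xslt:template>

</xslt:stylesheet>
```

A processor that ignores lang would presumably fall back to its default ordering, i.e. the same result as the plain sort.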

I think Japanese dictionary-order sorting folds Hiragana and Katakana
together (at least the Kodansha "Japanese for Busy People" books do),
but that isn't what you asked.
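If you do want the folded order, one way to approximate it in XSLT 1.0 (a sketch, not a full Japanese collation) is to sort on a key in which translate() rewrites each Katakana letter U+30A1 - U+30F4 as the Hiragana letter 0x60 code points below it, U+3041 - U+3094. The variable names katakana and hiragana are only illustrative, and letters outside those ranges (U+30F7 - U+30FA, for instance) pass through unchanged. A second xsl:sort on the raw text breaks ties; on a processor that compares code points, as Xalan and Saxon appear to, that puts the Hiragana spelling first:

```xml
<?xml version="1.0"?>
<xslt:stylesheet xmlns:xslt="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xslt:output indent='yes' method='xml' encoding='utf-8' />

<!-- Katakana U+30A1-U+30F4, one gojuon row per string literal -->
<xslt:variable name='katakana'
    select="concat('ァアィイゥウェエォオ', 'カガキギクグケゲコゴ',
                   'サザシジスズセゼソゾ', 'タダチヂッツヅテデトド',
                   'ナニヌネノ', 'ハバパヒビピフブプヘベペホボポ',
                   'マミムメモ', 'ャヤュユョヨ', 'ラリルレロ',
                   'ヮワヰヱヲンヴ')"/>
<!-- the matching Hiragana U+3041-U+3094, in the same positions -->
<xslt:variable name='hiragana'
    select="concat('ぁあぃいぅうぇえぉお', 'かがきぎくぐけげこご',
                   'さざしじすずせぜそぞ', 'ただちぢっつづてでとど',
                   'なにぬねの', 'はばぱひびぴふぶぷへべぺほぼぽ',
                   'まみむめも', 'ゃやゅゆょよ', 'らりるれろ',
                   'ゎわゐゑをんゔ')"/>

<xslt:template match='items'>
    <outitems what='Sorting with Kana folded'>
        <xslt:apply-templates select='item'>
            <!-- primary key: the text with Katakana rewritten as Hiragana -->
            <xslt:sort select='translate(., $katakana, $hiragana)'/>
            <!-- tie-break: the original text, so the two spellings
                 of the same syllable stay in a stable relative order -->
            <xslt:sort select='.'/>
        </xslt:apply-templates>
    </outitems>
</xslt:template>

<xslt:template match='item'>
    <outitem ord='{@a}'><xslt:value-of select='.'/></outitem>
</xslt:template>

</xslt:stylesheet>
```

Because translate() only changes the sort key, each item is still output exactly as written.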

Hope this helps. If you're spending a lot of time on East Asian
inputs, I highly recommend Ken Lunde's "CJKV Information Processing"
(ISBN 1565922247).

- Eric

------------------------------------------------
Eric Promislow
Visual Studio .NET Plugins Development Lead
EricP@xxxxxxxxxxxxxxx
--

[1] -- Couldn't resist. This will be a googlewhack one day, and
a lexically ordered one to boot.

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

