
Re: Bug in 'xsl:sort'. ( XT vs SAXON. )

Subject: Re: Bug in 'xsl:sort'. ( XT vs SAXON. )
From: Paul Tchistopolskii <paul@xxxxxxx>
Date: Sat, 05 Aug 2000 16:25:35 -0700
----- Original Message ----- 
From: Jeni Tennison 

> If you go a little further on in the XSLT Recommendation, it says:
> 
> "NOTE: It is possible for two conforming XSLT processors not to sort
> exactly the same. Some XSLT processors may not support some languages.
> Furthermore, there may be variations possible in the sorting of any
> particular language that are not specified by the attributes on xsl:sort,
> for example, whether Hiragana or Katakana is sorted first in Japanese.

This is not the case here, right? ( Actually, I don't understand 
why anything other than UTF-* should be supported 
by W3C standards, but that's another story ).

> Future versions of XSLT may provide additional attributes to provide
> control over these variations. Implementations may also use
> implementation-specific namespaced attributes on xsl:sort for this.

This is also not the case, right?

> NOTE: It is recommended that implementers consult [UNICODE TR10] for
> information on internationalized sorting."
> 
> The values should be sorted "lexicographically in the culturally correct
> manner for the language specified by lang" but I guess the question arises
> in English (as it does in other languages) about whether '-' is
> lexicographically before '0' or not.

Right. But I'm not sure the question is about 'English'. I think the 
question really is 'in UTF8' ?
 
> If you follow up the UNICODE reference, there is a file that gives the
> order for sorting just about every character you can think of
> [http://www.unicode.org/unicode/reports/tr10/basekeys.txt].  In this file,
> various sorts of hyphens:
> 
> 00AD ; [*020B.0020.0002.00AD] # SOFT HYPHEN
<cut/>

> come before (i.e. should be sorted before) various forms of 0:

> 0030 ; [.06B9.0020.0002.0030] # DIGIT ZERO

<cut/>
 
> This would imply that '-1' should be before '0' because '-' sorts before
> '0'.  However, on
> [http://www.unicode.org/unicode/reports/tr10/index.html#Alternate
> Weighting] there is some extra stuff about options involving the weighting
> of hyphens (& various other characters) that might contradict this but that
> I can't get my head around right now.

Looks like this is correct. 

String minus_one = "-1";
String zero = "0";
System.out.println( zero.compareTo( minus_one ) );

prints 3
( this means zero is greater than minus_one ).
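
For comparison, here is a slightly fuller sketch of the same check. 
It is only an illustration: the class name SortOrderDemo and its two 
helper methods are made up for this example, and nothing here claims 
to be what SAXON or XT do internally. String.compareTo compares the 
raw character values ( '-' is 45, '0' is 48 ), while java.text.Collator 
applies locale rules, so the two need not agree. The Collator result 
depends on the rules shipped with the Java runtime, so no expected 
value is given for it.

```java
import java.text.Collator;
import java.util.Locale;

class SortOrderDemo {
    // Raw comparison by character value, exactly what String.compareTo does.
    static int codeUnitOrder(String a, String b) {
        return a.compareTo(b);
    }

    // Locale-aware comparison; the sign depends on the runtime's collation rules.
    static int collatorOrder(String a, String b, Locale locale) {
        return Collator.getInstance(locale).compare(a, b);
    }

    public static void main(String[] args) {
        System.out.println((int) '-');                    // 45
        System.out.println((int) '0');                    // 48
        System.out.println(codeUnitOrder("0", "-1"));     // 3, i.e. 48 - 45
        // Not asserted here: may differ from the raw comparison above.
        System.out.println(collatorOrder("0", "-1", Locale.ENGLISH));
    }
}
```

If the two results have different signs on your runtime, that alone 
shows how two processors could order '-1' and '0' differently 
depending on which comparison they use.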

This is really interesting, huh? How many documents should you read 
to understand what comes first, '-' or '0' ?
 
> I don't think that either SAXON or XT is 'right'.  They employ different
> sort orders, 

Why? There are no special encodings or special sorting attributes. 
Both engines receive the same 'lang' environment ( or don't they??? ), 
so why do they employ different sort orders? 

> but from what I can gather, it's fine for them to do so and
> still both be compliant.  

I still think something is strange here. They are both sorting UTF8 (?)
without any of the special cases mentioned in the W3C paper, and the 
question is :  "in  UTF8(?) what comes first, '-' or '0' ?"  - Right?
Is it legal for them to give different answers to the same question?

> Eventually the differences between them should be
> diminished through the specification of additional attributes.

Pardon, what attributes do you mean ???
I now think maybe this is a bug in XT ?

Rgds.Paul.




 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

