Re: Bug in 'xsl:sort'. ( XT vs SAXON. )
----- Original Message -----
From: Jeni Tennison

> If you go a little further on in the XSLT Recommendation, it says:
>
> "NOTE: It is possible for two conforming XSLT processors not to sort
> exactly the same. Some XSLT processors may not support some languages.
> Furthermore, there may be variations possible in the sorting of any
> particular language that are not specified by the attributes on xsl:sort,
> for example, whether Hiragana or Katakana is sorted first in Japanese.

This is not the case here, right? (Actually, I don't understand why anything other than UTF-* should be supported by W3C standards, but that's another story.)

> Future versions of XSLT may provide additional attributes to provide
> control over these variations. Implementations may also use
> implementation-specific namespaced attributes on xsl:sort for this.

This is also not the case, right?

> NOTE: It is recommended that implementers consult [UNICODE TR10] for
> information on internationalized sorting."
>
> The values should be sorted "lexicographically in the culturally correct
> manner for the language specified by lang" but I guess the question arises
> in English (as it does in other languages) about whether '-' is
> lexicographically before '0' or not.

Right. But I'm not sure the question is about 'English'. I think the question really is 'in UTF-8'?

> If you follow up the UNICODE reference, there is a file that gives the
> order for sorting just about every character you can think of
> [http://www.unicode.org/unicode/reports/tr10/basekeys.txt]. In this file,
> various sorts of hyphens:
>
> 00AD ; [*020B.0020.0002.00AD] # SOFT HYPHEN

<cut/>

> come before (i.e. should be sorted before) various forms of 0:
> 0030 ; [.06B9.0020.0002.0030] # DIGIT ZERO

<cut/>

> This would imply that '-1' should be before '0' because '-' sorts before
> '0'. However, on
> [http://www.unicode.org/unicode/reports/tr10/index.html#Alternate
> Weighting] there is some extra stuff about options involving the weighting
> of hyphens (& various other characters) that might contradict this but that
> I can't get my head around right now.

Looks like this is correct.

    String minus_one = "-1";
    String zero = "0";
    System.out.println( zero.compareTo( minus_one ) );

prints 3 (this means zero is greater than minus_one).

This is really interesting, huh? How many documents should you read to understand which comes first, '-' or '0'?

> I don't think that either SAXON or XT is 'right'. They employ different
> sort orders,

Why? There are no special encodings or special sorting attributes. Both engines receive the same 'lang' environment (or don't they???), so why do they employ different sort orders?

> but from what I can gather, it's fine for them to do so and
> still both be compliant.

I still think something is strange here. They are both sorting UTF-8 (?) without any of the special cases mentioned in the W3C paper, and the question is: "in UTF-8 (?), which comes first, '-' or '0'?" Right? Is it legal for them to give different answers to the same question?

> Eventually the differences between them should be
> diminished through the specification of additional attributes.

Pardon, what attributes do you mean???

I now think maybe this is a bug in XT?

Rgds. Paul.

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
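
The compareTo check in the message only shows raw code-point order, where '-' (U+002D) sorts before '0' (U+0030). A locale-aware collator is a different comparison and may weight the hyphen differently. The sketch below puts the two side by side. It is only an illustration: the class name SortOrderDemo and its method names are hypothetical, and nothing in this thread confirms which comparison XT or SAXON actually uses.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class SortOrderDemo {
    // Sorts a copy of the input in code-point order, i.e. the same
    // comparison that String.compareTo performs.
    public static String[] sortByCodePoint(String[] values) {
        String[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Sorts a copy of the input with a locale-aware java.text.Collator.
    // The resulting order depends on the collation rules shipped with the
    // JDK, so it is printed here rather than assumed.
    public static String[] sortByCollator(String[] values, Locale locale) {
        String[] copy = values.clone();
        Collator collator = Collator.getInstance(locale);
        Arrays.sort(copy, collator);
        return copy;
    }

    public static void main(String[] args) {
        String[] values = {"0", "-1", "1", "-2"};

        // '0' is U+0030 and '-' is U+002D, so the difference is 3.
        System.out.println("0".compareTo("-1"));

        // Code-point order: [-1, -2, 0, 1]
        System.out.println(Arrays.toString(sortByCodePoint(values)));

        // Collator order: may or may not match the line above.
        System.out.println(Arrays.toString(sortByCollator(values, Locale.ENGLISH)));
    }
}
```

If the two printed arrays differ on a given JDK, that would be one concrete way for two conforming processors to disagree about whether '-1' sorts before '0'.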