
Re: XML Max Character Value

  • To: Michael Kay <mike@s...>
  • Subject: Re: XML Max Character Value
  • From: 'Alan Gutierrez' <alan-xml-dev@e...>
  • Date: Sun, 14 Aug 2005 23:12:55 -0400
  • Cc: 'Derek Denny-Brown' <derekdb@m...>, xml-dev@l...
  • Mail-followup-to: Michael Kay <mike@s...>,'Derek Denny-Brown' <derekdb@m...>, xml-dev@l...
  • User-agent: Mutt/1.4.1i

* Michael Kay <mike@s...> [2005-08-14 17:24]:
> > -----Original Message-----
> > From: Alan Gutierrez [mailto:alan-xml-dev@e...] 
> > Sent: 13 August 2005 12:06
> > To: Derek Denny-Brown
> > Cc: xml-dev@l...
> > Subject: Re:  XML Max Character Value
> > 
> > * Derek Denny-Brown <derekdb@m...> [2005-08-13 01:29]:
> > 
> > > In java, 0xFFFE or 0xFFFF should work.  They aren't strictly
> > > the max Unicode character for XML, but since Java represents
> > > Unicode as utf-16 but doesn't really provide much support for
> > > surrogate pairs (last I checked), those should work.  Hm..
> > > Eclipse tells me that there is Character.MAX_VALUE.  Use at
> > > your own risk.
> >     
> >     I am using it to design the algorithm.  Concerned about what to
> >     do if Unicode requires multiple characters for a single
> >     character. It's perplexing.
> > 
> > > Reading up on Unicode is also recommended though...
> > > internationalization is far, far more complicated than you
> > > ever imagined.  I know people who get the shakes if you just
> > > mention "Turkish 'I'" in their presence.  (mild
> > > exaggeration...)
> > 
> >     I have no illusions about the complexity. I'd simply hoped that
> >     they would have made a hard and fast rule about min and 
> >     max values.
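
For reference, the limits discussed above can be checked directly in Java 5 and later. The XML 1.0 Char production ends at #x10FFFF and excludes #xFFFE and #xFFFF, while Character.MAX_VALUE is only the largest UTF-16 code unit. The sketch below is illustrative only; the class and method names (CodePointDemo, codePointLength, compareByCodePoint) are made up for this example and are not taken from the thread.

```java
public class CodePointDemo {

    /** Counts Unicode code points rather than UTF-16 code units. */
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    /**
     * Compares two strings by code point. String.compareTo compares
     * UTF-16 code units instead, so it sorts surrogate pairs
     * (U+10000 and up) before U+E000..U+FFFF.
     */
    static int compareByCodePoint(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // Whichever string still has characters left sorts last.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF needs a surrogate pair in UTF-16.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println("chars: " + clef.length());
        System.out.println("code points: " + codePointLength(clef));
        System.out.println("Character.MAX_VALUE: " + (int) Character.MAX_VALUE);
        System.out.println("Character.MAX_CODE_POINT: " + Character.MAX_CODE_POINT);
    }
}
```

An index that orders keys with String.compareTo stays self-consistent, but its order differs from code point order wherever surrogate pairs are involved.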

> In XSLT 2.0, the collation used by xsl:key is not necessarily
> Unicode codepoint order. To build an index, you need to store the
> key value as a sequence of collation units, not as a sequence of
> Java chars or Unicode codepoints. So I suspect that what you
> really want is the highest collation unit in the particular
> collation used for the key in question.
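
As a rough illustration of storing a key as collation units rather than chars, the JDK's own java.text.Collator can turn a string into a CollationKey whose bytes compare in collation order. This is only a stand-in: Saxon resolves collations through its own machinery, and the class and method names here (CollationKeyDemo, primaryCollator, indexKey) are hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

public class CollationKeyDemo {

    /** Builds an English collator that ignores case and accent differences. */
    static Collator primaryCollator() {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    /**
     * Returns the byte form of the collation key. Comparing these bytes
     * as unsigned values gives the same order as the collator itself,
     * so they could serve as the stored key in an index.
     */
    static byte[] indexKey(Collator collator, String value) {
        return collator.getCollationKey(value).toByteArray();
    }

    public static void main(String[] args) {
        Collator collator = primaryCollator();
        CollationKey lower = collator.getCollationKey("abc");
        CollationKey upper = collator.getCollationKey("ABC");
        // At PRIMARY strength the two keys are equal.
        System.out.println(lower.compareTo(upper) == 0);
        // Code unit order puts "B" before "a"; the collator does not.
        System.out.println("a".compareTo("B") > 0);
        System.out.println(collator.compare("a", "B") < 0);
        System.out.println(indexKey(collator, "abc").length);
    }
}
```

Under such a scheme the "highest" value is whatever sorts last in the chosen collation, which need not correspond to the highest code point.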

    I don't need a sentinel value at this point. I've turned the
    equality tests around so they start from an implicit zero.

    Thus, for the sake of <xsl:key/>... 

> (Actually, xsl:key only supports equality semantics, not ordering
>    semantics.  But I can see that you probably want to implement
>    indexes that also support ordering semantics. It's likely that
>    these too would need to be collation-sensitive.)  

    ...I'm only using the sort in order to search and to find the
    values in the tree. Any sort will do. Collation in <xsl:key/> is
    only applied after the keyed nodes are recovered, or that's my
    understanding.

    Soon after, I'm going to want to support ordering as well, and
    attempt to integrate that with <xsl:sort/>. (Perhaps XQuery can
    take advantage of ordered indices; I don't know.)

    This is a B-Tree implementation. The words 'collation unit' are
    heartening. I'm looking to advance the string comparison myself,
    using it to determine which branch to take in the B-Tree.

    I'm storing partial strings in tiers for branching. Partial
    means just enough of the string to indicate which branch to
    take. My design stores a character and index pair as a branch
    node, so I bump along the search string, branching along the way.

    This is FYI, for the group...

    I've written a document object model that's file-backed, and I'm
    using it with Saxon for queries, and I've put together my own
    XUpdate implementation for node surgery.
    
    I want to provide Saxon with a file-backed index.

--
Alan Gutierrez - alan@e...
    - http://engrm.com/blogometer/index.html
    - http://engrm.com/blogometer/rss.2.0.xml
