Re: XML Max Character Value
On Aug 13, 2005, at 14:19, Alan Gutierrez wrote:

> Am I seeing that with Unicode in Java, you need to work with
> String and not with individual char? That puts a dent in my
> algorithm, which advanced along the characters in the string.

It depends on what exactly you are doing. A Java char is not a Unicode character but a UTF-16 code unit.

The values \u0000 and \uFFFF should never occur in XML and can be used as sentinels if your algorithm works on UTF-16 code units.

For the purpose of indexing text, working on UTF-16 code units as opposed to working on Unicode characters may well be good enough. In that case, a surrogate pair can be treated as two adjacent "characters".

(Note that even when operating on UTF-32, you can have tightly-coupled characters when there is a base character followed by combining marks, so working on Unicode characters does not buy you inter-character independence.)

--
Henri Sivonen
hsivonen@i...
http://hsivonen.iki.fi/
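To make the char versus code point distinction concrete, here is a minimal Java sketch. The class and method names (`CodeUnitDemo`, `codeUnitCount`, `codePoints`) are hypothetical. It walks a string one UTF-16 code unit at a time using \uFFFF as an end sentinel, which is only safe under the assumption that the text came from XML and therefore cannot contain that value. It then walks the same string by code point with `String.codePointAt` and `Character.charCount`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not from the original post.
class CodeUnitDemo {

    // U+FFFF is excluded from XML's Char production, so it cannot appear
    // in XML text. Assumption: input comes from XML, so it is a safe sentinel.
    static final char SENTINEL = '\uFFFF';

    // Counts UTF-16 code units by scanning until the sentinel.
    static int codeUnitCount(String text) {
        // Appending the sentinel lets the loop run without a bounds check.
        String buf = text + SENTINEL;
        int i = 0;
        while (buf.charAt(i) != SENTINEL) {
            // A surrogate pair is seen here as two adjacent chars.
            i++;
        }
        return i;
    }

    // Collects the Unicode code points of the string.
    static List<Integer> codePoints(String text) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);   // combines a valid surrogate pair
            result.add(cp);
            i += Character.charCount(cp);   // advance by 1 or 2 chars
        }
        return result;
    }

    public static void main(String[] args) {
        // "a", U+1D50A MATHEMATICAL FRAKTUR CAPITAL G (a surrogate pair),
        // then "e" followed by U+0301 COMBINING ACUTE ACCENT.
        String s = "a\uD835\uDD0Ae\u0301";
        System.out.println("code units:  " + codeUnitCount(s));
        System.out.println("code points: " + codePoints(s).size());
    }
}
```

For that string, `main` prints 5 code units and 4 code points. The combining acute accent is still its own code point even though a reader sees only three characters, which illustrates why moving from chars to code points does not make characters independent.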