Re: ANN: Gorille 0.3

To: Elliotte Rusty Harold <elharo@m...>
Subject: Re: ANN: Gorille 0.3
From: Tim Bray <tbray@t...>
Date: Thu, 10 Jan 2002 12:22:32 -0800
Cc: xml-dev@l...
References: <4.2.0.58.20020110131719.012c5f00@p...> <p04330108b8639049050c@[192.168.254.4]>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.7) Gecko/20011221

Elliotte Rusty Harold wrote:

> It could be worse, though. You could be using C, and trying to decode 
> UTF-8. :-)

?? It's about 10 lines of code, and has been written lots of
times now.  Last time I needed it I couldn't find one with the
exact buffer interface I needed so I coded it up from scratch
sometime in the course of an afternoon and it worked first time.
The spec is hardly unclear.  And it's a set of shift/mask
operations that are processor-friendly.  You need to use a
loop iterator rather than a for (i = 0; string[i]; i++) idiom,
big deal.
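
For readers who want to see the shift/mask operations spelled out, here is a minimal sketch in C. It is not the decoder described above (that code isn't shown in this message). The names utf8_decode and utf8_count are hypothetical, and the buffer interface (pointer plus length) is an assumption. It follows the current four-byte definition of UTF-8 and rejects overlong forms and surrogates, which is stricter than some older decoders.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence from s[0..len).
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 if the input is empty, truncated, or malformed. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t n, i;
    uint32_t c, min;

    if (len == 0) return 0;

    /* The lead byte gives the sequence length and the top payload bits. */
    if (s[0] < 0x80)                { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 0;                  /* stray continuation or invalid lead byte */

    if (len < n) return 0;          /* sequence runs off the end of the buffer */

    /* Each continuation byte is 10xxxxxx and contributes six bits. */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong encodings, out-of-range values, and surrogates. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;

    *cp = c;
    return n;
}

/* The "loop iterator" in practice: advance by however many bytes each
 * character used instead of by one. Returns the number of code points,
 * or (size_t)-1 if the buffer is not well-formed. */
size_t utf8_count(const unsigned char *s, size_t len)
{
    size_t i = 0, count = 0;
    while (i < len) {
        uint32_t cp;
        size_t n = utf8_decode(s + i, len - i, &cp);
        if (n == 0) return (size_t)-1;
        i += n;
        count++;
    }
    return count;
}
```

The only change from the byte-at-a-time idiom is that the loop steps by the return value of the decoder rather than by one.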

UTF-8 only really causes extra work when you want per-character
addressing into big strings, because then you need an indirect
table - the most common case I can think of is maintaining
on-screen render state.
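
One way to picture the indirect table is as an array of byte offsets, one per character, built in a single pass. The sketch below is illustrative only: utf8_build_index is a hypothetical name, and it assumes the buffer has already been validated as well-formed UTF-8. It also allocates one slot per byte, which is the worst case, where a real implementation might size the table more tightly.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: build a table mapping character index -> byte offset
 * for the well-formed UTF-8 text in s[0..len). Stores the character count
 * in *nchars. Returns NULL if allocation fails; the caller frees the table. */
size_t *utf8_build_index(const unsigned char *s, size_t len, size_t *nchars)
{
    size_t i, n = 0;
    /* Worst case is one character per byte (pure ASCII). */
    size_t *table = malloc((len ? len : 1) * sizeof *table);
    if (table == NULL) return NULL;

    for (i = 0; i < len; i++) {
        /* Continuation bytes look like 10xxxxxx; any other byte
         * starts a new character. */
        if ((s[i] & 0xC0) != 0x80)
            table[n++] = i;
    }

    *nchars = n;
    return table;
}
```

With the table in hand, character k starts at s + table[k], so per-character addressing costs one extra lookup.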

But in most apps it's more common to point into text at a
few places (tags, word-starts, search matches) in which case
you needed that indirect array anyhow.

Conclusion: somewhat to my surprise, I find that for a lot
of C tasks, you can keep your text in UTF-8 and work with
it that way very efficiently.

Elliotte is right about the irritating fact that a Java
"char" isn't an XML character.  The nasty fact is that
I suspect many Java application programmers will end up
simply blowing off non-BMP text either through ignorance
or based on a decision that it's not cost-effective.  -Tim
