
Re: XML Memory Requirements (was Re: Feeling good about SML)

  • From: nisse@l... (Niels Möller)
  • To: Tim Bray <tbray@t...>
  • Date: 19 Nov 1999 10:52:57 +0100

Tim Bray <tbray@t...> writes:

> At 09:57 AM 11/18/99 -0800, David Brownell wrote:
> >The technique I used in Sun's parser may be good for many folk to steal.
> >It involves using the standard Character.getType() method (which has
> >access to lots of Unicode tables, and in recent JVMs uses native code
> >to quickly access them) and then filtering that output by the rules in
> >the XML spec.   
> 
> Fine, but the Lark technique *doesn't* require storing any Unicode
> tables and thus uses an order of magnitude (probably) less memory; or
> am I missing something? -T.

Unicode tables are not *that* huge. I wrote some C code some months
ago that associates 32 bits of character class information with each
Unicode character (of which 22 bits are used). That covers most
properties in the Unicode standard. I use a two-level lookup table.

That is, I first use the upper eight bits of the character code to
index a primary table (256 bytes or 256 pointers). Each entry in this
table refers to one of 41 distinct subtables, each holding the
character class information for a block of 256 characters. So the
tables add up to slightly less than 42K. If you can make do with a
subset of the Unicode properties, say 4 bits, this shrinks to about
6K, probably even less if the number of distinct subtables decreases
as well.
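
The size estimate above can be checked with a few lines of C. This is only a sketch of the arithmetic: the function name table_bytes and its parameters are made up for illustration, and it assumes the primary table stores one byte per entry and that narrow property sets are packed tightly.

```c
#include <stddef.h>

/* Approximate size in bytes of a two-level table covering 65536 characters.
 * subtables:           number of distinct 256-character subtables
 * bits_per_char:       property bits stored per character (assumed packed)
 * primary_entry_bytes: size of one primary table entry (1 for a byte index)
 * All names here are hypothetical, chosen for this sketch. */
static size_t table_bytes(size_t subtables, size_t bits_per_char,
                          size_t primary_entry_bytes)
{
    size_t secondary_bytes = subtables * 256 * bits_per_char / 8;
    size_t primary_bytes = 256 * primary_entry_bytes;
    return primary_bytes + secondary_bytes;
}
```

With 41 subtables and 32 bits per character this gives 42240 bytes (about 41.25K); with 4 bits per character it gives 5504 bytes (about 5.4K), in line with the figures quoted above.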

Compared to the binary search implementation that you estimate at
between 3.5K and 35K, I don't think the tables are excessive.

I don't count the code size, as the lookup function is trivial:

  /* Nonzero if character c has any of the properties in mask. */
  int has_property(int mask, unicode c)
  { return secondary[primary[c/256]][c%256] & mask; }

I think this is a standard way to implement Unicode character
properties. There might be cleverer schemes that use less memory
for equally fast lookups; I chose this one because it was easy to
generate the needed tables automatically.
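
To illustrate how such tables might be generated, here is a minimal sketch in C. It is not the code described above: the property bits, the classify() function (a toy ASCII-only stand-in for real Unicode data), and build_tables() are all assumptions made for the example. It shows identical 256-entry blocks being shared, so that only distinct subtables are stored.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical property bits, for illustration only. */
enum { PROP_DIGIT = 1u << 0, PROP_LETTER = 1u << 1 };

#define BLOCK_SIZE 256
#define MAX_BLOCKS 256

static uint8_t  primary[256];                      /* high byte -> subtable number */
static uint32_t secondary[MAX_BLOCKS][BLOCK_SIZE]; /* distinct subtables */
static int      n_subtables;

/* Toy classifier standing in for real Unicode data (ASCII only). */
static uint32_t classify(unsigned c)
{
    if (c >= '0' && c <= '9') return PROP_DIGIT;
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) return PROP_LETTER;
    return 0;
}

/* Fill both tables, sharing identical 256-entry blocks.
 * Returns the number of distinct subtables that were needed. */
static int build_tables(void)
{
    uint32_t block[BLOCK_SIZE];
    n_subtables = 0;
    for (unsigned hi = 0; hi < 256; hi++) {
        int i;
        for (unsigned lo = 0; lo < BLOCK_SIZE; lo++)
            block[lo] = classify(hi * 256 + lo);
        /* Look for an existing subtable with the same contents. */
        for (i = 0; i < n_subtables; i++)
            if (memcmp(secondary[i], block, sizeof block) == 0)
                break;
        if (i == n_subtables) {
            assert(n_subtables < MAX_BLOCKS);
            memcpy(secondary[n_subtables++], block, sizeof block);
        }
        primary[hi] = (uint8_t)i;
    }
    return n_subtables;
}

/* Same lookup as above; c must be below 65536. */
static int has_property(uint32_t mask, unsigned c)
{
    return (secondary[primary[c / 256]][c % 256] & mask) != 0;
}
```

With this toy classifier only two subtables are distinct (the ASCII block and an all-zero block shared by the other 255 entries). A real generator would read the Unicode data files and would typically emit the tables as static initializers rather than building them at startup.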

Regards,
/Niels


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@i... the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)


